Predicting 2018 Pakistani Election using a Novel Rigged Model

Dr. Saeed Ul Hassan
ITU, Scientometrics Lab

*Comparison of results as Dot-Map representation on map of Pakistan*

Note: This paper was accepted in the special issue on Big Data Research for Politics of Springer's JAIHC, which has an impact factor of 1.423.

Abstract

In this project, we devised a novel machine-learning-based election forecasting model that predicted Pakistan’s 2018 General Election with the highest accuracy and won a nationwide data science competition. The model predicts the probability of a win for each candidate contesting the election. To capture this probability for individual candidates in a constituency, the model taps an array of statistics from different data sources. Past election data is used to mine the demographic trends of each party across districts, while Twitter and approval polls are used to capture current popularity levels. Then, leveraging Bayesian optimization, the model combines the probabilities from the different sources by ‘rigging’ the results for ten seats where the competition was expected to be one-sided. In contrast to existing models, which only predict the aggregate vote share of political parties at the national level, our model also predicts the winning candidate for every National Assembly seat. The seat share of political parties in the National Assembly was predicted with 83% accuracy. In 230 out of 270 constituencies, the actual winner was among the top two candidates predicted by the proposed technique. Our model produced the most accurate forecast of the 2018 election in Pakistan compared to all opinion polls and surveys, and was acknowledged by a leading public sector agency working in this domain.

Main idea

Our main objective was to predict the winner of each constituency. We therefore developed a model that outputs, for each constituency, a vector of win probabilities with one entry per candidate. For instance, if a constituency has five candidates, the output of the model might look like $[0.2, 0.32, 0.43, 0.02, 0.03]$. Each data source gives one such probability vector for each constituency. We assumed the results for certain constituencies based on domain knowledge and employed Bayesian optimization to combine these vectors into the final result. Following the tradition of election forecasting models, we considered the win probability of a particular candidate as a function of three variables: election history, surveys, and popularity on social media,
$$\vec{p_c} = f(\text{election history, surveys, social media}).$$
In contrast to traditional models, however, we predict results for each constituency, a considerably more challenging problem than estimating the overall vote share of the major political parties. We formulate our model as follows:
\begin{equation}\label{eq:main_model}
\begin{split}
\vec{p_c} &= \sum_{j=1}^{J} \alpha[j]\, h(j,c) + \sum_{k=1}^{K} \beta[k]\, s(k,c) + \gamma \vec{t} + \delta \vec{q} \\
w_c &= \underset{i}{\arg\max}\; \vec{p_c}[i],
\end{split}
\end{equation}
where

$\vec{p_c}$: probability vector for $c$-th constituency, where $\vec{p_c} \in \mathbb{R}^{n_c}$, $n_c$: total number of candidates in $c$-th constituency
$J$: Total number of past elections used in the model
$K$: Total number of surveys used in the model
$\vec{\alpha}$, $\vec{\beta}$, $\gamma$, $\delta$: hyper-parameters where $0 \leq \alpha[j], \beta[k], \gamma, \delta \leq 1$ and $\vec{\alpha} \in \mathbb{R}^J$, $\vec{\beta} \in \mathbb{R}^K$
${h(j,c)}$: function which returns probability vector for a particular constituency $c$ based on one past election $j$
${s(k,c)}$: function which returns probability vector for a particular constituency $c$ based on one poll $k$
$\vec{t}$: probability vector from Twitter data
$\vec{q}$: overall likelihood of candidates based on all the previous elections
$w_c$: winning candidate in the $c$-th constituency
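
As a minimal sketch of how Equation 1 combines the per-source vectors, the snippet below computes $\vec{p_c}$ and $w_c$ for a single constituency. The function names (`weighted_sum`, `combine`, `winner`) and all numbers are hypothetical and chosen only for illustration; they are not taken from the project's code or data.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    """Return sum_i weights[i] * vectors[i], computed element-wise."""
    n = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(n)]


def combine(alpha, beta, gamma, delta, history, surveys, twitter, overall) -> Vector:
    """Equation 1: p_c = sum_j alpha[j] h(j,c) + sum_k beta[k] s(k,c) + gamma t + delta q."""
    weights = list(alpha) + list(beta) + [gamma, delta]
    vectors = list(history) + list(surveys) + [twitter, overall]
    return weighted_sum(weights, vectors)


def winner(p_c: Vector) -> int:
    """w_c: index of the candidate with the largest combined score."""
    return max(range(len(p_c)), key=p_c.__getitem__)


# Made-up inputs for one constituency with three candidates.
history = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]  # h(j, c) for J = 2 past elections
surveys = [[0.3, 0.5, 0.2]]                   # s(k, c) for K = 1 survey
twitter = [0.2, 0.6, 0.2]                     # t
overall = [0.45, 0.35, 0.2]                   # q

p_c = combine([0.3, 0.2], [0.2], 0.2, 0.1, history, surveys, twitter, overall)
w_c = winner(p_c)
print(p_c, w_c)
```

The illustrative weights here happen to sum to one, so the combined vector also sums to one. The definitions above only bound each weight to $[0, 1]$, and the arg max is unaffected by a positive rescaling of $\vec{p_c}$.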

In this model, each data source produces a probability vector for each constituency. The computation of this probability vector is explained in the next sections. Bayesian optimization is then employed to find optimal values of the hyper-parameters $\vec{\alpha}$, $\vec{\beta}$, $\gamma$, $\delta$ so that these vectors can be combined into a final result. The following subsections explain the proposed model in detail. In this approach, we first identified the constituencies where the election was one-sided. For this, we used the common knowledge that prominent leaders of major political parties always choose ‘safe constituencies’. We rigged the results in the model for these strong candidates and declared them the winners. After the rigging, we defined a function $g(\cdot)$ based on Equation 1, which returns the $\ell_1$-norm difference between the predicted and the assumed (rigged) results for the rigged seats, given values of the hyper-parameters ($\vec{\alpha}, \vec{\beta}, \gamma, \delta$). We propose finding the values of the hyper-parameters by minimizing the following $\ell_1$ objective:
\[ \hat{\vec{x}} = \underset{\vec{x}}{\arg\min}\; g(\vec{x}) = \underset{\vec{x}}{\arg\min} \left[ \sum_{c=1}^{l_{r}} \| p_c(\vec{x}) - \vec{r_c} \|_1 \right], \]
where $p_c(\vec{x})$ is a function which returns $\vec{p_c}$ as defined in Equation 1 given the hyper-parameters $\vec{x} = [\vec{\alpha}, \vec{\beta}, \gamma, \delta]$, $l_{r}$ is the number of rigged seats, and $\vec{r_c}$ is the one-hot encoded rigged probability vector, with one for the rigged winner and zeros for all other candidates:
\[ r_c[i] = \begin{cases} 1, & \text{if } i = \text{rigged winner candidate} \\ 0, & \text{otherwise.} \end{cases} \]
So ultimately our goal is to
\[ \underset{\vec{x} \in \mathcal{A}}{\text{maximize}} \quad -g(\vec{x}), \]
where $g(\vec{x})$ is a continuous objective function with unknown structure that is expensive to evaluate.
Here $\vec{x} \in \mathbb{R}^d$ is a $d$-dimensional vector containing the hyper-parameters and $\mathcal{A}$ is the search space for $\vec{x}$, defined as $\mathcal{A} = \{ \vec{x} \in \mathbb{R}^d: a_i \leq \vec{x}[i] \leq b_i\}$. Since $\vec{x}$ is bounded in all $d$ dimensions, our search space is a $d$-dimensional hyper-rectangle.
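
The objective $g(\vec{x})$ and the search over the hyper-rectangle $\mathcal{A}$ can be sketched as follows. Note the substitution: to keep the example self-contained, the search below is a plain uniform random search, which is a stand-in for, and not an implementation of, the Bayesian optimization used in the project. The helper names (`make_objective`, `random_search`), the toy probability vectors, and the rigged winners are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l1_distance(p: Sequence[float], r: Sequence[float]) -> float:
    """l1 norm of the difference between two vectors."""
    return sum(abs(a - b) for a, b in zip(p, r))


def one_hot(n: int, winner_index: int) -> Vector:
    """Rigged vector r_c: 1 for the rigged winner, 0 for everyone else."""
    return [1.0 if i == winner_index else 0.0 for i in range(n)]


def make_objective(sources_per_seat: List[List[Vector]],
                   rigged_winners: List[int]) -> Callable[[Sequence[float]], float]:
    """Build g(x) = sum over rigged seats of || p_c(x) - r_c ||_1.

    sources_per_seat[c] holds the per-source probability vectors of seat c,
    in the same order as the weights in x.
    """
    targets = [one_hot(len(src[0]), w) for src, w in zip(sources_per_seat, rigged_winners)]

    def g(x: Sequence[float]) -> float:
        total = 0.0
        for src, r_c in zip(sources_per_seat, targets):
            p_c = [sum(w * v[i] for w, v in zip(x, src)) for i in range(len(r_c))]
            total += l1_distance(p_c, r_c)
        return total

    return g


def random_search(g: Callable[[Sequence[float]], float],
                  bounds: List[Tuple[float, float]],
                  n_iter: int = 2000,
                  seed: int = 0) -> Tuple[Vector, float]:
    """Stand-in optimizer: sample x uniformly from the hyper-rectangle, keep the best."""
    rng = random.Random(seed)
    best_x = [rng.uniform(a, b) for a, b in bounds]
    best_val = g(best_x)
    for _ in range(n_iter - 1):
        x = [rng.uniform(a, b) for a, b in bounds]
        val = g(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Toy data: two rigged seats, two data sources each (made-up numbers).
sources_per_seat = [
    [[0.7, 0.3], [0.4, 0.6]],            # seat 0: source A, source B
    [[0.2, 0.5, 0.3], [0.5, 0.3, 0.2]],  # seat 1: source A, source B
]
rigged_winners = [0, 1]

g = make_objective(sources_per_seat, rigged_winners)
bounds = [(0.0, 1.0), (0.0, 1.0)]  # a_i <= x[i] <= b_i
x_hat, g_min = random_search(g, bounds)
print(x_hat, g_min)
```

In this toy setup source A agrees with both rigged winners and source B with neither, so the search pushes the weight of A toward its upper bound and the weight of B toward zero. A Bayesian optimizer would aim to reach a comparable point with far fewer evaluations of $g$, which matters when $g$ is expensive to evaluate.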

Data

We have leveraged three different types of data in this model: i) results of the past four elections, ii) public poll data from the last two years, and iii) tweets from the three weeks before the election. The past election data consists of each party's vote share in each constituency, along with regional information. It is important to note that constituency names and boundaries change with every election, so constituency-level history cannot be used directly to measure a party's influence in a particular constituency. Therefore, we first converted this data to the district level using the regional information and then used it in the model.
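
A minimal sketch of the district-level conversion is given below, assuming each past-election record carries a constituency label, a district, a party, and a vote count. This record layout, the function name `district_vote_share`, and the sample rows are hypothetical; the actual preprocessing may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (constituency, district, party, votes)
Record = Tuple[str, str, str, int]


def district_vote_share(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Aggregate constituency-level votes into per-district vote shares."""
    votes: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for _constituency, district, party, n in records:
        # The constituency label is ignored on purpose: boundaries change
        # between elections, while the district stays comparable.
        votes[district][party] += n

    shares: Dict[str, Dict[str, float]] = {}
    for district, by_party in votes.items():
        total = sum(by_party.values())
        if total > 0:  # skip districts with no recorded votes
            shares[district] = {party: n / total for party, n in by_party.items()}
    return shares


# Made-up rows for illustration only.
records = [
    ("C-1", "District X", "Party A", 60000),
    ("C-1", "District X", "Party B", 40000),
    ("C-2", "District X", "Party A", 30000),
    ("C-2", "District X", "Party B", 70000),
    ("C-3", "District Y", "Party A", 50000),
    ("C-3", "District Y", "Party B", 25000),
]

shares = district_vote_share(records)
print(shares)
```

Summing raw votes before dividing weights each constituency by its turnout; averaging per-constituency shares instead would be a different design choice.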

Comparison with other methods

The two tables below compare the proposed model's predictions with the actual results and with polls conducted by traditional polling agencies.

Table 1: Predicted vs. actual seat counts per party. Error % is the absolute difference in seats, expressed as a percentage of the 270 seats.
	PTI	PML-N	PPPP	MMA	IND	Others	Total
Predicted	115	88	34	6	11	18	270
Original	116	64	43	12	13	22	270
Error %	0.37	8.88	3.33	2.22	0.74	1.48	17.03
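
The Error % row appears to follow from the seat counts as the absolute difference divided by the 270 seats. The sketch below reproduces that arithmetic under this assumed reading; the function name `seat_errors` is hypothetical. The last decimal place may differ from the table depending on whether values are rounded or truncated.

```python
parties = ["PTI", "PML-N", "PPPP", "MMA", "IND", "Others"]
predicted = [115, 88, 34, 6, 11, 18]   # "Predicted" row of Table 1
original = [116, 64, 43, 12, 13, 22]   # "Original" row of Table 1
TOTAL_SEATS = 270


def seat_errors(predicted, original, total_seats):
    """Per-party absolute seat error as a percentage of all seats (assumed definition)."""
    return [abs(p - o) / total_seats * 100 for p, o in zip(predicted, original)]


errors = seat_errors(predicted, original, TOTAL_SEATS)
total_error = sum(errors)
accuracy = 100 - total_error  # assumed relation to the accuracy quoted in the abstract

for party, err in zip(parties, errors):
    print(f"{party}: {err:.2f}%")
print(f"total error: {total_error:.2f}%, accuracy: {accuracy:.2f}%")
```

Under this reading, 100 minus the total error is consistent with the 83% seat-share accuracy quoted in the abstract.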

Table 2: Seat share (%) of major political parties as forecast by polls and by the proposed model, compared with the original result.
	PTI	PML-N	PPPP	MMA	Others
Original	43	23.7	15.9	4.4	8.1
Proposed Model	42.6	32.6	12.6	2.2	6.7
Gallup Survey 2017 1	26	34	15	4	21
Gallup Survey 2017 2	23	36	15	4	22
Gallup Survey 2018 1	25	26	16	3	30
Gallup Survey 2018 2	30	27	17	4	22
IPOR 2018	29	32	13	3	23

Winner of Data Science Competition

This model won the election prediction challenge of Pakistan held by Ignite, RedBuffer and DeepLinks.