GENERAL ELECTION FORECASTING MODEL FOR PAKISTAN: LEVERAGING MACHINE LEARNING FOR POLITICS

Ali Ehtsham

Files

MS Thesis Ali Ehtsham.pdf (2.25 MB)

Date

2022

Authors

Ali Ehtsham

Publisher

UMT, Lahore

Abstract

The purpose of this work was to test how far current machine learning models can be applied to election data in Pakistan. It evaluates the forecasting approaches in practice globally and in Pakistan, and analyzes the models and parameters used to forecast elections globally. Aggregation models were the most effective in forecasting elections, while forecasting models based on sentiment analysis performed below average. The lack of effectiveness of sentiment analysis is due to the use of complete tweet data instead of tweets targeted to the geographical constituency area. The dataset used for the research is from the Pakistan General Elections. The General Elections held in 2002, 2008, and 2013 were selected because of their uniform constituency delimitations. This data was cross-verified with the Gazette of Pakistan. This work proposes a methodology for predicting elections in Pakistan: a forecasting model that predicts the winner or loser at the constituency level from past election data. It uses classification to separate candidates into winners and losers for a particular constituency. Supervised machine learning algorithms were used for classification, namely Logistic Regression and Support Vector Machine. Multiple experiments were conducted with changing parameters and with manipulation of the data supplied as input to the model. The first experiment used a Logistic Regression model with 25,000 iterations and achieved 99.82 percent accuracy. The second experiment used the Logistic Regression model with the iterations reduced to 15,000; it also achieved 99.82 percent accuracy, no difference from the first experiment. In the third experiment, the input values of the independent variables were scaled for the Logistic Regression model and the iterations were kept at 15,000; the accuracy decreased to 91.01 percent. In the fourth experiment, the scaled features were passed through a pipeline function as input to the Logistic Regression model, and the accuracy increased to 98.63 percent. The fifth experiment used the training data as input to a Support Vector Machine model with a linear kernel; the accuracy was 1.0. The sixth experiment used a Support Vector Machine with a radial basis function kernel; the accuracy fell from 1.0 with the linear kernel to 0.91. The dataset comprised all constituencies of Pakistan for all three General Elections. Therefore, the proposed model is generalizable to the whole country. The proposed model conveys the evolution of voter intentions, as the training data covered the whole of Pakistan. The experiments in this work validate the proof of concept. This methodology can be extended to all elections to create a complete dataset and election model.
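The winner-or-loser classification with scaled inputs described in the abstract can be illustrated with a minimal sketch. It is not the thesis's code: it uses plain Python rather than any particular library, and the two features (`prev_share`, `incumbent`), the synthetic labelling rule, the learning rate, and the iteration count are all hypothetical choices made only so that the example runs on its own.

```python
import math
import random

def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(rows, means, stds):
    """Standardize rows using statistics learned on the training set."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, iterations=2000, lr=0.5):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Label a candidate 1 (winner) when the predicted probability is >= 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical synthetic candidates: NOT the thesis dataset.
random.seed(0)
X, y = [], []
for _ in range(200):
    prev_share = random.random()        # assumed feature: earlier vote share
    incumbent = random.randint(0, 1)    # assumed feature: incumbency flag
    X.append([prev_share, float(incumbent)])
    y.append(1 if prev_share + 0.2 * incumbent > 0.6 else 0)  # made-up rule

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Fit the scaler on training rows only, then reuse it for held-out rows.
means, stds = fit_scaler(X_train)
w, b = fit_logistic(scale(X_train, means, stds), y_train)
test_accuracy = accuracy(y_test, predict(scale(X_test, means, stds), w, b))
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Fitting the scaler on the training rows and reusing the same statistics for held-out rows is the behaviour a scaling-plus-classifier pipeline typically provides; whether that accounts for the difference between the third and fourth experiments is not stated in the abstract.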

URI

https://escholar.umt.edu.pk/handle/123456789/7711

Collections

2022
