Sentiment analysis on code-mixed transgender tweets and their comparison using ML, DL and BERT
Date
2023
Publisher
UMT, Lahore
Abstract
Sentiment analysis has become a crucial area of study for understanding public opinion
and the sentiment expressed on social networking platforms. The primary objective
of this research is to conduct sentiment analysis on code-mixed multilingual tweets pertaining to
the transgender act of 2018 in Pakistan. The aim is to obtain a deeper understanding of the
prevailing public sentiment towards this notable legislative advancement. Furthermore, a
comparative examination is undertaken to assess the efficacy of different machine learning and
deep learning models. This evaluation uses TF-IDF vectorization, GloVe
embeddings, and multilingual BERT embeddings as input features. The dataset consists of a mixture
of English, Urdu, and Roman Urdu tweets, which poses distinct linguistic difficulties because of
the code-mixing present in the data. To address these challenges, several machine learning
models were applied: logistic regression (LR), naive Bayes, and support vector machine
(SVM). The findings indicate that TF-IDF vectorization is
competitive, achieving strong accuracy, precision, recall, and F1 scores. Nevertheless,
multilingual BERT embeddings greatly improved the performance of the logistic
regression and SVM models, underscoring the value of sophisticated language
representation models for sentiment analysis in code-mixed multilingual scenarios.
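
As an illustration of the TF-IDF features mentioned above, the following is a minimal sketch in plain Python. It is not the study's pipeline: the whitespace tokenizer, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 with L2 normalization (the same default weighting scikit-learn's TfidfVectorizer uses), and the three sample tweets are all assumptions made for the example and are not taken from the dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace, which treats
    # English and Roman Urdu tokens in the same way.
    return text.lower().split()


def fit_idf(docs):
    # Document frequency: number of documents that contain each term.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1.
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def transform(doc, idf):
    # Term frequency times IDF, ignoring terms unseen during fitting.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in tf.items()}
    # L2-normalize so every tweet vector has unit length.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical code-mixed examples, invented for illustration only.
tweets = [
    "yeh law bohat acha hai",
    "this law is a good step",
    "yeh faisla acha nahi hai",
]
idf = fit_idf(tweets)
vectors = [transform(t, idf) for t in tweets]
```

Each sparse vector in `vectors` maps a token to its weight; tokens that occur in fewer tweets (such as "bohat") receive a higher weight than tokens shared across tweets (such as "law"). Vectors of this kind could then be passed to a classifier such as logistic regression, naive Bayes, or an SVM.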
Various deep learning models, such as recurrent neural networks (RNN), long short-term memory
(LSTM), and bidirectional LSTM (Bi-LSTM), were utilized in the study. These models
incorporated both GloVe embeddings and multilingual BERT embeddings. The deep learning
models that integrated multilingual BERT embeddings demonstrated superior performance
compared to the models that employed GloVe embeddings, achieving remarkable levels of
precision, recall, accuracy, and F1 scores. The findings of this study highlight the efficacy of
multilingual BERT embeddings in capturing subtle variations in sentiment within code-mixed
multilingual tweets, outperforming conventional machine learning models. The comparative analysis
emphasizes the benefits of deep learning models, specifically those that use
multilingual BERT embeddings, in capturing sentiment accurately. The enhanced
efficacy of these models can be ascribed to their capacity to capture contextualized information
and semantic associations among words, which is particularly important in code-mixed
multilingual scenarios. The results of this research contribute to the domain of
sentiment analysis in code-mixed multilingual settings by offering insights into the public
sentiment surrounding the transgender act of 2018 in Pakistan. The comparative analysis highlights
the advantages of deep learning models that use multilingual BERT embeddings and
the importance of advanced language representation models for tasks
related to sentiment analysis. The implications of these findings are relevant for scholars,
policymakers, and social activists involved in the field of transgender rights, as they offer a
thorough comprehension of public attitudes towards the legislation.
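
The accuracy, precision, recall, and F1 scores used to compare the models above can be computed as in the sketch below. The abstract does not state how the scores were averaged or which sentiment labels were used, so macro-averaging and the three labels ("pos", "neg", "neu") are assumptions for illustration, and the label lists are invented rather than drawn from the study.

```python
def macro_scores(y_true, y_pred):
    # Per-class precision, recall and F1, averaged with equal class weight
    # (macro-averaging is an assumption, not confirmed by the abstract).
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical gold labels and predictions for six tweets.
y_true = ["pos", "neg", "neu", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neu"]
scores = macro_scores(y_true, y_pred)
```

Reporting all four numbers matters when the sentiment classes are imbalanced, because accuracy alone can look high for a model that mostly predicts the majority class.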