Sentiment analysis on code-mixed transgender tweets and their comparison using ML, DL and BERT

BABAR ALI KHAN
2025-10-02
2025-10-02
2023
https://escholar.umt.edu.pk/handle/123456789/7710

Sentiment analysis has become a crucial area of study for understanding public opinion expressed on social networking platforms. The primary objective of this research is to conduct sentiment analysis on code-mixed multilingual tweets pertaining to the transgender act of 2018 in Pakistan, with the aim of obtaining a deeper understanding of prevailing public sentiment towards this notable legislative advancement. A comparative examination is also undertaken to assess the efficacy of different machine learning and deep learning models, using TF-IDF vectorization, GloVe embeddings, and multilingual BERT embeddings as input features. The dataset consists of a mixture of English, Urdu, and Roman Urdu tweets, which poses distinct linguistic difficulties because of the code-mixing present in the data. To tackle these challenges, several machine learning models were used: logistic regression (LR), naive Bayes, and support vector machine (SVM). The findings indicate that TF-IDF vectorization is competitive, achieving strong accuracy, precision, recall, and F1 scores. Nevertheless, multilingual BERT embeddings greatly improved the performance of the logistic regression and SVM models, underscoring the importance of sophisticated language representation models for sentiment analysis in code-mixed multilingual scenarios. Several deep learning models were also used: recurrent neural networks (RNN), long short-term memory (LSTM), and bidirectional LSTM (Bi-LSTM). These models were paired with GloVe embeddings and with multilingual BERT embeddings.
The deep learning models that integrated multilingual BERT embeddings outperformed those that employed GloVe embeddings, achieving high precision, recall, accuracy, and F1 scores. These findings highlight the efficacy of multilingual BERT embeddings in capturing subtle variations in sentiment within code-mixed multilingual tweets, outperforming conventional machine learning models. The comparative analysis emphasizes the benefits of deep learning models, specifically those that use multilingual BERT embeddings, in capturing sentiment accurately. Their improved performance can be attributed to their capacity to capture contextualized information and semantic associations among words, which is particularly important in code-mixed multilingual scenarios. The results contribute to sentiment analysis in code-mixed multilingual settings by offering insights into the public sentiment surrounding the transgender act of 2018 in Pakistan. They also underline the importance of advanced language representation models for sentiment analysis tasks. The implications of these findings are relevant for scholars, policymakers, and social activists involved in the field of transgender rights, as they offer a thorough understanding of public attitudes towards the legislation.

en
Thesis
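
The abstract names TF-IDF vectorization as one of the input feature types but does not describe how it was computed. As a rough illustration only, the feature-extraction step might look like the sketch below. Everything in it is an assumption rather than the thesis's method: the whitespace tokenizer, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1, the L2 normalization, the function names, and the three toy sentences, which are invented placeholders and not drawn from the thesis dataset.

```python
import math
from collections import Counter

# Invented placeholder sentences (NOT from the thesis dataset); they only
# mimic the English / Roman Urdu mixing described in the abstract.
TOY_TWEETS = [
    "yeh law bohat acha hai",
    "this law is not good",
    "acha decision hai great step",
]


def tokenize(text):
    """Lowercase whitespace tokenization; deliberately naive, an assumption."""
    return text.lower().split()


def fit_idf(docs):
    """Return a smoothed inverse document frequency for every token seen."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each token once per document.
        doc_freq.update(set(tokenize(doc)))
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def tfidf_vector(doc, idf):
    """Map a document to an L2-normalized sparse TF-IDF vector (a dict)."""
    # Tokens never seen while fitting are ignored.
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    weights = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {tok: w / norm for tok, w in weights.items()}


idf = fit_idf(TOY_TWEETS)
vectors = [tfidf_vector(doc, idf) for doc in TOY_TWEETS]
```

Vectors of this kind would then be passed to a classifier such as logistic regression, naive Bayes, or SVM. A token that occurs in several documents (here "law") receives a lower IDF weight than one that occurs in a single document (here "yeh"), which is the property that lets rarer, more distinctive words dominate the representation.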