2025

Permanent URI for this collection

https://escholar.umt.edu.pk/handle/123456789/4465

Reduction of imbalanced data to improve the accuracy of deep learning algorithms for federated learning techniques
(UMT, Lahore, 2025) Momina Shaheen
Federated learning is a leading machine learning paradigm that facilitates collaborative model training across decentralized nodes while ensuring data privacy and security. In edge computing environments, addressing imbalanced training data is a critical challenge because the data are non-independent and identically distributed (non-IID) and vary in size. This research explores the impact of global data imbalance on federated learning (FL) model accuracy, revealing complexities in mitigating its negative effects. Through empirical analysis and theoretical investigation, new insights into the mechanisms that degrade FL accuracy are uncovered, leading to the proposal of a novel method tailored for FL networks. The proposed framework employs two strategies: global distribution data augmentation and synthesis for rebalancing training data, and client rescheduling by mediators for partial equilibrium among edge devices. Experiments on various distributed datasets reveal significant improvements in learning accuracy. The study's main contribution is its analysis of the negative impact of imbalanced training data on FL model accuracy and the development of effective strategies to mitigate this issue. By integrating AI techniques such as data augmentation and class estimation into the FL framework, the approach enhances accuracy with minimal computational overhead and improves the robustness of FL systems in edge computing. Experimental validation on two datasets, Fashion-MNIST and a stock dataset, shows that the method achieves nearly 92% accuracy on both, highlighting its effectiveness in FL for edge computing environments. Across these distinct dataset types, covering image classification and financial predictive analytics, the method shows significant enhancements in FL model accuracy, underscoring its potential to revolutionize FL methodologies and foster resilient machine learning (ML) systems in edge computing.
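As a rough illustration of the rebalancing idea described in the abstract above, the sketch below sums per-client label counts into a global class distribution and computes how many extra samples each class would need to match the largest class. The function names and the sample counts are hypothetical and are not taken from the thesis; the actual augmentation, synthesis, and mediator-based rescheduling steps are not shown.

```python
from collections import Counter


def global_class_counts(client_label_counts):
    """Sum per-client label counts into one global class distribution."""
    total = Counter()
    for counts in client_label_counts:
        total.update(counts)  # adds the counts of each label
    return dict(total)


def augmentation_targets(global_counts):
    """Per class, the number of extra samples needed to reach the majority-class count."""
    majority = max(global_counts.values())
    return {label: majority - n for label, n in global_counts.items()}


# Hypothetical label counts reported by three edge clients.
clients = [
    {"shirt": 50, "sneaker": 5},
    {"shirt": 40, "bag": 10},
    {"sneaker": 15, "bag": 20},
]

counts = global_class_counts(clients)
targets = augmentation_targets(counts)
print(counts)
print(targets)
```

Matching every class to the majority count is only one possible target; a real system might cap the amount of synthetic data per class.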
An intelligent diagnostic model to predict disease associated biomarkers in genomic sequences
(UMT, Lahore, 2025) Ayesha Karim
Objective: Cell mutation refers to changes in the genetic material (DNA or RNA) of a cell that can disrupt normal protein synthesis and cell function. While some mutations have minimal effect, others can lead to the production of abnormal or dysregulated proteins, causing disruptions such as genetic disorders. The objective of this study is to develop a computational model that uses genomic data to predict, at an early stage, the driver genes that cause such disruptions in the body, aiming to enhance early diagnosis and intervention. Methods: This study utilized a benchmark genomic dataset, which was processed using feature extraction techniques to identify relevant genetic patterns. Several ensemble classification methods, including XGBoost, Random Forest, LightGBM, ExtraTrees, Bagging, and a stacked ensemble of classifiers, were applied to assess the predictive power of the genomic features. The model, eNSMBL-PRED, was validated using multiple performance metrics: accuracy, sensitivity, specificity, and the Matthews correlation coefficient. Results: The proposed model demonstrated superior performance across various validation techniques. The self-consistency test achieved 100% accuracy, while the independent set test and the cross-validation test each yielded 96% accuracy. These results highlight the model's robustness and reliability in predicting genetic disorder-related genes. Conclusion: The eNSMBL-PRED model provides a promising tool for the early detection of genetic biomarkers associated with the disorder. In the future, this model has the potential to assist healthcare professionals, particularly doctors and psychologists, in diagnosing genetic disorders and formulating treatment plans at the earliest stages.
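The four metrics named in the abstract above have standard definitions in terms of a binary confusion matrix. The sketch below computes them from true/false positive and negative counts; the counts in the usage example are hypothetical and are not results from the thesis, and the function name is an assumption made for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and Matthews correlation coefficient.

    Assumes at least one actual positive and one actual negative sample.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical confusion-matrix counts for a balanced test set of 100 samples.
m = binary_metrics(tp=48, fp=2, tn=48, fn=2)
print(m)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for classifiers evaluated on uneven classes.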
A framework for the improvement of distributed agile software development based on blockchain
(UMT, Lahore, 2025) Junaid Nasir Qureshi
The goal in today’s global software industry is software development using Agile in a distributed environment where teams work across different geographic locations. However, the traditional framework responsible for coordination, communication, and collaboration in Distributed Agile Software Development (DASD) has not adequately addressed areas such as security, transparency, traceability, and strong teamwork among individuals in distributed geographic locations. Typically, these deficiencies result in serious problems, including delays in software development and deployment, project failures, unsuccessful contracts with clients and developers, and clients’ dissatisfaction with the way the software was developed in a distributed environment. This research study therefore introduces a novel framework that applies blockchain technology to DASD to overcome these challenges. On a private Ethereum blockchain, smart contracts are used to automate and secure different aspects of the DASD process. These smart contracts manage processes such as verifying requirements, organizing tasks by priority, managing sprint backlogs, creating user stories, acceptance testing against user stories, automating transaction security, and disbursing payments to development teams through digital wallets. Moreover, smart contracts automatically impose penalties on customers for late or missed payments and on developers for failing to meet their deadlines. To address the scalability constraints typical of blockchain technology, this research describes the use of the InterPlanetary File System (IPFS) as an off-chain storage solution. This integration of IPFS allows for efficient management of large amounts of data without overloading the blockchain. Furthermore, the experimental results indicate that this method significantly enhances teamwork synchronization, communication, traceability, transparency, security, and confidence among the clients and developers involved in DASD.
Language resource and model for intrinsic plagiarism detection for Urdu language
(UMT, Lahore, 2025) Muhammad Faraz Manzoor
In the evolving field of natural language processing (NLP), plagiarism detection has become an essential task, particularly for low-resource languages like Urdu. This PhD research addresses the critical challenge of intrinsic plagiarism detection in Urdu texts by employing a novel framework that combines machine learning, deep learning, and language models. The study conducts a comprehensive analysis at both the paragraph and sentence levels to advance the detection of intrinsic plagiarism. At the paragraph level, a set of 43 stylometry features across six granularity levels was curated to capture linguistic patterns indicative of plagiarism. The selected models include traditional machine learning techniques such as Logistic Regression, Decision Trees, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naive Bayes, Gradient Boosting, and Voting Classifier, alongside deep learning models like GRU, BiLSTM, CNN, LSTM, and MLP, as well as Large Language Models (LLMs) such as BERT and GPT-2. Two distinct experiments were conducted: the first used the entire dataset to classify documents as intrinsically plagiarized or non-plagiarized, while the second divided the dataset into three topical types: Moral Lessons, National Celebrities, and National Events. The Random Forest Classifier achieved an accuracy of 98.81% in the first experiment, while the Extreme Gradient Boosting Classifier reached an overall accuracy of 99.00% in the second experiment, demonstrating superior capability in distinguishing nuanced stylistic features across different topics. At the sentence level, the study focuses on leveraging various embeddings, including TF-IDF, Word2Vec, FastText, and GloVe, in conjunction with machine learning and ensemble learning classifiers. A balanced dataset of 2520 documents was used to evaluate the efficacy of these models. The experiments showed promising results, with FastText embeddings combined with Support Vector Classifier and Random Forest emerging as top performers, achieving accuracy scores of 0.89. While BiLSTM also demonstrated competitive performance with an accuracy of 0.75, the BERT model underperformed with an accuracy of 0.65, highlighting the challenges of applying LLMs in low-resource languages like Urdu. This research highlights the effectiveness of tailored stylometry features and traditional machine learning models over deep learning and LLMs for intrinsic plagiarism detection in Urdu. The findings underscore the potential for further advancements through the expansion of datasets and the development of more sophisticated language models tailored to the linguistic characteristics of Urdu.
An Intelligent Technical and Vocational Education and Training (TVET) Course Recommendation System based on the Trainee’s Aptitude
(UMT, Lahore, 2025-01-20) Rana Hammad Hassan
Personality encompasses the distinct patterns of thoughts, emotions, and behaviors that differentiate individuals and typically remain stable throughout one's life. Aligning these individual traits with learning aptitudes holds promise for improving course outcomes, maximizing returns on investment, and reducing dropout rates significantly. This interdisciplinary research bridges insights from Computer Science (CS) and Human Psychology by analyzing data from Technical and Vocational Education and Training (TVET) programs, focusing on the Big Five Personality traits (BFI). This study marks a pioneering effort in both the Pakistani and global TVET sectors, linking TVET learning skills with individual personalities. We have addressed important ethical considerations, including data privacy and informed consent, ensuring the responsible use of human subjects in this study.
