MS DEPARTMENT OF INFORMATION SYSTEM
Browsing MS DEPARTMENT OF INFORMATION SYSTEM by Title
Now showing 1 - 20 of 61
Item A Comparative Analysis of Machine Learning Algorithms for Price Prediction of Educational Supplies (UMT, Lahore, 2025) Muhammad Usman Ameer
This study conducts a comprehensive comparison of Linear Regression, Decision Tree, and Random Forest algorithms to predict educational supply prices, evaluated using key performance metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R² score. Among these, the Random Forest algorithm demonstrated a slight advantage in predictive accuracy. However, the results suggest that all models possess significant potential for enhancement. Future research should delve into the integration of additional features, such as market trends and economic indicators, and explore the adoption of more sophisticated algorithms, including hybrid models, to further refine predictive performance. These findings offer critical insights and guidance for improving financial planning processes within educational institutions.

Item A comparison of Deep and Classical approaches in the outcome prediction of Business Process Monitoring (UMT, Lahore, 2020) Muhammad Usman Khan
Predictive process monitoring aims to forecast the behavior, performance, and outcomes of business processes at runtime. It identifies problems before they occur so that resources can be reallocated before they are wasted. Although deep learning (DL) has yielded breakthroughs, most existing approaches build on classical machine learning (ML) techniques, particularly in outcome-oriented predictive process monitoring. This situation reflects a lack of understanding about which event-log properties favor the use of DL methods. To address this gap, the authors compared the performance of DL techniques (i.e., simple feedforward deep neural networks and long short-term memory networks) and ML techniques (i.e., random forests and support vector machines) on five publicly available event logs.
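The regression metrics named in the price-prediction abstract above (MAE, MSE, RMSE, R²) can be sketched in a few lines of plain Python; the price figures below are hypothetical, not taken from the study:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical supply prices: actual vs model predictions
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 195.0, 260.0]
m = regression_metrics(actual, predicted)
```

Lower MAE/MSE/RMSE and an R² closer to 1 indicate better fit, which is how the three algorithms in the study would be ranked.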
The results show that DL generally outperforms classical ML techniques. Moreover, three specific recommendations could be inferred from further observations: first, the advantage of DL techniques is especially strong for logs with a high variant-to-case ratio (i.e., many non-standard cases).

Item A METHODOLOGY FOR GLAUCOMA DISEASE DETECTION USING DEEP LEARNING TECHNIQUES (UMT, Lahore, 2020) FATIMA GHANI
Glaucoma is a leading cause of irreversible impairment of vision. In the literature, we reviewed many machine learning methods applied to fundus images by different researchers. Existing machine learning solutions include C4.5, the Naïve Bayes classifier, and Random Forest, but many of these methods cannot diagnose glaucoma reliably. We developed a deep learning (DL) architecture based on a convolutional neural network (CNN) for the classification of glaucoma. We used several deep neural networks, such as the Inception-V3 and VGG16 models, for glaucoma classification and identification. We obtained 508 fundus images belonging to 25 classes from the Joint Shantou International Eye Centre (JSIEC), Shantou City, Guangdong Province, China. After downloading the images, we applied augmentation to the dataset, yielding 1,563 images for training and testing. The downloaded dataset is unlabelled, and our deep learning research required a labelled image dataset, so after augmentation we labelled each image with the class name of its disease. Both Inception-V3 and VGG16 are supervised learning models for classification: such architectures learn from previous knowledge, make judgments based on it, and correct themselves when errors arise.
Considering the results collected, the pre-trained Inception-V3 model achieved the best classification performance of the two models suggested (90.01% accuracy for Inception-V3 versus 83.46% for VGG16).

Item A Methodology for Power Forecasting in Pakistan Using Different Machine Learning Techniques (UMT, Lahore, 2020) Zoya Zahid
Over the last decade, the energy sector has experienced a major modernization cycle, and its network is undergoing accelerated upgrades. Production, demand, and markets are far less stable than ever before, and the established business model is profoundly challenged. Many decision-making processes in this competitive and complex setting depend on probabilistic predictions to quantify uncertain futures. In recent years, interest in probabilistic energy forecasting has grown rapidly, even though many articles in the energy forecasting literature focus on point or single-value forecasting. In Pakistan, the bulk of early studies rely on various kinds of econometric modeling. However, time-series modeling appears to deliver stronger results, since projected economic and demographic parameters usually deviate from actual outcomes. We used machine learning and time-series methods, namely ARIMA and Long Short-Term Memory (LSTM), to forecast Pakistan's future primary energy demand from 2019 to 2030. We accessed the electricity-sector dataset for forecasting purposes from the Hydrocarbon Development Institute of Pakistan (HDIP). The HDIP dataset covers 1999 to 2019 with attributes such as Electricity Installed Capacity (Hydel, Thermal (WAPDA), Thermal (K-Electric), Thermal (IPPs), Nuclear), Energy Consumption by Sector (Domestic, Commercial), Resource Production (Oil, Gas, Coal, Electricity), and Resource Consumption (Oil, Gas, Coal, Electricity).
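As a rough illustration of the autoregressive idea behind the ARIMA forecasts described above, the sketch below fits a first-order recurrence x_t = a + b·x_{t-1} by least squares and iterates it forward. This is a toy stand-in, not the thesis's model, and the demand figures are hypothetical, not HDIP data:

```python
def fit_ar1(series):
    """Least-squares fit of x_t = a + b * x_{t-1} (toy AR(1) model)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def forecast(series, steps, a, b):
    """Iterate the fitted recurrence to project future values."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out

# Hypothetical annual demand figures growing 20% per year
demand = [10.0, 12.0, 14.4, 17.28]
a, b = fit_ar1(demand)
future = forecast(demand, 2, a, b)
```

A real ARIMA model adds differencing and moving-average terms on top of this autoregressive core.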
We forecast the energy demand for each attribute up to 2030 using ARIMA and LSTM. Predicting overall primary energy demand with machine learning appears to be more accurate than summing up the individual forecasts. Tests show that specific energy sources exceed annual growth levels.

Item AI AND DEEPFAKE SYNTHETIC MEDIA (UMT, Lahore, 2021) Wasim Abbas
The last two to three years have seen rapid growth in DeepFake synthetic videos, and their detection is a major challenge for the research community. The aim of this research is to classify videos as real or fake by robustly identifying face images in video frames. A deep convolutional neural network (CNN), the Multi-task Cascaded Convolutional Network (MTCNN), is used to extract and train on the face region of images taken from video frames; each video contributes 300 face-image frames. A pre-trained structural-similarity model is used for classification. The trained model achieved 80% accuracy using 400 videos. A larger dataset is needed to overcome model overfitting and increase accuracy. Once sufficient classification accuracy is reached, smart selection methods can be implemented to handle DeepFake videos efficiently.

Item Air Pollution Mitigation in Islamabad: A Data-Driven Approach Using Air Quality Index (AQI) And Climate Trends (UMT, Lahore, 2025) MUQADDAS SATTAR
This study examines air pollution in Islamabad through a data-analytics framework, specifically the air quality index (AQI) from 2020-2024 together with climate variables such as temperature and humidity. The primary pollutants were identified as PM2.5 and NO₂, and three machine learning models (random forest, ARIMA, and LSTM) were developed to forecast air quality. Exploratory data analysis identified seasonal increases in pollution linked to climate variability, with higher winter pollution levels due to temperature inversions.
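A seasonal breakdown like the one in the AQI abstract above amounts to grouping readings by month and averaging; a minimal sketch, with hypothetical readings rather than the study's data:

```python
from collections import defaultdict

def monthly_mean(readings):
    """Average AQI per month from (month, aqi) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, aqi in readings:
        sums[month] += aqi
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}

# Hypothetical AQI samples: high winter values, lower summer values
readings = [("Jan", 180), ("Jan", 200), ("Jun", 90), ("Jun", 110)]
means = monthly_mean(readings)
```

A winter mean well above the summer mean is the kind of seasonal signal the abstract attributes to temperature inversions.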
LSTM was the best model for long-term AQI prediction, although the random forest model benefited from a few additional predictor variables. This research found that combining air quality data with meteorological data improved forecasting and could inform better policies. The study also produced recommendations for real-time monitoring, sustainable transport, and greener urban planning to mitigate deteriorating air quality. The intent of this research was to provide a pragmatic model through which environmental scientists can align predictive modelling with practical strategies to address urban air pollution challenges, in Islamabad and beyond.

Item An E-Commerce Based Loan Prediction Through User Profiling (UMT, Lahore, 2022) Ammara Ihsan
Online shopping has become popular and convenient in recent years, and online business is developing rapidly in the retail industry. When users must pay before trying a product, regardless of whether it meets their expectations, customer churn can result. Therefore, some eCommerce stores offer a "try before you buy and pay over time" facility to mitigate the churn rate. In such cases, authenticating the customer's credibility is crucial, especially when we have no personal information about the customer: no credit history, zip code, salary, or bank details, and no label indicating whether the customer can pay back. Hence, the retail industry is looking for ways to automate this process and predict customer credibility more efficiently. Credit scoring has proven to be an effective technique for eCommerce companies to identify prospective churn customers and default debtors. The purpose of this research is to combine unsupervised and supervised techniques with analytics to obtain the most accurate possible results.
In this thesis, we introduce two risk-scoring ensemble prediction models that combine different algorithms to analyze various hypotheses and form a new hypothesis for credit assessment. First, the model predicts customers' retention scores using the TabNet classification model, then uses these probability scores to predict customers' credit scores through user profiling. Customers with a low predicted probability are likely to be dissatisfied and to have low credibility. To predict user credibility, we use an unsupervised GraphSAGE-DBSCAN embedding model, map the embeddings into a graph network, and find demographic similarities between users to segment them. Six popular evaluation metrics (accuracy, area under the curve (ROC-AUC), F1 score, precision, recall, and the KS statistic) are employed to evaluate the churn prediction model, which achieves 96% accuracy on the test set. The Silhouette Coefficient, Calinski-Harabasz Index, and Davies-Bouldin Index are used to evaluate the proposed unsupervised clustering approach, and the results can be reviewed by human analysts. This research examines consumer purchasing, churning, and credibility patterns using graph-based embedding techniques, and then analyses the factors that contributed to the decline in consumer validation in the retail industry by comparing them across different eCommerce datasets.

Item An Implementable System for Detection and Identification of License Plates in Pakistan (UMT, Lahore, 2020) Muhammad Bilal Nayyar
Automatic Number Plate Recognition (ANPR) is a large-scale monitoring system that photographs vehicles and recognizes their license numbers, enabling stolen vehicles to be detected and traced effectively. This research presents an ANPR system for use on highways.
For different vehicles, a rear-view image of the vehicle is captured and processed algorithmically. The license plate area is located using a new detection function that combines multiple algorithms, and the plate image captured by the cameras is processed to extract the license plate information. The system is implemented not only to reduce manual effort but also to assist human labor, given the power and potential of automatic license plate recognition. The identification system will make vehicle monitoring more efficient; number plate identification systems are already used commercially, both abroad and locally. The system is implemented using Python image processing tools, applying optical character recognition to read vehicle license plates. The data, presented in image form, was collected from the Safe City programme and gathered locally by the author. A corresponding model is developed for the identification and recognition of license plates, attaining a recognition accuracy of at least 95 percent. Significant computing power is required for license plate recognition to achieve a satisfactory level of recognition with a neural network. This research is a step toward Pakistan's smart city plan: in today's world, where basic electronics find their place in areas such as home automation, automotive automation, and automatic water storage systems, it will take us a little further toward that plan.

Item Analytical Modeling for Predicting Winning team combinations for Pakistan Super League (PSL) (UMT, Lahore, 2020) Mehak Fatima
T20 cricket is the most popular and exciting form of the game. Since its creation, the PSL has been very successful and has created a billion-dollar industry. This is of interest to researchers in various disciplines such as data science, economics, and finance.
Various statistical techniques have been used in sports, affecting not only the audience but also the athletes. Using various data mining techniques, predictive models have been created for selecting players. However, no substantially accurate model has been published to date. Furthermore, Pakistan's T20 league (PSL) has not yet been targeted on the basis of individual player profiles and winning team combinations. Considering this gap, the present study was conducted. Research was performed to develop a model that can help franchise owners bid for talented players and build a winning team with minimum spending. The framework comprised three main aspects: data collection, data processing and player statistics calculation, and probability calculations. The data was collected from ESPNcricinfo and analyzed using various statistical analyses, on the basis of which the probabilistic model was developed. The model achieved 90% accuracy when validated against the actual winner and runner-up teams of PSL 2019. On the basis of these results, it is concluded that the proposed model can be a beneficial tool for PSL squad selection and bidding. It supports the process of creating teams and selecting participants in the PSL. Since this study specifically targets the PSL, which has not previously been selected as a target, it is beneficial for team managers seeking to select and create winning team combinations. The results of this study will bring benefits to the cricket, T20, and PSL domains, and will open up new directions for cricket prediction research.

Item AUGMENTING PREDICTIVE ACCURACY THROUGH HYBRID INTELLIGENCE: A COMPARATIVE ANALYSIS OF ENSEMBLE LEARNING TECHNIQUES (UMT, Lahore, 2025) MUHAMMAD ZAIN ASHRAF
Hybrid intelligence, a revolutionary strategy, improves prediction accuracy in a variety of fields by fusing artificial intelligence with human expertise.
This synthesis draws on the computing efficiency, scalability, and pattern recognition skills of AI systems as well as the intuitive reasoning, contextual awareness, and ethical grounding of human cognition. When used with ensemble learning frameworks, hybrid intelligence is particularly powerful, since it allows strong, multi-layered predictions to be built from a variety of data streams across many fields. To support and enhance decision-making processes, this work investigates the potential of hybrid intelligence in combining cross-domain data from social media, healthcare, finance, and environmental systems. One of the primary use cases is the stock market, where high volatility and the impact of numerous international factors have historically made accurate prediction difficult. Hybrid ensemble models, which combine machine learning approaches, provide layered inference, with human domain experts evaluating, modifying, and contextualizing the results. This partnership makes models more resilient in dynamic markets, especially during regime shifts or other extraordinary disturbances. Hybrid intelligence represents a paradigm shift in how businesses and analysts understand intricate, multifaceted data sets.

Item BREAST CANCER PROGNOSIS USING DATA ANALYTICS (UMT, Lahore, 2021) SHAHZAD ALI
According to the World Health Organization, cancer is the second leading cause of death among women worldwide. Given the capabilities of data science and machine learning models, it is considered best practice to build a model that can help with early detection of breast cancer. Clinical research shows that cancer is a vast field of study and can be best controlled if diagnosed and treated at an early stage. In this research, we thoroughly studied cancerous tumors from a machine learning perspective, using a fine-needle biopsy dataset.
After obtaining the publicly available dataset, we performed feature selection and data cleaning. Feature selection was carried out to select those features of the dataset that may indicate malignancy; we then cleaned the data and removed unwanted features. Keeping in mind the limitations of different machine learning models, we applied a linear regression model and used Principal Component Analysis to obtain the best accuracy and computational efficiency.

Item Changing of objects into words using image captioning (UMT, Lahore, 2020) Muhammad Umair Tariq Chohan
Image captioning models usually follow an encoder-decoder design, in which images and feature vectors are fed to the encoder. Some algorithms use feature vectors extracted from region proposals produced by an object detector. This study uses the Object Relation Transformer, extending this approach by explicitly incorporating information about the spatial relationships between detected input objects through geometric attention. Qualitative and quantitative results show the significance of such geometric attention for image captioning, leading to improvements on all standard captioning metrics on the MS-COCO dataset.

Item COMPARATIVE ANALYSIS OF LINK PREDICATION TECHNIQUES (UMT, Lahore, 2018) Haseeb Ahmad
In data mining, prediction is most attractive and beneficial for making the right decisions. Recently, link prediction has proved its importance in many research areas, especially social network analysis, bioinformatics, complex interconnected networks, and chemical interaction networks. By finding missing links, many complex patterns in big data have been uncovered, adding value to the old data held in our archives, and many questions about complex patterns in big data are being answered.
Different kinds of algorithms have been proposed to find links in graph-based data. They fall into three main categories, maximum-likelihood-based, probability-based, and similarity-based algorithms, each best in its own context, and many researchers have studied link mining or link prediction within each category. In this research, I survey the proposed algorithms belonging to these categories, perform a comparative analysis of the survey results, and close with results, discussion, and suggestions for further directions.

Item CUSTOMER CHURN PREDICTION OF PAKISTAN’S TELECOM INDUSTRY (UMT, Lahore, 2025) Nabeel Ahmad
Identifying customer churn is a critical challenge in a highly competitive industry like telecom, where companies struggle to retain customers amid market saturation, competitive pricing, and service dissatisfaction. Predicting and preventing churn is essential for telecom providers to maintain revenue, optimize operational costs, and enhance customer satisfaction, retaining customers by turning potential churners into non-churners. This study aims to develop an accurate prediction model tailored to Pakistan's telecom sector by leveraging machine learning techniques. The research employs an explanatory approach: a dataset of 9,760 customer records was taken from Kaggle. Key features for prediction include monthly usage, relationship duration, service type, and customer complaints. Machine learning models including Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, and Artificial Neural Networks (ANN) were used, and their performance was evaluated using accuracy, precision, recall, F1-score, and AUC-ROC.
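The evaluation metrics listed above (accuracy, precision, recall, F1-score) all derive from confusion-matrix counts; a minimal sketch with hypothetical counts, not the study's results:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of predicted churners, how many churned
    recall = tp / (tp + fn)      # of actual churners, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical churn confusion-matrix counts
acc, prec, rec, f1 = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```

In churn prediction, recall is often weighted heavily, since missing a churner (a false negative) is usually costlier than a false alarm.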
Results from the applied models indicate that the deep learning model, ANN, outperformed the others, achieving 97.4% accuracy, 100% recall, and 88.3% precision, making it the most reliable model for churn prediction. Random Forest also showed strong performance, with 92.6% accuracy and 97.2% recall, balancing interpretability with predictive power. This study highlights the importance of customer churn and of addressing class imbalance using SMOTE, which improves model performance by ensuring the minority class is represented. The results show that customers who use the service frequently but have a short relationship duration are more inclined to leave. This research offers practical insights for telecom organizations, with recommendations for enhanced service quality, personalized retention methods, and predictive analytics-based marketing campaigns.

Item Decoding Deception: Harnessing Machine Learning for Robust Phishing Detection (UMT, Lahore, 2024) MUHAMMAD ASAD AHMAD KHAN
In the digital age, email remains a critical communication tool for individuals and businesses alike. However, the prevalence of email spam has escalated, posing significant threats through phishing attacks, malware dissemination, and fraudulent schemes. Effective spam detection is paramount in safeguarding sensitive information and maintaining the integrity of communication systems. This study presents a comprehensive evaluation of various machine learning algorithms for email spam classification using a public dataset. The primary objective was to compare the performance of different classifiers, specifically Naive Bayes, SVM, Logistic Regression, k-Nearest Neighbors (k-NN), and Random Forest, employing two distinct text vectorization techniques: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Preprocessing steps, including tokenization, stop word removal, and stemming, were applied to enhance the text data quality.
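The TF-IDF vectorization mentioned above weights each term by its frequency in a document and its rarity across documents; a minimal sketch on hypothetical toy documents, not the study's dataset:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Minimal TF-IDF: term frequency times inverse document frequency."""
    n = len(docs)
    df = Counter()                 # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

# Hypothetical pre-tokenized documents (one spam-like, two ham-like)
docs = [["win", "free", "prize"], ["meeting", "at", "noon"], ["free", "meeting"]]
vecs = tf_idf(docs)
```

Terms appearing in every document score zero, while distinctive terms like "win" score highest, which is what makes TF-IDF useful for spam classifiers.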
Extensive experiments were conducted on the full dataset and on subsets of 40% and 70% to validate the robustness of the models across varying data volumes. The findings reveal that the SVM classifier with TF-IDF vectorization achieved the highest performance, with an accuracy of 98.57%, precision of 98.64%, recall of 99.09%, and an F1 score of 0.99. These results underscore the efficacy of SVM in capturing the nuanced patterns inherent in spam email detection. Furthermore, this research emphasizes the critical role of text vectorization techniques in augmenting classifier performance. The novelty of my work primarily lies in the dataset itself: as of the time of writing this thesis, and to the best of my knowledge, this dataset has not yet been explored or utilized in prior research. My analysis provides valuable insights into the optimal algorithmic approaches for spam email detection, contributing to the broader field of text classification and enhancing the accuracy of automated email filtering systems. This work is vital in the ongoing battle against email-based threats, ensuring a more secure and reliable digital communication environment.

Item Education Content Quality Management: A Multimodal Approach (UMT, Lahore, 2025) Mishaal Maraal
In the digital age, educational content plays a crucial role in learning outcomes, yet ensuring its quality and effectiveness remains a challenge. This research presents a multimodal approach to measuring educational content quality using machine learning techniques. The study focuses on analyzing various content formats, including blogs, videos, and documents, to assess their readability, lexical complexity, and engagement levels. A dataset of 5,000 URLs was collected and processed using natural language processing (NLP) techniques to extract key linguistic features.
Machine learning models, including K-Means clustering, Support Vector Machines (SVM), multiple regression, and neural networks, were applied to identify patterns in content quality. The results highlight that content readability and lexical density significantly influence learner engagement. Neural networks and SVM models outperformed traditional regression methods, achieving high predictive accuracy for readability ease and lexical diversity. Findings also indicate a gap in content suitability, especially in developing regions, where students struggle with complex materials due to limited technological infrastructure. The study provides actionable insights for educational platforms, curriculum designers, and policymakers to optimize content delivery based on student learning patterns. Future research can expand the dataset, incorporate real-time OCR analysis, and integrate student performance metrics to enhance content recommendations. This research serves as a foundation for improving personalized learning experiences, ensuring that educational content is not only accessible but also effective in meeting diverse student needs.

Item EMOTIONS DETECTION FROM TEXTUAL DATA USING MACHINE LEARNING TECHNIQUES (UMT, Lahore, 2018) Muhammad Yar
Emotion detection from textual data is a comparatively new classification task. People can be identified through their expressions and emotions, which can be expressed through various modalities including face, voice, body language, physiology, brain imaging, and text. The objective of this thesis is to identify emotions from text using machine learning techniques. The text can be a sentence, a paragraph, a book, a news article, or a written speech, and emotions can be detected from any such text. According to science, humans have twenty-seven different types of emotions.
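Lexical diversity, one of the content-quality signals in the education-content abstract above, is often approximated by the type-token ratio; a minimal sketch on a hypothetical token list:

```python
def type_token_ratio(tokens):
    """Lexical diversity: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)

# Hypothetical tokenized sentence; "the" repeats, so 5 distinct of 6 total
tokens = "the cat sat on the mat".split()
ttr = type_token_ratio(tokens)
```

Higher ratios indicate richer vocabulary; in practice the measure is computed over fixed-length windows, since it shrinks as texts grow longer.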
In this research, we built our own emotion vocabularies, because the available vocabularies cover only two to four emotion types, which is sufficient neither for valuable results nor for further research. We collected the text of ten speeches by different personalities and applied and compared five classifiers on these speeches, one by one: the Naïve Bayes Classifier (NBC), Support Vector Machine Classifier (SVMC), Linear Support Vector Machine Classifier (LSVMC), Logistic Regression Classifier (LRC), and Stochastic Gradient Descent Classifier (SGDC). We found that the Linear Support Vector Machine Classifier (LSVMC) achieved the highest accuracy, 59.70%, followed by the Logistic Regression Classifier (LRC) at 59.49%.

Item Employing Deep Learning to Recognize Real from Fake Urdu Signatures (UMT, Lahore, 2020) SAMAN RIZWAN
Urdu is one of Pakistan's official languages. It is spoken and understood by about 100 million people around the world, in Pakistan and in many other countries where Pakistani communities have settled. The study of methods for identifying text written in Urdu script is an active research area, and an interesting approach is to identify signatures written in the Urdu language. Deep learning provides many methods that can address numerous computer vision problems, including image classification and object detection. The state-of-the-art deep learning method that gives good results in computer vision is the convolutional neural network, introduced in 1995. A deep convolutional neural network consists of multiple convolutional and pooling layers, which learn image features automatically, resulting in better accuracy. The research proposed here employs deep learning methods to identify Urdu signature samples as real or forged.
As there has not been much work on Urdu script, no data was available online for Urdu signatures. The dataset was created by collecting signature samples from high school students using an offline method. The model used is a convolutional neural network (CNN) that is trained and then evaluated on the Urdu signature images.

Item Enhancing Sales Forecasting Accuracy Using Machine Learning Techniques (UMT, Lahore, 2024) Muhammad Haseeb Imran
Precise sales predictions are critical for successfully managing supply chains, controlling inventory, and making informed strategic decisions. Conventional prediction algorithms sometimes fail to capture complex sales trends, resulting in less accurate forecasts. This thesis examines how machine learning techniques (linear regression, random forest, and k-means clustering) can be used to increase the precision of sales forecasts. The study begins with a thorough assessment of current sales forecasting systems, identifying their strengths and flaws. The chosen machine learning algorithms are then developed and applied to historical sales records from a retail organization, demonstrating the importance of data preprocessing procedures such as cleaning, normalization, and feature engineering in improving model performance. Hyperparameter optimization and cross-validation are used to improve the models and reduce the risk of overfitting. The findings suggest that machine learning techniques outperform traditional sales forecasting methodologies: the random forest technique had the highest accuracy among the models considered, followed by linear regression and k-means clustering. This thesis adds value by giving a comparative examination of machine learning models that leverage multiple independent variables for sales forecasting, as well as practical recommendations for firms wanting to use these methods.
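In the single-variable case, the linear-regression component of the sales models above reduces to an ordinary least-squares line over time; a minimal sketch with hypothetical monthly sales, not the retail organization's records:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

months = [1, 2, 3, 4]             # hypothetical month index
sales = [10.0, 12.0, 14.0, 16.0]  # hypothetical units sold
a, b = fit_line(months, sales)
next_month = a + b * 5            # extrapolate one step ahead
```

The slope b is the per-month trend; the multivariate version used in the thesis fits one coefficient per independent variable instead.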
Future work will examine integrating sophisticated deep learning models and including other data sources to further boost the accuracy of projections.

Item Evaluating collaborative tendencies of research scholars working in Universities of Pakistan (UMT, Lahore, 2020) Umer Saeed
The quality of scientific research publications has increased as a result of increased scientific collaboration. The effect of studying collaboration through a theoretical lens is another area ripe for investigation. Accordingly, this work uses network theory (degree, degree centrality, closeness centrality, and betweenness centrality) to examine the existence of in-house co-authorship networks in selected universities and their effect on the number of publications. It also addresses how such information can inform universities' recruitment and retention policies. A large number of Scopus author profiles were crawled and analyzed in this data-driven study to compile publication lists. The number of authors found to form in-house co-authorship networks in Pakistani Higher Education Institutions (HEIs) is 46,513 (2015-2019). Results imply that only a few universities have strong in-house co-author networks.