2024

Recent Submissions

  • Item
    Enhancing Sales Forecasting Accuracy Using Machine Learning Techniques
    (UMT, Lahore, 2024) Muhammad Haseeb Imran
    Precise sales predictions are critical for managing supply chains, controlling inventory, and making informed strategic decisions. Conventional forecasting methods sometimes fail to capture complex sales trends, resulting in less accurate forecasts. This thesis examines how machine learning techniques (linear regression, random forest, and k-means clustering) can be used to increase the accuracy of sales forecasts. The study begins with a thorough assessment of current sales forecasting systems, identifying their strengths and weaknesses. The chosen machine learning algorithms are then developed and applied to historical sales records from a retail organization, demonstrating the importance of data preprocessing procedures such as cleaning, normalization, and feature engineering in improving model performance. Hyperparameter optimization and cross-validation are used to tune the models and reduce the risk of overfitting. The findings suggest that machine learning techniques outperform traditional sales forecasting methodologies. The random forest technique had the highest accuracy among the models considered, followed by linear regression and k-means clustering. This thesis contributes a comparative examination of machine learning models that use multiple independent variables for sales forecasting, as well as practical recommendations for firms wanting to use these methods. Future research could examine integrating deep learning models and additional data sources to further improve forecast accuracy.
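As a rough illustration of the cross-validation step mentioned in the abstract above, the sketch below splits record indices into k folds using only the Python standard library. The function name `k_fold_indices`, the record count, and the fold count are assumptions made for illustration; the abstract does not state which tooling or fold count the thesis used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first `extra` folds.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Assumed example: 10 records split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(f"train={len(train)} test={len(test)}")
```

Each record appears in exactly one test fold, so every observation is used for validation once. For time-ordered sales data, a forward-chaining split that only tests on later periods is a common alternative to the contiguous folds shown here.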
  • Item
    Retail Sales Forecasting Using Machine Learning: A Comparative Analysis of Random Forest, Linear Regression and XGBoost Approaches
    (UMT, Lahore, 2024) Muhammad Awais
    Sales forecasting is critical in the retail industry since it influences strategic planning and decision-making processes. This thesis investigates how machine learning approaches can improve the accuracy of sales estimates in Pakistan's retail sector. The study employs a comprehensive dataset that includes sales data from 36 stores in three regions (North, South, and Central) from October 2022 to October 2023. Products are classified as summer, winter, or regular. The study examines how effectively machine learning models, such as Extreme Gradient Boosting (XGBoost), Linear Regression, and Random Forest Regression, can forecast sales trends. Data visualization techniques such as box plots, bar charts, and correlation heatmaps are used to understand sales patterns by product category and region. The models' performance is evaluated using measures such as R² and 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅. The data show considerable seasonal and geographical fluctuations in sales, providing useful information for inventory management and marketing strategies. This study shows how advanced machine learning algorithms may improve prediction accuracy, allowing businesses to optimize operations and avoid costs due to overstock or stockouts. The study also identifies limitations and proposes directions for future research, such as incorporating new data sources and investigating more advanced models such as 𝑅𝑅𝑅𝑅𝑅𝑅 and 𝑇𝑇𝑇𝑇𝑇𝑇.
  • Item
    Decoding Deception: Harnessing Machine Learning for Robust Phishing Detection
    (UMT, Lahore, 2024) Muhammad Asad Ahmad Khan
    In the digital age, email remains a critical communication tool for individuals and businesses alike. However, the prevalence of email spam has escalated, posing significant threats through phishing attacks, malware dissemination, and fraudulent schemes. Effective spam detection is paramount in safeguarding sensitive information and maintaining the integrity of communication systems. This study presents a comprehensive evaluation of various machine learning algorithms for email spam classification using a public dataset. The primary objective was to compare the performance of different classifiers, specifically Naive Bayes, SVM, Logistic Regression, k-Nearest Neighbors (k-NN), and Random Forest, employing two distinct text vectorization techniques: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Preprocessing steps, including tokenization, stop word removal, and stemming, were applied to enhance the text data quality. Extensive experiments were conducted on the full dataset and on subsets of 40% and 70% to validate the robustness of the models across varying data volumes. The findings reveal that the SVM classifier with TF-IDF vectorization achieved the highest performance, with an accuracy of 98.57%, precision of 98.64%, recall of 99.09%, and an F1 score of 0.99. These results underscore the efficacy of SVM in capturing the nuanced patterns inherent in spam email detection. Furthermore, this research emphasizes the critical role of text vectorization techniques in augmenting classifier performance. The novelty of my work primarily lies in the dataset itself. As of the time of writing this thesis, and to the best of my knowledge, this dataset has not yet been explored or utilized in prior research. My analysis provides valuable insights into the optimal algorithmic approaches for spam email detection, contributing to the broader field of text classification and enhancing the accuracy of automated email filtering systems. This work is vital in the ongoing battle against email-based threats, ensuring a more secure and reliable digital communication environment.
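The F1 score is the harmonic mean of precision and recall, so the figures reported in the abstract above can be checked against each other. The short sketch below does that arithmetic; the function name `f1_score` is an illustrative choice, and the only inputs are the precision and recall values quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for SVM with TF-IDF.
precision = 0.9864
recall = 0.9909
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")
```

The result is roughly 0.9886, which rounds to the 0.99 stated in the abstract.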
  • Item
    A Comparative Analysis of Machine Learning Algorithms for Price Prediction of Educational Supplies
    (UMT, Lahore, 2025) Muhammad Usman Ameer
    This study conducts a comprehensive comparison of Linear Regression, Decision Tree, and Random Forest algorithms to predict educational supply prices, evaluated using key performance metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R² score. Among these, the Random Forest algorithm demonstrated a slight advantage in predictive accuracy. However, the results suggest that all models possess significant potential for enhancement. Future research should delve into the integration of additional features, such as market trends and economic indicators, and explore the adoption of more sophisticated algorithms, including hybrid models, to further refine predictive performance. These findings offer critical insights and guidance for improving financial planning processes within educational institutions.
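For readers unfamiliar with the four metrics named in the abstract above, the sketch below computes MAE, MSE, RMSE, and R² from their standard definitions using only the Python standard library. The function name `regression_metrics` and the sample prices are made up for illustration and are not data or results from the thesis.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R^2 = 1 - SS_res / SS_tot; undefined when all true values are equal.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical prices, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE and RMSE are expressed in the same units as the price, which makes them easier to interpret than MSE, while R² gives the share of variance in the actual values that the predictions account for.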
  • Item
    Machine Learning Based Multi-Variable Happiness Index Prediction Model
    (UMT, Lahore, 2024) Anzar Nawaiz
    The happiness index is a relatively new way to understand happiness quantitatively. The introduction of the happiness report in 2011 and its annual publication since then have given happiness more scope to be studied and understood. The key goal of the thesis was to develop and propose a machine learning based model that can be used to predict the Happiness Index. The datasets used in the thesis were drawn from the happiness report and global key indexes covering social, economic, and educational factors. Data for 169 countries were considered. The thesis follows a systematic approach comprising initial data analysis, data preprocessing, and model application. Machine learning models were used, including regression models (multiple regression, gradient boosting, SVM) and classification models (random forest and KNN). In addition, a basic deep learning model, an ANN, was used. Key visualizations and metrics relevant to the model outputs were used to compare the techniques. Based on the results, the KNN classification model tends to give the most accurate results, with the highest model accuracy and better predictions.