2024

Recent Submissions

  • Item
    Retail Sales Forecasting Using Machine Learning
    (UMT, Lahore, 2024-07-26) Muhammad Awais
    Sales forecasting is critical in the retail industry because it shapes strategic planning and decision-making. This thesis investigates how machine learning approaches can improve the accuracy of sales forecasts in Pakistan's retail sector. The study uses a comprehensive dataset of sales from 36 stores in three regions (North, South, and Central) covering October 2022 to October 2023, with products classified as summer, winter, or regular. It examines how effectively several machine learning models, such as Extreme Gradient Boosting (XGBoost), Linear Regression, and Random Forest Regression, can forecast sales trends. Data visualization techniques such as box plots, bar charts, and correlation heatmaps are used to understand sales patterns by product category and region. Model performance is evaluated using measures such as R² and 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅. The results show considerable seasonal and geographical variation in sales, providing useful information for inventory management and marketing strategies. The study shows how advanced machine learning algorithms can improve prediction accuracy, allowing businesses to optimize operations and avoid costs due to overstock or stockouts. It also identifies limitations and proposes directions for future research, such as incorporating new data sources and investigating more advanced models such as 𝑅𝑅𝑅𝑅𝑁𝑁 and 𝑇𝑇𝑇𝑇𝑇𝑇.
  • Item
    Decoding Deception
    (UMT, Lahore, 2024-08-16) MUHAMMAD ASAD AHMAD KHAN
    In the digital age, email remains a critical communication tool for individuals and businesses alike. However, the prevalence of email spam has escalated, posing significant threats through phishing attacks, malware dissemination, and fraudulent schemes. Effective spam detection is essential for safeguarding sensitive information and maintaining the integrity of communication systems. This study presents a comprehensive evaluation of machine learning algorithms for email spam classification using a public dataset. The primary objective was to compare the performance of five classifiers, namely Naive Bayes, Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbors (k-NN), and Random Forest, using two text vectorization techniques: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Preprocessing steps, including tokenization, stop word removal, and stemming, were applied to improve the quality of the text data. Experiments were conducted on the full dataset and on subsets of 40% and 70% to validate the robustness of the models across varying data volumes. The findings show that the SVM classifier with TF-IDF vectorization achieved the highest performance, with an accuracy of 98.57%, precision of 98.64%, recall of 99.09%, and an F1 score of 0.99. These results underscore the efficacy of SVM in capturing the patterns that distinguish spam email. Furthermore, this research emphasizes the critical role of text vectorization techniques in improving classifier performance. The novelty of my work lies primarily in the dataset itself: as of the time of writing this thesis, and to the best of my knowledge, this dataset has not been explored or used in prior research. My analysis provides insight into the best-performing algorithmic approaches for spam email detection, contributing to the broader field of text classification and improving the accuracy of automated email filtering systems. This work supports the ongoing effort against email-based threats and a more secure and reliable digital communication environment.
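
The two vectorization techniques named in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say which TF-IDF variant or library was used, so the sketch assumes the plain textbook form (term frequency times log(N / document frequency)), uses only whitespace tokenization in place of the stop word removal and stemming described, and runs on three made-up messages. The function names `tokenize`, `bag_of_words`, and `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for fuller preprocessing)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors): one raw term-count row per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf * log(N / df).

    Assumption: unsmoothed textbook weighting. Library implementations
    often use smoothed or normalized variants, so values will differ.
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for row in counts:
        total = sum(row)
        vectors.append(
            [(c / total) * w if total else 0.0 for c, w in zip(row, idf)]
        )
    return vocab, vectors


# Made-up example messages, for illustration only.
docs = [
    "win free prize now",
    "meeting agenda for monday",
    "free meeting room now",
]

vocab, bow = bag_of_words(docs)
_, weighted = tf_idf(docs)
for term, count, weight in zip(vocab, bow[0], weighted[0]):
    if count:
        print(f"{term}: count={count}, tf-idf={weight:.3f}")
```

In the first message every word has the same BoW count, but TF-IDF gives more weight to words that appear in fewer messages ("prize", "win") than to words shared with other messages ("free", "now"). That reweighting is the difference between the two representations that the classifiers in the study were compared on.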