Browsing by Author "MUHAMMAD ASAD AHMAD KHAN"
Item
Decoding Deception
(UMT, Lahore, 2024-08-16) MUHAMMAD ASAD AHMAD KHAN
In the digital age, email remains a critical communication tool for individuals and businesses alike. However, email spam has become more prevalent, posing significant threats through phishing attacks, malware dissemination, and fraudulent schemes. Effective spam detection is essential for safeguarding sensitive information and maintaining the integrity of communication systems. This study evaluates several machine learning algorithms for email spam classification using a public dataset. The primary objective was to compare the performance of five classifiers, namely Naive Bayes, Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbors (k-NN), and Random Forest, under two text vectorization techniques: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Preprocessing steps, including tokenization, stop-word removal, and stemming, were applied to improve the quality of the text data. Experiments were conducted on the full dataset and on subsets of 40% and 70% to test the robustness of the models across varying data volumes. The findings show that the SVM classifier with TF-IDF vectorization achieved the highest performance, with an accuracy of 98.57%, a precision of 98.64%, a recall of 99.09%, and an F1 score of 0.99. These results underscore the efficacy of SVM in capturing the patterns that characterize spam email. Furthermore, this research emphasizes the critical role of text vectorization techniques in improving classifier performance. The novelty of my work lies primarily in the dataset itself: as of the time of writing this thesis, and to the best of my knowledge, this dataset has not yet been explored or used in prior research.
My analysis provides insight into which algorithmic approaches work best for spam email detection, contributing to the broader field of text classification and to the accuracy of automated email filtering systems. This work supports the ongoing effort against email-based threats and a more secure and reliable digital communication environment.

Item
Decoding Deception: Harnessing Machine Learning for Robust Phishing Detection
(UMT, Lahore, 2024) MUHAMMAD ASAD AHMAD KHAN
Abstract: identical to that of the item above.
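
As a rough illustration of the two vectorization techniques named in the abstract, and of how the reported F1 score follows from the reported precision and recall, here is a minimal standard-library sketch. It is not the thesis implementation: the toy corpus, the function names, the whitespace tokenizer (no stop-word removal or stemming), and the particular TF-IDF variant (raw term count times ln(N/df)) are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration only; NOT the thesis dataset.
DOCS = [
    "win a free prize now",
    "meeting agenda for monday",
    "free prize claim now now",
]


def tokenize(text):
    """Lowercase and split on whitespace (assumed; no stop words or stemming)."""
    return text.lower().split()


def build_vocab(docs):
    """Sorted list of the distinct tokens across all documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow_vector(doc, vocab):
    """Bag of Words: raw count of each vocabulary term in one document."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    """TF-IDF using tf = raw count and idf = ln(N / df), one common variant."""
    n_docs = len(docs)
    token_sets = [set(tokenize(doc)) for doc in docs]
    # df is at least 1 for every term because vocab is built from docs.
    idf = {t: math.log(n_docs / sum(t in s for s in token_sets)) for t in vocab}
    return [
        [count * idf[term] for count, term in zip(bow_vector(doc, vocab), vocab)]
        for doc in docs
    ]


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


vocab = build_vocab(DOCS)
bow = [bow_vector(doc, vocab) for doc in DOCS]
tfidf = tfidf_vectors(DOCS, vocab)

# Precision and recall as reported in the abstract (98.64% and 99.09%).
reported_f1 = f1_score(0.9864, 0.9909)
print(round(reported_f1, 4))  # about 0.9886, which rounds to the reported 0.99
```

In this sketch, a term such as "now" that appears in two of the three toy documents receives a lower IDF weight than "claim", which appears in only one. This is the general sense in which TF-IDF differs from raw BoW counts.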