Data Science & AI

Permanent URI for this community

https://escholar.umt.edu.pk/handle/123456789/5326

A Bidirectional Long Short Memory Network for Roman Urdu Using Novel Dataset
(UMT, Lahore, 2022) Muhammad Awais
The internet has made it quick and easy to disseminate information about a wide variety of topics, including products, services, events, and political ideas. Although the number of studies on sentiment analysis has grown rapidly, most of them have focused on English. Sentiment analysis is more challenging in Roman Urdu than in English for several reasons; for example, because Roman Urdu lacks dedicated lexical resources, information may become mixed. The primary purpose of this study is to build a large dataset for sentiment analysis in Roman Urdu, and a secondary objective is to evaluate several machine learning and deep learning approaches to such analysis. The research highlights the most commonly and extensively used approaches for analysing Roman Urdu and Urdu sentiment, and its findings extend both the resources available for Roman Urdu and the methods used in sentiment analysis. A dataset was generated for this study, and a combination of machine learning and deep learning algorithms was applied to maximize accuracy and performance. The proposed approach achieves an accuracy of 83% with machine learning and 70% with deep learning on the test data.
A Content CF Location based Recommendation system and Price Prediction with Zameen.com
(UMT, Lahore, 2023) Muhammad Usman Umar
Global real estate is one of the primary contributors to the economic prosperity and stability of any nation. In 2015, it was valued at $217 trillion, roughly 2.7 times the world's GDP, and represented approximately 60% of the world's conventional assets. The availability of big data allows real estate investors to make better decisions and generate more revenue. Personalization can support decision-making by assessing user inclinations and preferences, which can be captured while a user performs activities on zameen.com. A personalized real estate portal can use this information to recommend properties, assist homeowners, and provide informative real estate statistics. This work presents a framework for recommending properties to consumers. By monitoring user interactions on an online real estate site, the framework delivers customized recommendations based on content, collaborative filtering, and location. A user feedback mechanism evaluated the usefulness of the recommendations with a hit ratio metric; the findings indicate that 70% of the suggestions were accurate, meaning that customers were interested in at least three of the five options offered.
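The hit ratio mentioned above can be computed as in the following minimal Python sketch. It assumes each user has a ranked list of recommended property IDs and a set of properties they engaged with. The `min_hits` parameter is an illustrative addition that lets a "hit" require several relevant items (for example, at least three of five); all names and data here are hypothetical, not taken from the thesis.

```python
from typing import Dict, List, Set


def hit_ratio(recommendations: Dict[str, List[str]],
              relevant: Dict[str, Set[str]],
              k: int = 5,
              min_hits: int = 1) -> float:
    """Fraction of users whose top-k list contains at least `min_hits` relevant items.

    With min_hits=1 this is the conventional HR@k metric.
    """
    if not recommendations:
        return 0.0
    hits = 0
    for user, ranked_items in recommendations.items():
        liked = relevant.get(user, set())
        # Count relevant items among the first k recommendations only.
        n_relevant = sum(1 for item in ranked_items[:k] if item in liked)
        if n_relevant >= min_hits:
            hits += 1
    return hits / len(recommendations)


# Hypothetical property IDs and user feedback.
recs = {
    "u1": ["p1", "p2", "p3", "p4", "p5"],
    "u2": ["p6", "p7", "p8", "p9", "p10"],
}
liked = {"u1": {"p1", "p2", "p3"}, "u2": {"p6"}}
print(hit_ratio(recs, liked))              # 1.0
print(hit_ratio(recs, liked, min_hits=3))  # 0.5
```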
A Proposed Framework for the prediction of Breast cancer by using Federated Learning
(UMT, Lahore, 2024) MEHREEN ILYAS
Breast cancer is a leading cause of cancer death among women. Although genetic factors contribute substantially to the development of breast cancer, recent studies show that environmental factors also play an essential role in its occurrence and spread. The escalation of environmental risk factors has become a notable worldwide concern with substantial consequences for human health, contributing to a rise in the incidence and severity of breast cancer. This study aims to assess the predictive accuracy of Federated Learning (FL) for breast cancer. Researchers have studied several machine learning techniques for predicting breast cancer, such as XGBoost, Random Forest, Support Vector Classifier, Artificial Neural Network, and stacking classifiers. One of FL's competitive advantages is that it enables local data collection and analysis while preserving privacy and eliminating the need for centralized data aggregation. Because it can evaluate a variety of locally stored data without jeopardizing patient privacy, FL is the suggested approach for breast cancer prediction. Its distinctive features are privacy protection, local data collection and analysis, and no requirement for a centralized data repository.
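The abstract does not specify how client models are combined. A common choice is Federated Averaging (FedAvg), in which each site trains locally and only model parameters are averaged, weighted by local sample count. The sketch below shows just that aggregation step in plain Python, assuming each client's model is a flat list of floats; the hospitals and numbers are hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg aggregation step).

    Each client's vector is weighted by n_k / n, where n_k is its number of
    local training samples and n is the total across clients. Raw patient
    records never leave the clients; only these parameters are shared.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n_k in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            global_weights[i] += (n_k / total) * w
    return global_weights


# Two hypothetical hospitals with 100 and 300 local records.
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```

In a full FL round, the server would send the averaged vector back to every client for the next round of local training.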
A Short Term Load Forecasting By Using GAN and LSTM model
(UMT, Lahore, 2024) MUHAMMAD JAMSHED, MAHAM ABDUL RAZZAQ and MUHAMMAD TALHA
This research project uses Generative Adversarial Networks (GANs) to create synthetic data for short-term load forecasting (STLF). It addresses the challenges of load prediction on the available datasets by combining GANs with additional features related to solar flux. The proposed methodology uses Time-series GANs (TSGANs) and Conditional GANs (CGANs) to generate diverse and realistic data. The workflow starts with data collection, followed by preprocessing and scaling, and then the implementation of the GANs. Challenges such as discrepancies between real and generated data were acknowledged, so a more empirical approach was adopted to correct them. Limitations include data quality, model complexity, computational resources, and generalization. Assumptions were based on empirical testing, and factors such as dataset size, geographic specificity, temporal scope, and environmental policy were considered. The research also introduces novel features related to solar flux and ultimately aims to improve the accuracy and efficiency of short-term load forecasting models. The study's significance lies in its potential to contribute to energy management, more reliable forecasting models, and advances in artificial intelligence. The results are expected to have wider applications in various fields beyond short-term energy usage prediction.
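The scaling step is not detailed in the abstract. Min-max scaling to [0, 1] is one common choice before training a GAN on load series, and keeping the fitted minimum and maximum lets generated samples be mapped back to physical units. A minimal sketch, assuming a single univariate load series with hypothetical values:

```python
from typing import List, Sequence, Tuple


def fit_min_max(values: Sequence[float]) -> Tuple[float, float]:
    """Record the minimum and maximum of the training series."""
    return min(values), max(values)


def scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values into [0, 1]; a constant series maps to all zeros."""
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]


def inverse_scale(scaled: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map [0, 1] values (e.g. GAN outputs) back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


load_kw = [200.0, 300.0, 400.0]  # hypothetical hourly loads in kW
lo, hi = fit_min_max(load_kw)
print(scale(load_kw, lo, hi))         # [0.0, 0.5, 1.0]
print(inverse_scale([0.25], lo, hi))  # [250.0]
```

The minimum and maximum should be fitted on the training period only, so that no information from the evaluation period leaks into preprocessing.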
A TWO-STEP SEQUENCE-BASED METHOD FOR IDENTIFICATION AND CLASSIFICATION OF TRANSCRIPTION FACTORS AND THEIR FAMILIES BY USING ENSEMBLE AND DEEP LEARNING
(UMT, Lahore, 2024) SANNAN RIAZ
Transcription factors play a pivotal role in critical cell-fate decisions by orchestrating the conformation of the 3D genome and regulating gene expression. While traditional research has focused on distinguishing transcription factors from non-transcription factors, it has often overlooked their finer classification into specific families. Recognizing transcription factors and categorizing them by functional family is a crucial first step in understanding their roles. This research introduces a two-step discrimination method that identifies transcription factors and determines their families based solely on sequence information. The approach computes relevant statistical features that capture both position and composition; these features are then incorporated into a pseudo-amino-acid composition (PseAAC) model based on Chou's 5-step rules. In the first step, the proposed model distinguishes transcription factors from non-transcription factors, and the study finds that a stacking ensemble yields the most accurate predictions. The approach was validated with 5-fold cross-validation, self-consistency, and independent-set tests, yielding accuracies of 84.3%, 100%, and 88.5%, respectively.
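PseAAC extends the plain amino acid composition with sequence-order correlation factors. The sketch below computes only the first, composition part: the 20-dimensional normalized frequency of the standard residues. It is a simplified illustration under that assumption, not the thesis's full feature set, and it ignores non-standard residue codes such as X.

```python
from typing import List

# The 20 standard amino acids, one-letter codes, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the normalized frequency of each standard residue (sums to 1)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard codes such as 'X'
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = amino_acid_composition("MKKAX")
print(len(features))  # 20
```

A full PseAAC vector would append sequence-order terms (for example, correlations of physicochemical properties between residues a fixed distance apart) to these 20 values before feeding them to the classifiers.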
AI Based Cyber Attacks Detection Model for IoT Networks
(UMT, Lahore, 2025) Hina Jabbar
As IoT adoption continues to rise, the threat of cyber-attacks is also growing, demanding effective and accurate detection mechanisms. Traditional cyber-attack detection mechanisms often suffer from imbalanced attack classification, underutilization of datasets, and high false-negative rates, especially for minority attack categories. These limitations reduce a model's ability to detect less frequent but highly critical types of attacks, compromising cybersecurity. This study addresses these challenges by introducing an optimized Gradient Boosting model for cyber-attack detection. The model is designed to improve classification of the minority attack classes while maintaining overall accuracy. To achieve this, we use the CICIoT2023 dataset, whose large scale supports comprehensive model training and robust generalization. Evaluating on a large-scale dataset better represents real-world attack patterns and helps reduce bias, allowing the model to classify more accurately. Various preprocessing techniques, including data resampling and dimensionality reduction, were applied to support model learning and address the class imbalance. However, although resampling methods help balance the classes, they can lead to overfitting, generate artificial patterns, and reduce the model's ability to generalize to unseen attacks. We introduce multiple variants of the Gradient Boosting model through hyperparameter tuning; one optimized variant, GB_10D4, demonstrates the best performance for both binary and multiclass classification.
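Random oversampling is one simple form of the resampling mentioned above: minority-class rows are duplicated until each class matches the majority count. Because it only copies existing rows, it also illustrates why resampling can encourage overfitting, and it should be applied to the training split only. A plain-Python sketch; the feature rows and class labels are hypothetical, not CICIoT2023 fields.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence[Sequence[float]],
                      y: Sequence[str],
                      seed: int = 0) -> Tuple[List[Sequence[float]], List[str]]:
    """Duplicate random minority-class rows until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(indices)  # sample with replacement from this class
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["benign", "benign", "benign", "benign", "ddos"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({'benign': 4, 'ddos': 4})
```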
AI – Driven MCQS Generation Using LLM
(UMT, Lahore, 2024) Muhammad Qamar Iqbal
Multiple-choice questions (MCQs) are essential in educational assessment because they provide a systematic way to gauge learners' comprehension and knowledge across a wide range of subjects. This paper investigates the use of Artificial Intelligence (AI)-driven Large Language Models (LLMs) to automate the generation of MCQs for educational evaluation. It presents an approach that combines state-of-the-art NLP techniques with LLMs to expedite MCQ creation while maintaining validity and efficiency. The AI-driven system is methodically trained and validated on a wide range of educational resources. Extensive trials spanning many subjects and difficulty levels demonstrate the system's ability to produce MCQs that are both accurate and contextually relevant. The framework is distinctive for its personalized approach, which tailors MCQs to each student's specific requirements and learning preferences and fosters adaptive learning environments. The discussion also covers the implications of AI-driven MCQ generation for assessment procedures, emphasizing the importance of preserving inclusivity, accessibility, and fairness. Overall, the study offers insight into the relationship between AI and educational evaluation, opening the door to innovations in pedagogy and personalized learning.
An Efficient Fire Detection Model Based on Optimized YOLOv5 Leveraging Custom Data Approaches
(UMT, Lahore, 2024) EHTASHAM SARFRAZ
Fires pose significant threats in many contexts, underscoring the need for accurate and efficient fire detection systems. Deep learning methods, particularly YOLO (You Only Look Once), have shown promise for real-time fire detection. This study presents an improved version of the YOLOv5s architecture tailored to fire detection. The approach integrates state-of-the-art techniques to improve detection accuracy and efficiency for small fire instances, introducing components such as coordinate attention mechanisms and refined loss functions that help the YOLOv5s model capture intricate fire patterns and accurately identify fire targets.
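The refined loss functions are not specified in the abstract. YOLO-family box losses are built on intersection over union (IoU); YOLOv5 uses a CIoU variant by default, which adds center-distance and aspect-ratio penalties. The minimal sketch below computes plain IoU and the basic 1 - IoU loss for boxes in (x1, y1, x2, y2) form, as an illustration of the underlying measure only.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Basic IoU regression loss: 0 for a perfect box, 1 for no overlap."""
    return 1.0 - iou(pred, target)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.142857
```

Plain IoU gives no gradient when a predicted box does not overlap the target at all, which is one motivation for refined variants such as GIoU, DIoU, and CIoU.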
AN INVESTIGATION OF RISK DECLARATIONS AMONG NON-FINANCIAL PAKISTANI FIRMS
(UMT, Lahore, 2019) Maria Gill
The fundamental purpose of this thesis is to determine the total number of risk disclosures and to examine how board characteristics and ownership influence risk disclosure in the annual reports of non-financial firms.
Aspect Based Sentiment Analysis Using Machine learning and Deep learning
(UMT, Lahore, 2022) Farhatul-Ain
In recent years, the use of digital channels, including smart services, for government and commercial-sector services has expanded. Government and commercial organizations are trying to create services that can be accessed quickly and conveniently online, using user feedback to build and expand services and deliver essential value. As the pace of life accelerates and people seek quick and efficient ways to consume services, business owners have become concerned with providing easily accessible services informed by user feedback. It is therefore essential for stakeholders to consider these opinions and comments when developing and improving their applications and delivering the intended value to customers. Textual data such as reviews and feedback is examined to identify emotions, attitudes, and behaviors. Sentiment Analysis (SA) is the process of analyzing textual information to understand the intent related to emotions, feelings, and behavior; SA comprises document-level, sentence-level, and aspect-level analysis. This study addresses aspect-based sentiment analysis (ABSA) of Yelp reviews. ABSA considers the sentiment expressed toward individual aspects or components of the subject under review. In this research, machine learning and deep learning approaches are applied to improve ABSA performance in the Yelp review domain. The machine learning methods use Multinomial and SVM classifiers, while the deep learning technique employs Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) with word embeddings.
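Assuming the "Multinomial" model refers to Multinomial Naive Bayes, the sketch below shows how such a classifier scores tokenized review snippets that have already been tagged with an aspect (here, food), using Laplace smoothing. The toy snippets are hypothetical, not Yelp data, and tokens never seen in training are ignored.

```python
import math
from collections import Counter
from typing import List, Sequence


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.n_docs = len(labels)
        return self

    def predict(self, tokens: List[str]) -> str:
        vocab_size = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum of log P(token | c); unseen tokens are skipped.
            score = math.log(self.class_counts[c] / self.n_docs)
            denom = self.total_words[c] + self.alpha * vocab_size
            for t in tokens:
                if t in self.vocab:
                    score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical snippets already tagged with the "food" aspect.
docs = [["delicious", "pasta"], ["great", "food"],
        ["cold", "bland", "pasta"], ["bland", "food"]]
labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(docs, labels)
print(model.predict(["delicious", "food"]))  # positive
```

Working in log space avoids numerical underflow when many small per-token probabilities are multiplied.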
AutoVisionBot: AI powered visually interactive robotic companion
(UMT, Lahore, 2024) Saud Butt, Ahad Yousaf and Ahmer Zia
AutoVisionBot aims to create an advanced Arduino-based self-driving car equipped with sensing and communication technology. The car carries numerous sensors, such as ultrasonic, GPS, and infrared sensors, along with camera modules, to detect and respond to its surroundings in real time. It uses an LLM to create a communication link between the machine and the human, and the YOLOv8 model for object detection. Its key feature is multi-sensor integration: combining GPS with cameras, infrared, and ultrasonic sensors lets the car understand its surroundings for navigation and obstacle avoidance. Using YOLOv8, the machine can detect objects with high accuracy. AutoVisionBot can act as a friend or companion by communicating with and assisting its owner, and it can move freely and navigate without human assistance. Its Wi-Fi enables real-time communication, which is essential for applications requiring continuous data transmission and connectivity.
Classification of Pathogenic Bacteria Using Machine Learning and Deep Learning
(UMT, Lahore, 2024) Mahmood Ahmad
The present study classifies pathogenic bacterial species using machine learning (ML) and deep learning (DL). Fourteen pathogenic bacterial species were included: Porphyromonas gingivalis, Enterococcus faecium, Escherichia coli, Listeria monocytogenes, Neisseria gonorrhoeae, Propionibacterium acnes, Clostridium perfringens, Proteus spp., Streptococcus agalactiae, Staphylococcus epidermidis, Staphylococcus saprophyticus, Enterococcus faecalis, Pseudomonas aeruginosa, and Staphylococcus aureus. About 10,000 images of pathogenic bacteria from the DIBaS dataset were used, split into 80% training and 20% testing images. The machine learning models were Random Forest, Decision Tree, Naive Bayes, and Support Vector Machine; the deep learning models were VGG19, ResNet-101, ResNet-34, ResNet-50, and DenseNet-201. The models were trained and tested with CNN architectures implemented in Python using the PyTorch library. Each algorithm was applied to the bacteria images in turn, and its accuracy was recorded along with the number of iterations and the average time taken during training and testing. The results of the machine learning and deep learning models were then compared to find the best classification method. The deep learning models VGG19, ResNet-101, ResNet-34, ResNet-50, and DenseNet-201 achieved accuracies of 98.6%, 99.3%, 98.9%, 98.5%, and 98.6%, respectively, while the machine learning models Support Vector Machine, Naive Bayes, Decision Tree, and Random Forest achieved 71.68%, 58.63%, 49.31%, and 63.18%, respectively. The results show that the deep learning algorithms achieved much higher accuracy than the machine learning models, and ResNet-101 is regarded as the best technique for automated identification of bacterial species. In addition, this is the first enhanced study on the classification of pathogenic bacteria images using machine learning and deep learning.
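The abstract reports an 80/20 train-test split but not how it was drawn. Stratifying by species keeps every class represented in both splits in roughly the same proportion, which matters when some species have few images. A minimal sketch under that assumption, working on image indices; the labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str],
                     test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), holding out about test_fraction of each class."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):  # fixed order keeps the split reproducible
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["E. coli"] * 10 + ["S. aureus"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 12 3
```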
Combining Automation and Analytics to Detect Anti-Money Laundering
(UMT, Lahore, 2022) Mariya Javaid
Money laundering methods are evolving and becoming more sophisticated every day. With the advent of cryptocurrency and other ways to move money easily around the world, Anti-Money Laundering (AML) professionals are increasingly short-handed. The financial industry is under pressure to detect and prevent money laundering because the Financial Action Task Force (FATF) is increasingly strict about transaction screening, so institutions are looking for ways to automate the process and make it more efficient. One way to achieve this is to combine transaction analytics with machine learning to identify patterns that may indicate money laundering: a machine learning algorithm is trained to recognize suspicious activity, and its results are reviewed by human analysts. In this way, most transactions can be processed quickly and efficiently. Almost all banks now use some form of automation for AML detection, and the goal is to combine automation with analytics to obtain the most accurate results. Financial institutions have attempted this for several years, but it has been challenging, mainly because the data comes from different sources and must be cleansed and normalized before it can be analyzed. This study aims to detect money laundering and suspicious transactions in transaction logs using machine learning algorithms and neural network models, providing an optimal solution for suspicious-transaction decisions in financial institutions (FIs). Suspicious transactions were detected using machine learning and artificial neural network algorithms together with network analytics and natural language processing, achieving more than 95% accuracy. The solution is intended to be used alongside rule-based models and, as a result, to reduce false-positive suspicious-transaction alerts.
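One way to use a model alongside rule-based monitoring to reduce false positives is to re-score each rule alert and forward only high-scoring alerts to human analysts. The sketch below assumes hypothetical transaction fields, a hypothetical amount threshold, and a placeholder scoring function standing in for a trained classifier; none of these come from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Transaction:
    tx_id: str
    amount: float
    high_risk_country: bool  # hypothetical flag


def rule_alert(tx: Transaction, amount_threshold: float = 10_000.0) -> bool:
    """Hypothetical rule: large transfers or high-risk jurisdictions raise an alert."""
    return tx.amount >= amount_threshold or tx.high_risk_country


def triage(transactions: Sequence[Transaction],
           score: Callable[[Transaction], float],
           cutoff: float = 0.5) -> List[str]:
    """IDs of rule alerts whose model score reaches the cutoff (sent for review)."""
    return [tx.tx_id for tx in transactions
            if rule_alert(tx) and score(tx) >= cutoff]


def placeholder_score(tx: Transaction) -> float:
    """Stand-in for a trained model's estimated probability of laundering."""
    return 0.9 if tx.high_risk_country else 0.1


txs = [Transaction("t1", 12_000.0, False),  # rule alert, low score: suppressed
       Transaction("t2", 500.0, True),      # rule alert, high score: reviewed
       Transaction("t3", 200.0, False)]     # no rule alert
print(triage(txs, placeholder_score))  # ['t2']
```

Raising `cutoff` suppresses more rule alerts (fewer false positives) at the risk of missing true cases, so in practice it would be tuned on labelled historical alerts.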
COMPARATIVE ANALYSIS OF CUSTOMER SEGMENTATION AND RECOMENDATION SYSTEM USING UNSUPERVISED LEARNING
(UMT, Lahore, 2024) AMTUL ZAHRA
Customer segmentation and recommendations are central to enhancing user experience and marketing identity. In the context of customer segmentation and recommendation, this study compares two popular techniques: principal component analysis (PCA) and K-means clustering. PCA is used to uncover underlying patterns in customer profiles and minimise dimensionality, while K-means clustering divides consumers into groups based on similarity. The study examines the interpretability, dimensionality reduction, efficacy, and priority of the two approaches. The PCA transformation reduces the dataset's dimensionality while preserving most of the information in the original data; the transformed data is then clustered with K-means into a predefined number of groups. Businesses that want to customise experiences and raise customer satisfaction need a thorough understanding of customer behaviour and preferences. Traditional techniques often fail to capture the complexity of customer interactions and do not use advanced methods such as unsupervised learning to segment customers and generate recommendations effectively. This study addresses that gap by using unsupervised learning for customer segmentation and recommendations.
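The clustering stage can be sketched with Lloyd's algorithm for K-means in plain Python. The PCA step is omitted here because it needs a linear-algebra library, so the sketch assumes customers are already represented by low-dimensional feature vectors; the two features in the example (for instance, scaled spend and visit frequency) and their values are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignment, centroids


customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
             (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
segments, centers = kmeans(customers, k=2)
print(segments)
```

K-means only finds a local optimum that depends on the initial centroids, so in practice it is usually run from several seeds and the solution with the lowest within-cluster distance is kept.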
CONTRASTIVE STUDY OF BEHAVIOR VIA TWEETS BY FIRST AND SECOND WAVE (COVID-19) ON A NOVEL DATASET
(UMT, Lahore, 2021) MUHAMMAD WASEEM TARIQ
Recognizing people's sentiments has become a major challenge, and researchers conduct sentiment analysis in many domains to address it. Social media channels such as Twitter provide essential information for emotion analysis, and researchers now analyze behavior using data science methods. In this thesis, sentiment analysis is performed on tweets related to COVID-19. The aim is to analyze two phases of the coronavirus pandemic: 1 April to 30 June for the first wave and 20 October to 20 December for the second wave. The BERT model is used to process the dataset. The key goal of the study is to provide a contrastive analysis of both phases. The results show that the first wave had a higher tweet frequency than the second, and that negativity increased over time.
Deep Inside Convolutional Neural Network (DICNN) for Text Classification
(UMT, Lahore, 2022) Sohaima Inam
The number of complex text documents has grown exponentially in recent years, requiring a deeper understanding of machine learning techniques to classify texts effectively in many applications. Numerous deep learning techniques have shown remarkable results in text processing. These techniques are effective because they can capture intricate structures, including non-linear relationships, in the available data. Nevertheless, it can be difficult for researchers to find appropriate text structures, architectures, and methods for text classification. We present a new text processing architecture, Deep Inside Convolutional Neural Network (DICNN), that operates solely at the character level and uses only small convolutions and pooling operations. We show that the performance of the proposed model improves with depth, reporting improvements over the state of the art on several public text classification tasks with up to 49 convolutional layers. To the best of our knowledge, Deep Inside convolutional networks have not previously been used for text processing.
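The building blocks named above (character-level input, small convolutions, and pooling) can be illustrated with a single hand-set filter in plain Python. This is a didactic sketch, not the DICNN architecture: the alphabet, kernel values, and pooling size are hypothetical, whereas a real model learns many filters per layer and stacks dozens of layers.

```python
from typing import List, Sequence


def one_hot_chars(text: str, alphabet: str) -> List[List[float]]:
    """Encode each character as a one-hot row; characters outside the alphabet are all zeros."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text.lower():
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(seq: Sequence[Sequence[float]],
                kernel: Sequence[Sequence[float]]) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as in CNN libraries) followed by ReLU.

    kernel has shape (width, channels); seq has shape (length, channels).
    """
    width = len(kernel)
    out = []
    for start in range(len(seq) - width + 1):
        total = sum(w * x
                    for offset in range(width)
                    for w, x in zip(kernel[offset], seq[start + offset]))
        out.append(max(0.0, total))
    return out


def max_pool(values: Sequence[float], size: int = 2) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


alphabet = "abc"
kernel = [[1.0, 0.0, 0.0],   # position 0 looks for 'a'
          [0.0, 1.0, 0.0],   # position 1 looks for 'b'
          [0.0, 0.0, 1.0]]   # position 2 looks for 'c'
features = conv1d_relu(one_hot_chars("abcab", alphabet), kernel)
print(features)            # [3.0, 0.0, 0.0]
print(max_pool(features))  # [3.0]
```

Each pooling step halves the sequence length, so stacking convolution and pooling layers lets deeper filters respond to progressively longer character patterns.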
DEEP LEARNING BASED SOCIAL MEDIA RECOMMENDATION SYSTEM
(UMT, Lahore, 2022) ERSHA NISAR
Over the past few decades, with the advent of online social networking sites, personalized recommendation that exploits social interactions has emerged as a particularly intriguing research problem, and this trend is expected to continue. Classification and recommendation systems that determine the interests of social networking site (SNS) users are important components in many businesses, particularly advertising. Personalized advertising helps firms stand out from the sea of generic internet ads while making ads more relevant to customers and eliciting a favorable response from them. Whereas most studies on user interest classification have concentrated on textual data, this work uses user-generated image posts to predict user interests. This study therefore categorizes the interests of SNS users from images. An artificial neural network (ANN) was used to characterize consumer interests, and a variety of convolutional neural network (CNN)-based models were evaluated for the user interest classification system, which uses them to categorize photographs from users' social networking posts.
Detecting Hate Speech in Roman Urdu Using Convolutional BiLSTM Based Deep Hybrid Neural Network
(UMT, Lahore, 2023) Muhammad Zohaib
The Internet has completely transformed the way we interact and engage with culture. It has changed how we obtain news, communicate with friends, and go about our daily lives. Because the Internet is decentralised, anyone can create and share content, including ideas, information, photos, movies, and music. Although it is a democratic medium, there are also websites that incite hatred towards racial, religious, or sexual minorities and other groups, such as women, Jews, African-Americans, Muslims, and the transgender community. In recent years, political discourse has seen an increase in hateful and discriminatory messages. This thesis focuses on hate speech in the specific context of Roman Urdu. Given the growing importance of online communication and social media, the research analyzes tweets posted by users to detect hate speech. A linguistic approach is adopted, without relying on any legal or academic definition of hate speech; critical discourse analysis and the notion of soft hate speech are used to identify implicit forms of hate speech through linguistic tools. We propose a convolutional BiLSTM-based deep hybrid neural network for detecting hate speech in Roman Urdu.
ENSEMBLE OF PRETRAINED VISION TRANSFORMER (E-VT) MODELS FOR ANALYSIS OF MELANOMA SKIN CANCER
(UMT, Lahore, 2023) Hafiza Sania Kamal Pasha
Extrinsic Evaluation of Distributed Sentence Representation Through Recurrent Neural Networks
(UMT, Lahore, 2022) Farman Ali
The rise of social media and e-commerce has put an enormous amount of textual data on the internet, so intelligent models that can evaluate this data and extract relevant information are needed. Natural language processing (NLP) applications such as sentiment analysis, web search, spam filtering, and information retrieval require classifying texts into one or more predefined categories. The vanishing gradient problem makes it difficult for neural network language models trained with gradient descent to learn long-term dependencies, and new strategies have been devised to overcome the limitations of current methods. As the number of network parameters grows, so does the computational cost, and the model becomes increasingly vulnerable to overfitting. NLP systems have also traditionally treated text units as discrete atomic symbols, which gives a model only limited information about the relationships between them. This study uses IMDB reviews to test several deep learning algorithms for identifying reviewers' opinions effectively. Sentiment analysis has much in common with NLP and text analytics; it can be used to assess a reviewer's viewpoint on various issues or the overall polarity of a review.
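The vanishing gradient problem mentioned above can be made concrete for a simple tanh recurrent network, a standard textbook setting assumed here rather than taken from the thesis. Backpropagating from step T to an earlier step k multiplies one Jacobian per time step, and because every entry of 1 - h_t^2 (squared elementwise) lies in (0, 1], the norm of that product is bounded by a power of the recurrent weight norm:

```latex
\begin{aligned}
h_t &= \tanh(a_t), \qquad a_t = W h_{t-1} + U x_t, \\
\frac{\partial h_t}{\partial h_{t-1}} &= \operatorname{diag}\!\left(1 - h_t^{2}\right) W, \\
\frac{\partial h_T}{\partial h_k} &= \prod_{t=k+1}^{T} \operatorname{diag}\!\left(1 - h_t^{2}\right) W
  \quad \text{(factors ordered with } t = T \text{ on the left)}, \\
\left\lVert \frac{\partial h_T}{\partial h_k} \right\rVert_2 &\le \lVert W \rVert_2^{\,T-k}.
\end{aligned}
```

So when the spectral norm of W is below 1, the gradient signal from step T shrinks at least exponentially with the distance T - k, which is why long-term dependencies are hard to learn with plain gradient descent; gated units such as LSTM and GRU were introduced to mitigate this.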
