CTE-ML

Date
2022
Publisher
UMT Lahore
Abstract
Transposable elements (TEs) are repeated DNA sequences found mostly in eukaryotic genomes. TEs can change their location within the genome and produce multiple copies of themselves throughout it. These sequences can have both positive and negative effects on organisms. TE translocation causes many diseases because it increases the rate of mutations, insertions, and deletions in the genome, and it may contribute to many cancer-related diseases. On the other hand, the beneficial outcome of TE translocation is genetic variability and the progression of genomes. Understanding the roles of TEs in genomic evolution and stability requires accurate classification. TEs can be classified into orders, classes, subclasses, and superfamilies. Many conventional bioinformatics tools have been used to classify TEs, but none has achieved reliable results. In this study, we present a method for classifying transposable elements at the order and superfamily levels. We collected and used a benchmark dataset. For the feature vectors, we calculated statistical moments along with position-relative and composition-relative features. We then trained four machine learning models to classify TEs. We conducted nine experiments to classify TEs at the deeper levels of orders and superfamilies, using three validation techniques: self-consistency testing, independent set testing, and cross-validation testing. These techniques were applied to the models to check their effectiveness and to measure four performance metrics: accuracy (Acc), sensitivity (Sn), specificity (Sp), and the Matthews correlation coefficient (MCC).
For self-consistency testing, CTE-DT achieved higher results than the other models in experiments 1, 2, and 3. In experiment 1, 99.53% Acc, 99.96% Sn, 99.53% Sp, and an MCC of 0.99 were observed. In experiment 2, 100% Acc, 100% Sn, 100% Sp, and an MCC of 1.0 were observed. In experiment 3, 100% Acc, 100% Sn, 99.99% Sp, and an MCC of 0.99 were observed. For independent set testing, CTE-RF achieved higher results than the other models in experiments 4, 5, and 6. In experiment 4, 92.74% Acc, 92.08% Sn, 93.39% Sp, and an MCC of 0.85 were observed. In experiment 5, 95.27% Acc, 98.45% Sn, 99.49% Sp, and an MCC of 0.93 were observed. In experiment 6, 96.09% Acc, 93.01% Sn, 89.86% Sp, and an MCC of 0.95 were observed. For cross-validation testing, CTE-RF achieved higher results than the other models in experiments 7, 8, and 9. In experiment 7, 90.68% Acc, 89.07% Sn, 92.28% Sp, and an MCC of 0.81 were observed. In experiment 8, 95.08% Acc, 97.25% Sn, 99.49% Sp, and an MCC of 0.93 were observed. In experiment 9, 95.08% Acc, 97.25% Sn, 99.49% Sp, and an MCC of 0.93 were observed. Based on validation and comparison of the models, the proposed model can help classify transposable elements efficiently and accurately.
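The four metrics reported in the abstract have standard definitions in terms of a binary confusion matrix. The following is a minimal sketch of those definitions, not the study's own code: the function name `binary_metrics` and the example counts are hypothetical, and the abstract does not state how the metrics were aggregated across multiple orders or superfamilies, so only the two-class case is shown.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute Acc, Sn, Sp and MCC from binary confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives (hypothetical inputs).
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total                      # accuracy
    sn = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0    # specificity
    # The MCC denominator is zero when any row or column of the matrix
    # is empty; returning 0.0 in that case is a common convention.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Illustrative counts only; these are not taken from the study.
example = binary_metrics(tp=90, tn=85, fp=15, fn=10)
perfect = binary_metrics(tp=50, tn=50, fp=0, fn=0)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when class sizes may be unbalanced.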