Classification of intra-genomic helitrons based on features extracted from different orders of FCGS

Bibliographic Details
Main Authors: Rabeb Touati, Imen Messaoudi, Afef Elloumi Oueslati, Zied Lachiri, Maher Kharrat
Format: Article
Language: English
Published: Elsevier 2020-01-01
Series: Informatics in Medicine Unlocked
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914819301935
Description
Summary: Helitrons, eukaryotic transposable elements (TEs), were discovered 18 years ago in various genomes. In the Caenorhabditis elegans (C. elegans) genome, helitron sequences vary widely in size, from 11 to 8965 base pairs (bp). These TEs are not uniformly dispersed, and they can mobilize within a genome by a rolling-circle mechanism. This ability to move and replicate in genomes enables these elements to play a major role in genomic evolution. To follow this evolution, we predicted helitron families (10 classes) in the C. elegans genome by combining features extracted from signals corresponding to DNA sequences with a Support Vector Machine (SVM) classifier. In our classification system, the features extracted from the signals proved effective for automatically predicting helitron sequences. The Gaussian radial kernel with 100-fold cross-validation gave the best accuracy rates, ranging from 68% to 97% with an overall mean score of 83.7%, and we identified the Helitron Y1A class with an accuracy rate of 100% for a specific value of c and gamma. In addition, other notable helitrons (NDNAX2, NDNAX3, Helitron_Y2) were predicted with interesting accuracy rates.
Keywords: Helitrons classification, Signal, FCGS coding technique, Machine learning, SVM
ISSN:2352-9148
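
Illustrative note: the summary refers to features taken from different orders of FCGS (frequency chaos game signal/representation) but does not spell out the computation. The following is a minimal sketch, not the authors' implementation, of how order-k FCGS matrices could be built from a DNA sequence and flattened into one feature vector. The corner assignment in `CORNERS`, the function names `fcgs` and `fcgs_features`, and the default orders are all assumptions made for illustration.

```python
from collections import Counter

# Assumed corner assignment (one common chaos-game convention); the record
# does not say which convention the article uses.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgs(sequence, k):
    """Return a 2^k x 2^k matrix of relative k-mer frequencies."""
    seq = sequence.upper()
    size = 2 ** k
    matrix = [[0.0] * size for _ in range(size)]
    # Collect all overlapping k-mers, skipping any with non-ACGT symbols.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    kmers = [m for m in kmers if set(m) <= set(CORNERS)]
    if not kmers:
        return matrix
    total = len(kmers)
    for kmer, count in Counter(kmers).items():
        # Each base contributes one bit to the column (x) and row (y) index.
        x = y = 0
        for base in kmer:
            bx, by = CORNERS[base]
            x = 2 * x + bx
            y = 2 * y + by
        matrix[y][x] = count / total
    return matrix


def fcgs_features(sequence, orders=(1, 2, 3)):
    """Concatenate flattened FCGS matrices of several orders (hypothetical helper)."""
    features = []
    for k in orders:
        for row in fcgs(sequence, k):
            features.extend(row)
    return features
```

A vector produced this way has 4^k entries per order (for example 4 + 16 + 64 = 84 for orders 1 to 3) and could be passed to an SVM with a Gaussian radial kernel, whose c and gamma parameters would then be tuned under cross-validation as the summary describes.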