DASSI: differential architecture search for splice identification from DNA sequences

Bibliographic Details
Main Authors: Amira, P.A. (Author), Boughorbel, D.S. (Author), Moosa, S. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: View Fulltext in Publisher (https://doi.org/10.1186/s13040-021-00237-y)
LEADER 03422nam a2200385Ia 4500
001 10.1186-s13040-021-00237-y
008 220427s2021 CNT 000 0 und d
020 |a 17560381 (ISSN) 
245 1 0 |a DASSI: differential architecture search for splice identification from DNA sequences 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s13040-021-00237-y 
520 3 |a Background: The data explosion caused by unprecedented advances in genomics constantly challenges the conventional methods used to interpret the human genome. In recent years, the demand for robust algorithms has driven major successes for Deep Learning (DL) in solving many difficult tasks in image, speech and natural language processing by automating the manual process of architecture design. This has been fueled by the development of new DL architectures. Yet genomics poses unique challenges that require customization and development of new DL models. Methods: We proposed a new model, DASSI, by adapting a differential architecture search method and applying it to the Splice Site (SS) recognition task on DNA sequences to discover new high-performance convolutional architectures in an automated manner. We evaluated the discovered model against state-of-the-art tools for classifying true and false SS in Homo sapiens (Human), Arabidopsis thaliana (Plant), Caenorhabditis elegans (Worm) and Drosophila melanogaster (Fly). Results: Our experimental evaluation demonstrated that the discovered architecture outperformed baseline models and fixed architectures and showed competitive results against state-of-the-art models used in the classification of splice sites. The proposed model, DASSI, has a compact architecture and showed very good results on a transfer learning task. Benchmarking experiments on the execution time and precision of the architecture search and evaluation processes showed better performance on recently available GPUs, making it feasible to adopt architecture-search-based methods on large datasets. Conclusions: We proposed the use of a differential architecture search method (DASSI) to perform SS classification on raw DNA sequences, and discovered new neural network models with a low number of tunable parameters and competitive performance compared with manually engineered architectures. We extensively benchmarked the DASSI model against other state-of-the-art models and assessed its computational efficiency. The results show a high potential for using automated architecture search mechanisms to solve various problems in the field of genomics. © 2021, The Author(s). 
650 0 4 |a Arabidopsis thaliana 
650 0 4 |a article 
650 0 4 |a benchmarking 
650 0 4 |a Caenorhabditis elegans 
650 0 4 |a convolutional neural network 
650 0 4 |a Convolutional neural networks 
650 0 4 |a deep learning 
650 0 4 |a Deep learning 
650 0 4 |a DNA sequence 
650 0 4 |a Drosophila melanogaster 
650 0 4 |a genomics 
650 0 4 |a Genomics 
650 0 4 |a human 
650 0 4 |a human experiment 
650 0 4 |a molecular recognition 
650 0 4 |a Neural architecture search 
650 0 4 |a nonhuman 
650 0 4 |a Splice site 
650 0 4 |a transfer of learning 
700 1 |a Amira, P.A.  |e author 
700 1 |a Boughorbel, D.S.  |e author 
700 1 |a Moosa, S.  |e author 
773 |t BioData Mining