HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment

Abstract Background The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular p...

Bibliographic Details
Main Authors: Jordi Abante, Noushin Ghaffari, Charles D. Johnson, Aniruddha Datta
Format: Article
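Language: English
Published: BMC 2017-09-01
Series: BMC Genomics
Subjects: Genome assemblies; de novo assemblies; Sequence analysis; Hidden Markov models; Markov chains; Stochastic processes
Online Access: http://link.springer.com/article/10.1186/s12864-017-3965-2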
id doaj-c96fe98fe2e341938aefebf316dee0dd
record_format Article
spelling doaj-c96fe98fe2e341938aefebf316dee0dd
2020-11-24T21:40:04Z
eng
BMC
BMC Genomics, ISSN 1471-2164
2017-09-01, volume 18, issue 1, pages 1-13
doi:10.1186/s12864-017-3965-2
HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment
Jordi Abante (Whitaker Biomedical Engineering Institute, Johns Hopkins University)
Noushin Ghaffari (Center for Bioinformatics and Genomic Systems Engineering (CBGSE))
Charles D. Johnson (Center for Bioinformatics and Genomic Systems Engineering (CBGSE))
Aniruddha Datta (Center for Bioinformatics and Genomic Systems Engineering (CBGSE))
collection DOAJ
language English
format Article
sources DOAJ
author Jordi Abante
Noushin Ghaffari
Charles D. Johnson
Aniruddha Datta
title HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2017-09-01
description Abstract
Background: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers.
Methods: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology.
Results: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources.
Conclusions: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.
topic Genome assemblies
de novo assemblies
Sequence analysis
Hidden Markov models
Markov chains
Stochastic processes
url http://link.springer.com/article/10.1186/s12864-017-3965-2
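
The Methods summary in this record describes scoring contigs with a hidden Markov model: a Markov chain captures genetic patterns, and emission probabilities absorb sequencing noise. The record does not give the model's details, so the following is only a minimal sketch of that general idea and not the HiMMe implementation. It assumes the hidden state is the true k-mer context, the observation is a single base, substitution errors are uniform, and the score is the per-base log-likelihood from a scaled forward algorithm. The names `train_transitions`, `emission`, `score_contig`, `pseudocount`, and `error_rate` are all hypothetical.

```python
import itertools
import math
from collections import defaultdict

BASES = "ACGT"


def train_transitions(reference, k, pseudocount=1.0):
    """Estimate P(next base | previous k bases) from a trusted sequence.

    Assumption: an order-k Markov chain with additive smoothing.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(len(reference) - k):
        context, nxt = reference[i:i + k], reference[i + k]
        # Skip windows that contain anything other than A, C, G, T.
        if nxt in BASES and all(c in BASES for c in context):
            counts[context][nxt] += 1.0
    trans = {}
    for kmer in map("".join, itertools.product(BASES, repeat=k)):
        total = sum(counts[kmer].values()) + 4 * pseudocount
        trans[kmer] = {b: (counts[kmer][b] + pseudocount) / total for b in BASES}
    return trans


def emission(observed, true, error_rate):
    """Assumed noise model: uniform substitution errors, 0 < error_rate < 1."""
    return 1.0 - error_rate if observed == true else error_rate / 3.0


def score_contig(contig, trans, k, error_rate=0.01):
    """Per-base log-likelihood of a contig (scaled forward algorithm)."""
    if len(contig) < k:
        raise ValueError("contig must contain at least k bases")
    states = list(trans)
    # Initialisation: uniform prior over true k-mers, each emitting the
    # first k observed bases.
    alpha = {}
    for s in states:
        p = 1.0 / len(states)
        for obs, true in zip(contig[:k], s):
            p *= emission(obs, true, error_rate)
        alpha[s] = p
    norm = sum(alpha.values())
    log_like = math.log(norm)
    alpha = {s: a / norm for s, a in alpha.items()}
    # Recursion: shift the k-mer context by one base per observed base.
    for obs in contig[k:]:
        new_alpha = dict.fromkeys(states, 0.0)
        for s, a in alpha.items():
            for b in BASES:
                new_alpha[s[1:] + b] += a * trans[s][b] * emission(obs, b, error_rate)
        norm = sum(new_alpha.values())
        log_like += math.log(norm)
        alpha = {s: a / norm for s, a in new_alpha.items()}
    return log_like / len(contig)


# Toy usage: train on a periodic "genome", then compare two contigs.
trans = train_transitions("ACGT" * 50, k=2)
good = score_contig("ACGT" * 10, trans, k=2)
bad = score_contig("AACCGGTT" * 5, trans, k=2)
print(good > bad)
```

In this toy setting the contig that follows the trained pattern receives the higher score. The forward pass costs on the order of 4^k operations per base, so it grows linearly with contig length for a fixed k, which is in the spirit of the small k-mer sizes mentioned in the Results summary.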