HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment
Abstract. Background: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular p...
Main Authors: | Jordi Abante, Noushin Ghaffari, Charles D. Johnson, Aniruddha Datta |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2017-09-01 |
Series: | BMC Genomics |
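Subjects: | Genome assemblies; de novo assemblies; Sequence analysis; Hidden Markov models; Markov chains; Stochastic processes |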
Online Access: | http://link.springer.com/article/10.1186/s12864-017-3965-2 |
id |
doaj-c96fe98fe2e341938aefebf316dee0dd |
---|---|
record_format |
Article |
spelling |
doaj-c96fe98fe2e341938aefebf316dee0dd; 2020-11-24T21:40:04Z; eng; BMC; BMC Genomics; 1471-2164; 2017-09-01; 181113; 10.1186/s12864-017-3965-2; HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment; Jordi Abante [0]; Noushin Ghaffari [1]; Charles D. Johnson [2]; Aniruddha Datta [3]; Whitaker Biomedical Engineering Institute, Johns Hopkins University; Center for Bioinformatics and Genomic Systems Engineering (CBGSE); Center for Bioinformatics and Genomic Systems Engineering (CBGSE); Center for Bioinformatics and Genomic Systems Engineering (CBGSE); Abstract. Background: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers. Methods: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology. Results: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources. Conclusions: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.; http://link.springer.com/article/10.1186/s12864-017-3965-2; Genome assemblies; de novo assemblies; Sequence analysis; Hidden Markov models; Markov chains; Stochastic processes |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jordi Abante, Noushin Ghaffari, Charles D. Johnson, Aniruddha Datta |
spellingShingle |
Jordi Abante Noushin Ghaffari Charles D. Johnson Aniruddha Datta HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment BMC Genomics Genome assemblies de novo assemblies Sequence analysis Hidden Markov models Markov chains Stochastic processes |
author_facet |
Jordi Abante, Noushin Ghaffari, Charles D. Johnson, Aniruddha Datta |
author_sort |
Jordi Abante |
title |
HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_short |
HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_full |
HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_fullStr |
HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_full_unstemmed |
HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_sort |
himme: using genetic patterns as a proxy for genome assembly reliability assessment |
publisher |
BMC |
series |
BMC Genomics |
issn |
1471-2164 |
publishDate |
2017-09-01 |
description |
Abstract. Background: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers. Methods: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology. Results: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources. Conclusions: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis. |
topic |
Genome assemblies; de novo assemblies; Sequence analysis; Hidden Markov models; Markov chains; Stochastic processes |
url |
http://link.springer.com/article/10.1186/s12864-017-3965-2 |
work_keys_str_mv |
AT jordiabante himmeusinggeneticpatternsasaproxyforgenomeassemblyreliabilityassessment AT noushinghaffari himmeusinggeneticpatternsasaproxyforgenomeassemblyreliabilityassessment AT charlesdjohnson himmeusinggeneticpatternsasaproxyforgenomeassemblyreliabilityassessment AT aniruddhadatta himmeusinggeneticpatternsasaproxyforgenomeassemblyreliabilityassessment |
_version_ |
1725928337282433024 |
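
The abstract describes scoring contigs with a hidden Markov model in which a Markov chain captures genetic patterns and emission probabilities absorb sequencing noise. The sketch below illustrates that general idea only; it is not HiMMe's implementation. It assumes a first-order chain over single bases rather than k-mers, a uniform substitution-error model, and a uniform initial distribution, and every function name (`train_transitions`, `emission_matrix`, `score_contig`) is hypothetical.

```python
import math

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def train_transitions(reference, pseudocount=1.0):
    """Estimate first-order transition probabilities from a reference string."""
    counts = [[pseudocount] * 4 for _ in range(4)]
    for prev, cur in zip(reference, reference[1:]):
        counts[INDEX[prev]][INDEX[cur]] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def emission_matrix(error_rate):
    """P(observed base | true base) under a uniform substitution-error model.

    The uniform error model is an illustrative assumption.
    """
    return [
        [1.0 - error_rate if i == j else error_rate / 3.0 for j in range(4)]
        for i in range(4)
    ]


def score_contig(contig, transitions, emissions):
    """Per-base log-likelihood of a contig via the scaled forward algorithm."""
    if not contig:
        raise ValueError("contig must be non-empty")
    obs = [INDEX[base] for base in contig]
    # Uniform initial distribution over hidden bases (an assumption).
    alpha = [0.25 * emissions[s][obs[0]] for s in range(4)]
    log_likelihood = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the Markov chain, then weight by the emission.
            alpha = [
                sum(alpha[p] * transitions[p][s] for p in range(4))
                * emissions[s][obs[t]]
                for s in range(4)
            ]
        # Rescale to avoid underflow; the scale factors carry the likelihood.
        scale = sum(alpha)
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_likelihood / len(obs)


if __name__ == "__main__":
    # Toy training data: a reference with a strongly repeated ACGT pattern.
    transitions = train_transitions("ACGT" * 50)
    emissions = emission_matrix(error_rate=0.01)
    for contig in ("ACGTACGTACGT", "AAAACCCCGGGG"):
        print(contig, score_contig(contig, transitions, emissions))
```

Dividing the log-likelihood by contig length gives a per-base score, so contigs of different lengths can be compared; a contig that follows the trained pattern should score higher than one that does not.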