Mash Screen: high-throughput sequence containment estimation for genome discovery
Abstract: The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genom...
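Main Authors: | Brian D. Ondov, Gabriel J. Starrett, Anna Sappington, Aleksandra Kostic, Sergey Koren, Christopher B. Buck, Adam M. Phillippy |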
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-11-01 |
Series: | Genome Biology |
Subjects: | MinHash; Metagenomics; Sequencing; SRA; Viral Discovery; Polyomavirus |
Online Access: | http://link.springer.com/article/10.1186/s13059-019-1841-x |
id |
doaj-71114a92c58f48dcb8444e3159132ef3 |
---|---|
record_format |
Article |
spelling |
doaj-71114a92c58f48dcb8444e3159132ef3; 2020-11-25T04:07:37Z; eng; BMC; Genome Biology; 1474-760X; 2019-11-01; 20; 1; 1; 13; 10.1186/s13059-019-1841-x; Mash Screen: high-throughput sequence containment estimation for genome discovery; Brian D. Ondov [0]; Gabriel J. Starrett [1]; Anna Sappington [2]; Aleksandra Kostic [3]; Sergey Koren [4]; Christopher B. Buck [5]; Adam M. Phillippy [6]; [0] Genome Informatics section, National Human Genome Research Institute; [1] Tumor Virus Molecular Biology section, National Cancer Institute; [2] Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; [3] Department of Computer Science, Princeton University; [4] Genome Informatics section, National Human Genome Research Institute; [5] Tumor Virus Molecular Biology section, National Cancer Institute; [6] Genome Informatics section, National Human Genome Research Institute; Abstract: The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.; http://link.springer.com/article/10.1186/s13059-019-1841-x; MinHash; Metagenomics; Sequencing; SRA; Viral Discovery; Polyomavirus |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Brian D. Ondov; Gabriel J. Starrett; Anna Sappington; Aleksandra Kostic; Sergey Koren; Christopher B. Buck; Adam M. Phillippy |
author_sort |
Brian D. Ondov |
title |
Mash Screen: high-throughput sequence containment estimation for genome discovery |
publisher |
BMC |
series |
Genome Biology |
issn |
1474-760X |
publishDate |
2019-11-01 |
description |
Abstract: The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome. |
topic |
MinHash; Metagenomics; Sequencing; SRA; Viral Discovery; Polyomavirus |
url |
http://link.springer.com/article/10.1186/s13059-019-1841-x |