Mash Screen: high-throughput sequence containment estimation for genome discovery

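The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.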
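The distinction the abstract draws between resemblance and containment can be illustrated with a small sketch. This is not the Mash Screen implementation and does no MinHash sketching: it compares exact k-mer sets, and the function names, the toy sequences, and the value of K are all assumptions chosen for illustration. It shows why a genome that is fully present in a much larger metagenome scores low on resemblance (Jaccard index) but high on containment.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def resemblance(a, b):
    """Jaccard index: shared k-mers divided by all k-mers in either set."""
    return len(a & b) / len(a | b)


def containment(query, target):
    """Fraction of the query's k-mers that also occur in the target."""
    return len(query & target) / len(query)


# Hypothetical toy data: the "metagenome" embeds the whole "genome"
# between unrelated flanking sequence.
K = 4
genome = "ACGTTGCAAGCT"
metagenome = "GGGATCCCTA" + genome + "TTAGGCATCGATCGGA"

genome_kmers = kmers(genome, K)
metagenome_kmers = kmers(metagenome, K)

genome_containment = containment(genome_kmers, metagenome_kmers)
genome_resemblance = resemblance(genome_kmers, metagenome_kmers)

print(f"containment: {genome_containment:.2f}")
print(f"resemblance: {genome_resemblance:.2f}")
```

Because every k-mer of the toy genome occurs in the toy metagenome, containment is at its maximum, while resemblance is pulled down by the metagenome's many unrelated k-mers. Containment is asymmetric, so swapping the two arguments gives a different value.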

Bibliographic Details
Main Authors: Ondov, Brian D. (Author), Starrett, Gabriel J. (Author), Sappington, Anna (Author), Kostic, Aleksandra (Author), Koren, Sergey (Author), Buck, Christopher B. (Author), Phillippy, Adam M. (Author)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: BioMed Central, 2020-07-22T18:14:15Z.
Online Access: Get fulltext (https://hdl.handle.net/1721.1/126316)
LEADER 01500 am a22002293u 4500
001 126316
042 |a dc 
100 1 0 |a Ondov, Brian D.  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
700 1 0 |a Starrett, Gabriel J.  |e author 
700 1 0 |a Sappington, Anna  |e author 
700 1 0 |a Kostic, Aleksandra  |e author 
700 1 0 |a Koren, Sergey  |e author 
700 1 0 |a Buck, Christopher B.  |e author 
700 1 0 |a Phillippy, Adam M.  |e author 
245 0 0 |a Mash Screen: high-throughput sequence containment estimation for genome discovery 
260 |b BioMed Central,   |c 2020-07-22T18:14:15Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/126316 
520 |a The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome. 
546 |a en 
655 7 |a Article 
773 |t Genome Biology