CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
Abstract Background In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivab...
Main Authors: | Damayanthi Herath, Sen-Lin Tang, Kshitij Tandon, David Ackland, Saman Kumara Halgamuge |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2017-12-01 |
Series: | BMC Bioinformatics |
Subjects: | Metagenomics, Binning, Contig coverage, Contig composition, DBSCAN algorithm |
Online Access: | http://link.springer.com/article/10.1186/s12859-017-1967-3 |
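The abstract for this article describes a two-stage strategy: contigs are first grouped in GC content-coverage space, and the resulting bins are then refined in tetranucleotide frequency space, with DBSCAN as the clustering algorithm. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The function names (`gc_content`, `tetra_freqs`, `dbscan`, `two_stage_bins`), the log10 scaling of coverage, the 256-dimensional frequency vector without reverse-complement merging, and the `eps1`, `eps2` and `min_pts` values are all assumptions chosen for illustration.

```python
import math
from itertools import product

# All 256 tetranucleotides over the DNA alphabet.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def gc_content(seq):
    """Fraction of G and C bases in a contig."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def tetra_freqs(seq):
    """Normalized tetranucleotide frequency vector (256 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with Euclidean distance; label -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels

def two_stage_bins(contigs, eps1=0.1, eps2=0.05, min_pts=2):
    """contigs: list of (sequence, coverage) with coverage > 0.

    Returns one (stage1_label, stage2_label) pair per contig.
    """
    # Stage 1: cluster in (GC content, log10 coverage) space.
    stage1_pts = [(gc_content(s), math.log10(c)) for s, c in contigs]
    stage1 = dbscan(stage1_pts, eps1, min_pts)
    result = [(lab, -1) for lab in stage1]
    # Stage 2: refine each stage-1 bin in tetranucleotide frequency space.
    for lab in set(stage1) - {-1}:
        idx = [i for i, l in enumerate(stage1) if l == lab]
        sub = dbscan([tetra_freqs(contigs[i][0]) for i in idx], eps2, min_pts)
        for i, s in zip(idx, sub):
            result[i] = (lab, s)
    return result

# Toy input: two AT-rich low-coverage contigs, two GC-rich high-coverage ones.
toy = [("AT" * 50, 10.0), ("TA" * 50, 11.0),
       ("GC" * 50, 100.0), ("CG" * 50, 95.0)]
print(two_stage_bins(toy))
```

On real data the distance thresholds would need tuning, and a library implementation of DBSCAN would normally replace the quadratic neighbor search used here.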
id |
doaj-69a2a5e9d09a48608fe467930e5d1db7 |
---|---|
record_format |
Article |
spelling |
doaj-69a2a5e9d09a48608fe467930e5d1db7; 2020-11-24T21:53:42Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-12-01; 18; S16; 161; 172; 10.1186/s12859-017-1967-3
CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
Damayanthi Herath (Department of Mechanical Engineering, The University of Melbourne)
Sen-Lin Tang (Biodiversity Research Center, Academia Sinica)
Kshitij Tandon (Biodiversity Research Center, Academia Sinica)
David Ackland (Department of Biomedical Engineering, The University of Melbourne)
Saman Kumara Halgamuge (Research School of Engineering, College of Engineering and Computer Science, The Australian National University)
Abstract
Background: In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequencies space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performances of CoMet were compared against four existing approaches for binning a single metagenomic sample, including MaxBin, Metawatt, MyCC (default) and MyCC (coverage) using multiple datasets including a sample comprised of multiple strains.
Results: Binning methods based on both compositional features and coverages of contigs had higher performances than the method which is based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score in F1-score. However, the performances of CoMet were higher than MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18 - 39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome.
Conclusions: The approach proposed with CoMet for binning contigs, improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains.
http://link.springer.com/article/10.1186/s12859-017-1967-3
Metagenomics; Binning; Contig coverage; Contig composition; DBSCAN algorithm |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Damayanthi Herath Sen-Lin Tang Kshitij Tandon David Ackland Saman Kumara Halgamuge |
author_sort |
Damayanthi Herath |
title |
CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2017-12-01 |
topic |
Metagenomics Binning Contig coverage Contig composition DBSCAN algorithm |
url |
http://link.springer.com/article/10.1186/s12859-017-1967-3 |
_version_ |
1725870591624347648 |
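The results in this record are reported as precision and F1-score. The record does not say how those metrics are computed for bins, so the sketch below uses one definition that is common in the binning literature, labeled here as an assumption: precision credits each bin with its best-represented species among binned contigs, and recall credits each species with its best-matching bin over all contigs. The function name `binning_scores` and the use of `-1` for unbinned contigs are hypothetical choices for this example.

```python
from collections import Counter

def binning_scores(true_labels, bin_labels, unbinned=-1):
    """Return (precision, recall, f1) for a contig binning.

    true_labels: species of each contig; bin_labels: assigned bin per contig.
    Contigs whose bin equals `unbinned` count against recall only.
    """
    pairs = [(t, b) for t, b in zip(true_labels, bin_labels) if b != unbinned]
    overlap = Counter(pairs)  # contigs shared by each (species, bin) pair
    best_per_bin, best_per_species = {}, {}
    for (t, b), n in overlap.items():
        best_per_bin[b] = max(best_per_bin.get(b, 0), n)
        best_per_species[t] = max(best_per_species.get(t, 0), n)
    precision = sum(best_per_bin.values()) / len(pairs) if pairs else 0.0
    recall = (sum(best_per_species.values()) / len(true_labels)
              if true_labels else 0.0)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Six contigs from species A, B, C; the last contig is left unbinned.
truth = ["A", "A", "A", "B", "B", "C"]
bins = [0, 0, 1, 1, 1, -1]
print(binning_scores(truth, bins))  # precision 4/5, recall 4/6, F1 8/11
```

Under this definition a method can trade recall for precision by leaving ambiguous contigs unbinned, which is one reason precision and F1 rankings can differ between methods.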