CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision

Abstract Background In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivab...


Bibliographic Details
Main Authors: Damayanthi Herath, Sen-Lin Tang, Kshitij Tandon, David Ackland, Saman Kumara Halgamuge
Format: Article
Language: English
Published: BMC 2017-12-01
Series: BMC Bioinformatics
Subjects: Metagenomics; Binning; Contig coverage; Contig composition; DBSCAN algorithm
Online Access: http://link.springer.com/article/10.1186/s12859-017-1967-3
id doaj-69a2a5e9d09a48608fe467930e5d1db7
record_format Article
spelling doaj-69a2a5e9d09a48608fe467930e5d1db7
2020-11-24T21:53:42Z
eng
BMC
BMC Bioinformatics
1471-2105
2017-12-01
18 S16 161 172
10.1186/s12859-017-1967-3
CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
Damayanthi Herath: Department of Mechanical Engineering, The University of Melbourne
Sen-Lin Tang: Biodiversity Research Center, Academia Sinica
Kshitij Tandon: Biodiversity Research Center, Academia Sinica
David Ackland: Department of Biomedical Engineering, The University of Melbourne
Saman Kumara Halgamuge: Research School of Engineering, College of Engineering and Computer Science, The Australian National University
Abstract Background: In metagenomics, the separation of nucleotide sequences belonging to an individual population or closely matched populations is termed binning. Binning helps the evaluation of the underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and the refinement of bins in tetranucleotide frequency space in a purely unsupervised manner. In CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performance of CoMet was compared against four existing approaches for binning a single metagenomic sample, namely MaxBin, Metawatt, MyCC (default) and MyCC (coverage), using multiple datasets, including a sample comprising multiple strains. Results: Binning methods based on both the compositional features and the coverages of contigs performed better than the method based only on compositional features. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score for F1-score. However, CoMet outperformed MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18-39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. Conclusions: The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprising multiple strains.
http://link.springer.com/article/10.1186/s12859-017-1967-3
Metagenomics
Binning
Contig coverage
Contig composition
DBSCAN algorithm
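The abstract compares binning methods by precision and F1-score but this record does not give the formulas. The sketch below shows one common way such scores are computed for contig bins, assuming each bin is credited with its majority reference genome and each genome with its best-matching bin. The function name `binning_scores`, the `unbinned` marker and the contig-count weighting are assumptions made for illustration and may differ from the evaluation used in the article.

```python
from collections import Counter, defaultdict


def binning_scores(true_labels, bin_labels, unbinned=-1):
    """Return (precision, recall, F1) for a binning result.

    Assumed definitions (not taken from the article):
      precision = contigs agreeing with their bin's majority genome / binned contigs
      recall    = contigs in each genome's best-matching bin / all contigs
    """
    per_bin = defaultdict(Counter)     # bin -> counts of true genomes
    per_genome = defaultdict(Counter)  # genome -> counts of bins
    for genome, bin_id in zip(true_labels, bin_labels):
        if bin_id == unbinned:
            continue  # unbinned contigs lower recall but not precision
        per_bin[bin_id][genome] += 1
        per_genome[genome][bin_id] += 1

    binned = sum(sum(c.values()) for c in per_bin.values())
    if binned == 0:
        return 0.0, 0.0, 0.0

    precision = sum(max(c.values()) for c in per_bin.values()) / binned
    recall = sum(max(c.values()) for c in per_genome.values()) / len(true_labels)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: two genomes, one misplaced contig, one unbinned contig.
truth = ["A", "A", "A", "B", "B", "B"]
bins = [0, 0, 1, 1, 1, -1]
precision, recall, f1 = binning_scores(truth, bins)
```

In the toy example, 4 of the 5 binned contigs agree with their bin's majority genome, and 4 of the 6 contigs sit in their genome's best bin, so a method can trade recall for precision by leaving uncertain contigs unbinned.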
collection DOAJ
language English
format Article
sources DOAJ
author Damayanthi Herath
Sen-Lin Tang
Kshitij Tandon
David Ackland
Saman Kumara Halgamuge
spellingShingle Damayanthi Herath
Sen-Lin Tang
Kshitij Tandon
David Ackland
Saman Kumara Halgamuge
CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
BMC Bioinformatics
Metagenomics
Binning
Contig coverage
Contig composition
DBSCAN algorithm
author_facet Damayanthi Herath
Sen-Lin Tang
Kshitij Tandon
David Ackland
Saman Kumara Halgamuge
author_sort Damayanthi Herath
title CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
title_short CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
title_full CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
title_fullStr CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
title_full_unstemmed CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
title_sort comet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2017-12-01
description Abstract Background: In metagenomics, the separation of nucleotide sequences belonging to an individual population or closely matched populations is termed binning. Binning helps the evaluation of the underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and the refinement of bins in tetranucleotide frequency space in a purely unsupervised manner. In CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performance of CoMet was compared against four existing approaches for binning a single metagenomic sample, namely MaxBin, Metawatt, MyCC (default) and MyCC (coverage), using multiple datasets, including a sample comprising multiple strains. Results: Binning methods based on both the compositional features and the coverages of contigs performed better than the method based only on compositional features. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score for F1-score. However, CoMet outperformed MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18-39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. Conclusions: The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprising multiple strains.
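The two-stage strategy described in the abstract (grouping in GC content-coverage space, then refinement in tetranucleotide frequency space, both with DBSCAN) can be sketched roughly as follows. This is an illustrative outline only, not the CoMet implementation: the function names, the plain Euclidean distance, the `eps1`, `eps2` and `min_pts` parameters, and the expectation that coverage values are already scaled to be comparable with GC content are all assumptions.

```python
from itertools import product
from math import dist

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 4-mers


def gc_content(seq):
    """Fraction of G and C bases in a contig."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def tetranucleotide_freqs(seq):
    """Normalised counts of the 256 tetranucleotides over overlapping windows."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows with N or other symbols are skipped
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in TETRAMERS]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with Euclidean distance; label -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # core point: keep expanding
                queue.extend(reach)
    return labels


def two_stage_bins(contigs, coverages, eps1, eps2, min_pts):
    """Stage 1: cluster on (GC, coverage). Stage 2: re-cluster each bin on TNF."""
    stage1 = dbscan([(gc_content(s), c) for s, c in zip(contigs, coverages)],
                    eps1, min_pts)
    final = [-1] * len(contigs)
    next_id = 0
    for b in sorted(set(stage1) - {-1}):
        idx = [i for i, label in enumerate(stage1) if label == b]
        sub = dbscan([tetranucleotide_freqs(contigs[i]) for i in idx],
                     eps2, min_pts)
        for k in sorted(set(sub) - {-1}):
            for i, label in zip(idx, sub):
                if label == k:
                    final[i] = next_id
            next_id += 1
    return final


# Four toy contigs with identical GC content and coverage but different
# composition: stage 1 merges them, stage 2 separates them.
contigs = ["ACGT" * 10, "ACGT" * 10, "GGAA" * 10, "GGAA" * 10]
bins = two_stage_bins(contigs, [1.0] * 4, eps1=0.1, eps2=0.1, min_pts=2)
```

The toy input shows why a second stage can help: contigs that are indistinguishable by GC content and coverage may still differ in tetranucleotide usage. The neighbour search here is quadratic, so a real dataset would call for an indexed DBSCAN implementation.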
topic Metagenomics
Binning
Contig coverage
Contig composition
DBSCAN algorithm
url http://link.springer.com/article/10.1186/s12859-017-1967-3
work_keys_str_mv AT damayanthiherath cometaworkflowusingcontigcoverageandcompositionforbinningametagenomicsamplewithhighprecision
AT senlintang cometaworkflowusingcontigcoverageandcompositionforbinningametagenomicsamplewithhighprecision
AT kshitijtandon cometaworkflowusingcontigcoverageandcompositionforbinningametagenomicsamplewithhighprecision
AT davidackland cometaworkflowusingcontigcoverageandcompositionforbinningametagenomicsamplewithhighprecision
AT samankumarahalgamuge cometaworkflowusingcontigcoverageandcompositionforbinningametagenomicsamplewithhighprecision
_version_ 1725870591624347648