BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation

Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial c...

Bibliographic Details
Main Authors: Elaina D. Graham, John F. Heidelberg, Benjamin J. Tully
Format: Article
Language: English
Published: PeerJ Inc. 2017-03-01
Series: PeerJ
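Subjects: Affinity propagation; Metagenomics; Microbial ecology; Metagenome-assembled genomes; Clustering; Binning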
Online Access: https://peerj.com/articles/3035.pdf
id doaj-a2a2f99cda914924ad913a06c5bf1aa6
record_format Article
spelling doaj-a2a2f99cda914924ad913a06c5bf1aa6 | 2020-11-24T22:36:38Z | eng | PeerJ Inc. | PeerJ | 2167-8359 | 2017-03-01 | 5 | e3035 | 10.7717/peerj.3035 | BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation | Elaina D. Graham (0) | John F. Heidelberg (1) | Benjamin J. Tully (2) | Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA | Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA | Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA | Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, with metagenomic studies there is the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We are introducing a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP), to cluster assemblies using coverage with compositional based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition and coverage based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has a higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes. | https://peerj.com/articles/3035.pdf | Affinity propagation | Metagenomics | Microbial ecology | Metagenome-assembled genomes | Clustering | Binning
collection DOAJ
language English
format Article
sources DOAJ
author Elaina D. Graham
John F. Heidelberg
Benjamin J. Tully
spellingShingle Elaina D. Graham
John F. Heidelberg
Benjamin J. Tully
BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
PeerJ
Affinity propagation
Metagenomics
Microbial ecology
Metagenome-assembled genomes
Clustering
Binning
author_facet Elaina D. Graham
John F. Heidelberg
Benjamin J. Tully
author_sort Elaina D. Graham
title BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
title_short BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
title_full BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
title_fullStr BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
title_full_unstemmed BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
title_sort binsanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
publisher PeerJ Inc.
series PeerJ
issn 2167-8359
publishDate 2017-03-01
description Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, with metagenomic studies there is the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We are introducing a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP), to cluster assemblies using coverage with compositional based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition and coverage based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has a higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes.
topic Affinity propagation
Metagenomics
Microbial ecology
Metagenome-assembled genomes
Clustering
Binning
url https://peerj.com/articles/3035.pdf
work_keys_str_mv AT elainadgraham binsanityunsupervisedclusteringofenvironmentalmicrobialassembliesusingcoverageandaffinitypropagation
AT johnfheidelberg binsanityunsupervisedclusteringofenvironmentalmicrobialassembliesusingcoverageandaffinitypropagation
AT benjaminjtully binsanityunsupervisedclusteringofenvironmentalmicrobialassembliesusingcoverageandaffinitypropagation
_version_ 1725719252542947328
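
Illustrative note: the description above names two compositional features used for bin refinement, tetranucleotide frequency and percent GC content. The sketch below shows one plausible way to compute both for a single contig using only the Python standard library. It is not taken from BinSanity; the function names `gc_percent` and `tetranucleotide_frequencies`, the toy contig, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def gc_percent(seq: str) -> float:
    """Percent of unambiguous bases (A, C, G, T) that are G or C."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0  # no unambiguous bases to measure
    return 100.0 * (counts["G"] + counts["C"]) / acgt


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each of the 256 possible 4-mers in seq.

    Assumption: windows containing anything other than A, C, G, T
    are skipped rather than counted.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: c / total for k, c in counts.items()}


# Hypothetical toy contig, far shorter than a real assembled contig.
contig = "ATGCGCGTTAACCGGTTAGC"
print(gc_percent(contig))
profile = tetranucleotide_frequencies(contig)
print(len(profile), sum(profile.values()))
```

Each contig then yields a 256-value composition profile plus one GC value, which a clustering step could consume alongside coverage. Some tools additionally merge each 4-mer with its reverse complement; that step is omitted here.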