Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles

Abstract Background Microbial communities can be location specific, and the abundance of species within locations can influence our ability to determine whether a sample belongs to one city or another. As part of the 2017 CAMDA MetaSUB Inter-City Challenge, next generation sequencing (NGS) data was...

Bibliographic Details
Main Authors: Alejandro R. Walker, Tyler L. Grimes, Somnath Datta, Susmita Datta
Format: Article
Language: English
Published: BMC 2018-05-01
Series: Biology Direct
Subjects: PCA
Online Access: http://link.springer.com/article/10.1186/s13062-018-0215-8
id doaj-f5d66b1f6f1e4ddfb811d381ddf13f1f
record_format Article
spelling doaj-f5d66b1f6f1e4ddfb811d381ddf13f1f 2020-11-24T20:50:00Z eng BMC Biology Direct 1745-6150 2018-05-01 13 1 1 16 10.1186/s13062-018-0215-8 Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles Alejandro R. Walker, Tyler L. Grimes, Somnath Datta, Susmita Datta (all: Department of Biostatistics, University of Florida) http://link.springer.com/article/10.1186/s13062-018-0215-8 Microbiome; Bacterial 16S gene; Classifier; PCA; Network analysis; Machine learning
collection DOAJ
language English
format Article
sources DOAJ
author Alejandro R. Walker
Tyler L. Grimes
Somnath Datta
Susmita Datta
title Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles
publisher BMC
series Biology Direct
issn 1745-6150
publishDate 2018-05-01
description Abstract Background Microbial communities can be location specific, and the abundance of species within locations can influence our ability to determine whether a sample belongs to one city or another. As part of the 2017 CAMDA MetaSUB Inter-City Challenge, next generation sequencing (NGS) data were generated from swipe samples collected from subway stations in Boston, New York City (hereafter New York), and Sacramento. DNA was extracted and Illumina sequenced. Sequencing data were provided for all cities as part of the 2017 CAMDA contest challenge dataset. Results Principal component analysis (PCA) showed clear clustering of the samples for the three cities, with a substantial proportion of the variance explained by the first three components. We ran two different classifiers, and the results were robust, with error rates below 6% and accuracy above 95%. The analysis of variance (ANOVA) demonstrated that, overall, bacterial composition across the three cities is significantly different. A similar conclusion was reached using a novel bootstrap-based test using diversity indices. Finally, a co-abundance association network analysis for the taxonomic levels “order”, “family”, and “genus” found different patterns of bacterial networks for the three cities. Conclusions Bacterial fingerprints can be useful to predict sample provenance. In this work, provenance was predicted with over 95% accuracy. Association-based network analysis emphasized similarities between the closest cities, which share common bacterial composition. ANOVA showed different patterns of bacteria among cities, and these findings strongly suggest that bacterial signatures across multiple cities are different. This work advocates a data analysis pipeline that could be followed in order to get biological insight from these data. However, the biological conclusions from this analysis are just an early indication from pilot microbiome data provided to us through the CAMDA 2017 challenge and will be subject to change as we get more complete data sets in the near future. These microbiome data can have potential applications in forensics, ecology, and other sciences. Reviewers This article was reviewed by Klas Udekwu, Alexandra Graf, and Rafal Mostowy.
topic Microbiome
Bacterial 16S gene
Classifier
PCA
Network analysis
Machine learning
url http://link.springer.com/article/10.1186/s13062-018-0215-8