Fecal source identification using random forest
Main Authors: | Adélaïde Roguet, A. Murat Eren, Ryan J Newton, Sandra L McLellan |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2018-10-01 |
Series: | Microbiome |
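Subjects: | Microbial source tracking; 16S rRNA gene; High-throughput sequencing; Clostridiales; Bacteroidales; Random forest classification |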
Online Access: | http://link.springer.com/article/10.1186/s40168-018-0568-3 |
id |
doaj-44faede5b7b04bd4820e725d6548aa9f |
---|---|
collection |
DOAJ |
author |
Adélaïde Roguet (School of Freshwater Sciences, University of Wisconsin-Milwaukee), A. Murat Eren (Department of Medicine, University of Chicago), Ryan J Newton (School of Freshwater Sciences, University of Wisconsin-Milwaukee), Sandra L McLellan (School of Freshwater Sciences, University of Wisconsin-Milwaukee) |
issn |
2049-2618 |
description |
Abstract
Background: Clostridiales and Bacteroidales are uniquely adapted to the gut environment and have co-evolved with their hosts, resulting in convergent microbiome patterns within mammalian species. As a result, members of Clostridiales and Bacteroidales are particularly suitable for identifying sources of fecal contamination in environmental samples. However, a comprehensive evaluation of their predictive power and the development of computational approaches are lacking. Given the global public health concern for waterborne disease, accurate identification of fecal pollution sources is essential for effective risk assessment and management. Here, we use a random forest algorithm and 16S rRNA gene amplicon sequences assigned to Clostridiales and Bacteroidales to identify common fecal pollution sources. We benchmarked the accuracy, consistency, and sensitivity of our classification approach using fecal, environmental, and artificial (in silico-generated) samples.
Results: Clostridiales and Bacteroidales classifiers were composed mainly of sequences that displayed differential distributions (host-preferred) among sewage, cow, deer, pig, cat, and dog sources. Each classifier correctly identified human and individual animal sources in approximately 90% of the fecal and environmental samples tested. Misclassifications resulted mostly from false-positive identification of cat and dog fecal signatures in host animals not used to build the classifiers, suggesting that characterizing additional animals would improve accuracy. Random forest predictions were highly reproducible, reflecting the consistency of the bacterial signatures within each of the animal and sewage sources. Using in silico-generated samples, we could detect fecal bacterial signatures when the source dataset accounted for as little as ~0.5% of the assemblage, with ~0.04% of the sequences matching the classifiers. Finally, we developed a proxy to estimate proportions among sources, which allowed us to determine which sources contribute the most to observed fecal pollution.
Conclusion: Random forest classification with 16S rRNA gene amplicons offers a rapid, sensitive, and accurate solution for identifying host microbial signatures to detect human and animal fecal contamination in environmental samples. |
topic |
Microbial source tracking; 16S rRNA gene; High-throughput sequencing; Clostridiales; Bacteroidales; Random forest classification |
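The abstract describes classifying samples by the host-preferred Clostridiales and Bacteroidales sequences they contain, using a random forest. This record does not include the authors' code. What follows is only a minimal, standard-library sketch of the general technique under loudly stated assumptions:

- Each sample is summarized as the relative abundance (%) of three hypothetical groups of host-preferred sequences (sewage, cow, dog).
- The training profiles are invented toy values, not data from the article.
- The helper names (`fit_forest`, `predict_forest`), the number of trees, the tree depth, and the seed are illustrative choices.

Each tree is grown on a bootstrap sample and first tries a random subset of sqrt(n_features) features at each split. The forest then predicts by majority vote. The returned vote share is simply the fraction of trees that agree; it is not the proportion-estimation proxy mentioned in the abstract.

```python
import math
import random
from collections import Counter

# HYPOTHETICAL toy training profiles: relative abundance (%) of three groups of
# host-preferred sequences [sewage-preferred, cow-preferred, dog-preferred].
# These values are invented for illustration and are not data from the article.
TRAIN_X = [
    [5.0, 0.0, 0.0], [4.5, 0.0, 0.0], [5.5, 0.0, 0.0],
    [0.0, 6.0, 0.0], [0.0, 5.5, 0.0], [0.0, 6.5, 0.0],
    [0.0, 0.0, 4.0], [0.0, 0.0, 4.5], [0.0, 0.0, 3.8],
]
TRAIN_Y = ["sewage"] * 3 + ["cow"] * 3 + ["dog"] * 3


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) among `features` that most reduces Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between distinct values, so both sides are non-empty
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow one decision tree; leaves are class labels, internal nodes are (f, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    order = rng.sample(range(len(X[0])), len(X[0]))  # random feature order
    # Try a random subset of n_sub features first; search the rest only if it yields no split.
    split = best_split(X, y, order[:n_sub]) or best_split(X, y, order[n_sub:])
    if split is None:
        return majority
    f, t = split

    def grow(keep):
        rows = [(r, lab) for r, lab in zip(X, y) if keep(r[f])]
        return build_tree([r for r, _ in rows], [lab for _, lab in rows],
                          depth + 1, max_depth, n_sub, rng)

    return (f, t, grow(lambda v: v <= t), grow(lambda v: v > t))


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    """Bagging: every tree is trained on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    n_sub = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_sub, rng))
    return forest


def predict_tree(node, row):
    """Follow thresholds down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def predict_forest(forest, row):
    """Majority vote across trees, plus the fraction of trees voting for the winner."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)


forest = fit_forest(TRAIN_X, TRAIN_Y)
for profile in ([4.8, 0.0, 0.0], [0.0, 6.2, 0.0], [0.0, 0.0, 4.1]):
    label, share = predict_forest(forest, profile)
    print(profile, label, f"{share:.2f}")
```

With real amplicon data, the feature matrix would plausibly hold one column per classifier sequence rather than three summary columns. In practice, a maintained implementation such as scikit-learn's `RandomForestClassifier` would normally replace this hand-rolled version.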