Exploring thematic structure and predicted functionality of 16S rRNA amplicon data.

Analysis of microbiome data involves identifying co-occurring groups of taxa associated with sample features of interest (e.g., disease state). Elucidating such relations is often difficult as microbiome data are compositional, sparse, and have high dimensionality. Also, the configuration of co-occu...

Bibliographic Details
Main Authors: Stephen Woloszynek, Joshua Chang Mell, Zhengqiao Zhao, Gideon Simpson, Michael P O'Connor, Gail L Rosen
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0219235
id doaj-ef999a91eeed40d1b53c3801e3acb399
record_format Article
spelling doaj-ef999a91eeed40d1b53c3801e3acb399
2021-03-03T21:16:09Z
eng
Public Library of Science (PLoS)
PLoS ONE
1932-6203
2019-01-01
14
12
e0219235
10.1371/journal.pone.0219235
Exploring thematic structure and predicted functionality of 16S rRNA amplicon data.
Stephen Woloszynek
Joshua Chang Mell
Zhengqiao Zhao
Gideon Simpson
Michael P O'Connor
Gail L Rosen
https://doi.org/10.1371/journal.pone.0219235
collection DOAJ
language English
format Article
sources DOAJ
author Stephen Woloszynek
Joshua Chang Mell
Zhengqiao Zhao
Gideon Simpson
Michael P O'Connor
Gail L Rosen
title Exploring thematic structure and predicted functionality of 16S rRNA amplicon data.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2019-01-01
description Analysis of microbiome data involves identifying co-occurring groups of taxa associated with sample features of interest (e.g., disease state). Elucidating such relations is often difficult as microbiome data are compositional, sparse, and have high dimensionality. Also, the configuration of co-occurring taxa may represent overlapping subcommunities that contribute to sample characteristics such as host status. Preserving the configuration of co-occurring microbes rather than detecting specific indicator species is more likely to facilitate biologically meaningful interpretations. Additionally, analyses that use taxonomic relative abundances to predict the abundances of different gene functions aggregate predicted functional profiles across taxa. This precludes straightforward identification of predicted functional components associated with subsets of co-occurring taxa. We provide an approach to explore co-occurring taxa using "topics" generated via a topic model and link these topics to specific sample features (e.g., disease state). Rather than inferring predicted functional content based on overall taxonomic relative abundances, we instead focus on inference of functional content within topics, which we parse by estimating interactions between topics and pathways through a multilevel, fully Bayesian regression model. We apply our methods to three publicly available 16S amplicon sequencing datasets: an inflammatory bowel disease dataset, an oral cancer dataset, and a time-series dataset. Using our topic model approach to uncover latent structure in 16S rRNA amplicon surveys, investigators can (1) capture groups of co-occurring taxa termed topics; (2) uncover within-topic functional potential; (3) link taxa co-occurrence, gene function, and environmental/host features; and (4) explore the way in which sets of co-occurring taxa behave and evolve over time. These methods have been implemented in a freely available R package: https://cran.r-project.org/package=themetagenomics, https://github.com/EESI/themetagenomics.
url https://doi.org/10.1371/journal.pone.0219235