Compositional zero-inflated network estimation for microbiome data

Abstract. Background: The estimation of microbial networks can provide important insight into the ecological relationships among the organisms that comprise the microbiome. However, there are a number of critical statistical challenges in the inference of such networks from high-throughput data. Since...

Bibliographic Details
Main Authors: Min Jin Ha, Junghi Kim, Jessica Galloway-Peña, Kim-Anh Do, Christine B. Peterson
Format: Article
Language: English
Published: BMC 2020-12-01
Series: BMC Bioinformatics
Subjects: Microbiome; Network; Graphical model; Zero-inflation; Compositional data
Online Access: https://doi.org/10.1186/s12859-020-03911-w
id doaj-5507f1a05b324445a40d9d62484be09d
record_format Article
spelling doaj-5507f1a05b324445a40d9d62484be09d; 2021-01-03T12:21:21Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2020-12-01; 21; S21; 1; 20; 10.1186/s12859-020-03911-w
Compositional zero-inflated network estimation for microbiome data
Min Jin Ha [0]; Junghi Kim [1]; Jessica Galloway-Peña [2]; Kim-Anh Do [3]; Christine B. Peterson [4]
[0] Department of Biostatistics, University of Texas MD Anderson Cancer Center
[1] Center for Devices and Radiological Health, U.S. Food and Drug Administration
[2] Department of Veterinary Pathobiology, Texas A&M University
[3] Department of Biostatistics, University of Texas MD Anderson Cancer Center
[4] Department of Biostatistics, University of Texas MD Anderson Cancer Center
Abstract
Background: The estimation of microbial networks can provide important insight into the ecological relationships among the organisms that comprise the microbiome. However, there are a number of critical statistical challenges in the inference of such networks from high-throughput data. Since the abundances in each sample are constrained to have a fixed sum and there is incomplete overlap in microbial populations across subjects, the data are both compositional and zero-inflated.
Results: We propose the COmpositional Zero-Inflated Network Estimation (COZINE) method for inference of microbial networks which addresses these critical aspects of the data while maintaining computational scalability. COZINE relies on the multivariate Hurdle model to infer a sparse set of conditional dependencies which reflect not only relationships among the continuous values, but also among binary indicators of presence or absence and between the binary and continuous representations of the data. Our simulation results show that the proposed method is better able to capture various types of microbial relationships than existing approaches. We demonstrate the utility of the method with an application to understanding the oral microbiome network in a cohort of leukemic patients.
Conclusions: Our proposed method addresses important challenges in microbiome network estimation, and can be effectively applied to discover various types of dependence relationships in microbial communities. The procedure we have developed, which we refer to as COZINE, is available online at https://github.com/MinJinHa/COZINE.
https://doi.org/10.1186/s12859-020-03911-w
Microbiome; Network; Graphical model; Zero-inflation; Compositional data
collection DOAJ
language English
format Article
sources DOAJ
author Min Jin Ha
Junghi Kim
Jessica Galloway-Peña
Kim-Anh Do
Christine B. Peterson
title Compositional zero-inflated network estimation for microbiome data
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-12-01
description Abstract. Background: The estimation of microbial networks can provide important insight into the ecological relationships among the organisms that comprise the microbiome. However, there are a number of critical statistical challenges in the inference of such networks from high-throughput data. Since the abundances in each sample are constrained to have a fixed sum and there is incomplete overlap in microbial populations across subjects, the data are both compositional and zero-inflated. Results: We propose the COmpositional Zero-Inflated Network Estimation (COZINE) method for inference of microbial networks which addresses these critical aspects of the data while maintaining computational scalability. COZINE relies on the multivariate Hurdle model to infer a sparse set of conditional dependencies which reflect not only relationships among the continuous values, but also among binary indicators of presence or absence and between the binary and continuous representations of the data. Our simulation results show that the proposed method is better able to capture various types of microbial relationships than existing approaches. We demonstrate the utility of the method with an application to understanding the oral microbiome network in a cohort of leukemic patients. Conclusions: Our proposed method addresses important challenges in microbiome network estimation, and can be effectively applied to discover various types of dependence relationships in microbial communities. The procedure we have developed, which we refer to as COZINE, is available online at https://github.com/MinJinHa/COZINE.
topic Microbiome
Network
Graphical model
Zero-inflation
Compositional data
url https://doi.org/10.1186/s12859-020-03911-w