Inferring correlation networks from genomic survey data.

High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies b...

Bibliographic Details
Main Authors:	Jonathan Friedman, Eric J Alm
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2012-01-01
Series:	PLoS Computational Biology
Online Access:	http://europepmc.org/articles/PMC3447976?pdf=render

id	doaj-7d70acb6954e402bab73619f89c87fb3
record_format	Article
spelling	doaj-7d70acb6954e402bab73619f89c87fb3 2020-11-25T02:31:46Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2012-01-01 8 9 e1002687 10.1371/journal.pcbi.1002687 Inferring correlation networks from genomic survey data. Jonathan Friedman Eric J Alm High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity. http://europepmc.org/articles/PMC3447976?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Jonathan Friedman Eric J Alm
spellingShingle	Jonathan Friedman Eric J Alm Inferring correlation networks from genomic survey data. PLoS Computational Biology
author_facet	Jonathan Friedman Eric J Alm
author_sort	Jonathan Friedman
title	Inferring correlation networks from genomic survey data.
title_sort	inferring correlation networks from genomic survey data.
publisher	Public Library of Science (PLoS)
series	PLoS Computational Biology
issn	1553-734X 1553-7358
publishDate	2012-01-01
description	High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity.
url	http://europepmc.org/articles/PMC3447976?pdf=render

Similar Items