ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries

Bibliographic Details
Main Authors: Do, K.-A. (Author), Jenq, R.R. (Author), Peterson, C.B. (Author), Shi, Y. (Author), Zhang, L. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: https://doi.org/10.1186/s12859-021-04061-3
LEADER 03972nam a2200541Ia 4500
001 10.1186-s12859-021-04061-3
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04061-3 
520 3 |a Background: Identification of features is a critical task in microbiome studies that is complicated by the fact that microbial data are high dimensional and heterogeneous. Masked by the complexity of the data, the problem of separating signals (differential features between groups) from noise (features that are not differential between groups) becomes challenging. For instance, when performing differential abundance tests, multiple testing adjustments tend to be overconservative, as the probability of a type I error (false positive) increases dramatically with the large number of hypotheses. Moreover, the grouping effect of interest can be obscured by heterogeneity. These factors can incorrectly lead to the conclusion that there are no differences in the microbiome compositions. 
Results: We translate and represent the problem of identifying differential features in two-group comparisons (e.g., treatment versus control) as a dynamic layout of separating the signal from its random background. More specifically, we progressively permute the grouping factor labels of the microbiome samples and perform multiple differential abundance tests in each scenario. We then compare the signal strength of the most differential features from the original data with their performance in permutations; a visually apparent decreasing trend appears if these features are true positives identified from the data. Simulations and applications on real data show that the proposed method creates a U-curve when plotting the number of significant features versus the proportion of mixing. The shape of the U-curve can convey the strength of the overall association between the microbiome and the grouping factor. We also define a fragility index to measure the robustness of the discoveries. Finally, we recommend features by comparing p-values in the observed data with p-values in the fully mixed data. 
Conclusions: We have implemented this method as a user-friendly and efficient R Shiny tool with visualizations. By default, we use the Wilcoxon rank sum test to compute the p-values, since it is a robust nonparametric test. Our proposed method can also utilize p-values obtained from other testing methods, such as DESeq. This demonstrates the potential of the progressive permutation method to be extended to new settings. © 2021, The Author(s). 
650 0 4 |a article 
650 0 4 |a controlled study 
650 0 4 |a Data visualization 
650 0 4 |a Differential test 
650 0 4 |a Dynamic representation 
650 0 4 |a feature selection 
650 0 4 |a Feature selection 
650 0 4 |a Fragility index 
650 0 4 |a High-dimensional 
650 0 4 |a microbiome 
650 0 4 |a Microbiome 
650 0 4 |a Microbiota 
650 0 4 |a microflora 
650 0 4 |a Multiple testing 
650 0 4 |a nonhuman 
650 0 4 |a nonparametric test 
650 0 4 |a Non-parametric test 
650 0 4 |a Permutation 
650 0 4 |a probability 
650 0 4 |a Probability 
650 0 4 |a rank sum test 
650 0 4 |a Robustness 
650 0 4 |a Signal strengths 
650 0 4 |a simulation 
650 0 4 |a Statistics, Nonparametric 
650 0 4 |a Testing 
650 0 4 |a Testing method 
650 0 4 |a Wilcoxon rank sum test 
700 1 |a Do, K.-A.  |e author 
700 1 |a Jenq, R.R.  |e author 
700 1 |a Peterson, C.B.  |e author 
700 1 |a Shi, Y.  |e author 
700 1 |a Zhang, L.  |e author 
773 |t BMC Bioinformatics
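
Illustrative sketch (not part of the catalog record): the progressive permutation idea summarized in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' R Shiny implementation. The function names (`rank_sum_pvalue`, `swap_labels`, `count_significant`, `progressive_permutation`), the label-swapping scheme that keeps group sizes fixed, the unadjusted 0.05 threshold, and the toy data are all hypothetical choices made for this example. The rank sum p-value uses a normal approximation with midranks for ties and no continuity correction.

```python
import math
import random


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, midranks for ties)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


def swap_labels(labels, proportion, rng):
    """Assumed mixing scheme: swap the 0/1 labels of a proportion of samples per group."""
    idx0 = [i for i, g in enumerate(labels) if g == 0]
    idx1 = [i for i, g in enumerate(labels) if g == 1]
    k = round(proportion * min(len(idx0), len(idx1)))
    mixed = list(labels)
    for i in rng.sample(idx0, k):
        mixed[i] = 1
    for i in rng.sample(idx1, k):
        mixed[i] = 0
    return mixed


def count_significant(features, labels, alpha=0.05):
    """Count features with an unadjusted rank sum p-value below alpha."""
    count = 0
    for values in features:
        x = [v for v, g in zip(values, labels) if g == 0]
        y = [v for v, g in zip(values, labels) if g == 1]
        if rank_sum_pvalue(x, y) < alpha:
            count += 1
    return count


def progressive_permutation(features, labels, proportions, n_perm=20, seed=0):
    """Mean number of significant features at each mixing proportion."""
    rng = random.Random(seed)
    curve = []
    for q in proportions:
        counts = [count_significant(features, swap_labels(labels, q, rng))
                  for _ in range(n_perm)]
        curve.append(sum(counts) / n_perm)
    return curve


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [0] * 15 + [1] * 15
    # Toy data: 5 shifted "signal" features followed by 20 pure-noise features.
    features = [[rng.gauss(2.0 * g, 1.0) for g in labels] for _ in range(5)]
    features += [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(20)]
    proportions = [0.0, 0.25, 0.5, 0.75, 1.0]
    for q, c in zip(proportions, progressive_permutation(features, labels, proportions)):
        print(f"mixing proportion {q:.2f}: mean significant features {c:.1f}")
```

Under this swapping scheme with equal group sizes, a proportion of 1.0 exchanges the two group labels completely, so the two-sided test returns the same p-values as the original data; a proportion of 0.5 corresponds to fully mixed labels. That symmetry is one way to read the U-shaped curve the abstract describes.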