Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results

Background: 16S rRNA-gene sequencing is a valuable approach to characterize the taxonomic content of the whole bacterial population inhabiting a metabolic and spatial niche, providing an important opportunity to study bacteria and their role in many health and environmental mechanisms. The analysis...

Bibliographic Details
Main Authors: Baruzzo, G. (Author), Di Camillo, B. (Author), Patuzzi, I. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: View Fulltext in Publisher
LEADER 03654nam a2200577Ia 4500
001 10.1186-s12859-022-04587-0
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-022-04587-0 
520 3 |a Background: 16S rRNA-gene sequencing is a valuable approach to characterize the taxonomic content of the whole bacterial population inhabiting a metabolic and spatial niche, providing an important opportunity to study bacteria and their role in many health and environmental mechanisms. The analysis of data produced by amplicon sequencing, however, raises very specific methodological issues that need to be properly addressed to obtain reliable biological conclusions. Among these, 16S count data tend to be very sparse, with many null values reflecting species that are present but went unobserved due to the multiplexing constraints. However, current data workflows do not include a step in which the information about unobserved species is recovered. Results: In this work, we evaluate for the first time the effects of introducing a new preprocessing step, zero-imputation, into the 16S data workflow to recover this lost information. Due to the lack of published zero-imputation methods specifically designed for 16S count data, we considered a set of zero-imputation strategies available for other frameworks and benchmarked them using in silico 16S count data reflecting different experimental designs. Additionally, we assessed the effect of combining zero-imputation and normalization, i.e. the only preprocessing step in current 16S workflows. Overall, we benchmarked 35 16S preprocessing pipelines, assessing their ability to handle data sparsity, identify species presence/absence, recover sample proportional abundance distributions, and improve typical downstream analyses such as computation of alpha and beta diversity indices and differential abundance analysis. Conclusions: The results clearly show that 16S data analysis greatly benefits from a properly performed zero-imputation step, although the choice of the right zero-imputation method plays a pivotal role. In addition, we identify a set of best-performing pipelines that could be a valuable indication for data analysts. © 2022, The Author(s). 
650 0 4 |a 16s rDNA 
650 0 4 |a 16s rDNA-seq 
650 0 4 |a 16S rDNA-Seq 
650 0 4 |a Bacteria 
650 0 4 |a bacterium 
650 0 4 |a Benchmarking 
650 0 4 |a Count data 
650 0 4 |a Count datum 
650 0 4 |a Count preprocessing 
650 0 4 |a Count simulation 
650 0 4 |a data analysis 
650 0 4 |a Data Analysis 
650 0 4 |a DNA sequence 
650 0 4 |a Genes 
650 0 4 |a Genes, rRNA 
650 0 4 |a genetics 
650 0 4 |a high throughput sequencing 
650 0 4 |a High-Throughput Nucleotide Sequencing 
650 0 4 |a Information analysis 
650 0 4 |a Normalisation 
650 0 4 |a Normalization 
650 0 4 |a Pipelines 
650 0 4 |a Pre-processing step 
650 0 4 |a Recovery 
650 0 4 |a RNA 16S 
650 0 4 |a RNA gene 
650 0 4 |a RNA, Ribosomal, 16S 
650 0 4 |a Sequence Analysis, DNA 
650 0 4 |a Sparsity 
650 0 4 |a Work-flows 
650 0 4 |a Zero-imputation 
700 1 |a Baruzzo, G.  |e author 
700 1 |a Di Camillo, B.  |e author 
700 1 |a Patuzzi, I.  |e author 
773 |t BMC Bioinformatics