Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics

Abstract. Background: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper...


Bibliographic Details
Main Authors: Kwangbom Choi, Yang Chen, Daniel A. Skelly, Gary A. Churchill
Format: Article
Language: English
Published: BMC 2020-07-01
Series: Genome Biology
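Subjects: Single-cell RNA sequencing; Zero inflation; Bayesian model selection; Cell heterogeneity; Gene expression stochasticity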
Online Access: http://link.springer.com/article/10.1186/s13059-020-02103-2
id doaj-9044e5bb4d9a401a8daec606a6ed647d
record_format Article
spelling doaj-9044e5bb4d9a401a8daec606a6ed647d | 2020-11-25T03:31:45Z | eng | BMC | Genome Biology | 1474-760X | 2020-07-01 | 211116 | 10.1186/s13059-020-02103-2 | Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics | Kwangbom Choi 0; Yang Chen 1; Daniel A. Skelly 2; Gary A. Churchill 3 | The Jackson Laboratory; University of Michigan; The Jackson Laboratory; The Jackson Laboratory | Abstract. Background: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-seq data. Results: We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. Conclusions: Despite the existence of zero inflation in scRNA-seq counts, we recommend the generalized linear model with negative binomial count distribution, not zero-inflated, as a suitable reference model for scRNA-seq analysis. | http://link.springer.com/article/10.1186/s13059-020-02103-2 | Single-cell RNA sequencing; Zero inflation; Bayesian model selection; Cell heterogeneity; Gene expression stochasticity
collection DOAJ
language English
format Article
sources DOAJ
author Kwangbom Choi
Yang Chen
Daniel A. Skelly
Gary A. Churchill
title Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2020-07-01
description Abstract. Background: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-seq data. Results: We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. Conclusions: Despite the existence of zero inflation in scRNA-seq counts, we recommend the generalized linear model with negative binomial count distribution, not zero-inflated, as a suitable reference model for scRNA-seq analysis.
topic Single-cell RNA sequencing
Zero inflation
Bayesian model selection
Cell heterogeneity
Gene expression stochasticity
url http://link.springer.com/article/10.1186/s13059-020-02103-2