Sequence count data are poorly fit by the negative binomial distribution.

Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a...

Bibliographic Details
Main Authors:	Stijn Hawinkel, J C W Rayner, Luc Bijnens, Olivier Thas
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2020-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0224909

id	doaj-4c2341a075444a0d937155230ec257ff
record_format	Article
spelling	doaj-4c2341a075444a0d937155230ec257ff | 2021-03-03T21:41:30Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2020-01-01 | 15 | 4 | e0224909 | 10.1371/journal.pone.0224909 | Sequence count data are poorly fit by the negative binomial distribution. | Stijn Hawinkel, J C W Rayner, Luc Bijnens, Olivier Thas | Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods. | https://doi.org/10.1371/journal.pone.0224909
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Stijn Hawinkel, J C W Rayner, Luc Bijnens, Olivier Thas
author_sort	Stijn Hawinkel
title	Sequence count data are poorly fit by the negative binomial distribution.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2020-01-01
description	Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods.
url	https://doi.org/10.1371/journal.pone.0224909

Similar Items