Sequence count data are poorly fit by the negative binomial distribution.
Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a...
Main Authors: | Stijn Hawinkel, J C W Rayner, Luc Bijnens, Olivier Thas |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2020-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0224909 |
id | doaj-4c2341a075444a0d937155230ec257ff |
---|---|
record_format | Article |
spelling | doaj-4c2341a075444a0d937155230ec257ff 2021-03-03T21:41:30Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2020-01-01 15 4 e0224909 10.1371/journal.pone.0224909 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Stijn Hawinkel, J C W Rayner, Luc Bijnens, Olivier Thas |
title | Sequence count data are poorly fit by the negative binomial distribution. |
publisher | Public Library of Science (PLoS) |
series | PLoS ONE |
issn | 1932-6203 |
publishDate | 2020-01-01 |
description | Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods. |
url | https://doi.org/10.1371/journal.pone.0224909 |