Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/31792
Title: Sequence count data are poorly fit by the negative binomial distribution
Authors: Hawinkel, Stijn
Rayner, J. C. W.
BIJNENS, Luc 
THAS, Olivier 
Issue Date: 2020
Publisher: PUBLIC LIBRARY SCIENCE
Source: PLOS ONE, 15 (4) (Art N° e0224909)
Abstract: Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that non-parametric tests should be preferred over parametric methods.
Notes: Hawinkel, S (corresponding author), Univ Ghent, Dept Data Anal & Math Modelling, Ghent, Belgium.
stijn.hawinkel@ugent.be
Keywords: Goodness-of-fit; RNA-Seq data; Models
Document URI: http://hdl.handle.net/1942/31792
ISSN: 1932-6203
e-ISSN: 1932-6203
DOI: 10.1371/journal.pone.0224909
ISI #: WOS:000536673200005
Rights: © 2020 Hawinkel et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Category: A1
Type: Journal Contribution
Validations: ecoom 2021
Appears in Collections: Research publications

Files in This Item:
File: Hawinkel_Stijn_2020.pdf; Description: Published version; Size: 1.06 MB; Format: Adobe PDF

Web of Science citations: 24 (checked on Apr 22, 2024)
Page view(s): 64 (checked on Jul 15, 2022)
Download(s): 16 (checked on Jul 15, 2022)

