Title: Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model
Authors: De Beuf, Kristof
De Schrijver, Joachim
Thas, Olivier
Van Criekinge, Wim
Irizarry, Rafael A.
Clement, Lieven
Issue Date: 2012
Abstract: Background: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields, such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative about whether an insertion or a deletion error is more likely. Surprisingly, little effort has been devoted to developing improved base-calling methods and more intuitive quality scores for this platform. Results: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed from the estimated probability of each homopolymer length, and these probabilities are easily transformed into useful quality scores. Conclusions: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative about possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than that of the current 454 base-caller. Given the generality of the framework, HPCall could also be adapted to other homopolymer-sensitive sequencing technologies.
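To make the abstract's terms concrete, here is a toy sketch of a hurdle Poisson distribution over homopolymer lengths and of how per-length probabilities can be turned into a quality score that also separates insertion from deletion risk. It is not HPCall: the zero-hurdle probability `pi0` and Poisson mean `lam` are fixed by hand here, whereas HPCall uses a weighted model whose parameters depend on 454 noise predictors. The helper names, the length cut-off `max_len`, and the use of the standard Phred transform are assumptions made only for this illustration.

```python
import math


def hurdle_poisson_pmf(k: int, pi0: float, lam: float) -> float:
    """P(Y = k) for a hurdle Poisson distribution (illustrative only).

    A binary 'hurdle' puts mass pi0 on k = 0; positive lengths follow a
    zero-truncated Poisson(lam), scaled by the remaining mass 1 - pi0.
    """
    if not (0.0 <= pi0 < 1.0) or lam <= 0.0:
        raise ValueError("need 0 <= pi0 < 1 and lam > 0")
    if k < 0:
        return 0.0
    if k == 0:
        return pi0
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return (1.0 - pi0) * poisson_k / (1.0 - math.exp(-lam))


def call_homopolymer(pi0: float, lam: float, max_len: int = 10):
    """Hypothetical helper: call the most probable length in 0..max_len.

    Returns (called_length, phred_quality, p_insertion, p_deletion).
    """
    probs = [hurdle_poisson_pmf(k, pi0, lam) for k in range(max_len + 1)]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalise over the truncated range

    called = max(range(len(probs)), key=lambda k: probs[k])
    p_error = 1.0 - probs[called]
    # Standard Phred transform, capped at Q = 100 to avoid log10(0).
    quality = -10.0 * math.log10(max(p_error, 1e-10))

    # True run shorter than the call: the read over-calls (insertion error).
    p_insertion = sum(probs[:called])
    # True run longer than the call: the read under-calls (deletion error).
    p_deletion = sum(probs[called + 1:])
    return called, quality, p_insertion, p_deletion


called, q, p_ins, p_del = call_homopolymer(pi0=0.05, lam=2.5)
print(f"call={called} Q={q:.1f} P(ins)={p_ins:.2f} P(del)={p_del:.2f}")
```

With these hand-picked parameters the most probable length is 2, and the probability mass above the call is larger than the mass below it, so an under-call (deletion) is the more likely error. A single Phred score cannot express that asymmetry, which is the kind of extra information the abstract says HPCall's length probabilities provide.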
Notes: [De Beuf, Kristof; De Schrijver, Joachim; Thas, Olivier; Van Criekinge, Wim] Univ Ghent, Dept Math Modelling Stat & Bioinformat, B-9000 Ghent, Belgium. [Thas, Olivier] Univ Wollongong, Sch Math & Appl Stat, Ctr Stat & Survey Methodol, Wollongong, NSW 2522, Australia. [Irizarry, Rafael A.] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD USA. [Clement, Lieven] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium. [Clement, Lieven] Katholieke Univ Leuven, Interuniv Inst Biostat & Stat Bioinformat, B-3000 Louvain, Belgium. [Clement, Lieven] Univ Hasselt, B-3000 Louvain, Belgium.
Keywords: biochemical research methods; biotechnology & applied microbiology; mathematical & computational biology
ISSN: 1471-2105
e-ISSN: 1471-2105
DOI: 10.1186/1471-2105-13-303
ISI #: 000312894900001
Rights: © 2012 De Beuf et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Category: A1
Type: Journal Contribution
Validations: ecoom 2014
Appears in Collections: Research publications

Files in This Item:
1471-2105-13-303.pdf (487.58 kB, Adobe PDF)