Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/16968
Title: | Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model |
Authors: | De Beuf, Kristof; De Schrijver, Joachim; THAS, Olivier; Van Criekinge, Wim; Irizarry, Rafael A.; CLEMENT, Lieven |
Issue Date: | 2012 |
Publisher: | BIOMED CENTRAL LTD |
Source: | BMC BIOINFORMATICS, 13 |
Abstract: | Background: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform. Results: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed to useful quality scores. Conclusions: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative towards possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than the current one. Given the generality of the framework, HPCall has the potential to also adapt to other homopolymer-sensitive sequencing technologies. |
Notes: | [De Beuf, Kristof; De Schrijver, Joachim; Thas, Olivier; Van Criekinge, Wim] Univ Ghent, Dept Math Modelling Stat & Bioinformat, B-9000 Ghent, Belgium. [Thas, Olivier] Univ Wollongong, Sch Math & Appl Stat, Ctr Stat & Survey Methodol, Wollongong, NSW 2522, Australia. [Irizarry, Rafael A.] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD USA. [Clement, Lieven] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium. [Clement, Lieven] Katholieke Univ Leuven, Interuniv Inst Biostat & Stat Bioinformat, B-3000 Louvain, Belgium. [Clement, Lieven] Univ Hasselt, B-3000 Louvain, Belgium. |
Keywords: | biochemical research Methods; biotechnology & applied microbiology; mathematical & computational biology |
Document URI: | http://hdl.handle.net/1942/16968 |
ISSN: | 1471-2105 |
e-ISSN: | 1471-2105 |
DOI: | 10.1186/1471-2105-13-303 |
ISI #: | 000312894900001 |
Rights: | © 2012 De Beuf et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Category: | A1 |
Type: | Journal Contribution |
Validations: | ecoom 2014 |
Appears in Collections: | Research publications |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
1471-2105-13-303.pdf | | 487.58 kB | Adobe PDF | View/Open |
Scopus citations: | 12 (checked on Sep 2, 2020) |
Web of Science citations: | 15 (checked on Oct 12, 2024) |
Page view(s): | 88 (checked on Apr 26, 2023) |
Download(s): | 130 (checked on Apr 26, 2023) |