Journal Article

OnlineCall: fast online parameter estimation and base calling for illumina's next-generation sequencing

Shreepriya Das and Haris Vikalo

in Bioinformatics

Volume 28, issue 13, pages 1677-1683
Published in print July 2012 | ISSN: 1367-4803
Published online May 2012 | e-ISSN: 1460-2059 | DOI: http://dx.doi.org/10.1093/bioinformatics/bts256
OnlineCall: fast online parameter estimation and base calling for illumina's next-generation sequencing

More Like This

Show all results sharing this subject:

  • Bioinformatics and Computational Biology

GO

Show Summary Details

Preview

Motivation: Next-generation DNA sequencing platforms are becoming increasingly cost-effective and capable of providing enormous number of reads in a relatively short time. However, their accuracy and read lengths are still lagging behind those of conventional Sanger sequencing method. Performance of next-generation sequencing platforms is fundamentally limited by various imperfections in the sequencing-by-synthesis and signal acquisition processes. This drives the search for accurate, scalable and computationally tractable base calling algorithms capable of accounting for such imperfections.

Results: Relying on a statistical model of the sequencing-by-synthesis process and signal acquisition procedure, we develop a computationally efficient base calling method for Illumina's sequencing technology (specifically, Genome Analyzer II platform). Parameters of the model are estimated via a fast unsupervised online learning scheme, which uses the generalized expectation–maximization algorithm and requires only 3 s of running time per tile (on an Intel i7 machine @3.07GHz, single core)—a three orders of magnitude speed-up over existing parametric model-based methods. To minimize the latency between the end of the sequencing run and the generation of the base calling reports, we develop a fast online scalable decoding algorithm, which requires only 9 s/tile and achieves significantly lower error rates than the Illumina's base calling software. Moreover, it is demonstrated that the proposed online parameter estimation scheme efficiently computes tile-dependent parameters, which can thereafter be provided to the base calling algorithm, resulting in significant improvements over previously developed base calling methods for the considered platform in terms of performance, time/complexity and latency.

Availability: A C code implementation of our algorithm can be downloaded from http://www.cerc.utexas.edu/OnlineCall/

Contact: hvikalo@ece.utexas.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article.  5030 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology

Full text: subscription required

How to subscribe Recommend to my Librarian

Users without a subscription are not able to see the full content. Please, subscribe or login to access all content.