Journal Article

RNA-Seq gene expression estimation with read mapping uncertainty

Bo Li, Victor Ruotti, Ron M. Stewart, James A. Thomson and Colin N. Dewey

in Bioinformatics

Volume 26, issue 4, pages 493-500
Published in print February 2010 | ISSN: 1367-4803
Published online December 2009 | e-ISSN: 1460-2059 | DOI: http://dx.doi.org/10.1093/bioinformatics/btp692

More Like This

Show all results sharing this subject:

  • Bioinformatics and Computational Biology

GO

Show Summary Details

Preview

Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.

Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.

Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem.

Contact: cdewey@biostat.wisc.edu

Supplementary information: Supplementary data are available at Bioinformatics on

Journal Article.  6071 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology

Users without a subscription are not able to see the full content. Please, subscribe or login to access all content.