Journal Article

Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment

Benedict Paten, Javier Herrero, Kathryn Beal and Ewan Birney

in Bioinformatics

Volume 25, issue 3, pages 295-301
Published in print February 2009 | ISSN: 1367-4803
Published online December 2008 | e-ISSN: 1460-2059 | DOI: http://dx.doi.org/10.1093/bioinformatics/btn630


Motivation: Multiple sequence alignment is a cornerstone of comparative genomics. Much work has been done to improve methods for this task, particularly for the alignment of small sequences, and especially for amino acid sequences. However, less work has been done to make methods that are promising at small scale practical for the alignment of much larger genomic sequences.

Results: We take the method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In so doing we develop a set of new technical methods, combined in a framework we term ‘sequence progressive alignment’, because it allows us to iteratively compute an alignment by passing over the input sequences from left to right. The result is that we massively decrease the memory consumption of the program relative to a naive implementation. The engineering challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We further show the strong performance of Pecan using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs and has been, and is being, used by a number of whole-genome comparative genomic projects.

Availability: The Pecan program is freely available at http://www.ebi.ac.uk/~bjp/pecan/. Pecan whole genome alignments can be found in the Ensembl genome browser.

Contact: benedict@soe.ucsc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article.  6053 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
