Journal Article

ARCS-Motif: discovering correlated motifs from unaligned biological sequences

Shijie Zhang, Wei Su and Jiong Yang

in Bioinformatics

Volume 25, issue 2, pages 183-189
Published in print January 2009 | ISSN: 1367-4803
Published online December 2008 | e-ISSN: 1460-2059 | DOI: http://dx.doi.org/10.1093/bioinformatics/btn609

Motivation: The goal of motif discovery is to detect novel, unknown and important signals in biological sequences. In most models, the importance of a motif equals the sum of the similarity scores of its individual positions. In 2006, Song et al. introduced the Aggregated Related Column Score (ARCS) measure, which incorporates correlation information into the evaluation of motif importance. That paper showed that the ARCS measure is superior to other measures. Because the ARCS motif model is complicated, existing sequential motif discovery methods cannot be applied directly to find motifs with high ARCS values.
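
As a rough illustration of the distinction drawn above, the following Python sketch contrasts a position-independent score with a pairwise correlation measure between columns. It is not the ARCS definition from Song et al.: the information-content score, the use of mutual information, the function names and the toy motif instances are all assumptions chosen only to show why a per-position sum can miss correlated columns.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def independent_score(instances, alphabet_size=20):
    """Sum of per-column information content; columns are scored separately.

    Assumption: a 20-letter amino acid alphabet and a uniform background.
    """
    columns = list(zip(*instances))
    return sum(log2(alphabet_size) - column_entropy(col) for col in columns)


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two columns of the same block."""
    joint = list(zip(col_a, col_b))
    return column_entropy(col_a) + column_entropy(col_b) - column_entropy(joint)


# Hypothetical aligned motif instances: columns 0 and 2 always change together.
instances = ["KAE", "KAE", "RAD", "RAD"]
columns = list(zip(*instances))

score = independent_score(instances)
mi_correlated = mutual_information(columns[0], columns[2])
mi_unrelated = mutual_information(columns[0], columns[1])
```

In this toy block, columns 0 and 2 each look only moderately conserved when scored alone, yet they share one full bit of mutual information, while columns 0 and 1 share none. A measure that aggregates information across related columns can reward this kind of dependency, which a plain per-position sum cannot.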

Results: This article presents a novel mining algorithm, ARCS-Motif, to discover related sequential motifs in biological sequences. ARCS-Motif is applied to 400 PROSITE datasets and compared with five alternative methods (CONSENSUS, Gibbs sampler, MEME, SPLASH and DIALIGN-TX). ARCS-Motif outperforms all of these methods in accuracy and most of them in efficiency. Although SPLASH is more efficient than ARCS-Motif, ARCS-Motif is much more accurate than SPLASH. On average, ARCS-Motif produces motifs that are at least 10% better than those of the best alternative method. Among the 400 PROSITE datasets, ARCS-Motif produces the best motifs for more than 200 families. Excluding SPLASH, the execution time of ARCS-Motif is less than a third of that of the fastest alternative method, and among all methods its execution time grows at the slowest rate with respect to the number of sequences and the average sequence length.

Availability: Software: http://beijing.case.edu/ARCS_Motif/ARCS_Motif; Results: http://beijing.case.edu/ARCS_Motif

Contact: jiong.yang@case.edu

Journal Article.  4955 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
