Journal Article

Improving position-specific predictions of protein functional sites using phylogenetic motifs

K. C. Dukka Bahadur and Dennis R. Livesay

in Bioinformatics

Volume 24, issue 20, pages 2308-2316
Published in print October 2008 | ISSN: 1367-4803
Published online August 2008 | e-ISSN: 1460-2059 | DOI: http://dx.doi.org/10.1093/bioinformatics/btn454

Motivation: Accurate computational prediction of protein functional sites is critical to maximizing the utility of recent high-throughput sequencing efforts. Among the available approaches, position-specific conservation scores remain among the most popular due to their accuracy and ease of computation. Unfortunately, high false positive rates remain a limiting factor. Using phylogenetic motifs (PMs), we have developed two combined (conservation + PMs) prediction schemes that significantly improve prediction accuracy.

Results: Our first approach, called position-specific MINER (psMINER), rank-orders alignment columns by conservation and then excludes from the prediction set any position that is not also identified as a PM. This approach improves prediction accuracy, in a statistically significant way, compared with the underlying conservation scores. Increased accuracy is a general result: improvement is observed across several different conservation scores that span a continuum of complexity. In addition, a hybrid MINER (hMINER) that quantitatively considers both scoring regimes provides further improvement. More importantly, hMINER provides critical insight into the relative importance of phylogeny versus alignment conservation. Both methods outperform other common prediction algorithms that also utilize phylogenetic concepts. Finally, we demonstrate that the presented results are critically sensitive to functional site definition, thus highlighting the need for more complete benchmarks within the prediction community.
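The psMINER filtering idea (rank columns by conservation, then keep only columns that are also phylogenetic motif positions) can be illustrated with a minimal sketch. This is not the authors' implementation. Several assumptions are made for illustration only: conservation is scored as 1 minus normalized Shannon entropy (one simple score among the many the paper considers), gaps are ignored, the set of PM column indices is supplied as input rather than computed, and the size cutoff (the hypothetical `top_k` parameter) is applied after the PM filter. The function names `column_conservation` and `ps_miner_like` are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column: str) -> float:
    """Hypothetical conservation score: 1 - Shannon entropy / log2(20).

    Gap characters ('-') are ignored; an all-gap column scores 0.0.
    A fully conserved column scores 1.0.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return 1.0 - entropy / math.log2(20)


def ps_miner_like(alignment, pm_positions, top_k):
    """Sketch of a psMINER-style filter (assumed details, not the paper's code).

    1. Score every alignment column with column_conservation.
    2. Rank columns from most to least conserved (ties keep column order,
       because Python's sort is stable).
    3. Drop columns whose index is not in pm_positions.
    4. Return the first top_k surviving column indices.
    """
    n_cols = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(n_cols)]
    ranked = sorted(
        range(n_cols),
        key=lambda i: column_conservation(columns[i]),
        reverse=True,
    )
    filtered = [i for i in ranked if i in pm_positions]
    return filtered[:top_k]


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "ACEG"]  # made-up sequences
    toy_pms = {1, 3}                          # made-up PM column indices
    print(ps_miner_like(toy_alignment, toy_pms, top_k=2))
```

In the toy run, columns 0 and 1 are fully conserved, but column 0 is discarded because it is not a PM position, so only PM-supported columns reach the prediction set. This mirrors the abstract's point that the PM filter is meant to prune conserved positions that would otherwise become false positives.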

Availability: Our benchmark datasets are available for download at http://www.cs.uncc.edu/~drlivesa/dataset.html.

Contact: drlivesa@uncc.edu

Supplementary information: Supplementary data is available at Bioinformatics online.

Journal Article.  7699 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology