Journal Article

Improving subcellular localization prediction using text classification and the gene ontology

Alona Fyshe, Yifeng Liu, Duane Szafron, Russ Greiner and Paul Lu

in Bioinformatics

Volume 24, issue 21, pages 2512-2517
Published in print November 2008 | ISSN: 1367-4803
Published online August 2008 | e-ISSN: 1460-2059 | DOI: http://dx.doi.org/10.1093/bioinformatics/btn463

Motivation: Each protein performs its functions at specific locations within a cell. This subcellular location is important for understanding protein function and for facilitating its purification. There are now many computational techniques for predicting location based on sequence analysis and database information from homologs. A few recent techniques use text from biological abstracts: our goal is to improve the prediction accuracy of such text-based techniques. We identify three techniques for improving text-based prediction: a rule for ambiguous abstract removal, a mechanism for using synonyms from the Gene Ontology (GO) and a mechanism for using the GO hierarchy to generalize terms. We show that these three techniques can significantly improve the accuracy of protein subcellular location predictors that use text extracted from PubMed abstracts whose references are recorded in Swiss-Prot.
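The abstract names the GO synonym and GO hierarchy mechanisms without describing how they operate. As a rough illustration of the general idea only, the Python sketch below maps surface phrases to a canonical GO term name through a synonym table and then adds every ancestor of that term, so that a specific location term also contributes more general features. The toy hierarchy, the synonym table, and the names `PARENTS`, `SYNONYMS`, `ancestors` and `expand_features` are hypothetical and heavily simplified; the input is assumed to be a list of already-extracted phrases, and this is not the authors' implementation.

```python
# Hypothetical, simplified fragment of a GO-like cellular component hierarchy:
# child term -> list of parent terms (relationship types collapsed together).
PARENTS = {
    "nucleolus": ["nucleus"],
    "nucleus": ["intracellular organelle"],
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrion": ["intracellular organelle"],
    "intracellular organelle": [],
}

# Hypothetical synonym table: surface phrase -> canonical term name.
SYNONYMS = {
    "cell nucleus": "nucleus",
    "mitochondria": "mitochondrion",
}


def ancestors(term):
    """Return every ancestor of `term` in PARENTS, excluding the term itself."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, []))
    return seen


def expand_features(phrases):
    """Replace synonyms with canonical terms, then add all ancestor terms."""
    features = set()
    for phrase in phrases:
        canonical = SYNONYMS.get(phrase, phrase)
        features.add(canonical)
        if canonical in PARENTS:
            features |= ancestors(canonical)
    return features


if __name__ == "__main__":
    # "cell nucleus" becomes "nucleus" and gains its ancestor as a feature;
    # "protein" is not a hierarchy term, so it passes through unchanged.
    print(sorted(expand_features(["cell nucleus", "protein"])))
```

Generalizing upward in this way lets abstracts that mention different but related specific terms share common, more general features; how the published method weights or selects such terms is not stated in the abstract.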

Contact: duane@cs.ualberta.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article. 5662 words. Illustrated.

Subjects: Bioinformatics and Computational Biology