Journal Article

An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier

Jiaqi Xia, Zhenling Peng, Dawei Qi, Hongbo Mu and Jianyi Yang

in Bioinformatics

Volume 33, issue 6, pages 863-870
Published in print March 2017 | ISSN: 1367-4803
Published online December 2016 | e-ISSN: 1460-2059 | DOI: https://dx.doi.org/10.1093/bioinformatics/btw768

Abstract

Motivation: Protein fold classification is a critical step in protein structure prediction. There are two ways to classify protein folds: template-based fold assignment and ab-initio prediction using machine learning algorithms. Combining the two to improve prediction accuracy has not been explored before.

Results: We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm that uses the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm in which a comprehensive set of features is extracted from three complementary sequence profiles. The two algorithms are then combined, resulting in the ensemble approach TA-fold. We assessed the proposed methods comprehensively by comparing them with ab-initio methods and template-based threading methods on six benchmark datasets. TA-fold achieved an accuracy of 0.799 on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4–11.7% over ab-initio methods. After this dataset was updated to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. We attribute the success of TA-fold to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolutionary information.
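
The abstract does not state how the two predictors are combined, so the following is only a minimal sketch of one plausible ensemble rule, not the published TA-fold procedure. It assumes that the template-based step returns a best hit with a confidence value in [0, 1], and that the SVM prediction is used as a fallback when that confidence is low. The names `TemplateHit` and `ensemble_fold`, and the `threshold` value of 0.5, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateHit:
    """Best template match for a query (hypothetical container)."""
    fold: str           # fold label of the matched template
    probability: float  # hit confidence in [0, 1]; the scale is an assumption


def ensemble_fold(template_hit: Optional[TemplateHit],
                  svm_fold: str,
                  threshold: float = 0.5) -> str:
    """Return the template's fold if the hit is confident, else the SVM's fold.

    The threshold and the fallback rule are illustrative assumptions,
    not the combination rule reported for TA-fold.
    """
    if template_hit is not None and template_hit.probability >= threshold:
        return template_hit.fold
    return svm_fold


if __name__ == "__main__":
    # Confident template hit: the template-based assignment is kept.
    print(ensemble_fold(TemplateHit("fold_A", 0.9), svm_fold="fold_B"))
    # Weak or missing hit: the ab-initio (SVM) prediction is used instead.
    print(ensemble_fold(TemplateHit("fold_A", 0.2), svm_fold="fold_B"))
    print(ensemble_fold(None, svm_fold="fold_B"))
```

A rule of this shape leans on the template-based assignment whenever a related template is detected and relies on the ab-initio classifier otherwise; the actual combination used by TA-fold is described in the full text.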

Availability and Implementation: http://yanglab.nankai.edu.cn/TA-fold/

Contact: yangjy@nankai.edu.cn or mhb-506@163.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article.  6493 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
