Journal Article

Exploring classification strategies with the CoEPrA 2006 contest

Ozgur Demir-Kavuk, Henning Riedesel and Ernst-Walter Knapp

in Bioinformatics

Volume 26, issue 5, pages 603-609
Published in print March 2010 | ISSN: 1367-4803
Published online January 2010 | e-ISSN: 1460-2059 | DOI:


Motivation: In silico methods to classify compounds as potential drugs that bind to a specific target are becoming increasingly important for drug design. To build such classifiers, training sets of drugs with known activities are needed. For many such classification problems, not only qualitative but also quantitative information on a specific property (e.g. binding affinity) is available. The latter can be used to build a regression scheme that predicts this property for new compounds. Predicting a compound property explicitly is generally more difficult than deciding whether the property lies below or above a given threshold value. Hence, an indirect classification based on regression may lead to poorer results than a direct classification scheme. In fact, researchers are initially interested only in classifying compounds as potential drugs; the activities of these compounds are subsequently measured in the wet lab.
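
To make the distinction between indirect and direct classification concrete, the following is a minimal sketch on invented toy data. It is not the method of the article: ordinary least squares with a single descriptor is used only as a stand-in, and the names `fit_line`, `indirect_classifier`, `direct_classifier`, as well as the descriptor values, activities and threshold, are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def indirect_classifier(xs, activities, threshold):
    """Indirect scheme: regress on the activity, then threshold the prediction."""
    a, b = fit_line(xs, activities)
    return lambda x: 1 if a * x + b >= threshold else -1


def direct_classifier(xs, activities, threshold):
    """Direct scheme: binarize the activities first, then fit the +1/-1 labels."""
    labels = [1 if y >= threshold else -1 for y in activities]
    a, b = fit_line(xs, labels)
    return lambda x: 1 if a * x + b >= 0 else -1


# Hypothetical toy data: one descriptor value and one measured activity per compound.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
activity = [4.1, 4.9, 6.2, 6.8, 8.1, 8.9]
threshold = 6.5  # assumed activity cut-off separating the two classes

indirect = indirect_classifier(descriptor, activity, threshold)
direct = direct_classifier(descriptor, activity, threshold)

for x in descriptor:
    print(x, indirect(x), direct(x))
```

On clean, nearly linear data such as this, both schemes assign the same classes. The direct scheme discards the quantitative activity values once the labels are formed, whereas the indirect scheme must first predict those values accurately before the threshold is applied.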

Results: We propose a novel approach that uses the available quantitative information directly for classification rather than first using a regression scheme. It uses a new type of loss function called weighted biased regression. Application of this method to four widely studied datasets of the CoEPrA contest (Comparative Evaluation of Prediction Algorithms) shows that it can outperform simple classification methods that do not make use of this additional quantitative information.

Availability: A standalone application is available at the webpage; it can be used to build a model for a submitted peptide training set.


Supplementary Information: Supplementary data are available at Bioinformatics online.

Journal Article.  6504 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
