Journal Article

A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval

S. Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily and Pierre Baldi

in Bioinformatics

Volume 26, issue 10, pages 1348-1356
Published in print May 2010 | ISSN: 1367-4803
Published online April 2010 | e-ISSN: 1460-2059 | DOI: http://dx.doi.org/10.1093/bioinformatics/btq140
A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval

More Like This

Show all results sharing this subject:

  • Bioinformatics and Computational Biology

GO

Show Summary Details

Preview

Motivation: The performance of classifiers is often assessed using Receiver Operating Characteristic ROC [or (AC) accumulation curve or enrichment curve] curves and the corresponding areas under the curves (AUCs). However, in many fundamental problems ranging from information retrieval to drug discovery, only the very top of the ranked list of predictions is of any interest and ROCs and AUCs are not very useful. New metrics, visualizations and optimization tools are needed to address this ‘early retrieval’ problem.

Results: To address the early retrieval problem, we develop the general concentrated ROC (CROC) framework. In this framework, any relevant portion of the ROC (or AC) curve is magnified smoothly by an appropriate continuous transformation of the coordinates with a corresponding magnification factor. Appropriate families of magnification functions confined to the unit square are derived and their properties are analyzed together with the resulting CROC curves. The area under the CROC curve (AUC[CROC]) can be used to assess early retrieval. The general framework is demonstrated on a drug discovery problem and used to discriminate more accurately the early retrieval performance of five different predictors. From this framework, we propose a novel metric and visualization—the CROC(exp), an exponential transform of the ROC curve—as an alternative to other methods. The CROC(exp) provides a principled, flexible and effective way for measuring and visualizing early retrieval performance with excellent statistical power. Corresponding methods for optimizing early retrieval are also described in the Appendix.

Availability: Datasets are publicly available. Python code and command-line utilities implementing CROC curves and metrics are available at http://pypi.python.org/pypi/CROC/

Contact: pfbaldi@ics.uci.edu

Journal Article.  7559 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology

Full text: subscription required

How to subscribe Recommend to my Librarian

Users without a subscription are not able to see the full content. Please, subscribe or login to access all content.