Journal Article

Evaluating the state of the art in disorder recognition and normalization of the clinical narrative

Sameer Pradhan, Noémie Elhadad, Brett R South, David Martinez, Lee Christensen, Amy Vogel, Hanna Suominen, Wendy W Chapman and Guergana Savova

in Journal of the American Medical Informatics Association

Volume 22, issue 1, pages 143-154
Published in print January 2015 | ISSN: 1067-5027
Published online August 2014 | e-ISSN: 1527-974X | DOI:


Objective The ShARe/CLEF eHealth 2013 Evaluation Lab Task 1 was organized to evaluate the state of the art on clinical text in (i) disorder mention identification/recognition based on the Unified Medical Language System (UMLS) definition (Task 1a) and (ii) disorder mention normalization to an ontology (Task 1b). Such a community evaluation has not been previously executed. Task 1a included a total of 22 system submissions, and Task 1b included 17. Most of the systems employed a combination of rules and machine learners.

Materials and methods We used a subset of the Shared Annotated Resources (ShARe) corpus of annotated clinical text—199 clinical notes for training and 99 for testing (roughly 180 K words in total). We provided the community with the annotated gold standard training documents to build systems to identify and normalize disorder mentions. The systems were tested on a held-out gold standard test set to measure their performance.

Results For Task 1a, the best-performing system achieved an F1 score of 0.75 (0.80 precision; 0.71 recall). For Task 1b, another system performed best with an accuracy of 0.59.
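
As a quick consistency check on the Task 1a figures, the sketch below recomputes F1 from the reported precision and recall. It assumes the standard balanced F1 (the harmonic mean of precision and recall); the abstract does not say whether the reported score comes from strict or relaxed span matching, and the function name `f1_score` is illustrative only, not taken from the evaluation tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the best-performing Task 1a system.
best_f1 = f1_score(0.80, 0.71)
print(round(best_f1, 2))  # 0.75
```

The rounded result matches the reported F1 of 0.75, so the three Task 1a figures are mutually consistent under this definition.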

Discussion Most of the participating systems used a hybrid approach by supplementing machine-learning algorithms with features generated by rules and gazetteers created from the training data and from external resources.

Conclusions The task of disorder normalization is more challenging than that of identification. The ShARe corpus is available to the community as a reference standard for future studies.

Keywords: Natural Language Processing; Disorder Identification; Named Entity Recognition; Information Extraction; Word Sense Disambiguation; Clinical Notes

Journal Article.  6412 words.  Illustrated.
