Journal Article

EliIE: An open-source information extraction system for clinical trial eligibility criteria

Tian Kang, Shaodian Zhang, Youlan Tang, Gregory W Hruby, Alexander Rusanov, Noémie Elhadad and Chunhua Weng

in Journal of the American Medical Informatics Association

Published on behalf of American Medical Informatics Association

Volume 24, issue 6, pages 1062-1071
Published in print November 2017 | ISSN: 1067-5027
Published online April 2017 | e-ISSN: 1527-974X | DOI: https://dx.doi.org/10.1093/jamia/ocx019

Abstract

Objective

To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0.

Materials and Methods

EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer’s clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling–based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring.
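To make step (2) concrete, the following is a minimal sketch of a NegEx-style check: an entity counts as negated if it appears within a fixed token window after a negation trigger. It is not NegEx itself and not EliIE's implementation. The trigger list, the window size, and the names `NEGATION_TRIGGERS`, `WINDOW`, `tokenize`, and `is_negated` are all assumptions made for illustration.

```python
import re

# Hypothetical, heavily abbreviated trigger list (assumption for illustration);
# the real NegEx lexicon is much larger and distinguishes trigger types.
NEGATION_TRIGGERS = ("no", "without", "absence of", "free of", "denies")
WINDOW = 5  # assumed scope: number of tokens after a trigger


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_negated(criterion, entity):
    """Return True if `entity` occurs within WINDOW tokens after a trigger."""
    tokens = tokenize(criterion)
    target = tokenize(entity)
    if not target:
        return False
    for trigger in NEGATION_TRIGGERS:
        trigger_tokens = tokenize(trigger)
        n = len(trigger_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != trigger_tokens:
                continue
            # Tokens that fall inside the trigger's forward scope.
            scope = tokens[i + n:i + n + WINDOW]
            for j in range(len(scope) - len(target) + 1):
                if scope[j:j + len(target)] == target:
                    return True
    return False
```

For example, `is_negated("No history of stroke or seizure", "stroke")` is true under these assumptions, while `is_negated("Diagnosis of probable Alzheimer's disease", "Alzheimer's disease")` is false because no trigger precedes the entity. A fixed forward window is the simplest possible scoping rule; the predefined rules mentioned above would presumably handle cases it misses.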

Results

In task-specific evaluations, the best F1 score was 0.79 for entity recognition and 0.89 for relation extraction. The accuracy of negation detection was 0.94. In an end-to-end evaluation, the overall accuracy for query formalization was 0.71.
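For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from raw counts; the function name `f1_score` and the counts in the usage note are illustrative assumptions, not figures from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0  # convention used here when nothing is predicted or expected
    return 2 * true_pos / denominator
```

With made-up counts of 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, and so is the F1 score.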

Conclusions

This study presents EliIE, an OMOP CDM–based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, the machine learning–based EliIE outperforms existing systems and shows promise for further improvement.

Keywords: natural language processing; machine learning; clinical trials; patient selection; common data model; named entity recognition

Journal Article.  6848 words.  Illustrated.

Subjects: Medical Statistics and Methodology ; Bioinformatics and Computational Biology ; Biomathematics and Statistics
