Text Data Mining

Marti A. Hearst

in The Oxford Handbook of Computational Linguistics

Published in print January 2005 | ISBN: 9780199276349
Published online September 2012

Series: Oxford Handbooks in Linguistics

Text expresses a vast range of information, but encodes this information in a form that is difficult to decipher automatically. This article defines data mining, information retrieval, and corpus-based computational linguistics, and then discusses how each relates to text data mining. The results of certain types of text processing can yield tools that indirectly aid the information retrieval process. Reading textbooks, journal articles, and other documents is an integral part of the research process and already helps researchers discover new information. This article presents an idea for using text for discovery in a more direct manner, illustrated by two examples: using text to form hypotheses about disease and using text to uncover social impact. It also describes the Linking Information for Novel Discovery and Insight (LINDI) project, which investigates how researchers can use large text collections in the discovery of new information.

Keywords: data mining; information retrieval; computational linguistics; text data mining; LINDI project; text processing

Article. 4670 words.

Subjects: Linguistics ; Computational Linguistics
