Journal Article

Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages

William D. Lewis and Fei Xia

in Literary and Linguistic Computing

Published on behalf of ALLC: The European Association for Digital Humanities

Volume 25, issue 3, pages 303-319
Published in print September 2010 | ISSN: 0268-1145
Published online May 2010 | e-ISSN: 1477-4615 | DOI: https://dx.doi.org/10.1093/llc/fqq006

Preview

In this article, we review the process of building ODIN, the Online Database of Interlinear Text (http://odin.linguistlist.org), a multilingual repository of linguistically analyzed language data. ODIN is built from interlinear text that has been harvested from scholarly linguistic documents posted on the web. At the time of this writing, ODIN holds nearly 190,000 instances of interlinear text representing annotated language data for more than 1,000 languages (>10% of the world's languages). ODIN's charter has been to make these data available to linguists and other language researchers via search, providing the facility to find instances of language data and related resources (i.e. the documents from which data were extracted) by language name, language family, and even the annotations used to mark up the data (e.g. NOM, ACC, ERG, PST, 3SG). Further, we have sought to enrich the data we have collected and to extract ‘knowledge’ from the enriched content. To enrich the data, we use a variety of statistical tagging and parsing methods applied to the English translations. An enhanced search facility allows users to find data across languages for a variety of syntactic constructions and constituent orders, facilitating unprecedented automated and online discovery of language data.
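
The kind of search described above can be illustrated with a minimal sketch. It is not ODIN's implementation or schema: the `IGTInstance` record, its field names, the `gloss_tags` and `search` helpers, and the two-item toy corpus are all assumptions made for illustration, and the `document` values are placeholders rather than real sources. Each instance of interlinear text is assumed to have a source line, a gloss line, and an English translation, and grammatical annotations are assumed to be the all-uppercase pieces of the gloss line.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class IGTInstance:
    """Hypothetical record for one instance of interlinear text."""
    language: str
    family: str
    source: str       # language line
    gloss: str        # morpheme-by-morpheme gloss line
    translation: str  # free English translation
    document: str     # placeholder for the document the instance came from


def gloss_tags(gloss: str) -> Set[str]:
    """Return the grammatical annotations in a gloss line.

    Assumption: annotations are the all-uppercase pieces (e.g. NOM, 3SG)
    left after splitting on whitespace and the separators '-', '.', '='.
    """
    parts = re.split(r"[\s\-.=]+", gloss)
    return {p for p in parts
            if p and p == p.upper() and any(c.isalpha() for c in p)}


def search(instances: List[IGTInstance],
           language: Optional[str] = None,
           family: Optional[str] = None,
           tag: Optional[str] = None) -> List[IGTInstance]:
    """Filter instances by language name, language family, and/or annotation."""
    hits = []
    for inst in instances:
        if language is not None and inst.language != language:
            continue
        if family is not None and inst.family != family:
            continue
        if tag is not None and tag not in gloss_tags(inst.gloss):
            continue
        hits.append(inst)
    return hits


# Toy corpus for illustration only; not taken from ODIN.
CORPUS = [
    IGTInstance("German", "Indo-European",
                "Der Hund sieht den Mann",
                "the.NOM dog see.3SG the.ACC man",
                "The dog sees the man.",
                "doc-placeholder-1"),
    IGTInstance("Japanese", "Japonic",
                "Taroo-ga hon-o yon-da",
                "Taro-NOM book-ACC read-PST",
                "Taro read a book.",
                "doc-placeholder-2"),
]

if __name__ == "__main__":
    for inst in search(CORPUS, tag="PST"):
        print(inst.language, "|", inst.source, "|", inst.translation)
```

Keeping the gloss line as plain text and deriving the annotation set on demand keeps the record close to how interlinear text appears in a document; a larger collection would more plausibly index annotations ahead of time.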

Journal Article.  8271 words.  Illustrated.

Subjects: Language Teaching and Learning ; Computational Linguistics ; Bibliography ; Digital Lifestyle ; Information and Communication Technologies
