Corpora, Databases, and Internet Resources

Jennifer Cole, Mark Hasegawa-Johnson, Dan Loehr, Linda Van Guilder, Henning Reetz and Stefan A. Frisch

in The Oxford Handbook of Laboratory Phonology

Published in print December 2011 | ISBN: 9780199575039
Published online September 2012 | DOI:

Series: Oxford Handbooks in Linguistics

This article introduces a wide range of approaches to using large bodies of data for linguistic research. Corpus analysis for phonological research investigates the phonetic, phonological, and lexical properties of speech in order to understand patterns of variation in the phonetic expression of words and the distributional patterns of sound elements in relation to their linguistic context. A speech corpus provides a basis for investigating variability in phonetic form, and it is also a rich resource for studying the relationship between phonological form and other levels of linguistic structure. Linguistic metadata provides information about the speakers, such as sex, age, ethnicity, and region of residence, and may also describe speaker recruitment and recording procedures. Forced alignment uses algorithms from automatic speech recognition (ASR) and is most successful when every phone in the dictionary form of a word is actually fully pronounced. One of the easiest methods of manipulating natural speech is splicing, in which parts of a speech signal are cut out, repeated, or cross-spliced with another piece of signal. The gating technique, another manipulation of natural speech often used in psycholinguistic experiments, cuts off part of a speech signal and then presents incrementally more of it to a listener. A further manipulation is the mixing of two signals.
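As an illustration only, and not the chapter's own procedure, the three signal manipulations named above (gating, cross-splicing, and mixing) can be sketched in a few lines of Python. The sketch assumes mono audio held as a plain list of float samples, and uses synthetic sine tones as stand-ins for recorded words. The helper names (`tone`, `gate`, `nearest_zero_crossing`, `cross_splice`, `rms`, `mix_at_snr`), the 50 ms gate step, and mixing at a target signal-to-noise ratio are hypothetical choices made for this example. Moving splice points to zero crossings is a common way to avoid audible clicks, included here as an assumption about good practice.

```python
import math
import random
from typing import List

Signal = List[float]  # mono samples, assumed to lie in [-1.0, 1.0]


def tone(freq_hz: float, dur_s: float, sample_rate: int) -> Signal:
    """Synthetic stand-in for recorded speech: a sine tone at half amplitude."""
    n = int(dur_s * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def gate(signal: Signal, sample_rate: int, step_ms: float) -> List[Signal]:
    """Forward gating: onset-aligned prefixes that each grow by step_ms."""
    step = max(1, int(sample_rate * step_ms / 1000))
    gates = []
    end = step
    while end < len(signal):
        gates.append(signal[:end])
        end += step
    gates.append(list(signal))  # the last gate presents the whole signal
    return gates


def nearest_zero_crossing(signal: Signal, index: int) -> int:
    """Index nearest to `index` where the waveform changes sign between samples i-1 and i."""
    for offset in range(len(signal)):
        for i in (index - offset, index + offset):
            if 0 < i < len(signal):
                prev, cur = signal[i - 1], signal[i]
                if (prev <= 0.0 < cur) or (prev >= 0.0 > cur):
                    return i
    return index  # no sign change found (e.g. silence): cut where requested


def cross_splice(a: Signal, b: Signal, cut_a: int, cut_b: int) -> Signal:
    """Join the start of `a` to the end of `b`, moving each cut to a zero crossing."""
    za = nearest_zero_crossing(a, cut_a)
    zb = nearest_zero_crossing(b, cut_b)
    return a[:za] + b[zb:]


def rms(signal: Signal) -> float:
    """Root-mean-square amplitude."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech: Signal, noise: Signal, snr_db: float) -> Signal:
    """Add noise scaled so that 20*log10(rms(speech) / rms(scaled noise)) == snr_db."""
    n = min(len(speech), len(noise))
    speech, noise = speech[:n], noise[:n]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise signal is silent")
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * v for s, v in zip(speech, noise)]


if __name__ == "__main__":
    sr = 16000
    word_a, word_b = tone(200, 0.5, sr), tone(300, 0.5, sr)
    print(len(gate(word_a, sr, 50)), "gates")
    print(len(cross_splice(word_a, word_b, 4000, 4000)), "samples after splicing")
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 0.1) for _ in range(len(word_a))]
    print(round(rms(mix_at_snr(word_a, noise, 10.0)), 3), "RMS of mixture")
```

The gate step and the target SNR are experimental parameters; the values in the demo are placeholders, not recommendations from the source. Real stimulus preparation would normally load and save audio files and may add short amplitude ramps at cut points instead of, or in addition to, zero-crossing alignment.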

Keywords: corpus analysis; lexical properties; phonology; usage frequency; linguistic metadata; gating technique; speech signal manipulation; automatic speech recognition

Article. 17,529 words.

Subjects: Linguistics; Phonetics and Phonology