Representing Tone in Levenshtein Distance

Cathryn Yang and Andy Castro

in Computing and Language Variation

Published by Edinburgh University Press

Published in print December 2009 | ISBN: 9780748640300
Published online September 2012 | e-ISBN: 9780748671380 | DOI:


Levenshtein distance, also known as string edit distance, correlates strongly with both perceived distance and intelligibility in various Indo-European languages. This chapter describes the application of Levenshtein distance to dialect data from Bai, a Sino-Tibetan language, and Hongshuihe Zhuang, a Tai language. In applying Levenshtein distance to languages with contour tone systems, the chapter asks the following questions: How much variation in intelligibility can tone alone explain? Which representation of tone results in the Levenshtein distance that shows the strongest correlation with intelligibility test results? The chapter evaluates six representations of tone: onset, contour and offset; onset and contour only; contour and offset only; target approximation; autosegments of H (high) and L (low); and Chao's (1930) pitch numbers. For both languages, the more fully explicit onset-contour-offset and onset-contour representations show significantly stronger inverse correlations with intelligibility. This suggests that, for cross-dialectal listeners, the optimal representation of tone in Levenshtein distance should be at a phonetically explicit level and include information on both onset and contour.
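
To make concrete how the choice of tone representation can change the resulting distance, the following is a minimal Python sketch, not the chapter's own implementation. The unit edit costs, the function names, and the symbol inventory (`on5`, `R`, `F`, `=`, `off5`) are assumptions made purely for illustration, and the onset-contour-offset encoder here looks only at the first and last pitch, so it collapses complex contours such as dipping tones.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between two symbol sequences,
    with unit cost for insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tone_pitch_numbers(chao):
    """Chao pitch numbers: one symbol per digit, e.g. '35' -> ['3', '5']."""
    return list(chao)


def tone_onset_contour_offset(chao):
    """Hypothetical onset-contour-offset encoding (an assumption, not the
    chapter's scheme): first pitch, overall direction, last pitch."""
    first, last = int(chao[0]), int(chao[-1])
    if last > first:
        contour = "R"   # rising
    elif last < first:
        contour = "F"   # falling
    else:
        contour = "="   # level
    return [f"on{first}", contour, f"off{last}"]


def tone_distance(chao_a, chao_b, encode):
    """Levenshtein distance between two tones under a given encoding."""
    return levenshtein(encode(chao_a), encode(chao_b))


# High level (55) versus mid rising (35):
d_pitch = tone_distance("55", "35", tone_pitch_numbers)
d_oco = tone_distance("55", "35", tone_onset_contour_offset)
print(d_pitch, d_oco)
```

Under these assumptions, the pitch-number encoding registers only the difference in starting pitch between 55 and 35, while the onset-contour-offset encoding also registers the difference between a level and a rising contour, so the same pair of tones receives a larger distance.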

Keywords: Levenshtein distance; tone; intelligibility; Bai; Hongshuihe Zhuang; contour; autosegments; pitch numbers; onset; offset

Chapter.  5080 words.  Illustrated.

Subjects: Language Teaching and Learning
