Statistical Methods

Christer Samuelsson

in The Oxford Handbook of Computational Linguistics

Published in print January 2005 | ISBN: 9780199276349
Published online September 2012 | e-ISBN: 9780191743573 | DOI:

Series: Oxford Handbooks in Linguistics



Statistical methods now belong to mainstream natural language processing. They have been applied successfully to virtually all tasks within language processing and neighbouring fields, including part-of-speech tagging, syntactic parsing, semantic interpretation, lexical acquisition, machine translation, information retrieval, information extraction, and language learning. This article reviews mathematical statistics and applies it to language-modelling problems, leading up to the hidden Markov model and the maximum-entropy model. The key ideas of mathematical statistics are simple and intuitive, but they tend to be buried in a sea of mathematical technicalities. Maximum-entropy modelling combines heterogeneous information sources into a uniform probabilistic model in which each piece of information is formulated as a feature. Its real strength lies in combining evidence from several rules, none of which might be conclusive on its own, but which taken together can dramatically affect the probability. Finally, the article supplies the mathematical details behind the methods discussed.
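To make the point about combining weak evidence concrete, here is a minimal sketch of a conditional maximum-entropy (log-linear) model, p(y | x) = exp(sum_i w_i f_i(x, y)) / Z(x), assuming binary features f_i and hand-set weights w_i. The feature names, weights, labels, and example contexts are hypothetical illustrations, not taken from the article, and weight estimation from training data is not shown.

```python
import math

# Hypothetical binary features f(x, y) over a context x and a label y.
# Weights are set by hand for illustration; in practice they are estimated from data.
FEATURES = {
    "prev_word_is_the&NOUN": lambda x, y: x["prev"] == "the" and y == "NOUN",
    "ends_in_s&NOUN": lambda x, y: x["word"].endswith("s") and y == "NOUN",
    "capitalised&NOUN": lambda x, y: x["word"][:1].isupper() and y == "NOUN",
}
WEIGHTS = {name: 1.0 for name in FEATURES}
LABELS = ("NOUN", "VERB")


def maxent_prob(x, features=FEATURES, weights=WEIGHTS, labels=LABELS):
    """Return p(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x) for every label y."""
    # Score each label by summing the weights of the features that fire for (x, y).
    scores = {
        y: sum(weights[name] for name, f in features.items() if f(x, y))
        for y in labels
    }
    # Z(x) normalises the exponentiated scores into a distribution over labels.
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


if __name__ == "__main__":
    contexts = [
        {"prev": "two", "word": "books"},  # one NOUN feature fires
        {"prev": "the", "word": "books"},  # two NOUN features fire
        {"prev": "the", "word": "Books"},  # all three NOUN features fire
    ]
    for x in contexts:
        print(x, round(maxent_prob(x)["NOUN"], 3))
```

With every weight set to 1, p(NOUN | x) rises from about 0.73 to 0.88 to 0.95 as one, two, and then three features fire: no single feature is decisive, but together they shift the probability sharply.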

Keywords: statistical methods; natural language processing; language modelling; hidden Markov model; maximum entropy modelling; probabilistic model; mathematical statistics

Article. 5626 words.

Subjects: Computational Linguistics
