This article outlines the recently used methods for designing part-of-speech taggers; computer programs for assigning contextually appropriate grammatical descriptors to words in texts. It begins with the description of general architecture and task setting. It gives an overview of the history of tagging and describes the central approaches to tagging. These approaches are: taggers based on handwritten local rules, taggers based on n-grams automatically derived from text corpora, taggers based on hidden Markov models, taggers using automatically generated symbolic language models derived using methods from machine tagging, taggers based on handwritten global rules, and hybrid taggers, which combine the advantages of handwritten and automatically generated taggers. This article focuses on handwritten tagging rules. Well-tagged training corpora are a valuable resource for testing and improving language model. The text corpus reminds the grammarian about any oversight while designing a rule.
Keywords: part-of-speech taggers; grammatical descriptors; task setting; machine tagging; Markov model; handwritten tagging rules
Article. 4802 words.
Subjects: Computational Linguistics ; Grammar, Syntax and Morphology
Full text: subscription required