ISBN-13: 9783836427517 / English / Paperback / 2007 / 184 pp.
Supervised word sense disambiguation (WSD) for truly polysemous words (in contrast to homonyms) is difficult for machine learning, mainly because of two problems: the lack of sense-tagged training data and the sparsity of the matrix of observed instances vs. features. At the same time, high accuracy is necessary for WSD to benefit high-level applications such as information retrieval, question answering, and machine translation. This work addresses these two problems by combining rich linguistic knowledge with machine learning methods. First, it provides empirical evidence that the careful design and generation of linguistically motivated features helps to alleviate the data sparseness inherent in WSD. It introduces a state-of-the-art supervised system for verb sense disambiguation and explores three specific aspects of feature generation, showing that they raise the system's accuracy to the top level. Second, it shows the effectiveness of active learning in creating more labeled training data for supervised WSD, reducing the required training data by 1/2 to 3/4 when learning coarse-grained English verb senses. The book is addressed to researchers in Computer and Information Science and Computational Linguistics.
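The blurb does not say which classifier or query strategy the book uses for active learning. Purely as a hypothetical illustration of the general idea, the sketch below runs pool-based active learning with margin-based uncertainty sampling: a small naive Bayes model over sparse feature lists repeatedly asks an "oracle" (standing in for a human annotator) to label the pool instance whose top two senses are closest in probability. The class and function names, the feature strings, and the two-sense inventory for the verb "run" are all invented for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over sparse feature lists."""

    def fit(self, instances, labels):
        self.senses = sorted(set(labels))
        self.prior = Counter(labels)
        self.feat_counts = defaultdict(Counter)  # sense -> feature -> count
        self.vocab = set()
        for feats, sense in zip(instances, labels):
            self.feat_counts[sense].update(feats)
            self.vocab.update(feats)
        self.n = len(labels)
        return self

    def predict_proba(self, feats):
        """Return a dict mapping each sense to its posterior probability."""
        v = len(self.vocab)
        log_scores = {}
        for s in self.senses:
            total = sum(self.feat_counts[s].values())
            lp = math.log(self.prior[s] / self.n)
            for f in feats:
                lp += math.log((self.feat_counts[s][f] + 1) / (total + v))
            log_scores[s] = lp
        # Normalize in log space to avoid underflow.
        m = max(log_scores.values())
        exp = {s: math.exp(lp - m) for s, lp in log_scores.items()}
        z = sum(exp.values())
        return {s: e / z for s, e in exp.items()}


def margin(probs):
    """Difference between the two most probable senses (small = uncertain)."""
    top = sorted(probs.values(), reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)


def select_most_uncertain(model, pool):
    """Index of the pool instance with the smallest top-two margin."""
    return min(range(len(pool)), key=lambda i: margin(model.predict_proba(pool[i])))


def active_learning(seed_x, seed_y, pool, oracle, rounds):
    """Grow the labeled set by querying the oracle on the most uncertain instances."""
    x, y, pool = list(seed_x), list(seed_y), list(pool)
    for _ in range(min(rounds, len(pool))):
        model = NaiveBayes().fit(x, y)
        feats = pool.pop(select_most_uncertain(model, pool))
        x.append(feats)
        y.append(oracle(feats))  # a human annotator in a real setting
    return NaiveBayes().fit(x, y), x, y


# Hypothetical toy data: two senses of the verb "run".
seed_x = [["obj=company", "subj=person"], ["prep=to", "subj=person", "adv=fast"]]
seed_y = ["manage", "move"]
pool = [["obj=company"], ["adv=fast"], ["obj=store", "prep=to"]]


def oracle(feats):
    # Stand-in annotator: a direct object signals the "manage" sense.
    return "manage" if any(f.startswith("obj=") for f in feats) else "move"


model, labeled_x, labeled_y = active_learning(seed_x, seed_y, pool, oracle, rounds=3)
print(labeled_y)
```

On this toy pool the first query is the instance with conflicting, partly unseen evidence (`obj=store` with `prep=to`), which is the behaviour uncertainty sampling aims for: annotation effort goes to the examples the current model finds hardest, rather than to randomly drawn ones.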