ISBN-13: 9783659794551 / English / Paperback / 2015 / 120 pp.
Several studies have attempted to improve the accuracy of dependency parsing by incorporating information about word clusters into the parsing models. The use of word clusters is typically motivated by the shortage of labeled training data and by domain adaptation, where a parsing model is adapted for use on data from a new domain. This book shows the effect of using cluster-based features in MaltParser, a data-driven parser for inductive dependency parsing. Different clustering features are used to generate clusters with the K-means clustering algorithm. The clusters serve as a source of additional information in an expanded feature model used by the MaltParser system. Parsing experiments are performed on several different data sets, including the Wall Street Journal and texts from various web domains. Significantly improved parsing results are reported for the cluster-informed parser compared to the baseline parser. The contents of this book may be of interest to anyone interested in the application of machine learning in language technology.