ISBN-13: 9783836490276 / English / Paperback / 2008 / 356 pp.
This book describes novel software architectures for integrating deep and shallow natural language processing (NLP) components in language technology. The generic markup language XML and the XML transformation language XSLT are used to flexibly combine the linguistic markup produced by multiple NLP components. Shallow NLP components such as tokenizers, part-of-speech taggers, named entity recognizers and shallow parsers are combined with a deep parser that runs grammars written in the spirit of Head-Driven Phrase Structure Grammar (HPSG) theory. The integration paradigm enables synergy between the components, leading to more robust deep parsing with increased coverage. It also establishes a division of labor: the deep grammar models general, correct language use, while the shallow systems are responsible for domain-specific extensions. Applications are presented in question answering, information extraction, natural language understanding, ontologies and the Semantic Web. The book is addressed to software engineers, computational linguists and language technology engineers.