ISBN-13: 9781904987321 / English / Paperback / 2005 / 240 pp.
Automata and Dictionaries is aimed at students and specialists in natural language processing and related disciplines where efficient text analysis plays a role. Large linguistic resources, in particular lexica, are now recognized as a fundamental prerequisite for all natural language processing tasks. Specialists in this domain cannot afford to be ignorant of state-of-the-art lexicon-management algorithms. This monograph, which is also intended to be used as an advanced textbook in computational linguistics, fills a gap among natural language processing monographs and is complementary to other publications in this area.

This book is also a source of examples, exercises and problems for software engineering in general. The algorithms presented are excellent examples of non-trivial problems of graph construction, graph handling and graph traversal. Although they have been published in scientific journals, they have not so far been presented in a form easily accessible to teachers and students. These algorithms will also be of interest for the training of software engineers.

Chapter 1 of Automata and Dictionaries provides the application-oriented motivation for solving the problems studied in the rest of the book. It introduces and exemplifies several key notions of lexicon-based natural language processing in a way accessible to any computer science student. Chapter 2 surveys the main solutions to the problem, using a very small toy lexicon as an example. Chapter 3 defines the underlying mathematical notions, immediately illustrating the theory with practical examples, which makes this part quite readable.

Chapters 4 and 5 are dedicated to the two central notions of lexicon construction: the algorithms of determinization and minimization. Both algorithms are presented in their standard form, along with their variants and some special cases that occur frequently in practice. The operation of the algorithms is described step by step in examples, introducing the beginner to the world of epsilon-transitions, state heights and reverse automata.

Chapter 6 goes a step further into complexity. It is based on algorithms published from 1998 onward, which are presented here with the same clarity as the preceding, more classical, algorithms. This remarkable achievement owes much to the rigorous structure of the chapter. These algorithms have variants for transducers, which are presented in Chapter 7 with the same pedagogical skill.

The last chapter studies the time and space complexity of the algorithms and explains several tricks useful for speeding up their operation.