Preface
1. Introduction
Part I: Background
2. Linguistic and Cognitive Evidence About Anaphora
3. Early Approaches to Anaphora Resolution: Theoretically Inspired and Heuristic-Based
Part II: Resources
4. Annotated Corpora and Annotation Tools
5. Evaluation Metrics
6. Evaluation Campaigns
7. Preprocessing Technology
8. Off-the-shelf Tools
Part III: Algorithms
9. The Mention-Pair Model
10. Advanced Machine Learning Models for Coreference Resolution
11. Integer Linear Programming for Coreference Resolution
12. Extracting Anaphoric Agreement Properties from Corpora
13. Detecting Non-reference and Non-anaphoricity
14. Using Lexical and Encyclopedic Knowledge
Part IV: Applications
15. Coreference Applications to Summarization
16. Towards a Procedure Model for Developing Anaphora Processing Applications
Part V: Outlook
17. Challenges and Directions of Further Research
Index
Massimo Poesio is a cognitive scientist whose primary interest is computational linguistics, but who is also interested in psycholinguistics and neuroscience. His research includes the development of computational models of semantic and discourse interpretation (in particular, anaphora resolution); the creation of corpora of anaphorically annotated data (he pioneered the use of games-with-a-purpose for computational linguistics with the development of Phrase Detectives, http://www.phrasedetectives.org); the study of commonsense knowledge using a combination of methods from computational linguistics and neuroscience; and the application of text analytics methods to real-life problems, such as deception detection and the identification of reports of human rights violations in social media.
Roland Stuckardt works as a consultant, research and development manager, and scientific researcher in the fields of computational linguistics and natural language processing. He studied computer science and economics at Goethe University Frankfurt. During his work at the German National Research Center for Information Technology (GMD) in Darmstadt, he specialized in text analysis, parsing, discourse semantics, and robust anaphor resolution. He received his PhD from Goethe University for his research on computer-based text content analysis in the social sciences. His research interests and main fields of work include anaphora processing, information extraction, media content monitoring, innovative natural language processing applications in general, and computer chess.
Yannick Versley is a group leader in the Leibniz-ScienceCampus "Empirical Linguistics and Computational Language Modeling", a collaboration between the Institute for German Language (IDS) in Mannheim and the Institute for Computational Linguistics at the University of Heidelberg. He studied Computer Science, Physics, and Mathematics in Hamburg before completing a PhD in Tübingen on the coreference resolution of definite noun phrases in German newspaper text. In his subsequent positions in Rovereto/Trento, Tübingen, and Heidelberg, he has worked on a number of topics, including statistical parsing, coreference resolution, discourse relations, and distributional semantics, with particular attention to German.
This book lays out a path leading from the linguistic and cognitive basics, through classical rule-based and machine learning algorithms, to today’s state-of-the-art approaches, which use advanced empirically grounded techniques, automatic knowledge acquisition, and refined linguistic modeling to make a real difference in real-world applications. Anaphora resolution and coreference resolution both refer to the process of linking textual phrases (and, consequently, the information attached to them) that refer to the same discourse referent, both within and across sentence boundaries.
The book offers an overview of recent research advances, focusing on practical, operational approaches and their applications. In Part I (Background), it provides a general introduction that succinctly summarizes the linguistic, cognitive, and computational foundations of anaphora processing and the key classical rule-based and machine-learning-based anaphora resolution algorithms. Acknowledging the central importance of shared resources, Part II (Resources) covers annotated corpora, formal evaluation, preprocessing technology, and off-the-shelf anaphora resolution systems. Part III (Algorithms) provides a thorough description of state-of-the-art anaphora resolution algorithms, covering enhanced machine learning methods as well as techniques for accomplishing important subtasks such as mention detection and the acquisition of relevant knowledge. Part IV (Applications) deals with a selection of important anaphora and coreference resolution applications, discussing particular scenarios in diverse domains and distilling a best-practice model for systematically approaching new application cases. In the concluding Part V (Outlook), based on a survey conducted among the contributing authors, the prospects of the research field of anaphora processing are discussed, and promising new areas of interdisciplinary cooperation and emerging application scenarios are identified.
Given the book’s design, it can be used both as an accompanying text for advanced lectures in computational linguistics, natural language engineering, and computer science, and as a reference work for research and independent study. It addresses an audience that includes academic researchers, university lecturers, postgraduate students, advanced undergraduate students, industrial researchers, and software engineers.