Chapter 1 Overview.- Chapter 2 Building an NLIDB: The Basics.- Chapter 3 Data and Query Model.- Chapter 4 Text to Data.- Chapter 5 Evaluation.- Chapter 6 Data to Text.- Chapter 7 Interactivity.- Index
Yunyao Li is the Head of Machine Learning at Apple Knowledge Platform. Until early 2022, she was a Distinguished Research Staff Member and Senior Research Manager at IBM Research - Almaden. She was also a Master Inventor and a member of IBM Academy of Technology. She is an ACM Distinguished Member and a member of the inaugural New Voices program of the American National Academies. Her expertise is at the intersection of natural language processing, databases, human-computer interaction, machine learning, and information retrieval. Her contributions in these areas have led to over 100 research publications with multiple awards, 36 patents granted, multiple graduate-level courses (including 2 Massive Open Online Courses), and billions in revenue generated from technology transfer. Yunyao pioneered some of the landmark work on NLIDB, including NaLIX, the first conversational NLIDB for XML. She is the co-author of Natural Language Data Management and Interfaces. Yunyao holds a Ph.D. in Computer Science & Engineering from the University of Michigan - Ann Arbor.
Dragomir Radev was the A. Bartlett Giamatti Professor of Computer Science at Yale University. He was a Fellow of ACM, AAAI, AAAS, and ACL. Dragomir's interests were in semantic parsing, text summarization, natural language generation, logical reasoning, and information retrieval. Dragomir was the co-author of Graph-based Natural Language Processing and Information Retrieval with Rada Mihalcea (Cambridge University Press, 2011). He was also the editor of a two-volume collection of problems from the North American Computational Linguistics Open Contest (NACLO), Puzzles in Logic, Languages and Computation: The Red Book and Puzzles in Logic, Languages and Computation: The Green Book (both published by Springer in 2013). Dragomir held a Ph.D. in Computer Science from Columbia University. Sadly, Drago passed away shortly before publication of this book in 2023.
Davood Rafiei is a Professor of Computer Science and an active member of the Database Systems Research Group at the University of Alberta. His areas of expertise span databases, information retrieval, and NLP with a focus on managing large complex data, data integration, and natural language interfaces to databases. He has co-authored the book Natural Language Data Management and Interfaces and, more recently, the article “DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction,” which held the top position on two major text-to-SQL leaderboards. Davood regularly serves on the program committees for major database and data mining conferences (such as SIGMOD, VLDB, KDD, CIKM) and Web and IR conferences (such as WWW, SIGIR, WSDM). His academic journey includes undergraduate studies at Sharif University, a Master's degree from the University of Waterloo, and a Ph.D. from the University of Toronto. He has been a visiting scientist at Google (2007-2008), Kyoto University (2014), and the University of Paris Descartes (2015).
This book presents a comprehensive overview of Natural Language Interfaces to Databases (NLIDBs), an indispensable tool in the ever-expanding realm of data-driven exploration and decision making. After first demonstrating the importance of the field using an interactive ChatGPT session, the book explores the remarkable progress made on NLIDBs and the general challenges faced in their real-world deployment. It goes on to provide readers with a holistic understanding of the anatomy, essential components, and mechanisms underlying NLIDBs and how to build them. Key concepts in representing, querying, and processing structured data, as well as approaches for optimizing user queries, are established for the reader before their application in NLIDBs is explored. The book discusses text to data through early relevant work on semantic parsing and meaning representation before turning to cutting-edge advancements in how NLIDBs are empowered to comprehend and interpret human languages. It then explores the evaluation methodologies, metrics, datasets, and benchmarks that play a pivotal role in assessing both the effectiveness of mapping natural language queries to formal queries in a database and the overall performance of a system. The book then covers data to text, where formal representations of structured data are transformed into coherent and contextually relevant human-readable narratives. It closes with an exploration of the challenges and opportunities of interactivity and the techniques corresponding to each of its dimensions, such as conversational NLIDBs and multi-modal NLIDBs where user input goes beyond natural language. This book provides a balanced mixture of theoretical insights, practical knowledge, and real-world applications that will be an invaluable resource for researchers, practitioners, and students eager to explore the fundamental concepts of NLIDBs.