1 Introduction.- 2 Preliminaries.- 3 Linguistic Linked Open Data Cloud.- 4 Modelling lexical resources as Linked Data.- 5 Representing annotated texts as RDF.- 6 Modelling linguistic annotations.- 7 Modelling metadata of language resources.- 8 Linguistic Categories.- 9 Converting language resources into Linked Data.- 10 Link Representation and Discovery.- 11 Linked Data-based NLP Workflows.- 12 Applying linked data principles to linking multilingual wordnets.- 13 Linguistic Linked Data in Digital Humanities.- 14 Discovery of language resources.- 15 Conclusion.
Philipp Cimiano is a Professor of Computer Science and Head of the Semantic Computing Group at Bielefeld University. His research focuses on topics at the intersection of knowledge representation and natural language processing. Together with the other authors of this book, he was one of the first researchers to propose applying linked data technologies to the domain of linguistics. For more than ten years, he has been working on topics such as ontology learning, question answering using linked data, and information extraction. He is co-chair of the W3C Ontology-Lexica Community Group.
Christian Chiarcos is an Assistant Professor of Computer Science at Goethe University Frankfurt, where he heads the Applied Computational Linguistics group. His research focuses on semantic technologies, including computational semantics and the innovative application of Semantic Web standards to NLP problems. He is a co-founder of the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation.
John McCrae is a research lecturer at the Data Science Institute and Insight Centre for Data Analytics at the National University of Ireland Galway, where he leads the Unit for Linguistic Data. The group focuses on the creation, maintenance and application of language resources and other linguistic data. He coordinates the Prêt-à-LLOD project, which aims to make linguistic linked open data ready to use, and leads the linguistic linked open data task in the European Lexicographic Infrastructure (ELEXIS).
Jorge Gracia is an Assistant Professor at the University of Zaragoza, where he belongs to the Aragon Institute of Engineering Research (I3A) and to the Distributed Information Systems research group. His current research interests include multilingualism and linked data, cross-lingual matching and information access on the Semantic Web, and the interoperability of language resources on the Web. He currently leads NexusLinguarum, the COST Action "European network for Web-centred linguistic data science".
This is the first monograph on the emerging area of linguistic linked data. Combining background information on linguistic linked data with concrete implementation advice, it introduces and discusses the main benefits of applying linked data (LD) principles to the representation and publication of linguistic resources. It argues that LD does not look at a single resource in isolation but seeks to create a large network of resources that can be used together and uniformly, thereby adding value to each individual resource.
The book describes how LD principles can be applied to modelling language resources. The first part provides the foundation for understanding the remainder of the book: it introduces the data models, ontology languages and query languages that form the basis of the Semantic Web and LD, and offers a more detailed overview of the Linguistic Linked Open Data Cloud. The second part focuses on modelling language resources using LD principles. It describes how to model lexical resources using OntoLex-Lemon, the lexicon model for ontologies, and how to annotate and address elements of text represented in RDF. It also demonstrates how to model annotations and how to capture the metadata of language resources, and includes a chapter on representing linguistic categories. In the third part, the authors describe how language resources can be transformed into LD and how links can be inferred and added to the data to increase connectivity between datasets. They also discuss using LD resources in natural language processing workflows. The last part describes concrete applications of these technologies: representing and linking multilingual wordnets, applications in digital humanities, and the discovery of language resources.
Given its scope, the book is relevant for researchers and graduate students interested in topics at the crossroads of natural language processing / computational linguistics and the Semantic Web / linked data. It appeals both to Semantic Web experts who are not yet proficient in applying Semantic Web and LD principles to linguistic data, and to computational linguists who are used to working with lexical and linguistic resources and want to learn about a new paradigm for modelling, publishing and exploiting them.