1.1 Representativeness Challenge in Ontology Engineering
1.2 The Phenomenon of Saturation
1.3 The Structure of the Book
2 Related Work
2.1 The Methodology Used for Literature Sampling
2.2 Domain Ontology Engineering and Requirements Elicitation
2.3 Ontology Learning from Texts and Community Consensus
2.4 Collecting Relevant Documents of Good Quality
2.5 Terminological Saturation and Representativeness
2.6 Theoretical Saturation and Ontology Learning
2.7 Ordering of Documents for Processing: Timestamps and Impact
2.8 Automated Term Extraction Methods
2.9 Software Implementations of ATE Methods
2.10 Text Similarity Measurement
2.11 Efficient String Matching for Searching Nested Terms
2.12 Research Gaps and Motivation
2.13 Research Questions and Objectives
2.14 Summary
3 The Formal Framework for Terminological Saturation
3.1 Preliminaries
3.2 Research Hypotheses
3.3 Terminological Difference Function (thd)
3.4 The Metric Properties of the thd Function
3.5 The Existence Conditions for Terminological Saturation
3.6 Scalability and Optimization
3.7 Summary
4 The Algorithmic Suite for Terminological Saturation Detection and Measurement
4.1 The Computation Flow for Terminological Saturation Detection and Measurement
4.2 Preparatory Steps and Algorithms
4.3 Pre-processing Steps and Algorithms
4.4 The Algorithms for the Optimized Computation Pipeline
4.5 The Baseline Algorithm for Terminological Difference Measurement
4.6 The Algorithms for Term Grouping
4.7 The Algorithm for Accumulated Regular Noise Removal
4.8 Implementation in the Software Suite
4.9 Summary
5 Experimental Evaluation
5.1 Experimental Objectives
5.2 General Experimental Settings
5.3 Correctness Check using Synthetic Collections
5.4 The Choice of Software for ATE
5.5 The Influence of Document Ordering
5.6 The Influence of Term Grouping
5.7 Validity and Scalability of the Optimized Term Extraction Pipeline
5.8 Summary
6 Saturated Terminology Extraction and Analysis in Use
6.1 Checking Gartner Trend Prediction Using Terminological Analysis
6.2 Instrumenting the Literature Review Activity of Master Students
6.2.1 The Task for Students
6.2.2 Method Adoption Results
6.3 Practical Implications (Benefits)
6.4 Potential Use Scenarios in Scientific Publishing
6.5 Summary
7 Conclusions and Outlook
7.1 The Summary of Findings and Results
7.2 Future Work
References
Victoria recently defended her Ph.D. thesis, entitled "A Method of Experimental Study of Terminological Saturation in Document Collections for Knowledge Elicitation", at the Department of Computer Science of Zaporizhzhia National University (Ukraine). She is currently a self-employed researcher working on an industrial consulting project with GroupBWT LLC. Her professional interests and expertise lie in Automated Terminology Recognition and Ontology Engineering.
Vadim is an associate professor at the Department of Computer Science of Zaporizhzhia National University (Ukraine), where he also leads the Intelligent Systems Research Group. Throughout his career, he has combined academic work with professional engagements in industry (as a research consultant) and in public international organizations (as an expert) in knowledge and ontology engineering, semantic technologies, intelligent software systems, and distributed artificial intelligence. A particular focus of his research is capturing the dynamics and adaptability of the real world in intelligent artefacts. His current research interests lie in ontology engineering, ontology learning, and text mining.
This book presents an innovative approach to extracting terminological cores from collections of professional texts bounded by a subject domain. The approach exploits the phenomenon of terminological saturation. The book presents a formal framework for detecting and measuring terminological saturation as a process of successive approximation. It then offers a suite of algorithms that implement the method in software and comprehensively evaluates all aspects of the method, along with possible input configurations, in experiments on synthetic and real text collections from several subject domains. The book demonstrates the developed method and software pipeline in industrial and academic use cases, and it outlines the potential benefits of adopting the method in industry.