Chapter 1. Introduction.- Chapter 2. Literature Review.- Chapter 3. Data Analysis.- Chapter 4. SynTime: Token Types and Heuristic Rules.- Chapter 5. TOMN: Constituent-based Tagging Scheme.- Chapter 6. UGTO: Uncommon Words and Proper Nouns.- Chapter 7. Conclusion and Future Work.
Xiaoshi Zhong received his bachelor's degree in computer science from Beihang University (BUAA), China, and his doctoral degree in computer science from Nanyang Technological University (NTU), Singapore. After a short period as a research fellow at NTU, he will join Beijing Institute of Technology (BIT), China, as an Assistant Professor in the School of Computer Science and Technology. His research interests mainly include data analytics, computational linguistics, and natural language processing.
Erik Cambria is the Founder of SenticNet, a Singapore-based company offering B2B sentiment analysis services, and an Associate Professor at NTU, where he also holds the appointment of Provost Chair in Computer Science and Engineering. Prior to joining NTU, he worked at Microsoft Research Asia and HP Labs India and earned his PhD through a joint programme between the University of Stirling and MIT Media Lab. Erik is the recipient of many awards, e.g., the 2018 AI's 10 to Watch and the 2019 IEEE Outstanding Early Career award, and is often featured in the news, e.g., Forbes. He is Associate Editor of several journals, e.g., NEUCOM, INFFUS, KBS, IEEE CIM and IEEE Intelligent Systems (where he manages the Department of Affective Computing and Sentiment Analysis), and is involved in many international conferences as a PC member, program chair, and speaker.
This book presents a synthetic analysis of the characteristics of time expressions and named entities, and proposes methods that leverage these characteristics to recognize time expressions and named entities in unstructured text. To model these two kinds of entities, the authors propose a rule-based method that introduces an abstraction layer between the specific words and the rules, and two learning-based methods that define a new type of tagging scheme based on the constituents of the entities, in contrast to conventional position-based tagging schemes, which cause the problem of inconsistent tag assignment. The authors also find that the length-frequency of entities follows a family of power-law distributions. This finding, complementary to the rank-frequency of words, opens a door to understanding our communicative system in terms of language use.
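As a rough illustration of the inconsistent tag assignment mentioned above, the following minimal sketch tags the same word under a position-based scheme and under a constituent-based one. BILOU is used here only as one common position-based scheme. The T/M/N labels, the tiny lexicon, the example phrases, and all function names are assumptions made for illustration, not the book's actual definitions.

```python
# Hypothetical toy lexicon mapping a lowercased word to a constituent tag.
# Assumed labels for illustration: T = time token, M = modifier, N = numeral.
CONSTITUENT_LEXICON = {
    "september": "T",
    "2006": "N",
    "last": "M",
}

# Toy annotated phrases: (tokens, entity spans), span end is exclusive.
EXAMPLES = [
    (["in", "September"], [(1, 2)]),
    (["September", "2006"], [(0, 2)]),
    (["last", "September"], [(0, 2)]),
    (["last", "September", "2006"], [(0, 3)]),
]


def bilou_tags(tokens, spans):
    """Position-based tagging: the tag depends on where a word sits in the entity."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        if end - start == 1:
            tags[start] = "U"          # single-word entity
        else:
            tags[start] = "B"          # first word
            for i in range(start + 1, end - 1):
                tags[i] = "I"          # inner words
            tags[end - 1] = "L"        # last word
    return tags


def constituent_tags(tokens, spans):
    """Constituent-based tagging: the tag depends on what kind of word it is."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        for i in range(start, end):
            tags[i] = CONSTITUENT_LEXICON.get(tokens[i].lower(), "O")
    return tags


def tags_seen(word, examples, tagger):
    """Collect every distinct tag the tagger assigns to `word` across examples."""
    seen = set()
    for tokens, spans in examples:
        for token, tag in zip(tokens, tagger(tokens, spans)):
            if token.lower() == word:
                seen.add(tag)
    return seen


if __name__ == "__main__":
    print(sorted(tags_seen("september", EXAMPLES, bilou_tags)))
    print(sorted(tags_seen("september", EXAMPLES, constituent_tags)))
```

In this toy setting the position-based tagger gives "September" a different tag depending on its position within the expression, while the constituent-based tagger gives it the same tag every time.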