ISBN-13: 9783030789633 / English / Paperback / 2022
This book presents a synthetic analysis of the characteristics of time expressions and named entities, together with methods that leverage these characteristics to recognize time expressions and named entities in unstructured text. To model these two kinds of entities, the authors propose a rule-based method that introduces an abstracted layer between the specific words and the rules, and two learning-based methods that define a new type of tagging scheme based on the constituents of the entities. This scheme differs from conventional position-based tagging schemes, which cause the problem of inconsistent tag assignment. The authors also find that the length-frequency of entities follows a family of power-law distributions. This finding, complementary to the rank-frequency of words, opens a door to understanding our communicative system in terms of language use.