Content in numerous data sources is not directly amenable to machine processing. This book describes techniques for automated semantic analysis of schematic content, that is, content populated from backend databases. Starting with a seed set of hand-labeled instances of semantic concepts in a set of HTML documents, a technique is devised that bootstraps an annotation process to automatically identify concept instances in other documents. The technique exploits the observation that semantically related items in schematic HTML documents exhibit consistency in...