In this book, we develop two frameworks for extracting semi-structured data records from Web pages. The first is a record segmentation search tree framework: we design a new search structure, the Record Segmentation Tree (RST), and propose several efficient search pruning strategies over it to identify the records in a given Web page. The second is the DOM Structure Knowledge Oriented Global Analysis (Skoga) framework, which robustly detects different kinds of data records and record regions. Skoga can conduct a global analysis on...