This book addresses several knowledge discovery problems on multi-sourced data, drawing together theories, techniques, and methods from data cleaning, data mining, and natural language processing. It focuses on three data models: multi-sourced isomorphic data, multi-sourced heterogeneous data, and text data. On the basis of these three data models, the book studies knowledge discovery problems, including truth discovery and fact discovery on multi-sourced data, with respect to four important properties: relevance, inconsistency, sparseness, and heterogeneity,...
This book consists of selected, peer-reviewed papers presented at the 2022 4th International Conference on Big Data Engineering and Technology (BDET), held April 22-24, 2022, in Singapore. As IT infrastructure and data management technologies have become critical assets and capabilities for today's enterprises, this book aims to contribute to their development. In particular, the BDET conference series aims to provide a much-needed forum for researchers and practitioners across the world who are actively engaged in advancing research and raising awareness of...
In both the database and machine learning communities, data quality has become a serious issue that cannot be ignored. In this context, we refer to data with quality problems as “dirty data.” For any given data mining or machine learning task, dirty data in the training or test datasets can reduce the accuracy of the results. Accordingly, this book analyzes the impact of dirty data and explores effective methods for processing it.
Although existing data cleaning methods improve data quality dramatically, the cleaning costs are still high. If we knew how dirty...