ISBN-13: 9783659419102 / English / Paperback / 2013 / 152 pp.
The booming growth of the World Wide Web has made more and more information available digitally, at unprecedented rates and levels of popularity. The Web itself is also unprecedented in the almost complete lack of coordination in its creation and in the diversity of backgrounds and motives of its participants. Each of these factors contributes to making exploratory data analysis hard. In particular, we focus on one step of exploratory data analysis: the clustering phase. Clustering is the unsupervised classification of patterns into groups (clusters). In this book, we provide useful advice and references to fundamental concepts, accessible to the broad community of clustering practitioners. We describe three important applications of clustering algorithms in Information Retrieval: (1) Similarity Search for High Dimensional Data Points, with the purpose of finding Near Duplicate Images; (2) Measuring Latent Variables in Social Sciences, with the aim of visualizing Research Communities; and (3) a Generative Model for Content Analysis of Natural Language Documents, to detect Events.