ISBN-13: 9783659477928 / English / Paperback / 2014 / 132 pp.
Most learning algorithms developed to date are designed for environments in which all the relevant data is stored at a single computer site. Advances in network technology and the Internet, as well as the growing size of data, have contributed to the proliferation of distributed data mining. However, previously developed learning algorithms for computations with such distributed databases require that all data stored in distributed locations be transferred to a common site and recompiled as one complete, local dataset before construction can take place. The danger in this transfer is obvious if the data itself is innately sensitive or if the bandwidth needed to transmit the data efficiently to a single site is not available. In this book, new algorithms are developed that preserve the privacy of the data and minimize the cost of communication among the database nodes by gathering statistical summaries at each distributed database and then passing messages describing those summaries between the participating sites. This is much more efficient than transferring the complete databases to a single site, joining them, and then executing algorithms on the combined data.