ISBN-13: 9781119516040 / English / Hardcover / 2019 / 672 pp.
Preface
Preface to the Second Edition
Preface to the First Edition

1 Data-Mining Concepts
1.1 Introduction
1.2 Data-Mining Roots
1.3 Data-Mining Process
1.4 From Data Collection to Data Preprocessing
1.5 Data Warehouses for Data Mining
1.6 From Big Data to Data Science
1.7 Business Aspects of Data Mining: Why a Data-Mining Project Fails?
1.8 Organization of This Book
1.9 Review Questions and Problems
1.10 References for Further Study

2 Preparing the Data
2.1 Representation of Raw Data
2.2 Characteristics of Raw Data
2.3 Transformation of Raw Data
2.4 Missing Data
2.5 Time-Dependent Data
2.6 Outlier Analysis
2.7 Review Questions and Problems
2.8 References for Further Study

3 Data Reduction
3.1 Dimensions of Large Data Sets
3.2 Features Reduction
3.3 Relief Algorithm
3.4 Entropy Measure for Ranking Features
3.5 Principal Component Analysis
3.6 Value Reduction
3.7 Feature Discretization: ChiMerge Technique
3.8 Case Reduction
3.9 Review Questions and Problems
3.10 References for Further Study

4 Learning from Data
4.1 Learning Machine
4.2 Statistical Learning Theory
4.3 Types of Learning Methods
4.4 Common Learning Tasks
4.5 Support Vector Machines
4.6 Semi-Supervised Support Vector Machines (S3VM)
4.7 kNN: Nearest Neighbor Classifier
4.8 Model Selection vs. Generalization
4.9 Model Estimation
4.10 Imbalanced Data Classification
4.11 90% Accuracy ... Now What?
4.12 Review Questions and Problems
4.13 References for Further Study

5 Statistical Methods
5.1 Statistical Inference
5.2 Assessing Differences in Data Sets
5.3 Bayesian Inference
5.4 Predictive Regression
5.5 Analysis of Variance
5.6 Logistic Regression
5.7 Log-Linear Models
5.8 Linear Discriminant Analysis
5.9 Review Questions and Problems
5.10 References for Further Study

6 Decision Trees and Decision Rules
6.1 Decision Trees
6.2 C4.5 Algorithm: Generating a Decision Tree
6.3 Unknown Attribute Values
6.4 Pruning Decision Trees
6.5 C4.5 Algorithm: Generating Decision Rules
6.6 CART Algorithm and Gini Index
6.7 Limitations of Decision Trees and Decision Rules
6.8 Review Questions and Problems
6.9 References for Further Study

7 Artificial Neural Networks
7.1 Model of an Artificial Neuron
7.2 Architectures of Artificial Neural Networks
7.3 Learning Process
7.4 Learning Tasks Using ANNs
7.5 Multilayer Perceptrons
7.6 Competitive Networks and Competitive Learning
7.7 Self-Organizing Maps
7.8 Deep Learning
7.9 Convolutional Neural Networks (CNNs)
7.10 Review Questions and Problems
7.11 References for Further Study

8 Ensemble Learning
8.1 Ensemble Learning Methodologies
8.2 Combination Schemes for Multiple Learners
8.3 Bagging and Boosting
8.4 AdaBoost
8.5 Review Questions and Problems
8.6 References for Further Study

9 Cluster Analysis
9.1 Clustering Concepts
9.2 Similarity Measures
9.3 Agglomerative Hierarchical Clustering
9.4 Partitional Clustering
9.5 Incremental Clustering
9.6 DBSCAN Algorithm
9.7 BIRCH Algorithm
9.8 Clustering Validation
9.9 Review Questions and Problems
9.10 References for Further Study

10 Association Rules
10.1 Market-Basket Analysis
10.2 Algorithm Apriori
10.3 From Frequent Itemsets to Association Rules
10.4 Improving the Efficiency of the Apriori Algorithm
10.5 Frequent Pattern Growth Method
10.6 Associative-Classification Method
10.7 Multidimensional Association Rule Mining
10.8 Review Questions and Problems
10.9 References for Further Study

11 Web Mining and Text Mining
11.1 Web Mining
11.2 Web Content, Structure, and Usage Mining
11.3 HITS and LOGSOM Algorithms
11.4 Mining Path-Traversal Patterns
11.5 PageRank Algorithm
11.6 Recommender Systems
11.7 Text Mining
11.8 Latent Semantic Analysis
11.9 Review Questions and Problems
11.10 References for Further Study

12 Advances in Data Mining
12.1 Graph Mining
12.2 Temporal Data Mining
12.3 Spatial Data Mining
12.4 Distributed Data Mining
12.5 Correlation Does Not Imply Causality!
12.6 Privacy, Security, and Legal Aspects of Data Mining
12.7 Cloud Computing Based on Hadoop and Map/Reduce
12.8 Reinforcement Learning
12.9 Review Questions and Problems
12.10 References for Further Study

13 Genetic Algorithms
13.1 Fundamentals of Genetic Algorithms
13.2 Optimization Using Genetic Algorithms
13.3 A Simple Illustration of a Genetic Algorithm
13.4 Schemata
13.5 Traveling Salesman Problem
13.6 Machine Learning Using Genetic Algorithms
13.7 Genetic Algorithms for Clustering
13.8 Review Questions and Problems
13.9 References for Further Study

14 Fuzzy Sets and Fuzzy Logic
14.1 Fuzzy Sets
14.2 Fuzzy Set Operations
14.3 Extension Principle and Fuzzy Relations
14.4 Fuzzy Logic and Fuzzy Inference Systems
14.5 Multifactorial Evaluation
14.6 Extracting Fuzzy Models from Data
14.7 Data Mining and Fuzzy Sets
14.8 Review Questions and Problems
14.9 References for Further Study

15 Visualization Methods
15.1 Perception and Visualization
15.2 Scientific Visualization and Information Visualization
15.3 Parallel Coordinates
15.4 Radial Visualization
15.5 Visualization Using Self-Organizing Maps
15.6 Visualization Systems for Data Mining
15.7 Review Questions and Problems
15.8 References for Further Study

Appendix A: Information on Data Mining
A.1 Data-Mining Journals
A.2 Data-Mining Conferences
A.3 Data-Mining Forums/Blogs
A.4 Data Sets
A.5 Commercially and Publicly Available Tools
A.6 Web Site Links

Appendix B: Data-Mining Applications
B.1 Data Mining for Financial Data Analyses
B.2 Data Mining for the Telecommunication Industry
B.3 Data Mining for the Retail Industry
B.4 Data Mining in Healthcare and Biomedical Research
B.5 Data Mining in Science and Engineering
B.6 Pitfalls of Data Mining

Bibliography
Index
MEHMED KANTARDZIC, PHD, is a Professor in the Department of Computer Engineering and Computer Science (CECS) at the University of Louisville, and is Director of the Data Mining Lab and CECS Graduate Programs. He is a member of IEEE, ISCA, KAS, WSEAS, IEE, and SPIE.