Acknowledgments
About the Author

1 Introduction to the World of Big Data
1.1 Understanding Big Data
1.2 Evolution of Big Data
1.3 Failure of Traditional Database in Handling Big Data
1.4 3 Vs of Big Data
1.5 Sources of Big Data
1.6 Different Types of Data
1.7 Big Data Infrastructure
1.8 Big Data Life Cycle
1.9 Big Data Technology
1.10 Big Data Applications
1.11 Big Data Use Cases
Chapter 1 Refresher

2 Big Data Storage Concepts
2.1 Cluster Computing
2.2 Distribution Models
2.3 Distributed File System
2.4 Relational and Non-Relational Databases
2.5 Scaling Up and Scaling Out Storage
Chapter 2 Refresher

3 NoSQL Database
3.1 Introduction to NoSQL
3.2 Why NoSQL
3.3 CAP Theorem
3.4 ACID
3.5 BASE
3.6 Schemaless Databases
3.7 NoSQL (Not Only SQL)
3.8 Migrating from RDBMS to NoSQL
Chapter 3 Refresher

4 Processing, Management Concepts, and Cloud Computing
Part I: Big Data Processing and Management Concepts
4.1 Data Processing
4.2 Shared Everything Architecture
4.3 Shared-Nothing Architecture
4.4 Batch Processing
4.5 Real-Time Data Processing
4.6 Parallel Computing
4.7 Distributed Computing
4.8 Big Data Virtualization
Part II: Managing and Processing Big Data in Cloud Computing
4.9 Introduction
4.10 Cloud Computing Types
4.11 Cloud Services
4.12 Cloud Storage
4.13 Cloud Architecture
Chapter 4 Refresher

5 Driving Big Data with Hadoop Tools and Technologies
5.1 Apache Hadoop
5.2 Hadoop Storage
5.3 Hadoop Computation
5.4 Hadoop 2.0
5.5 HBASE
5.6 Apache Cassandra
5.7 SQOOP
5.8 Flume
5.9 Apache Avro
5.10 Apache Pig
5.11 Apache Mahout
5.12 Apache Oozie
5.13 Apache Hive
5.14 Hive Architecture
5.15 Hadoop Distributions
Chapter 5 Refresher

6 Big Data Analytics
6.1 Terminology of Big Data Analytics
6.2 Big Data Analytics
6.3 Data Analytics Life Cycle
6.4 Big Data Analytics Techniques
6.5 Semantic Analysis
6.6 Visual Analysis
6.7 Big Data Business Intelligence
6.8 Big Data Real-Time Analytics Processing
6.9 Enterprise Data Warehouse
Chapter 6 Refresher

7 Big Data Analytics with Machine Learning
7.1 Introduction to Machine Learning
7.2 Machine Learning Use Cases
7.3 Types of Machine Learning
Chapter 7 Refresher

8 Mining Data Streams and Frequent Itemset
8.1 Itemset Mining
8.2 Association Rules
8.3 Frequent Itemset Generation
8.4 Itemset Mining Algorithms
8.5 Maximal and Closed Frequent Itemset
8.6 Mining Maximal Frequent Itemsets: the GenMax Algorithm
8.7 Mining Closed Frequent Itemsets: the Charm Algorithm
8.8 CHARM Algorithm Implementation
8.9 Data Mining Methods
8.10 Prediction
8.11 Important Terms Used in Bayesian Network
8.12 Density Based Clustering Algorithm
8.13 DBSCAN
8.14 Kernel Density Estimation
8.15 Mining Data Streams
8.16 Time Series Forecasting

9 Cluster Analysis
9.1 Clustering
9.2 Distance Measurement Techniques
9.3 Hierarchical Clustering
9.4 Analysis of Protein Patterns in the Human Cancer-Associated Liver
9.5 Recognition Using Biometrics of Hands
9.6 Expectation Maximization Clustering Algorithm
9.7 Representative-Based Clustering
9.8 Methods of Determining the Number of Clusters
9.9 Optimization Algorithm
9.10 Choosing the Number of Clusters
9.11 Bayesian Analysis of Mixtures
9.12 Fuzzy Clustering
9.13 Fuzzy C-Means Clustering

10 Big Data Visualization
10.1 Big Data Visualization
10.2 Conventional Data Visualization Techniques
10.3 Tableau
10.4 Bar Chart in Tableau
10.5 Line Chart
10.6 Pie Chart
10.7 Bubble Chart
10.8 Box Plot
10.9 Tableau Use Cases
10.10 Installing R and Getting Ready
10.11 Data Structures in R
10.12 Importing Data from a File
10.13 Importing Data from a Delimited Text File
10.14 Control Structures in R
10.15 Basic Graphs in R

Index

BALAMURUGAN BALUSAMY, PhD, is a Professor with the School of Computing Science and Engineering at Galgotias University, Greater Noida, India.

NANDHINI ABIRAMI. R is an IT Consultant and Research Scholar at VIT University in Vellore.

SEIFEDINE KADRY, PhD, is a Professor of Data Science at the Faculty of Applied Computing and Technology at Noroff University College, Kristiansand, Norway.

AMIR H. GANDOMI, PhD, is a Professor of Data Science at the Faculty of Engineering & Information Technology, University of Technology Sydney, Australia.