ISBN-13: 9781119819455 / English / Paperback / 2021 / 416 pp.
Introduction; Assessment Test

Chapter 1 History of Analytics and Big Data
Evolution of Analytics Architecture Over the Years; The New World Order; Analytics Pipeline; Data Sources; Collection; Storage; Processing and Analysis; Visualization, Predictive and Prescriptive Analytics; The Big Data Reference Architecture; Data Characteristics: Hot, Warm, and Cold; Collection/Ingest; Storage; Process/Analyze; Consumption; Data Lakes and Their Relevance in Analytics; What is a Data Lake?; Building a Data Lake on AWS; Step 1: Choosing the Right Storage - Amazon S3 is the Base; Step 2: Data Ingestion - Moving the Data into the Data Lake; Step 3: Cleanse, Prep, and Catalog the Data; Step 4: Secure the Data and Metadata; Step 5: Make Data Available for Analytics; Using Lake Formation to Build a Data Lake on AWS; Exam Objectives; Objective Map; Assessment Test; References

Chapter 2 Data Collection
Exam Objectives; AWS IoT; Common Use Cases for AWS IoT; How AWS IoT Works; Amazon Kinesis; Amazon Kinesis Introduction; Amazon Kinesis Data Streams; Amazon Kinesis Data Analytics; Amazon Kinesis Video Streams; AWS Glue; Glue Data Catalog; Glue Crawlers; Authoring ETL Jobs; Executing ETL Jobs; Change Data Capture with Glue Bookmarks; Use Cases for AWS Glue; Amazon SQS; Amazon Data Migration Service; What is AWS DMS Anyway?; What Does AWS DMS Support?; AWS Data Pipeline; Pipeline Definition; Pipeline Schedules; Task Runner; Large-Scale Data Transfer Solutions; AWS Snowcone; AWS Snowball; AWS Snowmobile; AWS Direct Connect; Summary; Review Questions; References; Exercises & Workshops

Chapter 3 Data Storage
Introduction; Amazon S3; Amazon S3 Data Consistency Model; Data Lake and S3; Data Replication in Amazon S3; Server Access Logging in Amazon S3; Partitioning, Compression, and File Formats on S3; Amazon S3 Glacier; Vault; Archive; Amazon DynamoDB; Amazon DynamoDB Data Types; Amazon DynamoDB Core Concepts; Read/Write Capacity Mode in DynamoDB; DynamoDB Auto Scaling and Reserved Capacity; Read Consistency and Global Tables; Amazon DynamoDB: Indexing and Partitioning; Amazon DynamoDB Accelerator; Amazon DynamoDB Streams; Amazon DynamoDB Streams - Kinesis Adapter; Amazon DocumentDB; Why a Document Database?; Amazon DocumentDB Overview; Amazon Document DB Architecture; Amazon DocumentDB Interfaces; Graph Databases and Amazon Neptune; Amazon Neptune Overview; Amazon Neptune Use Cases; Storage Gateway; Hybrid Storage Requirements; AWS Storage Gateway; Amazon EFS; Amazon EFS Use Cases; Interacting with Amazon EFS; Amazon EFS Security Model; Backing Up Amazon EFS; Amazon FSx for Lustre; Key Benefits of Amazon FSx for Lustre; Use Cases for Lustre; AWS Transfer for SFTP; Summary; Exercises; Review Questions; Further Reading; References

Chapter 4 Data Processing and Analysis
Introduction; Types of Analytical Workloads; Amazon Athena; Apache Presto; Apache Hive; Amazon Athena Use Cases and Workloads; Amazon Athena DDL, DML, and DCL; Amazon Athena Workgroups; Amazon Athena Federated Query; Amazon Athena Custom UDFs; Using Machine Learning with Amazon Athena; Amazon EMR; Apache Hadoop Overview; Amazon EMR Overview; Apache Hadoop on Amazon EMR; EMRFS; Bootstrap Actions and Custom AMI; Security on EMR; EMR Notebooks; Apache Hive and Apache Pig on Amazon EMR; Apache Spark on Amazon EMR; Apache HBase on Amazon EMR; Apache Flink, Apache Mahout, and Apache MXNet; Choosing the Right Analytics Tool; Amazon Elasticsearch Service; When to Use Elasticsearch; Elasticsearch Core Concepts (the ELK Stack); Amazon Elasticsearch Service; Amazon Redshift; What is Data Warehousing?; What is Redshift?; Redshift Architecture; Redshift AQUA; Redshift Scalability; Data Modeling in Redshift; Data Loading and Unloading; Query Optimization in Redshift; Security in Redshift; Kinesis Data Analytics; How Does It Work?; What is Kinesis Data Analytics for Java?; Comparing Batch Processing Services; Comparing Orchestration Options on AWS; AWS Step Functions; Comparing Different ETL Orchestration Options; Summary; Exam Essentials; Exercises; Review Questions; References; Recommended Workshops; Amazon Athena Blogs; Amazon Redshift Blogs; Amazon EMR Blogs; Amazon Elasticsearch Blog; Amazon Redshift References and Further Reading

Chapter 5 Data Visualization
Introduction; Data Consumers; Data Visualization Options; Amazon QuickSight; Getting Started; Working with Data; Data Preparation; Data Analysis; Data Visualization; Machine Learning Insights; Building Dashboards; Embedding QuickSight Objects into Other Applications; Administration; Security; Other Visualization Options; Predictive Analytics; What is Predictive Analytics?; The AWS ML Stack; Summary; Exam Essentials; Exercises; Review Questions; References; Additional Reading Material

Chapter 6 Data Security
Introduction; Shared Responsibility Model; Security Services on AWS; AWS IAM Overview; IAM User; IAM Groups; IAM Roles; Amazon EMR Security; Public Subnet; Private Subnet; Security Configurations; Block Public Access; VPC Subnets; Security Options during Cluster Creation; EMR Security Summary; Amazon S3 Security; Managing Access to Data in Amazon S3; Data Protection in Amazon S3; Logging and Monitoring with Amazon S3; Best Practices for Security on Amazon S3; Amazon Athena Security; Managing Access to Amazon Athena; Data Protection in Amazon Athena; Data Encryption in Amazon Athena; Amazon Athena and AWS Lake Formation; Amazon Redshift Security; Levels of Security within Amazon Redshift; Data Protection in Amazon Redshift; Redshift Auditing; Redshift Logging; Amazon Elasticsearch Security; Elasticsearch Network Configuration; VPC Access; Accessing Amazon Elasticsearch and Kibana; Data Protection in Amazon Elasticsearch; Amazon Kinesis Security; Managing Access to Amazon Kinesis; Data Protection in Amazon Kinesis; Amazon Kinesis Best Practices; Amazon QuickSight Security; Managing Data Access with Amazon QuickSight; Data Protection; Logging and Monitoring; Security Best Practices; Amazon DynamoDB Security; Access Management in DynamoDB; IAM Policy with Fine-Grained Access Control; Identity Federation; How to Access Amazon DynamoDB; Data Protection with DynamoDB; Monitoring and Logging with DynamoDB; Summary; Exam Essentials; Exercises/Workshops; Review Questions; References and Further Reading

Appendix Answers to Review Questions
Chapter 1: History of Analytics and Big Data; Chapter 2: Data Collection; Chapter 3: Data Storage; Chapter 4: Data Processing and Analysis; Chapter 5: Data Visualization; Chapter 6: Data Security

Index
ASIF ABBASI has over 20 years of experience in data and analytics engineering, consulting, and advisory roles, helping some of the largest customers across the globe become more data driven. Asif is the author of Learning Apache Spark 2.0 and is an AWS Certified Data Analytics & Machine Learning Specialist, AWS Certified Solutions Architect (Professional), Hortonworks Certified Hadoop Professional and Administrator, Certified Spark Developer, SAS Certified Predictive Modeler, and Sun Certified Enterprise Architect. Asif is also a Project Management Professional.