About This Book
Foreword
Introduction

Stage 1 Source (aka Siloed Data)

Chapter 1 Starting with Source Data
    Common Options for Analyzing Source Data
Chapter 2 The Need to Replicate Source Data
    Replicate Sources
    Create Read-Only Access
Chapter 3 Source Data Best Practices
    Keep a Complexity Wiki Page
    Snippet Dictionary
    Use a BI Product
    Double Check Results
    Keep Short Dashboards
    Design Before Building

Stage 2 Data Lake (aka Data Combined)

Chapter 4 Why Build a Data Lake?
    What Is a Data Lake?
    Reasons to Build a Data Lake Summarized
Chapter 5 Choosing an Engine for the Data Lake
    Modern Columnar Warehouse Engines
    Modern Warehouse Engine Products
    Database Engines
    Recommendation
Chapter 6 Extract and Load (EL) Data
    ETL versus ELT
    EL/ETL Vendors
    Extract Options
    Load Options
    Multiple Schemas
    Other Extract and Load Routes
Chapter 7 Data Lake Security
    Access in Central Place
    Permission Tiers
Chapter 8 Data Lake Maintenance
    Why SQL?
    Data Sources
    Performance
    Upgrade Snippets to Views

Stage 3 Data Warehouse (aka the Single Source of Truth)

Chapter 9 The Power of Layers and Views
    Make Readable Views
    Layer Views on Views
    Start with a Single View
Chapter 10 Staging Schemas
    Orient to the Schemas
    Pick a Table and Clean It
    Other Staging Modeling Considerations
    Building on Top of Staging Schemas
Chapter 11 Model Data with dbt
    Version Control
    Modularity and Reusability
    Package Management
    Organizing Files
    Macros
    Incremental Tables
    Testing
Chapter 12 Deploy Modeling Code
    Branch Using Version Control Software
    Commit Message
    Test Locally
    Code Review
    Schedule Runs
Chapter 13 Implementing the Data Warehouse
    Manage Dependencies
    Combine Tables Within Schemas
    Combine Tables Across Schemas
    Keep the Grain Consistent
    Create Business Metrics
    Keeping Accurate History
Chapter 14 Managing Data Access
    How to Secure Sensitive Data in the Data Warehouse
    How to Secure Sensitive Data in a BI Tool
Chapter 15 Maintaining the Source of Truth
    Track New Metrics
    Deprecate Old Metrics
    Deprecate Old Schemas
    Resolve Conflicting Numbers
    Handling Ongoing Requests and Ongoing Feedback
    Updating Modeling Code
    Manage Access
    Tuning to Optimize
    Code Review All Modeling
    Maintenance Checklist

Stage 4 Data Marts (aka Data Democratized)

Chapter 16 Data Mart Implementation
    Views on the Data Warehouse
    Segment Tables
    Access Update
Chapter 17 Data Mart Maintenance
    Educate Team
    Identifies Issues
    Identify New Needs
    Help Track Success
Chapter 18 Modern versus Traditional Data Stacks: What's Changed?
    What's Changed?
Chapter 19 Row- versus Column-Oriented Database
    Row-Oriented Databases
    Column-Oriented Databases
    Summary
Chapter 20 Style Guide Example
    Simplify
    Clean
    Naming Conventions
    Share It
Chapter 21 Building an SST Example
    First Attempt--Same Tables with Prefixes
    Second Attempt--Operational Schema (Source Agnostic)
    Third Attempt--Application Separate, Other Sources Smashed
    Less Planning, More Implementing

Acknowledgments and Contributions
Index
MATT DAVID is the Product Marketing Manager for Platform Data at Atlassian. He formerly worked at Chartio as the Head of Data and before that at Udacity as Product Lead for the School of Data Science.

DAVE FOWLER is Head of Analytics and Visualization at Atlassian and Founder of Chartio. He has worked in business intelligence for over ten years. His professional focus is on enabling anyone and everyone to explore and understand their data.