Introduction
Part 1: Getting Started with Data Lakes
Chapter 1: Jumping into the Data Lake
Chapter 2: Planning Your Day (and the Next Decade) at the Data Lake
Chapter 3: Break Out the Life Vests: Tackling Data Lake Challenges
Part 2: Building the Docks, Avoiding the Rocks
Chapter 4: Imprinting Your Data Lake on a Reference Architecture
Chapter 5: Anybody Hungry? Ingesting and Storing Raw Data in Your Bronze Zone
Chapter 6: Your Data Lake's Water Treatment Plant: The Silver Zone
Chapter 7: Bottling Your Data Lake Water in the Gold Zone
Chapter 8: Playing in the Sandbox
Chapter 9: Fishing in the Data Lake
Chapter 10: Rowing End-to-End across the Data Lake
Part 3: Evaporating the Data Lake into the Cloud
Chapter 11: A Cloudy Day at the Data Lake
Chapter 12: Building Data Lakes in Amazon Web Services
Chapter 13: Building Data Lakes in Microsoft Azure
Part 4: Cleaning Up the Polluted Data Lake
Chapter 14: Figuring Out If You Have a Data Swamp Instead of a Data Lake
Chapter 15: Defining Your Data Lake Remediation Strategy
Chapter 16: Refilling Your Data Lake
Part 5: Making Trips to the Data Lake a Tradition
Chapter 17: Checking Your GPS: The Data Lake Road Map
Chapter 18: Booking Future Trips to the Data Lake
Part 6: The Part of Tens
Chapter 19: Top Ten Reasons to Invest in Building a Data Lake
Chapter 20: Ten Places to Get Help for Your Data Lake
Chapter 21: Ten Differences between a Data Warehouse and a Data Lake
Index
Alan Simon is the managing principal of Thinking Helmet, Inc., the author of 32 books on business technology, and a consultant who's worked with enterprise and government organizations. His professional focus is business intelligence, analytics, and data warehousing. He also teaches university courses in his specialty areas.