Introduction xixChapter 1 Data Sources 1Data Sources 1Tools for Connecting to Data Sources and Editing SQL 2Relational Databases 3Dimensional Data Warehouses 7Asking Questions About the Data Source 9Introduction to the Farmer's Market Database 11A Note on Machine Learning Dataset Terminology 12Exercises 13Chapter 2 The SELECT Statement 15The SELECT Statement 15The Fundamental Syntax Structure of a SELECT Query 16Selecting Columns and Limiting the Number of Rows Returned 16The ORDER BY Clause: Sorting Results 18Introduction to Simple Inline Calculations 20More Inline Calculation Examples: Rounding 22More Inline Calculation Examples: Concatenating Strings 24Evaluating Query Output 26SELECT Statement Summary 29Exercises Using the Included Database 30Chapter 3 The WHERE Clause 31The WHERE Clause 31Filtering SELECT Statement Results 32Filtering on Multiple Conditions 34Multi-Column Conditional Filtering 40More Ways to Filter 41BETWEEN 41IN 42LIKE 43IS NULL 44A Warning About Null Comparisons 44Filtering Using Subqueries 46Exercises Using the Included Database 47Chapter 4 CASE Statements 49CASE Statement Syntax 50Creating Binary Flags Using CASE 52Grouping or Binning Continuous Values Using CASE 53Categorical Encoding Using CASE 56CASE Statement Summary 59Exercises Using the Included Database 60Chapter 5 SQL JOINs 61Database Relationships and SQL JOINs 61A Common Pitfall when Filtering Joined Data 71JOINs with More than Two Tables 74Exercises Using the Included Database 76Chapter 6 Aggregating Results for Analysis 79GROUP BY Syntax 79Displaying Group Summaries 80Performing Calculations Inside Aggregate Functions 84MIN and MAX 88COUNT and COUNT DISTINCT 90Average 91Filtering with HAVING 93CASE Statements Inside Aggregate Functions 94Exercises Using the Included Database 96Chapter 7 Window Functions and Subqueries 97ROW NUMBER 98RANK and DENSE RANK 101NTILE 102Aggregate Window Functions 103LAG and LEAD 108Exercises Using the Included Database 111Chapter 8 Date and Time Functions 113Setting datetime Field Values 114EXTRACT and DATE_PART 115DATE_ADD and DATE_SUB 116DATEDIFF 118TIMESTAMPDIFF 119Date Functions in Aggregate Summaries and Window Functions 119Exercises 126Chapter 9 Exploratory Data Analysis with SQL 127Demonstrating Exploratory Data Analysis with SQL 128Exploring the Products Table 128Exploring Possible Column Values 131Exploring Changes Over Time 134Exploring Multiple Tables Simultaneously 135Exploring Inventory vs. Sales 138Exercises 142Chapter 10 Building SQL Datasets for Analytical Reporting 143Thinking Through Analytical Dataset Requirements 144Using Custom Analytical Datasets in SQL:CTEs and Views 149Taking SQL Reporting Further 153Exercises 157Chapter 11 More Advanced Query Structures 159UNIONs 159Self-Join to Determine To-Date Maximum 163Counting New vs. Returning Customers by Week 167Summary 171Exercises 171Chapter 12 Creating Machine Learning Datasets Using SQL 173Datasets for Time Series Models 174Datasets for Binary Classification 176Creating the Dataset 178Expanding the Feature Set 181Feature Engineering 185Taking Things to the Next Level 189Exercises 189Chapter 13 Analytical Dataset Development Examples 191What Factors Correlate with Fresh Produce Sales? 191How Do Sales Vary by Customer Zip Code,Market Distance, and Demographic Data? 211How Does Product Price Distribution AffectMarket Sales? 217Chapter 14 Storing and Modifying Data 229Storing SQL Datasets as Tables and Views 229Adding a Timestamp Column 232Inserting Rows and Updating Values in Database Tables 233Using SQL Inside Scripts 236In Closing 237Exercises 238Appendix Answers to Exercises 239Index 255
RENÉE M. P. TEATE is the Director of Data Science at HelioCampus, a higher ed tech startup based in the Washington, DC area. She prepares datasets with SQL, develops predictive models with Python, and designs interactive dashboards in Tableau for university decision-makers. She created the "Becoming a Data Scientist" podcast, helped build the data science learning community on Twitter, and is a sought-after speaker at industry conferences.