"The present book is built as an accessible, yet thorough introduction to data analysis using Python as programming environment. ... The style of the book and textbook-like presentation of concepts recommend it as a good starting point for novices who wish either to understand more about data analysis or wish to learn Python through meaningful examples." (Irina Ioana Mohorianu, zbMATH 1393.68002, 2018)
Table of Contents
1. Introduction
How to use this book
Installing iPython Notebook
What is iPython notebook?
What is Anaconda?
Getting Started
Getting the datasets for the workbook’s exercises
2. Getting Data into and out of Python
Loading Data from CSV Files
Saving Data to CSV
Loading Data from Excel Files
Saving Data to Excel Files
Combining Data from Multiple Excel Files
Loading Data from SQL
Saving Data to SQL
Random Numbers and Creating Random Data
3. Preparing Data is Half the Battle
Cleaning Data
Calculating and Removing Outliers
Missing Data in Pandas Dataframes
Filtering Inappropriate Values
Finding Duplicate Rows
Removing Punctuation from Column Contents
Removing Whitespace from Column Contents
Standardizing Dates
Standardizing Text like SSN’s, Phone #’s and Zip Codes
Creating New Variables
Binning Data
Applying Function to Groups, Bins and Columns
Ranking Rows of Data
Create a Column Based on a Conditional
Making New Columns Using Functions
Converting String Categories to Numeric Variables
Organizing the Data
Removing and Adding Columns
Selecting Columns
Change Column Name
Setting Column Names to Lower Case
Finding Matching Rows
Filter Rows Based on Conditions
Selecting Rows Based on Conditions
Random Sampling Dataframe
4. Finding the Meaning
Computing Aggregate Statistics
Computing Aggregate Statistics on Matching Rows
Sorting Data
Correlation
Regression
Regression without Intercept
Basic Pivot Table
Random Sampling Dataframe
Selecting Pandas DataFrame Rows Based on Conditions
Distribution Analysis
Categorical Variable Analysis
Time Series Analysis
5. Visualizing Data
Data Quality Report
Graph a Dataset - Line Plot
Graph a Dataset - Bar Plot
Graph a Dataset - Box Plot
Graph a Dataset - Histogram
Graph a Dataset - Pie Chart
Graph a Dataset - Scatter Plot
Plotting w/ Image
Plotting Data on a Map with Basemap
Plotting a Gantt Chart
Setting ticks, labels & grids
Adding legends & annotations
Moving Spines to the Center
6. Practice Problems
Pivot Exercise 1
Pivot Exercise 2
Pivot Exercise 3
Legend
Regression Exercise 1
Regression Exercise 2
Regression Exercise 3
Analysis Project
Notes
AJ Henley teaches courses on data analysis using Python, Java and more. He is a technology educator with over 20 years' experience as a developer, designer and systems engineer. He is an instructor at Howard University and Montgomery College.
Dave Wolf is a certified Project Management Professional (PMP) with over twenty years' experience as a software developer, analyst and trainer. His latest projects include collaboratively developing training materials and programming bootcamps for Java and Python.
Get started using Python in data analysis with this compact practical guide. This book includes three exercises and a case study on getting data in and out of Python code in the right format. Learn Data Analysis with Python also helps you discover meaning in the data using analysis and shows you how to visualize it.
Each lesson is, as much as possible, self-contained to allow you to dip in and out of the examples as your needs dictate. If you are already using Python for data analysis, you will find a number of things that you wish you knew how to do in Python. You can then take these techniques and apply them directly to your own projects.
If you aren’t using Python for data analysis, this book takes you through the basics at the beginning to give you a solid foundation in the topic. By the time you have worked your way through the book, you will have a better idea of how to use Python for data analysis.