Preface; 1 Prologue; 2 Statistical Learning: Concepts; 3 Statistical Learning: Practical Aspects; 4 Logistic Regression; 5 Lasso and Friends; 6 Working with Text Data; 7 Nearest Neighbors; 8 The Naive Bayes Classifier; 9 Trees; 10 Random Forests; 11 Boosting; 12 Support Vector Machines; 13 Feature Engineering; 14 Neural Networks; 15 Stacking; Index.
Matthias Schonlau is a Professor in the Department of Statistics and Actuarial Science at the University of Waterloo, Canada. Prior to his academic career, he spent 14 years at the RAND Corporation, USA; the Max Planck Institute for Human Development in Berlin, Germany; the German Institute for Economic Research (DIW); the National Institute of Statistical Sciences, USA; and AT&T Labs Research, USA. He won the Humboldt Prize and was elected a Fellow of the American Statistical Association. He has published more than 80 peer-reviewed articles and is the lead author of the book Conducting Research Surveys via E-Mail and the Web (RAND Corporation).
This textbook provides an accessible overview of statistical learning methods and includes case studies using the statistical software Stata. After introductory material on statistical learning concepts and practical aspects, each subsequent chapter is devoted to a statistical learning algorithm or a group of related techniques. In particular, the book presents logistic regression, regularized linear models such as the Lasso, nearest neighbors, the Naive Bayes classifier, classification trees, random forests, boosting, support vector machines, feature engineering, neural networks, and stacking. It also explains how to construct n-gram variables from text data. Examples, conceptual exercises, and software exercises are featured throughout, together with case studies in Stata, mostly from the social sciences, in keeping with the book’s goal of facilitating the use of modern data science methods in that field. Although intended mainly for upper-level undergraduate and graduate students in the social sciences, the book’s applied nature means it will appeal equally to readers from other disciplines, including the health sciences, statistics, engineering, and computer science.