ISBN-13: 9781119549840 / English / Hardcover / 2019 / 608 pp.
Foreword by Gareth James
Foreword by Ravi Bapna
Preface to the Python Edition
Acknowledgments

Part I Preliminaries

Chapter 1 Introduction
  1.1 What is Business Analytics?
  1.2 What is Data Mining?
  1.3 Data Mining and Related Terms
  1.4 Big Data
  1.5 Data Science
  1.6 Why are There So Many Different Methods?
  1.7 Terminology and Notation
  1.8 Road Maps to This Book

Chapter 2 Overview of the Data Mining Process
  2.1 Introduction
  2.2 Core Ideas in Data Mining
  2.3 The Steps in Data Mining
  2.4 Preliminary Steps
  2.5 Predictive Power and Overfitting
  2.6 Building a Predictive Model
  2.7 Using Python for Data Mining on a Local Machine
  2.8 Automating Data Mining Solutions
  2.9 Ethical Practice in Data Mining
  Problems

Part II Data Exploration and Dimension Reduction

Chapter 3 Data Visualization
  3.1 Introduction
  3.2 Data Examples
  3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
  3.4 Multidimensional Visualization
  3.5 Specialized Visualizations
  3.6 Summary: Major Visualizations and Operations, by Data Mining Goal
  Problems

Chapter 4 Dimension Reduction
  4.1 Introduction
  4.2 Curse of Dimensionality
  4.3 Practical Considerations
  4.4 Data Summaries
  4.5 Correlation Analysis
  4.6 Reducing the Number of Categories in Categorical Variables
  4.7 Converting a Categorical Variable to a Numerical Variable
  4.8 Principal Components Analysis
  4.9 Dimension Reduction Using Regression Models
  4.10 Dimension Reduction Using Classification and Regression Trees
  Problems

Part III Performance Evaluation

Chapter 5 Evaluating Predictive Performance
  5.1 Introduction
  5.2 Evaluating Predictive Performance
  5.3 Judging Classifier Performance
  5.4 Judging Ranking Performance
  5.5 Oversampling
  Problems

Part IV Prediction and Classification Methods

Chapter 6 Multiple Linear Regression
  6.1 Introduction
  6.2 Explanatory vs. Predictive Modeling
  6.3 Estimating the Regression Equation and Prediction
  6.4 Variable Selection in Linear Regression
  Appendix: Using Statmodels
  Problems

Chapter 7 k-Nearest Neighbors (kNN)
  7.1 The k-NN Classifier (Categorical Outcome)
  7.2 k-NN for a Numerical Outcome
  7.3 Advantages and Shortcomings of k-NN Algorithms
  Problems

Chapter 8 The Naive Bayes Classifier
  8.1 Introduction
  Example 1: Predicting Fraudulent Financial Reporting
  8.2 Applying the Full (Exact) Bayesian Classifier
  8.3 Advantages and Shortcomings of the Naive Bayes Classifier
  Problems

Chapter 9 Classification and Regression Trees
  9.1 Introduction
  9.2 Classification Trees
  9.3 Evaluating the Performance of a Classification Tree
  9.4 Avoiding Overfitting
  9.5 Classification Rules from Trees
  9.6 Classification Trees for More Than Two Classes
  9.7 Regression Trees
  9.8 Improving Prediction: Random Forests and Boosted Trees
  9.9 Advantages and Weaknesses of a Tree
  Problems

Chapter 10 Logistic Regression
  10.1 Introduction
  10.2 The Logistic Regression Model
  10.3 Example: Acceptance of Personal Loan
  10.4 Evaluating Classification Performance
  10.5 Logistic Regression for Multi-class Classification
  10.6 Example of Complete Analysis: Predicting Delayed Flights
  Appendix: Using Statmodels
  Problems

Chapter 11 Neural Nets
  11.1 Introduction
  11.2 Concept and Structure of a Neural Network
  11.3 Fitting a Network to Data
  11.4 Required User Input
  11.5 Exploring the Relationship Between Predictors and Outcome
  11.6 Deep Learning
  11.7 Advantages and Weaknesses of Neural Networks
  Problems

Chapter 12 Discriminant Analysis
  12.1 Introduction
  12.2 Distance of a Record from a Class
  12.3 Fisher's Linear Classification Functions
  12.4 Classification Performance of Discriminant Analysis
  12.5 Prior Probabilities
  12.6 Unequal Misclassification Costs
  12.7 Classifying More Than Two Classes
  12.8 Advantages and Weaknesses
  Problems

Chapter 13 Combining Methods: Ensembles and Uplift Modeling
  13.1 Ensembles
  13.2 Uplift (Persuasion) Modeling
  13.3 Summary
  Problems

Part V Mining Relationships among Records

Chapter 14 Association Rules and Collaborative Filtering
  14.1 Association Rules
  14.2 Collaborative Filtering
  14.3 Summary
  Problems

Chapter 15 Cluster Analysis
  15.1 Introduction
  15.2 Measuring Distance Between Two Records
  15.3 Measuring Distance Between Two Clusters
  15.4 Hierarchical (Agglomerative) Clustering
  15.5 Non-Hierarchical Clustering: The k-Means Algorithm
  Problems

Part VI Forecasting Time Series

Chapter 16 Handling Time Series
  16.1 Introduction
  16.2 Descriptive vs. Predictive Modeling
  16.3 Popular Forecasting Methods in Business
  16.4 Time Series Components
  16.5 Data-Partitioning and Performance Evaluation
  Problems

Chapter 17 Regression-Based Forecasting
  17.1 A Model with Trend
  17.2 A Model with Seasonality
  17.3 A Model with Trend and Seasonality
  17.4 Autocorrelation and ARIMA Models
  Problems

Chapter 18 Smoothing Methods
  18.1 Introduction
  18.2 Moving Average
  18.3 Simple Exponential Smoothing
  18.4 Advanced Exponential Smoothing
  Problems

Part VII Data Analytics

Chapter 19 Social Network Analytics
  19.1 Introduction
  19.2 Directed vs. Undirected Networks
  19.3 Visualizing and Analyzing Networks
  19.4 Social Data Metrics and Taxonomy
  19.5 Using Network Metrics in Prediction and Classification
  19.6 Collecting Social Network Data with Python
  19.7 Advantages and Disadvantages
  Problems

Chapter 20 Text Mining
  20.1 Introduction
  20.2 The Tabular Representation of Text: Term-Document Matrix and "Bag-of-Words"
  20.3 Bag-of-Words vs. Meaning Extraction at Document Level
  20.4 Preprocessing the Text
  20.5 Implementing Data Mining Methods
  20.6 Example: Online Discussions on Autos and Electronics
  20.7 Summary
  Problems

Part VIII Cases

Chapter 21 Cases
  21.1 Charles Book Club
  21.2 German Credit
  21.3 Tayko Software Cataloger
  21.4 Political Persuasion
  21.5 Taxi Cancellations
  21.6 Segmenting Consumers of Bath Soap
  21.7 Direct-Mail Fundraising
  21.8 Catalog Cross-Selling
  21.9 Time Series Case: Forecasting Public Transportation Demand

References
Data Files Used in the Book
Python Utilities Functions
Index
GALIT SHMUELI, PhD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and instructed data mining courses since 2004 at University of Maryland, Statistics.com, Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 100 publications including books.

PETER C. BRUCE is President and Founder of the Institute for Statistics Education at Statistics.com. He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective (Wiley) and co-author of Practical Statistics for Data Scientists: 50 Essential Concepts (O'Reilly).

PETER GEDECK, PhD, is a Senior Data Scientist at Collaborative Drug Discovery, where he helps develop cloud-based software to manage the huge amount of data involved in the drug discovery process. He also teaches data mining at Statistics.com.

NITIN R. PATEL, PhD, is cofounder and board member of Cytel Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years.