ISBN-13: 9781119741756 / English / Paperback / 2021 / 304 pages
Introduction xix

Part I Motivation for Ethical Data Science and Background Knowledge 1

Chapter 1 Responsible Data Science 3
    The Optum Disaster 4
    Jekyll and Hyde 5
    Eugenics 7
    Galton, Pearson, and Fisher 7
    Ties between Eugenics and Statistics 7
    Ethical Problems in Data Science Today 9
    Predictive Models 10
    From Explaining to Predicting 10
    Predictive Modeling 11
    Setting the Stage for Ethical Issues to Arise 12
    Classic Statistical Models 12
    Black-Box Methods 14
    Important Concepts in Predictive Modeling 19
    Feature Selection 19
    Model-Centric vs. Data-Centric Models 20
    Holdout Sample and Cross-Validation 20
    Overfitting 21
    Unsupervised Learning 22
    The Ethical Challenge of Black Boxes 23
    Two Opposing Forces 24
    Pressure for More Powerful AI 24
    Public Resistance and Anxiety 24
    Summary 25

Chapter 2 Background: Modeling and the Black-Box Algorithm 27
    Assessing Model Performance 27
    Predicting Class Membership 28
    The Rare Class Problem 28
    Lift and Gains 28
    Area Under the Curve 29
    AUC vs. Lift (Gains) 31
    Predicting Numeric Values 32
    Goodness-of-Fit 32
    Holdout Sets and Cross-Validation 33
    Optimization and Loss Functions 34
    Intrinsically Interpretable Models vs. Black-Box Models 35
    Ethical Challenges with Interpretable Models 38
    Black-Box Models 39
    Ensembles 39
    Nearest Neighbors 41
    Clustering 41
    Association Rules 42
    Collaborative Filters 42
    Artificial Neural Nets and Deep Neural Nets 43
    Problems with Black-Box Predictive Models 45
    Problems with Unsupervised Algorithms 47
    Summary 48

Chapter 3 The Ways AI Goes Wrong, and the Legal Implications 49
    AI and Intentional Consequences by Design 50
    Deepfakes 50
    Supporting State Surveillance and Suppression 51
    Behavioral Manipulation 52
    Automated Testing to Fine-Tune Targeting 53
    AI and Unintended Consequences 55
    Healthcare 56
    Finance 57
    Law Enforcement 58
    Technology 60
    The Legal and Regulatory Landscape around AI 61
    Ignorance Is No Defense: AI in the Context of Existing Law and Policy 63
    A Finger in the Dam: Data Rights, Data Privacy, and Consumer Protection Regulations 64
    Trends in Emerging Law and Policy Related to AI 66
    Summary 69

Part II The Ethical Data Science Process 71

Chapter 4 The Responsible Data Science Framework 73
    Why We Keep Building Harmful AI 74
    Misguided Need for Cutting-Edge Models 74
    Excessive Focus on Predictive Performance 74
    Ease of Access and the Curse of Simplicity 76
    The Common Cause 76
    The Face Thieves 78
    An Anatomy of Modeling Harms 79
    The World: Context Matters for Modeling 80
    The Data: Representation Is Everything 83
    The Model: Garbage In, Danger Out 85
    Model Interpretability: Human Understanding for Superhuman Models 86
    Efforts Toward a More Responsible Data Science 89
    Principles Are the Focus 90
    Nonmaleficence 90
    Fairness 90
    Transparency 91
    Accountability 91
    Privacy 92
    Bridging the Gap Between Principles and Practice with the Responsible Data Science (RDS) Framework 92
    Justification 94
    Compilation 94
    Preparation 95
    Modeling 96
    Auditing 96
    Summary 97

Chapter 5 Model Interpretability: The What and the Why 99
    The Sexist Résumé Screener 99
    The Necessity of Model Interpretability 101
    Connections Between Predictive Performance and Interpretability 103
    Uniting (High) Model Performance and Model Interpretability 105
    Categories of Interpretability Methods 107
    Global Methods 107
    Local Methods 113
    Real-World Successes of Interpretability Methods 113
    Facilitating Debugging and Audit 114
    Leveraging the Improved Performance of Black-Box Models 116
    Acquiring New Knowledge 116
    Addressing Critiques of Interpretability Methods 117
    Explanations Generated by Interpretability Methods Are Not Robust 118
    Explanations Generated by Interpretability Methods Are Low Fidelity 120
    The Forking Paths of Model Interpretability 121
    The Four-Measure Baseline 122
    Building Our Own Credit Scoring Model 124
    Using Train-Test Splits 125
    Feature Selection and Feature Engineering 125
    Baseline Models 127
    The Importance of Making Your Code Work for Everyone 129
    Execution Variability 129
    Addressing Execution Variability with Functionalized Code 130
    Stochastic Variability 130
    Addressing Stochastic Variability via Resampling 130
    Summary 133

Part III EDS in Practice 135

Chapter 6 Beginning a Responsible Data Science Project 137
    How the Responsible Data Science Framework Addresses the Common Cause 138
    Datasets Used 140
    Regression Datasets--Communities and Crime 140
    Classification Datasets--COMPAS 140
    Common Elements Across Our Analyses 141
    Project Structure and Documentation 141
    Project Structure for the Responsible Data Science Framework: Everything in Its Place 142
    Documentation: The Responsible Thing to Do 145
    Beginning a Responsible Data Science Project 151
    Communities and Crime (Regression) 151
    Justification 151
    Compilation 154
    Identifying Protected Classes 157
    Preparation--Data Splitting and Feature Engineering 159
    Datasheets 161
    COMPAS (Classification) 164
    Justification 164
    Compilation 166
    Identifying Protected Classes 168
    Preparation 169
    Summary 172

Chapter 7 Auditing a Responsible Data Science Project 173
    Fairness and Data Science in Practice 175
    The Many Different Conceptions of Fairness 175
    Different Forms of Fairness Are Trade-Offs with Each Other 177
    Quantifying Predictive Fairness Within a Data Science Project 179
    Mitigating Bias to Improve Fairness 185
    Preprocessing 185
    In-processing 186
    Postprocessing 186
    Classification Example: COMPAS 187
    Prework: Code Practices, Modeling, and Auditing 187
    Justification, Compilation, and Preparation Review 189
    Modeling 191
    Auditing 200
    Per-Group Metrics: Overall 200
    Per-Group Metrics: Error 202
    Fairness Metrics 204
    Interpreting Our Models: Why Are They Unfair? 207
    Analysis for Different Groups 209
    Bias Mitigation 214
    Preprocessing: Oversampling 214
    Postprocessing: Optimizing Thresholds Automatically 218
    Postprocessing: Optimizing Thresholds Manually 219
    Summary 223

Chapter 8 Auditing for Neural Networks 225
    Why Neural Networks Merit Their Own Chapter 227
    Neural Networks Vary Greatly in Structure 227
    Neural Networks Treat Features Differently 229
    Neural Networks Repeat Themselves 231
    A More Impenetrable Black Box 232
    Baseline Methods 233
    Representation Methods 233
    Distillation Methods 234
    Intrinsic Methods 235
    Beginning a Responsible Neural Network Project 236
    Justification 236
    Moving Forward 239
    Compilation 239
    Tracking Experiments 241
    Preparation 244
    Modeling 245
    Auditing 247
    Per-Group Metrics: Overall 247
    Per-Group Metrics: Unusual Definitions of "False Positive" 248
    Fairness Metrics 249
    Interpreting Our Models: Why Are They Unfair? 252
    Bias Mitigation 253
    Wrap-Up 255
    Auditing Neural Networks for Natural Language Processing 258
    Identifying and Addressing Sources of Bias in NLP 258
    The Real World 259
    Data 260
    Models 261
    Model Interpretability 262
    Summary 262

Chapter 9 Conclusion 265
    How Can We Do Better? 267
    The Responsible Data Science Framework 267
    Doing Better As Managers 269
    Doing Better As Practitioners 270
    A Better Future If We Can Keep It 271

Index 273
GRANT FLEMING is a Data Scientist at Elder Research, Inc. His professional focus is on machine learning for social science applications, model interpretability, civic technology, and building software tools for reproducible data science.

PETER BRUCE is the Senior Learning Officer at Elder Research, Inc., author of several best-selling texts on data science, and Founder of the Institute for Statistics Education at Statistics.com, an Elder Research Company.