


ISBN-13: 9781119544081 / English / Hardcover / 2020 / 208 pp.
1 Introduction
1.1 Why Managers Need to Know About Data Science
1.2 The New Age of Data Literacy
1.3 Data-Driven Development
1.4 How to Use this Book
2 The Business Side of Data Science
2.1 What Is Data Science?
2.1.1 What Data Scientists Do
2.1.2 History of Data Science
2.1.3 Data Science Roadmap
2.1.4 Demystifying the Terms: Data Science, Machine Learning, Statistics, and Business Intelligence
2.1.4.1 Machine Learning
2.1.4.2 Statistics
2.1.4.3 Business Intelligence
2.1.5 What Data Scientists Don't (Necessarily) Do
2.1.5.1 Working Without Data
2.1.5.2 Working with Data that Can't Be Interpreted
2.1.5.3 Replacing Subject Matter Experts
2.1.5.4 Designing Mathematical Algorithms
2.2 Data Science in an Organization
2.2.1 Types of Value Added
2.2.1.1 Business Insights
2.2.1.2 Intelligent Products
2.2.1.3 Building Analytics Frameworks
2.2.1.4 Offline Batch Analytics
2.2.2 One-Person Shops and Data Science Teams
2.2.3 Related Job Roles
2.2.3.1 Data Engineer
2.2.3.2 Data Analyst
2.2.3.3 Software Engineer
2.3 Hiring Data Scientists
2.3.1 Do I Even Need Data Science?
2.3.2 The Simplest Option: Citizen Data Scientists
2.3.3 The Harder Option: Dedicated Data Scientists
2.3.4 Programming, Algorithmic Thinking, and Code Quality
2.3.5 Hiring Checklist
2.3.6 Data Science Salaries
2.3.7 Bad Hires and Red Flags
2.3.8 Advice with Data Science Consultants
2.4 Management Failure Cases
2.4.1 Using Them as Devs
2.4.2 Inadequate Data
2.4.3 Using Them as Graph Monkeys
2.4.4 Nebulous Questions
2.4.5 Laundry Lists of Questions Without Prioritization
3 Working with Modern Data
3.1 Unstructured Data and Passive Collection
3.2 Data Types and Sources
3.3 Data Formats
3.3.1 CSV Files
3.3.2 JSON Files
3.3.3 XML and HTML
3.4 Databases
3.4.1 Relational Databases and Document Stores
3.4.2 Database Operations
3.5 Data Analytics Software Architectures
3.5.1 Shared Storage
3.5.2 Shared Relational Database
3.5.3 Document Store+Analytics RDB
3.5.4 Storage+Parallel Processing
4 Telling the Story, Summarizing Data
4.1 Choosing What to Measure
4.2 Outliers, Visualizations, and the Limits of Summary Statistics: A Picture Is Worth a Thousand Numbers
4.3 Experiments, Correlation, and Causality
4.4 Summarizing One Number
4.5 Key Properties to Assess: Central Tendency, Spread, and Heavy Tails
4.5.1 Measuring Central Tendency
4.5.1.1 Mean
4.5.1.2 Median
4.5.1.3 Mode
4.5.2 Measuring Spread
4.5.2.1 Standard Deviation
4.5.2.2 Percentiles
4.5.3 Advanced Material: Managing Heavy Tails
4.6 Summarizing Two Numbers: Correlations and Scatterplots
4.6.1 Correlations
4.6.1.1 Pearson Correlation
4.6.1.2 Ordinal Correlations
4.6.2 Mutual Information
4.7 Advanced Material: Fitting a Line or Curve
4.7.1 Effects of Outliers
4.7.2 Optimization and Choosing Cost Functions
4.8 Statistics: How to Not Fool Yourself
4.8.1 The Central Concept: The p-Value
4.8.2 Reality Check: Picking a Null Hypothesis and Modeling Assumptions
4.8.3 Advanced Material: Parameter Estimation and Confidence Intervals
4.8.4 Advanced Material: Statistical Tests Worth Knowing
4.8.4.1 Chi-square Test
4.8.4.2 T-test
4.8.4.3 Fisher's Exact Test
4.8.4.4 Multiple Hypothesis Testing
4.8.5 Bayesian Statistics
4.9 Advanced Material: Probability Distributions Worth Knowing
4.9.1 Probability Distributions: Discrete and Continuous
4.9.2 Flipping Coins: Bernoulli Distribution
4.9.3 Adding Coin Flips: Binomial Distribution
4.9.4 Throwing Darts: Uniform Distribution
4.9.5 Bell-Shaped Curves: Normal Distribution
4.9.6 Heavy Tails 101: Log-Normal Distribution
4.9.7 Waiting Around: Exponential Distribution and the Geometric Distribution
4.9.8 Time to Failure: Weibull Distribution
4.9.9 Counting Events: Poisson Distribution
5 Machine Learning
5.1 Supervised Learning, Unsupervised Learning, and Binary Classifiers
5.1.1 Reality Check: Getting Labeled Data and Assuming Independence
5.1.2 Feature Extraction and the Limitations of Machine Learning
5.1.3 Overfitting
5.1.4 Cross-Validation Strategies
5.2 Measuring Performance
5.2.1 Confusion Matrices
5.2.2 ROC Curves
5.2.3 Area Under the ROC Curve
5.2.4 Selecting Classification Cutoffs
5.2.5 Other Performance Metrics
5.2.6 Lift Curves
5.3 Advanced Material: Important Classifiers
5.3.1 Decision Trees
5.3.2 Random Forests
5.3.3 Ensemble Classifiers
5.3.4 Support Vector Machines
5.3.5 Logistic Regression
5.3.6 Lasso Regression
5.3.7 Naive Bayes
5.3.8 Neural Nets
5.4 Structure of the Data: Unsupervised Learning
5.4.1 The Curse of Dimensionality
5.4.2 Principal Component Analysis and Factor Analysis
5.4.2.1 Scree Plots and Understanding Dimensionality
5.4.2.2 Factor Analysis
5.4.2.3 Limitations of PCA
5.4.3 Clustering
5.4.3.1 Real-World Assessment of Clusters
5.4.3.2 k-means Clustering
5.4.3.3 Advanced Material: Other Clustering Algorithms
5.4.3.4 Advanced Material: Evaluating Cluster Quality
5.5 Learning as You Go: Reinforcement Learning
5.5.1 Multi-Armed Bandits and Epsilon-Greedy Algorithms
5.5.2 Markov Decision Processes and Q-Learning
6 Knowing the Tools
6.1 A Note on Learning to Code
6.2 Cheat Sheet
6.3 Parts of the Data Science Ecosystem
6.3.1 Scripting Languages
6.3.2 Technical Computing Languages
6.3.2.1 Python's Technical Computing Stack
6.3.2.2 R
6.3.2.3 Matlab and Octave
6.3.2.4 Mathematica
6.3.2.5 SAS
6.3.2.6 Julia
6.3.3 Visualization
6.3.3.1 Tableau
6.3.3.2 Excel
6.3.3.3 D3.js
6.3.4 Databases
6.3.5 Big Data
6.3.5.1 Types of Big Data Technologies
6.3.5.2 Spark
6.3.6 Advanced Material: The Map-Reduce Paradigm
6.4 Advanced Material: Database Query Crash Course
6.4.1 Basic Queries
6.4.2 Groups and Aggregations
6.4.3 Joins
6.4.4 Nesting Queries
7 Deep Learning and Artificial Intelligence
7.1 Overview of AI
7.1.1 Don't Fear the Skynet: Strong and Weak AI
7.1.2 System 1 and System 2
7.2 Neural Networks
7.2.1 What Neural Nets Can and Can't Do
7.2.2 Enough Boilerplate: What's a Neural Net?
7.2.3 Convolutional Neural Nets
7.2.4 Advanced Material: Training Neural Networks
7.2.4.1 Manual Versus Automatic Feature Extraction
7.2.4.2 Dataset Sizes and Data Augmentation
7.2.4.3 Batches and Epochs
7.2.4.4 Transfer Learning
7.2.4.5 Feature Extraction
7.2.4.6 Word Embeddings
7.3 Natural Language Processing
7.3.1 The Great Divide: Language Versus Statistics
7.3.2 Save Yourself Some Trouble: Consider Regular Expressions
7.3.3 Software and Datasets
7.3.4 Key Issue: Vectorization
7.3.5 Bag-of-Words
7.4 Knowledge Bases and Graphs
Postscript
Index
Field Cady is a data scientist and author in the Seattle area. Most of his career has focused on consulting for clients of all sizes in a range of industries. More recently, he focused on using AI to mine scientific literature at the Allen Institute for Artificial Intelligence. His previous book, The Data Science Handbook, was published in 2017. His work has been covered in Wired, MIT Press, and the Wall Street Journal, among others.





