ISBN-13: 9783319644097 / English / Hardcover / 2018 / 367 pp.
This textbook is aimed at computer science undergraduates in their late sophomore or early junior year, supplying a comprehensive background in qualitative and quantitative data analysis, probability, random variables, and statistical methods, including machine learning. With careful treatment of topics that fill the curricular needs for the course, Probability and Statistics for Computer Science features:
- A treatment of random variables and expectations dealing primarily with the discrete case.
- A practical treatment of simulation, showing how many interesting probabilities and expectations can be extracted, with particular emphasis on Markov chains.
- A clear but crisp account of simple point inference strategies (maximum likelihood; Bayesian inference) in simple contexts. This is extended to cover some confidence intervals, samples and populations for random sampling with replacement, and the simplest hypothesis testing.
- A chapter dealing with classification, explaining why it's useful; how to train SVM classifiers with stochastic gradient descent; and how to use implementations of more advanced methods such as random forests and nearest neighbors.
- A chapter dealing with regression, explaining how to set up, use and understand linear regression and nearest neighbors regression in practical problems.
- A chapter dealing with principal components analysis, developing intuition carefully, and including numerous practical examples. There is a brief description of multidimensional scaling via principal coordinate analysis.
- A chapter dealing with clustering via agglomerative methods and k-means, showing how to build vector quantized features for complex signals.

Illustrated throughout, each main chapter includes many worked examples and other pedagogical elements such as boxed Procedures, Definitions, Useful Facts, and Remember This (short tips). Problems and Programming Exercises are at the end of each chapter, with a summary of what the reader should know. Instructor resources include a full set of model solutions for all problems, and an Instructor's Manual with accompanying presentation slides.
1 Notation and conventions 9
1.0.1 Background Information........................................................................ 10
1.1 Acknowledgements................................................................................. 11
I Describing Datasets
2 First Tools for Looking at Data 13
2.1 Datasets...................................................................................................... 13
2.2 What’s Happening? - Plotting Data................................................... 15
2.2.1 Bar Charts........................................................................................ 16
2.2.2 Histograms................................................................................................... 16
2.2.3 How to Make Histograms...................................................................... 17
2.2.4 Conditional Histograms.......................................................................... 19
2.3 Summarizing 1D Data............................................................................ 19
2.3.1 The Mean.......................................................................................... 20
2.3.2 Standard Deviation........................................................................ 22
2.3.3 Computing Mean and Standard Deviation Online............... 26
2.3.4 Variance......................................................................................................... 26
2.3.5 The Median.................................................................................................. 27
2.3.6 Interquartile Range........................................................................ 29
2.3.7 Using Summaries Sensibly.......................................................... 30
2.4 Plots and Summaries............................................................................................. 31
2.4.1 Some Properties of Histograms................................................. 31
2.4.2 Standard Coordinates and Normal Data................................. 34
2.4.3 Box Plots....................................................................................................... 38
2.5 Whose is bigger? Investigating Australian Pizzas........................ 39
2.6 You should.................................................................................................. 43
2.6.1 remember these definitions:................................................................. 43
2.6.2 remember these terms............................................................................ 43
2.6.3 remember these facts:............................................................................. 43
2.6.4 be able to...................................................................................................... 43
3 Looking at Relationships 47
3.1 Plotting 2D Data...................................................................................... 47
3.1.1 Categorical Data, Counts, and Charts
3.1.2 Series.................................................................................................. 51
3.1.3 Scatter Plots for Spatial Data.................................................... 53
3.1.4 Exposing Relationships with Scatter Plots..................................... 54
3.2 Correlation.................................................................................................................. 57
3.2.1 The Correlation Coefficient................................................................... 60
3.2.2 Using Correlation to Predict....................................................... 64
3.2.3 Confusion caused by correlation............................................... 68
3.4 You should.................................................................................................................. 72
3.4.1 remember these definitions:................................................................. 72
3.4.2 remember these terms............................................................................ 72
3.4.3 remember these facts:................................................................... 72
3.4.4 use these procedures:.................................................................... 72
3.4.5 be able to:......................................................................................... 72
II Probability 78
4 Basic Ideas in Probability 79
4.1 Experiments, Outcomes and Probability....................................................... 79
4.1.1 Outcomes and Probability........................................................... 79
4.2 Events.......................................................................................................... 81
4.2.1 Computing Event Probabilities by Counting Outcomes..... 83
4.2.2 The Probability of Events...................................................................... 87
4.2.3 Computing Probabilities by Reasoning about Sets...................... 89
4.3 Independence............................................................................................................ 92
4.3.1 Example: Airline Overbooking............................................................ 96
4.4 Conditional Probability.......................................................................... 99
4.4.1 Evaluating Conditional Probabilities...................................... 100
4.4.2 Detecting Rare Events is Hard......................................................... 104
4.4.3 Conditional Probability and Various Forms of Independence 106
4.4.4 The Prosecutor’s Fallacy............................................................ 108
4.4.5 Example: The Monty Hall Problem........................................ 110
4.5 Extra Worked Examples...................................................................... 112
4.5.1 Outcomes and Probability......................................................... 112
4.5.2 Events............................................................................................... 114
4.5.3 Independence........................................................................................... 115
4.5.4 Conditional Probability......................................................................... 117
4.6 You should............................................................................................................... 121
4.6.1 remember these definitions:.............................................................. 121
4.6.2 remember these terms................................................................ 121
4.6.3 remember and use these facts................................................. 121
4.6.4 remember these points:.............................................................. 121
4.6.5 be able to.................................................................................................... 121
5 Random Variables and Expectations 128
5.1 Random Variables.................................................................................. 128
5.1.1 Joint and Conditional Probability for Random Variables 131
5.1.2 Just a Little Continuous Probability...................................... 134
5.2 Expectations and Expected Values................................................... 137
5.2.1 Expected Values...................................................................................... 138
5.2.2 Mean, Variance and Covariance....................................................... 141
5.2.3 Expectations and Statistics....................................................... 145
5.3 The Weak Law of Large Numbers.................................................... 145
5.3.1 IID Samples.................................................................................... 145
5.3.2 Two Inequalities............................................................................ 146
5.3.3 Proving the Inequalities............................................................. 147
5.3.4 The Weak Law of Large Numbers.................................................. 149
5.4 Using the Weak Law of Large Numbers 151
5.4.1 Should you accept a bet?........................................................... 151
5.4.2 Odds, Expectations and Bookmaking — a Cultural Diversion 152
5.4.3 Ending a Game Early.................................................................. 154
5.4.4 Making a Decision with Decision Trees and Expectations 154
5.4.5 Utility................................................................................................ 156
5.5 You should................................................................................................ 159
5.5.1 remember these definitions:...................................................... 159
5.5.2 remember these terms......................................................................... 159
5.5.3 use and remember these facts................................................. 159
5.5.4 be able to........................................................................................ 160
6 Useful Probability Distributions 167
6.1 Discrete Distributions........................................................................... 167
6.1.1 The Discrete Uniform Distribution................................................. 167
6.1.2 Bernoulli Random Variables...................................................... 168
6.1.3 The Geometric Distribution....................................................... 168
6.1.4 The Binomial Probability Distribution................................... 169
6.1.5 Multinomial probabilities........................................................... 171
6.1.6 The Poisson Distribution..................................................................... 172
6.2 Continuous Distributions..................................................................... 174
6.2.1 The Continuous Uniform Distribution................................... 174
6.2.2 The Beta Distribution........................................................................... 174
6.2.3 The Gamma Distribution..................................................................... 176
6.2.4 The Exponential Distribution............................................................ 176
6.3 The Normal Distribution...................................................................... 178
6.3.1 The Standard Normal Distribution......................................... 178
6.3.2 The Normal Distribution............................................................ 179
6.3.3 Properties of The Normal Distribution......................................... 180
6.4 Approximating Binomials with Large N 182
6.4.1 Large N............................................................................................ 183
6.4.2 Getting Normal.............................................................................. 185
6.4.3 Using a Normal Approximation to the Binomial Distribution 187
6.5 You should................................................................................................ 188
6.5.1 remember these definitions:...................................................... 188
6.5.2 remember these terms:............................................................... 188
6.5.3 remember these facts:................................................................. 188
6.5.4 remember these points:.............................................................. 188
III Inference
7 Samples and Populations 197
7.1 The Sample Mean................................................................................................. 197
7.1.1 The Sample Mean is an Estimate of the Population Mean . . 197
7.1.2 The Variance of the Sample Mean......................................... 198
7.1.3 When The Urn Model Works................................................... 201
7.1.4 Distributions are Like Populations................................................. 202
7.2 Confidence Intervals............................................................................................ 203
7.2.1 Constructing Confidence Intervals.................................................. 203
7.2.2 Estimating the Variance of the Sample Mean.................... 204
7.2.3 The Probability Distribution of the Sample Mean............. 206
7.2.4 Confidence Intervals for Population Means......................... 208
7.2.5 Standard Error Estimates from Simulation................................. 212
7.3 You should............................................................................................................... 216
7.3.1 remember these definitions:.............................................................. 216
7.3.2 remember these terms......................................................................... 216
7.3.3 remember these facts:........................................................................... 216
7.3.4 use these procedures................................................................... 216
7.3.5 be able to........................................................................................ 216
8 The Significance of Evidence 221
8.1 Significance............................................................................................... 222
8.1.1 Evaluating Significance.............................................................. 223
8.1.2 P-values............................................................................................ 225
8.2 Comparing the Mean of Two Populations.................................................. 230
8.2.1 Assuming Known Population Standard Deviations................... 231
8.2.2 Assuming Same, Unknown Population Standard Deviation . 233
8.2.3 Assuming Different, Unknown Population Standard Deviation 235
8.3 Other Useful Tests of Significance.................................................... 237
8.3.1 F-tests and Standard Deviations............................................. 237
8.3.2 χ² Tests of Model Fit.................................................................. 239
8.4 Dangerous Behavior............................................................................................. 244
8.5 You should............................................................................................................... 246
8.5.1 remember these definitions:...................................................... 246
8.5.2 remember these terms:............................................................... 246
8.5.3 remember these facts:................................................................. 246
8.5.4 use these procedures:.................................................................. 246
8.5.5 be able to:....................................................................................... 246
9 Experiments 251
9.1 A Simple Experiment: The Effect of a Treatment....................... 251
9.1.1 Randomized Balanced Experiments............................................... 252
9.1.2 Decomposing Error in Predictions.......................................... 253
9.1.3 Estimating the Noise Variance................................................ 253
9.1.4 The ANOVA Table.................................................................................. 255
9.1.5 Unbalanced Experiments.................................................................... 257
9.1.6 Significant Differences.......................................................................... 259
9.2 Two Factor Experiments.................................................................................... 261
9.2.1
9.2.2 Interaction Between Effects................................................................ 265
9.2.3 The Effects of a Treatment................................................................. 266
9.2.4 Setting up an ANOVA Table..................................................... 267
9.3 You should................................................................................................ 272
9.3.1 remember these definitions:.............................................................. 272
9.3.2 remember these terms................................................................ 272
9.3.3 remember these facts:................................................................. 272
9.3.4 use these procedures................................................................... 272
9.3.5 be able to.................................................................................................... 272
9.3.6 Two-Way Experiments.......................................................................... 274
10 Inferring Probability Models from Data 275
10.1 Estimating Model Parameters with Maximum Likelihood........ 275
10.1.1 The Maximum Likelihood Principle............................................... 277
10.1.2 Binomial, Geometric and Multinomial Distributions................ 278
10.1.3 Poisson and Normal Distributions................................................... 281
10.1.4 Confidence Intervals for Model Parameters........................ 286
10.1.5 Cautions about Maximum Likelihood................................... 288
10.2 Incorporating Priors with Bayesian Inference.............................. 289
10.2.1 Conjugacy........................................................................................ 292
10.2.2 MAP Inference......................................................................................... 294
10.2.3 Cautions about Bayesian Inference................................................. 296
10.3 Bayesian Inference for Normal Distributions............................................ 296
10.3.1 Example: Measuring Depth of a Borehole................................... 296
10.3.2 Normal Prior and Normal Likelihood Yield Normal Posterior 297
10.3.3 Filtering............................................................................................ 300
10.4 You should................................................................................................ 303
10.4.1 remember these definitions:.............................................................. 303
10.4.2 remember these terms......................................................................... 303
10.4.3 remember these facts:........................................................................... 304
10.4.4 use these procedures............................................................................. 304
10.4.5 be able to........................................................................................ 304
IV Tools 312
11 Extracting Important Relationships in High Dimensions 313
11.1 Summaries and Simple Plots.............................................................. 313
11.1.1 The Mean......................................................................................... 314
11.1.2 Stem Plots and Scatterplot Matrices..................................... 315
11.1.3 Covariance.................................................................................................. 317
11.1.4 The Covariance Matrix......................................................................... 319
11.2 Using Mean and Covariance to Understand High Dimensional Data 321
11.2.1 Mean and Covariance under Affine Transformations....... 322
11.2.2 Eigenvectors and Diagonalization........................................... 324
11.2.3 Diagonalizing Covariance by Rotating Blobs...................... 326
11.2.4 Approximating Blobs................................................................... 327
11.2.5 Example: Transforming the Height-Weight Blob............... 329
11.3 Principal Components Analysis......................................................... 332
11.3.1 Example: Representing Colors with Principal Components 334
11.3.2 Example: Representing Faces with Principal Components 335
11.4 Multi-Dimensional Scaling................................................................... 335
11.4.1 Choosing Low D Points using High D Distances............... 338
11.4.2 Factoring a Dot-Product Matrix.............................................. 339
11.4.3 Example: Mapping with Multidimensional Scaling........... 341
11.5 Example: Understanding Height and Weight
11.6 You should................................................................................................ 345
11.6.1 remember these definitions:...................................................... 345
11.6.2 remember these terms:............................................................... 345
11.6.3 remember these facts:................................................................. 345
11.6.4 use these procedures:.................................................................. 345
11.6.5 be able to:....................................................................................... 345
12 Learning to Classify 349
12.1 Classification: The Big Ideas.............................................................. 349
12.1.1 The Error Rate.............................................................................. 350
12.1.2 Overfitting....................................................................................... 350
12.1.3 Cross-Validation............................................................................ 351
12.1.4 Is the Classifier Working Well?............................................... 351
12.2 Classifying with Nearest Neighbors.................................................. 353
12.3 Classifying with Naive Bayes.............................................................. 355
12.3.1 Missing Data.................................................................................. 357
12.4 The Support Vector Machine.............................................................. 358
12.4.1 Choosing a Classifier with the Hinge Loss.......................... 359
12.4.2 Finding a Minimum: General Points...................................... 360
12.4.3 Finding a Minimum: Stochastic Gradient Descent............ 361
12.4.4 Example: Training an SVM with Stochastic Gradient Descent 363
12.4.5 Multi-Class Classification with SVMs.................................... 366
12.5 Classifying with Random Forests...................................................... 367
12.5.1 Building a Decision Tree..................................................................... 367
12.5.2 Choosing a Split with Information Gain........................................ 370
12.5.3 Forests......................................................................................................... 373
12.5.4 Building and Evaluating a Decision Forest.......................... 374
12.5.5 Classifying Data Items with a Decision Forest................... 375
12.6 You should................................................................................................ 378
12.6.1 remember these definitions:...................................................... 378
12.6.2 remember these terms................................................................ 378
12.6.3 remember these facts:........................................................................... 379
12.6.4 use these procedures............................................................................. 379
12.6.5 be able to.................................................................................................... 379
13 Clustering: Models of High Dimensional Data
13.1 The Curse of Dimension
13.1.2 Minor Banes of Dimension.................................................................. 386
13.2 The Multivariate Normal Distribution.............................................. 387
13.2.1 Affine Transformations and Gaussians.................................. 387
13.2.2 Plotting a 2D Gaussian: Covariance Ellipses....................... 388
13.3 Agglomerative and Divisive Clustering............................................ 389
13.3.1 Clustering and Distance....................................................................... 391
13.4 The K-Means Algorithm and Variants.............................................. 392
13.4.1 How to choose K........................................................................... 395
13.4.2 Soft Assignment....................................................................................... 397
13.4.3 General Comments on K-Means....................................................... 400
13.4.4 K-Medoids....................................................................................... 400
13.5 Application Example: Clustering Documents........................................... 401
13.5.1 A Topic Model................................................................................ 402
13.6 Describing Repetition with Vector Quantization........................... 403
13.6.1 Vector Quantization..................................................................... 404
13.6.2 Example: Groceries in Portugal....................................................... 406
13.6.3 Efficient Clustering and Hierarchical K Means.......................... 409
13.6.4 Example: Activity from Accelerometer Data............................... 409
13.7 You should............................................................................................................... 413
13.7.1 remember these definitions:.............................................................. 413
13.7.2 remember these terms................................................................ 413
13.7.3 remember these facts:................................................................. 413
13.7.4 use these procedures................................................................... 413
14 Regression 417
14.0.1 Regression to Make Predictions.............................................. 417
14.0.2 Regression to Spot Trends........................................................ 419
14.1 Linear Regression and Least Squares.............................................. 421
14.1.1 Linear Regression......................................................................... 421
14.1.2 Choosing β....................................................................................... 422
14.1.3 Solving the Least Squares Problem....................................... 423
14.1.4 Residuals.......................................................................................... 424
14.1.5 R-squared......................................................................................... 424
14.2 Producing Good Linear Regressions............................................................. 427
14.2.1 Transforming Variables........................................................................ 428
14.2.2 Problem Data Points have Significant Impact............................ 431
14.2.3 Functions of One Explanatory Variable........................................ 433
14.2.4 Regularizing Linear Regressions...................................................... 435
14.3 Exploiting Your Neighbors
14.3.1 Using your Neighbors to Predict More than a Number... 441
14.3.2 Example: Filling Large Holes with Whole Images.................... 441
14.4 You should................................................................................................ 444
14.4.1 remember these definitions:...................................................... 444
14.4.2 remember these terms:............................................................... 444
14.4.3 remember these facts:........................................................................... 444
14.4.4 remember these procedures:..................................................... 444
15 Markov Chains and Hidden Markov Models 454
15.1 Markov Chains......................................................................................... 454
15.1.1 Transition Probability Matrices........................................................ 457
15.1.2 Stationary Distributions.............................................................. 459
15.1.3 Example: Markov Chain Models of Text............................... 462
15.2 Estimating Properties of Markov Chains.................................................... 465
15.2.1 Simulation........................................................................................ 465
15.2.2 Simulation Results as Random Variables............................. 467
15.2.3 Simulating Markov Chains.................................................................. 469
15.3 Example: Ranking the Web by Simulating a Markov Chain................ 472
15.4 Hidden Markov Models and Dynamic Programming................... 473
15.4.1 Hidden Markov Models............................................................... 474
15.4.2 Picturing Inference with a Trellis............................................ 474
15.4.3 Dynamic Programming for HMM’s: Formalities................. 478
15.4.4 Example: Simple Communication Errors............................... 478
15.5 You should................................................................................................ 481
15.5.1 remember these definitions:.............................................................. 481
15.5.2 remember these terms......................................................................... 481
15.5.3 remember these facts:........................................................................... 481
15.5.4 be able to.................................................................................................... 481
V Some Mathematical Background 484
16 Resources 485
16.1 Useful Material about Matrices.......................................................... 485
16.1.1 The Singular Value Decomposition......................................... 486
16.1.2 Approximating A Symmetric Matrix............................................... 487
16.2 Some Special Functions..................................................................................... 489
16.3 Finding Nearest Neighbors.................................................................. 490
16.4 Entropy and Information Gain............................................................ 493
David Alexander Forsyth is the Fulton Watson Copp Chair in Computer Science at the University of Illinois at Urbana-Champaign, where he is a leading researcher in computer vision.
A Fellow of the ACM (2014) and IEEE (2009), Forsyth has also been recognized with the IEEE Computer Society’s Technical Achievement Award (2005), the Marr Prize, and a prize for best paper in cognitive computer vision (ECCV 2002). Many of his former students are famous in their own right as academics or industry leaders.
He is the co-author, with Jean Ponce, of Computer Vision: A Modern Approach (2002; 2011), a leading textbook on the topic that has been published in four languages. He also pursues a variety of odd hobbies.