A Brief Tour of Statistics
This work by Thomas Lotze is licensed under a Creative Commons
Attribution 3.0 United States License. You’re free to share or change
it, so long as you provide attribution to Thomas Lotze.
http://creativecommons.org/licenses/by/3.0/us
• What are Distributions?
• Models
– Binomial
– Poisson
– Uniform
– Gaussian (normal)
• Using Distributions to Answer Questions
• Distribution Parameters
• Estimating Parameters
• Confidence Intervals and P-values
• Why Gaussian?
• Regression
• Logistic Regression
• Why Not Gaussian?
• Bootstrapping
• Multiple Testing
• Useful Tools
Creative Commons licensed, from gr0don on flickr
Creative Commons licensed, from WxMom on flickr
Creative Commons licensed, from stringberd on flickr
Creative Commons licensed, from benchilada on flickr
Creative Commons licensed, from benchilada on flickr
Creative Commons licensed, from Jason Tromm on flickr
Creative Commons licensed, from Rickydavid on flickr
Creative Commons licensed, from Antoine Taveneaux on Wikimedia Commons
Creative Commons licensed, from gr0don on flickr
21/38
Actual Field Goal Rate | Probability of 21 or more successes
0.44 | 0.1
0.41 | 0.05
0.34 | 0.01
0.30 | 0.001
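The table can be approximated with a binomial model. This is a minimal sketch, assuming the 21 successes come from 38 independent attempts (reading the "21/38" slide as the observed record) with a constant success rate; the helper names `binom_tail` and `rate_for_tail` are hypothetical. For each tail probability it searches by bisection for the rate at which P(X ≥ 21) equals that probability, which is the idea behind a one-sided Clopper-Pearson lower bound.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rate_for_tail(alpha: float, k: int, n: int, tol: float = 1e-10) -> float:
    """Find the success rate p at which P(X >= k) equals alpha.

    The tail probability increases with p, so bisection on [0, 1] converges.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_tail(k, n, mid) < alpha:
            lo = mid  # tail too small: the rate must be higher
        else:
            hi = mid  # tail large enough: the rate can be lower
    return (lo + hi) / 2


if __name__ == "__main__":
    SUCCESSES, ATTEMPTS = 21, 38  # assumption: taken from the "21/38" slide
    for alpha in (0.1, 0.05, 0.01, 0.001):
        p = rate_for_tail(alpha, SUCCESSES, ATTEMPTS)
        print(f"P(X >= {SUCCESSES}) = {alpha:<6} when the true rate is {p:.2f}")
```

If the printed rates drift noticeably from the table, the assumed number of attempts is the first thing to revisit.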
Creative Commons licensed, from Alan Vernon on flickr
Creative Commons licensed, from CharlotteKinzie on flickr
Probability of False Detection (per test) | Number of Tests | Probability of False Detection (in at least 1 test)
0.05 | 1 | 0.05
0.05 | 10 | 0.40
0.05 | 100 | 0.99
0.01 | 1 | 0.01
0.01 | 10 | 0.10
0.01 | 100 | 0.63
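The numbers in this table follow from treating the tests as independent: if each test has false-detection probability α, the chance of no false detection in m tests is (1 − α)^m, so the chance of at least one is 1 − (1 − α)^m. A minimal sketch under that independence assumption (the function name is hypothetical):

```python
def prob_any_false_detection(alpha: float, num_tests: int) -> float:
    """Chance of at least one false detection among independent tests.

    Each test independently has probability `alpha` of a false detection,
    so the chance that none of them fires is (1 - alpha) ** num_tests.
    """
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        for m in (1, 10, 100):
            p = prob_any_false_detection(alpha, m)
            print(f"per-test {alpha}, {m:>3} tests -> {p:.2f}")
```

Rounded to two decimals, the six printed values match the table. With correlated tests the true figure can be lower, so this is a worst-case-style picture for independent tests.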
• R: http://cran.r-project.org/
• Casella & Berger, Statistical Inference
• Wikipedia: http://en.wikipedia.org/wiki/List_of_probability_distributions
• Salsburg, The Lady Tasting Tea
1. Everything is a Distribution
2. Many Kinds of “Random” (many Distributions)
3. Estimated Parameters are Random: They have Distributions!
4. Statistical Decisions come from Distribution Estimates
5. Be Skeptical of Normality: Mean and Variance are not sufficient!
6. Be Skeptical of Multiple Testing
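Point 5 can be made concrete with two distributions that share a mean and a variance but disagree sharply in the tail. This is a minimal sketch, assuming an exponential distribution with rate 1 (mean 1, variance 1) as the non-normal example, compared against a normal with the same mean and variance; both tail probabilities come from closed-form formulas, so no simulation is involved.

```python
from math import erfc, exp, sqrt

MEAN, SD = 1.0, 1.0  # shared by Exponential(rate=1) and Normal(1, 1)


def normal_upper_tail(x: float, mu: float, sigma: float) -> float:
    """P(X > x) for X ~ Normal(mu, sigma**2), via the complementary error function."""
    return 0.5 * erfc((x - mu) / (sigma * sqrt(2)))


def exponential_upper_tail(x: float, rate: float) -> float:
    """P(X > x) for X ~ Exponential(rate), valid for x >= 0."""
    return exp(-rate * x)


if __name__ == "__main__":
    for k in (2, 3, 4):
        x = MEAN + k * SD  # k standard deviations above the mean
        n = normal_upper_tail(x, MEAN, SD)
        e = exponential_upper_tail(x, rate=1.0)
        print(f"{k} SD above mean: normal {n:.5f}, exponential {e:.5f}, ratio {e / n:.0f}x")
```

At three standard deviations above the mean, the normal model gives about 0.0013 while the exponential gives about 0.018, more than ten times as much, even though a summary by mean and variance alone cannot tell the two apart.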
• Practical Take-home
– Normality test
– T-test
– Wilcoxon rank-sum test
• Other Distributions
– Student’s T
– F
– Lognormal
– Geometric
– Levy
– Weibull
– Benford
• Goodness of fit
– Chi-squared test
– Q-Q plots
• Distribution Connections
• Multivariate Distributions
• Bias/Variance Tradeoff
• Nonparametric Distributions
• Model Comparison: Parameters and Fit (AIC, BIC)
• Bayesian Statistics
• Bayes’ Law
• Cognitive Biases
• Time series
• Counterintuitive Probability
• Monty Hall
• Two Aces
• Poisson Waiting Times
• Markov Chain Monte Carlo
• Extreme Value Statistics