Stat 204, Part 3 Probability
Chapter 5: Probability
These notes reflect material from our text, Exploring the Practice of Statistics, by Moore, McCabe, and Craig, published by Freeman, 2014.
Probability
Probability quantifies randomness. It is a formal framework with a very specific vocabulary and notation. Imagine an experiment with a specific set of outcomes (say, flipping a fair coin twice). S is the sample space of all possible outcomes. Subsets of S are called events and are denoted with letters like A and B. The empty set, ∅, is the event that contains no outcomes. Two events are disjoint if their intersection is empty.
The Russian mathematician Kolmogorov helped to clarify the essential properties of a probability function, P.
• P(S) = 1 for the entire sample space S
• 0 ≤ P(A) ≤ 1 for any event A ⊂ S
• P(⋃_{i=1}^{n} A_i) = ∑_{i=1}^{n} P(A_i) for pairwise disjoint events A_1, ..., A_n
First examples: flip a coin, flip three coins, roll a die, roll two dice
If you roll a die once the result is completely uncertain, because the individual outcomes are equally likely. But now begin to methodically roll the die and, after each toss, calculate the total number of 6’s observed so far divided by the total number of rolls at that point. Call this a cumulative proportion and graph these cumulative proportions for a large number of rolls of the die, say 100,000 rolls. A computer did this and displayed the following graph. In this particular simulation, the first ten rolls of the die produced the sequence 0001010010, where 1 means a 6 was rolled and 0 means something else appeared. Calculate the first ten cumulative proportions for this short sequence and compare your results to the following chart. What is the height of the dotted red line?
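[Figure: cumulative proportion p̂_n (vertical axis, 0.00 to 0.30) against n, the number of rolls (horizontal axis, tick marks at 1, 10, 100, 1,000, 10,000, 100,000), with a dotted red reference line.]
Fig. Cumulative proportions of a 6 in 100,000 rolls of a fair die, from OpenIntro Statistics, chapter 2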
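The experiment described above can be reproduced in a few lines of R. This is a minimal sketch and not the code behind the OpenIntro figure; the seed and the variable names are arbitrary choices made for illustration.

```r
# Cumulative proportion of 6's for the ten-roll sequence quoted in the text
# (1 means a 6 was rolled, 0 means something else appeared).
first_ten <- c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0)
p_hat_ten <- cumsum(first_ten) / seq_along(first_ten)
round(p_hat_ten, 3)

# The same calculation for a simulated run of 100,000 rolls of a fair die.
set.seed(204)                      # arbitrary seed, for reproducibility only
n <- 100000
rolls <- sample(1:6, size = n, replace = TRUE)
p_hat <- cumsum(rolls == 6) / seq_len(n)

# Plot the cumulative proportions on a logarithmic horizontal axis.
plot(seq_len(n), p_hat, type = "l", log = "x",
     xlab = "n (number of rolls)", ylab = "cumulative proportion of 6's")
```

A different seed gives a different path for small n, but the right-hand end of the curve should settle near the same height every time.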
Display discrete probabilities in a table
Flip a fair coin
outcome       h     t
probability   0.5   0.5
Spring 2017 Page 1 of 12
Venn diagram
[Figure: Venn diagram of events A and B.]
Rules of Probability
Mutually exclusive events. A ∩ B = ∅
Unions. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Complements. P(A^c) = 1 − P(A).
Independent events. P(A ∩ B) = P(A)P(B) when A and B are independent.
Conditional probability. P(A | B) = P(A ∩ B)/P(B) when P(B) ≠ 0
Intersections. P(A ∩ B) = P(A | B)P(B)
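These rules can be checked numerically on a small sample space. The sketch below enumerates the 36 outcomes of rolling two fair dice; the events A and B and the helper function P are illustrative choices, not part of the notes. Note that inside the R code `|` means "or" for logical vectors, not "given".

```r
# Sample space for rolling two fair dice: 36 equally likely outcomes.
S <- expand.grid(die1 = 1:6, die2 = 1:6)

# With equally likely outcomes, P(event) is the fraction of outcomes in it.
P <- function(event) mean(event)

A <- S$die1 == 1             # first die shows 1
B <- S$die1 + S$die2 == 7    # the two dice sum to 7

# Unions: P(A or B) = P(A) + P(B) - P(A and B)
all.equal(P(A | B), P(A) + P(B) - P(A & B))

# Complements: P(not A) = 1 - P(A)
all.equal(P(!A), 1 - P(A))

# Conditional probability: P(A given B) = P(A and B) / P(B)
P_A_given_B <- P(A & B) / P(B)

# Independence: here P(A and B) = P(A) P(B), so P(A given B) equals P(A).
all.equal(P(A & B), P(A) * P(B))
all.equal(P_A_given_B, P(A))
```

Replacing B with "the two dice sum to 8" gives a pair of events that is not independent, which makes a useful contrast.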
Contingency tables and conditional probabilities
Vocabulary for diagnostic testing, where S means the medical state is present and POS means the test is positive:
sensitivity P(POS | S), specificity P(NEG | S^c), incidence P(S)
Consider the Triple Blood Test for Down Syndrome (Agresti and Franklin, chapter 5, pp. 232-233)
                       Blood Test
Status               POS    NEG   Total
D (Down)              48      6      54
D^c (unaffected)    1307   3921    5228
Total               1355   3927    5282
Calculate the following probabilities based on the figures in this study:
sensitivity P(POS | D), specificity P(NEG | D^c), incidence P(D)
false positives P(POS | D^c), false negatives P(NEG | D)
An individual being tested would be most concerned about P(D | POS). What is this probability? Why is it so small? Hint: Calculate P(D^c | POS).
Again, an individual being tested would want to know P(D | NEG). How would that probability compare to the a priori P(D)?
[Figure: Triple Blood Test; axes: blood test (POS, NEG) and status (Down, unaffected).]
Using R to Compute Conditional Probabilities
Construct a matrix named down to represent the Down Syndrome contingency table, and then use addmargins(down) to compute its row and column totals.
down <- c(48, 1307, 6, 3921)
dim(down) <- c(2, 2)
dimnames(down) <- list(status = c("down", "unaffected"),
                       "blood test" = c("pos", "neg"))
down
#             blood test
# status        pos  neg
#   down         48    6
#   unaffected 1307 3921
addmargins(down)
#             blood test
# status        pos  neg  Sum
#   down         48    6   54
#   unaffected 1307 3921 5228
#   Sum        1355 3927 5282
Then prop.table(down, 1) will divide each row by its row sum. The numbers in each row are conditional probabilities. And prop.table(down, 2) will divide each column by its column sum. The numbers in each column are conditional probabilities. Therefore, each of the eight numbers shown below is a conditional probability of the form P(A | B) for some A and B. Identify the correct A and B for each number.
prop.table(down, 1)
#             blood test
# status             pos       neg
#   down       0.8888889 0.1111111
#   unaffected 0.2500000 0.7500000
prop.table(down, 2)
#             blood test
# status              pos         neg
#   down       0.03542435 0.001527884
#   unaffected 0.96457565 0.998472116
What values do these tables indicate for P(pos | down) and P(down | pos)?
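The two kinds of conditional probability are linked by Bayes' rule, which follows from the conditional probability and intersection rules stated earlier. The sketch below rebuilds the table so that it runs on its own and then checks that P(down | pos) computed from sensitivity, specificity, and incidence matches the entry in the column proportions. The variable names are illustrative, and the formula is a standard identity rather than something derived in these notes.

```r
# Counts from the Triple Blood Test table (rows: status, columns: blood test).
down <- matrix(c(48, 1307, 6, 3921), nrow = 2,
               dimnames = list(status = c("down", "unaffected"),
                               "blood test" = c("pos", "neg")))

by_row <- prop.table(down, 1)   # conditional on status
by_col <- prop.table(down, 2)   # conditional on test result

sensitivity <- by_row["down", "pos"]             # P(pos | down)
specificity <- by_row["unaffected", "neg"]       # P(neg | unaffected)
incidence   <- sum(down["down", ]) / sum(down)   # P(down)

# Bayes' rule: P(down | pos) from sensitivity, specificity and incidence.
p_down_given_pos <- sensitivity * incidence /
  (sensitivity * incidence + (1 - specificity) * (1 - incidence))

# This agrees with the entry read directly from the column proportions.
all.equal(p_down_given_pos, by_col["down", "pos"])
```

The denominator shows why P(down | pos) is small: because the incidence is low, false positives among the many unaffected individuals outnumber true positives.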
Boston Smallpox Epidemic of 1721
The following contingency table (OpenIntro Statistics, pp. 83–87) refers to the Boston smallpox epidemic of 1721. A total of 6224 residents of Boston contracted smallpox in this epidemic and 850 of them died. The epidemic was marked by vigorous public debate of the value (or lack thereof) of a type of inoculation known as variolation (which was dangerous). The Reverend Cotton Mather advocated inoculation but the physician William Douglass was firmly against it. See the article in Harvard’s Contagion for more details. An effective smallpox vaccination procedure was eventually demonstrated by Edward Jenner in England in 1796, and succeeding efforts to eradicate smallpox from the world were finally declared to be successful in 1980 by the World Health Organization. Cotton Mather, on the other hand, lives on in infamy for his role in the Salem witch trials.
             Inoculated
Result     yes     no   Total
lived      238   5136    5374
died         6    844     850
Total      244   5980    6224
[Figure: Smallpox Epidemic, Boston, 1721; axes: inoculated (yes, no) and result (lived, died).]
Tree Diagrams
The following tree diagram, generated by OpenIntro software, summarizes the relevant statistics for the Boston smallpox epidemic of 1721. Here Inoculated is a categorical explanatory variable with levels yes and no. In the Inoculated column of the tree diagram are the probabilities P(yes) and P(no). The categorical response variable Result has levels lived and died. The conditional probabilities in the Result column are
P(lived | yes), P(died | yes), P(lived | no), P(died | no).
The probabilities calculated by the software in the third column are
P(lived and yes), P(died and yes), P(lived and no), P(died and no),
because P(A) × P(B | A) = P(A ∩ B).
Inoculated      Result
yes, 0.0392     lived, 0.9754    0.0392*0.9754 = 0.03824
                died, 0.0246     0.0392*0.0246 = 0.00096
no, 0.9608      lived, 0.8589    0.9608*0.8589 = 0.82523
                died, 0.1411     0.9608*0.1411 = 0.13557
Fig. Smallpox in Boston, 1721, from OpenIntro Statistics, chapter 2, pp.83-87
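The three columns of the tree can be recomputed from the contingency table. The following sketch is one way to do it in base R and is not the OpenIntro software mentioned above; the object names are illustrative.

```r
# Counts from the smallpox table (rows: result, columns: inoculated).
smallpox <- matrix(c(238, 6, 5136, 844), nrow = 2,
                   dimnames = list(result = c("lived", "died"),
                                   inoculated = c("yes", "no")))

# First column of the tree: P(yes) and P(no).
p_inoculated <- colSums(smallpox) / sum(smallpox)

# Second column: P(result | inoculated); each column sums to 1.
p_result_given_inoc <- prop.table(smallpox, 2)

# Third column: P(result and inoculated) = P(inoculated) * P(result | inoculated).
p_joint <- sweep(p_result_given_inoc, 2, p_inoculated, `*`)

round(p_inoculated, 4)
round(p_result_given_inoc, 4)
round(p_joint, 5)

# The joint probabilities are just the cell counts divided by the grand total.
all.equal(p_joint, smallpox / sum(smallpox))
```

Small differences in the last digit between these joint probabilities and the tree are expected, since the tree multiplies factors that were already rounded to four places.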
Random variables
A random variable is a function from the sample space, S, of an experiment to the real numbers, X : S → R, so we might characterize a random variable as a function which assigns a numerical value to an outcome of an experiment. Random variables can be defined on discrete and on continuous sample spaces.
discrete: flip a coin, flip three coins, roll a die, roll two dice, roulette wheel, spinner
continuous: random number generators: U [0, 1], N(0, 1), N(µ, σ)
Expected value of a random variable, E(X) = µ_X
Variance of a random variable, Var(X) = σ_X^2
Linear combinations of random variables, Y = aX_1 + bX_2
Expected value and variance of a linear combination of random variables. If Y = aX_1 + bX_2, then
E(Y) = aE(X_1) + bE(X_2),
and, when X_1 and X_2 are independent,
Var(Y) = a^2 Var(X_1) + b^2 Var(X_2).
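A small exact calculation illustrates these formulas. In the sketch below X_1 and X_2 are independent rolls of a fair die and the coefficients a = 2 and b = 3 are hypothetical values chosen only for the example. The variance is computed in its population form (dividing by the number of outcomes), so R's var(), which divides by one less, is not used.

```r
# X1, X2: independent rolls of a fair die (an illustrative choice).
x <- 1:6
p <- rep(1/6, 6)

mu_x  <- sum(x * p)                 # E(X) = 3.5
var_x <- sum((x - mu_x)^2 * p)      # Var(X) = 35/12

# Y = a*X1 + b*X2 with hypothetical coefficients a = 2, b = 3.
a <- 2
b <- 3
grid <- expand.grid(x1 = x, x2 = x)   # 36 equally likely pairs
y <- a * grid$x1 + b * grid$x2

mu_y  <- mean(y)                    # exact E(Y) over the 36 outcomes
var_y <- mean((y - mu_y)^2)         # exact Var(Y), population form

all.equal(mu_y,  a * mu_x + b * mu_x)
all.equal(var_y, a^2 * var_x + b^2 * var_x)
```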
Distributions of random variables
Calculation of probability using a continuous distribution, P(X ≤ x). The area of the blue region in the following figure is the probability that the random variable X ∼ N(µ, σ) takes on a value less than or equal to 5. That probability is denoted P(X ≤ 5).
[Figure: density curve of X ~ N(µ, σ), plotted as y against x for x from -3 to 9, with the region x ≤ 5 shaded blue.]
Normal distributions
Normal random variable, X ∼ N(µ, σ). z-score, z = (x − µ)/σ. If z = (x − µ)/σ, then x = µ + z × σ.
Standardized normal random variable, Z ∼ N(0, 1). If X ∼ N(µ, σ) and Z = (X − µ)/σ, then Z ∼ N(0, 1). This is why our textbook need only contain a table of values for the standard normal distribution.
Areas of regions under a normal distribution curve. Percentiles. The 68-95-99.7% rule. Q-Q plots.
Calculations with X ∼ N(0, 1)
Suppose that the random variable X has a standard normal distribution, X ∼ N(0, 1).
[Figure: density curve of X ~ N(0, 1) for x from -3 to 3; vertical axis from 0.0 to 0.4.]
There are four useful procedures in R for working with normal distributions:
dnorm, pnorm, qnorm, rnorm.
a. pnorm(2) ⇒ P(X ≤ 2)
b. pnorm(2) - pnorm(-2) ⇒ P(−2 ≤ X ≤ 2)
c. 1 - pnorm(2) ⇒ P(X ≥ 2)
d. qnorm(0.60) ⇒ q_60 such that P(X ≤ q_60) = 0.60, the 60th percentile
e. rnorm(3) ⇒ three random numbers from the standard normal distribution, for instance 0.3612443 0.1075216 -1.0473477
f. dnorm(x) ⇒ the height of the density curve at x, used for drawing the graph of the bell curve
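The four procedures can be tried together in a short session. The sketch below uses pnorm to check the 68-95-99.7% rule mentioned earlier; the seed is arbitrary, and the random draws will differ from the three numbers quoted in item e.

```r
# pnorm: cumulative probabilities; check the 68-95-99.7% rule.
within <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
round(within, 4)          # 0.6827 0.9545 0.9973

# qnorm is the inverse of pnorm.
q60 <- qnorm(0.60)
pnorm(q60)                # 0.6

# rnorm: random draws; values differ from run to run unless a seed is set.
set.seed(1)               # arbitrary seed
z <- rnorm(3)

# dnorm: height of the density curve, used here to draw the bell curve.
curve(dnorm(x), from = -3, to = 3, ylab = "density")
```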
Calculations with X ∼ N(µ, σ)
Agresti and Franklin report that female students at the University of Georgia have an approximately normal height distribution, with mean µ_W = 65 inches and standard deviation σ_W = 3.5 inches. Male students have an approximately normal height distribution, with mean µ_M = 70 inches and standard deviation σ_M = 4.0 inches. Let W ∼ N(µ_W, σ_W) and M ∼ N(µ_M, σ_M), and calculate the following (using R and using Agresti and Franklin, Appendix A, pp. A-1 and A-2):
P(W ≤ 66), P(M ≥ 72), q such that P(W ≤ q) = 0.30, q such that P(M ≥ q) = 0.25
Calculate the z-score of a person with W = 63 and of a person with M = 67. How tall is a woman with z-score 0.6? How tall is a man with z-score -0.7? See the Answers section of these notes for R expressions which will calculate the answers to these questions.
[Figure: Men's and Women's Heights; density curves for men and women against height (in) from 55 to 85.]
Student’s t, Chi-Square, F
Student’s t, Chi-Square, and F distributions play key roles in the sequel. All of them are families of continuous distributions. Student’s t distributions resemble Normal distributions but they have fatter tails. Chi-Square and F distributions are defined on the half line [0, ∞), so neither one is symmetric.
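The "fatter tails" claim can be seen by comparing tail probabilities in R. This is a minimal sketch; the cutoff -3 and the degrees of freedom are arbitrary illustrative values.

```r
# Left-tail probability P(T <= -3) for t distributions versus the normal.
df <- c(2, 5, 30)                       # illustrative degrees of freedom
t_tail <- pt(-3, df = df)
z_tail <- pnorm(-3)
rbind(df = df, t_tail = t_tail, z_tail = z_tail)

# Chi-square and F distributions put no probability below zero.
chisq_below_zero <- pchisq(0, df = 3)
f_below_zero     <- pf(0, df1 = 3, df2 = 10)
c(chisq_below_zero, f_below_zero)
```

Each t tail probability exceeds the normal one, and the gap narrows as the degrees of freedom grow.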
Discrete distributions
For X to be a Bernoulli random variable, and hence have a Bernoulli distribution, X ∼ Bernoulli(p), we require
i. a binary outcome for a single event (generally coded as success, 1, or failure, 0)
ii. a fixed probability of success, P(X = 1) = p, and failure, P(X = 0) = 1− p, for that event
iii. exactly one event
Examples of Bernoulli random variables include the outcome of a coin flip (h or t), whether a driver was wearing a seat belt (yes or no), and whether a basketball player made a basket (1 or 0).
Expected value and variance of a Bernoulli random variable, X ∼ Bernoulli(p):
Expected value, µ_X = p.
Variance, σ_X^2 = p(1 − p).
[Figure: Bernoulli distribution, p = 1/6; probability density (0.0 to 1.0) against k for k = 0, 1.]
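The Bernoulli mean and variance can be checked directly from the definition and, approximately, by simulation. The sketch below uses p = 1/6 as in the figure; the seed and sample size are arbitrary. Base R has no separate Bernoulli generator, so the draws come from rbinom with size = 1.

```r
p <- 1/6                       # success probability, as in the figure
x <- c(0, 1)
prob <- c(1 - p, p)

mu_x  <- sum(x * prob)                  # equals p
var_x <- sum((x - mu_x)^2 * prob)       # equals p * (1 - p)

# A Bernoulli(p) draw is a Binomial(1, p) draw in R.
set.seed(6)                    # arbitrary seed
draws <- rbinom(10000, size = 1, prob = p)
mean(draws)                    # close to p
var(draws)                     # close to p * (1 - p)
```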
Binomial random variable, X ∼ Binomial(n, p). The probability of k successes in n trials. Expected value, µ_X = np. Variance, σ_X^2 = np(1 − p). Normal approximation to a binomial distribution.
[Figure: binomial distribution, p = 1/6, n = 10; probability density (0.00 to 0.30) against k for k = 0 to 10.]
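The probability of k successes in n trials is given by the standard binomial formula, P(X = k) = C(n, k) p^k (1 − p)^(n − k), which is not written out in these notes. The sketch below evaluates it by hand for n = 10 and p = 1/6, compares the result with R's dbinom, and checks the mean and variance against np and np(1 − p).

```r
n <- 10
p <- 1/6
k <- 0:n

# Binomial formula: P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k)
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)
all.equal(by_hand, dbinom(k, size = n, prob = p))

pmf <- dbinom(k, size = n, prob = p)
sum(pmf)                          # probabilities add to 1
mu_x  <- sum(k * pmf)             # equals n * p
var_x <- sum((k - mu_x)^2 * pmf)  # equals n * p * (1 - p)
```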
Conditions for a binomial distribution
For X to be a binomial random variable, and hence have a binomial distribution, X ∼ Binomial(n, p), we require
i. a binary outcome for each event (coin flip produces h or t)
ii. a single fixed probability of success for each event (p = 0.5)
iii. a fixed number of events (n = 10 coin flips)
iv. events that are independent of one another (one coin flip does not affect the next)
Normal approximations to binomial distributions
The distribution of a binomial random variable, X ∼ Binomial(n, p), has mean np and standard deviation √(np(1 − p)). It can be approximated by a normal probability distribution with the same mean and standard deviation, Y ∼ N(µ = np, σ = √(np(1 − p))). The fit improves as n gets larger.
[Figure: four panels showing the binomial distribution with p = 1/6 for n = 10, n = 30, n = 50, and n = 100; each panel plots probability density against k.]
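The quality of the normal approximation can be measured numerically. The sketch below compares the binomial cumulative probabilities with the matching normal ones for the four values of n shown in the panels. The helper max_gap is a hypothetical name, and the + 0.5 continuity correction is an extra refinement that these notes do not discuss.

```r
p <- 1/6

# Largest gap between the binomial CDF and its normal approximation.
# The + 0.5 is a continuity correction (an assumption added for this sketch).
max_gap <- function(n) {
  k <- 0:n
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  max(abs(pbinom(k, size = n, prob = p) -
          pnorm(k + 0.5, mean = mu, sd = sigma)))
}

gaps <- sapply(c(10, 30, 50, 100), max_gap)
names(gaps) <- c("n=10", "n=30", "n=50", "n=100")
gaps

# One specific probability for n = 100: P(X <= 20).
exact <- pbinom(20, size = 100, prob = p)
normal_approx <- pnorm(20.5, mean = 100 * p, sd = sqrt(100 * p * (1 - p)))
c(exact = exact, normal_approx = normal_approx)
```

The gap for n = 100 is smaller than the gap for n = 10, in line with the statement that the fit improves as n gets larger.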
Answers
The following R expressions calculate the answers to the questions about heights of men and women at the University of Georgia posed above. For each calculation, draw a corresponding normal curve and shade the area or mark the measurement in question.
pnorm(66, mean = 65, sd = 3.5)
1 - pnorm(72, mean = 70, sd = 4.0)
qnorm(0.30, mean = 65, sd = 3.5)
qnorm(1 - 0.25, mean = 70, sd = 4.0)
z <- (63 - 65) / 3.5
z <- (67 - 70) / 4.0
x <- 65 + 0.6 * 3.5
x <- 70 - 0.7 * 4.0
Exercises
We will attempt to solve some of the following exercises as a community project in class today. Finish these solutions as homework exercises, write them up carefully and clearly, and hand them in at the beginning of class next Friday.
Homework 5a – probability models
Exercises from Sections 5.1, 5.2:
5.2 (graduation rates), 5.3 (free throws), 5.24 (blood types), 5.26 (Canada)
Homework 5b – random variables
Exercises from Sections 5.3, 5.4:
5.46 (households), 5.54 (foreign-born), 5.65 (fruits and veggies), 5.75 (sums)
Homework 5c – binomial distributions and probability rules
Exercises from Sections 5.5, 5.6 and Chapter 5 exercises:
5.94 (music), 5.102 (die), 5.118 (tree diagram), 5.142 (SAT scores)