Date post: | 29-Dec-2015 |
Upload: | ambrose-mathews |
June 10, 2008 Stat 111 - Lecture 9 - Proportions 1
Introduction to Inference
Sampling Distributions for Counts and Proportions
Statistics 111 - Lecture 9
Administrative Notes
• Homework 3 is due on Monday, June 15th – Covers chapters 1-5 in textbook
• Exam on Monday, June 15th
• Review session on Thursday
Last Class
• Focused on models for continuous data: using the sample mean as our estimate of the population mean
• Sampling Distribution of the Sample Mean
  • How does the sample mean change over different samples?
[Diagram: a population with parameter μ; Samples 1 through 6, each of size n, each give a sample mean $\bar{x}$. What is the distribution of these values?]
Today’s Class
• We will now focus on count data: categorical data that takes on only two different values
“Success” (Yi = 1) or “Failure” (Yi = 0)
• Goal is to estimate population proportion:
p = proportion of Yi = 1 in population
Examples
• Gender: our class has 83 women and 42 men
  • What is the proportion of women in the Penn student population?
• Presidential Election: out of 2000 people sampled, 1150 will vote for McCain in the upcoming election
  • What proportion of the total population will vote for McCain?
• Quality Control: inspection of a sample of 100 microchips from a large shipment shows 10 failures
  • What is the proportion of failures in all shipments?
Inference for Count Data
• Goal for count data is to estimate the population proportion p
• From a sample of size n, we can calculate two statistics:
  1. sample count Y
  2. sample proportion $\hat{p} = Y/n$
• Use the sample proportion $\hat{p}$ as our estimate of the population proportion p
• Sampling Distribution of the Sample Proportion
  • How does the sample proportion change over different samples?
[Diagram: a population with parameter p; Samples 1 through 6, each of size n, each give a sample proportion $\hat{p}$. What is the distribution of these values?]
The Binomial Setting for Count Data
1. Fixed number n of observations (or trials)
2. Each observation is independent
3. Each observation falls into 1 of 2 categories: Success (Y = 1) or Failure (Y = 0)
4. Each observation has the same probability of success: p = P(Y = 1)
Binomial Distribution for Sample Count
• Sample count Y (number of Yi=1 in sample of size n) has a Binomial distribution
• The binomial distribution has two parameters:
  • number of trials n and population proportion p
$P(Y = k) = {}_nC_k \, p^k (1-p)^{n-k}$
• Binomial formula accounts for
  • number of successes: $p^k$
  • number of failures: $(1-p)^{n-k}$
  • different orders of successes/failures: ${}_nC_k = \frac{n!}{k!\,(n-k)!}$
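The binomial formula can be evaluated directly rather than looked up. The following is a minimal Python sketch, not part of the original slides: the function name `binomial_pmf` and the values n = 4, p = 0.25 are assumptions chosen for illustration, and `math.comb` supplies nCk.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) for a Binomial(n, p) count: nCk * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative values only: n = 4 trials, success probability p = 0.25.
n, p = 4, 0.25
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(probs)       # one probability per possible count 0..n
print(sum(probs))  # the probabilities over all counts add up to 1
```

Each entry of `probs` is the height of one bar in the binomial probability histogram for these n and p.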
Binomial Probability Histogram
• Can make a histogram out of these probabilities
• Can add up bars of the histogram to get any probability we want: e.g. P(Y < 4)
• Different values of n and p have different histograms, but Table C in book has probabilities for many values of n and p
Example: Genetics
• If a couple are both carriers of a certain disease, then their children each have probability 0.25 of being born with the disease
• Suppose that the couple has 4 children
  • P(none of their children have the disease)?
    $P(Y = 0) = \frac{4!}{0!\,4!} \cdot 0.25^0 \cdot (1 - 0.25)^4 = 0.3164$
  • P(at least two children have the disease)?
    P(Y ≥ 2) = P(Y = 2) + P(Y = 3) + P(Y = 4)
    = 0.2109 + 0.0469 + 0.0039 (from table)
    = 0.2617
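The genetics numbers can be checked without a table. This is a small Python sketch under the slide's assumptions (n = 4 children, p = 0.25); the helper `binomial_pmf` and the variable names are illustrative, not from the lecture.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) for a Binomial(n, p) count."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 4, 0.25  # 4 children, each with disease probability 0.25

# P(none of the children have the disease)
p_none = binomial_pmf(0, n, p)

# P(at least two have the disease), summed term by term as on the slide
p_at_least_two = sum(binomial_pmf(k, n, p) for k in (2, 3, 4))

# The same probability via the complement: 1 - P(Y = 0) - P(Y = 1)
p_complement = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)

print(round(p_none, 4), round(p_at_least_two, 4))  # 0.3164 0.2617
```

The complement route needs only two terms, which is often the shorter calculation for "at least" questions.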
Example: Quality Control
• A worker inspects a sample of n=20 microchips from a large shipment
• The probability of a microchip being faulty is 10% (p = 0.10)
• What is the probability that there are fewer than three failures in the sample?
P(Y < 3) = P(Y = 0) + P(Y =1) + P(Y = 2)
= 0.1216 + 0.2702 + 0.2852 (from table)
= 0.677
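Adding up the bars for Y = 0, 1, 2 can be sketched in Python as below. The function name `prob_count_below` is hypothetical; n = 20 and p = 0.10 are the values from the example.

```python
from math import comb

def prob_count_below(c, n, p):
    """P(Y < c): sum of P(Y = k) for k = 0, 1, ..., c-1."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c))

# Fewer than three faulty chips in a sample of 20 when 10% are faulty.
p_fewer_than_three = prob_count_below(3, 20, 0.10)
print(round(p_fewer_than_three, 3))  # 0.677
```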
Sample Proportions
• Usually, we are more interested in the sample proportion $\hat{p} = Y/n$ than in the sample count
$P(\hat{p} < k) = P(Y < n \cdot k)$
• Example: a worker inspects a sample of 20 microchips from a large shipment in which the probability of a microchip being faulty is 0.1
• What is the probability that our sample proportion of faulty chips is less than 0.05?
• $P(\hat{p} < 0.05) = P(Y < 0.05 \times 20) = P(Y < 1) = P(Y = 0) = 0.1216$
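A probability about the sample proportion is answered by converting it into a probability about the count. The Python sketch below does this by keeping only the counts y with y/n below the threshold; the function name `prob_proportion_below` is an assumption made for illustration.

```python
from math import comb

def prob_proportion_below(threshold, n, p):
    """P(p_hat < threshold), where p_hat = Y/n and Y ~ Binomial(n, p)."""
    return sum(
        comb(n, y) * p**y * (1 - p)**(n - y)
        for y in range(n + 1)
        if y / n < threshold  # keep only counts whose proportion qualifies
    )

# Sample of 20 chips, 10% faulty: only Y = 0 gives a proportion below 0.05.
result = prob_proportion_below(0.05, 20, 0.10)
print(round(result, 4))  # 0.1216
```

Filtering on `y / n < threshold` avoids rounding n·k by hand, which matters when n·k is not a whole number.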
Mean and Variance of Binomial Counts
• If our sample count Y is a random variable with a Binomial distribution, what are the mean and variance of Y across all samples?
  • Useful since we only observe the value of Y for our sample, but what are the values in other samples?
• We can calculate the mean and variance of a Binomial distribution with parameters n and p:
$\mu_Y = n p$
$\sigma_Y^2 = n p (1-p)$
$\sigma_Y = \sqrt{n p (1-p)}$
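The mean and variance formulas can be checked numerically by computing them from the binomial probabilities themselves. A short Python sketch, with n = 20 and p = 0.10 borrowed from the quality control example as illustrative values:

```python
from math import comb, sqrt

n, p = 20, 0.10  # illustrative values

# Probability of each possible count k = 0..n
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Mean and variance computed from their definitions
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * pk for k, pk in enumerate(pmf))

# Mean, variance and standard deviation from the shortcut formulas
mean_formula = n * p
var_formula = n * p * (1 - p)
sd_formula = sqrt(var_formula)

print(mean_from_pmf, mean_formula)
print(var_from_pmf, var_formula)
```

Both routes agree up to floating-point rounding, so the shortcut formulas can replace the long sums.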
Mean/Variance of Binomial Proportions
• Sample proportion is a linear transformation of the sample count ($\hat{p} = Y/n$)
$\mu_{\hat{p}} = \frac{1}{n}\,\mu_Y = \frac{1}{n} \cdot n p = p$
• Mean of the sample proportion is the true probability of success p
$\sigma_{\hat{p}}^2 = \frac{1}{n^2}\,\sigma_Y^2 = \frac{1}{n^2} \cdot n p (1-p) = \frac{p(1-p)}{n}$
• Variance of sample proportion decreases as sample size n increases!
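To see how quickly the spread shrinks, the standard deviation of the sample proportion can be tabulated for a few sample sizes. This is an illustrative Python sketch; the function name `proportion_sd` and the choice p = 0.5 are assumptions, not values from the slides.

```python
from math import sqrt

def proportion_sd(n, p):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

p = 0.5  # illustrative population proportion
for n in (10, 100, 1000, 10000):
    print(n, round(proportion_sd(n, p), 4))
```

Because n sits under a square root, quadrupling the sample size only halves the standard deviation.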
Variance over Long-Run
• Lower variance with larger sample size means that the sample proportion will tend to be closer to the population proportion in larger samples
• Long-run behaviour of two different coin tossing runs: unexpected events are much less likely in larger samples
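The long-run behaviour of coin tossing can be simulated. The sketch below is a hypothetical Python illustration (the function name `running_proportions`, the seeds, and the 5000 tosses are all assumptions): it tracks the proportion of heads after each toss for two independent runs of a fair coin.

```python
import random

def running_proportions(num_tosses, p=0.5, seed=None):
    """Proportion of heads after each toss, for tosses 1..num_tosses."""
    rng = random.Random(seed)  # a seeded generator makes the run repeatable
    heads = 0
    proportions = []
    for toss in range(1, num_tosses + 1):
        if rng.random() < p:   # this toss is a head with probability p
            heads += 1
        proportions.append(heads / toss)
    return proportions

run_a = running_proportions(5000, seed=1)
run_b = running_proportions(5000, seed=2)

# Compare the two runs early on and later on.
for n in (10, 100, 1000, 5000):
    print(n, run_a[n - 1], run_b[n - 1])
```

Early proportions can sit far from 0.5 and differ between the runs, while later ones settle near 0.5 in both.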
Binomial Probabilities in Large Samples
• In large samples, it is often tedious to calculate probabilities using the binomial distribution
• Example: Gallup poll for presidential election
  • Bush has 49% of the vote in the population. What is the probability that Bush gets a count over 550 in a sample of 1000 people?
P(Y > 550) = P(Y = 551) + P(Y = 552) + … + P(Y = 1000)
= 450 terms to look up in the table!
• We can instead use the fact that for large samples, the Binomial distribution is closely approximated by the Normal distribution
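A computer can add up the 450 terms, which gives an exact value to set beside the Normal approximation. The Python sketch below is an illustration rather than the lecture's method: it works on the log scale using `math.lgamma` (since log k! = lgamma(k + 1)) so that the very large nCk and very small powers do not overflow or underflow prematurely. The function name `log_binomial_pmf` is an assumption.

```python
from math import exp, lgamma, log

def log_binomial_pmf(k, n, p):
    """log P(Y = k) for Y ~ Binomial(n, p), using log k! = lgamma(k + 1)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)

n, p = 1000, 0.49  # poll of 1000 people, 49% support in the population

# One term for each count 551, 552, ..., 1000
terms = [exp(log_binomial_pmf(k, n, p)) for k in range(551, n + 1)]
exact_tail = sum(terms)

print(len(terms))  # 450
print(exact_tail)  # exact P(Y > 550)
```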
Normal Approximation to Binomial
• If count Y follows a binomial distribution with parameters n and p, then Y approximately follows a Normal distribution with mean and variance:
$\mu_Y = n p, \qquad \sigma_Y^2 = n p (1-p)$
• This approximation is only good if n is “large enough”.
  • Rule of thumb for “large enough”: n·p ≥ 10 and n·(1-p) ≥ 10
• Also works for the sample proportion: $\hat{p} = Y/n$ approximately follows a Normal distribution with mean and variance
$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$
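The rule of thumb and the approximation itself can be written as small helper functions. This Python sketch is an assumption-laden illustration: the names `normal_cdf`, `large_enough` and `approx_prob_count_at_most` are hypothetical, the standard Normal CDF is built from `math.erf` in place of a table, and no continuity correction is applied, matching the slides.

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard Normal Z, computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def large_enough(n, p):
    """Rule of thumb: n*p >= 10 and n*(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def approx_prob_count_at_most(c, n, p):
    """Normal approximation to P(Y <= c), without continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return normal_cdf((c - mean) / sd)

print(large_enough(20, 0.10))    # False, since n*p = 2
print(large_enough(1000, 0.49))  # True
print(approx_prob_count_at_most(490, 1000, 0.49))  # count at the mean
```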
Example: Quality Control
• Sample of 100 microchips (with the usual 10% of microchips faulty). What is the probability there are at least 17 bad chips in our sample?
• Using Binomial calculation/table is tedious. Instead use Normal approximation:
  • Mean = n·p = 100 · 0.10 = 10
  • Var = n·p·(1-p) = 100 · 0.10 · 0.90 = 9
$P(Y \ge 17) = P\left(Z \ge \frac{17 - 10}{\sqrt{9}}\right)$
= P(Z ≥ 2.33)
= 1 - P(Z ≤ 2.33)
= 0.01 (from table)
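The standardisation step can be reproduced in Python, using `math.erf` in place of the Normal table. The exact binomial sum is included only for comparison; variable names such as `p_normal` and `p_exact` are illustrative assumptions.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard Normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.10
mean = n * p                # 10
sd = sqrt(n * p * (1 - p))  # square root of 9

# Normal approximation to P(Y >= 17), no continuity correction
z = (17 - mean) / sd
p_normal = 1 - normal_cdf(z)
print(round(z, 2), round(p_normal, 2))  # 2.33 0.01

# Exact binomial tail, summing P(Y = k) for k = 17..100
p_exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(17, n + 1))
print(p_exact)
```

Printing `p_exact` next to `p_normal` shows how close the approximation is for this n and p.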
Example: Gallup Poll
• Bush has 49% of the vote in the population
• What is the probability that Bush gets a sample proportion over 0.51 in a sample of size 1000?
• Use normal distribution with mean = p = 0.49 and variance p·(1-p)/n = 0.49 · 0.51 / 1000 = 0.0002499
$P(\hat{p} > 0.51) = P\left(Z \ge \frac{0.51 - 0.49}{\sqrt{0.0002499}}\right)$
= P(Z ≥ 1.27) = 1 - P(Z ≤ 1.27)
= 0.102
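The same calculation for the sample proportion, sketched in Python with `math.erf` standing in for the Normal table. The variable names are assumptions made for illustration.

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard Normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 1000, 0.49
variance = p * (1 - p) / n  # variance of the sample proportion
sd = sqrt(variance)

# Standardise the threshold 0.51 and take the upper tail
z = (0.51 - p) / sd
prob = 1 - normal_cdf(z)

print(round(variance, 7), round(z, 2))  # 0.0002499 1.27
print(prob)
```

The printed probability differs slightly from the table-based 0.102 because the code does not round z to two decimals before the lookup.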
Why does Normal Approximation work?
• Central Limit Theorem: in large samples, the distribution of the sample mean is approx. Normal
• Well, our count data takes on two different values: “Success” (Yi = 1) or “Failure” (Yi = 0)
• The sample proportion is the same as the sample mean for count data!
• So, Central Limit Theorem works for sample proportions as well!
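A small simulation illustrates both points: the mean of 0/1 data is the sample proportion, and across many samples the proportions centre on p with the spread predicted by p(1-p)/n. This is a hypothetical Python sketch; the values n = 200, p = 0.3, the 2000 repetitions, and the seed are all assumptions.

```python
import random
from math import sqrt
from statistics import pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable
n, p, num_samples = 200, 0.3, 2000

def draw_sample(n, p):
    """One sample of n observations, each 1 (success) or 0 (failure)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

# For 0/1 data the sample mean and the sample proportion coincide.
sample = draw_sample(n, p)
sample_mean = sum(sample) / len(sample)
sample_proportion = sample.count(1) / n

# Sample proportions from many repeated samples
p_hats = [sum(draw_sample(n, p)) / n for _ in range(num_samples)]
mean_p_hat = sum(p_hats) / num_samples
sd_p_hat = pstdev(p_hats)

print(sample_mean, sample_proportion)
print(mean_p_hat, p)                     # centre of the proportions vs p
print(sd_p_hat, sqrt(p * (1 - p) / n))   # observed spread vs formula
```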