Central Limit Theorem: Sampling Distribution Models for Means and Proportions
Central Limit Theorem
Two assumptions
1. The sampled values must be independent
2. The sample size, n, must be large enough
• The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model.
• The larger the sample, the better the approximation will be.
• This holds regardless of the shape of the distribution of the population being sampled from or the shape of the distribution of the sample.
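The bullets above can be illustrated with a small simulation. This is only a sketch: the Exponential(1) population, the sample sizes, the seed, and the helper names `sample_means` and `skewness` are all assumptions chosen for illustration, not part of the original slides. The population is strongly right-skewed, yet the skewness of the sample means shrinks toward 0 (the skewness of a Normal model) as n grows.

```python
import random
import statistics

def sample_means(n, reps=5000, seed=1):
    """Draw `reps` samples of size n from an Exponential(1) population
    (a skewed, non-Normal population) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

def skewness(values):
    """Sample skewness: about 0 for a symmetric, Normal-looking distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

# Larger samples give sample means that are centered on the population mean (1.0)
# and whose distribution is less skewed.
for n in (2, 10, 50):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 3))
```

The printed skewness should fall as n increases, which is the "larger sample, better approximation" claim in numeric form.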
Distribution of sample proportions
• The population has a fixed proportion, p
• To estimate the population proportion, a sample is taken and a sample proportion, $\hat{p}$, is calculated
If samples are repeatedly taken with the same sample size, n:
• The mean of the sampling distribution would be the population proportion: $\mu(\hat{p}) = p$
• The standard deviation would be $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
Conditions to check for the assumptions
1. Success/Failure: The expected numbers of successes and failures are both at least 10: $np \geq 10$ and $nq \geq 10$
2. 10% Condition: Each sample is less than 10% of the population
3. Randomization: The sample was obtained through random sampling techniques, or we can at least assume that the sample is representative.
Concluding statement: All conditions have been met to use the Normal model for the distribution of sample proportions.
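The two numeric conditions can be checked mechanically. The following is a minimal sketch; the function name `normal_model_ok` and its `population_size` parameter are hypothetical, and the Randomization condition cannot be verified by arithmetic, so it is left to the reader's judgment.

```python
def normal_model_ok(n, p, population_size=None):
    """Check the Success/Failure and 10% conditions for a sample proportion.

    n: sample size; p: population proportion.
    population_size: optional; if omitted, the 10% condition is assumed to hold.
    Randomization is NOT checked here; it depends on how the data were collected.
    """
    q = 1 - p
    success_failure = n * p >= 10 and n * q >= 10
    ten_percent = population_size is None or n < 0.10 * population_size
    return success_failure and ten_percent

# Hypothetical numbers for illustration:
print(normal_model_ok(100, 0.30, population_size=5000))  # both conditions hold
print(normal_model_ok(20, 0.20))                         # np = 4, too few expected successes
```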
• If samples were repeatedly taken with the same sample size, then by the CLT the distribution of $\hat{p}$ would be approximately Normal:
$\hat{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$
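One way to see this model at work is to simulate repeated sampling and compare the simulated mean and standard deviation of the sample proportions with p and sqrt(pq/n). This is a sketch under assumed values: p = 0.30, n = 100, the seed, and the helper name `simulate_sample_proportions` are illustrative choices only.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, reps=4000, seed=7):
    """Take `reps` samples of size n from a population with proportion p
    and return the sample proportion from each one."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.30, 100                       # hypothetical population proportion and sample size
p_hats = simulate_sample_proportions(p, n)

theory_sd = math.sqrt(p * (1 - p) / n)  # SD(p-hat) = sqrt(pq/n)
print("simulated mean:", round(statistics.fmean(p_hats), 4), "theory:", p)
print("simulated SD:  ", round(statistics.stdev(p_hats), 4), "theory:", round(theory_sd, 4))
```

The simulated values should land close to the theoretical ones, though they will not match exactly because the simulation is random.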
Example: Skittles
• According to the manufacturer of the candy Skittles, 20% of the candy produced is red. Given a large bag of Skittles with 58 candies, what is the probability that we get at least 17 red?
Conditions:
1. 10% condition: 58 Skittles is less than 10% of all Skittles produced.
2. Success/Failure: $np = 58(0.20) = 11.6 \geq 10$ and $nq = 58(0.80) = 46.4 \geq 10$, so there are at least 10 expected successes and failures.
3. Randomization: Though the bag is not a random sample, we can assume it is representative of the population.
All conditions have been met to use the Normal model for the distribution of sample proportions.
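• Mean: $\mu(\hat{p}) = p = 0.20$
• Standard Deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.20)(0.80)}{58}} \approx 0.0525$
• So the model for $\hat{p}$ becomes N(0.20, 0.0525)
• Sample proportion: $\hat{p} = \frac{17}{58} \approx 0.293$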
• Then to find the probability that we get a sample proportion of 0.293 or higher:
$P(\hat{p} \geq 0.293) \approx 0.0383$
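The Skittles calculation can be reproduced with Python's standard library in place of a calculator. This is a sketch: the variable names are illustrative, and because it keeps unrounded values for the standard deviation and sample proportion, the result may differ from 0.0383 in the fourth decimal place.

```python
import math
from statistics import NormalDist

p, n, reds = 0.20, 58, 17            # values from the Skittles example

sd = math.sqrt(p * (1 - p) / n)      # SD(p-hat) = sqrt(pq/n)
p_hat = reds / n                     # observed sample proportion, 17/58

model = NormalDist(mu=p, sigma=sd)   # Normal model N(0.20, 0.0525)
prob = 1 - model.cdf(p_hat)          # P(p-hat >= 17/58), upper tail

print(round(sd, 4), round(p_hat, 3), round(prob, 4))
```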
Confidence Intervals: 1-Proportion z-Intervals
Distribution of Sample Proportions
• From previous work: the distribution of sample proportions follows a Normal model, $N\left(p, \sqrt{\frac{pq}{n}}\right)$
• But most of the time we don’t know what the population proportion is.
• We take samples to try to find the population proportion.
• $\hat{p}$ is the estimate of p
• Since we don’t know p, we can’t find the standard deviation.
• We’ll estimate it with the Standard Error: $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$
Confidence Interval
• An interval, built around the sample proportion, in which we have a stated level of confidence that the true population proportion lies.
• The size of the interval depends on the sample size and the level of confidence.
• The larger the sample size, the narrower the interval
• The higher the level of confidence, the wider the interval
• Every confidence interval has the same basic setup: estimate ± ME
• ME is the margin of error
• For a one-proportion sample: $\hat{p} \pm z^* \cdot SE(\hat{p}) = \hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}$, so $ME = z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}$
where z* is the critical value, the z value associated with the level of confidence
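The estimate ± ME setup translates directly into a short function. This is a minimal sketch: the name `one_proportion_z_interval` is hypothetical, the critical value is computed from the standard Normal model rather than read from a table, and the inputs in the usage lines are made-up values for illustration.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(p_hat, n, confidence=0.95):
    """Return (low, high) for a 1-proportion z-interval: p-hat +/- z* * SE(p-hat)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    se = math.sqrt(p_hat * (1 - p_hat) / n)                  # standard error
    me = z_star * se                                         # margin of error
    return p_hat - me, p_hat + me

# Hypothetical sample: p-hat = 0.40 from n = 200 observations
low95, high95 = one_proportion_z_interval(0.40, 200, confidence=0.95)
low99, high99 = one_proportion_z_interval(0.40, 200, confidence=0.99)
print(round(low95, 4), round(high95, 4))
print(round(low99, 4), round(high99, 4))  # higher confidence gives a wider interval
```

Running it at two confidence levels on the same data shows the trade-off described above: more confidence costs a wider interval.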
Critical Values: some basics
Level of Confidence    z*
90%                    1.645
95%                    1.960
99%                    2.576
To find the critical value given a level of confidence
1. Subtract level of confidence from 1
2. Divide difference by 2
3. Use the invNorm( ) function on the calculator with that result, and take the positive value
Ex. 90% confidence
1 - 0.90 = 0.10
0.10/2 = 0.05
invNorm(0.05) = -1.645
z* = 1.645
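The same three steps can be sketched in Python, where `NormalDist().inv_cdf` from the standard library plays the role of the calculator's invNorm( ). The function name `critical_value` is an assumption made for this illustration.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Critical value z* for a given level of confidence (e.g. 0.90)."""
    tail = (1 - confidence) / 2               # steps 1 and 2: area in one tail
    return abs(NormalDist().inv_cdf(tail))    # step 3: invNorm, made positive

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```

This should reproduce the table of basic critical values: 1.645, 1.960, and 2.576.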
Conditions
• Randomization
• 10% Condition (Independence)
• Success/Failure: this uses the sample proportion since we don’t know the population proportion: $n\hat{p} \geq 10$; $n\hat{q} \geq 10$
All conditions have been met to use the Normal model for a 1-proportion z-interval
An experiment finds that 27% of 53 subjects report
improvement after using a new medicine. Create a 95%
confidence interval for the actual cure rate.
n = 53, $\hat{p} = 0.27$
Conditions:
1) Random: assume representative sample
2) 10% Condition: It is safe to assume that 53 subjects is less than 10% of all subjects
3) Success/Failure: $n\hat{p} = 53(0.27) = 14.31 \geq 10$ and $n\hat{q} = 53(0.73) = 38.69 \geq 10$
All conditions have been met to use the Normal model for a 1-proportion z-interval.
Mechanics:
n = 53, $\hat{p} = 0.27$, CL = 95%, z* = 1.96
CI: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.27 \pm 1.96\sqrt{\frac{(0.27)(0.73)}{53}} = 0.27 \pm 0.1195 = (0.1505, 0.3895)$
Conclusion:
We are 95% confident that the true proportion of subjects who show improvement lies between 15.05% and 38.95%.
p. 456 #11. In January 2007 Consumer Reports published
their study of bacterial contamination of chicken sold in the
United States. They purchased 525 broiler chickens from
various kinds of food stores in 23 states and tested them
for types of bacteria that cause food-borne illnesses.
Laboratory results indicated that 83% of these chickens
were infected with Campylobacter. Construct a 95%
confidence interval.
p. 456 #11. Contaminated Chicken
n = 525, $\hat{p} = 0.83$
Conditions:
• Random: assume sample is representative
• 10% Condition: 525 chickens is less than 10% of all chickens for sale
• Success/Failure: $n\hat{p} = 525(0.83) = 435.75 \geq 10$ and $n\hat{q} = 525(0.17) = 89.25 \geq 10$
All conditions have been met to use the Normal model for a 1-proportion z-interval.
CI: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.83 \pm 1.96\sqrt{\frac{(0.83)(0.17)}{525}} = (0.7979, 0.8621)$
We are 95% confident that the true proportion of broiler chickens
infected with Campylobacter lies between 79.8% and 86.2%.
p. 456 #18. Direct mail advertisers send solicitations (a.k.a.
“junk mail”) to thousands of potential customers in the hope
that some will buy the company’s product. The acceptance
rate is usually quite low. Suppose a company wants to test
the response to a new flyer, and sends it to 1000 people
randomly selected from their mailing list of over 200,000
people. They get orders from 123 of the recipients. Create a
90% confidence interval for the percentage of people the
company contacts who may buy something.
p. 456 #18. Junk Mail
n = 1000, $\hat{p} = \frac{123}{1000} = 0.123$
Conditions:
• Random: stated as a random sample
• 10% Condition: 1000 people is less than 10% of the more than 200,000 people on the mailing list
• Success/Failure: $n\hat{p} = 123 \geq 10$ and $n\hat{q} = 877 \geq 10$
All conditions have been met to use the Normal model for a 1-proportion z-interval.
CI: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.123 \pm 1.645\sqrt{\frac{(0.123)(0.877)}{1000}} = (0.1059, 0.1401)$
We are 90% confident that the true proportion of people contacted who
buy something lies between 10.6% and 14.0%.
What does __% confidence mean?
Stock Statement:
About ___% of random samples of size (n) will produce confidence
intervals that contain the true proportion of ___
Ex. #18. What does 90% confidence mean?
About 90% of random samples of size 1000 will produce confidence
intervals that contain the true proportion of people contacted who will
buy something.
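The stock statement can be checked by simulation: pretend the true proportion is known, draw many random samples, build an interval from each, and count how often the interval captures the truth. This is only a sketch. In practice the true proportion is unknown, so the value 0.12 below, the seed, and the function name `coverage` are assumptions made purely for illustration.

```python
import math
import random
from statistics import NormalDist

def coverage(true_p, n, confidence, reps=2000, seed=3):
    """Fraction of simulated 1-proportion z-intervals that contain true_p."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(reps):
        # One random sample of size n, summarized by its sample proportion
        p_hat = sum(rng.random() < true_p for _ in range(n)) / n
        me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= true_p <= p_hat + me:
            hits += 1
    return hits / reps

# Assumed "true" proportion of 0.12, samples of size 1000, 90% confidence
rate = coverage(true_p=0.12, n=1000, confidence=0.90)
print(f"{rate:.1%} of the simulated intervals contain the true proportion")
```

The printed rate should be close to 90%, which is what "90% confidence" refers to: the long-run success rate of the method, not a probability about any single interval.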
d) Since 5% lies below the entire interval, the results suggest
that the company should run the mass
mailing.
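From a previous experiment we found the cure rate to be 27%. How
many subjects would we need in a new experiment to be able to
create a 98% confidence interval with an ME of only ±5%?
$\hat{p} = 0.27$, CL = 98%, z* = 2.326, ME = 0.05
$ME = z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}$
$0.05 = 2.326\sqrt{\frac{(0.27)(0.73)}{n}}$
Solving for n: $n = \left(\frac{2.326}{0.05}\right)^2 (0.27)(0.73) \approx 426.5$, so round up to n = 427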
• 95% Confidence, ME = 0.03, $\hat{p} = 0.36$: $0.03 = 1.96\sqrt{\frac{(0.36)(0.64)}{n}}$, so n = 984
• 90% Confidence, ME = 0.045, $\hat{p} = 0.27$: $0.045 = 1.645\sqrt{\frac{(0.27)(0.73)}{n}}$, so n = 264
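Solving the margin-of-error formula for n gives n = (z*/ME)^2 · p̂q̂, always rounded up. A minimal sketch of that calculation follows; the function name `required_sample_size` is hypothetical, and it computes z* from the standard Normal model rather than using the rounded table values.

```python
import math
from statistics import NormalDist

def required_sample_size(p_hat, margin, confidence):
    """Smallest n for which z* * sqrt(p_hat * q_hat / n) is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_star / margin) ** 2 * p_hat * (1 - p_hat)
    return math.ceil(n_exact)  # always round up to the next whole subject

print(required_sample_size(0.27, 0.05, 0.98))   # 427
print(required_sample_size(0.36, 0.03, 0.95))   # 984
print(required_sample_size(0.27, 0.045, 0.90))  # 264
```

Rounding up rather than to the nearest integer matters: rounding down would leave the margin of error slightly larger than requested.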