Chapters 18 - 19

Sampling Distribution Models and Confidence Intervals

Example

• We want to find out the proportion of men in the U.S. population.

• We draw a sample and calculate the proportion of men in the sample. Suppose we find that the proportion = 60%.

• We conclude that 60% of the population are men. How much should we trust this estimate?

• Suppose the actual percentage is p = 52%

Margin of Error

• Example:

– 60% of the U.S. population are men, with a margin of error of ± 5%.

– This yields an interval estimate from 55% to 65%.

The extent of the interval on either side of the sample proportion or mean is called the margin of error (ME).

In general, interval estimates have the form estimate ± ME.

Sampling Distribution

• Now imagine what would happen if we drew many samples and looked at the sample proportion from each one.

• The histogram we’d get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions.

• What would the histogram of all the sample proportions look like?

• It turns out that the histogram is unimodal, symmetric, and centered at p. It is well approximated by a Normal distribution.

Sampling Distribution Model

• The mean of the sampling distribution is at p.

• The standard deviation of the distribution is $\sqrt{\dfrac{pq}{n}}$, where q = 1 − p.

• So, the distribution of the sample proportions is modeled with the Normal model $N\left(p, \sqrt{\dfrac{pq}{n}}\right)$.
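The model can be checked with a small simulation. The sketch below is only an illustration: it takes p = 0.52 from the earlier example, assumes a sample size of n = 100 (the slides do not give one), and uses a hypothetical helper name, `simulate_sample_proportions`. It draws many samples and compares the mean and spread of the simulated sample proportions with p and $\sqrt{pq/n}$.

```python
import math
import random

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

p, n = 0.52, 100  # p is from the example; n = 100 is an assumed sample size
props = simulate_sample_proportions(p, n, num_samples=5000)

# Mean and standard deviation of the simulated sample proportions
sim_mean = sum(props) / len(props)
sim_sd = math.sqrt(sum((x - sim_mean) ** 2 for x in props) / len(props))

# Standard deviation predicted by the Normal model, sqrt(pq/n)
model_sd = math.sqrt(p * (1 - p) / n)

print(f"simulated mean = {sim_mean:.4f}, model mean = {p}")
print(f"simulated SD   = {sim_sd:.4f}, model SD   = {model_sd:.4f}")
```

The simulated values should land close to the model values, and a histogram of `props` should look roughly bell-shaped.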

Assumptions

• Most models are useful only when specific assumptions are true.

• There are two assumptions in the case of the model for the distribution of sample proportions:

1. The Independence Assumption: The sampled values must be independent of each other.

2. The Sample Size Assumption: The sample size, n, must be large enough.

Assumptions and Conditions

• Assumptions are hard, often impossible, to check.

• Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions.

• The corresponding conditions to check before using the Normal to model the distribution of sample proportions are the Randomization Condition, 10% Condition and the Success/Failure Condition.

Conditions

1. Randomization Condition: The sample should be a simple random sample of the population.

2. 10% Condition: If sampling has not been made with replacement, then the sample size, n, must be no larger than 10% of the population.

3. Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.
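Two of these conditions are simple arithmetic and can be sketched in code. The function below is an illustration only: the name `check_conditions` and the population size of 1,000,000 are hypothetical. The Randomization Condition depends on how the data were collected, so it cannot be checked by a calculation like this.

```python
def check_conditions(n, p, population_size=None):
    """Check the arithmetic conditions for using the Normal model for a proportion.

    n: sample size; p: proportion of successes (q = 1 - p).
    population_size: optional; needed only for the 10% Condition.
    """
    q = 1 - p
    results = {
        # Success/Failure Condition: both np and nq are at least 10
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if population_size is not None:
        # 10% Condition: sample is no larger than 10% of the population
        results["ten_percent"] = n <= 0.10 * population_size
    return results

# Hypothetical numbers: a sample of 100 from a population of 1,000,000
print(check_conditions(n=100, p=0.52, population_size=1_000_000))

# A sample of 15 is too small: np = 7.8 is below 10
print(check_conditions(n=15, p=0.52))
```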

A Sampling Distribution Model

• A proportion is no longer just a computation for a set of data.

– It is now a random quantity that has a distribution.

– This distribution is called the sampling distribution model for proportions.

• Even though we depend on sampling distribution models, we never actually get to see them.

– We never actually take repeated samples from the same population and make a histogram. We only imagine or simulate them.

The Central Limit Theorem for a Proportion

Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with

– Mean: $\mu(\hat{p}) = p$

– Standard deviation: $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$

Means – The “Average” of One Die

• Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:

Means – Averaging More Dice

• Looking at the average of two dice after a simulation of 10,000 tosses:

• The average of three dice after a simulation of 10,000 tosses looks like:

Means – Averaging Still More Dice

• The average of 5 dice after a simulation of 10,000 tosses looks like:

• The average of 20 dice after a simulation of 10,000 tosses looks like:

Means – What the Simulations Show

• As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean.

– So, we see the shape continuing to tighten around 3.5.

• And, it probably does not shock you that the sampling distribution of a mean becomes Normal.
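The dice simulations can be reproduced with a few lines of code. This is a minimal sketch, not the original simulation: the function name `average_of_dice` and the fixed random seed are assumptions made for the illustration. It averages 1, 2, 3, 5, and 20 dice over 10,000 tosses each and reports the center and spread of the averages.

```python
import random
import statistics

def average_of_dice(num_dice, num_trials=10_000, seed=0):
    """Toss num_dice fair dice num_trials times; return the average of each toss."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_trials)
    ]

for k in (1, 2, 3, 5, 20):
    averages = average_of_dice(k)
    print(f"{k:>2} dice: mean = {statistics.mean(averages):.3f}, "
          f"SD = {statistics.pstdev(averages):.3f}")
```

The mean stays near 3.5 for every number of dice, while the standard deviation of the averages shrinks as more dice are averaged.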

The Central Limit Theorem: The Fundamental Theorem of Statistics

• The sampling distribution of any mean becomes more nearly Normal as the sample size grows.

– All we need is for the observations to be independent and collected with randomization.

– We don’t even care about the shape of the population distribution!

• The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).

The Central Limit Theorem (CLT)

The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.

Assumptions and Conditions

The CLT requires essentially the same assumptions we saw for modeling proportions:

Independence Assumption: The sampled values must be independent of each other.

Sample Size Assumption: The sample size must be sufficiently large.

We can’t check these directly, but we can think about whether the Independence Assumption is plausible, and check

– Randomization Condition

– 10% Condition: n is less than 10% of the population.

– Large Enough Sample Condition: The CLT doesn’t tell us how large a sample we need. For now, you need to think about your sample size in the context of what you know about the population.

CLT and Standard Error

• The CLT says that the sampling distribution of any mean or proportion is approximately Normal.

– For proportions, the sampling distribution is centered at the population proportion.

– For means, it’s centered at the population mean.

– For proportions: $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$

– For means: $SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$

Standard Error

• But if we don’t know p or σ, we’re stuck, right?

• Since we don’t know p or σ, we can’t find the true standard deviation of the sampling distribution model, so we need to estimate it. We call this estimate of the S.D. of a sampling distribution a standard error (SE).

• For a sample proportion, $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

• For the sample mean, $SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
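Both standard errors are computed from sample data alone. The sketch below illustrates this under stated assumptions: the function names are hypothetical, the sample size n = 100 for the 60% example is assumed (the slides do not give it), and the list of eight data values is made up for the illustration.

```python
import math
import statistics

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n), with s the sample standard deviation."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Sample proportion of 60% from the example, with an assumed n = 100
print(f"SE(p-hat) = {se_proportion(0.60, 100):.4f}")

# Made-up data values, used only to show the calculation
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"SE(y-bar) = {se_mean(values):.4f}")
```

Note that `statistics.stdev` divides by n − 1, which matches the usual definition of the sample standard deviation s.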

The Real World & the Model World

Be careful! Now we have two distributions to deal with.

• The first is the real world distribution of the sample, which we might display with a histogram.

• The second is the math world sampling distribution of the statistic, which we model with a Normal model based on the Central Limit Theorem.

Don’t confuse the two!

A Confidence Interval

• By the 68-95-99.7% Rule, we know

– about 68% of all samples will have $\hat{p}$’s within 1 SE of p

– about 95% of all samples will have $\hat{p}$’s within 2 SEs of p

– about 99.7% of all samples will have $\hat{p}$’s within 3 SEs of p

A Confidence Interval

• Consider the 95% level:

– There’s a 95% chance that p is no more than 2 SEs away from $\hat{p}$.

– So, if we reach out 2 SEs, we are 95% sure that p will be in that interval. In other words, if we reach out 2 SEs in either direction of $\hat{p}$, we can be 95% confident that this interval contains the true proportion.

• This is called a 95% confidence interval.

A 95% Confidence Interval

What Does “95% Confidence” Mean?

• Each confidence interval uses a sample statistic to estimate a population parameter.

• But, since samples vary, the statistics we use, and thus the confidence intervals we construct, vary as well.

• Our confidence is in the process of constructing the interval, not in any one interval itself.

• “95% confidence” means that about 95% of the intervals constructed this way, from repeated samples, will contain the true parameter.
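One way to see that the confidence belongs to the process is to simulate it. In the sketch below, the true proportion is known to the simulation (p = 0.52 from the earlier example), many samples are drawn, and an interval of $\hat{p}$ ± 2 SE is built from each one. The function name `coverage_rate`, the sample size n = 400, and the number of repetitions are assumptions made for the illustration.

```python
import math
import random

def coverage_rate(p, n, num_samples, multiplier=2.0, seed=0):
    """Fraction of simulated intervals p_hat ± multiplier * SE that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error from this sample only
        if p_hat - multiplier * se <= p <= p_hat + multiplier * se:
            hits += 1
    return hits / num_samples

rate = coverage_rate(p=0.52, n=400, num_samples=2000)
print(f"fraction of intervals containing p: {rate:.3f}")
```

Each individual interval either contains p or it does not. It is the long-run fraction of intervals that comes out near 95%.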

ME: Certainty vs. Precision

• We can claim, with 95% confidence, that the interval $\hat{p} \pm 2\,SE(\hat{p})$ contains the true population proportion.

• The more confident we want to be, the larger our ME needs to be (makes the interval wider).

ME: Certainty vs. Precision

• To be more confident, we wind up being less precise.

• Because of this, every confidence interval is a balance between certainty and precision.

• The tension between certainty and precision is always there. Fortunately, in most cases we can be both sufficiently certain and sufficiently precise to make useful statements.

• The choice of confidence level is somewhat arbitrary, but keep in mind this tension between certainty and precision when selecting your confidence level.

• The most commonly chosen confidence levels are 90%, 95%, and 99% (but any percentage can be used).

Critical Values

• The ‘2’ in $\hat{p} \pm 2\,SE(\hat{p})$ (our 95% confidence interval) came from the 68-95-99.7% Rule.

• Using a table or technology, we find that a more exact value for our 95% confidence interval is 1.96 instead of 2. We call 1.96 the critical value and denote it z*.

• For any confidence level, we can find the corresponding critical value.

• Example: For a 90% confidence interval, the critical value is 1.645.
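As one example of "technology" for finding critical values, Python's standard library provides `statistics.NormalDist`, whose `inv_cdf` method gives Normal quantiles. The sketch below is an illustration, and the function name `critical_value` is hypothetical. For a confidence level C, the critical value z* leaves (1 − C)/2 in each tail of the standard Normal model.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z* such that the central `confidence_level` of N(0, 1) lies within ±z*."""
    tail = (1 - confidence_level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```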

One-Proportion z-Interval

• When the conditions are met, we are ready to find the confidence interval for the population proportion, p.

• The confidence interval is $\hat{p} \pm z^* \times SE(\hat{p})$, where $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$.

• The critical value, z*, depends on the particular confidence level, C, that you specify.
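The one-proportion z-interval can be put together in a few lines. This is a minimal sketch, assuming the conditions have already been checked. The function name `one_proportion_z_interval` is hypothetical, and the data (60 men in a sample of 100) use the 60% figure from the opening example with an assumed sample size.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence_level=0.95):
    """Return (low, high) for the interval p_hat ± z* × SE(p_hat)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)                    # SE(p_hat)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    margin = z_star * se                                       # margin of error
    return p_hat - margin, p_hat + margin

# Assumed data: 60 men in a sample of 100
low, high = one_proportion_z_interval(successes=60, n=100)
print(f"95% interval: {low:.3f} to {high:.3f}")

# A higher confidence level gives a wider interval from the same data
low99, high99 = one_proportion_z_interval(successes=60, n=100, confidence_level=0.99)
print(f"99% interval: {low99:.3f} to {high99:.3f}")
```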

What Can Go Wrong?

Margin of Error Too Large to Be Useful:

• We can’t be exact, but how precise do we need to be?

• One way to make the margin of error smaller is to reduce your level of confidence. (That may not be a useful solution.)

• You need to think about your margin of error when you design your study.

– To get a narrower interval without giving up confidence, you need to have less variability.

– You can do this with a larger sample…
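How much larger does the sample need to be? Solving ME = z* × $\sqrt{\hat{p}\hat{q}/n}$ for n gives n = (z*)² $\hat{p}\hat{q}$ / ME². The sketch below applies that rearrangement. It is an illustration rather than a method stated in these slides: the function name `required_sample_size` is hypothetical, and the default guess of 0.5 for the proportion is an assumption chosen because it makes $\hat{p}\hat{q}$ as large as possible, giving a cautious sample size.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence_level=0.95, p_guess=0.5):
    """Smallest n for which z* × sqrt(p_guess × (1 − p_guess) / n) <= margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = z_star ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n)  # round up so the margin of error is not exceeded

print(required_sample_size(0.05))   # ME of ±5% at 95% confidence
print(required_sample_size(0.025))  # halving the ME needs about four times the sample
```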

Homework Assignment

Chapter 18:

• Problems 11, 15, 23, 29, 31, 47.

Chapter 19:

• Problems 7, 9, 11, 13, 17, 27, 35, 37.
