
Post on 12-Jan-2016


Estimation

Statistics with Confidence

Estimation

Before we collect our sample, we know:

[Figure: normal curve of sample means, horizontal axis from -3z to 3z]

In repeated sampling, sample means would stack up in a normal curve,
centered on the true population mean,
with a standard error (a measure of dispersion) that depends on:
1. the population standard deviation
2. the sample size
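The repeated-sampling idea can be tried out with a small simulation. This is only a sketch: the population (10,000 scores with mean near 50 and s.d. near 10), the sample size, the seeds, and the function name `simulate_sample_means` are all made up for illustration and do not come from the slides.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw many random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical population, generated only for this illustration.
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

means = simulate_sample_means(population, sample_size=100, num_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("population mean:", round(pop_mean, 2))
print("center of the sample means:", round(statistics.fmean(means), 2))
print("spread of the sample means:", round(statistics.stdev(means), 2))
print("population s.d. / sqrt(n):", round(pop_sd / 100 ** 0.5, 2))
```

The pile of sample means should center close to the population mean, and its spread should be close to the population standard deviation divided by the square root of the sample size.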

What are they doing?

Estimation

But we do not know:
1. the true population mean
2. the population standard deviation


Estimation

Will our sample be one of these (accurate)?

Or one of these (inaccurate)?

[Figure: normal curve of sample means, horizontal axis from -3z to 3z]

Estimation

Which is more likely: accurate or inaccurate?

[Figure: normal curve of sample means, -3z to 3z, with the middle 68% and 95% marked]

Estimation

We’re most likely to get close to the true population mean…

Our sample’s mean is the best guess of the population mean, but it is not precise.


Estimation

And if we increase our sample size (n)…


Estimation

And if we increase our sample size, our sample mean is an even better estimate of the population mean: we are more precise!

[Figure: sampling distribution for a larger sample, with the middle 68% and 95% marked]

Estimation

We know that the standard deviation of this pile of sample means (the standard error) equals the population standard deviation (\( \sigma \)) divided by the square root of the sample size (n):

\( \text{s.e.} = \sigma / \sqrt{n} \)
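To see how the standard error shrinks as the sample grows, here is a minimal sketch. The helper name `standard_error` and the value 10 for the population standard deviation are assumptions chosen for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: a standard deviation divided by sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))
```

Each time the sample size is multiplied by four, the standard error is cut in half.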

Estimation

But we do not know the population standard deviation!

What is our best guess of that?


Estimation

Our best guess of the population standard deviation is our sample’s s.d.! On average, this s.d. gives the population \( \sigma \).

In fact, when we calculate it, we use “n – 1” to make our “estimate” larger, to reflect that the dispersion of a sample is smaller than a population’s:

\( s = \sqrt{ \sum (Y_i - \bar{Y})^2 / (n - 1) } \)

[Figure: population dispersion vs. sample dispersion on a scale from 0 to 35, with the cases in the sample marked]
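The “n – 1” calculation can be sketched in a few lines. The function name `sample_sd` and the eight scores are hypothetical; the standard library's `statistics.stdev` (n – 1 denominator) and `statistics.pstdev` (n denominator) are shown for comparison.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation using the n - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((y - mean) ** 2 for y in values)
    return math.sqrt(squared_deviations / (n - 1))

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data

print(sample_sd(scores))          # n - 1 denominator
print(statistics.stdev(scores))   # standard library, also n - 1
print(statistics.pstdev(scores))  # n denominator, slightly smaller
```

Dividing by n – 1 instead of n always gives a slightly larger value, which is the adjustment the slide describes.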

Estimation

So now we know that we can use the sample standard deviation to stand in for the population’s standard deviation.

So we can use the formula for standard error with that estimate and get a good estimate of the dispersion of the sampling distribution:

\( \text{s.e.} = s / \sqrt{n} \)

Estimation

Now we know some limits on how far off our sample mean is likely to be from the true population mean!

68% of means will be within +/- 1 s.e.

95% of means will be within +/- 2 s.e.

\( \text{s.e.} = s / \sqrt{n} \)

Estimation

For example, if we took GPAs from a sample of 625 students and our s was .50…

\( \text{s.e.} = .5 / \sqrt{625} = .5 / 25 = 0.02 \)

68% of means would be within +/- 1 * (.02)

95% of means would be within +/- 2 * (.02)

Estimation

GPAs from a sample of 625 students with s = .50…

If our sample were this one, our estimate of the mean would be correct!

[Figure: sampling distribution with one sample highlighted]

Estimation

GPAs from a sample of 625 students with s = .50…

But what if it were this one?

We’d be slightly wrong, but well within +/- 2 * (.02)

95% of samples would be!

[Figure: sampling distribution with one sample highlighted]

Estimation

A sample’s mean is the best estimate of the population mean.

But what if we base our estimate on this erroneous sample?

[Figure: sampling distribution with one sample highlighted]

Estimation

Let’s create a “measuring device” with our sampling distribution and center it over our sample’s mean.

Check it out! The true mean falls within the 95% bracket.

[Figure: 68% and 95% brackets centered on the sample’s mean]

Estimation

What if the sample we collected were this one? …and we used the measuring device again?

Check it out! The true mean falls within the 95% bracket.

[Figure: 68% and 95% brackets centered on the sample’s mean]

Estimation

The sampling distribution allows us to:
1. Be humble and admit that our sample statistic may not be the population’s, and
2. Form a measuring device with which we can determine a range where the true population mean is likely to fall... this is called a confidence interval.

Estimation

If you calculate your sampling distribution’s standard error, you can form a device that tells you that, if your sample mean is wrong, there is a documented range in which the true population mean is likely to exist.

Check it out! The true mean falls within the 95% bracket.

[Figure: 68% and 95% brackets centered on the sample’s mean]

Estimation

For example, if we took GPAs from a sample of 625 students and our mean was 2.5 and s.d. was .50…

We make a confidence interval (C.I.) by…

1. Calculating the s.e.: \( .5 / \sqrt{625} = .5 / 25 = 0.02 \)
2. Going +/- 2 * s.e. from the mean.

95% C.I. = 2.5 +/- 2(.02) = 2.46 to 2.54

We are 95% confident that the true mean is in this range!
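The GPA interval can be reproduced with a short sketch. The function name `confidence_interval` and its default multiplier of 2 are illustrative choices that mirror the rule of thumb used on this slide.

```python
import math

def confidence_interval(mean, sd, n, z=2.0):
    """Return (low, high) for mean +/- z standard errors."""
    se = sd / math.sqrt(n)   # standard error from the sample s.d.
    margin = z * se          # half-width of the interval
    return mean - margin, mean + margin

low, high = confidence_interval(mean=2.5, sd=0.5, n=625)
print(round(low, 2), "to", round(high, 2))
```

With the slide's numbers (mean 2.5, s.d. .50, n = 625) this gives the same 2.46 to 2.54 range.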

Estimation

Guys… This is power!

Knowing that the spread of 95% of normally distributed sample means has outer limits…

We know that if we put these limits around our sample mean…

We have defined a range that we can be 95% confident contains the population mean!

Estimation

Our sample statistics provide enough information to give us a great estimation (highly educated guess) about population statistics.

We do this without needing to know the population mean—without needing to have a census.

Estimation

Another Example:

Sample of 2,500 with an average income of $28,000 with a standard deviation of $8,000.

Provide a 95% C.I. = M +/- 2 * (s.e.)
1. s.e. = \( \$8{,}000 / \sqrt{2{,}500} = \$8{,}000 / 50 = \$160 \)
2. 2 * $160 = $320
3. C.I. = $28,000 +/- $320

C.I. >>> $27,680 to $28,320


We are 95% confident that the true mean falls from $27,680 up to $28,320

Estimation

NO WAIT! We’re wrong!

Technically speaking, on a normal curve, 95% of cases fall between +/- 1.96 standard deviations rather than 2.

(Check your book’s table.)

Empirical Rule vs. Actuality

Middle %    Empirical rule    Actual
68%         1z                0.99z
95%         2z                1.96z
99.73%      3z                3z
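Instead of a printed table, the z values can be looked up with Python's standard library. This is a sketch using `statistics.NormalDist`; the helper names `z_for_middle` and `middle_share` are made up for illustration.

```python
from statistics import NormalDist

def z_for_middle(share):
    """z that bounds the middle `share` of a standard normal curve."""
    tail = (1 - share) / 2
    return NormalDist().inv_cdf(1 - tail)

def middle_share(z):
    """Share of a standard normal curve between -z and +z."""
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)

# From a middle percentage to its z value.
for share in (0.68, 0.95, 0.99):
    print(share, round(z_for_middle(share), 2))

# From a z value back to the middle percentage it covers.
for z in (1, 2, 3):
    print(z, round(middle_share(z), 4))
```

The first loop goes from a percentage to a z; the second goes the other way and shows how much of the curve +/- 1, 2, and 3 z actually cover.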

Estimation

Another Example:

Sample of 2,500 with an average income of $28,000 with a standard deviation of $8,000.

Provide a 95% C.I. = M +/- 1.96 * (s.e.)
1. s.e. = \( \$8{,}000 / \sqrt{2{,}500} = \$8{,}000 / 50 = \$160 \)
2. 1.96 * $160 = $313.6
3. C.I. = $28,000 +/- $313.6

C.I. >>> $27,686.4 to $28,313.6

We are 95% confident that the population mean falls between $27,686.4 and $28,313.6

Estimation

Another Example:

Sample of 2,500 with an average income of $28,000 with a standard deviation of $8,000.

What if we want a 99% confidence interval? What z do we use?

Check the table in your book!


99% fall between +/- 2.58 z’s

Estimation

Another Example:

Sample of 2,500 with an average income of $28,000 with a standard deviation of $8,000.

What if we want a 99% confidence interval?

1. s.e. = \( \$8{,}000 / \sqrt{2{,}500} = \$8{,}000 / 50 = \$160 \)
2. 2.58 * $160 = $412.8
3. C.I. = $28,000 +/- $412.8

C.I. >>> $27,587.2 to $28,412.8

We are 99% confident that the population mean falls between these values.

Why did the interval get wider than the 95% C.I., which was $27,686.4 to $28,313.6?
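Putting the two income intervals side by side makes the difference in width easy to see. In this sketch the `Z_TABLE` dictionary holds the rounded z values quoted in the slides (1.96 and 2.58); the names `Z_TABLE` and `confidence_interval` are assumptions made for illustration.

```python
import math

# z multipliers as given in the slides (rounded table values).
Z_TABLE = {0.95: 1.96, 0.99: 2.58}

def confidence_interval(mean, sd, n, level):
    """Return (low, high) for the given confidence level."""
    z = Z_TABLE[level]
    margin = z * sd / math.sqrt(n)  # z times the standard error
    return mean - margin, mean + margin

for level in (0.95, 0.99):
    low, high = confidence_interval(28_000, 8_000, 2_500, level)
    print(f"{level:.0%} C.I.: {low:,.1f} to {high:,.1f} (width {high - low:,.1f})")
```

The standard error is the same $160 in both cases; only the z multiplier changes, so asking for more confidence stretches the interval.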

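Estimation

The 99% C.I. ($27,587.2 to $28,412.8) is wider than the 95% C.I. ($27,686.4 to $28,313.6) because the middle 99% of the sampling distribution reaches farther from M than the middle 95%.

[Figure: normal curve centered on M, with the middle 68%, 95%, and 99% ranges marked]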

Estimation

…Let’s recap: We can say that 95% of the sample means in repeated sampling will be in the range marked by -1.96 over to +1.96 standard errors.

[Figure: 95% range from -1.96z to +1.96z, shown on a z scale from -3 to 3 and on a self-esteem scale from 15 to 40]

Estimation

And remember: If we don’t know the true population mean, 95% of the time a 95% confidence interval would contain the true population mean!

[Figure: 95% ranges for different samples, on a self-esteem scale from 15 to 40]
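The "95% of the time" claim can be checked by simulation. The sketch below assumes a normally distributed population with mean 28 and s.d. 8 (numbers borrowed from the self-esteem scale for illustration), samples of 100, and 2,000 repetitions; the function name `coverage_rate` and the seed are also made up.

```python
import math
import random
import statistics

def coverage_rate(true_mean, true_sd, n, z, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # s.e. from the sample s.d.
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / trials

rate_95 = coverage_rate(true_mean=28, true_sd=8, n=100, z=1.96, trials=2_000)
rate_99 = coverage_rate(true_mean=28, true_sd=8, n=100, z=2.58, trials=2_000)

print("share of 95% intervals containing the true mean:", rate_95)
print("share of 99% intervals containing the true mean:", rate_99)
```

Both calls use the same seed, so they see the same simulated samples; the 99% intervals can only catch the true mean at least as often as the 95% ones.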

Estimation

If we want that range to contain the true population mean 99% of the time (99% confidence interval) we just construct a wider interval, corresponding with 2.58 z’s.

[Figure: 99% ranges for different samples, overlaying the 95% intervals, on a self-esteem scale from 15 to 40]

Estimation

The sampling distribution’s standard error is a measuring stick that we can use to indicate the range of a specified middle percentage of sample means in repeated sampling.

[Figure: normal curve centered at 25, with 1z marking the middle 68%, 1.96z the middle 95%, and 3z the middle 99.73%]

Estimation

Another Confidence Interval Example:

I collected a sample of 2,500 with an average self-esteem score of 28 with a standard deviation of 8.

What if we want a 99% confidence interval? C.I. = Mean +/- z * s.e.

1. Find the standard error of the sampling distribution: \( \text{s.d.} / \sqrt{n} = 8 / \sqrt{2{,}500} = 8 / 50 = 0.16 \)

2. Build the width of the interval. 99% corresponds with a z of 2.58: 2.58 * 0.16 = 0.41

3. Insert the mean to build the interval: 99% C.I. = 28 +/- 0.41

The interval: 27.59 to 28.41

We are 99% confident that the population mean falls between these values.

Estimation

And if we wanted a 95% confidence interval instead?

I collected a sample of 2,500 with an average self-esteem score of 28 with a standard deviation of 8.

C.I. = Mean +/- z * s.e.

1. Find the standard error of the sampling distribution: \( \text{s.d.} / \sqrt{n} = 8 / \sqrt{2{,}500} = 8 / 50 = 0.16 \)

2. Build the width of the interval. 95% corresponds with a z of 1.96: 1.96 * 0.16 = 0.31

3. Insert the mean to build the interval: 95% C.I. = 28 +/- 0.31

The interval: 27.69 to 28.31

We are 95% confident that the population mean falls between these values.

Estimation

By centering my sampling distribution’s +/- 1.96z range around my sample’s mean...

I can identify a range that, if my sample is one of the middle 95%, would contain the population’s mean.

Or, I can be 95% confident that the population’s mean is somewhere in that range.

Estimation

By centering my sampling distribution’s +/- 2.58z range around my sample’s mean...

I can identify a range that, if my sample is one of the middle 99%, would contain the population’s mean.

Or, I can be 99% confident that the population’s mean is somewhere in that range.