Introduction to Inference: Sampling Distributions

Post on 17-Jan-2018


description

Inference with the sample mean: the sample mean is our estimate of the population mean. How much would the sample mean change if we took a different sample? The key to this question is the sampling distribution of x̄.

transcript

Introduction to Inference

Sampling Distributions

Inference with a Single Observation

• Each observation Xi in a random sample represents the unobserved values in the population

• How different would this observation be if we took a different random sample?

[Diagram: sampling leads from the population (parameter) to an observation Xi; inference leads from the observation back to the population parameter.]

Inference with Sample Mean

• The sample mean is our estimate of the population mean

• How much would the sample mean change if we took a different sample?

• Key to this question: the sampling distribution of x̄

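[Diagram: sampling leads from the population (parameter: μ) to a sample (statistic: x̄); inference, by estimation, leads from the sample statistic back to the population parameter.]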

Sampling Distribution of a Sample Statistic

• Sampling Distribution of a Sample Statistic: The distribution of values for a sample statistic obtained from repeated samples, all of the same size and all drawn from the same population

Example: Consider the set {1, 2, 3, 4}:

1) Make a list of all samples of size 2 that can be drawn from this set (sample with replacement)

2) Construct the sampling distribution for the sample mean for samples of size 2

3) Construct the sampling distribution for the minimum for samples of size 2

Table of All Possible Samples

Sample    x̄     Minimum   Probability
{1, 1}    1.0   1         1/16
{1, 2}    1.5   1         1/16
{1, 3}    2.0   1         1/16
{1, 4}    2.5   1         1/16
{2, 1}    1.5   1         1/16
{2, 2}    2.0   2         1/16
{2, 3}    2.5   2         1/16
{2, 4}    3.0   2         1/16
{3, 1}    2.0   1         1/16
{3, 2}    2.5   2         1/16
{3, 3}    3.0   3         1/16
{3, 4}    3.5   3         1/16
{4, 1}    2.5   1         1/16
{4, 2}    3.0   2         1/16
{4, 3}    3.5   3         1/16
{4, 4}    4.0   4         1/16

This table lists all possible samples of size 2, the mean and the minimum for each sample, and the probability of each sample occurring (all equally likely)

Number of possible samples (with replacement) = N^n = 4^2 = 16

Sampling Distribution

• Summarize the information in the previous table to obtain the sampling distribution of the sample mean and the sample minimum:

Sampling Distribution of the Sample Mean

x̄     P(x̄)
1.0   1/16
1.5   2/16
2.0   3/16
2.5   4/16
3.0   3/16
3.5   2/16
4.0   1/16

[Histogram: Sampling Distribution of the Sample Mean; horizontal axis x̄ from 1.0 to 4.0, vertical axis P(x̄) from 0.00 to 0.25]
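The three steps of this example can be reproduced by brute-force enumeration. The following is a minimal Python sketch, not part of the original slides; the names population, samples, and sampling_distribution are illustrative choices. It also tabulates the sampling distribution of the minimum, which follows from the Minimum column of the table of all possible samples.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2  # sample size

# Step 1: all ordered samples of size n drawn with replacement (N**n = 16).
samples = list(product(population, repeat=n))


def sampling_distribution(statistic):
    """Map each value of the statistic to its probability.

    Every sample is equally likely, so the probability of a value is the
    number of samples producing it divided by the number of samples.
    """
    counts = Counter(statistic(s) for s in samples)
    return {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}


# Step 2: sampling distribution of the sample mean.
mean_dist = sampling_distribution(lambda s: Fraction(sum(s), len(s)))
# Step 3: sampling distribution of the sample minimum.
min_dist = sampling_distribution(min)

for name, dist in [("mean", mean_dist), ("minimum", min_dist)]:
    print(f"Sampling distribution of the sample {name}:")
    for value, prob in dist.items():
        # Show probabilities over the common denominator 16, as on the slide.
        print(f"  {float(value):.1f}  {prob * len(samples)}/{len(samples)}")
```

Exact fractions are used instead of floats so the probabilities can be compared directly with the 1/16, 2/16, ... entries in the table.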

Sampling Distribution of Sample Mean

• Distribution of values taken by the statistic in all possible samples of size n from the same population

• Model assumption: our observations xi are sampled from a population with mean μ and variance σ²

[Diagram: from a population with unknown parameter μ, Sample 1, Sample 2, ..., Sample 8, ... (each of size n) each yield a sample mean x̄. What is the distribution of these values?]

Mean of Sample Mean

• First, we examine the center of the sampling distribution of the sample mean.

• The center of the sampling distribution of the sample mean is the unknown population mean:

mean(x̄) = μ

• Over repeated samples, the sample mean will, on average, be equal to the population mean, but there are no guarantees for any one sample!

Variance of Sample Mean

• Next, we examine the spread of the sampling distribution of the sample mean

• The variance of the sampling distribution of the sample mean is

variance(x̄) = σ²/n

• As the sample size increases, the variance of the sample mean decreases!

• Averaging over many observations is more accurate than just looking at one or two observations

• Comparing the sampling distribution of the sample mean when n = 1 (parent population) vs. n = 10
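Both results, mean(x̄) = μ and variance(x̄) = σ²/n, can be checked exactly on a small population by listing every possible sample. The sketch below is an illustration under assumptions not taken from the slides: it reuses the set {1, 2, 3, 4} as the whole population, samples with replacement, and uses hypothetical helper names (mean, variance, sample_mean_moments).

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean of a list of Fractions."""
    return sum(values, Fraction(0)) / len(values)


def variance(values):
    """Exact population variance (divide by the number of values)."""
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), Fraction(0)) / len(values)


population = [Fraction(v) for v in (1, 2, 3, 4)]
mu = mean(population)          # population mean: 5/2
sigma2 = variance(population)  # population variance: 5/4


def sample_mean_moments(n):
    """Mean and variance of x-bar over all equally likely samples of size n."""
    means = [mean(list(s)) for s in product(population, repeat=n)]
    return mean(means), variance(means)


for n in (1, 2, 3):
    m, v = sample_mean_moments(n)
    print(f"n={n}: mean(x-bar)={m}, variance(x-bar)={v}, sigma^2/n={sigma2 / n}")
```

For every n the mean of x̄ stays at μ while its variance shrinks by the factor 1/n, which is the sense in which averaging more observations is more accurate.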

Law of Large Numbers

• Remember the Law of Large Numbers:

• If one draws independent samples from a population with mean μ, then as the number of observations increases, the sample mean x̄ gets closer and closer to the population mean μ

• This is easier to see now since we know that

mean(x̄) = μ

variance(x̄) = σ²/n → 0 as n gets large
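A quick simulation makes the Law of Large Numbers visible. This is a minimal sketch under assumptions of our own, not the slides' data: independent draws with replacement from the population {1, 2, 3, 4} (μ = 2.5), a fixed random seed for reproducibility, and arbitrary checkpoints.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible
population = [1, 2, 3, 4]
mu = sum(population) / len(population)  # population mean, 2.5

checkpoints = (10, 100, 1_000, 10_000, 100_000)
running_means = {}

total = 0
for i in range(1, max(checkpoints) + 1):
    total += rng.choice(population)  # one more independent observation
    if i in checkpoints:
        running_means[i] = total / i
        print(f"n = {i:>7}: sample mean = {running_means[i]:.4f} (mu = {mu})")
```

The sample mean does not have to move closer to μ at every single step, but its typical distance from μ shrinks like σ/√n, so the later checkpoints tend to sit very near 2.5.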

Example

• Population: seasonal home-run totals for 7032 baseball players from 1901 to 1996

• Take different samples from this population and compare the sample mean we get each time

• In real life, we can’t do this because we don’t usually have the entire population!

Sample Size                     Mean of x̄   Variance of x̄
100 samples of size n = 1       3.69        46.8
100 samples of size n = 10      4.43        4.43
100 samples of size n = 100     4.42        0.43
100 samples of size n = 1000    4.42        0.06

Population parameter: μ = 4.42
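The same experiment can be imitated in code. The home-run data are not available here, so the sketch below uses a synthetic right-skewed stand-in population of 7032 values; the numbers it prints are therefore not the ones in the table above. The seed, the exponential shape, and the variable names are all assumptions made for illustration.

```python
import random
from statistics import fmean, variance

rng = random.Random(42)  # fixed seed for reproducibility

# Synthetic, right-skewed stand-in population (NOT the real home-run data).
population = [int(rng.expovariate(1 / 4.5)) for _ in range(7032)]
mu = fmean(population)
print(f"population mean = {mu:.2f}")

results = {}
for n in (1, 10, 100, 1000):
    # 100 repeated samples of size n, drawn with replacement.
    sample_means = [fmean(rng.choices(population, k=n)) for _ in range(100)]
    # Center and spread of the 100 sample means.
    results[n] = (fmean(sample_means), variance(sample_means))
    print(f"100 samples of size n = {n:>4}: "
          f"mean = {results[n][0]:.2f}, variance = {results[n][1]:.3f}")
```

As in the table, the mean of the sample means settles near the population mean while their variance drops by roughly a factor of ten each time n grows tenfold.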

Distribution of Sample Mean

• We now know the center and spread of the sampling distribution for the sample mean.

• What about the shape of the distribution?

• If our data x1, x2, …, xn follow a Normal distribution, then the sample mean x̄ will also follow a Normal distribution!

Example

• Mortality in US cities (deaths/100,000 people)

• This variable seems to approximately follow a Normal distribution, so the sample mean will also approximately follow a Normal distribution irrespective of the sample size drawn.

Central Limit Theorem

• What if the original data doesn’t follow a Normal distribution?

• HR/Season for sample of baseball players

• If the sample is large enough, it doesn’t matter!

Central Limit Theorem

• If the sample size is large enough (a common rule of thumb is n ≥ 30), then the sample mean x̄ has an approximately Normal distribution

• This is true no matter what the shape of the distribution of the original data!

Example: Home Runs per Season

• Take many different samples from the seasonal HR totals for a population of 7032 players

• Calculate the sample mean for each sample

[Figure: histograms of the sample means for n = 1, n = 10, and n = 100]
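The change in shape as n grows can be measured with skewness, which is 0 for a symmetric distribution such as the Normal. The following sketch is an assumption-laden illustration rather than a reproduction of the home-run figure: it draws from an exponential distribution as a stand-in for skewed data, uses 2000 repeated samples per sample size, and defines a hypothetical helper named skewness.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(1)  # fixed seed for reproducibility


def skewness(values):
    """Moment-based skewness: average cubed z-score (0 for symmetric data)."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


skew = {}
for n in (1, 10, 100):
    # 2000 sample means, each from n draws of a right-skewed population.
    means = [fmean([rng.expovariate(1.0) for _ in range(n)])
             for _ in range(2000)]
    skew[n] = skewness(means)
    print(f"n = {n:>3}: skewness of the sample means = {skew[n]:.2f}")
```

For n = 1 the sample means are just the raw skewed data; as n increases the skewness moves toward 0, which is the Central Limit Theorem at work.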

Important Definition & Theorem

Central Limit Theorem: The sampling distribution of sample means approaches a normal distribution as the sample size increases.

Sampling Distribution of Sample Means: If all possible random samples, each of size n, are taken from any population with a mean μ and a standard deviation σ, the sampling distribution of sample means will:

1. have a mean μ_x̄ equal to μ

2. have a standard deviation σ_x̄ equal to σ/√n

Further, if the sampled population has a normal distribution, then the sampling distribution of x̄ will also be normal for samples of all sizes

Summary

• The mean of the sampling distribution of x̄ is equal to the mean of the original population: μ_x̄ = μ

• The standard deviation of the sampling distribution of x̄ (also called the standard error of the mean) is equal to the standard deviation of the original population divided by the square root of the sample size: σ_x̄ = σ/√n

Notes:
– The distribution of x̄ becomes more compact as n increases. (Why?)
– The variance of x̄: σ_x̄² = σ²/n

• The distribution of x̄ is (exactly) normal when the original population is normal

• The CLT says: the distribution of x̄ is approximately normal regardless of the shape of the original distribution, when the sample size is large enough!

Standard Error of the Mean

Standard Error of the Mean: The standard deviation of the sampling distribution of sample means: σ_x̄ = σ/√n

Notes:

• The n in the formula for the standard error of the mean is the size of the sample

• The proof of the Central Limit Theorem is beyond the scope of this course

• The following example illustrates the results of the Central Limit Theorem

Graphical Illustration of the Central Limit Theorem

[Figure: four panels on a horizontal axis running from about 10 to 30: the original population (axis x), and the distribution of x̄ for n = 2, n = 10, and n = 30]

7.3 ~ Applications of the Central Limit Theorem

• When the sampling distribution of the sample mean is (exactly) normally distributed, or approximately normally distributed (by the CLT), we can answer probability questions using the standard normal distribution and the z standard score: z = (x̄ − μ)/(σ/√n)

Example 2

Consider a normal population with μ = 50 and σ = 15. Suppose a sample of size 9 is selected at random. Find:

1) P(45 ≤ x̄ ≤ 60)

2) P(x̄ ≤ 47.5)

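Solutions: Since the original population is normal, the distribution of the sample mean is also (exactly) normal, with

μ_x̄ = μ = 50

σ_x̄ = σ/√n = 15/√9 = 15/3 = 5

z = (x̄ − μ)/(σ/√n)

1) P(45 ≤ x̄ ≤ 60) = P((45 − 50)/5 ≤ z ≤ (60 − 50)/5) = P(−1.00 ≤ z ≤ 2.00) = 0.3413 + 0.4772 = 0.8185

[Figure: normal curve for x̄ centered at 50, with the area between 45 and 60 shaded (z from −1.00 to 2.00); the shaded areas are 0.3413 and 0.4772]

2) P(x̄ ≤ 47.5) = P((x̄ − 50)/5 ≤ (47.5 − 50)/5) = P(z ≤ −0.50) = 0.5000 − 0.1915 = 0.3085

[Figure: normal curve for x̄ centered at 50, with the area to the left of 47.5 shaded (z = −0.50); the area between 47.5 and 50 is 0.1915 and the shaded area is 0.3085]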
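The two probabilities in Example 2 can be checked numerically instead of with a z table. The sketch below uses NormalDist from Python's standard statistics module; the variable names are illustrative. Because a printed table rounds each area to four decimals, the table answer 0.8185 and the computed value can differ in the last digit.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 15, 9

# Standard error of the mean: sigma / sqrt(n) = 15 / 3 = 5.
se = sigma / sqrt(n)

# The population is normal, so x-bar is exactly Normal(mu, se).
xbar = NormalDist(mu, se)

p1 = xbar.cdf(60) - xbar.cdf(45)  # P(45 <= x-bar <= 60)
p2 = xbar.cdf(47.5)               # P(x-bar <= 47.5)

print(f"standard error      = {se}")
print(f"P(45 <= xbar <= 60) = {p1:.4f}")
print(f"P(xbar <= 47.5)     = {p2:.4f}")
```

Working with the distribution of x̄ directly gives the same answer as standardizing first, since z = (x̄ − μ)/(σ/√n) is only a change of scale.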