
Week 7 Sample Means & Proportions. Variability of Summary Statistics Variability in shape of distn...

Date post: 17-Jan-2016
Upload: nickolas-sharp
Page 1: Week 7 Sample Means & Proportions. Variability of Summary Statistics Variability in shape of distn of sample Variability in summary statistics Mean, median,

Week 7

Sample Means & Proportions

Variability of Summary Statistics

Variability in shape of distn of sample

Variability in summary statistics: mean, median, st devn, upper quartile, …

Summary statistics have distributions

Parameters and statistics

Parameter: describes underlying population
Constant
Greek letter (e.g. μ, σ, π, …)
Unknown value in practice

Summary statistic:
Random
Roman letter (e.g. m, s, p, …)

We hope statistic will tell us about corresponding parameter

Distn of sample vs sampling distn of statistic

Values in a single random sample have a distribution

Single sample --> single value for statistic

Sample-to-sample variability of statistic is its sampling distribution.

Means

Unknown population mean, μ

Sample mean, X̄, has a distribution: its sampling distribution.

Usually x̄ ≠ μ

A single sample mean, x̄, gives us information about μ

Sampling distribution of mean

If sample size, n, increases:

Spread of distn of sample is (approx) same.

Spread of sampling distn of mean gets smaller.
x̄ is likely to be closer to μ
x̄ becomes a better estimate of μ

Sampling distribution of mean

Population with mean μ, st devn σ

Random sample (n independent values)

Sample mean, X̄, has sampling distn with:

Mean, $\mu_{\bar{X}} = \mu$

St devn, $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

(We will deal later with the problem that μ and σ are unknown in practice.)

Weight loss

Random sample of n = 25 people
Sample mean, x̄

Estimate mean weight loss for those attending clinic for 10 weeks

How accurate?

Let’s see, if the population distn of weight loss is:

$X \sim \text{normal}(\mu = 8\text{ lb},\ \sigma = 5\text{ lb})$

Some samples

Four random samples of n = 25 people:

1. Mean = 8.32 pounds, st devn = 4.74 pounds

2. Mean = 8.32 pounds, st devn = 4.74 pounds

3. Mean = 8.48 pounds, st devn = 5.27 pounds

4. Mean = 7.16 pounds, st devn = 5.93 pounds

N.B. In all samples, x̄ ≠ μ

Sampling distribution

Means from simulation of 400 samples

Theory:

mean = μ = 8 lb, $\text{s.d.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{5}{\sqrt{25}} = 1$ lb

(How does this compare to simulation? To popn distn?)
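The simulation on this slide can be imitated in a few lines. The sketch below is only an illustration, not the code behind the slide: the function name `simulate_sample_means` and the seed are assumptions, and the population is taken to be normal with μ = 8 lb and σ = 5 lb as stated earlier.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Return `reps` sample means, each from n independent normal(mu, sigma) values."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


# 400 samples of n = 25 people, as on the slide.
means = simulate_sample_means(mu=8, sigma=5, n=25, reps=400)

# Theory says the mean of the sample means is 8 lb and their s.d. is 5 / sqrt(25) = 1 lb.
print("mean of sample means:", round(statistics.fmean(means), 2))
print("s.d. of sample means:", round(statistics.stdev(means), 2))
```

The printed values should land close to 8 and 1, and much closer together than the individual weight losses, whose st devn is 5 lb.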

Errors in estimation

Population: $X \sim \text{normal}(\mu = 8\text{ lb},\ \sigma = 5\text{ lb})$

Sampling distribution of mean: mean = μ = 8 lb, $\text{s.d.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{5}{\sqrt{25}} = 1$ lb

From 70-95-100 rule:
x̄ will be almost certainly within 8 ± 3 lb
x̄ is unlikely to be more than 3 lb in error

Even if we didn’t know μ:
x̄ is unlikely to be more than 3 lb in error

Increasing sample size, n

If we sample n = 100 people instead of 25:

$\text{s.d.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{5}{\sqrt{100}} = 0.5$ lb.

Larger samples ⇒ more accurate estimates

Central Limit Theorem

If population is normal (μ, σ):

$\bar{X} \sim \text{normal}\left(\mu,\ \dfrac{\sigma}{\sqrt{n}}\right)$

If popn is non-normal with (μ, σ) but n is large:

$\bar{X} \overset{\text{approx}}{\sim} \text{normal}\left(\mu,\ \dfrac{\sigma}{\sqrt{n}}\right)$

Guideline: n > 30 even if very non-normal
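One way to see the theorem at work is to draw sample means from a clearly non-normal population. The sketch below assumes an exponential population with mean 1 and st devn 1 (a choice made here for illustration, not taken from the slides); the helper names are hypothetical. It compares n = 2 with n = 40 and reports a simple skewness measure, which should be near 0 for a symmetric bell shape.

```python
import math
import random
import statistics


def sample_means_exponential(n, reps, rate=1.0, seed=1):
    """Means of `reps` samples of size n from an exponential(rate) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(reps)
    ]


def skewness(values):
    """Average cubed z-score: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


sigma = 1.0  # exponential(rate=1) has mean 1 and st devn 1
results = {}
for n in (2, 40):
    means = sample_means_exponential(n, reps=2000)
    results[n] = (statistics.stdev(means), skewness(means))
    print(f"n = {n:2d}: s.d. of means = {results[n][0]:.3f} "
          f"(theory {sigma / math.sqrt(n):.3f}), skewness = {results[n][1]:.2f}")
```

With n = 40 the spread should be close to σ/√n and the skewness much smaller than with n = 2, in line with the n > 30 guideline.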

Other summary statistics

E.g. Lower quartile, proportion, correlation

Usually not normal distns

A formula for the standard devn of the sampling distn exists only sometimes

Sampling distn usually close to normal if n is large

Lottery problem

Pennsylvania Cash 5 lottery
5 numbers selected from 1-39
Pick birthdays of family members (none 32-39)
P(highest selected is 32 or over)?

Statistic:

H = highest of 5 random numbers (without replacement)

Lottery simulation

Simulation: Generated 5 numbers (without replacement) 1560 times

Theory? Fairly hard.

Highest number > 31 in about 72% of repetitions
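A rough re-creation of this simulation is sketched below; it is not the code used for the slide, and the function names and seed are assumptions. The full distribution of H is awkward to write down, but this one probability has a short route through the complement (all 5 numbers fall in 1-31), so the sketch prints that alongside the simulated value as a cross-check.

```python
import math
import random


def exact_prob_highest_at_least(threshold, pool=39, picks=5):
    """P(highest of `picks` numbers drawn without replacement from 1..pool >= threshold)."""
    # Complement: every pick is below the threshold.
    return 1 - math.comb(threshold - 1, picks) / math.comb(pool, picks)


def simulated_prob(threshold, pool=39, picks=5, reps=1560, seed=2):
    """Proportion of simulated draws whose highest number is >= threshold."""
    rng = random.Random(seed)
    hits = sum(
        max(rng.sample(range(1, pool + 1), picks)) >= threshold
        for _ in range(reps)
    )
    return hits / reps


exact = exact_prob_highest_at_least(32)
sim = simulated_prob(32)
print(f"exact P(H >= 32)     = {exact:.4f}")
print(f"simulated (1560 reps) = {sim:.4f}")
```

The exact value is about 0.70, so a simulated figure near 72% from 1560 repetitions is within ordinary sample-to-sample variability.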

Normal distributions

Family of distributions (populations)
Shape depends only on parameters μ (mean) & σ (st devn)

All have same symmetric ‘bell shape’

μ = 65 inches, σ = 2.7 inches

Importance of normal distn

A reasonable model for many data sets

Transformed data often approx normal

Sample means (and many other statistics) are approx normal.

Standard normal distribution

Z ~ Normal (μ = 0, σ = 1)

[Figure: standard normal density, axis from −3 to 3, shaded area = Prob (Z < z*)]

Probabilities for normal (0, 1)

Check from tables:

P(Z ≤ −3.00) = 0.0013

P(Z ≤ −2.59) = 0.0048

P(Z ≤ 1.31) = 0.9049

P(Z ≤ 2.00) = 0.9772

P(Z ≤ −4.75) = 0.000001

Probability Z > 1.31

P(Z > 1.31) = 1 – P(Z ≤ 1.31)

= 1 – .9049 = .0951

Prob ( Z between –2.59 and 1.31)

P(−2.59 ≤ Z ≤ 1.31)

= P(Z ≤ 1.31) – P(Z ≤ −2.59)

= .9049 – .0048 = .9001
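The table look-ups for the standard normal can be checked with software. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

# Cumulative probabilities P(Z <= z) for the table values checked earlier.
table = {z: Z.cdf(z) for z in (-3.00, -2.59, 1.31, 2.00, -4.75)}
for z, p in table.items():
    print(f"P(Z <= {z:5.2f}) = {p:.6f}")

# Upper tail: P(Z > 1.31) = 1 - P(Z <= 1.31)
upper = 1 - Z.cdf(1.31)

# Interval: P(-2.59 <= Z <= 1.31) = P(Z <= 1.31) - P(Z <= -2.59)
between = Z.cdf(1.31) - Z.cdf(-2.59)

print(f"P(Z > 1.31)           = {upper:.4f}")
print(f"P(-2.59 <= Z <= 1.31) = {between:.4f}")
```

Small differences from the table in the last digit come from the table's rounding.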

Standard devns from mean

Normal (μ, σ)

μ = 65 inches, σ = 2.7 inches

Heights of students

Probability and area

X ~ normal (μ = 65, σ = 2.7)

P (X ≤ 67.7) = area

Probability and area (cont.)

Exactly 70-95-100 rule

Normal (μ, σ)

P(X within σ of μ) = 0.683, approx 70%
P(X within 2σ of μ) = 0.954, approx 95%
P(X within 3σ of μ) = 0.997, approx 100%

Finding approx probabilities

Ht of college woman, X ~ normal (μ = 65, σ = 2.7)

Prob (X ≤ 62)?

1. Sketch normal density

2. Estimate area: P (X ≤ 62) = area

About 1/8

Translate question from X to Z

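X ~ Normal (μ, σ)
Find P(X ≤ x*)

Translate to z-score: $Z = \dfrac{X - \mu}{\sigma}$, so x* becomes $z^* = \dfrac{x^* - \mu}{\sigma}$

Z ~ Normal (μ = 0, σ = 1)

[Figure: normal density for X with x* marked, and standard normal density for Z (axis from −3 to 3) with z* marked]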

Finding probabilities

Prob (height of randomly selected college woman ≤ 62)?

$P(X \le 62) = P\left(Z \le \dfrac{62 - 65}{2.7}\right) = P(Z \le -1.11) = .1335$

About 13%.

Prob (X > value)

Ht of college woman, X ~ normal (μ = 65, σ = 2.7)

Prob (X > 68 inches)?

$P(X > 68) = P\left(Z > \dfrac{68 - 65}{2.7}\right) = P(Z > 1.11) = 1 - P(Z \le 1.11) = 1 - .8665 = .1335$
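The translation from X to Z can be written as a small function. This is a sketch using the standard library's `NormalDist`; the helper name `prob_at_most` is an assumption made for illustration.

```python
from statistics import NormalDist


def prob_at_most(x, mu, sigma):
    """P(X <= x) for X ~ normal(mu, sigma), computed via the z-score."""
    z = (x - mu) / sigma          # translate x* to z*
    return NormalDist().cdf(z)    # look up P(Z <= z*) on the standard normal


# Heights of college women: mu = 65 inches, sigma = 2.7 inches.
p_short = prob_at_most(62, mu=65, sigma=2.7)       # P(X <= 62)
p_tall = 1 - prob_at_most(68, mu=65, sigma=2.7)    # P(X > 68)

print(f"P(X <= 62) = {p_short:.4f}")
print(f"P(X > 68)  = {p_tall:.4f}")
```

Both answers are about 0.133 because 62 and 68 are each 3 inches (about 1.11 st devns) from the mean. The table value .1335 differs slightly because it rounds z to −1.11 first.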

Finding upper quartile

Blood Pressures are normal with mean 120 and standard deviation 10. What is the 75th percentile?

Step 1: Solve for z-score
Closest z* with area of 0.7500 (tables): z = 0.67

Step 2: Calculate $x = z^*\sigma + \mu$
x = (0.67)(10) + 120 = 126.7 or about 127.
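The two steps can be mirrored with the inverse of the normal cumulative distribution. The sketch below uses `NormalDist.inv_cdf` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 120, 10  # blood pressure: mean and standard deviation

# Step 1: z-score with area 0.75 to its left.
z_star = NormalDist().inv_cdf(0.75)

# Step 2: convert back to the original scale, x = z* sigma + mu.
x_star = z_star * sigma + mu

print(f"z* = {z_star:.4f}")   # tables give 0.67
print(f"75th percentile = {x_star:.1f}")
```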

Probabilities about means

Blood pressure ~ normal (μ = 120, σ = 10)

8 people given drug

If drug does not affect blood pressure, Find P(average blood pressure > 130)

P(X̄ > 130)?

X ~ normal (μ = 120, σ = 10), n = 8

$\bar{X} \sim \text{normal}\left(\mu_{\bar{X}} = 120,\ \sigma_{\bar{X}} = \dfrac{10}{\sqrt{8}} = 3.54\right)$

$z = \dfrac{130 - 120}{3.54} = 2.83$

prob = 0.0023

Very little chance!
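The same calculation can be done directly on the sampling distribution of the mean. A minimal sketch, assuming the normal model above; the function name `prob_mean_exceeds` is hypothetical.

```python
import math
from statistics import NormalDist


def prob_mean_exceeds(value, mu, sigma, n):
    """P(sample mean > value) when the n values are independent normal(mu, sigma)."""
    sd_mean = sigma / math.sqrt(n)  # st devn of the sampling distn of the mean
    return 1 - NormalDist(mu, sd_mean).cdf(value)


p = prob_mean_exceeds(130, mu=120, sigma=10, n=8)
print(f"s.d. of mean = {10 / math.sqrt(8):.2f}")
print(f"P(mean > 130) = {p:.4f}")
```

For comparison, a single person's blood pressure exceeds 130 with probability about 0.16 under the same model, so averaging over 8 people makes such a high value far less likely.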

Distribution of sum

X ~ distn with (μ, σ)

aX ~ distn with (aμ, aσ), e.g. miles to kilometers

$\bar{X}$ ~ distn with $\left(\mu,\ \dfrac{\sigma}{\sqrt{n}}\right)$

$\sum X = n\bar{X}$ ~ distn with $\left(n\mu,\ \sqrt{n}\,\sigma\right)$

Central Limit Theorem implies approx normal

Probabilities about sum

Profit in 1 day ~ normal (μ = $300, σ = $200)

Prob(total profit in week < $1,000)?

Total = $\sum X \sim \text{normal}\left(7\mu = \$2{,}100,\ \sqrt{7}\,\sigma = \$529\right)$

$z = \dfrac{1000 - 2100}{529} = -2.08$

Prob = 0.0188

Assumes independence
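A short sketch of this calculation follows, again assuming independent daily profits that are each normal (μ = $300, σ = $200); the function name `prob_total_below` is an assumption.

```python
import math
from statistics import NormalDist


def prob_total_below(value, mu, sigma, n):
    """P(sum of n independent normal(mu, sigma) values < value)."""
    total = NormalDist(n * mu, math.sqrt(n) * sigma)  # mean n*mu, st devn sqrt(n)*sigma
    return total.cdf(value)


p = prob_total_below(1000, mu=300, sigma=200, n=7)
print(f"s.d. of weekly total = {math.sqrt(7) * 200:.0f}")
print(f"P(total < 1000) = {p:.4f}")
```

The result agrees with the table answer to about three decimal places; the table version rounds z to −2.08 before the look-up.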

Categorical data

Most important parameter is π = Prob (success)

Corresponding summary statistic is p = Proportion (success)

N.B. Textbook uses p and p̂

Number of successes

Easiest to deal with count of successes before proportion.

If…

1. n “trials” (fixed beforehand).

2. Only “success” or “failure” possible for each trial.

3. Outcomes are independent.

4. Prob (success), π, remains same for all trials.
• Prob (failure) is 1 – π.

X = number of successes ~ binomial (n, π)

Examples

Binomial Probabilities

Prob (win game) = 0.2

Plays of game are independent.

What is Prob (wins 2 out of 3 games)?

What is P(X = 2)?

$P(X = k) = \dfrac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$ for k = 0, 1, 2, …, n

$P(X = 2) = \dfrac{3!}{2!\,(3-2)!}\,(.2)^{2}(1-.2)^{1} = 3(.2)^{2}(.8)^{1} = 0.096$

You won’t need to use this!!
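For anyone who wants to check the arithmetic by machine, the formula translates directly into code. This is a sketch; `binomial_pmf` is a name chosen here, and `math.comb` supplies n!/(k!(n−k)!).

```python
import math


def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ binomial(n, pi)."""
    return math.comb(n, k) * pi**k * (1 - pi) ** (n - k)


# Prob (wins 2 out of 3 games) when Prob (win game) = 0.2.
p_two_wins = binomial_pmf(2, n=3, pi=0.2)
print(f"P(X = 2) = {p_two_wins:.3f}")

# The probabilities over k = 0, 1, 2, 3 add to 1.
total = sum(binomial_pmf(k, 3, 0.2) for k in range(4))
print(f"sum over all k = {total:.3f}")
```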

Mean & st devn of Binomial

For a binomial (n, π)

Mean = $n\pi$

Standard deviation = $\sqrt{n\pi(1-\pi)}$

Extraterrestrial Life?

50% of large population would say “yes” if asked, “Do you believe there is extraterrestrial life?”

Sample of n = 100

X = # “yes” ~ binomial (n = 100, π = 0.5)

Mean = $E(X) = 100(.5) = 50$

Standard deviation = $\sqrt{100(.5)(.5)} = 5$

Extraterrestrial Life?

Sample of n = 100

X = # “yes” ~ binomial (n = 100, π = 0.5)

$\mu = E(X) = 100(.5) = 50$

$\sigma = \sqrt{100(.5)(.5)} = 5$

70-95-100 rule of thumb for # “yes”:
About 95% chance of between 40 & 60
Almost certainly between 35 & 65

Normal approx to binomial

If X is binomial (n, π), and n is large, then X is also approximately normal, with

Mean = $E(X) = n\pi$

Standard deviation = $\sqrt{n\pi(1-\pi)}$

Conditions: Both nπ and n(1 – π) are at least 10.

(Justified by Central Limit Theorem)

Number of H in 30 Flips

X = # heads in n = 30 flips of fair coin
X ~ binomial (n = 30, π = 0.5)

$\mu = E(X) = 30(.5) = 15$

$\sigma = \sqrt{30(.5)(.5)} = 2.74$

Bell-shaped & approx normal.

Opinion poll

n = 500 adults; 240 agreed with statement

If π = 0.5 of all adults agree, what is P(X ≤ 240)?

X is approx normal with

$\mu = E(X) = 500(.5) = 250$

$\sigma = \sqrt{500(.5)(.5)} = 11.2$

$P(X \le 240) \approx P\left(Z \le \dfrac{240 - 250}{11.2}\right) = P(Z \le -.89) = .1867$

Not unlikely to see 48% or less, even if 50% in population agree.
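As a check on the approximation, the sketch below computes the normal-approximation answer and the exact binomial probability side by side. It assumes π = 0.5 and n = 500 as in the question; the variable names are illustrative, and no continuity correction is applied, to match the hand calculation.

```python
import math
from statistics import NormalDist

n, pi, x = 500, 0.5, 240

mean = n * pi                        # 250
sd = math.sqrt(n * pi * (1 - pi))    # about 11.2

# Normal approximation to P(X <= 240), no continuity correction.
approx = NormalDist(mean, sd).cdf(x)

# Exact binomial probability: add up P(X = k) for k = 0, ..., 240.
exact = sum(math.comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(x + 1))

print(f"mean = {mean:.0f}, s.d. = {sd:.1f}")
print(f"normal approx P(X <= 240) = {approx:.4f}")
print(f"exact binomial P(X <= 240) = {exact:.4f}")
```

The two values differ by roughly 0.01, and either way the conclusion stands: seeing 240 or fewer is not unusual when half the population agrees.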

Sample Proportion

Suppose (unknown to us) 40% of a population carry the gene for a disease (π = 0.40).

Random sample of 25 people; X = # with gene.
X ~ binomial (n = 25, π = 0.4)

p = proportion with gene

$p = \dfrac{X}{n}$

Distn of sample proportion

X ~ binomial (n, π)

$\mu_X = n\pi$

$\sigma_X = \sqrt{n\pi(1-\pi)}$

$p = \dfrac{X}{n}$

$\mu_p = \pi$

$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$

Large n: p is approx normal

(nπ ≥ 10 & n(1 – π) ≥ 10)

Examples

Election Polls: to estimate proportion who favor a candidate; units = all voters.

Television Ratings: to estimate proportion of households watching TV program; units = all households with TV.

Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers.

Testing ESP: to estimate probability a person can successfully guess which of 5 symbols on a hidden card; repeatable situation = a guess.

Public opinion poll

Suppose 40% of all voters favor Candidate A.

Pollsters sample n = 2400 voters.

Simulation 400 times & theory.

$\mu_p = \pi = 0.4$

$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}} = \sqrt{\dfrac{0.4 \times 0.6}{2400}} = 0.01$

Propn voting for A is approx normal

Probability from normal approx

If 40% of voters favor Candidate A, and n = 2400 sampled

$\mu_p = 0.4$

$\sigma_p = 0.01$

Sample proportion, p, is almost certain to be between 0.37 and 0.43

Prob 0.95 of p being between 0.38 and 0.42
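These statements can be compared with a simulation like the one described for the poll. The sketch below is an illustration only: the function name `simulate_poll_proportions` and the seed are assumptions, and each voter is treated as an independent draw with probability π = 0.4 of favoring Candidate A.

```python
import math
import random
import statistics


def simulate_poll_proportions(pi, n, reps, seed=3):
    """Sample proportions from `reps` simulated polls of n independent voters."""
    rng = random.Random(seed)
    return [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]


pi, n = 0.4, 2400
sd_p = math.sqrt(pi * (1 - pi) / n)   # theory: 0.01

props = simulate_poll_proportions(pi, n, reps=400)

# Share of simulated polls within 2 and 3 st devns of 0.4.
within_2sd = statistics.fmean(abs(p - pi) <= 2 * sd_p for p in props)
within_3sd = statistics.fmean(abs(p - pi) <= 3 * sd_p for p in props)

print(f"theory s.d. of p     = {sd_p:.4f}")
print(f"simulated s.d. of p  = {statistics.stdev(props):.4f}")
print(f"share within 0.38 to 0.42 = {within_2sd:.3f}")
print(f"share within 0.37 to 0.43 = {within_3sd:.3f}")
```

About 95% of the simulated polls should fall between 0.38 and 0.42, and nearly all between 0.37 and 0.43.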

