Week 7

Date post: 19-Jan-2016
Upload: oneida
Week 7 Sample Means & Proportions
Transcript
Page 1: Week 7

Week 7

Sample Means & Proportions

Page 2: Week 7

Variability of Summary Statistics

Variability in shape of distn of sample

Variability in summary statistics: mean, median, st devn, upper quartile, …

Summary statistics have distributions

Page 3: Week 7

Parameters and statistics

Parameter: describes underlying population; constant; Greek letter (e.g. μ, σ, π, …); unknown value in practice

Summary statistic: random; Roman letter (e.g. m, s, p, …)

We hope statistic will tell us about corresponding parameter

Page 4: Week 7

Distn of sample vs sampling distn of statistic

Values in a single random sample have a distribution

Single sample --> single value for statistic

Sample-to-sample variability of statistic is its sampling distribution.

Page 5: Week 7

Means

Unknown population mean, μ

Sample mean, X̄, has a distribution: its sampling distribution.

Usually x̄ ≠ μ

A single sample mean, x̄, gives us information about μ

Page 6: Week 7

Sampling distribution of mean

If sample size, n, increases:

Spread of distn of sample is (approx) same.

Spread of sampling distn of mean gets smaller: x̄ is likely to be closer to μ, so x̄ becomes a better estimate of μ.

Page 7: Week 7

Sampling distribution of mean

Population with mean μ, st devn σ

Random sample (n independent values)

Sample mean, X̄, has sampling distn with:

Mean, μ_X̄ = μ

St devn, σ_X̄ = σ/√n

(We will deal later with the problem that μ and σ are unknown in practice.)

Page 8: Week 7

Weight loss

Random sample of n = 25 people; sample mean, x̄

Estimate mean weight loss for those attending clinic for 10 weeks

How accurate?

Let’s see, if the population distn of weight loss is:

X ~ normal (μ = 8 lb, σ = 5 lb)

Page 9: Week 7

Some samples

Four random samples of n = 25 people:

1. Mean = 8.32 pounds, st devn = 4.74 pounds

2. Mean = 8.32 pounds, st devn = 4.74 pounds

3. Mean = 8.48 pounds, st devn = 5.27 pounds

4. Mean = 7.16 pounds, st devn = 5.93 pounds

N.B. In all samples, x̄ ≠ μ

Page 10: Week 7

Sampling distribution

Means from simulation of 400 samples

Theory: mean = μ_X̄ = 8 lb, s.d.(X̄) = σ_X̄ = σ/√n = 5/√25 = 1 lb

(How does this compare to simulation? To popn distn?)
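A minimal simulation sketch of the comparison the slide asks about, using only Python's standard library. The seed and the variable names are assumptions for illustration; the population (normal, mean 8 lb, st devn 5 lb), n = 25 and 400 samples come from the slides.

```python
import random
import statistics

random.seed(7)  # arbitrary seed (an assumption) so the run is repeatable

MU, SIGMA = 8.0, 5.0   # population mean and st devn in lb, from the slides
N, REPS = 25, 400      # sample size and number of simulated samples

# Draw REPS random samples of size N and keep each sample mean.
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

sim_mean = statistics.fmean(sample_means)   # centre of the simulated means
sim_sd = statistics.stdev(sample_means)     # spread of the simulated means
theory_sd = SIGMA / N ** 0.5                # sigma / sqrt(n)

print(f"simulation: mean = {sim_mean:.2f} lb, s.d. = {sim_sd:.2f} lb")
print(f"theory:     mean = {MU:.2f} lb, s.d. = {theory_sd:.2f} lb")
```

The simulated spread should land near 1 lb, much narrower than the population st devn of 5 lb.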

Page 11: Week 7

Errors in estimation

Population: X ~ normal (μ = 8 lb, σ = 5 lb)

Sampling distribution of mean: mean = μ_X̄ = 8 lb, s.d.(X̄) = σ_X̄ = σ/√n = 5/√25 = 1 lb

From the 70-95-100 rule, x̄ will almost certainly be within 8 ± 3 lb, so x̄ is unlikely to be more than 3 lb in error.

Even if we didn’t know μ, x̄ is unlikely to be more than 3 lb in error.

Page 12: Week 7

Increasing sample size, n

If we sample n = 100 people instead of 25:

s.d.(X̄) = σ_X̄ = σ/√n = 5/√100 = 0.5 lb

Larger samples → more accurate estimates
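A short sketch of how the st devn of the sample mean shrinks with n, for the weight-loss population (σ = 5 lb). The function name sd_of_mean and the extra sample size 400 are illustrative assumptions, not from the slides.

```python
import math

def sd_of_mean(sigma, n):
    """St devn of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Quadrupling the sample size halves the st devn of the sample mean.
for n in (25, 100, 400):
    print(f"n = {n:3d}: s.d. of sample mean = {sd_of_mean(5.0, n):.2f} lb")
```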

Page 13: Week 7

Central Limit Theorem

If population is normal (μ, σ):

X̄ ~ normal (μ, σ/√n)

If popn is non-normal with (μ, σ) but n is large:

X̄ approx ~ normal (μ, σ/√n)

Guideline: n > 30 even if very non-normal
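A rough illustration of the Central Limit Theorem with a clearly non-normal population. The exponential population with mean 8, the sample size n = 50, the 2000 repetitions and the seed are all assumptions chosen for this sketch; none of them come from the slides.

```python
import random
import statistics

random.seed(3)  # arbitrary seed (an assumption) for repeatability

POP_MEAN = 8.0          # exponential population: mean = st devn = 8
N, REPS = 50, 2000      # sample size (> 30) and number of samples

# Sample means from a skewed (exponential) population.
means = [
    statistics.fmean([random.expovariate(1 / POP_MEAN) for _ in range(N)])
    for _ in range(REPS)
]

theory_sd = POP_MEAN / N ** 0.5          # sigma / sqrt(n)
centre = statistics.fmean(means)
spread = statistics.stdev(means)

# If the means are roughly normal, about 95% fall within 2 s.d. of the mean.
within_2sd = sum(abs(m - POP_MEAN) <= 2 * theory_sd for m in means) / REPS

print(f"centre = {centre:.2f}, spread = {spread:.2f}, theory s.d. = {theory_sd:.2f}")
print(f"proportion within 2 s.d. = {within_2sd:.3f}")
```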

Page 14: Week 7

Other summary statistics

E.g. Lower quartile, proportion, correlation

Usually not normal distns

Formula for standard devn of sampling distn sometimes available

Sampling distn usually close to normal if n is large

Page 15: Week 7

Lottery problem

Pennsylvania Cash 5 lottery

5 numbers selected from 1-39

Pick birthdays of family members (none 32-39)

P(highest selected is 32 or over)?

Statistic:

H = highest of 5 random numbers (without replacement)

Page 16: Week 7

Lottery simulation

Simulation: Generated 5 numbers (without replacement) 1560 times

Theory? Fairly hard.

Highest number > 31 in about 72% of repetitions
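One way to check the simulation, sketched under the assumption that all 5-number draws from 1-39 are equally likely: the highest number is 31 or less only when all five numbers come from 1-31, so a counting argument gives the probability directly. The seed and variable names are illustrative assumptions.

```python
import math
import random

random.seed(1)  # arbitrary seed (an assumption) for repeatability

# Exact: P(H >= 32) = 1 - C(31, 5) / C(39, 5)
p_exact = 1 - math.comb(31, 5) / math.comb(39, 5)

# Simulation with the same number of repetitions as the slide (1560).
REPS = 1560
hits = sum(
    max(random.sample(range(1, 40), 5)) > 31   # 5 numbers without replacement
    for _ in range(REPS)
)
p_sim = hits / REPS

print(f"exact P(H > 31)     = {p_exact:.4f}")
print(f"simulated P(H > 31) = {p_sim:.4f}")
```

The exact value is a little over 0.70, in line with the simulated figure of about 72%.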

Page 17: Week 7

Normal distributions

Family of distributions (populations)

Shape depends only on parameters μ (mean) & σ (st devn)

All have same symmetric ‘bell shape’

μ = 65 inches, σ = 2.7 inches

Page 18: Week 7

Importance of normal distn

A reasonable model for many data sets

Transformed data often approx normal

Sample means (and many other statistics) are approx normal.

Page 19: Week 7

Standard normal distribution

Z ~ Normal (μ = 0, σ = 1)

[Figure: standard normal density, axis from -3 to 3; shaded area = Prob (Z < z*)]

Page 20: Week 7

Probabilities for normal (0, 1)

Check from tables:

P(Z ≤ -3.00) = 0.0013

P(Z ≤ -2.59) = 0.0048

P(Z ≤ 1.31) = 0.9049

P(Z ≤ 2.00) = 0.9772

P(Z ≤ -4.75) = 0.000001
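The table values can be checked in software. This sketch assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the slides themselves use printed tables.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Cumulative probabilities P(Z <= z) for the z-values on the slide.
for z in (-3.00, -2.59, 1.31, 2.00, -4.75):
    print(f"P(Z <= {z:5.2f}) = {Z.cdf(z):.6f}")
```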

Page 21: Week 7

Probability Z > 1.31

P(Z > 1.31) = 1 – P(Z ≤ 1.31)

= 1 – .9049 = .0951

Page 22: Week 7

Prob ( Z between –2.59 and 1.31)

P(-2.59 ≤ Z ≤ 1.31)

= P(Z ≤ 1.31) – P(Z ≤ -2.59)

= .9049 – .0048 = .9001
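The same two calculations (upper tail, and probability between two values) as a small sketch, again assuming Python's statistics.NormalDist in place of tables. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, st devn 1

# Upper tail: P(Z > 1.31) = 1 - P(Z <= 1.31)
p_upper = 1 - Z.cdf(1.31)

# Between two values: P(-2.59 <= Z <= 1.31) = P(Z <= 1.31) - P(Z <= -2.59)
p_between = Z.cdf(1.31) - Z.cdf(-2.59)

print(f"P(Z > 1.31) = {p_upper:.4f}")
print(f"P(-2.59 <= Z <= 1.31) = {p_between:.4f}")
```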

Page 23: Week 7

Standard devns from mean

Normal (μ, σ)

Heights of students: μ = 65 inches, σ = 2.7 inches

Page 24: Week 7

Probability and area

X ~ normal (μ = 65, σ = 2.7)

P (X ≤ 67.7) = area

Page 25: Week 7

Probability and area (cont.)

Exact values behind the 70-95-100 rule, for Normal (μ, σ):

P(X within σ of μ) = 0.683, approx 70%

P(X within 2σ of μ) = 0.954, approx 95%

P(X within 3σ of μ) = 0.997, approx 100%
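A quick check of these three probabilities, assuming Python's statistics.NormalDist. Because the rule is stated in st devns from the mean, the standard normal gives the answer for every normal (μ, σ). The dictionary name within is an illustrative assumption.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P(X within k st devns of the mean) = P(-k <= Z <= k), for k = 1, 2, 3
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} st devn(s): {p:.3f}")
```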

Page 26: Week 7

Finding approx probabilities

Prob (X ≤ 62 )?

About 1/8

Ht of college woman, X ~ normal (μ = 65, σ = 2.7)

P (X ≤ 62) = area

1. Sketch normal density

2. Estimate area

Page 27: Week 7

Translate question from X to Z

X ~ Normal (μ, σ). Find P(X ≤ x*)

Translate to z-score: Z = (X − μ)/σ, so z* = (x* − μ)/σ

Z ~ Normal (μ = 0, σ = 1)

P(X ≤ x*) = P(Z ≤ z*)

Page 28: Week 7

Finding probabilities

Prob (height of randomly selected college woman ≤ 62 )?

P(X ≤ 62) = P(Z ≤ (62 − 65)/2.7) = P(Z ≤ −1.11) = .1335

About 13%.

Page 29: Week 7

Prob (X > value)

Prob (X > 68 inches)?

Ht of college woman, X ~ normal (μ = 65, σ = 2.7)

P(X > 68) = P(Z > (68 − 65)/2.7) = P(Z > 1.11) = 1 − P(Z ≤ 1.11) = 1 − .8665 = .1335
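The height calculations can be sketched without tables, assuming Python's statistics.NormalDist. The software uses the unrounded z-score (about 1.111 rather than 1.11), so its answers differ from the table value .1335 in the fourth decimal place. Variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=2.7)  # height of college women, inches

# z-scores for 62 and 68 inches: (x - mu) / sigma
z_62 = (62 - heights.mean) / heights.stdev
z_68 = (68 - heights.mean) / heights.stdev

p_le_62 = heights.cdf(62)       # P(X <= 62)
p_gt_68 = 1 - heights.cdf(68)   # P(X > 68) = 1 - P(X <= 68)

print(f"z for 62 = {z_62:.2f}, P(X <= 62) = {p_le_62:.4f}")
print(f"z for 68 = {z_68:.2f}, P(X > 68)  = {p_gt_68:.4f}")
```

The two probabilities agree because 62 and 68 are the same distance from the mean and the normal density is symmetric.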

Page 30: Week 7

Finding upper quartile

Blood Pressures are normal with mean 120 and standard deviation 10. What is the 75th percentile?

Step 1: Solve for z-score. Closest z* with area of 0.7500 (tables): z* = 0.67

Step 2: Calculate x = z*σ + μ: x = (0.67)(10) + 120 = 126.7, or about 127.
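The two steps can be reproduced with an inverse normal calculation. This sketch assumes Python's statistics.NormalDist and its inv_cdf method; the variable names are illustrative.

```python
from statistics import NormalDist

bp = NormalDist(mu=120, sigma=10)   # blood pressure model from the slide

# Step 1: z-score with area 0.75 to its left.
z_star = NormalDist().inv_cdf(0.75)

# Step 2: convert back to the blood pressure scale, x = z* sigma + mu.
x_75 = z_star * bp.stdev + bp.mean

print(f"z* = {z_star:.2f}, 75th percentile = {x_75:.1f}")
```

Calling bp.inv_cdf(0.75) directly should give the same percentile in one step.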

Page 31: Week 7

Probabilities about means

Blood pressure ~ normal (μ = 120, σ = 10)

8 people given drug

If drug does not affect blood pressure, Find P(average blood pressure > 130)

Page 32: Week 7

P(X̄ > 130)?

X ~ normal (μ = 120, σ = 10), n = 8

X̄ ~ normal (μ_X̄ = 120, σ_X̄ = 10/√8 = 3.54)

z = (130 − 120)/3.54 = 2.83

prob = 0.0023

Very little chance!
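A sketch of this calculation for the mean of 8 blood pressures, assuming Python's statistics.NormalDist. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120, 10, 8

se = sigma / sqrt(n)            # st devn of the sample mean, sigma / sqrt(n)
z = (130 - mu) / se             # z-score of a sample mean of 130
p = 1 - NormalDist(mu, se).cdf(130)   # P(sample mean > 130)

print(f"s.d. of mean = {se:.2f}, z = {z:.2f}, P(mean > 130) = {p:.4f}")
```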

Page 33: Week 7

Distribution of sum

X ~ distn with (μ, σ)

aX ~ distn with (aμ, aσ), e.g. miles to kilometers

X̄ ~ distn with (μ, σ/√n)

ΣX = nX̄ ~ distn with (nμ, √n σ)

Central Limit Theorem implies approx normal

Page 34: Week 7

Probabilities about sum

Profit in 1 day ~ normal (μ = $300, σ = $200)

Prob(total profit in week < $1,000)?

Total = ΣX ~ normal (7μ = $2,100, √7 σ = $529)

z = (1000 − 2100)/529 = −2.08

Prob = 0.0188

Assumes independence
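A sketch of the weekly-profit calculation, assuming Python's statistics.NormalDist and 7 independent days as on the slide. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

days = 7
daily_mean, daily_sd = 300, 200   # dollars per day, from the slide

# Sum of n independent values: mean n*mu, st devn sqrt(n)*sigma.
total = NormalDist(mu=days * daily_mean, sigma=sqrt(days) * daily_sd)

z = (1000 - total.mean) / total.stdev
p = total.cdf(1000)               # P(total profit < $1,000)

print(f"total ~ normal({total.mean:.0f}, {total.stdev:.0f})")
print(f"z = {z:.2f}, P(total < 1000) = {p:.4f}")
```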

Page 35: Week 7

Categorical data

Most important parameter is π = Prob (success)

Corresponding summary statistic is p = Proportion (success)

N.B. Textbook uses p and p̂

Page 36: Week 7

Number of successes

Easiest to deal with count of successes before proportion.

If…

1. n “trials” (fixed beforehand).

2. Only “success” or “failure” possible for each trial.

3. Outcomes are independent.

4. Prob (success), π, remains same for all trials. Prob (failure) is 1 – π.

X = number of successes ~ binomial (n, π)

Page 37: Week 7

Examples

Page 38: Week 7

Binomial Probabilities

Prob (win game) = 0.2

Plays of game are independent.

What is Prob (wins 2 out of 3 games)?

What is P(X = 2)?

P(X = k) = n!/(k!(n−k)!) × π^k (1−π)^(n−k), for k = 0, 1, 2, …, n

P(X = 2) = 3!/(2!(3−2)!) × (.2)^2 (1−.2)^(3−2) = 3 (.2)^2 (.8)^1 = 0.096

You won’t need to use this!!
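For anyone who does want to evaluate the formula, a minimal sketch using math.comb from Python's standard library (Python 3.8 or later). The function name binomial_pmf is an illustrative assumption.

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ binomial(n, pi): C(n, k) * pi^k * (1 - pi)^(n - k)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

# Prob (wins 2 out of 3 games) when Prob (win game) = 0.2
p_two_wins = binomial_pmf(2, 3, 0.2)
print(f"P(X = 2) = {p_two_wins:.3f}")

# The probabilities for k = 0, 1, 2, 3 add up to 1.
total = sum(binomial_pmf(k, 3, 0.2) for k in range(4))
```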

Page 39: Week 7

Mean & st devn of Binomial

For a binomial (n, π):

Mean μ = nπ

Standard deviation σ = √(nπ(1−π))

Page 40: Week 7

Extraterrestrial Life?

50% of large population would say “yes” if asked, “Do you believe there is extraterrestrial life?”

Sample of n = 100

X = # “yes” ~ binomial (n = 100, π = 0.5)

Mean μ = E(X) = 100(.5) = 50

Standard deviation σ = √(100(.5)(.5)) = 5

Page 41: Week 7

Extraterrestrial Life?

Sample of n = 100

X = # “yes” ~ binomial (n = 100, π = 0.5)

μ = E(X) = 100(.5) = 50

σ = √(100(.5)(.5)) = 5

70-95-100 rule of thumb for # “yes”: about 95% chance of between 40 & 60; almost certainly between 35 & 65
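A sketch comparing the rule of thumb with exact binomial probabilities for n = 100, π = 0.5. It uses only Python's standard library; the helper name binom_prob_between is an illustrative assumption.

```python
from math import comb, sqrt

n, pi = 100, 0.5

mean = n * pi                    # n * pi
sd = sqrt(n * pi * (1 - pi))     # sqrt(n * pi * (1 - pi))

def binom_prob_between(lo, hi):
    """Exact P(lo <= X <= hi) for X ~ binomial(n, pi)."""
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(lo, hi + 1))

p_2sd = binom_prob_between(40, 60)   # within 2 st devns of the mean
p_3sd = binom_prob_between(35, 65)   # within 3 st devns of the mean

print(f"mean = {mean:.0f}, st devn = {sd:.0f}")
print(f"P(40 <= X <= 60) = {p_2sd:.3f}, P(35 <= X <= 65) = {p_3sd:.3f}")
```

The exact probabilities come out slightly above the rule-of-thumb figures because both endpoints are counted in each interval.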

Page 42: Week 7

Normal approx to binomial

If X is binomial (n, π), and n is large, then X is also approximately normal, with

Mean μ = E(X) = nπ

Standard deviation σ = √(nπ(1−π))

Conditions: Both nπ and n(1 – π) are at least 10.

(Justified by Central Limit Theorem)

Page 43: Week 7

Number of H in 30 Flips

X = # heads in n = 30 flips of fair coin

X ~ binomial (n = 30, π = 0.5)

μ = E(X) = 30(.5) = 15

σ = √(30(.5)(.5)) = 2.74

Bell-shaped & approx normal.

Page 44: Week 7

Opinion poll

n = 500 adults; 240 agreed with statement

If π = 0.5 of all adults agree, what is P(X ≤ 240)?

X is approx normal with

μ = E(X) = 500(.5) = 250

σ = √(500(.5)(.5)) = 11.2

P(X ≤ 240) ≈ P(Z ≤ (240 − 250)/11.2) = P(Z ≤ −.89) = .1867

Not unlikely to see 48% or less, even if 50% in population agree.
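A sketch that sets the normal approximation beside the exact binomial probability for the poll (n = 500, π = 0.5, 240 agreeing). It assumes Python's standard library; variable names are illustrative. The slide's figure of .1867 rounds z to −.89 and uses no continuity correction, so the exact binomial value comes out somewhat larger.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi, x = 500, 0.5, 240

mu = n * pi                        # 250
sigma = sqrt(n * pi * (1 - pi))    # about 11.2

# Normal approximation: P(X <= 240) is roughly P(Z <= (240 - mu) / sigma)
p_normal = NormalDist(mu, sigma).cdf(x)

# Exact binomial: add up P(X = k) for k = 0, ..., 240
p_exact = sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(x + 1))

print(f"mu = {mu:.0f}, sigma = {sigma:.1f}")
print(f"normal approx = {p_normal:.4f}, exact binomial = {p_exact:.4f}")
```

Either way the conclusion is unchanged: seeing 48% or less is not unusual when 50% of the population agree.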

Page 45: Week 7

Sample Proportion

Suppose (unknown to us) 40% of a population carry the gene for a disease (π = 0.40).

Random sample of 25 people; X = # with gene. X ~ binomial (n = 25, π = 0.4)

p = proportion with gene

p = X/n

Page 46: Week 7

Distn of sample proportion

X ~ binomial (n, π)

μ_X = nπ

σ_X = √(nπ(1−π))

p = X/n

μ_p = π

σ_p = √(π(1−π)/n)

Large n: p is approx normal

(nπ ≥ 10 & n(1 – π) ≥ 10)

Page 47: Week 7

Examples

Election Polls: to estimate proportion who favor a candidate; units = all voters.

Television Ratings: to estimate proportion of households watching TV program; units = all households with TV.

Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers.

Testing ESP: to estimate probability a person can successfully guess which of 5 symbols on a hidden card; repeatable situation = a guess.

Page 48: Week 7

Public opinion poll

Suppose 40% of all voters favor Candidate A.

Pollsters sample n = 2400 voters.

Simulation 400 times & theory.

μ_p = π = 0.4

σ_p = √(π(1−π)/n) = √(0.4 × 0.6/2400) = 0.01

Propn voting for A is approx normal

Page 49: Week 7

Probability from normal approx

If 40% of voters favor Candidate A, and n = 2400 sampled

μ_p = 0.4

σ_p = 0.01

Sample proportion, p, is almost certain to be between 0.37 and 0.43

Prob 0.95 of p being between 0.38 and 0.42
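A sketch of where these two intervals come from, assuming the normal approximation for the sample proportion and Python's statistics.NormalDist. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.4, 2400

sd_p = sqrt(pi * (1 - pi) / n)       # st devn of sample proportion
p_dist = NormalDist(mu=pi, sigma=sd_p)

# Within 2 st devns (0.38 to 0.42) and within 3 st devns (0.37 to 0.43).
within_2sd = p_dist.cdf(0.42) - p_dist.cdf(0.38)
within_3sd = p_dist.cdf(0.43) - p_dist.cdf(0.37)

print(f"s.d. of p = {sd_p:.3f}")
print(f"P(0.38 <= p <= 0.42) = {within_2sd:.3f}")
print(f"P(0.37 <= p <= 0.43) = {within_3sd:.3f}")
```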
