Date post: 26-Sep-2020

Unit 4: Inference for numerical variables
Lecture 4: ANOVA

Statistics 104

Mine Çetinkaya-Rundel

October 24, 2013

Announcements

PA opens at 5pm today, due Sat evening (based on feedback on midterm evals)

If I still have your midterm or project proposal, pick it up at the end of class

New unit next week...

Statistics 104 (Mine Cetinkaya-Rundel) U4 - L4: ANOVA October 24, 2013 2 / 1

The t distribution for the difference of two means

Diamonds

        0.99 carat (pt99)    1 carat (pt100)
x̄       44.50                53.43
s       13.32                12.22
n       23                   30

[Figure: point price by carat group (carat = 0.99, carat = 1); axis ticks from 20 to 80]

These data are a random sample from the diamonds data set in the ggplot2 R package.

The t distribution for the difference of two means Hypothesis testing for the difference of two means

From last time..

We are interested in finding out if the average point price of a 1 carat diamond is higher than the average point price of a 0.99 carat diamond.

$H_0: \mu_{pt99} = \mu_{pt100}$

$H_A: \mu_{pt99} < \mu_{pt100}$

SE = 3.56

T = −2.508

df = 22
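The SE, T, and df values above can be reproduced from the summary statistics on the Diamonds slide. The following is a minimal sketch in R, assuming the conservative df = min(n1 − 1, n2 − 1) rule; the variable names are illustrative and not from the lecture.

```r
# Summary statistics from the Diamonds slide (point price by carat group)
xbar_99  <- 44.50; s_99  <- 13.32; n_99  <- 23
xbar_100 <- 53.43; s_100 <- 12.22; n_100 <- 30

# Standard error of the difference between two independent sample means
se <- sqrt(s_99^2 / n_99 + s_100^2 / n_100)

# Test statistic under H0: mu_pt99 = mu_pt100 (null difference of 0)
t_stat <- (xbar_99 - xbar_100) / se

# Conservative degrees of freedom: smaller sample size minus 1
df <- min(n_99 - 1, n_100 - 1)

round(c(SE = se, T = t_stat, df = df), 3)
```

Using the unrounded SE gives a T value that differs from the slide's −2.508 only in the third decimal; the slide divides by the rounded SE of 3.56.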


p-value

Clicker question

Which of the following is the correct p-value for this hypothesis test?

T = −2.508

(a) between 0.005 and 0.01

(b) between 0.01 and 0.025

(c) between 0.02 and 0.05

(d) between 0.01 and 0.02

one tail     0.100   0.050   0.025   0.010   0.005
two tails    0.200   0.100   0.050   0.020   0.010
df 21        1.32    1.72    2.08    2.52    2.83
   22        1.32    1.72    2.07    2.51    2.82
   23        1.32    1.71    2.07    2.50    2.81
   24        1.32    1.71    2.06    2.49    2.80
   25        1.32    1.71    2.06    2.49    2.79
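The table lookup can be cross-checked in R with the t distribution function `pt()`. This is a sketch that assumes the one-sided alternative stated earlier (HA: μ_pt99 < μ_pt100), so the p-value is the lower-tail area.

```r
# One-sided (lower tail) p-value for T = -2.508 with df = 22
t_stat <- -2.508
df <- 22
p_value <- pt(t_stat, df = df)

# In the df = 22 row of the table, |T| falls between the one-tail
# cutoffs 2.07 (area 0.025) and 2.51 (area 0.010)
p_value
```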


The t distribution for the difference of two means Confidence intervals for the difference of two means

Application exercise: t interval for comparing means

The equivalent confidence level for a one sided HT with α = 0.05 is 90%. Calculate a 90% confidence interval for the average difference between the point prices of 0.99 and 1 carat diamonds, and choose the closest answer below. Then, interpret this interval in context of the data.

(a) (-15.05, -2.81)

(b) (-15.91, -1.95)

(c) (-16.30, -1.56)

(d) (-15.05, 2.81)

(e) (-16.30, 1.56)


Solution

one tail     0.100   0.050   0.025   0.010   0.005
two tails    0.200   0.100   0.050   0.020   0.010
df 21        1.32    1.72    2.08    2.52    2.83
   22        1.32    1.72    2.07    2.51    2.82
   23        1.32    1.71    2.07    2.50    2.81
   24        1.32    1.71    2.06    2.49    2.80
   25        1.32    1.71    2.06    2.49    2.79

$$(\bar{x}_{pt99} - \bar{x}_{pt100}) \pm t^\star_{df} \times SE = (44.50 - 53.43) \pm 1.72 \times 3.56 = -8.93 \pm 6.12 = (-15.05, -2.81)$$

We are 90% confident that the average point price of a 0.99 carat diamond is $15.05 to $2.81 lower than the average point price of a 1 carat diamond.
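The same interval can be computed in R, with `qt()` taking the place of the table lookup. This is a sketch that reuses the rounded SE of 3.56 and df = 22 from the earlier slide; the variable names are illustrative.

```r
xbar_99 <- 44.50; xbar_100 <- 53.43
se <- 3.56   # rounded SE from the hypothesis test slide
df <- 22

# Critical value for a 90% interval: 5% in each tail
t_star <- qt(0.95, df = df)

point_estimate <- xbar_99 - xbar_100
margin <- t_star * se
ci <- c(lower = point_estimate - margin, upper = point_estimate + margin)
round(ci, 2)
```

Because `qt()` returns the critical value to more decimals than the table's 1.72, the endpoints can differ from (−15.05, −2.81) in the second decimal place.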


Synthesis

How (if at all) would this conclusion change your behaviour if you went diamond shopping?

Maybe buy a 0.99 carat diamond? It looks like a 1 carat, but is significantly cheaper.

http://rstudio-pubs-static.s3.amazonaws.com/2176_75884214fc524dc0bc2a140573da38bb.html


ANOVA Classy vocabulary

The GSS gives the following 10 question vocabulary test:

A  SPACE (school, noon, captain, room, board, don’t know)
B  BROADEN (efface, make level, elapse, embroider, widen, don’t know)
C  EMANATE (populate, free, prominent, rival, come, don’t know)
D  EDIBLE (auspicious, eligible, fit to eat, sagacious, able to speak, don’t know)
E  ANIMOSITY (hatred, animation, disobedience, diversity, friendship, don’t know)
F  PACT (puissance, remonstrance, agreement, skillet, pressure, don’t know)
G  CLOISTERED (miniature, bunched, arched, malady, secluded, don’t know)
H  CAPRICE (value, a star, grimace, whim, inducement, don’t know)
I  ACCUSTOM (disappoint, customary, encounter, get used to, business, don’t know)
J  ALLUSION (reference, dream, eulogy, illusion, aria, don’t know)

[Figure: distribution of vocabulary scores, horizontal axis from 0 to 10]


The GSS also asks the following question: “If you were asked to use one of four names for your social class, which would you say you belong in: the lower class, the working class, the middle class, or the upper class?”

[Figure: relative frequencies of (self reported) class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS; vertical axis from 0.0 to 0.5]


Data

       wordsum   class
1      6         MIDDLE CLASS
2      9         WORKING CLASS
3      6         WORKING CLASS
4      5         WORKING CLASS
5      6         WORKING CLASS
6      6         WORKING CLASS
7      8         MIDDLE CLASS
8      10        WORKING CLASS
9      8         WORKING CLASS
10     9         UPPER CLASS
· · ·
795    9         MIDDLE CLASS


Exploratory analysis

[Figure: side-by-side box plots of vocabulary score (0 to 10) by class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]

                 n      mean    sd
lower class      41     5.07    2.24
working class    407    5.75    1.87
middle class     331    6.76    1.89
upper class      16     6.19    2.34
overall          795    6.14    1.98


ANOVA ANOVA and the F test

Clicker question

Which of the following plots shows groups with means that are most and least likely to be significantly different from each other?

[Figure: three panels of side-by-side plots, labeled I, II, and III]

(a) most: I, least: II

(b) most: II, least: III

(c) most: I, least: III

(d) most: III, least: II

(e) most: II, least: I


Research question

Is there a difference between the average vocabulary scores of Americans from different (self reported) classes?

To compare means of 2 groups we use a Z or a T statistic.

To compare means of 3+ groups we use a new test called ANOVA and a new statistic called F.


ANOVA - hypotheses

$H_0$: The mean outcome is the same across all categories,

$$\mu_1 = \mu_2 = \cdots = \mu_k,$$

where $\mu_i$ represents the mean of the outcome for observations in category $i$.

$H_A$: At least one pair of means is different from each other.


z/t test vs. ANOVA - Purpose

z/t test

Compare means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability.

$H_0: \mu_1 = \mu_2$

ANOVA

Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability.

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$


z/t test vs. ANOVA - Method

z/t test

Compute a test statistic (a ratio).

$$z/t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)}$$

ANOVA

Compute a test statistic (a ratio).

$$F = \frac{\text{variability bet. groups}}{\text{variability w/in groups}}$$

Large test statistics lead to small p-values.

If the p-value is small enough, $H_0$ is rejected, and we conclude that the population means are not equal.


F distribution and p-value

$$F = \frac{\text{variability bet. groups}}{\text{variability w/in groups}}$$

In order to be able to reject $H_0$, we need a small p-value, which requires a large F statistic.

In order to obtain a large F statistic, variability between sample means needs to be greater than variability within the groups.


Goal: Determine measures of variability between and within groups, so that we can make a decision on the hypotheses based on how they compare to each other.


ANOVA ANOVA output, deconstructed

                      Df     Sum Sq     Mean Sq    F value    Pr(>F)
(Group) class         3      236.56     78.855     21.735     <0.0001
(Error) Residuals     791    2869.80    3.628
Total                 794    3106.36

Sum of squares total, SST

Measures the total variability in the data

$$SST = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $x_i$ represents the value of the response variable for each observation in the dataset.
[Very similar to calculation of variance, except not scaled by the sample size.]

$$SST = (6 - 6.14)^2 + (9 - 6.14)^2 + \cdots + (9 - 6.14)^2 = 3106.36$$
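To illustrate the bracketed note about variance, the sketch below computes a sum of squares in R for only the first ten `wordsum` values printed on the Data slide, so the result is not the full-sample SST of 3106.36. It also checks the identity SST = (n − 1) × s², which is what "not scaled by the sample size" amounts to when `var()` uses the n − 1 denominator.

```r
# First ten wordsum values shown on the Data slide (illustration only;
# the full data set has 795 observations)
wordsum <- c(6, 9, 6, 5, 6, 6, 8, 10, 8, 9)

# Sum of squared deviations from the mean
sst <- sum((wordsum - mean(wordsum))^2)

# Same quantity recovered from the sample variance: SST = (n - 1) * s^2
sst_from_var <- (length(wordsum) - 1) * var(wordsum)

c(SST = sst, from_variance = sst_from_var)
```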


Sum of squares between groups, SSG

Measures the variability between groups, i.e. how the group means compare to the grand mean

$$SSG = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$$

$n_j$: each group size, $\bar{x}_j$: average for each group, $\bar{x}$: overall (grand) mean
[Explained variability: deviation of group mean from overall mean, weighted by sample size.]

                 n      mean    sd
lower class      41     5.07    2.24
working class    407    5.75    1.87
middle class     331    6.76    1.89
upper class      16     6.19    2.34
overall          795    6.14    1.98

$$SSG = \left(41 \times (5.07 - 6.14)^2\right) + \left(407 \times (5.75 - 6.14)^2\right) + \left(331 \times (6.76 - 6.14)^2\right) + \left(16 \times (6.19 - 6.14)^2\right) = 236.56$$
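The weighted sum above is a one-liner in R. This sketch uses the group sizes and means from the summary table; the vector names are illustrative.

```r
# Group sizes and means from the summary table
n    <- c(lower = 41,   working = 407,  middle = 331,  upper = 16)
xbar <- c(lower = 5.07, working = 5.75, middle = 6.76, upper = 6.19)
grand_mean <- 6.14

# Each group's squared deviation from the grand mean, weighted by group size
ssg <- sum(n * (xbar - grand_mean)^2)
ssg
```

With the two-decimal means from the table the result comes out slightly below the 236.56 in the ANOVA output, presumably because the software works with unrounded group means.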


Sum of squares error, SSE

Measures the variability within groups:

SSE = SST − SSG

[Unexplained variability, i.e. unexplained by the group variable, due to other reasons]

SSE = 3106.36 − 236.56 = 2869.80


Now we need a way to get from these measures of total variability to average variability (scaling by a measure that incorporates sample sizes and number of groups → degrees of freedom).


Application exercise: ANOVA output

Fill in the rest of the ANOVA table, and make a decision on the hypotheses. Submit your decision using your clicker.

The data provide convincing evidence that the:

(a) average vocabulary scores are different for all classes.

(b) average vocabulary score for middle class is higher than the average for the lower class.

(c) average vocabulary score is different for at least one pair of classes.

(d) average vocabulary scores are the same for all classes.

(e) average vocabulary scores are different for upper and lower classes.

Note that you will need access to R to calculate the p-value. You can use the following function, replacing F_score, df_group, and df_error with your values:

> pf(F_score, df_group, df_error, lower.tail = FALSE)


Relevant formulas

Degrees of freedom associated with ANOVA

groups: $df_G = k - 1$, where $k$ is the number of groups

total: $df_T = n - 1$, where $n$ is the total sample size

error: $df_E = df_T - df_G$

Mean squares

Associated sum of squares divided by the associated df: $MS = SS / df$

Test statistic, F value

Ratio of the between group and within group variability: $F = \frac{MSG}{MSE}$

p-value

Probability of at least as large a ratio between the “between group” and “within group” variability as the one observed, if in fact the means of all groups are equal. It is calculated as the area under the F curve, with degrees of freedom $df_G$ and $df_E$, above the observed F statistic.


Solution

                      Df     Sum Sq     Mean Sq    F value    Pr(>F)
(Group) class         3      236.56     78.855     21.735     <0.0001
(Error) Residuals     791    2869.80    3.628
Total                 794    3106.36
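The table entries follow from the two sums of squares, the number of groups, and the sample size. The sketch below strings the formulas together in R, using `pf()` as suggested in the exercise; the variable names are illustrative.

```r
ssg <- 236.56; sst <- 3106.36   # sums of squares from the slides
k <- 4                          # number of classes
n <- 795                        # total sample size

# Degrees of freedom
df_g <- k - 1
df_t <- n - 1
df_e <- df_t - df_g

# Error sum of squares and mean squares
sse <- sst - ssg
msg <- ssg / df_g
mse <- sse / df_e

# F statistic and upper-tail p-value
f_stat <- msg / mse
p_value <- pf(f_stat, df_g, df_e, lower.tail = FALSE)

round(c(df_g = df_g, df_e = df_e, SSE = sse, MSG = msg, MSE = mse, F = f_stat), 3)
p_value < 0.0001
```

Small differences in the last decimal of Mean Sq and F value are expected, since the inputs here are the rounded sums of squares.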


ANOVA Checking conditions

(1) independence

If the data are a simple random sample from less than 10% of the population, this condition is satisfied.

Carefully consider whether the data may be independent (e.g. no pairing).

Always important, but sometimes difficult to check.

Does this condition appear to be satisfied?


(2) approximately normal

The observations within each group should be nearly normal (especially important when the sample sizes are small).

Does this condition appear to be satisfied?

[Figure: four normal Q-Q plots (Theoretical Quantiles vs. Sample Quantiles)]


(3) constant variance

The variability across the groups should be about equal (especially important when the sample sizes differ between groups).

Does this condition appear to be satisfied?

[Figure: side-by-side box plots of vocabulary score (0 to 10) by class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]

                 n      mean    sd
lower class      41     5.07    2.24
working class    407    5.75    1.87
middle class     331    6.76    1.89
upper class      16     6.19    2.34


Multiple comparisons & Type 1 error rate

Which means differ?

Earlier we concluded that at least one pair of means differ. The natural question that follows is “which ones?”

We can do two sample t tests for differences in each possible pair of groups.

Can you see any pitfalls with this approach?

When we run too many tests, the Type 1 Error rate increases.

This issue is resolved by using a modified significance level.


Multiple comparisons

The scenario of testing many pairs of groups is called multiple comparisons.

The Bonferroni correction suggests that a more stringent significance level is more appropriate for these tests:

$$\alpha^\star = \alpha / K$$

where $K$ is the number of comparisons being considered.

If there are $k$ groups, then usually all possible pairs are compared and $K = \frac{k(k-1)}{2}$.
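As a quick check of the formula, the sketch below wraps the Bonferroni adjustment in a small R helper. The function name `bonferroni_alpha` is hypothetical and not from the lecture; `choose(k, 2)` is base R and equals k(k − 1)/2.

```r
# Bonferroni-adjusted significance level for all pairwise comparisons
# among k groups (hypothetical helper for illustration)
bonferroni_alpha <- function(alpha, k) {
  K <- choose(k, 2)   # number of pairs, k * (k - 1) / 2
  alpha / K
}

# Four social classes: K = 6 pairwise comparisons
bonferroni_alpha(0.05, k = 4)
```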


Determining the modified α

Clicker question

In the aldrin data set depth has 3 levels: bottom, mid-depth, and surface. If α = 0.05, what should be the modified significance level for two sample t tests for determining which pairs of groups have significantly different means?

(a) α∗ = 0.05

(b) α∗ = 0.05/2 = 0.025

(c) α∗ = 0.05/4 = 0.0125

(d) α∗ = 0.05/6 = 0.0083


Which means differ?

Based on the box plots below, which means would you expect to be significantly different?

[Figure: side-by-side box plots of vocabulary score (0 to 10) by class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]


Which means differ? (cont.)

When doing multiple comparisons after ANOVA, since the assumption of equal variability across groups must have been satisfied, we re-think how we measure the standard error and the degrees of freedom. For all comparisons, use a consistent

SE: calculate SE using $s_{pooled} = \sqrt{MSE}$ instead of $s_1$ and $s_2$.

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \rightarrow SE = \sqrt{\frac{MSE}{n_1} + \frac{MSE}{n_2}}$$

df: use $df = df_E$ from ANOVA instead of df calculated based on individual sample sizes $n_1$ and $n_2$.

$$df = \min(n_1 - 1, n_2 - 1) \rightarrow df = df_E$$

Finally, compare the p-value of this test to the modified significance level ($\alpha^\star$).


[Time permitting] Is there a difference between the average vocabulary scores of middle and lower class Americans?

$$T_{df_E} = \frac{\bar{x}_{middle} - \bar{x}_{lower}}{\sqrt{\frac{MSE}{n_{middle}} + \frac{MSE}{n_{lower}}}}$$

$$T_{791} = \frac{6.76 - 5.07}{\sqrt{\frac{3.628}{331} + \frac{3.628}{41}}} = \frac{1.69}{0.315} = 5.365$$

p-value = $1.06 \times 10^{-7}$ (two-sided)

$\alpha^\star = 0.05/6 = 0.0083$

Reject $H_0$; the data provide convincing evidence of a difference between the average vocabulary scores of those from the lower and middle classes.
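The pairwise comparison can be reproduced in R from the ANOVA output and the group summaries. This is a sketch with illustrative variable names; it uses the MSE-based standard error and df = dfE, then compares the two-sided p-value to the Bonferroni-adjusted level.

```r
mse <- 3.628; df_e <- 791            # from the ANOVA table
xbar_middle <- 6.76; n_middle <- 331
xbar_lower  <- 5.07; n_lower  <- 41

# Standard error based on the pooled variance estimate (MSE)
se <- sqrt(mse / n_middle + mse / n_lower)
t_stat <- (xbar_middle - xbar_lower) / se

# Two-sided p-value using the error degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df_e, lower.tail = FALSE)

# Bonferroni-adjusted level for the 6 pairwise comparisons among 4 classes
alpha_star <- 0.05 / choose(4, 2)

round(c(SE = se, T = t_stat), 3)
p_value < alpha_star
```

Carrying the unrounded SE through gives a T statistic a few thousandths below the slide's 5.365, which divides by the rounded 0.315; the conclusion is unaffected.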
