
Week 12 Anova and contingency tables

Date post: 21-Jan-2016
Upload: pamela-christine-anderson
Page 1: Week 12 Anova and contingency tables. Two categorical variables Joint probabilities p x,y = P(X=x and Y=y) proportion of popn with values (x, y) School.

Week 12

Anova and contingency tables

Page 2:

Two categorical variables

Joint probabilities: $p_{x,y} = P(X = x \text{ and } Y = y)$, the proportion of the population with values (x, y)

School performance and wt of children

Page 3:

Conditional probabilities

Proportion within row

$$p_{\text{blonde} \mid \text{blue eyes}} = \frac{n_{\text{blonde \& blue eyes}}}{n_{\text{blue eyes}}} = \frac{p_{\text{blonde \& blue eyes}}}{p_{\text{blue eyes}}}$$

Page 4:

Conditional probabilities

Weight & performance are independent

School performance and wt of children
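
Independence can be written in two equivalent ways: every conditional probability equals the corresponding marginal probability, or every joint probability factors into its marginals. The symbols $p_x$, $p_y$ and $p_{y \mid x}$ (marginal and conditional probabilities) are notation assumed for this note, not taken from the slides, and the statement assumes $p_x > 0$.

$$p_{y \mid x} = p_y \ \text{ for all } x, y \quad\Longleftrightarrow\quad p_{x,y} = p_x \, p_y \ \text{ for all } x, y$$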

Page 5:

Independence from sample?

214 child skiers classified by skiing ability and whether they got injured

              Injured  Uninjured  Total
Beginner           20         60     80
Intermediate        9         84     93
Advanced            2         39     41
Total              31        183    214

Are ability and injury independent in underlying population?

Page 6:

Independence from sample?

Is there independence in underlying population?

Conditional sample proportions
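
As a rough illustration, the conditional sample proportions for the skiing data can be computed row by row. This is a minimal sketch using only the counts from the skiing table; the variable names are chosen here for illustration.

```python
# Skiing counts from the table: (injured, uninjured) for each ability level.
counts = {
    "Beginner": (20, 60),
    "Intermediate": (9, 84),
    "Advanced": (2, 39),
}

# Conditional sample proportion of injury within each row (ability level).
injury_rate = {
    ability: injured / (injured + uninjured)
    for ability, (injured, uninjured) in counts.items()
}

for ability, rate in injury_rate.items():
    print(f"P(injured | {ability.lower()}) = {rate:.3f}")
```

If ability and injury were independent in the population, these three proportions would differ only by sampling variation.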

Page 7:

Testing for independence

Can a relationship observed in the sample data be inferred to hold in the population represented by the data?

Could observed sample relationship have occurred by chance?

Page 8:

Expected counts: independence

31 out of 214 injured overall

Expect 31/214 of the 80 beginners to be injured, i.e. expect

$$\frac{31 \times 80}{214} = 11.59$$

injured beginners

Page 9:

Expected counts: independence

General formula

$$e_{xy} = \frac{n_x n_y}{n}$$

              Injured                  Uninjured                  Total
Beginner      80 × 31 / 214 = 11.59    80 × 183 / 214 = 68.41        80
Intermediate  93 × 31 / 214 = 13.47    93 × 183 / 214 = 79.53        93
Advanced      41 × 31 / 214 = 5.94     41 × 183 / 214 = 35.06        41
Total         31                       183                          214
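
The general formula can be applied to every cell at once. The following is a minimal sketch in plain Python using the skiing counts; the names `observed`, `row_totals`, `col_totals` and `expected` are illustrative choices, not part of the slides.

```python
# Observed skiing counts: rows are ability levels, columns are injured / uninjured.
observed = [
    [20, 60],   # Beginner
    [9, 84],    # Intermediate
    [2, 39],    # Advanced
]

row_totals = [sum(row) for row in observed]          # n_x: 80, 93, 41
col_totals = [sum(col) for col in zip(*observed)]    # n_y: 31, 183
n = sum(row_totals)                                  # 214

# e_xy = n_x * n_y / n for every cell of the table.
expected = [[nx * ny / n for ny in col_totals] for nx in row_totals]

for row in expected:
    print("  ".join(f"{e:6.2f}" for e in row))
```

Each row of expected counts adds up to the observed row total, so the expected table keeps the same margins as the data.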

Page 10:

Observed and estimated counts

Are the differences more than would be expected by chance?

Observed counts, with expected counts in parentheses:

              Injured      Uninjured    Total
Beginner      20 (11.59)   60 (68.41)      80
Intermediate   9 (13.47)   84 (79.53)      93
Advanced       2 (5.94)    39 (35.06)      41
Total         31           183            214

Page 11:

Chi-squared test of independence

H0: independence of injury & experience

HA: association between injury & experience

or equivalently H0: P(injury|beginner) = P(injury|intermediate) = ...

HA: P(injury | experience) depends on experience

Test statistic:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Page 12:

Chi-squared test of independence

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Small values consistent with independence

Big values arise when observed are very different from what would be expected under independence.

p-value = Prob($\chi^2$ as big as obtained) if indep

Tail area of chi-squared distribution

d.f. of chi-squared = (rows − 1)(cols − 1)
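
The test statistic, degrees of freedom and p-value can be sketched in plain Python for the skiing counts. One assumption to note: the tail area is computed with the closed form exp(−x/2), which is exact only for a chi-squared distribution with 2 degrees of freedom; for other tables a statistics library would be needed.

```python
import math

# Observed skiing counts: rows are ability levels, columns are injured / uninjured.
observed = [[20, 60], [9, 84], [2, 39]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[nx * ny / n for ny in col_totals] for nx in row_totals]

# Sum of (observed - expected)^2 / expected over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1)(cols - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: this shortcut is valid only when df == 2.
if df != 2:
    raise ValueError("closed-form tail area below requires df == 2")
p_value = math.exp(-chi_sq / 2)

print(f"chi-squared = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
```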

Page 13:

Chi-squared distributions

• Skewed to the right distributions.
• Minimum value is 0.
• Indexed by the degrees of freedom.

Page 14:

Skiing injury and experience

p-value = 0.003

Chi-Square Test: Injured, Uninjured

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

       Injured  Uninjured  Total
    1       20         60     80
         11.59      68.41
         6.105      1.034

    2        9         84     93
         13.47      79.53
         1.484      0.251

    3        2         39     41
          5.94      35.06
         2.613      0.443

Total       31        183    214

Chi-Sq = 11.930, DF = 2, P-Value = 0.003

Strong evidence that the chance of injury is related to experience.
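
The per-cell chi-square contributions show where the evidence comes from. The sketch below recomputes them in plain Python from the observed counts; the row and column labels and the variable names are illustrative.

```python
rows = ["Beginner", "Intermediate", "Advanced"]
cols = ["Injured", "Uninjured"]
observed = [[20, 60], [9, 84], [2, 39]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = {}
for i, row_name in enumerate(rows):
    for j, col_name in enumerate(cols):
        e = row_totals[i] * col_totals[j] / n
        contributions[(row_name, col_name)] = (observed[i][j] - e) ** 2 / e

total = sum(contributions.values())
largest = max(contributions, key=contributions.get)
share = contributions[largest] / total

print(f"largest contribution: {largest}, {contributions[largest]:.3f}")
print(f"share of chi-squared: {share:.0%}")
```

On these counts the largest contribution comes from injured beginners, where 20 injuries were observed against 11.59 expected; that single cell accounts for about half of the chi-squared statistic.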

Page 15:

Ear Infections and Xylitol

Experiment: n = 533 children randomized to 3 groups

Group 1: Placebo Gum
Group 2: Xylitol Gum
Group 3: Xylitol Lozenge

Response = Did child have an ear infection?

Page 16:

Ear Infections and Xylitol

Moderately strong evidence of differences between probs of infection

Page 17:

Making friends

With whom do you find it easiest to make friends: opposite sex, same sex or no difference?

Page 18:

Making friends

Fairly strong evidence of difference between Females(1) & Males(2)

Females more likely to choose opposite sex

H0: No difference in distribution of responses of men and women (no relationship between gender & response)

HA: Difference in distribution of responses of men and women (association between gender & response)

Chi-Square Test: Opposite sex, Same sex, No difference

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

       Opposite sex  Same sex  No difference  Total
    1            58        16             63    137
              48.79     19.38          68.83
              1.740     0.590          0.494

    2            15        13             40     68
              24.21      9.62          34.17
              3.507     1.188          0.996

Total            73        29            103    205

Chi-Sq = 8.515, DF = 2, P-Value = 0.014

Page 19:

Comparing means of 3+ groups

Do best students sit in the front of a classroom?

Seat location and GPA for n = 384 students

Students sitting in the front generally have slightly higher GPAs than others.

Chance?

Page 20:

Seat location and GPA

p-value = 0.0001.

Such big differences between sample means unlikely if popn means were same

Extremely strong evidence that means are not all same.

H0: $\mu_1 = \mu_2 = \mu_3$

HA: The means are not all equal.

Page 21:

95% CIs for separate means:

Seat location and GPA

Main difference seems to be between front and others

Page 22:

Assumptions for F-test

• Independent random samples.
• Normal distribution within each population.
• Perhaps different population means.
• Same standard deviation, σ, in each group.

Can still proceed if n is big or assumptions approx hold

Page 23:

F ratio

More evidence of a real difference when:
• Group means are far apart
• Variability within groups is small

How do you measure:
• Variation between means?
• Variation within groups?

$$F = \frac{\text{Variation between sample means}}{\text{Natural variation within groups}}$$

Page 24:

Variation between means

Between-groups sum of squares

$$SS_{\text{Groups}} = \sum_{\text{groups}} n_i (\bar{x}_i - \bar{x})^2$$

Mean sum of squares for groups (k groups):

$$MS_{\text{Groups}} = \frac{SS_{\text{Groups}}}{k - 1}$$

Page 25:

Variation within groups

Within-groups sum of squares (also called residual sum of squares):

$$SS_{\text{Error}} = \sum_{\text{groups}} (n_i - 1)\, s_i^2$$

Mean residual sum of squares:

$$MS_{\text{Error}} = \frac{SS_{\text{Error}}}{N - k}$$

Best estimate of error st devn, $\sigma$:

$$s_p = \sqrt{MS_{\text{Error}}}$$
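
The between-groups and within-groups quantities can be computed from group summaries alone. The sketch below uses hypothetical group sizes, means and standard deviations (they are NOT the seat location and GPA data, which the transcript does not include) and ends with the F ratio of the two mean squares.

```python
import math

# Hypothetical group summaries (n_i, mean_i, s_i); NOT the GPA data from the slides.
groups = [
    (10, 3.2, 0.50),
    (15, 3.0, 0.60),
    (12, 2.9, 0.55),
]

k = len(groups)                        # number of groups
N = sum(n for n, _, _ in groups)       # total sample size
grand_mean = sum(n * m for n, m, _ in groups) / N

# Between-groups variation.
ss_groups = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
ms_groups = ss_groups / (k - 1)

# Within-groups (residual) variation.
ss_error = sum((n - 1) * s ** 2 for n, _, s in groups)
ms_error = ss_error / (N - k)
s_p = math.sqrt(ms_error)              # pooled estimate of the error st devn

f_ratio = ms_groups / ms_error
print(f"MSGroups = {ms_groups:.4f}, MSError = {ms_error:.4f}")
print(f"s_p = {s_p:.4f}, F = {f_ratio:.3f}")
```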

Page 26:

Total variation

Total sum of squares = SSTotal

$$SS_{\text{Total}} = \sum_{\text{values}} (x_{ij} - \bar{x})^2$$

SSTotal = SSGroups + SSError
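
The decomposition can be checked numerically on any data set. The values below are hypothetical, made up purely for illustration, and the group labels A, B and C are placeholders.

```python
from statistics import mean

# Hypothetical raw values for three groups; not data from the slides.
data = {
    "A": [3.4, 3.1, 3.6, 3.3],
    "B": [3.0, 2.8, 3.2, 3.1, 2.9],
    "C": [2.7, 3.0, 2.6],
}

all_values = [x for values in data.values() for x in values]
grand_mean = mean(all_values)

# Total: every value against the overall mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# Between groups: each group mean against the overall mean, weighted by n_i.
ss_groups = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in data.values())

# Within groups: every value against its own group mean.
ss_error = sum((x - mean(v)) ** 2 for v in data.values() for x in v)

print(f"SSTotal = {ss_total:.4f}")
print(f"SSGroups + SSError = {ss_groups + ss_error:.4f}")
```

The two printed values agree up to floating-point rounding, whatever numbers are used.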

Page 27:

Analysis of variance table

Anova table

F test is based on F ratio

p-value = Prob of such a high F ratio if all means same

(p-value found from an 'F distribution')

Page 28:

Seat location and GPA (again)

p-value = P(F ≥ 6.69) under H0 = 0.0001.

Such a big F ratio unlikely if popn means were same

Extremely strong evidence that means are not all same.

H0: $\mu_1 = \mu_2 = \mu_3$

HA: The means are not all equal.

