Analysis of variance (ANOVA) | Department of Zoology at UBC | bio300/notes/14ANOVA.pdf · 2006. 11....

Post on 07-Oct-2020


Analysis of variance (ANOVA)

Comparing the means of more than two groups

Example: Cost of mating in male fruit flies (Drosophila)

Treatments: place males with and without unmated (virgin) females

Five treatments ("groups"), randomly assigned

Response variable: longevity (days)

The data (n = 25 males per group), longevity in days:

1preg:   46 42 65 46 58 42 48 58 50 80 63 65 70 70 72 97 46 56 70 70 72 76 90 76 92
1virgin: 21 40 44 54 36 40 56 60 48 53 60 60 65 68 60 81 81 48 48 56 68 75 81 48 68
8preg:   35 37 49 46 63 39 46 56 63 65 56 65 70 63 65 70 77 81 86 70 70 77 77 81 77
8virgin: 16 19 19 32 33 33 30 42 42 33 26 30 40 54 34 34 47 47 42 47 54 54 56 60 44
none:    40 37 44 47 47 47 68 47 54 61 71 75 89 58 59 62 79 96 58 62 70 72 75 96 75

[Figure: Results, histograms of male longevity (n = 25 males per group)]

Descriptive statistics

Treatment   Mean Ȳi (days)   Median   si     ni
1preg       64.8             65       15.6   25
1virgin     56.8             56       14.9   25
8preg       63.4             65       14.5   25
8virgin     38.7             40       12.1   25
none        63.6             62       16.5   25
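The descriptive statistics can be recomputed directly from the raw data listing. The sketch below uses only Python's standard `statistics` module; the dictionary name `LONGEVITY` and the helper `describe` are illustrative names, not part of the notes. The SD is the sample SD (n − 1 denominator), which matches the si column.

```python
import statistics

# Male longevity (days), n = 25 per group, transcribed from the data listing above
LONGEVITY = {
    "1preg":   [46, 42, 65, 46, 58, 42, 48, 58, 50, 80, 63, 65, 70, 70, 72,
                97, 46, 56, 70, 70, 72, 76, 90, 76, 92],
    "1virgin": [21, 40, 44, 54, 36, 40, 56, 60, 48, 53, 60, 60, 65, 68, 60,
                81, 81, 48, 48, 56, 68, 75, 81, 48, 68],
    "8preg":   [35, 37, 49, 46, 63, 39, 46, 56, 63, 65, 56, 65, 70, 63, 65,
                70, 77, 81, 86, 70, 70, 77, 77, 81, 77],
    "8virgin": [16, 19, 19, 32, 33, 33, 30, 42, 42, 33, 26, 30, 40, 54, 34,
                34, 47, 47, 42, 47, 54, 54, 56, 60, 44],
    "none":    [40, 37, 44, 47, 47, 47, 68, 47, 54, 61, 71, 75, 89, 58, 59,
                62, 79, 96, 58, 62, 70, 72, 75, 96, 75],
}


def describe(groups):
    """Return {group: (mean, median, sample SD, n)}; the SD uses the n - 1 denominator."""
    return {
        name: (statistics.mean(ys), statistics.median(ys), statistics.stdev(ys), len(ys))
        for name, ys in groups.items()
    }


if __name__ == "__main__":
    for name, (mean, median, sd, n) in describe(LONGEVITY).items():
        print(f"{name:8s} mean={mean:6.2f} median={median:3} sd={sd:5.1f} n={n}")
```

The unrounded means (for example 38.72 for 8virgin) are the values used later in the Tukey-Kramer comparisons.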

Hypothesis testing with ANOVA

Like a two-sample t-test, but tests for differences among the means of more than two groups.

H0: μ1 = μ2 = μ3 = … = μk
HA: At least one of the μi is different


Rejecting H0 in ANOVA indicates that at least one of the means μi is different from the others.

Scenarios being compared

[Figure: frequency distributions of Y in two scenarios, μ1 = μ2 = μ3, and not all μ's equal]

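Why we compare variances

Under H0, the variance among group means should be

$$\sigma_{\bar{Y}}^2 = \frac{\sigma^2}{n}, \quad\text{or equivalently}\quad n\,\sigma_{\bar{Y}}^2 = \sigma^2,$$

where

• $\sigma_{\bar{Y}}^2$ is the variance (squared SD) among the group means $\bar{Y}_i$

• $n$ is the sample size within groups

• $\sigma^2$ is the variance in Y within each group

If the null hypothesis is not true, the variance among group means should be larger than this, i.e. $n\,\sigma_{\bar{Y}}^2 > \sigma^2$.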
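The relationship $n\,\sigma_{\bar{Y}}^2 = \sigma^2$ under H0 can be checked by simulation. This is an illustration under assumptions that are not in the notes: k = 5 groups of n = 25 values all drawn from one normal population with σ = 15 (chosen only because it is close to the within-group SDs of the fly data), 2,000 replicates, and a fixed seed. The function name `mean_n_var_of_means` is hypothetical.

```python
import random
import statistics


def mean_n_var_of_means(k=5, n=25, sigma=15.0, reps=2000, seed=1):
    """Simulate H0 (all k groups drawn from the same normal population) and return
    the average over replicates of n * (variance among the k group means)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n)) for _ in range(k)]
        # statistics.variance uses the k - 1 denominator, so it estimates sigma^2 / n
        estimates.append(n * statistics.variance(means))
    return statistics.fmean(estimates)


if __name__ == "__main__":
    print(f"average n * var(means) = {mean_n_var_of_means():.1f}, sigma^2 = {15.0 ** 2}")
```

The two printed numbers should be close; if the group means were drawn from populations with different μ's, the first number would be inflated.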

Variance within groups ("pooled sample variance")

Error sum of squares:

$$SS_{\text{error}} = \sum s_i^2 (n_i - 1)$$

Error degrees of freedom:

$$df_{\text{error}} = \sum (n_i - 1) = N - k$$

where N is the total number of data points in all groups and k is the number of groups.

Mean square error (MS_error):

$$MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}}$$
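The error quantities can be computed from the per-group SDs and sample sizes alone. This minimal sketch plugs in the rounded si values from the descriptive-statistics table; the helper name `pooled_error` is hypothetical. Because the SDs are rounded to one decimal, the result (26262.72) is slightly below the 26313.52 obtained from the raw data.

```python
# Group SDs (s_i) and sample sizes (n_i), from the descriptive-statistics table
SD = {"1preg": 15.6, "1virgin": 14.9, "8preg": 14.5, "8virgin": 12.1, "none": 16.5}
N_PER_GROUP = {name: 25 for name in SD}


def pooled_error(sd, n):
    """Return (SS_error, df_error, MS_error) from per-group SDs and sample sizes."""
    ss_error = sum(sd[g] ** 2 * (n[g] - 1) for g in sd)  # sum of s_i^2 (n_i - 1)
    df_error = sum(n[g] - 1 for g in sd)                 # equals N - k
    return ss_error, df_error, ss_error / df_error


if __name__ == "__main__":
    ss, df, ms = pooled_error(SD, N_PER_GROUP)
    print(f"SS_error = {ss:.2f}, df_error = {df}, MS_error = {ms:.3f}")
    # prints: SS_error = 26262.72, df_error = 120, MS_error = 218.856
```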

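Variance among groups

Groups sum of squares:

$$SS_{\text{groups}} = \sum n_i (\bar{Y}_i - \bar{Y})^2$$

where $\bar{Y}$ is the grand mean,

$$\bar{Y} = \frac{\sum n_i \bar{Y}_i}{N}.$$

Degrees of freedom for groups:

$$df_{\text{groups}} = k - 1.$$

Mean square groups (MS_groups):

$$MS_{\text{groups}} = \frac{SS_{\text{groups}}}{df_{\text{groups}}}.$$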
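The among-group quantities need only the group means and sample sizes. The sketch below uses the two-decimal group means listed with the Tukey-Kramer results (these are exact, since each group total divided by 25 has at most two decimals); the helper name `groups_ss` is hypothetical.

```python
# Exact group means (days) and sample sizes for the fruit-fly experiment
MEANS = {"1preg": 64.80, "1virgin": 56.76, "8preg": 63.36, "8virgin": 38.72, "none": 63.56}
N_PER_GROUP = {name: 25 for name in MEANS}


def groups_ss(means, n):
    """Return (grand mean, SS_groups, df_groups, MS_groups)."""
    total_n = sum(n.values())
    grand_mean = sum(n[g] * means[g] for g in means) / total_n  # weighted by n_i
    ss_groups = sum(n[g] * (means[g] - grand_mean) ** 2 for g in means)
    df_groups = len(means) - 1  # k - 1
    return grand_mean, ss_groups, df_groups, ss_groups / df_groups


if __name__ == "__main__":
    gm, ss, df, ms = groups_ss(MEANS, N_PER_GROUP)
    print(f"grand mean = {gm:.2f}, SS_groups = {ss:.2f}, df = {df}, MS_groups = {ms:.2f}")
    # prints: grand mean = 57.44, SS_groups = 11939.28, df = 4, MS_groups = 2984.82
```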

F-ratio

$$F = \frac{MS_{\text{groups}}}{MS_{\text{error}}}$$

Under H0, $MS_{\text{groups}} \approx MS_{\text{error}}$ (except for chance), so under H0, $F \approx 1$ (except for chance).

F test

H0 is rejected if F is significantly larger than 1. Compare F with the critical value of the F distribution, $F_{0.05(1),\,df_{\text{groups}},\,df_{\text{error}}}$, and reject H0 if

$$F \ge F_{0.05(1),\,df_{\text{groups}},\,df_{\text{error}}}.$$

[Figure: the F distribution]

ANOVA table for fruit fly experiment

Source   Sum of squares   df    Mean squares   F          P-value
Groups   11939.28         4     2984.8200      13.61195   <0.001
Error    26313.52         120   219.2793
Total    38252.80         124

$F_{0.05(1),4,120} = 2.447$

Since F = 13.61 ≥ 2.447, reject H0. Conclusion: Mean longevity is not equal in all groups.
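The whole ANOVA table can be reproduced from the raw data. This is a minimal stdlib sketch; `one_way_anova` is a hypothetical helper, not a library function. The critical value 2.447 is copied from the slide rather than computed, because computing F quantiles would need a statistics library such as SciPy.

```python
# Male longevity (days), n = 25 per group, transcribed from the data listing above
LONGEVITY = {
    "1preg":   [46, 42, 65, 46, 58, 42, 48, 58, 50, 80, 63, 65, 70, 70, 72,
                97, 46, 56, 70, 70, 72, 76, 90, 76, 92],
    "1virgin": [21, 40, 44, 54, 36, 40, 56, 60, 48, 53, 60, 60, 65, 68, 60,
                81, 81, 48, 48, 56, 68, 75, 81, 48, 68],
    "8preg":   [35, 37, 49, 46, 63, 39, 46, 56, 63, 65, 56, 65, 70, 63, 65,
                70, 77, 81, 86, 70, 70, 77, 77, 81, 77],
    "8virgin": [16, 19, 19, 32, 33, 33, 30, 42, 42, 33, 26, 30, 40, 54, 34,
                34, 47, 47, 42, 47, 54, 54, 56, 60, 44],
    "none":    [40, 37, 44, 47, 47, 47, 68, 47, 54, 61, 71, 75, 89, 58, 59,
                62, 79, 96, 58, 62, 70, 72, 75, 96, 75],
}
F_CRIT = 2.447  # F_0.05(1),4,120 as given on the slide (not computed here)


def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    values = [y for g in groups for y in g]
    big_n, k = len(values), len(groups)
    grand_mean = sum(values) / big_n
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_groups, df_error = k - 1, big_n - k
    ms_groups, ms_error = ss_groups / df_groups, ss_error / df_error
    return {
        "SS_groups": ss_groups, "df_groups": df_groups, "MS_groups": ms_groups,
        "SS_error": ss_error, "df_error": df_error, "MS_error": ms_error,
        "SS_total": ss_groups + ss_error, "F": ms_groups / ms_error,
    }


if __name__ == "__main__":
    table = one_way_anova(list(LONGEVITY.values()))
    for key, value in table.items():
        print(f"{key:10s} {value:.4f}")
    print("reject H0" if table["F"] >= F_CRIT else "do not reject H0")
```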

Cost of mating in male fruit flies

Implication: sex is deadly (in males)

R² ("R-squared")

R² measures the fraction of the variation in Y that is "explained" by group differences.

It is based on the fact that the total sum of squares splits into two component parts, the error sum of squares and the groups sum of squares:

$$SS_{\text{total}} = SS_{\text{groups}} + SS_{\text{error}}$$

$$R^2 = \frac{SS_{\text{groups}}}{SS_{\text{total}}}.$$

Fruit flies:

$$R^2 = \frac{11939.28}{38252.80} = 0.312$$

Conclusion: 31% of the variation in longevity is "explained" by treatment.
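The R² arithmetic, including the partition of the total sum of squares, in a few lines. The helper name `r_squared` is hypothetical; the inputs are the sums of squares from the fruit-fly ANOVA table.

```python
def r_squared(ss_groups, ss_error):
    """Return (SS_total, R^2), using SS_total = SS_groups + SS_error."""
    ss_total = ss_groups + ss_error
    return ss_total, ss_groups / ss_total


if __name__ == "__main__":
    ss_total, r2 = r_squared(11939.28, 26313.52)  # values from the fruit-fly ANOVA table
    print(f"SS_total = {ss_total:.2f}, R^2 = {r2:.3f}")
    # prints: SS_total = 38252.80, R^2 = 0.312
```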

ANOVA vs two-sample t-test

An ANOVA with k = 2 is mathematically equivalent to a two-tailed, two-sample t-test of

H0: μ1 − μ2 = 0
HA: μ1 − μ2 ≠ 0
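One concrete way to see the equivalence: with two groups, the ANOVA F statistic equals the square of the pooled-variance two-sample t statistic. The sketch below checks this on the 8preg and 8virgin groups of the fly data; the helpers `pooled_t` and `anova_f` are hypothetical names written for this illustration.

```python
import math

# Longevity (days) of the 8preg and 8virgin groups, from the data listing above
EIGHT_PREG = [35, 37, 49, 46, 63, 39, 46, 56, 63, 65, 56, 65, 70, 63, 65,
              70, 77, 81, 86, 70, 70, 77, 77, 81, 77]
EIGHT_VIRGIN = [16, 19, 19, 32, 33, 33, 30, 42, 42, 33, 26, 30, 40, 54, 34,
                34, 47, 47, 42, 47, 54, 54, 56, 60, 44]


def pooled_t(a, b):
    """Two-sample t statistic using the pooled sample variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((y - ma) ** 2 for y in a) + sum((y - mb) ** 2 for y in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def anova_f(groups):
    """One-way ANOVA F = MS_groups / MS_error."""
    values = [y for g in groups for y in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_groups / (len(groups) - 1)) / (ss_error / (len(values) - len(groups)))


if __name__ == "__main__":
    t = pooled_t(EIGHT_PREG, EIGHT_VIRGIN)
    f = anova_f([EIGHT_PREG, EIGHT_VIRGIN])
    print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```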

Assumptions of ANOVA

• The measurements in every group are a random sample from the corresponding population.

• The variable has a normal distribution in all k populations.

• The variance is the same in all k populations.

The robustness of ANOVA

• The ANOVA is fairly robust to deviations from the assumption of normality, particularly when sample size is large.

• This robustness stems from a property of sample means described by the Central Limit Theorem: the sampling distribution of means is approximately normal when sample size is large, even when the variable itself does not have a normal distribution.

• ANOVA is also fairly robust to departures from the assumption of equal variance in the k populations if the sample sizes are all equal or nearly so.

Strategies if the assumptions of ANOVA are not met and we can't rely on its robustness

• Transformations (e.g., log-transformation, square root transformation, and arcsine transformation).

• Nonparametric alternatives

Nonparametric alternatives to ANOVA: Kruskal-Wallis test

• Analogous to the Mann-Whitney U-test, but for more than two groups

• Based on ranks

• Test statistic H has an approximately χ² distribution (with k − 1 degrees of freedom)

H0: The population median is the same in all k groups.
HA: At least one of the groups has a different median.
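The H statistic can be computed from ranks with a short function. This is a minimal sketch of the usual formula, H = 12/(N(N+1)) Σ Ri²/ni − 3(N+1), with tied values given average ranks and the standard tie correction; the helper names and the toy data are hypothetical. Comparing H with the χ² critical value for k − 1 df would need a table or a statistics library, which this sketch avoids.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with the usual correction for ties."""
    pooled = [y for g in groups for y in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - tie_term / (n_total ** 3 - n_total))


if __name__ == "__main__":
    # Hypothetical toy data with no ties; H is 7.2 up to floating-point rounding (df = 2)
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```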

ANOVA: which means are different?

H0: μ1 = μ2 = μ3 = … = μk
HA: At least one of the μi is different (but which?)

Two approaches to addressing this question:

1. Planned comparisons

2. Unplanned comparisons

Planned comparisons

One or a very small number of focal comparisons that were planned at the time the study was designed.

Unplanned comparisons

A comprehensive search for differences between groups.

What's so important about the difference?

1. Planned comparisons:

• No need to correct for multiple comparisons

• Very few planned comparisons permitted

2. Unplanned comparisons:

• Correction for multiple comparisons required, because the Type 1 error rate rises with the number of tests
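A rough sense of how fast the error rate rises: if m tests were each done at α = 0.05 and were independent, the chance of at least one Type 1 error would be 1 − (1 − α)^m. Pairwise comparisons among the same means are not independent, so this is only an approximation used for illustration; the helper names are hypothetical.

```python
def familywise_error(alpha, m):
    """P(at least one Type 1 error) across m tests, ASSUMING the tests are independent."""
    return 1 - (1 - alpha) ** m


def n_pairs(k):
    """Number of distinct pairs among k group means."""
    return k * (k - 1) // 2


if __name__ == "__main__":
    m = n_pairs(5)  # the five fruit-fly treatments
    print(m, round(familywise_error(0.05, m), 3))  # prints: 10 0.401
```

With five groups there are 10 pairwise comparisons, and the uncorrected chance of at least one false rejection is roughly 40%, not 5%.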

Planned confidence interval for the difference between two means


The formula for the confidence interval is similar to that for the two-sample case:

$$(\bar{Y}_i - \bar{Y}_j) - t_{0.05(2),N-k}\,SE < \mu_i - \mu_j < (\bar{Y}_i - \bar{Y}_j) + t_{0.05(2),N-k}\,SE$$

except that we calculate SE using MS_error instead of the pooled sample variance for just the two samples,

$$SE = \sqrt{MS_{\text{error}}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)},$$

and we use $df_{\text{error}} = N - k$ for the degrees of freedom.


Example, 8virgin (i) vs 8preg (j):

$$\bar{Y}_i - \bar{Y}_j = 38.72 - 63.36 = -24.64$$

SE = 4.188, df = 120, $t_{0.05(2),120} = 1.98$

95% confidence interval:

$$-32.93 < \mu_i - \mu_j < -16.35$$
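The planned interval above as a small function. This is a minimal sketch: `planned_ci` is a hypothetical helper, and MS_error = 219.2793 and t = 1.98 are copied from the slides rather than computed.

```python
import math


def planned_ci(mean_i, mean_j, n_i, n_j, ms_error, t_crit):
    """Return (difference, SE, lower, upper) for mu_i - mu_j, with SE based on MS_error."""
    diff = mean_i - mean_j
    se = math.sqrt(ms_error * (1 / n_i + 1 / n_j))
    return diff, se, diff - t_crit * se, diff + t_crit * se


if __name__ == "__main__":
    # i = 8virgin, j = 8preg; MS_error and t_0.05(2),120 taken from the slides
    diff, se, lo, hi = planned_ci(38.72, 63.36, 25, 25, 219.2793, 1.98)
    print(f"diff = {diff:.2f}, SE = {se:.3f}, 95% CI: {lo:.2f} < mu_i - mu_j < {hi:.2f}")
    # prints: diff = -24.64, SE = 4.188, 95% CI: -32.93 < mu_i - mu_j < -16.35
```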

Example of unplanned comparisons: the Tukey-Kramer method for testing all pairs of means

Comparison           Ȳi − Ȳj   SE      q       Critical q(0.05)   Conclusion
1preg vs 8virgin     26.08     4.188   6.227   2.770              reject
1preg vs 1virgin      8.04     4.188   1.920   2.770              do not reject
1preg vs 8preg        1.44     4.188   0.344   2.770              do not reject
1preg vs none         1.24     4.188   0.296   2.770              do not reject
none vs 8virgin      24.84     4.188   5.931   2.770              reject
none vs 1virgin       6.80     4.188   1.624   2.770              do not reject
none vs 8preg         0.20     4.188   0.048   2.770              do not reject
8preg vs 8virgin     24.64     4.188   5.883   2.770              reject
8preg vs 1virgin      6.60     4.188   1.576   2.770              do not reject
1virgin vs 8virgin   18.04     4.188   4.307   2.770              reject

Group means (days), in increasing order:

8virgin   1virgin   8preg   none    1preg
38.72     56.76     63.36   63.56   64.80

The Tukey-Kramer method for testing all pairs of means

With the Tukey-Kramer method, the probability of making at least one Type 1 error throughout the course of testing all pairs of means is no greater than the significance level α.
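The Tukey-Kramer table can be regenerated from the group means, MS_error, and the sample sizes. In this sketch the critical value 2.770 is copied from the table, not computed; it appears to correspond to the studentized-range quantile q(0.05; 5, 120) ≈ 3.917 divided by √2. The helper `tukey_kramer` is a hypothetical name.

```python
import itertools
import math

# Group means (days), n = 25 per group; MS_error and the critical q are taken from the slides
MEANS = {"1preg": 64.80, "1virgin": 56.76, "8preg": 63.36, "8virgin": 38.72, "none": 63.56}
N = {name: 25 for name in MEANS}
MS_ERROR = 219.2793
Q_CRIT = 2.770  # copied from the table, not computed here


def tukey_kramer(means, n, ms_error, q_crit):
    """Return (group a, group b, |difference|, SE, q, reject?) for every pair of groups."""
    rows = []
    for a, b in itertools.combinations(means, 2):
        diff = abs(means[a] - means[b])
        se = math.sqrt(ms_error * (1 / n[a] + 1 / n[b]))
        q = diff / se
        rows.append((a, b, diff, se, q, q >= q_crit))
    return rows


if __name__ == "__main__":
    for a, b, diff, se, q, reject in tukey_kramer(MEANS, N, MS_ERROR, Q_CRIT):
        verdict = "reject" if reject else "do not reject"
        print(f"{a:8s} vs {b:8s} {diff:6.2f} {se:.3f} {q:.3f} {verdict}")
```

Only the four comparisons involving 8virgin exceed the critical value, matching the table.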

Fixed vs random effects ANOVA

1. Fixed effects: With fixed effects, the treatments are chosen by the experimenter. They are repeatable and of direct interest.

2. Random effects: With random effects, the treatments are a random sample from a "population" of treatments.

For single-factor ANOVA, the F-test of the null hypothesis against the alternative is carried out in the same way for fixed and random effects.

Example of random effects ANOVA: Flycatcher patch height in 2 different years

Patch height measurements

Bird   Height 1 (mm)   Height 2 (mm)
1      10.5            9.3
2      10.6            9.2
3      8.7             9.3
4      8.6             9.1
5      9.0             9.0
6      9.3             8.7
7      8.9             8.7
8      9.0             7.6
9      7.9             8.0
10     7.6             7.8
11     7.6             8.1
12     7.7             8.1
13     6.8             7.9
14     7.2             7.5
15     7.8             7.1
16     8.1             7.0
17     8.1             6.8
18     7.4             6.7
19     6.7             6.8
20     6.3             6.7
21     6.8             6.4
22     6.7             6.2
23     6.1             5.8
24     6.5             5.6
25     6.7             5.3
26     6.5             5.0
27     5.7             5.1
28     6.7             7.8
29     6.6             7.6
30     6.6             7.7

Random effects ANOVA

• Each bird is a "group", randomly sampled from a population of groups (birds)

• n = 2 measurements per group

• We don't care so much about the individual birds; they are just a sample

• What we really want to know is whether the trait varies in the population (and by how much)

H0: Patch height does not vary in the population
HA: Patch height varies in the population


Source   Sum of squares   df   Mean squares   F       P-value
Groups   82.494           29   2.844621       7.946   <0.001
Error    10.740           30   0.358000
Total    93.234           59

$F_{0.05(1),29,30} = 1.847$

Reject H0. Conclusion: patch height varies in the population.
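For a single-factor design the random-effects ANOVA table is computed exactly like the fixed-effects one, with each bird as a group. This sketch reproduces the patch-height table from the measurements; `one_way_anova` is a hypothetical helper (the same calculation as for the fly data), and the critical value 1.847 is copied from the slide.

```python
# (height 1, height 2) in mm for birds 1-30, from the patch-height table
PATCH = [
    (10.5, 9.3), (10.6, 9.2), (8.7, 9.3), (8.6, 9.1), (9.0, 9.0),
    (9.3, 8.7), (8.9, 8.7), (9.0, 7.6), (7.9, 8.0), (7.6, 7.8),
    (7.6, 8.1), (7.7, 8.1), (6.8, 7.9), (7.2, 7.5), (7.8, 7.1),
    (8.1, 7.0), (8.1, 6.8), (7.4, 6.7), (6.7, 6.8), (6.3, 6.7),
    (6.8, 6.4), (6.7, 6.2), (6.1, 5.8), (6.5, 5.6), (6.7, 5.3),
    (6.5, 5.0), (5.7, 5.1), (6.7, 7.8), (6.6, 7.6), (6.6, 7.7),
]
F_CRIT = 1.847  # F_0.05(1),29,30 as given on the slide (not computed here)


def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    values = [y for g in groups for y in g]
    big_n, k = len(values), len(groups)
    grand_mean = sum(values) / big_n
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_groups, df_error = k - 1, big_n - k
    ms_groups, ms_error = ss_groups / df_groups, ss_error / df_error
    return {
        "SS_groups": ss_groups, "df_groups": df_groups, "MS_groups": ms_groups,
        "SS_error": ss_error, "df_error": df_error, "MS_error": ms_error,
        "F": ms_groups / ms_error,
    }


if __name__ == "__main__":
    table = one_way_anova([list(pair) for pair in PATCH])  # each bird is one group
    for key, value in table.items():
        print(f"{key:10s} {value:.4f}")
    print("reject H0" if table["F"] >= F_CRIT else "do not reject H0")
```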

Repeatability

Repeatability is the fraction of the variance in the trait that is among groups:

$$\text{Repeatability} = \frac{s_A^2}{s_A^2 + MS_{\text{error}}},$$

where the variance among groups is

$$s_A^2 = \frac{MS_{\text{groups}} - MS_{\text{error}}}{n}.$$

Repeatability of patch height

$$s_A^2 = \frac{2.844621 - 0.358000}{2} = 1.24331,$$

yielding

$$\text{Repeatability} = \frac{1.24331}{1.24331 + 0.358000} = 0.776.$$

Conclusion: 78% of the variance in patch height in the population is among birds, whereas the remaining 22% is variance from measurement to measurement on the same birds.
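The repeatability calculation as a function of the two mean squares. This is a minimal sketch with a hypothetical helper name; it does not handle the case MS_groups < MS_error, where the estimate of s_A² would be negative.

```python
def repeatability(ms_groups, ms_error, n):
    """Return (s_A^2, repeatability) for a random-effects one-way ANOVA with n measurements per group."""
    s2_a = (ms_groups - ms_error) / n  # variance among groups
    return s2_a, s2_a / (s2_a + ms_error)


if __name__ == "__main__":
    s2_a, r = repeatability(2.844621, 0.358000, 2)  # mean squares from the patch-height table
    print(f"s_A^2 = {s2_a:.5f}, repeatability = {r:.3f}")
    # prints: s_A^2 = 1.24331, repeatability = 0.776
```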