© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 12: Analysis of Variance: Differences among Means of Three or More Groups

Analysis of Variance (ANOVA)

• ANOVA is used to compare three or more group means

• Instead of comparing each group mean to the others (as with a t-test), ANOVA compares each group mean to the grand mean, which is the mean for all cases in the sample

Main Effects

• In ANOVA, the difference between each group mean and the grand mean is a test effect; these test effects are called main effects

• When the main effects are zero, this indicates that there are no differences among the means
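
As a minimal sketch of how main effects are obtained, the Python below works through a small made-up data set. The three residence groups, their SES scores, and the names `groups`, `grand_mean`, and `main_effects` are all assumptions for illustration; the scores were chosen so that the grand mean is 45 and the urban mean is 50.

```python
from statistics import mean

# Hypothetical SES scores, invented for illustration only.
groups = {
    "urban":    [60, 50, 45, 45],
    "suburban": [45, 40, 50, 45],
    "rural":    [40, 35, 45, 40],
}

# Grand mean: the mean of all cases in the sample, ignoring group membership.
all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# Main effect of a group: its group mean minus the grand mean.
group_means = {name: mean(scores) for name, scores in groups.items()}
main_effects = {name: m - grand_mean for name, m in group_means.items()}

for name in groups:
    print(f"{name}: mean = {group_means[name]}, main effect = {main_effects[name]}")
```

In this invented sample the suburban main effect is zero, so the suburban group mean coincides with the grand mean; only if every main effect were zero would there be no differences among the means.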

The ANOVA Hypothesis Test

• For the ANOVA test, the H0 states that the population means of the groups are equal

• The H0 can also be stated as “the main effects are equal to zero,” or “there is no difference among the means”
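
The two statements of H0 can be written symbolically. This is a sketch in notation assumed here rather than taken from the slides: the symbols \mu_1, ..., \mu_K stand for the K population group means and \mu for the population grand mean.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_K
\quad\Longleftrightarrow\quad
H_0:\ \mu_j - \mu = 0 \ \text{ for every group } j = 1, \dots, K
```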

The Idea Behind ANOVA

• ANOVA hypothesizes about differences among means, but its calculation is based on explaining variance around the grand mean

• E.g., suppose that the overall or “grand” mean of socioeconomic status (SES) of all household heads is 45. Urban residents, however, average 50. We call this 5-point difference the main effect of the category urban

The Idea Behind ANOVA (cont.)

• Shaneka, an urban dweller, scores 60. This is 15 SES points more than the grand mean of 45. This 15 SES points is her deviation score, the difference between her raw score and the overall mean

• ANOVA determines whether it is feasible to say that 5 SES points of her 15-point deviation score are due to the fact that she is an urban resident

The Idea Behind ANOVA (cont.)

• The focus with ANOVA is on explaining deviation scores

• Deviation scores, when squared, summed, and averaged for a group of scores, make up the variance. Hence the name “analysis of variance”

The Idea Behind ANOVA (cont.)

• With ANOVA we are asserting that the spread of scores is due to the main effects of the groups, as illustrated in Figure 12-2 in the text

• Can scores be explained by differences between group classifications? If so, then scores will cluster around group means rather than the grand mean, and this suggests a difference among means

The General Linear Model

• The general linear model is a useful framework for understanding ANOVA

• The general linear model states that the best prediction of an individual’s score on a dependent variable is the overall mean plus an adjustment for the effects of group membership on an independent variable
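
Written as an equation, the model might look like the sketch below. The notation is assumed here, not taken from the slides: Y_{ij} is the score of case i in group j, \bar{Y} is the grand mean, and \bar{Y}_j is the mean of group j.

```latex
\underbrace{Y_{ij}}_{\text{raw score}}
  = \underbrace{\bar{Y}}_{\text{grand mean}}
  + \underbrace{(\bar{Y}_j - \bar{Y})}_{\text{main effect of group } j}
  + \underbrace{(Y_{ij} - \bar{Y}_j)}_{\text{unexplained error}},
\qquad
\hat{Y}_{ij} = \bar{Y} + (\bar{Y}_j - \bar{Y})
```

The second expression is the best prediction for any member of group j: the grand mean adjusted by the group's main effect, which is simply the group mean.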

Applying the General Linear Model

• For Shaneka, the urban resident with an SES of 60, we decompose her score into 45 points for the grand mean and 5 points explained by urban residence (the main effect of urban). The remaining 10 points are unexplained error
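
The arithmetic of this decomposition can be sketched in a few lines of Python. The numbers come from the example; the variable names are chosen here for illustration.

```python
# Values from the example: grand mean 45, urban mean 50, Shaneka's score 60.
grand_mean = 45
urban_mean = 50
score = 60

main_effect = urban_mean - grand_mean   # part explained by group membership
predicted = grand_mean + main_effect    # best prediction for an urban resident
error = score - predicted               # unexplained part of her score
deviation = score - grand_mean          # her deviation score

# The deviation score splits into an explained and an unexplained part.
assert deviation == main_effect + error
print(f"{score} = {grand_mean} + {main_effect} + {error}")  # prints "60 = 45 + 5 + 10"
```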

Calculating ANOVA Statistics

• ANOVA calculations are summarized in a source table

• To obtain variances, we calculate three parts of the variation (or sums of squares) of the interval/ratio dependent variable and divide them by degrees of freedom

Sums of Squares

• The three types of sums of squares for ANOVA are:

1. the total sum of squares (SST)

2. the between-group or “explained” sum of squares (SSB), and

3. the within-group or unexplained sum of squares (SSW)
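
In symbols, the three sums of squares and the identity that links them can be sketched as follows. The notation is assumed here, not taken from the slides: K groups, n_j cases in group j, Y_{ij} the score of case i in group j, \bar{Y}_j the group mean, and \bar{Y} the grand mean.

```latex
\begin{gather*}
SS_T = \sum_{j=1}^{K}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2, \qquad
SS_B = \sum_{j=1}^{K} n_j\,(\bar{Y}_j - \bar{Y})^2, \qquad
SS_W = \sum_{j=1}^{K}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 \\
SS_T = SS_B + SS_W
\end{gather*}
```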

Calculating the SST

• The total sum of squares (SST) is calculated by summing the squared deviation scores for all cases

• The SST is the same sum of squares calculated for the standard deviation (Chapter 5)

Calculating the SSB

• The between-group or explained sum of squares (SSB) is calculated by squaring the main effect that applies to each case (the main effect of the group the case belongs to) and summing these squares over all cases

• The SSB is explained in the sense that it is accounted for by differences among the group means, as measured by main effects
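
A short Python sketch may make the "per case" wording concrete. The data set is hypothetical (invented SES scores for three residence groups), and the variable names are illustrative. It computes SSB once by giving every case its group's main effect, and once with the equivalent shortcut of weighting each squared main effect by the group size.

```python
from statistics import mean

# Hypothetical SES scores, invented for illustration only.
groups = {
    "urban":    [60, 50, 45, 45],
    "suburban": [45, 40, 50, 45],
    "rural":    [40, 35, 45, 40],
}
all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# Per-case form: each case contributes the squared main effect of its group.
ssb_per_case = sum(
    (mean(scores) - grand_mean) ** 2
    for scores in groups.values()
    for _ in scores
)

# Grouped form: squared main effect times the number of cases in the group.
ssb_grouped = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2
    for scores in groups.values()
)

print(ssb_per_case, ssb_grouped)
```

Both forms give the same total because every case in a group shares the same main effect.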

Calculating the SSW

• The within-group or unexplained sum of squares (SSW) is that part of the squared deviation scores that is not accounted for by main effects. It is unexplained error in the prediction of scores

• The SSW is most easily calculated by subtracting the between-group sum of squares from the total sum of squares
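
As a check on the subtraction shortcut, the sketch below computes SSW both ways on a hypothetical data set (invented SES scores; all names are illustrative): once as SST minus SSB, and once directly from deviations of each score around its own group mean.

```python
from statistics import mean

# Hypothetical SES scores, invented for illustration only.
groups = {
    "urban":    [60, 50, 45, 45],
    "suburban": [45, 40, 50, 45],
    "rural":    [40, 35, 45, 40],
}
all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# SST: squared deviations of every score from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# SSB: squared main effect of each group, weighted by group size.
ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

# SSW by subtraction, as described above.
ssw_by_subtraction = sst - ssb

# SSW directly: squared deviations of each score from its own group mean.
ssw_direct = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

print(sst, ssb, ssw_by_subtraction, ssw_direct)
```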

Calculating the Mean Square Variance (MSV)

• After sums of squares are computed, to account for sample size and the number of groups, these sums are divided by their degrees of freedom. The resulting variances are called mean square variances (MSV)

• MSVB = the mean square variance between groups

• MSVW = the mean square variance within groups
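
The slides do not list the degrees of freedom. In a standard one-way ANOVA with K groups and N cases in total (notation assumed here), the two mean square variances are:

```latex
MSV_B = \frac{SS_B}{df_B} = \frac{SS_B}{K - 1},
\qquad
MSV_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - K}
```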

Calculating the F-Ratio Test Statistic

• The test statistic for ANOVA is the F-ratio statistic

• This is the ratio of the mean square variance between groups to the mean square variance within groups: F = MSVB / MSVW

• The p-value is determined using F-distribution curves, Appendix B, Tables D and E
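
A minimal sketch of the F-ratio calculation follows. The function name `f_ratio` and the input values are assumptions for illustration: the sums of squares correspond to a hypothetical sample of 12 cases in 3 groups, and the degrees of freedom are the standard one-way ANOVA values K − 1 and N − K.

```python
def f_ratio(ssb, ssw, n_cases, n_groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    df_between = n_groups - 1
    df_within = n_cases - n_groups
    msv_between = ssb / df_between   # mean square variance between groups
    msv_within = ssw / df_within     # mean square variance within groups
    return msv_between / msv_within, df_between, df_within

# Hypothetical sums of squares for 12 cases in 3 groups.
f_stat, df_b, df_w = f_ratio(ssb=200, ssw=250, n_cases=12, n_groups=3)
print(f"F = {f_stat:.2f} with {df_b} and {df_w} degrees of freedom")
```

The sketch stops at the test statistic. The p-value still has to be read from an F-distribution table for the two degrees of freedom; if SciPy happens to be installed, `scipy.stats.f.sf(f_stat, df_b, df_w)` gives the same upper-tail probability.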

When to Use the F-ratio Test

• In general, we use ANOVA and the F-ratio to test a hypothesis about the relationship between a nominal/ordinal independent variable with three or more categories and an interval/ratio dependent variable

• ANOVA is a difference of means test and a cousin of the t-test

When to Use the F-ratio Test (cont.)

• 1) Number of variables, samples, and populations:

a) One population with a single interval/ratio dependent variable, comparing means for three or more groups of a single nominal/ordinal independent variable. Each group’s sample must be representative of its subpopulation, or

b) a single interval/ratio dependent variable whose mean is compared for three or more populations using representative samples

When to Use the F-ratio Test (cont.)

• 2) Sample size: generally no requirements. However, the dependent interval/ratio variable should not be highly skewed within any group sample. Moreover, range tests are unreliable unless sample sizes of groups are about equal. These restrictions are less important when group sample sizes are large

When to Use the F-ratio Test (cont.)

• 3) Variances (and standard deviations) of the groups are equal. This is the same restriction that applies to the t-test (see equality of variances, Chapter 11)

Existence and Direction of the Relationship for ANOVA

• Existence: Determined by using the F-ratio to test the null hypothesis of equal group means

• Direction: Not applicable (because the independent variable is nominal)

Strength of the Relationship for ANOVA

• Strength: A strong relationship is one in which a high proportion of the total variance in the dependent interval/ratio variable is accounted for by the group variable

• The correlation ratio, ε² (epsilon squared), is a conservative measure that is unlikely to overinflate the strength of the relationship
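
The slides do not give the formula for ε². The sketch below uses one common definition, ε² = 1 − MSVW / MSVT, where MSVT is the total sum of squares divided by N − 1. Treat this as an assumption; the textbook's computational formula may be laid out differently. The function name and the input values (hypothetical sums of squares for 12 cases in 3 groups) are illustrative.

```python
def epsilon_squared(sst, ssw, n_cases, n_groups):
    """Epsilon squared as 1 - MSV_within / MSV_total (one common definition)."""
    msv_within = ssw / (n_cases - n_groups)   # within-group variance estimate
    msv_total = sst / (n_cases - 1)           # total variance estimate
    return 1 - msv_within / msv_total

# Hypothetical sums of squares for 12 cases in 3 groups.
eps2 = epsilon_squared(sst=450, ssw=250, n_cases=12, n_groups=3)
uncorrected = (450 - 250) / 450   # SSB / SST, with no degrees-of-freedom correction

print(f"epsilon squared = {eps2:.3f}, uncorrected ratio SSB/SST = {uncorrected:.3f}")
```

Because it works with variances adjusted for degrees of freedom, ε² comes out smaller than the raw ratio SSB / SST, which is the sense in which it is conservative.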

Nature of the Relationship for ANOVA

• To assess the nature of the relationship for ANOVA:

1) Make best estimates at the group level by reporting the grand mean, group means, and main effects

2) Provide examples of best estimates for individuals using the general linear model

3) Use range tests to specify which group means are significantly different from others

Range Tests

• With ANOVA, rejection of the null hypothesis merely indicates that at least two group means are significantly different

• Range tests determine which means differ, by establishing the range of differences between means that is statistically significant

• Tukey’s Honestly Significant Difference (HSD) is a conservative range test, unlikely to mistakenly tell us that a difference exists when in fact it does not
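
As a rough sketch of how a range test of this kind works, the Python below computes an HSD threshold for groups of equal size using the usual form HSD = q × sqrt(MSVW / n), then lists the pairs of group means that differ by more than the threshold. The function names are hypothetical, and the critical value `q_crit` is deliberately left as an input: it must be looked up in a studentized range table for the chosen significance level, the number of groups, and the within-group degrees of freedom.

```python
from itertools import combinations
from math import sqrt

def hsd_threshold(q_crit, msv_within, n_per_group):
    """Smallest difference between two group means that counts as significant."""
    return q_crit * sqrt(msv_within / n_per_group)

def significant_pairs(group_means, threshold):
    """Return the pairs of groups whose means differ by more than the threshold."""
    return [
        (a, b)
        for a, b in combinations(group_means, 2)
        if abs(group_means[a] - group_means[b]) > threshold
    ]

# Placeholder inputs for illustration only; q_crit here is NOT a table value.
threshold = hsd_threshold(q_crit=3.0, msv_within=16.0, n_per_group=4)
pairs = significant_pairs({"urban": 50, "suburban": 45, "rural": 40}, threshold)
print(threshold, pairs)
```

The equal-group-size assumption matters: with unequal group sizes this simple form no longer applies.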

Statistical Follies

• Care must be taken not to apply a group finding to individuals

• The “ecological fallacy,” drawing conclusions about individuals on the basis of analysis of group units, such as communities, is an extreme case of misapplying statistical findings

