Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. Analysis of Variance Chapter 16.


ANOVA

Analysis of variance: tool for analyzing how the mean value of a quantitative response variable is affected by one or more categorical explanatory factors.

If one categorical variable: one-way ANOVA

If two categorical variables: two-way ANOVA


16.1 Comparing Means with an ANOVA F-Test

F-statistic:

$$F = \frac{\text{Variation among sample means}}{\text{Natural variation within groups}}$$

H0: μ1 = μ2 = … = μk

Ha: The means are not all equal.

Variation among sample means is 0 if all k sample means are equal and gets larger the more spread out they are.

If F is large enough => evidence at least one population mean differs from others => reject null hypothesis.

p-value found using an F-distribution (more later)
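The ratio above can be made concrete with a minimal Python sketch. It assumes each group is a plain list of numbers, and the helper name `f_statistic` and the small datasets are hypothetical. It uses the mean squares defined in Section 16.2. With the within-group spread held fixed, F is 0 when the sample means coincide and grows as the means spread apart.

```python
def f_statistic(groups):
    """One-way ANOVA F = (variation among sample means) / (variation within groups)."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / N  # overall mean of all N values
    # Numerator: how spread out the k sample means are around the overall mean
    ms_groups = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (k - 1)
    # Denominator: natural variation of observations around their own group mean
    mse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (N - k)
    return ms_groups / mse

# Hypothetical data: identical within-group spread, sample means moved apart
same_means = [[4, 5, 6], [4, 5, 6], [4, 5, 6]]
spread_means = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
print(f_statistic(same_means))    # 0.0
print(f_statistic(spread_means))  # 12.0
```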


Example 16.1 Seat Location and GPA

Q: Do best students sit in the front of a classroom?

Data on seat location and GPA for n = 384 students; 88 sit in front, 218 in middle, 78 in back

Students sitting in the front generally have slightly higher GPAs than others.


Example 16.1 Seat Location and GPA (cont)

The F-statistic is 6.69 and the p-value is 0.001.

p-value so small => reject H0 and conclude there are differences among the means.

H0: μ1 = μ2 = μ3

Ha: The means are not all equal.



95% Confidence Intervals for 3 population means:

Interval for “front” does not overlap with the other two intervals => significant difference between mean GPA for front-row sitters and mean GPA for other students


Notation for Summary Statistics

k = number of groups

x̄i, si, and ni are the mean, standard deviation, and sample size for the ith sample group

N = total sample size (N = n1 + n2 + … + nk)


Three seat locations => k = 3

n1 = 88, n2 = 218, n3 = 78; N = 88 + 218 + 78 = 384

x̄1 = 3.2029, x̄2 = 2.9853, x̄3 = 2.9194

s1 = 0.5491, s2 = 0.5577, s3 = 0.5105
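The group summaries alone are enough to recover the F-statistic. This uses the Section 16.2 formulas SS Groups = Σ ni(x̄i − x̄)² and SS Error = Σ (ni − 1)si². Below is a minimal sketch with the Example 16.1 summaries (n = 88, 218, 78; means 3.2029, 2.9853, 2.9194; SDs 0.5491, 0.5577, 0.5105). The function name `f_from_summary` is hypothetical. Because the inputs are rounded to four decimals, the result matches the reported F = 6.69 only to rounding.

```python
def f_from_summary(n, xbar, s):
    """One-way ANOVA F-statistic and (num df, den df) from n_i, xbar_i, s_i."""
    k, N = len(n), sum(n)
    grand = sum(ni * xi for ni, xi in zip(n, xbar)) / N  # overall mean
    ss_groups = sum(ni * (xi - grand) ** 2 for ni, xi in zip(n, xbar))
    ss_error = sum((ni - 1) * si ** 2 for ni, si in zip(n, s))
    f = (ss_groups / (k - 1)) / (ss_error / (N - k))
    return f, (k - 1, N - k)

# Example 16.1 summary statistics: front, middle, back
F, df = f_from_summary([88, 218, 78], [3.2029, 2.9853, 2.9194], [0.5491, 0.5577, 0.5105])
print(round(F, 2), df)  # 6.69 (2, 381)
```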


Assumptions for the F-Test

• Samples are independent random samples.

• Distribution of response variable is a normal curve within each population.

• Different populations may have different means.

• All populations have the same standard deviation, σ.

e.g. How k = 3 populations might look …


Conditions for Using the F-Test

• F-statistic can be used if data are not extremely skewed, there are no extreme outliers, and group standard deviations are not markedly different.

• Tests based on F-statistic are valid for data with skewness or outliers if sample sizes are large.

• A rough criterion for standard deviations is that the largest of the sample standard deviations should not be more than twice as large as the smallest of the sample standard deviations.
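The rough standard-deviation criterion is a one-line check. The sketch below applies it to the Example 16.1 sample standard deviations (0.5491, 0.5577, 0.5105). The helper name `sd_rule_ok` is hypothetical. Passing it says nothing about skewness or outliers, which still have to be judged from plots.

```python
def sd_rule_ok(sds):
    """Rough criterion: largest sample SD is no more than twice the smallest."""
    return max(sds) <= 2 * min(sds)

seat_sds = [0.5491, 0.5577, 0.5105]  # Example 16.1: front, middle, back
ratio = max(seat_sds) / min(seat_sds)
print(round(ratio, 2), sd_rule_ok(seat_sds))  # 1.09 True
```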



• The boxplot showed two outliers in the group of students who typically sit in the middle of a classroom, but there are 218 students in that group so these outliers don’t have much influence on the results.

• The standard deviations for the three groups are nearly the same.

• Data do not appear to be skewed.

Necessary conditions for F-test seem satisfied.


The Family of F-Distributions

• Skewed distributions with minimum value of 0.

• Specific F-distribution indicated by two parameters called degrees of freedom: numerator degrees of freedom and denominator degrees of freedom.

• In one-way ANOVA, numerator df = k – 1, and denominator df = N – k.
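One way to see these properties is to simulate F when H0 is true. In that case the F-distribution with (k – 1, N – k) df is the sampling distribution of F, provided the F-test assumptions hold. The sketch below is a simulation under hypothetical settings: 3 groups of 10 observations, all drawn from one standard normal population, with 2000 repetitions and a fixed seed. The simulated values are never negative and have a long right tail, so their mean exceeds their median.

```python
import random
from statistics import mean, median

def f_statistic(groups):
    """One-way ANOVA F = MS Groups / MSE."""
    k, N = len(groups), sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / N
    ms_groups = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (k - 1)
    mse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (N - k)
    return ms_groups / mse

def simulate_null_f(k, n, reps=2000, seed=1):
    """F-statistics computed when H0 is true: all k groups drawn from one normal population."""
    rng = random.Random(seed)
    return [f_statistic([[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)])
            for _ in range(reps)]

k, n = 3, 10                  # hypothetical: 3 groups of 10 observations
df = (k - 1, k * n - k)       # (k - 1, N - k)
fs = simulate_null_f(k, n)
print(df)                                   # (2, 27)
print(min(fs) >= 0, mean(fs) > median(fs))  # True True: never negative, long right tail
```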


Determining the p-Value

Statistical software reports the p-value in its output.

Table A.4 provides critical values for 1% and 5% significance levels.

• If the F-statistic is larger than the 5% critical value, the p-value is less than 0.05.

• If the F-statistic is larger than the 1% critical value, the p-value is less than 0.01.

• If the F-statistic is between the 1% and 5% critical values, the p-value is between 0.01 and 0.05.
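The bracketing rules above are a small decision procedure, sketched here in Python. The helper name `p_value_range` is hypothetical. The critical values in the usage lines (3.0 and 5.0) are placeholders for illustration only; real values must be read from Table A.4 for the appropriate df. The last branch covers the case the bullets leave implicit: an F below the 5% critical value gives a p-value above 0.05.

```python
def p_value_range(f_stat, crit_5, crit_1):
    """Bracket a p-value by comparing F with tabled 5% and 1% critical values."""
    if crit_1 <= crit_5:
        raise ValueError("the 1% critical value should exceed the 5% critical value")
    if f_stat >= crit_1:
        return "p-value <= 0.01"
    if f_stat >= crit_5:
        return "0.01 < p-value <= 0.05"
    return "p-value > 0.05"  # F is below the 5% critical value

# Hypothetical critical values, for illustration only (read real ones from Table A.4)
for f in (2.0, 3.5, 6.0):
    print(f, p_value_range(f, crit_5=3.0, crit_1=5.0))
```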


Example 16.2 Testosterone and Occupation

Study: Compare mean testosterone levels for k = 7 occupational groups

Reported F-statistic was F = 2.5 and p-value < 0.05

N = 66 men:

num df = k – 1 = 7 – 1 = 6

den df = N – k = 66 – 7 = 59

Table A.4 with df of (6, 60), the closest entry to (6, 59): the 5% critical value is 2.25 and the F-statistic was larger, so the p-value < 0.05.


Multiple Comparisons

Multiple comparisons: two or more comparisons are made to examine a specific pattern of differences among means.

Most common: all pairwise comparisons.

Ways to make inferences about each pair of means:

• Significance test to assess if two means significantly differ.

• Confidence interval for difference computed and if 0 is not in the interval, there is a statistically significant difference.


Multiple Comparisons

Many statistical tests done => increased risk of making at least one type I error (erroneously rejecting a null hypothesis). Several procedures exist to control the overall family type I error rate or the overall family confidence level.

• Family error rate for a set of significance tests is the probability of making one or more type I errors when more than one significance test is done.

• Family confidence level for a procedure used to create a set of confidence intervals is the proportion of times all intervals in the set capture their true parameter values.
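A short calculation shows how quickly the risk grows with the number of comparisons. With k groups there are k(k – 1)/2 pairwise comparisons. If each is tested at α = 0.05, the Bonferroni inequality caps the family error rate at m·α for any m tests. The formula 1 − (1 − α)^m would be exact only for independent tests. Pairwise comparisons are not independent, so that second number is an illustration, not the actual rate. The helper name `family_error_rate` is hypothetical. k = 3 and k = 7 match the group counts in Examples 16.1 and 16.2.

```python
from math import comb

def family_error_rate(m, alpha=0.05):
    """Family type I error rate for m tests, each at level alpha.

    Returns (bonferroni_bound, rate_if_independent). The Bonferroni bound
    m * alpha (capped at 1) holds for any m tests; 1 - (1 - alpha)**m is exact
    only for independent tests, which pairwise comparisons are not.
    """
    return min(1.0, m * alpha), 1 - (1 - alpha) ** m

for k in (3, 7):               # group counts from Examples 16.1 and 16.2
    m = comb(k, 2)             # number of pairwise comparisons
    bound, indep = family_error_rate(m)
    print(k, m, round(bound, 3), round(indep, 3))
# 3 3 0.15 0.143
# 7 21 1.0 0.659
```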


Example 16.1 Seat Location and GPA (cont)

Pairwise Comparison Output:

Tukey: family confidence level of 0.95

Fisher: 0.95 confidence level for each individual interval

Here, both give the same conclusions: only one interval covers 0, Middle – Back.

Appears population mean GPAs differ for front and middle students and for front and back students.
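The Fisher intervals can be reproduced approximately from the Example 16.1 summary statistics. Each interval is (x̄i − x̄j) ± t*·sp·√(1/ni + 1/nj), using the pooled standard deviation sp = √MSE with df = N – k = 381. This is a sketch under stated assumptions. t* ≈ 1.966 is typed in by hand as an approximate 97.5th percentile of the t-distribution with 381 df. The function name `fisher_intervals` is hypothetical. Tukey intervals need the studentized range distribution and are not shown. The check reproduces the pattern in the output: only Middle – Back covers 0.

```python
from math import sqrt

def fisher_intervals(names, n, xbar, s, t_star):
    """Individual CIs for all pairwise differences of means, using the pooled SD."""
    k, N = len(n), sum(n)
    mse = sum((ni - 1) * si ** 2 for ni, si in zip(n, s)) / (N - k)
    sp = sqrt(mse)  # pooled standard deviation
    out = {}
    for i in range(k):
        for j in range(i + 1, k):
            diff = xbar[i] - xbar[j]
            margin = t_star * sp * sqrt(1 / n[i] + 1 / n[j])
            out[f"{names[i]} - {names[j]}"] = (diff - margin, diff + margin)
    return out

# Example 16.1 summaries; t* ~ 1.966 for df = 381 (approximate value, an assumption)
cis = fisher_intervals(["Front", "Middle", "Back"], [88, 218, 78],
                       [3.2029, 2.9853, 2.9194], [0.5491, 0.5577, 0.5105], t_star=1.966)
for pair, (lo, hi) in cis.items():
    print(pair, round(lo, 3), round(hi, 3), "covers 0" if lo <= 0 <= hi else "")
```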


16.2 Details of One-Way Analysis of Variance

Fundamental concept: the variation among the data values in the overall sample can be separated into:

(1) differences between group means

(2) natural variation among observations within a group

Total variation = Variation between groups + Variation within groups

ANOVA Table displays this information.


Measuring Variation Between Groups

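Sum of squares for groups = SS Groups

$$\text{SS Groups} = \sum_{\text{groups}} n_i(\bar{x}_i - \bar{x})^2$$

where $\bar{x}$ is the overall mean of all N observations.

$$\text{MS Groups} = \frac{\text{SS Groups}}{k-1}$$

Numerator of F-statistic = mean square for groups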


Measuring Variation within Groups

Sum of squared errors = SS Error

$$\text{SS Error} = \sum_{\text{groups}} (n_i - 1)s_i^2$$

$$\text{MSE} = \frac{\text{SS Error}}{N-k}$$

Denominator of F-statistic = mean square error

Pooled standard deviation: $s_p = \sqrt{\text{MSE}}$


Measuring Total Variation

Total sum of squares = SS Total = SSTO

$$\text{SS Total} = \sum_{\text{values}} (x_{ij} - \bar{x})^2$$

SS Total = SS Groups + SS Error
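The decomposition can be checked directly on raw data. The sketch below computes all three sums of squares for the ten weight-loss values of Example 16.3: program 1: 7, 9, 5, 7; program 2: 9, 11, 7; program 3: 15, 12, 18. It confirms that 114 + 34 = 148. The function name `sums_of_squares` is hypothetical. SS Error is computed here from deviations around each group mean, which equals Σ(ni − 1)si².

```python
def sums_of_squares(groups):
    """Return (SS Groups, SS Error, SS Total) for a one-way layout."""
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand) ** 2 for g in groups for x in g)
    return ss_groups, ss_error, ss_total

# Weight-loss data from Example 16.3 (programs 1, 2, 3)
programs = [[7, 9, 5, 7], [9, 11, 7], [15, 12, 18]]
ssg, sse, sst = sums_of_squares(programs)
print(ssg, sse, sst)                  # 114.0 34.0 148.0
print(abs(sst - (ssg + sse)) < 1e-9)  # True
```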


General Format of a One-Way ANOVA Table
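A one-way ANOVA table conventionally has rows for Groups, Error, and Total and columns for df, SS, MS, and F. Groups df = k – 1, Error df = N – k, and Total df = N – 1. Below is a minimal sketch that assembles those rows from SS Groups and SS Error. The function names `anova_table` and `show` are hypothetical. The usage line plugs in SS Groups = 114 and SS Error = 34 from the weight-loss data of Example 16.3, with k = 3 and N = 10.

```python
def anova_table(ss_groups, ss_error, k, N):
    """Rows (source, df, SS, MS, F) of a one-way ANOVA table."""
    df_groups, df_error = k - 1, N - k
    ms_groups, mse = ss_groups / df_groups, ss_error / df_error
    return [
        ("Groups", df_groups, ss_groups, ms_groups, ms_groups / mse),
        ("Error", df_error, ss_error, mse, None),
        ("Total", N - 1, ss_groups + ss_error, None, None),
    ]

def show(rows):
    """Print the rows in fixed-width columns; blank cells where MS or F is undefined."""
    print(f"{'Source':<8}{'DF':>4}{'SS':>9}{'MS':>9}{'F':>8}")
    for source, df, ss, ms, f in rows:
        ms_txt = "" if ms is None else f"{ms:.3f}"
        f_txt = "" if f is None else f"{f:.2f}"
        print(f"{source:<8}{df:>4}{ss:>9.1f}{ms_txt:>9}{f_txt:>8}")

show(anova_table(114, 34, k=3, N=10))  # SS values from Example 16.3
```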


Example 16.3 Comparison of Weight Loss Programs

Program 3 appears to have the highest weight loss overall.

x̄1 = 7, x̄2 = 9, x̄3 = 15


Example 16.3 Comparison of Weight Loss Programs (cont)

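n1 = 4, n2 = 3, n3 = 3 and N = 10

x̄1 = 7, x̄2 = 9, x̄3 = 15 and x̄ = 10

$$\text{SS Groups} = \sum_{\text{groups}} n_i(\bar{x}_i - \bar{x})^2 = 4(7-10)^2 + 3(9-10)^2 + 3(15-10)^2 = 114$$

$$\text{MS Groups} = \frac{\text{SS Groups}}{k-1} = \frac{114}{3-1} = 57$$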



$$\text{SS Total} = \sum_{\text{values}} (x_{ij} - \bar{x})^2 = (7-10)^2 + (9-10)^2 + (5-10)^2 + (7-10)^2 + (9-10)^2 + (11-10)^2 + (7-10)^2 + (15-10)^2 + (12-10)^2 + (18-10)^2 = 148$$



$$\text{SS Error} = \text{SS Total} - \text{SS Groups} = 148 - 114 = 34$$

$$\text{MSE} = \frac{\text{SS Error}}{N-k} = \frac{34}{10-3} = 4.857$$

$$F = \frac{\text{MS Groups}}{\text{MSE}} = \frac{57}{4.857} = 11.74 \text{ with df } 2 \text{ and } 7$$




In the output, “Factor” is used instead of “Groups” because the groups (weight-loss programs) form an explanatory factor for the response.

Note: Pooled StDev is $s_p = \sqrt{\text{MSE}} = \sqrt{4.857} = 2.204$.


Example 16.4 Top Speeds of Supercars

Data: top speeds for six runs on each of five supercars. Kitchens (1998, p. 783)


Example 16.4 Top Speeds (cont)

• F = 25.15 and p-value is 0.000 => reject null hypothesis that population mean speeds are same for all five cars.

• Conditions are satisfied. Data not skewed and no extreme outliers. Largest sample std dev (5.02 Viper) not more than twice as large as smallest std dev (2.92 Acura).

• MS Error = 14.5 is an estimate of the variance of top speed for the hypothetical distribution of all possible runs with one car. The estimated standard deviation for each car is √14.5 ≈ 3.81.

• Based on sample means and CIs: Porsche and Ferrari seem to be significantly faster than other cars.


95% Confidence Intervals for the Population Means

In one-way analysis of variance, a confidence interval for a population mean is

$$\bar{x}_i \pm t^* \frac{s_p}{\sqrt{n_i}}$$

where $s_p = \sqrt{\text{MSE}}$ and t* is such that the confidence level is the probability between -t* and t* in a t-distribution with df = N – k.
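Applying the interval formula to the Example 16.1 summaries reproduces the pattern described earlier for the seat-location data: the "front" interval sits above the other two. This is a sketch under stated assumptions. t* ≈ 1.966 is typed in by hand as an approximate value for 95% confidence with df = 381. The function name `mean_cis` is hypothetical.

```python
from math import sqrt

def mean_cis(names, n, xbar, s, t_star):
    """CI for each population mean: xbar_i +/- t* * s_p / sqrt(n_i), with s_p = sqrt(MSE)."""
    k, N = len(n), sum(n)
    sp = sqrt(sum((ni - 1) * si ** 2 for ni, si in zip(n, s)) / (N - k))
    return {name: (xi - t_star * sp / sqrt(ni), xi + t_star * sp / sqrt(ni))
            for name, ni, xi in zip(names, n, xbar)}

# Example 16.1 summaries; t* ~ 1.966 for df = 381 (approximate value, an assumption)
cis = mean_cis(["Front", "Middle", "Back"], [88, 218, 78],
               [3.2029, 2.9853, 2.9194], [0.5491, 0.5577, 0.5105], t_star=1.966)
for name, (lo, hi) in cis.items():
    print(f"{name}: ({lo:.3f}, {hi:.3f})")
```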


16.3 Other Methods

When data are skewed or extreme outliers are present, it is better to analyze the median instead of the mean.

Two such tests are:

1. Kruskal-Wallis Test

2. Mood’s Median Test

Also called nonparametric tests.

H0: Population medians are equal.

Ha: Population medians are not all equal.
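The Kruskal-Wallis statistic works on ranks rather than raw values. All N observations are pooled and ranked, with tied values given their average rank. With Ri the rank sum for group i, H = 12/(N(N + 1)) Σ Ri²/ni − 3(N + 1). The minimal sketch below assumes this textbook form without the tie correction that software usually applies. The function names `average_ranks` and `kruskal_wallis_h` are hypothetical. For the p-value, H is compared with a chi-square distribution with k – 1 df (a large-sample approximation), which is not computed here. Mood's median test is not shown.

```python
def average_ranks(values):
    """Ranks 1..N, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    N = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum for this group
        total += r ** 2 / len(g)
        start += len(g)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 3))  # 7.2
```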


Example 16.5 Drinks and Seat Location

Data: Seat location and number of alcoholic drinks per week

Students sitting in the back report drinking more.

Data appear skewed, sample standard deviations differ.


Example 16.5 Drinks and Seat (cont)

P = 0.000 => strong evidence that the population median numbers of drinks per week are not all equal.


Example 16.5 Drinks and Seat (cont)

P = 0.000 => the null hypothesis of equal population medians can be rejected.


16.4 Two-Way ANOVA (CD Topic S4)

Two-way analysis of variance: used to examine how two categorical explanatory variables affect the mean of a quantitative response variable.

Main effect: overall effect of a single explanatory variable.

Interaction: effect on response variable of one explanatory variable depends upon the specific value or level for the other explanatory variable.
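Interaction is easiest to see numerically in a 2 × 2 layout. Compute the effect of one factor at each level of the other, then compare the two effects. The sketch below uses hypothetical cell means, not data from Examples 16.6 or 16.7. The function name `interaction_contrast` is hypothetical. A difference of 0 corresponds to parallel lines on a plot of cell means. A nonzero difference means the effect of one factor depends on the level of the other. A two-way ANOVA would assess this with an F-statistic, which is not computed here.

```python
def interaction_contrast(cell_means):
    """For 2x2 cell means [[a1b1, a1b2], [a2b1, a2b2]], return the effect of
    factor B at each level of factor A and the difference between those effects."""
    (m11, m12), (m21, m22) = cell_means
    effect_at_a1 = m12 - m11
    effect_at_a2 = m22 - m21
    return effect_at_a1, effect_at_a2, effect_at_a2 - effect_at_a1

# Hypothetical cell means (rows: levels of A; columns: levels of B)
print(interaction_contrast([[10, 12], [14, 16]]))  # (2, 2, 0): parallel, no interaction
print(interaction_contrast([[10, 12], [14, 11]]))  # (2, -3, -5): effect of B depends on A
```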


Example 16.6 Happy Faces and Tips

Q: Does drawing a happy face on the restaurant bill increase the average tip to the server?

The effect of drawing a happy face depended on gender. It was speculated that customers felt a happy face was not gender appropriate for males.


Example 16.7 You’ve Got to Have Heart

Response: Weight gain in infants

Explanatory: Heartbeat status (Yes or No) and initial weight (low, med, high)

Weight gain generally greater for heartbeat group.

There is a main effect for the heartbeat status.

Approximately parallel lines => little/no interaction


Example 16.6 Faces and Tips (cont)

Two-way ANOVA: three F-statistics are computed, one for each main effect and one for the interaction.

Since the interaction effect is significant, the main effects are difficult to interpret.
