Chap 11-1
Business Statistics
Chapter 11: Analysis of Variance
Chap 11-2
Chapter Goals
After completing this chapter, you should be able to:
Recognize situations in which to use analysis of variance
Understand different analysis of variance designs
Perform a single-factor hypothesis test and interpret results
Conduct and interpret post-analysis of variance pairwise comparisons procedures
Set up and perform randomized blocks analysis
Analyze two-factor analysis of variance test with replications results
Chap 11-3
Chapter Overview
Analysis of Variance (ANOVA)
One-Way ANOVA
  F-test
  Tukey-Kramer test
Randomized Complete Block ANOVA
  F-test
  Fisher’s Least Significant Difference test
Two-factor ANOVA with replication
Chap 11-4
General ANOVA Setting
Investigator controls one or more independent variables
  Called factors (or treatment variables)
  Each factor contains two or more levels (or categories/classifications)
Observe effects on dependent variable
  Response to levels of independent variable
Experimental design: the plan used to test hypothesis
Chap 11-5
One-Way Analysis of Variance
Evaluate the difference among the means of three or more populations
Examples:
  Accident rates for 1st, 2nd, and 3rd shift
  Expected mileage for five brands of tires
Assumptions
  Populations are normally distributed
  Populations have equal variances
  Samples are randomly and independently drawn
Chap 11-6
Completely Randomized Design
Experimental units (subjects) are assigned randomly to treatments
Only one factor or independent variable
  With two or more treatment levels
Analyzed by one-factor analysis of variance (one-way ANOVA)
Called a Balanced Design if all factor levels have equal sample size
Chap 11-7
Hypotheses of One-Way ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
  All population means are equal
  i.e., no treatment effect (no variation in means among groups)
$H_A$: Not all of the population means are the same
  At least one population mean is different
  i.e., there is a treatment effect
  Does not mean that all population means are different (some pairs may be the same)
Chap 11-8
One-Factor ANOVA
All means are the same: the null hypothesis is true (no treatment effect)
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_A$: Not all $\mu_i$ are the same
[Figure: $\mu_1 = \mu_2 = \mu_3$]
Chap 11-9
One-Factor ANOVA (continued)
At least one mean is different: the null hypothesis is NOT true (treatment effect is present)
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_A$: Not all $\mu_i$ are the same
[Figure: $\mu_1 = \mu_2 \neq \mu_3$ or $\mu_1 \neq \mu_2 \neq \mu_3$]
Chap 11-10
Partitioning the Variation
Total variation can be split into two parts:
SST = Total Sum of Squares
SSB = Sum of Squares Between
SSW = Sum of Squares Within
SST = SSB + SSW
Chap 11-11
Partitioning the Variation (continued)
SST = SSB + SSW
Total Variation = the aggregate dispersion of the individual data values across the various factor levels (SST)
Within-Sample Variation = dispersion that exists among the data values within a particular factor level (SSW)
Between-Sample Variation = dispersion among the factor sample means (SSB)
Chap 11-12
Partition of Total Variation
Total Variation (SST) = Variation Due to Factor (SSB) + Variation Due to Random Sampling (SSW)
SSB is commonly referred to as:
  Sum of Squares Between
  Sum of Squares Among
  Sum of Squares Explained
  Among Groups Variation
SSW is commonly referred to as:
  Sum of Squares Within
  Sum of Squares Error
  Sum of Squares Unexplained
  Within Groups Variation
Chap 11-13
Total Sum of Squares
$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$$
Where:
SST = Total sum of squares
k = number of populations (levels or treatments)
$n_i$ = sample size from population i
$x_{ij}$ = jth measurement from population i
$\bar{x}$ = grand mean (mean of all data values)
SST = SSB + SSW
Chap 11-14
c1 = c(254, 263, 241, 237, 251)
c2 = c(234, 218, 235, 227, 216)
c3 = c(200, 222, 197, 206, 204)
y = as.vector(rbind(c1, c2, c3))
Use R to find the mean of c1, c2, c3, and y
Find the deviations from the means
Find the sum of squared deviations
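The exercise on this slide can be worked through along the following lines. This is a minimal sketch that reuses the slide's vectors; the names `group_means`, `grand_mean`, `dev`, and `sst` are illustrative and do not come from the slides.

```r
# Data from the slide: distances for three golf clubs
c1 <- c(254, 263, 241, 237, 251)
c2 <- c(234, 218, 235, 227, 216)
c3 <- c(200, 222, 197, 206, 204)
y  <- as.vector(rbind(c1, c2, c3))  # all 15 observations

# Means of each group and of all data values
group_means <- c(mean(c1), mean(c2), mean(c3))  # 249.2 226.0 205.8
grand_mean  <- mean(y)                          # 227

# Deviations from the grand mean, then the total sum of squares
dev <- y - grand_mean
sst <- sum(dev^2)                               # 5836

print(group_means)
print(grand_mean)
print(sst)
```

The same pattern, with each group's own mean in place of the grand mean, gives the within-group deviations.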
Chap 11-15
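Total Variation (continued)
$$SST = (x_{11} - \bar{x})^2 + (x_{12} - \bar{x})^2 + \cdots + (x_{k n_k} - \bar{x})^2$$
[Figure: Response, X, plotted for Group 1, Group 2, and Group 3, with the grand mean $\bar{x}$ marked]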
Chap 11-16
Sum of Squares Between
$$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$
Where:
SSB = Sum of squares between
k = number of populations
$n_i$ = sample size from population i
$\bar{x}_i$ = sample mean from population i
$\bar{x}$ = grand mean (mean of all data values)
SST = SSB + SSW
Chap 11-17
Between-Group Variation
Variation Due to Differences Among Groups
$$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$
$$MSB = \frac{SSB}{k - 1}$$
Mean Square Between = SSB/degrees of freedom
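As a rough illustration of the SSB and MSB formulas, the sketch below applies them to the golf club data used elsewhere in this chapter. The variable names (`groups`, `n_i`, `xbar_i`, `xbar`) are assumptions made for this example only.

```r
# Golf club data from the chapter's worked example
groups <- list(
  c1 = c(254, 263, 241, 237, 251),
  c2 = c(234, 218, 235, 227, 216),
  c3 = c(200, 222, 197, 206, 204)
)

k      <- length(groups)          # number of populations
n_i    <- sapply(groups, length)  # sample size per population
xbar_i <- sapply(groups, mean)    # sample mean per population
xbar   <- mean(unlist(groups))    # grand mean

# SSB = sum over groups of n_i * (group mean - grand mean)^2
ssb <- sum(n_i * (xbar_i - xbar)^2)

# MSB = SSB / (k - 1)
msb <- ssb / (k - 1)

print(c(SSB = ssb, MSB = msb))
```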
Chap 11-18
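Between-Group Variation (continued)
$$SSB = n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 + \cdots + n_k (\bar{x}_k - \bar{x})^2$$
[Figure: Response, X, plotted for Group 1, Group 2, and Group 3, with the group means $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ and the grand mean $\bar{x}$ marked]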
Chap 11-19
Sum of Squares Within
$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$
Where:
SSW = Sum of squares within
k = number of populations
$n_i$ = sample size from population i
$\bar{x}_i$ = sample mean from population i
$x_{ij}$ = jth measurement from population i
SST = SSB + SSW
Chap 11-20
Within-Group Variation
Summing the variation within each group and then adding over all groups
$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$
$$MSW = \frac{SSW}{N - k}$$
Mean Square Within = SSW/degrees of freedom
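The within-group sum of squares can be computed the same way, and the partition SST = SSB + SSW can be checked numerically. The sketch below uses the chapter's golf club data; all variable names are illustrative assumptions.

```r
# Golf club data from the chapter's worked example
groups <- list(
  c1 = c(254, 263, 241, 237, 251),
  c2 = c(234, 218, 235, 227, 216),
  c3 = c(200, 222, 197, 206, 204)
)
y <- unlist(groups)
k <- length(groups)  # number of populations
N <- length(y)       # total number of observations

# SSW: squared deviations from each group's own mean, added over all groups
ssw <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))
msw <- ssw / (N - k)

# Check the partition SST = SSB + SSW
sst <- sum((y - mean(y))^2)
ssb <- sum(sapply(groups, function(g) length(g) * (mean(g) - mean(y))^2))

print(c(SSW = ssw, MSW = msw))
print(all.equal(sst, ssb + ssw))
```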
Chap 11-21
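Within-Group Variation (continued)
$$SSW = (x_{11} - \bar{x}_1)^2 + (x_{12} - \bar{x}_1)^2 + \cdots + (x_{k n_k} - \bar{x}_k)^2$$
[Figure: Response, X, plotted for Group 1, Group 2, and Group 3, with the group means $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ marked]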
Chap 11-22
One-Way ANOVA Table
Source of Variation   SS                df      MS                  F ratio
Between Samples       SSB               k - 1   MSB = SSB/(k - 1)   F = MSB/MSW
Within Samples        SSW               N - k   MSW = SSW/(N - k)
Total                 SST = SSB + SSW   N - 1
k = number of populations
N = sum of the sample sizes from all populations
df = degrees of freedom
Chap 11-23
One-Factor ANOVA F Test Statistic
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: At least two population means are different
Test statistic
$$F = \frac{MSB}{MSW}$$
  MSB is mean squares between variances
  MSW is mean squares within variances
Degrees of freedom
  df1 = k - 1 (k = number of populations)
  df2 = N - k (N = sum of sample sizes from all populations)
Chap 11-24
Interpreting One-Factor ANOVA F Statistic
The F statistic is the ratio of the between estimate of variance and the within estimate of variance
  The ratio must always be positive
  df1 = k - 1 will typically be small
  df2 = N - k will typically be large
The ratio should be close to 1 if H0: μ1 = μ2 = … = μk is true
The ratio will tend to be larger than 1 if H0: μ1 = μ2 = … = μk is false
Chap 11-25
One-Factor ANOVA F Test Example
You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?
Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204
Chap 11-26
One-Factor ANOVA Example: Scatter Diagram
[Figure: scatter diagram of Distance (vertical axis, 190 to 270) against Club (1, 2, 3), with the sample means $\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$ and the grand mean $\bar{x} = 227.0$ marked]
Chap 11-27
One-Factor ANOVA Example Computations
$\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$, $\bar{x} = 227.0$
$n_1 = 5$, $n_2 = 5$, $n_3 = 5$, N = 15, k = 3
$SSB = 5[(249.2 - 227)^2 + (226 - 227)^2 + (205.8 - 227)^2] = 4716.4$
$SSW = (254 - 249.2)^2 + (263 - 249.2)^2 + \cdots + (204 - 205.8)^2 = 1119.6$
$MSB = 4716.4 / (3 - 1) = 2358.2$
$MSW = 1119.6 / (15 - 3) = 93.3$
$$F = \frac{2358.2}{93.3} = 25.275$$
Chap 11-28
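One-Factor ANOVA Example Solution
H0: μ1 = μ2 = μ3
HA: μi not all equal
α = .05, df1 = 2, df2 = 12
Critical Value: F.05 = 3.885
Test Statistic:
$$F = \frac{MSB}{MSW} = \frac{2358.2}{93.3} = 25.275$$
Decision: Reject H0 at α = 0.05
Conclusion: There is evidence that at least one μi differs from the rest
[Figure: F distribution starting at 0 with rejection region of area α = .05 to the right of F.05 = 3.885; "Do not reject H0" to the left of the critical value, "Reject H0" to the right, with F = 25.275 falling in the rejection region]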
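The critical value and the decision can also be obtained from R's F distribution functions instead of a table. This is a small sketch under the slide's values (α = .05, df1 = 2, df2 = 12); the variable names are illustrative.

```r
alpha  <- 0.05
df1    <- 2          # k - 1
df2    <- 12         # N - k
f_stat <- 2358.2 / 93.3   # MSB / MSW from the example

# Upper-tail critical value of the F distribution
f_crit <- qf(1 - alpha, df1, df2)

# p-value: probability of an F at least this large when H0 is true
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

print(round(f_crit, 3))  # 3.885
print(p_value)
print(f_stat > f_crit)   # TRUE, so reject H0
```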
Chap 11-29
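R program
c1 = c(254, 263, 241, 237, 251)
c2 = c(234, 218, 235, 227, 216)
c3 = c(200, 222, 197, 206, 204)
f = factor(rep(1:3, 5))             # club labels 1, 2, 3, ... in the same order as y
y = as.vector(rbind(c1, c2, c3))    # interleaves c1[1], c2[1], c3[1], c1[2], ...
aov(y ~ f)
plot(f, y)                          # gives box plot
anova(lm(y ~ f))                    # prints F statistic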
Chap 11-30
ANOVA -- Single Factor: Excel Output
EXCEL: tools | data analysis | ANOVA: single factor
SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2
ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.885
Within Groups         1119.6   12   93.3
Total                 5836.0   14
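For readers working in R instead of Excel, the SUMMARY block of this output can be reproduced roughly as follows. The data frame name `summary_tab` and its column names are chosen here to mirror the Excel headings; they are not part of any Excel or R convention.

```r
# Golf club data from the worked example
groups <- list(
  "Club 1" = c(254, 263, 241, 237, 251),
  "Club 2" = c(234, 218, 235, 227, 216),
  "Club 3" = c(200, 222, 197, 206, 204)
)

# One row per group: count, sum, average, and sample variance
summary_tab <- data.frame(
  Count    = sapply(groups, length),
  Sum      = sapply(groups, sum),
  Average  = sapply(groups, mean),
  Variance = sapply(groups, var)
)

print(summary_tab)
```

With equal group sizes, the average of the three variances equals the Within Groups MS (93.3) in the ANOVA block.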
Chap 11-31
The Tukey-Kramer Procedure
Tells which population means are significantly different
  e.g.: μ1 = μ2 ≠ μ3
Done after rejection of equal means in ANOVA
Allows pair-wise comparisons
  Compare absolute mean differences with critical range
[Figure: $\mu_1 = \mu_2 \neq \mu_3$ marked on the x axis]
Chap 11-32
Tukey-Kramer Critical Range
$$\text{Critical Range} = q_\alpha \sqrt{\frac{MSW}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$
where:
$q_\alpha$ = Value from standardized range table with k and N - k degrees of freedom for the desired level of $\alpha$
MSW = Mean Square Within
$n_i$ and $n_j$ = Sample sizes from populations (levels) i and j
Chap 11-33
The Tukey-Kramer Procedure: Example
1. Compute absolute mean differences:
$|\bar{x}_1 - \bar{x}_2| = |249.2 - 226.0| = 23.2$
$|\bar{x}_1 - \bar{x}_3| = |249.2 - 205.8| = 43.4$
$|\bar{x}_2 - \bar{x}_3| = |226.0 - 205.8| = 20.2$
2. Find the q value from the table in appendix J with k and N - k degrees of freedom for the desired level of $\alpha$:
$q_\alpha = 3.77$
Chap 11-34
The Tukey-Kramer Procedure: Example
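3. Compute Critical Range:
$$\text{Critical Range} = q_\alpha \sqrt{\frac{MSW}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} = 3.77 \sqrt{\frac{93.3}{2} \left( \frac{1}{5} + \frac{1}{5} \right)} = 16.285$$
4. Compare:
$|\bar{x}_1 - \bar{x}_2| = 23.2$
$|\bar{x}_1 - \bar{x}_3| = 43.4$
$|\bar{x}_2 - \bar{x}_3| = 20.2$
5. All of the absolute mean differences are greater than the critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance.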
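The steps of this example can be sketched in R, with `qtukey()` standing in for the studentized range table (it returns a value close to the tabled 3.77, so the critical range differs slightly from 16.285 in the later decimals). The variable names below are illustrative assumptions.

```r
# Sample means and ANOVA quantities from the golf club example
means <- c(249.2, 226.0, 205.8)
msw   <- 93.3
k     <- 3    # number of populations
N     <- 15   # total sample size
n_i   <- 5    # balanced design: every group has 5 observations
n_j   <- 5

# Step 1: absolute mean differences for every pair (1-2, 1-3, 2-3)
pairs    <- combn(k, 2)
abs_diff <- abs(means[pairs[1, ]] - means[pairs[2, ]])
names(abs_diff) <- apply(pairs, 2, paste, collapse = " vs ")

# Step 2: q value with k and N - k degrees of freedom at alpha = .05
q_alpha <- qtukey(0.95, nmeans = k, df = N - k)

# Step 3: critical range
crit_range <- q_alpha * sqrt((msw / 2) * (1 / n_i + 1 / n_j))

# Steps 4 and 5: compare each difference with the critical range
print(abs_diff)
print(crit_range)
print(abs_diff > crit_range)
```

Base R also provides `TukeyHSD()` for fitted `aov` objects, which reports adjusted confidence intervals for each pair instead of a single critical range.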
Chap 11-35
Tukey-Kramer in PHStat
Chap 11-36
Randomized Complete Block ANOVA
Like One-Way ANOVA, we test for equal population means (for different factor levels, for example)...
...but we want to control for possible variation from a second factor (with two or more levels)
Used when more than one factor may influence the value of the dependent variable, but only one is of key interest
Levels of the secondary factor are called blocks
Chap 11-37
Partitioning the Variation
Total variation can now be split into three parts:
SST = Total sum of squares
SSB = Sum of squares between factor levels
SSBL = Sum of squares between blocks
SSW = Sum of squares within levels
SST = SSB + SSBL + SSW
Chap 11-38
Sum of Squares for Blocking
$$SSBL = \sum_{j=1}^{b} k (\bar{x}_j - \bar{x})^2$$
Where:
k = number of levels for this factor
b = number of blocks
$\bar{x}_j$ = sample mean from the jth block
$\bar{x}$ = grand mean (mean of all data values)
SST = SSB + SSBL + SSW
Chap 11-39
Partitioning the Variation
Total variation can now be split into three parts:
SST and SSB are computed as they were in One-Way ANOVA
SST = SSB + SSBL + SSW
SSW = SST – (SSB + SSBL)
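The three-part partition can be illustrated with a small numeric sketch. The data below are hypothetical values invented for this example (4 blocks in rows, 3 factor levels in columns); they do not come from the slides, and the variable names are likewise assumptions.

```r
# Hypothetical data: rows are blocks (b = 4), columns are factor levels (k = 3)
x <- matrix(c(10, 12, 15,
              11, 14, 16,
               9, 11, 13,
              12, 15, 18), nrow = 4, byrow = TRUE)

b     <- nrow(x)   # number of blocks
k     <- ncol(x)   # number of factor levels
grand <- mean(x)   # grand mean

# SST and SSB as in one-way ANOVA (each level has b observations)
sst <- sum((x - grand)^2)
ssb <- b * sum((colMeans(x) - grand)^2)

# SSBL: k times the squared deviations of the block means
ssbl <- k * sum((rowMeans(x) - grand)^2)

# SSW is what remains
ssw <- sst - (ssb + ssbl)

print(c(SST = sst, SSB = ssb, SSBL = ssbl, SSW = ssw))
```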
Chap 11-40
Mean Squares
$$MSBL = \text{Mean square blocking} = \frac{SSBL}{b - 1}$$
$$MSB = \text{Mean square between} = \frac{SSB}{k - 1}$$
$$MSW = \text{Mean square within} = \frac{SSW}{(k - 1)(b - 1)}$$
Chap 11-41
Randomized Block ANOVA Table
Source of Variation   SS     df               MS     F ratio
Between Blocks        SSBL   b - 1            MSBL   MSBL/MSW
Between Samples       SSB    k - 1            MSB    MSB/MSW
Within Samples        SSW    (k - 1)(b - 1)   MSW
Total                 SST    N - 1
k = number of populations
N = sum of the sample sizes from all populations
b = number of blocks
df = degrees of freedom
Chap 11-42
Blocking Test
Blocking test:
$H_0: \mu_{b1} = \mu_{b2} = \mu_{b3} = \cdots$
$H_A$: Not all block means are equal
$$F = \frac{MSBL}{MSW}$$
df1 = b - 1, df2 = (k - 1)(b - 1)
Reject H0 if F > $F_\alpha$
Chap 11-43
Main Factor Test
Main Factor test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_A$: Not all population means are equal
$$F = \frac{MSB}{MSW}$$
df1 = k - 1, df2 = (k - 1)(b - 1)
Reject H0 if F > $F_\alpha$
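Both the blocking test and the main factor test can be read off a single fitted model in R. The sketch below uses hypothetical data (3 treatment levels, 4 blocks) invented for illustration; the column names `treatment` and `block` are assumptions, and `aov()` with an additive formula is one common way to fit this design, not necessarily the course's method.

```r
# Hypothetical data in long format: one observation per treatment/block pair
d <- data.frame(
  y         = c(10, 12, 15, 11, 14, 16, 9, 11, 13, 12, 15, 18),
  treatment = factor(rep(1:3, times = 4)),  # main factor, k = 3 levels
  block     = factor(rep(1:4, each = 3))    # blocks, b = 4
)

# Additive model: main factor plus blocks, no interaction term
fit <- aov(y ~ treatment + block, data = d)
tab <- summary(fit)[[1]]   # rows: treatment, block, Residuals

f_main  <- tab[["F value"]][1]   # MSB / MSW
f_block <- tab[["F value"]][2]   # MSBL / MSW

# Critical values at alpha = .05 with the slide's degrees of freedom
k <- 3; b <- 4
crit_main  <- qf(0.95, k - 1, (k - 1) * (b - 1))
crit_block <- qf(0.95, b - 1, (k - 1) * (b - 1))

print(tab)
print(c(main = f_main > crit_main, block = f_block > crit_block))
```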
Chap 11-44
Fisher’s Least Significant Difference Test
To test which population means are significantly different
  e.g.: μ1 = μ2 ≠ μ3
Done after rejection of equal means in randomized block ANOVA design
Allows pair-wise comparisons
  Compare absolute mean differences with critical range
[Figure: $\mu_1 = \mu_2 \neq \mu_3$ marked on the x axis]
Chap 11-45
Fisher’s Least Significant Difference (LSD) Test
$$LSD = t_{\alpha/2} \sqrt{MSW} \sqrt{\frac{2}{b}}$$
where:
$t_{\alpha/2}$ = Upper-tailed value from Student’s t-distribution for $\alpha/2$ and (k - 1)(b - 1) degrees of freedom
MSW = Mean square within from ANOVA table
b = number of blocks
k = number of levels of the main factor
Chap 11-46
Fisher’s Least Significant Difference (LSD) Test (continued)
$$LSD = t_{\alpha/2} \sqrt{MSW} \sqrt{\frac{2}{b}}$$
Compare: Is $|\bar{x}_i - \bar{x}_j| > LSD$?
  $|\bar{x}_1 - \bar{x}_2|$
  $|\bar{x}_1 - \bar{x}_3|$
  $|\bar{x}_2 - \bar{x}_3|$
  ...etc
If the absolute mean difference is greater than LSD then there is a significant difference between that pair of means at the chosen level of significance.
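A minimal sketch of the LSD comparison follows, using hypothetical randomized block data (4 blocks in rows, 3 factor levels in columns) invented for illustration. MSW is taken with (k - 1)(b - 1) degrees of freedom, matching the randomized block ANOVA table; all names are assumptions.

```r
# Hypothetical data: rows are blocks (b = 4), columns are factor levels (k = 3)
x <- matrix(c(10, 12, 15,
              11, 14, 16,
               9, 11, 13,
              12, 15, 18), nrow = 4, byrow = TRUE)
b <- nrow(x); k <- ncol(x)
grand <- mean(x)

# Mean square within from the randomized block partition
sst  <- sum((x - grand)^2)
ssb  <- b * sum((colMeans(x) - grand)^2)
ssbl <- k * sum((rowMeans(x) - grand)^2)
df_w <- (k - 1) * (b - 1)
msw  <- (sst - ssb - ssbl) / df_w

# LSD = t_{alpha/2} * sqrt(MSW) * sqrt(2 / b)
alpha <- 0.05
lsd   <- qt(1 - alpha / 2, df = df_w) * sqrt(msw) * sqrt(2 / b)

# Absolute differences between factor-level means for pairs 1-2, 1-3, 2-3
level_means <- colMeans(x)
pairs       <- combn(k, 2)
abs_diff    <- abs(level_means[pairs[1, ]] - level_means[pairs[2, ]])

print(lsd)
print(abs_diff > lsd)   # TRUE marks a significant pair
```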
Chap 11-47
Two-Way ANOVA
Examines the effect of
  Two or more factors of interest on the dependent variable
    e.g.: Percent carbonation and line speed on soft drink bottling process
  Interaction between the different levels of these two factors (only if replications exist)
    e.g.: Does the effect of one particular percentage of carbonation depend on which level the line speed is set?
Chap 11-48
Two-Way ANOVA (continued)
Assumptions
  Populations are normally distributed
  Populations have equal variances
  Independent random samples are drawn
Chap 11-49
Two-Way ANOVA Sources of Variation
Two Factors of interest: A and B
a = number of levels of factor A
b = number of levels of factor B
N = total number of observations in all cells
n’ = number of replications in each cell
Chap 11-50
Two-Way ANOVA Sources of Variation (continued)
SST = SSA + SSB + SSAB + SSE
Source                                                Degrees of Freedom
SST    Total Variation                                N - 1
SSA    Variation due to factor A                      a - 1
SSB    Variation due to factor B                      b - 1
SSAB   Variation due to interaction between A and B   (a - 1)(b - 1)
SSE    Inherent variation (Error)                     N - ab
Chap 11-51
Two Factor ANOVA Equations
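Total Sum of Squares:
$$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} (x_{ijk} - \bar{x})^2$$
Sum of Squares Factor A:
$$SS_A = b n' \sum_{i=1}^{a} (\bar{x}_i - \bar{x})^2$$
Sum of Squares Factor B:
$$SS_B = a n' \sum_{j=1}^{b} (\bar{x}_j - \bar{x})^2$$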
Chap 11-52
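Two Factor ANOVA Equations (continued)
Sum of Squares Interaction Between A and B:
$$SS_{AB} = n' \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{ij} - \bar{x}_i - \bar{x}_j + \bar{x})^2$$
Sum of Squares Error:
$$SSE = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} (x_{ijk} - \bar{x}_{ij})^2$$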
Chap 11-53
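Two Factor ANOVA Equations (continued)
where:
$$\bar{x} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} x_{ijk}}{a b n'} = \text{Grand Mean}$$
$$\bar{x}_i = \frac{\sum_{j=1}^{b} \sum_{k=1}^{n'} x_{ijk}}{b n'} = \text{Mean of each level of factor A}$$
$$\bar{x}_j = \frac{\sum_{i=1}^{a} \sum_{k=1}^{n'} x_{ijk}}{a n'} = \text{Mean of each level of factor B}$$
$$\bar{x}_{ij} = \sum_{k=1}^{n'} \frac{x_{ijk}}{n'} = \text{Mean of each cell}$$
a = number of levels of factor A
b = number of levels of factor B
n’ = number of replications in each cell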
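The two-factor sums of squares can be checked numerically with a short sketch. The data are hypothetical (a = 2 levels of A, b = 3 levels of B, n’ = 2 replications per cell), invented for illustration. Because each mean is attached to every observation with `ave()`, summing over all N rows supplies the multipliers bn’, an’, and n’ from the formulas automatically.

```r
# Hypothetical balanced data: replicate varies fastest, then B, then A
d <- expand.grid(replicate = 1:2, B = factor(1:3), A = factor(1:2))
d$y <- c(20, 22, 25, 27, 30, 32,   # A = 1: cells B1, B2, B3
         24, 26, 26, 28, 27, 29)   # A = 2: cells B1, B2, B3

grand   <- mean(d$y)           # grand mean
mean_A  <- ave(d$y, d$A)       # mean of the observation's factor A level
mean_B  <- ave(d$y, d$B)       # mean of the observation's factor B level
mean_AB <- ave(d$y, d$A, d$B)  # mean of the observation's cell

sst  <- sum((d$y - grand)^2)
ssa  <- sum((mean_A - grand)^2)
ssb  <- sum((mean_B - grand)^2)
ssab <- sum((mean_AB - mean_A - mean_B + grand)^2)
sse  <- sum((d$y - mean_AB)^2)

print(c(SST = sst, SSA = ssa, SSB = ssb, SSAB = ssab, SSE = sse))
print(all.equal(sst, ssa + ssb + ssab + sse))
```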
Chap 11-54
Mean Square Calculations
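$$MS_A = \text{Mean square factor A} = \frac{SS_A}{a - 1}$$
$$MS_B = \text{Mean square factor B} = \frac{SS_B}{b - 1}$$
$$MS_{AB} = \text{Mean square interaction} = \frac{SS_{AB}}{(a - 1)(b - 1)}$$
$$MSE = \text{Mean square error} = \frac{SSE}{N - ab}$$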
Chap 11-55
Two-Way ANOVA: The F Test Statistic
F Test for Factor A Main Effect
  H0: μA1 = μA2 = μA3 = • • •
  HA: Not all μAi are equal
  $F = MS_A / MSE$
  Reject H0 if F > $F_\alpha$
F Test for Factor B Main Effect
  H0: μB1 = μB2 = μB3 = • • •
  HA: Not all μBi are equal
  $F = MS_B / MSE$
  Reject H0 if F > $F_\alpha$
F Test for Interaction Effect
  H0: factors A and B do not interact to affect the mean response
  HA: factors A and B do interact
  $F = MS_{AB} / MSE$
  Reject H0 if F > $F_\alpha$
Chap 11-56
Two-Way ANOVASummary Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                       F Statistic
Factor A              SSA              a - 1                MSA = SSA/(a - 1)                  MSA/MSE
Factor B              SSB              b - 1                MSB = SSB/(b - 1)                  MSB/MSE
AB (Interaction)      SSAB             (a - 1)(b - 1)       MSAB = SSAB/[(a - 1)(b - 1)]       MSAB/MSE
Error                 SSE              N - ab               MSE = SSE/(N - ab)
Total                 SST              N - 1
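In R, a table with the same rows (apart from the Total line) can be obtained from `aov()` with a crossed formula. The sketch below uses hypothetical balanced data (a = 2, b = 3, n’ = 2) invented for illustration; the column names are assumptions.

```r
# Hypothetical balanced data: replicate varies fastest, then B, then A
d <- expand.grid(replicate = 1:2, B = factor(1:3), A = factor(1:2))
d$y <- c(20, 22, 25, 27, 30, 32,   # A = 1: cells B1, B2, B3
         24, 26, 26, 28, 27, 29)   # A = 2: cells B1, B2, B3

# A * B expands to A + B + A:B (both main effects plus interaction)
fit <- aov(y ~ A * B, data = d)
tab <- summary(fit)[[1]]   # rows: A, B, A:B, Residuals (the Error line)

print(tab)

# Each F statistic is that row's mean square divided by MSE
mse   <- tab[["Mean Sq"]][4]
f_all <- tab[["Mean Sq"]][1:3] / mse
print(f_all)
```

The degrees of freedom column should read a - 1, b - 1, (a - 1)(b - 1), and N - ab, which for this layout is 1, 2, 2, and 6.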
Chap 11-57
Features of Two-Way ANOVA F Test
Degrees of freedom always add up
  N - 1 = (N - ab) + (a - 1) + (b - 1) + (a - 1)(b - 1)
  Total = error + factor A + factor B + interaction
The denominator of the F Test is always the same but the numerator is different
The sums of squares always add up
  SST = SSE + SSA + SSB + SSAB
  Total = error + factor A + factor B + interaction
Chap 11-58
Examples: Interaction vs. No Interaction
[Figure: two panels, "No interaction" and "Interaction is present". Each plots Mean Response (vertical axis) against Factor A Levels 1 and 2 (horizontal axis), with one line each for Factor B Level 1, Factor B Level 2, and Factor B Level 3]
Chap 11-59
Chapter Summary
Described one-way analysis of variance
  The logic of ANOVA
  ANOVA assumptions
  F test for difference in k means
  The Tukey-Kramer procedure for multiple comparisons
Described randomized complete block designs
  F test
  Fisher’s least significant difference test for multiple comparisons
Described two-way analysis of variance
  Examined effects of multiple factors and interaction