ANOVA
The same slopes, but different intercepts
[Figure: sepal.length plotted against sepal.width for the iris data, by species (setosa, versicolor, virginica).]
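Model ($k$ indexes the species, $j$ the flowers within a species):
$$Y_{kj} = \mu_k + \beta X_{kj} + \varepsilon_{kj}, \qquad j = 1, 2, \ldots, 50$$
$$\varepsilon_{kj} \stackrel{iid}{\sim} N(0, \sigma^2), \qquad k = 1, 2, 3 \ (K = 3), \quad (n = 50)$$
Species is a factor (categorical) variable; it enters the model through dummy variables. With $\mathbf{Y}_k = (Y_{k1}, \ldots, Y_{kn})^\top$, $\mathbf{X}_k = (X_{k1}, \ldots, X_{kn})^\top$, and $\mathbf{1}_n$, $\mathbf{0}_n$ column vectors of $n$ ones and zeros:
$$\begin{pmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \\ \mathbf{Y}_3 \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n & \mathbf{0}_n & \mathbf{0}_n & \mathbf{X}_1 \\ \mathbf{1}_n & \mathbf{1}_n & \mathbf{0}_n & \mathbf{X}_2 \\ \mathbf{1}_n & \mathbf{0}_n & \mathbf{1}_n & \mathbf{X}_3 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \alpha_2 \\ \alpha_3 \\ \beta \end{pmatrix} + \begin{pmatrix} \boldsymbol{\varepsilon}_1 \\ \boldsymbol{\varepsilon}_2 \\ \boldsymbol{\varepsilon}_3 \end{pmatrix}$$
$Y_{1j} = \mu_1 + \beta X_{1j} + \varepsilon_{1j}$ for setosa
$Y_{2j} = \mu_1 + \alpha_2 + \beta X_{2j} + \varepsilon_{2j}$ for versicolor
$Y_{3j} = \mu_1 + \alpha_3 + \beta X_{3j} + \varepsilon_{3j}$ for virginica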
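As a rough illustration of the common-slope model above, the following sketch fits it to the built-in iris data with `lm()`. The object names (`fit`, `intercepts`, `slope`) are chosen here for illustration and do not come from the slides.

```r
# Additive model: one common slope for Sepal.Width,
# a separate intercept for each species (no interaction term).
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
b <- coef(fit)

# Species is coded by dummy variables with setosa as the base level,
# so the other intercepts are the base intercept plus an offset.
intercepts <- c(
  setosa     = b[["(Intercept)"]],
  versicolor = b[["(Intercept)"]] + b[["Speciesversicolor"]],
  virginica  = b[["(Intercept)"]] + b[["Speciesvirginica"]]
)
slope <- b[["Sepal.Width"]]

print(intercepts)
print(slope)
```

The three intercepts share the single slope estimate, which is what "the same slopes, but different intercepts" means in terms of the fitted coefficients.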
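ANOVA: Analysis of Variance
A statistical method for comparing the (population) means of several groups. For a comparison of two groups, the t-test is applicable; in this view, ANOVA is a generalization of the t-test. Despite its name, ANOVA does not aim to compare variances.
Model: $Y_{kj} = \mu_k + \varepsilon_{kj}$
$Y_{1j} = \mu_1 + \varepsilon_{1j}$
$Y_{2j} = \mu_1 + \alpha_2 + \varepsilon_{2j}$
$Y_{3j} = \mu_1 + \alpha_3 + \varepsilon_{3j}$
Equivalence of group means
$H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1: \text{not } (\mu_1 = \mu_2 = \mu_3)$
$H_0: \alpha_2 = \alpha_3 = 0$ versus $H_1: \text{not } (\alpha_2 = \alpha_3 = 0)$
R adopts this convention (the first level of the factor is the base level).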
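The dummy-variable coding that R uses by default can be inspected directly. The sketch below, using the built-in iris data, prints the design matrix columns and checks that the coefficients are the base-level mean and the offsets from it; the names `X`, `fit0` and `group_means` are illustrative only.

```r
# Design matrix R builds for a factor with default treatment contrasts
X <- model.matrix(~ Species, data = iris)
print(colnames(X))   # intercept plus one dummy column per non-base level
print(unique(X))     # one distinct row pattern per species

# Coefficients: base-level mean, then offsets from the base level
fit0 <- lm(Sepal.Length ~ Species, data = iris)
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
offsets <- group_means[-1] - group_means[1]
print(coef(fit0))
print(offsets)
```

The first factor level (setosa) gets no dummy column of its own; its mean is absorbed into the intercept.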
> is.factor(iris$Species)
[1] TRUE
> rout <- lm(Sepal.Length~Species, data=iris)
> anova(rout)
Analysis of Variance Table

Response: Sepal.Length
           Df Sum Sq Mean Sq F value    Pr(>F)
Species     2 63.212  31.606  119.26 < 2.2e-16 ***
Residuals 147 38.956   0.265
> summary(rout)

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         5.0060     0.0728  68.762  < 2e-16 ***
Speciesversicolor   0.9300     0.1030   9.033 8.77e-16 ***
Speciesvirginica    1.5820     0.1030  15.366  < 2e-16 ***

Residual standard error: 0.5148 on 147 degrees of freedom
Multiple R-squared: 0.6187, Adjusted R-squared: 0.6135
F-statistic: 119.3 on 2 and 147 DF, p-value: < 2.2e-16
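This result supports $H_1: \text{not } (\alpha_2 = \alpha_3 = 0)$. That is, the group means of sepal.length are not all the same (at least one group has a group mean different from the other two groups).
$\alpha_2 \neq 0$, $\alpha_3 \neq 0$ (the versicolor and virginica offsets from setosa)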
Tomato data
   weight treatment
1    1.50 water
2    1.90 water
3    1.30 water
4    1.50 water
5    2.40 water
6    1.50 water
7    1.50 Nutrient
8    1.20 Nutrient
9    1.20 Nutrient
10   2.10 Nutrient
11   2.90 Nutrient
12   1.60 Nutrient
13   1.90 Nutrient+24D
14   1.60 Nutrient+24D
15   0.80 Nutrient+24D
16   1.15 Nutrient+24D
17   0.90 Nutrient+24D
18   1.60 Nutrient+24D
Comparison of the weights of tomatoes according to the treatment (trt).
There are 3 treatment groups: water, nutrient, and nutrient with the 2,4D component.
The aim of this study is to see whether the nutrient and the 2,4D increase (or decrease) the weight of the tomatoes.
> x <- c(1.5,1.9,1.3,1.5,2.4,1.5,1.5,1.2,1.2,2.1,2.9,1.6,
+        1.9,1.6,0.8,1.15,0.9,1.6)
> tx <- rep(c("water", "Nutrient", "Nutrient+24D"), c(6, 6, 6))
> # factor() keeps trt a factor in R >= 4.0, where data.frame() no longer converts strings
> ( tomato <- data.frame(weight=x, trt=factor(tx)) )
> stripchart(weight~trt, pch=16, cex=1.4, col="red", data=tomato)
> with(tomato, points(weight, trt, pch=16, cex=0.6, col="yellow"))
[Figure: strip chart of tomato weight by treatment (water, Nutrient, Nutrient+24D).]
> is.factor(tomato$trt)
[1] TRUE
> (sout1 <- lm(weight~trt, data=tomato))

Call:
lm(formula = weight ~ trt, data = tomato)

Coefficients:
    (Intercept)  trtNutrient+24D         trtwater
        1.75000         -0.42500         -0.06667

> tomato$trt <- relevel(tomato$trt, ref="water")
> (sout2 <- lm(weight~trt, data=tomato))

Call:
lm(formula = weight ~ trt, data = tomato)

Coefficients:
    (Intercept)      trtNutrient  trtNutrient+24D
        1.68333          0.06667         -0.35833

> anova(sout1)
Analysis of Variance Table

Response: weight
          Df Sum Sq Mean Sq F value Pr(>F)
trt        2 0.6269 0.31347  1.2019  0.328
Residuals 15 3.9121 0.26081
sout1 and sout2 are the same analysis, but use different base levels.
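One way to see that the two fits are the same analysis is to compare their fitted values. The sketch below rebuilds the tomato data and refits both parameterizations; `fit_a`, `fit_b` and the column `trt2` are names introduced here for illustration.

```r
weight <- c(1.5, 1.9, 1.3, 1.5, 2.4, 1.5,      # water
            1.5, 1.2, 1.2, 2.1, 2.9, 1.6,      # Nutrient
            1.9, 1.6, 0.8, 1.15, 0.9, 1.6)     # Nutrient+24D
trt <- factor(rep(c("water", "Nutrient", "Nutrient+24D"), each = 6))
tomato <- data.frame(weight, trt)

# Default base level: the first level in sorted order ("Nutrient")
fit_a <- lm(weight ~ trt, data = tomato)

# Same model with "water" as the base level
tomato$trt2 <- relevel(tomato$trt, ref = "water")
fit_b <- lm(weight ~ trt2, data = tomato)

# Coefficients differ, fitted group means do not
print(coef(fit_a))
print(coef(fit_b))
print(all.equal(unname(fitted(fit_a)), unname(fitted(fit_b))))
```

Only the meaning of the intercept and of the offsets changes with the base level; the fitted group means, residuals and the overall F-test are unaffected.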
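With level 1 (water) as the base level:
$Y_{1j} = \mu_1 + \varepsilon_{1j}$ for water
$Y_{2j} = \mu_1 + \alpha_2 + \varepsilon_{2j}$ for nutrient
$Y_{3j} = \mu_1 + \alpha_3 + \varepsilon_{3j}$ for nutrient+24D
$$\begin{pmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \\ \mathbf{Y}_3 \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n & \mathbf{0}_n & \mathbf{0}_n \\ \mathbf{1}_n & \mathbf{1}_n & \mathbf{0}_n \\ \mathbf{1}_n & \mathbf{0}_n & \mathbf{1}_n \end{pmatrix} \begin{pmatrix} \mu_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} + \begin{pmatrix} \boldsymbol{\varepsilon}_1 \\ \boldsymbol{\varepsilon}_2 \\ \boldsymbol{\varepsilon}_3 \end{pmatrix}$$
With level 3 (nutrient+24D) as the base level:
$Y_{1j} = \mu_3 + \alpha_1 + \varepsilon_{1j}$
$Y_{2j} = \mu_3 + \alpha_2 + \varepsilon_{2j}$
$Y_{3j} = \mu_3 + \varepsilon_{3j}$
$$\begin{pmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \\ \mathbf{Y}_3 \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n & \mathbf{1}_n & \mathbf{0}_n \\ \mathbf{1}_n & \mathbf{0}_n & \mathbf{1}_n \\ \mathbf{1}_n & \mathbf{0}_n & \mathbf{0}_n \end{pmatrix} \begin{pmatrix} \mu_3 \\ \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \boldsymbol{\varepsilon}_1 \\ \boldsymbol{\varepsilon}_2 \\ \boldsymbol{\varepsilon}_3 \end{pmatrix}$$
Here $\mathbf{Y}_k = (Y_{k1}, \ldots, Y_{kn})^\top$, and $\mathbf{1}_n$, $\mathbf{0}_n$ are column vectors of $n$ ones and zeros.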
> summary(sout1)

Call:
lm(formula = weight ~ trt, data = tomato)

Residuals:
    Min      1Q  Median      3Q     Max
-0.5500 -0.3500 -0.1792  0.2750  1.1500

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.75000    0.20849   8.394 4.74e-07 ***
trtNutrient+24D -0.42500    0.29485  -1.441    0.170
trtwater        -0.06667    0.29485  -0.226    0.824

> summary(sout2)

Call:
lm(formula = weight ~ trt, data = tomato)

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.68333    0.20849   8.074 7.69e-07 ***
trtNutrient      0.06667    0.29485   0.226    0.824
trtNutrient+24D -0.35833    0.29485  -1.215    0.243

Residual standard error: 0.5107 on 15 degrees of freedom
Multiple R-squared: 0.1381, Adjusted R-squared: 0.02321
F-statistic: 1.202 on 2 and 15 DF, p-value: 0.328
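Everything is the same for sout1 and sout2 except the base levels.
There is no clear evidence of $\alpha_1 \neq 0$ or $\alpha_2 \neq 0$ (the offsets of the other two treatments from the base level).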
> anova(sout2)
Analysis of Variance Table

Response: weight
          Df Sum Sq Mean Sq F value Pr(>F)
trt        2 0.6269 0.31347  1.2019  0.328
Residuals 15 3.9121 0.26081
ANOVA table
Source              Sum of Sq.   df     MS     F          P-value
Treatment           SSTR         K-1    MSTR   MSTR/MSE   from F(K-1, N-K)
Error (Residuals)   SSE          N-K    MSE
Total               SST          N-1
SST = SSTR + SSE
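The entries of the one-way ANOVA table can be computed by hand. The following sketch does so for the tomato data; the variable names are illustrative, and the results can be compared with the `anova()` output shown for sout1 and sout2.

```r
weight <- c(1.5, 1.9, 1.3, 1.5, 2.4, 1.5,      # water
            1.5, 1.2, 1.2, 2.1, 2.9, 1.6,      # Nutrient
            1.9, 1.6, 0.8, 1.15, 0.9, 1.6)     # Nutrient+24D
trt <- factor(rep(c("water", "Nutrient", "Nutrient+24D"), each = 6))

N <- length(weight)            # total number of observations
K <- nlevels(trt)              # number of groups
grand <- mean(weight)
means <- tapply(weight, trt, mean)
n_k   <- tapply(weight, trt, length)

SSTR <- sum(n_k * (means - grand)^2)                  # between-group variation
SSE  <- sum((weight - means[as.character(trt)])^2)    # within-group variation
SST  <- sum((weight - grand)^2)                       # total variation

MSTR <- SSTR / (K - 1)
MSE  <- SSE / (N - K)
Fval <- MSTR / MSE
pval <- pf(Fval, K - 1, N - K, lower.tail = FALSE)    # upper tail of F(K-1, N-K)

print(c(SSTR = SSTR, SSE = SSE, SST = SST, F = Fval, p = pval))
```

The P-value is the upper-tail probability of the F(K-1, N-K) distribution at the observed F ratio.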
A company specializing in preparing students for college entrance exams had the business objective of improving its ACT preparatory course. Two factors of interest to the company are the length of the course (a condensed 10-day period or a regular 30-day period) and the type of course (traditional classroom or online distance learning). The company collected data by randomly assigning 10 clients to each of the 4 cells formed by the combinations of the two factors. What are the effects of the type of course and the length of the course on ACT scores?
ACT score data (artificial data)
                  Condensed (C)                            Regular (R)
Traditional (T)   26, 18, 27, 24, 25, 19, 21, 20, 21, 18   34, 28, 24, 21, 35, 23, 31, 29, 28, 26
Online (O)        27, 21, 29, 32, 30, 20, 24, 28, 30, 29   24, 21, 16, 19, 22, 19, 20, 24, 23, 25
> y <- c(26,18,34,28,27,24,24,21,25,19,35,23,21,20,31,29,21,18,28,26,27,21,
+        24,21,29,32,16,19,30,20,22,19,24,28,20,24,30,29,23,25)
> ltx <- c("C","C","R","R","C","C","R","R","C","C","R","R","C","C","R","R",
+          "C","C","R","R","C","C","R","R","C","C","R","R","C","C","R","R","C","C",
+          "R","R","C","C","R","R")
> tpx <- c("T","T","T","T","T","T","T","T","T","T","T","T","T","T","T","T",
+          "T","T","T","T","O","O","O","O","O","O","O","O","O","O","O","O","O","O",
+          "O","O","O","O","O","O")
> act <- data.frame(score=y, length=ltx, type=tpx)
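No interaction model for ACT data
$Y_{11j} = \mu_{11} + \varepsilon_{11j}$ for Condensed & Online
$Y_{12j} = \mu_{11} + \alpha_{12} + \varepsilon_{12j}$ for Condensed & Traditional
$Y_{21j} = \mu_{11} + \alpha_{21} + \varepsilon_{21j}$ for Regular & Online
$Y_{22j} = \mu_{11} + \alpha_{21} + \alpha_{12} + \varepsilon_{22j}$ for Regular & Traditional
$j = 1, 2, \ldots, 10 \quad (n = 10)$
When an interaction effect is assumed:
$Y_{22j} = \mu_{11} + \alpha_{21} + \alpha_{12} + \alpha_{22} + \varepsilon_{22j}$
$$\begin{pmatrix} \mathbf{Y}_{11} \\ \mathbf{Y}_{12} \\ \mathbf{Y}_{21} \\ \mathbf{Y}_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n & \mathbf{0}_n & \mathbf{0}_n & \mathbf{0}_n \\ \mathbf{1}_n & \mathbf{0}_n & \mathbf{1}_n & \mathbf{0}_n \\ \mathbf{1}_n & \mathbf{1}_n & \mathbf{0}_n & \mathbf{0}_n \\ \mathbf{1}_n & \mathbf{1}_n & \mathbf{1}_n & \mathbf{1}_n \end{pmatrix} \begin{pmatrix} \mu_{11} \\ \alpha_{21} \\ \alpha_{12} \\ \alpha_{22} \end{pmatrix} + \begin{pmatrix} \boldsymbol{\varepsilon}_{11} \\ \boldsymbol{\varepsilon}_{12} \\ \boldsymbol{\varepsilon}_{21} \\ \boldsymbol{\varepsilon}_{22} \end{pmatrix}$$
Here $\mathbf{Y}_{kl} = (Y_{kl1}, \ldots, Y_{kln})^\top$, and $\mathbf{1}_n$, $\mathbf{0}_n$ are column vectors of $n$ ones and zeros.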
> head(act)
  score length type
1    26      C    T
2    18      C    T
3    34      R    T
4    28      R    T
5    27      C    T
6    24      C    T
This model does not appear adequate.
> aout1 <- aov(score~length+type, data=act)
> summary(aout1)
            Df Sum Sq Mean Sq F value Pr(>F)
length       1   0.22   0.225  0.0098 0.9217
type         1   5.63   5.625  0.2448 0.6237
Residuals   37 850.13  22.976
> summary.lm(aout1)

Call:
aov(formula = score ~ length + type, data = act)

Residuals:
   Min     1Q Median     3Q    Max
-8.225 -3.862 -0.225  3.250 10.025

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   24.075      1.313  18.340   <2e-16 ***
lengthR        0.150      1.516   0.099    0.922
typeT          0.750      1.516   0.495    0.624
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.793 on 37 degrees of freedom
Multiple R-squared: 0.006834, Adjusted R-squared: -0.04685
F-statistic: 0.1273 on 2 and 37 DF, p-value: 0.8808
Two-way ANOVA without interaction model
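This result shows that the no-interaction model is not very meaningful for the ACT data. In this model, we may accept $\alpha_{21} = \alpha_{12} = 0$ (no lengthR and no typeT effect).
In the interaction model, all the effects look significant. That is, $\alpha_{21}$ and $\alpha_{12}$ (the lengthR and typeT effects) look clearly negative, and $\alpha_{22}$ (the interaction) is clearly positive.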
> aout2 <- aov(score~length*type, data=act)
> summary(aout2)
            Df Sum Sq Mean Sq F value    Pr(>F)
length       1   0.22    0.22  0.0159    0.9002
type         1   5.63    5.63  0.3987    0.5318
length:type  1 342.22  342.22 24.2569 1.888e-05 ***
Residuals   36 507.90   14.11
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary.lm(aout2)

Call:
aov(formula = score ~ length * type, data = act)

Residuals:
   Min     1Q Median     3Q    Max
-7.000 -2.450  0.100  2.775  7.100

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     27.000      1.188  22.731  < 2e-16 ***
lengthR         -5.700      1.680  -3.393  0.00169 **
typeT           -5.100      1.680  -3.036  0.00444 **
lengthR:typeT   11.700      2.376   4.925 1.89e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.756 on 36 degrees of freedom
Multiple R-squared: 0.4066, Adjusted R-squared: 0.3572
F-statistic: 8.224 on 3 and 36 DF, p-value: 0.0002686
Two-way ANOVA with interaction model
In R, length*type = length+type+length:type
In R, length:type means interaction effect between length and type.
This model looks well fitted.
Conclusion: For the traditional (T) type, the course of regular (R) length showed better scores; for the online (O) type, the course of condensed (C) length showed better scores.
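The conclusion can be read off the four cell means, which the interaction model reproduces exactly. The sketch below rebuilds the ACT data in a compact way (using `rep()` instead of the long literal vectors; the names `len`, `type`, `fit_int` and `cell` are illustrative) and rebuilds each cell mean from the coefficients.

```r
y <- c(26,18,34,28,27,24,24,21,25,19,35,23,21,20,31,29,21,18,28,26,
       27,21,24,21,29,32,16,19,30,20,22,19,24,28,20,24,30,29,23,25)
len  <- factor(rep(rep(c("C", "R"), each = 2), times = 10))  # C,C,R,R,...
type <- factor(rep(c("T", "O"), each = 20))                  # 20 T, then 20 O
act  <- data.frame(score = y, length = len, type = type)

fit_int <- lm(score ~ length * type, data = act)
b <- coef(fit_int)

# Observed cell means (rows: length, columns: type)
cell <- with(act, tapply(score, list(length, type), mean))
print(cell)

# Cell means rebuilt from the coefficients (base cell: Condensed & Online)
rebuilt <- c(
  C_O = b[["(Intercept)"]],
  R_O = b[["(Intercept)"]] + b[["lengthR"]],
  C_T = b[["(Intercept)"]] + b[["typeT"]],
  R_T = b[["(Intercept)"]] + b[["lengthR"]] + b[["typeT"]] + b[["lengthR:typeT"]]
)
print(rebuilt)
```

The large positive interaction term is what makes Regular & Traditional score highest even though both main-effect coefficients are negative.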
ACT score data
> with(act, interaction.plot(length, type,score, xlab="First factor"))
[Figure: interaction plot of the mean of score against the first factor, length (C, R), with one line per type (T, O).]
Decompositions of variations in two-way ANOVA
No interaction model: SST = SSTRA + SSTRB + SSE
Interaction model: SST = SSTRA + SSTRB + SSTRAB + SSE

ANOVA table (no interaction model)
Source              Sum of Sq.   df              MS       F            P-value
Treatment A         SSTRA        KA-1            MSTRA    MSTRA/MSE    from F(KA-1, N-KA-KB+1)
Treatment B         SSTRB        KB-1            MSTRB    MSTRB/MSE    from F(KB-1, N-KA-KB+1)
Error (Residuals)   SSE          N-KA-KB+1       MSE
Total               SST          N-1

ANOVA table (interaction model)
Source              Sum of Sq.   df              MS       F            P-value
Treatment A         SSTRA        KA-1            MSTRA    MSTRA/MSE    from F(KA-1, N-KAKB)
Treatment B         SSTRB        KB-1            MSTRB    MSTRB/MSE    from F(KB-1, N-KAKB)
Treatment A:B       SSTRAB       (KA-1)(KB-1)    MSTRAB   MSTRAB/MSE   from F((KA-1)(KB-1), N-KAKB)
Error (Residuals)   SSE          N-KAKB          MSE
Total               SST          N-1
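The two decompositions and the residual degrees of freedom can be checked numerically on any balanced two-factor dataset. The sketch below uses R's built-in `warpbreaks` data purely as a stand-in (it is not one of the datasets in these slides); factor A is `wool` and factor B is `tension`.

```r
d  <- warpbreaks
N  <- nrow(d)
KA <- nlevels(d$wool)
KB <- nlevels(d$tension)

tab_add <- anova(lm(breaks ~ wool + tension, data = d))   # no interaction
tab_int <- anova(lm(breaks ~ wool * tension, data = d))   # with interaction

SST <- sum((d$breaks - mean(d$breaks))^2)

# Sums of squares add up to SST in both models
print(c(SST = SST,
        additive    = sum(tab_add[["Sum Sq"]]),
        interaction = sum(tab_int[["Sum Sq"]])))

# Residual degrees of freedom: N-KA-KB+1 versus N-KA*KB
df_add <- tab_add["Residuals", "Df"]
df_int <- tab_int["Residuals", "Df"]
print(c(additive = df_add, interaction = df_int))
```

Adding the interaction moves (KA-1)(KB-1) degrees of freedom, and the matching sum of squares, from the error line to the A:B line.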
Yesterday, YD discovered the secret diary written by R. A. Fisher. R. A. Fisher made a note on his iris data in the diary. He mentioned that he collected the data in five days. On each day he took 10 irises of each of the 3 species (varieties) by picking them randomly from his garden, and measured the lengths of the sepals and petals of the selected 30 flowers. From the note in the diary, YD recovered a new variable, date, which records the day on which R. A. Fisher measured each flower, and added it to the iris data. The new dataset is named irix. The sizes of sepals and petals vary with conditions that change from day to day, such as temperature and humidity.
> dtx <- c(3,5,4,5,1,5,4,1,4,4,5,2,3,3,3,1,3,4,3,5,5,3,4,2,2,5,4,1,5,1,2,3,2,5,1,5,
+ 1,2,4,3,4,2,2,2,1,1,1,4,2,3,3,2,1,2,1,4,3,1,3,5,1,1,2,2,4,5,4,2,3,2,1,2,2,2,3,1,
+ 5,5,4,4,5,1,4,4,1,4,3,3,3,3,4,2,5,4,5,1,5,5,3,5,2,2,5,4,4,1,4,5,2,2,5,1,3,4,3,3,
+ 3,3,5,4,1,3,1,4,4,5,2,4,5,5,1,1,2,5,4,1,3,3,5,2,5,1,2,1,3,3,1,2,4,2)
> irix <- data.frame(iris, date=factor(dtx))
> names(irix) <- c("sl","sw","pl","pw","spc","date")
> head(irix)
   sl  sw  pl  pw    spc date
1 5.1 3.5 1.4 0.2 setosa    3
2 4.9 3.0 1.4 0.2 setosa    5
3 4.7 3.2 1.3 0.2 setosa    4
4 4.6 3.1 1.5 0.2 setosa    5
> aout1 <- aov(sl~spc, data=irix)
> summary(aout1)
             Df Sum Sq Mean Sq F value    Pr(>F)
spc           2 63.212  31.606  119.26 < 2.2e-16 ***
Residuals   147 38.956   0.265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> aout2 <- aov(sl~spc+date, data=irix)
> summary(aout2)
             Df Sum Sq Mean Sq  F value    Pr(>F)
spc           2 63.212  31.606 136.3884 < 2.2e-16 ***
date          4  5.818   1.455   6.2765 0.0001108 ***
Residuals   143 33.138   0.232
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
When the date variable is introduced into the model, the MSE is slightly lowered (from 0.265 to 0.232), which increases the F-value for the effect of the other factor (from 119.26 to 136.39 for spc). This means that, by eliminating the variation due to the date of the observations, inference for the effects of interest can be done more precisely. In this example, the date of observation is a kind of block.
Blocking to "remove" the effect of nuisance factors
For randomized block designs, there are factors or variables that are of primary interest. However, there are also several other nuisance factors. Nuisance factors are those that may affect the measured result, but are not of primary interest. For example, in applying a treatment, nuisance factors might be the specific operator who prepared the treatment, the time of day the experiment was run, and the room temperature. A nuisance factor is used as a blocking factor if every level of the primary factor occurs the same number of times with each level of the nuisance factor. The analysis of the experiment will focus on the effect of varying levels of the primary factor within each block of the experiment.
In the analysis of the irix data, the date variable is used as the blocking factor; each day of observation is a block. R. A. Fisher randomly selected the flowers on each day (in a block), while keeping the number of flowers of each species balanced (10 flowers for each species). This is an example of a (balanced) randomized block design.
Blocks of irix data
1st day | 2nd day | 3rd day | 4th day | 5th day
id sl sw pl pw spc | id sl sw pl pw spc | id sl sw pl pw spc | id sl sw pl pw spc | id sl sw pl pw spc
5 5 3.6 1.4 0.2 S | 12 4.8 3.4 1.6 0.2 S | 1 5.1 3.5 1.4 0.2 S | 3 4.7 3.2 1.3 0.2 S | 2 4.9 3 1.4 0.2 S
8 5 3.4 1.5 0.2 S | 24 5.1 3.3 1.7 0.5 S | 13 4.8 3 1.4 0.1 S | 7 4.6 3.4 1.4 0.3 S | 4 4.6 3.1 1.5 0.2 S
16 5.7 4.4 1.5 0.4 S | 25 4.8 3.4 1.9 0.2 S | 14 4.3 3 1.1 0.1 S | 9 4.4 2.9 1.4 0.2 S | 6 5.4 3.9 1.7 0.4 S
28 5.2 3.5 1.5 0.2 S | 31 4.8 3.1 1.6 0.2 S | 15 5.8 4 1.2 0.2 S | 10 4.9 3.1 1.5 0.1 S | 11 5.4 3.7 1.5 0.2 S
30 4.7 3.2 1.6 0.2 S | 33 5.2 4.1 1.5 0.1 S | 17 5.4 3.9 1.3 0.4 S | 18 5.1 3.5 1.4 0.3 S | 20 5.1 3.8 1.5 0.3 S
35 4.9 3.1 1.5 0.2 S | 38 4.9 3.6 1.4 0.1 S | 19 5.7 3.8 1.7 0.3 S | 23 4.6 3.6 1 0.2 S | 21 5.4 3.4 1.7 0.2 S
37 5.5 3.5 1.3 0.2 S | 42 4.5 2.3 1.3 0.3 S | 22 5.1 3.7 1.5 0.4 S | 27 5 3.4 1.6 0.4 S | 26 5 3 1.6 0.2 S
45 5.1 3.8 1.9 0.4 S | 43 4.4 3.2 1.3 0.2 S | 32 5.4 3.4 1.5 0.4 S | 39 4.4 3 1.3 0.2 S | 29 5.2 3.4 1.4 0.2 S
46 4.8 3 1.4 0.3 S | 44 5 3.5 1.6 0.6 S | 40 5.1 3.4 1.5 0.2 S | 41 5 3.5 1.3 0.3 S | 34 5.5 4.2 1.4 0.2 S
47 5.1 3.8 1.6 0.2 S | 49 5.3 3.7 1.5 0.2 S | 50 5 3.3 1.4 0.2 S | 48 4.6 3.2 1.4 0.2 S | 36 5 3.2 1.2 0.2 S
53 6.9 3.1 4.9 1.5 V | 52 6.4 3.2 4.5 1.5 V | 51 7 3.2 4.7 1.4 V | 56 5.7 2.8 4.5 1.3 V | 60 5.2 2.7 3.9 1.4 V
55 6.5 2.8 4.6 1.5 V | 54 5.5 2.3 4 1.3 V | 57 6.3 3.3 4.7 1.6 V | 65 5.6 2.9 3.6 1.3 V | 66 6.7 3.1 4.4 1.4 V
58 4.9 2.4 3.3 1 V | 63 6 2.2 4 1 V | 59 6.6 2.9 4.6 1.3 V | 67 5.6 3 4.5 1.5 V | 77 6.8 2.8 4.8 1.4 V
61 5 2 3.5 1 V | 64 6.1 2.9 4.7 1.4 V | 69 6.2 2.2 4.5 1.5 V | 79 6 2.9 4.5 1.5 V | 78 6.7 3 5 1.7 V
62 5.9 3 4.2 1.5 V | 68 5.8 2.7 4.1 1 V | 75 6.4 2.9 4.3 1.3 V | 80 5.7 2.6 3.5 1 V | 81 5.5 2.4 3.8 1.1 V
71 5.9 3.2 4.8 1.8 V | 70 5.6 2.5 3.9 1.1 V | 87 6.7 3.1 4.7 1.5 V | 83 5.8 2.7 3.9 1.2 V | 93 5.8 2.6 4 1.2 V
76 6.6 3 4.4 1.4 V | 72 6.1 2.8 4 1.3 V | 88 6.3 2.3 4.4 1.3 V | 84 6 2.7 5.1 1.6 V | 95 5.6 2.7 4.2 1.3 V
82 5.5 2.4 3.7 1 V | 73 6.3 2.5 4.9 1.5 V | 89 5.6 3 4.1 1.3 V | 86 6 3.4 4.5 1.6 V | 97 5.7 2.9 4.2 1.3 V
85 5.4 3 4.5 1.5 V | 74 6.1 2.8 4.7 1.2 V | 90 5.5 2.5 4 1.3 V | 91 5.5 2.6 4.4 1.2 V | 98 6.2 2.9 4.3 1.3 V
96 5.7 3 4.2 1.2 V | 92 6.1 3 4.6 1.4 V | 99 5.1 2.5 3 1.1 V | 94 5 2.3 3.3 1 V | 100 5.7 2.8 4.1 1.3 V
106 7.6 3 6.6 2.1 G | 101 6.3 3.3 6 2.5 G | 113 6.8 3 5.5 2.1 G | 104 6.3 2.9 5.6 1.8 G | 103 7.1 3 5.9 2.1 G
112 6.4 2.7 5.3 1.9 G | 102 5.8 2.7 5.1 1.9 G | 115 5.8 2.8 5.1 2.4 G | 105 6.5 3 5.8 2.2 G | 108 7.3 2.9 6.3 1.8 G
121 6.9 3.2 5.7 2.3 G | 109 6.7 2.5 5.8 1.8 G | 116 6.4 3.2 5.3 2.3 G | 107 4.9 2.5 4.5 1.7 G | 111 6.5 3.2 5.1 2 G
123 7.7 2.8 6.7 2 G | 110 7.2 3.6 6.1 2.5 G | 117 6.5 3 5.5 1.8 G | 114 5.7 2.5 5 2 G | 119 7.7 2.6 6.9 2.3 G
131 7.4 2.8 6.1 1.9 G | 127 6.2 2.8 4.8 1.8 G | 118 7.7 3.8 6.7 2.2 G | 120 6 2.2 5 1.5 G | 126 7.2 3.2 6 1.8 G
132 7.9 3.8 6.4 2 G | 133 6.4 2.8 5.6 2.2 G | 122 5.6 2.8 4.9 2 G | 124 6.3 2.7 4.9 1.8 G | 129 6.4 2.8 5.6 2.1 G
136 7.7 3 6.1 2.3 G | 140 6.9 3.1 5.4 2.1 G | 137 6.3 3.4 5.6 2.4 G | 125 6.7 3.3 5.7 2.1 G | 130 7.2 3 5.8 1.6 G
142 6.9 3.1 5.1 2.3 G | 143 5.8 2.7 5.1 1.9 G | 138 6.4 3.1 5.5 1.8 G | 128 6.1 3 4.9 1.8 G | 134 6.3 2.8 5.1 1.5 G
144 6.8 3.2 5.9 2.3 G | 148 6.5 3 5.2 2 G | 145 6.7 3.3 5.7 2.5 G | 135 6.1 2.6 5.6 1.4 G | 139 6 3 4.8 1.8 G
147 6.3 2.5 5 1.9 G | 150 5.9 3 5.1 1.8 G | 146 6.7 3 5.2 2.3 G | 149 6.2 3.4 5.4 2.3 G | 141 6.7 3.1 5.6 2.4 G
sl: Sepal.Length, sw: Sepal.Width, pl: Petal.Length, pw: Petal.Width, S: setosa, V: versicolor, G: virginica
In each block, the measurements are done in random order.
ID Control Treatment
1 0.7 1.9
2 -1.6 0.8
3 -0.2 1.1
4 -1.2 0.1
5 -0.1 -0.1
6 3.4 4.4
7 3.7 5.5
8 0.8 1.6
9 0.0 4.6
10 2.0 3.4
Student’s sleep data
> sleep
   extra group ID
1    0.7     1  1
2   -1.6     1  2
3   -0.2     1  3
4   -1.2     1  4
5   -0.1     1  5
6    3.4     1  6
7    3.7     1  7
8    0.8     1  8
9    0.0     1  9
10   2.0     1 10
11   1.9     2  1
12   0.8     2  2
13   1.1     2  3
14   0.1     2  4
15  -0.1     2  5
16   4.4     2  6
17   5.5     2  7
18   1.6     2  8
19   4.6     2  9
20   3.4     2 10
blocks
blocking factor
Student’s sleep data
> t.test(extra ~ group, paired=T, data = sleep)
Paired t-test
data: extra by group
t = -4.0621, df = 9, p-value = 0.002833
> summary(aov(extra~group+ID, data=sleep))
            Df Sum Sq Mean Sq F value   Pr(>F)
group        1 12.482  12.482 16.5009 0.002833 **
ID           9 58.078   6.453  8.5308 0.001901 **
Residuals    9  6.808   0.756
The paired-sample t-test can also be done by ANOVA, by assigning the subject variable (personal variation) as a blocking factor. Note that the p-value 0.002833 for the group effect in the ANOVA table is the same as that of the paired t-test.
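The equivalence can be verified numerically: the F statistic for group in the blocked ANOVA is the square of the paired t statistic, and the p-values agree. The sketch below uses the built-in `sleep` data. It calls `t.test()` with two vectors rather than the formula form with `paired=`, on the assumption that some recent R versions no longer accept `paired` in the formula interface; the names `x1`, `x2`, `tt` and `tab` are illustrative.

```r
# Rows of sleep are ordered by ID within group, so the two vectors are paired
x1 <- sleep$extra[sleep$group == 1]
x2 <- sleep$extra[sleep$group == 2]
tt <- t.test(x1, x2, paired = TRUE)

# Randomized block ANOVA with subject (ID) as the blocking factor
tab <- anova(lm(extra ~ group + ID, data = sleep))
F_group <- tab["group", "F value"]
p_group <- tab["group", "Pr(>F)"]

print(c(t_squared = unname(tt$statistic)^2, F = F_group))
print(c(p_ttest = tt$p.value, p_anova = p_group))
```

With only two levels of the primary factor, the F(1, 9) test and the two-sided t test on 9 degrees of freedom are the same test.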
Root dry mass and shoot dry mass of rice were recorded for two varieties, wild type (wt) and modified type (ANU843), and for the type of fertilizer (F10, NH4Cl, NH4NO3) used in cultivation. Two lots of fields (blocks) were used in raising the rice.
Rice data
The aim of this study is to see the effects of the varieties of rice and of the fertilizers.
ID  PlantNo  Block  RootDryMass  ShootDryMass  trt             fert    variety
1   1        1      56           132           F10             F10     wt
2   2        1      66           120           F10             F10     wt
…   …        …      …            …             …               …       …
11  11       2      44           37            F10             F10     wt
12  12       2      41           109           F10             F10     wt
13  1        1      12           45            NH4Cl           NH4Cl   wt
14  2        1      20           60            NH4Cl           NH4Cl   wt
…   …        …      …            …             …               …       …
23  11       2      13           55            NH4Cl           NH4Cl   wt
24  12       2      7            34            NH4Cl           NH4Cl   wt
25  1        1      12           71            NH4NO3          NH4NO3  wt
26  2        1      18           78            NH4NO3          NH4NO3  wt
…   …        …      …            …             …               …       …
35  11       2      11           51            NH4NO3          NH4NO3  wt
36  12       2      20           64            NH4NO3          NH4NO3  wt
37  1        1      6            8             F10 +ANU843     F10     ANU843
38  2        1      4            6             F10 +ANU843     F10     ANU843
…   …        …      …            …             …               …       …
47  11       2      12           15            F10 +ANU843     F10     ANU843
48  12       2      7            8             F10 +ANU843     F10     ANU843
49  1        1      4            22            NH4Cl +ANU843   NH4Cl   ANU843
50  2        1      10           36            NH4Cl +ANU843   NH4Cl   ANU843
…   …        …      …            …             …               …       …
59  11       2      8            59            NH4Cl +ANU843   NH4Cl   ANU843
60  12       2      14           61            NH4Cl +ANU843   NH4Cl   ANU843
61  1        1      19           75            NH4NO3 +ANU843  NH4NO3  ANU843
62  2        1      18           75            NH4NO3 +ANU843  NH4NO3  ANU843
…   …        …      …            …             …               …       …
71  11       2      7            47            NH4NO3 +ANU843  NH4NO3  ANU843
72  12       2      15           79            NH4NO3 +ANU843  NH4NO3  ANU843
> library(DAAG)
> rice
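Rice data
[Figure: dot plot of shoot dry mass (g) by treatment (F10, NH4Cl, NH4NO3, F10 +ANU843, NH4Cl +ANU843, NH4NO3 +ANU843).]
[Figure: interaction plot of the mean of ShootDryMass against the level of the first factor, fert (F10, NH4Cl, NH4NO3), with one line per variety (wt, ANU843).]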
> library(lattice)
> myfun <- function(x, y, ...){
+   panel.dotplot(x, y, pch=1, col="gray40")
+   panel.average(x, y, type="p", col="black", pch=3, cex=1.25)
+ }
> dotplot(trt ~ ShootDryMass, data=rice, panel=myfun, xlab="Shoot dry mass (g)")
> with(rice, interaction.plot(fert, variety, ShootDryMass, xlab="Level of first factor"))
> aout<- aov(ShootDryMass ~ Block + variety * fert, data=rice)
> summary(aout)
             Df Sum Sq Mean Sq F value    Pr(>F)
Block         1   3528  3528.0  10.902  0.001563 **
variety       1  22684 22684.5  70.100 6.366e-12 ***
fert          2   7019  3509.4  10.845 8.625e-05 ***
variety:fert  2  38622 19311.2  59.676 1.933e-15 ***
Residuals    65  21034   323.6
> summary.lm(aout)

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               129.333      8.211  15.752  < 2e-16 ***
Block                     -14.000      4.240  -3.302  0.00156 **
varietyANU843            -101.000      7.344 -13.753  < 2e-16 ***
fertNH4Cl                 -58.083      7.344  -7.909 4.24e-11 ***
fertNH4NO3                -35.000      7.344  -4.766 1.10e-05 ***
varietyANU843:fertNH4Cl    97.333     10.386   9.372 1.10e-13 ***
varietyANU843:fertNH4NO3   99.167     10.386   9.548 5.42e-14 ***

Residual standard error: 17.99 on 65 degrees of freedom
Multiple R-squared: 0.7736, Adjusted R-squared: 0.7527
F-statistic: 37.01 on 6 and 65 DF, p-value: < 2.2e-16
Two-way ANOVA with interaction and block effect
penetrometer
Apple firmness data
> apple <- c(6.8,7.3,7.2,7.3,7.4,7.3,6.8,7.6,7.2,6.5,7.7,7.7,7.4,7.0,7.2,7.6,6.7,6.7,7.2,6.8)
> data.frame(firmness=apple)
   firmness
1       6.8
2       7.3
3       7.2
4       7.3
5       7.4
6       7.3
7       6.8
8       7.6
9       7.2
10      6.5
11      7.7
12      7.7
13      7.4
14      7.0
15      7.2
16      7.6
17      6.7
18      6.7
19      7.2
20      6.8
Purpose : to check whether there is significant difference between the two testers
I
fruit       1     2     3     4     5
Tester A    7.05  7.25  7.35  7.20  6.85
Tester B    7.70  7.20  7.40  6.70  7.00

II
Tester  fruit  1    2    3    4    5
A       1st    6.8  7.2  7.4  6.8  7.2
        2nd    7.3  7.3  7.3  7.6  6.5
B       1st    7.7  7.4  7.2  6.7  7.2
        2nd    7.7  7.0  7.6  6.7  6.8

III
fruit     1     2     3     4     5     6     7     8     9     10
Tester    A     A     A     A     A     B     B     B     B     B
firmness  7.05  7.25  7.35  7.20  6.85  7.70  7.20  7.40  6.70  7.00

IV
fruit     1     2     3     4     5     6     7     8     9     10
Tester    A     A     A     A     A     B     B     B     B     B
1st       6.8   7.2   7.4   6.8   7.2   7.7   7.4   7.2   6.7   7.2
2nd       7.3   7.3   7.3   7.6   6.5   7.7   7.0   7.6   6.7   6.8
mean      7.05  7.25  7.35  7.20  6.85  7.70  7.20  7.40  6.70  7.00

Select 10 (or 5) apples randomly from a box of apples, and assign them to the testers randomly.
> apple <- c(6.8,7.3,7.2,7.3,7.4,7.3,6.8,7.6,7.2,6.5,7.7,7.7,7.4,7.0,7.2,7.6,6.7,6.7,7.2,6.8)
> apple.mean <- apply(matrix(apple,2,), 2, mean)
> tx1 <- tx3 <- rep(c("A","B"), each=5); tx2 <- tx4 <- rep(c("A","B"), each=10)
> fx1 <- factor(rep(1:5,2)); fx2 <- rep(fx1, each=2)
> fx3 <- factor(1:10); fx4 <- rep(fx3, each=2)
> apple1 <- data.frame(firmness=apple.mean, tester=tx1, fruit=fx1)
> apple2 <- data.frame(firmness=apple, tester=tx2, fruit=fx2)
> apple3 <- data.frame(firmness=apple.mean, tester=tx3, fruit=fx3)
> apple4 <- data.frame(firmness=apple, tester=tx4, fruit=fx4)
III (nested indexing)
i   j         1st   2nd   3rd   4th   5th
A   fruit     1     2     3     4     5
    firmness  7.05  7.25  7.35  7.20  6.85
B   fruit     6     7     8     9     10
    firmness  7.70  7.20  7.40  6.70  7.00

IV (nested indexing)
i   j         1st   2nd   3rd   4th   5th
A   fruit     1     2     3     4     5
    1st       6.8   7.2   7.4   6.8   7.2
    2nd       7.3   7.3   7.3   7.6   6.5
B   fruit     6     7     8     9     10
    1st       7.7   7.4   7.2   6.7   7.2
    2nd       7.7   7.0   7.6   6.7   6.8
Nested notation: j vs. i(j)
In Tables III and IV, 10 apples are tested, but they are indexed by a variable j that runs only from 1 to 5. The meaning of the j-th apple therefore changes with the value of the variable i. In this case we need to use the notation i(j) instead of simply j.
Random effect
Point of interests:
- Effects of the specific testers A and B on the measurement (O)
- Effects of specific apples (X)
Each measurement varies randomly
Each apple also has its effect on the measurement, but the apples tested are randomly selected ones.
Effects of apples are random (random effect).
Selecting randomly
Fixed effect and random effect
Points of interest
The 10 apples
A box of apples
An orchard of apples
There is (no) difference in the firmness of the 10 apples.
There is (no) difference in the firmness of apples in the box.
There is (no) difference in the firmness of apples in the orchard.
Conclusion
Tested 10 apples
Taking all apples
Selecting randomly
Fixed effect
Random effect
Table I :
> summary(aov(firmness~tester+fruit, data=apple1))
            Df Sum Sq Mean Sq F value Pr(>F)
tester       1  0.009 0.00900  0.1056 0.7615
fruit        4  0.391 0.09775  1.1466 0.4489
Residuals    4  0.341 0.08525
> summary(aov(firmness~tester+Error(fruit), data=apple1))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  0.391 0.09775

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.009 0.00900  0.1056 0.7615
Residuals  4  0.341 0.08525
> t.test(firmness~tester, data=apple1)

Welch Two Sample t-test

data: firmness by tester
t = -0.3136, df = 5.962, p-value = 0.7645

> t.test(firmness~tester, paired=T, data=apple1)

Paired t-test

data: firmness by tester
t = -0.3249, df = 4, p-value = 0.7615
Two-way ANOVA
Two-way ANOVA with a random effect
Paired sample t-test
Two sample t-test
(X)
(O)
(O)
( )
> summary(aov(firmness~tester+fruit, data=apple3))
            Df Sum Sq Mean Sq
tester       1  0.009  0.0090
fruit        8  0.732  0.0915
> summary(aout3 <- aov(firmness~tester+Error(fruit), data=apple3))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.009  0.0090  0.0984 0.7618
Residuals  8  0.732  0.0915
> summary(aov(firmness~tester, data=apple3))
            Df Sum Sq Mean Sq F value Pr(>F)
tester       1  0.009  0.0090  0.0984 0.7618
Residuals    8  0.732  0.0915
> coef(aout3)
(Intercept)
       7.17
fruit :
testerB
   0.06
Table III :
> t.test(firmness~tester, var.equal=T, data=apple3)
Two Sample t-test
t = -0.3136, df = 8, p-value = 0.7618
> t.test(firmness~tester, data=apple3)
Welch Two Sample t-test
t = -0.3136, df = 5.962, p-value = 0.7645
In the two-way ANOVA, the effects of fruit and the random errors cannot be separated, because each fruit is measured only once.
Declare the fruit effect to be random!
For table III, the two-sample t-test assuming equal variances is equivalent to the ANOVA
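A small numerical check of this equivalence for Table III: the one-way ANOVA F statistic equals the square of the equal-variance two-sample t statistic, with the same p-value. The sketch recomputes the fruit means from the raw readings; the names `apple_mean`, `tester`, `tt` and `tab` are chosen for illustration.

```r
apple <- c(6.8,7.3,7.2,7.3,7.4,7.3,6.8,7.6,7.2,6.5,
           7.7,7.7,7.4,7.0,7.2,7.6,6.7,6.7,7.2,6.8)
# Each column holds the two readings of one fruit; average them
apple_mean <- colMeans(matrix(apple, nrow = 2))
tester <- factor(rep(c("A", "B"), each = 5))   # fruits 1-5: A, fruits 6-10: B

# Two-sample t-test assuming equal variances
tt <- t.test(apple_mean[tester == "A"], apple_mean[tester == "B"],
             var.equal = TRUE)

# One-way ANOVA on the same 10 fruit means
tab <- anova(lm(apple_mean ~ tester))

print(c(t = unname(tt$statistic), t_squared = unname(tt$statistic)^2,
        F = tab["tester", "F value"]))
print(c(p_ttest = tt$p.value, p_anova = tab["tester", "Pr(>F)"]))
```

The Welch version of the t-test drops the equal-variance assumption, which is why its degrees of freedom and p-value differ slightly from the ANOVA.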
( )
> summary(aov(firmness~tester+fruit, data=apple2))
            Df Sum Sq Mean Sq F value Pr(>F)
tester       1  0.018 0.01800  0.1554 0.6994
fruit        4  0.782 0.19550  1.6874 0.2086
Residuals   14  1.622 0.11586
> summary(aout2 <- aov(firmness~tester+Error(fruit), data=apple2))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  0.782  0.1955

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.018 0.01800  0.1554 0.6994
Residuals 14  1.622 0.11586
Table II :
(O)
Check !
> coef(aout2)
> summary.lm(aout2$Within)
( )
> summary(aov(firmness~tester+Error(fruit), data=apple4))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.018   0.018  0.0984 0.7618
Residuals  8  1.464   0.183

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 10   0.94   0.094
> summary(aov(firmness~fruit, data=apple4))
            Df Sum Sq Mean Sq F value Pr(>F)
fruit        9  1.482 0.16467  1.7518 0.1975
Residuals   10  0.940 0.09400
Table IV :
Compare with the result for Table III:
> summary(aov(firmness~tester+Error(fruit), data=apple3))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.009  0.0090  0.0984 0.7618
Residuals  8  0.732  0.0915
Student’s sleep data
> t.test(extra ~ group, paired=T, data = sleep)
Paired t-test
data: extra by group
t = -4.0621, df = 9, p-value = 0.002833
> summary(aov(extra~group+ID, data=sleep))
            Df Sum Sq Mean Sq F value   Pr(>F)
group        1 12.482  12.482 16.5009 0.002833 **
ID           9 58.078   6.453  8.5308 0.001901 **
Residuals    9  6.808   0.756
> summary(aov(extra~group+Error(ID), data=sleep))

Error: ID
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9 58.078  6.4531

Error: Within
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 12.482 12.4820  16.501 0.002833 **
Residuals  9  6.808  0.7564
Here the block is declared random, but this makes no practical difference in such a simple example.
Rat data : Sokal, R. R., and Rohlf, F. J. (1995) Biometry. W. H. Freeman and Co., New York.
A: control, B: compound 217,C: compound 217+sugar
> install.packages("asbio")
> library(asbio)
> data(rat)
> ?rat
> ratx <- rat
> names(ratx) <- tolower(names(rat))
> ratx$treatment <- factor(ratx$treatment)
> ratx$rat <- factor(ratx$rat)
> ratx$liver <- factor(ratx$liver)
Design of the experiment:
Treatment A: Rat 1, Rat 2
Treatment B: Rat 3, Rat 4
Treatment C: Rat 5, Rat 6
Each rat (shown for Rat 1):
  Liver tissue 1: Glycogen reading 1, Glycogen reading 2
  Liver tissue 2: Glycogen reading 1, Glycogen reading 2
  Liver tissue 3: Glycogen reading 1, Glycogen reading 2
> gly <- c(131,130,131,125,136,142,150,148,140,143,160,150,157,145,154,142,147,153,
+          151,155,147,147,162,152,134,125,138,138,135,136, 138,140,139,138,134,127)
> trt <- rep(LETTERS[1:3], each=12)
> rx1 <- factor(rep(rep(1:2, each=6), 3))
> rx2 <- factor(rep(1:6, each=6))
> lx <- factor(rep(rep(1:3, each=2), 6))
> ratx <- data.frame(glycogen=gly, treatment=trt, rat=rx1, liver=lx)
> raty <- data.frame(glycogen=gly, treatment=trt, rat=rx2, liver=lx)
> ratx
   glycogen treatment rat liver
1       131         A   1     1
2       130         A   1     1
3       131         A   1     2
4       125         A   1     2
5       136         A   1     3
6       142         A   1     3
7       150         A   2     1
8       148         A   2     1
9       140         A   2     2
10      143         A   2     2
11      160         A   2     3
12      150         A   2     3
13      157         B   1     1
14      145         B   1     1
15      154         B   1     2
16      142         B   1     2
17      147         B   1     3
18      153         B   1     3
19      151         B   2     1
20      155         B   2     1
21      147         B   2     2
22      147         B   2     2
23      162         B   2     3
24      152         B   2     3
25      134         C   1     1
26      125         C   1     1
27      138         C   1     2
28      138         C   1     2
29      135         C   1     3
30      136         C   1     3
31      138         C   2     1
32      140         C   2     1
33      139         C   2     2
34      138         C   2     2
35      134         C   2     3
36      127         C   2     3
> raty
   glycogen treatment rat liver
1       131         A   1     1
2       130         A   1     1
3       131         A   1     2
4       125         A   1     2
5       136         A   1     3
6       142         A   1     3
7       150         A   2     1
8       148         A   2     1
9       140         A   2     2
10      143         A   2     2
11      160         A   2     3
12      150         A   2     3
13      157         B   3     1
14      145         B   3     1
15      154         B   3     2
16      142         B   3     2
17      147         B   3     3
18      153         B   3     3
19      151         B   4     1
20      155         B   4     1
21      147         B   4     2
22      147         B   4     2
23      162         B   4     3
24      152         B   4     3
25      134         C   5     1
26      125         C   5     1
27      138         C   5     2
28      138         C   5     2
29      135         C   5     3
30      136         C   5     3
31      138         C   6     1
32      140         C   6     1
33      139         C   6     2
34      138         C   6     2
35      134         C   6     3
36      127         C   6     3
The data provided by R are coded as in ratx, but the meaning of the data implies the coding of raty.
In ratx, the variable rat is nested in treatment.
Usage of aov for nested block structure (ratx data)
> summary(aov(glycogen~treatment+Error(rat/liver), data=ratx))

Error: rat
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  1 413.44  413.44

Error: rat:liver
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4 164.44  41.111

Error: Within
          Df Sum Sq Mean Sq F value    Pr(>F)
treatment  2 1557.6  778.78  18.251 8.437e-06 ***
Residuals 28 1194.8   42.67
             rat 1                           rat 2
             Liver 1   Liver 2   Liver 3     Liver 1   Liver 2   Liver 3
Treatment A  R1  R2    R3  R4    R5  R6      R7  R8    R9  R10   R11 R12
Treatment B  R13 R14   R15 R16   R17 R18     R19 R20   R21 R22   R23 R24
Treatment C  R25 R26   R27 R28   R29 R30     R31 R32   R33 R34   R35 R36
This case corresponds to an experiment in which the same two rats, rat 1 and rat 2, receive all three treatments. More precise inference would be possible in this case, if such an experiment were possible.
Usage of aov for nested block structure (raty data)
> summary(aov(glycogen~treatment+Error(rat/liver), data=raty))

Error: rat
          Df  Sum Sq Mean Sq F value Pr(>F)
treatment  2 1557.56  778.78   2.929 0.1971
Residuals  3  797.67  265.89

Error: rat:liver
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 12    594    49.5

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 18    381  21.167
Treatment    Rat     Liver 1   Liver 2   Liver 3
Treatment A  Rat 1   R1  R2    R3  R4    R5  R6
             Rat 2   R7  R8    R9  R10   R11 R12
Treatment B  Rat 3   R13 R14   R15 R16   R17 R18
             Rat 4   R19 R20   R21 R22   R23 R24
Treatment C  Rat 5   R25 R26   R27 R28   R29 R30
             Rat 6   R31 R32   R33 R34   R35 R36
The liver factor is also a nested factor. R recognizes that automatically, because the upper-level factor rat is a nested factor.
Actually, 18 pieces of liver tissue were taken. The sum of the df in the rat and rat:liver strata is 2+3+12 = 17, which is 18-1.
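In the raty analysis the treatment effect is tested against the rat-to-rat variation, not against the within-rat readings. For a balanced design like this one, the same F statistic can be obtained from a one-way ANOVA on the six rat means. The sketch below illustrates this; it is an alternative route shown for intuition, not the method used in the slides, and the names `rat_means` and `tab` are illustrative.

```r
gly <- c(131,130,131,125,136,142,150,148,140,143,160,150,
         157,145,154,142,147,153,151,155,147,147,162,152,
         134,125,138,138,135,136,138,140,139,138,134,127)
treatment <- factor(rep(LETTERS[1:3], each = 12))
rat <- factor(rep(1:6, each = 6))          # six distinct rats, as in raty
raty <- data.frame(glycogen = gly, treatment = treatment, rat = rat)

# One mean per rat (6 rows: two rats per treatment)
rat_means <- aggregate(glycogen ~ treatment + rat, data = raty, FUN = mean)
print(rat_means)

# One-way ANOVA on the rat means: treatment (2 df) against rats (3 df)
tab <- anova(lm(glycogen ~ treatment, data = rat_means))
print(tab)
```

With only 3 degrees of freedom for the rats, the treatment effect is far from significant, in contrast to the ratx analysis, where it was tested against 28 within-stratum degrees of freedom.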
Thank you !!