PLS205 2014 6.1 Lab 6 (Topic 9)
PLS205 Lab 6 February 13, 2014
Laboratory Topic 9
∙ A word about factorials
∙ Specifying interactions among factorial effects in SAS
∙ The relationship between factors and treatment
∙ Interpreting results of an experiment with a factorial treatment structure
∙ Visualizing simple and main effects
∙ Visualizing three-way interactions
∙ APPENDIX: The Almost Practically Complete Analysis of Example 6.1
∙ APPENDIX 2: Graphing in Excel
A word about factorials
A factorial is not an experimental design. Why? Because the term "factorial" merely describes the
structure of the treatment effects (i.e. the factors), not how they are randomized. Specifically, a factorial
treatment structure is one in which all levels of every factor are present in all possible combinations with
all the levels of every other factor in the experiment (i.e. the crossing of factors is complete and
orthogonal). It is this complete, orthogonal structure that allows an experimenter to gain insight into
interactions among factors. Seen in this way, it becomes clear that any of the true experimental designs
(i.e. randomization strategies) we have discussed so far (CRD, RCBD, Latin Square) can be factorials,
provided the treatments are structured correctly.
A factorial is a complete, orthogonal structure of treatment effects
intended to provide insight into their interactions.
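The difference between treatment structure and randomization can be sketched in SAS. The following is only an illustration and is not one of the lab files: the data set name Layout, the factors A (2 levels) and B (3 levels), the 3 replications, and the seed are all assumptions made for the example. The nested Do loops build the complete factorial treatment structure; the random sort then assigns the treatment combinations to pots as in a CRD.

```sas
* Hypothetical example: 2 x 3 factorial treatment structure, 3 reps, randomized as a CRD;
Data Layout;
  Call Streaminit(2014);          * arbitrary seed so the layout is reproducible;
  Do A = 1 to 2;                  * every level of A ...;
    Do B = 1 to 3;                * ... appears with every level of B;
      Do Rep = 1 to 3;
        u = Rand('Uniform');      * random key used only for shuffling;
        Output;
      End;
    End;
  End;
Run;

Proc Sort Data = Layout;          * shuffle the 18 experimental units;
  By u;
Run;

Data Layout;
  Set Layout;
  Pot = _N_;                      * pot number after randomization;
  Drop u;
Run;

Proc Print Data = Layout;
Run;
```

Replacing the single random sort with a separate randomization inside each block would turn the same factorial treatment structure into an RCBD.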
Specifying Interactions Among Factorial Effects in SAS
Specifications about designs with factorial treatment structures are entered through the Model statement
of the Proc GLM, and this syntax can assume one of two forms: Stars (*) or bars (|).
Stars are used to partition out specific interactions from the Treatment SS and are useful when certain
interactions must be used as error terms in custom F tests. Examples:
Model Resp = a b a*b;        specifies partitioning of SST into main effect A, main effect B, and interaction AxB
Model Resp = a a*b a*b*c;    specifies main effect A and interactions AxB and AxBxC
Bars are used as a nice shortcut to partition the Treatment SS into all possible combinations of the
included factors. On a standard PC keyboard, the bar symbol (|) is typed as Shift-\. Examples:
Model Resp = a|b is equivalent to Model Resp = a b a*b
Model Resp = a|b|c is equivalent to Model Resp = a b c a*b a*c b*c a*b*c
An additional nice trick to know is the use of "@" in factorial model statements. The "@" symbol in
conjunction with bars (|) allows you to specify all possible combinations of model factors up to a certain
level (e.g. two-way effects), saving you lots of typing. An example:
Model Resp = Block a|b|c@2 is equivalent to Model Resp = Block a b c a*b a*c b*c
(Notice that this excludes the three-way effect a*b*c.)
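The equivalence of the two notations can be checked by running both forms on the same data. The sketch below uses a hypothetical, simulated data set (Demo, with 3 blocks and a 2x2x2 factorial); the data set name, the simulated response, and the seed are assumptions made only for this illustration.

```sas
* Hypothetical data: 3 blocks x (2 x 2 x 2) factorial with a simulated response;
Data Demo;
  Call Streaminit(205);
  Do Block = 1 to 3;
    Do a = 1 to 2;
      Do b = 1 to 2;
        Do c = 1 to 2;
          Resp = 10 + a + 2*b + Rand('Normal');
          Output;
        End;
      End;
    End;
  End;
Run;

Proc GLM Data = Demo;
  Class Block a b c;
  Model Resp = Block a|b|c@2;             * bar notation, stopping at two-way effects;
Run; Quit;

Proc GLM Data = Demo;
  Class Block a b c;
  Model Resp = Block a b c a*b a*c b*c;   * the same model written out with stars;
Run; Quit;
```

Both runs fit the same model. SAS lists the expanded bar terms in the order a b a*b c a*c b*c, so the rows of the ANOVA table may appear in a different order than in the star version.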
The Relationship Between Factors and Treatment
Until now, we have had only a single 'treatment' (the effect of which we are trying to understand) with
zero (CRD), one (RCBD), or two (LS) blocking variables (the effects of which we are trying to account
for but not really investigate). With factorials, we now have two or more 'factors' that are experimentally
equivalent to the single 'treatment' variable from the first half of the course. To illustrate this equivalence,
reconsider Example 1 from Lab 3 (Topics 4-5): An experiment with 6 treatments (L08, L12, L16, H08,
H12, H16), where L/H refers to Low/High temperatures and 8/12/16 refers to hours of light. This is
exactly equivalent to having temperature as one factor and light as another, organized as a factorial:
Model Growth = Treatment;                 df Treatment  = 5
Model Growth = Temp Light Temp*Light;     df Temp       = 1
                                          df Light      = 2
                                          df Temp*Light = 2
                                          Sum           = 5
What this is meant to show is that the old classification variable ‘Treatment’ is simply a combination of
two factors (light and temperature). Rewriting the model in terms of the factors does not affect the Model
df at all; it simply expands the class variable ‘Treatment’ into ‘Temp Light Temp*Light’. Before, we
accomplished this ‘opening up’ of the treatment through orthogonal contrasts. The insights gained
through each approach are equivalent.
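A minimal sketch of this equivalence follows. The Lab 3 data are not reproduced in this handout, so the response here is simulated; the data set name Growth, the 4 replications, and the seed are assumptions. The factors Temp and Light are read directly from the treatment codes.

```sas
* Hypothetical data: the 6 treatment codes from the text with a simulated response;
Data Growth;
  Call Streaminit(9);
  Length Treatment $3;
  Do Treatment = 'L08', 'L12', 'L16', 'H08', 'H12', 'H16';
    Temp  = Substr(Treatment, 1, 1);   * L or H;
    Light = Substr(Treatment, 2, 2);   * 08, 12 or 16;
    Do Rep = 1 to 4;
      Growth = 20 + Rand('Normal');
      Output;
    End;
  End;
Run;

Proc GLM Data = Growth;                * one treatment variable with 5 df;
  Class Treatment;
  Model Growth = Treatment;
Run; Quit;

Proc GLM Data = Growth;                * same 5 df split into 1 + 2 + 2;
  Class Temp Light;
  Model Growth = Temp Light Temp*Light;
Run; Quit;
```

Both models span the same six treatment means, so the Model SS and Model df (5) in the two outputs should agree; the second run only partitions them.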
Example 6.1 Two-Way ANOVA with interactions [Lab6ex1.sas]
In a study comparing the relative growth of five varieties of turfgrass (VARIETY) in three experimental
soil mixtures (SOIL), six pots were prepared with each VARIETY-SOIL combination. The 90 pots were
randomly allocated to six growth chambers (BLOCKS) and the dry matter yields were measured by
clipping the plants at the end of four weeks. In this experiment, the researchers are interested only in
these five varieties and three soil mixtures; so VARIETY and SOIL can be regarded as fixed factors.
Data RCBDFactorial;
Do Soil = 1 to 3;
Do Variety = 1 to 5;
Do Block = 1 to 6;
Input Yield @@;
Output;
End;
End;
End;
Cards;
22.1 24.1 19.1 22.1 25.1 18.1
27.1 15.1 20.6 28.6 15.1 24.6
22.3 25.8 22.8 28.3 21.3 18.3
19.8 28.3 26.8 27.3 26.8 26.8
20.0 17.0 24.0 22.5 28.0 22.5
13.5 14.5 11.5 6.0 27.0 18.0
16.9 17.4 10.4 19.4 11.9 15.4
15.7 10.2 16.7 19.7 18.2 12.2
15.1 6.5 17.1 7.6 13.6 21.1
21.8 22.8 18.8 21.3 16.3 14.3
19.0 22.0 20.0 14.5 19.0 16.0
20.0 22.0 25.5 16.5 18.0 17.5
16.4 14.4 21.4 19.9 10.4 21.4
24.5 16.0 11.0 7.5 14.5 15.5
11.8 14.3 21.3 6.3 7.8 13.8
;
Proc GLM Data = RCBDFactorial;
Class Soil Variety Block;
Model Yield = Soil|Variety Block; * This Model includes all main effects
as well as the Soil*Variety interaction;
Proc GLM Data = RCBDFactorial;
Class Soil Variety Block;
Model Yield = Soil|Variety|Block@2; * Exploratory model to examine the
two-way block interactions (see discussion below);
Run;
Quit;
NOTE: This initial analysis enables us to see if the interaction is significant and decide:
Main or simple effects?
Take a look at the resultant ANOVA table:
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 19 1361.153778 71.639673 3.45 <.0001
Error 70 1451.637778 20.737683
Corrected Total 89 2812.791556
R-Square Coeff Var Root MSE Yield Mean
0.483916 24.69855 4.553865 18.43778
Source DF Type III SS Mean Square F Value Pr > F
Soil 2 953.1562222 476.5781111 22.98 <.0001
Variety 4 11.3804444 2.8451111 0.14 0.9680
Soil*Variety 8 374.4882222 46.8110278 2.26 0.0330
Block 5 22.1288889 4.4257778 0.21 0.9557
This is an RCBD with 6 blocks. Even though there are six replications per Soil-Variety combination
(which allows us to include their interaction in the model), there is only one replication per
Soil-Variety-Block combination. The upshot of this is that the Block*Factor interactions are inside the
experimental error for this ANOVA. In other words, if the model statement had been:
Model Yield = Soil|Variety|Block;
there would have been no variation left to estimate the error (dfe = 0), because the block interactions
would use up all 70 error df:
Block * Soil           = 10 df
Block * Variety        = 20 df
Block * Soil * Variety = 40 df
Total                  = 70 df = dfe
We exclude the two-way Block interactions (Block*Soil and Block*Variety) from the model because, in
general, we don't care about them (remember, we block to reduce the error term, not to gain
understanding of the effect of blocking). In other words, this is a choice we make. Excluding the
three-way Block*Soil*Variety interaction is not a choice, however; it cannot be a part of the model
because it is the only term we have for our error. Of course, we still want to check the Block*Treatment
interactions to see if they are significant. If they are not significant, it is justifiable to relegate
these interactions to the error term. If they are significant, you can attempt a transformation or be
aware that they will contribute to a larger MSE when taken out of the model. To test the
Block*Treatment interactions, simply place them into an exploratory model (the second Proc GLM above):
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 49 2040.397111 41.640757 2.16 0.0069
Error 40 772.394444 19.309861
Corrected Total 89 2812.791556
R-Square Coeff Var Root MSE Yield Mean
0.725399 23.83313 4.394299 18.43778
Source DF Type III SS Mean Square F Value Pr > F
Soil 2 953.1562222 476.5781111 24.68 <.0001
Variety 4 11.3804444 2.8451111 0.15 0.9631
Soil*Variety 8 374.4882222 46.8110278 2.42 0.0308
Block 5 22.1288889 4.4257778 0.23 0.9476
Soil*Block 10 242.2944444 24.2294444 1.25 0.2881 NS
Variety*Block 20 436.9488889 21.8474444 1.13 0.3589 NS
A little side note about what interactions to include in your model
Although one can do exploratory work with different interactions in the model and then merge
the Block*Treatment into the error, you should always keep the treatment interactions in the
model, whether significant or not. This makes it very clear to the reader the status of the
interaction and saves you from having to do a lot of explaining. Sometimes in higher order
factorials (e.g. four factors) the higher order interactions (e.g. 4-way interactions) are excluded
from the model if they are not significant to simplify the model.
Interpreting Results of an Experiment with a Factorial Treatment Structure
The above ANOVA results indicate that there are significant differences among soil mixtures but not
among varieties. More importantly, however, it shows that the interaction between these two factors is
significant (i.e. the effects of soil are different for the different varieties, and vice versa).
Because the interaction is significant, it is not appropriate to analyze the main effects.
One must compare the soil means separately for each variety (simple effects).
Example 6.1b [Lab6ex1b.sas]
Proc Sort Data = RCBDFactorial;   * To analyze simple effects, first sort by one of the factors (here, Variety);
By Variety;
Proc GLM Data = RCBDFactorial;    * Then run a separate ANOVA for each level of that factor;
Class Soil Block;
Model Yield = Soil Block;
Means Soil / Tukey;
By Variety;
Run; Quit;
The above code tells SAS to generate five different ANOVAs, one for each variety. The results:
Variety   Soil (p)      Block (p)    MSD    Tukey groupings
1         0.0519 NS     0.1822 NS    6.45   1=3, 3=2
2         0.0746 NS     0.5530 NS    7.15   1=3=2
3         0.0130 **     0.3708 NS    5.90   1=3, 3=2
4         0.0041 ***    0.6843 NS    8.38   1, 3=2
5         0.0144 **     0.8428 NS    7.50   1=2, 2=3
By investigating the simple effects, we see that only some varieties are significantly affected
by the soil mixture. The MSE and means separation tests vary across varieties.
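As a cross-check on the By-group analysis above, SAS can also test the effect of Soil within each Variety from the full factorial model. This is an alternative approach, not the method used in this lab, and it assumes the RCBDFactorial data set from Example 6.1 has already been created. It relies on the Slice option of the LSMeans statement in Proc GLM.

```sas
* Alternative sketch: simple effects of Soil tested from the full model;
Proc GLM Data = RCBDFactorial;
  Class Soil Variety Block;
  Model Yield = Soil|Variety Block;
  LSMeans Soil*Variety / Slice = Variety;   * one F test of Soil per Variety;
Run; Quit;
```

The sliced tests use the pooled error from the full model (70 df) rather than a separate error term for each variety, so their p-values need not match those in the table above.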
Visualizing Simple and Main Effects
Example 6.1c
proc gplot data=rcbdfactorial ;
** Main effect plots **;
axis1 offset=(5 pct,5 pct);
axis2 offset = (5 pct,5 pct);
symbol1 i=std1mtj v=none color=BLUE;
plot Yield * Soil = 1 /
description="Means plot of Yield by Soil";
run;
axis1 offset=(5 pct,5 pct);
axis2 offset = (5 pct,5 pct);
symbol1 i=std1mtj v=none color=RED;
plot Yield * Variety = 1 /
description="Means plot of Yield by Variety";
run;
** Two-way Plots **;
axis1 offset=(5 pct,5 pct);
axis2 offset = (5 pct,5 pct);
symbol1 i=std1mtj v=none color=BLUE;
symbol2 i=std1mtj v=none color=BLACK;
symbol3 i=std1mtj v=none color=GREEN;
symbol4 i=std1mtj v=none color=ORANGE;
symbol5 i=std1mtj v=none color=RED;
plot Yield * Soil = Variety /
description="Means plot of Yield by Soil and Variety";
run;
axis1 offset=(5 pct,5 pct);
axis2 offset = (5 pct,5 pct);
symbol1 i=std1mtj v=none color=BLUE;
symbol2 i=std1mtj v=none color=BLACK;
symbol3 i=std1mtj v=none color=GREEN;
plot Yield * Variety = Soil /
description="Means plot of Yield by Variety and Soil";
run;
quit;
The symbol statement option `i=std1mtj' determines various details in the plot:
For each mean, an interval of one standard error of the mean (std1 combined with m) is shown to
either side of the mean. Each interval has a top and bottom line (t), and the means are joined (j).
`color' is for color (the options include black, red, blue, green, cyan, gold).
`v' determines the symbol used for the individual observations; in this example (v=none) the individual
observations are not shown in the figure.
Don’t get too worried about the code! It is simply connecting the means and drawing standard
error bars.
Remember that standard error bars indicate how precisely each mean is estimated; they do not show the
spread of the individual observations about the mean.
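The numbers behind the two-way plots can be listed directly. The sketch below assumes the RCBDFactorial data set from Example 6.1 and uses Proc Means to print the number of observations, the mean, and the standard error of the mean for each Soil-Variety combination.

```sas
* Cell means and standard errors for each Soil-Variety combination;
Proc Means Data = RCBDFactorial N Mean StdErr MaxDec = 2;
  Class Soil Variety;
  Var Yield;
Run;
```

Each cell contains six observations (one per block). For Soil 1 with Variety 1, for example, the mean is about 21.8 and the standard error about 1.1, which agrees with the tables in Appendix 2.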
Plots of the main effects
These plots show the main effects of soil and variety on yield. When the interaction is NS, the
implication is that each factor affects the response variable independently of the other, so consideration
of the main effects alone would be sufficient. [NOTE: Of course, this is not the case in this example.]
The interaction plots
The non-parallel nature of the lines in this interaction plot demonstrates visually the significant interaction
we found in the ANOVA.
DON’T FORGET: As always, we need to test assumptions in these tests. In this particular
example, there are eight different ANOVAs (one comparing varieties within each of the three soil
mixtures and one comparing soil mixtures within each of the five varieties), the assumptions of each
of which must be met. See the appendix at the end of this lab for the full, un-cut procedure.
Example 6.2 Three-Way ANOVA with one replication [Lab6ex2.sas]
The following is the code for a generic CRD with a 3x5x2 factorial treatment structure:
Data ThreeFact;
Input a b c resp @@;
Cards;
1 1 1 61 2 1 1 38 3 1 1 81 1 1 2 31 2 1 2 27 3 1 2 113
1 2 1 39 2 2 1 61 3 2 1 49 1 2 2 68 2 2 2 103 3 2 2 143
1 3 1 121 2 3 1 82 3 3 1 41 1 3 2 78 2 3 2 57 3 3 2 63
1 4 1 79 2 4 1 68 3 4 1 59 1 4 2 122 2 4 2 127 3 4 2 167
1 5 1 91 2 5 1 31 3 5 1 61 1 5 2 92 2 5 2 43 3 5 2 128
;
Proc GLM Data = ThreeFact;
Class a b c;
Model Resp = a|b|c;
Run;
Quit;
Running the program like this will make you sad because there are zero degrees of freedom for the error
term and thus no estimation of the error SS. The result? A bunch of dots. The solution to this problem is
to assume that there is no three-way interaction, allowing us to then use the three-way interaction as an
estimate of the experimental error. To do this, modify the model statement above as follows:
Model Resp = a|b|c@2;
and re-run the program. The results:
Source DF Type III SS Mean Square F Value Pr > F
a 2 3599.266667 1799.633333 620.56 <.0001 ***
b 4 6423.133333 1605.783333 553.72 <.0001 ***
c 1 5333.333333 5333.333333 1839.08 <.0001 ***
a*b 8 9675.066667 1209.383333 417.03 <.0001 ***
a*c 2 5692.466667 2846.233333 981.46 <.0001 ***
b*c 4 7987.000000 1996.750000 688.53 <.0001 ***
You should also be able to determine which assumptions to test here and how to test them.
Visualizing Three-Way Interactions
Can't we do better than just assume a three-way interaction to be NS? What is a three-way interaction
anyway? Though words may only confuse the issue here, one way to think about it might be:
A three-way interaction exists if the character of the interaction between two factors differs
among the different levels of a third factor.
Difficult to articulate but easy to visualize. Walk through the following steps to see how one can cleverly
visualize three-way interactions in a two-dimensional plot:
1. Open the Word file ThreeWayInteraction.doc. Familiarize yourself with how the new
dependent variable C1-C2 was created:
A B C C1-C2 Resp
1 1 1 30 61
1 1 2 30 31
The new variable (C1-C2) is simply the effect of C1 relative to C2 for any given combination
of levels of Factors A and B. [Side note: If C had three levels (C1, C2, C3) instead of just
two, the procedure outlined here would have to be carried out for three new variables (C1-C2,
C1-C3, and C2-C3) instead of just one.]
2. Set up your graph with C1-C2 as the DEPENDENT variable (Y-axis) and A and B as the
CLASS variables (A on the X-axis, and B as the “group” variable). See the example below.
The C1-C2 variable replaces the response variable as the dependent variable.
       a1     a2     a3
b1     30     11    -32
b2    -29    -42    -94
b3     43     25    -22
b4    -43    -59   -108
b5     -1    -12    -67
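The output (it’s like seeing in four dimensions!)
[Figure: interaction plot of C1-C2 (Y-axis, from -120 to 60) against a1, a2, a3 (X-axis), with one line for each level of B (b1 to b5).]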
One way to think about this: Each line represents one level of B, and the average of each line
represents the effect of C for each level of B. While these averages differ among lines (i.e. B*C is
significant), their differences are fairly constant across all levels of A.
In other words, the roughly parallel nature of the lines in this interaction plot shows us that the
differences in the effects of C at the different levels of B do not vary significantly across the levels of A.
[Translation: No significant three-way interaction, so we are justified in using A*B*C as our error term.]
phew!
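The C1-C2 variable and the plot described in the steps above can also be produced within SAS rather than by hand. The following is a sketch under stated assumptions: it assumes the ThreeFact data set from Example 6.2 exists, and the names Sorted, Wide, Diff, and C1_C2 are introduced here only for illustration.

```sas
* Put the two levels of C side by side for each A-B combination;
Proc Sort Data = ThreeFact Out = Sorted;
  By a b c;
Run;

Proc Transpose Data = Sorted Out = Wide (Drop = _Name_) Prefix = C;
  By a b;
  ID c;                      * creates the columns C1 and C2;
  Var resp;
Run;

Data Diff;
  Set Wide;
  C1_C2 = C1 - C2;           * effect of C1 relative to C2;
Run;

Proc Print Data = Diff;
Run;

* One line per level of B, plotted across the levels of A;
Proc GPlot Data = Diff;
  Symbol1 i=join v=dot color=BLUE;
  Symbol2 i=join v=dot color=BLACK;
  Symbol3 i=join v=dot color=GREEN;
  Symbol4 i=join v=dot color=ORANGE;
  Symbol5 i=join v=dot color=RED;
  Plot C1_C2 * a = b;
Run; Quit;
```

The printed values should match the table in step 2 (for example, 30 for a1 with b1 and -108 for a3 with b4).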
APPENDIX: The Almost Practically Complete Analysis for Example 6.1
Step 1: Decide if you need to analyze simple effects
Data RCBDFactorial;
Do Soil = 1 to 3;
Do Variety = 1 to 5;
Do Block = 1 to 6;
Input Yield @@;
Output;
End;
End;
End;
Cards;
22.1 24.1 19.1 22.1 25.1 18.1
27.1 15.1 20.6 28.6 15.1 24.6
22.3 25.8 22.8 28.3 21.3 18.3
19.8 28.3 26.8 27.3 26.8 26.8
20.0 17.0 24.0 22.5 28.0 22.5
13.5 14.5 11.5 6.0 27.0 18.0
16.9 17.4 10.4 19.4 11.9 15.4
15.7 10.2 16.7 19.7 18.2 12.2
15.1 6.5 17.1 7.6 13.6 21.1
21.8 22.8 18.8 21.3 16.3 14.3
19.0 22.0 20.0 14.5 19.0 16.0
20.0 22.0 25.5 16.5 18.0 17.5
16.4 14.4 21.4 19.9 10.4 21.4
24.5 16.0 11.0 7.5 14.5 15.5
11.8 14.3 21.3 6.3 7.8 13.8
;
Proc GLM Data = RCBDFactorial;
Class Soil Variety Block;
Model Yield = Soil|Variety Block;
Proc GLM Data = RCBDFactorial;
Class Soil Variety Block;
Model Yield = Soil|Variety|Block@2;
Run;
Quit;
Notice there are 2 Proc GLM's in this code. The first features the model we're interested in, and we run it
to see if there is a significant Soil*Variety interaction (i.e. to see if we should analyze main or simple
effects). The second is what we call an "exploratory model" to check the significance of the two-way
block interactions. The output:
First Proc GLM
Source DF Type III SS Mean Square F Value Pr > F
Soil 2 953.1562222 476.5781111 22.98 <.0001 ***
Variety 4 11.3804444 2.8451111 0.14 0.9680
Soil*Variety 8 374.4882222 46.8110278 2.26 0.0330 *
Block 5 22.1288889 4.4257778 0.21 0.9557
There is a significant Soil*Variety interaction, so we must look at simple effects.
Second Proc GLM
Source DF Type III SS Mean Square F Value Pr > F
Soil*Block 10 242.2944444 24.2294444 1.25 0.2881 NS
Variety*Block 20 436.9488889 21.8474444 1.13 0.3589 NS
Neither 2-way block interaction is significant, so we're justified in merging them into the error (and
gaining 30 df by doing so).
Step 2: Analyze the simple effect of Soil (i.e. for each Variety separately)
Data RCBDFactorial;
Do Soil = 1 to 3;
Do Variety = 1 to 5;
Do Block = 1 to 6;
Input Yield @@;
Output;
End;
End;
End;
Cards;
22.1 24.1 19.1 22.1 25.1 18.1
27.1 15.1 20.6 28.6 15.1 24.6
22.3 25.8 22.8 28.3 21.3 18.3
19.8 28.3 26.8 27.3 26.8 26.8
20.0 17.0 24.0 22.5 28.0 22.5
13.5 14.5 11.5 6.0 27.0 18.0
16.9 17.4 10.4 19.4 11.9 15.4
15.7 10.2 16.7 19.7 18.2 12.2
15.1 6.5 17.1 7.6 13.6 21.1
21.8 22.8 18.8 21.3 16.3 14.3
19.0 22.0 20.0 14.5 19.0 16.0
20.0 22.0 25.5 16.5 18.0 17.5
16.4 14.4 21.4 19.9 10.4 21.4
24.5 16.0 11.0 7.5 14.5 15.5
11.8 14.3 21.3 6.3 7.8 13.8
;
Proc Sort Data = RCBDFactorial;
By Variety;
Proc GLM Data = RCBDFactorial;
Class Soil Block;
Model Yield = Soil Block;
Means Soil / Tukey;
By Variety;
Output Out = PR r = res p = pred;
Proc Print Data = PR;
Proc Univariate normal data = PR;
Var res;
By Variety;
Proc GLM data = RCBDFactorial;
Class Soil;
Model Yield = Soil;
Means Soil / hovtest = Levene;
By Variety;
Proc GLM Data = PR;
Class Soil Block;
Model Yield = Soil Block pred*pred;
By Variety;
Proc Plot Data = PR;
Plot res*pred;
By Variety;
Run;
Quit;
The first Proc GLM carries out five separate ANOVAs, one for each Variety; it also generates predicted
and residual values. The Proc Univariate tests for normality of residuals within each ANOVA. The
second Proc GLM conducts Levene's Tests for Soil within each level of Variety. And the last Proc GLM
tests for nonadditivity within each of the five models. The output is extensive but can be organized as
shown below:
Normality of residuals (Variety 1 – Variety 5)
Variety   Test           --Statistic---    -----p Value------
1         Shapiro-Wilk   W   0.962536      Pr < W   0.6512
2         Shapiro-Wilk   W   0.954449      Pr < W   0.4990
3         Shapiro-Wilk   W   0.954754      Pr < W   0.5043
4         Shapiro-Wilk   W   0.991391      Pr < W   0.9996
5         Shapiro-Wilk   W   0.945644      Pr < W   0.3608
Homogeneity of variances (Variety 1 – Variety 5)
Levene's Test for Homogeneity of Yield Variance
ANOVA of Squared Deviations from Group Means
                        Sum of       Mean
Variety   Source   DF   Squares      Square     F Value   Pr > F
1         Soil     2    4967.3       2483.7     2.16      0.1502
2         Soil     2    1479.1       739.5      3.59      0.0532
3         Soil     2    128.7        64.3531    0.37      0.6985
4         Soil     2    1425.9       713.0      0.93      0.4157
5         Soil     2    734.1        367.1      0.93      0.4161
Nonadditivity (Variety 1 – Variety 5)
Variety   Source      DF   Type I SS      Mean Square    F Value   Pr > F
1         pred*pred   1    53.2354203     53.2354203     4.25      0.0693
2         pred*pred   1    8.3170588      8.3170588      0.38      0.5517
3         pred*pred   1    0.1770860      0.1770860      0.01      0.9171
4         pred*pred   1    105.6521057    105.6521057    5.44      0.0446
5         pred*pred   1    56.8633800     56.8633800     3.05      0.1148
The only assumption we violate is Nonadditivity within Variety 4 (though a few others are close). At this
point, you could try to transform the data for Variety 4 to bring that subset of your data into alignment
with the ANOVA assumptions. But since you're already able to detect differences among soils within
Variety 4 (see summary of Tukey separations below), you may decide that transforming is not worth it.
Variety   Soil (p)      Block (p)    MSD    Tukey groupings
1         0.0519 NS     0.1822 NS    6.45   1=3, 3=2
2         0.0746 NS     0.5530 NS    7.15   1=3=2
3         0.0130 **     0.3708 NS    5.90   1=3, 3=2
4         0.0041 ***    0.6843 NS    8.38   1, 3=2
5         0.0144 **     0.8428 NS    7.50   1=2, 2=3
To be truly comprehensive in our analysis, we should also analyze the differences among varieties for
each of the soils. To do this, simply sort by Soil instead of Variety and replace all the "by Variety"
commands with "by Soil" commands in the code; other changes are necessary in the class and model
statements, resulting in a final code like the one below:
Data RCBDFactorial;
Do Soil = 1 to 3;
Do Variety = 1 to 5;
Do Block = 1 to 6;
Input Yield @@;
Output;
End;
End;
End;
Cards;
22.1 24.1 19.1 22.1 25.1 18.1
27.1 15.1 20.6 28.6 15.1 24.6
22.3 25.8 22.8 28.3 21.3 18.3
19.8 28.3 26.8 27.3 26.8 26.8
20.0 17.0 24.0 22.5 28.0 22.5
13.5 14.5 11.5 6.0 27.0 18.0
16.9 17.4 10.4 19.4 11.9 15.4
15.7 10.2 16.7 19.7 18.2 12.2
15.1 6.5 17.1 7.6 13.6 21.1
21.8 22.8 18.8 21.3 16.3 14.3
19.0 22.0 20.0 14.5 19.0 16.0
20.0 22.0 25.5 16.5 18.0 17.5
16.4 14.4 21.4 19.9 10.4 21.4
24.5 16.0 11.0 7.5 14.5 15.5
11.8 14.3 21.3 6.3 7.8 13.8
;
Proc Sort Data = RCBDFactorial;
By Soil;
Proc GLM Data = RCBDFactorial;
Class Variety Block;
Model Yield = Variety Block;
Means Variety / Tukey;
By Soil;
Output Out = PR r = res p = pred;
Proc Univariate normal data = PR;
Var res;
By Soil;
Proc GLM data = RCBDFactorial;
Class Variety;
Model Yield = Variety;
Means Variety / hovtest = Levene;
By Soil;
Proc GLM Data = PR;
Class Variety Block;
Model Yield = Variety Block pred*pred;
By Soil;
Proc Plot data = PR;
Plot res*pred;
By Soil;
Run;
Quit;
And the results:
Normality of residuals (Soil 1 – Soil 3)
Soil   Test           --Statistic---    -----p Value------
1      Shapiro-Wilk   W   0.975394      Pr < W   0.6943
2      Shapiro-Wilk   W   0.977548      Pr < W   0.7573
3      Shapiro-Wilk   W   0.976278      Pr < W   0.7204
Homogeneity of variances (Soil 1 – Soil 3)
Levene's Test for Homogeneity of Yield Variance
ANOVA of Squared Deviations from Group Means
                      Sum of      Mean
Soil   Source    DF   Squares     Square    F Value   Pr > F
1      Variety   4    2008.6      502.2     2.47      0.0705
2      Variety   4    4763.3      1190.8    1.40      0.2620
3      Variety   4    1963.4      490.8     0.87      0.4950
Nonadditivity (Soil 1 – Soil 3)
Soil   Source      DF   Type I SS     Mean Square   F Value   Pr > F
1      pred*pred   1    1.27330875    1.27330875    0.07      0.7914
2      pred*pred   1    72.8265041    72.8265041    2.90      0.1049
3      pred*pred   1    17.2174555    17.2174555    1.07      0.3129
All assumptions are nicely met, so we can report the ANOVA results for Variety without reservations:
Soil   Variety (p)   Block (p)    MSD    Tukey groupings
1      0.3947 NS     0.7011 NS    7.10   4=3=5=2=1
2      0.4435 NS     0.9244 NS    9.06   5=3=2=1=4
3      0.0347 *      0.0950 NS    6.93   2=1=3=4, 1=3=4=5
Interesting. While the overall ANOVA detected no differences among Varieties, here we see that
within Soil 3, differences are in fact present.
This is an "almost practically complete analysis" because a complete analysis would require commentary
(i.e. interpretation) of all the results generated above, a discussion as to which variety-soil
combinations are recommended or not recommended, etc. One should also make efforts to visualize the
data using bar charts or interaction plots. The thing to realize is that, even for a simple example like this,
the necessary analysis can be substantial.
An added thorn: This analysis of simple effects involves an enormous number of
separate tests: 8 Shapiro-Wilk tests, 8 Levene’s tests, 8 non-additivity tests, and 45
Tukey pairwise comparisons! This has major implications in terms of the
experimentwise error rate, so be aware!
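The count of 45 Tukey comparisons, and a rough sense of what it implies, can be worked out in a short Data step. This is only a sketch: the variable names are made up for the illustration, and the rough rate treats the eight Tukey analyses as independent families each run at alpha = 0.05, which is an assumption that does not strictly hold because all eight use the same 90 observations.

```sas
* Counting the pairwise comparisons behind the simple-effects analysis;
Data _Null_;
  nSoilPairs    = Comb(3, 2);                      * soil pairs within one variety;
  nVarietyPairs = Comb(5, 2);                      * variety pairs within one soil;
  nTukey    = 5*nSoilPairs + 3*nVarietyPairs;      * 5 x 3 + 3 x 10 = 45;
  nFamilies = 5 + 3;                               * eight separate ANOVAs;
  alpha     = 0.05;
  roughRate = 1 - (1 - alpha)**nFamilies;          * assumes independent families;
  Put nTukey= nFamilies= roughRate= 6.4;           * results are written to the log;
Run;
```

The log shows nTukey=45. The value of roughRate gives an idea of how far the chance of at least one false positive can climb above 0.05 when several analyses are run side by side.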
APPENDIX 2: Graphing in Excel
The same plots can easily be obtained in Excel by organizing the cell means into series (rows) as below:
Var1 Var2 Var3 Var4 Var5
Soil1 21.8 21.9 23.1 26.0 22.3
Soil2 15.1 15.2 15.5 13.5 19.2
Soil3 18.4 19.9 17.3 14.8 12.6
and then selecting Insert -> Line -> 2-D Line.
Error bars can be added by selecting Chart Tools -> Layout -> Error Bars -> Custom -> Specify Value and
selecting the rows for each series from a table of standard errors (SE) organized as above:
SE Var1 Var2 Var3 Var4 Var5
Soil1 1.1 2.4 1.4 1.3 1.5
Soil2 2.9 1.4 1.5 2.3 1.4
Soil3 1.1 1.4 1.8 2.3 2.2
The non-parallel nature of the lines in this interaction plot demonstrates visually the significant interaction
we found in the ANOVA.
[Figure: Excel line chart of mean yield (Y-axis, from 10.0 to 30.0) for Var1 to Var5 (X-axis), with one line for each soil mixture (Soil1, Soil2, Soil3).]