Date post: | 14-Apr-2018 |
Category: |
Documents |
Upload: | hisham-ali |
7/30/2019 Ou of Syllabus
1/138
LECTURE NOTES
Multiple Factor Designs
Sept 8-Sept 20

Day 7 - Two or More Factor Designs
- RCBD
- LSD
- ANCOVA
- Factorial Designs

Introduction
What happens when there's more than one factor?
- Vary one factor at a time
- Study the factors jointly
Situations
- One controllable confounding factor (RCBD)
- More than one controllable confounding factor (LSD)
- One or more recordable but uncontrollable factors (ANCOVA)
- Several factors of interest (Factorial Design)

Confounding factors are factors that we strongly expect to influence the dependent variable (Y) but that are not the primary factor whose effect on Y we wish to test.
A nuisance or confounding factor probably has some effect on the response but is of no interest to the experimenter; however, the variability it transmits to the response needs to be minimized.
Typical nuisance factors include:
- batches of raw material, pieces of test equipment, time (shifts, days, etc.), different experimental units
- physical entities with similar characteristics (plots of land, genetically similar animals or litter mates)

These variables are not controlled by the analyst but can affect the outcome of the treatment being studied.
- If they are unknown, we hope to have controlled them via randomization
- If they are known and controllable: blocking
- If they are known but uncontrollable: ANCOVA

RCBD - Randomized Complete Block Design
- Blocking is a type of constrained randomization that can be used to control confounding by creating "blocks", or homogeneous strata, within which we can examine all of the treatments.
- This avoids the possibility that the treatments are imbalanced with respect to the confounding factor(s), which would leave us unsure whether the results are due more to an unfortunate arrangement of the confounders than to the treatment effects themselves.

The key objectives in blocking the experimental units are:
- to make the units as homogeneous as possible within blocks with respect to the response variable under study
- to make the different blocks as heterogeneous as possible with respect to the response variable under study
When each treatment is included exactly once in each block, the design is called an RCBD.

The reason for blocking is that we hope to reduce the error sum of squares by attributing part of that error to the variation among blocks.
- If successful, this results in a much smaller MSE (more precise results than the CRD).
- A smaller MSE makes the test statistic larger; however, the smaller number of degrees of freedom for MSE makes the critical value, the 100(1-α) percentile point of the F distribution, larger.
- Generally, though, the change in MSE has a much greater effect on the test statistic, making the test more powerful.

Besides testing the factor of interest, this variance-reducing design can also help us understand:
- whether the process is robust to nuisance conditions
- whether blocking is necessary in future experiments
It is highly desirable that the experimental units (EUs) within each block are processed together whenever this helps to reduce experimental error variability.
- Example: if the experimenter might change the administration of the experiment to the subjects over time, processing the EUs consecutively, block by block, removes such sources of variation from within blocks, leading to more precise results.

In setting up a randomized block experiment with a levels of the treatment factor and b blocks, the blocks can represent either a random or a fixed factor.
Random blocks correspond to the situation where we have sampled b levels from a larger population of possible blocking levels.
- Hence the conclusions we draw from the experiment can be extrapolated to the larger population from which we sampled.
Fixed blocks correspond to the situation where we have chosen to examine a specific set of b blocking levels,
- and the results of the experiment apply only to those b levels (no extrapolation to other levels).

Example 1: an experiment on the effects of vitamin C on the prevention of colds.
- 868 children were randomly assigned to receive a treatment (500 mg or 1000 mg of vitamin C) or a placebo (an identical tablet with no vitamin C) on a daily basis.
- Response of interest = number of colds contracted by each child.
- The study showed no difference in the average number of colds between the treatment groups and the placebo group.
- Other factors that may affect the number of colds contracted include gender, age, nutritional habits of the child, etc.
- Factors that may affect the response but are not of primary interest to the investigator are referred to as nuisance or confounding factors.
In blocked experiments the heterogeneous experimental units are divided into homogeneous subgroups called blocks, and separate experiments are conducted in each block.
- For example, blocking by gender would mean doing the experiment on males and females separately.

Example 2:
- An investigator is interested in testing the effects of drugs A and B on the lymphocyte count in mice by comparing A, B and a placebo, P.
- In designing the experiment, he assumed that mice from the same litter would be more homogeneous in their response than mice from different litters.
- He arranged the experiment as an RCBD with three litter mates forming each block and a total of 7 blocks.
- In each block the litter mates were randomly assigned to the treatments, resulting in the following data at the conclusion of the experiment (lymphocyte count given in units of 1000 per cubic mm of blood).
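The within-block randomization described in Example 2 can be sketched in a few lines of Python. This is only an illustration: the function name `randomize_rcbd` and the seed are assumptions, not part of the lecture.

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # a fresh randomization inside every block
        layout.append(order)
    return layout

# Seven litters (blocks) of three litter mates; treatments P, A and B.
layout = randomize_rcbd(["P", "A", "B"], n_blocks=7, seed=2019)
for litter, order in enumerate(layout, start=1):
    print(f"litter {litter}: mice 1-3 receive {order}")
```

Every block contains each treatment exactly once, which is what makes the design complete; only the order within a block is left to chance.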

Lymphocyte count in mice

                        Blocks (litters)
Treatment     1     2     3     4     5     6     7    Mean
P           5.4   4.0   7.0   5.8   3.5   7.6   5.5    5.54
A           6.0   4.8   6.9   6.4   5.5   9.0   6.8    6.49
B           5.1   3.9   6.5   5.6   3.9   7.0   5.4    5.34
Mean       5.50  4.23  6.80  5.93  4.30  7.87  5.90    5.79

Analysis assuming CRD: effects of treatment on the lymphocyte count in mice

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Treatment          2             5.22          2.61      1.47    0.256
Error             18            32.02          1.78
Corrected Total   20            37.24

What will be the difference if we analyze the data taking into account the blocking by litter?

The RCBD Model
\[ y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b \]
- $y_{ij}$ = the observation on the $i$th treatment in the $j$th block
- $\mu$ = overall mean
- $\tau_i$ = the effect of the $i$th treatment
- $\beta_j$ = the effect of the $j$th block
- $\varepsilon_{ij}$ = random error
No interaction between blocks and treatments.
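
Properties of the model
- The treatment effects sum to zero and the block effects sum to zero: $\sum_{i=1}^{a} \tau_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$
- $E(\varepsilon_{ij}) = 0$, which implies $E(Y_{ij}) = \mu_{ij} = \mu + \tau_i + \beta_j$
- $Var(Y_{ij}) = \sigma^2$ and $Y_{ij} \sim N(\mu_{ij}, \sigma^2)$
- $Cov(\varepsilon_{ij}, \varepsilon_{ik}) = 0$ for $j \ne k$, and $Cov(\varepsilon_{ij}, \varepsilon_{lk}) = 0$ for $j \ne k$ and $i \ne l$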

The additive model
\[ E(Y_{ij}) = \mu + \tau_i + \beta_j \]
implies that the expected values of observations in different blocks for the same treatment may differ, but the treatment effects are the same in all blocks.
There is still a possibility of interaction between blocks and treatments, which can be checked with Tukey's additivity test.

Statistical Inference
- Under the stated assumptions, we could use OLS or MLE to estimate the parameters
- Hypothesis:
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_a \qquad \text{vs} \qquad H_1: \text{not } H_0 \]
- Partition the sum of squares: two-way ANOVA

TWO WAY ANOVA
- One-way view: SST (total sum of squares) = SStr (treatment sum of squares) + SSE (error sum of squares)
- Two-way view: the one-way SSE is split further into SSB (sum of squares blocks) + SSE (sum of squares error)

Randomized Block Design
[Figure: grid of individual observations classified by a single independent variable and a blocking variable]
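
Partition the total sum of squares (SST) into three components (SStr, SSb and SSE):
\[ \sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{..})^2 = b\sum_{i=1}^{a} (\bar{Y}_{i.} - \bar{Y}_{..})^2 + a\sum_{j=1}^{b} (\bar{Y}_{.j} - \bar{Y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2 \]
\[ SST = SStr + SSb + SSE \]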

TWO way ANOVA for RCBD
The degrees of freedom for the sums of squares in
\[ SST = SStr + SSB + SSE \]
are as follows:
\[ ab - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1) \]
- Ratios of sums of squares to their degrees of freedom give the mean squares
- We can use Cochran's theorem to obtain the distribution of the ratio of mean squares used to test the hypothesis H0: equal treatment means
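
Expected mean squares
\[ E(MSE) = \sigma^2 \]
\[ E(MStr) = \sigma^2 + \frac{b\sum_{i=1}^{a} \tau_i^2}{a - 1} = \sigma^2 + \frac{b\sum_{i=1}^{a} (\mu_i - \mu)^2}{a - 1} \]
\[ E(MSB) = \sigma^2 + \frac{a\sum_{j=1}^{b} \beta_j^2}{b - 1} \]
Exercise: 95% CI for $\sigma^2$
Thus, we can test the treatment effect hypothesis H0: all $\tau_i = 0$ vs H1: not H0 with the statistic
\[ F = \frac{MStr}{MSE} \sim F(a - 1, (a - 1)(b - 1); 0) \quad \text{under } H_0 \]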

Under a specific alternative hypothesis with given values for the $\tau_i$'s, this test statistic has a non-central F distribution,
\[ F = \frac{MStr}{MSE} \sim F(a - 1, (a - 1)(b - 1); \lambda) \]
where the non-centrality parameter is
\[ \lambda = \frac{b\sum_{i=1}^{a} \tau_i^2}{\sigma^2}, \]
which is the same as in the CRD.

ANOVA Table

Source of variation   Degrees of freedom   Sum of squares (SSQ)   Mean square (MS)     F
Blocks (B)            b-1                  SSB                    SSB/(b-1)            MSB/MSE
Treatments (Tr)       a-1                  SStr                   SStr/(a-1)           MStr/MSE
Error (E)             (a-1)(b-1)           SSE                    SSE/((a-1)(b-1))
Total (Tot)           ab-1                 SST

where a = number of treatments and b = number of blocks or replications.

Exercise: If only two treatments are investigated (a = 2) in an RCBD, it can be shown that the F test for treatment effects given above is equivalent to the two-sided t-test for paired observations.
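A numerical check of this equivalence, as a sketch rather than a proof: the code below takes only the placebo (P) and drug A rows of the mouse lymphocyte data, so that a = 2, and computes both the paired t statistic and the RCBD F statistic. The variable names are illustrative.

```python
from math import sqrt

placebo = [5.4, 4.0, 7.0, 5.8, 3.5, 7.6, 5.5]
drug_a = [6.0, 4.8, 6.9, 6.4, 5.5, 9.0, 6.8]
b = len(placebo)  # number of blocks (litters)

# Paired t statistic computed from the within-block differences.
diffs = [x - y for x, y in zip(drug_a, placebo)]
d_bar = sum(diffs) / b
s2_d = sum((d - d_bar) ** 2 for d in diffs) / (b - 1)
t_stat = d_bar / sqrt(s2_d / b)

# RCBD F statistic with a = 2 treatments and b blocks.
rows = [placebo, drug_a]
grand = (sum(placebo) + sum(drug_a)) / (2 * b)
trt_means = [sum(r) / b for r in rows]
blk_means = [(x + y) / 2 for x, y in zip(placebo, drug_a)]
ss_tr = b * sum((m - grand) ** 2 for m in trt_means)
ss_e = sum(
    (y - trt_means[i] - blk_means[j] + grand) ** 2
    for i, row in enumerate(rows)
    for j, y in enumerate(row)
)
f_stat = (ss_tr / 1) / (ss_e / (b - 1))  # df: a-1 = 1 and (a-1)(b-1) = b-1

print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The two printed numbers agree up to rounding, and since an F variable on (1, b-1) degrees of freedom is the square of a t variable on b-1 degrees of freedom, the two tests also give the same p-value.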

Blocking effect
- The test will be more powerful here for the same values of b (r in the CRD case) and the $\tau_i$'s: if the blocking was appropriate (i.e. the blocking factor had a pronounced effect on the outcome), we will usually have a much smaller error variance than without blocking, resulting in a much larger non-centrality parameter.
- However, we will be looking at power under the slightly different condition of a smaller df for MSE.
- While this requires a larger critical value (reducing the power if all else were equal), it is usually more than made up for by the large reduction in the error variance $\sigma^2$ achieved by the design.
- Beware that if the blocks have no really important effect on the outcome, you could lose power by blocking.
- Thus, it is important to block only when you have solid evidence that you are likely to gain something by adding this feature.

Blocking effect
- Successful blocking minimizes the variance among units within blocks while maximizing the variance among blocks.
- Since precision usually decreases as the number of experimental units per block increases, block size should be kept as small as possible.

Analysis under RCBD: testing the effects of drugs A and B on the lymphocyte count in mice

Source      DF   Sum of Squares   Mean Square   F Value   Pr > F
Treatment    2             5.22          2.61     17.93   0.0002
Blocking     6            30.28          5.05     34.71
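The CRD and RCBD analyses of the mouse data can be reproduced directly from the sum-of-squares partition. The sketch below uses only the Python standard library. The helper `p_value_numerator_df_2` is an illustrative name, and it relies on the closed form P(F > x) = (1 + 2x/v)^(-v/2), which holds only when the numerator has 2 degrees of freedom (here a - 1 = 2).

```python
data = {
    "P": [5.4, 4.0, 7.0, 5.8, 3.5, 7.6, 5.5],
    "A": [6.0, 4.8, 6.9, 6.4, 5.5, 9.0, 6.8],
    "B": [5.1, 3.9, 6.5, 5.6, 3.9, 7.0, 5.4],
}
rows = list(data.values())
a, b = len(rows), len(rows[0])  # 3 treatments, 7 blocks
grand = sum(map(sum, rows)) / (a * b)
trt_means = [sum(r) / b for r in rows]
blk_means = [sum(r[j] for r in rows) / a for j in range(b)]

ss_total = sum((y - grand) ** 2 for r in rows for y in r)
ss_tr = b * sum((m - grand) ** 2 for m in trt_means)
ss_b = a * sum((m - grand) ** 2 for m in blk_means)

def p_value_numerator_df_2(f, df_error):
    """Upper-tail F probability; valid only for 2 numerator degrees of freedom."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)

# CRD analysis: block-to-block variation stays in the error term.
ss_e_crd = ss_total - ss_tr
df_crd = a * (b - 1)
f_crd = (ss_tr / (a - 1)) / (ss_e_crd / df_crd)
p_crd = p_value_numerator_df_2(f_crd, df_crd)

# RCBD analysis: block-to-block variation is removed from the error term.
ss_e_rcbd = ss_total - ss_tr - ss_b
df_rcbd = (a - 1) * (b - 1)
mse_rcbd = ss_e_rcbd / df_rcbd
f_trt = (ss_tr / (a - 1)) / mse_rcbd
f_blk = (ss_b / (b - 1)) / mse_rcbd
p_trt = p_value_numerator_df_2(f_trt, df_rcbd)

print(f"SStr={ss_tr:.2f}  SSB={ss_b:.2f}  SSE(CRD)={ss_e_crd:.2f}  SSE(RCBD)={ss_e_rcbd:.2f}")
print(f"CRD:  F={f_crd:.2f}  p={p_crd:.3f}")
print(f"RCBD: F(treatment)={f_trt:.2f}  p={p_trt:.5f}  F(blocks)={f_blk:.2f}")
```

The treatment sum of squares is identical in both analyses; what changes is that most of the CRD error sum of squares is reassigned to blocks, which is why the same treatment differences go from non-significant to clearly significant.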

Do we need to test block effect?
- Usually not of interest (we blocked for a reason)
- Blocks are not randomized to experimental units
- Can compute the ratio of variation explained by blocking to understand the impact of blocking
- Trade-off: reduction in variance vs loss in degrees of freedom
- Relative efficiency:
\[ RE_{RCBD:CRD} = \frac{(\nu_{RCBD} + 1)(\nu_{CRD} + 3)}{(\nu_{RCBD} + 3)(\nu_{CRD} + 1)} \cdot \frac{\sigma^2_{CRD}}{\sigma^2_{RCBD}} \]
where $\sigma^2_{CRD}$ and $\sigma^2_{RCBD}$ are the error variances, $\nu_{RCBD} = (a - 1)(b - 1)$ and $\nu_{CRD} = a(b - 1)$
- Ultimately, the loss in df will have little effect as long as a moderate number of error degrees of freedom are available

The CRD error variance can be estimated from the RCBD mean squares:
\[ \hat{\sigma}^2_{crd} = \frac{(b - 1)MSB + b(a - 1)MSE}{ab - 1} \]
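As a rough illustration of the relative efficiency calculation, the sketch below plugs in the rounded values from the mouse lymphocyte tables (MSB = 5.05, and SSE obtained by subtraction from the tabulated sums of squares). The function name `relative_efficiency` is an assumption made for this example.

```python
def relative_efficiency(a, b, msb, mse):
    """Return (estimated CRD error variance, efficiency of RCBD relative to CRD)."""
    sigma2_crd = ((b - 1) * msb + b * (a - 1) * mse) / (a * b - 1)
    nu_rcbd = (a - 1) * (b - 1)  # error df under the RCBD
    nu_crd = a * (b - 1)         # error df under a CRD of the same size
    correction = ((nu_rcbd + 1) * (nu_crd + 3)) / ((nu_rcbd + 3) * (nu_crd + 1))
    return sigma2_crd, correction * sigma2_crd / mse

# Mouse example: a = 3 treatments, b = 7 litters.
msb = 5.05                              # block mean square from the RCBD table
mse = (37.24 - 5.22 - 30.28) / (2 * 6)  # SSE = SST - SStr - SSB on (a-1)(b-1) = 12 df
sigma2_crd, rel_eff = relative_efficiency(3, 7, msb, mse)
print(f"estimated CRD error variance = {sigma2_crd:.3f}")
print(f"relative efficiency of RCBD to CRD = {rel_eff:.1f}")
```

A relative efficiency well above 1 indicates that a CRD would need correspondingly more replicates per treatment to match the precision of the blocked design.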

Assumptions
- Normality: histogram and probability plot (qqplot)
- Additivity
  - Tukey's test of additivity (block x trt interaction)
  - If significant, the block effect differs between treatments
  - A log transformation could eliminate the interaction (non-additivity), e.g. if $E(y_{ij}) = \mu\tau_i\beta_j$ is multiplicative, then $\log(y_{ij}) = \mu^* + \tau^*_i + \beta^*_j + e_{ij}$ is additive on the log scale
- Constant variance (check by treatment and block)

John Tukey
- He introduced the word "bit" as a contraction of "binary digit".
- He used the term "software" in a computing context in a 1958 article.
- He also pioneered many statistical methods.
- He articulated the important distinction between exploratory data analysis and confirmatory data analysis.
- He retired in 1985. In 2000, he died in New Brunswick, New Jersey.
35/138
Checking Additive assumption
(four approaches)
Tukeys test of additivity (block x trt interaction) Plot of residuals against fitted values
A curvilinear pattern of the residuals suggests thepresence ofinteraction and also suggests non-
constancy of variances A more effective plot is plot of the responses Yij
by blocks X-axis is treatment, y-axis is response and the
overlayed lines are for each block. Lack of parallelism is strong indication that blocks and
treatment interact in their effects on the response

Interaction test with a single replication per cell
- Having a single replication per cell has so far prevented us from testing for an interaction effect as is done in factorial designs.
- But there is a special type of interaction that we can test, using Tukey's one degree of freedom test of non-additivity.
- We want to know whether this model is any better than the simple additive model:
\[ Y_{ij} = \mu + \tau_i + \beta_j + \lambda\tau_i\beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, a; \quad j = 1, \ldots, b \]

For the model
\[ Y_{ij} = \mu + \tau_i + \beta_j + \lambda\tau_i\beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, a; \quad j = 1, \ldots, b \]
if the $\tau_i$ and $\beta_j$ were known, we could use a least squares approach to estimate $\lambda$: find the value of $\lambda$ that minimizes
\[ \sum_{i=1}^{a}\sum_{j=1}^{b} [Y_{ij} - (\mu + \tau_i + \beta_j + \lambda\tau_i\beta_j)]^2 \]
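
Taking the usual estimates of the parameters $\mu$, $\tau_i$ and $\beta_j$, and minimizing
\[ \sum_{i=1}^{a}\sum_{j=1}^{b} [Y_{ij} - (\hat{\mu} + \hat{\tau}_i + \hat{\beta}_j + \lambda\hat{\tau}_i\hat{\beta}_j)]^2 \]
we get
\[ \hat{\lambda} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} \hat{\tau}_i\hat{\beta}_j Y_{ij}}{\sum_{i=1}^{a} \hat{\tau}_i^2 \sum_{j=1}^{b} \hat{\beta}_j^2} \]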

Let $d_{ij} = \hat{\tau}_i\hat{\beta}_j$. Since
\[ \sum_{i=1}^{a}\sum_{j=1}^{b} \hat{\tau}_i\hat{\beta}_j = 0, \]
we can think of $\sum_i\sum_j d_{ij}Y_{ij}$ as a contrast in the means for $Y_{ij}$ with one replication per cell.
Rewriting the expression for $\hat{\lambda}$ after proper substitution, we get Tukey's sum of squares for non-additivity,
\[ SS_{nonadd} = \frac{\{\sum_{i=1}^{a}\sum_{j=1}^{b} d_{ij}Y_{ij}\}^2}{\sum_{i=1}^{a}\sum_{j=1}^{b} d_{ij}^2} \]
Since this is a contrast, it has one degree of freedom.

Let
- $SS_{remainder} = SSE - SS_{nonadd}$
- with degrees of freedom $(a - 1)(b - 1) - 1 = (ab - a - b + 1) - 1 = ab - a - b$.
$SS_{nonadd}$ and $SS_{remainder}$ are orthogonal to each other and hence statistically independent (Cochran's theorem).
Thus we can test H0: $\lambda = 0$ vs H1: not H0 with
\[ F_{nonadd} = \frac{SS_{nonadd}}{SS_{remainder}/(ab - a - b)} \sim F(1, ab - a - b; 0) \]

Q. What do we do if we do not reject H0?
- A. Most statisticians would pool $SS_{nonadd}$ and $SS_{remainder}$ into SSE and do the usual tests for the main effects of treatment and blocking.
- In general, to reduce the type II error rate, a liberal type I error rate is used for the interaction test (alpha > 10%).
Q. What do we do if we reject H0?
- A1. If the true model is not additive and we go ahead with the usual main effects test using F = MStr/MSE, we will have a type I error level that is smaller than the nominal level.
  - We would get too few significant results, and the testing procedure would be conservative.
  - However, if we get a significant result with this test, we might feel that it would also have been significant with the proper kind of test based on a non-additive model.
- A2. Make efforts to remove the interaction via transformations of Y (e.g. sqrt, log).
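
Day 8 - Regression Approach to RCBD
Example: consider an RCBD with 3 blocks and 2 levels of the treatment.
- Treat block as a fixed effect and use a model of the form
\[ y_{ij} = \mu + \tau_i + \beta_j + e_{ij} \]
- In matrix form, $y = XB + \varepsilon$, where the parameter vector $B$ collects $\mu$, the two treatment effects $\tau_1, \tau_2$ and the three block effects $\beta_1, \beta_2, \beta_3$.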
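
Regression approach to test additivity
1. Fit the additive model
\[ y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij} \]
2. Obtain the residuals, $r_{ij}$, and the fitted values, $\hat{y}_{ij}$
3. Fit the additive model again with the constructed response $\hat{y}_{ij}^2 / (2\bar{y}_{..})$:
\[ \frac{\hat{y}_{ij}^2}{2\bar{y}_{..}} = \mu + \tau_i + \beta_j + \varepsilon_{ij} \]
4. Obtain the residuals from (3), $r'_{ij}$
5. Tukey's sum of squares is
\[ TSS = \frac{(\sum_{i}\sum_{j} r_{ij} r'_{ij})^2}{\sum_{i}\sum_{j} r'^{2}_{ij}} \]
6. F = TSS/MSE ~ F(1, ab-a-b; 0), where MSE here is the remainder mean square (SSE - TSS)/(ab-a-b)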
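The six steps can be sketched in plain Python for a balanced table, where the least squares fit of the additive model is simply row mean + column mean - grand mean. The example uses the mouse lymphocyte data and compares the result with the direct contrast formula for Tukey's sum of squares; the helper names are illustrative assumptions.

```python
def additive_fit(table):
    """Fitted values of the additive model for a balanced two-way table."""
    a, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (a * b)
    row = [sum(r) / b for r in table]
    col = [sum(table[i][j] for i in range(a)) / a for j in range(b)]
    fitted = [[row[i] + col[j] - grand for j in range(b)] for i in range(a)]
    return fitted, row, col, grand

def residuals(table):
    fitted = additive_fit(table)[0]
    return [[y - f for y, f in zip(yr, fr)] for yr, fr in zip(table, fitted)]

y = [
    [5.4, 4.0, 7.0, 5.8, 3.5, 7.6, 5.5],  # P
    [6.0, 4.8, 6.9, 6.4, 5.5, 9.0, 6.8],  # A
    [5.1, 3.9, 6.5, 5.6, 3.9, 7.0, 5.4],  # B
]
a, b = len(y), len(y[0])
cells = [(i, j) for i in range(a) for j in range(b)]

fitted, row, col, grand = additive_fit(y)                   # step 1
r = residuals(y)                                            # step 2
z = [[f * f / (2 * grand) for f in fr] for fr in fitted]    # step 3
r2 = residuals(z)                                           # step 4
tss = (sum(r[i][j] * r2[i][j] for i, j in cells) ** 2
       / sum(r2[i][j] ** 2 for i, j in cells))              # step 5
sse = sum(r[i][j] ** 2 for i, j in cells)
f_nonadd = tss / ((sse - tss) / (a * b - a - b))            # step 6

# Direct contrast formula for comparison.
tau = [m - grand for m in row]
beta = [m - grand for m in col]
ss_direct = (sum(tau[i] * beta[j] * y[i][j] for i, j in cells) ** 2
             / (sum(t * t for t in tau) * sum(v * v for v in beta)))

print(f"TSS (regression) = {tss:.4f}, SSnonadd (direct) = {ss_direct:.4f}, F = {f_nonadd:.3f}")
```

The agreement is expected: after the additive fit, the residual of the squared fitted values is proportional to the product of the estimated treatment and block effects, so both routes compute the same one degree of freedom contrast.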

Multiple Comparison in RCBD
- Similar to the procedures in the CRD
- g (n) is replaced by a (b) in the formulas
- Degrees of freedom for error are (b-1)(a-1)
- Critical values:
  - t test: $t_{1-\alpha/2;\,(a-1)(b-1)}$
  - Tukey: $q_{1-\alpha;\,a,\,(a-1)(b-1)}$
  - Scheffe: $(a-1)\,F_{1-\alpha;\,a-1,\,(a-1)(b-1)}$

Variations to simple RCBD
There are some variations to the simple RCBD:
- RCBD with replicates within blocks
- Incomplete block designs (block designs with fewer EUs per block than treatments)
- Blocking in two or more directions (LSD, GLSD, ...)

Example: Detergent Study
- An experiment was designed to study the performance of four different detergents for cleaning clothes.
- The following cleanness readings (higher = cleaner) were obtained with specially designed equipment for three different types of common stains (blocking factor).
- Is there a difference among the detergents?

Replicated RCBD
Advantages of replicated RCBD:
- The natural block size may result in more units per block than there are treatments, thus allowing for within-block replication
- Within-block replication allows the block*treatment interaction to be separated from experimental error, which may improve the interpretation of results when the block*treatment interaction is significant
- Within-block replication may be used to assign extra replication to selected treatments to increase sensitivity for comparisons of interest
The key disadvantage is that a large block size (if it is not the natural block size) reduces the effectiveness of the blocking.
It is also called the Generalized RCBD.

In the replicated RCBD the model is
\[ Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}, \qquad i = 1, \ldots, a; \quad j = 1, \ldots, b; \quad k = 1, \ldots, s \]
where
- $Y_{ijk}$ is the response for the $k$th subject in the $j$th block and $i$th treatment group,
- $(\tau\beta)_{ij}$ is the interaction effect of the $i$th treatment with the $j$th block

ANOVA Table

Source of variation      Degrees of freedom   Sum of squares (SS)   Mean square (MS)        F
Blocks (B)               b-1                  SSB                   SSB/(b-1)               MSB/MSE
Treatments (Tr)          a-1                  SStr                  SStr/(a-1)              MStr/MSE
Block*Treatment (B*T)    (a-1)(b-1)           SSBT                  SSBT/((a-1)(b-1))       MSBT/MSE
Experimental Error (E)   ab(s-1)              SSE                   SSE/(ab(s-1))
Total (Tot)              abs-1                SST

where a = number of treatments, b = number of blocks and s = number of replications.
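The sums of squares in this table can be computed with a short function. The sketch below is an assumption-laden illustration: the function name `grcbd_sums_of_squares` and the tiny data set (a = 2, b = 2, s = 2) are made up for the example and do not come from the lecture.

```python
def grcbd_sums_of_squares(y):
    """y[i][j] is the list of s replicate responses for treatment i in block j."""
    a, b, s = len(y), len(y[0]), len(y[0][0])
    grand = sum(v for row in y for reps in row for v in reps) / (a * b * s)
    cell = [[sum(y[i][j]) / s for j in range(b)] for i in range(a)]
    trt = [sum(cell[i]) / b for i in range(a)]
    blk = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    return {
        "treatment": s * b * sum((m - grand) ** 2 for m in trt),
        "block": s * a * sum((m - grand) ** 2 for m in blk),
        "interaction": s * sum((cell[i][j] - trt[i] - blk[j] + grand) ** 2
                               for i in range(a) for j in range(b)),
        "error": sum((v - cell[i][j]) ** 2
                     for i in range(a) for j in range(b) for v in y[i][j]),
        "total": sum((v - grand) ** 2 for row in y for reps in row for v in reps),
    }

# Hypothetical responses: 2 treatments x 2 blocks x 2 replicates per cell.
y = [
    [[10, 12], [14, 16]],
    [[11, 13], [19, 21]],
]
ss = grcbd_sums_of_squares(y)
for source, value in ss.items():
    print(f"{source:12s} {value:7.2f}")
```

With replication inside blocks, the interaction and error sums of squares are computed separately, so the four components add up to the total.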
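
Expected Mean Squares
Both treatment and block are fixed:
\[ E(MSE) = \sigma^2 \]
\[ E(MStr) = \sigma^2 + \frac{sb\sum_{i=1}^{a} \tau_i^2}{a - 1} \]
\[ E(MSb) = \sigma^2 + \frac{sa\sum_{j=1}^{b} \beta_j^2}{b - 1} \]
\[ E(MStb) = \sigma^2 + \frac{s\sum_{j=1}^{b}\sum_{i=1}^{a} (\tau\beta)_{ij}^2}{(a - 1)(b - 1)} \]
\[ H_0: \tau_1 = \cdots = \tau_a = 0, \qquad F = MStr/MSE \]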

Expected Mean Squares
Only blocks are random, and hence the interaction is random too:
\[ E(MSE) = \sigma^2 \]
\[ E(MStr) = \sigma^2 + s\sigma^2_{tb} + \frac{sb\sum_{i=1}^{a} \tau_i^2}{a - 1} \]
\[ E(MSb) = \sigma^2 + sa\sigma^2_b \]
\[ E(MStb) = \sigma^2 + s\sigma^2_{tb} \]
\[ H_0: \tau_1 = \cdots = \tau_a = 0, \qquad F = MStr/MStb \]

Expected Mean Squares
Both effects are random:
\[ E(MSE) = \sigma^2 \]
\[ E(MStr) = \sigma^2 + sb\sigma^2_t + s\sigma^2_{tb} \]
\[ E(MSb) = \sigma^2 + sa\sigma^2_b + s\sigma^2_{tb} \]
\[ E(MStb) = \sigma^2 + s\sigma^2_{tb} \]
\[ H_0: \sigma^2_t = 0, \qquad F = MStr/MStb \]
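
RCBD (one replication) with random block effect
If the blocks are random effects, then
- $\beta_j \sim N(0, \sigma_b^2)$
- $E(\varepsilon_{ij}) = 0$, which implies $E(Y_{ij}) = \mu_i = \mu + \tau_i$
- $Var(Y_{ij}) = \sigma^2 + \sigma_b^2$ and $Y_{ij} \sim N(\mu_i, \sigma^2 + \sigma_b^2)$
- $Cov(Y_{ij}, Y_{lj}) = \sigma_b^2$ for $i \ne l$ (two observations in the same block)
- $Cov(Y_{ij}, Y_{lk}) = 0$ for $j \ne k$ (observations in different blocks)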

Expected Mean Squares
For an RCBD with a single replication, if only blocks are random:
\[ E(MSE) = \sigma^2 \]
\[ E(MStr) = \sigma^2 + \frac{b\sum_{i=1}^{a} \tau_i^2}{a - 1} \]
\[ E(MSB) = \sigma^2 + a\sigma^2_b \]
\[ H_0: \tau_1 = \cdots = \tau_a = 0, \qquad F = MStr/MSE \]
- Think of an unbiased estimator for $\sigma^2_b$
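One way to approach this exercise, sketched from the expected mean squares stated for the single-replication RCBD with random blocks (the hat notation is introduced here and is not from the slides): subtract the two expectations and solve for the block variance.

```latex
\begin{aligned}
E(MSB) - E(MSE) &= (\sigma^2 + a\sigma_b^2) - \sigma^2 = a\sigma_b^2 \\
\hat{\sigma}_b^2 &= \frac{MSB - MSE}{a}, \qquad E(\hat{\sigma}_b^2) = \sigma_b^2
\end{aligned}
```

This method-of-moments estimate can come out negative when MSB < MSE. In practice it is often set to zero in that case, at the cost of exact unbiasedness.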

Incomplete Block Design (IBD)
- We will see the analysis methods for IBD later
- An IBD is a block design in which there are fewer experimental units per block than treatments.
- One type of this design is the balanced incomplete block design (BIBD).
  - In this design every pair of treatments occurs together within a block exactly the same number of times.
- The reason for this type of design is to take advantage of the greater efficiency of smaller block sizes.
- Example: suppose we are interested in testing four mosquito repellents.
  - The natural block size is two (our two arms), but we have 4 treatments.
  - The following design is proposed, where the data are numbers of mosquito bites during a specified period of time.

BIBD design example
Treatments: A, B, C, D

Subject          Bites (two treatments per subject)
1                12    9
2                 9    7
3                18   11
4                 3    4
5                 5    9
6                13   10
Treatment mean   A = 13, B = 9, C = 8, D = 10

- It is incomplete because not all treatments occur in each block
- It is considered balanced in the sense that each pair of treatments occurs together (within blocks) exactly the same number of times.
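The balance property can be checked mechanically. The sketch below assumes the natural layout for this example, namely that the six subjects receive the six distinct pairs of the four repellents. The assignment of pairs to subjects is an assumption, since the table above does not show which two treatments each subject received.

```python
from collections import Counter
from itertools import combinations

treatments = "ABCD"
k = 2  # block size: two arms per subject

# Assumed layout: one block (subject) for every pair of treatments.
blocks = list(combinations(treatments, k))

replications = Counter(trt for blk in blocks for trt in blk)
pair_counts = Counter(frozenset(p) for blk in blocks for p in combinations(blk, 2))

t = len(treatments)
r = replications["A"]                # times each treatment is used
lam = pair_counts[frozenset("AB")]   # times each pair shares a block
is_balanced = (len(set(replications.values())) == 1
               and len(set(pair_counts.values())) == 1
               and len(pair_counts) == t * (t - 1) // 2)

print(f"blocks = {len(blocks)}, r = {r}, lambda = {lam}, balanced = {is_balanced}")
```

For any BIBD the counts satisfy lambda(t - 1) = r(k - 1), which gives a quick arithmetic check on a proposed design before any data are collected.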

Example 1: Insurance premium example
- An analyst in insurance company A studied the premium for auto insurance in six cities.
- The six cities were selected to represent different regions (East, West) and different sizes (small, medium and large).
- Response = three-month premium charges for a certain category of risk.
- The interest is to study the effect of city size, controlling for geographical region.

Data
             Region
City size    E     W     Ave
small        140   100   120
med          210   180   195
large        220   200   210
ave          190   160   175

ANOVA
Source   df   SS     MS     F    P-value
City      2   9300   4650   93   0.0106
Region    1   1350   1350   27   0.0351
Error     2   100    50
58/138
Data Example: Insurance premium exampledata insurance;
input premium cityregion;
datalines;
140 1 1
100 1 2
210 2 1
180 2 2
220 3 1200 3 2
;
procglmdata=insurance;
class city region;
model premium = cityregion;
means city region
/tukey;
run;
quit;
ANOVA
Source df SS MS F Pvalue
city 2 9300 4650 93 .Region 1 1350 1350 27 .
City*Reg 2 100 50
Error 0 . .
Tukeys testObs msa msb ssab ssrem f p_value1 4650 450 87.0968 12.9032 6.75 0.23391
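The Tukey's test line in the output can be reproduced by hand from the six premiums. The following Python sketch applies the one degree of freedom formula directly. Reading the output columns msa and msb as the sums of squared city and region effects is an interpretation based on the numbers, and the closed-form p-value used here is valid only because both degrees of freedom equal 1.

```python
from math import atan, pi, sqrt

# Rows: small, medium, large city; columns: East, West.
y = [[140, 100], [210, 180], [220, 200]]
a, b = len(y), len(y[0])
cells = [(i, j) for i in range(a) for j in range(b)]

grand = sum(map(sum, y)) / (a * b)
tau = [sum(row) / b - grand for row in y]                            # city effects
beta = [sum(y[i][j] for i in range(a)) / a - grand for j in range(b)]  # region effects

sum_tau2 = sum(t * t for t in tau)
sum_beta2 = sum(v * v for v in beta)
contrast = sum(tau[i] * beta[j] * y[i][j] for i, j in cells)
ss_nonadd = contrast ** 2 / (sum_tau2 * sum_beta2)

sse = sum((y[i][j] - grand - tau[i] - beta[j]) ** 2 for i, j in cells)
df_rem = a * b - a - b
ss_rem = sse - ss_nonadd
f_nonadd = ss_nonadd / (ss_rem / df_rem)

# An F variable on (1, 1) df is the square of a Cauchy variable, which gives
# this closed-form upper-tail probability (special case, not a general formula).
p_value = 1 - (2 / pi) * atan(sqrt(f_nonadd))

print(f"sum tau^2 = {sum_tau2:.0f}, sum beta^2 = {sum_beta2:.0f}")
print(f"SSnonadd = {ss_nonadd:.4f}, SSrem = {ss_rem:.4f}, F = {f_nonadd:.2f}, p = {p_value:.5f}")
```

With only one remainder degree of freedom the test has very little power, so a non-significant result here is weak evidence for additivity.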
Creating RCBD in SAS
A.
B. Use a randomized block design and randomize the four treatments to four flowers within each type.
Text Book Example
(Page 121 of JL)
- Response = number of lever presses / elapsed time of the session
- Trt = 5 dosages of drug in mg/kg
- What is the EU?
Two Way ANOVA
Examining Trend: since factor is quantitative

Estimated CRD error variance for this example (a = 5 doses, b = 10 blocks):
\[ \hat{\sigma}^2_{crd} = \frac{(b - 1)MSB + b(a - 1)MSE}{ab - 1} = \frac{9(0.185) + 10(5 - 1)(0.0083)}{5(10) - 1} = 0.0408 \]
[Equation: comparison of $\hat{\sigma}^2_{crd}$ and $\hat{\sigma}^2_{rcb}$, expressed relative to $\hat{\sigma}^2_{crd}$]

Summary
Non-replicated RCBD:
- When experimental units represent physical entities
  - Smaller blocks of EUs usually result in greater homogeneity
  - The larger the blocks, the less homogeneous
Replicated RCBD:
- When EUs represent trials rather than physical entities and the experimental runs can be made quickly,
  - larger block sizes may not increase the variability of EUs within a block

Summary
Advantages of replicated RCBD
- More error degrees of freedom
- Interaction and error are not confounded
  - Can separate error and interaction SS
  - Easier assessment of additivity
- Is good if blocks are expensive but observations are cheap
- Consider example: tee height (golf)

Example of Generalized RCBD (page 128 of text book)
Objective
- To determine if tee height affects golf driving distance

(a) Purpose
- To recommend what tee height to use
(b) Identify sources of variation
- tee height
- golfer and ability level
- brand of ball
- club
- wind speed
- repeat swings
(c) Choose a rule to assign experimental units to treatment factors
- Complete Block Design: randomize the order in which each golfer hits a ball from each of the tee heights
- Blocks will be golfers (takes into account differences in ability levels and clubs)
  - random sample of golfers? (9 golfers)
- Treatment factor: tee height; each golfer will hit 5 balls from each tee height in a randomized order
(d) Measurements to be made: 1) distance
Note:
-In the middle table pvalue
- Since the treatment effect is significant, we could investigate further with pairwise comparisons
- Note that the error term is block*trt
- Conclusion: tee your golf ball up so that half of the ball is above the crown of the driver club-face to maximize distance

The power of the F test for treatment effects in the RCBD involves the same non-centrality parameter as in the CRD.
But the two designs lead to different power levels. Why?
- the error variance ($\sigma^2$) will differ for the two designs
- the degrees of freedom associated with the denominator also differ

Example:
Consider the text book example for d-amphetamine

Day 9: Latin Square Design (LSD)
- Due to Fisher (1935)
- Agricultural experiments (e.g. fertility gradient of plots)
- Industrial experiments (e.g. wear life of auto tires)
- Pharmaceutical (e.g. bioequivalence study)

The Latin Square Design
- This design is used to simultaneously control (or eliminate) two independent sources of nuisance variability
- It is called Latin because we usually denote the treatments by Latin letters
- Square because it always has the same number of levels (t) for the row and column nuisance factors
- A significant assumption is that the three factors (treatments and two nuisance factors) do not interact
  - More restrictive than the RCBD
- Each treatment appears once and only once in each row and column
- If you can block on two sources of variation (rows x columns), you can reduce experimental error compared to the RCBD
  - It further reduces variability, increasing the sensitivity to detect treatment effects
[Figure: a 4 x 4 Latin square with treatments A, B, C, D]
- In an LSD every treatment occurs in every row and every column
- Also, every row occurs in every column and vice versa
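The defining property is easy to verify in code. As a minimal sketch, the function below builds a cyclic Latin square (one standard construction, not necessarily the square pictured in the lecture) and checks that each treatment appears exactly once in every row and column. The function names are illustrative.

```python
def cyclic_latin_square(treatments):
    """Row i is the treatment list shifted left by i positions."""
    t = len(treatments)
    return [[treatments[(i + j) % t] for j in range(t)] for i in range(t)]

def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    t = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(t)} == symbols for j in range(t))
    return len(symbols) == t and rows_ok and cols_ok

square = cyclic_latin_square("ABCD")
for row in square:
    print(" ".join(row))
print("valid Latin square:", is_latin_square(square))
```

In an actual experiment the rows, columns and treatment labels of such a standard square would still be randomly permuted before use.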

Advantages and Disadvantages
Advantage:
- Allows the experimenter to control two sources of variation
Disadvantages:
- The error degrees of freedom ([t-1] x [t-2]) are small if there are only a few treatments
- The experiment becomes very large if the number of treatments is large
- The statistical analysis is complicated by missing blocks and mis-assigned treatments

The LSD Model
\[ y_{ij(k)} = \mu + \tau_k + \rho_i + \gamma_j + \varepsilon_{ij(k)}, \qquad i = 1, 2, \ldots, t; \quad j = 1, 2, \ldots, t; \quad k = 1, 2, \ldots, t \]
- $y_{ij(k)}$ = the observation in the $i$th row and the $j$th column, receiving the $k$th treatment
- $\mu$ = overall mean
- $\tau_k$ = the effect of the $k$th treatment
- $\rho_i$ = the effect of the $i$th row
- $\gamma_j$ = the effect of the $j$th column
- $\varepsilon_{ij(k)}$ = random error
No interaction between rows, columns and treatments.

- A Latin square experiment is assumed to be a three-factor experiment.
- The factors are rows, columns and treatments.
- It is assumed that there is no interaction between rows, columns and treatments.
- We can partition the sum of squares into four components: SST = SSR + SSC + SStr + SSE
- Usual F test under H0 using Cochran's theorem

The ANOVA Table for a Latin Square Experiment

Source   S.S.     d.f.         M.S.     F             p-value
Treat    SStr     t-1          MStr     MStr/MSE
Rows     SSRow    t-1          MSRow    MSRow/MSE
Cols     SSCol    t-1          MSCol    MSCol/MSE
Error    SSE      (t-1)(t-2)   MSE
Total    SST      t^2 - 1
LSD Text book Example
- Purpose: to test the bioequivalence of three formulations (A = solution, B = tablet, C = capsule) of a drug
- Response: concentration of the drug in the blood as a function of time since dosing
- Three volunteers took the drug in succession after a washout period
- After dosing, blood samples were taken every hour for four hours
- Since metabolism may vary from subject to subject, subject is the row factor
- Since metabolism could also vary from time to time, time is the column factor

The Graeco-Latin Square Design
- This design is used to simultaneously control (or eliminate) three sources of nuisance variability
- It is called Graeco-Latin because we usually specify the third nuisance factor, represented by the Greek letters, orthogonal to the Latin letters
- A significant assumption is that the four factors (treatments, nuisance factors) do not interact
- If this assumption is violated, as with the Latin square design, it will not produce valid results

GRAECO LATIN Square Design
A Graeco-Latin square consists of two Latin squares (one using the letters A, B, C, ..., the other using the Greek letters a, b, c, ...) such that when the two Latin squares are superimposed on each other, each letter of one square appears once and only once with each letter of the other square. The two Latin squares are called mutually orthogonal.
Example: a 7 x 7 Graeco-Latin Square (Latin letters shown)
A B C D E F G
B C D E F G A
C D E F G A B
D E F G A B C
E F G A B C D
F G A B C D E
G A B C D E F
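A second square that is orthogonal to the cyclic Latin-letter square above can be sketched as follows. The construction used, entry (2i + j) mod 7 for the second square, is one standard choice for prime order and is an assumption about how the example could be completed, not the lecture's own Greek layer. Lowercase letters stand in for the Greek symbols.

```python
t = 7
latin = "ABCDEFG"
greek = "abcdefg"  # stand-ins for the Greek letters

# First square: the cyclic square shown in the example, entry (i + j) mod t.
latin_sq = [[latin[(i + j) % t] for j in range(t)] for i in range(t)]
# Second square: entry (2i + j) mod t, a different multiplier on the row index.
greek_sq = [[greek[(2 * i + j) % t] for j in range(t)] for i in range(t)]

# Orthogonality: superimposing the squares yields every (Latin, Greek) pair once.
pairs = {(latin_sq[i][j], greek_sq[i][j]) for i in range(t) for j in range(t)}
orthogonal = len(pairs) == t * t

for i in range(t):
    print(" ".join(latin_sq[i][j] + greek_sq[i][j] for j in range(t)))
print("mutually orthogonal:", orthogonal)
```

Counting distinct pairs is a direct translation of the "once and only once" condition: 49 cells must produce all 49 possible combinations.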

The GLSD Model
\[ y_{ij(kl)} = \mu + \tau_k + \lambda_l + \rho_i + \gamma_j + \varepsilon_{ij(kl)}, \qquad i, j, k, l = 1, 2, \ldots, t \]
- $y_{ij(kl)}$ = the observation in the $i$th row and the $j$th column, receiving the $k$th Latin treatment and the $l$th Greek treatment
- $\mu$ = overall mean
- $\tau_k$ = the effect of the $k$th Latin treatment
- $\lambda_l$ = the effect of the $l$th Greek treatment
- $\rho_i$ = the effect of the $i$th row
- $\gamma_j$ = the effect of the $j$th column
- $\varepsilon_{ij(kl)}$ = random error
No interaction between rows, columns, Latin treatments and Greek treatments.

- A Graeco-Latin square experiment is assumed to be a four-factor experiment.
- The factors are rows, columns, Latin treatments and Greek treatments.
- It is assumed that there is no interaction between rows, columns, Latin treatments and Greek treatments.

Analysis of Covariance (ANCOVA)

Introduction
- Consider a factor x which is correlated with y BUT NOT with treatment
- Can measure x but can't control/predict it (as we could with blocks)
- The nuisance factor x is called a covariate
- ANCOVA adjusts y for the effect of the covariate x (retrospective adjustment for bias)
- Without adjustment, the effects of x may
  - inflate $\sigma^2$
  - alter treatment comparisons

Introduction
- ANCOVA combines regression and ANOVA
  - Response variable is continuous
  - One or more explanatory factors (the treatments)
  - One or more continuous explanatory variables
- The goal of ANCOVA is to reduce the error variance. This increases the power of tests and narrows the confidence intervals.
- Analysis of covariance adjusts for measurable variables that affect the response but have nothing to do with the factors (treatments) in the experiment.

Model Description
- Consider a single covariate in a CRD
- The constant slope model is
\[ y_{ij} = \mu + \tau_i + \beta x_{ij} + \varepsilon_{ij} \]
- Assumptions
  - $x_{ij}$ not affected by treatment
  - x and y are linearly related
  - Constant slope
  - Errors are normally and independently distributed
  - Equality of error variance for different treatments

Model Description
- The non-constant slope model is
\[ y_{ij} = \mu + \tau_i + \beta x_{ij} + (\tau\beta)_i x_{ij} + \varepsilon_{ij} \]
- Additional assumptions
  - $x_{ij}$ not affected by treatment
  - x and y are linearly related
  - There is interaction between x and treatment and hence a non-constant slope
Examples
Pretest/Posttest score analysis: The gain in score y may be associated with the pretest score x. Analysis of covariance provides a way to control for pretest differences. That way, one does not need to find a group of students with similar pretest scores before randomly assigning them to a control and a treatment group.
Weight gain experiments in animals: When comparing different feeds, the weight gain y may be associated with the original weight of the animal.
Comparing competing drug products: The effect of drug A after two hours (measured on a scale from 1 to 10) may be associated with the initial state of the subject. Variables describing the initial state may be used as covariates.
Properties of ANCOVA Model
While in ANOVA $E(Y_{ij}) = \mu_i$, in ANCOVA this is not true, because the mean also depends on $X_{ij}$:
$$E(y_{ij}) = \mu_{ij} = \mu + \tau_i + \beta x_{ij}$$
Mean differences are the same at any value of x. Writing $\mu_i$ for the mean of treatment i at a common x:
$$\mu_1 - \mu_2 = \tau_1 - \tau_2$$
Constancy of slopes: this is a crucial assumption. If it is violated, the difference between treatment means cannot be summarized by a single number based on the main effects.
If treatments interact with x, resulting in non-parallel lines, ANCOVA is not appropriate. In this case, separate treatment regression lines need to be estimated and then compared.
General Approach to ANCOVA
First look at the effect of $x_{ij}$. If it isn't significant, do an ANOVA and be done with it.
Check to see that $x_{ij}$ is not significantly affected by the factor values.
Test to see that the slope $\beta$ is not significantly different across factor levels. This is an interaction between the factors and the covariates.
If there is an interaction, STOP!
If both tests pass, do the ANCOVA.
Model estimates
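Centering of X by its mean:
$$y_{ij} = \mu + \tau_i + \beta (x_{ij} - \bar{x}_{..}) + \varepsilon_{ij}$$
Estimates:
$$\hat{\mu} = \bar{y}_{..}$$
$$\hat{\beta} = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_{i.})(x_{ij} - \bar{x}_{i.})}{\sum_i \sum_j (x_{ij} - \bar{x}_{i.})^2}$$
$$\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..} - \hat{\beta}(\bar{x}_{i.} - \bar{x}_{..})$$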
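The estimates for the centered constant-slope model can be checked numerically. The following is a minimal Python sketch, not part of the original SAS material: the function name `ancova_estimates` and the small noise-free data set are hypothetical, and the formulas assume a balanced design with the treatment effects constrained to sum to zero.

```python
from statistics import fmean


def ancova_estimates(groups):
    """Return (mu_hat, beta_hat, tau_hat) for the centered constant-slope model.

    groups: dict mapping a treatment label to a list of (x, y) pairs.
    Assumes a balanced design (equal group sizes).
    """
    x_bar = fmean(x for pairs in groups.values() for x, _ in pairs)
    y_bar = fmean(y for pairs in groups.values() for _, y in pairs)

    sxy = sxx = 0.0
    group_means = {}
    for label, pairs in groups.items():
        xi = fmean(x for x, _ in pairs)
        yi = fmean(y for _, y in pairs)
        group_means[label] = (xi, yi)
        # Pooled within-treatment sums of cross-products and squares
        sxy += sum((x - xi) * (y - yi) for x, y in pairs)
        sxx += sum((x - xi) ** 2 for x, _ in pairs)

    beta_hat = sxy / sxx
    tau_hat = {
        label: yi - y_bar - beta_hat * (xi - x_bar)
        for label, (xi, yi) in group_means.items()
    }
    return y_bar, beta_hat, tau_hat


# Hypothetical noise-free data generated from mu = 10, beta = 2, tau = (-1, +1)
toy = {
    "a": [(1, 4), (2, 6), (3, 8)],
    "b": [(4, 12), (5, 14), (6, 16)],
}
mu_hat, beta_hat, tau_hat = ancova_estimates(toy)
print(mu_hat, beta_hat, tau_hat)
```

Because the toy data contain no error term, the function should recover the generating values, which makes it easy to see that the slope is estimated from within-treatment variation only.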
Inference
$H_0: \tau_1 = \tau_2 = \dots = \tau_g = 0$
$$F = \frac{SS(trt \mid x) / (g - 1)}{SSE / (N - g - 1)}$$
Compare treatment means after adjusting for differences among treatments due to differences in covariate levels.
We are not interested in testing whether the covariate (x) is significant or not
We could compute the efficiency of modeling x
ANCOVA Example
Example: Data in the following example are selected from a larger experiment on the use of drugs in the treatment of leprosy (Snedecor and Cochran, 1967, p. 422).
Variables in the study are as follows:
Drug: two antibiotics (A and D) and a control (F)
PreTreatment: a pretreatment score of leprosy bacilli
PostTreatment: a posttreatment score of leprosy bacilli
Ten patients are selected for each treatment (Drug), and six sites on each patient are measured for leprosy bacilli.
The covariate (a pretreatment score) is included in the model for increased precision in determining the effect of drug treatments on the posttreatment count of bacilli.
ANCOVA Example
data DrugTest;
input Drug $ PreTreatment PostTreatment @@;
datalines;
A 11 6 A 8 0 A 5 2 A 14 8 A 19 11
A 6 4 A 10 13 A 6 1 A 11 8 A 3 0
D 6 0 D 6 2 D 7 3 D 8 1 D 18 18
D 8 4 D 19 14 D 8 9 D 5 1 D 15 9
F 16 13 F 13 10 F 11 18 F 9 5 F 21 23
F 16 12 F 12 5 F 12 16 F 7 1 F 12 20
;
ANCOVA Example
Perform ANOVA and compute Drug LS-means:
proc glm data=DrugTest;
class Drug;
model PostTreatment = Drug / solution;
lsmeans Drug / stderr pdiff cov out=adjmeans;
run;
proc print data=adjmeans; run;
ANCOVA Example
Perform a parallel-slopes analysis of covariance with PROC GLM, and compute Drug LS-means:
proc glm data=DrugTest;
class Drug;
model PostTreatment = Drug PreTreatment / solution;
lsmeans Drug / stderr pdiff cov out=adjmeans;
run;
proc print data=adjmeans; run;
This model assumes that the slopes relating posttreatment scores to pretreatment scores are parallel for all drugs.
You can check this assumption by including the interaction, Drug*PreTreatment.
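The equal-slopes check can also be worked by hand from the data listed in the DATA step. The Python sketch below is an illustration rather than the SAS procedure itself: the helper `corrected_sums` and all variable names are hypothetical. It compares the error sum of squares of the common-slope model with that of the separate-slopes model, which is the extra sum of squares attributed to Drug*PreTreatment.

```python
# Leprosy data from the DATA step: (PreTreatment, PostTreatment) per drug
data = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}


def corrected_sums(points):
    """Return (Sxx, Sxy, Syy), the sums of squares and products about the means."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy


within = [corrected_sums(points) for points in data.values()]
g = len(data)
n_total = sum(len(points) for points in data.values())

# Separate slopes: every drug gets its own regression line
sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in within)

# Common slope: pool the within-drug sums, then fit a single slope
exx = sum(w[0] for w in within)
exy = sum(w[1] for w in within)
eyy = sum(w[2] for w in within)
sse_common = eyy - exy ** 2 / exx

# Extra sum of squares for the interaction, with g - 1 numerator df
ss_interaction = sse_common - sse_separate
f_slopes = (ss_interaction / (g - 1)) / (sse_separate / (n_total - 2 * g))
print(f"SS(interaction) = {ss_interaction:.4f}, F = {f_slopes:.3f}")
```

The statistic is referred to an F distribution with g - 1 and N - 2g degrees of freedom. A small value gives no evidence against parallel slopes.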
ANCOVA Example
The new graphical features of PROC GLM enable you to visualize the fitted analysis of covariance model.
ods graphics on;
proc glm data=DrugTest plot=meanplot(cl);
class Drug;
model PostTreatment = Drug PreTreatment;
lsmeans Drug / pdiff;
run;
ods graphics off;
In the SAS statements above, the PLOTS=MEANPLOT(CL) option adds confidence limits for the individual LS-means.
If you also specify the PDIFF option in the LSMEANS statement, the output also includes a plot appropriate for the type of LS-mean differences computed. In this case, the default is to compare all LS-means with each other pairwise, so the plot is a "diffogram" or "mean-mean scatter plot" (Hsu 1996).
Summary of graphs
The analysis of covariance plot (Fig 1) shows that the control (drug F) has higher posttreatment scores across the range of pretreatment scores, while the fitted models for the two antibiotics (drugs A and D) nearly coincide.
Similarly, while the diffogram (Fig 2) indicates that none of the LS-mean differences are significant, the difference between the LS-means for the two antibiotics is much closer to zero than the differences between either one and the control.
Plot 1 [figure not reproduced]
Plot 2 [figure not reproduced]
Example2: with interaction
This model assumes that the slopes relating posttreatment scores to pretreatment scores are parallel for all drugs.
The Type I SS for Drug (293.6) gives the between-drug sum of squares that is obtained for the analysis-of-variance model PostTreatment=Drug.
This measures the difference between arithmetic means of posttreatment scores for different drugs, disregarding the covariate.
The Type III SS for Drug (68.5537) gives the Drug sum of squares adjusted for the covariate.
This measures the differences between Drug LS-means, controlling for the covariate.
The Type I test is highly significant (p=0.001), but the Type III test is not. This indicates that, while there is a statistically significant difference between the arithmetic drug means, this difference is reduced to below the level of background noise when you take the pretreatment scores into account.
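The two Drug sums of squares quoted above can be recomputed directly from the data in the DATA step. The Python sketch below is illustrative only; the helper `corrected_sums` and the variable names are hypothetical. The Type I value is the ordinary between-drug sum of squares, and the Type III value is the reduction in error sum of squares when Drug is added to a model that already contains PreTreatment.

```python
# Leprosy data from the DATA step: (PreTreatment, PostTreatment) per drug
data = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}


def corrected_sums(points):
    """Return (Sxx, Sxy, Syy), the sums of squares and products about the means."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy


all_points = [p for points in data.values() for p in points]
n_total, g = len(all_points), len(data)
grand_mean = sum(y for _, y in all_points) / n_total

# Type I SS (Drug entered first): between-drug SS of the raw means
ss_type1 = sum(
    len(points) * (sum(y for _, y in points) / len(points) - grand_mean) ** 2
    for points in data.values()
)

# Error SS of the full parallel-slopes model (Drug + PreTreatment)
within = [corrected_sums(points) for points in data.values()]
exx = sum(w[0] for w in within)
exy = sum(w[1] for w in within)
eyy = sum(w[2] for w in within)
sse_full = eyy - exy ** 2 / exx

# Error SS of the reduced model (PreTreatment only, Drug ignored)
txx, txy, tyy = corrected_sums(all_points)
sse_reduced = tyy - txy ** 2 / txx

# Type III SS for Drug: Drug adjusted for the covariate
ss_type3 = sse_reduced - sse_full

mse = sse_full / (n_total - g - 1)
print(f"Type I  SS = {ss_type1:.4f}, F = {ss_type1 / (g - 1) / mse:.2f}")
print(f"Type III SS = {ss_type3:.4f}, F = {ss_type3 / (g - 1) / mse:.2f}")
```

Both F ratios use the error mean square of the full model with N - g - 1 degrees of freedom, so the drop from the Type I to the Type III value reflects only the adjustment for the pretreatment score.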
From the table of parameter estimates, you can derive the least-squares predictive formula for estimating posttreatment score based on pretreatment score and drug.
The above results show the LS-means, which are, in a sense, the means adjusted for the covariate.
The STDERR option in the LSMEANS statement causes the standard error of the LS-means and the probability of getting a larger t value under the hypothesis H0: LS-mean = 0 to be included in this table as well.
Specifying the PDIFF option causes all probability values for the hypothesis H0: LS-mean(i) = LS-mean(j) to be displayed, where the indexes i and j are numbered treatment levels.
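To make "means adjusted for the covariate" concrete, the sketch below recomputes the adjusted means for the leprosy data in Python. It is an illustration under the parallel-slopes model, not SAS output, and the variable names are hypothetical. Each raw drug mean is moved along the common slope to the overall mean pretreatment score.

```python
# Leprosy data from the DATA step: (PreTreatment, PostTreatment) per drug
data = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}

n_total = sum(len(points) for points in data.values())
x_grand = sum(x for points in data.values() for x, _ in points) / n_total

# Raw means of the covariate and the response for each drug
means = {}
for drug, points in data.items():
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    means[drug] = (mx, my)

# Common slope pooled over the within-drug deviations
exy = sum((x - means[d][0]) * (y - means[d][1])
          for d, points in data.items() for x, y in points)
exx = sum((x - means[d][0]) ** 2
          for d, points in data.items() for x, _ in points)
beta_hat = exy / exx

# Adjusted mean: raw mean shifted to the overall mean pretreatment score
ls_means = {d: my - beta_hat * (mx - x_grand) for d, (mx, my) in means.items()}

for drug in data:
    print(f"{drug}: raw mean = {means[drug][1]:.2f}, "
          f"adjusted mean = {ls_means[drug]:.3f}")
```

The control group F started with the highest pretreatment scores, so its adjusted mean is pulled down, while the adjusted means of A and D move up slightly and end up close together.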
SAS applications
Run 1: constant slopes
PROC GLM;
CLASS TRT;
MODEL Y=TRT X;
LSMEANS TRT/DIFF;
RUN;
Run 2: separate slopes
PROC GLM;
CLASS TRT;
MODEL Y=TRT X X*TRT/NOINT SOLUTION;
RUN;
Run 3: separate slopes (inflates TRT sum of squares; order matters)
PROC GLM;
CLASS TRT;
MODEL Y=X TRT X*TRT/NOINT SOLUTION;
RUN;
Consider the previous example
Run 4: separate slopes
Test for equal slopes: H0: all $\beta_i$ equal ($\beta$)
PROC GLM;
CLASS TRT;
MODEL Y=TRT X X*TRT/NOINT SOLUTION;
CONTRAST 'EQUAL SLOPES' X*TRT 1 0 0 -1,
X*TRT 0 1 0 -1,
X*TRT 0 0 1 -1;
RUN;
Run 5: equal slopes model:
$y_{ij} = \mu + \tau_i + \beta x_{ij} + \varepsilon_{ij}$
PROC GLM;
CLASS TRT;
MODEL Y=TRT X/SOLUTION;
*LSMEANS TRT/DIFF;
*LSMEANS TRT/AT X=0;
ESTIMATE 'INTCPT T=1' INTERCEPT 1 TRT 1 0 0 0;
ESTIMATE 'INTCPT T=2' INTERCEPT 1 TRT 0 1 0 0;
ESTIMATE 'INTCPT T=3' INTERCEPT 1 TRT 0 0 1 0;
ESTIMATE 'INTCPT T=C' INTERCEPT 1 TRT 0 0 0 1;
ESTIMATE 'MEAN AT T=1' INTERCEPT 1 TRT 1 0 0 0 X 346.75;
ESTIMATE 'MEAN AT T=2' INTERCEPT 1 TRT 0 1 0 0 X 371.375;
ESTIMATE 'MEAN AT T=3' INTERCEPT 1 TRT 0 0 1 0 X 380.375;
ESTIMATE 'MEAN AT T=C' INTERCEPT 1 TRT 0 0 0 1 X 414.125;
RUN;
Run 1: ANOVA without Covariate
Run 2: ANOVA with Covariate, equal slopes
Run 3: ANOVA with Covariate, separate slopes
Note
The total variation in the response (SST) is equal to the sum of the:
Variation explained by the treatment (SSA), plus the
Variation explained by the covariate, plus the
Variation explained by the interaction between the factor levels and the covariate (hopefully small), plus the
Variation explained by the error term.
Since the factor levels and the covariate are dependent in non-orthogonal data, the sequential sums of squares depend on the order of fitting. Fitting the covariate first can inflate the variation attributed to the treatment, potentially producing an invalid positive result.
So put the treatment variable first in the model.
ANCOVA
Can incorporate a covariate into any model
For example: constant slope model for a two-factor model
$$y_{ijk} = \mu + \tau_i + \gamma_j + (\tau\gamma)_{ij} + \beta x_{ijk} + \varepsilon_{ijk}$$
Assume constant slope for each (i, j) combination
Can include interaction terms to vary the slope
Plot y vs x for each combination
Summary
If you have covariates, use them. They will improve your confidence intervals or identify that you have a problem.
Order matters in fitting.
In ANCOVA, fit the treatment variable first. You're interested in the effect of the treatment, not of the control variable.
If the interaction between the treatment and control variables is significant, stop!
It means the slopes differ significantly, which is a (nasty) problem.
Summary
Effectiveness of ANCOVA can be measured as
$$RE = \frac{MSE_{ANOVA} - MSE_{ANCOVA}}{MSE_{ANOVA}}$$
ANCOVA and ANOVA need not necessarily lead to the same conclusion on treatment effect
If X is a pre-treatment measure of Y and the slope of the regression of Y on X is known to be one, then we could do ANOVA on Y - X instead of ANCOVA
Summary
Unequal n's or unbalanced designs
Under the MCAR (missing data completely at random) assumption:
SAS Type III Sum of Squares provides a test of the partial effects,
all submodels are compared to the overall model
$$Y_{ij} = \mu + \tau_i + \beta x_{ij} + \varepsilon_{ij}$$
Summary
SAS Type I SS
SAS model statement: (testing the equality of slopes assumption in ANCOVA)
model y = trt cov trt*cov;
SS(trt | $\mu$)
SS(cov | $\mu$, trt)
SS(trt*cov | $\mu$, trt, cov)
For Type I SS, the sums of squares of all effects add up to the model SS:
SS(trt) + SS(cov) + SS(trt*cov) + SS(error) = SS(total)
SSs are also independent
Summary
SAS Type II SS
SAS model statement: (testing the equality of slopes assumption in ANCOVA)
model y = trt cov trt*cov;
SS(trt | $\mu$, cov)
SS(cov | $\mu$, trt)
SS(trt*cov | $\mu$, trt, cov)
Type II SS do NOT necessarily add up to the model SS:
SS(trt) + SS(cov) + SS(trt*cov) + SS(error) $\neq$ SS(total)
SSs are NOT independent
Summary
SAS Type III: Partial Sum of Squares
SAS model statement: (testing the equality of slopes assumption in ANCOVA)
model y = trt cov trt*cov;
SS(trt | $\mu$, cov, trt*cov)
SS(cov | $\mu$, trt, trt*cov)
SS(trt*cov | $\mu$, trt, cov)
Type III SS do NOT necessarily add up to the model SS:
SS(trt) + SS(cov) + SS(trt*cov) + SS(error) $\neq$ SS(total)
SSs are NOT independent