Ou of Syllabus

7/30/2019 Ou of Syllabus


LECTURE NOTES

Multiple Factor Designs

Sept 8-Sept 20


Day 7: Two or More Factor Designs

RCBD

LSD

ANCOVA

Factorial Designs


Introduction

What happens when there is more than one factor?

Vary one factor at a time

Study the factors jointly

Situations

One controllable confounding factor (RCBD)

More than one controllable confounding factor (LSD)

One or more recordable but uncontrollable factors (ANCOVA)

Several factors of interest (Factorial Design)




Confounding factors are factors that we strongly expect to influence the dependent variable (Y) but that are not the primary factor whose effect on Y we wish to test.

A nuisance or confounding factor is a factor that probably has some effect on the response but is of no interest to the experimenter; however, the variability it transmits to the response needs to be minimized.

Typical nuisance factors include batches of raw material, pieces of test equipment, time (shifts, days, etc.), and different experimental units.

Physical entities with similar characteristics (plots of land, genetically similar animals or litter mates)


These variables are not controlled by the analyst but can affect the outcome of the treatment being studied.

If they are unknown, we hope to have controlled them via randomization.

If they are known and controllable: blocking

If they are known but uncontrollable: ANCOVA


Blocking is a type of constrained randomization that can be used to control confounding by creating "blocks", or homogeneous strata, within which we can examine all of the treatments.

This avoids the possibility that the treatments are imbalanced with respect to the confounding factor(s), which would leave us unsure whether the results are due more to an unfortunate arrangement of the confounders than to the treatment effects themselves.

RCBD: Randomized Complete Block Design


The key objective in blocking the experimental units is:

To make them as homogeneous as possible within blocks with respect to the response variable under study

To make the different blocks as heterogeneous as possible with respect to the response variable under study

When each treatment is included exactly once in each block, the design is called an RCBD.
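The constrained randomization behind an RCBD can be sketched in a few lines. This is a minimal illustration, not the course's procedure: the function name `randomize_rcbd`, the seed, and the choice of three treatments in seven blocks (borrowed from the lymphocyte example later in these notes) are assumptions made for the sketch.

```python
import random


def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independent random ordering of the treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # copy so that each block is shuffled separately
        rng.shuffle(order)
        layout.append(order)
    return layout


# Hypothetical run: treatments P, A, B assigned to the units of 7 blocks.
layout = randomize_rcbd(["P", "A", "B"], n_blocks=7, seed=2019)
for block, order in enumerate(layout, start=1):
    print(f"block {block}: {' '.join(order)}")
```

Each block receives every treatment exactly once; only the order within the block is random, which is what distinguishes this from the unrestricted randomization of a CRD.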


The reason for blocking is that we hope to reduce the error sum of squares by attributing an additional component of that error to the variation among blocks.

If successful, this results in a much smaller MSE (more precise results than a CRD).

A smaller MSE makes the value of the test statistic larger; however, the smaller number of degrees of freedom for MSE makes the critical value, the $100(1-\alpha)$ percentile point of the F distribution, larger.

Generally, though, the change in MSE has a much greater effect on the test statistic, making the test more powerful.


In addition to testing the factor of interest, this variance-reducing design can also help with understanding:

Whether the process is robust to nuisance conditions

Whether blocking is necessary in future experiments

It is highly desirable that the experimental units (EUs) within each block are processed together whenever this helps to reduce experimental error variability.

Example: if the experimenter might change the administration of the experiment to the subjects over time, consecutive processing of EUs block by block removes such sources of variation from within the blocks, leading to more precise results.


In setting up a randomized block experiment with a levels of the treatment factor and b blocks, the blocks can represent either a random or a fixed factor.

Random blocks correspond to a situation where we have sampled b levels from a larger population of possible blocking levels.

Hence the conclusions we draw from the experiment can be extrapolated to the larger population from which we sampled.

Fixed blocks correspond to the situation where we have chosen to examine a specific set of b blocking levels,

and the results of our experiment apply only to those b levels (no extrapolation to other levels).


Example 1: an experiment on the effects of vitamin C on the prevention of colds.

868 children were randomly assigned to a daily treatment (500 mg or 1000 mg of vitamin C) or a placebo (an identical tablet with no vitamin C).

Response of interest = number of colds contracted by each child.

The study showed no difference in the average number of colds between the treatment groups and the placebo group.

Other factors that may affect the number of colds contracted include gender, age, nutritional habits of the child, etc.

Factors such as these, which may affect the response but are not of primary interest to the investigator, are referred to as nuisance or confounding factors.

In blocked experiments, the heterogeneous experimental units are divided into homogeneous subgroups called blocks, and separate experiments are conducted in each block.

For example, blocking by gender would mean doing the experiment on males and females separately.


Example 2:

An investigator is interested in testing the effects of drugs A and B on the lymphocyte count in mice by comparing A, B and a placebo, P.

In designing the experiment, he assumed that mice from the same litter would be more homogeneous in their response than mice from different litters.

He arranged the experiment as an RCBD with three litter mates forming each block and a total of 7 blocks.

In each block the litter mates were randomly assigned to the treatments, resulting in the following data at the conclusion of the experiment (lymphocyte count given in units of 1000 per cubic mm of blood).


Lymphocyte count in mice

                      Blocks
Treatment     1     2     3     4     5     6     7    Mean
P           5.4   4.0   7.0   5.8   3.5   7.6   5.5    5.54
A           6.0   4.8   6.9   6.4   5.5   9.0   6.8    6.49
B           5.1   3.9   6.5   5.6   3.9   7.0   5.4    5.34
Mean       5.50  4.23  6.80  5.93  4.30  7.87  5.90    5.79


Analysis assuming CRD: effects of treatment on the lymphocyte count in mice

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Treatment           2             5.22          2.61      1.47    0.256
Error              18            32.02          1.78
Corrected Total    20            37.24

What will be the difference if we analyze the data taking the blocking-by-litter effect into account?


The RCBD Model

$$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2, \dots, a; \quad j = 1, 2, \dots, b$$

$y_{ij}$ = the observation in the $i$th treatment in the $j$th block

$\mu$ = overall mean

$\tau_i$ = the effect of the $i$th treatment

$\beta_j$ = the effect of the $j$th block

$\varepsilon_{ij}$ = random error

No interaction between blocks and treatments


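Properties of the model

$\sum_{i=1}^{a} \tau_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$

$E(\varepsilon_{ij}) = 0$, which implies $E(Y_{ij}) = \mu_{ij} = \mu + \tau_i + \beta_j$

and

$\mathrm{Var}(Y_{ij}) = \sigma^2$

$Y_{ij} \sim N(\mu_{ij}, \sigma^2)$

$\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) = 0$ for $j \neq k$

$\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{lk}) = 0$ for $j \neq k$ and $i \neq l$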


The additive model

$$E(Y_{ij}) = \mu + \tau_i + \beta_j$$

implies that the expected values of observations in different blocks for the same treatment may differ, but the treatment effects are the same for all blocks.

Interaction between blocks and treatments is still possible and can be checked (Tukey's additivity test).


Statistical Inference

Under the stated assumptions, we could use OLS or MLE to estimate the parameters.

Hypothesis

$$H_0: \mu_1 = \mu_2 = \dots = \mu_a \qquad H_1: \text{not } H_0$$

Partition the sum of squares: two-way ANOVA


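TWO WAY ANOVA

[Diagram: SST (total sum of squares) splits into SStr (treatment sum of squares) and SSE (error sum of squares); in the two-way analysis that SSE is split further into SSB (sum of squares for blocks) and a smaller SSE (sum of squares for error).]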


[Figure: Randomized Block Design. Individual observations laid out by a single independent variable and a blocking variable.]


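Partition the total sum of squares (SST) into three components (SStr, SSb and SSE):

$$\underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{..})^2}_{SST} = \underbrace{b \sum_{i=1}^{a} (\bar{Y}_{i.} - \bar{Y}_{..})^2}_{SStr} + \underbrace{a \sum_{j=1}^{b} (\bar{Y}_{.j} - \bar{Y}_{..})^2}_{SSb} + \underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2}_{SSE}$$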


Two-way ANOVA for RCBD

The degrees of freedom for the sums of squares in

$$SST = SStr + SSB + SSE$$

are as follows:

$$ab - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1)$$

Ratios of sums of squares to their degrees of freedom give mean squares.

We could use Cochran's theorem to determine the distribution of the ratio of the mean squares used to test the hypothesis H0: equal treatment means.


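Expected mean squares

$E(MSE) = \sigma^2$

$$E(MStr) = \sigma^2 + \frac{b \sum_{i=1}^{a} \tau_i^2}{a - 1} = \sigma^2 + \frac{b \sum_{i=1}^{a} (\mu_{i.} - \mu_{..})^2}{a - 1}$$

$$E(MSB) = \sigma^2 + \frac{a \sum_{j=1}^{b} \beta_j^2}{b - 1}$$

Exercise: 95% CI for $\sigma^2$

Thus, we can test the treatment-effect hypothesis H0: all $\tau_i = 0$ vs H1: not H0 with the statistic

$$F = \frac{MStr}{MSE} \sim F(a - 1, (a - 1)(b - 1); 0)$$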


Under a specific alternative hypothesis with given values for the $\tau_i$'s, this test statistic has a non-central F distribution with non-centrality parameter $\lambda$:

$$F = \frac{MStr}{MSE} \sim F(a - 1, (a - 1)(b - 1); \lambda)$$

where

$$\lambda = \frac{b \sum_{i=1}^{a} \tau_i^2}{\sigma^2}, \text{ which is the same as in the CRD}$$


ANOVA Table

Source of variation   Degrees of freedom*   Sum of squares (SS)   Mean square (MS)     F
Blocks (B)            b-1                   SSB                   SSB/(b-1)            MSB/MSE
Treatments (Tr)       a-1                   SStr                  SStr/(a-1)           MStr/MSE
Error (E)             (a-1)(b-1)            SSE                   SSE/((a-1)(b-1))
Total (Tot)           ab-1                  SST

* where a = number of treatments and b = number of blocks or replications.


Exercise: If only two treatments are investigated (a = 2) in an RCBD, it can be shown that the F test for treatment effects given above is equivalent to the two-sided t-test for paired observations.
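The exercise can be checked numerically before it is proved. The sketch below uses only the placebo (P) and drug A rows of the lymphocyte table, treating each litter as a pair; restricting the data to two treatments is an assumption made purely for illustration, and the variable names are not from the notes.

```python
import math

placebo = [5.4, 4.0, 7.0, 5.8, 3.5, 7.6, 5.5]
drug_a = [6.0, 4.8, 6.9, 6.4, 5.5, 9.0, 6.8]
b = len(placebo)  # number of blocks (litters)

# Paired t statistic computed from the within-litter differences.
diffs = [x - y for x, y in zip(drug_a, placebo)]
d_bar = sum(diffs) / b
s2_d = sum((d - d_bar) ** 2 for d in diffs) / (b - 1)
t_stat = d_bar / math.sqrt(s2_d / b)

# RCBD F statistic with a = 2 treatments and b blocks.
rows = [placebo, drug_a]
grand = sum(map(sum, rows)) / (2 * b)
trt_means = [sum(r) / b for r in rows]
blk_means = [(x + y) / 2 for x, y in zip(placebo, drug_a)]
sstr = b * sum((m - grand) ** 2 for m in trt_means)
sse = sum(
    (y - tm - bm + grand) ** 2
    for r, tm in zip(rows, trt_means)
    for y, bm in zip(r, blk_means)
)
f_stat = (sstr / 1) / (sse / (b - 1))  # error df = (a-1)(b-1) = b-1

print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```

The two printed numbers agree up to floating-point rounding, which is the numerical face of the identity $F = t^2$ on $(1, b-1)$ degrees of freedom.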


Blocking effect

The test will be more powerful here for the same values of b (r in the previous case) and the $\tau_i$'s, because if the blocking was appropriate (i.e. if the blocking factor had a pronounced effect on the outcome) we will usually have a much smaller error variance than if we did not block, resulting in a much larger non-centrality parameter.

However, we will be looking at power under the slightly different condition of having fewer df for MSE.

While this requires a larger critical value for significance (reducing the power if all other things were held equal), it is usually more than made up for by the large reduction in the error variance $\sigma^2$ achieved by the design.

Beware that if your blocks have no really important effect on the outcome, you could potentially lose power by blocking.

Thus, it is important to block only when you have solid evidence that you are likely to gain something by adding this feature.


Blocking effect

Successful blocking minimizes the variance among units within blocks while maximizing the variance among blocks.

Since precision usually decreases as the number of experimental units per block increases, block size should be kept as small as possible.


Analysis under RCBD: testing the effects of drugs A and B on the lymphocyte count in mice

Source       DF   Sum of Squares   Mean Square   F Value   Pr > F
Treatment     2             5.22          2.61     17.93   0.0002
Blocking      6            30.28          5.05     34.71
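The CRD and RCBD analyses of the lymphocyte data can be reproduced directly from the sum-of-squares partition. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the p-value helper relies on a closed form that is valid only because the treatment factor here has exactly 2 numerator degrees of freedom.

```python
# Lymphocyte counts (1000 per cubic mm); one list per treatment, one entry per litter.
counts = {
    "P": [5.4, 4.0, 7.0, 5.8, 3.5, 7.6, 5.5],
    "A": [6.0, 4.8, 6.9, 6.4, 5.5, 9.0, 6.8],
    "B": [5.1, 3.9, 6.5, 5.6, 3.9, 7.0, 5.4],
}


def anova_crd_and_rcbd(data):
    """Return sums of squares and F ratios for the CRD and RCBD analyses."""
    rows = list(data.values())
    a, b = len(rows), len(rows[0])  # a treatments, b blocks
    grand = sum(sum(r) for r in rows) / (a * b)
    trt_means = [sum(r) / b for r in rows]
    blk_means = [sum(r[j] for r in rows) / a for j in range(b)]

    sst = sum((y - grand) ** 2 for r in rows for y in r)
    sstr = b * sum((m - grand) ** 2 for m in trt_means)
    ssb = a * sum((m - grand) ** 2 for m in blk_means)

    mstr = sstr / (a - 1)
    mse_crd = (sst - sstr) / (a * (b - 1))               # blocks ignored
    mse_rcbd = (sst - sstr - ssb) / ((a - 1) * (b - 1))  # block variation removed
    return {
        "SST": sst, "SStr": sstr, "SSB": ssb,
        "F_crd": mstr / mse_crd,
        "F_rcbd": mstr / mse_rcbd,
        "F_block": (ssb / (b - 1)) / mse_rcbd,
        "df_crd": a * (b - 1),
        "df_rcbd": (a - 1) * (b - 1),
    }


def p_value_two_numerator_df(f, df2):
    """Upper-tail F probability; this closed form holds only for 2 numerator df."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


result = anova_crd_and_rcbd(counts)
p_crd = p_value_two_numerator_df(result["F_crd"], result["df_crd"])
p_rcbd = p_value_two_numerator_df(result["F_rcbd"], result["df_rcbd"])
print(f"CRD : F = {result['F_crd']:.2f} on (2, {result['df_crd']}) df, p = {p_crd:.3f}")
print(f"RCBD: F = {result['F_rcbd']:.2f} on (2, {result['df_rcbd']}) df, p = {p_rcbd:.5f}")
```

The treatment sum of squares is identical in both analyses; only the error term changes, because the block sum of squares is moved out of the error.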


Do we need to test the block effect?

Usually not of interest (we blocked for a reason)

Blocks are not randomized to experimental units

We can compute the ratio of variation explained by blocking to understand the impact of blocking

Trade-off: reduction in variance vs loss in degrees of freedom

Relative efficiency

$$RE_{RCBD:CRD} = \frac{(\nu_{RCBD} + 1)(\nu_{CRD} + 3)}{(\nu_{RCBD} + 3)(\nu_{CRD} + 1)} \cdot \frac{\sigma^2_{CRD}}{\sigma^2_{RCBD}}$$

where $\sigma^2_{CRD}$ and $\sigma^2_{RCBD}$ are the error variances, $\nu_{RCBD} = (a - 1)(b - 1)$ and $\nu_{CRD} = a(b - 1)$.

Ultimately, the loss in df will have little effect as long as a moderate number of error degrees of freedom are available.


$$\hat{\sigma}^2_{crd} = \frac{(b - 1) MSB + b(a - 1) MSE}{ab - 1}$$
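As a worked illustration of the relative-efficiency calculation, the sketch below plugs in mean squares taken from the lymphocyte ANOVA tables (SSB = 30.28 on 6 df; SSE = 37.24 - 5.22 - 30.28 on 12 df) and, separately, the textbook values MSB = 0.185 and MSE = 0.0083 with a = 5, b = 10 used later in these notes. The function names are assumptions made for this sketch.

```python
def crd_variance_estimate(msb, mse, a, b):
    """Estimate the error variance a CRD would have had, from RCBD mean squares."""
    return ((b - 1) * msb + b * (a - 1) * mse) / (a * b - 1)


def relative_efficiency(msb, mse, a, b):
    """Relative efficiency of RCBD to CRD with the degrees-of-freedom correction."""
    v_rcbd = (a - 1) * (b - 1)
    v_crd = a * (b - 1)
    df_correction = ((v_rcbd + 1) * (v_crd + 3)) / ((v_rcbd + 3) * (v_crd + 1))
    return df_correction * crd_variance_estimate(msb, mse, a, b) / mse


# Lymphocyte example: a = 3 treatments, b = 7 litters.
msb_mice = 30.28 / 6
mse_mice = (37.24 - 5.22 - 30.28) / 12
re_mice = relative_efficiency(msb_mice, mse_mice, a=3, b=7)
print(f"lymphocyte example: RE = {re_mice:.1f}")

# Textbook example: a = 5 doses, b = 10 blocks.
sigma2_crd_textbook = crd_variance_estimate(0.185, 0.0083, a=5, b=10)
print(f"textbook example: estimated CRD variance = {sigma2_crd_textbook:.4f}")
```

A relative efficiency well above 1 means a CRD would have needed correspondingly more replicates per treatment to match the precision achieved by blocking on litter.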


Assumptions

Normality: histogram and probability plot (Q-Q plot)

Additivity

Tukey's test of additivity (block x trt interaction)

If significant, it means the block effect differs across treatments

A log transformation could eliminate the interaction (non-additivity)

e.g. if $E(y_{ij}) = \mu \tau_i \beta_j$, then $\log(y_{ij}) = \mu + \tau_i + \beta_j + e_{ij}$, with the parameters redefined on the log scale

Constant variance (check by treatment and block)


John Tukey introduced the word "bit" as a contraction of "binary digit".

He used the term "software" in a computing context in a 1958 article.

He also pioneered many statistical methods

and articulated the important distinction between exploratory data analysis and

He retired in 1985. In 2000, he died in New Brunswick, New Jersey.


Checking the additivity assumption

(four approaches)

Tukey's test of additivity (block x trt interaction)

Plot of residuals against fitted values

A curvilinear pattern in the residuals suggests the presence of interaction and also suggests non-constancy of variances

A more effective plot is a plot of the responses Yij by blocks

The x-axis is treatment, the y-axis is response, and the overlaid lines are for each block.

Lack of parallelism is a strong indication that blocks and treatments interact in their effects on the response


Interaction test with a single replication per cell

Having a single replication per cell has so far prevented us from testing for an interaction effect similar to what is done in factorial designs.

But there is a special type of interaction that we can test, using Tukey's one-degree-of-freedom test of non-additivity.

We are interested in whether the model

$$Y_{ij} = \mu + \tau_i + \beta_j + \lambda \tau_i \beta_j + \varepsilon_{ij}, \qquad i = 1, \dots, a; \quad j = 1, \dots, b$$

is any better than the simple additive model.


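If the $\tau_i$ and $\beta_j$ were known, we might use a least squares approach to obtain an estimate of $\lambda$.

That is, find the estimate of $\lambda$ that minimizes

$$\sum_{i=1}^{a}\sum_{j=1}^{b} [Y_{ij} - (\mu + \tau_i + \beta_j + \lambda \tau_i \beta_j)]^2$$

under the model

$$Y_{ij} = \mu + \tau_i + \beta_j + \lambda \tau_i \beta_j + \varepsilon_{ij}, \qquad i = 1, \dots, a; \quad j = 1, \dots, b$$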


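Taking the usual estimates of the parameters $\mu$, $\tau_i$ and $\beta_j$, and minimizing

$$\sum_{i=1}^{a}\sum_{j=1}^{b} [Y_{ij} - (\hat{\mu} + \hat{\tau}_i + \hat{\beta}_j + \lambda \hat{\tau}_i \hat{\beta}_j)]^2$$

we get

$$\hat{\lambda} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} \hat{\tau}_i \hat{\beta}_j Y_{ij}}{\sum_{i=1}^{a} \hat{\tau}_i^2 \sum_{j=1}^{b} \hat{\beta}_j^2}$$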


Let $d_{ij} = \hat{\tau}_i \hat{\beta}_j$. Since

$$\sum_{i=1}^{a}\sum_{j=1}^{b} \hat{\tau}_i \hat{\beta}_j = 0$$

we can think of $\sum_i \sum_j d_{ij} Y_{ij}$ as a contrast in the means for $Y_{ij}$ with one replication per cell.

Rewriting the above expression after proper substitution, we get Tukey's sum of squares for non-additivity,

$$SS_{nonadd} = \frac{\left\{ \sum_{i=1}^{a}\sum_{j=1}^{b} d_{ij} Y_{ij} \right\}^2}{\sum_{i=1}^{a}\sum_{j=1}^{b} d_{ij}^2}$$

Since this is a contrast, it has one degree of freedom.


Let $SS_{remainder} = SSE - SS_{nonadd}$, with degrees of freedom $(a-1)(b-1) - 1 = (ab - a - b + 1) - 1 = ab - a - b$.

$SS_{nonadd}$ and $SS_{remainder}$ are orthogonal to each other and hence statistically independent (Cochran's theorem).

Thus we can test

$H_0: \lambda = 0$ vs $H_1$: not $H_0$

with

$$F = \frac{SS_{nonadd}}{SS_{remainder} / (ab - a - b)} \sim F(1, ab - a - b; 0)$$


Q. What do we do if we do not reject H0?

A. Most statisticians would pool SSnonadd and SSremainder into SSE and do the usual tests for the main effects of treatment and blocking.

In general, to reduce the type II error rate, a liberal type I error rate is used for the interaction test (alpha > 10%).

Q. What do we do if we reject H0?

A1. If the true model structure is not additive and we go ahead with the usual main-effects test using F = MStr/MSE, then

we will have a type I error level that is smaller than the nominal level.

We would get too few significant results, and the testing procedure would be conservative.

However, if we get a significant result with this test, we might feel that it would also have been significant with the proper kind of test based on a non-additive model.

A2. Make efforts to remove the interaction via transformations of Y (e.g. sqrt, log).


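Day 8: Regression Approach to RCBD

Example: consider an RCBD with 3 blocks and 2 levels of Treatment B.

Block as a fixed effect and a model of the form

$y_{ij} = \mu + \tau_i + \beta_j + e_{ij}$

In matrix form,

$$\mathbf{y} = X\mathbf{B}, \qquad \mathbf{B} = (\mu, \beta_1, \beta_2, \beta_3, \tau_1, \tau_2)'$$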


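Regression approach to test additivity

1. Fit the additive model $y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$

2. Obtain the residuals $r_{ij}$ (and the fitted values $\hat{y}_{ij}$)

3. Fit the additive model augmented with one extra regressor: $y_{ij} = \mu + \tau_i + \beta_j + \gamma \dfrac{\hat{y}_{ij}^2}{2\bar{y}_{..}} + \varepsilon_{ij}$

4. Obtain the residuals from (3), $r'_{ij}$

5. Tukey's sum of squares is $TSS = \sum r_{ij}^2 - \sum r_{ij}'^2$

6. $F = TSS/MSE \sim F(1, ab-a-b; 0)$, where MSE is the residual mean square from (3)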


Multiple Comparison in RCBD

Similar to the procedures in CRD

g (n) is replaced by a (b) in the formulas

Degrees of freedom for error are (b-1)(a-1)

t test: $t_{1-\alpha/2;\,(a-1)(b-1)}$

Tukey: $q_{1-\alpha;\,a,\,(a-1)(b-1)}$

Scheffe: $(a-1)\,F(1-\alpha;\,a-1,\,(a-1)(b-1))$


Variations to simple RCBD

There are some variations to the simple RCBD.

RCBD with replicates within blocks

Incomplete block designs (block designs with fewer EUs per block than treatments)

Blocking in two or more directions (LSD, GLSD, ...)


Example: Detergent Study

An experiment was designed to study the performance of four different detergents for cleaning clothes.

The following cleanness readings (higher = cleaner) were obtained with specially designed equipment for three different types of common stains (blocking factor).

Is there a difference among the detergents?


Replicated RCBD

Advantages of replicated RCBD:

The natural block size may result in more units per block than there are treatments, thus allowing for within-block replication

Within-block replication allows block*treatment interaction to be separated from experimental error, which may improve the interpretation of results when the block*treatment interaction is significant

Within-block replication may be used to assign extra replication to selected treatments to increase sensitivity for comparisons of interest

The key disadvantage is that a large block size (if it is not the natural block size) reduces the effectiveness of the blocking

It is also called the Generalized RCBD


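In the replicated RCBD the model is

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}, \qquad i = 1, \dots, a; \quad j = 1, \dots, b; \quad k = 1, \dots, s$$

where

$Y_{ijk}$ is the response for the $k$th subject in the $j$th block and $i$th treatment group,

$(\tau\beta)_{ij}$ is the interaction effect of the $i$th treatment with the $j$th block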


ANOVA Table

Source of variation      Degrees of freedom*   Sum of squares (SS)   Mean square (MS)      F
Blocks (B)               b-1                   SSB                   SSB/(b-1)             MSB/MSE
Treatments (Tr)          a-1                   SStr                  SStr/(a-1)            MStr/MSE
Block*Treatment (B*T)    (a-1)(b-1)            SSBT                  SSBT/((a-1)(b-1))     MSBT/MSE
Experimental Error (E)   ab(s-1)               SSE                   SSE/(ab(s-1))
Total (Tot)              abs-1                 SST

* where a = number of treatments, b = number of blocks and s = number of replications.


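Expected Mean Squares

Both treatment and block are fixed

$E(MSE) = \sigma^2$

$$E(MStr) = \sigma^2 + \frac{sb \sum_{i=1}^{a} \tau_i^2}{a - 1}$$

$$E(MSb) = \sigma^2 + \frac{sa \sum_{j=1}^{b} \beta_j^2}{b - 1}$$

$$E(MStb) = \sigma^2 + \frac{s \sum_{j=1}^{b}\sum_{i=1}^{a} (\tau\beta)_{ij}^2}{(a - 1)(b - 1)}$$

$$H_0: \tau_1 = \dots = \tau_a = 0, \qquad F = MStr / MSE$$

Expected Mean Squares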


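Only blocks are random, and hence the interaction is random too

$E(MSE) = \sigma^2$

$$E(MStr) = \sigma^2 + s\sigma^2_{\tau\beta} + \frac{sb \sum_{i=1}^{a} \tau_i^2}{a - 1}$$

$E(MSb) = \sigma^2 + sa\sigma^2_{\beta}$

$E(MStb) = \sigma^2 + s\sigma^2_{\tau\beta}$

$$H_0: \tau_1 = \dots = \tau_a = 0, \qquad F = MStr / MStb$$

Expected Mean Squares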


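Both effects are random

$E(MSE) = \sigma^2$

$E(MStr) = \sigma^2 + sb\sigma^2_{\tau} + s\sigma^2_{\tau\beta}$

$E(MSb) = \sigma^2 + sa\sigma^2_{\beta} + s\sigma^2_{\tau\beta}$

$E(MStb) = \sigma^2 + s\sigma^2_{\tau\beta}$

$$H_0: \sigma^2_{\tau} = 0, \qquad F = MStr / MStb$$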


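RCBD (one replication) with random block effect

If the blocks are random effects, then

$\beta_j \sim N(0, \sigma_\beta^2)$

$E(\varepsilon_{ij}) = 0$, which implies $E(Y_{ij}) = \mu_i = \mu + \tau_i$

and

$\mathrm{Var}(Y_{ij}) = \sigma^2 + \sigma_\beta^2$

$Y_{ij} \sim N(\mu_i, \sigma^2 + \sigma_\beta^2)$

$\mathrm{Cov}(Y_{ij}, Y_{lj}) = \sigma_\beta^2$ for $i \neq l$ (two observations in the same block)

$\mathrm{Cov}(Y_{ij}, Y_{lk}) = 0$ for $j \neq k$ (observations in different blocks)

Expected Mean Squares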


For an RCBD with a single replication, if only blocks are random:

$E(MSE) = \sigma^2$

$$E(MStr) = \sigma^2 + \frac{b \sum_{i=1}^{a} \tau_i^2}{a - 1}$$

$E(MSB) = \sigma^2 + a\sigma^2_{\beta}$

$$H_0: \tau_1 = \dots = \tau_a = 0, \qquad F = MStr / MSE$$

- Think of an unbiased estimator for $\sigma^2_{\beta}$


Incomplete Block Design (IBD)

We will see the analysis methods for IBD later

An IBD is a block design in which there are fewer experimental units per block than treatments.

One type of this design is the balanced incomplete block design (BIBD).

In this design every treatment pair occurs together within a block exactly the same number of times.

The reason for this type of design is to take advantage of the greater efficiency of smaller block sizes.

Example: suppose we are interested in testing four mosquito repellents.

The natural block size is two (our two arms), but we have 4 trts.

The following design is proposed, where the data are numbers of mosquito bites during a specified period of time.


BIBD design example

Treatments A, B, C, D; each subject (block) receives two of them

Subject   Observed bites
1         12    9
2          9    7
3         18   11
4          3    4
5          5    9
6         13   10

Treatment   A    B    C    D
Mean        13   9    8    10

- It is incomplete because not all treatments occur in each block
- It is considered balanced in the sense that each pair of treatments occurs together (within blocks) exactly the same number of times.
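The balance property can be checked mechanically for any proposed layout. The sketch below assumes a hypothetical layout in which each of the six subjects receives a different pair of the four repellents; this is the natural design for blocks of size two, but which pair each subject actually received is not recoverable from the table above, so the layout is an assumption.

```python
from collections import Counter
from itertools import combinations

treatments = ["A", "B", "C", "D"]

# Hypothetical layout: one block (subject) for every pair of treatments.
blocks = [set(pair) for pair in combinations(treatments, 2)]


def pair_counts(blocks):
    """Count how often each pair of treatments appears together in a block."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(blocks)
replications = Counter(t for block in blocks for t in block)
n_pairs = len(treatments) * (len(treatments) - 1) // 2

# Balanced: every possible pair occurs, and all pairs occur equally often.
is_balanced = len(counts) == n_pairs and len(set(counts.values())) == 1

print("replications per treatment:", dict(replications))
print("balanced:", is_balanced)
```

With this layout every treatment is replicated three times and every pair meets exactly once, which matches six blocks of size two for four treatments.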


Example 1: Insurance premium example

An analyst in insurance company A studied the premium for auto insurance in six cities.

The six cities were selected to represent different regions (East, West) and different sizes (small, medium and large).

Response = three-month premium charges for a certain category of risk.

The interest is to study the effect of city size, controlling for geographical region.

ANOVA

Source    df     SS     MS     F    P-value
City       2   9300   4650    93    0.0106
Region     1   1350   1350    27    0.0351
Error      2    100     50

Premium by city size and region

City size     E     W    Ave
Small       140   100    120
Medium      210   180    195
Large       220   200    210
Ave         190   160    175


Data Example: Insurance premium example

data insurance;
input premium city region;
datalines;
140 1 1
100 1 2
210 2 1
180 2 2
220 3 1
200 3 2
;
proc glm data=insurance;
class city region;
model premium = city region;
means city region / tukey;
run;
quit;

ANOVA

Source     df     SS     MS     F   P-value
City        2   9300   4650    93   .
Region      1   1350   1350    27   .
City*Reg    2    100     50
Error       0      .      .

Tukey's test

Obs   msa    msb   ssab      ssrem     f      p_value
1     4650   450   87.0968   12.9032   6.75   0.23391
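The Tukey one-degree-of-freedom output above can be cross-checked by hand from the six premiums. This is a minimal sketch, not the SAS macro that produced the listing; the function name is illustrative, and the p-value uses a closed form that is valid only because both degrees of freedom equal 1 here (an F on (1, 1) df is the square of a Cauchy variable).

```python
import math

# Rows: small, medium, large city; columns: East, West.
premium = [[140, 100], [210, 180], [220, 200]]


def tukey_nonadditivity(y):
    """Tukey's one-df test for a two-way layout with one observation per cell."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    tau = [sum(row) / b - grand for row in y]
    beta = [sum(y[i][j] for i in range(a)) / a - grand for j in range(b)]
    sse = sum(
        (y[i][j] - grand - tau[i] - beta[j]) ** 2
        for i in range(a) for j in range(b)
    )
    contrast = sum(tau[i] * beta[j] * y[i][j] for i in range(a) for j in range(b))
    ss_nonadd = contrast ** 2 / (sum(t * t for t in tau) * sum(v * v for v in beta))
    ss_rem = sse - ss_nonadd
    df_rem = a * b - a - b
    f = ss_nonadd / (ss_rem / df_rem)
    return sse, ss_nonadd, ss_rem, df_rem, f


sse, ss_nonadd, ss_rem, df_rem, f = tukey_nonadditivity(premium)
# Closed-form upper tail for F(1, 1) only.
p_value = 1 - (2 / math.pi) * math.atan(math.sqrt(f))
print(f"SSE = {sse:.0f}, SSnonadd = {ss_nonadd:.4f}, SSrem = {ss_rem:.4f}")
print(f"F = {f:.2f} on (1, {df_rem}) df, p = {p_value:.5f}")
```

The sums of squares of the estimated effects, 4650 for city and 450 for region, appear in the listing as `msa` and `msb`; the non-additivity F of 6.75 is not significant, so the additive model is retained.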



Creating RCBD in SAS



A.

B. use randomized block design and randomize the four

treatments to four flowers within each type



Text Book Example

(Page 121 of JL)


Response = number of lever presses / elapsed time of the session

Trt = 5 dosages of drug in mg/kg

What is the EU?



Two Way ANOVA



Examining Trend: since factor is quantitative


$$\hat{\sigma}^2_{crd} = \frac{(b - 1) MSB + b(a - 1) MSE}{ab - 1} = \frac{9(0.185) + 10(5 - 1)(0.0083)}{5(10) - 1} = 0.0408$$


$$\frac{\hat{\sigma}^2_{crd} - \hat{\sigma}^2_{rcb}}{\hat{\sigma}^2_{crd}}$$


Summary

Non-replicated RCBD:

When experimental units represent physical entities

Smaller blocks of EUs usually result in greater homogeneity

The larger the blocks, the less homogeneous

Replicated RCBD:

When EUs represent trials rather than physical entities and the experimental runs can be made quickly,

larger block sizes may not increase the variability of EUs within a block

Summary

Advantages of replicated RCBD

More error degrees of freedom

Interaction and error are not confounded

Can separate error and interaction SS

Easier assessment of additivity

Is good if blocks are expensive but observations are cheap

Consider example: tee height (golf)


Example of Generalized RCBD

Page 128 of text book

Objective

To determine if tee height affects golf driving distance


(a) Purpose

To recommend what tee height to use

(b) Identify sources of variation

Tee height

Golfer and ability level

Ball brand, club, wind speed

Repeat swings

(c) Choose a rule to assign experimental units to treatment factors

Complete Block Design: randomize the order in which each golfer hits a ball from each of the tee heights

Blocks will be golfers (takes into account differences in ability levels and clubs)

Random sample of golfers? (9 golfers)

Treatment factor: tee height; each golfer will hit 5 balls from each tee height in a randomized order

(d) Measurements to be made: 1) distance



Note:

-In the middle table pvalue


- Since the treatment effect is significant, we could investigate further with pairwise comparisons

- Note that the error term is block*trt

- Conclusion: tee your golf ball up so that half of the ball is above the crown of the driver club-face to maximize distance


The power of the F test for treatment effects in an RCBD involves the same non-centrality parameter as for a CRD.

But the two lead to different power levels. Why?

The variance ($\sigma^2$) will differ for the two designs

The degrees of freedom associated with the denominator also differ

Example:

Consider the textbook example for d-amphetamine


Day 9: Latin Square Design (LSD)

Due to Fisher (1935)

Agricultural experiments (e.g. fertility gradient of plots)

Industrial experiments (e.g. wear life of auto tires)

Pharmaceutical experiments (e.g. bioequivalence study)

The Latin Square Design


This design is used to simultaneously control (or eliminate) two independent sources of nuisance variability.

It is called Latin because we usually denote the treatments by Latin letters.

It is called square because it always has the same number of levels (t) for the row and column nuisance factors.

A significant assumption is that the three factors (treatments and two nuisance factors) do not interact.

More restrictive than the RCBD

Each treatment appears once and only once in each row and column

If you can block on two sources of variation (rows x columns), you can reduce experimental error compared with the RCBD.

It further reduces variability, increasing the sensitivity to detect a treatment effect.

[Figure: a 4 x 4 Latin square with treatments A, B, C and D]


In an LSD every treatment occurs in every row and column

Also, every row occurs in every column and vice versa
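One simple way to build a Latin square is to start from the cyclic square and randomly permute rows, columns and treatment labels. The sketch below is an illustration under that assumption; it does not sample uniformly from all possible Latin squares, and the function names are not from the notes.

```python
import random


def random_latin_square(treatments, seed=None):
    """Randomize a cyclic Latin square by permuting rows, columns and labels."""
    rng = random.Random(seed)
    t = len(treatments)
    # Standard cyclic square: cell (i, j) holds treatment index (i + j) mod t.
    square = [[(i + j) % t for j in range(t)] for i in range(t)]
    rng.shuffle(square)                           # permute rows
    cols = list(range(t))
    rng.shuffle(cols)                             # permute columns
    square = [[row[c] for c in cols] for row in square]
    labels = list(treatments)
    rng.shuffle(labels)                           # permute treatment labels
    return [[labels[k] for k in row] for row in square]


def is_latin(square):
    """Check that every symbol appears exactly once in each row and column."""
    t = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(t))
    return len(symbols) == t and rows_ok and cols_ok


square = random_latin_square(["A", "B", "C", "D"], seed=9)
for row in square:
    print(" ".join(row))
print("valid Latin square:", is_latin(square))
```

Permuting rows, columns or labels never breaks the once-per-row, once-per-column property, which is why the check passes for any seed.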


Advantages and Disadvantages

Advantage:

Allows the experimenter to control two sources of variation

Disadvantages:

The error degrees of freedom, (t-1)(t-2), are small if there are only a few treatments

The experiment becomes very large if the number of treatments is large

The statistical analysis is complicated by missing blocks and mis-assigned treatments

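The LSD Model

$$y_{ij(k)} = \mu + \tau_k + \rho_i + \gamma_j + \varepsilon_{ij(k)}, \qquad i = 1, 2, \dots, t; \quad j = 1, 2, \dots, t; \quad k = 1, 2, \dots, t$$

$y_{ij(k)}$ = the observation in the $i$th row and the $j$th column, receiving the $k$th treatment

$\mu$ = overall mean

$\tau_k$ = the effect of the $k$th treatment

$\rho_i$ = the effect of the $i$th row

$\gamma_j$ = the effect of the $j$th column

$\varepsilon_{ij(k)}$ = random error

No interaction between rows, columns and treatments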

A Latin square experiment is assumed to be a three-factor experiment.

The factors are rows, columns and treatments.

It is assumed that there is no interaction between rows, columns and treatments.

We can partition the sum of squares into four components:

SST = SSR + SSC + SStr + SSE

Usual F test under H0 using Cochran's theorem

The ANOVA Table for a Latin Square Experiment

Source   S.S.    d.f.         M.S.    F            p-value
Treat    SStr    t-1          MStr    MStr/MSE
Rows     SSRow   t-1          MSRow   MSRow/MSE
Cols     SSCol   t-1          MSCol   MSCol/MSE
Error    SSE     (t-1)(t-2)   MSE
Total    SST     t^2-1

LSD Text book Example


Purpose: to test the bioequivalence of three formulations (A = solution, B = tablet, C = capsule) of a drug

Response: concentration of the drug in the blood as a function of time since dosing

Three volunteers took the drug in succession after a washout period

After dosing, blood samples were taken every hour for four hours

Since metabolism may vary from subject to subject, subject is the row factor

Since metabolism could also vary from time to time, time is the column factor


The Graeco-Latin Square Design

This design is used to simultaneously control (or eliminate) three sources of nuisance variability.

It is called Graeco-Latin because we usually specify the third nuisance factor by Greek letters, orthogonal to the Latin letters.

A significant assumption is that the four factors (treatments and nuisance factors) do not interact.

If this assumption is violated, as with the Latin square design, it will not produce valid results.


A Graeco-Latin square consists of two Latin squares (one using the letters A, B, C, ..., the other using the Greek letters α, β, γ, ...) such that when the two Latin squares are superimposed on each other, each letter of one square appears once and only once with each letter of the other square. The two Latin squares are called mutually orthogonal.

Example: a 7 x 7 Graeco-Latin square

A B C D E F G
B C D E F G A
C D E F G A B
D E F G A B C
E F G A B C D
F G A B C D E
G A B C D E F
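Only the Latin letters of the 7 x 7 example survive in these notes, so the sketch below supplies one possible Greek square and checks orthogonality. The construction (cell (i, j) gets Latin index (i + j) mod 7 and Greek index (2i + j) mod 7) is a standard one for prime orders, but it is an assumption here and not necessarily the square shown on the original slide.

```python
t = 7
latin_letters = "ABCDEFG"
greek_letters = "αβγδεζη"

# Latin square matching the table above; Greek square from a second cyclic rule.
latin = [[(i + j) % t for j in range(t)] for i in range(t)]
greek = [[(2 * i + j) % t for j in range(t)] for i in range(t)]

# Orthogonal means every (Latin, Greek) combination occurs exactly once.
pairs = {(latin[i][j], greek[i][j]) for i in range(t) for j in range(t)}
orthogonal = len(pairs) == t * t

for i in range(t):
    print(" ".join(latin_letters[latin[i][j]] + greek_letters[greek[i][j]]
                   for j in range(t)))
print("mutually orthogonal:", orthogonal)
```

Because 7 is prime, the two cyclic rules give squares whose superposition produces all 49 letter pairs, which is exactly the once-and-only-once condition stated above.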

The GLSD Model


$$y_{ij(kl)} = \mu + \tau_k + \theta_l + \rho_i + \gamma_j + \varepsilon_{ij(kl)}, \qquad i, j, k, l = 1, 2, \dots, t$$

$y_{ij(kl)}$ = the observation in the $i$th row and the $j$th column, receiving the $k$th Latin treatment and the $l$th Greek treatment

$\mu$ = overall mean

$\tau_k$ = the effect of the $k$th Latin treatment

$\theta_l$ = the effect of the $l$th Greek treatment

$\rho_i$ = the effect of the $i$th row

$\gamma_j$ = the effect of the $j$th column

$\varepsilon_{ij(kl)}$ = random error

No interaction between rows, columns, Latin treatments and Greek treatments

A Graeco-Latin square experiment is assumed to be a four-factor experiment.

The factors are rows, columns, Latin treatments and Greek treatments.

It is assumed that there is no interaction between rows, columns, Latin treatments and Greek treatments.

Analysis of Covariance

ANCOVA

Introduction


Consider a factor x which is correlated with y BUT NOT with treatment

We can measure x but cannot control or predict it (as we can with blocks)

The nuisance factor x is called a covariate

ANCOVA adjusts y for the effect of the covariate x (retrospective adjustment for bias)

Without adjustment, the effects of x may

inflate $\sigma^2$

alter treatment comparisons

ANCOVA combines regression and ANOVA

The response variable is continuous

One or more explanatory factors (the treatments)

One or more continuous explanatory variables

The goal of ANCOVA is to reduce the error variance. This increases the power of tests and narrows the confidence intervals.

Analysis of covariance adjusts for measurable variables that affect the response but have nothing to do with the factors (treatments) in the experiment.

Model Description

Consider a single covariate in a CRD

The constant slope model is

$$y_{ij} = \mu + \tau_i + \beta x_{ij} + \varepsilon_{ij}$$

Assumptions

$x_{ij}$ is not affected by treatment

x and y are linearly related

Constant slope

Errors are normally and independently distributed

Equality of error variance for different trts

The non-constant slope model is

$$y_{ij} = \mu + \tau_i + \beta x_{ij} + (\tau\beta)_i x_{ij} + \varepsilon_{ij}$$

Additional assumptions

$x_{ij}$ is not affected by treatment

x and y are linearly related

There is interaction between x and treatment, and hence a non-constant slope

Examples

Pretest/posttest score analysis: the gain in score y may be associated with the pretest score x. Analysis of covariance provides a way to control for pretest differences. That way, one does not need to find a group of students with similar pretest scores and then randomly assign them to control and treatment groups.

Weight gain experiments in animals: when comparing different feeds, the weight gain y may be associated with the original weight of the animal.

Comparing competing drug products: the effect of drug A after two hours (measured on a scale from 1 to 10) may be associated with the initial state of the subject. Variables describing the initial state may be used as covariates.

Properties of ANCOVA Model

While in ANOVA $E(Y_{ij}) = \mu_i$, in ANCOVA this is not true because the mean depends on $x_{ij}$:

$$E(y_{ij}) = \mu_{ij} = \mu + \tau_i + \beta x_{ij}$$

Mean differences are the same at any value of x:

$$\mu_1 - \mu_2 = \tau_1 - \tau_2$$

Constancy of slopes: this is a crucial assumption, since if it is violated the difference between means cannot be summarized by a single number for the main effects.

If treatments interact with x, resulting in non-parallel lines, ANCOVA is not appropriate. In this case, separate treatment regression lines need to be estimated and then compared.

General Approach to ANCOVA

First look at the effect of $x_{ij}$. If it isn't significant, do an ANOVA and be done with it.

Check that $x_{ij}$ is not significantly affected by the factor levels.

Test that $\beta$ is not significantly different across factor levels.

This is an interaction between the factors and the covariates.

If there is an interaction, STOP!

If both tests pass, do the ANCOVA.

Model estimates

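Centering of X by its mean

$$y_{ij} = \mu + \tau_i + \beta (x_{ij} - \bar{x}_{..}) + \varepsilon_{ij}$$

$$\hat{\mu} = \bar{y}_{..}$$

$$\hat{\beta} = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_{i.})(x_{ij} - \bar{x}_{i.})}{\sum_i \sum_j (x_{ij} - \bar{x}_{i.})^2}$$

$$\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..} - \hat{\beta}(\bar{x}_{i.} - \bar{x}_{..})$$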

Inference


111/138

$H_0: \tau_1 = \tau_2 = \dots = \tau_g = 0$

$$F = \frac{SS(\mathrm{trt} \mid x)/(g-1)}{SSE/(N-g-1)}$$

Compare treatment means after adjusting for differences among treatments due to differences in covariate levels.

We are not interested in testing whether the covariate ($x$) is significant or not

We could compute efficiency of modeling $x$
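One way to read this F ratio is as a comparison of two residual sums of squares: SS(trt | x) is the drop in residual SS when treatment intercepts are added to a model that already contains the covariate. The sketch below is plain Python rather than SAS and is only an illustration. It assumes the leprosy data from the ANCOVA Example that follows in these notes, and the helper name `sse_common_slope` is hypothetical.

```python
# Leprosy data from the ANCOVA Example: (PreTreatment x, PostTreatment y) per drug.
DATA = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}


def sse_common_slope(groups):
    """Residual SS after fitting one intercept per group and a single common slope."""
    sxx = sxy = syy = 0.0
    for pairs in groups:
        n = len(pairs)
        xbar = sum(x for x, _ in pairs) / n
        ybar = sum(y for _, y in pairs) / n
        sxx += sum((x - xbar) ** 2 for x, _ in pairs)
        sxy += sum((x - xbar) * (y - ybar) for x, y in pairs)
        syy += sum((y - ybar) ** 2 for _, y in pairs)
    return syy - sxy ** 2 / sxx


groups = list(DATA.values())
g = len(groups)                      # number of treatments
N = sum(len(p) for p in groups)      # total number of observations

# Full model: y = trt + x (separate intercepts, common slope).
sse_full = sse_common_slope(groups)

# Reduced model under H0: y = x only (a single line through all observations).
pooled = [pair for p in groups for pair in p]
sse_reduced = sse_common_slope([pooled])

ss_trt_given_x = sse_reduced - sse_full
f_stat = (ss_trt_given_x / (g - 1)) / (sse_full / (N - g - 1))

print(f"SS(trt | x) = {ss_trt_given_x:.4f}")
print(f"F = {f_stat:.3f} on {g - 1} and {N - g - 1} df")
```

For these data the adjusted treatment sum of squares matches the Type III SS for Drug (68.5537) quoted later in the notes.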

ANCOVA Example


112/138

Example: Data in the following example are selected from a larger experiment on the use of drugs in the treatment of leprosy (Snedecor and Cochran 1967, p. 422).

Variables in the study are as follows:

Drug: two antibiotics (A and D) and a control (F)

PreTreatment: a pretreatment score of leprosy bacilli

PostTreatment: a posttreatment score of leprosy bacilli

Ten patients are selected for each treatment (Drug), and six sites on each patient are measured for leprosy bacilli.

The covariate (a pretreatment score) is included in the model for increased precision in determining the effect of drug treatments on the posttreatment count of bacilli.

ANCOVA Example


113/138

data DrugTest;
   input Drug $ PreTreatment PostTreatment @@;
   datalines;
A 11  6   A  8  0   A  5  2   A 14  8   A 19 11
A  6  4   A 10 13   A  6  1   A 11  8   A  3  0
D  6  0   D  6  2   D  7  3   D  8  1   D 18 18
D  8  4   D 19 14   D  8  9   D  5  1   D 15  9
F 16 13   F 13 10   F 11 18   F  9  5   F 21 23
F 16 12   F 12  5   F 12 16   F  7  1   F 12 20
;

ANCOVA Example


114/138

Perform ANOVA and compute Drug LS-means

proc glm data=DrugTest;
   class Drug;
   model PostTreatment = Drug / solution;
   lsmeans Drug / stderr pdiff cov out=adjmeans;
run;
proc print data=adjmeans;
run;

ANCOVA Example


115/138

Perform a parallel-slopes analysis of covariance with PROC GLM, and compute Drug LS-means

proc glm data=DrugTest;
   class Drug;
   model PostTreatment = Drug PreTreatment / solution;
   lsmeans Drug / stderr pdiff cov out=adjmeans;
run;
proc print data=adjmeans;
run;

This model assumes that the slopes relating posttreatment scores to pretreatment scores are parallel for all drugs.

You can check this assumption by including the interaction, Drug*PreTreatment.
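The check for parallel slopes amounts to comparing the residual sum of squares of the parallel-slopes model with that of a model that gives each drug its own slope. The following plain-Python sketch (an illustration, not the SAS procedure itself) does this for the DrugTest data; the helper name `sums_about_means` is hypothetical.

```python
# DrugTest data: (PreTreatment x, PostTreatment y) per drug.
DATA = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}


def sums_about_means(pairs):
    """Return (Sxx, Sxy, Syy) computed about the group's own means."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    syy = sum((y - ybar) ** 2 for _, y in pairs)
    return sxx, sxy, syy


sums = {d: sums_about_means(p) for d, p in DATA.items()}
g = len(DATA)
N = sum(len(p) for p in DATA.values())

# Separate-slopes model: each drug gets its own fitted line.
slopes = {d: sxy / sxx for d, (sxx, sxy, syy) in sums.items()}
sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums.values())

# Parallel-slopes model: one pooled slope, separate intercepts.
sxx_w = sum(s[0] for s in sums.values())
sxy_w = sum(s[1] for s in sums.values())
syy_w = sum(s[2] for s in sums.values())
sse_parallel = syy_w - sxy_w ** 2 / sxx_w

# F statistic for the Drug*PreTreatment interaction (H0: equal slopes).
ss_interaction = sse_parallel - sse_separate
f_slopes = (ss_interaction / (g - 1)) / (sse_separate / (N - 2 * g))

for d, b in slopes.items():
    print(f"slope for drug {d}: {b:.3f}")
print(f"F for equal slopes = {f_slopes:.3f} on {g - 1} and {N - 2 * g} df")
```

For these data the ratio comes out below 1, which gives no evidence against the parallel-slopes assumption.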

ANCOVA Example


116/138

The new graphical features of PROC GLM enable you to visualize the fitted analysis of covariance model.

ods graphics on;
proc glm data=DrugTest plot=meanplot(cl);
   class Drug;
   model PostTreatment = Drug PreTreatment;
   lsmeans Drug / pdiff;
run;
ods graphics off;

The PLOTS=MEANPLOT(CL) option adds confidence limits for the individual LS-means.

If you also specify the PDIFF option in the LSMEANS statement, the output also includes a plot appropriate for the type of LS-mean differences computed. In this case, the default is to compare all LS-means with each other pairwise, so the plot is a "diffogram" or "mean-mean scatter plot" (Hsu 1996).

ANCOVA Example


117/138

ANCOVA Example


118/138

Summary of graphs


119/138

The analysis of covariance plot, Fig 1, shows that the control (drug F) has higher posttreatment scores across the range of pretreatment scores, while the fitted models for the two antibiotics (drugs A and D) nearly coincide.

Similarly, while the diffogram, Fig 2, indicates that none of the LS-mean differences are significant, the difference between the LS-means for the two antibiotics is much closer to zero than the differences between either one and the control.

Plot 1


120/138

Plot 1

Plot 2


121/138

Plot 2

Example2: with interaction


122/138

Example2: with interaction

This model assumes that the slopes relating posttreatment scores to pretreatment scores are parallel for all drugs.


123/138

The Type I SS for Drug (293.6) gives the between-drug sums of squares that are obtained for the analysis-of-variance model PostTreatment=Drug. This measures the difference between arithmetic means of posttreatment scores for different drugs, disregarding the covariate.

The Type III SS for Drug (68.5537) gives the Drug sum of squares adjusted for the covariate. This measures the differences between Drug LS-means, controlling for the covariate.

The Type I test is highly significant (p=0.001), but the Type III test is not. This indicates that, while there is a statistically significant difference between the arithmetic drug means, this difference is reduced to below the level of background noise when you take the pretreatment scores into account.

From the table of parameter estimates, you can derive the least-squares predictive formula for estimating posttreatment score based on pretreatment score and drug.


124/138


The above results show the LS-means, which are, in a sense, the means adjusted for the covariate.

The STDERR option in the LSMEANS statement causes the standard error of the LS-means and the probability of getting a larger t value under the hypothesis H0: LS-mean = 0 to be included in this table as well.

Specifying the PDIFF option causes all probability values for the hypothesis H0: LS-mean(i) = LS-mean(j) to be displayed, where the indexes i and j are numbered treatment levels.
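The predictive formula and its link to the LS-means can be sketched as follows. This is plain Python, not SAS output, and assumes the parallel-slopes model fitted to the DrugTest data; the names `intercepts`, `predict`, and `ls_means` are illustrative. Each drug gets its own intercept, all drugs share one slope, and an LS-mean is the prediction at the overall mean pretreatment score.

```python
# DrugTest data: (PreTreatment x, PostTreatment y) per drug.
DATA = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}


def mean(values):
    return sum(values) / len(values)


x_bar = {d: mean([x for x, _ in p]) for d, p in DATA.items()}
y_bar = {d: mean([y for _, y in p]) for d, p in DATA.items()}

# Common slope pooled within drugs.
sxy = sum((x - x_bar[d]) * (y - y_bar[d]) for d, p in DATA.items() for x, y in p)
sxx = sum((x - x_bar[d]) ** 2 for d, p in DATA.items() for x, _ in p)
beta_hat = sxy / sxx

# Each fitted line passes through its drug's (xbar, ybar) point.
intercepts = {d: y_bar[d] - beta_hat * x_bar[d] for d in DATA}


def predict(drug, pre):
    """Predicted posttreatment score for a drug at a given pretreatment score."""
    return intercepts[drug] + beta_hat * pre


# LS-mean: prediction for each drug at the overall mean pretreatment score.
x_grand = mean([x for p in DATA.values() for x, _ in p])
ls_means = {d: predict(d, x_grand) for d in DATA}

for d in DATA:
    print(f"{d}: intercept {intercepts[d]:.3f}, LS-mean {ls_means[d]:.3f}")
```

Because the lines are parallel, the difference between two drugs' predictions is the same at every pretreatment score, which is why a single LS-mean difference summarizes the comparison.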

SAS applications



125/138

Run 1: constant slopes

PROC GLM;
   CLASS TRT;
   MODEL Y=TRT X;
   LSMEANS TRT/DIFF;
RUN;

Run 2: separate slopes

PROC GLM;
   CLASS TRT;
   MODEL Y=TRT X X*TRT/NOINT SOLUTION;
RUN;

Run 3: separate slopes (inflates TRT sum of squares; order matters)

PROC GLM;
   CLASS TRT;
   MODEL Y=X TRT X*TRT/NOINT SOLUTION;
RUN;

Consider the previous example



126/138

Run 4: separate slopes

Test for equal slopes: H0: all $\beta_i$ equal

PROC GLM;
   CLASS TRT;
   MODEL Y=TRT X X*TRT/NOINT SOLUTION;
   CONTRAST 'EQUAL SLOPES' X*TRT 1 0 0 -1,
                           X*TRT 0 1 0 -1,
                           X*TRT 0 0 1 -1;
RUN;

Run 5: equal slopes model:


127/138

$$y_{ij} = \mu + \tau_i + \beta x_{ij} + \varepsilon_{ij}$$

PROC GLM;
   CLASS TRT;
   MODEL Y=TRT X/SOLUTION;
   *LSMEANS TRT/DIFF;
   *LSMEANS TRT/AT X=0;
   ESTIMATE 'INTCPT T=1' INTERCEPT 1 TRT 1 0 0 0;
   ESTIMATE 'INTCPT T=2' INTERCEPT 1 TRT 0 1 0 0;
   ESTIMATE 'INTCPT T=3' INTERCEPT 1 TRT 0 0 1 0;
   ESTIMATE 'INTCPT T=C' INTERCEPT 1 TRT 0 0 0 1;
   ESTIMATE 'MEAN AT T=1' INTERCEPT 1 TRT 1 0 0 0 X 346.75;
   ESTIMATE 'MEAN AT T=2' INTERCEPT 1 TRT 0 1 0 0 X 371.375;
   ESTIMATE 'MEAN AT T=3' INTERCEPT 1 TRT 0 0 1 0 X 380.375;
   ESTIMATE 'MEAN AT T=C' INTERCEPT 1 TRT 0 0 0 1 X 414.125;
RUN;

Run 1: ANOVA without Covariate


128/138

Run 2: ANOVA with Covariate, equal slopes


129/138

Run 3: ANOVA with Covariate, separate slopes


130/138

Note


131/138

The total variation in the response (SST) is equal to the sum of the:

Variation explained by the treatment (SSA), plus the

Variation explained by the covariate, plus the

Variation explained by the interaction between the factor levels and the covariate (hopefully small), plus the

Variation explained by the error term.

Since the factor levels and the covariate are dependent in non-orthogonal data, fitting the covariate first inflates the variation explained by the treatment, potentially producing an invalid positive result.

So put the treatment variable first in the model.

ANCOVA


132/138

Can incorporate covariate into any model

For example: constant slope model for a two-factor model

$$y_{ijk} = \mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij} + \beta x_{ijk} + \varepsilon_{ijk}$$

Assume constant slope for each (i, j) combination

Can include interaction terms to vary slope

Plot $y$ vs $x$ for each combination

Summary


133/138

If you have covariates, use them. They will improve your confidence intervals or identify that you have a problem.

Order matters in fitting.

In ANCOVA, fit the treatment variable first. You're interested in the effect of the treatment, not of the control variable.

If the interaction between the treatment and control variables is significant, stop!

It means the slopes differ significantly, which is a (nasty) problem.

Summary


134/138

Effectiveness of ANCOVA can be measured as

$$RE = \frac{MSE_{ANOVA} - MSE_{ANCOVA}}{MSE_{ANOVA}}$$

ANCOVA and ANOVA need not necessarily lead to the same conclusion on treatment effect

If X is a pre-treatment measure of Y and the slope for the Y on X regression is known to be one, then we could do ANOVA on Y-X instead of ANCOVA
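As a rough numerical illustration, the effectiveness measure can be computed for the leprosy example. The sketch below is plain Python, not SAS. It takes the measure to be the relative reduction in error mean square, RE = (MSE_ANOVA - MSE_ANCOVA) / MSE_ANOVA, which is an assumption about the intended formula, and the variable names are illustrative.

```python
# Leprosy data from the ANCOVA Example: (PreTreatment x, PostTreatment y) per drug.
DATA = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}

g = len(DATA)
N = sum(len(p) for p in DATA.values())

# Within-treatment sums of squares and cross-products.
sxx = sxy = syy = 0.0
for pairs in DATA.values():
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx += sum((x - xbar) ** 2 for x, _ in pairs)
    sxy += sum((x - xbar) * (y - ybar) for x, y in pairs)
    syy += sum((y - ybar) ** 2 for _, y in pairs)

# One-way ANOVA ignores the covariate: error df = N - g.
mse_anova = syy / (N - g)

# Parallel-slopes ANCOVA removes the covariate's share: error df = N - g - 1.
mse_ancova = (syy - sxy ** 2 / sxx) / (N - g - 1)

re = (mse_anova - mse_ancova) / mse_anova

print(f"MSE (ANOVA)  = {mse_anova:.3f}")
print(f"MSE (ANCOVA) = {mse_ancova:.3f}")
print(f"RE           = {re:.3f}")
```

A value of RE near zero would mean the covariate bought little precision; a value near one would mean most of the error variation was attributable to the covariate.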

Summary


135/138

Unequal n's or unbalanced designs

Under the MCAR (missing data completely at random) assumption:

SAS Type III Sum of Squares provides a test of the partial effects,

all submodels are compared to the overall model

0

ij i ij ij

i i ij

Y x

x

m

Summary


136/138

SAS Type I SS

SAS model statement: (testing the equality of slopes assumption in ANCOVA)

model y = trt cov trt*cov;

SS(trt | μ)

SS(cov | μ, trt)

SS(trt*cov | μ, trt, cov)

For Type I SS, the sums of squares of all effects add up to the model SS, so that:

SS(trt)+SS(cov)+SS(trt*cov)+SS(error)=SS(total)

SSs are also independent

Summary


137/138

SAS Type II SS



SS(trt | μ, cov)

SS(cov | μ, trt)

SS(trt*cov | μ, trt, cov)

For Type II, the SS do NOT necessarily add up to the model SS:

SS(trt)+SS(cov)+SS(trt*cov)+SS(error) ≠ SS(total)

SSs are NOT independent
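The difference between sequential (Type I) and Type II sums of squares can be checked on the leprosy example. The following plain-Python sketch is an illustration only, not SAS output; the helper name `sums_about_means` and the variable names are hypothetical. It builds the sequential pieces for the order trt, cov, trt*cov, then swaps in the covariate-adjusted treatment SS to show that the Type II pieces no longer add up to the total.

```python
# Leprosy data from the ANCOVA Example: (PreTreatment x, PostTreatment y) per drug.
DATA = {
    "A": [(11, 6), (8, 0), (5, 2), (14, 8), (19, 11),
          (6, 4), (10, 13), (6, 1), (11, 8), (3, 0)],
    "D": [(6, 0), (6, 2), (7, 3), (8, 1), (18, 18),
          (8, 4), (19, 14), (8, 9), (5, 1), (15, 9)],
    "F": [(16, 13), (13, 10), (11, 18), (9, 5), (21, 23),
          (16, 12), (12, 5), (12, 16), (7, 1), (12, 20)],
}


def mean(values):
    return sum(values) / len(values)


def sums_about_means(pairs):
    """Return (Sxx, Sxy, Syy) computed about the means of the given pairs."""
    xbar = mean([x for x, _ in pairs])
    ybar = mean([y for _, y in pairs])
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    syy = sum((y - ybar) ** 2 for _, y in pairs)
    return sxx, sxy, syy


all_pairs = [pair for p in DATA.values() for pair in p]
y_grand = mean([y for _, y in all_pairs])
ss_total = sum((y - y_grand) ** 2 for _, y in all_pairs)

within = [sums_about_means(p) for p in DATA.values()]
sxx_w = sum(s[0] for s in within)
sxy_w = sum(s[1] for s in within)
syy_w = sum(s[2] for s in within)

# Sequential (Type I) pieces for: model y = trt cov trt*cov
ss_trt = sum(len(p) * (mean([y for _, y in p]) - y_grand) ** 2
             for p in DATA.values())                           # SS(trt | mu)
ss_cov = sxy_w ** 2 / sxx_w                                    # SS(cov | mu, trt)
ss_int = sum(sxy ** 2 / sxx for sxx, sxy, _ in within) - ss_cov  # SS(trt*cov | mu, trt, cov)
ss_error = syy_w - ss_cov - ss_int
type1_sum = ss_trt + ss_cov + ss_int + ss_error

# Type II treatment SS: SS(trt | mu, cov) = SSE(cov only) - SSE(trt + cov)
sxx_t, sxy_t, syy_t = sums_about_means(all_pairs)
ss_trt_adj = (syy_t - sxy_t ** 2 / sxx_t) - (syy_w - ss_cov)
type2_sum = ss_trt_adj + ss_cov + ss_int + ss_error

print(f"SS(total)        = {ss_total:.4f}")
print(f"Type I pieces    = {type1_sum:.4f}")
print(f"Type II pieces   = {type2_sum:.4f}")
```

The two treatment sums of squares computed here are the 293.6 (Type I) and 68.5537 (adjusted) values discussed for the leprosy example earlier in the notes.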

Summary


138/138

SAS Type III: Partial Sum of Squares



SS(trt | μ, cov, trt*cov)

SS(cov | μ, trt, trt*cov)

SS(trt*cov | μ, trt, cov)

For Type III, the SS do NOT necessarily add up to the model SS:

SS(trt)+SS(cov)+SS(trt*cov)+SS(error) ≠ SS(total)

SSs are NOT independent

Date post:	14-Apr-2018
Upload:	hisham-ali