1/87 Group 5 AMS 572 Professor: Wei Zhu. Foram Sanghvi :Brief review of ANOVA Shihui Xiang:...


Group 5
AMS 572
Professor: Wei Zhu

Foram Sanghvi: Brief review of ANOVA
Shihui Xiang: Introduction to Repeated Measures Design
Qianzhu Wu: One-way repeated measures ANOVA
Yue Tang: Using the REPEATED statement of PROC ANOVA
Yan Xu: Two-Factor ANOVA with Repeated Measures on One Factor
Weina Gao: Two-Factor experiments with Repeated Measures on both factors
Yi Hu: Three-Factor experiments with a repeated measure on the last factor
Xiaoke Fei: Three-Factor experiments with repeated measures on two factors
Yuzhou Song: Mixed Model


Foram Sanghvi


The one-way ANOVA tests the equality of several population means. It is an extension of the pooled-variance t-test. That is:

H0 (null hypothesis): µ1 = µ2 = µ3 = … = µa

Ha (alternative hypothesis): At least one of the means differs from the rest.

Assumptions:

Equal population variances
Normal populations
Independent samples

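Test statistic: $F_0 = \mathrm{MSA}/\mathrm{MSE} \sim F_{a-1,\,N-a}$ under H0

Conclusion: Reject H0 if $F_0 > F_{a-1,\,N-a,\,\alpha}$

MSA = variance of the group means (between-group mean square)

MSE = mean of the within-group variances (within-group mean square)

Total sample size: $N = \sum_{j=1}^{a} n_j$

Sample mean: $\bar{Y}_{\cdot j} = \frac{1}{n_j}\sum_{i=1}^{n_j} Y_{ij}$

Grand mean: $\bar{Y}_{\cdot\cdot} = \frac{1}{N}\sum_{j=1}^{a}\sum_{i=1}^{n_j} Y_{ij}$

$Y_{ij}$ = observed response from experimental unit i when receiving effect j, with $Y_{ij} \sim N(\mu_j, \sigma^2)$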
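The arithmetic behind F0 can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides: the function name `one_way_anova` is hypothetical, and the numbers are the pain scores from the drug example later in this deck, treated here (as an assumption, for illustration only) as four independent groups.

```python
# Pain scores by drug, copied from the drug example in these slides.
# Assumption for illustration only: the four lists are treated as
# independent groups, ignoring that the same subjects were reused.
groups = {
    1: [5, 7, 11, 3],
    2: [9, 12, 12, 8],
    3: [6, 8, 10, 5],
    4: [11, 9, 14, 8],
}


def one_way_anova(groups):
    """Return (MSA, MSE, F0, df_between, df_within) for a dict of samples."""
    a = len(groups)                               # number of groups
    N = sum(len(ys) for ys in groups.values())    # total sample size
    grand_mean = sum(sum(ys) for ys in groups.values()) / N
    ssa = 0.0   # between-group sum of squares
    sse = 0.0   # within-group sum of squares
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ssa += len(ys) * (mean - grand_mean) ** 2
        sse += sum((y - mean) ** 2 for y in ys)
    msa = ssa / (a - 1)
    mse = sse / (N - a)
    return msa, mse, msa / mse, a - 1, N - a


msa, mse, f0, df1, df2 = one_way_anova(groups)
print(f"MSA={msa:.4f}  MSE={mse:.4f}  F0={f0:.4f}  df=({df1}, {df2})")
```

F0 would then be compared with the upper critical value of the F distribution with (a-1, N-a) degrees of freedom.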


The most distinct disadvantage of the analysis of variance (ANOVA) method is that it requires two assumptions to be made:

1. All populations must be (roughly) normally distributed.

2. All variances from each data group must be (roughly) equal.

Obviously, we rarely have this luxury in real-world applications.

In addition, the normal subject-to-subject variation may strongly affect the error sum of squares.


A repeated measures design is one in which at least one of the factors consists of repeated measurements on the same subjects or experimental units, under different conditions.

A repeated measures design involves measuring subjects at different points in time (typically after different treatments). It can be viewed as an extension of the paired-samples t-test (which involves only two related measures). Thus the measures, unlike in "regular" ANOVA, are correlated, i.e., the observations are not independent.

• Data collected in a sequence of evenly spaced points in time

• Treatments are assigned to experimental units

By collecting data from the same participants under repeated conditions, the individual differences can be eliminated or reduced as a source of between-group differences. Also, the sample size is not divided between conditions or groups, and thus inferential testing becomes more powerful. This design also proves to be economical when sample members are difficult to recruit, because each member is measured under all conditions.

As with any ANOVA, repeated measures ANOVA tests the equality of means. However, repeated measures ANOVA is used when all members of a random sample are measured under a number of different conditions.

• As the sample is exposed to each condition, the measurement of the dependent variable is repeated.

• Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: the data violate the ANOVA assumption of independence.

• The simplest example of a repeated measures design is a paired t-test. Each subject is measured twice (time 1 and time 2) on the same variable or each pair of matched participants are assigned to one of two treatment levels.

• If we observe participants at more than two time-points, then we need to conduct a repeated measures ANOVA.

What we would like to do is to decompose the variability into:

(1) A random effect
(2) A fixed effect

The effect of participants is always a random effect. We will only consider situations where the factor is a fixed effect.

Yij = μj + Si + εij

μj = the fixed effect of condition j
Si = the random effect of subject i
εij = the random error, independent of Si
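For this model, the total variability splits into a condition part, a subject part, and a residual part. The identity below is the standard balanced-design decomposition, written as a sketch; the symbols n (number of subjects) and k (number of conditions) and the bar notation are assumptions introduced here, not notation from the slides.

```latex
\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{k}\bigl(Y_{ij}-\bar{Y}_{\cdot\cdot}\bigr)^{2}}_{SS_{\text{total}},\ \ nk-1}
 = \underbrace{n\sum_{j=1}^{k}\bigl(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot}\bigr)^{2}}_{SS_{\text{condition}},\ \ k-1}
 + \underbrace{k\sum_{i=1}^{n}\bigl(\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot}\bigr)^{2}}_{SS_{\text{subjects}},\ \ n-1}
 + \underbrace{\sum_{i=1}^{n}\sum_{j=1}^{k}\bigl(Y_{ij}-\bar{Y}_{i\cdot}-\bar{Y}_{\cdot j}+\bar{Y}_{\cdot\cdot}\bigr)^{2}}_{SS_{\text{error}},\ \ (n-1)(k-1)}
```

The fixed effect is then tested with F = MS(condition) / MS(error), so the subject-to-subject variation no longer inflates the error term.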

Assumptions of a repeated measures design

For a repeated measures design, we start with the same assumptions as a paired t-test: participants are independent and randomly selected from the population, and the data are normal (actually symmetry). Because we have more than two measurements on each participant, we need an additional assumption on the variances.

The assumptions we have to check for a repeated measures design are:

1. Participants are independent and randomly selected from the population

2. Normality (actually symmetry)

3. Compound symmetry
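Compound symmetry means that all repeated measures share one variance and every pair of them shares one covariance. As a sketch, under the model Yij = μj + Si + εij with independent Si and εij, this structure follows directly; the variance symbols σ_S² and σ_ε² are assumed names, not taken from the slides, and the matrix is shown for three repeated measures.

```latex
\operatorname{Var}(Y_{ij}) = \sigma_S^{2} + \sigma_\varepsilon^{2},
\qquad
\operatorname{Cov}(Y_{ij}, Y_{ij'}) = \sigma_S^{2} \quad (j \neq j'),
\qquad
\Sigma =
\begin{pmatrix}
\sigma_S^{2}+\sigma_\varepsilon^{2} & \sigma_S^{2} & \sigma_S^{2} \\
\sigma_S^{2} & \sigma_S^{2}+\sigma_\varepsilon^{2} & \sigma_S^{2} \\
\sigma_S^{2} & \sigma_S^{2} & \sigma_S^{2}+\sigma_\varepsilon^{2}
\end{pmatrix}
```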

Consider the following experiment: We have four drugs (1, 2, 3 and 4) that relieve pain. Each subject is given each of the four drugs. The subject's pain tolerance is then measured. Enough time is allowed to pass between successive drug administrations so that we can be sure there's no residual effect from the previous drug.

The null hypothesis is:

Mean(1)=Mean(2)=Mean(3)=Mean(4)

In the one-way analysis of variance without a repeated measure, we would have each subject receive only one of the four drugs. In this design, each subject is measured under each of the drug conditions. This has several important advantages.

Each subject acts as his or her own control, i.e., drug effects are calculated by recording deviations between each drug score and the average drug score for each subject. The normal subject-to-subject variation can thus be removed from the error sum of squares.

DATA PAIN;
   INPUT SUBJ DRUG PAIN;
DATALINES;
1 1 5
1 2 9
1 3 6
1 4 11
2 1 7
2 2 12
……
;

SAS code without using the REPEATED statement

PROC ANOVA DATA=PAIN;
   TITLE 'without repeated statement';
   CLASS SUBJ DRUG;
   MODEL PAIN=SUBJ DRUG;
   MEANS DRUG/DUNCAN;
RUN;

The DATA step can be reconstructed so that each subject's four scores are read from one line of data:

DATA PAIN;
   INPUT SUBJ @;
   DO DRUG = 1 to 4;
      INPUT PAIN @;
      OUTPUT;
   END;
DATALINES;
1 5 9 6 11
2 7 12 8 9
3 11 12 10 14
4 3 8 5 8
;

DO ... END: iterative loop
Trailing @: to keep reading from the same line of data
Entering the data this way is a lot easier!

Remark 1: about the DO statement

The general form:

DO variable = start TO end BY increment;
   (SAS statements)
END;

start: initial value
end: ending value
increment: default is 1

Remark 1: about the DO statement

In our example:

DO DRUG = 1 to 4;
   INPUT PAIN @;
   OUTPUT;
END;

Initial value: 1
Ending value: 4
INPUT PAIN @: the trailing @ keeps reading from the same line of data
END: return to "DO"
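For readers who do not use SAS, the effect of the DO loop can be mimicked in Python. This is only an illustrative sketch of the reshaping idea (one line per subject becomes one row per subject-drug pair); the variable names `raw_lines` and `long_rows` are made up for this example.

```python
# One line per subject: subject id followed by the four pain scores.
raw_lines = """\
1 5 9 6 11
2 7 12 8 9
3 11 12 10 14
4 3 8 5 8
""".splitlines()

long_rows = []
for line in raw_lines:
    fields = [int(tok) for tok in line.split()]
    subj, scores = fields[0], fields[1:]            # like INPUT SUBJ @;
    for drug, pain in enumerate(scores, start=1):   # like DO DRUG = 1 TO 4;
        long_rows.append((subj, drug, pain))        # like INPUT PAIN @; OUTPUT;

# Show the rows generated for subject 1 as (SUBJ, DRUG, PAIN).
for row in long_rows[:4]:
    print(row)
```

Each input line yields four output rows, which is the same SUBJ DRUG PAIN layout as the first DATA step.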

Remark 2: about the ANOVA procedure

PROC ANOVA DATA=PAIN;
   TITLE 'without repeated statement';
   CLASS SUBJ DRUG;
   MODEL PAIN=SUBJ DRUG;
   MEANS DRUG/DUNCAN;
RUN;

No "|" in the MODEL statement: SUBJ and DRUG are each main effects, and there is no interaction term between them.
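To see what fitting SUBJ and DRUG as main effects buys us, the sums of squares can be worked out by hand. The Python sketch below is an illustration, not output from the slides: it uses the pain data from the DATA step and compares the drug F ratio when subject variation is left in the error term with the ratio when it is removed. All variable names are hypothetical.

```python
# scores[i][j] = pain score of subject i+1 under drug j+1 (from the slides).
scores = [
    [5, 9, 6, 11],
    [7, 12, 8, 9],
    [11, 12, 10, 14],
    [3, 8, 5, 8],
]
n = len(scores)       # number of subjects
k = len(scores[0])    # number of drugs
grand = sum(map(sum, scores)) / (n * k)
drug_means = [sum(row[j] for row in scores) / n for j in range(k)]
subj_means = [sum(row) / k for row in scores]

ss_total = sum((y - grand) ** 2 for row in scores for y in row)
ss_drug = n * sum((m - grand) ** 2 for m in drug_means)
ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
ss_error = ss_total - ss_drug - ss_subj   # residual after both main effects

ms_drug = ss_drug / (k - 1)
# Subject variation left inside the error term (ordinary one-way ANOVA).
f_ignoring_subjects = ms_drug / ((ss_subj + ss_error) / (n * k - k))
# Subject variation removed from the error term (MODEL PAIN = SUBJ DRUG).
f_with_subjects = ms_drug / (ss_error / ((n - 1) * (k - 1)))

print(f"SS drug={ss_drug:.2f}  SS subj={ss_subj:.2f}  SS error={ss_error:.2f}")
print(f"F ignoring subjects = {f_ignoring_subjects:.2f}")
print(f"F with subjects     = {f_with_subjects:.2f}")
```

With these data most of the within-drug variation is subject-to-subject variation, so removing it makes the drug F ratio several times larger.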

SAS code using the REPEATED statement

DATA REPEAT;
   INPUT PAIN1-PAIN4;
DATALINES;
5 9 6 11
7 12 8 9
11 12 10 14
3 8 5 8
;
PROC ANOVA DATA=REPEAT;
   TITLE 'using repeated statement';
   MODEL PAIN1-PAIN4 = / NOUNI;
   REPEATED DRUG 4 (1 2 3 4);
RUN;

Remark 1: about the data set

We need the data set in the form:

SUBJ PAIN1 PAIN2 PAIN3 PAIN4

Notice that it does not have a DRUG variable.

Remark 2: about the REPEATED statement

The general form, used to compute pairwise comparisons:

REPEATED factor_name CONTRAST(n);

• n is a number from 1 to k, with k being the number of levels of the repeated factor
• To get all pairwise contrasts, we need k-1 REPEATED statements

Remark 2: about the REPEATED statement

In our example:

PROC ANOVA DATA=REPEAT;
   TITLE 'using repeated statement';
   MODEL PAIN1-PAIN4 = / NOUNI;
   REPEATED DRUG 4 CONTRAST(1) / SUMMARY;
   REPEATED DRUG 4 CONTRAST(2) / SUMMARY;
   REPEATED DRUG 4 CONTRAST(3) / SUMMARY;
RUN;

SUMMARY: requests ANOVA tables for each contrast
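What a contrast against a reference level measures can be sketched in Python. This is a hand calculation under stated assumptions, not a reproduction of SAS output: each drug is compared with the reference drug through within-subject differences, and with no between-subject factor the single-degree-of-freedom F is taken to be the square of the paired t statistic. The function name `contrast_vs` is hypothetical.

```python
from math import sqrt

# scores[i][j] = pain score of subject i+1 under drug j+1 (from the slides).
scores = [
    [5, 9, 6, 11],
    [7, 12, 8, 9],
    [11, 12, 10, 14],
    [3, 8, 5, 8],
]


def contrast_vs(level, scores):
    """Compare every other drug with drug `level` (1-based).

    Returns {other_drug: (mean difference, paired t, F = t squared)}.
    Assumes the differences are not all identical (nonzero variance).
    """
    n = len(scores)
    results = {}
    for other in range(1, len(scores[0]) + 1):
        if other == level:
            continue
        diffs = [row[other - 1] - row[level - 1] for row in scores]
        mean = sum(diffs) / n
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        t = mean / sqrt(var / n)
        results[other] = (mean, t, t * t)
    return results


for other, (mean, t, f) in contrast_vs(1, scores).items():
    print(f"drug {other} vs drug 1: mean diff={mean:.2f}  t={t:.2f}  F={f:.2f}")
```

Calling `contrast_vs(2, scores)` and `contrast_vs(3, scores)` covers the remaining pairs, which mirrors why k-1 REPEATED statements are enough for all pairwise comparisons.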

Remark 3: more explanation of the ANOVA procedure

PROC ANOVA DATA=REPEAT;
   TITLE 'using repeated statement';
   MODEL PAIN1-PAIN4 = / NOUNI;
   REPEATED DRUG 4 (1 2 3 4);
RUN;

• No CLASS: our data set does not have an independent variable

• NOUNI: do not conduct a separate analysis for each of the four PAIN variables

• 4: the repeated factor "DRUG" has four levels; optional

• (1 2 3 4): the labels we want printed for each level of DRUG

Remark 4: comparison of the DATA steps

DATA PAIN;
   INPUT SUBJ DRUG PAIN;
DATALINES;
1 1 5
1 2 9
1 3 6
1 4 11
2 1 7
2 2 12
……
;

DATA PAIN;
   INPUT SUBJ @;
   DO DRUG = 1 to 4;
      INPUT PAIN @;
      OUTPUT;
   END;
DATALINES;
1 5 9 6 11
2 7 12 8 9
3 11 12 10 14
4 3 8 5 8
;

DATA REPEAT;
   INPUT PAIN1-PAIN4;
DATALINES;
5 9 6 11
7 12 8 9
11 12 10 14
3 8 5 8
;

Factor A: GROUP
Factor B: TIME (repeated)

Group       Subject   PRE   POST
Control     1         80    83
Control     2         85    86
Control     3         83    88
Treatment   4         82    94
Treatment   5         87    93
Treatment   6         84    98

Total variance, df = N-1
   Between subjects
      Treatment, df = a-1
      Error due to subjects within treatment, df = a(n-1)
   Within subjects
      Time, df = b-1
      Treatment × time, df = (a-1)(b-1)
      Error or residual, df = a(n-1)(b-1)

a: # of treatment groups
b: # of time points
n: # of subjects per treatment
N = a×b×n: total # of measurements

Source                d.f.           SS     MS
Factor A              a-1            SSA    MSA = SSA/(a-1)
Factor B              b-1            SSB    MSB = SSB/(b-1)
AB interaction        (a-1)(b-1)     SSAB   MSAB = SSAB/[(a-1)(b-1)]
Subjects (within A)   a(n-1)         SSWA   MSWA = SSWA/[a(n-1)]
Error                 a(n-1)(b-1)    SSE    MSE = SSE/[a(n-1)(b-1)]
Total                 nab-1          SST

data prepost;
   input subj group $ pretest postest;
datalines;
1 c 80 83
2 c 85 86
3 c 83 88
4 t 82 94
5 t 87 93
6 t 84 98
;
run;

proc anova data=prepost;
   title 'Two-way ANOVA with a Repeated Measure on One Factor';
   class group;
   model pretest postest = group / nouni;
   repeated time 2 (0 1);
   means group;
run;

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.13216314   26.27     1        4        0.0069
Pillai's Trace           0.86783686   26.27     1        4        0.0069
Hotelling-Lawley Trace   6.56640625   26.27     1        4        0.0069
Roy's Greatest Root      6.56640625   26.27     1        4        0.0069

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time*group Effect

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.32611465   8.27      1        4        0.0452
Pillai's Trace           0.67388535   8.27      1        4        0.0452
Hotelling-Lawley Trace   2.06640625   8.27      1        4        0.0452
Roy's Greatest Root      2.06640625   8.27      1        4        0.0452

Tests of Hypotheses for Between Subjects Effects

Source   DF   Anova SS      Mean Square   F Value   Pr > F
group    1    90.75000000   90.75000000   11.84     0.0263
Error    4    30.66666667    7.66666667

Univariate Tests of Hypotheses for Within Subject Effects

Source        DF   Anova SS      Mean Square   F Value   Pr > F
time          1    140.0833333   140.0833333   26.27     0.0069
time*group    1     44.0833333    44.0833333    8.27     0.0452
Error(time)   4     21.3333333     5.3333333

Level of         -----------pretest-----------   -----------postest-----------
group      N     Mean          Std Dev           Mean          Std Dev
c          3     82.6666667    2.51661148        85.6666667    2.51661148
t          3     84.3333333    2.51661148        95.0000000    2.64575131
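The univariate F values in this output can be cross-checked by hand from the six subjects' scores, following the df partition given earlier (a groups, b time points, n subjects per group). The Python sketch below is such a check, written for illustration; the variable names are hypothetical.

```python
# (pretest, postest) per subject, by group, from the prepost data set.
data = {
    "c": [(80, 83), (85, 86), (83, 88)],
    "t": [(82, 94), (87, 93), (84, 98)],
}
a = len(data)          # treatment groups
n = len(data["c"])     # subjects per group
b = 2                  # time points

values = [y for subjects in data.values() for pair in subjects for y in pair]
cf = sum(values) ** 2 / len(values)     # correction factor

ss_total = sum(y * y for y in values) - cf
ss_group = sum(sum(sum(pair) for pair in subjects) ** 2
               for subjects in data.values()) / (n * b) - cf
ss_subjects = sum(sum(pair) ** 2
                  for subjects in data.values() for pair in subjects) / b - cf
ss_subj_within = ss_subjects - ss_group          # error for the group test

time_totals = [sum(pair[t] for subjects in data.values() for pair in subjects)
               for t in range(b)]
ss_time = sum(T * T for T in time_totals) / (a * n) - cf
cell_totals = [sum(pair[t] for pair in subjects)
               for subjects in data.values() for t in range(b)]
ss_cells = sum(T * T for T in cell_totals) / n - cf
ss_inter = ss_cells - ss_group - ss_time
ss_error = ss_total - ss_subjects - ss_time - ss_inter   # Error(time)

f_group = (ss_group / (a - 1)) / (ss_subj_within / (a * (n - 1)))
ms_error = ss_error / (a * (n - 1) * (b - 1))
f_time = (ss_time / (b - 1)) / ms_error
f_inter = (ss_inter / ((a - 1) * (b - 1))) / ms_error

print(f"group F={f_group:.2f}  time F={f_time:.2f}  time*group F={f_inter:.2f}")
```

The three ratios agree with the F Value column of the SAS tables (11.84, 26.27 and 8.27), which confirms that the group effect is tested against subjects within group, while time and time*group are tested against Error(time).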

Two-factor ANOVA: each subject is measured under the levels of both factors.

Subject   Factor A   B1     B2     …   BB
1         A1         Y111   Y112   …   Y11B
…         …          …      …      …   …
I         A1         YI11   YI12   …   YI1B
…         …          …      …      …   …
1         Aa         Y1a1   Y1a2   …   Y1aB
…         …          …      …      …   …
I         Aa         YIa1   YIa2   …   YIaB
…         …          …      …      …   …
1         AA         Y1A1   Y1A2   …   Y1AB
…         …          …      …      …   …
I         AA         YIA1   YIA2   …   YIAB

A and B denote the two factors, and Yiab denotes the measurement taken from the ith subject when factor A is at level a and factor B is at level b.

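Two Factors Model

$Y_{iab} = \mu + \alpha_a + \beta_b + (\alpha\beta)_{ab} + S_i + (\alpha S)_{ia} + (\beta S)_{ib} + e_{iab}$

Fixed effects of the factors: $\alpha_a$, $\beta_b$, $(\alpha\beta)_{ab}$

Random effects due to subjects: $S_i$, $(\alpha S)_{ia}$, $(\beta S)_{ib}$

$S_i \sim N(0, \sigma_S^2)$, $(\alpha S)_{ia} \sim N(0, \sigma_{\alpha S}^2)$, $(\beta S)_{ib} \sim N(0, \sigma_{\beta S}^2)$, $e_{iab} \sim N(0, \sigma_e^2)$

• All the groups have equal variances

The fixed effects are estimated as follows: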

RM ANOVA Table:

Source       DF                  SS    MS   F-Value
Factor A     A-1                 Sa
Factor B     B-1                 Sb
Subjects     I-1                 Si
A*Subjects   (A-1)(I-1)          Sai
B*Subjects   (B-1)(I-1)          Sbi
A*B          (A-1)(B-1)          Sab
Error        (A-1)(B-1)(I-1)     Se
Total        ABI-1               St

Example: A group of subjects is treated in the morning and afternoon of two different days. On one of the days, the subjects receive a strong sleeping aid the night before the experiment is to be conducted; on the other, a placebo.

Reaction time by treatment:

Time   Subject   Control   Drug
A.M.   1         65        70
A.M.   2         72        78
A.M.   3         90        97
P.M.   1         55        60
P.M.   2         64        68
P.M.   3         80        85

SAS Code

data repeat;
   input react1-react4;
datalines;
65 70 55 60
72 78 64 68
90 97 80 85
;
proc anova data=repeat;
   model react1-react4 = / nouni;
   repeated time 2, treat 2;
run;

A portion of output from SAS

Interpretation

• According to the observed p-values, we reject the hypotheses of no time effect and no treat effect, but not the hypothesis of no interaction.

• The drug increases reaction time.

• Reaction time is longer in the morning than in the afternoon.

• The interaction of treat and time is not significant.
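Because the SAS output itself is not reproduced in this transcript, the conclusions can be checked by hand. The sketch below relies on one assumption worth stating: when every repeated factor has only two levels, each effect reduces to a single within-subject contrast, and its F statistic is the squared one-sample t statistic of that contrast, with (1, n-1) degrees of freedom. The names `contrasts` and `contrast_f` are hypothetical.

```python
# Columns follow INPUT react1-react4:
# A.M./control, A.M./drug, P.M./control, P.M./drug.
react = [
    [65, 70, 55, 60],
    [72, 78, 64, 68],
    [90, 97, 80, 85],
]

# Contrast weights for each effect over the four columns.
contrasts = {
    "time": (1, 1, -1, -1),         # morning minus afternoon
    "treat": (-1, 1, -1, 1),        # drug minus control
    "time*treat": (1, -1, -1, 1),   # interaction
}


def contrast_f(weights, rows):
    """Return (mean contrast score, F) for one within-subject contrast."""
    scores = [sum(w * y for w, y in zip(weights, row)) for row in rows]
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, mean ** 2 / (var / n)


results = {name: contrast_f(w, react) for name, w in contrasts.items()}
for name, (mean, f) in results.items():
    print(f"{name:10s} mean contrast={mean:7.3f}  F(1, 2)={f:.2f}")
```

The positive mean contrasts for time and treat match the statements that reaction time is longer in the morning and longer under the drug. The 5% critical value of F with (1, 2) degrees of freedom is about 18.51, so the time and treat effects exceed it while the interaction does not.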

Consider a marketing experiment:

•Male and female subjects are offered one of three different brands of coffee.

•Each brand is tasted twice; once after breakfast, the other time after dinner.

•The preference for each brand is measured on a scale from 1 to 10 (1=lowest, 10=highest).

The experimental design is shown below:

Three-Factor Experiment with a Repeated Measure on the last factor
Meal: Repeated Measure Factor

SAS Program:

OUTPUT(Part 1/4):

OUTPUT(Part 2/4):


OUTPUT(Part 3/4):

OUTPUT(Part 4/4):

A group of high- and low-SES children is selected for the experiment. Their reading comprehension is tested each spring and fall for three consecutive years. A diagram of the design is shown here:

Notice that each subject is measured each spring and fall of each year, so the variables SEASON and YEAR are both repeated measures factors.

To analyze this experiment, we will use the REPEATED statement of PROC ANOVA in the following SAS program:

DATA READ;
   INPUT SUBJ SES $ READ1-READ6;
   LABEL READ1 = 'SPRING YR 1'
         READ2 = 'FALL YR 1'
         READ3 = 'SPRING YR 2'
         READ4 = 'FALL YR 2'
         READ5 = 'SPRING YR 3'
         READ6 = 'FALL YR 3';
DATALINES;
1 HIGH 61 50 60 55 59 62
2 HIGH 64 55 62 57 63 63
3 HIGH 59 49 58 52 60 58
4 HIGH 63 59 65 64 67 70
5 HIGH 62 51 61 56 60 63
6 LOW 57 42 56 46 54 50
7 LOW 61 47 58 48 59 55
8 LOW 55 40 55 46 57 52
9 LOW 59 44 61 50 63 60
10 LOW 58 44 56 49 55 49
;
PROC ANOVA DATA=READ;
   TITLE "READING COMPREHENSION ANALYSIS";
   CLASS SES;
   MODEL READ1-READ6 = SES / NOUNI;
   REPEATED YEAR 3, SEASON 2;
   MEANS SES;
RUN;

Since the REPEATED statement is confusing when we have more than one repeated factor, it is important for you to know how to determine the order of the factor names. Look at the REPEATED statement in this example:

REPEATED YEAR 3, SEASON 2;

This statement instructs the ANOVA procedure to choose the first level of YEAR(1), then loop through two levels of SEASON(SPRING FALL), then return to the next level of YEAR(2), followed by two levels of SEASON, etc.
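The ordering rule can be made concrete with a small Python sketch. It is only an illustration of the nesting described above (the first-named factor changes slowest, the last-named factor changes fastest); the name `layout` is made up for this example.

```python
from itertools import product

years = [1, 2, 3]
seasons = ["SPRING", "FALL"]

# product() varies its last argument fastest, which mirrors
# REPEATED YEAR 3, SEASON 2: SEASON cycles within each YEAR.
layout = {
    f"READ{idx}": (year, season)
    for idx, (year, season) in enumerate(product(years, seasons), start=1)
}

for name, (year, season) in layout.items():
    print(f"{name}: {season} YR {year}")
```

The printed mapping agrees with the LABEL statement in the DATA step (READ1 is spring of year 1, READ2 is fall of year 1, and so on), which is a quick way to confirm that the factors are named in the right order.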

READING COMPREHENSION ANALYSIS
The ANOVA Procedure

Class Level Information
Class   Levels   Values
SES     2        HIGH LOW

Number of Observations Read   10
Number of Observations Used   10

The ANOVA Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

Source   DF   Anova SS      Mean Square   F Value   Pr > F
SES      1    680.0666667   680.0666667   13.54     0.0062
Error    8    401.6666667    50.2083333

The ANOVA Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                   Adj Pr > F
Source        DF   Anova SS      Mean Square   F Value   Pr > F    G - G    H-F-L
YEAR          2    252.0333333   126.0166667   26.91     <.0001    0.0002   <.0001
YEAR*SES      2      1.0333333     0.5166667    0.11     0.8962    0.8186   0.8450
Error(YEAR)   16    74.9333333     4.6833333

Greenhouse-Geisser Epsilon     0.6757
Huynh-Feldt-Lecoutre Epsilon   0.7642

Source          DF   Anova SS      Mean Square   F Value   Pr > F
SEASON          1    680.0666667   680.0666667   224.82    <.0001
SEASON*SES      1    112.0666667   112.0666667    37.05    0.0003
Error(SEASON)   8     24.2000000     3.0250000

                                                                          Adj Pr > F
Source               DF   Anova SS      Mean Square   F Value   Pr > F    G - G    H-F-L
YEAR*SEASON          2    265.4333333   132.7166667   112.95    <.0001    <.0001   <.0001
YEAR*SEASON*SES      2      0.4333333     0.2166667     0.18    0.8333    0.7592   0.7905
Error(YEAR*SEASON)   16    18.8000000     1.1750000

Greenhouse-Geisser Epsilon     0.7073
Huynh-Feldt-Lecoutre Epsilon   0.8147

High-SES students have higher reading comprehension scores than low-SES students (F=13.54, p=0.0062).

Reading comprehension increases with each year (F=26.91, p<0.0001).

Students had higher reading comprehension scores in the spring compared to the following fall (F=224.82, p<0.0001).

The "slippage" was greater for the low-SES students (there was a significant SES*SEASON interaction [F=37.05, p=0.0003]).

"Slippage" decreases as the students get older (YEAR*SEASON is significant [F=112.95, p<0.0001]).

What is a Mixed Model?

Mixed Model: When we have a design in which we have both random and fixed variables, we have what is often called a mixed model.

We do not have to assume sphericity in the model.

We do not have to assume compound symmetry in the model.

We can use PROC MIXED to deal with the mixed model.

Example:

Treat this case as a standard repeated measure anova. We can get the following result:

Treat it as Mixed modelSAS program:

The result of using mixed model:

Comparing the results of the two methods, the mixed model has the following advantages in this example:

The degrees of freedom are larger.

The interaction is significant.
