Date post: | 27-Dec-2015 |
Upload: | aleesha-atkinson |
Outline
• Two-way ANOVA with unequal numbers of observations in the cells
–Data and model
–Regression approach
–Parameter estimates
• Previous analyses with constant n are just a special case
Data for two-way ANOVA
• Y is the response variable
• Factor A with levels i = 1 to a
• Factor B with levels j = 1 to b
• Yijk is the kth observation in cell (i,j)
• k = 1 to nij and nij may vary
Recall Bread Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a=3 levels: bottom, middle, top
• B is the width of the shelf display, b=2: regular, wide
• n=2 stores for each of the 3x2 treatment combinations (BALANCED)
Regression Approach
• Create a-1 dummy variables to represent levels of A
• Create b-1 dummy variables to represent levels of B
• Multiply each of the a-1 variables for A by each of the b-1 variables for B to get the (a-1)(b-1) variables for AB
LET’S LOOK AT THE RELATIONSHIP AMONG THESE SETS OF VARIABLES
Common Set of Variables
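data a2;
  set a1;
  /* effect coding: the last level of each factor is coded -1 */
  X1 = (height eq 1) - (height eq 3);
  X2 = (height eq 2) - (height eq 3);
  X3 = (width eq 1) - (width eq 2);
  /* interaction variables are products of the main-effect variables */
  X13 = X1*X3;
  X23 = X2*X3;
run;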
$\sum_i \alpha_i = 0, \quad \sum_j \beta_j = 0$
$\sum_i (\alpha\beta)_{ij} = 0, \quad \sum_j (\alpha\beta)_{ij} = 0$
Run Proc Reg
proc reg data=a2;
model sales= X1 X2 X3 X13 X23
/ XPX I;
height: test X1, X2;
width: test X3;
interaction: test X13, X23;
run;
X′X Matrix
Model Crossproducts X'X X'Y Y'Y
Variable    Intercept   X1   X2   X3   X13   X23
Intercept          12    0    0    0     0     0
X1                  0    8    4    0     0     0
X2                  0    4    8    0     0     0
X3                  0    0    0   12     0     0
X13                 0    0    0    0     8     4
X23                 0    0    0    0     4     8
Sets of variables are orthogonal
Cross-products between sets are 0
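The X′X matrix above depends only on the design, not on the sales values, so it can be rebuilt by hand. The following is a minimal Python sketch (not part of the original SAS analysis) that assumes the 3x2 layout with n=2 per cell described earlier; `design_row` is a hypothetical helper name.

```python
# Effect-coded design for a balanced 3x2 layout with n = 2 per cell.
# Illustrative sketch only; design_row is a hypothetical helper.

def design_row(height, width):
    x1 = (height == 1) - (height == 3)
    x2 = (height == 2) - (height == 3)
    x3 = (width == 1) - (width == 2)
    # Columns: Intercept, X1, X2, X3, X13, X23
    return [1, x1, x2, x3, x1 * x3, x2 * x3]

# Two stores for every (height, width) combination.
rows = [design_row(h, w)
        for h in (1, 2, 3) for w in (1, 2) for _ in range(2)]

p = len(rows[0])
xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]

for line in xtx:
    print(line)
```

Every entry that links the height set {X1, X2}, the width set {X3}, and the interaction set {X13, X23} comes out as 0, which is the orthogonality noted on the slide.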
Orthogonal X’s
• Order in which the variables are fit in the model does not matter
– Type I SS = Type III SS
• When nij is constant, the order of fit does not matter for any choice of restrictions
• Orthogonality is lost when the nij are not constant
KNNL Example
• KNNL p 954
• Y is the change in growth rates for children after a treatment
• A is gender, a=2 levels: male, female
• B is bone development, b=3 levels: severely, moderately, or mildly depressed
• nij=3, 2, 2, 1, 3, 3 children in the groups
Read and check the data
data a3;
  infile 'c:\...\CH23TA01.txt';
  input growth gender bone;
proc print data=a3;
run;
Obs   growth   gender   bone
  1      1.4        1      1
  2      2.4        1      1
  3      2.2        1      1
  4      2.1        1      2
  5      1.7        1      2
  6      0.7        1      3
  7      1.1        1      3
  8      2.4        2      1
  9      2.5        2      2
 10      1.8        2      2
 11      2.0        2      2
 12      0.5        2      3
 13      0.9        2      3
 14      1.3        2      3
Common Set of Variables
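data a3;
  set a3;
  /* effect coding: the last level of each factor is coded -1 */
  X1 = (bone eq 1) - (bone eq 3);
  X2 = (bone eq 2) - (bone eq 3);
  X3 = (gender eq 1) - (gender eq 2);
  /* interaction variables are products of the main-effect variables */
  X13 = X1*X3;
  X23 = X2*X3;
run;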
$\sum_i \alpha_i = 0, \quad \sum_j \beta_j = 0$
$\sum_i (\alpha\beta)_{ij} = 0, \quad \sum_j (\alpha\beta)_{ij} = 0$
X′X Matrix
Model Crossproducts X'X X'Y Y'Y
Variable    Intercept   X1   X2   X3   X13   X23
Intercept          14   -1    0    0     3     0
X1                 -1    9    5    3     1    -1
X2                  0    5   10    0    -1    -2
X3                  0    3    0   14    -1     0
X13                 3    1   -1   -1     9     5
X23                 0   -1   -2    0     5    10
Cross-product terms no longer 0
Order of fit matters
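To see where the nonzero cross-products come from, the matrix can be recomputed from the 14 observations in the data listing. This is a small Python sketch (not the original SAS code) that uses the same coding as X1 to X23; `design_row` is a hypothetical helper name.

```python
# Growth-rate data from the listing: (growth, gender, bone).
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

def design_row(gender, bone):
    x1 = (bone == 1) - (bone == 3)
    x2 = (bone == 2) - (bone == 3)
    x3 = (gender == 1) - (gender == 2)
    # Columns: Intercept, X1, X2, X3, X13, X23
    return [1, x1, x2, x3, x1 * x3, x2 * x3]

rows = [design_row(g, b) for _, g, b in data]
p = len(rows[0])
xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]

# Cross-products between the bone set {X1, X2} and the gender set {X3}.
bone_by_gender = [xtx[1][3], xtx[2][3]]
print(bone_by_gender)
```

With unequal nij the bone and gender columns are no longer orthogonal: the X1 by X3 entry equals the sum of X1*X3 over the observations, which cancels only when the cell counts are equal.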
How does this impact the analysis?
• In regression, this happens all the time (explanatory variables are correlated)
– t tests look at significance of variable when fitted last
• When comparing means, the order of fit alters the null hypothesis being tested
Prepare the data for a plot
data a1;
  set a3;
  if (gender eq 1)*(bone eq 1) then gb='1_Msev ';
  if (gender eq 1)*(bone eq 2) then gb='2_Mmod ';
  if (gender eq 1)*(bone eq 3) then gb='3_Mmild';
  if (gender eq 2)*(bone eq 1) then gb='4_Fsev ';
  if (gender eq 2)*(bone eq 2) then gb='5_Fmod ';
  if (gender eq 2)*(bone eq 3) then gb='6_Fmild';
run;
Plot the data
title1 'Plot of the data';
symbol1 v=circle i=none;
proc gplot data=a1;
  plot growth*gb;
run;
Plot the means
title1 'Plot of the means';
symbol1 v='M' i=join c=blue;
symbol2 v='F' i=join c=green;
proc gplot data=a2;
  plot avgrowth*bone=gender;
run;
Cell means model
• Yijk = μij + εijk
–where μij is the theoretical mean or expected value of all observations in cell (i,j)
– the εijk are iid N(0, σ²)
– Yijk ~ N(μij, σ²), independent
Estimates
• Estimate μij by the mean of the observations in cell (i,j), $\bar{Y}_{ij\cdot}$:
$\bar{Y}_{ij\cdot} = \sum_k Y_{ijk} / n_{ij}$
• For each (i,j) combination, we can get an estimate of the variance:
$s_{ij}^2 = \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2 / (n_{ij} - 1)$
• We pool these to get an estimate of σ²
Pooled estimate of σ²
• In general we pool the sij², using weights proportional to the df, nij - 1
• The pooled estimate is
$s^2 = \sum_{i,j} (n_{ij} - 1) s_{ij}^2 \,/\, \sum_{i,j} (n_{ij} - 1)$
In terms of parameter estimates, nothing is different from the balanced design
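As a rough check of the pooling formula, the cell means and the pooled variance for the growth data can be computed directly. This is a plain Python sketch, not the SAS output itself. The numerator is accumulated as within-cell sums of squares, which equal (nij - 1)sij², so the cell with a single observation contributes 0 to both numerator and denominator.

```python
from collections import defaultdict

# Growth-rate data from the listing: (growth, gender, bone).
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

# Group the observations by cell (gender i, bone j).
cells = defaultdict(list)
for y, g, b in data:
    cells[(g, b)].append(y)

# Cell means estimate the mu_ij.
means = {ij: sum(ys) / len(ys) for ij, ys in cells.items()}

# Within-cell sum of squares, i.e. (n_ij - 1) * s_ij^2 for each cell.
ss_within = {ij: sum((y - means[ij]) ** 2 for y in ys)
             for ij, ys in cells.items()}

df_error = sum(len(ys) - 1 for ys in cells.values())
s2 = sum(ss_within.values()) / df_error

print(df_error, round(s2, 4))
```

The error df is 14 - 6 = 8, and the pooled estimate is the usual error mean square for the two-way model with interaction.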
Run proc glm
proc glm data=a1;
  class gender bone;
  model growth=gender|bone/solution;
  means gender*bone;
run;
gender|bone is shorthand for the main effects and the interaction: gender bone gender*bone
Parameter Estimates
• Solution option on the model statement gives parameter estimates for the glm parameterization
• The constraints are
– Last level of each main effect is zero
– Interaction terms involving level a of A or level b of B are zero
• These reproduce the cell means in the usual way
Parameter Estimates
Parameter            Estimate        Standard Error   t Value   Pr > |t|
Intercept            0.90000000 B    0.2327373           3.87     0.0048
gender 1            -0.00000000 B    0.3679900          -0.00     1.0000
bone 1               1.50000000 B    0.4654747           3.22     0.0122
bone 2               1.20000000 B    0.3291403           3.65     0.0065
gender*bone 1 1     -0.40000000 B    0.5933661          -0.67     0.5192
gender*bone 1 2     -0.20000000 B    0.5204165          -0.38     0.7108
Example: $\hat{\mu}_{22} = 0.90 + 1.20 = 2.10$
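The estimates in this table can be recovered from the six cell means, since the glm parameterization sets the last level of each factor to zero. The following Python sketch illustrates that relationship under those constraints. It is not how SAS computes the solution, and the variable names are illustrative.

```python
from collections import defaultdict

# Growth-rate data from the listing: (growth, gender, bone).
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

cells = defaultdict(list)
for y, g, b in data:
    cells[(g, b)].append(y)
means = {ij: sum(ys) / len(ys) for ij, ys in cells.items()}

A, B = 2, 3  # last levels of gender and bone serve as the reference cell

intercept = means[(A, B)]
gender_eff = {i: means[(i, B)] - intercept for i in (1, 2)}
bone_eff = {j: means[(A, j)] - intercept for j in (1, 2, 3)}
inter = {(i, j): means[(i, j)] - means[(i, B)] - means[(A, j)] + intercept
         for i in (1, 2) for j in (1, 2, 3)}

# Adding the pieces back together reproduces every cell mean.
rebuilt = {(i, j): intercept + gender_eff[i] + bone_eff[j] + inter[(i, j)]
           for i in (1, 2) for j in (1, 2, 3)}

print(round(intercept, 2), round(bone_eff[2], 2), round(rebuilt[(2, 2)], 2))
```

For cell (2,2) the gender and interaction terms are both zero by constraint, which is why the slide's example needs only the intercept and the bone 2 effect.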
Output
Note DF and SS add as usual
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5        4.4742857    0.89485714      5.51   0.0172
Error               8        1.3000000    0.16250000
Corrected Total    13        5.7742857
Output Type I SS
SSG+SSB+SSGB=4.47429
Source         DF   Type I SS    Mean Square   F Value   Pr > F
gender          1   0.0028571     0.00285714      0.02   0.8978
bone            2   4.3960000     2.19800000     13.53   0.0027
gender*bone     2   0.0754286     0.03771429      0.23   0.7980
Output Type III SS
SSG+SSB+SSGB=4.38514
Source         DF   Type III SS   Mean Square   F Value   Pr > F
gender          1   0.12000000     0.12000000      0.74   0.4152
bone            2   4.18971429     2.09485714     12.89   0.0031
gender*bone     2   0.07542857     0.03771429      0.23   0.7980
Type I vs Type III
• SS for Type I add up to model SS
• SS for Type III do not necessarily add up
• Type I and Type III are the same for the interaction because it is the last term in the model
• The Type I and Type III analysis for the main effects are not necessarily the same
• Different hypotheses are being examined
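One way to see the two kinds of sums of squares side by side is to fit nested regressions on the effect-coded variables and difference their error sums of squares. The sketch below is plain Python rather than SAS. It assumes the sum-to-zero coding used for X1 to X23, under which dropping an effect's columns from the full model gives its Type III SS when no cell is empty. `solve`, `design_row` and `sse` are hypothetical helper names.

```python
# Growth-rate data from the listing: (growth, gender, bone).
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def design_row(gender, bone):
    x1 = (bone == 1) - (bone == 3)
    x2 = (bone == 2) - (bone == 3)
    x3 = (gender == 1) - (gender == 2)
    # Columns: 0 Intercept, 1 X1, 2 X2, 3 X3, 4 X13, 5 X23
    return [1, x1, x2, x3, x1 * x3, x2 * x3]

rows = [design_row(g, b) for _, g, b in data]
y = [obs[0] for obs in data]

def sse(keep):
    """Error SS after regressing y on the listed design columns."""
    X = [[r[j] for j in keep] for r in rows]
    p = len(keep)
    xtx = [[sum(x[u] * x[v] for x in X) for v in range(p)] for u in range(p)]
    xty = [sum(x[u] * yi for x, yi in zip(X, y)) for u in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, x))) ** 2
               for x, yi in zip(X, y))

full = sse([0, 1, 2, 3, 4, 5])

# Type I: each term is added in model order (gender, bone, gender*bone).
type1 = {
    "gender": sse([0]) - sse([0, 3]),
    "bone": sse([0, 3]) - sse([0, 3, 1, 2]),
    "gender*bone": sse([0, 3, 1, 2]) - full,
}

# Type III: each term is dropped from the full model.
type3 = {
    "gender": sse([0, 1, 2, 4, 5]) - full,
    "bone": sse([0, 3, 4, 5]) - full,
    "gender*bone": sse([0, 1, 2, 3]) - full,
}

for name in type1:
    print(name, round(type1[name], 5), round(type3[name], 5))
```

The interaction line is identical in both dictionaries because the same pair of models is compared. The gender and bone lines differ because the comparison models differ.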
Type I vs Type III
• Most people prefer the Type III analysis
• This can be misleading if the cell sizes differ greatly
• Contrasts can provide some insight into the differences in hypotheses
Contrast for A*B
• Same for Type I and Type III
• Null hypothesis is that the profiles are parallel; see plot for interpretation
• μ12 - μ11 = μ22 - μ21 and μ13 - μ12 = μ23 - μ22
• μ11 - μ12 - μ21 + μ22 = 0 and μ12 - μ13 - μ22 + μ23 = 0
A*B Contrast statement
contrast 'gender*bone Type I and III'
  gender*bone 1 -1 0 -1 1 0,
  gender*bone 0 1 -1 0 -1 1;
run;
Type III Contrast for gender
• (1) μ11 = (1)(μ + α1 + β1 + (αβ)11)
• (1) μ12 = (1)(μ + α1 + β2 + (αβ)12)
• (1) μ13 = (1)(μ + α1 + β3 + (αβ)13)
• (-1) μ21 = (-1)(μ + α2 + β1 + (αβ)21)
• (-1) μ22 = (-1)(μ + α2 + β2 + (αβ)22)
• (-1) μ23 = (-1)(μ + α2 + β3 + (αβ)23)
L = 3α1 – 3α2 + (αβ)11 + (αβ)12 + (αβ)13 – (αβ)21
– (αβ)22 – (αβ)23
Contrast statement: Gender Type III
contrast 'gender Type III' gender 3 -3 gender*bone 1 1 1 -1 -1 -1;
Type I Contrast for gender
• (3) μ11 = (3)(μ + α1 + β1 + (αβ)11)
• (2) μ12 = (2)(μ + α1 + β2 + (αβ)12)
• (2) μ13 = (2)(μ + α1 + β3 + (αβ)13)
• (-1) μ21 = (-1)(μ + α2 + β1 + (αβ)21)
• (-3) μ22 = (-3)(μ + α2 + β2 + (αβ)22)
• (-3) μ23 = (-3)(μ + α2 + β3 + (αβ)23)
L = (7α1 – 7α2 )+(2β1 – β2 – β3)+3(αβ)11
+2(αβ)12 +2(αβ)13 –1(αβ)21 –3(αβ)22 –3(αβ)23
Contrast statement: Gender Type I
contrast 'gender Type I' gender 7 -7 bone 2 -1 -1 gender*bone 3 2 2 -1 -3 -3;
Contrast output
Contrast                       DF   Contrast SS
gender Type III                 1    0.12000000
gender Type I                   1    0.00285714
bone Type III                   2    4.18971429
gender*bone Type I and III      2    0.07542857
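The two single-df gender contrasts can be checked from the cell means alone. For a contrast with cell coefficients cij, the estimate is the sum of cij times the cell mean, and its sum of squares is the squared estimate divided by the sum of cij²/nij. The Python sketch below applies that standard formula. The coefficient dictionaries mirror the μij weights on the slides, and `contrast` is a hypothetical helper name.

```python
import math
from collections import defaultdict

# Growth-rate data from the listing: (growth, gender, bone).
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

cells = defaultdict(list)
for y, g, b in data:
    cells[(g, b)].append(y)
n = {ij: len(ys) for ij, ys in cells.items()}
means = {ij: sum(ys) / len(ys) for ij, ys in cells.items()}

# Error mean square from the full (cell means) model.
mse = sum((y - means[ij]) ** 2 for ij, ys in cells.items() for y in ys) \
    / (len(data) - len(cells))

def contrast(coef):
    """Return (estimate, contrast SS, standard error) for cell coefficients."""
    est = sum(c * means[ij] for ij, c in coef.items())
    scale = sum(c * c / n[ij] for ij, c in coef.items())
    return est, est ** 2 / scale, math.sqrt(mse * scale)

# Type III: every cell mean gets weight +1 (gender 1) or -1 (gender 2).
type3 = {(i, j): (1 if i == 1 else -1) for (i, j) in n}
# Type I: cell means are weighted by their cell sizes n_ij.
type1 = {(i, j): (n[(i, j)] if i == 1 else -n[(i, j)]) for (i, j) in n}

est3, ss3, se3 = contrast(type3)
est1, ss1, se1 = contrast(type1)

print(round(ss3, 5), round(ss1, 5))
```

Dividing `est3` and `se3` by 3, and `est1` and `se1` by 7, puts the contrasts on the scale of a difference between two gender means.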
Summary
• Type I and Type III F tests test different null hypotheses
• Should be aware of the differences
• Most prefer Type III as it follows logic similar to regression analysis
• Be wary, however, if the cell sizes vary dramatically
Comparing Means
• If interested in Type III hypotheses, need to use LSMEANS to do comparisons
• If interested in Type I hypotheses, need to use MEANS to do comparisons.
• We will show this difference via the ESTIMATE statement
SAS Commands
• Will use earlier contrast code to set up the ESTIMATE commands
estimate 'gender Type III' gender 3 -3
gender*bone 1 1 1 -1 -1 -1 / divisor=3;
estimate 'gender Type I' gender 7 -7
bone 2 -1 -1 gender*bone 3 2 2 -1 -3 -3 /
divisor=7;
MEANS OUTPUT
Level of          ------------growth-----------
gender      N     Mean           Std Dev
1           7     1.65714286     0.62411843
2           7     1.62857143     0.75655862
Diff = 0.0286
Estimate output
Parameter          Estimate   Std Err
gender Type III      -0.200    0.2327
gender Type I         0.029    0.2155
Notice that these two estimates agree with the difference of estimates for LSMEANS (Type III) and MEANS (Type I), respectively
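The distinction between the two kinds of gender means can be made concrete with a short calculation. In the Python sketch below, `lsmean` is the unweighted average of the three cell means for each gender, which is what LSMEANS reports for this model with interaction and no empty cells. `mean` is the ordinary average of that gender's observations. Both names are illustrative.

```python
from collections import defaultdict

# Growth-rate data from the listing: (growth, gender, bone).
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

cells = defaultdict(list)
by_gender = defaultdict(list)
for y, g, b in data:
    cells[(g, b)].append(y)
    by_gender[g].append(y)

cell_mean = {ij: sum(ys) / len(ys) for ij, ys in cells.items()}

# Unweighted average of the cell means over the bone levels.
lsmean = {g: sum(cell_mean[(g, b)] for b in (1, 2, 3)) / 3 for g in (1, 2)}
# Raw average of all observations for each gender (weights cells by n_ij).
mean = {g: sum(ys) / len(ys) for g, ys in by_gender.items()}

diff_lsmeans = lsmean[1] - lsmean[2]
diff_means = mean[1] - mean[2]

print(round(diff_lsmeans, 3), round(diff_means, 3))
```

The two differences have opposite signs here, which shows how much the weighting by cell size can change the comparison being made.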
Analytical Strategy
• First examine interaction
• Some options when the interaction is significant
– Interpret the plot of means
– Run A at each level of B and/or B at each level of A
– Run as a one-way with ab levels
– Use contrasts
Analytical Strategy
• Some options when the interaction is not significant
–Use a multiple comparison procedure for the main effects
–Use contrasts for main effects
– If needed, rerun without the interaction
Example continued
proc glm data=a3;
  class gender bone;
  model growth=gender bone/ solution;
  means gender bone/ tukey lines;
run;
The interaction is pooled into error here because the error df is small
The means statement addresses the Type I hypotheses
Output
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3        4.3988571    1.46628571     10.66   0.0019
Error              10        1.3754286    0.13754286
Corrected Total    13        5.7742857
Output Type I SS
Source    DF   Type I SS     Mean Square   F Value   Pr > F
gender     1   0.00285714     0.00285714      0.02   0.8883
bone       2   4.39600000     2.19800000     15.98   0.0008
Output Type III SS
Source    DF   Type III SS   Mean Square   F Value   Pr > F
gender     1   0.09257143     0.09257143      0.67   0.4311
bone       2   4.39600000     2.19800000     15.98   0.0008
Although the null hypotheses for gender differ, neither the Type I nor the Type III test is significant
Tukey Comparisons
• Why don’t we need a Tukey adjustment for gender?
• The means statement does provide the mean estimates, so you know the direction of the difference tested by the F test, but that is all it provides