Outline Analysis of Variance Comparison of serveral...

Date post: 12-Apr-2018
Page 1: Outline Analysis of Variance Comparison of several groups · publicifsv.sund.ku.dk/~lts/varians_regression/overheads/anova3.pdf · Analysis of variance and regression for health researchers

Analysis of Variance
Analysis of variance and regression course
http://staff.pubhealth.ku.dk/~lts/regression10_2/index.html

Marc Andersen, [email protected]

Analysis of variance and regression for health researchers, November 25, 2010

1 / 68

Outline

Comparison of several groups

Model checking

Two-way ANOVA

Interaction

Advanced designs

2 / 68

Acknowledgements

written by Lene Theil Skovgaard (1): 2006, 2007
updated by Julie Lyng Forman (1): 2008, November 2009
updated by Marc Andersen (2): April 2009, April 2010, November 2010

(1) Dept. of Biostatistics
(2) StatGroup

3 / 68

Comparison of 2 or more groups

number of groups   different individuals           same individual
2                  unpaired t-test                 paired t-test
≥2                 one-way analysis of variance    two-way analysis of variance

4 / 68


One-way analysis of variance

◮ Do the distributions differ between the groups?
◮ Do the levels differ between the groups?

5 / 68

Example: ventilation during anaesthesia

Data: 22 bypass patients randomised to 3 different kinds of ventilation during anaesthesia
Outcome: measurement of red cell folate

Group I:   50% N2O, 50% O2 for 24 hours
Group II:  50% N2O, 50% O2 during operation
Group III: 30–50% O2 (no N2O) for 24 hours

        Gr.I    Gr.II   Gr.III
n       8       9       5
Mean    316.6   256.4   278.0
SD      58.7    37.1    33.8

6 / 68

Example: ventilation during anaesthesia

[Figure: red cell folate (y-axis, 200 to 400) by group (I, II, III)]

7 / 68

One-way ANOVA

One-way
because we have only one criterion for classifying the observations, here ventilation method

ANalysis Of VAriance
because we compare the variance between groups with the variance within groups

8 / 68


The one-way ANOVA model

Notation
The j'th observation from group i is described by:

$$Y_{ij} = \mu_i + \varepsilon_{ij}$$

where $Y_{ij}$ is the j'th observation in group no. i, $\mu_i$ is the mean of group i, and $\varepsilon_{ij}$ is the individual deviation,

i.e. as consisting of the mean of the group plus an individual deviation, with $\varepsilon_{ij} \sim N(0, \sigma^2)$ or equivalently $Y_{ij} \sim N(\mu_i, \sigma^2)$.

Assumptions
Observations are assumed to be independent and to follow a normal distribution with mean $\mu_i$ within group i, with the same variance $\sigma^2$ in all groups.

Model assumptions should be investigated!

9 / 68

Hypothesis testing

Investigate difference between groups

◮ Null hypothesis: group means are equal, $H_0: \mu_i = \mu$ for all i
◮ Alternative hypothesis: group means are not all equal
◮ We conclude that the means are not equal when we reject the null hypothesis of equality (ref DGA, 8.5 Hypothesis Testing)

10 / 68

ANOVA math: Sums of squares

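Decomposition of 'deviation from grand mean'

$$y_{ij} - \bar{y}_{\cdot} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y}_{\cdot})$$

Decomposition of variation (sums of squares)

$$\underbrace{\sum_{i,j} (y_{ij} - \bar{y}_{\cdot})^2}_{\text{total variation}} = \underbrace{\sum_{i,j} (y_{ij} - \bar{y}_i)^2}_{\text{within groups}} + \underbrace{\sum_{i,j} (\bar{y}_i - \bar{y}_{\cdot})^2}_{\text{between groups}}$$

$y_{ij}$: j'th observation in i'th group
$\bar{y}_i$: average in i'th group
$\bar{y}_{\cdot}$: overall average, or 'grand mean'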
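The decomposition can be checked numerically. The sketch below is written in Python rather than SAS, purely as an arithmetic check; the function name `sums_of_squares` and the toy data are hypothetical and are not the anaesthesia measurements.

```python
def sums_of_squares(groups):
    """Return (SS_total, SS_within, SS_between) for a list of groups."""
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    ss_within = 0.0
    ss_between = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Deviations of observations from their own group mean.
        ss_within += sum((y - group_mean) ** 2 for y in g)
        # Deviation of the group mean from the grand mean, once per observation.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
    return ss_total, ss_within, ss_between


# Hypothetical toy data: three groups of unequal size.
toy = [[4.0, 6.0, 5.0], [8.0, 9.0], [1.0, 2.0, 3.0, 2.0]]
ss_total, ss_within, ss_between = sums_of_squares(toy)
print(ss_total, ss_within + ss_between)
```

The two printed numbers should agree up to floating-point error, whatever data are supplied.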

11 / 68

Decomposition of variation

total = between + within

SStotal = SSbetween + SSwithin

(n − 1) = (k − 1) + (n − k)

F-test statistic

$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/(k-1)}{SS_{within}/(n-k)}$$

where n is the total number of observations and k is the number of groups.

Hypothesis test
Reject the null hypothesis if F is large, i.e. if the variation between groups is too large compared to the variation within groups.

12 / 68


Analysis of variance table

ANOVA table

Variation   df      SS      MS        F         P
Between     k − 1   SSb     SSb/dfb   MSb/MSw   P(F(dfb, dfw) > Fobs)
Within      n − k   SSw     SSw/dfw
Total       n − 1   SStot

F test statistic
Under the null hypothesis, the F test statistic follows an F-distribution with dfb and dfw degrees of freedom: Fobs ∼ F(dfb, dfw).

13 / 68

Analysis of variance table - Anaesthesia example

ANOVA table

           df   SS         MS       F      P
Between     2   15515.77   7757.9   3.71   0.04
Within     19   39716.09   2090.3
Total      21   55231.86

F test statistic

F = 3.71 ∼ F(2, 19) ⇒ P = 0.04

Interpretation
Weak evidence of non-equality of the three means
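The F and P values can be approximately reproduced from the summary statistics (n, mean, SD) of the anaesthesia example alone. The following is a sketch in Python rather than SAS, intended only as a cross-check. Because the means and SDs are rounded, the sums of squares differ slightly from the table. The P-value uses a closed-form tail of the F distribution that holds only when the numerator has 2 degrees of freedom, as it does here.

```python
# Summary statistics from the anaesthesia example (n, mean, SD per group).
n = [8, 9, 5]
mean = [316.6, 256.4, 278.0]
sd = [58.7, 37.1, 33.8]

N = sum(n)          # total number of observations
k = len(n)          # number of groups
grand_mean = sum(ni * mi for ni, mi in zip(n, mean)) / N

# Between-group and within-group sums of squares from the summaries.
ss_between = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, mean))
ss_within = sum((ni - 1) * si ** 2 for ni, si in zip(n, sd))

df_between, df_within = k - 1, N - k
f_obs = (ss_between / df_between) / (ss_within / df_within)

# Closed-form upper tail of the F distribution, valid ONLY when the
# numerator degrees of freedom equal 2: P(F > x) = (1 + 2x/d2)^(-d2/2).
assert df_between == 2
p_value = (1 + 2 * f_obs / df_within) ** (-df_within / 2)

print(round(f_obs, 2), round(p_value, 3))
```

For other numbers of groups the tail probability needs a general F distribution routine, which is not sketched here.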

14 / 68

Analysis of variance in SAS

To define the anaesthesia data in SAS, we write

data ex_redcell;
  input grp redcell;
  cards;
1 243
1 251
1 275
. .
. .
. .
3 293
3 328
;
run;

The variable redcell contains all the measurements of the outcome and grp contains the method of ventilation for each individual.

15 / 68

Analysis of variance program

proc glm data=ex_redcell;
  class grp;
  model redcell=grp / solution;
run;

General Linear Models Procedure
Dependent Variable: REDCELL

                        Sum of         Mean
Source            DF    Squares        Square       F Value   Pr > F
Model              2    15515.7664     7757.8832       3.71   0.0436
Error             19    39716.0972     2090.3209
Corrected Total   21    55231.8636

R-Square    C.V.        Root MSE    REDCELL Mean
0.280921    16.14252    45.7200     283.227

Source   DF   Type I SS      Mean Square   F Value   Pr > F
GRP       2   15515.7664     7757.8832        3.71   0.0436

Source   DF   Type III SS    Mean Square   F Value   Pr > F
GRP       2   15515.7664     7757.8832        3.71   0.0436

16 / 68


Parameter estimates

The option solution outputs parameter estimates

                                T for H0:                  Std Error of
Parameter      Estimate         Parameter=0    Pr > |T|    Estimate
INTERCEPT      278.0000000 B       13.60       0.0001      20.44661784
GRP 1           38.6250000 B        1.48       0.1548      26.06442584
    2          -21.5555556 B       -0.85       0.4085      25.50141290
    3            0.0000000 B         .          .            .

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.

◮ Group 3 (the last group) is the reference group
◮ The estimates for the other groups are differences from this reference group
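As a worked check of this parameterisation, the group means can be recovered from the estimates by adding each difference to the intercept (values rounded to three decimals):

```latex
\begin{align*}
\hat{\mu}_3 &= 278.000 \\
\hat{\mu}_1 &= 278.000 + 38.625 = 316.625 \\
\hat{\mu}_2 &= 278.000 - 21.556 = 256.444
\end{align*}
```

These agree with the group means 316.6, 256.4 and 278.0 reported for the anaesthesia example.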

17 / 68

PROC glm box plot

18 / 68

Interpreting the estimates

◮ What is the scientific question?
◮ Clinical significance
◮ Statistical significance
◮ Provide confidence interval
◮ Does it make sense?

19 / 68

Multiple comparisons

The F-test shows that there is a difference, but where?

Pairwise t-tests are not suitable due to the risk of mass significance.

A significance level of α = 0.05 means a 5% chance of wrongfully rejecting a true hypothesis (type I error).

The chance of at least one type I error goes up with the number of tests.

(For k groups, we have m = k(k − 1)/2 possible tests; the actual significance level can be as bad as 1 − (1 − α)^m, e.g. for k=5: 0.40)
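The growth of the error rate with the number of groups can be tabulated directly from the formula. This is a small Python sketch (not SAS); the function name `familywise_error` is made up for illustration, and the formula treats the m tests as independent, which pairwise comparisons are not, so the result is best read as a rough worst-case figure.

```python
from math import comb


def familywise_error(k, alpha=0.05):
    """Number of pairwise tests among k groups and the chance of at least
    one type I error, assuming the m tests are independent."""
    m = comb(k, 2)  # m = k(k - 1)/2
    return m, 1 - (1 - alpha) ** m


for k in (3, 5, 10):
    m, fwer = familywise_error(k)
    print(k, m, round(fwer, 2))
```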

20 / 68


Addressing multiplicity

There is no completely satisfactory solution.

Approximate solutions

1. Select a (small) number of relevant comparisons in the planning stage.

2. Make a graph of the average ± 2 × SEM and judge visually (!), perhaps supplemented with F-tests on subsets of groups.

3. Modify the t-tests by multiplying the P-values by the number of tests, the so-called Bonferroni correction (conservative)

4. Use a correction for multiple testing (Dunnett, Tukey) or a (prespecified) multiple testing procedure
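The Bonferroni correction in item 3 amounts to one line of arithmetic. A minimal Python sketch follows; the helper name `bonferroni` and the three unadjusted P-values are hypothetical and are not taken from the anaesthesia data.

```python
def bonferroni(p_values, m=None):
    """Multiply each P-value by the number of tests m, capping at 1."""
    if m is None:
        m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted P-values for three pairwise comparisons.
raw = [0.012, 0.15, 0.40]
adjusted = bonferroni(raw)
print(adjusted)
```

The cap at 1 is needed because a product such as 3 × 0.40 is no longer a probability.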

21 / 68

Tukey: multiple comparisons in SAS

proc glm data=ex_redcell;
  class grp;
  model redcell=grp / solution;
  lsmeans grp / adjust=tukey pdiff cl;
run;

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

Least Squares Means for effect grp
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: redcell

i/j    1         2         3
1                0.0355    0.3215
2      0.0355              0.6802
3      0.3215    0.6802

Least Squares Means for Effect grp

            Difference      Simultaneous 95%
            Between         Confidence Limits for
i    j      Means           LSMean(i)-LSMean(j)
1    2       60.180556        3.742064    116.619047
1    3       38.625000      -27.590379    104.840379
2    3      -21.555556      -86.340628     43.229517

22 / 68

Visual assessment (1/3)

The bars represent 95% confidence intervals for the means using the standard deviation for each group (std2mjt in symbol1 statement).

proc gplot data=ex_redcell;
  plot redcell*grp
    / haxis=axis1 vaxis=axis2 frame;
  axis1 order=(1 to 3 by 1)
    offset=(8,8)
    label=(H=3)
    value=(H=2) minor=NONE;
  axis2
    offset=(1,1) value=(H=2) minor=NONE
    label=(A=90 R=0 H=3);
  symbol1 v=circle i=std2mjt l=1 h=2 w=2;
run;

[Figure: red cell folate (y-axis, 200 to 400) by group (I, II, III)]

23 / 68

Visual assessment (2/3)

The bars represent 95% confidence intervals for the means using the pooled standard deviation for each group (std2mpjt in symbol1 statement).

proc gplot data=ex_redcell;
  plot redcell*grp
    / haxis=axis1 vaxis=axis2 frame;
  axis1 order=(1 to 3 by 1)
    offset=(8,8)
    label=(H=3)
    value=(H=2) minor=NONE;
  axis2
    offset=(1,1) value=(H=2) minor=NONE
    label=(A=90 R=0 H=3);
  symbol1 v=circle i=std2mpjt l=1 h=2 w=2;
run;

[Figure: red cell folate (y-axis, 200 to 400) by group (I, II, III)]

24 / 68


Visual assessment (3/3)

The bars represent 95% confidence intervals for the means using the pooled standard deviation for each group obtained from PROC glm.

25 / 68

Model checking

Check if the assumptions are reasonable: (If not, the analysis is unreliable!)

◮ Variance homogeneity may be checked by performing Levene's test (or Bartlett's test).

◮ In case of variance inhomogeneity, we may also perform a weighted analysis (Welch's test), just as in the t-test

◮ Normality may be checked through probability plots (or histograms) of residuals, or by a numerical test on the residuals.

◮ In case of non-normality, we may use the nonparametric Kruskal-Wallis test

Transformation (often logarithms) may help to achieve variance homogeneity as well as normality

26 / 68

Check of variance homogeneity and normality in SAS

proc glm data=ex_redcell;
  class grp;
  model redcell=grp;
  means grp / hovtest=levene welch;
  output out=model p=predicted r=residual;
run;

Store residuals in a dataset for further model checking

proc univariate data=model normal;
  var residual;
  histogram residual / normal(mu=0);
  ppplot residual / normal(mu=0) square;
run;

27 / 68

Output from proc glm: Test for variance homogeneity

Levene's Test for Homogeneity of redcell Variance
ANOVA of Squared Deviations from Group Means

               Sum of      Mean
Source   DF    Squares     Square     F Value   Pr > F
grp       2    18765720    9382860       4.14   0.0321
Error    19    43019786    2264199

Weighted anova in case of variance heterogeneity:

Welch's ANOVA for redcell

Source   DF         F Value   Pr > F
grp       2.0000       2.97   0.0928
Error    11.0646

So we are not too sure concerning the group differences.....

28 / 68


Test for normality

Output from proc univariate

Tests for Normality

Test                  --Statistic---     -----p Value----
Shapiro-Wilk          W      0.965996    Pr < W       0.6188
Kolmogorov-Smirnov    D      0.107925    Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.043461    Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.263301    Pr > A-Sq   >0.2500

The 4 tests focus on different aspects of non-normality.

◮ For small data sets, we rarely get significance
◮ For large data sets, we almost always get significance
◮ Could look at a probability plot instead

29 / 68

Output from proc univariate: Histogram and probability plot

30 / 68

Non-parametric ANOVA, the Kruskal-Wallis test

SAS code

proc npar1way wilcoxon;
  exact;
  class grp;
  var redcell;
run;

Wilcoxon Scores (Rank Sums) for Variable redcell
Classified by Variable grp

             Sum of    Expected    Std Dev      Mean
grp    N     Scores    Under H0    Under H0     Score
-------------------------------------------------------
1      8     120.0      92.00      14.651507    15.000000
2      9      77.0     103.50      14.974979     8.555556
3      5      56.0      57.50      12.763881    11.200000

Kruskal-Wallis Test
Chi-Square                      4.1852
DF                              2
Asymptotic Pr > Chi-Square      0.1234
Exact Pr >= Chi-Square          0.1233

Again, we have ’lost’ the significance....
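The Kruskal-Wallis chi-square and its asymptotic P-value can be recomputed from the rank sums in the output. The Python sketch below (not SAS) uses the textbook formula without a tie correction, which is an assumption about how the listed statistic was obtained. The P-value uses the closed-form tail of the chi-square distribution with 2 degrees of freedom, so it applies only to the three-group case.

```python
from math import exp

# Group sizes and rank sums ("Sum of Scores") from the npar1way output.
n = [8, 9, 5]
rank_sum = [120.0, 77.0, 56.0]

N = sum(n)
# Kruskal-Wallis statistic without tie correction.
h = 12 / (N * (N + 1)) * sum(r ** 2 / ni for r, ni in zip(rank_sum, n)) - 3 * (N + 1)

# With 3 groups the reference distribution is chi-square with 2 df,
# whose upper tail has the closed form exp(-h/2).
p_asymptotic = exp(-h / 2)

print(round(h, 4), round(p_asymptotic, 4))
```

The exact P-value reported by SAS comes from enumerating rank permutations and is not reproduced by this sketch.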

31 / 68

Two-way analysis of variance

Two criteria for subdividing observations, A and B

Data in two-way layout:

           B
A      1    2    ···   c
1                ···
2                ···
...
r                ···

◮ Effect of both factors
◮ Perhaps even interaction (effect modification)

One factor may be 'individuals' or "experimental units" (e.g. different treatments tried on the same person)

32 / 68


Repeated measurements

Example: Short term effect of enalaprilate on heart rate

           Time
Subject    0        30       60       120      average
1           96       92       86       92       91.50
2          110      106      108      114      109.50
3           89       86       85       83       85.75
4           95       78       78       83       83.50
5          128      124      118      118      122.00
6          100       98      100       94       98.00
7           72       68       67       71       69.50
8           79       75       74       74       75.50
9          100      106      104      102      103.00
average     96.56    92.56    91.11    92.33    93.14

33 / 68

Line plot (“Spaghettiogram”)

Ideally the time courses are parallel.

34 / 68

The additive model

The two effects (s and t) work in an additive way.

$$Y_{st} = \mu + \alpha_s + \beta_t + \varepsilon_{st}$$

The $\varepsilon_{st}$'s are assumed to be independent and normally distributed with mean 0 and identical variances, $\varepsilon_{st} \sim N(0, \sigma^2)$.
(This assumption should be investigated!)

Variational decomposition:

$$SS_{total} = SS_{subject} + SS_{time} + SS_{residual}$$

35 / 68

Analysis of variance table - enalaprilate example

            df   SS       MS       F       P
Subjects     8   8966.6   1120.8   90.64   <0.0001
Times        3    151.0     50.3    4.07   0.0180
Residual    24    296.8     12.4
Total       35   9414.3

◮ Highly significant difference between subjects (not very interesting)

◮ Significant time differences.
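Because the full heart rate table is given, the sums of squares in this ANOVA table can be recomputed by hand. The Python sketch below (not SAS) does so for the balanced two-way layout with one observation per cell; the variable names are illustrative only.

```python
# Heart rate data from the enalaprilate example:
# one row per subject, columns are times 0, 30, 60, 120 minutes.
hrate = [
    [96, 92, 86, 92],
    [110, 106, 108, 114],
    [89, 86, 85, 83],
    [95, 78, 78, 83],
    [128, 124, 118, 118],
    [100, 98, 100, 94],
    [72, 68, 67, 71],
    [79, 75, 74, 74],
    [100, 106, 104, 102],
]
n_subj = len(hrate)
n_time = len(hrate[0])
grand = sum(map(sum, hrate)) / (n_subj * n_time)

subj_means = [sum(row) / n_time for row in hrate]
time_means = [sum(row[t] for row in hrate) / n_subj for t in range(n_time)]

# Sums of squares for the additive model (balanced design).
ss_total = sum((y - grand) ** 2 for row in hrate for y in row)
ss_subject = n_time * sum((m - grand) ** 2 for m in subj_means)
ss_time = n_subj * sum((m - grand) ** 2 for m in time_means)
ss_resid = ss_total - ss_subject - ss_time

df_time = n_time - 1
df_resid = (n_subj - 1) * (n_time - 1)
f_time = (ss_time / df_time) / (ss_resid / df_resid)

print(round(ss_subject, 1), round(ss_time, 1), round(ss_resid, 1), round(f_time, 2))
```

The simple formulas for the subject and time sums of squares rely on the design being balanced; with missing cells they would no longer apply.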

36 / 68


Two-way ANOVA in SAS

proc glm data=ex_pulse;
  class subject times;
  model hrate=subject times / solution;
run;

General Linear Models Procedure
Class Level Information

Class      Levels   Values
SUBJECT    9        1 2 3 4 5 6 7 8 9
TIMES      4        0 30 60 120

Number of observations in data set = 36

37 / 68

Two-way ANOVA output

General Linear Models Procedure

Dependent Variable: HRATE

                        Sum of         Mean
Source            DF    Squares        Square        F Value   Pr > F
Model             11    9117.52778     828.86616       67.03   0.0001
Error             24     296.77778      12.36574
Corrected Total   35    9414.30556

R-Square    C.V.        Root MSE    HRATE Mean
0.968476    3.775539    3.51650     93.1389

Source     DF   Type I SS      Mean Square    F Value   Pr > F
SUBJECT     8   8966.55556     1120.81944       90.64   0.0001
TIMES       3    150.97222       50.32407        4.07   0.0180

Source     DF   Type III SS    Mean Square    F Value   Pr > F
SUBJECT     8   8966.55556     1120.81944       90.64   0.0001
TIMES       3    150.97222       50.32407        4.07   0.0180

38 / 68

Parameter estimates

                                 T for H0:                  Std Error of
Parameter       Estimate         Parameter=0    Pr > |T|    Estimate
INTERCEPT       102.1944444 B       50.34       0.0001      2.03024963
SUBJECT 1       -11.5000000 B       -4.62       0.0001      2.48653783
        2         6.5000000 B        2.61       0.0152      2.48653783
        3       -17.2500000 B       -6.94       0.0001      2.48653783
        4       -19.5000000 B       -7.84       0.0001      2.48653783
        5        19.0000000 B        7.64       0.0001      2.48653783
        6        -5.0000000 B       -2.01       0.0557      2.48653783
        7       -33.5000000 B      -13.47       0.0001      2.48653783
        8       -27.5000000 B      -11.06       0.0001      2.48653783
        9         0.0000000 B         .          .           .
TIMES   0         4.2222222 B        2.55       0.0177      1.65769189
        30        0.2222222 B        0.13       0.8945      1.65769189
        60       -1.2222222 B       -0.74       0.4681      1.65769189
        120       0.0000000 B         .          .           .

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.

◮ Subject 9 at time 120 minutes is the reference

39 / 68

Expected values and residuals

Expected values for subject=3, times=30

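$$\hat{y}_{st} = \hat{\mu} + \hat{\alpha}_s + \hat{\beta}_t = 102.19 - 17.25 + 0.22 = 85.16$$

Residuals

$$r_{st} = \text{observed} - \text{expected} = y_{st} - \hat{y}_{st} \approx \varepsilon_{st}$$

Residual for subject 3, time 30: $r_{32} = 86 - 85.16 = 0.84$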
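For comparison, the same calculation with the parameter estimates carried to four decimals (taken from the SAS parameter table) gives slightly different last digits, which shows that the values above are affected by rounding of the inputs:

```latex
\begin{align*}
\hat{y}_{32} &= 102.1944 - 17.2500 + 0.2222 = 85.1667 \\
r_{32} &= 86 - 85.1667 = 0.8333
\end{align*}
```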

40 / 68


Model checking

Look for:

◮ Differences in variances (systematic?)
◮ Non-normality
◮ Lack of additivity (interaction).
  Can only be tested if there is more than one observation for each combination
◮ Serial correlation?
  (Neighboring observations look more alike)

41 / 68

Residual based diagnostics

Use the residuals for model checking

◮ Probability plot of residuals.
◮ Plot residuals vs expected values.
◮ Plot residuals vs group.
◮ Look for outliers (a large residual means observed and expected values deviate a lot).

42 / 68

Enalaprilate example

No systematic patterns should be present.

43 / 68

Interaction

Example of two criteria for subdividing individuals: sex and smoking habits

Outcome: FEV1

Here, we see an interaction between sex and smoking.

44 / 68


Possible explanations for interaction

◮ Biologically different effects of smoking on males and females
◮ Perhaps the women do not smoke as much as the men
◮ Perhaps the effect is relative (to be expressed in %)

45 / 68

Example: The effect of smoking on birth weight

46 / 68

Example: The effect of smoking on birth weight

47 / 68

Interpreting interaction

◮ There is an effect of smoking, but only for those who have been smoking for a long time.

◮ There is an effect of duration, and this effect increases with amount of smoking

The effect of duration depends upon .... amount of smoking

and the effect of amount depends upon .... duration of smoking

48 / 68


Example: Fibrinogen after spleen operation

34 rats are randomized, in 2 ways

◮ 17 have their spleen removed (splenectomy=yes/no)
◮ 8/17 in each group are kept in altitude chambers (15,000 ft) (place=altitude/control)

Outcome
Fibrinogen level in mg% at day 21

49 / 68

Example: Fibrinogen after spleen operation

[Figure: fibrinogen (y-axis, 100 to 600) by group (no_altitude, no_control, yes_altitude, yes_control)]

50 / 68

ANOVA model with interaction

The usual additive model:

$$Y_{spr} = \mu + \alpha_s + \beta_p + \varepsilon_{spr}, \quad \varepsilon_{spr} \sim N(0, \sigma^2)$$

splenectomy (s=yes/no) and place (p=altitude/control) have an additive effect.

Model with interaction

$$Y_{spr} = \mu + \alpha_s + \beta_p + \gamma_{sp} + \varepsilon_{spr}, \quad \varepsilon_{spr} \sim N(0, \sigma^2)$$

Here, we specify an interaction between splenectomy and place, i.e. the effect of living at high altitude may depend upon whether or not you have an intact spleen, and vice versa.

51 / 68

Two-way ANOVA with interaction in SAS

proc glm data=ex_fibrinogen;
  class splenectomy place;
  model fibrinogen=place splenectomy
        place*splenectomy / solution;
  output out=model p=predicted r=residual;
run;

The GLM Procedure

Class Level Information

Class          Levels   Values
splenectomy    2        no yes
place          2        altitude control

Number of observations 34

52 / 68


Output: two-way ANOVA table

Dependent Variable: fibrinogen

                        Sum of
Source            DF    Squares         Mean Square    F Value   Pr > F
Model              3    139439.2067     46479.7356        8.32   0.0004
Error             30    167573.7639      5585.7921
Corrected Total   33    307012.9706

R-Square    Coeff Var   Root MSE    fibrinogen Mean
0.454180    20.99213    74.73816    356.0294

Source               DF   Type I SS      Mean Square    F Value   Pr > F
place                 1   67925.25531    67925.25531      12.16   0.0015
splenectomy           1   69662.38235    69662.38235      12.47   0.0014
splenectomy*place     1    1851.56904     1851.56904       0.33   0.5691

Source               DF   Type III SS    Mean Square    F Value   Pr > F
place                 1   67925.25531    67925.25531      12.16   0.0015
splenectomy           1   68093.92198    68093.92198      12.19   0.0015
splenectomy*place     1    1851.56904     1851.56904       0.33   0.5691

53 / 68

Output: Parameter estimates

                                                     Standard
Parameter                          Estimate          Error          t Value   Pr > |t|
Intercept                          261.6666667 B     24.91271904      10.50   <.0001
place altitude                     104.3333333 B     36.31621657       2.87   0.0074
place control                        0.0000000 B      .                 .      .
splenectomy no                     104.4444444 B     35.23190514       2.96   0.0059
splenectomy yes                      0.0000000 B      .                 .      .
splenectomy*place no altitude      -29.5694444 B     51.35888601      -0.58   0.5691
splenectomy*place no control         0.0000000 B      .                 .      .
splenectomy*place yes altitude       0.0000000 B      .                 .      .
splenectomy*place yes control        0.0000000 B      .                 .      .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

54 / 68

Computing expected values

The reference levels are place=control, splenectomy=yes (SAS chooses as reference the last level in alphabetic order),

so the expected fibrinogen level for these animals is the intercept, 261.67.

For all other groups, we have to add one or more extra estimates, as shown in the table below:

55 / 68

Expected fibrinogen levels

               place
splenectomy    control                      altitude
yes            261.67                       261.67 + 104.33 = 366.00
no             261.67 + 104.44 = 366.11     261.67 + 104.44 + 104.33 - 29.57 = 440.87

Note: the expected value for splenectomy=no, place=altitude is subject to a rounding issue.
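The four expected values can also be computed from the unrounded parameter estimates, which makes the rounding issue visible: the full-precision sum for splenectomy=no, place=altitude is 440.875. The Python sketch below (not SAS) copies the estimates from the solution output; the function name `expected` is made up for illustration.

```python
# Parameter estimates copied from the "solution" output
# (reference levels: place=control, splenectomy=yes).
intercept = 261.6666667
place_altitude = 104.3333333
splenectomy_no = 104.4444444
interaction_no_altitude = -29.5694444


def expected(splenectomy, place):
    """Expected fibrinogen for one splenectomy/place combination."""
    value = intercept
    if place == "altitude":
        value += place_altitude
    if splenectomy == "no":
        value += splenectomy_no
    # The interaction term applies only when both non-reference levels occur.
    if splenectomy == "no" and place == "altitude":
        value += interaction_no_altitude
    return value


for s in ("yes", "no"):
    for p in ("control", "altitude"):
        print(s, p, expected(s, p))
```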

56 / 68


Model checking

Variance homogeneity may be judged from a one-way anova

The GLM Procedure
Class Level Information

Class    Levels   Values
group    4        no_altitude no_control yes_altitude yes_control

Number of observations 34

Levene's Test for Homogeneity of fibrinogen Variance
ANOVA of Squared Deviations from Group Means

               Sum of       Mean
Source   DF    Squares      Square      F Value   Pr > F
group     3    1.9078E8     63594756       1.55   0.2222
Error    30    1.2314E9     41045352

No reason to suspect inhomogeneity

57 / 68

Normality assumption for residuals

(Result from proc univariate normal)

Tests for Normality

Test                  --Statistic---     -----p Value------
Shapiro-Wilk          W      0.964518    Pr < W       0.3276
Kolmogorov-Smirnov    D      0.126665    Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.091627    Pr > W-Sq    0.1424
Anderson-Darling      A-Sq   0.490958    Pr > A-Sq    0.2140

Conclusion
No reason to suspect non-normality

58 / 68

Model simplification

In the two-way anova, the interaction was not significant (P=0.57), so we omit it from the model:

proc glm data=ex_fibrinogen;
  class splenectomy place;
  model fibrinogen=place splenectomy / solution clparm;
run;

Dependent Variable: fibrinogen

                        Sum of
Source            DF    Squares         Mean Square    F Value   Pr > F
Model              2    137587.6377     68793.8188       12.59   <.0001
Error             31    169425.3329      5465.3333
Corrected Total   33    307012.9706

R-Square    Coeff Var   Root MSE    fibrinogen Mean
0.448149    20.76455    73.92789    356.0294

Source         DF   Type III SS    Mean Square    F Value   Pr > F
place           1   67925.25531    67925.25531      12.43   0.0013
splenectomy     1   69662.38235    69662.38235      12.75   0.0012

59 / 68

Assessing the main effects

                                      Standard
Parameter           Estimate          Error          t Value   Pr > |t|
Intercept           268.6241830 B     21.54935559      12.47   <.0001
place altitude       89.5486111 B     25.40104253       3.53   0.0013
place control         0.0000000 B      .                 .      .
splenectomy no       90.5294118 B     25.35705800       3.57   0.0012
splenectomy yes       0.0000000 B      .                 .      .

Parameter           95% Confidence Limits
Intercept           224.6739825    312.5743835
place altitude       37.7428433    141.3543789
place control         .              .
splenectomy no       38.8133510    142.2454725
splenectomy yes       .              .

◮ Removal of spleen leads to a decrease in fibrinogen of approx 90.53 mg% at day 21

◮ Placing in altitude leads to an increase in fibrinogen of approx 89.55 mg% at day 21

60 / 68


Residual plots

[Figure, two panels: Normality; Variance homogeneity (residual, -200 to 200, against expected value, 260 to 460)]

61 / 68

More complicated analyses of variances

◮ Three-way or higher-order analysis of variance.
◮ Latin squares

         1   2   3
   I     A   B   C
   II    B   C   A
   III   C   A   B

   (Cochran & Cox (1957): Experimental Designs, 2.ed., Wiley)

◮ Cross-over designs
◮ Variance component models
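The defining property of a Latin square is that every treatment appears exactly once in each row and each column. A short Python sketch of that check follows, applied to the 3 × 3 example; the function name `is_latin_square` is hypothetical.

```python
def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    # The square must be n x n with n distinct symbols.
    if len(symbols) != n or any(len(row) != n for row in square):
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(n))
    return rows_ok and cols_ok


# The 3 x 3 example from the slide (rows I-III, columns 1-3).
square = [["A", "B", "C"],
          ["B", "C", "A"],
          ["C", "A", "B"]]
print(is_latin_square(square))
```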

62 / 68

Example of a latin square: A rabbit experiment

63 / 68

Example of a latin square: A rabbit experiment

◮ 6 rabbits
◮ Vaccination at 6 different spots on the back
◮ 6 different orders of vaccination
◮ Swelling is area of blister (cm2)

spot   rabbit   order   swelling
1      1        3       7.9
1      2        5       8.7
1      3        4       7.4
1      4        1       7.4
.
.
6      4        4       5.8
6      5        1       6.4
6      6        3       7.7

64 / 68


Illustrations

[Figure, two panels: swelling (y-axis, 5 to 10) against spot (a to f) and against order (1 to 6); points are plotted as the digits 1 to 6]

65 / 68

3-way analysis of variance, with additive effects

proc glm;
  class rabbit spot order;
  model swelling=rabbit spot order;
run;

The GLM Procedure

Class Level Information

Class     Levels   Values
rabbit    6        1 2 3 4 5 6
spot      6        a b c d e f
order     6        1 2 3 4 5 6

Number of observations 36

66 / 68

3-way analysis of variance

Dependent Variable: swelling

                        Sum of
Source            DF    Squares        Mean Square    F Value   Pr > F
Model             15    17.23000000    1.14866667        1.75   0.1205
Error             20    13.13000000    0.65650000
Corrected Total   35    30.36000000

R-Square    Coeff Var   Root MSE    swelling Mean
0.567523    10.99883    0.810247    7.366667

Source    DF   Type III SS    Mean Square    F Value   Pr > F
rabbit     5   12.83333333    2.56666667        3.91   0.0124
spot       5    3.83333333    0.76666667        1.17   0.3592
order      5    0.56333333    0.11266667        0.17   0.9701

The design is balanced, so the test of the effect of one variable (covariate) does not depend on which of the others are still in the model.

67 / 68

How about possible interactions?

proc glm;
  class rabbit spot order;
  model swelling=rabbit spot order spot*order;
run;

Dependent Variable: swelling

                        Sum of
Source            DF    Squares        Mean Square    F Value   Pr > F
Model             35    30.36000000    0.86742857        .       .
Error              0     0.00000000     .
Corrected Total   35    30.36000000

Source        DF   Type I SS      Mean Square    F Value   Pr > F
rabbit         5   12.83333333    2.56666667        .       .
spot           5    3.83333333    0.76666667        .       .
order          5    0.56333333    0.11266667        .       .
spot*order    20   13.13000000    0.65650000        .       .

There is no room for interaction, since there is only one observation for each combination of spot and order!
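The degrees of freedom in the output make the same point. With 36 observations there are 35 degrees of freedom in total, and the four model terms use all of them:

```latex
\begin{align*}
\underbrace{36 - 1}_{\text{total}} = 35
  &= \underbrace{5}_{\text{rabbit}} + \underbrace{5}_{\text{spot}}
   + \underbrace{5}_{\text{order}} + \underbrace{20}_{\text{spot} \times \text{order}} \\
\text{error df} &= 35 - 35 = 0
\end{align*}
```

With zero degrees of freedom for error there is no residual mean square, so no F statistics can be formed. The interaction sum of squares (13.13) is exactly the error sum of squares of the additive model.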

68 / 68

