Topic 20: Single Factor Analysis of Variance (bacraig/notes512/Topic_20.pdf)

Date post: 02-Aug-2020
Outline

• Single factor Analysis of Variance
  – One set of treatments
    • Cell means model
    • Factor effects model
  – Link to linear regression using indicator explanatory variables

One-Way ANOVA

• The response variable Y is continuous
• The explanatory variable is categorical
  – We call it a factor
  – The possible values are called levels
• This approach is a generalization of the independent two-sample pooled t-test
• In other words, it can be used when there are more than two treatments

Data for One-Way ANOVA

• Y is the response variable
• X is the factor (it is qualitative/discrete)
  – r is the number of levels
  – We often refer to these levels as groups or treatments

Notation

• For Yij we use
  – i to denote the level of the factor
  – j to denote the jth observation at factor level i
• i = 1, . . . , r levels of factor X
• j = 1, . . . , ni observations for level i of factor X
  – Note that ni does not need to be the same for each level

KNNL Example (p 685)

• Y is the number of cases of cereal sold
• X is the design of the cereal package
  – There are four levels for X because there are four different designs
• i = 1 to 4 levels
• j = 1 to ni stores with design i (ni = 5, 5, 4, 5)
• We will use n if ni is the same across levels

Data for one-way ANOVA

data a1;
  infile 'c:../data/ch16ta01.txt';
  input cases design store;   /* cases sold, package design, store number */
run;

proc print data=a1;
run;

The data (first 8 of the 19 observations)

Obs   cases   design   store
  1      11        1       1
  2      17        1       2
  3      16        1       3
  4      14        1       4
  5      15        1       5
  6      12        2       1
  7      10        2       2
  8      15        2       3

Plot the data

symbol1 v=circle i=none;   /* circles only, no connecting lines */
proc gplot data=a1;
  plot cases*design;
run;

The scatterplot

[Figure: scatterplot of cases versus design]

Plot the means

proc means data=a1;
  var cases;
  by design;                    /* BY processing requires a1 sorted by design */
  output out=a2 mean=avcases;   /* one row per design holding its mean */
run;

proc print data=a2;
run;

symbol1 v=circle i=join;        /* join the plotted means with lines */
proc gplot data=a2;
  plot avcases*design;
run;

Proc Print: New Data Set

Obs   design   _TYPE_   _FREQ_   avcases
  1        1        0        5      14.6
  2        2        0        5      13.4
  3        3        0        4      19.5
  4        4        0        5      27.2

Proc Gplot: Means Plot

[Figure: average cases (avcases) versus design, points joined]

The Model

• We assume that the response variable is
  – Normally distributed with a
    1. mean that may depend on the level of the factor
    2. constant variance
• All observations are assumed independent
• NOTE: Same assumptions as linear regression, except that there is no assumed linear relationship between X and E(Y|X)

The scatterplot

Based on the scatterplot and the design:
• Independence?
• Constant variance?
• Normally distributed?

Cell Means Model

• A “cell” refers to a level of the factor
• Yij = μi + εij
  – where μi is the theoretical mean or expected value of all observations at level (or in cell) i
  – the εij are iid N(0, σ²), which means
  – Yij ~ N(μi, σ²) and independent

Parameters

• The parameters of the model are
  – μ1, μ2, … , μr
  – σ²
• Question (Version 1) – Does our explanatory variable help explain Y?
• Question (Version 2) – Do the μi vary?
  H0: μ1 = μ2 = … = μr = μ (a constant)
  Ha: not all μ’s are the same

Estimates

• Estimate μi by the mean of the observations at level i (the sample mean)

$$\hat{\mu}_i = \bar{Y}_{i\cdot} = \frac{\sum_j Y_{ij}}{n_i}$$

• For each level i, also get an estimate of the variance (the sample variance)

$$s_i^2 = \frac{\sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2}{n_i - 1}$$

• We combine these to get an overall estimate of σ²
• Same approach as the pooled t-test
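A minimal Python sketch of these two estimates (the course programs are in SAS; Python is used here only for illustration). It applies `statistics.mean` and `statistics.variance`, which uses the ni − 1 divisor, to the five design-1 stores in the data listing (cases 11, 17, 16, 14, 15). The helper name `group_mean_and_variance` is hypothetical.

```python
from statistics import mean, variance


def group_mean_and_variance(y):
    """Return (sample mean, sample variance) for the observations at one level.

    statistics.variance divides by len(y) - 1, matching s_i^2 above.
    """
    return mean(y), variance(y)


design1 = [11, 17, 16, 14, 15]   # the five design-1 stores from the data listing
mu_hat_1, s2_1 = group_mean_and_variance(design1)
print(mu_hat_1, s2_1)
```

The square root of the variance, about 2.302, matches the Std Dev that the MEANS statement reports for design 1.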

Pooled estimate of σ²

• If the ni were all the same, we would average the si²
  – Do not average the si
• In general we pool the si², giving weights proportional to the df, ni − 1
• The pooled estimate is

$$s^2 = \frac{\sum_i (n_i - 1)\, s_i^2}{\sum_i (n_i - 1)} = \frac{\sum_i (n_i - 1)\, s_i^2}{n_T - r}$$
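A short Python sketch of the pooling step, assuming the ni and si values printed by the MEANS statement for the cereal data. It contrasts the df-weighted pooled variance with a plain (unweighted) average of the si², which would only be appropriate if all ni were equal. The function name `pooled_variance` is hypothetical.

```python
# n_i and s_i for designs 1-4, copied from the MEANS statement output
n = [5, 5, 4, 5]
s = [2.30217289, 3.64691651, 2.64575131, 3.96232255]


def pooled_variance(n, s):
    """Pool the s_i^2 with weights n_i - 1 (their degrees of freedom)."""
    num = sum((ni - 1) * si ** 2 for ni, si in zip(n, s))
    den = sum(ni - 1 for ni in n)   # equals n_T - r
    return num / den


s2 = pooled_variance(n, s)
unweighted = sum(si ** 2 for si in s) / len(s)   # ignores the unequal n_i
print(round(s2, 4), round(unweighted, 4))
```

The pooled value, about 10.5467, is the Error mean square (MSE) in the ANOVA output; the unweighted average (10.325) differs because design 3 has only four stores.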

Running proc glm

proc glm data=a1;
  class design;              /* Difference 1: need to specify factor variables */
  model cases=design;
  means design;              /* Difference 2: ask for mean estimates */
  lsmeans design / stderr;
run;

Output

Class Level Information
Class     Levels   Values
design         4   1 2 3 4

Number of Observations Read   19
Number of Observations Used   19

Important to check these summaries!!!

SAS 9.4 default output for MEANS statement

MEANS statement output

Level of            --------------cases--------------
design      N       Mean              Std Dev
1           5       14.6000000        2.30217289
2           5       13.4000000        3.64691651
3           4       19.5000000        2.64575131
4           5       27.2000000        3.96232255

Table of sample means and sample standard deviations

SAS 9.4 default output for LSMEANS statement

LSMEANS statement output

design   cases LSMEAN   Standard Error   Pr > |t|
1        14.6000000     1.4523544        <.0001
2        13.4000000     1.4523544        <.0001
3        19.5000000     1.6237816        <.0001
4        27.2000000     1.4523544        <.0001

Provides estimates based on the model (i.e., constant variance): each standard error is √(MSE/ni), using the pooled MSE rather than the group's own si
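A minimal Python sketch of where these standard errors come from, assuming MSE = 10.5466667 (from the ANOVA output) and the ni and means from the MEANS output. Under the constant-variance model each LSMEAN's standard error is √(MSE/ni), so designs with the same ni share the same standard error.

```python
import math

mse = 10.5466667                   # Error mean square from the ANOVA table
n = [5, 5, 4, 5]                   # stores per design
means = [14.6, 13.4, 19.5, 27.2]   # LSMEANS equal the cell sample means here

ses = [math.sqrt(mse / ni) for ni in n]              # model-based standard errors
tstats = [m / se for m, se in zip(means, ses)]       # t for H0: mu_i = 0

for i, (m, se, t) in enumerate(zip(means, ses, tstats), start=1):
    print(f"design {i}: LSMEAN={m:.1f}  SE={se:.7f}  t={t:.2f}")
```

The t values (10.05, 9.23, 12.01, 18.73) agree with the GLIMMIX least squares means table later in this topic.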

Notation for ANOVA

$$\bar{Y}_{i\cdot} = \frac{\sum_j Y_{ij}}{n_i} \quad \text{(trt sample mean)}$$

$$\bar{Y}_{\cdot\cdot} = \frac{\sum_i \sum_j Y_{ij}}{n_T} \quad \text{(overall sample mean)}$$

$$n_T = \sum_i n_i \quad \text{(total number of observations)}$$

When ni = n for all i, $\bar{Y}_{\cdot\cdot} = \frac{\sum_i \bar{Y}_{i\cdot}}{r}$
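A small Python check of this notation on the cereal data, assuming only the ni and group means from the MEANS output. Because ΣjYij = niȲi., the overall mean is the ni-weighted average of the group means; it equals the plain average of the group means only when all ni are equal, which is not the case here (n3 = 4).

```python
n = [5, 5, 4, 5]
means = [14.6, 13.4, 19.5, 27.2]

n_T = sum(n)                                               # total observations
weighted = sum(ni * m for ni, m in zip(n, means)) / n_T    # Y-bar.. (overall mean)
unweighted = sum(means) / len(means)                       # average of the r group means
print(n_T, round(weighted, 5), round(unweighted, 5))
```

The weighted value, 354/19 ≈ 18.63158, is the "cases Mean" reported with the ANOVA output.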

ANOVA Table

Source   df       SS      MS
Model    r − 1    SSR     SSR/dfR
Error    nT − r   SSE     SSE/dfE
Total    nT − 1   SSTO    SSTO/dfT

$$SSR = \sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2, \qquad SSE = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2, \qquad SSTO = \sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$$

ANOVA SAS Output

Source      DF   Sum of Squares   Mean Square   F Value   Pr > F
Model        3      588.2210526   196.0736842     18.59   <.0001
Error       15      158.2000000    10.5466667
Cor Total   18      746.4210526

R-Square   Coeff Var   Root MSE   cases Mean
0.788055    17.43042   3.247563     18.63158
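The whole table can be rebuilt from the group summaries alone. This Python sketch assumes the ni, means, and standard deviations from the MEANS output and uses SSR = Σi ni(Ȳi. − Ȳ..)² (the inner sum over j has ni identical terms) and SSE = Σi (ni − 1)si²; it is an illustration of the formulas, not how SAS computes them.

```python
# Group summaries from the MEANS statement output
n = [5, 5, 4, 5]
means = [14.6, 13.4, 19.5, 27.2]
sds = [2.30217289, 3.64691651, 2.64575131, 3.96232255]

r, n_T = len(n), sum(n)
grand = sum(ni * m for ni, m in zip(n, means)) / n_T   # overall mean Y-bar..

ssr = sum(ni * (m - grand) ** 2 for ni, m in zip(n, means))   # between designs
sse = sum((ni - 1) * s ** 2 for ni, s in zip(n, sds))          # within designs
ssto = ssr + sse                                               # SSTO = SSR + SSE

msr = ssr / (r - 1)
mse = sse / (n_T - r)
f_stat = msr / mse
r_square = ssr / ssto
root_mse = mse ** 0.5
coeff_var = 100 * root_mse / grand      # Coeff Var = 100 * Root MSE / mean

print(f"SSR={ssr:.4f}  SSE={sse:.4f}  SSTO={ssto:.4f}")
print(f"F={f_stat:.2f}  R-Square={r_square:.6f}  "
      f"Root MSE={root_mse:.6f}  Coeff Var={coeff_var:.5f}")
```

Every quantity agrees with the SAS output to the printed precision (small differences come only from the rounded standard deviations).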

Expected Mean Squares

• E(MSR) > E(MSE) when the group means are different
• See KNNL p 694 – 698 for more details
• In more complicated models, the EMS tell us how to construct the F test

$$E(MSE) = \sigma^2$$

$$E(MSR) = \sigma^2 + \frac{\sum_i n_i (\mu_i - \mu_\cdot)^2}{r - 1}, \qquad \text{where } \mu_\cdot = \frac{\sum_i n_i \mu_i}{n_T}$$
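A minimal Python sketch that evaluates these two expressions. The μi and σ² values passed in are hypothetical, chosen only to exercise the formula, and are not estimates from the cereal data; the group sizes are those of the example. With equal μi the extra term vanishes and E(MSR) = E(MSE), which is why a large MSR/MSE ratio is evidence against H0.

```python
def expected_mean_squares(n, mu, sigma2):
    """Return (E(MSE), E(MSR)) for the one-way cell means model."""
    n_T = sum(n)
    r = len(n)
    mu_dot = sum(ni * mi for ni, mi in zip(n, mu)) / n_T   # n_i-weighted mean of the mu_i
    e_mse = sigma2
    e_msr = sigma2 + sum(ni * (mi - mu_dot) ** 2 for ni, mi in zip(n, mu)) / (r - 1)
    return e_mse, e_msr


# Hypothetical parameters: equal means, then unequal means (sigma^2 = 9)
print(expected_mean_squares([5, 5, 4, 5], [20, 20, 20, 20], 9.0))
print(expected_mean_squares([5, 5, 4, 5], [10, 20, 30, 20], 9.0))
```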

F test

• F* = MSR/MSE
• H0: μ1 = μ2 = … = μr
• Ha: not all of the μi are equal
• Under H0, F* ~ F(r − 1, nT − r)
• Reject H0 when F* is large
• Typically report the P-value
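Python's standard library has no F distribution, so this sketch computes the P-value P(F > F*) through the identity P(F > F*) = I_x(ν2/2, ν1/2) with x = ν2/(ν2 + ν1F*), where I is the regularized incomplete beta function, evaluated with the standard continued-fraction (modified Lentz) method from Numerical Recipes. This is a swapped-in numerical technique, not a description of how SAS computes Pr > F; in practice a library routine such as `scipy.stats.f.sf` is the usual choice. The function names are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # use the continued fraction where it converges quickly, else the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_star, df1, df2):
    """P(F > f_star) for F ~ F(df1, df2)."""
    x = df2 / (df2 + df1 * f_star)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)


# Cereal example: F* = 18.59 on (r - 1, n_T - r) = (3, 15) df
p_value = f_upper_tail(18.59, 3, 15)
print(f"P-value = {p_value:.2e}")
```

For the cereal data the computed P-value is well below 0.0001, consistent with the `<.0001` that SAS reports.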

Maximum Likelihood Approach

1. proc mixed data=a1;
     class design;
     model cases=design;
     lsmeans design;
   run;

2. proc glimmix data=a1;
     class design;
     model cases=design / dist=normal;   /* Gaussian response; identity link by default */
     lsmeans design;
   run;

GLIMMIX Output

Model Information
Data Set                     WORK.A1
Response Variable            cases
Response Distribution        Gaussian
Link Function                Identity
Variance Function            Default
Variance Matrix              Diagonal
Estimation Technique         Restricted Maximum Likelihood
Degrees of Freedom Method    Residual

GLIMMIX Output

Fit Statistics
-2 Res Log Likelihood        84.12
AIC (smaller is better)      94.12
AICC (smaller is better)    100.79
BIC (smaller is better)      97.66
CAIC (smaller is better)    102.66
HQIC (smaller is better)     94.08
Pearson Chi-Square          158.20
Pearson Chi-Square / DF      10.55

GLIMMIX Output

Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
design        3       15     18.59   <.0001

design Least Squares Means
design   Estimate   Standard Error   DF   t Value   Pr > |t|
1         14.6000           1.4524   15     10.05   <.0001
2         13.4000           1.4524   15      9.23   <.0001
3         19.5000           1.6238   15     12.01   <.0001
4         27.2000           1.4524   15     18.73   <.0001

Factor Effects Model

• A reparameterization of the cell means model
• Useful way of looking at more complicated models
• Null hypotheses are easier to state
• Yij = μ + τi + εij
  – the εij are iid N(0, σ²)

Parameters

• The parameters of the model are
  – μ, τ1, τ2, … , τr
  – σ²
• The cell means model had r + 1 parameters
  – r μ’s and σ²
• The factor effects model has r + 2 parameters
  – μ, the r τ’s, and σ²
  – Cannot uniquely estimate all parameters

An example

• Suppose r = 3; μ1 = 10, μ2 = 20, μ3 = 30
• What is an equivalent set of parameters for the factor effects model?
• We need to have μ + τi = μi, so…
  1. μ = 0, τ1 = 10, τ2 = 20, τ3 = 30
  2. μ = 20, τ1 = -10, τ2 = 0, τ3 = 10
  3. μ = 5000, τ1 = -4990, τ2 = -4980, τ3 = -4970
  all provide the same means
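A tiny Python check of this example: each of the three (μ, τ) sets maps to the same cell means through μi = μ + τi, so the data cannot distinguish between them. Printing Στi also shows which set satisfies the common sum-to-zero constraint (only set 2).

```python
# The three (mu, [tau_1, tau_2, tau_3]) parameter sets from the example, r = 3
param_sets = [
    (0, [10, 20, 30]),
    (20, [-10, 0, 10]),
    (5000, [-4990, -4980, -4970]),
]

for mu, tau in param_sets:
    cell_means = [mu + t for t in tau]   # mu_i = mu + tau_i
    print(mu, tau, "->", cell_means, "sum of tau =", sum(tau))
```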

Problem with factor effects?

• These parameters are not estimable or not well defined (i.e., not unique)
  – There are many solutions to the least squares problem
  – The X΄X matrix for this parameterization does not have an inverse (perfect multicollinearity)
• We addressed a similar situation in multiple regression. The parameter estimates provided by SAS are biased

Factor effects solution

• Put a constraint on the τi
• Common to assume Σi τi = 0
• This effectively reduces the number of parameters by one
• Numerous other constraints possible
  – Σi τi = 100
  – τr = 0

Consequences

• Regardless of constraint, we always have μi = μ + τi
• The constraint Σi τi = 0 implies
  – μ = (Σi μi)/r (unweighted overall mean)
  – τi = μi – μ (group effect)
• The “unweighted” mean complicates things when the ni are not all equal; see KNNL p 702-708

Hypotheses

• H0: μ1 = μ2 = … = μr
• H1: not all of the μi are equal
are translated into
• H0: τ1 = τ2 = … = τr = 0
• H1: at least one τi is not 0

Estimates of parameters

• With the constraint Σi τi = 0

$$\hat{\mu} = \frac{\sum_i \bar{Y}_{i\cdot}}{r} \quad \left(= \bar{Y}_{\cdot\cdot} \text{ if } n_i = n\right)$$

$$\hat{\tau}_i = \bar{Y}_{i\cdot} - \hat{\mu}$$
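Applied to the cereal means from the MEANS output, these formulas give the sum-to-zero estimates below. This is a minimal Python sketch; note that μ̂ here is the unweighted 18.675, not the overall sample mean 18.632, because design 3 has only four stores.

```python
means = [14.6, 13.4, 19.5, 27.2]   # sample means for designs 1-4
r = len(means)

mu_hat = sum(means) / r                    # unweighted mean of the cell means
tau_hat = [m - mu_hat for m in means]      # estimated effects; they sum to 0
print(round(mu_hat, 3), [round(t, 3) for t in tau_hat])
```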

Solution used by SAS

• Recall, X΄X does not have an inverse
• We can use a generalized inverse in its place
• (X΄X)- is the standard notation for a generalized inverse
• There are many generalized inverses, each corresponding to a different constraint

Solution used by SAS

• (X΄X)- used in proc glm corresponds to the constraint τr = 0
• Recall that μ and the τi are not estimable
• But the linear combinations μ + τi are estimable
• These are estimated by the cell means (i.e., sample means)
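A minimal Python sketch of what "estimable" means here, using the cereal sample means. Under the constraint τr = 0 (the one proc glm's generalized inverse imposes) and under Σi τi = 0, the individual μ̂ and τ̂i differ, but the combinations μ̂ + τ̂i are identical and equal the cell sample means.

```python
means = [14.6, 13.4, 19.5, 27.2]
r = len(means)

# Constraint tau_r = 0: mu is the last group's mean
mu_ref = means[-1]
tau_ref = [m - mu_ref for m in means]

# Constraint sum(tau_i) = 0: mu is the unweighted mean of the cell means
mu_sum = sum(means) / r
tau_sum = [m - mu_sum for m in means]

# mu and the tau_i depend on the constraint; mu + tau_i does not
fitted_ref = [mu_ref + t for t in tau_ref]
fitted_sum = [mu_sum + t for t in tau_sum]
print(mu_ref, [round(t, 1) for t in tau_ref])
print(round(mu_sum, 3), [round(t, 3) for t in tau_sum])
print([round(v, 6) for v in fitted_ref], [round(v, 6) for v in fitted_sum])
```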

Cereal package example

• Y is the number of cases of cereal sold
• X is the design of the cereal package
• i = 1 to 4 levels
• j = 1 to ni stores with design i

SAS coding for X

• The class statement generates r explanatory variables
• The ith explanatory variable is equal to 1 if the observation is from the ith group, and 0 otherwise
• In other words, the rows of X (first column = intercept) are
  1 1 0 0 0 for design=1
  1 0 1 0 0 for design=2
  1 0 0 1 0 for design=3
  1 0 0 0 1 for design=4
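A minimal Python sketch of this coding for the cereal example, assuming the group sizes ni = 5, 5, 4, 5. It builds X with an intercept column plus one indicator per design, confirms that the intercept equals the sum of the indicators (the source of the singularity), and forms X΄X, whose entries can be compared with the xpx output.

```python
designs = [1] * 5 + [2] * 5 + [3] * 4 + [4] * 5   # design label for each of the 19 stores
r = 4

# Each row: intercept, then an indicator for each design level
X = [[1] + [1 if d == k else 0 for k in range(1, r + 1)] for d in designs]

# Intercept column = sum of the indicator columns, so X has linearly
# dependent columns and X'X cannot be inverted
assert all(row[0] == sum(row[1:]) for row in X)

# X'X: entry (a, b) is the sum over observations of X[.][a] * X[.][b]
p = r + 1
XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
for line in XtX:
    print(line)
```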

Some SAS options

proc glm data=a1;
  class design;
  model cases=design
        / xpx inverse solution;   /* print X'X, its (generalized) inverse, and the parameter estimates */
run;

xpx Output

The X'X Matrix
         Int    d1    d2    d3    d4   cases
Int       19     5     5     4     5     354
d1         5     5     0     0     0      73
d2         5     0     5     0     0      67
d3         4     0     0     4     0      78
d4         5     0     0     0     5     136
cases    354    73    67    78   136    7342

The cases column also contains X'Y, and the lower-right entry is Y'Y

inverse Output

X'X Generalized Inverse (g2)
          Int      d1      d2      d3    d4   cases
Int       0.2    -0.2    -0.2    -0.2     0    27.2
d1       -0.2     0.4     0.2     0.2     0   -12.6
d2       -0.2     0.2     0.4     0.2     0   -13.8
d3       -0.2     0.2     0.2    0.45     0    -7.7
d4          0       0       0       0     0       0
cases    27.2   -12.6   -13.8    -7.7     0   158.2

Inverse Matrix

• Actually, this matrix is

$$\begin{pmatrix} (X'X)^- & (X'X)^- X'Y \\ Y'X(X'X)^- & Y'Y - Y'X(X'X)^- X'Y \end{pmatrix}$$

• The parameter estimates are in the upper right corner, and SSE is in the lower right corner (the last column of the generalized inverse output)
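A minimal sketch, in pure Python with exact fractions, of one generalized inverse of the kind described above: invert the nonsingular leading 4×4 block of X΄X and fill the d4 row and column with zeros, which imposes τ4 = 0. Its entries agree with the g2 inverse that SAS prints, but this is a reconstruction for illustration, not SAS's algorithm; the helper names `invert` and `matmul` are hypothetical. X΄X, X΄Y, and Y΄Y are taken from the xpx output.

```python
from fractions import Fraction

# Values from the xpx output (design 4 is the last indicator column)
XtX = [[19, 5, 5, 4, 5],
       [5, 5, 0, 0, 0],
       [5, 0, 5, 0, 0],
       [4, 0, 0, 4, 0],
       [5, 0, 0, 0, 5]]
XtY = [354, 73, 67, 78, 136]   # the cases column: X'Y
YtY = 7342                     # lower-right entry: Y'Y


def invert(m):
    """Gauss-Jordan inverse with exact Fractions; assumes m is nonsingular."""
    k = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        piv = next(rw for rw in range(col, k) if aug[rw][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for rw in range(k):
            if rw != col and aug[rw][col] != 0:
                factor = aug[rw][col]
                aug[rw] = [a - factor * b for a, b in zip(aug[rw], aug[col])]
    return [row[k:] for row in aug]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Invert the leading 4x4 block; zero out the d4 row and column (tau_4 = 0)
p = len(XtX)
block_inv = invert([row[:p - 1] for row in XtX[:p - 1]])
G = [row + [Fraction(0)] for row in block_inv] + [[Fraction(0)] * p]

# Defining property of a generalized inverse: (X'X) G (X'X) = X'X
assert matmul(matmul(XtX, G), XtX) == XtX

b_hat = [sum(g * y for g, y in zip(row, XtY)) for row in G]   # (X'X)^- X'Y
sse = YtY - sum(bi * y for bi, y in zip(b_hat, XtY))          # Y'Y - Y'X (X'X)^- X'Y

print([float(v) for v in b_hat])   # Int, d1, d2, d3, d4
print(float(sse))
```

The printed estimates (27.2, -12.6, -13.8, -7.7, 0) and SSE (158.2) match the last column of the inverse output.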

solution Output: Parameter estimates

Par    Est          St Err   t       P
Int     27.2   B    1.45     18.73   <.0001
d1     -12.6   B    2.05     -6.13   <.0001
d2     -13.8   B    2.05     -6.72   <.0001
d3      -7.7   B    2.17     -3.53   0.0030
d4       0.0   B    .        .       .

Caution Message

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

Interpretation

• If τr = 0 (in our case, τ4 = 0), then the corresponding estimate should be zero
• The intercept μ is then estimated by the sample mean of Group 4
• Since μ + τi is the mean of group i, the τi are estimated as the differences between the sample mean of Group i and the sample mean of Group 4

Recall the means output

Level of design   N   Mean   Std Dev
1                 5   14.6   2.3
2                 5   13.4   3.6
3                 4   19.5   2.6
4                 5   27.2   3.9

Parameter estimates based on means

Level of design   Mean   Parameter estimate
                         μ̂ = 27.2
1                 14.6   τ̂1 = 14.6-27.2 = -12.6
2                 13.4   τ̂2 = 13.4-27.2 = -13.8
3                 19.5   τ̂3 = 19.5-27.2 = -7.7
4                 27.2   τ̂4 = 27.2-27.2 = 0

Last slide

• Read KNNL Chapter 16 up to 16.10
• We used the program topic20.sas to generate the output for today
• We will focus more on the relationship between regression and one-way ANOVA in the next topic

