
ANOVA and Regression Brian Healy, PhD. Objectives ANOVA ANOVA –Multiple comparisons Introduction...

Date post: 15-Jan-2016
Transcript
Page 1: ANOVA and Regression Brian Healy, PhD. Objectives ANOVA ANOVA –Multiple comparisons Introduction to regression Introduction to regression –Relationship.

ANOVA and Regression

Brian Healy, PhD

Page 2

Objectives

ANOVA
– Multiple comparisons
Introduction to regression
– Relationship to correlation/t-test

Page 3

Comments from reviews

Please fill them out because I read them
More examples and not just MS
More depth on technical details/statistical theory/equations
– First time ever!!
– I have made slides from more in-depth courses available on-line so that you have access to formulas for t-test, ANOVA, etc.
Talks too fast for non-native speakers

Page 4

Review

Types of data
p-value
Steps for hypothesis test
– How do we set up a null hypothesis?
Choosing the right test
– Continuous outcome variable/dichotomous explanatory variable: Two sample t-test

Page 5

Steps for hypothesis testing

1) State null hypothesis
2) State type of data for explanatory and outcome variable
3) Determine appropriate statistical test
4) State summary statistics
5) Calculate p-value (stat package)
6) Decide whether to reject or not reject the null hypothesis
   • NEVER accept null
7) Write conclusion

Page 6

Example

In previous class, two groups were compared on a continuous outcome
What if we have more than two groups?
Ex. A recent study compared the intensity of structures on MRI in normal controls, benign MS patients and secondary progressive MS patients
Question: Is there any difference among these groups?

Page 8

Two approaches

Compare each group to each other group using a t-test
– Problem with multiple comparisons
Complete global comparison to see if there is any difference
– Analysis of variance (ANOVA)
– Good first step even if eventually complete pairwise comparisons

Page 9

Types of analysis-independent samples

Outcome        Explanatory    Analysis
Continuous     Dichotomous    t-test, Wilcoxon test
Continuous     Categorical    ANOVA, linear regression
Continuous     Continuous     Correlation, linear regression
Dichotomous    Dichotomous    Chi-square test, logistic regression
Dichotomous    Continuous     Logistic regression
Time to event  Dichotomous    Log-rank test

Page 10

Global test-ANOVA

As a first step, we can compare across all groups at once
The null hypothesis for ANOVA is that the means in all of the groups are equal
ANOVA compares the within group variance and the between group variance
– If the patients within a group are very alike and the group means are far apart, the groups are likely different

Page 12

Hypothesis test

1) $H_0$: $\text{mean}_{\text{normal}} = \text{mean}_{\text{BMS}} = \text{mean}_{\text{SPMS}}$
2) Outcome variable: continuous
   Explanatory variable: categorical
3) Test: ANOVA
4) $\text{mean}_{\text{normal}} = 0.41$; $\text{mean}_{\text{BMS}} = 0.34$; $\text{mean}_{\text{SPMS}} = 0.30$
5) Results: p=0.011
6) Reject null hypothesis
7) Conclusion: At least one of the groups is significantly different than the others

Page 14

Technical aside

Our F-statistic is the ratio of the between group variance and the within group variance
This ratio of variances has a known distribution (F-distribution)
If our calculated F-statistic is high, the between group variance is higher than the within group variance, meaning the differences between the groups are not likely due to chance
Therefore, the probability of the observed result or something more extreme will be low (low p-value)

$$F = \frac{s^2_{between}}{s^2_{within}} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 / (k - 1)}{\left[ (n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \cdots + (n_k - 1) s_k^2 \right] / (n_1 + n_2 + \cdots + n_k - k)}$$
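The F-statistic from this technical aside can be computed straight from its definition. The sketch below uses only the Python standard library; the function name `f_statistic` and the three small samples are made up for illustration and are not the study data. A stat package would then convert F into a p-value using the F-distribution.

```python
from statistics import mean, variance


def f_statistic(groups):
    """Return the one-way ANOVA F-statistic for a list of samples."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # n_1 + ... + n_k
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group variance: spread of the group means around the grand mean.
    s2_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)

    # Within-group variance: pooled sample variance inside the groups.
    s2_within = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)

    return s2_between / s2_within


# Hypothetical intensities (NOT the study data) for three groups.
normal = [0.45, 0.38, 0.42, 0.40]
bms = [0.36, 0.31, 0.35, 0.34]
spms = [0.28, 0.33, 0.29, 0.30]

f_stat = f_statistic([normal, bms, spms])
print(f"F = {f_stat:.2f}")
```

A large F here reflects group means that are far apart relative to the scatter inside each group.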

Page 15

This is the distribution under the null

This small shaded region is the part of the distribution that is equal to or more extreme than the observed value.

The p-value!!!

Page 16

Now what

The question often becomes which groups are different
Possible comparisons
– All pairs
– All groups to a specific control
– Pre-specified comparisons
If we do many tests, we should account for multiple comparisons

Page 17

Type I error

Type I error is when you reject the null hypothesis even though it is true ($\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$)
We accept making this error 5% of the time
If we run a large experiment with 100 tests and the null hypothesis was true in each case, how many times would we expect to reject the null?

Page 18

Multiple comparisons

For this problem, three comparisons
– NC vs. BMS; NC vs. SPMS; BMS vs. SPMS
If we complete each test at the 0.05 level, what is the chance that we make a type I error?
– P(reject at least 1 | $H_0$ is true) = 1 - P(fail to reject all three | $H_0$ is true) = $1 - 0.95^3 = 0.143$
Inflated type I error rate
Can correct p-value for each test to maintain experiment type I error
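The arithmetic behind the inflated type I error rate is easy to check. This is a small sketch, with function names invented for illustration. Like the slide's calculation, it treats the tests as independent, which is only an approximation for pairwise comparisons that share groups.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one rejection) when every null is true and tests are independent."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_rejections(n_tests, alpha=0.05):
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * alpha


fwer_three = familywise_error_rate(3)          # the three pairwise comparisons
expected_100 = expected_false_rejections(100)  # the 100-test experiment

print(f"Chance of at least one type I error in 3 tests: {fwer_three:.3f}")
print(f"Expected false rejections in 100 tests: {expected_100:.0f}")
```

With 100 true nulls tested at the 0.05 level, about 5 rejections are expected by chance alone.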

Page 19

Bonferroni correction

The Bonferroni correction multiplies all p-values by the number of comparisons completed
– In our experiment, there were 3 comparisons, so we multiply by 3
– Any p-value that remains less than 0.05 is significant
The Bonferroni correction is conservative (it is more difficult to obtain a significant result than it should be), but it is an extremely easy way to account for multiple comparisons.
– Can be very harsh correction with many tests
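As a minimal sketch of the Bonferroni correction described on this slide: multiply each p-value by the number of comparisons and compare the result to 0.05. The raw p-values below are hypothetical, the function name is invented, and capping adjusted values at 1 is a common convention that the slide does not mention.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for three pairwise comparisons.
raw_p = [0.04, 0.004, 0.30]
adjusted = bonferroni(raw_p)
significant = [p < 0.05 for p in adjusted]

for raw, adj, sig in zip(raw_p, adjusted, significant):
    print(f"raw p = {raw:.3f}  adjusted p = {adj:.3f}  significant: {sig}")
```

The first comparison shows why the correction is called conservative: a raw p-value of 0.04 is no longer significant once multiplied by 3.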

Page 20

Other corrections

All pairwise comparisons
– Tukey's test
All groups to a control
– Dunnett's test
MANY others
False discovery rate

Page 21

Example

For our three-group comparison, we compare each pair of groups and get the following results from Tukey's test

Groups        Mean diff  p-value  Significant
NC vs. BMS    0.075      0.10
NC vs. SPMS   0.114      0.012    *
BMS vs. SPMS  0.039      0.60

Page 23

Questions to ask yourself

What is the null hypothesis?
We would like to test the null hypothesis at the 0.05 level
If well defined prior to the experiment, the correction for multiple comparisons, if necessary, will be clear
Hypothesis generating vs. hypothesis testing

Page 24

Conclusions

If you are doing a multiple group comparison, always specify before the experiment which comparisons are of interest if possible
If the null hypothesis is that all the groups are the same, test global null using ANOVA
Complete appropriate additional comparisons with corrections if necessary
No single right answer for every situation

Page 25

Types of analysis-independent samples

Outcome        Explanatory    Analysis
Continuous     Dichotomous    t-test, Wilcoxon test
Continuous     Categorical    ANOVA, linear regression
Continuous     Continuous     Correlation, linear regression
Dichotomous    Dichotomous    Chi-square test, logistic regression
Dichotomous    Continuous     Logistic regression
Time to event  Dichotomous    Log-rank test

Page 26

Correlation

Is there a linear relationship between IL-10 expression and IL-6 expression?
The best graphical display for this data is a scatter plot

Page 27

Correlation

Definition: the degree to which two continuous variables are linearly related
– Positive correlation- As one variable goes up, the other goes up (positive slope)
– Negative correlation- As one variable goes up, the other goes down (negative slope)
Correlation ($\rho$) ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation)
A correlation of 0 means that there is no linear relationship between the two variables

Page 28

[Figure: four scatter plots titled "Positive correlation", "Negative correlation", "No correlation", and "No correlation (quadratic)"]

Page 29

Hypothesis test

1) $H_0$: correlation between IL-10 expression and IL-6 expression = 0
2) Outcome variable: IL-6 expression- continuous
   Explanatory variable: IL-10 expression- continuous
3) Test: correlation
4) Summary statistic: correlation=0.51
5) Results: p=0.011
6) Reject null hypothesis
7) Conclusion: A statistically significant correlation was observed between the two variables

Page 31

Technical aside-correlation

The formal definition of the correlation is given by:

$$\mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}}$$

Note that this is a dimensionless quantity
This equation shows that if the covariance between the two variables is as large as $\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}$, we have perfect correlation because all of the variability in x and y is explained by how the two variables change together

Page 32

How can we estimate the correlation?

The most common estimator of the correlation is Pearson's correlation coefficient, given by:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

This is an estimate that requires both x and y to be normally distributed. Since we use the mean in the calculation, the estimate is sensitive to outliers.
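Pearson's correlation coefficient can be computed directly from the sums in its definition. The sketch below uses plain Python; the function name `pearson_r` and the data are hypothetical and chosen only to illustrate the perfect, negative, and quadratic cases from the earlier scatter plots.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient from centered sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, not taken from the IL-10/IL-6 study.
r_positive = pearson_r([1, 2, 3], [2, 4, 6])                    # perfect positive
r_negative = pearson_r([1, 2, 3], [3, 2, 1])                    # perfect negative
r_quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])     # y = x^2

print(r_positive, r_negative, r_quadratic)
```

The quadratic case has a correlation of 0 even though y is completely determined by x, which is why correlation only measures linear association.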

Page 33

Distribution of the test statistic

The standard error of the sample correlation coefficient is given by

$$\widehat{se}(r) = \sqrt{\frac{1 - r^2}{n - 2}}$$

The resulting distribution of the test statistic is a t-distribution with n-2 degrees of freedom where n is the number of patients (not the number of measurements)

$$t = \frac{r - 0}{\sqrt{(1 - r^2)/(n - 2)}} = r \sqrt{\frac{n - 2}{1 - r^2}}$$
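The test statistic for a correlation follows from the standard error in two lines. In the sketch below, r = 0.51 is the correlation reported in the earlier hypothesis test, while n = 24 is a hypothetical sample size chosen for illustration because the slides do not state one. The function name is also invented.

```python
from math import sqrt


def correlation_t(r, n):
    """Return (t statistic, degrees of freedom) for testing correlation = 0."""
    se_r = sqrt((1 - r ** 2) / (n - 2))   # standard error of r
    return (r - 0) / se_r, n - 2


# r comes from the slides; n = 24 is an ASSUMED sample size.
t_stat, df = correlation_t(0.51, 24)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A stat package would compare this t value against a t-distribution with n-2 degrees of freedom to get the p-value.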

Page 34

Regression-Everything in one place

All analyses we have done to this point can be completed using regression!!!

Page 35

Quick math review

As you remember, the equation of a line is y=mx+b
For every one unit increase in x, there is an m unit increase in y
b is the value of y when x is equal to zero

[Figure: plot of the line y = 1.5x + 4]

Page 36

Picture

Does there seem to be a linear relationship in the data?
Is the data perfectly linear?
Could we fit a line to this data?

[Figure: scatter plot of the data]

Page 37

How do we find the best line?

Linear regression tries to find the best line (curve) to fit the data
Let's look at three candidate lines
Which do you think is the best?
What is a way to determine the best line to use?

Page 38

What is linear regression?

The method of finding the best line (curve) is least squares, which minimizes the sum of the squared vertical distances from each of the points to the line
The equation of the line is y=1.5x + 4

[Figure: scatter plot of the data with the fitted line y = 1.5x + 4]
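To make the least squares idea concrete, the sketch below fits a line with the closed-form least squares solution and compares its sum of squared residuals with another candidate line. The data are hypothetical points scattered around y = 1.5x + 4, not the data in the figure, and the function names are invented for illustration.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances from the points to a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: y = 1.5x + 4 plus some made-up scatter.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scatter = [0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0.4, -0.4, 0.1, -0.1]
y = [1.5 * xi + 4 + e for xi, e in zip(x, scatter)]

b0, b1 = least_squares_line(x, y)
best_ssr = sum_squared_residuals(x, y, b0, b1)
candidate_ssr = sum_squared_residuals(x, y, 4.0, 2.0)  # a competing candidate line

print(f"fitted line: y = {b1:.2f}x + {b0:.2f}, SSR = {best_ssr:.2f}")
print(f"candidate y = 2x + 4, SSR = {candidate_ssr:.2f}")
```

No other straight line can have a smaller sum of squared residuals than the fitted one, which is what "best" means here.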

Page 39

Example

For our investigation of the relationship between IL-10 and IL-6, we can set up a regression equation

$$IL6_i = \beta_0 + \beta_1 \cdot IL10_i + \varepsilon_i$$

$\beta_0$ is the expression of IL-6 when IL-10=0 (intercept)
$\beta_1$ is the change in IL-6 for every 1 unit increase in IL-10 (slope)
$\varepsilon_i$ is the residual from the line

Page 40

The final regression equation is

$$\widehat{IL6} = 26.4 + 0.63 \cdot IL10$$

The coefficients mean
– 26.4 is the estimate of the mean expression of IL-6 for a patient with IL-10 expression=0 ($\beta_0$)
– an increase of one unit in IL-10 expression is associated with an estimated increase of 0.63 in the mean expression of IL-6 ($\beta_1$)
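The fitted equation can be read as a small prediction function. This is only a sketch: the coefficients 26.4 and 0.63 come from the slide, while the function name and the IL-10 values passed in are hypothetical.

```python
def predict_il6(il10, intercept=26.4, slope=0.63):
    """Estimated mean IL-6 expression for a given IL-10 expression."""
    return intercept + slope * il10


at_zero = predict_il6(0)                           # the intercept
per_unit = predict_il6(11) - predict_il6(10)       # the slope

print(f"Estimated mean IL-6 when IL-10 = 0: {at_zero}")
print(f"Change in estimated mean IL-6 per unit of IL-10: {per_unit:.2f}")
```

The two printed values are the intercept and the slope, which matches the interpretation of the coefficients given on the slide.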

Page 41

Tough question

In our correlation hypothesis test, we wanted to know if there was an association between the two measures
If there was no relationship between IL-10 and IL-6 in our system, what would happen to our regression equation?
– No effect means that the change in IL-6 is not related to the change in IL-10
– $\beta_1 = 0$
Is $\beta_1$ significantly different than zero?

Page 42

Hypothesis test

1) $H_0$: no relationship between IL-6 expression and IL-10 expression, $\beta_1 = 0$
2) Outcome variable: IL-6- continuous
   Explanatory variable: IL-10- continuous
3) Test: linear regression
4) Summary statistic: $\hat{\beta}_1 = 0.63$
5) Results: p=0.011
6) Reject null hypothesis
7) Conclusion: A significant correlation was observed between the two variables

Page 44

Wait a second!!

Let's check something
– p-value from correlation analysis = 0.011
– p-value from regression analysis = 0.011
– They are the same!!
Regression leads to same conclusion as correlation analysis
Other similarities as well from models

Page 45

Technical aside-Estimates of regression coefficients

Once we have solved the least squares equation, we obtain estimates for the $\beta$'s, which we refer to as $\hat{\beta}_0, \hat{\beta}_1$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

To test if this estimate is significantly different than 0, we use the following equation:

$$t = \frac{\hat{\beta}_1}{\widehat{se}(\hat{\beta}_1)}$$
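The matching p-values from correlation and regression are not a coincidence: the two t statistics are algebraically identical. The sketch below checks this numerically on hypothetical data. The standard error of the slope is computed as the square root of (residual sum of squares / (n-2)) / Sxx, a standard result that is assumed here because the slides do not spell it out. All names and data are invented for illustration.

```python
from math import sqrt


def centered_sums(x, y):
    """Return n, x-bar, y-bar and the centered sums Sxx, Syy, Sxy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return n, x_bar, y_bar, sxx, syy, sxy


def slope_t(x, y):
    """t statistic for the regression slope: estimate / standard error."""
    n, x_bar, y_bar, sxx, _, sxy = centered_sums(x, y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(rss / (n - 2) / sxx)   # assumed standard formula
    return b1 / se_b1


def correlation_t(x, y):
    """t statistic for testing that the correlation is zero."""
    n, _, _, sxx, syy, sxy = centered_sums(x, y)
    r = sxy / sqrt(sxx * syy)
    return r * sqrt((n - 2) / (1 - r ** 2))


# Hypothetical IL-10 and IL-6 values, not the study data.
il10 = [2, 4, 5, 7, 8, 10, 11, 13]
il6 = [27, 30, 28, 33, 31, 35, 32, 36]

t_regression = slope_t(il10, il6)
t_correlation = correlation_t(il10, il6)
print(f"regression t = {t_regression:.4f}, correlation t = {t_correlation:.4f}")
```

Both tests use n-2 degrees of freedom, so equal t statistics give equal p-values.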

Page 46

Assumptions of linear regression

Linearity
– Linear relationship between outcome and predictors
– $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2^2$ is still a linear regression equation because each of the $\beta$'s is to the first power
Normality of the residuals
– The residuals, $\varepsilon_i$, are normally distributed, $N(0, \sigma^2)$
Homoscedasticity of the residuals
– The residuals, $\varepsilon_i$, have the same variance
Independence
– All of the data points are independent
– Correlated data points can be taken into account using multivariate and longitudinal data methods

Page 47

Linear regression with dichotomous predictor

Linear regression can also be used for dichotomous predictors, like sex
Last class we compared relapsing MS patients to progressive MS patients
To do this, we use an indicator variable, which equals 1 for relapsing and 0 for progressive. The resulting regression equation for expression is

$$ex_i = \beta_0 + \beta_1 \cdot R_i + \varepsilon_i$$

Page 48

Interpretation of model

The meanings of the coefficients in this case are
– $\beta_0$ is the estimate of the mean expression when R=0, in the progressive group
– $\beta_0 + \beta_1$ is the estimate of the mean expression when R=1, in the relapsing group
– $\beta_1$ is the estimate of the mean increase in expression between the two groups
The difference between the two groups is $\beta_1$
If there was no difference between the groups, what would $\beta_1$ equal?

Page 49

Mean in wildtype = $\beta_0$

Mean in Progressive group = $\beta_0$

Difference between groups = $\beta_1$

Page 50:

Hypothesis test

1) Null hypothesis: $\text{mean}_{\text{progressive}} = \text{mean}_{\text{relapsing}}$ ($\beta_1 = 0$)
2) Explanatory: group membership - dichotomous
   Outcome: cytokine production - continuous
3) Test: Linear regression
4) $\hat{\beta}_1 = 6.87$
5) p-value=0.199
6) Fail to reject null hypothesis
7) Conclusion: The difference between the groups is not statistically significant

Page 51:

T-test

As hopefully you remember, you could have tested this same null hypothesis using a two sample t-test

Very similar result to previous class

If we had assumed equal variance for our t-test, we would have gotten the same result!!!

ANOVA hypotheses can also be tested using regression with more than one indicator
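The equivalence between the equal-variance t-test and regression on an indicator can be checked numerically. The following is a minimal sketch in plain Python, again using made-up expression values rather than the lecture data; the function names are hypothetical. It computes the pooled two-sample t statistic and the t statistic for the regression slope, which should agree up to rounding error.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group0, group1):
    """Two-sample t statistic assuming equal variance in the two groups."""
    n0, n1 = len(group0), len(group1)
    pooled_var = ((n0 - 1) * variance(group0)
                  + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    se = sqrt(pooled_var * (1 / n0 + 1 / n1))
    return (mean(group1) - mean(group0)) / se


def regression_slope_t_statistic(outcome, indicator):
    """t statistic for b1 in the least-squares fit outcome = b0 + b1 * indicator."""
    n = len(outcome)
    x_bar, y_bar = mean(indicator), mean(outcome)
    sxx = sum((x - x_bar) ** 2 for x in indicator)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(indicator, outcome))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares, then the usual standard error of the slope.
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(indicator, outcome))
    se_b1 = sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


# Hypothetical expression values, NOT the data from the lecture.
progressive = [10.0, 12.0, 14.0]        # indicator = 0
relapsing = [15.0, 19.0, 20.0, 22.0]    # indicator = 1

t_ttest = pooled_t_statistic(progressive, relapsing)
t_regression = regression_slope_t_statistic(
    progressive + relapsing,
    [0] * len(progressive) + [1] * len(relapsing),
)
```

Because both statistics have the same value and the same degrees of freedom (n - 2), they lead to the same p-value. A t-test that does not assume equal variance would generally give a slightly different result.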

Page 52:

Multiple regression

A large advantage of regression is the ability to include multiple predictors of an outcome in one analysis

A multiple regression equation looks just like a simple regression equation.

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + e$

Page 53:

Example

Brain parenchymal fraction (BPF) is a measure of disease severity in MS

We would like to know if gender has an effect on BPF in MS patients

We also know that BPF declines with age in MS patients

Is there an effect of sex on BPF if we control for age?

Page 54:

[Figure: BPF (y-axis, .75 to .95) plotted against sex (x-axis, 0 to 1). Blue=males; Red=females]

Page 55:

[Figure: BPF (y-axis, .75 to .95) plotted against age (x-axis, 20 to 60). Blue=males; Red=females]

Page 56:

Is age a potential confounder?

We know that age has an effect on BPF from previous research

We also know that male patients have a different disease course than female patients so the age at time of sampling may also be related to sex

[Diagram relating Sex, BPF, and Age]

Page 57:

Model

The multiple linear regression model includes a term for both age and sex

What are the values $\text{gender}_i$ takes on?
– $\text{gender}_i = 0$ if the patient is female
– $\text{gender}_i = 1$ if the patient is male

$\text{BPF}_i = \beta_0 + \beta_1 \cdot \text{gender}_i + \beta_2 \cdot \text{age}_i + \varepsilon_i$
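To make the model concrete, here is a minimal sketch of fitting a regression with a gender indicator and age by least squares, using the normal equations in plain Python. Everything in it is an assumption for illustration: the data are generated without noise from made-up coefficients (0.94, 0.02, -0.003), and the helper names `solve_linear_system` and `fit_multiple_regression` are hypothetical. In practice a statistics package would be used instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_regression(y, predictors):
    """Least squares for y = b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data (NOT the lecture data).
gender = [0, 0, 0, 1, 1, 1]          # 0 = female, 1 = male
age = [25, 40, 55, 30, 45, 60]
bpf = [0.94 + 0.02 * g - 0.003 * a for g, a in zip(gender, age)]

b0, b1, b2 = fit_multiple_regression(bpf, [gender, age])
```

Since the made-up data contain no error term, the fit recovers the coefficients used to generate them; with real data the estimates would also carry standard errors and p-values.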

Page 58:

Expression

Females:
– $\text{BPF}_i = \beta_0 + \beta_2 \cdot \text{age}_i + \varepsilon_i$

Males:
– $\text{BPF}_i = (\beta_0 + \beta_1) + \beta_2 \cdot \text{age}_i + \varepsilon_i$

What is different about the equations?
– Intercept

What is the same?
– Slope

This model allows an effect of gender on the intercept, but not on the change with age

Page 59:

Interpretation of coefficients

The meaning of each coefficient
– $\beta_0$: the average BPF when age is 0 and the patient is female
– $\beta_1$: the average difference in BPF between males and females, HOLDING AGE CONSTANT
– $\beta_2$: the average change in BPF for a one unit increase in age, HOLDING GENDER CONSTANT

Note that the interpretation of each coefficient requires mention of the other variables in the model

Page 60:

Estimated coefficients

Here is the estimated regression equation

$\widehat{\text{BPF}}_i = 0.942 + 0.017 \cdot \text{sex}_i - 0.0026 \cdot \text{age}_i$

The average difference between males and females is 0.017 holding age constant

For every one unit increase in age, the mean BPF decreases 0.0026 units holding sex constant

Is either of these effects statistically significant?
– What is the null hypothesis?
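The estimated coefficients on this slide (intercept 0.942, sex effect 0.017, age effect -0.0026) can be turned into predicted BPF values. The sketch below assumes that sex is coded like the gender indicator in the model (1 for male, 0 for female); the function name `predicted_bpf` and the example ages are hypothetical.

```python
def predicted_bpf(sex, age):
    """Predicted mean BPF from the estimated equation.

    Assumes sex = 1 for male and 0 for female, matching the gender
    indicator defined for the model.
    """
    return 0.942 + 0.017 * sex - 0.0026 * age


# Difference between a male and a female of the same (arbitrary) age.
sex_difference_at_40 = predicted_bpf(1, 40) - predicted_bpf(0, 40)
sex_difference_at_55 = predicted_bpf(1, 55) - predicted_bpf(0, 55)

# Change in predicted BPF for a 10-year increase in age, same sex.
ten_year_change = predicted_bpf(0, 50) - predicted_bpf(0, 40)
```

The sex difference is the same at every age (0.017) because the model has no interaction term, and a 10-year increase in age lowers the predicted BPF by 10 times 0.0026 regardless of sex.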

Page 61:

Hypothesis test

1) $H_0$: No effect of sex, controlling for age ($\beta_1 = 0$)
2) Continuous outcome, dichotomous predictor
3) Linear regression controlling for age
4) Summary statistic: $\hat{\beta}_1 = 0.017$
5) p-value=0.37
6) Since the p-value is more than 0.05, we fail to reject the null hypothesis
7) We conclude that there is no significant association between sex and BPF controlling for age

Page 62:

Hypothesis test

1) $H_0$: No effect of age, controlling for sex ($\beta_2 = 0$)
2) Continuous outcome, continuous predictor
3) Linear regression controlling for sex
4) Summary statistic: $\hat{\beta}_2 = -0.0026$
5) p-value=0.004
6) Since the p-value is less than 0.05, we reject the null hypothesis
7) We conclude that there is a significant association between age and BPF controlling for sex

Page 63:

[Figure: regression output annotated with "Estimated effect of age", "p-value for age", "Estimated effect of sex", and "p-value for sex"]

Page 64:

[Figure: BPF (y-axis, .75 to .95) plotted against age (x-axis, 20 to 60)]

Page 65:

Conclusions

Although there was a marginally significant association of sex and BPF, this association was not significant after controlling for age

The significant association between age and BPF remained statistically significant after controlling for sex

Page 66:

What we learned (hopefully)

ANOVA

Correlation

Basics of regression

