Lecture 6: Introduction to Linear Regression
Ani Manichaikul
24 April 2007

Linear regression: main idea

Linear regression can be used to study an outcome as a linear function of a predictor.
Example: 60 cities in the US were evaluated for numerous characteristics, including:
  the percentage of the population that was “disadvantaged”
  median education level

Binary education variable

[Figure: % of population with income < $3000 (y-axis, 10 to 30) by education group (x-axis: Low Education, High Education)]

Linear regression vs. ANOVA

These means could be compared by a t-test or ANOVA:
  Mean in low education group: 15.7%
  Mean in high education group: 13.2%
Regression provides a unified equation:
\[ \hat{Y}_i = \beta_0 + \beta_1 X_i \]
\[ \hat{Y}_i = 15.7 - 2.5 X_i \]
where \(X_i\) = 1 for high education and 0 for low education (X is a “dummy variable” or “indicator variable” that designates group)

Interpreting the model

\[ \hat{Y}_i = \beta_0 + \beta_1 X_i \]
\[ \hat{Y}_i = 15.7 - 2.5 X_i \]
\(\hat{Y}_i\) is the predicted mean of the outcome for \(X_i\), that observation’s value for X.
\(X_i = 0\) (Low education):
\[ \hat{Y}_i = 15.7 - 2.5(0) = 15.7 = \beta_0 \]
\(X_i = 1\) (High education):
\[ \hat{Y}_i = 15.7 - 2.5(1) = 13.2 = \beta_0 + \beta_1 \]
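The link between a dummy-variable regression and the two group means can be checked numerically. The sketch below uses six made-up cities (hypothetical values, not the lecture's data) and a small helper, `fit_simple_ols`, which is an illustrative name introduced here. It shows that the fitted intercept equals the mean of the group coded 0 and the fitted slope equals the difference in group means.

```python
# Hypothetical data for illustration only (not the 60-city dataset).
x = [0, 0, 0, 1, 1, 1]                      # 0 = low education, 1 = high education
y = [16.0, 15.0, 17.0, 13.0, 14.0, 12.0]    # % of population "disadvantaged"


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


b0, b1 = fit_simple_ols(x, y)
mean_low = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
mean_high = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)

print(b0, mean_low)               # intercept vs. mean of the reference group
print(b1, mean_high - mean_low)   # slope vs. difference in group means
```

With the lecture's group means (15.7% and 13.2%) the same logic gives an intercept of 15.7 and a slope of -2.5.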

Interpretation

\(\beta_0\) is the mean outcome for the reference group, or the group for which \(X_i = 0\).
Here, \(\beta_0\) is the average percent of the population that is disadvantaged for cities with low education.

Interpretation

\(\beta_1\) is the difference in the mean outcome between the two groups (when \(X_i = 1\) vs. when \(X_i = 0\)).
Here, \(\beta_1\) is the difference in the average percent of the population that is disadvantaged for cities with high education compared to cities with low education.

Why use linear regression?

Linear regression is very powerful. It can be used for many things:
  Binary X
  Continuous X
  Categorical X
  Adjustment for confounding
  Interaction
  Curved relationships between X and Y

Regression Analysis

A regression is a description of a response measure, Y, the dependent variable, as a function of an explanatory variable, X, the independent variable.
Goal: prediction or estimation of the value of one variable, Y, based on the value of the other variable, X.

Regression Analysis

A simple relationship between the two variables is a linear relationship (straight line relationship)
Other names: linear, simple linear, least squares regression

Galton’s Example

1000 records of heights of family groups
Really tall fathers tend on average to have tall sons, but not quite as tall as the really tall fathers
There is a “regression” of a son’s height toward the average height for sons

Galton’s Example

[Figure: Regression of Son's Stature on Father's, with fitted line E(Y) = 33.73 + 0.516*X; x-axis: Father's Height (inches), 60 to 74; y-axis: Son's Height, 64 to 74]

Regression Analysis: Population Model

Probability Model: independent responses \(y_1, y_2, \ldots, y_n\) are sampled from
\[ Y_i \sim N(\mu_i, \sigma^2) \]
Systematic Model: \(\mu_i = E(y_i \mid x_i) = \beta_0 + \beta_1 x_i\)
where: \(\beta_0\) = intercept
       \(\beta_1\) = slope

Another way to write the model

Systematic: \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\)
Probability: \(\varepsilon_i \sim N(0, \sigma^2)\)
The response, \(Y_i\), is a linear function of \(X_i\) plus some random, normally distributed error, \(\varepsilon_i\)
Data = Signal + noise


Geometric Interpretation

Model

1) \(Y_i \sim N(\mu_i, \sigma^2)\)
2) \(\mu_i = E(y_i \mid x_i) = \beta_0 + \beta_1 x_i\)
OR
1) \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\)
2) \(\varepsilon_i \sim N(0, \sigma^2)\)
where: \(\beta_0\) = intercept
       \(\beta_1\) = slope
The response, \(Y_i\), is a linear function of \(X_i\) plus some random, normally distributed error, \(\varepsilon_i\)

Interpretation of Coefficients

Mean Model: \(\mu = E(y \mid x) = \beta_0 + \beta_1 x\)
\(\beta_0\) = expected response when X = 0
  Since: \(E(y \mid x=0) = \beta_0 + \beta_1(0) = \beta_0\)
\(\beta_1\) = change in expected response per 1 unit increase in X
  Since: \(E(y \mid x+1) = \beta_0 + \beta_1(x+1)\)
  And: \(E(y \mid x) = \beta_0 + \beta_1 x\)
  \(\Delta E(y)\) from x to x+1 \(= [\beta_0 + \beta_1(x+1)] - [\beta_0 + \beta_1 x] = \beta_1\)

From Galton’s Example

\(E(Y \mid x) = \beta_0 + \beta_1 x\)
\(E(Y \mid x) = 33.7 + 0.52x\)
where: Y = son’s height (inches)
       x = father’s height (inches)
Expected son’s height = 33.7 inches when father’s height is 0 inches
Expected difference in heights for sons whose fathers’ heights differ by one inch = 0.52 inches
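As a quick illustration of reading predictions off this fitted line, the sketch below plugs a few father heights into E(Y|x) = 33.7 + 0.52x. The function name `expected_son_height` and the chosen heights are assumptions made for the example; only the two coefficients come from the slide.

```python
def expected_son_height(father_height_in):
    """Expected son's height (inches) from the slide's rounded line 33.7 + 0.52x."""
    return 33.7 + 0.52 * father_height_in


# Illustrative father heights (inches); these values are chosen for the example.
for father in (64, 68, 72):
    print(father, round(expected_son_height(father), 2))
```

A 72-inch father has an expected son's height of about 71.1 inches, which is the "regression" toward the average that Galton described.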

City/Education Example

[Figure: % of population with income < $3000 (y-axis, 10 to 30) versus median education (x-axis, 9 to 13)]

Model

\[ \hat{Y}_i = \beta_0 + \beta_1 X_i \]
\[ \hat{Y}_i = 36.2 - 2.0 X_i \]
where \(X_i\) = the median education level in city i
when \(X_i = 0\):
\[ \hat{Y}_i = 36.2 - 2.0(0) = 36.2 = \beta_0 \]
when \(X_i = 1\):
\[ \hat{Y}_i = 36.2 - 2.0(1) = 34.2 = \beta_0 + \beta_1 \]
when \(X_i = 2\):
\[ \hat{Y}_i = 36.2 - 2.0(2) = 32.2 = \beta_0 + 2\beta_1 \]

Interpretation

\(\beta_0\) is the mean outcome for the reference group, or the group for which \(X_i = 0\).
Here, \(\beta_0\) is the average percent of the population that is disadvantaged for cities with median education level of 0.

Interpretation

\(\beta_1\) is the difference in the mean outcome for a one unit change in X.
Here, \(\beta_1\) is the difference in the average percent of the population that is disadvantaged between two cities, when the first city has a median education level one year higher than the second city.

Finding \(\beta\)’s from the graph

\(\beta_0\) is the Y-intercept of the line, or the average value of Y when X=0.
\(\beta_1\) is the slope of the line, or the average change in Y per unit change in X.
In the form y = mx + b: b = \(\beta_0\), m = \(\beta_1\)
For any two points \((x_1, y_1)\) and \((x_2, y_2)\) on the line:
\[ \hat{\beta}_1 = \frac{y_1 - y_2}{x_1 - x_2} \]
Notation:
  \(\beta_1\) represents the true slope (in the population)
  \(b_1\) and \(\hat{\beta}_1\) are sample estimates of the slope

Where is our intercept?

[Figure: % of population with income < $3000 (y-axis, 10 to 60) versus median education (x-axis, 0 to 14)]

Centering

\(\beta_0\) makes no sense here: a median education level of 0 is far outside the range of the data.
We can change X to fix this problem by a process called centering:
1. Pick a value of X (c) within the range of the data
2. For each observation, generate X_centered = \(X_i - c\)
3. Redo the regression with X_centered
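The three centering steps can be sketched in a few lines. The data below are hypothetical (five made-up cities), and `fit_simple_ols` is an illustrative helper rather than anything from the lecture. The point is that refitting on the centered X leaves the slope unchanged, and that the new intercept is the old line evaluated at X = c.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data for illustration only.
x = [9.0, 10.0, 11.0, 12.0, 13.0]     # median education (years)
y = [18.5, 16.0, 14.5, 12.0, 10.5]    # % of population "disadvantaged"

c = 12                                 # step 1: pick c within the range of X
x_centered = [xi - c for xi in x]      # step 2: X_centered = X_i - c
b0, b1 = fit_simple_ols(x, y)
b0_c, b1_c = fit_simple_ols(x_centered, y)   # step 3: redo the regression

print(b1, b1_c)              # the slope is the same in both fits
print(b0_c, b0 + b1 * c)     # centered intercept = original line at X = c
```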

We’ll use c=12, a high school degree

[Figure: % of population with income < $3000 (y-axis, 10 to 30) versus median education (x-axis, 9 to 13)]

New equation

\[ \hat{Y}_i = \beta_0 + \beta_1 (X_i - 12) \]
\[ \hat{Y}_i = 12.2 - 2.0 (X_i - 12) \]
\(\beta_1\) has not changed
\(\beta_0\) now corresponds to X=12, not X=0
Note: with X=0, we have
\[ \hat{Y}_i = 12.2 - 2.0(0 - 12) = 12.2 + 24 = 36.2 \]

Interpretation

\(\beta_0\) is the mean outcome for the reference group, or the group for which \(X_i - 12 = 0\), or when \(X_i = 12\).
Here, \(\beta_0\) (12.2%) is the average percent of the population that is disadvantaged for cities with a median education level of 12, the equivalent of a high school degree.
The interpretation of \(\beta_1\) has not changed.

Centering in Galton Example

Make 6 feet (72 inch) fathers the ‘reference group’
Create a new X variable, X*, by subtracting 72 from our old X variable: X* = X - 72
Then: \(E(Y \mid x^*) = \beta_0 + \beta_1 x^* = \beta_0 + \beta_1 (x - 72)\)
So, \(\beta_0\) = expected response when X = 72, since \(E(Y \mid x=72) = \beta_0 + \beta_1(72 - 72) = \beta_0\)
Center X’s whenever interpretations call for it!

Population Comparisons

\(\beta_0\): changes depending on centering of X, which doesn’t affect association of interest
Real concern: is X associated with Y?
Assess by testing \(\beta_1\): does \(\beta_1 = 0\) in the population from which this sample was drawn?
  Hypothesis testing
  Confidence interval

Hypothesis testing

\(H_0\): \(\beta_1 = 0\)
Test statistic:
\[ t_{obs} = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} \]
df = n-k-1
n = number of observations
k = number of predictors (X’s)

Hypothesis testing for education example

\(H_0\): \(\beta_1 = 0\)
Test statistic:
\[ t_{obs} = \frac{-2.0 - 0}{0.59} = -3.36 \]
df = n-k-1 = 60-1-1 = 58
n = number of observations = 60
k = number of predictors (X’s) = 1
p < 2*(1-0.995), so p < 0.01

Interpretation and conclusion

If there were no association between median education and percentage of disadvantaged citizens in the population, there would be less than a 1% chance of observing data as or more extreme than ours.
The null probability is very small, so:
  reject the null hypothesis
  conclude that median education level and percentage of disadvantaged citizens are associated in the population

Confidence Interval

No need to specify a hypothesis:
\[ \hat{\beta}_1 \pm t_{cr} \cdot SE(\hat{\beta}_1) = -2.0 \pm 2.021 \times 0.59 = (-3.2, -0.8) \]

Interpretation and conclusion

We are 95% confident that the true population decrease in percentage of disadvantaged citizens per additional year of median education is between 0.8 and 3.2 percentage points.
Since this interval does not contain 0, we believe percentage of disadvantaged citizens and median education are associated among cities in the United States.
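The test statistic and confidence interval for the education example can be reproduced from the slides' rounded values (slope -2.0, standard error 0.59, critical value 2.021). The sketch below is plain arithmetic and the variable names are illustrative. With these rounded inputs the t statistic comes out near -3.39 rather than the slides' 3.36 in magnitude; the difference presumably reflects unrounded estimates, which is an assumption.

```python
beta1_hat = -2.0    # estimated slope (rounded, from the slides)
se_beta1 = 0.59     # standard error of the slope (from the slides)
t_crit = 2.021      # critical t value used in the slides' interval

# Test statistic for H0: beta1 = 0
t_obs = (beta1_hat - 0) / se_beta1

# 95% confidence interval: estimate +/- critical value * standard error
ci_low = beta1_hat - t_crit * se_beta1
ci_high = beta1_hat + t_crit * se_beta1

print(round(t_obs, 2))
print(round(ci_low, 1), round(ci_high, 1))
```

Because |t_obs| exceeds the critical value, the interval excludes 0, so the test and the interval lead to the same conclusion.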

So far…

Linear regression is used for continuous outcome variables
\(\beta_0\): mean outcome when X=0
Binary X = “dummy variable” for group
  \(\beta_1\): mean difference in outcome between groups
Continuous X
  \(\beta_1\): mean difference in outcome corresponding to a 1-unit increase in X
  Center X to give meaning to \(\beta_0\)
Test \(\beta_1 = 0\) in the population

Linear Regression: Multiple covariates and confounding

Dataset

Hourly wage information from 9,918 workers, along with information regarding age, gender, years of experience, etc.
We’ll focus on predicting hourly wage with available information.

Regression: Hourly wage vs. Years of experience

[Figure: Hourly Wage (y-axis, 0 to 50) versus Years of Experience (x-axis, 0 to 60)]

What are the parameters?

For each person, their actual hourly wage (\(Y_i\)) and predicted hourly wage (\(\hat{Y}_i\)) are known.
\[ \hat{\varepsilon}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i) \]
is the residual or error
The parameters are found by minimizing the sum of the squared error:
\[ \min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \]
The parameters are the “least squares” estimates
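To make the least-squares idea concrete, the sketch below computes the closed-form estimates for a single predictor, then checks that nudging either estimate increases the sum of squared errors. The data are hypothetical and the function names `least_squares` and `sse` are illustrative; the slides do not say how the estimates were computed in practice.

```python
def sse(b0, b1, x, y):
    """Sum of squared errors for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def least_squares(x, y):
    """Closed-form least-squares estimates (intercept, slope) for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data for illustration only.
x = [0, 5, 10, 20, 30]             # years of experience
y = [8.0, 9.5, 8.5, 10.0, 9.0]     # hourly wage

b0, b1 = least_squares(x, y)
best = sse(b0, b1, x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(round(b0, 3), round(b1, 3))
print(best <= sse(b0 + 0.1, b1, x, y))    # moving the intercept increases SSE
print(best <= sse(b0, b1 + 0.01, x, y))   # moving the slope increases SSE
```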

Notes

The regression line equation:
\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]
\(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\) for any known point on the line
\(Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i\) is always true

Model 1

Model 1: Predict income by years of experience
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i = 8.38 + 0.04 X_i \]
\(\hat{\beta}_0 = 8.38\), so the average hourly wage for someone with no experience at all is about $8.40.
\(\hat{\beta}_1 = 0.04\), so for every additional year of experience, the predicted hourly wage increases about 4 cents.
For 10 years of additional experience, the predicted hourly wage increases about 40 cents.

Should we center X?

0 years of experience is within the range of the data
The average hourly wage corresponding to 0 years of experience makes sense
No need to center X

What happens if we also consider gender? (Model 2)

[Figure: Hourly Wage (y-axis, 0 to 50) versus Years of Experience (x-axis, 0 to 60); legend: Men's hourly wage, Women's hourly wage, fit2_men, fit2_women]

Model 2: Gender effect, no experience

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Gender}_i) \]
\[ \hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i) \]
(Gender = 0 for men and 1 for women, as the calculations below imply.)
For a man with no experience:
\[ \hat{Y}_i = 9.27 + 0.04(0) - 2.20(0) = \$9.27 = \hat{\beta}_0 \]
For a woman with no experience:
\[ \hat{Y}_i = 9.27 + 0.04(0) - 2.20(1) = \$7.07 = \hat{\beta}_0 + \hat{\beta}_2 \]

Model 2: Gender effect, 10 years experience

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Gender}_i) \]
\[ \hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i) \]
For a man with 10 years of experience:
\[ \hat{Y}_i = 9.27 + 0.04(10) - 2.20(0) = \$9.67 = \hat{\beta}_0 + \hat{\beta}_1(10) \]
For a woman with 10 years of experience:
\[ \hat{Y}_i = 9.27 + 0.04(10) - 2.20(1) = \$7.47 = \hat{\beta}_0 + \hat{\beta}_1(10) + \hat{\beta}_2(1) \]

Model 2: Experience effect, males

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Gender}_i) \]
\[ \hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i) \]
For a man with no experience:
\[ \hat{Y}_i = 9.27 + 0.04(0) - 2.20(0) = \$9.27 = \hat{\beta}_0 \]
For a man with 10 years of experience:
\[ \hat{Y}_i = 9.27 + 0.04(10) - 2.20(0) = \$9.67 = \hat{\beta}_0 + \hat{\beta}_1(10) \]

Model 2: Experience effect, females

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Gender}_i) \]
\[ \hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i) \]
For a woman with no experience:
\[ \hat{Y}_i = 9.27 + 0.04(0) - 2.20(1) = \$7.07 = \hat{\beta}_0 + \hat{\beta}_2 \]
For a woman with 10 years of experience:
\[ \hat{Y}_i = 9.27 + 0.04(10) - 2.20(1) = \$7.47 = \hat{\beta}_0 + \hat{\beta}_1(10) + \hat{\beta}_2 \]

Interpretation: Model 2

\(\hat{\beta}_0 = 9.27\): the average hourly wage for a man with no experience at all is about $9.30.
\(\hat{\beta}_1 = 0.04\): for every additional year of experience, the predicted hourly wage increases about 4 cents for both men and women.
\(\hat{\beta}_2 = -2.20\): the expected hourly wage is $2.20 lower for women than it is for men at any experience level.
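The four Model 2 predictions worked out on the preceding slides can be reproduced with a short function. This is a minimal sketch: the name `predict_wage_model2` is illustrative, and the coding of gender as 0 for men and 1 for women is inferred from the slides' calculations.

```python
def predict_wage_model2(experience_years, is_female):
    """Predicted hourly wage from Model 2 (gender coded 0 = man, 1 = woman)."""
    return 9.27 + 0.04 * experience_years - 2.20 * is_female


for experience in (0, 10):
    man = predict_wage_model2(experience, 0)
    woman = predict_wage_model2(experience, 1)
    # The gap between the two parallel lines is the same at every experience level.
    print(experience, round(man, 2), round(woman, 2), round(man - woman, 2))
```

The constant gap of $2.20 is what "two parallel lines" means for a binary predictor added without an interaction term.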

Model 1 vs. Model 2

Model 1:
\[ \hat{Y}_i = 8.38 + 0.04(\text{Experience}_i) \]
Model 2:
\[ \hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i) \]
95% CI for \(\beta_1\) in Model 1: (0.001, 0.07), and \(\hat{\beta}_1\) from Model 2 is within this CI
Gender is not a confounder

What happens if we consider age, instead? (Model 3)

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40) \]
The relationship is harder to graph with two continuous predictors, since now the regression is in a 3-dimensional space.
Notice that age is centered at 40 years. Age ranged between 18 and 64 in this dataset.

Model 3: Age effect, no experience

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40) \]
\[ \hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40) \]
For a 40-year-old with no experience:
\[ \hat{Y}_i = 26.5 - 0.82(0) + 0.92(40 - 40) = \$26.50 = \hat{\beta}_0 \]
For a 41-year-old with no experience:
\[ \hat{Y}_i = 26.5 - 0.82(0) + 0.92(41 - 40) = \$27.42 = \hat{\beta}_0 + \hat{\beta}_2 \]

Model 3: Age effect, 10 years experience

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40) \]
\[ \hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40) \]
For a 40-year-old with 10 years of experience:
\[ \hat{Y}_i = 26.5 - 0.82(10) + 0.92(40 - 40) = \$18.30 = \hat{\beta}_0 + \hat{\beta}_1(10) \]
For a 41-year-old with 10 years of experience:
\[ \hat{Y}_i = 26.5 - 0.82(10) + 0.92(41 - 40) = \$19.22 = \hat{\beta}_0 + \hat{\beta}_1(10) + \hat{\beta}_2(1) \]

Model 3: Experience effect, 40 year old

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40) \]
\[ \hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40) \]
For a 40-year-old with no experience:
\[ \hat{Y}_i = 26.5 - 0.82(0) + 0.92(40 - 40) = \$26.50 = \hat{\beta}_0 \]
For a 40-year-old with 10 years of experience:
\[ \hat{Y}_i = 26.5 - 0.82(10) + 0.92(40 - 40) = \$18.30 = \hat{\beta}_0 + \hat{\beta}_1(10) \]

Model 3: Experience effect, 41 year old

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40) \]
\[ \hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40) \]
For a 41-year-old with no experience:
\[ \hat{Y}_i = 26.5 - 0.82(0) + 0.92(41 - 40) = \$27.42 = \hat{\beta}_0 + \hat{\beta}_2 \]
For a 41-year-old with 10 years of experience:
\[ \hat{Y}_i = 26.5 - 0.82(10) + 0.92(41 - 40) = \$19.22 = \hat{\beta}_0 + \hat{\beta}_1(10) + \hat{\beta}_2(1) \]

Interpretation: Model 3

\(\hat{\beta}_0 = 26.5\): the average hourly wage for a 40-year-old with no experience at all is about $26.50
\(\hat{\beta}_1 = -0.82\): for every additional year of experience, the predicted hourly wage decreases about 82 cents for two people of the same age (or “adjusting for age”)
\(\hat{\beta}_2 = 0.92\): for every additional year of age, the expected hourly wage increases about 92 cents for two people with the same amount of experience (or “adjusting for experience”)
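The Model 3 predictions can be checked the same way. In this sketch the function name `predict_wage_model3` is illustrative; the coefficients and the centering of age at 40 come from the slides. Holding age fixed isolates the experience effect, and holding experience fixed isolates the age effect.

```python
def predict_wage_model3(experience_years, age_years):
    """Predicted hourly wage from Model 3, with age centered at 40 years."""
    return 26.5 - 0.82 * experience_years + 0.92 * (age_years - 40)


# Same age, one more year of experience: the prediction drops by about 82 cents.
print(round(predict_wage_model3(1, 40) - predict_wage_model3(0, 40), 2))

# Same experience, one more year of age: the prediction rises by about 92 cents.
print(round(predict_wage_model3(10, 41) - predict_wage_model3(10, 40), 2))
```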

Model 1 vs. Model 3

Model 1:
\[ \hat{Y}_i = 8.38 + 0.04(\text{Experience}_i) \]
Model 3:
\[ \hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40) \]
95% CI for \(\beta_1\) in Model 1: (0.001, 0.07), and \(\hat{\beta}_1\) from Model 3 is outside this CI
Age is a confounder. When we adjust for age, the apparent effect of experience on wage changes.
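The comparison used in these slides, checking whether the adjusted slope falls inside the unadjusted model's 95% confidence interval, can be written as a one-line rule. The sketch below encodes only that rule of thumb from the lecture; the function name `slope_within_ci` is illustrative, and this is not a general-purpose test for confounding.

```python
def slope_within_ci(adjusted_slope, ci):
    """True if the adjusted slope lies inside the unadjusted model's CI."""
    low, high = ci
    return low <= adjusted_slope <= high


model1_ci = (0.001, 0.07)   # 95% CI for the experience slope in Model 1

gender_check = slope_within_ci(0.04, model1_ci)    # Model 2: adjusting for gender
age_check = slope_within_ci(-0.82, model1_ci)      # Model 3: adjusting for age

print(gender_check)   # inside the CI: gender is not treated as a confounder
print(age_check)      # outside the CI: age is treated as a confounder
```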

The Coefficient of Determination

\(R^2\) is the “coefficient of determination”
\(R^2\) measures the ability to predict Y using X
Variability explained by X is SSM = \(\sum_i (\hat{y}_i - \bar{y})^2\)
Total variability is SST = \(\sum_i (y_i - \bar{y})^2\)

The Coefficient of Determination

\(R^2\) is defined as
\[ R^2 = \frac{SSM}{SST} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} \]
Measures the proportion of total variability explained by the model

The Coefficient of Determination

\(R^2\) is the square of r, “Pearson’s correlation coefficient”
r is a rough way of evaluating the association between two continuous variables.
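For a single predictor, the definition R^2 = SSM/SST and the statement that R^2 is the square of Pearson's r can both be checked numerically. The sketch below uses hypothetical data and hand-written formulas; the variable names are illustrative.

```python
from math import sqrt

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares fit and fitted values
slope = sxy / sxx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

ssm = sum((yh - y_bar) ** 2 for yh in y_hat)   # variability explained by X
sst = syy                                      # total variability
r_squared = ssm / sst

r = sxy / sqrt(sxx * syy)                      # Pearson's correlation coefficient

print(round(r_squared, 4), round(r ** 2, 4))   # the two values agree
```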

So, what is \(R^2\)?

The coefficient of determination, \(R^2\), evaluates the entire model.
\(R^2\) shows the proportion of the total variation in Y that has been predicted by this model.
  Model 1: 0.0076; 0.8% of variation explained
  Model 2: 0.05; 5% of variation explained
  Model 3: 0.20; 20% of variation explained

What is the adjusted \(R^2\)?

In both models 2 and 3, the new predictor added a great deal to the model
  \(R^2\) increased a lot
  More importantly, both new predictors were statistically significant
\(R^2\) never goes down when a predictor is added, even an unhelpful one!
The adjusted \(R^2\) is adjusted for the number of X’s in the model, so it only goes up when helpful predictors are added.
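The slides do not give a formula for the adjusted R^2. The sketch below assumes the usual definition, 1 - (1 - R^2)(n - 1)/(n - k - 1), with n observations and k predictors, and applies it to the R^2 values reported for the three wage models with n = 9,918. The function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 under the usual definition (assumed; not stated in the slides)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 9918  # number of workers in the wage dataset

# (model name, reported R^2, number of predictors)
models = [("Model 1", 0.0076, 1), ("Model 2", 0.05, 2), ("Model 3", 0.20, 2)]

for name, r2, k in models:
    print(name, r2, round(adjusted_r2(r2, n, k), 4))
```

With a sample this large the penalty for one or two predictors is tiny, so the adjusted values sit just below the reported R^2.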

Summary

Regression by least squares
Interpreting regression coefficients
Adding a 2nd predictor to a model
  Binary X added: 2 parallel lines
  Continuous X added: 3-dimensional graph
  for both, new interpretation reflecting new model
Is the new X a confounder?
  Compare \(\beta_1\) across models

