
FPP 10 kind of Regression 1. Plan of attack Introduce regression model Correctly interpret intercept...

Date post: 18-Dec-2015
Upload: poppy-rich
Transcript
Page 1: FPP 10 kind of Regression 1. Plan of attack Introduce regression model Correctly interpret intercept and slope Prediction Pit falls to avoid 2.

FPP 10 kind of Regression

Page 2

Plan of attack

Introduce regression model
Correctly interpret intercept and slope
Prediction
Pitfalls to avoid

Page 3

Regression line

The correlation coefficient is a nice numerical summary of two quantitative variables.
It indicates the direction and strength of the association.
But does it quantify the association?
It would be of interest to do this for predictions and for understanding phenomena.

Page 4

Regression line

Correlation measures the direction and strength of the straight-line (linear) relationship between two quantitative variables.
If a scatter plot shows a linear relationship, we would like to summarize this overall pattern by drawing a line on the scatter plot.
This line represents a mathematical model. Later we will make the mathematical model a statistical one.

Page 5

Slope-intercept form review

Page 6

Regression line

Slope-intercept form notation: $y = mx + b$
Regression form notation: $\hat{y} = \alpha + \beta x$

Page 7

Regression

Price of Homes Based on Square Feet
Price = -90.2458 + 0.1598 SQFT
r = 0.8718945

Page 8

Which line is best?

Price = -90.2458 + 0.1598 SQFT (red)
Price = -300 + 0.3 SQFT (blue)
Price = 0 + 0.1 SQFT (green)

Page 9

Which model to use

Different people might draw different lines by eye on a scatterplot.
What are some ways we can determine which model (line) out of all the possible models (lines) is the “best” one?
What are some ways that we can numerically rank the different models (i.e., the different lines)?
This will come later in the course.

Page 10

Slope interpretation

$\hat{y} = \alpha + \beta x$

The slope, β, of a regression line is almost always important for interpreting the data.
The slope is a rate of change. It is the amount by which y-hat, the predicted mean of y, changes when x increases by 1.

Page 11

Slope interpretation

Price of Homes Based on Square Feet
Price = -90.2458 + 0.1598 SQFT
r = 0.8718945

For every 1 sqft increase in the size of a home, the house price increases on average by $159.80. (Price is evidently recorded in thousands of dollars, so a slope of 0.1598 corresponds to $159.80.)

Page 12

Intercept interpretation

$\hat{y} = \alpha + \beta x$

The intercept, α, of the regression line is the value of y-hat when x = 0. Although we need the value of the intercept to draw the line, it is statistically meaningful only when x can actually take values close to zero.

Page 13

Intercept interpretation

Price of Homes Based on Square Feet
Price = -90.2458 + 0.1598 SQFT
r = 0.8718945

If the size of a home were 0 sqft, the predicted average house price would be -$90,245.80.
This doesn’t make much sense here because x (sqft) doesn’t take on values close to zero.

Page 14

Prediction

Price of Homes Based on Square Feet
Price = -90.2458 + 0.1598 SQFT
r = 0.8718945

For a 3500 sqft home we would predict the selling price to be
price = -90.2458 + 0.1598 × 3500 = 469.0542 (in thousands of dollars), that is, about $469,054.20.
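The prediction above can be reproduced in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes that price is recorded in thousands of dollars, which the slides imply through their dollar figures but never state outright.

```python
# Fitted line from the slides: Price = -90.2458 + 0.1598 * SQFT.
# Assumption: price is in thousands of dollars (inferred from the slides'
# own dollar figures, not stated explicitly).
INTERCEPT = -90.2458  # alpha
SLOPE = 0.1598        # beta


def predict_price_thousands(sqft: float) -> float:
    """Predicted price, in thousands of dollars, for a home of `sqft` square feet."""
    return INTERCEPT + SLOPE * sqft


def to_dollars(price_thousands: float) -> float:
    """Convert a price in thousands of dollars to dollars."""
    return price_thousands * 1000.0


# Prediction for the 3500 sqft home used on the slide.
prediction = predict_price_thousands(3500)
print(f"Predicted price: ${to_dollars(prediction):,.2f}")

# The slope is the change in the prediction for one extra square foot.
per_sqft = predict_price_thousands(3501) - predict_price_thousands(3500)
print(f"Change per extra sqft: ${to_dollars(per_sqft):,.2f}")
```

The second print illustrates the slope interpretation: moving from 3500 to 3501 sqft changes the prediction by exactly the slope.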

Page 15

OECD data: Income and unemployment in the U.S.

What is the relationship between households’ disposable income and the nation’s unemployment rate?
Data from the U.S., 1980 to 1998 (data provided by the economics department at Duke)

Page 16

Disposable income vs unemployment rates

[Figure: Bivariate Fit of Real Household Disposable Income By Unemployment Rate. Scatter plot with linear fit; x-axis: Unemployment Rate (5 to 10); y-axis: Real Household Disposable Income (3,500,000 to 5,500,000).]

Page 17

Disposable income and unemployment rates regression output

[Figure: Bivariate Fit of Household Disposable Income By Unemployment Rate. Scatter plot with linear fit; x-axis: Unemployment Rate (5 to 10); y-axis: Household Disposable Income (1,000,000 to 7,000,000).]

Linear Fit
Household Disposable Income = 8266987.1 - 664053.26 Unemployment Rate

Summary of Fit
RSquare                       0.507648
RSquare Adj                   0.478687
Root Mean Square Error        920643.7
Mean of Response              3833103
Observations (or Sum Wgts)    19

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model        1    1.48566e13        1.4857e13      17.5282    0.0006
Error       17    1.44089e13        8.4758e11
C. Total    18    2.92656e13

Parameter Estimates
Term                 Estimate      Std Error    t Ratio    Prob>|t|
Intercept            8266987.1     1079905       7.66      <.0001
Unemployment Rate    -664053.3     158611.4     -4.19      0.0006
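The entries in this output are tied together by simple arithmetic, which the following sketch checks. It uses the standard relationships for a one-predictor regression (R² = model SS / total SS, F = model MS / error MS, t = estimate / standard error); the variable names are chosen here for illustration, and the numbers are copied from the output above.

```python
import math

# Values copied from the regression output on this slide.
ss_model, ss_error, ss_total = 1.48566e13, 1.44089e13, 2.92656e13
df_model, df_error = 1, 17
slope_estimate, slope_std_error = -664053.3, 158611.4

# R-square: share of the total sum of squares accounted for by the model.
r_square = ss_model / ss_total

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F ratio and root mean square error.
f_ratio = ms_model / ms_error
rmse = math.sqrt(ms_error)

# t ratio for the slope; with one predictor, t squared equals F.
t_ratio = slope_estimate / slope_std_error

print(f"RSquare  = {r_square:.6f}")
print(f"F Ratio  = {f_ratio:.4f}")
print(f"RMSE     = {rmse:.1f}")
print(f"t Ratio  = {t_ratio:.2f}")
print(f"t^2      = {t_ratio ** 2:.4f}")
```

Small differences from the printed output are rounding in the reported sums of squares.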

Page 18

Facts about regression

There is a close relationship between the correlation coefficient and the slope of a regression line:
They have the same sign.
They are proportional to each other.

$\beta = r \frac{\sigma_y}{\sigma_x}$ or $\beta = r \frac{SD_y}{SD_x}$

The intercept has no relationship with the correlation coefficient, but here is the formula:

$\alpha = \mu_y - \beta \mu_x$
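As a quick check of these two formulas, the sketch below computes the slope and intercept from r, the standard deviations, and the means, and compares them with a direct least-squares fit. The data are hypothetical, invented purely for illustration, and the helper functions are not from the slides.

```python
import math
from statistics import mean, stdev

# Hypothetical data, invented purely for illustration (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Slope and intercept from the formulas on the slide.
r = correlation(x, y)
beta = r * stdev(y) / stdev(x)
alpha = mean(y) - beta * mean(x)

# Direct least-squares fit for comparison.
ls_alpha, ls_beta = least_squares(x, y)

print(f"from r and SDs: alpha = {alpha:.4f}, beta = {beta:.4f}")
print(f"least squares:  alpha = {ls_alpha:.4f}, beta = {ls_beta:.4f}")
```

The two routes agree because the same SD convention is used for x and y, so the divisor cancels in the ratio.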

Page 19

Facts about regression

The distinction between explanatory and response variables is essential in regression.
If you have a slope computed using x as the explanatory variable and y as the response variable, you can’t “back solve” to get the slope and intercept for the regression model with x as the response and y as the explanatory variable.
If you want to predict x given y, then you must find the intercept and slope with y as the explanatory variable and x as the response.
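A small numerical illustration of why back-solving fails, using hypothetical data invented for this sketch: the slope from regressing x on y is not the reciprocal of the slope from regressing y on x. Instead, the product of the two slopes equals r squared, so they are reciprocals only when the points lie exactly on a line.

```python
import math
from statistics import mean

# Hypothetical data, invented purely for illustration (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]


def slope(explanatory, response):
    """Least-squares slope for predicting `response` from `explanatory`."""
    me, mr = mean(explanatory), mean(response)
    s_er = sum((e - me) * (v - mr) for e, v in zip(explanatory, response))
    s_ee = sum((e - me) ** 2 for e in explanatory)
    return s_er / s_ee


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


slope_y_on_x = slope(x, y)           # predict y from x
slope_x_on_y = slope(y, x)           # predict x from y
back_solved = 1.0 / slope_y_on_x     # what "back solving" would give
r = correlation(x, y)

print(f"slope of y on x:        {slope_y_on_x:.4f}")
print(f"slope of x on y:        {slope_x_on_y:.4f}")
print(f"back-solved 1 / slope:  {back_solved:.4f}")
print(f"product of slopes:      {slope_y_on_x * slope_x_on_y:.4f}")
print(f"r squared:              {r ** 2:.4f}")
```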

Page 20

Facts about regression

$R^2$ (the coefficient of determination) provides a one-number summary of how well the regression line fits the data.
$R^2$ is the proportion of the variation in the y’s explained by the regression line, often quoted as a percentage.
$R^2$ lies between 0 and 1.
Values near 1 indicate the regression predicts the y’s in the data set very closely.
Values near 0 indicate the regression does not predict the y’s in the data set very closely.

Page 21

Facts about regression

Example:
The correlation coefficient between sale price and square feet was r = 0.8718945.
Thus the coefficient of determination is $R^2 = (0.8719)^2 \approx 0.76$.
So 76% of the variability in sale price is explained by (taken into account by) the regression line with square feet.
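The sketch below squares the slides' correlation and then, on a small hypothetical data set invented for illustration, checks that r squared matches the "proportion of variation explained" definition, 1 - SSE/SST, for a least-squares line with one explanatory variable.

```python
import math
from statistics import mean

# Correlation reported on the slides for sale price and square feet.
r_homes = 0.8718945
print(f"R^2 for the home price example: {r_homes ** 2:.4f}")

# Hypothetical data, invented purely for illustration (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

# Least-squares line and correlation coefficient.
slope = sxy / sxx
intercept = my - slope * mx
r = sxy / math.sqrt(sxx * syy)

# R^2 as the proportion of variation explained: 1 - SSE / SST.
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
sst = syy
r_square = 1.0 - sse / sst

print(f"r squared:    {r ** 2:.4f}")
print(f"1 - SSE/SST:  {r_square:.4f}")
```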

Page 22

Does regression fit data well?

A regression line is reasonable if:
The association between the two variables is indeed linear.
The points are randomly scattered around the line.
The income/unemployment rate data are well described by a regression line.

Page 23

Regression of AIDS rates per 1000 people on GNP per capita

The line is too low for GDP values near zero and too high for big GDP values.
We shouldn’t use the line for predictions.

[Figure: Bivariate Fit of HIV/AIDS per 1000 By GDP/capita. Scatter plot with linear fit; x-axis: GDP/capita (0 to 30000); y-axis: HIV/AIDS per 1000 (0 to 200).]

Linear Fit
HIV/AIDS per 1000 = 34.840498 - 0.0013312 GDP/capita

Summary of Fit
RSquare                       0.073769
RSquare Adj                   0.051715
Root Mean Square Error        39.24748
Mean of Response              27.33318
Observations (or Sum Wgts)    44

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model        1    5152.577          5152.58        3.3450     0.0745
Error       42    64695.328         1540.36
C. Total    43    69847.905

Parameter Estimates
Term          Estimate     Std Error    t Ratio    Prob>|t|
Intercept     34.840498    7.201186      4.84      <.0001
GDP/capita    -0.001331    0.000728     -1.83      0.0745

Page 24

Changing the response variable

When the regression line fits the data badly, sometimes you can transform variables to obtain a better-fitting line.
With monetary variables, typically this can be accomplished by taking logarithms.

Page 25

Regression of log(AIDS) on log(GNP)

Much better fit.
Predict log(AIDS) from log(GNP). Exponentiate to estimate AIDS.

[Figure: Bivariate Fit of logaids By logGNP. Scatter plot with linear fit; x-axis: logGNP (6 to 10.5); y-axis: logaids (-1 to 6).]

Linear Fit
logaids = 8.8562593 - 0.8185802 logGNP

Summary of Fit
RSquare                       0.346571
RSquare Adj                   0.331013
Root Mean Square Error        1.213907
Mean of Response              2.312979
Observations (or Sum Wgts)    44

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    8.8562593    1.398379      6.33      <.0001
logGNP       -0.81858     0.173436     -4.72      <.0001

[Figure: residual plot for the linear fit; x-axis: logGNP (6 to 10.5); y-axis: Residual (-3 to 3).]
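The "predict on the log scale, then exponentiate" step can be sketched as follows. Two things here are assumptions rather than facts from the slides: that the logs are natural logs (the logGNP axis range of 6 to 10.5 is consistent with that), and the GNP value of 5000 used as an example input.

```python
import math

# Fitted line from the slide: logaids = 8.8562593 - 0.8185802 * logGNP.
INTERCEPT = 8.8562593
SLOPE = -0.8185802


def predict_log_aids(gnp_per_capita: float) -> float:
    """Predicted log(AIDS rate), assuming natural logs were used in the fit."""
    return INTERCEPT + SLOPE * math.log(gnp_per_capita)


def predict_aids_rate(gnp_per_capita: float) -> float:
    """Back-transform the log-scale prediction to the original scale."""
    return math.exp(predict_log_aids(gnp_per_capita))


# Hypothetical input value, chosen only to illustrate the calculation.
gnp = 5000.0
print(f"predicted log(AIDS):          {predict_log_aids(gnp):.4f}")
print(f"predicted AIDS rate per 1000: {predict_aids_rate(gnp):.2f}")
```

Because the slope is negative on the log-log scale, the back-transformed prediction falls as GNP rises, but by a constant percentage rather than a constant amount.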

Page 26

Birth and death rates in 74 countries

[Figure: two scatter plots of death (y-axis, 5 to 30) against birth (x-axis, 10 to 50).]

Page 27

Warnings about regression

Predicting y at values of x beyond the range of x in the data is called extrapolation.
This is risky, because we have no evidence to believe that the association between x and y remains linear for unseen x values.
Extrapolated predictions can be badly wrong.

Page 28

Extrapolation

Diamond price and carat
The explanatory variable is weight in carats and the response variable is price in dollars.
Predict the price of the Hope Diamond:

$\hat{y} = 48.88 + 2430.77(45.52) \approx 110697.53$ (about 110,697.53 dollars)
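One simple safeguard is to flag any prediction whose x value lies outside the range of the data used to fit the line. The sketch below does this for the diamond line. The observed carat range in it is hypothetical, since the slides do not give the actual range, and the function name is made up for illustration.

```python
# Fitted line from the slide: price = 48.88 + 2430.77 * carats.
INTERCEPT = 48.88
SLOPE = 2430.77

# Hypothetical range of carat sizes in the fitted data. The slides do not
# state the real range, so these bounds are placeholders.
OBSERVED_MIN_CARATS = 0.1
OBSERVED_MAX_CARATS = 1.0


def predict_price(carats: float) -> tuple[float, bool]:
    """Return (predicted price in dollars, True if this is an extrapolation)."""
    price = INTERCEPT + SLOPE * carats
    is_extrapolation = not (OBSERVED_MIN_CARATS <= carats <= OBSERVED_MAX_CARATS)
    return price, is_extrapolation


# The Hope Diamond weighs 45.52 carats, far outside any plausible data range.
price, extrapolated = predict_price(45.52)
print(f"predicted price: ${price:,.2f}")
if extrapolated:
    print("warning: carat size is outside the observed range; do not trust this value")
```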

Page 29

Extrapolation

The relationship between diamond carat and price doesn’t remain linear after a carat size of about 0.4.

Page 30

Extrapolation

The green line is a linear fit with only diamonds less than 0.4 carats.
The blue line is a linear fit with all carat sizes.
The red curve is a quadratic fit.

Page 31

Lurking variable

A variable not being considered could be driving the relationship.
In practice this is a difficult issue to tackle, especially when everything seems OK.

Page 32

Influential point

An outlier in either the X or Y direction which, if removed, would markedly change the value of the slope and y-intercept.
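To see how much one point can matter, the sketch below fits a least-squares line with and without a single outlying point. The data are hypothetical, invented purely for illustration: five points lying close to a line with slope 1, plus one point far out in the x direction.

```python
from statistics import mean


def least_squares(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data (not from the slides). The last point is the outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

# Fit once with every point, once with the outlying point removed.
intercept_all, slope_all = least_squares(x, y)
intercept_trim, slope_trim = least_squares(x[:-1], y[:-1])

print(f"with the point:    intercept = {intercept_all:.3f}, slope = {slope_all:.3f}")
print(f"without the point: intercept = {intercept_trim:.3f}, slope = {slope_trim:.3f}")
```

Here the single far-out point is enough to flatten the line almost completely, which is what makes it influential.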

Page 33

Causality

On its own, regression only quantifies an association between x and y.
It does not prove causality.
Under a carefully designed experiment (or in some cases observational studies), regression can be used to show causality.

