CIVL 7012/8012 - Memphis Linear... $y_i$ = observed value of dependent variable (tip amount)....

Date post: 21-Jan-2021
CIVL 7012/8012 Simple Linear Regression Lecture 2
Transcript
Page 1

CIVL 7012/8012

Simple Linear Regression

Lecture 2

Page 2

Correlation

• Correlation is the degree to which two continuous variables are linearly associated.
• This is most often represented by a scatterplot and the Pearson correlation coefficient, denoted by $r$.
• The scatterplot provides a visual of how the two continuous variables are correlated.
• The coefficient is a measure of the linear association between the two variables.

Page 3

Correlation

• If there is no correlation between the two variables, the points will form a horizontal or vertical line or complete randomness (no obvious patterns).
• Note that it does not matter which variable is on the x-axis and which is on the y-axis.
• The pattern the two variables form determines the strength and direction of their correlation.

Page 4

Correlation

• The stronger the correlation, the more linearly distinct the pattern will be.
• The coefficient is between -1 and 1.
  +1 indicates a perfect positive correlation
  -1 indicates a perfect negative correlation
  0 indicates no correlation
• There are no strict rules for interpretation; however, as a guideline, it is suggested:
  $0 < |r| < 0.3$: weak correlation
  $0.3 < |r| < 0.7$: moderate correlation
  $|r| > 0.7$: strong correlation

Page 5

Correlation

[Figure: snapshot from Multivariate Lecture 6]

$\rho_{XY}$ is the correlation notation for the entire population.

The Pearson correlation coefficient ($r$) is for our sample representing the population.

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

Page 6

Correlation calculation

Meal   Bill ($)   Tip ($)   Bill deviations   Tip deviations   Deviations products            Bill deviations squared   Tip deviations squared
       $x$        $y$       $x_i - \bar{x}$   $y_i - \bar{y}$  $(x_i - \bar{x})(y_i - \bar{y})$   $(x_i - \bar{x})^2$       $(y_i - \bar{y})^2$
1      35         6         -37.5             -4               150                            1406.25                   16
2      110        18        37.5              8                300                            1406.25                   64
3      66         11        -6.5              1                -6.5                           42.25                     1
4      75         7         2.5               -3               -7.5                           6.25                      9
5      100        14        27.5              4                110                            756.25                    16
6      49         4         -23.5             -6               141                            552.25                    36
Sum                                                            687                            4169.5                    142

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{687}{\sqrt{(4169.5)(142)}} = 0.892$$
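
The table-based calculation above can be reproduced with a short Python sketch. It uses only the six bill/tip pairs from the table; the function name `pearson_r` is an illustrative choice, not something from the lecture.

```python
import math

# Bill ($) and tip ($) for the six meals in the table above
bills = [35, 110, 66, 75, 100, 49]
tips = [6, 18, 11, 7, 14, 4]

def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 687
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # 4169.5
    s_yy = sum((yi - y_bar) ** 2 for yi in y)                        # 142
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(bills, tips)
print(round(r, 4))  # 0.8928
```

The unrounded value is about 0.8928, which the slide reports as 0.892.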

Page 7

Correlation significance test (t-test)

• Is it statistically significant?
• Conduct a t-test
• $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$ at $\alpha = 0.05$
• $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, df $= n-2$
• With $r = 0.892$ and $n = 6$: $t = \dfrac{0.892\sqrt{6-2}}{\sqrt{1-0.892^2}} = 3.947$

Page 8

Correlation significance test (t-test)

• $t_{calc} = 3.947 > t_{crit} = 2.776$ (two-tailed, $\alpha = 0.05$, df $= 4$) $\rightarrow$ reject the null hypothesis
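
As a quick check of the test statistic, here is a minimal Python sketch. The helper name `correlation_t` is hypothetical, and the critical value 2.776 is hard-coded from the t table value for $\alpha = 0.05$ and 4 degrees of freedom quoted later in these slides, not computed.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRIT = 2.776  # two-tailed critical t for alpha = 0.05, df = 4 (table value)

t_calc = correlation_t(0.892, 6)
reject_null = abs(t_calc) > T_CRIT
print(round(t_calc, 3), reject_null)  # 3.947 True
```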

Page 9

SLR Lecture 1 Recap

Page 10

Recap - Quick Review

• SLR is a comparison of 2 models:
  • One is where the independent variable does not exist
  • And the other uses the best-fit regression line
• If there is only one variable, the best prediction for other values is the mean of the dependent variable.
• The distance between the best-fit line and the observed value is called the residual (or error).
• The residuals are squared and added together to generate the sum of squared residuals/errors (SSE).
• SLR is designed to find the best-fitting line through the data, the one that minimizes the SSE.

Page 11

Recap - Example

[Figure: "Tips for service ($)", tip ($) plotted against meal #, with the best-fit line $\bar{y} = 10$]

Meal #   Tip ($)
1        6
2        18
3        11
4        7
5        14
6        4

Page 12

Recap - Residuals (Errors)

[Figure: "Tips for service ($)", showing each tip's residual from the mean line and the corresponding squared residual]

Squared Residuals (Errors)

#   Residual   Residual$^2$
1   -4         16
2   +8         64
3   +1         1
4   -3         9
5   +4         16
6   -6         36

Sum of squared errors (SSE) = $\sum \text{Residuals}^2 = 142$
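
The mean-only baseline can be checked with a few lines of Python. This is a minimal sketch using the six tips from the recap; the variable names are illustrative.

```python
# Tips ($) for the six meals in the recap example
tips = [6, 18, 11, 7, 14, 4]

# With no independent variable, the best prediction is the mean tip
mean_tip = sum(tips) / len(tips)

# Residual = observed - predicted (here the prediction is always the mean)
residuals = [tip - mean_tip for tip in tips]
sse_baseline = sum(e ** 2 for e in residuals)

print(mean_tip, residuals, sse_baseline)
```

Because the only "model" here is the mean, this SSE of 142 is also the total sum of squares that the regression line later has to improve on.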

Page 13

Recap - Population vs. Sample Eq.

• If we knew our "population" parameters, $\beta_0$ and $\beta_1$, then we could use the SLR equation as is:

$$E(y) = \beta_0 + \beta_1 x$$

• In reality, we almost never have the population parameters. Therefore we have to estimate them using sample data. With sample data, the SLR equation changes a bit:

$$\hat{y} = b_0 + b_1 x$$

• Here $\hat{y}$ ("y-hat") is the point estimator of $E(y)$.
• In other words, $\hat{y}$ estimates the mean value of $y$ for a given $x$.

Page 14

Recap - OLS criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

$y_i$ = observed value of the dependent variable (tip amount).

$\hat{y}_i$ = estimated (predicted) value of the dependent variable (predicted tip amount based on the regression model).

[Figure: observed vs. predicted values]

Page 15

Recap - SLR parameter equations

$$\hat{y}_i = b_0 + b_1 x$$

Slope:

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

Intercept:

$$b_0 = \bar{y} - b_1 \bar{x}$$

$\bar{x}$ = mean of the independent variable ($ bill)
$\bar{y}$ = mean of the dependent variable ($ tip)
$x_i$ = value of the independent variable
$y_i$ = value of the dependent variable

Page 16

Recap - OLS Calculations

Meal   Bill ($)   Tip ($)   Bill deviations ($S_x$)   Tip deviations    Deviations products                Bill deviations squared ($S_x^2$)
       $x$        $y$       $x_i - \bar{x}$           $y_i - \bar{y}$   $(x_i - \bar{x})(y_i - \bar{y})$   $(x_i - \bar{x})^2$
1      35         6         -37.5                     -4                150                                1406.25
2      110        18        37.5                      8                 300                                1406.25
3      66         11        -6.5                      1                 -6.5                               42.25
4      75         7         2.5                       -3                -7.5                               6.25
5      100        14        27.5                      4                 110                                756.25
6      49         4         -23.5                     -6                141                                552.25
       $\bar{x} = 72.5$   $\bar{y} = 10$                                687                                4169.5

Page 17

Recap - OLS Calculations

Deviations products                Bill deviations squared
$(x_i - \bar{x})(y_i - \bar{y})$   $(x_i - \bar{x})^2$
150                                1406.25
300                                1406.25
-6.5                               42.25
-7.5                               6.25
110                                756.25
141                                552.25
687                                4169.5

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{687}{4169.5} = 0.1648$$

Page 18

Recap - OLS Calculations

Bill ($)   Tip ($)
$x$        $y$
35         6
110        18
66         11
75         7
100        14
49         4
$\bar{x} = 72.5$   $\bar{y} = 10$

$$b_1 = 0.1648$$

$$b_0 = \bar{y} - b_1 \bar{x}$$

$$b_0 = 10 - 0.1648(72.5) = 10 - 11.9457 = -1.9457$$

(The product 11.9457 uses the unrounded slope, 0.164768.)
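
The slope and intercept formulas can be sketched in Python as follows. This is a minimal illustration on the six bill/tip pairs; `ols_fit` is a hypothetical helper name, not part of the lecture material.

```python
# Bill ($) is the independent variable x; tip ($) is the dependent variable y
bills = [35, 110, 66, 75, 100, 49]
tips = [6, 18, 11, 7, 14, 4]

def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: note the minus sign
    return b0, b1

b0, b1 = ols_fit(bills, tips)
print(round(b0, 4), round(b1, 4))  # -1.9457 0.1648
```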

Page 19

Recap - New Best-Fit Line & Parameters

$$\hat{y}_i = b_0 + b_1 x$$

$b_0 = -1.9457$ (intercept)

$b_1 = 0.1648$ (slope)

$$\hat{y}_i = -1.9457 + 0.1648x$$

OR

$$\hat{y}_i = 0.1648x - 1.9457$$

Page 20

Recap - Final SLR line

[Figure: "Bill vs. Tip Amount ($)", tip ($) plotted against bill ($), with the fitted line $\hat{y}_i = -1.9457 + 0.1648x$, intercept $b_0 = -1.9457$, and slope $b_1 = 0.1648$]

Page 21

Recap - SLR Model Interpretation

$$\hat{y}_i = -1.9457 + 0.1648x$$

For every $1 the bill amount ($x$) increases, we would expect the tip amount to increase by $0.1648, or about 16 cents (positive coefficient).

If the bill amount ($x$) is zero, then the expected/predicted tip amount is $-1.9457, or negative $1.95!

Does this make any sense? No. In real-world problems, the intercept may or may not make sense.
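
To make the interpretation concrete, here is a small Python sketch that evaluates the fitted line using the rounded coefficients from the slide. The function name `predict_tip` is an illustrative assumption.

```python
B0 = -1.9457  # intercept from the slide (rounded)
B1 = 0.1648   # slope from the slide (rounded)

def predict_tip(bill):
    """Predicted tip ($) for a given bill ($) from the fitted line."""
    return B0 + B1 * bill

# A $1 increase in the bill raises the predicted tip by the slope
slope_effect = predict_tip(51) - predict_tip(50)

# A $0 bill gives the intercept, which is negative and not meaningful here
tip_at_zero = predict_tip(0)

print(round(slope_effect, 4), tip_at_zero)
```

Note that a bill of $0 lies well outside the observed range ($35 to $110), which is one reason the intercept has no practical meaning in this example.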

Page 22

SLR - Lecture 2

Page 23

Model fit and Coefficient of Determination

[Figure, first panel: "Tips ($)", tips plotted against meal # with the mean line]

With only the DV, the only sum of squares is due to error. Therefore, it is also the total, and MAX, sum of squares for this data sample.

SSE = 142
SSE = SST
SST = 142

[Figure, second panel: "Bills vs Tips ($)", tips plotted against bills with the regression line]

With both the IV and DV, SST remains the same. But the SSE is reduced significantly. The difference between the SSE and SST is due to regression (SSR).

SST = 142
SSE = ?
SST - SSE = SSR

Page 24

Estimate regression values

Meal   Bill ($)   Tip ($)   $\hat{y}_i = -1.9457 + 0.1648x$         $\hat{y}_i$ (predicted tip $)
       $x_i$      $y_i$
1      35         6         $\hat{y}_i = -1.9457 + 0.1648(35)$      3.8212
2      110        18        $\hat{y}_i = -1.9457 + 0.1648(110)$     16.1788
3      66         11        $\hat{y}_i = -1.9457 + 0.1648(66)$      8.9290
4      75         7         $\hat{y}_i = -1.9457 + 0.1648(75)$      10.4119
5      100        14        $\hat{y}_i = -1.9457 + 0.1648(100)$     14.5311
6      49         4         $\hat{y}_i = -1.9457 + 0.1648(49)$      6.1280
       $\bar{x} = 72.5$   $\bar{y} = 10$

The predicted values are computed with the unrounded coefficients (-1.94568 and 0.164768), so they differ slightly from what the rounded equation gives.

$$\min \sum (y_i - \hat{y}_i)^2$$

Page 25

Regression errors (residuals)

Meal   Bill ($)   Tip ($)   $\hat{y}_i$ (predicted tip $)   Error ($y - \hat{y}_i$)
       $x$        $y$                                       (observed - predicted)
1      35         6         3.8212                          6 - 3.8212 = 2.1788
2      110        18        16.1788                         18 - 16.1788 = 1.8212
3      66         11        8.9290                          11 - 8.9290 = 2.0710
4      75         7         10.4119                         7 - 10.4119 = -3.4119
5      100        14        14.5311                         14 - 14.5311 = -0.5311
6      49         4         6.1280                          4 - 6.1280 = -2.1280
       $\bar{x} = 72.5$   $\bar{y} = 10$

Page 26

Regression errors (residuals) - SSE

Meal   Bill ($)   Tip ($)   $\hat{y}_i$ (predicted tip $)   Error ($y - \hat{y}_i$)   $(y - \hat{y}_i)^2$
       $x$        $y$
1      35         6         3.8212                          2.1788                    4.7472
2      110        18        16.1788                         1.8212                    3.3168
3      66         11        8.9290                          2.0710                    4.2890
4      75         7         10.4119                         -3.4119                   11.6412
5      100        14        14.5311                         -0.5311                   0.2821
6      49         4         6.1280                          -2.1280                   4.5282
       $\bar{x} = 72.5$   $\bar{y} = 10$                                              SSE = 28.8044
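
The predicted values, residuals, and SSE in the tables above can be reproduced with the following Python sketch. It assumes the unrounded coefficients reported in the Excel regression output for this data (-1.94568 and 0.164768); the variable names are illustrative.

```python
bills = [35, 110, 66, 75, 100, 49]
tips = [6, 18, 11, 7, 14, 4]

# Unrounded coefficients (assumed from the Excel output for this data set)
B0 = -1.94568
B1 = 0.164768

predicted = [B0 + B1 * x for x in bills]
residuals = [y - y_hat for y, y_hat in zip(tips, predicted)]  # observed - predicted
squared = [e ** 2 for e in residuals]
sse = sum(squared)

for meal, (y_hat, e, e2) in enumerate(zip(predicted, residuals, squared), start=1):
    print(meal, round(y_hat, 4), round(e, 4), round(e2, 4))
print("SSE =", round(sse, 4))
```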

Page 27

SSE comparison

Sum of squared error (SSE) Comparison

D.V. (tip $) ONLY
16 + 64 + 1 + 9 + 16 + 36 = SSE = 142

D.V. & I.V. (tip $ as a function of bill $)
Sum of the six squared regression residuals = SSE = 28.8044

Page 28

Comparison of two lines

• When we conducted the regression, the SSE decreased from 142 to 28.8044.
• 28.8044 was explained by (allocated to) ERROR.
• What happened to the difference (113.1956)?
• 113.1956 is the sum of squares due to REGRESSION (SSR).
• $SST = SSR + SSE$
• In this case: 142 = 113.1956 + 28.8044
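
The partition $SST = SSR + SSE$ can be verified numerically. The sketch below refits the line from the raw bill/tip data so the identity holds to floating-point precision; all names are illustrative.

```python
bills = [35, 110, 66, 75, 100, 49]
tips = [6, 18, 11, 7, 14, 4]

x_bar = sum(bills) / len(bills)
y_bar = sum(tips) / len(tips)

# Least-squares slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips))
      / sum((x - x_bar) ** 2 for x in bills))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in bills]

sst = sum((y - y_bar) ** 2 for y in tips)                # total
sse = sum((y - f) ** 2 for y, f in zip(tips, fitted))    # error
ssr = sum((f - y_bar) ** 2 for f in fitted)              # regression

print(round(sst, 4), round(ssr, 4), round(sse, 4))  # 142.0 113.1956 28.8044
```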

Page 29

Comparison of two lines

[Figure, first panel: "Tips ($)", tips plotted against meal # with the mean line]

SSE = 142
SSE = SST
SST = 142

[Figure, second panel: "Bills vs Tips ($)", tips plotted against bills with the regression line]

SST = 142
SSE = 28.8044
SST - SSE = SSR = 113.1956

Page 30

Coefficient of Determination ($r^2$)

• How well does the estimated regression equation fit our data?
• This is where regression starts to look a lot like ANOVA, where the SST is partitioned into SSE & SSR.
• The larger the SSR, the smaller the SSE.
• The Coefficient of Determination quantifies this ratio as a percentage (%).

[Figure: SST partitioned into SSR and SSE]

$$\text{Coefficient of Determination} = r^2 = \frac{SSR}{SST}$$

Page 31

Coefficient of Determination ($r^2$)

ANOVA

             df   SS         MS         F         Significance F
Regression   1    113.1956   113.1956   15.7192   0.016611541
Residual     4    28.80441   7.201103
Total        5    142
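
The MS and F columns of the ANOVA table follow directly from the sums of squares and their degrees of freedom. A minimal Python sketch, with a hypothetical helper `anova_f` and `k` denoting the number of independent variables:

```python
def anova_f(ssr, sse, n, k=1):
    """Mean squares and F statistic for a regression with k predictors."""
    msr = ssr / k             # mean square due to regression, df = k
    mse = sse / (n - k - 1)   # mean square error, df = n - k - 1
    return msr, mse, msr / mse

msr, mse, f_stat = anova_f(ssr=113.1956, sse=28.80441, n=6)
print(round(msr, 4), round(mse, 4), round(f_stat, 4))  # 113.1956 7.2011 15.7192
```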

Page 32

$r^2$ Interpretation

• $\text{Coefficient of Determination} = r^2 = \dfrac{SSR}{SST} = \dfrac{113.1956}{142} = 0.7972$ or 79.72%
• We can conclude that 79.72% of the total sum of squares can be explained by using the regression equation to predict the tip amount, and that the remainder (20.28%) is error.
• This is a "good fit"!
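
The arithmetic above is short enough to check directly. This Python sketch also takes the square root of $r^2$, which for simple linear regression recovers the magnitude of the correlation coefficient (the "Multiple R" of a regression summary); the variable names are illustrative.

```python
import math

SSR = 113.1956  # sum of squares due to regression
SST = 142       # total sum of squares

r_squared = SSR / SST
unexplained = 1 - r_squared

print(round(r_squared, 4), round(unexplained, 4))  # 0.7972 0.2028
print(round(math.sqrt(r_squared), 4))              # 0.8928
```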

Page 33

[Figure: "Bills vs. Tips ($)", tip ($) plotted against bill ($), showing the fitted line $\hat{y}_i = -1.9457 + 0.1648x$ and the mean line $\bar{y} = 10$]

3 squared differences

$SSE = \sum (y_i - \hat{y}_i)^2$

$SST = \sum (y_i - \bar{y})^2$

$SSR = \sum (\hat{y}_i - \bar{y})^2$

Page 34

Model fit

$$\hat{y}_i = -1.9457 + 0.1648x$$

Questions:

• Once a regression line is calculated, how much better is it than using the mean of the dependent variable alone? (coefficient of determination, $r^2$)
• How confident are we in the significance of the relationship between $x$ and $y$? (t-test of slope)

Page 35

Regression with Excel

• Produce SLR model in Excel.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.892834
R Square            0.797152
Adjusted R Square   0.74644
Standard Error      2.683487
Observations        6

ANOVA
             df   SS         MS         F         Significance F
Regression   1    113.1956   113.1956   15.7192   0.016611541
Residual     4    28.80441   7.201103
Total        5    142

               Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept      -1.94568       3.205964         -0.60689   0.576683   -10.84685887   6.955504991   -10.84685887   6.955504991
X Variable 1   0.164768       0.041558         3.964745   0.016612   0.049383684    0.280152232   0.049383684    0.280152232

Page 36

Testing slope - 1

• Is the relationship between $y$ and $x$ significant?
• Test the slope $\beta_1$ (two-tailed t-test).
• Remember, $b_1$ is for our sample and $\beta_1$ is for the population.
• We will use our sample slope $b_1$ to test whether the true slope of the population, $\beta_1$, is significantly different from 0.

$$\hat{y}_i = -1.9457 + 0.1648x$$

Page 37

Testing slope - 2

Steps to conduct a t-test on slope $\beta_1$:

• Step 1: Specify the hypotheses:
  • $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$ at $\alpha = 0.05$
• Step 2: Determine the test statistic:

$$t = \frac{b_1 - \beta_1}{SE_{b_1}}$$

  • where $\beta_1$ is the true coefficient for the whole population (0 under $H_0$)
  • where $SE_{b_1}$ is the standard error of the slope $b_1$:

$$SE_{b_1} = \frac{\sqrt{\dfrac{SSE}{n-2}}}{\sqrt{\sum (x - \bar{x})^2}}$$

Page 38

Testing slope - 3

$$\hat{y}_i = -1.9457 + 0.1648x$$

• Step 2 calculation:

$$SE_{b_1} = \frac{\sqrt{\dfrac{SSE}{n-2}}}{\sqrt{\sum (x - \bar{x})^2}} = \frac{\sqrt{\dfrac{28.8044}{6-2}}}{\sqrt{4169.5}} = 0.0416$$

$$t = \frac{b_1 - \beta_1}{SE_{b_1}} = \frac{0.1648 - 0}{0.0416} = 3.9615$$
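
The Step 2 arithmetic can be sketched in Python as follows. The inputs are the rounded values from the slides; the function name `slope_standard_error` is an illustrative assumption.

```python
import math

def slope_standard_error(sse, n, s_xx):
    """Standard error of the slope: sqrt(SSE / (n - 2)) / sqrt(sum (x - x_bar)^2)."""
    return math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

se_b1 = slope_standard_error(sse=28.8044, n=6, s_xx=4169.5)

b1 = 0.1648           # sample slope (rounded)
beta1_null = 0        # hypothesized slope under H0
t_calc = (b1 - beta1_null) / se_b1

print(round(se_b1, 4))  # 0.0416
print(round(t_calc, 2))
```

Dividing by the unrounded standard error gives a t value a little above the slide's 3.9615, which was obtained after rounding the standard error to 0.0416; the conclusion of the test is the same either way.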

Page 39

Testing slope - 4

• Step 3: Quantify the evidence of the test
  • Method 1: Critical value method
  • Compare calculated $t$ to critical $t$ (remember $\alpha = 0.05$)
  • $\pm t_{1-\alpha/2,\, n-2} = \pm t_{0.975,\, 4} = \pm 2.776$

Page 40

Testing slope - 5

• Step 3: Method 1: Critical value method
  • Compare calculated $t$ to critical $t$ (remember $\alpha = 0.05$)
  • $t_{calculated} = 3.9615 > t_{critical} = 2.776$
  • The calculated $t$ falls in the critical (rejection) region, so we reject the null hypothesis $H_0: \beta_1 = 0$. We conclude that $\beta_1 \neq 0$ and that there is a statistically significant relationship between $x$ and $y$.

[Figure: t distribution with a central area of 0.95 and an area of 0.025 in each tail]

Page 41

Testing slope - 6

• Step 3: Method 2: p-value method
  • Compare the calculated/estimated $p$-value to the desired significance level (remember $\alpha = 0.05$).
  • $p_{calculated/estimated} = 2P(t > |t_{computed}|) = 2P(t > 3.9615) \approx 0.017$ with 4 degrees of freedom
  • The $p$-value of 0.017 is less than $\alpha = 0.05$; therefore we reject the null hypothesis $H_0: \beta_1 = 0$. We conclude that $\beta_1 \neq 0$ and that there is a statistically significant relationship between $x$ and $y$.
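
The two-tailed p-value can be approximated without a statistics library by numerically integrating the Student t density. The sketch below uses composite Simpson's rule; the function names and the number of integration steps are assumptions made for illustration, not part of the lecture.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) via composite Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def two_sided_p(t, df):
    """Two-tailed p-value: 2 * P(T > |t|)."""
    return 2 * (1 - t_cdf(abs(t), df))

p_value = two_sided_p(3.9615, df=4)
print(round(p_value, 3))  # 0.017
```

The same `t_cdf` also confirms the critical value used in the critical value method: `t_cdf(2.776, 4)` is approximately 0.975.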

Page 42

SLR Example with R

• Start R session
• Load the "airquality" dataset included in base R
• Explore and plot data
• Run a simple linear regression model with
  "Ozone" as a DV ($y$)
  "Temp" as an IV ($x$)
• Follow along in the R session; the model results are as follows:

Page 43

SLR Example with R

• Dataset = airquality: 153 obs. of 6 variables
• Start R session and follow instructions in code
• Use simple linear regression to predict ozone levels ("Ozone") based on the temperature ("Temp").

ID   Ozone   Solar.R   Wind   Temp   Month   Day
1    41      190       7.4    67     5       1
2    36      118       8      72     5       2
3    12      149       12.6   74     5       3
4    18      313       11.5   62     5       4
5    NA      NA        14.3   56     5       5
6    28      NA        14.9   66     5       6
7    23      299       8.6    65     5       7
8    19      99        13.8   59     5       8
9    8       19        20.1   61     5       9
10   NA      194       8.6    69     5       10

Page 44

Step 1: scatter plot

Ozone   Temp
41      67
36      72
12      74
18      62
NA      56
28      66
23      65
19      59
8       61
NA      69

Page 45

STEP 3: CORRELATION (Ozone vs Temp)

• What is the correlation coefficient ($r$) for Ozone vs. Temp? (see R session)
  In this case, $r = 0.698$
• Is the relationship strong?
  MODERATE! Next: run the model (see R session)

Page 46

Model results (model m1)

• $\hat{y} = \beta_0 + \beta_1 x$
• $\beta_0 = -146.996$ (Intercept), $\beta_1 = +2.429$ (Slope)
• Regression line for this model: $\hat{y} = -146.996 + 2.429x$

Page 47

Results interpretation (model m1) - 1

Residuals:

• Residuals are the differences between the actual observed response values (Ozone levels in our case) and the response values that the model predicted.
• The "Residuals" section of the model output breaks them down into 5 summary points (min, 1Q, median, 3Q, max) to assess how well the model fits the data.
• A well-fitting model will show symmetry from the min to the max around the mean value (0).
• We do not have very good symmetry here.
• So, the model is predicting certain points that fall far away from the actual observed points.

Page 48

Results interpretation (model m1) - 2

Model Coefficients:

• $\beta_0 = -146.996$ ($y$-intercept)
  No practical interpretation; it is the predicted Ozone level when Temp = 0.
• $\beta_1 = +2.429$ (slope)
  For every 1 °F increase in temperature ($x$), the Ozone level is expected to increase by 2.429 units.
• Std. error $= 0.2331$
  This is the standard error of the slope: the estimated slope can vary by about 0.2331 units from sample to sample.
• t-value for "Temp" $= \dfrac{\text{coefficient}}{\text{std. error}} = \dfrac{2.429}{0.233} \approx 10.418$
  The t-value is significant: Pr$(>|t|) < 2 \times 10^{-16}$, which is significant at any reasonable level of significance (you could say at the 99.9% level of confidence, or $\alpha = 0.001$).

Page 49

Results interpretation (model m1) - 3

• Residual Standard Error = 23.71 on 114 degrees of freedom
• The Residual Standard Error is the average amount that the response "Ozone" will deviate from the regression line.
• In our example, the actual Ozone level can deviate from the regression line by approximately 23.71 units, on average.
• Degrees of freedom are the actual number of data points (observations) minus 2 (taking into account the two estimated parameters: the intercept and the slope for "Temp").
  We started the model with 153 data points in the "airquality" dataset.
  We removed 37 data points that were NAs.
  We are left with 116 data points.
  116 data points lead to (116 - 2 parameters) = 114 DF.

Page 50

Results interpretation (model m1) - 4

• $R$-squared = 0.4877 ($R^2$ = coefficient of determination)
  $R^2$ varies from 0 to 1; in this case, 48.77% of the variation in $y$ is explained by $x$.
• Adjusted $R^2$ = 0.4832
  Adjusted $R^2$ accounts for how many independent variables entered the model. It is typically lower than $R^2$, depending on how much the additional independent variables ($x$'s) contribute to explaining $y$.
  A sharp drop in the adjusted $R^2$ versus $R^2$ indicates a bad model.

$F$-Test (the F-value is used for measuring the overall model significance).

• At the desired level of significance (say $\alpha = 0.05$, i.e. 95% confidence), the statistical significance of the $F$-test shows how good a model this is.
• In this model, the $F$-statistic = 108.5 on 1 and 114 degrees of freedom.
• The $F$-statistic level of significance is Pr$(>F) < 2.2 \times 10^{-16}$; that is, the $F$-statistic is significant at any reasonable level of significance (or you could say at 99.99%).
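
For a single-predictor model, both the adjusted $R^2$ and the $F$-statistic can be recovered from $R^2$ and the sample size. The Python sketch below uses the standard formulas; the function names are illustrative, and `k` is the number of independent variables.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_from_r2(r2, n, k):
    """Overall F statistic on k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

n_used = 153 - 37  # observations left after dropping the 37 rows with NA

adj_r2 = adjusted_r2(0.4877, n_used, k=1)
f_stat = f_from_r2(0.4877, n_used, k=1)

print(n_used - 2)        # 114 degrees of freedom
print(round(adj_r2, 4))  # 0.4832
print(round(f_stat, 1))  # 108.5
```

The same `adjusted_r2` function applied to the tip example (`r2 = 0.797152`, `n = 6`, `k = 1`) gives 0.74644, matching the Excel summary output.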

Page 51

SLR - R code