Class 22. Understanding Regression
EMBS Part of 12.7
Sections 1-3 and 7 of Pfeifer Regression note
What is the regression line?
• It is a line drawn through a cloud of points.
• It is the line that minimizes the sum of squared errors.
– Errors are also known as residuals.
– Error = Actual – Predicted.
– Error is the vertical distance from point (actual) to line (predicted).
– Points above the line are positive errors.
• The average of the errors will always be zero.
• The regression line will always “go through” the average X, average Y.
[Figure: Y vs. X scatter with a fitted line; labels “Error aka residual” and “Predicted aka fitted”. Can you draw the regression line?]
[Figure: Y vs. X scatter with candidate lines labeled A through F; line D is singled out. Which is the regression line?]
Which is the regression line?
Points: (1,1), (2,7), (3,1). The regression line is the horizontal line through (1,3), (2,3), (3,3).
Error at (2,7): 7 − 3 = 4
Error at (1,1): 1 − 3 = −2
Error at (3,1): 1 − 3 = −2
Sum of Errors is 0!
SSE = (−2)² + 4² + (−2)² = 24 is smaller than from any other line. The line goes through (2,3), the average.
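The slides use Excel, but the same fit can be checked numerically. A minimal sketch in Python (numpy assumed available, purely for illustration):

```python
import numpy as np

# The three points from the slide
x = np.array([1.0, 3.0, 2.0])
y = np.array([1.0, 1.0, 7.0])

# Least-squares fit: the slope/intercept pair that minimizes SSE
slope, intercept = np.polyfit(x, y, 1)   # the flat line: slope 0, intercept 3

residuals = y - (intercept + slope * x)  # actual minus predicted
print(residuals.sum())                   # the errors always sum to 0
print((residuals ** 2).sum())            # SSE = (-2)^2 + 4^2 + (-2)^2 = 24
```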
Draw in the regression line…
[Figure: four practice scatter plots (only axis values survive; no recoverable data)]
Two Points determine a line…
…and regression can give you the equation.

Degrees C   Degrees F
    0          32
  100         212

[Figure: Degrees F vs. Degrees C, with the fitted line f(x) = 1.8x + 32]
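A sketch of the same two-point fit (Python/numpy, for illustration):

```python
import numpy as np

# Two (Celsius, Fahrenheit) points determine the line exactly
c = np.array([0.0, 100.0])
f = np.array([32.0, 212.0])

# With only two points, the least-squares line passes through both
slope, intercept = np.polyfit(c, f, 1)
print(f"F = {slope:.1f} * C + {intercept:.0f}")   # F = 1.8 * C + 32
```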
Four Sets of X,Y Data

Data Set A      Data Set B      Data Set C      Data Set D
 X    Y          X    Y          X    Y          X     Y
10   9.14       10   8.04       10   7.47       19   12.08
 8   8.14        8   6.95        8   6.47       19   11.26
13   8.74       13   7.58       13   8.97       19   13.21
 9   8.77        9   8.81        9   6.97       19   14.34
11   9.25       11   8.33       11  10.87       19   13.97
14   8.10       14   9.96       14   9.47       19   12.54
 6   6.13        6   7.24        6   5.47       19   10.75
 4   3.10        4   4.26        4   4.47        8    7.00
12   9.13       12  10.84       12   8.47       19   11.06
 7   7.26        7   4.82        7   8.87       19   13.41
 5   4.74        5   5.68        5   4.97       19   12.39
[Figure: four scatter plots, one for each of data sets A, B, C, and D]
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8166
R Square             0.6669
Adjusted R Square    0.6299
Standard Error       1.2357
Observations        11

ANOVA
             df    SS        MS        F         Significance F
Regression    1   27.5100   27.5100   18.0164    0.0022
Residual      9   13.7425    1.5269
Total        10   41.2525

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept   2.9993         2.1532           1.3929    0.1971    -1.8716     7.8702
X           0.5001         0.1178           4.2446    0.0022     0.2336     0.7666
Four Sets of X,Y Data (Data Analysis/Regression)
Identical regression output for A, B, C, and D!
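One way to verify the claim, assuming Python with numpy is available: fit each of the four data sets and compare the outputs.

```python
import numpy as np

# The four data sets from the table above
sets = {
    "A": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [9.14, 8.14, 8.74, 8.77, 9.25, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "B": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "C": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [7.47, 6.47, 8.97, 6.97, 10.87, 9.47, 5.47, 4.47, 8.47, 8.87, 4.97]),
    "D": ([19, 19, 19, 19, 19, 19, 19, 8, 19, 19, 19],
          [12.08, 11.26, 13.21, 14.34, 13.97, 12.54, 10.75, 7.00, 11.06, 13.41, 12.39]),
}

results = {}
for name, (x, y) in sets.items():
    x, y = np.array(x, float), np.array(y, float)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares line
    r2 = np.corrcoef(x, y)[0, 1] ** 2            # R Square
    results[name] = (slope, intercept, r2)
    print(f"{name}: slope={slope:.4f}  intercept={intercept:.4f}  R^2={r2:.4f}")
# All four sets print slope ≈ 0.50, intercept ≈ 3.00, R^2 ≈ 0.67
```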
Assumptions
• Y is normal and we sample n independent observations.
– The sample mean Ȳ is the estimate of μ.
– The sample standard deviation s is the estimate of σ.
– We use Ȳ, s, and n to test hypotheses about μ.
• Using the t-statistic and the t-distribution with n−1 dof.
– We never forecasted “the next Y”.
• Although, our point forecast for a new Y would be Ȳ.
Example: Section 4 IQs

IQ
Mean                 108.545   (Ȳ)
Standard Error         3.448
Median               110
Mode                 102
Standard Deviation    19.807   (s)
Sample Variance      392.318
Kurtosis               0.228
Skewness              -0.499
Range                 85
Minimum               57
Maximum              142
Sum                 3582
Count                 33       (n)

To test H0: μ = 100.
The CLT tells us this test works even if Y is not normal.
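The same test, sketched from the summary statistics above (plain Python; the critical value for 32 dof is from a t-table):

```python
# One-sample t-test of H0: mu = 100, using the slide's summary statistics
mean, std_err, n = 108.545, 3.448, 33   # "Standard Error" is already s/sqrt(n)

t_stat = (mean - 100) / std_err
print(round(t_stat, 2))   # ≈ 2.48
# Compare to the t-distribution with n - 1 = 32 dof:
# the 5% two-sided critical value is about 2.04, so we reject H0.
```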
Regression Assumptions
• Y│X is normal with mean a+bX and standard deviation σ, and we sample n independent observations.
– We use regression to estimate a, b, and σ.
• â, b̂, and “standard error” are the appropriate estimates.
• Our point forecast for a new observation is â + b̂(X).
– (Plug X into the regression equation.)
• At some point, we will learn how to use regression output to test interesting hypotheses.
• What about a probability forecast of the new Y│X?
Summary: The key assumption of linear regression…
• Y ~ N(μ, σ) (no regression)
• Y│X ~ N(a+bX, σ) (with regression)
– In other words, μ = a + b(X), or E(Y│X) = a + b(X).

Without regression, we used data to estimate and test hypotheses about the parameter μ.
With regression, we use (x,y) data to estimate and test hypotheses about the parameters a and b.
In both cases, we use the t because we don’t know σ.
With regression, we also want to use X to forecast a new Y.
The mean of Y given X is a linear function of X.

EMBS (12.14)
Example: Assignment 22

 MSF     Hours
 26       2
 34.2     4.17
 29       4.42
 34.3     4.75
 85.9     4.83
143.2     6.67
 85.5     7
140.6     7.08
140.6     7.17
 40.4     7.17
101      10
239.7    12
179.3    12.5
126.5    13.67
140.8    15.08

Regression Statistics
Multiple R           0.72600331
R Square             0.527080806
Adjusted R Square    0.490702407
Standard Error       2.773595935   (the estimate of σ)
Observations        15             (n)

ANOVA          df
Regression      1
Residual       13
Total          14

            Coefficients
Intercept   3.312316042   (â)
MSF         0.044489502   (b̂)
Forecasting Y│X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions……
– GOOD METHOD
• Pr(Y<8) = NORMDIST(8, 10.31, 2.77, true) = 0.202
• Assumes â, b̂, and “standard error” are a, b, and σ.
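A sketch of the GOOD method outside Excel (plain Python, using the error function for the normal CDF; coefficients taken from the output above):

```python
from math import erf, sqrt

# GOOD method: treat Y|X as normal, mean = point forecast, sd = "standard error"
intercept, slope = 3.312316042, 0.044489502   # from the Assignment 22 output
std_error = 2.773595935

point_forecast = intercept + slope * 157.3       # ≈ 10.31 hours
z = (8 - point_forecast) / std_error
prob = 0.5 * (1 + erf(z / sqrt(2)))              # normal CDF, like NORMDIST(..., TRUE)
print(round(point_forecast, 2), round(prob, 3))  # ≈ 10.31  ≈ 0.202
```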
The same regression output (Assignment 22) applied to two jobs:

                 Job A     Job B
Intercept          1         1
MSF              157.3      64.7
Point Forecast    10.3105    6.1908
sigma              2.77      2.77
X                  8         8
NORMDIST           0.2021    0.7432
Forecasting Y│X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions……
– BETTER METHOD
• t = (8 − 10.31)/2.77 = −0.83
• Pr(Y<8) = 1 − T.DIST.RT(−0.83, 13) = 0.210
• Assumes â and b̂ are a and b…but accounts for the fact that “standard error” is not σ.
• dof = n − 2
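The BETTER method sketched in Python (assumes scipy is available for the t CDF):

```python
from scipy.stats import t   # assumes scipy is installed

# BETTER method: same point forecast, but use the t distribution with n - 2 dof
point_forecast = 10.31
std_error = 2.77
dof = 15 - 2

t_stat = (8 - point_forecast) / std_error   # ≈ -0.83
prob = t.cdf(t_stat, dof)                   # same as 1 - T.DIST.RT(t_stat, 13) in Excel
print(round(t_stat, 2), round(prob, 3))     # ≈ -0.83  ≈ 0.21
```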
Forecasting Y│X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions……
– PERFECT METHOD
• t = (8 − 10.31)/2.93 = −0.79
• Pr(Y<8) = 1 − T.DIST.RT(−0.79, 13) = 0.222
• To account for using â and b̂ to estimate a and b, we must increase the standard deviation used in the forecast. The “correct” standard deviation is called the “standard error of prediction”…which here is 2.93.
• dof = n − 2
Probability Forecasting with Regression: summary
• Plug X into the regression equation to calculate the point forecast.
– This becomes the mean.
• GOOD
– Use the normal with “standard error” in place of σ.
• BETTER
– Use the t (with n−2 dof) to account for using “standard error” to estimate σ.
• PERFECT
– Use the t with the “standard error of prediction” to account for using â and b̂ to estimate a and b.
Probability Forecasting with Regression
• “Standard error of prediction” is larger than “standard error” and depends on
– 1/n (the larger the n, the smaller is “standard error of prediction”)
– (X − X̄)² (the farther the X is from the average X, the larger is “standard error of prediction”)
• As n gets big, the “standard error of prediction” approaches “standard error”.

standard error of prediction = standard error × √( 1 + 1/n + (X − X̄)² / Σ(Xᵢ − X̄)² )

The sum Σ(Xᵢ − X̄)² runs over the n data points, and X is the value for which we predict Y. The good and better methods ignore the 1/n and (X − X̄)² terms…okay the bigger the n.
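Plugging the Assignment 22 data into this formula (Python/numpy, for illustration) reproduces the 2.93 used by the PERFECT method:

```python
import numpy as np

# Standard error of prediction for the Assignment 22 data at X = 157.3
msf = np.array([26, 34.2, 29, 34.3, 85.9, 143.2, 85.5, 140.6,
                140.6, 40.4, 101, 239.7, 179.3, 126.5, 140.8])
std_error = 2.7736      # "standard error" from the regression output
n = len(msf)            # 15
x_new = 157.3           # the X for which we predict Y

sep = std_error * np.sqrt(1 + 1/n
                          + (x_new - msf.mean())**2 / ((msf - msf.mean())**2).sum())
print(round(sep, 2))    # ≈ 2.93
```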
(EMBS 12.26)
BOTTOM LINE
• You will be asked to use the BETTER METHOD.
– Use the t with n−2 dof.
– Just use “standard error”.
• Know that “standard error” is smaller than the correct “standard error of prediction”.
– As a result, your probability distribution is a little too narrow.
• Know that the “standard error of prediction” depends on 1/n and (X − X̄)²…which means it approaches “standard error” as n gets big.
Much ado about nothing?

[Figure: 95% Prediction Intervals, Hours vs. MSF. The Perfect interval is widest and curved; the Good interval is straight and narrowest; the Better interval lies in between.]
TODAY
• Got a better idea of how the “least squares” regression line goes through the cloud of points.
• Saw that several “clouds” can have exactly the same regression line…so chart the cloud.
• Practiced using a regression equation to calculate a point forecast (a mean).
• Saw three methods for creating a probability distribution forecast of Y│X.
– We will use the better method.
– We will know that it understates the actual uncertainty…a problem that goes away as n gets big.
Next Class
• We will learn about “adjusted R square”.
– (p 9-10 Pfeifer note)
– The most over-rated statistic of all time.
• We will learn the four assumptions required to use regression to make a probability forecast of Y│X.
– (Section 5 Pfeifer note, 12.4 EMBS)
– And how to check each of them.
• We will learn how to test H0: b=0.
– (p 12-13 Pfeifer note, 12.5 EMBS)
– And why this is such an important test.