Class 22. Understanding Regression
EMBS Part of 12.7
Sections 1-3 and 7 of Pfeifer Regression note
What is the regression line?
• It is a line drawn through a cloud of points.
• It is the line that minimizes the sum of squared errors.
– Errors are also known as residuals.
– Error = Actual – Predicted.
– Error is the vertical distance from point (actual) to line (predicted).
– Points above the line are positive errors.
• The average of the errors will always be zero.
• The regression line will always “go through” the average X, average Y.
[Figure: Y vs. X scatter with a fitted line; labels “Error aka residual” and “Predicted aka fitted”. Can you draw the regression line?]
[Figure: Y vs. X scatter with candidate lines labeled A through F; line D is singled out. Which is the regression line?]
Which is the regression line?
Points: (1,1), (2,7), (3,1). The regression line is the horizontal line through (1,3), (2,3), (3,3).
Error at (2,7): 7 − 3 = 4
Error at (1,1): 1 − 3 = −2
Error at (3,1): 1 − 3 = −2
Sum of Errors is 0!
SSE = (−2)² + 4² + (−2)² = 24 is smaller than from any other line. The line goes through (2,3), the average.
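The slides use Excel, but the same fit can be checked numerically. A minimal sketch in Python (numpy assumed available, purely for illustration):

```python
import numpy as np

# The three points from the slide
x = np.array([1.0, 3.0, 2.0])
y = np.array([1.0, 1.0, 7.0])

# Least-squares fit: the slope/intercept pair that minimizes SSE
slope, intercept = np.polyfit(x, y, 1)   # the flat line: slope 0, intercept 3

residuals = y - (intercept + slope * x)  # actual minus predicted
print(residuals.sum())                   # the errors always sum to 0
print((residuals ** 2).sum())            # SSE = (-2)^2 + 4^2 + (-2)^2 = 24
```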
Draw in the regression line…
[Figure: four practice scatter plots (only axis values survive; no recoverable data)]
Two Points determine a line…
…and regression can give you the equation.

Degrees C   Degrees F
    0          32
  100         212

[Figure: Degrees F vs. Degrees C, with the fitted line f(x) = 1.8x + 32]
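A sketch of the same two-point fit (Python/numpy, for illustration):

```python
import numpy as np

# Two (Celsius, Fahrenheit) points determine the line exactly
c = np.array([0.0, 100.0])
f = np.array([32.0, 212.0])

# With only two points, the least-squares line passes through both
slope, intercept = np.polyfit(c, f, 1)
print(f"F = {slope:.1f} * C + {intercept:.0f}")   # F = 1.8 * C + 32
```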
Four Sets of X,Y Data

Data Set A      Data Set B      Data Set C      Data Set D
 X    Y          X    Y          X    Y          X     Y
10   9.14       10   8.04       10   7.47       19   12.08
 8   8.14        8   6.95        8   6.47       19   11.26
13   8.74       13   7.58       13   8.97       19   13.21
 9   8.77        9   8.81        9   6.97       19   14.34
11   9.25       11   8.33       11  10.87       19   13.97
14   8.10       14   9.96       14   9.47       19   12.54
 6   6.13        6   7.24        6   5.47       19   10.75
 4   3.10        4   4.26        4   4.47        8    7.00
12   9.13       12  10.84       12   8.47       19   11.06
 7   7.26        7   4.82        7   8.87       19   13.41
 5   4.74        5   5.68        5   4.97       19   12.39
[Figure: four scatter plots, one for each of data sets A, B, C, and D]
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8166
R Square             0.6669
Adjusted R Square    0.6299
Standard Error       1.2357
Observations        11

ANOVA
             df    SS        MS        F         Significance F
Regression    1   27.5100   27.5100   18.0164    0.0022
Residual      9   13.7425    1.5269
Total        10   41.2525

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept   2.9993         2.1532           1.3929    0.1971    -1.8716     7.8702
X           0.5001         0.1178           4.2446    0.0022     0.2336     0.7666
Four Sets of X,Y Data (Data Analysis/Regression)
Identical regression output for A, B, C, and D!
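One way to verify the claim, assuming Python with numpy is available: fit each of the four data sets and compare the outputs.

```python
import numpy as np

# The four data sets from the table above
sets = {
    "A": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [9.14, 8.14, 8.74, 8.77, 9.25, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "B": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "C": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [7.47, 6.47, 8.97, 6.97, 10.87, 9.47, 5.47, 4.47, 8.47, 8.87, 4.97]),
    "D": ([19, 19, 19, 19, 19, 19, 19, 8, 19, 19, 19],
          [12.08, 11.26, 13.21, 14.34, 13.97, 12.54, 10.75, 7.00, 11.06, 13.41, 12.39]),
}

results = {}
for name, (x, y) in sets.items():
    x, y = np.array(x, float), np.array(y, float)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares line
    r2 = np.corrcoef(x, y)[0, 1] ** 2            # R Square
    results[name] = (slope, intercept, r2)
    print(f"{name}: slope={slope:.4f}  intercept={intercept:.4f}  R^2={r2:.4f}")
# All four sets print slope ≈ 0.50, intercept ≈ 3.00, R^2 ≈ 0.67
```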
Assumptions
• Y is normal and we sample n independent observations.
– The sample mean Ȳ is the estimate of μ.
– The sample standard deviation s is the estimate of σ.
– We use Ȳ, s, and n to test hypotheses about μ.
• Using the t-statistic and the t-distribution with n−1 dof.
– We never forecasted “the next Y”.
• Although, our point forecast for a new Y would be Ȳ.
Example: Section 4 IQs

IQ
Mean                 108.545   (Ȳ)
Standard Error         3.448
Median               110
Mode                 102
Standard Deviation    19.807   (s)
Sample Variance      392.318
Kurtosis               0.228
Skewness              -0.499
Range                 85
Minimum               57
Maximum              142
Sum                 3582
Count                 33       (n)

To test H0: μ = 100.
The CLT tells us this test works even if Y is not normal.
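The same test, sketched from the summary statistics above (plain Python; the critical value for 32 dof is from a t-table):

```python
# One-sample t-test of H0: mu = 100, using the slide's summary statistics
mean, std_err, n = 108.545, 3.448, 33   # "Standard Error" is already s/sqrt(n)

t_stat = (mean - 100) / std_err
print(round(t_stat, 2))   # ≈ 2.48
# Compare to the t-distribution with n - 1 = 32 dof:
# the 5% two-sided critical value is about 2.04, so we reject H0.
```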
Regression Assumptions
• Y│X is normal with mean a+bX and standard deviation σ, and we sample n independent observations.
– We use regression to estimate a, b, and σ.
• â, b̂, and “standard error” are the appropriate estimates.
• Our point forecast for a new observation is â + b̂(X).
– (Plug X into the regression equation.)
• At some point, we will learn how to use regression output to test interesting hypotheses.
• What about a probability forecast of the new Y│X?
Summary: The key assumption of linear regression…
• Y ~ N(μ, σ) (no regression)
• Y│X ~ N(a+bX, σ) (with regression)
– In other words, μ = a + b(X), or E(Y│X) = a + b(X).

Without regression, we used data to estimate and test hypotheses about the parameter μ.
With regression, we use (x,y) data to estimate and test hypotheses about the parameters a and b.
In both cases, we use the t because we don’t know σ.
With regression, we also want to use X to forecast a new Y.
The mean of Y given X is a linear function of X.

EMBS (12.14)
Example: Assignment 22

 MSF     Hours
 26       2
 34.2     4.17
 29       4.42
 34.3     4.75
 85.9     4.83
143.2     6.67
 85.5     7
140.6     7.08
140.6     7.17
 40.4     7.17
101      10
239.7    12
179.3    12.5
126.5    13.67
140.8    15.08

Regression Statistics
Multiple R           0.72600331
R Square             0.527080806
Adjusted R Square    0.490702407
Standard Error       2.773595935   (the estimate of σ)
Observations        15             (n)

ANOVA          df
Regression      1
Residual       13
Total          14

            Coefficients
Intercept   3.312316042   (â)
MSF         0.044489502   (b̂)
Forecasting Y│X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions……
– GOOD METHOD
• Pr(Y<8) = NORMDIST(8, 10.31, 2.77, true) = 0.202
• Assumes â, b̂, and “standard error” are a, b, and σ.
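A sketch of the GOOD method outside Excel (plain Python, using the error function for the normal CDF; coefficients taken from the output above):

```python
from math import erf, sqrt

# GOOD method: treat Y|X as normal, mean = point forecast, sd = "standard error"
intercept, slope = 3.312316042, 0.044489502   # from the Assignment 22 output
std_error = 2.773595935

point_forecast = intercept + slope * 157.3       # ≈ 10.31 hours
z = (8 - point_forecast) / std_error
prob = 0.5 * (1 + erf(z / sqrt(2)))              # normal CDF, like NORMDIST(..., TRUE)
print(round(point_forecast, 2), round(prob, 3))  # ≈ 10.31  ≈ 0.202
```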
The same regression output (Assignment 22) applied to two jobs:

                 Job A     Job B
Intercept          1         1
MSF              157.3      64.7
Point Forecast    10.3105    6.1908
sigma              2.77      2.77
X                  8         8
NORMDIST           0.2021    0.7432
Forecasting Y│X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions……
– BETTER METHOD
• t = (8 − 10.31)/2.77 = −0.83
• Pr(Y<8) = 1 − T.DIST.RT(−0.83, 13) = 0.210
• Assumes â and b̂ are a and b…but accounts for the fact that “standard error” is not σ.
• dof = n − 2
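The BETTER method sketched in Python (assumes scipy is available for the t CDF):

```python
from scipy.stats import t   # assumes scipy is installed

# BETTER method: same point forecast, but use the t distribution with n - 2 dof
point_forecast = 10.31
std_error = 2.77
dof = 15 - 2

t_stat = (8 - point_forecast) / std_error   # ≈ -0.83
prob = t.cdf(t_stat, dof)                   # same as 1 - T.DIST.RT(t_stat, 13) in Excel
print(round(t_stat, 2), round(prob, 3))     # ≈ -0.83  ≈ 0.21
```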
Forecasting Y│X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions……
– PERFECT METHOD
• t = (8 − 10.31)/2.93 = −0.79
• Pr(Y<8) = 1 − T.DIST.RT(−0.79, 13) = 0.222
• To account for using â and b̂ to estimate a and b, we must increase the standard deviation used in the forecast. The “correct” standard deviation is called the “standard error of prediction”…which here is 2.93.
• dof = n − 2
Probability Forecasting with Regression: summary
• Plug X into the regression equation to calculate the point forecast.
– This becomes the mean.
• GOOD
– Use the normal with “standard error” in place of σ.
• BETTER
– Use the t (with n−2 dof) to account for using “standard error” to estimate σ.
• PERFECT
– Use the t with the “standard error of prediction” to account for using â and b̂ to estimate a and b.
Probability Forecasting with Regression
• “Standard error of prediction” is larger than “standard error” and depends on
– 1/n (the larger the n, the smaller is “standard error of prediction”)
– (X − X̄)² (the farther the X is from the average X, the larger is “standard error of prediction”)
• As n gets big, the “standard error of prediction” approaches “standard error”.

standard error of prediction = standard error × √( 1 + 1/n + (X − X̄)² / Σ(Xᵢ − X̄)² )

The sum Σ(Xᵢ − X̄)² runs over the n data points, and X is the value for which we predict Y. The good and better methods ignore the 1/n and (X − X̄)² terms…okay the bigger the n.
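Plugging the Assignment 22 data into this formula (Python/numpy, for illustration) reproduces the 2.93 used by the PERFECT method:

```python
import numpy as np

# Standard error of prediction for the Assignment 22 data at X = 157.3
msf = np.array([26, 34.2, 29, 34.3, 85.9, 143.2, 85.5, 140.6,
                140.6, 40.4, 101, 239.7, 179.3, 126.5, 140.8])
std_error = 2.7736      # "standard error" from the regression output
n = len(msf)            # 15
x_new = 157.3           # the X for which we predict Y

sep = std_error * np.sqrt(1 + 1/n
                          + (x_new - msf.mean())**2 / ((msf - msf.mean())**2).sum())
print(round(sep, 2))    # ≈ 2.93
```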
(EMBS 12.26)
BOTTOM LINE
• You will be asked to use the BETTER METHOD.
– Use the t with n−2 dof.
– Just use “standard error”.
• Know that “standard error” is smaller than the correct “standard error of prediction”.
– As a result, your probability distribution is a little too narrow.
• Know that the “standard error of prediction” depends on 1/n and (X − X̄)²…which means it approaches “standard error” as n gets big.
Much ado about nothing?

[Figure: 95% Prediction Intervals, Hours vs. MSF. The Perfect interval is widest and curved; the Good interval is straight and narrowest; the Better interval lies in between.]
TODAY
• Got a better idea of how the “least squares” regression line goes through the cloud of points.
• Saw that several “clouds” can have exactly the same regression line…so chart the cloud.
• Practiced using a regression equation to calculate a point forecast (a mean).
• Saw three methods for creating a probability distribution forecast of Y│X.
– We will use the better method.
– We will know that it understates the actual uncertainty…a problem that goes away as n gets big.
Next Class
• We will learn about “adjusted R square”.
– (p 9-10 Pfeifer note)
– The most over-rated statistic of all time.
• We will learn the four assumptions required to use regression to make a probability forecast of Y│X.
– (Section 5 Pfeifer note, 12.4 EMBS)
– And how to check each of them.
• We will learn how to test H0: b=0.
– (p 12-13 Pfeifer note, 12.5 EMBS)
– And why this is such an important test.