REGRESSION MODEL ASSUMPTIONS. The Regression Model We have hypothesized that: y = 0 + 1 x + | | +...

Date post: 21-Dec-2015
Transcript
Page 1: REGRESSION MODEL ASSUMPTIONS. The Regression Model We have hypothesized that: y =  0 +  1 x +  | | + | | So far we focused on the regression part –

REGRESSION MODEL ASSUMPTIONS

Page 2

The Regression Model

• We have hypothesized that:

y = β0 + β1x + ε

|<Regression>| + |<Error>|

• So far we focused on the regression part: getting the best estimates for the β’s

• Here we focus on the error term, ε

Page 3

THE RANDOM VARIABLE, ε

• The error term, ε, is a random variable that describes how the observed values, yi, vary around the regression line.

• For any value of x, ε has a distribution with a mean μ and a standard deviation σ.

• At any x value xi, the observed value of the error term is called its residual, given by:

$$e_i = y_i - \hat{y}_i$$

Page 4

STEP 3: 4 ASSUMPTIONS ABOUT ε

The remainder of our discussion about linear regression assumes the following about ε:

• (1) DISTRIBUTION: ε is distributed normally

• (2) MEAN: The errors average out to 0, i.e. E(ε), or μ = 0

• (3) STANDARD DEVIATION: σ is the same at all values of x

• (4) INDEPENDENCE: The errors are independent of each other

Page 5

What Do These Assumptions Imply About y?

• y = β0 + β1x + ε

– β0 + β1x is a constant for a given value of x

– ε is normally distributed with mean 0 and standard deviation σ

• Thus y is normally distributed with standard deviation σ and mean E(y),

E(y) = E(β0 + β1x + ε) = E(β0 + β1x) + E(ε) = β0 + β1x + 0 = β0 + β1x
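
The slide gives the standard deviation of y without showing the step. A short derivation under the same assumptions (the regression part is a constant at a given x, and the error term has standard deviation σ) might read:

```latex
\begin{aligned}
\operatorname{Var}(y) &= \operatorname{Var}(\beta_0 + \beta_1 x + \varepsilon) \\
                      &= \operatorname{Var}(\varepsilon)
                         \quad \text{(adding a constant does not change the variance)} \\
                      &= \sigma^2 ,
\end{aligned}
\qquad \text{so} \qquad
y \sim N\!\left(\beta_0 + \beta_1 x,\ \sigma^2\right) \text{ at each fixed } x .
```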

Page 6

BEST ESTIMATE FOR σ

• The true value of σ is unknown.

• It can be estimated by s as follows:

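$$s^2 = \frac{SSE}{\text{Degrees of Freedom}} = \frac{\sum (y_i - \hat{y}_i)^2}{\text{Degrees of Freedom}}$$

Degrees of Freedom = n - (# quantities being estimated).

Here we are estimating two quantities: β0 and β1.

Thus degrees of freedom = n-2, and

$$s^2 = \frac{SSE}{n-2} = \frac{\sum (y_i - \hat{y}_i)^2}{n-2} \qquad \text{and} \qquad s = \sqrt{s^2}$$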
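
As a minimal sketch of the estimate of σ described on this slide, the function below computes s from observed and fitted values. The function name `residual_std_error`, the parameter `n_params`, and the sample numbers are assumptions made for illustration; they do not come from the slides.

```python
import math


def residual_std_error(y, y_hat, n_params=2):
    """Estimate sigma by s = sqrt(SSE / (n - n_params)).

    n_params is the number of quantities being estimated
    (2 for a line: the intercept and the slope).
    """
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    dof = len(y) - n_params  # degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than estimated quantities")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / dof)


# Hypothetical observed and fitted values, for illustration only
y_obs = [3.0, 5.0, 7.0, 10.0]
y_fit = [3.2, 5.1, 7.4, 9.3]
s = residual_std_error(y_obs, y_fit)
print(f"s = {s:.4f}")
```

Keeping the number of estimated quantities as a parameter mirrors the slide's rule that degrees of freedom = n - (# quantities being estimated).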

Page 7

Hand Calculation of SSE

Recall that ŷi = 46486.49 + 52.56757xi

  i     xi       yi        ŷi      (yi - ŷi)²
  1   1200   101000   109567.57   73403214.02
  2    800    92000    88540.54   11967859.75
  3   1000   110000    99054.05   119813732.7
  4   1300   120000   114824.32   26787618.7
  5    700    90000    83283.78   45107560.26
  6    800    82000    88540.54   42778670.56
  7   1000    93000    99054.05   36651570.49
  8    600    75000    78027.03   9162892.622
  9    900    91000    93797.30   7824872.169
 10   1100   105000   104310.81   474981.7385

SUM (SSE)                         373972972.97

$$s^2 = \frac{SSE}{n-2} = \frac{373972972.97}{8} = 46746621.62$$

$$s = \sqrt{46746621.62} = 6837.15$$
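
The hand calculation can be cross-checked with a short script. This is a sketch that refits the line by ordinary least squares from the ten (x, y) pairs in the table; the variable names are illustrative choices, not part of the slides.

```python
import math

# x_i and y_i for the 10 observations listed in the table
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values, squared errors, SSE, and the estimate s of sigma
y_hat = [b0 + b1 * xi for xi in x]
sq_err = [(yi - fi) ** 2 for yi, fi in zip(y, y_hat)]
sse = sum(sq_err)
s2 = sse / (n - 2)
s = math.sqrt(s2)

print(f"b0 = {b0:.2f}, b1 = {b1:.5f}")
print(f"SSE = {sse:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
```

Run as written, this should agree with the slide's intercept, SSE, and s to the precision shown.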

Page 8

[Figure: Excel regression output with callouts identifying s, the Residual (error) row, SSE, and SSE/(n-2) = s²]

Page 9

Checking the Assumptions

• Many times it is just assumed that the assumptions hold.

• We now show how to check the assumptions.

Page 10

Residuals

• The assumptions for ε can be checked using RESIDUAL ANALYSIS.

• A residual, ei, is the observation of ε at an observed value of x, xi.

• For example in the Dollar Only example: y1 = 101,000 when x1 = 1200

$$\hat{y}_1 = 46486.49 + 52.56757(1200) = 109{,}567.57$$

$$e_1 = 101{,}000 - 109{,}567.57 = -8{,}567.57$$

Page 11

Standardized Residuals

• Is a residual of -8,567.57 large?

– It depends on the size of a standard error, s.

• Standardized residual = ei/(standard error of ei for xi).

• Standardized residuals are easier to use to test the assumptions.

• Two typical ways of standardizing ei for a particular xi value are:

$$\frac{e_i}{s} \qquad \text{and} \qquad \frac{e_i}{s\sqrt{1-h_i}}, \quad \text{where } h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$$

• Both approaches yield substantially the same results.
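
To see how close the two standardizations are in practice, the sketch below applies both to the Dollar Only data from the hand calculation of SSE. The names `z_simple`, `z_leverage`, and `h` are illustrative assumptions, not the slides' notation.

```python
import math

# Dollar Only data (x_i, y_i) from the hand calculation of SSE
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - y_hat_i and the estimate s of sigma
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))

# Method 1: divide every residual by s
z_simple = [ei / s for ei in e]

# Method 2: adjust for h_i, which grows as x_i moves away from x_bar
h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
z_leverage = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]

for i, (a, b) in enumerate(zip(z_simple, z_leverage), start=1):
    print(f"{i:2d}  e/s = {a:7.3f}   e/(s*sqrt(1-h)) = {b:7.3f}")
```

Because 1 - h_i is always below 1, the second method never gives a smaller magnitude than the first; the gap is largest for x values far from the mean.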

Page 12

Standardized Residuals in Excel

• Excel uses the following formula:

$$\text{Standardized residual} = \frac{e_i}{s\sqrt{\dfrac{n-2}{n-1}}}$$

This still gives approximately the same values as the other methods. We will use the ones generated by Excel to check the assumptions.
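
A minimal sketch of the formula this slide attributes to Excel, applied to the observed and fitted values from the hand calculation of SSE. The function name `excel_standard_residuals` is an assumption for illustration. Since s² = SSE/(n-2), the denominator simplifies to the square root of SSE/(n-1).

```python
import math


def excel_standard_residuals(residuals):
    """Divide each residual by s * sqrt((n - 2) / (n - 1))."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - 2))
    scale = s * math.sqrt((n - 2) / (n - 1))  # equals sqrt(SSE / (n - 1))
    return [e / scale for e in residuals]


# Observed y_i and fitted values from the hand calculation of SSE
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]
y_hat = [109567.57, 88540.54, 99054.05, 114824.32, 83283.78,
         88540.54, 99054.05, 78027.03, 93797.30, 104310.81]

residuals = [yi - fi for yi, fi in zip(y, y_hat)]
z = excel_standard_residuals(residuals)
print([round(v, 3) for v in z])
```

The scale factor is slightly smaller than s, so these values come out a little larger in magnitude than e_i/s.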

Page 13

Checking to See if Errors (Residuals) Appear to Come From a Normal Distribution

TWO WAYS TO CHECK

• Construct a plot of standardized residuals and see if they look normal

– Could use Histogram from Data Analysis

– A “quick check”: standardized residuals are like z-values. Check to see if about 68% are between ±1, 95% between ±2, and virtually all between ±3.

• Look at a normal probability plot. These are statistical plots to check for “normality”. A “perfect” normal distribution would be a straight line on such a plot.
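
Both checks can be sketched in a few lines. The helper names, the plotting position (i - 0.5)/n, and the sample values below are assumptions made for illustration; plotting the returned pairs should give roughly a straight line when the values are close to normal.

```python
from statistics import NormalDist


def fraction_within(z_values, k):
    """Share of standardized residuals with absolute value at most k."""
    return sum(1 for z in z_values if abs(z) <= k) / len(z_values)


def normal_plot_points(z_values):
    """Pair each sorted value with the standard normal quantile for its rank."""
    n = len(z_values)
    std_normal = NormalDist()
    # (i - 0.5) / n is one common plotting position (an assumption here)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(z_values)))


# Hypothetical standardized residuals, for illustration only
z = [-1.6, -0.9, -0.5, -0.2, 0.0, 0.1, 0.4, 0.7, 1.1, 2.1]

for k in (1, 2, 3):
    print(f"within ±{k}: {fraction_within(z, k):.0%}")

points = normal_plot_points(z)
for q, value in points:
    print(f"{q:7.3f}  {value:5.2f}")
```

With only a handful of observations the percentages can only move in coarse steps, so they should be read as a rough guide rather than a test.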

Page 14

Checking to See if σ Is Constant

• Look at the residual plot to see if the points seem more spread out at some x’s than at others. In the Dollar Only example, it did not appear so on the Excel residual plot.

• Constant σ is called homoscedasticity!

• If the points had looked like the next page, where there is less variation at lower values of x than at higher values, the constant variation assumption would have been violated. This is called heteroscedasticity!
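
The slides judge constant spread by eye. As a rough numerical companion (an informal heuristic introduced here, not a method from the slides), one can compare the typical residual size in the upper half of the x values with that in the lower half. The function name and both data sets are hypothetical.

```python
import math


def spread_ratio(x, residuals):
    """RMS residual in the upper half of x divided by that in the lower half."""

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    pairs = sorted(zip(x, residuals))  # order the residuals by x
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]
    return rms(upper) / rms(lower)


# Hypothetical residuals, for illustration only
x = list(range(1, 21))
signs = [1 if i % 2 == 0 else -1 for i in range(len(x))]
fan_shaped = [sgn * 0.5 * xi for sgn, xi in zip(signs, x)]  # spread grows with x
even_spread = [sgn * 3.0 for sgn in signs]                  # same spread everywhere

print(f"fan-shaped:  {spread_ratio(x, fan_shaped):.2f}")
print(f"even spread: {spread_ratio(x, even_spread):.2f}")
```

A ratio near 1 is consistent with constant spread, while a ratio well above or below 1 suggests the spread changes with x. The residual plot remains the primary check.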

Page 15

[Figure: residuals e plotted against x. Title: Heteroscedasticity (Nonconstant Variance)]

Page 16

Checking Independence

• This is mainly for time series data (i.e. the x-axis is time) used in forecasting.

• If the data look like the plot on the next slide, the errors are not independent.

– In this case whether you have a positive or negative error (residual) depends on the x-value.

– This is called autocorrelation.
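
The slides check independence visually. A widely used numerical companion, which the slides do not mention, is the Durbin-Watson statistic computed on residuals in time order. Values near 2 suggest little lag-1 autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below uses hypothetical residual sequences.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares.

    The residuals must be listed in time order.
    """
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals in time order, for illustration only
long_runs = [1, 1, 1, -1, -1, -1]   # same sign persists: positive autocorrelation
alternating = [1, -1, 1, -1]        # sign flips every step: negative autocorrelation

print(f"long runs:   {durbin_watson(long_runs):.3f}")
print(f"alternating: {durbin_watson(alternating):.3f}")
```

Long runs of same-signed residuals, as in the autocorrelation picture the slide describes, push the statistic well below 2.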

Page 17

[Figure: Y plotted against X = time. Title: Example of Autocorrelation (Errors Are Dependent on x)]

Page 18

Residual Analysis in Excel

CHECK:

• Residuals

• Standardized Residuals

• Residual Plots

• Normal Probability Plots

Page 19

Standardized Residuals

• 70% are between ±1

• 100% are between ±2

• “Close” to expected normal values

• Residual values appear to average out to 0 everywhere.

• There is no discernible pattern for the errors.

Page 20

Normal Probability Plot

• The following is the normal probability plot generated by Excel. Again Excel does it “slightly wrong”, but it should give us a good idea.

• Looks close to a straight line – normality assumption appears valid.

[Figure: Normal Probability Plot. x-axis: Sample Percentile (0 to 100); y-axis: Sales (0 to 150,000)]

Page 21

Review

• 4 assumptions about ε

1. ε is normal.

2. μ = E(ε) = 0.

3. σ is the same for all values of x.

4. Errors are independent.

• Checking the Assumptions

– Check residual plot to see if variation changes for different values of x.

– Check normality assumption by a normal probability plot or by creating a histogram of standardized residuals.

   • Does it appear normal and centered around 0?

   • Are about 68% between ±1, 95% between ±2, almost all between ±3?

