TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL

Date post: 06-Feb-2016
Upload: shawna
Page 1: TESTING THE STRENGTH  OF THE MULTIPLE REGRESSION MODEL

TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL

Page 2

Test 1: Are Any of the x’s Useful in Predicting y?

We are asking: Can we conclude that at least one of the β’s (other than β0) ≠ 0?

H0: β1 = β2 = β3 = β4 = 0

HA: At least one of these β’s ≠ 0

α = .05

Page 3

Idea of the Test

• Measure the overall “average variability” due to changes in the x’s

• Measure the overall “average variability” that is due to randomness (error)

• If the overall “average variability” due to changes in the x’s IS A LOT LARGER than the “average variability” due to error, we conclude that at least one β is non-zero, i.e. at least one factor (x) is useful in predicting y

Page 4

“Total Variability”

• Just like with simple linear regression, we have the total sum of squares due to regression, SSR, and the total sum of squares due to error, SSE, which are printed on the EXCEL output.

– The formulas are more complicated (they involve matrix operations)

Page 5

“Average Variability”

• “Average variability” (Mean variability) for a group is defined as the Total Variability divided by the degrees of freedom associated with that group:

• Mean Squares Due to Regression: MSR = SSR/DFR

• Mean Squares Due to Error: MSE = SSE/DFE

Page 6

Degrees of Freedom

• Total number of degrees of freedom DF(Total) always = n-1

• Degrees of freedom for regression (DFR) = the number of factors in the regression (i.e. the number of x’s in the linear regression)

• Degrees of freedom for error (DFE) = difference between the two = DF(Total) - DFR
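The three degrees-of-freedom rules above can be sketched in a few lines of Python. The function name is hypothetical, and the sample size n = 38 is inferred from the DFR = 3 and DFE = 34 used in the F-test example later in the deck, not stated directly on the slides.

```python
def degrees_of_freedom(n, num_x):
    """Return (DF total, DFR, DFE) for n observations and num_x predictors."""
    df_total = n - 1          # total degrees of freedom is always n - 1
    dfr = num_x               # one degree of freedom per x in the regression
    dfe = df_total - dfr      # whatever is left over belongs to error
    return df_total, dfr, dfe


# Assumed illustration: 38 observations, 3 x's
df_total, dfr, dfe = degrees_of_freedom(38, 3)
print(df_total, dfr, dfe)  # 37 3 34
```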

Page 7

The F-Statistic

• The F-statistic is defined as the ratio of two measures of variability. Here,

F = MSR/MSE

• Recall we are saying if MSR is “large” compared to MSE, at least one β ≠ 0.

• Thus if F is “large”, we conclude that HA is true, i.e. at least one β ≠ 0.
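As a minimal sketch of how the F-statistic is assembled from the sums of squares and degrees of freedom, the Python below computes MSR, MSE, and F. The function name and the SSR and SSE values are made up for illustration; they are not taken from the Excel output in this deck.

```python
def f_statistic(ssr, sse, dfr, dfe):
    """Return (MSR, MSE, F) where F = MSR / MSE."""
    msr = ssr / dfr   # mean squares due to regression
    mse = sse / dfe   # mean squares due to error
    return msr, mse, msr / mse


# Hypothetical sums of squares, with DFR = 3 and DFE = 34
msr, mse, f_stat = f_statistic(ssr=600.0, sse=680.0, dfr=3, dfe=34)
print(msr, mse, f_stat)  # 200.0 20.0 10.0
```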

Page 8

The F-test

• “Large” compared to what?

• F-tables give critical values for given values of α

• TEST: REJECT H0 (Accept HA) if:

F = MSR/MSE > Fα,DFR,DFE

Page 9

RESULTS

• If we do not get a large F statistic
– We cannot conclude that any of the variables in this model are significant in predicting y.

• If we do get a large F statistic
– We can conclude at least one of the variables is significant for predicting y.
– NATURAL QUESTION -- WHICH ONES?

Page 10

DFR = # of x’s
DFE = Total DF - DFR
Total DF = n - 1

SSR
SSE
Total SS = Σ(yi - ȳ)²

Page 11

MSR = SSR/DFR
MSE = SSE/DFE

F = MSR/MSE

P-value for the F test

Page 12

Results

• We see that the F statistic is 20.89762

• This would be compared to F.05,3,34
– From the F.05 Table, the value of F.05,3,34 is not given.
– But F.05,3,30 = 2.92 and F.05,3,40 = 2.84.
– And 20.89762 > either of these numbers.
– The actual value of F.05,3,34 can be calculated by Excel by FINV(.05,3,34) = 2.882601

• USE SIGNIFICANCE F
– This is the p-value for the F-Test
– Significance F = 7.46 x 10^-8 = .0000000746 < .05
– Can conclude that at least one x is useful in predicting y
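The two decision routes on this slide (critical value and p-value) can be checked with a short Python sketch using the numbers reported above. The straight-line interpolation between the two table entries is an added assumption, offered only as a rough guide to where F.05,3,34 falls. The value Excel reports is the 2.882601 quoted on the slide.

```python
# Values reported on the slide
F_STAT = 20.89762
F_CRIT = 2.882601          # FINV(.05,3,34) as reported
SIGNIFICANCE_F = 7.46e-8   # p-value for the F-test
ALPHA = 0.05


def interpolate(x, x0, y0, x1, y1):
    """Straight-line interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# Rough table-based guess for F.05,3,34 from F.05,3,30 and F.05,3,40
# (assumption: linear interpolation in DFE is adequate as a rough guide)
f_crit_approx = interpolate(34, 30, 2.92, 40, 2.84)

reject_by_critical_value = F_STAT > F_CRIT
reject_by_p_value = SIGNIFICANCE_F < ALPHA

print(round(f_crit_approx, 3))                      # 2.888
print(reject_by_critical_value, reject_by_p_value)  # True True
```

Both routes lead to the same conclusion, which is why reading Significance F directly is enough in practice.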

Page 13

Test 2: Which Variables Are Significant IN THIS MODEL?

• The question we are asking is, “Taking all the other factors (x’s) into consideration, does a change in a particular x value (x3, say) significantly affect y?”

• This is another hypothesis test (a t-test).

• To test if the age of the house is significant:

H0: β3 = 0 (x3 is not significant in this model)

HA: β3 ≠ 0 (x3 is significant in this model)

Page 14

The t-test for a particular factor IN THIS MODEL

• Reject H0 (Accept HA) if:

t = (β̂3 - 0)/s(β̂3) > t.025,DFE or t < -t.025,DFE
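A minimal Python sketch of this two-sided t-test follows. The coefficient estimate and its standard error are hypothetical, since the slides do not list them. The critical value 2.032 is an approximate standard-table value for t.025,34, supplied here as an assumption rather than read from the deck.

```python
def t_statistic(coef, std_err, hypothesized=0.0):
    """t = (estimated coefficient - hypothesized value) / standard error."""
    return (coef - hypothesized) / std_err


def reject_two_sided(t, t_crit):
    """Reject H0 when t falls in either tail beyond the critical value."""
    return t > t_crit or t < -t_crit


# Hypothetical estimate and standard error for the coefficient of x3
t = t_statistic(coef=-1.5, std_err=0.5)

# Approximate t.025,34 from a standard t-table (assumption)
T_CRIT = 2.032

print(t, reject_two_sided(t, T_CRIT))  # -3.0 True
```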

Page 15

t-value for test of β3 = 0

p-value for test of β3 = 0

Page 16

Reading Printout for the t-test

• Simply look at the p-value
– p-value for β3 = 0 is .02194 < .05
– Thus the age of the house is significant in this model

• The other variables
– p-value for β1 = 0 is .0000839 < .05
– Thus square feet is significant in this model
– p-value for β2 = 0 is .15503 > .05
– Thus the land (acres) is not significant in this model
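The same reading of the printout can be written as a short Python check. The p-values are the ones quoted on this slide; the variable names and the dictionary layout are just one convenient way to hold them.

```python
ALPHA = 0.05

# p-values for the individual t-tests, as quoted on the slide
p_values = {
    "square feet": 0.0000839,   # test of β1 = 0
    "land (acres)": 0.15503,    # test of β2 = 0
    "age": 0.02194,             # test of β3 = 0
}

# A variable is significant in this model when its p-value is below alpha
significant = [name for name, p in p_values.items() if p < ALPHA]
not_significant = [name for name, p in p_values.items() if p >= ALPHA]

print("Significant in this model:", significant)
print("Not significant in this model:", not_significant)
```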

Page 17

Does A Poor t-value Imply the Variable is not Useful in Predicting y?

• NO

• It says the variable is not significant IN THIS MODEL when we consider all the other factors.

• In this model – land is not significant when included with square footage and age.

• But if we had run this model without square footage, we would have gotten the output on the next slide.

Page 18

p-value for land is .00000717. In this model, land is significant.

Page 19

Can it even happen that F says at least one variable is significant, but none of the t’s indicate a useful variable?

• YES

EXAMPLES IN WHICH THIS MIGHT HAPPEN:
– Miles per gallon vs. horsepower and engine size
– Salary vs. GPA and GPA in major
– Income vs. age and experience
– HOUSE PRICE vs. SQUARE FOOTAGE OF HOUSE AND LAND

• There is a relation between the x’s
– Multicollinearity
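One simple way to look for a relation between two x’s is their sample correlation. The Python sketch below is an illustration only: the slides do not prescribe this check, and the horsepower and engine-size numbers are invented to mimic the first example above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: two predictors that move together
horsepower = [100, 150, 200, 250, 300]
engine_size = [1.6, 2.0, 2.5, 3.0, 3.6]

r = pearson_r(horsepower, engine_size)
print(round(r, 3))  # a value close to 1 suggests the x's carry overlapping information
```

A correlation near +1 or -1 between two x’s is a warning sign that their individual t-tests may both look weak even when the F-test is significant.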

Page 20

Approaches That Could Be Used When Multicollinearity Is Detected

• Eliminate some variables and run again

• Stepwise regression
– This is discussed in a future module.

Page 21

Test 3 -- What Proportion of the Overall Variability in y Is Due to Changes in the x’s?

• R2 = .442197

• Overall, 44% of the total variation in sales price is explained by changes in square footage, land, and age of the house.
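R2 is the share of the total sum of squares that is due to regression, R2 = SSR/SST with SST = SSR + SSE. The Python sketch below shows the arithmetic. The function name and the sums of squares are hypothetical and are not the values behind the .442197 reported above.

```python
def r_squared(ssr, sse):
    """Proportion of total variability explained: SSR / (SSR + SSE)."""
    sst = ssr + sse   # total sum of squares
    return ssr / sst


# Hypothetical sums of squares
r2 = r_squared(ssr=300.0, sse=700.0)
print(r2)  # 0.3
```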

Page 22

What is Adjusted R2?

• Adjusted R2 adjusts R2 to take into account degrees of freedom.

• By assuming a higher order equation for y, we can force the curve to fit this one set of data points in the model – eliminating much of the variability (See next slide).

• But this is not what is going on! R2 might be higher, but adjusted R2 might be much lower

• Adjusted R2 takes this into account

• Adjusted R2 = 1 - MSE/MST, where MST = SST/(n-1)
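As a rough sketch of how the degrees-of-freedom adjustment works, the Python below computes adjusted R2 as 1 - MSE/MST, with MSE = SSE/DFE and MST = SST/(n-1), and compares it with the unadjusted value. The function name, sums of squares, sample size, and number of x’s are all hypothetical.

```python
def adjusted_r_squared(sse, sst, n, num_x):
    """Adjusted R2 = 1 - (SSE / DFE) / (SST / (n - 1))."""
    dfe = n - 1 - num_x       # degrees of freedom for error
    mse = sse / dfe           # mean squares due to error
    mst = sst / (n - 1)       # total sum of squares per degree of freedom
    return 1 - mse / mst


# Hypothetical values: SSE = 700, SST = 1000, n = 38 observations, 3 x's
r2 = 1 - 700.0 / 1000.0
adj_r2 = adjusted_r_squared(sse=700.0, sst=1000.0, n=38, num_x=3)

print(round(r2, 4), round(adj_r2, 4))  # 0.3 0.2382
```

Adding an x always reduces DFE by one, so unless SSE drops enough to compensate, MSE rises and adjusted R2 falls even though R2 itself cannot decrease.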

Page 23

[Figure: Scatterplot of Sales vs Ad Dollars. x-axis: Ad Dollars ($0 to $1,400); y-axis: Sales ($0 to $140,000)]

This is not what is really going on

Page 24

Review

• Are any of the x’s useful in predicting y IN THIS MODEL?
– Look at the p-value for the F-test: Significance F
– F = MSR/MSE would be compared to Fα,DFR,DFE

• Which variables are significant in this model?
– Look at p-values for the individual t-tests

• What proportion of the total variance in y can be explained by changes in the x’s?
– R2
– Adjusted R2 takes into account the reduction in degrees of freedom for the error term caused by including more terms in the model

Page 25

4 Places to Look on Excel Printout

1- regression equation

2- Significance F
Are any variables useful?

3- p-values for t-tests
Which variables are significant in this model?

4- R2
What proportion of y can be explained by changes in x?

