Date post: 14-Dec-2015
Upload: kerry-mosley
Page 1: Week 5Slide #1 Adjusted R 2, Residuals, and Review Adjusted R 2 Residual Analysis Stata Regression Output revisited –The Overall Model –Analyzing Residuals.

Week 5 Slide #1

Adjusted R², Residuals, and Review

• Adjusted R²

• Residual Analysis

• Stata Regression Output revisited
– The Overall Model
– Analyzing Residuals

• Review for Exam 2

Week 5 Slide #2

Exercise Review

– Use the caschool.dta dataset

– Run a model in Stata using Average Income (avginc) to predict Average Test Scores (testscr)

– Examine the univariate distributions of both variables and the residuals

• Walk through the entire interpretation

• Build a Stata do-file as you go
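
The steps above could be collected into a do-file along the following lines. This is a minimal sketch, not the course's own do-file: it assumes caschool.dta is in the working directory, and the names yhat, e, and the graph names are chosen here for illustration.

```stata
* Sketch of a do-file for the exercise (assumes caschool.dta is in the working directory)
clear all
use caschool.dta

* Univariate distributions of the outcome and the predictor
summarize testscr avginc, detail
histogram testscr, name(h_testscr, replace)
histogram avginc, name(h_avginc, replace)

* Bivariate regression of test scores on average income
regress testscr avginc

* Save fitted values and residuals (variable names are illustrative)
predict yhat, xb
predict e, residuals

* Univariate distribution of the residuals
summarize e, detail
histogram e, name(h_e, replace)
```

Running the file top to bottom after each addition keeps the analysis reproducible as the interpretation proceeds.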

Week 5 Slide #3

Exercise Review, continued

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  1,   418) =  430.83
       Model |   77204.394     1   77204.394           Prob > F      =  0.0000
    Residual |  74905.1997   418  179.199042           R-squared     =  0.5076
-------------+------------------------------           Adj R-squared =  0.5064
       Total |  152109.594   419  363.030056           Root MSE      =  13.387

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |    1.87855   .0905044    20.76   0.000     1.700649     2.05645
       _cons |   625.3836   1.532405   408.11   0.000     622.3714    628.3958
------------------------------------------------------------------------------

Week 5 Slide #4

Exercise Review, Continued

[Figure: testscr (vertical axis, 600 to 750) plotted against avginc (horizontal axis, 0 to 60). Legend: testscr, 95% CI, Fitted values]
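
A plot with this legend can be produced roughly as follows. This is a sketch that assumes the figure overlays a scatter on a linear fit with its confidence band; the slide does not show the command that was actually used.

```stata
* Scatter of test scores on income with the fitted line and its 95% CI
* (assumes caschool.dta is in the working directory)
use caschool.dta, clear
twoway (lfitci testscr avginc) (scatter testscr avginc)
```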

Week 5 Slide #5

Adjusted R²: An Alternative “Goodness of Fit” Measure

• Recall that R² is calculated as:

$$R^2 = \frac{ESS}{TSS}, \quad \text{where } ESS = \sum_i (\hat{Y}_i - \bar{Y})^2 \text{ and } TSS = \sum_i (Y_i - \bar{Y})^2$$

ESS is the “explained sum of squares”; TSS is the “total sum of squares”

• Hypothetically, as K approaches n, R² approaches one (why?) – “degrees of freedom”

• Adjusted R² compensates for that tendency
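
One way to see the tendency of R² to rise with K is a small simulation. The sketch below uses made-up random data (not caschool.dta), and the seed, sample size, and variable names are all assumptions chosen for illustration. The predictors are pure noise, so any fit comes only from using up degrees of freedom.

```stata
* Simulated illustration (random noise, not the caschool data)
clear
set seed 12345
set obs 20                      // n = 20 observations
generate y = rnormal()          // outcome is pure noise
forvalues j = 1/18 {
    generate x`j' = rnormal()   // 18 predictors unrelated to y
}

* K = 2 parameters (one predictor plus the constant)
quietly regress y x1
display "K =  2: R2 = " %6.4f e(r2) "   adj R2 = " %7.4f e(r2_a)

* K = 19 parameters, one short of n = 20
quietly regress y x1-x18
display "K = 19: R2 = " %6.4f e(r2) "   adj R2 = " %7.4f e(r2_a)
```

With K = 19 and n = 20 only one residual degree of freedom remains, so R² should come out far higher than with K = 2 even though nothing is being explained. At K = n the model reproduces every observation exactly, so ESS = TSS and R² = 1.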

Week 5 Slide #6

Calculating Adjusted R²

$$R_a^2 = R^2 - \frac{K-1}{n-K}\,(1 - R^2)$$

• The bigger the sample size (n), the smaller the adjustment
• The more complex the model (the bigger K is), the larger the adjustment
• The bigger R² is, the smaller the adjustment
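
As a worked check, the formula can be applied to the testscr regression output from the exercise, where n = 420 and R² = 0.5076. The calculation assumes K counts every estimated parameter including the constant, so K = 2 here:

```latex
$$R_a^2 = 0.5076 - \frac{2-1}{420-2}\,(1 - 0.5076)
        = 0.5076 - \frac{0.4924}{418}
        \approx 0.5076 - 0.0012
        = 0.5064$$
```

This matches the Adj R-squared of 0.5064 that Stata reports. With 420 observations and only two parameters, the adjustment is tiny.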

Week 5 Slide #7

Residual Analysis: Trouble Shooting

• Conceptual use of residuals
– e, or what the model can’t explain

• Visual Diagnostics
– Ideal: a “Sneeze plot”
– Diagnostics using Residual Plots:
  • Checking for heteroscedasticity
  • Checking for non-linearity
  • Checking for outliers

• Saving and Analyzing Residuals in Stata

Week 5 Slide #8

Review: Assumptions Necessary for Estimating Linear Models

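1. Errors have identical distributions
   Zero mean, same variance, across the range of X

2. Errors are independent of X and other $\varepsilon_i$
   $E[\varepsilon_i] \neq f(X)$ and $E[\varepsilon_i] \neq f(\varepsilon_j),\ j \neq i$

3. Errors are normally distributed

[Figure: diagram with X on the horizontal axis and a reference line at $\varepsilon_i = 0$]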

Week 5 Slide #9

The Ideal: Sneeze Splatter

[Figure: residuals (e) plotted against Predicted Y]

Problems: It is possible to “over-interpret” residual plots; it is also possible to miss patterns when there are large numbers of observations

Week 5 Slide #10

Heteroscedasticity

[Figure: residuals (e) plotted against Predicted Y]

Problem: Error variance is not constant, so the usual standard errors are biased and hypothesis tests are invalid

Week 5 Slide #11

Non-Linearity

[Figure: residuals (e) plotted against Predicted Y]

Problem: Biased estimated coefficients, inefficient model

Week 5 Slide #12

Checking for Outliers

[Figure: residuals (e) plotted against Predicted Y, annotated “Residuals for model using all data,” “Possible Outliers,” and “Residuals for model with outliers deleted”]

Problem: Under-specified model; measurement error

Week 5 Slide #13

Stata Regression Model: Regressing “testscr” onto “avginc”

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  1,   418) =  430.83
       Model |   77204.394     1   77204.394           Prob > F      =  0.0000
    Residual |  74905.1997   418  179.199042           R-squared     =  0.5076
-------------+------------------------------           Adj R-squared =  0.5064
       Total |  152109.594   419  363.030056           Root MSE      =  13.387

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |    1.87855   .0905044    20.76   0.000     1.700649     2.05645
       _cons |   625.3836   1.532405   408.11   0.000     622.3714    628.3958
------------------------------------------------------------------------------
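
Several of the summary statistics in this output can be recomputed from the other entries, which is a useful check on how the pieces fit together. The labels below are the column and row names from the output itself. Model SS plays the role of ESS and Total SS the role of TSS:

```latex
$$
\begin{aligned}
R^2 &= \frac{\text{Model SS}}{\text{Total SS}} = \frac{77204.394}{152109.594} \approx 0.5076 \\
F(1, 418) &= \frac{\text{Model MS}}{\text{Residual MS}} = \frac{77204.394}{179.199042} \approx 430.83 \\
\text{Root MSE} &= \sqrt{\text{Residual MS}} = \sqrt{179.199042} \approx 13.387 \\
t_{\text{avginc}} &= \frac{\text{Coef.}}{\text{Std. Err.}} = \frac{1.87855}{0.0905044} \approx 20.76
\end{aligned}
$$
```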

Week 5 Slide #14

Regression Plot (again)

[Figure: testscr (vertical axis, 600 to 750) plotted against avginc (horizontal axis, 0 to 60). Legend: testscr, 95% CI, Fitted values]

Week 5 Slide #15

Residual Plot

[Figure: density of the residuals. Vertical axis: Density (0 to .03); horizontal axis: Residual (-40 to 40)]
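
A density plot of the residuals, with a normal curve for comparison, might be drawn as follows. This is a sketch: the slide does not say which command produced the figure, and the residual variable name e is an assumption carried over from the residual listing elsewhere in the deck.

```stata
* Residual distribution compared with a normal curve
* (assumes caschool.dta is in the working directory)
use caschool.dta, clear
quietly regress testscr avginc
predict e, residuals

kdensity e, normal      // kernel density with a normal density overlaid
qnorm e                 // quantile plot of e against the normal distribution
```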

Week 5 Slide #16

Examination of Residuals

gsort -e (sorts from the largest residual down, as in the listing below; plain “gsort e” sorts from the most negative residual up)
list observat testscr avginc yhat e in 1/5

. list observat testscr avginc yhat e in 1/5

     +---------------------------------------------------+
     | observat   testscr   avginc       yhat          e |
     |---------------------------------------------------|
  1. |      393     683.4   13.567   650.8699   32.53016 |
  2. |      386     681.6   14.177   652.0157    29.5842 |
  3. |      419     672.2    9.952   644.0789   28.12111 |
  4. |      366     675.7   11.834   647.6143   28.08568 |
  5. |      371    676.95   12.934   649.6807   27.26921 |
     +---------------------------------------------------+

Use the case ID number to find the relevant observation in the data set

Week 5 Slide #17

Residuals v. Predicted Values

Using an “ocular test,” non-linearity seems probable, but heteroscedasticity is not obvious here. But should we trust our eyeballs?

[Figure: Residuals (vertical axis, -40 to 40) plotted against Fitted values (horizontal axis, 640 to 740)]
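
A residual-versus-fitted plot of this kind can be drawn directly after the regression. The sketch below also adds a lowess smooth as one hedged aid to the “ocular test”; the smooth is an addition for illustration, not something shown on the slide, and the names yhat and e are assumptions.

```stata
* Residuals against fitted values (assumes caschool.dta is in the working directory)
use caschool.dta, clear
quietly regress testscr avginc

* Built-in postestimation plot with a reference line at zero
rvfplot, yline(0)

* Same plot by hand, with a lowess smooth to make curvature easier to see
predict yhat, xb
predict e, residuals
twoway (scatter e yhat) (lowess e yhat), yline(0)
```

If the smooth bends away from the zero line over part of the range, that points toward non-linearity. A formal test is still needed before drawing conclusions.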

Week 5 Slide #18

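Formal Test for Non-linearity: Omitted Variables

Tests whether adding 2nd, 3rd and 4th powers of X will improve the fit of the model:

$$Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + b_4 X^4 + e$$

(Stata’s ovtest adds powers of the fitted values; with a single predictor this is equivalent to adding powers of X, because the fitted values are a linear function of X.)

. ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 415) =     17.75
                  Prob > F =      0.0000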
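
In current Stata the same test is run as a postestimation command. The sketch below assumes caschool.dta is in the working directory and shows both the default form and the variant based on powers of the right-hand-side variables.

```stata
* Ramsey RESET test after the bivariate regression
use caschool.dta, clear
quietly regress testscr avginc

estat ovtest          // powers of the fitted values (the default)
estat ovtest, rhs     // powers of the right-hand-side variables instead
```

The denominator degrees of freedom in the output above follow from the sample size: 420 observations, minus 2 original parameters, minus 3 added power terms, gives 415.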

Week 5 Slide #19

Formal Tests for Heteroscedasticity

Tests to see whether the squared standardized residuals are linearly related to the predicted value of Y:

$$\text{std}(e^2) = b_0 + b_1\,(\text{Predicted } Y)$$
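
The slide does not name a Stata command for this test. A likely candidate, offered here as an assumption, is estat hettest, the Breusch-Pagan / Cook-Weisberg test, which by default uses the fitted values of the dependent variable.

```stata
* Test for heteroscedasticity after the bivariate regression
* (assumes caschool.dta is in the working directory)
use caschool.dta, clear
quietly regress testscr avginc

estat hettest         // Ho: constant variance; uses fitted values of testscr
```

A small p-value leads to rejecting the null hypothesis of constant variance.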

Week 5 Slide #20

Case-wise Influence Analysis: The Leverage versus Squared Residual Plot

[Figure: Leverage (vertical axis, 0 to .08) plotted against Normalized residual squared (horizontal axis, 0 to .02); points are labeled with observation numbers]
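
A plot of this type, with points labeled by case ID, can be produced after the regression as sketched below. The sketch assumes the ID variable is observat, as in the residual listing elsewhere in the deck; the names lev and rstu are chosen here for illustration.

```stata
* Leverage versus normalized squared residuals, labeled by case ID
* (assumes caschool.dta is in the working directory)
use caschool.dta, clear
quietly regress testscr avginc
lvr2plot, mlabel(observat)

* Save leverage and studentized residuals to inspect individual cases
predict lev, leverage
predict rstu, rstudent
gsort -lev
list observat avginc lev rstu in 1/5
```

Cases that sit high on both axes deserve the closest look, since they combine unusual X values with poor fit.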

Week 5 Slide #21

What to Do?

• Nonlinearity
– Polynomial regression: try X and X²
– Variable transformation: logged variables
– Use non-OLS regression (curve fitting)

• Heteroscedasticity
– Re-specify model
  • Omitted variables?
  • Use non-OLS regression (WLS)
  • Use robust standard errors

• Influential and Deviant Cases
– Evaluate the cases
– Run with controls (multivariate model)
– Omit cases (last option)
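
Some of these remedies can be tried on the exercise model as sketched below. This is a hedged illustration rather than a recommended specification: it assumes caschool.dta is in the working directory, and the variable names avginc2 and ln_avginc are made up here.

```stata
* Possible remedies applied to the testscr model
use caschool.dta, clear

* Nonlinearity, option 1: polynomial regression with X and X^2
generate avginc2 = avginc^2
regress testscr avginc avginc2

* Nonlinearity, option 2: log-transform the predictor
generate ln_avginc = ln(avginc)
regress testscr ln_avginc

* Heteroscedasticity: same OLS coefficients, robust standard errors
regress testscr avginc, vce(robust)
```

Rerunning the residual plots and formal tests after each change shows whether the remedy addressed the problem.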

Week 5 Slide #22

Next Week

• Review regression diagnostics

• Introduction to Matrix Algebra

• Review for Exam
