
SADC Course in Statistics Choosing the best model (Session 08)

Date post: 28-Mar-2015
Upload: gabriel-roberts
Page 1: SADC Course in Statistics Choosing the best model (Session 08)

SADC Course in Statistics

Choosing the “best” model

(Session 08)

Page 2: SADC Course in Statistics Choosing the best model (Session 08)

2To put your footer here go to View > Header and Footer

Learning Objectives

At the end of this session, you will be able to

• use a simple descriptive approach to select the most appropriate subset of explanatory variables

• apply methods of variable selection (based on statistical tests) in a meaningful way to get the “best” model

• appreciate the effect on t-probabilities when x’s are added or dropped from a model

• understand dangers of using automatic selection procedures

Page 3: SADC Course in Statistics Choosing the best model (Session 08)

Example of choosing “best” set of x’s

Consider data (fictitious) from a retrospective study of patients surviving less than 4 months after being diagnosed as having acute leukaemia.

Objective: To identify factors affecting survival time.

Variables were:

y = survival time (days) after diagnosis
x1 = no. of chemotherapy sessions
x2 = total volume of blood transfused
x3 = no. of days of hospital care
x4 = age of patient (years)

Page 4: SADC Course in Statistics Choosing the best model (Session 08)

Start with a matrix plot

Page 5: SADC Course in Statistics Choosing the best model (Session 08)

Summary statistics for all regressions

How many possible regression models exist?

Example with x1 and x3 to show summaries:

---------+---------------------------------------
  Source |        SS    df        MS     F  Prob>F
---------+---------------------------------------
   Model |  1488.691     2   744.346  6.07  0.0188
Residual |  1227.072    10   122.707
---------+---------------------------------------
   Total |  2715.763    12   226.314
---------+---------------------------------------

No. of parameters fitted (p) = 3

R2p = 1488.69 / 2715.76 = 0.5482

Adjusted R2p = 1 – 122.71 / 226.31 = 0.4578
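
As a check on the arithmetic above, the two summaries can be recomputed from the ANOVA table. This is a minimal sketch: the sums of squares are copied from the table, while n = 13 is an assumption inferred from the total df of 12, and the variable names are illustrative only.

```python
# Sums of squares copied from the ANOVA table for the model with x1 and x3
ss_model, ss_resid, ss_total = 1488.691, 1227.072, 2715.763

n = 13  # assumption: number of patients, inferred from total df = 12
p = 3   # parameters fitted: constant, x1 and x3

# R2 is the share of the total sum of squares explained by the model
r2 = ss_model / ss_total

# Adjusted R2 compares mean squares, so it penalises extra parameters
ms_resid = ss_resid / (n - p)
ms_total = ss_total / (n - 1)
adj_r2 = 1 - ms_resid / ms_total

print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```

Because adjusted R2 is 1 minus the residual mean square divided by a fixed total mean square, ranking models by adjusted R2 and ranking them by residual mean square give the same order.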

Page 6: SADC Course in Statistics Choosing the best model (Session 08)

Descriptive approach (all regressions)

p = no. of parameters fitted, including the constant (so p = no. of x’s + 1)

No. of x's   p   Terms in model    R2      Adj. R2   Res. M.S.
    0        1   None              0       0         226.3
    1        2   x1                0.534   0.492     115.1
    1        2   x2                0.666   0.636      82.4
    1        2   x3                0.286   0.221     176.3
    1        2   x4                0.675   0.645      80.4
    2        3   x1, x2            0.979   0.974       5.8
    2        3   x1, x3            0.548   0.458     122.7
    2        3   x1, x4            0.972   0.967       7.5
    2        3   x2, x3            0.847   0.816      41.5
    2        3   x2, x4            0.680   0.616      86.9
    2        3   x3, x4            0.935   0.922      17.6
    3        4   x1, x2, x3        0.982   0.976       5.4
    3        4   x1, x2, x4        0.982   0.976       5.3
    3        4   x1, x3, x4        0.981   0.975       5.7
    3        4   x2, x3, x4        0.973   0.964       8.2
    4        5   x1, x2, x3, x4    0.982   0.974       6.0
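
The descriptive comparison can be sketched in a few lines of Python. The residual mean squares below are copied from the table; the dictionary layout and variable names are illustrative assumptions, not part of the course material. The sketch counts the possible models (every subset of the four x's, including the empty one) and picks the model with the smallest residual mean square for each model size.

```python
from itertools import combinations

# Residual mean squares copied from the all-regressions table
res_ms = {
    (): 226.3,
    ("x1",): 115.1, ("x2",): 82.4, ("x3",): 176.3, ("x4",): 80.4,
    ("x1", "x2"): 5.8, ("x1", "x3"): 122.7, ("x1", "x4"): 7.5,
    ("x2", "x3"): 41.5, ("x2", "x4"): 86.9, ("x3", "x4"): 17.6,
    ("x1", "x2", "x3"): 5.4, ("x1", "x2", "x4"): 5.3,
    ("x1", "x3", "x4"): 5.7, ("x2", "x3", "x4"): 8.2,
    ("x1", "x2", "x3", "x4"): 6.0,
}

xs = ["x1", "x2", "x3", "x4"]

# Every subset of the x's is a candidate model: 2**4 = 16 in total
all_subsets = [c for k in range(len(xs) + 1) for c in combinations(xs, k)]
n_models = len(all_subsets)

# Best model of each size, judged by smallest residual mean square
best_by_size = {
    k: min((s for s in all_subsets if len(s) == k), key=res_ms.get)
    for k in range(1, len(xs) + 1)
}

# Best model overall on the same criterion
best_overall = min(res_ms, key=res_ms.get)

print(n_models)
for k, terms in best_by_size.items():
    print(k, terms, res_ms[terms])
print("smallest residual mean square:", best_overall)
```

The residual mean square barely changes between the best two-variable model (5.8) and the best three-variable model (5.3), which is the kind of flattening the plots on the next slides are meant to reveal.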

Page 7: SADC Course in Statistics Choosing the best model (Session 08)

A descriptive approach… continued

Plot R2 versus no. of parameters (p) in model

Which model would you select on the basis of these results?

Page 8: SADC Course in Statistics Choosing the best model (Session 08)

A descriptive approach… continued

Alternatively, plot residual mean square. Small residual mean square is good!

Which model would you select on the basis of the residual mean square?

Page 9: SADC Course in Statistics Choosing the best model (Session 08)

An inferential approach…

Use a sequential procedure to select variables that contribute most, and significantly, to the regression model.

Three popular methods exist:

• Forward selection

• Backward elimination

• Stepwise regression

Page 10: SADC Course in Statistics Choosing the best model (Session 08)

Forward selection …

Select the “best” single variable - see slide 6

Ask, “Is it contributing significantly?” Answer: Yes (see below)

-----------------------------------------
     y |    Coef.  Std. Err.       t  P>|t|
-------+---------------------------------
    x4 |  -.73816      .1546   -4.77  0.001
const. |   117.57     5.2622   22.34  0.000
-----------------------------------------

Now consider 2-variable models with x4.

Page 11: SADC Course in Statistics Choosing the best model (Session 08)

Two-variable models with x4

-----------------------------------------
     y |    Coef.   Std.Err.       t  P>|t|
-------+---------------------------------
    x4 |  -.61395     .04864  -12.62  0.000
    x1 |   1.4400     .13842   10.40  0.000
const. |   103.10     2.1240   48.54  0.000
-----------------------------------------
    x4 |  -.45694     .69595   -0.66  0.526
    x2 |   .31090     .74861    0.42  0.687
const. |   94.160     56.627    1.66  0.127
-----------------------------------------
    x4 |  -.72460     .07233  -10.02  0.000
    x3 |  -1.1999     .18902   -6.35  0.000
const. |   131.28     3.2748   40.09  0.000
-----------------------------------------

Page 12: SADC Course in Statistics Choosing the best model (Session 08)

Three-variable models with x4, x1

-----------------------------------------
     y |    Coef.   Std.Err.       t  P>|t|
-------+---------------------------------
    x4 |  -.23654     .17329   -1.37  0.205
    x1 |   1.4519     .11700   12.41  0.000
    x2 |   .41611     .18561    2.24  0.052
const. |   71.648     14.142    5.07  0.001
-----------------------------------------
    x4 |  -.64280     .04454  -14.43  0.000
    x1 |   1.0519     .22368    4.70  0.001
    x3 |  -.41004     .19923   -2.06  0.070
const. |   111.68     4.5625   24.48  0.000
-----------------------------------------

Model with x1, x2 and x4 would be selected, despite x4 now being non-significant!
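
The forward steps on the last three slides can be retraced from the residual mean squares in the all-regressions table (slide 6), because the partial F statistic for adding one variable equals the square of its t statistic. The following is only a sketch under stated assumptions: n = 13 is inferred from the total df of 12, the F-to-enter threshold of 4.0 is an illustrative choice that does not come from the slides, and all function and variable names are hypothetical. Because the table values are rounded, the F values agree with the squared t values only approximately.

```python
# Residual mean squares copied from the all-regressions table (slide 6).
# Keys name the x's in the model: "14" means x1 and x4, "" is the null model.
RES_MS = {
    "": 226.3, "1": 115.1, "2": 82.4, "3": 176.3, "4": 80.4,
    "12": 5.8, "13": 122.7, "14": 7.5, "23": 41.5, "24": 86.9, "34": 17.6,
    "123": 5.4, "124": 5.3, "134": 5.7, "234": 8.2, "1234": 6.0,
}
N_OBS = 13        # assumption: inferred from the total df of 12
F_TO_ENTER = 4.0  # assumption: illustrative threshold, not from the slides


def key(model):
    """Turn a set such as {"4", "1"} into the table key "14"."""
    return "".join(sorted(model))


def rss(model):
    """Residual sum of squares = residual mean square x residual df."""
    n_params = len(model) + 1  # the x's plus the constant
    return RES_MS[key(model)] * (N_OBS - n_params)


def forward_selection():
    """Add the x with the largest partial F until none reaches F_TO_ENTER."""
    model, order = set(), []
    while len(model) < 4:
        # Partial F for adding each remaining x to the current model
        f_stats = {
            x: (rss(model) - rss(model | {x})) / RES_MS[key(model | {x})]
            for x in set("1234") - model
        }
        best = max(f_stats, key=f_stats.get)
        if f_stats[best] < F_TO_ENTER:
            break
        model.add(best)
        order.append("x" + best)
    return order


print(forward_selection())
```

With these assumptions the variables enter in the order x4, x1, x2, matching the slides. A stricter threshold would stop after x4 and x1, since the t-probability for x2 at the third step is 0.052.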

Page 13: SADC Course in Statistics Choosing the best model (Session 08)

Backward elimination gives x1, x2

---------------------------------------
   y |    Coef.   Std.Err.       t  P>|t|
-----+---------------------------------
  x1 |   1.5511     .74477    2.08  0.071
  x2 |   .51017      .7238    0.70  0.501
  x3 |   .10191      .7547    0.14  0.896
  x4 |  -.14406      .7091   -0.20  0.844
---------------------------------------
  x1 |   1.4519     .11700   12.41  0.000
  x2 |   .41611     .18561    2.24  0.052
  x4 |  -.23654     .17329   -1.37  0.205
---------------------------------------
  x1 |   1.4683     .12130   12.10  0.000
  x2 |   .66225     .04585   14.44  0.000
---------------------------------------
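
Backward elimination can be retraced in the same spirit from the residual mean squares on slide 6: start with all four x's and repeatedly drop the variable whose removal costs least, as measured by its partial F (the square of its t statistic). This is a sketch only. The value n = 13 is inferred from the total df of 12, the F-to-remove threshold of 4.0 is an illustrative assumption rather than the rule used in the slides, and the names are hypothetical. Rounding in the table can make a very small partial F come out slightly negative.

```python
# Residual mean squares copied from the all-regressions table (slide 6).
# Keys name the x's in the model: "124" means x1, x2 and x4.
RES_MS = {
    "": 226.3, "1": 115.1, "2": 82.4, "3": 176.3, "4": 80.4,
    "12": 5.8, "13": 122.7, "14": 7.5, "23": 41.5, "24": 86.9, "34": 17.6,
    "123": 5.4, "124": 5.3, "134": 5.7, "234": 8.2, "1234": 6.0,
}
N_OBS = 13         # assumption: inferred from the total df of 12
F_TO_REMOVE = 4.0  # assumption: illustrative threshold, not from the slides


def key(model):
    """Turn a set such as {"2", "1"} into the table key "12"."""
    return "".join(sorted(model))


def rss(model):
    """Residual sum of squares = residual mean square x residual df."""
    n_params = len(model) + 1  # the x's plus the constant
    return RES_MS[key(model)] * (N_OBS - n_params)


def backward_elimination():
    """Drop the x with the smallest partial F while it is below F_TO_REMOVE."""
    model, removed = set("1234"), []
    while model:
        # Partial F for each x, given all the others currently in the model
        f_stats = {
            x: (rss(model - {x}) - rss(model)) / RES_MS[key(model)]
            for x in model
        }
        worst = min(f_stats, key=f_stats.get)
        if f_stats[worst] >= F_TO_REMOVE:
            break
        model.remove(worst)
        removed.append("x" + worst)
    return ["x" + x for x in sorted(model)], removed


final_model, dropped = backward_elimination()
print(final_model, dropped)
```

Under these assumptions x3 is dropped first, then x4, leaving x1 and x2, which agrees with the three blocks of output above.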

Page 14: SADC Course in Statistics Choosing the best model (Session 08)

Stepwise selection procedure…

This is similar to forward selection, but at each stage of the process, all x’s in the model are re-assessed to check if those that entered the model at an earlier stage still remain “important”.

Note: Software packages allow automatic use of one of these with pre-specified p-values for selection and deletion of variables. Usually available only with quantitative x’s.

Page 15: SADC Course in Statistics Choosing the best model (Session 08)

Discussion… in small groups

• Look back at the results. What do you observe with the forward and backward procedures? Do they give the same results?

• Did the selection using forward selection seem sensible, given that for x4 the p-value = 0.205?

• Can you work out what model would result from a stepwise selection procedure?

• Is it a good idea to use the automatic selection procedures available in software packages? If not, why not?

Page 16: SADC Course in Statistics Choosing the best model (Session 08)

Discussion continued…

Suppose a medical researcher told you that a model without x2 was not meaningful, how would you proceed with your model selection?

What other latent (lurking) variables, measurable or non-measurable, might affect y?

What further steps would you undertake before accepting the final model?

Page 17: SADC Course in Statistics Choosing the best model (Session 08)

Practical work follows to ensure learning objectives are achieved…

