Date post: | 28-Mar-2015 |
Upload: | gabriel-roberts |
SADC Course in Statistics
Choosing the “best” model
(Session 08)
2To put your footer here go to View > Header and Footer
Learning Objectives
At the end of this session, you will be able to
• use a simple descriptive approach to select the most appropriate subset of explanatory variables
• apply methods of variable selection (based on statistical tests) in a meaningful way to get the “best” model
• appreciate the effect on t-probabilities when x’s are added or dropped from a model
• understand the dangers of using automatic selection procedures
Example of choosing “best” set of x’s
Consider data (fictitious) from a retrospective study of patients surviving less than 4 months after being diagnosed as having acute leukaemia.
Objective: To identify factors affecting survival time.
Variables were:
y = survival time (days) after diagnosis
x1 = no. of chemotherapy sessions
x2 = total volume of blood transfused
x3 = no. of days of hospital care
x4 = age of patient (years).
Start with a matrix plot
Summary statistics for all regressions
How many possible regression models exist?
Example with x1 and x3 to show summaries:
---------+---------------------------------------------
  Source |       SS     df        MS       F    Prob>F
---------+---------------------------------------------
   Model | 1488.691      2   744.346    6.07    0.0188
Residual | 1227.072     10   122.707
---------+---------------------------------------------
   Total | 2715.763     12   226.314
---------+---------------------------------------------
No. of parameters fitted (p) = 3
R2p = 1488.69 / 2715.76 = 0.5482
Adjusted R2p = 1 – 122.71 / 226.31 = 0.4578
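The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `anova_summaries` is hypothetical, and the inputs are the sums of squares and degrees of freedom from the ANOVA table for x1 and x3.

```python
def anova_summaries(ss_model, df_model, ss_resid, df_resid):
    """Return R-squared, adjusted R-squared and F from ANOVA table entries."""
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid
    ms_resid = ss_resid / df_resid          # residual mean square
    ms_total = ss_total / df_total          # total mean square
    r2 = ss_model / ss_total
    adj_r2 = 1 - ms_resid / ms_total
    f_stat = (ss_model / df_model) / ms_resid
    return r2, adj_r2, f_stat


# Values copied from the ANOVA table for the model with x1 and x3
r2, adj_r2, f_stat = anova_summaries(1488.691, 2, 1227.072, 10)
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}, F = {f_stat:.2f}")
```

The adjusted R2 uses mean squares rather than sums of squares, so it penalises each extra parameter through the residual degrees of freedom.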
Descriptive approach (all regressions)
No. of x's   p (no. of parameters)   Terms in model    R2      Adj. R2   Res. M.S.
    0                 1              None              0       0         226.3
    1                 2              x1                0.534   0.492     115.1
    1                 2              x2                0.666   0.636      82.4
    1                 2              x3                0.286   0.221     176.3
    1                 2              x4                0.675   0.645      80.4
    2                 3              x1, x2            0.979   0.974       5.8
    2                 3              x1, x3            0.548   0.458     122.7
    2                 3              x1, x4            0.972   0.967       7.5
    2                 3              x2, x3            0.847   0.816      41.5
    2                 3              x2, x4            0.680   0.616      86.9
    2                 3              x3, x4            0.935   0.922      17.6
    3                 4              x1, x2, x3        0.982   0.976       5.4
    3                 4              x1, x2, x4        0.982   0.976       5.3
    3                 4              x1, x3, x4        0.981   0.975       5.7
    3                 4              x2, x3, x4        0.973   0.964       8.2
    4                 5              x1, x2, x3, x4    0.982   0.974       6.0
(p counts the constant as well as the x's, as in the x1, x3 example where p = 3.)
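The table has one row for every subset of the four candidate x's, including the model with no x's at all. A short sketch (the variable names are illustrative only) that enumerates those subsets and counts them:

```python
from itertools import combinations

candidates = ["x1", "x2", "x3", "x4"]

# Every subset of the candidates, from the constant-only model to the full model
models = [subset
          for size in range(len(candidates) + 1)
          for subset in combinations(candidates, size)]

# How many models there are for each number of x's
counts_by_size = {size: sum(1 for m in models if len(m) == size)
                  for size in range(len(candidates) + 1)}

print(len(models))       # total number of possible models, 2**4
print(counts_by_size)
```

With k candidate variables there are 2^k possible models, so fitting all regressions quickly becomes impractical as k grows.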
A descriptive approach… continued
Plot R2 versus no. of parameters (p) in model
Which model would you select on the basis of these results?
A descriptive approach… continued
Alternatively, plot residual mean square versus no. of parameters (p). Small residual mean square is good!
Which model would you select on the basis of the residual mean square?
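As a rough alternative to reading the plot, the smallest residual mean square for each model size can be picked out directly. The sketch below copies the residual mean squares from the all-regressions table; the dictionary layout and names are assumptions made for illustration.

```python
# Residual mean squares copied from the all-regressions table
res_ms = {
    (): 226.3,
    ("x1",): 115.1, ("x2",): 82.4, ("x3",): 176.3, ("x4",): 80.4,
    ("x1", "x2"): 5.8, ("x1", "x3"): 122.7, ("x1", "x4"): 7.5,
    ("x2", "x3"): 41.5, ("x2", "x4"): 86.9, ("x3", "x4"): 17.6,
    ("x1", "x2", "x3"): 5.4, ("x1", "x2", "x4"): 5.3,
    ("x1", "x3", "x4"): 5.7, ("x2", "x3", "x4"): 8.2,
    ("x1", "x2", "x3", "x4"): 6.0,
}

# For each number of x's, keep the model with the smallest residual mean square
best_by_size = {}
for terms, ms in res_ms.items():
    size = len(terms)
    if size not in best_by_size or ms < res_ms[best_by_size[size]]:
        best_by_size[size] = terms

for size in sorted(best_by_size):
    terms = best_by_size[size]
    print(size, ", ".join(terms) or "none", res_ms[terms])

# Model with the smallest residual mean square overall
best_overall = min(res_ms, key=res_ms.get)
```

In this table the residual mean square falls sharply up to two x's and then barely changes; the full model (6.0) is slightly worse than the best three-variable model (5.3), which is the kind of pattern the plot is meant to reveal.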
An inferential approach…
Use a sequential procedure to select variables that contribute most, and significantly, to the regression model.
Three popular methods exist:
• Forward selection
• Backward elimination
• Stepwise regression
Forward selection …
Select the “best” single variable (x4, which has the highest R2 of the one-variable models; see slide 6)
Ask, “Is it contributing significantly?” Answer: Yes (see below)
-----------------------------------------------
     y |    Coef.   Std. Err.       t    P>|t|
-------+---------------------------------------
    x4 |  -.73816      .1546    -4.77    0.001
const. |   117.57     5.2622    22.34    0.000
-----------------------------------------------
Now consider 2-variable models with x4.
Two-variable models with x4
-----------------------------------------------
     y |    Coef.    Std.Err.       t    P>|t|
-------+---------------------------------------
    x4 |  -.61395     .04864   -12.62    0.000
    x1 |   1.4400     .13842    10.40    0.000
const. |   103.10     2.1240    48.54    0.000
-----------------------------------------------
    x4 |  -.45694     .69595    -0.66    0.526
    x2 |   .31090     .74861     0.42    0.687
const. |   94.160     56.627     1.66    0.127
-----------------------------------------------
    x4 |  -.72460     .07233   -10.02    0.000
    x3 |  -1.1999     .18902    -6.35    0.000
const. |   131.28     3.2748    40.09    0.000
-----------------------------------------------
Three-variable models with x4, x1
-----------------------------------------------
     y |    Coef.    Std.Err.       t    P>|t|
-------+---------------------------------------
    x4 |  -.23654     .17329    -1.37    0.205
    x1 |   1.4519     .11700    12.41    0.000
    x2 |   .41611     .18561     2.24    0.052
const. |   71.648     14.142     5.07    0.001
-----------------------------------------------
    x4 |  -.64280     .04454   -14.43    0.000
    x1 |   1.0519     .22368     4.70    0.001
    x3 |  -.41004     .19923    -2.06    0.070
const. |   111.68     4.5625    24.48    0.000
-----------------------------------------------
Model with x1, x2 and x4 would be selected, despite x4 now being non-significant!
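The forward steps above can be replayed from the R2 values in the all-regressions table, because for a single added variable the partial F statistic equals the square of its t statistic. The sketch below is illustrative only. It assumes n = 13 (from the total df of 12) and an F-to-enter threshold of 4.0, which is not taken from the slides, and all function names are hypothetical. Because the R2 values are rounded to three decimals, the F values are approximate.

```python
N_OBS = 13          # total df of 12 in the ANOVA table implies 13 observations
F_TO_ENTER = 4.0    # assumed entry threshold, not taken from the slides

# R2 values copied from the all-regressions table ("" is the constant-only model)
R2 = {frozenset(k.split()): v for k, v in {
    "": 0.0,
    "x1": 0.534, "x2": 0.666, "x3": 0.286, "x4": 0.675,
    "x1 x2": 0.979, "x1 x3": 0.548, "x1 x4": 0.972,
    "x2 x3": 0.847, "x2 x4": 0.680, "x3 x4": 0.935,
    "x1 x2 x3": 0.982, "x1 x2 x4": 0.982,
    "x1 x3 x4": 0.981, "x2 x3 x4": 0.973,
    "x1 x2 x3 x4": 0.982,
}.items()}


def partial_f(reduced, full):
    """Approximate F for the one extra term in `full` compared with `reduced`."""
    df_resid = N_OBS - (len(full) + 1)      # parameters = x's + constant
    r2_full, r2_red = R2[frozenset(full)], R2[frozenset(reduced)]
    return (r2_full - r2_red) / ((1 - r2_full) / df_resid)


def forward_selection(candidates):
    selected = []
    while len(selected) < len(candidates):
        remaining = [x for x in candidates if x not in selected]
        # F-to-enter for each variable not yet in the model
        f_values = {x: partial_f(selected, selected + [x]) for x in remaining}
        best = max(f_values, key=f_values.get)
        if f_values[best] < F_TO_ENTER:
            break                           # nothing left contributes enough
        selected.append(best)
    return selected


print(forward_selection(["x1", "x2", "x3", "x4"]))
```

With these inputs the sketch enters x4, then x1, then x2, and stops. The result depends on the threshold: x2 enters with F of about 5 (t = 2.24, p = 0.052), so a strict 5% rule would stop at the model with x4 and x1.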
Backward elimination gives x1, x2
-----------------------------------------------
     y |    Coef.    Std.Err.       t    P>|t|
-------+---------------------------------------
    x1 |   1.5511     .74477     2.08    0.071
    x2 |   .51017      .7238     0.70    0.501
    x3 |   .10191      .7547     0.14    0.896
    x4 |  -.14406      .7091    -0.20    0.844
-----------------------------------------------
    x1 |   1.4519     .11700    12.41    0.000
    x2 |   .41611     .18561     2.24    0.052
    x4 |  -.23654     .17329    -1.37    0.205
-----------------------------------------------
    x1 |   1.4683     .12130    12.10    0.000
    x2 |   .66225     .04585    14.44    0.000
-----------------------------------------------
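The elimination sequence in these three tables follows a simple rule: at each stage drop the term with the smallest |t|, refit, and stop once every remaining term is clearly significant. The sketch below replays that rule on the t statistics shown above rather than refitting anything. The cut-off of 2.0 on |t| and the names used are assumptions for illustration.

```python
T_TO_STAY = 2.0   # assumed cut-off on |t|, not taken from the slides

# t statistics copied from the three backward-elimination tables
stages = [
    {"x1": 2.08, "x2": 0.70, "x3": 0.14, "x4": -0.20},
    {"x1": 12.41, "x2": 2.24, "x4": -1.37},
    {"x1": 12.10, "x2": 14.44},
]


def weakest_term(t_values):
    """Return the term with the smallest |t| together with that |t|."""
    name = min(t_values, key=lambda x: abs(t_values[x]))
    return name, abs(t_values[name])


dropped = []
for t_values in stages:
    name, abs_t = weakest_term(t_values)
    if abs_t >= T_TO_STAY:
        break                 # every remaining term passes the cut-off
    dropped.append(name)

# Terms left in the model after the last elimination
final_terms = sorted(stages[len(dropped)])
print(dropped, final_terms)
```

Note how the t statistics for x1 and x2 change from stage to stage: both look weak in the full model and become highly significant once x3 and x4 are removed.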
Stepwise selection procedure…
This is similar to forward selection, but at each stage of the process, all x’s in the model are re-assessed to check if those that entered the model at an earlier stage still remain “important”.
Note: Software packages allow automatic use of one of these with pre-specified p-values for selection and deletion of variables. Usually available only with quantitative x’s.
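To make the re-assessment step concrete, here is a hedged sketch of a stepwise procedure driven by the R2 values in the all-regressions table. It is not the algorithm of any particular software package. The thresholds (F-to-enter 4.0, F-to-remove 3.9), n = 13, and every function name are assumptions, and the rounded R2 values make the F statistics approximate.

```python
N_OBS = 13           # total df of 12 implies 13 observations
F_TO_ENTER = 4.0     # assumed thresholds, not taken from the slides
F_TO_REMOVE = 3.9    # kept below F_TO_ENTER so a new term is not dropped at once

# R2 values copied from the all-regressions table ("" is the constant-only model)
R2 = {frozenset(k.split()): v for k, v in {
    "": 0.0,
    "x1": 0.534, "x2": 0.666, "x3": 0.286, "x4": 0.675,
    "x1 x2": 0.979, "x1 x3": 0.548, "x1 x4": 0.972,
    "x2 x3": 0.847, "x2 x4": 0.680, "x3 x4": 0.935,
    "x1 x2 x3": 0.982, "x1 x2 x4": 0.982,
    "x1 x3 x4": 0.981, "x2 x3 x4": 0.973,
    "x1 x2 x3 x4": 0.982,
}.items()}


def partial_f(reduced, full):
    """Approximate F for the one extra term in `full` compared with `reduced`."""
    df_resid = N_OBS - (len(full) + 1)       # parameters = x's + constant
    r2_full, r2_red = R2[frozenset(full)], R2[frozenset(reduced)]
    return (r2_full - r2_red) / ((1 - r2_full) / df_resid)


def stepwise(candidates):
    selected, history = [], []
    while True:
        remaining = [x for x in candidates if x not in selected]
        if not remaining:
            break
        # Forward step: add the variable with the largest F-to-enter
        f_in = {x: partial_f(selected, selected + [x]) for x in remaining}
        best = max(f_in, key=f_in.get)
        if f_in[best] < F_TO_ENTER:
            break
        selected.append(best)
        history.append(("add", best))
        # Re-assessment: drop earlier entrants that no longer contribute
        while len(selected) > 1:
            f_out = {x: partial_f([v for v in selected if v != x], selected)
                     for x in selected}
            worst = min(f_out, key=f_out.get)
            if f_out[worst] >= F_TO_REMOVE:
                break
            selected.remove(worst)
            history.append(("drop", worst))
    return selected, history


selected, history = stepwise(["x1", "x2", "x3", "x4"])
print(selected, history)
```

Under these assumed thresholds the re-assessment removes x4 once x2 has entered, which forward selection alone cannot do. Different thresholds can give a different final model.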
Discussion… in small groups
• Look back at the results. What do you observe with the forward and backward procedures? Do they give the same results?
• Did the forward selection seem sensible, given that the p-value for x4 was 0.205?
• Can you work out what model would result from a stepwise selection procedure?
• Is it a good idea to use such automatic selection procedures available in software packages? If not, why not?
Discussion continued…
Suppose a medical researcher told you that a model without x2 was not meaningful. How would you proceed with your model selection?
What other latent (lurking) variables, measurable or non-measurable, might affect y?
What further steps would you undertake before accepting the final model?
Practical work follows to ensure learning objectives are achieved…