8/17/2019  14_Building_Regression_Models_Part1.pdf
https://slidepdf.com/reader/full/14buildingregressionmodelspart1pdf
BUILDING REGRESSION MODELS - PART 1
Topics Outline
• Partial F Test
• Adjusted r²
• Cp Statistic
• Include/Exclude Decisions
• Variable Selection Procedures
Partial F Test
There are many situations where a set of explanatory variables forms a logical group. It is then
common to include all of the variables in the equation or exclude all of them. An example of this
is when one of the explanatory variables is categorical with more than two categories.
In this case you model it by including dummy variables, one fewer than the number of categories.
If you decide that the categorical variable is worth including, you might want to keep all of the
dummies. Otherwise you might decide to exclude all of them.
Consider the following general situation. Suppose you have already estimated a reduced
multiple regression model that includes the variables x1 through xj:

y = α + β1x1 + … + βjxj + ε

Now you propose to estimate a larger model, referred to as the full model, that includes xj+1 through xk
in addition to the variables x1 through xj:

y = α + β1x1 + … + βjxj + βj+1xj+1 + … + βkxk + ε

That is, the full model includes all of the variables from the reduced model, but it also includes
k - j extra variables.
The partial F test is used to determine whether the extra variables provide enough extra
explanatory power as a group to warrant their inclusion in the equation. In other words,
the partial F test tests whether the full model is significantly better than the reduced model.
The null and alternative hypotheses can be stated as follows:

H0: βj+1 = … = βk = 0 (The extra variables have no effect on y)
Ha: At least one of βj+1, …, βk is not zero (At least one of the extra variables has an effect on y)
The test statistic is

F = [(SSE(reduced) - SSE(full)) / (number of extra terms)] / MSE(full)
This test statistic measures how much the sum of squared residuals, SSE, decreases by including
the extra variables in the equation. It must decrease by some amount because the sum of squared
residuals cannot increase when extra variables are added to an equation. But if it does not
decrease sufficiently, the extra variables might not explain enough to justify their inclusion in the
equation, and they should probably be excluded.
If the null hypothesis is true, this test statistic has an F distribution with df1 = k - j and
df2 = n - k - 1 degrees of freedom. If the corresponding P-value is sufficiently small,
you can reject the null hypothesis that the extra variables have no explanatory power.
To perform the partial F test in Excel, run two regressions: one for the reduced model
(with explanatory variables x1 through xj) and one for the full model (with explanatory
variables x1 through xk). Use the appropriate values from their ANOVA tables to calculate
the F test statistic, then use Excel's FDIST function to calculate the corresponding P-value.
Reminder: The ANOVA table for the full equation has the following form.

Source of    Degrees of   Sum of    Mean Squares
Variation    Freedom      Squares   (Variance)              F statistic   P-value
Regression   k            SSR       MSR = SSR/k             F = MSR/MSE   Prob > F
Error        n - k - 1    SSE       MSE = SSE/(n - k - 1)
Total        n - 1        SST
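The table's entries are simple arithmetic, and a few lines of Python can reproduce them from the sums of squares. This is a sketch of my own (the function name is not from the notes); the sample values are taken from the heating-oil regression in Example 1 below.

```python
def anova_quantities(ssr, sse, k, n):
    """Fill in the ANOVA table from its sums of squares.

    ssr: regression sum of squares; sse: error sum of squares;
    k: number of explanatory variables; n: number of observations.
    """
    msr = ssr / k             # MSR = SSR / k
    mse = sse / (n - k - 1)   # MSE = SSE / (n - k - 1)
    return msr, mse, msr / mse   # F = MSR / MSE

# Values from the heating-oil regression in Example 1 below
msr, mse, f = anova_quantities(ssr=233406.9094, sse=2728.3200, k=3, n=15)
print(round(msr, 4), round(mse, 4), round(f, 4))
```

The three printed values match the MS and F columns of that example's ANOVA table.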
Notes
1. Many users look only at the r² and se values to check whether extra variables are doing a
"good job". For example, they might cite that r² went from 80% to 90%, or that se went
from 500 to 400, as evidence that extra variables provide a "significantly" better fit.
Although these are important indicators, they are not the basis for a formal hypothesis test.
The partial F test is the formal test of significance for an extra set of variables.
2. If the partial F test shows that a group of variables is significant, it does not imply that each
variable in this group is significant. Some of these variables can have low t-values
(and consequently large P-values). Some analysts favor excluding the individual variables
that aren't significant, whereas others favor keeping the whole group or excluding the whole
group. Either approach is valid. Fortunately, the final model-building results are often nearly
the same either way.
3. StatTools performs partial F tests as part of the procedure for building regression models
when the option Block is chosen in the Regression Type dropdown list.
Example 1
Heating Oil Consumption
A real estate developer wants to predict the heating oil consumption in single-family houses
based on the effect of atmospheric temperature and the amount of attic insulation.
Data are collected from a sample of 15 single-family houses. Of the 15 houses selected, houses
1, 4, 6, 7, 8, 10, and 12 are ranch-style houses. The data are organized and stored in Heating_Oil.xlsx.

House   Gallons   Temperature (°F)   Insulation (inches)   Style
1       275.3     40                 3                     1
2       363.8     27                 3                     0
...     ...       ...                ...                   ...
14      323.0     38                 3                     0
15      52.5      58                 10                    0
(a) Develop and analyze an appropriate regression model.

The explanatory variables considered are
x1 - atmospheric temperature
x2 - the amount of attic insulation
x3 - dummy variable = 1 if the style is ranch, 0 otherwise

Assuming that the slope between heating oil consumption and atmospheric temperature x1,
and between heating oil consumption and the amount of attic insulation x2, is the same for
both styles of houses, the regression model is

y = α + β1x1 + β2x2 + β3x3 + ε
The regression results for this model are

Regression Statistics
Multiple R          0.9942
R Square            0.9884
Adjusted R Square   0.9853
Standard Error      15.7489
Observations        15

ANOVA
             df   SS            MS           F          Significance F
Regression   3    233406.9094   77802.3031   313.6822   0.0000
Residual     11   2728.3200     248.0291
Total        14   236135.2293

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     592.5401       14.3370          41.3295    0.0000    560.9846    624.0956
Temperature   -5.5251        0.2044           -27.0267   0.0000    -5.9751     -5.0752
Insulation    -21.3761       1.4480           -14.7623   0.0000    -24.5632    -18.1891
Style         -38.9727       8.3584           -4.6627    0.0007    -57.3695    -20.5759
(b) Interpret the regression coefficients.

The regression equation is

ŷ = 592.5401 - 5.5251x1 - 21.3761x2 - 38.9727x3

Predicted Consumption = 592.5401 - 5.5251·Temperature - 21.3761·Insulation - 38.9727·Style

For houses that are ranch style, because x3 = 1, the regression equation reduces to

ŷ = 553.5674 - 5.5251x1 - 21.3761x2

For houses that are not ranch style, because x3 = 0, the regression equation reduces to

ŷ = 592.5401 - 5.5251x1 - 21.3761x2
The regression coefficients are interpreted as follows:

b1 = -5.5251: Holding constant the attic insulation and the house style, for each additional
1°F increase in atmospheric temperature, you estimate that the predicted
heating oil consumption decreases by 5.5251 gallons.

b2 = -21.3761: Holding constant the atmospheric temperature and the house style, for each
additional 1-inch increase in attic insulation, you estimate that the predicted
heating oil consumption decreases by 21.3761 gallons.

b3 = -38.9727: b3 measures the effect on oil consumption of having a ranch-style house (x3 = 1)
compared with having a house that is not ranch style (x3 = 0). Thus, with
atmospheric temperature and attic insulation held constant, you estimate that the
predicted heating oil consumption is 38.9727 gallons less for a ranch-style house
than for a house that is not ranch style.
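The fitted equation is easy to turn into a prediction function. A minimal Python sketch (the function name is mine; the coefficients are the rounded values from the output above):

```python
def predict_consumption(temperature, insulation, ranch_style):
    """Predicted heating oil consumption in gallons, using the rounded
    coefficients from the fitted regression equation above.
    ranch_style is 1 for a ranch-style house, 0 otherwise."""
    return (592.5401
            - 5.5251 * temperature
            - 21.3761 * insulation
            - 38.9727 * ranch_style)

# House 1 from the data table: 40°F, 3 inches of insulation, ranch style
print(round(predict_consumption(40, 3, 1), 4))
```

The prediction of about 268.4 gallons is close to house 1's observed consumption of 275.3 gallons.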
(c) Does each of the three variables make a significant contribution to the regression model?

The three t-test statistics representing the slopes for temperature, insulation, and ranch style are
-27.0267, -14.7623, and -4.6627. Each of the corresponding P-values is extremely small
(less than 0.001). Thus each of the three variables makes a significant contribution to the model.
In addition, the coefficient of determination indicates that 98.84% of the variation in oil usage
is explained by variation in temperature, insulation, and whether the house is ranch style.
(d) Determine whether adding the interaction terms makes a significant contribution to the model.

To evaluate possible interactions between the explanatory variables, three interaction terms
are constructed as follows:
x4 = x1x2 (interaction between temperature and insulation)
x5 = x1x3 (interaction between temperature and style)
x6 = x2x3 (interaction between insulation and style)

The regression model is now

y = α + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + ε
The regression results for this model are
Regression Statistics
Multiple R          0.9966
R Square            0.9931
Adjusted R Square   0.9880
Standard Error      14.2506
Observations        15

ANOVA
             df   SS            MS           F          Significance F
Regression   6    234510.5818   39085.0970   192.4607   0.0000
Residual     8    1624.6475     203.0809
Total        14   236135.2293

                   Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept          642.8867       26.7059          24.0728   0.0000    581.3028    704.4706
Temperature        -6.9263        0.7531           -9.1969   0.0000    -8.6629     -5.1896
Insulation         -27.8825       3.5801           -7.7882   0.0001    -36.1383    -19.6268
Style              -84.6088       29.9956          -2.8207   0.0225    -153.7787   -15.4389
Temp·Insulation    0.1702         0.0886           1.9204    0.0911    -0.0342     0.3746
Temp·Style         0.6596         0.4617           1.4286    0.1910    -0.4051     1.7242
Insulation·Style   4.9870         3.5137           1.4193    0.1936    -3.1156     13.0895
To test whether the three interactions significantly improve the regression model, you use the
partial F test. The null and alternative hypotheses are

H0: β4 = β5 = β6 = 0 (There are no interactions among x1, x2, and x3)
Ha: At least one of β4, β5, β6 is not zero (x1 interacts with x2, and/or x1 interacts with x3,
and/or x2 interacts with x3)

From the full regression output (see above):
SSE(full) = 1624.6475, MSE(full) = 203.0809
From the reduced regression output (see part (a)):
SSE(reduced) = 2728.3200
The test statistic is

F = [(SSE(reduced) - SSE(full)) / (number of extra terms)] / MSE(full)
  = [(2728.3200 - 1624.6475) / 3] / 203.0809
  = 367.8908 / 203.0809
  = 1.8115

df1 = k - j = 6 - 3 = 3, df2 = n - k - 1 = 15 - 6 - 1 = 8
P-value = FDIST(1.8115, 3, 8) = 0.2230
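The arithmetic above can be checked with a few lines of Python (a sketch of my own; it computes only the F statistic, with the P-value left to Excel's FDIST or a statistics package):

```python
def partial_f(sse_reduced, sse_full, mse_full, num_extra):
    """Partial F statistic for testing a block of extra variables:
    the drop in SSE per extra term, divided by the full model's MSE."""
    return (sse_reduced - sse_full) / num_extra / mse_full

# Heating-oil example: testing the three interaction terms
f = partial_f(sse_reduced=2728.3200, sse_full=1624.6475,
              mse_full=203.0809, num_extra=3)
print(round(f, 4))   # compare against F(3, 8) for the P-value
```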
Because of the large P-value, you conclude that the interactions do not make a significant
contribution to the model, given that the model already includes temperature x1, insulation x2,
and whether the house is ranch style x3. Therefore the multiple regression model using x1, x2,
and x3 but no interaction terms is the better model.
If you had rejected this null hypothesis, you would then test the contribution of each interaction
separately in order to determine which interaction terms to include in the model.
Adjusted r²

Adding new explanatory variables will always keep the r² value the same or increase it;
it can never decrease it. In general, adding explanatory variables to the model causes the
prediction errors to become smaller, thus reducing the sum of squares due to error, SSE.
Because SSR = SST - SSE, when SSE becomes smaller, SSR becomes larger, causing
r² = SSR/SST to increase. Therefore, if a variable is added to the model, r² usually becomes larger
even if the variable added is not statistically significant. This can lead to "fishing expeditions",
where you keep adding variables to the model, some of which have no conceptual relationship to
the response variable, just to inflate the r² value.
To avoid overestimating the impact of adding an explanatory variable on the amount of
variability explained by the estimated regression equation, many analysts prefer adjusting r² for
the number of explanatory variables. The adjusted r² is defined as

adjusted r² = 1 - (1 - r²)(n - 1)/(n - k - 1)

The adjusted r² imposes a "penalty" for each new term that is added to the model, in an attempt
to make models of different sizes (numbers of explanatory variables) comparable. It can decrease
when unnecessary explanatory variables are added to the regression model. Therefore it serves
as an index that you can monitor. If you add variables and the adjusted r² decreases, the extra
variables are essentially not pulling their weight and should probably be omitted.
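The formula is a one-liner in Python. The sketch below (function name mine) reproduces the Adjusted R Square values reported for both heating-oil models; the last digit can differ slightly from the output tables because the r² inputs here are already rounded.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r² for a model with k explanatory variables fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Heating-oil models: with interactions (k = 6) and without (k = 3)
print(round(adjusted_r2(0.9931, 15, 6), 4))
print(round(adjusted_r2(0.9884, 15, 3), 4))
```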
For the full model of the Heating Oil Consumption example (with the interaction terms),
n = 15, k = 6, and r² = 0.9931. Thus the adjusted r² is

adjusted r² = 1 - (1 - r²)(n - 1)/(n - k - 1) = 1 - (1 - 0.9931)(15 - 1)/(15 - 6 - 1)
            = 1 - (0.0069)(1.75) = 0.9880

The adjusted r² for the reduced model (without the interaction terms) is 0.9853.
The adjusted r² for the full model indicates too small an improvement in explaining the variation
in the consumption of heating oil to justify keeping the interaction terms in the model, even if the
partial F test were significant.
Note: It can happen that the value of adjusted r² is negative. This is not a mistake, but a result of
a model that fits the data very poorly. In this case some software systems set adjusted r² equal to 0;
Excel will print the actual value.
Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic
developed by Mallows. The formula for computing Cp is

Cp = SSE(k)/MSE(full) - (n - 2k - 2)

where
SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, …
MSE(full) is the mean square error for a regression model that has all explanatory variables included.
Theory says that if the value of Cp is large, then the mean square error of the fitted values is large,
indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is
much greater than k + 1, then there is a large bias component in the regression, usually indicating
omission of an important variable. Therefore, when evaluating which regression is best, it is
recommended that regressions with small Cp values, and those with values near k + 1, be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between
alternative regressions, keep in mind that the bias is measured with respect to the total group of
variables provided by the researcher. This criterion cannot determine when the researcher has
forgotten about some variable not included in the total group.
Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly
the most difficult part of any real regression analysis problem. You are always trying to get the
best fit possible. The principle of parsimony suggests using the fewest number of explanatory
variables that can predict the response variable adequately. Regression models with fewer
explanatory variables are easier to interpret and are less likely to be affected by interaction or
collinearity problems. On the other hand, more variables certainly increase r², and they usually
reduce the standard error of estimate se. This presents a trade-off, which is the heart of the
challenge of selecting a good model.
The best regression models, in addition to satisfying the conditions of multiple regression, have:
• Relatively few explanatory variables
• Relatively high r² and adjusted r², indicating that much of the variability in y is accounted for by
the regression model
• A small value of Cp (close to or less than k + 1)
• A relatively small value of se, the standard deviation of the residuals, indicating that the
magnitude of the errors is small
• Relatively small P-values for the F- and t-statistics, showing that the overall model is better than
a simple summary with the mean, and that the individual parameters are reliably different from zero
Here are several guidelines for including and excluding variables. These guidelines are not
ironclad rules. They typically involve choices at the margin, that is, between equations that are
very similar and seem equally useful.

Guidelines for Including/Excluding Variables in a Regression Model

1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted
significance level, such as 0.05, this variable is a candidate for exclusion.
2. It is a mathematical fact that:
- If |t-value| < 1, then se will decrease and adjusted r² will increase if this variable is excluded
from the equation.
- If |t-value| > 1, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and
including variables with t-values greater than 1. However, analysts who base the decision on
statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable
from the equation unless its t-value is at least 2 (approximately). This latter approach is more
stringent (fewer variables will be retained), but it is probably the more popular approach.
3. When there is a group of variables that are in some sense logically related, it is sometimes a
good idea to include all of them or exclude all of them. In this case their individual t-values
are less relevant. Instead, a partial F test can be used to make the include/exclude decision.
4. Use economic, theoretical, or practical considerations to decide whether to include or exclude
variables. Some variables might really belong in an equation because of their theoretical
relationship with the response variable, and their low t-values, possibly the result of an
unlucky sample, should not necessarily disqualify them from being in the equation.
Similarly, a variable that has no economic or physical relationship with the response variable
might have a significant t-value just by chance. This does not necessarily mean that it should
be included in the equation.
You should not agonize too much about whether to include or exclude a variable "at the margin".
If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat
cleaner model, and you probably won't see any dramatic shifts in Cp, r², adjusted r², or se.
On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious
and you have one more variable to interpret, but otherwise there is no real penalty for including it.
In real applications there are often several equations that, for all practical purposes, are equally
useful for describing the relationships or making predictions. There are so many aspects of what
makes a model useful that human judgment is necessary to make a final choice. For example,
in addition to favoring explanatory variables that can be measured reliably, you may want to
favor those that are less expensive to measure. The statistician George Box, who had an
illustrious academic career at the University of Wisconsin, is often quoted as saying:
"All models are wrong, but some models are useful."
Variable Selection Procedures
Model building is the process of developing an estimated regression equation that describes the
relationship between a response variable and one or more explanatory variables.
The major issues in model building are finding the proper functional form of the relationship and
selecting the explanatory variables to be included in the model.
Many statistical packages provide some assistance by including automatic model-building options.
These options estimate a series of regression models by successively adding or deleting variables
according to prescribed rules. These rules can vary from package to package, but usually the t test
for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to
determine whether variables are added or deleted. The levels of significance α1 and α2 for
determining whether an explanatory variable should be entered into the model or removed from
the model are typically referred to as P-value to Enter and P-value to Leave.
Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.
The four most common types of model-building procedures that statistical packages implement are
forward selection, backward elimination, stepwise regression, and best subsets regression.
Today many businesses use these variable selection procedures as part of the research technique
called data mining, which tries to identify significant statistical relationships in very large data
sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively
adds variables one at a time until no remaining variables make a significant contribution.
The forward selection procedure does not permit a variable to be removed from the model once it
has been entered. The procedure stops if the P-value for each of the explanatory variables not in
the model is greater than the prescribed P-value to Enter.

The backward elimination procedure begins with a model that includes all potential
explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value
to the prescribed P-value to Leave. The backward elimination procedure does not permit a
variable to be reentered once it has been removed. The procedure stops when none of the
explanatory variables in the model have a P-value greater than P-value to Leave.
The stepwise regression procedure is much like the forward procedure, except that it also considers
possible deletions along the way. Because of the nature of the stepwise regression procedure,
an explanatory variable can enter the model at one step, be removed at a subsequent step,
and then enter the model again at a later step. The procedure stops when no explanatory variables can
be removed from or entered into the model.
The best subsets regression procedure works by trying possible subsets from the list of possible
explanatory variables. This procedure does not actually compute all possible regressions;
there are ways to exclude models known to be worse than some already-examined models.
Typical computer output reports results for a collection of "best" models, usually the two best
one-variable models, the two best two-variable models, the two best three-variable models, and so on.
The user can then select the best model based on such measures as Cp, r², adjusted r², and se.
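The greedy idea behind forward selection can be sketched in a few dozen lines of pure Python. This is a toy illustration of my own, not how StatTools works internally: it uses an r²-improvement threshold as a stand-in for the P-value-to-Enter rule, and fits each candidate model by solving the normal equations directly (fine for a handful of variables).

```python
def r_squared(xs, y):
    """r² of an OLS fit of y on the columns in xs (list of lists), with intercept.
    Solves the normal equations by Gaussian elimination with partial pivoting."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in xs]   # prepend intercept column
    p = len(cols)
    # Build X'X and X'y
    a = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(p)]
         for i in range(p)]
    b = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(p)]
    # Forward elimination
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        b[i], b[piv] = b[piv], b[i]
        for r in range(i + 1, p):
            m = a[r][i] / a[i][i]
            for c in range(i, p):
                a[r][c] -= m * a[i][c]
            b[r] -= m * b[i]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(a[i][c] * beta[c] for c in range(i + 1, p))) / a[i][i]
    yhat = [sum(beta[j] * cols[j][t] for j in range(p)) for t in range(n)]
    ybar = sum(y) / n
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

def forward_selection(x_all, y, min_gain=0.01):
    """Greedy forward selection: at each step add the variable that raises r²
    the most; stop when no candidate improves r² by at least min_gain
    (a crude stand-in for the P-value to Enter rule)."""
    chosen, best = [], 0.0
    while len(chosen) < len(x_all):
        candidates = [(r_squared([x_all[i] for i in chosen] + [x_all[j]], y), j)
                      for j in range(len(x_all)) if j not in chosen]
        r2, j = max(candidates)
        if r2 - best < min_gain:
            break
        chosen.append(j)
        best = r2
    return chosen, best

# Toy data (hypothetical): y depends only on the first variable
x_useful = [1, 2, 3, 4, 5, 6, 7, 8]
x_noise = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3 * v + 1 for v in x_useful]
print(forward_selection([x_useful, x_noise], y))
```

On this toy data the procedure enters the useful variable first and then stops, because the noise variable adds essentially no r². Backward elimination and stepwise regression can be built on the same `r_squared` helper by deleting (or reconsidering) variables instead.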
In most cases the final results of these four procedures are very similar. However, there is no guarantee
that they will all produce exactly the same final equation. Deciding which estimated regression
equation to use remains a topic for discussion. Ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for
forward selection, backward elimination, and stepwise regression, but it cannot perform best
subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses.
Currently, the graphic artists at the station receive hourly pay for a significant number of hours
during which they are idle. These hours are called standby hours. The operations manager wants to
determine which factors most heavily affect standby hours of graphic artists. Over a period of 26
weeks he collected data concerning standby hours (y) and four factors that he suspects are related
to the excessive number of standby hours the station is currently experiencing:
x1 - the total number of staff present
x2 - remote hours
x3 - Dubner hours
x4 - total labor hours
The data are organized and stored in Standby.xlsx.

Week   Standby   Total Staff   Remote   Dubner   Total Labor
1      245       338           414      323      2001
2      177       333           598      340      2030
...    ...       ...           ...      ...      ...
25     261       315           164      223      1839
26     232       331           270      272      1935

How to build a multiple regression model with the most appropriate mix of explanatory variables?
Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the
explanatory variables. (Reminder: VIFj = 1/(1 - r²j), where r²j is the r² from regressing xj
on the other explanatory variables.)

This is always a good starting point for any multiple regression analysis. It involves running
four regressions, one regression for each explanatory variable against the other x variables.
The following table summarizes the results:
                    Total Staff       Remote            Dubner            Total Labor
                    and all other X   and all other X   and all other X   and all other X
Multiple R          0.6437            0.4349            0.5610            0.7070
R Square            0.4143            0.1891            0.3147            0.4998
Adjusted R Square   0.3345            0.0786            0.2213            0.4316
Standard Error      16.4715           124.9392          57.5525           114.4118
Observations        26                26                26                26
VIF                 1.7074            1.2333            1.4592            1.9993

All the VIF values are relatively small, ranging from a high of 1.9993 for total labor hours to
a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be
less than 5, there is little evidence of collinearity among the set of explanatory variables.
(b) Run forward selection, backward elimination, and stepwise regression, and compare the results.

StatTools regression output from running the three procedures is shown below.
A significance level of 0.05 is used to enter a variable into the model or to delete a variable
from the model (that is, P-value to Enter = P-value to Leave = 0.05).

The correlations between the response variable and the explanatory variables are

          Total Staff   Remote    Dubner    Total Labor
Standby   0.6050        -0.0953   -0.2443   0.4136

As the computer output shows, the forward selection and stepwise regression methods
produce the same results for these data. The first variable entered into the model is total staff,
the variable that correlates most highly with the response variable, standby hours
(r = 0.6050). The P-value for the t-test of total staff is 0.0011 (note: StatTools does not show it
in the final output). Because it is less than 0.05, total staff is included in the regression model.
The next step involves selecting a second independent variable for the model. The second variable
chosen is the one that makes the largest contribution to the model, given that the first variable has
been selected. For this model the second variable is remote hours. Because the P-value of 0.0269
for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure
determines whether total staff is still an important contributing variable or whether it can be
eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05,
total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of
the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure
terminates with a model that includes total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.
Forward Selection

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained        23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff        0.6050       0.3660     0.3396              38.6206             1
Remote             0.6999       0.4899     0.4456              35.3873             2
Stepwise Regression

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained        23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173
983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141
983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983141983154 983151983154
983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983109983160983145983156
983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983109983150983156983141983154
983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983109983150983156983141983154
Backward Elimination

Summary             Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                    0.7894       0.6231     0.5513              31.8350

ANOVA Table         Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained           4                    35181.7937       8795.4484         8.6786    0.0003
Unexplained         21                   21282.8217       1013.4677

Regression Table    Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff         1.2456        0.4121           3.0229    0.0065    0.3887      2.1026
Remote              -0.1184      0.0543            -2.1798   0.0408    -0.2314     -0.0054
Dubner              -0.2971      0.1179            -2.5189   0.0199    -0.5423     -0.0518
Total Labor         0.1305       0.0593            2.2004    0.0391    0.0072      0.2539

Step Information    Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables       0.7894       0.6231     0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose, based on the Cp selection criterion?
The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model, with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures,
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28802.0725/1013.4677 - (26 - 2(2) - 2) = 28.4193 - 20 = 8.4193
For the model suggested by the backward elimination procedure,
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21282.8217/1013.4677 - (26 - 2(4) - 2) = 21 - 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.
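These two Cp values can be reproduced with a few lines of arithmetic. Below is an illustrative Python sketch (not part of the original handout); `mallows_cp` is a hypothetical helper name, and the SSE and MSE figures are taken from the output shown above:

```python
def mallows_cp(sse_k, mse_full, n, k):
    """Mallows' Cp = SSE(k)/MSE(full) - (n - 2k - 2)."""
    return sse_k / mse_full - (n - 2 * k - 2)

n = 26
mse_full = 1013.4677          # MSE of the full four-variable model

# Model from forward selection / stepwise regression (Total Staff, Remote):
cp_two = mallows_cp(28802.0725, mse_full, n, k=2)

# Model from backward elimination (all four variables):
cp_four = mallows_cp(21282.8217, mse_full, n, k=4)

print(round(cp_two, 4), round(cp_four, 4))   # 8.4193 5.0
```

Only the two-variable model's Cp (8.4193) exceeds its k + 1 benchmark; the four-variable model sits exactly at k + 1 = 5.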
(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?
Model        k + 1   Cp      r²       adjusted r²   se
X1           2       13.32   0.3660   0.3396        38.62
X2           2       33.21   0.0091   -0.0322       48.28
X3           2       30.39   0.0597   0.0205        47.03
X4           2       24.18   0.1710   0.1365        44.16
X1X2         3       8.42    0.4899   0.4456        35.39
X1X3         3       10.65   0.4499   0.4021        36.75
X1X4         3       14.80   0.3754   0.3211        39.16
X2X3         3       32.31   0.0612   -0.0205       48.01
X2X4         3       23.25   0.2238   0.1563        43.65
X3X4         3       11.82   0.4288   0.3791        37.45
X1X2X3       4       7.84    0.5362   0.4729        34.50
X1X2X4       4       9.34    0.5092   0.4423        35.49
X1X3X4       4       7.75    0.5378   0.4748        34.44
X2X3X4       4       12.14   0.4591   0.3853        37.26
X1X2X3X4     5       5.00    0.6231   0.5513        31.84
Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination, adjusted r², is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.
The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).
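The two screening rules used above are easy to apply programmatically to the best subsets table. A hedged Python sketch (the 0.5 slack used to operationalize "close to k + 1" is an assumption, not part of the handout; the list is abbreviated to a few rows of the table):

```python
# Each entry: (model, k_plus_1, cp, r2, adj_r2, se) from the best subsets output.
subsets = [
    ("X1",       2, 13.32, 0.3660, 0.3396, 38.62),
    ("X1X2",     3,  8.42, 0.4899, 0.4456, 35.39),
    ("X1X2X3",   4,  7.84, 0.5362, 0.4729, 34.50),
    ("X1X3X4",   4,  7.75, 0.5378, 0.4748, 34.44),
    ("X1X2X3X4", 5,  5.00, 0.6231, 0.5513, 31.84),
]  # (abbreviated; the full table has 15 rows)

# Criterion 1: the model with the largest adjusted r-square.
best_adj = max(subsets, key=lambda row: row[4])

# Criterion 2: models whose Cp is close to or below k + 1 (0.5 slack assumed).
passes_cp = [row[0] for row in subsets if row[2] <= row[1] + 0.5]

print(best_adj[0], passes_cp)   # X1X2X3X4 ['X1X2X3X4']
```

Both criteria single out the model with all four explanatory variables.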
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the total labor hours reveals apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance. The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
[Residual plots (not reproduced here): Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit, Histogram of Residuals, Time Series Plot of Residuals]
This test statistic measures how much the sum of squared residuals, SSE, decreases by including the extra variables in the equation. It must decrease by some amount because the sum of squared residuals cannot increase when extra variables are added to an equation. But if it does not decrease sufficiently, the extra variables might not explain enough to justify their inclusion in the equation, and they should probably be excluded.

If the null hypothesis is true, this test statistic has an F distribution with df1 = k - j and df2 = n - k - 1 degrees of freedom. If the corresponding P-value is sufficiently small, you can reject the null hypothesis that the extra variables have no explanatory power.
To perform the partial F test in Excel, run two regressions: one for the reduced model (with explanatory variables x1 through xj) and one for the full model (with explanatory variables x1 through xk). Then use the appropriate values from their ANOVA tables to calculate the F test statistic, and use Excel's FDIST function to calculate the corresponding P-value.
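The hand calculation from the two ANOVA tables can be sketched in a few lines. An illustrative Python helper (not StatTools output; `partial_f` is a hypothetical name, and the numbers in the usage line come from the heating oil example elsewhere in this handout):

```python
def partial_f(sse_reduced, sse_full, mse_full, k, j):
    """Partial F statistic for adding k - j extra variables:

    F = [(SSE(reduced) - SSE(full)) / (k - j)] / MSE(full)
    """
    return (sse_reduced - sse_full) / (k - j) / mse_full

# Heating oil example: reduced model has j = 3 variables, full model k = 6.
f_stat = partial_f(sse_reduced=2728.3200, sse_full=1624.6475,
                   mse_full=203.0809, k=6, j=3)
print(round(f_stat, 4))   # 1.8115
```

The corresponding P-value then comes from the F distribution with df1 = k - j and df2 = n - k - 1.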
Reminder: The ANOVA table for the full equation has the following form:

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares (Variance)   F statistic   P-value
Regression            k                    SSR              MSR = SSR/k               F = MSR/MSE   Prob > F
Error                 n - k - 1            SSE              MSE = SSE/(n - k - 1)
Total                 n - 1                SST
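The entries of such an ANOVA table can be generated from SSR, SSE, n, and k alone. A small illustrative helper (Python, not part of the handout; the usage line uses figures from the heating oil regression that appears later):

```python
def anova_quantities(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k explanatory variables."""
    msr = ssr / k             # mean square regression
    mse = sse / (n - k - 1)   # mean square error
    return msr, mse, msr / mse

# Heating oil model with temperature, insulation, and style (n = 15, k = 3):
msr, mse, f = anova_quantities(ssr=233406.9094, sse=2728.3200, n=15, k=3)
print(round(msr, 4), round(mse, 4), round(f, 4))   # 77802.3031 248.0291 313.6822
```

SST = SSR + SSE closes the table, with n - 1 total degrees of freedom.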
Notes:

1. Many users look only at the r² and se values to check whether extra variables are doing a "good job". For example, they might cite that r² went from 80% to 90%, or that se went from 500 to 400, as evidence that extra variables provide a "significantly" better fit. Although these are important indicators, they are not the basis for a formal hypothesis test. The partial F test is the formal test of significance for an extra set of variables.

2. If the partial F test shows that a group of variables is significant, it does not imply that each variable in this group is significant. Some of these variables can have low t-values (and consequently large P-values). Some analysts favor excluding the individual variables that aren't significant, whereas others favor keeping the whole group or excluding the whole group. Either approach is valid. Fortunately, the final model-building results are often nearly the same either way.

3. StatTools performs partial F tests as part of the procedure for building regression models when the option Block is chosen in the Regression Type dropdown list.
Example 1: Heating Oil Consumption

A real estate developer wants to predict the heating oil consumption in single-family houses based on the effect of atmospheric temperature and the amount of attic insulation. Data are collected from a sample of 15 single-family houses. Of the 15 houses selected, houses 1, 4, 6, 7, 8, 10, and 12 are ranch-style houses. The data are organized and stored in Heating_Oil.xlsx:

House   Gallons   Temperature (°F)   Insulation (inches)   Style
1       275.3     40                 3                     1
2       363.8     27                 3                     0
…       …         …                  …                     …
14      323.0     38                 3                     0
15      52.5      58                 10                    0
(a) Develop and analyze an appropriate regression model.

The explanatory variables considered are:

x1 – atmospheric temperature
x2 – the amount of attic insulation
x3 – dummy variable = 1 if the style is ranch, 0 otherwise

Assuming that the slope between heating oil consumption and atmospheric temperature x1, and between heating oil consumption and the amount of attic insulation x2, is the same for both styles of houses, the regression model is

y = α + β1x1 + β2x2 + β3x3 + ε
The regression results for this model are:

Regression Statistics
Multiple R          0.9942
R Square            0.9884
Adjusted R Square   0.9853
Standard Error      15.7489
Observations        15

ANOVA
             df   SS            MS           F          Significance F
Regression   3    233406.9094   77802.3031   313.6822   0.0000
Residual     11   2728.3200     248.0291
Total        14   236135.2293

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     592.5401       14.3370          41.3295    0.0000    560.9846    624.0956
Temperature   -5.5251        0.2044           -27.0267   0.0000    -5.9751     -5.0752
Insulation    -21.3761       1.4480           -14.7623   0.0000    -24.5632    -18.1891
Style         -38.9727       8.3584           -4.6627    0.0007    -57.3695    -20.5759
(b) Interpret the regression coefficients.

The regression equation is

ŷ = 592.5401 - 5.5251x1 - 21.3761x2 - 38.9727x3

Predicted Consumption = 592.5401 - 5.5251(Temperature) - 21.3761(Insulation) - 38.9727(Style)

For houses that are ranch style, because x3 = 1, the regression equation reduces to

ŷ = 553.5674 - 5.5251x1 - 21.3761x2

For houses that are not ranch style, because x3 = 0, the regression equation reduces to

ŷ = 592.5401 - 5.5251x1 - 21.3761x2
The regression coefficients are interpreted as follows:

b1 = -5.5251: Holding constant the attic insulation and the house style, for each additional 1°F increase in atmospheric temperature, you estimate that the predicted heating oil consumption decreases by 5.5251 gallons.

b2 = -21.3761: Holding constant the atmospheric temperature and the house style, for each additional 1-inch increase in attic insulation, you estimate that the predicted heating oil consumption decreases by 21.3761 gallons.

b3 = -38.9727: b3 measures the effect on oil consumption of having a ranch-style house (x3 = 1) compared with having a house that is not ranch style (x3 = 0). Thus, with atmospheric temperature and attic insulation held constant, you estimate that the predicted heating oil consumption is 38.9727 gallons less for a ranch-style house than for a house that is not ranch style.
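The effect of the dummy variable is easiest to see by evaluating the fitted equation both ways. An illustrative sketch (Python, not part of the handout; coefficients are taken from the regression output above, and the temperature/insulation values in the usage line are arbitrary):

```python
b0, b1, b2, b3 = 592.5401, -5.5251, -21.3761, -38.9727

def predicted_gallons(temperature, insulation, ranch):
    """Fitted heating oil equation; ranch is 1 for ranch style, 0 otherwise."""
    return b0 + b1 * temperature + b2 * insulation + b3 * ranch

# Same temperature and insulation, two styles: the gap is exactly b3.
gap = predicted_gallons(30, 6, ranch=1) - predicted_gallons(30, 6, ranch=0)
print(round(gap, 4))   # -38.9727
```

Because the two style-specific equations share the same slopes, the dummy coefficient shifts only the intercept.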
(c) Does each of the three variables make a significant contribution to the regression model?

The three t-test statistics representing the slopes for temperature, insulation, and ranch style are -27.0267, -14.7623, and -4.6627. Each of the corresponding P-values is extremely small (less than 0.001). Thus, each of the three variables makes a significant contribution to the model. In addition, the coefficient of determination indicates that 98.84% of the variation in oil usage is explained by variation in temperature, insulation, and whether the house is ranch style.
(d) Determine whether adding the interaction terms makes a significant contribution to the model.

To evaluate possible interactions between the explanatory variables, three interaction terms are constructed as follows:

x4 = x1x2 (interaction between temperature and insulation)
x5 = x1x3 (interaction between temperature and style)
x6 = x2x3 (interaction between insulation and style)

The regression model is now

y = α + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + ε
The regression results for this model are
Regression Statistics
Multiple R          0.9966
R Square            0.9931
Adjusted R Square   0.9880
Standard Error      14.2506
Observations        15

ANOVA
             df   SS            MS           F          Significance F
Regression   6    234510.5818   39085.0970   192.4607   0.0000
Residual     8    1624.6475     203.0809
Total        14   236135.2293

                     Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept            642.8867       26.7059          24.0728   0.0000    581.3028    704.4706
Temperature          -6.9263        0.7531           -9.1969   0.0000    -8.6629     -5.1896
Insulation           -27.8825       3.5801           -7.7882   0.0001    -36.1383    -19.6268
Style                -84.6088       29.9956          -2.8207   0.0225    -153.7787   -15.4389
Temp*Insulation      0.1702         0.0886           1.9204    0.0911    -0.0342     0.3746
Temp*Style           0.6596         0.4617           1.4286    0.1910    -0.4051     1.7242
Insulation*Style     4.9870         3.5137           1.4193    0.1936    -3.1156     13.0895
To test whether the three interactions significantly improve the regression model, you use the partial F test. The null and alternative hypotheses are:

H0: β4 = β5 = β6 = 0 (There are no interactions among x1, x2, and x3.)
Ha: At least one of β4, β5, β6 is not zero. (x1 interacts with x2, and/or x1 interacts with x3, and/or x2 interacts with x3.)

From the full regression output (see above): SSE(full) = 1624.6475, MSE(full) = 203.0809
From the reduced regression output (see part (a)): SSE(reduced) = 2728.3200

The test statistic is

F = [SSE(reduced) - SSE(full)] / (number of extra terms) / MSE(full)
  = [(2728.3200 - 1624.6475)/3] / 203.0809 = 367.8908/203.0809 = 1.8115

df1 = k - j = 6 - 3 = 3, df2 = n - k - 1 = 15 - 6 - 1 = 8
P-value = FDIST(1.8115, 3, 8) = 0.2230
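Outside Excel, the FDIST value can be cross-checked with standard-library Python by numerically integrating the F density (an illustrative sketch, not part of the handout; `f_upper_tail` is a hypothetical helper name):

```python
import math

def f_upper_tail(x, d1, d2, steps=20000):
    """P(F > x) for an F(d1, d2) variable, via Simpson's rule on the density."""
    beta = math.gamma(d1 / 2) * math.gamma(d2 / 2) / math.gamma((d1 + d2) / 2)
    coef = (d1 / d2) ** (d1 / 2) / beta

    def pdf(t):
        if t <= 0:
            return 0.0
        return coef * t ** (d1 / 2 - 1) * (1 + d1 * t / d2) ** (-(d1 + d2) / 2)

    h = x / steps   # steps must be even for Simpson's rule
    total = pdf(0) + pdf(x)
    for i in range(1, steps):
        total += pdf(i * h) * (4 if i % 2 else 2)
    return 1 - total * h / 3

p_value = f_upper_tail(1.8115, 3, 8)
print(round(p_value, 4))   # ≈ 0.2230
```

In practice you would simply call Excel's FDIST (or a statistics library); the integration is shown only to make the definition of the P-value concrete.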
Because of the large P-value, you conclude that the interactions do not make a significant contribution to the model, given that the model already includes temperature x1, insulation x2, and whether the house is ranch style x3. Therefore, the multiple regression model using x1, x2, and x3 but no interaction terms is the better model.

If you rejected this null hypothesis, you would then test the contribution of each interaction separately in order to determine which interaction terms to include in the model.
Adjusted r²

Adding new explanatory variables will always keep the r² value the same or increase it; it can never decrease it. In general, adding explanatory variables to the model causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE. Because SSR = SST - SSE, when SSE becomes smaller, SSR becomes larger, causing r² = SSR/SST to increase. Therefore, if a variable is added to the model, r² usually becomes larger even if the variable added is not statistically significant. This can lead to "fishing expeditions", where you keep adding variables to the model, some of which have no conceptual relationship to the response variable, just to inflate the r² value.
To avoid overestimating the impact of adding an explanatory variable on the amount of variability explained by the estimated regression equation, many analysts prefer adjusting r² for the number of explanatory variables. The adjusted r² is defined as

adjusted r² = 1 - (1 - r²)(n - 1)/(n - k - 1)

The adjusted r² imposes a "penalty" for each new term that is added to the model, in an attempt to make models of different sizes (numbers of explanatory variables) comparable. It can decrease when unnecessary explanatory variables are added to the regression model. Therefore, it serves as an index that you can monitor. If you add variables and the adjusted r² decreases, the extra variables are essentially not pulling their weight and should probably be omitted.
For the full model of the Heating Oil Consumption example (with the interaction terms), n = 15, k = 6, and r² = 0.9931. Thus, the adjusted r² is

adjusted r² = 1 - (1 - r²)(n - 1)/(n - k - 1) = 1 - (1 - 0.9931)(15 - 1)/(15 - 6 - 1) = 1 - (0.0069)(1.75) = 0.9880
The adjusted r² for the reduced model (without the interaction terms) is 0.9853. The adjusted r² for the full model indicates too small an improvement in explaining the variation in the consumption of heating oil to justify keeping the interaction terms in the model, even if the partial F test were significant.
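Both adjusted r² values can be confirmed from the formula. A quick Python sketch (not part of the handout); starting from the rounded r² values in the outputs, the results agree with the reported 0.9880 and 0.9853 to within rounding:

```python
def adjusted_r2(r2, n, k):
    """Adjusted r-square: 1 - (1 - r2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

full = adjusted_r2(0.9931, n=15, k=6)      # model with interaction terms
reduced = adjusted_r2(0.9884, n=15, k=3)   # model without interaction terms
print(round(full, 4), round(reduced, 4))
```

Note that the penalty factor (n - 1)/(n - k - 1) grows with k, which is what allows the adjusted r² to fall when weak variables are added.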
Note: It can happen that the value of the adjusted r² is negative. This is not a mistake, but a result of a model that fits the data very poorly. In this case, some software systems set the adjusted r² equal to 0; Excel will print the actual value.
Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic, developed by Mallows. The formula for computing Cp is

Cp = SSE(k)/MSE(full) - (n - 2k - 2)

where
SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, …
MSE(full) is the mean square error for a regression model that has all explanatory variables included.

Theory says that if the value of Cp is large, then the mean square error of the fitted values is large, indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is much greater than k + 1, then there is a large bias component in the regression, usually indicating omission of an important variable. Therefore, when evaluating which regression is best, it is recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between alternate regressions, keep in mind that the bias is measured with respect to the total group of variables provided by the researcher. This criterion cannot determine when the researcher has forgotten about some variable not included in the total group.
Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly the most difficult part of any real regression analysis problem. You are always trying to get the best fit possible. The principle of parsimony suggests using the fewest number of explanatory variables that can predict the response variable adequately. Regression models with fewer explanatory variables are easier to interpret and are less likely to be affected by interaction or collinearity problems. On the other hand, more variables certainly increase r², and they usually reduce the standard error of estimate, se. This presents a trade-off, which is the heart of the challenge of selecting a good model.

The best regression models, in addition to satisfying the conditions of multiple regression, have:

• Relatively few explanatory variables
• Relatively high r² and adjusted r², indicating that much of the variability in y is accounted for by the regression model
• A small value of Cp (close to or less than k + 1)
• A relatively small value of se, the standard deviation of the residuals, indicating that the magnitude of the errors is small
• Relatively small P-values for the F- and t-statistics, showing that the overall model is better than a simple summary with the mean, and that the individual parameters are reliably different from zero
Here are several guidelines for including and excluding variables. These guidelines are not ironclad rules. They typically involve choices at the margin, that is, between equations that are very similar and seem equally useful.

Guidelines for Including/Excluding Variables in a Regression Model
1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted significance level, such as 0.05, this variable is a candidate for exclusion.
2. It is a mathematical fact that
– If the t-value is less than 1 (in absolute value), then se will decrease and the adjusted r² will increase if this variable is excluded from the equation.
– If the t-value is greater than 1, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and including variables with t-values greater than 1. However, analysts who base the decision on statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable from the equation unless its t-value is at least 2 (approximately). This latter approach is more stringent – fewer variables will be retained – but it is probably the more popular approach.
3. When there is a group of variables that are in some sense logically related, it is sometimes a good idea to include all of them or exclude all of them. In this case, their individual t-values are less relevant. Instead, a partial F test can be used to make the include/exclude decision.
4. Use economic, theoretical, or practical considerations to decide whether to include or exclude variables. Some variables might really belong in an equation because of their theoretical relationship with the response variable, and their low t-values, possibly the result of an unlucky sample, should not necessarily disqualify them from being in the equation. Similarly, a variable that has no economic or physical relationship with the response variable might have a significant t-value just by chance. This does not necessarily mean that it should be included in the equation.
You should not agonize too much about whether to include or exclude a variable "at the margin". If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat cleaner model, and you probably won't see any dramatic shifts in Cp, r², adjusted r², or se. On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious and you have one more variable to interpret, but otherwise there is no real penalty for including it.
In real applications, there are often several equations that, for all practical purposes, are equally useful for describing the relationships or making predictions. There are so many aspects of what makes a model useful that human judgment is necessary to make a final choice. For example, in addition to favoring explanatory variables that can be measured reliably, you may want to favor those that are less expensive to measure. The statistician George Box, who had an illustrious academic career at the University of Wisconsin, is often quoted as saying: "All models are wrong, but some models are useful."
Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the relationship between a response variable and one or more explanatory variables. The major issues in model building are finding the proper functional form of the relationship and selecting the explanatory variables to be included in the model.

Many statistical packages provide some assistance by including automatic model-building options. These options estimate a series of regression models by successively adding or deleting variables according to prescribed rules. These rules can vary from package to package, but usually the t test for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to determine whether variables are added or deleted. The levels of significance α1 and α2 for determining whether an explanatory variable should be entered into the model or removed from the model are typically referred to as P-value to Enter and P-value to Leave. Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.

The four most common types of model-building procedures that statistical packages implement are forward selection, backward elimination, stepwise regression, and best subsets regression. Today, many businesses use these variable selection procedures as part of the research technique called data mining, which tries to identify significant statistical relationships in very large data sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively adds variables one at a time until no remaining variables make a significant contribution. The forward selection procedure does not permit a variable to be removed from the model once it has been entered. The procedure stops if the P-value for each of the explanatory variables not in the model is greater than the prescribed P-value to Enter.
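The logic of this procedure can be sketched in a few lines, assuming a helper p_value_if_added that returns the entry P-value a candidate variable would have if added (standing in for the package's t test). The toy values below reuse the 0.0011 and 0.0269 P-values from Example 2 later in this handout; the rest are made up for illustration:

```python
def forward_selection(candidates, p_value_if_added, p_enter=0.05):
    """Greedy forward selection: at each step add the candidate whose
    entry P-value is smallest, stopping when none is below p_enter.
    Once a variable enters, it is never removed."""
    model, remaining = [], list(candidates)
    while remaining:
        pvals = {x: p_value_if_added(model, x) for x in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > p_enter:
            break  # no remaining variable meets "P-value to Enter"
        model.append(best)
        remaining.remove(best)
    return model

# Toy entry P-values loosely mimicking the standby-hours example
# (0.0011 and 0.0269 come from the example; the others are invented):
toy = {
    (): {"Total Staff": 0.0011, "Remote": 0.30, "Dubner": 0.20, "Total Labor": 0.06},
    ("Total Staff",): {"Remote": 0.0269, "Dubner": 0.20, "Total Labor": 0.25},
    ("Total Staff", "Remote"): {"Dubner": 0.12, "Total Labor": 0.09},
}
chosen = forward_selection(["Total Staff", "Remote", "Dubner", "Total Labor"],
                           lambda m, x: toy[tuple(m)][x])
# chosen == ["Total Staff", "Remote"]
```

With these toy P-values the procedure enters Total Staff, then Remote, then stops, matching the behavior described above.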
The backward elimination procedure begins with a model that includes all potential explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value to the prescribed P-value to Leave. The backward elimination procedure does not permit a variable to be reentered once it has been removed. The procedure stops when none of the explanatory variables in the model have a P-value greater than P-value to Leave.
The stepwise regression procedure is much like the forward procedure, except that it also considers possible deletions along the way. Because of the nature of the stepwise regression procedure, an explanatory variable can enter the model at one step, be removed at a subsequent step, and then enter the model at a later step. The procedure stops when no explanatory variables can be removed from or entered into the model.
The best subsets regression procedure works by trying possible subsets from the list of possible explanatory variables. This procedure does not actually compute all possible regressions; there are ways to exclude models known to be worse than some already examined models. Typical computer output reports results for a collection of "best" models: usually the two best one-variable models, the two best two-variable models, the two best three-variable models, and so on. The user can then select the best model based on such measures as Cp, r², adjusted r², and the standard error se.
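The enumeration idea behind best subsets can be sketched as follows, assuming a caller-supplied sse(subset) function that returns each candidate model's error sum of squares (real packages prune subsets rather than score every one; the toy SSE values are made up):

```python
from itertools import combinations

def best_subsets(variables, sse, n, top=2):
    """Score every subset of each size by adjusted r-squared and keep the
    `top` models per size. `sse(subset)` returns the error sum of squares
    for that model; sse(()) is the total sum of squares SST."""
    sst = sse(())
    results = {}
    for k in range(1, len(variables) + 1):
        scored = []
        for subset in combinations(variables, k):
            r2 = 1 - sse(subset) / sst
            adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
            scored.append((adj_r2, subset))
        scored.sort(reverse=True)   # best adjusted r-squared first
        results[k] = scored[:top]
    return results

# Toy (hypothetical) SSE values for two candidate variables, n = 26
toy_sse = {(): 100.0, ("x1",): 60.0, ("x2",): 80.0, ("x1", "x2"): 50.0}
best = best_subsets(["x1", "x2"], lambda s: toy_sse[tuple(s)], n=26)
# best[1][0][1] == ("x1",)  -- the better one-variable model
```

Here the one-variable model with x1 wins its size class because its adjusted r² (0.375) beats that of x2.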
In most cases the final results of these four procedures are very similar. However, there is no guarantee that they will all produce exactly the same final equation. Deciding which estimated regression equation to use remains a topic for discussion. Ultimately the analyst's judgment must be applied.
Excel does not come with any variable selection techniques built in. StatTools can be used for forward selection, backward elimination, and stepwise regression, but it cannot perform the best subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses. Currently the graphic artists at the station receive hourly pay for a significant number of hours during which they are idle. These hours are called standby hours. The operations manager wants to determine which factors most heavily affect standby hours of graphic artists. Over a period of 26 weeks he collected data concerning standby hours (y) and four factors that he suspects are related to the excessive number of standby hours the station is currently experiencing:
x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours
The data are organized and stored in Standby.xlsx
Week   Standby   Total Staff   Remote   Dubner   Total Labor
1      245       338           414      323      2001
2      177       333           598      340      2030
...    ...       ...           ...      ...      ...
25     261       315           164      223      1839
26     232       331           270      272      1935
How do you build a multiple regression model with the most appropriate mix of explanatory variables?
Solution
(a) Compute the variance inflation factors to measure the amount of collinearity among the explanatory variables. (Reminder: VIFj = 1/(1 − rj²), where rj² is the coefficient of determination from regressing xj on the other explanatory variables.)
This is always a good starting point for any multiple regression analysis. It involves running four regressions – one regression for each explanatory variable against the other x variables. The following table summarizes the results.
                     Total Staff       Remote            Dubner            Total Labor
                     and all other X   and all other X   and all other X   and all other X
Multiple R           0.6437            0.4349            0.5610            0.7070
R Square             0.4143            0.1891            0.3147            0.4998
Adjusted R Square    0.3345            0.0786            0.2213            0.4316
Standard Error       16.4715           124.9392          57.5525           114.4118
Observations         26                26                26                26
VIF                  1.7074            1.2333            1.4592            1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among the set of explanatory variables.
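The VIF column can be reproduced directly from the auxiliary-regression R Square values in the table above (a sketch; the last digit can differ slightly because the table was computed from unrounded r² values):

```python
def vif(r_squared):
    """Variance inflation factor for one explanatory variable, given the
    r-squared from regressing that variable on all the other x's."""
    return 1 / (1 - r_squared)

# R Square values from the four auxiliary regressions above
r2 = {"Total Staff": 0.4143, "Remote": 0.1891,
      "Dubner": 0.3147, "Total Labor": 0.4998}
vifs = {name: round(vif(v), 4) for name, v in r2.items()}
# e.g. Total Staff -> 1.7074, Dubner -> 1.4592
```

All four values stay well below the usual cutoff of 5, confirming the conclusion above.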
(b) Run forward selection, backward elimination, and stepwise regression and compare the results.
StatTools regression output from running the three procedures is shown below. A significance level of 0.05 is used to enter a variable into the model or to delete a variable from the model (that is, P-value to Enter = P-value to Leave = 0.05).
The correlations between the response variable and the explanatory variables are:

           Total Staff   Remote    Dubner    Total Labor
Standby    0.6050        -0.0953   -0.2443   0.4136
As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t test of total staff is 0.0011. (Note: StatTools does not show it in the final output.) Because it is less than 0.05, total staff is included in the regression model.
The next step involves selecting a second independent variable for the model. The second variable chosen is one that makes the largest contribution to the model, given that the first variable has been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.
After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.
The next step involves selecting a third independent variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes total staff present and the number of remote hours.
The backward elimination procedure produces a model that includes all explanatory variables.
Forward Selection

Summary              Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                     0.6999       0.4899     0.4456              35.3873

ANOVA Table          Degrees of   Sum of       Mean of      F-Ratio   p-Value
                     Freedom      Squares      Squares
Explained            2            27662.5429   13831.2714   11.0450   0.0004
Unexplained          23           28802.0725   1252.2640

Regression Table     Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant             -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff          1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote               -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information     Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff          0.6050       0.3660     0.3396              38.6206             1
Remote               0.6999       0.4899     0.4456              35.3873             2
Stepwise Regression

Summary              Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                     0.6999       0.4899     0.4456              35.3873

ANOVA Table          Degrees of   Sum of       Mean of      F-Ratio   p-Value
                     Freedom      Squares      Squares
Explained            2            27662.5429   13831.2714   11.0450   0.0004
Unexplained          23           28802.0725   1252.2640

Regression Table     Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant             -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff          1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote               -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information     Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff          0.6050       0.3660     0.3396              38.6206             Enter
Remote               0.6999       0.4899     0.4456              35.3873             Enter
Backward Elimination

Summary              Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                     0.7894       0.6231     0.5513              31.8350

ANOVA Table          Degrees of   Sum of       Mean of     F-Ratio   p-Value
                     Freedom      Squares      Squares
Explained            4            35181.7937   8795.4484   8.6786    0.0003
Unexplained          21           21282.8217   1013.4677

Regression Table     Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant             -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff          1.2456        0.4121           3.0229    0.0065    0.3887      2.1026
Remote               -0.1184       0.0543           -2.1798   0.0408    -0.2314     -0.0054
Dubner               -0.2971       0.1179           -2.5189   0.0199    -0.5423     -0.0518
Total Labor          0.1305        0.0593           2.2004    0.0391    0.0072      0.2539

Step Information     Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables        0.7894       0.6231     0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose based on the Cp selection criterion?
The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model – with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) − (n − 2k − 2)
   = 28802.0725/1013.4677 − (26 − 2(2) − 2) = 28.4193 − 20 = 8.4193

For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) − (n − 2k − 2)
   = 21282.8217/1013.4677 − (26 − 2(4) − 2) = 21 − 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5.
Thus, according to the Cp criterion, the model including all four variables is the better model.
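The Cp arithmetic above can be packaged as a one-line helper (the SSE and MSE values below are taken from the example output):

```python
def cp_statistic(sse_k, mse_full, n, k):
    """Mallows' Cp for a k-variable candidate model, using the MSE of the
    full model as the estimate of the error variance."""
    return sse_k / mse_full - (n - 2 * k - 2)

# Values from the standby-hours example
cp_two_var = cp_statistic(28802.0725, 1013.4677, n=26, k=2)   # about 8.4193
cp_full    = cp_statistic(21282.8217, 1013.4677, n=26, k=4)   # exactly 5
```

A model is considered adequate when its Cp is close to or below k + 1, which only the full model achieves here.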
(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model        k + 1   Cp      r²       r²adj     se
X1           2       13.32   0.3660   0.3396    38.62
X2           2       33.21   0.0091   -0.0322   48.28
X3           2       30.39   0.0597   0.0205    47.03
X4           2       24.18   0.1710   0.1365    44.16
X1X2         3       8.42    0.4899   0.4456    35.39
X1X3         3       10.65   0.4499   0.4021    36.75
X1X4         3       14.80   0.3754   0.3211    39.16
X2X3         3       32.31   0.0612   -0.0205   48.01
X2X4         3       23.25   0.2238   0.1563    43.65
X3X4         3       11.82   0.4288   0.3791    37.45
X1X2X3       4       7.84    0.5362   0.4729    34.50
X1X2X4       4       9.34    0.5092   0.4423    35.49
X1X3X4       4       7.75    0.5378   0.4748    34.44
X2X3X4       4       12.14   0.4591   0.3853    37.26
X1X2X3X4     5       5.00    0.6231   0.5513    31.84
Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination r²adj is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.
The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables considered has a Cp value close to or below k + 1.
Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.
The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.
None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the total labor hours reveal apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance.
The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
[Residual plots, not reproduced here: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals by week.]
Example 1
Heating Oil Consumption
A real estate developer wants to predict the heating oil consumption in single-family houses based on the effect of atmospheric temperature and the amount of attic insulation.
Data are collected from a sample of 15 single-family houses. Of the 15 houses selected, houses 1, 4, 6, 7, 8, 10, and 12 are ranch-style houses. The data are organized and stored in Heating_Oil.xlsx

House   Gallons   Temperature (F)   Insulation (inch)   Style
1       275.3     40                3                   1
2       363.8     27                3                   0
...     ...       ...               ...                 ...
14      323       38                3                   0
15      52.5      58                10                  0
(a) Develop and analyze an appropriate regression model.
The explanatory variables considered are
x1 – atmospheric temperature
x2 – the amount of attic insulation
x3 – dummy variable = 1 if the style is ranch, 0 otherwise
Assuming that the slope between heating oil consumption and atmospheric temperature x1, and between heating oil consumption and the amount of attic insulation x2, is the same for both styles of houses, the regression model is

y = α + β1x1 + β2x2 + β3x3 + ε

The regression results for this model are
Regression Statistics
Multiple R           0.9942
R Square             0.9884
Adjusted R Square    0.9853
Standard Error       15.7489
Observations         15

ANOVA
             df   SS            MS           F          Significance F
Regression   3    233406.9094   77802.3031   313.6822   0.0000
Residual     11   2728.3200     248.0291
Total        14   236135.2293

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     592.5401       14.3370          41.3295    0.0000    560.9846    624.0956
Temperature   -5.5251        0.2044           -27.0267   0.0000    -5.9751     -5.0752
Insulation    -21.3761       1.4480           -14.7623   0.0000    -24.5632    -18.1891
Style         -38.9727       8.3584           -4.6627    0.0007    -57.3695    -20.5759
(b) Interpret the regression coefficients.
The regression equation is

ŷ = 592.5401 − 5.5251x1 − 21.3761x2 − 38.9727x3

Predicted Consumption = 592.5401 − 5.5251 Temperature − 21.3761 Insulation − 38.9727 Style

For houses that are ranch style, because x3 = 1, the regression equation reduces to

ŷ = 553.5674 − 5.5251x1 − 21.3761x2

For houses that are not ranch style, because x3 = 0, the regression equation reduces to

ŷ = 592.5401 − 5.5251x1 − 21.3761x2

The regression coefficients are interpreted as follows:
b1 = −5.5251: Holding constant the attic insulation and the house style, for each additional 1°F increase in atmospheric temperature you estimate that the predicted heating oil consumption decreases by 5.5251 gallons.
b2 = −21.3761: Holding constant the atmospheric temperature and the house style, for each additional 1-inch increase in attic insulation you estimate that the predicted heating oil consumption decreases by 21.3761 gallons.
b3 = −38.9727: b3 measures the effect on oil consumption of having a ranch-style house (x3 = 1) compared with having a house that is not ranch style (x3 = 0). Thus, with atmospheric temperature and attic insulation held constant, you estimate that the predicted heating oil consumption is 38.9727 gallons less for a ranch-style house than for a house that is not ranch style.
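The fitted equation can be wrapped in a small helper to compare predictions for the two house styles (the 30°F and 6-inch inputs are made-up illustration values, not from the data set):

```python
def predict_consumption(temperature, insulation, ranch_style):
    """Predicted heating oil consumption (gallons) from the fitted model
    yhat = 592.5401 - 5.5251*x1 - 21.3761*x2 - 38.9727*x3."""
    return (592.5401 - 5.5251 * temperature
            - 21.3761 * insulation - 38.9727 * (1 if ranch_style else 0))

# Two hypothetical houses at 30°F with 6 inches of insulation,
# differing only in style:
ranch     = predict_consumption(30, 6, True)
not_ranch = predict_consumption(30, 6, False)
# their difference is exactly the style coefficient, 38.9727 gallons
```

This makes the interpretation of b3 concrete: holding temperature and insulation fixed, the two predictions differ by precisely 38.9727 gallons.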
(c) Does each of the three variables make a significant contribution to the regression model?
The three t-test statistics representing the slopes for temperature, insulation, and ranch style are −27.0267, −14.7623, and −4.6627. Each of the corresponding P-values is extremely small (less than 0.001). Thus each of the three variables makes a significant contribution to the model.
In addition, the coefficient of determination indicates that 98.84% of the variation in oil usage is explained by variation in temperature, insulation, and whether the house is ranch style.
(d) Determine whether adding the interaction terms makes a significant contribution to the model.
To evaluate possible interactions between the explanatory variables, three interaction terms are constructed as follows:
x4 = x1 × x2 (interaction between temperature and insulation)
x5 = x1 × x3 (interaction between temperature and style)
x6 = x2 × x3 (interaction between insulation and style)
The regression model is now

y = α + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + ε

The regression results for this model are
Regression Statistics
Multiple R           0.9966
R Square             0.9931
Adjusted R Square    0.9880
Standard Error       14.2506
Observations         15

ANOVA
             df   SS            MS           F          Significance F
Regression   6    234510.5818   39085.0970   192.4607   0.0000
Residual     8    1624.6475     203.0809
Total        14   236135.2293

                   Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept          642.8867       26.7059          24.0728   0.0000    581.3028     704.4706
Temperature        -6.9263        0.7531           -9.1969   0.0000    -8.6629      -5.1896
Insulation         -27.8825       3.5801           -7.7882   0.0001    -36.1383     -19.6268
Style              -84.6088       29.9956          -2.8207   0.0225    -153.7787    -15.4389
Temp*Insulation    0.1702         0.0886           1.9204    0.0911    -0.0342      0.3746
Temp*Style         0.6596         0.4617           1.4286    0.1910    -0.4051      1.7242
Insulation*Style   4.9870         3.5137           1.4193    0.1936    -3.1156      13.0895
To test whether the three interactions significantly improve the regression model, you use the partial F test. The null and alternative hypotheses are:

H0: β4 = β5 = β6 = 0 (There are no interactions among x1, x2, and x3)
Ha: At least one of β4, β5, β6 is not zero (x1 interacts with x2, and/or x1 interacts with x3, and/or x2 interacts with x3)

From the full regression output (see above):
SSE(full) = 1624.6475, MSE(full) = 203.0809

From the reduced regression output (see part (a)):
SSE(reduced) = 2728.3200

The test statistic is

F = [SSE(reduced) − SSE(full)] / (number of extra terms) / MSE(full)
  = [(2728.3200 − 1624.6475) / 3] / 203.0809 = 367.8908 / 203.0809 = 1.8115

df1 = k − j = 6 − 3 = 3, df2 = n − k − 1 = 15 − 6 − 1 = 8
P-value = FDIST(1.8115, 3, 8) = 0.2230
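The partial F computation above can be sketched as a small helper, using the SSE and MSE values from the two outputs (the P-value step still needs an F distribution, as with Excel's FDIST):

```python
def partial_f(sse_reduced, sse_full, extra_terms, mse_full):
    """Partial F statistic for testing whether a group of extra variables
    significantly improves a regression model."""
    return (sse_reduced - sse_full) / extra_terms / mse_full

# Heating-oil example: the full model adds 3 interaction terms
f_stat = partial_f(2728.3200, 1624.6475, 3, 203.0809)   # about 1.8115
```

Comparing f_stat with the F(3, 8) distribution reproduces the P-value of 0.2230 reported above.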
Because of the large P-value, you conclude that the interactions do not make a significant contribution to the model, given that the model already includes temperature x1, insulation x2, and whether the house is ranch style x3. Therefore, the multiple regression model using x1, x2, and x3 but no interaction terms is the better model.
If you rejected this null hypothesis, you would then test the contribution of each interaction separately in order to determine which interaction terms to include in the model.
Adjusted r²
Adding new explanatory variables will always keep the r² value the same or increase it; it can never decrease it. In general, adding explanatory variables to the model causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE. Because SSR = SST − SSE, when SSE becomes smaller, SSR becomes larger, causing r² = SSR/SST to increase. Therefore, if a variable is added to the model, r² usually becomes larger even if the variable added is not statistically significant. This can lead to "fishing expeditions", where you keep adding variables to the model, some of which have no conceptual relationship to the response variable, just to inflate the r² value.
To avoid overestimating the impact of adding an explanatory variable on the amount of
variability explained by the estimated regression equation, many analysts prefer adjusting r² for
the number of explanatory variables. The adjusted r² is defined as

r²adj = 1 - (1 - r²) (n - 1)/(n - k - 1)
The adjusted r² imposes a "penalty" for each new term that is added to the model in an attempt
to make models of different sizes (numbers of explanatory variables) comparable. It can decrease
when unnecessary explanatory variables are added to the regression model. Therefore it serves
as an index that you can monitor: if you add variables and the adjusted r² decreases, the extra
variables are essentially not pulling their weight and should probably be omitted.
For the full model of the Heating Oil Consumption example (with the interaction terms),
n = 15, k = 6, and r² = 0.9931. Thus the adjusted r² is

r²adj = 1 - (1 - r²) (n - 1)/(n - k - 1) = 1 - (1 - 0.9931) (15 - 1)/(15 - 6 - 1) = 1 - (0.0069)(1.75) = 0.9880
The adjusted r² for the reduced model (without the interaction terms) is 0.9853.
The adjusted r² for the full model indicates too small an improvement in explaining the variation
in the consumption of heating oil to justify keeping the interaction terms in the model, even if the
partial F test were significant.
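The adjusted r² computation above is easy to wrap in a small helper for reuse; a minimal sketch (the function name is mine):

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Full heating-oil model: n = 15 observations, k = 6 explanatory variables
print(round(adjusted_r2(0.9931, 15, 6), 3))  # 0.988
```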
Note: It can happen that the value of r²adj is negative. This is not a mistake but a result of a model that fits
the data very poorly. In this case some software systems set r²adj equal to 0; Excel will print the actual value.
Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic
developed by Mallows. The formula for computing Cp is

Cp = SSE(k)/MSE(full) - (n - 2k - 2)

where
SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, ...
MSE(full) is the mean square error for a regression model that has all explanatory variables included.
Theory says that if the value of Cp is large, then the mean square error of the fitted values is large,
indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is
much greater than k + 1, then there is a large bias component in the regression, usually indicating
omission of an important variable. Therefore, when evaluating which regression is best, it is
recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between
alternate regressions, keep in mind that the bias is measured with respect to the total group of
variables provided by the researcher. This criterion cannot determine when the researcher has
forgotten about some variable not included in the total group.
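The Cp formula can be expressed as a one-line helper; the following is an illustrative sketch (function and variable names are mine, and the numbers in the sanity check are made up). It also demonstrates the fact that for the full model, SSE(k) = SSE(full) = MSE(full)(n - k - 1), so Cp collapses to exactly k + 1:

```python
def mallows_cp(sse_k, mse_full, n, k):
    """Mallows' C_p = SSE(k) / MSE(full) - (n - 2k - 2)."""
    return sse_k / mse_full - (n - 2 * k - 2)

# Sanity check with made-up numbers: n = 30, k = 3, MSE(full) = 2.5.
# For the full model SSE(k) = MSE(full) * (n - k - 1), so C_p = k + 1.
print(mallows_cp(2.5 * (30 - 3 - 1), 2.5, 30, 3))  # 4.0
```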
Include/Exclude Decisions
Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly
the most difficult part of any real regression analysis problem. You are always trying to get the
best fit possible. The principle of parsimony suggests using the fewest number of explanatory
variables that can predict the response variable adequately. Regression models with fewer
explanatory variables are easier to interpret and are less likely to be affected by interaction or
collinearity problems. On the other hand, more variables certainly increase r², and they usually
reduce the standard error of estimate se. This presents a trade-off, which is the heart of the
challenge of selecting a good model.
The best regression models, in addition to satisfying the conditions of multiple regression, have:
bull Relatively few explanatory variables
bull Relatively high r² and r²adj, indicating that much of the variability in y is accounted for by
the regression model
bull A small value of Cp (close to or less than k + 1)
bull A relatively small value of se, the standard deviation of the residuals, indicating that the
magnitude of the errors is small
bull Relatively small P-values for the F- and t-statistics, showing that the overall model is better than
a simple summary with the mean, and that the individual parameters are reliably different from zero
Here are several guidelines for including and excluding variables. These guidelines are not
ironclad rules; they typically involve choices at the margin, that is, between equations that are
very similar and seem equally useful.
Guidelines for Including/Excluding Variables in a Regression Model
1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted
significance level, such as 0.05, this variable is a candidate for exclusion.

2. It is a mathematical fact that
- If the t-value is less than 1 (in absolute value), then se will decrease and adjusted r² will increase if this variable is excluded
from the equation.
- If the t-value is greater than 1, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and
including variables with t-values greater than 1. However, analysts who base the decision on
statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable
from the equation unless its t-value is at least 2 (approximately). This latter approach is more
stringent - fewer variables will be retained - but it is probably the more popular approach.
3. When there is a group of variables that are in some sense logically related, it is sometimes a
good idea to include all of them or exclude all of them. In this case their individual t-values
are less relevant. Instead, a partial F test can be used to make the include/exclude decision.

4. Use economic, theoretical, or practical considerations to decide whether to include or exclude
variables. Some variables might really belong in an equation because of their theoretical
relationship with the response variable, and their low t-values, possibly the result of an
unlucky sample, should not necessarily disqualify them from being in the equation.
Similarly, a variable that has no economic or physical relationship with the response variable
might have a significant t-value just by chance. This does not necessarily mean that it should
be included in the equation.
You should not agonize too much about whether to include or exclude a variable "at the margin".
If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat
cleaner model, and you probably won't see any dramatic shifts in Cp, r², r²adj, or se.
On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious
and you have one more variable to interpret, but otherwise there is no real penalty for including it.
In real applications there are often several equations that, for all practical purposes, are equally
useful for describing the relationships or making predictions. There are so many aspects of what
makes a model useful that human judgment is necessary to make a final choice. For example,
in addition to favoring explanatory variables that can be measured reliably, you may want to
favor those that are less expensive to measure. The statistician George Box, who had an
illustrious academic career at the University of Wisconsin, is often quoted saying:
"All models are wrong, but some models are useful."
Variable Selection Procedures
Model building is the process of developing an estimated regression equation that describes the
relationship between a response variable and one or more explanatory variables
The major issues in model building are finding the proper functional form of the relationship and
selecting the explanatory variables to be included in the model
Many statistical packages provide some assistance by including automatic model-building options.
These options estimate a series of regression models by successively adding or deleting variables
according to prescribed rules. These rules can vary from package to package, but usually the t test
for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to
determine whether variables are added or deleted. The levels of significance α1 and α2 for
determining whether an explanatory variable should be entered into the model or removed from
the model are typically referred to as P-value to Enter and P-value to Leave.
Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.
The four most common types of model-building procedures that statistical packages implement are
forward selection, backward elimination, stepwise regression, and best subsets regression.

Today many businesses use these variable selection procedures as part of the research technique
called data mining, which tries to identify significant statistical relationships in very large data
sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively
adds variables one at a time until no remaining variables make a significant contribution.
The forward selection procedure does not permit a variable to be removed from the model once it
has been entered. The procedure stops when the P-value for each of the explanatory variables not in
the model is greater than the prescribed P-value to Enter.
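As a sketch of the idea (not the actual StatTools implementation), forward selection can be written in a few lines of NumPy/SciPy. The helper names are mine and the demo data are synthetic:

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided p-values for the slopes of y ~ intercept + X, via least squares."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df = n - A.shape[1]
    mse = resid @ resid / df
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    t = beta / se
    return 2 * stats.t.sf(np.abs(t), df)[1:]   # drop the intercept's p-value

def forward_select(X, y, p_to_enter=0.05):
    """Add the most significant candidate each round; stop when none beats p_to_enter."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        best_p, best_j = None, None
        for j in remaining:
            p = ols_pvalues(X[:, selected + [j]], y)[-1]  # candidate's p-value
            if best_p is None or p < best_p:
                best_p, best_j = p, j
        if best_p >= p_to_enter:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Demo on synthetic data: y truly depends on columns 0 and 2 only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)
print(forward_select(X, y))  # expected to pick up columns 0 and 2
```

Note that, just as described above, this sketch never removes a variable once it has entered; a stepwise version would re-test the selected variables after each entry.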
The backward elimination procedure begins with a model that includes all potential
explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value
to the prescribed P-value to Leave. The backward elimination procedure does not permit a
variable to be reentered once it has been removed. The procedure stops when none of the
explanatory variables in the model have a P-value greater than P-value to Leave.
The stepwise regression procedure is much like the forward procedure except that it also considers
possible deletions along the way. Because of the nature of the stepwise regression procedure,
an explanatory variable can enter the model at one step, be removed at a subsequent step,
and then enter the model at a later step. The procedure stops when no explanatory variables can
be removed from or entered into the model.
The best subsets regression procedure works by trying possible subsets from the list of possible
explanatory variables. This procedure does not actually compute all possible regressions;
there are ways to exclude models known to be worse than some already examined models.
Typical computer output reports results for a collection of "best" models, usually the two best
one-variable models, the two best two-variable models, the two best three-variable models, and so on.
The user can then select the best model based on such measures as Cp, r², r²adj, and se.
In most cases the final results of these four procedures are very similar. However, there is no guarantee
that they will all produce exactly the same final equation. Deciding which estimated regression
equation to use remains a topic for discussion; ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for
forward selection, backward elimination, and stepwise regression, but it cannot perform best
subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses.
Currently the graphic artists at the station receive hourly pay for a significant number of hours
during which they are idle. These hours are called standby hours. The operations manager wants to
determine which factors most heavily affect standby hours of graphic artists. Over a period of 26
weeks he collected data concerning standby hours (y) and four factors that he suspects are related
to the excessive number of standby hours the station is currently experiencing:

x1 - the total number of staff present
x2 - remote hours
x3 - Dubner hours
x4 - total labor hours

The data are organized and stored in Standby.xlsx.
Week   Standby   Total Staff   Remote   Dubner   Total Labor
1      245       338           414      323      2001
2      177       333           598      340      2030
...    ...       ...           ...      ...      ...
25     261       315           164      223      1839
26     232       331           270      272      1935
How can you build a multiple regression model with the most appropriate mix of explanatory variables?
Solution
(a) Compute the variance inflation factors to measure the amount of collinearity among the
explanatory variables. (Reminder: VIFj = 1/(1 - rj²), where rj² comes from regressing xj on
the other explanatory variables.)
This is always a good starting point for any multiple regression analysis. It involves running
four regressions - one regression for each explanatory variable against the other x variables.
The following table summarizes the results.
                    Total Staff       Remote            Dubner            Total Labor
                    and all other X   and all other X   and all other X   and all other X
Multiple R          0.6437            0.4349            0.5610            0.7070
R Square            0.4143            0.1891            0.3147            0.4998
Adjusted R Square   0.3345            0.0786            0.2213            0.4316
Standard Error      16.4715           124.9392          57.5525           114.4118
Observations        26                26                26                26
VIF                 1.7074            1.2333            1.4592            1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to
a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be
less than 5, there is little evidence of collinearity among the set of explanatory variables.
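The calculation in part (a) - regress each explanatory variable on the others, then take VIFj = 1/(1 - rj²) - can be sketched with NumPy. This is an illustrative sketch on synthetic data, not the standby-hours data, and the names are mine:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - r_j^2), with r_j^2 from regressing x_j on the other columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1 / (1 - r2))
    return out

# Demo: two highly correlated columns inflate each other's VIF;
# an independent third column stays near 1
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.3 * rng.normal(size=100), rng.normal(size=100)])
print([round(v, 2) for v in vif(X)])
```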
(b) Run forward selection, backward elimination, and stepwise regression and compare the results.

StatTools regression output from running the three procedures is shown below.
A significance level of 0.05 is used to enter a variable into the model or to delete a variable
from the model (that is, P-value to Enter = P-value to Leave = 0.05).
The correlations between the response variable and the explanatory variables are:

           Total Staff   Remote    Dubner    Total Labor
Standby    0.6050        -0.0953   -0.2443   0.4136
As the computer output shows, the forward selection and stepwise regression methods
produce the same results for these data. The first variable entered into the model is total staff,
the variable that correlates most highly with the response variable, standby hours
(r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it
in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable
chosen is one that makes the largest contribution to the model, given that the first variable has
been selected. For this model the second variable is remote hours. Because the P-value of 0.0269
for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure
determines whether total staff is still an important contributing variable or whether it can be
eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05,
total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of
the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure
terminates with a model that includes total staff present and the number of remote hours.
The backward elimination procedure produces a model that includes all explanatory variables
Forward Selection

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained        23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff        0.6050       0.3660     0.3396              38.6206             1
Remote             0.6999       0.4899     0.4456              35.3873             2
Stepwise Regression

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained        23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff        0.6050       0.3660     0.3396              38.6206             Enter
Remote             0.6999       0.4899     0.4456              35.3873             Enter
Backward Elimination

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.7894       0.6231     0.5513              31.8350

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          4                    35181.7937       8795.4484         8.6786    0.0003
Unexplained        21                   21282.8217       1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff        1.2456        0.4121           3.0229    0.0065    0.3887      2.1026
Remote             -0.1184       0.0543           -2.1798   0.0408    -0.2314     -0.0054
Dubner             -0.2971       0.1179           -2.5189   0.0199    -0.5423     -0.0518
Total Labor        0.1305        0.0593           2.2004    0.0391    0.0072      0.2539

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables      0.7894       0.6231     0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose based on the
Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes
two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination
procedure suggests the "full" model - with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28802.0725/1013.4677 - (26 - 4 - 2) = 28.4193 - 20 = 8.4193
For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21282.8217/1013.4677 - (26 - 8 - 2) = 21 - 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of
8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model.
For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5.
Thus, according to the Cp criterion, the model including all four variables is the better model.
(d) Below are the results from the best subsets regression procedure of all possible regression
models for the standby hours data. Which is the best model?

Model        k + 1   Cp      r²       adjusted r²   se
X1           2       13.32   0.3660    0.3396       38.62
X2           2       33.21   0.0091   -0.0322       48.28
X3           2       30.39   0.0597    0.0205       47.03
X4           2       24.18   0.1710    0.1365       44.16
X1X2         3        8.42   0.4899    0.4456       35.39
X1X3         3       10.65   0.4499    0.4021       36.75
X1X4         3       14.80   0.3754    0.3211       39.16
X2X3         3       32.31   0.0612   -0.0205       48.01
X2X4         3       23.25   0.2238    0.1563       43.65
X3X4         3       11.82   0.4288    0.3791       37.45
X1X2X3       4        7.84   0.5362    0.4729       34.50
X1X2X4       4        9.34   0.5092    0.4423       35.49
X1X3X4       4        7.75   0.5378    0.4748       34.44
X2X3X4       4       12.14   0.4591    0.3853       37.26
X1X2X3X4     5        5.00   0.6231    0.5513       31.84
Because model building requires you to compare models with different numbers of explanatory
variables, the adjusted coefficient of determination r²adj is more appropriate than r²
(although sometimes it is a matter of preference). The adjusted r² reaches a maximum value
of 0.5513 when all four explanatory variables are included in the model. Therefore, using this
criterion, the best model is the model with all four explanatory variables.
The same conclusion is reached when using the Cp selection criterion, because only the model
with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative
models for you to evaluate in greater depth. Moreover, the best model or models using the Cp
criterion might differ from the model selected using the adjusted r² and/or the models selected
using the three procedures discussed in (a) through (c).
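The enumeration behind best subsets (here as plain brute force, without the pruning shortcuts real packages use) can be sketched as follows; the helper names and demo data are mine, not the standby-hours data:

```python
import numpy as np
from itertools import combinations

def sse(X, y):
    """Residual sum of squares for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

def best_subsets(X, y):
    """Score every nonempty subset of columns by Mallows' C_p and adjusted r^2."""
    n, m = X.shape
    sst = np.sum((y - y.mean()) ** 2)
    mse_full = sse(X, y) / (n - m - 1)
    results = []
    for k in range(1, m + 1):
        for cols in combinations(range(m), k):
            e = sse(X[:, cols], y)
            cp = e / mse_full - (n - 2 * k - 2)
            adj_r2 = 1 - (e / sst) * (n - 1) / (n - k - 1)
            results.append((cols, cp, adj_r2))
    return results

# Demo on synthetic data: y truly depends on columns 0 and 3
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = 2 * X[:, 0] + X[:, 3] + rng.normal(size=40)
for cols, cp, adj in sorted(best_subsets(X, y), key=lambda r: r[1])[:3]:
    print(cols, round(cp, 2), round(adj, 4))   # the three smallest-Cp subsets
```

As in part (c), the subset containing all columns always scores Cp = k + 1 exactly, which this sketch reproduces.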
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables.
Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the
total labor hours reveal apparent patterns. In addition, a plot of the residuals versus the
predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54).
The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
[Residual plots for the four-variable model: Total Staff Residual Plot, Remote Residual Plot,
Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted
standby hours), Histogram of Residuals, and Time Series Plot of Residuals (residuals by week)]
(b) Interpret the regression coefficients
The regression equation is
ŷ = 592.5401 - 5.5251 x1 - 21.3761 x2 - 38.9727 x3

Predicted Consumption = 592.5401 - 5.5251 Temperature - 21.3761 Insulation - 38.9727 Style

For houses that are ranch style (because x3 = 1), the regression equation reduces to

ŷ = 553.5674 - 5.5251 x1 - 21.3761 x2

For houses that are not ranch style (because x3 = 0), the regression equation reduces to

ŷ = 592.5401 - 5.5251 x1 - 21.3761 x2
The regression coefficients are interpreted as follows:

b1 = -5.5251: Holding constant the attic insulation and the house style, for each additional
1°F increase in atmospheric temperature, you estimate that the predicted
heating oil consumption decreases by 5.5251 gallons.

b2 = -21.3761: Holding constant the atmospheric temperature and the house style, for each
additional 1-inch increase in attic insulation, you estimate that the predicted
heating oil consumption decreases by 21.3761 gallons.

b3 = -38.9727: b3 measures the effect on oil consumption of having a ranch-style house (x3 = 1)
compared with having a house that is not ranch style (x3 = 0). Thus, with
atmospheric temperature and attic insulation held constant, you estimate that the
predicted heating oil consumption is 38.9727 gallons less for a ranch-style house
than for a house that is not ranch style.
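The role of the dummy variable can be made concrete by wrapping the fitted equation in a small helper (a sketch using the coefficients above; the function name is ours, not part of the example):

```python
def predicted_consumption(temperature, insulation, ranch_style):
    """Fitted heating-oil equation; ranch_style is the dummy x3 (1 = ranch, 0 = not)."""
    return 592.5401 - 5.5251 * temperature - 21.3761 * insulation - 38.9727 * ranch_style

# At identical temperature and insulation, a ranch-style house is predicted
# to consume exactly b3 = 38.9727 fewer gallons.
gap = predicted_consumption(30, 6, 0) - predicted_consumption(30, 6, 1)
print(round(gap, 4))  # 38.9727
```

Whatever values of temperature and insulation you plug in, the gap between the two house styles stays constant, which is exactly what a dummy variable without interaction terms implies.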
(c) Does each of the three variables make a significant contribution to the regression model?

The three t-test statistics representing the slopes for temperature, insulation, and ranch style are
-27.0267, -14.7623, and -4.6627. Each of the corresponding P-values is extremely small
(less than 0.001). Thus, each of the three variables makes a significant contribution to the model.
In addition, the coefficient of determination indicates that 98.84% of the variation in oil usage
is explained by variation in temperature, insulation, and whether the house is ranch style.
(d) Determine whether adding the interaction terms makes a significant contribution to the model.

To evaluate possible interactions between the explanatory variables, three interaction terms
are constructed as follows:

x4 = x1 * x2 (interaction between temperature and insulation)
x5 = x1 * x3 (interaction between temperature and style)
x6 = x2 * x3 (interaction between insulation and style)

The regression model is now

y = α + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + β6 x6 + ε
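Building such product terms is mechanical; a sketch with NumPy (the helper name is ours):

```python
import numpy as np

def add_interactions(X):
    """Append every pairwise product x_i * x_j (i < j) as extra columns."""
    n, m = X.shape
    products = [X[:, i] * X[:, j] for i in range(m) for j in range(i + 1, m)]
    return np.column_stack([X] + products)

X = np.array([[2.0, 3.0, 5.0]])   # one observation of x1, x2, x3
print(add_interactions(X))        # [[ 2.  3.  5.  6. 10. 15.]]
```

For three original variables this yields exactly the three products x1x2, x1x3, and x2x3 used above.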
The regression results for this model are
Regression Statistics
Multiple R            0.9966
R Square              0.9931
Adjusted R Square     0.9880
Standard Error       14.2506
Observations              15

ANOVA
                 df           SS            MS            F      Significance F
Regression        6    234510.5818    39085.0970    192.4607        0.0000
Residual          8      1624.6475      203.0809
Total            14    236135.2293

                    Coefficients   Standard Error    t Stat    P-value   Lower 95%   Upper 95%
Intercept               642.8867        26.7059     24.0728    0.0000    581.3028    704.4706
Temperature              -6.9263         0.7531     -9.1969    0.0000     -8.6629     -5.1896
Insulation              -27.8825         3.5801     -7.7882    0.0001    -36.1383    -19.6268
Style                   -84.6088        29.9956     -2.8207    0.0225   -153.7787    -15.4389
Temp*Insulation           0.1702         0.0886      1.9204    0.0911     -0.0342      0.3746
Temp*Style                0.6596         0.4617      1.4286    0.1910     -0.4051      1.7242
Insulation*Style          4.9870         3.5137      1.4193    0.1936     -3.1156     13.0895
To test whether the three interactions significantly improve the regression model, you use the
partial F test. The null and alternative hypotheses are

H0: β4 = β5 = β6 = 0 (There are no interactions among x1, x2, and x3)
Ha: At least one of β4, β5, β6 is not zero (x1 interacts with x2, and/or x1 interacts with x3,
and/or x2 interacts with x3)

From the full regression output (see above):
SSE(full) = 1624.6475, MSE(full) = 203.0809

From the reduced regression output (see part (a)):
SSE(reduced) = 2728.3200
The test statistic is

F = {[SSE(reduced) - SSE(full)] / (number of extra terms)} / MSE(full)
  = [(2728.3200 - 1624.6475) / 3] / 203.0809
  = 367.8908 / 203.0809
  = 1.8115

df1 = k - j = 6 - 3 = 3        df2 = n - k - 1 = 15 - 6 - 1 = 8

P-value = FDIST(1.8115, 3, 8) = 0.2230
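The same arithmetic is easy to script; a small sketch using the SSE values above, where scipy's F survival function plays the role of Excel's FDIST:

```python
from scipy import stats

def partial_f(sse_reduced, sse_full, extra_terms, n, k):
    """Partial F statistic for testing the extra terms of the full model."""
    mse_full = sse_full / (n - k - 1)            # k = variables in the full model
    return ((sse_reduced - sse_full) / extra_terms) / mse_full

F = partial_f(2728.3200, 1624.6475, extra_terms=3, n=15, k=6)
p_value = stats.f.sf(F, 3, 15 - 6 - 1)           # upper-tail area, as FDIST computes
print(round(F, 4), round(p_value, 4))            # F ≈ 1.8115, p ≈ 0.2230
```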
Because of the large P-value, you conclude that the interactions do not make a significant
contribution to the model, given that the model already includes temperature (x1), insulation (x2),
and whether the house is ranch style (x3). Therefore, the multiple regression model using x1, x2,
and x3 but no interaction terms is the better model.

If you rejected this null hypothesis, you would then test the contribution of each interaction
separately in order to determine which interaction terms to include in the model.
Adjusted r²
Adding new explanatory variables will always keep the r² value the same or increase it;
it can never decrease it. In general, adding explanatory variables to the model causes the
prediction errors to become smaller, thus reducing the sum of squares due to error, SSE.
Because SSR = SST - SSE, when SSE becomes smaller, SSR becomes larger, causing
r² = SSR/SST to increase. Therefore, if a variable is added to the model, r² usually becomes larger
even if the variable added is not statistically significant. This can lead to "fishing expeditions,"
where you keep adding variables to the model, some of which have no conceptual relationship to
the response variable, just to inflate the r² value.
To avoid overestimating the impact of adding an explanatory variable on the amount of
variability explained by the estimated regression equation, many analysts prefer adjusting r² for
the number of explanatory variables. The adjusted r² is defined as

r²adj = 1 - (1 - r²)(n - 1)/(n - k - 1)

The adjusted r² imposes a "penalty" for each new term that is added to the model, in an attempt
to make models of different sizes (numbers of explanatory variables) comparable. It can decrease
when unnecessary explanatory variables are added to the regression model. Therefore, it serves
as an index that you can monitor. If you add variables and the adjusted r² decreases, the extra
variables are essentially not pulling their weight and should probably be omitted.
For the full model of the Heating Oil Consumption example (with the interaction terms),
n = 15, k = 6, and r² = 0.9931. Thus, the adjusted r² is

r²adj = 1 - (1 - r²)(n - 1)/(n - k - 1) = 1 - (1 - 0.9931)(15 - 1)/(15 - 6 - 1)
      = 1 - (0.0069)(1.75) = 0.9880

The adjusted r² for the reduced model (without the interaction terms) is 0.9853.

The adjusted r² for the full model indicates too small an improvement in explaining the variation
in the consumption of heating oil to justify keeping the interaction terms in the model, even if the
partial F test were significant.
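The penalty formula is a one-liner to compute directly; a minimal sketch:

```python
def adjusted_r2(r2, n, k):
    """Adjusted r²: penalizes r² for the number of explanatory variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Full heating-oil model: n = 15 observations, k = 6 explanatory variables.
print(adjusted_r2(0.9931, 15, 6))   # ≈ 0.988, matching the value above to rounding
```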
Note: It can happen that the value of r²adj is negative. This is not a mistake but a result of a model
that fits the data very poorly. In this case, some software systems set r²adj equal to 0;
Excel will print the actual value.
Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic,
developed by Mallows. The formula for computing Cp is

Cp = SSE(k)/MSE(full) - [n - 2(k + 1)]

where

SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, ...

MSE(full) is the mean square error for a regression model that has all explanatory variables included.
Theory says that if the value of Cp is large, then the mean square error of the fitted values is large,
indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is
much greater than k + 1, then there is a large bias component in the regression, usually indicating
omission of an important variable. Therefore, when evaluating which regression is best, it is
recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between
alternate regressions, keep in mind that the bias is measured with respect to the total group of
variables provided by the researcher. This criterion cannot determine when the researcher has
forgotten about some variable not included in the total group.
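As an illustrative sketch, here is Mallows' Cp for the reduced (three-variable) heating-oil model, judged against the full model's MSE; the SSE and MSE values are taken from the partial F test section earlier:

```python
def mallows_cp(sse_k, mse_full, n, k):
    """Mallows' Cp for a candidate model with k explanatory variables."""
    return sse_k / mse_full - (n - 2 * (k + 1))

# Reduced model: k = 3, SSE = 2728.3200; full-model MSE = 203.0809; n = 15.
print(round(mallows_cp(2728.3200, 203.0809, 15, 3), 4))  # ≈ 6.4346, above k + 1 = 4
```

A value above k + 1 = 4 signals some bias relative to the full candidate set, consistent with the partial F test's reduced-versus-full comparison being close to the margin.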
Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly
the most difficult part of any real regression analysis problem. You are always trying to get the
best fit possible. The principle of parsimony suggests using the fewest number of explanatory
variables that can predict the response variable adequately. Regression models with fewer
explanatory variables are easier to interpret and are less likely to be affected by interaction or
collinearity problems. On the other hand, more variables certainly increase r², and they usually
reduce the standard error of estimate, se. This presents a trade-off, which is the heart of the
challenge of selecting a good model.
The best regression models, in addition to satisfying the conditions of multiple regression, have:

bull Relatively few explanatory variables

bull Relatively high r² and r²adj, indicating that much of the variability in y is accounted for by
the regression model

bull A small value of Cp (close to or less than k + 1)

bull A relatively small value of se, the standard deviation of the residuals, indicating that the
magnitude of the errors is small

bull Relatively small P-values for the F- and t-statistics, showing that the overall model is better than
a simple summary with the mean, and that the individual parameters are reliably different from zero
Here are several guidelines for including and excluding variables. These guidelines are not
ironclad rules; they typically involve choices at the margin, that is, between equations that are
very similar and seem equally useful.
Guidelines for Including/Excluding Variables in a Regression Model
1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted
significance level, such as 0.05, this variable is a candidate for exclusion.

2. It is a mathematical fact that:

ndash If the t-value is less than 1 in absolute value, then se will decrease and adjusted r² will
increase if this variable is excluded from the equation.

ndash If the t-value is greater than 1, the opposite will occur.

Because of this, some statisticians advocate excluding variables with t-values less than 1 and
including variables with t-values greater than 1. However, analysts who base the decision on
statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable
from the equation unless its t-value is at least 2 (approximately). This latter approach is more
stringent (fewer variables will be retained), but it is probably the more popular approach.
3. When there is a group of variables that are in some sense logically related, it is sometimes a
good idea to include all of them or exclude all of them. In this case, their individual t-values
are less relevant. Instead, a partial F test can be used to make the include/exclude decision.
4. Use economic, theoretical, or practical considerations to decide whether to include or exclude
variables. Some variables might really belong in an equation because of their theoretical
relationship with the response variable, and their low t-values, possibly the result of an
unlucky sample, should not necessarily disqualify them from being in the equation.
Similarly, a variable that has no economic or physical relationship with the response variable
might have a significant t-value just by chance. This does not necessarily mean that it should
be included in the equation.
You should not agonize too much about whether to include or exclude a variable "at the margin."
If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat
cleaner model, and you probably won't see any dramatic shifts in Cp, r², r²adj, or se.
On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious
and you have one more variable to interpret, but otherwise there is no real penalty for including it.
In real applications, there are often several equations that, for all practical purposes, are equally
useful for describing the relationships or making predictions. There are so many aspects of what
makes a model useful that human judgment is necessary to make a final choice. For example,
in addition to favoring explanatory variables that can be measured reliably, you may want to
favor those that are less expensive to measure. The statistician George Box, who had an
illustrious academic career at the University of Wisconsin, is often quoted as saying:
"All models are wrong, but some models are useful."
Variable Selection Procedures
Model building is the process of developing an estimated regression equation that describes the
relationship between a response variable and one or more explanatory variables
The major issues in model building are finding the proper functional form of the relationship and
selecting the explanatory variables to be included in the model
Many statistical packages provide some assistance by including automatic model-building options.
These options estimate a series of regression models by successively adding or deleting variables
according to prescribed rules. These rules can vary from package to package, but usually the t test
for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to
determine whether variables are added or deleted. The levels of significance α1 and α2 for
determining whether an explanatory variable should be entered into the model or removed from
the model are typically referred to as P-value to Enter and P-value to Leave.
Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.

The four most common types of model-building procedures that statistical packages implement are
forward selection, backward elimination, stepwise regression, and best subsets regression.

Today many businesses use these variable selection procedures as part of the research technique
called data mining, which tries to identify significant statistical relationships in very large data
sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively
adds variables one at a time until no remaining variables make a significant contribution.
The forward selection procedure does not permit a variable to be removed from the model once it
has been entered. The procedure stops if the P-value for each of the explanatory variables not in
the model is greater than the prescribed P-value to Enter.
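The forward selection loop can be sketched in a few lines (our own minimal implementation, not any package's code; it enters variables by the smallest partial-F p-value until no candidate clears P-value to Enter):

```python
import numpy as np
from scipy import stats

def ols_sse(X, y):
    """Sum of squared errors from a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_selection(X, y, p_to_enter=0.05):
    """Enter one variable at a time by smallest partial-F p-value."""
    n, m = X.shape
    chosen, remaining = [], list(range(m))
    sse_now = float(((y - y.mean()) ** 2).sum())   # intercept-only model
    while remaining:
        trials = []
        for j in remaining:
            sse_j = ols_sse(X[:, chosen + [j]], y)
            df2 = n - (len(chosen) + 1) - 1
            F = (sse_now - sse_j) / (sse_j / df2)  # partial F for one extra term
            trials.append((stats.f.sf(F, 1, df2), j, sse_j))
        p, j, sse_j = min(trials)
        if p > p_to_enter:
            break                                  # nothing clears P-value to Enter
        chosen.append(j)
        remaining.remove(j)
        sse_now = sse_j
    return chosen

# Synthetic demo: y depends on columns 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=60)
print(forward_selection(X, y))   # columns 0 and 2 are entered
```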
The backward elimination procedure begins with a model that includes all potential
explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value
to the prescribed P-value to Leave. The backward elimination procedure does not permit a
variable to be reentered once it has been removed. The procedure stops when none of the
explanatory variables in the model have a P-value greater than P-value to Leave.
The stepwise regression procedure is much like a forward procedure, except that it also considers
possible deletions along the way. Because of the nature of the stepwise regression procedure,
an explanatory variable can enter the model at one step, be removed at a subsequent step,
and then enter the model at a later step. The procedure stops when no explanatory variables can
be removed from or entered into the model.
The best subsets regression procedure works by trying possible subsets from the list of possible
explanatory variables. This procedure does not actually compute all possible regressions;
there are ways to exclude models known to be worse than some already examined models.
Typical computer output reports results for a collection of "best" models: usually the two best
one-variable models, the two best two-variable models, the two best three-variable models, and so on.
The user can then select the best model based on such measures as Cp, r², r²adj, and se.
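For a handful of candidate variables, best subsets can be sketched by brute force (real implementations prune the search; this illustrative version ranks every subset of each size by adjusted r²):

```python
import itertools
import numpy as np

def ols_r2(X, y):
    """Coefficient of determination from a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sst = float(((y - y.mean()) ** 2).sum())
    return 1 - float(resid @ resid) / sst

def best_subsets(X, y):
    """Return the best subset of each size, judged by adjusted r²."""
    n, m = X.shape
    best = {}
    for k in range(1, m + 1):
        scored = []
        for cols in itertools.combinations(range(m), k):
            r2 = ols_r2(X[:, cols], y)
            adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
            scored.append((adj, cols))
        best[k] = max(scored)          # highest adjusted r² for this size
    return best

# Synthetic demo: only columns 0 and 2 drive y.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=80)
for k, (adj, cols) in sorted(best_subsets(X, y).items()):
    print(k, cols, round(adj, 4))
```

With m candidate variables the search space is 2^m - 1 subsets, which is why production tools rely on branch-and-bound shortcuts rather than full enumeration.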
In most cases, the final results of these four procedures are very similar. However, there is no guarantee
that they will all produce exactly the same final equation. Deciding which estimated regression
equation to use remains a topic for discussion; ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for
forward selection, backward elimination, and stepwise regression, but it cannot perform best
subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses.
Currently, the graphic artists at the station receive hourly pay for a significant number of hours
during which they are idle. These hours are called standby hours. The operations manager wants to
determine which factors most heavily affect standby hours of graphic artists. Over a period of 26
weeks, he collected data concerning standby hours (y) and four factors that he suspects are related
to the excessive number of standby hours the station is currently experiencing:
x1 ndash the total number of staff present

x2 ndash remote hours

x3 ndash Dubner hours

x4 ndash total labor hours
The data are organized and stored in Standbyxlsx
Week   Standby   Total Staff   Remote   Dubner   Total Labor
  1      245         338         414      323       2001
  2      177         333         598      340       2030
  ⋮       ⋮           ⋮           ⋮        ⋮          ⋮
 25      261         315         164      223       1839
 26      232         331         270      272       1935
How can you build a multiple regression model with the most appropriate mix of explanatory variables?
Solution
(a) Compute the variance inflation factors to measure the amount of collinearity among the
explanatory variables.

(Reminder: VIFj = 1/(1 - rj²), where rj² is the coefficient of determination from regressing
xj on the other explanatory variables.)

This is always a good starting point for any multiple regression analysis. It involves running
four regressions, one regression for each explanatory variable against the other x variables.
The following table summarizes the results:
                      Total Staff     Remote      Dubner     Total Labor
                      (each variable regressed on all other x variables)
Multiple R                 0.6437     0.4349      0.5610        0.7070
R Square                   0.4143     0.1891      0.3147        0.4998
Adjusted R Square          0.3345     0.0786      0.2213        0.4316
Standard Error            16.4715   124.9392     57.5525      114.4118
Observations                   26         26          26            26
VIF                        1.7074     1.2333      1.4592        1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to
a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be
less than 5, there is little evidence of collinearity among the set of explanatory variables.
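The four auxiliary regressions behind that table can be automated; a sketch of the VIF computation (our helper, not StatTools output):

```python
import numpy as np

def vifs(X):
    """VIF_j = 1 / (1 - r_j²), with r_j² from regressing x_j on the remaining x's."""
    n, m = X.shape
    out = []
    for j in range(m):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1 - float(resid @ resid) / float(((target - target.mean()) ** 2).sum())
        out.append(1 / (1 - r2))
    return out

# Demo: the third column is nearly the sum of the first two, so all VIFs blow up.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=100)])
print([round(v, 1) for v in vifs(X)])
```

With genuinely independent columns, every VIF sits near 1, which is what the small values in the standby-hours table indicate.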
(b) Run forward selection backward elimination and stepwise regression and compare the results
StatTools regression output from running the three procedures is shown below.
A significance level of 0.05 is used to enter a variable into the model or to delete a variable
from the model (that is, P-value to Enter = P-value to Leave = 0.05).
The correlations between the response variable and the explanatory variables are:

              Total Staff    Remote    Dubner   Total Labor
Standby          0.6050     -0.0953   -0.2443      0.4136
As the computer output shows, the forward selection and stepwise regression methods
produce the same results for these data. The first variable entered into the model is total staff,
the variable that correlates most highly with the response variable, standby hours
(r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it
in the final output.) Because it is less than 0.05, total staff is included in the regression model.
The next step involves selecting a second independent variable for the model. The second variable
chosen is one that makes the largest contribution to the model, given that the first variable has
been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269
for remote hours is less than 0.05, remote hours is included in the regression model.
After the remote hours variable is entered into the model, the stepwise regression procedure
determines whether total staff is still an important contributing variable or whether it can be
eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05,
total staff remains in the regression model.
The next step involves selecting a third independent variable for the model. Because none of
the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure
terminates with a model that includes total staff present and the number of remote hours.
The backward elimination procedure produces a model that includes all explanatory variables
Forward Selection

Summary             Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                       0.6999       0.4899        0.4456               35.3873

ANOVA Table      Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained                 2              27662.5429        13831.2714       11.0450     0.0004
Unexplained              23              28802.0725         1252.2640

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant             -330.6748        116.4802       -2.8389     0.0093    -571.6325     -89.7171
Total Staff             1.7649          0.3790        4.6562     0.0001       0.9808       2.5490
Remote                 -0.1390          0.0588       -2.3635     0.0269      -0.2606      -0.0173

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Entry Number
Total Staff            0.6050       0.3660        0.3396               38.6206                1
Remote                 0.6999       0.4899        0.4456               35.3873                2
Stepwise Regression

Summary             Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                       0.6999       0.4899        0.4456               35.3873

ANOVA Table      Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained                 2              27662.5429        13831.2714       11.0450     0.0004
Unexplained              23              28802.0725         1252.2640

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant             -330.6748        116.4802       -2.8389     0.0093    -571.6325     -89.7171
Total Staff             1.7649          0.3790        4.6562     0.0001       0.9808       2.5490
Remote                 -0.1390          0.0588       -2.3635     0.0269      -0.2606      -0.0173

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Enter or Exit
Total Staff            0.6050       0.3660        0.3396               38.6206              Enter
Remote                 0.6999       0.4899        0.4456               35.3873              Enter
Backward Elimination

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.7894       0.6231     0.5513              31.8350

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained      4                    35181.7937       8795.4484         8.6786    0.0003
Unexplained    21                   21282.8217       1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff        1.2456        0.4121           3.0229    0.0065    0.3887      2.1026
Remote             -0.1184       0.0543           -2.1798   0.0408    -0.2314     -0.0054
Dubner             -0.2971       0.1179           -2.5189   0.0199    -0.5423     -0.0518
Total Labor        0.1305        0.0593           2.2004    0.0391    0.0072      0.2539

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables      0.7894       0.6231     0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose, based on the
Cp selection criterion?
The model suggested by the forward selection and stepwise regression procedures includes
two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination
procedure suggests the "full" model – with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28802.0725/1013.4677 - (26 - 4 - 2) = 28.4193 - 20 = 8.4193
For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21282.8217/1013.4677 - (26 - 8 - 2) = 21 - 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of
8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model.
For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5.
Thus, according to the Cp criterion, the model including all four variables is the better model.
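These two computations can be checked with a few lines of plain Python (a quick verification sketch; the SSE and MSE values are taken from the StatTools output above):

```python
# Mallows' Cp = SSE(k) / MSE(full) - (n - 2k - 2), where SSE(k) is the error
# sum of squares of the k-variable model and MSE(full) is the mean square
# error of the model containing all four explanatory variables.
def mallows_cp(sse_k, mse_full, n, k):
    return sse_k / mse_full - (n - 2 * k - 2)

mse_full = 1013.4677  # MSE of the full standby-hours model

# Forward selection / stepwise model (total staff, remote hours): k = 2
cp_two = mallows_cp(28802.0725, mse_full, n=26, k=2)
# Backward elimination model (all four explanatory variables): k = 4
cp_four = mallows_cp(21282.8217, mse_full, n=26, k=4)

print(round(cp_two, 4), round(cp_four, 4))  # 8.4193 5.0
```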
- 14 -
(d) Below are the results from the best subsets regression procedure of all possible regression
models for the standby hours data. Which is the best model?
Model        k + 1   Cp      r2       r2adj     se
X1           2       13.32   0.3660   0.3396    38.62
X2           2       33.21   0.0091   -0.0322   48.28
X3           2       30.39   0.0597   0.0205    47.03
X4           2       24.18   0.1710   0.1365    44.16
X1X2         3       8.42    0.4899   0.4456    35.39
X1X3         3       10.65   0.4499   0.4021    36.75
X1X4         3       14.80   0.3754   0.3211    39.16
X2X3         3       32.31   0.0612   -0.0205   48.01
X2X4         3       23.25   0.2238   0.1563    43.65
X3X4         3       11.82   0.4288   0.3791    37.45
X1X2X3       4       7.84    0.5362   0.4729    34.50
X1X2X4       4       9.34    0.5092   0.4423    35.49
X1X3X4       4       7.75    0.5378   0.4748    34.44
X2X3X4       4       12.14   0.4591   0.3853    37.26
X1X2X3X4     5       5.00    0.6231   0.5513    31.84
Because model building requires you to compare models with different numbers of explanatory
variables, the adjusted coefficient of determination r2adj is more appropriate than r2
(although sometimes it is a matter of preference). The adjusted r2 reaches a maximum value
of 0.5513 when all four explanatory variables are included in the model. Therefore, using this
criterion, the best model is the model with all four explanatory variables.
The same conclusion is reached when using the Cp selection criterion, because only the model
with all four explanatory variables considered has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative
models for you to evaluate in greater depth. Moreover, the best model or models using the Cp
criterion might differ from the model selected using the adjusted r2 and/or the models selected
using the three procedures discussed in (a) through (c).
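The r2adj column above can be reproduced from the r2 column with the adjusted-r2 formula (a plain-Python check; n = 26 for the standby-hours data, and small last-digit differences come from rounding):

```python
# r2adj = 1 - (1 - r2) * (n - 1) / (n - k - 1), applied to a few rows of the
# best subsets table (k and r2 values as reported in the output above).
n = 26
rows = {"X1X2": (2, 0.4899), "X1X3X4": (3, 0.5378), "X1X2X3X4": (4, 0.6231)}

r2adj = {model: 1 - (1 - r2) * (n - 1) / (n - k - 1)
         for model, (k, r2) in rows.items()}
best = max(r2adj, key=r2adj.get)   # model with the largest adjusted r2
print(best, round(r2adj[best], 4))  # X1X2X3X4 0.5513
```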
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables.
On the next page are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the
total labor hours reveal apparent patterns. In addition, a plot of the residuals versus the
predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54).
The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
- 15 -
[Residual plots (not reproduced here): Total Staff Residual Plot, Remote Residual Plot,
Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (against the predicted
standby hours), Histogram of Residuals, and Time Series Plot of Residuals by week.]
- 5 -
Regression Statistics
Multiple R            0.9966
R Square              0.9931
Adjusted R Square     0.9880
Standard Error        14.2506
Observations          15

ANOVA
             df   SS            MS           F          Significance F
Regression   6    234510.5818   39085.0970   192.4607   0.0000
Residual     8    1624.6475     203.0809
Total        14   236135.2293
                   Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept          642.8867       26.7059          24.0728    0.0000    581.3028    704.4706
Temperature        -6.9263        0.7531           -9.1969    0.0000    -8.6629     -5.1896
Insulation         -27.8825       3.5801           -7.7882    0.0001    -36.1383    -19.6268
Style              -84.6088       29.9956          -2.8207    0.0225    -153.7787   -15.4389
TempInsulation     0.1702         0.0886           1.9204     0.0911    -0.0342     0.3746
TempStyle          0.6596         0.4617           1.4286     0.1910    -0.4051     1.7242
InsulationStyle    4.9870         3.5137           1.4193     0.1936    -3.1156     13.0895
To test whether the three interactions significantly improve the regression model, you use the
partial F test. The null and alternative hypotheses are:

H0: β4 = β5 = β6 = 0 (There are no interactions among x1, x2, and x3)

Ha: At least one of β4, β5, β6 is not zero (x1 interacts with x2, and/or x1 interacts with x3,
and/or x2 interacts with x3)
From the full regression output (see above):
SSE(full) = 1624.6475, MSE(full) = 203.0809

From the reduced regression output (see part (a)):
SSE(reduced) = 2728.3200

The test statistic is

F = [SSE(reduced) - SSE(full)] / (number of extra terms) / MSE(full)
  = [(2728.3200 - 1624.6475) / 3] / 203.0809 = 367.8908 / 203.0809 = 1.8115

df1 = k - j = 6 - 3 = 3, df2 = n - k - 1 = 15 - 6 - 1 = 8
P-value = FDIST(1.8115, 3, 8) = 0.2230
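The same computation can be scripted. Plain Python has no F distribution built in, so the sketch below approximates the upper-tail probability (what Excel's FDIST returns) by numerically integrating the F density with Simpson's rule; it is a stand-in for a proper statistics library, not production code:

```python
import math

def f_pdf(x, d1, d2):
    # Density of the F distribution with (d1, d2) degrees of freedom.
    logc = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
            - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))
    return math.exp(logc + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2))

def f_sf(x, d1, d2, steps=20000):
    # P(F > x): complement of the CDF, via Simpson's rule on [0, x].
    h = x / steps
    total = f_pdf(1e-12, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += f_pdf(i * h, d1, d2) * (4 if i % 2 else 2)
    return 1 - total * h / 3

sse_reduced, sse_full, mse_full = 2728.3200, 1624.6475, 203.0809
extra = 3                         # number of extra (interaction) terms
F = (sse_reduced - sse_full) / extra / mse_full
print(round(F, 4))                # 1.8115
print(round(f_sf(F, 3, 8), 4))    # ≈ 0.2230, matching FDIST(1.8115, 3, 8)
```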
- 6 -
Because of the large P-value, you conclude that the interactions do not make a significant
contribution to the model, given that the model already includes temperature x1, insulation x2,
and whether the house is ranch style x3. Therefore, the multiple regression model using x1, x2,
and x3 but no interaction terms is the better model.

If you rejected this null hypothesis, you would then test the contribution of each interaction
separately in order to determine which interaction terms to include in the model.
Adjusted r2
Adding new explanatory variables will always keep the r2 value the same or increase it;
it can never decrease it. In general, adding explanatory variables to the model causes the
prediction errors to become smaller, thus reducing the sum of squares due to error, SSE.
Because SSR = SST - SSE, when SSE becomes smaller, SSR becomes larger, causing
r2 = SSR/SST to increase. Therefore, if a variable is added to the model, r2 usually becomes larger
even if the variable added is not statistically significant. This can lead to "fishing expeditions,"
where you keep adding variables to the model, some of which have no conceptual relationship to
the response variable, just to inflate the r2 value.
To avoid overestimating the impact of adding an explanatory variable on the amount of
variability explained by the estimated regression equation, many analysts prefer adjusting r2 for
the number of explanatory variables. The adjusted r2 is defined as

r2adj = 1 - (1 - r2)(n - 1)/(n - k - 1)

The adjusted r2 imposes a "penalty" for each new term that is added to the model, in an attempt
to make models of different sizes (numbers of explanatory variables) comparable. It can decrease
when unnecessary explanatory variables are added to the regression model. Therefore, it serves
as an index that you can monitor. If you add variables and the adjusted r2 decreases, the extra
variables are essentially not pulling their weight and should probably be omitted.
For the full model of the Heating Oil Consumption example (with the interaction terms),
n = 15, k = 6, and r2 = 0.9931. Thus the adjusted r2 is

r2adj = 1 - (1 - r2)(n - 1)/(n - k - 1) = 1 - (1 - 0.9931)(15 - 1)/(15 - 6 - 1)
      = 1 - (0.0069)(1.75) = 0.9880
The adjusted r2 for the reduced model (without the interaction terms) is 0.9853.

The adjusted r2 for the full model indicates too small an improvement in explaining the variation
in the consumption of heating oil to justify keeping the interaction terms in the model, even if the
partial F test were significant.
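The computation above is easy to script (plain Python; note that starting from the rounded r2 = 0.9931 gives 0.9879, while the handout reports 0.9880 from unrounded values):

```python
def adj_r2(r2, n, k):
    # r2adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

full = adj_r2(0.9931, n=15, k=6)   # heating-oil model with interaction terms
print(round(full, 4))              # 0.9879

# The "penalty" at work: the same r2 with one more explanatory variable
# yields a smaller adjusted r2.
print(adj_r2(0.9931, n=15, k=7) < full)  # True
```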
Note: It can happen that the value of r2adj is negative. This is not a mistake but a result of a
model that fits the data very poorly. In this case some software systems set r2adj equal to 0;
Excel will print the actual value.
- 7 -
Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic,
developed by Mallows. The formula for computing Cp is

Cp = SSE(k)/MSE(full) - (n - 2k - 2)

where

SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, ...

MSE(full) is the mean square error for a regression model that has all explanatory variables included.
Theory says that if the value of Cp is large, then the mean square error of the fitted values is large,
indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is
much greater than k + 1, then there is a large bias component in the regression, usually indicating
omission of an important variable. Therefore, when evaluating which regression is best, it is
recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between
alternate regressions, keep in mind that the bias is measured with respect to the total group of
variables provided by the researcher. This criterion cannot determine when the researcher has
forgotten about some variable not included in the total group.
Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly
the most difficult part of any real regression analysis problem. You are always trying to get the
best fit possible. The principle of parsimony suggests using the fewest number of explanatory
variables that can predict the response variable adequately. Regression models with fewer
explanatory variables are easier to interpret and are less likely to be affected by interaction or
collinearity problems. On the other hand, more variables certainly increase r2, and they usually
reduce the standard error of estimate se. This presents a trade-off, which is the heart of the
challenge of selecting a good model.

The best regression models, in addition to satisfying the conditions of multiple regression, have:

• Relatively few explanatory variables

• Relatively high r2 and r2adj, indicating that much of the variability in y is accounted for by
the regression model

• A small value of Cp (close to or less than k + 1)

• A relatively small value of se, the standard deviation of the residuals, indicating that the
magnitude of the errors is small

• Relatively small P-values for the F- and t-statistics, showing that the overall model is better than
a simple summary with the mean and that the individual parameters are reliably different from zero
- 8 -
Here are several guidelines for including and excluding variables. These guidelines are not
ironclad rules; they typically involve choices at the margin, that is, between equations that are
very similar and seem equally useful.
Guidelines for Including/Excluding Variables in a Regression Model

1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted
significance level, such as 0.05, this variable is a candidate for exclusion.

2. It is a mathematical fact that:
– If t-value < 1, then se will decrease and adjusted r2 will increase if this variable is excluded
from the equation.
– If t-value > 1, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and
including variables with t-values greater than 1. However, analysts who base the decision on
statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable
from the equation unless its t-value is at least 2 (approximately). This latter approach is more
stringent – fewer variables will be retained – but it is probably the more popular approach.
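The fact in guideline 2 can be illustrated numerically. For a single variable, the partial F statistic equals t2, so dropping variable j raises SSE by t2 · MSE(full) while freeing one degree of freedom. The sketch below (made-up numbers) shows se falling when |t| < 1 and rising when |t| > 1:

```python
import math

def se_after_drop(sse_full, df_full, t):
    # Dropping one variable with t-statistic t raises SSE by t^2 * MSE(full)
    # (the partial F for a single variable equals t^2), while the residual
    # degrees of freedom increase by one.
    mse_full = sse_full / df_full
    sse_red = sse_full + t**2 * mse_full
    return math.sqrt(sse_red / (df_full + 1))

sse_full, df_full = 2000.0, 20           # hypothetical model: MSE(full) = 100
se_full = math.sqrt(sse_full / df_full)  # se of the full model: 10.0

print(se_after_drop(sse_full, df_full, t=0.5) < se_full)  # True: |t| < 1, se falls
print(se_after_drop(sse_full, df_full, t=1.5) > se_full)  # True: |t| > 1, se rises
```

At exactly |t| = 1 the two standard errors coincide, which is the boundary case the guideline describes.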
3. When there is a group of variables that are in some sense logically related, it is sometimes a
good idea to include all of them or exclude all of them. In this case, their individual t-values
are less relevant. Instead, a partial F test can be used to make the include/exclude decision.

4. Use economic, theoretical, or practical considerations to decide whether to include or exclude
variables. Some variables might really belong in an equation because of their theoretical
relationship with the response variable, and their low t-values, possibly the result of an
unlucky sample, should not necessarily disqualify them from being in the equation.
Similarly, a variable that has no economic or physical relationship with the response variable
might have a significant t-value just by chance. This does not necessarily mean that it should
be included in the equation.
You should not agonize too much about whether to include or exclude a variable "at the margin."
If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat
cleaner model, and you probably won't see any dramatic shifts in Cp, r2, r2adj, or se.
On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious
and you have one more variable to interpret, but otherwise there is no real penalty for including it.

In real applications there are often several equations that, for all practical purposes, are equally
useful for describing the relationships or making predictions. There are so many aspects of what
makes a model useful that human judgment is necessary to make a final choice. For example,
in addition to favoring explanatory variables that can be measured reliably, you may want to
favor those that are less expensive to measure. The statistician George Box, who had an
illustrious academic career at the University of Wisconsin, is often quoted saying:
"All models are wrong, but some models are useful."
- 9 -
Variable Selection Procedures
Model building is the process of developing an estimated regression equation that describes the
relationship between a response variable and one or more explanatory variables
The major issues in model building are finding the proper functional form of the relationship and
selecting the explanatory variables to be included in the model
Many statistical packages provide some assistance by including automatic model-building options.
These options estimate a series of regression models by successively adding or deleting variables
according to prescribed rules. These rules can vary from package to package, but usually the t test
for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to
determine whether variables are added or deleted. The levels of significance α1 and α2 for
determining whether an explanatory variable should be entered into the model or removed from
the model are typically referred to as P-value to Enter and P-value to Leave.
Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.
The four most common types of model-building procedures that statistical packages implement are
forward selection, backward elimination, stepwise regression, and best subsets regression.

Today many businesses use these variable selection procedures as part of the research technique
called data mining, which tries to identify significant statistical relationships in very large data
sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively
adds variables one at a time until no remaining variables make a significant contribution.
The forward selection procedure does not permit a variable to be removed from the model once it
has been entered. The procedure stops if the P-value for each of the explanatory variables not in
the model is greater than the prescribed P-value to Enter.
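The logic of forward selection can be sketched in plain Python. This toy implementation (all data and names invented) fits by least squares through the normal equations and, in place of a P-value-to-Enter test, uses a crude entry rule: a candidate enters only if it reduces SSE by more than 5% of the total sum of squares:

```python
def fit_sse(cols, y):
    """SSE of an OLS fit of y on an intercept plus the given columns."""
    n, p = len(y), len(cols) + 1
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    # Normal equations (X'X) b = X'y, solved by Gaussian elimination.
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(p)] for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            b[r] -= f * b[c]
            for s in range(c, p):
                A[r][s] -= f * A[c][s]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][s] * coef[s] for s in range(r + 1, p))) / A[r][r]
    fitted = [sum(coef[j] * X[i][j] for j in range(p)) for i in range(n)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def forward_select(data, y, min_gain=0.05):
    """Add one variable at a time while the SSE drop exceeds min_gain * SST."""
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    chosen, current = [], sst          # start from the intercept-only model
    while len(chosen) < len(data):
        cand = {v: fit_sse([data[u] for u in chosen] + [data[v]], y)
                for v in data if v not in chosen}
        best = min(cand, key=cand.get)
        if current - cand[best] <= min_gain * sst:
            break                      # no remaining variable contributes enough
        chosen.append(best)
        current = cand[best]
    return chosen

# Toy data: y depends on x1 and x2 only; x3 is an irrelevant column.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
x3 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(forward_select({"x1": x1, "x2": x2, "x3": x3}, y))  # selects x1 and x2; x3 never enters
```

A real package would use the partial F test and its P-value instead of the fixed 5% SSE-reduction threshold, but the add-the-best-candidate-then-stop structure is the same.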
The backward elimination procedure begins with a model that includes all potential
explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value
to the prescribed P-value to Leave. The backward elimination procedure does not permit a
variable to be reentered once it has been removed. The procedure stops when none of the
explanatory variables in the model have a P-value greater than P-value to Leave.
The stepwise regression procedure is much like a forward procedure, except that it also considers
possible deletions along the way. Because of the nature of the stepwise regression procedure,
an explanatory variable can enter the model at one step, be removed at a subsequent step,
and then enter the model at a later step. The procedure stops when no explanatory variables can
be removed from or entered into the model.
The best subsets regression procedure works by trying possible subsets from the list of possible
explanatory variables. This procedure does not actually compute all possible regressions;
there are ways to exclude models known to be worse than some already examined models.
Typical computer output reports results for a collection of "best" models, usually the two best
one-variable models, the two best two-variable models, the two best three-variable models, and so on.
The user can then select the best model based on such measures as Cp, r2, r2adj, and se.
- 10 -
In most cases the final results of these four procedures are very similar. However, there is no guarantee
that they will all produce exactly the same final equation. Deciding which estimated regression
equation to use remains a topic for discussion; ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for
forward selection, backward elimination, and stepwise regression, but it cannot perform the best
subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses.
Currently, the graphic artists at the station receive hourly pay for a significant number of hours
during which they are idle. These hours are called standby hours. The operations manager wants to
determine which factors most heavily affect standby hours of graphic artists. Over a period of 26
weeks he collected data concerning standby hours (y) and four factors that he suspects are related
to the excessive number of standby hours the station is currently experiencing:

x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours

The data are organized and stored in Standby.xlsx:

Week   Standby   Total Staff   Remote   Dubner   Total Labor
1      245       338           414      323      2001
2      177       333           598      340      2030
...    ...       ...           ...      ...      ...
25     261       315           164      223      1839
26     232       331           270      272      1935
How do you build a multiple regression model with the most appropriate mix of explanatory variables?
Solution
(a) Compute the variance inflation factors to measure the amount of collinearity among the
explanatory variables. (Reminder: VIFj = 1/(1 - rj2))

This is always a good starting point for any multiple regression analysis. It involves running
four regressions – one regression for each explanatory variable against the other x variables.
The following table summarizes the results:
- 11 -
                     Total Staff       Remote            Dubner            Total Labor
                     and all other X   and all other X   and all other X   and all other X
Multiple R           0.6437            0.4349            0.5610            0.7070
R Square             0.4143            0.1891            0.3147            0.4998
Adjusted R Square    0.3345            0.0786            0.2213            0.4316
Standard Error       16.4715           124.9392          57.5525           114.4118
Observations         26                26                26                26
VIF                  1.7074            1.2333            1.4592            1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to
a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be
less than 5, there is little evidence of collinearity among the set of explanatory variables.
(b) Run forward selection, backward elimination, and stepwise regression and compare the results.

StatTools regression output from running the three procedures is shown on the next two pages.
A significance level of 0.05 is used to enter a variable into the model or to delete a variable
from the model (that is, P-value to Enter = P-value to Leave = 0.05).
The correlations between the response variable and the explanatory variables are:

           Total Staff   Remote    Dubner    Total Labor
Standby    0.6050        -0.0953   -0.2443   0.4136
As the computer output shows, the forward selection and stepwise regression methods
produce the same results for these data. The first variable entered into the model is total staff,
the variable that correlates most highly with the response variable, standby hours
(r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it
in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable
chosen is one that makes the largest contribution to the model, given that the first variable has
been selected. For this model the second variable is remote hours. Because the P-value of 0.0269
for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure
determines whether total staff is still an important contributing variable or whether it can be
eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05,
total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of
the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure
terminates with a model that includes total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.
- 12 -
Forward Selection

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained      2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained    23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff        0.6050       0.3660     0.3396              38.6206             1
Remote             0.6999       0.4899     0.4456              35.3873             2
Stepwise Regression

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                       0.6999     0.4899              0.4456             35.3873

ANOVA Table   Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained                      2       27662.5429        13831.2714   11.0450    0.0004
Unexplained                   23       28802.0725         1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant             -330.6748         116.4802   -2.8389    0.0093   -571.6325    -89.7171
Total Staff             1.7649           0.3790    4.6562    0.0001      0.9808      2.5490
Remote                 -0.1390           0.0588   -2.3635    0.0269     -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff            0.6050     0.3660              0.3396             38.6206           Enter
Remote                 0.6999     0.4899              0.4456             35.3873           Enter
Backward Elimination

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                       0.7894     0.6231              0.5513             31.8350

ANOVA Table   Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained                      4       35181.7937         8795.4484    8.6786    0.0003
Unexplained                   21       21282.8217         1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant             -330.8318         110.8954   -2.9833    0.0071   -561.4514   -100.2123
Total Staff             1.2456           0.4121    3.0229    0.0065      0.3887      2.1026
Remote                 -0.1184           0.0543   -2.1798    0.0408     -0.2314     -0.0054
Dubner                 -0.2971           0.1179   -2.5189    0.0199     -0.5423     -0.0518
Total Labor             0.1305           0.0593    2.2004    0.0391      0.0072      0.2539

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
All Variables          0.7894     0.6231              0.5513             31.8350
(c) Which of the two models suggested by the above procedures would you choose based on the
Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes
two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination
procedure suggests the "full" model, with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28802.0725/1013.4677 - (26 - 4 - 2) = 28.4193 - 20 = 8.4193
For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21282.8217/1013.4677 - (26 - 8 - 2) = 21 - 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of
8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model.
For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5.
Thus, according to the Cp criterion, the model including all four variables is the better model.
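The Cp arithmetic above is easy to verify in a few lines of code. Below is a minimal Python sketch that recomputes Cp for both candidate models, using the SSE and MSE values read off the regression output (the function name `mallows_cp` is ours, not StatTools'):

```python
def mallows_cp(sse_k: float, mse_full: float, n: int, k: int) -> float:
    """Mallows' Cp = SSE(k)/MSE(full) - (n - 2k - 2)."""
    return sse_k / mse_full - (n - 2 * k - 2)

mse_full = 1013.4677   # MSE of the full model (all four variables)
n = 26                 # number of weeks of data

# Forward selection / stepwise model: Total Staff and Remote (k = 2)
cp_two = mallows_cp(28802.0725, mse_full, n, k=2)

# Backward elimination model: all four variables (k = 4)
cp_four = mallows_cp(21282.8217, mse_full, n, k=4)

print(round(cp_two, 4), round(cp_four, 4))   # 8.4193 5.0
```

Only the four-variable model has Cp at or below its k + 1 benchmark, matching the conclusion above.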
(d) Below are the results from the best subsets regression procedure of all possible regression
models for the standby hours data. Which is the best model?

Model        k + 1      Cp       r²    r²adj      se
X1               2   13.32   0.3660   0.3396   38.62
X2               2   33.21   0.0091  -0.0322   48.28
X3               2   30.39   0.0597   0.0205   47.03
X4               2   24.18   0.1710   0.1365   44.16
X1X2             3    8.42   0.4899   0.4456   35.39
X1X3             3   10.65   0.4499   0.4021   36.75
X1X4             3   14.80   0.3754   0.3211   39.16
X2X3             3   32.31   0.0612  -0.0205   48.01
X2X4             3   23.25   0.2238   0.1563   43.65
X3X4             3   11.82   0.4288   0.3791   37.45
X1X2X3           4    7.84   0.5362   0.4729   34.50
X1X2X4           4    9.34   0.5092   0.4423   35.49
X1X3X4           4    7.75   0.5378   0.4748   34.44
X2X3X4           4   12.14   0.4591   0.3853   37.26
X1X2X3X4         5    5.00   0.6231   0.5513   31.84
Because model building requires you to compare models with different numbers of explanatory
variables, the adjusted coefficient of determination r²adj is more appropriate than r²
(although sometimes it is a matter of preference). The adjusted r² reaches a maximum value
of 0.5513 when all four explanatory variables are included in the model. Therefore, using this
criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model
with all four explanatory variables considered has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative
models for you to evaluate in greater depth. Moreover, the best model or models using the Cp
criterion might differ from the model selected using the adjusted r² and/or the models selected
using the three procedures discussed in (a) through (c).
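The Cp screening rule (keep models whose Cp is close to or below k + 1) can be applied mechanically to the best subsets output. A small illustrative sketch, with the Cp values from the table typed in by hand (here k counts explanatory variables, so the benchmark is k + 1):

```python
# (model, k, Cp) triples copied from the best subsets table
subsets = [
    ("X1", 1, 13.32), ("X2", 1, 33.21), ("X3", 1, 30.39), ("X4", 1, 24.18),
    ("X1X2", 2, 8.42), ("X1X3", 2, 10.65), ("X1X4", 2, 14.80),
    ("X2X3", 2, 32.31), ("X2X4", 2, 23.25), ("X3X4", 2, 11.82),
    ("X1X2X3", 3, 7.84), ("X1X2X4", 3, 9.34), ("X1X3X4", 3, 7.75),
    ("X2X3X4", 3, 12.14), ("X1X2X3X4", 4, 5.00),
]

# Keep only models whose Cp does not exceed the k + 1 benchmark
candidates = [name for name, k, cp in subsets if cp <= k + 1]
print(candidates)   # ['X1X2X3X4']
```

Only the full four-variable model survives the screen, which is exactly the conclusion reached in (d).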
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables.
Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the
total labor hours reveal apparent patterns. In addition, a plot of the residuals versus the
predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54).
The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
[Plots: Total Staff Residual Plot; Remote Residual Plot; Dubner Residual Plot; Total Labor
Residual Plot; Residuals vs Fit (predicted standby hours); Histogram of Residuals;
Time Series Plot of Residuals by week.]
Because of the large P-value, you conclude that the interactions do not make a significant
contribution to the model, given that the model already includes temperature (x1), insulation (x2),
and whether the house is ranch style (x3). Therefore, the multiple regression model using x1, x2,
and x3 but no interaction terms is the better model.

If you rejected this null hypothesis, you would then test the contribution of each interaction
separately in order to determine which interaction terms to include in the model.
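For reference, the partial F statistic can be recomputed from the two r² values using the r²-form of the test, F = [(r²full - r²reduced)/(k - j)] / [(1 - r²full)/(n - k - 1)]. A quick sketch with the heating oil numbers (n = 15, full model k = 6, reduced model j = 3); because the r² values are rounded, the result is only approximate:

```python
def partial_f(r2_full: float, r2_reduced: float, n: int, k: int, j: int) -> float:
    """Partial F statistic for the k - j extra terms, computed from R-square values."""
    numerator = (r2_full - r2_reduced) / (k - j)    # extra explained variation per added term
    denominator = (1 - r2_full) / (n - k - 1)       # unexplained variation per error df
    return numerator / denominator

f_stat = partial_f(r2_full=0.9931, r2_reduced=0.9853, n=15, k=6, j=3)
print(round(f_stat, 2))   # 3.01
```

This is below the 5% critical value of F with (3, 8) degrees of freedom (about 4.07), consistent with the large P-value and the decision to drop the interaction terms.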
Adjusted r²

Adding new explanatory variables will always keep the r² value the same or increase it;
it can never decrease it. In general, adding explanatory variables to the model causes the
prediction errors to become smaller, thus reducing the sum of squares due to error, SSE.
Because SSR = SST - SSE, when SSE becomes smaller, SSR becomes larger, causing
r² = SSR/SST to increase. Therefore, if a variable is added to the model, r² usually becomes larger
even if the variable added is not statistically significant. This can lead to "fishing expeditions",
where you keep adding variables to the model, some of which have no conceptual relationship to
the response variable, just to inflate the r² value.

To avoid overestimating the impact of adding an explanatory variable on the amount of
variability explained by the estimated regression equation, many analysts prefer adjusting r² for
the number of explanatory variables. The adjusted r² is defined as
the number of explanatory variables The adjusted r 2 is defined as
( )1
111 22
minusminus
minusminusminus=
k n
nr r adj
The adjusted r² imposes a "penalty" for each new term that is added to the model, in an attempt
to make models of different sizes (numbers of explanatory variables) comparable. It can decrease
when unnecessary explanatory variables are added to the regression model. Therefore, it serves
as an index that you can monitor. If you add variables and the adjusted r² decreases, the extra
variables are essentially not pulling their weight and should probably be omitted.
For the full model of the Heating Oil Consumption example (with the interaction terms),
n = 15, k = 6, and r² = 0.9931. Thus, the adjusted r² is

r²adj = 1 - (1 - r²)(n - 1)/(n - k - 1) = 1 - (1 - 0.9931)(15 - 1)/(15 - 6 - 1) = 1 - (0.0069)(1.75) ≈ 0.9879

The adjusted r² for the reduced model (without the interaction terms) is 0.9853.
The adjusted r² for the full model indicates too small an improvement in explaining the variation
in the consumption of heating oil to justify keeping the interaction terms in the model, even if the
partial F test were significant.
Note: It can happen that the value of r²adj is negative. This is not a mistake, but a result of a
model that fits the data very poorly. In this case, some software systems set r²adj equal to 0;
Excel will print the actual value.
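The adjusted r² formula is one line of code. Below is a minimal Python sketch that reproduces the heating oil computation (with the rounded r² inputs the exact value is 0.9879) and then illustrates the penalty with a hypothetical seventh term whose r² gain is tiny:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-square: penalizes R-square for the number of explanatory variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Heating oil example: full model with interaction terms (n = 15, k = 6)
full = adjusted_r2(0.9931, n=15, k=6)
print(round(full, 4))   # 0.9879

# Hypothetical: a seventh term nudges r-square up to 0.9932, yet adjusted
# r-square falls, because the penalty outweighs the tiny gain.
print(round(adjusted_r2(0.9932, n=15, k=7), 4))   # 0.9864
```

This is exactly the monitoring behavior described above: r² rose, adjusted r² dropped, so the extra variable is not pulling its weight.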
Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic
developed by Mallows. The formula for computing Cp is

Cp = SSE(k)/MSE(full) - (n - 2k - 2)

where
SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, ...
MSE(full) is the mean square error for a regression model that has all explanatory variables included.
Theory says that if the value of Cp is large, then the mean square error of the fitted values is large,
indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is
much greater than k + 1, then there is a large bias component in the regression, usually indicating
omission of an important variable. Therefore, when evaluating which regression is best, it is
recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between
alternate regressions, keep in mind that the bias is measured with respect to the total group of
variables provided by the researcher. This criterion cannot determine when the researcher has
forgotten about some variable not included in the total group.
Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly
the most difficult part of any real regression analysis problem. You are always trying to get the
best fit possible. The principle of parsimony suggests using the fewest number of explanatory
variables that can predict the response variable adequately. Regression models with fewer
explanatory variables are easier to interpret and are less likely to be affected by interaction or
collinearity problems. On the other hand, more variables certainly increase r², and they usually
reduce the standard error of estimate se. This presents a trade-off, which is the heart of the
challenge of selecting a good model.

The best regression models, in addition to satisfying the conditions of multiple regression, have:

bull Relatively few explanatory variables
bull Relatively high r² and r²adj, indicating that much of the variability in y is accounted for by
the regression model
bull A small value of Cp (close to or less than k + 1)
bull A relatively small value of se, the standard deviation of the residuals, indicating that the
magnitude of the errors is small
bull Relatively small P-values for the F- and t-statistics, showing that the overall model is better than
a simple summary with the mean, and that the individual parameters are reliably different from zero
Here are several guidelines for including and excluding variables. These guidelines are not
ironclad rules. They typically involve choices at the margin, that is, between equations that are
very similar and seem equally useful.

Guidelines for Including/Excluding Variables in a Regression Model

1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted
significance level, such as 0.05, this variable is a candidate for exclusion.

2. It is a mathematical fact that
– If |t-value| < 1, then se will decrease and adjusted r² will increase if this variable is excluded
from the equation.
– If |t-value| > 1, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and
including variables with t-values greater than 1. However, analysts who base the decision on
statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable
from the equation unless its t-value is at least 2 (approximately). This latter approach is more
stringent – fewer variables will be retained – but it is probably the more popular approach.

3. When there is a group of variables that are in some sense logically related, it is sometimes a
good idea to include all of them or exclude all of them. In this case, their individual t-values
are less relevant. Instead, a partial F test can be used to make the include/exclude decision.

4. Use economic, theoretical, or practical considerations to decide whether to include or exclude
variables. Some variables might really belong in an equation because of their theoretical
relationship with the response variable, and their low t-values, possibly the result of an
unlucky sample, should not necessarily disqualify them from being in the equation.
Similarly, a variable that has no economic or physical relationship with the response variable
might have a significant t-value just by chance. This does not necessarily mean that it should
be included in the equation.
You should not agonize too much about whether to include or exclude a variable "at the margin".
If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat
cleaner model, and you probably won't see any dramatic shifts in Cp, r², r²adj, or se.
On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious
and you have one more variable to interpret, but otherwise there is no real penalty for including it.

In real applications, there are often several equations that, for all practical purposes, are equally
useful for describing the relationships or making predictions. There are so many aspects of what
makes a model useful that human judgment is necessary to make a final choice. For example,
in addition to favoring explanatory variables that can be measured reliably, you may want to
favor those that are less expensive to measure. The statistician George Box, who had an
illustrious academic career at the University of Wisconsin, is often quoted saying:
"All models are wrong, but some models are useful."
Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the
relationship between a response variable and one or more explanatory variables.

The major issues in model building are finding the proper functional form of the relationship and
selecting the explanatory variables to be included in the model.

Many statistical packages provide some assistance by including automatic model-building options.
These options estimate a series of regression models by successively adding or deleting variables
according to prescribed rules. These rules can vary from package to package, but usually the t test
for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to
determine whether variables are added or deleted. The levels of significance α1 and α2 for
determining whether an explanatory variable should be entered into the model or removed from
the model are typically referred to as P-value to Enter and P-value to Leave.
Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.

The four most common types of model-building procedures that statistical packages implement are
forward selection, backward elimination, stepwise regression, and best subsets regression.
Today, many businesses use these variable selection procedures as part of the research technique
called data mining, which tries to identify significant statistical relationships in very large data
sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively
adds variables one at a time until no remaining variables make a significant contribution.
The forward selection procedure does not permit a variable to be removed from the model once it
has been entered. The procedure stops if the P-value for each of the explanatory variables not in
the model is greater than the prescribed P-value to Enter.
The backward elimination procedure begins with a model that includes all potential
explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value
to the prescribed P-value to Leave. The backward elimination procedure does not permit a
variable to be reentered once it has been removed. The procedure stops when none of the
explanatory variables in the model have a P-value greater than P-value to Leave.
The stepwise regression procedure is much like a forward procedure, except that it also considers
possible deletions along the way. Because of the nature of the stepwise regression procedure,
an explanatory variable can enter the model at one step, be removed at a subsequent step,
and then enter the model at a later step. The procedure stops when no explanatory variables can
be removed from or entered into the model.
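The control flow of these procedures is simple, even though the regression refitting inside each step is not. Below is a minimal, illustrative Python sketch of stepwise selection. The mock p-value tables are hypothetical stand-ins, loosely patterned on the standby hours example, for what a statistical package would compute by refitting the model at every step:

```python
P_ENTER = 0.05   # P-value to Enter
P_LEAVE = 0.05   # P-value to Leave

# Hypothetical p-value of each candidate if added to the current model
MOCK_ADD_PVALUES = {
    frozenset(): {"Total Staff": 0.0011, "Remote": 0.64, "Dubner": 0.23, "Total Labor": 0.036},
    frozenset({"Total Staff"}): {"Remote": 0.0269, "Dubner": 0.09, "Total Labor": 0.31},
    frozenset({"Total Staff", "Remote"}): {"Dubner": 0.12, "Total Labor": 0.27},
}
# Hypothetical p-values of the variables already in the model
MOCK_IN_MODEL_PVALUES = {
    frozenset({"Total Staff"}): {"Total Staff": 0.0011},
    frozenset({"Total Staff", "Remote"}): {"Total Staff": 0.0001, "Remote": 0.0269},
}

def stepwise(candidates):
    model = set()
    while True:
        # Entry step: add the candidate with the smallest p-value, if it beats P_ENTER
        outside = {v: MOCK_ADD_PVALUES[frozenset(model)][v] for v in candidates - model}
        best = min(outside, key=outside.get, default=None)
        if best is None or outside[best] > P_ENTER:
            break
        model.add(best)
        # Removal step: drop the worst variable already in the model if it exceeds P_LEAVE
        inside = MOCK_IN_MODEL_PVALUES[frozenset(model)]
        worst = max(inside, key=inside.get)
        if inside[worst] > P_LEAVE:
            model.remove(worst)
    return model

print(sorted(stepwise({"Total Staff", "Remote", "Dubner", "Total Labor"})))
# ['Remote', 'Total Staff']
```

With the removal step deleted, the same loop is forward selection; starting from the full model and only deleting gives backward elimination.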
The best subsets regression procedure works by trying possible subsets from the list of possible
explanatory variables. This procedure does not actually compute all possible regressions;
there are ways to exclude models known to be worse than some already examined models.
Typical computer output reports results for a collection of "best" models, usually the two best
one-variable models, the two best two-variable models, the two best three-variable models, and so on.
The user can then select the best model based on such measures as Cp, r², r²adj, and se.
In most cases, the final results of these four procedures are very similar. However, there is no
guarantee that they will all produce exactly the same final equation. Deciding which estimated
regression equation to use remains a topic for discussion. Ultimately, the analyst's judgment
must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for
forward selection, backward elimination, and stepwise regression, but it cannot perform best
subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses.
Currently, the graphic artists at the station receive hourly pay for a significant number of hours
during which they are idle. These hours are called standby hours. The operations manager wants to
determine which factors most heavily affect standby hours of graphic artists. Over a period of 26
weeks, he collected data concerning standby hours (y) and four factors that he suspects are related
to the excessive number of standby hours the station is currently experiencing:

x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours

The data are organized and stored in Standby.xlsx:

Week   Standby   Total Staff   Remote   Dubner   Total Labor
1          245           338      414      323          2001
2          177           333      598      340          2030
⋮            ⋮             ⋮        ⋮        ⋮             ⋮
25         261           315      164      223          1839
26         232           331      270      272          1935
How can you build a multiple regression model with the most appropriate mix of explanatory
variables?

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the
explanatory variables. (Reminder: VIFj = 1/(1 - rj²), where rj² is the coefficient of
determination from regressing xj on the other explanatory variables.)

This is always a good starting point for any multiple regression analysis. It involves running
four regressions – one regression for each explanatory variable against the other x variables.
The following table summarizes the results:
                        Total Staff            Remote            Dubner       Total Labor
                    and all other X   and all other X   and all other X   and all other X
Multiple R                   0.6437            0.4349            0.5610            0.7070
R Square                     0.4143            0.1891            0.3147            0.4998
Adjusted R Square            0.3345            0.0786            0.2213            0.4316
Standard Error              16.4715          124.9392           57.5525          114.4118
Observations                     26                26                26                26
VIF                          1.7074            1.2333            1.4592            1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to
a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be
less than 5, there is little evidence of collinearity among the set of explanatory variables.
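Each VIF in the table is just 1/(1 - R²) from the corresponding auxiliary regression. A quick Python check of the table's last row (the Remote and Total Labor values differ from the table in the fourth decimal only because the R² inputs here are rounded):

```python
def vif(r2_aux: float) -> float:
    """Variance inflation factor from the R-square of regressing x_j on the other x's."""
    return 1 / (1 - r2_aux)

# R-square values from the four auxiliary regressions above
aux_r2 = {"Total Staff": 0.4143, "Remote": 0.1891, "Dubner": 0.3147, "Total Labor": 0.4998}

for name, r2 in aux_r2.items():
    print(f"{name}: VIF = {vif(r2):.4f}")   # 1.7074, 1.2332, 1.4592, 1.9992
```

All four values are comfortably below the usual cutoff of 5, confirming the conclusion above.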
(b) Run forward selection, backward elimination, and stepwise regression, and compare the results.

StatTools regression output from running the three procedures is shown below.
A significance level of 0.05 is used to enter a variable into the model or to delete a variable
from the model (that is, P-value to Enter = P-value to Leave = 0.05).
The correlations between the response variable and the explanatory variables are:

          Total Staff    Remote    Dubner   Total Labor
Standby        0.6050   -0.0953   -0.2443        0.4136
As the computer output shows, the forward selection and stepwise regression methods
produce the same results for these data. The first variable entered into the model is total staff,
the variable that correlates most highly with the response variable, standby hours
(r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it
in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable
chosen is one that makes the largest contribution to the model, given that the first variable has
been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269
for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure
determines whether total staff is still an important contributing variable or whether it can be
eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05,
total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of
the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure
terminates with a model that includes total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.
Forward Selection

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained        23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff        0.6050       0.3660     0.3396              38.6206             1
Remote             0.6999       0.4899     0.4456              35.3873             2
Stepwise Regression

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained        23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff        0.6050       0.3660     0.3396              38.6206             Enter
Remote             0.6999       0.4899     0.4456              35.3873             Enter
Backward Elimination

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.7894       0.6231     0.5513              31.8350

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          4                    35181.7937       8795.4484         8.6786    0.0003
Unexplained        21                   21282.8217       1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff        1.2456        0.4121           3.0229    0.0065    0.3887      2.1026
Remote             -0.1184       0.0543           -2.1798   0.0408    -0.2314     -0.0054
Dubner             -0.2971       0.1179           -2.5189   0.0199    -0.5423     -0.0518
Total Labor        0.1305        0.0593           2.2004    0.0391    0.0072      0.2539

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables      0.7894       0.6231     0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose, based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model, with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28802.0725/1013.4677 - (26 - 4 - 2) = 28.4193 - 20 = 8.4193

For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21282.8217/1013.4677 - (26 - 8 - 2) = 21 - 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.
(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?
Model        k + 1   Cp      r2       adj r2    se
X1           2       13.32   0.3660   0.3396    38.62
X2           2       33.21   0.0091   -0.0322   48.28
X3           2       30.39   0.0597   0.0205    47.03
X4           2       24.18   0.1710   0.1365    44.16
X1X2         3       8.42    0.4899   0.4456    35.39
X1X3         3       10.65   0.4499   0.4021    36.75
X1X4         3       14.80   0.3754   0.3211    39.16
X2X3         3       32.31   0.0612   -0.0205   48.01
X2X4         3       23.25   0.2238   0.1563    43.65
X3X4         3       11.82   0.4288   0.3791    37.45
X1X2X3       4       7.84    0.5362   0.4729    34.50
X1X2X4       4       9.34    0.5092   0.4423    35.49
X1X3X4       4       7.75    0.5378   0.4748    34.44
X2X3X4       4       12.14   0.4591   0.3853    37.26
X1X2X3X4     5       5.00    0.6231   0.5513    31.84
Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination (adj r2) is more appropriate than r2 (although sometimes it is a matter of preference). The adjusted r2 reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r2 and/or the models selected using the three procedures discussed in (a) through (c).
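The adjusted r2 values in the best subsets table follow from r2, n, and the number of explanatory variables k. Here is a quick Python sketch checking the formula against two rows of the table (n = 26 weeks):

```python
def adj_r2(r2: float, n: int, k: int) -> float:
    """Adjusted coefficient of determination for a model with k explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Compare with the best subsets table:
print(round(adj_r2(0.3660, 26, 1), 4))   # X1 model: 0.3396
print(round(adj_r2(0.6231, 26, 4), 4))   # full model: 0.5513
```

Unlike r2, the adjusted r2 penalizes additional explanatory variables, which is why it can be compared across models of different sizes.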
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. On the next page are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the total labor hours reveals any apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
[Residual plots shown here: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals (residuals by week).]
Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic developed by Mallows. The formula for computing Cp is

    Cp = SSE(k) / MSE(full) - (n - 2k - 2)

where

SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, ...

MSE(full) is the mean square error for a regression model that has all explanatory variables included.

Theory says that if the value of Cp is large, then the mean square error of the fitted values is large, indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is much greater than k + 1, then there is a large bias component in the regression, usually indicating omission of an important variable. Therefore, when evaluating which regression is best, it is recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between alternate regressions, keep in mind that the bias is measured with respect to the total group of variables provided by the researcher. This criterion cannot determine when the researcher has forgotten about some variable not included in the total group.
Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly the most difficult part of any real regression analysis problem. You are always trying to get the best fit possible. The principle of parsimony suggests using the fewest number of explanatory variables that can predict the response variable adequately. Regression models with fewer explanatory variables are easier to interpret and are less likely to be affected by interaction or collinearity problems. On the other hand, more variables certainly increase r2, and they usually reduce the standard error of estimate se. This presents a trade-off, which is the heart of the challenge of selecting a good model.

The best regression models, in addition to satisfying the conditions of multiple regression, have:

• Relatively few explanatory variables

• Relatively high r2 and adj r2, indicating that much of the variability in y is accounted for by the regression model

• A small value of Cp (close to or less than k + 1)

• A relatively small value of se, the standard deviation of the residuals, indicating that the magnitude of the errors is small

• Relatively small P-values for the F- and t-statistics, showing that the overall model is better than a simple summary with the mean, and that the individual parameters are reliably different from zero
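For instance, the Cp rule can be applied as a quick screen over candidate models. The Python sketch below uses (model, k, Cp) tuples from the standby-hours best subsets output in these notes; the 0.5 slack used to operationalize "close to" k + 1 is an arbitrary choice:

```python
# Screen candidate models by the Cp rule: keep those with Cp close to or below k + 1.
# Tuples are (model, k, Cp) from the standby-hours best subsets output.
models = [
    ("X1X2", 2, 8.42),
    ("X1X2X3", 3, 7.84),
    ("X1X3X4", 3, 7.75),
    ("X1X2X3X4", 4, 5.00),
]
SLACK = 0.5  # arbitrary tolerance for "close to" k + 1
passing = [name for name, k, cp in models if cp <= k + 1 + SLACK]
print(passing)   # ['X1X2X3X4']
```

Only the full four-variable model survives the screen, matching the conclusion reached in the example.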
Here are several guidelines for including and excluding variables. These guidelines are not ironclad rules. They typically involve choices at the margin, that is, between equations that are very similar and seem equally useful.

Guidelines for Including/Excluding Variables in a Regression Model

1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted significance level, such as 0.05, this variable is a candidate for exclusion.

2. It is a mathematical fact that
– if |t-value| < 1, then se will decrease and adjusted r2 will increase if this variable is excluded from the equation;
– if |t-value| > 1, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and including variables with t-values greater than 1. However, analysts who base the decision on statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable from the equation unless its t-value is at least 2 (approximately). This latter approach is more stringent – fewer variables will be retained – but it is probably the more popular approach.

3. When there is a group of variables that are in some sense logically related, it is sometimes a good idea to include all of them or exclude all of them. In this case, their individual t-values are less relevant. Instead, a partial F test can be used to make the include/exclude decision.

4. Use economic, theoretical, or practical considerations to decide whether to include or exclude variables. Some variables might really belong in an equation because of their theoretical relationship with the response variable, and their low t-values, possibly the result of an unlucky sample, should not necessarily disqualify them from being in the equation. Similarly, a variable that has no economic or physical relationship with the response variable might have a significant t-value just by chance. This does not necessarily mean that it should be included in the equation.
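To illustrate guideline 3, here is a minimal Python sketch of the partial F statistic, F = [(SSE(reduced) - SSE(full)) / (number of extra terms)] / MSE(full). The error sums of squares are taken from the standby-hours example in these notes, testing whether Dubner hours and total labor hours, as a group, add explanatory power to the model that already contains total staff and remote hours:

```python
def partial_f(sse_reduced: float, sse_full: float, extra_terms: int, mse_full: float) -> float:
    """Partial F statistic for testing a group of extra explanatory variables."""
    return (sse_reduced - sse_full) / extra_terms / mse_full

# Reduced model: total staff + remote hours; full model adds Dubner and total labor.
f_stat = partial_f(28802.0725, 21282.8217, 2, 1013.4677)
print(round(f_stat, 4))   # 3.7097
```

This statistic would be compared with the F distribution with 2 and n - k - 1 = 21 degrees of freedom; at the 0.05 level the critical value is roughly 3.47, so the two extra variables as a group do add significant explanatory power, consistent with backward elimination keeping all four variables.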
You should not agonize too much about whether to include or exclude a variable "at the margin". If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat cleaner model, and you probably won't see any dramatic shifts in Cp, r2, adj r2, or se. On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious and you have one more variable to interpret, but otherwise there is no real penalty for including it.

In real applications, there are often several equations that, for all practical purposes, are equally useful for describing the relationships or making predictions. There are so many aspects of what makes a model useful that human judgment is necessary to make a final choice. For example, in addition to favoring explanatory variables that can be measured reliably, you may want to favor those that are less expensive to measure. The statistician George Box, who had an illustrious academic career at the University of Wisconsin, is often quoted as saying, "All models are wrong, but some models are useful."
Variable Selection Procedures
Model building is the process of developing an estimated regression equation that describes the relationship between a response variable and one or more explanatory variables.

The major issues in model building are finding the proper functional form of the relationship and selecting the explanatory variables to be included in the model.

Many statistical packages provide some assistance by including automatic model-building options. These options estimate a series of regression models by successively adding or deleting variables according to prescribed rules. These rules can vary from package to package, but usually the t-test for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to determine whether variables are added or deleted. The levels of significance α1 and α2 for determining whether an explanatory variable should be entered into the model or removed from the model are typically referred to as P-value to Enter and P-value to Leave. Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.

The four most common types of model-building procedures that statistical packages implement are forward selection, backward elimination, stepwise regression, and best subsets regression. Today many businesses use these variable selection procedures as part of the research technique called data mining, which tries to identify significant statistical relationships in very large data sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively adds variables one at a time until no remaining variable makes a significant contribution. The forward selection procedure does not permit a variable to be removed from the model once it has been entered. The procedure stops if the P-value for each of the explanatory variables not in the model is greater than the prescribed P-value to Enter.
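The control flow can be sketched as follows in Python. The P-values here are a hypothetical illustration patterned on the standby-hours example: only the 0.0011 and 0.0269 entries appear in the actual output, and the helper that would refit the regression at each step is replaced by a lookup table:

```python
P_TO_ENTER = 0.05

def pvalue_if_added(model, candidate):
    # Hypothetical t-test P-values by current model state; in a real
    # implementation this would refit the regression with the candidate added.
    table = {
        (): {"total_staff": 0.0011, "remote": 0.40, "dubner": 0.22, "total_labor": 0.07},
        ("total_staff",): {"remote": 0.0269, "dubner": 0.30, "total_labor": 0.12},
        ("total_staff", "remote"): {"dubner": 0.09, "total_labor": 0.11},
    }
    return table[tuple(model)][candidate]

def forward_selection(candidates):
    model = []
    while True:
        remaining = [x for x in candidates if x not in model]
        if not remaining:
            break
        best = min(remaining, key=lambda x: pvalue_if_added(model, x))
        if pvalue_if_added(model, best) > P_TO_ENTER:
            break  # no remaining variable meets P-value to Enter
        model.append(best)
    return model

print(forward_selection(["total_staff", "remote", "dubner", "total_labor"]))
# -> ['total_staff', 'remote']
```

Note that once a variable is appended to the model it is never reconsidered, which is exactly the limitation that stepwise regression addresses.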
The backward elimination procedure begins with a model that includes all potential explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value to the prescribed P-value to Leave. The backward elimination procedure does not permit a variable to be reentered once it has been removed. The procedure stops when none of the explanatory variables in the model has a P-value greater than P-value to Leave.
The stepwise regression procedure is much like the forward procedure, except that it also considers possible deletions along the way. Because of the nature of the stepwise regression procedure, an explanatory variable can enter the model at one step, be removed at a subsequent step, and then enter the model at a later step. The procedure stops when no explanatory variables can be removed from or entered into the model.
The best subsets regression procedure works by trying possible subsets from the list of possible explanatory variables. This procedure does not actually compute all possible regressions; there are ways to exclude models known to be worse than some already examined models. Typical computer output reports results for a collection of "best" models, usually the two best one-variable models, the two best two-variable models, the two best three-variable models, and so on. The user can then select the best model based on such measures as Cp, r2, adj r2, and se.
In most cases the final results of these four procedures are very similar. However, there is no guarantee that they will all produce exactly the same final equation. Deciding which estimated regression equation to use remains a topic for discussion. Ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for forward selection, backward elimination, and stepwise regression, but it cannot perform the best subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses. Currently, the graphic artists at the station receive hourly pay for a significant number of hours during which they are idle. These hours are called standby hours. The operations manager wants to determine which factors most heavily affect standby hours of graphic artists. Over a period of 26 weeks, he collected data concerning standby hours (y) and four factors that he suspects are related to the excessive number of standby hours the station is currently experiencing:

x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours

The data are organized and stored in Standby.xlsx.
Week Standby Total Staff Remote Dubner Total Labor
1 245 338 414 323 2001
2 177 333 598 340 2030
... ... ... ... ... ...
25 261 315 164 223 1839
26 232 331 270 272 1935
How can you build a multiple regression model with the most appropriate mix of explanatory variables?

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the explanatory variables. (Reminder: VIFj = 1/(1 - rj2), where rj2 is the coefficient of determination from regressing xj on the other explanatory variables.)

This is always a good starting point for any multiple regression analysis. It involves running four regressions – one regression for each explanatory variable against the other x variables. The following table summarizes the results.
                     Total Staff       Remote            Dubner            Total Labor
                     and all other X   and all other X   and all other X   and all other X
Multiple R           0.6437            0.4349            0.5610            0.7070
R Square             0.4143            0.1891            0.3147            0.4998
Adjusted R Square    0.3345            0.0786            0.2213            0.4316
Standard Error       16.4715           124.9392          57.5525           114.4118
Observations         26                26                26                26
VIF                  1.7074            1.2333            1.4592            1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among the set of explanatory variables.
(b) Run forward selection, backward elimination, and stepwise regression, and compare the results.

StatTools regression output from running the three procedures is shown on the next two pages. A significance level of 0.05 is used to enter a variable into the model or to delete a variable from the model (that is, P-value to Enter = P-value to Leave = 0.05).

The correlations between the response variable and the explanatory variables are:

           Total Staff   Remote    Dubner    Total Labor
Standby    0.6050        -0.0953   -0.2443   0.4136
Forward Selection

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                     0.6999      0.4899        0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained                  2              27662.5429       13831.2714      11.0450    0.0004
Unexplained               23              28802.0725        1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.6748       116.4802      -2.8389    0.0093   -571.6325    -89.7171
Total Staff            1.7649         0.3790       4.6562    0.0001      0.9808      2.5490
Remote                -0.1390         0.0588      -2.3635    0.0269     -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff          0.6050      0.3660        0.3396              38.6206               1
Remote               0.6999      0.4899        0.4456              35.3873               2
Stepwise Regression

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                     0.6999      0.4899        0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained                  2              27662.5429       13831.2714      11.0450    0.0004
Unexplained               23              28802.0725        1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.6748       116.4802      -2.8389    0.0093   -571.6325    -89.7171
Total Staff            1.7649         0.3790       4.6562    0.0001      0.9808      2.5490
Remote                -0.1390         0.0588      -2.3635    0.0269     -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff          0.6050      0.3660        0.3396              38.6206             Enter
Remote               0.6999      0.4899        0.4456              35.3873             Enter
Backward Elimination

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                     0.7894      0.6231        0.5513              31.8350

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained                  4              35181.7937        8795.4484       8.6786    0.0003
Unexplained               21              21282.8217        1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.8318       110.8954      -2.9833    0.0071   -561.4514   -100.2123
Total Staff            1.2456         0.4121       3.0229    0.0065      0.3887      2.1026
Remote                -0.1184         0.0543      -2.1798    0.0408     -0.2314     -0.0054
Dubner                -0.2971         0.1179      -2.5189    0.0199     -0.5423     -0.0518
Total Labor            0.1305         0.0593       2.2004    0.0391      0.0072      0.2539

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables        0.7894      0.6231        0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose, based on the Cp selection criterion?
The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model, with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28802.0725/1013.4677 - (26 - 2(2) - 2) = 28.4193 - 20 = 8.4193
For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21282.8217/1013.4677 - (26 - 2(4) - 2) = 21 - 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.
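The two Cp calculations above can be reproduced with a short helper function. This is an illustrative sketch; the SSE and MSE figures are the ones reported in the StatTools output:

```python
def mallows_cp(sse_k, mse_full, n, k):
    """Mallows' Cp = SSE(k)/MSE(full) - (n - 2k - 2), where k is the
    number of explanatory variables in the candidate model."""
    return sse_k / mse_full - (n - 2 * k - 2)

# Forward selection / stepwise model (total staff, remote hours): k = 2
cp_two = mallows_cp(28802.0725, 1013.4677, n=26, k=2)   # ≈ 8.4193

# Backward elimination model (all four variables): k = 4
cp_full = mallows_cp(21282.8217, 1013.4677, n=26, k=4)  # = 5.0
```

A model with Cp at or below k + 1 is considered acceptable, which is why the four-variable model (Cp = 5 = k + 1) wins here.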
(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?
Model        k + 1     Cp      r²      r²adj     se
X1             2      13.32   0.3660   0.3396   38.62
X2             2      33.21   0.0091  -0.0322   48.28
X3             2      30.39   0.0597   0.0205   47.03
X4             2      24.18   0.1710   0.1365   44.16
X1X2           3       8.42   0.4899   0.4456   35.39
X1X3           3      10.65   0.4499   0.4021   36.75
X1X4           3      14.80   0.3754   0.3211   39.16
X2X3           3      32.31   0.0612  -0.0205   48.01
X2X4           3      23.25   0.2238   0.1563   43.65
X3X4           3      11.82   0.4288   0.3791   37.45
X1X2X3         4       7.84   0.5362   0.4729   34.50
X1X2X4         4       9.34   0.5092   0.4423   35.49
X1X3X4         4       7.75   0.5378   0.4748   34.44
X2X3X4         4      12.14   0.4591   0.3853   37.26
X1X2X3X4       5       5.00   0.6231   0.5513   31.84
Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination r²adj is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.
The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).
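The adjusted r² column in the table follows directly from r², n, and the number of explanatory variables k. A quick sketch of the formula:

```python
def adjusted_r2(r2, n, k):
    """Adjusted r² = 1 - (1 - r²)(n - 1)/(n - k - 1), where k is the
    number of explanatory variables in the model."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Full model X1X2X3X4 from the table: r² = 0.6231, n = 26, k = 4
adj_full = adjusted_r2(0.6231, 26, 4)  # ≈ 0.5513

# Two-variable model X1X2: r² = 0.4899, n = 26, k = 2
adj_two = adjusted_r2(0.4899, 26, 2)   # ≈ 0.4456
```

Unlike r², this measure can fall when a weak variable is added, which is what makes it useful for comparing models of different sizes.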
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the total labor hours reveals apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
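The visual checks can be backed up with numeric summaries such as the skewness of the residuals and their lag-1 autocorrelation. Below is a minimal numpy sketch, illustrated on simulated residuals since the workbook data are not reproduced here; with the real model you would pass the residuals from the four-variable fit:

```python
import numpy as np

def residual_diagnostics(residuals):
    """Return (skewness, lag-1 autocorrelation) for a residual series."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    skew = np.mean(e**3) / np.std(e)**3            # moment-based skewness
    lag1 = np.sum(e[1:] * e[:-1]) / np.sum(e**2)   # autocorrelation at lag 1
    return skew, lag1

# Illustration: well-behaved normal residuals give values near zero
rng = np.random.default_rng(42)
skew, lag1 = residual_diagnostics(rng.normal(size=500))
```

A skewness such as the 0.54 reported above signals only a moderate departure from normality, and a lag-1 autocorrelation near zero is consistent with the flat time-series plot of the residuals.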
[Figures: Total Staff residual plot; Remote residual plot; Dubner residual plot; Total Labor residual plot; residuals versus predicted standby (fit); histogram of residuals; time series plot of residuals by week.]
Here are several guidelines for including and excluding variables. These guidelines are not ironclad rules. They typically involve choices at the margin, that is, between equations that are very similar and seem equally useful.

Guidelines for Including/Excluding Variables in a Regression Model
1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted significance level, such as 0.05, this variable is a candidate for exclusion.
2. It is a mathematical fact that:
   - If |t-value| < 1, then se will decrease and adjusted r² will increase if this variable is excluded from the equation.
   - If |t-value| > 1, the opposite will occur.
   Because of this, some statisticians advocate excluding variables with t-values less than 1 and including variables with t-values greater than 1. However, analysts who base the decision on statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable from the equation unless its t-value is at least 2 (approximately). This latter approach is more stringent (fewer variables will be retained), but it is probably the more popular approach.
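The mathematical fact in guideline 2 can be verified numerically. The sketch below uses synthetic data with hypothetical variable names; it fits a model with and without a candidate variable and checks that adjusted r² is higher with the variable included exactly when that variable's |t| exceeds 1:

```python
import numpy as np

def fit_ols(X, y):
    """OLS fit; return (adjusted r², t-statistics). X includes an intercept column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * sse / (n - p))
    return adj_r2, beta / se

rng = np.random.default_rng(1)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)   # x2 is pure noise
y = 5 + 2 * x1 + rng.normal(size=n)

adj_with, t_with = fit_ols(np.column_stack([np.ones(n), x1, x2]), y)
adj_without, _ = fit_ols(np.column_stack([np.ones(n), x1]), y)

# Including x2 raises adjusted r² if and only if |t| for x2 exceeds 1
helps = adj_with > adj_without
```

The equivalence holds for any data set, which is exactly the fact the guideline relies on.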
3. When there is a group of variables that are in some sense logically related, it is sometimes a good idea to include all of them or exclude all of them. In this case, their individual t-values are less relevant. Instead, a partial F test can be used to make the include/exclude decision.
4. Use economic, theoretical, or practical considerations to decide whether to include or exclude variables. Some variables might really belong in an equation because of their theoretical relationship with the response variable, and their low t-values, possibly the result of an unlucky sample, should not necessarily disqualify them from being in the equation. Similarly, a variable that has no economic or physical relationship with the response variable might have a significant t-value just by chance. This does not necessarily mean that it should be included in the equation.
You should not agonize too much about whether to include or exclude a variable "at the margin." If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat cleaner model, and you probably won't see any dramatic shifts in Cp, r², r²adj, or se. On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious and you have one more variable to interpret, but otherwise there is no real penalty for including it.
In real applications there are often several equations that, for all practical purposes, are equally useful for describing the relationships or making predictions. There are so many aspects of what makes a model useful that human judgment is necessary to make a final choice. For example, in addition to favoring explanatory variables that can be measured reliably, you may want to favor those that are less expensive to measure. The statistician George Box, who had an illustrious academic career at the University of Wisconsin, is often quoted as saying, "All models are wrong, but some models are useful."
Variable Selection Procedures
Model building is the process of developing an estimated regression equation that describes the relationship between a response variable and one or more explanatory variables. The major issues in model building are finding the proper functional form of the relationship and selecting the explanatory variables to be included in the model.
Many statistical packages provide some assistance by including automatic model-building options. These options estimate a series of regression models by successively adding or deleting variables according to prescribed rules. These rules can vary from package to package, but usually the t test for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to determine whether variables are added or deleted. The levels of significance α1 and α2 for determining whether an explanatory variable should be entered into the model or removed from the model are typically referred to as P-value to Enter and P-value to Leave. Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.
The four most common types of model-building procedures that statistical packages implement are forward selection, backward elimination, stepwise regression, and best subsets regression. Today many businesses use these variable selection procedures as part of the research technique called data mining, which tries to identify significant statistical relationships in very large data sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively adds variables one at a time until no remaining variables make a significant contribution. The forward selection procedure does not permit a variable to be removed from the model once it has been entered. The procedure stops when the P-value for each of the explanatory variables not in the model is greater than the prescribed P-value to Enter.
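A bare-bones version of forward selection can be sketched with numpy alone. For simplicity it uses the |t| > 2 rule of thumb from the guidelines above instead of an exact P-value-to-Enter cutoff, so it approximates, rather than reproduces, what packages like StatTools do; the variable names and data are synthetic:

```python
import numpy as np

def t_of_last(cols, y):
    """t-statistic of the last variable in an OLS fit with an intercept."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta[-1] / se[-1]

def forward_selection(candidates, y, t_enter=2.0):
    """candidates maps variable name -> 1-D data array."""
    selected, remaining = [], list(candidates)
    while remaining:
        # |t| each remaining variable would have if it were added now
        trial = {v: abs(t_of_last([candidates[s] for s in selected]
                                  + [candidates[v]], y))
                 for v in remaining}
        best = max(trial, key=trial.get)
        if trial[best] <= t_enter:
            break                    # no remaining variable qualifies
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic illustration: only x1 and x2 actually drive y
rng = np.random.default_rng(7)
n = 200
data = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
y = 3 * data["x1"] + 2 * data["x2"] + rng.normal(size=n)
chosen = forward_selection(data, y)
```

Note that, as described above, a variable that has entered is never reconsidered for removal.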
The backward elimination procedure begins with a model that includes all potential explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value to the prescribed P-value to Leave. The backward elimination procedure does not permit a variable to be reentered once it has been removed. The procedure stops when none of the explanatory variables in the model has a P-value greater than P-value to Leave.
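Backward elimination can be sketched in the same spirit, again substituting the |t| > 2 rule of thumb for a P-value-to-Leave threshold (an approximation; data and names below are synthetic):

```python
import numpy as np

def backward_elimination(candidates, y, t_leave=2.0):
    """Repeatedly drop the variable with the smallest |t| until every
    remaining variable has |t| >= t_leave. candidates maps name -> array."""
    kept = list(candidates)
    n = len(y)
    while kept:
        X = np.column_stack([np.ones(n)] + [candidates[v] for v in kept])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        t = beta / np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
        t_abs = dict(zip(kept, np.abs(t[1:])))  # skip the intercept
        weakest = min(t_abs, key=t_abs.get)
        if t_abs[weakest] >= t_leave:
            break                               # everything left is significant
        kept.remove(weakest)
    return kept

# Synthetic illustration: x3 contributes nothing and should usually be dropped
rng = np.random.default_rng(3)
n = 200
data = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
y = 3 * data["x1"] + 2 * data["x2"] + rng.normal(size=n)
kept = backward_elimination(data, y)
```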
The stepwise regression procedure is much like the forward procedure, except that it also considers possible deletions along the way. Because of the nature of the stepwise regression procedure, an explanatory variable can enter the model at one step, be removed at a subsequent step, and then enter the model at a later step. The procedure stops when no explanatory variables can be removed from or entered into the model.
The best subsets regression procedure works by trying possible subsets from the list of possible explanatory variables. This procedure does not actually compute all possible regressions; there are ways to exclude models known to be worse than some already examined models. Typical computer output reports results for a collection of "best" models, usually the two best one-variable models, the two best two-variable models, the two best three-variable models, and so on. The user can then select the best model based on such measures as Cp, r², r²adj, and se.
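With only a handful of candidate variables, an exhaustive best subsets search is easy to sketch with itertools (real implementations prune subsets known to be worse, as noted above; names and data here are synthetic):

```python
import itertools
import numpy as np

def best_subsets(candidates, y):
    """Rank every non-empty subset of explanatory variables by adjusted r²."""
    n = len(y)
    sst = np.sum((y - y.mean()) ** 2)
    names = list(candidates)
    results = []
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            X = np.column_stack([np.ones(n)] + [candidates[v] for v in subset])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            sse = np.sum((y - X @ beta) ** 2)
            adj_r2 = 1 - (sse / (n - X.shape[1])) / (sst / (n - 1))
            results.append((adj_r2, subset))
    return sorted(results, reverse=True)    # best subset first

rng = np.random.default_rng(5)
n = 200
data = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
y = 3 * data["x1"] + 2 * data["x2"] + rng.normal(size=n)
best_adj_r2, best_vars = best_subsets(data, y)[0]
```

The same loop could rank subsets by Cp or se instead; the exhaustive search is what makes best subsets different from the three stepwise-style procedures.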
In most cases the final results of these four procedures are very similar. However, there is no guarantee that they will all produce exactly the same final equation. Deciding which estimated regression equation to use remains a topic for discussion; ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for forward selection, backward elimination, and stepwise regression, but it cannot perform best subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses. Currently, the graphic artists at the station receive hourly pay for a significant number of hours during which they are idle. These hours are called standby hours. The operations manager wants to determine which factors most heavily affect the standby hours of graphic artists. Over a period of 26 weeks, he collected data concerning standby hours (y) and four factors that he suspects are related to the excessive number of standby hours the station is currently experiencing:
x1 - the total number of staff present
x2 - remote hours
x3 - Dubner hours
x4 - total labor hours
The data are organized and stored in Standby.xlsx:

Week   Standby   Total Staff   Remote   Dubner   Total Labor
  1      245         338         414      323        2001
  2      177         333         598      340        2030
 ...     ...         ...         ...      ...         ...
 25      261         315         164      223        1839
 26      232         331         270      272        1935
How do you build a multiple regression model with the most appropriate mix of explanatory variables?

Solution
(a) Compute the variance inflation factors to measure the amount of collinearity among the explanatory variables. (Reminder: VIF_j = 1 / (1 - r_j²).)
This is always a good starting point for any multiple regression analysis. It involves running four regressions, one regression for each explanatory variable against the other x variables. The following table summarizes the results:
                     Total Staff      Remote        Dubner     Total Labor
                    and all other   and all other  and all other  and all other
                          X               X             X             X
Multiple R              0.6437        0.4349        0.5610        0.7070
R Square                0.4143        0.1891        0.3147        0.4998
Adjusted R Square       0.3345        0.0786        0.2213        0.4316
Standard Error         16.4715      124.9392       57.5525      114.4118
Observations                26            26            26            26
VIF                     1.7074        1.2333        1.4592        1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to a low of 1.2333 for the remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among the set of explanatory variables.
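The VIF computation is easy to replicate: regress each explanatory variable on the others and take 1/(1 - R²). The sketch below uses synthetic columns so that it is self-contained; with the workbook you would pass the four staffing-related columns:

```python
import numpy as np

def vifs(columns):
    """VIF_j = 1/(1 - R_j²), where R_j² comes from regressing column j
    on all other columns (with an intercept). Since R² = 1 - SSE/SST,
    this simplifies to SST/SSE."""
    cols = [np.asarray(c, dtype=float) for c in columns]
    n = len(cols[0])
    out = []
    for j, target in enumerate(cols):
        others = [c for i, c in enumerate(cols) if i != j]
        X = np.column_stack([np.ones(n)] + others)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        sse = np.sum((target - X @ beta) ** 2)
        sst = np.sum((target - target.mean()) ** 2)
        out.append(sst / sse)
    return out

# Synthetic illustration: x2 is nearly collinear with x1, x3 is independent
rng = np.random.default_rng(11)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
v = vifs([x1, x2, x3])   # v[0] and v[1] are large, v[2] is near 1
```

Values well above 5, like those for x1 and x2 here, would flag the collinearity that is absent in the standby hours data.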
(b) Run forward selection, backward elimination, and stepwise regression, and compare the results.
StatTools reression output from running the three procedures is shown on the next two pagesA significance level of 005 is used to enter a variable into the model or to delete a variablefrom the model (that is P-value to Enter = P-value to Leave = 005)
The correlations between the response variable and the explanatory variables are
Total Staff Remote Dubner Total Labor
Standby 06050 ndash 00953 ndash 02443 04136
As the computer output shows the forward selection and stepwise regression methods
produce the same results for these data The first variable entered into the model is total staffthe variable that correlates most highly with the response variable standby hours
(r = 06050) The P-value for the t -test of total staff is 00011 (Note StatTools does not show it
in the final output) Because it is less than 005 total staff is included in the regression model
The next step involves selecting a second independent variable for the model The second variable
chosen is one that makes the largest contribution to the model given that the first variable hasbeen selected For this model the second variable is remote hours Because the P-value of 00269
for remote hours is less than 005 remote hours is included in the regression model
After the remote hours variable is entered into the model the stepwise regression proceduredetermines whether total staff is still an important contributing variable or whether it can be
eliminated from the model Because the P-value of 00001 for total staff is less than 005total staff remains in the regression model
The next step involves selecting a third independent variable for the model Because none of
the other variables meets the 005 criterion for entry into the model the stepwise procedureterminates with a model that includes total staff present and the number of remote hours
The backward elimination procedure produces a model that includes all explanatory variables
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1215
- 12 -
Forward Selection
983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141
983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142
983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141
Forward Selection

Summary            Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                   0.6999        0.4899      0.4456               35.3873

ANOVA Table        Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained          2                     27662.5429        13831.2714         11.0450    0.0004
Unexplained        23                    28802.0725        1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%    Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325    -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808       2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606      -0.0173

Step Information   Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Entry Number
Total Staff        0.6050        0.3660      0.3396               38.6206              1
Remote             0.6999        0.4899      0.4456               35.3873              2
Stepwise Regression
Summary            Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                   0.6999        0.4899      0.4456               35.3873

ANOVA Table        Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained          2                     27662.5429        13831.2714         11.0450    0.0004
Unexplained        23                    28802.0725        1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%    Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325    -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808       2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606      -0.0173

Step Information   Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Enter or Exit
Total Staff        0.6050        0.3660      0.3396               38.6206              Enter
Remote             0.6999        0.4899      0.4456               35.3873              Enter
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1315
- 13 -
Backward Elimination

Summary            Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                   0.7894        0.6231      0.5513               31.8350

ANOVA Table        Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained          4                     35181.7937        8795.4484          8.6786     0.0003
Unexplained        21                    21282.8217        1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%    Upper 95%
Constant           -330.8318     110.8954         -2.9833   0.0071    -561.4514    -100.2123
Total Staff        1.2456        0.4121           3.0229    0.0065    0.3887       2.1026
Remote             -0.1184       0.0543           -2.1798   0.0408    -0.2314      -0.0054
Dubner             -0.2971       0.1179           -2.5189   0.0199    -0.5423      -0.0518
Total Labor        0.1305        0.0593           2.2004    0.0391    0.0072       0.2539

Step Information   Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Exit Number
All Variables      0.7894        0.6231      0.5513               31.8350
(c) Which of the two models suggested by the above procedures would you choose based on the
Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes
two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination
procedure suggests the "full" model, with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:

n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28802.0725/1013.4677 - (26 - 4 - 2) = 28.4193 - 20 = 8.4193

For the model suggested by the backward elimination procedure:

n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21282.8217/1013.4677 - (26 - 8 - 2) = 21 - 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of
8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model.
For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5.
Thus, according to the Cp criterion, the model including all four variables is the better model.
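The Cp arithmetic above is easy to script and check. A minimal sketch (the helper name is ours, not from the handout or StatTools):

```python
def c_p(sse_k: float, mse_full: float, n: int, k: int) -> float:
    """Mallows' Cp statistic: SSE(k)/MSE(full) - (n - 2k - 2)."""
    return sse_k / mse_full - (n - 2 * k - 2)

# Model from forward selection / stepwise regression (k = 2):
cp_two = c_p(28802.0725, 1013.4677, n=26, k=2)   # about 8.4193

# Model from backward elimination (k = 4, the full model):
cp_full = c_p(21282.8217, 1013.4677, n=26, k=4)  # 5.0

print(round(cp_two, 4), round(cp_full, 4))
```

Note that for the full model Cp always equals k + 1, because SSE(k)/MSE(full) reduces to n - k - 1.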
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1415
- 14 -
(d) Below are the results from the best subsets regression procedure of all possible regression
models for the standby hours data. Which is the best model?
Model        k + 1    Cp       r2        adj r2     se
X1           2        13.32    0.3660    0.3396     38.62
X2           2        33.21    0.0091    -0.0322    48.28
X3           2        30.39    0.0597    0.0205     47.03
X4           2        24.18    0.1710    0.1365     44.16
X1,X2        3        8.42     0.4899    0.4456     35.39
X1,X3        3        10.65    0.4499    0.4021     36.75
X1,X4        3        14.80    0.3754    0.3211     39.16
X2,X3        3        32.31    0.0612    -0.0205    48.01
X2,X4        3        23.25    0.2238    0.1563     43.65
X3,X4        3        11.82    0.4288    0.3791     37.45
X1,X2,X3     4        7.84     0.5362    0.4729     34.50
X1,X2,X4     4        9.34     0.5092    0.4423     35.49
X1,X3,X4     4        7.75     0.5378    0.4748     34.44
X2,X3,X4     4        12.14    0.4591    0.3853     37.26
X1,X2,X3,X4  5        5.00     0.6231    0.5513     31.84
Because model building requires you to compare models with different numbers of explanatory
variables, the adjusted coefficient of determination (adjusted r2) is more appropriate than r2
(although sometimes it is a matter of preference). The adjusted r2 reaches a maximum value
of 0.5513 when all four explanatory variables are included in the model. Therefore, using this
criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model
with all four explanatory variables has a Cp value close to or below k + 1.
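The adjusted r2 values in the table follow the usual relationship adjusted r2 = 1 - (1 - r2)(n - 1)/(n - k - 1). A quick check against two rows of the table (a sketch; the function name is ours):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted coefficient of determination for k explanatory variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Full model row of the best subsets table: r2 = 0.6231, n = 26, k = 4
print(round(adjusted_r2(0.6231, 26, 4), 4))  # 0.5513

# One-variable model X1: r2 = 0.3660, k = 1
print(round(adjusted_r2(0.3660, 26, 1), 4))  # 0.3396
```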
Note: Although it was not the case here, the Cp statistic often provides several alternative
models for you to evaluate in greater depth. Moreover, the best model or models using the Cp
criterion might differ from the model selected using the adjusted r2 and/or the models selected
using the three procedures discussed in (a) through (c).
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables.
Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the
total labor hours reveals apparent patterns. In addition, a plot of the residuals versus the
predicted values of y does not show any patterns or evidence of unequal variance.
The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54).
The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1515
- 15 -
[Residual plots: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot,
Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted standby hours),
Histogram of Residuals, Time Series Plot of Residuals (residuals by week)]
- 9 -
Variable Selection Procedures
Model building is the process of developing an estimated regression equation that describes the
relationship between a response variable and one or more explanatory variables.
The major issues in model building are finding the proper functional form of the relationship and
selecting the explanatory variables to be included in the model.
Many statistical packages provide some assistance by including automatic model-building options.
These options estimate a series of regression models by successively adding or deleting variables
according to prescribed rules. These rules can vary from package to package, but usually the t test
for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to
determine whether variables are added or deleted. The levels of significance α1 and α2 for
determining whether an explanatory variable should be entered into the model or removed from
the model are typically referred to as P-value to Enter and P-value to Leave.
Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.
The four most common types of model-building procedures that statistical packages implement are
forward selection, backward elimination, stepwise regression, and best subsets regression.
Today many businesses use these variable selection procedures as part of the research technique
called data mining, which tries to identify significant statistical relationships in very large
data sets that contain an extremely large number of variables.
The forward selection procedure begins with no explanatory variables in the model and successively
adds variables one at a time until no remaining variables make a significant contribution.
The forward selection procedure does not permit a variable to be removed from the model once it
has been entered. The procedure stops if the P-value for each of the explanatory variables not in
the model is greater than the prescribed P-value to Enter.
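The logic just described can be sketched in code. This is an illustration only, not the StatTools implementation: it ranks candidates by their partial F statistic and uses an F-to-enter cutoff (roughly the 0.05 level for moderate residual degrees of freedom) instead of a P-value to Enter, so that it needs only NumPy. The function names and the synthetic data are ours:

```python
import numpy as np

def sse(X, y):
    """Sum of squared errors of an OLS fit (X already includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def forward_selection(X, y, f_to_enter=4.0):
    """Greedy forward selection: at each step the candidate with the largest
    partial F statistic enters, provided that F exceeds f_to_enter."""
    n, m = X.shape
    selected = []
    current_sse = sse(np.ones((n, 1)), y)  # start from the intercept-only model
    while len(selected) < m:
        best_j, best_f, best_sse = None, -np.inf, None
        for j in range(m):
            if j in selected:
                continue
            cols = np.column_stack([np.ones(n), X[:, selected + [j]]])
            sse_j = sse(cols, y)
            df_resid = n - len(selected) - 2  # n minus slopes minus intercept
            f_stat = (current_sse - sse_j) / (sse_j / df_resid)
            if f_stat > best_f:
                best_j, best_f, best_sse = j, f_stat, sse_j
        if best_f < f_to_enter:
            break  # no remaining variable makes a significant contribution
        selected.append(best_j)
        current_sse = best_sse
    return selected

# Demo on synthetic data in which y depends only on columns 0 and 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=80)
print(forward_selection(X, y))  # column 0 (strongest signal) enters first; column 2 should also enter
```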
The backward elimination procedure begins with a model that includes all potential
explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value
to the prescribed P-value to Leave. The backward elimination procedure does not permit a
variable to be reentered once it has been removed. The procedure stops when none of the
explanatory variables in the model have a P-value greater than P-value to Leave.
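A matching sketch of backward elimination, under the same assumptions as above (a partial F statistic with an F-to-leave cutoff standing in for the P-value to Leave; names and data are ours, not StatTools'):

```python
import numpy as np

def backward_elimination(X, y, f_to_leave=4.0):
    """Greedy backward elimination: start from the full model and repeatedly
    drop the variable whose partial F statistic is smallest, as long as that
    F is below f_to_leave."""
    n = len(y)

    def sse(cols):
        Z = np.column_stack([np.ones(n), X[:, cols]]) if cols else np.ones((n, 1))
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        return float(resid @ resid)

    selected = list(range(X.shape[1]))
    while selected:
        full_sse = sse(selected)
        df_resid = n - len(selected) - 1
        # Partial F for each variable currently in the model.
        f_stats = {
            j: (sse([i for i in selected if i != j]) - full_sse) / (full_sse / df_resid)
            for j in selected
        }
        weakest = min(f_stats, key=f_stats.get)
        if f_stats[weakest] >= f_to_leave:
            break  # every remaining variable is significant
        selected.remove(weakest)
    return selected

# Demo: only columns 0 and 2 carry signal, so columns 1 and 3 are candidates to drop.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=80)
print(backward_elimination(X, y))  # columns 0 and 2 should survive elimination
```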
The stepwise regression procedure is much like a forward procedure, except that it also considers
possible deletions along the way. Because of the nature of the stepwise regression procedure,
an explanatory variable can enter the model at one step, be removed at a subsequent step,
and then enter the model at a later step. The procedure stops when no explanatory variables can
be removed from or entered into the model.
The best subsets regression procedure works by trying possible subsets from the list of possible
explanatory variables. This procedure does not actually compute all possible regressions;
there are ways to exclude models known to be worse than some already examined models.
Typical computer output reports results for a collection of "best" models: usually the two best
one-variable models, the two best two-variable models, the two best three-variable models, and so on.
The user can then select the best model based on such measures as Cp, r2, adjusted r2, and se.
- 10 -
In most cases the final results of these four procedures are very similar. However, there is no
guarantee that they will all produce exactly the same final equation. Deciding which estimated
regression equation to use remains a topic for discussion. Ultimately, the analyst's judgment
must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for
forward selection, backward elimination, and stepwise regression, but it cannot perform the best
subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses.
Currently the graphic artists at the station receive hourly pay for a significant number of hours
during which they are idle. These hours are called standby hours. The operations manager wants to
determine which factors most heavily affect standby hours of graphic artists. Over a period of 26
weeks he collected data concerning standby hours (y) and four factors that he suspects are related
to the excessive number of standby hours the station is currently experiencing:

x1 - the total number of staff present
x2 - remote hours
x3 - Dubner hours
x4 - total labor hours

The data are organized and stored in Standby.xlsx
Week   Standby   Total Staff   Remote   Dubner   Total Labor
1      245       338           414      323      2001
2      177       333           598      340      2030
...    ...       ...           ...      ...      ...
25     261       315           164      223      1839
26     232       331           270      272      1935
How to build a multiple regression model with the most appropriate mix of explanatory variables?

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the
explanatory variables. (Reminder: VIFj = 1/(1 - rj2), where rj2 is the coefficient of
determination from regressing xj on the other explanatory variables.)

This is always a good starting point for any multiple regression analysis. It involves running
four regressions, one for each explanatory variable against the other x variables.
The following table summarizes the results.
- 11 -
                    Total Staff       Remote            Dubner            Total Labor
                    and all other X   and all other X   and all other X   and all other X
Multiple R          0.6437            0.4349            0.5610            0.7070
R Square            0.4143            0.1891            0.3147            0.4998
Adjusted R Square   0.3345            0.0786            0.2213            0.4316
Standard Error      16.4715           124.9392          57.5525           114.4118
Observations        26                26                26                26
VIF                 1.7074            1.2333            1.4592            1.9993

All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to
a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be
less than 5, there is little evidence of collinearity among the set of explanatory variables.
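The four auxiliary regressions can also be scripted. A NumPy-based sketch of the VIF computation (the function name is ours; it mirrors the definition in the reminder above):

```python
import numpy as np

def vif(X):
    """VIF_j = 1/(1 - r_j^2), where r_j^2 is the R-square from regressing
    column j of X on all remaining columns (with an intercept)."""
    n, m = X.shape
    values = []
    for j in range(m):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        total_ss = (target - target.mean()) @ (target - target.mean())
        r2 = 1.0 - (resid @ resid) / total_ss
        values.append(1.0 / (1.0 - r2))
    return values

# Check against the table: for total staff, r^2 = 0.4143, so
print(round(1 / (1 - 0.4143), 4))  # 1.7074
```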
(b) Run forward selection, backward elimination, and stepwise regression and compare the results.

StatTools regression output from running the three procedures is shown in the Forward Selection,
Stepwise Regression, and Backward Elimination tables. A significance level of 0.05 is used to
enter a variable into the model or to delete a variable from the model (that is,
P-value to Enter = P-value to Leave = 0.05).

The correlations between the response variable and the explanatory variables are:

             Total Staff   Remote    Dubner    Total Labor
Standby      0.6050        -0.0953   -0.2443   0.4136

As the computer output shows, the forward selection and stepwise regression methods
produce the same results for these data. The first variable entered into the model is total staff,
the variable that correlates most highly with the response variable, standby hours
(r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it
in the final output.) Because it is less than 0.05, total staff is included in the regression model.
The next step involves selecting a second independent variable for the model. The second variable
chosen is the one that makes the largest contribution to the model, given that the first variable has
been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269
for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure
determines whether total staff is still an important contributing variable or whether it can be
eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05,
total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of
the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure
terminates with a model that includes total staff present and the number of remote hours.
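Substituting the coefficients from the forward selection/stepwise output gives the fitted equation: predicted standby hours = -330.6748 + 1.7649(total staff) - 0.1390(remote hours). A quick illustration using week 1 of the data set (a sketch; the function name is ours):

```python
def predicted_standby(total_staff: float, remote: float) -> float:
    """Fitted two-variable model from the forward selection / stepwise output."""
    return -330.6748 + 1.7649 * total_staff - 0.1390 * remote

# Week 1 of the data set: total staff = 338, remote hours = 414, actual standby = 245
pred = predicted_standby(338, 414)
print(round(pred, 2))  # 208.32, so the week-1 residual is about 36.7 hours
```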
The backward elimination procedure produces a model that includes all explanatory variables.
983108983157983138983150983141983154 983085983088983086983090983097983095983089 983088983086983089983089983095983097 983085983090983086983093983089983096983097 983088983086983088983089983097983097 983085983088983086983093983092983090983091 983085983088983086983088983093983089983096
983124983151983156983137983148 983116983137983138983151983154 983088983086983089983091983088983093 983088983086983088983093983097983091 983090983086983090983088983088983092 983088983086983088983091983097983089 983088983086983088983088983095983090 983088983086983090983093983091983097
983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141
983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983160983145983156
983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154
983105983148983148 983126983137983154983145983137983138983148983141983155 983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088
(c) Which of the two models suggested by the above procedures would you choose based on the
pC selection criterion
The model suggested by the forward selection and stepwise regression procedures includes
two explanatory variables total staff ( 1 x ) and remote hours ( 2 x ) The backward elimination
procedure suggests the ldquofullrdquo model ndash with all explanatory variables included
For the model suggested by the forward selection and stepwise regression procedures
n = 26 k = 2)(k SSE = 288020725
MSE (full) = 10134677
4193820419328)2426(46771013
072528802)22(
)full(
)(=minus=minusminusminus=minusminusminus= k n
MSE
k SSE C p
For the model suggested by the backward elimination proceduren = 26 k = 4
)(k SSE = SSE (full) = 212828217
MSE (full) = 10134677
51621)2826(46771013
821721282)22()full(
)(=minus=minusminusminus=minusminusminus= k n MSE
k SSE C p
The model chosen by the forward selection and stepwise regression procedures has a pC value of
84193 which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model
For the model chosen by the backward elimination procedure k + 1 = 4 + 1 = 5 and pC = 5
Thus according to the pC criterion the model including all four variables is the better model
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1415
- 14 -
(d) Below are the results from the best subsets regression procedure of all possible regression
models for the standby hours data Which is the best model
Model k + 1 pC 2r 2
adjr es
X1 2 1332 03660 03396 3862
X2 2 3321 00091 ndash 00322 4828X3 2 3039 00597 00205 4703
X4 2 2418 01710 01365 4416
X1X2 3 842 04899 04456 3539
X1X3 3 1065 04499 04021 3675
X1X4 3 1480 03754 03211 3916
X2X3 3 3231 00612 ndash 00205 4801
X2X4 3 2325 02238 01563 4365
X3X4 3 1182 04288 03791 3745
X1X2X3 4 784 05362 04729 3450
X1X2X4 4 934 05092 04423 3549
X1X3X4 4 775 05378 04748 3444X2X3X4 4 1214 04591 03853 3726
X1X2X3X4 5 500 06231 05513 3184
Because model building requires you to compare models with different numbers of explanatory
variables the adjusted coefficient of determination2
adjr is more appropriate than 2r
(although sometimes it is a matter of preference) The adjusted 2r reaches a maximum valueof 05513 when all four explanatory variables are included in the model Therefore using this
criterion the best model is the model with all four explanatory variables
The same conclusion is reached when using the pC selection criterion because only the modelwith all four explanatory variables considered has a pC value close to or below k + 1
Note Although it was not the case here the pC statistic often provides several alternative
models for you to evaluate in greater depth Moreover the best model or models using the pC
criterion might differ from the model selected using the adjusted 2r andor the models selected
using the three procedures discussed in (a) through (c)
(e) Perform a residual analysis to evaluate the regression assumptions for the best model
The best model turned out to be the model containing all four explanatory variablesOn the next page are the plots for the residual analysis of this model
None of the residual plots versus the total staff the remote hours the Dubner hours and thetotal labor hours reveal apparent patterns In addition a plot of the residuals versus the
predicted values of y does not show any patterns or evidence of unequal variance
The histogram of the residuals indicates only moderate departure from normality (skewness = 054)The plot of the residuals versus time shows no indication of autocorrelation in the residuals
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1515
- 15 -
983085983094983088
983085983089983088
983092983088
983090983096983088 983091983088983088 983091983090983088 983091983092983088 983091983094983088 983091983096983088 983122 983141 983155 983145 983140 983157 983137 983148 983155
983124983151983156983137983148 983123983156983137983142983142
983124983151983156983137983148 983123983156983137983142983142 983122983141983155983145983140983157983137983148 983120983148983151983156
983085983094983088
983085983089983088
983092983088
983089983093983088 983091983093983088 983093983093983088 983122 983141 983155 983145 983140 983157 983137 983148 983155
983122983141983149983151983156983141
983122983141983149983151983156983141 983122983141983155983145983140983157983137983148 983120983148983151983156
983085983094983088
983085983089983088
983092983088
983090983088983088 983091983088983088 983092983088983088 983093983088983088 983122 983141 983155 983145 983140 983157
983137 983148 983155
983108983157983138983150983141983154
983108983157983138983150983141983154 983122983141983155983145983140983157983137983148 983120983148983151983156
983085983094983088
983085983089983088
983092983088
983089983094983088983088 983089983096983088983088 983090983088983088983088 983090983090983088983088 983122 983141 983155 983145 983140 983157
983137 983148 983155
983124983151983156983137983148 983116983137983138983151983154
983124983151983156983137983148 983116983137983138983151983154 983122983141983155983145983140983157983137983148 983120983148983151983156
983085983094983088
983085983089983088
983092983088
983089983088983088 983089983093983088 983090983088983088 983090983093983088 983122 983141 983155 983145 983140 983157 983137 983148 983155
983120983154983141983140983145983139983156983141983140 983123983156983137983150983140983138983161
983122983141983155983145983140983157983137983148983155 983158983155 983110983145983156
983088
983090
983092
983094
983096
983089983088
983085 983092 983090 983086
983096 983091
983085 983090 983092 983086
983091 983092
983085 983093 983086
983096 983093
983089 983090 983086
983094 983091
983091 983089 983086
983089 983090
983092 983097 983086
983094 983089
983110 983154 983141 983153 983157 983141 983150 983139 983161
983112983145983155983156983151983143983154983137983149 983151983142 983122983141983155983145983140983157983137983148983155
983085983094983088
983085983089983088
983092983088
983089 983090 983091 983092 983093 983094 983095 983096 983097 983089983088 983089983089 983089983090 983089983091 983089983092 983089983093 983089983094 983089983095 983089983096 983089983097 983090983088 983090983089 983090983090 983090983091 983090983092 983090983093 983090983094 983122 983141 983155 983145 983140 983157 983137 983148 983155
983127983141983141983147
983124983145983149983141 983123983141983154983145983141983155 983120983148983151983156 983151983142 983122983141983155983145983140983157983137983148983155
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1015
- 10 -
In most cases the final results of these four procedures are very similar. However, there is no guarantee that they will all produce exactly the same final equation. Deciding which estimated regression equation to use remains a topic for discussion. Ultimately, the analyst's judgment must be applied.
Excel does not come with any variable selection techniques built in. StatTools can be used for forward selection, backward elimination, and stepwise regression, but it cannot perform best subsets regression. SAS and Minitab can perform all four techniques.
Example 2
Standby Hours
The operations manager at WTT-TV station is looking for ways to reduce labor expenses. Currently, the graphic artists at the station receive hourly pay for a significant number of hours during which they are idle. These hours are called standby hours. The operations manager wants to determine which factors most heavily affect the standby hours of graphic artists. Over a period of 26 weeks, he collected data concerning standby hours (y) and four factors that he suspects are related to the excessive number of standby hours the station is currently experiencing:
x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours
The data are organized and stored in Standby.xlsx:

Week   Standby   Total Staff   Remote   Dubner   Total Labor
1      245       338           414      323      2001
2      177       333           598      340      2030
...    ...       ...           ...      ...      ...
25     261       315           164      223      1839
26     232       331           270      272      1935
How can you build a multiple regression model with the most appropriate mix of explanatory variables?
Solution
(a) Compute the variance inflation factors to measure the amount of collinearity among the
explanatory variables. (Reminder: VIF_j = 1 / (1 – r_j^2), where r_j^2 is the coefficient of
determination from regressing x_j on the other explanatory variables.)

This is always a good starting point for any multiple regression analysis. It involves running
four regressions – one regression for each explanatory variable against the other x variables.
The following table summarizes the results.
                      Total Staff       Remote            Dubner            Total Labor
                      and all other X   and all other X   and all other X   and all other X
Multiple R            0.6437            0.4349            0.5610            0.7070
R Square              0.4143            0.1891            0.3147            0.4998
Adjusted R Square     0.3345            0.0786            0.2213            0.4316
Standard Error        16.4715           124.9392          57.5525           114.4118
Observations          26                26                26                26
VIF                   1.7074            1.2333            1.4592            1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to a low of 1.2333 for the remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among the set of explanatory variables.
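The four auxiliary regressions behind this table are mechanical, so they are easy to script. Below is a minimal sketch (not part of the original handout) of how the VIFs could be computed with numpy; the function names are illustrative.

```python
import numpy as np

def r_squared(y, X):
    """R-squared from an OLS fit of y on X (X already includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def vifs(X):
    """VIF_j = 1 / (1 - r_j^2), regressing each column of X on the remaining columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        out.append(1.0 / (1.0 - r_squared(X[:, j], design)))
    return out
```

Applied to the four explanatory-variable columns loaded from Standby.xlsx, `vifs(X)` should reproduce 1.7074, 1.2333, 1.4592, and 1.9993 up to rounding.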
(b) Run forward selection, backward elimination, and stepwise regression and compare the results.

StatTools regression output from running the three procedures is shown below. A significance level of 0.05 is used to enter a variable into the model or to delete a variable from the model (that is, P-value to Enter = P-value to Leave = 0.05).
The correlations between the response variable and the explanatory variables are:

           Total Staff   Remote    Dubner    Total Labor
Standby    0.6050        -0.0953   -0.2443   0.4136
As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it in the final output.) Because it is less than 0.05, total staff is included in the regression model.
The next step involves selecting a second independent variable for the model. The second variable chosen is the one that makes the largest contribution to the model, given that the first variable has been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.
The next step involves selecting a third independent variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes the total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.
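The forward selection logic described above can be sketched in a few lines: at each step, fit every candidate model, add the variable whose partial F statistic is largest, and stop when no remaining variable is significant. This is an illustrative numpy implementation, not the StatTools algorithm itself; to keep it dependency-free, it compares the partial F statistic to a fixed critical value (`f_crit`, roughly the 5% critical value for these degrees of freedom) instead of computing an exact P-value.

```python
import numpy as np

def sse(y, cols):
    """SSE from an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + cols) if cols else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_selection(y, X, f_crit=4.3):
    """Greedy forward selection via the partial F test for one extra variable:
    F = (SSE(reduced) - SSE(full)) / MSE(full)."""
    n, k = X.shape
    chosen = []
    while len(chosen) < k:
        sse_reduced = sse(y, [X[:, j] for j in chosen])
        best_j, best_sse = None, None
        for j in range(k):
            if j not in chosen:
                s = sse(y, [X[:, i] for i in chosen] + [X[:, j]])
                if best_sse is None or s < best_sse:
                    best_j, best_sse = j, s
        df = n - (len(chosen) + 2)   # n minus the number of fitted coefficients
        f_stat = (sse_reduced - best_sse) / (best_sse / df)
        if f_stat < f_crit:
            break
        chosen.append(best_j)
    return chosen
```

Backward elimination works the same way in reverse: start from the full model and repeatedly drop the variable with the smallest partial F until every remaining variable is significant.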
Forward Selection

Summary             Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                    0.6999       0.4899     0.4456              35.3873

ANOVA Table         Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained           2                    27662.5429       13831.2714       11.0450   0.0004
Unexplained         23                   28802.0725       1252.2640

Regression Table    Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff         1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote              -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information    Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff         0.6050       0.3660     0.3396              38.6206             1
Remote              0.6999       0.4899     0.4456              35.3873             2
Stepwise Regression

Summary             Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                    0.6999       0.4899     0.4456              35.3873

ANOVA Table         Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained           2                    27662.5429       13831.2714       11.0450   0.0004
Unexplained         23                   28802.0725       1252.2640

Regression Table    Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff         1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote              -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information    Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff         0.6050       0.3660     0.3396              38.6206             Enter
Remote              0.6999       0.4899     0.4456              35.3873             Enter
Backward Elimination

Summary             Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                    0.7894       0.6231     0.5513              31.8350

ANOVA Table         Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained           4                    35181.7937       8795.4484        8.6786    0.0003
Unexplained         21                   21282.8217       1013.4677

Regression Table    Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff         1.2456        0.4121           3.0229    0.0065    0.3887      2.1026
Remote              -0.1184       0.0543           -2.1798   0.0408    -0.2314     -0.0054
Dubner              -0.2971       0.1179           -2.5189   0.0199    -0.5423     -0.0518
Total Labor         0.1305        0.0593           2.2004    0.0391    0.0072      0.2539

Step Information    Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables       0.7894       0.6231     0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model – with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28,802.0725, MSE(full) = 1,013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28,802.0725/1,013.4677 - (26 - 2(2) - 2) = 28.4193 - 20 = 8.4193
For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21,282.8217, MSE(full) = 1,013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21,282.8217/1,013.4677 - (26 - 2(4) - 2) = 21 - 16 = 5
The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.
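The two calculations above can be checked with a small helper. This is an illustrative sketch; the numbers are taken from the StatTools output shown earlier.

```python
def cp_statistic(sse_k, mse_full, n, k):
    """Mallows' Cp in the form used above: Cp = SSE(k)/MSE(full) - (n - 2k - 2)."""
    return sse_k / mse_full - (n - 2 * k - 2)

# Forward selection / stepwise model (total staff, remote hours):
cp_two = cp_statistic(28802.0725, 1013.4677, n=26, k=2)    # about 8.42
# Backward elimination model (all four variables):
cp_full = cp_statistic(21282.8217, 1013.4677, n=26, k=4)   # about 5.0
```

A model with little bias should have Cp close to or below k + 1, which is why the four-variable model (Cp = 5 = k + 1) is preferred here.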
(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?
Model        k + 1   Cp      r^2      adj r^2   s_e
X1           2       13.32   0.3660   0.3396    38.62
X2           2       33.21   0.0091   -0.0322   48.28
X3           2       30.39   0.0597   0.0205    47.03
X4           2       24.18   0.1710   0.1365    44.16
X1X2         3       8.42    0.4899   0.4456    35.39
X1X3         3       10.65   0.4499   0.4021    36.75
X1X4         3       14.80   0.3754   0.3211    39.16
X2X3         3       32.31   0.0612   -0.0205   48.01
X2X4         3       23.25   0.2238   0.1563    43.65
X3X4         3       11.82   0.4288   0.3791    37.45
X1X2X3       4       7.84    0.5362   0.4729    34.50
X1X2X4       4       9.34    0.5092   0.4423    35.49
X1X3X4       4       7.75    0.5378   0.4748    34.44
X2X3X4       4       12.14   0.4591   0.3853    37.26
X1X2X3X4     5       5.00    0.6231   0.5513    31.84
Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination, adjusted r^2, is more appropriate than r^2 (although sometimes it is a matter of preference). The adjusted r^2 reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.
The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models according to the Cp criterion might differ from the model selected using the adjusted r^2 and/or the models selected using the three procedures discussed in (a) through (c).
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots – versus the total staff, the remote hours, the Dubner hours, and the total labor hours – reveals apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance. The histogram of the residuals indicates only a moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
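As a quick numerical check of the normality claim, the residual skewness can be computed directly. A sketch, assuming y and the four predictor columns X have been loaded as arrays; note that Excel/StatTools report a bias-corrected skewness, so this simple moment estimate can differ slightly from the 0.54 quoted above.

```python
import numpy as np

def residual_skewness(y, X):
    """Fit the full model and return the (population) skewness of the residuals:
    the third central moment of the residuals divided by the cube of their
    standard deviation."""
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ beta
    m = r.mean()
    return float(((r - m) ** 3).mean() / (((r - m) ** 2).mean() ** 1.5))
```

Values near 0 indicate symmetric residuals; values beyond about ±1 would be a stronger warning sign than the moderate 0.54 seen here.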
[Figures: residual plots versus Total Staff, Remote, Dubner, and Total Labor; residuals versus predicted standby hours (Residuals vs Fit); a histogram of the residuals; and a time series plot of the residuals by week.]
- 11 -
                     Total Staff      Remote           Dubner           Total Labor
                     and all other X  and all other X  and all other X  and all other X
Multiple R                0.6437        0.4349           0.5610            0.7070
R Square                  0.4143        0.1891           0.3147            0.4998
Adjusted R Square         0.3345        0.0786           0.2213            0.4316
Standard Error           16.4715      124.9392          57.5525          114.4118
Observations                  26            26               26                26
VIF                       1.7074        1.2333           1.4592            1.9993
All the VIF values are relatively small, ranging from a high of 1.9993 for total labor hours to a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among this set of explanatory variables.
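Each VIF in the table comes from regressing one explanatory variable on all the others, exactly as the column headings indicate. A minimal sketch of that computation in plain NumPy (not StatTools; the helper name vif is ours):

```python
import numpy as np

def vif(X):
    """VIF for each column of X (observations in rows, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-square from regressing
    column j on an intercept plus the remaining columns.
    """
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # intercept + other X's
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        sst = (y - y.mean()) @ (y - y.mean())
        r2 = 1.0 - (resid @ resid) / sst
        vifs.append(1.0 / (1.0 - r2))
    return vifs
```

Uncorrelated columns give VIFs near 1; values grow as the columns become more collinear.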
(b) Run forward selection, backward elimination, and stepwise regression, and compare the results.

The StatTools regression output from running the three procedures is shown on the next two pages. A significance level of 0.05 is used to enter a variable into the model or to delete a variable from the model (that is, P-value to Enter = P-value to Leave = 0.05).
The correlations between the response variable and the explanatory variables are:

            Total Staff    Remote     Dubner    Total Labor
Standby        0.6050     -0.0953    -0.2443       0.4136
As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t-test of total staff at this step is 0.0011 (note: StatTools does not show it in the final output). Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second explanatory variable for the model. The second variable chosen is the one that makes the largest contribution to the model, given that the first variable has already been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.

The next step involves selecting a third explanatory variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes the total staff present and the number of remote hours.
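The entry logic described above can be sketched in a few lines. This is not StatTools: as a stand-in for the P-value test, it enters the candidate with the largest |t|-statistic while that |t| exceeds a critical value (roughly 2.07 for the ~22 residual degrees of freedom in this data set; in practice you would compute the exact p-value from the t distribution).

```python
import numpy as np

def abs_t_of_candidate(y, X_in, x_new):
    """|t|-statistic of x_new's coefficient when added to the current model."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + X_in + [x_new])   # intercept + current + candidate
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    mse = (resid @ resid) / (n - A.shape[1])
    cov = mse * np.linalg.inv(A.T @ A)                   # covariance of the estimates
    return abs(beta[-1]) / np.sqrt(cov[-1, -1])

def forward_select(y, candidates, t_crit=2.07):
    """Forward selection. candidates: dict name -> 1-D array. Returns names in entry order."""
    chosen, X_in = [], []
    while True:
        ts = {name: abs_t_of_candidate(y, X_in, x)
              for name, x in candidates.items() if name not in chosen}
        if not ts:
            break
        best = max(ts, key=ts.get)
        if ts[best] < t_crit:        # no remaining candidate is significant
            break
        chosen.append(best)
        X_in.append(candidates[best])
    return chosen
```

Stepwise regression adds one extra step per iteration: after each entry, re-test the variables already in the model and drop any whose |t| has fallen below the leave threshold.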
The backward elimination procedure produces a model that includes all explanatory variables
- 12 -
Forward Selection

Summary            Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                       0.6999      0.4899               0.4456              35.3873

ANOVA Table        Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained                           2        27662.5429         13831.2714    11.0450     0.0004
Unexplained                        23        28802.0725          1252.2640

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant              -330.6748          116.4802    -2.8389     0.0093    -571.6325     -89.7171
Total Staff              1.7649            0.3790     4.6562     0.0001       0.9808       2.5490
Remote                  -0.1390            0.0588    -2.3635     0.0269      -0.2606      -0.0173

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Entry Number
Total Staff             0.6050      0.3660               0.3396              38.6206               1
Remote                  0.6999      0.4899               0.4456              35.3873               2

Stepwise Regression

Summary            Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                       0.6999      0.4899               0.4456              35.3873

ANOVA Table        Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained                           2        27662.5429         13831.2714    11.0450     0.0004
Unexplained                        23        28802.0725          1252.2640

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant              -330.6748          116.4802    -2.8389     0.0093    -571.6325     -89.7171
Total Staff              1.7649            0.3790     4.6562     0.0001       0.9808       2.5490
Remote                  -0.1390            0.0588    -2.3635     0.0269      -0.2606      -0.0173

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Enter or Exit
Total Staff             0.6050      0.3660               0.3396              38.6206            Enter
Remote                  0.6999      0.4899               0.4456              35.3873            Enter
- 13 -
Backward Elimination

Summary            Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                       0.7894      0.6231               0.5513              31.8350

ANOVA Table        Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained                           4        35181.7937          8795.4484     8.6786     0.0003
Unexplained                        21        21282.8217          1013.4677

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant              -330.8318          110.8954    -2.9833     0.0071    -561.4514    -100.2123
Total Staff              1.2456            0.4121     3.0229     0.0065       0.3887       2.1026
Remote                  -0.1184            0.0543    -2.1798     0.0408      -0.2314      -0.0054
Dubner                  -0.2971            0.1179    -2.5189     0.0199      -0.5423      -0.0518
Total Labor              0.1305            0.0593     2.2004     0.0391       0.0072       0.2539

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Exit Number
All Variables           0.7894      0.6231               0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose, based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model, with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:

n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 28802.0725/1013.4677 - (26 - 4 - 2) = 28.4193 - 20 = 8.4193
For the model suggested by the backward elimination procedure:

n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2) = 21282.8217/1013.4677 - (26 - 8 - 2) = 21.0000 - 16 = 5.0000
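The Cp arithmetic can be packaged as a small helper (a sketch; the function name is ours, the formula and inputs come from the output tables above):

```python
def cp(sse_k, mse_full, n, k):
    """Mallows' Cp = SSE(k)/MSE(full) - (n - 2k - 2),
    where k is the number of explanatory variables in the candidate model."""
    return sse_k / mse_full - (n - 2 * k - 2)

# Forward/stepwise model (k = 2): cp(28802.0725, 1013.4677, 26, 2) -> about 8.42
# Backward model (k = 4):         cp(21282.8217, 1013.4677, 26, 4) -> about 5.00
```

A model with little bias should have Cp close to k + 1; values far above k + 1 suggest important variables have been left out.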
The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model.

For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5.

Thus, according to the Cp criterion, the model including all four variables is the better model.
- 14 -
(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model        k + 1      Cp       r2     adj r2      se
X1               2    13.32   0.3660    0.3396    38.62
X2               2    33.21   0.0091   -0.0322    48.28
X3               2    30.39   0.0597    0.0205    47.03
X4               2    24.18   0.1710    0.1365    44.16
X1X2             3     8.42   0.4899    0.4456    35.39
X1X3             3    10.65   0.4499    0.4021    36.75
X1X4             3    14.80   0.3754    0.3211    39.16
X2X3             3    32.31   0.0612   -0.0205    48.01
X2X4             3    23.25   0.2238    0.1563    43.65
X3X4             3    11.82   0.4288    0.3791    37.45
X1X2X3           4     7.84   0.5362    0.4729    34.50
X1X2X4           4     9.34   0.5092    0.4423    35.49
X1X3X4           4     7.75   0.5378    0.4748    34.44
X2X3X4           4    12.14   0.4591    0.3853    37.26
X1X2X3X4         5     5.00   0.6231    0.5513    31.84
Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination, adj r2, is more appropriate than r2 (although sometimes it is a matter of preference). The adjusted r2 reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.
The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models according to the Cp criterion might differ from the model selected using the adjusted r2 and/or the models selected using the three procedures discussed in (a) through (c).
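The adjusted r2 column in the table follows directly from r2, n, and k (a sketch; the function name is ours):

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination:
    adj r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1),
    which penalizes r2 for the number of explanatory variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Full model row of the table: adjusted_r2(0.6231, 26, 4) -> about 0.5513
# X1-only row:                 adjusted_r2(0.3660, 26, 1) -> about 0.3396
```

Unlike r2, which can only increase as variables are added, adj r2 falls when a new variable contributes less than its degrees-of-freedom cost, which is why it can be compared across models of different sizes.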
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. On the next page are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the total labor hours reveal apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
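The skewness figure cited above can be computed directly from the residuals. A sketch using the moment-based g1 formula (StatTools' reported skewness may use a slightly different small-sample adjustment):

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness g1 = m3 / m2**1.5:
    0 for symmetric data, positive for a right tail, negative for a left tail."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment
    m3 = np.mean(d ** 3)   # third central moment
    return m3 / m2 ** 1.5
```

Values between roughly -1 and 1 are commonly read as only a moderate departure from normality, which is how the 0.54 above is interpreted.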
- 15 -
[Residual diagnostics for the four-variable model: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals (residuals by week).]
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1215
- 12 -
Forward Selection
983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141
983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142
983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141
983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091
983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141
983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155
983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092
983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088
983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140
983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093
983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154
983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089
983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088
983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091
983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141
983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983154983161
983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154
983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983089
983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983090
Stepwise Regression
983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141
983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142
983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141
983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091
983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141
983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155
983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092
983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088
983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140
983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093
983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154
983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089
983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088
983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091
983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141
983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983141983154 983151983154
983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983109983160983145983156
983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983109983150983156983141983154
983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983109983150983156983141983154
8172019 14_Building_Regression_Models_Part1pdf
httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1315
- 13 -
Backward Elimination983117983157983148983156983145983152983148983141
983122983085983123983153983157983137983154983141983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142
Summary             Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                      0.7894      0.6231         0.5513              31.8350

ANOVA Table       Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
  Explained                4             35181.7937        8795.4484      8.6786    0.0003
  Unexplained             21             21282.8217        1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   95% Confidence Interval
                                                                        Lower        Upper
  Constant          -330.8318      110.8954       -2.9833   0.0071   -561.4514   -100.2123
  Total Staff          1.2456        0.4121        3.0229   0.0065      0.3887      2.1026
  Remote              -0.1184        0.0543       -2.1798   0.0408     -0.2314     -0.0054
  Dubner              -0.2971        0.1179       -2.5189   0.0199     -0.5423     -0.0518
  Total Labor          0.1305        0.0593        2.2004   0.0391      0.0072      0.2539

Step Information    Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
  All Variables       0.7894      0.6231         0.5513              31.8350
(c) Which of the two models suggested by the above procedures would you choose based on the
Cp selection criterion?
The model suggested by the forward selection and stepwise regression procedures includes
two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination
procedure suggests the "full" model, with all explanatory variables included.
For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2)
   = 28802.0725/1013.4677 - (26 - 2(2) - 2) = 28.4193 - 20 = 8.4193
For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2)
   = 21282.8217/1013.4677 - (26 - 2(4) - 2) = 21.0000 - 16 = 5.0000
The model chosen by the forward selection and stepwise regression procedures has a Cp value of
8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model.
For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5.
Thus, according to the Cp criterion, the model including all four variables is the better model.
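The two computations above are easy to check with a short script. The sketch below (Python; the helper name mallows_cp is ours) plugs the SSE and MSE values from the regression output into the Cp formula:

```python
def mallows_cp(sse_k, mse_full, n, k):
    """Mallows' Cp = SSE(k)/MSE(full) - (n - 2(k + 1))."""
    return sse_k / mse_full - (n - 2 * (k + 1))

# Forward selection / stepwise model (total staff, remote hours): k = 2
print(round(mallows_cp(28802.0725, 1013.4677, n=26, k=2), 4))  # 8.4193

# Backward elimination model (all four explanatory variables): k = 4
print(round(mallows_cp(21282.8217, 1013.4677, n=26, k=4), 4))  # 5.0
```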
- 14 -
(d) Below are the results from the best subsets regression procedure of all possible regression
models for the standby hours data. Which is the best model?
Model           k + 1     Cp       r²      Adjusted r²     se
X1                2      13.32   0.3660      0.3396       38.62
X2                2      33.21   0.0091     -0.0322       48.28
X3                2      30.39   0.0597      0.0205       47.03
X4                2      24.18   0.1710      0.1365       44.16
X1 X2             3       8.42   0.4899      0.4456       35.39
X1 X3             3      10.65   0.4499      0.4021       36.75
X1 X4             3      14.80   0.3754      0.3211       39.16
X2 X3             3      32.31   0.0612     -0.0205       48.01
X2 X4             3      23.25   0.2238      0.1563       43.65
X3 X4             3      11.82   0.4288      0.3791       37.45
X1 X2 X3          4       7.84   0.5362      0.4729       34.50
X1 X2 X4          4       9.34   0.5092      0.4423       35.49
X1 X3 X4          4       7.75   0.5378      0.4748       34.44
X2 X3 X4          4      12.14   0.4591      0.3853       37.26
X1 X2 X3 X4       5       5.00   0.6231      0.5513       31.84
Because model building requires you to compare models with different numbers of explanatory
variables, the adjusted coefficient of determination (adjusted r²) is more appropriate than r²
(although sometimes it is a matter of preference). The adjusted r² reaches a maximum value
of 0.5513 when all four explanatory variables are included in the model. Therefore, using this
criterion, the best model is the model with all four explanatory variables.
The same conclusion is reached when using the Cp selection criterion, because only the model
with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative
models for you to evaluate in greater depth. Moreover, the best model or models using the Cp
criterion might differ from the model selected using the adjusted r² and/or the models selected
using the three procedures discussed in (a) through (c).
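Both criteria in the best subsets table can be reproduced from summary statistics alone. The sketch below (Python; the helper names are ours) recomputes the adjusted r² and Cp columns from n, r², and the full-model ANOVA values, using SST = SSR + SSE = 35181.7937 + 21282.8217:

```python
n = 26                          # number of weeks in the sample
mse_full = 1013.4677            # MSE of the full four-variable model
sst = 35181.7937 + 21282.8217   # total sum of squares from the ANOVA table

def adj_r2(r2, n, k):
    """Adjusted r-square: 1 - (1 - r^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def cp_from_r2(r2, n, k):
    """Cp via SSE(k) = (1 - r^2) * SST, then SSE(k)/MSE(full) - (n - 2(k + 1))."""
    return (1 - r2) * sst / mse_full - (n - 2 * (k + 1))

print(round(adj_r2(0.6231, n, 4), 4))      # X1 X2 X3 X4 row: 0.5513
print(round(cp_from_r2(0.4899, n, 2), 2))  # X1 X2 row: 8.42
```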
(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables.
On the next page are the plots for the residual analysis of this model.

None of the residual plots (versus the total staff, the remote hours, the Dubner hours, and
the total labor hours) reveals an apparent pattern. In addition, a plot of the residuals versus
the predicted values of y does not show any patterns or evidence of unequal variance.
The histogram of the residuals indicates only a moderate departure from normality
(skewness = 0.54). The plot of the residuals versus time shows no indication of
autocorrelation in the residuals.
- 15 -
[Residual analysis plots for the four-variable model: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs. Fit (predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals by Week]