BUILDING REGRESSION MODELS – PART 1

Topics Outline

• Partial F Test
• Adjusted r²
• Cp Statistic
• Include/Exclude Decisions
• Variable Selection Procedures

Partial F Test

There are many situations where a set of explanatory variables forms a logical group. It is then common to include all of the variables in the equation or exclude all of them. An example of this is when one of the explanatory variables is categorical with more than two categories. In this case you model it by including dummy variables – one fewer than the number of categories. If you decide that the categorical variable is worth including, you might want to keep all of the dummies. Otherwise, you might decide to exclude all of them.
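
As a quick illustration, here is a minimal pandas sketch of that dummy coding; the data and column names are hypothetical, not from the examples below:

```python
import pandas as pd

# Hypothetical three-category variable; the category names are illustrative only
df = pd.DataFrame({"Style": ["ranch", "colonial", "split", "ranch", "colonial"]})

# drop_first=True yields one fewer dummy than the number of categories;
# the dropped category ("colonial", alphabetically first) becomes the reference level
dummies = pd.get_dummies(df["Style"], prefix="Style", drop_first=True)
print(dummies)   # columns: Style_ranch, Style_split
```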

Consider the following general situation. Suppose you have already estimated a reduced multiple regression model that includes the variables x1 through xj:

y = α + β1x1 + ⋯ + βjxj + ε

Now you are proposing to estimate a larger model, referred to as the full model, which includes xj+1 through xk in addition to the variables x1 through xj:

y = α + β1x1 + ⋯ + βjxj + βj+1xj+1 + ⋯ + βkxk + ε

That is, the full model includes all of the variables from the smaller model, but it also includes k – j extra variables.

The partial F test is used to determine whether the extra variables provide enough extra explanatory power as a group to warrant their inclusion in the equation. In other words, the partial F test tests whether the full model is significantly better than the reduced model. The null and alternative hypotheses can be stated as follows:

H0: βj+1 = ⋯ = βk = 0 (The extra variables have no effect on y.)
Ha: At least one of βj+1, …, βk is not zero. (At least one of the extra variables has an effect on y.)

The test statistic is:

F = [ (SSE(reduced) − SSE(full)) / (number of extra terms) ] / MSE(full)


This test statistic measures how much the sum of squared residuals, SSE, decreases by including the extra variables in the equation. It must decrease by some amount because the sum of squared residuals cannot increase when extra variables are added to an equation. But if it does not decrease sufficiently, the extra variables might not explain enough to justify their inclusion in the equation, and they should probably be excluded.

If the null hypothesis is true, this test statistic has an F distribution with df1 = k − j and df2 = n − k − 1 degrees of freedom. If the corresponding P-value is sufficiently small, you can reject the null hypothesis that the extra variables have no explanatory power.

To perform the partial F test in Excel, run two regressions, one for the reduced model (with explanatory variables x1 through xj) and one for the full model (with explanatory variables x1 through xk), and use the appropriate values from their ANOVA tables to calculate the F test statistic. Then use Excel's FDIST function to calculate the corresponding P-value.
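
If you work outside Excel, the same test is easy to script. Below is a minimal sketch assuming statsmodels and scipy are available; y is the response series and X_reduced/X_full are DataFrames of explanatory variables (hypothetical names, not part of the handout):

```python
import statsmodels.api as sm
from scipy import stats

def partial_f_test(y, X_reduced, X_full):
    """Partial F test: do the extra variables in the full model add explanatory power?"""
    fit_red = sm.OLS(y, sm.add_constant(X_reduced)).fit()
    fit_full = sm.OLS(y, sm.add_constant(X_full)).fit()
    extra = X_full.shape[1] - X_reduced.shape[1]        # k - j extra terms
    f_stat = (fit_red.ssr - fit_full.ssr) / extra / fit_full.mse_resid
    p_value = stats.f.sf(f_stat, extra, int(fit_full.df_resid))   # df2 = n - k - 1
    return f_stat, p_value
```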

Reminder: The ANOVA table for the full equation has the following form.

Source of     Degrees of    Sum of      Mean Squares               F statistic     P-value
Variation     Freedom       Squares     (Variance)
Regression    k             SSR         MSR = SSR / k              F = MSR / MSE   Prob > F
Error         n − k − 1     SSE         MSE = SSE / (n − k − 1)
Total         n − 1         SST
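
The same quantities can be read off a fitted model in code. Here is a self-contained sketch (assuming statsmodels, with toy data standing in for a real data set); note that statsmodels calls the *error* sum of squares .ssr, which clashes with the table's SSR label:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(20, 2)), columns=["x1", "x2"])   # toy data
y = 1.0 + 2.0 * X["x1"] - 0.5 * X["x2"] + rng.normal(size=20)

fit = sm.OLS(y, sm.add_constant(X)).fit()
ssr = fit.ess             # Regression sum of squares (SSR in the table above)
sse = fit.ssr             # Error sum of squares (SSE); note the unfortunate name clash
sst = fit.centered_tss    # Total sum of squares (SST)
msr = ssr / fit.df_model          # MSR = SSR / k
mse = fit.mse_resid               # MSE = SSE / (n - k - 1)
print(fit.fvalue, fit.f_pvalue)   # F = MSR / MSE and Prob > F
```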

Notes:

1. Many users look only at the r² and se values to check whether extra variables are doing a "good job". For example, they might cite that r² went from 0.80 to 0.90, or that se went from 500 to 400, as evidence that extra variables provide a "significantly" better fit. Although these are important indicators, they are not the basis for a formal hypothesis test. The partial F test is the formal test of significance for an extra set of variables.

2. If the partial F test shows that a group of variables is significant, it does not imply that each variable in this group is significant. Some of these variables can have low t-values (and consequently large P-values). Some analysts favor excluding the individual variables that aren't significant, whereas others favor keeping the whole group or excluding the whole group. Either approach is valid. Fortunately, the final model-building results are often nearly the same either way.

3. StatTools performs partial F tests as part of the procedure for building regression models when the option Block is chosen in the Regression Type dropdown list.


Example 1: Heating Oil Consumption

A real estate developer wants to predict the heating oil consumption in single-family houses based on the effect of atmospheric temperature and the amount of attic insulation. Data are collected from a sample of 15 single-family houses. Of the 15 houses selected, houses 1, 4, 6, 7, 8, 10, and 12 are ranch-style houses. The data are organized and stored in Heating_Oil.xlsx.

House   Gallons   Temperature (°F)   Insulation (inches)   Style
1       275.3     40                 3                     1
2       363.8     27                 3                     0
⋮       ⋮         ⋮                  ⋮                     ⋮
14      323.0     38                 3                     0
15      52.5      58                 10                    0

(a) Develop and analyze an appropriate regression model.

The explanatory variables considered are:

x1 – atmospheric temperature
x2 – the amount of attic insulation
x3 – dummy variable = 1 if the style is ranch, 0 otherwise

Assuming that the slope between heating oil consumption and atmospheric temperature x1, and between heating oil consumption and the amount of attic insulation x2, is the same for both styles of houses, the regression model is

y = α + β1x1 + β2x2 + β3x3 + ε

The regression results for this model are:

Regression Statistics
Multiple R          0.9942
R Square            0.9884
Adjusted R Square   0.9853
Standard Error      15.7489
Observations        15

ANOVA
             df   SS            MS           F          Significance F
Regression    3   233406.9094   77802.3031   313.6822   0.0000
Residual     11     2728.3200     248.0291
Total        14   236135.2293

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     592.5401       14.3370           41.3295   0.0000    560.9846    624.0956
Temperature    -5.5251        0.2044          -27.0267   0.0000     -5.9751     -5.0752
Insulation    -21.3761        1.4480          -14.7623   0.0000    -24.5632    -18.1891
Style         -38.9727        8.3584           -4.6627   0.0007    -57.3695    -20.5759


(b) Interpret the regression coefficients.

The regression equation is

ŷ = 592.5401 − 5.5251x1 − 21.3761x2 − 38.9727x3

Predicted Consumption = 592.5401 − 5.5251 Temperature − 21.3761 Insulation − 38.9727 Style

For houses that are ranch style, because x3 = 1, the regression equation reduces to

ŷ = 553.5674 − 5.5251x1 − 21.3761x2

For houses that are not ranch style, because x3 = 0, the regression equation reduces to

ŷ = 592.5401 − 5.5251x1 − 21.3761x2

The regression coefficients are interpreted as follows:

b1 = −5.5251: Holding constant the attic insulation and the house style, for each additional 1°F increase in atmospheric temperature, you estimate that the predicted heating oil consumption decreases by 5.5251 gallons.

b2 = −21.3761: Holding constant the atmospheric temperature and the house style, for each additional 1-inch increase in attic insulation, you estimate that the predicted heating oil consumption decreases by 21.3761 gallons.

b3 = −38.9727: b3 measures the effect on oil consumption of having a ranch-style house (x3 = 1) compared with having a house that is not ranch style (x3 = 0). Thus, with atmospheric temperature and attic insulation held constant, you estimate that the predicted heating oil consumption is 38.9727 gallons less for a ranch-style house than for a house that is not ranch style.
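
As a small usage sketch, the fitted equation can be wrapped in a prediction function; the input values below are made up for illustration:

```python
def predicted_consumption(temperature, insulation, ranch_style):
    """Fitted equation from part (b): predicted gallons of heating oil."""
    return 592.5401 - 5.5251 * temperature - 21.3761 * insulation - 38.9727 * ranch_style

# e.g., a hypothetical ranch-style house at 30°F with 6 inches of insulation
print(round(predicted_consumption(30, 6, 1), 2))   # 259.56
```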

(c) Does each of the three variables make a significant contribution to the regression model?

The three t-test statistics representing the slopes for temperature, insulation, and ranch style are −27.0267, −14.7623, and −4.6627. Each of the corresponding P-values is extremely small (less than 0.001). Thus, each of the three variables makes a significant contribution to the model. In addition, the coefficient of determination indicates that 98.84% of the variation in oil usage is explained by variation in temperature, insulation, and whether the house is ranch style.

(d) Determine whether adding the interaction terms makes a significant contribution to the model.

To evaluate possible interactions between the explanatory variables, three interaction terms are constructed as follows:

x4 = x1x2 (interaction between temperature and insulation)
x5 = x1x3 (interaction between temperature and style)
x6 = x2x3 (interaction between insulation and style)

The regression model is now

y = α + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + ε

The regression results for this model are:


Regression Statistics
Multiple R          0.9966
R Square            0.9931
Adjusted R Square   0.9880
Standard Error      14.2506
Observations        15

ANOVA
             df   SS            MS           F          Significance F
Regression    6   234510.5818   39085.0970   192.4607   0.0000
Residual      8     1624.6475     203.0809
Total        14   236135.2293

                     Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept            642.8867        26.7059         24.0728   0.0000     581.3028    704.4706
Temperature           -6.9263         0.7531         -9.1969   0.0000      -8.6629     -5.1896
Insulation           -27.8825         3.5801         -7.7882   0.0001     -36.1383    -19.6268
Style                -84.6088        29.9956         -2.8207   0.0225    -153.7787    -15.4389
Temp×Insulation        0.1702         0.0886          1.9204   0.0911      -0.0342      0.3746
Temp×Style             0.6596         0.4617          1.4286   0.1910      -0.4051      1.7242
Insulation×Style       4.9870         3.5137          1.4193   0.1936      -3.1156     13.0895

To test whether the three interactions significantly improve the regression model, you use the partial F test. The null and alternative hypotheses are:

H0: β4 = β5 = β6 = 0 (There are no interactions among x1, x2, and x3.)
Ha: At least one of β4, β5, β6 is not zero. (x1 interacts with x2, and/or x1 interacts with x3, and/or x2 interacts with x3.)

From the full regression output (see above): SSE(full) = 1624.6475, MSE(full) = 203.0809.
From the reduced regression output (see part (a)): SSE(reduced) = 2728.3200.

The test statistic is

F = [ (SSE(reduced) − SSE(full)) / (number of extra terms) ] / MSE(full)
  = [ (2728.3200 − 1624.6475) / 3 ] / 203.0809
  = 367.8908 / 203.0809
  = 1.8115

df1 = k − j = 6 − 3 = 3; df2 = n − k − 1 = 15 − 6 − 1 = 8
P-value = FDIST(1.8115, 3, 8) = 0.2230
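
A quick check of this arithmetic in code (a sketch using scipy; the inputs are the SSE and MSE values from the two outputs above):

```python
from scipy import stats

sse_reduced = 2728.3200   # SSE, model without interaction terms
sse_full = 1624.6475      # SSE, model with the three interaction terms
mse_full = 203.0809       # MSE of the full model
extra = 3                 # number of extra (interaction) terms

f_stat = (sse_reduced - sse_full) / extra / mse_full
p_value = stats.f.sf(f_stat, extra, 8)         # df2 = n - k - 1 = 8
print(round(f_stat, 4), round(p_value, 4))     # 1.8115 0.223
```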


Because of the large P-value, you conclude that the interactions do not make a significant contribution to the model, given that the model already includes temperature x1, insulation x2, and whether the house is ranch style x3. Therefore, the multiple regression model using x1, x2, and x3 but no interaction terms is the better model.

If you rejected this null hypothesis, you would then test the contribution of each interaction separately in order to determine which interaction terms to include in the model.

Adjusted r²

Adding new explanatory variables will always keep the r² value the same or increase it; it can never decrease it. In general, adding explanatory variables to the model causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE. Because SSR = SST − SSE, when SSE becomes smaller, SSR becomes larger, causing r² = SSR/SST to increase. Therefore, if a variable is added to the model, r² usually becomes larger even if the variable added is not statistically significant. This can lead to "fishing expeditions" where you keep adding variables to the model, some of which have no conceptual relationship to the response variable, just to inflate the r² value.

To avoid overestimating the impact of adding an explanatory variable on the amount of variability explained by the estimated regression equation, many analysts prefer adjusting r² for the number of explanatory variables. The adjusted r² is defined as

r²adj = 1 − (1 − r²)(n − 1)/(n − k − 1)

The adjusted r² imposes a "penalty" for each new term that is added to the model, in an attempt to make models of different sizes (numbers of explanatory variables) comparable. It can decrease when unnecessary explanatory variables are added to the regression model. Therefore, it serves as an index that you can monitor. If you add variables and the adjusted r² decreases, the extra variables are essentially not pulling their weight and should probably be omitted.
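
In code, the adjustment is a one-liner (a plain Python sketch, no libraries needed):

```python
def adjusted_r2(r2, n, k):
    """Adjusted r² for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Full heating-oil model with interactions (worked out below): n = 15, k = 6
print(round(adjusted_r2(0.9931, 15, 6), 4))   # 0.9879, i.e. about 0.988
```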

For the full model of the Heating Oil Consumption example (with the interaction terms), n = 15, k = 6, and r² = 0.9931. Thus the adjusted r² is

r²adj = 1 − (1 − r²)(n − 1)/(n − k − 1) = 1 − (1 − 0.9931)(15 − 1)/(15 − 6 − 1) = 1 − (0.0069)(1.75) = 0.9880

The adjusted r² for the reduced model (without the interaction terms) is 0.9853.

The adjusted r² for the full model indicates too small an improvement in explaining the variation in the consumption of heating oil to justify keeping the interaction terms in the model, even if the partial F test were significant.

Note: It can happen that the value of r²adj is negative. This is not a mistake, but a result of a model that fits the data very poorly. In this case some software systems set r²adj equal to 0; Excel will print the actual value.


Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic, developed by Mallows. The formula for computing Cp is

Cp = SSE(k) / MSE(full) − (n − 2k − 2)

where

SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, …
MSE(full) is the mean square error for a regression model that has all explanatory variables included.

Theory says that if the value of Cp is large, then the mean square error of the fitted values is large, indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is much greater than k + 1, then there is a large bias component in the regression, usually indicating omission of an important variable. Therefore, when evaluating which regression is best, it is recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between alternate regressions, keep in mind that the bias is measured with respect to the total group of variables provided by the researcher. This criterion cannot determine when the researcher has forgotten about some variable not included in the total group.

Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly the most difficult part of any real regression analysis problem. You are always trying to get the best fit possible. The principle of parsimony suggests using the fewest number of explanatory variables that can predict the response variable adequately. Regression models with fewer explanatory variables are easier to interpret and are less likely to be affected by interaction or collinearity problems. On the other hand, more variables certainly increase r², and they usually reduce the standard error of estimate se. This presents a trade-off, which is the heart of the challenge of selecting a good model.

The best regression models, in addition to satisfying the conditions of multiple regression, have:

• Relatively few explanatory variables
• Relatively high r² and r²adj, indicating that much of the variability in y is accounted for by the regression model
• A small value of Cp (close to or less than k + 1)
• A relatively small value of se, the standard deviation of the residuals, indicating that the magnitude of the errors is small
• Relatively small P-values for the F- and t-statistics, showing that the overall model is better than a simple summary with the mean, and that the individual parameters are reliably different from zero


Here are several guidelines for including and excluding variables. These guidelines are not ironclad rules. They typically involve choices at the margin, that is, between equations that are very similar and seem equally useful.

Guidelines for Including/Excluding Variables in a Regression Model

1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted significance level, such as 0.05, this variable is a candidate for exclusion.

2. It is a mathematical fact that:
– If |t-value| < 1, then se will decrease and adjusted r² will increase if this variable is excluded from the equation.
– If |t-value| > 1, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and including variables with t-values greater than 1. However, analysts who base the decision on statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable from the equation unless its t-value is at least 2 (approximately). This latter approach is more stringent – fewer variables will be retained – but it is probably the more popular approach.

3. When there is a group of variables that are in some sense logically related, it is sometimes a good idea to include all of them or exclude all of them. In this case their individual t-values are less relevant. Instead, a partial F test can be used to make the include/exclude decision.

4. Use economic, theoretical, or practical considerations to decide whether to include or exclude variables. Some variables might really belong in an equation because of their theoretical relationship with the response variable, and their low t-values, possibly the result of an unlucky sample, should not necessarily disqualify them from being in the equation. Similarly, a variable that has no economic or physical relationship with the response variable might have a significant t-value just by chance. This does not necessarily mean that it should be included in the equation.

You should not agonize too much about whether to include or exclude a variable "at the margin". If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat cleaner model, and you probably won't see any dramatic shifts in Cp, r², r²adj, or se. On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious and you have one more variable to interpret, but otherwise there is no real penalty for including it.

In real applications there are often several equations that, for all practical purposes, are equally useful for describing the relationships or making predictions. There are so many aspects of what makes a model useful that human judgment is necessary to make a final choice. For example, in addition to favoring explanatory variables that can be measured reliably, you may want to favor those that are less expensive to measure. The statistician George Box, who had an illustrious academic career at the University of Wisconsin, is often quoted saying: "All models are wrong, but some models are useful."


Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the relationship between a response variable and one or more explanatory variables. The major issues in model building are finding the proper functional form of the relationship and selecting the explanatory variables to be included in the model.

Many statistical packages provide some assistance by including automatic model-building options. These options estimate a series of regression models by successively adding or deleting variables according to prescribed rules. These rules can vary from package to package, but usually the t test for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to determine whether variables are added or deleted. The levels of significance α1 and α2 for determining whether an explanatory variable should be entered into the model or removed from the model are typically referred to as P-value to Enter and P-value to Leave. Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.

The four most common types of model-building procedures that statistical packages implement are forward selection, backward elimination, stepwise regression, and best subsets regression. Today many businesses use these variable selection procedures as part of the research technique called data mining, which tries to identify significant statistical relationships in very large data sets that contain an extremely large number of variables.

The forward selection procedure begins with no explanatory variables in the model and successively adds variables one at a time until no remaining variables make a significant contribution. The forward selection procedure does not permit a variable to be removed from the model once it has been entered. The procedure stops if the P-value for each of the explanatory variables not in the model is greater than the prescribed P-value to Enter.

The backward elimination procedure begins with a model that includes all potential explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value to the prescribed P-value to Leave. The backward elimination procedure does not permit a variable to be reentered once it has been removed. The procedure stops when none of the explanatory variables in the model have a P-value greater than P-value to Leave.

The stepwise regression procedure is much like a forward procedure, except that it also considers possible deletions along the way. Because of the nature of the stepwise regression procedure, an explanatory variable can enter the model at one step, be removed at a subsequent step, and then enter the model at a later step. The procedure stops when no explanatory variables can be removed from or entered into the model.

The best subsets regression procedure works by trying possible subsets from the list of possible explanatory variables. This procedure does not actually compute all possible regressions; there are ways to exclude models known to be worse than some already examined models. Typical computer output reports results for a collection of "best" models, usually the two best one-variable models, the two best two-variable models, the two best three-variable models, and so on. The user can then select the best model based on such measures as Cp, r², r²adj, and se.
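
A minimal forward-selection loop is easy to sketch. The version below assumes statsmodels and uses each candidate's t-test P-value with P-value to Enter = 0.05; it is a toy illustration of the rule, not a reproduction of any package's exact algorithm:

```python
import statsmodels.api as sm

def forward_selection(y, X, p_enter=0.05):
    """Greedy forward selection driven by each candidate's t-test P-value."""
    selected, remaining = [], list(X.columns)
    while remaining:
        # P-value of each candidate variable when added to the current model
        pvals = {v: sm.OLS(y, sm.add_constant(X[selected + [v]])).fit().pvalues[v]
                 for v in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > p_enter:
            break                     # no remaining variable qualifies for entry
        selected.append(best)
        remaining.remove(best)
    return selected
```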


In most cases the final results of these four procedures are very similar. However, there is no guarantee that they will all produce exactly the same final equation. Deciding which estimated regression equation to use remains a topic for discussion. Ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for forward selection, backward elimination, and stepwise regression, but it cannot perform the best subsets regression. SAS and Minitab can perform all four techniques.

Example 2: Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses. Currently, the graphic artists at the station receive hourly pay for a significant number of hours during which they are idle. These hours are called standby hours. The operations manager wants to determine which factors most heavily affect standby hours of graphic artists. Over a period of 26 weeks, he collected data concerning standby hours (y) and four factors that he suspects are related to the excessive number of standby hours the station is currently experiencing:

x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours

The data are organized and stored in Standby.xlsx.

Week   Standby   Total Staff   Remote   Dubner   Total Labor
1      245       338           414      323      2001
2      177       333           598      340      2030
⋮      ⋮         ⋮             ⋮        ⋮        ⋮
25     261       315           164      223      1839
26     232       331           270      272      1935

How do you build a multiple regression model with the most appropriate mix of explanatory variables?

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the explanatory variables. (Reminder: VIFj = 1 / (1 − rj²), where rj² comes from regressing xj on the other explanatory variables.)

This is always a good starting point for any multiple regression analysis. It involves running four regressions – one regression for each explanatory variable against the other x variables. The following table summarizes the results.


                    Total Staff       Remote            Dubner            Total Labor
                    and all other X   and all other X   and all other X   and all other X
Multiple R          0.6437            0.4349            0.5610            0.7070
R Square            0.4143            0.1891            0.3147            0.4998
Adjusted R Square   0.3345            0.0786            0.2213            0.4316
Standard Error      16.4715           124.9392          57.5525           114.4118
Observations        26                26                26                26
VIF                 1.7074            1.2333            1.4592            1.9993

All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among the set of explanatory variables.

(b) Run forward selection, backward elimination, and stepwise regression, and compare the results.

StatTools regression output from running the three procedures is shown on the next two pages. A significance level of 0.05 is used to enter a variable into the model or to delete a variable from the model (that is, P-value to Enter = P-value to Leave = 0.05).

The correlations between the response variable and the explanatory variables are:

          Total Staff   Remote    Dubner    Total Labor
Standby   0.6050        -0.0953   -0.2443   0.4136

As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable chosen is one that makes the largest contribution to the model, given that the first variable has been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.


Forward Selection

Summary       Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
              0.6999       0.4899     0.4456              35.3873

ANOVA Table   Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained     2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained   23                   28802.0725        1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff           1.7649       0.3790          4.6562   0.0001       0.9808     2.5490
Remote               -0.1390       0.0588         -2.3635   0.0269      -0.2606    -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff        0.6050       0.3660     0.3396              38.6206             1
Remote             0.6999       0.4899     0.4456              35.3873             2

Stepwise Regression

Summary       Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
              0.6999       0.4899     0.4456              35.3873

ANOVA Table   Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained     2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained   23                   28802.0725        1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff           1.7649       0.3790          4.6562   0.0001       0.9808     2.5490
Remote               -0.1390       0.0588         -2.3635   0.0269      -0.2606    -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff        0.6050       0.3660     0.3396              38.6206             Enter
Remote             0.6999       0.4899     0.4456              35.3873             Enter


Backward Elimination

Summary       Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
              0.7894       0.6231     0.5513              31.8350

ANOVA Table   Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained     4                    35181.7937       8795.4484         8.6786    0.0003
Unexplained   21                   21282.8217       1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff           1.2456       0.4121          3.0229   0.0065       0.3887      2.1026
Remote               -0.1184       0.0543         -2.1798   0.0408      -0.2314     -0.0054
Dubner               -0.2971       0.1179         -2.5189   0.0199      -0.5423     -0.0518
Total Labor           0.1305       0.0593          2.2004   0.0391       0.0072      0.2539

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit
All Variables      0.7894       0.6231     0.5513              31.8350

(c) Which of the two models suggested by the above procedures would you choose, based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model – with all explanatory variables included.

For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) − (n − 2k − 2) = 28802.0725/1013.4677 − (26 − 4 − 2) = 28.4193 − 20 = 8.4193

For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) − (n − 2k − 2) = 21282.8217/1013.4677 − (26 − 8 − 2) = 21 − 16 = 5

The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.


(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model        k + 1   Cp      r²       r²adj     se
X1           2       13.32   0.3660    0.3396   38.62
X2           2       33.21   0.0091   -0.0322   48.28
X3           2       30.39   0.0597    0.0205   47.03
X4           2       24.18   0.1710    0.1365   44.16
X1X2         3        8.42   0.4899    0.4456   35.39
X1X3         3       10.65   0.4499    0.4021   36.75
X1X4         3       14.80   0.3754    0.3211   39.16
X2X3         3       32.31   0.0612   -0.0205   48.01
X2X4         3       23.25   0.2238    0.1563   43.65
X3X4         3       11.82   0.4288    0.3791   37.45
X1X2X3       4        7.84   0.5362    0.4729   34.50
X1X2X4       4        9.34   0.5092    0.4423   35.49
X1X3X4       4        7.75   0.5378    0.4748   34.44
X2X3X4       4       12.14   0.4591    0.3853   37.26
X1X2X3X4     5        5.00   0.6231    0.5513   31.84

Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination r²adj is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables considered has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).
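
For a handful of candidate variables, all-subsets enumeration is cheap to sketch in code (assuming statsmodels; this recomputes the table's columns for every non-empty subset rather than using any pruning tricks):

```python
from itertools import combinations
import statsmodels.api as sm

def best_subsets(y, X, mse_full):
    """Fit every non-empty subset; report (subset, k+1, Cp, r², adjusted r²)."""
    n = len(y)
    rows = []
    for k in range(1, X.shape[1] + 1):
        for subset in combinations(X.columns, k):
            fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
            cp = fit.ssr / mse_full - (n - 2 * k - 2)
            rows.append((subset, k + 1, cp, fit.rsquared, fit.rsquared_adj))
    return sorted(rows, key=lambda row: row[2])   # smallest Cp first
```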

(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the total labor hours reveal apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance. The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.


[Residual plots for the four-variable model: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals by week.]

Page 2: 14_Building_Regression_Models_Part1.pdf

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 215

- 2 -

This test statistic measures how much the sum of squared residuals SSE decreases by including

the extra variables in the equation It must decrease by some amount because the sum of squaredresiduals cannot increase when extra variables are added to an equation But if it does not

decrease sufficiently the extra variables might not explain enough to justify their inclusion in the

equation and they should probably be excluded

If the null hypothesis is true this test statistic has an F distribution with df1 = k ndash j and

df2 = n ndash k ndash 1 degrees of freedom If the corresponding P-value is sufficiently smallyou can reject the null hypothesis that the extra variables have no explanatory power

To perform the partial F test in Excel run two regressions one for the reduced model

(with explanatory variables 1 x through j x ) and one for the full model (with explanatory

variables 1 x through k x ) and use the appropriate values from their ANOVA tables to calculate

the F test statistic Then use Excels FDIST function to calculate the corresponding P-value

Reminder The ANOVA table for the full equation has the following form

Source of

Variation

Degrees

of

Freedom

Sum

of Squares

Mean Squares

(Variance)F statistic P-value

Regression k SSRk

SSR MSR =

MSE

MSRF = Prob gt F

Error n ndash k ndash 1 SSE1minusminus

=

k n

SSE MSE

Total 1minusn SST

Notes

1 Many users look only at the 2r and se values to check whether extra variables are doing a

ldquogood jobrdquo For example they might cite that 2r went from 80 to 90 or that se wentfrom 500 to 400 as evidence that extra variables provide a ldquosignificantlyrdquo better fit

Although these are important indicators they are not the basis for a formal hypothesis test

The partial F test is the formal test of significance for an extra set of variables

2 If the partial F test shows that a group of variables is significant it does not imply that each

variable in this group is significant Some of these variables can have low t -values(and consequently large P-values) Some analysts favor excluding the individual variables

that arent significant whereas others favor keeping the whole group or excluding the whole

group Either approach is valid Fortunately the final model building results are often nearlythe same either way

3 StatTools performs partial F tests as part of the procedure for building regression models

when the option Block is chosen in the Regression Type dropdown list

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 315

- 3 -

Example 1

Heating Oil ConsumptionA real estate developer wants to predict the heating oil consumption in single-family houses

based on the effect of atmospheric temperature and the amount of attic insulation

Data are collected from a sample of 15 single-family houses Of the 15 houses selected houses

1 4 6 7 8 10 and 12 are ranch-style houses The data are organized and stored in Heating_OilxlsxHouse Gallons Temperature (F) Insulation (inch) Style

1 2753 40 3 1

2 3638 27 3 0

M M M M M

14 323 38 3 0

15 525 58 10 0

(a) Develop and analyze an appropriate regression model

The explanatory variables considered are

1 x ndash atmospheric temperature

2 x ndash the amount of attic insulation

3 x ndash dummy variable = 1 if the style is ranch 0 otherwise

Assuming that the slope between heating oil consumption and atmospheric temperature 1 x

and between heating oil consumption and the amount of attic insulation 2 x is the same for

both styles of houses the regression model is

ε β β β α ++++= 332211 x x x y

The regression results for this model are

Regression Statistics

Multiple R 09942

R Square 09884

Adjusted R Square 09853

Standard Error 157489

Observations 15

ANOVA

df SS MS F Significance F

Regression 3 2334069094 778023031 3136822 00000

Residual 11 27283200 2480291

Total 14 2361352293

Coefficients Standard Error t Stat P-value Lower 95 Upper 95

Intercept 5925401 143370 413295 00000 5609846 6240956

Temperature -55251 02044 -270267 00000 -59751 -50752

Insulation -213761 14480 -147623 00000 -245632 -181891

Style -389727 83584 -46627 00007 -573695 -205759

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 415

- 4 -

(b) Interpret the regression coefficients

The regression equation is

321 972738376121525155401592ˆ x x x y minusminusminus=

Predicted Consumption = 5925401 ndash 55251Temperature ndash 213761 Insulation ndash 389727Style

For houses that are ranch style because 3 x = 1 the regression equation reduces to

21 376121525155674553ˆ x x y minusminus=

For houses that are not ranch style because 3 x = 0 the regression equation reduces to

21 376121525155401592ˆ x x y minusminus=

The regression coefficients are interpreted as follows

1b = ndash55251 Holding constant the attic insulation and the house style for each additional

1degF increase in atmospheric temperature you estimate that the predicted

heating oil consumption decreases by 55251 gallons

2b = ndash213761 Holding constant the atmospheric temperature and the house style for each

additional 1-inch increase in attic insulation you estimate that the predicted

heating oil consumption decreases by 213761 gallons

3b = ndash389727 b3 measures the effect on oil consumption of having a ranch-style house ( 3 x = 1)

compared with having a house that is not ranch style ( 3 x = 0) Thus with

atmospheric temperature and attic insulation held constant you estimate that the

predicted heating oil consumption is 389727 gallons less for a ranch-style house

than for a house that is not ranch style

(c) Does each of the three variables make a significant contribution to the regression model

The three t -test statistics representing the slopes for temperature insulation and ranch style are

ndash 270267 ndash 147623 and ndash 46627 Each of the corresponding P-values is extremely small

(less than 0001) Thus each of the three variables makes a significant contribution to the modelIn addition the coefficient of determination indicates that 9884 of the variation in oil usage

is explained by variation in temperature insulation and whether the house is ranch style

(d) Determine whether adding the interaction terms makes a significant contribution to the model

To evaluate possible interactions between the explanatory variables three interaction termsare constructed as follows

214 x x x = (interaction between temperature and insulation)

315 x x x = (interaction between temperature and style)

326 x x x = (interaction between insulation and style)

The regression model is now

ε β β β β β β α +++++++= 665544332211 x x x x x x y

The regression results for this model are

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 515

- 5 -

Regression Statistics

Multiple R 09966

R Square 09931

Adjusted R Square 09880

Standard Error 142506

Observations 15

ANOVA

df SS MS F Significance F

Regression 6 2345105818 390850970 1924607 00000

Residual 8 16246475 2030809

Total 14 2361352293

Coefficients Standard Error t Stat P-value Lower 95 Upper 95

Intercept 6428867 267059 240728 00000 5813028 7044706

Temperature -69263 07531 -91969 00000 -86629 -51896

Insulation -278825 35801 -77882 00001 -361383 -196268

Style -846088 299956 -28207 00225 -1537787 -154389

TempInsulation 01702 00886 19204 00911 -00342 03746

TempStyle 06596 04617 14286 01910 -04051 17242

InsulationStyle 49870 35137 14193 01936 -31156 130895

To test whether the three interactions significantly improve the regression model you use the

partial F test The null and alternative hypotheses are

0 6540 === β β β H (There are no interactions among 21 x x and 3 x )

a H At least one of 654 β β β is not zero ( 1 x interacts with 2 x andor 1 x interacts with 3 x

andor 2 x interacts with 3 x )

From the full regression output (see above)SSE (full) = 16246475 MSE (full) = 2030809

From the reduced regression output (see part (a))SSE (reduced) = 27283200

The test statistic is

811510809203

8908367

0809203

3

6475162432002728

(full)

termsextraof number

(full)(reduced)

==

minus

=

minus

=

MSE

SSE SSE

F

df1 = k ndash j = 6 ndash 3 = 3 df2 = n ndash k ndash 1 = 15 ndash 6 ndash 1 = 8P-value = FDIST(1811538) = 02230


Because of the large P-value, you conclude that the interactions do not make a significant contribution to the model, given that the model already includes temperature x1, insulation x2, and whether the house is ranch style x3. Therefore, the multiple regression model using x1, x2, and x3 but no interaction terms is the better model.

If you rejected this null hypothesis, you would then test the contribution of each interaction separately in order to determine which interaction terms to include in the model.
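
The partial F computation is easy to script. Here is a small sketch (scipy assumed; the inputs are the SSE values from the two ANOVA tables above):

    # Sketch of the partial F test for a group of extra terms.
    from scipy.stats import f as f_dist

    def partial_f_test(sse_reduced, sse_full, df_full, n_extra):
        """F statistic and P-value for dropping the n_extra terms."""
        mse_full = sse_full / df_full
        f_stat = ((sse_reduced - sse_full) / n_extra) / mse_full
        p_value = f_dist.sf(f_stat, n_extra, df_full)   # upper tail, like FDIST
        return f_stat, p_value

    print(partial_f_test(2728.3200, 1624.6475, df_full=8, n_extra=3))
    # -> approximately (1.8115, 0.2230), matching the hand computation
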

Adjusted r²

Adding new explanatory variables will always keep the r² value the same or increase it; it can never decrease it. In general, adding explanatory variables to the model causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE. Because SSR = SST – SSE, when SSE becomes smaller, SSR becomes larger, causing r² = SSR/SST to increase. Therefore, if a variable is added to the model, r² usually becomes larger even if the variable added is not statistically significant. This can lead to "fishing expeditions" where you keep adding variables to the model, some of which have no conceptual relationship to the response variable, just to inflate the r² value.

To avoid overestimating the impact of adding an explanatory variable on the amount of variability explained by the estimated regression equation, many analysts prefer adjusting r² for the number of explanatory variables. The adjusted r² is defined as:

r²adj = 1 – (1 – r²) (n – 1)/(n – k – 1)

The adjusted r² imposes a "penalty" for each new term that is added to the model, in an attempt to make models of different sizes (numbers of explanatory variables) comparable. It can decrease when unnecessary explanatory variables are added to the regression model. Therefore, it serves as an index that you can monitor. If you add variables and the adjusted r² decreases, the extra variables are essentially not pulling their weight and should probably be omitted.

For the full model of the Heating Oil Consumption example (with the interaction terms), n = 15, k = 6, and r² = 0.9931. Thus, the adjusted r² is:

r²adj = 1 – (1 – r²)(n – 1)/(n – k – 1) = 1 – (1 – 0.9931)(15 – 1)/(15 – 6 – 1) = 1 – (0.0069)(1.75) = 0.9880

The adjusted r² for the reduced model (without the interaction terms) is 0.9853. The adjusted r² for the full model indicates too small an improvement in explaining the variation in the consumption of heating oil to justify keeping the interaction terms in the model, even if the partial F test were significant.

Note: It can happen that the value of r²adj is negative. This is not a mistake but a result of a model that fits the data very poorly. In this case, some software systems set r²adj equal to 0; Excel will print the actual value.
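
A one-line helper makes the formula concrete (plain Python; the two calls check the full-model and reduced-model values above, up to rounding of r²):

    # Sketch: adjusted r-squared from r-squared, sample size n, and k predictors.
    def adjusted_r2(r2, n, k):
        return 1 - (1 - r2) * (n - 1) / (n - k - 1)

    print(adjusted_r2(0.9931, 15, 6))   # ~0.9880, full model with interactions
    print(adjusted_r2(0.9884, 15, 3))   # ~0.9853, reduced model
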


Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic developed by Mallows. The formula for computing Cp is:

Cp = SSE(k)/MSE(full) – [n – 2(k + 1)]

where:
SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, ...
MSE(full) is the mean square error for a regression model that has all explanatory variables included.

Theory says that if the value of Cp is large, then the mean square error of the fitted values is large, indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is much greater than k + 1, then there is a large bias component in the regression, usually indicating omission of an important variable. Therefore, when evaluating which regression is best, it is recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between alternate regressions, keep in mind that the bias is measured with respect to the total group of variables provided by the researcher. This criterion cannot determine when the researcher has forgotten about some variable not included in the total group.
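
The formula translates directly into code. Here is a sketch (plain Python; the check values are taken from Example 2 later in this handout):

    # Sketch: Mallows Cp for a k-variable model against the full-model MSE.
    def mallows_cp(sse_k, mse_full, n, k):
        return sse_k / mse_full - (n - 2 * (k + 1))

    print(mallows_cp(28802.0725, 1013.4677, n=26, k=2))   # ~8.42
    print(mallows_cp(21282.8217, 1013.4677, n=26, k=4))   # = 5.0
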

Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly the most difficult part of any real regression analysis problem. You are always trying to get the best fit possible. The principle of parsimony suggests using the fewest explanatory variables that can predict the response variable adequately. Regression models with fewer explanatory variables are easier to interpret and are less likely to be affected by interaction or collinearity problems. On the other hand, more variables certainly increase r², and they usually reduce the standard error of estimate, se. This presents a trade-off, which is the heart of the challenge of selecting a good model.

The best regression models, in addition to satisfying the conditions of multiple regression, have:

• Relatively few explanatory variables

• Relatively high r² and r²adj, indicating that much of the variability in y is accounted for by the regression model

• A small value of Cp (close to or less than k + 1)

• A relatively small value of se, the standard deviation of the residuals, indicating that the magnitude of the errors is small

• Relatively small P-values for the F- and t-statistics, showing that the overall model is better than a simple summary with the mean and that the individual parameters are reliably different from zero


Here are several guidelines for including and excluding variables. These guidelines are not ironclad rules. They typically involve choices at the margin, that is, between equations that are very similar and seem equally useful.

Guidelines for Including/Excluding Variables in a Regression Model

1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted significance level, such as 0.05, this variable is a candidate for exclusion.

2. It is a mathematical fact that:
– If a variable's t-value is less than 1 in absolute value, then se will decrease and adjusted r² will increase if this variable is excluded from the equation.
– If a variable's t-value is greater than 1 in absolute value, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and including variables with t-values greater than 1. However, analysts who base the decision on statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable from the equation unless its t-value is at least 2 (approximately). This latter approach is more stringent – fewer variables will be retained – but it is probably the more popular approach.

3. When there is a group of variables that are in some sense logically related, it is sometimes a good idea to include all of them or exclude all of them. In this case, their individual t-values are less relevant. Instead, a partial F test can be used to make the include/exclude decision.

4. Use economic, theoretical, or practical considerations to decide whether to include or exclude variables. Some variables might really belong in an equation because of their theoretical relationship with the response variable, and their low t-values, possibly the result of an unlucky sample, should not necessarily disqualify them from being in the equation. Similarly, a variable that has no economic or physical relationship with the response variable might have a significant t-value just by chance. This does not necessarily mean that it should be included in the equation.

You should not agonize too much about whether to include or exclude a variable "at the margin". If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat cleaner model, and you probably won't see any dramatic shifts in Cp, r², r²adj, or se. On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious and you have one more variable to interpret, but otherwise there is no real penalty for including it.

In real applications, there are often several equations that, for all practical purposes, are equally useful for describing the relationships or making predictions. There are so many aspects of what makes a model useful that human judgment is necessary to make a final choice. For example, in addition to favoring explanatory variables that can be measured reliably, you may want to favor those that are less expensive to measure. The statistician George Box, who had an illustrious academic career at the University of Wisconsin, is often quoted saying: "All models are wrong, but some models are useful."


Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the relationship between a response variable and one or more explanatory variables. The major issues in model building are finding the proper functional form of the relationship and selecting the explanatory variables to be included in the model.

Many statistical packages provide some assistance by including automatic model-building options. These options estimate a series of regression models by successively adding or deleting variables according to prescribed rules. These rules can vary from package to package, but usually the t test for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to determine whether variables are added or deleted. The levels of significance α1 and α2 for determining whether an explanatory variable should be entered into the model or removed from the model are typically referred to as P-value to Enter and P-value to Leave. Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.

The four most common types of model-building procedures that statistical packages implement are forward selection, backward elimination, stepwise regression, and best subsets regression. Today, many businesses use these variable selection procedures as part of the research technique called data mining, which tries to identify significant statistical relationships in very large data sets that contain an extremely large number of variables.

The forward selection procedure begins with no explanatory variables in the model and successively adds variables one at a time until no remaining variables make a significant contribution. The forward selection procedure does not permit a variable to be removed from the model once it has been entered. The procedure stops if the P-value for each of the explanatory variables not in the model is greater than the prescribed P-value to Enter.

The backward elimination procedure begins with a model that includes all potential explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value to the prescribed P-value to Leave. The backward elimination procedure does not permit a variable to be reentered once it has been removed. The procedure stops when none of the explanatory variables in the model have a P-value greater than P-value to Leave.

The stepwise regression procedure is much like a forward procedure, except that it also considers possible deletions along the way. Because of the nature of the stepwise regression procedure, an explanatory variable can enter the model at one step, be removed at a subsequent step, and then enter the model at a later step. The procedure stops when no explanatory variables can be removed from or entered into the model.
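
A bare-bones version of forward selection can be scripted with statsmodels. In this sketch, the DataFrame X of candidate explanatory variables and the response y are placeholders for your own data; at each step the candidate with the smallest P-value is added, and the loop stops when no candidate meets P-value to Enter:

    # Sketch of forward selection driven by t-test P-values.
    import statsmodels.api as sm

    def forward_selection(X, y, p_enter=0.05):
        selected = []
        candidates = list(X.columns)
        while candidates:
            # P-value of each candidate when added to the current model
            pvals = {}
            for c in candidates:
                fit = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
                pvals[c] = fit.pvalues[c]
            best = min(pvals, key=pvals.get)
            if pvals[best] > p_enter:
                break                       # no remaining variable qualifies
            selected.append(best)
            candidates.remove(best)
        return selected

Turning this into stepwise regression amounts to adding, after each entry step, a pass that drops any selected variable whose P-value now exceeds P-value to Leave.
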

The best subsets regression procedure works by trying possible subsets from the list of possible explanatory variables. This procedure does not actually compute all possible regressions; there are ways to exclude models known to be worse than some already examined models. Typical computer output reports results for a collection of "best" models, usually the two best one-variable models, the two best two-variable models, the two best three-variable models, and so on. The user can then select the best model based on such measures as Cp, r², r²adj, and se.
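
For a handful of candidate variables, best subsets can simply be brute-forced. Here is a sketch (statsmodels assumed; X and y are placeholders as before, and mse_full is the mean square error of the model with all variables included):

    # Sketch: exhaustive best-subsets search, ranked by Mallows Cp.
    from itertools import combinations
    import statsmodels.api as sm

    def best_subsets(X, y, mse_full):
        results = []
        n = len(y)
        for r in range(1, len(X.columns) + 1):
            for subset in combinations(X.columns, r):
                fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
                cp = fit.ssr / mse_full - (n - 2 * (r + 1))
                results.append((subset, cp, fit.rsquared, fit.rsquared_adj))
        return sorted(results, key=lambda t: t[1])   # smallest Cp first

With k candidate variables this fits 2^k – 1 models, which is perfectly manageable for the four-variable example below.
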


In most cases, the final results of these four procedures are very similar. However, there is no guarantee that they will all produce exactly the same final equation. Deciding which estimated regression equation to use remains a topic for discussion. Ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for forward selection, backward elimination, and stepwise regression, but it cannot perform the best subsets regression. SAS and Minitab can perform all four techniques.

Example 2

Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses. Currently, the graphic artists at the station receive hourly pay for a significant number of hours during which they are idle. These hours are called standby hours. The operations manager wants to determine which factors most heavily affect standby hours of graphic artists. Over a period of 26 weeks, he collected data concerning standby hours (y) and four factors that he suspects are related to the excessive number of standby hours the station is currently experiencing:

x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours

The data are organized and stored in Standby.xlsx:

Week   Standby   Total Staff   Remote   Dubner   Total Labor
1      245       338           414      323      2001
2      177       333           598      340      2030
…      …         …             …        …        …
25     261       315           164      223      1839
26     232       331           270      272      1935

How can you build a multiple regression model with the most appropriate mix of explanatory variables?

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the explanatory variables. (Reminder: VIFj = 1/(1 – rj²), where rj² is the coefficient of determination from regressing xj on the other explanatory variables.)

This is always a good starting point for any multiple regression analysis. It involves running four regressions – one regression for each explanatory variable against the other x variables. The following table summarizes the results:


                    Total Staff       Remote            Dubner            Total Labor
                    and all other X   and all other X   and all other X   and all other X
Multiple R          0.6437            0.4349            0.5610            0.7070
R Square            0.4143            0.1891            0.3147            0.4998
Adjusted R Square   0.3345            0.0786            0.2213            0.4316
Standard Error      16.4715           124.9392          57.5525           114.4118
Observations        26                26                26                26
VIF                 1.7074            1.2333            1.4592            1.9993

All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among the set of explanatory variables.
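
One way to reproduce this table is with statsmodels' variance_inflation_factor. A sketch (X is a placeholder DataFrame holding the four explanatory variables):

    # Sketch: VIF for each explanatory variable.
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X_c = sm.add_constant(X)   # constant, so each auxiliary regression has an intercept
    vifs = {col: variance_inflation_factor(X_c.values, i)
            for i, col in enumerate(X_c.columns) if col != "const"}
    print(vifs)                # expect roughly 1.71, 1.23, 1.46, 2.00
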

(b) Run forward selection, backward elimination, and stepwise regression and compare the results.

StatTools regression output from running the three procedures is shown below. A significance level of 0.05 is used to enter a variable into the model or to delete a variable from the model (that is, P-value to Enter = P-value to Leave = 0.05).

The correlations between the response variable and the explanatory variables are:

           Total Staff   Remote    Dubner    Total Labor
Standby    0.6050        –0.0953   –0.2443   0.4136

As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable chosen is one that makes the largest contribution to the model, given that the first variable has been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.


Forward Selection

                   Multiple   R-Square   Adjusted   StErr of
Summary            R                     R-Square   Estimate
                   0.6999     0.4899     0.4456     35.3873

                   Degrees of   Sum of       Mean of      F-Ratio   p-Value
ANOVA Table        Freedom      Squares      Squares
Explained          2            27662.5429   13831.2714   11.0450   0.0004
Unexplained        23           28802.0725   1252.2640

                   Coefficient   Standard   t-Value   p-Value   Confidence Interval 95%
Regression Table                 Error                          Lower       Upper
Constant           -330.6748     116.4802   -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790     4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588     -2.3635   0.0269    -0.2606     -0.0173

                   Multiple   R-Square   Adjusted   StErr of   Entry
Step Information   R                     R-Square   Estimate   Number
Total Staff        0.6050     0.3660     0.3396     38.6206    1
Remote             0.6999     0.4899     0.4456     35.3873    2

Stepwise Regression

                   Multiple   R-Square   Adjusted   StErr of
Summary            R                     R-Square   Estimate
                   0.6999     0.4899     0.4456     35.3873

                   Degrees of   Sum of       Mean of      F-Ratio   p-Value
ANOVA Table        Freedom      Squares      Squares
Explained          2            27662.5429   13831.2714   11.0450   0.0004
Unexplained        23           28802.0725   1252.2640

                   Coefficient   Standard   t-Value   p-Value   Confidence Interval 95%
Regression Table                 Error                          Lower       Upper
Constant           -330.6748     116.4802   -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790     4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588     -2.3635   0.0269    -0.2606     -0.0173

                   Multiple   R-Square   Adjusted   StErr of   Enter or
Step Information   R                     R-Square   Estimate   Exit
Total Staff        0.6050     0.3660     0.3396     38.6206    Enter
Remote             0.6999     0.4899     0.4456     35.3873    Enter


Backward Elimination

                   Multiple   R-Square   Adjusted   StErr of
Summary            R                     R-Square   Estimate
                   0.7894     0.6231     0.5513     31.8350

                   Degrees of   Sum of       Mean of     F-Ratio   p-Value
ANOVA Table        Freedom      Squares      Squares
Explained          4            35181.7937   8795.4484   8.6786    0.0003
Unexplained        21           21282.8217   1013.4677

                   Coefficient   Standard   t-Value   p-Value   Confidence Interval 95%
Regression Table                 Error                          Lower       Upper
Constant           -330.8318     110.8954   -2.9833   0.0071    -561.4514   -100.2123
Total Staff        1.2456        0.4121     3.0229    0.0065    0.3887      2.1026
Remote             -0.1184       0.0543     -2.1798   0.0408    -0.2314     -0.0054
Dubner             -0.2971       0.1179     -2.5189   0.0199    -0.5423     -0.0518
Total Labor        0.1305        0.0593     2.2004    0.0391    0.0072      0.2539

                   Multiple   R-Square   Adjusted   StErr of   Exit
Step Information   R                     R-Square   Estimate   Number
All Variables      0.7894     0.6231     0.5513     31.8350

(c) Which of the two models suggested by the above procedures would you choose, based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model – with all explanatory variables included.

For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2
SSE(k) = 28802.0725
MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) – [n – 2(k + 1)] = 28802.0725/1013.4677 – (26 – 4 – 2) = 28.4193 – 20 = 8.4193

For the model suggested by the backward elimination procedure:
n = 26, k = 4
SSE(k) = SSE(full) = 21282.8217
MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) – [n – 2(k + 1)] = 21282.8217/1013.4677 – (26 – 8 – 2) = 21.0 – 16 = 5.0

The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.


(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model        k + 1   Cp      r²       r²adj     se
X1           2       13.32   0.3660   0.3396    38.62
X2           2       33.21   0.0091   –0.0322   48.28
X3           2       30.39   0.0597   0.0205    47.03
X4           2       24.18   0.1710   0.1365    44.16
X1X2         3       8.42    0.4899   0.4456    35.39
X1X3         3       10.65   0.4499   0.4021    36.75
X1X4         3       14.80   0.3754   0.3211    39.16
X2X3         3       32.31   0.0612   –0.0205   48.01
X2X4         3       23.25   0.2238   0.1563    43.65
X3X4         3       11.82   0.4288   0.3791    37.45
X1X2X3       4       7.84    0.5362   0.4729    34.50
X1X2X4       4       9.34    0.5092   0.4423    35.49
X1X3X4       4       7.75    0.5378   0.4748    34.44
X2X3X4       4       12.14   0.4591   0.3853    37.26
X1X2X3X4     5       5.00    0.6231   0.5513    31.84

Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination r²adj is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables considered has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).

(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the total labor hours reveal apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance. The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.
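
A sketch of how plots like these can be produced with matplotlib (here fit is the fitted statsmodels result for the four-variable model, df the standby DataFrame, and the column names are assumptions):

    # Sketch: the seven diagnostic plots for the chosen model.
    import matplotlib.pyplot as plt

    resid = fit.resid
    for col in ["Total Staff", "Remote", "Dubner", "Total Labor"]:
        plt.figure(); plt.scatter(df[col], resid)        # residuals vs. each x
        plt.axhline(0); plt.xlabel(col); plt.ylabel("Residuals")

    plt.figure(); plt.scatter(fit.fittedvalues, resid)   # residuals vs. fit
    plt.figure(); plt.hist(resid)                        # normality check
    plt.figure(); plt.plot(df["Week"], resid)            # autocorrelation check
    plt.show()
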


[Residual plots: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs. Fit (residuals against predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals (residuals by week).]

Page 3: 14_Building_Regression_Models_Part1.pdf

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 315

- 3 -

Example 1

Heating Oil ConsumptionA real estate developer wants to predict the heating oil consumption in single-family houses

based on the effect of atmospheric temperature and the amount of attic insulation

Data are collected from a sample of 15 single-family houses Of the 15 houses selected houses

1 4 6 7 8 10 and 12 are ranch-style houses The data are organized and stored in Heating_OilxlsxHouse Gallons Temperature (F) Insulation (inch) Style

1 2753 40 3 1

2 3638 27 3 0

M M M M M

14 323 38 3 0

15 525 58 10 0

(a) Develop and analyze an appropriate regression model

The explanatory variables considered are

1 x ndash atmospheric temperature

2 x ndash the amount of attic insulation

3 x ndash dummy variable = 1 if the style is ranch 0 otherwise

Assuming that the slope between heating oil consumption and atmospheric temperature 1 x

and between heating oil consumption and the amount of attic insulation 2 x is the same for

both styles of houses the regression model is

ε β β β α ++++= 332211 x x x y

The regression results for this model are

Regression Statistics

Multiple R 09942

R Square 09884

Adjusted R Square 09853

Standard Error 157489

Observations 15

ANOVA

df SS MS F Significance F

Regression 3 2334069094 778023031 3136822 00000

Residual 11 27283200 2480291

Total 14 2361352293

Coefficients Standard Error t Stat P-value Lower 95 Upper 95

Intercept 5925401 143370 413295 00000 5609846 6240956

Temperature -55251 02044 -270267 00000 -59751 -50752

Insulation -213761 14480 -147623 00000 -245632 -181891

Style -389727 83584 -46627 00007 -573695 -205759

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 415

- 4 -

(b) Interpret the regression coefficients

The regression equation is

321 972738376121525155401592ˆ x x x y minusminusminus=

Predicted Consumption = 5925401 ndash 55251Temperature ndash 213761 Insulation ndash 389727Style

For houses that are ranch style because 3 x = 1 the regression equation reduces to

21 376121525155674553ˆ x x y minusminus=

For houses that are not ranch style because 3 x = 0 the regression equation reduces to

21 376121525155401592ˆ x x y minusminus=

The regression coefficients are interpreted as follows

1b = ndash55251 Holding constant the attic insulation and the house style for each additional

1degF increase in atmospheric temperature you estimate that the predicted

heating oil consumption decreases by 55251 gallons

2b = ndash213761 Holding constant the atmospheric temperature and the house style for each

additional 1-inch increase in attic insulation you estimate that the predicted

heating oil consumption decreases by 213761 gallons

3b = ndash389727 b3 measures the effect on oil consumption of having a ranch-style house ( 3 x = 1)

compared with having a house that is not ranch style ( 3 x = 0) Thus with

atmospheric temperature and attic insulation held constant you estimate that the

predicted heating oil consumption is 389727 gallons less for a ranch-style house

than for a house that is not ranch style

(c) Does each of the three variables make a significant contribution to the regression model

The three t -test statistics representing the slopes for temperature insulation and ranch style are

ndash 270267 ndash 147623 and ndash 46627 Each of the corresponding P-values is extremely small

(less than 0001) Thus each of the three variables makes a significant contribution to the modelIn addition the coefficient of determination indicates that 9884 of the variation in oil usage

is explained by variation in temperature insulation and whether the house is ranch style

(d) Determine whether adding the interaction terms makes a significant contribution to the model

To evaluate possible interactions between the explanatory variables three interaction termsare constructed as follows

214 x x x = (interaction between temperature and insulation)

315 x x x = (interaction between temperature and style)

326 x x x = (interaction between insulation and style)

The regression model is now

ε β β β β β β α +++++++= 665544332211 x x x x x x y

The regression results for this model are

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 515

- 5 -

Regression Statistics

Multiple R 09966

R Square 09931

Adjusted R Square 09880

Standard Error 142506

Observations 15

ANOVA

df SS MS F Significance F

Regression 6 2345105818 390850970 1924607 00000

Residual 8 16246475 2030809

Total 14 2361352293

Coefficients Standard Error t Stat P-value Lower 95 Upper 95

Intercept 6428867 267059 240728 00000 5813028 7044706

Temperature -69263 07531 -91969 00000 -86629 -51896

Insulation -278825 35801 -77882 00001 -361383 -196268

Style -846088 299956 -28207 00225 -1537787 -154389

TempInsulation 01702 00886 19204 00911 -00342 03746

TempStyle 06596 04617 14286 01910 -04051 17242

InsulationStyle 49870 35137 14193 01936 -31156 130895

To test whether the three interactions significantly improve the regression model you use the

partial F test The null and alternative hypotheses are

0 6540 === β β β H (There are no interactions among 21 x x and 3 x )

a H At least one of 654 β β β is not zero ( 1 x interacts with 2 x andor 1 x interacts with 3 x

andor 2 x interacts with 3 x )

From the full regression output (see above)SSE (full) = 16246475 MSE (full) = 2030809

From the reduced regression output (see part (a))SSE (reduced) = 27283200

The test statistic is

811510809203

8908367

0809203

3

6475162432002728

(full)

termsextraof number

(full)(reduced)

==

minus

=

minus

=

MSE

SSE SSE

F

df1 = k ndash j = 6 ndash 3 = 3 df2 = n ndash k ndash 1 = 15 ndash 6 ndash 1 = 8P-value = FDIST(1811538) = 02230

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 615

- 6 -

Because of the large P-value you conclude that the interactions do not make a significant

contribution to the model given that the model already includes temperature 1 x insulation 2 x

and whether the house is ranch style 3 x Therefore the multiple regression model using 21 x x

and 3 x but no interaction terms is the better model

If you rejected this null hypothesis you would then test the contribution of each interactionseparately in order to determine which interaction terms to include in the model

Adjusted r2

Adding new explanatory variables will always keep the 2r value the same or increase it

it can never decrease it In general adding explanatory variables to the model causes the

prediction errors to become smaller thus reducing the sum of squares due to error SSEBecause SSR = SST ndash SSE when SSE becomes smaller SSR becomes larger causing

SST

SSRr =2 to increase Therefore if a variable is added to the model 2r usually becomes larger

even if the variable added is not statistically significant This can lead to ldquofishing expeditionsrdquo

where you keep adding variables to the model some of which have no conceptual relationship to

the response variable just to inflate the 2r value

To avoid overestimating the impact of adding an explanatory variable on the amount of

variability explained by the estimated regression equation many analysts prefer adjusting 2r for

the number of explanatory variables The adjusted r 2 is defined as

( )1

111 22

minusminus

minusminusminus=

k n

nr r adj

The adjusted 2r imposes a ldquopenaltyrdquo for each new term that is added to the model in an attemptto make models of different sizes (numbers of explanatory variables) comparable It can decrease

when unnecessary explanatory variables are added to the regression model Therefore it serves

as an index that you can monitor If you add variables and the adjusted 2r decreases the extra

variables are essentially not pulling their weight and should probably be omitted

For the full model of the Heating Oil Consumption example (with the interaction terms)

n = 15 k = 6 and 2r = 09931 Thus the adjusted 2r is

( ) ( ) 98800)751)(00690(11615

1159931011

1

111 22

=minus=

minusminus

minusminusminus=

minusminus

minusminusminus=

k n

nr r adj

The adjusted 2r for the reduced model (without the interaction terms) is 09853

The adjusted 2r for the full model indicates too small an improvement in explaining the variation

in the consumption of heating oil to justify keeping the interaction terms in the model even if thepartial F test were significant

Note It can happen that the value of2

adjr is negative This is not a mistake but a result of a model that fit

the data very poorly In this case some software systems set2

adjr equal to 0 Excel will print the actual va

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 715

- 7 -

pC Statistic

Another measure often used in the evaluation of competing regression models is the pC statistic

developed by Mallows The formula for computing pC is

)22()full(

)(minusminusminus= k n

MSE

k SSE C

p

where

)(k SSE is the error sum of squares for a regression model that has k explanatory variables k = 1 2

MSE (full) is the mean square error for a regression model that has all explanatory variables included

Theory says that if the value of pC is large then the mean square error of the fitted values is large

indicating either a poor fit substantial bias in the fit or both In addition if the value of pC is

much greater than k + 1 then there is a large bias component in the regression usually indicating

omission of an important variable Therefore when evaluating which regression is best it is

recommended that regressions with small pC values and those with values near k + 1 be considered

Although the pC measure is highly recommended as a useful criterion in choosing between

alternate regressions keep in mind that the bias is measured with respect to the total group of

variables provided by the researcher This criterion cannot determine when the researcher hasforgotten about some variable not included in the total group

IncludeExclude Decisions

Finding the best xrsquos (or the best form of the xrsquos) to include in a regression model is undoubtedly

the most difficult part of any real regression analysis problem You are always trying to get the

best fit possible The principle of parsimony suggests using the fewest number of explanatoryvariables that can predict the response variable adequately Regression models with fewer

explanatory variables are easier to interpret and are less likely to be affected by interaction or

collinearity problems On the other hand more variables certainly increase 2r and they usually

reduce the standard error of estimate se This presents a trade-off which is the heart of thechallenge of selecting a good model

The best regression models in addition to satisfying the conditions of multiple regression have

bull Relatively few explanatory variables

bull Relatively high 2r and2

adjr indicating that much of the variability in y is accounted for by

the regression model

bull A small value of pC (close to or less than k + 1)

bull A relatively small value of es the standard deviation of the residuals indicating that the

magnitude of the errors is small

bull Relatively small P-values for the F - and t -statistics showing that the overall model is better than

a simple summary with the mean and that the individual parameters are reliably different from zero

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 815

- 8 -

Here are several guidelines for including and excluding variables These guidelines are not

ironclad rules They typically involve choices at the margin that is between equations that arevery similar and seem equally useful

Guidelines for IncludingExcluding Variables in a Regression Model

1 Look at a variables t -value and its associated P-value If the P-value is above some accepted

significance level such as 005 this variable is a candidate for exclusion

2 It is a mathematical fact that

ndash If t -value lt 1 then se will decrease and adjusted 2r will increase if this variable is excluded

from the equation

ndash If t -value gt 1 the opposite will occur

Because of this some statisticians advocate excluding variables with t -values less than 1 andincluding variables with t -values greater than 1 However analysts who base the decision on

statistical significance at the usual 5 level as in guideline 1 typically exclude a variablefrom the equation unless its t -value is at least 2 (approximately) This latter approach is morestringent ndash fewer variables will be retained ndash but it is probably the more popular approach

3 When there is a group of variables that are in some sense logically related it is sometimes agood idea to include all of them or exclude all of them In this case their individual t -values

are less relevant Instead a partial F test can be used to make the includeexclude decision

4 Use economic theoretical or practical considerations to decide whether to include or excludevariables Some variables might really belong in an equation because of their theoretical

relationship with the response variable and their low t -values possibly the result of an

unlucky sample should not necessarily disqualify them from being in the equationSimilarly a variable that has no economic or physical relationship with the response variable

might have a significant t -value just by chance This does not necessarily mean that it should

be included in the equation

You should not agonize too much about whether to include or exclude a variable ldquoat the marginrdquo

If you decide to exclude a variable that doesnt add much explanatory power you get a somewhat

cleaner model and you probably wont see any dramatic shifts in pC 2r 2

adjr or es

On the other hand if you decide to keep such a variable in the model the model is less parsimonious

and you have one more variable to interpret but otherwise there is no real penalty for including it

In real applications there are often several equations that for all practical purposes are equallyuseful for describing the relationships or making predictions There are so many aspects of what

makes a model useful that human judgment is necessary to make a final choice For examplein addition to favoring explanatory variables that can be measured reliably you may want to

favor those that are less expensive to measure The statistician George Boc who had an

illustrious academic career at the University of Wisconsin is often quoted sayingldquoAll models are wrong but some models are usefulrdquo

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 915

- 9 -

Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the

relationship between a response variable and one or more explanatory variables

The major issues in model building are finding the proper functional form of the relationship and

selecting the explanatory variables to be included in the model

Many statistical packages provide some assistance by including automatic model-building optionsThese options estimate a series of regression models by successively adding or deleting variables

according to prescribed rules These rules can vary from package to package but usually the t test

for the slope or the partial F test is used and the corresponding P-value serves as a criterion to

determine whether variables are added or deleted The levels of significance 1α and 2α for

determining whether an explanatory variable should be entered into the model or removed from

the model are typically referred to as P-value to Enter and P-value to LeaveUsually by default P-value to Enter = 005 and P-value to Leave = 010

The four most common types of model-building procedures that statistical packages implement areforward selection backward elimination stepwise regression and best subsets regression

Today many businesses use these variable selection procedures as part of the research technique

called data mining which tries to identify significant statistical relationships in very large datasets that contain extremely large number of variables

The forward selection procedure begins with no explanatory variables in the model and successivelyadds variables one at a time until no remaining variables make a significant contribution

The forward selection procedure does not permit a variable to be removed from the model once it

has been entered The procedure stops if the P-value for each of the explanatory variables not in

the model is greater than the prescribed P-value to Enter

The backward elimination procedure begins with a model that includes all potential

explanatory variables It then deletes one explanatory variable at a time by comparing its P-valueto the prescribed P-value to Leave The backward elimination procedure does not permit avariable to be reentered once it has been removed The procedure stops when none of the

explanatory variables in the model have a P-value greater than P-value to Leave

The stepwise regression procedure is much like a forward procedure except that it also considers

possible deletions along the way Because of the nature of the stepwise regression procedure

an explanatory variable can enter the model at one step be removed at a subsequent stepand then enter the model at a later step The procedure stops when no explanatory variables can

be removed from or entered into the model

The best subsets regression procedure works by trying possible subsets from the list of possible

explanatory variables This procedure does not actually compute all possible regressions

There are ways to exclude models known to be worse than some already examined models

Typical computer output reports results for a collection of ldquobestrdquo models usually the two bestone-variable models the two best two-variable models the two best three-variable models and so on

The user can then select the best model based on such measures as pC 2r 2

adjr es

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1015

- 10 -

In most cases the final results of these four procedures are very similar However there is no guarantee

that they will all produce exactly the same final equation Deciding which estimated regressionequation to use remains a topic for discussion Ultimately the analystrsquos judgment must be applied

Excel does not come with any variable selection techniques built in StatTools can be used for

forward selection backward elimination and stepwise regression but it cannot perform the bestsubsets regression SAS and Minitab can perform all four techniques

Example 2

Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses

Currently the graphic artists at the station receive hourly pay for a significant number of hours

during which they are idle These hours are called standby hours The operations manager wants to

determine which factors most heavily affect standby hours of graphic artists Over a period of 26weeks he collected data concerning standby hours ( y) and four factors that he suspects are related

to the excessive number of standby hours the station is currently experiencing

1 x ndash the total number of staff present

2 x ndash remote hours

3 x ndash Dubner hours

4 x ndash total labor hours

The data are organized and stored in Standbyxlsx

Week Standby Total Staff Remote Dubner Total Labor

1 245 338 414 323 2001

2 177 333 598 340 2030M M M M M M

25 261 315 164 223 1839

26 232 331 270 272 1935

How to build a multiple regression model with the most appropriate mix of explanatory variables

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the

explanatory variables (Reminder2

11

j

jr

VIF minus

= )

This is always a good starting point for any multiple regression analysis It involves running

four regressions ndash one regression for each explanatory variable against the other x variables

The following table summarizes the results

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1115

- 11 -

Total Staff Remote Dubner Total Labor

and all other X and all other X and all other X and all other X

Multiple R 06437 04349 05610 07070

R Square 04143 01891 03147 04998

Adjusted R Square 03345 00786 02213 04316Standard Error 164715 1249392 575525 1144118

Observations 26 26 26 26

VIF 17074 12333 14592 19993

All the VIF values are relatively small ranging from a high of 19993 for the total labor hours to

a low of 12333 for remote hours Thus on the basis of the criteria that all VIF values should beless than 5 there is little evidence of collinearity among the set of explanatory variables

(b) Run forward selection, backward elimination, and stepwise regression and compare the results.

StatTools regression output from running the three procedures is shown below.
A significance level of 0.05 is used to enter a variable into the model or to delete a variable
from the model (that is, P-value to Enter = P-value to Leave = 0.05).

The correlations between the response variable and the explanatory variables are:

           Total Staff   Remote    Dubner    Total Labor
Standby    0.6050        -0.0953   -0.2443   0.4136

As the computer output shows, the forward selection and stepwise regression methods
produce the same results for these data. The first variable entered into the model is total staff,
the variable that correlates most highly with the response variable, standby hours
(r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it
in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable
chosen is the one that makes the largest contribution to the model, given that the first variable has
been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269
for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure
determines whether total staff is still an important contributing variable or whether it can be
eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05,
total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of
the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure
terminates with a model that includes the total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.


Forward Selection

Summary             Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                    0.6999       0.4899     0.4456              35.3873

ANOVA Table         Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained           2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained         23                   28802.0725       1252.2640

Regression Table    Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff         1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote              -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information    Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff         0.6050       0.3660     0.3396              38.6206             1
Remote              0.6999       0.4899     0.4456              35.3873             2

Stepwise Regression

Summary             Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                    0.6999       0.4899     0.4456              35.3873

ANOVA Table         Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained           2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained         23                   28802.0725       1252.2640

Regression Table    Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff         1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote              -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information    Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff         0.6050       0.3660     0.3396              38.6206             Enter
Remote              0.6999       0.4899     0.4456              35.3873             Enter


Backward Elimination

Summary             Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                    0.7894       0.6231     0.5513              31.8350

ANOVA Table         Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained           4                    35181.7937       8795.4484         8.6786    0.0003
Unexplained         21                   21282.8217       1013.4677

Regression Table    Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff         1.2456        0.4121           3.0229    0.0065    0.3887      2.1026
Remote              -0.1184       0.0543           -2.1798   0.0408    -0.2314     -0.0054
Dubner              -0.2971       0.1179           -2.5189   0.0199    -0.5423     -0.0518
Total Labor         0.1305        0.0593           2.2004    0.0391    0.0072      0.2539

Step Information    Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables       0.7894       0.6231     0.5513              31.8350
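
For readers who want to replicate this logic outside StatTools, here is a minimal sketch of a
P-value-based stepwise procedure in Python. It mirrors the description given earlier (forward
entry with possible deletions along the way) rather than StatTools' exact algorithm, and it
assumes statsmodels is available and that Standby.xlsx has the columns used above.

import pandas as pd
import statsmodels.api as sm

def stepwise(y, X, p_enter=0.05, p_leave=0.05):
    included = []
    while True:
        changed = False
        # Forward step: among excluded variables, find the one with the
        # smallest P-value when added to the current model.
        candidates = {}
        for c in (c for c in X.columns if c not in included):
            fit = sm.OLS(y, sm.add_constant(X[included + [c]])).fit()
            candidates[c] = fit.pvalues[c]
        if candidates:
            best = min(candidates, key=candidates.get)
            if candidates[best] < p_enter:
                included.append(best)
                changed = True
        # Backward step: drop the least significant included variable
        # if its P-value has risen above the leave threshold.
        if included:
            fit = sm.OLS(y, sm.add_constant(X[included])).fit()
            worst = fit.pvalues.drop("const").idxmax()
            if fit.pvalues[worst] > p_leave:
                included.remove(worst)
                changed = True
        if not changed:
            return included

df = pd.read_excel("Standby.xlsx")  # assumed file layout
X = df[["Total Staff", "Remote", "Dubner", "Total Labor"]]
print(stepwise(df["Standby"], X))   # expected: ['Total Staff', 'Remote']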

(c) Which of the two models suggested by the above procedures would you choose based on the
$C_p$ selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes
two explanatory variables: total staff ($x_1$) and remote hours ($x_2$). The backward elimination
procedure suggests the "full" model, with all explanatory variables included.

For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, $SSE(k) = 28802.0725$, $MSE(\text{full}) = 1013.4677$

$$C_p = \frac{SSE(k)}{MSE(\text{full})} - (n - 2k - 2) = \frac{28802.0725}{1013.4677} - (26 - 4 - 2) = 28.4193 - 20 = 8.4193$$

For the model suggested by the backward elimination procedure:
n = 26, k = 4, $SSE(k) = SSE(\text{full}) = 21282.8217$, $MSE(\text{full}) = 1013.4677$

$$C_p = \frac{SSE(k)}{MSE(\text{full})} - (n - 2k - 2) = \frac{21282.8217}{1013.4677} - (26 - 8 - 2) = 21 - 16 = 5$$

The model chosen by the forward selection and stepwise regression procedures has a $C_p$ value of
8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model.
For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and $C_p$ = 5.
Thus, according to the $C_p$ criterion, the model including all four variables is the better model.
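
The arithmetic above is easy to package as a small helper. This sketch simply restates the
formula used in this handout; the inputs are read off the regression output above.

def mallows_cp(sse_k, mse_full, n, k):
    # C_p = SSE(k) / MSE(full) - (n - 2k - 2)
    return sse_k / mse_full - (n - 2 * k - 2)

print(mallows_cp(28802.0725, 1013.4677, 26, 2))  # about 8.42 (total staff + remote)
print(mallows_cp(21282.8217, 1013.4677, 26, 4))  # about 5.00 (all four variables)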


(d) Below are the results from the best subsets regression procedure of all possible regression
models for the standby hours data. Which is the best model?

Model        k + 1   Cp      r^2      r^2_adj   s_e
X1           2       13.32   0.3660   0.3396    38.62
X2           2       33.21   0.0091   -0.0322   48.28
X3           2       30.39   0.0597   0.0205    47.03
X4           2       24.18   0.1710   0.1365    44.16
X1X2         3       8.42    0.4899   0.4456    35.39
X1X3         3       10.65   0.4499   0.4021    36.75
X1X4         3       14.80   0.3754   0.3211    39.16
X2X3         3       32.31   0.0612   -0.0205   48.01
X2X4         3       23.25   0.2238   0.1563    43.65
X3X4         3       11.82   0.4288   0.3791    37.45
X1X2X3       4       7.84    0.5362   0.4729    34.50
X1X2X4       4       9.34    0.5092   0.4423    35.49
X1X3X4       4       7.75    0.5378   0.4748    34.44
X2X3X4       4       12.14   0.4591   0.3853    37.26
X1X2X3X4     5       5.00    0.6231   0.5513    31.84

Because model building requires you to compare models with different numbers of explanatory
variables, the adjusted coefficient of determination $r^2_{adj}$ is more appropriate than $r^2$
(although sometimes it is a matter of preference). The adjusted $r^2$ reaches a maximum value
of 0.5513 when all four explanatory variables are included in the model. Therefore, using this
criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the $C_p$ selection criterion, because only the model
with all four explanatory variables considered has a $C_p$ value close to or below k + 1.

Note: Although it was not the case here, the $C_p$ statistic often provides several alternative
models for you to evaluate in greater depth. Moreover, the best model or models using the $C_p$
criterion might differ from the model selected using the adjusted $r^2$ and/or the models selected
using the three procedures discussed in (a) through (c).
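
A table like the one above can be reproduced by brute-force enumeration, since four candidate
variables give only 2^4 - 1 = 15 subsets. The following Python sketch (again assuming
Standby.xlsx and statsmodels; it is not the procedure that generated the table) fits every
subset and collects the four measures:

from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_excel("Standby.xlsx")  # assumed file layout
y = df["Standby"]
cols = ["Total Staff", "Remote", "Dubner", "Total Labor"]

n = len(df)
mse_full = sm.OLS(y, sm.add_constant(df[cols])).fit().mse_resid

rows = []
for k in range(1, len(cols) + 1):
    for subset in combinations(cols, k):
        fit = sm.OLS(y, sm.add_constant(df[list(subset)])).fit()
        rows.append({
            "Model": "".join("X%d" % (cols.index(c) + 1) for c in subset),
            "k + 1": k + 1,
            "Cp": fit.ssr / mse_full - (n - 2 * k - 2),   # Mallows' C_p
            "r2": fit.rsquared,
            "r2_adj": fit.rsquared_adj,
            "s_e": np.sqrt(fit.mse_resid),
        })

print(pd.DataFrame(rows).sort_values("Cp").to_string(index=False))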

(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables.
Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the
total labor hours reveals apparent patterns. In addition, a plot of the residuals versus the
predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only a moderate departure from normality (skewness = 0.54).
The plot of the residuals versus time shows no indication of autocorrelation in the residuals.


[Figure: residual diagnostics for the four-variable model: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals (residuals by week).]
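
These diagnostics can be reproduced with a few lines of matplotlib. The sketch below assumes
the same Standby.xlsx layout (including the Week column) and refits the full model; it is meant
as a template, not the software that produced the original plots.

import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

df = pd.read_excel("Standby.xlsx")  # assumed file layout
cols = ["Total Staff", "Remote", "Dubner", "Total Labor"]
fit = sm.OLS(df["Standby"], sm.add_constant(df[cols])).fit()
resid = fit.resid

fig, axes = plt.subplots(2, 4, figsize=(16, 7))
for ax, col in zip(axes[0], cols):              # residuals versus each x
    ax.scatter(df[col], resid)
    ax.set(title=col + " Residual Plot", xlabel=col, ylabel="Residuals")
axes[1, 0].scatter(fit.fittedvalues, resid)     # residuals versus fit
axes[1, 0].set(title="Residuals vs Fit", xlabel="Predicted Standby")
axes[1, 1].hist(resid, bins=8)                  # check normality
axes[1, 1].set(title="Histogram of Residuals")
axes[1, 2].plot(df["Week"], resid, marker="o")  # check autocorrelation
axes[1, 2].set(title="Time Series Plot of Residuals", xlabel="Week")
axes[1, 3].axis("off")                          # unused panel
plt.tight_layout()
plt.show()

print(resid.skew())  # roughly 0.54, per the discussion above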



8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 515

- 5 -

Regression Statistics
Multiple R            0.9966
R Square              0.9931
Adjusted R Square     0.9880
Standard Error      142.506
Observations         15

ANOVA
             df    SS             MS            F          Significance F
Regression    6    23451058.18    3908509.70    192.4607   0.0000
Residual      8      162464.75      20308.09
Total        14    23613522.93

                   Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept            642.8867        26.7059       24.0728   0.0000     581.3028     704.4706
Temperature           -6.9263         0.7531       -9.1969   0.0000      -8.6629      -5.1896
Insulation           -27.8825         3.5801       -7.7882   0.0001     -36.1383     -19.6268
Style                -84.6088        29.9956       -2.8207   0.0225    -153.7787     -15.4389
Temp*Insulation        0.1702         0.0886        1.9204   0.0911      -0.0342       0.3746
Temp*Style             0.6596         0.4617        1.4286   0.1910      -0.4051       1.7242
Insulation*Style       4.9870         3.5137        1.4193   0.1936      -3.1156      13.0895

To test whether the three interactions significantly improve the regression model, you use the partial F test. The null and alternative hypotheses are:

H0: β4 = β5 = β6 = 0 (There are no interactions among x1, x2, and x3.)
Ha: At least one of β4, β5, β6 is not zero. (x1 interacts with x2, and/or x1 interacts with x3, and/or x2 interacts with x3.)

From the full regression output (see above): SSE(full) = 162464.75, MSE(full) = 20308.09
From the reduced regression output (see part (a)): SSE(reduced) = 272832.00

The test statistic is:

F = [SSE(reduced) − SSE(full)] / (number of extra terms) / MSE(full)
  = [(272832.00 − 162464.75) / 3] / 20308.09 = 36789.08 / 20308.09 = 1.8115

df1 = k − j = 6 − 3 = 3, df2 = n − k − 1 = 15 − 6 − 1 = 8
P-value = FDIST(1.8115, 3, 8) = 0.2230
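The same computation is easy to reproduce outside Excel. Below is a minimal sketch in Python, assuming the SSE and MSE values read off the two ANOVA tables above; scipy's F distribution plays the role of Excel's FDIST.

    from scipy.stats import f

    sse_reduced = 272832.00  # SSE of the model without the interaction terms
    sse_full = 162464.75     # SSE of the model with the three interaction terms
    mse_full = 20308.09      # MSE of the full model
    extra_terms = 3          # k - j: number of interaction terms being tested
    df2 = 8                  # n - k - 1 = 15 - 6 - 1

    F = ((sse_reduced - sse_full) / extra_terms) / mse_full
    p_value = f.sf(F, extra_terms, df2)  # upper-tail area, like FDIST
    print(round(F, 4), round(p_value, 4))  # approximately 1.8115 and 0.2230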


Because of the large P-value, you conclude that the interactions do not make a significant contribution to the model, given that the model already includes temperature (x1), insulation (x2), and whether the house is ranch style (x3). Therefore, the multiple regression model using x1, x2, and x3 but no interaction terms is the better model.

If you rejected this null hypothesis, you would then test the contribution of each interaction separately in order to determine which interaction terms to include in the model.

Adjusted r²

Adding new explanatory variables will always keep the r² value the same or increase it; it can never decrease it. In general, adding explanatory variables to the model causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE. Because SSR = SST − SSE, when SSE becomes smaller, SSR becomes larger, causing r² = SSR/SST to increase. Therefore, if a variable is added to the model, r² usually becomes larger even if the variable added is not statistically significant. This can lead to "fishing expeditions" where you keep adding variables to the model, some of which have no conceptual relationship to the response variable, just to inflate the r² value.

To avoid overestimating the impact of adding an explanatory variable on the amount of variability explained by the estimated regression equation, many analysts prefer adjusting r² for the number of explanatory variables. The adjusted r² is defined as

r²_adj = 1 − (1 − r²)(n − 1)/(n − k − 1)

The adjusted r² imposes a "penalty" for each new term that is added to the model in an attempt to make models of different sizes (numbers of explanatory variables) comparable. It can decrease when unnecessary explanatory variables are added to the regression model. Therefore, it serves as an index that you can monitor. If you add variables and the adjusted r² decreases, the extra variables are essentially not pulling their weight and should probably be omitted.

For the full model of the Heating Oil Consumption example (with the interaction terms), n = 15, k = 6, and r² = 0.9931. Thus, the adjusted r² is

r²_adj = 1 − (1 − r²)(n − 1)/(n − k − 1) = 1 − (1 − 0.9931)(15 − 1)/(15 − 6 − 1) = 1 − (0.0069)(1.75) = 0.9880

The adjusted r² for the reduced model (without the interaction terms) is 0.9853.

The adjusted r² for the full model indicates too small an improvement in explaining the variation in the consumption of heating oil to justify keeping the interaction terms in the model, even if the partial F test were significant.
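For quick what-if comparisons, the formula is simple to script. Below is a minimal sketch in Python, checked against the full-model numbers above (n = 15, k = 6, r² = 0.9931); the small difference from the reported 0.9880 comes from rounding r² itself.

    def adjusted_r2(r2, n, k):
        # Penalize r-squared for the number of explanatory variables k
        return 1 - (1 - r2) * (n - 1) / (n - k - 1)

    print(round(adjusted_r2(0.9931, 15, 6), 4))  # 0.9879, i.e. 0.9880 up to rounding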

Note: It can happen that the value of r²_adj is negative. This is not a mistake but a result of a model that fits the data very poorly. In this case, some software systems set r²_adj equal to 0; Excel will print the actual value.


Cp Statistic

Another measure often used in the evaluation of competing regression models is the Cp statistic developed by Mallows. The formula for computing Cp is

Cp = SSE(k)/MSE(full) − (n − 2k − 2)

where
SSE(k) is the error sum of squares for a regression model that has k explanatory variables, k = 1, 2, …
MSE(full) is the mean square error for a regression model that has all explanatory variables included.

Theory says that if the value of Cp is large, then the mean square error of the fitted values is large, indicating either a poor fit, substantial bias in the fit, or both. In addition, if the value of Cp is much greater than k + 1, then there is a large bias component in the regression, usually indicating omission of an important variable. Therefore, when evaluating which regression is best, it is recommended that regressions with small Cp values and those with values near k + 1 be considered.

Although the Cp measure is highly recommended as a useful criterion in choosing between alternate regressions, keep in mind that the bias is measured with respect to the total group of variables provided by the researcher. This criterion cannot determine when the researcher has forgotten about some variable not included in the total group.
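Once SSE(k) and MSE(full) are read off the regression output, the statistic is one line of arithmetic. Below is a minimal sketch in Python; the demonstration values are the two-variable model numbers that reappear in Example 2 later in these notes.

    def mallows_cp(sse_k, mse_full, n, k):
        # Cp = SSE(k)/MSE(full) - (n - 2k - 2); compare against k + 1
        return sse_k / mse_full - (n - 2 * k - 2)

    print(round(mallows_cp(28802.0725, 1013.4677, 26, 2), 4))  # 8.4193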

Include/Exclude Decisions

Finding the best x's (or the best form of the x's) to include in a regression model is undoubtedly the most difficult part of any real regression analysis problem. You are always trying to get the best fit possible. The principle of parsimony suggests using the fewest explanatory variables that can predict the response variable adequately. Regression models with fewer explanatory variables are easier to interpret and are less likely to be affected by interaction or collinearity problems. On the other hand, more variables certainly increase r², and they usually reduce the standard error of estimate, se. This presents a trade-off, which is the heart of the challenge of selecting a good model.

The best regression models, in addition to satisfying the conditions of multiple regression, have:
• Relatively few explanatory variables
• Relatively high r² and r²_adj, indicating that much of the variability in y is accounted for by the regression model
• A small value of Cp (close to or less than k + 1)
• A relatively small value of se, the standard deviation of the residuals, indicating that the magnitude of the errors is small
• Relatively small P-values for the F- and t-statistics, showing that the overall model is better than a simple summary with the mean and that the individual parameters are reliably different from zero


Here are several guidelines for including and excluding variables. These guidelines are not ironclad rules. They typically involve choices at the margin, that is, between equations that are very similar and seem equally useful.

Guidelines for Including/Excluding Variables in a Regression Model

1. Look at a variable's t-value and its associated P-value. If the P-value is above some accepted significance level, such as 0.05, this variable is a candidate for exclusion.

2. It is a mathematical fact that:
– If |t-value| < 1, then se will decrease and adjusted r² will increase if this variable is excluded from the equation.
– If |t-value| > 1, the opposite will occur.
Because of this, some statisticians advocate excluding variables with t-values less than 1 and including variables with t-values greater than 1. However, analysts who base the decision on statistical significance at the usual 5% level, as in guideline 1, typically exclude a variable from the equation unless its t-value is at least 2 (approximately). This latter approach is more stringent – fewer variables will be retained – but it is probably the more popular approach.

3. When there is a group of variables that are in some sense logically related, it is sometimes a good idea to include all of them or exclude all of them. In this case, their individual t-values are less relevant. Instead, a partial F test can be used to make the include/exclude decision.

4. Use economic, theoretical, or practical considerations to decide whether to include or exclude variables. Some variables might really belong in an equation because of their theoretical relationship with the response variable, and their low t-values, possibly the result of an unlucky sample, should not necessarily disqualify them from being in the equation. Similarly, a variable that has no economic or physical relationship with the response variable might have a significant t-value just by chance. This does not necessarily mean that it should be included in the equation.

You should not agonize too much about whether to include or exclude a variable "at the margin". If you decide to exclude a variable that doesn't add much explanatory power, you get a somewhat cleaner model, and you probably won't see any dramatic shifts in Cp, r², r²_adj, or se. On the other hand, if you decide to keep such a variable in the model, the model is less parsimonious and you have one more variable to interpret, but otherwise there is no real penalty for including it.

In real applications, there are often several equations that, for all practical purposes, are equally useful for describing the relationships or making predictions. There are so many aspects of what makes a model useful that human judgment is necessary to make a final choice. For example, in addition to favoring explanatory variables that can be measured reliably, you may want to favor those that are less expensive to measure. The statistician George Box, who had an illustrious academic career at the University of Wisconsin, is often quoted saying: "All models are wrong, but some models are useful."


Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the relationship between a response variable and one or more explanatory variables. The major issues in model building are finding the proper functional form of the relationship and selecting the explanatory variables to be included in the model.

Many statistical packages provide some assistance by including automatic model-building options. These options estimate a series of regression models by successively adding or deleting variables according to prescribed rules. These rules can vary from package to package, but usually the t test for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to determine whether variables are added or deleted. The levels of significance α1 and α2 for determining whether an explanatory variable should be entered into the model or removed from the model are typically referred to as P-value to Enter and P-value to Leave. Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.

The four most common types of model-building procedures that statistical packages implement are forward selection, backward elimination, stepwise regression, and best subsets regression. Today, many businesses use these variable selection procedures as part of the research technique called data mining, which tries to identify significant statistical relationships in very large data sets that contain an extremely large number of variables.

The forward selection procedure begins with no explanatory variables in the model and successively adds variables one at a time until no remaining variables make a significant contribution. The forward selection procedure does not permit a variable to be removed from the model once it has been entered. The procedure stops if the P-value for each of the explanatory variables not in the model is greater than the prescribed P-value to Enter.

The backward elimination procedure begins with a model that includes all potential explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value to the prescribed P-value to Leave. The backward elimination procedure does not permit a variable to be reentered once it has been removed. The procedure stops when none of the explanatory variables in the model has a P-value greater than P-value to Leave.

The stepwise regression procedure is much like a forward procedure, except that it also considers possible deletions along the way. Because of the nature of the stepwise regression procedure, an explanatory variable can enter the model at one step, be removed at a subsequent step, and then enter the model at a later step. The procedure stops when no explanatory variables can be removed from or entered into the model.

The best subsets regression procedure works by trying possible subsets from the list of possible explanatory variables. This procedure does not actually compute all possible regressions; there are ways to exclude models known to be worse than some already examined models. Typical computer output reports results for a collection of "best" models, usually the two best one-variable models, the two best two-variable models, the two best three-variable models, and so on. The user can then select the best model based on such measures as Cp, r², r²_adj, and se.


In most cases, the final results of these four procedures are very similar. However, there is no guarantee that they will all produce exactly the same final equation. Deciding which estimated regression equation to use remains a topic for discussion; ultimately, the analyst's judgment must be applied.

Excel does not come with any variable selection techniques built in. StatTools can be used for forward selection, backward elimination, and stepwise regression, but it cannot perform the best subsets regression. SAS and Minitab can perform all four techniques.
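Since the packages drive these procedures through menus, it may help to see the underlying logic spelled out. Below is a minimal sketch of forward selection by P-value in Python with statsmodels; it mirrors the procedure described above rather than any package's exact implementation, and it assumes a pandas DataFrame X of candidate variables and a Series y (the function name is ours).

    import statsmodels.api as sm

    def forward_selection(X, y, p_to_enter=0.05):
        selected = []
        remaining = list(X.columns)
        while remaining:
            # P-value of each candidate when added to the current model
            pvals = {}
            for var in remaining:
                fit = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
                pvals[var] = fit.pvalues[var]
            best = min(pvals, key=pvals.get)
            if pvals[best] >= p_to_enter:
                break  # no remaining variable qualifies for entry
            selected.append(best)
            remaining.remove(best)
        return selected

Backward elimination and stepwise regression follow the same pattern, deleting variables (or also reconsidering deletions at each step) by comparing their P-values to P-value to Leave.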

Example 2
Standby Hours

The operations manager at the WTT-TV station is looking for ways to reduce labor expenses. Currently, the graphic artists at the station receive hourly pay for a significant number of hours during which they are idle. These hours are called standby hours. The operations manager wants to determine which factors most heavily affect standby hours of graphic artists. Over a period of 26 weeks, he collected data concerning standby hours (y) and four factors that he suspects are related to the excessive number of standby hours the station is currently experiencing:

x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours

The data are organized and stored in Standby.xlsx:

Week   Standby   Total Staff   Remote   Dubner   Total Labor
  1      245         338         414      323        2001
  2      177         333         598      340        2030
  …        …           …           …        …           …
 25      261         315         164      223        1839
 26      232         331         270      272        1935

How can you build a multiple regression model with the most appropriate mix of explanatory variables?
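Throughout the solution that follows, it helps to have the data loaded in Python. A minimal sketch with pandas, assuming Standby.xlsx carries the column headers shown above (the exact headers are an assumption, as are the names y and X):

    import pandas as pd

    data = pd.read_excel("Standby.xlsx")
    y = data["Standby"]
    X = data[["Total Staff", "Remote", "Dubner", "Total Labor"]]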

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the explanatory variables. (Reminder: VIFj = 1/(1 − rj²).)

This is always a good starting point for any multiple regression analysis. It involves running four regressions – one regression for each explanatory variable against the other x variables. The following table summarizes the results:


                     Total Staff       Remote            Dubner            Total Labor
                     and all other X   and all other X   and all other X   and all other X
Multiple R              0.6437            0.4349            0.5610            0.7070
R Square                0.4143            0.1891            0.3147            0.4998
Adjusted R Square       0.3345            0.0786            0.2213            0.4316
Standard Error         16.4715          124.9392           57.5525          114.4118
Observations           26                26                26                26
VIF                     1.7074            1.2333            1.4592            1.9993

All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among the set of explanatory variables.
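The same computation can be scripted. Below is a minimal sketch in Python, assuming the Standby.xlsx columns used above; statsmodels runs each auxiliary regression of one x on the others internally.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    data = pd.read_excel("Standby.xlsx")
    X = sm.add_constant(data[["Total Staff", "Remote", "Dubner", "Total Labor"]])
    for i, name in enumerate(X.columns[1:], start=1):  # skip the constant column
        print(name, round(variance_inflation_factor(X.values, i), 4))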

(b) Run forward selection, backward elimination, and stepwise regression and compare the results.

StatTools regression output from running the three procedures is shown below. A significance level of 0.05 is used to enter a variable into the model or to delete a variable from the model (that is, P-value to Enter = P-value to Leave = 0.05).

The correlations between the response variable and the explanatory variables are:

           Total Staff   Remote    Dubner    Total Labor
Standby      0.6050     –0.0953   –0.2443     0.4136

As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable chosen is the one that makes the largest contribution to the model, given that the first variable has been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.


Forward Selection

Summary      Multiple R   R-Square   Adjusted R-Square   StdErr of Estimate
               0.6999      0.4899         0.4456              35.3873

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained              2              27662.5429       13831.2714      11.0450    0.0004
Unexplained           23              28802.0725        1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.6748      116.4802       -2.8389    0.0093   -571.6325    -89.7171
Total Staff            1.7649        0.3790        4.6562    0.0001      0.9808      2.5490
Remote                -0.1390        0.0588       -2.3635    0.0269     -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StdErr of Estimate   Entry Number
Total Staff          0.6050      0.3660         0.3396              38.6206              1
Remote               0.6999      0.4899         0.4456              35.3873              2

Stepwise Regression

Summary      Multiple R   R-Square   Adjusted R-Square   StdErr of Estimate
               0.6999      0.4899         0.4456              35.3873

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained              2              27662.5429       13831.2714      11.0450    0.0004
Unexplained           23              28802.0725        1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.6748      116.4802       -2.8389    0.0093   -571.6325    -89.7171
Total Staff            1.7649        0.3790        4.6562    0.0001      0.9808      2.5490
Remote                -0.1390        0.0588       -2.3635    0.0269     -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StdErr of Estimate   Enter or Exit
Total Staff          0.6050      0.3660         0.3396              38.6206             Enter
Remote               0.6999      0.4899         0.4456              35.3873             Enter
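The final two-variable model is easy to refit directly for checking. Below is a minimal sketch with statsmodels, assuming the Standby.xlsx columns used earlier; the results should match the regression table above.

    import pandas as pd
    import statsmodels.api as sm

    data = pd.read_excel("Standby.xlsx")
    X = sm.add_constant(data[["Total Staff", "Remote"]])
    fit = sm.OLS(data["Standby"], X).fit()
    print(fit.params)    # const ≈ -330.6748, Total Staff ≈ 1.7649, Remote ≈ -0.1390
    print(fit.rsquared)  # ≈ 0.4899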


Backward Elimination

Summary      Multiple R   R-Square   Adjusted R-Square   StdErr of Estimate
               0.7894      0.6231         0.5513              31.8350

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained              4              35181.7937        8795.4484       8.6786    0.0003
Unexplained           21              21282.8217        1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant            -330.8318      110.8954       -2.9833    0.0071   -561.4514   -100.2123
Total Staff            1.2456        0.4121        3.0229    0.0065      0.3887      2.1026
Remote                -0.1184        0.0543       -2.1798    0.0408     -0.2314     -0.0054
Dubner                -0.2971        0.1179       -2.5189    0.0199     -0.5423     -0.0518
Total Labor            0.1305        0.0593        2.2004    0.0391      0.0072      0.2539

Step Information   Multiple R   R-Square   Adjusted R-Square   StdErr of Estimate   Exit Number
All Variables        0.7894      0.6231         0.5513              31.8350

(c) Which of the two models suggested by the above procedures would you choose based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables, total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model – with all explanatory variables included.

For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) − (n − 2k − 2) = 28802.0725/1013.4677 − (26 − 4 − 2) = 28.4193 − 20 = 8.4193

For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) − (n − 2k − 2) = 21282.8217/1013.4677 − (26 − 8 − 2) = 21 − 16 = 5

The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.
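Both values can be verified with plain arithmetic; note that for the model containing all candidate variables, SSE(k)/MSE(full) = n − k − 1, so Cp = k + 1 holds by construction, which is why Cp = 5 here is automatic.

    mse_full = 1013.4677
    cp_two_var = 28802.0725 / mse_full - (26 - 2 * 2 - 2)   # 8.4193
    cp_four_var = 21282.8217 / mse_full - (26 - 2 * 4 - 2)  # 5.0
    print(round(cp_two_var, 4), round(cp_four_var, 4))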


(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model        k + 1     Cp       r²       r²_adj     se
X1             2      13.32   0.3660    0.3396    38.62
X2             2      33.21   0.0091   –0.0322    48.28
X3             2      30.39   0.0597    0.0205    47.03
X4             2      24.18   0.1710    0.1365    44.16
X1X2           3       8.42   0.4899    0.4456    35.39
X1X3           3      10.65   0.4499    0.4021    36.75
X1X4           3      14.80   0.3754    0.3211    39.16
X2X3           3      32.31   0.0612   –0.0205    48.01
X2X4           3      23.25   0.2238    0.1563    43.65
X3X4           3      11.82   0.4288    0.3791    37.45
X1X2X3         4       7.84   0.5362    0.4729    34.50
X1X2X4         4       9.34   0.5092    0.4423    35.49
X1X3X4         4       7.75   0.5378    0.4748    34.44
X2X3X4         4      12.14   0.4591    0.3853    37.26
X1X2X3X4       5       5.00   0.6231    0.5513    31.84

Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination r²_adj is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables considered has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).
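With only four candidate variables, the 15 possible models can simply be enumerated, which is what the table above summarizes. Below is a minimal sketch in Python, assuming the Standby.xlsx columns used earlier; real best subsets code prunes the search rather than fitting every model.

    from itertools import combinations

    import pandas as pd
    import statsmodels.api as sm

    data = pd.read_excel("Standby.xlsx")
    y = data["Standby"]
    cols = ["Total Staff", "Remote", "Dubner", "Total Labor"]

    full = sm.OLS(y, sm.add_constant(data[cols])).fit()
    mse_full = full.mse_resid
    n = len(y)

    for r in range(1, len(cols) + 1):
        for subset in combinations(cols, r):
            fit = sm.OLS(y, sm.add_constant(data[list(subset)])).fit()
            cp = fit.ssr / mse_full - (n - 2 * r - 2)
            # subset, Cp, adjusted r-squared, standard error of estimate
            print(subset, round(cp, 2), round(fit.rsquared_adj, 4), round(fit.mse_resid ** 0.5, 2))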

(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots – versus the total staff, the remote hours, the Dubner hours, and the total labor hours – reveals apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance. The histogram of the residuals indicates only moderate departure from normality (skewness = 0.54), and the plot of the residuals versus time shows no indication of autocorrelation in the residuals.


[Figure: residual plots for the four-variable model – Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals by week.]
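These diagnostics can be reproduced outside Excel. Below is a minimal sketch with matplotlib and statsmodels, assuming the Standby.xlsx columns used earlier.

    import matplotlib.pyplot as plt
    import pandas as pd
    import statsmodels.api as sm

    data = pd.read_excel("Standby.xlsx")
    cols = ["Total Staff", "Remote", "Dubner", "Total Labor"]
    fit = sm.OLS(data["Standby"], sm.add_constant(data[cols])).fit()

    fig, axes = plt.subplots(2, 4, figsize=(16, 7))
    for ax, col in zip(axes[0], cols):        # residuals versus each explanatory variable
        ax.scatter(data[col], fit.resid)
        ax.set_title(col + " Residual Plot")
    axes[1, 0].scatter(fit.fittedvalues, fit.resid)       # residuals versus predicted standby hours
    axes[1, 0].set_title("Residuals vs Fit")
    axes[1, 1].hist(fit.resid, bins=8)                    # check for departure from normality
    axes[1, 1].set_title("Histogram of Residuals")
    axes[1, 2].plot(data["Week"], fit.resid, marker="o")  # check for autocorrelation over time
    axes[1, 2].set_title("Time Series Plot of Residuals")
    axes[1, 3].axis("off")
    plt.tight_layout()
    plt.show()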

Page 6: 14_Building_Regression_Models_Part1.pdf

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 615

- 6 -

Because of the large P-value you conclude that the interactions do not make a significant

contribution to the model given that the model already includes temperature 1 x insulation 2 x

and whether the house is ranch style 3 x Therefore the multiple regression model using 21 x x

and 3 x but no interaction terms is the better model

If you rejected this null hypothesis you would then test the contribution of each interactionseparately in order to determine which interaction terms to include in the model

Adjusted r2

Adding new explanatory variables will always keep the 2r value the same or increase it

it can never decrease it In general adding explanatory variables to the model causes the

prediction errors to become smaller thus reducing the sum of squares due to error SSEBecause SSR = SST ndash SSE when SSE becomes smaller SSR becomes larger causing

SST

SSRr =2 to increase Therefore if a variable is added to the model 2r usually becomes larger

even if the variable added is not statistically significant This can lead to ldquofishing expeditionsrdquo

where you keep adding variables to the model some of which have no conceptual relationship to

the response variable just to inflate the 2r value

To avoid overestimating the impact of adding an explanatory variable on the amount of

variability explained by the estimated regression equation many analysts prefer adjusting 2r for

the number of explanatory variables The adjusted r 2 is defined as

( )1

111 22

minusminus

minusminusminus=

k n

nr r adj

The adjusted 2r imposes a ldquopenaltyrdquo for each new term that is added to the model in an attemptto make models of different sizes (numbers of explanatory variables) comparable It can decrease

when unnecessary explanatory variables are added to the regression model Therefore it serves

as an index that you can monitor If you add variables and the adjusted 2r decreases the extra

variables are essentially not pulling their weight and should probably be omitted

For the full model of the Heating Oil Consumption example (with the interaction terms)

n = 15 k = 6 and 2r = 09931 Thus the adjusted 2r is

( ) ( ) 98800)751)(00690(11615

1159931011

1

111 22

=minus=

minusminus

minusminusminus=

minusminus

minusminusminus=

k n

nr r adj

The adjusted 2r for the reduced model (without the interaction terms) is 09853

The adjusted 2r for the full model indicates too small an improvement in explaining the variation

in the consumption of heating oil to justify keeping the interaction terms in the model even if thepartial F test were significant

Note It can happen that the value of2

adjr is negative This is not a mistake but a result of a model that fit

the data very poorly In this case some software systems set2

adjr equal to 0 Excel will print the actual va

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 715

- 7 -

pC Statistic

Another measure often used in the evaluation of competing regression models is the pC statistic

developed by Mallows The formula for computing pC is

)22()full(

)(minusminusminus= k n

MSE

k SSE C

p

where

)(k SSE is the error sum of squares for a regression model that has k explanatory variables k = 1 2

MSE (full) is the mean square error for a regression model that has all explanatory variables included

Theory says that if the value of pC is large then the mean square error of the fitted values is large

indicating either a poor fit substantial bias in the fit or both In addition if the value of pC is

much greater than k + 1 then there is a large bias component in the regression usually indicating

omission of an important variable Therefore when evaluating which regression is best it is

recommended that regressions with small pC values and those with values near k + 1 be considered

Although the pC measure is highly recommended as a useful criterion in choosing between

alternate regressions keep in mind that the bias is measured with respect to the total group of

variables provided by the researcher This criterion cannot determine when the researcher hasforgotten about some variable not included in the total group

IncludeExclude Decisions

Finding the best xrsquos (or the best form of the xrsquos) to include in a regression model is undoubtedly

the most difficult part of any real regression analysis problem You are always trying to get the

best fit possible The principle of parsimony suggests using the fewest number of explanatoryvariables that can predict the response variable adequately Regression models with fewer

explanatory variables are easier to interpret and are less likely to be affected by interaction or

collinearity problems On the other hand more variables certainly increase 2r and they usually

reduce the standard error of estimate se This presents a trade-off which is the heart of thechallenge of selecting a good model

The best regression models in addition to satisfying the conditions of multiple regression have

bull Relatively few explanatory variables

bull Relatively high 2r and2

adjr indicating that much of the variability in y is accounted for by

the regression model

bull A small value of pC (close to or less than k + 1)

bull A relatively small value of es the standard deviation of the residuals indicating that the

magnitude of the errors is small

bull Relatively small P-values for the F - and t -statistics showing that the overall model is better than

a simple summary with the mean and that the individual parameters are reliably different from zero

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 815

- 8 -

Here are several guidelines for including and excluding variables These guidelines are not

ironclad rules They typically involve choices at the margin that is between equations that arevery similar and seem equally useful

Guidelines for IncludingExcluding Variables in a Regression Model

1 Look at a variables t -value and its associated P-value If the P-value is above some accepted

significance level such as 005 this variable is a candidate for exclusion

2 It is a mathematical fact that

ndash If t -value lt 1 then se will decrease and adjusted 2r will increase if this variable is excluded

from the equation

ndash If t -value gt 1 the opposite will occur

Because of this some statisticians advocate excluding variables with t -values less than 1 andincluding variables with t -values greater than 1 However analysts who base the decision on

statistical significance at the usual 5 level as in guideline 1 typically exclude a variablefrom the equation unless its t -value is at least 2 (approximately) This latter approach is morestringent ndash fewer variables will be retained ndash but it is probably the more popular approach

3 When there is a group of variables that are in some sense logically related it is sometimes agood idea to include all of them or exclude all of them In this case their individual t -values

are less relevant Instead a partial F test can be used to make the includeexclude decision

4 Use economic theoretical or practical considerations to decide whether to include or excludevariables Some variables might really belong in an equation because of their theoretical

relationship with the response variable and their low t -values possibly the result of an

unlucky sample should not necessarily disqualify them from being in the equationSimilarly a variable that has no economic or physical relationship with the response variable

might have a significant t -value just by chance This does not necessarily mean that it should

be included in the equation

You should not agonize too much about whether to include or exclude a variable ldquoat the marginrdquo

If you decide to exclude a variable that doesnt add much explanatory power you get a somewhat

cleaner model and you probably wont see any dramatic shifts in pC 2r 2

adjr or es

On the other hand if you decide to keep such a variable in the model the model is less parsimonious

and you have one more variable to interpret but otherwise there is no real penalty for including it

In real applications there are often several equations that for all practical purposes are equallyuseful for describing the relationships or making predictions There are so many aspects of what

makes a model useful that human judgment is necessary to make a final choice For examplein addition to favoring explanatory variables that can be measured reliably you may want to

favor those that are less expensive to measure The statistician George Boc who had an

illustrious academic career at the University of Wisconsin is often quoted sayingldquoAll models are wrong but some models are usefulrdquo

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 915

- 9 -

Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the

relationship between a response variable and one or more explanatory variables

The major issues in model building are finding the proper functional form of the relationship and

selecting the explanatory variables to be included in the model

Many statistical packages provide some assistance by including automatic model-building optionsThese options estimate a series of regression models by successively adding or deleting variables

according to prescribed rules These rules can vary from package to package but usually the t test

for the slope or the partial F test is used and the corresponding P-value serves as a criterion to

determine whether variables are added or deleted The levels of significance 1α and 2α for

determining whether an explanatory variable should be entered into the model or removed from

the model are typically referred to as P-value to Enter and P-value to LeaveUsually by default P-value to Enter = 005 and P-value to Leave = 010

The four most common types of model-building procedures that statistical packages implement areforward selection backward elimination stepwise regression and best subsets regression

Today many businesses use these variable selection procedures as part of the research technique

called data mining which tries to identify significant statistical relationships in very large datasets that contain extremely large number of variables

The forward selection procedure begins with no explanatory variables in the model and successivelyadds variables one at a time until no remaining variables make a significant contribution

The forward selection procedure does not permit a variable to be removed from the model once it

has been entered The procedure stops if the P-value for each of the explanatory variables not in

the model is greater than the prescribed P-value to Enter

The backward elimination procedure begins with a model that includes all potential

explanatory variables It then deletes one explanatory variable at a time by comparing its P-valueto the prescribed P-value to Leave The backward elimination procedure does not permit avariable to be reentered once it has been removed The procedure stops when none of the

explanatory variables in the model have a P-value greater than P-value to Leave

The stepwise regression procedure is much like a forward procedure except that it also considers

possible deletions along the way Because of the nature of the stepwise regression procedure

an explanatory variable can enter the model at one step be removed at a subsequent stepand then enter the model at a later step The procedure stops when no explanatory variables can

be removed from or entered into the model

The best subsets regression procedure works by trying possible subsets from the list of possible

explanatory variables This procedure does not actually compute all possible regressions

There are ways to exclude models known to be worse than some already examined models

Typical computer output reports results for a collection of ldquobestrdquo models usually the two bestone-variable models the two best two-variable models the two best three-variable models and so on

The user can then select the best model based on such measures as pC 2r 2

adjr es

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1015

- 10 -

In most cases the final results of these four procedures are very similar However there is no guarantee

that they will all produce exactly the same final equation Deciding which estimated regressionequation to use remains a topic for discussion Ultimately the analystrsquos judgment must be applied

Excel does not come with any variable selection techniques built in StatTools can be used for

forward selection backward elimination and stepwise regression but it cannot perform the bestsubsets regression SAS and Minitab can perform all four techniques

Example 2

Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses

Currently the graphic artists at the station receive hourly pay for a significant number of hours

during which they are idle These hours are called standby hours The operations manager wants to

determine which factors most heavily affect standby hours of graphic artists Over a period of 26weeks he collected data concerning standby hours ( y) and four factors that he suspects are related

to the excessive number of standby hours the station is currently experiencing

1 x ndash the total number of staff present

2 x ndash remote hours

3 x ndash Dubner hours

4 x ndash total labor hours

The data are organized and stored in Standbyxlsx

Week Standby Total Staff Remote Dubner Total Labor

1 245 338 414 323 2001

2 177 333 598 340 2030M M M M M M

25 261 315 164 223 1839

26 232 331 270 272 1935

How to build a multiple regression model with the most appropriate mix of explanatory variables

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the

explanatory variables (Reminder2

11

j

jr

VIF minus

= )

This is always a good starting point for any multiple regression analysis It involves running

four regressions ndash one regression for each explanatory variable against the other x variables

The following table summarizes the results

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1115

- 11 -

Total Staff Remote Dubner Total Labor

and all other X and all other X and all other X and all other X

Multiple R 06437 04349 05610 07070

R Square 04143 01891 03147 04998

Adjusted R Square 03345 00786 02213 04316Standard Error 164715 1249392 575525 1144118

Observations 26 26 26 26

VIF 17074 12333 14592 19993

All the VIF values are relatively small ranging from a high of 19993 for the total labor hours to

a low of 12333 for remote hours Thus on the basis of the criteria that all VIF values should beless than 5 there is little evidence of collinearity among the set of explanatory variables

(b) Run forward selection backward elimination and stepwise regression and compare the results

StatTools reression output from running the three procedures is shown on the next two pagesA significance level of 005 is used to enter a variable into the model or to delete a variablefrom the model (that is P-value to Enter = P-value to Leave = 005)

The correlations between the response variable and the explanatory variables are

Total Staff Remote Dubner Total Labor

Standby 06050 ndash 00953 ndash 02443 04136

As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable chosen is the one that makes the largest contribution to the model, given that the first variable has been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes the total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all four explanatory variables.
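Before the output itself, here is a minimal sketch of forward selection by P-value (P-value to Enter = 0.05). It illustrates the entry rule the procedures automate; it is not the StatTools implementation, and the Standby.xlsx layout is the same assumption as before:

    # Illustrative forward selection: at each step, add the candidate with
    # the smallest t-test P-value, provided that P-value is below 0.05.
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_excel("Standby.xlsx")
    y = df["Standby"]
    candidates = ["Total Staff", "Remote", "Dubner", "Total Labor"]
    selected = []

    while True:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # P-value each remaining variable would have if added to the model
        pvals = {}
        for var in remaining:
            X = sm.add_constant(df[selected + [var]])
            pvals[var] = sm.OLS(y, X).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= 0.05:        # no remaining variable qualifies
            break
        selected.append(best)

    print("Selected:", selected)       # expected here: Total Staff, then Remote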


Forward Selection

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained        23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Entry Number
Total Staff        0.6050       0.3660     0.3396              38.6206             1
Remote             0.6999       0.4899     0.4456              35.3873             2

Stepwise Regression

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.6999       0.4899     0.4456              35.3873

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          2                    27662.5429       13831.2714        11.0450   0.0004
Unexplained        23                   28802.0725       1252.2640

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.6748     116.4802         -2.8389   0.0093    -571.6325   -89.7171
Total Staff        1.7649        0.3790           4.6562    0.0001    0.9808      2.5490
Remote             -0.1390       0.0588           -2.3635   0.0269    -0.2606     -0.0173

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Enter or Exit
Total Staff        0.6050       0.3660     0.3396              38.6206             Enter
Remote             0.6999       0.4899     0.4456              35.3873             Enter


Backward Elimination

Summary            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate
                   0.7894       0.6231     0.5513              31.8350

ANOVA Table        Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio   p-Value
Explained          4                    35181.7937       8795.4484         8.6786    0.0003
Unexplained        21                   21282.8217       1013.4677

Regression Table   Coefficient   Standard Error   t-Value   p-Value   Lower 95%   Upper 95%
Constant           -330.8318     110.8954         -2.9833   0.0071    -561.4514   -100.2123
Total Staff        1.2456        0.4121           3.0229    0.0065    0.3887      2.1026
Remote             -0.1184       0.0543           -2.1798   0.0408    -0.2314     -0.0054
Dubner             -0.2971       0.1179           -2.5189   0.0199    -0.5423     -0.0518
Total Labor        0.1305        0.0593           2.2004    0.0391    0.0072      0.2539

Step Information   Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
All Variables      0.7894       0.6231     0.5513              31.8350

(c) Which of the two models suggested by the above procedures would you choose based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model – with all explanatory variables included.

For the model suggested by the forward selection and stepwise regression procedures:

n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2)
   = 28802.0725/1013.4677 - (26 - 2(2) - 2)
   = 28.4193 - 20 = 8.4193

For the model suggested by the backward elimination procedure:

n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - (n - 2k - 2)
   = 21282.8217/1013.4677 - (26 - 2(4) - 2)
   = 21.0000 - 16 = 5.0000

The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.
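The Cp arithmetic above is easy to script. A minimal sketch, using the SSE and MSE values from the StatTools output:

    # Mallows' Cp as defined earlier: Cp = SSE(k)/MSE(full) - (n - 2k - 2)
    def mallows_cp(sse_k, mse_full, n, k):
        return sse_k / mse_full - (n - 2 * k - 2)

    n, mse_full = 26, 1013.4677
    print(mallows_cp(28802.0725, mse_full, n, k=2))   # 8.4193 (forward/stepwise model)
    print(mallows_cp(21282.8217, mse_full, n, k=4))   # 5.0000 (full model)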


(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model        k + 1   Cp      r²       adj. r²   se
X1           2       13.32   0.3660   0.3396    38.62
X2           2       33.21   0.0091   -0.0322   48.28
X3           2       30.39   0.0597   0.0205    47.03
X4           2       24.18   0.1710   0.1365    44.16
X1X2         3       8.42    0.4899   0.4456    35.39
X1X3         3       10.65   0.4499   0.4021    36.75
X1X4         3       14.80   0.3754   0.3211    39.16
X2X3         3       32.31   0.0612   -0.0205   48.01
X2X4         3       23.25   0.2238   0.1563    43.65
X3X4         3       11.82   0.4288   0.3791    37.45
X1X2X3       4       7.84    0.5362   0.4729    34.50
X1X2X4       4       9.34    0.5092   0.4423    35.49
X1X3X4       4       7.75    0.5378   0.4748    34.44
X2X3X4       4       12.14   0.4591   0.3853    37.26
X1X2X3X4     5       5.00    0.6231   0.5513    31.84

Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination, adj. r², is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models chosen using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).
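With only four candidate variables there are just 2^4 - 1 = 15 models, so best subsets regression can be emulated by brute force. A minimal sketch, under the same Standby.xlsx layout assumption as before:

    # Enumerate every non-empty subset of the predictors and report
    # Cp, r-squared, adjusted r-squared, and the standard error of estimate.
    from itertools import combinations
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_excel("Standby.xlsx")
    y = df["Standby"]
    predictors = ["Total Staff", "Remote", "Dubner", "Total Labor"]
    n = len(df)

    full = sm.OLS(y, sm.add_constant(df[predictors])).fit()
    mse_full = full.ssr / (n - len(predictors) - 1)    # MSE(full)

    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            fit = sm.OLS(y, sm.add_constant(df[list(subset)])).fit()
            cp = fit.ssr / mse_full - (n - 2 * k - 2)
            se = (fit.ssr / (n - k - 1)) ** 0.5        # StErr of estimate
            print(subset, round(cp, 2), round(fit.rsquared, 4),
                  round(fit.rsquared_adj, 4), round(se, 2))

Each printed row should reproduce the corresponding row of the table above.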

(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots (versus the total staff, the remote hours, the Dubner hours, and the total labor hours) reveals apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only a moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.

[Residual plots omitted in this text version: Total Staff Residual Plot; Remote Residual Plot; Dubner Residual Plot; Total Labor Residual Plot; Residuals vs Fit (residuals versus predicted standby); Histogram of Residuals; Time Series Plot of Residuals (residuals by week). Residuals are on the vertical axis of each plot.]
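These plots are straightforward to recreate. A minimal matplotlib sketch, again assuming the Standby.xlsx layout used earlier (the Week column name is an assumption):

    # Recreate the residual plots for the full four-variable model.
    import pandas as pd
    import statsmodels.api as sm
    import matplotlib.pyplot as plt

    df = pd.read_excel("Standby.xlsx")
    predictors = ["Total Staff", "Remote", "Dubner", "Total Labor"]
    fit = sm.OLS(df["Standby"], sm.add_constant(df[predictors])).fit()

    fig, axes = plt.subplots(4, 2, figsize=(10, 14))
    for ax, var in zip(axes.flat, predictors):        # residuals vs each x
        ax.scatter(df[var], fit.resid)
        ax.set_title(var + " Residual Plot")
    axes[2, 0].scatter(fit.fittedvalues, fit.resid)   # residuals vs fit
    axes[2, 0].set_title("Residuals vs Fit")
    axes[2, 1].hist(fit.resid, bins=8)                # histogram of residuals
    axes[2, 1].set_title("Histogram of Residuals")
    axes[3, 0].plot(df["Week"], fit.resid, marker="o")
    axes[3, 0].set_title("Time Series Plot of Residuals")
    axes[3, 1].axis("off")
    plt.tight_layout()
    plt.show()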



8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1015

- 10 -

In most cases the final results of these four procedures are very similar However there is no guarantee

that they will all produce exactly the same final equation Deciding which estimated regressionequation to use remains a topic for discussion Ultimately the analystrsquos judgment must be applied

Excel does not come with any variable selection techniques built in StatTools can be used for

forward selection backward elimination and stepwise regression but it cannot perform the bestsubsets regression SAS and Minitab can perform all four techniques

Example 2

Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses

Currently the graphic artists at the station receive hourly pay for a significant number of hours

during which they are idle These hours are called standby hours The operations manager wants to

determine which factors most heavily affect standby hours of graphic artists Over a period of 26weeks he collected data concerning standby hours ( y) and four factors that he suspects are related

to the excessive number of standby hours the station is currently experiencing

1 x ndash the total number of staff present

2 x ndash remote hours

3 x ndash Dubner hours

4 x ndash total labor hours

The data are organized and stored in Standbyxlsx

Week Standby Total Staff Remote Dubner Total Labor

1 245 338 414 323 2001

2 177 333 598 340 2030M M M M M M

25 261 315 164 223 1839

26 232 331 270 272 1935

How to build a multiple regression model with the most appropriate mix of explanatory variables

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the

explanatory variables (Reminder2

11

j

jr

VIF minus

= )

This is always a good starting point for any multiple regression analysis It involves running

four regressions ndash one regression for each explanatory variable against the other x variables

The following table summarizes the results

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1115

- 11 -

Total Staff Remote Dubner Total Labor

and all other X and all other X and all other X and all other X

Multiple R 06437 04349 05610 07070

R Square 04143 01891 03147 04998

Adjusted R Square 03345 00786 02213 04316Standard Error 164715 1249392 575525 1144118

Observations 26 26 26 26

VIF 17074 12333 14592 19993

All the VIF values are relatively small ranging from a high of 19993 for the total labor hours to

a low of 12333 for remote hours Thus on the basis of the criteria that all VIF values should beless than 5 there is little evidence of collinearity among the set of explanatory variables

(b) Run forward selection backward elimination and stepwise regression and compare the results

StatTools reression output from running the three procedures is shown on the next two pagesA significance level of 005 is used to enter a variable into the model or to delete a variablefrom the model (that is P-value to Enter = P-value to Leave = 005)

The correlations between the response variable and the explanatory variables are

Total Staff Remote Dubner Total Labor

Standby 06050 ndash 00953 ndash 02443 04136

As the computer output shows the forward selection and stepwise regression methods

produce the same results for these data The first variable entered into the model is total staffthe variable that correlates most highly with the response variable standby hours

(r = 06050) The P-value for the t -test of total staff is 00011 (Note StatTools does not show it

in the final output) Because it is less than 005 total staff is included in the regression model

The next step involves selecting a second independent variable for the model The second variable

chosen is one that makes the largest contribution to the model given that the first variable hasbeen selected For this model the second variable is remote hours Because the P-value of 00269

for remote hours is less than 005 remote hours is included in the regression model

After the remote hours variable is entered into the model the stepwise regression proceduredetermines whether total staff is still an important contributing variable or whether it can be

eliminated from the model Because the P-value of 00001 for total staff is less than 005total staff remains in the regression model

The next step involves selecting a third independent variable for the model Because none of

the other variables meets the 005 criterion for entry into the model the stepwise procedureterminates with a model that includes total staff present and the number of remote hours

The backward elimination procedure produces a model that includes all explanatory variables

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1215

- 12 -

Forward Selection

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092

983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089

983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088

983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983154983161

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154

983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983089

983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983090

Stepwise Regression

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092

983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089

983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088

983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983141983154 983151983154

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983109983160983145983156

983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983109983150983156983141983154

983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983109983150983156983141983154

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1315

- 13 -

Backward Elimination983117983157983148983156983145983152983148983141

983122983085983123983153983157983137983154983141983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983092 983091983093983089983096983089983086983095983097983091983095 983096983095983097983093983086983092983092983096983092 983096983086983094983095983096983094 983088983086983088983088983088983091

983125983150983141983160983152983148983137983145983150983141983140 983090983089 983090983089983090983096983090983086983096983090983089983095 983089983088983089983091983086983092983094983095983095

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983096983091983089983096 983089983089983088983086983096983097983093983092 983085983090983086983097983096983091983091 983088983086983088983088983095983089 983085983093983094983089983086983092983093983089983092 983085983089983088983088983086983090983089983090983091

983124983151983156983137983148 983123983156983137983142983142 983089983086983090983092983093983094 983088983086983092983089983090983089 983091983086983088983090983090983097 983088983086983088983088983094983093 983088983086983091983096983096983095 983090983086983089983088983090983094

983122983141983149983151983156983141 983085983088983086983089983089983096983092 983088983086983088983093983092983091 983085983090983086983089983095983097983096 983088983086983088983092983088983096 983085983088983086983090983091983089983092 983085983088983086983088983088983093983092

983108983157983138983150983141983154 983085983088983086983090983097983095983089 983088983086983089983089983095983097 983085983090983086983093983089983096983097 983088983086983088983089983097983097 983085983088983086983093983092983090983091 983085983088983086983088983093983089983096

983124983151983156983137983148 983116983137983138983151983154 983088983086983089983091983088983093 983088983086983088983093983097983091 983090983086983090983088983088983092 983088983086983088983091983097983089 983088983086983088983088983095983090 983088983086983090983093983091983097

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983160983145983156

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154

983105983148983148 983126983137983154983145983137983138983148983141983155 983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088

(c) Which of the two models suggested by the above procedures would you choose based on the

pC selection criterion

The model suggested by the forward selection and stepwise regression procedures includes

two explanatory variables total staff ( 1 x ) and remote hours ( 2 x ) The backward elimination

procedure suggests the ldquofullrdquo model ndash with all explanatory variables included

For the model suggested by the forward selection and stepwise regression procedures

n = 26 k = 2)(k SSE = 288020725

MSE (full) = 10134677

4193820419328)2426(46771013

072528802)22(

)full(

)(=minus=minusminusminus=minusminusminus= k n

MSE

k SSE C p

For the model suggested by the backward elimination proceduren = 26 k = 4

)(k SSE = SSE (full) = 212828217

MSE (full) = 10134677

51621)2826(46771013

821721282)22()full(

)(=minus=minusminusminus=minusminusminus= k n MSE

k SSE C p

The model chosen by the forward selection and stepwise regression procedures has a pC value of

84193 which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model

For the model chosen by the backward elimination procedure k + 1 = 4 + 1 = 5 and pC = 5

Thus according to the pC criterion the model including all four variables is the better model

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1415

- 14 -

(d) Below are the results from the best subsets regression procedure of all possible regression

models for the standby hours data Which is the best model

Model k + 1 pC 2r 2

adjr es

X1 2 1332 03660 03396 3862

X2 2 3321 00091 ndash 00322 4828X3 2 3039 00597 00205 4703

X4 2 2418 01710 01365 4416

X1X2 3 842 04899 04456 3539

X1X3 3 1065 04499 04021 3675

X1X4 3 1480 03754 03211 3916

X2X3 3 3231 00612 ndash 00205 4801

X2X4 3 2325 02238 01563 4365

X3X4 3 1182 04288 03791 3745

X1X2X3 4 784 05362 04729 3450

X1X2X4 4 934 05092 04423 3549

X1X3X4 4 775 05378 04748 3444X2X3X4 4 1214 04591 03853 3726

X1X2X3X4 5 500 06231 05513 3184

Because model building requires you to compare models with different numbers of explanatory

variables the adjusted coefficient of determination2

adjr is more appropriate than 2r

(although sometimes it is a matter of preference) The adjusted 2r reaches a maximum valueof 05513 when all four explanatory variables are included in the model Therefore using this

criterion the best model is the model with all four explanatory variables

The same conclusion is reached when using the pC selection criterion because only the modelwith all four explanatory variables considered has a pC value close to or below k + 1

Note Although it was not the case here the pC statistic often provides several alternative

models for you to evaluate in greater depth Moreover the best model or models using the pC

criterion might differ from the model selected using the adjusted 2r andor the models selected

using the three procedures discussed in (a) through (c)

(e) Perform a residual analysis to evaluate the regression assumptions for the best model

The best model turned out to be the model containing all four explanatory variablesOn the next page are the plots for the residual analysis of this model

None of the residual plots versus the total staff the remote hours the Dubner hours and thetotal labor hours reveal apparent patterns In addition a plot of the residuals versus the

predicted values of y does not show any patterns or evidence of unequal variance

The histogram of the residuals indicates only moderate departure from normality (skewness = 054)The plot of the residuals versus time shows no indication of autocorrelation in the residuals

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1515

- 15 -

983085983094983088

983085983089983088

983092983088

983090983096983088 983091983088983088 983091983090983088 983091983092983088 983091983094983088 983091983096983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983124983151983156983137983148 983123983156983137983142983142

983124983151983156983137983148 983123983156983137983142983142 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983093983088 983091983093983088 983093983093983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983122983141983149983151983156983141

983122983141983149983151983156983141 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983090983088983088 983091983088983088 983092983088983088 983093983088983088 983122 983141 983155 983145 983140 983157

983137 983148 983155

983108983157983138983150983141983154

983108983157983138983150983141983154 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983094983088983088 983089983096983088983088 983090983088983088983088 983090983090983088983088 983122 983141 983155 983145 983140 983157

983137 983148 983155

983124983151983156983137983148 983116983137983138983151983154

983124983151983156983137983148 983116983137983138983151983154 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983088983088 983089983093983088 983090983088983088 983090983093983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983120983154983141983140983145983139983156983141983140 983123983156983137983150983140983138983161

983122983141983155983145983140983157983137983148983155 983158983155 983110983145983156

983088

983090

983092

983094

983096

983089983088

983085 983092 983090 983086

983096 983091

983085 983090 983092 983086

983091 983092

983085 983093 983086

983096 983093

983089 983090 983086

983094 983091

983091 983089 983086

983089 983090

983092 983097 983086

983094 983089

983110 983154 983141 983153 983157 983141 983150 983139 983161

983112983145983155983156983151983143983154983137983149 983151983142 983122983141983155983145983140983157983137983148983155

983085983094983088

983085983089983088

983092983088

983089 983090 983091 983092 983093 983094 983095 983096 983097 983089983088 983089983089 983089983090 983089983091 983089983092 983089983093 983089983094 983089983095 983089983096 983089983097 983090983088 983090983089 983090983090 983090983091 983090983092 983090983093 983090983094 983122 983141 983155 983145 983140 983157 983137 983148 983155

983127983141983141983147

983124983145983149983141 983123983141983154983145983141983155 983120983148983151983156 983151983142 983122983141983155983145983140983157983137983148983155

Page 8: 14_Building_Regression_Models_Part1.pdf

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 815

- 8 -

Here are several guidelines for including and excluding variables These guidelines are not

ironclad rules They typically involve choices at the margin that is between equations that arevery similar and seem equally useful

Guidelines for IncludingExcluding Variables in a Regression Model

1 Look at a variables t -value and its associated P-value If the P-value is above some accepted

significance level such as 005 this variable is a candidate for exclusion

2 It is a mathematical fact that

ndash If t -value lt 1 then se will decrease and adjusted 2r will increase if this variable is excluded

from the equation

ndash If t -value gt 1 the opposite will occur

Because of this some statisticians advocate excluding variables with t -values less than 1 andincluding variables with t -values greater than 1 However analysts who base the decision on

statistical significance at the usual 5 level as in guideline 1 typically exclude a variablefrom the equation unless its t -value is at least 2 (approximately) This latter approach is morestringent ndash fewer variables will be retained ndash but it is probably the more popular approach

3 When there is a group of variables that are in some sense logically related it is sometimes agood idea to include all of them or exclude all of them In this case their individual t -values

are less relevant Instead a partial F test can be used to make the includeexclude decision

4 Use economic theoretical or practical considerations to decide whether to include or excludevariables Some variables might really belong in an equation because of their theoretical

relationship with the response variable and their low t -values possibly the result of an

unlucky sample should not necessarily disqualify them from being in the equationSimilarly a variable that has no economic or physical relationship with the response variable

might have a significant t -value just by chance This does not necessarily mean that it should

be included in the equation

You should not agonize too much about whether to include or exclude a variable ldquoat the marginrdquo

If you decide to exclude a variable that doesnt add much explanatory power you get a somewhat

cleaner model and you probably wont see any dramatic shifts in pC 2r 2

adjr or es

On the other hand if you decide to keep such a variable in the model the model is less parsimonious

and you have one more variable to interpret but otherwise there is no real penalty for including it

In real applications there are often several equations that for all practical purposes are equallyuseful for describing the relationships or making predictions There are so many aspects of what

makes a model useful that human judgment is necessary to make a final choice For examplein addition to favoring explanatory variables that can be measured reliably you may want to

favor those that are less expensive to measure The statistician George Boc who had an

illustrious academic career at the University of Wisconsin is often quoted sayingldquoAll models are wrong but some models are usefulrdquo

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 915

- 9 -

Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the

relationship between a response variable and one or more explanatory variables

The major issues in model building are finding the proper functional form of the relationship and

selecting the explanatory variables to be included in the model

Many statistical packages provide some assistance by including automatic model-building options. These options estimate a series of regression models by successively adding or deleting variables according to prescribed rules. These rules can vary from package to package, but usually the t test for the slope or the partial F test is used, and the corresponding P-value serves as a criterion to determine whether variables are added or deleted. The levels of significance α1 and α2 for determining whether an explanatory variable should be entered into the model or removed from the model are typically referred to as P-value to Enter and P-value to Leave. Usually, by default, P-value to Enter = 0.05 and P-value to Leave = 0.10.

The four most common types of model-building procedures that statistical packages implement are forward selection, backward elimination, stepwise regression, and best subsets regression. Today many businesses use these variable selection procedures as part of the research technique called data mining, which tries to identify significant statistical relationships in very large data sets that contain an extremely large number of variables.

The forward selection procedure begins with no explanatory variables in the model and successively adds variables one at a time until no remaining variables make a significant contribution. The forward selection procedure does not permit a variable to be removed from the model once it has been entered. The procedure stops when the P-value for each of the explanatory variables not in the model is greater than the prescribed P-value to Enter.
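For concreteness, here is a minimal sketch of the forward selection loop in Python with statsmodels (the packages named above implement this internally; a DataFrame X of candidate explanatory variables and a response Series y are assumed to be defined):

import statsmodels.api as sm

def forward_selection(X, y, p_enter=0.05):
    # Start with no variables; add the best candidate while one qualifies.
    selected, candidates = [], list(X.columns)
    while candidates:
        # P-value of each candidate's t-test when added to the current model
        pvals = {v: sm.OLS(y, sm.add_constant(X[selected + [v]])).fit().pvalues[v]
                 for v in candidates}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_enter:
            break                      # no remaining variable qualifies
        selected.append(best)          # entered variables are never removed
        candidates.remove(best)
    return selected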

The backward elimination procedure begins with a model that includes all potential explanatory variables. It then deletes one explanatory variable at a time by comparing its P-value to the prescribed P-value to Leave. The backward elimination procedure does not permit a variable to be reentered once it has been removed. The procedure stops when none of the explanatory variables in the model have a P-value greater than P-value to Leave.
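A matching sketch of backward elimination, under the same assumptions (X, y defined as before):

import statsmodels.api as sm

def backward_elimination(X, y, p_leave=0.05):
    # Start with all variables; drop the worst one while it fails the test.
    selected = list(X.columns)
    while selected:
        pvals = sm.OLS(y, sm.add_constant(X[selected])).fit().pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= p_leave:
            break                      # every remaining variable qualifies
        selected.remove(worst)         # removed variables are never reentered
    return selected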

The stepwise regression procedure is much like the forward procedure, except that it also considers possible deletions along the way. Because of the nature of the stepwise regression procedure, an explanatory variable can enter the model at one step, be removed at a subsequent step, and then enter the model at a later step. The procedure stops when no explanatory variables can be removed from or entered into the model.
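Stepwise regression can be sketched as the forward loop above with a deletion check after each entry; this is a simplified illustration, not the exact algorithm any particular package uses:

import statsmodels.api as sm

def stepwise(X, y, p_enter=0.05, p_leave=0.05):
    selected = []
    while True:
        changed = False
        # Entry step: add the most significant remaining candidate, if any.
        remaining = [v for v in X.columns if v not in selected]
        pvals = {v: sm.OLS(y, sm.add_constant(X[selected + [v]])).fit().pvalues[v]
                 for v in remaining}
        if pvals and min(pvals.values()) < p_enter:
            selected.append(min(pvals, key=pvals.get))
            changed = True
        # Deletion step: drop an entered variable that no longer qualifies.
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            worst = fit.pvalues.drop("const").idxmax()
            if fit.pvalues[worst] > p_leave:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected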

The best subsets regression procedure works by trying possible subsets from the list of possible explanatory variables. This procedure does not actually compute all possible regressions; there are ways to exclude models known to be worse than some already examined models. Typical computer output reports results for a collection of "best" models, usually the two best one-variable models, the two best two-variable models, the two best three-variable models, and so on. The user can then select the best model based on such measures as C_p, r^2, r^2_adj, and s_e.
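For a handful of candidate variables, a brute-force version of best subsets is easy to write (real implementations prune subsets rather than fitting them all; C_p here is computed from the full model's MSE, as in the example later in these notes):

import pandas as pd
import statsmodels.api as sm
from itertools import combinations

def best_subsets(X, y):
    n = len(y)
    mse_full = sm.OLS(y, sm.add_constant(X)).fit().mse_resid   # MSE(full)
    rows = []
    for k in range(1, len(X.columns) + 1):
        for subset in combinations(X.columns, k):
            fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
            cp = fit.ssr / mse_full - (n - 2 * (k + 1))        # C_p statistic
            rows.append((",".join(subset), k + 1, cp, fit.rsquared,
                         fit.rsquared_adj, fit.mse_resid ** 0.5))
    return pd.DataFrame(rows, columns=["Model", "k + 1", "Cp", "r2", "r2_adj", "s_e"])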


In most cases the final results of these four procedures are very similar. However, there is no guarantee that they will all produce exactly the same final equation. Deciding which estimated regression equation to use remains a topic for discussion; ultimately, the analyst's judgment must be applied. Excel does not come with any variable selection techniques built in. StatTools can be used for forward selection, backward elimination, and stepwise regression, but it cannot perform best subsets regression. SAS and Minitab can perform all four techniques.

Example 2

Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses. Currently the graphic artists at the station receive hourly pay for a significant number of hours during which they are idle. These hours are called standby hours. The operations manager wants to determine which factors most heavily affect standby hours of graphic artists. Over a period of 26 weeks he collected data concerning standby hours (y) and four factors that he suspects are related to the excessive number of standby hours the station is currently experiencing:

x1 – the total number of staff present
x2 – remote hours
x3 – Dubner hours
x4 – total labor hours

The data are organized and stored in Standby.xlsx:

Week    Standby    Total Staff    Remote    Dubner    Total Labor
   1        245            338       414       323           2001
   2        177            333       598       340           2030
 ...        ...            ...       ...       ...            ...
  25        261            315       164       223           1839
  26        232            331       270       272           1935

How can we build a multiple regression model with the most appropriate mix of explanatory variables?

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the explanatory variables. (Reminder: VIF_j = 1/(1 - r_j^2).)

This is always a good starting point for any multiple regression analysis. It involves running four regressions – one regression for each explanatory variable against the other x variables. The following table summarizes the results:


                     Total Staff       Remote            Dubner            Total Labor
                     and all other X   and all other X   and all other X   and all other X
Multiple R               0.6437          0.4349            0.5610              0.7070
R Square                 0.4143          0.1891            0.3147              0.4998
Adjusted R Square        0.3345          0.0786            0.2213              0.4316
Standard Error          16.4715        124.9392           57.5525            114.4118
Observations                 26              26                26                  26
VIF                      1.7074          1.2333            1.4592              1.9993

All the VIF values are relatively small, ranging from a high of 1.9993 for the total labor hours to a low of 1.2333 for remote hours. Thus, on the basis of the criterion that all VIF values should be less than 5, there is little evidence of collinearity among the set of explanatory variables.
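A sketch of the same computation in Python, assuming the file and column names shown in the data table above (statsmodels also provides variance_inflation_factor in statsmodels.stats.outliers_influence):

import pandas as pd
import statsmodels.api as sm

data = pd.read_excel("Standby.xlsx")
X = data[["Total Staff", "Remote", "Dubner", "Total Labor"]]

for col in X.columns:
    others = sm.add_constant(X.drop(columns=col))
    r2_j = sm.OLS(X[col], others).fit().rsquared     # regress x_j on the other x's
    print(col, round(1 / (1 - r2_j), 4))             # VIF_j = 1/(1 - r_j^2)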

(b) Run forward selection, backward elimination, and stepwise regression and compare the results.

StatTools regression output from running the three procedures is shown below. A significance level of 0.05 is used to enter a variable into the model or to delete a variable from the model (that is, P-value to Enter = P-value to Leave = 0.05).

The correlations between the response variable and the explanatory variables are:

            Total Staff    Remote    Dubner    Total Labor
Standby          0.6050   -0.0953   -0.2443        0.4136

As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable chosen is the one that makes the largest contribution to the model, given that the first variable has been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes the total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.


Forward Selection

Summary             Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                        0.6999      0.4899               0.4456              35.3873

ANOVA Table         Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained                            2        27662.5429         13831.2714    11.0450     0.0004
Unexplained                         23        28802.0725          1252.2640

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant              -330.6748          116.4802    -2.8389     0.0093    -571.6325     -89.7171
Total Staff              1.7649            0.3790     4.6562     0.0001       0.9808       2.5490
Remote                  -0.1390            0.0588    -2.3635     0.0269      -0.2606      -0.0173

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Entry Number
Total Staff             0.6050      0.3660               0.3396              38.6206               1
Remote                  0.6999      0.4899               0.4456              35.3873               2

Stepwise Regression

Summary             Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                        0.6999      0.4899               0.4456              35.3873

ANOVA Table         Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained                            2        27662.5429         13831.2714    11.0450     0.0004
Unexplained                         23        28802.0725          1252.2640

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant              -330.6748          116.4802    -2.8389     0.0093    -571.6325     -89.7171
Total Staff              1.7649            0.3790     4.6562     0.0001       0.9808       2.5490
Remote                  -0.1390            0.0588    -2.3635     0.0269      -0.2606      -0.0173

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Enter or Exit
Total Staff             0.6050      0.3660               0.3396              38.6206            Enter
Remote                  0.6999      0.4899               0.4456              35.3873            Enter


Backward Elimination

Summary             Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                        0.7894      0.6231               0.5513              31.8350

ANOVA Table         Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained                            4        35181.7937          8795.4484     8.6786     0.0003
Unexplained                         21        21282.8217          1013.4677

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant              -330.8318          110.8954    -2.9833     0.0071    -561.4514    -100.2123
Total Staff              1.2456            0.4121     3.0229     0.0065       0.3887       2.1026
Remote                  -0.1184            0.0543    -2.1798     0.0408      -0.2314      -0.0054
Dubner                  -0.2971            0.1179    -2.5189     0.0199      -0.5423      -0.0518
Total Labor              0.1305            0.0593     2.2004     0.0391       0.0072       0.2539

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Exit Number
All Variables           0.7894      0.6231               0.5513              31.8350

(c) Which of the two models suggested by the above procedures would you choose based on the C_p selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model – with all explanatory variables included.

For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

C_p = SSE(k)/MSE(full) - [n - 2(k + 1)] = 28802.0725/1013.4677 - (26 - 4 - 2) = 28.4193 - 20 = 8.4193

For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

C_p = SSE(k)/MSE(full) - [n - 2(k + 1)] = 21282.8217/1013.4677 - (26 - 8 - 2) = 21 - 16 = 5

The model chosen by the forward selection and stepwise regression procedures has a C_p value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and C_p = 5. Thus, according to the C_p criterion, the model including all four variables is the better model.
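These two values are easy to verify numerically (numbers taken from the ANOVA tables above):

n, mse_full = 26, 1013.4677
print(28802.0725 / mse_full - (n - 2 * (2 + 1)))   # about 8.4193 (two-variable model)
print(21282.8217 / mse_full - (n - 2 * (4 + 1)))   # 5.0 (full model)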


(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model          k + 1     C_p      r^2     r^2_adj     s_e
X1                 2    13.32    0.3660     0.3396    38.62
X2                 2    33.21    0.0091    -0.0322    48.28
X3                 2    30.39    0.0597     0.0205    47.03
X4                 2    24.18    0.1710     0.1365    44.16
X1X2               3     8.42    0.4899     0.4456    35.39
X1X3               3    10.65    0.4499     0.4021    36.75
X1X4               3    14.80    0.3754     0.3211    39.16
X2X3               3    32.31    0.0612    -0.0205    48.01
X2X4               3    23.25    0.2238     0.1563    43.65
X3X4               3    11.82    0.4288     0.3791    37.45
X1X2X3             4     7.84    0.5362     0.4729    34.50
X1X2X4             4     9.34    0.5092     0.4423    35.49
X1X3X4             4     7.75    0.5378     0.4748    34.44
X2X3X4             4    12.14    0.4591     0.3853    37.26
X1X2X3X4           5     5.00    0.6231     0.5513    31.84

Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination r^2_adj is more appropriate than r^2 (although sometimes it is a matter of preference). The adjusted r^2 reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the C_p selection criterion, because only the model with all four explanatory variables has a C_p value close to or below k + 1.

Note: Although it was not the case here, the C_p statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models according to the C_p criterion might differ from the model selected using the adjusted r^2 and/or the models selected using the three procedures discussed in (a) through (c).

(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. The plots for the residual analysis of this model are summarized below.

None of the residual plots – versus the total staff, the remote hours, the Dubner hours, and the total labor hours – reveals apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance. The histogram of the residuals indicates only a moderate departure from normality (skewness = 0.54), and the plot of the residuals versus time shows no indication of autocorrelation in the residuals.


[Figures, not reproduced here: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals against predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals by week.]
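A sketch that recreates these plots with matplotlib, reusing the data layout assumed in the VIF sketch above (the histogram is left out to keep it short; fit.resid.hist() produces it):

import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

data = pd.read_excel("Standby.xlsx")
X = data[["Total Staff", "Remote", "Dubner", "Total Labor"]]
y = data["Standby"]
fit = sm.OLS(y, sm.add_constant(X)).fit()          # full four-variable model

fig, axes = plt.subplots(2, 3, figsize=(12, 6))
for ax, col in zip(axes.flat, X.columns):              # one plot per predictor
    ax.scatter(X[col], fit.resid)
    ax.set(title=col + " Residual Plot", xlabel=col, ylabel="Residuals")
axes.flat[4].scatter(fit.fittedvalues, fit.resid)      # residuals vs fit
axes.flat[4].set(title="Residuals vs Fit", xlabel="Predicted Standby")
axes.flat[5].plot(data["Week"], fit.resid, marker="o") # time series of residuals
axes.flat[5].set(title="Time Series Plot of Residuals", xlabel="Week")
plt.tight_layout()
plt.show()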

Page 9: 14_Building_Regression_Models_Part1.pdf

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 915

- 9 -

Variable Selection Procedures

Model building is the process of developing an estimated regression equation that describes the

relationship between a response variable and one or more explanatory variables

The major issues in model building are finding the proper functional form of the relationship and

selecting the explanatory variables to be included in the model

Many statistical packages provide some assistance by including automatic model-building optionsThese options estimate a series of regression models by successively adding or deleting variables

according to prescribed rules These rules can vary from package to package but usually the t test

for the slope or the partial F test is used and the corresponding P-value serves as a criterion to

determine whether variables are added or deleted The levels of significance 1α and 2α for

determining whether an explanatory variable should be entered into the model or removed from

the model are typically referred to as P-value to Enter and P-value to LeaveUsually by default P-value to Enter = 005 and P-value to Leave = 010

The four most common types of model-building procedures that statistical packages implement areforward selection backward elimination stepwise regression and best subsets regression

Today many businesses use these variable selection procedures as part of the research technique

called data mining which tries to identify significant statistical relationships in very large datasets that contain extremely large number of variables

The forward selection procedure begins with no explanatory variables in the model and successivelyadds variables one at a time until no remaining variables make a significant contribution

The forward selection procedure does not permit a variable to be removed from the model once it

has been entered The procedure stops if the P-value for each of the explanatory variables not in

the model is greater than the prescribed P-value to Enter

The backward elimination procedure begins with a model that includes all potential

explanatory variables It then deletes one explanatory variable at a time by comparing its P-valueto the prescribed P-value to Leave The backward elimination procedure does not permit avariable to be reentered once it has been removed The procedure stops when none of the

explanatory variables in the model have a P-value greater than P-value to Leave

The stepwise regression procedure is much like a forward procedure except that it also considers

possible deletions along the way Because of the nature of the stepwise regression procedure

an explanatory variable can enter the model at one step be removed at a subsequent stepand then enter the model at a later step The procedure stops when no explanatory variables can

be removed from or entered into the model

The best subsets regression procedure works by trying possible subsets from the list of possible

explanatory variables This procedure does not actually compute all possible regressions

There are ways to exclude models known to be worse than some already examined models

Typical computer output reports results for a collection of ldquobestrdquo models usually the two bestone-variable models the two best two-variable models the two best three-variable models and so on

The user can then select the best model based on such measures as pC 2r 2

adjr es

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1015

- 10 -

In most cases the final results of these four procedures are very similar However there is no guarantee

that they will all produce exactly the same final equation Deciding which estimated regressionequation to use remains a topic for discussion Ultimately the analystrsquos judgment must be applied

Excel does not come with any variable selection techniques built in StatTools can be used for

forward selection backward elimination and stepwise regression but it cannot perform the bestsubsets regression SAS and Minitab can perform all four techniques

Example 2

Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses

Currently the graphic artists at the station receive hourly pay for a significant number of hours

during which they are idle These hours are called standby hours The operations manager wants to

determine which factors most heavily affect standby hours of graphic artists Over a period of 26weeks he collected data concerning standby hours ( y) and four factors that he suspects are related

to the excessive number of standby hours the station is currently experiencing

1 x ndash the total number of staff present

2 x ndash remote hours

3 x ndash Dubner hours

4 x ndash total labor hours

The data are organized and stored in Standbyxlsx

Week Standby Total Staff Remote Dubner Total Labor

1 245 338 414 323 2001

2 177 333 598 340 2030M M M M M M

25 261 315 164 223 1839

26 232 331 270 272 1935

How to build a multiple regression model with the most appropriate mix of explanatory variables

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the

explanatory variables (Reminder2

11

j

jr

VIF minus

= )

This is always a good starting point for any multiple regression analysis It involves running

four regressions ndash one regression for each explanatory variable against the other x variables

The following table summarizes the results

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1115

- 11 -

Total Staff Remote Dubner Total Labor

and all other X and all other X and all other X and all other X

Multiple R 06437 04349 05610 07070

R Square 04143 01891 03147 04998

Adjusted R Square 03345 00786 02213 04316Standard Error 164715 1249392 575525 1144118

Observations 26 26 26 26

VIF 17074 12333 14592 19993

All the VIF values are relatively small ranging from a high of 19993 for the total labor hours to

a low of 12333 for remote hours Thus on the basis of the criteria that all VIF values should beless than 5 there is little evidence of collinearity among the set of explanatory variables

(b) Run forward selection backward elimination and stepwise regression and compare the results

StatTools reression output from running the three procedures is shown on the next two pagesA significance level of 005 is used to enter a variable into the model or to delete a variablefrom the model (that is P-value to Enter = P-value to Leave = 005)

The correlations between the response variable and the explanatory variables are

Total Staff Remote Dubner Total Labor

Standby 06050 ndash 00953 ndash 02443 04136

As the computer output shows the forward selection and stepwise regression methods

produce the same results for these data The first variable entered into the model is total staffthe variable that correlates most highly with the response variable standby hours

(r = 06050) The P-value for the t -test of total staff is 00011 (Note StatTools does not show it

in the final output) Because it is less than 005 total staff is included in the regression model

The next step involves selecting a second independent variable for the model The second variable

chosen is one that makes the largest contribution to the model given that the first variable hasbeen selected For this model the second variable is remote hours Because the P-value of 00269

for remote hours is less than 005 remote hours is included in the regression model

After the remote hours variable is entered into the model the stepwise regression proceduredetermines whether total staff is still an important contributing variable or whether it can be

eliminated from the model Because the P-value of 00001 for total staff is less than 005total staff remains in the regression model

The next step involves selecting a third independent variable for the model Because none of

the other variables meets the 005 criterion for entry into the model the stepwise procedureterminates with a model that includes total staff present and the number of remote hours

The backward elimination procedure produces a model that includes all explanatory variables

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1215

- 12 -

Forward Selection

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092

983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089

983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088

983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983154983161

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154

983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983089

983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983090

Stepwise Regression

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092

983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089

983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088

983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983141983154 983151983154

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983109983160983145983156

983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983109983150983156983141983154

983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983109983150983156983141983154

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1315

- 13 -

Backward Elimination983117983157983148983156983145983152983148983141

983122983085983123983153983157983137983154983141983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983092 983091983093983089983096983089983086983095983097983091983095 983096983095983097983093983086983092983092983096983092 983096983086983094983095983096983094 983088983086983088983088983088983091

983125983150983141983160983152983148983137983145983150983141983140 983090983089 983090983089983090983096983090983086983096983090983089983095 983089983088983089983091983086983092983094983095983095

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983096983091983089983096 983089983089983088983086983096983097983093983092 983085983090983086983097983096983091983091 983088983086983088983088983095983089 983085983093983094983089983086983092983093983089983092 983085983089983088983088983086983090983089983090983091

983124983151983156983137983148 983123983156983137983142983142 983089983086983090983092983093983094 983088983086983092983089983090983089 983091983086983088983090983090983097 983088983086983088983088983094983093 983088983086983091983096983096983095 983090983086983089983088983090983094

983122983141983149983151983156983141 983085983088983086983089983089983096983092 983088983086983088983093983092983091 983085983090983086983089983095983097983096 983088983086983088983092983088983096 983085983088983086983090983091983089983092 983085983088983086983088983088983093983092

983108983157983138983150983141983154 983085983088983086983090983097983095983089 983088983086983089983089983095983097 983085983090983086983093983089983096983097 983088983086983088983089983097983097 983085983088983086983093983092983090983091 983085983088983086983088983093983089983096

983124983151983156983137983148 983116983137983138983151983154 983088983086983089983091983088983093 983088983086983088983093983097983091 983090983086983090983088983088983092 983088983086983088983091983097983089 983088983086983088983088983095983090 983088983086983090983093983091983097

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983160983145983156

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154

983105983148983148 983126983137983154983145983137983138983148983141983155 983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088

(c) Which of the two models suggested by the above procedures would you choose based on the

pC selection criterion

The model suggested by the forward selection and stepwise regression procedures includes

two explanatory variables total staff ( 1 x ) and remote hours ( 2 x ) The backward elimination

procedure suggests the ldquofullrdquo model ndash with all explanatory variables included

For the model suggested by the forward selection and stepwise regression procedures

n = 26 k = 2)(k SSE = 288020725

MSE (full) = 10134677

4193820419328)2426(46771013

072528802)22(

)full(

)(=minus=minusminusminus=minusminusminus= k n

MSE

k SSE C p

For the model suggested by the backward elimination proceduren = 26 k = 4

)(k SSE = SSE (full) = 212828217

MSE (full) = 10134677

51621)2826(46771013

821721282)22()full(

)(=minus=minusminusminus=minusminusminus= k n MSE

k SSE C p

The model chosen by the forward selection and stepwise regression procedures has a pC value of

84193 which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model

For the model chosen by the backward elimination procedure k + 1 = 4 + 1 = 5 and pC = 5

Thus according to the pC criterion the model including all four variables is the better model

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1415

- 14 -

(d) Below are the results from the best subsets regression procedure of all possible regression

models for the standby hours data Which is the best model

Model k + 1 pC 2r 2

adjr es

X1 2 1332 03660 03396 3862

X2 2 3321 00091 ndash 00322 4828X3 2 3039 00597 00205 4703

X4 2 2418 01710 01365 4416

X1X2 3 842 04899 04456 3539

X1X3 3 1065 04499 04021 3675

X1X4 3 1480 03754 03211 3916

X2X3 3 3231 00612 ndash 00205 4801

X2X4 3 2325 02238 01563 4365

X3X4 3 1182 04288 03791 3745

X1X2X3 4 784 05362 04729 3450

X1X2X4 4 934 05092 04423 3549

X1X3X4 4 775 05378 04748 3444X2X3X4 4 1214 04591 03853 3726

X1X2X3X4 5 500 06231 05513 3184

Because model building requires you to compare models with different numbers of explanatory

variables the adjusted coefficient of determination2

adjr is more appropriate than 2r

(although sometimes it is a matter of preference) The adjusted 2r reaches a maximum valueof 05513 when all four explanatory variables are included in the model Therefore using this

criterion the best model is the model with all four explanatory variables

The same conclusion is reached when using the pC selection criterion because only the modelwith all four explanatory variables considered has a pC value close to or below k + 1

Note Although it was not the case here the pC statistic often provides several alternative

models for you to evaluate in greater depth Moreover the best model or models using the pC

criterion might differ from the model selected using the adjusted 2r andor the models selected

using the three procedures discussed in (a) through (c)

(e) Perform a residual analysis to evaluate the regression assumptions for the best model

The best model turned out to be the model containing all four explanatory variablesOn the next page are the plots for the residual analysis of this model

None of the residual plots versus the total staff the remote hours the Dubner hours and thetotal labor hours reveal apparent patterns In addition a plot of the residuals versus the

predicted values of y does not show any patterns or evidence of unequal variance

The histogram of the residuals indicates only moderate departure from normality (skewness = 054)The plot of the residuals versus time shows no indication of autocorrelation in the residuals

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1515

- 15 -

983085983094983088

983085983089983088

983092983088

983090983096983088 983091983088983088 983091983090983088 983091983092983088 983091983094983088 983091983096983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983124983151983156983137983148 983123983156983137983142983142

983124983151983156983137983148 983123983156983137983142983142 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983093983088 983091983093983088 983093983093983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983122983141983149983151983156983141

983122983141983149983151983156983141 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983090983088983088 983091983088983088 983092983088983088 983093983088983088 983122 983141 983155 983145 983140 983157

983137 983148 983155

983108983157983138983150983141983154

983108983157983138983150983141983154 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983094983088983088 983089983096983088983088 983090983088983088983088 983090983090983088983088 983122 983141 983155 983145 983140 983157

983137 983148 983155

983124983151983156983137983148 983116983137983138983151983154

983124983151983156983137983148 983116983137983138983151983154 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983088983088 983089983093983088 983090983088983088 983090983093983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983120983154983141983140983145983139983156983141983140 983123983156983137983150983140983138983161

983122983141983155983145983140983157983137983148983155 983158983155 983110983145983156

983088

983090

983092

983094

983096

983089983088

983085 983092 983090 983086

983096 983091

983085 983090 983092 983086

983091 983092

983085 983093 983086

983096 983093

983089 983090 983086

983094 983091

983091 983089 983086

983089 983090

983092 983097 983086

983094 983089

983110 983154 983141 983153 983157 983141 983150 983139 983161

983112983145983155983156983151983143983154983137983149 983151983142 983122983141983155983145983140983157983137983148983155

983085983094983088

983085983089983088

983092983088

983089 983090 983091 983092 983093 983094 983095 983096 983097 983089983088 983089983089 983089983090 983089983091 983089983092 983089983093 983089983094 983089983095 983089983096 983089983097 983090983088 983090983089 983090983090 983090983091 983090983092 983090983093 983090983094 983122 983141 983155 983145 983140 983157 983137 983148 983155

983127983141983141983147

983124983145983149983141 983123983141983154983145983141983155 983120983148983151983156 983151983142 983122983141983155983145983140983157983137983148983155

Page 10: 14_Building_Regression_Models_Part1.pdf

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1015

- 10 -

In most cases the final results of these four procedures are very similar However there is no guarantee

that they will all produce exactly the same final equation Deciding which estimated regressionequation to use remains a topic for discussion Ultimately the analystrsquos judgment must be applied

Excel does not come with any variable selection techniques built in StatTools can be used for

forward selection backward elimination and stepwise regression but it cannot perform the bestsubsets regression SAS and Minitab can perform all four techniques

Example 2

Standby Hours

The operations manager at WTT-TV station is looking for ways to reduce labor expenses

Currently the graphic artists at the station receive hourly pay for a significant number of hours

during which they are idle These hours are called standby hours The operations manager wants to

determine which factors most heavily affect standby hours of graphic artists Over a period of 26weeks he collected data concerning standby hours ( y) and four factors that he suspects are related

to the excessive number of standby hours the station is currently experiencing

1 x ndash the total number of staff present

2 x ndash remote hours

3 x ndash Dubner hours

4 x ndash total labor hours

The data are organized and stored in Standbyxlsx

Week Standby Total Staff Remote Dubner Total Labor

1 245 338 414 323 2001

2 177 333 598 340 2030M M M M M M

25 261 315 164 223 1839

26 232 331 270 272 1935

How to build a multiple regression model with the most appropriate mix of explanatory variables

Solution

(a) Compute the variance inflation factors to measure the amount of collinearity among the

explanatory variables (Reminder2

11

j

jr

VIF minus

= )

This is always a good starting point for any multiple regression analysis It involves running

four regressions ndash one regression for each explanatory variable against the other x variables

The following table summarizes the results

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1115

- 11 -

Total Staff Remote Dubner Total Labor

and all other X and all other X and all other X and all other X

Multiple R 06437 04349 05610 07070

R Square 04143 01891 03147 04998

Adjusted R Square 03345 00786 02213 04316Standard Error 164715 1249392 575525 1144118

Observations 26 26 26 26

VIF 17074 12333 14592 19993

All the VIF values are relatively small ranging from a high of 19993 for the total labor hours to

a low of 12333 for remote hours Thus on the basis of the criteria that all VIF values should beless than 5 there is little evidence of collinearity among the set of explanatory variables

(b) Run forward selection backward elimination and stepwise regression and compare the results

StatTools reression output from running the three procedures is shown on the next two pagesA significance level of 005 is used to enter a variable into the model or to delete a variablefrom the model (that is P-value to Enter = P-value to Leave = 005)

The correlations between the response variable and the explanatory variables are

Total Staff Remote Dubner Total Labor

Standby 06050 ndash 00953 ndash 02443 04136

As the computer output shows, the forward selection and stepwise regression methods produce the same results for these data. The first variable entered into the model is total staff, the variable that correlates most highly with the response variable, standby hours (r = 0.6050). The P-value for the t-test of total staff is 0.0011. (Note: StatTools does not show it in the final output.) Because it is less than 0.05, total staff is included in the regression model.

The next step involves selecting a second independent variable for the model. The second variable chosen is the one that makes the largest contribution to the model, given that the first variable has been selected. For this model, the second variable is remote hours. Because the P-value of 0.0269 for remote hours is less than 0.05, remote hours is included in the regression model.

After the remote hours variable is entered into the model, the stepwise regression procedure determines whether total staff is still an important contributing variable or whether it can be eliminated from the model. Because the P-value of 0.0001 for total staff is less than 0.05, total staff remains in the regression model.

The next step involves selecting a third independent variable for the model. Because none of the other variables meets the 0.05 criterion for entry into the model, the stepwise procedure terminates with a model that includes total staff present and the number of remote hours.

The backward elimination procedure produces a model that includes all explanatory variables.
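The entry logic that forward selection follows can be mimicked in a few lines. The sketch below is a simplified stand-in for StatTools' procedure, not its actual implementation: at each step it adds the candidate whose t-test P-value is smallest, stopping once no candidate falls below the 0.05 entry threshold (same assumed DataFrame and column names as before).

```python
import pandas as pd
import statsmodels.api as sm

def forward_selection(df, response, candidates, p_to_enter=0.05):
    selected = []
    remaining = list(candidates)
    while remaining:
        # P-value of each candidate's t-test when added to the current model
        pvals = {}
        for var in remaining:
            X = sm.add_constant(df[selected + [var]])
            pvals[var] = sm.OLS(df[response], X).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_to_enter:
            break  # no remaining variable meets the entry criterion
        selected.append(best)
        remaining.remove(best)
    return selected

df = pd.read_excel("Standby.xlsx")
predictors = ["Total Staff", "Remote", "Dubner", "Total Labor"]
# For these data the output above suggests ['Total Staff', 'Remote']
print(forward_selection(df, "Standby", predictors))
```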


Forward Selection

Summary             Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                    0.6999        0.4899      0.4456               35.3873

ANOVA Table         Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained           2                     27662.5429        13831.2714         11.0450    0.0004
Unexplained         23                    28802.0725        1252.2640

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant            -330.6748      116.4802          -2.8389    0.0093     -571.6325    -89.7171
Total Staff         1.7649         0.3790            4.6562     0.0001     0.9808       2.5490
Remote              -0.1390        0.0588            -2.3635    0.0269     -0.2606      -0.0173

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Entry Number
Total Staff         0.6050        0.3660      0.3396               38.6206              1
Remote              0.6999        0.4899      0.4456               35.3873              2

Stepwise Regression

Summary             Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                    0.6999        0.4899      0.4456               35.3873

ANOVA Table         Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained           2                     27662.5429        13831.2714         11.0450    0.0004
Unexplained         23                    28802.0725        1252.2640

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant            -330.6748      116.4802          -2.8389    0.0093     -571.6325    -89.7171
Total Staff         1.7649         0.3790            4.6562     0.0001     0.9808       2.5490
Remote              -0.1390        0.0588            -2.3635    0.0269     -0.2606      -0.0173

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Enter or Exit
Total Staff         0.6050        0.3660      0.3396               38.6206              Enter
Remote              0.6999        0.4899      0.4456               35.3873              Enter


Backward Elimination

Summary             Multiple R    R-Square    Adjusted R-Square    StErr of Estimate
                    0.7894        0.6231      0.5513               31.8350

ANOVA Table         Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained           4                     35181.7937        8795.4484          8.6786     0.0003
Unexplained         21                    21282.8217        1013.4677

Regression Table    Coefficient    Standard Error    t-Value    p-Value    Lower 95%    Upper 95%
Constant            -330.8318      110.8954          -2.9833    0.0071     -561.4514    -100.2123
Total Staff         1.2456         0.4121            3.0229     0.0065     0.3887       2.1026
Remote              -0.1184        0.0543            -2.1798    0.0408     -0.2314      -0.0054
Dubner              -0.2971        0.1179            -2.5189    0.0199     -0.5423      -0.0518
Total Labor         0.1305         0.0593            2.2004     0.0391     0.0072       0.2539

Step Information    Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Exit Number
All Variables       0.7894        0.6231      0.5513               31.8350

(c) Which of the two models suggested by the above procedures would you choose, based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model – with all explanatory variables included.

For the model suggested by the forward selection and stepwise regression procedures:
n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) − (n − 2k − 2) = 28802.0725/1013.4677 − (26 − 4 − 2) = 28.4193 − 20 = 8.4193

For the model suggested by the backward elimination procedure:
n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) − (n − 2k − 2) = 21282.8217/1013.4677 − (26 − 8 − 2) = 21.0000 − 16 = 5.0000

The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.
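These hand calculations are easy to check programmatically. A minimal sketch, with the SSE and MSE values hard-coded from the StatTools output above:

```python
def cp(sse_k, mse_full, n, k):
    """Mallows' Cp = SSE(k)/MSE(full) - (n - 2(k + 1))."""
    return sse_k / mse_full - (n - 2 * (k + 1))

mse_full = 1013.4677                         # MSE of the four-variable (full) model
print(cp(28802.0725, mse_full, n=26, k=2))   # forward/stepwise model -> about 8.42
print(cp(21282.8217, mse_full, n=26, k=4))   # full model -> 5.0 exactly
```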


(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model         k + 1    Cp       r²       adj r²    se
X1            2        13.32    0.3660   0.3396    38.62
X2            2        33.21    0.0091   –0.0322   48.28
X3            2        30.39    0.0597   0.0205    47.03
X4            2        24.18    0.1710   0.1365    44.16
X1X2          3        8.42     0.4899   0.4456    35.39
X1X3          3        10.65    0.4499   0.4021    36.75
X1X4          3        14.80    0.3754   0.3211    39.16
X2X3          3        32.31    0.0612   –0.0205   48.01
X2X4          3        23.25    0.2238   0.1563    43.65
X3X4          3        11.82    0.4288   0.3791    37.45
X1X2X3        4        7.84     0.5362   0.4729    34.50
X1X2X4        4        9.34     0.5092   0.4423    35.49
X1X3X4        4        7.75     0.5378   0.4748    34.44
X2X3X4        4        12.14    0.4591   0.3853    37.26
X1X2X3X4      5        5.00     0.6231   0.5513    31.84

Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination, adj r², is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables considered has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).
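With only four candidate predictors, all-possible-regressions involves just 15 fits, so best subsets is easy to emulate if your software lacks it. A sketch under the same assumptions as the earlier snippets (pandas/statsmodels, columns named as above):

```python
from itertools import combinations

import pandas as pd
import statsmodels.api as sm

df = pd.read_excel("Standby.xlsx")
predictors = ["Total Staff", "Remote", "Dubner", "Total Labor"]

# MSE of the full model is the benchmark in the Cp formula
full = sm.OLS(df["Standby"], sm.add_constant(df[predictors])).fit()
mse_full = full.mse_resid
n = len(df)

for k in range(1, len(predictors) + 1):
    for subset in combinations(predictors, k):
        fit = sm.OLS(df["Standby"], sm.add_constant(df[list(subset)])).fit()
        cp = fit.ssr / mse_full - (n - 2 * (k + 1))
        print(subset, f"Cp={cp:.2f}", f"r2={fit.rsquared:.4f}",
              f"adj_r2={fit.rsquared_adj:.4f}", f"se={fit.mse_resid ** 0.5:.2f}")
```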

(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots – versus the total staff, the remote hours, the Dubner hours, and the total labor hours – reveals apparent patterns. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance.

The histogram of the residuals indicates only a moderate departure from normality (skewness = 0.54). The plot of the residuals versus time shows no indication of autocorrelation in the residuals.

[Figure: residual plots for the four-variable model – Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, Total Labor Residual Plot, Residuals vs Fit (residuals versus predicted standby hours), Histogram of Residuals, and Time Series Plot of Residuals (residuals by week).]
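For completeness, here is one way such diagnostic plots could be produced. Matplotlib is my choice here, the histogram panel is omitted for brevity, and the figure is only an approximation of the StatTools layout.

```python
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

df = pd.read_excel("Standby.xlsx")
predictors = ["Total Staff", "Remote", "Dubner", "Total Labor"]
fit = sm.OLS(df["Standby"], sm.add_constant(df[predictors])).fit()
resid = fit.resid

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
# Residuals against each explanatory variable
for ax, col in zip(axes.flat, predictors):
    ax.scatter(df[col], resid)
    ax.axhline(0, linewidth=1)
    ax.set(xlabel=col, ylabel="Residuals", title=f"{col} Residual Plot")
# Residuals against fitted values, and against time (week order)
axes.flat[4].scatter(fit.fittedvalues, resid)
axes.flat[4].set(xlabel="Predicted Standby", title="Residuals vs Fit")
axes.flat[5].plot(range(1, len(resid) + 1), resid, marker="o")
axes.flat[5].set(xlabel="Week", title="Time Series Plot of Residuals")
fig.tight_layout()
plt.show()
```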

Page 11: 14_Building_Regression_Models_Part1.pdf

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1115

- 11 -

Total Staff Remote Dubner Total Labor

and all other X and all other X and all other X and all other X

Multiple R 06437 04349 05610 07070

R Square 04143 01891 03147 04998

Adjusted R Square 03345 00786 02213 04316Standard Error 164715 1249392 575525 1144118

Observations 26 26 26 26

VIF 17074 12333 14592 19993

All the VIF values are relatively small ranging from a high of 19993 for the total labor hours to

a low of 12333 for remote hours Thus on the basis of the criteria that all VIF values should beless than 5 there is little evidence of collinearity among the set of explanatory variables

(b) Run forward selection backward elimination and stepwise regression and compare the results

StatTools reression output from running the three procedures is shown on the next two pagesA significance level of 005 is used to enter a variable into the model or to delete a variablefrom the model (that is P-value to Enter = P-value to Leave = 005)

The correlations between the response variable and the explanatory variables are

Total Staff Remote Dubner Total Labor

Standby 06050 ndash 00953 ndash 02443 04136

As the computer output shows the forward selection and stepwise regression methods

produce the same results for these data The first variable entered into the model is total staffthe variable that correlates most highly with the response variable standby hours

(r = 06050) The P-value for the t -test of total staff is 00011 (Note StatTools does not show it

in the final output) Because it is less than 005 total staff is included in the regression model

The next step involves selecting a second independent variable for the model The second variable

chosen is one that makes the largest contribution to the model given that the first variable hasbeen selected For this model the second variable is remote hours Because the P-value of 00269

for remote hours is less than 005 remote hours is included in the regression model

After the remote hours variable is entered into the model the stepwise regression proceduredetermines whether total staff is still an important contributing variable or whether it can be

eliminated from the model Because the P-value of 00001 for total staff is less than 005total staff remains in the regression model

The next step involves selecting a third independent variable for the model Because none of

the other variables meets the 005 criterion for entry into the model the stepwise procedureterminates with a model that includes total staff present and the number of remote hours

The backward elimination procedure produces a model that includes all explanatory variables

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1215

- 12 -

Forward Selection

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092

983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089

983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088

983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983154983161

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154

983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983089

983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983090

Stepwise Regression

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092

983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089

983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088

983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983141983154 983151983154

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983109983160983145983156

983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983109983150983156983141983154

983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983109983150983156983141983154

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1315

- 13 -

Backward Elimination983117983157983148983156983145983152983148983141

983122983085983123983153983157983137983154983141983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983092 983091983093983089983096983089983086983095983097983091983095 983096983095983097983093983086983092983092983096983092 983096983086983094983095983096983094 983088983086983088983088983088983091

983125983150983141983160983152983148983137983145983150983141983140 983090983089 983090983089983090983096983090983086983096983090983089983095 983089983088983089983091983086983092983094983095983095

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983096983091983089983096 983089983089983088983086983096983097983093983092 983085983090983086983097983096983091983091 983088983086983088983088983095983089 983085983093983094983089983086983092983093983089983092 983085983089983088983088983086983090983089983090983091

983124983151983156983137983148 983123983156983137983142983142 983089983086983090983092983093983094 983088983086983092983089983090983089 983091983086983088983090983090983097 983088983086983088983088983094983093 983088983086983091983096983096983095 983090983086983089983088983090983094

983122983141983149983151983156983141 983085983088983086983089983089983096983092 983088983086983088983093983092983091 983085983090983086983089983095983097983096 983088983086983088983092983088983096 983085983088983086983090983091983089983092 983085983088983086983088983088983093983092

983108983157983138983150983141983154 983085983088983086983090983097983095983089 983088983086983089983089983095983097 983085983090983086983093983089983096983097 983088983086983088983089983097983097 983085983088983086983093983092983090983091 983085983088983086983088983093983089983096

983124983151983156983137983148 983116983137983138983151983154 983088983086983089983091983088983093 983088983086983088983093983097983091 983090983086983090983088983088983092 983088983086983088983091983097983089 983088983086983088983088983095983090 983088983086983090983093983091983097

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983160983145983156

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154

983105983148983148 983126983137983154983145983137983138983148983141983155 983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088

(c) Which of the two models suggested by the above procedures would you choose based on the

pC selection criterion

The model suggested by the forward selection and stepwise regression procedures includes

two explanatory variables total staff ( 1 x ) and remote hours ( 2 x ) The backward elimination

procedure suggests the ldquofullrdquo model ndash with all explanatory variables included

For the model suggested by the forward selection and stepwise regression procedures

n = 26 k = 2)(k SSE = 288020725

MSE (full) = 10134677

4193820419328)2426(46771013

072528802)22(

)full(

)(=minus=minusminusminus=minusminusminus= k n

MSE

k SSE C p

For the model suggested by the backward elimination proceduren = 26 k = 4

)(k SSE = SSE (full) = 212828217

MSE (full) = 10134677

51621)2826(46771013

821721282)22()full(

)(=minus=minusminusminus=minusminusminus= k n MSE

k SSE C p

The model chosen by the forward selection and stepwise regression procedures has a pC value of

84193 which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model

For the model chosen by the backward elimination procedure k + 1 = 4 + 1 = 5 and pC = 5

Thus according to the pC criterion the model including all four variables is the better model

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1415

- 14 -

(d) Below are the results from the best subsets regression procedure of all possible regression

models for the standby hours data Which is the best model

Model k + 1 pC 2r 2

adjr es

X1 2 1332 03660 03396 3862

X2 2 3321 00091 ndash 00322 4828X3 2 3039 00597 00205 4703

X4 2 2418 01710 01365 4416

X1X2 3 842 04899 04456 3539

X1X3 3 1065 04499 04021 3675

X1X4 3 1480 03754 03211 3916

X2X3 3 3231 00612 ndash 00205 4801

X2X4 3 2325 02238 01563 4365

X3X4 3 1182 04288 03791 3745

X1X2X3 4 784 05362 04729 3450

X1X2X4 4 934 05092 04423 3549

X1X3X4 4 775 05378 04748 3444X2X3X4 4 1214 04591 03853 3726

X1X2X3X4 5 500 06231 05513 3184

Because model building requires you to compare models with different numbers of explanatory

variables the adjusted coefficient of determination2

adjr is more appropriate than 2r

(although sometimes it is a matter of preference) The adjusted 2r reaches a maximum valueof 05513 when all four explanatory variables are included in the model Therefore using this

criterion the best model is the model with all four explanatory variables

The same conclusion is reached when using the pC selection criterion because only the modelwith all four explanatory variables considered has a pC value close to or below k + 1

Note Although it was not the case here the pC statistic often provides several alternative

models for you to evaluate in greater depth Moreover the best model or models using the pC

criterion might differ from the model selected using the adjusted 2r andor the models selected

using the three procedures discussed in (a) through (c)

(e) Perform a residual analysis to evaluate the regression assumptions for the best model

The best model turned out to be the model containing all four explanatory variablesOn the next page are the plots for the residual analysis of this model

None of the residual plots versus the total staff the remote hours the Dubner hours and thetotal labor hours reveal apparent patterns In addition a plot of the residuals versus the

predicted values of y does not show any patterns or evidence of unequal variance

The histogram of the residuals indicates only moderate departure from normality (skewness = 054)The plot of the residuals versus time shows no indication of autocorrelation in the residuals

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1515

- 15 -

983085983094983088

983085983089983088

983092983088

983090983096983088 983091983088983088 983091983090983088 983091983092983088 983091983094983088 983091983096983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983124983151983156983137983148 983123983156983137983142983142

983124983151983156983137983148 983123983156983137983142983142 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983093983088 983091983093983088 983093983093983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983122983141983149983151983156983141

983122983141983149983151983156983141 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983090983088983088 983091983088983088 983092983088983088 983093983088983088 983122 983141 983155 983145 983140 983157

983137 983148 983155

983108983157983138983150983141983154

983108983157983138983150983141983154 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983094983088983088 983089983096983088983088 983090983088983088983088 983090983090983088983088 983122 983141 983155 983145 983140 983157

983137 983148 983155

983124983151983156983137983148 983116983137983138983151983154

983124983151983156983137983148 983116983137983138983151983154 983122983141983155983145983140983157983137983148 983120983148983151983156

983085983094983088

983085983089983088

983092983088

983089983088983088 983089983093983088 983090983088983088 983090983093983088 983122 983141 983155 983145 983140 983157 983137 983148 983155

983120983154983141983140983145983139983156983141983140 983123983156983137983150983140983138983161

983122983141983155983145983140983157983137983148983155 983158983155 983110983145983156

983088

983090

983092

983094

983096

983089983088

983085 983092 983090 983086

983096 983091

983085 983090 983092 983086

983091 983092

983085 983093 983086

983096 983093

983089 983090 983086

983094 983091

983091 983089 983086

983089 983090

983092 983097 983086

983094 983089

983110 983154 983141 983153 983157 983141 983150 983139 983161

983112983145983155983156983151983143983154983137983149 983151983142 983122983141983155983145983140983157983137983148983155

983085983094983088

983085983089983088

983092983088

983089 983090 983091 983092 983093 983094 983095 983096 983097 983089983088 983089983089 983089983090 983089983091 983089983092 983089983093 983089983094 983089983095 983089983096 983089983097 983090983088 983090983089 983090983090 983090983091 983090983092 983090983093 983090983094 983122 983141 983155 983145 983140 983157 983137 983148 983155

983127983141983141983147

983124983145983149983141 983123983141983154983145983141983155 983120983148983151983156 983151983142 983122983141983155983145983140983157983137983148983155

Page 12: 14_Building_Regression_Models_Part1.pdf

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1215

- 12 -

Forward Selection

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092

983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089

983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088

983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983154983161

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154

983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983089

983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983090

Stepwise Regression

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983090 983090983095983094983094983090983086983093983092983090983097 983089983091983096983091983089983086983090983095983089983092 983089983089983086983088983092983093983088 983088983086983088983088983088983092

983125983150983141983160983152983148983137983145983150983141983140 983090983091 983090983096983096983088983090983086983088983095983090983093 983089983090983093983090983086983090983094983092983088

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983094983095983092983096 983089983089983094983086983092983096983088983090 983085983090983086983096983091983096983097 983088983086983088983088983097983091 983085983093983095983089983086983094983091983090983093 983085983096983097983086983095983089983095983089

983124983151983156983137983148 983123983156983137983142983142 983089983086983095983094983092983097 983088983086983091983095983097983088 983092983086983094983093983094983090 983088983086983088983088983088983089 983088983086983097983096983088983096 983090983086983093983092983097983088

983122983141983149983151983156983141 983085983088983086983089983091983097983088 983088983086983088983093983096983096 983085983090983086983091983094983091983093 983088983086983088983090983094983097 983085983088983086983090983094983088983094 983085983088983086983088983089983095983091

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983150983156983141983154 983151983154

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983109983160983145983156

983124983151983156983137983148 983123983156983137983142983142 983088983086983094983088983093983088 983088983086983091983094983094983088 983088983086983091983091983097983094 983091983096983086983094983090983088983094 983109983150983156983141983154

983122983141983149983151983156983141 983088983086983094983097983097983097 983088983086983092983096983097983097 983088983086983092983092983093983094 983091983093983086983091983096983095983091 983109983150983156983141983154

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1315

- 13 -

Backward Elimination983117983157983148983156983145983152983148983141

983122983085983123983153983157983137983154983141983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142

983123983157983149983149983137983154983161 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141

983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088

983108983141983143983154983141983141983155 983151983142 983123983157983149 983151983142 983117983141983137983150 983151983142983110983085983122983137983156983145983151 983152983085983126983137983148983157983141

983105983118983119983126983105 983124983137983138983148983141 983110983154983141983141983140983151983149 983123983153983157983137983154983141983155 983123983153983157983137983154983141983155

983109983160983152983148983137983145983150983141983140 983092 983091983093983089983096983089983086983095983097983091983095 983096983095983097983093983086983092983092983096983092 983096983086983094983095983096983094 983088983086983088983088983088983091

983125983150983141983160983152983148983137983145983150983141983140 983090983089 983090983089983090983096983090983086983096983090983089983095 983089983088983089983091983086983092983094983095983095

983107983151983141983142983142983145983139983145983141983150983156983123983156983137983150983140983137983154983140

983156983085983126983137983148983157983141 983152983085983126983137983148983157983141983107983151983150983142983145983140983141983150983139983141 983113983150983156983141983154983158983137983148 983097983093

983122983141983143983154983141983155983155983145983151983150 983124983137983138983148983141 983109983154983154983151983154 983116983151983159983141983154 983125983152983152983141983154

983107983151983150983155983156983137983150983156 983085983091983091983088983086983096983091983089983096 983089983089983088983086983096983097983093983092 983085983090983086983097983096983091983091 983088983086983088983088983095983089 983085983093983094983089983086983092983093983089983092 983085983089983088983088983086983090983089983090983091

983124983151983156983137983148 983123983156983137983142983142 983089983086983090983092983093983094 983088983086983092983089983090983089 983091983086983088983090983090983097 983088983086983088983088983094983093 983088983086983091983096983096983095 983090983086983089983088983090983094

983122983141983149983151983156983141 983085983088983086983089983089983096983092 983088983086983088983093983092983091 983085983090983086983089983095983097983096 983088983086983088983092983088983096 983085983088983086983090983091983089983092 983085983088983086983088983088983093983092

983108983157983138983150983141983154 983085983088983086983090983097983095983089 983088983086983089983089983095983097 983085983090983086983093983089983096983097 983088983086983088983089983097983097 983085983088983086983093983092983090983091 983085983088983086983088983093983089983096

983124983151983156983137983148 983116983137983138983151983154 983088983086983089983091983088983093 983088983086983088983093983097983091 983090983086983090983088983088983092 983088983086983088983091983097983089 983088983086983088983088983095983090 983088983086983090983093983091983097

983117983157983148983156983145983152983148983141983122983085983123983153983157983137983154983141

983105983140983146983157983155983156983141983140 983123983156983109983154983154 983151983142 983109983160983145983156

983123983156983141983152 983113983150983142983151983154983149983137983156983145983151983150 983122 983122983085983123983153983157983137983154983141 983109983155983156983145983149983137983156983141 983118983157983149983138983141983154

983105983148983148 983126983137983154983145983137983138983148983141983155 983088983086983095983096983097983092 983088983086983094983090983091983089 983088983086983093983093983089983091 983091983089983086983096983091983093983088

(c) Which of the two models suggested by the above procedures would you choose based on the

pC selection criterion

The model suggested by the forward selection and stepwise regression procedures includes

two explanatory variables total staff ( 1 x ) and remote hours ( 2 x ) The backward elimination

procedure suggests the ldquofullrdquo model ndash with all explanatory variables included

For the model suggested by the forward selection and stepwise regression procedures

n = 26 k = 2)(k SSE = 288020725

MSE (full) = 10134677

4193820419328)2426(46771013

072528802)22(

)full(

)(=minus=minusminusminus=minusminusminus= k n

MSE

k SSE C p

For the model suggested by the backward elimination proceduren = 26 k = 4

)(k SSE = SSE (full) = 212828217

MSE (full) = 10134677

51621)2826(46771013

821721282)22()full(

)(=minus=minusminusminus=minusminusminus= k n MSE

k SSE C p

The model chosen by the forward selection and stepwise regression procedures has a pC value of

84193 which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model

For the model chosen by the backward elimination procedure k + 1 = 4 + 1 = 5 and pC = 5

Thus according to the pC criterion the model including all four variables is the better model

8172019 14_Building_Regression_Models_Part1pdf

httpslidepdfcomreaderfull14buildingregressionmodelspart1pdf 1415

- 14 -

(d) Below are the results from the best subsets regression procedure of all possible regression

models for the standby hours data Which is the best model

Model k + 1 pC 2r 2

adjr es

X1 2 1332 03660 03396 3862

X2 2 3321 00091 ndash 00322 4828X3 2 3039 00597 00205 4703

X4 2 2418 01710 01365 4416

X1X2 3 842 04899 04456 3539

X1X3 3 1065 04499 04021 3675

X1X4 3 1480 03754 03211 3916

X2X3 3 3231 00612 ndash 00205 4801

X2X4 3 2325 02238 01563 4365

X3X4 3 1182 04288 03791 3745

X1X2X3 4 784 05362 04729 3450

X1X2X4 4 934 05092 04423 3549

X1X3X4 4 775 05378 04748 3444X2X3X4 4 1214 04591 03853 3726

X1X2X3X4 5 500 06231 05513 3184

Because model building requires you to compare models with different numbers of explanatory

variables the adjusted coefficient of determination2

adjr is more appropriate than 2r

(although sometimes it is a matter of preference) The adjusted 2r reaches a maximum valueof 05513 when all four explanatory variables are included in the model Therefore using this

criterion the best model is the model with all four explanatory variables

The same conclusion is reached when using the pC selection criterion because only the modelwith all four explanatory variables considered has a pC value close to or below k + 1

Note Although it was not the case here the pC statistic often provides several alternative

models for you to evaluate in greater depth Moreover the best model or models using the pC

criterion might differ from the model selected using the adjusted 2r andor the models selected

using the three procedures discussed in (a) through (c)

(e) Perform a residual analysis to evaluate the regression assumptions for the best model

The best model turned out to be the model containing all four explanatory variablesOn the next page are the plots for the residual analysis of this model

None of the residual plots versus the total staff the remote hours the Dubner hours and thetotal labor hours reveal apparent patterns In addition a plot of the residuals versus the

predicted values of y does not show any patterns or evidence of unequal variance

The histogram of the residuals indicates only moderate departure from normality (skewness = 054)The plot of the residuals versus time shows no indication of autocorrelation in the residuals



Backward Elimination

Summary measures
  Multiple R            0.7894
  R-Square              0.6231
  Adjusted R-Square     0.5513
  StErr of Estimate    31.8350

ANOVA Table
               Degrees of    Sum of        Mean of
               Freedom       Squares       Squares      F-Ratio   p-Value
  Explained         4        35181.7937    8795.4484     8.6786    0.0003
  Unexplained      21        21282.8217    1013.4677

Regression Table
               Coefficient   Standard    t-Value   p-Value   Lower 95%   Upper 95%
                             Error
  Constant      -330.8318    110.8954    -2.9833    0.0071   -561.4514   -100.2123
  Total Staff      1.2456      0.4121     3.0229    0.0065      0.3887      2.1026
  Remote          -0.1184      0.0543    -2.1798    0.0408     -0.2314     -0.0054
  Dubner          -0.2971      0.1179    -2.5189    0.0199     -0.5423     -0.0518
  Total Labor      0.1305      0.0593     2.2004    0.0391      0.0072      0.2539

Step Information
  Step            Multiple R   R-Square   Adjusted R-Square   StErr of Estimate   Exit Number
  All Variables       0.7894     0.6231              0.5513             31.8350
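Output of this kind can be reproduced in software. Below is a minimal sketch of p-value-based backward elimination in Python with statsmodels; the DataFrame name, the column names, and the 0.05 significance level to stay are illustrative assumptions, not part of the original output.

    # A minimal sketch of backward elimination by p-value, assuming the standby
    # hours data sit in a pandas DataFrame df with hypothetical columns
    # "TotalStaff", "Remote", "Dubner", "TotalLabor", and "Standby".
    import statsmodels.api as sm

    def backward_elimination(df, response, predictors, alpha=0.05):
        kept = list(predictors)
        while kept:
            X = sm.add_constant(df[kept])           # design matrix with intercept
            results = sm.OLS(df[response], X).fit()
            pvals = results.pvalues.drop("const")   # p-values of predictors only
            worst = pvals.idxmax()                  # least significant predictor
            if pvals[worst] <= alpha:               # all significant: stop
                return results
            kept.remove(worst)                      # drop it and refit
        return None

    # For the standby hours data all four p-values are below 0.05, so the
    # procedure would keep the full model, as in the output above:
    # results = backward_elimination(df, "Standby",
    #                                ["TotalStaff", "Remote", "Dubner", "TotalLabor"])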

(c) Which of the two models suggested by the above procedures would you choose based on the Cp selection criterion?

The model suggested by the forward selection and stepwise regression procedures includes two explanatory variables: total staff (x1) and remote hours (x2). The backward elimination procedure suggests the "full" model, with all explanatory variables included.

For the model suggested by the forward selection and stepwise regression procedures:

n = 26, k = 2, SSE(k) = 28802.0725, MSE(full) = 1013.4677

Cp = SSE(k)/MSE(full) - [n - 2(k + 1)] = 28802.0725/1013.4677 - [26 - 2(2 + 1)] = 28.4193 - 20 = 8.4193

For the model suggested by the backward elimination procedure:

n = 26, k = 4, SSE(k) = SSE(full) = 21282.8217, MSE(full) = 1013.4677

Cp = 21282.8217/1013.4677 - [26 - 2(4 + 1)] = 21.0000 - 16 = 5.0000

The model chosen by the forward selection and stepwise regression procedures has a Cp value of 8.4193, which is substantially above the suggested criterion of k + 1 = 2 + 1 = 3 for that model. For the model chosen by the backward elimination procedure, k + 1 = 4 + 1 = 5 and Cp = 5. Thus, according to the Cp criterion, the model including all four variables is the better model.
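These computations are easy to verify. The short Python snippet below recomputes both Cp values from the SSE and MSE figures quoted above; the function name is just illustrative.

    # Mallows' Cp = SSE(k)/MSE(full) - [n - 2(k + 1)]
    def cp(sse_k, mse_full, n, k):
        return sse_k / mse_full - (n - 2 * (k + 1))

    print(cp(28802.0725, 1013.4677, n=26, k=2))   # 8.4193 (forward/stepwise model)
    print(cp(21282.8217, 1013.4677, n=26, k=4))   # 5.0000 (all four variables)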


(d) Below are the results from the best subsets regression procedure of all possible regression models for the standby hours data. Which is the best model?

Model          k + 1      Cp        r²    adj r²      se
X1                 2    13.32    0.3660    0.3396    38.62
X2                 2    33.21    0.0091   -0.0322    48.28
X3                 2    30.39    0.0597    0.0205    47.03
X4                 2    24.18    0.1710    0.1365    44.16
X1X2               3     8.42    0.4899    0.4456    35.39
X1X3               3    10.65    0.4499    0.4021    36.75
X1X4               3    14.80    0.3754    0.3211    39.16
X2X3               3    32.31    0.0612   -0.0205    48.01
X2X4               3    23.25    0.2238    0.1563    43.65
X3X4               3    11.82    0.4288    0.3791    37.45
X1X2X3             4     7.84    0.5362    0.4729    34.50
X1X2X4             4     9.34    0.5092    0.4423    35.49
X1X3X4             4     7.75    0.5378    0.4748    34.44
X2X3X4             4    12.14    0.4591    0.3853    37.26
X1X2X3X4           5     5.00    0.6231    0.5513    31.84

(se denotes the standard error of the estimate.)
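A table like this can be produced by brute force: fit every subset of the explanatory variables and compute the summary measures for each. Here is a minimal Python sketch, assuming X is a 26-by-4 NumPy array of the explanatory variables (total staff, remote, Dubner, total labor) and y is the vector of standby hours; the names are illustrative.

    # Enumerate all subsets of the columns of X and report k + 1, Cp, r-square,
    # adjusted r-square, and the standard error of the estimate for each fit.
    from itertools import combinations
    import numpy as np

    def best_subsets(X, y):
        n, m = X.shape
        # Full-model MSE, needed as the denominator in Cp
        full = np.column_stack([np.ones(n), X])
        sse_full = np.sum((y - full @ np.linalg.lstsq(full, y, rcond=None)[0]) ** 2)
        mse_full = sse_full / (n - m - 1)
        sst = np.sum((y - y.mean()) ** 2)
        for k in range(1, m + 1):
            for cols in combinations(range(m), k):
                Xk = np.column_stack([np.ones(n), X[:, list(cols)]])
                beta = np.linalg.lstsq(Xk, y, rcond=None)[0]
                sse = np.sum((y - Xk @ beta) ** 2)
                r2 = 1 - sse / sst
                adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted r-square
                cp = sse / mse_full - (n - 2 * (k + 1))         # Mallows' Cp
                se = np.sqrt(sse / (n - k - 1))                 # StErr of estimate
                print(cols, k + 1, round(cp, 2), round(r2, 4),
                      round(adj_r2, 4), round(se, 2))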

Because model building requires you to compare models with different numbers of explanatory variables, the adjusted coefficient of determination (adj r²) is more appropriate than r² (although sometimes it is a matter of preference). The adjusted r² reaches a maximum value of 0.5513 when all four explanatory variables are included in the model. Therefore, using this criterion, the best model is the model with all four explanatory variables.

The same conclusion is reached when using the Cp selection criterion, because only the model with all four explanatory variables has a Cp value close to or below k + 1.

Note: Although it was not the case here, the Cp statistic often provides several alternative models for you to evaluate in greater depth. Moreover, the best model or models using the Cp criterion might differ from the model selected using the adjusted r² and/or the models selected using the three procedures discussed in (a) through (c).

(e) Perform a residual analysis to evaluate the regression assumptions for the best model.

The best model turned out to be the model containing all four explanatory variables. Below are the plots for the residual analysis of this model.

None of the residual plots versus the total staff, the remote hours, the Dubner hours, and the total labor hours reveals an apparent pattern. In addition, a plot of the residuals versus the predicted values of y does not show any patterns or evidence of unequal variance. The histogram of the residuals indicates only a moderate departure from normality (skewness = 0.54), and the plot of the residuals versus time shows no indication of autocorrelation in the residuals.

[Residual analysis plots for the four-variable model: Total Staff Residual Plot, Remote Residual Plot, Dubner Residual Plot, and Total Labor Residual Plot (residuals versus each explanatory variable); Residuals vs Fit (residuals versus predicted standby hours); Histogram of Residuals; Time Series Plot of Residuals (residuals by week).]
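Plots like these can be generated with matplotlib. The sketch below assumes results is a fitted statsmodels OLS object for the four-variable model (for instance, the one returned by the backward elimination sketch earlier) and df is the same hypothetical DataFrame of explanatory variables.

    # Residual plots for the four-variable model: residuals versus each
    # explanatory variable, versus the fitted values, a histogram, and a
    # time series plot of the residuals by week.
    import matplotlib.pyplot as plt

    resid = results.resid
    fitted = results.fittedvalues

    fig, axes = plt.subplots(4, 2, figsize=(10, 14))
    for ax, col in zip(axes.flat, ["TotalStaff", "Remote", "Dubner", "TotalLabor"]):
        ax.scatter(df[col], resid)                 # residuals vs each predictor
        ax.set(xlabel=col, ylabel="Residuals", title=col + " Residual Plot")
    axes[2, 0].scatter(fitted, resid)              # residuals vs fit
    axes[2, 0].set(xlabel="Predicted Standby", title="Residuals vs Fit")
    axes[2, 1].hist(resid)                         # normality check
    axes[2, 1].set(title="Histogram of Residuals")
    axes[3, 0].plot(range(1, len(resid) + 1), resid, marker="o")   # by week
    axes[3, 0].set(xlabel="Week", title="Time Series Plot of Residuals")
    axes[3, 1].axis("off")                         # unused panel
    plt.tight_layout()
    plt.show()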
