438 CHAPTER 11. MULTIPLE REGRESSION AND CORRELATION
Chapter 9 introduced regression modeling of the relationship between two quantitative
variables. Multivariate relationships require more complex models, containing several
explanatory variables. Some of these may be predictors of theoretical interest, and
some may be control variables.
To predict Y = college GPA, for example, it is sensible to use several predictors
in the same model. Possibilities include X1 = high school GPA, X2 = math college
entrance exam score, X3 = verbal college entrance exam score, and X4 = rating by
high school guidance counselor. This chapter presents models for the relationship
between a response variable Y and a collection of explanatory variables.
A multivariable model typically provides better predictions of Y than does a model
with a single explanatory variable. Such a model can also analyze relationships between
variables while controlling for other variables. This is important because Chapter 10
showed that after controlling for a variable, an association can appear quite different
from when the variable is ignored. Thus, this model provides information not available
with simple models that analyze only two variables at a time.
Sections 11.1 and 11.2 extend the regression model to a multiple regression
model that can have several explanatory variables. Section 11.3 defines multiple correlation
and R-squared measures that describe association between Y and a set of explanatory
variables. Section 11.4 presents inference procedures for multiple regression. Section
11.5 shows how to allow statistical interaction in the model, and Section 11.6 presents
a test of whether a complex model provides a better fit than a simpler model. The
final two sections introduce measures that summarize the association between the
response variable and an explanatory variable while controlling for other variables.
11.1 The Multiple Regression Model
Chapter 9 modeled the relationship between the explanatory variable X and the mean
of the response variable Y by the straight-line (linear) equation E(Y ) = α + βx. We
refer to this model containing a single predictor as a bivariate model, because it
contains only two variables.
The Multiple Regression Function
Suppose there are two explanatory variables, denoted by X1 and X2. As in earlier
chapters, we use lower-case letters to denote observations or particular values of the
variables. The bivariate regression function generalizes to the multiple regression
function
E(Y ) = α + β1x1 + β2x2.
In this equation, α, β1, and β2 are parameters discussed below. For particular values
of x1 and x2, the equation specifies the population mean of Y for all subjects with
those values of x1 and x2. When there are additional explanatory variables, each has
a βx term, for example E(Y ) = α + β1x1 + β2x2 + β3x3 + β4x4 with four predictors.
The multiple regression function is more difficult to portray graphically than the
bivariate regression function. With two explanatory variables, the x1 and x2 axes are
Figure 11.1: Graphical Depiction of a Multiple Regression Function with Two Explanatory Variables
((Fig. 11.1 in 3e))
perpendicular but lie in a horizontal plane and the Y axis is vertical and perpendicular
to both the x1 and x2 axes. The equation E(Y ) = α + β1x1 + β2x2 traces a plane (a
flat surface) cutting through three-dimensional space, as Figure 11.1 portrays.
The simplest interpretation treats all but one explanatory variable as control variables
and fixes them at particular levels. This leaves an equation relating the mean
of Y to the remaining explanatory variable.
Example 11.1 Do Higher Levels of Education Cause Higher Crime Rates?
Exercise 39 in Chapter 9 contains recent data on several variables for the 67
counties in the state of Florida. For each county, let Y = crime rate (annual number
of crimes per 1000 population), X1 = education (percentage of adult residents having
at least a high school education), and X2 = urbanization (percentage living in an
urban environment).
The bivariate relationship between crime rate and education is approximated by
E(Y ) = −51.3 + 1.5x1. Surprisingly, the association is moderately positive, the
correlation being r = 0.47. As the percentage of county residents having at least a
high school education increases, so does the crime rate.
A closer look at the data reveals strong positive associations between crime rate
and urbanization (r = 0.68) and between education and urbanization (r = 0.79).
This suggests that the association between crime rate and education may be spurious.
Perhaps urbanization is a common causal factor. See Figure 11.2. As urbanization
increases, both crime rate and education increase, resulting in a positive correlation
between crime rate and education.
The relation between crime rate and both predictors considered together is
approximated by the multiple regression function
E(Y ) = 58.9 − 0.6x1 + 0.7x2.
For instance, the expected crime rate for a county at the mean levels of education
Figure 11.2: The Positive Association Between Crime Rate and Education MayBe Spurious, Explained by the Effects of Urbanization on Each
((Fig. 11.2 in 3e))
(x1 = 70) and urbanization (x2 = 50) is E(Y ) = 58.9− 0.6(70) + 0.7(50) = 52 annual
crimes per 1000 population.
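As a quick numeric check, the fitted surface can be evaluated directly. This is a sketch using the rounded estimates quoted above (not a re-fit of the county data):

```python
# Fitted multiple regression function for the Florida crime data,
# using the rounded coefficients quoted in the text.
def expected_crime_rate(x1, x2):
    """Estimated mean crime rate for education x1 (%) and urbanization x2 (%)."""
    return 58.9 - 0.6 * x1 + 0.7 * x2

# County at the mean education (70) and urbanization (50):
print(round(expected_crime_rate(70, 50), 1))  # 51.9, i.e., about 52
```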
Let’s study the effect of x1, controlling for x2. We first set x2 at its mean level of
50. Then, the relationship between crime rate and education is
E(Y ) = 58.9 − 0.6x1 + 0.7(50) = 58.9 − 0.6x1 + 35.0 = 93.9 − 0.6x1.
Figure 11.3 plots this line. Controlling for x2 by fixing it at 50, the relationship
between crime rate and education is negative, rather than positive. The slope decreased
and changed sign from 1.5 in the bivariate relationship to −0.6. At this fixed level of
urbanization, a negative relationship exists between education and crime rate. We use
the term partial regression equation to distinguish the equation E(Y ) = 93.9 − 0.6x1
from the regression equation E(Y ) = −51.3 + 1.5x1 for the bivariate relationship
between Y and x1. The partial regression equation refers to part of the potential
observations, in this case counties having x2 = 50.
Figure 11.3: Partial Relationships Between E(Y ) and x1 for the Multiple Regression Equation E(Y ) = 58.9 − 0.6x1 + 0.7x2. These partial regression equations fix x2 to equal 50 or 40.
((Fig. 11.3 in 3e))
Next we fix x2 at a different level, say x2 = 40 instead of 50. Then, you can check
that E(Y ) = 86.9 − 0.6x1. Thus, decreasing x2 by 10 units shifts the partial line
relating Y to x1 downward by 10β2 = 7.0 units (see Figure 11.3). The slope of −0.6
for the partial relationship remains the same, so the line is parallel to the original one.
Setting x2 at a variety of values yields a collection of parallel lines, each having slope
β1 = −0.6.
Similarly, setting x1 at a variety of values yields a collection of parallel lines,
each having slope 0.7, relating the mean of Y to x2. In other words, controlling for
education, the slope of the partial relationship between crime rate and urbanization
is β2 = 0.7.
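The parallel partial lines can be verified mechanically by plugging the fixed x2 values into the fitted equation (a sketch, using the coefficients quoted above):

```python
def mean_crime(x1, x2):
    # Fitted equation from the text: E(Y) = 58.9 - 0.6*x1 + 0.7*x2
    return 58.9 - 0.6 * x1 + 0.7 * x2

# Partial lines relating mean crime rate to education, at two fixed
# urbanization levels:
intercept_50 = mean_crime(0.0, 50.0)   # 58.9 + 0.7*50 = 93.9
intercept_40 = mean_crime(0.0, 40.0)   # 58.9 + 0.7*40 = 86.9
print(round(intercept_50, 1), round(intercept_40, 1))

# Decreasing x2 by 10 shifts the line down by 10*beta2 = 7.0 units,
# while the slope in x1 stays -0.6 at every fixed x2:
print(round(intercept_50 - intercept_40, 1))
print(round(mean_crime(1, 50) - mean_crime(0, 50), 1))
```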
In summary, education has an overall positive effect on crime rate, but it has
a negative effect when controlling for urbanization. The partial association has the
opposite direction from the bivariate association. This is called Simpson's paradox.
Figure 11.4 illustrates how this happens. It shows the scatterplot relating crime
rate to education, portraying the overall positive association between these variables.
The diagram circles the 19 counties that are highest in urbanization. That subset of
points, for which urbanization is nearly constant, shows a negative trend between crime
rate and education. The strong positive association between education and urbanization
is reflected by the fact that most of the circled counties, those highest on
urbanization, also have high values on education.
Figure 11.4: Scatterplot Relating Crime Rate and Education. The circled points are the counties highest on Urbanization. A regression line fitting the circled points has negative slope, even though the regression line passing through all
the points has positive slope (Simpson's paradox).
((Fig. 11.4 in 3e))
Interpretation of Regression Coefficients
We have seen that for a fixed value of x2, the equation E(Y ) = α + β1x1 + β2x2
simplifies to a straight-line equation in x1 with slope β1. The slope is the same for
each fixed value of x2. When we fix the value of x2, we are holding it constant: We are
controlling for x2. That’s the basis of the major difference between the interpretation
of slopes in multiple regression and in bivariate regression:
• In multiple regression, a slope describes the effect of an explanatory variable
while controlling effects of the other explanatory variables in the model.
• Bivariate regression has only a single explanatory variable. So, a slope in
bivariate regression describes the effect of that variable while ignoring all other
possible explanatory variables.
The parameter β1 measures the partial effect of x1 on Y , that is, the effect of a
one-unit increase in x1, holding x2 constant. The partial effect of x2 on Y , holding
x1 constant, has slope β2. Similarly, for the multiple regression model with several
predictors, the beta coefficient of a predictor describes the change in the mean of Y for
a one-unit increase in that predictor, controlling for the other variables in the model.
The parameter α represents the mean of Y when each explanatory variable equals 0.
The parameters β1, β2, . . . are called partial regression coefficients. The
adjective partial distinguishes these parameters from the regression coefficient β in
the bivariate model E(Y ) = α + βx, which ignores rather than controls effects of
other explanatory variables.
This multiple regression model assumes that the slope of the partial relationship
between Y and each predictor is identical for all combinations of values of the other
explanatory variables. This means that the model is appropriate when there is no
statistical interaction, in the sense of Section 10.3. If the true partial slope between
Y and x1 is very different at x2 = 50 than at x2 = 40, for example, we need a more
complex model. Section 11.5 will show this model.
A partial slope in a multiple regression model usually differs from the slope in the
bivariate model for that predictor, but it need not. With two predictors, the partial
slopes and bivariate slopes are equal if the correlation between X1 and X2 equals 0.
When X1 and X2 are independent causes of Y , the effect of X1 on Y does not change
when we control for X2.
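This claim can be illustrated numerically. The sketch below, on synthetic data (nothing here comes from the text's data sets), builds two predictors with exactly zero sample correlation and checks that the bivariate slope of Y on x1 matches the partial slope from the two-predictor fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Center both predictors and orthogonalize x2 against x1 so that the
# sample correlation between them is exactly 0:
x1 = x1 - x1.mean()
x2 = x2 - x2.mean()
x2 = x2 - (x1 @ x2) / (x1 @ x1) * x1

y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Bivariate slope of y on x1 (ignoring x2):
b_bivariate = (x1 @ (y - y.mean())) / (x1 @ x1)

# Partial slope of x1 from the two-predictor least squares fit:
X = np.column_stack([np.ones(n), x1, x2])
b_partial = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(abs(b_bivariate - b_partial) < 1e-8)  # True: the slopes coincide
```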
Prediction Equation and Residuals
Corresponding to the multiple regression equation, software finds a prediction equation
by estimating the model parameters using sample data. For simplicity of notation, so
far we’ve used just two predictors. In general, let k denote the number of predictors.
Notation for Prediction Equation
The prediction equation that estimates the multiple regression equation E(Y ) =
α + β1x1 + β2x2 + · · · + βkxk is denoted by ŷ = a + b1x1 + b2x2 + · · · + bkxk.
For multiple regression, it is almost imperative to use computer software to find the
prediction equation. The calculation formulas are complex and are not shown in this
text.
We get the predicted value of Y for a subject by substituting the x-values for that
subject into the prediction equation. Like the bivariate model, the multiple regression
model has residuals that measure prediction errors. For a subject with predicted
response ŷ and observed response y, the residual is y − ŷ. The next section shows an
example.
The sum of squared errors (SSE),

SSE = ∑(y − ŷ)2,
summarizes the closeness of fit of the prediction equation to the response data. Most
software calls SSE the residual sum of squares. The formula for SSE is the same
as in Chapter 9. The only difference is that the predicted value ŷ results from using
several explanatory variables instead of just a single predictor.
The parameter estimates in the prediction equation satisfy the least squares
criterion: The prediction equation has the smallest SSE value of all possible equations
of the form ŷ = a + b1x1 + · · · + bkxk.
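The least squares criterion can be seen in a small sketch with hypothetical data; `numpy.linalg.lstsq` performs the same minimization that regression software carries out:

```python
import numpy as np

# Hypothetical data with k = 2 predictors (illustration only):
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.0, 4.0, 8.0, 7.0, 11.0])

# Least squares estimates a, b1, b2 minimize SSE = sum((y - yhat)**2):
X = np.column_stack([np.ones(len(y)), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

yhat = a + b1 * x1 + b2 * x2    # predicted values
residuals = y - yhat            # prediction errors
sse = np.sum(residuals ** 2)    # residual sum of squares

# Perturbing the fitted coefficients can only increase SSE:
sse_perturbed = np.sum((y - (yhat + 0.1 * x1)) ** 2)
print(sse <= sse_perturbed)  # True
```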
11.2 Example with Multiple Regression Computer Output
We illustrate the methods of this chapter with the data introduced in the following
example:
Example 11.2 Multiple Regression for Mental Health Study
A study in Alachua County, Florida, investigated the relationship between certain
mental health indices and several explanatory variables. Primary interest focused on
an index of mental impairment, which incorporates various dimensions of psychiatric
symptoms, including aspects of anxiety and depression. This measure, which is the
response variable Y , ranged from 17 to 41 in the sample. Higher scores indicate
greater psychiatric impairment.
The two explanatory variables used here are X1 = life events score and X2 =
socioeconomic status (SES). The life events score is a composite measure of both
the number and severity of major life events the subject experienced within the past
three years. These events range from severe personal disruptions such as a death in the
family, a jail sentence, or an extramarital affair, to less severe events such as getting
a new job, the birth of a child, moving within the same city, or having a child marry.
This measure1 ranged from 3 to 97 in the sample. A high score represents a greater
number and/or greater severity of these life events. The SES score is a composite
index based on occupation, income, and education. Measured on a standard scale, it
ranged from 0 to 100. The higher the score, the higher the status.
Table 11.1 shows data on the three variables for a random sample of 40 adults in
the county. [These data are based on a larger survey. The authors thank Dr. Charles
Holzer for permission to use the study as the basis of this example.] Table 11.2
summarizes the sample means and standard deviations of the three variables.
Table 11.1: Scores on Y = Mental Impairment, X1 = Life Events, and X2 = Socioeconomic Status
Y X1 X2    Y X1 X2    Y X1 X2
17 46 84   26 50 40   30 44 53
19 39 97   26 48 52   31 35 38
20 27 24   26 45 61   31 95 29
20  3 85   27 21 45   31 63 53
20 10 15   27 55 88   31 42  7
21 44 55   27 45 56   32 38 32
21 37 78   27 60 70   33 45 55
22 35 91   28 97 89   34 70 58
22 78 60   28 37 50   34 57 16
23 32 74   28 30 90   34 40 29
24 33 67   28 13 56   41 49  3
24 18 39   28 40 56   41 89 75
25 81 87   29  5 40
26 22 95   30 59 72
Table 11.2: Estimated Means and Standard Deviations of Mental Impairment, Life Events, and Socioeconomic Status (SES)
Variable Mean Standard Deviation
Mental Impairment   27.30   5.46
Life Events         44.42   22.62
SES                 56.60   25.28
1. Developed by E. Paykel et al., Arch. Gen. Psychiatry, vol. 25, 1971, pp. 340–347.
Scatterplot Matrix for Bivariate Relationships
Plots of the data provide an informal check of whether the relationships are linear.
Most software can construct scatterplots on a single diagram for each pair of the
variables. Figure 11.5 shows the plots for the variables from Table 11.1. This type of
plot is called a scatterplot matrix. Like a correlation matrix, it shows each pair of
variables twice. In one plot, a variable is on the y-axis and in one it is on the x-axis.
Mental impairment (the response variable) is on the y-axis for the plots in the first
row of Figure 11.5, so these are the plots of interest to us. The plots show no evidence
of nonlinearity, and models with linear effects seem appropriate. The plots suggest
that life events has a mild positive effect and SES has a mild negative effect on mental
impairment.
Partial Plots for Partial Relationships
The multiple regression model states that each predictor has a linear effect with
common slope, controlling for the other predictors. To check this, we could use software to
plot Y versus each predictor, for subsets of points that are nearly constant on the other
predictors. With a single control variable, for example, we could sort the observations
into four groups using the quartiles as boundaries, and then either construct four
separate scatterplots or mark the observations on a single scatterplot according to their
group. With several control variables, however, keeping them all nearly constant can
reduce the sample to relatively few observations. A more informative single picture is
provided by the partial regression plot. It displays the relationship between the
response variable and an explanatory variable after removing the effects of the other
predictors in the multiple regression model. It does this by plotting the residuals from
models using these two variables as responses and the other explanatory variables as
predictors.
For example, here’s how to find the partial regression plot for the effect of x1
when the multiple regression model also has explanatory variables x2 and x3. Find
the residuals from the model using x2 and x3 to predict Y . Also find the residuals
from the model using x2 and x3 to predict x1. Then plot the residuals from the
first analysis (on the y-axis) against the residuals from the second analysis. For these
residuals, the effects of x2 and x3 are removed. The least squares slope for the points
in this plot is necessarily the same as the estimated partial slope b1 for the multiple
regression model.
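A sketch of this construction on synthetic data (not the text's data) confirms that the residual-on-residual slope reproduces b1 from the full fit; this equality is exact for least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

def residuals(target, predictors):
    """Residuals from a least squares fit of target on the predictors (with intercept)."""
    Z = np.column_stack([np.ones(len(target))] + predictors)
    coef = np.linalg.lstsq(Z, target, rcond=None)[0]
    return target - Z @ coef

e_y  = residuals(y,  [x2, x3])   # y with the effects of x2, x3 removed
e_x1 = residuals(x1, [x2, x3])   # x1 with the effects of x2, x3 removed

# Least squares slope of e_y on e_x1 (both residual sets have mean 0):
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)

# Partial slope b1 from the full three-predictor fit:
X_full = np.column_stack([np.ones(n), x1, x2, x3])
b1_full = np.linalg.lstsq(X_full, y, rcond=None)[0][1]

print(abs(slope - b1_full) < 1e-8)  # True: the two slopes agree
```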
Figure 11.6 shows a partial regression plot (from SPSS) for Y = mental impairment
and x1 = life events, controlling for x2 = SES. It plots the residuals on the y-axis
from the model E(Y ) = α + βx2 against the residuals on the x-axis from the model
E(X1) = α + βx2. Both axes have negative and positive values, because they refer
to residuals. Recall that residuals (prediction errors) can be positive or negative,
and have a mean of 0. Figure 11.6 suggests that the partial effect of life events is
approximately linear and is positive.
Figure 11.7 shows the partial regression plot for SES. It shows that its partial effect
is also approximately linear but is negative. It is simple to obtain partial regression
plots with standard software such as SPSS. (See the appendix.)
Sample Computer Printouts
Tables 11.3 and 11.4 are SPSS printouts of the coefficients table for the bivariate
relationships between mental impairment and the separate explanatory variables. The
estimated regression coefficients fall in the column labelled ‘B’. The prediction
equations are

ŷ = 23.31 + 0.090x1 and ŷ = 32.17 − 0.086x2.

In the sample, mental impairment is positively related to life events, since the
coefficient of x1 (0.090) is positive. The greater the number and severity of life events
in the previous three years, the higher the mental impairment (i.e., the poorer the
mental health) tends to be. Mental impairment is negatively related to socioeconomic
status: the greater the SES level, the lower the mental impairment tends to be. The
correlations between mental impairment and the explanatory variables are modest, 0.372
for life events and −0.399 for SES (listed by SPSS as ‘Standardized Coefficients’; the
‘Beta’ label is misleading and refers to the alternate term beta weights for standardized
regression coefficients).
Table 11.3: Bivariate Regression Analysis for Y = Mental Impairment (IMPAIR) and x1 = Life Events (LIFE)
Coefficients(a)
Model Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
1 (Constant) 23.309 1.807 12.901 .000
LIFE .090 .036 .372 2.472 .018
a Dependent Variable: IMPAIR
Table 11.5 shows part of a SPSS printout for the multiple regression model E(Y ) =
α + β1x1 + β2x2. The prediction equation is

ŷ = a + b1x1 + b2x2 = 28.230 + 0.103x1 − 0.097x2.
Controlling for SES, the sample relationship between mental impairment and life
events is positive, since the coefficient of life events (b1 = 0.103) is positive. The
estimated mean of mental impairment increases by about 0.1 for every 1-unit increase
in the life events score, controlling for SES. Since b2 = −0.097, a negative association
exists between mental impairment and SES, controlling for life events. For example,
Table 11.4: Bivariate Regression Analysis for Y = Mental Impairment and x2
= Socioeconomic Status (SES)
Coefficients(a)
Model Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
1 (Constant) 32.172 1.988 16.186 .000
SES -.086 .032 -.399 -2.679 .011
a Dependent Variable: IMPAIR
over the 100-unit range of potential SES values (from a minimum of 0 to a maximum
of 100), the estimated mean mental impairment changes by 100(−0.097) = −9.7.
Since mental impairment ranges only from 17 to 41 with a standard deviation of 5.5,
a decrease of 9.7 points in the mean is noteworthy.
Table 11.5: Fit of Multiple Regression Model for Y = Mental Impairment, x1 = Life Events (LIFE), and x2 = Socioeconomic Status (SES)
Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
(Constant) 28.230 2.174 12.984 .000
LIFE .103 .032 .428 3.177 .003
SES -.097 .029 -.451 -3.351 .002
Dependent Variable: IMPAIR
From Table 11.1, the first subject in the sample had y = 17, x1 = 46, and x2 = 84.
This subject's predicted mental impairment is

ŷ = 28.230 + 0.103(46) − 0.097(84) = 24.8.

The prediction error (residual) is y − ŷ = 17 − 24.8 = −7.8.
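The arithmetic for this subject can be checked directly with the estimates from Table 11.5:

```python
a, b1, b2 = 28.230, 0.103, -0.097      # estimates from Table 11.5

def predict_impairment(x1, x2):
    """Predicted mental impairment for life events score x1 and SES x2."""
    return a + b1 * x1 + b2 * x2

yhat = predict_impairment(46, 84)       # first subject in Table 11.1
residual = 17 - yhat                    # observed y = 17
print(round(yhat, 1), round(residual, 1))  # 24.8 -7.8
```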
Table 11.6 summarizes some results of the regression analyses. It shows standard
errors in parentheses below the parameter estimates. The partial slopes for the
multiple regression model are similar to the slopes for the bivariate models. In each case,
the introduction of the second predictor does little to alter the effect of the other
one. This suggests that these predictors may have nearly independent sample effects
on Y . In fact, the sample correlation between X1 and X2 is only 0.123. The next
section shows how to measure the joint association of the explanatory variables with
the response variable, and shows how to interpret the R2 value listed for the multiple
regression model.
Table 11.6: Summary of Regression Models for Mental Impairment
                 Predictors in Regression Model
Effect           Multiple    Life Events    SES
Intercept        28.230      23.309         32.172
Life events       0.103       0.090          —
                 (0.032)     (0.036)
SES              -0.097       —             -0.086
                 (0.029)                    (0.032)
R2                0.339       0.138          0.159
(n)              (40)        (40)           (40)
11.3 Multiple Correlation and R2
The correlation r and its square describe strength of linear association for bivariate
relationships. This section presents analogous measures for the multiple regression
model. They describe the strength of association between Y and the set of explanatory
variables acting together as predictors in the model.
The Multiple Correlation
The explanatory variables collectively are strongly associated with Y if the observed
y-values correlate highly with the ŷ-values from the prediction equation. The correlation
between the observed and predicted values summarizes this association.
Multiple Correlation
The multiple correlation for a regression model is the correlation between the observed y-values and the predicted ŷ-values.
For each subject, the prediction equation provides a predicted value ŷ. So, each
subject has a y-value and a ŷ-value. For example, above we saw that the first subject
in the sample had y = 17 and ŷ = 24.8. For the first three subjects in Table 11.1, the
observed and predicted y-values are:
y     ŷ
17    24.8
19    22.8
20    28.7

The sample correlation computed between the y- and ŷ-values is the multiple correlation. It is denoted by R.
The predicted values cannot correlate negatively with the observed values. The
predictions must be at least as good as the sample mean ȳ, which is the prediction
when all the partial slopes equal 0, and ȳ has zero correlation with y. So, R always falls
between 0 and 1. In this respect, the correlation between y and ŷ differs from the
correlation between y and a predictor x, which falls between −1 and +1. The larger
the multiple correlation R, the better the predictions of y by the set of explanatory
variables.
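The definition and the 0-to-1 range are easy to verify on synthetic data (a sketch; the data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 5.0 + 1.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Fit the two-predictor model and form the predicted values:
X = np.column_stack([np.ones(n), x1, x2])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

# The multiple correlation is the ordinary correlation between y and yhat:
R = np.corrcoef(y, yhat)[0, 1]
print(0.0 <= R <= 1.0)  # True: R cannot be negative
```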
R2: The Coefficient of Multiple Determination
Another measure uses the proportional reduction in error concept, generalizing r2 for
bivariate models. This measure summarizes the relative improvement in predictions
using the prediction equation instead of ȳ. It has the following elements:

Rule 1 (Predict Y without using x1, . . . , xk): The best predictor is then the sample
mean, ȳ.

Rule 2 (Predict Y using x1, . . . , xk): The best predictor is the prediction equation
ŷ = a + b1x1 + b2x2 + · · · + bkxk.
Prediction Errors: The prediction error for a subject is the difference between
the observed and predicted values of y. With rule 1, the error is y − ȳ. With rule 2,
it is the residual y − ŷ. In either case, we summarize the error by the sum of the
squared prediction errors. For rule 1, this is TSS = ∑(y − ȳ)2, called the total sum
of squares. For rule 2, it is SSE = ∑(y − ŷ)2, the sum of squared errors using the
prediction equation, called the residual sum of squares.
Definition of Measure: The proportional reduction in error from using the prediction
equation ŷ = a + b1x1 + · · · + bkxk instead of ȳ to predict y is called the
coefficient of multiple determination, or for simplicity, R-squared.
R-squared: The Coefficient of Multiple Determination
R2 = (TSS − SSE)/TSS = [∑(y − ȳ)2 − ∑(y − ŷ)2] / ∑(y − ȳ)2
R2 measures the proportion of the total variation in y that is explained by the
predictive power of all the explanatory variables, through the multiple regression
model. The symbol reflects that it is the square of the multiple correlation. The
uppercase notation R2 distinguishes this measure from r2 for the bivariate model.
Their formulas are identical, and r2 is the special case of R2 applied to a regression
model with one explanatory variable. For the multiple regression model to be useful
for prediction, it should provide improved predictions relative not only to ȳ but also
to the separate bivariate models for Y and each explanatory variable.
Example 11.3 Multiple Correlation and R2 for Mental Impairment
For the data on Y = mental impairment, X1 = life events, and X2 = socioeconomic
status, introduced in Example 11.2, the prediction equation is ŷ =
28.23 + 0.103x1 − 0.097x2. Table 11.5 showed some output for this model. Software
also reports an ANOVA table with the sums of squares and a model summary with
R and R2. Table 11.7 shows some SPSS output.
Table 11.7: ANOVA Table and Model Summary for Regression of Mental Impairment (IMPAIR) on Life Events (LIFE) and Socioeconomic Status (SES)
ANOVA
Sum of
Squares df Mean Square F Sig.
Regression 394.238 2 197.119 9.495 .000
Residual 768.162 37 20.761
Total 1162.400 39
Model Summary
R R Square Adjusted R Square Std. Error of the Estimate
.582 .339 .303 4.556
Predictors: (Constant), SES, LIFE
Dependent Variable: IMPAIR
From the ‘Sum of Squares’ column, the total sum of squares is TSS = ∑(y − ȳ)2 =
1162.4, and the residual sum of squares from using the prediction equation to predict
y is SSE = ∑(y − ŷ)2 = 768.2. Thus,

R2 = (TSS − SSE)/TSS = (1162.4 − 768.2)/1162.4 = 0.339.
Using life events and SES together to predict mental impairment provides a 33.9%
reduction in the prediction error relative to using only ȳ. The multiple regression
model provides a substantially larger reduction in error than either bivariate model
(Table 11.6 reported r2 values of 0.138 and 0.159 for them). It is more useful than
those models for predictive purposes.
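The R2 computation from the ANOVA sums of squares is simple arithmetic:

```python
tss = 1162.4   # total sum of squares (ANOVA table, 'Total' row)
sse = 768.2    # residual sum of squares (ANOVA table, 'Residual' row)

r_squared = (tss - sse) / tss
print(round(r_squared, 3))  # 0.339

# The multiple regression model beats both bivariate models,
# whose r2 values were 0.138 (life events) and 0.159 (SES):
print(r_squared > max(0.138, 0.159))  # True
```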
The multiple correlation between mental impairment and the two explanatory
variables is R = +√0.339 = 0.582. This equals the correlation between the observed
y-values and the predicted ŷ-values for the model.
SPSS reports R and R2 in a separate ‘Model Summary’ table, as Table 11.7 shows.
Most software also reports an adjusted version of R2 that is a less biased estimate
of the population value. Exercise 63 defines this measure, and Table 11.7 reports its
value of 0.303.
Properties of R and R2
The properties of R2 are similar to those of r2 for bivariate models.
• R2 falls between 0 and 1.
• The larger the value of R2, the better the set of explanatory variables (x1, . . . , xk)
collectively predict y.
• R2 = 1 only when all the residuals are 0, that is, when all ŷ = y, so that
SSE = 0. In that case, the prediction equation passes through all the data
points.
• R2 = 0 when the predictions do not vary as any of the x-values vary. In that
case, b1 = b2 = · · · = bk = 0, and ŷ is identical to ȳ, since the explanatory
variables do not add any predictive power. When this happens, the correlation
between y and each explanatory variable equals 0.
• R2 cannot decrease when we add an explanatory variable to the model. It is
impossible to explain less variation in y by adding explanatory variables to a
regression model.
• R2 for the multiple regression model is at least as large as the r2-values for the
separate bivariate models. That is, R2 is at least as large as the r2 value for Y
as a linear function of x1 alone, the r2 value for Y as a linear function of x2
alone, and so forth.
Properties of the multiple correlation R follow directly from the ones for R2, since
R is the positive square root of R2. For instance, the multiple correlation for the
model E(Y ) = α + β1x1 + β2x2 + β3x3 is at least as large as the multiple correlation
for the model E(Y ) = α + β1x1 + β2x2.
The numerator of R2, TSS − SSE, summarizes the variation in Y explained by
the multiple regression model. This difference, which equals ∑(ŷ − ȳ)2, is called the
regression sum of squares. The ANOVA table in Table 11.7 lists the regression
sum of squares as 394.2. (Some software, such as SAS, labels this the ‘Model’ sum
of squares.) The total sum of squares TSS of the y-values about y partitions into
the variation explained by the regression model (regression sum of squares) plus the
variation not explained by the model (the residual sum of squares, SSE).
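The partition can be checked against the values in Table 11.7:

```python
regression_ss = 394.238   # 'Regression' row of the ANOVA table
residual_ss   = 768.162   # 'Residual' row (SSE)
total_ss      = 1162.400  # 'Total' row (TSS)

# TSS partitions exactly into explained plus unexplained variation:
print(abs((regression_ss + residual_ss) - total_ss) < 1e-9)  # True

# R2 is the explained share of the total variation:
print(round(regression_ss / total_ss, 3))  # 0.339
```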
Multicollinearity with Many Explanatory Variables
When there are many explanatory variables but the correlations among them are
strong, once you have included a few of them in the model, R2 usually doesn’t increase
much more when you add additional ones. For example, for the “house selling price”
data set at the text website (introduced in Example 9.10), r2 is 0.71 with the house’s
tax assessment as a predictor of selling price. Then, R2 increases to 0.77 when we
add house size as a second predictor. But then it increases only to 0.79 when we
add number of bathrooms, number of bedrooms, and whether the house is new as
additional predictors.
When R2 does not increase much, this does not mean that the additional variables
are uncorrelated with Y . It means merely that they don’t add much new power for
predicting Y , given the values of the predictors already in the model. These other
variables may have small associations with Y , given the variables already in the model.
This often happens in social science research when the explanatory variables are highly
correlated, no one having much unique explanatory power. Section 14.3 discusses this
condition, called multicollinearity.
Figure 11.8, which portrays the portion of the total variability in Y explained by
each of three predictors, shows a common occurrence. The size of the set for a predictor
in this figure represents the size of its r2-value in predicting Y . The amount a set for
a predictor overlaps with the set for another predictor represents its association with
that predictor. The part of the set for a predictor that does not overlap with other
sets represents the part of the variability in Y explained uniquely by that predictor.
In Figure 11.8, all three predictors have moderate associations with Y , and together
they explain considerable variation. Once x1 and x2 are in the model, however, x3
explains little additional variation in Y , because of its strong correlations with x1 and
x2. Because of this overlap, R2 increases only slightly when x3 is added to a model
already containing x1 and x2.
For predictive purposes, we gain little by adding explanatory variables to a model
that are strongly correlated with ones already in the model, since R2 will not increase
much. Ideally, we should use explanatory variables having weak correlations with
each other but strong correlations with Y . In practice, this is not always possible,
especially if we want to include certain variables in the model for theoretical reasons.
In practice, the sample size you need to do a multiple regression well gets larger
when you want to use more explanatory variables. Technical difficulties caused by
multicollinearity are less severe for larger sample sizes. Ideally, the sample size should
be at least about 10 times the number of explanatory variables (for example, at least
about 40 for 4 explanatory variables).
11.4 Inferences for Multiple Regression Coefficients
The multiple regression function
E(Y ) = α + β1x1 + · · · + βkxk
describes the relationship between the explanatory variables and the mean of the
response variable. For particular values of the explanatory variables, α + β1x1 + · · · + βkxk represents the mean of Y for the population having those values.
To make inferences about the parameters, we formulate the entire multiple regres-
sion model. This consists of this equation together with a set of assumptions:
• The population distribution of Y is normal, for each combination of values of
x1, . . . , xk.
• The standard deviation, σ, of the conditional distribution of responses on Y is
the same at each combination of values of x1, . . . , xk.
• The sample is randomly selected.
Under these assumptions, the true sampling distributions exactly equal those
quoted in this section. In practice, the assumptions are never satisfied perfectly.
Two-sided inferences are robust to the normality and common σ assumptions. More
important are the assumptions of randomization and that the regression function de-
scribes well how the mean of Y depends on the explanatory variables. We’ll see ways
to check the latter assumption in Section 14.2.
Two types of significance tests are used in multiple regression. The first is a global
test of independence. It checks whether any of the explanatory variables are statisti-
cally related to Y . The second studies the partial regression coefficients individually,
to assess which explanatory variables have significant partial effects on Y .
Testing the Collective Influence of the Explanatory Variables
Do the explanatory variables collectively have a statistically significant effect on the
response variable? We check this by testing
H0 : β1 = β2 = · · · = βk = 0.
This states that the mean of Y does not depend on the values of x1, . . . , xk. Under
the inference assumptions, this states that Y is statistically independent of all k
explanatory variables.
The alternative hypothesis is
Ha : At least one βi ≠ 0.
This states that at least one explanatory variable is related to Y , controlling for the
others. The test judges whether using x1, . . . , xk together to predict Y , with the prediction equation ŷ = a + b1x1 + · · · + bkxk, is better than using ȳ alone.
These hypotheses about {βi} are equivalent to
H0 : Population multiple correlation = 0 Ha : Population multiple correlation > 0.
The equivalence occurs because the multiple correlation equals 0 only in those situa-
tions in which all the partial regression coefficients equal 0. Also, H0 is equivalent to
H0: population R-squared = 0.
For these hypotheses about the k predictors, the test statistic equals

F = (R²/k) / [(1 − R²)/(n − (k + 1))].

The sampling distribution of this statistic is called the F distribution. We next study this distribution and its properties.
The F Distribution
The symbol for the F test statistic and its distribution honors the most eminent
statistician in history, R. A. Fisher, who discovered the F distribution in 1922. Like
the chi-squared distribution, the F distribution can take only nonnegative values and
it is somewhat skewed to the right. Figure 11.9 illustrates.
The shape of the F distribution is determined by two degrees of freedom terms,
denoted by df1 and df2:
df1 = k, the number of explanatory variables in the model.
df2 = n − (k + 1) = n − number of parameters in regression equation.
The first of these, df1 = k, is the divisor of the numerator term (R2) in the F test
statistic. The second, df2 = n−(k+1), is the divisor of the denominator term (1−R2).
The number of parameters in the multiple regression model is k + 1, representing the
k beta terms and the alpha term.
The mean of the F distribution is approximately equal to 1.² The larger the R²
value, the larger the ratio R2/(1 − R2), and the larger the F test statistic becomes.
Thus, larger values of the F test statistic provide stronger evidence against H0. Under
the presumption that H0 is true, the P -value is the probability the F test statistic
is larger than the observed F value. This is the right-tail probability under the F
distribution beyond the observed F -value, as Figure 11.9 shows.
Table D at the end of the text lists the F scores having P -values of 0.05, 0.01,
and 0.001, for various combinations of df1 and df2. This table allows us to determine
whether P > 0.05, 0.01 < P < 0.05, 0.001 < P < 0.01, or P < 0.001. Software for
regression reports the actual P -value.
Example 11.4 F Test for Mental Health Impairment Data
In Example 11.2, we used multiple regression for n = 40 observations on Y =
mental impairment, with k = 2 explanatory variables, life events and SES. The null
hypothesis that mental impairment is statistically independent of life events and SES
is H0: β1 = β2 = 0.
²It equals df2/(df2 − 2), which is usually close to 1 unless n is quite small.
In Example 11.3 we found that this model has R2 = 0.339. The F test statistic
value is
F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (0.339/2) / (0.661/[40 − (2 + 1)]) = 9.5.
The two degrees of freedom terms for the F distribution are df1 = k = 2 and df2 =
n − (k + 1) = 40 − 3 = 37, the two divisors in this statistic.
From Table D, when df1 = 2 and df2 = 37, the F -value with right-tail probability
of 0.001 falls between 8.77 and 8.25. Since the observed F test statistic of 9.5 falls
above these two, it is farther out in the tail and has smaller tail probability than 0.001.
Thus, the P -value is P < 0.001. Part of the SPSS printout in Table 11.7 showed the
ANOVA table
Sum of Squares df Mean Square F Sig.
Regression 394.238 2 197.119 9.495 .000
Residual 768.162 37 20.761
in which we see the F statistic. The P -value, which rounded to three decimal places
is P = 0.000, appears under the heading ‘Sig’ in the ANOVA table.
This extremely small P -value provides strong evidence against H0. It suggests
that at least one of the explanatory variables is related to mental impairment. Equiv-
alently, we can conclude that the population multiple correlation and R-squared are
positive. So, we obtain significantly better predictions of Y using the multiple regression equation than by using the sample mean ȳ.
□
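The F computation in this example is easy to reproduce numerically. The sketch below (Python with scipy — our choice here, since the text itself uses SPSS and SAS) computes the F statistic from R² and obtains the exact P-value as a right-tail probability:

```python
from scipy.stats import f

def global_f_test(r_squared, k, n):
    """F statistic and P-value for H0: beta_1 = ... = beta_k = 0."""
    df1, df2 = k, n - (k + 1)
    F = (r_squared / df1) / ((1 - r_squared) / df2)
    p_value = f.sf(F, df1, df2)  # right-tail probability beyond observed F
    return F, p_value

# Mental impairment example: R^2 = 0.339 with k = 2 predictors, n = 40
F, p = global_f_test(0.339, 2, 40)
print(round(F, 1), p < 0.001)  # 9.5 True
```

Regression software performs this same calculation, which is why printouts can report an exact P-value rather than the bracketed values available from Table D.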
Normally, unless the sample size is small and the associations are weak, this F
test has a small P -value. If we choose variables wisely for a study, at least one of
them should have some explanatory power.
Inferences for Individual Regression Coefficients
Suppose the P -value is small for the F test that all the regression coefficients equal 0.
This does not imply that every explanatory variable has an effect on Y (controlling
for the other explanatory variables in the model), but merely that at least one of them
has an effect. More narrowly focused analyses judge which partial effects are nonzero
and estimate the sizes of those effects. These inferences make the same assumptions as
the F test, the most important being randomization and that the regression function
describes well how the mean of Y depends on the explanatory variables.
Consider an arbitrary explanatory variable xi, with coefficient βi in the multiple
regression model. The test for its partial effect on Y has H0: βi = 0. If βi = 0,
the mean of Y is identical for all values of xi, controlling for the other explanatory
variables in the model. The alternative can be two-sided, Ha: βi ≠ 0, or one-sided,
Ha: βi > 0 or Ha: βi < 0, to predict the direction of the partial effect.
The test statistic for H0: βi = 0, using sample estimate bi of βi, is
t = bi/se,
where se is the standard error of bi. As usual, the t test statistic takes the best
estimate (bi) of the parameter (βi), subtracts the H0 value of the parameter (0), and
divides by the standard error. The formula for se is complex, but software provides
its value. If H0 is true and the model assumptions hold, the t statistic has the t
distribution with df = n − (k + 1). The df value is the same as df2 in the F test.
It is more informative to estimate the size of a partial effect than to test whether it
is zero. Recall that βi represents the change in the mean of Y for a one-unit increase
in xi, controlling for the other variables. A confidence interval for βi is
bi ± t(se).
The t score comes from the t table, with df = n − (k + 1). For example, a 95% confidence interval for the partial effect of x1 is b1 ± t.025(se).
Example 11.5 Inferences for Separate Predictors of Mental Impairment
For the multiple regression model for Y = mental impairment, X1 = life events,
and X2 = SES,
E(Y ) = α + β1x1 + β2x2
let’s consider the effect of life events. The hypothesis that mental impairment is
statistically independent of life events, controlling for SES, is H0: β1 = 0. If H0 is
true, the multiple regression equation reduces to E(Y ) = α + β2x2. If H0 is false,
then β1 ≠ 0 and the full model provides a better fit than the bivariate model.
Table 11.5 contained the results,
B Std. Error t Sig.
(Constant) 28.230 2.174 12.984 .000
LIFE .103 .032 3.177 .003
SES -.097 .029 -3.351 .002
This tells us that the point estimate of β1 is b1 = 0.103 and has standard error
se = 0.032. The test statistic equals
t = b1/se = 0.103/0.032 = 3.2.
This appears under the heading ‘t’ in the table in the row for the variable LIFE. The
statistic has df = n−(k+1) = 40−3 = 37. The P -value appears under ‘Sig’ in the row
for LIFE. It is 0.003, the probability that the t statistic exceeds 3.2 in absolute value.
There is strong evidence that mental impairment is related to life events, controlling
for SES.
A 95% confidence interval for β1 uses t0.025 = 2.026, the t-value for df = 37 having
a probability of 0.05/2 = 0.025 in each tail. This interval equals
b1 ± t0.025(se) = 0.103 ± 2.026(0.032), which is (0.04, 0.17).
Controlling for SES, we are 95% confident that the change in mean mental impairment
per one-unit increase in life events falls between 0.04 and 0.17. The interval does not
contain 0. This is in agreement with rejecting H0: β1 = 0 in favor of Ha: β1 ≠ 0 at
the α = 0.05 level.
Since this interval contains only positive numbers, the relationship between men-
tal impairment and life events is positive, controlling for SES. It may be simpler to
interpret the interval (0.04, 0.17) by noting that an increase of 100 units in life events
corresponds to anywhere from a 100(0.04) = 4 to a 100(0.17) = 17 unit increase in
mean mental impairment. The interval is relatively wide, because of the small sample
size.
□
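The arithmetic in this example is simple enough to check directly. A minimal Python sketch, taking the critical value t.025 = 2.026 for df = 37 from the t table as the text does:

```python
b1, se = 0.103, 0.032     # estimate and standard error for LIFE (Table 11.5)
t_stat = b1 / se          # t statistic for H0: beta_1 = 0
t_crit = 2.026            # t_.025 for df = 37, from the t table
lower = b1 - t_crit * se
upper = b1 + t_crit * se

print(round(t_stat, 2))                  # 3.22
print(round(lower, 2), round(upper, 2))  # 0.04 0.17
```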
How is the t test for a partial regression coefficient different from the t test of
H0: β = 0 for the bivariate model, E(Y ) = α + βx, studied in Section 9.5? That t
test evaluates whether Y and X are associated, ignoring other variables, because it
applies to the bivariate model. By contrast, the test just presented evaluates whether
variables are associated, controlling for other variables.
A note of caution: Suppose there is multicollinearity, that is, a lot of overlap
among the explanatory variables in the sense that any one is well predicted by the
others. Then, possibly none of the individual partial effects has a small P -value, even
if R2 is large and a large F statistic occurs in the global test for the βs. Any particular
variable may explain uniquely little of the variation in Y , even though together the
variables explain a lot of the variation.
Variability and Mean Squares in the ANOVA Table∗
The precision of the least squares estimates relates to the size of the conditional
standard deviation σ that measures variability of y at fixed values of the predictors.
The smaller the variability of y-values about the regression equation, the smaller the
standard errors become. The estimate of σ is
s = √[Σ(y − ŷ)² / (n − (k + 1))] = √(SSE/df).
The degrees of freedom value is also df for t inferences for regression coefficients, and
it is df2 for the F test about the collective effect of the predictors. (When a model has
only k = 1 predictor, df simplifies to n − 2, the term in the s formula of Section 9.3.)
Part of the SPSS printout in Table 11.7 showed the ANOVA table
Sum of Squares df Mean Square F Sig.
Regression 394.238 2 197.119 9.495 .000
Residual 768.162 37 20.761
containing the sums of squares for the multiple regression model with the mental
impairment data. We see that SSE = 768.2. Since n = 40 for k = 2 predictors, we
have df = n − (k + 1) = 40 − 3 = 37 and
s = √(SSE/df) = √(768.2/37) = √20.76 = 4.56.
If the conditional distributions are approximately bell-shaped, nearly all mental im-
pairment scores fall within about 14 units (3 standard deviations) of the mean specified
by the regression function.
SPSS reports the conditional standard deviation under the heading ‘Std. Error of
the Estimate' in the Model Summary table that also has the R and R² values (see
Table 11.7). This is a poor choice of label by SPSS, because s refers to the variability
in Y -values, not the variability of a sampling distribution of an estimator.
The square of s, which estimates the conditional variance, is called the mean
square error , often abbreviated by MSE. Software shows it in the ANOVA table in
the ‘Mean Square’ column, in the row labeled ‘Residual’ (or ‘Error’ in some software).
For example, MSE = 20.76 in the above table. Some software (such as SAS) better
labels the conditional standard deviation estimate s as ‘Root MSE,’ because it is the
square root of the mean square error.
The F Statistic Is a Ratio of Mean Squares∗
An alternative formula for the F test statistic for testing H0 : β1 = · · · = βk = 0 uses
the two mean squares in the ANOVA table. Specifically,
F = Regression mean square / Residual mean square (MSE) = 197.1/20.8 = 9.5.
This gives the same value as the F test statistic formula based on R2.
The regression mean square equals the regression sum of squares divided by its
degrees of freedom. The df equals k, the number of explanatory variables in the
model, which is df1 for the F test. On the printout shown above, the regression mean
square equals Regression SS/df1 = 394.2/2 = 197.1.
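All of these ANOVA-table quantities are linked by simple arithmetic. A quick Python check using the sums of squares from the printout:

```python
import math

# ANOVA table for the mental impairment model (Table 11.7)
ss_regression, df1 = 394.238, 2
sse, df2 = 768.162, 37

ms_regression = ss_regression / df1  # regression mean square
mse = sse / df2                      # mean square error, estimates sigma^2
s = math.sqrt(mse)                   # estimate of conditional std. dev. sigma
F = ms_regression / mse              # F statistic as a ratio of mean squares

print(round(ms_regression, 1), round(mse, 2), round(s, 2), round(F, 1))
# 197.1 20.76 4.56 9.5
```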
Relationship Between F and t Statistics∗
We’ve seen that the F distribution is used to test that all partial regression coefficients
equal 0. Some regression software also lists F test statistics instead of t test statistics
for the tests about the individual regression coefficients. The two statistics are related
and have the same P -values. The square of the t statistic for testing that a partial
regression coefficient equals 0 is an F test statistic having the F distribution with
df1 = 1 and df2 = n − (k + 1).
To illustrate, in Example 11.5, for H0: β1 = 0 and Ha: β1 ≠ 0, the test statistic was t = 3.18 with df = 37. Alternatively, we could use F = t² = 3.18² = 10.1, which has the F distribution with df1 = 1 and df2 = 37. The P-value for this F value is 0.003, the same as Table 11.5 reports for the two-sided t test.
In general, if a statistic has the t distribution with d degrees of freedom, then
the square of that statistic has the F distribution with df1 = 1 and df2 = d. A
disadvantage of the F approach is that it lacks information about the direction of the
association. It cannot be used for one-sided alternative hypotheses.
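The equivalence is easy to verify numerically. This sketch (Python with scipy, an assumption on our part; the text's own calculations use regression software and tables) compares the two-sided t P-value with the right-tail F P-value:

```python
from scipy.stats import t, f

t_stat, df = 3.18, 37
F_stat = t_stat ** 2        # about 10.1, referred to F with df1 = 1, df2 = 37

p_t = 2 * t.sf(t_stat, df)  # two-sided t test P-value
p_f = f.sf(F_stat, 1, df)   # right-tail F P-value

print(round(p_t, 3), round(p_f, 3))  # the two P-values agree
```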
11.5 Interaction between Predictors in their Effects
The multiple regression equation
E(Y ) = α + β1x1 + β2x2 + · · · + βkxk
assumes that the partial relationship between Y and each xi is linear and that the
slope βi of that relationship is identical for all values of the other explanatory variables.
This implies a parallelism of lines relating the two variables, at various values of the
other variables, as Figure 11.3 illustrated.
This model is sometimes too simple to be adequate. Often, there is interaction,
with the relationship between two variables changing according to the value of a third
variable. Section 10.3 introduced this concept.
Interaction
For quantitative variables, interaction exists between two explanatory variables in their
effects on Y when the effect of one variable changes as the level of the other variable
changes.
For example, suppose the relationship between x1 and the mean of Y is E(Y ) =
2 + 5x1 when x2 = 0, it is E(Y ) = 4 + 15x1 when x2 = 50, and it is E(Y ) = 6 + 25x1
when x2 = 100. The slope for the partial effect of x1 changes markedly as the fixed
value for x2 changes. There is then interaction between x1 and x2 in their effects on
Y .
Cross-product Terms
A common approach for allowing interaction introduces cross-product terms of
the explanatory variables into the multiple regression model. With two explanatory
variables, the model is
E(Y ) = α + β1x1 + β2x2 + β3x1x2.
This is a special case of the multiple regression model with three explanatory variables,
in which x3 is an artificial variable created as the cross-product x3 = x1x2 of the two
primary explanatory variables.
Let’s see why this model permits interaction. Consider how Y is related to x1,
controlling for x2. We rewrite the equation in terms of x1 as
E(Y ) = (α + β2x2) + (β1 + β3x2)x1 = α′ + β′x1
where
α′ = α + β2x2 and β′ = β1 + β3x2.
So, for fixed x2, the mean of Y changes linearly as a function of x1. The slope of the
relationship is β′ = (β1 +β3x2). This depends on the value of x2. As x2 changes, the
slope for the effect of x1 changes. In summary, the mean of Y is a linear function of
x1, but the slope of the line depends on the value of x2.
Note that now we can interpret β1 as the effect of x1 only when x2 = 0. Unless x2 = 0 is a value of particular interest, it is not especially useful to form confidence intervals or perform significance tests about β1 in this model.
Similarly, the mean of Y is a linear function of x2, but the slope varies according
to the value of x1. The coefficient β2 of x2 refers to the effect of x2 only at x1 = 0.
Example 11.6 Interaction Model for Mental Impairment
For the data set on Y = mental impairment, X1 = life events, and X2 = SES,
we create a third explanatory variable x3 that gives the cross product of x1 and
x2 for the 40 individuals. For the first subject, for example, x1 = 46, x2 = 84, so
x3 = 46(84) = 3864. Software makes it easy to create this variable without doing the
calculations yourself. Table 11.8 shows part of the printout for the interaction model.
The prediction equation is
ŷ = 26.0 + 0.156x1 − 0.060x2 − 0.00087x1x2.
Table 11.8: Interaction Model for Y = Mental Impairment, X1 = Life Events, and X2 = SES
Sum of Mean
Squares DF Square F Sig
Regression 403.631 3 134.544 6.383 0.0014
Residual 758.769 36 21.077
Total 1162.400 39
R R Square
.589 .347
B Std. Error t Sig
(Constant) 26.036649 3.948826 6.594 0.0001
LIFE 0.155865 0.085338 1.826 0.0761
SES -0.060493 0.062675 -0.965 0.3409
LIFE*SES -0.000866 0.001297 -0.668 0.5087
Figure 11.10 portrays the relationship between predicted mental impairment and
life events for a few distinct SES values. For an SES score of x2 = 0, the relationship
between ŷ and x1 is
ŷ = 26.0 + 0.156x1 − 0.060(0) − 0.00087x1(0) = 26.0 + 0.156x1.
When x2 = 50, the prediction equation is
ŷ = 26.0 + 0.156x1 − 0.060(50) − 0.00087(50)x1 = 23.0 + 0.113x1.
When x2 = 100, the prediction equation is
ŷ = 20.0 + 0.069x1.
The higher the value of SES, the smaller the slope between predicted mental impair-
ment and life events, and so the weaker is the effect of life events. This suggests that
subjects who possess greater resources, in the form of higher SES, are better able to
withstand the mental stress of potentially traumatic life events.
□
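The lines in Figure 11.10 follow directly from the fitted coefficients: at a fixed SES value x2, predicted impairment is a linear function of life events with intercept a + b2·x2 and slope b1 + b3·x2. A Python sketch:

```python
# Coefficients of the interaction-model prediction equation (Table 11.8)
a, b1, b2, b3 = 26.0, 0.156, -0.060, -0.00087

def line_at_ses(x2):
    """Intercept and slope of predicted impairment vs. life events at SES = x2."""
    return a + b2 * x2, b1 + b3 * x2

for ses in (0, 50, 100):
    intercept, slope = line_at_ses(ses)
    print(f"SES {ses:3d}: y-hat = {intercept:.1f} + {slope:.3f} x1")
```

The declining slope as SES increases is exactly the pattern described in the example.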
Testing an Interaction Term
For two explanatory variables, the model allowing interaction is
E(Y ) = α + β1x1 + β2x2 + β3x1x2.
The simpler model assuming no interaction is the special case β3 = 0. The hypothesis
of no interaction is H0: β3 = 0. As usual, the t test statistic divides the estimate of
the parameter (β3) by its standard error.
From Table 11.8, t = −0.00087/0.0013 = −0.67. The P -value for Ha: β3 ≠ 0
is P = 0.51. Little evidence exists of interaction. The variation in the slope of the
relationship between mental impairment and life events for various SES levels could
be due to sampling variability. The sample size here is small, however, and this makes
it difficult to estimate effects precisely. Studies based on larger sample sizes (e.g.,
Holzer 1977) have shown that interaction of the type seen in this example does exist
for these variables.
In Table 11.8, neither the test of H0: β1 = 0 nor the test of H0: β2 = 0 has a small P-value. Yet, the tests of both H0: β1 = 0 and H0: β2 = 0 are highly significant for
the ‘no interaction’ model E(Y ) = α+β1x1 +β2x2; from Table 11.5, the P -values are
0.003 and 0.002. This loss of significance occurs because x3 = x1x2 is quite strongly
correlated with x1 and x2, with rX1X3 = 0.779 and rX2X3 = 0.646. These substantial
correlations are not surprising, since x3 = x1x2 is completely determined by x1 and
x2.
Since considerable overlap occurs in the variation in Y that is explained by x1
and by x1x2, and also by x2 and x1x2, the partial variability explained by each is
relatively small. For example, much of the predictive power contained in x1 is also
contained in x2 and x1x2. The unique contribution of x1 (or x2) to the model is
relatively small, and nonsignificant, when x2 (or x1) and x1x2 are in the model.
When the evidence of interaction is weak, as it is here with a P -value of 0.51, it
is best to drop the interaction term from the model before testing hypotheses about
partial effects such as H0: β1 = 0 or H0: β2 = 0. On the other hand, if the evidence
of interaction is strong, it no longer makes sense to test these other hypotheses. If
there is interaction, then the effect of each variable exists and differs according to the
level of the other variable.
Centering the Explanatory Variables∗
For the mental health data, we’ve seen that x1 and x2 are highly significant in the
model with only those predictors (see Table 11.5) but lose their significance after
entering the interaction term, even though the interaction is not significant (see Table
11.8). We also saw that the coefficients of x1 and x2 in an interaction model are not
usually meaningful, because they refer to the effect of a predictor only when the other
predictor equals 0.
Suppose we center the scores for each variable around 0, by subtracting the mean.
Letting x1ᶜ = x1 − μX1 and x2ᶜ = x2 − μX2, we then express the interaction model as

E(Y) = α + β1x1ᶜ + β2x2ᶜ + β3x1ᶜx2ᶜ
     = α + β1(x1 − μX1) + β2(x2 − μX2) + β3(x1 − μX1)(x2 − μX2).
Now, β1 refers to the effect of x1 at the mean of x2, and β2 refers to the effect of x2
at the mean of x1.
When we rerun the interaction model for the mental health data after centering
the predictors about their sample means, that is, with LIFE_CEN = LIFE − 44.425 and SES_CEN = SES − 56.60, we get
B Std. Error t Sig
(Constant) 27.359555 0.731366 37.409 0.0001
LIFE_CEN 0.106850 0.033185 3.220 0.0027
SES_CEN -0.098965 0.029390 -3.367 0.0018
LIFE_CEN*SES_CEN -0.000866 0.001297 -0.668 0.5087
The estimate for the interaction term is the same as for the model with uncentered
predictors, but now the estimates (and standard errors) for the effects of x1 and x2
alone are similar to the values for the no-interaction model. Also, their statistical
significance is similar to what it was in that model.
Centering the predictor variables before using them in a model allowing interaction
has two benefits. First, the estimates of the effects of x1 and x2 are more meaningful,
being effects at the mean rather than at 0. Second, the estimates and their standard errors are similar to those in the no-interaction model. The cross-product term with centered variables does not overlap with the other terms as it does in the ordinary model.
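The claim that centering removes the overlap between the cross-product and the primary terms can be demonstrated by simulation. The sketch below uses randomly generated predictors (not the text's data) whose means loosely mimic LIFE and SES:

```python
import math
import random

def corr(u, v):
    """Sample Pearson correlation between two lists of numbers."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) *
                           sum((b - mv) ** 2 for b in v))

random.seed(1)
x1 = [random.gauss(44, 22) for _ in range(500)]  # mimics LIFE
x2 = [random.gauss(57, 25) for _ in range(500)]  # mimics SES

m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
raw_prod = [a * b for a, b in zip(x1, x2)]                # x1 * x2
cen_prod = [(a - m1) * (b - m2) for a, b in zip(x1, x2)]  # centered product

print(round(corr(x1, raw_prod), 2))  # substantial correlation with x1
print(round(corr(x1, cen_prod), 2))  # near zero after centering
```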
Generalizations and Limitations∗
When the number of explanatory variables exceeds two, a model allowing interaction
can have cross-products for each pair of explanatory variables. For example, with
three explanatory variables, an interaction model is
E(Y ) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3.
This is a special case of multiple regression with six explanatory variables, identifying
x4 = x1x2, x5 = x1x3, and x6 = x2x3. Significance tests can judge which, if any, of
the cross-product terms are needed in the model.
When interaction exists and the model contains cross-product terms, it is more difficult to summarize the relationships simply. One approach is to sketch a collection
of lines such as those in Figure 11.10 to describe graphically how the relationship
between two variables changes according to the values of other variables. Another
possibility is to divide the data into groups according to the value on a control variable
(e.g., high on x2, medium on x2, low on x2) and report the slope between Y and x1
within each subset as a means of describing the interaction.
The interaction terms in the above model are called second-order, to distinguish
them from higher-order interaction terms with products of more than two variables
at a time. Such terms are occasionally used in more complex models, not considered
in this chapter.
11.6 Comparing Regression Models
When the number of explanatory variables increases, the multiple regression model
becomes more difficult to interpret and some variables may become redundant. This
is especially true when some explanatory variables are cross-products of others, to
allow for interaction. Not all predictors may be needed in the model. We next present
a test of whether a model fits significantly better than a simpler model containing
only some of the predictors.
Complete and Reduced Models
We refer to the full model with all the predictors as the complete model. The
model containing only some of these predictors is called the reduced model. The
reduced model is said to be nested within the complete model, being a special case of
it.
The complete and reduced models are identical if the partial regression coefficients
for the extra variables in the complete model all equal 0. In that case, none of the
extra predictors increases the explained variability in Y , in the population of interest.
Testing whether the complete model is identical to the reduced model is equivalent to
testing whether the extra parameters in the complete model equal 0. The alternative
hypothesis is that at least one of these extra parameters is not 0, in which case the
complete model is better than the reduced model.
For instance, a complete model with three explanatory variables and all the second-
order interaction terms is
E(Y ) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3.
The reduced model without the interaction terms is
E(Y ) = α + β1x1 + β2x2 + β3x3.
The test comparing the complete model to the reduced model has H0: β4 = β5 =
β6 = 0.
Comparing Models by Comparing SSE or R2 Values
The test statistic for comparing two regression models compares the residual sums
of squares for the two models. Denote SSE = Σ(y − ŷ)² for the reduced model by SSEr and for the complete model by SSEc. Now, SSEr ≥ SSEc, because the reduced
model has fewer predictors and tends to make poorer predictions. Even if H0 were
true, we would not expect the estimates of the extra parameters and the difference
(SSEr−SSEc) to equal 0. Some reduction in error occurs from fitting the extra terms
because of sampling variability.
The test statistic uses the reduction in error, SSEr − SSEc, that results from
adding the extra variables. It has df = the number of extra terms in the complete
model. An equivalent statistic uses the R² values, Rc² for the complete model and Rr² for the reduced model. The test statistic equals

F = [(SSEr − SSEc)/df1] / [SSEc/df2] = [(Rc² − Rr²)/df1] / [(1 − Rc²)/df2],
where df1 is the number of extra terms in the complete model and df2 is the usual
residual df for the complete model, which is df2 = n − (k + 1). A relatively large
reduction in error (or relatively large increase in R2) yields a large F test statistic
and small P -value. As usual for F statistics, the P -value is the right-tail probability.
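The comparison test reduces to a few lines of arithmetic once the two residual sums of squares are available. A minimal Python sketch of the SSE-based form, using the values from Tables 11.7 and 11.8:

```python
def compare_models_f(sse_reduced, sse_complete, df1, df2):
    """F statistic for H0: the extra terms in the complete model all equal 0.

    df1 = number of extra terms; df2 = n - (k + 1) for the complete model.
    """
    return ((sse_reduced - sse_complete) / df1) / (sse_complete / df2)

# Mental impairment data: no-interaction (reduced) vs. interaction (complete)
F = compare_models_f(sse_reduced=768.2, sse_complete=758.8, df1=1, df2=36)
print(round(F, 2))  # 0.45
```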
Example 11.7 Comparing Models for Mental Impairment
For the mental impairment data, a comparison of the complete model
E(Y ) = α + β1x1 + β2x2 + β3x1x2
to the reduced model
E(Y ) = α + β1x1 + β2x2
analyzes whether interaction exists. The complete model has just one additional term,
and the null hypothesis is H0: β3 = 0.
The sum of squared errors for the complete model is SSEc = 758.8 (Table 11.8),
while for the reduced model it is SSEr = 768.2 (Table 11.7). The difference
SSEr − SSEc = 768.2 − 758.8 = 9.4
has df1 = 1 since the complete model has one more parameter. Since the sample size
is n = 40, df2 = n − (k + 1) = 40 − (3 + 1) = 36, the df for SSE in Table 11.8. The F
test statistic equals
F = [(SSEr − SSEc)/df1] / [SSEc/df2] = (9.4/1) / (758.8/36) = 0.45.
Equivalently, the R² values for the two models are Rr² = 0.339 and Rc² = 0.347, so

F = [(Rc² − Rr²)/df1] / [(1 − Rc²)/df2] = [(0.347 − 0.339)/1] / [(1 − 0.347)/36] = 0.45.
From software, the P -value from the F distribution with df1 = 1 and df2 = 36 is
P = 0.51. There is little evidence that the complete model is better. The null
hypothesis seems plausible, so the reduced model is adequate.
When H0 contains a single parameter, the t test is available. In fact, from the
previous section (and Table 11.8), the t statistic equals
t = b3/se = −0.00087/0.0013 = −0.67.
It also has a P -value of 0.51 for Ha: β3 ≠ 0. We get the same result with the t test as
with the F test for complete and reduced models. In fact, the F test statistic equals
the square of the t statistic. (Refer to the final subsection in Section 11.4.)
□
The t test method is limited to testing one parameter at a time. The F test can
test several regression parameters together to analyze whether at least one of them is
nonzero, such as in the global F test of H0 : β1 = · · · = βk = 0 or the test comparing
a complete model to a reduced model. F tests are equivalent to t tests only when H0
contains a single parameter.
11.7 Partial Correlation∗
Multiple regression models describe the effect of an explanatory variable on the re-
sponse variable while controlling for other variables of interest. Related measures
describe the strength of the association. For example, to describe the association
between mental impairment and life events, controlling for SES, we could ask, “Con-
trolling for SES, what proportion of the variation in mental impairment does life
events explain?”
These measures describe the partial association between Y and a particular pre-
dictor, whereas the multiple correlation and R2 describe the association between Y
and the entire set of predictors in the model. The partial correlation is based on the
ordinary correlations between each pair of variables. For a single control variable, it
is defined as follows:
Partial Correlation

The sample partial correlation between Y and X2, controlling for X1, is

    r_{YX2·X1} = (r_{YX2} − r_{YX1} r_{X1X2}) / √[(1 − r²_{YX1})(1 − r²_{X1X2})].
In the symbol r_{YX2·X1}, the variable to the right of the dot represents the controlled
variable. The analogous formula for r_{YX1·X2} (i.e., controlling X2) is

    r_{YX1·X2} = (r_{YX1} − r_{YX2} r_{X1X2}) / √[(1 − r²_{YX2})(1 − r²_{X1X2})].
Since one variable is controlled, the partial correlations r_{YX1·X2} and r_{YX2·X1} are
called first-order partial correlations.
Example 11.8 Partial Correlation Between Education and Crime Rate
Example 11.1 discussed a data set for counties in Florida, with Y = crime rate,
X1 = education, and X2 = urbanization. The pairwise correlations are r_{YX1} = 0.468,
r_{YX2} = 0.678, and r_{X1X2} = 0.791. It was surprising to observe a positive
correlation between crime rate and education. Can it be explained by their joint
dependence on urbanization? This is plausible if the association disappears when we
control for urbanization.
The partial correlation between crime rate and education, controlling for urban-
ization, equals

    r_{YX1·X2} = (r_{YX1} − r_{YX2} r_{X1X2}) / √[(1 − r²_{YX2})(1 − r²_{X1X2})]
               = [0.468 − 0.678(0.791)] / √[(1 − 0.678²)(1 − 0.791²)] = −0.152.
Not surprisingly, r_{YX1·X2} is much smaller than r_{YX1}. It even has a different direction,
illustrating Simpson's paradox. The relationship between crime rate and education
may well be spurious, reflecting their joint dependence on urbanization.
□
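The computation in Example 11.8 is easy to script. The following Python sketch (not from the text; the helper name is ours) evaluates the first-order partial correlation formula:

```python
# Sketch (not from the text; the helper name is ours): first-order partial
# correlation computed from the three pairwise correlations.
from math import sqrt

def partial_corr(r_yx, r_yz, r_xz):
    """Correlation between Y and X, controlling for Z."""
    return (r_yx - r_yz * r_xz) / sqrt((1 - r_yz**2) * (1 - r_xz**2))

# Florida counties: Y = crime rate, X1 = education, X2 = urbanization
print(round(partial_corr(0.468, 0.678, 0.791), 3))  # -0.152
```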
Interpreting Partial Correlations
The partial correlation has properties similar to those of the ordinary correlation
between two variables, such as a range of −1 to +1, larger absolute values representing
stronger associations, and a value that does not depend on the units of measurement.
We list the properties below for r_{YX1·X2}, but analogous properties apply to r_{YX2·X1}.
• r_{YX1·X2} falls between −1 and +1.
• The larger the absolute value of r_{YX1·X2}, the stronger the association between
Y and X1, controlling for X2.
• The value of a partial correlation does not depend on the units of measurement
of the variables.
• r_{YX1·X2} has the same sign as the partial slope (b1) for the effect of x1 in the
prediction equation ŷ = a + b1x1 + b2x2. This happens because the same variable
(x2) is controlled in the model as in the correlation.
• Under the assumptions for conducting inference for multiple regression (see
the beginning of Section 11.4), r_{YX1·X2} estimates the correlation between Y
and X1 at every fixed value of X2. If we could control X2 by considering
a subpopulation of subjects all having the same value on X2, then r_{YX1·X2}
estimates the correlation between Y and X1 for that subpopulation.
• The sample partial correlation is identical to the correlation computed for the
points in the partial regression plot (Section 11.2).
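The last property can be verified numerically. The Python sketch below (not from the text; the data are made up) computes the partial correlation both from the definition and as the ordinary correlation between two sets of residuals, which is what the partial regression plot displays:

```python
# Sketch (not from the text; data are made up): the partial correlation equals
# the ordinary correlation between the residuals of Y regressed on X2 and the
# residuals of X1 regressed on X2.
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu)**2 for a in u) * sum((b - mv)**2 for b in v))
    return num / den

def residuals(y, x):
    """Residuals from the least squares line of y on x."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx)**2 for a in x)
    a0 = my - b * mx
    return [c - (a0 + b * xi) for xi, c in zip(x, y)]

# Small made-up data set
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 5, 1, 6, 3, 4]
y  = [2, 6, 3, 8, 6, 7]

# Partial correlation as a correlation of residuals
r_resid = corr(residuals(y, x2), residuals(x1, x2))

# Partial correlation from the formula
r_yx1, r_yx2, r_x12 = corr(y, x1), corr(y, x2), corr(x1, x2)
r_formula = (r_yx1 - r_yx2 * r_x12) / sqrt((1 - r_yx2**2) * (1 - r_x12**2))

assert abs(r_resid - r_formula) < 1e-6  # the two routes agree
```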
Interpreting Squared Partial Correlations
Like r² and R², the square of a partial correlation has a proportional reduction in
error (PRE) interpretation: r²_{YX2·X1} is the proportion of variation in
Y explained by X2, controlling for X1. This squared measure describes the effect of
removing from consideration the portion of the total sum of squares (TSS) in Y that
is explained by X1, and then finding the proportion of the remaining unexplained
variation in Y that is explained by X2.
Squared Partial Correlation

The square of the partial correlation r_{YX2·X1} represents the proportion of the variation
in Y that is explained by X2, out of that left unexplained by X1. It equals

    r²_{YX2·X1} = (R² − r²_{YX1}) / (1 − r²_{YX1})
                = (Partial proportion explained uniquely by X2) / (Proportion unexplained by X1).
Recall from Section 9.4 that r²_{YX1} represents the proportion of the variation in
Y explained by X1. The remaining proportion (1 − r²_{YX1}) represents the variation
left unexplained. When X2 is added to the model, it accounts for some additional
variation. The total proportion of the variation in Y accounted for by X1 and X2
jointly is R² for the model with both X1 and X2 as explanatory variables. So, R² − r²_{YX1}
is the additional proportion of the variability in Y explained by X2, after the
effects of X1 have been removed or controlled. The maximum this difference could
be is 1 − r²_{YX1}, the proportion of variation yet to be explained after accounting for
the influence of X1. The additional explained variation R² − r²_{YX1} divided by this
maximum possible difference is a measure that has a maximum possible value of 1. In
fact, as the formula above suggests, this ratio equals the squared partial correlation
between Y and X2, controlling for X1.
Figure 11.11 illustrates this property of the squared partial correlation. It shows
the ratio of the partial contribution of X2 beyond that of X1, namely R² − r²_{YX1},
divided by the proportion (1 − r²_{YX1}) left unexplained by X1. Similarly, the square
of r_{YX1·X2} equals

    r²_{YX1·X2} = (R² − r²_{YX2}) / (1 − r²_{YX2}),

the proportion of variation in Y explained by X1, out of that part unexplained by X2.
Example 11.9 Partial Correlation of Life Events with Mental Impairment
We return to the mental health study, with Y = mental impairment, X1 = life
events, X2 = SES. Software reports the correlation matrix,
IMPAIR LIFE SES
IMPAIR 1.000 .372 -.399
LIFE .372 1.000 .123
SES -.399 .123 1.000
So, r_{YX1} = 0.372, r_{YX2} = −0.399, and r_{X1X2} = 0.123. By its definition, the partial
correlation between mental impairment and life events, controlling for SES, is

    r_{YX1·X2} = (r_{YX1} − r_{YX2} r_{X1X2}) / √[(1 − r²_{YX2})(1 − r²_{X1X2})]
               = [0.372 − (−0.399)(0.123)] / √{[1 − (−0.399)²](1 − 0.123²)} = 0.463.
The partial correlation, like the correlation of 0.37 between mental impairment and
life events, is moderately positive.
Since r²_{YX1·X2} = (0.463)² = 0.21, controlling for SES, 21% of the variation in
mental impairment is explained by life events. Alternatively, since R² = 0.339 (Table
11.7),

    r²_{YX1·X2} = (R² − r²_{YX2}) / (1 − r²_{YX2}) = [0.339 − (−0.399)²] / [1 − (−0.399)²] = 0.21.
□
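A short Python sketch (not from the text) reproduces the arithmetic of Example 11.9 both from the pairwise correlations and from R²:

```python
# Sketch (not from the text): the arithmetic of Example 11.9, two ways.
from math import sqrt

r_yx1, r_yx2, r_x12 = 0.372, -0.399, 0.123
R2 = 0.339  # R-squared with both predictors (Table 11.7)

# Partial correlation from the pairwise correlations
r_partial = (r_yx1 - r_yx2 * r_x12) / sqrt((1 - r_yx2**2) * (1 - r_x12**2))

# Squared partial correlation from the R-squared values
r2_partial = (R2 - r_yx2**2) / (1 - r_yx2**2)

print(round(r_partial, 3))   # 0.463
print(round(r2_partial, 2))  # 0.21
```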
Higher-Order Partial Correlations
One reason we showed the connection between squared partial correlation values and
R² is that this approach also works when the number of control variables
exceeds one. For example, with three predictors, let R²_{Y(X1,X2,X3)} denote the value
of R². The square of the partial correlation between Y and X3, controlling for X1 and
X2, relates to how much larger this is than the R² value for the model with only X1
and X2 as predictors, which we denote by R²_{Y(X1,X2)}. The squared partial correlation
is

    r²_{YX3·X1,X2} = [R²_{Y(X1,X2,X3)} − R²_{Y(X1,X2)}] / [1 − R²_{Y(X1,X2)}].
In this expression, R²_{Y(X1,X2,X3)} − R²_{Y(X1,X2)} is the increase in the proportion of
explained variance from adding x3 to the model. The denominator 1 − R²_{Y(X1,X2)} is
the proportion of the variation left unexplained when x1 and x2 are the only predictors
in the model.
The partial correlation r_{YX3·X1,X2} is called a second-order partial corre-
lation, since it controls two variables. It has the same sign as b3 in the prediction
equation ŷ = a + b1x1 + b2x2 + b3x3, which also controls x1 and x2 in describing the
effect of x3.
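The R²-based formula is straightforward to compute. In the hypothetical Python sketch below, the function name and the example R² values are ours, not the text's:

```python
# Hypothetical sketch: squared higher-order partial correlation from R-squared
# values. The function name and the example numbers are ours, not the text's.
def r2_partial_from_R2(R2_full, R2_reduced):
    """Proportion of the variation left unexplained by the control variables
    that the added predictor explains."""
    return (R2_full - R2_reduced) / (1 - R2_reduced)

# Suppose R-squared = 0.50 with x1 and x2 alone, rising to 0.60 once x3 is
# added: x3 then explains 20% of the variation that x1 and x2 left over.
print(round(r2_partial_from_R2(0.60, 0.50), 2))  # 0.2
```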
Inference for Partial Correlations
Controlling for a certain set of variables, the slope of the partial effect of a predictor is 0
in the same situations in which the partial correlation between Y and that predictor is
0. An alternative formula for the t test for a partial effect uses the partial correlation.
With k predictors in the model, the equivalent t test statistic is

    t = (partial correlation) / √[(1 − squared partial correlation)/[n − (k + 1)]].

This statistic has the t distribution with df = n − (k + 1). It equals the t statistic
based on the partial slope estimate and, hence, has the same P-value.
We illustrate by testing that the population partial correlation between mental
impairment and life events, controlling for SES, is 0. From Example 11.9, r_{YX1·X2} =
0.463. There are k = 2 explanatory variables and n = 40 observations. The test
statistic equals

    t = r_{YX1·X2} / √[(1 − r²_{YX1·X2})/[n − (k + 1)]] = 0.463 / √[(1 − 0.463²)/37] = 3.18.
This equals the test statistic for H0: β1 = 0 in Table 11.5. Thus, the P -value is also
the same, P = 0.003.
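A few lines of Python (not part of the text) reproduce this t statistic:

```python
# Sketch (not from the text): t statistic for the partial correlation of
# Example 11.9.
from math import sqrt

r, n, k = 0.463, 40, 2        # partial correlation, sample size, predictors
df = n - (k + 1)              # 37
t = r / sqrt((1 - r**2) / df)
print(round(t, 2))            # 3.18, matching the t test for H0: beta1 = 0
```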
When no variables are controlled (i.e., the number of explanatory variables is
k = 1), the t statistic formula simplifies to

    t = r / √[(1 − r²)/(n − 2)].

This is the statistic for testing that the population bivariate correlation equals 0
(Section 9.5). Confidence intervals for partial correlations are more complex. They
require a log transformation such as shown for the correlation in Exercise 65 in Chapter
9.
11.8 Standardized Regression Coefficients∗
As in bivariate regression (recall Section 9.4), the sizes of regression coefficients in
multiple regression models depend on the units of measurement for the variables. To
compare the relative effects of two explanatory variables, it is appropriate to compare
their coefficients only if the variables have the same units. Otherwise, standardized
versions of the regression coefficients provide more meaningful comparisons.
Standardized Regression Coefficient
The standardized regression coefficient for an explanatory variable represents the
change in the mean of Y , in Y standard deviations, for a one standard deviation increase
in that variable, controlling for the other explanatory variables in the model. We denote
them by β∗1, β∗2, . . . .
If |β∗2| > |β∗1|, for example, then a standard deviation increase in X2 has a greater
partial effect on Y than does a standard deviation increase in X1.
The Standardization Mechanism
The standardized regression coefficients represent the values the regression coefficients
take when the units are such that Y and the explanatory variables all have equal
standard deviations. We standardize the partial regression coefficients by adjusting for
the differing standard deviation of Y and each Xi. Let sy denote the sample standard
deviation of Y , and let sx1, sx2
, . . . , sxkdenote the sample standard deviations of the
explanatory variables.
The estimates of the standardized regression coefficients are

    b∗1 = b1(s_{x1}/s_y),   b∗2 = b2(s_{x2}/s_y),   . . . .
Example 11.10 Standardized Coefficients for Mental Impairment
The prediction equation relating mental impairment to life events and SES is
    ŷ = 28.23 + 0.103x1 − 0.097x2.

Table 11.2 reported the sample standard deviations s_y = 5.5, s_{x1} = 22.6, and s_{x2} =
25.3. Since the unstandardized coefficient of x1 is b1 = 0.103, the estimated standard-
ized coefficient is

    b∗1 = b1(s_{x1}/s_y) = 0.103(22.6/5.5) = 0.43.
Since b2 = −0.097, the standardized value equals

    b∗2 = b2(s_{x2}/s_y) = −0.097(25.3/5.5) = −0.45.
The estimated change in the mean of Y for a standard deviation increase in x1,
controlling for x2, has a similar magnitude to the estimated change for a standard
deviation increase in x2, controlling for x1. However, the partial effect of x1 is positive,
whereas the partial effect of x2 is negative.
Table 11.9, which repeats Table 11.5, shows how SPSS reports the estimated stan-
dardized regression coefficients. It uses the heading BETA, reflecting the alternative
name beta weights for these coefficients.
□
Table 11.9: SPSS Printout for Fit of Multiple Regression Model to Mental Impairment Data
Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
(Constant) 28.230 2.174 12.984 .000
LIFE .103 .032 .428 3.177 .003
SES -.097 .029 -.451 -3.351 .002
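A short Python sketch (not part of the text) checks the standardization arithmetic of Example 11.10; because Table 11.2's standard deviations are rounded, the results differ slightly from the Beta column of the printout:

```python
# Sketch (not from the text): the standardization arithmetic of Example 11.10.
b1, b2 = 0.103, -0.097             # unstandardized coefficients
s_y, s_x1, s_x2 = 5.5, 22.6, 25.3  # rounded standard deviations (Table 11.2)

b1_star = b1 * (s_x1 / s_y)
b2_star = b2 * (s_x2 / s_y)

# About 0.42 and -0.45; the printout's unrounded Beta values are
# 0.428 and -0.451.
print(round(b1_star, 2), round(b2_star, 2))
```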
Properties of Standardized Regression Coefficients
For bivariate regression, standardizing the regression coefficient yields the correlation.
For the multiple regression model, the standardized partial regression coefficient
relates to the partial correlation (Exercise 67) and usually takes a similar value.
Unlike the partial correlation, however, b∗i need not fall between −1 and +1. A
value |b∗i | > 1 occasionally occurs when Xi is highly correlated with the set of other
explanatory variables in the model. In such cases, the standard errors are usually
large and the estimates are unreliable.
Since a standardized regression coefficient is a multiple of the unstandardized
coefficient, one equals 0 when the other does. The test of H0: β∗i = 0 is equivalent to
the t test of H0: βi = 0. It is unnecessary to have separate tests for these coefficients.
In the sample, the magnitudes of the {b∗i } have the same relative sizes as the t statistics
from those tests. For example, the predictor with the greatest standardized partial
effect is the one that has the largest t statistic, in absolute value.
Standardized Form of Prediction Equation∗
Regression equations have an expression using the standardized regression coefficients.
In this equation, the variables appear in standardized form.
Notation for Standardized Variables

Let z_Y, z_{X1}, . . . , z_{Xk} denote the standardized versions of the variables Y, X1, . . . , Xk.
For instance, z_Y = (y − ȳ)/s_y represents the number of standard deviations that an
observation on y falls from its mean.

Each subject's scores on y, x1, . . . , xk have corresponding z-scores for z_Y, z_{X1}, . . . , z_{Xk}.
If a subject's score on x1 is such that z_{X1} = (x1 − x̄1)/s_{x1} = 2.0, for instance, then
that subject falls two standard deviations above the mean x̄1 on that variable.
Let ẑ_Y = (ŷ − ȳ)/s_y denote the predicted z-score for the response variable. For
the standardized variables and the estimated standardized regression coefficients, the
prediction equation is

    ẑ_Y = b∗1 z_{X1} + b∗2 z_{X2} + · · · + b∗k z_{Xk}.
This equation predicts how far an observation on y falls from its mean, in standard
deviation units, based on how far the explanatory variables fall from their means, in
standard deviation units. The standardized coefficients are the weights attached to
the standardized explanatory variables in contributing to the predicted standardized
response variable.
Example 11.11 Standardized Prediction Equation for Mental Impairment
Example 11.10 found that the estimated standardized regression coefficients for
the life events and SES predictors of mental impairment are b∗1 = 0.43 and b∗2 = −0.45.
The prediction equation relating the standardized variables is therefore

    ẑ_Y = 0.43 z_{X1} − 0.45 z_{X2}.
Consider a subject who is two standard deviations above the mean on life events
but two standard deviations below the mean on SES. This subject has a predicted
standardized mental impairment of
    ẑ_Y = 0.43(2) − 0.45(−2) = 1.8.
The predicted mental impairment for that subject is 1.8 standard deviations above
the mean. If the distribution of mental impairment is approximately normal, this
subject might well have mental health problems, since only about 4% of the scores in
a normal distribution fall at least 1.8 standard deviations above their mean.
□
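A Python sketch (not part of the text) reproduces Example 11.11's prediction and the normal tail probability, using the standard library's NormalDist:

```python
# Sketch (not from the text): the standardized prediction of Example 11.11 and
# the normal tail probability, using only the Python standard library.
from statistics import NormalDist

z_pred = 0.43 * 2 - 0.45 * (-2)   # 2 sd above the mean on life events,
print(round(z_pred, 2))           # 2 sd below on SES -> 1.76, about 1.8

tail = 1 - NormalDist().cdf(1.8)  # fraction of a normal distribution at least
print(round(tail, 3))             # 1.8 sd above its mean: 0.036, about 4%
```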
In the prediction equation with standardized variables, no intercept term appears.
Why is this? When the standardized explanatory variables all equal 0, those variables
all fall at their means. Then ŷ = ȳ, so that

    ẑ_Y = (ŷ − ȳ)/s_y = 0.

So, this merely tells us that a subject who falls at the mean on each explanatory
variable is predicted to fall at the mean on the response variable.
Cautions in Comparing Standardized Regression Coefficients
To assess which predictor in a multiple regression model has the greatest impact on the
response variable, it is tempting to compare their standardized regression coefficients.
Make such comparisons with caution. In some cases, the observed differences in the
b∗i may simply reflect sampling error. In particular, when multicollinearity exists, the
standard errors are high and the estimated standardized coefficients may be unstable.
Keep in mind also that the effects are partial ones, depending on which other
variables are in the model. An explanatory variable that seems important in one
system of variables may seem unimportant when other variables are controlled. For
example, it is possible that |b∗2| > |b∗1| in a model with two explanatory variables, yet
when a third explanatory variable is added to the model, |b∗2| < |b∗1|.

It is unnecessary to standardize to compare the effect of the same variable for two
groups, such as in comparing the results of separate regressions for females and males,
since the units of measurement are the same in each group. In fact, it is usually unwise
to standardize in this case, because the standardized coefficients are more susceptible
than the unstandardized coefficients to differences in the standard deviations of the
predictors. For instance, Section 9.6 showed that the correlation depends strongly on
the range of x-values sampled. Two groups that have the same value for an estimated
regression coefficient have different standardized coefficients if the standard deviation
of the predictor differs for the two groups.
Finally, if an explanatory variable is highly correlated with the set of other ex-
planatory variables, it is artificial to conceive of that variable changing while the
others remain fixed in value. As an extreme example, suppose Y = height, X1 =
length of left leg, and X2 = length of right leg. The correlation between X1 and X2
is extremely close to 1. It does not make much sense to imagine how Y changes as
X1 changes while X2 is controlled.
11.9 Chapter Summary
This chapter generalized the bivariate regression model to include additional explana-
tory variables. The multiple regression equation relating a response variable Y
to a set of k explanatory variables is
E(Y ) = α + β1x1 + β2x2 + · · · + βkxk.
• The {βi} are partial regression coefficients. The value βi is the change
in the mean of Y for a one-unit change in xi, controlling for the other variables
in the model.
• The multiple correlation R describes the association between Y and the
collective set of explanatory variables. It equals the correlation between the
observed and predicted y-values. It falls between 0 and 1.
• R² = (TSS − SSE)/TSS represents the proportional reduction in error from
predicting Y using the prediction equation ŷ = a + b1x1 + b2x2 + · · · + bkxk
instead of ȳ. It equals the square of the multiple correlation.
• A partial correlation, such as r_{YX1·X2}, describes the association between
two variables, controlling for others. It falls between −1 and +1.
• The squared partial correlation between Y and xi represents the proportion of
the variation in Y that can be explained by xi, out of that part left unexplained
by a set of control variables.
• An F statistic tests H0 : β1 = β2 = · · · = βk = 0, that the response variable
is independent of all the predictors. A small P -value suggests that at least one
predictor affects the response.
• Individual t tests and confidence intervals for {βi} analyze partial effects of each
predictor, controlling for the other variables in the model.
• Interaction between x1 and x2 in their effects on Y means that the effect of
either predictor changes as the value of the other predictor changes. We can
allow this by introducing cross-products of explanatory variables to the model,
such as the term β3(x1x2).
• To compare regression models, a complete model and a simpler reduced
model, the F test compares the SSE values or R2 values.
• Standardized regression coefficients do not depend on the units of mea-
surement. The estimated standardized coefficient b∗i describes the change in Y ,
in Y standard deviation units, for a one standard deviation increase in xi, con-
trolling for the other explanatory variables.
To illustrate, with k = 2 explanatory variables, the prediction equation is

    ŷ = a + b1x1 + b2x2.

Fixing x2, a straight line describes the relation between Y and x1. Its slope b1
is the change in ŷ for a one-unit increase in x1, controlling for x2. The multiple
correlation R is at least as large as the correlations between Y and each predictor.
The squared partial correlation r²_{YX2·X1} is the proportion of the variation of Y that
is explained by x2, out of that part of the variation left unexplained by x1. The
estimated standardized regression coefficient b∗1 = b1(s_{x1}/s_y) describes the effect of a
standard deviation change in x1, controlling for x2.
Table 11.10 summarizes the basic properties and inference methods for these mea-
sures and those introduced in Chapter 9 for bivariate regression.
The model studied in this chapter is still somewhat restrictive in the sense that
all the predictors are quantitative. The next chapter shows how to include categorical
predictors in the model.
PROBLEMS
Practicing the Basics
1. For students at Walden University, the relationship between Y = college GPA
(with range 0–4.0) and X1 = high school GPA (range 0–4.0) and X2 = college
board score (range 200–800) satisfies E(Y ) = 0.20 + 0.50x1 + 0.002x2.
a) Find the mean college GPA for students having (i) high school GPA = 4.0
and college board score = 800, (ii) x1 = 3.0 and x2 = 300.
b) Show that the relationship between Y and x1 for those students with x2 =
Table 11.10: Summary of Bivariate and Multiple Regression

Model
  Bivariate: E(Y) = α + βx
  Multiple:  E(Y) = α + β1x1 + · · · + βkxk

Prediction equation
  Bivariate: ŷ = a + bx
  Multiple:  ŷ = a + b1x1 + · · · + bkxk

Properties of measures
  Bivariate: b = slope; r = correlation (standardized slope), −1 ≤ r ≤ 1,
             r has the same sign as b; r² = PRE measure, 0 ≤ r² ≤ 1
  Multiple (simultaneous effect of x1, . . . , xk): R = multiple correlation,
             0 ≤ R ≤ 1; R² = PRE measure, 0 ≤ R² ≤ 1
  Multiple (partial effect of one xi): bi = partial slope; b∗i = standardized
             regression coefficient; partial correlation, −1 ≤ r_{YX1·X2} ≤ 1,
             same sign as bi and b∗i; r²_{YX1·X2} is PRE measure

Tests of no association
  Bivariate: H0: β = 0 or H0: ρ = 0 (Y not associated with x)
  Multiple (simultaneous): H0: β1 = · · · = βk = 0 (Y not associated with
             x1, . . . , xk)
  Multiple (partial): H0: βi = 0, or H0: population partial correlation = 0
             (Y not associated with xi, controlling for other x variables)

Test statistic
  Bivariate: t = b/se = r/√[(1 − r²)/(n − 2)], df = n − 2
  Multiple (simultaneous): F = Regression MS / Residual MS
             = (R²/k)/[(1 − R²)/(n − (k + 1))], df1 = k, df2 = n − (k + 1)
  Multiple (partial): t = bi/se, df = n − (k + 1)
500 is E(Y ) = 1.2 + 0.5x1.
c) Show that when x2 = 600, E(Y ) = 1.4 + 0.5x1. Thus, increasing x2 by 100
shifts the line relating Y to x1 upward by 100β2 = 0.2 units.
d) Show that setting x1 at a variety of values yields a collection of parallel lines,
each having slope 0.002, relating the mean of Y to x2.
2. For recent data in Florida on Y = selling price of home (in dollars), X1 = size
of home (in square feet), X2 = lot size (in square feet), the prediction equation
is y = −10, 536 + 53.8x1 + 2.84x2.
a) A particular home of 1240 square feet on a lot of 18,000 square feet sold for
$145,000. Find the predicted selling price and the residual, and interpret.
b) For fixed lot size, how much is the house selling price predicted to increase
for each square foot increase in home size? Why?
3. Refer to the previous exercise:
a) For fixed home size, how much would lot size need to increase to have the
same impact as a one square foot increase in home size?
b) Suppose house selling prices are changed from dollars to thousands of dollars.
Explain why the prediction equation changes to ŷ = −10.536 + 0.0538x1 + 0.00284x2.
4. Use software with the “2005 statewide crime” data file at the text website, with
murder rate (number of murders per 100,000 people) as the response variable
and with percent of high school graduates and the poverty rate (percentage of
the population with income below the poverty level) as explanatory variables.
a) Construct the partial regression plots. Interpret.
b) Report the prediction equation. Explain how to interpret the estimated
coefficients.
c) Re-do the analyses after deleting the D.C. observation. Does this observation
have much influence on the results?
5. A regression analysis with recent U.N. data from several nations on Y = per-
centage of people who use the Internet, X1 = per capita gross domestic product
(in thousands of dollars), and X2 = percentage of people using cell phones has
results shown in Table 11.11.
a) Write the prediction equation.
b) Find the predicted Internet use for a country with per capita GDP of 10
thousand dollars and 50% using cell phones.
c) Find the prediction equations when cell-phone use is (i) 0 %, (ii) 100%, and
use them to interpret the effect of GDP.
d) Use the equations in (c) to explain the ‘no interaction’ property of the model.
6. Refer to the previous exercise.
a) Show how to obtain R-squared from the sums of squares in the ANOVA
table. Interpret it.
b) r2 = 0.78 when GDP is the sole predictor. Why do you think R2 does not
increase much when cell-phone use is added to the model, even though it is
Table 11.11:
B Std. Error t Sig
(Constant) -3.601 2.506 -1.44 0.159
GDP 1.2799 0.2703 4.74 0.000
CELLULAR 0.1021 0.0900 1.13 0.264
R Square .796
ANOVA
Sum of Squares DF
Regression 10316.8 2
Residual Error 2642.5 36
Total 12959.3 38
itself highly associated with Y (with r = 0.67)? (Hint: Would you expect X1
and X2 to be highly correlated? If so, what’s the effect?)
7. Table 9.17 showed data from Florida counties on Y = crime rate (number per
1000 residents), X1 = median income (thousands of dollars), and X2 = percent
in urban environment.
a) Figure 11.12 shows a scatterplot relating Y to X1. Predict the sign that the
estimated effect of X1 has in the prediction equation ŷ = a + bx1. Explain.
b) Figure 11.13 shows a partial regression plot relating Y to X1, controlling
for X2. Predict the sign that the estimated effect of X1 has in the prediction
equation ŷ = a + b1x1 + b2x2. Explain.
c) Table 11.12 shows part of a printout for the bivariate and multiple regression
models. Report the prediction equation relating Y to x1, and interpret the slope.
d) Report the prediction equation relating Y to both x1 and x2. Interpret the
coefficient of x1, and compare to (c).
e) The correlations are r_{YX1} = 0.43, r_{YX2} = 0.68, and r_{X1X2} = 0.73. Use these to
explain why the x1 effect seems so different in (c) and (d).
f) Report the prediction equations relating crime rate to income at urbanization
levels of (i) 0, (ii) 50, (iii) 100. Interpret.
8. Refer to the previous exercise. Using software with the “Florida crime” data
file at the text website:
a) Construct box plots for each variable and scatterplots and partial regression
plots between Y and each of x1 and x2. Interpret these plots.
b) Find the prediction equations for the bivariate effects of x1 and of x2. In-
terpret.
c) Find the prediction equation for the multiple regression model. Interpret.
Table 11.12:
B Std. Error t Sig
(Constant) -11.526 16.834 -0.685 0.4960
INCOME 2.609 0.675 3.866 0.0003
B Std. Error t Sig
(Constant) 40.261 16.365 2.460 0.0166
INCOME -0.809 0.805 -1.005 0.3189
URBAN 0.646 0.111 5.811 0.0001
d) Find R2 for the multiple regression model, and show that it is not much
larger than r2 for the model using urbanization alone as the predictor. Inter-
pret.
9. Recent UN data from several nations on Y = crude birth rate (number of births
per 1000 population size), X1 = women’s economic activity (female labor force
as percentage of male), and X2 = GNP (per capita, in thousands of dollars)
has prediction equation ŷ = 34.53 − 0.13x1 − 0.64x2.
a) Interpret the coefficient of x1.
b) Sketch on a single graph the relationship between Y and x1 when x2 = 0,
x2 = 10, and x2 = 20. Interpret the results.
c) The bivariate prediction equation with x1 is ŷ = 37.65 − 0.31x1. The correlations
are r_{YX1} = −0.58, r_{YX2} = −0.72, and r_{X1X2} = 0.58. Explain why the
coefficient of x1 in the bivariate equation is quite different from that in the multiple
predictor equation.
10. For recent UN data for several nations, a regression of carbon dioxide use (CO2,
a measure of air pollution) on gross domestic product (GDP) has a correlation
of 0.786. With life expectancy as a second explanatory variable, the multiple
correlation is 0.787.
a) Explain how to interpret the multiple correlation.
b) For predicting CO2, did it help much to add life expectancy to the model?
Does this mean that life expectancy is very weakly correlated with CO2? Ex-
plain.
11. Table 11.13 shows a printout from fitting the multiple regression model to recent
statewide data, excluding D.C., on Y = violent crime rate (per 100,000 people),
X1 = poverty rate (percentage with income below the poverty level), and X2 =
percent living in metropolitan areas.
a) Report the prediction equation.
b) Massachusetts had y = 805, x1 = 10.7, and x2 = 96.2. Find its predicted
Table 11.13:
Sum of
Squares DF Mean Square F Sig
Regression 2448368.07 2 1224184.04 31.249 0.0001
Residual 1841257.15 47 39175.68
Total 4289625.22 49
R R Square Std Error of the Estimate
.7555 .5708 197.928
B Std. Error t Sig
(Constant) -498.683 140.988 -3.537 0.0009
POVERTY 32.622 6.677 4.885 0.0001
METRO 9.112 1.321 6.900 0.0001
Correlations
VIOLENT POVERTY METRO
VIOLENT 1.0000 .3688 .5940
POVERTY .3688 1.0000 -.1556
METRO .5940 -.1556 1.0000
violent crime rate. Find the residual, and interpret.
c) Interpret the fit by showing the prediction equation relating y and x1 for
states with (i) x2 = 0, (ii) x2 = 50, (iii) x2 = 100. Interpret.
d) Interpret the correlation matrix.
e) Report R2 and the multiple correlation, and interpret.
12. Refer to the previous exercise.
a) Report the F statistic for testing H0: β1 = β2 = 0, report its df values and
P -value, and interpret.
b) Show how to construct the t statistic for testing H0: β1 = 0, report its df
and P -value for Ha: β1 ≠ 0, and interpret.
c) Construct a 95% confidence interval for β1, and interpret.
d) Since these analyses use data for all the states, what relevance, if any, do
the inferences have in (a)–(c)?
13. Refer to the previous two exercises. When we add x3 = percentage of single-
parent families to the model, we get the results in Table 11.14.
a) Report the prediction equation and interpret the coefficient of poverty rate.
b) Why do you think the effect of poverty rate is much lower after x3 is added
to the model?
Table 11.14:

Variable        Coefficient  Std. Error
Intercept       -1197.538
Poverty            18.283     (6.136)
Metropolitan        7.712     (1.109)
Single-parent      89.401    (17.836)
R2                  0.722
n                  50
14. Table 11.15 comes from a regression analysis of Y = number of children in
family, X1 = mother’s educational level in years (MEDUC), and X2 = father’s
socioeconomic status (FSES), for a random sample of 49 college students at
Texas A&M University.
a) Write the prediction equation. Interpret parameter estimates.
b) For the first subject in the sample, x1 = 12, x2 = 61, and y = 5. Find the
predicted value of y and the residual, and interpret.
c) Report SSE. Use it to explain the least squares property of this prediction
equation.
d) Explain why it is not possible that r_{YX1·X2} = 0.40.
e) Can you tell from the table whether r_{YX1} is positive or negative? Explain.
15. The General Social Survey has asked subjects to rate various groups using the
“feeling thermometer.” The rating is between 0 and 100, more favorable as the
Table 11.15:
             Sum of Squares
Regression        31.8
Residual         199.3

                  B
(Constant)       5.25
MEDUC           -0.24
FSES             0.02
score gets closer to 100 and less favorable as the score gets closer to 0. For a
small data set from the GSS, Table 11.16 shows results of fitting the multiple
regression model with feelings toward liberals as the response, using explanatory
variables political ideology (scores 1 = extremely liberal, 2 = liberal, 3 = slightly
liberal, 4 = moderate, 5 = slightly conservative, 6 = conservative, 7 = extremely
conservative) and religious attendance, using scores (1 = never, 2 = less than
once a year, 3 = once or twice a year, 4 = several times a year, 5 = about once
a month, 6 = 2-3 times a month, 7 = nearly every week, 8 = every week, 9 =
several times a week). Standard errors are shown in parentheses.
a) Report the prediction equation and interpret the ideology partial effect.
b) Report the predicted value and residual for the first observation, for which
ideology = 7, religion = 9, and feelings = 10.
c) Report, and explain how to interpret, R2.
d) Tables of this form often put * by an effect having P < 0.05, ** by an effect
having P < 0.01, and *** by an effect having P < 0.001. Show how this was
determined for the ideology effect, and discuss the disadvantage of summarizing
in this manner.
e) Explain how the F value can be obtained from the R2 value reported. Report
its df values, and explain how to interpret its result.
f) The estimated standardized regression coefficients are −0.79 for ideology and
−0.23 for religion. Interpret.
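For part (e), the overall F statistic is determined by R2 alone, via F = (R2/k)/((1 − R2)/(n − k − 1)). A minimal sketch with the Table 11.16 values; the printed F (13.93) differs slightly because R2 = 0.799 is itself rounded:

```python
# Overall F test of H0: beta1 = ... = betak = 0, computed from R-squared.
def overall_F(R2, n, k):
    return (R2 / k) / ((1 - R2) / (n - k - 1))

F = overall_F(0.799, n=10, k=2)
print(round(F, 2), "df =", 2, "and", 10 - 2 - 1)
```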
16. Refer to Table 11.5. Test H0: β2 = 0 that mental impairment is independent
of SES, controlling for life events. Report the test statistic, and report and
interpret the P-value for (a) Ha: β2 ≠ 0, (b) Ha: β2 < 0.
17. For a random sample of 66 state precincts, data are available on
Y = Percentage of adult residents who are registered to vote
X1 = Percentage of adult residents owning homes
X2 = Percentage of adult residents who are nonwhite
X3 = Median family income (thousands of dollars)
X4 = Median age of residents
Table 11.16:
Variable     Coefficient
Intercept      135.31
Ideology       -14.07
               (3.16)**
Religion        -2.95
               (2.26)
F               13.93**
R2               0.799
Adj. R2          0.742
(n)             (10)
X5 = Percentage of residents who have lived in the
precinct at least ten years
Table 11.17 shows a portion of the printout used to analyze the data.
a) Fill in all the missing values in the printout, indicating in each ‘Sig’ space
whether P > 0.05, 0.01 < P < 0.05, 0.001 < P < 0.01, or P < 0.001.
b) Do you think it is necessary to include all five explanatory variables in the
model? Explain.
c) To what test does the “F Value” refer? Interpret the result of that test.
d) To what test does the t-value opposite x1 refer? Interpret the result of that
test.
18. Refer to the previous exercise.
a) Find a 95% confidence interval for the change in the mean of Y for a 1-unit
increase in the percentage of adults owning homes, controlling for the other
variables. Interpret.
b) Find a 95% confidence interval for the change in the mean of Y for a 50-unit
increase in the percentage of adults owning homes, controlling for the other
variables. Interpret.
19. Use software with the “house selling price” data file at the text website to con-
duct a multiple regression analysis of Y = selling price of home (dollars), X1 =
size of home (square feet), X2 = number of bedrooms, X3 = number of bath-
rooms.
a) Use graphics to display the effects of the predictors. Interpret, and explain
how the highly discrete nature of x2 and x3 affects the plots.
b) Report the prediction equation and interpret the estimates.
c) Inspect the correlation matrix, and report the variables having the (i) strongest
association, (ii) weakest association.
d) Report R2, and interpret.
Table 11.17:
              Sum of                Mean
              Squares      DF      Square       F       Sig     R-Square
Regression     ----        ---      ----       ----     ----      ----
Residual      2940.0       ---      ----                        Root MSE
Total         3753.3       ---                                    ----

              Parameter   Standard
Variable      Estimate    Error         t       Sig
Intercept     70.0000
x1             0.1000     0.0450       ----     ----
x2            -0.1500     0.0750       ----     ----
x3             0.1000     0.2000       ----     ----
x4            -0.0400     0.0500       ----     ----
x5             0.1200     0.0500       ----     ----
e) Find the F statistic for testing the overall effect of the three predictors,
report its df values and its P-value, and interpret.
f) Find the t test statistic for H0: β3 = 0, report its P-value for Ha: β3 > 0,
and interpret.
20. Refer to the previous exercise. Now use only number of bathrooms and number
of bedrooms as predictors.
a) Again test the partial effect of number of bathrooms, and interpret.
b) Construct a 95% confidence interval for the coefficient of number of bath-
rooms, and interpret.
c) Find the partial correlation between selling price and number of bathrooms,
controlling for number of bedrooms. Compare it to the correlation, and inter-
pret.
d) Find the estimated standardized regression coefficients for the model, and
interpret.
e) Write the prediction equation using standardized variables. Interpret.
21. Exercise 11 showed a regression analysis for statewide data on Y = violent
crime rate, X1 = poverty rate, and X2 = percent living in metropolitan areas.
When we add an interaction term, we get the prediction equation ŷ = 158.9 − 14.72x1 − 1.29x2 + 0.76x1x2.
a) As the percentage living in metropolitan areas increases, does the effect of
poverty rate tend to increase or decrease? Explain.
b) Show how to interpret the prediction equation, by finding how it simplifies
when x2 = 0, 50, and 100.
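Simplifications of this kind can be checked numerically: at a fixed x2, the intercept and the x1 slope of the interaction equation in Exercise 21 are themselves linear in x2. A small sketch:

```python
# Collect the intercept and x1 slope of yhat = 158.9 - 14.72*x1 - 1.29*x2
# + 0.76*x1*x2 as functions of the fixed value of x2.
def line_given_x2(x2):
    intercept = 158.9 - 1.29 * x2
    slope = -14.72 + 0.76 * x2
    return intercept, slope

for x2 in (0, 50, 100):
    a, b = line_given_x2(x2)
    print(f"x2 = {x2:3d}: yhat = {a:.1f} + ({b:.2f})x1")
```

Note how the poverty slope changes sign as x2 grows, which is the point of part (a).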
22. A study analyzes relationships among Y = percentage vote for Democratic
candidate, X1 = percentage of registered voters who are Democrats, and X2 =
percentage of registered voters who vote in the election, for several congressional
elections in 2006. The researchers expect interaction, since they expect a higher
slope between Y and x1 at larger values of x2 than at smaller values. They
obtain the prediction equation ŷ = 20 + 0.30x1 + 0.05x2 + 0.005x1x2. Does this
equation support the direction of their prediction? Explain.
23. Use software with the “house selling price” data file to allow interaction between
number of bedrooms and number of bathrooms in their effects on selling price.
a) Report the prediction equation.
b) Interpret the fit by showing the equation relating y and number of bedrooms
for homes with (i) two bathrooms, (ii) three bathrooms.
c) Use a test to analyze the significance of the interaction term. Interpret.
24. A multiple regression analysis investigates the relationship between Y = college
GPA and several explanatory variables, using a random sample of 195 students
at Slippery Rock University. First, high school GPA and total SAT score are
entered into the model. The sum of squared errors is SSE = 20. Next, parents’
education and parents’ income are added, to determine if they have an effect,
controlling for high school GPA and SAT. For this expanded model SSE = 19.
Test whether this complete model is significantly better than the one containing
only high school GPA and SAT. Report and interpret the P-value.
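A comparison of nested models of this type uses F = [(SSE_reduced − SSE_complete)/df1] / [SSE_complete/df2], where df1 is the number of added parameters and df2 = n − (k + 1) for the complete model. A sketch with the values stated in Exercise 24:

```python
# Complete-vs-reduced model F test: n = 195, complete model has 4 predictors,
# and 2 parameters (parents' education, parents' income) are added.
sse_reduced, sse_complete = 20.0, 19.0
df1 = 2
df2 = 195 - (4 + 1)      # 190
F = ((sse_reduced - sse_complete) / df1) / (sse_complete / df2)
print(F, "with df =", df1, "and", df2)
```

The P-value would then come from the F distribution with (df1, df2) degrees of freedom, via software or a table.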
25. Table 11.18 shows results of regressing Y = birth rate (BIRTHS, number of
births per 1000 population) on x1 = women’s economic activity (ECON) and
x2 = literacy rate (LITERACY), using UN data for 23 nations.
a) Report the value of each of the following:
(i) rYX1 (ii) rYX2 (iii) R2
(iv) TSS (v) SSE (vi) mean square error
(vii) s (viii) sy (ix) se for b1 (x) t for H0: β1 = 0
(xi) P for H0: β1 = 0 against Ha: β1 ≠ 0
(xii) P for H0: β1 = 0 against Ha: β1 < 0
(xiii) F for H0: β1 = β2 = 0
(xiv) P for H0: β1 = β2 = 0
b) Report the prediction equation, and carefully interpret the three estimated
regression coefficients.
c) Interpret the correlations rYX1 and rYX2.
d) Report R2, and interpret its value.
e) Report the multiple correlation, and interpret.
f) Though inference may not be relevant for these data, report the F statistic
for H0: β1 = β2 = 0, report its P-value, and interpret.
g) Show how to construct the t statistic for H0: β1 = 0, report its df and
P-value for Ha: β1 ≠ 0, and interpret.
Table 11.18:
Mean Std Deviation N
BIRTHS 22.117 10.469 23
ECON 47.826 19.872 23
LITERACY 77.696 17.665 23
Correlations
BIRTHS ECON LITER
Correlation BIRTHS 1.00000 -0.61181 -0.81872
ECON -0.61181 1.00000 0.42056
LITERACY -0.81872 0.42056 1.00000
Sig.(2-tailed) BIRTHS . 0.0019 0.0001
ECON 0.0019 . 0.0457
LITERACY 0.0001 0.0457 .
Sum of
Squares DF Mean Square F Sig
Regression 1825.969 2 912.985 31.191 0.0001
Residual 585.424 20 29.271
Total 2411.393 22
Root MSE (Std. Error of the Estimate) 5.410 R Square 0.7572
Unstandardized Coeff. Standardized
B Std. Error Coeff. (Beta) t Sig
(Constant) 61.713 5.2453 11.765 0.0001
ECON -0.171 0.0640 -0.325 -2.676 0.0145
LITERACY -0.404 0.0720 -0.682 -5.616 0.0001
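Several entries of this printout are linked arithmetically, which is the basis for many parts of Exercise 25. A minimal consistency check (small differences from the printed values reflect rounding of the displayed coefficients):

```python
# R-square = regression SS / total SS; Root MSE = sqrt(residual SS / df);
# each t equals the coefficient divided by its standard error.
import math

reg_ss, res_ss, total_ss = 1825.969, 585.424, 2411.393
r_square = reg_ss / total_ss
root_mse = math.sqrt(res_ss / 20)     # residual df = 20
t_econ = -0.171 / 0.0640              # ECON coefficient / its SE

print(round(r_square, 4), round(root_mse, 3), round(t_econ, 2))
```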
26. Refer to the previous exercise.
a) Find the partial correlation between Y and X1, controlling for X2. Interpret
both the partial correlation and its square.
b) Find the estimate of the conditional standard deviation, and interpret its
value.
c) Show how to find the estimated standardized regression coefficient for x1
using the unstandardized estimate and the standard deviations, and interpret
its value.
d) Write the prediction equation using standardized variables. Interpret.
e) Find the predicted z-score for a country that is one standard deviation above
the mean on both predictors. Interpret.
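For part (c), the standardized estimate is obtained by rescaling the unstandardized one: b* = b(sx/sy). A sketch using the ECON values from Table 11.18:

```python
# Standardized coefficient for ECON, from the unstandardized estimate and
# the sample standard deviations of ECON and BIRTHS.
b_econ = -0.171
s_econ, s_births = 19.872, 10.469
b_star = b_econ * s_econ / s_births
print(round(b_star, 3))
```

The result matches the Beta column of the printout (−0.325).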
27. Refer to Examples 11.1 and 11.8. Explain why the partial correlation between
crime rate and high school graduation rate is so different from the bivariate
correlation. (This is an example of Simpson’s paradox, which states that a
bivariate association can have a different direction than a partial association.)
28. For a group of 100 children of ages varying from 3 to 15, the correlation be-
tween vocabulary score on an achievement test and height of child is 0.65. The
correlation between vocabulary score and age for this sample is 0.85, and the
correlation between height and age is 0.75.
a) Show that the partial correlation between vocabulary and height, controlling
for age, is 0.036. Interpret.
b) Test whether this partial correlation is significantly nonzero. Interpret.
c) Is it plausible that the relationship between height and vocabulary is spuri-
ous, in the sense that it is due to their joint dependence on age? Explain.
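The verification in part (a) uses the first-order partial correlation formula, r_XY·Z = (r_XY − r_XZ r_YZ) / √[(1 − r_XZ²)(1 − r_YZ²)]. A sketch with the stated correlations:

```python
# Partial correlation between vocabulary (V) and height (H), controlling
# for age (A), from the three pairwise correlations.
import math

r_vh, r_va, r_ha = 0.65, 0.85, 0.75
partial = (r_vh - r_va * r_ha) / math.sqrt((1 - r_va**2) * (1 - r_ha**2))
print(round(partial, 3))
```

The near-zero value supports the spuriousness interpretation asked about in part (c).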
29. A multiple regression model describes the relationship among a collection of
cities between Y = murder rate (number of murders per 100,000 residents) and
X1 = Number of police officers (per 100,000 residents)
X2 = Median length of prison sentence given to convicted murderers
(in years)
X3 = Median income of residents of city (in thousands of dollars)
X4 = Unemployment rate in city
These variables are observed for a random sample of thirty cities with popu-
lation size exceeding 35,000. For these cities, the prediction equation is ŷ =
30 − 0.02x1 − 0.1x2 − 1.2x3 + 0.8x4, and ȳ = 15, x̄1 = 100, x̄2 = 15, x̄3 = 13,
x̄4 = 7.8, sy = 8, sx1 = 30, sx2 = 10, sx3 = 2, sx4 = 2.
a) Can you tell from the coefficients of the prediction equation which explana-
tory variable has the greatest partial effect on Y ? Explain.
b) Find the standardized regression coefficients and interpret their values.
c) Write the prediction equation using standardized variables. Find the pre-
dicted z-score on murder rate for a city that is one standard deviation above
the mean on x1, x2, and x3, and one standard deviation below the mean on x4.
Interpret.
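Computations of the type parts (b) and (c) require can be sketched as follows, using b*_i = b_i(sxi/sy) and the fact that the standardized prediction equation is a sum of b*_i times predictor z-scores:

```python
# Standardized coefficients for the four predictors in Exercise 29, and the
# predicted z-score for the city described in part (c).
b = [-0.02, -0.1, -1.2, 0.8]
s_x = [30, 10, 2, 2]
s_y = 8

b_star = [bi * si / s_y for bi, si in zip(b, s_x)]
z_x = [1, 1, 1, -1]   # one SD above on x1, x2, x3; one SD below on x4
z_hat = sum(bs * z for bs, z in zip(b_star, z_x))
print(b_star, z_hat)
```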
30. Exercise 11 showed a regression of violent crime rate on poverty rate and percent
living in metropolitan areas. The estimated standardized regression coefficients
are 0.473 for poverty rate and 0.668 for percent in metropolitan areas.
a) Interpret the estimated standardized regression coefficients.
b) Express the prediction equation using standardized variables, and explain
how it is used.
Concepts and Applications
31. Refer to the student survey data set (Exercise 1.11). Using software, conduct
a regression analysis using Y = political ideology with predictors number of
times per week of newspaper reading and religiosity. Prepare a report, summa-
rizing your graphical analyses, bivariate models and interpretations, multiple
regression models and interpretations, inferences, checks of effects of outliers,
and overall summary of the relationships.
32. Repeat the previous exercise using Y = college GPA with predictors high school
GPA and number of weekly hours of physical exercise
33. Refer to the student data file you created in Exercise 1.12. For variables chosen
by your instructor, fit a multiple regression model and conduct descriptive and
inferential statistical analyses. Interpret and summarize your findings.
34. Using software with the “2005 statewide crime” data file at the text website,
conduct a regression analysis of murder rate with predictors poverty rate, the
percent living in urban areas, and percent of high school graduates. Conduct
descriptive and inferential analyses. Provide interpretations, and provide a
paragraph summary of your conclusions at the end of your report.
35. Repeat the previous exercise using violent crime rate as the response variable.
36. Refer to Exercise 34. Repeat this problem, excluding the observation for D.C.
Describe the effect on the various analyses of this observation.
37. Table 27 in Chapter 9 is the “UN data” data file at the text website. Construct
a multiple regression model containing two explanatory variables that provide
good predictions for the fertility rate. How did you select this model? (Hint:
One way is based on entries in the correlation matrix.)
38. In about 200 words, explain to someone who has never studied statistics what
multiple regression does and how it can be useful.
39. Analyze the “house selling price” data file at the text website (which were
introduced in Example 9.10), using selling price of home, size of home, number
of bedrooms, and taxes. Prepare a short report summarizing your analyses and
conclusions.
40. For Example 11.2 on mental impairment, Table 11.19 shows the result of adding
religious attendance as a predictor, measured as the approximate number of
times the subject attends a religious service over the course of a year. Write a
short report, interpreting the information from this table.
41. A study3 of mortality rates in the U.S. found that states with higher income
inequality tended to have higher mortality rates. The effect of income inequality
3A. Muller, BMJ, vol. 324, 2002
Table 11.19:
Variable                Coefficient
Intercept                 27.422
Life events                0.0935
                          (0.0313)**
SES                       -0.0958
                          (0.0256)***
Religious attendance      -0.0370
                          (0.0219)
R2                         0.358
(n)                       (40)
disappeared after controlling for the percentage of a state’s residents that had
at least a high school education. Explain how these results relate to analyses
conducted using bivariate regression and multiple regression.
42. A 2002 study4 relating the percentage of a child’s life spent in poverty to number
of years of education completed by the mother and the percentage of a child’s
life spent in a single parent home reported the results shown in Table 11.20.
Prepare a one-page report explaining how to interpret the results in this table.
Table 11.20:
Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
(Constant) 56.401 2.121 12.662 .000
% single parent 0.323 .014 .295 11.362 .000
mother school -3.330 .152 -.290 -11.294 .000
F 611.6 (df = 2, 4731) Sig .000
R 0.453 R Square 0.205
43. The Economist magazine5 developed a quality-of-life index for nations as the
predicted value obtained by regressing an average of life-satisfaction scores from
several surveys on gross domestic product (GDP, per capita, in dollars), life ex-
pectancy (in years), an index of political freedom (from 1 = completely free
4 http://www.heritage.org/Research/Family/cda02-05.cfm
5 http://www.economist.com/media/pdf/QUALITYOFLIFE.pdf
to 7 = unfree), the percentage unemployed, the divorce rate (on a scale of 1
for lowest rates to 5 for highest), latitude (to distinguish between warmer and
cold climes), a political stability measure, gender equality defined as the ratio
of average male and female earnings, and community life (1 if country has high
rate of church attendance or trade-union membership, 0 otherwise). Table 11.21
shows results of the model fit for 74 countries, for which the multiple correla-
tion is 0.92. The study used the prediction equation to predict the quality of
life in 2005 for 111 nations. The top 10 ranks were for Ireland, Switzerland,
Norway, Luxembourg, Sweden, Australia, Iceland, Italy, Denmark, and Spain.
Other ranks included 13 for the U.S., 14 for Canada, 15 for New Zealand, 16
for Netherlands, and 29 for the U.K.
a) Which variables would you expect to have negative effects on quality of life?
Is this supported by the results?
b) The study states that “GDP explains more than 50% of the variation in life
satisfaction.” How does this relate to a summary measure of association?
c) The study reported that “Using so-called Beta coefficients from the regres-
sion to derive the weights of the various factors, life expectancy and GDP were
the most important.” Explain what was meant by this.
d) Although GDP seems to be an important predictor, in a bivariate sense and
a partial sense, Table 11.21 reports a very small coefficient, 0.00003. Why do
you think this is?
e) The study mentioned other predictors that were not included because they
provided no further predictive power. For example, the study stated that ed-
ucation seemed to have an effect mainly through its effects on other variables
in the model, such as GDP, life expectancy, and political freedom. Does this
mean there is no association between education and quality of life? Explain.
Table 11.21:
                     Coefficient   Standard error   t statistic
Constant                2.796          0.789            3.54
GDP per person          0.00003        0.00001          3.52
Life expectancy         0.045          0.011            4.23
Political freedom      -0.105          0.056           -1.87
Unemployment           -0.022          0.010           -2.21
Divorce rate           -0.188          0.064           -2.93
Latitude               -1.353          0.469           -2.89
Political stability     0.152          0.052            2.92
Gender equality         0.742          0.543            1.37
Community life          0.386          0.124            3.13
44. A recent article6 used multiple regression to predict attitudes toward homosex-
6T. Shackelford and A. Besser, Individual Differences Research, 2007
uality. The researchers found that the effect of number of years of education on
a measure of tolerance toward homosexuality varied from essentially no effect
for political conservatives to a considerably positive effect for political liberals.
Explain how this is an example of statistical interaction, and explain how it
would be handled by a multiple regression model.
45. In the study mentioned in the previous exercise, a separate model did not
contain interaction terms. The best predictor of attitudes toward homosexuality
was educational level, with an estimated standardized regression coefficient of
0.21. The authors also reported, “Controlling for other variables, an additional
year of education completed was associated with a .09 rating unit increase in
attitudes toward homosexuality.” In comparing the effect of education with the
effects of other predictors in the model, such as the age of the subject, explain
the purpose of estimating standardized coefficients. Explain how to interpret
the one reported for education.
46. For a linear model with two explanatory variables X1 and X2, which of the
following must be incorrect? Why?
a) rYX1 = 0.01, rYX2 = −0.2, R = 0.75
b) rYX1 = 0.01, rYX2 = −0.75, R = 0.2
c) rYX1 = 0.4, rYX2 = 0.4, R = 0.4
47. In Exercise 1 on Y = college GPA, X1 = high school GPA, and X2 = college
board score, E(Y) = 0.20 + 0.50x1 + 0.002x2. True or false: Since β1 = 0.50
is larger than β2 = 0.002, this implies that X1 has the greater partial effect on
Y. Explain.
48. Table 11.22 shows results of fitting various regression models to data on Y = col-
lege GPA, X1 = high school GPA, X2 = mathematics entrance exam score, and
X3 = verbal entrance exam score. Indicate which of the following statements
are false. Give a reason for your answer.
Table 11.22:
                                          Model
Estimates          E(Y) = α + βx1   E(Y) = α + β1x1 + β2x2   E(Y) = α + β1x1 + β2x2 + β3x3
Coefficient of x1       0.450              0.400                      0.340
Coefficient of x2                          0.003                      0.002
Coefficient of x3                                                     0.002
R2                      0.25               0.34                       0.38
a) The correlation between Y and X1 is positive.
b) A one-unit increase in x1 corresponds to a change of 0.45 in the estimated
mean of Y , controlling for x2 and x3.
c) The value of SSE increases as we add additional variables to the model.
d) It follows from the sizes of the estimates for the third model that X1 has
the strongest partial effect on Y .
e) The value of r2YX3 is 0.40.
f) The partial correlation rYX1·X2 is positive.
g) The partial correlation rYX1·X3 could be negative.
h) Controlling for X1, a 100-unit increase in X2 corresponds to a predicted
increase of 0.3 in college GPA.
i) For the first model, the estimated standardized regression coefficient equals
0.50.
49. In regression analysis, which of the following statements must be false? Why?
a) For the model E(Y) = α + β1x1, Y is significantly related to x1 at the 0.05
level, but when x2 is added to the model, Y is not significantly related to x1 at
the 0.05 level.
b) The estimated coefficient of x1 is positive in the bivariate model, but negative
in the multiple regression model.
c) When the model is refitted after Y is multiplied by 10, then R2, rYX1,
rYX1·X2, b*1, and the F statistics and t statistics do not change.
d) rYX2·X1 cannot exceed rYX2.
e) The F statistic for testing that all the regression coefficients equal 0 has
P < 0.05, but none of the individual t tests have P < 0.05.
f) If you compute the standardized regression coefficient for a bivariate model,
you always get the correlation.
g) r2YX1 = r2YX2 = 0.6 and R2 = 0.6.
h) r2YX1 = r2YX2 = 0.6 and R2 = 1.2.
i) The correlation between Y and ŷ equals −0.10.
j) If x3 is added to a model already containing x1 and x2, then if the prediction
equation has b3 = 0, R2 stays the same.
k) For every F test, there is an equivalent test using the t distribution.
For Exercises 50–54, select the correct answer(s) and indicate why the other
responses are inappropriate. (More than one response may be correct.)
50. If Y = 2 + 3x1 + 5x2 − 8x3, then controlling for x2 and x3, the predicted mean
change in Y when x1 is increased from 10 to 20 equals
a) 3 b) 30 c) 0.3 d) Cannot be given—depends on specific values of x2
and x3.
51. If Y = 2 + 3x1 + 5x2 − 8x3,
a) The strongest correlation is between Y and X3.
b) The variable with the strongest partial influence on Y is X2.
c) The variable with the strongest partial influence on Y is X3, but one cannot
tell from this equation which pair has the strongest correlation.
d) None of the above.
52. If Y = 2 + 3x1 + 5x2 − 8x3,
a) rYX3 < 0
b) rYX3·X1 < 0
c) rYX3·X1,X2 < 0
d) Insufficient information to answer.
e) Answers (a), (b), and (c) are all correct.
53. If Y = 2 + 3x1 + 5x2 − 8x3, and H0: β3 = 0 is rejected at the 0.05 level, then
a) H0: ρYX3·X1,X2 = 0 is rejected at the 0.05 level.
b) H0: ρYX3 = 0 is rejected at the 0.05 level.
c) rYX3·X1,X2 > 0
54. The F test for comparing a complete model to a reduced model
a) Can be used to test the significance of a single regression parameter in a
multiple regression model.
b) Can be used to test H0: β1 = · · · = βk = 0 in a multiple regression equation.
c) Can be used to test H0: No interaction, in the model
E(Y) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3.
d) Can be used to test whether the model E(Y) = α + β1x1 + β2x2 gives a
significantly better fit than the model E(Y) = α + β1x1 + β2x3.
55. Explain the difference in the purposes of the correlation, the multiple correla-
tion, and the partial correlation.
56. Let Y = height, X1 = length of right leg, X2 = length of left leg. Describe
what you expect for the relative sizes of the three pairwise correlations, R, and
rYX2·X1.
57. Give an example of three variables for which you expect β ≠ 0 in the model
E(Y) = α + βx1 but β1 = 0 in the model E(Y) = α + β1x1 + β2x2.
58. For the models E(Y) = α + βx and E(Y) = α + β1x1 + β2x2, express null
hypotheses in terms of correlations that are equivalent to the following:
a) H0: β = 0
b) H0: β1 = β2 = 0
c) H0: β2 = 0
59. * Whenever X1 and X2 are uncorrelated, then R2 for the model E(Y) = α +
β1x1 + β2x2 satisfies R2 = r2YX1 + r2YX2. In this case, draw a figure that
portrays the variability in Y, the part of that variability explained by each of
X1 and X2, and the total variability explained by both of them together.
60. * Which of the following sets of correlations would you expect to yield the
highest R2 value? Why?
a) rYX1 = 0.4, rYX2 = 0.4, rX1X2 = 0.0
b) rYX1 = 0.4, rYX2 = 0.4, rX1X2 = 0.5
c) rYX1 = 0.4, rYX2 = 0.4, rX1X2 = 1.0
61. * Suppose the correlation between Y and X1 equals the multiple correlation
between Y and X1 and X2. What does this imply about the partial correlation
rYX2·X1? Interpret.
62. * Software reports four types of sums of squares in multiple regression models.
The Type I (sometimes called sequential) sum of squares represents the vari-
ability explained by a variable, controlling for variables previously entered into
the model. The Type III (sometimes called partial) sum of squares represents
the variability explained by that variable, controlling for all other variables in
the model.
a) For any multiple regression model, explain why the Type I sum of squares
for x1 is the regression sum of squares for the bivariate model with x1 as the
predictor, whereas the Type I sum of squares for x2 equals the amount by which
SSE decreases when x2 is added to the model.
b) Explain why the Type I sum of squares for the last variable entered into a
model is the same as the Type III sum of squares for that variable.
63. * The sample value of R2 tends to overestimate the population value, because
the sample data fall closer to the sample prediction equation than to the true
population regression equation. This bias is greater if n is small or the number
of predictors k is large. A somewhat better estimate is adjusted R2,
R2adj = 1 − s2/s2Y = R2 − [k/(n − (k + 1))](1 − R2),

where s2 is the estimated conditional variance (i.e., the mean square error) and
s2Y is the sample variance of Y.
a) Suppose R2 = 0.339 for a model with k = 2 explanatory variables (such as
in Table 11.5). Find R2adj for the following sample sizes: 10, 40 (as in the text
example), 100, and 1000. Show that R2adj approaches R2 in value as n increases.
b) Show that R2adj is negative when R2 < k/(n − 1). This is undesirable, and
R2adj is equated to 0 in such cases. (Also, unlike R2, R2adj could decrease when
we add an explanatory variable to a model.)
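The computation in part (a) can be sketched directly from the formula above:

```python
# Adjusted R2 = R2 - [k/(n - (k+1))] * (1 - R2), evaluated for R2 = 0.339,
# k = 2, and the sample sizes listed in the exercise.
def adj_r2(R2, n, k):
    return R2 - (k / (n - (k + 1))) * (1 - R2)

for n in (10, 40, 100, 1000):
    print(n, round(adj_r2(0.339, n, k=2), 3))
```

As the output shows, the adjustment is substantial for n = 10 but negligible for n = 1000.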
64. * Let R2Y(X1,...,Xk) denote R2 for the multiple regression model with k explana-
tory variables. Explain why

r2YXk·X1,...,Xk−1 = [R2Y(X1,...,Xk) − R2Y(X1,...,Xk−1)] / [1 − R2Y(X1,...,Xk−1)].
65. * The numerator R2 − r2YX1 of the squared partial correlation r2YX2·X1 gives
the increase in the proportion of explained variation from adding X2 to the
model. This increment, denoted by r2Y(X2·X1), is called the squared semipartial
correlation. One can use squared semipartial correlations to partition the
variation in the response variable. For instance, for three explanatory variables,

R2Y(X1,X2,X3) = r2YX1 + (R2Y(X1,X2) − r2YX1) + (R2Y(X1,X2,X3) − R2Y(X1,X2))
              = r2YX1 + r2Y(X2·X1) + r2Y(X3·X1,X2).
The total variation in Y explained by X1, X2, and X3 together partitions into:
(i) the proportion explained by X1 (i.e., r2YX1), (ii) the proportion explained
by X2 beyond that explained by X1 (i.e., r2Y(X2·X1)), and (iii) the proportion
explained by X3 beyond that explained by X1 and X2 (i.e., r2Y(X3·X1,X2)).
These correlations have the same ordering as the t statistics for testing partial
effects, and some researchers use them as indices of importance of the predictors.
a) In Example 11.2 on mental impairment, show that r2Y(X2·X1) = 0.20 and
r2Y(X1·X2) = 0.18. Interpret.
b) Explain why the squared semipartial correlation r2Y(X2·X1) cannot be larger
than the squared partial correlation r2YX2·X1.
66. * The least squares prediction equation provides predicted values ŷ with the
strongest possible correlation with Y, out of all possible prediction equations
of that form. That is, the least squares equation yields the best prediction
of Y in the sense that it represents the linear reduction of X1, . . . , Xk to the
single variable that is most strongly correlated with Y . Based on this property,
explain why the multiple correlation cannot decrease when one adds a variable
to a multiple regression model. (Hint: The prediction equation for the simpler
model is a special case of a prediction equation for the full model that has
coefficient 0 for the added variable.)
67. * Let b*i denote the estimated standardized regression coefficient for Xi in the
model with Y as the response, and let b̃*i denote the corresponding coefficient
when Xi is treated as the response variable and Y as an explanatory variable,
controlling for the same set of other variables. Then b̃*i need not equal b*i. The
partial correlation between Y and Xi, which is symmetric in the order of the two
variables, satisfies

r2YXi·— = b*i b̃*i.

a) From this formula, explain why the partial correlation must fall between b*i
and b̃*i. (Note: When a = √(bc), a is said to be the geometric average of b and
c.)
b) Even though b̃*i does not necessarily fall between −1 and +1, explain why
b*i b̃*i cannot exceed 1.
68. * Chapters 12 and 13 show how to incorporate categorical predictors in regres-
sion models, and this exercise provides a preview. Table 11.23 shows part of a
printout for a model for the “house selling price 2” data set at the text website,
with Y = selling price of home, X1 = size of home, and X2 = whether the
house is new (1 = yes, 0 = no).
a) Report the prediction equation. By setting x2 = 0 and then 1, construct the
two separate lines for older and for new homes. Note that the model implies
that the slope effect of size on selling price is the same for each.
b) Since x2 takes only the values 0 and 1, explain why the coefficient of x2
estimates the difference of mean selling prices between new and older homes,
controlling for house size.
Table 11.23:
B Std. Error t Sig
(Constant) -26.089 5.977 -4.365 0.0001
SIZE 72.575 3.508 20.690 0.0001
NEW 19.587 3.995 4.903 0.0001
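Substituting x2 = 0 and then x2 = 1 into the fitted equation from Table 11.23, as part (a) asks, yields two parallel lines. A sketch:

```python
# Fitted equation: yhat = -26.089 + 72.575*size + 19.587*new.
# Setting new = 0 and new = 1 gives separate lines with the same slope.
a, b1, b2 = -26.089, 72.575, 19.587

for new in (0, 1):
    intercept = a + b2 * new
    label = "new" if new else "older"
    print(f"{label} homes: yhat = {intercept:.3f} + {b1}*size")
```

The constant vertical gap between the lines (19.587) is the point of part (b).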
69. * Refer to the previous exercise. When we add an interaction term, we get
ŷ = −16.6 + 66.6x1 − 31.8x2 + 29.4(x1x2).
a) Interpret the fit by reporting the prediction equation between selling price
and size of house separately for new homes (x2 = 1) and for old homes (x2 =
0). Interpret. (This fit is equivalent to fitting lines separately to the data for
new homes and for old homes.)
b) Interpret the fit by reporting the difference between the predicted selling
prices for new and old homes for houses with x1 equal to (i) 1.5, (ii) 2.0, (iii)
2.5.
c) A plot of the data shows an outlier, a new home with a very high selling
price. When that observation is removed from the data set and the model is
re-fitted, ŷ = −16.6 + 66.6x1 + 9.0x2 + 5.0(x1x2). Re-do (a), and explain how
an outlier can have a large impact on a regression analysis.
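With the interaction term, new and older homes get different intercepts and different slopes, which is what part (a) asks you to extract. A sketch using the first fitted equation of Exercise 69:

```python
# Fitted equation with interaction:
# yhat = -16.6 + 66.6*size - 31.8*new + 29.4*(size*new).
a, b1, b2, b3 = -16.6, 66.6, -31.8, 29.4

for new in (0, 1):
    intercept = a + b2 * new
    slope = b1 + b3 * new
    label = "new" if new else "older"
    print(f"{label} homes: yhat = {intercept:.1f} + {slope:.1f}*size")
```

Rerunning the same sketch with the outlier-free coefficients in part (c) shows how strongly one observation can change the estimated lines.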
11.9.1 Bibliography
DeMaris, A. (2004). Regression with Social Data: Modeling Continuous and Limited Response Variables. Wiley.
Draper, N. R., and Smith, H. (1998). Applied Regression Analysis, 3rd ed. Wiley.
Holzer, C. E., III (1977). The Impact of Life Events on Psychiatric Symptomatology. Ph.D. dissertation, University of Florida, Gainesville.
Kutner, M. H., Nachtsheim, C. J., and Neter, J. (2004). Applied Linear Regression Models, 4th ed. McGraw-Hill/Irwin.
Weisberg, S. (2005). Applied Linear Regression, 3rd ed. Wiley.
Chap. 11 Problems 495
Figure 11.5: A Scatterplot Matrix: Scatterplots for Pairs of Variables from Table 11.1
((Fig. 11.5 from 3e))
Figure 11.6: Partial Regression Plot for Mental Impairment and Life Events, Controlling for SES. This plots the residuals from regressing mental impairment on SES against the residuals from regressing life events on SES.
((Include new Fig. 11.6))
Figure 11.7: Partial Regression Plot for Mental Impairment and SES, Controlling for Life Events. This plots the residuals from regressing mental impairment on life events against the residuals from regressing SES on life events.
((Include new Fig. 11.7))
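The construction described in these captions is easy to carry out directly. A minimal sketch with NumPy, using made-up illustrative data rather than the text's mental impairment data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (not the text's data set): y = response,
# x1 = predictor of interest, x2 = control variable.
n = 40
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def residuals(response, predictor):
    """Residuals from the least squares regression of response on predictor."""
    X = np.column_stack([np.ones_like(predictor), predictor])
    beta, *_ = np.linalg.lstsq(X, response, rcond=None)
    return response - X @ beta

# Partial regression plot of y and x1, controlling for x2:
# plot e_y (y regressed on x2) against e_x (x1 regressed on x2).
e_y = residuals(y, x2)
e_x = residuals(x1, x2)

# The least squares slope of e_y on e_x equals the coefficient of x1
# in the multiple regression of y on x1 and x2 jointly.
slope = np.sum(e_x * e_y) / np.sum(e_x ** 2)
print(round(slope, 3))
```

Plotting e_y against e_x gives the partial regression plot; the slope identity noted in the comment is why the plot displays the partial effect of the predictor after the control variable is removed.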
Figure 11.8: R2 Does Not Increase Much When x3 Is Added to the Model
Already Containing x1 and x2
((Fig. 11.8 in 3e))
Figure 11.9: The F Distribution and the P-Value for F Tests. Larger F values give stronger evidence against H0.
((Fig. 11.9 in 3e))
Figure 11.10: Portrayal of Interaction between x1 and x2 in their Effects on Y .
((Fig. 11.10 in 3e; if possible, change to lower-case letters))
Figure 11.11: Representation of r^2_{Y X2·X1} as the Proportion of Variability That Can Be Explained by X2, of That Left Unexplained by X1
((Fig 11.11 in 3e))
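The proportion portrayed in Figure 11.11 can be written out explicitly. Assuming the chapter's notation, with R^2 denoting the coefficient for the model containing both X1 and X2, the squared partial correlation is

```latex
r^2_{Y X_2 \cdot X_1} \;=\; \frac{R^2 - r^2_{Y X_1}}{1 - r^2_{Y X_1}}
```

The numerator is the additional variability explained when X2 is added, and the denominator is the variability left unexplained by X1 alone.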
Figure 11.12: Scatterplot of CRIME by INCOME
-----+----+----+----+----+----+----+----+----+----+----+----+-
| 1 |
| |
| |
C | 1 1 |
R 100 + 1 1 +
I | 1 2 1 |
M | 1 1 |
E | 1 1 1 |
| 1 11 11 1 1 |
| 11 1 1 1 11 1 1 1 |
50 + 1 1 1 2 1 1 +
| 1 1 11 1 1 1 |
| 1 3 1 1 1 1 |
| 1 1 1 |
| 1 1 2 1 |
| 1 2 11 |
0 + 1 +
-----+----+----+----+----+----+----+----+----+----+----+----+--
14 16 18 20 22 24 26 28 30 32 34 36
INCOME
Figure 11.13: Partial Regression Residual Plot
-+------+------+------+------+------+------+------+------+---
50 + 1 +
| 1 |
| 1 |
40 + +
| 1 1 |
| 1 1 |
30 + +
| 1 1 |
| 1 1 1 |
20 + 1 1 1 +
| 2 |
CRIME | 1 2 |
10 + 1 1 1 1 |
| 1 1 1 |
| 1 |
0 + 1 +
| 1 1 1 1 1 |
| 1 1 1 |
-10 + 12 1 1 1 +
| 1 1 1 1 1 1 1 1 |
| 11 1 1 1 2 |
-20 + 1 +
| 2 |
| 1 1 |
-30 + 1 +
| 1 1 |
| |
-40 + +
-+------+------+------+------+------+------+------+------+---
-6 -4 -2 0 2 4 6 8 10
INCOME