438 CHAPTER 11. MULTIPLE REGRESSION AND CORRELATION
Chapter 9 introduced regression modeling of the relationship between two quantitative
variables. Multivariate relationships require more complex models, containing several
explanatory variables. Some of these may be predictors of theoretical interest, and
some may be control variables.
To predict Y = college GPA, for example, it is sensible to use several predictors
in the same model. Possibilities include X1 = high school GPA, X2 = math college
entrance exam score, X3 = verbal college entrance exam score, and X4 = rating by
high school guidance counselor. This chapter presents models for the relationship
between a response variable Y and a collection of explanatory variables.
A multivariable model typically provides better predictions of Y than does a model
with a single explanatory variable. Such a model can also analyze relationships between
variables while controlling for other variables. This is important because Chapter 10
showed that after controlling for a variable, an association can appear quite different
from when the variable is ignored. Thus, this model provides information not available
with simple models that analyze only two variables at a time.
Sections 11.1 and 11.2 extend the regression model to a multiple regression
model that can have several explanatory variables. Section 11.3 defines multiple correlation
and R-squared measures that describe association between Y and a set of explanatory
variables. Section 11.4 presents inference procedures for multiple regression. Section
11.5 shows how to allow statistical interaction in the model, and Section 11.6 presents
a test of whether a complex model provides a better fit than a simpler model. The
final two sections introduce measures that summarize the association between the
response variable and an explanatory variable while controlling for other variables.
11.1 The Multiple Regression Model
Chapter 9 modeled the relationship between the explanatory variable X and the mean
of the response variable Y by the straight-line (linear) equation E(Y ) = α + βx. We
refer to this model containing a single predictor as a bivariate model, because it
contains only two variables.
The Multiple Regression Function
Suppose there are two explanatory variables, denoted by X1 and X2. As in earlier
chapters, we use lower-case letters to denote observations or particular values of the
variables. The bivariate regression function generalizes to the multiple regression
function
E(Y ) = α + β1x1 + β2x2.
In this equation, α, β1, and β2 are parameters discussed below. For particular values
of x1 and x2, the equation specifies the population mean of Y for all subjects with
those values of x1 and x2. When there are additional explanatory variables, each has
a βx term, for example E(Y ) = α + β1x1 + β2x2 + β3x3 + β4x4 with four predictors.
The multiple regression function is more difficult to portray graphically than the
bivariate regression function. With two explanatory variables, the x1 and x2 axes are
Figure 11.1: Graphical Depiction of a Multiple Regression Function with Two Explanatory Variables
((Fig. 11.1 in 3e))
perpendicular but lie in a horizontal plane and the Y axis is vertical and perpendicular
to both the x1 and x2 axes. The equation E(Y ) = α + β1x1 + β2x2 traces a plane (a
flat surface) cutting through three-dimensional space, as Figure 11.1 portrays.
The simplest interpretation treats all but one explanatory variable as control variables
and fixes them at particular levels. This leaves an equation relating the mean
of Y to the remaining explanatory variable.
Example 11.1 Do Higher Levels of Education Cause Higher Crime Rates?
Exercise 39 in Chapter 9 contains recent data on several variables for the 67
counties in the state of Florida. For each county, let Y = crime rate (annual number
of crimes per 1000 population), X1 = education (percentage of adult residents having
at least a high school education), and X2 = urbanization (percentage living in an
urban environment).
The bivariate relationship between crime rate and education is approximated by
E(Y ) = −51.3 + 1.5x1. Surprisingly, the association is moderately positive, the
correlation being r = 0.47. As the percentage of county residents having at least a
high school education increases, so does the crime rate.
A closer look at the data reveals strong positive associations between crime rate
and urbanization (r = 0.68) and between education and urbanization (r = 0.79).
This suggests that the association between crime rate and education may be spurious.
Perhaps urbanization is a common causal factor. See Figure 11.2. As urbanization
increases, both crime rate and education increase, resulting in a positive correlation
between crime rate and education.
The relation between crime rate and both predictors considered together is
approximated by the multiple regression function
E(Y ) = 58.9 − 0.6x1 + 0.7x2.
For instance, the expected crime rate for a county at the mean levels of education
Figure 11.2: The Positive Association Between Crime Rate and Education MayBe Spurious, Explained by the Effects of Urbanization on Each
((Fig. 11.2 in 3e))
(x1 = 70) and urbanization (x2 = 50) is E(Y ) = 58.9− 0.6(70) + 0.7(50) = 52 annual
crimes per 1000 population.
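As a quick numeric check, the fitted surface can be evaluated directly. This is a sketch using the rounded estimates quoted above (not a re-fit of the county data):

```python
# Fitted multiple regression function for the Florida crime data,
# using the rounded coefficients quoted in the text.
def expected_crime_rate(x1, x2):
    """Estimated mean crime rate for education x1 (%) and urbanization x2 (%)."""
    return 58.9 - 0.6 * x1 + 0.7 * x2

# County at the mean education (70) and urbanization (50):
print(round(expected_crime_rate(70, 50), 1))  # 51.9, i.e., about 52
```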
Let’s study the effect of x1, controlling for x2. We first set x2 at its mean level of
50. Then, the relationship between crime rate and education is
E(Y ) = 58.9 − 0.6x1 + 0.7(50) = 58.9 − 0.6x1 + 35.0 = 93.9 − 0.6x1.
Figure 11.3 plots this line. Controlling for x2 by fixing it at 50, the relationship
between crime rate and education is negative, rather than positive. The slope decreased
and changed sign from 1.5 in the bivariate relationship to −0.6. At this fixed level of
urbanization, a negative relationship exists between education and crime rate. We use
the term partial regression equation to distinguish the equation E(Y ) = 93.9 − 0.6x1
from the regression equation E(Y ) = −51.3 + 1.5x1 for the bivariate relationship
between Y and x1. The partial regression equation refers to part of the potential
observations, in this case counties having x2 = 50.
Figure 11.3: Partial Relationships Between E(Y ) and x1 for the Multiple Regression Equation E(Y ) = 58.9 − 0.6x1 + 0.7x2. These partial regression equations fix x2 to equal 50 or 40.
((Fig. 11.3 in 3e))
Next we fix x2 at a different level, say x2 = 40 instead of 50. Then, you can check
that E(Y ) = 86.9 − 0.6x1. Thus, decreasing x2 by 10 units shifts the partial line
relating Y to x1 downward by 10β2 = 7.0 units (see Figure 11.3). The slope of −0.6
for the partial relationship remains the same, so the line is parallel to the original one.
Setting x2 at a variety of values yields a collection of parallel lines, each having slope
β1 = −0.6.
Similarly, setting x1 at a variety of values yields a collection of parallel lines,
each having slope 0.7, relating the mean of Y to x2. In other words, controlling for
education, the slope of the partial relationship between crime rate and urbanization
is β2 = 0.7.
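The parallel partial lines can be verified mechanically by plugging the fixed x2 values into the fitted equation (a sketch, using the coefficients quoted above):

```python
def mean_crime(x1, x2):
    # Fitted equation from the text: E(Y) = 58.9 - 0.6*x1 + 0.7*x2
    return 58.9 - 0.6 * x1 + 0.7 * x2

# Partial lines relating mean crime rate to education, at two fixed
# urbanization levels:
intercept_50 = mean_crime(0.0, 50.0)   # 58.9 + 0.7*50 = 93.9
intercept_40 = mean_crime(0.0, 40.0)   # 58.9 + 0.7*40 = 86.9
print(round(intercept_50, 1), round(intercept_40, 1))

# Decreasing x2 by 10 shifts the line down by 10*beta2 = 7.0 units,
# while the slope in x1 stays -0.6 at every fixed x2:
print(round(intercept_50 - intercept_40, 1))
print(round(mean_crime(1, 50) - mean_crime(0, 50), 1))
```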
In summary, education has an overall positive effect on crime rate, but it has
a negative effect when controlling for urbanization. The partial association has the
opposite direction from the bivariate association. This is called Simpson's paradox.
Figure 11.4 illustrates how this happens. It shows the scatterplot relating crime
rate to education, portraying the overall positive association between these variables.
The diagram circles the 19 counties that are highest in urbanization. That subset of
points, for which urbanization is nearly constant, shows a negative trend between crime
rate and education. The strong positive association between education and urbanization
is reflected by the fact that most of the circled counties, those highest on
urbanization, also have high values on education.
Figure 11.4: Scatterplot Relating Crime Rate and Education. The circled points are the counties highest on Urbanization. A regression line fitting the circled points has negative slope, even though the regression line passing through all
the points has positive slope (Simpson's paradox).
((Fig. 11.4 in 3e))
Interpretation of Regression Coefficients
We have seen that for a fixed value of x2, the equation E(Y ) = α + β1x1 + β2x2
simplifies to a straight-line equation in x1 with slope β1. The slope is the same for
each fixed value of x2. When we fix the value of x2, we are holding it constant: We are
controlling for x2. That’s the basis of the major difference between the interpretation
of slopes in multiple regression and in bivariate regression:
• In multiple regression, a slope describes the effect of an explanatory variable
while controlling effects of the other explanatory variables in the model.
• Bivariate regression has only a single explanatory variable. So, a slope in
bivariate regression describes the effect of that variable while ignoring all other
possible explanatory variables.
The parameter β1 measures the partial effect of x1 on Y , that is, the effect of a
one-unit increase in x1, holding x2 constant. The partial effect of x2 on Y , holding
x1 constant, has slope β2. Similarly, for the multiple regression model with several
predictors, the beta coefficient of a predictor describes the change in the mean of Y for
a one-unit increase in that predictor, controlling for the other variables in the model.
The parameter α represents the mean of Y when each explanatory variable equals 0.
The parameters β1, β2, . . . are called partial regression coefficients. The
adjective partial distinguishes these parameters from the regression coefficient β in
the bivariate model E(Y ) = α + βx, which ignores rather than controls effects of
other explanatory variables.
This multiple regression model assumes that the slope of the partial relationship
between Y and each predictor is identical for all combinations of values of the other
explanatory variables. This means that the model is appropriate when there is no
statistical interaction, in the sense of Section 10.3. If the true partial slope between
Y and x1 is very different at x2 = 50 than at x2 = 40, for example, we need a more
complex model. Section 11.5 will show this model.
A partial slope in a multiple regression model usually differs from the slope in the
bivariate model for that predictor, but it need not. With two predictors, the partial
slopes and bivariate slopes are equal if the correlation between X1 and X2 equals 0.
When X1 and X2 are independent causes of Y , the effect of X1 on Y does not change
when we control for X2.
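This claim can be illustrated numerically. The sketch below, on synthetic data (nothing here comes from the text's data sets), builds two predictors with exactly zero sample correlation and checks that the bivariate slope of Y on x1 matches the partial slope from the two-predictor fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Center both predictors and orthogonalize x2 against x1 so that the
# sample correlation between them is exactly 0:
x1 = x1 - x1.mean()
x2 = x2 - x2.mean()
x2 = x2 - (x1 @ x2) / (x1 @ x1) * x1

y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Bivariate slope of y on x1 (ignoring x2):
b_bivariate = (x1 @ (y - y.mean())) / (x1 @ x1)

# Partial slope of x1 from the two-predictor least squares fit:
X = np.column_stack([np.ones(n), x1, x2])
b_partial = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(abs(b_bivariate - b_partial) < 1e-8)  # True: the slopes coincide
```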
Prediction Equation and Residuals
Corresponding to the multiple regression equation, software finds a prediction equation
by estimating the model parameters using sample data. For simplicity of notation, so
far we’ve used just two predictors. In general, let k denote the number of predictors.
Notation for Prediction Equation
The prediction equation that estimates the multiple regression equation E(Y ) =
α + β1x1 + β2x2 + · · · + βkxk is denoted by ŷ = a + b1x1 + b2x2 + · · · + bkxk.
For multiple regression, it is almost imperative to use computer software to find the
prediction equation. The calculation formulas are complex and are not shown in this
text.
We get the predicted value of Y for a subject by substituting the x-values for that
subject into the prediction equation. Like the bivariate model, the multiple regression
model has residuals that measure prediction errors. For a subject with predicted
response ŷ and observed response y, the residual is y − ŷ. The next section shows an
example.
The sum of squared errors (SSE),

SSE = ∑(y − ŷ)2,
summarizes the closeness of fit of the prediction equation to the response data. Most
software calls SSE the residual sum of squares. The formula for SSE is the same
as in Chapter 9. The only difference is that the predicted value ŷ results from using
several explanatory variables instead of just a single predictor.
The parameter estimates in the prediction equation satisfy the least squares
criterion: The prediction equation has the smallest SSE value of all possible equations
of the form ŷ = a + b1x1 + · · · + bkxk.
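The least squares criterion can be seen in a small sketch with hypothetical data; `numpy.linalg.lstsq` performs the same minimization that regression software carries out:

```python
import numpy as np

# Hypothetical data with k = 2 predictors (illustration only):
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.0, 4.0, 8.0, 7.0, 11.0])

# Least squares estimates a, b1, b2 minimize SSE = sum((y - yhat)**2):
X = np.column_stack([np.ones(len(y)), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

yhat = a + b1 * x1 + b2 * x2    # predicted values
residuals = y - yhat            # prediction errors
sse = np.sum(residuals ** 2)    # residual sum of squares

# Perturbing the fitted coefficients can only increase SSE:
sse_perturbed = np.sum((y - (yhat + 0.1 * x1)) ** 2)
print(sse <= sse_perturbed)  # True
```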
11.2 Example with Multiple Regression Computer Output
We illustrate the methods of this chapter with the data introduced in the following
example:
Example 11.2 Multiple Regression for Mental Health Study
A study in Alachua County, Florida, investigated the relationship between certain
mental health indices and several explanatory variables. Primary interest focused on
an index of mental impairment, which incorporates various dimensions of psychiatric
symptoms, including aspects of anxiety and depression. This measure, which is the
response variable Y , ranged from 17 to 41 in the sample. Higher scores indicate
greater psychiatric impairment.
The two explanatory variables used here are X1 = life events score and X2 =
socioeconomic status (SES). The life events score is a composite measure of both
the number and severity of major life events the subject experienced within the past
three years. These events range from severe personal disruptions such as a death in the
family, a jail sentence, or an extramarital affair, to less severe events such as getting
a new job, the birth of a child, moving within the same city, or having a child marry.
This measure1 ranged from 3 to 97 in the sample. A high score represents a greater
number and/or greater severity of these life events. The SES score is a composite
index based on occupation, income, and education. Measured on a standard scale, it
ranged from 0 to 100. The higher the score, the higher the status.
Table 11.1 shows data on the three variables for a random sample of 40 adults in
the county. [These data are based on a larger survey. The authors thank Dr. Charles
Holzer for permission to use the study as the basis of this example.] Table 11.2
summarizes the sample means and standard deviations of the three variables.
Table 11.1: Scores on Y = Mental Impairment, X1 = Life Events, and X2 = Socioeconomic Status
Y X1 X2    Y X1 X2    Y X1 X2
17 46 84   26 50 40   30 44 53
19 39 97   26 48 52   31 35 38
20 27 24   26 45 61   31 95 29
20  3 85   27 21 45   31 63 53
20 10 15   27 55 88   31 42  7
21 44 55   27 45 56   32 38 32
21 37 78   27 60 70   33 45 55
22 35 91   28 97 89   34 70 58
22 78 60   28 37 50   34 57 16
23 32 74   28 30 90   34 40 29
24 33 67   28 13 56   41 49  3
24 18 39   28 40 56   41 89 75
25 81 87   29  5 40
26 22 95   30 59 72
Table 11.2: Estimated Means and Standard Deviations of Mental Impairment, Life Events, and Socioeconomic Status (SES)
Variable Mean Standard Deviation
Mental Impairment   27.30   5.46
Life Events         44.42   22.62
SES                 56.60   25.28
1. Developed by E. Paykel et al., Arch. Gen. Psychiatry, vol. 25, 1971, pp. 340–347.
Scatterplot Matrix for Bivariate Relationships
Plots of the data provide an informal check of whether the relationships are linear.
Most software can construct scatterplots on a single diagram for each pair of the
variables. Figure 11.5 shows the plots for the variables from Table 11.1. This type of
plot is called a scatterplot matrix. Like a correlation matrix, it shows each pair of
variables twice. In one plot, a variable is on the y-axis and in one it is on the x-axis.
Mental impairment (the response variable) is on the y-axis for the plots in the first
row of Figure 11.5, so these are the plots of interest to us. The plots show no evidence
of nonlinearity, and models with linear effects seem appropriate. The plots suggest
that life events has a mild positive effect and SES has a mild negative effect on mental
impairment.
Partial Plots for Partial Relationships
The multiple regression model states that each predictor has a linear effect with
common slope, controlling for the other predictors. To check this, we could use software to
plot Y versus each predictor, for subsets of points that are nearly constant on the other
predictors. With a single control variable, for example, we could sort the observations
into four groups using the quartiles as boundaries, and then either construct four
separate scatterplots or mark the observations on a single scatterplot according to their
group. With several control variables, however, keeping them all nearly constant can
reduce the sample to relatively few observations. A more informative single picture is
provided by the partial regression plot. It displays the relationship between the
response variable and an explanatory variable after removing the effects of the other
predictors in the multiple regression model. It does this by plotting the residuals from
models using these two variables as responses and the other explanatory variables as
predictors.
For example, here’s how to find the partial regression plot for the effect of x1
when the multiple regression model also has explanatory variables x2 and x3. Find
the residuals from the model using x2 and x3 to predict Y . Also find the residuals
from the model using x2 and x3 to predict x1. Then plot the residuals from the
first analysis (on the y-axis) against the residuals from the second analysis. For these
residuals, the effects of x2 and x3 are removed. The least squares slope for the points
in this plot is necessarily the same as the estimated partial slope b1 for the multiple
regression model.
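A sketch of this construction on synthetic data (not the text's data) confirms that the residual-on-residual slope reproduces b1 from the full fit; this equality is exact for least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

def residuals(target, predictors):
    """Residuals from a least squares fit of target on the predictors (with intercept)."""
    Z = np.column_stack([np.ones(len(target))] + predictors)
    coef = np.linalg.lstsq(Z, target, rcond=None)[0]
    return target - Z @ coef

e_y  = residuals(y,  [x2, x3])   # y with the effects of x2, x3 removed
e_x1 = residuals(x1, [x2, x3])   # x1 with the effects of x2, x3 removed

# Least squares slope of e_y on e_x1 (both residual sets have mean 0):
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)

# Partial slope b1 from the full three-predictor fit:
X_full = np.column_stack([np.ones(n), x1, x2, x3])
b1_full = np.linalg.lstsq(X_full, y, rcond=None)[0][1]

print(abs(slope - b1_full) < 1e-8)  # True: the two slopes agree
```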
Figure 11.6 shows a partial regression plot (from SPSS) for Y = mental impairment
and x1 = life events, controlling for x2 = SES. It plots the residuals on the y-axis
from the model E(Y ) = α + βx2 against the residuals on the x-axis from the model
E(X1) = α + βx2. Both axes have negative and positive values, because they refer
to residuals. Recall that residuals (prediction errors) can be positive or negative,
and have a mean of 0. Figure 11.6 suggests that the partial effect of life events is
approximately linear and is positive.
Figure 11.7 shows the partial regression plot for SES. It shows that its partial effect
is also approximately linear but is negative. It is simple to obtain partial regression
plots with standard software such as SPSS. (See the appendix.)
Sample Computer Printouts
Tables 11.3 and 11.4 are SPSS printouts of the coefficients table for the bivariate
relationships between mental impairment and the separate explanatory variables. The
estimated regression coefficients fall in the column labelled ‘B’. The prediction
equations are

ŷ = 23.31 + 0.090x1 and ŷ = 32.17 − 0.086x2.

In the sample, mental impairment is positively related to life events, since the
coefficient of x1 (0.090) is positive. The greater the number and severity of life events
in the previous three years, the higher the mental impairment (i.e., the poorer the
mental health) tends to be. Mental impairment is negatively related to socioeconomic
status: the greater the SES level, the lower the mental impairment tends to be. The
correlations between mental impairment and the explanatory variables are modest, 0.372
for life events and −0.399 for SES (listed by SPSS as ‘Standardized Coefficients’; the
‘Beta’ label is misleading and refers to the alternate term beta weights for standardized
regression coefficients).
Table 11.3: Bivariate Regression Analysis for Y = Mental Impairment (IMPAIR) and x1 = Life Events (LIFE)
Coefficients(a)
Model Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
1 (Constant) 23.309 1.807 12.901 .000
LIFE .090 .036 .372 2.472 .018
a Dependent Variable: IMPAIR
Table 11.5 shows part of a SPSS printout for the multiple regression model E(Y ) =
α + β1x1 + β2x2. The prediction equation is

ŷ = a + b1x1 + b2x2 = 28.230 + 0.103x1 − 0.097x2.
Controlling for SES, the sample relationship between mental impairment and life
events is positive, since the coefficient of life events (b1 = 0.103) is positive. The
estimated mean of mental impairment increases by about 0.1 for every 1-unit increase
in the life events score, controlling for SES. Since b2 = −0.097, a negative association
exists between mental impairment and SES, controlling for life events. For example,
Table 11.4: Bivariate Regression Analysis for Y = Mental Impairment and x2
= Socioeconomic Status (SES)
Coefficients(a)
Model Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
1 (Constant) 32.172 1.988 16.186 .000
SES -.086 .032 -.399 -2.679 .011
a Dependent Variable: IMPAIR
over the 100-unit range of potential SES values (from a minimum of 0 to a maximum
of 100), the estimated mean mental impairment changes by 100(−0.097) = −9.7.
Since mental impairment ranges only from 17 to 41 with a standard deviation of 5.5,
a decrease of 9.7 points in the mean is noteworthy.
Table 11.5: Fit of Multiple Regression Model for Y = Mental Impairment, x1 = Life Events (LIFE), and x2 = Socioeconomic Status (SES)
Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
(Constant) 28.230 2.174 12.984 .000
LIFE .103 .032 .428 3.177 .003
SES -.097 .029 -.451 -3.351 .002
Dependent Variable: IMPAIR
From Table 11.1, the first subject in the sample had y = 17, x1 = 46, and x2 = 84.
This subject's predicted mental impairment is

ŷ = 28.230 + 0.103(46) − 0.097(84) = 24.8.

The prediction error (residual) is y − ŷ = 17 − 24.8 = −7.8.
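The arithmetic for this subject can be checked directly with the estimates from Table 11.5:

```python
a, b1, b2 = 28.230, 0.103, -0.097      # estimates from Table 11.5

def predict_impairment(x1, x2):
    """Predicted mental impairment for life events score x1 and SES x2."""
    return a + b1 * x1 + b2 * x2

yhat = predict_impairment(46, 84)       # first subject in Table 11.1
residual = 17 - yhat                    # observed y = 17
print(round(yhat, 1), round(residual, 1))  # 24.8 -7.8
```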
Table 11.6 summarizes some results of the regression analyses. It shows standard
errors in parentheses below the parameter estimates. The partial slopes for the
multiple regression model are similar to the slopes for the bivariate models. In each case,
the introduction of the second predictor does little to alter the effect of the other
one. This suggests that these predictors may have nearly independent sample effects
on Y . In fact, the sample correlation between X1 and X2 is only 0.123. The next
section shows how to measure the joint association of the explanatory variables with
the response variable, and shows how to interpret the R2 value listed for the multiple
regression model.
Table 11.6: Summary of Regression Models for Mental Impairment
                 Predictors in Regression Model
Effect           Multiple    Life Events    SES
Intercept        28.230      23.309         32.172
Life events       0.103       0.090          —
                 (0.032)     (0.036)
SES              -0.097       —             -0.086
                 (0.029)                    (0.032)
R2                0.339       0.138          0.159
(n)              (40)        (40)           (40)
11.3 Multiple Correlation and R2
The correlation r and its square describe strength of linear association for bivariate
relationships. This section presents analogous measures for the multiple regression
model. They describe the strength of association between Y and the set of explanatory
variables acting together as predictors in the model.
The Multiple Correlation
The explanatory variables collectively are strongly associated with Y if the observed
y-values correlate highly with the ŷ-values from the prediction equation. The correlation
between the observed and predicted values summarizes this association.
Multiple Correlation
The multiple correlation for a regression model is the correlation between the observed y-values and the predicted ŷ-values.
For each subject, the prediction equation provides a predicted value ŷ. So, each
subject has a y-value and a ŷ-value. For example, above we saw that the first subject
in the sample had y = 17 and ŷ = 24.8. For the first three subjects in Table 11.1, the
observed and predicted y-values are:
y     ŷ
17    24.8
19    22.8
20    28.7

The sample correlation computed between the y- and ŷ-values is the multiple correlation. It is denoted by R.
The predicted values cannot correlate negatively with the observed values. The
predictions must be at least as good as the sample mean ȳ, which is the prediction
when all the partial slopes equal 0, and ȳ has zero correlation with y. So, R always falls
between 0 and 1. In this respect, the correlation between y and ŷ differs from the
correlation between y and a predictor x, which falls between −1 and +1. The larger
the multiple correlation R, the better the predictions of y by the set of explanatory
variables.
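The definition and the 0-to-1 range are easy to verify on synthetic data (a sketch; the data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 5.0 + 1.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Fit the two-predictor model and form the predicted values:
X = np.column_stack([np.ones(n), x1, x2])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

# The multiple correlation is the ordinary correlation between y and yhat:
R = np.corrcoef(y, yhat)[0, 1]
print(0.0 <= R <= 1.0)  # True: R cannot be negative
```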
R2: The Coefficient of Multiple Determination
Another measure uses the proportional reduction in error concept, generalizing r2 for
bivariate models. This measure summarizes the relative improvement in predictions
using the prediction equation instead of ȳ. It has the following elements:

Rule 1 (Predict Y without using x1, . . . , xk): The best predictor is then the sample
mean, ȳ.

Rule 2 (Predict Y using x1, . . . , xk): The best predictor is the prediction equation
ŷ = a + b1x1 + b2x2 + · · · + bkxk.
Prediction Errors: The prediction error for a subject is the difference between
the observed and predicted values of y. With rule 1, the error is y − ȳ. With rule 2,
it is the residual y − ŷ. In either case, we summarize the error by the sum of the
squared prediction errors. For rule 1, this is TSS = ∑(y − ȳ)2, called the total sum
of squares. For rule 2, it is SSE = ∑(y − ŷ)2, the sum of squared errors using the
prediction equation, called the residual sum of squares.
Definition of Measure: The proportional reduction in error from using the prediction
equation ŷ = a + b1x1 + · · · + bkxk instead of ȳ to predict y is called the
coefficient of multiple determination, or for simplicity, R-squared.
R-squared: The Coefficient of Multiple Determination
R2 = (TSS − SSE)/TSS = [∑(y − ȳ)2 − ∑(y − ŷ)2] / ∑(y − ȳ)2
R2 measures the proportion of the total variation in y that is explained by the
predictive power of all the explanatory variables, through the multiple regression
model. The symbol reflects that it is the square of the multiple correlation. The
uppercase notation R2 distinguishes this measure from r2 for the bivariate model.
Their formulas are identical, and r2 is the special case of R2 applied to a regression
model with one explanatory variable. For the multiple regression model to be useful
for prediction, it should provide improved predictions relative not only to ȳ but also
to the separate bivariate models for Y and each explanatory variable.
Example 11.3 Multiple Correlation and R2 for Mental Impairment
For the data on Y = mental impairment, X1 = life events, and X2 = socioeconomic
status, introduced in Example 11.2, the prediction equation is ŷ =
28.23 + 0.103x1 − 0.097x2. Table 11.5 showed some output for this model. Software
also reports an ANOVA table with the sums of squares and a model summary with
R and R2. Table 11.7 shows some SPSS output.
Table 11.7: ANOVA Table and Model Summary for Regression of Mental Impairment (IMPAIR) on Life Events (LIFE) and Socioeconomic Status (SES)
ANOVA
Sum of
Squares df Mean Square F Sig.
Regression 394.238 2 197.119 9.495 .000
Residual 768.162 37 20.761
Total 1162.400 39
Model Summary
R R Square Adjusted R Square Std. Error of the Estimate
.582 .339 .303 4.556
Predictors: (Constant), SES, LIFE
Dependent Variable: IMPAIR
From the ‘Sum of Squares’ column, the total sum of squares is TSS = ∑(y − ȳ)2 =
1162.4, and the residual sum of squares from using the prediction equation to predict
y is SSE = ∑(y − ŷ)2 = 768.2. Thus,

R2 = (TSS − SSE)/TSS = (1162.4 − 768.2)/1162.4 = 0.339.
Using life events and SES together to predict mental impairment provides a 33.9%
reduction in the prediction error relative to using only ȳ. The multiple regression
model provides a substantially larger reduction in error than either bivariate model
(Table 11.6 reported r2 values of 0.138 and 0.159 for them). It is more useful than
those models for predictive purposes.
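The R2 computation from the ANOVA sums of squares is simple arithmetic:

```python
tss = 1162.4   # total sum of squares (ANOVA table, 'Total' row)
sse = 768.2    # residual sum of squares (ANOVA table, 'Residual' row)

r_squared = (tss - sse) / tss
print(round(r_squared, 3))  # 0.339

# The multiple regression model beats both bivariate models,
# whose r2 values were 0.138 (life events) and 0.159 (SES):
print(r_squared > max(0.138, 0.159))  # True
```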
The multiple correlation between mental impairment and the two explanatory
variables is R = +√0.339 = 0.582. This equals the correlation between the observed
y-values and the predicted ŷ-values for the model.
SPSS reports R and R2 in a separate ‘Model Summary’ table, as Table 11.7 shows.
Most software also reports an adjusted version of R2 that is a less biased estimate
of the population value. Exercise 63 defines this measure, and Table 11.7 reports its
value of 0.303.
Properties of R and R2
The properties of R2 are similar to those of r2 for bivariate models.
• R2 falls between 0 and 1.
• The larger the value of R2, the better the set of explanatory variables (x1, . . . , xk)
collectively predict y.
• R2 = 1 only when all the residuals are 0, that is, when all ŷ = y, so that
SSE = 0. In that case, the prediction equation passes through all the data
points.
• R2 = 0 when the predictions do not vary as any of the x-values vary. In that
case, b1 = b2 = · · · = bk = 0, and ŷ is identical to ȳ, since the explanatory
variables do not add any predictive power. When this happens, the correlation
between y and each explanatory variable equals 0.
• R2 cannot decrease when we add an explanatory variable to the model. It is
impossible to explain less variation in y by adding explanatory variables to a
regression model.
• R2 for the multiple regression model is at least as large as the r2-values for the
separate bivariate models. That is, R2 is at least as large as the r2 value for Y
as a linear function of x1 alone, the r2 value for Y as a linear function of x2
alone, and so forth.
Properties of the multiple correlation R follow directly from the ones for R2, since
R is the positive square root of R2. For instance, the multiple correlation for the
model E(Y ) = α + β1x1 + β2x2 + β3x3 is at least as large as the multiple correlation
for the model E(Y ) = α + β1x1 + β2x2.
The numerator of R2, TSS − SSE, summarizes the variation in Y explained by
the multiple regression model. This difference, which equals ∑(ŷ − ȳ)2, is called the
regression sum of squares. The ANOVA table in Table 11.7 lists the regression
sum of squares as 394.2. (Some software, such as SAS, labels this the ‘Model’ sum
of squares.) The total sum of squares TSS of the y-values about y partitions into
the variation explained by the regression model (regression sum of squares) plus the
variation not explained by the model (the residual sum of squares, SSE).
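The partition can be checked against the values in Table 11.7:

```python
regression_ss = 394.238   # 'Regression' row of the ANOVA table
residual_ss   = 768.162   # 'Residual' row (SSE)
total_ss      = 1162.400  # 'Total' row (TSS)

# TSS partitions exactly into explained plus unexplained variation:
print(abs((regression_ss + residual_ss) - total_ss) < 1e-9)  # True

# R2 is the explained share of the total variation:
print(round(regression_ss / total_ss, 3))  # 0.339
```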
Multicollinearity with Many Explanatory Variables
When there are many explanatory variables but the correlations among them are
strong, once you have included a few of them in the model, R2 usually doesn’t increase
much more when you add additional ones. For example, for the “house selling price”
data set at the text website (introduced in Example 9.10), r2 is 0.71 with the house’s
tax assessment as a predictor of selling price. Then, R2 increases to 0.77 when we
add house size as a second predictor. But then it increases only to 0.79 when we
add number of bathrooms, number of bedrooms, and whether the house is new as
additional predictors.
When R2 does not increase much, this does not mean that the additional variables
are uncorrelated with Y . It means merely that they don’t add much new power for
predicting Y , given the values of the predictors already in the model. These other
variables may have small associations with Y , given the variables already in the model.
This often happens in social science research when the explanatory variables are highly
correlated, no one having much unique explanatory power. Section 14.3 discusses this
condition, called multicollinearity.
Figure 11.8, which portrays the portion of the total variability in Y explained by
each of three predictors, shows a common occurrence. The size of the set for a predictor
in this figure represents the size of its r2-value in predicting Y . The amount a set for
a predictor overlaps with the set for another predictor represents its association with
that predictor. The part of the set for a predictor that does not overlap with other
sets represents the part of the variability in Y explained uniquely by that predictor.
In Figure 11.8, all three predictors have moderate associations with Y , and together
they explain considerable variation. Once x1 and x2 are in the model, however, x3
explains little additional variation in Y , because of its strong correlations with x1 and
x2. Because of this overlap, R2 increases only slightly when x3 is added to a model
already containing x1 and x2.
For predictive purposes, we gain little by adding explanatory variables to a model
that are strongly correlated with ones already in the model, since R2 will not increase
much. Ideally, we should use explanatory variables having weak correlations with
each other but strong correlations with Y . In practice, this is not always possible,
especially if we want to include certain variables in the model for theoretical reasons.
In practice, the sample size you need to do a multiple regression well gets larger
when you want to use more explanatory variables. Technical difficulties caused by
multicollinearity are less severe for larger sample sizes. Ideally, the sample size should
be at least about 10 times the number of explanatory variables (for example, at least
about 40 for 4 explanatory variables).
11.4 Inferences for Multiple Regression Coefficients
The multiple regression function
E(Y ) = α + β1x1 + · · · + βkxk
describes the relationship between the explanatory variables and the mean of the
response variable. For particular values of the explanatory variables, α + β1x1 + · · · + βkxk represents the mean of Y for the population having those values.
To make inferences about the parameters, we formulate the entire multiple regres-
sion model. This consists of this equation together with a set of assumptions:
• The population distribution of Y is normal, for each combination of values of
x1, . . . , xk.
• The standard deviation, σ, of the conditional distribution of responses on Y is
the same at each combination of values of x1, . . . , xk.
• The sample is randomly selected.
Under these assumptions, the true sampling distributions exactly equal those
quoted in this section. In practice, the assumptions are never satisfied perfectly.
Two-sided inferences are robust to the normality and common σ assumptions. More
important are the assumptions of randomization and that the regression function de-
scribes well how the mean of Y depends on the explanatory variables. We’ll see ways
to check the latter assumption in Section 14.2.
Two types of significance tests are used in multiple regression. The first is a global
test of independence. It checks whether any of the explanatory variables are statisti-
cally related to Y . The second studies the partial regression coefficients individually,
to assess which explanatory variables have significant partial effects on Y .
Testing the Collective Influence of the Explanatory Variables
Do the explanatory variables collectively have a statistically significant effect on the
response variable? We check this by testing
H0 : β1 = β2 = · · · = βk = 0.
This states that the mean of Y does not depend on the values of x1, . . . , xk. Under
the inference assumptions, this states that Y is statistically independent of all k
explanatory variables.
The alternative hypothesis is
Ha : At least one βi ≠ 0.
This states that at least one explanatory variable is related to Y , controlling for the
others. The test judges whether using x1, . . . , xk together to predict Y , with the prediction equation ŷ = a + b1x1 + · · · + bkxk, is better than using ȳ alone.
These hypotheses about {βi} are equivalent to
H0 : Population multiple correlation = 0 Ha : Population multiple correlation > 0.
The equivalence occurs because the multiple correlation equals 0 only in those situa-
tions in which all the partial regression coefficients equal 0. Also, H0 is equivalent to
H0: population R-squared = 0.
For these hypotheses about the k predictors, the test statistic equals

F = (R²/k) / [(1 − R²)/(n − (k + 1))].

The sampling distribution of this statistic is called the F distribution. We next study this distribution and its properties.
The F Distribution
The symbol for the F test statistic and its distribution honors the most eminent
statistician in history, R. A. Fisher, who discovered the F distribution in 1922. Like
the chi-squared distribution, the F distribution can take only nonnegative values and
it is somewhat skewed to the right. Figure 11.9 illustrates.
The shape of the F distribution is determined by two degrees of freedom terms,
denoted by df1 and df2:
df1 = k, the number of explanatory variables in the model.
df2 = n − (k + 1) = n − number of parameters in regression equation.
The first of these, df1 = k, is the divisor of the numerator term (R2) in the F test
statistic. The second, df2 = n−(k+1), is the divisor of the denominator term (1−R2).
The number of parameters in the multiple regression model is k + 1, representing the
k beta terms and the alpha term.
The mean of the F distribution is approximately equal to 1.² The larger the R²
value, the larger the ratio R2/(1 − R2), and the larger the F test statistic becomes.
Thus, larger values of the F test statistic provide stronger evidence against H0. Under
the presumption that H0 is true, the P -value is the probability the F test statistic
is larger than the observed F value. This is the right-tail probability under the F
distribution beyond the observed F -value, as Figure 11.9 shows.
Table D at the end of the text lists the F scores having P -values of 0.05, 0.01,
and 0.001, for various combinations of df1 and df2. This table allows us to determine
whether P > 0.05, 0.01 < P < 0.05, 0.001 < P < 0.01, or P < 0.001. Software for
regression reports the actual P -value.
Example 11.4 F Test for Mental Health Impairment Data
In Example 11.2, we used multiple regression for n = 40 observations on Y =
mental impairment, with k = 2 explanatory variables, life events and SES. The null
hypothesis that mental impairment is statistically independent of life events and SES
is H0: β1 = β2 = 0.
²It equals df2/(df2 − 2), which is usually close to 1 unless n is quite small.
In Example 11.3 we found that this model has R2 = 0.339. The F test statistic
value is
F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (0.339/2) / (0.661/[40 − (2 + 1)]) = 9.5.
The two degrees of freedom terms for the F distribution are df1 = k = 2 and df2 =
n − (k + 1) = 40 − 3 = 37, the two divisors in this statistic.
From Table D, when df1 = 2 and df2 = 37, the F -value with right-tail probability
of 0.001 falls between 8.77 and 8.25. Since the observed F test statistic of 9.5 falls
above these two, it is farther out in the tail and has smaller tail probability than 0.001.
Thus, the P -value is P < 0.001. Part of the SPSS printout in Table 11.7 showed the
ANOVA table
Sum of Squares df Mean Square F Sig.
Regression 394.238 2 197.119 9.495 .000
Residual 768.162 37 20.761
in which we see the F statistic. The P -value, which rounded to three decimal places
is P = 0.000, appears under the heading ‘Sig’ in the ANOVA table.
This extremely small P -value provides strong evidence against H0. It suggests
that at least one of the explanatory variables is related to mental impairment. Equiv-
alently, we can conclude that the population multiple correlation and R-squared are
positive. So, we obtain significantly better predictions of Y using the multiple regression equation than by using the sample mean ȳ.
□
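The F computation in this example is easy to reproduce numerically. The sketch below (Python with scipy — our choice here, since the text itself uses SPSS and SAS) computes the F statistic from R² and obtains the exact P-value as a right-tail probability:

```python
from scipy.stats import f

def global_f_test(r_squared, k, n):
    """F statistic and P-value for H0: beta_1 = ... = beta_k = 0."""
    df1, df2 = k, n - (k + 1)
    F = (r_squared / df1) / ((1 - r_squared) / df2)
    p_value = f.sf(F, df1, df2)  # right-tail probability beyond observed F
    return F, p_value

# Mental impairment example: R^2 = 0.339 with k = 2 predictors, n = 40
F, p = global_f_test(0.339, 2, 40)
print(round(F, 1), p < 0.001)  # 9.5 True
```

Regression software performs this same calculation, which is why printouts can report an exact P-value rather than the bracketed values available from Table D.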
Normally, unless the sample size is small and the associations are weak, this F
test has a small P -value. If we choose variables wisely for a study, at least one of
them should have some explanatory power.
Inferences for Individual Regression Coefficients
Suppose the P -value is small for the F test that all the regression coefficients equal 0.
This does not imply that every explanatory variable has an effect on Y (controlling
for the other explanatory variables in the model), but merely that at least one of them
has an effect. More narrowly focused analyses judge which partial effects are nonzero
and estimate the sizes of those effects. These inferences make the same assumptions as
the F test, the most important being randomization and that the regression function
describes well how the mean of Y depends on the explanatory variables.
Consider an arbitrary explanatory variable xi, with coefficient βi in the multiple
regression model. The test for its partial effect on Y has H0: βi = 0. If βi = 0,
the mean of Y is identical for all values of xi, controlling for the other explanatory
variables in the model. The alternative can be two-sided, Ha: βi ≠ 0, or one-sided,
Ha: βi > 0 or Ha: βi < 0, to predict the direction of the partial effect.
The test statistic for H0: βi = 0, using sample estimate bi of βi, is
t = bi/se,
where se is the standard error of bi. As usual, the t test statistic takes the best
estimate (bi) of the parameter (βi), subtracts the H0 value of the parameter (0), and
divides by the standard error. The formula for se is complex, but software provides
its value. If H0 is true and the model assumptions hold, the t statistic has the t
distribution with df = n − (k + 1). The df value is the same as df2 in the F test.
It is more informative to estimate the size of a partial effect than to test whether it
is zero. Recall that βi represents the change in the mean of Y for a one-unit increase
in xi, controlling for the other variables. A confidence interval for βi is
bi ± t(se).
The t score comes from the t table, with df = n − (k + 1). For example, a 95% confidence interval for the partial effect of x1 is b1 ± t.025(se).
Example 11.5 Inferences for Separate Predictors of Mental Impairment
For the multiple regression model for Y = mental impairment, X1 = life events,
and X2 = SES,
E(Y ) = α + β1x1 + β2x2
let’s consider the effect of life events. The hypothesis that mental impairment is
statistically independent of life events, controlling for SES, is H0: β1 = 0. If H0 is
true, the multiple regression equation reduces to E(Y ) = α + β2x2. If H0 is false,
then β1 ≠ 0 and the full model provides a better fit than the bivariate model.
Table 11.5 contained the results,
B Std. Error t Sig.
(Constant) 28.230 2.174 12.984 .000
LIFE .103 .032 3.177 .003
SES -.097 .029 -3.351 .002
This tells us that the point estimate of β1 is b1 = 0.103 and has standard error
se = 0.032. The test statistic equals
t = b1/se = 0.103/0.032 = 3.2.
This appears under the heading ‘t’ in the table in the row for the variable LIFE. The
statistic has df = n−(k+1) = 40−3 = 37. The P -value appears under ‘Sig’ in the row
for LIFE. It is 0.003, the probability that the t statistic exceeds 3.2 in absolute value.
There is strong evidence that mental impairment is related to life events, controlling
for SES.
A 95% confidence interval for β1 uses t0.025 = 2.026, the t-value for df = 37 having
a probability of 0.05/2 = 0.025 in each tail. This interval equals
b1 ± t0.025(se) = 0.103 ± 2.026(0.032), which is (0.04, 0.17).
Controlling for SES, we are 95% confident that the change in mean mental impairment
per one-unit increase in life events falls between 0.04 and 0.17. The interval does not
contain 0. This is in agreement with rejecting H0: β1 = 0 in favor of Ha: β1 ≠ 0 at
the α = 0.05 level.
Since this interval contains only positive numbers, the relationship between men-
tal impairment and life events is positive, controlling for SES. It may be simpler to
interpret the interval (0.04, 0.17) by noting that an increase of 100 units in life events
corresponds to anywhere from a 100(0.04) = 4 to a 100(0.17) = 17 unit increase in
mean mental impairment. The interval is relatively wide, because of the small sample
size.
□
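The arithmetic in this example is simple enough to check directly. A minimal Python sketch, taking the critical value t.025 = 2.026 for df = 37 from the t table as the text does:

```python
b1, se = 0.103, 0.032     # estimate and standard error for LIFE (Table 11.5)
t_stat = b1 / se          # t statistic for H0: beta_1 = 0
t_crit = 2.026            # t_.025 for df = 37, from the t table
lower = b1 - t_crit * se
upper = b1 + t_crit * se

print(round(t_stat, 2))                  # 3.22
print(round(lower, 2), round(upper, 2))  # 0.04 0.17
```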
How is the t test for a partial regression coefficient different from the t test of
H0: β = 0 for the bivariate model, E(Y ) = α + βx, studied in Section 9.5? That t
test evaluates whether Y and X are associated, ignoring other variables, because it
applies to the bivariate model. By contrast, the test just presented evaluates whether
variables are associated, controlling for other variables.
A note of caution: Suppose there is multicollinearity, that is, a lot of overlap
among the explanatory variables in the sense that any one is well predicted by the
others. Then, possibly none of the individual partial effects has a small P -value, even
if R2 is large and a large F statistic occurs in the global test for the βs. Any particular
variable may explain uniquely little of the variation in Y , even though together the
variables explain a lot of the variation.
Variability and Mean Squares in the ANOVA Table∗
The precision of the least squares estimates relates to the size of the conditional
standard deviation σ that measures variability of y at fixed values of the predictors.
The smaller the variability of y-values about the regression equation, the smaller the
standard errors become. The estimate of σ is
s = √[Σ(y − ŷ)² / (n − (k + 1))] = √(SSE/df).
The degrees of freedom value is also df for t inferences for regression coefficients, and
it is df2 for the F test about the collective effect of the predictors. (When a model has
only k = 1 predictor, df simplifies to n − 2, the term in the s formula of Section 9.3.)
Part of the SPSS printout in Table 11.7 showed the ANOVA table
Sum of Squares df Mean Square F Sig.
Regression 394.238 2 197.119 9.495 .000
Residual 768.162 37 20.761
containing the sums of squares for the multiple regression model with the mental
impairment data. We see that SSE = 768.2. Since n = 40 for k = 2 predictors, we
have df = n − (k + 1) = 40 − 3 = 37 and
s = √(SSE/df) = √(768.2/37) = √20.76 = 4.56.
If the conditional distributions are approximately bell-shaped, nearly all mental im-
pairment scores fall within about 14 units (3 standard deviations) of the mean specified
by the regression function.
SPSS reports the conditional standard deviation under the heading ‘Std. Error of
the Estimate' in the Model Summary table that also has the R and R² values (see
Table 11.7). This is a poor choice of label by SPSS, because s refers to the variability
in Y -values, not the variability of a sampling distribution of an estimator.
The square of s, which estimates the conditional variance, is called the mean
square error , often abbreviated by MSE. Software shows it in the ANOVA table in
the ‘Mean Square’ column, in the row labeled ‘Residual’ (or ‘Error’ in some software).
For example, MSE = 20.76 in the above table. Some software (such as SAS) better
labels the conditional standard deviation estimate s as ‘Root MSE,’ because it is the
square root of the mean square error.
The F Statistic Is a Ratio of Mean Squares∗
An alternative formula for the F test statistic for testing H0 : β1 = · · · = βk = 0 uses
the two mean squares in the ANOVA table. Specifically,
F = Regression mean square / Residual mean square (MSE) = 197.1/20.8 = 9.5.
This gives the same value as the F test statistic formula based on R2.
The regression mean square equals the regression sum of squares divided by its
degrees of freedom. The df equals k, the number of explanatory variables in the
model, which is df1 for the F test. On the printout shown above, the regression mean
square equals Regression SS/df1 = 394.2/2 = 197.1.
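All of these ANOVA-table quantities are linked by simple arithmetic. A quick Python check using the sums of squares from the printout:

```python
import math

# ANOVA table for the mental impairment model (Table 11.7)
ss_regression, df1 = 394.238, 2
sse, df2 = 768.162, 37

ms_regression = ss_regression / df1  # regression mean square
mse = sse / df2                      # mean square error, estimates sigma^2
s = math.sqrt(mse)                   # estimate of conditional std. dev. sigma
F = ms_regression / mse              # F statistic as a ratio of mean squares

print(round(ms_regression, 1), round(mse, 2), round(s, 2), round(F, 1))
# 197.1 20.76 4.56 9.5
```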
Relationship Between F and t Statistics∗
We’ve seen that the F distribution is used to test that all partial regression coefficients
equal 0. Some regression software also lists F test statistics instead of t test statistics
for the tests about the individual regression coefficients. The two statistics are related
and have the same P -values. The square of the t statistic for testing that a partial
regression coefficient equals 0 is an F test statistic having the F distribution with
df1 = 1 and df2 = n − (k + 1).
To illustrate, in Example 11.5, for H0: β1 = 0 and Ha: β1 ≠ 0, the test statistic was t = 3.18 with df = 37. Alternatively, we could use F = t² = 3.18² = 10.1, which has the F distribution with df1 = 1 and df2 = 37. The P-value for this F value is 0.003, the same as Table 11.5 reports for the two-sided t test.
In general, if a statistic has the t distribution with d degrees of freedom, then
the square of that statistic has the F distribution with df1 = 1 and df2 = d. A
disadvantage of the F approach is that it lacks information about the direction of the
association. It cannot be used for one-sided alternative hypotheses.
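The equivalence is easy to verify numerically. This sketch (Python with scipy, an assumption on our part; the text's own calculations use regression software and tables) compares the two-sided t P-value with the right-tail F P-value:

```python
from scipy.stats import t, f

t_stat, df = 3.18, 37
F_stat = t_stat ** 2        # about 10.1, referred to F with df1 = 1, df2 = 37

p_t = 2 * t.sf(t_stat, df)  # two-sided t test P-value
p_f = f.sf(F_stat, 1, df)   # right-tail F P-value

print(round(p_t, 3), round(p_f, 3))  # the two P-values agree
```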
11.5 Interaction between Predictors in their Effects
The multiple regression equation
E(Y ) = α + β1x1 + β2x2 + · · · + βkxk
assumes that the partial relationship between Y and each xi is linear and that the
slope βi of that relationship is identical for all values of the other explanatory variables.
This implies a parallelism of lines relating the two variables, at various values of the
other variables, as Figure 11.3 illustrated.
This model is sometimes too simple to be adequate. Often, there is interaction,
with the relationship between two variables changing according to the value of a third
variable. Section 10.3 introduced this concept.
Interaction
For quantitative variables, interaction exists between two explanatory variables in their
effects on Y when the effect of one variable changes as the level of the other variable
changes.
For example, suppose the relationship between x1 and the mean of Y is E(Y ) =
2 + 5x1 when x2 = 0, it is E(Y ) = 4 + 15x1 when x2 = 50, and it is E(Y ) = 6 + 25x1
when x2 = 100. The slope for the partial effect of x1 changes markedly as the fixed
value for x2 changes. There is then interaction between x1 and x2 in their effects on
Y .
Cross-product Terms
A common approach for allowing interaction introduces cross-product terms of
the explanatory variables into the multiple regression model. With two explanatory
variables, the model is
E(Y ) = α + β1x1 + β2x2 + β3x1x2.
This is a special case of the multiple regression model with three explanatory variables,
in which x3 is an artificial variable created as the cross-product x3 = x1x2 of the two
primary explanatory variables.
Let’s see why this model permits interaction. Consider how Y is related to x1,
controlling for x2. We rewrite the equation in terms of x1 as
E(Y ) = (α + β2x2) + (β1 + β3x2)x1 = α′ + β′x1
where
α′ = α + β2x2 and β′ = β1 + β3x2.
So, for fixed x2, the mean of Y changes linearly as a function of x1. The slope of the
relationship is β′ = (β1 +β3x2). This depends on the value of x2. As x2 changes, the
slope for the effect of x1 changes. In summary, the mean of Y is a linear function of
x1, but the slope of the line depends on the value of x2.
Note that now we can interpret β1 as the effect of x1 only when x2 = 0. Unless x2 = 0 is a value of particular interest, it is not especially useful to form confidence intervals or perform significance tests about β1 in this model.
Similarly, the mean of Y is a linear function of x2, but the slope varies according
to the value of x1. The coefficient β2 of x2 refers to the effect of x2 only at x1 = 0.
Example 11.6 Interaction Model for Mental Impairment
For the data set on Y = mental impairment, X1 = life events, and X2 = SES,
we create a third explanatory variable x3 that gives the cross product of x1 and
x2 for the 40 individuals. For the first subject, for example, x1 = 46, x2 = 84, so
x3 = 46(84) = 3864. Software makes it easy to create this variable without doing the
calculations yourself. Table 11.8 shows part of the printout for the interaction model.
The prediction equation is
ŷ = 26.0 + 0.156x1 − 0.060x2 − 0.00087x1x2.
Table 11.8: Interaction Model for Y = Mental Impairment, X1 = Life Events, and X2 = SES
Sum of Mean
Squares DF Square F Sig
Regression 403.631 3 134.544 6.383 0.0014
Residual 758.769 36 21.077
Total 1162.400 39
R R Square
.589 .347
B Std. Error t Sig
(Constant) 26.036649 3.948826 6.594 0.0001
LIFE 0.155865 0.085338 1.826 0.0761
SES -0.060493 0.062675 -0.965 0.3409
LIFE*SES -0.000866 0.001297 -0.668 0.5087
Figure 11.10 portrays the relationship between predicted mental impairment and
life events for a few distinct SES values. For an SES score of x2 = 0, the relationship
between ŷ and x1 is
ŷ = 26.0 + 0.156x1 − 0.060(0) − 0.00087x1(0) = 26.0 + 0.156x1.
When x2 = 50, the prediction equation is
ŷ = 26.0 + 0.156x1 − 0.060(50) − 0.00087(50)x1 = 23.0 + 0.113x1.
When x2 = 100, the prediction equation is
ŷ = 20.0 + 0.069x1.
The higher the value of SES, the smaller the slope between predicted mental impair-
ment and life events, and so the weaker is the effect of life events. This suggests that
subjects who possess greater resources, in the form of higher SES, are better able to
withstand the mental stress of potentially traumatic life events.
□
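The lines in Figure 11.10 follow directly from the fitted coefficients: at a fixed SES value x2, predicted impairment is a linear function of life events with intercept a + b2·x2 and slope b1 + b3·x2. A Python sketch:

```python
# Coefficients of the interaction-model prediction equation (Table 11.8)
a, b1, b2, b3 = 26.0, 0.156, -0.060, -0.00087

def line_at_ses(x2):
    """Intercept and slope of predicted impairment vs. life events at SES = x2."""
    return a + b2 * x2, b1 + b3 * x2

for ses in (0, 50, 100):
    intercept, slope = line_at_ses(ses)
    print(f"SES {ses:3d}: y-hat = {intercept:.1f} + {slope:.3f} x1")
```

The declining slope as SES increases is exactly the pattern described in the example.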
Testing an Interaction Term
For two explanatory variables, the model allowing interaction is
E(Y ) = α + β1x1 + β2x2 + β3x1x2.
The simpler model assuming no interaction is the special case β3 = 0. The hypothesis
of no interaction is H0: β3 = 0. As usual, the t test statistic divides the estimate of
the parameter (β3) by its standard error.
From Table 11.8, t = −0.00087/0.0013 = −0.67. The P -value for Ha: β3 ≠ 0
is P = 0.51. Little evidence exists of interaction. The variation in the slope of the
relationship between mental impairment and life events for various SES levels could
be due to sampling variability. The sample size here is small, however, and this makes
it difficult to estimate effects precisely. Studies based on larger sample sizes (e.g.,
Holzer 1977) have shown that interaction of the type seen in this example does exist
for these variables.
In Table 11.8, neither the test of H0: β1 = 0 nor the test of H0: β2 = 0 has a small P-value. Yet, the tests of both H0: β1 = 0 and H0: β2 = 0 are highly significant for
the ‘no interaction’ model E(Y ) = α+β1x1 +β2x2; from Table 11.5, the P -values are
0.003 and 0.002. This loss of significance occurs because x3 = x1x2 is quite strongly
correlated with x1 and x2, with rX1X3 = 0.779 and rX2X3 = 0.646. These substantial
correlations are not surprising, since x3 = x1x2 is completely determined by x1 and
x2.
Since considerable overlap occurs in the variation in Y that is explained by x1
and by x1x2, and also by x2 and x1x2, the partial variability explained by each is
relatively small. For example, much of the predictive power contained in x1 is also
contained in x2 and x1x2. The unique contribution of x1 (or x2) to the model is
relatively small, and nonsignificant, when x2 (or x1) and x1x2 are in the model.
When the evidence of interaction is weak, as it is here with a P -value of 0.51, it
is best to drop the interaction term from the model before testing hypotheses about
partial effects such as H0: β1 = 0 or H0: β2 = 0. On the other hand, if the evidence
of interaction is strong, it no longer makes sense to test these other hypotheses. If
there is interaction, then the effect of each variable exists and differs according to the
level of the other variable.
Centering the Explanatory Variables∗
For the mental health data, we’ve seen that x1 and x2 are highly significant in the
model with only those predictors (see Table 11.5) but lose their significance after
entering the interaction term, even though the interaction is not significant (see Table
11.8). We also saw that the coefficients of x1 and x2 in an interaction model are not
usually meaningful, because they refer to the effect of a predictor only when the other
predictor equals 0.
Suppose we center the scores for each variable around 0, by subtracting the mean.
Letting x1ᶜ = x1 − μX1 and x2ᶜ = x2 − μX2, we then express the interaction model as

E(Y) = α + β1x1ᶜ + β2x2ᶜ + β3x1ᶜx2ᶜ
     = α + β1(x1 − μX1) + β2(x2 − μX2) + β3(x1 − μX1)(x2 − μX2).
Now, β1 refers to the effect of x1 at the mean of x2, and β2 refers to the effect of x2
at the mean of x1.
When we rerun the interaction model for the mental health data after centering
the predictors about their sample means, that is, with LIFE_CEN = LIFE − 44.425 and SES_CEN = SES − 56.60, we get
B Std. Error t Sig
(Constant) 27.359555 0.731366 37.409 0.0001
LIFE_CEN 0.106850 0.033185 3.220 0.0027
SES_CEN -0.098965 0.029390 -3.367 0.0018
LIFE_CEN*SES_CEN -0.000866 0.001297 -0.668 0.5087
The estimate for the interaction term is the same as for the model with uncentered
predictors, but now the estimates (and standard errors) for the effects of x1 and x2
alone are similar to the values for the no-interaction model. Also, their statistical
significance is similar to what it was in that model.
Centering the predictor variables before using them in a model allowing interaction
has two benefits. First, the estimates of the effects of x1 and x2 are more meaningful,
being effects at the mean rather than at 0. Second, the estimates and their standard errors are similar to those in the no-interaction model. The cross-product term with centered variables does not overlap with the other terms as it does in the ordinary model.
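The claim that centering removes the overlap between the cross-product and the primary terms can be demonstrated by simulation. The sketch below uses randomly generated predictors (not the text's data) whose means loosely mimic LIFE and SES:

```python
import math
import random

def corr(u, v):
    """Sample Pearson correlation between two lists of numbers."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) *
                           sum((b - mv) ** 2 for b in v))

random.seed(1)
x1 = [random.gauss(44, 22) for _ in range(500)]  # mimics LIFE
x2 = [random.gauss(57, 25) for _ in range(500)]  # mimics SES

m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
raw_prod = [a * b for a, b in zip(x1, x2)]                # x1 * x2
cen_prod = [(a - m1) * (b - m2) for a, b in zip(x1, x2)]  # centered product

print(round(corr(x1, raw_prod), 2))  # substantial correlation with x1
print(round(corr(x1, cen_prod), 2))  # near zero after centering
```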
Generalizations and Limitations∗
When the number of explanatory variables exceeds two, a model allowing interaction
can have cross-products for each pair of explanatory variables. For example, with
three explanatory variables, an interaction model is
E(Y ) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3.
This is a special case of multiple regression with six explanatory variables, identifying
x4 = x1x2, x5 = x1x3, and x6 = x2x3. Significance tests can judge which, if any, of
the cross-product terms are needed in the model.
When interaction exists and the model contains cross-product terms, it is more difficult to summarize the relationships simply. One approach is to sketch a collection
of lines such as those in Figure 11.10 to describe graphically how the relationship
between two variables changes according to the values of other variables. Another
possibility is to divide the data into groups according to the value on a control variable
(e.g., high on x2, medium on x2, low on x2) and report the slope between Y and x1
within each subset as a means of describing the interaction.
The interaction terms in the above model are called second-order, to distinguish
them from higher-order interaction terms with products of more than two variables
at a time. Such terms are occasionally used in more complex models, not considered
in this chapter.
11.6 Comparing Regression Models
When the number of explanatory variables increases, the multiple regression model
becomes more difficult to interpret and some variables may become redundant. This
is especially true when some explanatory variables are cross-products of others, to
allow for interaction. Not all predictors may be needed in the model. We next present
a test of whether a model fits significantly better than a simpler model containing
only some of the predictors.
Complete and Reduced Models
We refer to the full model with all the predictors as the complete model. The
model containing only some of these predictors is called the reduced model. The
reduced model is said to be nested within the complete model, being a special case of
it.
The complete and reduced models are identical if the partial regression coefficients
for the extra variables in the complete model all equal 0. In that case, none of the
extra predictors increases the explained variability in Y , in the population of interest.
Testing whether the complete model is identical to the reduced model is equivalent to
testing whether the extra parameters in the complete model equal 0. The alternative
hypothesis is that at least one of these extra parameters is not 0, in which case the
complete model is better than the reduced model.
For instance, a complete model with three explanatory variables and all the second-
order interaction terms is
E(Y ) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3.
The reduced model without the interaction terms is
E(Y ) = α + β1x1 + β2x2 + β3x3.
The test comparing the complete model to the reduced model has H0: β4 = β5 =
β6 = 0.
Comparing Models by Comparing SSE or R2 Values
The test statistic for comparing two regression models compares the residual sums
of squares for the two models. Denote SSE = Σ(y − ŷ)² for the reduced model by SSEr and for the complete model by SSEc. Now, SSEr ≥ SSEc, because the reduced
model has fewer predictors and tends to make poorer predictions. Even if H0 were
true, we would not expect the estimates of the extra parameters and the difference
(SSEr−SSEc) to equal 0. Some reduction in error occurs from fitting the extra terms
because of sampling variability.
The test statistic uses the reduction in error, SSEr − SSEc, that results from
adding the extra variables. It has df = the number of extra terms in the complete
model. An equivalent statistic uses the R² values, Rc² for the complete model and Rr² for the reduced model. The test statistic equals

F = [(SSEr − SSEc)/df1] / [SSEc/df2] = [(Rc² − Rr²)/df1] / [(1 − Rc²)/df2],
where df1 is the number of extra terms in the complete model and df2 is the usual
residual df for the complete model, which is df2 = n − (k + 1). A relatively large
reduction in error (or relatively large increase in R2) yields a large F test statistic
and small P -value. As usual for F statistics, the P -value is the right-tail probability.
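The comparison test reduces to a few lines of arithmetic once the two residual sums of squares are available. A minimal Python sketch of the SSE-based form, using the values from Tables 11.7 and 11.8:

```python
def compare_models_f(sse_reduced, sse_complete, df1, df2):
    """F statistic for H0: the extra terms in the complete model all equal 0.

    df1 = number of extra terms; df2 = n - (k + 1) for the complete model.
    """
    return ((sse_reduced - sse_complete) / df1) / (sse_complete / df2)

# Mental impairment data: no-interaction (reduced) vs. interaction (complete)
F = compare_models_f(sse_reduced=768.2, sse_complete=758.8, df1=1, df2=36)
print(round(F, 2))  # 0.45
```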
Example 11.7 Comparing Models for Mental Impairment
For the mental impairment data, a comparison of the complete model
E(Y ) = α + β1x1 + β2x2 + β3x1x2
to the reduced model
E(Y ) = α + β1x1 + β2x2
analyzes whether interaction exists. The complete model has just one additional term,
and the null hypothesis is H0: β3 = 0.
The sum of squared errors for the complete model is SSEc = 758.8 (Table 11.8),
while for the reduced model it is SSEr = 768.2 (Table 11.7). The difference
SSEr − SSEc = 768.2 − 758.8 = 9.4
has df1 = 1 since the complete model has one more parameter. Since the sample size
is n = 40, df2 = n − (k + 1) = 40 − (3 + 1) = 36, the df for SSE in Table 11.8. The F
test statistic equals
F = [(SSEr − SSEc)/df1] / [SSEc/df2] = (9.4/1) / (758.8/36) = 0.45.
Equivalently, the R² values for the two models are Rr² = 0.339 and Rc² = 0.347, so

F = [(Rc² − Rr²)/df1] / [(1 − Rc²)/df2] = [(0.347 − 0.339)/1] / [(1 − 0.347)/36] = 0.45.
From software, the P -value from the F distribution with df1 = 1 and df2 = 36 is
P = 0.51. There is little evidence that the complete model is better. The null
hypothesis seems plausible, so the reduced model is adequate.
When H0 contains a single parameter, the t test is available. In fact, from the
previous section (and Table 11.8), the t statistic equals
t = b3/se = −0.00087/0.0013 = −0.67.
It also has a P -value of 0.51 for Ha: β3 ≠ 0. We get the same result with the t test as
with the F test for complete and reduced models. In fact, the F test statistic equals
the square of the t statistic. (Refer to the final subsection in Section 11.4.)
□
The t test method is limited to testing one parameter at a time. The F test can
test several regression parameters together to analyze whether at least one of them is
nonzero, such as in the global F test of H0 : β1 = · · · = βk = 0 or the test comparing
a complete model to a reduced model. F tests are equivalent to t tests only when H0
contains a single parameter.
11.7 Partial Correlation∗
Multiple regression models describe the effect of an explanatory variable on the re-
sponse variable while controlling for other variables of interest. Related measures
describe the strength of the association. For example, to describe the association
between mental impairment and life events, controlling for SES, we could ask, “Con-
trolling for SES, what proportion of the variation in mental impairment does life
events explain?”
These measures describe the partial association between Y and a particular pre-
dictor, whereas the multiple correlation and R2 describe the association between Y
and the entire set of predictors in the model. The partial correlation is based on the
ordinary correlations between each pair of variables. For a single control variable, it
is defined as follows:
Partial Correlation

The sample partial correlation between Y and X2, controlling for X1, is

    r_{YX2·X1} = (r_{YX2} − r_{YX1} r_{X1X2}) / √[(1 − r²_{YX1})(1 − r²_{X1X2})].
In the symbol r_{YX2·X1}, the variable to the right of the dot represents the controlled
variable. The analogous formula for r_{YX1·X2} (i.e., controlling X2) is

    r_{YX1·X2} = (r_{YX1} − r_{YX2} r_{X1X2}) / √[(1 − r²_{YX2})(1 − r²_{X1X2})].
Since one variable is controlled, the partial correlations r_{YX1·X2} and r_{YX2·X1} are
called first-order partial correlations.
Example 11.8 Partial Correlation Between Education and Crime Rate
Example 11.1 discussed a data set for counties in Florida, with Y = crime rate,
X1 = education, and X2 = urbanization. The pairwise correlations are r_{YX1} = 0.468,
r_{YX2} = 0.678, and r_{X1X2} = 0.791. It was surprising to observe a positive
correlation between crime rate and education. Can it be explained by their joint
dependence on urbanization? This is plausible if the association disappears when we
control for urbanization.
The partial correlation between crime rate and education, controlling for urban-
ization, equals

    r_{YX1·X2} = (r_{YX1} − r_{YX2} r_{X1X2}) / √[(1 − r²_{YX2})(1 − r²_{X1X2})]
               = [0.468 − 0.678(0.791)] / √[(1 − 0.678²)(1 − 0.791²)] = −0.152.
Not surprisingly, r_{YX1·X2} is much smaller than r_{YX1}. It even has a different direction,
illustrating Simpson's paradox. The relationship between crime rate and education
may well be spurious, reflecting their joint dependence on urbanization.
□
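The computation in Example 11.8 is easy to script. The following Python sketch (not from the text; the helper name is ours) evaluates the first-order partial correlation formula:

```python
# Sketch (not from the text; the helper name is ours): first-order partial
# correlation computed from the three pairwise correlations.
from math import sqrt

def partial_corr(r_yx, r_yz, r_xz):
    """Correlation between Y and X, controlling for Z."""
    return (r_yx - r_yz * r_xz) / sqrt((1 - r_yz**2) * (1 - r_xz**2))

# Florida counties: Y = crime rate, X1 = education, X2 = urbanization
print(round(partial_corr(0.468, 0.678, 0.791), 3))  # -0.152
```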
Interpreting Partial Correlations
The partial correlation has properties similar to those of the ordinary correlation
between two variables, such as a range of −1 to +1, larger absolute values representing
stronger associations, and a value that does not depend on the units of measurement.
We list the properties below for r_{YX1·X2}, but analogous properties apply to r_{YX2·X1}.
• r_{YX1·X2} falls between −1 and +1.
• The larger the absolute value of r_{YX1·X2}, the stronger the association between
Y and X1, controlling for X2.
• The value of a partial correlation does not depend on the units of measurement
of the variables.
• r_{YX1·X2} has the same sign as the partial slope (b1) for the effect of x1 in the
prediction equation ŷ = a + b1x1 + b2x2. This happens because the same variable
(x2) is controlled in the model as in the correlation.
• Under the assumptions for conducting inference for multiple regression (see
the beginning of Section 11.4), r_{YX1·X2} estimates the correlation between Y
and X1 at every fixed value of X2. If we could control X2 by considering
a subpopulation of subjects all having the same value on X2, then r_{YX1·X2}
estimates the correlation between Y and X1 for that subpopulation.
• The sample partial correlation is identical to the correlation computed for the
points in the partial regression plot (Section 11.2).
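The last property can be verified numerically. The Python sketch below (not from the text; the data are made up) computes the partial correlation both from the definition and as the ordinary correlation between two sets of residuals, which is what the partial regression plot displays:

```python
# Sketch (not from the text; data are made up): the partial correlation equals
# the ordinary correlation between the residuals of Y regressed on X2 and the
# residuals of X1 regressed on X2.
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu)**2 for a in u) * sum((b - mv)**2 for b in v))
    return num / den

def residuals(y, x):
    """Residuals from the least squares line of y on x."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx)**2 for a in x)
    a0 = my - b * mx
    return [c - (a0 + b * xi) for xi, c in zip(x, y)]

# Small made-up data set
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 5, 1, 6, 3, 4]
y  = [2, 6, 3, 8, 6, 7]

# Partial correlation as a correlation of residuals
r_resid = corr(residuals(y, x2), residuals(x1, x2))

# Partial correlation from the formula
r_yx1, r_yx2, r_x12 = corr(y, x1), corr(y, x2), corr(x1, x2)
r_formula = (r_yx1 - r_yx2 * r_x12) / sqrt((1 - r_yx2**2) * (1 - r_x12**2))

assert abs(r_resid - r_formula) < 1e-6  # the two routes agree
```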
Interpreting Squared Partial Correlations
Like r² and R², the square of a partial correlation has a proportional reduction in
error (PRE) interpretation: r²_{YX2·X1} is the proportion of variation in
Y explained by X2, controlling for X1. This squared measure describes the effect of
removing from consideration the portion of the total sum of squares (TSS) in Y that
is explained by X1, and then finding the proportion of the remaining unexplained
variation in Y that is explained by X2.
Squared Partial Correlation

The square of the partial correlation r_{YX2·X1} represents the proportion of the variation
in Y that is explained by X2, out of that left unexplained by X1. It equals

    r²_{YX2·X1} = (R² − r²_{YX1}) / (1 − r²_{YX1})
                = (Partial proportion explained uniquely by X2) / (Proportion unexplained by X1).
Recall from Section 9.4 that r²_{YX1} represents the proportion of the variation in
Y explained by X1. The remaining proportion (1 − r²_{YX1}) represents the variation
left unexplained. When X2 is added to the model, it accounts for some additional
variation. The total proportion of the variation in Y accounted for by X1 and X2
jointly is R² for the model with both X1 and X2 as explanatory variables. So, R² − r²_{YX1}
is the additional proportion of the variability in Y explained by X2, after the
effects of X1 have been removed or controlled. The maximum this difference could
be is 1 − r²_{YX1}, the proportion of variation yet to be explained after accounting for
the influence of X1. The additional explained variation R² − r²_{YX1} divided by this
maximum possible difference is a measure that has a maximum possible value of 1. In
fact, as the formula above suggests, this ratio equals the squared partial correlation
between Y and X2, controlling for X1.
Figure 11.11 illustrates this property of the squared partial correlation. It shows
the ratio of the partial contribution of X2 beyond that of X1, namely R² − r²_{YX1},
divided by the proportion (1 − r²_{YX1}) left unexplained by X1. Similarly, the square
of r_{YX1·X2} equals

    r²_{YX1·X2} = (R² − r²_{YX2}) / (1 − r²_{YX2}),

the proportion of variation in Y explained by X1, out of that part unexplained by X2.
Example 11.9 Partial Correlation of Life Events with Mental Impairment
We return to the mental health study, with Y = mental impairment, X1 = life
events, X2 = SES. Software reports the correlation matrix,
IMPAIR LIFE SES
IMPAIR 1.000 .372 -.399
LIFE .372 1.000 .123
SES -.399 .123 1.000
So, r_{YX1} = 0.372, r_{YX2} = −0.399, and r_{X1X2} = 0.123. By its definition, the partial
correlation between mental impairment and life events, controlling for SES, is

    r_{YX1·X2} = (r_{YX1} − r_{YX2} r_{X1X2}) / √[(1 − r²_{YX2})(1 − r²_{X1X2})]
               = [0.372 − (−0.399)(0.123)] / √{[1 − (−0.399)²](1 − 0.123²)} = 0.463.
The partial correlation, like the correlation of 0.37 between mental impairment and
life events, is moderately positive.
Since r²_{YX1·X2} = (0.463)² = 0.21, controlling for SES, 21% of the variation in
mental impairment is explained by life events. Alternatively, since R² = 0.339 (Table
11.7),

    r²_{YX1·X2} = (R² − r²_{YX2}) / (1 − r²_{YX2}) = [0.339 − (−0.399)²] / [1 − (−0.399)²] = 0.21.
□
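A short Python sketch (not from the text) reproduces the arithmetic of Example 11.9 both from the pairwise correlations and from R²:

```python
# Sketch (not from the text): the arithmetic of Example 11.9, two ways.
from math import sqrt

r_yx1, r_yx2, r_x12 = 0.372, -0.399, 0.123
R2 = 0.339  # R-squared with both predictors (Table 11.7)

# Partial correlation from the pairwise correlations
r_partial = (r_yx1 - r_yx2 * r_x12) / sqrt((1 - r_yx2**2) * (1 - r_x12**2))

# Squared partial correlation from the R-squared values
r2_partial = (R2 - r_yx2**2) / (1 - r_yx2**2)

print(round(r_partial, 3))   # 0.463
print(round(r2_partial, 2))  # 0.21
```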
Higher-Order Partial Correlations
One reason we showed the connection between squared partial correlation values and
R² is that this approach also works when the number of control variables
exceeds one. For example, with three predictors, let R²_{Y(X1,X2,X3)} denote the value
of R². The square of the partial correlation between Y and X3, controlling for X1 and
X2, relates to how much larger this is than the R² value for the model with only X1
and X2 as predictors, which we denote by R²_{Y(X1,X2)}. The squared partial correlation
is

    r²_{YX3·X1,X2} = [R²_{Y(X1,X2,X3)} − R²_{Y(X1,X2)}] / [1 − R²_{Y(X1,X2)}].
In this expression, R²_{Y(X1,X2,X3)} − R²_{Y(X1,X2)} is the increase in the proportion of
explained variance from adding x3 to the model. The denominator 1 − R²_{Y(X1,X2)} is
the proportion of the variation left unexplained when x1 and x2 are the only predictors
in the model.
The partial correlation r_{YX3·X1,X2} is called a second-order partial corre-
lation, since it controls two variables. It has the same sign as b3 in the prediction
equation ŷ = a + b1x1 + b2x2 + b3x3, which also controls x1 and x2 in describing the
effect of x3.
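The R²-based formula is straightforward to compute. In the hypothetical Python sketch below, the function name and the example R² values are ours, not the text's:

```python
# Hypothetical sketch: squared higher-order partial correlation from R-squared
# values. The function name and the example numbers are ours, not the text's.
def r2_partial_from_R2(R2_full, R2_reduced):
    """Proportion of the variation left unexplained by the control variables
    that the added predictor explains."""
    return (R2_full - R2_reduced) / (1 - R2_reduced)

# Suppose R-squared = 0.50 with x1 and x2 alone, rising to 0.60 once x3 is
# added: x3 then explains 20% of the variation that x1 and x2 left over.
print(round(r2_partial_from_R2(0.60, 0.50), 2))  # 0.2
```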
Inference for Partial Correlations
Controlling for a certain set of variables, the slope of the partial effect of a predictor is 0
in the same situations in which the partial correlation between Y and that predictor is
0. An alternative formula for the t test for a partial effect uses the partial correlation.
With k predictors in the model, the equivalent t test statistic is

    t = (partial correlation) / √[(1 − squared partial correlation)/[n − (k + 1)]].

This statistic has the t distribution with df = n − (k + 1). It equals the t statistic
based on the partial slope estimate and, hence, has the same P-value.
We illustrate by testing that the population partial correlation between mental
impairment and life events, controlling for SES, is 0. From Example 11.9, r_{YX1·X2} =
0.463. There are k = 2 explanatory variables and n = 40 observations. The test
statistic equals

    t = r_{YX1·X2} / √[(1 − r²_{YX1·X2})/[n − (k + 1)]] = 0.463 / √[(1 − 0.463²)/37] = 3.18.
This equals the test statistic for H0: β1 = 0 in Table 11.5. Thus, the P -value is also
the same, P = 0.003.
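A few lines of Python (not part of the text) reproduce this t statistic:

```python
# Sketch (not from the text): t statistic for the partial correlation of
# Example 11.9.
from math import sqrt

r, n, k = 0.463, 40, 2        # partial correlation, sample size, predictors
df = n - (k + 1)              # 37
t = r / sqrt((1 - r**2) / df)
print(round(t, 2))            # 3.18, matching the t test for H0: beta1 = 0
```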
When no variables are controlled (i.e., the number of explanatory variables is
k = 1), the t statistic formula simplifies to

    t = r / √[(1 − r²)/(n − 2)].

This is the statistic for testing that the population bivariate correlation equals 0
(Section 9.5). Confidence intervals for partial correlations are more complex. They
require a log transformation such as shown for the correlation in Exercise 65 in Chapter
9.
11.8 Standardized Regression Coefficients∗
As in bivariate regression (recall Section 9.4), the sizes of regression coefficients in
multiple regression models depend on the units of measurement for the variables. To
compare the relative effects of two explanatory variables, it is appropriate to compare
their coefficients only if the variables have the same units. Otherwise, standardized
versions of the regression coefficients provide more meaningful comparisons.
Standardized Regression Coefficient
The standardized regression coefficient for an explanatory variable represents the
change in the mean of Y , in Y standard deviations, for a one standard deviation increase
in that variable, controlling for the other explanatory variables in the model. We denote
them by β∗1, β∗2, . . . .
If |β∗2| > |β∗1|, for example, then a standard deviation increase in X2 has a greater
partial effect on Y than does a standard deviation increase in X1.
The Standardization Mechanism
The standardized regression coefficients represent the values the regression coefficients
take when the units are such that Y and the explanatory variables all have equal
standard deviations. We standardize the partial regression coefficients by adjusting for
the differing standard deviation of Y and each Xi. Let sy denote the sample standard
deviation of Y , and let sx1, sx2
, . . . , sxkdenote the sample standard deviations of the
explanatory variables.
The estimates of the standardized regression coefficients are

    b∗1 = b1(s_{x1}/s_y),   b∗2 = b2(s_{x2}/s_y),   . . . .
Example 11.10 Standardized Coefficients for Mental Impairment
The prediction equation relating mental impairment to life events and SES is
    ŷ = 28.23 + 0.103x1 − 0.097x2.

Table 11.2 reported the sample standard deviations s_y = 5.5, s_{x1} = 22.6, and s_{x2} =
25.3. Since the unstandardized coefficient of x1 is b1 = 0.103, the estimated standard-
ized coefficient is

    b∗1 = b1(s_{x1}/s_y) = 0.103(22.6/5.5) = 0.43.
Since b2 = −0.097, the standardized value equals

    b∗2 = b2(s_{x2}/s_y) = −0.097(25.3/5.5) = −0.45.
The estimated change in the mean of Y for a standard deviation increase in x1,
controlling for x2, has a similar magnitude to the estimated change for a standard
deviation increase in x2, controlling for x1. However, the partial effect of x1 is positive,
whereas the partial effect of x2 is negative.
Table 11.9, which repeats Table 11.5, shows how SPSS reports the estimated stan-
dardized regression coefficients. It uses the heading BETA, reflecting the alternative
name beta weights for these coefficients.
□
Table 11.9: SPSS Printout for Fit of Multiple Regression Model to Mental Impairment Data
Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
(Constant) 28.230 2.174 12.984 .000
LIFE .103 .032 .428 3.177 .003
SES -.097 .029 -.451 -3.351 .002
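A short Python sketch (not part of the text) checks the standardization arithmetic of Example 11.10; because Table 11.2's standard deviations are rounded, the results differ slightly from the Beta column of the printout:

```python
# Sketch (not from the text): the standardization arithmetic of Example 11.10.
b1, b2 = 0.103, -0.097             # unstandardized coefficients
s_y, s_x1, s_x2 = 5.5, 22.6, 25.3  # rounded standard deviations (Table 11.2)

b1_star = b1 * (s_x1 / s_y)
b2_star = b2 * (s_x2 / s_y)

# About 0.42 and -0.45; the printout's unrounded Beta values are
# 0.428 and -0.451.
print(round(b1_star, 2), round(b2_star, 2))
```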
Properties of Standardized Regression Coefficients
For bivariate regression, standardizing the regression coefficient yields the correlation.
For the multiple regression model, the standardized partial regression coefficient
relates to the partial correlation (Exercise 67) and usually takes a similar value.
Unlike the partial correlation, however, b∗i need not fall between −1 and +1. A
value |b∗i | > 1 occasionally occurs when Xi is highly correlated with the set of other
explanatory variables in the model. In such cases, the standard errors are usually
large and the estimates are unreliable.
Since a standardized regression coefficient is a multiple of the unstandardized
coefficient, one equals 0 when the other does. The test of H0: β∗i = 0 is equivalent to
the t test of H0: βi = 0. It is unnecessary to have separate tests for these coefficients.
In the sample, the magnitudes of the {b∗i } have the same relative sizes as the t statistics
from those tests. For example, the predictor with the greatest standardized partial
effect is the one that has the largest t statistic, in absolute value.
Standardized Form of Prediction Equation∗
Regression equations have an expression using the standardized regression coefficients.
In this equation, the variables appear in standardized form.
Notation for Standardized Variables

Let z_Y, z_{X1}, . . . , z_{Xk} denote the standardized versions of the variables Y, X1, . . . , Xk.
For instance, z_Y = (y − ȳ)/s_y represents the number of standard deviations that an
observation on y falls from its mean.

Each subject's scores on y, x1, . . . , xk have corresponding z-scores for z_Y, z_{X1}, . . . , z_{Xk}.
If a subject's score on x1 is such that z_{X1} = (x1 − x̄1)/s_{x1} = 2.0, for instance, then
that subject falls two standard deviations above the mean x̄1 on that variable.
Let ẑ_Y = (ŷ − ȳ)/s_y denote the predicted z-score for the response variable. For
the standardized variables and the estimated standardized regression coefficients, the
prediction equation is

    ẑ_Y = b∗1 z_{X1} + b∗2 z_{X2} + · · · + b∗k z_{Xk}.
This equation predicts how far an observation on y falls from its mean, in standard
deviation units, based on how far the explanatory variables fall from their means, in
standard deviation units. The standardized coefficients are the weights attached to
the standardized explanatory variables in contributing to the predicted standardized
response variable.
Example 11.11 Standardized Prediction Equation for Mental Impairment
Example 11.10 found that the estimated standardized regression coefficients for
the life events and SES predictors of mental impairment are b∗1 = 0.43 and b∗2 = −0.45.
The prediction equation relating the standardized variables is therefore

    ẑ_Y = 0.43 z_{X1} − 0.45 z_{X2}.
Consider a subject who is two standard deviations above the mean on life events
but two standard deviations below the mean on SES. This subject has a predicted
standardized mental impairment of
    ẑ_Y = 0.43(2) − 0.45(−2) = 1.8.
The predicted mental impairment for that subject is 1.8 standard deviations above
the mean. If the distribution of mental impairment is approximately normal, this
subject might well have mental health problems, since only about 4% of the scores in
a normal distribution fall at least 1.8 standard deviations above their mean.
□
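A Python sketch (not part of the text) reproduces Example 11.11's prediction and the normal tail probability, using the standard library's NormalDist:

```python
# Sketch (not from the text): the standardized prediction of Example 11.11 and
# the normal tail probability, using only the Python standard library.
from statistics import NormalDist

z_pred = 0.43 * 2 - 0.45 * (-2)   # 2 sd above the mean on life events,
print(round(z_pred, 2))           # 2 sd below on SES -> 1.76, about 1.8

tail = 1 - NormalDist().cdf(1.8)  # fraction of a normal distribution at least
print(round(tail, 3))             # 1.8 sd above its mean: 0.036, about 4%
```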
In the prediction equation with standardized variables, no intercept term appears.
Why is this? When the standardized explanatory variables all equal 0, those variables
all fall at their means. Then ŷ = ȳ, so that

    ẑ_Y = (ŷ − ȳ)/s_y = 0.

So, this merely tells us that a subject who falls at the mean on each explanatory
variable is predicted to fall at the mean on the response variable.
Cautions in Comparing Standardized Regression Coefficients
To assess which predictor in a multiple regression model has the greatest impact on the
response variable, it is tempting to compare their standardized regression coefficients.
Make such comparisons with caution. In some cases, the observed differences in the
b∗i may simply reflect sampling error. In particular, when multicollinearity exists, the
standard errors are high and the estimated standardized coefficients may be unstable.
Keep in mind also that the effects are partial ones, depending on which other
variables are in the model. An explanatory variable that seems important in one
system of variables may seem unimportant when other variables are controlled. For
example, it is possible that |b∗2| > |b∗1| in a model with two explanatory variables, yet
when a third explanatory variable is added to the model, |b∗2| < |b∗1|.

It is unnecessary to standardize to compare the effect of the same variable for two
groups, such as in comparing the results of separate regressions for females and males,
since the units of measurement are the same in each group. In fact, it is usually unwise
to standardize in this case, because the standardized coefficients are more susceptible
than the unstandardized coefficients to differences in the standard deviations of the
predictors. For instance, Section 9.6 showed that the correlation depends strongly on
the range of x-values sampled. Two groups that have the same value for an estimated
regression coefficient have different standardized coefficients if the standard deviation
of the predictor differs for the two groups.
Finally, if an explanatory variable is highly correlated with the set of other ex-
planatory variables, it is artificial to conceive of that variable changing while the
others remain fixed in value. As an extreme example, suppose Y = height, X1 =
length of left leg, and X2 = length of right leg. The correlation between X1 and X2
is extremely close to 1. It does not make much sense to imagine how Y changes as
X1 changes while X2 is controlled.
11.9 Chapter Summary
This chapter generalized the bivariate regression model to include additional explana-
tory variables. The multiple regression equation relating a response variable Y
to a set of k explanatory variables is
E(Y ) = α + β1x1 + β2x2 + · · · + βkxk.
• The {βi} are partial regression coefficients. The value βi is the change
in the mean of Y for a one-unit change in xi, controlling for the other variables
in the model.
• The multiple correlation R describes the association between Y and the
collective set of explanatory variables. It equals the correlation between the
observed and predicted y-values. It falls between 0 and 1.
• R² = (TSS − SSE)/TSS represents the proportional reduction in error from
predicting Y using the prediction equation ŷ = a + b1x1 + b2x2 + · · · + bkxk
instead of ȳ. It equals the square of the multiple correlation.
• A partial correlation, such as r_{YX1·X2}, describes the association between
two variables, controlling for others. It falls between −1 and +1.
• The squared partial correlation between Y and xi represents the proportion of
the variation in Y that can be explained by xi, out of that part left unexplained
by a set of control variables.
• An F statistic tests H0 : β1 = β2 = · · · = βk = 0, that the response variable
is independent of all the predictors. A small P -value suggests that at least one
predictor affects the response.
• Individual t tests and confidence intervals for {βi} analyze partial effects of each
predictor, controlling for the other variables in the model.
• Interaction between x1 and x2 in their effects on Y means that the effect of
either predictor changes as the value of the other predictor changes. We can
allow this by introducing cross-products of explanatory variables to the model,
such as the term β3(x1x2).
• To compare regression models, a complete model and a simpler reduced
model, the F test compares the SSE values or R2 values.
• Standardized regression coefficients do not depend on the units of mea-
surement. The estimated standardized coefficient b∗i describes the change in Y ,
in Y standard deviation units, for a one standard deviation increase in xi, con-
trolling for the other explanatory variables.
To illustrate, with k = 2 explanatory variables, the prediction equation is

    ŷ = a + b1x1 + b2x2.

Fixing x2, a straight line describes the relation between Y and x1. Its slope b1
is the change in ŷ for a one-unit increase in x1, controlling for x2. The multiple
correlation R is at least as large as the correlations between Y and each predictor.
The squared partial correlation r²_{YX2·X1} is the proportion of the variation of Y that
is explained by x2, out of that part of the variation left unexplained by x1. The
estimated standardized regression coefficient b∗1 = b1(s_{x1}/s_y) describes the effect of a
standard deviation change in x1, controlling for x2.
Table 11.10 summarizes the basic properties and inference methods for these mea-
sures and those introduced in Chapter 9 for bivariate regression.
The model studied in this chapter is still somewhat restrictive in the sense that
all the predictors are quantitative. The next chapter shows how to include categorical
predictors in the model.
PROBLEMS
Practicing the Basics
1. For students at Walden University, the relationship between Y = college GPA
(with range 0–4.0) and X1 = high school GPA (range 0–4.0) and X2 = college
board score (range 200–800) satisfies E(Y ) = 0.20 + 0.50x1 + 0.002x2.
a) Find the mean college GPA for students having (i) high school GPA = 4.0
and college board score = 800, (ii) x1 = 3.0 and x2 = 300.
b) Show that the relationship between Y and x1 for those students with x2 =
Table 11.10: Summary of Bivariate and Multiple Regression

Model
  Bivariate: E(Y) = α + βx
  Multiple:  E(Y) = α + β1x1 + · · · + βkxk

Prediction equation
  Bivariate: ŷ = a + bx
  Multiple:  ŷ = a + b1x1 + · · · + bkxk

Properties of measures
  Bivariate: b = slope; r = correlation (standardized slope), −1 ≤ r ≤ 1,
             r has the same sign as b; r² = PRE measure, 0 ≤ r² ≤ 1
  Multiple (simultaneous effect of x1, . . . , xk): R = multiple correlation,
             0 ≤ R ≤ 1; R² = PRE measure, 0 ≤ R² ≤ 1
  Multiple (partial effect of one xi): bi = partial slope; b∗i = standardized
             regression coefficient; partial correlation, −1 ≤ r_{YX1·X2} ≤ 1,
             same sign as bi and b∗i; r²_{YX1·X2} is PRE measure

Tests of no association
  Bivariate: H0: β = 0 or H0: ρ = 0 (Y not associated with x)
  Multiple (simultaneous): H0: β1 = · · · = βk = 0 (Y not associated with
             x1, . . . , xk)
  Multiple (partial): H0: βi = 0, or H0: population partial correlation = 0
             (Y not associated with xi, controlling for other x variables)

Test statistic
  Bivariate: t = b/se = r/√[(1 − r²)/(n − 2)], df = n − 2
  Multiple (simultaneous): F = Regression MS / Residual MS
             = (R²/k)/[(1 − R²)/(n − (k + 1))], df1 = k, df2 = n − (k + 1)
  Multiple (partial): t = bi/se, df = n − (k + 1)
500 is E(Y ) = 1.2 + 0.5x1.
c) Show that when x2 = 600, E(Y ) = 1.4 + 0.5x1. Thus, increasing x2 by 100
shifts the line relating Y to x1 upward by 100β2 = 0.2 units.
d) Show that setting x1 at a variety of values yields a collection of parallel lines,
each having slope 0.002, relating the mean of Y to x2.
2. For recent data in Florida on Y = selling price of home (in dollars), X1 = size
of home (in square feet), X2 = lot size (in square feet), the prediction equation
is y = −10, 536 + 53.8x1 + 2.84x2.
a) A particular home of 1240 square feet on a lot of 18,000 square feet sold for
$145,000. Find the predicted selling price and the residual, and interpret.
b) For fixed lot size, how much is the house selling price predicted to increase
for each square foot increase in home size? Why?
3. Refer to the previous exercise:
a) For fixed home size, how much would lot size need to increase to have the
same impact as a one square foot increase in home size?
b) Suppose house selling prices are changed from dollars to thousands of dollars.
Explain why the prediction equation changes to ŷ = −10.536 + 0.0538x1 + 0.00284x2.
4. Use software with the “2005 statewide crime” data file at the text website, with
murder rate (number of murders per 100,000 people) as the response variable
and with percent of high school graduates and the poverty rate (percentage of
the population with income below the poverty level) as explanatory variables.
a) Construct the partial regression plots. Interpret.
b) Report the prediction equation. Explain how to interpret the estimated
coefficients.
c) Re-do the analyses after deleting the D.C. observation. Does this observation
have much influence on the results?
5. A regression analysis with recent U.N. data from several nations on Y = per-
centage of people who use the Internet, X1 = per capita gross domestic product
(in thousands of dollars), and X2 = percentage of people using cell phones has
results shown in Table 11.11.
a) Write the prediction equation.
b) Find the predicted Internet use for a country with per capita GDP of 10
thousand dollars and 50% using cell phones.
c) Find the prediction equations when cell-phone use is (i) 0 %, (ii) 100%, and
use them to interpret the effect of GDP.
d) Use the equations in (c) to explain the ‘no interaction’ property of the model.
6. Refer to the previous exercise.
a) Show how to obtain R-squared from the sums of squares in the ANOVA
table. Interpret it.
b) r2 = 0.78 when GDP is the sole predictor. Why do you think R2 does not
increase much when cell-phone use is added to the model, even though it is
Table 11.11:
B Std. Error t Sig
(Constant) -3.601 2.506 -1.44 0.159
GDP 1.2799 0.2703 4.74 0.000
CELLULAR 0.1021 0.0900 1.13 0.264
R Square .796
ANOVA
Sum of Squares DF
Regression 10316.8 2
Residual Error 2642.5 36
Total 12959.3 38
itself highly associated with Y (with r = 0.67)? (Hint: Would you expect X1
and X2 to be highly correlated? If so, what’s the effect?)
7. Table 9.17 showed data from Florida counties on Y = crime rate (number per
1000 residents), X1 = median income (thousands of dollars), and X2 = percent
in urban environment.
a) Figure 11.12 shows a scatterplot relating Y to X1. Predict the sign that the
estimated effect of X1 has in the prediction equation ŷ = a + bx1. Explain.
b) Figure 11.13 shows a partial regression plot relating Y to X1, controlling
for X2. Predict the sign that the estimated effect of X1 has in the prediction
equation ŷ = a + b1x1 + b2x2. Explain.
c) Table 11.12 shows part of a printout for the bivariate and multiple regression
models. Report the prediction equation relating Y to x1, and interpret the slope.
d) Report the prediction equation relating Y to both x1 and x2. Interpret the
coefficient of x1, and compare to (c).
e) The correlations are r_{YX1} = 0.43, r_{YX2} = 0.68, and r_{X1X2} = 0.73. Use these to
explain why the x1 effect seems so different in (c) and (d).
f) Report the prediction equations relating crime rate to income at urbanization
levels of (i) 0, (ii) 50, (iii) 100. Interpret.
8. Refer to the previous exercise. Using software with the “Florida crime” data
file at the text website:
a) Construct box plots for each variable and scatterplots and partial regression
plots between Y and each of x1 and x2. Interpret these plots.
b) Find the prediction equations for the bivariate effects of x1 and of x2. In-
terpret.
c) Find the prediction equation for the multiple regression model. Interpret.
Table 11.12:
B Std. Error t Sig
(Constant) -11.526 16.834 -0.685 0.4960
INCOME 2.609 0.675 3.866 0.0003
B Std. Error t Sig
(Constant) 40.261 16.365 2.460 0.0166
INCOME -0.809 0.805 -1.005 0.3189
URBAN 0.646 0.111 5.811 0.0001
d) Find R2 for the multiple regression model, and show that it is not much
larger than r2 for the model using urbanization alone as the predictor. Inter-
pret.
9. Recent UN data from several nations on Y = crude birth rate (number of births
per 1000 population size), X1 = women’s economic activity (female labor force
as percentage of male), and X2 = GNP (per capita, in thousands of dollars)
has prediction equation ŷ = 34.53 − 0.13x1 − 0.64x2.
a) Interpret the coefficient of x1.
b) Sketch on a single graph the relationship between Y and x1 when x2 = 0,
x2 = 10, and x2 = 20. Interpret the results.
c) The bivariate prediction equation with x1 is ŷ = 37.65 − 0.31x1. The correlations
are r_{YX1} = −0.58, r_{YX2} = −0.72, and r_{X1X2} = 0.58. Explain why the
coefficient of x1 in the bivariate equation is quite different from that in the multiple
predictor equation.
10. For recent UN data for several nations, a regression of carbon dioxide use (CO2,
a measure of air pollution) on gross domestic product (GDP) has a correlation
of 0.786. With life expectancy as a second explanatory variable, the multiple
correlation is 0.787.
a) Explain how to interpret the multiple correlation.
b) For predicting CO2, did it help much to add life expectancy to the model?
Does this mean that life expectancy is very weakly correlated with CO2? Ex-
plain.
11. Table 11.13 shows a printout from fitting the multiple regression model to recent
statewide data, excluding D.C., on Y = violent crime rate (per 100,000 people),
X1 = poverty rate (percentage with income below the poverty level), and X2 =
percent living in metropolitan areas.
a) Report the prediction equation.
b) Massachusetts had y = 805, x1 = 10.7, and x2 = 96.2. Find its predicted
Table 11.13:
Sum of
Squares DF Mean Square F Sig
Regression 2448368.07 2 1224184.04 31.249 0.0001
Residual 1841257.15 47 39175.68
Total 4289625.22 49
R R Square Std Error of the Estimate
.7555 .5708 197.928
B Std. Error t Sig
(Constant) -498.683 140.988 -3.537 0.0009
POVERTY 32.622 6.677 4.885 0.0001
METRO 9.112 1.321 6.900 0.0001
Correlations
VIOLENT POVERTY METRO
VIOLENT 1.0000 .3688 .5940
POVERTY .3688 1.0000 -.1556
METRO .5940 -.1556 1.0000
violent crime rate. Find the residual, and interpret.
c) Interpret the fit by showing the prediction equation relating y and x1 for
states with (i) x2 = 0, (ii) x2 = 50, (iii) x2 = 100. Interpret.
d) Interpret the correlation matrix.
e) Report R2 and the multiple correlation, and interpret.
12. Refer to the previous exercise.
a) Report the F statistic for testing H0: β1 = β2 = 0, report its df values and
P -value, and interpret.
b) Show how to construct the t statistic for testing H0: β1 = 0, report its df
and P -value for Ha: β1 ≠ 0, and interpret.
c) Construct a 95% confidence interval for β1, and interpret.
d) Since these analyses use data for all the states, what relevance, if any, do
the inferences have in (a)–(c)?
13. Refer to the previous two exercises. When we add x3 = percentage of single-
parent families to the model, we get the results in Table 11.14.
a) Report the prediction equation and interpret the coefficient of poverty rate.
b) Why do you think the effect of poverty rate is much lower after x3 is added
to the model?
Table 11.14:

Variable        Coefficient  Std. Error
Intercept       -1197.538
Poverty            18.283     (6.136)
Metropolitan        7.712     (1.109)
Single-parent      89.401    (17.836)
R2                  0.722
n                  50
14. Table 11.15 comes from a regression analysis of Y = number of children in
family, X1 = mother’s educational level in years (MEDUC), and X2 = father’s
socioeconomic status (FSES), for a random sample of 49 college students at
Texas A&M University.
a) Write the prediction equation. Interpret parameter estimates.
b) For the first subject in the sample, x1 = 12, x2 = 61, and y = 5. Find the
predicted value of y and the residual, and interpret.
c) Report SSE. Use it to explain the least squares property of this prediction
equation.
d) Explain why it is not possible that r_{YX1·X2} = 0.40.
e) Can you tell from the table whether r_{YX1} is positive or negative? Explain.
15. The General Social Survey has asked subjects to rate various groups using the
“feeling thermometer.” The rating is between 0 and 100, more favorable as the
Table 11.15:
             Sum of Squares
Regression        31.8
Residual         199.3

                  B
(Constant)       5.25
MEDUC           -0.24
FSES             0.02
score gets closer to 100 and less favorable as the score gets closer to 0. For a
small data set from the GSS, Table 11.16 shows results of fitting the multiple
regression model with feelings toward liberals as the response, using explanatory
variables political ideology (scores 1 = extremely liberal, 2 = liberal, 3 = slightly
liberal, 4 = moderate, 5 = slightly conservative, 6 = conservative, 7 = extremely
conservative) and religious attendance, using scores (1 = never, 2 = less than
once a year, 3 = once or twice a year, 4 = several times a year, 5 = about once
a month, 6 = 2-3 times a month, 7 = nearly every week, 8 = every week, 9 =
several times a week). Standard errors are shown in parentheses.
a) Report the prediction equation and interpret the ideology partial effect.
b) Report the predicted value and residual for the first observation, for which
ideology = 7, religion = 9, and feelings = 10.
c) Report, and explain how to interpret, R2.
d) Tables of this form often put * by an effect having P < 0.05, ** by an effect
having P < 0.01, and *** by an effect having P < 0.001. Show how this was
determined for the ideology effect, and discuss the disadvantage of summarizing
in this manner.
e) Explain how the F value can be obtained from the R2 value reported. Report
its df values, and explain how to interpret its result.
f) The estimated standardized regression coefficients are −0.79 for ideology and
−0.23 for religion. Interpret.
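For part (e), the overall F statistic is determined by R2 alone, via F = (R2/k)/((1 − R2)/(n − k − 1)). A minimal sketch with the Table 11.16 values; the printed F (13.93) differs slightly because R2 = 0.799 is itself rounded:

```python
# Overall F test of H0: beta1 = ... = betak = 0, computed from R-squared.
def overall_F(R2, n, k):
    return (R2 / k) / ((1 - R2) / (n - k - 1))

F = overall_F(0.799, n=10, k=2)
print(round(F, 2), "df =", 2, "and", 10 - 2 - 1)
```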
16. Refer to Table 11.5. Test H0: β2 = 0 that mental impairment is independent
of SES, controlling for life events. Report the test statistic, and report and
interpret the P-value for (a) Ha: β2 ≠ 0, (b) Ha: β2 < 0.
17. For a random sample of 66 state precincts, data are available on
Y = Percentage of adult residents who are registered to vote
X1 = Percentage of adult residents owning homes
X2 = Percentage of adult residents who are nonwhite
X3 = Median family income (thousands of dollars)
X4 = Median age of residents
Table 11.16:
Variable     Coefficient
Intercept      135.31
Ideology       -14.07
               (3.16)**
Religion        -2.95
               (2.26)
F               13.93**
R2               0.799
Adj. R2          0.742
(n)             (10)
X5 = Percentage of residents who have lived in the
precinct at least ten years
Table 11.17 shows a portion of the printout used to analyze the data.
a) Fill in all the missing values in the printout, indicating in each ‘Sig’ space
whether P > 0.05, 0.01 < P < 0.05, 0.001 < P < 0.01, or P < 0.001.
b) Do you think it is necessary to include all five explanatory variables in the
model? Explain.
c) To what test does the “F Value” refer? Interpret the result of that test.
d) To what test does the t-value opposite x1 refer? Interpret the result of that
test.
18. Refer to the previous exercise.
a) Find a 95% confidence interval for the change in the mean of Y for a 1-unit
increase in the percentage of adults owning homes, controlling for the other
variables. Interpret.
b) Find a 95% confidence interval for the change in the mean of Y for a 50-unit
increase in the percentage of adults owning homes, controlling for the other
variables. Interpret.
19. Use software with the “house selling price” data file at the text website to con-
duct a multiple regression analysis of Y = selling price of home (dollars), X1 =
size of home (square feet), X2 = number of bedrooms, X3 = number of bath-
rooms.
a) Use graphics to display the effects of the predictors. Interpret, and explain
how the highly discrete nature of x2 and x3 affects the plots.
b) Report the prediction equation and interpret the estimates.
c) Inspect the correlation matrix, and report the variables having the (i) strongest
association, (ii) weakest association.
d) Report R2, and interpret.
Table 11.17:
              Sum of                Mean
              Squares      DF      Square       F       Sig     R-Square
Regression     ----        ---      ----       ----     ----      ----
Residual      2940.0       ---      ----                        Root MSE
Total         3753.3       ---                                    ----

              Parameter   Standard
Variable      Estimate    Error         t       Sig
Intercept     70.0000
x1             0.1000     0.0450       ----     ----
x2            -0.1500     0.0750       ----     ----
x3             0.1000     0.2000       ----     ----
x4            -0.0400     0.0500       ----     ----
x5             0.1200     0.0500       ----     ----
e) Find the F statistic for testing the overall effect of the three predictors,
report its df values and its P-value, and interpret.
f) Find the t test statistic for H0: β3 = 0, report its P-value for Ha: β3 > 0,
and interpret.
20. Refer to the previous exercise. Now use only number of bathrooms and number
of bedrooms as predictors.
a) Again test the partial effect of number of bathrooms, and interpret.
b) Construct a 95% confidence interval for the coefficient of number of bath-
rooms, and interpret.
c) Find the partial correlation between selling price and number of bathrooms,
controlling for number of bedrooms. Compare it to the correlation, and inter-
pret.
d) Find the estimated standardized regression coefficients for the model, and
interpret.
e) Write the prediction equation using standardized variables. Interpret.
21. Exercise 11 showed a regression analysis for statewide data on Y = violent
crime rate, X1 = poverty rate, and X2 = percent living in metropolitan areas.
When we add an interaction term, we get the prediction equation ŷ = 158.9 − 14.72x1 − 1.29x2 + 0.76x1x2.
a) As the percentage living in metropolitan areas increases, does the effect of
poverty rate tend to increase or decrease? Explain.
b) Show how to interpret the prediction equation, by finding how it simplifies
when x2 = 0, 50, and 100.
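Simplifications of this kind can be checked numerically: at a fixed x2, the intercept and the x1 slope of the interaction equation in Exercise 21 are themselves linear in x2. A small sketch:

```python
# Collect the intercept and x1 slope of yhat = 158.9 - 14.72*x1 - 1.29*x2
# + 0.76*x1*x2 as functions of the fixed value of x2.
def line_given_x2(x2):
    intercept = 158.9 - 1.29 * x2
    slope = -14.72 + 0.76 * x2
    return intercept, slope

for x2 in (0, 50, 100):
    a, b = line_given_x2(x2)
    print(f"x2 = {x2:3d}: yhat = {a:.1f} + ({b:.2f})x1")
```

Note how the poverty slope changes sign as x2 grows, which is the point of part (a).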
22. A study analyzes relationships among Y = percentage vote for Democratic
candidate, X1 = percentage of registered voters who are Democrats, and X2 =
percentage of registered voters who vote in the election, for several congressional
elections in 2006. The researchers expect interaction, since they expect a higher
slope between Y and x1 at larger values of x2 than at smaller values. They
obtain the prediction equation ŷ = 20 + 0.30x1 + 0.05x2 + 0.005x1x2. Does this
equation support the direction of their prediction? Explain.
23. Use software with the “house selling price” data file to allow interaction between
number of bedrooms and number of bathrooms in their effects on selling price.
a) Report the prediction equation.
b) Interpret the fit by showing the equation relating y and number of bedrooms
for homes with (i) two bathrooms, (ii) three bathrooms.
c) Use a test to analyze the significance of the interaction term. Interpret.
24. A multiple regression analysis investigates the relationship between Y = college
GPA and several explanatory variables, using a random sample of 195 students
at Slippery Rock University. First, high school GPA and total SAT score are
entered into the model. The sum of squared errors is SSE = 20. Next, parents’
education and parents’ income are added, to determine if they have an effect,
controlling for high school GPA and SAT. For this expanded model SSE = 19.
Test whether this complete model is significantly better than the one containing
only high school GPA and SAT. Report and interpret the P-value.
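A comparison of nested models of this type uses F = [(SSE_reduced − SSE_complete)/df1] / [SSE_complete/df2], where df1 is the number of added parameters and df2 = n − (k + 1) for the complete model. A sketch with the values stated in Exercise 24:

```python
# Complete-vs-reduced model F test: n = 195, complete model has 4 predictors,
# and 2 parameters (parents' education, parents' income) are added.
sse_reduced, sse_complete = 20.0, 19.0
df1 = 2
df2 = 195 - (4 + 1)      # 190
F = ((sse_reduced - sse_complete) / df1) / (sse_complete / df2)
print(F, "with df =", df1, "and", df2)
```

The P-value would then come from the F distribution with (df1, df2) degrees of freedom, via software or a table.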
25. Table 11.18 shows results of regressing Y = birth rate (BIRTHS, number of
births per 1000 population) on x1 = women’s economic activity (ECON) and
x2 = literacy rate (LITERACY), using UN data for 23 nations.
a) Report the value of each of the following:
(i) rYX1 (ii) rYX2 (iii) R2
(iv) TSS (v) SSE (vi) mean square error
(vii) s (viii) sy (ix) se for b1 (x) t for H0: β1 = 0
(xi) P for H0: β1 = 0 against Ha: β1 ≠ 0
(xii) P for H0: β1 = 0 against Ha: β1 < 0
(xiii) F for H0: β1 = β2 = 0
(xiv) P for H0: β1 = β2 = 0
b) Report the prediction equation, and carefully interpret the three estimated
regression coefficients.
c) Interpret the correlations rYX1 and rYX2.
d) Report R2, and interpret its value.
e) Report the multiple correlation, and interpret.
f) Though inference may not be relevant for these data, report the F statistic
for H0: β1 = β2 = 0, report its P-value, and interpret.
g) Show how to construct the t statistic for H0: β1 = 0, report its df and
P-value for Ha: β1 ≠ 0, and interpret.
Table 11.18:
Mean Std Deviation N
BIRTHS 22.117 10.469 23
ECON 47.826 19.872 23
LITERACY 77.696 17.665 23
Correlations
BIRTHS ECON LITER
Correlation BIRTHS 1.00000 -0.61181 -0.81872
ECON -0.61181 1.00000 0.42056
LITERACY -0.81872 0.42056 1.00000
Sig.(2-tailed) BIRTHS . 0.0019 0.0001
ECON 0.0019 . 0.0457
LITERACY 0.0001 0.0457 .
Sum of
Squares DF Mean Square F Sig
Regression 1825.969 2 912.985 31.191 0.0001
Residual 585.424 20 29.271
Total 2411.393 22
Root MSE (Std. Error of the Estimate) 5.410 R Square 0.7572
Unstandardized Coeff. Standardized
B Std. Error Coeff. (Beta) t Sig
(Constant) 61.713 5.2453 11.765 0.0001
ECON -0.171 0.0640 -0.325 -2.676 0.0145
LITERACY -0.404 0.0720 -0.682 -5.616 0.0001
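Several entries of this printout are linked arithmetically, which is the basis for many parts of Exercise 25. A minimal consistency check (small differences from the printed values reflect rounding of the displayed coefficients):

```python
# R-square = regression SS / total SS; Root MSE = sqrt(residual SS / df);
# each t equals the coefficient divided by its standard error.
import math

reg_ss, res_ss, total_ss = 1825.969, 585.424, 2411.393
r_square = reg_ss / total_ss
root_mse = math.sqrt(res_ss / 20)     # residual df = 20
t_econ = -0.171 / 0.0640              # ECON coefficient / its SE

print(round(r_square, 4), round(root_mse, 3), round(t_econ, 2))
```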
26. Refer to the previous exercise.
a) Find the partial correlation between Y and X1, controlling for X2. Interpret
both the partial correlation and its square.
b) Find the estimate of the conditional standard deviation, and interpret its
value.
c) Show how to find the estimated standardized regression coefficient for x1
using the unstandardized estimate and the standard deviations, and interpret
its value.
d) Write the prediction equation using standardized variables. Interpret.
e) Find the predicted z-score for a country that is one standard deviation above
the mean on both predictors. Interpret.
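For part (c), the standardized estimate is obtained by rescaling the unstandardized one: b* = b(sx/sy). A sketch using the ECON values from Table 11.18:

```python
# Standardized coefficient for ECON, from the unstandardized estimate and
# the sample standard deviations of ECON and BIRTHS.
b_econ = -0.171
s_econ, s_births = 19.872, 10.469
b_star = b_econ * s_econ / s_births
print(round(b_star, 3))
```

The result matches the Beta column of the printout (−0.325).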
27. Refer to Examples 11.1 and 11.8. Explain why the partial correlation between
crime rate and high school graduation rate is so different from the bivariate
correlation. (This is an example of Simpson’s paradox, which states that a
bivariate association can have a different direction than a partial association.)
28. For a group of 100 children of ages varying from 3 to 15, the correlation be-
tween vocabulary score on an achievement test and height of child is 0.65. The
correlation between vocabulary score and age for this sample is 0.85, and the
correlation between height and age is 0.75.
a) Show that the partial correlation between vocabulary and height, controlling
for age, is 0.036. Interpret.
b) Test whether this partial correlation is significantly nonzero. Interpret.
c) Is it plausible that the relationship between height and vocabulary is spuri-
ous, in the sense that it is due to their joint dependence on age? Explain.
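The verification in part (a) uses the first-order partial correlation formula, r_XY·Z = (r_XY − r_XZ r_YZ) / √[(1 − r_XZ²)(1 − r_YZ²)]. A sketch with the stated correlations:

```python
# Partial correlation between vocabulary (V) and height (H), controlling
# for age (A), from the three pairwise correlations.
import math

r_vh, r_va, r_ha = 0.65, 0.85, 0.75
partial = (r_vh - r_va * r_ha) / math.sqrt((1 - r_va**2) * (1 - r_ha**2))
print(round(partial, 3))
```

The near-zero value supports the spuriousness interpretation asked about in part (c).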
29. A multiple regression model describes the relationship among a collection of
cities between Y = murder rate (number of murders per 100,000 residents) and
X1 = Number of police officers (per 100,000 residents)
X2 = Median length of prison sentence given to convicted murderers
(in years)
X3 = Median income of residents of city (in thousands of dollars)
X4 = Unemployment rate in city
These variables are observed for a random sample of thirty cities with popu-
lation size exceeding 35,000. For these cities, the prediction equation is ŷ =
30 − 0.02x1 − 0.1x2 − 1.2x3 + 0.8x4, and ȳ = 15, x̄1 = 100, x̄2 = 15, x̄3 = 13,
x̄4 = 7.8, sy = 8, sx1 = 30, sx2 = 10, sx3 = 2, sx4 = 2.
a) Can you tell from the coefficients of the prediction equation which explana-
tory variable has the greatest partial effect on Y ? Explain.
b) Find the standardized regression coefficients and interpret their values.
c) Write the prediction equation using standardized variables. Find the pre-
dicted z-score on murder rate for a city that is one standard deviation above
the mean on x1, x2, and x3, and one standard deviation below the mean on x4.
Interpret.
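Computations of the type parts (b) and (c) require can be sketched as follows, using b*_i = b_i(sxi/sy) and the fact that the standardized prediction equation is a sum of b*_i times predictor z-scores:

```python
# Standardized coefficients for the four predictors in Exercise 29, and the
# predicted z-score for the city described in part (c).
b = [-0.02, -0.1, -1.2, 0.8]
s_x = [30, 10, 2, 2]
s_y = 8

b_star = [bi * si / s_y for bi, si in zip(b, s_x)]
z_x = [1, 1, 1, -1]   # one SD above on x1, x2, x3; one SD below on x4
z_hat = sum(bs * z for bs, z in zip(b_star, z_x))
print(b_star, z_hat)
```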
30. Exercise 11 showed a regression of violent crime rate on poverty rate and percent
living in metropolitan areas. The estimated standardized regression coefficients
are 0.473 for poverty rate and 0.668 for percent in metropolitan areas.
a) Interpret the estimated standardized regression coefficients.
b) Express the prediction equation using standardized variables, and explain
how it is used.
Concepts and Applications
31. Refer to the student survey data set (Exercise 1.11). Using software, conduct
a regression analysis using Y = political ideology with predictors number of
times per week of newspaper reading and religiosity. Prepare a report, summa-
rizing your graphical analyses, bivariate models and interpretations, multiple
regression models and interpretations, inferences, checks of effects of outliers,
and overall summary of the relationships.
32. Repeat the previous exercise using Y = college GPA with predictors high school
GPA and number of weekly hours of physical exercise
33. Refer to the student data file you created in Exercise 1.12. For variables chosen
by your instructor, fit a multiple regression model and conduct descriptive and
inferential statistical analyses. Interpret and summarize your findings.
34. Using software with the “2005 statewide crime” data file at the text website,
conduct a regression analysis of murder rate with predictors poverty rate, the
percent living in urban areas, and percent of high school graduates. Conduct
descriptive and inferential analyses. Provide interpretations, and provide a
paragraph summary of your conclusions at the end of your report.
35. Repeat the previous exercise using violent crime rate as the response variable.
36. Refer to Exercise 34. Repeat this problem, excluding the observation for D.C.
Describe the effect on the various analyses of this observation.
37. Table 27 in Chapter 9 is the “UN data” data file at the text website. Construct
a multiple regression model containing two explanatory variables that provide
good predictions for the fertility rate. How did you select this model? (Hint:
One way is based on entries in the correlation matrix.)
38. In about 200 words, explain to someone who has never studied statistics what
multiple regression does and how it can be useful.
39. Analyze the “house selling price” data file at the text website (which were
introduced in Example 9.10), using selling price of home, size of home, number
of bedrooms, and taxes. Prepare a short report summarizing your analyses and
conclusions.
40. For Example 11.2 on mental impairment, Table 11.19 shows the result of adding
religious attendance as a predictor, measured as the approximate number of
times the subject attends a religious service over the course of a year. Write a
short report, interpreting the information from this table.
41. A study3 of mortality rates in the U.S. found that states with higher income
inequality tended to have higher mortality rates. The effect of income inequality
3A. Muller, BMJ, vol. 324, 2002
Table 11.19:
Variable                Coefficient
Intercept                 27.422
Life events                0.0935
                          (0.0313)**
SES                       -0.0958
                          (0.0256)***
Religious attendance      -0.0370
                          (0.0219)
R2                         0.358
(n)                       (40)
disappeared after controlling for the percentage of a state’s residents that had
at least a high school education. Explain how these results relate to analyses
conducted using bivariate regression and multiple regression.
42. A 2002 study4 relating the percentage of a child’s life spent in poverty to number
of years of education completed by the mother and the percentage of a child’s
life spent in a single parent home reported the results shown in Table 11.20.
Prepare a one-page report explaining how to interpret the results in this table.
Table 11.20:
Unstandardized Standardized
Coefficients Coefficients
B Std. Error Beta t Sig.
(Constant) 56.401 2.121 12.662 .000
% single parent 0.323 .014 .295 11.362 .000
mother school -3.330 .152 -.290 -11.294 .000
F 611.6 (df = 2, 4731) Sig .000
R 0.453 R Square 0.205
43. The Economist magazine5 developed a quality-of-life index for nations as the
predicted value obtained by regressing an average of life-satisfaction scores from
several surveys on gross domestic product (GDP, per capita, in dollars), life ex-
pectancy (in years), an index of political freedom (from 1 = completely free
4 http://www.heritage.org/Research/Family/cda02-05.cfm
5 http://www.economist.com/media/pdf/QUALITYOFLIFE.pdf
to 7 = unfree), the percentage unemployed, the divorce rate (on a scale of 1
for lowest rates to 5 for highest), latitude (to distinguish between warmer and
cold climes), a political stability measure, gender equality defined as the ratio
of average male and female earnings, and community life (1 if country has high
rate of church attendance or trade-union membership, 0 otherwise). Table 11.21
shows results of the model fit for 74 countries, for which the multiple correla-
tion is 0.92. The study used the prediction equation to predict the quality of
life in 2005 for 111 nations. The top 10 ranks were for Ireland, Switzerland,
Norway, Luxembourg, Sweden, Australia, Iceland, Italy, Denmark, and Spain.
Other ranks included 13 for the U.S., 14 for Canada, 15 for New Zealand, 16
for Netherlands, and 29 for the U.K.
a) Which variables would you expect to have negative effects on quality of life?
Is this supported by the results?
b) The study states that “GDP explains more than 50% of the variation in life
satisfaction.” How does this relate to a summary measure of association?
c) The study reported that “Using so-called Beta coefficients from the regres-
sion to derive the weights of the various factors, life expectancy and GDP were
the most important.” Explain what was meant by this.
d) Although GDP seems to be an important predictor, in a bivariate sense and
a partial sense, Table 11.21 reports a very small coefficient, 0.00003. Why do
you think this is?
e) The study mentioned other predictors that were not included because they
provided no further predictive power. For example, the study stated that ed-
ucation seemed to have an effect mainly through its effects on other variables
in the model, such as GDP, life expectancy, and political freedom. Does this
mean there is no association between education and quality of life? Explain.
Table 11.21:
                     Coefficient   Standard error   t statistic
Constant                2.796          0.789            3.54
GDP per person          0.00003        0.00001          3.52
Life expectancy         0.045          0.011            4.23
Political freedom      -0.105          0.056           -1.87
Unemployment           -0.022          0.010           -2.21
Divorce rate           -0.188          0.064           -2.93
Latitude               -1.353          0.469           -2.89
Political stability     0.152          0.052            2.92
Gender equality         0.742          0.543            1.37
Community life          0.386          0.124            3.13
44. A recent article6 used multiple regression to predict attitudes toward homosex-
6T. Shackelford and A. Besser, Individual Differences Research, 2007
uality. The researchers found that the effect of number of years of education on
a measure of tolerance toward homosexuality varied from essentially no effect
for political conservatives to a considerably positive effect for political liberals.
Explain how this is an example of statistical interaction, and explain how it
would be handled by a multiple regression model.
45. In the study mentioned in the previous exercise, a separate model did not
contain interaction terms. The best predictor of attitudes toward homosexuality
was educational level, with an estimated standardized regression coefficient of
0.21. The authors also reported, “Controlling for other variables, an additional
year of education completed was associated with a .09 rating unit increase in
attitudes toward homosexuality.” In comparing the effect of education with the
effects of other predictors in the model, such as the age of the subject, explain
the purpose of estimating standardized coefficients. Explain how to interpret
the one reported for education.
46. For a linear model with two explanatory variables X1 and X2, which of the
following must be incorrect? Why?
a) rYX1 = 0.01, rYX2 = −0.2, R = 0.75
b) rYX1 = 0.01, rYX2 = −0.75, R = 0.2
c) rYX1 = 0.4, rYX2 = 0.4, R = 0.4
47. In Exercise 1 on Y = college GPA, X1 = high school GPA, and X2 = college
board score, E(Y) = 0.20 + 0.50x1 + 0.002x2. True or false: Since β1 = 0.50
is larger than β2 = 0.002, this implies that X1 has the greater partial effect on
Y. Explain.
48. Table 11.22 shows results of fitting various regression models to data on Y = col-
lege GPA, X1 = high school GPA, X2 = mathematics entrance exam score, and
X3 = verbal entrance exam score. Indicate which of the following statements
are false. Give a reason for your answer.
Table 11.22:
                                          Model
Estimates          E(Y) = α + βx1   E(Y) = α + β1x1 + β2x2   E(Y) = α + β1x1 + β2x2 + β3x3
Coefficient of x1       0.450              0.400                      0.340
Coefficient of x2                          0.003                      0.002
Coefficient of x3                                                     0.002
R2                      0.25               0.34                       0.38
a) The correlation between Y and X1 is positive.
b) A one-unit increase in x1 corresponds to a change of 0.45 in the estimated
mean of Y , controlling for x2 and x3.
c) The value of SSE increases as we add additional variables to the model.
d) It follows from the sizes of the estimates for the third model that X1 has
the strongest partial effect on Y .
e) The value of r2YX3 is 0.40.
f) The partial correlation rYX1·X2 is positive.
g) The partial correlation rYX1·X3 could be negative.
h) Controlling for X1, a 100-unit increase in X2 corresponds to a predicted
increase of 0.3 in college GPA.
i) For the first model, the estimated standardized regression coefficient equals
0.50.
49. In regression analysis, which of the following statements must be false? Why?
a) For the model E(Y) = α + β1x1, Y is significantly related to x1 at the 0.05
level, but when x2 is added to the model, Y is not significantly related to x1 at
the 0.05 level.
b) The estimated coefficient of x1 is positive in the bivariate model, but negative
in the multiple regression model.
c) When the model is refitted after Y is multiplied by 10, then R2, rYX1,
rYX1·X2, b*1, and the F statistics and t statistics do not change.
d) rYX2·X1 cannot exceed rYX2.
e) The F statistic for testing that all the regression coefficients equal 0 has
P < 0.05, but none of the individual t tests have P < 0.05.
f) If you compute the standardized regression coefficient for a bivariate model,
you always get the correlation.
g) r2YX1 = r2YX2 = 0.6 and R2 = 0.6.
h) r2YX1 = r2YX2 = 0.6 and R2 = 1.2.
i) The correlation between Y and ŷ equals −0.10.
j) If x3 is added to a model already containing x1 and x2, then if the prediction
equation has b3 = 0, R2 stays the same.
k) For every F test, there is an equivalent test using the t distribution.
For Exercises 50–54, select the correct answer(s) and indicate why the other
responses are inappropriate. (More than one response may be correct.)
50. If Y = 2 + 3x1 + 5x2 − 8x3, then controlling for x2 and x3, the predicted mean
change in Y when x1 is increased from 10 to 20 equals
a) 3 b) 30 c) 0.3 d) Cannot be given—depends on specific values of x2
and x3.
51. If Y = 2 + 3x1 + 5x2 − 8x3,
a) The strongest correlation is between Y and X3.
b) The variable with the strongest partial influence on Y is X2.
c) The variable with the strongest partial influence on Y is X3, but one cannot
tell from this equation which pair has the strongest correlation.
d) None of the above.
52. If Y = 2 + 3x1 + 5x2 − 8x3,
a) rYX3 < 0
b) rYX3·X1 < 0
c) rYX3·X1,X2 < 0
d) Insufficient information to answer.
e) Answers (a), (b), and (c) are all correct.
53. If Y = 2 + 3x1 + 5x2 − 8x3, and H0: β3 = 0 is rejected at the 0.05 level, then
a) H0: ρYX3·X1,X2 = 0 is rejected at the 0.05 level.
b) H0: ρYX3 = 0 is rejected at the 0.05 level.
c) rYX3·X1,X2 > 0
54. The F test for comparing a complete model to a reduced model
a) Can be used to test the significance of a single regression parameter in a
multiple regression model.
b) Can be used to test H0: β1 = · · · = βk = 0 in a multiple regression equation.
c) Can be used to test H0: No interaction, in the model
E(Y) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3.
d) Can be used to test whether the model E(Y) = α + β1x1 + β2x2 gives a
significantly better fit than the model E(Y) = α + β1x1 + β2x3.
55. Explain the difference in the purposes of the correlation, the multiple correla-
tion, and the partial correlation.
56. Let Y = height, X1 = length of right leg, X2 = length of left leg. Describe
what you expect for the relative sizes of the three pairwise correlations, R, and
rYX2·X1.
57. Give an example of three variables for which you expect β ≠ 0 in the model
E(Y) = α + βx1 but β1 = 0 in the model E(Y) = α + β1x1 + β2x2.
58. For the models E(Y) = α + βx and E(Y) = α + β1x1 + β2x2, express null
hypotheses in terms of correlations that are equivalent to the following:
a) H0: β = 0
b) H0: β1 = β2 = 0
c) H0: β2 = 0
59. * Whenever X1 and X2 are uncorrelated, then R2 for the model E(Y) = α +
β1x1 + β2x2 satisfies R2 = r2YX1 + r2YX2. In this case, draw a figure that
portrays the variability in Y, the part of that variability explained by each of
X1 and X2, and the total variability explained by both of them together.
60. * Which of the following sets of correlations would you expect to yield the
highest R2 value? Why?
a) rYX1 = 0.4, rYX2 = 0.4, rX1X2 = 0.0
b) rYX1 = 0.4, rYX2 = 0.4, rX1X2 = 0.5
c) rYX1 = 0.4, rYX2 = 0.4, rX1X2 = 1.0
61. * Suppose the correlation between Y and X1 equals the multiple correlation
between Y and X1 and X2. What does this imply about the partial correlation
rYX2·X1? Interpret.
62. * Software reports four types of sums of squares in multiple regression models.
The Type I (sometimes called sequential) sum of squares represents the vari-
ability explained by a variable, controlling for variables previously entered into
the model. The Type III (sometimes called partial) sum of squares represents
the variability explained by that variable, controlling for all other variables in
the model.
a) For any multiple regression model, explain why the Type I sum of squares
for x1 is the regression sum of squares for the bivariate model with x1 as the
predictor, whereas the Type I sum of squares for x2 equals the amount by which
SSE decreases when x2 is added to the model.
b) Explain why the Type I sum of squares for the last variable entered into a
model is the same as the Type III sum of squares for that variable.
63. * The sample value of R2 tends to overestimate the population value, because
the sample data fall closer to the sample prediction equation than to the true
population regression equation. This bias is greater if n is small or the number
of predictors k is large. A somewhat better estimate is adjusted R2,
R2adj = 1 − s2/s2Y = R2 − [k/(n − (k + 1))](1 − R2),

where s2 is the estimated conditional variance (i.e., the mean square error) and
s2Y is the sample variance of Y.
a) Suppose R2 = 0.339 for a model with k = 2 explanatory variables (such as
in Table 11.5). Find R2adj for the following sample sizes: 10, 40 (as in the text
example), 100, and 1000. Show that R2adj approaches R2 in value as n increases.
b) Show that R2adj is negative when R2 < k/(n − 1). This is undesirable, and
R2adj is equated to 0 in such cases. (Also, unlike R2, R2adj could decrease when
we add an explanatory variable to a model.)
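The computation in part (a) can be sketched directly from the formula above:

```python
# Adjusted R2 = R2 - [k/(n - (k+1))] * (1 - R2), evaluated for R2 = 0.339,
# k = 2, and the sample sizes listed in the exercise.
def adj_r2(R2, n, k):
    return R2 - (k / (n - (k + 1))) * (1 - R2)

for n in (10, 40, 100, 1000):
    print(n, round(adj_r2(0.339, n, k=2), 3))
```

As the output shows, the adjustment is substantial for n = 10 but negligible for n = 1000.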
64. * Let R2Y(X1,...,Xk) denote R2 for the multiple regression model with k explana-
tory variables. Explain why

r2YXk·X1,...,Xk−1 = [R2Y(X1,...,Xk) − R2Y(X1,...,Xk−1)] / [1 − R2Y(X1,...,Xk−1)].
65. * The numerator R2 − r2YX1 of the squared partial correlation r2YX2·X1 gives
the increase in the proportion of explained variation from adding X2 to the
model. This increment, denoted by r2Y(X2·X1), is called the squared semipartial
correlation. One can use squared semipartial correlations to partition the
variation in the response variable. For instance, for three explanatory variables,

R2Y(X1,X2,X3) = r2YX1 + (R2Y(X1,X2) − r2YX1) + (R2Y(X1,X2,X3) − R2Y(X1,X2))
              = r2YX1 + r2Y(X2·X1) + r2Y(X3·X1,X2).
The total variation in Y explained by X1, X2, and X3 together partitions into:
(i) the proportion explained by X1 (i.e., r2YX1), (ii) the proportion explained
by X2 beyond that explained by X1 (i.e., r2Y(X2·X1)), and (iii) the proportion
explained by X3 beyond that explained by X1 and X2 (i.e., r2Y(X3·X1,X2)).
These correlations have the same ordering as the t statistics for testing partial
effects, and some researchers use them as indices of importance of the predictors.
a) In Example 11.2 on mental impairment, show that r2Y(X2·X1) = 0.20 and
r2Y(X1·X2) = 0.18. Interpret.
b) Explain why the squared semipartial correlation r2Y(X2·X1) cannot be larger
than the squared partial correlation r2YX2·X1.
66. * The least squares prediction equation provides predicted values ŷ with the
strongest possible correlation with Y, out of all possible prediction equations
of that form. That is, the least squares equation yields the best prediction
of Y in the sense that it represents the linear reduction of X1, . . . , Xk to the
single variable that is most strongly correlated with Y . Based on this property,
explain why the multiple correlation cannot decrease when one adds a variable
to a multiple regression model. (Hint: The prediction equation for the simpler
model is a special case of a prediction equation for the full model that has
coefficient 0 for the added variable.)
67. * Let b*i denote the estimated standardized regression coefficient for Xi in the
model with Y as the response, and let b̃*i denote the corresponding coefficient
when Xi is treated as the response variable and Y as an explanatory variable,
controlling for the same set of other variables. Then b̃*i need not equal b*i. The
partial correlation between Y and Xi, which is symmetric in the order of the two
variables, satisfies

r2YXi·— = b*i b̃*i.

a) From this formula, explain why the partial correlation must fall between b*i
and b̃*i. (Note: When a = √(bc), a is said to be the geometric average of b and
c.)
b) Even though b̃*i does not necessarily fall between −1 and +1, explain why
b*i b̃*i cannot exceed 1.
68. * Chapters 12 and 13 show how to incorporate categorical predictors in regres-
sion models, and this exercise provides a preview. Table 11.23 shows part of a
printout for a model for the “house selling price 2” data set at the text website,
with Y = selling price of home, X1 = size of home, and X2 = whether the
house is new (1 = yes, 0 = no).
a) Report the prediction equation. By setting x2 = 0 and then 1, construct the
two separate lines for older and for new homes. Note that the model implies
that the slope effect of size on selling price is the same for each.
b) Since x2 takes only the values 0 and 1, explain why the coefficient of x2
estimates the difference of mean selling prices between new and older homes,
controlling for house size.
Table 11.23:
B Std. Error t Sig
(Constant) -26.089 5.977 -4.365 0.0001
SIZE 72.575 3.508 20.690 0.0001
NEW 19.587 3.995 4.903 0.0001
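Substituting x2 = 0 and then x2 = 1 into the fitted equation from Table 11.23, as part (a) asks, yields two parallel lines. A sketch:

```python
# Fitted equation: yhat = -26.089 + 72.575*size + 19.587*new.
# Setting new = 0 and new = 1 gives separate lines with the same slope.
a, b1, b2 = -26.089, 72.575, 19.587

for new in (0, 1):
    intercept = a + b2 * new
    label = "new" if new else "older"
    print(f"{label} homes: yhat = {intercept:.3f} + {b1}*size")
```

The constant vertical gap between the lines (19.587) is the point of part (b).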
69. * Refer to the previous exercise. When we add an interaction term, we get
ŷ = −16.6 + 66.6x1 − 31.8x2 + 29.4(x1x2).
a) Interpret the fit by reporting the prediction equation between selling price
and size of house separately for new homes (x2 = 1) and for old homes (x2 =
0). Interpret. (This fit is equivalent to fitting lines separately to the data for
new homes and for old homes.)
b) Interpret the fit by reporting the difference between the predicted selling
prices for new and old homes for houses with x1 equal to (i) 1.5, (ii) 2.0, (iii)
2.5.
c) A plot of the data shows an outlier, a new home with a very high selling
price. When that observation is removed from the data set and the model is
re-fitted, ŷ = −16.6 + 66.6x1 + 9.0x2 + 5.0(x1x2). Re-do (a), and explain how
an outlier can have a large impact on a regression analysis.
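With the interaction term, new and older homes get different intercepts and different slopes, which is what part (a) asks you to extract. A sketch using the first fitted equation of Exercise 69:

```python
# Fitted equation with interaction:
# yhat = -16.6 + 66.6*size - 31.8*new + 29.4*(size*new).
a, b1, b2, b3 = -16.6, 66.6, -31.8, 29.4

for new in (0, 1):
    intercept = a + b2 * new
    slope = b1 + b3 * new
    label = "new" if new else "older"
    print(f"{label} homes: yhat = {intercept:.1f} + {slope:.1f}*size")
```

Rerunning the same sketch with the outlier-free coefficients in part (c) shows how strongly one observation can change the estimated lines.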
11.9.1 Bibliography
DeMaris, A. (2004). Regression with Social Data: Modeling Continuous and Limited Response Variables. Wiley.
Draper, N. R., and Smith, H. (1998). Applied Regression Analysis, 3rd ed. Wiley.
Holzer, C. E., III (1977). The Impact of Life Events on Psychiatric Symptomatology. Ph.D. dissertation, University of Florida, Gainesville.
Kutner, M. H., Nachtsheim, C. J., and Neter, J. (2004). Applied Linear Regression Models, 4th ed. McGraw-Hill/Irwin.
Weisberg, S. (2005). Applied Linear Regression, 3rd ed. Wiley.
Chap. 11 Problems 495
Figure 11.5: A Scatterplot Matrix: Scatterplots for Pairs of Variables from Table 11.1
((Fig. 11.5 from 3e))
Figure 11.6: Partial Regression Plot for Mental Impairment and Life Events, Controlling for SES. This plots the residuals from regressing mental impairment on SES against the residuals from regressing life events on SES.
((Include new Fig. 11.6))
Figure 11.7: Partial Regression Plot for Mental Impairment and SES, Controlling for Life Events. This plots the residuals from regressing mental impairment on life events against the residuals from regressing SES on life events.
((Include new Fig. 11.7))
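The construction described in these captions is easy to carry out directly. A minimal sketch with NumPy, using made-up illustrative data rather than the text's mental impairment data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (not the text's data set): y = response,
# x1 = predictor of interest, x2 = control variable.
n = 40
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def residuals(response, predictor):
    """Residuals from the least squares regression of response on predictor."""
    X = np.column_stack([np.ones_like(predictor), predictor])
    beta, *_ = np.linalg.lstsq(X, response, rcond=None)
    return response - X @ beta

# Partial regression plot of y and x1, controlling for x2:
# plot e_y (y regressed on x2) against e_x (x1 regressed on x2).
e_y = residuals(y, x2)
e_x = residuals(x1, x2)

# The least squares slope of e_y on e_x equals the coefficient of x1
# in the multiple regression of y on x1 and x2 jointly.
slope = np.sum(e_x * e_y) / np.sum(e_x ** 2)
print(round(slope, 3))
```

Plotting e_y against e_x gives the partial regression plot; the slope identity noted in the comment is why the plot displays the partial effect of the predictor after the control variable is removed.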
Figure 11.8: R2 Does Not Increase Much When x3 Is Added to the Model
Already Containing x1 and x2
((Fig. 11.8 in 3e))
Figure 11.9: The F Distribution and the P-Value for F Tests. Larger F values give stronger evidence against H0.
((Fig. 11.9 in 3e))
Figure 11.10: Portrayal of Interaction between x1 and x2 in their Effects on Y .
((Fig. 11.10 in 3e; if possible, change to lower-case letters))
Figure 11.11: Representation of r^2_{Y X2·X1} as the Proportion of Variability That Can Be Explained by X2, of That Left Unexplained by X1
((Fig 11.11 in 3e))
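The proportion portrayed in Figure 11.11 can be written out explicitly. Assuming the chapter's notation, with R^2 denoting the coefficient for the model containing both X1 and X2, the squared partial correlation is

```latex
r^2_{Y X_2 \cdot X_1} \;=\; \frac{R^2 - r^2_{Y X_1}}{1 - r^2_{Y X_1}}
```

The numerator is the additional variability explained when X2 is added, and the denominator is the variability left unexplained by X1 alone.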
Figure 11.12: Scatterplot of CRIME by INCOME
-----+----+----+----+----+----+----+----+----+----+----+----+-
| 1 |
| |
| |
C | 1 1 |
R 100 + 1 1 +
I | 1 2 1 |
M | 1 1 |
E | 1 1 1 |
| 1 11 11 1 1 |
| 11 1 1 1 11 1 1 1 |
50 + 1 1 1 2 1 1 +
| 1 1 11 1 1 1 |
| 1 3 1 1 1 1 |
| 1 1 1 |
| 1 1 2 1 |
| 1 2 11 |
0 + 1 +
-----+----+----+----+----+----+----+----+----+----+----+----+--
14 16 18 20 22 24 26 28 30 32 34 36
INCOME
Figure 11.13: Partial Regression Residual Plot
-+------+------+------+------+------+------+------+------+---
50 + 1 +
| 1 |
| 1 |
40 + +
| 1 1 |
| 1 1 |
30 + +
| 1 1 |
| 1 1 1 |
20 + 1 1 1 +
| 2 |
CRIME | 1 2 |
10 + 1 1 1 1 |
| 1 1 1 |
| 1 |
0 + 1 +
| 1 1 1 1 1 |
| 1 1 1 |
-10 + 12 1 1 1 +
| 1 1 1 1 1 1 1 1 |
| 11 1 1 1 2 |
-20 + 1 +
| 2 |
| 1 1 |
-30 + 1 +
| 1 1 |
| |
-40 + +
-+------+------+------+------+------+------+------+------+---
-6 -4 -2 0 2 4 6 8 10
INCOME