
Chapter 19

Multiple Regression


Multiple Regression

The simple linear regression model was used to analyze how one interval variable (the dependent variable y) is related to one other interval variable (the independent variable x).

Multiple regression allows for any number of independent variables.

We expect to develop models that fit the data better than would a simple linear regression model.


The Model

We now assume we have k independent variables potentially related to the one dependent variable. This relationship is represented by this first-order linear equation:

y = β0 + β1x1 + β2x2 + … + βkxk + ε

Here y is the dependent variable, x1, …, xk are the independent variables, β0, β1, …, βk are the coefficients, and ε is the error variable.

In the one-variable, two-dimensional case we drew a regression line; here we imagine a response surface.
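To make the form of this model concrete, here is a minimal Python sketch (not tied to the La Quinta data; all numbers are made up for illustration) that simulates observations from a first-order linear model with k = 3 independent variables and a normal error variable:

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 100, 3                              # sample size and number of independent variables
beta = np.array([5.0, 2.0, -1.0, 0.5])     # beta_0, beta_1, ..., beta_k (arbitrary values)
sigma = 2.0                                # constant standard deviation of the error variable

X = rng.uniform(0, 10, size=(n, k))        # independent variables x_1, ..., x_k
epsilon = rng.normal(0.0, sigma, size=n)   # error variable: normal, mean 0, constant sd
y = beta[0] + X @ beta[1:] + epsilon       # dependent variable from the first-order model
```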


Required Conditions

For these regression methods to be valid, the following four conditions for the error variable ε must be met:
• The probability distribution of the error variable ε is normal.
• The mean of the error variable is 0.
• The standard deviation of ε is σε, which is a constant.
• The errors are independent.


Estimating the Coefficients

The sample regression equation is expressed as:

ŷ = b0 + b1x1 + b2x2 + … + bkxk

We will use computer output to:

Assess the model…
• How well it fits the data
• Is it useful
• Are any required conditions violated?

Employ the model…
• Interpreting the coefficients
• Predictions using the prediction equation
• Estimating the expected value of the dependent variable


Regression Analysis Steps

1. Use a computer and software to generate the coefficients and the statistics used to assess the model.

2. Diagnose violations of required conditions. If there are problems, attempt to remedy them.

3. Assess the model’s fit: standard error of estimate, coefficient of determination, and the F-test of the analysis of variance.

4. If the standard error of estimate, the coefficient of determination, and the F-test are OK, use the model to predict or estimate the expected value of the dependent variable.


Example 17.1

La Quinta Motor Inns is a moderately priced chain of motor inns located across the United States. Its market is the frequent business traveler.

The chain recently launched a campaign to increase market share by building new inns. The management of the chain is aware of the difficulty in choosing locations for new motels. Moreover, making decisions without adequate information often results in poor decisions.

Consequently the chain management acquired data on 100 randomly selected inns belonging to La Quinta. The objective was to predict which sites are likely to be profitable.


Example 17.1

To measure profitability, La Quinta used operating margin: the sum of profit, depreciation, and interest expenses divided by total revenue.

The higher the operating margin, the greater the success of the inn.

La Quinta defines profitable inns as those with an operating margin in excess of 50% and unprofitable inns as those with margins of less than 30%.


Example 17.1

After a discussion with a number of experienced managers, La Quinta decided to select one or two independent variables from each of the categories:
• Competition
• Market awareness
• Demand generators
• Demographics
• Physical


Example 17.1

To measure the degree of competition, they determined the total number of motel and hotel rooms within 3 miles of each La Quinta inn.

Market awareness was measured by the number of miles to the closest competing motel.

Two variables that represent sources of customers were chosen. The amount of office space and college and university enrollment in the surrounding community are demand generators. Both of these are measures of economic activity.


Example 17.1

A demographic variable that describes the community is the median household income.

Finally, as a measure of the physical qualities of the location La Quinta chose the distance to the downtown core.


Example 17.1 – La Quinta Inns

Where should La Quinta locate a new motel? Profitability (operating margin) is the dependent variable; the factors thought to influence it, and the measure of each, are:

• Competition – number of rooms within a 3-mile radius
• Market awareness – distance to the nearest competing motel
• Demand generators – offices and higher education (office space, college/university enrollment)
• Community (demographics) – median household income
• Physical – distance to downtown

*These measures need to be interval data!


Example 17.1 – La Quinta Inns

Where should La Quinta locate a new motel?

Several possible predictors of profitability were identified, and data (Xm17-01) were collected. It is believed that operating margin (y) is dependent upon these factors:

x1 = total motel and hotel rooms within a 3-mile radius
x2 = number of miles to the closest competition
x3 = volume of office space in the surrounding community
x4 = college and university student numbers in the community
x5 = median household income in the community
x6 = distance (in miles) to the downtown core


Transformation

Can we transform this data into a mathematical model that looks like this?

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + ε

Here y is the margin, x1 measures competition (number of rooms), x2 measures awareness (distance to the nearest alternative), and x6 measures the physical factor (distance to downtown).


Example 17.1

In Excel: Data > Data Analysis > Regression

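For readers not working in Excel, here is a rough Python equivalent using statsmodels. The file name and column names (margin, rooms, nearest, office, enrollment, income, distance) are assumptions about how Xm17-01 might be laid out, not the actual file format:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical CSV export of Xm17-01 with one column per variable.
df = pd.read_csv("Xm17-01.csv")

y = df["margin"]
X = sm.add_constant(df[["rooms", "nearest", "office", "enrollment", "income", "distance"]])

model = sm.OLS(y, X).fit()     # least-squares estimates b0, b1, ..., b6
print(model.summary())         # coefficients, standard error of estimate, R^2, F-test, t-tests
```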


The Model

Although we haven’t done any assessment of the model yet, at first pass the estimated equation (from the Excel output, with coefficients interpreted below) is:

ŷ = 38.14 − .0076x1 + 1.65x2 + .020x3 + .21x4 + .41x5 − .23x6

It suggests that increases in the number of miles to the closest competition, office space, student enrollment, and household income will positively impact the operating margin.

Likewise, increases in the total number of lodging rooms within a short distance and the distance from downtown will negatively impact the operating margin…



Model Assessment

We will assess the model in three ways:
• standard error of estimate,
• coefficient of determination, and
• F-test of the analysis of variance.


Standard Error of Estimate

In multiple regression, the standard error of estimate is defined as:

s_ε = √( SSE / (n − k − 1) )

where n is the sample size and k is the number of independent variables in the model. We compare this value to the mean value of y.

It seems the standard error of estimate is not particularly small. What can we conclude?
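A small Python sketch of the same calculation, written against a fitted statsmodels OLS results object such as `model` from the earlier sketch (an assumption, not part of the slides):

```python
import numpy as np

def standard_error_of_estimate(results):
    """s_e = sqrt(SSE / (n - k - 1)) for a fitted OLS results object."""
    sse = np.sum(results.resid ** 2)                 # SSE
    return float(np.sqrt(sse / results.df_resid))    # df_resid equals n - k - 1

# Example use (names from the earlier sketch):
# print(standard_error_of_estimate(model), y.mean())
```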



Coefficient of Determination

Again, the coefficient of determination is defined as:

R² = 1 − SSE / Σ(y_i − ȳ)²

Here R² = .5251. This means that 52.51% of the variation in operating margin is explained by the six independent variables, but 47.49% remains unexplained.


Adjusted R² value

What’s this? The “adjusted” R² is the coefficient of determination adjusted for degrees of freedom.

It takes into account the sample size n and k, the number of independent variables, and is given by:

Adjusted R² = 1 − [SSE / (n − k − 1)] / [Σ(y_i − ȳ)² / (n − 1)]
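A hedged Python sketch of both quantities, computed directly from the observed y values and the residuals (statsmodels also reports them as rsquared and rsquared_adj on a fitted model):

```python
import numpy as np

def r_squared_and_adjusted(y, residuals, k):
    """R^2 = 1 - SSE/SS_total and the degrees-of-freedom-adjusted version."""
    y = np.asarray(y, dtype=float)
    e = np.asarray(residuals, dtype=float)
    n = len(y)
    sse = np.sum(e ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / ss_total
    adj_r2 = 1.0 - (sse / (n - k - 1)) / (ss_total / (n - 1))
    return r2, adj_r2
```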


Testing the Validity of the Model

In a multiple regression model (i.e. more than one independent variable), we utilize an analysis of variance technique to test the overall validity of the model. Here’s the idea:

H0: β1 = β2 = … = βk = 0
H1: At least one βi is not equal to zero.

If the null hypothesis is true, none of the independent variables is linearly related to y, and so the model is invalid. If at least one βi is not equal to 0, the model does have some validity.


Testing the Validity of the Model

ANOVA table for regression analysis:

Source of Variation | Degrees of Freedom | Sums of Squares | Mean Squares | F-Statistic
Regression | k | SSR | MSR = SSR/k | F = MSR/MSE
Error | n − k − 1 | SSE | MSE = SSE/(n − k − 1) |
Total | n − 1 | | |

A large value of F indicates that most of the variation in y is explained by the regression equation and that the model is valid.

A small value of F indicates that most of the variation in y is unexplained.
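The ANOVA quantities above translate directly into code. A sketch (scipy's F distribution supplies the critical value and p-value; the inputs are assumed to be the observed y values, the residuals, and k):

```python
import numpy as np
from scipy import stats

def anova_f_test(y, residuals, k, alpha=0.05):
    """F = MSR/MSE with k and n - k - 1 degrees of freedom."""
    y = np.asarray(y, dtype=float)
    e = np.asarray(residuals, dtype=float)
    n = len(y)
    sse = np.sum(e ** 2)
    ssr = np.sum((y - y.mean()) ** 2) - sse          # SS_total - SSE
    msr, mse = ssr / k, sse / (n - k - 1)
    f = msr / mse
    f_crit = stats.f.ppf(1 - alpha, k, n - k - 1)    # rejection region: F > f_crit
    p_value = stats.f.sf(f, k, n - k - 1)
    return f, f_crit, p_value
```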


Testing the Validity of the Model

Our rejection region is: F > F(α, k, n − k − 1).

Since Excel calculated the F statistic as F = 17.14 and our F-critical = 2.17 (and the p-value is essentially zero), we reject H0 in favor of H1; that is, there is a great deal of evidence to infer that the model is valid.


Table 17.2 – Summary

SSE | s_ε | R² | F | Assessment of Model
0 | 0 | 1 | ∞ | Perfect
small | small | close to 1 | large | Good
large | large | close to 0 | small | Poor
 | | 0 | 0 | Useless

Once we’re satisfied that the model fits the data as well as possible, and that the required conditions are satisfied, we can interpret and test the individual coefficients and use the model to predict and estimate…


Interpreting the Coefficients*

Intercept (b0) = 38.14
This is the average operating margin when all of the independent variables are zero. It is meaningless to try to interpret this value, particularly if 0 is outside the range of the values of the independent variables (as is the case here).

Number of motel and hotel rooms (b1) = –.0076
Each additional room within three miles of the La Quinta inn will decrease the operating margin; i.e., for each additional 1,000 rooms the margin decreases by 7.6%.

Distance to nearest competitor (b2) = 1.65
For each additional mile that the nearest competitor is from a La Quinta inn, the average operating margin increases by 1.65%.

*in each case we assume all other variables are held constant…


Interpreting the Coefficients*

Office space (b3) = .020
For each additional thousand square feet of office space, the margin will increase by .020%. E.g., an extra 100,000 square feet of office space will increase the margin (on average) by 2.0%.

Student enrollment (b4) = .21
For each additional thousand students, the average operating margin increases by .21%.

Median household income (b5) = .41
For each additional thousand-dollar increase in median household income, the average operating margin increases by .41%.

Distance to downtown core (b6) = –.23
For each additional mile to the downtown center, the operating margin decreases on average by .23%.

*in each case we assume all other variables are held constant…


Testing the Coefficients

For each independent variable, we can test to determine whether there is enough evidence of a linear relationship between it and the dependent variable for the entire population:

H0: βi = 0
H1: βi ≠ 0

(for i = 1, 2, …, k), using

t = (b_i − β_i) / s_{b_i}

as our test statistic (with n − k − 1 degrees of freedom).
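On a fitted statsmodels model the same t-tests come essentially for free; this sketch (again assuming the `model` object from the earlier sketch) reproduces them from the estimates and their standard errors:

```python
import numpy as np
from scipy import stats

def coefficient_t_tests(results, alpha=0.05):
    """t = b_i / s_{b_i} with n - k - 1 df, for each coefficient (intercept included)."""
    t = results.params / results.bse                     # estimates divided by standard errors
    p = 2 * stats.t.sf(np.abs(t), results.df_resid)      # two-tail p-values
    return t, p, p < alpha                               # True where H0: beta_i = 0 is rejected

# statsmodels already reports the same numbers as results.tvalues and results.pvalues.
```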


Testing the Coefficients

We can use our Excel output to quickly test each of the six coefficients in our model…

Thus, the number of hotel and motel rooms, distance to the nearest motel, amount of office space, and median household income are linearly related to the operating margin. There is no evidence to infer that college enrollment and distance to downtown center are linearly related to operating margin.



Using the Regression Equation

Much like we did with simple linear regression, we can produce a prediction interval for a particular value of y.

As well, we can produce the confidence interval estimate of the expected value of y.

Excel’s tools will do the work; our role is to set up the problem, understand and interpret the results.


Using the Regression Equation

Predict the operating margin if a La Quinta Inn is built at a location where…
• there are 3,815 rooms within 3 miles of the site;
• the closest other hotel or motel is .9 miles away;
• the amount of office space is 476,000 square feet;
• there is one college and one university nearby with a total enrollment of 24,500 students;
• census data indicate the median household income in the area (rounded to the nearest thousand) is $35,000; and
• the distance to the downtown center is 11.2 miles.

These are our xi’s…


Using the Regression Equation

We add one row (our given values for the independent variables) to the bottom of our data set.

Then we use Add-Ins > Data Analysis Plus > Prediction Interval to crunch the numbers…
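Outside Excel, statsmodels can produce both intervals from the fitted model. This sketch reuses the hypothetical column names and the `model` object from the earlier regression sketch, and it assumes office space and enrollment are coded in thousands and income in thousands of dollars, as the coefficient interpretations suggest:

```python
import pandas as pd
import statsmodels.api as sm

site = pd.DataFrame([{"rooms": 3815, "nearest": 0.9, "office": 476,
                      "enrollment": 24.5, "income": 35, "distance": 11.2}])
site = sm.add_constant(site, has_constant="add")

pred = model.get_prediction(site)
frame = pred.summary_frame(alpha=0.05)
print(frame[["obs_ci_lower", "obs_ci_upper"]])    # prediction interval for one particular site
print(frame[["mean_ci_lower", "mean_ci_upper"]])  # confidence interval for the expected value of y
```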


Prediction Interval

We predict that the operating margin will fall between 25.4 and 48.8. If management defines a profitable inn as one with an operating margin greater than 50% and an unprofitable inn as one with an operating margin below 30%, they will pass on this site, since the entire prediction interval is below 50%.



Confidence Interval

The expected operating margin of all sites that fit this category is estimated to be between 33.0 and 41.2. We interpret this to mean that if we built inns on an infinite number of sites that fit the category described, the mean operating margin would fall between 33.0 and 41.2. In other words, the average inn would not be profitable either…



Regression Diagnostics

Calculate the residuals and check the following:
• Is the error variable nonnormal? Draw the histogram of the residuals.
• Is the error variance constant? Plot the residuals versus the predicted values of y.
• Are the errors independent (time-series data)? Plot the residuals versus the time periods.
• Are there observations that are inaccurate or do not belong to the target population? Double-check the accuracy of outliers and influential observations.
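The first three checks are plots. A minimal matplotlib sketch, written against a fitted statsmodels results object (an assumption; any source of residuals and fitted values would do):

```python
import matplotlib.pyplot as plt

def residual_diagnostics(results):
    """Histogram of residuals, residuals vs. predicted values, residuals vs. time order."""
    resid, fitted = results.resid, results.fittedvalues
    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

    axes[0].hist(resid, bins=15)                                   # normality check
    axes[0].set_title("Histogram of residuals")

    axes[1].scatter(fitted, resid, s=12)                           # constant-variance check
    axes[1].axhline(0, linewidth=1)
    axes[1].set_title("Residuals vs. predicted y")

    axes[2].plot(range(1, len(resid) + 1), resid, marker="o")      # independence (time order)
    axes[2].axhline(0, linewidth=1)
    axes[2].set_title("Residuals vs. time period")

    plt.tight_layout()
    plt.show()
```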


Regression Diagnostics

• Multiple regression models have a problem that simple regressions do not, namely multicollinearity.

• It happens when the independent variables are highly correlated.

• We’ll explore this concept through the following example…


Example 17.2

A real estate agent wanted to develop a model to predict the selling price of a home. The agent believed that the most important variables in determining the price of a house are its:
• size,
• number of bedrooms, and
• lot size.

The proposed model is:

y = β0 + β1x1 + β2x2 + β3x3 + ε

where x1, x2, and x3 are the three variables above.

Housing market data (Xm17-02) have been gathered, and Excel is the analysis tool of choice.


Example 17.2

• Data > Data Analysis > Regression

The F-test indicates the model is valid…

…but these t-stats suggest none of the variables are related to the selling price.


Example 17.2

Unlike the t-tests in the multiple regression model, these three t-tests (from three separate simple regressions) tell us that the number of bedrooms, the house size, and the lot size are all linearly related to the price…


Example 17.2

• How do we account for this apparent contradiction?

• The answer is that the three independent variables are correlated with each other!

(This is reasonable: larger houses have more bedrooms and are situated on larger lots, and smaller houses have fewer bedrooms and are located on smaller lots.)

Multicollinearity affected the t-tests so that they implied that none of the independent variables is linearly related to price when, in fact, all of them are.
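A quick way to see the problem numerically is to look at the pairwise correlations and variance inflation factors of the independent variables. A sketch, with hypothetical column names for Xm17-02:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("Xm17-02.csv")                 # columns assumed: price, bedrooms, size, lot
X = sm.add_constant(df[["bedrooms", "size", "lot"]])

print(df[["bedrooms", "size", "lot"]].corr())   # high pairwise correlations signal multicollinearity

vif = {col: variance_inflation_factor(X.values, i)
       for i, col in enumerate(X.columns) if col != "const"}
print(vif)                                      # large VIFs (often > 10) are a common warning sign
```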


Regression Diagnostics – Time Series

• The Durbin-Watson test allows us to determine whether there is evidence of first-order autocorrelation, a condition in which a relationship exists between consecutive residuals, i.e. e_{i−1} and e_i (i is the time period). The statistic for this test is defined as:

d = Σ_{i=2..n} (e_i − e_{i−1})² / Σ_{i=1..n} e_i²

• d has a range of values: 0 ≤ d ≤ 4.
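The statistic is easy to compute from the residuals; statsmodels also ships a durbin_watson function that returns the same value. A sketch:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

def durbin_watson_manual(residuals):
    """d = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    e = np.asarray(residuals, dtype=float)       # residuals must be in time order
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# durbin_watson(results.resid) gives the same number for a fitted statsmodels model.
```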


Durbin–Watson

Since 0 ≤ d ≤ 4: small values of d (d < 2) indicate positive first-order autocorrelation, and large values of d (d > 2) imply negative first-order autocorrelation.


Durbin–Watson (one-tail test)• To test for positive first-order autocorrelation:

• If d < dL , we conclude that there is enough evidence to show that positive first-order autocorrelation exists.

• If d > dU , we conclude that there is not enough evidence to show that positive first-order autocorrelation exists.

• And if dL ≤ d ≤ dU , the test is inconclusive.

[Number line: d < dL → positive first-order autocorrelation exists; dL ≤ d ≤ dU → the test is inconclusive; d > dU → positive first-order autocorrelation does not exist. dL and dU are taken from Table 11, Appendix B.]


Durbin–Watson (one-tail test)• To test for negative first-order autocorrelation:

• If d > 4 – dL , we conclude that there is enough evidence to show that negative first-order autocorrelation exists.

• If d < 4 – dU , we conclude that there is not enough evidence to show that negative first-order autocorrelation exists.

• And if 4 – dU ≤ d ≤ 4 – dL , the test is inconclusive.

[Number line: d < 4 − dU → negative first-order autocorrelation does not exist; 4 − dU ≤ d ≤ 4 − dL → the test is inconclusive; d > 4 − dL → negative first-order autocorrelation exists. dL and dU are taken from Table 11, Appendix B.]


Durbin–Watson (two-tail test)• To test for first-order autocorrelation:

• If d < dL or d > 4 – dL , first-order autocorrelation exists.

• If d falls between dL and dU or between 4 – dU and 4 – dL , the test is inconclusive.

• If d falls between dU and 4 – dU there is no evidence of first order autocorrelation.

[Number line from 0 to 4: d < dL or d > 4 − dL → first-order autocorrelation exists; dL ≤ d ≤ dU or 4 − dU ≤ d ≤ 4 − dL → the test is inconclusive; dU < d < 4 − dU → no evidence of first-order autocorrelation.]
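The one- and two-tail decision rules above can be captured in a small helper; dL and dU must still be read from the Durbin-Watson tables for the chosen α, n, and k (this is only a sketch of the logic):

```python
def dw_two_tail_decision(d, dl, du):
    """Two-tail Durbin-Watson decision rule for first-order autocorrelation."""
    if d < dl or d > 4 - dl:
        return "first-order autocorrelation exists"
    if dl <= d <= du or 4 - du <= d <= 4 - dl:
        return "test is inconclusive"
    return "no evidence of first-order autocorrelation"    # du < d < 4 - du
```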


Example 17.3

Can we create a model that will predict lift ticket sales at a ski hill based on two weather parameters?

Variables:
y = lift ticket sales during Christmas week
x1 = total snowfall (inches)
x2 = average temperature (degrees Fahrenheit)

Our ski hill manager collected 20 years of data (Xm17-03).


Example 17.3

Both the coefficient of determination and the p-value of the F-test indicate the model is poor…

Neither variable is linearly related to ticket sales…


Example 17.3

• The histogram of residuals reveals the errors may be normally distributed…


Example 17.3

• In the plot of residuals versus predicted values (testing for heteroscedasticity), the error variance appears to be constant…


Example 17.3 – Durbin-Watson

• Apply the Durbin-Watson Statistic from Data Analysis Plus to the entire list of residuals.


Example 17.3

To test for positive first-order autocorrelation with α = .05, we find in Table 8(a) in Appendix B that dL = 1.10 and dU = 1.54.

The null and alternative hypotheses are:
H0: There is no first-order autocorrelation.
H1: There is positive first-order autocorrelation.

The rejection region is d < dL = 1.10. Since d = .59, we reject the null hypothesis and conclude that there is enough evidence to infer that positive first-order autocorrelation exists.


Example 17.3

Autocorrelation usually indicates that the model needs to include an independent variable that has a time-ordered effect on the dependent variable.

The simplest such independent variable represents the time periods. We included a third independent variable that records the number of years since the year the data were gathered. Thus, x3 = 1, 2, ..., 20. The new model is

y = β0 + β1x1 + β2x2 + β3x3 + ε
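In code, the remedy is one extra column. A sketch with hypothetical column names for Xm17-03 (one row per year, in time order):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("Xm17-03.csv")                  # columns assumed: tickets, snowfall, temperature
df["time"] = np.arange(1, len(df) + 1)           # x3 = 1, 2, ..., 20

X = sm.add_constant(df[["snowfall", "temperature", "time"]])
new_model = sm.OLS(df["tickets"], X).fit()

print(new_model.summary())                       # fit, F-test, and t-tests for the new model
print(durbin_watson(new_model.resid))            # re-check for first-order autocorrelation
```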


Example 17.3

With the new variable included, the fit of the model is high and the model is valid…

Snowfall and time are linearly related to ticket sales; temperature is not…


Example 17.3

• If we re-run the Durbin-Watson statistic against the residuals from our regression analysis, we can conclude that there is not enough evidence to infer the presence of first-order autocorrelation. (Determining dL and dU is left as an exercise for the reader…)

• Hence, we have improved our model dramatically!


Example 17.3

Notice that the model is improved dramatically. The F-test tells us that the model is valid. The t-tests tell us that both the amount of snowfall and time are significantly linearly related to the number of lift tickets.

This information could prove useful in advertising for the resort. For example, if there has been a recent snowfall, the resort could emphasize that in its advertising. If no new snow has fallen, it may emphasize its snow-making facilities.

