Date post: | 27-Dec-2015 |
Upload: | charleen-hubbard |
© 2004 Prentice-Hall, Inc. Chap 15-1
Basic Business Statistics (9th Edition)
Chapter 15: Multiple Regression Model Building
Chapter Topics
The Quadratic Regression Model
Using Transformations in Regression Models
Influence Analysis
Collinearity
Model Building
Pitfalls in Multiple Regression and Ethical Issues
The Quadratic Regression Model
Relationship between the Response Variable and the Explanatory Variable is a Quadratic Polynomial Function
Useful When Scatter Diagram Indicates Non-Linear Relationship
Quadratic Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
The Second Explanatory Variable is the Square of the First Variable
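A minimal sketch of how the quadratic model above could be estimated by least squares, using only the Python standard library. The function name `fit_quadratic` and the demo data are assumptions made for illustration; they are not part of the slides, which use PHStat and Excel.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns [b0, b1, b2]."""
    # Each design-matrix row is [1, x, x^2]: the second explanatory
    # variable is the square of the first.
    rows = [[1.0, xi, xi * xi] for xi in x]
    p = 3
    # Augmented matrix of the normal equations (X'X) b = X'y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical noise-free data generated from Y = 1 + 2*X + 3*X^2.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [1.0 + 2.0 * v + 3.0 * v * v for v in x_demo]
b0, b1, b2 = fit_quadratic(x_demo, y_demo)
print(b0, b1, b2)
```

With real data the fitted b2 would then be tested for significance before the quadratic term is kept.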
Quadratic Regression Model (continued)
Quadratic model may be considered when a scatter diagram takes on the following shapes:
[Figure: four scatter diagrams of Y against X1, two with $\beta_2 > 0$ and two with $\beta_2 < 0$]
$\beta_2$ = the coefficient of the quadratic term
Testing for Significance: Quadratic Model
Testing for Overall Relationship
  Similar to test for linear model
  F test statistic = $\frac{MSR}{MSE}$
Testing the Quadratic Effect
  Compare quadratic model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
  with the linear model $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
  Hypotheses
    $H_0: \beta_2 = 0$ (No quadratic effect)
    $H_1: \beta_2 \neq 0$ (Quadratic effect is present)
Heating Oil Example
Oil (Gal)   Temp (°F)   Insulation
275.30      40          3
363.80      27          3
164.30      40          10
 40.80      73          6
 94.30      64          6
230.90      34          6
366.70       9          6
300.60       8          10
237.80      23          10
121.40      63          3
 31.40      65          10
203.50      41          6
441.10      21          3
323.00      38          3
 52.50      58          10
Determine if a quadratic model is needed for estimating heating oil used for a single family home in the month of January based on average temperature and amount of insulation in inches.
Heating Oil Example: Residual Analysis (continued)
[Figure: Insulation Residual Plot, residuals against insulation. No discernible pattern]
[Figure: Temperature Residual Plot, residuals against temperature. Possible non-linear relationship]
Heating Oil Example: t Test for Quadratic Model (continued)
Testing the Quadratic Effect
  Model with quadratic insulation term: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{2i}^2 + \varepsilon_i$
  Model without quadratic insulation term: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
  Hypotheses
    $H_0: \beta_3 = 0$ (No quadratic term in insulation)
    $H_1: \beta_3 \neq 0$ (Quadratic term is needed in insulation)
Example Solution
Is quadratic term in insulation needed on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
df = 11
Critical Values: $\pm 2.2010$ (area .025 in each tail; reject $H_0$ if $t < -2.2010$ or $t > 2.2010$)
Test Statistic: $t = \frac{b_3 - \beta_3}{S_{b_3}} = \frac{1.8667 - 0}{1.1238} = 1.6611$
Decision: Do not reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is not sufficient evidence for the need to include quadratic effect of insulation on oil consumption.
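The arithmetic of this test can be checked with a short sketch. All three numbers are copied from the slide; the variable names are illustrative assumptions, and the critical value is taken as given rather than computed.

```python
b3 = 1.8667      # estimated coefficient of the squared insulation term (from the slide)
s_b3 = 1.1238    # standard error of b3 (from the slide)
t_crit = 2.2010  # two-tail critical value for alpha = 0.05, df = 11 (from the slide)

# Under H0 the hypothesized value of beta_3 is 0.
t_stat = (b3 - 0.0) / s_b3
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Because the statistic falls between the two critical values, the sketch reaches the same decision as the slide: the quadratic insulation term is not needed.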
Example Solution in PHStat
PHStat | Regression | Multiple Regression …
Excel spreadsheet for the heating oil example
Using Transformations
Either or Both Independent and Dependent Variables May Be Transformed
Can Be Based on Theory, Logic or Scatter Diagrams
Inherently Non-Linear Models
Non-Linear Models that Can Be Expressed in Linear Form
  Can be estimated by least squares in linear form
  Require data transformation
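Transformed Multiplicative Model (Log-Log)
Original: $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i$
Transformed: $\ln Y_i = \ln \beta_0 + \beta_1 \ln X_{1i} + \beta_2 \ln X_{2i} + \ln \varepsilon_i$
[Figure: two plots of Y against X1 showing the curve shapes for different ranges of $\beta_1$]
Similarly for X2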
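A small sketch of the log-log idea with a single explanatory variable: data generated from a multiplicative model become linear after taking logs, so ordinary least squares on the logged values recovers the exponent. The helper `fit_line` and the noise-free data are assumptions for illustration only.

```python
import math


def fit_line(x, y):
    """Simple least squares; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data from Y = 2 * X^1.5 (no error term, for clarity).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * v ** 1.5 for v in x]

# Regress ln Y on ln X: the intercept estimates ln(beta_0), the slope beta_1.
ln_b0, b1 = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
b0 = math.exp(ln_b0)
print(b0, b1)
```

With real data the error term would not vanish, so the recovered values would be estimates rather than the exact generating constants.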
Square Root Transformation
$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \beta_2 \sqrt{X_{2i}} + \varepsilon_i$
[Figure: plot of Y against X1 showing the curve shapes for $\beta_1 > 0$ and $\beta_1 < 0$]
Similarly for X2
Transforms non-linear model to one that appears linear. Often used to overcome heteroscedasticity.
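Exponential Transformation (Log-Linear)
Original Model: $Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i$
Transformed Into: $\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i$
[Figure: plot of Y against X1 showing the curve shapes for $\beta_1 > 0$ and $\beta_1 < 0$]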
Interpretation of Coefficients
Transformed Exponential Model (Y is Transformed into ln Y)
  The coefficient of the independent variable $X_k$ can be approximately interpreted as: a 1 unit change in $X_k$ leads to an estimated average rate of change of $100\,b_k$ percent in Y
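To see why the interpretation is only approximate, the sketch below compares the $100\,b_k$ rule with the exact percentage change implied by the model, $100(e^{b_k} - 1)$, since a one-unit increase in $X_k$ multiplies the fitted Y by $e^{b_k}$. The coefficient value 0.05 and the function names are hypothetical.

```python
import math


def approx_pct_change(b_k):
    """Approximate percent change in Y per unit change in X_k (slide's rule)."""
    return 100.0 * b_k


def exact_pct_change(b_k):
    """Exact percent change in fitted Y per unit change in X_k."""
    return 100.0 * (math.exp(b_k) - 1.0)


b_k = 0.05  # hypothetical fitted coefficient
print(approx_pct_change(b_k), exact_pct_change(b_k))
```

The two figures stay close for small coefficients and drift apart as the absolute value of $b_k$ grows.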
Interpretation of Coefficients (continued)
Transformed Multiplicative Model
  The Dependent Variable Y is transformed to ln Y
  The Independent Variable X is transformed to ln X
  The coefficient of the independent variable $X_k$ can be approximately interpreted as: a 1 percent rate of change in $X_k$ leads to an estimated average rate of change of $b_k$ percent in Y. Therefore, $b_k$ is the elasticity of Y with respect to a change in $X_k$.
Influence Analysis
To Determine Observations that Have Influential Effect on the Fitted Model
Potentially Influential Points Become Candidates for Removal from the Model
Criteria Used are:
  The hat matrix elements $h_i$
  The studentized deleted residuals $t_i$
  Cook’s distance statistic $D_i$
All 3 Criteria are Complementary
  Only when all 3 criteria provide a consistent result should an observation be removed
The Hat Matrix Element $h_i$
$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
If $h_i > 2(k+1)/n$, $X_i$ is an Influential Point
$X_i$ may be considered a candidate for removal from the model
The Hat Matrix Element $h_i$: Heating Oil Example
Oil (Gal)   Temp   Insulation   h_i
275.30      40     3            0.1567
363.80      27     3            0.1852
164.30      40     10           0.1757
 40.80      73     6            0.2467
 94.30      64     6            0.1618
230.90      34     6            0.0741
366.70       9     6            0.2306
300.60       8     10           0.3521
237.80      23     10           0.2268
121.40      63     3            0.2446
 31.40      65     10           0.2759
203.50      41     6            0.0676
441.10      21     3            0.2174
323.00      38     3            0.1574
 52.50      58     10           0.2268
$n = 15$, $k = 2$, $2(k+1)/n = 0.4$
No $h_i > 0.4$
No observation appears to be a candidate for removal from the model
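The leverage rule can be sketched as follows. `leverage_simple` implements the single-variable formula for $h_i$ given earlier and is applied to the temperature column only as an illustration; the `h_slide` values are copied from the table above, which comes from the two-variable model. Function and variable names are assumptions.

```python
def leverage_simple(x):
    """h_i = 1/n + (x_i - mean)^2 / sum((x_j - mean)^2), one explanatory variable."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [1.0 / n + (v - mean) ** 2 / sxx for v in x]


def leverage_cutoff(k, n):
    """Rule-of-thumb threshold 2(k+1)/n from the slides."""
    return 2.0 * (k + 1) / n


# Hat matrix elements reported on the slide (two-variable heating oil model).
h_slide = [0.1567, 0.1852, 0.1757, 0.2467, 0.1618, 0.0741, 0.2306, 0.3521,
           0.2268, 0.2446, 0.2759, 0.0676, 0.2174, 0.1574, 0.2268]
cutoff = leverage_cutoff(k=2, n=15)
flagged = [i + 1 for i, h in enumerate(h_slide) if h > cutoff]
print(cutoff, flagged)

# Temperature column alone, to illustrate the single-variable formula.
temps = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
h_temp_only = leverage_simple(temps)
```

A useful sanity check is that the leverages of a model with k explanatory variables sum to k + 1, so the single-variable values sum to 2.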
The Studentized Deleted Residuals $t_i$
$t_i = e_i \sqrt{\frac{n - k - 2}{SSE(1 - h_i) - e_i^2}}$
$e_i$: the residual for observation i
SSE: error sum of squares
An observation is considered influential if $|t_i| > t_{n-k-2}$
$t_{n-k-2}$ is the critical value of a two-tail test at 10% level of significance
The Studentized Deleted Residuals $t_i$: Example
Obs   Oil (Gal)   Temp   Insulation   t_i
 1    275.30      40     3            -0.3772
 2    363.80      27     3             0.3474
 3    164.30      40     10            0.8243
 4     40.80      73     6            -0.1871
 5     94.30      64     6             0.0066
 6    230.90      34     6            -1.0571
 7    366.70       9     6            -1.1776
 8    300.60       8     10           -0.8464
 9    237.80      23     10            0.0341
10    121.40      63     3            -1.8536
11     31.40      65     10            1.0304
12    203.50      41     6            -0.6075
13    441.10      21     3             2.9674
14    323.00      38     3             1.1681
15     52.50      58     10            0.2432
$n = 15$, $k = 2$, $t_{n-k-2} = t_{11} = 1.7957$
$t_{10}$ and $t_{13}$ are influential points for potential removal from the model
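A sketch of the studentized deleted residual formula and of the screening step. The function follows the formula given for $t_i$; the list `t_slide` and the critical value 1.7957 are copied from the slide, while the function and variable names are assumptions.

```python
import math


def studentized_deleted_residual(e_i, h_i, sse, n, k):
    """t_i = e_i * sqrt((n - k - 2) / (SSE * (1 - h_i) - e_i^2))."""
    return e_i * math.sqrt((n - k - 2) / (sse * (1.0 - h_i) - e_i ** 2))


# Studentized deleted residuals reported on the slide, observations 1..15.
t_slide = [-0.3772, 0.3474, 0.8243, -0.1871, 0.0066, -1.0571, -1.1776, -0.8464,
           0.0341, -1.8536, 1.0304, -0.6075, 2.9674, 1.1681, 0.2432]
t_crit = 1.7957  # critical value with n - k - 2 = 11 df, as given on the slide

# Flag observations whose absolute value exceeds the critical value.
influential = [i + 1 for i, t in enumerate(t_slide) if abs(t) > t_crit]
print(influential)
```

The flagged observation numbers can then be cross-checked against the other two criteria before any removal is considered.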
Cook’s Distance Statistic $D_i$
$D_i = \frac{e_i^2 h_i}{k \, MSE \, (1 - h_i)^2}$
$e_i$ = the residual for observation i
MSE = mean square error of the fitted regression model
$h_i$ = hat matrix element of observation i
If $D_i > F_{k+1,\, n-k-1}$, an observation is considered influential
$F_{k+1,\, n-k-1}$ is the critical value of the F distribution at a 50% level of significance
Cook’s Distance Statistic $D_i$: Heating Oil Example
Oil (Gal)   Temp   Insulation   D_i
275.30      40     3            0.0094
363.80      27     3            0.0098
164.30      40     10           0.0496
 40.80      73     6            0.0041
 94.30      64     6            0.0001
230.90      34     6            0.0295
366.70       9     6            0.1342
300.60       8     10           0.1328
237.80      23     10           0.0001
121.40      63     3            0.3083
 31.40      65     10           0.1342
203.50      41     6            0.0094
441.10      21     3            0.4941
323.00      38     3            0.0824
 52.50      58     10           0.0062
$n = 15$, $k = 2$, $F_{k+1,\, n-k-1} = F_{3,12} = 0.835$
No $D_i > 0.835$
No observation appears to be a candidate for removal from the model
Using the 3 criteria, there is insufficient evidence for the removal of any observation from the model.
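A sketch of the Cook's distance screen. The function mirrors the formula as it appears on the slide (with k in the denominator); the `d_slide` values and the critical value 0.835 are copied from the table, and all names are illustrative assumptions.

```python
def cooks_distance(e_i, h_i, mse, k):
    """D_i = e_i^2 * h_i / (k * MSE * (1 - h_i)^2), as written on the slide."""
    return (e_i ** 2 * h_i) / (k * mse * (1.0 - h_i) ** 2)


# Cook's distances reported on the slide, observations 1..15.
d_slide = [0.0094, 0.0098, 0.0496, 0.0041, 0.0001, 0.0295, 0.1342, 0.1328,
           0.0001, 0.3083, 0.1342, 0.0094, 0.4941, 0.0824, 0.0062]
f_crit = 0.835  # F critical value with 3 and 12 df, as given on the slide

influential = [i + 1 for i, d in enumerate(d_slide) if d > f_crit]
largest = max(d_slide)
print(influential, largest)
```

Since the largest reported distance is well below the critical value, the list of flagged observations is empty, matching the slide's conclusion.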
Collinearity (Multicollinearity)
High Correlation between Explanatory Variables
Coefficient of Multiple Determination Measures Combined Effect of the Correlated Explanatory Variables
Little or No New Information Provided
Leads to Unstable Coefficients (Large Standard Error)
Venn Diagrams and Collinearity
[Figure: Venn diagram of the variation in Oil, Temp and Insulation, with a large overlap between Temp and Insulation]
Large overlap reflects collinearity between Temp and Insulation
Overlap in variation of Temp and Insulation is used in explaining the variation in Oil but NOT in estimating $\beta_1$ and $\beta_2$
Detect Collinearity (Variance Inflationary Factor)
$VIF_j$ Used to Measure Collinearity
$VIF_j = \frac{1}{1 - R_j^2}$
$R_j^2$ = coefficient of multiple determination from the regression of $X_j$ on all the other explanatory variables
If $VIF_j > 5$, $X_j$ is Highly Correlated with the Other Explanatory Variables
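A sketch of the VIF calculation for the heating oil data. With only two explanatory variables, $R_j^2$ reduces to the squared correlation between them, which is what `r_squared_pair` computes; that shortcut and the function names are assumptions for illustration, and a model with more variables would need a full regression of each $X_j$ on the others.

```python
def vif(r2_j):
    """Variance inflationary factor: 1 / (1 - R_j^2)."""
    return 1.0 / (1.0 - r2_j)


def r_squared_pair(x, y):
    """Squared correlation between two variables (R^2 of one regressed on the other)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Temp and Insulation columns from the heating oil example.
temps = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]

vif_value = vif(r_squared_pair(temps, insulation))
collinear = vif_value > 5
print(vif_value, collinear)
```

A VIF of exactly 1 would mean the two variables are uncorrelated; values above 5 are the slides' signal of collinearity.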
Detect Collinearity in PHStat
PHStat | Regression | Multiple Regression …
  Check the “Variance Inflationary Factor (VIF)” box
Excel spreadsheet for the heating oil example
  Since there are only two explanatory variables, only one VIF is reported in the Excel spreadsheet
  No VIF is > 5
  There is no evidence of collinearity
Model Building
Goal is to Develop a Good Model with the Fewest Explanatory Variables
  Easier to interpret
  Lower probability of collinearity
Stepwise Regression Procedure
  Provides limited evaluation of alternative models
Best-Subset Approach
  Uses the $r^2_{adj}$ or $C_p$ Statistic
  Selects the model with the largest $r^2_{adj}$ or small $C_p$ near k+1
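A sketch of the two best-subset criteria. The slides name the statistics but do not give their formulas, so both definitions below are assumptions: the usual adjusted $r^2$, and the common textbook form $C_p = \frac{(1 - R_k^2)(n - T)}{1 - R_T^2} - [n - 2(k+1)]$, where T is the number of parameters (including the intercept) in the full model. The function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for a model with k explanatory variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def mallows_cp(r2_k, r2_full, n, k, t_params):
    """C_p for a k-variable subset model (assumed textbook form).

    r2_k     : r^2 of the subset model
    r2_full  : r^2 of the full model
    t_params : number of parameters in the full model, intercept included
    """
    return (1.0 - r2_k) * (n - t_params) / (1.0 - r2_full) - (n - 2 * (k + 1))


# Hypothetical r^2 values, chosen only to exercise the functions.
print(adjusted_r2(0.90, n=15, k=2))
print(mallows_cp(0.90, 0.90, n=15, k=2, t_params=3))
```

Under this definition the full model always has $C_p$ equal to its own k + 1, which is why a subset model whose $C_p$ is close to k + 1 is treated as showing little bias.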
Model Building Flowchart
1. Choose X1, X2, …, Xp
2. Run Regression to Find VIFs
3. Any VIF > 5?
   Yes: More than One?
     Yes: Remove Variable with Highest VIF, then return to step 2
     No: Remove this X, then return to step 2
   No: continue to step 4
4. Run Subsets Regression to Obtain “Best” Models in Terms of Cp
5. Do Complete Analysis
6. Add Curvilinear Term and/or Transform Variables as Indicated
7. Perform Predictions
Pitfalls and Ethical Issues
Fail to Understand that the Interpretation of the Estimated Regression Coefficients is Performed Holding All Other Independent Variables Constant
Fail to Evaluate Residual Plots for Each Independent Variable
Fail to Evaluate Interaction Terms
Pitfalls and Ethical Issues (continued)
Fail to Obtain VIF for Each Independent Variable and Remove Variables that Exhibit a High Collinearity with Other Independent Variables Before Performing Significance Test on Each Independent Variable
Fail to Examine Several Alternative Models
Fail to Use Other Methods When the Assumptions Necessary for Least-Squares Regression Have Been Seriously Violated
Chapter Summary
Described the Quadratic Regression Model
Discussed Using Transformations in Regression Models
Presented Influence Analysis
Described Collinearity
Discussed Model Building
Addressed Pitfalls in Multiple Regression and Ethical Issues