transcript
- Slide 1
- 2004 Prentice-Hall, Inc. Chap 14-1. Basic Business Statistics (9th
Edition), Chapter 14: Introduction to Multiple Regression
- Slide 2
- Chapter Topics: The Multiple Regression Model; Residual Analysis;
Testing for the Significance of the Regression Model; Inferences on
the Population Regression Coefficients; Testing Portions of the
Multiple Regression Model; Dummy-Variables and Interaction Terms;
Logistic Regression Model
- Slide 3
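- The Multiple Regression Model: the relationship between 1 dependent
and 2 or more independent variables is a linear function,
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$,
where $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{ki}$
are the independent (explanatory) variables, $\beta_0$ is the
population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population
slopes, and $\varepsilon_i$ is the random error.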
- Slide 4
- Multiple Regression Model
- Slide 5
- Multiple Regression Equation
- Slide 6
- Multiple Regression Equation: Too complicated by hand! Ouch!
- Slide 7
- Interpretation of Estimated Coefficients. Slope ($b_j$): the average
value of Y is estimated to change by $b_j$ for each 1-unit increase in
$X_j$, holding all other variables constant (ceteris paribus).
Example: if $b_1 = -2$, then fuel oil usage (Y) is expected to
decrease by an estimated 2 gallons for each 1-degree increase in
temperature ($X_1$), given the inches of insulation ($X_2$).
Y-intercept ($b_0$): the estimated average value of Y when all
$X_j = 0$.
- Slide 8
- Multiple Regression Model: Example. Develop a model for estimating
heating oil used for a single-family home in the month of January,
based on average temperature (°F) and amount of insulation in inches.
- Slide 9
- Multiple Regression Equation: Example (Excel output). For each
degree increase in temperature, the estimated average amount of
heating oil used decreases by 5.437 gallons, holding insulation
constant. For each one-inch increase in insulation, the estimated
average use of heating oil decreases by 20.012 gallons, holding
temperature constant.
- Slide 10
- Multiple Regression in PHStat: PHStat | Regression | Multiple
Regression. Excel spreadsheet for the heating oil example.
- Slide 11
- Venn Diagrams and Explanatory Power of Regression (circles: Oil,
Temp). Overlap: variations in Oil explained by Temp, or variations in
Temp used in explaining variation in Oil. Oil outside the overlap:
variations in Oil explained by the error term. Temp outside the
overlap: variations in Temp not used in explaining variation in Oil.
- Slide 12
- Venn Diagrams and Explanatory Power of Regression (continued):
Oil, Temp
- Slide 13
- Venn Diagrams and Explanatory Power of Regression (circles: Oil,
Temp, Insulation). Overlapping variation in both Temp and Insulation
is used in explaining the variation in Oil but NOT in the estimation
of $\beta_1$ nor $\beta_2$. The remaining variation in Oil is NOT
explained by Temp nor Insulation.
- Slide 14
- Coefficient of Multiple Determination: the proportion of total
variation in Y explained by all X variables taken together,
$r^2 = SSR / SST$. It never decreases when a new X variable is added
to the model, which is a disadvantage when comparing among models.
- Slide 15
- Venn Diagrams and Explanatory Power of Regression: Oil, Temp,
Insulation
- Slide 16
- Adjusted Coefficient of Multiple Determination: the proportion of
variation in Y explained by all the X variables, adjusted for the
sample size and the number of X variables used. It penalizes
excessive use of independent variables, is smaller than $r^2$, is
useful in comparing among models, and can decrease if an
insignificant new X variable is added to the model.
- Slide 17
- Coefficient of Multiple Determination (Excel output): adjusted
$r^2$ reflects the number of explanatory variables and the sample
size, and is smaller than $r^2$.
- Slide 18
- Interpretation of Coefficient of Multiple Determination: 96.56% of
the total variation in heating oil can be explained by temperature
and amount of insulation. 95.99% of the total fluctuation in heating
oil can be explained by temperature and amount of insulation after
adjusting for the number of explanatory variables and sample size.
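
The step from 96.56% to 95.99% can be checked with the standard adjusted $r^2$ formula. The sketch below is an illustration, not the Excel procedure from the slides; the function name is hypothetical, and n = 15 is an assumption inferred from the 12 error degrees of freedom reported with the F test for this example (n - k - 1 = 12 with k = 2).

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# r^2 = 0.9656 and k = 2 come from the slides.
# n = 15 is an assumption (inferred from the error degrees of freedom).
adj = adjusted_r2(0.9656, n=15, k=2)
print(f"adjusted r^2 = {adj:.4f}")
```

Because the penalty factor (n - 1) / (n - k - 1) grows with k, adding a variable that barely raises $r^2$ can lower the adjusted value.
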
- Slide 19
- Simple and Multiple Regression Compared. Simple: the slope
coefficient in a simple regression picks up the impact of the
independent variable plus the impacts of other variables that are
excluded from the model but are correlated with the included
independent variable and the dependent variable. Multiple:
coefficients in a multiple regression net out the impacts of other
variables in the equation; hence, they are called the net regression
coefficients. They still pick up the effects of other variables that
are excluded from the model but are correlated with the included
independent variables and the dependent variable.
- Slide 20
- Simple and Multiple
Regression Compared: Example Two Simple Regressions: Multiple
Regression: The three s do not have the same value The two s do not
have the same value The three s are different
- Slide 21
- Simple and Multiple
Regression Compared: Slope Coefficients The three s are
different
- Slide 22
- Simple and Multiple Regression Compared: $r^2$
- Slide 23
- Example: Adjusted $r^2$ Can Decrease. Try a 3rd explanatory
variable: adjusted $r^2$ decreases when k increases from 2 to 3.
Rainfall is not useful in explaining the variation in oil
consumption.
- Slide 24
- Using the Regression Equation to Make Predictions: predict the
amount of heating oil used for a home if the average temperature is
30°F and the insulation is 6 inches. The predicted heating oil used
is 278.97 gallons.
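
A prediction like this is just the fitted equation evaluated at the given X values. The sketch below uses the two slopes quoted in the transcript (-5.437 and -20.012). The intercept is not stated in the transcript, so the value 562.15 is an assumption backed out from the 278.97 prediction itself; the function name is hypothetical.

```python
# Slopes are from the slides. The intercept is an ASSUMPTION: it is not in
# the transcript and was backed out from the quoted 278.97-gallon prediction.
B0_ASSUMED = 562.15
B1_TEMP = -5.437         # gallons per degree Fahrenheit
B2_INSULATION = -20.012  # gallons per inch of insulation


def predict_oil(temp_f: float, insulation_in: float) -> float:
    """Estimated average heating oil (gallons) from the fitted equation."""
    return B0_ASSUMED + B1_TEMP * temp_f + B2_INSULATION * insulation_in


pred = predict_oil(30, 6)
print(f"predicted oil use: {pred:.2f} gallons")

# Holding insulation constant, one extra degree changes the prediction by b1.
one_degree_effect = predict_oil(31, 6) - predict_oil(30, 6)
```
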
- Slide 25
- Predictions in PHStat: PHStat | Regression | Multiple Regression.
Check the Confidence and Prediction Interval Estimate box. Excel
spreadsheet for the heating oil example.
- Slide 26
- Residual Plots. Residuals vs. $\hat{Y}$: may need to transform the
Y variable. Residuals vs. $X_1$: may need to transform the $X_1$
variable. Residuals vs. $X_2$: may need to transform the $X_2$
variable. Residuals vs. time: may have autocorrelation.
- Slide 27
- Residual Plots: Example. Plot annotations: no discernible pattern;
maybe some non-linear relationship.
- Slide 28
- Testing for Overall Significance: shows if Y depends linearly on
all of the X variables together as a group. Use the F test statistic.
Hypotheses: $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear
relationship); $H_1$: at least one $\beta_j \ne 0$ (at least one
independent variable affects Y). The null hypothesis is a very strong
statement and is almost always rejected.
- Slide 29
- Testing for Overall Significance (continued). Test statistic:
$F = \dfrac{MSR}{MSE} = \dfrac{SSR / k}{SSE / (n - k - 1)}$, where F
has k numerator and (n - k - 1) denominator degrees of freedom.
- Slide 30
- Test for Overall Significance (Excel output, example). Annotations
on the ANOVA table: k = 2, the number of explanatory variables;
n - 1, the total degrees of freedom; the p-value.
- Slide 31
- Test for Overall Significance: Example Solution.
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$; $H_1$: at least one
$\beta_j \ne 0$. $\alpha = .05$; df = 2 and 12. Critical value:
F = 3.89. Test statistic: F = 168.47 (Excel output). Decision: reject
$H_0$ at $\alpha = 0.05$. Conclusion: there is evidence that at least
one independent variable affects Y.
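
Because SSR = $r^2 \cdot SST$ and SSE = $(1 - r^2) \cdot SST$, the overall F statistic can also be written in terms of $r^2$ alone. The sketch below uses that identity with the rounded $r^2 = 0.9656$ quoted earlier in the deck, so it lands near, not exactly on, the Excel value of 168.47. The function name is hypothetical, and n = 15 is an assumption inferred from df = 2 and 12.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """F = (r^2 / k) / ((1 - r^2) / (n - k - 1)), equivalent to MSR / MSE."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


F_CRITICAL = 3.89  # from the slide: alpha = .05, df = 2 and 12

# n = 15 is an assumption (k = 2 and 12 error degrees of freedom).
f_stat = overall_f(0.9656, n=15, k=2)
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```
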
- Slide 32
- Test for Significance: Individual Variables. Shows if Y depends
linearly on a single $X_j$ individually while holding the effects of
the other X variables fixed. Use the t test statistic. Hypotheses:
$H_0: \beta_j = 0$ (no linear relationship); $H_1: \beta_j \ne 0$
(linear relationship between $X_j$ and Y).
- Slide 33
- t Test Statistic (Excel output, example): t test statistic for
$X_1$ (temperature); t test statistic for $X_2$ (insulation).
- Slide 34
- t Test: Example Solution. Does temperature have a significant
effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_1 = 0$; $H_1: \beta_1 \ne 0$; df = 12. Critical values:
$\pm 2.1788$ (area .025 in each tail). Test statistic:
t = -16.1699. Decision: reject $H_0$ at $\alpha = 0.05$. Conclusion:
there is evidence of a significant effect of temperature on oil
consumption, holding constant the effect of insulation.
- Slide 35
- Venn Diagrams and Estimation of Regression Model (circles: Oil,
Temp, Insulation). Only this information is used in the estimation of
$\beta_1$. This information is NOT used in the estimation of
$\beta_1$ nor $\beta_2$.
- Slide 36
- Confidence Interval Estimate for the Slope: provide the 95%
confidence interval for the population slope $\beta_1$ (the effect of
temperature on oil consumption): $-6.169 \le \beta_1 \le -4.704$. We
are 95% confident that the estimated average consumption of oil is
reduced by between 4.70 and 6.17 gallons for each increase of 1°F,
holding insulation constant. We can also perform the test for the
significance of individual variables, $H_0: \beta_1 = 0$ vs.
$H_1: \beta_1 \ne 0$, using this confidence interval.
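
The interval can be reproduced from numbers already quoted in the deck. The sketch below assumes the usual form $b_1 \pm t_{n-k-1} S_{b_1}$ and recovers the standard error from the reported t statistic ($S_{b_1} = b_1 / t$); that back-calculation is an illustration, not a step shown on the slides, and small differences from the slide's bounds come from rounding.

```python
b1 = -5.437          # estimated slope for temperature (from the slides)
t_stat = -16.1699    # t test statistic for temperature (from the slides)
t_crit = 2.1788      # two-tail critical value, df = 12, alpha = 0.05

# Assumption: recover the standard error from t = b1 / S_b1.
se_b1 = b1 / t_stat

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(f"{lower:.3f} <= beta_1 <= {upper:.3f}")

# The interval excludes 0, which matches rejecting H0: beta_1 = 0.
zero_inside = lower <= 0.0 <= upper
```
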
- Slide 37
- Contribution of a Single Independent Variable $X_j$: let $X_j$ be
the independent variable of interest.
$SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$
measures the additional contribution of $X_j$ in explaining the total
variation in Y with the inclusion of all the remaining independent
variables.
- Slide 38
- Contribution of a Single Independent Variable $X_1$:
$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$
measures the additional contribution of $X_1$ in explaining Y with
the inclusion of $X_2$ and $X_3$. Each term on the right comes from
the ANOVA section of the corresponding regression. Note: the values
of the coefficients $b_0$, $b_1$, and $b_2$ change in the two
regression equations.
- Slide 39
- Coefficient of Partial Determination of $X_j$: measures the
proportion of variation in the dependent variable that is explained
by $X_j$ while controlling for (holding constant) the other
independent variables.
- Slide 40
- Coefficient of Partial Determination for $X_j$ (continued).
Example: model with two independent variables.
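
For a model with two independent variables, a common textbook form of the coefficient of partial determination for $X_1$ is $SSR(X_1 \mid X_2) / [SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)]$. The slide's own formula did not survive in the transcript, so the sketch below is offered under that assumption. The function name and the sums of squares are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def partial_r2(ssr_full: float, ssr_without_x1: float, sst: float) -> float:
    """Coefficient of partial determination of X1, given X2 (two-variable model).

    ssr_full        -- SSR(X1 and X2), regression with both variables
    ssr_without_x1  -- SSR(X2), regression with X2 only
    sst             -- total sum of squares
    """
    contribution = ssr_full - ssr_without_x1         # SSR(X1 | X2)
    return contribution / (sst - ssr_full + contribution)


# Hypothetical sums of squares (NOT the heating oil values).
r2_y1_2 = partial_r2(ssr_full=80.0, ssr_without_x1=30.0, sst=100.0)
print(f"partial r^2 of X1 given X2 = {r2_y1_2:.4f}")
```

The denominator equals the variation left unexplained by $X_2$ alone, so the result is the share of that leftover variation that $X_1$ picks up.
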
- Slide 41
- Venn Diagrams and Coefficient of Partial Determination for $X_j$
(circles: Oil, Temp, Insulation)
- Slide 42
- Coefficient of Partial Determination in PHStat: PHStat |
Regression | Multiple Regression. Check the Coefficient of Partial
Determination box. Excel spreadsheet for the heating oil example.
- Slide 43
- Contribution of a Subset of Independent Variables: let $X_s$ be the
subset of independent variables of interest.
$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
measures the contribution of the subset $X_s$ in explaining SST with
the inclusion of the remaining independent variables.
- Slide 44
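- Contribution of a Subset of Independent Variables: Example. Let
$X_s$ be $X_1$ and $X_3$:
$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$,
where each term on the right comes from the ANOVA section of the
corresponding regression.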
- Slide 45
- Testing Portions of Model: examines the contribution of a subset
$X_s$ of explanatory variables to the relationship with Y. Null
hypothesis: variables in the subset do not improve the model
significantly when all other variables are included. Alternative
hypothesis: at least one variable in the subset is significant when
all other variables are included.
- Slide 46
- Testing Portions of Model (continued): one-tailed rejection region.
Requires comparison of two regressions: one regression includes
everything; another regression includes everything except the
portion to be tested.
- Slide 47
- Partial F Test for the Contribution of a Subset of X Variables.
Hypotheses: $H_0$: variables $X_s$ do not significantly improve the
model given all other variables included; $H_1$: variables $X_s$
significantly improve the model given all others included. Test
statistic:
$F = \dfrac{[SSR(\text{all}) - SSR(\text{all others except } X_s)] / m}{MSE(\text{all})}$,
with df = m and (n - k - 1), where m = number of variables in the
subset $X_s$.
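
The partial F statistic compares the regression sum of squares of the full model with that of the model lacking the subset, scaled by the full model's mean square error. A minimal sketch, assuming that standard form; the function name and all input numbers are hypothetical, not values from the heating oil example.

```python
def partial_f(ssr_full: float, ssr_reduced: float, sse_full: float,
              n: int, k: int, m: int) -> float:
    """Partial F for a subset of m variables out of k in the full model.

    Numerator df = m, denominator df = n - k - 1.
    """
    mse_full = sse_full / (n - k - 1)
    return ((ssr_full - ssr_reduced) / m) / mse_full


# Hypothetical inputs: full model has k = 2 variables, subset has m = 1.
f_partial = partial_f(ssr_full=80.0, ssr_reduced=30.0, sse_full=20.0,
                      n=15, k=2, m=1)
print(f"partial F = {f_partial:.2f} with df = 1 and 12")
```
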
- Slide 48
- Partial F Test for the Contribution of a Single $X_j$. Hypotheses:
$H_0$: variable $X_j$ does not significantly improve the model given
all others included; $H_1$: variable $X_j$ significantly improves the
model given all others included. Test statistic:
$F = \dfrac{SSR(\text{all}) - SSR(\text{all others except } X_j)}{MSE(\text{all})}$,
with df = 1 and (n - k - 1); m = 1 here.
- Slide 49
- Testing Portions of Model: Example. Test at the $\alpha = .05$
level to determine if the variable of average temperature
significantly improves the model, given that insulation is included.
- Slide 50
- Testing Portions of Model: Example. $H_0$: $X_1$ (temperature) does
not improve the model with $X_2$ (insulation) included; $H_1$: $X_1$
does improve the model. $\alpha = .05$; df = 1 and 12; critical
value = 4.75. The test statistic uses the ANOVA output for the model
with $X_1$ and $X_2$ and the ANOVA output for the model with $X_2$
only. Conclusion: reject $H_0$; $X_1$ does improve the model.
- Slide 51
- Testing Portions of Model in PHStat: PHStat | Regression | Multiple
Regression. Check the Coefficient of Partial Determination box. Excel
spreadsheet for the heating oil example.
- Slide 52
- Do We Need to Do This for One Variable? The F test for the
contribution of a single variable after all other variables are
included in the model is IDENTICAL to the t test of the slope for
that variable. The only reason to perform an F test is to test
several variables together.
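
The equivalence can be seen numerically: with 1 numerator degree of freedom, the F statistic is the square of the t statistic, and the F critical value is the square of the two-tail t critical value. The sketch below only squares figures quoted elsewhere in the deck for the temperature variable.

```python
t_stat = -16.1699   # t statistic for temperature (from the slides)
t_crit = 2.1788     # two-tail t critical value, df = 12, alpha = 0.05
F_CRIT_SLIDE = 4.75 # F critical value quoted for df = 1 and 12

f_from_t = t_stat ** 2        # partial F statistic implied by the t test
f_crit_from_t = t_crit ** 2   # should agree with 4.75 up to rounding

print(f"F = t^2 = {f_from_t:.2f}; critical F = {f_crit_from_t:.2f}")

# Both tests reach the same decision.
same_decision = (abs(t_stat) > t_crit) == (f_from_t > F_CRIT_SLIDE)
```
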
- Slide 53
- Dummy-Variable Models: a categorical explanatory variable with 2 or
more levels. Only the intercepts are different; the model assumes
equal slopes across categories. The number of dummy variables needed
is (number of levels - 1). The regression model has the same form:
$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i$.
Two-level examples: yes or no, on or off; use a dummy variable coded
as 0 or 1.
- Slide 54
- Dummy-Variable Models (with 2 Levels). Given: Y = assessed value of
house; $X_1$ = square footage of house; $X_2$ = desirability of
neighborhood (0 if undesirable, 1 if desirable). Desirable
($X_2 = 1$): $\hat{Y} = b_0 + b_1 X_1 + b_2(1) = (b_0 + b_2) + b_1 X_1$.
Undesirable ($X_2 = 0$): $\hat{Y} = b_0 + b_1 X_1 + b_2(0) = b_0 + b_1 X_1$.
Same slopes.
- Slide 55
- Dummy-Variable Models (with 2 Levels) (continued). Figure: Y
(assessed value) against $X_1$ (square footage), with two parallel
lines: desirable location with intercept $b_0 + b_2$ and undesirable
location with intercept $b_0$. Same slopes, different intercepts.
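
The two-level dummy-variable model can be sketched as a single equation whose intercept shifts by $b_2$ when the dummy equals 1. The coefficients below are hypothetical placeholders (the transcript gives no fitted values for this example), and the function name is illustrative.

```python
def assessed_value(sqft: float, desirable: bool,
                   b0: float, b1: float, b2: float) -> float:
    """Y-hat = b0 + b1 * X1 + b2 * X2, with X2 = 1 if desirable, else 0."""
    x2 = 1 if desirable else 0
    return b0 + b1 * sqft + b2 * x2


# Hypothetical coefficients, for illustration only.
B0, B1, B2 = 50.0, 0.04, 15.0

# The gap between the two lines is b2 at every square footage (same slopes).
gaps = [
    assessed_value(s, True, B0, B1, B2) - assessed_value(s, False, B0, B1, B2)
    for s in (1000, 2000, 3000)
]
print(gaps)
```
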
- Slide 56
- Interpretation of the Dummy-Variable Coefficient (with 2 Levels).
Example: Y = annual salary of college graduate in thousands of
dollars; $X_1$ = GPA; $X_2$ = 0 for a non-business degree, 1 for a
business degree. With the same GPA, college graduates with a business
degree are making an estimated 6 thousand dollars more than graduates
with a non-business degree, on average.
- Slide 57
- Dummy-Variable Models (with 3 Levels)
- Slide 58
- Interpretation of the Dummy-Variable Coefficients (with 3 Levels):
with the same footage, a split-level will have an estimated average
assessed value of 18.84 thousand dollars more than a Tudor. With the
same footage, a ranch will have an estimated average assessed value
of 23.53 thousand dollars more than a Tudor.
- Slide 59
- Regression Model Containing an Interaction Term: hypothesizes
interaction between a pair of X variables, where the response to one
X variable varies at different levels of another X variable. Contains
a cross-product term. Can be combined with other models, e.g., a
dummy-variable model.
- Slide 60
- Effect of Interaction. Given:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$.
Without the interaction term, the effect of $X_1$ on Y is measured by
$\beta_1$. With the interaction term, the effect of $X_1$ on Y is
measured by $\beta_1 + \beta_3 X_2$; the effect changes as $X_2$
changes.
- Slide 61
- Interaction Example: $Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$. With
$X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$. With
$X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$. The effect
(slope) of $X_1$ on Y depends on the $X_2$ value. Figure: both lines
plotted as Y (0 to 12) against $X_1$ (0 to 1.5).
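
The two lines on this slide can be checked by evaluating the equation directly. The sketch below uses the slide's coefficients; the function names are illustrative.

```python
def y(x1: float, x2: float) -> float:
    """Slide's example: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_in_x1(x2: float) -> float:
    """Effect of a one-unit change in X1 at a given X2: 2 + 4*X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    intercept = y(0, x2)
    print(f"X2 = {x2}: Y = {intercept} + {slope_in_x1(x2)} * X1")
```
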
- Slide 62
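- Interaction Regression Model Worksheet: multiply $X_1$ by $X_2$ to
get $X_1X_2$, then run the regression with Y, $X_1$, $X_2$, $X_1X_2$.
Worksheet columns: case i, $Y_i$, $X_{1i}$, $X_{2i}$, $X_{1i}X_{2i}$.
Case 1: 1, 1, 3, 3. Case 2: 4, 8, 5, 40. Case 3: 1, 3, 2, 6.
Case 4: 3, 5, 6, 30. Further cases follow in the same layout.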
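
Building the cross-product column is a one-line transformation of the worksheet. The sketch below uses the four worksheet rows as read from the slide (Y, $X_1$, $X_2$) and appends $X_1 X_2$; the variable names are illustrative.

```python
# (Y, X1, X2) for cases 1 to 4, as read from the worksheet on the slide.
rows = [
    (1, 1, 3),
    (4, 8, 5),
    (1, 3, 2),
    (3, 5, 6),
]

# Append the interaction column X1 * X2 to each case.
design = [(y, x1, x2, x1 * x2) for (y, x1, x2) in rows]

for case, row in enumerate(design, start=1):
    print(case, *row)
```

The regression is then run on Y against the three predictor columns $X_1$, $X_2$, and $X_1X_2$.
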
- Slide 63
- Interpretation When There Are 3+ Levels. Consider the effects of
gender (male or female) and working status (working part-time,
working full-time or not working) on income (Y). Male = 0 if female;
1 if male. Part-time = 1 if working part-time; 0 if working full-time
or not working. Full-time = 1 if working full-time; 0 if working
part-time or not working. MalePart-time = 1 if male and working
part-time; 0 otherwise (= Male times Part-time). MaleFull-time = 1 if
male and working full-time; 0 otherwise (= Male times Full-time).
- Slide 64
- Interpretation When There Are 3+ Levels (continued)
- Slide 65
- Interpreting Results. Female: not working, part-time, full-time.
Male: not working, part-time, full-time. Main effects: Male,
Part-time and Full-time. Interaction effects: MalePart-time and
MaleFull-time. Difference
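
One way to read the main and interaction effects is to build the five dummy columns for each of the six groups and add up the coefficients that are switched on. The sketch below assumes the model is the intercept plus Male, Part-time, Full-time, MalePart-time and MaleFull-time terms, which matches the variable definitions in the deck; the coefficient values and function names are hypothetical.

```python
def design_row(male: bool, status: str) -> dict:
    """Dummy coding for gender and working status (not working is baseline)."""
    m = 1 if male else 0
    part = 1 if status == "part-time" else 0
    full = 1 if status == "full-time" else 0
    return {
        "Male": m,
        "Part-time": part,
        "Full-time": full,
        "MalePart-time": m * part,   # interaction = Male times Part-time
        "MaleFull-time": m * full,   # interaction = Male times Full-time
    }


def predicted_income(row: dict, coef: dict) -> float:
    """Intercept plus the coefficients of every dummy that equals 1."""
    return coef["Intercept"] + sum(coef[name] * v for name, v in row.items())


# Hypothetical coefficients, for illustration only.
COEF = {"Intercept": 10, "Male": 2, "Part-time": 5, "Full-time": 12,
        "MalePart-time": 1, "MaleFull-time": 3}

for male in (False, True):
    for status in ("not working", "part-time", "full-time"):
        income = predicted_income(design_row(male, status), COEF)
        print("male" if male else "female", status, income)
```

With this coding, the male-female difference among full-time workers is the Male coefficient plus the MaleFull-time coefficient, while among those not working it is the Male coefficient alone.
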
- Slide 66
- Evaluating the Presence of Interaction with Dummy-Variable. Suppose
$X_1$ and $X_2$ are numerical variables and $X_3$ is a dummy
variable. To test if the slopes of Y with $X_1$ and/or $X_2$ are the
same for the two levels of $X_3$, use the model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$.
Hypotheses: $H_0: \beta_4 = \beta_5 = 0$ (no interaction between
$X_1$ and $X_3$ or $X_2$ and $X_3$); $H_1$: $\beta_4$ and/or
$\beta_5 \ne 0$ ($X_1$ and/or $X_2$ interacts with $X_3$). Perform a
partial F test.
- Slide 67
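- Evaluating the Presence of Interaction with Numerical Variables.
Suppose $X_1$, $X_2$ and $X_3$ are numerical variables. To test if
the independent variables interact with each other, use the model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$.
Hypotheses: $H_0: \beta_4 = \beta_5 = \beta_6 = 0$ (no interaction
among $X_1$, $X_2$ and $X_3$); $H_1$: at least one of $\beta_4$,
$\beta_5$, $\beta_6 \ne 0$ (at least one pair of $X_1$, $X_2$, $X_3$
interact with each other). Perform a partial F test.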
- Slide 68
- Logistic Regression Model: enables the use of a regression model to
predict the probability of a particular categorical response for a
given set of explanatory variables. It is based on the odds ratio,
which represents the probability of a success compared with the
probability of failure.
- Slide 69
- Logistic Regression Model (continued): logistic regression
equation; estimated odds ratio; estimated probability of success.
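
The three quantities named on this slide chain together: the fitted equation gives the natural log of the estimated odds ratio, exponentiating gives the estimated odds ratio, and odds / (1 + odds) gives the estimated probability of success. The slide's own equations did not survive in the transcript, so the sketch below assumes that standard chain; the coefficients and function names are hypothetical.

```python
import math


def estimated_odds(b: list, x: list) -> float:
    """exp(b0 + b1*x1 + ... + bk*xk): the estimated odds ratio."""
    log_odds = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    return math.exp(log_odds)


def estimated_probability(b: list, x: list) -> float:
    """Estimated probability of success = odds / (1 + odds)."""
    odds = estimated_odds(b, x)
    return odds / (1.0 + odds)


# Hypothetical coefficients [b0, b1] and a single explanatory value.
b = [-1.0, 0.5]
p = estimated_probability(b, [2.0])   # log-odds = -1.0 + 0.5 * 2.0 = 0
print(f"estimated probability of success = {p:.2f}")
```
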
- Slide 70
- Interpretation of Estimated Slope Coefficients: the logistic
regression equation has to be estimated using computer statistical
software, e.g., Minitab. The estimated slope coefficient $b_j$
measures the estimated change in the natural logarithm of the odds
ratio as a result of a one-unit change in the independent variable
$X_j$, holding constant the effects of all the other independent
variables.
- Slide 71
- The Deviance Statistic: use to test whether the logistic regression
is a good-fitting model. Hypotheses: $H_0$: the model is a
good-fitting model; $H_1$: the model is not a good-fitting model.
Test statistic: the deviance statistic has a $\chi^2$ distribution
with $(n - k - 1)$ degrees of freedom. The rejection region is always
in the upper tail.
- Slide 72
- Testing Significance of an Independent Variable. Hypotheses:
$H_0: \beta_j = 0$ ($X_j$ is not significant); $H_1: \beta_j \ne 0$
($X_j$ is significant). Test statistic: the Wald statistic is
normally distributed. A two-tail test with left-tail and right-tail
rejection regions.
- Slide 73
- Chapter Summary: developed the multiple regression model; discussed
residual plots; addressed testing the significance of the multiple
regression model; discussed inferences on population regression
coefficients; addressed testing portions of the multiple regression
model; discussed dummy-variables and interaction terms; addressed the
logistic regression model.