Date posted: 19-Dec-2015
Multiple Correlation and the Addition of Variables
Session 13
Course: I0174 – Regression Analysis
Year: Odd Semester 2007/2008
Bina Nusantara
Multiple Correlation and the Addition of Variables
Multiple Regression and Correlation
Chapter Topics
• The Multiple Regression Model
• Residual Analysis
• Testing for the Significance of the Regression Model
• Inferences on the Population Regression Coefficients
• Testing Portions of the Multiple Regression Model
• Dummy-Variables and Interaction Terms
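The Multiple Regression Model
The relationship between one dependent variable and two or more independent variables is a linear function:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
• $Y_i$: dependent (response) variable
• $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
• $\beta_0$: population Y-intercept
• $\beta_1, \ldots, \beta_k$: population slopes
• $\varepsilon_i$: random error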
Multiple Regression Model
Bivariate model (observed Y): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
Response plane: $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
[Figure: the response plane over the $X_1$ and $X_2$ axes, with intercept $\beta_0$ on the Y axis; the observed Y at $(X_{1i}, X_{2i})$ lies a vertical distance $\varepsilon_i$ from the plane.]
Multiple Regression Equation
Bivariate model (observed Y): $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
Fitted response plane: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
[Figure: the fitted response plane over the $X_1$ and $X_2$ axes, with intercept $b_0$ on the Y axis; the observed Y at $(X_{1i}, X_{2i})$ lies a vertical distance $e_i$ from the plane.]
Multiple Regression Equation
Too complicated to compute by hand! Ouch!
Interpretation of Estimated Coefficients
• Slope ($b_j$)
– The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus)
– Example: if $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)
• Y-intercept ($b_0$)
– The estimated average value of Y when all $X_j = 0$
Multiple Regression Model: Example
Oil (gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature and amount of insulation in inches.
Multiple Regression Equation: Example
General form: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
Excel Output
               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$
For each degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
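The Excel coefficients can be cross-checked without a spreadsheet. The following is a minimal sketch, not the course's own procedure: it solves the normal equations $(X'X)b = X'y$ with a small hand-written Gaussian elimination, using the 15 observations from the heating oil data slide. The helper names `solve` and `fit_ols` are illustrative assumptions.

```python
# Heating oil data from the example slide: (oil in gallons, temp in °F, insulation in inches)
DATA = [
    (275.3, 40, 3), (363.8, 27, 3), (164.3, 40, 10), (40.8, 73, 6), (94.3, 64, 6),
    (230.9, 34, 6), (366.7, 9, 6), (300.6, 8, 10), (237.8, 23, 10), (121.4, 63, 3),
    (31.4, 65, 10), (203.5, 41, 6), (441.1, 21, 3), (323.0, 38, 3), (52.5, 58, 10),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in r[1:]] for r in rows]  # leading 1 for the intercept
    y = [float(r[0]) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


b0, b1, b2 = fit_ols(DATA)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

For larger problems a library routine based on a QR or SVD decomposition would usually be preferred over forming $X'X$ directly; the explicit version is used here only because it mirrors the algebra.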
Multiple Regression in PHStat
• PHStat | Regression | Multiple Regression …
• Excel spreadsheet for the heating oil example
Venn Diagrams and Explanatory Power of Regression
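[Figure: Venn diagram of two overlapping circles, Oil and Temp.]
• Overlap of Oil and Temp (SSR): variations in Oil explained by Temp, or variations in Temp used in explaining variation in Oil
• Part of Oil outside Temp (SSE): variations in Oil explained by the error term
• Part of Temp outside Oil: variations in Temp not used in explaining variation in Oil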
Venn Diagrams and Explanatory Power of Regression (continued)
[Figure: Venn diagram of two overlapping circles, Oil and Temp.]
$$r^2 = \frac{SSR}{SSR + SSE}$$
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of three overlapping circles: Oil, Temp, and Insulation.]
• Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$
• Variation NOT explained by Temp nor Insulation: SSE
Coefficient of Multiple Determination
• Proportion of Total Variation in Y Explained by All X Variables Taken Together
– $r^2_{Y \cdot 12 \cdots k} = \dfrac{SSR}{SST} = \dfrac{\text{Explained Variation}}{\text{Total Variation}}$
• Never Decreases When a New X Variable is Added to Model
– Disadvantage when comparing among models
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of three overlapping circles: Oil, Temp, and Insulation.]
$$r^2_{Y \cdot 12} = \frac{SSR}{SSR + SSE}$$
Adjusted Coefficient of Multiple Determination
• Proportion of Variation in Y Explained by All the X Variables, Adjusted for the Sample Size and the Number of X Variables Used
– $r^2_{adj} = 1 - \left[ \left(1 - r^2_{Y \cdot 12 \cdots k}\right) \dfrac{n-1}{n-k-1} \right]$
– Penalizes excessive use of independent variables
– Smaller than $r^2_{Y \cdot 12 \cdots k}$
– Useful in comparing among models
– Can decrease if an insignificant new X variable is added to the model
Coefficient of Multiple Determination
Excel Output
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
• R Square: $r^2_{Y \cdot 12} = SSR / SST$
• Adjusted R Square: the adjusted $r^2$ reflects the number of explanatory variables and sample size, and is smaller than $r^2$
Interpretation of Coefficient of Multiple Determination
• $r^2_{Y \cdot 12} = \dfrac{SSR}{SST} = .9656$
– 96.56% of the total variation in heating oil can be explained by temperature and amount of insulation
• $r^2_{adj} = .9599$
– 95.99% of the total variation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size
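Both figures follow directly from the sums of squares in the ANOVA output for the heating oil model (SSR = 228014.6263, SST = 236135.2293, n = 15, k = 2). A short sketch, with hypothetical function names chosen for illustration:

```python
def r_squared(ssr: float, sst: float) -> float:
    """Coefficient of multiple determination: explained / total variation."""
    return ssr / sst


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjust r^2 for sample size n and number of explanatory variables k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


SSR = 228014.6263   # regression sum of squares from the ANOVA output
SST = 236135.2293   # total sum of squares from the ANOVA output
N, K = 15, 2        # observations and explanatory variables

r2 = r_squared(SSR, SST)
r2_adj = adjusted_r_squared(r2, N, K)
print(f"r^2 = {r2:.4f}, adjusted r^2 = {r2_adj:.4f}")
```

The adjustment multiplies the unexplained share $(1 - r^2)$ by $(n-1)/(n-k-1)$, which is greater than 1 whenever k is at least 1, so the adjusted value is always the smaller of the two.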
Simple and Multiple Regression Compared
• The slope coefficient in a simple regression picks up the impact of the independent variable plus the impacts of other variables that are excluded from the model but are correlated with the included independent variable and the dependent variable
• Coefficients in a multiple regression net out the impacts of other variables in the equation
– Hence, they are called the net regression coefficients
– They still pick up the effects of other variables that are excluded from the model but are correlated with the included independent variables and the dependent variable
Simple and Multiple Regression Compared: Example
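• Two Simple Regressions:
– $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \varepsilon$
– $\text{Oil} = \beta_0 + \beta_2 \text{Insulation} + \varepsilon$
• Multiple Regression:
– $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$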
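Simple and Multiple Regression Compared: Slope Coefficients
Multiple regression: $\text{Oil} = b_0 + b_1 \text{Temp} + b_2 \text{Insulation} + e$
             Coefficients
Intercept    562.1510092
Temp         -5.436580588
Insulation   -20.01232067
Simple regression: $\text{Oil} = b_0 + b_1 \text{Temp} + e$
             Coefficients
Intercept    436.4382299
Temp         -5.462207697
Simple regression: $\text{Oil} = b_0 + b_2 \text{Insulation} + e$
             Coefficients
Intercept    345.3783784
Insulation   -20.35027027
The slopes differ between the two kinds of model: $-5.4366 \neq -5.4622$ for Temp and $-20.0123 \neq -20.3503$ for Insulation.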
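The two simple-regression outputs can be reproduced from the heating oil data with the usual closed form, slope $= S_{xy}/S_{xx}$ and intercept $= \bar{Y} - \text{slope} \cdot \bar{X}$. The sketch below is illustrative only; `simple_regression` is a hypothetical helper name.

```python
# Heating oil data transcribed from the example slide.
OIL = [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
       237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5]
TEMP = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
INSULATION = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]


def simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


temp_intercept, temp_slope = simple_regression(TEMP, OIL)
ins_intercept, ins_slope = simple_regression(INSULATION, OIL)
print(f"Oil on Temp:       {temp_intercept:.4f} + ({temp_slope:.4f}) Temp")
print(f"Oil on Insulation: {ins_intercept:.4f} + ({ins_slope:.4f}) Insulation")
```

The simple slopes are close to the multiple-regression slopes in this data set because Temp and Insulation are nearly uncorrelated with each other; with strongly correlated predictors the gap would typically be much larger.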
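Simple and Multiple Regression Compared: $r^2$
Multiple regression: $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
Simple regression: $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \varepsilon$
Regression Statistics
Multiple R          0.86974117
R Square            0.756449704
Adjusted R Square   0.737715065
Standard Error      66.51246564
Observations        15
Simple regression: $\text{Oil} = \beta_0 + \beta_2 \text{Insulation} + \varepsilon$
Regression Statistics
Multiple R          0.465082527
R Square            0.216301757
Adjusted R Square   0.156017277
Standard Error      119.3117327
Observations        15
The two simple-regression $r^2$ values do not add up to the multiple-regression $r^2$:
$$0.75645 + 0.21630 = 0.97275 \neq 0.96561$$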
Example: Adjusted $r^2$ Can Decrease
Model with k = 2: $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
Model with k = 3: $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \beta_3 \text{Color} + \varepsilon$
Regression Statistics
Multiple R          0.983482856
R Square            0.967238528
Adjusted R Square   0.958303581
Standard Error      25.72417272
Observations        15
Adjusted $r^2$ decreases when k increases from 2 to 3.
Color is not useful in explaining the variation in oil consumption.
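As a check on the two Excel outputs, the adjusted values can be worked out from the reported R Square figures with $r^2_{adj} = 1 - (1 - r^2)\frac{n-1}{n-k-1}$ and n = 15. The arithmetic below is a worked illustration added for clarity:

```latex
\begin{align*}
k = 2:\quad r^2_{adj} &= 1 - (1 - 0.965610371)\,\frac{15 - 1}{15 - 2 - 1}
                       = 1 - 0.034389629 \times \frac{14}{12} \approx 0.959879 \\
k = 3:\quad r^2_{adj} &= 1 - (1 - 0.967238528)\,\frac{15 - 1}{15 - 3 - 1}
                       = 1 - 0.032761472 \times \frac{14}{11} \approx 0.958304
\end{align*}
```

Adding Color raises $r^2$ only from 0.9656 to 0.9672, which is not enough to offset the larger penalty factor (14/11 instead of 14/12), so the adjusted value falls.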
Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches.
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$
The predicted heating oil used is 278.97 gallons.
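The same prediction can be written as a small function. This is a sketch using the rounded coefficients from the fitted equation; the function name `predict_oil` is an assumption made for illustration.

```python
# Rounded coefficients from the fitted heating oil equation.
B0, B1, B2 = 562.151, -5.437, -20.012


def predict_oil(temp_f: float, insulation_in: float) -> float:
    """Predicted January heating oil use (gallons) for a given temperature and insulation."""
    return B0 + B1 * temp_f + B2 * insulation_in


predicted = predict_oil(temp_f=30, insulation_in=6)
print(f"Predicted heating oil: {predicted:.2f} gallons")
```

Predictions of this kind are only trustworthy inside the range of the observed data (roughly 8 to 73 °F and 3 to 10 inches of insulation in this example).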
Predictions in PHStat
• PHStat | Regression | Multiple Regression …
– Check the “Confidence and Prediction Interval Estimate” box
• Excel spreadsheet for the heating oil example
Residual Plots
• Residuals vs. $\hat{Y}$
– May need to transform Y variable
• Residuals vs. $X_1$
– May need to transform $X_1$ variable
• Residuals vs. $X_2$
– May need to transform $X_2$ variable
• Residuals vs. Time
– May have autocorrelation
Residual Plots: Example
[Figure: Insulation Residual Plot, residuals against insulation (x-axis from 0 to 12).] No discernible pattern.
[Figure: Temperature Residual Plot, residuals (y-axis from -60 to 60) against temperature (x-axis from 0 to 80).] Maybe some non-linear relationship.
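The residuals behind these plots are simply $e_i = Y_i - \hat{Y}_i$. The sketch below computes them from the heating oil data and the Excel coefficients, then prints them next to each predictor so they can be plotted with any charting tool. It is an illustration, not the PHStat procedure.

```python
# Heating oil data transcribed from the example slide.
OIL = [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
       237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5]
TEMP = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
INSULATION = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]

# Full-precision coefficients from the Excel output.
B0, B1, B2 = 562.1510092, -5.436580588, -20.01232067

fitted = [B0 + B1 * t + B2 * ins for t, ins in zip(TEMP, INSULATION)]
residuals = [y - y_hat for y, y_hat in zip(OIL, fitted)]
sse = sum(e ** 2 for e in residuals)  # error sum of squares

print("Temp  Insulation  Residual")
for t, ins, e in zip(TEMP, INSULATION, residuals):
    print(f"{t:4d}  {ins:10d}  {e:8.2f}")
print(f"Sum of residuals = {sum(residuals):.6f}, SSE = {sse:.3f}")
```

Two quick sanity checks: least-squares residuals from a model with an intercept sum to (numerically) zero, and their squared sum should match the Residual SS in the ANOVA output.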
Testing for Overall Significance
• Shows if Y Depends Linearly on All of the X Variables Together as a Group
• Use F Test Statistic
• Hypotheses:
– H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No linear relationship)
– H1: At least one $\beta_i \neq 0$ (At least one independent variable affects Y)
• The Null Hypothesis is a Very Strong Statement
• The Null Hypothesis is Almost Always Rejected
Testing for Overall Significance (continued)
• Test Statistic:
– $F = \dfrac{MSR}{MSE} = \dfrac{SSR(\text{all}) / k}{MSE(\text{all})}$
• Where F has k numerator and (n-k-1) denominator degrees of freedom
Test for Overall Significance
Excel Output: Example
ANOVA
             df   SS         MS         F          Significance F
Regression   2    228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2
• Regression df: k = 2, the number of explanatory variables
• Total df: n - 1 = 14
• Test statistic: $F = MSR / MSE$
• Significance F: the p-value
Test for Overall Significance: Example Solution
H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
H1: At least one $\beta_j \neq 0$
$\alpha = .05$; df = 2 and 12
Critical Value: F = 3.89
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject H0 at $\alpha = 0.05$.
Conclusion: There is evidence that at least one independent variable affects Y.
[Figure: F distribution with a rejection region of area $\alpha = 0.05$ to the right of 3.89.]
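The test statistic can be rebuilt from the ANOVA sums of squares. A minimal sketch follows; the critical value 3.89 is taken from the slide rather than computed, and `overall_f` is a hypothetical helper name.

```python
def overall_f(ssr: float, sse: float, n: int, k: int) -> float:
    """F = MSR / MSE with k and (n - k - 1) degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


SSR = 228014.6263   # regression sum of squares
SSE = 8120.603016   # residual sum of squares
N, K = 15, 2

F_CRITICAL = 3.89   # upper 5% point of F(2, 12), as given on the slide

f_stat = overall_f(SSR, SSE, N, K)
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
```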
Test for Significance: Individual Variables
• Show If Y Depends Linearly on a Single $X_j$ Individually While Holding the Effects of Other X’s Fixed
• Use t Test Statistic
• Hypotheses:
– H0: $\beta_j = 0$ (No linear relationship)
– H1: $\beta_j \neq 0$ (Linear relationship between $X_j$ and Y)
t Test Statistic
Excel Output: Example
             Coefficients    Standard Error   t Stat      P-value
Intercept    562.1510092     21.09310433      26.65094    4.77868E-12
Temp         -5.436580588    0.336216167      -16.1699    1.64178E-09
Insulation   -20.01232067    2.342505227      -8.543127   1.90731E-06
• t test statistic for $X_1$ (Temperature): -16.1699
• t test statistic for $X_2$ (Insulation): -8.543127
$$t = \frac{b_i}{S_{b_i}}$$
t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
df = 12
Critical Values: $\pm 2.1788$
Test Statistic: t = -16.1699
Decision: Reject H0 at $\alpha = 0.05$.
Conclusion: There is evidence of a significant effect of temperature on oil consumption, holding constant the effect of insulation.
[Figure: t distribution with rejection regions of area .025 in each tail, below -2.1788 and above 2.1788.]
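Each t statistic is the coefficient divided by its standard error, compared in absolute value with the critical value for 12 degrees of freedom. The sketch below uses the figures from the Excel output; the critical value 2.1788 is taken from the slide, and the names are illustrative.

```python
T_CRITICAL = 2.1788  # two-tailed 5% critical value for df = 12, from the slide

# (coefficient, standard error) pairs from the Excel output
ESTIMATES = {
    "Temp": (-5.436580588, 0.336216167),
    "Insulation": (-20.01232067, 2.342505227),
}


def t_statistic(b: float, s_b: float) -> float:
    """t = b_j / S_bj for testing H0: beta_j = 0."""
    return b / s_b


t_stats = {name: t_statistic(b, s_b) for name, (b, s_b) in ESTIMATES.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}, reject H0: {significant[name]}")
```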
Venn Diagrams and Estimation of Regression Model
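[Figure: Venn diagram of three overlapping circles: Oil, Temp, and Insulation.]
• Overlap of Oil with Temp alone: only this information is used in the estimation of $\beta_1$
• Overlap of Oil with Insulation alone: only this information is used in the estimation of $\beta_2$
• Overlap of all three circles: this information is NOT used in the estimation of $\beta_1$ nor $\beta_2$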
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$$b_1 \pm t_{n-p-1} S_{b_1}$$
Here p is the number of explanatory variables (written k elsewhere in these slides).
             Coefficients   Lower 95%      Upper 95%
Intercept    562.151009     516.1930837    608.108935
Temp         -5.4365806     -6.169132673   -4.7040285
Insulation   -20.012321     -25.11620102   -14.90844
$$-6.169 \le \beta_1 \le -4.704$$
We are 95% confident that the average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1 °F increase in temperature, holding insulation constant.
We can also perform the test for the significance of individual variables, H0: $\beta_1 = 0$ vs. H1: $\beta_1 \neq 0$, using this confidence interval.
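The interval for the temperature slope can be recomputed from the coefficient, its standard error, and the critical t value for 12 degrees of freedom (2.1788, as used in the t test slide). A minimal sketch with a hypothetical helper name:

```python
def slope_confidence_interval(b: float, s_b: float, t_crit: float):
    """Return (lower, upper) limits of b +/- t_crit * S_b."""
    margin = t_crit * s_b
    return b - margin, b + margin


B1 = -5.436580588     # temperature coefficient from the Excel output
S_B1 = 0.336216167    # its standard error
T_CRITICAL = 2.1788   # two-tailed 5% critical value for df = 12

lower, upper = slope_confidence_interval(B1, S_B1, T_CRITICAL)
zero_inside = lower <= 0.0 <= upper  # if False, H0: beta_1 = 0 is rejected at the 5% level

print(f"95% CI for beta_1: ({lower:.3f}, {upper:.3f}); contains 0: {zero_inside}")
```

Because the interval excludes zero, it leads to the same decision as the two-tailed t test at $\alpha = 0.05$.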
Contribution of a Single Independent Variable $X_j$
• Let $X_j$ Be the Independent Variable of Interest
• $SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$
– Measures the additional contribution of $X_j$ in explaining the total variation in Y with the inclusion of all the remaining independent variables
Contribution of a Single Independent Variable $X_k$
$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$$
• $SSR(X_1, X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$
Measures the additional contribution of $X_1$ in explaining Y with the inclusion of $X_2$ and $X_3$.
Coefficient of Partial Determination of $X_j$
• $r^2_{Yj \cdot (\text{all others})} = \dfrac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$
• Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other independent variables
Coefficient of Partial Determination for $X_j$ (continued)
Example: Model with two independent variables
$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$
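For the heating oil example, the coefficient of partial determination for temperature given insulation can be sketched from sums of squares that appear in the ANOVA tables of the partial F test example in these slides: SSR for both variables (228014.6263), SSR for insulation alone (51076.47), and SST (236135.2293). The function name is an assumption made for illustration.

```python
def partial_r_squared(ssr_all: float, ssr_others: float, sst: float) -> float:
    """Coefficient of partial determination for X_j given the other variables.

    ssr_all    -- SSR of the model with every variable
    ssr_others -- SSR of the model without X_j
    sst        -- total sum of squares
    """
    ssr_partial = ssr_all - ssr_others              # SSR(X_j | all others)
    return ssr_partial / (sst - ssr_all + ssr_partial)


SSR_X1_X2 = 228014.6263   # model with Temp and Insulation
SSR_X2 = 51076.47         # model with Insulation only
SST = 236135.2293

r2_y1_2 = partial_r_squared(SSR_X1_X2, SSR_X2, SST)
print(f"r^2(Y1.2) = {r2_y1_2:.4f}")
```

The result, about 0.956, says that temperature explains roughly 95.6% of the variation in oil use that insulation alone leaves unexplained.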
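Venn Diagrams and Coefficient of Partial Determination for $X_j$
[Figure: Venn diagram of three overlapping circles (Oil, Temp, and Insulation) with the region $SSR(X_1 \mid X_2)$ marked.]
$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$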
Coefficient of Partial Determination in PHStat
• PHStat | Regression | Multiple Regression …
– Check the “Coefficient of Partial Determination” box
• Excel spreadsheet for the heating oil example
Contribution of a Subset of Independent Variables
• Let $X_s$ Be the Subset of Independent Variables of Interest
– $SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
– Measures the contribution of the subset $X_s$ in explaining SST with the inclusion of the remaining independent variables
Contribution of a Subset of Independent Variables: Example
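Let $X_s$ be $X_1$ and $X_3$
$$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$$
• $SSR(X_1, X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$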
Testing Portions of Model
• Examines the Contribution of a Subset Xs of Explanatory Variables to the Relationship with Y
• Null Hypothesis:
– Variables in the subset do not improve the model significantly when all other variables are included
• Alternative Hypothesis:
– At least one variable in the subset is significant when all other variables are included
Testing Portions of Model (continued)
• One-Tailed Rejection Region
• Requires Comparison of Two Regressions
– One regression includes everything
– Another regression includes everything except the portion to be tested
Partial F Test for the Contribution of a Subset of X Variables
• Hypotheses:
– H0: Variables $X_s$ do not significantly improve the model given all other variables included
– H1: Variables $X_s$ significantly improve the model given all others included
• Test Statistic:
– $F = \dfrac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$
– with df = m and (n-k-1)
– m = # of variables in the subset $X_s$
Partial F Test for the Contribution of a Single $X_j$
• Hypotheses:
– H0: Variable $X_j$ does not significantly improve the model given all others included
– H1: Variable $X_j$ significantly improves the model given all others included
• Test Statistic:
– $F = \dfrac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$
– with df = 1 and (n-k-1)
– m = 1 here
Testing Portions of Model: Example
Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.
Testing Portions of Model: Example
H0: $X_1$ (temperature) does not improve model with $X_2$ (insulation) included
H1: $X_1$ does improve model
$\alpha = .05$, df = 1 and 12
Critical Value = 4.75
ANOVA (for $X_1$ and $X_2$)
             SS            MS
Regression   228014.6263   114007.313
Residual     8120.603016   676.716918
Total        236135.2293
ANOVA (for $X_2$)
             SS
Regression   51076.47
Residual     185058.8
Total        236135.2
$$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$$
Conclusion: Reject H0; $X_1$ does improve model.
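The partial F statistic in this example needs only three numbers from the two ANOVA tables. The sketch below recomputes it; the critical value 4.75 is taken from the slide, and `partial_f` is a hypothetical helper name.

```python
def partial_f(ssr_all: float, ssr_reduced: float, mse_all: float, m: int = 1) -> float:
    """Partial F statistic for the m variables dropped from the full model.

    ssr_all     -- SSR of the full model
    ssr_reduced -- SSR of the model without the tested variables
    mse_all     -- MSE of the full model
    """
    return ((ssr_all - ssr_reduced) / m) / mse_all


SSR_X1_X2 = 228014.6263   # full model: Temp and Insulation
SSR_X2 = 51076.47         # reduced model: Insulation only
MSE_X1_X2 = 676.716918    # MSE of the full model

F_CRITICAL = 4.75         # upper 5% point of F(1, 12), as given on the slide

f_partial = partial_f(SSR_X1_X2, SSR_X2, MSE_X1_X2, m=1)
reject_h0 = f_partial > F_CRITICAL
print(f"Partial F = {f_partial:.2f}, reject H0: {reject_h0}")
```

When a single variable is tested (m = 1), the partial F statistic equals the square of that variable's t statistic: $(-16.1699)^2 \approx 261.47$.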