Date posted: 30-Dec-2015
Uploaded by: todd-caldwell
Slide 1
© 2016 Cengage Learning. All Rights Reserved.
Multiple Regression Model

The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, \ldots, x_p$ and an error term is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters, and $\epsilon$ is a random variable called the error term.
Chapter 13(a) - Multiple Regression
Slide 2
Multiple Regression Equation and Estimated Multiple Regression Equation

Multiple Regression Equation

The equation that describes how the mean value of $y$ is related to $x_1, x_2, \ldots, x_p$ is:

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Estimated Multiple Regression Equation

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \ldots, b_p$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.
Slide 3
Estimation Process

Multiple Regression Model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$$
Multiple Regression Equation:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
Unknown parameters are $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

Sample Data: values of $x_1, x_2, \ldots, x_p$ and $y$ for each sampled item.

Estimated Multiple Regression Equation:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$
Sample statistics are $b_0, b_1, b_2, \ldots, b_p$.

$b_0, b_1, b_2, \ldots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.
Slide 4
Multiple Regression Equation

[Figure: two-variable model with axes $Y$, $X_1$ and $X_2$, showing the estimated equation $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$; $b_1$ is the slope for variable $X_1$ and $b_2$ is the slope for variable $X_2$.]
Slide 5
Least Squares Method

Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

Computation of Coefficient Values

The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
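The slides leave the matrix algebra to software. The following minimal Python sketch gives a rough idea of what such a package does. It assumes NumPy is available and solves the normal equations $(X^\top X)\,b = X^\top y$, where $X$ is the data matrix with a leading column of ones. The function name `least_squares_coefficients` and the small noiseless data set are hypothetical. They were chosen so that the correct answer is known in advance.

```python
import numpy as np


def least_squares_coefficients(X, y):
    """Return [b0, b1, ..., bp] that minimize the sum of squared residuals.

    X holds the independent-variable values (n rows, p columns); a column of
    ones is prepended so the intercept b0 is estimated too.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # n x (p + 1) matrix
    # Normal equations: (X'X) b = X'y, solved without forming an explicit inverse
    return np.linalg.solve(design.T @ design, design.T @ y)


# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2 (no error term),
# so least squares should recover b0 = 2, b1 = 3, b2 = -1.
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 6]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
b = least_squares_coefficients(X, y)
print(np.round(b, 6))
```

Forming $X^\top X$ can be numerically fragile when the independent variables are highly correlated. In practice, `numpy.linalg.lstsq`, which uses an SVD-based solver, is generally the safer choice.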
Slide 6
Multiple Regression Model
Example: Programmer Salary Survey

A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test.

The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.
Slide 7
Multiple Regression Model

Exper.  Test   Salary      Exper.  Test   Salary
(Yrs.)  Score  ($000s)     (Yrs.)  Score  ($000s)
  4      78     24.0         9      88     38.0
  7     100     43.0         2      73     26.6
  1      86     23.7        10      75     36.2
  5      82     34.3         5      81     31.6
  8      86     35.8         6      74     29.0
 10      84     38.0         8      87     34.0
  0      75     22.2         4      79     30.1
  1      80     23.1         6      94     33.9
  6      83     30.0         3      70     28.2
  6      91     33.0         3      89     30.0
Slide 8
Multiple Regression Model

Suppose we believe that salary ($y$) is related to the years of experience ($x_1$) and the score on the programmer aptitude test ($x_2$) by the following regression model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

where:
$y$ = annual salary ($000)
$x_1$ = years of experience
$x_2$ = score on programmer aptitude test
Slide 9
Solving for the Estimates of $\beta_0, \beta_1, \beta_2$

Input Data
$x_1$   $x_2$   $y$
  4      78     24
  7     100     43
  .       .      .
  3      89     30

The input data are fed to a computer package for solving multiple regression problems, which returns the least squares output: $b_0$, $b_1$, $b_2$, $R^2$, etc.
Slide 10
Solving for the Estimates of $\beta_0, \beta_1, \beta_2$
Excel's Regression Equation Output

              Coeffic.   Std. Err.   t Stat   P-value
Intercept     3.17394    6.15607     0.5156   0.61279
Experience    1.4039     0.19857     7.0702   1.9E-06
Test Score    0.25089    0.07735     3.2433   0.00478

Note: Columns F-I are not shown.

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)

Note: Predicted salary will be in thousands of dollars.
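The following sketch checks the Excel output by refitting the model with NumPy's `numpy.linalg.lstsq`. It uses the 20 observations transcribed from the data slide. This is an illustration based on that transcription, not the textbook's own procedure. Up to rounding, the estimates should agree with the Coeffic. column above.

```python
import numpy as np

# Programmer Salary Survey data, transcribed from the data slide
experience = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = np.array([24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
                   38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0])  # $1000s

# Columns: intercept, x1 = years of experience, x2 = aptitude test score
X = np.column_stack([np.ones(len(salary)), experience, score])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)  # least squares estimates
b0, b1, b2 = b
print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.3f}(SCORE)")

# Predicted salary ($1000s) for the first programmer in the sample: 4 years, score 78
print(round(b0 + b1 * 4 + b2 * 78, 3))
```

The last line evaluates the fitted equation for the first programmer in the sample (4 years of experience, test score 78). That programmer's actual salary was 24.0 ($1000s).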
Slide 11
Interpreting the Coefficients

In multiple regression analysis, we interpret each regression coefficient as follows: $b_i$ represents an estimate of the change in $y$ corresponding to a 1-unit increase in $x_i$ when all other independent variables are held constant.

$b_1 = 1.404$: Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).

$b_2 = 0.251$: Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).
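The "held constant" interpretation can be checked numerically. With the other variable fixed, a 1-unit change in $x_i$ changes $\hat{y}$ by exactly $b_i$. This sketch uses the rounded equation from the Excel slide. The function name `predicted_salary` and the held values (score 80, experience 5) are arbitrary choices for illustration.

```python
# Estimated regression equation from Excel's output, coefficients rounded to 3 decimals
B0, B1, B2 = 3.174, 1.404, 0.251


def predicted_salary(exper, score):
    """Predicted annual salary in $1000s."""
    return B0 + B1 * exper + B2 * score


# One more year of experience, with the test score held at 80
delta_exper = predicted_salary(6, 80) - predicted_salary(5, 80)
# One more test point, with experience held at 5 years
delta_score = predicted_salary(5, 81) - predicted_salary(5, 80)
print(round(delta_exper, 3), round(delta_score, 3))  # 1.404 0.251, i.e. $1,404 and $251
```

Any other held values give the same differences, because this equation is linear with no interaction terms.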
Slide 12
Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
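Here is a short NumPy sketch of the decomposition for the programmer salary data, using the values transcribed from the data slide. It fits the least squares equation and computes the three sums of squares from the fitted values. It then confirms that SSR + SSE reproduces SST. The identity relies on the fit being least squares with an intercept.

```python
import numpy as np

experience = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = np.array([24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
                   38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0])

X = np.column_stack([np.ones(len(salary)), experience, score])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
y_hat = X @ b            # fitted values
y_bar = salary.mean()

sst = np.sum((salary - y_bar) ** 2)   # total sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)    # sum of squares due to regression
sse = np.sum((salary - y_hat) ** 2)   # sum of squares due to error
print(f"SST = {sst:.4f}  SSR = {ssr:.4f}  SSE = {sse:.4f}  SSR + SSE = {ssr + sse:.4f}")
```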
Slide 13
Multiple Coefficient of Determination
Excel's ANOVA Output

ANOVA
              df   SS         MS         F          Significance F
Regression     2   500.3285   250.1643   42.76013   2.32774E-07
Residual      17   99.45697   5.85041
Total         19   599.7855

$R^2$ = SSR/SST = 500.3285/599.7855 = .83418
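The MS and F columns and $R^2$ follow from the SS and df columns by simple arithmetic. Below is a plain-Python sketch that uses the values from the ANOVA output above. The values $n = 20$ and $p = 2$ come from the example.

```python
# Sums of squares and sample size from Excel's ANOVA output
n, p = 20, 2                 # observations, independent variables
ssr, sse = 500.3285, 99.45697
sst = 599.7855

msr = ssr / p                # MS column, Regression row (df = p)
mse = sse / (n - p - 1)      # MS column, Residual row (df = n - p - 1)
f_stat = msr / mse           # F column
r_squared = ssr / sst        # multiple coefficient of determination
print(f"MSR = {msr:.4f}, MSE = {mse:.5f}, F = {f_stat:.5f}, R^2 = {r_squared:.5f}")
```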
Slide 14
Adjusted Multiple Coefficient of Determination

$$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$

$$R_a^2 = 1 - (1 - .834179)\,\frac{20 - 1}{20 - 2 - 1} = .814671$$
The coefficient of determination $R^2$ is the proportion of the variability in a data set that is accounted for by a statistical model. In this definition, variability is measured by sums of squares: $R^2$ = SSR/SST.

Adjusted R-square is a modification of R-square that adjusts for the number of terms in a model. R-square never decreases when a new term is added to a model, but adjusted R-square increases only if the new term improves the model more than would be expected by chance.
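The adjusted $R^2$ formula translates directly into a small Python function. The second call uses the $R^2$ reported later in the chapter for the model that adds the graduate-degree dummy ($p = 3$). The function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted multiple coefficient of determination.

    n is the number of observations and p the number of independent variables.
    """
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Salary model with experience and test score: n = 20, p = 2
print(round(adjusted_r_squared(0.834179, 20, 2), 6))
# Same data after adding the graduate-degree dummy (R^2 = .846796, p = 3)
print(round(adjusted_r_squared(0.846796, 20, 3), 6))
```

The second result agrees with the Adjusted R Square that Excel reports for that model.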
Slide 15
Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.

If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.

Every attempt should be made to avoid including independent variables that are highly correlated.
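A quick way to apply the |r| > .7 rule of thumb is to compute the sample correlation between each pair of independent variables. This plain-Python sketch does so for experience and test score in the programmer salary data, using the values transcribed from the data slide. The names `pearson_r` and `CUTOFF` are illustrative. For these data, $r$ is roughly 0.34, well below the cutoff.

```python
import math


def pearson_r(a, b):
    """Sample correlation coefficient r between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


experience = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]

CUTOFF = 0.7  # the slide's rule of thumb, |r| > .7
r = pearson_r(experience, score)
print(f"r = {r:.3f}; highly correlated: {abs(r) > CUTOFF}")
```

This checks only pairs of variables, which is what the rule of thumb on the slide describes.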
Slide 16
Categorical Independent Variables

In many situations we must work with categorical independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, $x_2$ might represent gender where $x_2 = 0$ indicates male and $x_2 = 1$ indicates female.

In this case, $x_2$ is called a dummy or indicator variable.
Slide 17
Categorical Independent Variables
Example: Programmer Salary Survey

As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems.

The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($000) for each of the 20 sampled programmers are shown on the next slide.
Slide 18
Categorical Independent Variables

Exper.  Test          Salary      Exper.  Test          Salary
(Yrs.)  Score  Degr.  ($000s)     (Yrs.)  Score  Degr.  ($000s)
  4      78    No      24.0         9      88    Yes     38.0
  7     100    Yes     43.0         2      73    No      26.6
  1      86    No      23.7        10      75    Yes     36.2
  5      82    Yes     34.3         5      81    No      31.6
  8      86    Yes     35.8         6      74    No      29.0
 10      84    Yes     38.0         8      87    Yes     34.0
  0      75    No      22.2         4      79    No      30.1
  1      80    No      23.1         6      94    Yes     33.9
  6      83    No      30.0         3      70    No      28.2
  6      91    Yes     33.0         3      89    No      30.0
Slide 19
Estimated Regression Equation

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

where:
$y$ = annual salary ($1000)
$x_1$ = years of experience
$x_2$ = score on programmer aptitude test
$x_3$ = 0 if individual does not have a graduate degree; 1 if individual does have a graduate degree

$x_3$ is a dummy variable.
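To reproduce the estimates for this equation, code the Yes/No degree column as the 0/1 dummy $x_3$ and add it to the design matrix. The NumPy sketch below assumes the data as transcribed from the data slide. Up to rounding, the estimates should match Excel's Coeffic. column for this model (7.94485, 1.14758, 0.19694, 2.28042).

```python
import numpy as np

experience = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
degree = ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes",
          "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No", "No"]
salary = np.array([24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
                   38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0])

# Dummy variable x3: 1 if the programmer has a graduate degree, 0 otherwise
x3 = [1 if d == "Yes" else 0 for d in degree]

# Columns: intercept, x1 = experience, x2 = test score, x3 = degree dummy
X = np.column_stack([np.ones(len(salary)), experience, score, x3])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
print("b0, b1, b2, b3 =", np.round(b, 5))
```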
Slide 20
Categorical Independent Variables
Excel's Regression Statistics

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.920215239
R Square             0.846796085
Adjusted R Square    0.818070351
Standard Error       2.396475101
Observations         20
Slide 21
Categorical Independent Variables
Excel's ANOVA Output

ANOVA
              df   SS         MS         F          Significance F
Regression     3   507.896    169.2987   29.47866   9.41675E-07
Residual      16   91.88949   5.743093
Total         19   599.7855
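Several of the Regression Statistics on the previous slide follow directly from this ANOVA table. R Square is SSR/SST, Multiple R is its square root, and the Standard Error is $\sqrt{\text{MSE}}$. The plain-Python sketch below carries out that arithmetic with $n = 20$ and $p = 3$.

```python
import math

# From Excel's ANOVA output for the model with the graduate-degree dummy
n, p = 20, 3
ssr, sse, sst = 507.896, 91.88949, 599.7855

r_squared = ssr / sst                 # R Square
multiple_r = math.sqrt(r_squared)     # Multiple R
mse = sse / (n - p - 1)               # Residual MS, df = n - p - 1 = 16
standard_error = math.sqrt(mse)       # Standard Error
f_stat = (ssr / p) / mse              # F = MSR / MSE
print(f"R^2 = {r_squared:.6f}, Multiple R = {multiple_r:.6f}, "
      f"s = {standard_error:.6f}, F = {f_stat:.5f}")
```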
Slide 22
Categorical Independent Variables
Excel's Regression Equation Output

               Coeffic.   Std. Err.   t Stat   P-value
Intercept      7.94485    7.3808      1.0764   0.2977
Experience     1.14758    0.2976      3.8561   0.0014
Test Score     0.19694    0.0899      2.1905   0.04364
Grad. Degr.    2.28042    1.98661     1.1479   0.26789

Not significant: Grad. Degr. (p-value = 0.26789).

               Coeffic.   Low. 95%    Up. 95%   Low. 95.0%   Up. 95.0%
Intercept      7.94485   -7.701739    23.5914   -7.7017385   23.591436
Experience     1.14758    0.516695    1.77847    0.51669483   1.7784686
Test Score     0.19694    0.00635     0.38752    0.00634964   0.3875243
Grad. Degr.    2.28042   -1.931002    6.49185   -1.9310017    6.4918494
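Each 95% interval in the second table is $b_i \pm t_{.025}\, s_{b_i}$ with $n - p - 1 = 16$ degrees of freedom. The intervals can therefore be rebuilt from the Coeffic. and Std. Err. columns. The sketch below hard-codes $t_{.025} = 2.1199$ from a t table rather than computing it. Because the standard errors are rounded in the output, the bounds agree with Excel's only to about three or four decimals.

```python
# Coefficients and standard errors copied from Excel's output (model with the dummy)
estimates = {
    "Intercept": (7.94485, 7.3808),
    "Experience": (1.14758, 0.2976),
    "Test Score": (0.19694, 0.0899),
    "Grad. Degr.": (2.28042, 1.98661),
}
# t value with .025 in the upper tail and 20 - 3 - 1 = 16 degrees of freedom (t table)
T_025 = 2.1199

intervals = {name: (b - T_025 * se, b + T_025 * se) for name, (b, se) in estimates.items()}

for name, (low, high) in intervals.items():
    # An interval containing 0 means H0: beta_i = 0 is not rejected at the .05 level
    verdict = "not significant" if low <= 0 <= high else "significant"
    print(f"{name:<12} {low:10.5f} {high:10.5f}  {verdict}")
```

The Grad. Degr. interval contains 0, which is consistent with its p-value of 0.26789.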
Slide 23
More Complex Categorical Variables

If a categorical variable has $k$ levels, $k - 1$ dummy variables are required, with each dummy variable being coded as 0 or 1.

For example, a variable with levels A, B, and C could be represented by $x_1$ and $x_2$ values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.

Care must be taken in defining and interpreting the dummy variables.
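The $k - 1$ coding rule can be sketched as a small Python function. The first listed level acts as the baseline and is coded (0, 0). The function name `dummy_code` and the choice of the first level as the baseline are illustrative assumptions. With levels A, B and C, they reproduce the coding on this slide.

```python
def dummy_code(value, levels):
    """Code one categorical value as k - 1 dummy variables (each 0 or 1).

    levels[0] is the baseline level and is coded as all zeros; every other
    level gets its own dummy variable.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(1 if value == level else 0 for level in levels[1:])


levels = ["A", "B", "C"]            # k = 3 levels, so k - 1 = 2 dummies (x1, x2)
for v in levels:
    print(v, dummy_code(v, levels))  # A (0, 0), B (1, 0), C (0, 1)
```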
Slide 24
More Complex Categorical Variables

For example, a variable indicating level of education could be represented by $x_1$ and $x_2$ values as follows: