EC114 Introduction to Quantitative Economics
17. Multiple Linear Regression I
Marcus Chambers
Department of Economics, University of Essex
28 February/01 March 2012
Outline
1 Introduction
2 Ordinary least squares with multiple regressors
3 The Classical Multiple Regression Model
Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, sections 13.1 and 13.2.
Introduction
So far we have been concerned with regression models involving a single explanatory variable, X, of the form

Yi = α + βXi + εi,  i = 1, . . . , n,

where α and β are the unknown population regression parameters and εi denotes a random disturbance.
We have also considered the set of Classical assumptions on X and ε that imply that the ordinary least squares (OLS) estimators of α and β, denoted a and b, have good sampling properties.
In particular, the OLS estimators are: best linear unbiased estimators (BLUE); efficient (under normality).
In addition to unbiasedness, the OLS estimators have the smallest variance among linear unbiased estimators, and if we assume normality they have the smallest variance among all unbiased estimators.
The OLS estimators therefore provide a good basis for making inferences about α and β.
For example, we can use the results that

(a − α)/s_a ~ t_{n−2}  and  (b − β)/s_b ~ t_{n−2},

where s_a and s_b are the estimated standard errors of a and b, respectively, to conduct hypothesis tests using the t-distribution.
However, many relationships that we study in Economics are concerned with more than two variables, Y and X.
For example, the demand for a good (Qd) may depend not only on its own price (P1) but also on consumers' income (M) and the prices of other goods (substitutes and complements) (P2, P3, . . .), e.g.

Qd = f(P1, M, P2, P3, . . .).
We therefore need to extend our regression model to
include additional explanatory variables (regressors) while,
at the same time, keeping the desirable properties of the
OLS estimators in the two-variable model.
Fortunately it is possible to apply OLS to regressions with
multiple explanatory variables and the optimality properties
carry over under suitable assumptions.
Ordinary least squares with multiple regressors
We begin by assuming that a linear relationship exists between a dependent variable Y and k − 1 explanatory variables, X2, X3, . . . , Xk:

Y = β1 + β2X2 + β3X3 + . . . + βkXk + ε,

where ε is a random disturbance and the βj (j = 1, . . . , k) are constants.
Note that it is common to denote the first explanatory variable by X2 rather than X1.
In fact, it is convenient to interpret the intercept β1 as the coefficient on a variable X1 that always takes the value 1.
Assuming that E(ε) = 0 and taking the X values as given, we obtain

E(Y) = β1 + β2X2 + β3X3 + . . . + βkXk;

this is the population regression equation.
Each coefficient βj represents the effect on E(Y) of a unit change in Xj, holding all other X variables constant.
For example, β2 measures the change in E(Y) when X2 changes by one unit; it is the partial derivative ∂E(Y)/∂X2.
The βj coefficients are population parameters; their values are unknown and we aim to estimate them from a sample of observations on Y and the Xs.
We shall use the following notation:
Yi: observation i on the dependent variable Y;
Xji: observation i on explanatory variable Xj;
εi: the (unobserved) value of ε for observation i.
For example, observation 6 consists of Y6, X26, X36, . . . , Xk6; these values are related by

Y6 = β1 + β2X26 + β3X36 + . . . + βkXk6 + ε6.

For a general observation i we have

Yi = β1 + β2X2i + β3X3i + . . . + βkXki + εi,  i = 1, . . . , n.
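The general observation equation can be sketched as a small simulation. This is a minimal sketch with made-up parameter values and uniformly drawn regressors (none of these numbers come from the lecture); it only illustrates how the n observations are generated from the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                  # sample size

# Hypothetical population parameters: beta_1 (intercept), beta_2, beta_3
beta = np.array([1.0, 0.5, -0.2])

# X_1 is a column of ones, so beta_1 acts as the intercept
X = np.column_stack([np.ones(n),
                     rng.uniform(0.0, 10.0, n),   # X_2
                     rng.uniform(0.0, 10.0, n)])  # X_3
eps = rng.normal(0.0, 1.0, n)           # random disturbances, E(eps_i) = 0

# Y_i = beta_1 + beta_2*X_2i + beta_3*X_3i + eps_i, i = 1, ..., n
Y = X @ beta + eps
```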
Suppose we estimate β1, . . . , βk using b1, . . . , bk; for the time being we shall not specify how the estimates are obtained.
The sample regression equation corresponding to b1, . . . , bk is

Ŷi = b1 + b2X2i + b3X3i + . . . + bkXki,  i = 1, . . . , n;

the Ŷi (i = 1, . . . , n) are the fitted (or predicted) values of Y.
The difference between Yi and Ŷi is, as before, called a residual, and is denoted

ei = Yi − Ŷi,  i = 1, . . . , n.

We can also write

Yi = b1 + b2X2i + b3X3i + . . . + bkXki + ei,  i = 1, . . . , n.
How do we choose b1, . . . , bk?
The method of ordinary least squares (OLS) chooses the estimates so as to minimise the sum of squared residuals, S = Σ e²i.
We can express ei explicitly in terms of b1, . . . , bk:

ei = Yi − b1 − b2X2i − b3X3i − . . . − bkXki,  i = 1, . . . , n.

It follows that the objective function is

S = Σ_{i=1}^{n} e²i = Σ_{i=1}^{n} (Yi − b1 − b2X2i − b3X3i − . . . − bkXki)².
In order to minimise S with respect to b1, . . . , bk we must:
(i) partially differentiate S with respect to each bj;
(ii) set the k partial derivatives equal to zero and solve for b1, . . . , bk.
In step (i) we obtain

∂S/∂b1, ∂S/∂b2, . . . , ∂S/∂bk.

In step (ii) we equate to zero and solve the following k equations jointly:

∂S/∂b1 = 0, ∂S/∂b2 = 0, . . . , ∂S/∂bk = 0.

As k gets larger this becomes more and more difficult!
For an arbitrary value of k it is possible to write the solution compactly in terms of matrices and vectors.
In practice we rely on computer software to compute OLS estimates based on such a representation of the solution.
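As a sketch of that matrix representation: stacking the observations as Y = Xb + e, the k first-order conditions become the normal equations (X'X)b = X'Y, with solution b = (X'X)⁻¹X'Y. The data below are synthetic, with made-up parameters (not the lecture's money demand data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n),
                     rng.uniform(0.0, 10.0, n),
                     rng.uniform(0.0, 10.0, n)])
beta = np.array([2.0, 1.5, -0.5])       # hypothetical population parameters
Y = X @ beta + rng.normal(0.0, 0.1, n)  # small disturbances for illustration

# Solve the normal equations (X'X)b = X'Y rather than inverting X'X directly
b = np.linalg.solve(X.T @ X, X.T @ Y)

# The first-order conditions make the residuals orthogonal to every regressor
e = Y - X @ b
orthogonality = X.T @ e                 # numerically zero
```

Solving the linear system is what regression software effectively does, which is why the "more and more difficult" hand calculation is never needed in practice.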
Example. We return to the money demand example first encountered in Lecture 11.
Our two-variable regression of money stock (Y) on GDP (X2) yielded

Ŷ = 0.0212 + 0.1749X2,

based on our sample of 30 countries in 1985.
Suppose we also add the rate of interest variable, X3, to the regression; we obtain the following output in Stata:
. regress m g ir

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =   47.03
       Model |  20.5135791     2  10.2567896           Prob > F      =  0.0000
    Residual |  5.88865732    27  .218098419           R-squared     =  0.7770
-------------+------------------------------           Adj R-squared =  0.7604
       Total |  26.4022364    29  .910421946           Root MSE      =  .46701

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |    .172615   .0183198     9.42   0.000     .1350258    .2102042
          ir |  -.0006758   .0008844    -0.76   0.451    -.0024904    .0011388
       _cons |   .0569582    .125639     0.45   0.654    -.2008317    .3147481
------------------------------------------------------------------------------
The regression results, including standard errors in parentheses, can be represented as:

Ŷ = 0.0570 + 0.1726X2 − 0.000676X3,
     (0.1256)  (0.0183)   (0.000884)

with R² = 0.777.
The magnitudes of the estimated coefficients differ substantially, with the coefficient on X3 appearing to be very small.
But this reflects the relative units of measurement of X3, which is measured as, for example, 16% rather than 0.16.
If we had used the latter units of measurement (i.e. dividing all observations on X3 by 100), then the estimated coefficient would have been 100 times larger.
Remember that statistical significance of a variable is tested using a t-test and is not judged by the magnitude of the estimated coefficient!
If we add another regressor, X4 (the rate of price inflation), we obtain:
. regress m g ir pi

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  3,    26) =   30.70
       Model |  20.5893701     3  6.86312337           Prob > F      =  0.0000
    Residual |  5.81286631    26  .223571781           R-squared     =  0.7798
-------------+------------------------------           Adj R-squared =  0.7544
       Total |  26.4022364    29  .910421946           Root MSE      =  .47283

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .1703745   .0189433     8.99   0.000     .1314361    .2093129
          ir |  -.0001693   .0012483    -0.14   0.893    -.0027353    .0023967
          pi |   -.002197   .0037733    -0.58   0.565    -.0099531    .0055592
       _cons |   .0893538   .1388419     0.64   0.525    -.1960399    .3747475
------------------------------------------------------------------------------
These regression results, including standard errors in parentheses, can be represented as:

Ŷ = 0.0894 + 0.1704X2 − 0.000169X3 − 0.0022X4,
     (0.1388)  (0.0189)   (0.001248)   (0.0038)

with R² = 0.7798.
We could also carry out the estimations using logarithms of
the variables; for example
. regress lm lg lir

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =  175.53
       Model |  59.8192409     2  29.9096204           Prob > F      =  0.0000
    Residual |  4.60058503    27  .170392038           R-squared     =  0.9286
-------------+------------------------------           Adj R-squared =  0.9233
       Total |  64.4198259    29  2.22137331           Root MSE      =  .41279

------------------------------------------------------------------------------
          lm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          lg |   1.026927   .0570772    17.99   0.000     .9098146     1.14404
         lir |  -.2486999   .0671987    -3.70   0.001    -.3865802   -.1108195
       _cons |  -1.248211   .1991953    -6.27   0.000    -1.656926   -.8394964
------------------------------------------------------------------------------
The logarithmic results can be represented as

ln(Y) = −1.2482 + 1.0269 ln(X2) − 0.2487 ln(X3),
         (0.1992)   (0.0571)       (0.0672)

with R² = 0.9286 and where figures in parentheses are standard errors.
The estimated coefficients now have the interpretation of elasticities.
For example, the income elasticity of the demand for money is estimated to be 1.0269, while the interest rate elasticity of money demand is estimated as −0.2487.
However, in order to conduct formal hypothesis tests, we need to know the sampling properties of the OLS estimators, and to do that we need to make some assumptions. . .
The Classical Multiple Regression Model
Just as in the two-variable regression model, the OLS estimators in the multiple regression model are subject to sampling variability.
The properties of the OLS estimators and their distributions depend on the conditions under which they are obtained, i.e. the assumptions made.
We have already studied the assumptions of the two-variable Classical model, and the Classical multiple regression model is basically a straightforward extension of the two-variable case.
The assumptions we need to make concern the explanatory variables X2, . . . , Xk and the error term ε.
As before we shall focus on the small-sample properties of the estimators and shall ignore large-sample (n → ∞) properties.
The assumptions concerning the regressors are as follows:

Assumptions concerning the explanatory variables
IA (non-random): X2, . . . , Xk are non-stochastic;
IB (fixed): the values of X2, . . . , Xk are fixed in repeated samples;
ID (no collinearity): there exist no exact linear relationships between the sample values of any two or more of the explanatory variables.

Note that Assumption IC, used in Thomas, is a large-sample assumption and has been omitted here.
Assumptions IA (non-random) and IB (fixed) are identical to the two-variable model but are now applied to all regressors.
They mean that X2, . . . , Xk are not random variables and that the same values would appear in each sample if it were possible to conduct repeated sampling.
The new assumption is ID (no collinearity), which has no equivalent in the two-variable model.
It is included in order to rule out the possibility of what is called perfect multicollinearity, which we will study in more detail in Lecture 18.
For now, simply note that the assumption rules out the possibility that, for example, X3i = 5 + 2X2i for all i.
If Assumption ID is violated then all estimation methods, including OLS, are infeasible.
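The slide's example of a violation can be checked numerically. A sketch assuming made-up sample values for X2: when X3i = 5 + 2X2i exactly, the regressor matrix loses a rank, X'X becomes singular, and the normal equations have no unique solution.

```python
import numpy as np

n = 30
x2 = np.arange(1.0, n + 1.0)       # made-up sample values for X_2
x3 = 5.0 + 2.0 * x2                # exact linear relationship: X_3i = 5 + 2*X_2i

X = np.column_stack([np.ones(n), x2, x3])

# With perfect multicollinearity X has rank 2 rather than 3, so X'X cannot
# be inverted and OLS estimates cannot be computed
rank = np.linalg.matrix_rank(X)
condition = np.linalg.cond(X.T @ X)  # astronomically large for a singular matrix
```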
The assumptions concerning ε are the same as in the two-variable model.
For completeness they are repeated below:

Assumptions concerning the disturbances
IIA (zero mean): E(εi) = 0 for all i;
IIB (constant variance): V(εi) = σ² = constant for all i;
IIC (zero covariance): Cov(εi, εj) = 0 for all i ≠ j;
IID (normality): each εi is normally distributed.

These assumptions govern the properties of the random part of the model.
Given that X2, . . . , Xk are fixed, they therefore govern the variation in Y in repeated samples.
Assumption IIA (zero mean) implies that the average effect of ε in repeated samples is zero and the value of Y, on average, is:

E(Yi) = E(β1 + β2X2i + . . . + βkXki + εi)
      = β1 + β2X2i + . . . + βkXki + E(εi)
      = β1 + β2X2i + . . . + βkXki,  i = 1, . . . , n,

because E(εi) = 0 under IIA.
Note that E(Yi) is not the same for each i but depends on X2i, . . . , Xki, which are not constant throughout the sample (if they were constant they would violate Assumption ID).
Recall that combining IIA (zero mean), IIB (constant variance) and IID (normality) gives

εi ~ N(0, σ²),  i = 1, . . . , n.

Note that

Yi − E(Yi) = Yi − β1 − β2X2i − . . . − βkXki = εi;

this implies that

V(Yi) = E(Yi − E(Yi))² = E(ε²i) = V(εi) = σ²,

which in turn implies that

Yi ~ N(β1 + β2X2i + . . . + βkXki, σ²),  i = 1, . . . , n.
The implications of the Assumptions for the OLS estimators can be summarised as follows:

Property       Assumptions
Linearity      IA, IB, ID
Unbiasedness   IA, IB, ID, IIA
BLUness        IA, IB, ID, IIA, IIB, IIC
Efficiency     IA, IB, ID, IIA, IIB, IIC, IID
Normality      IA, IB, ID, IIA, IIB, IIC, IID

These are the same as in the two-variable model except that we now require Assumption ID (no collinearity) in all cases.
The unbiasedness and normality properties imply that

bj ~ N(βj, σ²_bj),  j = 1, . . . , k,

which can be used as a basis for inference.
For k > 2 the variances σ²_bj are complicated functions of the regressors, but all are proportional to σ² = V(ε).
In order to conduct inference we therefore need to estimate σ².
A generalisation of the estimator in the two-variable model is used for this, and is given by

s² = Σ e²i / (n − k);

it is an unbiased estimator, i.e. E(s²) = σ².
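This estimator can be checked against the first Stata output above: there n = 30 and k = 3 (intercept plus two slopes), and the residual sum of squares is 5.88865732, so s² should reproduce Stata's residual MS and its square root should reproduce the reported Root MSE.

```python
import math

rss = 5.88865732   # residual SS from the regression of m on g and ir above
n, k = 30, 3       # 30 countries; k = 3 estimated coefficients (incl. intercept)

s2 = rss / (n - k)         # s^2 = sum of squared residuals / (n - k)
root_mse = math.sqrt(s2)   # Stata's "Root MSE"
# s2 is 0.218098... (Stata's residual MS) and root_mse is 0.46701
```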
Note that the denominator of s² involves n − k.
This is because we have had to estimate k parameters (β1, . . . , βk) in order to compute the residuals e1, . . . , en and have therefore lost k degrees of freedom.
If we use s² in the (complicated) formulae for the estimator variances we obtain the estimated variances s²_bj (j = 1, . . . , k).
It follows that, for inference, we then use Student's t-distribution instead of the normal distribution:

(bj − βj)/s_bj ~ t_{n−k}.
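In matrix form the "complicated formulae" are compact: the estimated covariance matrix of b is s²(X'X)⁻¹, and the s_bj are the square roots of its diagonal. A sketch on synthetic data with made-up parameters (not the lecture's data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3
X = np.column_stack([np.ones(n),
                     rng.uniform(0.0, 10.0, n),
                     rng.uniform(0.0, 10.0, n)])
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0.0, 1.0, n)  # hypothetical betas

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
s2 = (e @ e) / (n - k)                 # s^2 = sum(e_i^2)/(n - k)

cov_b = s2 * np.linalg.inv(X.T @ X)    # estimated covariance matrix of b
se_b = np.sqrt(np.diag(cov_b))         # s_{b_j}: estimated standard errors
t_ratios = b / se_b                    # each is t_{n-k} distributed under beta_j = 0
```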
So, to test the significance of a regressor Xj, i.e. to test

H0: βj = 0 against HA: βj ≠ 0,

we can use the test statistic

TS = bj/s_bj ~ t_{n−k} under H0.

Let t_{0.025} denote the 5% critical value from the t_{n−k} distribution that puts 2.5% of the distribution into each tail.
As before the decision rule is: if |TS| > t_{0.025} reject H0; if |TS| < t_{0.025} do not reject H0.
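The decision rule can be written down directly. A small sketch (the function name is mine, not the lecture's), using the 5% two-tail critical value 2.056 of the t26 distribution quoted in the money demand example:

```python
def t_test_decision(ts, t_crit=2.056):
    """Two-tail t-test decision rule: reject H0: beta_j = 0 iff |TS| > t_crit.

    The default critical value 2.056 is the 5% two-tail value of t_26,
    the degrees of freedom in the lecture's money demand example.
    """
    return "reject H0" if abs(ts) > t_crit else "do not reject H0"

# t-ratios from the three-regressor money demand output above
decision_gdp = t_test_decision(8.99)        # GDP coefficient
decision_interest = t_test_decision(-0.14)  # interest rate coefficient
```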
We can also use the t-distribution to form confidence intervals (CIs) for the unknown population parameters β1, . . . , βk.
With t_{0.025} as defined on the previous slide, a 95% CI for βj is of the form

bj ± t_{0.025} s_bj, or [bj − t_{0.025} s_bj, bj + t_{0.025} s_bj],

i.e. we are 95% confident that βj lies in this interval.
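As a check, this formula reproduces the 95% CI that Stata reports for the GDP coefficient in the three-regressor output above (b = .1703745, s_b = .0189433, 26 degrees of freedom, t_{0.025} = 2.056):

```python
b_g, se_g = 0.1703745, 0.0189433   # GDP coefficient and std. err. from Stata
t_crit = 2.056                     # 5% two-tail critical value of t_26

lower = b_g - t_crit * se_g
upper = b_g + t_crit * se_g
# (lower, upper) is approximately (0.1314, 0.2093), matching Stata's
# reported interval [.1314361, .2093129]
```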
Example. Let's return to the money demand data where we estimated the model

Y = β1 + β2X2 + β3X3 + β4X4 + ε,

where Y denotes money stock, X2 is GDP, X3 is the interest rate and X4 is the rate of price inflation.
Let's test the hypotheses β2 = 0 and β3 = 0 and find a 95% confidence interval for β4.
The regression output is as follows:
. regress m g ir pi

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  3,    26) =   30.70
       Model |  20.5893701     3  6.86312337           Prob > F      =  0.0000
    Residual |  5.81286631    26  .223571781           R-squared     =  0.7798
-------------+------------------------------           Adj R-squared =  0.7544
       Total |  26.4022364    29  .910421946           Root MSE      =  .47283

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .1703745   .0189433     8.99   0.000     .1314361    .2093129
          ir |  -.0001693   .0012483    -0.14   0.893    -.0027353    .0023967
          pi |   -.002197   .0037733    -0.58   0.565    -.0099531    .0055592
       _cons |   .0893538   .1388419     0.64   0.525    -.1960399    .3747475
------------------------------------------------------------------------------
Note that t-ratios for testing βj = 0 are given in the output above, as are 95% CIs, but we shall go through the calculations nonetheless!
To test H0: β2 = 0 against HA: β2 ≠ 0 we use

TS = b2/s_b2 = 0.1703745/0.0189433 = 8.99 ~ t26 under H0.

The 5% critical value for a two-tail test from the t26 distribution is 2.056.
As |TS| = 8.99 > 2.056 we reject H0 in favour of HA, i.e. there is evidence that β2 ≠ 0 and hence that GDP is a significant determinant of the money stock.
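The arithmetic can be verified with the full-precision Stata values (the rounded figures 0.1704/0.0189 would give 9.02, so Stata's 8.99 comes from the unrounded coefficient and standard error):

```python
b2, s_b2 = 0.1703745, 0.0189433   # unrounded values from the Stata output

ts = b2 / s_b2                    # TS = b_2 / s_{b_2}
# ts is approximately 8.994, which Stata rounds to 8.99
```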
Repeating the process for β3 we obtain

TS = −0.0001693/0.001248 = −0.14.

Here |TS| = 0.14 < 2.056 and hence we do not reject H0: β3 = 0, i.e. we cannot reject the hypothesis that the interest rate has no effect on the money stock.
A 95% CI for β4 is obtained as

b4 ± t_{0.025} s_b4 = −0.002197 ± (2.056 × 0.003773),

which gives −0.002197 ± 0.007757, or [−0.00995, 0.00556].
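Both calculations can be checked against the Stata output (using the unrounded standard errors reported there):

```python
# t-ratio for beta_3 (interest rate)
ts3 = -0.0001693 / 0.0012483
# |ts3| is about 0.14, below the critical value 2.056, so H0 is not rejected

# 95% CI for beta_4 (price inflation)
b4, s_b4 = -0.002197, 0.0037733
t_crit = 2.056
lower = b4 - t_crit * s_b4
upper = b4 + t_crit * s_b4
# (lower, upper) is approximately (-0.00995, 0.00556), as on the slide
```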
Summary
the Classical multiple linear regression model

Next week:
the problem of multicollinearity
making inferences