EC114 Introduction to Quantitative Economics
17. Multiple Linear Regression I
Marcus Chambers
Department of Economics, University of Essex
28 February/01 March 2012
Outline
1 Introduction
2 Ordinary least squares with multiple regressors
3 The Classical Multiple Regression Model
Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, sections 13.1 and 13.2.
Introduction
So far we have been concerned with regression models involving a single explanatory variable, X, of the form

Yi = α + βXi + εi,  i = 1, . . . , n,

where α and β are the unknown population regression parameters and εi denotes a random disturbance.
We have also considered the set of Classical assumptions on X and ε that imply that the ordinary least squares (OLS) estimators of α and β, denoted a and b, have good sampling properties.
In particular, the OLS estimators are: best linear unbiased estimators (BLUE); efficient (under normality).
In addition to unbiasedness, the OLS estimators have the smallest variance among linear unbiased estimators, and if we assume normality they have the smallest variance among all unbiased estimators.
The OLS estimators therefore provide a good basis for making inferences about α and β.
For example, we can use the results that

(a − α)/s_a ~ t_{n−2}  and  (b − β)/s_b ~ t_{n−2},

where s_a and s_b are the estimated standard errors of a and b, respectively, to conduct hypothesis tests using the t-distribution.
However, many relationships that we study in Economics are concerned with more than two variables, Y and X.
For example, the demand for a good (Qd) may depend not only on its own price (P1) but also on consumers' income (M) and the prices of other goods (substitutes and complements) (P2, P3, . . .), e.g.

Qd = f(P1, M, P2, P3, . . .).
We therefore need to extend our regression model to
include additional explanatory variables (regressors) while,
at the same time, keeping the desirable properties of the
OLS estimators in the two-variable model.
Fortunately it is possible to apply OLS to regressions with
multiple explanatory variables and the optimality properties
carry over under suitable assumptions.
Ordinary least squares with multiple regressors
We begin by assuming that a linear relationship exists between a dependent variable Y and k − 1 explanatory variables, X2, X3, . . . , Xk:

Y = β1 + β2X2 + β3X3 + . . . + βkXk + ε,

where ε is a random disturbance and the βj (j = 1, . . . , k) are constants.
Note that it is common to denote the first explanatory variable by X2 rather than X1.
In fact, it is convenient to interpret the intercept β1 as the coefficient on a variable X1 that always takes the value 1.
Assuming that E(ε) = 0 and taking the X values as given, we obtain

E(Y) = β1 + β2X2 + β3X3 + . . . + βkXk;

this is the population regression equation.
Each coefficient βj represents the effect on E(Y) of a unit change in Xj, holding all other X variables constant.
For example, β2 measures the change in E(Y) when X2 changes by one unit; it is the partial derivative ∂E(Y)/∂X2.
The βj coefficients are population parameters; their values are unknown and we aim to estimate them from a sample of observations on Y and the Xs.
We shall use the following notation:
Yi: observation i on the dependent variable Y;
Xji: observation i on explanatory variable Xj;
εi: the (unobserved) value of ε for observation i.
For example, observation 6 consists of Y6, X26, X36, . . . , Xk6; these values are related by

Y6 = β1 + β2X26 + β3X36 + . . . + βkXk6 + ε6.

For a general observation i we have

Yi = β1 + β2X2i + β3X3i + . . . + βkXki + εi,  i = 1, . . . , n.
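The general observation equation can be sketched as a small simulation. This is a minimal sketch with made-up parameter values and uniformly drawn regressors (none of these numbers come from the lecture); it only illustrates how the n observations are generated from the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                  # sample size

# Hypothetical population parameters: beta_1 (intercept), beta_2, beta_3
beta = np.array([1.0, 0.5, -0.2])

# X_1 is a column of ones, so beta_1 acts as the intercept
X = np.column_stack([np.ones(n),
                     rng.uniform(0.0, 10.0, n),   # X_2
                     rng.uniform(0.0, 10.0, n)])  # X_3
eps = rng.normal(0.0, 1.0, n)           # random disturbances, E(eps_i) = 0

# Y_i = beta_1 + beta_2*X_2i + beta_3*X_3i + eps_i, i = 1, ..., n
Y = X @ beta + eps
```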
Suppose we estimate β1, . . . , βk using b1, . . . , bk; for the time being we shall not specify how the estimates are obtained.
The sample regression equation corresponding to b1, . . . , bk is

Ŷi = b1 + b2X2i + b3X3i + . . . + bkXki,  i = 1, . . . , n;

the Ŷi (i = 1, . . . , n) are the fitted (or predicted) values of Y.
The difference between Yi and Ŷi is, as before, called a residual, and is denoted

ei = Yi − Ŷi,  i = 1, . . . , n.

We can also write

Yi = b1 + b2X2i + b3X3i + . . . + bkXki + ei,  i = 1, . . . , n.
How do we choose b1, . . . , bk?
The method of ordinary least squares (OLS) chooses the estimates so as to minimise the sum of squared residuals, S = Σ e²i.
We can express ei explicitly in terms of b1, . . . , bk:

ei = Yi − b1 − b2X2i − b3X3i − . . . − bkXki,  i = 1, . . . , n.

It follows that the objective function is

S = Σ_{i=1}^{n} e²i = Σ_{i=1}^{n} (Yi − b1 − b2X2i − b3X3i − . . . − bkXki)².
In order to minimise S with respect to b1, . . . , bk we must:
(i) partially differentiate S with respect to each bj;
(ii) set the k partial derivatives equal to zero and solve for b1, . . . , bk.
In step (i) we obtain

∂S/∂b1, ∂S/∂b2, . . . , ∂S/∂bk.

In step (ii) we equate to zero and solve the following k equations jointly:

∂S/∂b1 = 0, ∂S/∂b2 = 0, . . . , ∂S/∂bk = 0.

As k gets larger this becomes more and more difficult!
For an arbitrary value of k it is possible to write the solution compactly in terms of matrices and vectors.
In practice we rely on computer software to compute OLS estimates based on such a representation of the solution.
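As a sketch of that matrix representation: stacking the observations as Y = Xb + e, the k first-order conditions become the normal equations (X'X)b = X'Y, with solution b = (X'X)⁻¹X'Y. The data below are synthetic, with made-up parameters (not the lecture's money demand data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n),
                     rng.uniform(0.0, 10.0, n),
                     rng.uniform(0.0, 10.0, n)])
beta = np.array([2.0, 1.5, -0.5])       # hypothetical population parameters
Y = X @ beta + rng.normal(0.0, 0.1, n)  # small disturbances for illustration

# Solve the normal equations (X'X)b = X'Y rather than inverting X'X directly
b = np.linalg.solve(X.T @ X, X.T @ Y)

# The first-order conditions make the residuals orthogonal to every regressor
e = Y - X @ b
orthogonality = X.T @ e                 # numerically zero
```

Solving the linear system is what regression software effectively does, which is why the "more and more difficult" hand calculation is never needed in practice.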
Example. We return to the money demand example first encountered in Lecture 11.
Our two-variable regression of money stock (Y) on GDP (X2) yielded

Ŷ = 0.0212 + 0.1749X2,

based on our sample of 30 countries in 1985.
Suppose we also add the rate of interest variable, X3, to the regression; we obtain the following output in Stata:
. regress m g ir

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =   47.03
       Model |  20.5135791     2  10.2567896           Prob > F      =  0.0000
    Residual |  5.88865732    27  .218098419           R-squared     =  0.7770
-------------+------------------------------           Adj R-squared =  0.7604
       Total |  26.4022364    29  .910421946           Root MSE      =  .46701

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |    .172615   .0183198     9.42   0.000     .1350258    .2102042
          ir |  -.0006758   .0008844    -0.76   0.451    -.0024904    .0011388
       _cons |   .0569582    .125639     0.45   0.654    -.2008317    .3147481
------------------------------------------------------------------------------
The regression results, including standard errors in parentheses, can be represented as:

Ŷ = 0.0570 + 0.1726X2 − 0.000676X3,
     (0.1256)  (0.0183)   (0.000884)

with R² = 0.777.
The magnitudes of the estimated coefficients differ substantially, with the coefficient on X3 appearing to be very small.
But this reflects the relative units of measurement of X3, which is measured as, for example, 16% rather than 0.16.
If we had used the latter units of measurement (i.e. dividing all observations on X3 by 100), then the estimated coefficient would have been 100 times larger.
Remember that statistical significance of a variable is tested using a t-test and is not judged by the magnitude of the estimated coefficient!
If we add another regressor, X4 (the rate of price inflation), we obtain:
. regress m g ir pi

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  3,    26) =   30.70
       Model |  20.5893701     3  6.86312337           Prob > F      =  0.0000
    Residual |  5.81286631    26  .223571781           R-squared     =  0.7798
-------------+------------------------------           Adj R-squared =  0.7544
       Total |  26.4022364    29  .910421946           Root MSE      =  .47283

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .1703745   .0189433     8.99   0.000     .1314361    .2093129
          ir |  -.0001693   .0012483    -0.14   0.893    -.0027353    .0023967
          pi |   -.002197   .0037733    -0.58   0.565    -.0099531    .0055592
       _cons |   .0893538   .1388419     0.64   0.525    -.1960399    .3747475
------------------------------------------------------------------------------
These regression results, including standard errors in parentheses, can be represented as:

Ŷ = 0.0894 + 0.1704X2 − 0.000169X3 − 0.0022X4,
     (0.1388)  (0.0189)   (0.001248)   (0.0038)

with R² = 0.7798.
We could also carry out the estimations using logarithms of
the variables; for example
. regress lm lg lir

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =  175.53
       Model |  59.8192409     2  29.9096204           Prob > F      =  0.0000
    Residual |  4.60058503    27  .170392038           R-squared     =  0.9286
-------------+------------------------------           Adj R-squared =  0.9233
       Total |  64.4198259    29  2.22137331           Root MSE      =  .41279

------------------------------------------------------------------------------
          lm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          lg |   1.026927   .0570772    17.99   0.000     .9098146     1.14404
         lir |  -.2486999   .0671987    -3.70   0.001    -.3865802   -.1108195
       _cons |  -1.248211   .1991953    -6.27   0.000    -1.656926   -.8394964
------------------------------------------------------------------------------
The logarithmic results can be represented as

ln(Y) = −1.2482 + 1.0269 ln(X2) − 0.2487 ln(X3),
         (0.1992)   (0.0571)       (0.0672)

with R² = 0.9286 and where figures in parentheses are standard errors.
The estimated coefficients now have the interpretation of elasticities.
For example, the income elasticity of the demand for money is estimated to be 1.0269, while the interest rate elasticity of money demand is estimated as −0.2487.
However, in order to conduct formal hypothesis tests, we need to know the sampling properties of the OLS estimators, and to do that we need to make some assumptions. . .
The Classical Multiple Regression Model
Just as in the two-variable regression model, the OLS estimators in the multiple regression model are subject to sampling variability.
The properties of the OLS estimators and their distributions depend on the conditions under which they are obtained, i.e. the assumptions made.
We have already studied the assumptions of the two-variable Classical model, and the Classical multiple regression model is basically a straightforward extension of the two-variable case.
The assumptions we need to make concern the explanatory variables X2, . . . , Xk and the error term ε.
As before we shall focus on the small-sample properties of the estimators and shall ignore large-sample (n → ∞) properties.
The assumptions concerning the regressors are as follows:

Assumptions concerning the explanatory variables
IA (non-random): X2, . . . , Xk are non-stochastic;
IB (fixed): the values of X2, . . . , Xk are fixed in repeated samples;
ID (no collinearity): there exist no exact linear relationships between the sample values of any two or more of the explanatory variables.

Note that Assumption IC, used in Thomas, is a large-sample assumption and has been omitted here.
Assumptions IA (non-random) and IB (fixed) are identical to the two-variable model but are now applied to all regressors.
They mean that X2, . . . , Xk are not random variables and that the same values would appear in each sample if it were possible to conduct repeated sampling.
The new assumption is ID (no collinearity), which has no equivalent in the two-variable model.
It is included in order to rule out the possibility of what is called perfect multicollinearity, which we will study in more detail in Lecture 18.
For now, simply note that the assumption rules out the possibility that, for example, X3i = 5 + 2X2i for all i.
If Assumption ID is violated then all estimation methods, including OLS, are infeasible.
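The slide's example of a violation can be checked numerically. A sketch assuming made-up sample values for X2: when X3i = 5 + 2X2i exactly, the regressor matrix loses a rank, X'X becomes singular, and the normal equations have no unique solution.

```python
import numpy as np

n = 30
x2 = np.arange(1.0, n + 1.0)       # made-up sample values for X_2
x3 = 5.0 + 2.0 * x2                # exact linear relationship: X_3i = 5 + 2*X_2i

X = np.column_stack([np.ones(n), x2, x3])

# With perfect multicollinearity X has rank 2 rather than 3, so X'X cannot
# be inverted and OLS estimates cannot be computed
rank = np.linalg.matrix_rank(X)
condition = np.linalg.cond(X.T @ X)  # astronomically large for a singular matrix
```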
The assumptions concerning ε are the same as in the two-variable model.
For completeness they are repeated below:

Assumptions concerning the disturbances
IIA (zero mean): E(εi) = 0 for all i;
IIB (constant variance): V(εi) = σ² = constant for all i;
IIC (zero covariance): Cov(εi, εj) = 0 for all i ≠ j;
IID (normality): each εi is normally distributed.

These assumptions govern the properties of the random part of the model.
Given that X2, . . . , Xk are fixed, they therefore govern the variation in Y in repeated samples.
Assumption IIA (zero mean) implies that the average effect of ε in repeated samples is zero and the value of Y, on average, is:

E(Yi) = E(β1 + β2X2i + . . . + βkXki + εi)
      = β1 + β2X2i + . . . + βkXki + E(εi)
      = β1 + β2X2i + . . . + βkXki,  i = 1, . . . , n,

because E(εi) = 0 under IIA.
Note that E(Yi) is not the same for each i but depends on X2i, . . . , Xki, which are not constant throughout the sample (if they were constant they would violate Assumption ID).
Recall that combining IIA (zero mean), IIB (constant variance) and IID (normality) gives

εi ~ N(0, σ²),  i = 1, . . . , n.

Note that

Yi − E(Yi) = Yi − β1 − β2X2i − . . . − βkXki = εi;

this implies that

V(Yi) = E(Yi − E(Yi))² = E(ε²i) = V(εi) = σ²,

which in turn implies that

Yi ~ N(β1 + β2X2i + . . . + βkXki, σ²),  i = 1, . . . , n.
The implications of the Assumptions for the OLS estimators can be summarised as follows:

Property       Assumptions
Linearity      IA, IB, ID
Unbiasedness   IA, IB, ID, IIA
BLUness        IA, IB, ID, IIA, IIB, IIC
Efficiency     IA, IB, ID, IIA, IIB, IIC, IID
Normality      IA, IB, ID, IIA, IIB, IIC, IID

These are the same as in the two-variable model except that we now require Assumption ID (no collinearity) in all cases.
The unbiasedness and normality properties imply that

bj ~ N(βj, σ²_bj),  j = 1, . . . , k,

which can be used as a basis for inference.
For k > 2 the variances σ²_bj are complicated functions of the regressors, but all are proportional to σ² = V(ε).
In order to conduct inference we therefore need to estimate σ².
A generalisation of the estimator in the two-variable model is used for this, and is given by

s² = Σ e²i / (n − k);

it is an unbiased estimator, i.e. E(s²) = σ².
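This estimator can be checked against the first Stata output above: there n = 30 and k = 3 (intercept plus two slopes), and the residual sum of squares is 5.88865732, so s² should reproduce Stata's residual MS and its square root should reproduce the reported Root MSE.

```python
import math

rss = 5.88865732   # residual SS from the regression of m on g and ir above
n, k = 30, 3       # 30 countries; k = 3 estimated coefficients (incl. intercept)

s2 = rss / (n - k)         # s^2 = sum of squared residuals / (n - k)
root_mse = math.sqrt(s2)   # Stata's "Root MSE"
# s2 is 0.218098... (Stata's residual MS) and root_mse is 0.46701
```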
Note that the denominator of s² involves n − k.
This is because we have had to estimate k parameters (β1, . . . , βk) in order to compute the residuals e1, . . . , en and have therefore lost k degrees of freedom.
If we use s² in the (complicated) formulae for the estimator variances we obtain the estimated variances s²_bj (j = 1, . . . , k).
It follows that, for inference, we then use Student's t-distribution instead of the normal distribution:

(bj − βj)/s_bj ~ t_{n−k}.
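In matrix form the "complicated formulae" are compact: the estimated covariance matrix of b is s²(X'X)⁻¹, and the s_bj are the square roots of its diagonal. A sketch on synthetic data with made-up parameters (not the lecture's data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3
X = np.column_stack([np.ones(n),
                     rng.uniform(0.0, 10.0, n),
                     rng.uniform(0.0, 10.0, n)])
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0.0, 1.0, n)  # hypothetical betas

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
s2 = (e @ e) / (n - k)                 # s^2 = sum(e_i^2)/(n - k)

cov_b = s2 * np.linalg.inv(X.T @ X)    # estimated covariance matrix of b
se_b = np.sqrt(np.diag(cov_b))         # s_{b_j}: estimated standard errors
t_ratios = b / se_b                    # each is t_{n-k} distributed under beta_j = 0
```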
So, to test the significance of a regressor Xj, i.e. to test

H0: βj = 0 against HA: βj ≠ 0,

we can use the test statistic

TS = bj/s_bj ~ t_{n−k} under H0.

Let t_{0.025} denote the 5% critical value from the t_{n−k} distribution that puts 2.5% of the distribution into each tail.
As before the decision rule is: if |TS| > t_{0.025} reject H0; if |TS| < t_{0.025} do not reject H0.
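The decision rule can be written down directly. A small sketch (the function name is mine, not the lecture's), using the 5% two-tail critical value 2.056 of the t26 distribution quoted in the money demand example:

```python
def t_test_decision(ts, t_crit=2.056):
    """Two-tail t-test decision rule: reject H0: beta_j = 0 iff |TS| > t_crit.

    The default critical value 2.056 is the 5% two-tail value of t_26,
    the degrees of freedom in the lecture's money demand example.
    """
    return "reject H0" if abs(ts) > t_crit else "do not reject H0"

# t-ratios from the three-regressor money demand output above
decision_gdp = t_test_decision(8.99)        # GDP coefficient
decision_interest = t_test_decision(-0.14)  # interest rate coefficient
```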
We can also use the t-distribution to form confidence intervals (CIs) for the unknown population parameters β1, . . . , βk.
With t_{0.025} as defined on the previous slide, a 95% CI for βj is of the form

bj ± t_{0.025} s_bj, or [bj − t_{0.025} s_bj, bj + t_{0.025} s_bj],

i.e. we are 95% confident that βj lies in this interval.
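As a check, this formula reproduces the 95% CI that Stata reports for the GDP coefficient in the three-regressor output above (b = .1703745, s_b = .0189433, 26 degrees of freedom, t_{0.025} = 2.056):

```python
b_g, se_g = 0.1703745, 0.0189433   # GDP coefficient and std. err. from Stata
t_crit = 2.056                     # 5% two-tail critical value of t_26

lower = b_g - t_crit * se_g
upper = b_g + t_crit * se_g
# (lower, upper) is approximately (0.1314, 0.2093), matching Stata's
# reported interval [.1314361, .2093129]
```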
Example. Let's return to the money demand data where we estimated the model

Y = β1 + β2X2 + β3X3 + β4X4 + ε,

where Y denotes money stock, X2 is GDP, X3 is the interest rate and X4 is the rate of price inflation.
Let's test the hypotheses β2 = 0 and β3 = 0 and find a 95% confidence interval for β4.
The regression output is as follows:
. regress m g ir pi

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  3,    26) =   30.70
       Model |  20.5893701     3  6.86312337           Prob > F      =  0.0000
    Residual |  5.81286631    26  .223571781           R-squared     =  0.7798
-------------+------------------------------           Adj R-squared =  0.7544
       Total |  26.4022364    29  .910421946           Root MSE      =  .47283

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .1703745   .0189433     8.99   0.000     .1314361    .2093129
          ir |  -.0001693   .0012483    -0.14   0.893    -.0027353    .0023967
          pi |   -.002197   .0037733    -0.58   0.565    -.0099531    .0055592
       _cons |   .0893538   .1388419     0.64   0.525    -.1960399    .3747475
------------------------------------------------------------------------------
Note that t-ratios for testing βj = 0 are given in the output above, as are 95% CIs, but we shall go through the calculations nonetheless!
To test H0: β2 = 0 against HA: β2 ≠ 0 we use

TS = b2/s_b2 = 0.1703745/0.0189433 = 8.99 ~ t26 under H0.

The 5% critical value for a two-tail test from the t26 distribution is 2.056.
As |TS| = 8.99 > 2.056 we reject H0 in favour of HA, i.e. there is evidence that β2 ≠ 0 and hence that GDP is a significant determinant of the money stock.
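The arithmetic can be verified with the full-precision Stata values (the rounded figures 0.1704/0.0189 would give 9.02, so Stata's 8.99 comes from the unrounded coefficient and standard error):

```python
b2, s_b2 = 0.1703745, 0.0189433   # unrounded values from the Stata output

ts = b2 / s_b2                    # TS = b_2 / s_{b_2}
# ts is approximately 8.994, which Stata rounds to 8.99
```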
Repeating the process for β3 we obtain

TS = −0.0001693/0.001248 = −0.14.

Here |TS| = 0.14 < 2.056 and hence we do not reject H0: β3 = 0, i.e. we cannot reject the hypothesis that the interest rate has no effect on the money stock.
A 95% CI for β4 is obtained as

b4 ± t_{0.025} s_b4 = −0.002197 ± (2.056 × 0.003773),

which gives −0.002197 ± 0.007757, or [−0.00995, 0.00556].
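Both calculations can be checked against the Stata output (using the unrounded standard errors reported there):

```python
# t-ratio for beta_3 (interest rate)
ts3 = -0.0001693 / 0.0012483
# |ts3| is about 0.14, below the critical value 2.056, so H0 is not rejected

# 95% CI for beta_4 (price inflation)
b4, s_b4 = -0.002197, 0.0037733
t_crit = 2.056
lower = b4 - t_crit * s_b4
upper = b4 + t_crit * s_b4
# (lower, upper) is approximately (-0.00995, 0.00556), as on the slide
```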
Summary
the Classical multiple linear regression model

Next week:
the problem of multicollinearity
making inferences