
Linear Regression Model

  • THE LINEAR REGRESSION MODEL

    Lectures 1 and 2

    Francis Kramarz and Michael Visser

    MASTER 1 EPP

    2012

  • THE SIMPLE LINEAR REGRESSION MODEL

  • Introduction

    Definition
    A simple linear regression model is a regression model where the dependent variable is continuous, explained by a single exogenous variable, and linear in the parameters.

    Theoretical model: $Y = \beta_0 + \beta_1 X$. The model is linear in the parameters $\beta_0$ and $\beta_1$. The single explanatory variable can be continuous or discrete.

    Assumption (1) (Random sampling)

    $\{(X_i, Y_i);\ i = 1, ..., n\}$ is a random sample of size n from the population.

    The sample is randomly drawn from the population of interest. For each individual of the sample, we observe $X_i$ and $Y_i$; we want to estimate the simple linear model $Y_i = \beta_0 + \beta_1 X_i + u_i$.

  • The error term captures all relevant variables not included in the model because they are not observed in the data set (for example ability, dynamism, ...).

    Assumption (2) (Sample variation)

    $\exists\, j$ and $k$ such that $X_j \neq X_k$. If all the $X_i$ in the sample take the same value, the slope parameter $\beta_1$ cannot be identified.

    Assumption (3) (zero mean)

    E (ui ) = 0

    The average value of the error term is 0. This is not a restrictive assumption: if $E(u) = \mu$, then we can rewrite the model as $Y = (\beta_0 + \mu) + \beta_1 X + e$, where $e = u - \mu$.

    Assumption (4) (zero conditional mean)

    $E(u_i \mid X_i) = 0$

    This is a crucial and strong assumption: it requires the unobserved factors in u to be mean-independent of X.

  • The Ordinary Least Squares estimator

    Definition
    The OLS estimators $\hat\beta_0$ and $\hat\beta_1$ minimize the sum of squared residuals

    $S(\beta_0, \beta_1) = \sum_{i=1}^n u_i^2 = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2$

    First order conditions

    $\dfrac{\partial S(\hat\beta_0, \hat\beta_1)}{\partial \beta_0} = -2 \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \quad (1)$

    $\dfrac{\partial S(\hat\beta_0, \hat\beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^n X_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \quad (2)$

    Proposition

    $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$ and $\hat\beta_1 = \dfrac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}$

  • Proof

    Using equation (1), equation (2) turns into $\hat\beta_1 = \dfrac{\sum_{i=1}^n (Y_i - \bar Y) X_i}{\sum_{i=1}^n (X_i - \bar X) X_i}$.

    $\sum_{i=1}^n X_i (X_i - \bar X) = \sum_{i=1}^n (X_i - \bar X)^2$ and $\sum_{i=1}^n X_i (Y_i - \bar Y) = \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)$, which gives the result.
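
    As a numerical check of the closed-form expressions above, here is a minimal Python/NumPy sketch on simulated data (illustrative sample size and parameter values; the course itself uses SAS):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    beta0, beta1 = 2.0, 0.5                 # true parameters (illustrative)
    X = rng.normal(10, 3, size=n)
    u = rng.normal(0, 1, size=n)            # error term with E(u | X) = 0
    Y = beta0 + beta1 * X + u

    # OLS estimates from the closed-form formulas of the Proposition
    beta1_hat = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()
    print(beta0_hat, beta1_hat)             # close to 2.0 and 0.5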

    Remarks

    - $\hat\beta_0$ is the constant (or intercept): $\hat Y_i = \hat\beta_0$ if $X_i = 0$
    - $\hat\beta_1$ is the slope estimate and measures the effect of $X_i$ on $Y_i$

    Definition

    - $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$ is the predicted value of Y for individual i

    - $\hat u_i = Y_i - \hat Y_i$ is the residual for individual i. The OLS estimates minimize the sum of squared residuals.

    - SST $= \sum_{i=1}^n (Y_i - \bar Y)^2$ is the total sum of squares (the total variation in Y)
    - SSE $= \sum_{i=1}^n (\hat Y_i - \bar Y)^2$ is the explained sum of squares
    - SSR $= \sum_{i=1}^n \hat u_i^2$ is the residual sum of squares

  • How well does the regression line fit the sample data?

    Proposition

    SST = SSE + SSR (3)

    This suggests a goodness-of-fit criterion.

    Definition
    Let $R^2 = \dfrac{SSE}{SST} = 1 - \dfrac{SSR}{SST}$. $R^2$ measures the proportion of the variation in Y explained by variation in X.

    Remarks

    1. $0 \le R^2 \le 1$
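
    As an illustration of decomposition (3) and of the two expressions for $R^2$, a minimal Python/NumPy sketch on simulated data (illustrative values, not the ECHP wage data):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    X = rng.normal(0, 1, n)
    Y = 1.0 + 2.0 * X + rng.normal(0, 1, n)          # simulated data, illustrative betas

    b1 = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    Y_hat = b0 + b1 * X
    u_hat = Y - Y_hat

    SST = np.sum((Y - Y.mean()) ** 2)                # total sum of squares
    SSE = np.sum((Y_hat - Y.mean()) ** 2)            # explained sum of squares
    SSR = np.sum(u_hat ** 2)                         # residual sum of squares
    print(SST, SSE + SSR)                            # equation (3): SST = SSE + SSR
    print(SSE / SST, 1 - SSR / SST)                  # both expressions give the same R^2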

  • Example: Determinants of the monthly wage
    - Data set: French sample of the European Community Household Panel
    - Dependent variable: wage = monthly wage
    - Explanatory variables:
      1. age (continuous variable)
      2. sup = college graduate (discrete 0/1 variable)
    - The sample is restricted to the individuals aged 20-60 and employed in 2000 (n = 5010)
    - Descriptive statistics: mean wage = 1432.3 euros, mean age = 38.87 years, college graduates = 30.64 %

    SAS Program
    Proc reg data=c ;
    model wage = age ;
    model wage = sup ;
    run ;

  • Figure: Effect of age on wage

  • Figure: Effect of education on wage

  • Finite-sample properties of the OLS estimator
    - Other estimation methods exist and could be used (e.g. maximum likelihood, ...). How to choose?
    - By comparing their properties in terms of
      1. unbiasedness
      2. precision (minimization of the variance)

    1. Unbiasedness

    Proposition

    Let Assumptions (1) to (4) be verified. Then $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$. The OLS estimators are unbiased estimators of the parameters.

    The sampling distribution of the estimators is centered around the true parameters. If we could draw an infinite number of samples of size n from the population and take the average of the resulting OLS estimates, we would obtain the true values of $\beta_0$ and $\beta_1$. BUT this does not mean that the particular OLS estimates obtained from a given sample of size n are equal to the true values of the parameters.
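
    This thought experiment can be mimicked by a small Monte Carlo simulation; the Python sketch below uses made-up sample sizes and parameter values:

    import numpy as np

    rng = np.random.default_rng(2)
    beta0, beta1 = 1.0, 0.3                 # "true" parameters of the simulated population
    n, n_samples = 100, 5000

    estimates = np.empty(n_samples)
    for s in range(n_samples):
        X = rng.normal(5, 2, n)
        u = rng.normal(0, 1, n)             # E(u | X) = 0
        Y = beta0 + beta1 * X + u
        estimates[s] = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)

    print(estimates.mean())                 # close to 0.3: the distribution is centered on beta1
    print(estimates[0])                     # a single estimate generally differs from 0.3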

  • 2. Precision
    The question is now: are our OLS estimates far from the true values of $\beta_0$ and $\beta_1$?

    Assumption (5) (Homoscedasticity and non autocorrelation)

    $V(u_i \mid X_i) = \sigma^2$ and $corr(u_i, u_j \mid X_i, X_j) = 0$

    Remarks

    1. $\sigma$ is the standard deviation of the error term
    2. $\sigma$ is unknown, since u represents all the unobserved explanatory variables
    3. Under assumption (5), $V(Y_i \mid X_i) = \sigma^2$
    4. Thus the variance of Y, given X, does not vary with X (strong assumption)
    5. Under assumptions (4) and (5), $E(u_i^2 \mid X_i) = \sigma^2$

  • Proposition

    Let assumptions (1) to (5) be verified. Then

    $V(\hat\beta_0 \mid X) = \dfrac{\sigma^2\, n^{-1} \sum_{i=1}^n X_i^2}{\sum_{i=1}^n (X_i - \bar X)^2} \quad (4)$

    $V(\hat\beta_1 \mid X) = \dfrac{\sigma^2}{\sum_{i=1}^n (X_i - \bar X)^2} \quad (5)$

    Remarks

    1. $V(\hat\beta_0)$ and $V(\hat\beta_1)$ increase with $\sigma^2$ (the higher the variance of the error term, the more difficult it is to estimate the parameters with precision).
    2. $V(\hat\beta_0)$ and $V(\hat\beta_1)$ decrease with $\sum_{i=1}^n (X_i - \bar X)^2$
    3. $V(\hat\beta_0)$ and $V(\hat\beta_1)$ decrease with n
    4. As $\sigma^2$ is unknown, $V(\hat\beta_0)$ and $V(\hat\beta_1)$ are also unknown
    5. BUT $\sigma^2$ can be estimated using the sum of squared residuals $\sum_{i=1}^n \hat u_i^2$

  • Proposition

    $\hat\sigma^2 = \dfrac{\sum_{i=1}^n \hat u_i^2}{n-2} = \dfrac{SSR}{n-2}$ is an unbiased estimator of $\sigma^2$: $E(\hat\sigma^2) = \sigma^2$.

    Replacing $\sigma^2$ by $\hat\sigma^2$ in the variances of the estimators gives unbiased estimators of $V(\hat\beta_0)$ and $V(\hat\beta_1)$.

    We want to find the best linear unbiased estimator (BLUE) of $\beta_0$ and $\beta_1$.

    How is the best estimator defined? It is, among the linear unbiased estimators, the one with the smallest variance.

    Theorem (Gauss-Markov)

    Under assumptions (1) to (5), the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are the best linear unbiased estimators of $\beta_0$ and $\beta_1$, respectively.

  • The effect of omitting relevant variables

    - Assume the true model is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$, while the estimated model is $Y = \beta_0 + \beta_1 X_1 + e$.
      $X_2$ is omitted and $e = \beta_2 X_2 + u$.
    - Example: both age and education affect the wage.
    - Let $\tilde\beta_0$ and $\tilde\beta_1$ denote the OLS estimators of $Y = \beta_0 + \beta_1 X_1 + e$.
    - Are they biased estimators of $\beta_0$ and $\beta_1$? There are 2 cases:
      1. If $Cov(X_1, X_2) = 0$ or $\beta_2 = 0$, then $\tilde\beta_0$ and $\tilde\beta_1$ are unbiased.
      2. If $Cov(X_1, X_2) \neq 0$ and $\beta_2 \neq 0$, then $\tilde\beta_0$ and $\tilde\beta_1$ are biased.
    - The bias in $\tilde\beta_1$ is equal to $\beta_2 \dfrac{Cov(X_1, X_2)}{V(X_1)}$
    - Sign of the bias:
      1. If $Cov(X_1, X_2) > 0$ (resp. < 0) and $\beta_2 > 0$ (resp. < 0), then $\tilde\beta_1$ is upward biased.
      2. If $Cov(X_1, X_2) > 0$ (resp. < 0) and $\beta_2 < 0$ (resp. > 0), then $\tilde\beta_1$ is downward biased.
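
    A small simulation (Python, illustrative values) makes the bias formula tangible: X2 is built to be positively correlated with X1 and $\beta_2 > 0$, so the short regression of Y on X1 alone is upward biased by approximately $\beta_2\, Cov(X_1, X_2)/V(X_1)$:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000
    beta1, beta2 = 1.0, 2.0                       # illustrative true coefficients

    X1 = rng.normal(0, 1, n)
    X2 = 0.5 * X1 + rng.normal(0, 1, n)           # Cov(X1, X2) > 0 by construction
    Y = beta1 * X1 + beta2 * X2 + rng.normal(0, 1, n)

    # Short regression of Y on X1 only (with a constant)
    b1_short = np.sum((Y - Y.mean()) * (X1 - X1.mean())) / np.sum((X1 - X1.mean()) ** 2)
    bias = beta2 * np.cov(X1, X2)[0, 1] / np.var(X1)
    print(b1_short)                               # roughly beta1 + bias = 1 + 2 * 0.5 = 2
    print(beta1 + bias)                           # the omitted-variable bias formula agrees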

  • Example: age and education in a wage equation
    - We estimate the true model

      $wage = \beta_0 + \beta_1\, age + \beta_2\, sup + u \quad (6)$

      $\hat\beta_2 > 0$: ceteris paribus, education has a positive effect on the wage.
    - We estimate the false model

      $wage = \beta_0 + \beta_1\, age + e \quad (7)$

    - Comparison: $\tilde\beta_1 = 26.24 < \hat\beta_1 = 30.47$; the effect of age is under-estimated in the false model.
    - We estimate education as a function of age

      $sup = \delta_0 + \delta_1\, age + v \quad (8)$

      $\hat\delta_1 < 0$: age has a negative effect on education.
    - As $\hat\delta_1 < 0$ and $\hat\beta_2 > 0$, the estimator is logically downward biased.

  • Figure: Effect of age and education on wage

  • Figure: Effect of age on education

  • THE MULTIPLE LINEAR REGRESSION MODEL

  • Introduction

    Definition
    A multiple linear regression model is a regression model where the dependent variable is continuous, explained by several exogenous variables, and linear in the parameters.

    Example

    $Y = \beta_0 + \sum_{k=1}^K \beta_k X_k + u = X\beta + u \quad (9)$

    where $\beta$ is a vector of K + 1 parameters $(\beta_0, \beta_1, ..., \beta_K)$ and X is a matrix $(1, X_1, ..., X_K)$ with K + 1 columns.

    - Linearity of the model in the parameters $\beta_k$, k = 0, ..., K
    - $\beta_0$ is the constant
    - $\beta_k$ is the slope parameter and measures the ceteris paribus effect of $X_k$ on Y.
    - In the multiple case, the matrix notation is more convenient.

  • Assumptions

    Assumption

    1. Random sampling: $\{(X_{1i}, X_{2i}, ..., X_{Ki}, Y_i);\ i = 1, ..., n\}$ is a random sample of size n from the population.
    2. Sample variation and no collinearity: the explanatory variables are not linearly related and none is constant $\Rightarrow$ the rank of $X'X$ is K + 1.
    3. Zero mean: $E(u_i) = 0$
    4. Zero conditional mean: $E(u_i \mid X_{1i}, ..., X_{Ki}) = 0$
    5. Homoscedasticity and non-autocorrelation: $V(u_i \mid X_{1i}, ..., X_{Ki}) = \sigma^2$ and $corr(u_i, u_j \mid X_{1i}, ..., X_{Ki}, X_{1j}, ..., X_{Kj}) = 0$

  • Remarks

    - Assumptions (1), (3), (4) and (5) are similar to the simple case.
    - Assumption (2) is an extension of the simple case.
    - Assumption (2) is required for the identification of the parameters. WHY?
    - Assume $X_{ki} = c$ for all i (the variable is constant). Then $\beta_0$ and $\beta_k$ cannot be separately identified.

    - Similarly, assume that the variables $X_1$ and $X_2$ are collinear: $X_1 = \lambda X_2$. The model can be rewritten $Y = \beta_0 + (\lambda\beta_1 + \beta_2) X_2 + \sum_{k=3}^K \beta_k X_k + u$; $\beta_1$ and $\beta_2$ cannot be separately identified.

  • contd

    Remarks

    - Other example: dummy variables. If all the dummy variables and the constant are included in the model, then the constant and the dummy parameters cannot be identified separately. One of the dummy variables must be dropped (the reference category), e.g. one of the education dummies.
    - The rank of a matrix X is equal to the number of nonzero characteristic roots of $X'X$.
    - $rank(X) \le \min(\text{number of rows}, \text{number of columns})$.

  • The Ordinary Least Squares estimator

    Definition
    The OLS estimates $\hat\beta_k$, k = 0, ..., K, minimize the sum of squared residuals

    $S(\beta_0, ..., \beta_K) = \sum_{i=1}^n u_i^2 = \sum_{i=1}^n \Big(Y_i - \beta_0 - \sum_{k=1}^K \beta_k X_{ki}\Big)^2 = (Y - X\beta)'(Y - X\beta) \quad (10)$

    First order conditions

    $-2 \sum_{i=1}^n \Big(Y_i - \hat\beta_0 - \sum_{k=1}^K \hat\beta_k X_{ki}\Big) = 0$
    $-2 \sum_{i=1}^n X_{ki} \Big(Y_i - \hat\beta_0 - \sum_{k=1}^K \hat\beta_k X_{ki}\Big) = 0, \quad k = 1, ..., K$

    OR, using matrix notation,

    $-2 \sum_{i=1}^n X_i'(Y_i - X_i\hat\beta) = -2 X'(Y - X\hat\beta) = 0 \quad (11)$

  • Proposition

    $\hat\beta_k = \dfrac{\sum_{i=1}^n (Y_i - \bar Y)\big[(X_{ki} - \hat X_{ki}) - (\bar X_k - \bar{\hat X}_k)\big]}{\sum_{i=1}^n \big[(X_{ki} - \hat X_{ki}) - (\bar X_k - \bar{\hat X}_k)\big]^2}$

    where $\hat X_{ki}$ is the predicted value of $X_{ki}$ obtained from a regression of $X_{ki}$ on a constant and all the other covariates.

    $\hat\beta_0 = \bar Y - \sum_{k=1}^K \hat\beta_k \bar X_k$

    OR, using matrix notation,

    $\hat\beta = (X'X)^{-1} X'Y \quad (12)$

    Comparison with the simple case: $\hat\beta_k$ is equal to the slope estimate in the regression of Y on a constant and $(X_{ki} - \hat X_{ki})$ $\Rightarrow$ ceteris paribus effect.
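
    Equation (12) translates directly into code; a minimal Python/NumPy sketch (illustrative, not the SAS used in the course) builds the design matrix with a column of ones and solves the normal equations $X'X\hat\beta = X'Y$:

    import numpy as np

    rng = np.random.default_rng(4)
    n, K = 300, 2
    beta_true = np.array([1.0, 0.5, -0.3])                       # [beta0, beta1, beta2], illustrative
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])   # constant + K regressors
    Y = X @ beta_true + rng.normal(0, 1, n)

    # OLS via the normal equations: beta_hat = (X'X)^{-1} X'Y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    print(beta_hat)                                              # close to [1.0, 0.5, -0.3]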

  • How well does the regression line fit the sample data?

    Definition

    Let SST $= \sum_{i=1}^n (Y_i - \bar Y)^2$ denote the total sum of squares (the total variation in Y),

    SSE $= \sum_{i=1}^n (\hat Y_i - \bar Y)^2$ denote the explained sum of squares,

    and SSR $= \sum_{i=1}^n \hat u_i^2$ denote the residual sum of squares.

    Then

    $R^2 = \dfrac{SSE}{SST} = 1 - \dfrac{SSR}{SST} = 1 - \dfrac{\sum_{i=1}^n \hat u_i^2}{\sum_{i=1}^n (Y_i - \bar Y)^2} \quad (13)$

    Remarks

    - $R^2$ does not decrease when one more variable is included.
    - The $R^2$ is useful to compare two models with the same number of explanatory variables, but not useful if the number of variables is different.
    - Use the adjusted $R^2$: $\bar R^2 = 1 - \dfrac{(n-1)(1-R^2)}{n-K-1}$.

  • Example: Determinants of the monthly wage

    - Data set: French sample of the European Community Household Panel
    - Dependent variable: wage = monthly wage
    - Explanatory variables:
      1. age (continuous variable or discrete variable)
      2. diplo0, diplo1, ... = educational level (discrete 0/1 variables)
      3. sex, children
    - The sample is restricted to the individuals aged 20-60 and employed in 2000 (n = 5010)

    SAS Program
    Proc reg data=c ;
    model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3 ;
    model wage = sex1 ag1 ag2 ag4 ag5 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3 ;
    run ;

  • Figure: Determinants of wage - model 1

  • Figure: Determinants of wage - model 2

  • Finite-sample properties of the OLS estimator

    Proposition

    Let Assumptions (1) to (4) be verified. Then

    $E(\hat\beta) = \beta \quad (14)$

    Proposition

    Let Assumptions (1) to (5) be verified. Then

    $V(\hat\beta_k \mid X_1, ..., X_K) = \dfrac{\sigma^2}{\sum_{i=1}^n (X_{ki} - \bar X_k)^2 (1 - R_k^2)} \quad (15)$

    where $R_k^2$ is the $R^2$ obtained from the regression of $X_k$ on the other covariates.

    OR, using matrix notation,

    $V(\hat\beta \mid X) = \sigma^2 (X'X)^{-1} \quad (16)$

  • Remarks
    $V(\hat\beta_k \mid X)$ increases with $\sigma^2$ and $R_k^2$, and decreases with $\sum_{i=1}^n (X_{ki} - \bar X_k)^2$.

    Estimating the error variance

    - We don't know what the error variance $\sigma^2$ is, because we don't observe the error term u.
    - BUT we observe the residuals $\hat u$ and we can use the residuals to find an estimate of the error variance.
    - Replacing $\sigma^2$ by $\hat\sigma^2$ in the variance of the estimators gives unbiased estimators of $V(\hat\beta)$.

    Proposition

    $\hat\sigma^2 = \dfrac{\sum_{i=1}^n \hat u_i^2}{n - K - 1} = \dfrac{SSR}{\text{degrees of freedom}}$ is an unbiased estimator of $\sigma^2$.

    Theorem (Gauss-Markov)

    Under assumptions (1) to (5), $\hat\beta$ is the best linear unbiased estimator (BLUE) of $\beta$.
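
    In code, $\hat\sigma^2$ and the estimated covariance matrix $\hat\sigma^2 (X'X)^{-1}$ can be obtained as follows (a Python/NumPy sketch on simulated data, illustrative values); the square roots of its diagonal elements are the standard errors of the $\hat\beta_k$:

    import numpy as np

    rng = np.random.default_rng(5)
    n, K = 300, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])   # constant + K regressors
    Y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 1, n)     # illustrative betas, sigma = 1

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    u_hat = Y - X @ beta_hat
    sigma2_hat = u_hat @ u_hat / (n - K - 1)          # SSR / degrees of freedom
    V_hat = sigma2_hat * np.linalg.inv(X.T @ X)       # estimated V(beta_hat | X)
    se = np.sqrt(np.diag(V_hat))                      # standard errors of the estimates
    print(sigma2_hat, se)                             # sigma2_hat close to 1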

  • Asymptotic properties

    We now study the properties of the OLS estimator when $n \to +\infty$. First we study the consistency of the OLS estimator. Consistency is stronger than unbiasedness.

    Theorem
    Under assumptions (1) to (4), $\text{plim}(\hat\beta) = \beta$.
    Under assumptions (1) to (5), $\text{plim}(\hat\sigma^2) = \sigma^2$.

    Next we study the asymptotic distribution of the OLS estimator.

    Theorem (asymptotic normality)

    Under assumptions (1) to (5),

    $\sqrt{n}\,(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1}) \quad (17)$

    with $Q^{-1} = \text{plim}\big((\tfrac{X'X}{n})^{-1}\big)$

  • The maximum likelihood method

    The likelihood is the probability of observing the sample $\{(Y_1, X_1), ..., (Y_n, X_n)\}$.

    Definition
    The contribution of individual i to the likelihood is the function $L_i$ defined by: $L_i(Y_i, X_i; \theta) = f(Y_i, X_i; \theta)$.

    The likelihood function of the sample is the function $L(Y_i, X_i, i = 1, ..., n; \theta)$ defined as the product of the individual contributions:

    $L(Y_i, X_i, i = 1, ..., n; \theta) = \prod_{i=1}^n L_i(Y_i, X_i; \theta)$

    If the dependent variable is continuous, $L(Y, X; \theta)$ is the product of the density functions associated with each couple $(Y_i, X_i)$.

  • Assumption (6) (Normality of the error term)

    The error term is independent of $X_i$ and normally distributed with zero mean and variance $\sigma^2$: $u_i \mid X_i \sim N(0, \sigma^2)$.

    Under this assumption, we can use the Maximum Likelihood method.

    Theorem
    Under assumptions (1) to (6), the Maximum Likelihood estimator of $\beta$ is the OLS estimator.

    MOREOVER

    Theorem
    Under assumptions (1) to (6), the OLS estimator and the ML estimator are the minimum variance unbiased estimators of $\beta$.
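
    To see why the two estimators coincide, one can write the log-likelihood of the sample under Assumption (6) (a sketch of the standard argument, conditional on the $X_i$):

    $\log L(\beta, \sigma^2) = -\dfrac{n}{2}\log(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - X_i\beta)^2$

    For any value of $\sigma^2$, maximizing $\log L$ over $\beta$ amounts to minimizing $\sum_{i=1}^n (Y_i - X_i\beta)^2$, i.e. the OLS criterion (10), so the ML estimator of $\beta$ is the OLS estimator. (Maximizing over $\sigma^2$ gives $\tilde\sigma^2 = SSR/n$, which differs from the unbiased estimator $\hat\sigma^2 = SSR/(n-K-1)$.)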

  • Tests and inference - finite samples

    - We want to test hypotheses about the parameters of the model.
    - Example: wage equation
      - Are men significantly better paid than women?
      - Are the 45-55 significantly better paid than the 35-45 (the reference category)?
      - Are the 45-55 significantly better paid than the 25-35?
    - In order to perform statistical tests in finite samples, we need to add an assumption on the distribution of the error term.

    Assumption (6) (Normality of the error term)

    The error term is independent of $X_i$ and normally distributed with zero mean and variance $\sigma^2$: $u_i \mid X_i \sim N(0, \sigma^2)$.
    $\Rightarrow$ the distribution of the error term conditional on the vector of explanatory variables is normal.

  • Remarks

    - Assumption (6) implies assumptions (3), (4) and (5).
    - Conditionally on the explanatory variables, the dependent variable is normally distributed with mean $X_i\beta$ and variance $\sigma^2$: $Y_i \mid X \sim N(\beta_0 + \sum_{k=1}^K \beta_k X_{ki},\ \sigma^2)$

    Consequence for the distribution of $\hat\beta$

    Theorem
    Under assumptions (1) to (6), $\hat\beta_k$ is normally distributed with mean $\beta_k$ and variance $V(\hat\beta_k)$:
    $\hat\beta_k - \beta_k \sim N(0, V(\hat\beta_k))$, i.e. $\dfrac{\hat\beta_k - \beta_k}{\sqrt{V(\hat\beta_k)}} \sim N(0, 1)$

    Proof: $\hat\beta$ is a linear combination of the error terms.

    We cannot use this property directly since $V(\hat\beta)$ is unknown. BUT we can replace the variance by its estimate, which gives:

  • Theorem
    Under assumptions (1) to (6),

    $\dfrac{\hat\beta_k - \beta_k}{\sqrt{\hat V(\hat\beta_k)}} \sim T_{n-K-1}$

    Idea

    Assume we want to test the null hypothesis H0.

    We need

    (i) a test statistic (t), i.e. a decision function whose value determines which hypothesis is retained
    (ii) a decision rule that determines when H0 is rejected $\Rightarrow$ choose $\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ is true})$; $\alpha$ is the significance level (usually $\alpha$ = 5%)
    (iii) a critical region, i.e. the set of values of the test statistic for which the null hypothesis is rejected $\Rightarrow$ we want to find the critical value c that verifies $\Pr(\text{reject } H_0 \mid H_0 \text{ is true}) = \Pr(|t| > c) = \alpha$ $\Rightarrow$ c is the $(1 - \frac{\alpha}{2})$-th percentile of the t distribution with $n - K - 1$ degrees of freedom

  • The t-test: is $\beta_k$ significantly different from 0?

    $H_0: \beta_k = 0$ vs. $H_1: \beta_k \neq 0$

    The null hypothesis means that $X_k$ has no effect on Y.

    (i) Under the null hypothesis, $t = \dfrac{\hat\beta_k}{\sqrt{\hat V(\hat\beta_k)}}$.
        The test statistic follows a t distribution with $n - K - 1$ degrees of freedom.
    (ii) $\alpha$ = 5%
    (iii) $\Pr(|t| > c) = 5\%$
        c is the 97.5-th percentile of the t distribution with $n - K - 1$ degrees of freedom. When $n - K - 1$ is large, c = 1.96.

    Decision
    - If $|t| = \dfrac{|\hat\beta_k|}{\sqrt{\hat V(\hat\beta_k)}} < 1.96$, then H0 is accepted at the 5% significance level.
    - If $|t| \ge 1.96$, then H0 is rejected at the 5% significance level.
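
    The mechanics of the two-sided t-test can be sketched in a few lines of Python (SciPy assumed available for the t distribution; the data are simulated rather than the ECHP wage data):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n, K = 200, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
    Y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(0, 1, n)   # beta_2 = 0: H0 is true for k = 2

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    u_hat = Y - X @ beta_hat
    sigma2_hat = u_hat @ u_hat / (n - K - 1)
    se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

    k = 2                                     # test H0: beta_k = 0 against H1: beta_k != 0
    t_stat = beta_hat[k] / se[k]
    c = stats.t.ppf(0.975, df=n - K - 1)      # 97.5-th percentile, close to 1.96 here
    print(t_stat, c, abs(t_stat) >= c)        # True would mean rejecting H0 at the 5% level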

  • Figure: Determinants of wage - model 1

  • The t-test: is $\beta_k$ significantly greater than 0?

    $H_0: \beta_k = 0$ vs. $H_1: \beta_k > 0$

    The null hypothesis means that $X_k$ has no effect on Y.

    (i) Under the null hypothesis, $t = \dfrac{\hat\beta_k}{\sqrt{\hat V(\hat\beta_k)}}$.
        The test statistic follows a t distribution with $n - K - 1$ degrees of freedom.
    (ii) $\alpha$ = 5%
    (iii) $\Pr(t > c) = 5\%$
        c is the 95-th percentile of the t distribution with $n - K - 1$ degrees of freedom. When $n - K - 1$ is large, c = 1.645.

    Decision
    - If $t = \dfrac{\hat\beta_k}{\sqrt{\hat V(\hat\beta_k)}} < 1.645$, then H0 is accepted at the 5% significance level.
    - If $t \ge 1.645$, then H0 is rejected at the 5% significance level.

  • The t-test: is $\beta_k$ significantly different from a?

    $H_0: \beta_k = a$ vs. $H_1: \beta_k \neq a$

    (i) Under the null hypothesis, $t = \dfrac{\hat\beta_k - a}{\sqrt{\hat V(\hat\beta_k)}}$.
        The test statistic follows a t distribution with $n - K - 1$ degrees of freedom.
    (ii) $\alpha$ = 1%
    (iii) $\Pr(|t| > c) = 1\%$
        c is the 99.5-th percentile of the t distribution with $n - K - 1$ degrees of freedom. When $n - K - 1$ is large, c = 2.576.

    Decision
    - If $|t| < 2.576$, then H0 is accepted at the 1% significance level.
    - If $|t| \ge 2.576$, then H0 is rejected at the 1% significance level.

  • The t-test: is $\beta_k$ significantly different from $\beta_j$?

    $H_0: \beta_k = \beta_j$ vs. $H_1: \beta_k \neq \beta_j$

    The null hypothesis means that $X_k$ and $X_j$ have the same effect on Y.

    (i) Under the null hypothesis,

        $t = \dfrac{\hat\beta_k - \hat\beta_j}{\sqrt{\hat V(\hat\beta_k) + \hat V(\hat\beta_j) - 2\,\widehat{Cov}(\hat\beta_k, \hat\beta_j)}}$

        The test statistic follows a t distribution with $n - K - 1$ degrees of freedom.
    (ii) $\alpha$ = 10%
    (iii) $\Pr(|t| > c) = 10\%$
        c is the 95-th percentile of the t distribution with $n - K - 1$ degrees of freedom. When $n - K - 1$ is large, c = 1.645.

    Decision
    - If $|t| < 1.645$, then H0 is accepted at the 10% significance level.
    - If $|t| \ge 1.645$, then H0 is rejected at the 10% significance level.

  • The F-test : exclusion restrictions

    We test q linear restrictions on the parameters:

    $H_0: \beta_{K+1-q} = \beta_{K+2-q} = ... = \beta_K = 0$ vs. $H_1$: $H_0$ is false

    (i) Under the null hypothesis, the model becomes $Y = \beta_0 + \beta_1 X_1 + ... + \beta_{K-q} X_{K-q} + u$.
    (ii) The test statistic is $F = \dfrac{(R^2 - R_c^2)/q}{(1 - R^2)/(n - K - 1)}$, where $R_c^2$ denotes the $R^2$ of the constrained model.
        The F-statistic follows a Fisher distribution with $(q, n - K - 1)$ degrees of freedom.
    (iii) $\Pr(F > c) = \alpha$

    Decision
    - If $F < c$, then H0 is accepted at the $\alpha$ significance level.
    - If $F \ge c$, then H0 is rejected.
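
    A Python sketch of the exclusion-restriction F-test (SciPy assumed available, simulated data); it compares the $R^2$ of the unrestricted and constrained models exactly as in the formula above:

    import numpy as np
    from scipy import stats

    def r_squared(Y, X):
        # R^2 of an OLS regression of Y on X (X must contain the constant)
        b = np.linalg.solve(X.T @ X, X.T @ Y)
        u = Y - X @ b
        return 1 - u @ u / np.sum((Y - Y.mean()) ** 2)

    rng = np.random.default_rng(7)
    n, K, q = 500, 4, 2                       # test that the last q = 2 coefficients are zero
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
    Y = X @ np.array([1.0, 0.5, -0.5, 0.0, 0.0]) + rng.normal(0, 1, n)   # H0 is true here

    R2_u = r_squared(Y, X)                    # unrestricted model
    R2_c = r_squared(Y, X[:, : K + 1 - q])    # constrained model drops the last q regressors
    F = ((R2_u - R2_c) / q) / ((1 - R2_u) / (n - K - 1))
    c = stats.f.ppf(0.95, q, n - K - 1)       # 5% critical value of F(q, n - K - 1)
    print(F, c, F >= c)                       # True would mean rejecting H0 at the 5% level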

  • Figure: Determinants of wage - model 3

  • The F-test : Overall significance

    Question: is the model completely false?

    $H_0: \beta_1 = \beta_2 = ... = \beta_K = 0$ vs. $H_1$: $H_0$ is false

    K is the number of restrictions.

    (i) Under the null hypothesis, $F = \dfrac{R^2/K}{(1 - R^2)/(n - K - 1)}$.
        The F-statistic follows a Fisher distribution with $(K, n - K - 1)$ degrees of freedom.
    (ii) $\alpha$ = 1%
    (iii) $\Pr(F > c) = 1\%$
        c is the 99-th percentile of the F distribution with $(K, n - K - 1)$ degrees of freedom.

    Decision
    - If $F < c$, then H0 is accepted at the 1% significance level.
    - If $F \ge c$, then H0 is rejected at the 1% significance level.

  • Heteroscedasticity

    Assumption (5) (homoscedasticity) is restrictive.

    How to proceed when this assumption is dropped?

    Assumption (5')

    $V(u_i \mid X_{1i}, ..., X_{Ki}) = \sigma_i^2$ and $corr(u_i, u_j \mid X_{1i}, ..., X_{Ki}, X_{1j}, ..., X_{Kj}) = 0$

    Remarks

    - The OLS estimator remains unbiased
    - The OLS estimator remains consistent

    BUT

    - $V(\hat\beta \mid X)$ is no longer equal to $\sigma^2 (X'X)^{-1}$
    - The asymptotic normality result (17) is no longer valid
    - The OLS estimator is no longer the BLUE

  • Testing for heteroscedasticity

    - There exist two tests: Breusch-Pagan and White. Both tests are based on the residuals of the fitted model.
    - IDEA

      $E(u_i^2 \mid X_{1i}, ..., X_{Ki}) = \sigma_i^2 \Rightarrow \hat u_i^2 = \sigma_i^2 + e_i$

    - The variance of the error term is of the general form

      $\sigma_i^2 = \omega_i = \omega(X_{1i}, ..., X_{Ki})$

    The Breusch-Pagan test

    - More restrictive assumption on the form of $\omega_i$:

      $\omega(X_{1i}, ..., X_{Ki}) = \delta_0 + \sum_{k=1}^K \delta_k X_{ki}$

      $\Rightarrow \hat u_i^2 = \delta_0 + \sum_{k=1}^K \delta_k X_{ki} + e_i$

    - $H_0: \delta_1 = \delta_2 = ... = \delta_K = 0$ vs. $H_1$: $\exists\, k$ such that $\delta_k \neq 0$

  • - Under the null hypothesis, $F = \dfrac{R^2/K}{(1 - R^2)/(n - K - 1)}$ follows a Fisher distribution with $(K, n - K - 1)$ degrees of freedom, where $R^2$ is that of the auxiliary regression of $\hat u_i^2$ on the regressors.
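
    A minimal Breusch-Pagan sketch in Python (SciPy assumed available), implementing the F-version described above on simulated heteroscedastic data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    n, K = 500, 2
    X = np.column_stack([np.ones(n), np.abs(rng.normal(size=(n, K)))])
    sd_i = 0.5 + 1.5 * X[:, 1]                          # error sd increases with X1: heteroscedastic
    Y = X @ np.array([1.0, 0.5, -0.3]) + sd_i * rng.normal(size=n)

    # Step 1: OLS residuals of the original model
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    u2 = (Y - X @ b) ** 2

    # Step 2: auxiliary regression of u_hat^2 on the regressors, and its R^2
    g = np.linalg.solve(X.T @ X, X.T @ u2)
    e = u2 - X @ g
    R2 = 1 - e @ e / np.sum((u2 - u2.mean()) ** 2)

    # Step 3: F statistic, distributed F(K, n - K - 1) under H0 (homoscedasticity)
    F = (R2 / K) / ((1 - R2) / (n - K - 1))
    print(F, stats.f.ppf(0.95, K, n - K - 1))           # F is well above the critical value here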

    The White test

    - Less restrictive assumption on the form of $\omega_i$:

      $\omega(X_{1i}, ..., X_{Ki}) = \delta_0 + \sum_{k=1}^K \delta_k X_{ki} + \sum_{k,j=1}^K \delta_{kj} X_{ki} X_{ji}$

      $\Rightarrow \hat u_i^2 = \delta_0 + \sum_{k=1}^K \delta_k X_{ki} + \sum_{k,j=1}^K \delta_{kj} X_{ki} X_{ji} + e_i$

    - $H_0: \delta_k = \delta_{kj} = 0 \ \forall\, k, j$ vs. $H_1$: $\exists\, k$ (or $k, j$) such that $\delta_k \neq 0$ or $\delta_{kj} \neq 0$

    - Under the null hypothesis, $F = \dfrac{R^2/q}{(1 - R^2)/(n - q - 1)}$ follows a Fisher distribution with $(q, n - q - 1)$ degrees of freedom, where $q = K + \dfrac{K(K+1)}{2}$.

  • Example: Determinants of the monthly wage
    - Data set: French sample of the European Community Household Panel
    - The sample is restricted to the individuals aged 20-60 and employed in 2000 (n = 5010)
    - Dependent variable: wage = monthly wage
    - Explanatory variables: age (continuous variable), diplo0, diplo1, ... = educational level (discrete 0/1 variables), sex, children

    SAS Program
    Proc model data=c ;
    Parms ac asex1 aage aage2 adiplo1 adiplo2 adiplo3 adiplo4 anbenf1 anbenf2 anbenf3 ;
    wage = ac + asex1*sex1 + aage*age + aage2*age2 + adiplo1*diplo1 + adiplo2*diplo2 + adiplo3*diplo3 + adiplo4*diplo4 + anbenf1*nbenf1 + anbenf2*nbenf2 + anbenf3*nbenf3 ;
    fit wage / white breusch=(1 sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3) ;
    run ;

  • Figure: Example : Determinants of wage

  • Correcting for heteroscedasticity

    There are two methods to improve the efficiency of the estimation in the presence of heteroscedastic errors:

    1. Use the Feasible Generalized Least Squares (FGLS) method if the function $\omega(X_{1i}, ..., X_{Ki})$ is known
    2. Use the OLS method but compute a heteroscedasticity-consistent covariance matrix estimator (White (1980), Davidson and MacKinnon (1993))

    1. The Feasible Generalized Least Squares (FGLS) method

    - Assumption: $\sigma_i^2 = \omega_i = g(\delta_0 + \sum_{k=1}^K \delta_k X_{ki})$ where $g(\cdot)$ is known.
    - IDEA: apply a transformation to the initial model $Y_i = X_i\beta + u_i$ that makes the error terms of the transformed model homoscedastic.

      $Y_i$, $X_i$, and $u_i$ are divided by $\sigma_i = \sqrt{\omega_i}$

  • The transformed model is

    $\dfrac{Y_i}{\sigma_i} = \dfrac{\beta_0}{\sigma_i} + \sum_{k=1}^K \beta_k \dfrac{X_{ki}}{\sigma_i} + \dfrac{u_i}{\sigma_i}$

    Remarks

    - there is no constant in the transformed model
    - the parameters $\beta_k$ are the same as in the initial model
    - $\dfrac{u_i}{\sigma_i}$ is homoscedastic

    The transformed model can then be estimated by OLS

    EXCEPT that we do not know the parameters $\delta_0$ and $\delta_k$.

  • $\Rightarrow$ we proceed in the following way:
    1. estimate the initial model by OLS and compute the residuals $\hat u_i$
    2. estimate $\hat u_i^2 = \delta_0 + \sum_{k=1}^K \delta_k X_{ki} + e_i$ and compute $\hat\sigma_i^2 = \hat\delta_0 + \sum_{k=1}^K \hat\delta_k X_{ki}$
    3. divide the initial model by $\hat\sigma_i$
    4. estimate the transformed model by OLS

    $\Rightarrow$ if $\omega(\cdot)$ is well specified, the FGLS estimator is unbiased, consistent, and asymptotically efficient.
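
    The four steps can be sketched as follows in Python (illustrative values, with g(x) = x; the course implements the same idea in SAS):

    import numpy as np

    def ols(Y, X):
        # OLS coefficients of a regression of Y on X
        return np.linalg.solve(X.T @ X, X.T @ Y)

    rng = np.random.default_rng(9)
    n = 1000
    X = np.column_stack([np.ones(n), np.abs(rng.normal(size=n)) + 0.1])
    sigma2_i = 0.2 + 2.0 * X[:, 1]                       # true skedastic function (illustrative)
    Y = X @ np.array([1.0, 0.5]) + np.sqrt(sigma2_i) * rng.normal(size=n)

    b_ols = ols(Y, X)                                    # step 1: OLS on the initial model
    u2 = (Y - X @ b_ols) ** 2
    delta_hat = ols(u2, X)                               # step 2: regress u_hat^2 on the regressors
    sigma2_hat = X @ delta_hat                           # fitted variances
    sigma_hat = np.sqrt(np.clip(sigma2_hat, 1e-6, None)) # guard against negative fitted values

    Y_t = Y / sigma_hat                                  # step 3: divide the model by sigma_hat_i
    X_t = X / sigma_hat[:, None]
    b_fgls = ols(Y_t, X_t)                               # step 4: OLS on the transformed model
    print(b_ols, b_fgls)                                 # both near (1.0, 0.5); FGLS is more precise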

    2. Compute a heteroscedasticity-consistent covariance matrix estimator

    The idea is to use the OLS method but compute a heteroscedasticity-consistent covariance matrix estimator (cf. White (1980), Davidson and MacKinnon (1993)).
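
    For the second method, a sketch of the White (1980) heteroscedasticity-consistent ("sandwich") covariance estimator in Python (illustrative; statistical software provides this directly through robust standard-error options):

    import numpy as np

    rng = np.random.default_rng(10)
    n = 1000
    X = np.column_stack([np.ones(n), np.abs(rng.normal(size=n))])
    Y = X @ np.array([1.0, 0.5]) + (0.3 + X[:, 1]) * rng.normal(size=n)   # heteroscedastic errors

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    u_hat = Y - X @ beta_hat
    XtX_inv = np.linalg.inv(X.T @ X)

    # White estimator: (X'X)^{-1} (sum_i u_hat_i^2 x_i x_i') (X'X)^{-1}
    meat = (X * u_hat[:, None] ** 2).T @ X
    V_white = XtX_inv @ meat @ XtX_inv
    se_white = np.sqrt(np.diag(V_white))

    # Usual OLS variance estimate, invalid under heteroscedasticity
    sigma2_hat = u_hat @ u_hat / (n - X.shape[1])
    se_ols = np.sqrt(np.diag(sigma2_hat * XtX_inv))
    print(se_ols, se_white)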

  • Example

    SAS Program

    Proc Reg Data=outc ;
    Model UHat2 = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3 ;
    Output OUT=MyData PREDICTED=Sig2hat ;
    Run ;

    /* Create weights as inverse of root of variance */

    Data MyData ; Set MyData ;
    OmegaInv = SQRT(1/Sig2hat) ;
    Run ;

    Proc Reg Data=MyData ;
    Model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3 ;
    Weight OmegaInv ;
    Run ;

  • Figure: Estimation of u2 when g(x) = x

  • Figure: Estimation of u2 when g(x) = exp(x)

  • Figure: Determinants of wage correcting for heteroscedasticity

    The simple linear regression model
      Introduction
      The Ordinary Least Squares estimation method
      Example
      Finite-sample properties
      Example

    The multiple linear regression model
      Introduction
      The Ordinary Least Squares estimation method
      Example
      Properties
      Comparison between OLS and ML
      Tests and inference
      Heteroscedasticity

