Chapter 8: Analysis of Multiple Regression: Inference
Statistics and Introduction to Econometrics
M. Angeles Carnero
Departamento de Fundamentos del Análisis Económico
Year 2014-15
M. Angeles Carnero (UA) Tema 8: AMRI Year 2014-15 1 / 51
Distribution of the OLS estimators
In Chapter 7 we saw that the OLS estimators are unbiased and we obtained an expression for their variance. Knowing the expected value and the variance of an estimator is useful in order to analyse its precision. However, in order to make inference, that is, to do hypothesis testing on the parameters and to construct confidence intervals, we need to know the distribution of the estimator.

When we condition on the observed values of the explanatory variables in our sample, the distributions of the OLS estimators depend on the distribution of the errors. In order for the distributions of the β̂j to be simple, and in order to be able to obtain confidence intervals and test statistics, we are going to assume that the error term is normally distributed in the population. This is known as the normality assumption.
Assumption MLR.6 (Normality)

The population error u is independent of the explanatory variables x1, x2, ..., xk and its distribution is normal with zero mean and variance σ²:

    u ~ N(0, σ²)

Comments on assumption MLR.6

This assumption is much stronger than the assumptions we made in Chapter 7. In fact, since u is independent of the xj, neither the mean nor the variance of u depends on xj. Therefore, assumption MLR.6 implies

    E(u | x1, ..., xk) = E(u) = 0 (assumption MLR.3)
    Var(u | x1, ..., xk) = σ² (assumption MLR.5)

Assumptions MLR.1 to MLR.6 are known as the classical linear model assumptions.
Comments on assumption MLR.6 (cont.)

Under the assumptions of the classical linear model, the efficiency property of the OLS estimators is stronger than under the Gauss-Markov assumptions. It can be shown that, under the assumptions of the classical linear model, the OLS estimators have the smallest variance among all unbiased estimators; that is, we do not need to restrict the comparison to estimators that are linear in the yi.

Assumption MLR.6 can also be written as

    y | x1, ..., xk ~ N(β0 + β1x1 + β2x2 + ... + βkxk, σ²)

and therefore assumption MLR.6 is equivalent to saying that the distribution of y given the explanatory variables is normal, with a mean that is linear in the xj and a constant variance.

Whether the distribution of y given the explanatory variables is in fact normal is an empirical question. On many occasions, taking logs of variables such as wages, income, prices, expenditures, etc. makes the normality assumption more plausible. In any case, we will see later that non-normal errors are not a big problem as long as the sample size is large.
Distribution of the OLS estimators

Under assumptions MLR.1 to MLR.6, and conditioning on the observed values of x1, x2, ..., xk,

    β̂j ~ N(βj, Var(β̂j))

and, standardising,

    (β̂j − βj) / √Var(β̂j) ~ N(0, 1)    (1)

where the expression for Var(β̂j) is the one we saw in Chapter 7.

Note that this property only adds to the statistical properties of the OLS estimators seen in Chapter 7 the fact that the distribution of β̂j is normal.

The proof of the normality of β̂j relies on the fact that β̂j is a linear function of the yi. Since the yi are normal, and any linear function of normal variables is normal, β̂j follows a normal distribution.
Hypothesis testing on a single population parameter. The t test.

In this section, we study hypothesis tests on a single parameter of the multiple regression model. This type of test appears often in practice and, as it only involves one parameter, these are the simplest tests. Consider the regression model

    y = β0 + β1x1 + β2x2 + ... + βkxk + u
In order to construct the test statistics, we need the following result:

Under assumptions MLR.1 to MLR.6,

    (β̂j − βj) / se(β̂j) ~ t_{n−k−1}    (2)

If we compare this result with the one in equation (1), we see that the difference is that √Var(β̂j) has been replaced by se(β̂j).

The difference between √Var(β̂j) and se(β̂j) is that √Var(β̂j) depends on the unknown parameter σ², while se(β̂j) depends on the random variable σ̂². When σ² is replaced by its estimator σ̂², it can be shown that the distribution becomes a t with degrees of freedom equal to the sample size, n, minus the number of estimated parameters, k + 1.
For any of the parameters of the model, we can consider the following hypothesis tests:

    a) H0: βj = βj⁰  vs  H1: βj > βj⁰
    b) H0: βj = βj⁰  vs  H1: βj < βj⁰
    c) H0: βj = βj⁰  vs  H1: βj ≠ βj⁰

where βj⁰ is a real number. For example, we can test H0: β2 = 1 versus the alternative H1: β2 < 1, or H0: β1 = 0 versus the alternative H1: β1 ≠ 0, etc.

The test statistic is obtained by replacing, in equation (2), βj by its value under the null hypothesis, βj⁰:

    t = (β̂j − βj⁰) / se(β̂j) ~ t_{n−k−1} under H0    (3)

Note that t is a test statistic: it is a function of the sample, it does not depend on any unknown parameter, its distribution under the null hypothesis is known, and the plausible values of the statistic depend on whether the null hypothesis is true or not.
The alternative hypothesis in test a (and also in test b) is called a one-sided alternative, while the alternative hypothesis in test c is called a two-sided alternative.

The critical region of the test depends on the alternative hypothesis. We now obtain, intuitively, the critical region of the three tests.

In test a the alternative is βj > βj⁰, or equivalently βj − βj⁰ > 0. Since β̂j is an estimator of βj, if the alternative is true we expect to obtain a large positive value of β̂j − βj⁰.

Therefore, we reject the null hypothesis in favour of the alternative if β̂j − βj⁰ is positive and "large enough". This means that we reject the null hypothesis if t is positive and larger than a certain critical value. In particular, the decision rule for test a is:

    Reject H0 at level α if t > t_{n−k−1,α}
In test b the alternative is βj < βj⁰, or equivalently βj − βj⁰ < 0. Since β̂j is an estimator of βj, if the alternative holds we expect to obtain a negative value of β̂j − βj⁰.

Therefore, the null hypothesis is rejected in favour of the alternative if β̂j − βj⁰ is negative and far enough from zero. This means that we reject the null hypothesis if t is negative and smaller than a certain critical value. In particular, the decision rule for test b is:

    Reject H0 at level α if t < −t_{n−k−1,α}
In test c the alternative is βj ≠ βj⁰, or equivalently βj − βj⁰ ≠ 0. Since β̂j is an estimator of βj, if the alternative is true we expect to obtain a value of β̂j − βj⁰ far from zero (positive or negative).

Therefore, we reject the null hypothesis in favour of the alternative if β̂j − βj⁰ is far enough from zero. This means that we reject the null hypothesis if the absolute value of t is larger than a certain critical value. In particular, the decision rule for test c is:

    Reject H0 at level α if |t| > t_{n−k−1,α/2}
Particular case:

    H0: βj = 0  vs  H1: βj ≠ 0

The test statistic is:

    t = β̂j / se(β̂j) ~ t_{n−k−1} under H0

The statistic t for this test is called the t-ratio. In this test we are testing whether variable xj has any effect on y, once we have taken into account the effect on y of the rest of the explanatory variables of the model.

This test is called the significance test of variable xj. When we reject H0 at a certain significance level, for example 5%, we say that xj is statistically significant at the 5% level. Analogously, if we cannot reject H0 at the 5% level, we say that xj is not statistically significant at the 5% level.

Most statistical packages, including Gretl, report the t-ratio and the p-value of this test in the regression output.
Example 1

Let us consider the following model for the grade point average of college students

    colgpa = β0 + β1 hsgpa + β2 act + β3 skipped + u

where colgpa is the college grade point average measured on a four-point scale, hsgpa is the high school grade point average, also measured on a four-point scale, act is the score obtained on a test, and skipped is the average number of college classes missed weekly. Using data from the file GPA1 of Wooldridge's book, the following results are obtained (standard errors in parentheses):

    colgpa-hat = 1.39 + 0.412 hsgpa + 0.015 act − 0.083 skipped
                       (0.094)        (0.011)     (0.026)

    n = 141, R² = 0.234
Example 1 (cont.)

We are now going to see which variables are statistically significant. To do so, we calculate the t-ratios:

    t1 = 0.412/0.094 = 4.38,  t2 = 0.015/0.011 = 1.36,  t3 = −0.083/0.026 = −3.19

If we take a significance level of 5%, since n − k − 1 = 141 − 3 − 1 = 137, the critical value of these tests is t_{137,0.025} = 1.98.

Since |t1| > 1.98 and |t3| > 1.98, the variables hsgpa and skipped are statistically significant at the 5% level. However, given that |t2| < 1.98, act is not statistically significant at 5%.

We therefore conclude that the test score does not influence the college grade point average, once we have taken into account the effect of the high school grade point average and missed classes on the college grade point average.
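These t-ratio calculations are easy to reproduce in code. The sketch below (Python, not part of the original slides, which use Gretl) recomputes the three t-ratios from the reported coefficients and standard errors and applies the 5% decision rule:

```python
# Significance tests for Example 1. The coefficients and standard errors
# are the ones reported on the slide; only the arithmetic is new.
coefs = {"hsgpa": 0.412, "act": 0.015, "skipped": -0.083}
ses   = {"hsgpa": 0.094, "act": 0.011, "skipped": 0.026}

t_crit = 1.98  # critical value t_{137, 0.025} quoted on the slide

for var in coefs:
    t_ratio = coefs[var] / ses[var]
    verdict = "significant" if abs(t_ratio) > t_crit else "not significant"
    print(f"{var}: t = {t_ratio:.2f} -> {verdict} at the 5% level")
```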
The p-value

The decision on what significance level should be used to perform a test is arbitrary, and depending on the significance level we choose, the conclusions we reach may differ. To avoid this problem we can repeat the test for different significance levels. Another possibility, which is more informative than repeating the test at different significance levels, is to answer the following question:

Given the results we have obtained in the estimation, what is the smallest significance level at which we can reject the null hypothesis? The answer to this question is what is known as the p-value.

The p-value of the t tests is therefore calculated as follows: if t* is the value of the test statistic in the sample,

    Test a ⇒ p-value = Prob(t_{n−k−1} > t*)
    Test b ⇒ p-value = Prob(t_{n−k−1} < t*)
    Test c ⇒ p-value = Prob(|t_{n−k−1}| > |t*|)
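As a sketch of how these three probabilities are computed in practice (assuming Python with scipy, which the slides do not use), one can evaluate the tail of the t distribution directly:

```python
# p-values for the three t tests; df = n - k - 1.
from scipy.stats import t

def p_value(t_star, df, alternative):
    """p-value of the t test for H1 'greater', 'less' or 'two-sided'."""
    if alternative == "greater":      # test a: upper tail
        return t.sf(t_star, df)
    elif alternative == "less":       # test b: lower tail
        return t.cdf(t_star, df)
    else:                             # test c: both tails
        return 2 * t.sf(abs(t_star), df)

# Two-sided p-value for act in Example 1: t* = 1.36, df = 137
print(round(p_value(1.36, 137, "two-sided"), 3))
```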
Example 1 (cont.)
In Example 1, the p-value for the significance test of the variable act is

    Prob(|t137| > 1.36) = 0.176

Hence, we cannot reject H0 at any significance level below 17.6%. For example, we cannot reject the null hypothesis at 5%, as we had seen, or at 10%. The p-value we have obtained indicates that there is no evidence against the null hypothesis.

The p-value for the significance test of the variable skipped is

    Prob(|t137| > |−3.19|) = Prob(|t137| > 3.19) = 0.0018

Hence, we can reject H0 at any significance level above 0.18%. For example, in addition to rejecting the null hypothesis at 5%, as we had seen, we can also reject it at 1%. The p-value we have obtained indicates that there is sufficient evidence to reject the null hypothesis.
Example 2

Let us consider a simple model that relates the annual number of crimes on college campuses (crime) to the number of students enrolled (enroll):

    log(crime) = β0 + β1 log(enroll) + u

This is a constant elasticity model, where β1 is the elasticity of crime with respect to enrolment.

In this model it is interesting to test the hypothesis that the elasticity of crime with respect to the number of students enrolled is one, H0: β1 = 1. β1 = 1 means that a 1% increase in the number of students enrolled leads, on average, to a 1% increase in crime.

As the alternative, let us consider H1: β1 > 1, which implies that a 1% increase in the number of students enrolled increases crime by more than 1%. If β1 > 1, then in relative terms, not only in absolute terms, crime is a bigger problem on larger campuses.
Example 2 (cont.)
Using data on 97 colleges and universities in the United States for 1992, contained in the data file CAMPUS of Wooldridge's book, we have obtained the following results (standard errors in parentheses):

    log(crime)-hat = −6.363 + 1.27 log(enroll)
                     (1.03)   (0.11)

We want to test

    H0: β1 = 1  vs  H1: β1 > 1

The test statistic is:

    t = (β̂1 − 1) / se(β̂1) ~ t95 under H0

The value of the statistic in the sample is t = (1.27 − 1)/0.11 = 2.45. Since the p-value of the test is Prob(t95 > 2.45) = 0.008, we reject the null hypothesis at 1%. There is evidence that, in relative terms, crime is a bigger problem on larger campuses.
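The same computation can be sketched in code. The figures below are the ones reported for Example 2; note the null value 1 (rather than 0) in the numerator, and the one-sided (upper-tail) p-value. Python with scipy is assumed:

```python
# One-sided test of H0: beta1 = 1 vs H1: beta1 > 1 from Example 2.
from scipy.stats import t

b1, se_b1 = 1.27, 0.11     # slope on log(enroll) and its standard error
df = 97 - 1 - 1            # n - k - 1 = 95

t_stat = (b1 - 1) / se_b1  # null value is 1, not 0
p_val = t.sf(t_stat, df)   # upper-tail probability for H1: beta1 > 1

print(f"t = {t_stat:.2f}, p-value = {p_val:.3f}")
```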
Economic significance versus statistical significance

It is important to pay attention not only to whether a given variable is statistically significant, but also to the magnitude of its effect on the dependent variable.

Example 3

Let us consider the following model for participation in pension plans

    prate = β0 + β1 mrate + β2 age + β3 totemp + u

where prate is the percentage of company employees participating in the pension plan, mrate is the plan's match rate (the amount the company contributes to the pension plan for every dollar contributed by the employee), age is the years of participation in the pension plan, and totemp is the number of company employees.
Example 3 (cont.)

Using data on 1534 pension plans in the United States from the file 401K of Wooldridge's book, the model is estimated, obtaining the following results (standard errors in parentheses):

    prate-hat = 80.29 + 5.44 mrate + 0.269 age − 0.00013 totemp
                       (0.52)        (0.045)     (0.00004)

    n = 1534, R² = 0.100

If we calculate the t-ratios, we obtain t1 = 10.46, t2 = 5.98 and t3 = −3.25, so all the variables are statistically significant at the usual significance levels.

However, if we analyse the effect of totemp, we see that, holding mrate and age constant, if the company increases the number of workers by 10000, the participation rate in the pension plan decreases by only 1.3 percentage points (0.00013 × 10000 = 1.3). Hence, although the variable is statistically significant, its effect is not very large in practical terms.
Confidence intervals

Under the assumptions of the classical regression model we immediately obtain confidence intervals (CI) for the parameters of the regression model. These confidence intervals are obtained in the same way as the confidence interval for the mean of a normal population with unknown variance was obtained in the statistics course.

Since we know that

    (β̂j − βj) / se(β̂j) ~ t_{n−k−1}

we have that

    [β̂j − t_{n−k−1,α/2} se(β̂j), β̂j + t_{n−k−1,α/2} se(β̂j)]

is a confidence interval for βj at a confidence level of 100·(1 − α)%.
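A minimal sketch of this interval formula (Python with scipy assumed for the critical value; the function name conf_int is my own, not from the slides):

```python
# 100*(1-alpha)% confidence interval for a single coefficient.
from scipy.stats import t

def conf_int(beta_hat, se, df, alpha=0.05):
    """Return [beta_hat - c*se, beta_hat + c*se] with c = t_{df, alpha/2}."""
    c = t.ppf(1 - alpha / 2, df)   # upper alpha/2 critical value
    return beta_hat - c * se, beta_hat + c * se

# Elasticity estimate from Example 2: beta1_hat = 1.27, se = 0.11, df = 95
lo, hi = conf_int(1.27, 0.11, 95)
print(f"[{lo:.2f}, {hi:.2f}]")
```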
It is important to remember the meaning of a confidence interval. If we had a large number of random samples, and for each of them we calculated the 95% CI for βj, the parameter βj would lie in approximately 95% of the CIs calculated. Unfortunately, we usually have only a single sample and do not know whether βj is contained in the CI for that sample. The hope is that our sample belongs to the 95% of samples for which the CI contains βj, but we have no guarantee.

We can use the confidence interval for a parameter to test a hypothesis about this parameter. Let CI be the confidence interval for βj at the 100·(1 − α)% confidence level. Then the decision rule

    Reject H0 if βj⁰ ∉ CI

is a test with significance level α for

    H0: βj = βj⁰  vs  H1: βj ≠ βj⁰

This test is identical to the one based on the t statistic.
Example 2 (cont.)
In Example 2, considering that t_{95,0.025} = 1.99, the 95% confidence interval for the elasticity of crime with respect to enrolment, β1, is

    [1.27 − 1.99 · 0.11, 1.27 + 1.99 · 0.11] = [1.05, 1.49]

We can use the CI to test

    H0: β1 = 1  vs  H1: β1 ≠ 1

Given that 1 ∉ [1.05, 1.49], we can reject H0 at 5% in favour of the two-sided alternative hypothesis.

Note that this test is not the same as the one we did before, because now the alternative hypothesis is two-sided. CIs can only be used to perform tests with a two-sided alternative hypothesis.
Let us now suppose that the dependent variable of the model is in logarithms:

    log(y) = β0 + β1x1 + ... + βkxk + u

We know that 100β1 measures the percentage effect on y of a change of one unit in x1, keeping the rest of the factors constant.

To calculate a confidence interval for the percentage effect on y of a change of one unit in x1, we simply have to multiply the endpoints of the confidence interval for β1 by 100. The reason is that, since se(100β̂1) = 100 se(β̂1), the confidence interval for 100β1 is

    [100β̂1 − t_{n−k−1,α/2} · 100 se(β̂1), 100β̂1 + t_{n−k−1,α/2} · 100 se(β̂1)]
    = [100(β̂1 − t_{n−k−1,α/2} se(β̂1)), 100(β̂1 + t_{n−k−1,α/2} se(β̂1))]
Testing a linear restriction. The t test.

We now see how to test any linear restriction on the parameters of the multiple regression model. To illustrate the general approach we will use a simple example.

Example 4

In the United States there are two types of higher education institutions: junior colleges (two-year post-secondary academic institutions) and universities (four-year colleges). Let us suppose we want to compare the returns to education, i.e., the effect of education on wages, in these two types of academic institutions. To do so we consider the following model

    log(wage) = β0 + β1 jc + β2 univ + β3 exper + u    (4)

where wage is the hourly salary, jc is the number of years attending a junior college, univ is the number of years attending a university, and exper is work experience.
Example 4 (cont.)
Using the sample of 6763 workers with at least a high school degree in the TWOYEAR file of Wooldridge's book, we estimated the model, obtaining the following results (standard errors in parentheses):

    log(wage)-hat = 1.472 + 0.0667 jc + 0.0769 univ + 0.0049 exper    (5)
                           (0.0068)    (0.0023)      (0.0002)

    n = 6763, R² = 0.222

Based on these results we have that:

Holding constant the number of years at a university and work experience, the return to one year at a junior college is 6.67%.

Holding constant the number of years at a junior college and work experience, the return to one year at a university is 7.69%.

Therefore, the return to a year at a junior college is about 1 percentage point (6.67 − 7.69 = −1.02) less than the return to a year at a university.
Example 4 (cont.)
The hypothesis of interest is whether the return to a year at a junior college is the same as the return to a year at a university. Mathematically, we can write this hypothesis as

    H0: β1 = β2

The alternative hypothesis of interest is one-sided: the return to a year at a junior college is less than the return to a year at a university. Mathematically, we can write the alternative hypothesis as

    H1: β1 < β2

Since the hypothesis of interest relates two parameters of the model, we cannot directly use the t statistics of the previous section. Conceptually, however, this test is similar to the tests in the previous section.
Example 4 (cont.)
If we write the null and alternative hypotheses as H0: β1 − β2 = 0 and H1: β1 − β2 < 0, and we follow the same reasoning as in the previous section, the t statistic for this test is

    t = (β̂1 − β̂2) / se(β̂1 − β̂2)

The question is: how do we calculate se(β̂1 − β̂2)?

Given that se(β̂1 − β̂2) = √var̂(β̂1 − β̂2), and the variance of a difference depends on the covariance, to calculate se(β̂1 − β̂2) we would need the estimated covariance between β̂1 and β̂2. The problem is that some statistical packages do not provide this covariance.

In this case, an easier way to obtain the test statistic is to reparametrise the model conveniently, so that the hypothesis of interest in the reparametrised model refers to a single parameter.
Example 4 (cont.)
To do so, we define θ = β1 − β2, and we write the model as a function of θ such that the hypothesis to be tested is

    H0: θ = 0  vs  H1: θ < 0

Since θ = β1 − β2, we can obtain β1 as a function of β2 and θ, β1 = θ + β2, and substituting into model (4) we have

    log(wage) = β0 + (θ + β2) jc + β2 univ + β3 exper + u
              = β0 + θ jc + β2 jc + β2 univ + β3 exper + u
              = β0 + θ jc + β2 (jc + univ) + β3 exper + u

If we define totcoll = jc + univ, we can write the model as

    log(wage) = β0 + θ jc + β2 totcoll + β3 exper + u    (6)

Note that model (6) is the same as model (4), just written differently.
Example 4 (cont.)
Using the same data, we have estimated the reparametrised model (model (6)), obtaining the following results (standard errors in parentheses):

    log(wage)-hat = 1.472 − 0.0102 jc + 0.0769 totcoll + 0.0049 exper    (7)
                           (0.0069)    (0.0023)         (0.0002)

    n = 6763, R² = 0.222

The test statistic is t = −0.0102/0.0069 = −1.48. The p-value is Prob(t6759 < −1.48) = 0.069, and therefore there is some evidence, although not very strong, against the null hypothesis.

Note that the estimates of the constant and of the coefficient of exper, as well as their standard errors, are the same as when we estimated the original model.
Example 4 (cont.)
Note also that the estimated coefficient of totcoll and its standard error coincide with the results obtained for univ in the original model.

The reason is that the estimated coefficient of totcoll in equation (7) measures the return to one more year at a university or junior college, holding constant the number of years at a junior college and work experience. Given that the number of years at a junior college is held constant, the estimated coefficient of totcoll in equation (7) actually measures the return to one more year at a university, and therefore measures exactly the same thing as the coefficient of univ in equation (5).
In general, we proceed in a similar manner for any linear restriction on the parameters of the model that we want to test. We first define an auxiliary parameter so that we can express the restriction as a function of that parameter. We then reparametrise the model and estimate the reparametrised model to perform the test. We will see several examples in the Problem Set.
Confidence interval for a linear combination of parameters

To calculate a confidence interval for a linear combination of parameters, we also define an auxiliary parameter and reparametrise the model as a function of this parameter, in the same way as for the linear restriction test.

Example 4 (cont.)

We now obtain a confidence interval for the difference between the return to one year at a junior college and the return to one year at a university. In model (4), we have seen that this difference in returns is 100(β1 − β2). Hence, again defining θ = β1 − β2, we have to calculate a confidence interval for 100θ.

Using the estimates of the model reparametrised as a function of θ (equation (7)), and bearing in mind that t_{6759,0.025} = 1.96, the 95% confidence interval for 100(β1 − β2) is

    [100(−0.0102 − 1.96 · 0.0069), 100(−0.0102 + 1.96 · 0.0069)] = [−2.37, 0.33]
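The endpoints of this interval are simple arithmetic on the reported estimates; a quick check in code:

```python
# 95% CI for 100*(beta1 - beta2) in Example 4, recomputed from the
# reparametrised estimate theta_hat = -0.0102, its standard error 0.0069,
# and the critical value 1.96 quoted on the slide.
theta_hat, se_theta, c = -0.0102, 0.0069, 1.96

lo = 100 * (theta_hat - c * se_theta)   # lower endpoint, percentage points
hi = 100 * (theta_hat + c * se_theta)   # upper endpoint, percentage points
print(f"[{lo:.2f}, {hi:.2f}]")
```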
Testing several linear restrictions. The F test.

In practice, we are often interested in jointly testing several hypotheses on the model parameters.

Example 5

Let us consider the model

    log(wage) = β0 + β1 educ + β2 exper + β3 expersq + u

where we want to test whether, holding years of education constant, wages depend on work experience. The partial effect of experience on the log of wages is

    ∂log(wage)/∂exper = β2 + 2β3 exper

Testing whether this effect is zero for any level of experience means testing H0: β2 = β3 = 0.

As in this example, we will begin with the case in which we want to test whether a subset of the variables is jointly significant.
Exclusion restriction tests

Let us consider the multiple regression model with k explanatory variables (in addition to the constant term)

    y = β0 + β1x1 + ... + βkxk + u    (8)

In this model we want to test whether q of these variables have zero coefficients. To simplify the notation, we assume that these q variables are the last ones in the list: x_{k−q+1}, ..., xk. The null hypothesis we want to test is

    H0: β_{k−q+1} = 0, ..., βk = 0

The alternative hypothesis is that the null hypothesis is false, meaning that at least one of the parameters appearing in the null hypothesis is different from zero.

To perform the test, we first impose the restrictions of the null hypothesis on model (8), obtaining

    y = β0 + β1x1 + ... + β_{k−q}x_{k−q} + u    (9)
Model (8) is called the unrestricted model, while model (9) is called the restricted model. The test is based on the following intuitive idea:

First we estimate the two models and calculate the sum of squared residuals (SSR) of each. Let us denote the SSR of the unrestricted model (8) by SSRnr, and the SSR of the restricted model (9) by SSRr.

If the variables x_{k−q+1}, ..., xk are important in explaining the dependent variable, we can expect the SSR to increase substantially when these variables are eliminated; that is, we should expect SSRr to be considerably larger than SSRnr. On the contrary, if these variables are not important in explaining the dependent variable, we can expect SSRr to be very similar to SSRnr (remember that the SSR cannot decrease when variables are eliminated, and therefore SSRr ≥ SSRnr).

The test statistic is based on the increase in the SSR once the variables x_{k−q+1}, ..., xk have been eliminated.
Specifically, the test statistic is:

    F = [(SSRr − SSRnr)/q] / [SSRnr/(n − k − 1)]    (10)

As mentioned above, SSRr ≥ SSRnr, hence the F statistic is always non-negative. It can be shown that

    F ~ F_{q,n−k−1} under H0

Moreover, as we have seen, if the null hypothesis is true, SSRr is expected to be very similar to SSRnr, whereas if the null hypothesis is false, SSRr is expected to be considerably larger than SSRnr. The decision rule for the test is:

    Reject H0 at level α if F > F_{q,n−k−1,α}
Example 5 (cont.)

Returning again to the example of wages as a quadratic function of work experience,

    log(wage) = β0 + β1 educ + β2 exper + β3 expersq + u    (11)

we have estimated this model using data from the file WAGE1 of Wooldridge's book and obtained the following results (standard errors in parentheses):

    log(wage)-hat = 0.128 + 0.090 educ + 0.041 exper − 0.00071 expersq
                           (0.0075)     (0.0052)      (0.00012)

    n = 526, SSR = 103.7904, R² = 0.30

We want to test whether work experience has an effect on wages. We have seen that the null hypothesis to test is

    H0: β2 = β3 = 0

If we impose the restrictions on model (11), we have the restricted model

    log(wage) = β0 + β1 educ + u    (12)
Example 5 (cont.)
and estimating this model we obtain (standard errors in parentheses)

    log(wage)-hat = 0.584 + 0.083 educ
                           (0.0076)

    n = 526, SSR = 120.7691, R² = 0.186

The value of the F statistic in this sample is

    F = [(120.7691 − 103.7904)/2] / [103.7904/522] = 42.69

Given that

    p-value = Prob(F_{2,522} > 42.69) ≈ 0

we can reject the null hypothesis at any reasonable significance level.
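The F statistic (10) and its p-value can be recomputed from the two reported SSRs; the sketch below assumes Python with scipy for the F tail probability:

```python
# F test of H0: beta2 = beta3 = 0 in Example 5, from the reported SSRs.
from scipy.stats import f

def f_stat(ssr_r, ssr_nr, q, df_nr):
    """Equation (10): [(SSR_r - SSR_nr)/q] / [SSR_nr/(n - k - 1)]."""
    return ((ssr_r - ssr_nr) / q) / (ssr_nr / df_nr)

F = f_stat(120.7691, 103.7904, q=2, df_nr=522)   # df_nr = 526 - 3 - 1
p = f.sf(F, 2, 522)                              # upper-tail probability
print(f"F = {F:.2f}, p-value = {p:.2e}")
```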
Testing the global significance of the regression

In this test, the null hypothesis is that none of the explanatory variables of the model affects the dependent variable, that is,

    H0: β1 = β2 = ... = βk = 0

This test is a particular case of the exclusion restriction test we have just seen. To perform it we have to consider the restricted model, which in this case contains only a constant:

    y = β0 + u

In this case the explained sum of squares (SSE) of the restricted model is zero, and therefore SSRr = SST. Substituting into the general expression of the F statistic, for this test we have

    F = [(SST − SSR)/k] / [SSR/(n − k − 1)]
Dividing the numerator and the denominator by SST, and recalling that R² = 1 − SSR/SST, we can write the F statistic for the global significance test of the regression as a function of the R-squared:

    F = [(1 − SSR/SST)/k] / [(SSR/SST)/(n − k − 1)] = [R²/k] / [(1 − R²)/(n − k − 1)]

Most statistical packages, including Gretl, report the value of the F statistic for the global significance test in the regression output table.
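The algebraic equivalence of the two forms can be checked numerically. The sketch below uses the (rounded) figures of the unrestricted model of Example 5, backing out SST from SSR and R², so the two expressions must agree by construction:

```python
# Global-significance F statistic: SSR/SST form vs R-squared form.
n, k = 526, 3
r2 = 0.30          # rounded R2 reported on the slide
ssr = 103.7904
sst = ssr / (1 - r2)   # since R2 = 1 - SSR/SST

f_from_ssr = ((sst - ssr) / k) / (ssr / (n - k - 1))
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
print(f_from_ssr, f_from_r2)   # the two forms coincide
```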
Testing general linear restrictions

Exclusion restriction tests are the most frequent tests performed in empirical analyses. However, in some cases we may be interested in simultaneously testing several restrictions which are not simply that several coefficients are zero.

The methodology for this type of test is identical to the case of the exclusion restrictions. First we impose the restrictions on the model to define the restricted model. We then estimate both models (the unrestricted and the restricted) and calculate the F statistic as in the case of the exclusion restrictions. We will illustrate these tests with an example.
Example 6

Let us consider the model

    log(price) = β0 + β1 log(assess) + β2 log(lotsize) + β3 log(sqrft) + β4 bdrms + u

where price is the selling price of the house, assess is the estimated value of the house before being sold, lotsize is the size of the plot, sqrft is the surface area of the house, and bdrms is the number of rooms.

Suppose we want to test whether the estimated value of the house is rational or not. If it is, then a 1% increase in the estimated value of the house should result in a 1% increase in the price, i.e., β1 = 1. Moreover, after taking the assessment into account, the remaining variables should not help to explain the price, i.e., β2 = β3 = β4 = 0. Thus the null hypothesis to test is

    H0: β1 = 1, β2 = β3 = β4 = 0

and the restricted model is

    log(price) − log(assess) = log(price/assess) = β0 + u
Example 6 (cont.)
Using data from the file HPRICE1 of Wooldridge's book, the model has been estimated, obtaining the following results (standard errors in parentheses):

    log(price)-hat = 0.264 + 1.043 log(assess) + 0.0074 log(lotsize) − 0.1032 log(sqrft) + 0.0338 bdrms
                            (0.151)             (0.0386)              (0.1384)            (0.0221)

    n = 88, SSR = 1.822, R² = 0.773

As for the restricted model, we generate the variable log(price/assess) and estimate the regression of this variable on a constant (not including any explanatory variable). The value obtained for SSRr is 1.880.

The test statistic is

    F = [(1.880 − 1.822)/4] / [1.822/83] = 0.66

and since the p-value = Prob(F_{4,83} > 0.66) = 0.6215, we cannot reject the null hypothesis at any reasonable significance level: there is no evidence against the hypothesis that the estimated value of the house is rational.
Relationship between the t and F statistics

So far we have seen how to use the t statistic for testing a single linear restriction and the F statistic for testing several linear restrictions. However, there is no reason why we cannot also use the F statistic for testing a single linear restriction, and the question that arises is: what is the relationship between these two statistics when testing a single restriction?

The answer is that it can be shown that F = t², and that the p-value of the two-sided test based on the t statistic is the same as the p-value based on the F statistic; hence the conclusion with either method will be identical.
Example 4 (cont.)

Let us consider the model again
log(wage) = β0 + β1 jc + β2 univ + β3 exper + u (13)
where wage is salary per hour, jc is the number of years at a junior college, univ is the number of years at a university and exper is years of work experience. We want to test
H0 : β1 = β2
If we impose this restriction we have that the restricted model is
log(wage) = β0 + β2 jc + β2 univ + β3 exper + u
          = β0 + β2 (jc + univ) + β3 exper + u
and defining as above totcoll = jc + univ, we can write the restricted model as
log(wage) = β0 + β2 totcoll + β3 exper + u (14)
Example 4 (cont.)
When we estimate models (13) and (14) we obtain the following results:

log(wage)-hat = 1.472 + 0.0667 jc + 0.0769 univ + 0.0049 exper
                       (0.0068)    (0.0023)       (0.0002)

n = 6763, SSR = 1250.54352, R² = 0.2224

log(wage)-hat = 1.472 + 0.0762 totcoll + 0.0049 exper
                       (0.0023)          (0.0002)

n = 6763, SSR = 1250.94205, R² = 0.2222
The value of the F statistic in the sample is

F = (1250.94205 − 1250.54352) / (1250.54352/6759) = 2.15
The value that we obtained for the t statistic was −1.48 and (−1.48)² = 2.19, which is slightly different from the value of F due to rounding error.
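As a quick check of F = t², the Example 4 numbers can be recomputed in a short sketch (values taken from the slide; q = 1 restriction, so n − k − 1 = 6763 − 3 − 1 = 6759):

```python
# F from the SSRs of the unrestricted (13) and restricted (14) models.
ssr_r, ssr_nr = 1250.94205, 1250.54352
df = 6763 - 3 - 1                      # n - k - 1 for the unrestricted model
F = (ssr_r - ssr_nr) / (ssr_nr / df)   # q = 1, so no division by q needed
t = -1.48                              # t statistic reported on the slide
print(round(F, 2), round(t**2, 2))     # 2.15 2.19 -- close up to rounding error
```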
Example 4 (cont.)
If the alternative hypothesis is two-sided, we can directly use the F statistic to perform the test. When the alternative hypothesis is one-sided, we have to use the t statistic. The reason is that the F statistic is always positive and therefore does not allow us to distinguish between the alternative hypothesis β1 < β2 and the alternative hypothesis β1 > β2.

Once we have calculated the F statistic as F = t², we have that t can be equal to √F or −√F. To determine whether t has a positive or negative sign, we use that

t = (β̂1 − β̂2) / se(β̂1 − β̂2)

and therefore the sign of t coincides with the sign of β̂1 − β̂2. In this manner, once F is calculated, we have that t = sign(β̂1 − β̂2)·√F.

In this example, given that β̂1 − β̂2 < 0, t = −√F = −√2.15 = −1.47, which is slightly different from the value we obtained previously for t due to rounding error.
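The sign-recovery rule can be sketched as a small helper (a minimal illustration; the function name is my own, and the inputs are the slide's Example 4 values):

```python
import math

def t_from_F(F, beta_diff):
    """t statistic implied by F for a single restriction:
    t = sign(b1_hat - b2_hat) * sqrt(F)."""
    sign = 1.0 if beta_diff >= 0 else -1.0
    return sign * math.sqrt(F)

# Example 4: b1_hat - b2_hat = 0.0667 - 0.0769 < 0, F = 2.15
print(round(t_from_F(2.15, 0.0667 - 0.0769), 2))  # -1.47
```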
The F statistic as a function of R2
We will now see that in many applications we can also calculate the F statistic from the R². Let us consider the expression of the F statistic based on the sum of squared residuals

F = [(SSRr − SSRnr)/q] / [SSRnr/(n − k − 1)]

Dividing the numerator and the denominator by the total sum of squares of the unrestricted model, SSTnr, we have

F = [(SSRr/SSTnr − SSRnr/SSTnr)/q] / [(SSRnr/SSTnr)/(n − k − 1)]
Given the definition of R-squared, if the dependent variable of the restricted model is the same as that of the unrestricted model, we have that SSTr = SSTnr, and we can therefore write the F statistic as a function of the R-squared

F = [((1 − R²r) − (1 − R²nr))/q] / [(1 − R²nr)/(n − k − 1)]
  = [(R²nr − R²r)/q] / [(1 − R²nr)/(n − k − 1)]
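The R²-based formula can be wrapped in a small helper (a minimal sketch; the function name is my own, and the sample values are those of Example 5 below: R²nr = 0.300, R²r = 0.186, q = 2, n − k − 1 = 522):

```python
def f_from_r2(r2_nr, r2_r, q, df):
    """F statistic from the R-squared of the unrestricted and restricted
    models; df is n - k - 1 from the unrestricted model. Valid only when
    both models share the same dependent variable."""
    return ((r2_nr - r2_r) / q) / ((1 - r2_nr) / df)

print(round(f_from_r2(0.300, 0.186, 2, 522), 1))  # 42.5
```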
Example 5 (cont.)

Using the results from Example 5 we can re-calculate the F statistic to check that we obtain the same result as when we calculated it using the SSR

F = [(0.300 − 0.186)/2] / [(1 − 0.300)/522] = 42.5
The number is not exactly the same as the one we obtained previously due to rounding error.

Test statistics and measurement units

When we make a change in the measurement units of one or more of the variables in the model, the test statistics t and F do not change.
Inference with large samples
When we do not know the distribution of the errors, we can also make inference provided the sample size is large. Using the central limit theorem, it can be shown that even if the errors are not normal, if the sample size is large, the test statistic t is approximately distributed as N(0, 1) under the null hypothesis. Moreover, in the case of the F test of q linear restrictions, it can also be shown that, although the errors are not normal, if the sample size is large, the F statistic is approximately distributed as Fq,n−k−1 under the null hypothesis. In both tests the critical region is defined as in the tests under normality, except that in the case of the t statistic we now have to use the critical values of the N(0, 1) instead of the critical values of the t distribution.
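As an illustration of the large-sample approximation, the two-sided p-value of a t statistic can be computed from the standard normal CDF, which the standard library exposes through math.erf (a minimal sketch, not tied to any particular dataset):

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_pvalue(t):
    """Large-sample two-sided p-value: 2 * (1 - Phi(|t|))."""
    return 2.0 * (1.0 - normal_cdf(abs(t)))

# The familiar 5% two-sided critical value of the N(0,1) is 1.96:
print(round(two_sided_pvalue(1.96), 3))  # 0.05
```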