Chapter 8: Analysis of Multiple Regression: Inference
Statistics and Introduction to Econometrics
M. Angeles Carnero
Departamento de Fundamentos del Análisis Económico
Year 2014-15
M. Angeles Carnero (UA) Tema 8: AMRI Year 2014-15 1 / 51
Distribution of the OLS estimators
In Chapter 7 we saw that the OLS estimators are unbiased and we obtained an expression for their variance. Knowing the expected value and the variance of an estimator is useful in order to analyse its precision. However, in order to make inference, that is, to do hypothesis testing on the parameters and to construct confidence intervals, we need to know the distribution of the estimator.

When we condition on the observed values of the explanatory variables in our sample, the distributions of the OLS estimators depend on the distribution of the errors. In order for the distributions of the β̂j to be simple, and in order to be able to obtain confidence intervals and test statistics, we are going to assume that the error term is normally distributed in the population. This is known as the normality assumption.
Assumption MLR.6 (Normality)

The population error u is independent of the explanatory variables x1, x2, ..., xk and its distribution is normal with zero mean and variance σ²:

    u ~ N(0, σ²)

Comments on assumption MLR.6

This assumption is much stronger than the assumptions we made in Chapter 7. In fact, since u is independent of the xj, neither the mean nor the variance of u depends on xj. Therefore, assumption MLR.6 implies

    E(u | x1, ..., xk) = E(u) = 0 (assumption MLR.3)
    Var(u | x1, ..., xk) = σ² (assumption MLR.5)

Assumptions MLR.1 to MLR.6 are known as the classical linear model assumptions.
Comments on assumption MLR.6 (cont.)

Under the assumptions of the classical linear model, the efficiency property of the OLS estimators is stronger than under the Gauss-Markov assumptions. It can be shown that, under the assumptions of the classical linear model, the OLS estimators have the smallest variance among all unbiased estimators; that is, we do not need to restrict the comparison to estimators that are linear in the yi.

Assumption MLR.6 can also be written as

    y | x1, ..., xk ~ N(β0 + β1x1 + β2x2 + ... + βkxk, σ²)

and therefore assumption MLR.6 is equivalent to saying that the distribution of y given the explanatory variables is normal, with a mean that is linear in the xj and a constant variance.

Whether the distribution of y given the explanatory variables is in fact normal is an empirical question. On many occasions, taking logs of variables such as wages, income, prices, expenditures, etc. makes the normality assumption more plausible. In any case, we will see later that non-normal errors are not a big problem as long as the sample size is large.
Distribution of the OLS estimators

Under assumptions MLR.1 to MLR.6, and conditioning on the observed values of x1, x2, ..., xk,

    β̂j ~ N(βj, Var(β̂j))

and, standardising,

    (β̂j − βj) / √Var(β̂j) ~ N(0, 1)    (1)

where the expression for Var(β̂j) is the one we saw in Chapter 7.

Note that this property only adds to the statistical properties of the OLS estimators seen in Chapter 7 the fact that the distribution of β̂j is normal.

The proof of the normality of β̂j relies on the fact that β̂j is a linear function of the yi. Since the yi are normal, and any linear function of normal variables is normal, β̂j follows a normal distribution.
Hypothesis testing on a single population parameter. The t test.

In this section, we study hypothesis tests on a single parameter of the multiple regression model. This type of test appears often in practice and, as it only involves one parameter, these are the simplest tests. Consider the regression model

    y = β0 + β1x1 + β2x2 + ... + βkxk + u
In order to construct the test statistics, we need the following result:

Under assumptions MLR.1 to MLR.6,

    (β̂j − βj) / se(β̂j) ~ t_{n−k−1}    (2)

If we compare this result with the one in equation (1), we see that the difference is that √Var(β̂j) has been replaced by se(β̂j).

The difference between √Var(β̂j) and se(β̂j) is that √Var(β̂j) depends on the unknown parameter σ², while se(β̂j) depends on the random variable σ̂². When σ² is replaced by its estimator σ̂², it can be shown that the distribution becomes a t with degrees of freedom equal to the sample size, n, minus the number of estimated parameters, k + 1.
For any of the parameters of the model, we can consider the following hypothesis tests:

    a) H0: βj = βj⁰  vs  H1: βj > βj⁰
    b) H0: βj = βj⁰  vs  H1: βj < βj⁰
    c) H0: βj = βj⁰  vs  H1: βj ≠ βj⁰

where βj⁰ is a real number. For example, we can test H0: β2 = 1 versus the alternative H1: β2 < 1, or H0: β1 = 0 versus the alternative H1: β1 ≠ 0, etc.

The test statistic is obtained by replacing, in equation (2), βj by its value under the null hypothesis, βj⁰:

    t = (β̂j − βj⁰) / se(β̂j) ~ t_{n−k−1} under H0    (3)

Note that t is a test statistic: it is a function of the sample, it does not depend on any unknown parameter, its distribution under the null hypothesis is known, and the plausible values of the statistic depend on whether the null hypothesis is true or not.
The alternative hypothesis in test a (and also in test b) is called a one-sided alternative, while the alternative hypothesis in test c is called a two-sided alternative.

The critical region of the test depends on the alternative hypothesis. We now obtain, intuitively, the critical region of the three tests.

In test a the alternative is βj > βj⁰, or equivalently βj − βj⁰ > 0. Since β̂j is an estimator of βj, if the alternative is true we expect to obtain a large positive value of β̂j − βj⁰.

Therefore, we reject the null hypothesis in favour of the alternative if β̂j − βj⁰ is positive and "large enough". This means that we reject the null hypothesis if t is positive and larger than a certain critical value. In particular, the decision rule for test a is:

    Reject H0 at level α if t > t_{n−k−1,α}
In test b the alternative is βj < βj⁰, or equivalently βj − βj⁰ < 0. Since β̂j is an estimator of βj, if the alternative holds we expect to obtain a negative value of β̂j − βj⁰.

Therefore, the null hypothesis is rejected in favour of the alternative if β̂j − βj⁰ is negative and far enough from zero. This means that we reject the null hypothesis if t is negative and smaller than a certain critical value. In particular, the decision rule for test b is:

    Reject H0 at level α if t < −t_{n−k−1,α}
In test c the alternative is βj ≠ βj⁰, or equivalently βj − βj⁰ ≠ 0. Since β̂j is an estimator of βj, if the alternative is true we expect to obtain a value of β̂j − βj⁰ far from zero (positive or negative).

Therefore, we reject the null hypothesis in favour of the alternative if β̂j − βj⁰ is far enough from zero. This means that we reject the null hypothesis if the absolute value of t is larger than a certain critical value. In particular, the decision rule for test c is:

    Reject H0 at level α if |t| > t_{n−k−1,α/2}
Particular case:

    H0: βj = 0  vs  H1: βj ≠ 0

The test statistic is:

    t = β̂j / se(β̂j) ~ t_{n−k−1} under H0

The statistic t for this test is called the t-ratio. In this test we are testing whether variable xj has any effect on y, once we have taken into account the effect on y of the rest of the explanatory variables of the model.

This test is called the significance test of variable xj. When we reject H0 at a certain significance level, for example 5%, we say that xj is statistically significant at the 5% level. Analogously, if we cannot reject H0 at the 5% level, we say that xj is not statistically significant at the 5% level.

Most statistical packages, including Gretl, report the t-ratio and the p-value of this test in the regression output.
Example 1

Let us consider the following model for the grade point average of college students

    colgpa = β0 + β1 hsgpa + β2 act + β3 skipped + u

where colgpa is the college grade point average measured on a four-point scale, hsgpa is the high school grade point average, also measured on a four-point scale, act is the score obtained on a test, and skipped is the average number of college classes missed weekly. Using data from the file GPA1 of Wooldridge's book, the following results are obtained (standard errors in parentheses):

    colgpa-hat = 1.39 + 0.412 hsgpa + 0.015 act − 0.083 skipped
                       (0.094)        (0.011)     (0.026)

    n = 141, R² = 0.234
Example 1 (cont.)

We are now going to see which variables are statistically significant. To do so, we calculate the t-ratios:

    t1 = 0.412/0.094 = 4.38,  t2 = 0.015/0.011 = 1.36,  t3 = −0.083/0.026 = −3.19

If we take a significance level of 5%, since n − k − 1 = 141 − 3 − 1 = 137, the critical value of these tests is t_{137,0.025} = 1.98.

Since |t1| > 1.98 and |t3| > 1.98, the variables hsgpa and skipped are statistically significant at the 5% level. However, given that |t2| < 1.98, act is not statistically significant at 5%.

We therefore conclude that the test score does not influence the college grade point average, once we have taken into account the effect of the high school grade point average and missed classes on the college grade point average.
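These t-ratio calculations are easy to reproduce in code. The sketch below (Python, not part of the original slides, which use Gretl) recomputes the three t-ratios from the reported coefficients and standard errors and applies the 5% decision rule:

```python
# Significance tests for Example 1. The coefficients and standard errors
# are the ones reported on the slide; only the arithmetic is new.
coefs = {"hsgpa": 0.412, "act": 0.015, "skipped": -0.083}
ses   = {"hsgpa": 0.094, "act": 0.011, "skipped": 0.026}

t_crit = 1.98  # critical value t_{137, 0.025} quoted on the slide

for var in coefs:
    t_ratio = coefs[var] / ses[var]
    verdict = "significant" if abs(t_ratio) > t_crit else "not significant"
    print(f"{var}: t = {t_ratio:.2f} -> {verdict} at the 5% level")
```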
The p-value

The decision on what significance level should be used to perform a test is arbitrary, and depending on the significance level we choose, the conclusions we reach may differ. To avoid this problem we can repeat the test for different significance levels. Another possibility, which is more informative than repeating the test at different significance levels, is to answer the following question:

Given the results we have obtained in the estimation, what is the smallest significance level at which we can reject the null hypothesis? The answer to this question is what is known as the p-value.

The p-value of the t tests is therefore calculated as follows: if t* is the value of the test statistic in the sample,

    Test a ⇒ p-value = Prob(t_{n−k−1} > t*)
    Test b ⇒ p-value = Prob(t_{n−k−1} < t*)
    Test c ⇒ p-value = Prob(|t_{n−k−1}| > |t*|)
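As a sketch of how these three probabilities are computed in practice (assuming Python with scipy, which the slides do not use), one can evaluate the tail of the t distribution directly:

```python
# p-values for the three t tests; df = n - k - 1.
from scipy.stats import t

def p_value(t_star, df, alternative):
    """p-value of the t test for H1 'greater', 'less' or 'two-sided'."""
    if alternative == "greater":      # test a: upper tail
        return t.sf(t_star, df)
    elif alternative == "less":       # test b: lower tail
        return t.cdf(t_star, df)
    else:                             # test c: both tails
        return 2 * t.sf(abs(t_star), df)

# Two-sided p-value for act in Example 1: t* = 1.36, df = 137
print(round(p_value(1.36, 137, "two-sided"), 3))
```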
Example 1 (cont.)
In Example 1, the p-value for the significance test of the variable act is

    Prob(|t137| > 1.36) = 0.176

Hence, we cannot reject H0 at any significance level below 17.6%. For example, we cannot reject the null hypothesis at 5%, as we had seen, or at 10%. The p-value we have obtained indicates that there is no evidence against the null hypothesis.

The p-value for the significance test of the variable skipped is

    Prob(|t137| > |−3.19|) = Prob(|t137| > 3.19) = 0.0018

Hence, we can reject H0 at any significance level above 0.18%. For example, in addition to rejecting the null hypothesis at 5%, as we had seen, we can also reject it at 1%. The p-value we have obtained indicates that there is sufficient evidence to reject the null hypothesis.
Example 2

Let us consider a simple model that relates the annual number of crimes on college campuses (crime) to the number of students enrolled (enroll):

    log(crime) = β0 + β1 log(enroll) + u

This is a constant elasticity model, where β1 is the elasticity of crime with respect to enrolment.

In this model it is interesting to test the hypothesis that the elasticity of crime with respect to the number of students enrolled is one, H0: β1 = 1. β1 = 1 means that a 1% increase in the number of students enrolled leads, on average, to a 1% increase in crime.

As the alternative, let us consider H1: β1 > 1, which implies that a 1% increase in the number of students enrolled increases crime by more than 1%. If β1 > 1, then in relative terms, not only in absolute terms, crime is a bigger problem on larger campuses.
Example 2 (cont.)
Using data on 97 colleges and universities in the United States for 1992, contained in the data file CAMPUS of Wooldridge's book, we have obtained the following results (standard errors in parentheses):

    log(crime)-hat = −6.363 + 1.27 log(enroll)
                     (1.03)   (0.11)

We want to test

    H0: β1 = 1  vs  H1: β1 > 1

The test statistic is:

    t = (β̂1 − 1) / se(β̂1) ~ t95 under H0

The value of the statistic in the sample is t = (1.27 − 1)/0.11 = 2.45. Since the p-value of the test is Prob(t95 > 2.45) = 0.008, we reject the null hypothesis at 1%. There is evidence that, in relative terms, crime is a bigger problem on larger campuses.
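The same computation can be sketched in code. The figures below are the ones reported for Example 2; note the null value 1 (rather than 0) in the numerator, and the one-sided (upper-tail) p-value. Python with scipy is assumed:

```python
# One-sided test of H0: beta1 = 1 vs H1: beta1 > 1 from Example 2.
from scipy.stats import t

b1, se_b1 = 1.27, 0.11     # slope on log(enroll) and its standard error
df = 97 - 1 - 1            # n - k - 1 = 95

t_stat = (b1 - 1) / se_b1  # null value is 1, not 0
p_val = t.sf(t_stat, df)   # upper-tail probability for H1: beta1 > 1

print(f"t = {t_stat:.2f}, p-value = {p_val:.3f}")
```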
Economic significance versus statistical significance

It is important to pay attention not only to whether a given variable is statistically significant, but also to the magnitude of its effect on the dependent variable.

Example 3

Let us consider the following model for participation in pension plans

    prate = β0 + β1 mrate + β2 age + β3 totemp + u

where prate is the percentage of company employees participating in the pension plan, mrate is the plan's match rate (the amount the company contributes to the pension plan for every dollar contributed by the employee), age is the years of participation in the pension plan, and totemp is the number of company employees.
Example 3 (cont.)

Using data on 1534 pension plans in the United States from the file 401K of Wooldridge's book, the model is estimated, obtaining the following results (standard errors in parentheses):

    prate-hat = 80.29 + 5.44 mrate + 0.269 age − 0.00013 totemp
                       (0.52)        (0.045)     (0.00004)

    n = 1534, R² = 0.100

If we calculate the t-ratios, we obtain t1 = 10.46, t2 = 5.98 and t3 = −3.25, so all the variables are statistically significant at the usual significance levels.

However, if we analyse the effect of totemp, we see that, holding mrate and age constant, if the company increases the number of workers by 10000, the participation rate in the pension plan decreases by only 1.3 percentage points (0.00013 × 10000 = 1.3). Hence, although the variable is statistically significant, its effect is not very large in practical terms.
Confidence intervals

Under the assumptions of the classical regression model we immediately obtain confidence intervals (CI) for the parameters of the regression model. These confidence intervals are obtained in the same way as the confidence interval for the mean of a normal population with unknown variance was obtained in the statistics course.

Since we know that

    (β̂j − βj) / se(β̂j) ~ t_{n−k−1}

we have that

    [β̂j − t_{n−k−1,α/2} se(β̂j), β̂j + t_{n−k−1,α/2} se(β̂j)]

is a confidence interval for βj at a confidence level of 100·(1 − α)%.
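A minimal sketch of this interval formula (Python with scipy assumed for the critical value; the function name conf_int is my own, not from the slides):

```python
# 100*(1-alpha)% confidence interval for a single coefficient.
from scipy.stats import t

def conf_int(beta_hat, se, df, alpha=0.05):
    """Return [beta_hat - c*se, beta_hat + c*se] with c = t_{df, alpha/2}."""
    c = t.ppf(1 - alpha / 2, df)   # upper alpha/2 critical value
    return beta_hat - c * se, beta_hat + c * se

# Elasticity estimate from Example 2: beta1_hat = 1.27, se = 0.11, df = 95
lo, hi = conf_int(1.27, 0.11, 95)
print(f"[{lo:.2f}, {hi:.2f}]")
```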
It is important to remember the meaning of a confidence interval. If we had a large number of random samples, and for each of them we calculated the 95% CI for βj, the parameter βj would lie in approximately 95% of the CIs calculated. Unfortunately, we usually have only a single sample and do not know whether βj is contained in the CI for that sample. The hope is that our sample belongs to the 95% of samples for which the CI contains βj, but we have no guarantee.

We can use the confidence interval for a parameter to test a hypothesis about this parameter. Let CI be the confidence interval for βj at the 100·(1 − α)% confidence level. Then the decision rule

    Reject H0 if βj⁰ ∉ CI

is a test with significance level α for

    H0: βj = βj⁰  vs  H1: βj ≠ βj⁰

This test is identical to the one based on the t statistic.
Example 2 (cont.)
In Example 2, considering that t_{95,0.025} = 1.99, the 95% confidence interval for the elasticity of crime with respect to enrolment, β1, is

    [1.27 − 1.99 · 0.11, 1.27 + 1.99 · 0.11] = [1.05, 1.49]

We can use the CI to test

    H0: β1 = 1  vs  H1: β1 ≠ 1

Given that 1 ∉ [1.05, 1.49], we can reject H0 at 5% in favour of the two-sided alternative hypothesis.

Note that this test is not the same as the one we did before, because now the alternative hypothesis is two-sided. CIs can only be used to perform tests with a two-sided alternative hypothesis.
Let us now suppose that the dependent variable of the model is in logarithms:

    log(y) = β0 + β1x1 + ... + βkxk + u

We know that 100β1 measures the percentage effect on y of a change of one unit in x1, keeping the rest of the factors constant.

To calculate a confidence interval for the percentage effect on y of a change of one unit in x1, we simply have to multiply the endpoints of the confidence interval for β1 by 100. The reason is that, since se(100β̂1) = 100 se(β̂1), the confidence interval for 100β1 is

    [100β̂1 − t_{n−k−1,α/2} · 100 se(β̂1), 100β̂1 + t_{n−k−1,α/2} · 100 se(β̂1)]
    = [100(β̂1 − t_{n−k−1,α/2} se(β̂1)), 100(β̂1 + t_{n−k−1,α/2} se(β̂1))]
Testing a linear restriction. The t test.

We now see how to test any linear restriction on the parameters of the multiple regression model. To illustrate the general approach we will use a simple example.

Example 4

In the United States there are two types of higher education institutions: junior colleges (two-year post-secondary academic institutions) and universities (four-year colleges). Let us suppose we want to compare the returns to education, i.e., the effect of education on wages, in these two types of academic institutions. To do so we consider the following model

    log(wage) = β0 + β1 jc + β2 univ + β3 exper + u    (4)

where wage is the hourly salary, jc is the number of years attending a junior college, univ is the number of years attending a university, and exper is work experience.
Example 4 (cont.)
Using the sample of 6763 workers with at least a high school degree in the TWOYEAR file of Wooldridge's book, we estimated the model, obtaining the following results (standard errors in parentheses):

    log(wage)-hat = 1.472 + 0.0667 jc + 0.0769 univ + 0.0049 exper    (5)
                           (0.0068)    (0.0023)      (0.0002)

    n = 6763, R² = 0.222

Based on these results we have that:

Holding constant the number of years at a university and work experience, the return to one year at a junior college is 6.67%.

Holding constant the number of years at a junior college and work experience, the return to one year at a university is 7.69%.

Therefore, the return to a year at a junior college is about 1 percentage point (6.67 − 7.69 = −1.02) less than the return to a year at a university.
Example 4 (cont.)
The hypothesis of interest is whether the return to a year at a junior college is the same as the return to a year at a university. Mathematically, we can write this hypothesis as

    H0: β1 = β2

The alternative hypothesis of interest is one-sided: the return to a year at a junior college is less than the return to a year at a university. Mathematically, we can write the alternative hypothesis as

    H1: β1 < β2

Since the hypothesis of interest relates two parameters of the model, we cannot directly use the t statistics of the previous section. Conceptually, however, this test is similar to the tests in the previous section.
Example 4 (cont.)
If we write the null and alternative hypotheses as H0: β1 − β2 = 0 and H1: β1 − β2 < 0, and we follow the same reasoning as in the previous section, the t statistic for this test is

    t = (β̂1 − β̂2) / se(β̂1 − β̂2)

The question is: how do we calculate se(β̂1 − β̂2)?

Given that se(β̂1 − β̂2) = √var̂(β̂1 − β̂2), and the variance of a difference depends on the covariance, to calculate se(β̂1 − β̂2) we would need the estimated covariance between β̂1 and β̂2. The problem is that some statistical packages do not provide this covariance.

In this case, an easier way to obtain the test statistic is to reparametrise the model conveniently, so that the hypothesis of interest in the reparametrised model refers to a single parameter.
Example 4 (cont.)
To do so, we define θ = β1 − β2, and we write the model as a function of θ such that the hypothesis to be tested is

    H0: θ = 0  vs  H1: θ < 0

Since θ = β1 − β2, we can obtain β1 as a function of β2 and θ, β1 = θ + β2, and substituting into model (4) we have

    log(wage) = β0 + (θ + β2) jc + β2 univ + β3 exper + u
              = β0 + θ jc + β2 jc + β2 univ + β3 exper + u
              = β0 + θ jc + β2 (jc + univ) + β3 exper + u

If we define totcoll = jc + univ, we can write the model as

    log(wage) = β0 + θ jc + β2 totcoll + β3 exper + u    (6)

Note that model (6) is the same as model (4), just written differently.
Example 4 (cont.)
Using the same data, we have estimated the reparametrised model (model (6)), obtaining the following results (standard errors in parentheses):

    log(wage)-hat = 1.472 − 0.0102 jc + 0.0769 totcoll + 0.0049 exper    (7)
                           (0.0069)    (0.0023)         (0.0002)

    n = 6763, R² = 0.222

The test statistic is t = −0.0102/0.0069 = −1.48. The p-value is Prob(t6759 < −1.48) = 0.069, and therefore there is some evidence, although not very strong, against the null hypothesis.

Note that the estimates of the constant and of the coefficient of exper, as well as their standard errors, are the same as when we estimated the original model.
Example 4 (cont.)
Note also that the estimated coefficient of totcoll and its standard error coincide with the results obtained for univ in the original model.

The reason is that the estimated coefficient of totcoll in equation (7) measures the return to one more year at a university or junior college, holding constant the number of years at a junior college and work experience. Given that the number of years at a junior college is held constant, the estimated coefficient of totcoll in equation (7) actually measures the return to one more year at a university, and therefore measures exactly the same thing as the coefficient of univ in equation (5).
In general, we proceed in a similar manner for any linear restriction on the parameters of the model that we want to test. We first define an auxiliary parameter so that we can express the restriction as a function of that parameter. We then reparametrise the model and estimate the reparametrised model to perform the test. We will see several examples in the Problem Set.
Confidence interval for a linear combination of parameters

To calculate a confidence interval for a linear combination of parameters, we also define an auxiliary parameter and reparametrise the model as a function of this parameter, in the same way as for the linear restriction test.

Example 4 (cont.)

We now obtain a confidence interval for the difference between the return to one year at a junior college and the return to one year at a university. In model (4), we have seen that this difference in returns is 100(β1 − β2). Hence, again defining θ = β1 − β2, we have to calculate a confidence interval for 100θ.

Using the estimates of the model reparametrised as a function of θ (equation (7)), and bearing in mind that t_{6759,0.025} = 1.96, the 95% confidence interval for 100(β1 − β2) is

    [100(−0.0102 − 1.96 · 0.0069), 100(−0.0102 + 1.96 · 0.0069)] = [−2.37, 0.33]
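The endpoints of this interval are simple arithmetic on the reported estimates; a quick check in code:

```python
# 95% CI for 100*(beta1 - beta2) in Example 4, recomputed from the
# reparametrised estimate theta_hat = -0.0102, its standard error 0.0069,
# and the critical value 1.96 quoted on the slide.
theta_hat, se_theta, c = -0.0102, 0.0069, 1.96

lo = 100 * (theta_hat - c * se_theta)   # lower endpoint, percentage points
hi = 100 * (theta_hat + c * se_theta)   # upper endpoint, percentage points
print(f"[{lo:.2f}, {hi:.2f}]")
```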
Testing several linear restrictions. The F test.

In practice, we are often interested in jointly testing several hypotheses on the model parameters.

Example 5

Let us consider the model

    log(wage) = β0 + β1 educ + β2 exper + β3 expersq + u

where we want to test whether, holding years of education constant, wages depend on work experience. The partial effect of experience on the log of wages is

    ∂log(wage)/∂exper = β2 + 2β3 exper

Testing whether this effect is zero for any level of experience means testing H0: β2 = β3 = 0.

As in this example, we will begin with the case in which we want to test whether a subset of the variables is jointly significant.
Exclusion restriction tests

Let us consider the multiple regression model with k explanatory variables (in addition to the constant term)

    y = β0 + β1x1 + ... + βkxk + u    (8)

In this model we want to test whether q of these variables have zero coefficients. To simplify the notation, we assume that these q variables are the last ones in the list: x_{k−q+1}, ..., xk. The null hypothesis we want to test is

    H0: β_{k−q+1} = 0, ..., βk = 0

The alternative hypothesis is that the null hypothesis is false, meaning that at least one of the parameters appearing in the null hypothesis is different from zero.

To perform the test, we first impose the restrictions of the null hypothesis on model (8), obtaining

    y = β0 + β1x1 + ... + β_{k−q}x_{k−q} + u    (9)
Model (8) is called the unrestricted model, while model (9) is called the restricted model. The test is based on the following intuitive idea:

First we estimate the two models and calculate the sum of squared residuals (SSR) of each. Let us denote the SSR of the unrestricted model (8) by SSRnr, and the SSR of the restricted model (9) by SSRr.

If the variables x_{k−q+1}, ..., xk are important in explaining the dependent variable, we can expect the SSR to increase substantially when these variables are eliminated; that is, we should expect SSRr to be considerably larger than SSRnr. On the contrary, if these variables are not important in explaining the dependent variable, we can expect SSRr to be very similar to SSRnr (remember that the SSR cannot decrease when variables are eliminated, and therefore SSRr ≥ SSRnr).

The test statistic is based on the increase in the SSR once the variables x_{k−q+1}, ..., xk have been eliminated.
Specifically, the test statistic is:

    F = [(SSRr − SSRnr)/q] / [SSRnr/(n − k − 1)]    (10)

As mentioned above, SSRr ≥ SSRnr, hence the F statistic is always non-negative. It can be shown that

    F ~ F_{q,n−k−1} under H0

Moreover, as we have seen, if the null hypothesis is true, SSRr is expected to be very similar to SSRnr, whereas if the null hypothesis is false, SSRr is expected to be considerably larger than SSRnr. The decision rule for the test is:

    Reject H0 at level α if F > F_{q,n−k−1,α}
Example 5 (cont.)

Returning again to the example of wages as a quadratic function of work experience,

    log(wage) = β0 + β1 educ + β2 exper + β3 expersq + u    (11)

we have estimated this model using data from the file WAGE1 of Wooldridge's book and obtained the following results (standard errors in parentheses):

    log(wage)-hat = 0.128 + 0.090 educ + 0.041 exper − 0.00071 expersq
                           (0.0075)     (0.0052)      (0.00012)

    n = 526, SSR = 103.7904, R² = 0.30

We want to test whether work experience has an effect on wages. We have seen that the null hypothesis to test is

    H0: β2 = β3 = 0

If we impose the restrictions on model (11), we have the restricted model

    log(wage) = β0 + β1 educ + u    (12)
Example 5 (cont.)
and estimating this model we obtain (standard errors in parentheses)

    log(wage)-hat = 0.584 + 0.083 educ
                           (0.0076)

    n = 526, SSR = 120.7691, R² = 0.186

The value of the F statistic in this sample is

    F = [(120.7691 − 103.7904)/2] / [103.7904/522] = 42.69

Given that

    p-value = Prob(F_{2,522} > 42.69) ≈ 0

we can reject the null hypothesis at any reasonable significance level.
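The F statistic (10) and its p-value can be recomputed from the two reported SSRs; the sketch below assumes Python with scipy for the F tail probability:

```python
# F test of H0: beta2 = beta3 = 0 in Example 5, from the reported SSRs.
from scipy.stats import f

def f_stat(ssr_r, ssr_nr, q, df_nr):
    """Equation (10): [(SSR_r - SSR_nr)/q] / [SSR_nr/(n - k - 1)]."""
    return ((ssr_r - ssr_nr) / q) / (ssr_nr / df_nr)

F = f_stat(120.7691, 103.7904, q=2, df_nr=522)   # df_nr = 526 - 3 - 1
p = f.sf(F, 2, 522)                              # upper-tail probability
print(f"F = {F:.2f}, p-value = {p:.2e}")
```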
Testing the global significance of the regression

In this test, the null hypothesis is that none of the explanatory variables of the model affects the dependent variable, that is,

    H0: β1 = β2 = ... = βk = 0

This test is a particular case of the exclusion restriction test we have just seen. To perform it we have to consider the restricted model, which in this case contains only a constant:

    y = β0 + u

In this case the explained sum of squares (SSE) of the restricted model is zero, and therefore SSRr = SST. Substituting into the general expression of the F statistic, for this test we have

    F = [(SST − SSR)/k] / [SSR/(n − k − 1)]
Dividing the numerator and the denominator by SST, and recalling that R² = 1 − SSR/SST, we can write the F statistic for the global significance test of the regression as a function of the R-squared:

    F = [(1 − SSR/SST)/k] / [(SSR/SST)/(n − k − 1)] = [R²/k] / [(1 − R²)/(n − k − 1)]

Most statistical packages, including Gretl, report the value of the F statistic for the global significance test in the regression output table.
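The algebraic equivalence of the two forms can be checked numerically. The sketch below uses the (rounded) figures of the unrestricted model of Example 5, backing out SST from SSR and R², so the two expressions must agree by construction:

```python
# Global-significance F statistic: SSR/SST form vs R-squared form.
n, k = 526, 3
r2 = 0.30          # rounded R2 reported on the slide
ssr = 103.7904
sst = ssr / (1 - r2)   # since R2 = 1 - SSR/SST

f_from_ssr = ((sst - ssr) / k) / (ssr / (n - k - 1))
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
print(f_from_ssr, f_from_r2)   # the two forms coincide
```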
Testing general linear restrictions

Exclusion restriction tests are the most frequent tests performed in empirical analyses. However, in some cases we may be interested in simultaneously testing several restrictions which are not simply that several coefficients are zero.

The methodology for this type of test is identical to the case of the exclusion restrictions. First we impose the restrictions on the model to define the restricted model. We then estimate both models (the unrestricted and the restricted) and calculate the F statistic as in the case of the exclusion restrictions. We will illustrate these tests with an example.
Example 6

Let us consider the model

    log(price) = β0 + β1 log(assess) + β2 log(lotsize) + β3 log(sqrft) + β4 bdrms + u

where price is the selling price of the house, assess is the estimated value of the house before being sold, lotsize is the size of the plot, sqrft is the surface area of the house, and bdrms is the number of rooms.

Suppose we want to test whether the estimated value of the house is rational or not. If it is, then a 1% increase in the estimated value of the house should result in a 1% increase in the price, i.e., β1 = 1. Moreover, after taking the assessment into account, the remaining variables should not help to explain the price, i.e., β2 = β3 = β4 = 0. Thus the null hypothesis to test is

    H0: β1 = 1, β2 = β3 = β4 = 0

and the restricted model is

    log(price) − log(assess) = log(price/assess) = β0 + u
Example 6 (cont.)
Using data from the file HPRICE1 of Wooldridge's book, the model has been estimated, obtaining the following results (standard errors in parentheses):

    log(price)-hat = 0.264 + 1.043 log(assess) + 0.0074 log(lotsize) − 0.1032 log(sqrft) + 0.0338 bdrms
                            (0.151)             (0.0386)              (0.1384)            (0.0221)

    n = 88, SSR = 1.822, R² = 0.773

As for the restricted model, we generate the variable log(price/assess) and estimate the regression of this variable on a constant (not including any explanatory variable). The value obtained for SSRr is 1.880.

The test statistic is

    F = [(1.880 − 1.822)/4] / [1.822/83] = 0.66

and since the p-value = Prob(F_{4,83} > 0.66) = 0.6215, we cannot reject the null hypothesis at any reasonable significance level: there is no evidence against the hypothesis that the estimated value of the house is rational.
Relationship between the t and F statistics

So far we have seen how to use the t statistic for testing a single linear restriction and the F statistic for testing several linear restrictions. However, there is no reason why we cannot also use the F statistic for testing a single linear restriction, and the question that arises is: what is the relationship between these two statistics when testing a single restriction?

The answer is that it can be shown that F = t², and that the p-value of the two-sided test based on the t statistic is the same as the p-value based on the F statistic; hence the conclusion with either method will be identical.
Example 4 (cont.)

Let us consider the model again
log(wage) = β0 + β1 jc + β2 univ + β3 exper + u (13)
where wage is salary per hour, jc is the number of years at a junior college, univ is the number of years at a university and exper is years of work experience. We want to test
H0 : β1 = β2
If we impose this restriction we have that the restricted model is
log(wage) = β0 + β2 jc + β2 univ + β3 exper + u
          = β0 + β2 (jc + univ) + β3 exper + u
and defining as above totcoll = jc + univ, we can write the restricted model as
log(wage) = β0 + β2 totcoll + β3 exper + u (14)
Example 4 (cont.)
When we estimate models (13) and (14) we obtain the following results:

log(wage)-hat = 1.472 + 0.0667 jc + 0.0769 univ + 0.0049 exper
                       (0.0068)    (0.0023)       (0.0002)

n = 6763, SSR = 1250.54352, R² = 0.2224

log(wage)-hat = 1.472 + 0.0762 totcoll + 0.0049 exper
                       (0.0023)          (0.0002)

n = 6763, SSR = 1250.94205, R² = 0.2222
The value of the F statistic in the sample is

F = (1250.94205 − 1250.54352) / (1250.54352/6759) = 2.15
The value that we obtained for the t statistic was −1.48 and (−1.48)² = 2.19, which is slightly different from the value of F due to rounding error.
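As a quick check of F = t², the Example 4 numbers can be recomputed in a short sketch (values taken from the slide; q = 1 restriction, so n − k − 1 = 6763 − 3 − 1 = 6759):

```python
# F from the SSRs of the unrestricted (13) and restricted (14) models.
ssr_r, ssr_nr = 1250.94205, 1250.54352
df = 6763 - 3 - 1                      # n - k - 1 for the unrestricted model
F = (ssr_r - ssr_nr) / (ssr_nr / df)   # q = 1, so no division by q needed
t = -1.48                              # t statistic reported on the slide
print(round(F, 2), round(t**2, 2))     # 2.15 2.19 -- close up to rounding error
```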
Example 4 (cont.)
If the alternative hypothesis is two-sided, we can directly use the F statistic to perform the test. When the alternative hypothesis is one-sided, we have to use the t statistic. The reason is that the F statistic is always positive and therefore does not allow us to distinguish between the alternative hypothesis β1 < β2 and the alternative hypothesis β1 > β2.

Once we have calculated the F statistic as F = t², we have that t can be equal to √F or −√F. To determine whether t has a positive or negative sign, we use that

t = (β̂1 − β̂2) / se(β̂1 − β̂2)

and therefore the sign of t coincides with the sign of β̂1 − β̂2. In this manner, once F is calculated, we have that t = sign(β̂1 − β̂2)·√F.

In this example, given that β̂1 − β̂2 < 0, t = −√F = −√2.15 = −1.47, which is slightly different from the value we obtained previously for t due to rounding error.
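The sign-recovery rule can be sketched as a small helper (a minimal illustration; the function name is my own, and the inputs are the slide's Example 4 values):

```python
import math

def t_from_F(F, beta_diff):
    """t statistic implied by F for a single restriction:
    t = sign(b1_hat - b2_hat) * sqrt(F)."""
    sign = 1.0 if beta_diff >= 0 else -1.0
    return sign * math.sqrt(F)

# Example 4: b1_hat - b2_hat = 0.0667 - 0.0769 < 0, F = 2.15
print(round(t_from_F(2.15, 0.0667 - 0.0769), 2))  # -1.47
```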
The F statistic as a function of R2
We will now see that in many applications we can also calculate the F statistic from the R². Let us consider the expression of the F statistic based on the sum of squared residuals

F = [(SSRr − SSRnr)/q] / [SSRnr/(n − k − 1)]

Dividing the numerator and the denominator by the total sum of squares of the unrestricted model, SSTnr, we have

F = [(SSRr/SSTnr − SSRnr/SSTnr)/q] / [(SSRnr/SSTnr)/(n − k − 1)]
Given the definition of R-squared, if the dependent variable of the restricted model is the same as that of the unrestricted model, we have that SSTr = SSTnr, and we can therefore write the F statistic as a function of the R-squared

F = [((1 − R²r) − (1 − R²nr))/q] / [(1 − R²nr)/(n − k − 1)]
  = [(R²nr − R²r)/q] / [(1 − R²nr)/(n − k − 1)]
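The R²-based formula can be wrapped in a small helper (a minimal sketch; the function name is my own, and the sample values are those of Example 5 below: R²nr = 0.300, R²r = 0.186, q = 2, n − k − 1 = 522):

```python
def f_from_r2(r2_nr, r2_r, q, df):
    """F statistic from the R-squared of the unrestricted and restricted
    models; df is n - k - 1 from the unrestricted model. Valid only when
    both models share the same dependent variable."""
    return ((r2_nr - r2_r) / q) / ((1 - r2_nr) / df)

print(round(f_from_r2(0.300, 0.186, 2, 522), 1))  # 42.5
```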
Example 5 (cont.)

Using the results from Example 5 we can re-calculate the F statistic to check that we obtain the same result as when we calculated it using the SSR

F = [(0.300 − 0.186)/2] / [(1 − 0.300)/522] = 42.5
The number is not exactly the same as the one we obtained previously due to rounding error.

Test statistics and measurement units

When we make a change in the measurement units of one or more of the variables in the model, the test statistics t and F do not change.
Inference with large samples
When we do not know the distribution of the errors, we can also make inference provided the sample size is large. Using the central limit theorem, it can be shown that even if the errors are not normal, if the sample size is large, the test statistic t is approximately distributed as N(0, 1) under the null hypothesis. Moreover, in the case of the F test of q linear restrictions, it can also be shown that, although the errors are not normal, if the sample size is large, the F statistic is approximately distributed as Fq,n−k−1 under the null hypothesis. In both tests the critical region is defined as in the tests under normality, except that in the case of the t statistic we now have to use the critical values of the N(0, 1) instead of the critical values of the t distribution.
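As an illustration of the large-sample approximation, the two-sided p-value of a t statistic can be computed from the standard normal CDF, which the standard library exposes through math.erf (a minimal sketch, not tied to any particular dataset):

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_pvalue(t):
    """Large-sample two-sided p-value: 2 * (1 - Phi(|t|))."""
    return 2.0 * (1.0 - normal_cdf(abs(t)))

# The familiar 5% two-sided critical value of the N(0,1) is 1.96:
print(round(two_sided_pvalue(1.96), 3))  # 0.05
```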