SOME TESTS FOR MODEL SELECTION AND MODEL VALIDATION IN REGRESSION ANALYSIS
Peter Donners, Paul Knottnerus, Arie Oudshoorn, Kees van de Sande and Wil Velding
Netherlands Postal and Telecommunications Services, Headquarters
1. Introduction
In econometrics, as in many other disciplines, many important contributions have recently been made to the literature. These contributions can be considered significant innovations for the corresponding branches of science. In the econometric area the main reason for such innovation is certainly the recent economic crisis and recession, which followed a long period of economic growth. The numerous classical econometric models fitted the situation of economic growth rather well but failed to represent the later situation of economic decline.
In the classical econometric approach much attention is paid to the disturbances of a
regression model rather than to the exogenous variables.
An alternative approach is given by Hendry and Richard (1982,1983). In their strategy for
constructing an econometric model two steps can be distinguished. The first step consists of
model selection, i.e., the construction of an empirical model that adequately represents the
relevant data. The model has to satisfy some design criteria. In this stage much attention is
paid to the structure of the regression equation. In particular, the dynamic properties
play an important role. Five tests, which we briefly describe in this paper, deal with model selection. Given a single equation regression model, the test statistics indicate to what extent the design criteria are satisfied. If at least one test statistic has a significant value then the model is rejected.² For this reason the tests are called (mis)specification tests. The five tests for model selection are:
- the Goldfeld-Quandt test for heteroskedasticity;
- the Lagrange Multiplier (LM) test for autocorrelation;
- the Chow test for a structural break;
- the Wu-Hausman test for orthogonality of the regressors and the disturbances;
- Ramsey's RESET test for functional misspecification and/or omitted variables.
Finally, the second step of the approach of Hendry and Richard consists of model validation: the validity of the model is investigated. In order to establish the validity, the model under consideration is applied to new data and/or compared to other competing models. The most convenient test for model validation, which we will also describe in this paper, is the so-called Chow predictive failure test.
The plan of the paper is as follows. In section 2 five tests for model selection and one test
for model validation are briefly described. In section 3 the tests are applied to an example
from the econometric literature. In the appendix more detailed information is given about the
SAS programs to be used for applying the six tests in the environment of 'PROC REG'. The
information about the SAS programs in the framework of 'PROC AUTOREG' is available upon
request. It should be noted that the tests are only appropriate for a single equation
regression model. Finally, some conclusions are formulated in section 4.
2. Tests for model selection and validation
In the first five subsections the tests for model selection are described and in the last
subsection a test for model validation, the Chow predictive failure test, is described.
2.1. The Goldfeld-Quandt test for heteroskedasticity
2.1.1. Description
The Goldfeld-Quandt test (see Goldfeld and Quandt (1965,1972)) is meant to detect
heteroskedasticity among the disturbances of a regression model of time series. For that
purpose the observations are ordered by the values of one of the explaining variables and
divided into two subperiods with the same number of observations. It is possible to omit a
certain number, say r, of observations between the two subperiods. This yields:

                                                                number of observations
    subperiod 1                                                       (T - r)/2
    omitted observations between subperiod 1 and subperiod 2              r
    subperiod 2                                                       (T - r)/2
where T is the total number of observations. Now for both subperiods the regression model is
estimated by ordinary least squares (OLS). If the estimation results differ too much, then the
null hypothesis of homoskedasticity is rejected.
2.1.2. The test statistic
Denote the regression model by

    y = Xβ + ε ,

where y and ε are vectors of length T, X is a T × k matrix, and β is a vector of length k. Then we have, in terms of covariance matrices,

    H0 : E εε' = σ² I_T    no heteroskedasticity among the disturbances
    H1 : E εε' = σ² D_T    heteroskedasticity among the disturbances,

where I_T is the identity matrix of order T, D_T is an arbitrary T × T diagonal matrix, and σ² denotes the variance of the disturbances ε. If r is the number of omitted observations, then the test statistic of the Goldfeld-Quandt test is defined by
    F = [ SSR1 / ((T - r)/2 - k) ] / [ SSR2 / ((T - r)/2 - k) ] = SSR1 / SSR2 ,

where SSR1 and SSR2 denote the sum of squared residuals from the OLS estimation of the regression model during subperiod 1 and subperiod 2, respectively. The test statistic F has the well known F distribution; the degrees of freedom n and m of numerator and denominator, respectively, are n = m = ½(T - r - 2k).
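As an arithmetic illustration (ours, not part of the paper's SAS programs), the Goldfeld-Quandt statistic can be computed directly from the two subperiod sums of squared residuals; the SSR values below are taken from table 1 in section 3:

```python
def goldfeld_quandt_f(ssr1, ssr2, T, r, k):
    """Goldfeld-Quandt F statistic: ratio of the per-subperiod mean
    squared residuals; the degrees of freedom (T - r)/2 - k are equal
    for both subperiods, so they cancel in the ratio."""
    df = (T - r) / 2 - k
    return (ssr1 / df) / (ssr2 / df)

# SSR values from the Koutsoyiannis example (table 1): T = 20, r = 0, k = 2
f = goldfeld_quandt_f(83033.59535888, 466093.22959398, T=20, r=0, k=2)
print(round(f, 5))  # 0.17815, matching 'F FOR HO' in table 1
```

Because the two subperiods have equal degrees of freedom, the statistic reduces to the plain ratio SSR1/SSR2.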
2.2. The Lagrange Multiplier (LM) test for autocorrelation
2.2.1. Description
The LM test can be applied in many ways. For instance, Breusch (1978) and Pagan (1974) apply this test in the context of nonlinear models. However, we restrict ourselves to the simple case of the linear regression model, in matrix notation y = Xβ + ε. The LM test consists of an extension of the regression model by adding lagged residuals from the OLS estimation of the original regression model. If the extended regression model yields a sum of squared residuals which is significantly lower, we may conclude that there is autocorrelation among the disturbances of the regression model under consideration.
2.2.2. The LM test statistic
Consider the extended regression model

    y = Xβ + Zγ + ε ,

where Z is the T × h matrix of lagged residuals from the regression y = Xβ + ε:

    Z = ( ê₋₁  ê₋₂  ...  ê₋ₕ ) ,

ê₋ⱼ denoting the vector of OLS residuals lagged j periods, with zeros in the first j positions where no lagged residual is available. Now the H0 and H1 hypotheses can be formulated as

    H0 : γ = 0    no autocorrelation of order p, 1 ≤ p ≤ h
    H1 : γ ≠ 0    autocorrelation of order p for some p, 1 ≤ p ≤ h.
Let SSR denote the sum of squared residuals from the OLS estimation of the regression model y = Xβ + ε and let ESSR denote the sum of squared residuals from the OLS estimation of the extended regression model y = Xβ + Zγ + ε. Then the LM test statistic is defined by

    F = [ (SSR - ESSR)/h ] / [ ESSR/(T - k - h) ] ,

which follows an F distribution, the degrees of freedom of numerator and denominator being equal to h and T - k - h, respectively.
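This nested-model F form, (SSR - ESSR)/q divided by ESSR/df, recurs in all the remaining tests of this paper. As a quick sketch of ours (not part of the paper's SAS code), the statistic can be reproduced from the sums of squared residuals; the numbers below are the h = 1 LM results quoted in table 2 of section 3:

```python
def nested_f(ssr, essr, q, df):
    """F statistic comparing a restricted model (SSR) with an extended
    model (ESSR); q restrictions, df residual degrees of freedom of the
    extended model."""
    return ((ssr - essr) / q) / (essr / df)

# h = 1 lagged residual, T = 20, k = 2  ->  q = 1, df = 20 - 2 - 1 = 17
f = nested_f(573069.23719818, 452069.54101083, q=1, df=17)
print(round(f, 5))  # 4.55017, matching table 2
```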
2.3. The Chow test for a structural break
2.3.1. Description
It may occur that regression parameters change significantly owing to a structural break in the data. Examples of such structural breaks are a war, a strike, a revolution, or an oil crisis. The parameters to be estimated may differ substantially before and after such an event. Before applying the Chow test (see Chow (1960)) the period of T observations is divided into two subperiods. Then two OLS estimations are performed. First, the original regression model is estimated by OLS using the data of the whole period. Second, the original regression model is extended by adding k variables which are zero during the first subperiod and are equal to the k regressors from the original regression model during the second subperiod. Finally, the sums of squared residuals from both OLS estimations are compared with each other in order to verify whether the regression parameters are affected by the structural break or not. In fact, this gives the same result as comparing with the sums of squared residuals obtained from two separate regressions for the two subperiods.
2.3.2. The Chow test statistic
The test period is divided into two subperiods and the extended model can be written as

    subperiod 1 : observations 1 to T1
    subperiod 2 : observations T1 + 1 to T .

Then the hypotheses H0 and H1 can be formulated as

    H0 : no structural break after observation T1
    H1 : a structural break after observation T1 .
In order to calculate the Chow test statistic the extended regression model is alternatively written as

    y = Xβ₁ + Zγ + ε ,

where Z contains the k added variables and γ = β₂ - β₁. The null hypothesis H0 is γ = 0 against the hypothesis H1 that γ ≠ 0.
The Chow test statistic is defined by

    F = [ (SSR - ESSR)/k ] / [ ESSR/(T - 2k) ] ,

which follows an F distribution, the degrees of freedom of numerator and denominator being equal to k and T - 2k, respectively.
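To make the construction of the k added variables concrete, here is a small sketch of ours (a toy series, not the paper's data) showing how the second-subperiod copies of an intercept and one regressor would be built:

```python
# toy example: T = 6 observations, possible break after T1 = 3,
# regressors: intercept and one explanatory variable x
x = [2.0, 3.0, 5.0, 4.0, 6.0, 7.0]
T1 = 3

# added variables: zero in subperiod 1, copies of the regressors in subperiod 2
zet1 = [0.0 if t < T1 else 1.0 for t in range(len(x))]   # shifts the intercept
zet2 = [0.0 if t < T1 else x[t] for t in range(len(x))]  # shifts the slope

print(zet1)  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(zet2)  # [0.0, 0.0, 0.0, 4.0, 6.0, 7.0]
```

The estimated coefficients of such added variables measure the shift in intercept and slope between the two subperiods, which is how the ZET1 and ZET2 columns in section 3 are read.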
2.4. The Wu-Hausman test for orthogonality of the regressors and the disturbances
2.4.1. Description
One of the assumptions of the standard regression model y = Xβ + ε (see section 2.1.2. for the symbols used) is that the set of explaining variables X and the disturbances ε are independently distributed, or

(1)    cov(X_it, ε_t) = 0 .

Violation of this assumption will generally lead to biased and inconsistent OLS estimates. If cov(X_it, ε_t) ≠ 0 we call the regressors jointly dependent. On the other hand, if a regression model satisfies the classical assumption (1) the regressors are predetermined variables.
The Wu-Hausman test (see Hausman (1978)) is meant to check the orthogonality of (a part of) the regressors and the disturbances ε. In order to apply this test we must have available a set of instrumental variables W which are distinct from the set of regressors X and are independent of the disturbances ε. Next we regress the original variables X on W and we add the resulting residuals Z to the original model y = Xβ + ε. If the coefficients of Z are significantly different from zero, (a part of) the regressors X are correlated with ε and hence it may be concluded that some of the regressors are jointly dependent. As a matter of fact it makes no sense to test all regressors for orthogonality, in particular if some of them are lagged variables and therefore probably predetermined. The problem remains manageable by splitting up the set of regressors into two parts X1 and X2. The orthogonality of X1 and the disturbances ε is checked by applying the Wu-Hausman test while X2 is supposed to be uncorrelated with the disturbances ε.
2.4.2. The Wu-Hausman test statistic
The matrix X of explaining variables can be partitioned as X = (X1 X2), where X1 is a T × k1 matrix of explaining variables of which the orthogonality with respect to the disturbances ε is examined. T is the number of observations and k1 is the number of explaining variables under consideration. X2 is a T × (k - k1) matrix of explaining variables which we do not consider in the test for orthogonality. W is a T × k2 matrix of instrumental variables and k2 > k1. After regressing X1 on W we get the T × k1 residual matrix Z, with Z = X1 - W(W'W)⁻¹W'X1.
Next the regression model y = Xβ + ε is extended by adding Z. This yields

    y = Xβ + Zγ + ε .

Now it is easy to see that the set of explaining variables X1 is orthogonal to the disturbances ε if γ = 0. Thus the H0 and H1 hypotheses can be formulated as follows

    H0 : γ = 0    the explaining variables X1 are orthogonal to the disturbances ε
    H1 : γ ≠ 0    the explaining variables X1 and the disturbances ε are not orthogonal
                  and some of the regressors are jointly dependent.
The Wu-Hausman test statistic is defined by

    F = [ (SSR - ESSR)/k1 ] / [ ESSR/(T - k - k1) ] ,

which follows an F distribution, the degrees of freedom of numerator and denominator being equal to k1 and T - k - k1, respectively.
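The first stage of this procedure, regressing the suspect regressor on the instruments and keeping the residuals, can be sketched as follows for the simplest case of one regressor and one instrument (closed-form simple OLS; the data are made up for illustration and are not the paper's):

```python
def ols_residuals(y, x):
    """Residuals from an OLS regression of y on x with an intercept
    (closed-form solution for a single regressor)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# hypothetical suspect regressor x1 and instrument w
x1 = [2.0, 3.0, 5.0, 4.0, 6.0]
w = [1.0, 2.0, 3.0, 4.0, 5.0]
z = ols_residuals(x1, w)

# by construction the residuals are orthogonal to the instrument;
# in the second stage z is added to y = Xb + e and its coefficient F-tested
print(abs(sum(zi * wi for zi, wi in zip(z, w))) < 1e-9)  # True
```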
2.5. The Ramsey RESET test for functional misspecification and/or omitted variables
2.5.1. Description
A functional misspecification of a regression equation may lead to heteroskedasticity of the disturbances, while the incorrect omission of explaining variables may cause autocorrelation and biased estimates.
Ramsey (1969) developed a test to detect such specification errors. Ramsey's specification error test is based on adding test variables to the initial regression model y = Xβ + ε and on testing whether the corresponding coefficients are significant or not. Only one variant of Ramsey's test is examined here, namely the adding of powers of estimates of the dependent variable y to the original model.
2.5.2. The Ramsey test statistic
We have y = Xβ + ε and ŷ = Xb, where b is the OLS estimator of β and ŷ is the vector of estimated values of y. Next we raise the elements of ŷ to the powers 2, 3, ..., h+1 and compose a new T × h matrix Z of explaining variables with

    Z = ( ŷ²  ŷ³  ...  ŷ^(h+1) ) ,

where ŷ^j denotes the vector whose elements are equal to the jth power of the elements of ŷ. Then we add Z to the initial regression model and we get the extended regression model

    y = Xβ + Zγ + ε .

H0 and H1 can be formulated as follows

    H0 : γ = 0, i.e., a correct functional specification and no omitted variables
    H1 : γ ≠ 0, i.e., there are specification errors.
The Ramsey test statistic is defined by
(SSR - ESSR) /h F
ESSR/(T - k - h)
F follows a F distribution, the degrees of freedom of numerator and denominator being equal
to hand T - k - h, respectively.
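Constructing the matrix Z of test variables is straightforward; this small sketch of ours builds the columns ŷ², ..., ŷ^(h+1) from a vector of fitted values:

```python
def reset_regressors(yhat, h):
    """Columns of Z for Ramsey's RESET: powers 2 .. h+1 of the fitted values."""
    return [[v ** (j + 2) for v in yhat] for j in range(h)]

z = reset_regressors([1.0, 2.0, 3.0], h=2)
print(z)  # [[1.0, 4.0, 9.0], [1.0, 8.0, 27.0]]
```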
2.6. The Chow predictive failure test
2.6.1. Description
After the model selection has been finished we have to carry out the model validation. One of the tests most often used for model validation is the Chow predictive failure test.
The basic principle of this test is to divide the sample period into two subperiods. On the
basis of data of the first subperiod the model is estimated and with the help of the
estimated model predictions are generated for the second subperiod. The test investigates
to what extent the predictions in the second part of the sample period fit the realizations.
Technically the model under consideration is estimated on the basis of data of the whole
period where dummy variables are added to the initial model for each observation in the
second subperiod. The values of the estimated coefficients of the dummy variables and their corresponding t-values give an indication to what extent the predictions and realizations of the dependent variable differ.
2.6.2. The test statistic
As said above we extend the regression model y = Xβ + ε to y = Xβ + Zγ + ε, Z being a T × h matrix

    Z = | 0   |
        | I_h | ,

where 0 is a (T - h) × h matrix of zeros and I_h is the identity matrix of order h. Now the hypotheses are

    H0 : γ = 0, i.e., the model is valid and the predictions are close to the realizations
    H1 : γ ≠ 0, i.e., we have to reject the model because it fails to predict the
                dependent variable reasonably well.
The Chow predictive failure test statistic is defined by

    F = [ (SSR - ESSR)/h ] / [ ESSR/(T - k - h) ] .

F follows an F distribution, the degrees of freedom of numerator and denominator being equal to h and T - k - h, respectively.
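The matrix Z = (0' I_h)' amounts to one dummy variable per forecast observation; a minimal sketch of ours:

```python
def failure_dummies(T, h):
    """T x h dummy matrix for the Chow predictive failure test:
    zeros for the first T - h rows, the identity for the last h rows."""
    return [[1.0 if t == T - h + j else 0.0 for j in range(h)]
            for t in range(T)]

print(failure_dummies(5, 2))
# [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```

Each dummy absorbs the residual of one forecast-period observation, so its estimated coefficient equals the prediction error for that observation.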
3. An example
In this section all tests discussed in section 2 are applied to an example taken from the econometric literature (see Koutsoyiannis (1976)). In this example a single equation regression model is built, with the imports of the UK as the dependent variable (y_t) and the gross national product of the UK (GNP) as independent variable (x_t). Also an intercept term is included. So, the following import function is established

    y_t = β₁ + β₂ x_t + ε_t ,

where ε_t denotes the disturbance term (t = 1, 2, ..., T).
This section consists of two subsections: the description of the SAS program which carries
out the actual testing and a discussion of the test results tabulated by the SAS procedure
'PROC TABULATE'.
3.1. The description of the SAS program
The SAS program which will carry out the testing looks as follows:

DATA KOUTS; INPUT ZT XT @@;
RETAIN YEAR 1949; YEAR=YEAR+1; LABEL ZT=IMPORTS XT=GROSS NATIONAL PRODUCT YEAR=YEAR; CARDS;
3748 21117 4010 22418 3111 l230a 4004 23319 4151 24180 45&9 2489J 45sfz 25310 4697 2519q 4153 25881> 5(11)2 21>86R 5669 28134 5"2~ 290'll 5131> 29450 5946 30105 1>501 32312 6549 33152 6105 33764 1104 34411 7609 35429 HI00 36200
DATA INST; SET KOUTS; XT1=LAG1(XT); XT2=LAG2(XT); KEEP XT1 XT2;
PROC REG DATA=KOUTS; MODEL ZT=XT / DW;
%INCLUDE TESTS;
%GOLDFELD(KOUTS,ZT,XT,2,10,20); %LMTEST(KOUTS,ZT,XT,2,20,1); %LMTEST(KOUTS,ZT,XT,2,20,2);
%LMTEST(KOUTS,ZT,XT,2,20,3); %CHOWSTR(KOUTS,ZT,XT,2,20,8);
%HAUSMAN(KOUTS,ZT,XT,2,20,XT,1,INST,2); %RESET(KOUTS,ZT,XT,2,20,2); %CHOWFORE(KOUTS,ZT,XT,2,20,2);
Three parts can be distinguished: the datasteps, the separate procedure step ('PROC REG' or
'PROC AUTOREG') and the call commands of the macros of the test procedures.
The first datastep transforms the raw data from the Koutsoyiannis' example to a SAS dataset
"KOUTS" which will be the input dataset of the regression. Imports and GNP data are read
by list input mode. The observation period is 1950-1969. In the second datastep a SAS
dataset, named "INST", is created. This dataset contains the regression data of the
instrumental variables used by the Wu-Hausman test. In this particular case there are only
two instrumental variables, being the one and the two years lagged GNP series, XTl and XT2,
respectively.
In the second part of the SAS program 'PROC (AUTO)REG' is invoked in order to estimate the model. Finally, in the last part of the program all six tests are applied to the model (the LM test three times). The '%INCLUDE' statement provides for the inclusion of a set of SAS statements from an external file, named "TESTS". This latter file contains all macro functions of the test procedures.
Both the statements defining the macro functions and the call commands of the macros of the
test procedures are presented in the appendix.
3.2. SAS output and testing results
In this section the SAS output and the test results are briefly discussed per individual
test.
3.2.1. The Goldfeld-Quandt test
As a result of the call command %GOLDFELD(KOUTS,ZT,XT,2,10,20), the following table is printed:
TABLE 1  GOLDFELD & QUANDT TEST
RESULTS OF TESTING

                                     RESULTS OF ESTIMATION
                                  FIRST PERIOD    SECOND PERIOD
INTERCEPT                          -1958.47230      -2885.09660
GROSS NATIONAL PRODUCT                 0.25899          0.29252
SSR                             83033.59535888  466093.22959398
F FOR HO                               0.17815
PROB > F(8,8)                          0.98754
FIRST OBSERVATION                            1               11
LAST OBSERVATION                            10               20
The test period is divided into two subperiods of ten years. Per subperiod the estimated regression coefficients are presented as well as the sum of squared residuals. It can be seen from table 1 that P{ F(8,8) ≥ 0.18 } = 0.988. Hence, the null hypothesis of homoskedasticity is rejected if we choose the significance level α = 0.05.³
3.2.2. The LM test
First, the LM test is applied to check for first order autocorrelation (h=1) by using the call command %LMTEST(KOUTS,ZT,XT,2,20,1). The SAS output of this macro is tabulated in table 2.
TABLE 2  LM - TEST
RESULTS OF TESTING

                                     RESULTS OF ESTIMATION
                               WITHOUT RESIDUALS   WITH RESIDUALS
INTERCEPT                          -2461.37543      -2615.84127
GROSS NATIONAL PRODUCT                 0.27952          0.28546
RESID1                                                  0.60784
SSR                            573069.23719818  452069.54101083
F FOR HO                               4.55017
PROB > F(1,17)                         0.04779
In case of a chosen significance level α = 0.05 the null hypothesis is rejected, since P{ F(1,17) ≥ 4.55 } = 0.048.
Second, the regression model is checked for second order autocorrelation (h=2) by the call command %LMTEST(KOUTS,ZT,XT,2,20,2). The SAS output of this macro is presented in table 3.
TABLE 3  LM - TEST
RESULTS OF TESTING

                                     RESULTS OF ESTIMATION
                               WITHOUT RESIDUALS   WITH RESIDUALS
INTERCEPT                          -2461.37543      -2563.32629
GROSS NATIONAL PRODUCT                 0.27952          0.28346
RESID1                                                  0.63234
RESID2                                                 -0.15396
SSR                            573069.23719818  446531.38008563
F FOR HO                               2.26704
PROB > F(2,16)                         0.13588
Now, surprisingly, the null hypothesis of no second order autocorrelation is not rejected, since P{ F(2,16) ≥ 2.267 } = 0.136 > 0.05.
Third, we consider the case where h=3 by invoking the call command %LMTEST(KOUTS,ZT,XT,2,20,3). The SAS output of this macro is presented in table 4.
TABLE 4  LM - TEST
RESULTS OF TESTING

                                     RESULTS OF ESTIMATION
                               WITHOUT RESIDUALS   WITH RESIDUALS
INTERCEPT                          -2461.37543      -2318.41333
GROSS NATIONAL PRODUCT                 0.27952          0.27416
RESID1                                                  0.50878
RESID2                                                 -0.04531
RESID3                                                 -0.67305
SSR                            573069.23719818  342227.19570541
F FOR HO                               3.37264
PROB > F(3,15)                         0.04658
And as P{ F(3,15) ≥ 3.373 } = 0.047 ≤ 0.05, the value of the LM test statistic in this case is significant and the null hypothesis of no autocorrelation of order p, 1 ≤ p ≤ 3, is rejected.
In order to discriminate between AR(1) disturbances and AR(2) disturbances we consider the following test statistic derived from tables 2 and 3:

    F(1,16) = [ (452069 - 446531)/1 ] / [ 446531/16 ] = 0.20 ,

where the degrees of freedom of numerator and denominator are equal to 1 and 16, respectively. From the table of the F distribution it can be seen that the value of the test statistic is insignificant ( F_0.05(1,16) = 4.49 ).
Finally, in order to discriminate between AR(2) and AR(3) disturbances we combine tables 3 and 4 and consider the test statistic

    F(1,15) = [ (446531 - 342227)/1 ] / [ 342227/15 ] = 4.57 .

It emerges that this value is significant ( F_0.05(1,15) = 4.54 ). This indicates that there is a third order autocorrelation structure among the disturbances,
while Koutsoyiannis only considers the first order autocorrelation. Note that 'PROC AUTOREG'
can be used in case of autocorrelation.
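The two incremental F values used above follow directly from the tabulated SSRs; as a check (our arithmetic, using the rounded SSR values quoted in the text):

```python
def incremental_f(essr_small, essr_large, q, df):
    """F statistic comparing two nested extended models via their SSRs."""
    return ((essr_small - essr_large) / q) / (essr_large / df)

f12 = incremental_f(452069, 446531, q=1, df=16)  # AR(1) vs AR(2), tables 2 and 3
f23 = incremental_f(446531, 342227, q=1, df=15)  # AR(2) vs AR(3), tables 3 and 4
print(round(f12, 2), round(f23, 2))  # 0.2 4.57
```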
3.2.3. The Chow test on a structural break
A structural break after the eighth observation (1957) is checked. The call command is %CHOWSTR(KOUTS,ZT,XT,2,20,8). The SAS output of this macro is tabulated in table 5.
TABLE 5  CHOW TEST ON STRUCTURAL BREAK
RESULTS OF TESTING

                                     RESULTS OF ESTIMATION
                                   BASIC MODEL   EXTENDED MODEL
INTERCEPT                          -2461.37543      -1658.45910
GROSS NATIONAL PRODUCT                 0.27952          0.24599
ZET1                                                -1033.36437
ZET2                                                    0.04076
SSR                            573069.23719818  547711.50891343
F FOR HO                               0.37038
PROB > F(2,16)                         0.69624
BREAK AFTER OBSERVATION NUMBER               8
Besides the two basic explaining variables (intercept and GNP), two extra explaining
variables ZET1 and ZET2 are added (see section 2.3.). The estimated coefficients of ZET1
and ZET2 indicate to what extent the intercept and the coefficient of GNP, estimated in the
first subperiod, change in the second subperiod. In this specific case the regression
coefficient of GNP changes from 0.246 during the first subperiod to 0.246 + 0.041 = 0.287 during the second subperiod.
The value of the Chow test statistic is 0.370, indicating that the sum of squared residuals did not change significantly ( F_0.05(2,16) = 3.63 ). The null hypothesis of no structural break after eight observations is not rejected.
3.2.4. The Wu-Hausman test
The orthogonality of GNP and the disturbances is checked by means of two instrumental variables. The call command is %HAUSMAN(KOUTS,ZT,XT,2,20,XT,1,INST,2). As said in section 3.1. the SAS dataset "INST" contains the instrumental variables, in this case the one and two years lagged GNP series. The SAS output of the macro is tabulated in table 6.
TABLE 6  WU-HAUSMAN TEST
RESULTS OF TESTING

                                     RESULTS OF ESTIMATION
                            WITHOUT INSTRUMENTS  WITH INSTRUMENTS
INTERCEPT                          -2461.37543      -2531.58829
GROSS NATIONAL PRODUCT                 0.27952          0.28175
TRANSX1                                                 0.00671
SSR                            573069.23719818  554711.18303916
F FOR HO                               0.56261
PROB > F(1,17)                         0.46347
INSTRUMENT NUMBER                            2
TRANSX1 denotes the residuals resulting from regressing x_t (GNP) on x_{t-1} and x_{t-2}. Extending the basic model by TRANSX1, the sum of squared residuals did not change significantly. The null hypothesis of orthogonality is not rejected.
3.2.5. The RESET test
The functional form of the regression model and the omission of explaining variables are tested by adding two variables to the basic regression model which are powers of the estimated dependent variable of the basic regression, ŷ. They are called YHATEXP2 and YHATEXP3, being ŷ² and ŷ³ (see section 2.5.). The call command is %RESET(KOUTS,ZT,XT,2,20,2).
The SAS output of the macro is presented in the following table.
TABLE 7  RESET TEST
RESULTS OF TESTING

                                     RESULTS OF ESTIMATION
                               WITHOUT YHAT(S)     WITH YHAT(S)
INTERCEPT                          -2461.37543     -20250.49285
GROSS NATIONAL PRODUCT                 0.27952          1.49527
YHATEXP2                                               -0.00086
YHATEXP3                                                0.00000
SSR                            573069.23719818  352058.61376375
F FOR HO                               5.02213
PROB > F(2,16)                         0.02029
The sum of squared residuals has changed substantially. The calculated F value is high (5.022). And as P{ F(2,16) ≥ 5.022 } = 0.02 ≤ 0.05, the null hypothesis is rejected. The functional form is incorrect and/or there are omitted variables.
3.2.6. The Chow predictive failure test
The forecast performance of the model during the last two years (1968 and 1969) of the observation period is checked after the model has actually been estimated for the period from 1950 up to 1967. The call command is %CHOWFORE(KOUTS,ZT,XT,2,20,2). The SAS output of the macro is tabulated in table 8.
TABLE 8  CHOW PREDICTIVE FAILURE TEST
RESULTS OF TESTING

                                           MODELS
                               WITHOUT DUMMIES     WITH DUMMIES
INTERCEPT                          -2461.37543      -2003.82522
GROSS NATIONAL PRODUCT                 0.27952          0.26161
DUMMY1                                                344.16157
DUMMY2                                                633.45845
SSR                            573069.23719818  235191.99802256
F FOR HO                              11.49281
PROB > F(2,16)                         0.00080
NUMBER OF DUMMIES                            0                2
The forecast performance is evaluated by adding two dummy variables (one for 1968, one
for 1969) to the basic model (see section 2.6.). The change of the sum of squared residuals
is rather large and so is the value of the test statistic (11.493). The null hypothesis is
rejected. The forecast performance is rather bad. The coefficients are very likely to change when the number of observations changes from 18 to 20. This indicates that the model is not valid.
4. Conclusions
It turns out that four test statistics have a significant value, namely those of the Goldfeld-Quandt test, the LM test, the RESET test, and the Chow predictive failure test.
In particular, the significant value of the RESET test suggests that there is a functional
misspecification and/or that explaining variables are wrongly omitted. This may cause the
significant values of the three other tests. A more extensive analysis has to be done.
However, it is beyond the scope of this paper to go into further details.
SAS software appears to be a powerful and efficient instrument for extending the set of tools to evaluate certain aspects of an econometric model. In our daily practice of econometric model building, the tests discussed in this paper are applied in order to find the 'best' model. At the same time we are making progress in extending the set of tests. Two tests on so-called recursive residuals have already been programmed in the same way as shown in this paper.
Footnotes
1. With the exception of Mr. Oudshoorn, who works at the AMRO Bank, Amsterdam,
   The Netherlands, all authors are employees of the Econometric and Statistical Branch
   of the Headquarters of the Netherlands Postal and Telecommunications Services,
   P.O. Box 30 000, 2500 GA The Hague, The Netherlands.
More detailed information or questions about the paper can be addressed to one of the
authors.
2. Additionally, it should be noted that if the five tests are assumed to be independent
   then the significance level α* for a specific test is α* = 1 - (1-α)^(1/5), where
   α denotes the overall significance level. See Kramer et al. (1985).
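   For the overall level α = 0.05 used in this paper, this adjustment gives a per-test level of about 0.0102 (our arithmetic, for illustration):

```python
# per-test significance level when five independent tests share overall level alpha
alpha = 0.05
alpha_star = 1 - (1 - alpha) ** (1 / 5)
print(round(alpha_star, 4))  # 0.0102
```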
3. Of all the tests presented here, the Goldfeld-Quandt test is the only one with a lower
   critical value F_L. For the other tests only an upper critical value has to be derived.
Appendix
A. The call commands of the macros of the test procedures
Some of the macro variables appear in every parameter list. These are:

DSN      : name of the SAS dataset, containing the (transformed) data of the variables
           which are actually used in the regression;
DEPVAR   : name of the dependent variable;
INDEPVAR : name(s) of the independent variable(s), the intercept term not included,
           delimited by a blank;
k        : number of independent variables, the intercept term included;
T        : total number of observations.
The call commands are:
Goldfeld-Quandt test :
    %GOLDFELD(DSN,DEPVAR,INDEPVAR,k,T1,T) ,
    T1 being the number of observations of a subperiod, with T1 = (T-r)/2 where r is the
    number of omitted observations;

LM test :
    %LMTEST(DSN,DEPVAR,INDEPVAR,k,T,h) ,
    h being the number of lagged residuals under consideration;

Chow test on a structural break :
    %CHOWSTR(DSN,DEPVAR,INDEPVAR,k,T,T1) ,
    T1 being the number of observations before the structural break;

Wu-Hausman test :
    %HAUSMAN(DSN,DEPVAR,INDEPVAR,k,T,X1VAR,k1,W,k2) ,
    X1VAR being the name(s) of the explaining variable(s), delimited by a blank, of which
    the orthogonality with the disturbances is tested, k1 being the number of explaining
    variables in X1VAR, W being the name of the SAS dataset containing the data of the
    instrumental variables and k2 being the number of instrumental variables appearing in W;

RESET test :
    %RESET(DSN,DEPVAR,INDEPVAR,k,T,h) ,
    h being the number of powers of the estimated variable ŷ that are added to the basic
    regression model as independent variables. Note: the extra independent variables are
    ŷ², ŷ³, ..., ŷ^(h+1);

Chow predictive failure test :
    %CHOWFORE(DSN,DEPVAR,INDEPVAR,k,T,h) ,
    h being the number of dummy variables.
B. The SAS program of the tests
%LET TESTED = XXXXXXXX ;
%MACRO MAKERESI(N,NUMDF,DENOMDF) ;
  RESULT = J(2,&N ,1) ;
  RESULT(1,1) = RSSR ; RESULT(2,1) = USSR ;
  RESULT(1,2) = TEST ; RESULT(2,2) = TEST ;
  RESULT(1,3) = 1-PROBF(TEST,&NUMDF ,&DENOMDF ) ;
  RESULT(2,3) = 1-PROBF(TEST,&NUMDF ,&DENOMDF ) ;
  RESULT(1,&N ) = 1 ; RESULT(2,&N ) = 2 ;
%MEND ;
%MACRO VARLIST(NAME,N) ;
  %DO X=1 %TO &N ; &NAME&X %END ;
%MEND ;
%MACRO TABLES(DSN1,DSN2) ;
  DATA TABLE1 ; SET TABLE ; SET &DSN1 &DSN2 ;
%MEND ;
%MACRO RENAME(NAME,N) ;
  %DO X=1 %TO &N-1 ; COL&X = &NAME&X ; %END ;
  COL&N = &NAME&N ;
%MEND ;
%MACRO LABEL(NAME,N) ;
  %DO X=1 %TO &N ; %LET K=%EVAL(&X+1) ; LABEL &NAME&X=&NAME&K ; %END ;
%MEND ;
%MACRO LAGLIST(N) ;
  %DO X=1 %TO &N ;
    RESID&X = LAG&X (RESIDBAS) ; IF RESID&X = . THEN RESID&X = 0 ;
  %END ;
%MEND ;
%MACRO DUMMIES ;
  %DO X=1 %TO &T2 ;
    DUMMY&X = 0 ; IF _N_ = %EVAL(&X + &T1 ) THEN DUMMY&X = 1 ;
  %END ;
%MEND ;
PROC FORMAT ;
  VALUE LMFMT    1 = 'WITHOUT RESIDUALS'   2 = 'WITH RESIDUALS' ;
  VALUE GOLDFMT  1 = 'FIRST PERIOD'        2 = 'SECOND PERIOD' ;
  VALUE HAUSFMT  1 = 'WITHOUT INSTRUMENTS' 2 = 'WITH INSTRUMENTS' ;
  VALUE RSFMT    1 = 'WITHOUT YHAT(S)'     2 = 'WITH YHAT(S)' ;
  VALUE CHOWSFMT 1 = 'BASIC MODEL'         2 = 'EXTENDED MODEL' ;
  VALUE CHOWFFMT 1 = 'WITHOUT DUMMIES'     2 = 'WITH DUMMIES' ;
%MACRO STARTREG(DSN) ;
  PROC REG DATA=&DSN OUTEST=BASICEST ;
  TITLE3 OLS REGRESSION OF THE BASIC MODEL (DATA FROM &DSN ) ;
  MODEL &DEPVAR = &INDEPVAR / P R DW ;
  OUTPUT OUT = BASICOUT R = RESIDBAS P = PREDBAS ;
  PROC PRINT ; VAR &DEPVAR &INDEPVAR ;
  %LET TESTED = &DSN ;
%MEND ;
%MACRO LMTEST(DSN,DEPVAR,INDEPVAR,K,T,LAGS) ;
  PROC PRINTTO UNIT=18 NEW ;
  TITLE2 LM - TEST ;
  %LET NUMDF = &LAGS ; %LET DENOMDF = %EVAL(&T - &K - &LAGS ) ;
  %IF &TESTED ^= &DSN %THEN %STARTREG(&DSN ) ;
  DATA LMRES ; SET &DSN ; SET BASICOUT(KEEP=RESIDBAS) ; %LAGLIST(&LAGS ) ;
  PROC REG DATA=LMRES OUTEST=XXEST1B ;
  TITLE3 MODEL WITH &LAGS LAGGED RESIDUALS ;
  MODEL &DEPVAR = &INDEPVAR %VARLIST(RESID,&LAGS ) / P R DW ;
  OUTPUT OUT = XXOUT1B R = RESIDNEW P = PRED2 ;
  PROC PRINT ; VAR &DEPVAR &INDEPVAR %VARLIST(RESID,&LAGS ) ;
  PROC MATRIX ;
  FETCH R DATA=BASICOUT(KEEP=RESIDBAS) ; FETCH U DATA=XXOUT1B(KEEP=RESIDNEW) ;
  USSR = U' * U ; RSSR = R' * R ;
  TEST = ( (RSSR - USSR ) #/ &NUMDF ) #/ (USSR #/ &DENOMDF ) ;
  %MAKERESI(4,&NUMDF ,&DENOMDF ) ; OUTPUT RESULT OUT=TABLE ;
  %TABLES(BASICEST,XXEST1B) ;
  PROC PRINTTO ;
  PROC TABULATE DATA = TABLE1 FORMAT = 15.5 ORDER = DATA FORMCHAR='FABFACCCBCEB8FECABCBBB'X ;
  TITLE3 RESULTS OF TESTING ;
  CLASS COL4 ; VAR %VARLIST(COL,4) INTERCEP &INDEPVAR %VARLIST(RESID,&LAGS ) ;
  FORMAT COL4 LMFMT. ;
  LABEL COL4 = 'RESULTS OF ESTIMATION'
        COL1 = 'SSR' COL2 = 'F FOR HO' INTERCEP = 'INTERCEPT'
        COL3 = 'PROB > F( &NUMDF , &DENOMDF )' ;
  TABLE INTERCEP &INDEPVAR %VARLIST(RESID,&LAGS ) COL1*F=10.8 COL2 COL3 ,COL4/RTS=50 ;
  KEYLABEL SUM = ' ' ;
%MEND ;
%MACRO GOLDFELD(DSN.DEPVAR.INDEPVAR.K.T1.T2) PROC PRINTTO UNIT=18 NEW
TITlE2 GOLDFELD & QUANDT TEST; %LET NUMDF =%EVALC&T1 - &K ) %LET DENOMDF =%EVAL(&T1 - &K )
OPTIONS OBS = &T1 ; PROC REG DATA=&DSN OUTEST=XXEST2A
TITlE3 FIRST &T1 OBSERVATIONS ; MODEL &DEPVAR = &INDEPVAR /P R DW OUTPUT OUT = XXOUT2A R = RESID1
OPTIONS OBS = 999 ; PROC PRINT ;VAR &DEPVAR &INDEPVAR
OPTIONS FIRSTOBS = %EVAL(&T2 - &T1 + 1 PROC REG DATA=&DSN OUTEST=XXEST2B ;
TITLE3 LAST &T1 OBSERVATIONS ; MODEL &DEPVAR = &INDEPVAR /P R DW OUTPUT OUT = XXOUT2B R = RESID2
OPTIONS FIRSTOBS=1 ; PROC PRINT iVAR &DEPVAR &INDEPVAR
OPTIONS OBS = 999 PROC MATRIX i FETCH R1 DATA=XXOUT2ACKEEP=RESID1) rETCH R2 DATA=XXOUT2BCKEEP=RESID2) RSSR=R1' * R1 USSR=R2' * R2
TEST = RSSR #/ USSR ;
%MAKERESI(6,&NUMDF ,&DENOMDF ) ; RESULT(1,4) = 1 ; RESULT(1,5) = &T1 ; RESULT(2,4) = %EVAL(&T2 - &T1 + 1) ; RESULT(2,5) = &T2 ; OUTPUT RESULT OUT=TABLE ;
%TABLES(XXEST2A,XXEST2B) ; PROC PRINTTO ; *****;
PROC TABULATE DATA = TABLE1 FORMAT = 15.5 ORDER = DATA FORMCHAR='FABFACCCBCEB8FECABCBBB'X;
TITLE3 RESULTS OF TESTING ;
CLASS COL6 ; VAR %VARLIST(COL,6) INTERCEP &INDEPVAR ; FORMAT COL6 GOLDFMT. ; LABEL COL1 = 'SSR '
COL2 = 'F FOR HO' COL4 = 'FIRST OBSERVATION' COL5 = 'LAST OBSERVATION' COL6 = 'RESULTS OF ESTIMATION' INTERCEP = 'INTERCEPT' COL3 = "PROB > F( &NUMDF, &DENOMDF )" ;
TABLE INTERCEP &INDEPVAR COL1*F=10.8 COL2 COL3 COL4*F=10.0 COL5*F=10.0 ,COL6/RTS=50;
KEYLABEL SUM = ' ' ;
%MEND ;
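The GOLDFELD macro's final PROC MATRIX step is simply the ratio of the two subsample residual sums of squares. A minimal Python sketch of that ratio, not part of the SAS listing (the function name and residual values are ours, for illustration only):

```python
# Goldfeld-Quandt statistic: ratio of the residual sums of squares
# from two separate regressions, one on the first T1 observations
# and one on the last T1 observations.
def goldfeld_quandt(resid_first, resid_last):
    """RSSR/USSR; under homoskedasticity it is F-distributed with
    (T1 - K, T1 - K) degrees of freedom."""
    rssr = sum(e * e for e in resid_first)
    ussr = sum(e * e for e in resid_last)
    return rssr / ussr

# Made-up residuals: the early subsample looks more dispersed.
print(goldfeld_quandt([2.0, -2.0, 1.0], [1.0, -1.0, 1.0]))  # prints 3.0
```

A ratio far from 1 in either direction is evidence against constant disturbance variance.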
%MACRO CHOWFORE(DSN,DEPVAR,INDEPVAR,K,T,T2) ; PROC PRINTTO UNIT=18 NEW; *****;
TITLE2 CHOW PREDICTIVE FAILURE TEST ;
%IF &TESTED ¬= &DSN %THEN %STARTREG(&DSN ) ; %LET T1 = %EVAL(&T - &T2 ) ; %LET NUMDF = &T2 ; %LET DENOMDF = %EVAL(&T1 - &K ) ;
DATA CHOWDUM ; SET &DSN ; %DUMMIES ;
PROC REG DATA = CHOWDUM OUTEST=XXEST4B ; TITLE3 MODEL WITH &T2 DUMMIES ;
MODEL &DEPVAR = &INDEPVAR %VARLIST(DUMMY,&T2 ) / P R DW ; OUTPUT OUT = XXOUT4B R = RESID2 ;
PROC PRINT ; VAR &DEPVAR &INDEPVAR %VARLIST(DUMMY,&T2 ) ;
PROC MATRIX ; FETCH R1 DATA=BASICOUT(KEEP=RESIDBAS) ; FETCH R2 DATA=XXOUT4B(KEEP=RESID2) ; RSSR = R1' * R1 ; USSR = R2' * R2 ;
TEST = ( (RSSR - USSR ) #/ &NUMDF ) #/ (USSR #/ &DENOMDF ) ;
%MAKERESI(5,&NUMDF ,&DENOMDF ) ; RESULT(1,4) = 0 ; RESULT(2,4) = &T2 ; OUTPUT RESULT OUT=TABLE ;
%TABLES(BASICEST,XXEST4B) ; PROC PRINTTO ; *****;
PROC TABULATE DATA=TABLE1 FORMAT=15.5 ORDER=DATA FORMCHAR='FABFACCCBCEB8FECABCBBB'X;
TITLE3 RESULTS OF TESTING ; CLASS COL5 ; VAR %VARLIST(COL,5) &INDEPVAR INTERCEP %VARLIST(DUMMY,&T2 ) ; FORMAT COL5 CHOWFFMT. ; LABEL COL1 = 'SSR '
COL2 = 'F FOR HO'
COL4 = 'NUMBER OF DUMMIES' COL5 = 'MODELS ' INTERCEP = 'INTERCEPT' COL3 = "PROB > F( &NUMDF, &DENOMDF )" ;
TABLE INTERCEP &INDEPVAR %VARLIST(DUMMY,&T2 ) COL1*F=10.8 COL2 COL3 COL4*F=10.0 ,COL5/RTS=50;
KEYLABEL SUM = ' ' ;
%MEND ;
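The CHOWFORE macro relies on a %DUMMIES step (defined elsewhere in the listing) that appends one indicator column per forecast-period observation; the augmented regression then fits those T2 observations exactly, so its SSR equals the SSR of the first T1 observations alone. A sketch of that dummy construction in Python, not part of the SAS listing (the function name is ours):

```python
# One indicator column per post-sample observation: DUMMYj is 1 only
# for the j-th forecast observation (rows T1+1..T of the sample).
def forecast_dummies(t, t2):
    """Return t rows x t2 columns of 0/1 indicators."""
    t1 = t - t2                        # size of the estimation sample
    return [[1 if row == t1 + j else 0 for j in range(t2)]
            for row in range(t)]

for row in forecast_dummies(t=5, t2=2):
    print(row)
```

With T = 5 and T2 = 2, only the last two rows carry a 1, each in its own column.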
%MACRO CHOWSTR(DSN,DEPVAR,INDEPVAR,K,T,T1) ; PROC PRINTTO UNIT=18 NEW ; *****;
TITLE2 CHOW TEST ON STRUCTURAL BREAK ;
%IF &TESTED ¬= &DSN %THEN %STARTREG(&DSN ) ; %LET NUMDF = &K ; %LET DENOMDF = %EVAL(&T - &K - &K ) ;
PROC MATRIX ; DUMC = J(&T,1,1) ; FETCH HELP DATA=BASICOUT(KEEP= &INDEPVAR ) ; HELP = DUMC||HELP ; HELP(1:&T1 , ) = 0 ; OUTPUT HELP OUT = MAKEZET (RENAME=(%RENAME(ZET,&K))) ;
DATA CHOWZET ; SET &DSN ; SET MAKEZET ;
PROC REG DATA=CHOWZET OUTEST=XXEST3B ; TITLE3 BREAK AFTER &Tl OBSERVATIONS ;
MODEL &DEPVAR = &INDEPVAR %VARLIST(ZET,&K) / P R DW ; OUTPUT OUT = XXOUT3B R = RESID2 ; PROC PRINT ; VAR &DEPVAR &INDEPVAR %VARLIST(ZET,&K) ;
PROC MATRIX ; FETCH R1 DATA=BASICOUT(KEEP=RESIDBAS) ; FETCH R2 DATA=XXOUT3B(KEEP=RESID2) ; RSSR = R1' * R1 ; USSR = R2' * R2 ;
*****;
TEST = ( (RSSR - USSR ) #/ &NUMDF ) #/ (USSR #/ &DENOMDF ) ;
%MAKERESI(5,&NUMDF ,&DENOMDF ) ; RESULT(1,4) = &T1 ; RESULT(2,4) = . ; OUTPUT RESULT OUT=TABLE ;
%TABLES(BASICEST,XXEST3B) ; PROC PRINTTO ; *****;
PROC TABULATE DATA=TABLE1 FORMAT=15.5 ORDER=DATA FORMCHAR='FABFACCCBCEB8FECABCBBB'X;
TITLE3 RESULTS OF TESTING ; CLASS COL5 ; VAR %VARLIST(COL,5) &INDEPVAR INTERCEP
%VARLIST(ZET,&K) ; FORMAT COL5 CHOWSFMT. ;
LABEL COL1 = 'SSR ' COL2 = 'F FOR HO' COL4 = 'BREAK AFTER OBSERVATION NUMBER'
COL5 = 'RESULTS OF ESTIMATION' INTERCEP = 'INTERCEPT' COL3 = "PROB > F( &NUMDF, &DENOMDF )" ;
TABLE INTERCEP &INDEPVAR %VARLIST(ZET,&K) COL1*F=10.8 COL2 COL3 COL4*F=10.0 ,COL5/RTS=50;
KEYLABEL SUM = ' ' ;
%MEND ;
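The CHOWSTR macro's PROC MATRIX step constructs the K extra "ZET" regressors as a copy of the full regressor matrix (intercept column prepended) with the first T1 rows set to zero; each ZET coefficient then measures the post-break shift in the corresponding coefficient. A sketch of that construction in Python, not part of the SAS listing (the function name and example matrix are ours):

```python
# ZET regressors for the structural-break Chow test: duplicate the
# regressor matrix (intercept first) and zero the first T1 rows.
def zet_columns(x_with_intercept, t1):
    """Return a copy of the matrix with rows 1..t1 zeroed out."""
    return [[0] * len(row) if i < t1 else list(row)
            for i, row in enumerate(x_with_intercept)]

x = [[1, 2.0], [1, 3.0], [1, 4.0], [1, 5.0]]  # intercept + one regressor
for row in zet_columns(x, t1=2):
    print(row)
```

Regressing y on the original columns plus these ZET columns and comparing SSRs gives the F test on K restrictions reported by the macro.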
%MACRO RESET(DSN,DEPVAR,INDEPVAR,K,T,T1) ; PROC PRINTTO UNIT=18 NEW ; *****;
TITLE2 RESET TEST ;
%IF &TESTED ¬= &DSN %THEN %STARTREG(&DSN ) ; %LET NUMDF = &T1 ; %LET DENOMDF = %EVAL(&T - &K - &T1 ) ;
PROC MATRIX ; FETCH HELP DATA=BASICOUT(KEEP=PREDBAS) ; HELP1 = J(&T ,&T1,1) ; HELP1(,1) = HELP(,1) # HELP(,1) ; DO I=2 TO &T1 ;
HELP1(,I) = HELP(,1) # HELP1(,I-1) ; END ; OUTPUT HELP1 OUT=RESET0(RENAME=(%RENAME(YHATEXP,&T1))) ;
DATA RESET1 ; SET &DSN ; SET RESET0 ;
PROC REG DATA = RESET1 OUTEST=XXEST5B ; TITLE3 MODEL WITH &T1 YHAT(S) ;
*****;
MODEL &DEPVAR = &INDEPVAR %VARLIST(YHATEXP,&T1) / P R DW ; OUTPUT OUT = XXOUT5B R = RESID2 ; PROC PRINT ; VAR &DEPVAR &INDEPVAR %VARLIST(YHATEXP,&T1) ;
PROC MATRIX ; FETCH R1 DATA=BASICOUT(KEEP=RESIDBAS) ; FETCH R2 DATA=XXOUT5B(KEEP=RESID2) ; RSSR = R1' * R1 ; USSR = R2' * R2 ;
TEST = ( (RSSR - USSR ) #/ &NUMDF ) #/ (USSR #/ &DENOMDF ) ;
%MAKERESI(4,&NUMDF ,&DENOMDF ) ; OUTPUT RESULT OUT=TABLE ;
%TABLES(BASICEST,XXEST5B) ; PROC PRINTTO ; *****;
PROC TABULATE DATA=TABLE1 FORMAT=15.5 ORDER=DATA FORMCHAR='FABFACCCBCEB8FECABCBBB'X;
TITLE3 RESULTS OF TESTING ; CLASS COL4 ; VAR %VARLIST(COL,4) &INDEPVAR INTERCEP %VARLIST(YHATEXP,&T1) ; FORMAT COL4 RSFMT. ; LABEL COL1 = 'SSR '
COL2 = 'F FOR HO' COL4 = 'RESULTS OF ESTIMATION'
INTERCEP = 'INTERCEPT' COL3 = "PROB > F( &NUMDF, &DENOMDF )" ; %LABEL(YHATEXP,&T1) ;
TABLE INTERCEP &INDEPVAR %VARLIST(YHATEXP,&T1) COL1*F=10.8 COL2 COL3 ,COL4/RTS=50;
KEYLABEL SUM = ' ' ;
%MEND ;
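The RESET macro's DO loop builds the test regressors as successive powers of the fitted values: column 1 is yhat squared, and each further column multiplies the previous one by yhat again, giving yhat^2 through yhat^(T1+1). A sketch of that loop in Python, not part of the SAS listing (the function name is ours):

```python
# RESET regressors: m columns holding yhat^2 .. yhat^(m+1), mirroring
# the PROC MATRIX loop HELP1(,I) = HELP(,1) # HELP1(,I-1).
def yhat_powers(yhat, m):
    cols = [[v * v for v in yhat]]                         # yhat^2
    for _ in range(m - 1):                                 # yhat^3, ...
        cols.append([v * c for v, c in zip(yhat, cols[-1])])
    return cols

print(yhat_powers([1.0, 2.0, 3.0], m=2))  # prints [[1.0, 4.0, 9.0], [1.0, 8.0, 27.0]]
```

Adding these columns to the basic model and testing their joint significance (the SSR-based F above) is Ramsey's check for functional-form misspecification.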
%MACRO HAUSMAN(DSN,DEPVAR,INDEPVAR,K,T,X1VAR,K1,W,K2) ; PROC PRINTTO UNIT=18 NEW ; *****;
%IF &TESTED ¬= &DSN %THEN %STARTREG(&DSN ) ; %LET NUMDF = &K1 ; %LET DENOMDF = %EVAL(&T - &K - &K1 ) ;
PROC MATRIX ; FETCH W DATA=&W ; FETCH X1 DATA=&DSN (KEEP = &X1VAR) ; ROW = NROW(W) ; TRANS2 = ( W * (INV(W' * W) * W') ) ; TRANS1 = (I(ROW) - (W * (INV(W' * W) * W'))) * X1 ; OUTPUT TRANS1 OUT=HAUSMAN0(RENAME=(%RENAME(TRANSX,&K1))) ;
DATA HAUSMAN1 ; SET &DSN ; SET HAUSMAN0 ;
TITLE2 WU-HAUSMAN TEST ;
PROC REG DATA = HAUSMAN1 OUTEST=XXEST6B ; TITLE3 MODEL WITH &K2 INSTRUMENTS ;
MODEL &DEPVAR = &INDEPVAR %VARLIST(TRANSX,&K1 ) / P R DW ; OUTPUT OUT = XXOUT6B R = RESID2 ; PROC PRINT ; VAR &DEPVAR &INDEPVAR %VARLIST(TRANSX,&K1 ) ;
PROC MATRIX ; FETCH R1 DATA=BASICOUT(KEEP=RESIDBAS) ; FETCH R2 DATA=XXOUT6B(KEEP=RESID2) ; RSSR = R1' * R1 ; USSR = R2' * R2 ;
TEST = ( (RSSR - USSR ) #/ &NUMDF ) #/ (USSR #/ &DENOMDF ) ;
%MAKERESI(5,&NUMDF ,&DENOMDF ) ; RESULT(1,4) = . ; RESULT(2,4) = &K2 ; OUTPUT RESULT OUT=TABLE ;
%TABLES(BASICEST,XXEST6B) ; PROC PRINTTO ; *****;
PROC TABULATE DATA=TABLE1 FORMAT=15.5 ORDER=DATA FORMCHAR='FABFACCCBCEB8FECABCBBB'X;
TITLE3 RESULTS OF TESTING ; CLASS COL5 ; VAR %VARLIST(COL,5) &INDEPVAR INTERCEP %VARLIST(TRANSX,&K1 ) ; FORMAT COL5 HAUSFMT. ; LABEL COL1 = 'SSR '
COL2 = 'F FOR HO' COL4 = 'INSTRUMENT NUMBER' COL5 = 'RESULTS OF ESTIMATION'
INTERCEP = 'INTERCEPT' COL3 = "PROB > F( &NUMDF, &DENOMDF )" ;
TABLE INTERCEP &INDEPVAR %VARLIST(TRANSX,&K1 ) COL1*F=10.8 COL2 COL3 COL4*F=10.0 ,COL5/RTS=50;
KEYLABEL SUM = ' ' ;
%MEND ;
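Reading the garbled TRANS1 line as the residual maker (I - W(W'W)^(-1)W') applied to the suspect regressors X1, the HAUSMAN macro adds to the model the part of X1 that the instruments cannot explain; a significant coefficient on that part signals correlation between regressors and disturbances. For the single-instrument case the projection needs no matrix inversion, so it can be sketched in a few lines of Python, not part of the SAS listing (the function name and numbers are ours):

```python
# Wu-Hausman transformation for one instrument w:
# (I - w(w'w)^(-1)w') x1, i.e. the residual from projecting the
# suspect regressor x1 on the instrument.
def residual_from_instrument(x1, w):
    scale = sum(wi * xi for wi, xi in zip(w, x1)) / sum(wi * wi for wi in w)
    return [xi - scale * wi for xi, wi in zip(x1, w)]

print(residual_from_instrument([2.0, 4.0], [1.0, 1.0]))  # prints [-1.0, 1.0]
```

The macro does the same for a full instrument matrix W in PROC MATRIX and then applies the SSR-based F form above with K1 and T - K - K1 degrees of freedom.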
References
Breusch, T.S. (1978), "Testing for Autocorrelation in Dynamic Linear Models," Australian Economic Papers, 17, 334-355.
Chow, G.C. (1960), "Tests of Equality between Sets of Coefficients in Two Linear Regressions," Econometrica, 28, 591-605.
Goldfeld, S.M., and R.E. Quandt (1965), "Some Tests for Homoscedasticity," Journal of the American Statistical Association, 60, 539-547.
Goldfeld, S.M., and R.E. Quandt (1972), Nonlinear Methods in Econometrics, North-Holland, Amsterdam.
Hausman, J.A. (1978), "Specification Tests in Econometrics," Econometrica, 46, 1251-1271.
Hendry, D.F. and J.F. Richard (1982), "On the Formulation of Empirical Models in Dynamic Econometrics," Journal of Econometrics, 20, 3-33.
Hendry, D.F. and J.F. Richard (1983), "The Econometric Analysis of Economic Time Series,"
International Statistical Review, 51, 111-163.
Kramer, W., H. Sonnberger, J. Maurer and P. Havlik (1985), "Diagnostic Checking in Practice," The Review of Economics and Statistics, 67, 118-123.
Pagan, A. (1974), "A Generalized Approach to the Treatment of Autocorrelation," Australian Economic Papers, 13, 267-280.
Ramsey, J.B. (1969), "Tests for Specification Errors in Classical Linear Least-Squares Regression Analysis," Journal of the Royal Statistical Society, Series B, 31, 350-371.
Koutsoyiannis, A. (1976), Theory of Econometrics. London and Basingstoke: The Macmillan Press Ltd, 220-222.