Electronic copy available at: http://ssrn.com/abstract=1985485

The influence of VAR dimensions on estimator biases¹

By Karim M. Abadir, Kaddour Hadri, and Elias Tzavalis

1 INTRODUCTION

Vector AutoRegressions (VARs) have now become the most popular tool of Time Series analysis amongst econometricians. Unfortunately, little is known about the analytic finite-sample properties of parameter estimators for such systems. The asymptotic analysis of VARs published to date does not address questions regarding the influence of the number and nature of the system's variates on parameter estimates. Clearly, both questions will have repercussions on the way VARs are used, and we intend to address them here.

We consider the implications of varying the dimensions of VARs on the biases of Maximum Likelihood and Least Squares Estimators (MLE and LSE, respectively). In the purely nonstationary case ($k$-dimensional random walk), estimator biases are approximately equal to the dimension of the system ($k$) times the univariate bias, even when the variates are generated independently of each other. We show that the variance too increases with the dimension of the system, hence also raising the Mean Squared Error (MSE) of the estimator. When some stable linear combinations exist, the biases are generally smaller and are asymptotically proportional to the sum of the characteristic roots of the VAR. One source of such combinations is meaningful economic relations that are represented by the cointegration of some of the components of the VAR. Adding economically-irrelevant variables to a VAR is thus shown to have more serious negative consequences in integrated time series than in classical ergodic or cross-section analyses. The findings strengthen the case for parsimonious modelling and for the reduction step of the general-to-specific marginalization method. They also support the use of seasonally unadjusted data whenever possible.

Let $y_t$ be a $k \times 1$ discrete time series which is sampled over $t = 1, \dots, T$, and which follows the first-order VAR [a VAR(1)]
$$\nabla y_t = C y_{t-1} + \varepsilon_t \equiv (A - I_k)\, y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{IN}(0, \Omega), \tag{1}$$
where $I_k$ is the identity matrix of order $k$, $\nabla$ is the backward difference operator such that $\nabla y_t \equiv y_t - y_{t-1}$, and we condition on $y_0$. A first-order autoregressive system is chosen for ease of exposition, though any VAR($p$) can be reformulated as a VAR(1) of dimensions augmented to $kp$, where $A$ becomes the companion matrix of the VAR($p$) and $\Omega$ becomes singular. In such a case, the derivations that follow give the (dominant) leading-term approximation for the biases.

In (1), we let $\Omega$ be positive definite, and $C \equiv A - I_k$ be a $k \times k$ matrix with $m$ zero eigenvalues, such that $1 \le m \le k$, and the remaining $k - m$ eigenvalues are inside a unit circle centred around $-1$. Though (1) could allow vectors $y_t$ that are integrated of order $d$ [denoted $y_t \sim I(d)$] with individual components that are at most $I(m)$, we restrict $C$ further to have rank $k - m$ so that $y_t \sim I(1)$. This rules out cases like
$$\nabla \begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \varepsilon_t, \tag{2}$$
where the rank of the nilpotent matrix is 1 (in spite of $k - m = 0$), with the effect that $y_{2t} \sim I(1)$ and $y_{1t} \sim I(2)$. Generally, economic series seem to be at most $I(1)$, so we focus our analysis on this case only.
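The distinction can be checked by simulating system (2) directly. In the sketch below (the sample size and seed are hypothetical choices), the first difference of $y_{2t}$ is white noise, while $y_{1t}$ only becomes stationary after differencing twice: its second difference works out to the MA(1) process $\varepsilon_{2,t-1} + \varepsilon_{1t} - \varepsilon_{1,t-1}$, with variance 3 for unit innovation variances.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
eps = rng.standard_normal((T, 2))   # IN(0, I_2) innovations

C = np.array([[0.0, 1.0],           # the nilpotent matrix of equation (2)
              [0.0, 0.0]])
y = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    y[t] = y[t - 1] + C @ y[t - 1] + eps[t - 1]

y1, y2 = y[:, 0], y[:, 1]
d1_y2 = np.diff(y2)       # I(1): first difference of y2 is just eps_2
d2_y1 = np.diff(y1, 2)    # I(2): second difference of y1 is the MA(1)
                          # eps2_{t-1} + eps1_t - eps1_{t-1}, variance 3
print(round(d1_y2.var(), 2), round(d2_y1.var(), 2))
```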

¹ We are very grateful to Giovanni Forchini and Patrick Marsh for their extensive comments. We would also like to thank Les Godfrey, David Hendry, Grant Hillier, Bent Nielsen, Paolo Paruolo, Peter Robinson and Tony Sudbery for useful comments on an earlier version of this paper. This research was supported by ESRC (UK) grant R000236627.


The MLE of $C$ is
$$\hat{C} = \Bigl(\sum \nabla y_t\, y_{t-1}'\Bigr)\Bigl(\sum y_{t-1} y_{t-1}'\Bigr)^{-1} = C + \Bigl(\sum \varepsilon_t\, y_{t-1}'\Bigr)\Bigl(\sum y_{t-1} y_{t-1}'\Bigr)^{-1}, \tag{3}$$
where all summations run over $t = 1, \dots, T$, unless stated otherwise. Given the independent Normality of $\varepsilon_t$, the LSE of $C$ coincides with the MLE here. The bias of $\hat{C}$ is defined as $B \equiv \mathrm{E}[\hat{C} - C] = \mathrm{E}[\hat{A} - A]$, where $\mathrm{E}[\cdot]$ is the expected value operator. In the univariate case, $B$ becomes a scalar and we write $b$ for it. We have chosen to formulate the system in terms of $C$ rather than $A$ in order to facilitate the decompositions of Section 3 and the interpretation of the results in terms of co-integration. This formulation is known as the Error Correction Mechanism (ECM) in econometrics, and as the proportional component of control systems in mathematics.
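Computationally, (3) is a single matrix least-squares regression of $\nabla y_t$ on $y_{t-1}$. A minimal sketch for the purely nonstationary case treated below (the dimension, sample size and seed are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 200, 2
eps = rng.standard_normal((T, k))

# A pure k-dimensional random walk: C = 0, i.e. A = I_k, with y_0 = 0.
y = np.vstack([np.zeros(k), np.cumsum(eps, axis=0)])

Ylag = y[:-1]              # y_0, ..., y_{T-1}
dY = np.diff(y, axis=0)    # ∇y_1, ..., ∇y_T

# Equation (3): C_hat = (Σ ∇y_t y'_{t-1}) (Σ y_{t-1} y'_{t-1})^{-1}
C_hat = (dY.T @ Ylag) @ np.linalg.inv(Ylag.T @ Ylag)
print(np.round(C_hat, 3))
```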

The purpose of our paper is to examine the effect that $k$ and $m$ have on the bias of $\hat{C}$. We do so for two cases. First, in Section 2, we let $C = 0$ (i.e. $m = k$), and call this the purely nonstationary case. It corresponds to a VAR that is purely made up of a $k$-dimensional random walk. We contrast the effect of $k$ on the bias of estimators here and when variables are either stationary or have no time connotation. Then in Section 3, we introduce a new decomposition of VARs into stable and unstable components. We let some of the eigenvalues of $C$ be nonzero (i.e. $1 \le m < k$) in order to obtain cointegration in the system, and consider the behaviour of the asymptotic bias in terms of $k$ and $m$. We use the term 'asymptotic bias' to signify the leading (dominant) term in the large-$T$ expansion of the bias. We compare the outcome with the two extremes of purely nonstationary and stationary VARs. Section 4 rounds off the discussion by considering the implications of the theoretical results for practical econometric modelling. The proofs of Theorems 1 and 2 are given in Appendices A and B, respectively.

2 THE BIAS IN THE PURELY NONSTATIONARY VAR

We start by deriving the bias of $\hat{C}$ when $C = 0$ and $y_0 = 0$ in the VAR of (1). The result has two main features. One is quantitative: using the moment generating function (mgf) of Abadir and Larsson (1996), we get a formula for the bias under Normality. The other is qualitative: the multivariate bias $B_0$ is exactly equal to a scalar matrix (i.e. it is a scalar times the identity matrix), with the scalar being approximately $k$ times the univariate bias formula. In other words, $B_0 \simeq k\, b_0 I_k$. Much of the proof relies on arguments that only require the density of $\varepsilon_t$ to be symmetric about the origin, and not necessarily Gaussian. We conjecture that the qualitative feature extends to most of these densities as well, but only prove the extension for the cases of elliptical densities, which already represent a very wide class. We start by proving the result for Gaussian residuals, then discuss some invariance features of our results.

The following analysis shows that $B_0$ is exactly a scalar matrix, and is of intrinsic interest since it provides an intuitive explanation of why the qualitative feature arises. For a function $f(\varepsilon)$ of the components of $\varepsilon$, we have $\mathrm{E}[f(\varepsilon)] = \frac{1}{2}\mathrm{E}[f(\varepsilon) + f(-\varepsilon)]$, by symmetry of the density of $\varepsilon$. To prove that a finite $\mathrm{E}[f(\varepsilon)] = 0$, as in the case of the off-diagonal elements of $B_0$, we only need to show that $f(-\varepsilon) = -f(\varepsilon)$, i.e. that $f(\varepsilon)$ is an odd function of $\varepsilon$. Formally, by the Cholesky decomposition $\Omega = LL'$, where $L$ is a lower triangular matrix, define $u_t \equiv L^{-1}\varepsilon_t \sim \mathrm{IN}(0, I_k)$ and $z_t \equiv L^{-1}y_t$, and rewrite
$$\hat{C} - C = \Bigl(\sum \varepsilon_t y_{t-1}'\Bigr)\Bigl(\sum y_{t-1} y_{t-1}'\Bigr)^{-1} = L\Bigl(\sum u_t z_{t-1}'\Bigr)\Bigl(\sum z_{t-1} z_{t-1}'\Bigr)^{-1}L^{-1}.$$
Taking expectations and substituting $C = 0$,
$$B_0 = L\,\mathrm{E}\Bigl[\Bigl(\sum u_t z_{t-1}'\Bigr)\Bigl(\sum z_{t-1} z_{t-1}'\Bigr)^{-1}\Bigr]L^{-1}. \tag{4}$$
Any two sequences $\{u_{it}\}$ and $\{u_{jt}\}$, $i \ne j$, are standard Normals (hence symmetrically distributed about 0) and are independent of each other. Replacing $\{u_{it}\}$ by $\{-u_{it}\}$ and keeping the rest of $u_t$ as before will therefore not affect the expectation in (4). But this leads to the negative of the expectation of the off-diagonal elements of the $i$-th column and row of the matrix in (4), which can only be true if that expectation is zero. So, off-diagonal elements of $B_0$ vanish.
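The standardization used in this argument is an exact algebraic identity, not an approximation, and can be verified numerically (a sketch; the choice of $\Omega$, the dimension and the seed are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 150, 3
Omega = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])   # hypothetical positive definite Ω
L = np.linalg.cholesky(Omega)         # Ω = L L'

eps = rng.standard_normal((T, k)) @ L.T              # ε_t ~ N(0, Ω)
y = np.vstack([np.zeros(k), np.cumsum(eps, axis=0)])  # C = 0, y_0 = 0

Linv = np.linalg.inv(L)
u = eps @ Linv.T          # u_t = L^{-1} ε_t
z = y @ Linv.T            # z_t = L^{-1} y_t

# the two expressions around the similarity transform by L coincide exactly
lhs = (eps.T @ y[:-1]) @ np.linalg.inv(y[:-1].T @ y[:-1])
rhs = L @ (u.T @ z[:-1]) @ np.linalg.inv(z[:-1].T @ z[:-1]) @ Linv
print(np.allclose(lhs, rhs))
```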

More explicitly, let $i = 1$ and define $D \equiv \sum u_t z_{t-1}'$, $H \equiv \sum z_{t-1} z_{t-1}'$, and partition
$$\Bigl(\sum u_t z_{t-1}'\Bigr)\Bigl(\sum z_{t-1} z_{t-1}'\Bigr)^{-1} \equiv \begin{bmatrix} d_1 & p_2' \\ p_3 & D_4 \end{bmatrix}\begin{bmatrix} h_1 & q_2' \\ q_2 & H_4 \end{bmatrix}^{-1} \tag{5}$$
$$\equiv \begin{bmatrix} d_1 & p_2' \\ p_3 & D_4 \end{bmatrix}\begin{bmatrix} h^{(1)} & q^{(2)\prime} \\ q^{(2)} & H^{(4)} \end{bmatrix} = \begin{bmatrix} d_1 h^{(1)} + p_2' q^{(2)} & d_1 q^{(2)\prime} + p_2' H^{(4)} \\ p_3 h^{(1)} + D_4 q^{(2)} & p_3 q^{(2)\prime} + D_4 H^{(4)} \end{bmatrix}.$$
Also define $\tilde{u}_t$ to be made of $-u_{1t}$ and $u_{jt}$ for $j = 2, \dots, k$. Then the resulting $\tilde{z}_t$ is equal to $z_t$, with the exception of the first element $\tilde{z}_{1t} = -z_{1t}$. By the formula for partitioned inverses, we get
$$\Bigl(\sum \tilde{u}_t \tilde{z}_{t-1}'\Bigr)\Bigl(\sum \tilde{z}_{t-1} \tilde{z}_{t-1}'\Bigr)^{-1} = \begin{bmatrix} d_1 & -p_2' \\ -p_3 & D_4 \end{bmatrix}\begin{bmatrix} h^{(1)} & -q^{(2)\prime} \\ -q^{(2)} & H^{(4)} \end{bmatrix} \tag{6}$$
$$= \begin{bmatrix} d_1 h^{(1)} + p_2' q^{(2)} & -d_1 q^{(2)\prime} - p_2' H^{(4)} \\ -p_3 h^{(1)} - D_4 q^{(2)} & p_3 q^{(2)\prime} + D_4 H^{(4)} \end{bmatrix},$$
which has the same expectation as (5). But since the corresponding off-diagonal blocks have opposite signs in (5) and (6), these have zero expectation. Applying the same logic to all off-diagonal elements proves that $B_0$ is diagonal. The diagonal elements are then equal because of the invariance of the normalized (by $L^{-1}$) VAR specification to the reordering of the components of $z_t$. This also shows that the $L$ matrices drop out of the bias expression in (4), and the bias is not a function of $\Omega$ here.

With these simplifications in mind, we now analyse the case of Normally-distributed $\varepsilon_t$. For this case, we determine the scalar of the scalar matrix $B_0$, and provide sufficient conditions for the existence of the bias.

Theorem 1 For $C = 0$ and $T \ge k + 2$, the bias of the estimators in Section 1 is
$$B_0 \simeq k\, b_0 I_k = -\frac{1.78143\, k}{T} \exp\Bigl[-\frac{2.6}{T}\Bigr] I_k.$$

Proof. See Appendix A.
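A small Monte Carlo experiment gives a feel for the Theorem (this sketch is not the paper's Table I; the replication count, sample size and seed are hypothetical, so the match is only approximate at these settings):

```python
import numpy as np

rng = np.random.default_rng(3)
T, R = 50, 4000   # hypothetical sample size and replication count

def mean_diag_bias(k: int) -> float:
    """Average diagonal element of C_hat for a pure k-dim random walk (C = 0)."""
    total = 0.0
    for _ in range(R):
        eps = rng.standard_normal((T, k))
        y = np.vstack([np.zeros(k), np.cumsum(eps, axis=0)])
        C_hat = (np.diff(y, axis=0).T @ y[:-1]) @ np.linalg.inv(y[:-1].T @ y[:-1])
        total += np.trace(C_hat) / k
    return total / R

b1, b4 = mean_diag_bias(1), mean_diag_bias(4)
# the diagonal bias for k = 4 should be roughly 4 times the univariate one
print(round(b1, 3), round(b4, 3), round(b4 / b1, 1))
```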

The accuracy of the Theorem is assessed in Table I. The Theorem shows that the negative bias is proportional to the dimension of the VAR, even if $\Omega$ were diagonal. The result is striking. It shows that adding irrelevant variables to a VAR increases the bias of all the estimated parameters on the diagonal of $\hat{C}$. For example, doubling the size of a purely nonstationary VAR will double the bias of all the estimates, even if the original variables and equations are unrelated to the added ones. This is in sharp contrast with the theory for stationary and spatial data, where adding irrelevant variables or equations reduces efficiency but does not affect biases. We now provide the asymptotic interpretation of this result.

Let $W(r)$ be the standard $k$-dimensional Wiener process on $[0, 1]$, $\Omega = LL'$, and $\xrightarrow{d}$ denote convergence in distribution. In the purely nonstationary VAR, the normalized sums
$$\frac{1}{T}\sum y_{t-1}\varepsilon_t' \xrightarrow{d} L\int_0^1 W(r)\,\mathrm{d}W(r)'\,L', \qquad \frac{1}{T^2}\sum y_{t-1} y_{t-1}' \xrightarrow{d} L\int_0^1 W(r) W(r)'\,\mathrm{d}r\,L' \tag{7}$$
are random in the limit, and are nondegenerate even when the variates are unrelated. The bias can then be written as
$$B_0 = \frac{1}{T}\,\mathrm{E}\bigl[T\hat{C}\bigr] = \frac{1}{T}\,\mathrm{E}\biggl[\Bigl(\frac{1}{T}\sum y_{t-1}\varepsilon_t'\Bigr)'\Bigl(\frac{1}{T^2}\sum y_{t-1} y_{t-1}'\Bigr)^{-1}\biggr] \tag{8}$$
$$= \frac{1}{T}\,L\,\mathrm{E}\biggl[\Bigl(\int_0^1 W(r)\,\mathrm{d}W(r)'\Bigr)'\Bigl(\int_0^1 W(r) W(r)'\,\mathrm{d}r\Bigr)^{-1}\biggr]L^{-1} + o\bigl(T^{-1}\bigr).$$

First, consider the limiting distribution of $T^{-2}\sum y_{t-1} y_{t-1}'$. The off-diagonal elements of $\int_0^1 W(r) W(r)'\,\mathrm{d}r$ are nondegenerate (in spite of having zero expectations) and are of similar probabilistic orders of magnitude as the diagonal ones. Furthermore, by the Cauchy-Schwarz inequality, this matrix integral is invertible. As a result, all the elements of the inverse matrix are of the same probabilistic order of magnitude. The matrix is not diagonal in the limit, in spite of the components of $W(r)$ being mutually independent. This is a manifestation of the spurious correlation problem explained by Granger and Newbold (1974) and analysed by Phillips (1986). In our Theorem, this result is extended analytically to finite samples, instead of asymptotics only.

Second, the matrix $\int_0^1 W(r)\,\mathrm{d}W(r)'$ has no degenerate elements either. Again, the elements of this random matrix are all of similar order of magnitude, even when $\Omega = I_k$. Because of the infinite memory of $y_t$, the limits of the off-diagonal elements of $T^{-1}\sum y_{t-1}\varepsilon_t'$ are not zero (in spite of having zero expectations) and are similar in magnitude to diagonal ones, even in the case of independently generated processes.

As a result of the two previous observations, the diagonal elements of the product
$$\Bigl(\int_0^1 W(r)\,\mathrm{d}W(r)'\Bigr)'\Bigl(\int_0^1 W(r) W(r)'\,\mathrm{d}r\Bigr)^{-1} \tag{9}$$
are equally-weighted linear combinations of random elements of similar magnitude. These combinations are $k$ times the orders of magnitude of the corresponding univariate functionals. Had $T^{-2}\sum y_{t-1} y_{t-1}'$ been a diagonal matrix, this $k$-fold increase would not have happened. In the case of a stable $A$ (analysed in Theorem 2 below) with a diagonal $\Omega$, the scaled sub-Hessian $T^{-1}\sum y_{t-1} y_{t-1}'$ has a deterministic diagonal limit as $T$ increases, and does not cause this phenomenon. The large biases that arise here are mainly induced by the interaction of the off-diagonal elements of the two matrices in (9).
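The spurious correlation mechanism invoked here is easy to reproduce (a sketch with hypothetical settings): independently generated random walks typically display large sample correlations, unlike their white-noise increments.

```python
import numpy as np

rng = np.random.default_rng(4)
T, R = 500, 2000

corr_rw, corr_iid = [], []
for _ in range(R):
    e = rng.standard_normal((T, 2))
    w = np.cumsum(e, axis=0)      # two independently generated random walks
    corr_rw.append(abs(np.corrcoef(w[:, 0], w[:, 1])[0, 1]))
    corr_iid.append(abs(np.corrcoef(e[:, 0], e[:, 1])[0, 1]))

# mean |correlation|: large for the random walks, near zero for their increments
print(round(np.mean(corr_rw), 2), round(np.mean(corr_iid), 2))
```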

This is the informal asymptotic explanation of what the Theorem has proved. It may also be visualized by looking at the last expression of (5) and recalling that the individual components of the inner products are all of the same order of magnitude. The Theorem stands in contrast with the corresponding traditional result arising from the estimation of a set of reduced forms in the case of stationary or spatial models. There, $\Omega = I_k$ ensures the large-sample independence of the biases across equations. Premultiplying the inverse of the second-moments matrix by a diagonal one will only scale its rows, and is approximately equivalent to separate estimation of the equations.

Incidentally, the near-singularity of $\sum y_{t-1} y_{t-1}'$, which was implied by the earlier discussion, means that one should be careful when numerically inverting it in Monte-Carlo studies. A higher precision than usual is needed in simulating highly autoregressive models, thus further supporting the recommendations of Abadir (1995), which are based on other types of univariate precision problems. But most importantly, the near-singularity of $\sum y_{t-1} y_{t-1}'$ is an indication that enlarging a purely nonstationary VAR increases the variance of parameter estimates unnecessarily. For example, compare the univariate $1/\sum y_{1,t-1}^2$ with the multivariate $(\sum y_{t-1} y_{t-1}')^{-1}$. The former is typically small, while the latter is large because of the spurious correlation (collinearity) of the different random walks. Furthermore, using partitioned inverses (see Remarks 1 and 3 in Appendix A), the first (and typical diagonal) component of the MSE in the multivariate case is approximately
$$\mathrm{E}\biggl[\frac{1}{h_1 (1 - R^2)}\biggr] = \mathrm{E}\biggl[\frac{1}{h_1}\biggr] + \mathrm{E}\biggl[\frac{R^2}{h_1 (1 - R^2)}\biggr],$$
in the notation of (5), with $R^2$ the squared multiple correlation implied by the partitioned inverse. The first term after the equality represents the univariate MSE, and the second is positive. The latter term will be substantial for large purely-nonstationary systems, because of the spurious correlation problem. A similar (but more elaborate) reasoning applies to the variance also.
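The conditioning problem can be quantified by comparing condition numbers of $\sum y_{t-1} y_{t-1}'$ for unit-root and stable systems (a sketch; the stable AR coefficient of 0.5 and the other settings are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
T, k, R = 400, 5, 500

cond_rw, cond_st = [], []
for _ in range(R):
    e = rng.standard_normal((T, k))
    y_rw = np.cumsum(e, axis=0)    # k independent random walks
    y_st = np.empty_like(e)        # k independent stationary AR(1), root 0.5
    y_st[0] = e[0]
    for t in range(1, T):
        y_st[t] = 0.5 * y_st[t - 1] + e[t]
    cond_rw.append(np.linalg.cond(y_rw.T @ y_rw))
    cond_st.append(np.linalg.cond(y_st.T @ y_st))

# the moment matrix of the random walks is far closer to singularity
print(f"{np.median(cond_rw):.0f} vs {np.median(cond_st):.1f}")
```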

We now list some invariance results. First, the Theorem shows that in the purely nonstationary case, the bias of $\hat{C}$ is not affected by $\Omega$, whatever its structure may be. This is not hard to see from (1) when $A$ is any scalar matrix: one may normalize the process by $\Omega^{-1/2}$, for any $\Omega$ positive definite, without affecting the outcome for $\hat{C}$.

Second, the limiting density of $T\hat{C}$ does not depend on $y_0$. Accordingly, the $O(T^{-1})$ terms in the bias do not depend on $y_0$, which can then have any stationary distribution (including the possibility of a Dirac density leading to a constant subset of $y_0$) that does not depend on the parameters of the process in (1). Phillips (1987) uses a different reasoning that will lead to the same outcome regarding $y_0$. His derivations also mean that the asymptotic implications of our results do not require Normality of the residuals $\varepsilon_t$, and that they are invariant to $O(T^{-1})$ for a wide range of strong mixing residuals, though they would then relate to pseudo- (or quasi-) MLEs. We have to warn that the latter invariances are only asymptotic, and that departures from the conditions for $y_0$ and/or $\varepsilon_t$ are better tolerated in large samples than in small ones, where substantial unusual effects can arise; see Abadir (1993), Abadir and Hadri (1995).

Third, an invariance result can also be derived for the first relation of Theorem 1. The case of $\varepsilon_t$ following an elliptical distribution encompasses the Normal, contaminated Normal, t-distribution and many others. Again, we assume symmetry of the density about zero. Additionally, we require the density of $\varepsilon_t$ to exist, and its dependence on the Mahalanobis distance of $\varepsilon_t$ from the origin to be via an invertible function. Then, when it exists, the bias of the estimators in Section 1 is $B_0 \simeq k\, b_0 I_k$. [The proof follows along the same lines as the derivations preceding Theorem 2.1 of Abadir and Larsson (1996), and the derivations of Theorem 1 above.] The existence requirement for the bias varies with the particular elliptical density function that is adopted. Also, the formula for the univariate bias $b_0$ will depend on the density. An explicit formula for $b_0$ is unavailable in the literature, except for the Gaussian case of Theorem 1.

3 THE BIAS IN THE CASE OF COINTEGRATION

We now turn to the case where cointegration exists, meaning that there are $k - m$ linear combinations of $y_t$ which are $I(0)$. The $m$ common stochastic trends [e.g. see Stock and Watson (1988)] and at most $k - m$ cointegrating relations amongst the $I(1)$ components of $y_t$ will now be represented by the same generating process (1) that was used earlier. The common trends are represented by unit roots, and the $k - m$ stable roots of the stochastic difference equation (1) represent the $I(0)$ components of $y_t$ as well as cointegrating relations between the $I(1)$ variates of $y_t$. This approach has the advantages of leaving open the possibility of some components of $y_t$ being $I(0)$, and of not requiring a different treatment of the estimation problem. It is also potentially useful in practice, where the number of common trends is unknown and requires estimation, though we do not focus on this particular problem here. For an asymptotic treatment of such problems, see Phillips' (1991, 1995) triangular representation. Also, our derivations do not deal with reduced-rank regression estimators, which were shown by Phillips (1994) not to have finite moments for $T < \infty$.


An approach that is related to ours can be found in Chan and Wei (1988) or Tsay and Tiao (1990), though their applications are different from ours, and they do not express explicitly the possibility of a general number of common trends for $I(1)$ variates. For the latter result, we now require a more explicit specification of the Jordan decomposition of $C$. In this Section, matrix $C$ has $k - m$ nonzero eigenvalues which are inside a unit circle centred around $-1$. These lead to $k - m$ asymptotically stationary combinations. The aim of the following manipulations is to show that, because of the asymptotic independence of the normalized blocks of $\hat{C} - C$, the asymptotic bias of $\hat{C}$ in the case of cointegration can be represented by combining the results of Theorem 1 with the traditional approach for stable series.

Matrix $C$ is of rank $k - m$ with $m$ zero eigenvalues. Accordingly, it is derogatory with linear elementary divisors corresponding to these eigenvalues, and the resulting hypercompanion matrices are null scalars. [In contrast, the matrix of (2) is non-derogatory and has a quadratic elementary divisor with a 2-dimensional hypercompanion matrix.] When applying this information to Jordan's decomposition theorem, we get
$$M C M^{-1} = \Lambda \equiv \operatorname{diag}(0, \Lambda_2), \tag{10}$$
where $M$ is the reciprocal of the modal matrix, $0$ is a square null matrix of order $m$, and $\Lambda_2$ is a square bidiagonal matrix of order $k - m$ with the stable roots of system (1) on its diagonal. For ease of exposition, we have let the zero eigenvalues of $C$ occur at the beginning of $\Lambda$. If desired, this expository assumption can be relaxed by means of a permutation matrix. Defining $w_t \equiv M y_t$ and $\eta_t \equiv M \varepsilon_t$, we can rewrite (1) as
$$\nabla w_t = \Lambda w_{t-1} + \eta_t, \qquad \eta_t \sim \mathrm{IN}(0,\, M\Omega M'), \tag{11}$$
where the series $w_t$ need not be real but is assumed so here to simplify the discussion (otherwise, conjugate transposes will be needed instead of simple transposes). By partitioning $w_t \equiv [w_{1t}',\, w_{2t}']'$ and $\eta_t \equiv [\eta_{1t}',\, \eta_{2t}']'$ conformably with $\Lambda$ into $m$ and $k - m$ vectors each, we can see that the common trends are represented by the system of equations in $w_{1t}$, which is
$$\nabla w_{1t} = \eta_{1t}. \tag{12}$$
The stable combinations and series are represented by the system for $w_{2t}$,
$$\nabla w_{2t} = \Lambda_2 w_{2,t-1} + \eta_{2t}, \tag{13}$$
where the eigenvalues on the leading diagonal of $\Lambda_2$ are inside a unit circle centred around $-1$. By introducing the $(k - m) \times k$ exclusion matrix $\Xi \equiv [0 \;\; I_{k-m}]$, the transformation of $y_t$ into the stable $w_{2t}$ is achieved by $w_{2t} = \Xi w_t = \Xi M y_t$. In other words, the rows of $\Psi \equiv \Xi M$ (i.e. the last $k - m$ rows of $M$) may be interpreted as cointegrating vectors for $y_t$. We then have the following result.

Theorem 2 For $C \equiv A - I_k$ of rank $k - m$, with $k - m$ eigenvalues $\lambda_j$ satisfying $|1 + \lambda_j| < 1$, where $j = m + 1, \dots, k$, the bias of the estimators in Section 1 is
$$B = -\frac{1}{T} M^{-1} \operatorname{diag}\Bigl(1.78143\, k\, I_m,\ \Psi\Omega A'\Psi' \bigl(\Psi\Omega\Psi'\bigr)^{-1} + \bigl(\operatorname{tr}(A) - m\bigr) I_{k-m}\Bigr) M + O\bigl(T^{-1}(1 + \lambda)^{T}\bigr).$$

Proof. See Appendix B.

Though their leading terms are different in the two cases, biases are of $O(T^{-1})$ for autoregressive characteristic roots (i.e. eigenvalues of $A \equiv C + I_k$) on or inside the unit circle. So, any combination of them is of $O(T^{-1})$ as well. Roots outside the unit circle are, however, ruled out because $\hat{C} - C$ is Cauchy-like and its moments do not exist; e.g. see Evans and Savin (1981, p.761) for a univariate example.

One of the main components increasing the absolute value of the bias is the trace of $A \equiv C + I_k$, the matrix of autoregressive parameters. This is particularly true for large $k$, where biases are roughly of $O(T^{-1}\operatorname{tr}(A))$. The implication is that, of two VARs with the same dimensions but with the first one incorporating more stable (e.g. cointegrating) relations, the estimators' biases for the first system will be overall lower. The difference can be substantial for large systems. Stable roots have a dampening influence on the bias matrix, to an extent that depends on the cointegrating vectors $\Psi \equiv \Xi M$ in $B$. In the extreme case, biases tend to zero as $A \to 0$. But if some cointegrating relations lead to characteristic roots of opposite signs such that $\operatorname{tr}(A) \to 0$ (as in seasonal/cyclical cointegration, for instance), then the biases in the system can approach zero to $O(T^{-1})$ in spite of $A \ne 0$.

In the statement of our Theorem, we have given the bias to $O[T^{-1}(1 + \lambda_j)^T]$, since terms of $O[T^{-1}(1 + \lambda_i)^T (1 + \lambda_j)^T]$ are smaller. For the full expansion to $O(T^{-1})$, see (B3)-(B4). The statement of Theorem 2 summarizes the general effect of the dimensions and characteristic roots of the VAR on the asymptotic bias of $\hat{C}$. For more specific details of individual cases, one may manipulate the bias expressions further. For example, in the special case of $C$ being a symmetric matrix, $M$ becomes an orthogonal matrix ($M^{-1} = M'$) and simplifications may be sought.

This Theorem encompasses the case of singular $\Omega$, such as arises when using a companion form for a VAR($p$); see the introductory discussion of Section 1. If this were the case, then a reflexive generalized inverse is needed instead of the inverse $(\Psi\Omega\Psi')^{-1}$ that is stated in the Theorem.
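To make the constructions of this Section concrete, the sketch below builds a rank-one $C$ with eigenvalues $\{0, -0.5\}$, reads a cointegrating vector off the left eigenvectors of $C$, and checks that it maps the $I(1)$ pair into a stable series while the common-trend direction wanders (all numerical choices are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 2000

# C of rank 1 with eigenvalues {0, -0.5}: one common trend (m = 1)
# and one stable combination (here proportional to y1 - y2).
C = np.array([[-0.25,  0.25],
              [ 0.25, -0.25]])
eigvals, eigvecs = np.linalg.eig(C.T)        # left eigenvectors of C
psi = eigvecs[:, np.argmin(eigvals.real)]    # eigenvector of the root -0.5

y = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    y[t] = y[t - 1] + C @ y[t - 1] + rng.standard_normal(2)

combo = y @ psi                    # stable: AR(1) with coefficient 0.5
trend = y @ np.array([1.0, 1.0])   # common trend: a pure random walk
print(round(np.std(combo), 1), round(np.std(trend), 1))
```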

4 ECONOMETRIC MODELLING USING VAR PROCESSES

We have shown that in a purely nonstationary VAR, the biases of the MLE and LSE are proportional to the dimension of the system, even when the regressors are generated independently of each other. When some stable linear combinations exist, as when the variables are cointegrated, these biases are in general asymptotically proportional to the sum of the characteristic roots of the VAR. Adding irrelevant variables to a VAR was thus shown to have more serious negative consequences in integrated time series than in classical ergodic or cross-section analyses. On the other hand, incorporating stable relations in the VAR may have beneficial implications for the asymptotic biases of the parameters in general, especially if the associated characteristic root is of the opposite sign compared to the existing roots.

One of the implications of Theorem 1 is to encourage parsimonious modelling, thus complementing the penalty applied by Information Criteria on unnecessarily large models, though our reasoning (biases) and setting (nonstationary time series) are different. The usual asymptotic Information Criteria approaches do not account for the finite-sample bias problems caused by adding economically-irrelevant nonstationary series to the model. We have also pointed out that such practice induces an unusual multicollinearity, because of the spurious correlation problem, and this raises the variance (hence the MSE) as well. This is in addition to the usual concerns about loss of degrees of freedom. We have to stress that we do not advocate small models over large ones in all cases. When economic theory provides stable combinations of unstable variates, then these should be modelled, because they have a dampening influence on biases, as shown in Section 3. What we warn against is, for example, multi-equation modelling of the components of national investment spending, given that economic theory has not yet provided a satisfactory behavioural explanation of the disaggregated components.

A main component of the Hendry-Sargan methodology [e.g. Hendry (1995)] is the marginalization of a model with respect to the irrelevant variates, having started with an adequately large model. This is known in econometrics as general-to-specific modelling. Here too, our results have some implications. Though we show that substantial biases can arise from starting with a model that is initially, unknowingly, too large, the marginalization step must be undertaken. Our results do not imply support for specific-to-general modelling, because of omitted-variables biases and (more fundamentally) the inability to recover a joint or a conditional density from a marginal one. However, exact inference (which takes account, inter alia, of finite-sample location distortions caused by biases) must be used whenever possible, in order not to throw away important variables inadvertently when marginalizing. Unfortunately, it is common practice in the VAR literature that only asymptotic inference is undertaken when finite samples are used. Some dramatic consequences (sizes of 20-60% instead of 5%) using real-life data are illustrated in Nielsen (1994). Remedies to this problem are being provided in Jacobson and Larsson (1996), Nielsen (1997) and Pere (1997).

Our results are also applicable to roots of $A$ on the unit circle but not necessarily equal to $+1$, though such series fall outside the realm of the definition of integration. Such roots can arise in seasonal time series, and our work [namely Theorem 2, which implies that biases are roughly of $O(T^{-1}\operatorname{tr}(A))$] suggests that it is possible for estimator biases to be lower when seasonally unadjusted data is used than either when adjustments are made prior to estimation or when non-seasonal (e.g. annual) data is used. This approximation follows from the implied roots of $A$, which is, here, the companion matrix of the seasonal system. The result provides theoretical support for the approaches of Harvey and Scott (1994) and Franses (1995), and it has been refined by Pitarakis (1997), who derives the exact univariate bias in a seasonal AR.

Another separate result we could show by the method of Theorem 2 is that
$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \varepsilon_t \tag{14}$$
does not lead to spurious correlation problems, to the order stated in the Theorem, even though both series have a unit root. This contrasts with the spurious relation of two or more $I(1)$ series discussed earlier. The reason for this difference becomes more evident once one pictures both types of series in (14), and sees how different the characteristics of their time paths are.
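The contrast drawn from (14) can be reproduced numerically (a sketch with hypothetical settings): a root of $-1$ makes the series alternate in sign, which destroys the spurious correlation with a random walk.

```python
import numpy as np

rng = np.random.default_rng(7)
T, R = 500, 500

corr_mixed, corr_rw = [], []
for _ in range(R):
    e = rng.standard_normal((T, 3))
    y1 = np.cumsum(e[:, 0])       # root +1: y1_t = y1_{t-1} + e_t
    y2 = np.empty(T)              # root -1: y2_t = -y2_{t-1} + e_t
    y2[0] = e[0, 1]
    for t in range(1, T):
        y2[t] = -y2[t - 1] + e[t, 1]
    y3 = np.cumsum(e[:, 2])       # a second, independent random walk
    corr_mixed.append(abs(np.corrcoef(y1, y2)[0, 1]))
    corr_rw.append(abs(np.corrcoef(y1, y3)[0, 1]))

# mean |correlation|: modest for the (+1, -1) pair, large for the (+1, +1) pair
print(round(np.mean(corr_mixed), 2), round(np.mean(corr_rw), 2))
```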

As a by-product of our analysis in Section 3, we have obtained another new representation to decompose series into their stable (asymptotically stationary) and unstable (unit root) components. It is based on using the ECM format in order to get a matrix of reduced rank. The novelty of our approach is then to exploit the rank restriction on the matrix to make it amenable to a more explicit specification of the structure of its Jordan decomposition, when the series are restricted to being integrated of at most a certain order. This allows us to derive our theoretical distributional results of Theorem 2, and could be used in solving other problems as well.

Department of Mathematics and Department of Economics, University of York, Heslington, York YO1 5DD, UK;

School of Business and Economics, University of Exeter, Streatham Court, Rennes Drive, Exeter EX4 4PU, UK; and

School of Business and Economics, University of Exeter, Streatham Court, Rennes Drive, Exeter EX4 4PU, UK.


APPENDIX A

Proof of Theorem 1. The proof is divided into three parts. We start by deriving a sufficient condition for the existence of the bias. We then show that $\mathrm{E}[d_1 h^{(1)}] \simeq \mathrm{E}[d_1 / h_1]$, in the notation of (5)-(6). Finally, we show that the inner product $\mathrm{E}[p_2' q^{(2)}]$ is made up of $k - 1$ identical elements, each of them approximated by the univariate bias $\mathrm{E}[d_1 / h_1]$. The formula stated in the theorem then follows from the univariate bias $\mathrm{E}[d_1 / h_1] \simeq -(1.78143 / T)\exp[-2.6 / T]$ obtained from Abadir (1993), Vinod and Shenton (1996) or Nielsen (1997). In the proof, we occasionally employ group-theoretic ideas without using their jargon.

Part 1. Define $m \equiv n - 1$, and stack the lagged sample and the errors as

$$Y_{-1} \equiv \begin{bmatrix} y_0' \\ \vdots \\ y_{T-1}' \end{bmatrix} \equiv \begin{bmatrix} z_1 & Z_2 \end{bmatrix}, \qquad E \equiv \begin{bmatrix} \varepsilon_1' \\ \vdots \\ \varepsilon_T' \end{bmatrix} \equiv \begin{bmatrix} e_1 & E_2 \end{bmatrix},$$

where $z_1$ and $e_1$ are the first columns, and define

$$C \equiv \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 1 & 1 & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ 1 & 1 & \cdots & 1 \end{bmatrix} \quad (T \times T), \qquad \imath \equiv \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix},$$

where $t = 1, 2, \ldots, T$, unless stated otherwise. Then, we have $Y_{-1} \equiv C_0E$ with $C_0 \equiv C - I_T$ (strictly lower-triangular ones), and $y_T \equiv E'\imath$. It follows from the derivations immediately preceding Theorem 1 that $\mathrm{E}[E'C_0E\,(E'SE)^{-1}] = \mathrm{E}[E'C_0'E\,(E'SE)^{-1}]$, where $S \equiv C_0'C_0$, so that one may use

$$W \equiv \frac12\left(E'C_0E + E'C_0'E\right) = \frac12\left(y_T^{\,}y_T' - E'E\right) = \frac12E'\left(\imath\imath' - I_T\right)E$$

instead of $E'C_0E$ in this proof. (We have made use of the earlier assumption that $y_0 = 0$.) Furthermore, since $\varepsilon_t$ is independent of the past (i.e. of matrix $Y_{-1}$) and has $\mathrm{E}[\varepsilon_t] = 0$, we get $\mathrm{E}[\hat A - I_n] = \mathrm{E}[\frac12E'(\imath\imath' - I_T)E\,(E'SE)^{-1}]$. A sufficient condition for this expectation to exist is that $\mathrm{E}[(E'SE)^{-1}]$ be finite.

Let us exploit the symmetry of the structure again. All the variables $y_1, \ldots, y_T$ are generated from $\varepsilon_1, \ldots, \varepsilon_T$ by the same matrix, so that we end up with the symmetric $S$. It can be decomposed as

$$(\mathrm{A}1)\qquad S \equiv H'\Lambda H,$$

where $H$ is a $T \times T$ orthogonal matrix and $\Lambda$ is diagonal. Consider the transformation replacing $E$ by $H'E$ (inverse mapping), denoted by $E \leftrightarrow H'E$. Because spherical densities are invariant to rotations (they depend on $E$ only through $E'E$), and the Jacobian of the transformation is $|\det(H)| = 1$, we have $\mathrm{E}[(E'SE)^{-1}] = \mathrm{E}[(E'\Lambda E)^{-1}]$. By a bounding argument on $\Lambda$ generalizing the one in Evans and Savin (1981, pp. 767-768), the Inverted Wishart [e.g. Muirhead (1982, p. 97)] provides the required sufficient condition $T \geq n + 1$ for the existence of the bias in the Gaussian case.

Part 2. Partition $Q \equiv Y_{-1}'Y_{-1}$ into $q_1 \equiv z_1'z_1$, $q_2 \equiv Z_2'z_1$ and $Q_4 \equiv Z_2'Z_2$. By the formula for the partitioned inverse,

$$(\mathrm{A}2)\qquad \mathrm{E}\left[b_{11}^{(1)}\right] \equiv \mathrm{E}\left[\frac{w_{11}}{q_1 - q_2'Q_4^{-1}q_2}\right] = \frac12\,\mathrm{E}\left[\frac{e_1'\left(\imath\imath' - I_T\right)e_1}{e_1'Se_1 - e_1'SE_2\left(E_2'SE_2\right)^{-1}E_2'Se_1}\right]$$

$$= \frac12\,\mathrm{tr}\left(\left(\imath\imath' - I_T\right)\mathrm{E}\left[\frac{e_1e_1'}{e_1'Se_1 - e_1'SE_2\left(E_2'SE_2\right)^{-1}E_2'Se_1}\right]\right) \equiv \frac12\,\mathrm{tr}\left(\left(\imath\imath' - I_T\right)\mathrm{E}[F]\right)$$

$$\equiv \text{sum of the off-diagonal elements of } \tfrac12\mathrm{E}[F].$$

First, decompose $S$ as in (A1), and transform $E \leftrightarrow H'E$. Then,

$$(\mathrm{A}3)\qquad \mathrm{E}[F] = H'\,\mathrm{E}\left[\frac{e_1e_1'}{e_1'\Lambda e_1 - e_1'\Lambda E_2\left(E_2'\Lambda E_2\right)^{-1}E_2'\Lambda e_1}\right]H$$

$$= \sum_{j=0}^{\infty} H'\,\mathrm{E}\left[\frac{e_1e_1'}{e_1'\Lambda e_1}\left(\frac{e_1'\Lambda E_2\left(E_2'\Lambda E_2\right)^{-1}E_2'\Lambda e_1}{e_1'\Lambda e_1}\right)^{j}\right]H \equiv \sum_{j=0}^{\infty} H'\Lambda^{-\frac12}\,\mathrm{E}\left[\Pi_1\left(\Pi_2\Pi_1\right)^{j}\right]\Lambda^{-\frac12}H,$$

where we have the projectors (idempotents)

$$\Pi_1 \equiv \frac{\sqrt{\Lambda}\,e_1e_1'\sqrt{\Lambda}}{e_1'\Lambda e_1}, \qquad \Pi_2 \equiv \sqrt{\Lambda}\,E_2\left(E_2'\Lambda E_2\right)^{-1}E_2'\sqrt{\Lambda}.$$

The new variates are still independent spherical variates, and we now exploit this fact.

Second, the projector $\Pi_2$ can be expressed as a power series in $\sqrt{\Lambda}E_2E_2'\sqrt{\Lambda}$; e.g. see Rao and Mitra (1971, p. 62). The matrix $\Pi_2$ has off-diagonal elements that are odd functions of the independent elements of the standard Normal $E_2$. Similarly, $\Pi_1$ has off-diagonal elements that are odd functions of the independent elements of the standard Normal $e_1$, and the expectation of the product $\Pi_1(\Pi_2\Pi_1)^{j}$ is a diagonal matrix. By comparing the expectation of the expanded and unexpanded forms in (A3), we get that the off-diagonal elements of

$$\frac{e_1e_1'}{e_1'\Lambda e_1 - e_1'\Lambda E_2\left(E_2'\Lambda E_2\right)^{-1}E_2'\Lambda e_1}$$

must be odd functions of at least one of the elements of $E$. For calculating the expectation, this means that there exists a transformation carrying the idempotent $\sqrt{\Lambda}E_2(E_2'\Lambda E_2)^{-1}E_2'\sqrt{\Lambda}$ into a diagonal matrix $\tilde\Lambda_2$, which is a permutation of a fixed set of diagonal elements. Transforming $e_1 \leftrightarrow H'e_1$ back, then using $\mathrm{E}[e_1e_1'] = I_T$ and $z_1 \equiv C_0e_1$ (with $y_0 = 0$),

$$(\mathrm{A}4)\qquad \mathrm{E}[F] = \mathrm{E}\left[\frac{e_1e_1'}{e_1'\Lambda e_1\left(1 - e_1'\sqrt{\Lambda}\tilde\Lambda_2\sqrt{\Lambda}e_1/e_1'\Lambda e_1\right)}\right] \simeq \mathrm{E}\left[\frac{e_1e_1'}{e_1'\Lambda e_1}\right],$$

where the omitted term is of maximal order $O(T^{-1})$. Substituting (A4) into (A2), $\mathrm{E}[b_{11}^{(1)}] \simeq \mathrm{E}[b_{11}]$. Note that Normality has not been assumed in this part of the proof.

Part 3. All $n - 1$ terms of the inner product $\mathrm{E}[p_2'q^{(2)}]$ are identical, since each of the variates

is generated by independent standard random walks. Let us take the first (and typical) term

of $\mathrm{E}[p_2'q^{(2)}]$. By the same method of part 2 of this proof, we may write it as

$$\mathrm{E}\left[p_2'\,\mathrm{diag}\left(1, 0, \ldots, 0\right)q^{(2)}\right] \equiv \mathrm{E}\left[\frac{-\,p_2'\,\mathrm{diag}\left(1, 0, \ldots, 0\right)Q_4^{-1}q_2}{q_1 - q_2'Q_4^{-1}q_2}\right]$$

$$= -\frac12\,\mathrm{E}\left[\frac{e_1'\left(\imath\imath' - I_T\right)E_2\,\mathrm{diag}\left(1, 0, \ldots, 0\right)\left(E_2'SE_2\right)^{-1}E_2'Se_1}{e_1'Se_1 - e_1'SE_2\left(E_2'SE_2\right)^{-1}E_2'Se_1}\right] \equiv \text{sum of the off-diagonal elements of } \tfrac12\mathrm{E}[G],$$

by the same decomposition (A1) and transformation $E \leftrightarrow H'E$ as before, where $p_2 \equiv \frac12E_2'(\imath\imath' - I_T)e_1$ and $q^{(2)} \equiv -Q_4^{-1}q_2/(q_1 - q_2'Q_4^{-1}q_2)$. Using the partitioned inverse

$$\left(E_2'\Lambda E_2\right)^{-1} \equiv \frac{1}{e_2'\sqrt{\Lambda}\left(I - \Pi_3\right)\sqrt{\Lambda}e_2}\begin{bmatrix} 1 & -\,e_2'\Lambda E_3\left(E_3'\Lambda E_3\right)^{-1} \\ -\left(E_3'\Lambda E_3\right)^{-1}E_3'\Lambda e_2 & G_3 \end{bmatrix},$$

where $E_2 \equiv [\,e_2 \ \ E_3\,]$,

$$G_3 \equiv e_2'\sqrt{\Lambda}\left(I - \Pi_3\right)\sqrt{\Lambda}e_2\left(E_3'\Lambda E_3\right)^{-1} + \left(E_3'\Lambda E_3\right)^{-1}E_3'\Lambda e_2e_2'\Lambda E_3\left(E_3'\Lambda E_3\right)^{-1}, \qquad \Pi_3 \equiv \sqrt{\Lambda}E_3\left(E_3'\Lambda E_3\right)^{-1}E_3'\sqrt{\Lambda},$$

we have an expression for $\mathrm{E}[G]$ whose numerator and denominator are sandwiched between factors $\sqrt{\Lambda}(I - \Pi_3)\sqrt{\Lambda}$. By the same type of approximation as before,

$$\mathrm{E}[G] \simeq -H'\,\mathrm{E}\left[\frac{e_2e_2'\Lambda e_1e_1'}{\left(e_1'\Lambda e_1\right)\left(e_2'\Lambda e_2\right) - \left(e_1'\Lambda e_2\right)\left(e_2'\Lambda e_1\right)}\right]H,$$

which is a reduction of the problem to $n = 2$. We therefore need to show that

$$(\mathrm{A}5)\qquad \mathrm{E}\left[\frac{-\,w_{12}\,q_2}{\det(Q)}\right] \simeq \mathrm{E}\left[\frac{w_{11}\,q_4}{\det(Q)}\right]$$

in the remainder of the proof, with $\det(Q) \equiv q_1q_4 - q_2^2$. Because of the interaction of the numerator with the denominator, it is easiest to resort to an indirect proof of the relation, rather than a direct derivation of the expectation. For this, we use Theorem 2.1 of Abadir and Larsson (1996), where the joint mgf (hence moments) of the quadratic forms $W$ and $Q$ (in our notation) have been worked out.

Let $(\Phi, \Theta)$ be the matrices of mgf parameters corresponding to our quadratic forms $\left(\sum \varepsilon_ty_{t-1}',\ \sum y_{t-1}y_{t-1}'\right)$, and the mgf be

$$m(\Phi, \Theta) \equiv \mathrm{E}\left[\exp\left(\mathrm{tr}\left(\Phi'\sum \varepsilon_ty_{t-1}' + \Theta\sum y_{t-1}y_{t-1}'\right)\right)\right] \equiv \mathrm{E}\left[\mathrm{etr}\left(\Phi'\sum \varepsilon_ty_{t-1}' + \Theta\sum y_{t-1}y_{t-1}'\right)\right].$$

Matrix $\sum y_{t-1}y_{t-1}'$ is symmetric and has only $\frac12n(n+1)$ distinct elements, so that $\Theta$ is also symmetric with typical diagonal element $\theta_{ii}$ and off-diagonal element $\frac12\theta_{ij}$ (from the quadratic form $\sum y_{t-1}'\Theta y_{t-1}$). For $\Phi$, the typical element is $\phi_{ij}$, and since any matrix can be decomposed into symmetric (Jordan) and skew-symmetric (Lie) components,

$$\Phi \equiv \Phi_{\mathrm{s}} + \Phi_{\mathrm{k}} \equiv \frac12\left(\Phi + \Phi'\right) + \frac12\left(\Phi - \Phi'\right).$$

The latter component has no effect on the required bias (see the discussion of $W$ in Part 1 of this proof), and we set it to zero in the mgf for the purpose of deriving the bias. Then, the

joint mgf of Abadir and Larsson (1996, pp. 685-686) simplifies to a product of determinant factors in $(\Phi, \Theta)$. Since the larger ($2n$-square) matrix appearing there is a function of a single sub-matrix which naturally commutes with its own polynomials, a Jordan block-decomposition of that matrix reduces the mgf to determinants involving the two block-roots $\Lambda_1$ and $\Lambda_2$ of the decomposition, and the problem of Abadir and Rockinger (1997) does not arise here.

Having simplified the mgf to the current setting, let us use (A5) to set $n = 2$ and formulate the mgf in terms of scalar quantities. Since the same operation on the mgf is required to obtain $1/\det(Q)$ on both sides of (A5), it is sufficient to show that

$$(\mathrm{A}5')\qquad \frac{\partial}{\partial\phi_2}\left[\left.\frac{\partial m(\Phi, -\Theta)}{\partial\theta_2}\right|_{\Theta=0}\right] \simeq \frac{\partial}{\partial\phi_4}\left[\left.\frac{\partial m(\Phi, -\Theta)}{\partial\theta_1}\right|_{\Theta=0}\right],$$

where

$$\Phi_{\mathrm{s}} = \begin{bmatrix} \phi_1 & \frac12\phi_2 \\ \frac12\phi_2 & \phi_4 \end{bmatrix}, \qquad \Theta = \begin{bmatrix} \theta_1 & \theta_2 \\ \theta_2 & 0 \end{bmatrix},$$

and $\phi_2 \equiv \frac12(\phi_{12} + \phi_{21})$ corresponds to the off-diagonal element of $\Phi$. To approximate the bias to fixed precision for $T \to \infty$, expand $m(\cdot)$ for both $\Phi$ and $\Theta$ in the same neighbourhood of zero, say $O(1/T)$. Expanding the block-roots $\Lambda_{1,2}$ accordingly and collecting the determinant factors gives

$$(\mathrm{A}6)\qquad m(\Phi, -\Theta) \simeq \exp\left[-\frac{T}{2}\left(\phi_4\theta_1 + \phi_2\theta_2\right)\right]\det\left(\cosh(\cdot)\right)^{-\frac12} \equiv \tilde m(\Phi, -\Theta),$$

where the remaining factors collect terms that are of second order in $\Theta$ or depend on $\Phi$ only through $\det(\cdot)$ terms.

Differentiating as in (A5′),

$$\left.\frac{\partial \tilde m(\Phi, -\Theta)}{\partial\theta_2}\right|_{\Theta=0} = -\frac{T\phi_2}{2}\,\tilde m(\Phi, 0) \qquad \text{and} \qquad \left.\frac{\partial \tilde m(\Phi, -\Theta)}{\partial\theta_1}\right|_{\Theta=0} = -\frac{T\phi_4}{2}\,\tilde m(\Phi, 0).$$

Then, differentiating with respect to $\phi_2$ and $\phi_4$, respectively,

$$\frac{\partial}{\partial\phi_2}\left[\left.\frac{\partial \tilde m(\Phi, -\Theta)}{\partial\theta_2}\right|_{\Theta=0}\right] = -\frac{T}{2}\,\tilde m(\Phi, 0) - \frac{T\phi_2}{2}\,\frac{\partial \tilde m(\Phi, 0)}{\partial\phi_2}$$

and

$$\frac{\partial}{\partial\phi_4}\left[\left.\frac{\partial \tilde m(\Phi, -\Theta)}{\partial\theta_1}\right|_{\Theta=0}\right] = -\frac{T}{2}\,\tilde m(\Phi, 0) - \frac{T\phi_4}{2}\,\frac{\partial \tilde m(\Phi, 0)}{\partial\phi_4}.$$

The dominant term is the first one, in both cases, since $\Phi$ is in the neighbourhood of zero and by the definition of $\tilde m(\Phi, -\Theta)$ in (A6). (Note that $\tilde m(\Phi, 0)$ is made of $\det(\cdot)$ terms which are quadratic forms in the elements of $\Phi$, and $\cosh(\cdot)$ is an even function.) This establishes (A5′).

Remark 1. Let $\rho^2 \equiv q_2'Q_4^{-1}q_2/q_1$ be the squared multiple correlation coefficient (with zero means imposed) of $y_{1,t-1}$ with the remaining components of $y_{t-1}$. Statistically, the meaning of (A4),

$$\mathrm{E}\left[b_{11}^{(1)}\right] \equiv \mathrm{E}\left[\frac{w_{11}}{q_1 - q_2'Q_4^{-1}q_2}\right] \equiv \mathrm{E}\left[\frac{w_{11}}{q_1\left(1 - \rho^2\right)}\right] \simeq \mathrm{E}\left[\frac{w_{11}}{q_1}\right],$$

is that realisations in the upper tail ($b_{11} \simeq 0$) of the negatively-skewed unit root distribution coincide with high spurious correlation, and realisations in the lower tail ('stationary' side) coincide with $\rho^2 \simeq 0$.

Remark 2. $\rho^2$ is not small. The approximation in (A4) was the outcome of a change of variable allowed by the interaction of the numerator ($w_{11}$) with the expanded denominator ($\rho^2$) in (A3). After this change of variable $\Pi_2 \leftrightarrow \tilde\Lambda_2$ in (A4), the transformed $\rho^2$ may then be viewed as the squared correlation between a unit-root series and $n - 1$ independent stable ones.

Remark 3. In the multivariate case, the first (and typical diagonal) component of the estimators' MSE is approximately

$$\mathrm{E}\left[\left(\frac{w_{11}}{q_1\left(1 - \rho^2\right)}\right)^{2}\right] \equiv \mathrm{E}\left[\left(\frac{w_{11}}{q_1}\right)^{2}\right] + \sum_{j=1}^{\infty}\mathrm{E}\left[\left(\frac{w_{11}}{q_1}\right)^{2}\mathrm{tr}\left(\left(\Pi_1\Pi_2\right)^{j}\right)\right] = O\left(n \times \mathrm{E}\left[\left(\frac{w_{11}}{q_1}\right)^{2}\right]\right),$$

where $\mathrm{E}[\rho^2]$ of Muirhead (1982, pp. 146, 169-171) has been used as an approximation for the diagonal components of $(\Pi_1\Pi_2)^{j}$, in the light of Remark 2. Unlike the expression for the bias, this formula does not calculate the MSE: it provides only an order of magnitude.
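A rough simulation is consistent with Remark 3's order of magnitude. The sketch below is our own (and, like the formula above, it only illustrates the order, not the exact MSE; the settings are hypothetical):

```python
import numpy as np

def mse_b11(n, T, reps, rng):
    """Monte Carlo MSE of b11 = (A-hat)_{11} - 1 for an n-dimensional random walk."""
    total = 0.0
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal((T, n)), axis=0)
        Ahat = np.linalg.lstsq(y[:-1], y[1:], rcond=None)[0].T
        total += (Ahat[0, 0] - 1.0) ** 2
    return total / reps

rng = np.random.default_rng(1)
mses = [mse_b11(n, T=100, reps=1500, rng=rng) for n in (1, 2, 3)]
print(mses)  # grows with the dimension n
```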


APPENDIX B

APPENDIX B

Proof of Theorem 2. From (3),

$$\hat A - A = \left(\sum \varepsilon_ty_{t-1}'\right)\left(\sum y_{t-1}y_{t-1}'\right)^{-1} = \Psi^{-1}\left(\sum \eta_tz_{t-1}'\right)\left(\sum z_{t-1}z_{t-1}'\right)^{-1}\Psi,$$

where $z_t \equiv \Psi y_t$ and $\eta_t \equiv \Psi\varepsilon_t$ are partitioned into the unit-root block $z_{1t}$ (of dimension $n_1$) and the stable block $z_{2t}$ (of dimension $n_2 \equiv n - n_1$). Introducing the (block-)diagonal weighting matrix $\Gamma$ for $z_{t-1}$,

$$\Gamma \equiv \mathrm{diag}\left(\frac{1}{T}I_{n_1},\ \frac{1}{\sqrt{T}}I_{n_2}\right),$$

we can write

$$(\mathrm{B}1)\qquad \hat A - A = \Psi^{-1}\left(\sum \eta_tz_{t-1}'\Gamma\right)\left(\Gamma\sum z_{t-1}z_{t-1}'\Gamma\right)^{-1}\Gamma\,\Psi.$$

The matrix $\Gamma\sum z_{t-1}z_{t-1}'\Gamma$ is asymptotically block-diagonal. Furthermore, the elements of the off-diagonal submatrices are $O_p\!\left(T^{-\frac12}\right)$. So, inverting the matrix in (B1) gives

$$(\mathrm{B}2)\qquad \hat A - A = \Psi^{-1}\left(\sum \eta_tz_{t-1}'\Gamma\right)\mathrm{diag}\left[\left(\frac{1}{T^2}\sum z_{1,t-1}z_{1,t-1}'\right)^{-1},\ \left(\frac{1}{T}\sum z_{2,t-1}z_{2,t-1}'\right)^{-1}\right]\Gamma\,\Psi + O_p\!\left(\frac{1}{T}\right)$$

$$= \Psi^{-1}\mathrm{diag}\left[\left(\sum \eta_{1t}z_{1,t-1}'\right)\left(\sum z_{1,t-1}z_{1,t-1}'\right)^{-1},\ \left(\sum \eta_{2t}z_{2,t-1}'\right)\left(\sum z_{2,t-1}z_{2,t-1}'\right)^{-1}\right]\Psi + O_p\!\left(\frac{1}{T}\right).$$

To find the bias, we apply the expectation operator to (B2). Our Theorem 1 can be applied to the first block. For the second block, Tjøstheim and Paulsen (1983) and Nicholls and Pope (1988) have derived the relevant expansions. Theorem 2 of Nicholls and Pope (1988) can be used after a simple modification: their term in the estimated intercept drops out because we estimate no intercepts in (1) and (3). This makes the bias an odd function of the autoregressive parameters, as is discussed in the univariate case by Abadir (1993) and for the multivariate moment generating function by Abadir and Larsson (1996). We then get

$$(\mathrm{B}3)\qquad \mathrm{E}\left[\hat A - A\right] = \frac{1}{T}\,\Psi^{-1}\mathrm{diag}\left(-1.7814\,I_{n_1},\ \Psi\Omega\Psi'C\right)\Psi + o\!\left(\frac{1}{T}\right),$$

where

$$C \equiv \left(\mathrm{E}\left[z_{2t}z_{2t}'\right]\right)^{-1}\left(\left(-I + \Lambda_2\right)\left(2I + \Lambda_2\right)^{-1} + \sum_{i=n_1+1}^{n}\left(1 + \lambda_i\right)\left[\left(1 + \lambda_i\right)\Lambda_2 + \lambda_iI\right]^{-1}\right).$$

Since $|1 + \lambda_i| < 1$, the defining bilinear forms for $\mathrm{E}[z_{2t}z_{2t}']$ are expanded as

$$(\mathrm{B}4)\qquad \mathrm{E}\left[z_{2t}z_{2t}'\right] \equiv \Psi\Omega\Psi' + \left(I + \Lambda_2\right)\mathrm{E}\left[z_{2t}z_{2t}'\right]\left(I + \Lambda_2\right)' = \Psi\Omega\Psi' + O\!\left(1 + \lambda\right).$$

We expand $C$ accordingly as

$$(\mathrm{B}5)\qquad C = -\left(\Psi\Omega\Psi'\right)^{-1}\left[\left(I + \Lambda_2\right) + \mathrm{tr}\left(I + \Lambda_2\right)I\right] + O\!\left(1 + \lambda\right)$$

$$= -\left(\Psi\Omega\Psi'\right)^{-1}\left[\left(I + \Lambda_2\right) + \left(\mathrm{tr}(A) - n_1\right)I\right] + O\!\left(1 + \lambda\right),$$

since $\mathrm{tr}(\Lambda_2) = \mathrm{tr}(\Lambda) = \mathrm{tr}(A) - n$. Substituting (B5) into (B3) gives the required result because

$$\Psi\Omega\Psi'\left(I + \Lambda_2\right)'\left(\Psi\Omega\Psi'\right)^{-1} = I + \Psi\Omega\left(\Lambda_2\Psi\right)'\left(\Psi\Omega\Psi'\right)^{-1} = I + \Psi\Omega\left(\Lambda_2\Xi J\right)'\left(\Psi\Omega\Psi'\right)^{-1}$$

$$= I + \Psi\Omega\left(\Xi\Lambda J\right)'\left(\Psi\Omega\Psi'\right)^{-1} = I + \Psi\Omega\left(\Xi J\Lambda\right)'\left(\Psi\Omega\Psi'\right)^{-1} = I + \Psi\Omega\left(\Psi\Lambda\right)'\left(\Psi\Omega\Psi'\right)^{-1}$$

$$= \Psi\Omega\left(I + \Lambda\right)'\Psi'\left(\Psi\Omega\Psi'\right)^{-1},$$

by $\Psi \equiv \Xi J$ and (10).
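The stable-block expansions of Tjøstheim and Paulsen (1983) and Nicholls and Pope (1988) used above imply a bias of order $1/T$ in the stationary directions; for a no-intercept AR(1), $\mathrm{E}[\hat a - a] \simeq -2a/T$ is a standard approximation from that literature. A quick Monte Carlo check (our own sketch; sample sizes, replications and seed are hypothetical choices):

```python
import random

def ar1_bias(a, T, reps, rng):
    """Monte Carlo mean of (a-hat - a) for a no-intercept AR(1) fitted by LS."""
    total = 0.0
    for _ in range(reps):
        y_prev, sxy, sxx = 0.0, 0.0, 0.0
        for _ in range(T):
            y = a * y_prev + rng.gauss(0.0, 1.0)
            sxy += y_prev * y
            sxx += y_prev * y_prev
            y_prev = y
        total += sxy / sxx - a
    return total / reps

rng = random.Random(42)
b50 = ar1_bias(0.5, T=50, reps=8000, rng=rng)
b200 = ar1_bias(0.5, T=200, reps=8000, rng=rng)
print(b50, b200)  # both negative, shrinking roughly like -2a/T
```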

REFERENCES

Abadir, K.M. (1993) OLS bias in a nonstationary autoregression. Econometric Theory 9,

81-93.

(1995) On efficient simulations in dynamic models. University of Exeter

Discussion Paper in Economics, 95/21.

Abadir, K.M. and K. Hadri (1995) Bias nonmonotonicity in stochastic difference equations.

University of Exeter Discussion Paper in Economics, 95/12.

Abadir, K.M. and R. Larsson (1996) The joint moment generating function of quadratic

forms in multivariate autoregressive series. Econometric Theory 12, 682-704.

Abadir, K.M. and M. Rockinger (1997) The “devil’s horns” problem of inverting confluent
characteristic functions. Econometrica 65, 1221-1225.

Chan, N.H. and C.Z. Wei (1988) Limiting distributions of least squares estimates of unstable

autoregressive processes. Annals of Statistics 16, 367-401.

Evans, G.B.A. and N.E. Savin (1981) Testing for unit roots: 1. Econometrica 49, 753-779.

Franses, P.H. (1996) Recent advances in modelling seasonality. Journal of Economic

Surveys 10, 299-345.

Granger, C.W.J. and P. Newbold (1974) Spurious regressions in econometrics. Journal of

Econometrics 2, 111-120.

Harvey, A.C. and A. Scott (1994) Seasonality in dynamic regression models. Economic

Journal 104, 1324-1345.

Hendry, D.F. (1995) Dynamic Econometrics. Oxford: Oxford University Press.

Jacobson, T. and R. Larsson (1996) Bartlett correction for a likelihood ratio cointegration

test. Mimeo., Department of Economic Statistics, Stockholm School of Economics.

Muirhead, R.J. (1982) Aspects of Multivariate Statistical Theory. New York: John Wiley
& Sons.

Nicholls, D.F. and A.L. Pope (1988) Bias in the estimation of multivariate autoregressions.

Australian Journal of Statistics 30A, 296-309.

Nielsen, B. (1994) Bartlett correction in the cointegration model. Mimeo., Institute of

Mathematical Statistics, University of Copenhagen.

Nielsen, B. (1997) On the distribution of cointegration tests. Ph.D. Thesis, Institute of

Mathematical Statistics, University of Copenhagen.

Pere, P. (1997) Adjusted profile likelihood applied to estimation and testing of unit roots.

D.Phil. Thesis, University of Oxford.

Phillips, P.C.B. (1986) Understanding spurious regressions in econometrics. Journal of

Econometrics 33, 311-340.

(1987) Asymptotic expansions in nonstationary vector autoregressions. Econo-

metric Theory 3, 45-68.

(1991) Optimal inference in cointegrated systems. Econometrica 59, 283-

306.


(1994) Some exact distribution theory for maximum likelihood estimators

of cointegrating coefficients in error correction models. Econometrica 62, 73-93.

(1995) Fully modified least squares and vector autoregression. Economet-

rica 63, 1023-1078.

Pitarakis, J.-Y. (1997) Moment generating function and further exact results for autore-

gressions with multiple frequency unit roots. Econometric Theory, forthcoming.

Rao, C.R. and S.K. Mitra (1971) Generalized Inverse of Matrices and Its Applications. New
York: John Wiley & Sons.

Stock, J.H. and M.W. Watson (1988) Testing for common trends. Journal of the American

Statistical Association 83, 1097-1107.

Tjøstheim, D. and J. Paulsen (1983) Bias of some commonly-used time series estimates.

Biometrika 70, 389-399 [Correction (1984) 71, 656].

Tsay, R.S. and G.C. Tiao (1990) Asymptotic properties of multivariate nonstationary

processes with applications to autoregressions. Annals of Statistics 18, 220-250.

Vinod, H.D. and L.R. Shenton (1996) Exact moments for autoregressive and random walk

models for a zero or stationary initial value. Econometric Theory 12, 481-499.


TABLE I

Simulated values of −100 × bias, with 100,000 replications
(formula of Theorem 1 in parentheses)

          n = 1       n = 2        n = 3        n = 4        n = 5
T = 25    6.4 (6.4)   13.4 (12.8)  20.1 (19.3)  26.1 (25.7)  32.1 (31.8)
T = 50    3.4 (3.4)    7.1 (6.8)   10.8 (10.1)  14.3 (13.5)  17.7 (16.9)
T = 100   1.8 (1.7)    3.7 (3.5)    5.6 (5.2)    7.4 (6.9)    9.3 (8.7)
T = 200   0.9 (0.9)    1.9 (1.8)    2.9 (2.6)    3.8 (3.5)    4.8 (4.4)
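A reduced-scale replication of one cell of Table I (our own sketch, with far fewer replications than the 100,000 used there; seed and replication count are hypothetical choices):

```python
import numpy as np

def neg100_bias(n, T, reps, rng):
    """-100 x Monte Carlo mean of b11 = (A-hat)_{11} - 1 for an n-dim random walk."""
    total = 0.0
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal((T, n)), axis=0)
        Ahat = np.linalg.lstsq(y[:-1], y[1:], rcond=None)[0].T
        total += Ahat[0, 0] - 1.0
    return -100.0 * total / reps

rng = np.random.default_rng(7)
val = neg100_bias(n=2, T=100, reps=3000, rng=rng)
print(val)  # should land near the corresponding Table I entry for n = 2, T = 100
```

The value roughly doubles relative to the univariate ($n = 1$) case at the same $T$, in line with the bias being approximately $n$ times the univariate bias.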
