Electronic copy available at: http://ssrn.com/abstract=1985485
The influence of VAR dimensions on estimator biases1
By Karim M. Abadir, Kaddour Hadri, and Elias Tzavalis
1 INTRODUCTION
Vector AutoRegressions (VARs) have now become the most popular tool of Time Series
analysis amongst econometricians. Unfortunately, little is known about the analytic finite-
sample properties of parameter estimators for such systems. The asymptotic analysis of
VARs published to date does not address questions regarding the influence of the number
and nature of the system’s variates on parameter estimates. Clearly, both questions will have
repercussions on the way VARs are used, and we intend to address them here.
We consider the implications of varying the dimensions of VARs on the biases of Maximum
Likelihood and Least Squares Estimators (MLE and LSE, respectively). In the purely nonstationary
case (a k-dimensional random walk), estimator biases are approximately equal to the
dimension of the system (k) times the univariate bias, even when the variates are generated
independently of each other. We show that the variance too increases with the dimension of
the system, hence also raising the Mean Squared Error (MSE) of the estimator. When some
stable linear combinations exist, the biases are generally smaller and are asymptotically pro-
portional to the sum of the characteristic roots of the VAR. One source of such combinations
is meaningful economic relations that are represented by the cointegration of some of the
components of the VAR. Adding economically-irrelevant variables to a VAR is thus shown to
have more serious negative consequences in integrated time series than in classical ergodic or
cross section analyses. The findings strengthen the case for parsimonious modelling and for
the reduction step of the general-to-specific marginalization method. They also support the
use of seasonally unadjusted data whenever possible.
Let y_t be a k × 1 discrete time series which is sampled over t = 1, ..., T, and which
follows the first order VAR [a VAR(1)]

∇y_t = B y_{t-1} + ε_t ≡ (A − I_k) y_{t-1} + ε_t,   ε_t ~ IN(0, Ω),   (1)

where I_k is the identity matrix of order k, ∇ is the backward difference operator such that
∇y_t ≡ y_t − y_{t-1}, and we condition on y_0. A first order autoregressive system is chosen for
ease of exposition, though any VAR(p) can be reformulated as a VAR(1) of dimensions augmented
to kp, where A becomes the companion matrix of the VAR(p) and Ω becomes singular.
In such a case, the derivations that follow give the (dominant) leading-term approximation
for the biases.

In (1), we let Ω be positive definite, and B ≡ A − I_k be a k × k matrix with m zero
eigenvalues such that 1 ≤ m ≤ k and the remaining k − m eigenvalues are inside a unit
circle centred around −1. Though (1) could allow vectors y_t that are integrated of order d
[denoted y_t ~ I(d)] with individual components that are at most I(m), we restrict B
further to have rank k − m so that y_t ~ I(1). This rules out cases like
∇ [y_{1t}; y_{2t}] = [0, 1; 0, 0] [y_{1,t-1}; y_{2,t-1}] + ε_t,   (2)

where the rank of the nilpotent matrix B is 1 (in spite of k − m = 0), with the effect that
y_{2t} ~ I(1) and y_{1t} ~ I(2). Generally, economic series seem to be at most I(1), so we focus
our analysis on this case only.
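The effect of dropping the rank restriction can be simulated directly. The sketch below (Python with NumPy; the sample size T = 2000 and the seed are arbitrary illustrative choices, not values from the paper) generates system (2) and shows that the first component must be differenced twice before its sample variance settles down, as expected of an I(2) series:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 2000

# System (2): the nilpotent, non-derogatory case that the rank restriction rules out
Bmat = np.array([[0.0, 1.0],
                 [0.0, 0.0]])
eps = rng.standard_normal((T, 2))
y = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    y[t] = y[t - 1] + Bmat @ y[t - 1] + eps[t - 1]

y1 = y[:, 0]
v_level = np.var(y1)              # explodes with T: y_1t is I(2)
v_diff1 = np.var(np.diff(y1))     # still grows with T: one differencing is not enough
v_diff2 = np.var(np.diff(y1, 2))  # stable: double differencing removes all stochastic trends
print(v_level, v_diff1, v_diff2)
```

Here the second difference of y_{1t} is stationary while the first difference still behaves like a random walk, which is what makes the system I(2) rather than I(1).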
1We are very grateful to Giovanni Forchini and Patrick Marsh for their extensive comments. We would
also like to thank Les Godfrey, David Hendry, Grant Hillier, Bent Nielsen, Paolo Paruolo, Peter Robinson and
Tony Sudbery for useful comments on an earlier version of this paper. This research was supported by ESRC
(UK) grant R000236627.
The MLE of B is

B̂ = (Σ ∇y_t y'_{t-1}) (Σ y_{t-1} y'_{t-1})^{-1} = B + (Σ ε_t y'_{t-1}) (Σ y_{t-1} y'_{t-1})^{-1},   (3)

where all summations run over t = 1, ..., T, unless stated otherwise. Given the independent
Normality of ε_t, the LSE of B coincides with the MLE here. The bias of B̂ is defined as
b ≡ E[B̂ − B] = E[Â − A], where E[·] is the expected value operator. In the univariate
case, the bias becomes a scalar, which we write as b. We have chosen to formulate the system
in terms of B rather than A in order to facilitate the decompositions of Section 3 and the
interpretation of the results in terms of co-integration. This formulation is known as the Error
Correction Mechanism (ECM) in econometrics, and as the proportional component of control
systems in mathematics.
The purpose of our paper is to examine the effect that k and T have on the bias of B̂.
We do so for two cases. First, in Section 2, we let B = O (i.e. A = I_k), and call this the
purely nonstationary case. It corresponds to a VAR that is purely made up of a k-dimensional
random walk. We contrast the effect of k on the bias of estimators here and when variables
are either stationary or have no time connotation. Then in Section 3, we introduce a new
decomposition of VARs into stable and unstable components. We let some of the eigenvalues
of B be nonzero (i.e. 1 ≤ m < k) in order to obtain cointegration in the system, and consider
the behaviour of the asymptotic bias in terms of k and m. We use the term 'asymptotic bias'
to signify the leading (dominant) term in the large-T expansion of the bias. We compare
the outcome with the two extremes of purely nonstationary and stationary VARs. Section 4
rounds off the discussion by considering the implications of the theoretical results for practical
econometric modelling. The proofs of Theorems 1 and 2 are given in Appendices A and B,
respectively.
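To fix ideas, here is a minimal sketch (Python with NumPy) of the model in (1) and the estimator in (3); the particular stable B, the sample size and the seed are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 2000, 2

# Illustrative parameters (not from the paper): a stable autoregressive matrix A
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
B = A - np.eye(k)                      # B = A - I_k, as in (1)
Omega = np.eye(k)                      # innovation covariance

# Simulate (1): grad y_t = B y_{t-1} + eps_t, conditioning on y_0 = 0
eps = rng.multivariate_normal(np.zeros(k), Omega, size=T)
y = np.zeros((T + 1, k))
for t in range(1, T + 1):
    y[t] = y[t - 1] + B @ y[t - 1] + eps[t - 1]

# MLE/LSE of B from (3): (sum of dy y'_{t-1}) (sum of y_{t-1} y'_{t-1})^{-1}
Ylag = y[:-1]                          # y_0, ..., y_{T-1}
dY = np.diff(y, axis=0)                # grad y_1, ..., grad y_T
Bhat = (dY.T @ Ylag) @ np.linalg.inv(Ylag.T @ Ylag)

print(np.max(np.abs(Bhat - B)))        # small for this stable, large-T design
```

With a stable B and a large sample, B̂ sits close to B; the finite-sample bias questions studied below concern what happens when unit roots are present and k grows.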
2 THE BIAS IN THE PURELY NONSTATIONARY VAR
We start by deriving the bias of B̂ when B = O and y_0 = 0 in the VAR of (1). The result
has two main features. One is quantitative: using the moment generating function (mgf)
of Abadir and Larsson (1996), we get a formula for the bias under Normality. The other is
qualitative: the multivariate bias b_0 is exactly equal to a scalar matrix (i.e. it is a scalar
times the identity matrix), with the scalar being approximately k times the univariate bias
formula. In other words, b_0 ≃ k b_0 I_k, where the scalar b_0 denotes the univariate bias.
Much of the proof relies on arguments that only require the density of ε_t to be symmetric
about the origin, and not necessarily Gaussian. We conjecture that the qualitative feature
extends to most of these densities as well, but only prove the extension for the case of
elliptical densities, which already represent a very wide class. We start by proving the result
for Gaussian residuals, then discuss some invariance features of our results.

The following analysis shows that b_0 is exactly a scalar matrix, and is of intrinsic
interest since it provides an intuitive explanation of why the qualitative feature arises. For
a function f(·) of the components of ε, we have E[f(ε)] = (1/2) E[f(ε) + f(−ε)], by
symmetry of the density of ε. To prove that a finite E[f(ε)] = 0, as in the case of
the off-diagonal elements of b_0, we only need to show that f(−ε) = −f(ε), i.e. that f(·)
is an odd function of ε. Formally, by the Cholesky decomposition Ω = LL', where L is a
lower triangular matrix, define u_t ≡ L^{-1} ε_t ~ IN(0, I_k) and z_t ≡ L^{-1} y_t, and rewrite

B̂ − B = (Σ ε_t y'_{t-1}) (Σ y_{t-1} y'_{t-1})^{-1} = L (Σ u_t z'_{t-1}) (Σ z_{t-1} z'_{t-1})^{-1} L^{-1}.

Taking expectations and substituting B = O,

b_0 = L E[(Σ u_t z'_{t-1}) (Σ z_{t-1} z'_{t-1})^{-1}] L^{-1}.   (4)
Any two sequences {u_{it}} and {u_{jt}}, i ≠ j, are standard Normals (hence symmetrically
distributed about 0) and are independent of each other. Replacing {u_{it}} by {−u_{it}} and
keeping the rest of u_t as before will therefore not affect the expectation in (4). But this leads
to the negative of the expectation of the off-diagonal elements of the i-th column and row of
b_0, which can only be true if that expectation is zero. So, off-diagonal elements of b_0 vanish.

More explicitly, let i = 1 and define C ≡ Σ u_t z'_{t-1}, D ≡ Σ z_{t-1} z'_{t-1}, and the
conformable partitions

(Σ u_t z'_{t-1}) (Σ z_{t-1} z'_{t-1})^{-1} ≡ [c_1, p_2'; p_3, C_4] [d_1, q_2'; q_2, D_4]^{-1}   (5)
≡ [c_1, p_2'; p_3, C_4] [d^{(1)}, q^{(2)'}; q^{(2)}, D^{(4)}]
= [c_1 d^{(1)} + p_2' q^{(2)}, c_1 q^{(2)'} + p_2' D^{(4)}; p_3 d^{(1)} + C_4 q^{(2)}, p_3 q^{(2)'} + C_4 D^{(4)}],

where c_1 and d_1 are scalars, and superscripts in parentheses denote the corresponding blocks
of the inverse of D. Also define ũ_t to be made of −u_{1t} and u_{jt} for j = 2, ..., k. Then, the
resulting z̃_t is equal to z_t with the exception of the first element, z̃_{1t} = −z_{1t}. By the formula
for partitioned inverses, we get

(Σ ũ_t z̃'_{t-1}) (Σ z̃_{t-1} z̃'_{t-1})^{-1} = [c_1, −p_2'; −p_3, C_4] [d^{(1)}, −q^{(2)'}; −q^{(2)}, D^{(4)}]   (6)
= [c_1 d^{(1)} + p_2' q^{(2)}, −c_1 q^{(2)'} − p_2' D^{(4)}; −p_3 d^{(1)} − C_4 q^{(2)}, p_3 q^{(2)'} + C_4 D^{(4)}],

which has the same expectation as (5). But since the corresponding off-diagonal blocks have
opposite signs in (5) and (6), these blocks have zero expectation. Applying the same logic to
all off-diagonal elements proves that b_0 is diagonal. The diagonal elements are then equal
because of the invariance of the normalized (by L^{-1}) VAR specification to the reordering of
the components of z_t. This also shows that the L matrices drop out of the bias expression in
(4), and the bias is not a function of Ω here.

With these simplifications in mind, we now analyse the case of Normally-distributed ε_t.
For this case, we determine the scalar of the b_0 scalar matrix, and provide sufficient conditions
for the existence of the bias.
Theorem 1 For B = O and T ≥ k + 2, the bias of the estimators in Section 1 is

b_0 ≃ k b_0 I_k = −(1.78143 k / T) exp(−2.6 / T) I_k,

where the scalar b_0 on the right is the univariate unit-root bias.
Proof. See Appendix A.
The accuracy of the Theorem is assessed in Table I. The Theorem shows that the negative
bias is proportional to the dimension of the VAR, even if Ω were diagonal. The result is
striking. It shows that adding irrelevant variables to a VAR increases the bias of all the
estimated parameters on the diagonal of Â. For example, doubling the size of a purely
nonstationary VAR will double the bias of all the estimates, even if the original variables
and equations are unrelated to the added ones. This is in sharp contrast with the theory for
stationary and spatial data where adding irrelevant variables or equations reduces efficiency
but does not affect biases. We now provide the asymptotic interpretation of this result.
Let W(r) be the standard k-dimensional Wiener process on [0, 1], Ω = LL', and → denote
convergence in distribution. In the purely nonstationary VAR, the normalized sums

T^{-1} Σ y_{t-1} ε'_t → L (∫_0^1 W(r) dW(r)') L',   (7)

T^{-2} Σ y_{t-1} y'_{t-1} → L (∫_0^1 W(r) W(r)' dr) L',

are random in the limit, and are nondegenerate even when the variates are unrelated. The
bias can then be written as

b_0 = T^{-1} E[T(B̂ − B)] = T^{-1} E[(T^{-1} Σ y_{t-1} ε'_t)' (T^{-2} Σ y_{t-1} y'_{t-1})^{-1}]   (8)
= T^{-1} L E[(∫_0^1 W(r) dW(r)')' (∫_0^1 W(r) W(r)' dr)^{-1}] L^{-1} + o(T^{-1}).
First, consider the limiting distribution of T^{-2} Σ y_{t-1} y'_{t-1}. The off-diagonal elements of
∫_0^1 W(r) W(r)' dr are nondegenerate (in spite of having zero expectations) and are of similar
probabilistic orders of magnitude as the diagonal ones. Furthermore, by the Cauchy-Schwarz
inequality, this matrix integral is invertible. As a result, all the elements of the inverse matrix
are of the same probabilistic order of magnitude. The matrix is not diagonal in the limit, in
spite of the components of W(r) being mutually independent. This is a manifestation of
the spurious correlation problem explained by Granger and Newbold (1974) and analysed by
Phillips (1986). In our Theorem, this result is extended analytically to finite samples, instead
of asymptotics only.

Second, the matrix ∫_0^1 W(r) dW(r)' has no degenerate elements either. Again, the elements
of this random matrix are all of similar order of magnitude, even when Ω = I_k. Because of
the infinite memory of y_t, the limits of the off-diagonal elements of T^{-1} Σ y_{t-1} ε'_t are not zero
(in spite of having zero expectations) and are similar in magnitude to diagonal ones, even in
the case of independently generated processes.
As a result of the two previous observations, the diagonal elements of the product

(∫_0^1 W(r) dW(r)')' (∫_0^1 W(r) W(r)' dr)^{-1}   (9)

are equally-weighted linear combinations of random elements of similar magnitude. These
combinations are k times the orders of magnitude of the corresponding univariate functionals.
Had T^{-2} Σ y_{t-1} y'_{t-1} been a diagonal matrix, this k-fold increase would not have happened.
In the case of a stable B (analysed in Theorem 2 below) with a diagonal A, the scaled sub-Hessian
T^{-1} Σ y_{t-1} y'_{t-1} has a deterministic diagonal limit as T increases, and does not cause
this phenomenon. The large biases that arise here are mainly induced by the interaction of
the off-diagonal elements of the two matrices in (9).

This is the informal asymptotic explanation of what the Theorem has proved. It may
also be visualized by looking at the last expression of (5) and recalling that the individual
components of the inner products are all of the same order of magnitude. The Theorem
stands in contrast with the corresponding traditional result arising from the estimation of
a set of reduced forms in the case of stationary or spatial models. There, Ω = I_k ensures
the large-sample independence of the biases across equations. Premultiplying the inverse of
the second moments matrix by a diagonal one will only scale its rows, and is approximately
equivalent to separate estimation of the equations.
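The k-fold inflation of the bias can be checked by a small Monte-Carlo sketch (Python with NumPy; the sample size T = 50, the number of replications and the seed are arbitrary choices, not the paper's design):

```python
import numpy as np

def mean_diag_bias(k, T=50, reps=4000, seed=1):
    """Monte-Carlo average of the diagonal of B-hat when B = O (pure random walks)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        eps = rng.standard_normal((T, k))
        y = np.vstack([np.zeros((1, k)), np.cumsum(eps, axis=0)])  # y_0 = 0
        Ylag = y[:-1]                                              # y_0, ..., y_{T-1}
        # grad y_t = eps_t here, so (3) becomes:
        Bhat = (eps.T @ Ylag) @ np.linalg.inv(Ylag.T @ Ylag)
        total += np.trace(Bhat) / k                                # average diagonal element
    return total / reps

b1 = mean_diag_bias(k=1)   # univariate unit-root bias (negative)
b3 = mean_diag_bias(k=3)   # roughly 3 times b1, although the walks are independent
print(b1, b3)
```

The average diagonal bias for k = 3 comes out roughly three times the univariate one, even though the three random walks are generated independently of each other.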
Incidentally, the near-singularity of Σ y_{t-1} y'_{t-1}, which was implied by the earlier discussion,
means that one should be careful when numerically inverting it in Monte-Carlo studies.
A higher precision than usual is needed in simulating highly autoregressive models, thus further
supporting the recommendations of Abadir (1995), which are based on other types of
univariate precision problems. But most importantly, the near-singularity of Σ y_{t-1} y'_{t-1} is
an indication that enlarging a purely nonstationary VAR increases the variance of parameter
estimates unnecessarily. For example, compare the univariate 1/Σ y_{t-1}^2 with the multivariate
(Σ y_{t-1} y'_{t-1})^{-1}. The former is typically small while the latter is large because of the spurious
correlation (collinearity) of the different random walks. Furthermore, using partitioned
inverses (see Remarks 1 and 3 in Appendix A), the first (and typical diagonal) component of
the MSE in the multivariate case is approximately

E[1 / (d_1 (1 − R^2))] = E[1 / d_1] + E[R^2 / (d_1 (1 − R^2))],

where d_1 ≡ Σ z_{1,t-1}^2 and R is the multiple correlation coefficient between z_{1,t-1} and the
remaining components of z_{t-1}. The first term after the equality represents the univariate
MSE, and the second is positive. The latter term will be substantial for large purely-nonstationary
systems, because of the spurious correlation problem. A similar (but more
elaborate) reasoning applies to the variance also.
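The collinearity of independently generated random walks can be seen numerically. In this sketch (Python with NumPy; the dimension, sample size and seed are illustrative choices), the second-moment matrix of k independent random walks is far worse conditioned than that of k independent white noises:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 500, 5

eps = rng.standard_normal((T, k))
walks = np.cumsum(eps, axis=0)        # k independently generated random walks
noise = rng.standard_normal((T, k))   # k independent stationary (white noise) series

cond_walks = np.linalg.cond(walks.T @ walks)  # near-singular: spurious collinearity
cond_noise = np.linalg.cond(noise.T @ noise)  # well conditioned

print(cond_walks, cond_noise)
```

The large condition number for the random-walk case is the numerical face of the near-singularity discussed above, and is why extra precision is advisable when inverting such matrices in simulation studies.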
We now list some invariance results. First, the Theorem shows that in the purely nonstationary
case, the bias of B̂ is not affected by Ω, whatever its structure may be. This is not
hard to see from (1): when B is any scalar matrix, one may normalize the process by Ω^{-1/2},
for any Ω positive definite, without affecting the outcome for B̂.

Second, the limiting density of T(B̂ − B) does not depend on y_0. Accordingly, O(T^{-1}) terms in
the bias do not depend on y_0, which can then have any stationary distribution (including the
possibility of a Dirac density leading to a constant subset of y_0) that does not depend on the
parameters of the process in (1). Phillips (1987) uses a different reasoning that will lead to
the same outcome regarding y_0. His derivations also mean that the asymptotic implications
of our results do not require Normality of the residuals ε_t, and that they are invariant to
O(T^{-1}) for a wide range of strong mixing residuals, though they would then relate to pseudo-
(or quasi-) MLEs. We have to warn that the latter invariances are only asymptotic, and that
departures from the conditions for y_0 and/or ε_t are better tolerated in large samples than
in small ones, where substantial unusual effects can arise; see Abadir (1993), Abadir and Hadri
(1995).

Third, an invariance result can also be derived for the first relation of Theorem 1. The case
of ε_t following an elliptical distribution encompasses the Normal, contaminated Normal,
t-distribution and many others. Again, we assume symmetry of the density about zero.
Additionally, we require the density of ε_t to exist, and its dependence on the Mahalanobis
distance of ε_t from the origin to be via an invertible function. Then, when it exists, the
bias of the estimators in Section 1 is b_0 ≃ k b_0 I_k. [The proof follows along the same lines
of derivations preceding Theorem 2.1 of Abadir and Larsson (1996), and the derivations of
Theorem 1 above.] The existence requirement for the bias varies with the particular elliptical
density function that is adopted. Also, the formula for the univariate bias b_0 will depend
on the density. An explicit formula for b_0 is unavailable in the literature, except for the
Gaussian case of Theorem 1.
3 THE BIAS IN THE CASE OF COINTEGRATION
We now turn to the case where cointegration exists, meaning that there are k − m linear
combinations of y_t which are I(0). The m common stochastic trends [e.g. see Stock and
Watson (1988)] and at most k − m cointegrating relations amongst the I(1) components of
y_t will now be represented by the same generating process (1) that was used earlier. The
common trends are represented by unit roots, and the k − m stable roots of the stochastic
difference equation (1) represent the I(0) components of y_t as well as cointegrating relations
between the I(1) variates of y_t. This approach has the advantages of leaving open the
possibility of some components of y_t being I(0), and of not requiring a different treatment of
the estimation problem. It is also potentially useful in practice where the number of common
trends is unknown and requires estimation, though we do not focus on this particular problem
here. For an asymptotic treatment of such problems, see Phillips' (1991, 1995) triangular
representation. Also, our derivations do not deal with reduced-rank regression estimators,
which were shown by Phillips (1994) not to have finite moments for finite T.
An approach that is related to ours can be found in Chan and Wei (1988) or Tsay and Tiao
(1990), though their applications are different from ours, and they do not express explicitly
the possibility of a general number of common trends for I(1) variates. For the latter result,
we now require a more explicit specification of the Jordan decomposition of B. In this Section,
matrix B has k − m nonzero eigenvalues which are inside a unit circle centred around −1.
These lead to k − m asymptotically stationary combinations. The aim of the following
manipulations is to show that, because of the asymptotic independence of the normalized
blocks of B̂ − B, the asymptotic bias of B̂ in the case of cointegration can be represented by
combining the results of Theorem 1 with the traditional approach for stable series.

Matrix B is of rank k − m with m zero eigenvalues. Accordingly, it is derogatory with
linear elementary divisors corresponding to these eigenvalues, and the resulting hypercompanion
matrices are null scalars. [In contrast, the matrix of (2) is non-derogatory and has a
quadratic elementary divisor with a 2-dimensional hypercompanion matrix.] When applying
this information to Jordan's decomposition theorem, we get

B = P^{-1} Λ P,   Λ ≡ diag(O_m, Λ_2),   (10)

where P is the reciprocal of the modal matrix, O_m is a square null matrix of order m, and Λ_2 is
a square bidiagonal matrix of order k − m with the stable roots of system (1) on its diagonal.
For ease of exposition, we have let the zero eigenvalues of B occur at the beginning of Λ.
If desired, this expository assumption can be relaxed by means of a permutation matrix.
Defining w_t ≡ P y_t and η_t ≡ P ε_t, we can rewrite (1) as

∇w_t = Λ w_{t-1} + η_t,   η_t ~ IN(0, P Ω P'),   (11)

where the series w_t need not be real but is assumed so here to simplify the discussion
(otherwise, conjugate transposes will be needed instead of simple transposes). By partitioning
w_t ≡ [w'_{1t}, w'_{2t}]' and η_t ≡ [η'_{1t}, η'_{2t}]' conformably with Λ into m and k − m vectors each, we
can see that the common trends are represented by the system of m equations in w_{1t}, which is

∇w_{1t} = η_{1t}.   (12)

The stable combinations and series are represented by the system for w_{2t},

∇w_{2t} = Λ_2 w_{2,t-1} + η_{2t},   (13)

where the eigenvalues on the leading diagonal of Λ_2 are inside a unit circle centred around
−1. By introducing the (k − m) × k exclusion matrix Ξ ≡ [O, I_{k-m}], the transformation of
y_t into the stable w_{2t} is achieved by w_{2t} = Ξ w_t = Ξ P y_t. In other words, the rows of Ψ ≡ Ξ P
(i.e. the last k − m rows of P) may be interpreted as cointegrating vectors for y_t. We then
have the following result.
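The decomposition (10)-(13) can be checked numerically. The sketch below (Python with NumPy; the particular P, Λ_2, sample size and seed are arbitrary illustrative choices) builds B = P^{-1} Λ P with one unit root and one stable root, and verifies that w_t ≡ P y_t decouples into a pure random walk and a stable autoregression:

```python
import numpy as np

rng = np.random.default_rng(3)
T, k, m = 400, 2, 1                      # one common trend, one stable combination

Lam2 = np.array([[-0.6]])                # stable root: |1 + (-0.6)| = 0.4 < 1
Lam = np.block([[np.zeros((m, m)), np.zeros((m, k - m))],
                [np.zeros((k - m, m)), Lam2]])
P = np.array([[1.0, 0.5],                # illustrative invertible (modal-type) matrix
              [0.3, 1.0]])
B = np.linalg.inv(P) @ Lam @ P           # B = P^{-1} Lam P, as in (10)

# Simulate (1) with y_0 = 0
eps = rng.standard_normal((T, k))
y = np.zeros((T + 1, k))
for t in range(1, T + 1):
    y[t] = y[t - 1] + B @ y[t - 1] + eps[t - 1]

w = y @ P.T                              # w_t = P y_t, as in (11)
eta = eps @ P.T                          # eta_t = P eps_t

# (12): the first m components of w are pure random walks, grad w_1t = eta_1t
dw1 = np.diff(w[:, :m], axis=0)
print(np.max(np.abs(dw1 - eta[:, :m])))
```

The discrepancy printed is zero up to rounding error, and the last k − m rows of P play the role of the cointegrating vectors Ψ.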
Theorem 2 For B ≡ A − I_k of rank k − m, with k − m eigenvalues λ_j satisfying |1 + λ_j| < 1,
where j = m + 1, ..., k, the bias of the estimators in Section 1 is

b = −T^{-1} P^{-1} diag(1.78143 m I_m, Ψ Ω A' Ψ' (Ψ Ω Ψ')^{-1} + (tr(A) − m) I_{k-m}) P + O(T^{-1} (1 + λ)^T),

where 1 + λ is the largest stable root in absolute value.
Proof. See Appendix B.
Though their leading terms are different in the two cases, biases are of O(T^{-1}) for
autoregressive characteristic roots (i.e. eigenvalues of A ≡ B + I_k) on or inside the unit circle.
So, any combination of them is of O(T^{-1}) as well. Roots outside the unit circle are, however,
ruled out because B̂ − B is then Cauchy-like and its moments do not exist; e.g. see Evans and
Savin (1981, p.761) for a univariate example.

One of the main components increasing the absolute value of the bias is the trace of
A ≡ B + I_k, the matrix of autoregressive parameters. This is particularly true for large k,
where biases are roughly of O(T^{-1} tr(A)). The implication is that, of two VARs with the same
dimensions but with the first one incorporating more stable (e.g. cointegrating) relations, the
estimators' biases for the first system will be overall lower. The difference can be substantial
for large systems. Stable roots have a dampening influence on the bias matrix to an extent
that depends on the cointegrating vectors Ψ ≡ Ξ P in b. In the extreme case, biases tend
to zero as A → O. But if some cointegrating relations lead to characteristic roots of opposite
signs such that tr(A) → 0 (as in seasonal/cyclical cointegration for instance), then the biases
in the system can approach zero to O(T^{-1}) in spite of A ≠ O. In the statement of our Theorem,
we have given the bias to O[T^{-1} (1 + λ)^T] since terms of O[T^{-1} (1 + λ_i)^T (1 + λ_j)^T] are
smaller. For the full expansion to O(T^{-1}), see (B3)-(B4). The statement of Theorem 2
summarizes the general effect of the dimensions and characteristic roots of A on the asymptotic
bias of B̂. For more specific details of individual cases, one may manipulate the bias
expressions further. For example, in the special case of B being a symmetric matrix, P becomes
an orthogonal matrix (P^{-1} = P') and simplifications may be sought.

This Theorem encompasses the case of singular Ω, such as arises when using a companion
form for a VAR(p); see the introductory discussion of Section 1. If this were the case, then
a reflexive generalized inverse is needed instead of the inverse (Ψ Ω Ψ')^{-1} that is stated in the
Theorem.
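The dampening effect of stable roots can again be illustrated by simulation (Python with NumPy; the design matrices, sample size, replication count and seed are illustrative assumptions rather than the paper's). A bivariate pure random walk is compared with a system of the same dimension containing one stable root, and the latter shows the smaller overall bias in the trace of B̂ − B:

```python
import numpy as np

def mean_trace_bias(A, T=50, reps=4000, seed=4):
    """Monte-Carlo average of tr(B-hat - B) for the VAR(1) in (1)."""
    k = A.shape[0]
    B = A - np.eye(k)
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        eps = rng.standard_normal((T, k))
        y = np.zeros((T + 1, k))
        for t in range(1, T + 1):
            y[t] = A @ y[t - 1] + eps[t - 1]
        Ylag, dY = y[:-1], np.diff(y, axis=0)
        Bhat = (dY.T @ Ylag) @ np.linalg.inv(Ylag.T @ Ylag)
        total += np.trace(Bhat - B)
    return total / reps

tb_rw = mean_trace_bias(np.eye(2))            # tr(A) = 2: both roots are unit roots
tb_ci = mean_trace_bias(np.diag([1.0, 0.3]))  # tr(A) = 1.3: one stable root
print(tb_rw, tb_ci)
```

Both trace biases are negative, but the system with the smaller tr(A) has a markedly smaller bias in absolute value, in line with the rough O(T^{-1} tr(A)) magnitude discussed above.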
4 ECONOMETRIC MODELLING USING VAR PROCESSES
We have shown that in a purely nonstationary VAR, the biases of the MLE and LSE are
proportional to the dimension of the system, even when the regressors are generated inde-
pendently of each other. When some stable linear combinations exist, as when the variables
are cointegrated, these biases are in general asymptotically proportional to the sum of the
characteristic roots of the VAR. Adding irrelevant variables to a VAR was thus shown to have
more serious negative consequences in integrated time series than in classical ergodic or cross
section analyses. On the other hand, incorporating stable relations in the VAR may have
beneficial implications on the asymptotic biases of the parameters in general, especially if the
associated characteristic root is of the opposite sign compared to the existing roots.
One of the implications of Theorem 1 is to encourage parsimonious modelling, thus comple-
menting the penalty applied by Information Criteria on unnecessarily large models, though our
reasoning (biases) and setting (nonstationary time series) are different. The usual asymptotic
Information Criteria approaches do not account for the finite-sample bias problems caused by
adding economically-irrelevant nonstationary series to the model. We have also pointed out
that such practice induces an unusual multicollinearity, because of the spurious correlation
problem, and this raises the variance (hence MSE) as well. This is in addition to the usual
concerns about loss of degrees of freedom. We have to stress that we do not advocate small
models over large ones in all cases. When economic theory provides stable combinations of
unstable variates, then these should be modelled because they have a dampening influence on
biases as shown in Section 3. What we warn against is, for example, multi-equation modelling
of the components of national investment spending given that economic theory has not yet
provided a satisfactory behavioural explanation of the disaggregated components.
A main component of the Hendry-Sargan methodology [e.g. Hendry (1995)] is the mar-
ginalization of a model with respect to the irrelevant variates, having started with an ade-
quately large model. This is known in econometrics as general-to-specific modelling. Here too,
our results have some implications. Though we show that substantial biases can arise from
starting with a model that is initially unknowingly too large, the marginalization step must
be undertaken. Our results do not imply support for specific-to-general modelling because of
omitted-variables biases and (more fundamentally) the inability to recover a joint or a condi-
tional density from a marginal one. However, exact inference (which takes account inter alia
of finite-sample location distortions caused by biases) must be used whenever possible in order
not to throw away important variables, inadvertently, when marginalizing. Unfortunately, it
is common practice in the VAR literature that only asymptotic inference is undertaken when
finite samples are used. Some dramatic consequences (sizes of 20-60% instead of 5%) using
real-life data are illustrated in Nielsen (1994). Remedies to this problem are being provided
in Jacobson and Larsson (1996), Nielsen (1997) and Pere (1997).
Our results are also applicable to roots of A on the unit circle but not necessarily equal to
+1, though such series fall outside the realm of the definition of integration. Such roots can
arise in seasonal time series, and our work [namely Theorem 2, which implies that biases are
roughly of O(T^{-1} tr(A))] suggests that it is possible for estimator biases to be lower when
seasonally unadjusted data is used than either when adjustments are made prior to estimation
or when non-seasonal (e.g. annual) data is used. This approximation follows from the implied
roots of A which is, here, the companion matrix of the seasonal system. The result provides
theoretical support for the approaches of Harvey and Scott (1994) and Franses (1995), and it
has been refined by Pitarakis (1997) who derives the exact univariate bias in a seasonal AR.
Another separate result we could show by the method of Theorem 2 is that

[y_{1t}; y_{2t}] = [1, 0; 0, −1] [y_{1,t-1}; y_{2,t-1}] + ε_t   (14)

does not lead to spurious correlation problems, to the order stated in the Theorem, even
though both series have a unit root. This contrasts with the spurious relation of two or more
I(1) series discussed earlier. The reason for this difference becomes more evident once one
pictures both types of series in (14), and sees how different the characteristics of their time
paths are.
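A quick simulation contrast (Python with NumPy; the replication count, sample size and seed are illustrative choices) shows the difference: pairs of independent I(1) random walks exhibit sizeable spurious sample correlations on average, while the pair in (14), with roots +1 and −1, does not:

```python
import numpy as np

rng = np.random.default_rng(5)
T, reps = 200, 500

corr_rw, corr_pm = 0.0, 0.0
for _ in range(reps):
    e = rng.standard_normal((T, 3))
    w1 = np.cumsum(e[:, 0])              # random walk (root +1)
    w2 = np.cumsum(e[:, 1])              # independent random walk (root +1)
    y2 = np.zeros(T)                     # independent series with root -1, as in (14)
    for t in range(1, T):
        y2[t] = -y2[t - 1] + e[t, 2]
    corr_rw += abs(np.corrcoef(w1, w2)[0, 1])
    corr_pm += abs(np.corrcoef(w1, y2)[0, 1])

corr_rw /= reps   # large on average: spurious correlation of unrelated walks
corr_pm /= reps   # small: the oscillating root -1 path does not track the walk
print(corr_rw, corr_pm)
```

The smooth time path of a random walk is what another random walk can spuriously mimic; the violently oscillating path generated by the root −1 cannot be mimicked by it, which is the intuition given in the text.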
As a by-product of our analysis in Section 3, we have obtained another new representation
to decompose series into their stable (asymptotically stationary) and unstable (unit root)
components. It is based on using the ECM format in order to get a matrix B of reduced rank.
The novelty of our approach is then to exploit the rank restriction on the matrix B to make
it amenable to a more explicit specification of the structure of its Jordan decomposition,
when the series are restricted to being integrated of at most a certain order. This allows us to
derive our theoretical distributional results of Theorem 2, and could be used in solving other
problems as well.
Department of Mathematics and Department of Economics, University of York, Heslington, York YO1 5DD, UK;
School of Business and Economics, University of Exeter, Streatham Court, Rennes Drive,
Exeter EX4 4PU, UK; and
School of Business and Economics, University of Exeter, Streatham Court, Rennes Drive,
Exeter EX4 4PU, UK.
APPENDIX A

Proof of Theorem 1. The proof is divided into three parts. We start by deriving a sufficient
condition for the existence of the bias. We then show that E[c_1 d^{(1)}] ≃ E[c_1 / d_1], in the
notation of (5)-(6). Finally, we show that the inner product E[p_2' q^{(2)}] is made up of k − 1
identical elements, each of them approximated by the univariate bias E[c_1 / d_1]. The formula
stated in the theorem then follows from the univariate bias E[c_1 / d_1] ≃ −(1.78143 / T) exp(−2.6 / T),
obtained from Abadir (1993), Vinod and Shenton (1996) or Nielsen (1997). In the proof, we
occasionally employ group-theoretic ideas without using their jargon.
Part 1. Define n ≡ T − 1, the n × 1 vector ι ≡ [1, ..., 1]', the n × k matrix
U ≡ [u_1, ..., u_n]', and the n × n lower triangular matrix of ones

C ≡ [1, 0, ..., 0; 1, 1, ..., 0; ...; 1, 1, ..., 1],

so that Z ≡ CU ≡ [z_1, ..., z_n]' stacks the levels (recall that z_0 = 0). Then, we have
D ≡ Σ z_{t-1} z'_{t-1} ≡ Z'Z ≡ U'C'CU. It follows from the derivations immediately preceding
Theorem 1 that E[C̄ D^{-1}] = E[C̄' D^{-1}], where C̄ ≡ Σ u_t z'_{t-1}, so that one may use

(1/2)(C̄ + C̄') = (1/2)(z_T z'_T − Σ u_t u'_t)

instead of C̄ in this proof. (We have made use of the earlier assumption that z_0 = 0.)
Furthermore, since u_T is independent of the past (i.e. of the matrix D) and has E[u_T] = 0, we
get

E[C̄ D^{-1}] = E[(1/2) U'(ιι' − I_n) U (U'C'CU)^{-1}].

A sufficient condition for this expectation to exist is that E[(U'C'CU)^{-1}] be finite.
Let us exploit the symmetry of the structure again. All the variables z_1, ..., z_n are generated
from u_1, ..., u_n by the same matrix C, so that we end up with the symmetric C'C. It
can be decomposed as

(A1)   C'C ≡ H'ΛH,

where H is an n × n orthogonal matrix and Λ is diagonal. Consider the transformation
replacing U by H'U (inverse mapping), denoted by U ↔ H'U. Because spherical densities
are invariant to rotations (they depend on U only through U'U), and the Jacobian of the
transformation is |det(H)| = 1, we have E[(U'C'CU)^{-1}] = E[(U'ΛU)^{-1}]. By a bounding
argument on Λ, generalizing the one in Evans and Savin (1981, pp.767-768), the Inverted
Wishart distribution [e.g. Muirhead (1982, p.97)] provides the required sufficient condition
n ≥ k + 1 for the existence of the bias in the Gaussian case.
Part 2. By the formula for the partitioned inverse,
(A2) Eh1
(1)i≡ E
"1
1 − q02−14 q2
#=1
2E"
01 (110 − ) 1
0101 − 0102 (02
02)−1
02
01
#
9
=1
2tr
Ã(110 − ) E
"1
01
0101 − 0102 (02
02)−1
02
01
#!
≡ 1
2tr ((110 − ) E [ ])
≡ sum of the off-diagonal elements of1
2E [ ]
First, decompose 0 as in (A1), and transform ↔ 0. Then,
(A3) E [ ] = 0E"
101
01Λ1 − 01Λ2 (02Λ2)
−102Λ1
#
=∞X=0
0E⎡⎣ 1
01
01Λ1
Ã01Λ2 (
02Λ2)
−102Λ1
01Λ1
!⎤⎦
≡∞X=0
0
qΛ−1 E
hΠ1 (Π2Π1)
iq
Λ−1
where we have the projectors (idempotents)
Π1 ≡√Λ1
01
√Λ
01Λ1
Π2 ≡qΛ2 (
02Λ2)
−102
qΛ
The new variates are still independent spherical variates, and we now exploit this fact.
Second, the projector Π2 can be expressed as a power series in√Λ2
02
√Λ; e.g. see Rao
and Mitra (1971, p.62). The matrix Π2 has off-diagonal elements that are odd functions of the
independent elements of the standard Normal 2. Similarly, Π1 has off-diagonal elements that
are odd functions of the independent elements of the standard Normal 1, and the expectation
of the product Π1 (Π2Π1)is a diagonal matrix. By comparing the expectation of the expanded
and unexpanded forms in (A3), we get that the off-diagonal elements of
101
01Λ1 − 01Λ2 (02Λ2)
−102Λ1
must be odd functions of at least one of the elements of . For calculating the expectation, this
means that there exists a transformation carrying the idempotent√Λ2 (
02Λ2)
−102
√Λ
into a diagonal matrix Λ2 which is a permutation of the diagonal elements of diag(−10).Transforming 1 ↔ 1 back, then using
011 = (
2) and ≡ √Λ (with 0 = 0),
(A4) E [ ] = E⎡⎣ 1
01
011³1− 010Λ21011
´⎤⎦ ' E "101
011
#
where the omitted term is of maximal order (−1). Substituting (A4) into (A2), E
h1
(1)i'
E [11]. Note that Normality has not been assumed in this part of the proof.Part 3. All −1 terms of the inner product E [p02q(2)] are identical, since each of the variates
is generated by independent standard random walks. Let us take the first (and typical) term
of E [p02q(2)]. By the same method of part 2 of this proof, we may write it as
Ehp02diag (10)q
(2)i≡ E
"−p
02diag (10)
−14 q2
1 − q02−14 q2
#
= −12E"01 (11
0 − )2diag (10) (02
02)−1
02
010101 − 0102 (0
202)
−102
01
#
10
= −12E"01 (11
0 − )02diag (10) (
02Λ2)
−102Λ1
01Λ1 − 01Λ2 (02Λ2)
−102Λ1
#
≡ sum of the off-diagonal elements of1
2Ehi
by the same decomposition (A1) and transformation ↔ 0 as before. Using
$$
(Q_2'\Lambda Q_2)^{-1}\equiv\frac{1}{q_2'\sqrt{\Lambda}\,(I-\Pi_3)\sqrt{\Lambda}\,q_2}
\begin{bmatrix}
1 & -\,q_2'\Lambda Q_3\,(Q_3'\Lambda Q_3)^{-1}\\
-\,(Q_3'\Lambda Q_3)^{-1}Q_3'\Lambda q_2 & G
\end{bmatrix}
$$
where
$$
G\equiv q_2'\sqrt{\Lambda}\,(I-\Pi_3)\sqrt{\Lambda}\,q_2\,(Q_3'\Lambda Q_3)^{-1}+(Q_3'\Lambda Q_3)^{-1}Q_3'\Lambda q_2\,q_2'\Lambda Q_3\,(Q_3'\Lambda Q_3)^{-1},
$$
$$
\Pi_3\equiv\sqrt{\Lambda}\,Q_3\,(Q_3'\Lambda Q_3)^{-1}Q_3'\sqrt{\Lambda},
$$
we have
$$
\mathrm{E}[F]=-\,\mathrm{E}\!\left[\frac{\left(q_1'\sqrt{\Lambda}\,(I-\Pi_3)\sqrt{\Lambda}\,q_2\right)^{2}}{q_1'\sqrt{\Lambda}\,(I-\Pi_3)\sqrt{\Lambda}\,q_1\;q_2'\sqrt{\Lambda}\,(I-\Pi_3)\sqrt{\Lambda}\,q_2-\left(q_1'\sqrt{\Lambda}\,(I-\Pi_3)\sqrt{\Lambda}\,q_2\right)^{2}}\right]
$$
By the same type of approximation as before,
$$
\mathrm{E}[F]\simeq-\,\mathrm{E}\!\left[\frac{q_2'\Lambda q_1\,q_1'\Lambda q_2}{q_1'\Lambda q_1\,q_2'\Lambda q_2-q_1'\Lambda q_2\,q_2'\Lambda q_1}\right]
$$
which is a reduction of the problem to $k=2$. We therefore need to show that
$$
(\mathrm{A5})\qquad \mathrm{E}\!\left[\frac{-\,b_2u_2}{\det(U)}\right]\simeq\mathrm{E}\!\left[\frac{b_1u_4}{\det(U)}\right]
$$
in the remainder of the proof, with $\det(U)\equiv u_1u_4-u_2^2$, where $u_1,u_2,u_4$ denote the distinct elements of $U\equiv\sum y_{t-1}y_{t-1}'$ and $b_1,b_2$ the corresponding elements of $\sum\varepsilon_t y_{t-1}'$. Because of the interaction of the numerator with the denominator, it is easiest to resort to an indirect proof of the relation, rather than a direct derivation of the expectation. For this, we use Theorem 2.1 of Abadir and Larsson (1996), where the joint mgf (hence moments) of the quadratic forms $\sum\varepsilon_t y_{t-1}'$ and $\sum y_{t-1}y_{t-1}'$ (in our notation) have been worked out.
Let $(W,V)$ be the matrices of mgf parameters corresponding to our quadratic forms $\left(\sum\varepsilon_t y_{t-1}',\ \sum y_{t-1}y_{t-1}'\right)$, and the mgf be
$$
M(W,V)\equiv\mathrm{E}\!\left[\exp\left(\sum\varepsilon_t'Wy_{t-1}+\sum y_{t-1}'Vy_{t-1}\right)\right]\equiv\mathrm{E}\!\left[\mathrm{etr}\left(W\sum y_{t-1}\varepsilon_t'+V\sum y_{t-1}y_{t-1}'\right)\right]
$$
Matrix $\sum y_{t-1}y_{t-1}'$ is symmetric and has only $\frac12 k(k+1)$ distinct elements, so that $V$ is also symmetric, with typical diagonal element $\nu_{ii}$ and off-diagonal element $\frac12\nu_{ij}$ (from the quadratic form $\sum y_{t-1}'Vy_{t-1}$). For $W$, the typical element is $\omega_{ij}$, and since any matrix can be decomposed into symmetric (Jordan) and skew-symmetric (Lie) components,
$$
W\equiv W_s+W_a\equiv\tfrac12\left(W+W'\right)+\tfrac12\left(W-W'\right)
$$
The latter component has no effect on the required bias (see the discussion of $W_a$ in Part 1 of this proof), and we set it to zero in the mgf for the purpose of deriving the bias. Then, the
joint mgf of Abadir and Larsson (1996, pp.685-686) simplifies to
$$
M(W,-V)\equiv M(W,-V)\big|_{W_a=0}
=\det(\Phi)^{\frac{1-T}{2}}\det\!\left(\begin{bmatrix}I&0\end{bmatrix}\begin{bmatrix}2\left(V\Phi^{-1}+I\right)&-I\\-I&0\end{bmatrix}^{-T}\begin{bmatrix}I\\I\end{bmatrix}\right)^{-\frac12}
$$
where $\Phi\equiv I+W$. Since the larger ($2k$-square) matrix is a function of the single sub-matrix $V\Phi^{-1}$, which naturally commutes with its own polynomials, we can write
$$
M(W,-V)=\det(\Phi)^{\frac{1-T}{2}}\det\left(\Lambda_2-\Lambda_1\right)^{\frac12}\det\!\left(-\Lambda_1^{T}\left(I-\Lambda_2\right)+\Lambda_2^{T}\left(I-\Lambda_1\right)\right)^{-\frac12}
$$
where $\Lambda_{1,2}\equiv I+V\Phi^{-1}\mp\sqrt{2V\Phi^{-1}+\left(V\Phi^{-1}\right)^{2}}$ by a Jordan block-decomposition of the larger matrix, and the problem of Abadir and Rockinger (1997) does not arise here.
Having simplified the mgf to the current setting, let us use (A5) to set $k=2$ and formulate the mgf in terms of scalar quantities. It is clear that the same operation on the mgf is required to obtain $1/\det(U)$ on both sides of (A5), so it is sufficient to show that
$$
(\mathrm{A5}')\qquad -\frac{\partial}{\partial\nu_2}\left[\frac{\partial M(\omega,-V)}{\partial\omega_2}\bigg|_{\omega=0}\right]\simeq\frac{\partial}{\partial\nu_4}\left[\frac{\partial M(\omega,-V)}{\partial\omega_1}\bigg|_{\omega=0}\right]
$$
where
$$
V=\begin{bmatrix}\nu_1&\tfrac12\nu_2\\\tfrac12\nu_2&\nu_4\end{bmatrix},\qquad
W=\begin{bmatrix}\omega_1&\omega_2\\\omega_2&0\end{bmatrix}
$$
and $\omega_2\equiv\tfrac12\left(\omega_{12}+\omega_{21}\right)$ corresponds to the off-diagonal element of $W_s$. To approximate the bias to fixed precision for $T\to\infty$, expand $M(\cdot)$ for both $W$ and $V$ in the same neighbourhood of zero, say $O(T^{-1})$. Then,
$$
\Lambda_{1,2}=I+V\Phi^{-1}\mp\sqrt{2V\Phi^{-1}+\left(V\Phi^{-1}\right)^{2}}\simeq I+V\mp\sqrt{2V}
$$
and $\Lambda_{1,2}^{T}\simeq\exp\left[TV\mp T\sqrt{2V}\right]$, giving
$$
(\mathrm{A6})\qquad M(\omega,-V)=\det(\Phi)^{\frac{1-T}{2}}\det\left(\Lambda_2-\Lambda_1\right)^{\frac12}\det\!\left(-\Lambda_1^{T}\left(I-\Lambda_2\right)+\Lambda_2^{T}\left(I-\Lambda_1\right)\right)^{-\frac12}
$$
$$
\simeq\exp\!\left[-\frac{T}{2}\left(\nu_1\nu_4-\nu_2^{2}\right)\right]\exp\!\left[-\frac12\left(\omega_1\nu_4-\omega_2\nu_2\right)\right]\det\!\left(\cosh\!\left(T\sqrt{2V}\right)\right)^{-\frac12}\equiv\widetilde{M}(\omega,-V)
$$
Differentiating as in (A5′),
$$
\frac{\partial\widetilde{M}(\omega,-V)}{\partial\omega_2}\bigg|_{\omega=0}=\frac{\nu_2}{2}\,\widetilde{M}(0,-V)
\qquad\text{and}\qquad
\frac{\partial\widetilde{M}(\omega,-V)}{\partial\omega_1}\bigg|_{\omega=0}=-\frac{\nu_4}{2}\,\widetilde{M}(0,-V)
$$
Then, differentiating with respect to $\nu_2$ and $\nu_4$, respectively,
$$
-\frac{\partial}{\partial\nu_2}\left[\frac{\partial\widetilde{M}(\omega,-V)}{\partial\omega_2}\bigg|_{\omega=0}\right]=-\frac12\,\widetilde{M}(0,-V)-\frac{\nu_2}{2}\,\frac{\partial\widetilde{M}(0,-V)}{\partial\nu_2}
$$
and
$$
\frac{\partial}{\partial\nu_4}\left[\frac{\partial\widetilde{M}(\omega,-V)}{\partial\omega_1}\bigg|_{\omega=0}\right]=-\frac12\,\widetilde{M}(0,-V)-\frac{\nu_4}{2}\,\frac{\partial\widetilde{M}(0,-V)}{\partial\nu_4}
$$
The dominant term is the first one, in both cases, since $\nu$ is in the neighbourhood of zero and by the definition of $\widetilde{M}(0,-V)$ in (A6). (Note that $\widetilde{M}(0,-V)$ is made of $\det(\cdot)$ terms which are quadratic forms in the elements of $V$, and $\cosh(\cdot)$ is an even function.) This establishes
(A5′).

Remark 1. Let $R^{2}\equiv q_2'D_4^{-1}q_2/d_1$ be the squared multiple correlation coefficient (with zero means imposed) of $y_{1,t-1}$ with the remaining components of $y_{t-1}$. Statistically, the meaning of (A4),
$$
\mathrm{E}\!\left[p_1q^{(1)}\right]\equiv\mathrm{E}\!\left[\frac{p_1}{d_1-q_2'D_4^{-1}q_2}\right]\equiv\mathrm{E}\!\left[\frac{p_1}{d_1\left(1-R^{2}\right)}\right]\simeq\mathrm{E}\!\left[\frac{p_1}{d_1}\right]
$$
is that realisations in the upper tail ($p_1/d_1\simeq0$) of the negatively-skewed unit-root distribution coincide with high spurious correlation, and realisations in the lower tail ('stationary' side) coincide with $R^{2}\simeq0$.

Remark 2. $R^{2}$ is not small. The approximation in (A4) was the outcome of a change of variable allowed by the interaction of the numerator ($p_1$) with the expanded denominator in (A3). After this change of variable $\Pi_2\leftrightarrow\Lambda_2$ in (A4), the transformed $R^{2}$ may then be viewed as the squared correlation between a unit-root series and $k-1$ independent stable ones.
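The spurious-correlation effect behind Remarks 1 and 2 is easy to see numerically. The following is an illustrative sketch of our own (function names, sample sizes, and replication counts are not the paper's): regressing one pure random walk on independent random walks, with zero means imposed as in Remark 1, yields an average squared multiple correlation that does not fade as the sample grows, in line with the spurious-regression results of Granger and Newbold (1974) and Phillips (1986).

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_r2(T, k_minus_1, reps=2000):
    # Average squared multiple correlation (no intercept, zero means
    # imposed) of one random walk on k-1 independent random walks.
    out = np.empty(reps)
    for r in range(reps):
        y = np.cumsum(rng.standard_normal(T))
        X = np.cumsum(rng.standard_normal((T, k_minus_1)), axis=0)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        fit = X @ coef
        out[r] = (fit @ fit) / (y @ y)  # uncentred R^2
    return out.mean()

r_small = mean_r2(50, 2)
r_large = mean_r2(400, 2)
print(round(r_small, 2), round(r_large, 2))
```

Both averages stay well away from zero, whereas regressions among independent stationary series would see the analogous quantity shrink with the sample size.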
Remark 3. In the multivariate case, the first (and typical diagonal) component of the estimators' MSE is approximately
$$
\mathrm{E}\!\left[\frac{1}{d_1\left(1-R^{2}\right)}\right]\equiv\mathrm{E}\!\left[\frac{1}{d_1}\right]+\sum_{j=1}^{\infty}\mathrm{E}\!\left[\frac{1}{d_1}\,\mathrm{tr}\!\left(\left(\Pi_1\Pi_2\right)^{j}\right)\right]
=\mathrm{E}\!\left[\frac{1}{d_1}\right]+\sum_{j=1}^{\infty}\mathrm{E}\!\left[\frac{1}{d_1}\left(\frac{k-1}{k}\right)^{j}\right]=k\times\mathrm{E}\!\left[\frac{1}{d_1}\right]
$$
where $\mathrm{E}[R^{2}]$ of Muirhead (1982, pp.146, 169-171) has been used as an approximation for the diagonal components of $(\Pi_1\Pi_2)^{j}$, in the light of Remark 2. Unlike the expression for the bias, this formula does not calculate the MSE: it provides only an order of magnitude.
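The order-of-magnitude claim of Remark 3, MSE roughly $k$ times its univariate counterpart, can be checked by a small Monte Carlo experiment. The sketch below is our own construction (sample size, replication count, and function name are illustrative choices, far smaller than the paper's simulations):

```python
import numpy as np

rng = np.random.default_rng(1)

def mse_first_diag(k, T=100, reps=3000):
    # Monte Carlo MSE of the (1,1) element of the LS estimator of A in
    # y_t = A y_{t-1} + eps_t when y is a k-dimensional random walk
    # (true A = I, so the estimation error is A_hat[0,0] - 1).
    errs = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal((T, k))
        y = np.cumsum(eps, axis=0)
        ylag = np.vstack([np.zeros((1, k)), y[:-1]])
        A_hat = (y.T @ ylag) @ np.linalg.inv(ylag.T @ ylag)
        errs[r] = A_hat[0, 0] - 1.0
    return np.mean(errs**2)

ratio = mse_first_diag(3) / mse_first_diag(1)
print(round(ratio, 1))
```

The ratio is only an order of magnitude, as the Remark stresses, but it moves with $k$ rather than staying at one.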
APPENDIX B

Proof of Theorem 2. From (3),
$$
\widehat{A}-A=\left(\sum\varepsilon_t y_{t-1}'\right)\left(\sum y_{t-1}y_{t-1}'\right)^{-1}
=\Psi^{-1}\left(\sum u_t z_{t-1}'\right)\left(\sum z_{t-1}z_{t-1}'\right)^{-1}\Psi
$$
where $z_t\equiv\Psi y_t$ and $u_t\equiv\Psi\varepsilon_t$. Introducing the (block-)diagonal weighting matrix $\Gamma_T$ for $z$,
$$
\Gamma_T\equiv\mathrm{diag}\left(\frac{1}{T}\,I,\ \frac{1}{\sqrt{T}}\,I\right)
$$
we can write
$$
(\mathrm{B1})\qquad \widehat{A}-A=\Psi^{-1}\left(\sum u_t z_{t-1}'\Gamma_T\right)\left(\Gamma_T\sum z_{t-1}z_{t-1}'\Gamma_T\right)^{-1}\Gamma_T\Psi
$$
The matrix $\Gamma_T\sum z_{t-1}z_{t-1}'\Gamma_T$ is asymptotically block-diagonal. Furthermore, the elements of the off-diagonal submatrices are $O_p\!\left(T^{-\frac12}\right)$. So, inverting the matrix in (B1) gives
$$
(\mathrm{B2})\qquad \widehat{A}-A=\Psi^{-1}\left(\sum u_t z_{t-1}'\Gamma_T\right)\mathrm{diag}\left[\left(\frac{1}{T^{2}}\sum z_{1,t-1}z_{1,t-1}'\right)^{-1},\ \left(\frac{1}{T}\sum z_{2,t-1}z_{2,t-1}'\right)^{-1}\right]\Gamma_T\Psi+o_p\!\left(T^{-1}\right)
$$
$$
=\Psi^{-1}\,\mathrm{diag}\left[\frac{1}{T}\left(\frac{1}{T}\sum u_{1,t}z_{1,t-1}'\right)\left(\frac{1}{T^{2}}\sum z_{1,t-1}z_{1,t-1}'\right)^{-1},\ \frac{1}{\sqrt{T}}\left(\frac{1}{\sqrt{T}}\sum u_{2,t}z_{2,t-1}'\right)\left(\frac{1}{T}\sum z_{2,t-1}z_{2,t-1}'\right)^{-1}\right]\Psi+o_p\!\left(T^{-1}\right)
$$
$$
=\Psi^{-1}\,\mathrm{diag}\left[\left(\sum u_{1,t}z_{1,t-1}'\right)\left(\sum z_{1,t-1}z_{1,t-1}'\right)^{-1},\ \left(\sum u_{2,t}z_{2,t-1}'\right)\left(\sum z_{2,t-1}z_{2,t-1}'\right)^{-1}\right]\Psi+o_p\!\left(T^{-1}\right)
$$
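The two normalisations in $\Gamma_T$ reflect the different convergence rates of the two blocks: sums of squares of the unit-root block grow like $T^2$, those of the stable block like $T$. A minimal simulation sketch of our own (the AR coefficient 0.5 and all tuning values are illustrative choices, not the paper's) shows each scaled matrix settling to a nondegenerate limit:

```python
import numpy as np

rng = np.random.default_rng(2)

def scaled_sums(T, reps=2000):
    # Average of sum_t z_{t-1}^2 under the two normalisations in
    # Gamma_T: T^{-2} for a pure random walk (nonstationary block) and
    # T^{-1} for a stable AR(1) with coefficient 0.5 (stationary block).
    rw = np.cumsum(rng.standard_normal((reps, T)), axis=1)
    nonstat = np.mean(np.sum(rw[:, :-1] ** 2, axis=1)) / T**2
    eps = rng.standard_normal((reps, T))
    z = np.zeros((reps, T))
    for t in range(1, T):
        z[:, t] = 0.5 * z[:, t - 1] + eps[:, t]
    stat = np.mean(np.sum(z[:, :-1] ** 2, axis=1)) / T
    return nonstat, stat

m100 = scaled_sums(100)
m400 = scaled_sums(400)
print(m100, m400)
```

Both scaled sums are stable across the two sample sizes, which is what makes the block-diagonal inversion in (B2) work.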
To find the bias, we apply the expectation operator to (B2). Our Theorem 1 can be applied to the first block. For the second block, Tjøstheim and Paulsen (1983) and Nicholls and Pope (1988) have derived the relevant expansions. Theorem 2 of Nicholls and Pope (1988) can be used after a simple modification. Their term involving the variance of the estimated intercept drops out because we estimate no intercepts in (1) and (3). This makes the bias an odd function of $A$, as is discussed in the univariate case by Abadir (1993) and for the multivariate moment generating function by Abadir and Larsson (1996). We then get
$$
(\mathrm{B3})\qquad \mathrm{E}\!\left[\widehat{A}-A\right]=\frac{1}{T}\,\Psi^{-1}\,\mathrm{diag}\left(-1.78143\,I,\ B\right)\Psi+o\!\left(T^{-1}\right)
$$
where
$$
B\equiv\left(\mathrm{E}\left[z_{2,t}z_{2,t}'\right]\right)^{-1}\left(\left(I+\Lambda_2^{-1}\right)\left(2I+\Lambda_2\right)^{-1}+\sum_{i=m+1}^{k}\left(1+\lambda_i\right)\left[\left(1+\lambda_i\right)\Lambda_2+\lambda_i I\right]^{-1}\right)
$$
Since $\left|1+\lambda_i\right|<1$, the defining bilinear forms for $\mathrm{E}[z_{2,t}z_{2,t}']$ are expanded as
$$
(\mathrm{B4})\qquad \mathrm{E}\left[z_{2,t}z_{2,t}'\right]\equiv\Psi\Omega\Psi'+\left(I+\Lambda_2\right)\mathrm{E}\left[z_{2,t}z_{2,t}'\right]\left(I+\Lambda_2'\right)=\Psi\Omega\Psi'+O(1+\lambda)
$$
We expand $B$ accordingly as
$$
(\mathrm{B5})\qquad B=-\left(\Psi\Omega\Psi'\right)^{-1}\left[\left(I+\Lambda_2\right)+\mathrm{tr}\left(I+\Lambda_2\right)I\right]+O(1+\lambda)
=-\left(\Psi\Omega\Psi'\right)^{-1}\left[\left(I+\Lambda_2\right)+\left(\mathrm{tr}(A)-m\right)I\right]+O(1+\lambda)
$$
since $\mathrm{tr}(\Lambda_2)=\mathrm{tr}(\Lambda)=\mathrm{tr}(A)-k$. Substituting (B5) into (B3) gives the required result because
$$
\Psi\Omega\Psi'\left(I+\Lambda_2'\right)\left(\Psi\Omega\Psi'\right)^{-1}=I+\Psi\Omega\left(\Lambda_2\Psi\right)'\left(\Psi\Omega\Psi'\right)^{-1}
=I+\Psi\Omega\left(\Lambda_2\Xi\right)'\left(\Psi\Omega\Psi'\right)^{-1}=I+\Psi\Omega\left(\Xi\Lambda\right)'\left(\Psi\Omega\Psi'\right)^{-1}
$$
$$
=I+\Psi\Omega\left(\Psi\Lambda\right)'\left(\Psi\Omega\Psi'\right)^{-1}=\Psi\Omega A'\Psi'\left(\Psi\Omega\Psi'\right)^{-1}
$$
by Ψ ≡ Ξ and (10).
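The odd-function property of the bias invoked in the proof is easy to see in the univariate case with no intercept: reversing the sign of the autoregressive coefficient reverses the sign of the bias. The Monte Carlo sketch below is our own illustration (sample size, coefficient value, and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def bias(a, T=50, reps=20000):
    # Monte Carlo bias of the no-intercept LS estimator of a in the
    # stable AR(1) y_t = a*y_{t-1} + eps_t, with y_0 = 0.
    eps = rng.standard_normal((reps, T))
    y = np.zeros((reps, T))
    for t in range(1, T):
        y[:, t] = a * y[:, t - 1] + eps[:, t]
    num = np.sum(y[:, 1:] * y[:, :-1], axis=1)
    den = np.sum(y[:, :-1] ** 2, axis=1)
    return np.mean(num / den) - a

b_pos, b_neg = bias(0.5), bias(-0.5)
print(round(b_pos, 3), round(b_neg, 3))
```

The two biases are opposite in sign and approximately equal in magnitude, consistent with the odd-function discussion in Abadir (1993).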
REFERENCES
Abadir, K.M. (1993) OLS bias in a nonstationary autoregression. Econometric Theory 9,
81-93.
Abadir, K.M. (1995) On efficient simulations in dynamic models. University of Exeter Discussion Paper in Economics, 95/21.
Abadir, K.M. and K. Hadri (1995) Bias nonmonotonicity in stochastic difference equations.
University of Exeter Discussion Paper in Economics, 95/12.
Abadir, K.M. and R. Larsson (1996) The joint moment generating function of quadratic
forms in multivariate autoregressive series. Econometric Theory 12, 682-704.
Abadir, K.M. and M. Rockinger (1997) The “devil’s horns” problem of inverting confluent characteristic functions. Econometrica 65, 1221-1225.
Chan, N.H. and C.Z. Wei (1988) Limiting distributions of least squares estimates of unstable
autoregressive processes. Annals of Statistics 16, 367-401.
Evans, G.B.A. and N.E. Savin (1981) Testing for unit roots: 1. Econometrica 49, 753-779.
Franses, P.H. (1996) Recent advances in modelling seasonality. Journal of Economic
Surveys 10, 299-345.
Granger, C.W.J. and P. Newbold (1974) Spurious regressions in econometrics. Journal of
Econometrics 2, 111-120.
Harvey, A.C. and A. Scott (1994) Seasonality in dynamic regression models. Economic
Journal 104, 1324-1345.
Hendry, D.F. (1995) Dynamic Econometrics. Oxford: Oxford University Press.
Jacobson, T. and R. Larsson (1996) Bartlett correction for a likelihood ratio cointegration
test. Mimeo., Department of Economic Statistics, Stockholm School of Economics.
Muirhead, R.J. (1982) Aspects of Multivariate Statistical Theory. New York: John Wiley & Sons.
Nicholls, D.F. and A.L. Pope (1988) Bias in the estimation of multivariate autoregressions.
Australian Journal of Statistics 30A, 296-309.
Nielsen, B. (1994) Bartlett correction in the cointegration model. Mimeo., Institute of
Mathematical Statistics, University of Copenhagen.
Nielsen, B. (1997) On the distribution of cointegration tests. Ph.D. Thesis, Institute of
Mathematical Statistics, University of Copenhagen.
Pere, P. (1997) Adjusted profile likelihood applied to estimation and testing of unit roots.
D.Phil. Thesis, University of Oxford.
Phillips, P.C.B. (1986) Understanding spurious regressions in econometrics. Journal of
Econometrics 33, 311-340.
Phillips, P.C.B. (1987) Asymptotic expansions in nonstationary vector autoregressions. Econometric Theory 3, 45-68.
Phillips, P.C.B. (1991) Optimal inference in cointegrated systems. Econometrica 59, 283-306.
Phillips, P.C.B. (1994) Some exact distribution theory for maximum likelihood estimators of cointegrating coefficients in error correction models. Econometrica 62, 73-93.
Phillips, P.C.B. (1995) Fully modified least squares and vector autoregression. Econometrica 63, 1023-1078.
Pitarakis, J.-Y. (1997) Moment generating function and further exact results for autore-
gressions with multiple frequency unit roots. Econometric Theory, forthcoming.
Rao, C.R. and S.K. Mitra (1971) Generalized Inverse of Matrices and its Applications. New York: John Wiley & Sons.
Stock, J.H. and M.W. Watson (1988) Testing for common trends. Journal of the American
Statistical Association 83, 1097-1107.
Tjøstheim, D. and J. Paulsen (1983) Bias of some commonly-used time series estimates.
Biometrika 70, 389-399 [Correction (1984) 71, 656].
Tsay, R.S. and G.C. Tiao (1990) Asymptotic properties of multivariate nonstationary
processes with applications to autoregressions. Annals of Statistics 18, 220-250.
Vinod, H.D. and L.R. Shenton (1996) Exact moments for autoregressive and random walk
models for a zero or stationary initial value. Econometric Theory 12, 481-499.
TABLE I
Simulated values of −100 × bias with 100,000 replications
(formula of Theorem 1 in parentheses)

          k = 1       k = 2        k = 3        k = 4        k = 5
T = 25    6.4 (6.4)   13.4 (12.8)  20.1 (19.3)  26.1 (25.7)  32.1 (31.8)
T = 50    3.4 (3.4)    7.1 (6.8)   10.8 (10.1)  14.3 (13.5)  17.7 (16.9)
T = 100   1.8 (1.7)    3.7 (3.5)    5.6 (5.2)    7.4 (6.9)    9.3 (8.7)
T = 200   0.9 (0.9)    1.9 (1.8)    2.9 (2.6)    3.8 (3.5)    4.8 (4.4)
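Table I's pattern, bias roughly proportional to the dimension $k$, can be replicated in miniature. The sketch below is our own (much smaller replication count than the paper's 100,000, and our own function names): it compares the average diagonal bias of the LS estimator for a three-dimensional random walk against the univariate case.

```python
import numpy as np

rng = np.random.default_rng(4)

def mean_diag_bias(k, T=100, reps=3000):
    # Average diagonal element of E[A_hat] - I for the LS estimator of
    # a k-dimensional random walk, y_t = y_{t-1} + eps_t with y_0 = 0.
    out = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal((T, k))
        y = np.cumsum(eps, axis=0)
        ylag = np.vstack([np.zeros((1, k)), y[:-1]])
        A_hat = (y.T @ ylag) @ np.linalg.inv(ylag.T @ ylag)
        out[r] = np.mean(np.diag(A_hat)) - 1.0
    return out.mean()

b1, b3 = mean_diag_bias(1), mean_diag_bias(3)
print(round(b1, 3), round(b3, 3), round(b3 / b1, 1))
```

With $T = 100$ the ratio of the two biases is close to 3, matching the $k = 3$ versus $k = 1$ columns of Table I.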