
A Quasi Maximum Likelihood Approach for Large Approximate Dynamic Factor Models∗

Catherine Doz, Université Cergy-Pontoise
Domenico Giannone, European Central Bank, ECARES and CEPR
Lucrezia Reichlin, European Central Bank, ECARES and CEPR

March 17, 2008

Abstract

Is maximum likelihood suitable for factor models in large cross-sections of time series? We answer this question from both an asymptotic and an empirical perspective. We show that estimates of the common factors based on maximum likelihood are consistent for the size of the cross-section (n) and the sample size (T) going to infinity along any path of n and T and that therefore maximum likelihood is viable for n large. The estimator is robust to misspecification of the cross-sectional and time series correlation of the idiosyncratic components. In practice, the estimator can be easily implemented using the Kalman smoother and the EM algorithm as in traditional factor analysis.

JEL Classification: C51, C32, C33.

Keywords: Factor Model, large cross-sections, Quasi Maximum Likelihood.

∗ We thank Marta Banbura, Massimo Franchi, Ursula Gather and Marco Lippi for helpful suggestions and seminar participants at the International Statistical Institute in Berlin 2003, the European Central Bank, 2003, the Statistical Institute at the Catholic University of Louvain la Neuve, 2004, the Institute for Advanced Studies in Vienna, 2004, the Department of Statistics at Carlos III University, Madrid, 2004, the Federal Reserve Board of Governors, 2006, the Department of Economics at New York University, 2006, the Department of Economics, Columbia University, 2006. The opinions in this paper are those of the authors and do not necessarily reflect the views of the European Central Bank. Please address any comments to Catherine Doz [email protected]; Domenico Giannone [email protected]; or Lucrezia Reichlin [email protected]


1 Introduction

Many problems in economics must be addressed by studying the dynamic behavior of large panels of time series. When data show strong cross-sectional correlation, as is the case in macroeconomic and financial time series, factor models offer a realistic parsimonious representation since, in those models, few factors are assumed to drive the essential covariation of the data. In the last decade, the econometric literature has borrowed this fundamental idea of factor analysis and developed estimators for the factors suitable for large dimensional data. The fundamental result of this literature has been to show that principal components are consistent estimators of the common factors for both the cross-sectional dimension n and the sample size T going to infinity (Forni, Hallin, Lippi, and Reichlin, 2000, 2005; Stock and Watson, 2002a,b). This result has also been proved to hold for very general assumptions on the cross-correlation of the idiosyncratic components. Principal components have been considered as the solution of a computational problem since they can be easily computed even if the cross-sectional dimension n is large and possibly larger than the sample size T. More fundamentally, principal components are feasible since consistency can be achieved for any path of n and T (Bai, 2003; Bai and Ng, 2002; Forni, Giannone, Lippi, and Reichlin, 2007).

Maximum likelihood estimation, which was standard in traditional factor analysis applied to small models (Geweke, 1977; Sargent and Sims, 1977; Geweke and Singleton, 1980; Watson and Engle, 1983; Stock and Watson, 1989), was discarded from the start as a viable strategy for factor models in large panels. Bai (2003, p. 138), for example, states that maximum likelihood is infeasible for large dimensional data because it requires the estimation of too many parameters and uses this as a motivation for principal components.

However, maximum likelihood estimation is clearly more appealing than principal components not only because it may lead to efficiency gains, but, most importantly, because it provides a framework for incorporating restrictions derived from economic theory in the model. Likelihood based methods for extracting common factors from large cross-sections have indeed been used by Bernanke, Boivin, and Eliasz (2005); Boivin and Giannoni (2005); Kose, Otrok, and Whiteman (2003); Reis and Watson (2007).

For these reasons, establishing the properties of maximum likelihood for factor models in large panels of time series is a relevant question from both the theoretical and the applied point of view. This is the objective of this paper.

We estimate a model with orthogonal idiosyncratic elements (exact factor model) and derive the n, T rates of convergence for the maximum likelihood estimates of the common factors. Our consistency result shows that the expected value of the estimated common factors converges to the true factors as n, T → ∞ along any path (we also provide the consistency rates). The estimator is robust to misspecification of the cross-sectional and time series correlation of the idiosyncratic components.

The central idea of our analysis is to treat the exact factor model as a misspecified approximating model and analyze the properties, for n and T going to infinity, of the maximum likelihood estimator of the factors under misspecification, that is when the true probabilistic model is approximated by a more restricted model. This is a quasi maximum likelihood estimator (QML) in the sense of White (1982). QML estimators of a factor model have already been considered in the literature. Sentana and Shah (1994) advocate the use of maximum likelihood rather than principal components and consider the Gaussian QML estimation of a static exact factor model when the series are in fact non-Gaussian. Doz and Lenglart (1999) have studied the properties of QML estimators in a dynamic exact factor model, when serial correlation is omitted in the approximating model. Neither paper, however, considers large panels and, in their asymptotic analysis, n is fixed while T tends to infinity.

Computationally, classical likelihood based methods present no problem in the large n case. Under standard parameterizations, the factor model can in fact be cast in a state space form and the likelihood can be maximized via the EM algorithm which requires at each iteration only one run of the Kalman smoother (Watson and Engle, 1983). The computational complexity of the smoother depends essentially on the number of common factors which is typically small. The intuition of why this works was first suggested by Quah and Sargent (1992) who estimated a model with n = 60. Furthermore, principal components can be used to initialize the numerical algorithm for maximum likelihood estimation. Recent results by Jungbacker and Koopman (2008) show that computational efficiency can be further improved.

The paper is organized as follows. Section two states the assumptions for the data generating process (“the model generating the data”) and those for the model we will use in estimation (“approximating model”). Section three states the basic proposition showing consistency and rates for the quasi maximum likelihood estimator. Section four discusses the relations between the quasi maximum likelihood and principal components estimators. Section five illustrates the empirical results with a Monte Carlo study and Section six concludes.

Notation

For any positive definite square matrix $M$, we will denote by $\lambda_{\max}(M)$ ($\lambda_{\min}(M)$) its largest (smallest) eigenvalue. Moreover, for any matrix $M$ we will denote by $\|M\|$ the spectral norm defined as $\|M\| = \sqrt{\lambda_{\max}(M'M)}$. Given a stochastic process $\{X_{n,T};\ T \in \mathbb{Z},\ n \in \mathbb{Z}\}$ and a real sequence $\{a_{n,T};\ T \in \mathbb{Z},\ n \in \mathbb{Z}\}$, we will say that $X_{n,T} = O_P\left(\frac{1}{a_{n,T}}\right)$ as $n, T \to \infty$ if the probability that $a_{n,T} X_{n,T}$ is bounded tends to one as $n, T \to \infty$.

2 Models

2.1 The approximate dynamic factor model

We suppose that an n-dimensional zero-mean stationary process $x_t$ is the sum of two unobservable components:

$$x_t = \Lambda_0 f_t + e_t \qquad (2.1)$$


where $f_t = (f_{1t}, \ldots, f_{rt})'$, the common factors, is an r-dimensional stationary process with mean zero; $\Lambda_0$, the factor loadings, is an $n \times r$ matrix; $e_t = (e_{1t}, \ldots, e_{nt})'$, the idiosyncratic components, is an n-dimensional stationary process with mean zero and covariance matrix $E(e_t e_t') = \Psi_0$, whose entries will be denoted by $E(e_{it} e_{jt}) = \psi_{0,ij}$. The common factors $f_t$ and the idiosyncratic component $e_t$ are assumed to be uncorrelated at all leads and lags, that is $E(f_{jt} e_{is}) = 0$ for all $j = 1, \ldots, r$, $i = 1, \ldots, n$ and $t, s \in \mathbb{Z}$. The number of common factors r is typically much smaller than the cross-sectional dimension n.

Of course, in equation (2.1) $f_t$ and $\Lambda_0$ are defined only up to an invertible matrix. However, it is possible to impose a normalization constraint such as, for instance, $E(f_t f_t') = I_r$, in which case this invertible matrix has to be an orthogonal one.¹ Due to this indeterminacy, only the subspaces spanned by the columns of $(f_1', \ldots, f_T')$, or by the columns of $\Lambda_0$, have in fact to be estimated.

Given a sample of size T, we will denote by capital letters the matrices collecting all the observations, that is $X = (x_1, \ldots, x_T)'$ is the $T \times n$ matrix of observables, $F = (f_1, \ldots, f_T)'$ is the $T \times r$ matrix of common factors and $E = (e_1, \ldots, e_T)'$. All these quantities depend on the size of the cross-section and on the sample size. For notational convenience we will not index them by n, T.

Our goal is to estimate the common factors $F$, given the observations $X$. For this purpose we need to impose some assumptions and we will use those defining an approximate dynamic factor structure. “Approximate” stands for a model that allows for limited cross-correlation among idiosyncratic components (Chamberlain and Rothschild, 1983). This is to be distinguished from the “exact factor structure” whose idiosyncratic elements are restricted to be cross-sectionally orthogonal.² The model is dynamic since we allow for weak serial correlation of the common factors and the idiosyncratic components; see Assumption B below. Approximate factor models for dynamic panels have been studied, under similar assumptions, by Bai and Ng (2002, 2006); Forni, Giannone, Lippi, and Reichlin (2007); Forni, Hallin, Lippi, and Reichlin (2000, 2005); Stock and Watson (2002a,b).

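Assumption A (Approximate factor model)

A1 $0 < \underline{\lambda} < \liminf_{n \to \infty} \frac{1}{n} \lambda_{\min}(\Lambda_0' \Lambda_0) \le \limsup_{n \to \infty} \frac{1}{n} \lambda_{\max}(\Lambda_0' \Lambda_0) < \bar{\lambda} < \infty$

A2 $0 < \underline{\psi} < \liminf_{n \to \infty} \lambda_{\min}(\Psi_0) \le \limsup_{n \to \infty} \lambda_{\max}(\Psi_0) < \bar{\psi} < \infty$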

Assumption B

There exists a positive constant $M$ such that for all $i, j \in \mathbb{N}$ and for all $T \in \mathbb{Z}$:

i) $E\left( \sqrt{T} \left( \frac{1}{T} \sum_{t=1}^{T} e_{it} e_{jt} - \psi_{0,ij} \right) \right)^2 < M$

ii) $E\left\| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} f_t e_{jt} \right\|^2 < M$

iii) $E\left\| \sqrt{T} \left( \frac{1}{T} \sum_{t=1}^{T} f_t f_t' - I_r \right) \right\|^2 < M$

¹ In the framework we choose below, it will however appear that it is more convenient to choose another normalization constraint.

² Exact factor models have been studied and applied in econometrics by Watson and Engle (1983); Geweke (1977); Kose, Otrok, and Whiteman (2003); Quah and Sargent (1992); Sargent and Sims (1977); Stock and Watson (1991), among others.

Assumption A1 entails that for n sufficiently large $\Lambda_0' \Lambda_0 / n$ has full rank r. Under this assumption the common factors are required to remain pervasive as we increase the number of series in the data set. Assumption A2 limits the cross-correlation of the idiosyncratic components. While it includes the case in which they are mutually orthogonal (“exact factor model”), it allows for a more general structure.
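As a rough numerical illustration of what Assumptions A1 and A2 require (this check is not part of the paper), the sketch below draws hypothetical standard normal loadings and uses an illustrative idiosyncratic covariance with entries $\tau^{|i-j|}$. The function name `assumption_a_diagnostics` and the choice of covariance are assumptions made only for this example. It reports the extreme eigenvalues of $\Lambda_0'\Lambda_0/n$ and of $\Psi_0$, which should stay bounded away from zero and infinity as n grows.

```python
import numpy as np

def assumption_a_diagnostics(n, r=2, tau=0.5, seed=0):
    """Extreme eigenvalues of Lambda0'Lambda0/n and Psi0 for an illustrative design."""
    rng = np.random.default_rng(seed)
    Lam0 = rng.standard_normal((n, r))                 # hypothetical loadings
    idx = np.arange(n)
    Psi0 = tau ** np.abs(idx[:, None] - idx[None, :])  # illustrative covariance tau^|i-j|
    load_eigs = np.linalg.eigvalsh(Lam0.T @ Lam0 / n)  # ascending order
    psi_eigs = np.linalg.eigvalsh(Psi0)
    return load_eigs[0], load_eigs[-1], psi_eigs[0], psi_eigs[-1]

for n in (10, 100, 1000):
    print(n, assumption_a_diagnostics(n))
```

For this particular covariance the eigenvalues of $\Psi_0$ are known to lie between $(1-\tau)/(1+\tau)$ and $(1+\tau)/(1-\tau)$ for every n, so A2 holds even though the idiosyncratic terms are cross-correlated.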

Assumption B requires that the entries of the sample covariance matrix of the common factors and the idiosyncratic components are $\sqrt{T}$-consistent for their population counterparts, uniformly with respect to the cross-sectional dimension.

2.2 The approximating model

Our goal is to estimate the common factors, given the observations, by maximum likelihood. For this purpose, we need to impose a parameterization that is sufficiently parsimonious. In the exact factor model parsimony is achieved by restricting the cross-correlation among idiosyncratic components to be zero. Once this restriction is relaxed, as in Assumption A2, there is no obvious way to model the cross-sectional correlation among idiosyncratic terms since in the cross-section there is no natural order.

To solve this problem, our strategy is to estimate the exact factor model, considered as a possibly misspecified approximation to model (2.1), and then prove that the effects of misspecification due to the approximation vanish as n, T → ∞, under Assumptions A and B.

Let us define the approximating model.

Approximating model: exact dynamic factor model

R1 The common factors follow a finite order Gaussian VAR: $A(L) f_t = u_t$, with $A(L) = I - A_1 L - \ldots - A_p L^p$ an $r \times r$ filter of finite length p with roots outside the unit circle, and $u_t$ an r-dimensional Gaussian white noise, $u_t \sim \text{i.i.d. } N(0, I_r)$.

R2 The idiosyncratic components are cross-sectionally independent Gaussian white noises: $e_t \sim \text{i.i.d. } N(0, \Psi_d)$ where $\Psi_d$ is a diagonal matrix.


Note that, in condition R1, we fix $V u_t = I_r$ but this is only a normalization condition.³ The orthogonality restriction among the idiosyncratic components (R2) is key to maintaining parsimony and identification.⁴

Under R1 and R2, the model can be cast in a state space form with the number of states equal to the number of common factors r. For any set of parameters the likelihood can then be evaluated recursively using the Kalman filter.
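The recursive likelihood evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it restricts the factor dynamics to a VAR(1), $f_t = A f_{t-1} + u_t$, starts the filter from the stationary distribution of $f_t$, and inverts the $n \times n$ innovation covariance directly instead of exploiting the diagonal structure of $\Psi_d$. The function name `kalman_loglik` and its arguments are hypothetical.

```python
import numpy as np

def kalman_loglik(X, Lam, psi_diag, A):
    """Gaussian log-likelihood of x_t = Lam f_t + e_t, f_t = A f_{t-1} + u_t.

    X: T x n data, Lam: n x r loadings, psi_diag: length-n idiosyncratic
    variances, A: r x r VAR(1) matrix (assumed stable), Var(u_t) = I_r.
    """
    T, n = X.shape
    r = Lam.shape[1]
    Psi = np.diag(psi_diag)
    # Stationary state covariance solves P = A P A' + I_r.
    P = np.linalg.solve(np.eye(r * r) - np.kron(A, A), np.eye(r).ravel()).reshape(r, r)
    f = np.zeros(r)
    loglik = 0.0
    for t in range(T):
        if t > 0:  # prediction step (the stationary prior serves as the first prediction)
            f = A @ f
            P = A @ P @ A.T + np.eye(r)
        v = X[t] - Lam @ f                       # one-step-ahead prediction error
        S_innov = Lam @ P @ Lam.T + Psi          # its covariance
        _, logdet = np.linalg.slogdet(S_innov)
        loglik += -0.5 * (n * np.log(2 * np.pi) + logdet + v @ np.linalg.solve(S_innov, v))
        K = np.linalg.solve(S_innov, Lam @ P).T  # Kalman gain P Lam' S^{-1}
        f = f + K @ v                            # update step
        P = P - K @ Lam @ P
    return loglik
```

Only the state vector of dimension r is propagated through time. A production implementation would typically avoid the $n \times n$ solve, which is where the cost grows with n in this naive version.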

The model is characterized by the triplet $\Lambda$, $\Psi_d$, $A(L)$. All the parameters will be collected into $\theta \in \Theta$, where $\Theta$ is the parameter space defined by R1 and R2.

Given the quasi maximum likelihood estimates of the parameters $\hat{\theta}$, the common factors can be approximated by their estimated expected value, which can be computed using the Kalman smoother:⁵

$$F_{\hat{\theta}} = E_{\hat{\theta}}[F \mid X]$$

where $F_{\hat{\theta}} = (f_{\hat{\theta} 1}, \ldots, f_{\hat{\theta} T})'$.

In the next Section we study the properties of a maximum likelihood estimator in which the data follow a factor model that is dynamic and approximate (Assumptions A and B), while we restrict the approximating model to be exact, with non serially correlated idiosyncratic components and autoregressive common factors (R1 and R2). This is a Quasi Maximum Likelihood (QML) estimator in the sense of White (1982). Heuristically, we will ask what price one pays by using an estimation model which is misspecified in the way we have described.

Notice that the asymptotic properties of the estimator are known for n fixed and T → ∞ and under the assumption that data are generated from an “exact factor structure” (see Watson and Engle, 1983; Stock and Watson, 1991, for example). The analysis in the next Section extends previous studies by considering joint n, T asymptotics and the more general assumption that the data are generated from an “approximate dynamic factor structure”.

Let us remark that QML estimators of a factor model have already been considered in the literature. Sentana and Shah (1994) advocate the use of this approach rather than PCA and consider the Gaussian QML estimation of a static exact factor model when the series are in fact non-Gaussian. Doz and Lenglart (1999) have studied the properties of QML estimators in a dynamic exact factor model, when serial correlation is omitted in the approximating model. Neither paper, however, considers large panels and in their framework n is fixed while T tends to infinity.

To our knowledge our paper is the first to explore maximum likelihood estimation with joint n, T asymptotics.

³ Indeed, if $M$ is an invertible matrix, and if $g_t = M f_t$, then $g_t$ also follows a finite order Gaussian VAR: $B(L) g_t = v_t$, with $v_t = M u_t$. If $V v_t$ is supposed to be equal to $I_r$ then $M$ has to be an orthogonal matrix.

⁴ We could also take into account serial correlation of the idiosyncratic components without compromising the parsimony of the model by modelling it as a cross-sectionally orthogonal autoregressive process. We do not consider this case in order not to compromise expositional simplicity.

⁵ We write $E_{\hat{\theta}}[F \mid X]$ to denote $E_{\theta}[F \mid X]$ computed at $\theta = \hat{\theta}$.

3 The asymptotic properties of the QML estimator of the common factors

Let us introduce some further technical assumptions. First, to avoid degenerate solutions for the maximum likelihood problem, we will impose the following constraints in the maximization of the likelihood:

Constraints in the maximization of the likelihood

i) $0 < \underline{c} \le \psi_{ii} \le \bar{c} < \infty$ for all $i \in \mathbb{N}$.

ii) $|A(z)| \ne 0$, $\forall |z| \le 1$

Let $\hat{\theta}$ be the parameters estimated by maximum likelihood under the constraints (i) and (ii). We write $F_{\hat{\theta}}$ for the implied estimates of the common factors.

Constraints (i) and (ii) define a new parameter space $\Theta_c \subseteq \Theta$. These constraints are necessary to avoid situations in which the estimated parameters imply non-stationarity of the common factors and/or trivial situations in which the variance of the idiosyncratic noise is either zero or infinite. Then, with Assumption C below, we will ensure that the constraint on the size of the idiosyncratic component is never binding.

Assumption C

There exists $\delta > 0$ such that $\underline{c} \le \psi_{ii} - \delta \le \psi_{ii} + \delta \le \bar{c}$ for all $i \in \mathbb{N}$, where $\underline{c}$ and $\bar{c}$ are the constant terms defining the constrained maximization of the likelihood.

We are now ready to prove our main result.

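Proposition 1 Under Assumptions A, B and C we have:

$$\mathrm{trace}\left( \frac{1}{T} (F - F_{\hat{\theta}} H)' (F - F_{\hat{\theta}} H) \right) = O_P\left( \frac{1}{\Delta_{nT}} \right) \quad \text{as } n, T \to \infty$$

where $H = (F_{\hat{\theta}}' F_{\hat{\theta}})^{-1} F_{\hat{\theta}}' F$ is the coefficient of the OLS projection of $F$ on $F_{\hat{\theta}}$ and

$$\Delta_{nT} = \min\left\{ \sqrt{T}, \frac{n}{\log(n)} \right\}$$

is the consistency rate.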

Proof See the appendix.
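The quantity bounded in Proposition 1 is easy to compute in a simulation where the true factors are known. The sketch below is only illustrative; the function names `factor_loss` and `consistency_rate` are hypothetical and not from the paper. It evaluates the trace loss after projecting the true factors on the estimates, and the rate $\Delta_{nT}$.

```python
import numpy as np

def factor_loss(F, F_hat):
    """trace((F - F_hat H)'(F - F_hat H) / T), with H the OLS coefficient of F on F_hat."""
    T = F.shape[0]
    H = np.linalg.solve(F_hat.T @ F_hat, F_hat.T @ F)
    resid = F - F_hat @ H
    return np.trace(resid.T @ resid) / T

def consistency_rate(n, T):
    """Delta_nT = min(sqrt(T), n / log(n)); requires n > 1."""
    return min(np.sqrt(T), n / np.log(n))
```

Because of the projection on $F_{\hat{\theta}}$, the loss is zero whenever the estimates span the same space as the true factors, whatever the rotation or scaling.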

The result above tells us that the common factors extracted using the Quasi Maximum Likelihood estimates of the parameters converge to the true common factors as the sample size T and the cross-sectional dimension n go to infinity. No restriction on the relative path of divergence of T and n is needed in order to achieve consistency. In this sense the estimates are viable also when the size of the cross-section n is much larger than the sample size T.

Notice that since factors are identified only up to a rotation, the estimates convergeto a rotation of the common factors.

Remark 1: The result of Proposition 1 holds if the likelihood is maximized under any additional restrictions on $A(L)$.

Remark 2: The result of Proposition 1 still holds if the number of factors in the approximating model is larger than the true number of factors r.

The proofs of Remark 1 and Remark 2 are in the appendix.

4 Quasi Maximum Likelihood and Principal Components

Common factors in large cross-sections have been traditionally estimated by principal components. The latter are closely connected with the Quasi Maximum Likelihood estimator we propose here. Let us replace restrictions R1 and R2 by the stronger ones, R1∗ and R2∗:

Approximating model: exact static factor model with spherical idiosyncratic component

R1∗ $f_t \sim \text{i.i.d. } N(0, I_r)$

R2∗ $e_t \sim \text{i.i.d. } N(0, \sigma^2 I_n)$.

In this case the log likelihood takes the form:

$$L_X(X; \theta) = -\frac{nT}{2} \log 2\pi - \frac{T}{2} \log\left| \Lambda \Lambda' + \sigma^2 I_n \right| - \frac{T}{2} \mathrm{Tr}\left[ \left( \Lambda \Lambda' + \sigma^2 I_n \right)^{-1} S \right]$$

where $S = \frac{1}{T} X'X$ is the sample covariance matrix of the observations. Under the normalization that $\Lambda'\Lambda$ is a diagonal matrix with diagonal entries in decreasing order of magnitude, the maximum likelihood solution is:⁶

$$\hat{\Lambda} = V (D - \hat{\sigma}^2 I_r)^{1/2} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \mathrm{Trace}(S - \hat{\Lambda} \hat{\Lambda}')$$

where $D$ is the $r \times r$ diagonal matrix containing the r largest eigenvalues of the sample covariance matrix and $V$ is the $n \times r$ matrix whose columns are the corresponding normalized eigenvectors ($V'V = I_r$), that is $SV = VD$. The estimator for the common factors is given by

$$F_{\hat{\theta}} = E_{\hat{\theta}}[F \mid X] = X \left( \hat{\Lambda} \hat{\Lambda}' + \hat{\sigma}^2 I_n \right)^{-1} \hat{\Lambda} = X \hat{\Lambda} \left( \hat{\Lambda}' \hat{\Lambda} + \hat{\sigma}^2 I_r \right)^{-1} = X V (D - \hat{\sigma}^2 I_r)^{1/2} D^{-1}$$

which is proportional to the sample principal components $Z = (z_1, \ldots, z_T)'$, defined as $Z = X V D^{-1/2}$. The result of Proposition 1 still holds in this case. Consistency of the principal components estimates is a particular case of Proposition 1. For a formal proof see the Appendix.⁷ This provides an alternative proof of the result in Bai and Ng (2002) under a different set of assumptions.

⁶ See, for instance, Lawley and Maxwell (1963), Chap. 4.
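The closed-form solution under R1∗ and R2∗ can be checked numerically. The sketch below is an illustration, not code from the paper; the function name `spherical_ml` is hypothetical. It computes $\hat{\sigma}^2$ as the average of the $n - r$ smallest eigenvalues of $S$, which is the explicit solution of the implicit equation $\hat{\sigma}^2 = \frac{1}{n}\mathrm{Trace}(S - \hat{\Lambda}\hat{\Lambda}')$, and returns both the factor estimates and the principal components.

```python
import numpy as np

def spherical_ml(X, r):
    """Closed-form ML for the static factor model with spherical noise."""
    T, n = X.shape
    S = X.T @ X / T                               # sample covariance (data assumed centered)
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    d_all = eigvals[order]
    V = eigvecs[:, order[:r]]                     # n x r leading eigenvectors
    d = d_all[:r]
    sigma2 = d_all[r:].sum() / (n - r)            # solves sigma2 = Trace(S - Lam Lam') / n
    Lam = V * np.sqrt(d - sigma2)                 # V (D - sigma2 I_r)^{1/2}
    F_hat = X @ Lam @ np.linalg.inv(Lam.T @ Lam + sigma2 * np.eye(r))
    Z = X @ V / np.sqrt(d)                        # principal components X V D^{-1/2}
    return Lam, sigma2, F_hat, Z
```

Each column of `F_hat` equals the corresponding column of `Z` multiplied by $\sqrt{(d_j - \hat{\sigma}^2)/d_j}$, which is the proportionality stated in the text.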

5 Monte Carlo study

In this section we run a simulation study to assess the performance of the Quasi Maximum Likelihood estimator. The model from which we simulate is standard in the literature. A similar model has been used, for example, in Stock and Watson (2002a). Let us define it below.⁸

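$x_{it} = \sum_{j=1}^{r} \Lambda_{ij} f_{jt} + e_{it}$, in vector notation $x_t = \Lambda f_t + e_t$;

$A(L) f_t = u_t$, with $u_t$ i.i.d. $N(0, I_r)$;

$D(L) e_t = v_t$ with $v_t$ i.i.d. $N(0, \mathcal{T})$;

$A_{ij}(L) = \begin{cases} 1 - \rho L & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$ for $i, j = 1, \ldots, r$;

$D_{ij}(L) = \begin{cases} 1 - dL & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$ for $i, j = 1, \ldots, n$;

$\Lambda_{ij}$ i.i.d. $N(0, 1)$ for $i = 1, \ldots, n$, $j = 1, \ldots, r$;

$\alpha_i = \frac{\beta_i}{1 - \beta_i} \frac{1}{1 - \rho^2} \sum_{j=1}^{r} \Lambda_{ij}^2$ with $\beta_i$ i.i.d. $U([u, 1 - u])$;

$\mathcal{T}_{ij} = \tau^{|i-j|} (1 - d^2) \sqrt{\alpha_i \alpha_j}$ for $i, j = 1, \ldots, n$.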

The model allows for cross-correlation between idiosyncratic elements. Since $\mathcal{T}$ is a Toeplitz matrix, the cross-correlation among idiosyncratic elements is limited and it is easily seen that Assumption A2 is satisfied. The coefficient τ controls the amount of cross-correlation. The exact factor model corresponds to the case τ = 0.
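The data generating process above can be sketched as follows. This is an illustrative implementation, not the authors' simulation code: the function name `simulate_dgp`, the use of a burn-in period to approximate stationary starting values, and the seed handling are assumptions of this example. The default parameter values are those used for Table 1.

```python
import numpy as np

def simulate_dgp(n, T, r=1, rho=0.9, d=0.5, tau=0.5, u=0.1, burn=200, seed=0):
    """Simulate x_t = Lam f_t + e_t with AR(1) factors and AR(1), cross-correlated noise."""
    rng = np.random.default_rng(seed)
    Lam = rng.standard_normal((n, r))
    beta = rng.uniform(u, 1 - u, size=n)          # idiosyncratic share of total variance
    alpha = beta / (1 - beta) * (Lam ** 2).sum(axis=1) / (1 - rho ** 2)
    idx = np.arange(n)
    T_cov = (tau ** np.abs(idx[:, None] - idx[None, :])
             * (1 - d ** 2) * np.sqrt(np.outer(alpha, alpha)))
    chol = np.linalg.cholesky(T_cov)              # requires |tau| < 1
    total = T + burn                              # burn-in is an assumption of this sketch
    U = rng.standard_normal((total, r))
    V = rng.standard_normal((total, n)) @ chol.T  # rows have covariance T_cov
    F = np.zeros((total, r))
    E = np.zeros((total, n))
    for t in range(1, total):
        F[t] = rho * F[t - 1] + U[t]              # (1 - rho L) f_jt = u_jt
        E[t] = d * E[t - 1] + V[t]                # (1 - d L) e_it = v_it
    F, E = F[burn:], E[burn:]
    X = F @ Lam.T + E
    return X, F, Lam
```

With this scaling the variance of $e_{it}$ is $\alpha_i$ and the variance of the common component of $x_{it}$ is $\sum_j \Lambda_{ij}^2 / (1 - \rho^2)$, so $\beta_i$ is the idiosyncratic share of the total variance.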

⁷ Further, traditional factor analysis with non serially correlated data corresponds to the case $A(L) = I_r$. Also under this restriction we have consistency of the common factors estimates.

⁸ In what follows we denote by $M_{ij}$ the ij-th entry of the matrix $M$.

The coefficient $\beta_i$ is the ratio between the variance of the idiosyncratic component, $e_{it}$, and the total variance of the corresponding variable, $x_{it}$. In the simulation this ratio is drawn from a uniform distribution with an average of 50%. If u = .5 then the standardized observations have cross-sectionally homoscedastic idiosyncratic components.

Notice that if τ = 0, d = 0, our approximating model is well specified and hence Maximum Likelihood provides the most efficient estimates. If τ = 0, d = 0, ρ = 0, we have a static exact factor model and iteratively reweighted principal components provide the most efficient estimates. Finally, if τ = 0, d = 0, u = 1/2, then we have a static factor model with spherical idiosyncratic components on standardized variables. In this case principal components on standardized variables provide the most efficient estimates.

We generate the model for different sizes of the cross-section, n = 5, 10, 25, 50, 100, and for sample sizes T = 50, 100.

Maximum likelihood estimates are computed using the EM algorithm as in Watson and Engle (1983) and Quah and Sargent (1992). This algorithm has the advantage of requiring only one run of the Kalman smoother at each iteration. The computational complexity of the Kalman smoother depends mainly on the number of states which in our approximating model corresponds to the number of factors r and hence is independent of the size of the cross-section n.

To initialize the algorithm, we compute the first r sample principal components $z_t$ and estimate the parameters $\Lambda^{(0)}$, $\Psi_d^{(0)}$, $A^{(0)}(L)$ by OLS, treating the principal components as if they were the true common factors. Since these estimates have been proved to be consistent for large cross-sections (Bai, 2003; Forni, Giannone, Lippi, and Reichlin, 2007; Doz, Giannone, and Reichlin, 2007), the initialization is quite good when the cross-sectional dimension is large. We hence expect the number of iterations required for convergence to decrease as the cross-sectional dimension increases.

The two features highlighted above (a small number of state variables and a good initialization) make the algorithm feasible in a large cross-section.
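The initialization step can be sketched as below. This is a minimal illustration under stated assumptions: the function name `pca_init` is hypothetical, the factor dynamics are restricted to a VAR(1), and the idiosyncratic variances are taken as the mean squared OLS residuals without a degrees-of-freedom correction.

```python
import numpy as np

def pca_init(X, r):
    """Starting values from principal components and OLS (VAR(1) assumed for the factors)."""
    T, n = X.shape
    S = X.T @ X / T                                 # data assumed standardized
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:r]
    V, d = eigvecs[:, order], eigvals[order]
    Z = X @ V / np.sqrt(d)                          # principal components, Z'Z / T = I_r
    Lam0 = np.linalg.lstsq(Z, X, rcond=None)[0].T   # OLS of x_t on z_t, n x r
    resid = X - Z @ Lam0.T
    psi0 = (resid ** 2).mean(axis=0)                # diagonal of Psi_d
    A0 = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)[0].T  # OLS of z_t on z_{t-1}
    return Z, Lam0, psi0, A0
```

With the normalization $Z'Z/T = I_r$, the OLS loadings reduce to $V D^{1/2}$, so the starting values are cheap to compute even when n is large.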

To get the intuition of the EM algorithm, let us collect the initial values of the parameters in $\theta^{(0)}$. We obtain a new updated estimate of the common factors by applying the Kalman smoother:

$$f_{\theta^{(0)},t} = E_{\theta^{(0)}}(f_t \mid x_1, \ldots, x_T).$$

If we stop here we have the two-step estimates of the common factors studied by Doz, Giannone, and Reichlin (2007) and applied by Giannone, Reichlin, and Sala (2004); Giannone, Reichlin, and Small (2005).

A new estimate of the parameters, to be collected in $\theta^{(1)}$, can then be computed by treating $f_{\theta^{(0)},t}$ as if they were the true common factors. In this case the maximum likelihood estimates can be computed by OLS. If the OLS regressions are modified in order to take into account the fact that the common factors are estimated,⁹ then we have the EM algorithm, which converges to a local maximum of the likelihood.¹⁰

⁹ This requires the computation of $E_{\theta^{(m)}}\left[ (f_{\theta^{(m)},t} - f_t)(f_{\theta^{(m)},t-k} - f_{t-k})' \right]$, $k = 0, \ldots, p$, which are also computed by the Kalman smoother. See for example Watson and Engle (1983).
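To make the structure of one EM iteration concrete, the sketch below implements it for a simplified case that is not the paper's full algorithm: the static exact factor model, $A(L) = I_r$, where the Kalman smoother collapses to a cross-sectional projection. The function names `gaussian_loglik` and `em_step` are hypothetical. The term `np.eye(r) - beta @ Lam` is the conditional variance of the factors, which plays the role of the correction for estimated factors mentioned above.

```python
import numpy as np

def gaussian_loglik(S, T, Lam, psi):
    """Log-likelihood of T i.i.d. N(0, Lam Lam' + diag(psi)) draws with sample covariance S."""
    n = S.shape[0]
    Sigma = Lam @ Lam.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * T * (n * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(Sigma, S)))

def em_step(S, Lam, psi):
    """One EM update for the static exact factor model (simplified, no factor dynamics)."""
    r = Lam.shape[1]
    Sigma = Lam @ Lam.T + np.diag(psi)
    beta = np.linalg.solve(Sigma, Lam).T                # r x n, so E[f_t | x_t] = beta x_t
    C_xf = S @ beta.T                                   # average of x_t E[f_t | x_t]'
    C_ff = beta @ S @ beta.T + np.eye(r) - beta @ Lam   # average of E[f_t f_t' | x_t]
    Lam_new = np.linalg.solve(C_ff, C_xf.T).T           # OLS-like update C_xf C_ff^{-1}
    psi_new = np.diag(S - Lam_new @ C_xf.T).copy()      # residual variances
    return Lam_new, psi_new
```

Dropping the conditional variance term from `C_ff` would amount to treating the estimated factors as if they were the true ones, which is the plain OLS step described in the text.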

We control convergence by looking at

$$c_m = \frac{L_X(X; \theta^{(m)}) - L_X(X; \theta^{(m-1)})}{\left( L_X(X; \theta^{(m)}) + L_X(X; \theta^{(m-1)}) \right) / 2}.$$

We stop after $M$ iterations if $c_M < 10^{-4}$.
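A stopping rule of this kind could be coded as below. One detail is an assumption of this sketch rather than something stated in the text: the denominator is taken in absolute value, because log-likelihoods are often negative and the ratio as printed would then be negative for any improvement. The function names are hypothetical.

```python
def relative_loglik_change(ll_new, ll_old):
    """Change in log-likelihood relative to the average level (absolute value assumed)."""
    return (ll_new - ll_old) / (abs(ll_new + ll_old) / 2)

def has_converged(ll_new, ll_old, tol=1e-4):
    """Stop when the relative improvement falls below tol."""
    return relative_loglik_change(ll_new, ll_old) < tol
```

For example, moving from a log-likelihood of -110 to -100 gives a relative change of about 0.095, so the iterations would continue.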

We simulate the model 500 times and, at each repetition, we apply the algorithm to standardized data since the principal components used for initialization are not scale invariant.

We compute the following estimates of the common factors:

- principal components: $f_{pc,t} := z_t$;

- two-step estimates: $f_{2s,t} := f_{\theta^{(0)},t}$;

- maximum likelihood estimates: $f_{ml,t} := f_{\theta^{(M)},t}$.

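We measure the performance of the different estimators by means of the following trace statistic:

$$\frac{\mathrm{Tr}\left( F' \hat{F} (\hat{F}' \hat{F})^{-1} \hat{F}' F \right)}{\mathrm{Tr}(F'F)}$$

where $\hat{F} = (\hat{f}_1, \ldots, \hat{f}_T)'$, and $\hat{f}_t$ is any of the three estimates of the common factors.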

The trace statistic is a multivariate version of the R² of the regression of the true factors on the estimated factors. It is smaller than 1 and tends to 1 if the empirical canonical correlations between the true factors and their estimates tend to 1. It is thus an appropriate measure, since the common factors are only identified up to a rotation. A number close to one indicates a good approximation of the true common factors. Denoting by TRpc, TR2s and TRml the trace statistics for, respectively, principal component, two-step and maximum likelihood estimates of the common factors, we compute the relative trace statistics TRml/TRpc and TRml/TR2s. Numbers higher than one indicate that Maximum Likelihood estimates of the common factors are more accurate than principal components and two-step estimates.
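The trace statistic is straightforward to compute once the true and estimated factors are stored as $T \times r$ matrices. The sketch below is illustrative and the function name `trace_statistic` is hypothetical.

```python
import numpy as np

def trace_statistic(F, F_hat):
    """Tr(F' F_hat (F_hat' F_hat)^{-1} F_hat' F) / Tr(F' F)."""
    # Projection of the true factors on the column space of the estimates.
    proj = F_hat @ np.linalg.solve(F_hat.T @ F_hat, F_hat.T @ F)
    return np.trace(F.T @ proj) / np.trace(F.T @ F)
```

Since only the column space of `F_hat` enters the projection, the statistic equals one for any invertible rotation or rescaling of the true factors, which is why it suits factors identified only up to a rotation.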

Table 1 reports the results of the Monte Carlo experiment for one common factor, r = 1, with serial correlation in the common factor, ρ = .9, and in the idiosyncratic components, d = .5. The model is approximate because of the weak cross-sectional correlation among idiosyncratic components, τ = .5. Finally, the idiosyncratic component is cross-sectionally heteroscedastic, u = .1. The numbers in the table refer to the average across experiments.

We would like to stress the following results:

1. The precision of the common factors estimated by Maximum Likelihood improves with the size of the cross-section n.

¹⁰ A detailed derivation of the EM algorithm for dynamic factor models is provided by Ghahramani and Hinton (1996).

Table 1: Simulation results for the model: ρ = .9, d = .5, τ = .5, u = .1, r = 1

                             n = 5    n = 10   n = 25   n = 50   n = 100
TRml
  T = 50                     0.52     0.68     0.74     0.75     0.76
  T = 100                    0.64     0.78     0.84     0.85     0.86
Number of iterations
  T = 50                     13       9        5        4        3
  T = 100                    13       7        4        4        3
Computation time (seconds)
  T = 50                     0.53     0.25     0.20     0.33     1.07
  T = 100                    0.66     0.37     0.33     0.61     2.13
TRml/TRpc
  T = 50                     1.11     1.04     1.00     1.00     1.00
  T = 100                    1.09     1.02     1.01     1.00     1.00
TRml/TR2s
  T = 50                     1.03     1.01     1.00     1.00     1.00
  T = 100                    1.02     1.00     1.00     1.00     1.00

2. The number of iterations required for convergence is small and decreases with the size of the cross-section. The computation time for convergence does not increase much with the cross-sectional dimension. As remarked above, this is due to the fact that, as n increases, the initialization provided by principal components is increasingly accurate.

3. The Maximum Likelihood estimates always dominate simple principal compo-nents and to a less extent the two-step procedure. As both n, T become large theprecision of the estimated common factors improves and all methods tend to per-form similarly. This is not surprising, given that all methods provide consistentestimates for n and T large. Improvement of the ML estimates are significantfor n = 5 and the improvement is of the order of 10% with respect to principalcomponents and less than 5% for the two-step estimates. The two-step Kalmansmoother estimates already take appropriately into account the dynamics of thecommon factors and the cross-sectional heteroscedasticity of the idiosyncraticcomponent. Hence the gains from further iterations are small.11

Table 2 reports the results for r = 3 while the remaining parameters are the same as those used in Table 1: ρ = .9, d = .5, τ = .5, u = .1. The simulations have been run for n ≥ 10 only, because an exact factor model with n = 5 and r = 3 is not identified. Notice that although the main features outlined above are still present, the estimates of the common factors are less precise than in the case of only one common factor (given the same set of data, it is more difficult to extract additional factors). Improvements by the maximum likelihood estimator are more sizable in this case. This indicates that efficiency improvements are larger the harder the factor extraction is.

11 See also Doz, Giannone, and Reichlin (2007).

Table 2: Simulation results for the model: ρ = .9, d = .5, τ = .5, u = .1, r = 3

TRml
           n = 10   n = 25   n = 50   n = 100
T = 50     0.48     0.59     0.65     0.67
T = 100    0.58     0.75     0.80     0.82

Number of iterations
           n = 10   n = 25   n = 50   n = 100
T = 50     26       12       7        5
T = 100    20       9        5        4

Computation time: seconds
           n = 10   n = 25   n = 50   n = 100
T = 50     0.72     0.46     0.56     1.44
T = 100    1.08     0.68     0.87     2.31

TRml/TRpc
           n = 10   n = 25   n = 50   n = 100
T = 50     1.08     1.05     1.03     1.01
T = 100    1.10     1.06     1.02     1.01

TRml/TR2s
           n = 10   n = 25   n = 50   n = 100
T = 50     1.05     1.02     1.01     1.00
T = 100    1.07     1.03     1.00     1.00

We finally study a case in which the approximating model is well specified, that is, the idiosyncratic components are neither serially nor cross-sectionally correlated (d = 0, τ = 0). The remaining parameters are set as for the experiments reported in Tables 1 and 2. In this case, as one can see from Table 3 below, the efficiency gains from Quasi Maximum Likelihood estimates over principal components and two-step estimates are more relevant.

Summarizing, Quasi Maximum Likelihood estimates of approximate factor models work well in finite samples. Because of the explicit modelling of the dynamics and the cross-sectional heteroscedasticity, the maximum likelihood estimates dominate the principal components and, to a lesser extent, the two-step procedure. Efficiency improvements are relevant when the factor extraction is difficult, that is, when there are more common factors to estimate.

Table 3: Simulation results for the model: ρ = .9, d = 0, τ = 0, u = .1, r = 3

TRml
           n = 10   n = 25   n = 50   n = 100
T = 50     0.54     0.65     0.68     0.70
T = 100    0.66     0.78     0.81     0.82

Number of iterations
           n = 10   n = 25   n = 50   n = 100
T = 50     21       9        6        5
T = 100    15       7        5        4

Computation time: seconds
           n = 10   n = 25   n = 50   n = 100
T = 50     0.58     0.36     0.49     1.30
T = 100    0.83     0.54     0.84     2.29

TRml/TRpc
           n = 10   n = 25   n = 50   n = 100
T = 50     1.14     1.06     1.03     1.01
T = 100    1.19     1.06     1.02     1.01

TRml/TR2s
           n = 10   n = 25   n = 50   n = 100
T = 50     1.07     1.02     1.01     1.00
T = 100    1.10     1.01     1.00     1.00

6 Summary and conclusions

This paper is the first study of maximum likelihood based estimation for large factor models.

We show consistency of the estimated factors under different sources of misspecification with no constraints on the relative size of the cross-section n and the sample size T. This result implies that Quasi Maximum Likelihood is feasible for large cross-sections, also in cases in which the cross-sectional size is much larger than the sample size.

The estimator is easily implemented using the Kalman smoother and the EM algorithm as in traditional factor analysis and is viable in large panels since the computational complexity of the Kalman smoother depends mainly on the number of states. The latter, in our approximating model, corresponds to the number of factors and hence is independent of the size of the cross-section. The empirical analysis shows that efficiency improvements over principal components are relevant when the factor extraction is difficult, that is, when there are more common factors to estimate.

Besides the potential efficiency improvements over principal components, our parametric approach provides a natural framework for structural analysis since it allows for imposing restrictions on the factor structure as done, for example, in Bernanke, Boivin, and Eliasz (2005); Boivin and Giannoni (2005); Forni and Reichlin (2001); Kose, Otrok, and Whiteman (2003); Reis and Watson (2007). These features are not studied in this paper but they are natural extensions to be explored in further work.


References

Bai, J. (2003): “Inferential Theory for Factor Models of Large Dimensions,” Econometrica, 71(1), 135–171.

Bai, J., and S. Ng (2002): “Determining the Number of Factors in Approximate Factor Models,” Econometrica, 70(1), 191–221.

Bai, J., and S. Ng (2006): “Confidence Intervals for Diffusion Index Forecasts and Inference for Factor Augmented Regressions,” Econometrica, 74(4), 1133–1150.

Bernanke, B., J. Boivin, and P. Eliasz (2005): “Measuring Monetary Policy: A Factor Augmented Autoregressive (FAVAR) Approach,” Quarterly Journal of Economics, 120, 387–422.

Boivin, J., and M. P. Giannoni (2005): “DSGE Models in a Data-Rich Environment,” Manuscript, Columbia University.

Chamberlain, G., and M. Rothschild (1983): “Arbitrage, factor structure and mean-variance analysis in large asset markets,” Econometrica, 51, 1305–1324.

Doz, C., D. Giannone, and L. Reichlin (2007): “A Two-Step Estimator for Large Approximate Dynamic Factor Models Based on Kalman Filtering,” CEPR Discussion Papers 6043.

Doz, C., and F. Lenglart (1999): “Analyse factorielle dynamique : test du nombre de facteurs, estimation et application à l’enquête de conjoncture dans l’industrie,” Annales d’Economie et de Statistique, 97, 91–127.

Forni, M., D. Giannone, M. Lippi, and L. Reichlin (2007): “Opening the black box - structural factor models with large cross-sections,” Working Paper Series 712, European Central Bank.

Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000): “The Generalized Dynamic Factor Model: identification and estimation,” Review of Economics and Statistics, 82, 540–554.

Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2005): “The Generalized Dynamic Factor Model: one-sided estimation and forecasting,” Journal of the American Statistical Association, 100, 830–840.

Forni, M., and L. Reichlin (2001): “Federal Policies and Local Economies: Europe and the US,” European Economic Review, 45, 109–134.

Geweke, J. F. (1977): “The Dynamic Factor Analysis of Economic Time Series Models,” in Latent Variables in Socioeconomic Models, ed. by D. Aigner, and A. Goldberger, pp. 365–383. North-Holland.

Geweke, J. F., and K. J. Singleton (1980): “Maximum Likelihood “Confirmatory” Factor Analysis of Economic Time Series,” International Economic Review, 22, 37–54.


Ghahramani, Z., and G. E. Hinton (1996): “Parameter estimation for linear dynamical systems,” Discussion paper, Manuscript, University of Toronto, available at http://www.gatsby.ucl.ac.uk/ zoubin.

Giannone, D., L. Reichlin, and L. Sala (2004): “Monetary Policy in Real Time,” in NBER Macroeconomics Annual, ed. by M. Gertler, and K. Rogoff, pp. 161–200. MIT Press.

Giannone, D., L. Reichlin, and D. Small (2005): “Nowcasting GDP and inflation: the real-time informational content of macroeconomic data releases,” Finance and Economics Discussion Series 2005-42, Board of Governors of the Federal Reserve System (U.S.).

Jungbacker, B., and S. J. Koopman (2008): “Likelihood-based Analysis for Dynamic Factor Models,” Tinbergen Institute Discussion Papers 08-007/4.

Kose, M. A., C. Otrok, and C. H. Whiteman (2003): “International Business Cycles: World, Region, and Country-Specific Factors,” American Economic Review, 93, 1216–1239.

Lawley, D. N., and A. E. Maxwell (1963): Factor Analysis as a Statistical Method. Butterworths.

Quah, D., and T. J. Sargent (1992): “A Dynamic Index Model for Large Cross-Section,” in Business Cycle, ed. by J. Stock, and M. Watson, pp. 161–200. University of Chicago Press.

Reis, R., and M. W. Watson (2007): “Relative Goods’ Prices and Pure Inflation,” Manuscript, Princeton University.

Sargent, T. J., and C. Sims (1977): “Business Cycle Modelling without Pretending to have too much a-priori Economic Theory,” in New Methods in Business Cycle Research, ed. by C. Sims. Federal Reserve Bank of Minneapolis.

Sentana, E., and M. Shah (1994): “An Index of Co-Movements in Financial Time Series,” CEMFI Working Papers 9415.

Stock, J. H., and M. W. Watson (1989): “New Indexes of Coincident and Leading Economic Indicators,” in NBER Macroeconomics Annual, ed. by O. J. Blanchard, and S. Fischer, pp. 351–393. MIT Press.

Stock, J. H., and M. W. Watson (1991): “A probability model of the coincident economic indicators,” in The Leading Economic Indicators: New Approaches and Forecasting Records, ed. by G. Moore, and K. Lahiri, pp. 63–90. Cambridge University Press.

Stock, J. H., and M. W. Watson (2002a): “Forecasting Using Principal Components from a Large Number of Predictors,” Journal of the American Statistical Association, 97, 147–162.


Stock, J. H., and M. W. Watson (2002b): “Macroeconomic Forecasting Using Diffusion Indexes,” Journal of Business and Economic Statistics, 20, 147–162.

Watson, M. W., and R. F. Engle (1983): “Alternative algorithms for the estimation of dynamic factor, mimic and varying coefficient regression models,” Journal of Econometrics, 23(3), 385–400.

White, H. (1982): “Maximum Likelihood Estimation of Misspecified Models,” Econometrica, 50, 1–25.


7 Appendix

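We adopt the following notation to define the pseudo likelihood under the approximating model, which is completely characterized by the parameter $\theta$:

- $f_{(\mathbf{X},\mathbf{F})}(X, F; \theta)$ is the joint density of the common factors and the observables, depending on the parameter $\theta$,

- $f_{\mathbf{X}}(X; \theta)$ and $f_{\mathbf{F}}(F; \theta)$ are the corresponding marginal densities,

- $f_{\mathbf{X}|\mathbf{F}=F}(X; \theta)$ and $f_{\mathbf{F}|\mathbf{X}=X}(F; \theta)$ are the corresponding conditional densities,

where $F \in \mathbb{R}^{T \times r}$ and $X \in \mathbb{R}^{T \times n}$. From Bayes' formula, we know that, for any $(X, F)$:

$$f_{\mathbf{X}}(X; \theta) = \frac{f_{\mathbf{X}|\mathbf{F}=F}(X; \theta)\, f_{\mathbf{F}}(F; \theta)}{f_{\mathbf{F}|\mathbf{X}=X}(F; \theta)}.$$

The pseudo log-likelihood of the data $\mathcal{L}_{X}(X; \theta) = \log f_{\mathbf{X}}(X; \theta)$ can then be decomposed in the following way:

$$\mathcal{L}_{X}(X; \theta) = \mathcal{L}_{X|F}(X|F; \theta) + \mathcal{L}_{F}(F; \theta) - \mathcal{L}_{F|X}(F|X; \theta)$$

where $\mathcal{L}_{X|F}(X|F; \theta) = \log f_{\mathbf{X}|\mathbf{F}=F}(X; \theta)$, $\mathcal{L}_{F}(F; \theta) = \log f_{\mathbf{F}}(F; \theta)$ and $\mathcal{L}_{F|X}(F|X; \theta) = \log f_{\mathbf{F}|\mathbf{X}=X}(F; \theta)$.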
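The decomposition can be checked numerically in a small Gaussian example. The sketch below is illustrative only: it assumes a single time period with standard normal factors (the static case A(L) = I_r), and the names `log_normal_pdf`, `Lam` and `Psi` are hypothetical. It verifies that the identity holds at an arbitrary value of the factors.

```python
import numpy as np

def log_normal_pdf(x, mean, cov):
    """Log density of a N(mean, cov) vector evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d.size * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

rng = np.random.default_rng(1)
n, r = 6, 2
Lam = rng.standard_normal((n, r))            # loadings (assumed values)
Psi = np.diag(rng.uniform(0.5, 1.5, n))      # diagonal idiosyncratic covariance
x = rng.standard_normal(n)                   # an arbitrary observation
f = rng.standard_normal(r)                   # an arbitrary value of the factors

# Conditional distribution of the factors given x under the model.
Omega = np.linalg.inv(np.eye(r) + Lam.T @ np.linalg.solve(Psi, Lam))
f_post = Omega @ Lam.T @ np.linalg.solve(Psi, x)

lhs = log_normal_pdf(x, np.zeros(n), Lam @ Lam.T + Psi)      # log f_X
rhs = (log_normal_pdf(x, Lam @ f, Psi)                       # log f_{X|F}
       + log_normal_pdf(f, np.zeros(r), np.eye(r))           # log f_F
       - log_normal_pdf(f, f_post, Omega))                   # log f_{F|X}
print(lhs, rhs)
```

The two printed values agree for any choice of `f`, which is what allows the likelihood to be evaluated at a convenient value of the factors later in the argument.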

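Under the normality assumption, and denoting by $\mathbf{X}$ the actual observed values of the underlying process, we can write, for any value of $F$:

$$\mathcal{L}_{X|F}(\mathbf{X}|F; \theta) = -\frac{nT}{2}\log(2\pi) - \frac{T}{2}\log|\Psi_d| - \frac{1}{2}\mathrm{Tr}\,(\mathbf{X} - F\Lambda')\Psi_d^{-1}(\mathbf{X} - F\Lambda')'$$

$$\mathcal{L}_{F}(F; \theta) = -\frac{rT}{2}\log(2\pi) - \frac{1}{2}\log|\Phi_\theta| - \frac{1}{2}(\mathrm{vec}F')'\Phi_\theta^{-1}(\mathrm{vec}F')$$

$$\mathcal{L}_{F|X}(F|\mathbf{X}; \theta) = -\frac{rT}{2}\log(2\pi) - \frac{1}{2}\log|\Omega_\theta| - \frac{1}{2}(\mathrm{vec}(F - \mathbf{F}_\theta)')'\Omega_\theta^{-1}(\mathrm{vec}(F - \mathbf{F}_\theta)')$$

with

- $\mathbf{F}_\theta = E_\theta[\mathbf{F}|\mathbf{X}] = (f_{\theta,1}, ..., f_{\theta,T})'$,

- $\Phi_\theta = E_\theta[(\mathrm{vec}\mathbf{F}')(\mathrm{vec}\mathbf{F}')']$,

- $\Omega_\theta = E_\theta[(\mathrm{vec}(\mathbf{F} - \mathbf{F}_\theta)')(\mathrm{vec}(\mathbf{F} - \mathbf{F}_\theta)')']$.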


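We hence have, for any value of $F$:

$$\begin{aligned}
\mathcal{L}_{X}(\mathbf{X}; \theta) = {} & -\frac{nT}{2}\log(2\pi) - \frac{T}{2}\log|\Psi_d| - \frac{1}{2}\mathrm{Tr}\,(\mathbf{X} - F\Lambda')\Psi_d^{-1}(\mathbf{X} - F\Lambda')' \\
& - \frac{1}{2}\log|\Phi_\theta| - \frac{1}{2}(\mathrm{vec}F')'\Phi_\theta^{-1}(\mathrm{vec}F') + \frac{1}{2}\log|\Omega_\theta| + \frac{1}{2}(\mathrm{vec}(F - \mathbf{F}_\theta)')'\Omega_\theta^{-1}(\mathrm{vec}(F - \mathbf{F}_\theta)')
\end{aligned} \tag{7.2}$$

As (7.2) is true for any value of $F$, it can be applied with $F = \mathbf{F}_\theta$. Since $\mathbf{X} = \mathbf{F}\Lambda_0' + \mathbf{E}$, this gives:

$$\begin{aligned}
\mathcal{L}_{X}(\mathbf{X}; \theta) = {} & -\frac{nT}{2}\log(2\pi) - \frac{T}{2}\log|\Psi_d| - \frac{1}{2}\mathrm{Tr}\,(\mathbf{F}\Lambda_0' - \mathbf{F}_\theta\Lambda' + \mathbf{E})\Psi_d^{-1}(\mathbf{F}\Lambda_0' - \mathbf{F}_\theta\Lambda' + \mathbf{E})' \\
& - \frac{1}{2}\log|\Phi_\theta| - \frac{1}{2}\mathrm{vec}(\mathbf{F}_\theta')'\Phi_\theta^{-1}\mathrm{vec}(\mathbf{F}_\theta') + \frac{1}{2}\log|\Omega_\theta|
\end{aligned} \tag{7.3}$$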

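Let us now evaluate the likelihood at the following set of parameters:

$$\theta_0^c := \{A(L) = I_r;\ \Lambda = \Lambda_0;\ \Psi = \Psi_{0,d}\}$$

where $\Psi_{0,d}$ is the diagonal matrix obtained by setting equal to zero all the off-diagonal elements of $\Psi_0$.

For $\theta = \theta_0^c$, we have

$$\Omega_{\theta_0^c} = I_T \otimes \left(I_r - \Lambda_0'(\Lambda_0\Lambda_0' + \Psi_{0,d})^{-1}\Lambda_0\right) = I_T \otimes \left(I_r + \Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\right)^{-1}.^{12}$$

Moreover, $\Phi_{\theta_0^c} = I_{rT}$, so that we have:

$$\begin{aligned}
\mathcal{L}_{X}(\mathbf{X}; \theta_0^c) = {} & -\frac{nT}{2}\log(2\pi) - \frac{T}{2}\log|\Psi_{0,d}| - \frac{1}{2}\mathrm{Tr}\left((\mathbf{F} - \mathbf{F}_{\theta_0^c})\Lambda_0' + \mathbf{E}\right)\Psi_{0,d}^{-1}\left((\mathbf{F} - \mathbf{F}_{\theta_0^c})\Lambda_0' + \mathbf{E}\right)' \\
& - \frac{1}{2}\mathrm{Tr}\,\mathbf{F}_{\theta_0^c}'\mathbf{F}_{\theta_0^c} - \frac{T}{2}\log\left|I_r + \Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\right|.
\end{aligned} \tag{7.5}$$

As n and T go to infinity, (7.5) simplifies drastically since some of the terms are asymptotically negligible. This is shown as a corollary of the following Lemma.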

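12 For the last equality, notice that

$$\left(\Lambda_0\Lambda_0' + \Psi_{0,d}\right)^{-1} = \Psi_{0,d}^{-1} - \Psi_{0,d}^{-1}\Lambda_0\left(I_r + \Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\right)^{-1}\Lambda_0'\Psi_{0,d}^{-1} \tag{7.4}$$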
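Identity (7.4), and the simplification of the conditional covariance of the factors that it delivers, can be verified numerically. The following is a minimal sketch with randomly drawn, purely illustrative values for the loadings and the diagonal idiosyncratic covariance; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 8, 3
Lam0 = rng.standard_normal((n, r))               # stand-in for Lambda_0
Psi0d = np.diag(rng.uniform(0.5, 2.0, n))        # stand-in for Psi_{0,d}
Psi_inv = np.linalg.inv(Psi0d)
M = np.eye(r) + Lam0.T @ Psi_inv @ Lam0          # I_r + Lambda_0' Psi^{-1} Lambda_0

# Both sides of (7.4).
lhs_74 = np.linalg.inv(Lam0 @ Lam0.T + Psi0d)
rhs_74 = Psi_inv - Psi_inv @ Lam0 @ np.linalg.inv(M) @ Lam0.T @ Psi_inv

# The r x r block of the conditional covariance of the factors.
lhs_omega = np.eye(r) - Lam0.T @ lhs_74 @ Lam0
rhs_omega = np.linalg.inv(M)

print(np.allclose(lhs_74, rhs_74), np.allclose(lhs_omega, rhs_omega))
```

The right-hand sides only require inverting a diagonal n x n matrix and an r x r matrix, which is convenient when n is large.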


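Lemma 1 Under assumptions A, B, we have

1. $\left\|\frac{\mathbf{E}'\mathbf{E}}{nT}\right\| = O_p\left(\frac{1}{n}\right) + O_p\left(\frac{1}{\sqrt{T}}\right)$ as $n, T \to \infty$

2. $\frac{1}{T}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\theta_0^c})'(\mathbf{F} - \mathbf{F}_{\theta_0^c}) = O_p\left(\frac{1}{n}\right) + O_p\left(\frac{1}{\sqrt{T}}\right)$ as $n, T \to \infty$

3. $\frac{1}{nT}\mathrm{Tr}\,(\mathbf{E}\Psi_{0,d}^{-1}\mathbf{E}') = 1 + O_p\left(\frac{1}{\sqrt{T}}\right)$ as $n, T \to \infty$

4. $\frac{1}{T}\mathrm{Tr}\,\mathbf{F}_{\theta_0^c}'\mathbf{F}_{\theta_0^c} = O_p(1)$ as $n, T \to \infty$

5. $\frac{1}{n}\log\left|I_r + \Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\right| = O\left(\frac{\log(n)}{n}\right)$ as $n \to \infty$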

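Proof We have:

$$\left\|\frac{\mathbf{E}'\mathbf{E}}{nT}\right\| \le \frac{1}{n}\|\Psi_0\| + \frac{1}{n}\left\|\frac{\mathbf{E}'\mathbf{E}}{T} - \Psi_0\right\|$$

and

$$\left\|\frac{1}{n}\left(\frac{\mathbf{E}'\mathbf{E}}{T} - \Psi_0\right)\right\|^2 \le \frac{1}{n^2}\mathrm{trace}\left[\left(\frac{\mathbf{E}'\mathbf{E}}{T} - \Psi_0\right)'\left(\frac{\mathbf{E}'\mathbf{E}}{T} - \Psi_0\right)\right] = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\frac{1}{T}\sum_{t=1}^{T}e_{it}e_{jt} - \psi_{0,ij}\right)^2$$

Taking expectations, from assumption B we obtain:

$$\frac{1}{n^2}E\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\frac{1}{T}\sum_{t=1}^{T}e_{it}e_{jt} - \psi_{0,ij}\right)^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}E\left(\frac{1}{T}\sum_{t=1}^{T}e_{it}e_{jt} - \psi_{0,ij}\right)^2 \le \frac{M}{T}$$

Result 1 hence follows from the Markov inequality.

Let us turn now to result 2. First, we have: $\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\theta_0^c})'(\mathbf{F} - \mathbf{F}_{\theta_0^c}) \le r\|\mathbf{F} - \mathbf{F}_{\theta_0^c}\|^2$.

Then, using (7.4), we have:

$$\mathbf{F}_{\theta_0^c} = \mathbf{X}\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1} = \mathbf{F}\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1} + \mathbf{E}\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1}$$

so that:

$$\begin{aligned}
\frac{1}{T}\|\mathbf{F} - \mathbf{F}_{\theta_0^c}\|^2 & \le \frac{2}{T}\left[\left\|\mathbf{F}\left(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1} - I_r\right)\right\|^2 + \left\|\mathbf{E}\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1}\right\|^2\right] \\
& \le 2\left\|\frac{1}{\sqrt{T}}\mathbf{F}\right\|^2\left\|\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1} - I_r\right\|^2 \\
& \quad + 2\left\|\frac{1}{\sqrt{nT}}\mathbf{E}\right\|^2\left\|\sqrt{n}\,\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1}\right\|^2
\end{aligned}$$

Assumptions A imply:

$$\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1} - I_r = -(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1} = O\left(\frac{1}{n}\right) \text{ as } n \to \infty$$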


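Further, we have:

$$\left\|\Psi_{0,d}^{-1}\Lambda_0(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1}\right\| \le \left\|\Lambda_0'\Psi_{0,d}^{-1/2}\right\|\left\|\Psi_{0,d}^{-1/2}\right\|\left\|(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0 + I_r)^{-1}\right\|.$$

As assumptions A also imply:

$$\left\|\Psi_{0,d}^{-1/2}\right\| \le \frac{1}{\sqrt{\lambda_{\min}(\Psi_0)}} = O(1) \text{ as } n \to \infty$$

and

$$\left\|\Lambda_0'\Psi_{0,d}^{-1/2}\right\| = \left\|\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\right\|^{1/2} \le \frac{1}{\sqrt{\lambda_{\min}(\Psi_0)}}\left\|\Lambda_0'\Lambda_0\right\|^{1/2} = O(\sqrt{n}) \text{ as } n \to \infty$$

Result 2 then follows from the previous result of this lemma and the fact that by assumption B we have $\left\|\frac{1}{\sqrt{T}}\mathbf{F}\right\| = O_p(1)$.

Result 3 is a direct consequence of Assumption B (i), Assumption C and the Markov inequality. In fact:

$$\frac{1}{nT}\mathrm{Tr}\,(\mathbf{E}\Psi_{0,d}^{-1}\mathbf{E}') = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\frac{1}{T}\sum_{t=1}^{T}e_{it}^2}{\psi_{0,ii}}\right) = 1 + O_p\left(\frac{1}{\sqrt{T}}\right).$$

To obtain result 4, notice that:

$$\frac{1}{T}\mathrm{Tr}\,\mathbf{F}_{\theta_0^c}'\mathbf{F}_{\theta_0^c} \le \frac{r}{T}\left\|\mathbf{F}_{\theta_0^c}\right\|^2 = \frac{r}{T}\left\|\mathbf{F} + \mathbf{F}_{\theta_0^c} - \mathbf{F}\right\|^2 \le 2r\left(\left\|\frac{1}{\sqrt{T}}\mathbf{F}\right\|^2 + \frac{1}{T}\left\|\mathbf{F} - \mathbf{F}_{\theta_0^c}\right\|^2\right)$$

As $\|\mathbf{F} - \mathbf{F}_{\theta_0^c}\|^2 \le \mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\theta_0^c})'(\mathbf{F} - \mathbf{F}_{\theta_0^c})$, the desired rate follows from Assumption B (iii) and result 2.

Concerning result 5, notice that, by assumptions A:

$$\log\left|I_r + \Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\right| = r\log(n) + \log\left|\frac{I_r}{n} + \frac{\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0}{n}\right|,$$

with:

$$\log\left|\frac{I_r}{n} + \frac{\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0}{n}\right| \simeq \log\left|\frac{\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0}{n}\right| \le r\log\frac{\lambda_{\max}\left(\frac{\Lambda_0'\Lambda_0}{n}\right)}{\lambda_{\min}(\Psi_0)} = O(1) \text{ as } n \to \infty.$$

Q.E.D.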

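Corollary Under the same assumptions as Lemma 1, we have:

$$\frac{1}{nT}\mathcal{L}_{X}(\mathbf{X}; \theta_0^c) = -\frac{1}{2}\log(2\pi) - \frac{1}{2n}\log|\Psi_{0,d}| - \frac{1}{2} + O_p\left(\frac{\log(n)}{n}\right) + O_p\left(\frac{1}{\sqrt{T}}\right), \text{ as } n, T \to \infty$$

Proof

The only term for which the asymptotic behavior is not a direct consequence of Lemma 1 is the following: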


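$$\begin{aligned}
& \frac{1}{nT}\mathrm{Tr}\left((\mathbf{F} - \mathbf{F}_{\theta_0^c})\Lambda_0' + \mathbf{E}\right)\Psi_{0,d}^{-1}\left((\mathbf{F} - \mathbf{F}_{\theta_0^c})\Lambda_0' + \mathbf{E}\right)' \\
& = \frac{1}{nT}\mathrm{Tr}\,\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0(\mathbf{F} - \mathbf{F}_{\theta_0^c})'(\mathbf{F} - \mathbf{F}_{\theta_0^c}) + 2\frac{1}{nT}\mathrm{Tr}\,\Lambda_0'\Psi_{0,d}^{-1}\mathbf{E}'(\mathbf{F} - \mathbf{F}_{\theta_0^c}) + \frac{1}{nT}\mathrm{Tr}\,\Psi_{0,d}^{-1}\mathbf{E}'\mathbf{E}
\end{aligned}$$

Let us analyze the three terms in the summation separately.

The asymptotic behavior of the third term in the summation has been stated in Lemma 1 (3).

The asymptotic behavior of the first term follows from Assumption A and Lemma 1 (2):

$$\frac{1}{nT}\mathrm{Tr}\,\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0(\mathbf{F} - \mathbf{F}_{\theta_0^c})'(\mathbf{F} - \mathbf{F}_{\theta_0^c}) \le \frac{1}{nT}\lambda_{\max}\left(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\right)\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\theta_0^c})'(\mathbf{F} - \mathbf{F}_{\theta_0^c})$$

We know (see the proof of Lemma 1) that $\frac{1}{n}\lambda_{\max}\left(\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\right) = \frac{1}{n}\|\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0\| = O(1)$.

It then directly follows from Lemma 1 (2) that:

$$\frac{1}{nT}\mathrm{Tr}\,\Lambda_0'\Psi_{0,d}^{-1}\Lambda_0(\mathbf{F} - \mathbf{F}_{\theta_0^c})'(\mathbf{F} - \mathbf{F}_{\theta_0^c}) = O_p\left(\frac{1}{n}\right) + O_p\left(\frac{1}{\sqrt{T}}\right)$$

For the second term:

$$\frac{1}{nT}\mathrm{Tr}\,\Lambda_0'\Psi_{0,d}^{-1}\mathbf{E}'(\mathbf{F} - \mathbf{F}_{\theta_0^c}) \le r\left\|\frac{\mathbf{E}'\mathbf{E}}{nT}\right\|^{1/2}\left\|\frac{\Lambda_0'\Lambda_0}{n}\right\|\frac{1}{(\lambda_{\min}\Psi_{0,d})^2}\frac{1}{\sqrt{T}}\|\mathbf{F} - \mathbf{F}_{\theta_0^c}\| = O_p\left(\frac{1}{n}\right) + O_p\left(\frac{1}{\sqrt{T}}\right)$$

where the last equality follows from Lemma 1 (1-2) and Assumptions A and B.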

The asymptotic simplification of the likelihood in the Corollary above is due to the fact that, under the simple approximating model, the expected common factors converge to the true ones (Lemma 1 (2)). The expected values of the common factors, $\mathbf{F}_{\theta_0^c}$, are essentially the coefficients of an OLS regression of the observations, $\mathbf{X}$, on the factor loadings, $\Lambda_0$. If data are Gaussian and the restrictions in $\theta_0^c$ are satisfied, then such estimates of the common factors are the most efficient. However, the estimates are still consistent under the weaker assumptions A (i) and A (ii). This result also tells us that a large cross-section solves the common factors indeterminacy we have with a finite cross-section dimension.
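The convergence of the expected factors to the true ones as the cross-section grows can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not the paper's Monte Carlo design: the factor model is exact and static, with i.i.d. standard normal factors and loadings, and the function name `factor_projection_error` is hypothetical.

```python
import numpy as np

def factor_projection_error(n, T=200, r=2, seed=3):
    """Average squared gap between true factors and their expected values given X."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))                  # true factors
    Lam0 = rng.standard_normal((n, r))               # true loadings
    psi = rng.uniform(0.5, 1.5, n)                   # diagonal of Psi_{0,d}
    E = rng.standard_normal((T, n)) * np.sqrt(psi)   # idiosyncratic components
    X = F @ Lam0.T + E
    W = Lam0 / psi[:, None]                          # Psi_{0,d}^{-1} Lambda_0
    # Expected factors: X Psi^{-1} Lambda_0 (Lambda_0' Psi^{-1} Lambda_0 + I_r)^{-1}
    F_proj = X @ W @ np.linalg.inv(Lam0.T @ W + np.eye(r))
    return np.trace((F - F_proj).T @ (F - F_proj)) / T

err_small = factor_projection_error(n=10)
err_large = factor_projection_error(n=1000)
print(err_small, err_large)
```

With the larger cross-section the average squared gap is much smaller, in line with the rate in n given in Lemma 1.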

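Consider now the likelihood evaluated at its maximum, where $\hat\theta := \{\hat{A}(L);\ \hat\Lambda;\ \hat\Psi_d\}$ are the Maximum Likelihood estimates of the parameters, with $\hat\theta \in \Theta^c$. We will denote by $\mathbf{F}_{\hat\theta}$ the corresponding estimates of the common factors.

The likelihood at its maximum takes the form, see equation (7.2):

$$\begin{aligned}
\mathcal{L}_{X}(\mathbf{X}; \hat\theta) = {} & -\frac{nT}{2}\log(2\pi) - \frac{T}{2}\log|\hat\Psi_d| - \frac{1}{2}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')' \\
& - \frac{1}{2}\log|\Phi_{\hat\theta}| - \frac{1}{2}\mathrm{vec}(\mathbf{F}_{\hat\theta}')'\Phi_{\hat\theta}^{-1}\mathrm{vec}(\mathbf{F}_{\hat\theta}') + \frac{1}{2}\log|\Omega_{\hat\theta}|
\end{aligned}$$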


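Assumption C ensures that the constraints on the size of the idiosyncratic variance that are imposed in the maximization are not binding, that is, $\theta_0^c \in \Theta^c$. Consequently, $\mathcal{L}_{X}(\mathbf{X}; \hat\theta) \ge \mathcal{L}_{X}(\mathbf{X}; \theta_0^c)$. Using the Corollary, this implies:

$$\begin{aligned}
0 & \ge \frac{2}{nT}\left(\mathcal{L}_{X}(\mathbf{X}; \theta_0^c) - \mathcal{L}_{X}(\mathbf{X}; \hat\theta)\right) \\
& = -\frac{1}{n}\log|\Psi_{0,d}| - 1 + O_p\left(\frac{1}{\sqrt{T}}\right) + O_p\left(\frac{\log(n)}{n}\right) \\
& \quad + \frac{1}{n}\log|\hat\Psi_d| + \frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')' \\
& \quad + \frac{1}{nT}\log|\Phi_{\hat\theta}| + \frac{1}{nT}\mathrm{vec}(\mathbf{F}_{\hat\theta}')'\Phi_{\hat\theta}^{-1}\mathrm{vec}(\mathbf{F}_{\hat\theta}') - \frac{1}{nT}\log|\Omega_{\hat\theta}|
\end{aligned}$$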

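Lemma 2 Under assumptions A, B, and C, we have:

$$\begin{aligned}
\frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')' \ge {} & \frac{1}{nT}\mathrm{Tr}\,(\Lambda_0'\hat\Psi_d^{-1}\Lambda_0)(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H) \\
& - 2\sqrt{\frac{1}{T}\mathrm{Tr}\left((\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H)\right)}\sqrt{O_p\left(\frac{1}{\sqrt{T}}\right) + O_p\left(\frac{1}{n}\right)} \\
& + \frac{1}{n}\sum_{i=1}^{n}\frac{\psi_{0,ii}}{\hat\psi_{ii}} + O_p\left(\frac{1}{\sqrt{T}}\right) + O_p\left(\frac{1}{n}\right)
\end{aligned}$$

where $H = (\mathbf{F}_{\hat\theta}'\mathbf{F}_{\hat\theta})^{-1}\mathbf{F}_{\hat\theta}'\mathbf{F}$ is the coefficient of the OLS projection of $\mathbf{F}$ on $\mathbf{F}_{\hat\theta}$.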

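Proof Consider the coefficients of the OLS projection of $\mathbf{X}$ on $\mathbf{F}_{\hat\theta}$:

$$\hat{\hat\Lambda} = \mathbf{X}'\mathbf{F}_{\hat\theta}(\mathbf{F}_{\hat\theta}'\mathbf{F}_{\hat\theta})^{-1}$$

We can write:

$$\begin{aligned}
& \frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')' \\
& = \frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}' + \mathbf{F}_{\hat\theta}(\hat{\hat\Lambda} - \hat\Lambda)')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}' + \mathbf{F}_{\hat\theta}(\hat{\hat\Lambda} - \hat\Lambda)')' \\
& = \frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')' + \frac{2}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')\hat\Psi_d^{-1}(\mathbf{F}_{\hat\theta}(\hat{\hat\Lambda} - \hat\Lambda)')' \\
& \quad + \frac{1}{nT}\mathrm{Tr}\,(\mathbf{F}_{\hat\theta}(\hat{\hat\Lambda} - \hat\Lambda)')\hat\Psi_d^{-1}(\mathbf{F}_{\hat\theta}(\hat{\hat\Lambda} - \hat\Lambda)')'
\end{aligned}$$

The second term of this last summation is $\frac{2}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')\hat\Psi_d^{-1}(\hat{\hat\Lambda} - \hat\Lambda)\mathbf{F}_{\hat\theta}'$, which is itself equal to $\frac{2}{nT}\mathrm{Tr}\,\mathbf{F}_{\hat\theta}'(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')\hat\Psi_d^{-1}(\hat{\hat\Lambda} - \hat\Lambda)$. As $\hat{\hat\Lambda}$ is obtained by OLS projection of $\mathbf{X}$ on $\mathbf{F}_{\hat\theta}$, we have $\mathbf{F}_{\hat\theta}'\mathbf{X} = \mathbf{F}_{\hat\theta}'\mathbf{F}_{\hat\theta}\hat{\hat\Lambda}'$, so that this term is equal to 0.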
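The vanishing of the cross term is the usual orthogonality between OLS residuals and regressors, and it can be checked numerically. In the sketch below all inputs are random stand-ins chosen for illustration (the names `F_hat`, `Lam_hat` and `Lam_ols` are hypothetical); the identity holds whatever the estimated factors and loadings are.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n, r = 50, 7, 2
X = rng.standard_normal((T, n))                      # stand-in for the data
F_hat = rng.standard_normal((T, r))                  # stand-in for estimated factors
Lam_hat = rng.standard_normal((n, r))                # stand-in for ML loadings
Psi_inv = np.diag(1.0 / rng.uniform(0.5, 1.5, n))    # inverse diagonal covariance

# Coefficients of the OLS projection of X on F_hat, and the two pieces of X - F_hat Lam_hat'.
Lam_ols = X.T @ F_hat @ np.linalg.inv(F_hat.T @ F_hat)
resid = X - F_hat @ Lam_ols.T
gap = F_hat @ (Lam_ols - Lam_hat).T

# The cross term vanishes because F_hat' resid = 0.
cross = 2 * np.trace(resid @ Psi_inv @ gap.T)
total = np.trace((X - F_hat @ Lam_hat.T) @ Psi_inv @ (X - F_hat @ Lam_hat.T).T)
parts = np.trace(resid @ Psi_inv @ resid.T) + np.trace(gap @ Psi_inv @ gap.T)
print(cross, total, parts)
```

Since the second piece is a non-negative quadratic form, dropping it gives a lower bound for the weighted sum of squared residuals.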


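We then get:

$$\begin{aligned}
& \frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat\Lambda')' \\
& = \frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')' + \frac{1}{nT}\mathrm{Tr}\,(\mathbf{F}_{\hat\theta}(\hat{\hat\Lambda} - \hat\Lambda)')\hat\Psi_d^{-1}(\mathbf{F}_{\hat\theta}(\hat{\hat\Lambda} - \hat\Lambda)')' \\
& \ge \frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')'
\end{aligned}$$

Notice that:

$$\begin{aligned}
\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}' & = \mathbf{F}\Lambda_0' + \mathbf{E} - \mathbf{F}_{\hat\theta}(\mathbf{F}_{\hat\theta}'\mathbf{F}_{\hat\theta})^{-1}\mathbf{F}_{\hat\theta}'\mathbf{F}\Lambda_0' - \mathbf{F}_{\hat\theta}(\mathbf{F}_{\hat\theta}'\mathbf{F}_{\hat\theta})^{-1}\mathbf{F}_{\hat\theta}'\mathbf{E} \\
& = (\mathbf{F} - \mathbf{F}_{\hat\theta}H)\Lambda_0' + (I_T - P_{\mathbf{F}_{\hat\theta}})\mathbf{E}
\end{aligned}$$

where $H = (\mathbf{F}_{\hat\theta}'\mathbf{F}_{\hat\theta})^{-1}\mathbf{F}_{\hat\theta}'\mathbf{F}$ is the coefficient of the OLS projection of $\mathbf{F}$ on $\mathbf{F}_{\hat\theta}$ and $P_{\mathbf{F}_{\hat\theta}} = \mathbf{F}_{\hat\theta}(\mathbf{F}_{\hat\theta}'\mathbf{F}_{\hat\theta})^{-1}\mathbf{F}_{\hat\theta}'$ is the projection matrix associated with $\mathbf{F}_{\hat\theta}$.

Consequently:

$$\begin{aligned}
\frac{1}{nT}\mathrm{Tr}\,(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')\hat\Psi_d^{-1}(\mathbf{X} - \mathbf{F}_{\hat\theta}\hat{\hat\Lambda}')' = {} & \frac{1}{nT}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)\Lambda_0'\hat\Psi_d^{-1}\Lambda_0(\mathbf{F} - \mathbf{F}_{\hat\theta}H)' \\
& + \frac{1}{nT}\mathrm{Tr}\,(I_T - P_{\mathbf{F}_{\hat\theta}})\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'(I_T - P_{\mathbf{F}_{\hat\theta}}) \\
& + 2\frac{1}{nT}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)\Lambda_0'\hat\Psi_d^{-1}\mathbf{E}'(I_T - P_{\mathbf{F}_{\hat\theta}})
\end{aligned}$$

We have:

$$\begin{aligned}
\frac{1}{nT}\mathrm{Tr}\,(I_T - P_{\mathbf{F}_{\hat\theta}})\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'(I_T - P_{\mathbf{F}_{\hat\theta}}) & = \frac{1}{nT}\mathrm{Tr}\left(\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'(I_T - P_{\mathbf{F}_{\hat\theta}})\right) \\
& = \frac{1}{nT}\mathrm{Tr}\left(\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'\right) - \frac{1}{nT}\mathrm{Tr}\left(\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'P_{\mathbf{F}_{\hat\theta}}\right)
\end{aligned}$$

By assumption B (ii):

$$\frac{1}{nT}\mathrm{Tr}\left(\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'\right) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\frac{1}{T}\sum_{t=1}^{T}e_{it}^2}{\hat\psi_{ii}}\right) = \frac{1}{n}\sum_{i=1}^{n}\frac{\psi_{0,ii}}{\hat\psi_{ii}} + O_p\left(\frac{1}{\sqrt{T}}\right)$$

Furthermore:

$$\frac{1}{nT}\mathrm{Tr}\left(\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'P_{\mathbf{F}_{\hat\theta}}\right) = \frac{1}{nT}\mathrm{Tr}\left(\mathbf{F}_{\hat\theta}'\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'\mathbf{F}_{\hat\theta}(\mathbf{F}_{\hat\theta}'\mathbf{F}_{\hat\theta})^{-1}\right) \le r\frac{1}{nT}\lambda_{\max}\left(\mathbf{E}\hat\Psi_d^{-1}\mathbf{E}'\right) = O_p\left(\frac{1}{\sqrt{T}}\right) + O_p\left(\frac{1}{n}\right)$$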


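Finally,

$$\begin{aligned}
& \frac{1}{nT}\left|\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)\Lambda_0'\hat\Psi_d^{-1}\mathbf{E}'(I_T - P_{\mathbf{F}_{\hat\theta}})\right| \\
& \le \sqrt{\frac{1}{T}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H)}\sqrt{\frac{1}{n^2T}\mathrm{Tr}\,\Lambda_0'\hat\Psi_d^{-1}\mathbf{E}'(I_T - P_{\mathbf{F}_{\hat\theta}})\mathbf{E}\hat\Psi_d^{-1}\Lambda_0}
\end{aligned}$$

with:

$$\begin{aligned}
\frac{1}{n^2T}\mathrm{Tr}\,\Lambda_0'\hat\Psi_d^{-1}\mathbf{E}'(I_T - P_{\mathbf{F}_{\hat\theta}})\mathbf{E}\hat\Psi_d^{-1}\Lambda_0 & \le \frac{1}{nT}\left\|\mathbf{E}'(I_T - P_{\mathbf{F}_{\hat\theta}})\mathbf{E}\right\|\left\|\hat\Psi_d^{-1}\right\|^2\frac{1}{n}\mathrm{Tr}\,\Lambda_0'\Lambda_0 \\
& \le r\left\|\frac{1}{nT}\mathbf{E}'\mathbf{E}\right\|\frac{1}{c^2}\left\|\frac{\Lambda_0'\Lambda_0}{n}\right\| = O_p\left(\frac{1}{\sqrt{T}}\right) + O_p\left(\frac{1}{n}\right)
\end{aligned}$$

The desired result follows. Q.E.D.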

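Proof of Proposition 1

Let us first notice that $\mathrm{vec}(\mathbf{F}_{\hat\theta}')'\Phi_{\hat\theta}^{-1}\mathrm{vec}(\mathbf{F}_{\hat\theta}') \ge 0$. Moreover, since $\Phi_{\hat\theta} - \Omega_{\hat\theta}$ is, by construction, equal to $E_{\hat\theta}\left[(\mathrm{vec}\mathbf{F}_{\hat\theta}')(\mathrm{vec}\mathbf{F}_{\hat\theta}')'\right]$, it is positive semi-definite. We then have: $\frac{1}{nT}\log|\Phi_{\hat\theta}| - \frac{1}{nT}\log|\Omega_{\hat\theta}| \ge 0$. This property holds for any $A(L)$ satisfying R1.

Finally, as $x - \log x \ge 1$ for any $x > 0$, we get:

$$\frac{1}{n}\log|\hat\Psi_d| + \frac{1}{n}\sum_{i=1}^{n}\frac{\psi_{0,ii}}{\hat\psi_{ii}} - \frac{1}{n}\log|\Psi_{0,d}| - 1 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\psi_{0,ii}}{\hat\psi_{ii}} - \log\left(\frac{\psi_{0,ii}}{\hat\psi_{ii}}\right) - 1\right) \ge 0$$

Using Lemma 2 and the fact that $\frac{n}{\log(n)} = O(n)$, we then obtain:

$$\begin{aligned}
0 & \ge \frac{2}{nT}\left(\mathcal{L}_{X}(\mathbf{X}; \theta_0^c) - \mathcal{L}_{X}(\mathbf{X}; \hat\theta)\right) \\
& \ge \frac{1}{nT}\mathrm{Tr}\,(\Lambda_0'\hat\Psi_d^{-1}\Lambda_0)(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H) \\
& \quad - 2\sqrt{\frac{1}{T}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H)}\;O_p\left(\sqrt{\frac{1}{\Delta_{nT}}}\right) + O_p\left(\frac{1}{\Delta_{nT}}\right)
\end{aligned}$$

where $\Delta_{nT} = \min\left\{\sqrt{T}, \frac{n}{\log(n)}\right\}$.

We can now prove our main result.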


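$$\begin{aligned}
0 & \ge \frac{1}{nT}\mathrm{Tr}\,(\Lambda_0'\hat\Psi_d^{-1}\Lambda_0)(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H) \\
& \quad - 2\sqrt{\frac{1}{T}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H)}\;O_p\left(\sqrt{\frac{1}{\Delta_{nT}}}\right) + O_p\left(\frac{1}{\Delta_{nT}}\right) \\
& \ge \lambda_{\min}\left(\frac{\Lambda_0'\hat\Psi_d^{-1}\Lambda_0}{n}\right)\frac{1}{T}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H) \\
& \quad - 2\,O_p\left(\sqrt{\frac{1}{\Delta_{nT}}}\right)\sqrt{\frac{1}{T}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H)} + O_p\left(\frac{1}{\Delta_{nT}}\right) \\
& = \lambda_{\min}\left(\frac{\Lambda_0'\hat\Psi_d^{-1}\Lambda_0}{n}\right)V_{nT} - 2\sqrt{V_{nT}}\;O_p\left(\sqrt{\frac{1}{\Delta_{nT}}}\right) + O_p\left(\frac{1}{\Delta_{nT}}\right)
\end{aligned}$$

where $V_{nT} = \frac{1}{T}\mathrm{Tr}\,(\mathbf{F} - \mathbf{F}_{\hat\theta}H)'(\mathbf{F} - \mathbf{F}_{\hat\theta}H)$.

Since $\frac{\Lambda_0'\hat\Psi_d^{-1}\Lambda_0}{n} = O_p(1)$ and since $\liminf_{n,T\to\infty}\lambda_{\min}\left(\frac{\Lambda_0'\hat\Psi_d^{-1}\Lambda_0}{n}\right) > 0$, we have:

$$V_{nT} - \sqrt{V_{nT}}\;O_p\left(\sqrt{\frac{1}{\Delta_{nT}}}\right) + O_p\left(\frac{1}{\Delta_{nT}}\right) \le 0 \tag{7.6}$$

which implies that $V_{nT} = O_p\left(\frac{1}{\Delta_{nT}}\right)$.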

In order to prove the latter, it is actually sufficient to notice that for any T and n we have a second order polynomial $y^2 + by + c$, with $y := \sqrt{V_{nT}}$, $b = O_p\left(\sqrt{\frac{1}{\Delta_{nT}}}\right)$ and $c = O_p\left(\frac{1}{\Delta_{nT}}\right)$, which is supposed to take a non-positive value at $y$. This is possible only if the following conditions are satisfied:

a) the discriminant is non-negative, i.e. $c \le \frac{1}{4}b^2$ (which is possible since $b^2 = O_p\left(\frac{1}{\Delta_{nT}}\right)$)

b) $y$ is between the two roots of the polynomial, i.e.

$$\frac{1}{2}\left(-b - \sqrt{b^2 - 4c}\right) \le y \le \frac{1}{2}\left(-b + \sqrt{b^2 - 4c}\right)$$

The conditions a) and b) imply that $y = O_p\left(\sqrt{\frac{1}{\Delta_{nT}}}\right)$ and hence $V_{nT} := y^2 = O_p\left(\frac{1}{\Delta_{nT}}\right)$.

Q.E.D.
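The last step of the proof can be spelled out as follows. This is a sketch of one way to obtain the rate, using only the fact that $y = \sqrt{V_{nT}} \ge 0$ lies below the larger root of the polynomial $y^2 + by + c$; the chain of inequalities is added here for illustration and is not taken from the original text.

$$0 \le y \le \frac{1}{2}\left(-b + \sqrt{b^2 - 4c}\right) \le \frac{1}{2}\left(|b| + \sqrt{b^2 + 4|c|}\right) \le \frac{1}{2}\left(|b| + |b| + 2\sqrt{|c|}\right) = |b| + \sqrt{|c|}$$

Since $|b| = O_p\left(\sqrt{1/\Delta_{nT}}\right)$ and $\sqrt{|c|} = O_p\left(\sqrt{1/\Delta_{nT}}\right)$, the bound gives $y = O_p\left(\sqrt{1/\Delta_{nT}}\right)$, and squaring yields $V_{nT} = O_p\left(1/\Delta_{nT}\right)$.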

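Proof of Remark 1

The fact that Proposition 1 holds for any $A(L)$ is easily proved by noticing that:

a) $A(L)$ only enters the likelihood through $\mathrm{vec}(\mathbf{F}_{\hat\theta}')'\Phi_{\hat\theta}^{-1}\mathrm{vec}(\mathbf{F}_{\hat\theta}')$ and $\left(\log|\Phi_{\hat\theta}| - \log|\Omega_{\hat\theta}|\right)$, and the proof only requires these two quantities to be non-negative.

b) imposing restrictions on $A(L)$ in the approximating model, we define a parameter space $\tilde\Theta^c \subseteq \Theta^c$ for which we still have $\theta_0^c \in \tilde\Theta^c$ and hence $\mathcal{L}_{X}(\mathbf{X}; \hat\theta) \ge \mathcal{L}_{X}(\mathbf{X}; \theta_0^c)$.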


Proof of Remark 2

If the maximization is run for a number of common factors $\tilde{r} > r$, the new model will encompass the previous one and hence $\mathcal{L}_{X}(\mathbf{X}; \hat\theta) \ge \mathcal{L}_{X}(\mathbf{X}; \theta_0^c)$. This is all we need for Proposition 1 to hold.

Consistency of Principal Components

This case does not follow immediately from the proof of Proposition 1. In fact, under the approximating model of the principal components we have a restricted parameter space, say $\Theta_{pc}^c$, that does not necessarily contain $\theta_0^c$ defined above, for which the idiosyncratic component is allowed to be cross-sectionally heteroscedastic. However, if we replace in the proof of Proposition 1 $\theta_0^c$ with

$$\theta_0^{pc} := \{A(L) = I_r;\ \Lambda = \Lambda_0;\ \Psi_d = \sigma_0^2 I_n\}$$

where $\sigma_0^2 = \frac{1}{n}\mathrm{Tr}\,\Psi_0$, the result will follow along the same lines since we would have $\theta_0^{pc} \in \Theta_{pc}^c$ and hence $\mathcal{L}_{X}(\mathbf{X}; \hat\theta) \ge \mathcal{L}_{X}(\mathbf{X}; \theta_0^{pc})$. In addition, it is possible to show that $\mathbf{F}_{\theta_0^{pc}}$ has the same asymptotic properties as $\mathbf{F}_{\theta_0^c}$. A detailed proof is available upon request.
