
VILNIUS UNIVERSITY

Dmitrij Celov

TIME SERIES AGGREGATION, DISAGGREGATION AND LONG MEMORY

Doctoral Dissertation

Physical Sciences, Mathematics (01 P)

Vilnius, 2008

The scientific work was carried out during 2004-2008 at Vilnius University

Scientific supervisor: Prof. Dr. Habil. Remigijus Leipus (Vilnius University, Physical Sciences, Mathematics – 01 P)


Contents

Acknowledgements

Notations

Introduction

1 Aggregation and disaggregation problems: a survey
  1.1 Classifications and types of aggregation
    1.1.1 Approaches to the aggregation problem
    1.1.2 Main types of aggregation
  1.2 A general framework for individual relationships
  1.3 Alternative notions of aggregation
    1.3.1 Deterministic approach
    1.3.2 Stochastic approach
    1.3.3 Forecasting approach
  1.4 Large scale aggregation, doubly stochastic approach and mixtures
  1.5 Cross-sectional aggregation of autoregressive distributed lags models
  1.6 Disaggregation as an inverse problem of aggregation: DCO's approach
  1.7 Long memory in aggregated time series

2 Aggregation and disaggregation of AR(1) processes
  2.1 AR(1) aggregation scheme
  2.2 Methods of disaggregation in AR(1) aggregation scheme
    2.2.1 Mixture density estimation based on Gegenbauer expansion
    2.2.2 Polynomial density estimation in AR(1) aggregation scheme
    2.2.3 Extension of the Chong's and LOPV estimation methods
  2.3 Mixture density for the product of aggregated spectral densities
  2.4 Behaviour of spectral density of the aggregated process
  2.5 Seasonal long memory case
  2.6 FARIMA-type spectral density case
  2.7 Appendix. Mixture density associated with FI(d) spectral density

3 Asymptotic normality of LOPV estimator
  3.1 Consistency of LOPV estimator
  3.2 Asymptotic normality: main result
  3.3 Moving average representation of the aggregated process
  3.4 Proof of the main result
  3.5 Appendix A. Proof of Example 3.2.1
  3.6 Appendix B. Proofs of lemmas 3.4.1–3.4.2

4 Simulations and empirical applications
  4.1 Asymptotic normality: a simulation study
  4.2 Empirical comparison of alternative disaggregation schemes
    4.2.1 Monte Carlo simulation study
    4.2.2 Empirical application: aggregated consumption model

Conclusions

Bibliography

Acknowledgments

A single conversation with a wise man is better than ten years of study.

Chinese Proverb

First of all, I would like to thank my supervisor, Prof. Remigijus Leipus, for his careful guidance and patience during my study years. From him I learned that scientific endeavor is more than just the application of already developed techniques; it is rather the investigation of new problems and the building of a theoretical background to support empirical applications. I am also grateful to the major consultant, Anne Philippe, for her patience and help in preparing the articles. Many thanks to the staff of the Department of Econometric Analysis, who gave many crucial suggestions when the results were presented at the seminars, and especially to my colleague Vaidotas Zemlys, who was an excellent example of a hardworking Ph.D. student and more than anyone pushed me to finish this thesis. Finally, I am very thankful to my wife and family for their support and understanding while I was occupied with preparing the thesis.

Dmitrij Celov
Vilnius

October 6, 2008

Notations

$Y_t^{(j)}$: an endogenous characteristic of the $j$th micro unit.

$\eta_t$: innovations common to all micro units; zero-mean strong white noise.

$\varepsilon_t^{(j)}$: micro level individual innovations of the $j$th unit; zero-mean strong white noise.

$a^{(j)}$: a random autoregressive parameter of the $j$th micro unit with density function $\varphi(x)$.

$\Omega_t$: an aggregate information set $\{Y_{t-1}, Y_{t-2}, \ldots; X_{t-1}, X_{t-2}, \ldots\}$.

$L$: the lag (back-shift) operator: $L x_t := x_{t-1}$.

$A(L)$: an autoregressive polynomial of order $p$: $A(L) = 1 - \sum_{k=1}^{p} a_k L^k$.

$B(L)$: a moving-average polynomial of order $q$: $B(L) = \sum_{k=0}^{q} b_k L^k$.

$y_t$: a real random vector (or its realization) $(y_{1,t}, \ldots, y_{n,t})$.

$\mathbb{N}$: the set of natural numbers, $\mathbb{N} = \{1, 2, \ldots\}$.

$\mathbb{Z}$: the set of integers, $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$.

$\mathbb{R}$: the set of real numbers.

$\mathrm{E}X$: the mean of a random variable $X$.

$\mu_x(t)$: a cross-sectional empirical mean of a random variable $X$.

$\mathrm{var}\,X$: the variance of a random variable $X$.

$\mathrm{cov}(X, Y)$: the covariance between random variables $X$ and $Y$.

$f_t \sim g_t$: means that $f_t / g_t \to 1$ as $t \to \infty$.

$\|\cdot\|$: a norm.

$\lfloor x \rfloor$: the integer part of a real number $x$.

$\mathbf{1}$: an indicator function.

$\Delta$: the difference operator, $\Delta x_t = x_t - x_{t-1}$.

$\xrightarrow{P}$: convergence in probability.

$\xrightarrow{d}$: weak convergence.

$l_\cdot(\cdot)$: a slowly varying function in Karamata's sense: $\forall c > 0$, $l(cx)/l(x) \to 1$ as $x \to \infty$.

$F(a, b, c; x)$: a hypergeometric function

$$F(a, b, c; x) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c - b)} \int_0^1 t^{b-1}(1 - t)^{c-b-1}(1 - tx)^{-a}\,dt,$$

with $c > b > 0$ if $x < 1$ and, in addition, $c - a - b > 0$ if $x = 1$.

Introduction

A fact is a simple statement that everyone believes. It is innocent, unless found guilty. A hypothesis is a novel suggestion that no one wants to believe. It is guilty, until found effective.

Edward Teller

Aggregation of individual time series data is common in many fields of study, including applied problems in hydrology, sociology, statistics, communication networks, and especially in economics. It has a relatively long history of development in different statistical and econometric applications, whereas its inverse, the disaggregation problem, is still under development.

Treating aggregation as a time series problem raises a number of important questions: what are the properties of macro level data obtained by small- or large-scale, spatial, temporal or contemporaneous aggregation; when and how the inverse (disaggregation) problem can be solved; and, finally, how the theoretical results can be applied in practice.

The relevance of both the direct and the inverse problems follows from practical needs. In economics, for instance, the need for (dis)aggregation techniques originates firstly because the micro level behavioural relationships are generally defined as a set of often unobservable data generating processes (DGPs), the decision rules of many individual economic units: firms, households, regions, commodities, etc. However, many of the relationships that applied economists analyze are subject to generally linear large-scale aggregation in space (across economic units), (non)linear aggregation in time, or contemporaneous aggregation over both time and space dimensions. On the other hand, it is not a priori clear what happens to these relationships when the micro level decision rules are aggregated (consider totals and averages as the most usual examples of aggregation in economics).

In a formal way, an aggregated time series can be viewed as a transformation of the underlying time series by some specific (either linear or non-linear) function defined on a finite or infinite set of individual processes. In this thesis linear large-scale aggregation, typical in many econometric applications, is primarily considered.

Under the influence of the pioneering contributions of Gorman (1953), Klein (1953), Theil (1954), Grunfeld and Griliches (1960), Ando (1971), and Granger (1980), the literature on aggregation has focused on two separate but closely related issues, namely the aggregation problem and the model selection problem. The former literature attempts to derive conditions under which macro models reflect and provide interpretable information on the underlying behaviour of the micro units. The model selection literature deals with forecasts of a macro variable, that is, with the choice between the aggregate and the disaggregate specifications when the object of interest is prediction of the aggregate (or macro) phenomenon.

The increasing availability of panel data and the model selection issues helped to renew the interest in the aggregation problem. This explains why the aggregation problem, natural in macroeconomic research, tends to be present in micro-econometric analysis as well (Pesaran (2002)). Examples of such analysis in economics include the panel data models of aggregate households' consumption, labour supply, firms' investment and production, and employment decisions developed for the economies of the United States of America and the United Kingdom, but the availability of panel data for most other countries is questionable.

It is true that the corresponding statistical agencies provide access to aggregate data relatively easily, but highly disaggregated longitudinal (panel) data are expensive and difficult (if not impossible) to obtain for applied research in most circumstances. Thus working with the aggregates may become the only empirical alternative. The "good news", however, is that in some particular cases it is possible to solve the inverse problem, which strengthens the interest in the disaggregation topic even further, since it may also help to improve the forecasts of the macro variables.

Nevertheless, the results of aggregation are highly dependent on the assumptions imposed on the micro level DGPs ((non)linearity, (in)dependence, heterogeneity, presence of common (aggregate) factors, etc.). A search for a simple, yet acceptable approximation of the true DGP is a common way to begin. In fact, it is found convenient to approximate individual data by simple time series models, such as AR(1) or GARCH(1,1) (see Lewbel (1994), Chong (2006), Zaffaroni (2004, 2006) among others). Such an assumption may be justified when describing the behaviour of small rivers' water levels, changes in private households' consumption or firms' production, or returns of financial assets, whenever more complex individual data models do not provide an advantage in accuracy and efficiency of estimates and are usually very difficult to study from the theoretical point of view.

Furthermore, numerous examples in economics reveal that very often micro level behavioural relationships are simple distributed lag models or can be approximated by autoregressive polynomials of an appropriate order. If distributed lag models admit Koyck's (1954) geometric lag representation, then again the AR(1) form, possibly with some exogenous covariates, can be obtained. On the other hand, short memory AR(1) models produce rather close approximations for the residuals of individual (idiosyncratic) processes, often obtained after taking first differences. The need for the latter is motivated by the fact that either stochastic or linear trends are present in most macroeconomic variables (see Lewbel (1994) and Robinson (1978), for instance), whereas (dis)aggregation problems are the subject of stationary time series analysis. These facts explain the relevance of the AR(1) (dis)aggregation problem presented in the thesis.

Aggregation by appropriately averaging the micro level time series models can give intriguing results. The corresponding aggregate (i.e., macroeconomic) variables can exhibit much more complicated dynamics. Just summing up (or averaging) N AR(1) models generally yields an ARMA(n, n − 1) process with n ≤ N, but close to N. As was shown in the seminal paper by Granger (1980), the large-scale aggregation of infinitely many (N → ∞) short memory AR(1) models with random coefficients can lead to a long memory fractionally integrated process, where long memory is understood in the sense of a non-summable autocovariance function (acf). It means that the properties of an aggregate time series may in general differ from those of individual data¹. These findings, according to Chong (2006), challenge the validity of economic theories that rely on the highly criticized assumption of a representative agent, and are probably a strong empirical argument in discussing some other theories in the previously mentioned fields of study.

¹ The only (unlikely) trivial case in which there is no difference between the aggregate and the individual processes is the absence of heterogeneity of any type, i.e. the individual data are homogeneous in the functional, parametric and stochastic sense.
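To make the non-summability concrete, the following sketch evaluates the autocovariance of the limiting aggregate for one convenient mixing density. The assumptions are illustrative and not taken from the thesis: independent AR(1) units without common innovations, unit innovation variance, and a density $\varphi(x) = C(1-x)^{1-2d}(1+x)$ on $[0, 1]$, chosen only because $\mathrm{E}[a^h/(1-a^2)]$ then reduces to a Beta function. The function names are hypothetical.

```python
import math

def mixing_constant(d):
    """Normalizing constant C of phi(x) = C (1-x)^(1-2d) (1+x) on [0, 1] (assumed density)."""
    # Integral of u^(1-2d) (2-u) over [0, 1], with u = 1 - x.
    return 1.0 / (1.0 / (1.0 - d) - 1.0 / (3.0 - 2.0 * d))

def aggregate_acf(h, d):
    """gamma(h) = E[a^h / (1 - a^2)] = C * B(h + 1, 1 - 2d), unit innovation variance."""
    log_beta = math.lgamma(h + 1) + math.lgamma(1 - 2 * d) - math.lgamma(h + 2 - 2 * d)
    return mixing_constant(d) * math.exp(log_beta)

d = 0.3
# Hyperbolic decay: gamma(2h) / gamma(h) approaches 2^(2d-1), i.e. gamma(h) ~ c * h^(2d-1).
ratio = aggregate_acf(2000, d) / aggregate_acf(1000, d)
# Partial sums keep growing, which is consistent with a non-summable autocovariance.
partial_sums = [sum(aggregate_acf(h, d) for h in range(n)) for n in (1000, 2000, 4000)]
```

Each micro unit here has a geometrically decaying autocovariance, yet the mixture over the autoregressive parameter decays only like a power of the lag when $0 < d < 1/2$.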


It is clear that the weakest point of aggregation is a considerable loss of information about individual characteristics of the underlying data. Roughly speaking, an aggregated time series cannot be as informative about the attributes of individual data as the micro level processes are. On the other hand, some special aggregation schemes, which involve, for instance, independent identically distributed "elementary" processes with known structure (such as the AR(1) or GARCH(1,1) suggested above), make it possible to solve the inverse problem: to recover the properties of individual series with only the aggregated data at hand. This question is a part of extensive current research on the challenges of disaggregation. It is therefore relevant to define the set of general key assumptions under which an aggregate time series can be disaggregated. These particular questions have recently been studied at a rather general level in Dacunha-Castelle and Oppenheim (2001), Dacunha-Castelle and Fermin (2006), Chong (2006), and Leipus et al. (2006), which formed the background for the current research.

The main model investigated in the thesis is the AR(1) aggregation scheme with or without common innovations. A common specification of AR(1) individual processes is the following equation²

$$Y_t^{(j)} = a^{(j)} Y_{t-1}^{(j)} + \eta_t + \varepsilon_t^{(j)}, \qquad t \in \mathbb{Z},\ j \in \mathbb{N}, \tag{1}$$

where $Y_t^{(j)}$ is the characteristic of the $j$th micro unit; $\eta_t$ are innovations common to all micro units; $\varepsilon_t^{(j)}$ are micro level individual innovations; both innovations are zero-mean second-order strong white noise sequences³; and $a^{(j)}$ is a random parameter with a usually unknown density function $\varphi(x)$ supported on $[-1, 1]$. It is assumed that $a^{(j)}$, $\varepsilon_t^{(j)}$ and $\eta_t$ are mutually and serially independent. This introduction of random parameters is known as the doubly stochastic approach and is a key device for solving large-scale (dis)aggregation problems.
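A minimal simulation sketch of scheme (1) may help to fix ideas. The Beta mixing law, the Gaussian innovations, the zero starting value and the function name `simulate_micro_ar1` are assumptions made for illustration only; the thesis leaves $\varphi(x)$ unspecified at this point.

```python
import numpy as np

def simulate_micro_ar1(a, eps, eta=None):
    """Simulate Y_t^(j) = a_j * Y_{t-1}^(j) + eta_t + eps_t^(j), started from Y = 0.

    a   : shape (n_units,), autoregressive parameters with |a_j| < 1
    eps : shape (n_units, n_obs), idiosyncratic innovations
    eta : shape (n_obs,), common innovations; None gives the model without them
    """
    a = np.asarray(a, dtype=float)
    eps = np.asarray(eps, dtype=float)
    n_units, n_obs = eps.shape
    common = np.zeros(n_obs) if eta is None else np.asarray(eta, dtype=float)
    y = np.empty((n_units, n_obs))
    prev = np.zeros(n_units)
    for t in range(n_obs):
        # One step of the AR(1) recursion for all micro units at once.
        prev = a * prev + common[t] + eps[:, t]
        y[:, t] = prev
    return y

rng = np.random.default_rng(0)
n_units, n_obs = 500, 300
a = rng.beta(2.0, 1.5, size=n_units)         # hypothetical mixing law on (0, 1)
eps = rng.standard_normal((n_units, n_obs))  # idiosyncratic innovations
eta = rng.standard_normal(n_obs)             # innovations common to all units
micro = simulate_micro_ar1(a, eps, eta)
aggregate = micro.mean(axis=0)               # cross-sectional average of the micro units
```

Because the recursion starts from zero, an initial stretch of observations would normally be discarded as burn-in before any estimation is attempted.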

² A more general case is suggested, for instance, in Granger (1980) and Zaffaroni (2004), where common innovations are defined as $\rho^{(j)}\eta_t$, thereby introducing heterogeneity into individual reactions to common unobservable impulses.

³ Here strong white noise denotes an i.i.d. random process $\{\varepsilon_t, t \in \mathbb{Z}\}$ with mean $\mathrm{E}\varepsilon_t = 0$ and variance $\mathrm{E}\varepsilon_t^2 = \sigma_\varepsilon^2$.

Disaggregation in the AR(1) aggregation scheme is highly dependent on the set of additional assumptions imposed on the density function $\varphi(x)$. Most of the time some parametric densities are considered. For example, in the seminal paper by Granger (1980) a family of Beta distributions was analyzed. It was shown that aggregation in this case may lead to a fractionally integrated process. Besides that, the uniform case has been studied in Lewbel (1994), where it was shown that the resulting long memory differs from that of the fractionally integrated process obtained under the Beta distribution assumption. Recently some new disaggregation approaches have been suggested in Chong (2006), a complementary result to Lewbel (1994), and in Leipus et al. (2006), a semi-parametric approach for the model without common innovations, which complements the result obtained in Robinson (1978). The disaggregation problem in this context is defined in the following way: how, having only data from an aggregated process at hand, and assuming that these data are accumulated from i.i.d. (in general dependent for the models with common innovations) short memory dynamics, to recover the distribution of the individual processes. The former (Chong's) estimator of $\varphi(x)$ is justified by the assumption that the desired density belongs to the family of polynomial densities. Assuming that the true polynomial order $m$ is known, one needs to derive the first $m$ empirical moments of the random parameter and to solve the corresponding system of equations, which relates empirical moments to theoretical ones. The latter estimator (henceforth referred to as LOPV) is based on the expansion of the density function in the basis of orthogonal Gegenbauer polynomials. However, common innovations are not allowed in this case, i.e. the AR(1) dynamics is described by a simpler model

$$Y_t^{(j)} = a^{(j)} Y_{t-1}^{(j)} + \varepsilon_t^{(j)}, \qquad t \in \mathbb{Z},\ j \in \mathbb{N}, \tag{2}$$

and the density $\varphi(x)$ of $a^{(j)}$ has the following semi-parametric form

$$\varphi(x) = (1 - x)^{1 - 2d_1}(1 + x)^{1 - 2d_2}\psi(x), \qquad 0 < d_1, d_2 < 1/2, \tag{3}$$

where $\psi(x)$ is some continuous function on $[-1, 1]$ such that (3) is a density, and the assumptions on $d_1, d_2$ correspond to the long memory case, one of the key topics of the thesis. The LOPV estimator is based upon two significant facts: the underlying process and the aggregate process have the same autocovariance function (this is not true in general if common innovations are present), and empirical moments can be expressed through the empirical autocovariances.
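The second fact can be illustrated with an identity that follows from the autocovariance of a stationary AR(1) process. Under model (2) with innovation variance $\sigma^2$, the autocovariance is $\gamma(h) = \sigma^2\,\mathrm{E}[a^{|h|}/(1-a^2)]$, so that $\gamma(h) - \gamma(h+2) = \sigma^2\,\mathrm{E}[a^h]$ for $h \ge 0$. The sketch below checks this for a hypothetical three-point mixing law; it is not necessarily the exact form used in Leipus et al. (2006), and the function names are illustrative.

```python
import numpy as np

def aggregate_autocov(h, values, probs, sigma2=1.0):
    """gamma(h) = sigma2 * E[a^|h| / (1 - a^2)] for a discrete law of a."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return sigma2 * np.sum(probs * values ** abs(h) / (1.0 - values ** 2))

def moment_from_autocov(h, gamma, sigma2=1.0):
    """Recover E[a^h] from the autocovariance function: (gamma(h) - gamma(h+2)) / sigma2."""
    return (gamma(h) - gamma(h + 2)) / sigma2

# Hypothetical mixing law: a takes three values inside (-1, 1).
values, probs = [-0.4, 0.3, 0.8], [0.2, 0.5, 0.3]
gamma = lambda h: aggregate_autocov(h, values, probs)

recovered = [moment_from_autocov(h, gamma) for h in range(5)]          # via autocovariances
direct = [float(np.dot(probs, np.power(values, h))) for h in range(5)]  # E[a^h] computed directly
```

With real data, `gamma` would be replaced by sample autocovariances of the aggregated series, which is where estimation error enters.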

Preliminary empirical investigations revealed that neither method outperforms the other. Chong's method is restricted to the class of polynomial densities, and the second method is not effective in the presence of common innovations. Both methods work correctly under the assumptions proposed in the corresponding articles. So a new extension which combines the ideas of these two approaches has been suggested.

Aims and problems. The main objectives and problems of the thesis are related to the statistical properties of a mixture density estimator in the AR(1) (dis)aggregation scheme originally proposed in Leipus et al. (2006) and its empirical applications. To be more precise, these problems encompass

• the method of constructing a new mixture density from "elementary" ones (the constructive approach for brevity);

• the proof of asymptotic normality of the LOPV estimator;

• the augmentation of the LOPV estimator to include the common innovations as in (1).

In parallel to the theoretical results, the empirical applications of the LOPV estimator and the constructive approach are investigated.

Methods. Methods of time series analysis, general probability theory and mathematical statistics are applied. The asymptotic normality results for quadratic forms of linear processes developed by Bhansali et al. (2007) are used to prove the asymptotic normality of the LOPV estimator. In the empirical part many key propositions are carefully checked by means of Monte Carlo simulations.

Novelty. The novelty of the results is closely related to the formulated problems. In particular, the asymptotic normality of the LOPV estimator and the constructive approach have not been considered earlier in the AR(1) disaggregation literature. The augmentation of the LOPV estimator is also proposed.

Publications and presentations. The main results are published in the following articles:

1. Celov D., Leipus R. and Philippe A. Asymptotic normality of mixture density estimator in a disaggregation scheme. Journal of Nonparametric Statistics (submitted), (2008).

2. Celov D., Leipus R. and Philippe A. Time series aggregation, disaggregation and long memory. Lithuanian Mathematical Journal 47, 4, 379-393, (2007).

Conference material is published in:

1. D. Celov, V. Kvedaras, R. Leipus. Comparison of estimation methods for the density of autoregressive parameter in aggregated AR(1) processes. Lithuanian Mathematical Journal 47, (spec. issue) (2007) (in Lith.).

2. D. Celov, R. Leipus. Time series aggregation, disaggregation and long memory. Lithuanian Mathematical Journal 46, (spec. issue) (2006) (in Lith.).

Several presentations on the topics of this thesis were given, including at the 47th–49th Conferences of the Lithuanian Mathematical Society, the 22nd Nordic Conference on Mathematical Statistics (NordStat), the seminars of the Department of Econometric Analysis of Vilnius University, foreign visits in Nantes (France) and Oberwolfach (Germany), and the Druskininkai summer school.

Structure of the thesis. The thesis is organized in the following way. Chapter 1 provides a survey of some general aspects and notions of time series (dis)aggregation problems, methods and their properties, particularly those related to the long memory phenomenon, and applications. A general background for the further research is presented. Chapter 2 provides more insight into the AR(1) aggregation scheme; the constructive approach and examples of its applications are presented. Chapter 3 is fully dedicated to the asymptotic normality of the LOPV estimator; it also provides a short survey of the already proved statistical properties of the estimator. Chapter 4 presents some practical applications of the obtained results: Monte Carlo simulations regarding the asymptotic normality, a comparison of alternative disaggregation methods, and an application to and analysis of the disaggregation of G7 aggregated consumption. Finally, the main results of the thesis are summarized.

Chapter 1

Aggregation and disaggregation problems: a survey

Good things, when short, are twice as good.

Baltasar Gracian, The Art of Worldly Wisdom

Abstract

In this chapter some general aspects and notions of time series aggregation and disaggregation problems, methods and their properties, particularly those related to the long memory phenomenon, and applications are reviewed. The types of aggregation mentioned in this survey include small- and large-scale spatial, temporal and time-space aggregation, though only the second (large-scale) type is presented in more detail. The cross-sectional aggregation of autoregressive distributed lag models, which also covers the AR(1) case, is introduced. The review is presented in a formal way; many technical details are omitted.

1.1 Classifications and types of aggregation

Aggregation, compared to its inverse, disaggregation, has a relatively long history of development. However, even here applied researchers have a variety of reactions to the aggregation problem. At one extreme, many applied econometricians have chosen to ignore it altogether, arguing that empirically it is of secondary importance. At the other extreme, to avoid the difficulties which may arise as consequences of aggregation (an aggregation bias in forecasts, the inconsistency of the representative agent theory, and the long memory phenomenon are a few to be mentioned in this context), researchers have opted to deal with highly disaggregated panel data models. Because such data are frequently unavailable, the latter cannot help in every circumstance, but ignoring the aggregation problems is not an acceptable option either (Pesaran (2002)). It is therefore important to understand the spectrum of aggregation problems and the difficulties that may arise in dealing with its inverse. The mind-map in Figure 1.1 summarizes the key aspects of aggregation discussed in the subsections below, namely: the properties of idiosyncratic processes, the constructive approaches to the aggregation problem, the main types (dimensions) of aggregation, and the form of the aggregate function.

[Figure 1.1: The key aspects of aggregation. Mind-map centred on "Aggregation" with the branches: Aggregate function (linear, non-linear); Type (cross-sectional: small scale, large scale; temporal; in time and space); Approaches (deterministic, stochastic, forecasting, doubly stochastic); Individual processes (linear, non-linear, conditionally heteroskedastic); Sources of heterogeneity (inputs, parameters, functionals).]

1.1.1 Approaches to the aggregation problem.

First of all we have to define the basic notions of aggregation. In the field of large-scale linear aggregation there currently exist three basic constructive approaches to the aggregation problem (Pesaran (2002)):

• a deterministic approach (see Section 1.3.1), originally postulated by Gorman (1953), Klein (1954), Theil (1954), Muellbauer (1975), is found to be very restrictive, since it requires the aggregate function to match exactly the sum of micro functions for all realizations of disaggregate variables.

• a statistical or stochastic approach (see Section 1.3.2), advanced by Kelejian (1980), Stoker (1984), Lippi (1988) and Lewbel (1994), is less restrictive and induces relationships between the population aggregates from the joint probability distribution of the micro variables and the parameters of micro equations. In other words, the same deterministic requirements should hold on average only.

• a forecasting approach (see Section 1.3.3), proposed in Garderen et al. (2000) and Pesaran (2002), views aggregation as a forecasting problem where the focus of the analysis is on the optimal prediction of the aggregate variables conditional on available aggregate (or disaggregate) information. So the aggregate function should be an acceptable optimal forecast rather than satisfy strict deterministic or stochastic requirements.

1.1.2 Main types of aggregation.

The analysis of the vast literature on the problem reveals that there exist four main types of aggregation: small-scale and large-scale cross-sectional, temporal, and time-space aggregation. The main attention in the thesis is paid to cross-sectional large-scale aggregation (see Section 1.4 below). Yet, before the more detailed discussion of large-scale aggregation, the other types of aggregation should be considered.

Small-scale cross-sectional aggregation of univariate stationary dynamic (ARMA) models, worked out in detail in Granger and Morris (1976), reveals the importance of the autoregressive part of the individual process. First, it should be noticed that aggregation of $N$ moving average (MA) processes of orders $q_j$

$$Y_t^{(j)} = \sum_{k=0}^{q_j} b_k^{(j)} \varepsilon_{t-k}^{(j)}, \qquad j = 1, 2, \ldots, N,$$

is trivial since it results in an MA process of order at most $\max_j q_j$; thus the order of the aggregate process cannot exceed the maximum of the orders of the underlying individual processes. But when the individual time series include autoregressive parts the situation becomes different.


Example 1.1.1. A simple example of such aggregation is the aggregation of two AR(1) processes (Granger (1990)):

$$Y_t^{(j)} = a^{(j)} Y_{t-1}^{(j)} + \varepsilon_t^{(j)}, \qquad j = 1, 2,$$

where $\varepsilon_t^{(j)}$ are independent white noise sequences. The aggregate process $S_t$ defined as the sum $S_t = Y_t^{(1)} + Y_t^{(2)}$ generally follows a higher order mixed process, ARMA(2,1), when $a^{(1)} \neq a^{(2)}$.
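The ARMA(2,1) structure of the sum can be checked numerically without simulation. The sketch below assumes two independent stationary AR(1) processes with unit innovation variances and hypothetical parameters 0.7 and −0.3; it applies the AR(2) polynomial $(1 - a^{(1)}L)(1 - a^{(2)}L)$ to $S_t$ and verifies that the filtered series has an MA(1) autocovariance, i.e. one that vanishes beyond lag 1. The function names are illustrative.

```python
import numpy as np

def ar1_autocov(h, a, sigma2=1.0):
    """Autocovariance of a stationary AR(1): sigma2 * a^|h| / (1 - a^2)."""
    return sigma2 * a ** abs(h) / (1.0 - a ** 2)

def filtered_autocov(h, poly, gamma):
    """Autocovariance at lag h of W_t = sum_i poly[i] * S_{t-i}, given the autocovariance of S."""
    return sum(pi * pk * gamma(h + i - k)
               for i, pi in enumerate(poly)
               for k, pk in enumerate(poly))

a1, a2 = 0.7, -0.3   # hypothetical AR(1) parameters
# S_t is the sum of two independent AR(1) processes, so their autocovariances add up.
gamma_s = lambda h: ar1_autocov(h, a1) + ar1_autocov(h, a2)
# AR(2) polynomial (1 - a1 L)(1 - a2 L) = 1 - (a1 + a2) L + a1 a2 L^2.
ar_poly = np.convolve([1.0, -a1], [1.0, -a2])
# Autocovariances of the filtered series at lags 0..4.
gamma_w = [filtered_autocov(h, ar_poly, gamma_s) for h in range(5)]
```

The filtered series equals $(1 - a^{(2)}L)\varepsilon_t^{(1)} + (1 - a^{(1)}L)\varepsilon_t^{(2)}$, so its lag-0 and lag-1 autocovariances are $2 + (a^{(1)})^2 + (a^{(2)})^2$ and $-(a^{(1)} + a^{(2)})$ under the unit-variance assumption, and all higher lags are zero.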

A more general case is described in the following theorem.

Theorem 1.1.1. [Granger and Morris (1976)] Let random processes $Y_t^{(j)}$ be represented by ARMA($p_j$, $q_j$), $j = 1, 2$, i.e.

$$A(L) Y_t^{(j)} = B(L)\varepsilon_t^{(j)},$$

where $A(L) = 1 - \sum_{k=1}^{p_j} a_k^{(j)} L^k$ and $B(L) = \sum_{k=0}^{q_j} b_k^{(j)} L^k$ are, respectively, the autoregressive and moving-average polynomials, $\varepsilon_t^{(j)}$ are zero-mean strong white noise sequences, and $L x_t = x_{t-1}$ denotes the lag (back-shift) operator. Then $S_t = Y_t^{(1)} + Y_t^{(2)}$ is an ARMA($m$, $n$) process with $m \le p_1 + p_2$ and $n \le \max\{p_1 + q_2,\ p_2 + q_1\}$.

Practical conclusions of Theorem 1.1.1 are as follows. First, small-scale aggregation can help to explain the nature of mixed ARMA(p, q) models. The small-scale aggregation of simple autoregressive models, of AR and MA models, the observation error or autoregressive signal plus noise situations, and non-integer lag and feedback models are a few of the real data cases likely to give rise to ARMA models. Secondly, when the number of summands increases, i.e. in the context of large-scale cross-sectional aggregation, Theorem 1.1.1 in general leads to an extremely high "true" order of the aggregate ARMA model, and to much more complicated dynamics compared to the underlying idiosyncratic processes. However, the caveat to the practical application of the mathematically correct Theorem 1.1.1 in this case is that a lower order ARMA process (a model with fewer parameters) can approximate the underlying process more efficiently, in the sense of minimizing an information criterion, than a model of the true order. Moreover, the techniques used to prove the small-scale aggregation results are of little use in the large-scale case. To explain interesting empirical findings in macroeconomic series, including long range dependence (the long memory phenomenon), one needs to find a specific framework and to develop new methods. All this historically influenced the adoption of the doubly stochastic approach to aggregation and suggested the use of mixtures of densities as a key method of large-scale aggregation analysis.

Finally, note that the availability and development of many new time series models, including stochastic volatility models, keep the methods of small-scale aggregation in demand.

Temporal aggregation. The third type of aggregation, temporal aggregation or aggregation in time, concerns the relationships between high frequency (disaggregate) time series models, denoted by $y_t$, and low frequency (aggregate) ones, denoted by $y_t^*$ and observed at the moments $(k, 2k, 3k, \ldots)$ for some integer $k$ (rational for the inverse problem), and their impact on the main features of the disaggregate model: forecasting abilities, causality, cointegration, etc. (see Granger (1990), Silvestrini and Veredas (2008) among others).

Though the choice of the time series frequency is often dictated by some economic arguments, most of the time this choice is somewhat subjective. On the other hand, the choice of the frequency influences the estimation results, i.e. given the same economic or econometric model, the estimation results will be different for each frequency, but the models will be related through some polynomial $T(L)$. Since the high frequency model is richer in information, it makes sense to think that the low frequency model should not be estimated from the data but rather inferred from the high frequency model, whenever the latter is available. The way in which these two models interact is the main subject of temporal aggregation. Broadly speaking, the solution lies in deriving the low frequency model from the high frequency model, and it involves two main stages. Consider for instance SARIMA-GARCH, the workhorse model of time series analysis, specified in terms of lag polynomials whose orders have to be chosen. Usually the technique of temporal aggregation makes it possible to infer the orders of the low frequency model (e.g. annual) from those of the high frequency model (e.g. quarterly). The inverse is not that simple: since it involves fractional lags, the technique used should be similar to that for fractionally integrated processes.

Example 1.1.2. Silvestrini and Veridas (2008) provide many examples of temporal aggregation in the SARIMA-GARCH setting. Let, for instance, $y_t$ follow an ARIMA($p, d, q$) model and $y^*_t$ follow an ARIMA($P, D, Q$) model, where the corresponding error terms are possibly defined by a GARCH model. Assume the following linear aggregation scheme

$$y^*_t = \sum_{j=0}^{K} w_j y_{t-j} = \sum_{j=0}^{K} w_j L^j y_t = W(L) y_t, \tag{1.1}$$

where $K$ and $w_j$, $j = 0, 1, \ldots, K$, are given. The polynomial $T(L)$ is chosen such that the left-hand side of the disaggregate model

$$T(L)A(L)(1-L)^d y_t = T(L)B(L)\varepsilon_t$$

is equal to the left-hand side of the aggregate model

$$A^*(L^k)(1-L^k)^D y^*_T = B^*(L^k)\varepsilon^*_T, \quad T = 0, k, 2k, \ldots.$$

It is worth noting that the relation between the models for $y_t$ and $y^*_t$ immediately implies that $p = P$ and $d = D$; moreover, the inverted roots of $A^*(L^k)$ are equal to the inverted roots $\delta_j$, $j = 1, \ldots, p$, of $A(L)$ raised to the power $k$. Thus in this case

$$T(L) = \frac{A^*(L^k)(1-L^k)^D W(L)}{A(L)(1-L)^d} = \left[\frac{1-L^k}{1-L}\right]^{d+1} \prod_{j=1}^{p} \frac{1-\delta_j^k L^k}{1-\delta_j L}.$$

Finally, for the MA part we need to equate the autocovariance structures of $T(L)B(L)\varepsilon_t$ and $B^*(L^k)\varepsilon^*_T$. The solution of the corresponding system is then straightforward. Note, however, that $Q$ is not equal to $q$. In this setting it can be proved that $Q = \left\lfloor \frac{(p+d)(k-1)+q}{k} \right\rfloor$. All other SARIMA-GARCH cases are investigated in the same fashion. The results are also extended to the multivariate case.
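As a rough numerical illustration of the scheme (1.1) and of the order formula quoted above, the following Python sketch applies a weight polynomial $W(L)$ to a high-frequency series and keeps every $k$-th value. The function names `temporal_aggregate` and `aggregate_ma_order` are hypothetical helpers introduced here for illustration only, and the unit weights $w_j = 1$ are just one possible choice of aggregation scheme.

```python
import numpy as np


def temporal_aggregate(y, weights, k):
    """Apply y*_t = sum_j w_j y_{t-j} and keep every k-th value.

    `weights` holds (w_0, ..., w_K); only dates t = k-1, 2k-1, ...
    (0-based) with a full window of K lags are returned.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    K = len(w) - 1
    # np.convolve flips the kernel, so filtered[i] = sum_j w_j * y[i + K - j],
    # i.e. the filtered value at date t = i + K.
    filtered = np.convolve(y, w, mode="valid")
    dates = [t for t in range(k - 1, len(y), k) if t >= K]
    return np.array([filtered[t - K] for t in dates])


def aggregate_ma_order(p, d, q, k):
    """MA order Q = floor(((p + d)(k - 1) + q) / k), as quoted in the text."""
    return ((p + d) * (k - 1) + q) // k


# Monthly-to-quarterly style example: sum k = 3 consecutive observations.
y = np.arange(1, 13)
y_star = temporal_aggregate(y, weights=[1, 1, 1], k=3)
Q = aggregate_ma_order(p=2, d=1, q=1, k=4)
```

Here `y_star` holds the non-overlapping three-period sums of the series, and `Q` evaluates the order formula for one arbitrary choice of $(p, d, q, k)$.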

Even from this brief review it may be concluded that the topic of temporal aggregation of time series has a wide set of questions yet to be solved. Here we have examined only the direct, linear aggregation problem. Its inverse, involving fractional lags, has to our knowledge not yet been investigated in the same detail as the direct one. Even among the linear temporal aggregation problems, multivariate models, nonlinearities, long memory, random aggregation, continuous time and state space representations are relevant topics for further research. Adding the many stochastic volatility models and non-linear aggregation makes it clear that the topic is far from being exhausted.

Contemporaneous aggregation in time and space. Spatial correlation. A natural consequence of spatial and temporal aggregation as used here is the simultaneous analysis of both the time and space dimensions of so-called time-space models (see Giacomini and Granger (2004)). It is pointed out that many variables of economic interest (e.g. the European Union's macro variables) are contemporaneous aggregates of variables observed over time and across different regions. As in the case of the model selection problem, the analyst interested in forecasting the aggregate across regions may ask whether it is more efficient to forecast the aggregate series directly or to model the individual components separately and then aggregate the forecasts. Whatever the model used for forecasting, it is plausible that the final decision on whether to aggregate or not will be sensitive to the degree of spatial interdependence among the variables measured in the different regions. In this framework more specific spatial dependences are considered. Thus, ignoring temporal aggregation without loss of generality, we investigate solely the consequences of spatial aggregation in the following example.

Example 1.1.3. A simple example of a time-space model is a spatial AR(1,1) in which a variable $x_t$ is measured over time in three neighboring in-line regions $i-1$, $i$ and $i+1$. Because of the spatial proximity of the regions, it can be assumed that the value of the variable at time $t$ in region $i$ depends on the values of the variable at all three locations at time $t-1$.

[Diagram: $x^{(i)}_t$ depends on $x^{(i-1)}_{t-1}$ with weight $\psi_1$, on $x^{(i)}_{t-1}$ with weight $\phi$, and on $x^{(i+1)}_{t-1}$ with weight $\psi_2$.]

Then the relationship can be expressed as

$$x^{(i)}_t = \phi x^{(i)}_{t-1} + \psi_1 x^{(i-1)}_{t-1} + \psi_2 x^{(i+1)}_{t-1} + \varepsilon^{(i)}_t, \tag{1.2}$$

where $\varepsilon^{(i)}_t$ is a zero-mean white noise uncorrelated across regions. Denoting the spatial aggregate measured in region $i$ and at time $t$ by $S^{(i)}_t(x) = \sum_{k=i-1}^{i+1} x^{(k)}_t$ and summing (1.2) over the three regions, we have the aggregate relationship

$$S^{(i)}_t(x) = \phi S^{(i)}_{t-1}(x) + \psi_1 S^{(i-1)}_{t-1}(x) + \psi_2 S^{(i+1)}_{t-1}(x) + S^{(i)}_t(\varepsilon). \tag{1.3}$$

If in addition we assume that the edge effects are negligible, i.e. that the aggregate variables $S^{(i)}_{t-1}(x)$, $S^{(i-1)}_{t-1}(x)$ and $S^{(i+1)}_{t-1}(x)$ are approximately equal, then (1.3) becomes

$$S^{(i)}_t(x) = (\phi + \psi_1 + \psi_2) S^{(i)}_{t-1}(x) + S^{(i)}_t(\varepsilon). \tag{1.4}$$


It follows that the process for the aggregate is approximately an AR(1) and that the coefficients of spatial dependence ($\psi_1$ and $\psi_2$) are incorporated into the autoregressive coefficient. We can thus conclude that aggregation causes in this case a simplification of the dynamic properties of the process. Since additional temporal aggregation brings no new results beyond those already mentioned in the previous part of this review, this simplification is the main threat in the context of the aggregation of time-space models that was not considered in the previous settings.
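The collapse of the spatial coefficients into a single autoregressive coefficient can be checked in a small simulation. The sketch below makes an assumption that is not in the text: the regions are placed on a ring, so there are no edges at all, and the aggregate is taken over all regions rather than over a three-region window. Under that assumption the AR(1) relation with coefficient $\phi + \psi_1 + \psi_2$ holds exactly, with the aggregated innovation dated $t$. The function name `simulate_ring` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_ring(phi, psi1, psi2, n_regions, n_steps, seed=0):
    """Simulate the spatial AR(1,1) of (1.2) on a ring of regions.

    Assumption (not from the text): regions are arranged on a circle, so
    region 0 neighbours region n_regions - 1 and there are no edges.
    Returns the panel x[t, i] and the innovations eps[t, i].
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_steps, n_regions))
    x = np.zeros((n_steps, n_regions))
    for t in range(1, n_steps):
        prev = x[t - 1]
        left = np.roll(prev, 1)    # value in region i-1 at time t-1
        right = np.roll(prev, -1)  # value in region i+1 at time t-1
        x[t] = phi * prev + psi1 * left + psi2 * right + eps[t]
    return x, eps


phi, psi1, psi2 = 0.3, 0.2, 0.1
x, eps = simulate_ring(phi, psi1, psi2, n_regions=50, n_steps=200)

# Aggregate over all regions: S_t(x) and S_t(eps).
S_x = x.sum(axis=1)
S_eps = eps.sum(axis=1)

# The aggregate obeys an AR(1) with coefficient phi + psi1 + psi2.
residual = S_x[1:] - (phi + psi1 + psi2) * S_x[:-1] - S_eps[1:]
max_abs_residual = np.max(np.abs(residual))
```

With genuine edges the relation is only approximate, which is exactly the role of the "negligible edge effects" assumption in the example.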

Dealing with up to $k$ neighbours, aggregation of the time-space AR(1,1) model leads in general to a vector autoregressive (VAR) representation of order 1, with rigorous a priori restrictions imposed on the interdependence due to the spatial allocation of the variables, exploiting the fact that contiguous regions could be related in a systematic way.

Note that in this thesis we study models that do not possess spatial correlation. Though idiosyncratic processes may be affected by common factors, they are assumed to be spatially uncorrelated. For further details on time-space aggregation the interested reader is referred to Giacomini and Granger (2004).

Aggregation of conditional heteroskedasticity models. Though the thesis primarily considers linear aggregation of AR(1) models, non-linear aggregation (e.g. by such functions as the minimum, maximum, median or quantiles) and the aggregation of conditional heteroskedasticity (CH) models are also key fields of current extensive research, particularly with a view to modeling the long memory phenomenon, the leverage effect and other well-known empirical stylized facts of asset returns (see Granger and Ding (1996), Andersen and Bollerslev (1997), Zaffaroni (2006) among others). However, the widely used class of ARCH/GARCH models, specified in such a way that the conditional variance is a weighted linear combination of squared past returns and past conditional variances, is not capable of reproducing the covariance long memory of squared returns (see Giraitis et al. (2000), Kazakevičius et al. (2004), Zaffaroni (2006), Giraitis et al. (2007)).

Since the economic reasons for the above-mentioned long memory phenomenon are not very clear, attempts have been made to explain it by simple heterogeneous short memory models involving regime shifts or aggregation. Yet only a few stochastic volatility models have been found to reproduce the long memory property via contemporaneous aggregation schemes (summing and averaging across observations). In fact, two models among the very large class of square root stochastic autoregressive volatility (SR-SARV) and CH models are shown to possess long memory as a result of linear large-scale aggregation, namely: a nonlinear moving average model, equivalently representable as SR-SARV(1), introduced in Robinson and Zaffaroni (1998); and the contemporaneous aggregation of linear ARCH (LARCH) models (Giraitis et al. (2008)). The following two examples illustrate these findings.

Example 1.1.4. The non-linear moving average in its simplest formulation for $n$ identically distributed individual processes can be defined by

$$x^{(j)}_t = u^{(j)}_t \sum_{k=1}^{\infty} (\alpha^{(j)})^k \varepsilon^{(j)}_{t-k}, \quad j = 1, 2, \ldots, n, \tag{1.5}$$

where $|\alpha^{(j)}| < 1$, and $u_t$ and $\varepsilon_t$ are zero-mean second-order stationary white noise sequences, in general mutually correlated, with $u_t$ also standard. Denoting $\sqrt{D(x_t|\Omega_{t-1})}$ by $h_t$, we obtain the following random parameter AR(1) model

$$h_t = \alpha^{(j)} h_{t-1} + \varepsilon^{(j)}_t. \tag{1.6}$$

It can be shown that (1.5)–(1.6) is SR-SARV(1) with no memory, both under the restriction of common innovations ($u^{(j)}_t = u_t$, $\varepsilon^{(j)}_t = \varepsilon_t$ for all $j$) and when this restriction is relaxed to $u^{(j)}_t = a^{(j)} u_t$, $\varepsilon^{(j)}_t = b^{(j)} \varepsilon_t$, allowing some heterogeneity in the loadings of the common factors ($a^{(j)}$, $b^{(j)}$ are i.i.d. random variables).

Assuming that the idiosyncratic parameters $\alpha^{(j)}$ are independent copies of $\alpha$ drawn from an absolutely continuous distribution supported on $[0, \vartheta)$ with density

$$f(\alpha) \sim c\, l(1/(\vartheta-\alpha)) (\vartheta-\alpha)^{\delta} e^{-\beta/(\vartheta-\alpha^2)}, \quad \text{as } \alpha \to \vartheta-, \tag{1.7}$$

where $l$ is a slowly varying function$^1$, the aggregate process

$$X_t = u_t \sum_{k=1}^{\infty} E(\alpha^{(j)})^k \varepsilon^{(j)}_{t-k-1}$$

possesses long memory, $\mathrm{cov}(X_t^2, X_{t+h}^2) \sim c h^{-4d-2}$ as $h \to \infty$, if $\vartheta = 1$, $\beta = 0$, $\delta > -1/2$. Evidently the result shares the idea of Granger (1980) in the original AR(1) aggregation scheme.

$^1$ Recall that a positive measurable function $l(\cdot)$ defined on some neighborhood $[a, \infty)$ of infinity is said to be slowly varying in Karamata's sense if and only if for any $c > 0$, $l(cx)/l(x)$ converges to 1 as $x$ tends to infinity. Classical examples of slowly varying functions are $l(x) = \log(x)$ and $l(x) = b$, where $b$ is a positive constant.


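Example 1.1.5. Now consider the GLARCH(1,1) model with

$$X_t = \zeta_t V_t, \qquad V_t = \alpha + \beta V_{t-1} + c\beta\sqrt{1-\beta^2}\,X_{t-1} \tag{1.8}$$

$$\phantom{X_t = \zeta_t V_t, \qquad V_t} = \frac{\alpha}{1-\beta} + c\sqrt{1-\beta^2}\sum_{j=1}^{\infty}\beta^j X_{t-j}, \tag{1.9}$$

where the parameters satisfy $\alpha/(1-\beta) =: \alpha' \in \mathbb{R}$, $0 < c < 1$, $0 < \beta < 1$, and $c\beta < 1$. Here the parameters $(\alpha'_i, \beta_i)$, $i \in \mathbb{N}$, are assumed to be random with the common distribution of $(\alpha', \beta)$ and $E(\alpha')^2 < \infty$, $c$ is a constant, and $\zeta_t$ denotes common innovations.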
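The step from (1.8) to (1.9) can be checked by backward substitution. The sketch below reads the recursion in (1.8) as $V_t = \alpha + \beta V_{t-1} + c\beta\sqrt{1-\beta^2}\,X_{t-1}$, which is the reading under which (1.9) follows, and assumes $0 < \beta < 1$ so that the remainder term $\beta^m V_{t-m}$ vanishes (formally) as $m \to \infty$.

```latex
\begin{align*}
V_t &= \alpha + \beta V_{t-1} + c\beta\sqrt{1-\beta^2}\,X_{t-1} \\
    &= \alpha\sum_{i=0}^{m-1}\beta^{i}
       + c\beta\sqrt{1-\beta^2}\sum_{j=1}^{m}\beta^{j-1}X_{t-j}
       + \beta^{m}V_{t-m} \\
    &\xrightarrow[m\to\infty]{} \frac{\alpha}{1-\beta}
       + c\sqrt{1-\beta^2}\sum_{j=1}^{\infty}\beta^{j}X_{t-j}.
\end{align*}
```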

Define the partial aggregates $X^{(N)}_t = N^{-1}\sum_{j=1}^{N} X^{(j)}_t$ and $V^{(N)}_t = N^{-1}\sum_{j=1}^{N} V^{(j)}_t$, where $X^{(j)}_t = \zeta_t V^{(j)}_t$ is a stationary solution to the random coefficient LARCH(1,1) model. The limiting aggregated process, which is proved to be correctly specified, has for $0 < q < 1/2$ long memory in volatility ($V_t$), i.e. $\mathrm{cov}(V_0, V_h) \sim c_2(V) h^{-2q}$, if $\alpha' = 1$ and $\beta$ has a Beta distribution with probability density

$$f(x) = B(p, q)^{-1} x^{p-1}(1-x)^{q-1}, \quad x \in (0, 1), \; q \in (0, 1], \; p > 0.$$

Moreover, it is also proved that both $X_t^2$ and $V_t^2$ under the same assumptions possess long memory in the same sense of non-summable autocovariance functions.

In sum, all the cases considered in this section, as well as the contemporaneous cross-sectional large-scale linear aggregation presented in the thesis, provide a strong background and a large selection of problems for further research.

1.2 A general framework for individual relationships

Every model that accounts for aggregation over individuals must begin with a specification of individual behavior, that is, an econometric model applicable to individual (micro) level data. It is therefore relevant to present a general framework for individual relationships.

From the practical point of view, aggregation is a non-trivial problem when there is some heterogeneity among the individual processes. Indeed, when the individual time series are identical (homogeneous) in every sense, aggregation becomes trivial. However, this case is unlikely to occur in practice. So one should consider the following possible sources of heterogeneity (Pesaran (2002)):

• input variables (either observable or unobservable heterogeneous initial endowments)

• micro parameters (heterogeneous coefficients)

• micro functionals (heterogeneous preferences, production functions, behavioral equations, distributed lags models, etc.)

Let the micro level relationship be represented in vector form$^2$ as follows

$$y_{it} = f_i(x_{it}, u_{it}, \theta_i), \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T, \tag{1.10}$$

where $y_{it}$ denotes the real vector of decision variables, $x_{it}$ is a vector of observable variables, $u_{it}$ is a vector of unobservable variables, and $\theta_i$ denotes the vector of unknown parameters. The most important cases are considered in the following examples.

Example 1.2.1. The simplest case is a pooled panel data model, where the source of heterogeneity is different inputs across individuals, either observable or unobservable, but both the functional form and the parameters are the same for all micro units. In other words, in this case we have

$$y_{it} = f(x_{it}, u_{it}, \theta), \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T. \tag{1.11}$$

This example may arise in the analysis of Engel curves that are linear in parameters but non-linear in variables,

$$w_{it} = \alpha_0 + \alpha_1 \log x_{it} + \alpha_2 (\log x_{it})^2 + u_{it}, \tag{1.12}$$

or in the analysis of a Cobb-Douglas (CD) production function of the form

$$y_{it} = A L_{it}^{\alpha} K_{it}^{1-\alpha} e^{u_{it}}, \tag{1.13}$$

which is easily transformed into the log-linear form

$$\log y_{it} = \log A + \alpha \log(L_{it}) + (1-\alpha) \log(K_{it}) + u_{it}. \tag{1.14}$$

For this type of heterogeneity, aggregation will not be a problem when the micro relationships are linear both in variables and in parameters.

$^2$ In the thesis the univariate case is considered, but for the general specification of the aggregation procedure it is convenient to write in vector form.

Example 1.2.2. The pooled panel data model is not as realistic as mixed effects linear models. Indeed, it is more relevant to account for the heterogeneity of some or all parameters while fixing the functional form:

$$y_{it} = f(x_{it}, u_{it}, \theta_i), \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T. \tag{1.15}$$

Such a scenario arises in the analysis of non-linear constant elasticity of substitution (CES) production functions

$$y_{it} = \left(\lambda_i L_{it}^{-\delta_i} + (1-\lambda_i) K_{it}^{-\delta_i}\right)^{-1/\delta_i} e^{u_{it}}, \tag{1.16}$$

or any other parametric family of functions. This is the particular case considered in the thesis.

The analysis can also be extended to account for observed and unobserved macro (aggregate) effects on individual behavior, namely

$$y_{it} = f_{it}(x_{it}, z_t, u_{it}, v_t, \theta_i), \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T, \tag{1.17}$$

where $z_t$ represents a vector of observed macro variables, and $v_t$ denotes a vector of unobserved macro effects.

Example 1.2.3. Finally, the most exotic example in real applications is the case when the functional form (not in the sense of a certain parametric family!) differs across the individual units. This can happen when studying mixtures of production functions, where some firms adopt Cobb-Douglas, while others adopt CES, CET (constant elasticity of transformation), or any other production function. It seems that this problem is related to a small-scale aggregation of aggregates that are homogeneous in functionals. Since the latter are usually non-linear large-scale aggregates, the problem in the general case is likely to be ill-posed.


1.3 Alternative notions of aggregation

1.3.1 Deterministic approach

This approach, employed for example by Gorman (1953) and Theil (1954), treats all input variables and parameters as given and asks whether an aggregate function exists which is identical to the function that results from the aggregation of the micro relationships. Let $Y_t = N^{-1}\sum_{i=1}^{N} y_{it}$ be a population mean (when $N \to \infty$, a partial sum) of individual processes. Then linear aggregation of (1.15) across all individual units, taking $x_{it}$, $u_{it}$ and $\theta_i$ as given, first results in the following expression for the population means (partial sums)

$$Y_t = N^{-1}\sum_{i=1}^{N} f(x_{it}, u_{it}, \theta_i). \tag{1.18}$$

According to the deterministic approach, an aggregation problem is said to be present if the aggregate function $F(X_t, U_t, \theta_a)$, where $X_t = N^{-1}\sum_{i=1}^{N} x_{it}$, $U_t = N^{-1}\sum_{i=1}^{N} u_{it}$, and $\theta_a$ is the vector of parameters of the aggregate function, differs from the population mean (partial sum) (1.18). On the other hand, perfect aggregation holds if the condition

$$\left\| F(X_t, U_t, \theta_a) - N^{-1}\sum_{i=1}^{N} f(x_{it}, u_{it}, \theta_i) \right\| = 0 \tag{1.19}$$

is satisfied for all realizations of $x_{it}$, $u_{it}$ and $\theta_i$, where $\|a - b\|$ denotes a suitable norm measuring the discrepancy between $a$ and $b$. However, it is clear that this “deterministic” requirement is extremely restrictive and is rarely met in applied economic analysis. The only case is found among the pooled data regression models, i.e. linear models with fixed (identical) parameters across individual units, when the model is also linear in variables. This means that not all pooled data regressions can be considered deterministically aggregatable. This follows from the fact that condition (1.19) is violated whenever $f(\cdot)$ is a nonlinear function of either observable or unobservable inputs (see Engel curves for instance), even if it is linear in parameters that are identical across all idiosyncratic processes.
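A small numerical check makes the restrictiveness of (1.19) concrete. The sketch below assumes a pooled model in which the candidate aggregate function $F$ has the same form and parameters as the micro function $f$, ignores the unobservable inputs, and uses made-up input values; `aggregation_gap`, `linear` and `engel` are hypothetical names introduced here.

```python
import numpy as np


def aggregation_gap(f, x, theta):
    """Return |F(X_bar) - mean_i f(x_i)| with F taken equal to f.

    Assumption: the candidate aggregate function F has the same form and
    the same parameters as the common micro function f (pooled model).
    """
    x = np.asarray(x, dtype=float)
    return abs(f(x.mean(), theta) - f(x, theta).mean())


def linear(x, theta):
    # Linear in variables and in parameters.
    return theta[0] + theta[1] * x


def engel(x, theta):
    # Linear in parameters, non-linear in x, as in (1.12) without the error.
    return theta[0] + theta[1] * np.log(x) + theta[2] * np.log(x) ** 2


incomes = np.array([1.0, 2.0, 4.0, 8.0])  # hypothetical micro inputs
gap_linear = aggregation_gap(linear, incomes, (0.5, 2.0))
gap_engel = aggregation_gap(engel, incomes, (0.5, 2.0, 0.0))
```

The linear model aggregates perfectly (`gap_linear` is zero), whereas the Engel-type curve leaves a strictly positive gap even with identical parameters, because the logarithm of a mean exceeds the mean of the logarithms.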

1.3.2 Stochastic approach

According to Pesaran (2002), the strong restrictions of deterministic aggregation are primarily related to the requirement that condition (1.19) be satisfied for all realizations of $x_{it}$, $u_{it}$ and $\theta_i$, no matter how close or remote the possibility of their occurrence is. In view of this, it is natural to relax this deterministic requirement and to introduce a less restrictive stochastic (statistical) approach where condition (1.19) is assumed to hold “on average” only. This approach, suggested by Kelejan (1980) and rigorously formalized by Stocker (1984), relates the corresponding empirical means of $y_{it}$ and $x_{it}$, denoted by $\mu_y(t)$ and $\mu_x(t)$, at a point of time (stock variable) or over a period of time (flow variable), and defines an aggregate relationship as one that links $\mu_y(t)$ and $\mu_x(t)$ at time $t$. In other words, it treats both the variables $x_{it}$, $u_{it}$ and the parameters $\theta_i$ as realizations of stochastic variables $x_t$, $u_t$, $\theta$ across individual units, having a joint probability density function (pdf) $P(x_t, u_t, \theta; \phi_t)$ with an additional parameter vector $\phi_t$ that may vary in time, but not across individuals. Then

$$\mu_y(t) = \Psi_y(\phi_t) = \int f(x_t, u_t, \theta) P(x_t, u_t, \theta; \phi_t)\,dx_t\,du_t\,d\theta \tag{1.20}$$

and

$$\mu_x(t) = \Psi_x(\phi_t) = \int x_t P(x_t, u_t, \theta; \phi_t)\,dx_t\,du_t\,d\theta. \tag{1.21}$$

Besides that, assume that $\phi_t = (\phi_{1,t}, \phi_{2,t})$, where $\phi_{2,t}$ has the same dimension as $x_t$, and suppose that for a given $\phi_{1,t}$ the function $\Psi_x$ is a one-to-one relationship between $\phi_{2,t}$ and $\mu_x(t)$. Then

$$\phi_{2,t} = \Psi_x^{-1}(\phi_{1,t}, \mu_x(t)), \tag{1.22}$$

and the corresponding exact aggregate function follows from

$$\mu_y(t) = \Psi_y(\phi_{1,t}, \Psi_x^{-1}(\phi_{1,t}, \mu_x(t))) = F(\phi_{1,t}, \mu_x(t)). \tag{1.23}$$

The stochastic approach is clearly an advance over the deterministic approach, but it is still only loosely related to real applications and does not focus on the approximate nature of econometric analysis. Besides, its reliance on unconditional means does not allow dynamic models and systems to be included: even the simplest example of AR(1) aggregation does not fit here. All this motivates the development of the forecasting approach to the aggregation problem.
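To make the construction (1.20)–(1.23) concrete, here is a minimal worked case under assumptions that are ours, not the text's: the micro relationship is $y_i = \alpha_0 + \alpha_1 \log x_i + u_i$ with $E u_i = 0$, and $\log x$ is normal with mean $m$ and variance $s^2$, so that $\phi_1 = s^2$ and $\phi_2 = m$. Then $\mu_x = \exp(m + s^2/2)$, its inverse gives $m = \log \mu_x - s^2/2$, and the exact aggregate function is $F(s^2, \mu_x) = \alpha_0 + \alpha_1(\log \mu_x - s^2/2)$. All function names and parameter values below are hypothetical.

```python
import numpy as np


def psi_x(s2, m):
    """Mean of x when log x ~ N(m, s2): the map phi_2 -> mu_x in (1.21)."""
    return np.exp(m + 0.5 * s2)


def psi_x_inverse(s2, mu_x):
    """Inverse map (1.22): recover phi_2 = m from phi_1 = s2 and mu_x."""
    return np.log(mu_x) - 0.5 * s2


def aggregate_function(s2, mu_x, alpha0, alpha1):
    """Exact aggregate function F(phi_1, mu_x) of (1.23) for this example."""
    return alpha0 + alpha1 * psi_x_inverse(s2, mu_x)


# Hypothetical parameter values, chosen only for illustration.
alpha0, alpha1, m, s2 = 1.0, 0.5, 0.2, 0.64
mu_x = psi_x(s2, m)
mu_y = aggregate_function(s2, mu_x, alpha0, alpha1)

# Monte Carlo check of the cross-sectional means with simulated micro data.
rng = np.random.default_rng(1)
x = rng.lognormal(mean=m, sigma=np.sqrt(s2), size=200_000)
y = alpha0 + alpha1 * np.log(x) + rng.standard_normal(x.size)
```

The aggregate function depends on the dispersion parameter $s^2$ as well as on $\mu_x$, which illustrates why the distributional parameters $\phi_{1,t}$ appear in (1.23).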


1.3.3 Forecasting approach

To begin with, once again consider the deterministic condition (1.19), but now require that the mean of

$$\left\| F(\Omega_t, \theta_{at}) - N^{-1}\sum_{i=1}^{N} f(x_{it}, u_{it}, \theta_i) \right\|$$

conditional on the aggregate information set $\Omega_t = \{Y_{t-1}, Y_{t-2}, \ldots; X_{t-1}, X_{t-2}, \ldots\}$ be as small as possible. For simplicity denote the aggregate function by $F_t$ and the idiosyncratic functions by $f_{it}$. A classical example of such a minimization problem deals with predictors that are optimal in the mean-squared error sense, as suggested for instance in Pesaran (2002). Let $\|a - b\| = (a-b)^2$; then from

$$\begin{aligned}
E\{(F_t - Y_t)^2|\Omega_t\} &= E\{[(F_t - E(Y_t|\Omega_t)) - (Y_t - E(Y_t|\Omega_t))]^2|\Omega_t\} \\
&= E\{(F_t - E(Y_t|\Omega_t))^2|\Omega_t\} + E\{(Y_t - E(Y_t|\Omega_t))^2|\Omega_t\} \\
&\quad - 2E\{(F_t - E(Y_t|\Omega_t))(Y_t - E(Y_t|\Omega_t))|\Omega_t\}
\end{aligned} \tag{1.24}$$

it follows that the mean-square error optimal prediction is given by

$$F_t = E(Y_t|\Omega_t) = N^{-1}\sum_{i=1}^{N} E(f_{it}|\Omega_t). \tag{1.25}$$

In contrast with either the deterministic or the stochastic requirement for (1.19) to hold, the choice of $F_t$ as a global minimum of $E\{(F_t - Y_t)^2|\Omega_t\}$ does not require a perfect match for all realizations of the triplet $(x_{it}, u_{it}, \theta_i)$. Indeed, unless $E(Y_t|\Omega_t) \equiv Y_t$, we have rather

$$E\{(F_t - Y_t)^2|\Omega_t\} = \mathrm{var}(Y_t|\Omega_t) > 0. \tag{1.26}$$
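The step from (1.24) to (1.25) rests on the cross term vanishing. A short derivation, assuming only that $F_t$ is a function of the information in $\Omega_t$ (so that it can be treated as fixed under the conditional expectation), runs as follows.

```latex
\begin{align*}
E\{(F_t - E(Y_t|\Omega_t))(Y_t - E(Y_t|\Omega_t)) \,|\, \Omega_t\}
  &= (F_t - E(Y_t|\Omega_t))\, E\{Y_t - E(Y_t|\Omega_t) \,|\, \Omega_t\} = 0, \\
E\{(F_t - Y_t)^2 \,|\, \Omega_t\}
  &= (F_t - E(Y_t|\Omega_t))^2 + \mathrm{var}(Y_t|\Omega_t).
\end{align*}
```

Only the first term on the right depends on $F_t$, and it is zero exactly when $F_t = E(Y_t|\Omega_t)$, which gives (1.25) and leaves $\mathrm{var}(Y_t|\Omega_t)$ as the minimal value in (1.26).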

Remark 1.3.1. It is also possible to define an aggregate prediction function based on the information set of micro observations, if available. Such a construction involves a wider information set than $\Omega_t$, since the latter contains aggregate data only. Thus the corresponding optimal predictor based on the wider information set is in general more efficient (Pesaran (2002)).


1.4 Large scale aggregation, doubly stochastic approach and mixtures

As mentioned before, many of the most important economic variables are simple, unweighted sums or linear aggregates of large numbers of components. Analyzing the examples related to small scale aggregation of AR(1) individual processes, it has been shown (see Theorem 1.1.1) that the aggregate process will in general be a high order ARMA time series. When the idiosyncratic components number in the millions, it is not even possible to estimate the parameters of an ARMA process of the true order. A different approach is to assume that the AR(1) parameters are drawn from some distribution. Roughly speaking, not only the inputs but also the parameters are random (stochastic). This approach to the aggregation problem is known as doubly stochastic.
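The effect of random AR(1) parameters on the aggregate can be illustrated numerically. The sketch below is a Granger-type calculation under assumptions of our own: independent AR(1) components with unit-variance idiosyncratic innovations, normalization by $N^{-1/2}$, and coefficients $a$ drawn from a Beta($p, q$) distribution on $(0, 1)$ with $q > 1$. The limiting lag-$h$ autocovariance is then $E[a^h/(1-a^2)]$, approximated here by a simple midpoint rule; the function name `mixture_autocovariance` is hypothetical.

```python
import math

import numpy as np


def mixture_autocovariance(h, p, q, n_grid=400_000):
    """Approximate gamma(h) = E[a^h / (1 - a^2)] for a ~ Beta(p, q).

    This is the lag-h autocovariance of the limit of N^{-1/2} times the sum
    of N independent AR(1) processes with unit-variance innovations and
    i.i.d. coefficients a (an assumption of this sketch). q > 1 is needed
    for a finite variance. A midpoint rule on (0, 1) is used.
    """
    a = (np.arange(n_grid) + 0.5) / n_grid
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    log_density = (p - 1) * np.log(a) + (q - 1) * np.log1p(-a) - log_beta
    integrand = np.exp(log_density + h * np.log(a)) / (1.0 - a**2)
    return integrand.mean()


p, q = 1.0, 1.5
gamma_200 = mixture_autocovariance(200, p, q)
gamma_400 = mixture_autocovariance(400, p, q)

# For large h, doubling the lag multiplies gamma(h) by roughly 2^{-(q - 1)}:
# hyperbolic decay h^{-(q - 1)} rather than the geometric decay of one AR(1).
decay_ratio = gamma_400 / gamma_200
```

For a single AR(1) with a fixed coefficient the same ratio would be geometrically small, so the slow decay seen here comes entirely from the mass of the mixing distribution near one.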

Under this approach it is convenient first of all to define aggregation by the following general procedure, as in Dacunha-Castelle and Oppenheim (2001) and Dacunha-Castelle and Fermín (2006b), hereinafter called DCO's approach for brevity. Let $Y^{(j)} = \{Y^{(j)}_t(A^{(j)}(\omega), \omega'), t \in T\}$, $j \in \mathbb{N}$, denote a set of centered second-order doubly stochastic stationary elementary random processes, whose structure is driven by the random parameters $A = \{A^{(j)}, j \in \mathbb{N}\}$, an ergodic process with known (or unknown) distribution $\nu$; another source of randomness can be associated with some noise sequence independent of $A^{(j)}$. Let $X^{(N)}(Y) = \{X^{(N)}_t, t \in T\}$ be a set of partial aggregates of the elementary individual processes $Y^{(j)}$ defined by

$$X^{(N)}_t(A) = \frac{1}{\sqrt{B_N}} \sum_{j=1}^{N} Y^{(j)}_t(A^{(j)}), \tag{1.27}$$

where $B_N$ is a normalizing sequence of real numbers such that $B_N \to \infty$ as $N \to \infty$. Then the linear aggregation of $\{Y^{(j)}\}$ is possible if there exists a summation-normalization procedure (1.27) such that, for every fixed trajectory of $A$, the sequence of partial sums $X^{(N)}(A)$ converges $\nu$-a.s. in distribution to a process $X$, independent of the realization of $A$, which in this context is called the aggregate process (aggregation) of the elementary processes $\{Y^{(j)}\}$.

Of course, the presented framework is too general to apply in practice. In fact, all we can do is investigate particular cases of the aggregation of doubly stochastic individual processes. Known examples comprise the aggregation of autoregressive processes of order $p$, AR($p$), and their continuous-time analogues, Ornstein-Uhlenbeck processes of order $p$, OU($p$) (see Oppenheim and Viano (1999), (2004)). Particular interest is paid to the aggregation of AR(1) individual processes. Analyzing these examples, two main cases are studied in detail:

1. $\{Y^{(j)}_t(A^{(j)})\}$, $j \in \mathbb{N}$, is a sequence of independent elementary processes (the case without common innovations). In this case it is proved that $B_N$ is asymptotically equivalent to $N$.

2. $\{Y^{(j)}_t(A^{(j)})\}$, $j \in \mathbb{N}$, is a sequence of linear Gaussian elementary processes with common innovations $\eta = \{\eta_t, t \in T\}$ independent of $A$; then $B_N$ is asymptotically equivalent to $N^2$ (Dacunha-Castelle and Fermín (2006a)).

Remark 1.4.1. An interesting conclusion in this context follows from the analysis of the decomposition of the innovations. Recall that the innovations can be described, as in Granger (1980), by the sum

$$y^{(j)}_t = \rho \cdot \eta_t + \varepsilon^{(j)}_t, \quad \rho \neq 0.$$

If we consider the two extreme cases where one and only one component of the innovation (either the individual $\varepsilon^{(j)}_t$ or the common $\eta_t$, both standard strong white noises) is observable, and bear in mind that the variance of the aggregated partial sums of innovations is equal to $N(1-\rho^2) + N^2\rho^2$, we see that for $N$ sufficiently large and $\rho$ close to zero the common innovations are insignificant in the sense of the $R^2$ value at the micro level, yet dominate the macro level relationship. This explains why, when analyzing macro level data, we cannot simply ignore common innovations if they are present in the particular micro level DGP, even though we can do so at the micro level.
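The orders of magnitude in this remark are easy to tabulate. The sketch below assumes, consistently with the variance $N(1-\rho^2) + N^2\rho^2$ quoted in the remark, that $\mathrm{var}(\eta_t) = 1$ and $\mathrm{var}(\varepsilon^{(j)}_t) = 1 - \rho^2$, so that each micro innovation has unit variance; `micro_share` and `macro_share` are hypothetical helper names.

```python
def micro_share(rho):
    """Share of the common component in var(y_t^{(j)}), an R^2-type measure.

    Assumption: var(eta_t) = 1 and var(eps_t^{(j)}) = 1 - rho^2, so that each
    micro innovation has unit variance and the sum over N units has variance
    N(1 - rho^2) + N^2 rho^2, as in the remark.
    """
    return rho**2


def macro_share(rho, n_units):
    """Share of the common component in the variance of the sum over units."""
    common = n_units**2 * rho**2
    idiosyncratic = n_units * (1.0 - rho**2)
    return common / (common + idiosyncratic)


rho, n_units = 0.1, 1_000_000
share_micro = micro_share(rho)
share_macro = macro_share(rho, n_units)
```

With these illustrative values the common factor explains one percent of the variance of a single unit but almost all of the variance of the aggregate.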

Secondly, mixtures of spectral densities, as the main tool for studying aggregation, should be considered.

Definition 1.4.1. Let $g(\lambda, y)$, $\lambda \in [-\pi, \pi]$, $y \in \mathbb{R}^s$, be a family of spectral densities and $\mu(dy)$ some positive measure. The mixture $f$ of $g$ by $\mu$ is defined by

$$f(\lambda) = \int_{\mathbb{R}^s} g(\lambda, y)\mu(dy) \tag{1.28}$$

if and only if $f(\lambda)$ is integrable with respect to the Lebesgue measure.
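A discrete mixing measure gives the simplest instance of this definition. The sketch below assumes AR(1) spectral densities $g(\lambda, a) = \sigma^2 / (2\pi(1 - 2a\cos\lambda + a^2))$ and a hypothetical two-point measure $\mu$ on the coefficient $a$; it then checks integrability by comparing the total mass of the mixture with the weighted sum of the AR(1) variances $1/(1-a^2)$. Function names and values are ours.

```python
import numpy as np


def ar1_spectral_density(lam, a, sigma2=1.0):
    """Spectral density of an AR(1) with coefficient a and noise variance sigma2."""
    return sigma2 / (2.0 * np.pi * (1.0 - 2.0 * a * np.cos(lam) + a**2))


def mixture_density(lam, coefficients, weights):
    """Mixture f(lam) = sum_i weights[i] * g(lam, a_i), a discrete case of (1.28)."""
    lam = np.asarray(lam, dtype=float)
    return sum(w * ar1_spectral_density(lam, a) for a, w in zip(coefficients, weights))


# Hypothetical two-point mixing measure mu on the AR(1) coefficient.
coefficients = [0.2, 0.9]
weights = [0.5, 0.5]

# Uniform grid on [-pi, pi); averaging times 2*pi integrates a periodic function.
lam = -np.pi + 2.0 * np.pi * np.arange(4096) / 4096
total_mass = 2.0 * np.pi * mixture_density(lam, coefficients, weights).mean()
expected = sum(w / (1.0 - a**2) for a, w in zip(coefficients, weights))
```

When the mixing measure puts too much mass near $|a| = 1$ the corresponding integral diverges, which is the case excluded by the integrability requirement in the definition.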

If we let the random parameters $A^{(j)}$ be drawn from the common distribution $\mu$, i.e. $\nu = \mu^{\otimes \mathbb{N}}$, then the existence of the stationary second order aggregated process $X$ is equivalent to the existence of the representation (1.28) of its spectral density, where $g(\lambda, A^{(j)})$ is the spectral density of $Y^{(j)}$ and $f$ is the spectral density of the aggregate process. The key statement proved in Dacunha-Castelle and Oppenheim (2001) explains the relevance of the mixture technique for solving disaggregation problems: every mixture can be considered as the result of an aggregation problem. We will return to this discussion later when dealing with disaggregation problems, but first it is important to make some final comments on the main properties of large-scale aggregation.

We have seen that the presence of common innovations (common covariates, factors) at the macro level is very influential in the sense of the macro $R^2$. However, generalizing the simple cases considered earlier in this section, common factors are in general a vector of time series, and not all of its components are observable. This happens especially when one tries to model the micro level DGPs. In this case the common factors are not so important (they may even be statistically insignificant), an appropriate approximation can be derived without them, and so they can reasonably be omitted. Later, however, this leads to misspecification and bias in the estimates of the parameters of the aggregated, macro level process. So if one intends to derive macroeconomic relationships from such misspecified micro level models, this should be done very carefully.

Another caveat for the practical use of large-scale aggregation is that the aggregation of models that are non-linear in individual innovations can lead to linear macro level aggregates. Once again, this happens because individual innovations or their functions are dominated by the common innovations. Roughly speaking, at the macro level they entirely become part of a residual term.

The problem of large-scale aggregation presented in this section is somewhat different from the classical aggregation problems of Theil (1954) and many others, who were concerned with the relationships between micro and macro parameters, rather than with the effects on the specification of macro relationships given micro level relationships. However, we should admit that the two sets of results are complementary rather than competing.

1.5 Cross-sectional aggregation of autoregressive distributed lags models

A general case which also includes the models studied in this thesis is the Koyck (1954) autoregressive distributed lags (ARDL) model. Its relevance follows from the fact that many key micro level behavioural equations admit a Koyck ARDL representation. Consider for instance an economy in which each of the agents satisfies the simple ARDL model as in Lewbel (1994) or Pesaran (2002)

$$Y^{(j)}_t = a^{(j)} Y^{(j)}_{t-1} + b^{(j)} X^{(j)}_t + u^{(j)}_t, \quad j = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T. \tag{1.29}$$

Equation (1.29) can be interpreted as a disaggregate random-coefficient model, where the following assumptions are made:

1. the parameters $(a^{(j)}, b^{(j)})$ are independent and identically distributed copies of random variables $(a, b)$, independent of $X^{(i)}_t$, $u^{(i)}_t$ for all $t, i, j$ (a standard assumption in the aggregation and panel data literature when mixtures are considered);

2. a stability assumption: $|a| < 1$ with probability 1, and (1.29) is started at time $t = -\infty$;

3. a consistency assumption: $X^{(j)}_t$ are second order stationary time series, independent of $u^{(i)}_s$ for all $j$, $i$, $t$ and $s \le t$;

4. the micro disturbances $u^{(j)}_t$ are serially uncorrelated, zero-mean and second order stationary, and can be decomposed into common and individual (idiosyncratic) innovations

$$u^{(j)}_t = \rho^{(j)}\eta_t + \varepsilon^{(j)}_t, \quad \rho^{(j)} \neq 0,$$

where $\eta_t$ and $\varepsilon^{(j)}_t$ are zero-mean strong white noise sequences (which in general can be mutually dependent).

Note that if $b^{(j)} \equiv 0$ for all $j$ in (1.29), we have an AR(1) model. Assumptions 2–4 can be relaxed, allowing random walks for instance. Besides that, $Y^{(j)}_t$ can already be a differenced process. Under assumptions 1–4, applying the statistical approach to the aggregation problem, Lewbel (1994) showed that the aggregate process $Y_t$ satisfies

$$\mu_y(t) = \sum_{j=1}^{\infty} c_j \mu_y(t-j) + \beta\mu_x(t) + \mu_u(t), \tag{1.30}$$

where $\mu_y(t)$, $\mu_x(t)$ and $\mu_u(t)$ denote the cross-sectional means (defined as limits of the empirical means when $N \to \infty$) of the corresponding micro level variables.


Assuming that the above AR($\infty$) representation exists, one can calculate the moments of the random autoregressive parameter $a$ from the following recursion (see Brockwell and Davis (1991) for instance)

$$E(a^s) = \sum_{r=0}^{s-1} E(a^r) c_{s-r}. \tag{1.31}$$

This method, relating the moments of the random parameters to the coefficients of the AR($\infty$) representation, is the key part of Chong's and the hybrid disaggregation algorithms discussed in the subsequent chapters.
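The recursion (1.31) is straightforward to implement in both directions. The following sketch is only an illustration of the recursion itself, not of Chong's or the hybrid algorithm; it assumes $E(a^0) = 1$, treats AR coefficients beyond the supplied list as zero, and uses a made-up two-point distribution for $a$. The function names are hypothetical.

```python
def moments_from_ar_coefficients(c, n_moments):
    """Recover E(a^s), s = 0..n_moments, from AR(inf) coefficients via (1.31).

    `c` is a list with c[j - 1] = c_j; coefficients beyond len(c) are taken
    to be zero (a truncation assumption of this sketch). E(a^0) = 1.
    """
    moments = [1.0]
    for s in range(1, n_moments + 1):
        total = 0.0
        for r in range(s):
            j = s - r  # index of c_{s-r}
            if j <= len(c):
                total += moments[r] * c[j - 1]
        moments.append(total)
    return moments


def ar_coefficients_from_moments(moments):
    """Invert (1.31): c_s = E(a^s) - sum_{r=1}^{s-1} E(a^r) c_{s-r}."""
    c = []
    for s in range(1, len(moments)):
        correction = sum(moments[r] * c[s - r - 1] for r in range(1, s))
        c.append(moments[s] - correction)
    return c


# Hypothetical two-point distribution: a = 0.2 or 0.8 with equal probability.
true_moments = [(0.2**s + 0.8**s) / 2 for s in range(6)]
c = ar_coefficients_from_moments(true_moments)
recovered = moments_from_ar_coefficients(c, n_moments=5)
```

For a degenerate parameter the recursion reduces to the familiar case: a single coefficient $c_1 = a$ reproduces the moments $a^s$.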

By the forecasting approach it is proved in Pesaran (2002) that the optimal predictor is

$$E(Y_t|\Omega_t) = \sum_{j=0}^{\infty} E(ba^j) X_{t-j} + \sum_{j=1}^{\infty} E(a^j) E(U_{t-j}|\Omega_t), \tag{1.32}$$

where $Y_t$, $X_t$, $U_t$ are the aggregates (population means) of $Y^{(j)}_t$, $X^{(j)}_t$ and $u^{(j)}_t$, and the corresponding aggregate function is

$$Y_t = \sum_{j=0}^{\infty} E(ba^j) X_{t-j} + \sum_{j=1}^{\infty} E(a^j) E(U_{t-j}|\Omega_t) + \varepsilon_t, \tag{1.33}$$

with $\varepsilon_t = Y_t - E(Y_t|\Omega_t)$. If we assume, as in Lewbel (1994), that $a$ and $b$ are independent, so that $E(ba^j) = Eb \cdot Ea^j = \beta b_j$ with $\beta = Eb$ and $b_j = Ea^j$, then from (1.33) it follows that

$$Y_t = \sum_{j=0}^{\infty} b_j L^j[\beta X_t + E(U_t|\Omega_t)] + \varepsilon_t = B(L)[\beta X_t + E(U_t|\Omega_t)] + \varepsilon_t. \tag{1.34}$$

Thus the infinite-order autoregressive representation exists if and only if $B(L)$ is invertible, and this turns out to be related to the distributional properties of the random autoregressive parameter $a$. From time series theory it follows that $B(L)$ is invertible if and only if all the complex roots $\lambda$ of $B(L) = 0$ lie outside the unit circle, $|\lambda| > 1$ (see Brockwell and Davis (1991) or Pollock (1999)).

Remark 1.5.1. In real applications this theoretical issue can mostly be resolved, since we consider a finite number of (generally finite) moments and a truncated approximation AR($H$) of the AR($\infty$) representation. What is less realistic is the requirement that $a$ and $b$ be mutually independent. Relaxing this assumption would provide a more realistic framework, but then the AR($\infty$) recursions make no sense.


1.6 Disaggregation as an inverse problem of aggregation: DCO’s approach

In the framework of the doubly stochastic approach to the aggregation problem (see Dacunha-Castelle and Oppenheim (2001), Dacunha-Castelle and Fermin (2006b)) we have seen that both the approach and mixtures of densities are the key tools for understanding the properties of aggregation and the aggregates. Besides that, these tools are essential for solving the inverse problem of disaggregation. In the literature the disaggregation topic is not covered as extensively as the direct problem. Despite this, its objective, the recovery of information about the attributes of the micro level variables lost due to aggregation, will always be a relevant topic for theoretical and applied research.

At a fundamental level the disaggregation problem can be formulated as follows: which class of long memory processes can be obtained by the aggregation of short memory models with random coefficients?

To be precise, consider again the case of aggregation by the doubly stochastic approach (1.27). The inverse problem is formulated as follows. Let $X$ be an aggregate (limiting) process. A fundamental question is: does there exist a sequence of individual processes $\{Y^{(j)}\}$ driven by an ergodic process $A$ and a normalization sequence $\{B_N\}$ such that the partial sum sequence $X^{(N)}(A)$ converges in distribution to $X$ for almost all realizations of $A$? The existence of a disaggregation allows one to consider $X$, for instance in a statistical mechanics context, as an observable macro-process obtained as the result of suitable averaging of the not necessarily observable micro-processes $Y^{(j)}$; and in some cases, disaggregation allows one to understand how long memory could be generated.

It is clear that the previous framework is very general, thus only very particular situations can be investigated further. Let $X$ be a given centered stationary process with spectral density $F$. Let $G$ be a class of centered second-order stationary processes; we also denote by $G$ the class of the corresponding spectral densities. Then there exists a disaggregation of $X$ on $G$ if and only if the sequence $X^{(N)}(Y)$ converges in distribution to $X$ for almost all realizations of $Y$ (this is exactly the deterministic approach to the aggregation problem discussed in section 1.3.1), with $Y^{(j)}(A^{(j)}) \in G$ and $A$ an ergodic process with stationary distribution $\nu$. Recall that micro level innovations are in general defined as a linear combination of idiosyncratic and common innovations, thus these two components form the two cases to be considered. In the first case (for the idiosyncratic part) the existence of a disaggregation is equivalent to the existence of a mixture

$$F_i(\lambda) = \int_{\mathbb{R}^s} g(\lambda, y)\,\mu(dy), \tag{1.35}$$

where $\mu$ is the common distribution of the parameters $A^{(j)}$, and $g(\lambda, y) \in G$ $\mu$-a.s.; $M_i(G)$ denotes the set of all such spectral densities. For the common innovations, disaggregation means the existence of the following decomposition of the spectral density

$$F_c(\lambda) = \left| \int_{\mathbb{R}^s} h(\lambda, y)\,\mu(dy) \right|^2, \tag{1.36}$$

with $h(\lambda, y) \in G$ $\mu$-a.s.; $M_c(G)$ denotes the set of all such spectral densities. Disaggregation in the general case of common and idiosyncratic innovations is equivalent to the existence of a convex positive representation of the corresponding spectral density in the form $aF_i(\lambda) + bF_c(\lambda)$, thus the results are deduced from the previous two situations. To put it another way, the disaggregation problem asks: for given $F$ and $G$, when does $F \in M_i(G) \cup M_c(G)$ hold?

This fundamental question has been analyzed in detail in Dacunha-Castelle and Oppenheim (2001) and Dacunha-Castelle and Fermin (2006b), where the authors proved that a large set of long memory processes, including the classical long memory examples and all processes whose spectral densities have a countable number of singularities controlled by exponential functions, is obtained by the aggregation of short memory processes having infinitely differentiable ($C^\infty$) spectral densities, and that the result cannot be improved by taking, for instance, analytic functions instead of $C^\infty$. The proof is based upon the representation of the spectral densities by means of multiplicative kernels

$$F(\lambda) = \int_{\mathbb{R}^s} F(\lambda)\,K(y\phi(\lambda))\,\phi(\lambda)\,(dy), \tag{1.37}$$

chosen so as to get rid of the singularities of $F(\lambda)$, i.e. for all singularity points $\lambda_0$: $F(\lambda_0)K(y\phi(\lambda_0)) = 0$, and likewise for all the derivatives of this multiplicative kernel $K\phi$. An example of such a kernel in the AR(1) case is $K(y) = \frac{1}{1+y}$ with $\phi(\lambda) = 1 - \cos\lambda$.

Remark 1.6.1. According to Dacunha-Castelle and Oppenheim (1991), disaggregation in the AR(1) case is equivalent to the corresponding $K$-transform being a Mellin transform. Analyzing the products of elementary mixture densities, we will see that some other cases of spectral densities belonging to AR(1) can be considered.

Besides these fundamental results, relevant topics in disaggregation comprise the questions of how to construct a mixture density when, for a given aggregated time series, we know that it can be obtained by the aggregation of short memory processes (the above fundamental results are in general not constructive); how to estimate the mixture density when only aggregated data are at hand; and what can be said about the statistical properties of the estimators.

Another aspect of disaggregation follows from the aggregation problem: how to relate the micro and macro level parameters. In general, not all micro level parameters can be recovered from the estimated parameters of the aggregated relationships. Examples important in applications include the estimation of the long-run impact of $x_t^{(j)}$ on $y_t^{(j)}$ averaged across micro units, for example, in the ARDL (1.29) aggregation scheme

$$\theta_L = \frac{1}{N}\sum_{j=1}^{N} \frac{b^{(j)}}{1 - a^{(j)}} \to E\,\frac{b}{1-a} = \sum_{j=0}^{\infty} E(b a^j) = \sum_{j=0}^{\infty} a_j, \tag{1.38}$$

or the average of the individual "mean lags"

$$\tau_N = \frac{1}{N}\sum_{j=1}^{N} \frac{a^{(j)}}{1 - a^{(j)}} \to E\,\frac{a}{1-a} = \sum_{j=1}^{\infty} E(a^j) = \sum_{j=1}^{\infty} b_j, \tag{1.39}$$

where $\sum_{j=0}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ are the corresponding sums from the macro level model. The most important aspect of the ARDL aggregation scheme is that the macro level parameters are related to the moments of the random parameters of the individual processes, which makes it possible to solve the mixture density estimation problem by the method of moments.

1.7 Long memory in aggregated time series

The last but not least objective of the thesis is to analyze the long memory phenomenon, which has been shown to occur as a result of linear large-scale aggregation of doubly stochastic processes (the seminal paper by Granger (1980) is a good source to begin with).

In time series analysis, a feature of integrated series is a spectral density function that is unbounded at the origin, although this does not necessarily imply a unit root. An alternative to unit roots is to consider the class of fractionally integrated (long memory, LM) models, in which the ordinary differencing operator is allowed to take fractional (non-integer) orders. An unbounded spectral density can also arise in fractional models even when the series is in fact stationary, which occurs when the fractional differencing parameter takes a value in the range (0, 1/2) (Chambers (1998)).

Now let $\{y_t, t \in \mathbb{N}\}$ be a sequence of real-valued second-order stationary random variables with the autocovariance function at lag $h$ defined as

$$\sigma(h) = \mathrm{cov}(y_t, y_{t+h}) = E(y_t - \mu)(y_{t+h} - \mu), \quad \text{with } \mu = E y_t. \tag{1.40}$$

In the literature there are several alternative definitions of the long memory property. Firstly, the concept of long memory is often understood in terms of a slowly decaying autocovariance function. According to Palma (2007):

Definition 1.7.1. $\{y_t\}$ has the long memory property if its autocovariances are not absolutely summable, i.e.

$$\sum_{h=-\infty}^{\infty} |\sigma(h)| = \infty. \tag{1.41}$$

Another popular definition of long-range dependence is directly related to the asymptotic properties of $\sigma(h)$.

Definition 1.7.2. $\{y_t\}$ is a long memory time series if its autocovariance function decays hyperbolically as $h \to \infty$, i.e.

$$\sigma(h) \sim h^{2d-1} l_1(h), \quad 0 < d < 1/2, \tag{1.42}$$

with $d$ the long memory parameter and $l_1(\cdot)$ a slowly varying function (see the notations section for the definition of the latter).

The next widely used definition of strong dependence, stated in the spectral domain, is based on the asymptotics of the corresponding spectral density function $f(\lambda)$.

Definition 1.7.3. $\{y_t\}$ is a long memory time series if its spectral density function satisfies

$$f(\lambda) \sim |\lambda|^{-2d}\, l_2(1/|\lambda|), \quad 0 < d < 1/2, \tag{1.43}$$

for $\lambda$ in a neighbourhood of zero, with $l_2(\cdot)$ a slowly varying function.

Finally, the fourth definition of long memory behaviour is based on the purely non-deterministic part of the Wold decomposition of $y_t$, $y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$.

Definition 1.7.4. $\{y_t\}$ is a long memory time series if the coefficients $\psi_j$ in the purely non-deterministic part of the Wold decomposition of $y_t$ satisfy

$$\psi_j \sim j^{d-1}\, l_3(j), \quad 0 < d < 1/2. \tag{1.44}$$

Examining the definitions involving the parameter $d$, we notice that for $d \neq 0$ the above definitions imply a singularity of $f(\lambda)$ at the origin, the singularity being a pole when $0 < d < 1/2$ or a zero when $-1/2 < d < 0$. Since many time series, particularly in macroeconomics, appear to have an unbounded spectral density at the origin, most attention has been given to the case $d > 0$. Note that for $d > 1/2$ the time series $y_t$ is nonstationary, while for $d < -1/2$ it is noninvertible. The parameter $d$ can in fact often be interpreted as a differencing parameter, and the notion of fractional differencing is the usual way in which long memory time series are motivated.

Unfortunately, unless further restrictions are imposed, these four definitions are not necessarily equivalent (see Palma (2007) for some useful inclusions, assuming that $0 < d < 1/2$). Further extensions arise from the specific assumptions put on the slowly varying functions. A general approach to modeling long memory time series with singularities outside the neighbourhood of zero is discussed in Leipus and Viano (2000).

Example 1.7.1. A well-known class of long memory models is the class of autoregressive fractionally integrated moving-average (FARIMA) processes, which can be expressed by

$$A(L) y_t = B(L)(1 - L)^{-d}\varepsilon_t, \quad |d| < 1/2, \tag{1.45}$$

where $A(L) = 1 + \sum_{i=1}^{p} a_i L^i$ and $B(L) = \sum_{i=1}^{p} b_i L^i$ are the autoregressive and moving-average polynomials without common roots, $L$ is the lag (back-shift) operator, and $\varepsilon_t$ is a strong white noise sequence. The parameter $d$ denotes the fractional order of differencing and describes the long run (low frequency) behaviour of $y_t$, while $A(L)$ and $B(L)$ capture the short run (high frequency) properties. The spectral density function of $y_t$ is given by

$$f(\lambda) = |1 - e^{i\lambda}|^{-2d}\phi(\lambda), \quad |d| < 1/2, \tag{1.46}$$

where $\phi(\lambda)$ is slowly varying near zero. As mentioned before, the parameter $d$ controls whether the process is stationary, invertible, or possesses long memory.
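As an illustration of Definition 1.7.4, the following minimal Python sketch computes the Wold coefficients of the pure fractional filter $(1 - L)^{-d}$, that is, the special case $A(L) = B(L) = 1$ of (1.45), which is an assumption made here only for simplicity. The function name `frac_int_weights` is hypothetical. The coefficients are $\psi_j = \Gamma(j+d)/(\Gamma(j+1)\Gamma(d))$, so the ratio $\psi_j / (j^{d-1}/\Gamma(d))$ should approach one, which corresponds to a constant slowly varying function $l_3 = 1/\Gamma(d)$.

```python
import math

def frac_int_weights(d, n):
    """Return psi_0, ..., psi_{n-1}, the MA(infinity) coefficients of (1 - L)^(-d)."""
    psi = [1.0]
    for j in range(1, n):
        # psi_j = psi_{j-1} * (j - 1 + d) / j, equivalent to Gamma(j+d) / (Gamma(j+1) Gamma(d))
        psi.append(psi[-1] * (j - 1 + d) / j)
    return psi

d = 0.3
j = 20000
psi = frac_int_weights(d, j + 1)
# Ratio of psi_j to its hyperbolic approximation j^(d-1) / Gamma(d)
ratio = psi[j] / (j ** (d - 1) / math.gamma(d))
print(ratio)
```

The recursion avoids evaluating large Gamma functions directly; for $0 < d < 1/2$ the weights decay too slowly to be absolutely summable.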


Example 1.7.2. Another important example is the extension of the FARIMA model called, following Giraitis and Leipus (1995), fractional ARUMA (also investigated simultaneously in Viano et al. (1995)). The authors considered the fractional ARUMA model with spectral density

$$f(\lambda) = \left| \prod_{j=1}^{m} (1 - \alpha_j e^{i\lambda})^{-d_j} \right|^2. \tag{1.47}$$

These spectral densities are obtained by linear filtering of a second-order white noise through a fractional ARMA filter, that is, a filter with a transfer function of the form

$$\prod_{j=1}^{m} (1 - \alpha_j z)^{-d_j}. \tag{1.48}$$

Among other facts proved in these articles, it is shown that the corresponding covariance sequence $\sigma(h)$ satisfies

$$\sigma(h) \sim |h|^{2d-1}\beta(h), \tag{1.49}$$

with $\beta(h)$ a sum of sine waves and $|d| < 1/2$. According to definition (1.41), such a fractional ARUMA process has either long memory ($\sum |\sigma(h)| = \infty$) or intermediate memory ($\sum |\sigma(h)| < \infty$). An example of how to calculate the mixture density corresponding to a particular case of the seasonal fractional ARUMA model is discussed in the applications part of the next chapter.

Chapter 2

Aggregation and disaggregation of AR(1) processes

The ability to simplify means to eliminate the unnecessary so that the necessary may speak.

Hans Hofmann

Abstract

The chapter provides preliminaries for further disaggregation research in the AR(1) aggregation and disaggregation scheme and presents the necessary background theory for the problem discussed in the paper of Leipus et al. (2006). Alternative methods of disaggregation are presented. Finally, theoretical results for the constructive approach are analyzed. Examples of its application, including the seasonal fractionally integrated process, are also discussed in this chapter.

2.1 AR(1) aggregation scheme

In this chapter we consider the special case of a short memory random parameter autoregressive distributed lags (ARDL) aggregation scheme. Recall from the previous chapter that when the parameter $\beta_i$ in (1.29) is equal to zero for all $i \in \mathbb{N}$, the ARDL model is equivalent to the aggregation of AR(1) dynamics as in Granger (1980). The key result shown by Granger is that the contemporaneous aggregation of random parameter AR(1) processes can produce long memory. Besides that, the practical relevance of the AR(1) case follows from its rather close approximation of micro level dynamics, and sometimes from the same arguments used to motivate the use of ARDL models with Koyck (1954) lags.

First of all we provide a more rigorous specification of the questions related to the aggregation of AR(1) processes, and give more insight into its inverse (disaggregation) problem. Namely, here we study three alternative disaggregation methods: a parametric approach (Chong (2006)), a semi-parametric approach (Leipus et al. (2006)), and an augmentation of the latter (LOPV) method, referred to as the hybrid approach. Besides that, we investigate the constructive approach to obtaining an explicit form of the mixture density, which is important in some applications and simulations (e.g. Monte Carlo experiments); at the same time it helps to understand the nature of the long memory phenomenon observed in aggregated data. Its intuition follows from the observation that, typically, a spectral density can be factorized into a product of much simpler "elementary" functions, for which it is relatively easy to obtain the associated mixture densities. Thus the latter disaggregation problem transforms into the following question: when can the mixture density associated with a product spectral density be obtained?

Let us give a precise formulation of the aggregation scheme when the underlying short memory models are described by AR(1) dynamics¹.

Let $\varepsilon = \{\varepsilon_t, t \in \mathbb{Z}\}$ denote a sequence of independent identically distributed (i.i.d.) random variables (r.v.) with $E\varepsilon_t = 0$, $E\varepsilon_t^2 = \sigma_\varepsilon^2$, and let $a$ be a random variable supported on $[-1, 1]$ and satisfying

$$E\left[\frac{1}{1 - a^2}\right] < \infty. \tag{2.1}$$

Consider a sequence of i.i.d. processes $Y^{(j)} = \{Y_t^{(j)}, t \in \mathbb{Z}\}$, $j \ge 1$, defined by the random AR(1) dynamics

$$Y_t^{(j)} = a^{(j)} Y_{t-1}^{(j)} + \varepsilon_t^{(j)}, \tag{2.2}$$

where $\varepsilon^{(j)} = \{\varepsilon_t^{(j)}, t \in \mathbb{Z}\}$, $j \ge 1$, are independent copies of $\varepsilon = \{\varepsilon_t, t \in \mathbb{Z}\}$ and the $a^{(j)}$ are independent copies of $a$. Here the sequences $a, a^{(j)}, j \ge 1$, and $\varepsilon, \varepsilon^{(j)}, j \ge 1$, are independent. Under these assumptions, (2.2) admits a covariance-stationary solution $Y^{(j)}$, and the finite-dimensional distributions of the process $X_t^{(N)} = N^{-1/2}\sum_{j=1}^{N} Y_t^{(j)}$, $t \in \mathbb{Z}$, converge weakly, as $N \to \infty$, to those of a zero mean Gaussian process $X = \{X_t, t \in \mathbb{Z}\}$, called the aggregated process (see Oppenheim and Viano (2004)).

Assume that the distribution of the r.v. $a$ admits a mixture density $\varphi$, which by (2.1) satisfies

$$\int_{-1}^{1} \frac{\varphi(x)}{1 - x^2}\,dx < \infty. \tag{2.3}$$

¹ As we have already mentioned, these assumptions in general coincide with assumptions 1–4 for the ARDL models.

The covariance function and spectral density of the aggregated process $X$ coincide with those of $Y^{(j)}$ and are given, respectively, by

$$\gamma(h) := \mathrm{Cov}(X_h, X_0) = \sigma_\varepsilon^2 \int_{-1}^{1} \frac{x^{|h|}}{1 - x^2}\,\varphi(x)\,dx \tag{2.4}$$

and

$$f(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\int_{-1}^{1}\frac{\varphi(x)}{|1 - xe^{i\lambda}|^2}\,dx. \tag{2.5}$$

Equality (2.4) shows that the covariance function $\gamma(h)$ can be interpreted as the $h$-th moment of the (normalized) density function $\varphi(x)(1 - x^2)^{-1}$ supported on $[-1, 1]$, and thus finding the mixture density is related to the moment problem (see Feller (1971)).
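The moment interpretation of (2.4) can be checked numerically. The sketch below is only an illustration under stated assumptions: the mixture density $\varphi(x) = 6x(1-x)$ on $[0, 1]$ is chosen here for convenience (it is not taken from the text), $\sigma_\varepsilon^2 = 1$, and the helper name `gamma_cov` is hypothetical. Since $x^0 - x^2 = 1 - x^2$, formula (2.4) gives $\gamma(0) - \gamma(2) = \sigma_\varepsilon^2 \int \varphi(x)\,dx = \sigma_\varepsilon^2$, which the code reproduces.

```python
def gamma_cov(h, phi, sigma2=1.0, n=20000):
    """Midpoint-rule approximation of the integral in (2.4) over (-1, 1)."""
    step = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * step  # midpoints never hit x = -1 or x = 1
        total += x ** abs(h) / (1.0 - x * x) * phi(x)
    return sigma2 * total * step

def phi(x):
    # Illustrative density 6x(1-x) on [0, 1]; it vanishes at x = 1, so (2.3) holds.
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

g0 = gamma_cov(0, phi)
g2 = gamma_cov(2, phi)
print(g0, g2, g0 - g2)
```

For this particular density the integrals are elementary: $\gamma(0) = 6(1 - \log 2)$ and $\gamma(2) = 5 - 6\log 2$.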

Note that the aggregated process $X$ possesses the long memory property in the sense of absolutely non-summable autocovariances (i.e. $\sum_{h=-\infty}^{\infty}|\gamma(h)| = \infty$) if and only if

$$\int_{-1}^{1}\frac{\varphi(x)}{(1 - x^2)^2}\,dx = \infty. \tag{2.6}$$

If the mixture density $\varphi$ is given a priori and our aim is to characterize the properties of the induced aggregated process (behaviour of the spectral density, covariance function, etc.), we call this problem an aggregation problem. Vice versa, if we observe the aggregated process $X_t$ with spectral density $f$ and need to find the individual processes (if they exist) of the form (2.2) with some mixture density $\varphi$ which produce the aggregated process, then we call this problem a disaggregation problem. The second problem, which is much harder than the first one, is equivalent to finding $\varphi$ such that (2.5) (or (2.4)) and (2.3) hold. In the latter case we say that the mixture density $\varphi$ is associated with the spectral density $f$.


2.2 Methods of disaggregation in AR(1) aggregation scheme

2.2.1 Mixture density estimation based on Gegenbauer expansion

In order to estimate the mixture density $\varphi$ using aggregated observations $X_1, \dots, X_n$, Leipus et al. (2006) assumed the following semiparametric form:

$$\varphi(x) = (1 - x)^{1-2d_1}(1 + x)^{1-2d_2}\psi(x), \quad 0 < d_1, d_2 < 1/2, \tag{2.7}$$

where $\psi(x)$ is continuous on $[-1, 1]$ and does not vanish at $\pm 1$. The proposed estimate is based on a decomposition of the function $\zeta(x) = \varphi(x)(1 - x^2)^{-\alpha}$ in the orthonormal $L^2(w^{(\alpha)})$-basis of Gegenbauer polynomials $G_k^{(\alpha)}(x)$, $k = 0, 1, \dots$, where $w^{(\alpha)}(x) = (1 - x^2)^\alpha$, $\alpha > -1$. This decomposition is valid (i.e. $\zeta$ belongs to $L^2(w^{(\alpha)})$) if

$$\int_{-1}^{1}\frac{\varphi^2(x)}{(1 - x^2)^\alpha}\,dx < \infty, \quad \alpha > -1. \tag{2.8}$$

Let $G_k^{(\alpha)}(x) = \sum_{j=0}^{k} g_{k,j}^{(\alpha)} x^j$. The resulting estimate has the form

$$\varphi_n(x) = \sigma_{n,\varepsilon}^{-2}(1 - x^2)^\alpha \sum_{k=0}^{K_n}\zeta_{n,k}\,G_k^{(\alpha)}(x), \tag{2.9}$$

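where the $\zeta_{n,k}$ are estimates of the coefficients $\zeta_k$ in the $\alpha$-Gegenbauer expansion of the function $\zeta(x) = \sum_{k=0}^{\infty}\zeta_k G_k^{(\alpha)}(x)$ and are given by

$$\zeta_{n,k} = \sum_{j=0}^{k} g_{k,j}^{(\alpha)}\int_{-1}^{1}\varphi(x)x^j\,dx = \sum_{j=0}^{k} g_{k,j}^{(\alpha)}\bigl(\sigma_n(j) - \sigma_n(j+2)\bigr), \tag{2.10}$$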

$\sigma_{n,\varepsilon}^2 = \sigma_n(0) - \sigma_n(2)$ is a consistent estimator of the variance $\sigma_\varepsilon^2$, and $\sigma_n(j) = n^{-1}\sum_{i=1}^{n-j}X_iX_{i+j}$ is the sample covariance of the aggregated process. According to Theorem 3.1.1, the truncation level $K_n$ satisfies

$$K_n = [\gamma\log n], \quad 0 < \gamma < (2\log(1 + \sqrt{2}))^{-1}. \tag{2.11}$$
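The ingredients of the estimator that depend only on the data, namely the sample covariances $\sigma_n(j)$, the noise variance estimate $\sigma_{n,\varepsilon}^2$ and the truncation level $K_n$, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and the constant `GAMMA_MAX` are hypothetical, and the series is assumed to have zero mean, as in the definition of $\sigma_n(j)$.

```python
import math

def sample_cov(x, j):
    """sigma_n(j) = n^{-1} * sum_{i=1}^{n-j} X_i X_{i+j} (zero mean assumed)."""
    n = len(x)
    return sum(x[i] * x[i + j] for i in range(n - j)) / n

def noise_variance(x):
    """sigma^2_{n,eps} = sigma_n(0) - sigma_n(2)."""
    return sample_cov(x, 0) - sample_cov(x, 2)

# Upper bound for gamma in (2.11)
GAMMA_MAX = 1.0 / (2.0 * math.log(1.0 + math.sqrt(2.0)))

def truncation_level(n, gamma):
    """K_n = [gamma * log n] for 0 < gamma < GAMMA_MAX."""
    if not 0.0 < gamma < GAMMA_MAX:
        raise ValueError("gamma must lie in (0, GAMMA_MAX)")
    return int(math.floor(gamma * math.log(n)))

print(GAMMA_MAX, truncation_level(1000, 0.5))
```

Because $K_n$ grows only logarithmically in $n$, even long aggregated series admit just a handful of Gegenbauer terms.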

Under these conditions and the corresponding relations between $\alpha$ and $d_1, d_2$, Leipus et al. (2006) studied the consistency of the estimator $\varphi_n(x)$ assuming that the variance of the noise, $\sigma_\varepsilon^2$, is known and equals 1. In the more realistic situation of unknown $\sigma_\varepsilon^2$, it must be consistently estimated. To understand intuitively the construction of the estimator $\sigma_{n,\varepsilon}^2$, it suffices to note that, by (2.4), $\sigma_\varepsilon^2 = \sigma(0) - \sigma(2)$. Also note that the estimator $\varphi_n(x)$ in (2.9) possesses the property $\int_{-1}^{1}\varphi_n(x)\,dx = 1$, which can be easily verified by noting that $\int_{-1}^{1}(1 - x^2)^\alpha G_k^{(\alpha)}(x)\,dx = (g_{0,0}^{(\alpha)})^{-1}$ if $k = 0$, and $= 0$ otherwise, implying

$$\int_{-1}^{1}(1 - x^2)^\alpha\sum_{k=0}^{K_n}\zeta_{n,k}\,G_k^{(\alpha)}(x)\,dx = \zeta_{n,0}/g_{0,0}^{(\alpha)} = \sigma_n(0) - \sigma_n(2)$$

by (2.10).

Remark 2.2.1. The LOPV disaggregation method is in general not usable in the presence of common innovations (see the model (2.12) below). This is because the sequence of partial aggregates $X_t^{(N)}$ does not converge to a non-degenerate distribution. Thus one needs to deal with $Y_t^{(N)}$ described in the subsection below. However, in this case LOPV's idea does not work any more, since $Y_t$ and the individual processes have different autocovariances. To verify this claim it is sufficient to check whether the variances are equal. Indeed, if $D\eta_t = \sigma_\eta^2$, $D\varepsilon_t^{(j)} = \sigma_\varepsilon^2$, then

$$D(Y_t^{(j)}) = (\sigma_\eta^2 + \sigma_\varepsilon^2)(1 + Ea^2 + Ea^4 + \dots),$$

but

$$D(Y_t) = \sigma_\eta^2\cdot(1 + (Ea)^2 + (Ea^2)^2 + \dots),$$

just as expected (see also Lippi and Zaffaroni (1998)).

Remark 2.2.2. In practical applications we have only an upper limit for the number of Gegenbauer polynomials, $K_n = [(2\log(1 + \sqrt{2}))^{-1}\log n]$, so the choice of the true number of polynomials remains somewhat subjective. Besides that, the estimation of the parameter $\alpha$ is also a crucial problem in real applications. Preliminary Monte Carlo simulations presented in the thesis and in the preceding article by Leipus et al. (2006) suggest taking $\alpha$ to be a function of $d_1, d_2$. We postulate that if $0 < d_1, d_2 < 1/2$, then $\alpha = \min(1 - 2d_1, 1 - 2d_2)$, else $\alpha = 0$. The latter corresponds to the case of a polynomial density. The memory parameters $d_1, d_2$ in the empirical part are obtained from the corresponding periodogram by the method of Reisen (1994), similar to that of Geweke and Porter-Hudak (1983).


2.2.2 Polynomial density estimation in AR(1) aggregation scheme

Chong (2006), differently from Leipus et al. (2006), considered the case of AR(1) aggregation with common innovations $\eta_t$,

$$Y_t^{(j)} = a^{(j)}Y_{t-1}^{(j)} + \eta_t + \varepsilon_t^{(j)}, \quad t \in \mathbb{Z},\ j \in \mathbb{N}, \tag{2.12}$$

under the same assumptions as in Gouriéroux and Monfort (1997) or in Linden (1999), namely: the error components $\eta_t$, $\{\varepsilon_t^{(j)}\}_{j\ge 1}$ are zero-mean second-order stationary strong white noises, serially and pairwise independent, and independent of $\{a^{(j)}\}_{j\ge 1}$. Besides, $Y_{-1}^{(j)} = 0$, $\forall j$, which is important for deriving the theoretical expressions of the moments in the polynomial case, though not crucial for the construction of an estimator.

Next the author supposes that the $a^{(j)}$ are drawn from a continuous distribution on the interval $[0, 1)$ whose density $\varphi(x)$ can be represented by a polynomial of order $m$, so that the density function is written as

$$\varphi(x) = \sum_{s=0}^{m} c_s x^s\,\mathbf{1}_{\{x\in[0,1)\}}, \quad \varphi(x)\ge 0, \quad \int_0^1\varphi(x)\,dx = \sum_{s=0}^{m}\frac{c_s}{s+1} = 1. \tag{2.13}$$

Finally, Chong requires that both $m < \infty$ and $|c_s| < \infty$, $\forall s = 0, 1, \dots, m$. Since $\varphi(x)$ is a density function, it should agree with the definition of a density, namely

$$\varphi(x)\ge 0,\ \forall x\in[0,1); \quad \int_0^1 \varphi(x)\,dx = \sum_{s=0}^{m}\frac{c_s}{s+1} = 1. \tag{2.14}$$

The estimation procedure is based on the moment problem, since in the case of Chong's parametric approach the $r$-th moment is given by

$$E(a^r) = \sum_{s=0}^{m}\frac{c_s}{s + r + 1}, \quad r = 0, 1, \dots, m. \tag{2.15}$$

Aggregating the individual equations (2.12) and bearing in mind that $Y_{-1}^{(j)} = 0$, we get that the empirical mean (a partial sum) of $N$ individual processes, $Y_t^{(N)} = N^{-1}\sum_{j=1}^{N}Y_t^{(j)}$, can be expressed by

$$Y_t^{(N)} = \sum_{r=0}^{t}\frac{1}{N}\sum_{j=1}^{N}(a^{(j)})^r\eta_{t-r} + \sum_{r=0}^{t}\frac{1}{N}\sum_{j=1}^{N}(a^{(j)})^r\varepsilon_{t-r}^{(j)}.$$

Thus we have the following convergence in probability:

$$Y_t^{(N)} \xrightarrow{P} \sum_{r=0}^{t}E(a^r)\eta_{t-r} =: Y_t, \quad N\to\infty.$$

It follows that in the AR(1) aggregation scheme with common innovations the common part dominates the idiosyncratic one.

Remark 2.2.3. In the presence of common innovations the assumption $Y_{-1}^{(j)} = 0$ is not necessary (Lippi and Zaffaroni (1998)). Omitting this assumption means that

$$Y_t = \sum_{r=0}^{\infty}E(a^r)\eta_{t-r},$$

thus the autocovariance and the variance of the aggregated process $Y_t$ are, correspondingly, equal to

$$\sigma(h) = \sigma_u^2\sum_{i=0}^{\infty}Ea^{i+|h|}\,Ea^i$$

and

$$\mathrm{var}\,Y_t = \sigma_u^2\sum_{i=0}^{\infty}(Ea^i)^2.$$

Now assume in addition that the aggregate process $Y_t$ is invertible, i.e. that the AR($\infty$) representation of $Y_t$,

$$Y_t = \sum_{j=1}^{\infty}A_jY_{t-j} + \eta_t, \quad \sum_{j=1}^{\infty}A_jL^j = 1 - \Bigl(\sum_{r=0}^{t}E(a^r)L^r\Bigr)^{-1}, \tag{2.16}$$

exists, where $L$ denotes the lag (back-shift) operator ($LY_t = Y_{t-1}$). Then the coefficients $A_j$ are linked with the desired polynomial density moments through a recurrence relation (see, for example, Brockwell and Davis (1991)). That is, inverting the equation $Y_t = \theta(L)\eta_t$, where $\theta(L) := 1 + \sum_{j=1}^{t}\theta_jL^j$, one needs to solve $\pi(L) := \sum_{j=0}^{\infty}\pi_jL^j = 1/\theta(L)$ with respect to the unknown parameters (either $\pi_j$ or $\theta_j$). In the former case the solution is $\pi_0 = 1$; $\pi_j = -\sum_{k=1}^{j}\theta_k\pi_{j-k}$, $0 < j \le t$; $\pi_j = -\sum_{k=1}^{t}\theta_k\pi_{j-k}$, $j > t$. In the latter case, $\theta_0 = 1$; $\theta_j = -\sum_{k=1}^{j}\pi_k\theta_{j-k}$, $0 < j\le t$; $\theta_j = 0$, $j > t$. Thus we can get either the coefficients of the AR($\infty$) representation or the moments. This is the crucial part of Chong's method, which remains correct regardless of whether the polynomial density assumption holds. So the desired moments are obtained from the following recursion:

$$E(a^s) = \sum_{r=0}^{s-1}E(a^r)A_{s-r}. \tag{2.17}$$
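The two directions of the recurrence can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names: `ar_from_moments` inverts $\theta(L)$ using $\pi(L) = 1/\theta(L)$ and $A_j = -\pi_j$ for $j \ge 1$ (which follows from (2.16)), while `moments_from_ar` implements the recursion (2.17). As an assumed test case, the moments $E(a^r) = 1/(r+1)$ of the uniform density on $[0, 1)$ are used, and the round trip returns them.

```python
def ar_from_moments(mu, H):
    """A_1..A_H from moments mu[0..t] with mu[0] = 1, using pi(L) = 1/theta(L), A_j = -pi_j."""
    t = len(mu) - 1
    pi = [1.0]
    for j in range(1, H + 1):
        # pi_j = -sum_{k=1}^{min(j, t)} theta_k pi_{j-k}, with theta_k = E(a^k)
        pi.append(-sum(mu[k] * pi[j - k] for k in range(1, min(j, t) + 1)))
    return [-p for p in pi[1:]]

def moments_from_ar(A, m):
    """Recursion (2.17): E(a^s) = sum_{r=0}^{s-1} E(a^r) A_{s-r}, with E(a^0) = 1.

    A[0] holds A_1, so A_{s-r} is A[s - r - 1]; len(A) must be at least m.
    """
    mu = [1.0]
    for s in range(1, m + 1):
        mu.append(sum(mu[r] * A[s - r - 1] for r in range(s)))
    return mu

uniform_moments = [1.0 / (r + 1) for r in range(6)]  # E(a^r) for a ~ U[0, 1)
A = ar_from_moments(uniform_moments, 5)
recovered = moments_from_ar(A, 5)
print(A)
print(recovered)
```

In an application the coefficients $A_j$ would be replaced by estimates from a fitted autoregression, so the recovered moments inherit the estimation error of those coefficients.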


Since it is not possible to estimate an autoregression of infinite order, we have to truncate at a finite order $H$, defined by $H = \lambda n^\beta$, $1/2 < \beta < 1$, $\lambda > 0$, with $n$ being the length of the aggregated process. Then, having the estimated AR($H$) coefficients $\hat A_j$, $j = 1, \dots, H$, obtained, for example, by the Yule-Walker method, and using the empirical equivalent of equation (2.17),

$$\hat\mu_s = \sum_{r=0}^{s-1}\hat\mu_r\hat A_{s-r},$$

the estimates $\hat\mu_s$ of the moments $\mu_s := E(a^s)$ of the random parameter are derived.

As long as the polynomial density of order $m$ is parametric, it is fully described by its parameters $C = (c_0, c_1, \dots, c_m)'$. Then, having the estimates of the moments $\hat\Lambda = (1, \hat\mu_1, \dots, \hat\mu_m)'$ and introducing the auxiliary matrix

$$\Omega = \begin{cases}\left(\dfrac{1}{s+r+1}\right)_{s,r=0}^{m}, & \operatorname{supp} a = [0,1];\\[2ex] \left(\dfrac{1 - (-1)^{s+r+1}}{s+r+1}\right)_{s,r=0}^{m}, & \operatorname{supp} a = [-1,1],\end{cases}$$

from the equation $\Lambda = \Omega C$ it immediately follows that Chong's estimator for the disaggregation problem is $\hat C = \Omega^{-1}\hat\Lambda$.
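The last step, solving $\Lambda = \Omega C$ for the polynomial coefficients, can be sketched as follows. This is an illustration under stated assumptions rather than Chong's code: the function names are hypothetical, the moments are supplied exactly instead of being estimated, and exact rational arithmetic from the standard `fractions` module is used because $\Omega$ is a Hilbert-type matrix and therefore badly conditioned in floating point.

```python
from fractions import Fraction

def omega(m, symmetric=False):
    """Auxiliary matrix: 1/(s+r+1) for support [0,1], (1-(-1)^(s+r+1))/(s+r+1) for [-1,1]."""
    def entry(s, r):
        k = s + r + 1
        return Fraction(1 - (-1) ** k, k) if symmetric else Fraction(1, k)
    return [[entry(s, r) for s in range(m + 1)] for r in range(m + 1)]

def solve(M, b):
    """Gauss-Jordan elimination in exact rational arithmetic (M assumed nonsingular)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]

def chong_coefficients(moments, symmetric=False):
    """C = Omega^{-1} Lambda for Lambda = (1, mu_1, ..., mu_m)'."""
    m = len(moments) - 1
    return solve(omega(m, symmetric), [Fraction(x) for x in moments])

# Moments of the density 2x on [0, 1): E(a^r) = 2 / (r + 2)
print(chong_coefficients([Fraction(1), Fraction(2, 3)]))
```

With estimated moments the solution need not satisfy $\varphi(x) \ge 0$, so the nonnegativity requirement in (2.14) has to be checked separately.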

Remark 2.2.4. Since the order $m$ of the polynomial is unlikely to be known a priori, Chong (2006) suggests applying a specific information criterion $Q(m)$, which measures the distance between the empirical and theoretical autocorrelation functions. So $m$ is chosen to be equal to $\arg\min Q(m)$.

Remark 2.2.5. Chong (2006) also proves that the long memory property $\sum_{h=-\infty}^{\infty}|\sigma_Y(h)| = \infty$ in the case of a polynomial density occurs if and only if $\varphi(\pm 1) > 0$. Thus polynomial densities clearly perform much worse when the true density function is not a polynomial (e.g. belongs to the Beta type of distributions). The LOPV estimator is more flexible, since it covers both cases of distributions.

2.2.3 Extension of the Chong's and LOPV estimation methods

The conclusion of a preliminary simulation study (Celov, Kvedaras and Leipus (2007)) was that neither of the suggested methods, i.e. neither Chong's nor the LOPV method, is able to outperform the other. Chong's method is restricted to the class of polynomial densities, and the LOPV method is not effective (not correctly specified) in the presence of common innovations. On the other hand, being semi-parametric, the latter approach is more robust to the assumptions about the true form of the density.

Consider the expression (2.10) for the coefficients of the Gegenbauer expansion of the auxiliary function $\zeta(x)$. In the presence of common innovations these coefficients are not directly related to the covariances of the aggregated process. Nevertheless, $\zeta$ belongs to $L^2(w^{(\alpha)})$ whenever (2.8) holds, which is entirely a property of the mixture density $\varphi(x)$ and not of the composition of the innovations. Thus it follows that

$$\zeta_{n,k} = \sum_{j=0}^{k} g_{k,j}^{(\alpha)}\int_{-1}^{1}\varphi(x)x^j\,dx = \sum_{j=0}^{k} g_{k,j}^{(\alpha)}\,Ea^j$$

depends solely on the moments $E(a^j)$. The latter, however, are not necessarily obtained from the empirical autocovariance function of the aggregated process as in LOPV's disaggregation scheme.

For the estimation of the moments we can apply the algorithm suggested, for instance, in Lewbel (1994), Linden (1999) or Chong (2006). Thus the new method is a combination of the two preceding approaches: the moments are obtained from the truncated AR($\infty$) representation, AR($H$), of the aggregated process and the recurrence relation which links the moments $Ea^j$ and the estimated AR($H$) coefficients, as in Chong (2006); and from the LOPV estimator we take the idea of applying the Gegenbauer expansion to estimate the auxiliary function. For brevity, we call this combination of the LOPV and Chong methods the hybrid approach.
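The second half of the hybrid approach, turning moments into a density through an orthonormal polynomial expansion, can be sketched for the simplest case $\alpha = 0$, where the weight is constant and the orthonormal basis on $[-1, 1]$ consists of the normalized Legendre polynomials $G_k = \sqrt{(2k+1)/2}\,P_k$. This restriction to $\alpha = 0$, the function names, and the use of exact moments in place of estimated ones are assumptions made only for illustration.

```python
import math

def legendre_coeffs(K):
    """Monomial coefficients of the Legendre polynomials P_0..P_K (lowest degree first)."""
    P = [[1.0], [0.0, 1.0]]
    for k in range(1, K):
        # Three-term recurrence: (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        nxt = [0.0] * (k + 2)
        for j, c in enumerate(P[k]):
            nxt[j + 1] += (2 * k + 1) * c / (k + 1)
        for j, c in enumerate(P[k - 1]):
            nxt[j] -= k * c / (k + 1)
        P.append(nxt)
    return P[:K + 1]

def hybrid_density(moments, K):
    """Return x -> sum_k zeta_k G_k(x), where zeta_k = sum_j g_{k,j} E(a^j) and alpha = 0.

    moments[j] must hold E(a^j) for j = 0..K.
    """
    G = [[math.sqrt((2 * k + 1) / 2.0) * c for c in p]
         for k, p in enumerate(legendre_coeffs(K))]
    zeta = [sum(g * moments[j] for j, g in enumerate(Gk)) for Gk in G]
    def phi(x):
        return sum(z * sum(g * x ** j for j, g in enumerate(Gk))
                   for z, Gk in zip(zeta, G))
    return phi

# Exact moments of the density (1 + x)/2 on [-1, 1]
phi_hat = hybrid_density([1.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 5.0], K=3)
print(phi_hat(0.5))
```

Since the test density is itself a polynomial of degree one, its expansion terminates and the reconstruction is exact up to rounding; for a density of the form (2.7) a nonzero $\alpha$ and the corresponding Gegenbauer basis would be needed.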

An extended simulation analysis comparing the properties of all three approaches is presented in the empirical part of the thesis. The statistical properties of the suggested estimator are a subject for further research.

2.3 Mixture density for the product of aggregated spectral densities

In this section we show how the mixture density associated with a product of spectral densities can be obtained from the (simpler) mixture densities associated with the multipliers.

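Let $X_{1,t}$ and $X_{2,t}$ be two aggregated processes obtained from independent copies of the AR(1) sequences $Y_{1,t} = a_1Y_{1,t-1} + \varepsilon_{1,t}$ and $Y_{2,t} = a_2Y_{2,t-1} + \varepsilon_{2,t}$, respectively, where $a_1, a_2$ satisfy (2.1), and $\varepsilon_i = \{\varepsilon_{i,t}, t\in\mathbb{Z}\}$ and $a_i$ are independent, $i = 1, 2$. Denote $\sigma_{i,\varepsilon}^2 := E\varepsilon_{i,t}^2$, $i = 1, 2$.

Assume that $\varphi_1$ and $\varphi_2$ are the mixture densities associated with the spectral densities $f_1$ and $f_2$, respectively, i.e.

$$f_i(\lambda) = \frac{\sigma_{i,\varepsilon}^2}{2\pi}\int_{-1}^{1}\frac{\varphi_i(x)}{|1 - xe^{i\lambda}|^2}\,dx, \quad i = 1, 2. \tag{2.18}$$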

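The following theorem shows that, if $\varphi_1$ and $\varphi_2$ in (2.18) are supported on $[0, 1]$ and $[-1, 0]$, respectively, then the stationary Gaussian process with spectral density $f(\lambda) = f_1(\lambda)f_2(\lambda)$ can also be obtained by aggregation of i.i.d. AR(1) processes with some mixture density $\varphi$ and noise sequence $\varepsilon$. It should be noted that, under the assumption of disjoint supports and condition (2.3), the product $f_1(\lambda)f_2(\lambda)$ is integrable, i.e. is a correctly specified spectral density. This can be seen from the equality

$$\int_{-\pi}^{\pi}f_1(\lambda)f_2(\lambda)\,d\lambda = \frac{\sigma_{1,\varepsilon}^2\sigma_{2,\varepsilon}^2}{(2\pi)^2}\int_{-1}^{1}\int_{-1}^{1}w(x, y)\,\varphi_1(x)\varphi_2(y)\,dx\,dy,$$

with

$$\begin{aligned}
w(x, y) &= \int_{-\pi}^{\pi}\frac{d\lambda}{|1 - xe^{i\lambda}|^2\,|1 - ye^{i\lambda}|^2}\\
&= \int_{-\pi}^{\pi}\frac{d\lambda}{(1 + xy)^2 - 2(1 + xy)(x + y)\cos\lambda + x^2 + y^2 + 2xy\cos(2\lambda)}\\
&\le \int_{-\pi}^{\pi}\frac{d\lambda}{(1 + xy)^2 - 2(1 + xy)(x + y)\cos\lambda + (x + y)^2}\\
&= \frac{2\pi}{(1 - x^2)(1 - y^2)}
\end{aligned}$$

due to the inequality $xy\cos(2\lambda) \ge xy$ (as $xy \le 0$).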
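The role of the opposite-sign supports in this bound can be seen numerically. The sketch below, with hypothetical helper names `w` and `bound`, evaluates the integral defining $w(x, y)$ by the rectangle rule (which is very accurate for smooth periodic integrands) and compares it with $2\pi/((1 - x^2)(1 - y^2))$. The sample points $x = 0.5$, $y = \mp 0.5$ are chosen here only for illustration.

```python
import cmath
import math

def w(x, y, n=4096):
    """Rectangle-rule value of the integral of 1 / (|1 - x e^{il}|^2 |1 - y e^{il}|^2) over [-pi, pi)."""
    step = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        z = cmath.exp(1j * (-math.pi + i * step))
        total += 1.0 / (abs(1 - x * z) ** 2 * abs(1 - y * z) ** 2)
    return total * step

def bound(x, y):
    """Upper bound 2*pi / ((1 - x^2)(1 - y^2)), valid when x*y <= 0."""
    return 2.0 * math.pi / ((1 - x * x) * (1 - y * y))

print(w(0.5, -0.5), bound(0.5, -0.5))  # opposite signs: the bound holds
print(w(0.5, 0.5), bound(0.5, 0.5))    # same sign: the bound fails
```

For parameters of the same sign the inequality $xy\cos(2\lambda) \ge xy$ is reversed, which is why the argument relies on $\varphi_1$ and $\varphi_2$ living on $[0, 1]$ and $[-1, 0]$.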

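Theorem 2.3.1. Let $\varphi_1$ and $\varphi_2$ be the mixture densities associated with the spectral densities $f_1$ and $f_2$, respectively. Assume that $\operatorname{supp}(\varphi_1) \subset [0, 1]$, $\operatorname{supp}(\varphi_2) \subset [-1, 0]$ and

$$f(\lambda) = f_1(\lambda)f_2(\lambda). \tag{2.19}$$

Then the mixture density $\varphi(x)$, $x \in [-1, 1]$, associated with $f$ is given by the equality

$$\varphi(x) = \frac{1}{C_*}\left(\varphi_1(x)\int_{-1}^{0}\frac{\varphi_2(y)}{(1 - xy)(1 - y/x)}\,dy + \varphi_2(x)\int_{0}^{1}\frac{\varphi_1(y)}{(1 - xy)(1 - y/x)}\,dy\right), \tag{2.20}$$

where $C_* := \int_0^1\left(\int_{-1}^{0}\varphi_1(x)\varphi_2(y)(1 - xy)^{-1}\,dy\right)dx$. The variance of the noise is

$$\sigma_\varepsilon^2 = \frac{\sigma_{1,\varepsilon}^2\,\sigma_{2,\varepsilon}^2\,C_*}{2\pi}. \tag{2.21}$$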

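Proof. Obviously, the covariance function $\gamma(h) = \mathrm{Cov}(X_h, X_0)$ has the form

$$\gamma(h) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\gamma_1(j + |h|)\gamma_2(j), \tag{2.22}$$

where $\gamma_1$ and $\gamma_2$ are the covariance functions of the aggregated processes obtained from the mixture densities $\varphi_1$ and $\varphi_2$, respectively:

$$\gamma_1(j) = \sigma_{1,\varepsilon}^2\int_0^1\varphi_1(x)\frac{x^{|j|}}{1 - x^2}\,dx, \quad \gamma_2(j) = \sigma_{2,\varepsilon}^2\int_{-1}^{0}\varphi_2(x)\frac{x^{|j|}}{1 - x^2}\,dx.$$

Clearly, $\gamma_1(j) > 0$ and $\gamma_2(j) = (-1)^j|\gamma_2(j)|$. Let $h \ge 0$. Then

$$\begin{aligned}
2\pi\sigma_{1,\varepsilon}^{-2}\sigma_{2,\varepsilon}^{-2}\gamma(h) &= \sigma_{1,\varepsilon}^{-2}\sigma_{2,\varepsilon}^{-2}\sum_{j=-\infty}^{\infty}\gamma_1(j + h)\gamma_2(j)\\
&= \sigma_{1,\varepsilon}^{-2}\sigma_{2,\varepsilon}^{-2}\Bigl(\sum_{j=0}^{\infty}\gamma_1(j + h)\gamma_2(j) + \sum_{j=0}^{\infty}\gamma_1(j)\gamma_2(j + h) + \sum_{j=1}^{h-1}\gamma_1(-j + h)\gamma_2(j)\Bigr)\\
&=: s_1 + s_2 + s_3.
\end{aligned} \tag{2.23}$$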

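We have $s_1 = \lim_{N\to\infty}s_1^{(N)}$, where

$$\begin{aligned}
s_1^{(N)} &= \sum_{j=0}^{N}\int_0^1\varphi_1(x)\frac{x^{j+h}}{1 - x^2}\,dx\int_{-1}^{0}\varphi_2(y)\frac{y^j}{1 - y^2}\,dy\\
&= \int_0^1\int_0^1\varphi_1(x)\varphi_2(-y)\frac{1}{1 - x^2}\,\frac{1}{1 - y^2}\sum_{j=0}^{N}(-1)^jx^{j+h}y^j\,dx\,dy\\
&= \int_0^1\int_0^1\varphi_1(x)\varphi_2(-y)\frac{x^h}{1 - x^2}\,\frac{1}{1 - y^2}\,\frac{1 - (-xy)^{N+1}}{1 + xy}\,dx\,dy.
\end{aligned}$$

Note that

$$\left|\varphi_1(x)\varphi_2(-y)\frac{x^h}{1 - x^2}\,\frac{1}{1 - y^2}\,\frac{1 - (-xy)^{N+1}}{1 + xy}\right| \le 2\varphi_1(x)\varphi_2(-y)\frac{1}{1 - x^2}\,\frac{1}{1 - y^2}$$

and

$$\int_0^1\int_0^1\varphi_1(x)\varphi_2(-y)\frac{1}{1 - x^2}\,\frac{1}{1 - y^2}\,dx\,dy = \gamma_1(0)\gamma_2(0) < \infty.$$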


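Therefore, by the dominated convergence theorem, we obtain

$$\begin{aligned}
s_1 &= \int_0^1\int_0^1\varphi_1(x)\varphi_2(-y)\frac{x^h}{1 - x^2}\,\frac{1}{1 - y^2}\lim_{N\to\infty}\frac{1 - (-xy)^{N+1}}{1 + xy}\,dx\,dy\\
&= \int_0^1\int_0^1\varphi_1(x)\varphi_2(-y)\frac{x^h}{1 - x^2}\,\frac{1}{1 - y^2}\,\frac{1}{1 + xy}\,dx\,dy\\
&= \int_0^1\frac{x^h}{1 - x^2}\,\varphi_1(x)\left(\int_{-1}^{0}\varphi_2(y)\frac{1}{1 - y^2}\,\frac{1}{1 - xy}\,dy\right)dx.
\end{aligned} \tag{2.24}$$

Analogously, we have

$$s_2 = \int_{-1}^{0}\frac{y^h}{1 - y^2}\,\varphi_2(y)\left(\int_0^1\varphi_1(x)\frac{1}{1 - x^2}\,\frac{1}{1 - xy}\,dx\right)dy. \tag{2.25}$$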

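For the last term in the decomposition (2.23) it follows that

$$\begin{aligned}
s_3 &= \sum_{j=1}^{h-1}\int_0^1\varphi_1(x)\frac{x^{-j+h}}{1 - x^2}\,dx\int_{-1}^{0}\varphi_2(x)\frac{x^j}{1 - x^2}\,dx\\
&= \int_0^1\int_0^1\varphi_1(x)\frac{x^h}{1 - x^2}\,\varphi_2(-y)\frac{1}{1 - y^2}\,(-y/x)\frac{1 - (-y/x)^{h-1}}{1 + y/x}\,dx\,dy\\
&= \int_0^1\frac{x^h}{1 - x^2}\,\varphi_1(x)\left(\int_{-1}^{0}\varphi_2(y)\frac{1}{1 - y^2}\,\frac{y/x}{1 - y/x}\,dy\right)dx\\
&\quad - \int_{-1}^{0}\frac{y^h}{1 - y^2}\,\varphi_2(y)\left(\int_0^1\varphi_1(x)\frac{1}{1 - x^2}\,\frac{1}{1 - y/x}\,dx\right)dy.
\end{aligned} \tag{2.26}$$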

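Equalities (2.24)–(2.26), together with (2.23), imply

$$\begin{aligned}
2\pi\sigma_{1,\varepsilon}^{-2}\sigma_{2,\varepsilon}^{-2}\gamma(h) &= \int_0^1\frac{x^h}{1 - x^2}\,\varphi_1(x)\left(\int_{-1}^{0}\varphi_2(y)\frac{1}{1 - y^2}\,\frac{y/x}{1 - y/x}\,dy + \int_{-1}^{0}\varphi_2(y)\frac{1}{1 - y^2}\,\frac{1}{1 - xy}\,dy\right)dx\\
&\quad + \int_{-1}^{0}\frac{y^h}{1 - y^2}\,\varphi_2(y)\left(\int_0^1\varphi_1(x)\frac{1}{1 - x^2}\,\frac{1}{1 - xy}\,dx - \int_0^1\varphi_1(x)\frac{1}{1 - x^2}\,\frac{1}{1 - y/x}\,dx\right)dy\\
&= \int_0^1\frac{x^h}{1 - x^2}\,\varphi_1(x)\left(\int_{-1}^{0}\varphi_2(y)\frac{1}{(1 - y/x)(1 - xy)}\,dy\right)dx\\
&\quad + \int_{-1}^{0}\frac{y^h}{1 - y^2}\,\varphi_2(y)\left(\int_0^1\varphi_1(x)\frac{1}{(1 - x/y)(1 - xy)}\,dx\right)dy.
\end{aligned}$$

This and (2.4) imply (2.20), taking into account that $\int_{-1}^{1}\varphi(x)\,dx = 1$. □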

2.4 Behaviour of spectral density of the aggregated process

In this section we study the behaviour of the spectral densities corresponding to the general class of semi-parametric mixture densities of the form (see Viano and Oppenheim (2004), Leipus et al. (2006))

$$\varphi(x) = (1 - x)^{1-2d_1}(1 + x)^{1-2d_2}\psi(x), \quad 0 < d_1 < 1/2,\ 0 < d_2 < 1/2, \tag{2.27}$$

where $\psi(x)$ is continuous and does not vanish at the points $x = \pm 1$. As is seen from the mixture densities appearing in Section 2.1, this form is natural; in particular, (2.27) covers the seasonal mixture density in (2.38). It is worth noting that the corresponding spectral density behaves as a long memory spectral density in the sense of definition (1.43).
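As a short worked check, the form (2.27) can be tested against the integrability condition (2.3) and the long memory criterion (2.6). The computation below is a sketch near the endpoint $x = 1$ only, assuming in addition that $\psi$ is integrable away from the endpoints; the endpoint $x = -1$ is handled in the same way with $d_1$ and $d_2$ interchanged. Using $1 - x^2 = (1 - x)(1 + x) \approx 2(1 - x)$ as $x \to 1$:

$$\begin{aligned}
\frac{\varphi(x)}{1 - x^2} &\sim 2^{-2d_2}\,\psi(1)\,(1 - x)^{-2d_1}, &&\text{integrable near } x = 1 \text{ since } 2d_1 < 1,\\
\frac{\varphi(x)}{(1 - x^2)^2} &\sim 2^{-1-2d_2}\,\psi(1)\,(1 - x)^{-1-2d_1}, &&\text{not integrable near } x = 1 \text{ since } d_1 > 0.
\end{aligned}$$

Hence, under these assumptions, a density of the form (2.27) with $\psi(1) \neq 0$ satisfies (2.3) and at the same time (2.6), so the aggregated process is stationary with non-summable autocovariances.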

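Proposition 2.4.1. Let the density $\varphi(x)$ be given by (2.27), where $\psi(x)$ is a nonnegative function on $[-1, 1]$, continuous at the points $x = \pm 1$, with $\psi(\pm 1) \neq 0$. Then the following relations for the corresponding spectral density hold:

$$f(\lambda) \sim \frac{\sigma_\varepsilon^2\,\psi(1)}{2^{2d_2+1}\sin(\pi d_1)}\,|\lambda|^{-2d_1}, \quad |\lambda| \to 0, \tag{2.28}$$

$$f(\lambda) \sim \frac{\sigma_\varepsilon^2\,\psi(-1)}{2^{2d_1+1}\sin(\pi d_2)}\,|\pi \mp \lambda|^{-2d_2}, \quad \lambda \to \pm\pi. \tag{2.29}$$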

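Proof. Let $0 < \lambda < \pi$. Then (2.27), (2.5) and the change of variables $u = (x - \cos\lambda)/\sin\lambda$ lead to

$$\begin{aligned}
f(\lambda) &= \frac{\sigma_\varepsilon^2}{2\pi}\int_{-1}^{1}\frac{(1 - x)^{1-2d_1}(1 + x)^{1-2d_2}\psi(x)}{|1 - xe^{i\lambda}|^2}\,dx\\
&= \Bigl(2\sin\frac{\lambda}{2}\Bigr)^{-2d_1}\Bigl(2\cos\frac{\lambda}{2}\Bigr)^{-2d_2}g^*(\lambda),
\end{aligned}$$

where

$$g^*(\lambda) = \frac{\sigma_\varepsilon^2}{\pi}\int_{-\cot\frac{\lambda}{2}}^{\tan\frac{\lambda}{2}}\frac{\bigl(\sin\frac{\lambda}{2} - u\cos\frac{\lambda}{2}\bigr)^{1-2d_1}\bigl(\cos\frac{\lambda}{2} + u\sin\frac{\lambda}{2}\bigr)^{1-2d_2}}{1 + u^2}\,\psi(u\sin\lambda + \cos\lambda)\,du.$$

By the assumption of continuity at the point 1, the function $\psi(u\sin\lambda + \cos\lambda)$ is bounded in some neighbourhood of zero, i.e. $\psi(u\sin\lambda + \cos\lambda) \le C_1(\lambda_0)$ for $0 < \lambda < \lambda_0$ and $\lambda_0 > 0$ sufficiently small. Hence,

$$\frac{\bigl(\sin\frac{\lambda}{2} - u\cos\frac{\lambda}{2}\bigr)^{1-2d_1}\bigl(\cos\frac{\lambda}{2} + u\sin\frac{\lambda}{2}\bigr)^{1-2d_2}}{1 + u^2}\,\psi(u\sin\lambda + \cos\lambda) \le C_2(\lambda_0)\frac{(1 + |u|)^{1-2d_1}}{1 + u^2}$$

for $0 < \lambda < \lambda_0$ and, by the dominated convergence theorem, as $\lambda \to 0$,

$$\begin{aligned}
g^*(\lambda) &\to \frac{\sigma_\varepsilon^2\,\psi(1)}{\pi}\int_0^{\infty}\frac{u^{1-2d_1}}{1 + u^2}\,du\\
&= \frac{\sigma_\varepsilon^2\,\psi(1)}{2\pi}\,\Gamma(d_1)\Gamma(1 - d_1) = \frac{\sigma_\varepsilon^2\,\psi(1)}{2\sin(\pi d_1)},
\end{aligned}$$

implying (2.28). The same argument leads to relation (2.29). □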
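For completeness, the value of the limiting integral used in the proof can be obtained by a standard substitution; the following lines are a sketch of that step, with $d$ standing for $d_1$. Setting $t = u^2$, so that $du = \tfrac{1}{2}t^{-1/2}\,dt$ and $u^{1-2d} = t^{1/2-d}$, gives

$$\int_0^{\infty}\frac{u^{1-2d}}{1 + u^2}\,du = \frac{1}{2}\int_0^{\infty}\frac{t^{-d}}{1 + t}\,dt = \frac{1}{2}\,B(1 - d, d) = \frac{1}{2}\,\Gamma(d)\Gamma(1 - d) = \frac{\pi}{2\sin(\pi d)}, \quad 0 < d < 1,$$

where $B(\cdot,\cdot)$ denotes the Beta function and the last equality is the reflection formula for the Gamma function.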

Remark 2.4.1. Note that, differently from Viano and Oppenheim (2004), we do not require the boundedness of the function $\psi(x)$ on the interval $[-1, 1]$. In fact, $\psi(x)$ can have singularity points within $(-1, 1)$, see (2.32), (2.35) below.

2.5 Seasonal long memory case

In this section, we apply the obtained result to the spectral densities $f_1$ and $f_2$ having the forms:

$$f_1(\lambda; d_1) = \frac{1}{2\pi}|1 - e^{i\lambda}|^{-2d_1}, \tag{2.30}$$

$$f_2(\lambda; d_2) = \frac{1}{2\pi}|1 + e^{i\lambda}|^{-2d_2}, \quad 0 < d_1, d_2 < 1/2. \tag{2.31}$$

We call these spectral densities (and the corresponding processes) fractionally integrated, FI(d1), and seasonal fractionally integrated, SFI(d2), spectral densities. The mixture density associated with the FI(d1) spectral density (2.30) is given by the following expression (see Dacunha-Castelle and Oppenheim (2001)):

$$\varphi_1(x; d_1) = C(d_1)\,x^{d_1-1}(1-x)^{1-2d_1}(1+x)\,\mathbf{1}_{[0,1]}(x) \tag{2.32}$$

with

$$C(d) = \frac{\Gamma(3-d)}{2\Gamma(d)\Gamma(2-2d)} = 2^{2d-2}\,\frac{\sin(\pi d)}{\sqrt{\pi}}\,\frac{\Gamma(3-d)}{\Gamma((3/2)-d)} \tag{2.33}$$

and the variance of the noise

$$\sigma_{1,\varepsilon}^2 = \frac{\sin(\pi d_1)}{C(d_1)\pi}. \tag{2.34}$$


For convenience, we present the proof of this result in Proposition 2.7.1 of the Appendix.
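As a quick sanity check on the constants in (2.32)–(2.34), the sketch below evaluates both expressions for C(d) in (2.33) and verifies that ϕ1 has total mass one, using the Beta-function identity ∫₀¹ x^(d−1)(1−x)^(1−2d)(1+x) dx = B(d, 2−2d) + B(d+1, 2−2d). The function names are illustrative and are not taken from the thesis.

```python
import math

def c_gamma_form(d):
    """First expression for C(d) in (2.33)."""
    return math.gamma(3 - d) / (2 * math.gamma(d) * math.gamma(2 - 2 * d))

def c_sine_form(d):
    """Second expression for C(d) in (2.33)."""
    return (2 ** (2 * d - 2) * math.sin(math.pi * d) / math.sqrt(math.pi)
            * math.gamma(3 - d) / math.gamma(1.5 - d))

def beta(a, b):
    """Euler Beta function B(a, b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def total_mass(d):
    """Integral of the mixture density (2.32) over [0, 1]."""
    return c_gamma_form(d) * (beta(d, 2 - 2 * d) + beta(d + 1, 2 - 2 * d))

def noise_variance(d):
    """Noise variance (2.34)."""
    return math.sin(math.pi * d) / (c_gamma_form(d) * math.pi)

if __name__ == "__main__":
    for d in (0.1, 0.3, 0.45):
        print(d, c_gamma_form(d), c_sine_form(d), total_mass(d), noise_variance(d))
```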

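Similarly, the mixture density associated with the spectral density (2.31) is given by

$$\varphi_2(x; d_2) = \varphi_1(-x; d_2) = C(d_2)\,|x|^{d_2-1}(1+x)^{1-2d_2}(1-x)\,\mathbf{1}_{[-1,0]}(x) \tag{2.35}$$

and

$$\sigma_{2,\varepsilon}^2 = \frac{\sin(\pi d_2)}{C(d_2)\pi}, \tag{2.36}$$

since

$$f_2(\lambda; d_2) = f_1(\pi-\lambda; d_2) = \frac{\sigma_{2,\varepsilon}^2}{2\pi}\int_0^1 \frac{\varphi_1(x; d_2)}{|1+xe^{i\lambda}|^2}\,dx = \frac{\sigma_{2,\varepsilon}^2}{2\pi}\int_{-1}^0 \frac{\varphi_1(-x; d_2)}{|1-xe^{i\lambda}|^2}\,dx.$$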

Clearly, the mixture densities ϕ1(x; d1) and ϕ2(x; d2), given in (2.32) and (2.35), satisfy (2.3) and, hence, meet the assumptions of Theorem 2.3.1 whenever 0 < d1 < 1/2 and 0 < d2 < 1/2. Moreover, since d1 > 0 and d2 > 0, both ϕ1 and ϕ2 satisfy (2.6). The mixture density associated with the spectral density

$$f(\lambda; d_1, d_2) = f_1(\lambda; d_1)\,f_2(\lambda; d_2) = \frac{1}{(2\pi)^2}\,|1-e^{i\lambda}|^{-2d_1}\,|1+e^{i\lambda}|^{-2d_2}, \qquad 0 < d_1, d_2 < 1/2, \tag{2.37}$$

can be derived from representation (2.20).

Denote by F(a, b, c; x) the hypergeometric function

$$F(a, b, c; x) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-tx)^{-a}\,dt,$$

where c > b > 0 if x < 1 and, in addition, c − a − b > 0 if x = 1.

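Proposition 2.5.1. The mixture density associated with f(·; d1, d2) in (2.37) is given by

$$\varphi(x; d_1, d_2) = C(d_1, d_2)\,x^{d_1-1}(1-x)^{1-2d_1}\,G(-x; d_2)\,\mathbf{1}_{[0,1]}(x) + C(d_2, d_1)\,|x|^{d_2-1}(1+x)^{1-2d_2}\,G(x; d_1)\,\mathbf{1}_{[-1,0]}(x), \tag{2.38}$$

where

$$G(x; d) := F\Big(1, d, 2-d; \frac{1}{x}\Big) - x\,F(1, d, 2-d; x)$$

and

$$C(d_1, d_2) = (C^*)^{-1}\,\frac{\Gamma(d_2)\Gamma(2-2d_2)}{\Gamma(2-d_2)}, \qquad C^* = \int_0^1 x^{d_1-1}(1-x)^{1-2d_1}(1+x)\left(\int_0^1 \frac{y^{d_2-1}(1-y)^{1-2d_2}(1+y)}{1+xy}\,dy\right)dx.$$

The variance of the noise is

$$\sigma_\varepsilon^2 = (2\pi)^{-1}\,C^*\,C(d_1)C(d_2)\,\sigma_{1,\varepsilon}^2\sigma_{2,\varepsilon}^2 = \frac{\sin(\pi d_1)\sin(\pi d_2)\,C^*}{2\pi^3}. \tag{2.39}$$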

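Proof. Representation (2.20) implies that

$$\varphi(x; d_1, d_2) = \frac{1}{C_*}\Big(C(d_2)\,\varphi_1(x; d_1)\,F(-x; d_2) + C(d_1)\,\varphi_2(x; d_2)\,F(x; d_1)\Big), \tag{2.40}$$

where

$$F(x; d) := \int_0^1 \frac{y^{d-1}(1-y)^{1-2d}(1+y)}{(1-xy)(1-y/x)}\,dy, \qquad C_* = \int_0^1 \varphi_1(x; d_1)\left(\int_{-1}^0 \frac{\varphi_2(y; d_2)}{1-xy}\,dy\right)dx = C(d_1)C(d_2)\,C^*.$$

Using the equality

$$\frac{1+y}{(1-xy)(1-y/x)} = \frac{1}{(1-x)(1-y/x)} - \frac{x}{(1-x)(1-xy)},$$

we have

$$\begin{aligned}
F(x; d) &= \frac{1}{1-x}\int_0^1 \frac{y^{d-1}(1-y)^{1-2d}}{1-y/x}\,dy - \frac{x}{1-x}\int_0^1 \frac{y^{d-1}(1-y)^{1-2d}}{1-xy}\,dy \\
&= \frac{\Gamma(d)\Gamma(2-2d)}{\Gamma(2-d)}\,\frac{1}{1-x}\,\big(F(1, d, 2-d; 1/x) - x\,F(1, d, 2-d; x)\big) \\
&= \frac{\Gamma(d)\Gamma(2-2d)}{\Gamma(2-d)}\,\frac{G(x; d)}{1-x}.
\end{aligned} \tag{2.41}$$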

Now, (2.38) follows from (2.40) and (2.41), whereas (2.39) follows from (2.21), (2.34), (2.36).

To finish the proof, note that all the hypergeometric functions appearing in the form of the mixture density are correctly defined. □

In the next proposition we present the asymptotics of ϕ(x; d1, d2) in the neighborhoods of 0 and ±1.


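Proposition 2.5.2. The mixture density ϕ in (2.38) satisfies

$$\varphi(x; d_1, d_2) \sim \begin{cases} \dfrac{\pi}{C^*\sin(\pi d_2)}\,x^{d_1+d_2-1}, & x \to 0+, \\[2mm] \dfrac{\pi}{C^*\sin(\pi d_1)}\,|x|^{d_1+d_2-1}, & x \to 0-, \end{cases} \tag{2.42}$$

$$\varphi(x; d_1, d_2) \sim \begin{cases} \dfrac{2^{1-2d_2}\pi}{C^*\sin(\pi d_2)}\,(1-x)^{1-2d_1}, & x \to 1-, \\[2mm] \dfrac{2^{1-2d_1}\pi}{C^*\sin(\pi d_1)}\,(1+x)^{1-2d_2}, & x \to -1+. \end{cases} \tag{2.43}$$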

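Proof. Applying the identities (see Abramowitz and Stegun (1972))

$$F(a, b; c; 1/x) = \Big(\frac{x}{x-1}\Big)^{b} F\Big(b, c-a; c; \frac{1}{1-x}\Big), \qquad F(a, b; c; 1) = \frac{\Gamma(c)\Gamma(c-a-b)}{\Gamma(c-a)\Gamma(c-b)},$$

we have that for x → 0+

$$\begin{aligned}
G(-x; d_2) &= F\Big(1, d_2, 2-d_2; -\frac{1}{x}\Big) + x\,F(1, d_2, 2-d_2; -x) \\
&= \Big(\frac{x}{1+x}\Big)^{d_2} F\big(d_2, 1-d_2, 2-d_2; 1/(1+x)\big) + x\,F(1, d_2, 2-d_2; -x) \\
&\sim x^{d_2}\,F(d_2, 1-d_2, 2-d_2; 1) = \frac{\Gamma(2-d_2)\Gamma(1-d_2)}{\Gamma(2-2d_2)}\,x^{d_2} \\
&= \frac{\sqrt{\pi}\,\Gamma(2-d_2)}{2^{1-2d_2}\,\Gamma((3/2)-d_2)}\,x^{d_2},
\end{aligned}$$

and similarly for x → 0−

$$G(x; d_1) \sim \frac{\sqrt{\pi}\,\Gamma(2-d_1)}{2^{1-2d_1}\,\Gamma((3/2)-d_1)}\,|x|^{d_1}.$$

This and equality (2.38) imply

$$\begin{aligned}
\varphi(x; d_1, d_2) &\sim C(d_1, d_2)\,\frac{\sqrt{\pi}\,\Gamma(2-d_2)}{2^{1-2d_2}\,\Gamma((3/2)-d_2)}\,x^{d_1+d_2-1} \\
&= \frac{C(d_1)C(d_2)}{C_*}\,\Gamma(d_2)\Gamma(1-d_2)\,x^{d_1+d_2-1} \\
&= (C^*)^{-1}\,\frac{\pi}{\sin(\pi d_2)}\,x^{d_1+d_2-1}, \qquad x \to 0+,
\end{aligned}$$

$$\varphi(x; d_1, d_2) \sim (C^*)^{-1}\,\frac{\pi}{\sin(\pi d_1)}\,|x|^{d_1+d_2-1}, \qquad x \to 0-.$$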


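For x → 1− we obtain

$$\begin{aligned}
\varphi(x; d_1, d_2) &\sim 2\,C(d_1, d_2)\,F(1, d_2, 2-d_2; -1)\,(1-x)^{1-2d_1} \\
&= C(d_1, d_2)\,\frac{\sqrt{\pi}\,\Gamma(2-d_2)}{\Gamma((3/2)-d_2)}\,(1-x)^{1-2d_1} \\
&= (C^*)^{-1}\,\frac{2^{1-2d_2}\pi}{\sin(\pi d_2)}\,(1-x)^{1-2d_1}
\end{aligned}$$

and similarly for x → −1+

$$\varphi(x; d_1, d_2) \sim 2\,C(d_2, d_1)\,F(1, d_1, 2-d_1; -1)\,(1+x)^{1-2d_2} = (C^*)^{-1}\,\frac{2^{1-2d_1}\pi}{\sin(\pi d_1)}\,(1+x)^{1-2d_2}.$$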

Remark 2.5.1. Clearly,

$$\varphi_1(x; d_1) \sim \begin{cases} C(d_1)\,x^{d_1-1}, & x \to 0+, \\ 2C(d_1)\,(1-x)^{1-2d_1}, & x \to 1-, \end{cases} \qquad \varphi_2(x; d_2) \sim \begin{cases} C(d_2)\,|x|^{d_2-1}, & x \to 0-, \\ 2C(d_2)\,(1+x)^{1-2d_2}, & x \to -1+. \end{cases}$$

Hence, by Proposition 2.5.2, the mixture density ϕ associated with the product spectral density (2.37) behaves as ϕ1 when x approaches 1, and as ϕ2 when x approaches −1. However, at zero, ϕ behaves as $|x|^{d_1+d_2-1}$, i.e. both densities ϕ1 and ϕ2 are involved.

2.6 FARIMA-type spectral density case

Theorem 2.3.1 allows us to construct the mixture density also in the case when the spectral density f of the aggregated process has the form

$$f(\lambda) = \frac{1}{2\pi}\Big(2\sin\frac{|\lambda|}{2}\Big)^{-2d} g(\lambda), \qquad 0 < d < 1/2,$$

where g(λ) is an analytic spectral density on [−π, π]. In general, the existence of a mixture density associated with an arbitrary analytic spectral density is not clear. For example, AR(1) is an aggregated process only if the mixture density is the Dirac delta function, which is difficult to apply in practice. A similar conclusion holds for ARMA processes, i.e. rational spectral densities. Another class of spectral densities, obtained by aggregating "non-degenerate" mixture densities, is characterized in the following proposition.

Proposition 2.6.1. A mixture density ϕg is associated with some analytic spectral density if and only if there exists 0 < a∗ < 1 such that supp(ϕg) ⊂ [−a∗, a∗].

Proof. Write ϕ = ϕg. For sufficiency, assume that there exists 0 < a∗ < 1 such that supp(ϕ) ⊂ [−a∗, a∗]. The covariance function of the corresponding process satisfies

$$\begin{aligned}
|\gamma(h)| &\le \sigma_\varepsilon^2 \int_{-1}^{1} \frac{|x|^{|h|}}{1-x^2}\,\varphi(x)\,dx = \sigma_\varepsilon^2 \int_{-a_*}^{a_*} \frac{|x|^{|h|}}{1-x^2}\,\varphi(x)\,dx \\
&\le \sigma_\varepsilon^2\,a_*^{|h|} \int_{-a_*}^{a_*} \frac{\varphi(x)}{1-x^2}\,dx = C\,a_*^{|h|},
\end{aligned}$$

i.e. the covariance function decays exponentially to zero. This implies that the spectral density $f(\lambda) = (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\gamma(h)e^{ih\lambda}$ is an analytic function on [−π, π] (see, e.g., Bary (1964), pp. 80–82).

To prove the necessity, assume that f is an analytic function on [−π, π] or, equivalently, that the corresponding covariance function decays exponentially to zero, i.e. there exist θ ∈ (0, 1) and a constant C > 0 such that $|\gamma(h)| \le C\theta^{|h|}$. Assume to the contrary that supp(ϕ) = [−1, 1].

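Let h ≥ 0 be an even integer. Then

$$|\gamma(h)| = \sigma_\varepsilon^2 \int_{-1}^{1} \frac{|x|^{h}}{1-x^2}\,\varphi(x)\,dx \le C\theta^{h}$$

implies that

$$\int_{-1}^{1} \Big(\frac{|x|}{\theta}\Big)^{h} \frac{\varphi(x)}{1-x^2}\,dx \le C. \tag{2.44}$$

Rewrite the last integral as

$$\int_{-1/\theta}^{1/\theta} |x|^h\,\frac{\varphi(\theta x)}{1-(\theta x)^2}\,dx = \int_{-1}^{1} |x|^h\,\frac{\varphi(\theta x)}{1-(\theta x)^2}\,dx + \int_{-1/\theta}^{-1} |x|^h\,\frac{\varphi(\theta x)}{1-(\theta x)^2}\,dx + \int_{1}^{1/\theta} |x|^h\,\frac{\varphi(\theta x)}{1-(\theta x)^2}\,dx =: I_1(h) + I_2(h) + I_3(h).$$

For every x ∈ [−1, 1] we have

$$|x|^h\,\frac{\varphi(\theta x)}{1-(\theta x)^2} \le \frac{\varphi(\theta x)}{1-(\theta x)^2}.$$

Hence, by the dominated convergence theorem and (2.3), I1(h) → 0 as h → ∞. The Fatou lemma, however, implies that both integrals I2(h) and I3(h) tend to infinity as h → ∞:

$$\liminf_{h\to\infty} I_2(h) \ge \int_{1}^{1/\theta} \frac{\varphi(\theta x)}{1-(\theta x)^2}\,\liminf_{h\to\infty} x^h\,dx = \infty$$

since 1/θ > 1. This contradicts (2.44). □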

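Example 2.6.1. Assume that the mixture density ϕg is uniform on [a, b], where −1 < a < b < 1. By Proposition 2.6.1, the associated spectral density is an analytic function on [−π, π] and can be easily calculated:

$$\begin{aligned}
f_g(\lambda) &= \frac{\sigma_\varepsilon^2}{2\pi(b-a)} \int_a^b \frac{dx}{1-2x\cos\lambda+x^2} \\
&= \frac{\sigma_\varepsilon^2}{2\pi(b-a)\sin|\lambda|}\left(\arctan\Big(\frac{b-\cos\lambda}{\sin|\lambda|}\Big) - \arctan\Big(\frac{a-\cos\lambda}{\sin|\lambda|}\Big)\right), \qquad \lambda \ne 0, \pm\pi,
\end{aligned}$$

$$f_g(0) = \sigma_\varepsilon^2(2\pi)^{-1}(1-a)^{-1}(1-b)^{-1}, \qquad f_g(\pm\pi) = \sigma_\varepsilon^2(2\pi)^{-1}(1+a)^{-1}(1+b)^{-1}.$$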
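The closed form in Example 2.6.1 can be checked numerically. The sketch below, given only as an illustration, compares the arctan expression with a midpoint-rule approximation of the defining integral for a uniform mixture density on [a, b]. The function names, the grid size and the test values of a, b and λ are assumptions chosen for the example.

```python
import math

def f_uniform_closed(lam, a, b, sigma2=1.0):
    """Closed-form spectral density for the uniform mixture density on [a, b]."""
    s = math.sin(abs(lam))
    c = math.cos(lam)
    return (sigma2 / (2 * math.pi * (b - a) * s)
            * (math.atan((b - c) / s) - math.atan((a - c) / s)))

def f_uniform_numeric(lam, a, b, sigma2=1.0, n=20000):
    """Midpoint-rule approximation of the integral defining the same density."""
    h = (b - a) / n
    c = math.cos(lam)
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += 1.0 / (1.0 - 2.0 * x * c + x * x)
    return sigma2 / (2 * math.pi * (b - a)) * total * h

if __name__ == "__main__":
    a, b = -0.5, 0.8
    for lam in (0.3, 1.0, 2.5):
        print(lam, f_uniform_closed(lam, a, b), f_uniform_numeric(lam, a, b))
```

For λ close to 0 the closed form should approach the stated boundary value f_g(0), which gives a second consistency check.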

We obtain the following corollary.

Corollary 2.6.1. Let ϕ1(x; d) in (2.32) and ϕg(x) be the mixture densities associated with the spectral density f1(λ; d) in (2.30) and an analytic spectral density g(λ), respectively. Assume that supp(ϕg) ⊂ [−a∗, 0], 0 < a∗ < 1, and

$$f(\lambda) = \frac{1}{2\pi}\Big(2\sin\frac{|\lambda|}{2}\Big)^{-2d} g(\lambda). \tag{2.45}$$

Then the mixture density ϕ(x), x ∈ [−a∗, 1], associated with f is given by

$$\varphi(x) = C_*^{-1}\left(\varphi_1(x; d)\int_{-a_*}^{0} \frac{\varphi_g(y)}{(1-xy)(1-y/x)}\,dy + \varphi_g(x)\int_0^1 \frac{\varphi_1(y; d)}{(1-xy)(1-y/x)}\,dy\right),$$

where

$$C_* := \int_0^1 \left(\int_{-a_*}^{0} \frac{\varphi_1(x; d)\,\varphi_g(y)}{1-xy}\,dy\right)dx.$$


2.7 Appendix. Mixture density associated with FI(d) spectral density

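Proposition 2.7.1. The mixture density associated with the FI(d) spectral density

$$f(\lambda; d) = \frac{1}{2\pi}\Big(2\sin\frac{|\lambda|}{2}\Big)^{-2d}, \qquad 0 < d < 1/2,$$

is given by

$$\varphi(x) = C(d)\,x^{d-1}(1-x)^{1-2d}(1+x)\,\mathbf{1}_{[0,1]}(x), \tag{2.46}$$

where

$$C(d) = \frac{\Gamma(3-d)}{2\Gamma(d)\Gamma(2-2d)} = 2^{2d-2}\,\frac{\sin(\pi d)}{\sqrt{\pi}}\,\frac{\Gamma(3-d)}{\Gamma((3/2)-d)}.$$

The variance of the noise is

$$\sigma_\varepsilon^2 = \frac{\sin(\pi d)}{C(d)\pi}.$$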

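Proof. The equality

$$f(\lambda; d) = 2^{-d-1}\pi^{-1}(1-\cos\lambda)^{-d}$$

implies that $1-\cos\lambda = (\pi 2^{d+1} f(\lambda; d))^{-1/d}$. Hence, rewriting

$$|1-xe^{i\lambda}|^2 = (1-x)^2\Big(1 + \frac{2x}{(1-x)^2}\,(1-\cos\lambda)\Big)$$

and assuming supp(ϕ) = [0, 1], we obtain that the spectral density of the aggregated process is of the form

$$\begin{aligned}
\frac{\sigma_\varepsilon^2}{2\pi}\int_0^1 \frac{\varphi(x)}{|1-xe^{i\lambda}|^2}\,dx &= \frac{\sigma_\varepsilon^2}{2\pi}\int_0^1 \frac{\varphi(x)}{(1-x)^2\big(1 + \frac{2x}{(1-x)^2}(1-\cos\lambda)\big)}\,dx \\
&= \frac{\sigma_\varepsilon^2}{2\pi}\int_0^1 \frac{\varphi(x)}{(1-x)^2\big(1 + \frac{2x}{(1-x)^2}(\pi 2^{d+1} f(\lambda; d))^{-1/d}\big)}\,dx.
\end{aligned} \tag{2.47}$$

The change of variables $y^{1/d} = 2x/(1-x)^2$ implies

$$dy = \frac{d\,2^d\,x^{d-1}(1+x)}{(1-x)^{2d+1}}\,dx. \tag{2.48}$$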


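Consider the density ϕ defined by

$$dy = \frac{\varphi(x)}{\tilde{C}(d)\,(1-x)^2}\,dx, \tag{2.49}$$

where $\tilde{C}(d)$ is some constant. Then (2.47) becomes

$$\frac{\sigma_\varepsilon^2}{2\pi}\int_0^1 \frac{\varphi(x)}{|1-xe^{i\lambda}|^2}\,dx = \frac{\sigma_\varepsilon^2\,\tilde{C}(d)}{2\pi}\int_0^\infty \frac{dy}{1 + y^{1/d}\,(\pi 2^{d+1} f(\lambda; d))^{-1/d}} = f(\lambda; d)\,\sigma_\varepsilon^2\,2^d\,\tilde{C}(d)\int_0^\infty \frac{dz}{1+z^{1/d}}$$

after the change of variables $z = y/(\pi 2^{d+1} f(\lambda; d))$. Therefore, FI(d) is an aggregated process and, by (2.48)–(2.49), the mixture density has the form

$$\varphi(x) = \tilde{C}(d)\,d\,2^d\,x^{d-1}(1-x)^{1-2d}(1+x), \tag{2.50}$$

and the variance of the noise is (see formula 6.1.17 in Abramowitz and Stegun (1972))

$$\sigma_\varepsilon^2 = 2^{-d}\,(\tilde{C}(d))^{-1}\left(\int_0^\infty \frac{dz}{1+z^{1/d}}\right)^{-1} = 2^{-d}\,(d\,\tilde{C}(d))^{-1}\,(B(d, 1-d))^{-1} = 2^{-d}\,(d\,\tilde{C}(d))^{-1}\,\frac{\sin(\pi d)}{\pi}.$$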

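Finally, it remains to calculate the constant $\tilde{C}(d)$ so that the mixture density ϕ given in (2.50) integrates to one over the interval [0, 1]. We have

$$\begin{aligned}
\int_0^1 \varphi(x)\,dx &= \tilde{C}(d)\,d\,2^d\left(\int_0^1 x^{d-1}(1-x)^{1-2d}\,dx + \int_0^1 x^{d}(1-x)^{1-2d}\,dx\right) \\
&= \tilde{C}(d)\,d\,2^d\,\big(B(d, 2-2d) + B(d+1, 2-2d)\big) \\
&= \tilde{C}(d)\,d\,2^{1+d}\,\frac{\Gamma(d)\Gamma(2-2d)}{(2-d)\Gamma(2-d)} \\
&= \tilde{C}(d)\,d\,2^{2-d}\,\frac{\sqrt{\pi}}{\sin(\pi d)}\,\frac{\Gamma((3/2)-d)}{\Gamma(3-d)}.
\end{aligned}$$

Hence,

$$\tilde{C}(d) = \frac{1}{d\,2^{d+1}}\,\frac{\Gamma(3-d)}{\Gamma(d)\Gamma(2-2d)} = \frac{2^{d-2}}{d}\,\frac{\sin(\pi d)}{\sqrt{\pi}}\,\frac{\Gamma(3-d)}{\Gamma((3/2)-d)}$$

and $C(d) = \tilde{C}(d)\,d\,2^d$. □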

Chapter 3

Asymptotic normality of LOPV estimator

The purpose of asymptotic theory in statistics is simple: to provide usable approximations before passage to the limit.

David R. Brillinger

Abstract

The chapter concerns the asymptotic distribution of the LOPV estimator of the mixture density in the aggregation and disaggregation problem of the random parameter AR(1) process described in the preceding chapters. First, we give a brief review of previously proved results. Then we prove that, under mild conditions on the (semi-parametric) form of the mixture density, the LOPV estimator is asymptotically normal. The proof is based on the limit theory for quadratic forms in linear random variables developed by Bhansali et al. (2007). The moving average representation of the aggregated process is also investigated.

3.1 Consistency of LOPV estimator

Among the statistical properties of any estimator, some key facts are usually checked first. In the earlier work of Leipus et al. (2006), the convergence in L2 and the uniform convergence on compact sets of the LOPV estimator were obtained, i.e. the consistency of the LOPV estimator was established.

Since the LOPV estimator is based upon the Gegenbauer expansion of the auxiliary function ζ in $L^2(w^{(\alpha)})$, it is natural to begin with the convergence in this space.


Theorem 3.1.1. [Leipus et al. (2006), Theorem 1] Let Xt be an aggregated process associated with a mixture density satisfying (2.3), and suppose that α > −1 is chosen so that condition (2.8) holds. If Kn satisfies

$$K_n = [\gamma \log n] \quad \text{with} \quad 0 < \gamma < (2\log(1+\sqrt{2}))^{-1}, \tag{3.1}$$

then, for ϕn defined in (2.9) and (2.10), it holds that

$$\lim_{n\to\infty} \int_{-1}^{1} \frac{E(\varphi_n(x) - \varphi(x))^2}{(1-x^2)^{\alpha}}\,dx = 0. \tag{3.2}$$
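Condition (3.1) makes the truncation level Kn grow only logarithmically in the sample size. A small sketch of this rule follows; the function name and the sample sizes are illustrative and not taken from the thesis.

```python
import math

# Upper bound for gamma in (3.1): (2 log(1 + sqrt 2))^(-1), roughly 0.567.
GAMMA_MAX = 1.0 / (2.0 * math.log(1.0 + math.sqrt(2.0)))

def truncation_level(n, gamma):
    """K_n = [gamma * log n], the integer part, for an admissible gamma."""
    if not 0.0 < gamma < GAMMA_MAX:
        raise ValueError("gamma must lie in (0, (2 log(1 + sqrt 2))^-1)")
    return math.floor(gamma * math.log(n))

if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4, 10**6):
        print(n, truncation_level(n, 0.5))
```

Even for a million observations and γ = 0.5 only a handful of Gegenbauer coefficients enter the estimator, which is consistent with the logarithmic rates in the theorems of this section.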

Two remarks about the assumptions of the theorem are in order. Firstly, under the semi-parametric assumption (2.7) on the form of the mixture density, condition (2.8) is equivalent to α < 2 min(1 − 2d1, 1 − 2d2) + 1, so that this condition is satisfied for all α in (−1, 1]. Moreover, if the function ϕ is bounded, condition (2.3) implies that (2.8) again holds for all values of α ∈ (−1, 1]. This is true, for example, for the densities used in Oppenheim and Viano (1999). Secondly, returning to the convergence (3.2), it is clear that the denominator (1 − x²)^α gives a better fit for the estimation at ±1 for large values of α.

The next result provides a rate of L2-convergence for sufficiently regular mixture densities. For the sake of simplicity, the result is given in the semi-parametric situation (2.7), although it could be formulated in a more general setting as well.

Theorem 3.1.2. [Leipus et al. (2006), Theorem 2]¹ Suppose that the mixture density has the form

$$\varphi(x) = (1-x)^{1-2d_1}(1+x)^{1-2d_2}\,\psi(x), \qquad d_1 < 1/2, \ d_2 < 1/2, \tag{3.3}$$

where ψ(x) has a continuous derivative on [−1, 1] and does not vanish at ±1. For every α > −1 such that

$$\min(1-2d_1, 1-2d_2) > \frac{\alpha}{2} + \frac{3}{4}$$

and ϕn defined in (2.9) and (2.10), with Kn satisfying (3.1), we have

$$\int_{-1}^{1} \frac{E(\varphi_n(x) - \varphi(x))^2}{(1-x^2)^{\alpha}}\,dx = O\Big(\frac{1}{(\log n)^3}\Big).$$

¹ Here the domains of d1, d2 are changed from d1 > 0, d2 > 0, as in Leipus et al. (2006), to those common in the long memory literature, d1 < 1/2, d2 < 1/2, where the LOPV and LM "memory" parameters are linked by $d_{LOPV} = 1 - 2d_{LM}$.

Finally, the uniform convergence of the estimator of a mixture density of the semi-parametric form (2.7) is proved.

Theorem 3.1.3. [Leipus et al. (2006), Theorem 3] Let ϕ be a mixture density of the form (3.3), where ψ is analytic on the open unit disk, continuous on [−1, 1], and does not vanish at ±1. Denote d = max(d1, d2). Let ϕn(x) be the estimator defined in (2.9) with Kn satisfying (3.1).

a) If 0 ≤ α < 3/2 − 2d, then for any δ ∈ [0, 1) we have

$$E\Big(\sup_{x\in[-\delta,\delta]} (\varphi_n(x) - \varphi(x))^2\Big) \le \frac{C(\delta)}{(\log n)^{3-\alpha-d}}. \tag{3.4}$$

b) If 5/2 ≤ α < 3/2− 2d, the result holds with δ = 1.

3.2 Asymptotic normality: main result

In this chapter, we further study the properties of the proposed LOPV mixture density estimator. In order to formulate the theorem about the asymptotic normality of the estimator ϕn(x), we assume that the aggregated process Xt, t ∈ Z, admits the following linear representation.

Assumption A. Assume that Xt, t ∈ Z, is a linear sequence

$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}, \tag{3.5}$$

where the Zt are i.i.d. random variables with zero mean and finite fourth moment, and the coefficients ψj satisfy

$$\psi_j \sim c\,j^{d-1}, \qquad |\psi_j - \psi_{j+1}| = O(j^{d-2}), \qquad 0 < d < 1/2, \tag{3.6}$$

with some constant c ≠ 0.

We also introduce the following condition on the mixture density ϕ(x).


Assumption B. Assume that the mixture density ϕ has the form

$$\varphi(x) = (1-x)^{1-2d}\,\psi(x), \qquad 0 < d < 1/2, \tag{3.7}$$

where ψ(x) is a nonnegative function with supp(ψ) ⊂ [−1, 1], continuous at x = 1, with ψ(1) ≠ 0.

Note that, omitting in (2.35) the factor responsible for the seasonal part, we obtain the corresponding 'long memory' spectral density with a singularity at zero (but not necessarily at ±π) and the corresponding behavior of the coefficients ψj in the linear representation (3.16).

Theorem 3.2.1. Let Xt, t ∈ Z, be the aggregated process satisfying Assumption A and corresponding to the mixture density given by Assumption B. Assume that (2.8) holds, and that d and α satisfy the condition

$$-1/2 < \alpha < \frac{5}{2} - 4d. \tag{3.8}$$

Let Kn be given in (2.11) with γ satisfying

$$0 < \gamma < (2\log(1+\sqrt{2}))^{-1}\Big(1 - \max\Big\{\alpha + 4d - \frac{3}{2},\ 0\Big\}\Big). \tag{3.9}$$

Then for every fixed x ∈ (−1, 1) such that ϕ(x) ≠ 0, it holds that

$$\frac{\varphi_n(x) - E\varphi_n(x)}{\sqrt{\mathrm{Var}(\varphi_n(x))}} \xrightarrow{d} N(0, 1). \tag{3.10}$$

Proof of the theorem is given in Section 3.4.
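To see how restrictive conditions (3.8) and (3.9) are in practice, the sketch below computes the admissible range of α for a given d and the resulting upper bound for γ. It reads (3.9) as γ < (2 log(1+√2))⁻¹ (1 − max{α + 4d − 3/2, 0}), which is the form consistent with the proof in Section 3.4. The function names are illustrative assumptions.

```python
import math

def alpha_range(d):
    """Open interval of admissible alpha in condition (3.8)."""
    return (-0.5, 2.5 - 4.0 * d)

def gamma_upper_bound(d, alpha):
    """Upper bound for gamma in (3.9), under the reading stated in the lead-in."""
    lo, hi = alpha_range(d)
    if not (0.0 < d < 0.5 and lo < alpha < hi):
        raise ValueError("(d, alpha) violates 0 < d < 1/2 or condition (3.8)")
    base = 1.0 / (2.0 * math.log(1.0 + math.sqrt(2.0)))
    return base * (1.0 - max(alpha + 4.0 * d - 1.5, 0.0))

if __name__ == "__main__":
    for d, alpha in ((0.2, 0.0), (0.4, 0.5), (0.45, 0.6)):
        print(d, alpha, alpha_range(d), gamma_upper_bound(d, alpha))
```

For weak long memory (small d) the bound coincides with the one in (3.1); it shrinks only when α + 4d exceeds 3/2.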

Remark 3.2.1. Suppose that ϕ(x) satisfies Assumption B. Then assumption (2.8) is equivalent to $\int_{-1}^{1} \psi^2(x)(1+x)^{-\alpha}\,dx < \infty$ and α < 3 − 4d. The last inequality is implied by (3.8).

Example 3.2.1. Consider two mixture densities

$$\varphi(x; d) = C_1(d)\,x^{d-1}(1-x)^{1-2d}(1+x)\,\mathbf{1}_{(0,1]}(x), \qquad 0 < d < 1/2, \tag{3.11}$$

where $C_1(d) = \frac{\Gamma(3-d)}{2\Gamma(d)\Gamma(2-2d)}$, and

$$\varphi_g(x; \kappa) = C_2(\kappa)\,|x|^{\kappa}\,\mathbf{1}_{[-a_*,0]}(x), \qquad \kappa > 0, \tag{3.12}$$

where 0 < a∗ < 1 and $C_2(\kappa) = (\kappa+1)(a_*)^{-\kappa-1}$.

According to Dacunha-Castelle and Oppenheim (2001), the spectral density corresponding to ϕ(x; d) is the FARIMA(0,d,0) spectral density

$$f(\lambda; d) = \frac{1}{2\pi}\Big(2\sin\frac{|\lambda|}{2}\Big)^{-2d}. \tag{3.13}$$

Also, since the support of ϕg lies inside (−1, 1), the spectral density g(λ; κ) corresponding to ϕg(x; κ) is an analytic function (see Proposition 2.6.1).

Consider the spectral density given by

$$f(\lambda) = f(\lambda; d)\,g(\lambda; \kappa), \qquad \lambda \in [-\pi, \pi]. \tag{3.14}$$

It can be shown that the mixture density ϕ(x) associated with f(λ) in (3.14) is supported on [−a∗, 1] and satisfies Assumption B with a function ψ(x) that is continuous on [−a∗, 1] and satisfies $\psi(x) = O(|x|^d)$ in a neighborhood of zero. This implies the validity of condition (2.8), which is needed to obtain the corresponding α-Gegenbauer expansion. For the proof of this example and the precise asymptotics of ψ(x) at zero, see Appendix A.

Finally, the aggregated process X obtained using such a mixture density ϕ(x) satisfies Assumption A by Proposition 3.3.2, which shows that Assumptions A and B are satisfied for a general 'aggregated' spectral density f(λ) = f(λ; d)g(λ), where g(λ) is an analytic function on [−π, π] and the associated mixture density is supported on [−a∗, 0] with some 0 < a∗ < 1.

Remark 3.2.2. Note that the 'FARIMA mixture density' (3.11), due to the factor $x^{d-1}$, does not satisfy (2.8), and a "compensating" density such as ϕg(x; κ) in (3.12) is needed in order to obtain the required integrability in a neighborhood of zero. Obviously, other mixture densities can be employed for the same purpose instead of ϕg(x; κ) in (3.12).

Remark 3.2.3. Consider the subclass of long memory generalized integrated processes GI(d+) introduced in Jin Lung-Lin (1991), whose spectral densities are associated with the family of mixture densities $\varphi(x) = C(d, \omega)\,x^{\omega-d}(1-x)^{1-2d}(1+x)\,\mathbf{1}_{[-a_*,1]}(x)$, where 0 < a∗ < 1 and ω > d − 1/2. Such densities clearly satisfy (2.8) and are, in a sense, equivalent to the product mixture density from Example 3.2.1. However, they are not convenient for proving the main result.


3.3 Moving average representation of the aggregated process

In order to obtain the asymptotic normality result in Theorem 3.2.1, an important assumption is that the aggregated process admits a linear representation with coefficients decaying at an appropriate rate (see Bhansali et al. (2007)). The related issues concerning the moving average representation of the aggregated process are discussed in this section.

It follows from the aggregation scheme that any aggregated process admits an absolutely continuous spectral measure. If, in addition, its spectral density, say f(λ), satisfies

$$\int_{-\pi}^{\pi} \log f(\lambda)\,d\lambda > -\infty, \tag{3.15}$$

then the function

$$h(z) = \exp\left\{\frac{1}{4\pi}\int_{-\pi}^{\pi} \frac{e^{i\lambda}+z}{e^{i\lambda}-z}\,\log f(\lambda)\,d\lambda\right\}, \qquad |z| < 1,$$

is an outer function from the Hardy space $H^2$, does not vanish for |z| < 1, and $f(\lambda) = |h(e^{i\lambda})|^2$. Then, by the Wold decomposition theorem, the corresponding process Xt is purely nondeterministic and has the MA(∞) representation (see Anderson (1971), Ch. 7.6.3)

$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}, \tag{3.16}$$

where the coefficients ψj are defined from the expansion of the normalized outer function h(z)/h(0), $\sum_{j=0}^{\infty} \psi_j^2 < \infty$, ψ0 = 1, and $Z_t = X_t - \hat{X}_t$, t = 0, 1, . . . ($\hat{X}_t$ is the optimal linear predictor of Xt) is the innovation process, which is zero mean, uncorrelated, with variance

$$\sigma^2 = 2\pi \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \log f(\lambda)\,d\lambda\right\}. \tag{3.17}$$

By construction, the aggregated process is Gaussian, implying that the innovations Zt are i.i.d. N(0, σ²) random variables.
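Formula (3.17) can be illustrated on a case where the innovation variance is known in advance. The sketch below, which is only an illustration and not part of the thesis, evaluates (3.17) by a midpoint rule for an AR(1) spectral density with coefficient ρ and innovation variance σ², for which the formula should return σ². The function names and the quadrature are assumptions; for long memory densities the integrable logarithmic singularity at zero would require a more careful quadrature.

```python
import math

def innovation_variance(spectral_density, n=4096):
    """Approximate 2*pi*exp((1/(2*pi)) * integral of log f over [-pi, pi])."""
    h = 2.0 * math.pi / n
    # Midpoint nodes avoid evaluating exactly at lambda = 0 and at the end points.
    mean_log = sum(math.log(spectral_density(-math.pi + (k + 0.5) * h))
                   for k in range(n)) * h / (2.0 * math.pi)
    return 2.0 * math.pi * math.exp(mean_log)

def ar1_density(rho, sigma2):
    """Spectral density sigma2 / (2*pi*|1 - rho*exp(i*lambda)|^2) of an AR(1) process."""
    def f(lam):
        return sigma2 / (2.0 * math.pi * (1.0 - 2.0 * rho * math.cos(lam) + rho * rho))
    return f

if __name__ == "__main__":
    print(innovation_variance(ar1_density(0.6, 1.7)))
```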

Next we focus on the class of semi-parametric mixture densities satisfying Assumption B. As mentioned earlier, this form is natural; in particular, it covers the mixture densities ϕ1(x; d) and ϕ(x) in Example 3.2.1.

Proposition 3.3.1. Let the mixture density ϕ(x) satisfy Assumption B. Assume that either

(i) supp(ψ) = [−1, 1] and $\tilde{\psi}(x) \equiv \psi(x)(1+x)^{2\tilde{d}-1}$ is continuous at −1 and $\tilde{\psi}(-1) \ne 0$ with some $0 < \tilde{d} < 1/2$,

or

(ii) supp(ψ) ⊂ [−a∗, 1] with some 0 < a∗ < 1.

Then the aggregated process admits a moving average representation (3.16), where the Zt are Gaussian i.i.d. random variables with zero mean and variance (3.17).

Proof. (i) We have to verify that (3.15) holds. Rewrite ϕ(x) in the form

$$\varphi(x) = (1-x)^{1-2d}(1+x)^{1-2\tilde{d}}\,\tilde{\psi}(x).$$

Proposition 2.4.1 implies

$$f(\lambda) \sim C_1|\lambda|^{-2d}, \qquad |\lambda| \to 0,$$

with C1 > 0. Hence $\log f(\lambda) \sim \log C_1 - C_2\log|1-e^{i\lambda}|$, |λ| → 0, where C2 > 0. For any ε > 0 choose 0 < λ0 ≤ π/3 such that

$$-\frac{\log f(\lambda) - \log C_1}{C_2\log|1-e^{i\lambda}|} - 1 \ge -\varepsilon, \qquad 0 < \lambda \le \lambda_0.$$

Since $-\log|1-e^{i\lambda}| \ge 0$ for 0 ≤ λ ≤ π/3, we obtain

$$\int_0^{\lambda_0} \log f(\lambda)\,d\lambda \ge \lambda_0\log C_1 - C_2(1-\varepsilon)\int_0^{\lambda_0} \log|1-e^{i\lambda}|\,d\lambda > -\infty \tag{3.18}$$

using the well known fact that $\int_0^{\pi} \log|1-e^{i\lambda}|\,d\lambda = 0$. Similarly,

$$\int_{\pi-\lambda_0}^{\pi} \log f(\lambda)\,d\lambda > -\infty. \tag{3.19}$$

When λ ∈ [λ0, π − λ0], there exist 0 < L1 < L2 < ∞ such that

$$L_1 \le \frac{1}{2\pi|1-xe^{i\lambda}|^2} \le L_2$$

uniformly in x ∈ (−1, 1). Thus, by (??), L1 ≤ f(λ) ≤ L2 for any λ ∈ [λ0, π − λ0], and therefore

$$\int_{\lambda_0}^{\pi-\lambda_0} \log f(\lambda)\,d\lambda > -\infty. \tag{3.20}$$

Relations (3.18)–(3.20) imply inequality (3.15).

The proof in case (ii) is analogous to (i) and is thus omitted. □

Lemma 3.3.1. If the spectral density g(λ) of the aggregated process Xt, t ∈ Z, is an analytic function on [−π, π], then Xt admits the representation

$$X_t = \sum_{j=0}^{\infty} g_j Z_{t-j},$$

where the Zt are i.i.d. Gaussian random variables with zero mean and variance

$$\sigma_g^2 = 2\pi\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \log g(\lambda)\,d\lambda\right\} \tag{3.21}$$

and the gj satisfy $|\sum_{j=0}^{\infty} g_j| < \infty$, g0 = 1.

Proof. From Proposition 2.6.1 it follows that there exists 0 < a∗ < 1 such that

$$g(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\int_{-a_*}^{a_*} \frac{\varphi_g(x)}{|1-xe^{i\lambda}|^2}\,dx. \tag{3.22}$$

For all x ∈ [−a∗, a∗] and λ ∈ [0, π] we have

$$\frac{1}{|1-xe^{i\lambda}|^2} \ge C_3 > 0,$$

where C3 = C3(a∗). This and (3.22) imply $\int_0^{\pi} \log g(\lambda)\,d\lambda > -\infty$. Finally, $|\sum_{j=0}^{\infty} g_j| < \infty$ follows from the representation

$$g(\lambda) = \frac{\sigma_g^2}{2\pi}\,\Big|\sum_{j=0}^{\infty} g_j e^{ij\lambda}\Big|^2$$

and the assumption of analyticity of g. □

Proposition 3.3.2. Let Xt, t ∈ Z, be an aggregated process with spectral density

$$f(\lambda) = f(\lambda; d)\,g(\lambda), \tag{3.23}$$

where f(λ; d) is the FARIMA spectral density (3.13) and g(λ) is an analytic spectral density. Then:

(i) if the mixture density ϕg(x) associated with g(λ) satisfies supp(ϕg) ⊂ [−a∗, 0] with some 0 < a∗ < 1, then ϕ(x), associated with f(λ), satisfies Assumption B.

(ii) Xt admits a linear representation (3.16), where the Zt are Gaussian i.i.d. random variables with zero mean and variance

$$\sigma^2 = 2\pi\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \log f(\lambda)\,d\lambda\right\} = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \log g(\lambda)\,d\lambda\right\} = \frac{\sigma_g^2}{2\pi}$$

and the coefficients ψj satisfy

$$\psi_j \sim \frac{\sum_{k=0}^{\infty} g_k}{\Gamma(d)}\,j^{d-1}, \qquad |\psi_j - \psi_{j+1}| = O(j^{d-2}), \tag{3.24}$$

where ψ0 = 1. (Here, the gk are given in Lemma 3.3.1.)

Proof. (i) By Corollary 2.6.1, the mixture density associated with the "product" spectral density (3.23) exists and has the form

$$\varphi(x) = C_*^{-1}\left(\varphi(x; d)\int_{-a_*}^{0} \frac{\varphi_g(y)\,dy}{(1-xy)(1-y/x)} + \varphi_g(x)\int_0^1 \frac{\varphi(y; d)\,dy}{(1-xy)(1-y/x)}\right), \tag{3.25}$$

with

$$C_* := \int_0^1 \left(\int_{-a_*}^{0} \frac{\varphi(x; d)\,\varphi_g(y)}{1-xy}\,dy\right)dx, \tag{3.26}$$

where ϕ(x; d) is given in (3.11) and is associated with the spectral density f(λ; d), and ϕg(x) is associated with the spectral density g(λ). Clearly, this implies that Assumption B is satisfied.

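(ii) We have

$$f(\lambda; d) = \frac{1}{2\pi}\,\Big|\sum_{j=0}^{\infty} h_j e^{ij\lambda}\Big|^2 \quad \text{with} \quad h_j = \frac{\Gamma(j+d)}{\Gamma(j+1)\Gamma(d)}$$

and, recall,

$$g(\lambda) = \frac{\sigma_g^2}{2\pi}\,\Big|\sum_{j=0}^{\infty} g_j e^{ij\lambda}\Big|^2, \qquad \sum_{j=0}^{\infty} g_j^2 < \infty,$$

since, by Lemma 3.3.1, $\int_{-\pi}^{\pi} \log g(\lambda)\,d\lambda > -\infty$. On the other hand, $\int_{-\pi}^{\pi} \log f(\lambda)\,d\lambda > -\infty$ implies

$$f(\lambda) = \frac{1}{2\pi}\,\Big|\sum_{j=0}^{\infty} \tilde{\psi}_j e^{ij\lambda}\Big|^2, \qquad \sum_{j=0}^{\infty} \tilde{\psi}_j^2 < \infty,$$

and, by uniqueness of the representation,

$$\tilde{\psi}_k = \frac{\sigma_g}{\sqrt{2\pi}}\sum_{j=0}^{k} h_{k-j}\,g_j.$$

It is easy to see that

$$\sum_{j=0}^{k} h_{k-j}\,g_j \sim h_k\sum_{j=0}^{\infty} g_j \sim C_4\,k^{d-1}, \tag{3.27}$$

where $C_4 = \Gamma^{-1}(d)\sum_{j=0}^{\infty} g_j$. Indeed, taking into account that $h_k \sim \Gamma^{-1}(d)\,k^{d-1}$, we can write

$$\sum_{j=0}^{k} h_{k-j}\,g_j = \Gamma^{-1}(d)\,k^{d-1}\sum_{j=0}^{\infty} a_{k,j}\,g_j,$$

where $a_{k,j} = h_{k-j}\,\Gamma(d)\,k^{1-d}\,\mathbf{1}_{\{j\le k\}} \to 1$ as k → ∞ for each j. On the other hand, we have $|a_{k,j}| \le C(1+j)^{1-d}$ uniformly in k and, since the gj decay exponentially fast, the sum $\sum_{j=0}^{\infty} (1+j)^{1-d}|g_j|$ converges and the dominated convergence theorem applies to obtain (3.27).

Hence, we can write

$$f(\lambda) = \frac{\sigma_g^2}{(2\pi)^2}\,\Big|\sum_{j=0}^{\infty} \psi_j e^{ij\lambda}\Big|^2, \qquad \psi_0 = 1,$$

where $\psi_j = \tilde{\psi}_j\sqrt{2\pi}/\sigma_g \sim C_4\,j^{d-1}$. Thus, representation (3.16) and the first relation in (3.24) follow.

Finally, in order to check the second relation in (3.24), it suffices to note that

$$\psi_j - \psi_{j+1} = \sum_{i=0}^{j} (h_{j-i} - h_{j+1-i})\,g_i - g_{j+1},$$

where $h_j - h_{j+1} \sim C_5\,j^{d-2}$ and the gj decay exponentially fast. □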
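The asymptotic relation (3.27) is easy to observe numerically. The sketch below convolves the FARIMA weights hj with a geometrically decaying sequence gj = ρ^j and compares the result with the predicted limit k^(d−1) Γ(d)⁻¹ Σ gj. The choice gj = ρ^j is purely an illustrative assumption standing in for "exponentially decaying coefficients"; it is not claimed to arise from any particular mixture density in the thesis, and the function names are hypothetical.

```python
import math

def farima_weights(d, n):
    """h_j = Gamma(j+d) / (Gamma(j+1) Gamma(d)) for j = 0..n, via a stable recursion."""
    h = [1.0]
    for j in range(1, n + 1):
        h.append(h[-1] * (j - 1 + d) / j)
    return h

def convolved_weight(d, rho, k):
    """k-th coefficient of the convolution of (h_j) with (rho**j)."""
    h = farima_weights(d, k)
    return sum(h[k - j] * rho ** j for j in range(k + 1))

def asymptotic_ratio(d, rho, k):
    """Ratio of the convolved weight to its predicted asymptotic value."""
    limit = k ** (d - 1.0) / (math.gamma(d) * (1.0 - rho))  # sum of rho**j is 1/(1-rho)
    return convolved_weight(d, rho, k) / limit

if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, asymptotic_ratio(0.3, 0.5, k))
```

The printed ratios should approach one as k grows, in line with (3.27) and the first relation in (3.24).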

3.4 Proof of the main result

In order to prove Theorem 3.2.1, we use the result of Bhansali et al. (2007), who considered the quadratic form

$$Q_{n,X} = \sum_{t,s=1}^{n} d_n(t-s)\,X_t X_s,$$

where the Xt are linear sequences satisfying Assumption A and the function dn(k) satisfies the following assumption.

Assumption C. Suppose that

$$d_n(k) = \int_{-\pi}^{\pi} \eta_n(\lambda)\,e^{ik\lambda}\,d\lambda$$

with some even real function ηn(λ) such that, for some −1 < β < 1 and a sequence of constants mn ≥ 0, it holds that

$$|\eta_n(\lambda)| \le m_n|\lambda|^{-\beta}, \qquad \lambda \in [-\pi, \pi]. \tag{3.28}$$

Denote by En the matrix $(e_n(t-s))_{t,s=1,\dots,n}$, where

$$e_n(t-s) = 2\pi\int_{-\pi}^{\pi} \eta_n(\lambda)\,f(\lambda)\,e^{i\lambda(t-s)}\,d\lambda, \tag{3.29}$$

and let $\|E_n\|^2 = \sum_{t,s=1}^{n} e_n^2(t-s)$.

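Theorem 3.4.1. [Bhansali et al. (2007)] Suppose that Assumptions A and C are satisfied. If 2d + β < 1/2 and

$$r_n = o(\|E_n\|), \tag{3.30}$$

where

$$r_n = \begin{cases} m_n\,n^{\max(0,\,2d+\beta)} & \text{if } 2d+\beta \ne 0, \\ m_n\log n & \text{if } 2d+\beta = 0, \end{cases} \tag{3.31}$$

then, as n → ∞, it holds that

$$\mathrm{Var}(Q_{n,X}) \asymp \|E_n\|^2$$

and

$$\frac{Q_{n,X} - EQ_{n,X}}{\sqrt{\mathrm{Var}(Q_{n,X})}} \xrightarrow{d} N(0, 1).$$

(Here, for an, bn ≥ 0, $a_n \asymp b_n$ means that C6 bn ≤ an ≤ C7 bn for some C6, C7 > 0.)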

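Proof of Theorem 3.2.1. First of all, note that

$$\sigma_{n,\varepsilon}^2 \xrightarrow{P} \sigma_\varepsilon^2,$$

which easily follows using Theorem 3 in Hosking (1996). Hence, to obtain convergence (3.10), we can replace the factor $\sigma_{n,\varepsilon}^2$ by $\sigma_\varepsilon^2$ in the definition of ϕn(x). Without loss of generality assume that $\sigma_\varepsilon^2 = 1$. Rewrite the estimate ϕn(x) in the form

$$\begin{aligned}
\varphi_n(x) &= (1-x^2)^{\alpha}\sum_{k=0}^{K_n}\sum_{j=0}^{k} g^{(\alpha)}_{k,j}\,(\sigma_n(j) - \sigma_n(j+2))\,G^{(\alpha)}_k(x) \\
&= (1-x^2)^{\alpha}\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\sum_{j=0}^{k} g^{(\alpha)}_{k,j}\int_{-\pi}^{\pi} (e^{i\lambda j} - e^{i\lambda(j+2)})\,I_n(\lambda)\,d\lambda \\
&= (2\pi n)\int_{-\pi}^{\pi} \eta_n(\lambda; x)\,I_n(\lambda)\,d\lambda,
\end{aligned} \tag{3.32}$$

where

$$\eta_n(\lambda; x) := \frac{(1-x^2)^{\alpha}}{2\pi n}\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\sum_{j=0}^{k} g^{(\alpha)}_{k,j}\,(e^{i\lambda j} - e^{i\lambda(j+2)}) \tag{3.33}$$

and $I_n(\lambda) = (2\pi n)^{-1}\big|\sum_{j=1}^{n} X_j e^{ij\lambda}\big|^2$, λ ∈ [−π, π], is the periodogram.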

Now the proof follows from Assumption A and the results obtained in Lemma 3.4.1 and Lemma 3.4.2 below, which imply that, under an appropriate choice of mn and β, all the assumptions of Theorem 3.4.1 are satisfied. In particular, by Lemma 3.4.1, the following bound for the kernel ηn(λ; x) holds:

$$|\eta_n(\lambda; x)| \le m_n|\lambda|^{-\beta}, \tag{3.34}$$

where

$$m_n = C_8\,n^{\gamma\log(1+\sqrt{2})-1}, \qquad \beta = \frac{\alpha}{2} - \frac{3}{4}, \tag{3.35}$$

and C8 is a positive constant depending on x and α. Clearly, (3.8) implies that $-1 < \beta \le \frac{1}{2} - 2d < \frac{1}{2}$ and $2d + \beta < \frac{1}{2}$.

Consider the cases 2d + β ≤ 0 and 0 < 2d + β < 1/2. In the case 2d + β ≤ 0, from (3.31), (3.35) we obtain

$$r_n = C_8\begin{cases} n^{\gamma\log(1+\sqrt{2})-1} & \text{if } 2d + \frac{\alpha}{2} - \frac{3}{4} < 0, \\ n^{\gamma\log(1+\sqrt{2})-1}\log n & \text{if } 2d + \frac{\alpha}{2} - \frac{3}{4} = 0. \end{cases}$$

Hence, by Lemma 3.4.2 below, $r_n\|E_n\|^{-1} \to 0$ because $\gamma\log(1+\sqrt{2}) < 1/2$. Assume now 2d + β > 0. Then

$$r_n = C_8\,n^{\gamma\log(1+\sqrt{2}) + 2d + \frac{\alpha}{2} - \frac{7}{4}}$$

and $r_n\|E_n\|^{-1} \to 0$ by (3.9). □


The following lemma shows that the kernel ηn(λ; x) given in (3.33) satisfies inequality (3.28) with mn and β given in (3.35).

Lemma 3.4.1. For the quantity ηn(λ; x) given in (3.33) and for every fixed x ∈ (−1, 1), 0 < |λ| < π, it holds that

$$|\eta_n(\lambda; x)| \le C_9\,n^{\gamma\log(1+\sqrt{2})-1}\,|\lambda|^{(3-2\alpha)/4}\begin{cases} (1-x^2)^{\alpha/2-1/4} & \text{if } \alpha > -1/2, \\ (1-x^2)^{\alpha} & \text{if } -1 < \alpha < -1/2, \end{cases}$$

where C9 depends on α, and γ is given in (2.11).

Lemma 3.4.2. Assume that a mixture density ϕ(x) satisfies condition (2.8) and let Kn → ∞. Then for every x ∈ (−1, 1) such that ϕ(x) ≠ 0, it holds that

$$\|E_n\|^2 \ge C_{10}\,n^{-1}(1 + o(1)), \tag{3.36}$$

where C10 > 0 is a constant depending on α and x.

The proofs of these two lemmas are given in Appendix B.

3.5 Appendix A. Proof of Example 3.2.1

By Corollary 2.6.1, the mixture density ϕ(x), x ∈ [−a∗, 1], associated with f(λ) in (3.14) is given by equality (3.25), where ϕg(x) ≡ ϕg(x; κ). Clearly, in this case, (3.25) can be rewritten in the form (2.32) with

$$\psi(x) = C\,(\psi_1(x) + \psi_2(x)), \tag{3.37}$$

where $C = C_1(d)\,C_2(\kappa)\,C_*^{-1}$ is a positive constant,

$$\psi_1(x) := x^{d-1}(1+x)\,\mathbf{1}_{(0,1]}(x)\int_{-a_*}^{0} \frac{|y|^{\kappa}}{(1-xy)(1-y/x)}\,dy, \tag{3.38}$$

$$\psi_2(x) := |x|^{\kappa}(1-x)^{2d-1}\,\mathbf{1}_{[-a_*,0]}(x)\int_0^1 \frac{y^{d-1}(1-y)^{1-2d}(1+y)}{(1-xy)(1-y/x)}\,dy. \tag{3.39}$$

Denote by F(a, b; c; x) the hypergeometric function

$$F(a, b; c; x) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-tx)^{-a}\,dt,$$

with c > b > 0 if x < 1 and, in addition, c − a − b > 0 if x = 1. Then the corresponding integrals in ψ1(x) and ψ2(x) can be rewritten as

$$\int_{-a_*}^{0} \frac{|y|^{\kappa}}{(1-xy)(1-y/x)}\,dy = \frac{a_*^{\kappa+1}}{\kappa+1}\,\frac{x\,\big(F(1, \kappa+1; \kappa+2; -a_* x) - F(1, \kappa+1; \kappa+2; -a_*/x)\big)}{1-x^2} \sim \frac{a_*^{\kappa+1}}{\kappa+1}\,x, \qquad x \to 0+,$$

and

$$\int_0^1 \frac{y^{d-1}(1-y)^{1-2d}(1+y)}{(1-xy)(1-y/x)}\,dy = \frac{\Gamma(d)\Gamma(2-2d)}{\Gamma(2-d)}\,\frac{F(1, d; 2-d; 1/x) - x\,F(1, d; 2-d; x)}{1-x} \sim \Gamma(d)\Gamma(1-d)\,|x|^{d}, \qquad x \to 0-,$$

where the last asymptotics follow from the well known properties of the hypergeometric functions (see Abramowitz and Stegun (?)).

Thus, from (3.38)–(3.39) we obtain that

$$\psi_1(x) \sim \frac{a_*^{\kappa+1}}{\kappa+1}\,x^{d}, \qquad x \to 0+, \tag{3.40}$$

$$\psi_2(x) \sim \Gamma(d)\Gamma(1-d)\,|x|^{\kappa+d}, \qquad x \to 0-. \tag{3.41}$$

Relations (3.37) and (3.40)–(3.41) complete the proof. □

3.6 Appendix B. Proofs of lemmas 3.4.1–3.4.2

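Proof of Lemma 3.4.1. By (3.33),

$$\begin{aligned}
(1-x^2)^{-\alpha}\,\eta_n(\lambda; x) &= (2\pi n)^{-1}\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\sum_{j=0}^{k} g^{(\alpha)}_{k,j}\,(e^{i\lambda j} - e^{i\lambda(j+2)}) \\
&= (2\pi n)^{-1}(1-e^{2i\lambda})\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\sum_{j=0}^{k} g^{(\alpha)}_{k,j}\,e^{i\lambda j} \\
&= (2\pi n)^{-1}(1-e^{2i\lambda})\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\,G^{(\alpha)}_k(e^{i\lambda}).
\end{aligned}$$

This and Lemma 3.6.1 below imply

$$(1-x^2)^{-\alpha}\,|\eta_n(\lambda; x)| \le C_{11}\,n^{-1}|\lambda|^{-(2\alpha-3)/4}\sum_{k=0}^{K_n} |G^{(\alpha)}_k(x)|\,(1+\sqrt{2})^{k}. \tag{3.42}$$

Now, using the fact that for all −1 < x < 1

$$|G^{(\alpha)}_k(x)| \le \begin{cases} C_{12}\,(1-x^2)^{-\frac{\alpha}{2}-\frac{1}{4}} & \text{if } \alpha > -1/2, \\ C_{12} & \text{if } \alpha < -1/2,\ \alpha \ne -3/2, -5/2, \dots \end{cases}$$

(see inequality (7.33.6) in Szego (1967) and (3.9) in Leipus et al. (2006)) and (2.11), we get from (3.42)

$$\begin{aligned}
(1-x^2)^{-\alpha}\,|\eta_n(\lambda; x)| &\le C_{13}\,n^{-1}|\lambda|^{-(2\alpha-3)/4}\,(1+\sqrt{2})^{K_n} \\
&= C_{13}\,n^{-1}|\lambda|^{-(2\alpha-3)/4}\,e^{K_n\log(1+\sqrt{2})} \\
&\le C_9\,|\lambda|^{-(2\alpha-3)/4}\,n^{\gamma\log(1+\sqrt{2})-1}.
\end{aligned}$$

□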

Lemma 3.6.1. For all k ≥ 0, α > −1 (α ≠ −1/2) and 0 < |λ| < π, it holds that

$$|(1-e^{2i\lambda})\,G^{(\alpha)}_k(e^{i\lambda})| \le C_{11}\,(1+\sqrt{2})^{k}\,|\lambda|^{-(2\alpha-3)/4},$$

where the constant C11 depends on α.

Proof. Theorem 8.21.10 of Szego (1967) implies that for the usual (nonnormalized) Gegenbauer polynomials with α > −1, α ≠ −1/2, it holds that

$$C^{(\alpha+1/2)}_k(e^{i\lambda}) = \frac{\Gamma(k+\alpha+\frac{1}{2})}{\Gamma(k+1)\Gamma(\alpha+\frac{1}{2})}\,z^{k}(1-z^{-2})^{-\alpha-1/2} + O(k^{\alpha-3/2}|z|^{k}), \tag{3.43}$$

where the complex numbers $w = e^{i\lambda}$ and z are connected by the elementary conformal mapping

$$w = \frac{1}{2}(z + z^{-1}), \qquad z = w + (w^2-1)^{1/2}, \tag{3.44}$$

and z satisfies |z| > 1 (thus, λ ≠ 0, ±π).

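Recall that the normalized Gegenbauer polynomials $G^{(\alpha)}_k(z)$ are linked to $C^{(\alpha+1/2)}_k(z)$ by the equality

$$G^{(\alpha)}_k(z) = \gamma_k^{-1/2}\,C^{(\alpha+1/2)}_k(z), \quad \text{where} \quad \gamma_k = \frac{\pi}{2^{2\alpha}}\,\frac{\Gamma(k+2\alpha+1)}{(k+\alpha+\frac{1}{2})\,\Gamma^2(\alpha+\frac{1}{2})\,\Gamma(k+1)}.$$

Therefore, in terms of the normalized Gegenbauer polynomials, (3.43) reads as follows:

$$G^{(\alpha)}_k(e^{i\lambda}) = \frac{\mathrm{sgn}(\alpha+1/2)\,2^{\alpha}}{\pi^{1/2}}\,b_k\,z^{k}(1-z^{-2})^{-\alpha-1/2} + O(k^{-1}|z|^{k}), \tag{3.45}$$

where

$$b_k = \frac{(k+\alpha+1/2)^{1/2}\,\Gamma(k+\alpha+1/2)}{\Gamma^{1/2}(k+1)\,\Gamma^{1/2}(k+2\alpha+1)} \to 1 \quad \text{as } k \to \infty.$$

From (3.44) we obtain for $w = e^{i\lambda}$

$$w^2 - 1 = \frac{1}{4}\,z^{2}(1-z^{-2})^{2},$$

which together with (3.45) yields

$$(1-e^{2i\lambda})\,G^{(\alpha)}_k(e^{i\lambda}) = -\frac{\mathrm{sgn}(\alpha+1/2)\,2^{\alpha}}{4\pi^{1/2}}\,b_k\,z^{k+2}(1-z^{-2})^{-\alpha+3/2} + O(k^{-1}|z|^{k}).$$

Since |z| > 1 and $z^2 - 1 = 2(e^{2i\lambda}-1) + 2e^{3i\lambda/2}(e^{i\lambda}-e^{-i\lambda})^{1/2}$, we have

$$\begin{aligned}
|1-z^{-2}| &\le |z^2-1| \\
&\le 2|e^{2i\lambda}-1| + 2|e^{i\lambda}-e^{-i\lambda}|^{1/2} \\
&= 4|\sin\lambda| + 2\sqrt{2}\,|\sin\lambda|^{1/2} \\
&\le (4+2\sqrt{2})\,|\lambda|^{1/2}.
\end{aligned} \tag{3.46}$$

So that, by (3.45)–(3.46),

$$|(1-e^{2i\lambda})\,G^{(\alpha)}_k(e^{i\lambda})| \le C_{14}\,b_k\,|z|^{k}\,|\lambda|^{-(2\alpha-3)/4}, \tag{3.47}$$

where C14 = C14(α).

Finally, a straightforward verification shows that

$$\sup_{\lambda\in[-\pi,\pi]} |e^{i\lambda} + (e^{2i\lambda}-1)^{1/2}| = 1+\sqrt{2}.$$

This completes the proof of the lemma. □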
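The "straightforward verification" of the supremum can also be done numerically. The sketch below evaluates z = w + (w² − 1)^{1/2} for w = e^{iλ} on a grid, choosing the branch with |z| ≥ 1 as required in (3.44), and reports the largest modulus. The function names and the grid size are illustrative assumptions.

```python
import cmath
import math

def outer_root(lam):
    """Root z of z + 1/z = 2w with |z| >= 1, where w = exp(i*lam)."""
    w = cmath.exp(1j * lam)
    s = cmath.sqrt(w * w - 1.0)
    # The two roots w + s and w - s multiply to 1; keep the one of larger modulus.
    return max(w + s, w - s, key=abs)

def sup_modulus(n=4000):
    """Largest |z| over an equally spaced grid of n + 1 points in [-pi, pi]."""
    return max(abs(outer_root(-math.pi + 2.0 * math.pi * k / n)) for k in range(n + 1))

if __name__ == "__main__":
    print(sup_modulus(), 1.0 + math.sqrt(2.0))
```

The maximum is attained at λ = ±π/2, where w = ±i and z = ±i(1 + √2).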


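Proof of Lemma 3.4.2. Using (3.29) and (3.33), rewrite the coefficients of En as

$$e_n(t-s) = n^{-1}(1-x^2)^{\alpha}\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\sum_{j=0}^{k} g^{(\alpha)}_{k,j}\int_{-\pi}^{\pi} f(\lambda)\,(e^{i\lambda(t-s+j)} - e^{i\lambda(t-s+j+2)})\,d\lambda.$$

Using the expression for the covariance function of an aggregated process, we have for t − s + j ≥ 0

$$\int_{-\pi}^{\pi} f(\lambda)\,(e^{i\lambda(t-s+j)} - e^{i\lambda(t-s+j+2)})\,d\lambda = \sigma(t-s+j) - \sigma(t-s+j+2) = \sigma_\varepsilon^2\int_{-1}^{1} y^{t-s+j}\,\varphi(y)\,dy.$$

Thus, assuming $\sigma_\varepsilon^2 = 1$, for t − s ≥ 0 we have

$$\begin{aligned}
e_n(t-s) &= n^{-1}(1-x^2)^{\alpha}\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\sum_{j=0}^{k} g^{(\alpha)}_{k,j}\int_{-1}^{1} y^{t-s+j}\,\varphi(y)\,dy \\
&= n^{-1}(1-x^2)^{\alpha}\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\int_{-1}^{1} y^{t-s}\,\varphi(y)\sum_{j=0}^{k} g^{(\alpha)}_{k,j}\,y^{j}\,dy \\
&= n^{-1}(1-x^2)^{\alpha}\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\int_{-1}^{1} y^{t-s}\,\varphi(y)\,G^{(\alpha)}_k(y)\,dy.
\end{aligned}$$

The integral $\int_{-1}^{1} y^{m}\varphi(y)\,G^{(\alpha)}_k(y)\,dy$ (m is a nonnegative integer) appearing in the last expression is nothing else but the kth coefficient, $\psi_{m,k}$, in the α-Gegenbauer expansion of the function

$$\psi_m(x) = \frac{x^{m}\varphi(x)}{(1-x^2)^{\alpha}}, \tag{3.48}$$

which obviously satisfies $\psi_m \in L^2(w^{(\alpha)})$. Therefore,

$$\begin{aligned}
e_n(t-s) &= n^{-1}(1-x^2)^{\alpha}\sum_{k=0}^{K_n} G^{(\alpha)}_k(x)\,\psi_{|t-s|,k} \\
&= n^{-1}(1-x^2)^{\alpha}\Big(\psi_{|t-s|}(x) - \sum_{k=K_n+1}^{\infty} G^{(\alpha)}_k(x)\,\psi_{|t-s|,k}\Big)
\end{aligned}$$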


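and, denoting $R_n(m) := \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x)\, \psi_{|m|,k}$, $|m| < n$, we have

\[ \begin{aligned}
(1 - x^2)^{-2\alpha} \|E_n\|^2 &= n^{-2} \sum_{|m|<n} (n - |m|) \Big( \psi_{|m|}(x) - \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x)\, \psi_{|m|,k} \Big)^2 \\
&= n^{-2} \sum_{|m|<n} (n - |m|)\, \psi_{|m|}^2(x) - 2n^{-2} \sum_{|m|<n} (n - |m|)\, \psi_{|m|}(x) R_n(m) \\
&\quad + n^{-2} \sum_{|m|<n} (n - |m|)\, R_n^2(m) =: A_{1,n} - 2A_{2,n} + A_{3,n}.
\end{aligned} \]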

Now, we prove that, as $n \to \infty$,

\[ A_{1,n} \sim C_{15}\, n^{-1}, \tag{3.49} \]

where $C_{15} = C_{15}(x) > 0$ is some positive constant, and

\[ A_{2,n} = o(n^{-1}). \tag{3.50} \]

Since the last term $A_{3,n}$ is nonnegative by construction, this will prove (3.36).

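At points $x$ where $\varphi(x) \neq 0$ we have

\[ \begin{aligned}
A_{1,n} &= n^{-2}\, \frac{\varphi^2(x)}{(1 - x^2)^{2\alpha}} \sum_{|m|<n} (n - |m|)\, x^{2|m|} \\
&\sim n^{-1}\, \frac{\varphi^2(x)(1 + x^2)}{(1 - x^2)^{2\alpha + 1}}, \quad \text{as } n \to \infty,
\end{aligned} \]

which gives (3.49). Consider the term $A_{2,n}$. By (3.48),

\[ \begin{aligned}
A_{2,n} &= n^{-2} \sum_{|m|<n} (n - |m|)\, \psi_{|m|}(x) \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x)\, \psi_{|m|,k} \\
&= \frac{\varphi(x)}{(1 - x^2)^{\alpha}}\, n^{-2} \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x) \int_{-1}^{1} \varphi(y)\, G_k^{(\alpha)}(y) \sum_{|m|<n} (n - |m|)(xy)^{|m|}\, dy \\
&= \frac{\varphi(x)}{(1 - x^2)^{\alpha}}\, (B_{1,n} - B_{2,n} - B_{3,n}),
\end{aligned} \]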


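where

\[ \begin{aligned}
B_{1,n} &:= n^{-1} \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x) \int_{-1}^{1} \varphi(y)\, G_k^{(\alpha)}(y) \sum_{m=-\infty}^{\infty} (xy)^{|m|}\, dy \\
&= n^{-1} \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x) \int_{-1}^{1} \varphi(y)\, G_k^{(\alpha)}(y)\, \frac{1 + xy}{1 - xy}\, dy, \\
B_{2,n} &:= n^{-2} \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x) \int_{-1}^{1} \varphi(y)\, G_k^{(\alpha)}(y) \sum_{|m|<n} |m| (xy)^{|m|}\, dy, \\
B_{3,n} &:= n^{-1} \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x) \int_{-1}^{1} \varphi(y)\, G_k^{(\alpha)}(y) \sum_{|m| \ge n} (xy)^{|m|}\, dy.
\end{aligned} \]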

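Since, by (2.8),

\[ \varphi_x(y) \equiv \frac{\varphi(y)}{(1 - y^2)^{\alpha}}\, \frac{1 + xy}{1 - xy} \]

satisfies $\varphi_x \in L^2(w^{(\alpha)})$ and $K_n \to \infty$, the sum $\sum_{k=K_n+1}^{\infty}$ in $B_{1,n}$ vanishes (as the tail of a convergent series). So that $B_{1,n} = o(n^{-1})$ and, similarly, $B_{3,n} = o(n^{-1})$.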

Finally,

\[ B_{2,n} \sim n^{-2} \sum_{k=K_n+1}^{\infty} G_k^{(\alpha)}(x) \int_{-1}^{1} \varphi(y)\, G_k^{(\alpha)}(y)\, \frac{2xy}{(1 - xy)^2}\, dy = o(n^{-2}), \]

using an argument similar to that for the term $B_{1,n}$. This completes the proof of (3.50) and of the lemma. □
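The asymptotic relation (3.49) rests on the limit $n^{-1}\sum_{|m|<n}(n - |m|)\,x^{2|m|} \to (1 + x^2)/(1 - x^2)$ for $|x| < 1$. A minimal numerical sketch of this limit follows; the function names are hypothetical and the code only illustrates the convergence, it is not part of the proof.

```python
def cesaro_sum(x, n):
    """Compute n^{-1} * sum over |m| < n of (n - |m|) * x^(2|m|)."""
    r = x * x
    total = float(n)      # contribution of m = 0
    power = 1.0
    for m in range(1, n):
        power *= r        # power == x^(2m)
        total += 2.0 * (n - m) * power   # terms m and -m together
    return total / n

def limit_value(x):
    """Limit of cesaro_sum(x, n) as n grows: (1 + x^2) / (1 - x^2)."""
    return (1.0 + x * x) / (1.0 - x * x)
```

For example, `cesaro_sum(0.5, 10000)` and `limit_value(0.5)` agree to about four decimal places, and the gap shrinks at rate $1/n$.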

Chapter 4
Simulations and empirical applications

Torture numbers, and they’ll confess to anything.

Gregg Easterbrook

Abstract

In this chapter we present the empirical part of the thesis. First, we provide a set of Monte Carlo simulations, which aim to support the theoretical findings of the preceding chapters. In addition, an application to G7 aggregated consumption data is investigated.

Applications are the key reason to do any theoretical research. Simple yet useful techniques, besides real data applications, include Monte Carlo experiments. While the terminology is not universally agreed upon, it seems reasonable to describe Monte Carlo (MC) experiments as simulation, since they are conducted by simulating random processes using random numbers (with properties analogous to those of the random processes). Generally speaking, Monte Carlo methods provide approximate solutions to a variety of problems by performing statistical sampling experiments on the computer (Fishman (1995)).

The arguments in favour of using experimental Monte Carlo simulations for studying econometric methods are simply that many problems are analytically intractable or their analysis is too expensive, and that the relative price of capital to labour has moved sharply and increasingly in favour of capital. Roughly speaking, compared to a mathematical analysis of a complicated estimator or test procedure, results based on computer experiments are inexpensive and easy to produce. However, since this is an experimental discipline, precision and certainty are not among the key features of Monte Carlo methods, so the results discussed below are an illustration of the theoretical problems and have to be interpreted carefully (Hendry (1984)).

Nevertheless, the most important objective of the obtained results concerns real data applications. One of the key examples described in the aggregation literature is the analysis of aggregated consumption data in large economies such as the G7 countries: Germany, Italy, France, Japan, Canada, the U.K. and the U.S.A. This study aims to compare the results provided by different disaggregation techniques.

4.1 Asymptotic normality: a simulation study

In order to gain further insight into the asymptotic normality property of the mixture density estimator (2.9), in this section we conduct a small Monte-Carlo simulation study. Several examples are considered, which correspond to mixture densities having different shapes (here we do not pose the question of which rigorous aggregating schemes lead to the latter).

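The following two families of mixture densities

\[ \varphi(x) = w\varphi_1(x) + (1 - w)\varphi_2(x), \quad 0 < w < 1, \]

are considered:

• Beta-type mixture densities defined by

\[ \varphi_1(x) \propto x^{p_1 - 1}(1 - x)^{q_1 - 1}\, \mathbf{1}_{[0,1]}(x), \quad p_1 > 0,\ q_1 > 0, \]

\[ \varphi_2(x) \propto |x|^{p_2 - 1}(a_* + x)^{q_2 - 1}\, \mathbf{1}_{[-a_*,0]}(x), \quad p_2 > 0,\ q_2 > 0,\ 0 < a_* < 1; \]

• mixed (Beta and Uniform)-type mixture densities defined by

\[ \varphi_1(x) \propto x^{p_3 - 1}(1 - x)^{q_3 - 1}\, \mathbf{1}_{[0,1]}(x), \quad p_3 > 0,\ q_3 > 0, \]

\[ \varphi_2(x) = a_*^{-1}\, \mathbf{1}_{[-a_*,0]}(x), \quad 0 < a_* < 1. \]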
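The two families above can be written down directly once the normalizing constants are made explicit. The sketch below is illustrative only: the function names are hypothetical, and the constants follow from the Beta integral, with the substitution $x = -a_* u$ giving $a_*^{p_2 + q_2 - 1} B(p_2, q_2)$ for $\varphi_2$. A midpoint rule is included to check that each mixture integrates to one.

```python
import math

def beta_fn(p, q):
    """Euler Beta function B(p, q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

def phi1_beta(x, p, q):
    """Beta(p, q) density on (0, 1)."""
    if 0.0 < x < 1.0:
        return x ** (p - 1) * (1 - x) ** (q - 1) / beta_fn(p, q)
    return 0.0

def phi2_beta(x, p, q, a_star):
    """Density proportional to |x|^(p-1) (a_star + x)^(q-1) on (-a_star, 0)."""
    if -a_star < x < 0.0:
        norm = a_star ** (p + q - 1) * beta_fn(p, q)
        return abs(x) ** (p - 1) * (a_star + x) ** (q - 1) / norm
    return 0.0

def phi2_uniform(x, a_star):
    """Uniform density on [-a_star, 0]."""
    return 1.0 / a_star if -a_star <= x <= 0.0 else 0.0

def beta_mixture(x, w, a_star, p1, q1, p2, q2):
    """Beta-type mixture density."""
    return w * phi1_beta(x, p1, q1) + (1 - w) * phi2_beta(x, p2, q2, a_star)

def mixed_mixture(x, w, a_star, p3, q3):
    """Mixed (Beta and Uniform)-type mixture density."""
    return w * phi1_beta(x, p3, q3) + (1 - w) * phi2_uniform(x, a_star)

def integrate(f, lo=-1.0, hi=1.0, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))
```

With the Case 1 values from Table 4.1, `integrate(lambda x: beta_mixture(x, 0.8, 0.95, 3.0, 1.5, 2.0, 1.0))` should be close to one, up to the discretization error of the midpoint rule.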

To construct the mixture density estimator, the parameters $K_n$ and α must first be chosen. Preliminary Monte-Carlo simulations showed that the estimator $\varphi_n(x)$ reaches the minimal mean integrated square error (MISE) when the parameter α is chosen equal to 1 − 2d. The justification of this interesting conjecture remains an open problem. This rule also ensures that (3.8) is satisfied. The number of Gegenbauer polynomials $K_n$ is chosen according to (2.11). Note that, by construction, the estimator $\varphi_n(x)$ is not necessarily positive, though it integrates to one. (However, we do not introduce the positive part renormalization as in Leipus et al. (2006), since the truncated and renormalized estimator will not be normally distributed.)


In Figure 4.1, we present three graphs and the corresponding box plots for the mixture densities of the form above. Cases 1 and 2 correspond to the Beta-type mixture densities, Case 3 corresponds to the mixed (Beta and Uniform)-type mixture density. The parameter values are presented in Table 4.1. The box plots are obtained by a Monte-Carlo procedure based on M = 500 independent replications with sample size n = 1500 and bandwidth $K_n = 3$ (we aggregate N = 5000 i.i.d. AR(1) processes). Individual innovations $\varepsilon_t^{(j)}$ are i.i.d. N(0, 1). Note that the mixture density in Case 2 corresponds to Example 3.2.1 with the parameters d = 0.2, κ = 0.1 (in the sense of behavior at zero).

[Figure 4.1 panels: (a) Case 1, (b) Case 2, (c) Case 3; horizontal axis x from −1.0 to 1.0.]

Figure 4.1: True mixture densities (solid line) and the box plots of the estimates. Number of replications M = 500, sample size n = 1500.

          w     a∗     (p1, q1)     (p2, q2)     (p3, q3)     d      α
Case 1    0.8   0.95   (3.0, 1.5)   (2.0, 1.0)   –            0.25   0.5
Case 2    0.8   0.80   (1.2, 1.6)   (1.3, 2.5)   –            0.20   0.6
Case 3    0.8   0.90   –            –            (2.0, 1.2)   0.40   0.2

Table 4.1: Parameter values in cases 1–3.

The box plots in Figure 4.1 show that $\varphi_n$ approximates the mixture density well when n is sufficiently large. However, when the sample size is relatively small, it is difficult to estimate a mixture density of the shape seen in cases 2–3. This can be explained by the construction of the estimator, which assumes a rather smooth form of the mixture density around zero. On the other hand, it is clear that AR(1) parameter values close to zero do not affect the long memory property. For our purposes, an important fact is that the estimator correctly approximates the density in the neighborhood of x = 1. This enables us to estimate the parameter d (unknown in real applications) using a log–log regression on the periodogram in the neighborhood of this point (for example, Geweke and Porter-Hudak, Reisen or Whittle-type estimators).
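The log–log regression on the periodogram mentioned above can be sketched as follows. This is a minimal version of the Geweke and Porter-Hudak regression at frequency zero only; the function name and the default bandwidth $m = \lfloor \sqrt{n} \rfloor$ are assumptions made for illustration, not choices taken from the text.

```python
import numpy as np

def gph_estimate(x, m=None):
    """Log-periodogram regression estimate of d at frequency zero.

    Regresses log I(lambda_j) on -log(4 sin^2(lambda_j / 2)) over the first
    m Fourier frequencies; the slope estimates d.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(np.sqrt(n))                 # assumed bandwidth rule
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n               # Fourier frequencies
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    regressor = -np.log(4.0 * np.sin(lam / 2.0) ** 2)
    slope, _intercept = np.polyfit(regressor, np.log(periodogram), 1)
    return slope
```

For a short-memory input such as Gaussian white noise the estimate should fluctuate around zero; the regression is noisy for small m, so single estimates should be read with care.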

[Figure 4.2 panels: QQ plot and histogram for each of Case 1, x = −0.5; Case 1, x = 0.96; Case 2, x = −0.5; Case 2, x = 0.96; Case 3, x = −0.5; Case 3, x = 0.96.]

Figure 4.2: QQ plots and histograms of the estimates at points x = −0.5 and x = 0.96. Number of replications M = 500, sample size n = 1500.

Figure 4.2 supplements the earlier findings and shows that the distribution of the estimator is approximately normal.¹ QQ-plots and histograms are given for the fixed values x = −0.5 and x = 0.96. We use the same number of replications M = 500 and sample size n = 1500.

¹ The Shapiro-Wilk test confirms that in most cases the normality hypothesis is consistent with the data.

[Figure 4.3 panels: (a) Case 1, x = −0.5; (b) Case 1, x = 0.96; horizontal axis log(n).]

Figure 4.3: Log-log scale regression of the variance in Case 1.

The last Monte-Carlo experiment aims to show that the decay rate of $\operatorname{Var}(\varphi_n(x))$ is $n^{-\gamma}$ with γ = 1. This ensures that the variance decreases fast enough, thus agreeing with the theoretical results. To do this, we calculate the log–log regression of the variance on the length of the time series $n \in \{500, 600, \dots, 1400, 1500, 2000, \dots, 5000\}$. Figure 4.3 demonstrates the corresponding parameter estimates at different points and shows that γ ≈ 1.
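The regression described above amounts to fitting a straight line to (log n, log Var) pairs and reading off the slope. A minimal sketch follows, with a hypothetical function name; the step of 500 between 2000 and 5000 is an assumption, since the text only gives the endpoints of that part of the grid.

```python
import numpy as np

def decay_exponent(sample_sizes, variances):
    """Estimate gamma in Var ~ C * n^(-gamma) by a log-log least squares fit."""
    slope, _intercept = np.polyfit(np.log(sample_sizes), np.log(variances), 1)
    return -slope

# Grid of series lengths: 500, 600, ..., 1500, then 2000, ..., 5000
# (the step of 500 in the second part is assumed).
sizes = np.concatenate([np.arange(500, 1501, 100), np.arange(2000, 5001, 500)])
```

In a simulation, `variances` would hold the empirical variance of the estimates at a fixed point x over the M replications for each n; feeding in exact values `C / sizes` returns γ = 1 up to rounding.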

4.2 Empirical comparison of alternative disaggregation schemes

4.2.1 Monte Carlo simulation study

The next Monte Carlo simulation study aims to investigate and compare three alternative methods of disaggregation in the AR(1) aggregation scheme, namely: Chong’s polynomial density estimator (Chong (2006)), the LOPV Gegenbauer expansion for the model without common innovations (Leipus et al. (2006)), and an extension of the LOPV estimator for the model with common innovations, which uses the idea for the estimation of moments from Chong’s approach. Recall that we refer to the latter as the hybrid approach. Here we examine the methods’ abilities to estimate the mixture density function under different individual model assumptions.


For convenience, recall that the individual processes are given by

\[ Y_t^{(j)} = a^{(j)} Y_{t-1}^{(j)} + \eta_t + \varepsilon_t^{(j)}, \quad t = 1, 2, \dots, n, \quad j = 1, 2, \dots, N, \]

with the same assumptions as in Chapter 2.

Simulated mixture densities are described by the following attributes:

• a form of a mixture density − polynomial or Beta-type;

• a support of the autoregressive parameter’s density is either [0, 1] or [−1, 1];

• a modality − unimodal or bimodal (in general we can study any multi-modal case);

• assumptions put on the innovations of the individual processes, namely: models with (1) or without (2) common innovations.
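The simulation design described above can be sketched as follows. The function name, the burn-in length and the normalization of the aggregate (division by N with common innovations, by √N without) are assumptions made for this illustration; the exact aggregation scheme is the one defined in Chapter 2.

```python
import numpy as np

def simulate_aggregate(a, n, rng, common=False, burn_in=200):
    """Simulate an aggregate of N = len(a) AR(1) processes with coefficients a.

    Each process follows Y_t = a_j * Y_{t-1} + eta_t + eps_t^(j), where eta_t
    is a shared N(0, 1) shock (only if common=True) and eps_t^(j) are
    independent N(0, 1) shocks.
    """
    a = np.asarray(a, dtype=float)
    N = a.size
    scale = N if common else np.sqrt(N)   # assumed normalization
    y = np.zeros(N)
    x = np.empty(n)
    for t in range(-burn_in, n):
        eta = rng.standard_normal() if common else 0.0
        y = a * y + eta + rng.standard_normal(N)
        if t >= 0:                        # discard burn-in observations
            x[t] = y.sum() / scale
    return x
```

In an experiment, `a` would be drawn from the chosen mixture density, for example `rng.beta(p, q, size=N)` for a Beta-type density on [0, 1]. A fixed burn-in is only a rough device when many coefficients are close to one, since such processes approach stationarity slowly.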

The polynomial densities (Poly3 and Poly4) are chosen equal to the estimates for the Canada aggregated consumption data provided in Chong (2006), because the form of the latter is more suitable than that in the corresponding Monte Carlo simulations section of Chong. Note that Chong’s MC experiments consider polynomial densities supported on [0, 1] up to the 3rd order, but their actual functional form was very close to a straight line.

The Beta-type densities are taken equivalent to those considered in Leipus et al. (2006). It is additionally assumed that both the parameters $d_1$, $d_2$ in the mixture density and the true order m of the polynomial density are given a priori. Besides that, α, in accordance with Remark 2.2.2, is chosen equal to $\min(1 - 2d_1, 1 - 2d_2)$.

The order of the polynomial densities is equal to m = 6 for the Beta experiments² and to the true order of the polynomial density in the other cases. The truncation parameter H is chosen equal to $[0.5\, n^{0.75}] = 52$.

Other characteristics of the simulation experiments comprise: N = 5000 for the unimodal and N = 10000 for the bimodal case, and n = 500. Each experiment consists of 100 independent copies. We assume that both $\varepsilon_t$ and $\eta_t$ are independent standard normal.

Although the true underlying density is non-negative, the estimated density can occasionally be negative. Moreover, its integral is not necessarily equal to 1. The correction could be either $\varphi_n^{+}(x)$ or $\varphi_n(x) - \min(\varphi_n(x))$, renormalized by Riemann sums.

² The use of information criteria as in Chong was not found to be significant in the case of the Beta distribution.

                                                   MISE
Case     ϕ(x)                                Chong(m)    LOPV(α)        Hybrid(α)
Beta1    0.6(1 − x)^0.25 (1 + x)^0.75        0.244(6)    0.033(0.25)    0.007(0.25)
Beta2*   0.6(1 − x)^0.75 (1 + x)^0.25        0.059(6)    0.032(0.25)    0.028(0.25)
Beta3    2.21x^2 (1 − x)^0.25 (1 + x)^0.75   0.308(6)    0.131(0.25)    0.073(0.25)
Beta4*   2.74x^2 (1 − x)^0.5 (1 + x)^0.75    0.169(6)    0.174(0.25)    0.024(0.25)
Poly1    6x − 6x^2                           1.162(2)    0.107(0)       0.105(0)
Poly2*   6x − 6x^2                           0.702(2)    0.303(0)       0.117(0)
Poly3    1.27x^3 + 1.2x^2 + 0.03x + 0.1      0.082(3)    0.171(0)       0.086(0)
Poly4*   1.27x^3 + 1.2x^2 + 0.03x + 0.1      0.118(3)    0.277(0)       0.123(0)

Table 4.2: Monte-Carlo simulations comparing three alternative methods of disaggregation

To compare the different disaggregation approaches, the mean integrated square error (MISE) criterion is used:

\[ \int_{-1}^{1} E(\varphi(x) - \varphi_n(x))^2\, dx. \]

Its empirical estimates, where the theoretical mean is replaced by the empirical one, are given in Table 4.2. In this table, * denotes the presence of common innovations, and bold values denote the minimum of the MISE criterion. From this table it may be concluded that the hybrid approach is in most cases either better in the sense of MISE or close to the preceding approaches.
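The empirical MISE and the two corrections of a possibly negative estimate can be sketched on a grid as below. The function names and the midpoint grid are assumptions made for this illustration: the expectation is replaced by the average over replications and the integral by a Riemann sum.

```python
import numpy as np

def midpoint_grid(lo=-1.0, hi=1.0, size=400):
    """Midpoints of `size` equal cells on [lo, hi], and the cell width."""
    dx = (hi - lo) / size
    return lo + (np.arange(size) + 0.5) * dx, dx

def positive_part(est, dx):
    """Positive part of the estimate, renormalized to integrate to one.

    Assumes the estimate is positive somewhere on the grid.
    """
    pos = np.maximum(est, 0.0)
    return pos / (pos.sum() * dx)

def shift_by_min(est, dx):
    """Estimate minus its minimum, renormalized to integrate to one.

    Assumes the estimate is not constant on the grid.
    """
    shifted = est - est.min()
    return shifted / (shifted.sum() * dx)

def empirical_mise(true_density, estimates, dx):
    """Riemann-sum MISE; `estimates` has shape (replications, grid points)."""
    sq_err = (np.asarray(estimates) - true_density) ** 2
    return sq_err.mean(axis=0).sum() * dx
```

As a sanity check, estimates that overshoot the true density by a constant 0.1 everywhere on [−1, 1] give a MISE of 0.1² × 2 = 0.02.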

The following two figures illustrate these simulation results further. Figures 4.4 and 4.5 show the generated mixture densities and the 25 and 75 per cent quantiles.

From the figures it follows, first of all, that the LOPV estimator produces acceptable results for the Beta-type distributions despite MISE values higher than those of the hybrid method (this is generally the natural consequence of the form of the mixture density (2.27)) and provides quite relevant approximations for the polynomial densities. However, the method performs worse in the presence of common innovations, which supports the earlier remarks.

Chong’s method does not seem suitable for the Beta-type densities at all. However, Chong’s method performs better for the polynomial densities, except in the current simulations when the density is supported on [0, 1].

The hybrid method performs well, close to the best of the former approximations, for both Beta-type and polynomial densities in all simulated MC experiments. Besides, it seems to be robust to the inclusion of common innovations.

Figure 4.4: Beta-type distributions

Figure 4.5: Polynomial type distributions

Since in the next subsection we investigate real data whose length is relatively short, we would like to analyze what effects appear when the sample size in the time domain is much shorter than in the previous experiments. Here we consider the Beta2* and Poly4* experiments as the most interesting and general cases to be investigated.

From the estimation results presented in Table 4.3 it follows that even a small sample size may result in relatively close approximations of the true mixture density functions; however, polynomial densities require longer aggregated time series than Beta-type densities do.

        MISE Beta2*                       MISE Poly4*
        0.6(1 − x)^0.75 (1 + x)^0.25      1.27x^3 + 1.2x^2 + 0.03x + 0.1
n       Chong    LOPV     Hybrid          Chong    LOPV     Hybrid
50      0.141    0.144    0.136           0.450    0.539    0.446
100     0.120    0.100    0.094           0.322    0.490    0.318
200     0.077    0.059    0.054           0.185    0.347    0.183

Table 4.3: The influence of time series length on MISE

4.2.2 Empirical application: aggregated consumption model

In this empirical exercise the aggregated households’ expenditure data for the G7 countries are considered (source of the data: OECD’s statistical issues and databases). By and large, this result is complementary to the earlier findings of Lippi and Zaffaroni (1998) and Chong (2006). This study investigates how well the alternative approaches approximate the mixture density.

First of all, let us describe the context of the private consumption aggregation. Let $\Delta c_t^{(j)}$ denote the change in total real consumption expenditures by individual j. A rational consumer without liquidity constraints should have consumption that approximates a random walk and possibly includes a deterministic linear trend. Thus the change of consumption, as the variable of interest, can be described by the following model:

\[ \Delta c_t^{(j)} = \alpha_j \Delta c_{t-1}^{(j)} + \eta_t + \varepsilon_t^{(j)}. \tag{4.1} \]

To compare the data and to reduce the effect of seasonality, we apply the estimation procedure to the annual real household consumption data for the G7 countries as in Chong (2006). Sometimes per-capita and/or logarithmic transformations are suggested for the cross-sectional comparison of the data. However, without affecting the estimation, we keep the original aggregated data but move to the first differences to eliminate the random walk influence. Note that (4.1) is linear both in parameters and in variables; thus it corresponds to the deterministic aggregation as in Theil (1954).

As first differences are considered, the number of observations is reduced by one, i.e. for Canada, France, Germany, Italy, Japan, the U.K. and the U.S.A. it is 36, 47, 37, 37, 36, 46 and 36, respectively (the end of the period is either 2006 or 2007). The differenced annual aggregated consumption data for the G7 countries are shown in Figure 4.6.


Figure 4.6: Differenced annual aggregated consumption data ∆c(j)t for G7 countries.

In accordance with our earlier proposals, the Gegenbauer expansion’s parameter α is chosen equal to $\min(1 - 2d_1, 1 - 2d_2)$, where $d_1$ and $d_2$ are estimated using a log–log regression on the periodogram in the neighborhood of the corresponding point (±1). For the latter we apply the method suggested in Reisen (1994). The estimate of α is given in the captions of the corresponding Figures 4.7–4.13 below, where the upper middle and upper right graphics illustrate the estimation of the Beta-type densities. For the polynomial densities α = 0 is suggested, and all bottom plots depict the estimates of the polynomial densities. Note that for Chong’s method the upper and bottom graphics are the same; this is done for a more convenient comparison of the estimation results. The maximum possible order of the polynomial density is chosen equal to 6. The optimal order is estimated as m = arg min Q(m), where Q(m) is the distance measure between the theoretical and empirical autocovariance functions.

Some interesting observations from Figures 4.7–4.13 are worth noting:

1. Since we have established earlier that the hybrid method is robust to the presence of common innovations, there should be no significant differences between the upper and lower graphics. However, LOPV, despite sharing the Gegenbauer expansion idea, is sensitive to the presence of a common component. Comparing the bottom density estimates by the LOPV and hybrid methods, we clearly see significant differences for Canada, France, Italy and the U.S.A. Thus it seems possible to construct a test for the presence of common innovations based on this observation.

2. When the mixture density is close to polynomial, the bottom and upper graphs for the hybrid approach are supposed to be approximately the same. This is clearly the case for Canada, France, Germany and Japan. For the other countries Beta-type distributions seem to be preferable.

3. The long memory property is associated with the corresponding estimates of $d_1$ and $d_2$ or, equivalently, with the estimates of α. The long memory property is statistically significant whenever α ∈ (0, 1), where values closer to zero denote stronger long range dependence. For all countries but Italy this is found to be the case; thus these countries indeed possess long memory even after taking the first differences.

4. Although Chong’s correction for the negative part of the estimator seems more natural due to its smoothness, in some situations it leads to misunderstanding or even wrong conclusions. For example, for Italy a short memory aggregated process is considered to be relevant, but Chong’s correction shows a mass concentration around the point 1, which, according to Chong’s findings, would lead to long memory. The correction $\varphi_n^{+}(x)$ as in Leipus et al. (2006) (see Figure 4.10) illustrates that the mixture density for the Italy data does not reach 1 at all. On the other hand, $\varphi_n^{+}(x)$ is not much better for the other countries, so we keep Chong’s rescaling.

5. As far as the distribution of individual consumption behaviour is concerned, people in the U.K. and the U.S. exhibit similar consumption patterns, as do people in France and Germany. The consumption behaviour in Canada, Italy and Japan seems to be unique and different from the consumption behaviour in the first two pairs of industrial countries. All these findings but the first one differ from the conclusions made by Chong (2006).

6. Further examination of the forms of the mixture densities reveals that households’ consumption patterns in Italy and Japan are more dense around zero and unimodal, thus suggesting a certain homogeneity for these countries. For the other countries bimodality may in general be accepted. Though the largest mass of the densities goes to the positive part of the distribution, a significant part of the households’ consumption dynamics, clearly seen for Canada, Germany and France, resembles oscillatory behaviour. Finally, in the U.K. and the U.S. people try to maintain increasing growth in consumption; the corresponding “oscillating” parts are much smaller there.

Figure 4.7: Mixture density estimates for Canada: α = 0.0824, k = m = 4; upper middle and upper right graphics illustrate the Beta-type estimates, others – polynomial estimates.

Figure 4.8: Mixture density estimates for the U.K.: α = 0.647, k = m = 2; upper middle and upper right graphics illustrate the Beta-type estimates, others – polynomial estimates.

Figure 4.9: Mixture density estimates for France: α = 0.703, k = m = 3; upper middle and upper right graphics illustrate the Beta-type estimates, others – polynomial estimates.

Figure 4.10: Mixture density estimates for Italy: α = 1.672, k = m = 2; upper middle and upper right graphics illustrate the Beta-type estimates, others – polynomial estimates.

Figure 4.11: Mixture density estimates for Germany: α = 0.585, k = m = 3; upper middle and upper right graphics illustrate the Beta-type estimates, others – polynomial estimates.

Figure 4.12: Mixture density estimates for Japan: α = 0.339, k = m = 3; upper middle and upper right graphics illustrate the Beta-type estimates, others – polynomial estimates.

Figure 4.13: Mixture density estimates for the U.S.A.: α = 0.324, k = m = 2; upper middle and upper right graphics illustrate the Beta-type estimates, others – polynomial estimates.
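The criterion in observation 3 can be applied mechanically to the α values reported in the captions of Figures 4.7–4.13. The sketch below uses hypothetical names; the conversion relies on $\alpha = \min(1 - 2d_1, 1 - 2d_2)$, so that the larger of $d_1$, $d_2$ equals $(1 - \alpha)/2$.

```python
# Alpha estimates as reported in the captions of Figures 4.7-4.13.
ALPHA = {
    "Canada": 0.0824,
    "U.K.": 0.647,
    "France": 0.703,
    "Italy": 1.672,
    "Germany": 0.585,
    "Japan": 0.339,
    "U.S.A.": 0.324,
}

def implied_d(alpha):
    """Largest of d1, d2 implied by alpha = min(1 - 2*d1, 1 - 2*d2)."""
    return (1.0 - alpha) / 2.0

def has_long_memory(alpha):
    """Criterion from the text: long memory whenever alpha lies in (0, 1)."""
    return 0.0 < alpha < 1.0

# Country -> (implied memory parameter, long memory flag).
report = {c: (implied_d(a), has_long_memory(a)) for c, a in ALPHA.items()}
```

Under this rule Italy is the only country that falls outside (0, 1), and Canada, with the smallest α, has the largest implied memory parameter.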

Conclusions

A conclusion is the place where you got tired ofthinking.

Arthur McBride Bloch

In this thesis, different aspects of time series aggregation and disaggregation problems and their relationships with long memory phenomena were investigated. Particular attention was paid to the disaggregation problems related to a general AR(1) aggregation scheme with and without common innovations. Below we summarize the main results of the thesis.

In the first part of the thesis, a wide yet brief survey of key aggregation and disaggregation topics was presented. It was explained that AR(1) is indeed a crucial case in real applications, especially concerning problems in economics. A general framework for studying the aggregation problems and the basic notions were analyzed. It could be concluded that the forecasting notion of the aggregation problem suggested by Pesaran (2002) is the most flexible definition, as it puts only very mild assumptions on the DGP of individual processes. Concerning the problems of disaggregation, a doubly-stochastic approach and mixtures were found to be crucial for solving fundamental questions of disaggregation. It was proved in Dacunha-Castelle and Oppenheim (2001) that a wide range of long-memory processes are obtained through the aggregation of doubly-stochastic short memory processes with densities in $C^{(\infty)}$.

For the disaggregation of AR(1) dynamics, the method of extracting the moments of the random parameter from the AR(∞) representation was chosen to improve the preceding estimators suggested in Chong (2006) and Leipus et al. (2006). In the second and in the empirical part of the thesis we analyzed these three alternative approaches to the disaggregation problem in the AR(1) aggregation scheme. A simulation study as well as an empirical application to G7 aggregated consumption data revealed that the proposed augmentation of LOPV and Chong’s method (a hybrid approach) is more robust to violations of the assumptions which were crucial for the preceding estimators. However, a more careful theoretical investigation of the hybrid estimator has yet to be carried out.

The sufficient conditions for the construction of mixture densities by the products of “elementary” densities were established. The results were then applied to derive the form of the mixture density for the seasonal case. It is important to mention, however, that the result is proved only for the case of disjoint supports of the elementary processes involved.

Another major part of the thesis was devoted to the study of the asymptotic normality of the LOPV estimator. We proved that, under mild conditions on the (semi-parametric) form of the mixture density, the LOPV estimator is asymptotically normal. The proof is based on the limit theory for quadratic forms in linear random variables developed by Bhansali et al. (2007). At the same time, the moving average representation of the aggregated process was investigated. A small simulation study supported the theoretical findings concerning the asymptotic normality issues.

To conclude, even for the AR(1) aggregation and disaggregation problems many questions remain open. For now it is not clear what the most effective procedures are for estimating the number of polynomials and the other parameters in the semi-parametric Gegenbauer expansion of the mixture density, whether we should move to a more flexible expansion by means of Jacobi polynomials, and whether it is possible to provide similar methods for the general ARDL aggregation scheme. These and many other empirical and theoretical questions are the subject of further research in this field.

Bibliography

Abramovitz, M. and Stegun, I.: 1972, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications.

Andersen, T. and Bollerslev, T.: 1997, Heterogeneous information arrivals and return volatility dynamics: uncovering the long-run in high frequency returns, Journal of Finance 52, 975–1005.

Anderson, T.: 1971, The Statistical Analysis of Time Series, Wiley Series in Probability and Mathematical Statistics, New York.

Ando, A.: 1971, On a problem of aggregation, International Economic Review 12, 306–311.

Bary, N. K.: 1964, A Treatise On Trigonometric Series, Vol. I, Pergamon Press, Oxford.

Bhansali, R. J., Giraitis, L. and Kokoszka, P. S.: 2007, Approximations and limit theory for quadratic forms of linear processes, Stochastic Processes and their Applications 117, 71–95.

Brockwell, P. J. and Davis, R. A.: 1991, Time Series: Theory and Methods, 2nd edn, Springer-Verlag, New York.

Celov, D., Kvedaras, V. and Leipus, R.: 2007, Comparison of estimation methods for the density of autoregressive parameter in aggregated AR(1) processes (in Lithuanian), Lithuanian Mathematical Journal (spec. issue) 47, 508–516.

Chambers, M. J.: 1998, Long memory and aggregation in macroeconomic time series, International Economic Review 39(4), 1053–1072.

Chong, T. T.: 2006, The polynomial aggregated AR(1) model, Econometrics Journal 9, 98–122.

Dacunha-Castelle, D. and Fermìn, L.: 2006a, Aggregations of doubly stochastic interactive Gaussian processes and Toeplitz forms of U-statistics, Dependence in Probability and Statistics 187.


Dacunha-Castelle, D. and Fermìn, L.: 2006b, Disaggregation of long memory processes on C∞ class, Electronic Communications in Probability 11, 35–44.

Dacunha-Castelle, D. and Oppenheim, G.: 2001, Mixtures, aggregation and long memory, Prepublications, Université de Paris-sud mathematiques.

Feller, W.: 1971, An Introduction to Probability Theory and its Applications, Vol. II, 2nd edn, Wiley Series in Probability and Mathematical Statistics, New York.

Fishman, G. S.: 1995, Monte Carlo: Concepts, Algorithms, and Applications, Springer-Verlag, New York.

Garderen, K. J. V., Lee, K. and Pesaran, M. H.: 2000, Cross-sectional aggregation of non-linear models, Journal of Econometrics 95, 285–331.

Geweke, J. and Porter-Hudak, S.: 1983, The estimation and application of long memory time series models, Journal of Time Series Analysis 4(4), 221–238.

Giacomini, R. and Granger, C. W. J.: 2004, Aggregation of space time processes, Journal of Econometrics 118, 7–26.

Giraitis, L., Kokoszka, P. and Leipus, R.: 2000, Stationary ARCH models: dependence structure and the central limit theorem, Econometric Theory 16, 3–22.

Giraitis, L. and Leipus, R.: 1995, A generalized fractionally differencing approach in long-memory modelling, Lithuanian Mathematical Journal 35(1), 65–81.

Giraitis, L., Leipus, R. and Surgailis, D.: 2007, ARCH(∞) models and long memory properties, Handbook of Financial Time Series (to appear).

Giraitis, L., Leipus, R. and Surgailis, D.: 2008, Aggregation of random coefficient GLARCH(1, 1) process, Econometric Theory (to appear).

Gonçalvez, E. and Gouriéroux, C.: 1988, Agrégation de processus autorégressifs d’ordre 1, Annales d’Economie et de Statistique 12, 127–149.

Gorman, W.: 1953, Community preference fields, Econometrica 21, 63–80.

Gouriéroux, C. and Monfort, A.: 1997, Time series and dynamic models, Themes in Modern Econometrics, Cambridge University Press.

Granger, C. W. and Ding, Z.: 1996, Varieties of long memory models, Journal of Econometrics 73, 61–77.

Granger, C. W. J.: 1980, Long memory relationships and the aggregation of dynamic models, Journal of Econometrics 14, 227–238.

Granger, C. W. J.: 1990, Aggregation of time series variables, T. Barker and M. H. Pesaran (eds.): Disaggregation in Econometric Modelling, Routledge, London.

Granger, C. W. J. and Morris, M. J.: 1976, Time series modelling and interpretation, Journal of the Royal Statistical Society 139, No 2, 246–257.

Grunfeld, Y. and Griliches, Z.: 1960, Is aggregation necessarily bad?, Review of Economics and Statistics 42, 1–13.


Hendry, D. F.: 1984, Monte carlo experimentation in econometrics, Z. Griliches andM. D. Intriligator (eds.): Handbook of econometrics vol. 2, North-Holland Pub-lishing company, Amsterdam.

Hosking, J.: 1996, Asymptotic distributions of the sample mean, autocovariances andautocorrelations of long-memory time series, Journal of Econometrics 73, 261–264.

Kazakevičius, V., Leipus, R. and Viano, M. C.: 2004, Stability of random coefficientARCH models and aggregation schemes, Journal of Econometrics 120, 139–158.

Kelejian, H.: 1980, Aggregation and disaggregation of non-linear equations, in: J.Kmenta and J. B. Ramsey (eds.): Evaluation of Econometric Models, AcademicPress, New York, pp. 135–153.

Klein, L.: 1953, A Textbook of Econometrics., Row Peterson and Company.

Koyck, L. M.: 1954, Distributed Lags and Investment Analysis, North-Holland Pub-lishing company, Amsterdam.

Leipus, R., Oppenheim, G., Philippe, A. and Viano, M.-C.: 2006, Orthogonal seriesdensity estimation in a disaggregation scheme, Journal of Statistical Planning andInference 136, 2547–2571.

Leipus, R. and Viano, M.-C.: 2000, Modelling long-memory time series with finite orinfinite variance: a general approach., Journal of Time Series Analysis 21(1), 61–74.

Lewbel, A.: 1994, Aggregation and simple dynamics, American Economic Review84, 905–918.

Lippi, M.: 1988, On the dynamic shape of aggregated error correction models, Journal of Economic Dynamics and Control 12, 561–585.

Lippi, M. and Zaffaroni, P.: 1998, Aggregation of simple linear dynamics: exact asymptotic results, Preprint.

Lung-Lin, J.: 1991, Generalized integrated process and the aggregation of dynamic time series, Academia Economic Papers 19(2), 341–360.

Muellbauer, J.: 1975, Aggregation, income distribution and consumer demand, Review of Economic Studies 43, 525–543.

Oppenheim, G. and Viano, M.-C.: 1999, Obtaining long memory by aggregating random coefficients discrete and continuous time simple short memory processes, Technical Report 49-V, Pub. IRMA, Lille.

Oppenheim, G. and Viano, M.-C.: 2004, Aggregation of random parameters Ornstein-Uhlenbeck or AR processes: some convergence results, Journal of Time Series Analysis 25, 335–350.

Palma, W.: 2007, Long-Memory Time Series: Theory and Methods, Wiley, New Jersey.

Pesaran, M. H.: 2002, On aggregation of linear dynamic models: an application to life-cycle consumption models under habit formation, Cambridge Working Papers in Economics, Faculty of Economics, University of Cambridge.

Pollock, D. S. G.: 1999, Handbook of Time Series Analysis, Signal Processing and Dynamics, Academic Press, London.

Reisen, V. A.: 1994, Estimation of the fractional difference parameter in the ARFIMA(p, d, q) model using the smoothed periodogram, Journal of Time Series Analysis 15(1), 335–350.

Robinson, P.: 1978, Statistical inference for a random coefficient autoregressive model, Scandinavian Journal of Statistics: Theory and Applications 5, 169–172.

Robinson, P. and Zaffaroni, P.: 1998, Nonlinear time series with long memory: a model for stochastic volatility, Journal of Statistical Planning and Inference 68, 359–371.

Silvestrini, A. and Veridas, D.: 2008, Temporal aggregation of univariate and multivariate time series models: a survey, Journal of Economic Surveys 22, 458–497.

Stocker, T.: 1984, Completeness, distribution restrictions, and the form of aggregate functions, Econometrica 52, 887–907.

Szegö, G.: 1967, Orthogonal Polynomials, American Mathematical Society, New York.

Theil, H.: 1954, Linear Aggregation of Economic Relations, North-Holland Publishing Company, Amsterdam.

Viano, M.-C., Deniau, C. and Oppenheim, G.: 1995, Long-range dependence and mixing for discrete time fractional processes, Journal of Time Series Analysis 16(3), 323–338.

Zaffaroni, P.: 2004, Contemporaneous aggregation of linear dynamic models in large economies, Journal of Econometrics 120, 75–102.

Zaffaroni, P.: 2006, Contemporaneous aggregation of GARCH processes, Journal of Time Series Analysis 28, 521–544.
