
Stochastic Limit Theory: An Introduction for Econometricians (Advanced Texts in Econometrics)

Date post: 22-Dec-2016
Upload: james-davidson
Page 1: Stochastic Limit Theory: An Introduction for Econometricians (Advanced Texts in Econometrics)

ADVANCED TEXTS IN ECONOMETRICS

General Editors

C. W. J. Granger G. E. Mizon


STOCHASTIC LIMIT THEORY

An Introduction for Econometricians

JAMES DAVIDSON

Oxford University Press

1994


Oxford University Press, Walton Street, Oxford OX2 6DP

Oxford New York
Athens Auckland Bangkok Bombay
Calcutta Cape Town Dar es Salaam Delhi
Florence Hong Kong Istanbul Karachi
Kuala Lumpur Madras Madrid Melbourne
Mexico City Nairobi Paris Singapore
Taipei Tokyo Toronto
and associated companies in
Berlin Ibadan

Published in the United States by Oxford University Press Inc., New York

© James Davidson, 1994

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press. Within the UK, exceptions are allowed in respect of any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, or in the case of reprographic reproduction in accordance with the terms of the licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside these terms and in other countries should be sent to the Rights Department, Oxford University Press, at the address above.

This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out or otherwise circulated without the publisher's prior consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser.

British Library Cataloguing in Publication Data
Data available

Library of Congress Cataloging in Publication Data
Data available

ISBN 0-19-877402-8
ISBN 0-19-877403-6 (Pbk)

1 3 5 7 9 10 8 6 4 2

Printed in Great Britain on acid-free paper by Biddles Ltd., Guildford and King's Lynn


For Lynette, Julia, and Nicola.


'... what in me is dark
Illumine, what is low raise and support,
That, to the height of this great argument,
I may assert Eternal Providence,
And justify the ways of God to men.'

Paradise Lost, Book 1, 16-20


Contents

Preface
Mathematical Symbols and Abbreviations

Part I: Mathematics

1. Sets and Numbers
1.1 Basic Set Theory
1.2 Countable Sets
1.3 The Real Continuum
1.4 Sequences of Sets
1.5 Classes of Subsets
1.6 Sigma Fields

2. Limits and Continuity
2.1 The Topology of the Real Line
2.2 Sequences and Limits
2.3 Functions and Continuity
2.4 Vector Sequences and Functions
2.5 Sequences of Functions
2.6 Summability and Order Relations
2.7 Arrays

3. Measure
3.1 Measure Spaces
3.2 The Extension Theorem
3.3 Non-measurability
3.4 Product Spaces
3.5 Measurable Transformations
3.6 Borel Functions

4. Integration
4.1 Construction of the Integral
4.2 Properties of the Integral
4.3 Product Measure and Multiple Integrals
4.4 The Radon-Nikodym Theorem

5. Metric Spaces
5.1 Distances and Metrics
5.2 Separability and ...
5.3 Examples


5.4 Mappings on Metric Spaces
5.5 Function Spaces

6. Topology
6.1 Topological Spaces
6.2 Countability and Compactness
6.3 Separation Properties
6.4 Weak Topologies
6.5 The Topology of Product Spaces
6.6 Embedding and Metrization

Part II: Probability

7. Probability Spaces
7.1 Probability Measures
7.2 Conditional Probability
7.3 Independence
7.4 Product Spaces

8. Random Variables
8.1 Measures on the Line
8.2 Distribution Functions
8.3 Examples
8.4 Multivariate Distributions
8.5 Independent Random Variables

9. Expectations
9.1 Averages and Integrals
9.2 Expectations of Functions of X
9.3 Theorems for the Probabilist's Toolbox
9.4 Multivariate Distributions
9.5 More Theorems for the Toolbox
9.6 Random Variables Depending on a Parameter

10. Conditioning
10.1 Conditioning in Product Measures
10.2 Conditioning on a Sigma Field
10.3 Conditional Expectations
10.4 Some Theorems on Conditional Expectations
10.5 Relationships between Subfields
10.6 Conditional Distributions

11. Characteristic Functions
11.1 The Distribution of Sums of Random Variables
11.2 Complex Numbers


11.3 The Theory of Characteristic Functions
11.4 The Inversion Theorem
11.5 The Conditional Characteristic Function

Part III: Theory of Stochastic Processes

12. Stochastic Processes
12.1 Basic Ideas and Terminology
12.2 Convergence of Stochastic Sequences
12.3 The Probability Model
12.4 The Consistency Theorem
12.5 Uniform and Limiting Properties
12.6 Uniform Integrability

13. Dependence
13.1 Shift Transformations
13.2 Independence and Stationarity
13.3 Invariant Events
13.4 Ergodicity and Mixing
13.5 Subfields and Regularity
13.6 Strong and Uniform Mixing

14. Mixing
14.1 Mixing Sequences of Random Variables
14.2 Mixing Inequalities
14.3 Mixing in Linear Processes
14.4 Sufficient Conditions for Strong and Uniform Mixing

15. Martingales
15.1 Sequential Conditioning
15.2 Extensions of the Martingale Concept
15.3 Martingale Convergence
15.4 Convergence and the Conditional Variances
15.5 Martingale Inequalities

16. Mixingales
16.1 Definition and Examples
16.2 Telescoping Sum Representations
16.3 Maximal Inequalities
16.4 Uniform Square-integrability

17. Near-Epoch Dependence
17.1 Definition and Examples
17.2 Near-Epoch Dependence and Mixingales


17.3 Near-Epoch Dependence and Transformations
17.4 Approximability

Part IV: The Law of Large Numbers

18. Stochastic Convergence
18.1 Almost Sure Convergence
18.2 Convergence in Probability
18.3 Transformations and Convergence
18.4 Convergence in Lp Norm
18.5 Examples
18.6 Laws of Large Numbers

19. Convergence in Lp-Norm
19.1 Weak Laws by Mean-square Convergence
19.2 Almost Sure Convergence by the Method of Subsequences
19.3 A Martingale Weak Law
19.4 A Mixingale Weak Law
19.5 Approximable Processes

20. The Strong Law of Large Numbers
20.1 Technical Tricks for Proving LLNs
20.2 The Case of Independence
20.3 Martingale Strong Laws
20.4 Conditional Variances and Random Weighting
20.5 Two Strong Laws for Mixingales
20.6 Near-epoch Dependent and Mixing Processes

21. Uniform Stochastic Convergence
21.1 Stochastic Functions on a Parameter Space
21.2 Pointwise and Uniform Stochastic Convergence
21.3 Stochastic Equicontinuity
21.4 Generic Uniform Convergence
21.5 Uniform Laws of Large Numbers

Part V: The Central Limit Theorem

22. Weak Convergence of Distributions
22.1 Basic Concepts
22.2 The Skorokhod Representation Theorem
22.3 Weak Convergence and Transformations
22.4 Convergence of Moments and Characteristic Functions
22.5 Criteria for Weak Convergence
22.6 Convergence of Random Sums


23. The Classical Central Limit Theorem
23.1 The i.i.d. Case
23.2 Independent Heterogeneous Sequences
23.3 Feller's Theorem and Asymptotic Negligibility
23.4 The Case of Trending Variances

24. CLTs for Dependent Processes
24.1 A General Convergence Theorem
24.2 The Martingale Case
24.3 Stationary Ergodic Sequences
24.4 The CLT for NED Functions of Mixing Processes
24.5 Proving the CLT by the Bernstein Blocking Method

25. Some Extensions
25.1 The CLT with Estimated Normalization
25.2 The CLT with Random Norming
25.3 The Multivariate CLT
25.4 Error Estimation

Part VI: The Functional Central Limit Theorem

26. Weak Convergence in Metric Spaces
26.1 Probability Measures on a Metric Space
26.2 Measures and Expectations
26.3 Weak Convergence
26.4 Metrizing the Space of Measures
26.5 Tightness and Convergence
26.6 Skorokhod's Representation

27. Weak Convergence in a Function Space
27.1 Measures on Function Spaces
27.2 The Space C
27.3 Measures on C
27.4 Brownian Motion
27.5 Weak Convergence on C
27.6 The Functional Central Limit Theorem
27.7 The Multivariate Case

28. Cadlag Functions
28.1 The Space D
28.2 Metrizing D
28.3 Billingsley's Metric
28.4 Measures on D


28.5 Prokhorov's Metric
28.6 Compactness and Tightness in D

29. FCLTs for Dependent Variables
29.1 The Distribution of Continuous Functions on D
29.2 Asymptotic Independence
29.3 The FCLT for NED Functions of Mixing Processes
29.4 Transformed Brownian Motion
29.5 The Multivariate Case

30. Weak Convergence to Stochastic Integrals
30.1 Weak Limit Results for Random Functionals
30.2 Stochastic Processes in Continuous Time
30.3 Stochastic Integrals
30.4 Convergence to Stochastic Integrals

Notes

References

Index


Preface

Recent years have seen a marked increase in the mathematical sophistication of econometric research. While the theory of linear parametric models which forms the backbone of the subject makes an extensive and clever use of matrix algebra, the statistical prerequisites of this theory are comparatively simple. But now that these models are pretty thoroughly understood, research is concentrated increasingly on the less tractable questions, such as nonlinear and nonparametric estimation and nonstationary data generation processes. The standard econometrics texts are no longer an adequate guide to this new technical literature, and a sound understanding of the probabilistic foundations of the subject is becoming less and less of a luxury.

The asymptotic theory traditionally taught to students of econometrics is founded on a small body of classical limit theorems, such as Khinchine's weak law of large numbers and the Lindeberg-Lévy central limit theorem, relevant to the stationary and independent data case. To deal with linear stochastic difference equations, appeal can be made to the results of Mann and Wald (1943a), but even these are rooted in the assumption of independent and identically distributed disturbances. This foundation has become increasingly inadequate to sustain the expanding edifice of econometric inference techniques, and recent years have seen a systematic attempt to construct a less restrictive limit theory. Hall and Heyde's Martingale Limit Theory and its Application (1980) is an important landmark, as are a series of papers by econometricians including among others Halbert White, Ronald Gallant, Donald Andrews, and Herman Bierens. This work introduced to the econometrics profession pioneering research into limit theory under dependence, done in the preceding decades by probabilists such as J. L. Doob, I. A. Ibragimov, Patrick Billingsley, Robert Serfling, Murray Rosenblatt, and Donald McLeish.

These latter authors devised various concepts of limited dependence for general nonstationary time series. The concept of a martingale has a long history in probability, but it was primarily Doob's Stochastic Processes (1953) that brought it to prominence as a tool of limit theory. Martingale processes behave like the wealth of a gambler who undertakes a succession of fair bets; the differences of a martingale (the net winnings at each step) are unpredictable from lagged information. Powerful limit theorems are available for martingale difference sequences involving no further restrictions on the dependence of the process. Ibragimov and Rosenblatt respectively defined strong mixing and uniform mixing as characterizations of 'limited memory', or independence at long range. McLeish defined the notion of a mixingale, the asymptotic counterpart of a martingale difference, becoming unpredictable m steps ahead as m becomes large. This is a weaker property than mixing because it involves only low-order moments of the distribution, but mixingales possess most of those attributes of mixing processes needed to make limit theorems work. Very important from the econometrician's point of view is the property dubbed by Gallant and White (1988) near-epoch dependence from a phrase in one of McLeish's papers, although the idea itself goes back to Billingsley (1968) and Ibragimov (1962). The mixing property may not be preserved by transformations of sequences involving an infinite number of lags, but near-epoch dependence is a condition under which the outputs of a dynamic econometric model can be shown, given some further conditions, to be mixingales when the inputs are mixing. Applications of these results are increasingly in evidence in the econometric literature; Gallant and White's monograph provides an excellent survey of the possibilities.
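
As a rough sketch of the dependence concepts described above, the display below gives one common formalization of a martingale difference and of a mixingale. The notation is an assumption made for illustration and does not come from this preface: $X_t$ is an integrable sequence adapted to an increasing family of sigma fields $\mathcal{F}_t$, the $c_t$ are non-negative constants, and $\zeta_m$ is a non-negative sequence. The definitions actually used in the book are those of Chapters 15 and 16.

```latex
% Martingale difference: the increment is unpredictable from lagged information.
E(X_t \mid \mathcal{F}_{t-1}) = 0 \quad \text{a.s.}

% L_2-mixingale (in the spirit of McLeish): unpredictability holds only
% asymptotically, m steps ahead, as m becomes large.
\| E(X_t \mid \mathcal{F}_{t-m}) \|_2 \le c_t \zeta_m, \qquad
\| X_t - E(X_t \mid \mathcal{F}_{t+m}) \|_2 \le c_t \zeta_{m+1}, \qquad
\zeta_m \to 0 \ \text{as} \ m \to \infty .
```

Under these assumptions a martingale difference is the special case in which both left-hand sides are zero for every $m \ge 1$, which is the sense in which the mixingale is its asymptotic counterpart. Only second moments appear in the bounds, consistent with the remark that the property involves only low-order moments.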

Limit theorems impose restrictions on the amount of dependence between sequence coordinates, and on their marginal distributions. Typically, the probability of outliers must be controlled by requiring the existence of higher-order moments, but there are almost always trade-offs between dependence and moment restrictions, allowing one to buy more of one at the price of less of the other. The fun of proving limit theorems has been to see how far out the envelope of sufficient conditions can be stretched, in one direction or another. To complicate matters, one can get results both by putting limits on the rate of approach to independence (the rate of mixing), and by limiting the type of dependence (the martingale approach), as well as by combining both types of constraint (the mixingale approach). The results now available are remarkably powerful, judged by the yardstick of the classical theory. Proofs of necessity are elusive and the limits to the envelope are not yet known with certainty, but they probably lie not too far beyond the currently charted points.

Perhaps the major development in time-series econometrics in the 1980s has been the theory of cointegration, and dealing with the distributions of estimators when time series are generated by unit root processes also requires a new type of limit theory. The essential extra ingredient of this theory is the functional central limit theorem (FCLT). The proof of these weak convergence results calls for a limit theory for the space of functions, which throws up some interesting problems which have no counterpart in ordinary probability. These ideas were pioneered by Russian probabilists in the 1950s, notably A. V. Skorokhod and Yu. V. Prokhorov. It turns out that FCLTs hold under properties generally similar to those for the ordinary CLT (though with a crucial difference), and they can be analysed with the same kind of tools, imposing limitations on dependence and outliers.
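
For orientation, the simplest result of this kind (Donsker's theorem for independent disturbances) can be sketched as follows. The symbols are assumptions introduced here for illustration: $u_t$ is an i.i.d. sequence with mean zero and variance $\sigma^2 \in (0, \infty)$, $X_n$ is the partial-sum process it generates, and $B$ is standard Brownian motion on $[0,1]$. The FCLTs treated in Part VI relax the independence assumption.

```latex
% Partial-sum process: a random step function on the unit interval.
X_n(r) = \frac{1}{\sigma \sqrt{n}} \sum_{t=1}^{\lfloor nr \rfloor} u_t ,
\qquad 0 \le r \le 1 .

% Functional CLT: weak convergence of the whole path, as a random element
% of the space of cadlag functions on [0,1], to Brownian motion.
X_n \Rightarrow B .

% The ordinary CLT is recovered by evaluating the path at r = 1.
X_n(1) \xrightarrow{D} N(0,1) .
```

The last line shows why the functional result is the stronger statement: the ordinary CLT concerns a single coordinate of a limit that the FCLT describes for all $r$ at once.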

The probabilistic literature which deals with issues of this kind has been seen as accessible to practising econometricians only with difficulty. Few concessions are made to the nonspecialist, and the concerns of probabilists, statisticians, and econometricians are frequently different. Textbooks on stochastic processes (Cox and Miller 1965 is a distinguished example) often give prominence to topics that econometricians would regard as fairly specialized (e.g. Markov chains, processes in continuous time), while the treatment of important issues like nonstationarity gets tucked away under the heading of advanced or optional material if not omitted altogether. Probability texts are written for students of mathematics and assume a familiarity with the folklore of the subject that econometricians may lack. The intellectual investment required is one that students and practitioners are often, quite reasonably, disinclined to make.

It is with issues of this sort in mind that the present book has been written. The first objective has been to provide a coherent and unified account of modern asymptotic theory, which can function as both a course text, and as a work of reference. The second has been to provide a grounding in the requisite mathematics and probability theory, making the treatment sufficiently self-contained that even readers with limited mathematical training might make use of it. This is not to say that the material is elementary. Even when the mathematics is mastered, the reasoning can be intricate and demand a degree of patience to absorb. Proofs for nearly all the results are provided, but readers should never hesitate to pass over these when they impede progress. The book is also intended to be useful as a reference for students and researchers who only wish to know basic things, like the meaning of technical terms, and the variety of limit results available. But, that said, it will not have succeeded in its aim unless the reader is sometimes stimulated to gain a deeper understanding of the material - if for no better reason, because this is a theory abounding in mathematical elegance, and technical ingenuity which is often dazzling.

Outline of the Work

Part I covers almost all the mathematics used subsequently. Calculus and matrix algebra are not treated, but in any case there is little of either in the book. Most readers should probably begin by reading Chapters 1 and 2, and perhaps the first sections only of Chapters 3 and 4, noting the definitions and examples but skipping all but the briefest proofs initially. These chapters contain some difficult material, which does not all need to be mastered immediately. Chapters 5 and 6 are strictly required only for Chapter 21 and Part VI, and should be passed over on first reading. Nearly everything needed to read the probability literature is covered in these chapters, with perhaps one notable exception - the theory of normed spaces. Some treatments in probability use a Hilbert space framework, but it can be avoided. The number of applications exploiting this approach seemed currently too small to justify the added technical overhead, although future developments may require this judgement to be revised.

Part II covers what for many readers will be more familiar territory. Chapters 7, 8, and 9 contain essential background to be skimmed or studied in more depth, as appropriate. It is the collections of inequalities in §9.3 and §9.5 that we will have the most call to refer to subsequently. The content of Chapter 10 is probably less familiar, and is very important. Most readers will want to study this chapter carefully sooner or later. Chapter 11 can be passed over initially, but is needed in conjunction with Part V.

In Part III the main business of the work begins. Chapter 12 gives an introduction to the main concepts arising in the study of stochastic sequences. Chapters 13 and 14 continue the discussion by reviewing concepts of dependence, and Chapters 15, 16, and 17 deal with specialized classes of sequence whose properties make them amenable to the application of limit theorems. Nearly all readers will want to study Chapters 12, 13, and the earlier sections of 14 and 15 before going further, whereas Chapters 16 and 17 are rather technical and should probably be avoided until the context of these results is understood.

In Parts IV and V we arrive at the study of the limit theorems themselves. The aim has been to contrast alternative ways of approaching these problems, and to present a general collection of results ranging from the elementary to the very general. Chapter 18 is devoted to fundamentals, and everyone should read this before going further. Chapter 19 compares classical techniques for proving laws of large numbers, depending on the existence of second moments, with more modern methods. Although the concept of convergence in probability is adequate in many econometric applications, proofs of strong consistency of estimators are increasingly popular in the econometrics literature, and techniques for dependent processes are considered in Chapter 20. Uniform stochastic convergence is an essential concept in the study of econometric estimators, although it has only recently been systematically researched. Chapter 21 contains a synthesis of results that have appeared in print in the last year or two.

Part V contrasts the classical central limit theorems for independent processes with the modern results for martingale differences and general dependent processes. Chapter 22 contains the essentials of weak convergence theory for random variables. The treatment is reasonably complete, although one neglected topic, to which much space is devoted in the probability literature, is convergence to stable laws for sequences with infinite variance. This material has found few applications in econometrics to date, although its omission is another judgement that may need to be revised in the future. Chapter 23 describes the classic CLTs for independent processes, and Chapter 24 treats modern techniques for dependent, heterogeneous processes.

Part VI deals with the functional central limit theorem and related convergence results, including convergence to limits that can be identified with stochastic integrals. A number of new mathematical challenges are presented by this theory, and readers who wish to tackle it seriously will probably want to go back and apply themselves to Chapters 5 and 6 first. Chapter 26 is both the hardest going and the least essential to subsequent developments. It deals with the theory of weak convergence on metric spaces at a greater level of generality than we strictly need, and is the one section where topological arguments seriously intrude. Almost certainly, one should go first to Chapter 27, referring back as needed for definitions and statements of the prerequisite theorems, and pursue the rationale for these results further only as interest dictates. Chapter 28 is likewise a technical prologue to Chapters 29 and 30, and might be skipped over at first reading. The meat of this part of the book is in these last two chapters. Results are given on the multivariate invariance principle for heterogeneous dependent processes, paralleling the central limit theorems of Chapter 24.

A number of the results in the text are, to the author's knowledge, new. These include 14.13/14, 19.11, 20.18/19, 20.21, 24.6/7/14, 29.14/29.18, and 30.13/14, although some have now appeared in print elsewhere.


Further Reading

There is a huge range of texts in print covering the relevant mathematics and probability, but the following titles were, for one reason or another, the most frequently consulted in the course of writing this book. T. M. Apostol's Mathematical Analysis (2nd edition) hits just the right note for the basic bread-and-butter results. For more advanced material, Dieudonné's Foundations of Modern Analysis and Royden's Real Analysis are well-known references, the latter being the more user-friendly although the treatment is often fairly concise. Halmos's classic Measure Theory and Kingman and Taylor's Introduction to Measure and Probability are worth having access to. Willard's General Topology is a clear and well-organized text to put alongside Kelley's classic of the same name. Halmos's Naive Set Theory is a slim volume whose main content falls outside our sphere of interest, but is a good read in its own right. Strongly recommended is Borowski and Borwein's Collins Reference Dictionary of Mathematics; one can learn more about mathematics in less time by browsing in this little book, and following up the cross references, than by any other method I can think of. For a stimulating introduction to metric spaces see Michael Barnsley's popular Fractals Everywhere.

For further reading on probability, one might begin by browsing the slim volume that started the whole thing off, Kolmogorov's Foundations of the Theory of Probability. Then, Billingsley's Probability and Measure is an inspiration, both authoritative and highly readable. Breiman's Probability has a refreshingly informal style, and just the right emphasis. Chung's A Course in Probability Theory is idiosyncratic in parts, but strongly recommended. The value of the classic texts, Loève's Probability Theory (4th edition) and Feller's An Introduction to Probability Theory and its Applications (3rd edition) is self-evident, although these are dense and detailed books that can take a little time and patience to get into, and are chiefly for reference. Cramér's Mathematical Methods of Statistics is now old-fashioned, but still useful. Two more recent titles are Shiryayev's Probability, and R. M. Dudley's tough but stimulating Real Analysis and Probability.

Of the more specialized monographs on stochastic convergence, the following titles (in order of publication date) are all important: Doob, Stochastic Processes; Révész, The Laws of Large Numbers; Parthasarathy, Probability Measures on Metric Spaces; Billingsley, Convergence of Probability Measures; Iosifescu and Theodorescu, Random Processes and Learning; Ibragimov and Linnik, Independent and Stationary Sequences of Random Variables; Stout, Almost Sure Convergence; Lukacs, Stochastic Convergence; Hall and Heyde, Martingale Limit Theory and its Application; Pollard, Convergence of Stochastic Processes; Eberlein and Taqqu (eds.), Dependence in Probability and Statistics.

Doob is the founding father of the subject, and his book its Old Testament. Of the rest, Billingsley's is the most original and influential. Ibragimov and Linnik's essential monograph is now, alas, hard to obtain. The importance of Hall and Heyde was mentioned above. Pollard's book takes up the weak convergence story more or less where Billingsley leaves off, and much of the material complements the coverage of the present volume. The Eberlein-Taqqu collection contains up-to-date accounts of mixing theory, covering some related topics outside the range of the present work. The literature on Brownian motion and stochastic integration is extensive, but Karatzas and Shreve's Brownian Motion and Stochastic Calculus is a recent and comprehensive source for reference, and Kopp's Martingales and Stochastic Integrals was found useful at several points.

These items receive an individual mention by virtue of being between hard covers. References to the journal literature will be given in context, but it is worth mentioning that the four papers by Donald McLeish, appearing between 1974 and 1977, form possibly the most influential single contribution to our subject.

Finally, titles dealing with applications and related contributions include Serfling, Approximation Theorems of Mathematical Statistics; White, Asymptotic Theory for Econometricians; Gallant, Nonlinear Statistical Methods; Gallant and White, A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models; Amemiya, Advanced Econometrics. All of these are highly recommended for forming a view of what stochastic limit theory is for, and why it matters.


Acknowledgements

The idea for this book originated in 1987, in the course of writing a chapter of mathematical and statistical prerequisites for a projected textbook of econometric theory. The initial, very tentative draft was completed during a stay at the University of California (San Diego) Department of Economics in 1988, whose hospitality is gratefully acknowledged. It has grown a great deal since then, and getting it finished has involved a struggle with competing academic commitments as well as the demands of family life. My family deserve special thanks for their forbearance.

My colleague Peter Robinson has been a great source of encouragement and help, and has commented on various drafts. Other people who have read portions of the manuscript and provided invaluable feedback, not least in pointing out my errors, include Getullio Silveira, Robert de Jong, and especially Elizabeth Boardman, who took immense trouble to help me lick the chapters on mathematics into shape. I am also most grateful to Don Andrews, Graham Brightwell, Søren Johansen, Donald McLeish, Peter Phillips, Hal White, and a number of anonymous referees for helpful conversations, comments and advice. None of these people is responsible for the various flaws that doubtless remain.

The book was written using the ChiWriter 4 technical word processor, and after conversion to Postscript format was produced as camera-ready copy on a Hewlett-Packard LaserJet 451 printer, direct from the original files. I must particularly thank Cay Horstmann, of Horstmann Software Design Corp., for his technical assistance with this task.

London, June 1994


Mathematical Symbols and Abbreviations

In the text, the symbol □ is used to terminate examples and definitions, and also theorems and lemmas unless the proof follows directly. The symbol ■ terminates proofs. References to numbered expressions are enclosed in parentheses. References to numbered theorems, examples etc. are given in bold face. References to chapter sections are preceded by §.

In statements of theorems, roman numbers (i), (ii), (iii),... are used to indicate the parts of a multi-part result. Lower case letters (a), (b), (c),... are used to itemize the assumptions or conditions specified in a theorem, and also the components of a definition.

The page numbers below refer to fuller definitions or examples of use, as appropriate.

|.|             absolute value
‖.‖_p           L_p-norm
‖.‖             Euclidean norm; also fineness (of a partition)
⇒               weak convergence (of measures); also implication
↑, ↓            monotone convergence
→               convergence
→ a.s.          almost sure convergence
→ D             convergence in distribution
→ L_p           convergence in L_p norm
→ pr            convergence in probability
↦               mapping, function
∘               composition of mappings
~               equivalence in order of magnitude (of sequences); also equivalence in distribution (of r.v.s)
∔               addition modulo 1
−               set difference
≤, ≥            partial ordering, inequality
<, >            strict ordering, strict inequality
≪               order of magnitude inequality (sequences); also absolutely continuous (measures)
⊥               mutually singular (of measures)
1_A(.)          indicator function
a.e.            almost everywhere
AR              autoregressive process
ARMA            autoregressive moving average process
a.s., a.s.[μ]   almost surely, (with resp. to p.m. μ)

$A^c$    complement of $A$
$\bar{A}$, $(A)^-$    closure of $A$
$A^o$    interior of $A$
$\alpha_m$    strong mixing coefficient
$\aleph_0$    aleph-nought (cardinality of $\mathbb{N}$)
$\forall$    'for every'
$B(n,p)$    Binomial distribution with parameters $n$ and $p$
$\mathcal{B}$    Borel field
CLT    central limit theorem
ch.f.    characteristic function
c.d.f.    cumulative distribution function
$C_{[0,1]}$    continuous functions on the unit interval
$\subseteq$, $\supseteq$    set containment
$\subset$, $\supset$    strict containment
$\chi^2(n)$    chi-squared distribution with $n$ degrees of freedom
$d(x,y)$    distance between $x$ and $y$
$D_{[0,1]}$    cadlag functions on the unit interval
$\mathbb{D}$    dyadic rationals
$\Delta$    symmetric difference
$\partial A$    boundary of $A$
$\in$    set membership
ess sup    essential supremum
$E(\cdot)$    expectation
$E(\cdot \mid x)$    conditional expectation (on variable $x$)
$E(\cdot \mid \mathcal{G})$    conditional expectation (on $\sigma$-field $\mathcal{G}$)
$\exists$    'there exists'
$f^+$, $f^-$    positive, negative parts of $f$
FCLT    functional central limit theorem
$F(\cdot)$    cumulative distribution function
$\phi_X(\cdot)$    characteristic function of $X$
$\phi_m$    uniform mixing coefficient
iff    'if and only if'
inf    infimum
i.i.d.    independently and identically distributed
i.o.    infinitely often
in pr.    in probability
LIE    law of iterated expectations
LIL    law of the iterated logarithm
lim    limit (sets); also limit (numbers)
limsup, $\overline{\lim}$    superior limit (sets); also superior limit (numbers)
liminf, $\underline{\lim}$    inferior limit (sets); also inferior limit (numbers)
$L(n)$    slowly varying function
$L_p$-NED    near-epoch dependent in $L_p$-norm
MA    moving average process
$m$    Lebesgue measure

m.d.    martingale difference
m.g.f.    moment-generating function
m.s.    mean square
$\mathbb{M}$    space of measures
$N(\mu,\sigma^2)$    Gaussian distribution with mean $\mu$ and variance $\sigma^2$
$\mathbb{N}$    natural numbers
$\mathbb{N}_0$    $\mathbb{N} \cup \{0\}$
$\cap$    intersection
$m \wedge n$    minimum of $m$ and $n$
$O(\cdot)$    'Big Oh', order of magnitude relation
$o(\cdot)$    'Little Oh', strict order of magnitude relation
$O_p(\cdot)$    stochastic order relation
$o_p(\cdot)$    strict stochastic order relation
$\emptyset$    null set
p.d.f.    probability density function
p.m.    probability measure
$P(\cdot)$    probability
$P(\cdot \mid A)$    conditional probability (on event $A$)
$P(\cdot \mid \mathcal{G})$    conditional probability (on $\sigma$-field $\mathcal{G}$)
$\Pi$, $\prod$    product of numbers; also partition of an interval
[illegible]    product measure
$\pi_j(\cdot)$    coordinate projection
$\mathbb{Q}$    rational numbers
r.v.    random variable
$R_{[0,1]}$    real valued functions on $[0,1]$
$R$    relation
$\mathbb{R}$    real line
$\mathbb{R}^+$    non-negative reals
$\bar{\mathbb{R}}$    extended real line, $\mathbb{R} \cup \{-\infty,+\infty\}$
$\bar{\mathbb{R}}^+$    $\mathbb{R}^+ \cup \{+\infty\}$
$\mathbb{R}^n$    $n$-dimensional Euclidean space
s.e.    stochastic equicontinuity
s.s.e.    strong stochastic equicontinuity
SLLN    strong law of large numbers
sup    supremum
$S(x,\varepsilon)$    $\varepsilon$-neighbourhood, sphere
$S_n$    sum of random variables
$s_n^2$    variance of $S_n$
$\sigma(\mathcal{C})$    $\sigma$-field generated by collection $\mathcal{C}$
$\sigma(X)$    $\sigma$-field generated by r.v. $X$
$\int f\,dx$    Lebesgue integral
$\int f\,dF$    Lebesgue-Stieltjes integral
$\int f\,d\mu$, $E$    expected value (integral with resp. to p.m.)
$\sum$    sum

$T$    shift transformation
$\cup$    union
$U[a,b]$    uniform distribution on interval $[a,b]$
$\vee$    union of $\sigma$-fields
$m \vee n$    maximum of $m$ and $n$
WLLN    weak law of large numbers
w.p.1    with probability 1
$\bar{X}_n$    sample mean of sequence $\{X_t\}$
$\times$    Cartesian product
$\otimes$    $\sigma$-field of product sets
$\mathbb{Z}$    integers
$\{\cdot\}$    set designation; also sequence, array
$\{\cdot\}_1^{\infty}$, $\{\cdot\}_{-\infty}^{\infty}$    infinite sequences
$\{\{\cdot\}\}$    array
$[x]$    largest integer $\le x$
$[a,b]$    closed interval bounded by $a$, $b$
$(a,b)$    open interval bounded by $a$, $b$
$(\Omega,\mathcal{F})$    measurable space
$(\Omega,\mathcal{F},\mu)$    measure space
$(\Omega,\bar{\mathcal{F}},\bar{\mu})$    complete measure space
$(\Omega,\mathcal{F},P)$    probability space
$(S,d)$    metric space
$(X,\tau)$    topological space

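Common usages

$A, B, C, D, \ldots$    sets
$X, Y, Z, \ldots$    random variables
$\boldsymbol{X}, \boldsymbol{Y}, \boldsymbol{Z}, \ldots$    random vectors
$f, g, h, \ldots$    functions
$\varepsilon, \delta, \eta$    positive constants
$B, M$    bounding constants
$\mathcal{C}, \mathcal{D}, \ldots$    collections of subsets
$\mathcal{F}, \mathcal{G}, \mathcal{H}, \ldots$    $\sigma$-fields
$S, T, X, \ldots$    spaces
$\mu, \nu, \ldots$    measures
$d, \rho$    metrics
$\tau$    topology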


1

MATHEMATICS


Sets and Numbers

1.1 Basic Set Theory

A set is any specified collection of objects. In this book the objects in question are often numbers, but they may also be functions, or other sets, or indeed wholly arbitrary, to be determined by the context in which the theory is applied. In any analysis there is a set which defines the universe of discourse, containing all the objects under consideration, and in what follows, sets denoted $A$, $B$ etc., are subsets of a set $X$, with generic element $x$.

Set membership is denoted by the symbol '$\in$', '$x \in A$' meaning '$x$ belongs to the set $A$'. To show sets $A$ and $B$ have the same elements, one writes $A = B$. The usual way to define a set is by a descriptive statement enclosed in braces, so that for example $A = \{x: x \in B\}$ defines membership of $A$ in terms of membership of $B$, and is an alternative way of writing $A = B$. Another way to denote set membership is by labels. If a set has $n$ elements one can write $A = \{x_i,\ i = 1,\ldots,n\}$, but any set of labels will do. The statement $A = \{a_\alpha,\ \alpha \in C\}$ says that $A$ is the set of elements bearing labels $\alpha$ contained in another set $C$, called the index set for $A$. The labels (indices) need not be numbers, and can be any convenient objects at all. Sets whose elements are sets (the word 'collection' tends to be preferred in this context) are denoted by upper-case script characters. $A \in \mathcal{C}$ denotes that the set $A$ is in the collection $\mathcal{C}$, or using indices one could write $\mathcal{C} = \{A_\alpha: \alpha \in C\}$.

$B$ is called a subset of $A$, written $B \subseteq A$, if all the elements of $B$ are also elements of $A$. If $B$ is a proper subset of $A$, ruling out $B = A$, the relation is written $B \subset A$. The union of $A$ and $B$ is the set whose elements belong to either or both sets, written $A \cup B$. The union of a collection $\mathcal{C}$, the set of elements belonging to one or more $A \in \mathcal{C}$, is denoted $\bigcup_{A \in \mathcal{C}} A$, or, alternatively, one can write $\bigcup_{\alpha \in C} A_\alpha$ for the union of the collection $\{A_\alpha: \alpha \in C\}$. The intersection of $A$ and $B$ is the set of elements belonging to both, written $A \cap B$. The intersection of a collection $\mathcal{C}$ is the set of elements common to all the sets in $\mathcal{C}$, written $\bigcap_{A \in \mathcal{C}} A$ or $\bigcap_{\alpha \in C} A_\alpha$. In particular, the union and intersection of $\{A_1, A_2, \ldots, A_n\}$ are written $\bigcup_{j=1}^{n} A_j$ and $\bigcap_{j=1}^{n} A_j$. When the index set is implicit or unspecified we may write just $\bigcup_\alpha A_\alpha$, $\bigcap_i A_i$ or similar.

The difference of sets $A$ and $B$, written $A - B$ or by some authors $A \setminus B$, is the set of elements belonging to $A$ but not to $B$. The symmetric difference of two sets is $A \,\Delta\, B = (A - B) \cup (B - A)$. $X - A$ is the complement of $A$ in $X$, also denoted $A^c$ when $X$ is understood, and we have the general result that $A - B = A \cap B^c$. The null set (or empty set) is $\emptyset = X^c$, the set with no elements. Sets with no elements in common (having empty intersection) are called disjoint. A partition of a set is a collection of disjoint subsets whose union is the set, such that each of its elements belongs to one and only one member of the collection.

Here are the basic rules of set algebra. Unions and intersections obey commutative, associative and distributive laws:

$$A \cup B = B \cup A, \tag{1.1}$$
$$A \cap B = B \cap A, \tag{1.2}$$
$$(A \cup B) \cup C = A \cup (B \cup C), \tag{1.3}$$
$$(A \cap B) \cap C = A \cap (B \cap C), \tag{1.4}$$
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C), \tag{1.5}$$
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C). \tag{1.6}$$

There are also rules relating to complements known as de Morgan's laws:

$$(A \cup B)^c = A^c \cap B^c, \tag{1.7}$$
$$(A \cap B)^c = A^c \cup B^c. \tag{1.8}$$

Venn diagrams, illustrated in Fig. 1.1, are a useful device for clarifying relationships between subsets.
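The distributive and de Morgan laws can be checked mechanically on small finite sets. The following is only an illustrative sketch: the universe `X`, the sets `A`, `B`, `C` and the helper `complement` are arbitrary choices made for the example, not part of the text.

```python
# Check the distributive and de Morgan laws on small finite sets.
X = set(range(10))                      # universe of discourse (illustrative)
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8}


def complement(S, universe=X):
    """Complement of S relative to the universe X."""
    return universe - S


distributive_1 = A & (B | C) == (A & B) | (A & C)
distributive_2 = A | (B & C) == (A | B) & (A | C)
de_morgan_1 = complement(A | B) == complement(A) & complement(B)
de_morgan_2 = complement(A & B) == complement(A) | complement(B)
```

A finite check of this kind illustrates the identities but does not prove them; the proofs follow directly from the definitions of union, intersection and complement.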

Fig. 1.1

The distributive and de Morgan laws extend to general collections, as follows.

1.1 Theorem Let $\mathcal{C}$ be a collection of sets, and $B$ a set. Then

(i) $\left(\bigcup_{A \in \mathcal{C}} A\right) \cap B = \bigcup_{A \in \mathcal{C}} (A \cap B)$,

(ii) $\left(\bigcap_{A \in \mathcal{C}} A\right) \cup B = \bigcap_{A \in \mathcal{C}} (A \cup B)$,

(iii) $\left(\bigcap_{A \in \mathcal{C}} A\right)^c = \bigcup_{A \in \mathcal{C}} A^c$,

(iv) $\left(\bigcup_{A \in \mathcal{C}} A\right)^c = \bigcap_{A \in \mathcal{C}} A^c$. □

The Cartesian product of two sets $A$ and $B$, written $A \times B$, is the set of all possible ordered pairs of elements, the first taken from $A$ and the second from $B$; we write $A \times B = \{(x,y): x \in A,\ y \in B\}$. For a collection of $n$ sets the Cartesian product is the set of all the $n$-tuples (ordered sets of $n$ elements, with the $i$th element drawn from $A_i$), and is written

$$\mathop{\times}_{i=1}^{n} A_i = \{(x_1, x_2, \ldots, x_n): x_i \in A_i,\ i = 1,\ldots,n\}. \tag{1.9}$$

If one of the factor sets $A_i$ is empty, $\mathop{\times}_{i=1}^{n} A_i$ is also empty.

Product sets are important in a variety of different contexts in mathematics. Some of these are readily appreciated; for example, sets whose elements are $n$-vectors of real numbers are products of copies of the real line (see §1.3). But product sets are also central to the mathematical formalization of the notion of relationship between set elements.

Thus: a relation $R$ on a set $A$ is any subset of $A \times A$. If $(x,y) \in R$, we usually write $xRy$. $R$ is said to be

reflexive iff $xRx$,
symmetric iff $xRy$ implies $yRx$,
antisymmetric iff $xRy$ and $yRx$ implies $x = y$,
transitive iff $xRy$ and $yRz$ implies $xRz$,

where in each case the indicated condition holds for every $x$, $y$, and $z \in A$, as the case may be. (Note: 'iff' means 'if and only if'.)

An equivalence relation is a relation that is reflexive, symmetric, and transitive. Given an equivalence relation $R$ on $A$, the equivalence class of an element $x \in A$ is the set $E_x = \{y \in A: xRy\}$. If $E_x$ and $E_y$ are the equivalence classes of elements $x$ and $y$, then either $E_x \cap E_y = \emptyset$, or $E_x = E_y$. The equality relation $x = y$ is the obvious example of an equivalence relation, but by no means the only one.

A partial ordering is any relation that is reflexive, antisymmetric, and transitive. Partial orderings are usually denoted by the symbols $\le$ or $\ge$, with the understanding that $x \ge y$ is the same as $y \le x$. To every partial ordering there corresponds a strict ordering, defined by the omission of the elements $(x,x)$ for all $x \in A$. Strict orderings, usually denoted by $<$ or $>$, are not reflexive or antisymmetric, but they are transitive. A set $A$ is said to be linearly ordered by a partial ordering $\le$ if one of the relations $x < y$, $x > y$, and $x = y$ holds for every pair $(x,y) \in A \times A$. If there exist elements $a \in A$ and $b \in A$ such that $a \le x$ for all $x \in A$, or $x \le b$ for all $x \in A$, $a$ and $b$ are called respectively the smallest and largest elements of $A$. A linearly ordered set $A$ is called well-ordered if every subset of $A$ contains a smallest element. It is of course in sets whose elements are numbers that the ordering concept is most familiar.
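For a finite set, the defining properties of a relation can be tested directly by representing the relation as a set of ordered pairs. The sketch below is illustrative only; the function names and the two example relations (`same_parity`, `less_or_equal`) are assumptions introduced for the demonstration.

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)


def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)


def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)


def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)


def equivalence_classes(R, A):
    """The distinct classes E_x = {y in A: xRy} of an equivalence relation R."""
    return {frozenset(y for y in A if (x, y) in R) for x in A}


A = {0, 1, 2, 3, 4, 5}
# An equivalence relation: x and y have the same parity.
same_parity = {(x, y) for x in A for y in A if (x - y) % 2 == 0}
# A partial (indeed linear) ordering: the usual "less than or equal".
less_or_equal = {(x, y) for x in A for y in A if x <= y}

classes = equivalence_classes(same_parity, A)
```

Here the equivalence classes are the even and the odd elements of `A`, and they are either identical or disjoint, as stated in the text.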

Consider two sets $X$ and $Y$, which can be thought of as representing the universal sets for a pair of related problems. The following bundle of definitions contains the basic ideas about relationships between the elements of such sets. A mapping (or transformation or function)

$$T: X \mapsto Y$$

is a rule that associates each element of $X$ with a unique element of $Y$; in other words, for each $x \in X$ there exists a specified element $y \in Y$, denoted $T(x)$. $X$ is called the domain of the mapping, and $Y$ the codomain. The set

$$G_T = \{(x,y): x \in X,\ y = T(x)\} \subseteq X \times Y \tag{1.10}$$

is called the graph of $T$. For $A \subseteq X$, the set

$$T(A) = \{T(x): x \in A\} \subseteq Y \tag{1.11}$$

is called the image of $A$ under $T$, and for $B \subseteq Y$, the set

$$T^{-1}(B) = \{x: T(x) \in B\} \subseteq X \tag{1.12}$$

is called the inverse image of $B$ under $T$. The set $T(X)$ is called the range of $T$, and if $T(X) = Y$, the mapping is said to be from $X$ onto $Y$, and otherwise, into $Y$. If each $y$ is the image of one and only one $x \in X$, so that $T(x_1) = T(x_2)$ if and only if $x_1 = x_2$, the mapping is said to be one-to-one, or 1-1.

The notions of mapping and graph are really interchangeable, and it is permissible to say that the graph is the mapping, but it is convenient to keep a distinction in mind between the rule and the subset of $X \times Y$ which it generates. The term function is usually reserved for cases when the codomain is the set of real numbers (see §1.3). The term correspondence is used for a rule connecting elements of $X$ to elements of $Y$ where the latter are not necessarily unique. $T^{-1}$ is a correspondence, but not a mapping unless $T$ is one-to-one. However, the term one-to-one correspondence is often used specifically, in certain contexts that will arise below, to refer to a mapping that is both 1-1 and onto. If partial orderings are defined on both $X$ and $Y$, a mapping is called order-preserving if $T(x_1) \le T(x_2)$ iff $x_1 \le x_2$. On the other hand, if $X$ is partially ordered by $\le$, a 1-1 mapping induces a partial ordering on the codomain, defined by '$T(x_1) \le T(x_2)$ iff $x_1 \le x_2$'. And if the mapping is also onto, a linear ordering on $X$ induces a linear ordering on $Y$.

The following is a miscellany of useful facts about mappings.

1.2 Theorem

(i) For a collection $\{A_\alpha \subseteq X\}$, $T\left(\bigcup_\alpha A_\alpha\right) = \bigcup_\alpha T(A_\alpha)$;

(ii) for a collection $\{B_\alpha \subseteq Y\}$, $T^{-1}\left(\bigcup_\alpha B_\alpha\right) = \bigcup_\alpha T^{-1}(B_\alpha)$;

(iii) for $B \subseteq Y$, $T^{-1}(B^c) = T^{-1}(B)^c$;

(iv) for $A \subseteq X$, $A \subseteq T^{-1}(T(A))$;

(v) for $B \subseteq Y$, $T(T^{-1}(B)) \subseteq B$. □

Here, $T^{-1}(B)^c$ means $X - T^{-1}(B)$. Using de Morgan's laws, properties (ii) and (iii) are easily extended to the inverse images of intersections and differences; for example, we may show that the inverse images of disjoint sets are also disjoint. However, $Y - T(A) = T(A)^c \ne T(A^c)$, in general. Parts (iv) and (v) are illustrated in Fig. 1.2, where $X$ and $Y$ both correspond to the real line, $A$ and $B$ are intervals of the line, and $T$ is a function of a real variable.
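Parts (iii) to (v) of 1.2 can be seen at work in a small finite case. The sketch below assumes, purely for illustration, the mapping $T(x) = x^2$ from $X = \{-2,\ldots,2\}$ into $Y = \{0,\ldots,4\}$; the helper names `image` and `preimage` are not from the text.

```python
X = set(range(-2, 3))        # domain {-2, -1, 0, 1, 2}
Y = set(range(0, 5))         # codomain {0, 1, 2, 3, 4}


def T(x):
    """Illustrative mapping, neither 1-1 nor onto."""
    return x * x


def image(A):
    return {T(x) for x in A}


def preimage(B):
    return {x for x in X if T(x) in B}


A = {1, 2}
B = {3, 4}

back_and_forth_A = preimage(image(A))   # contains A, possibly strictly
forth_and_back_B = image(preimage(B))   # contained in B, possibly strictly
```

Because this `T` is not 1-1, `back_and_forth_A` picks up the points $-1$ and $-2$ as well as the elements of `A`; because it is not onto, `forth_and_back_B` loses the point 3, which has no inverse image.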

Fig. 1.2

When $T$ is a 1-1 correspondence (i.e., 1-1 and onto), so is $T^{-1}$. These properties then hold symmetrically, and the inclusion relations of parts (iv) and (v) also become equalities for all $A \subseteq X$ and $B \subseteq Y$. If $Z$ is a third set, and

$$U: Y \mapsto Z$$

is a further mapping, the composite mapping

$$U \circ T: X \mapsto Z$$

takes each $x \in X$ to the element $U(T(x)) \in Z$. $U \circ T$ operates as a simple transformation from $X$ to $Z$, and 1.2 applies to this case. For $C \subseteq Z$, $(U \circ T)^{-1}(C) = T^{-1}(U^{-1}(C))$.

Fig. 1.3

1.3 Example If $X = \Theta \times \Xi$ is a product space, having as elements the ordered pairs $x = (\theta,\xi)$, the mapping

$$T: \Theta \times \Xi \mapsto \Xi,$$

defined by $T(\theta,\xi) = \xi$, is called the projection mapping onto $\Xi$. The projection of a set $A \subseteq \Theta \times \Xi$ onto $\Xi$ (respectively, $\Theta$) is the set consisting of the second (resp., first) members of each pair in $A$. On the other hand, for a set $B \subseteq \Xi$, $T^{-1}(B) = \Theta \times B$. It is a useful exercise to verify 1.2 for this case, and also to check that $T(A)^c \ne T(A^c)$ in general. In Fig. 1.3, $\Theta$ and $\Xi$ are line segments and $\Theta \times \Xi$ is a rectangle in the plane. Here, $T(A)^c$ is the union of the indicated line segments, whereas $T(A^c) = \Xi$. □

The number of elements contained in a set is called the cardinality or cardinal number of the set. The notion of 'number' in this context is not a primitive one, but can be reduced to fundamentals by what is called the 'pigeon-hole' principle. A set $A$ is said to be equipotent with a set $B$ if there exists a 1-1 correspondence connecting $A$ and $B$. Think in terms of taking an element from each set and placing the pair in a pigeon-hole. Equipotency means that such a procedure can never exhaust one set before the other.

Now, think of the number 0 as being just a name for the null set, $\emptyset$. Let the number 1 be the name for the set that has a single element, the number 0. Let 2 denote the set whose elements are the numbers 0 and 1. And proceeding recursively, let $n$ be the name for the set $\{0,\ldots,n-1\}$. Then, the statement that a set $A$ has $n$ elements, or has cardinal number $n$, can be interpreted to mean that $A$ is equipotent with the set $n$. The set of natural numbers, denoted $\mathbb{N}$, is the collection $\{n: n = 1,2,3,\ldots\}$. This collection is well ordered by the relation usually denoted $\le$, where $n \le m$ actually means the same as $n \subseteq m$ under this definition of a number.

1.2 Countable Sets

Set theory is trivial when the number of elements in the set is finite, but formalization becomes indispensable for dealing with sets having an infinite number of elements. The set of natural numbers $\mathbb{N}$ is a case in point. If $n$ is a member so is $n + 1$, and this is true for every $n$. None the less, a cardinal number is formally assigned to $\mathbb{N}$, and is represented by the symbol $\aleph_0$ ('aleph-nought').

When the elements of an infinite set can be put into a one-to-one correspondence with the natural numbers, the set is said to have cardinal number $\aleph_0$, but, more commonly, to be countable, or equivalently, denumerable. Countability of a set requires that a scheme can be devised for labelling each element with a unique element of $\mathbb{N}$. This imposes a well-ordering on the elements, such that there is a 'first' element labelled 1, and so on, although this ordering may have significance or be arbitrary, depending on the circumstances. It is the pigeon-hole principle that matters here, that each element has its own unique label.

With infinite sets, everyday notions of size and quantity tend to break down. Augmenting the natural numbers by the number 0 defines the set $\mathbb{N}_0 = \{0,1,2,3,\ldots\}$. The commonplace observation that $\mathbb{N}_0$ has 'one more' element than $\mathbb{N}$ is contradicted by the fact that $\mathbb{N}$ and $\mathbb{N}_0$ are equipotent (label $n - 1 \in \mathbb{N}_0$ by $n \in \mathbb{N}$). Still more surprisingly, the set of even numbers, $\mathbb{E} = \{2n,\ n \in \mathbb{N}\}$, also has an obvious labelling scheme demonstrating equipotency with $\mathbb{N}$. The naive idea that there are 'twice as many' elements in $\mathbb{N}$ as in $\mathbb{E}$ is without logical foundation. Every infinite subset $A$ of $\mathbb{N}$ has a natural well-ordering and is equipotent with $\mathbb{N}$ itself, the label of an element $x \in A$ being the cardinal number of the set $\{y \in A: y \le x\}$.

Turning to sets apparently 'larger' than $\mathbb{N}$, consider the integers, $\mathbb{Z} = \{\ldots,-1,0,1,2,\ldots\}$, the set containing the signed whole numbers and zero. These are linearly ordered although not well ordered. They can, however, be paired with the natural numbers using the 'zig-zag' scheme:

$$(1,0),\ (2,1),\ (3,-1),\ (4,2),\ \ldots,\ (n,\ [n/2](-1)^n),\ \ldots,$$

where $[x]$ denotes the largest whole number not exceeding $x$. Thus, $\mathbb{N}$ and $\mathbb{Z}$ are equipotent. Then there are the rational numbers,

$$\mathbb{Q} = \{x: x = a/b,\ a \in \mathbb{Z},\ b \in \mathbb{Z},\ b \ne 0\}. \tag{1.13}$$
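The zig-zag pairing of $\mathbb{N}$ with the integers can be written out as a pair of mutually inverse functions. This is a minimal sketch; the names `zigzag` and `zigzag_inverse` are introduced here for illustration, and the inverse formula is derived from the scheme rather than quoted from the text.

```python
def zigzag(n):
    """Integer paired with the natural number n >= 1: [n/2] * (-1)**n."""
    return (n // 2) * (-1) ** n


def zigzag_inverse(k):
    """Natural-number label of the integer k under the zig-zag scheme."""
    return 2 * k if k > 0 else 1 - 2 * k


first_labels = [zigzag(n) for n in range(1, 8)]
```

Positive integers receive the even labels and zero and the negative integers the odd labels, so every integer gets exactly one label.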

We can also show the following.

1.4 Theorem $\mathbb{Q}$ is a countable set.

Proof We construct a 1-1 correspondence between $\mathbb{Z} \times \mathbb{Z}$ and $\mathbb{N}$. A 1-1 correspondence between $\mathbb{Z} \times \mathbb{Z}$ and $\mathbb{Z} \times \mathbb{N}$ is obtained by the method just used to show $\mathbb{Z}$ countable, and one between $\mathbb{Z} \times \mathbb{N}$ and $\mathbb{N} \times \mathbb{N}$ is got by the same method. Then note that the number $2^a 3^b \in \mathbb{N}$ is uniquely associated with each pair $(a,b) \in \mathbb{N} \times \mathbb{N}$; the rule for recovering $a$ and $b$ from $2^a 3^b$ is 'get $a$ as the number of divisions by 2 required to get an odd number, and the number so obtained is $3^b$'. The collection $\{2^a 3^b: a \in \mathbb{N},\ b \in \mathbb{N}\} \subseteq \mathbb{N}$ is equipotent with $\mathbb{N}$ itself as shown in the preceding paragraph. The composition of all these mappings is the desired correspondence. ∎
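The encoding of a pair $(a,b)$ as the single number $2^a 3^b$, and the recovery rule quoted in the proof, can be sketched as follows. The function names `encode` and `decode` are hypothetical, chosen for this illustration.

```python
def encode(a, b):
    """Map the pair (a, b) of natural numbers to the number 2**a * 3**b."""
    return 2 ** a * 3 ** b


def decode(n):
    """Recover (a, b) from n = 2**a * 3**b by repeated division."""
    a = 0
    while n % 2 == 0:        # count divisions by 2 needed to reach an odd number
        n //= 2
        a += 1
    b = 0
    while n % 3 == 0:        # the odd number left over is 3**b
        n //= 3
        b += 1
    if n != 1:
        raise ValueError("argument is not of the form 2**a * 3**b")
    return a, b
```

Uniqueness of the decoding is what makes the mapping from pairs to numbers 1-1; it rests on the uniqueness of prime factorization.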

Generalizing this type of argument leads to the following fundamental result.

1.5 Theorem The union of a countable collection of countable sets is a countable set. □

The concept of a sequence is fundamental to all the topics in this book. A sequence can be thought of as a mapping whose domain is a well-ordered countable set, the index set. Since there is always an order-preserving 1-1 mapping from $\mathbb{N}$ to the index set, there is usually no loss of generality in considering the composite mapping and thinking of $\mathbb{N}$ itself as the domain. Another way to characterize a sequence is as the graph of the mapping, that is, a countable collection of pairs having the ordering conferred on it by the elements of the domain. The ranges of the sequences we consider below typically contain either sets or real numbers; the associated theory for these cases is to be found respectively in §1.4 and §2.2.

The term sequence may also be applied to mappings having $\mathbb{Z}$ or another linearly ordered set as index set. This usage broadens the notion, since while such sets can be re-indexed by $\mathbb{N}$ (see above) this cannot be done while preserving the original ordering.

1.3 The Real Continuum

The real-number continuum $\mathbb{R}$ is such a complex object that no single statement of definition can do it justice. One can emphasize the ordinal and arithmetic properties of the reals, or their geometrical interpretation as the distances of points on a line from the origin (the point zero). But from a set-theoretic point of view, the essentials are captured by defining $\mathbb{R}$ as the set of countably infinite sequences of decimal digits, having a decimal point inserted at exactly one position in the sequence, and possibly preceded by a minus sign.

Thus, the real number $x$ can be written in the form

$$x = m(x)\,10^{p(x)} \sum_{i=1}^{\infty} d_i(x)\,10^{-i} \tag{1.14}$$

where the sequence $\{d_1(x), d_2(x), \ldots\}$ consists of decimal digits (elements of the set $\{0,1,\ldots,9\}$), $p(x) \in \mathbb{N}_0$ denotes the position of the decimal point in the string (the decimal exponent), and $m(x) = +1$ if $x \ge 0$, and $-1$ otherwise (the sign). When $d_i(x) = 0$ for all but a finite number of terms, the decimal expansion of $x$ is said to terminate and the final 0s are conventionally omitted from the representation.

The representation of $x$ by (1.14) is not always unique, and there exists a 1-1 correspondence between elements of $\mathbb{R}$ and sequences $\{m, p, d_1, d_2, d_3, \ldots\}$ only after certain of the latter are excluded. To eliminate arbitrary leading zeros we must stipulate that $d_1 \ne 0$ unless $p = 0$. And since for example, 0.49999... (the sequence of 9s not terminating) is the same number as 0.5, we always take the terminating representation of a number and exclude sequences having $d_i = 9$ in all but a finite number of places. $\mathbb{R}$ is of course linearly ordered, and in terms of (1.14) the ordering corresponds to the lexicographic ordering of the sequences $\{m, mp, md_1, md_2, md_3, \ldots\}$.

The choice of base 10 in the definition is of course merely conventional, and of the alternative possibilities, the most important is the binary (base 2) representation,

$$x = m(x)\,2^{q(x)} \sum_{i=1}^{\infty} b_i(x)\,2^{-i} \tag{1.15}$$

where the $b_i$ are binary digits, either 0 or 1, and $q(x)$ is the binary exponent.

The integers have the representation in (1.14) with the strings terminating after $p(x)$ digits. The rationals are also elements of $\mathbb{R}$, being those which either terminate after a finite number of places, or else cycle repeatedly through a finite sequence of digits beyond a certain finite point. The real numbers that are not rational are called irrational. The irrational numbers are overwhelmingly more numerous than the rationals, representing a higher order of infinity. The following is the famous 'diagonal' argument of Cantor.

1.6 Theorem The set $\mathbb{R}$ is uncountable.

Proof Assume a 1-1 correspondence between $\mathbb{R}$ and $\mathbb{N}$ exists. Now construct a real number in the following way. Let the first digit be different from that of the real number labelled 1, the second digit be different from that of the real number labelled 2, and in general the $n$th digit be different from that of the real number labelled $n$, for every $n$. This number is different from every member of the labelled collection, and hence it has no label. Since this construction can be performed for any labelling scheme, the assumption is contradicted. ∎
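The diagonal construction can be illustrated on a finite stretch of any proposed labelling. In the sketch below each labelled real is represented by a string of its leading decimal digits; the function name `diagonal` and the choice of replacement digits 4 and 5 (which avoids the non-unique expansions ending in 0s or 9s) are assumptions of this example, not part of the proof as stated.

```python
def diagonal(expansions):
    """Return digits d_1 d_2 ... with d_n different from the nth digit of
    the real labelled n. expansions[n - 1] must have at least n digits."""
    digits = []
    for n, s in enumerate(expansions):
        d = int(s[n])                        # nth digit of the real labelled n + 1
        digits.append("4" if d == 5 else "5")
    return "".join(digits)


labelled = ["5415", "7582", "5000", "3335"]  # hypothetical start of a labelling
new_number = diagonal(labelled)
```

Whatever the labelling, the number so constructed differs from the real labelled $n$ in the $n$th digit, which is the contradiction used in the proof.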

We say that $\aleph_0 < c$, where $c$ is the cardinal number of $\mathbb{R}$.

The linear ordering on $\mathbb{R}$ is of interest to us chiefly for providing the basis for constructing the fundamental subsets of $\mathbb{R}$, the intervals. The set $A = \{x: a < x < b\}$ is called an open interval, since it does not contain the end points, whereas the interval $B = \{x: a \le x \le b\}$ is said to be closed. Common notations are $[a,b]$, $(a,b)$, $(a,b]$ and $[a,b)$ to denote closed, open and half-open intervals. A set containing just a single point $a$ is called a singleton, written $\{a\}$. Unbounded intervals such as $C = \{x: a < x\}$, defined by a single boundary point, are written $(a,+\infty)$, $(-\infty,b)$, and $[a,+\infty)$, $(-\infty,b]$ for the open and closed cases respectively, where the infinities $+\infty$ and $-\infty$ are the fictional 'points' (not elements of $\mathbb{R}$) with the respective properties $x < +\infty$ and $x > -\infty$, for all $x \in \mathbb{R}$. An important example is the positive half-line $[0,+\infty)$, which we will denote subsequently by $\mathbb{R}^+$.

1.7 Theorem Every open interval is uncountable.

Proof Let the interval in question be $(a,b)$. If $a < b$, there exists $n \ge 0$ such that the $(n+1)$th term of the sequence $\{m, mp, md_1, md_2, \ldots\}$ in the expansion of (1.14) defining $b$ exceeds that in the corresponding sequence for $a$, whereas the first $n$ digits of each sequence are the same. The elements of $(a,b)$ are those reals whose expansions generate the same initial sequence, with the $(n+1)$th terms not exceeding that of $b$ nor being exceeded by that of $a$. If $a$ and $b$ are distinct, $n$ is finite. The result follows on applying the diagonal argument in 1.6 to these expansions, beginning at position $n + 2$. ∎

Other useful results concerning $\mathbb{R}$ and its intervals include the following.

1.8 Theorem The points of any open interval are equipotent with $\mathbb{R}$.

Proof This might be proved by elaborating the argument of 1.7, but it is simpler just to exhibit a 1-1 mapping from $\mathbb{R}$ onto $(a,b)$. For example, the function

$$y = \frac{a + b}{2} + \frac{(b - a)x}{2(1 + |x|)} \tag{1.16}$$

for $x \in \mathbb{R}$ fulfils the requirement. ∎
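The mapping (1.16) and its inverse are easy to write down explicitly. In the sketch below the names `to_interval` and `from_interval` are hypothetical, and the inverse is derived here (by solving (1.16) for $x$) rather than taken from the text.

```python
def to_interval(x, a, b):
    """Formula (1.16): maps the real line 1-1 onto the open interval (a, b)."""
    return (a + b) / 2 + (b - a) * x / (2 * (1 + abs(x)))


def from_interval(y, a, b):
    """Inverse of (1.16), derived for this sketch."""
    u = (2 * y - (a + b)) / (b - a)    # u = x / (1 + |x|), lying in (-1, 1)
    return u / (1 - abs(u))


a, b = 2.0, 5.0
samples = [-100.0, -1.0, 0.0, 0.5, 1000.0]
images = [to_interval(x, a, b) for x in samples]
```

The origin is sent to the midpoint of the interval, and the end points $a$ and $b$ are approached, but never attained, as $x \to -\infty$ and $x \to +\infty$.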

1.9 Theorem The real plane $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ is equipotent with $\mathbb{R}$.

Proof In view of the last theorem, it suffices to show that the unit interval $[0,1]$ is equipotent with the unit square $[0,1]^2$. Given points $x, y \in [0,1]$, define the point $z \in [0,1]$ according to the decimal expansion in (1.14), by the rule

$$d_i(z) = \begin{cases} d_{(i+1)/2}(x), & i \text{ odd} \\ d_{i/2}(y), & i \text{ even} \end{cases} \qquad i = 1,2,3,\ldots \tag{1.17}$$

In words, construct $z$ by taking a digit from $x$ and $y$ alternately. Such a $z$ exists for every pair $x$ and $y$, and, given $z$, $x$ and $y$ can be uniquely recovered by setting

$$d_i(x) = d_{2i-1}(z), \qquad d_i(y) = d_{2i}(z), \qquad i = 1,2,3,\ldots \tag{1.18}$$

This defines a 1-1 mapping from $[0,1]^2$ onto $[0,1]$, as required. ∎

This argument can be extended from $\mathbb{R}^2$ to $\mathbb{R}^n$, for any $n \in \mathbb{N}$.
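The digit-interleaving rule can be tried out on finite strings of leading digits. This is only a sketch on truncated expansions; the function names `interleave` and `deinterleave` are introduced for illustration.

```python
def interleave(x_digits, y_digits):
    """Build the digits of z by taking a digit from x and y alternately."""
    return "".join(dx + dy for dx, dy in zip(x_digits, y_digits))


def deinterleave(z_digits):
    """Recover the digits of x (odd positions) and y (even positions) from z."""
    return z_digits[0::2], z_digits[1::2]


z = interleave("1415", "7182")      # leading digits of two points of [0,1]
x_back, y_back = deinterleave(z)
```

The odd-numbered digits of $z$ return $x$ and the even-numbered digits return $y$, so no information about the pair is lost in passing to the single number.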

1.10 Theorem Every open interval contains a rational number.

Proof This is equivalent to the proposition that if $x < y$, there exists rational $r$ with $x < r < y$. First suppose $x \ge 0$. Choose $q$ as the smallest integer exceeding $1/(y - x)$, such that $qy > qx + 1$, and choose $p$ as the smallest integer not less than $qy$. Then $x < (p - 1)/q < y$. For the case $x < 0$ choose an integer $n > -x$, and then $x < r - n < y$, where $r$ is the rational satisfying $n + x < r < n + y$, found as above. ∎
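A rational lying strictly between two given numbers can be computed exactly with rational arithmetic. The sketch below uses a slight variant of the construction in the proof: with $q$ chosen as in the proof, it takes $p$ as the smallest integer exceeding $qx$ and returns $p/q$, which also handles negative $x$ directly. The function name `rational_between` is hypothetical.

```python
from fractions import Fraction
from math import floor


def rational_between(x, y):
    """Return a rational r with x < r < y (variant of the proof's construction)."""
    x, y = Fraction(x), Fraction(y)     # exact values, also for float inputs
    if not x < y:
        raise ValueError("need x < y")
    q = floor(1 / (y - x)) + 1          # smallest integer exceeding 1/(y - x)
    p = floor(q * x) + 1                # smallest integer exceeding q*x
    return Fraction(p, q)               # q*x < p <= q*x + 1 < q*y


r = rational_between(Fraction(1, 3), Fraction(1, 2))
```

Since $q(y - x) > 1$, the interval $(qx, qy)$ has length exceeding 1 and so must contain the integer $p$.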

1.11 Corollary Every collection of disjoint open intervals is countable.

Proof Since each open interval contains a rational appearing in no other interval disjoint with it, a set of disjoint open intervals can be placed in 1-1 correspondence with a subset of the rationals. ∎

The supremum of a set $A \subseteq \mathbb{R}$, when it exists, is the smallest number $y$ such that $x \le y$ for every $x \in A$, written $\sup A$. The infimum of $A$, when it exists, is the largest number $y$ such that $x \ge y$ for every $x \in A$, written $\inf A$. These may or may not be elements of $A$. In particular, $\inf[a,b] = \inf(a,b) = a$, and $\sup[a,b] = \sup(a,b) = b$. Open intervals do not possess largest or smallest elements. However, every subset of $\mathbb{R}$ which is bounded above (resp. below) has a supremum (resp. infimum). While unbounded sets in $\mathbb{R}$ lack suprema and/or infima, it is customary to define the set $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty,+\infty\}$, called the extended real line. In $\bar{\mathbb{R}}$, every set has a supremum, either a finite real number or $+\infty$, and similarly, every set has an infimum. The notation $\bar{\mathbb{R}}^+$ will also be used subsequently to denote $\mathbb{R}^+ \cup \{+\infty\}$.

1.4 Sequences of Sets

Set sequences $\{A_1, A_2, A_3, \ldots\}$ will be written, variously, as $\{A_n: n \in \mathbb{N}\}$, $\{A_n\}_1^\infty$, or just $\{A_n\}$ when the context is clear.

A monotone sequence is one that is either non-decreasing, with each member of the sequence being contained in its successor ($A_n \subseteq A_{n+1}$, $\forall\, n$), or non-increasing, with each member containing its successor ($A_{n+1} \subseteq A_n$, $\forall\, n$). We also speak of increasing (resp. decreasing) sequences when the inclusion is strict, with $\subset$ (resp. $\supset$) replacing $\subseteq$ (resp. $\supseteq$). For a non-decreasing sequence, define the set $A = \bigcup_{n=1}^{\infty} A_n$, and for a non-increasing sequence the set $A = \bigcap_{n=1}^{\infty} A_n = \left(\bigcup_{n=1}^{\infty} A_n^c\right)^c$. These sets are called the limits of the respective sequences. We may write $A_n \uparrow A$ and $A_n \downarrow A$, and also, in general, $\lim_{n \to \infty} A_n = A$ and $A_n \to A$.

1.12 Example The sequence $\{[0,1/n],\ n \in \mathbb{N}\}$ is decreasing and has as limit the singleton $\{0\}$. In fact, $\lim_{n \to \infty} [0,1/n) = \{0\}$ also, whereas $\lim_{n \to \infty} (0,1/n) = \emptyset$. The decreasing sequence of open intervals, $\{(a - 1/n,\ b + 1/n),\ n \in \mathbb{N}\}$, has as its limit the closed interval $[a,b]$. On the other hand, the sequence $\{[a + 1/n,\ b - 1/n],\ n \in \mathbb{N}\}$ is increasing, and its limit is $(a,b)$. □

Consider an arbitrary sequence $\{A_n\}$. The sequence $B_n = \bigcup_{m=n}^{\infty} A_m$ is non-increasing, so that $B = \lim_{n \to \infty} B_n$ exists. This set is called the superior limit of the sequence $\{A_n\}$, written $\limsup_n A_n$, and also as $\overline{\lim}_n A_n$. Similarly, the limit of the non-decreasing sequence $C_n = \bigcap_{m=n}^{\infty} A_m$ is called the inferior limit of the sequence, written $\liminf_n A_n$, or $\underline{\lim}_n A_n$. Formally: for a sequence $\{A_n,\ n \in \mathbb{N}\}$,

$$\limsup_n A_n = \bigcap_{n=1}^{\infty} \left( \bigcup_{m=n}^{\infty} A_m \right), \tag{1.19}$$

$$\liminf_n A_n = \bigcup_{n=1}^{\infty} \left( \bigcap_{m=n}^{\infty} A_m \right). \tag{1.20}$$

De Morgan's laws imply that $\liminf_n A_n = (\limsup_n A_n^c)^c$. The limsup is the set of elements contained in infinitely many of the $A_n$, while the liminf is the set belonging to all but a finite number of the $A_n$, that is, to every member of the sequence from some point onwards.

These concepts provide us with a criterion for convergence of set sequences in general. $\liminf_n A_n \subseteq \limsup_n A_n$, and if these two sets differ, there are elements that belong to infinitely many of the $A_n$, but also do not belong to infinitely many of them. Such a sequence is not convergent. On the other hand, if $\liminf_n A_n = \limsup_n A_n = A$, the elements of $A$ belong to infinitely many of the $A_n$ and do not belong to at most a finite number of them. Then the sequence $\{A_n\}$ is said to converge to $A$, and $A$ is called the limit of the sequence.
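Formulas (1.19) and (1.20) can be evaluated on a finite stretch of a sequence whose behaviour repeats, as a rough illustration of a non-convergent case. The sketch below is not a general algorithm: the function name `limsup_liminf` and the rule of using only the first half of the stretch as starting points for the tails are assumptions, adequate here because the example sequence is periodic.

```python
def limsup_liminf(sets):
    """Approximate limsup and liminf from a finite stretch A_1, ..., A_N.

    Only tails starting in the first half of the stretch are used, so that
    each tail is still long enough to show the recurring behaviour."""
    starts = range(len(sets) // 2)
    tail_unions = [set().union(*sets[n:]) for n in starts]          # B_n
    tail_intersections = [set.intersection(*sets[n:]) for n in starts]  # C_n
    limsup = set.intersection(*tail_unions)
    liminf = set().union(*tail_intersections)
    return limsup, liminf


# A_n alternates between {1, 2} (n odd) and {0, 1} (n even).
A = [{0, 1} if n % 2 == 0 else {1, 2} for n in range(1, 21)]
upper, lower = limsup_liminf(A)
```

The points 0 and 2 each belong to infinitely many of the $A_n$ but also fail to belong to infinitely many, so they are in the limsup and not the liminf, and the sequence does not converge.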

1.5 Classes of Subsets

The set of all the subsets of $X$ is called the power set of $X$, denoted $2^X$. The power set of a set with $n$ elements has $2^n$ elements, which accounts for its name and representation. In the case of a countable set, the power set is thought of formally as having $2^{\aleph_0}$ elements. One of the fundamental facts of set theory is that the number of subsets of a given set strictly exceeds the number of its elements. For finite sets this is obvious, but when extended to countable sets it amounts to the claim that $2^{\aleph_0} > \aleph_0$.

113 Theorem 2B0= c.

Proof The proposition is proved if we can show that $2^{\mathbb{N}}$ is equipotent with $\mathbb{R}$ or, equivalently (in view of 1.8), with the unit interval $[0,1]$. For a set $A \in 2^{\mathbb{N}}$, construct the sequence of binary digits $\{b_1, b_2, \ldots\}$ according to the rule, '$b_n = 1$ if $n \in A$, $b_n = 0$ otherwise'. Using formula (1.15) with $m = 1$ and $q = 2$, let this sequence define an element $x_A$ of $[0,1]$ (the case where $b_n = 1$ for all $n$ defines 1). On the other hand, for any element $x \in [0,1]$, construct the set $A_x \in 2^{\mathbb{N}}$ according to the rule, 'include $n$ in $A_x$ if and only if the $n$th digit in the binary expansion of $x$ is a 1'. These constructions define a 1-1 correspondence between $2^{\mathbb{N}}$ and $[0,1]$. ∎
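The two rules in the proof can be sketched in Python for finite sets of positive integers, whose binary expansions terminate. This is only an illustration under the assumption that the sequence $\{b_n\}$ defines the point $x_A = \sum_n b_n 2^{-n}$; the function names are hypothetical.

```python
from fractions import Fraction

def subset_to_point(A):
    """x_A = sum of 2**(-n) over n in A, for a finite set A of positive integers."""
    return sum((Fraction(1, 2 ** n) for n in A), Fraction(0))

def point_to_subset(x, n_digits):
    """Indices n <= n_digits at which the binary expansion of x (0 <= x < 1) has a 1."""
    A = set()
    for n in range(1, n_digits + 1):
        x *= 2              # shift the next binary digit in front of the point
        if x >= 1:
            A.add(n)        # the nth digit is 1, so n is included in A_x
            x -= 1
    return A

A = {1, 3, 4}
x = subset_to_point(A)      # 1/2 + 1/8 + 1/16
print(x)                    # 11/16
print(point_to_subset(x, 8) == A)   # True
```

Exact rational arithmetic is used so that the round trip from set to point and back is not disturbed by floating-point error.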

When studying the subsets of a given set, particularly their measure-theoretic properties, the power set is often too big for anything very interesting or useful to be said about it. The idea behind the following definitions is to specify subsets of $2^X$ that are large enough to be interesting, but whose characteristics may be more tractable. We typically do this by choosing a base collection of sets with known properties, and then specifying certain operations for creating new sets from existing ones. These operations permit an interesting diversity of class members to be generated, but important properties of the sets may be deduced from those of the base collection, as the following examples show.

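1.14 Definition A ring $\mathcal{R}$ is a nonempty class of subsets of $X$ satisfying
(a) $\emptyset \in \mathcal{R}$.
(b) If $A$ and $B \in \mathcal{R}$ then $A \cup B \in \mathcal{R}$, $A \cap B \in \mathcal{R}$ and $A - B \in \mathcal{R}$. □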


One generates a ring by specifying an arbitrary basic collection $\mathcal{C}$, which must include $\emptyset$, and then declaring that any sets that can be generated by the specified operations also belong to the class. A ring is said to be closed under the operations of union, intersection and difference.

Rings lack a crucial piece of structure, for there is no requirement for the set $X$ itself to be a member. If $X$ is included, a ring becomes a field, or synonymously an algebra. Since $X - A = A^c$, this amounts to including all complements, and, in view of the de Morgan laws, specifying the inclusion of intersections and differences becomes redundant.

1.15 Definition A field $\mathcal{F}$ is a class of subsets of $X$ satisfying
(a) $X \in \mathcal{F}$.
(b) If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$.
(c) If $A$ and $B \in \mathcal{F}$ then $A \cup B \in \mathcal{F}$. □

A field is said to be closed under complementation and finite union, and hence under intersections and differences too; none of these operations can take one outside the class.
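On a finite set the closure property can be checked mechanically. The sketch below, with a hypothetical function name and a small example chosen here for illustration, starts from a collection of subsets, repeatedly adds complements and pairwise unions until nothing new appears, and then confirms that intersections and differences never leave the class.

```python
from itertools import combinations

def generated_field(X, C):
    """Smallest class containing X and the sets of C that is closed
    under complementation and pairwise union (X finite)."""
    X = frozenset(X)
    field = {X} | {frozenset(A) for A in C}
    while True:
        new = {X - A for A in field}                       # complements
        new |= {A | B for A, B in combinations(field, 2)}  # pairwise unions
        if new <= field:                                   # nothing new: closed
            return field
        field |= new

X = {1, 2, 3, 4}
F = generated_field(X, [{1}, {1, 2}])

print(len(F))                                        # 8
print(all(A & B in F for A in F for B in F))         # True: closed under intersection
print(all(A - B in F for A in F for B in F))         # True: closed under difference
```

The eight sets are the unions of the three blocks {1}, {2} and {3, 4}, together with the empty set.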


These classes can be very complex, and also very trivial. The simplest case of a ring is $\{\emptyset\}$. The smallest possible field is $\{X, \emptyset\}$. Scarcely less trivial is the field $\{X, A, A^c, \emptyset\}$, where $A$ is any subset of $X$. What makes any class of sets interesting, or not, is the collection $\mathcal{C}$ of sets it is declared to contain, which we can think of as the 'seed' for the class. We speak of the smallest field containing $\mathcal{C}$ as 'the field generated by $\mathcal{C}$'.

Rings and fields are natural classes in the sense of being defined in terms of the simple set operations, but their structure is rather restrictive for some of the applications in probability. More inclusive definitions, carefully tailored to include some important cases, are as follows.

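1.16 Definition A semi-ring $\mathcal{S}$ is a non-empty class of subsets of $X$ satisfying
(a) $\emptyset \in \mathcal{S}$.
(b) If $A, B \in \mathcal{S}$ then $A \cap B \in \mathcal{S}$.
(c) If $A, B \in \mathcal{S}$ and $A \subseteq B$, $\exists\, n < \infty$ such that $B - A = \bigcup_{j=1}^n C_j$, where $C_j \in \mathcal{S}$ and $C_j \cap C_{j'} = \emptyset$ for each $j \neq j'$. □

More succinctly, condition (c) says that the difference of two $\mathcal{S}$-sets has a finite partition into $\mathcal{S}$-sets.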

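1.17 Definition A semi-algebra $\mathcal{S}$ is a class of subsets of $X$ satisfying
(a) $X \in \mathcal{S}$.
(b) If $A, B \in \mathcal{S}$ then $A \cap B \in \mathcal{S}$.
(c) If $A \in \mathcal{S}$, $\exists\, n < \infty$ such that $A^c = \bigcup_{j=1}^n C_j$, where $C_j \in \mathcal{S}$ and $C_j \cap C_{j'} = \emptyset$ for each $j \neq j'$. □

A semi-ring containing $X$ is a semi-algebra.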

1.18 Example Let $X = \mathbb{R}$, and consider the class of all the half-open intervals $I = (a,b]$ for $-\infty < a \le b < +\infty$, together with the empty set. If $I_1 = (a_1,b_1]$ and $I_2 = (a_2,b_2]$, then $I_1 \cap I_2$ is one of $I_1$, $I_2$, $(a_1,b_2]$, $(a_2,b_1]$, and $\emptyset$. And if $I_1 \subseteq I_2$ so that $a_2 \le a_1$ and $b_1 \le b_2$, then $I_2 - I_1$ is one of $\emptyset$, $(a_2,a_1]$, $(b_1,b_2]$, $(a_2,a_1] \cup (b_1,b_2]$, and $I_2$. The conditions defining a semi-ring are therefore satisfied, although not those defining a ring.

If we now let $\mathbb{R}$ be a member of the class and follow 1.17, we find that the half-open intervals, plus the unbounded intervals of the form $(-\infty,b]$ and $(a,+\infty)$, plus $\emptyset$ and $\mathbb{R}$, constitute a semi-algebra. □
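The case analysis in the example can be mirrored in a few lines of Python. This is a sketch under stated assumptions: an interval $(a,b]$ with $a < b$ is represented by the tuple `(a, b)`, the empty set by `None`, and the function names are illustrative only.

```python
def intersect(I1, I2):
    """Intersection of half-open intervals (a, b]; None stands for the empty set."""
    if I1 is None or I2 is None:
        return None
    a, b = max(I1[0], I2[0]), min(I1[1], I2[1])
    return (a, b) if a < b else None

def difference(I2, I1):
    """I2 - I1 for I1 contained in a non-empty I2, returned as a list of
    disjoint half-open intervals (an empty list stands for the empty set)."""
    if I1 is None:
        return [I2]
    (a1, b1), (a2, b2) = I1, I2
    assert a2 <= a1 and b1 <= b2, "I1 must be a subset of I2"
    pieces = [(a2, a1), (b1, b2)]           # left and right remainders
    return [(a, b) for a, b in pieces if a < b]

print(intersect((0, 2), (1, 3)))    # (1, 2)
print(intersect((0, 1), (1, 2)))    # None: (0,1] and (1,2] are disjoint
print(difference((0, 3), (1, 2)))   # [(0, 1), (2, 3)]
```

The last call shows why the class is a semi-ring but not a ring: the difference is a union of two members of the class, not itself a single half-open interval.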

1.6 Sigma Fields

When we say that a field contains the complements and finite unions, the qualifier finite deserves explanation. It is clear that $A_1, \ldots, A_n \in \mathcal{F}$ implies that $\bigcup_{j=1}^n A_j \in \mathcal{F}$ by simple $n$-fold iteration of pairwise union. But, given the constructive nature of the definition, it is not legitimate without a further stipulation to assume that such an operation can be taken to the limit. By making this additional stipulation, we obtain the concept of a $\sigma$-field.

1.19 Definition A $\sigma$-field ($\sigma$-algebra) $\mathcal{F}$ is a class of subsets of $X$ satisfying
(a) $X \in \mathcal{F}$.
(b) If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$.
(c) If $\{A_n, n \in \mathbb{N}\}$ is a sequence of $\mathcal{F}$-sets, then $\bigcup_{n=1}^\infty A_n \in \mathcal{F}$. □

A $\sigma$-field is closed under the operations of complementation and countable union, and hence, by the de Morgan laws, of countable intersection also. A $\sigma$-ring can be defined similarly, although this is not a concept we shall need in the sequel.

Given a collection of sets $\mathcal{C}$, the intersection of all the $\sigma$-fields containing $\mathcal{C}$ is called the $\sigma$-field generated by $\mathcal{C}$, customarily denoted $\sigma(\mathcal{C})$. The following theorem establishes a basic fact about $\sigma$-fields.

1.20 Theorem If $\mathcal{C}$ is a finite collection, $\sigma(\mathcal{C})$ is finite; otherwise $\sigma(\mathcal{C})$ is always uncountable.

Proof Define the relation $R$ between elements of $X$ by '$xRy$ iff $x$ and $y$ are elements of the same sets of $\mathcal{C}$'. $R$ is an equivalence relation, and hence defines an equivalence class $\mathcal{E}$ of disjoint subsets. Each set of $\mathcal{E}$ is the intersection of all the $\mathcal{C}$-sets containing its elements and the complements of the remainder. (For example, see Fig. 1.1. For this collection of regions of $\mathbb{R}^2$, $\mathcal{E}$ is the partition defined by the complete network of set boundaries.) If $\mathcal{C}$ contains $n$ sets, $\mathcal{E}$ contains at most $2^n$ sets and $\sigma(\mathcal{C})$, in this case the collection of all unions of $\mathcal{E}$-sets, contains at most $2^{2^n}$ sets. This proves the first part of the theorem.

Let $\mathcal{C}$ be infinite. If it is uncountable then so is $\sigma(\mathcal{C})$ and there is nothing more to show, so assume $\mathcal{C}$ is countable. In this case every set in $\mathcal{E}$ is a countable intersection of $\mathcal{C}$-sets or the complements of $\mathcal{C}$-sets, hence $\mathcal{E} \subseteq \sigma(\mathcal{C})$, and hence also $\mathcal{U}(\mathcal{E}) \subseteq \sigma(\mathcal{C})$, where $\mathcal{U}(\mathcal{E})$ is the collection of all the countable unions of $\mathcal{E}$-sets. If we show $\mathcal{U}(\mathcal{E})$ is uncountable, the same will be true of $\sigma(\mathcal{C})$. We may assume that $\mathcal{E}$ is countable, since otherwise there is nothing more to show. So let the sets of $\mathcal{E}$ be indexed by $\mathbb{N}$. Then every union of $\mathcal{E}$-sets corresponds uniquely with a subset of $\mathbb{N}$, and every subset of $\mathbb{N}$ corresponds uniquely to a union of $\mathcal{E}$-sets. In other words, the elements of $\mathcal{U}(\mathcal{E})$ are equipotent with those of $2^{\mathbb{N}}$, which are uncountable by 1.13. This completes the proof. ∎
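The first part of the proof is constructive and can be followed step by step on a finite set. In the sketch below (the names `atoms` and `sigma_field` and the example collection are assumptions made for illustration) points are grouped by the $\mathcal{C}$-sets that contain them, and $\sigma(\mathcal{C})$ is then formed as the collection of all unions of the resulting classes.

```python
from itertools import combinations

def atoms(X, C):
    """Partition of X into classes of points lying in exactly the same sets of C."""
    classes = {}
    for x in X:
        signature = tuple(x in A for A in C)      # which C-sets contain x
        classes.setdefault(signature, set()).add(x)
    return [frozenset(E) for E in classes.values()]

def sigma_field(X, C):
    """All unions of atoms; for a finite collection C this is sigma(C)."""
    E = atoms(X, C)
    return {frozenset().union(*combo)
            for k in range(len(E) + 1)
            for combo in combinations(E, k)}

X = range(8)
C = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7}]
E = atoms(X, C)
S = sigma_field(X, C)
n = len(C)

print(len(E), 2 ** n)            # 7 8    : at most 2^n atoms
print(len(S), 2 ** (2 ** n))     # 128 256: at most 2^(2^n) sets in sigma(C)
```

Here only seven of the eight possible membership patterns occur, so $\sigma(\mathcal{C})$ has $2^7$ members, within the bound given in the proof.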

1.21 Example Let $X = \mathbb{R}$, and let $\mathcal{C} = \{(-\infty,r],\ r \in \mathbb{Q}\}$, the collection of closed half-lines with rational endpoints. $\sigma(\mathcal{C})$ is called the Borel field of $\mathbb{R}$, generally denoted $\mathcal{B}$. A number of different base collections generate $\mathcal{B}$. Since countable intersections of open intervals can be closed intervals, and countable unions of closed intervals can be open intervals (compare 1.12), the set of open half-lines, $\{(-\infty,r),\ r \in \mathbb{Q}\}$, will also serve. Or, letting $\{r_n\}$ be a decreasing sequence of rational numbers with $r_n \downarrow x$,

$$(-\infty,x] = \bigcap_{n=1}^{\infty} (-\infty,r_n]. \qquad (1.21)$$

Such a sequence exists for any $x \in \mathbb{R}$ (see 2.15), and hence the same $\sigma$-field is generated by the (uncountable) collection of half-lines with real endpoints, $\{(-\infty,x],\ x \in \mathbb{R}\}$. It easily follows that various other collections generate $\mathcal{B}$, including the open intervals of $\mathbb{R}$, the closed intervals, and the half-open intervals. □

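1.22 Example Let $X = \bar{\mathbb{R}}$, the extended real line. The Borel field of $\bar{\mathbb{R}}$ is easily given. It is

$$\bar{\mathcal{B}} = \{B,\ B \cup \{+\infty\},\ B \cup \{-\infty\},\ B \cup \{+\infty\} \cup \{-\infty\}:\ B \in \mathcal{B}\},$$

where $\mathcal{B}$ is the Borel field of $\mathbb{R}$. You can verify that $\bar{\mathcal{B}}$ is a $\sigma$-field, and is generated by the collection $\mathcal{C}$ of 1.21 augmented by the sets $\{-\infty\}$ and $\{+\infty\}$. □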

1.23 Example Given an interval $I$ of the line, the class $\mathcal{B}_I = \{B \cap I:\ B \in \mathcal{B}\}$ is called the restriction of $\mathcal{B}$ to $I$, or the Borel field on $I$. In fact, $\mathcal{B}_I$ is the $\sigma$-field generated from the collection $\mathcal{C} = \{(-\infty,r] \cap I:\ r \in \mathbb{Q}\}$. □

Notice how $\sigma(\mathcal{C})$ has been defined 'from the outside'. It might be thought that $\sigma(\mathcal{C})$ could be defined 'from the inside', in terms of a specified sequence of the operations of complementation and countable union applied to the elements of $\mathcal{C}$. But, despite the constructive nature of the definitions, 1.20 suggests how this may be impossible. Suppose we define $\mathcal{A}_1$ as the set that contains $\mathcal{C}$, together with the complement of every set in $\mathcal{C}$ and all the finite and countable unions of the sets of $\mathcal{C}$. Of course, $\mathcal{A}_1$ is not $\sigma(\mathcal{C})$ because it does not contain the complements of the unions. So let $\mathcal{A}_2$ be the set containing $\mathcal{A}_1$ together with all the complements and finite and countable unions of the sets in $\mathcal{A}_1$. Defining $\mathcal{A}_3, \mathcal{A}_4, \ldots$ in the same manner, it might be thought that the monotone sequence $\{\mathcal{A}_n\}$ would approach $\sigma(\mathcal{C})$ as $n \to \infty$; but in fact this is not so. In the case of the class $\mathcal{B}_{[0,1)}$, for example, it can be shown that $\mathcal{A}_\infty$ is strictly smaller than $\sigma(\mathcal{C})$ (see Billingsley 1986: 26). On the other hand, $\sigma(\mathcal{C})$ may be smaller than $2^X$. This fact is demonstrated, again for $\mathcal{B}_{[0,1)}$, in §3.4.

The union of two $\sigma$-fields (the set of elements contained in either or both of them) is not generally a $\sigma$-field, for the unions of the sets from one field with those from the other are not guaranteed to belong to it. The concept of union for $\sigma$-fields is therefore extended by adding in these sets. Given $\sigma$-fields $\mathcal{F}$ and $\mathcal{G}$, the smallest $\sigma$-field containing all the elements of $\mathcal{F}$ and all the elements of $\mathcal{G}$ is denoted $\mathcal{F} \vee \mathcal{G}$, called the union of $\mathcal{F}$ and $\mathcal{G}$. On the other hand, $\mathcal{F} \cap \mathcal{G} = \{A:\ A \in \mathcal{F} \text{ and } A \in \mathcal{G}\}$ is a $\sigma$-field, although for uniformity the notation $\mathcal{F} \wedge \mathcal{G}$ may be used for such intersections. Formally, $\mathcal{F} \wedge \mathcal{G}$ denotes the largest of the $\sigma$-fields whose elements belong to both $\mathcal{F}$ and $\mathcal{G}$. Both of these operations generalize to the countable case, so that for a sequence of $\sigma$-fields $\mathcal{F}_n$, $n = 1,2,3,\ldots$ we may define $\bigvee_{n=1}^{\infty} \mathcal{F}_n$ and $\bigwedge_{n=1}^{\infty} \mathcal{F}_n$.
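A three-point example makes the first claim concrete. The following sketch uses a hypothetical helper `is_field` and relies on the fact that on a finite set every field is a $\sigma$-field; the two classes are chosen here purely for illustration.

```python
def is_field(X, F):
    """Check conditions (a)-(c) of the definition of a field on a finite set X."""
    X = frozenset(X)
    return (X in F
            and all(X - A in F for A in F)                  # complements
            and all(A | B in F for A in F for B in F))      # pairwise unions

X = frozenset({1, 2, 3})
F = {frozenset(), frozenset({1}), frozenset({2, 3}), X}
G = {frozenset(), frozenset({2}), frozenset({1, 3}), X}

print(is_field(X, F), is_field(X, G))   # True True
print(is_field(X, F | G))               # False: {1} and {2} are present, {1, 2} is not
print(is_field(X, F & G))               # True: the intersection is {empty set, X}
```

To obtain the class denoted $\mathcal{F} \vee \mathcal{G}$ one must add the missing unions and their complements, which in this example yields the whole power set of $X$.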

Without going prematurely into too many details, it can be said that a large part of the intellectual labour in probability and measure theory is devoted to proving that particular classes of sets are $\sigma$-fields. Problems of this kind will arise throughout this book. It is usually not too hard to show that $A^c \in \mathcal{F}$ whenever $A \in \mathcal{F}$, but the requirement to show that a class contains the countable unions can be tough to fulfil. The following material can be helpful in this connection.

A monotone class $\mathcal{M}$ is a class of sets such that, if $\{A_n\}$ is a monotone sequence with limit $A$, and $A_n \in \mathcal{M}$ for all $n$, then $A \in \mathcal{M}$. If $\{A_n\}$ is non-decreasing, then $A = \bigcup_{n=1}^{\infty} A_n$. If it is non-increasing, then $A = \bigcap_{n=1}^{\infty} A_n$. The next theorem shows that, to determine whether or not we are dealing with a $\sigma$-field, it is sufficient to consider whether the limits of monotone sequences belong to it, which should often be easier to establish than the general case.

1.24 Theorem $\mathcal{F}$ is a $\sigma$-field iff it is both a field and a monotone class.

Proof The 'only if' part of the theorem is immediate. For the 'if' part, define $A_n = \bigcup_{k=1}^n E_k$ for a sequence $\{E_k \in \mathcal{F}\}$. Since $\mathcal{F}$ is a field, $A_n \in \mathcal{F}$ for each $n$, and $\{A_n\}$ is a monotone sequence with limit $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$, by assumption. $\bigcup_{n=1}^{\infty} A_n = \bigcup_{k=1}^{\infty} E_k$, so the theorem follows. ∎

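Another useful trick is Dynkin's $\pi$-$\lambda$ theorem.$^1$ To develop this result, we define two new classes of subsets of $X$.

1.25 Definition A class $\mathcal{P}$ is a $\pi$-system if $A$ and $B \in \mathcal{P}$ implies $A \cap B \in \mathcal{P}$. A class $\mathcal{L}$ is a $\lambda$-system if
(a) $X \in \mathcal{L}$.
(b) If $A$ and $B \in \mathcal{L}$ and $B \subseteq A$, then $A - B \in \mathcal{L}$.
(c) If $\{A_n \in \mathcal{L}\}$ is a non-decreasing sequence and $A_n \uparrow A$, then $A \in \mathcal{L}$. □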

Conditions (a) and (b) imply that a $\lambda$-system is closed under complementation (put $A = X$). Moreover, since (b) implies that $B_n = A_{n+1} - A_n \in \mathcal{L}$ for each $n$, (c) implies that a countable union of disjoint $\mathcal{L}$-sets is in $\mathcal{L}$. In fact, these implications hold in both directions, and we have the following.

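1.26 Theorem A class $\mathcal{L}$ is a $\lambda$-system if and only if
(a) $X \in \mathcal{L}$.
(b) If $B \in \mathcal{L}$ then $B^c \in \mathcal{L}$.
(c) If $\{A_n \in \mathcal{L}\}$ is a disjoint sequence, then $\bigcup_n A_n \in \mathcal{L}$. □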

In particular, a $\sigma$-field is a $\lambda$-system, and moreover, a class that is both a $\pi$-system and a $\lambda$-system is a $\sigma$-field. This follows by 1.24, because a $\lambda$-system is a monotone class by 1.25(c), and by de Morgan's laws is closed under unions if closed under both intersections and complementation.
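That the two properties are genuinely distinct can be seen on a four-point set. The sketch below is an illustration chosen here, not an example from the text: the class of subsets with an even number of elements satisfies the three conditions of 1.26 but is not closed under intersection. On a finite set, checking pairwise disjoint unions suffices for condition (c).

```python
from itertools import combinations

X = frozenset({1, 2, 3, 4})
subsets = [frozenset(c) for k in range(5) for c in combinations(X, k)]
L = {A for A in subsets if len(A) % 2 == 0}      # subsets of even size

contains_X = X in L                                              # 1.26(a)
closed_complement = all(X - B in L for B in L)                   # 1.26(b)
closed_disjoint_union = all(A | B in L                           # 1.26(c)
                            for A in L for B in L if not (A & B))
is_pi_system = all(A & B in L for A in L for B in L)

print(contains_X, closed_complement, closed_disjoint_union)   # True True True
print(is_pi_system)   # False: {1, 2} and {2, 3} intersect in {2}
```

Since this class is a $\lambda$-system but not a $\pi$-system, it is not a $\sigma$-field, consistent with the statement above that both properties are needed.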

The following result makes these definitions useful.

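1.27 Dynkin's $\pi$-$\lambda$ theorem (Billingsley 1979, 1986: th. 3.2) If $\mathcal{P}$ is a $\pi$-system, $\mathcal{L}$ is a $\lambda$-system, and $\mathcal{P} \subseteq \mathcal{L}$, then $\sigma(\mathcal{P}) \subseteq \mathcal{L}$.

Proof Let $\lambda(\mathcal{P})$ denote the smallest $\lambda$-system containing $\mathcal{P}$ (the intersection of all the $\lambda$-systems containing $\mathcal{P}$), so that in particular, $\lambda(\mathcal{P}) \subseteq \mathcal{L}$. We show that $\lambda(\mathcal{P})$ is a $\pi$-system. By the remarks above, it will then follow that $\lambda(\mathcal{P})$ is a $\sigma$-field, and hence that $\sigma(\mathcal{P}) \subseteq \lambda(\mathcal{P}) \subseteq \mathcal{L}$, as required.

For a set $A \in \lambda(\mathcal{P})$, let $\mathcal{G}_A$ denote the class of sets $B$ such that $A \cap B \in \lambda(\mathcal{P})$. We shall show that $\mathcal{G}_A$ is a $\lambda$-system. Clearly, $X \in \mathcal{G}_A$, so that condition 1.25(a) is satisfied. Let $B_1, B_2 \in \mathcal{G}_A$, and $B_1 \subseteq B_2$; then $A \cap B_1 \in \lambda(\mathcal{P})$ and $A \cap B_2 \in \lambda(\mathcal{P})$, and $(A \cap B_1) \subseteq (A \cap B_2)$, which implies that

$$(A \cap B_2) - (A \cap B_1) = A \cap (B_2 - B_1) \in \lambda(\mathcal{P}). \qquad (1.22)$$

But this means that $B_2 - B_1 \in \mathcal{G}_A$, and condition 1.25(b) is satisfied. Lastly, suppose $A \cap B_i \in \lambda(\mathcal{P})$ for each $i = 1,2,\ldots$ and $B_i \uparrow B$. Then $A \cap B \in \lambda(\mathcal{P})$ by 1.25(c), which means that 1.25(c) holds for $\mathcal{G}_A$, and $\mathcal{G}_A$ is a $\lambda$-system as asserted.

Suppose $A \in \mathcal{P}$. Then $B \in \mathcal{P}$ implies $A \cap B \in \mathcal{P}$ ($\mathcal{P}$ is a $\pi$-system) and since $\mathcal{P} \subseteq \lambda(\mathcal{P})$, this further implies $B \in \mathcal{G}_A$. Hence $\mathcal{P} \subseteq \mathcal{G}_A$. Since $\mathcal{G}_A$ is a $\lambda$-system, and $\lambda(\mathcal{P})$ is the smallest $\lambda$-system containing $\mathcal{P}$, we also have $\lambda(\mathcal{P}) \subseteq \mathcal{G}_A$ in this case. So, when $A \in \mathcal{P}$, $B \in \lambda(\mathcal{P})$ implies $B \in \mathcal{G}_A$ and hence $A \cap B \in \lambda(\mathcal{P})$.

We can summarize the last conclusion as:

$$(A \in \mathcal{P},\ B \in \lambda(\mathcal{P})) \Rightarrow (A \cap B \in \lambda(\mathcal{P})). \qquad (1.23)$$

Now defining $\mathcal{G}_B$ by analogy with $\mathcal{G}_A$, so that

$$(B \in \lambda(\mathcal{P}),\ A \cap B \in \lambda(\mathcal{P})) \Rightarrow (A \in \mathcal{G}_B), \qquad (1.24)$$

we see that (1.23) and (1.24) together yield $\mathcal{P} \subseteq \mathcal{G}_B$. Since $\mathcal{G}_B$ is also a $\lambda$-system by the same argument as held for $\mathcal{G}_A$, and contains $\mathcal{P}$, $\lambda(\mathcal{P}) \subseteq \mathcal{G}_B$ by definition of $\lambda(\mathcal{P})$. Thus, suppose $B \in \lambda(\mathcal{P})$ and $C \in \lambda(\mathcal{P})$. Then $C \in \mathcal{G}_B$, which means that $B \cap C \in \lambda(\mathcal{P})$. So $\lambda(\mathcal{P})$ is a $\pi$-system as required. ∎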

2 Limits and Continuity

2.1 The Topology of the Real Line

The purpose of this section is to treat rigorously the idea of 'nearness', as it applies to points of the line. The key ingredient of the theory is the distance between a pair of points, $x, y \in \mathbb{R}$, defined as the non-negative real number $|x - y|$, what is formally called the Euclidean distance. In Chapters 5 and 6 we examine the generalization of this theory to non-Euclidean spaces, and find not only that most aspects of the theory have a natural generalization, but that the concept of distance itself can be dispensed with in their development. We are really studying a special case of a very powerful general theory. This fact may be helpful in making sense of certain ideas, the definition of compactness for example, which can otherwise appear a little puzzling at first sight.

An $\varepsilon$-neighbourhood of a point $x \in \mathbb{R}$ is a set $S(x,\varepsilon) = \{y:\ |x - y| < \varepsilon\}$, for some $\varepsilon > 0$. An open set is a set $A \subseteq \mathbb{R}$ such that for each $x \in A$, there exists for some $\varepsilon > 0$ an $\varepsilon$-neighbourhood which is a subset of $A$. The open intervals defined in §1.3 are open sets since if $a < x < b$, $\varepsilon = \min\{|b - x|, |a - x|\} > 0$ satisfies the definition. $\mathbb{R}$ and $\emptyset$ are also open sets on the definition.

The concept of an open set is subtle and often gives beginners some difficulty. Naive intuition strongly favours the notion that in any bounded set of points there ought to be one that is 'next to' a point outside the set. But open sets are sets that do not have this property, and there is no shortage of them in $\mathbb{R}$. For a complete understanding of the issues involved we need the additional concepts of Cauchy sequence and limit, to appear in §2.2 below. Doubters are invited to suspend their disbelief for the moment and just take the definition at face value.

The collection of all the open sets of $\mathbb{R}$ is known as the topology of $\mathbb{R}$. More precisely, we ought to call this the usual topology on $\mathbb{R}$, since other ways of defining open sets of $\mathbb{R}$ can be devised, although these will not concern us. (See Chapter 6 for more information on these matters.) More generally, we can discuss subsets of $\mathbb{R}$ from a topological standpoint, although we would tend to use the term subspace rather than subset in this context. If $A \subseteq \mathbb{S} \subseteq \mathbb{R}$, we say that $A$ is open in $\mathbb{S}$ if for each $x \in A$ there exists $S(x,\varepsilon)$, $\varepsilon > 0$, such that $S(x,\varepsilon) \cap \mathbb{S}$ is a subset of $A$. Thus, the interval $[0,\tfrac{1}{2})$ is not open in $\mathbb{R}$, but it is open in $[0,1]$. These sets define the relative topology on $\mathbb{S}$, that is, the topology on $\mathbb{S}$ relative to $\mathbb{R}$. The following result is an immediate consequence of the definition.

2.1 Theorem If $A$ is open in $\mathbb{R}$, $A \cap \mathbb{S}$ is open in the relative topology on $\mathbb{S}$. □

A closure point of a set $A$ is a point $x \in \mathbb{R}$ such that, for every $\varepsilon > 0$, the set $A \cap S(x,\varepsilon)$ is not empty. The closure points of $A$ are not necessarily elements of $A$, open sets being a case in point. The set of closure points of $A$ is called the closure of $A$, and will be denoted $\bar{A}$ or sometimes $(A)^-$ if the set is defined by an expression. On the other hand, an accumulation point of $A$ is a point $x \in \mathbb{R}$ which is a closure point of the set $A - \{x\}$. An accumulation point has other points of $A$ arbitrarily close to it, and if $x$ is a closure point of $A$ and $x \notin A$, it must also be an accumulation point. A closure point that is not an accumulation point (the former definition being satisfied because each $\varepsilon$-neighbourhood of $x$ contains $x$ itself) is an isolated point of $A$.

A boundary point of a set $A$ is a point $x \in \bar{A}$ such that the set $A^c \cap S(x,\varepsilon)$ is not empty for any $\varepsilon > 0$. The set of boundary points of $A$ is denoted $\partial A$, and $\bar{A} = A \cup \partial A$. The interior of $A$ is the set $A^o = A - \partial A$. A closed set is one containing all its closure points, i.e. a set $A$ such that $A = \bar{A}$. For an open interval $A = (a,b) \subset \mathbb{R}$, $\bar{A} = [a,b]$. Every point of $(a,b)$ is a closure point, and $a$ and $b$ are also closure points, not belonging to $(a,b)$. They are the boundary points of both $(a,b)$ and $[a,b]$.

2.2 Theorem The complement of an open set in $\mathbb{R}$ is a closed set. □

This gives an alternative definition of a closed set. According to the definitions, $\emptyset$ (the empty set) and $\mathbb{R}$ are both open and closed. The half-line $(-\infty,x]$ is the complement of the open set $(x,+\infty)$ and is hence closed. Extending this result to relative topologies, we have the following.

2.3 Theorem If $A$ is open in $\mathbb{S} \subseteq \mathbb{R}$, then $\mathbb{S} - A$ is closed in $\mathbb{S}$. □

In particular, a corollary to 2.1 is that if $B$ is closed in $\mathbb{R}$ then $\mathbb{S} \cap B$ is closed in $\mathbb{S}$. But, for example, the interval $[\tfrac{1}{2},1)$ is not closed in $\mathbb{R}$, although it is closed in the set $(0,1)$, since its complement $(0,\tfrac{1}{2})$ is open in $(0,1)$.

Some additional properties of open sets are given in the following theorems.

2.4 Theorem
(i) The union of a collection of open sets is open.
(ii) If $A$ and $B$ are open, then $A \cap B$ is open. □

This result will be proved in a more general context below, as 5.4. Arbitrary intersections of open sets need not be open. See 1.12 for a counter-example.

2.5 Theorem Every open set $A \subseteq \mathbb{R}$ is the union of a countable collection of disjoint open intervals.

Proof Consider a collection $\{S(x,\varepsilon_x),\ x \in A\}$, where for each $x$, $\varepsilon_x > 0$ is chosen small enough that $S(x,\varepsilon_x) \subseteq A$. Then $\bigcup_{x \in A} S(x,\varepsilon_x) \subseteq A$, but, since necessarily $A \subseteq \bigcup_{x \in A} S(x,\varepsilon_x)$, it follows that $\bigcup_{x \in A} S(x,\varepsilon_x) = A$. This shows that $A$ is a union of open intervals.

Now define a relation $R$ for elements of $A$, such that $xRy$ if there exists an open interval $I \subseteq A$ with $x \in I$ and $y \in I$. Every $x \in A$ is contained in some interval by the preceding argument, so that $xRx$ for all $x \in A$. The symmetry of $R$ is obvious. Lastly, if $x, y \in I \subseteq A$ and $y, z \in I' \subseteq A$, $I \cap I'$ is nonempty and hence $I \cup I'$ is also an open interval, so $R$ is transitive. Hence $R$ is an equivalence relation, and the intervals $I$ are an equivalence class partitioning $A$. Thus, $A$ is a union of disjoint open intervals. The theorem now follows from 1.11. ∎
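For an open set given as a union of finitely many open intervals, the partition into disjoint intervals described in the proof can be computed directly. The sketch below covers only this finite case and uses a hypothetical function name; intervals $(a,b)$ are represented as tuples `(a, b)`.

```python
def disjoint_components(intervals):
    """Merge open intervals (a, b) into the disjoint open intervals
    that have the same union."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            # (a, b) shares a point with the last component, so their
            # union is again an open interval.
            merged[-1][1] = max(merged[-1][1], b)
        else:
            # No common point: start a new component.
            merged.append([a, b])
    return [tuple(I) for I in merged]

print(disjoint_components([(0, 1), (0.5, 2), (2, 3), (5, 6), (2.5, 2.7)]))
# [(0, 2), (2, 3), (5, 6)]
```

The comparison is strict because $(0,2)$ and $(2,3)$ have no point in common: the point 2 belongs to neither, so they remain separate components.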

Recall from 1.21 that $\mathcal{B}$, the Borel field of $\mathbb{R}$, is the $\sigma$-field of sets generated by both the open and the closed half-lines. Since every interval is the intersection of a half-line (open or closed) with the complement of another half-line, 2.2 and 2.5 yield directly the following important fact.

2.6 Theorem $\mathcal{B}$ contains the open sets and the closed sets of $\mathbb{R}$. □

A collection $\mathcal{C}$ is called a covering for a set $A \subseteq \mathbb{R}$ if $A \subseteq \bigcup_{B \in \mathcal{C}} B$. If each $B$ is an open set, it is called an open covering.

2.7 Lindelöf's covering theorem If $\mathcal{C}$ is any collection of open subsets of $\mathbb{R}$, there is a countable subcollection $\{B_i \in \mathcal{C},\ i \in \mathbb{N}\}$ such that

$$\bigcup_{B \in \mathcal{C}} B = \bigcup_{i=1}^{\infty} B_i.$$

Proof Consider the collection $\mathcal{S} = \{S_k = S(r_k,s_k),\ r_k \in \mathbb{Q},\ s_k \in \mathbb{Q}^+\}$; that is, the collection of all neighbourhoods of rational points of $\mathbb{R}$, having rational radii. The set $\mathbb{Q}\times\mathbb{Q}^+$ is countable by 1.5, and hence $\mathcal{S}$ is countable; in other words, indexing by $k \in \mathbb{N}$ exhausts the set. We show that, for any open set $B \subseteq \mathbb{R}$ and point $x \in B$, there is a set $S_k \in \mathcal{S}$ such that $x \in S_k \subseteq B$. Since $x$ has an $\varepsilon$-neighbourhood inside $B$ by definition, the desired $S_k$ is found by setting $s_k$ to any rational from the open interval $(0,\tfrac12\varepsilon)$, for $\varepsilon > 0$ sufficiently small, and then choosing $r_k \in S(x,s_k)$ as is possible by 1.10.

Now for each $x \in \bigcup_{B\in\mathcal{C}} B$ choose a member of $\mathcal{S}$, say $S_{k(x)}$, satisfying $x \in S_{k(x)} \subseteq B$ for some $B \in \mathcal{C}$. Letting $k(x)$ be the smallest index which satisfies the requirement gives an unambiguous choice. The distinct members of this collection form a set that covers $\bigcup_{B\in\mathcal{C}} B$, but is a subset of $\mathcal{S}$ and hence countable. Labelling the indices of this set as $k_1, k_2, \ldots$, choose $B_i$ as any member of $\mathcal{C}$ containing $S_{k_i}$. Clearly, $\bigcup_{i=1}^{\infty} B_i$ is a countable covering for $\bigcup_{i=1}^{\infty} S_{k_i}$, and hence also for $\bigcup_{B\in\mathcal{C}} B$. ■

It follows that, if $\mathcal{C}$ is an open covering for a set in $\mathbb{R}$, it contains a countable subcovering. This is sometimes called the Lindelöf property.

The concept of a covering leads on to the crucial notion of compactness. A set $A$ is said to be compact if every open covering of $A$ contains a finite subcovering. The words that matter in this definition are 'every' and 'open'. Any open covering that has $\mathbb{R}$ as a member obviously contains a finite subcovering. But for a set to be compact, there must be no way to construct an irreducible, infinite open covering. Moreover, every interval has an irreducible infinite cover, consisting of the singleton sets of its individual points; but these sets are not open.

2.8 Example Consider the half-open interval $(0,1]$. An open covering is the countable collection $\{(1/n,1],\ n \in \mathbb{N}\}$. It is easy to see that there is no finite subcollection covering $(0,1]$ in this case, so $(0,1]$ is not compact. □
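The failure of every finite subcollection in 2.8 can be checked mechanically. The following is a minimal sketch, not part of the original text; the function names `uncovered_point` and `is_covered` are illustrative assumptions. It uses the fact that the union of the sets $(1/n,1]$ over a finite index set is $(1/n_{\max},1]$, which leaves out the point $1/n_{\max}$.

```python
from fractions import Fraction

def uncovered_point(indices):
    """Return a point of (0, 1] missed by the finite family {(1/n, 1] : n in indices}."""
    n_max = max(indices)
    # The union of the finite family is (1/n_max, 1], so 1/n_max itself is left out.
    return Fraction(1, n_max)

def is_covered(x, indices):
    """True if x lies in (1/n, 1] for at least one n in indices."""
    return any(Fraction(1, n) < x <= 1 for n in indices)

finite_family = [1, 2, 5, 40]          # an arbitrary finite choice of indices
x = uncovered_point(finite_family)
print(x, is_covered(x, finite_family))
```

Adding one more set with a larger index covers the missed point but exposes a new one, which is why no finite subcollection suffices.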

A set $A$ is bounded if $A \subseteq S(x,\varepsilon)$ for some $x \in A$ and $\varepsilon > 0$. The idea here is that $\varepsilon$ is a possibly large but finite number. In other words, a bounded set must be containable within a finite interval.

2.9 Theorem A set in $\mathbb{R}$ is compact iff it is closed and bounded. □

This can be proved as a case of 5.12 below, and provides an alternative definition of compactness in $\mathbb{R}$. The sufficiency part is known as the Heine-Borel theorem.

A subset $B$ of $A$ is said to be dense in $A$ if $B \subseteq A \subseteq \bar{B}$. Readers may think they know what is implied here after studying the following theorem, but denseness is a slightly tricky notion. See also 2.15 and the remarks following before coming to any premature conclusions.

2.10 Theorem Let $A$ be an interval of $\mathbb{R}$, and $C \subset A$ be a countable set. Then $A - C$ is dense in $A$.

Proof By 1.7, each neighbourhood of a point in $A$ contains an uncountable number of points. Hence for each $x \in A$ (whether or not $x \in C$), the set $(A - C) \cap S(x,\varepsilon)$ is not empty for every $\varepsilon > 0$, so that $x$ is a closure point of $A - C$. Thus, $A - C \subseteq (A - C) \cup C = A \subseteq \overline{(A - C)}$. ■

The $k$-fold Cartesian product of $\mathbb{R}$ with copies of itself generates what is called Euclidean $k$-space, $\mathbb{R}^k$. The points of $\mathbb{R}^k$ have the interpretation of $k$-vectors, or ordered $k$-tuples of real numbers, $\boldsymbol{x} = (x_1,x_2,\ldots,x_k)'$. All the concepts defined above for sets in $\mathbb{R}$ generalize directly to $\mathbb{R}^k$. The only modification required is to replace the scalars $x$ and $y$ by vectors $\boldsymbol{x}$ and $\boldsymbol{y}$, and define an $\varepsilon$-neighbourhood in a new way. Let $\|\boldsymbol{x}-\boldsymbol{y}\|$ be the Euclidean distance between $\boldsymbol{x}$ and $\boldsymbol{y}$, where $\|\boldsymbol{a}\| = \bigl[\sum_{i=1}^k a_i^2\bigr]^{1/2}$ is the length of the vector $\boldsymbol{a} = (a_1,\ldots,a_k)'$, and then define $S(\boldsymbol{x},\varepsilon) = \{\boldsymbol{y}: \|\boldsymbol{x}-\boldsymbol{y}\| < \varepsilon\}$, for some $\varepsilon > 0$. An open set $A$ of $\mathbb{R}^2$ is one in which every point $\boldsymbol{x} \in A$ can be contained in an open disk with positive radius centred on $\boldsymbol{x}$. In $\mathbb{R}^3$ the open disk becomes an open sphere, and so on.

2.2 Sequences and Limits

A real sequence is a mapping from $\mathbb{N}$ into $\mathbb{R}$. The elements of the domain are called the indices and those of the range variously the terms, members, or coordinates of the sequence. We will denote a sequence either by $\{x_n,\ n \in \mathbb{N}\}$, or more briefly by $\{x_n\}_1^\infty$, or just by $\{x_n\}$ when the context is clear.

$\{x_n\}_1^\infty$ is said to converge to a limit $x$, if for every $\varepsilon > 0$ there is an integer $N_\varepsilon$ for which

$$|x_n - x| < \varepsilon \ \text{ for all } n > N_\varepsilon. \tag{2.2}$$

Write $x_n \to x$, or $x = \lim_{n\to\infty} x_n$. When a sequence is tending to $+\infty$ or $-\infty$ it is often said to diverge, but it may also be said to converge in $\bar{\mathbb{R}}$, to distinguish those cases when it does not approach any fixed value, but is always wandering.

A sequence is monotone (non-decreasing, increasing, non-increasing, or decreasing) if one of the inequalities $x_n \le x_{n+1}$, $x_n < x_{n+1}$, $x_n \ge x_{n+1}$, or $x_n > x_{n+1}$ holds for every $n$. To indicate that a monotone sequence is converging, one may write for emphasis either $x_n \uparrow x$ or $x_n \downarrow x$, as appropriate, although $x_n \to x$ will also do in both cases. The following result does not require elaboration.

2.11 Theorem Every monotone sequence in a compact set converges. □

A sequence that does not converge may none the less visit the same point an infinite number of times, so exhibiting a kind of convergent behaviour. If $\{x_n,\ n \in \mathbb{N}\}$ is a real sequence, a subsequence is $\{x_{n_k},\ k \in \mathbb{N}\}$ where $\{n_k,\ k \in \mathbb{N}\}$ is any increasing sequence of positive integers. If there exists a subsequence $\{x_{n_k},\ k \in \mathbb{N}\}$ and a constant $c$ such that $x_{n_k} \to c$, $c$ is called a cluster point of the sequence. For example, the sequence $\{(-1)^n,\ n = 1,2,3,\ldots\}$ does not converge, but the subsequence obtained by taking only even values of $n$ converges trivially. $c$ is usually a finite constant, but $+\infty$ and $-\infty$ may be cluster points of a sequence if we allow the notion of convergence in $\bar{\mathbb{R}}$. If a subsequence is convergent, then so is any subsequence of the subsequence, defined as $\{x_{m_k},\ k \in \mathbb{N}\}$ where $\{m_k\}$ is an increasing sequence whose members are also members of $\{n_k\}$.

The concept of a subsequence is often useful in arguments concerning convergence. A typical line of reasoning employs a two-pronged attack: first one identifies a convergent subsequence (a monotone sequence, perhaps); then one uses other characteristics of the sequence to show that the cluster point is actually a limit. Especially useful in this connection is the knowledge that the members of the sequence are points in a compact set. Such sequences cannot diverge to infinity, since the set is bounded; and because the set is closed, any limit points or cluster points that exist must be in the set. Specifically, we have two useful results.

2.12 Theorem Every sequence in a compact set of R has at least one cluster point.

Proof A monotone sequence converges in a compact set by 2.11. We show that every sequence $\{x_n,\ n \in \mathbb{N}\}$ has a monotone subsequence. Define a subsequence $\{x_{n_k}\}$ as follows. Set $n_1 = 1$, and for $k = 1,2,3,\ldots$ let $x_{n_{k+1}} = \sup_{n > n_k} x_n$ if there exists a finite $n_{k+1}$ satisfying this condition; otherwise let the subsequence terminate at $n_k$. This subsequence is non-increasing. If it terminates, the subsequence $\{x_n,\ n \ge n_k\}$ must contain a non-decreasing subsequence. A monotone subsequence therefore exists in every case. ■

2.13 Theorem A sequence in a compact set either has two or more cluster points,or it converges.

Proof Suppose that $c$ is the unique cluster point of the sequence $\{x_n\}$, but that $x_n \not\to c$. Then there is an infinite set of integers $\{n_k,\ k \in \mathbb{N}\}$ such that $|x_{n_k} - c| \ge \varepsilon$ for some $\varepsilon > 0$. Define a sequence $\{y_k\}$ by setting $y_k = x_{n_k}$. Since $\{y_k\}$ is also a sequence on a compact set, it has a cluster point $c'$ which by construction is different from $c$. But $c'$ is also a cluster point of $\{x_n\}$, of which $\{y_k\}$ is a subsequence, which is a contradiction. Hence, $x_n \to c$. ■

2.14 Example Consider the sequence $\{1, x, x^2, x^3, \ldots, x^n, \ldots\}$, or more formally $\{x^n,\ n \in \mathbb{N}_0\}$, where $x$ is a real number. In the case $|x| < 1$, this sequence converges to zero, $\{|x^n|\}$ being monotone on the compact interval $[0,1]$. The condition specified in (2.2) is satisfied for $N_\varepsilon = \log(\varepsilon)/\log|x|$ in this case. If $x = 1$ it converges to 1, trivially. If $x > 1$ it diverges in $\mathbb{R}$, but converges in $\bar{\mathbb{R}}$ to $+\infty$. If $x = -1$ it neither converges nor diverges, but oscillates between cluster points $+1$ and $-1$. Finally, if $x < -1$ the sequence diverges in $\mathbb{R}$, but does not converge in $\bar{\mathbb{R}}$. Ultimately, it oscillates between the cluster points $+\infty$ and $-\infty$. □
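The threshold $N_\varepsilon = \log(\varepsilon)/\log|x|$ quoted in 2.14 can be checked numerically. The sketch below is illustrative only; the function name `n_epsilon` and the particular values of `x` and `eps` are assumptions made for the example, and it presumes $0 < |x| < 1$ and $0 < \varepsilon < 1$.

```python
import math

def n_epsilon(x, eps):
    """Threshold N = log(eps)/log|x| for the sequence x**n with 0 < |x| < 1 and 0 < eps < 1."""
    return math.log(eps) / math.log(abs(x))

x, eps = -0.8, 1e-3
N = n_epsilon(x, eps)
first_n = math.floor(N) + 1   # smallest integer n with n > N
# Beyond the threshold every term lies inside the eps-neighbourhood of zero.
tail_ok = all(abs(x ** n) < eps for n in range(first_n, first_n + 200))
print(N, first_n, tail_ok)
```

Because $\log|x| < 0$, the inequality $n > N_\varepsilon$ is the same as $n\log|x| < \log\varepsilon$, that is, $|x|^n < \varepsilon$.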

We may discuss the asymptotic behaviour of a real sequence even when it has no limit. The superior limit of a sequence $\{x_n\}$ is

$$\limsup_n x_n = \inf_n \sup_{m \ge n} x_m. \tag{2.3}$$

(Alternative notation: $\overline{\lim}_n\, x_n$.) The limsup is the eventual upper bound of a sequence. Think of $\{\sup_{m\ge n} x_m,\ n = 1,2,\ldots\}$ as the sequence of the largest values the sequence takes beyond the point $n$. This may be $+\infty$ for every $n$, but in all cases it must be a non-increasing sequence having a limit, either $+\infty$ or a finite real number; this limit is the limsup of the sequence. A link with the corresponding concept for set sequences is that if $x_n = \sup A_n$ for some sequence of sets $\{A_n \subseteq \mathbb{R}\}$, then $\limsup x_n = \sup A$, where $A = \limsup_n A_n$. The inferior limit is defined likewise, as the eventual lower bound:

$$\liminf_n x_n = -\limsup_n(-x_n) = \sup_n \inf_{m \ge n} x_m, \tag{2.4}$$

also written $\underline{\lim}_n\, x_n$. Necessarily, $\liminf_n x_n \le \limsup_n x_n$. When the limsup and liminf of a sequence are equal the sequence is convergent, and the limit is equal to their common value. If both equal $+\infty$, or $-\infty$, the sequence converges in $\bar{\mathbb{R}}$.

The usual application of these concepts is in arguments to establish the value of a limit. It may not be permissible to assume the existence of the limit, but the limsup and liminf always exist. The trick is to derive these and show them to be equal. For this purpose, it is sufficient in view of the above inequality to show $\liminf_n x_n \ge \limsup_n x_n$. We often use this type of argument in the sequel.
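A rough numerical picture of (2.3) and (2.4) may help. The sketch below is an illustration under stated assumptions: the sequence $x_n = (-1)^n(1+1/n)$ and the helper `tail_sup_inf` are chosen here for the example, and the supremum over all $m \ge n$ is replaced by a maximum over a long but finite stretch of the tail, so the output only approximates the true tail bounds.

```python
def tail_sup_inf(x, n, horizon):
    """Sup and inf of x(m) over n <= m < n + horizon (a finite stand-in for m >= n)."""
    tail = [x(m) for m in range(n, n + horizon)]
    return max(tail), min(tail)

def x(n):
    # Oscillating sequence with cluster points +1 and -1 and no limit.
    return (-1) ** n * (1 + 1 / n)

for n in (1, 10, 100, 1000):
    sup_n, inf_n = tail_sup_inf(x, n, horizon=5000)
    print(n, sup_n, inf_n)
```

The tail suprema decrease towards 1 and the tail infima increase towards $-1$, so for this sequence the limsup and liminf differ and no limit exists.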

To determine whether a sequence converges it is not necessary to know what the limit is; the relationship between sequence coordinates 'in the tail' (as $n$ becomes large) is sufficient for this purpose. The Cauchy criterion for convergence of a real sequence states that $\{x_n\}$ converges iff for every $\varepsilon > 0$ there exists $N_\varepsilon$ such that $|x_n - x_m| < \varepsilon$ whenever $n > N_\varepsilon$ and $m > N_\varepsilon$. A sequence satisfying this criterion is called a Cauchy sequence. Any sequence satisfying (2.2) is a Cauchy sequence, and conversely, a real Cauchy sequence must possess a limit in $\mathbb{R}$. The two definitions are therefore equivalent (in $\mathbb{R}$, at least), but the Cauchy condition may be easier to verify in practice.

The limit of a Cauchy sequence whose members all belong to a set $A$ is by definition a closure point of $A$, though it need not itself belong to $A$. Conversely, for every accumulation point $x$ of a set $A$ there must exist a Cauchy sequence in the set whose limit is $x$. Construct such a sequence by taking one point from each of the sequence of sets,

$$\{A \cap S(x,1/n),\ n = 1,2,3,\ldots\},$$

none of which are empty by definition. The term limit point is sometimes used synonymously with accumulation point.

The following is a fundamental property of the reals.


2.15 Theorem Every real number is the limit of a Cauchy sequence of rationals.

Proof For finite $n$ let $x_n$ be a number whose decimal expansion consists only of zeros beyond the $n$th place in the sequence. If the decimal point appears at position $m$, with $m > n$, then $x_n$ is an integer. If $m \le n$, removing the decimal point produces a finite integer $a$, and $x_n = a/10^{n-m}$, so $x_n$ is rational. Given any real $x$, a sequence of rationals $\{x_n\}$ is obtained by replacing with a zero every digit in the decimal expansion of $x$ beyond the $n$th, for $n = 1,2,\ldots$ Since $|x_{n+1} - x_n| < 10^{m-n}$, $\{x_n\}$ is a Cauchy sequence and $x_n \to x$ as $n \to \infty$. ■
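The truncation construction in the proof of 2.15 can be imitated with exact rational arithmetic. This is a sketch under simplifying assumptions that are not in the text: `truncate` is a hypothetical helper, $n$ counts digits after the decimal point (rather than places in the whole expansion), and the rational 22/7 stands in for a general real number, since a program cannot hold an arbitrary real exactly.

```python
from fractions import Fraction

def truncate(x, n):
    """Rational obtained by zeroing every decimal digit of x >= 0 beyond the nth after the point."""
    scale = 10 ** n
    return Fraction(int(x * scale), scale)   # int() discards the remaining digits

x = Fraction(22, 7)          # stand-in for a real number; any non-negative value works
approximations = [truncate(x, n) for n in range(1, 8)]
# Successive differences x_{n+1} - x_n, which shrink by a factor of ten at each step.
gaps = [approximations[n] - approximations[n - 1] for n in range(1, 7)]
print([str(a) for a in approximations])
```

Each gap is non-negative and smaller than $10^{-n}$ under this counting convention, which is the Cauchy property used in the proof.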

The sequence exhibited is increasing, but a decreasing sequence can also be constructed, as $\{-y_n\}$ where $\{y_n\}$ is an increasing sequence tending to $-x$. If $x$ is itself rational, this construction works by putting $x_n = x$ for every $n$, which trivially defines a Cauchy sequence, but certain arguments such as in 2.16 below depend on having $x_n \ne x$ for every $n$. To satisfy this requirement, choose the 'non-terminating' representation of the number; for example, instead of 1 take 0.9999999..., and consider the sequence $\{0.9, 0.99, 0.999, \ldots\}$. This does not work for the point 0, but then one can choose $\{0.1, 0.01, 0.001, \ldots\}$.

One interesting corollary of 2.15 is that, since every $\varepsilon$-neighbourhood of a real number must contain a rational, $\mathbb{Q}$ is dense in $\mathbb{R}$. We also showed in 2.10 that $\mathbb{R} - \mathbb{Q}$ is dense in $\mathbb{R}$, since $\mathbb{Q}$ is countable. We must be careful not to jump to the conclusion that because a set is dense, its complement must be 'sparse'. Another version of this proof, at least for points of the interval $[0,1]$, is got by using the binary expansion of a real number. The dyadic rationals are the set

$$\mathbb{D} = \{i/2^n,\ i = 1,\ldots,2^n - 1,\ n \in \mathbb{N}\}. \tag{2.5}$$

The dyadic rationals corresponding to a finite $n$ define a covering of $[0,1]$ by intervals of width $1/2^n$, which are bisected each time $n$ is incremented. For any $x \in [0,1]$, a point of the set $\{i/2^n,\ i = 1,\ldots,2^n - 1\}$ is contained in $S(x,\varepsilon)$ whenever $n$ is large enough that $2/2^n < \varepsilon$, so the dyadic rationals are dense in $[0,1]$. $\mathbb{D}$ is a convenient analytic tool when we need to define a sequence of partitions of an interval that is becoming dense in the limit, and will often appear in the sequel.
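The density of the dyadic rationals in $[0,1]$ amounts to the statement that the level-$n$ grid always has a point within $1/2^n$ of any $x \in [0,1]$. A minimal sketch follows; the helper name `nearest_dyadic` is an assumption introduced for illustration, and exact fractions are used so that the distance check is not affected by rounding error.

```python
from fractions import Fraction

def nearest_dyadic(x, n):
    """Nearest point of {i/2**n : i = 1, ..., 2**n - 1} to x in [0, 1]."""
    i = round(x * 2 ** n)
    i = min(max(i, 1), 2 ** n - 1)   # keep i inside 1, ..., 2**n - 1
    return Fraction(i, 2 ** n)

x = Fraction(1, 3)
for n in range(1, 8):
    d = nearest_dyadic(x, n)
    print(n, d, abs(x - d) <= Fraction(1, 2 ** n))
```

The endpoints 0 and 1 are the worst cases: the grid excludes them, so the nearest dyadic point sits exactly $1/2^n$ away.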

Another set of useful applications concern set limits in R .

2.16 Theorem Every open interval is the limit of a sequence of closed sub-intervals with rational endpoints.

Proof If $(a,b)$ is the interval, with $a < b$, choose Cauchy sequences of rationals $a_n \downarrow a$ and $b_n \uparrow b$, with $a_1 < b_1$ (always possible by 1.10). By definition, for every $x \in (a,b)$ there exists $N \ge 1$ such that $x \in [a_n,b_n]$ for all $n \ge N$, and hence $(a,b) \subseteq \liminf_n [a_n,b_n]$. On the other hand, since $a_n > a$ and $b > b_n$, $(a,b)^c \subseteq [a_n,b_n]^c$ for all $n \ge 1$, so that $(a,b)^c \subseteq \liminf_n [a_n,b_n]^c$. This is equivalent to $\limsup_n [a_n,b_n] \subseteq (a,b)$. Hence $\lim_n [a_n,b_n]$ exists and is equal to $(a,b)$. ■


This shows that the limits of sequences of open sets need not be open, nor the limits of sequences of closed sets closed (take complements above). The only hard and fast rules we may lay down are the following corollaries of 2.4(i): the limit of a non-decreasing sequence of open sets is open, and (by complements) the limit of a non-increasing sequence of closed sets is closed.

2.3 Functions and Continuity

A function of a real variable is a mapping $f: S \mapsto T$, where $S \subseteq \mathbb{R}$, and $T \subseteq \mathbb{R}$. By specifying a subset of $\mathbb{R}$ as the codomain, we imply without loss of generality that $f(S) = T$, such that the mapping is onto $T$.

Consider the image in $T$, under $f$, of a Cauchy sequence $\{x_n\}$ in $S$ converging to $x$. If the image of every such sequence converging to $x \in S$ is a Cauchy sequence in $T$ converging to $f(x)$, the function is said to be continuous at $x$. Continuity is formally defined, without invoking sequences explicitly, using the $\varepsilon$-$\delta$ approach. $f$ is continuous at the point $x \in S$ if for any $\varepsilon > 0$ there exists $\delta > 0$ such that $|y - x| < \delta$ implies $|f(y) - f(x)| < \varepsilon$, whenever $y \in S$. The choice of $\delta$ here may depend on $x$. If $f$ is continuous at every point of $S$, it is simply said to be continuous on $S$.

Perhaps the chief reason why continuity matters is the following result.

2.17 Theorem If $f: S \mapsto T$ is continuous at all points of $S$, $f^{-1}(A)$ is open in $S$ whenever $A$ is open in $T$, and $f^{-1}(A)$ is closed in $S$ whenever $A$ is closed in $T$. □

This important result has several generalizations, of which one, the extension to vector functions, is given in the next section. A proof will be given in a still more general context below; see 5.19.

Continuity does not ensure that $f(A)$ is open when $A$ is open. A mapping with this property is called an open mapping, although, since $f(A^c) \ne f(A)^c$ in general, we cannot assume that an open mapping is also a closed mapping, taking closed sets to closed sets. However, a homeomorphism is a function which is 1-1 onto, continuous, and has a continuous inverse. If $f$ is a homeomorphism so is $f^{-1}$, and hence by 2.17 it is both an open mapping and a closed mapping. It therefore preserves the structure of neighbourhoods, so that, if two points are close in the domain, their images are always close in the range. Such a transformation amounts to a relabelling of axes.

If $f(x+h)$ has a limit as $h \downarrow 0$, this is denoted $f(x+)$. Likewise, $f(x-)$ denotes the limit of $f(x-h)$. It is not necessary to have $x \in S$ for these limits to exist, but if $f(x)$ exists, there is a weaker notion of continuity at $x$. $f$ is said to be right-continuous at the point $x \in S$ if, for any $\varepsilon > 0$, there exists $\delta > 0$ such that whenever $0 \le h < \delta$ and $x + h \in S$,

$$|f(x+h) - f(x)| < \varepsilon. \tag{2.6}$$

It is said to be left-continuous at $x$ if, for any $\varepsilon > 0$, there exists $\delta > 0$ such that whenever $0 < h < \delta$ and $x - h \in S$,

$$|f(x) - f(x-h)| < \varepsilon. \tag{2.7}$$

Right continuity at $x$ implies $f(x) = f(x+)$ and left continuity at $x$ implies $f(x) = f(x-)$. If $f(x) = f(x+) = f(x-)$, the function is continuous at $x$.

Continuity is the property of a point $x$, not of the function $f$ as a whole. Despite continuity holding pointwise on $S$, the property may none the less break down as certain points are approached.

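2.18 Example Consider $f(x) = 1/x$, with $S = T = (0,\infty)$. For $\varepsilon > 0$,

$$|f(x+\delta) - f(x)| = \frac{\delta}{x(x+\delta)} < \varepsilon \ \text{ iff } \ \delta < \frac{\varepsilon x^2}{1 - \varepsilon x},$$

and hence the choice of $\delta$ depends on both $\varepsilon$ and $x$. $f(x)$ is continuous for all $x > 0$, but not in the limit as $x \to 0$. □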
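The dependence of $\delta$ on $x$ in 2.18 can be seen by evaluating the bound $\varepsilon x^2/(1-\varepsilon x)$ at points approaching zero. The sketch below is illustrative only; the function names `delta_bound` and `gap` are assumptions, and the bound is used only where $\varepsilon x < 1$.

```python
def delta_bound(eps, x):
    """Bound eps*x**2 / (1 - eps*x) on delta for f(x) = 1/x, valid when eps*x < 1."""
    return eps * x ** 2 / (1 - eps * x)

def gap(x, delta):
    """|f(x + delta) - f(x)| for f(x) = 1/x."""
    return abs(1 / (x + delta) - 1 / x)

eps = 0.1
for x in (1.0, 0.1, 0.01):
    d = delta_bound(eps, x)
    # Half the bound keeps the gap below eps; twice the bound does not.
    print(x, d, gap(x, 0.5 * d) < eps, gap(x, 2 * d) < eps)
```

The admissible $\delta$ shrinks roughly like $\varepsilon x^2$ as $x$ approaches 0, so no single $\delta$ works for every $x$ in $(0,\infty)$.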

The function $f: S \mapsto T$ is uniformly continuous if for every $\varepsilon > 0$ there exists $\delta > 0$ such that

$$|x - y| < \delta \ \Rightarrow\ |f(x) - f(y)| < \varepsilon \tag{2.8}$$

for every $x,y \in S$. In 2.18 the function is not uniformly continuous, for whichever $\delta$ is chosen, we can pick $x$ small enough to invalidate the definition. The problem arises because the set on which the function is defined is open and the boundary point is a discontinuity. Another class of cases that gives difficulty is the one where the domain is unbounded, and continuity at $x$ is breaking down as $x \to \infty$. However, we have the following result.

2.19 Theorem If a function is continuous everywhere on a compact set $S$, then it is bounded and uniformly continuous on $S$. □

(For proof, see 5.20 and 5.21.)

Continuity is the weakest concept of smoothness of a function. So-called Lipschitz conditions provide a whole class of smoothness properties. A function $f$ is said to satisfy a Lipschitz condition at a point $x$ if, for any $y \in S(x,\delta)$ for some $\delta > 0$, there exists $M > 0$ such that

$$|f(y) - f(x)| \le M h(|x - y|) \tag{2.9}$$

where $h: \mathbb{R}^+ \mapsto \mathbb{R}^+$ satisfies $h(d) \downarrow 0$ as $d \downarrow 0$. $f$ is said to satisfy a uniform Lipschitz condition if condition (2.9) holds, with fixed $M$, for all $x,y \in S$. The type of smoothness imposed depends on the function $h$. Continuity (resp. uniform continuity) follows from the Lipschitz (resp. uniform Lipschitz) property for any choice of $h$. Implicit in continuity is the idea that some function $\delta(\cdot): \mathbb{R}^+ \mapsto \mathbb{R}^+$ exists satisfying $\delta(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$. This is equivalent to the Lipschitz condition holding for some $h(\cdot)$, the case $h = \delta^{-1}$. By imposing some degree of smoothness on $h$ (making it a positive power of the argument, for example) we impose a degree of smoothness on the function, forbidding 'sharp corners'.

The next smoothness concept is undoubtedly well known to the reader, although differential calculus will play a fairly minor role here. Let a function $f: S \mapsto T$ be continuous at $x \in S$. If

$$f'_+(x) = \lim_{h \downarrow 0} \frac{f(x+h) - f(x)}{h} \tag{2.10}$$

exists, $f'_+(x)$ is called the right-hand derivative of $f$ at $x$. The left-hand derivative, $f'_-(x)$, is defined correspondingly for the case $h \uparrow 0$. If $f'_+(x) = f'_-(x)$, the common value is called the derivative of $f$ at $x$, denoted $f'(x)$ or $df/dx$, and $f$ is said to be differentiable at $x$. If $f': S \mapsto \mathbb{R}$ is a continuous function, $f$ is said to be continuously differentiable on $S$.

A function $f$ is said to be non-decreasing (resp. increasing) if $f(y) \ge f(x)$ (resp. $f(y) > f(x)$) whenever $y > x$. It is non-increasing (resp. decreasing) if $-f$ is non-decreasing (resp. increasing). A monotone function is either non-decreasing or non-increasing.

When the domain is an interval we have yet another smoothness condition. A function $f: [a,b] \mapsto \mathbb{R}$ is of bounded variation if there exists $M < \infty$ such that for every partition of $[a,b]$ by finite collections of points $a = x_0 < x_1 < \ldots < x_n = b$,

$$\sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| \le M. \tag{2.11}$$

2.20 Theorem If and only if $f$ is of bounded variation, there exist non-decreasing functions $f_1$ and $f_2$ such that $f = f_2 - f_1$. □

(For proof see Apostol 1974: Ch. 6.) A function that satisfies the uniform Lipschitz condition on $[a,b]$ with $h(|x - y|) = |x - y|$ is of bounded variation on $[a,b]$.
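The closing remark can be illustrated numerically: if $|f(x) - f(y)| \le M|x - y|$ on $[a,b]$, then the sum in (2.11) cannot exceed $M(b-a)$ for any partition. The sketch below is an example under assumptions made here, not in the text: it uses $f = \sin$ (for which $M = 1$ works), the interval $[0,6]$, and uniform partitions only, and the helper names are illustrative.

```python
import math

def variation(f, points):
    """Sum of |f(x_i) - f(x_{i-1})| over the partition a = x_0 < ... < x_n = b."""
    return sum(abs(f(points[i]) - f(points[i - 1])) for i in range(1, len(points)))

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal width."""
    return [a + (b - a) * i / n for i in range(n + 1)]

f = math.sin                 # |sin x - sin y| <= |x - y|, so M = 1 in (2.9)
a, b, M = 0.0, 6.0, 1.0
sums = [variation(f, uniform_partition(a, b, n)) for n in (2, 10, 100, 1000)]
print(sums, M * (b - a))
```

Refining the partition increases the sum, but every value stays below the Lipschitz bound $M(b-a)$.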

2.4 Vector Sequences and Functions

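A sequence $\{\boldsymbol{x}_n\}$ of real $k$-vectors is said to converge to a limit $\boldsymbol{x}$ if for every $\varepsilon > 0$ there is an integer $N_\varepsilon$ for which

$$\|\boldsymbol{x}_n - \boldsymbol{x}\| < \varepsilon \ \text{ for all } n > N_\varepsilon. \tag{2.12}$$

The sequence is called a Cauchy sequence in $\mathbb{R}^k$ iff $\|\boldsymbol{x}_n - \boldsymbol{x}_m\| < \varepsilon$ whenever $n > N_\varepsilon$ and $m > N_\varepsilon$.

A function $f: S \mapsto T$, where $S \subseteq \mathbb{R}^k$, and $T \subseteq \mathbb{R}$, associates each point of $S$ with a unique point of $T$. Its graph is the subset of $S \times T$ consisting of the $(k+1)$-vectors $(\boldsymbol{x}, f(\boldsymbol{x}))$ for each $\boldsymbol{x} \in S$. $f$ is continuous at $\boldsymbol{x} \in S$ if for any $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\|\boldsymbol{b}\| < \delta \ \Rightarrow\ |f(\boldsymbol{x}+\boldsymbol{b}) - f(\boldsymbol{x})| < \varepsilon \tag{2.13}$$

whenever $\boldsymbol{x} + \boldsymbol{b} \in S$. The choice of $\delta$ may here depend on $\boldsymbol{x}$. On the other hand, $f$ is uniformly continuous on $S$ if for any $\varepsilon > 0$, there exists $\delta > 0$ such that

$$\|\boldsymbol{b}\| < \delta \ \Rightarrow \sup_{\boldsymbol{x} \in S,\ \boldsymbol{x}+\boldsymbol{b} \in S} |f(\boldsymbol{x}+\boldsymbol{b}) - f(\boldsymbol{x})| < \varepsilon. \tag{2.14}$$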

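A vector $\boldsymbol{f} = (f_1,\ldots,f_m)'$ of functions of $\boldsymbol{x}$ is called, simply enough, a vector function. Continuity concepts apply element-wise to $\boldsymbol{f}$ in the obvious way. The function $\boldsymbol{f}: S \mapsto T$, $S \subseteq \mathbb{R}^k$, $T \subseteq \mathbb{R}^k$, is said to be one-to-one if there exists a vector function $\boldsymbol{f}^{-1}: T \mapsto S$, such that $\boldsymbol{f}^{-1}(\boldsymbol{f}(\boldsymbol{x})) = \boldsymbol{x}$ for each $\boldsymbol{x} \in S$. An example of a 1-1 continuous function is the affine transformation $\boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}$ for constants $\boldsymbol{b}$ ($k \times 1$) and $\boldsymbol{A}$ ($k \times k$) with $|\boldsymbol{A}| \ne 0$, having inverse $\boldsymbol{f}^{-1}(\boldsymbol{y}) = \boldsymbol{A}^{-1}(\boldsymbol{y} - \boldsymbol{b})$. In most other cases the function $\boldsymbol{f}^{-1}$ does not possess a closed form, but there is a generalization of 2.17, as follows.

2.21 Theorem Iff $\boldsymbol{f}: S \mapsto T$ is continuous, where $S \subseteq \mathbb{R}^k$ and $T \subseteq \mathbb{R}^m$, $\boldsymbol{f}^{-1}(A)$ is open in $S$ when $A$ is open in $T$, and $\boldsymbol{f}^{-1}(A)$ is closed in $S$ when $A$ is closed in $T$. □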

2.5 Sequences of Functions

Let $f_n: \Omega \mapsto T$, $T \subseteq \mathbb{R}$, be a function, where in this case $\Omega$ may be an arbitrary set, not necessarily a subset of $\mathbb{R}$. Let $\{f_n,\ n \in \mathbb{N}\}$ be a sequence of such functions. If there exists $f$ such that, for each $\omega \in \Omega$, and $\varepsilon > 0$, there exists $N_{\varepsilon\omega}$ such that $|f_n(\omega) - f(\omega)| < \varepsilon$ when $n > N_{\varepsilon\omega}$, then $f_n$ is said to converge to $f$, pointwise on $\Omega$. As for real sequences, we use the notations $f_n \to f$, $f_n \uparrow f$, or $f_n \downarrow f$, as appropriate, for general or monotone convergence, where in the latter case the monotonicity must apply for every $\omega \in \Omega$. This is a relatively weak notion of convergence, for it does not rule out the possibility that the convergence is breaking down at certain points of $\Omega$. The following example is related to 2.18 above.

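2.22 Example Let $f_n(x) = n/(nx + 1)$, $x \in (0,\infty)$. The pointwise limit of $f_n(x)$ on $(0,\infty)$ is $1/x$. But

$$\left|f_n(x) - \frac{1}{x}\right| = \frac{1}{x(nx+1)},$$

and $1/(x(N_{\varepsilon x}x + 1)) < \varepsilon$ only for $N_{\varepsilon x} > (1/(\varepsilon x) - 1)(1/x)$. Thus for given $\varepsilon$, $N_{\varepsilon x} \to \infty$ as $x \to 0$ and it is not possible to put an upper bound on $N_{\varepsilon x}$ such that $|f_n(x) - 1/x| < \varepsilon$, $n \ge N_{\varepsilon x}$, for every $x > 0$. □

To rule out cases of this type, we define the stronger notion of uniform convergence. If there exists a function $f$ such that, for each $\varepsilon > 0$, there exists $N$ such that

$$\sup_{\omega \in \Omega} |f_n(\omega) - f(\omega)| < \varepsilon \ \text{ when } n > N,$$

$f_n$ is said to converge to $f$ uniformly on $\Omega$.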
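The threshold in 2.22 shows concretely why the convergence there is pointwise but not uniform. The following sketch is illustrative; the names `f_n` and `threshold` and the chosen values of `eps` and `x` are assumptions, and exact fractions are used so that the boundary case is decided without rounding error.

```python
from fractions import Fraction

def f_n(n, x):
    """The function n / (n*x + 1) of Example 2.22."""
    return n / (n * x + 1)

def threshold(eps, x):
    """(1/(eps*x) - 1) * (1/x): |f_n(x) - 1/x| < eps exactly when n exceeds this value."""
    return (1 / (eps * x) - 1) / x

eps = Fraction(1, 100)
for x in (Fraction(1), Fraction(1, 10), Fraction(1, 100)):
    N = threshold(eps, x)
    n_ok = int(N) + 1          # first integer strictly above the threshold
    print(x, N, abs(f_n(n_ok, x) - 1 / x) < eps)
```

The required $n$ grows without bound as $x$ approaches 0, so no single $N$ serves for the supremum over all $x > 0$.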


2.6 Summability and Order Relations

The sum of the terms of a real sequence $\{x_n\}_1^\infty$ is called a series, written $\sum_{n=1}^\infty x_n$ (or just $\sum x_n$). The terms of the real sequence $\{\sum_{m=1}^n x_m,\ n \in \mathbb{N}\}$ are called the partial sums of the series. We say that the series converges if the partial sums converge to a finite limit. A series is said to converge absolutely if the monotone sequence $\{\sum_{m=1}^n |x_m|,\ n \in \mathbb{N}\}$ converges.

2.23 Example Consider the geometric series, $\sum_{j=0}^\infty x^j$. This converges to $1/(1 - x)$ when $|x| < 1$, and also converges absolutely. It oscillates between cluster points 0 and 1 for $x = -1$, and for other values of $x$ it diverges. □

2.24 Theorem If a series converges absolutely, then it converges.

Proof The sequence $\{\sum_{m=1}^n |x_m|,\ n \in \mathbb{N}\}$ is monotone, and either diverges to $+\infty$ or converges to a finite limit. In the latter case the Cauchy criterion implies that $|x_n| + \ldots + |x_{n+m}| \to 0$ as $m$ and $n$ tend to infinity. Since $|x_n| + \ldots + |x_{n+m}| \ge |x_n + \ldots + x_{n+m}|$ by the triangle inequality, convergence of $\{\sum_{m=1}^n x_m,\ n \in \mathbb{N}\}$ follows by the same criterion. ■

An alternative terminology speaks of summability. A real sequence $\{x_n\}_1^\infty$ is said to be summable if the series $\sum x_n$ converges, and absolutely summable if $\{|x_n|\}_1^\infty$ is summable. Any absolutely summable sequence is summable by 2.24, and any summable sequence must be converging to zero. Convergence to zero does not imply summability (see 2.27 below, for example), but convergence of the tail sums to zero is necessary and sufficient.

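2.25 Theorem Iff $\{x_n\}_1^\infty$ is summable, $\sum_{m=n}^\infty x_m \to 0$ as $n \to \infty$.

Proof For necessity, write $|\sum_{m=1}^\infty x_m| \le |\sum_{m=1}^{n-1} x_m| + |\sum_{m=n}^\infty x_m|$. Since for any $\varepsilon > 0$ there exists $N$ such that $|\sum_{m=n}^\infty x_m| < \varepsilon$ for $n \ge N$, it follows that $|\sum_{m=1}^\infty x_m| \le |\sum_{m=1}^{N-1} x_m| + \varepsilon < \infty$. Conversely, assume summability and let $A = \sum_{n=1}^\infty x_n$. Then $\sum_{m=n}^\infty x_m = A - \sum_{m=1}^{n-1} x_m \to 0$ as $n \to \infty$. ■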

A sequence $\{x_n\}_1^\infty$ is Cesàro-summable if the sequence $\{n^{-1}\sum_{m=1}^n x_m\}_1^\infty$ converges. This is weaker than ordinary convergence.

2.26 Theorem If $\{x_n\}_1^\infty$ converges to $x$, its Cesàro sum also converges to $x$. □

But a sequence can be Cesàro-summable in spite of not converging. The sequence $\{(-1)^n\}_0^\infty$ converges in Cesàro sum to zero, whereas the partial sum sequence $\{\sum_{m=0}^n (-1)^m\}_0^\infty$ converges in Cesàro sum to $\tfrac12$ (compare 2.14).
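Both Cesàro limits quoted above are easy to reproduce. The sketch below is a small illustration; the helper `cesaro_means` and the choice of 10000 terms are assumptions for the example, and a finite run can only suggest the limiting values.

```python
def cesaro_means(terms):
    """Running averages n**-1 * (x_1 + ... + x_n) of a finite list of terms."""
    means, total = [], 0.0
    for n, x in enumerate(terms, start=1):
        total += x
        means.append(total / n)
    return means

signs = [(-1) ** n for n in range(10000)]          # 1, -1, 1, -1, ...
partial_sums, running = [], 0
for s in signs:
    running += s
    partial_sums.append(running)                    # 1, 0, 1, 0, ...

print(cesaro_means(signs)[-1], cesaro_means(partial_sums)[-1])
```

Neither sequence converges in the ordinary sense, yet the running averages settle near 0 and near 1/2 respectively.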

Various notations are used to indicate the relationships between rates of divergence or convergence of different sequences. If $\{x_n\}_1^\infty$ is any real sequence, $\{a_n\}_1^\infty$ is a sequence of positive real numbers, and there exists a constant $B < \infty$ such that $|x_n|/a_n \le B$ for all $n$, we say that $x_n$ is (at most) of the order of magnitude of $a_n$, and write $x_n = O(a_n)$. If $\{x_n/a_n\}$ converges to zero, we write $x_n = o(a_n)$, and say that $x_n$ is of smaller order of magnitude than $a_n$. $a_n$ can be increasing or decreasing, so this notation can be used to express an upper bound either on the rate of growth of a divergent sequence, or on the rate of convergence of a sequence to zero. Here are some rules for manipulation of $O(\cdot)$, whose proof follows from the definition. If $x_n = O(n^\alpha)$ and $y_n = O(n^\beta)$, then

$$x_n + y_n = O(n^{\max\{\alpha,\beta\}}), \tag{2.15}$$

$$x_n y_n = O(n^{\alpha+\beta}), \tag{2.16}$$

$$x_n^\beta = O(n^{\alpha\beta}), \text{ whenever } x_n^\beta \text{ is defined.} \tag{2.17}$$

An alternative notation for the case $x_n \ge 0$ is $x_n \ll a_n$, which means that there is a constant, $0 < B < \infty$, such that $x_n \le Ba_n$ for all $n$. This may be more convenient in algebraic manipulations.

The notation $x_n \sim a_n$ will be used to indicate that there exist $N \ge 0$, and finite constants $A > 0$ and $B \ge A$, such that $\inf_{n \ge N}(x_n/a_n) \ge A$ and $\sup_{n \ge N}(x_n/a_n) \le B$. This says that $\{x_n\}$ and $\{a_n\}$ grow ultimately at the same rate, and is different from the relation $x_n = O(a_n)$, since the latter does not exclude $x_n/a_n \to 0$. Some authors use $x_n \sim a_n$ in the stronger sense of $x_n/a_n \to 1$.

2.27 Theorem If $\{x_n\}$ is a real positive sequence, and $x_n \sim n^\alpha$,

(i) if $\alpha > -1$ then $\sum_{m=1}^n x_m \sim n^{1+\alpha}$;

(ii) if $\alpha = -1$ then $\sum_{m=1}^n x_m \sim \log n$;

(iii) if $\alpha < -1$ then $\sum_{m=1}^\infty x_m < \infty$ and $\sum_{m=n}^\infty x_m = O(n^{1+\alpha})$.

Proof By assumption there exist $N \ge 1$ and constants $A > 0$ and $B \ge A$ such that $An^\alpha \le x_n \le Bn^\alpha$ for $n \ge N$, and hence $A\sum_{m=N}^n m^\alpha \le \sum_{m=N}^n x_m \le B\sum_{m=N}^n m^\alpha$. The limit of $\sum_{m=1}^n m^\alpha$ as $n \to \infty$ for different values of $\alpha$ defines the Riemann zeta function for $\alpha < -1$, and its rates of divergence for $\alpha \ge -1$ are standard results; see e.g. Apostol (1974: Sects. 8.12-8.13). Since the sum of terms from 1 to $N-1$ is finite, their omission cannot change the conclusions. ■

It is common practice to express the rate of convergence to zero of a positive real sequence in terms of the summability of the coordinates raised to a given power. The following device allows some further refinement of summability conditions. Let $U(v)$ be a positive function of $v$. If $U(vx)/U(v) \to x^\rho$ as $v \to \infty$ $(0)$ for $x > 0$ and $-\infty < \rho < +\infty$, $U$ is said to be regularly varying at infinity (zero). If a positive function $L(v)$ has the property $L(vx)/L(v) \to 1$ for $x > 0$ as $v \to \infty$ $(0)$, it is said to be slowly varying at infinity (zero). Evidently, any regularly varying function can be expressed in the form $U(v) = v^\rho L(v)$, where $L(v)$ is slowly varying. While the definition allows $v$ to be a real variable, in the cases of interest we will have $v = n$ for $n \in \mathbb{N}$, with $U$ and $L$ having the interpretation of positive sequences.

2.28 Example $(\log v)^\alpha$ is slowly varying at infinity, for any $\alpha$. □

On the theory of regular variation see Feller (1971), or Loève (1977). The important property is the following.

2.29 Theorem If $L$ is slowly varying at infinity, then for any $\delta > 0$ there exists $N \ge 1$ such that

$$v^{-\delta} < L(v) < v^{\delta}, \ \text{ all } v > N. \ \Box \tag{2.18}$$

Hence we have the following corollary of 2.27, which shows how the notion of a convergent power series can be refined by allowing for the presence of a slowly varying function.

2.30 Corollary If $x_n = O(n^\alpha L(n))$ then $\sum_{n=1}^\infty x_n < \infty$ for all $\alpha < -1$ and all functions $L(n)$ which are slowly varying at infinity. □

On the other hand, the presence of a slowly varying component can affect the summability of a sequence. The following result can be proved using the integral test for series convergence (Apostol 1974: Sect. 8.12).

2.31 Theorem If $x_n \sim 1/[n(\log n)^{1+\delta}]$ with $\delta > 0$, then $\sum_{n=2}^\infty x_n < \infty$. If $\delta = 0$, then $\sum_{m=2}^n x_m \sim \log\log n$. □

2.32 Theorem (Feller 1971: 275) If a positive monotone function $U(v)$ satisfies

$$\frac{U(vx)}{U(v)} \to \psi(x), \ \text{ all } x \in D, \tag{2.19}$$

where $D$ is dense in $\mathbb{R}^+$, and $0 < \psi(x) < \infty$, then $\psi(x) = x^\rho$ for $-\infty < \rho < \infty$. □

To the extent that (2.19) is a fairly general property, we can conclude that monotone functions are as a rule regularly varying.

2.33 Theorem The derivative of a monotone regularly varying function is regularly varying at $\infty$.

Proof Given $U(v) = v^\rho L(v)$, write

$$U'(v) = \rho v^{\rho-1}L(v) + v^\rho L'(v) = v^{\rho-1}\bigl(\rho L(v) + vL'(v)\bigr). \tag{2.20}$$

If $L'(v) \to 0$ there is no more to show, so assume $\liminf_v L'(v) > 0$. Then

$$\frac{L(v)}{L'(v)}\,\frac{d}{dv}\!\left(\frac{L(vx)}{L(v)}\right) = \frac{L'(vx)}{L'(v)} - \frac{L(vx)}{L(v)} \to 0, \tag{2.21}$$

which implies $L'(vx)/L'(v) \to 1$. Thus,

$$\frac{U'(vx)}{U'(v)} = x^{\rho-1}\,\frac{\rho L(vx) + vxL'(vx)}{\rho L(v) + vL'(v)} \to x^{\rho-1}. \ \blacksquare \tag{2.22}$$

2.7 ArraysArguments concerning stochastic convergence often involve a double-indexing ofelements. An array is a mapping whose domain is the Cartesian product of count-able, linearly ordered sets, such as ENx ENor r x (N,or a subst thereof. A realdouble array, in particular, is a double-indexed collection of numbers, or, alter-natively, a sequence whose members are real sequences. We will use notation juch

as (fakt,t E zJ, n e (N), or just fau) when the context is clear.


A collection of finite sequences $\{\{x_{nt}, t = 1,\ldots,k_n\}, n \in \mathbb{N}\}$, where $k_n \uparrow \infty$ as $n \to \infty$, is called a triangular array. As an example, consider array elements of the form $x_{nt} = y_t/n$, where $\{y_t, t = 1,\ldots,n\}$ is a real sequence. The question of whether the series $\{\sum_{t=1}^{n} x_{nt}, n \in \mathbb{N}\}$ converges is equivalent to that of the Cesàro convergence of the original sequence; however, the array formulation is frequently the more convenient.


2.34 Toeplitz's lemma Suppose $\{y_n\}$ is a real sequence and $y_n \to y$. If $\{\{x_{nt}, t = 1,\ldots,k_n\}, n \in \mathbb{N}\}$ is a triangular array such that

(a) $x_{nt} \to 0$ as $n \to \infty$ for each fixed $t$,

(b) $\lim_{n\to\infty} \sum_{t=1}^{k_n} |x_{nt}| \le C < \infty$,

(c) $\lim_{n\to\infty} \sum_{t=1}^{k_n} x_{nt} = 1$,

then $\sum_{t=1}^{k_n} x_{nt}y_t \to y$. For $y = 0$, (c) can be omitted. $\square$

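Proof By assumption on $\{y_n\}$, for any $\varepsilon > 0$ $\exists\, N_\varepsilon \ge 1$ such that for $n > N_\varepsilon$, $|y_n - y| < \varepsilon/C$. Hence by (c), and then (b) and the triangle inequality,

$$\lim_{n\to\infty}\left|\sum_{t=1}^{k_n} x_{nt}y_t - y\right| = \lim_{n\to\infty}\left|\sum_{t=1}^{k_n} x_{nt}(y_t - y)\right| \le \lim_{n\to\infty}\left|\sum_{t=1}^{N_\varepsilon} x_{nt}(y_t - y)\right| + \varepsilon = \varepsilon, \tag{2.23}$$

in view of (a). This completes the proof, since $\varepsilon$ is arbitrary. $\blacksquare$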

A particular case of an array $\{x_{nt}\}$ satisfying the conditions of the lemma is $x_{nt} = (\sum_{s=1}^{n} y_s)^{-1}y_t$, where $\{y_t\}$ is a positive sequence and $\sum_{s=1}^{n} y_s \to \infty$.
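The weighted-average case just described can be tried out numerically. This is a minimal sketch under illustrative assumptions: the weights are taken to be $w_t = t$ (positive, with divergent sum), the convergent sequence is $2 + 1/t$, and the function name `toeplitz_average` is hypothetical. The weights are called `weights` in the code to keep them apart from the sequence being averaged.

```python
def toeplitz_average(weights, values):
    """Weighted average sum_t x_nt * values[t], with x_nt = weights[t] / sum(weights).

    The weights are assumed positive, so the row sums of |x_nt| equal 1 and
    conditions (b) and (c) of Toeplitz's lemma hold with C = 1; condition (a)
    holds when the total weight diverges as the row length grows.
    """
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

def example_row(n):
    """Row n of the illustrative array: weights w_t = t, values 2 + 1/t."""
    weights = [float(t) for t in range(1, n + 1)]
    values = [2.0 + 1.0 / t for t in range(1, n + 1)]
    return toeplitz_average(weights, values)
```

For this choice the average can be computed by hand as $2 + 2/(n+1)$, which approaches the limit 2 of the underlying sequence, as the lemma predicts.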

A leading application of this result is to prove the following theorem, a fundamental tool of limit theory.

2.35 Kronecker's lemma Consider sequences $\{a_t\}_1^\infty$ and $\{x_t\}_1^\infty$ of positive real numbers, with $a_t \uparrow \infty$. If $\sum_{t=1}^{n} x_t/a_t \to C < \infty$ as $n \to \infty$,

$$\frac{1}{a_n}\sum_{t=1}^{n} x_t \to 0. \tag{2.24}$$

Proof Defining $c_0 = 0$ and $c_n = \sum_{t=1}^{n} x_t/a_t$ for $n \in \mathbb{N}$, note that $x_t = a_t(c_t - c_{t-1})$, $t = 1,\ldots,n$. Also define $a_0 = 0$ and $b_t = a_t - a_{t-1}$ for $t = 1,\ldots,n$, so that $a_n = \sum_{t=1}^{n} b_t$. Now apply the identity for arbitrary sequences $a_0,\ldots,a_n$ and $c_0,\ldots,c_n$,

$$\sum_{t=1}^{n} a_t(c_t - c_{t-1}) = \sum_{t=1}^{n} (a_{t-1} - a_t)c_{t-1} + a_nc_n - a_0c_0. \tag{2.25}$$

(This is known as Abel's partial summation formula.) We obtain


$$\frac{1}{a_n}\sum_{t=1}^{n} x_t = \frac{1}{a_n}\sum_{t=1}^{n} a_t(c_t - c_{t-1}) = c_n - \frac{1}{a_n}\sum_{t=1}^{n} b_tc_{t-1} \to C - C = 0, \tag{2.26}$$

where the convergence is by the Toeplitz lemma, setting $x_{nt} = b_t/a_n$. $\blacksquare$
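A small numerical check may help fix ideas about Kronecker's lemma. The sketch below assumes the illustrative choices $x_t = t^{-1/2}$ and $a_t = t$ (not taken from the text; the function name is hypothetical). Then $\sum_t x_t/a_t = \sum_t t^{-3/2}$ converges, and the lemma asserts that $a_n^{-1}\sum_{t \le n} x_t \to 0$.

```python
def kronecker_demo(n):
    """Return (c_n, ratio_n) for the illustrative choice x_t = t**-0.5, a_t = t.

    c_n     = sum over t <= n of x_t / a_t   (converges as n grows)
    ratio_n = (1 / a_n) * sum over t <= n of x_t   (tends to 0 by the lemma)
    """
    c_n = sum(t ** -1.5 for t in range(1, n + 1))
    ratio_n = sum(t ** -0.5 for t in range(1, n + 1)) / n
    return c_n, ratio_n
```

Comparing `kronecker_demo(100)` with `kronecker_demo(10000)` shows `c_n` settling down while `ratio_n` shrinks roughly like $2/\sqrt{n}$.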

The notion of array convergence extends the familiar sequence concept. Consider for full generality an array of subsequences, a collection $\{\{x_{mn_k}, k \in \mathbb{N}\}, m \in \mathbb{N}\}$, where $\{n_k, k \in \mathbb{N}\}$ is an increasing sequence of positive integers. If the limit $x_m = \lim_{k\to\infty} x_{mn_k}$ exists for each $m \in \mathbb{N}$, we would say that the array is convergent; and its limit is the infinite sequence $\{x_m, m \in \mathbb{N}\}$. Whether this sequence converges is a separate question from whether it exists at all.

Suppose the array is bounded, in the sense that $\sup_{k,m}|x_{mn_k}| \le B < \infty$. We know by 2.12 that for each $m$ there exists at least one cluster point, say $x_m$, of the inner sequence $\{x_{mn_k}, k \in \mathbb{N}\}$. An important question in several contexts is this: is it valid to say that the array as a whole has a cluster point?

2.36 Theorem Corresponding to any bounded array $\{\{x_{mn_k}, k \in \mathbb{N}\}, m \in \mathbb{N}\}$, there exists a sequence $\{x_m\}$, the limit of the array $\{\{x_{mn_k^*}, k \in \mathbb{N}\}, m \in \mathbb{N}\}$ as $k \to \infty$, where $\{n_k^*\}$ is the same subsequence of $\{n_k\}$ for each $m$.

Proof This is by construction of the required subsequence. Begin with a convergent subsequence for $m = 1$; let $\{n_k^1\}$ be a subsequence of $\{n_k\}$ such that $x_{1,n_k^1} \to x_1$. Next, consider the sequence $\{x_{2,n_k^1}\}$. Like $\{x_{1,n_k^1}\}$, this is on the bounded interval $[-B, B]$, and so contains a convergent subsequence. Let the indices of this latter subsequence, drawn from the members of $\{n_k^1\}$, be denoted $\{n_k^2\}$, and note that $x_{1,n_k^2} \to x_1$ as well as $x_{2,n_k^2} \to x_2$. Proceeding in the same way for each $m$ generates an array $\{\{n_k^m, k \in \mathbb{N}\}, m \in \mathbb{N}\}$, having the property that $\{x_{i,n_k^m}, k \in \mathbb{N}\}$ is a convergent sequence for $1 \le i \le m$.

Now consider the sequence $\{n_k^k, k \in \mathbb{N}\}$; in other words, take the first member of $\{n_k^1\}$, the second member of $\{n_k^2\}$, and so on. For each $m$, this sequence is a subsequence of $\{n_k^m\}$ from the $m$th point of the sequence onwards, and hence the sequence $\{x_{m,n_k^k}, k \ge m\}$ is convergent. This means that the sequence $\{x_{m,n_k^k}, k \in \mathbb{N}\}$ is convergent, so setting $\{n_k^*\} = \{n_k^k\}$ satisfies the requirement of the theorem. $\blacksquare$

This is called the 'diagonal method'. The elements $n_k^k$ may be thought of as the diagonal elements of the square matrix (of infinite order) whose rows contain the sequences $\{n_k^m\}$, each a subsequence of the row above it. This theorem holds independently of the nature of the elements $\{x_{mn_k}\}$. Any space of points on which convergent sequences are defined could be substituted for $\mathbb{R}$. We shall need a generalization on these lines in Chapter 26, for example.


3 Measure

3.1 Measure Spaces

A measure is a set function, a mapping which associates a (possibly extended) real number with a set. Commonplace examples of measures include the lengths, areas, and volumes of geometrical figures, but wholly abstract sets can be 'measured' in an analogous way. Formally, we have the following definition.

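3.1 Definition Given a class $\mathcal{F}$ of subsets of a set $\Omega$, a measure $\mu: \mathcal{F} \mapsto \overline{\mathbb{R}}$ is a function having the following properties:

(a) $\mu(A) \ge 0$, all $A \in \mathcal{F}$.

(b) $\mu(\emptyset) = 0$.

(c) For a countable collection $\{A_j \in \mathcal{F}, j \in \mathbb{N}\}$ with $A_j \cap A_{j'} = \emptyset$ for $j \ne j'$ and $\bigcup_j A_j \in \mathcal{F}$,

$$\mu\Bigl(\bigcup_j A_j\Bigr) = \sum_j \mu(A_j). \;\square \tag{3.1}$$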

The particular cases at issue in this book are of course the probabilities of random events in a sample space $\Omega$; more of this in Chapter 7. Condition (a) is optional and set functions taking either sign may be referred to as measures (see e.g. §4.4), but non-negativity is desirable for present purposes.

A measurable space is a pair $(\Omega, \mathcal{F})$ where $\Omega$ is any collection of objects, and $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$. When $(\Omega, \mathcal{F})$ is a measurable space, the triple $(\Omega, \mathcal{F}, \mu)$ is called a measure space. More than one measure can be associated with the measurable space $(\Omega, \mathcal{F})$, hence the distinction between measure space and measurable space is important.

Condition 3.1(c) is called countable additivity. If a set function has the property

$$\mu(A \cup B) = \mu(A) + \mu(B) \tag{3.2}$$

for each disjoint pair $A$, $B$, a property that extends by iteration to finite collections $A_1,\ldots,A_n$, it is said to be finitely additive. In 3.1 $\mathcal{F}$ could be a field, but the possibility of extending the properties of $\mu$ to the corresponding $\sigma$-field, by allowing additivity over countable collections, is an essential feature of a measure.

If $\mu(\Omega) < \infty$ the measure is said to be finite. And if $\Omega = \bigcup_j \Omega_j$ where $\{\Omega_j\}$ is a countable collection of $\mathcal{F}$-sets, and $\mu(\Omega_j) < \infty$ for each $j$, $\mu$ is said to be $\sigma$-finite. In particular, if there is a collection $\mathcal{S}$ such that $\mathcal{F} = \sigma(\mathcal{S})$ and $\Omega_j \in \mathcal{S}$ for each $j$, $\mu$ is said to be $\sigma$-finite on $\mathcal{S}$ (rather than on $\mathcal{F}$). If $\mathcal{F}_A = \{A \cap B: B \in \mathcal{F}\}$ for some $A \in \mathcal{F}$, $(A, \mathcal{F}_A)$ is a measurable space and $(A, \mathcal{F}_A, \mu)$ is a measure space called the restriction of $(\Omega, \mathcal{F}, \mu)$ to $A$. If in this case $\mu(A^c) = 0$ (equivalent to $\mu(A) = \mu(\Omega)$ when $\mu(\Omega) < \infty$) $A$ is called a support of the measure. When $A$ supports $\Omega$, the sets of $\mathcal{F}_A$ have the same measures as the corresponding ones of $\mathcal{F}$. A point $\omega \in \Omega$ with the property $\mu(\{\omega\}) > 0$ is called an atom of the measure.

3.2 Example The case closest to everyday intuition is Lebesgue measure, $m$, on the measurable space $(\mathbb{R}, \mathcal{B})$, where $\mathcal{B}$ is the Borel field on $\mathbb{R}$. Generalizing the notion of length in geometry, Lebesgue measure assigns $m((a,b]) = b - a$ to an interval $(a,b]$. Additivity is an intuitively plausible property if we think of measuring the total length of a collection of disjoint intervals.

Lebesgue measure is atomless (see 3.15 below), every point of the line taking measure 0, but $m(\mathbb{R}) = \infty$. Letting $((a,b], \mathcal{B}_{(a,b]}, m)$ denote the restriction of $(\mathbb{R}, \mathcal{B}, m)$ to a finite interval, $m$ is a finite measure on $(a,b]$. Since $\mathbb{R}$ can be partitioned into a countable collection of finite intervals, $m$ is $\sigma$-finite. $\square$

Some additional properties may be deduced from the definition:

3.3 Theorem For arbitrary $\mathcal{F}$-sets $A$, $B$, and $\{A_j, j \in \mathbb{N}\}$,

(i) $A \subseteq B \Rightarrow \mu(A) \le \mu(B)$ (monotonicity).

(ii) $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$.

(iii) $\mu(\bigcup_j A_j) \le \sum_j \mu(A_j)$ (countable subadditivity).

Proof To show (i) note that $A$ and $B - A$ are disjoint sets whose union is $B$, by hypothesis, and use 3.1(a) and 3.1(c). To show (ii), use $A \cup B = A \cup (B - A)$ and $B = (A \cap B) \cup (B - A)$, where again the sets in each union are disjoint. The result follows on application of 3.1(c). To show (iii), define $B_1 = A_1$ and $B_n = A_n - \bigcup_{j=1}^{n-1} A_j$. Note that the sets $B_n$ are disjoint, that $B_n \subseteq A_n$, and that $\bigcup_{j=1}^{\infty} B_j = \bigcup_{j=1}^{\infty} A_j$. Hence,

$$\mu\Bigl(\bigcup_{j=1}^{\infty} A_j\Bigr) = \mu\Bigl(\bigcup_{j=1}^{\infty} B_j\Bigr) = \sum_{j=1}^{\infty} \mu(B_j) \le \sum_{j=1}^{\infty} \mu(A_j). \;\blacksquare \tag{3.3}$$

This proof illustrates a standard technique of measure theory, converting a sequence of sets into a disjoint sequence having the same union by taking differences. This trick will become familiar in numerous later applications.
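The differencing trick can be sketched for finite sets, with counting measure (the number of elements) standing in for $\mu$. This is only an illustration under that assumption; the function name `disjointify` is hypothetical.

```python
def disjointify(sets):
    """Return B_1, B_2, ... with B_1 = A_1 and B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1}).

    The B_n are pairwise disjoint, B_n is a subset of A_n, and the B_n have
    the same union as the A_n.
    """
    seen = set()      # union of the sets processed so far
    result = []
    for a in sets:
        a = set(a)
        result.append(a - seen)
        seen |= a
    return result
```

For example, `disjointify([{1, 2}, {2, 3}, {1, 4}, {3}])` gives `[{1, 2}, {3}, {4}, set()]`. The sizes of the disjoint pieces add up to the size of the union (4), which cannot exceed the sum of the sizes of the original sets (7), as in 3.3(iii).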

The idea behind 3.3(ii) can be extended to give an expression for the measure of any finite union. This is the inclusion-exclusion formula:

$$\mu\Bigl(\bigcup_{j=1}^{n} A_j\Bigr) = \sum_{j=1}^{n} \mu(A_j) - \sum_{k<j} \mu(A_j \cap A_k) + \sum_{l<k<j} \mu(A_j \cap A_k \cap A_l) - \cdots \pm \mu(A_1 \cap A_2 \cap \cdots \cap A_n), \tag{3.4}$$

where the sign of the last term is negative if $n$ is even and positive if $n$ is odd, and there are $2^n - 1$ terms in the sum in total. The proof of (3.4) is by induction from 3.3(ii), substituting for the second term on the right-hand side of


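$$\mu\Bigl(\bigcup_{j=1}^{n} A_j\Bigr) = \mu(A_n) + \mu\Bigl(\bigcup_{j=1}^{n-1} A_j\Bigr) - \mu\Bigl(\bigcup_{j=1}^{n-1} (A_j \cap A_n)\Bigr), \tag{3.5}$$

repeatedly, for $n - 1, n - 2, \ldots, 1$.

Let $\{A_n, n \in \mathbb{N}\}$ be a monotone sequence of $\mathcal{F}$-sets with limit $A \in \mathcal{F}$. A set function $\mu$ is said to be continuous if $\mu(A_n) \to \mu(A)$.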
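The inclusion-exclusion formula (3.4) can be checked directly for finite sets under counting measure. The following sketch assumes that setting (set size playing the role of $\mu$); the function name is hypothetical.

```python
from itertools import combinations

def inclusion_exclusion(sets):
    """Evaluate the right-hand side of the inclusion-exclusion formula.

    Counting measure is used, so the 'measure' of a set is its size.
    Intersections of r sets enter with sign +1 if r is odd, -1 if r is even.
    """
    sets = [set(s) for s in sets]
    total = 0
    for r in range(1, len(sets) + 1):
        sign = 1 if r % 2 == 1 else -1
        for combo in combinations(sets, r):
            total += sign * len(set.intersection(*combo))
    return total
```

With `[{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]` the sum runs over $2^3 - 1 = 7$ intersections, giving $9 - 5 + 1 = 5$, the size of the union $\{1, \ldots, 5\}$.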

3.4 Theorem A finite measure is continuous.

Proof First let $\{A_n\}$ be increasing, with $A_{n-1} \subseteq A_n$, and $A = \bigcup_{n=1}^{\infty} A_n$. The sequence $\{B_j, j \in \mathbb{N}\}$, where $B_1 = A_1$, and $B_j = A_j - A_{j-1}$ for $j > 1$, is disjoint by construction, with $B_j \in \mathcal{F}$, $A_n = \bigcup_{j=1}^{n} B_j$, and

$$\mu(A_n) = \sum_{j=1}^{n} \mu(B_j). \tag{3.6}$$

The real sequence $\{\mu(A_n)\}$ is therefore monotone, and converges since it is bounded above by $\mu(\Omega) < \infty$. Countable additivity implies $\sum_{j=1}^{\infty} \mu(B_j) = \mu(\bigcup_{j=1}^{\infty} B_j) = \mu(A)$. Alternatively, let $\{A_n\}$ be decreasing, with $A_{n-1} \supseteq A_n$ and $A = \bigcap_{n=1}^{\infty} A_n$. Consider the increasing sequence $\{A_n^c\}$, determine $\mu(A^c)$ by the same argument, and use finite additivity to conclude that $\mu(A) = \mu(\Omega) - \mu(A^c)$ is the limit of $\mu(A_n) = \mu(\Omega) - \mu(A_n^c)$. $\blacksquare$

The finiteness of the measure is needed for the second part of the argument, but the result that $\mu(A_n) \to \mu(A)$ when $A_n \uparrow A$ actually holds generally, not excluding the case $\mu(A) = \infty$. This theorem has a partial converse:

3.5 Theorem A non-negative set function $\mu$ which is finitely additive and continuous is countably additive.

Proof Let $\{B_n\}$ be a countable, disjoint sequence. If $A_n = \bigcup_{j=1}^{n} B_j$, the sequence $\{A_n\}$ is increasing, $B_n \cap A_{n-1} = \emptyset$, and so $\mu(A_n) = \mu(B_n) + \mu(A_{n-1})$ for every $n$, by finite additivity. Given non-negativity, it follows by induction that $\{\mu(A_n)\}$ is monotone. If $A = \bigcup_{j=1}^{\infty} B_j$, $\mu(A_n) = \sum_{j=1}^{n} \mu(B_j)$, whereas continuity implies that $\mu(A_n) \to \mu(A) = \mu(\bigcup_{j=1}^{\infty} B_j)$, so that $\mu(A) = \sum_{j=1}^{\infty} \mu(B_j)$. $\blacksquare$

Arguments in the theory of integration often turn on the notion of a 'negligible' set. In a measure space $(\Omega, \mathcal{F}, \mu)$, a set of measure zero is (simply enough) a set $M \in \mathcal{F}$ with $\mu(M) = 0$. A condition or restriction on the elements of $\Omega$ is said to occur almost everywhere (a.e.) if it holds on a set $E$ and $\Omega - E$ has measure zero. If more than one measure is assigned to the same space, it may be necessary to indicate which measure the statement applies to, by writing a.e.[$\mu$] or a.e.[$\nu$] as the case may be.

3.6 Theorem

(i) If $M$ and $N$ are $\mathcal{F}$-sets, $M$ has measure 0 and $N \subseteq M$, then $N$ has measure 0.

(ii) If $\{M_j\}$ is a countable sequence with $\mu(M_j) = 0$, $\forall j$, then $\mu(\bigcup_j M_j) = 0$.

(iii) If $\{E_j\}$ is a countable sequence with $\mu(E_j^c) = 0$, $\forall j$, then $\mu((\bigcap_j E_j)^c) = 0$.


Proof (i) is an application of monotonicity; (ii) is a consequence of countable additivity; and (iii) follows likewise, using the second de Morgan law. $\blacksquare$

In §3.2 and §3.3 we will be concerned about the measurability of the sets in a given space. We show that, if the sets of a given collection are measurable, the sets of the $\sigma$-field generated by that collection are also measurable (the Extension Theorem). For many purposes this fact is sufficient, but there may be sets outside the $\sigma$-field which can be shown in other ways to be measurable, and it might be desirable to include these in the measure space. In particular, if $\mu(A) = \mu(B)$ it would seem reasonable to assign $\mu(E) = \mu(A)$ whenever $A \subseteq E \subseteq B$. This is equivalent to assigning measure 0 to any subset of a set of measure 0.

The measure space $(\Omega, \mathcal{F}, \mu)$ is said to be complete if, for any set $E \in \mathcal{F}$ with $\mu(E) = 0$, all subsets of $E$ are also in $\mathcal{F}$. According to the following result, every measure space can be completed without changing any of our conclusions except in respect of these negligible sets.

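3.7 Theorem Given any measure space $(\Omega, \mathcal{F}, \mu)$, there exists a complete measure space $(\Omega, \mathcal{F}^{\mu}, \bar{\mu})$, called the completion of $(\Omega, \mathcal{F}, \mu)$, such that $\mathcal{F} \subseteq \mathcal{F}^{\mu}$, and $\mu(E) = \bar{\mu}(E)$ for all $E \in \mathcal{F}$. $\square$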

Notice that the completion of a space is defined with respect to a particular measure. The measurable space $(\Omega, \mathcal{F})$ has a different completion for each measure that can be defined on it.

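Proof Let $\mathcal{A}^{\mu}$ denote the collection of all subsets of $\mathcal{F}$-sets of $\mu$-measure 0, and

$$\mathcal{F}^{\mu} = \{F \subseteq \Omega: E \,\Delta\, F \in \mathcal{A}^{\mu} \text{ for some } E \in \mathcal{F}\}. \tag{3.7}$$

If $\mu(E) = 0$, any set $F \subseteq E$ satisfies the criterion of (3.7) and so is in $\mathcal{F}^{\mu}$ as the definition requires. For $F \in \mathcal{F}^{\mu}$, let $\bar{\mu}(F) = \mu(E)$, where $E$ is any $\mathcal{F}$-set satisfying $E \,\Delta\, F \in \mathcal{A}^{\mu}$. To show that the choice of $E$ is immaterial, let $E_1$ and $E_2$ be two such sets, and note that

$$\mu(E_1 \,\Delta\, E_2) = \mu((E_1 \,\Delta\, F) \,\Delta\, (E_2 \,\Delta\, F)) = 0. \tag{3.8}$$

Since $\mu(E_1 \cup E_2) = \mu(E_1 \cap E_2) + \mu(E_1 \,\Delta\, E_2)$, we must conclude that

$$\mu(E_1 \cap E_2) = \mu(E_i) = \mu(E_1 \cup E_2) \tag{3.9}$$

for $i = 1$ and 2, or, $\mu(E_1) = \mu(E_2)$. Hence, the measure is unique. When $F \in \mathcal{F}$, we can choose $E = F$, since $F \,\Delta\, F = \emptyset \in \mathcal{A}^{\mu}$, confirming that the measures agree on $\mathcal{F}$.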

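It remains to show that $\mathcal{F}^{\mu}$ is a $\sigma$-field containing $\mathcal{F}$. Choosing $E = F$ in (3.7) for $F \in \mathcal{F}$ shows $\mathcal{F} \subseteq \mathcal{F}^{\mu}$. If $F \in \mathcal{F}^{\mu}$, then $E \,\Delta\, F \in \mathcal{A}^{\mu}$ for $E \in \mathcal{F}$ and hence $E^c \,\Delta\, F^c = E \,\Delta\, F \in \mathcal{A}^{\mu}$ where $E^c \in \mathcal{F}$, and so $F^c \in \mathcal{F}^{\mu}$. And finally if $F_j \in \mathcal{F}^{\mu}$ for $j \in \mathbb{N}$, there exist $E_j \in \mathcal{F}$ for $j \in \mathbb{N}$, such that $E_j \,\Delta\, F_j \in \mathcal{A}^{\mu}$. Hence

$$\Bigl(\bigcup_j E_j\Bigr) \,\Delta\, \Bigl(\bigcup_j F_j\Bigr) \subseteq \bigcup_j (E_j \,\Delta\, F_j) \in \mathcal{A}^{\mu}, \tag{3.10}$$

by 3.6(ii). This means that $\bigcup_j F_j \in \mathcal{F}^{\mu}$, and completes the proof. $\blacksquare$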


3.2 The Extension Theorem

You may wonder why, in the definition of a measurable space, $\mathcal{F}$ could not simply be the set of all subsets, the power set of $\Omega$. The problem is to find a consistent method of assigning a measure to every set. This is straightforward when the space has a finite number of elements, but not in an infinite space where there is no way, even conceptually, to assign a specific measure to each set. It is necessary to specify a rule which generates a measure for any designated set. The problem of measurability is basically the problem of going beyond constructive methods without running into inconsistencies. We now show how this problem can be solved for $\sigma$-fields. These are a sufficiently general class of sets to cope with most situations arising in probability.

One must begin by assigning a measure, to be denoted $\mu_0$, to the members of some basic collection $\mathcal{C}$ for which this can feasibly be done. For example, to construct Lebesgue measure we started by assigning to each interval $(a,b]$ the measure $b - a$. We then reason from the properties of $\mu_0$ to extend it from this basic collection to all the sets of interest. $\mathcal{C}$ must be rich enough to allow $\mu_0$ to be uniquely defined by it. A collection $\mathcal{C} \subseteq \mathcal{F}$ is called a determining class for $(\Omega, \mathcal{F})$ if, whenever $\mu$ and $\nu$ are measures on $\mathcal{F}$, $\mu(A) = \nu(A)$ for all $A \in \mathcal{C}$ implies that $\mu = \nu$.

Given $\mathcal{C}$, we must also know how to assign $\mu_0$-values to any sets derived from $\mathcal{C}$ by operations such as union, intersection, complementation, and difference. For disjoint sets $A$ and $B$ we have $\mu_0(A \cup B) = \mu_0(A) + \mu_0(B)$ by finite additivity, and when $B \subseteq A$, $\mu_0(A - B) = \mu_0(A) - \mu_0(B)$. We also need to be able to determine $\mu_0(A \cap B)$, which will require specific knowledge of the relationship between the sets. When such assignments are possible for any pair of sets whose measures are themselves known, the measure is thereby extended to a wider class of sets, to be denoted $\mathcal{S}$. Often $\mathcal{S}$ and $\mathcal{C}$ are the same collection, but in any event $\mathcal{S}$ is closed under various finite set operations, and must at least be a semi-ring. In the applications $\mathcal{S}$ is typically either a field (algebra) or a semi-algebra. Example 1.18 is a good case to keep in mind.

However, $\mathcal{S}$ cannot be a $\sigma$-field since at most a finite number of operations are permitted to determine $\mu_0(A)$ for any $A \in \mathcal{S}$. At this point we might pose the opposite question to the one we started with, and ask why $\mathcal{S}$ might not be a rich enough collection for our needs. In fact, events of interest frequently arise which $\mathcal{S}$ cannot contain. 3.15 below illustrates the necessity of being able to go to the limit, and consider events that are expressible only as countably infinite unions or intersections of $\mathcal{C}$-sets. Extending to the events $\mathcal{F} = \sigma(\mathcal{S})$ proves indispensable.

We have two results, establishing existence and uniqueness respectively.

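3.8 Extension theorem (existence) Let $\mathcal{S}$ be a semi-ring, and let $\mu_0: \mathcal{S} \mapsto \overline{\mathbb{R}}^+$ be a measure on $\mathcal{S}$. If $\mathcal{F} = \sigma(\mathcal{S})$, there exists a measure $\mu$ on $(\Omega, \mathcal{F})$, such that $\mu(E) = \mu_0(E)$ for each $E \in \mathcal{S}$. $\square$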

Although the proof of the theorem is rather lengthy and some of the details are fiddly, the basic idea is simple. Take an event $A \subseteq \Omega$ to which we wish to assign a measure $\mu(A)$. If $A \in \mathcal{S}$, we have $\mu(A) = \mu_0(A)$. If $A \notin \mathcal{S}$, consider choosing a finite or countable covering for $A$ from members of $\mathcal{S}$; that is, a selection of sets $E_j \in \mathcal{S}$, $j = 1, 2, 3, \ldots$ such that $A \subseteq \bigcup_j E_j$. The object is to find as 'economical' a covering as possible, in the sense that $\sum_j \mu_0(E_j)$ is as small as possible. The outer measure of $A$ is

$$\mu^*(A) = \inf \sum_j \mu_0(E_j), \tag{3.11}$$

where the infimum is taken over all finite and countable coverings of $A$ by $\mathcal{S}$-sets. If no such covering exists, set $\mu^*(A) = \infty$. Clearly, $\mu^*(A) = \mu_0(A)$ for each $A \in \mathcal{S}$. $\mu^*$ is called the outer measure because, for any eligible definition of $\mu(A)$,

$$\mu^*(A) \ge \sum_j \mu(E_j) \ge \mu\Bigl(\bigcup_j E_j\Bigr) \ge \mu(A), \quad \text{for } E_j \in \mathcal{S}. \tag{3.12}$$

The first inequality here is by the stipulation that $\mu(E_j) = \mu_0(E_j)$ for $E_j \in \mathcal{S}$ in the case where a covering exists, or else the majorant side is infinite. The second and third follow by countable subadditivity and monotonicity respectively, because $\mu$ is a measure.

We could also construct a minimal covering for $A^c$ and, at least if the relevant outer measures are finite, define the inner measure of $A$ as $\mu_*(A) = \mu^*(\Omega) - \mu^*(A^c)$. Note that since $\mu(A) = \mu(\Omega) - \mu(A^c)$ and $\mu^*(A^c) \ge \mu(A^c)$ by (3.12),

$$\mu(A) \ge \mu_*(A). \tag{3.13}$$

If $\mu^*(A) = \mu_*(A)$, it would make sense to call this common value the measure of $A$, and say that $A$ is measurable. In fact, we employ a more stringent criterion. A set $A \subseteq \Omega$ is said to be measurable if, for any $B \subseteq \Omega$,

$$\mu^*(A \cap B) + \mu^*(A^c \cap B) = \mu^*(B). \tag{3.14}$$

This yields $\mu^*(A) = \mu_*(A)$ as a special case on putting $B = \Omega$, but remains valid even if $\mu(\Omega) = \infty$.

Let $\mathcal{M}$ denote the collection of all measurable sets, those subsets of $\Omega$ satisfying (3.14). Since $\mu^*(A) = \mu_0(A)$ for $A \in \mathcal{S}$ and $\mu_0(\emptyset) = 0$, putting $A = \emptyset$ in (3.14) gives the trivial equality $\mu^*(B) = \mu^*(B)$. Hence $\emptyset \in \mathcal{M}$, and since the definition implies that $A^c \in \mathcal{M}$ if $A \in \mathcal{M}$, $\Omega \in \mathcal{M}$ too.

The next steps are to determine what properties the set function $\mu^*: \mathcal{M} \mapsto \overline{\mathbb{R}}$ shares with a measure. Clearly,

$$\mu^*(A) \ge 0 \text{ for all } A \subseteq \Omega. \tag{3.15}$$

Another property which follows directly from the definition of $\mu^*$ is monotonicity:

$$A_1 \subseteq A_2 \Rightarrow \mu^*(A_1) \le \mu^*(A_2), \quad \text{for } A_1, A_2 \subseteq \Omega. \tag{3.16}$$

Our goal is to show that countable additivity also holds for $\mu^*$ in respect of $\mathcal{M}$-sets, but it proves convenient to begin by establishing countable subadditivity.

3.9 Lemma If $\{A_j, j \in \mathbb{N}\}$ is any sequence of subsets of $\Omega$, then

$$\mu^*\Bigl(\bigcup_j A_j\Bigr) \le \sum_j \mu^*(A_j). \tag{3.17}$$

Proof Assume $\mu^*(A_j) < \infty$ for each $j$. (If not, the result is trivial.) For each $j$, let $\{E_{jk}\}$ denote a countable covering of $A_j$ by $\mathcal{S}$-sets, which satisfies

$$\sum_k \mu_0(E_{jk}) < \mu^*(A_j) + 2^{-j}\varepsilon$$

for any $\varepsilon > 0$. Such a collection always exists, by the definition of $\mu^*$. Since $\bigcup_j A_j \subseteq \bigcup_{j,k} E_{jk}$, it follows by definition that

$$\mu^*\Bigl(\bigcup_j A_j\Bigr) \le \sum_{j,k} \mu_0(E_{jk}) < \sum_j \mu^*(A_j) + \varepsilon, \tag{3.18}$$

noting $\sum_{j=1}^{\infty} 2^{-j} = 1$. (3.17) now follows since $\varepsilon$ is arbitrary and the last inequality is strict. $\blacksquare$

The following is an immediate consequence of the theorem, since subadditivity supplies the reverse inequality to give (3.14).

3.10 Corollary $A$ is measurable if, for any $B \subseteq \Omega$,

$$\mu^*(A \cap B) + \mu^*(A^c \cap B) \le \mu^*(B). \;\square \tag{3.19}$$

The following lemma is central to the proof of the extension theorem. It yields countable additivity as a corollary, but also has a wider purpose.

3.11 Lemma $\mathcal{M}$ is a monotone class.

Proof Letting $\{A_j, j \in \mathbb{N}\}$ be an increasing sequence of $\mathcal{M}$-sets converging to $A = \bigcup_j A_j$, we show $A \in \mathcal{M}$. For $n > 1$ and $E \subseteq \Omega$, the definition of an $\mathcal{M}$-set gives

$$\mu^*(A_n \cap E) = \mu^*(A_{n-1} \cap (A_n \cap E)) + \mu^*(A_{n-1}^c \cap (A_n \cap E)) = \mu^*(A_{n-1} \cap E) + \mu^*(B_n \cap E), \tag{3.20}$$

where $B_n = A_n - A_{n-1}$, and the sequence $\{B_j\}$ is disjoint. Put $A_0 = \emptyset$ so that $\mu^*(A_0 \cap E) = 0$; then by induction,

$$\mu^*(A_n \cap E) = \sum_{j=1}^{n} \mu^*(B_j \cap E) \tag{3.21}$$

holds for every $n$. The right-hand side of (3.21) for $n \in \mathbb{N}$ is a monotone real sequence, and $\mu^*(A_n \cap E) \to \mu^*(A \cap E)$ as $n \to \infty$. Now, since $A_n \in \mathcal{M}$,

$$\mu^*(E) = \mu^*(A_n \cap E) + \mu^*(A_n^c \cap E) \ge \mu^*(A_n \cap E) + \mu^*(A^c \cap E), \tag{3.22}$$

using the monotonicity of $\mu^*$ and the fact that $A^c \subseteq A_n^c$. Taking the limit, we have from the foregoing argument that

$$\mu^*(E) \ge \mu^*(A \cap E) + \mu^*(A^c \cap E), \tag{3.23}$$

so that $A \in \mathcal{M}$ by 3.10. For the case of a decreasing sequence, simply move to the complements and argue as above. $\blacksquare$

Since $\{B_j\}$ is a disjoint sequence, countable additivity emerges as a by-product of the lemma, as the following corollary shows.

3.12 Corollary If $\{B_j\}$ is a disjoint sequence of $\mathcal{M}$-sets,

$$\mu^*\Bigl(\bigcup_j B_j\Bigr) = \sum_j \mu^*(B_j). \tag{3.24}$$

Proof Immediate on putting $E = \Omega$ in (3.21) and letting $n \to \infty$, noting $\bigcup_j B_j = A$. $\blacksquare$

Notice how we needed 3.10 in the proof of 3.11, which is why additivity has been derived from subadditivity rather than the other way about.

Proof of 3.8 We have established in (3.15) and (3.24) that $\mu^*$ is a measure for the elements of $\mathcal{M}$. If it can be shown that $\mathcal{F} \subseteq \mathcal{M}$, setting $\mu(A) = \mu^*(A)$ for all $A \in \mathcal{F}$ will satisfy the existence criteria of the theorem.

The first step is to show that $\mathcal{S} \subseteq \mathcal{M}$ or, by 3.10, that $A \in \mathcal{S}$ implies

$$\mu^*(E \cap A) + \mu^*(E \cap A^c) \le \mu^*(E) \tag{3.25}$$

for any $E \subseteq \Omega$. Let $\{A_j \in \mathcal{S}\}$ denote a finite or countable covering of $E$ such that $\sum_j \mu_0(A_j) < \mu^*(E) + \varepsilon$, for $\varepsilon > 0$. If no such covering exists, $\mu^*(E) = \infty$ by definition and (3.25) holds trivially. Note that $E \cap A \subseteq \bigcup_j (A_j \cap A)$, and since $\mathcal{S}$ is a semi-ring the sets $A_j \cap A$ are in $\mathcal{S}$. Similarly, $E \cap A^c \subseteq \bigcup_j (A_j \cap A^c)$, and by simple set algebra and the definition of a semi-ring,

$$A_j \cap A^c = A_j - (A_j \cap A) = \bigcup_k C_{jk}, \tag{3.26}$$

where the $C_{jk}$ are a finite collection of $\mathcal{S}$-sets, disjoint with each other and also with $A_j \cap A$. Now, applying 3.9 and the fact that $\mu^*(B) = \mu_0(B)$ for $B \in \mathcal{S}$, we find

$$\mu^*(E \cap A) + \mu^*(E \cap A^c) \le \sum_j \mu_0(A_j \cap A) + \sum_j \sum_k \mu_0(C_{jk}) = \sum_j \mu_0(A_j) < \mu^*(E) + \varepsilon, \tag{3.27}$$

where the equality follows from (3.26) because $\mu_0$ is finitely additive, and $A_j \cap A$ and the $C_{jk}$ are mutually disjoint. Since $\varepsilon$ is arbitrary, (3.25) follows.

Next, we show that $\mathcal{M}$ is a $\sigma$-field. We have only to show that $\mathcal{M}$ is a field, because 3.11 implies it is also a $\sigma$-field, by 1.24. We already know that $\Omega \in \mathcal{M}$ and $\mathcal{M}$ is closed under complementation, so it remains to show that unions of $\mathcal{M}$-sets are in $\mathcal{M}$. Suppose that $A_1$ and $A_2$ are $\mathcal{M}$-sets and $E \subseteq \Omega$. Then

$$\begin{aligned}
\mu^*(E) &= \mu^*(A_1 \cap E) + \mu^*(A_1^c \cap E) \\
&= \mu^*(A_2 \cap A_1 \cap E) + \mu^*(A_2^c \cap A_1 \cap E) + \mu^*(A_2 \cap A_1^c \cap E) + \mu^*(A_2^c \cap A_1^c \cap E) \\
&\ge \mu^*(A_2 \cap A_1 \cap E) + \mu^*\bigl((A_2^c \cap A_1 \cap E) \cup (A_2 \cap A_1^c \cap E) \cup (A_2^c \cap A_1^c \cap E)\bigr) \\
&= \mu^*((A_2 \cap A_1) \cap E) + \mu^*((A_2 \cap A_1)^c \cap E),
\end{aligned} \tag{3.28}$$

where the inequality is by subadditivity, and the rest is set algebra. By 3.10 this is sufficient for $A_1 \cap A_2 \in \mathcal{M}$, and hence also for $A_1 \cup A_2 \in \mathcal{M}$, using closure under complementation.

It follows that $\mathcal{M}$ is a $\sigma$-field containing $\mathcal{S}$, and since $\mathcal{F}$ is the smallest such $\sigma$-field, we have that $\mathcal{F} \subseteq \mathcal{M}$, as required. $\blacksquare$

Notice that (3.28) was got by using (3.14) as the relation defining measurability. The proof does not go through using $\mu^*(A) = \mu_*(A)$ as the definition.

The style of this argument tells us some important things about the role of $\mathcal{S}$. Any set that has no covering by $\mathcal{S}$-sets is assigned the measure $\infty$, so for finite measures it is a requisite that $\Omega \subseteq \bigcup_j E_j$ for a finite or countable collection $\{E_j \in \mathcal{S}\}$. The measure of a union of $\mathcal{S}$-sets must be able to approximate the measure of any $\mathcal{F}$-set arbitrarily well, and the basic content of the theorem is to establish that a semi-ring has this property.

To complete the demonstration of the extension, there remains the question of uniqueness. To get this result we need to impose $\sigma$-finiteness, which was not needed for existence.

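3.13 Extension theorem (uniqueness) Let $\mu$ and $\mu'$ denote measures on a space $(\Omega, \mathcal{F})$, where $\mathcal{F} = \sigma(\mathcal{S})$, and $\mathcal{S}$ is a semi-ring. If the measures are $\sigma$-finite on $\mathcal{S}$ and $\mu(E) = \mu'(E)$ for all $E \in \mathcal{S}$, then $\mu(E) = \mu'(E)$ for all $E \in \mathcal{F}$.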

Proof We first prove the theorem for the case of finite measures, by an application of the $\pi$-$\lambda$ theorem. Define $\mathcal{A} = \{E \in \mathcal{F}: \mu(E) = \mu'(E)\}$. Then $\mathcal{S} \subseteq \mathcal{A}$ by hypothesis. If $\mathcal{S}$ is a semi-ring, it is also a $\pi$-system. By 1.27, the proof is completed if we can show that $\mathcal{A}$ is a $\lambda$-system, and hence contains $\sigma(\mathcal{S})$.

When the measure is finite, $\Omega \in \mathcal{A}$ and condition 1.26(a) holds. Additivity implies that, for $A \in \mathcal{A}$,

$$\mu(A^c) = \mu(\Omega) - \mu(A) = \mu'(\Omega) - \mu'(A) = \mu'(A^c), \tag{3.29}$$

so that 1.26(b) holds. Lastly, let $\{A_j\}$ be a disjoint sequence in $\mathcal{A}$. By countable additivity,

$$\mu\Bigl(\bigcup_{j=1}^{\infty} A_j\Bigr) = \sum_{j=1}^{\infty} \mu(A_j) = \sum_{j=1}^{\infty} \mu'(A_j) = \mu'\Bigl(\bigcup_{j=1}^{\infty} A_j\Bigr), \tag{3.30}$$

and 1.26(c) holds. It follows by 1.26 and the $\pi$-$\lambda$ theorem that $\mathcal{F} = \sigma(\mathcal{S}) \subseteq \mathcal{A}$.

Now consider the $\sigma$-finite case. Let $\Omega = \bigcup_j B_j$ where $B_j \in \mathcal{S}$ and $\mu(B_j) = \mu'(B_j) < \infty$. $\mathcal{F}_j = \{B_j \cap A: A \in \mathcal{F}\}$ is a $\sigma$-field, so that the $(B_j, \mathcal{F}_j)$ are measurable spaces, on which $\mu$ and $\mu'$ are finite measures agreeing on $\mathcal{S} \cap B_j$. The preceding argument showed that, for $A \in \mathcal{F}$, $\mu(B_j \cap A) = \mu'(B_j \cap A)$ only if $\mu$ and $\mu'$ are the same measure.

Consider the following recursion. By 3.3(ii) we have

$$\mu(A \cap (B_1 \cup B_2)) = \mu(A \cap B_1) + \mu(A \cap B_2) - \mu(A \cap B_1 \cap B_2). \tag{3.31}$$

Letting $C_n = \bigcup_{j=1}^{n} B_j$, the same relation yields

$$\mu(A \cap C_n) = \mu(A \cap B_n) + \mu(A \cap C_{n-1}) - \mu(A \cap B_n \cap C_{n-1}). \tag{3.32}$$

The terms involving $C_{n-1}$ on the right-hand side can be solved backwards to yield an expression for $\mu(A \cap C_n)$, as a sum of terms having the general form

$$\mu(A \cap B_{j_1} \cap B_{j_2} \cap B_{j_3} \cap \cdots) = \mu(D \cap B_j) < \infty \tag{3.33}$$

for some $j$, say $j = j_1$, in which case $D = A \cap B_{j_2} \cap B_{j_3} \cap \cdots \in \mathcal{F}$. Since we know that $\mu(D \cap B_j) = \mu'(D \cap B_j)$ for all $D \in \mathcal{F}$ by the preceding argument, it follows that in (3.32)

$$\mu(A \cap C_n) = \mu'(A \cap C_n). \tag{3.34}$$

This holds for any $n$. Since $C_n \to \Omega$ as $n \to \infty$, we obtain in the limit

$$\mu(A) = \mu'(A), \tag{3.35}$$

the two sides of the equality being either finite and equal, or both equal to $+\infty$. This completes the proof, since $A$ is arbitrary. $\blacksquare$

3.14 Example Let $\mathcal{M}$ denote the subsets of $\mathbb{R}$ which are measurable according to (3.14) when $\mu^*$ is the outer measure defined on the half-open intervals, whose measures $\mu_0$ are taken equal to their lengths. This defines Lebesgue measure $m$. These sets form a semi-ring by 1.18, a countable collection of them covers $\mathbb{R}$, and the extension theorem shows that, given $m$ is a $\sigma$-finite measure, $\mathcal{M}$ contains the Borel field on $\mathbb{R}$ (see 1.21), so $(\mathbb{R}, \mathcal{B}, m)$ is a measure space. It can be shown (we won't) that all the Lebesgue-measurable sets not in $\mathcal{B}$ are subsets of $\mathcal{B}$-sets of measure 0. For any measure $\mu$ on $(\mathbb{R}, \mathcal{B})$, the complete space $(\mathbb{R}, \mathcal{B}^{\mu}, \bar{\mu})$ includes all of the Lebesgue-measurable sets. $\square$

The following is a basic property of Lebesgue measure. Notice the need to deal with a countable intersection of intervals to determine so simple a thing as the measure of a point.

3.15 Theorem Any countable set from $\mathbb{R}$ has Lebesgue measure 0.

Proof The measure of a point $\{x\}$ is zero, since for $x \in \mathbb{R}$,

$$\{x\} = \bigcap_{n=1}^{\infty} (x - 1/n, x] \in \mathcal{B} \tag{3.36}$$

and $m(\{x\}) = \lim_{n\to\infty} 1/n = 0$. The result follows by 3.6(ii). $\blacksquare$
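The same conclusion can also be reached through the covering idea behind the outer measure (3.11). This is an alternative argument offered as an illustration, not the proof given above: cover the $k$th point of a countable set by a half-open interval of length $\varepsilon 2^{-k}$, so that the total length of the covering is below $\varepsilon$ however many points there are. The sketch below uses exact rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction

def covering_lengths(num_points, eps):
    """Lengths eps * 2**-k, k = 1, ..., num_points, of intervals covering the points.

    The k-th point x_k would be covered by the interval (x_k - eps * 2**-k, x_k],
    so only the lengths are needed to bound the outer measure.
    """
    return [eps * Fraction(1, 2 ** k) for k in range(1, num_points + 1)]
```

For any number of points the lengths sum to $\varepsilon(1 - 2^{-n}) < \varepsilon$, and since $\varepsilon$ is arbitrary the outer measure of a countable set is 0.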


3.3 Non-measurability

To give the ideas of the last section their true force, it needs to be shown that $\mathcal{F} \subset 2^{\Omega}$ is possible: in other words, that $\Omega$ can contain non-measurable subsets. In this section we construct such a set in the half-open unit interval $(0,1]$, a standard counter-example from Lebesgue theory.

For $x, y \in (0,1]$, define the operator

$$y \dotplus x = \begin{cases} y + x, & y + x \le 1, \\ y + x - 1, & y + x > 1. \end{cases} \tag{3.37}$$

This is addition modulo 1. Imagine the unit interval mapped onto a circle, like a clock face with 0 at the top. $y \dotplus x$ is the point obtained by moving a hand clockwise through an angle of $2\pi x$ from an initial point $y$ on the circumference. For each set $A \subseteq (0,1]$, and $x \in (0,1]$, define the set

$$A \dotplus x = \{y \dotplus x: y \in A\}. \tag{3.38}$$

3.16 Theorem If $A$ is Lebesgue-measurable so is $A \dotplus x$, and $m(A \dotplus x) = m(A)$, for any $x$.

Proof For $(a,b] \subseteq (0,1]$, $m((a + x, b + x]) = b - a = m((a,b])$, for any real $x$ such that $a + x \ge 0$ and $b + x \le 1$. The property extends to finite unions of intervals translated by $x$. If $A$ is any Lebesgue-measurable subset of $(0,1]$, and $A + x \subseteq (0,1]$ where $A + x = \{y + x: y \in A\}$, the construction of the extension similarly implies that $A + x$ is measurable and $m(A) = m(A + x)$.

Now let $A_1 = A \cap (0, 1-x]$, and $A_2 = A \cap (1-x, 1]$. Then $m(A_1 + x) = m(A_1)$ and $m(A_2 + x - 1) = m(A_2)$, where the sets on the left-hand sides of these equalities are in each case contained in $(0,1]$. $A_1 + x$ and $A_2 + x - 1$ are disjoint sets whose union is $A \dotplus x$, and hence

$$m(A \dotplus x) = m(A_1 + x) + m(A_2 + x - 1) = m(A_1) + m(A_2) = m(A). \;\blacksquare \tag{3.39}$$
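The splitting used in this proof can be traced for a single interval. The sketch below is illustrative only: it assumes $A = (a, b] \subseteq (0,1]$ and exact rational endpoints, and the function names are hypothetical. It returns the one or two half-open intervals making up $A \dotplus x$, whose lengths add up to $b - a$.

```python
from fractions import Fraction

def add_mod1(y, x):
    """The operation of (3.37): y + x if that is at most 1, else y + x - 1."""
    s = y + x
    return s if s <= 1 else s - 1

def translate_interval(a, b, x):
    """Image of (a, b] under y -> add_mod1(y, x), as a list of (left, right] pairs.

    The part of (a, b] inside (0, 1 - x] is shifted by x, and the part inside
    (1 - x, 1] is shifted by x - 1, mirroring A_1 and A_2 in the proof.
    """
    cut = 1 - x
    pieces = []
    if a < min(b, cut):                       # A_1 = (a, min(b, 1 - x)]
        pieces.append((a + x, min(b, cut) + x))
    if max(a, cut) < b:                       # A_2 = (max(a, 1 - x), b]
        pieces.append((max(a, cut) + x - 1, b + x - 1))
    return pieces

def total_length(pieces):
    """Sum of the lengths of disjoint half-open intervals."""
    return sum(right - left for left, right in pieces)
```

For instance, with $a = 1/2$, $b = 9/10$ and $x = 1/4$ the image is $(3/4, 1] \cup (0, 3/20]$, of total length $2/5 = b - a$.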

Define a relation for points of $(0,1]$, letting $x \,R\, y$ if $y = x \dotplus r$ for $r \in \mathbb{Q}$. That is, $x \,R\, y$ if $y$ is separated from $x$ by a rational distance along the circle. $R$ is an equivalence relation. Defining the equivalence classes

$$E^x = \{y: y = x \dotplus r, r \in \mathbb{Q}\}, \tag{3.40}$$

the sets of the collection $\{E^x, x \in (0,1]\}$ are either identical, or disjoint. Since every $x$ is a rational distance from some other point of the interval, these sets cover $(0,1]$. A collection formed by choosing just one of each of the identical sets, and discarding the duplicates, is therefore a partition of $(0,1]$. Write this as $\{E^x, x \in C\}$, where $C$ denotes the residual set of indices.

Another example may help the reader to visualize these sets. In the set of integers, the set of even integers is an equivalence class and can be defined as $E^0$, the set of integers which differ from 0 by an even integer. Of course $E^0 = E^2 = E^4 = \cdots = E^{2n}$, for any $n \in \mathbb{Z}$. The set of odd integers can be defined similarly as $E^1$, the set of integers differing by an even integer from 1. $E^1 = E^3 = \cdots = E^{2n+1}$ for any $n \in \mathbb{Z}$. Discarding the redundant members of the collection $\{E^x, x \in \mathbb{Z}\}$ leaves just the collection $\{E^0, E^1\}$ to define a partition of $\mathbb{Z}$.

Now construct a set $H$ by taking an element from $E^x$ for each $x \in C$.

3.17 Theorem H is not Lebesgue-measurable.

Proof Consider the countable collection $\{H \dotplus r, r \in \mathbb{Q}\}$. We show that this collection is a partition of $(0,1]$. To show disjointness, argue by contradiction. Suppose $z \in H \dotplus r_1$ and $z \in H \dotplus r_2$, for $r_1 \ne r_2$. This means there are points $h_1, h_2 \in H$, such that

$$h_1 \dotplus r_1 = z = h_2 \dotplus r_2. \tag{3.41}$$

If $r_1 \ne r_2$, we cannot have $h_1 = h_2$; but if $h_1 \ne h_2$ then $h_1$ and $h_2$ belong to different equivalence classes by construction of $H$, and cannot be a rational distance $|r_1 - r_2|$ apart; hence no $z$ satisfying (3.41) exists. On the other hand, let $H^* = \bigcup_r (H \dotplus r)$, and consider any point $x \in (0,1]$. $x$ belongs to one of the equivalence classes, and hence is within a rational distance of some element of $H$; but $H^*$ contains all the points that are a rational distance $r$ from a point of $H$, for some $r$, and hence $x \in H^*$, and it follows that $(0,1] \subseteq H^*$.

Suppose $m(H)$ exists. Then by 3.16, $m(H \dotplus r) = m(H)$ for all $r$. Since $m(H^*) \ge m((0,1]) = 1$, we must have $m(H) > 0$ by 3.6(ii), but countable additivity then gives $m(H^*) = \sum_r m(H \dotplus r) = \infty$, which is impossible. It follows that $m(H)$ does not exist. $\blacksquare$

The definition of $H$ involves a slightly controversial area of mathematics, since the set of equivalence classes is uncountable. It is not possible to devise, even in principle, constructive rules for selecting the set $C$, and elements from $E^x$ for each $x \in C$. The proposition that sets like $H$ exist cannot be deduced from the axioms of set theory but must be asserted as an additional axiom, the so-called axiom of choice. If one chooses to reject the axiom of choice, this counter-example fails. We have made no attempt here to treat set theory from the axiomatic standpoint, and the theory in Chapter 1 has been what is technically called naive (i.e. based on the intuitive notion of what a 'set' is). For us, the problem of the axiom of choice reduces to the question: should we admit the existence of a mathematical object that cannot be constructed, even in imagination? The decision is ultimately personal, but suffice it to say that most mathematicians are willing to do so.

Sets like $H$ do not belong to $\mathcal{B}_{(0,1]} = \{B \cap (0,1], B \in \mathcal{B}\}$. It is not difficult to show that all the sets of $\mathcal{B}_{(0,1]}$ are Lebesgue-measurable; see 3.14 and restrict $m$ to $(0,1]$ as in 3.2. By sticking with Borel sets we shall not run into measurability difficulties on the line, but this example should serve to make us careful. In less familiar situations (such as will arise in Part VI) measurability can fail in superficially plausible cases. However, if measurability is in doubt one might remember that outer measure $\mu^*$ is well defined for all subsets of $\Omega$, and coincides with $\mu$ whenever the latter is defined. Sometimes measurability problems are dealt with by working explicitly with outer measure, and forgetting about them.

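3.4 Product Spaces

If $(\Omega,\mathcal{F})$ and $(\Xi,\mathcal{G})$ are two measurable spaces, let

$$\Omega \times \Xi = \{(\omega,\xi) : \omega \in \Omega,\ \xi \in \Xi\} \tag{3.42}$$

be the Cartesian product of $\Omega$ and $\Xi$, and define $\mathcal{F} \otimes \mathcal{G} = \sigma(\mathcal{R}_{\mathcal{F}\mathcal{G}})$, where

$$\mathcal{R}_{\mathcal{F}\mathcal{G}} = \{F \times G : F \in \mathcal{F},\ G \in \mathcal{G}\}. \tag{3.43}$$

The space $(\Omega \times \Xi, \mathcal{F} \otimes \mathcal{G})$ is called a product space, and $(\Omega,\mathcal{F})$ and $(\Xi,\mathcal{G})$ are the factor spaces, or coordinate spaces, of the product. The elements of the collection $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ are called the measurable rectangles. The rectangles of the Euclidean plane $\mathbb{R} \times \mathbb{R} = \mathbb{R}^2$ (products of intervals) are a familiar case.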

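3.18 Example Rather trivially, consider the two-element sets $A = \{\omega_1,\omega_2\} \in \mathcal{F}$, and $B = \{\xi_1,\xi_2\} \in \mathcal{G}$. The corresponding rectangle is

$$A \times B = \{(\omega_1,\xi_1), (\omega_1,\xi_2), (\omega_2,\xi_1), (\omega_2,\xi_2)\}.$$

The sets $\{(\omega_1,\xi_1), (\omega_2,\xi_2)\}$ and $\{(\omega_1,\xi_2), (\omega_2,\xi_1)\}$ are not rectangles, but are unions of rectangles and so are elements of $\mathcal{F} \otimes \mathcal{G}$. $\square$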
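The distinction drawn in Example 3.18 can be checked mechanically on finite sets. The following is a minimal sketch in Python, not part of the original text; the string labels `w1`, `w2`, `x1`, `x2` and the helper names `rectangle` and `is_rectangle` are illustrative assumptions. It uses the fact that a set of pairs is a rectangle exactly when it equals the product of its two projections.

```python
from itertools import product

# Hypothetical labels standing in for the points of Example 3.18.
A = {"w1", "w2"}
B = {"x1", "x2"}

def rectangle(F, G):
    """Cartesian product F x G as a set of ordered pairs."""
    return set(product(F, G))

def is_rectangle(E):
    """A set of pairs is a rectangle iff it equals the product of its projections."""
    proj_omega = {w for (w, _) in E}
    proj_xi = {x for (_, x) in E}
    return set(E) == rectangle(proj_omega, proj_xi)

AxB = rectangle(A, B)
diagonal = {("w1", "x1"), ("w2", "x2")}
anti_diagonal = {("w1", "x2"), ("w2", "x1")}
```

Under these assumptions `is_rectangle(AxB)` holds while the two diagonal sets fail the test, although each is a union of two singleton rectangles.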

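Two important pieces of terminology. If $E \subseteq \Omega \times \Xi$, the set $\pi_\Omega(E) = \{\omega : (\omega,\xi) \in E\}$ is called the projection of $E$ onto $\Omega$. And if $A \subseteq \Omega$, the inverse projection of $A$ is the set

$$\pi_\Omega^{-1}(A) = A \times \Xi = \{(\omega,\xi) : \omega \in A,\ \xi \in \Xi\}. \tag{3.44}$$

$A \times \Xi$ is also called a cylinder set in $\Omega \times \Xi$, with base $A$. The latter terminology is natural if you think about the case $\Omega = \mathbb{R}^2$ and $\Xi = \mathbb{R}$. Cylinder sets with bases in $\mathcal{F}$ and $\mathcal{G}$ are elements of $\mathcal{R}_{\mathcal{F}\mathcal{G}}$. One might think that if $E \in \mathcal{F} \otimes \mathcal{G}$, $\pi_\Omega(E)$ should be an $\mathcal{F}$-set, but this is not necessarily the case. $\pi_\Omega(E^c) \neq \pi_\Omega(E)^c$ in general (see 1.3) so that the collection $\mathcal{C}$ of projections of $\mathcal{F} \otimes \mathcal{G}$-sets onto $\Omega$ is not closed under complementation. However, notice that $A = \pi_\Omega(A \times \Xi)$ so that $\mathcal{F} \subseteq \mathcal{C}$.

The main task of this section is to establish a pair of results required in the construction of measures on product spaces.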

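3.19 Theorem If $\mathcal{C}$ and $\mathcal{D}$ are semi-rings of subsets of $\Omega$ and $\Xi$, respectively, then

$$\mathcal{R}_{\mathcal{C}\mathcal{D}} = \{C \times D : C \in \mathcal{C},\ D \in \mathcal{D}\}$$

is a semi-ring of $\Omega \times \Xi$.

Proof There are three conditions from 1.16 to be established. First, $\mathcal{R}_{\mathcal{C}\mathcal{D}}$ clearly contains $\emptyset$. Second, consider $C_1, C_2 \in \mathcal{C}$, and $D_1, D_2 \in \mathcal{D}$. $C_1 \cap C_2 \in \mathcal{C}$ and $D_1 \cap D_2 \in \mathcal{D}$, and as a matter of definition,

$$\begin{aligned}
(C_1 \times D_1) \cap (C_2 \times D_2) &= \{(\omega,\xi) : \omega \in C_1,\ \xi \in D_1\} \cap \{(\omega,\xi) : \omega \in C_2,\ \xi \in D_2\} \\
&= \{(\omega,\xi) : \omega \in C_1 \cap C_2,\ \xi \in D_1 \cap D_2\} \\
&= (C_1 \cap C_2) \times (D_1 \cap D_2) \in \mathcal{R}_{\mathcal{C}\mathcal{D}}.
\end{aligned} \tag{3.45}$$

Third, assume that $C_1 \times D_1 \subseteq C_2 \times D_2$, and by a similar argument,

$$\begin{aligned}
(C_2 \times D_2) - (C_1 \times D_1) &= \{(\omega,\xi) : \omega \in C_2,\ \xi \in D_2,\ \text{either } \omega \notin C_1 \text{ or } \xi \notin D_1\} \\
&= ((C_2 - C_1) \times D_1) \cup (C_1 \times (D_2 - D_1)) \cup ((C_2 - C_1) \times (D_2 - D_1)),
\end{aligned} \tag{3.46}$$

where the sets in the union on the right-hand side are disjoint. By hypothesis, the sets $C_2 - C_1$ and $D_2 - D_1$ are finite disjoint unions of $\mathcal{C}$-sets and $\mathcal{D}$-sets respectively, say $C_1',\dots,C_n'$ and $D_1',\dots,D_m'$. The product of a finite disjoint union of sets is a disjoint union of products; for example,

$$\Big(\bigcup_{j=1}^n C_j'\Big) \times D_1 = \Big\{(\omega,\xi) : \omega \in \bigcup_{j=1}^n C_j',\ \xi \in D_1\Big\} = \bigcup_{j=1}^n (C_j' \times D_1). \tag{3.47}$$

Extending the same type of argument, we may also write

$$(C_2 \times D_2) - (C_1 \times D_1) = \bigcup_{j=1}^n (C_j' \times D_1) \cup \bigcup_{k=1}^m (C_1 \times D_k') \cup \bigcup_{j,k} (C_j' \times D_k'). \tag{3.48}$$

All of the product sets in this union are disjoint (i.e., a pair $(\omega,\xi)$ can appear in at most one of them) and all are in $\mathcal{R}_{\mathcal{C}\mathcal{D}}$. This completes the proof. $\blacksquare$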
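The three-piece decomposition of a difference of nested rectangles used in the proof of 3.19 is easy to confirm on a small finite case. The sketch below is illustrative only; the sets `C1`, `D1`, `C2`, `D2` and the helper names are assumptions chosen so that the first rectangle is contained in the second.

```python
from itertools import product

def rect(C, D):
    """Cartesian product C x D as a set of ordered pairs."""
    return set(product(C, D))

def difference_pieces(C1, D1, C2, D2):
    """The three rectangles of (3.46), assuming C1 x D1 is a subset of C2 x D2."""
    return [
        rect(C2 - C1, D1),
        rect(C1, D2 - D1),
        rect(C2 - C1, D2 - D1),
    ]

# Hypothetical nested rectangles for illustration.
C1, D1 = {1, 2}, {"a"}
C2, D2 = {1, 2, 3}, {"a", "b", "c"}

pieces = difference_pieces(C1, D1, C2, D2)
lhs = rect(C2, D2) - rect(C1, D1)
```

In this instance the pieces are pairwise disjoint and their union reproduces `lhs`, as the argument requires.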

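The second theorem leads to the useful result that, to extend a measure on a product space, it suffices to assign measures to the elements of $\mathcal{R}_{\mathcal{C}\mathcal{D}}$, where $\mathcal{C}$ and $\mathcal{D}$ are suitable classes of the factor spaces.

3.20 Theorem If $\mathcal{F} = \sigma(\mathcal{C})$ and $\mathcal{G} = \sigma(\mathcal{D})$ where $\mathcal{C}$ and $\mathcal{D}$ are semi-rings of subsets of $\Omega$ and $\Xi$ respectively, then $\mathcal{F} \otimes \mathcal{G} = \mathcal{C} \otimes \mathcal{D}$.

Proof It is clear that $\mathcal{R}_{\mathcal{C}\mathcal{D}} \subseteq \mathcal{R}_{\mathcal{F}\mathcal{G}}$, and hence that $\mathcal{C} \otimes \mathcal{D} \subseteq \mathcal{F} \otimes \mathcal{G}$. To show the converse, consider the collection of inverse projections,

$$\mathcal{I}_{\mathcal{F}} = \{\pi_\Omega^{-1}(F),\ F \in \mathcal{F}\} \subseteq \mathcal{R}_{\mathcal{F}\mathcal{G}}.$$

It can easily be verified that $\mathcal{I}_{\mathcal{F}}$ is a $\sigma$-field of $\Omega \times \Xi$, and is in fact the smallest $\sigma$-field containing the collection $\mathcal{I}_{\mathcal{C}} = \{\pi_\Omega^{-1}(C),\ C \in \mathcal{C}\} \subseteq \mathcal{C} \otimes \mathcal{D}$. $\mathcal{I}_{\mathcal{C}}$ is a $\pi$-system, and since $\mathcal{C} \otimes \mathcal{D}$ is a $\sigma$-field and hence a $\lambda$-system, it follows by the $\pi$-$\lambda$ theorem that $\mathcal{I}_{\mathcal{F}} = \sigma(\mathcal{I}_{\mathcal{C}}) \subseteq \mathcal{C} \otimes \mathcal{D}$. Exactly the same conclusion holds for $\mathcal{I}_{\mathcal{G}}$, the corresponding collection for $\mathcal{G}$. Every element of $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ is the intersection of an element from $\mathcal{I}_{\mathcal{F}}$ and one from $\mathcal{I}_{\mathcal{G}}$, and it follows that $\mathcal{R}_{\mathcal{F}\mathcal{G}} \subseteq \mathcal{C} \otimes \mathcal{D}$. But $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ is a $\pi$-system by 3.19 and hence a further application of the $\pi$-$\lambda$ theorem gives $\mathcal{F} \otimes \mathcal{G} \subseteq \mathcal{C} \otimes \mathcal{D}$. $\blacksquare$

The notion of a product extends beyond pairs to triples and general $n$-tuples, and in particular we shall be interested in the properties of Euclidean $n$-space ($\mathbb{R}^n$).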


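For finite $n$ at least, a separate theory is not needed because results can be obtained by recursion. If $(\Psi,\mathcal{H})$ is a third measurable space, then trivially,

$$\begin{aligned}
\Omega \times \Xi \times \Psi &= \{(\omega,\xi,\psi) : \omega \in \Omega,\ \xi \in \Xi,\ \psi \in \Psi\} \\
&= \{((\omega,\xi),\psi) : (\omega,\xi) \in \Omega \times \Xi,\ \psi \in \Psi\} \\
&= (\Omega \times \Xi) \times \Psi.
\end{aligned} \tag{3.49}$$

Either or both of $(\Omega,\mathcal{F})$ and $(\Xi,\mathcal{G})$ can be product spaces, and the last two theorems extend to product spaces of any finite dimension.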

3.5 Measurable Transformations

Consider measurable spaces $(\Omega,\mathcal{F})$ and $(\Xi,\mathcal{G})$ in a different context, as domain and codomain of a mapping

$$T : \Omega \mapsto \Xi.$$

$T$ is said to be $\mathcal{F}/\mathcal{G}$-measurable if $T^{-1}(B) \in \mathcal{F}$ for all $B \in \mathcal{G}$. The idea is that a measure $\mu$ defined on $(\Omega,\mathcal{F})$ can be mapped into $(\Xi,\mathcal{G})$, every event $B \in \mathcal{G}$ being assigned a measure $\nu(B) = \mu(T^{-1}(B))$. We have just encountered one example, the projection mapping, whose inverse defined in (3.44) takes each $\mathcal{F}$-set $A$ into a measurable rectangle.

Corresponding to a measurable transformation there is always a transformed measure, in the following sense.

3.21 Theorem Let $\mu$ be a measure on $(\Omega,\mathcal{F})$ and $T : \Omega \mapsto \Xi$ a measurable transformation. Then $\mu T^{-1}$ is a measure on $(\Xi,\mathcal{G})$, where

$$\mu T^{-1}(B) = \mu(T^{-1}(B)), \quad \text{each } B \in \mathcal{G}. \tag{3.50}$$

Proof We check conditions 3.1(a)-(c). Clearly $\mu T^{-1}(A) \geq 0$, all $A \in \mathcal{G}$. Since $T^{-1}(\Xi) = \Omega$ holds by definition, $T^{-1}(\emptyset) = \emptyset$ by 1.2(iii) and so $\mu T^{-1}(\emptyset) = \mu(T^{-1}(\emptyset)) = \mu(\emptyset) = 0$. For countable additivity we must show

$$\mu T^{-1}\Big(\bigcup_j B_j\Big) = \sum_j \mu T^{-1}(B_j) \tag{3.51}$$

for a disjoint collection $B_1, B_2, \dots \in \mathcal{G}$. Letting $B_j' = T^{-1}(B_j)$, 1.2 shows both that the $B_j'$ are disjoint and that $T^{-1}(\bigcup_j B_j) = \bigcup_j B_j'$. Equation (3.51) therefore becomes

$$\mu\Big(\bigcup_j B_j'\Big) = \sum_j \mu(B_j') \tag{3.52}$$

for disjoint sets $B_j'$, which holds because $\mu$ is a measure. $\blacksquare$
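A transformed measure is easiest to picture on a finite space. The sketch below is an illustration only and not part of the original text: the uniform point masses on six points, the map `T` (remainder modulo 3) and the helper names are all assumptions. It computes the transformed measure of a set as the measure of its inverse image and lets additivity be checked directly.

```python
from fractions import Fraction

# Hypothetical finite measure space: point masses on Omega = {1, ..., 6}.
mu_points = {w: Fraction(1, 6) for w in range(1, 7)}

def mu(A):
    """Measure of a subset A of Omega."""
    return sum(mu_points[w] for w in A)

def T(w):
    """Illustrative transformation into Xi = {0, 1, 2}."""
    return w % 3

def preimage(B):
    """Inverse image of B under T."""
    return {w for w in mu_points if T(w) in B}

def pushforward(B):
    """Transformed measure of B: the measure of the inverse image of B."""
    return mu(preimage(B))
```

For instance, `pushforward({0})` is the measure of `{3, 6}`, and the values on the three singletons of `Xi` add up to the total mass.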

The main result on general transformations is the following.

3.22 Theorem Suppose $T^{-1}(B) \in \mathcal{F}$ for each $B \in \mathcal{D}$, where $\mathcal{D}$ is an arbitrary class of sets, and $\mathcal{G} = \sigma(\mathcal{D})$. Then the transformation $T$ is $\mathcal{F}/\mathcal{G}$-measurable.

Proof By 1.2(ii) and (iii), if $T^{-1}(B_j) \in \mathcal{F}$, $j \in \mathbb{N}$, then $T^{-1}(\bigcup_j B_j) = \bigcup_j T^{-1}(B_j) \in \mathcal{F}$, and if $T^{-1}(B) \in \mathcal{F}$ then $T^{-1}(B^c) = T^{-1}(B)^c \in \mathcal{F}$. It follows that the class of sets

$$\mathcal{A} = \{B : T^{-1}(B) \in \mathcal{F}\}$$

is a $\sigma$-field. Since $\mathcal{D} \subseteq \mathcal{A}$, $\mathcal{G} \subseteq \mathcal{A}$ by definition. $\blacksquare$

This result is easily iterated. If $(\Psi,\mathcal{H})$ is another measurable space and $U : \Xi \mapsto \Psi$ is a $\mathcal{G}/\mathcal{H}$-measurable transformation, then

$$U \circ T : \Omega \mapsto \Psi$$

is $\mathcal{F}/\mathcal{H}$-measurable, since for $C \in \mathcal{H}$, $U^{-1}(C) \in \mathcal{G}$, and hence

$$(U \circ T)^{-1}(C) = T^{-1}(U^{-1}(C)) \in \mathcal{F}. \tag{3.53}$$

An important special case: $T : \Omega \mapsto \Xi$ is called a measurable isomorphism if it is 1-1 onto, and both $T$ and $T^{-1}$ are measurable. The measurable spaces $(\Omega,\mathcal{F})$ and $(\Xi,\mathcal{G})$ are said to be isomorphic if such a mapping between them exists. The implication is that measure-theoretic discussions can be conducted equivalently in either $(\Omega,\mathcal{F})$ or $(\Xi,\mathcal{G})$. This might appear related to the homeomorphic property of real functions, and a homeomorphism is indeed measurably isomorphic. But there is no implication the other way.

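3.23 Example Consider $g : (0,1] \mapsto (0,1]$, defined by

$$g(x) = \begin{cases} x + \tfrac{1}{2}, & 0 < x \leq \tfrac{1}{2} \\ x - \tfrac{1}{2}, & \tfrac{1}{2} < x \leq 1. \end{cases} \tag{3.54}$$

Note that $g$ is discontinuous, but is 1-1 onto, of bounded variation, and hence $\mathcal{B}_{(0,1]}/\mathcal{B}_{(0,1]}$-measurable by 3.32 below, and $g^{-1} = g$. $\square$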
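The claim that the map of Example 3.23 is its own inverse can be spot-checked with exact rational arithmetic. This is a minimal sketch under a stated assumption: the domain is taken to be the half-open interval from 0 (excluded) to 1 (included), so that the map is genuinely one-to-one. The grid of sixteenths is an arbitrary illustrative choice.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def g(x):
    """Shift by 1/2 modulo 1 on the half-open interval (0, 1] (assumed endpoints)."""
    if not (0 < x <= 1):
        raise ValueError("x must lie in (0, 1]")
    return x + HALF if x <= HALF else x - HALF

# Arbitrary rational test points k/16 for k = 1, ..., 16.
grid = [Fraction(k, 16) for k in range(1, 17)]
```

On this grid `g(g(x))` returns `x`, all images are distinct, and the jump at one half shows up as `g(HALF) == 1` against a value just above one half being sent close to 0.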

The class of measurable transformations most often encountered is where the codomain is $(\mathbb{R},\mathcal{B})$, $\mathcal{B}$ being the linear Borel field. In this case we speak of a function, and generally use the notation $f$ instead of $T$. A function may also have the extended real line $(\overline{\mathbb{R}},\overline{\mathcal{B}})$ as codomain. The measurability criteria are as follows.

3.24 Theorem
(i) A function $f : \Omega \mapsto \mathbb{R}$ for which $\{\omega : f(\omega) \leq x\} \in \mathcal{F}$ for each $x \in \mathbb{Q}$ is $\mathcal{F}/\mathcal{B}$-measurable. So is a function for which $\{\omega : f(\omega) < x\} \in \mathcal{F}$ for each $x \in \mathbb{Q}$.

(ii) A function $f : \Omega \mapsto \overline{\mathbb{R}}$ for which $\{\omega : f(\omega) \leq x\} \in \mathcal{F}$ for each $x \in \mathbb{Q} \cup \{+\infty\} \cup \{-\infty\}$ is $\mathcal{F}/\overline{\mathcal{B}}$-measurable.

Proof For case (i), the sets $\{\omega : f(\omega) \leq x\}$ are of the form $f^{-1}(B)$, $B \in \mathcal{C}$, where $\mathcal{C}$ is defined in 1.21. Since $\mathcal{B} = \sigma(\mathcal{C})$, the theorem follows by 3.22. The other collection indicated also generates $\mathcal{B}$, and the same argument applies. The extension to case (ii) is equally straightforward. $\blacksquare$

The basic properties of measurable functions follow directly.

3.25 Theorem
(i) If $f$ is measurable, so are $c + f$ and $cf$, where $c$ is any constant.

(ii) If $f$ and $g$ are measurable, so is $f + g$.

Proof If $f \leq x$, then $f + c \leq x + c$, so that $f + c$ is measurable by 3.24. Also, for $x \in \mathbb{R}$,

$$\{\omega : cf(\omega) \leq x\} = \begin{cases} \{\omega : f(\omega) \leq x/c\}, & c > 0 \\ \{\omega : f(\omega) < x/c\}^c, & c < 0 \\ \Omega, & c = 0 \text{ and } x \geq 0 \\ \emptyset, & c = 0 \text{ and } x < 0 \end{cases} \tag{3.55}$$

where for each of the cases on the right-hand side and each $x/c \in \mathbb{R}$ the sets are in $\mathcal{F}$, proving part (i).

If and only if $f + g < x$, there exists $r \in \mathbb{Q}$ such that $f < r < x - g$ (see 1.10). It follows that

$$\{\omega : f(\omega) + g(\omega) < x\} = \bigcup_{r \in \mathbb{Q}} \{\omega : f(\omega) < r\} \cap \{\omega : g(\omega) < x - r\}. \tag{3.56}$$

The countable union of $\mathcal{F}$-sets on the right-hand side is an $\mathcal{F}$-set, and since this holds for every $x$, part (ii) also follows by 3.24(i), where in this case it is convenient to generate $\mathcal{B}$ from the open half-lines. $\blacksquare$

Combining parts (i) and (ii) shows that if $f_1,\dots,f_n$ are measurable functions so is $\sum_{j=1}^n c_j f_j$, where the $c_j$ are constant coefficients.

The measurability of suprema, infima, and limits of sequences of measurable functions is important in many applications, especially the derivation of integrals in Chapter 4. These are the main cases involving the extended line, because of the possibility that sequences in $\mathbb{R}$ are diverging. Such limits lying in $\overline{\mathbb{R}}$ are called extended functions.

3.26 Theorem Let $\{f_n\}$ be a sequence of $\mathcal{F}/\mathcal{B}$-measurable functions. Then $\inf_n f_n$, $\sup_n f_n$, $\liminf_n f_n$, and $\limsup_n f_n$ are $\mathcal{F}/\overline{\mathcal{B}}$-measurable.

Proof For any $x \in \overline{\mathbb{R}}$, $\{\omega : f_n(\omega) \leq x\} \in \mathcal{F}$ for each $n$ by assumption. Hence

$$\{\omega : \sup_n f_n(\omega) \leq x\} = \bigcap_{n=1}^\infty \{\omega : f_n(\omega) \leq x\} \in \mathcal{F}, \tag{3.57}$$

so that $\sup_n f_n$ is measurable by 3.24(ii). Since $\inf_n f_n = -\sup_n(-f_n)$, we also obtain

$$\begin{aligned}
\{\omega : \inf_n f_n(\omega) < x\} &= \{\omega : \sup_n(-f_n(\omega)) > -x\} \\
&= \{\omega : \sup_n(-f_n(\omega)) \leq -x\}^c \\
&= \Big(\bigcap_{n=1}^\infty \{\omega : -f_n(\omega) \leq -x\}\Big)^c \\
&= \bigcup_{n=1}^\infty \{\omega : f_n(\omega) < x\} \in \mathcal{F}.
\end{aligned} \tag{3.58}$$

To extend this result from strong to weak inequalities, write

$$\{\omega : \inf_n f_n(\omega) \leq x\} = \bigcap_{m=1}^\infty \{\omega : \inf_n f_n(\omega) < x + 1/m\} \in \mathcal{F}. \tag{3.59}$$

Similarly to (3.57), we may show

$$\{\omega : \sup_{k \geq n} f_k(\omega) \leq x\} = \bigcap_{k \geq n} \{\omega : f_k(\omega) \leq x\} \in \mathcal{F}, \tag{3.60}$$

and applying (3.59) to the sequence of functions $g_n = \sup_{k \geq n} f_k$ yields

$$\{\omega : \limsup_n f_n(\omega) \leq x\} \in \mathcal{F}. \tag{3.61}$$

In much the same way, we can also show

$$\{\omega : \liminf_n f_n(\omega) \leq x\} \in \mathcal{F}. \tag{3.62}$$

The measurability condition of 3.24 is therefore satisfied in each case. $\blacksquare$

We could add that $\lim_n f_n(\omega)$ exists and is measurable whenever $\limsup_n f_n(\omega) = \liminf_n f_n(\omega)$. This equality may hold only on a subset of $\Omega$, but we say $f_n$ converges a.e. when the complement of this set has measure zero.

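The indicator function $1_E(\omega)$ of a set $E \in \mathcal{F}$ takes the value $1_E(\omega) = 1$ when $\omega \in E$, and $1_E(\omega) = 0$ otherwise. Some authors call $1_E$ the characteristic function of $E$. It may also be written as $I_E$ or as $\chi_E$. We now give some useful facts about indicator functions.

3.27 Theorem
(i) $1_E(\omega)$ is $\mathcal{F}/\mathcal{B}$-measurable if and only if $E \in \mathcal{F}$.

(ii) $1_{E^c}(\omega) = 1 - 1_E(\omega)$.

(iii) $1_{\bigcup_i E_i}(\omega) = \sup_i 1_{E_i}(\omega)$.

(iv) $1_{\bigcap_i E_i}(\omega) = \inf_i 1_{E_i}(\omega) = \prod_i 1_{E_i}(\omega)$.

Proof To show (i) note that, for each $B \in \mathcal{B}$,

$$1_E^{-1}(B) = \begin{cases} \Omega, & \text{if } 0 \in B \text{ and } 1 \in B \\ E, & \text{if } 1 \in B,\ 0 \notin B \\ E^c, & \text{if } 0 \in B,\ 1 \notin B \\ \emptyset, & \text{otherwise.} \end{cases} \tag{3.63}$$

These sets are in $\mathcal{F}$ if and only if $E \in \mathcal{F}$. The other parts of the theorem are immediate from the definition. $\blacksquare$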
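Parts (ii) to (iv) of 3.27 can be verified pointwise on a finite example. The following sketch is illustrative; the ground set `Omega` and the sets `E1`, `E2`, `E3` are arbitrary assumptions, and `indicator` and `check_identities` are hypothetical helper names.

```python
from math import prod

def indicator(E):
    """Return the indicator function of the set E."""
    return lambda w: 1 if w in E else 0

# Hypothetical ground set and subsets for illustration.
Omega = set(range(10))
E1, E2, E3 = {0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7}
sets = [E1, E2, E3]

def check_identities():
    """Check 3.27(ii)-(iv) at every point of Omega."""
    union = set.union(*sets)
    inter = set.intersection(*sets)
    for w in Omega:
        vals = [indicator(E)(w) for E in sets]
        assert indicator(Omega - E1)(w) == 1 - indicator(E1)(w)   # (ii)
        assert indicator(union)(w) == max(vals)                    # (iii)
        assert indicator(inter)(w) == min(vals) == prod(vals)      # (iv)
    return True
```

With finitely many sets the supremum and infimum reduce to `max` and `min`, which is all the check relies on.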

Indicator functions are the building blocks for more elaborate functions, constructed so as to ensure measurability. A simple function is a $\mathcal{F}/\mathcal{B}$-measurable function $f : \Omega \mapsto \mathbb{R}$ having finite range; that is, it has the form

$$f(\omega) = \sum_{i=1}^n \alpha_i 1_{E_i}(\omega) = \alpha_i, \quad \omega \in E_i, \tag{3.64}$$

where the $\alpha_1,\dots,\alpha_n$ are constants and the collection of $\mathcal{F}$-sets $E_1,\dots,E_n$ is a finite partition of $\Omega$. $\mathcal{F}/\mathcal{B}$-measurability holds because, for any $B \in \mathcal{B}$,

$$f^{-1}(B) = \bigcup_{\alpha_i \in B} E_i \in \mathcal{F}. \tag{3.65}$$

Simple functions are ubiquitous devices in measure and probability theory, because many problems can be solved for such functions rather easily, and then generalized to arbitrary functions by a limiting approximation argument such as the following.

Fig. 3.1

3.28 Theorem If $f$ is $\mathcal{F}/\mathcal{B}$-measurable and non-negative, there exists a monotone sequence of $\mathcal{F}/\mathcal{B}$-measurable simple functions $\{f_{(n)}, n \in \mathbb{N}\}$ such that $f_{(n)}(\omega) \uparrow f(\omega)$ for every $\omega \in \Omega$.

Proof For $i = 1,\dots,n2^n$, consider the sets $E_i = \{\omega : (i-1)/2^n \leq f(\omega) < i/2^n\}$. Augment these with the set $E_{n2^n+1} = \{\omega : f(\omega) \geq n\}$. This collection corresponds to a $(n2^n + 1)$-fold partition of $[0,\infty)$ into $\mathcal{B}$-sets, and since $f$ is a function, each $\omega$ maps into one and only one $f(\omega)$, and hence belongs to one and only one $E_i$. The $E_i$ therefore constitute a partition of $\Omega$. Since $f$ is measurable, $E_i \in \mathcal{F}$ for each $i$. Define a simple function $f_{(n)}$ on the $E_i$ by letting $\alpha_i = (i-1)/2^n$, for $i = 1,\dots,n2^n + 1$. Then $f_{(n)} \leq f$, but $f_{(n+1)}(\omega) \geq f_{(n)}(\omega)$ for every $\omega$; incrementing $n$ bisects each interval, and if $f_{(n)}(\omega) = (i-1)/2^n$, $f_{(n+1)}(\omega)$ is equal to either $2(i-1)/2^{n+1} = f_{(n)}(\omega)$, or $(2i-1)/2^{n+1} > f_{(n)}(\omega)$. It follows that the sequence is monotone, and $\lim_{n\to\infty} f_{(n)}(\omega) = f(\omega)$. This holds for each $\omega \in \Omega$.

To extend from non-negative to general functions, one takes the positive and negative parts. Define $f^+ = \max\{f,0\}$ and $f^- = f^+ - f$, so that both $f^+$ and $f^-$ are non-negative functions. Then if $f^+_{(n)}$ and $f^-_{(n)}$ are the non-negative simple approximations to $f^+$ and $f^-$ defined in 3.28, and $f_{(n)} = f^+_{(n)} - f^-_{(n)}$, it is clear that

$$|f - f_{(n)}| \leq |f^+ - f^+_{(n)}| + |f^- - f^-_{(n)}| \to 0. \quad \blacksquare \tag{3.66}$$

Fig. 3.1 illustrates the construction for $n = 2$ and the case $\Omega = \mathbb{R}$, so that $f(\omega)$ is a function on the real line.
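The dyadic construction in the proof of 3.28 amounts to rounding the value of the function down to the nearest multiple of a power of one half, and capping it at the index of the approximation. A minimal sketch follows; the function name `simple_approx` is an assumption, and it takes the value of the function at a point rather than the point itself.

```python
import math

def simple_approx(f_value, n):
    """Value of the n-th dyadic simple approximation at a point where f equals f_value >= 0.

    If f_value >= n the point lies in the last set of the partition, where the
    approximation takes the value n. Otherwise f_value lies in exactly one
    interval [(i-1)/2^n, i/2^n) and the approximation takes the left endpoint.
    """
    if f_value < 0:
        raise ValueError("the construction assumes a non-negative function")
    if f_value >= n:
        return float(n)
    return math.floor(f_value * 2**n) / 2**n
```

For example, a value of 2.5 is approximated by 1.0, 2.0 and then 2.5 for the first three indices: the cap binds first, after which the dyadic grid becomes fine enough to hit the value exactly.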

3.6 Borel Functions

If $f$ is a measurable function, and

$$g : S \mapsto T; \quad S \subseteq \mathbb{R},\ T \subseteq \mathbb{R}$$

is a function of a real variable, is the composite function $g \circ f$ measurable? The answer to this question is yes if and only if $g$ is a Borel function. Let $\mathcal{B}_S = \{B \cap S : B \in \mathcal{B}\}$, where $\mathcal{B}$ is the Borel field of $\mathbb{R}$. $\mathcal{B}_S$ is a $\sigma$-field of subsets of $S$, and $B \cap S$ is open (closed) in the relative topology on $S$ whenever $B$ is open (closed) in $\mathbb{R}$ (see 2.1 and 2.3). $\mathcal{B}_S$ is called the Borel field on $S$. Define $\mathcal{B}_T$ similarly with respect to $T$. Then $g$ is called a Borel function (i.e., is Borel-measurable) if $g^{-1}(B) \in \mathcal{B}_S$ for all sets $B \in \mathcal{B}_T$.

3.29 Example Consider $g(x) = |x|$. $g^{-1}$ takes each point of $\mathbb{R}^+$ into the points $x$ and $-x$. For any $B \in \mathcal{B}^+$ (the restriction of $\mathcal{B}$ to $\mathbb{R}^+$) the image under $g^{-1}$ is the set containing the points $x$ and $-x$ for each $x \in B$, which is an element of $\mathcal{B}$. $\square$

3.30 Example Let $g(x) = 1$ if $x$ is rational, 0 otherwise. Note that $\mathbb{Q} \in \mathcal{B}$ (see 3.15), and $g^{-1}$ is defined according to (3.63) with $E = \mathbb{Q}$, so $g$ is Borel-measurable. $\square$

In fact, to construct a 'plausible' non-measurable function is quite difficult, but the obvious case is the following.

3.31 Example Take a set $A \notin \mathcal{B}$; for example, let $A$ be the set $H$ defined in 3.17. Now construct the indicator function $1_A(x) : \mathbb{R} \mapsto \{0,1\}$. Since $1_A^{-1}(\{1\}) = A \notin \mathcal{B}$, this function is not measurable. $\square$

Necessary conditions for Borel measurability are hard to pin down, but the following sufficient conditions are convenient.

3.32 Theorem If $g : S \mapsto T$ is either (i) continuous or (ii) of bounded variation, it is Borel-measurable.

Proof (i) follows immediately from 3.22 and the definition of a Borel field, since continuity implies that $g^{-1}(B)$ is open (closed) in $S$ whenever $B$ is open (closed) in $T$, by 2.17.

To prove (ii), consider first a non-decreasing function $h : \mathbb{R} \mapsto \mathbb{R}$, having the property $h(y) \leq h(x)$ when $y < x$; if $A = \{y : h(y) \leq h(x)\}$, $\sup A = x$ and $A$ is one of $(-\infty,x)$ and $(-\infty,x]$, so the condition of 3.24 is satisfied. So suppose $g$ is non-decreasing on $S$; applying the last result to any non-decreasing $h$ with the property $h(x) = g(x)$, $x \in S$, we have also shown that $g$ is Borel-measurable because $g^{-1}(B \cap T) = h^{-1}(B) \cap S \in \mathcal{B}_S$, for each $B \cap T \in \mathcal{B}_T$. Since a function of bounded variation is the difference of two non-decreasing functions by 2.20, the theorem now follows easily by 3.25. $\blacksquare$

This result lets us add a further case to those of 3.25.

3.33 Theorem If $f$ and $g$ are measurable, so is $fg$.

Proof $fg = \tfrac{1}{2}\big((f+g)^2 - f^2 - g^2\big)$, and the result follows on combining 3.32(i) with 3.25(ii). $\blacksquare$

The concept of a Borel function extends naturally to Euclidean $n$-spaces, and indeed, to mappings between spaces of different dimension. A vector function

$$g : S \mapsto T; \quad S \subseteq \mathbb{R}^k,\ T \subseteq \mathbb{R}^m$$

is Borel-measurable if $g^{-1}(B) \in \mathcal{B}_S$ for all $B \in \mathcal{B}_T$, where $\mathcal{B}_S = \{B \cap S : B \in \mathcal{B}^k\}$ and $\mathcal{B}_T = \{B \cap T : B \in \mathcal{B}^m\}$.

3.34 Theorem If $g$ is continuous, it is Borel-measurable.

Proof By 2.21. $\blacksquare$

Finally, note the application of 3.21 to these cases.

3.35 Theorem If $\mu$ is a measure on $(\mathbb{R}^k,\mathcal{B}^k)$ and $g : S \mapsto T$ is Borel-measurable where $S \subseteq \mathbb{R}^k$ and $T \subseteq \mathbb{R}^m$, $\mu g^{-1}$ is a measure on $(T,\mathcal{B}_T)$ where

$$\mu g^{-1}(B) = \mu(g^{-1}(B)) \tag{3.67}$$

for each $B \in \mathcal{B}_T$. $\square$

A simple example is where $g$ is the projection of $\mathbb{R}^k$ onto $\mathbb{R}^m$ for $m < k$. If $X$ is $k \times 1$ with partition $X' = (X_*', X_{**}')$, where $X_*$ is $m \times 1$ and $X_{**}$ is $(k - m) \times 1$, let $g : \mathbb{R}^k \mapsto \mathbb{R}^m$ be defined by

$$g(X) = X_*. \tag{3.68}$$

In this case, $\mu g^{-1}(B) = \mu(\pi^{-1}(B)) = \mu(B \times \mathbb{R}^{k-m})$ for $B \in \mathcal{B}^m$.

4 Integration

4.1 Construction of the Integral

The reader may be familiar with the Riemann integral of a bounded non-negative function $f$ on a bounded interval of the line $[a,b]$, usually written $\int_a^b f\,dx$. The objects to be studied in this chapter represent a heroic generalization of the same idea. Instead of intervals of the line, the integral is defined on an arbitrary measure space.

Suppose $(\Omega,\mathcal{F},\mu)$ is a measure space and

$$f : \Omega \mapsto \overline{\mathbb{R}}^+$$

is a $\mathcal{F}/\overline{\mathcal{B}}$-measurable function into the non-negative, extended real line. The integral of $f$ is defined to be the real valued functional

$$\int f\,d\mu = \sup\Big\{\sum_i \Big(\inf_{\omega \in E_i} f(\omega)\Big)\mu(E_i)\Big\} \tag{4.1}$$

where the supremum is taken over all finite partitions of $\Omega$ into sets $E_i \in \mathcal{F}$, and the supremum exists. If no supremum exists, the integral is assigned the value $+\infty$. The integral of the function $1_A f$, where $1_A(\omega)$ is the indicator of the set $A \in \mathcal{F}$, is called the integral of $f$ over $A$, and written $\int_A f\,d\mu$.

The expression in (4.1) is sometimes called the lower integral, and denoted $\int_* f\,d\mu$. Likewise defining the upper integral of $f$,

$$\int^* f\,d\mu = \inf\Big\{\sum_i \Big(\sup_{\omega \in E_i} f(\omega)\Big)\mu(E_i)\Big\}, \tag{4.2}$$

we should like these two constructions, approximating $f$ from below and from above, to agree. And indeed, it is possible to show that $\int_* f\,d\mu = \int^* f\,d\mu$ whenever $f$ is bounded and $\mu(\Omega) < \infty$. However, $\int^* f\,d\mu = \infty$ if either the set $\{\omega : f(\omega) > 0\}$ has infinite measure, or $f$ is unbounded on sets of positive measure. Definition (4.1) is preferred because it can yield a finite value in these cases.

4.1 Example A familiar case is the measure space $(\mathbb{R},\mathcal{B},m)$, where $m$ is Lebesgue measure. The integral $\int f\,dm$ where $f$ is a Borel function is the Lebesgue integral of $f$. This is customarily written $\int f\,dx$, reflecting the fact that $m((x, x + dx]) = dx$, even though the sets $\{E_i\}$ in (4.1) need not be intervals. $\square$

4.2 Example Consider a measure space $(\mathbb{R},\mathcal{B},\mu)$ where $\mu$ differs from $m$. The integral $\int f\,d\mu$, where $f$ is a Borel function, is the Lebesgue-Stieltjes integral. The monotone function

$$F(x) = \mu((-\infty,x]) \tag{4.3}$$

has the property $\mu((a,b]) = F(b) - F(a)$, and the measure of the interval $(x, x + dx]$ can be written $dF(x)$. The notation $\int f\,dF$ means exactly the same as $\int f\,d\mu$, the choice between the $\mu$ and $F$ representations being a matter of taste. See §8.2 and §9.1 for details. $\square$

For a contrast with these cases, consider the Riemann-Stieltjes integral. For an interval $[a,b]$, let a partition into subintervals be defined by a set of points $\Pi = \{x_1,\dots,x_n\}$, with $a = x_0 < x_1 < \dots < x_n = b$. Another set $\Pi'$ is called a refinement of $\Pi$ if $\Pi \subseteq \Pi'$. Given functions $f$ and $\alpha : \mathbb{R} \mapsto \mathbb{R}$, let

$$S(\Pi,\alpha,f) = \sum_{i=1}^n f(t_i)\big(\alpha(x_i) - \alpha(x_{i-1})\big), \tag{4.4}$$

where $t_i \in [x_{i-1},x_i]$. If there exists a number $\int_a^b f\,d\alpha$, such that for every $\varepsilon > 0$ there is a partition $\Pi_\varepsilon$ with

$$\Big|S(\Pi,\alpha,f) - \int_a^b f\,d\alpha\Big| < \varepsilon$$

for all $\Pi \supseteq \Pi_\varepsilon$ and every choice of $\{t_i\}$, this is called the Riemann-Stieltjes integral of $f$ with respect to $\alpha$. Recall in this connection the well-known formula for integration by parts, which states that when both integrals exist,

$$f(b)\alpha(b) = f(a)\alpha(a) + \int_a^b f\,d\alpha + \int_a^b \alpha\,df.$$

When $\alpha = x$ and $f$ is bounded this definition yields the ordinary Riemann integral, and when it exists, this always agrees with the Lebesgue integral of $f$ over $[a,b]$. Moreover, if $\alpha$ is an increasing function of the form in (4.3), this integral is equal to the Lebesgue-Stieltjes integral whenever it is defined. There do exist bounded, measurable functions which are not Riemann-integrable (consider 3.30 for example) so that even for bounded intervals the Lebesgue integral is the more inclusive concept.

However, the Riemann-Stieltjes integral is defined for more general classes of integrator function. In particular, if $f$ is continuous it exists for $\alpha$ of bounded variation on $[a,b]$, not necessarily monotone. These integrals therefore fall outside the class defined by (4.1), although note that when $\alpha$ is of bounded variation, having a representation as the difference of two increasing functions, the Riemann-Stieltjes integral is the difference between a pair of Lebesgue-Stieltjes integrals on $[a,b]$.
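The approximating sums of the Riemann-Stieltjes construction are straightforward to compute. The following is a minimal numerical sketch, not a definitive implementation; the function names, the use of a uniform partition and the default choice of midpoints as evaluation points are all assumptions made for illustration.

```python
def rs_sum(f, alpha, points, tags=None):
    """Riemann-Stieltjes sum for the partition a = x_0 < ... < x_n = b given by `points`.

    `tags` are the evaluation points t_i, one per subinterval.
    """
    if tags is None:
        # Assumption for illustration: evaluate f at the midpoint of each subinterval.
        tags = [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]
    return sum(f(t) * (alpha(hi) - alpha(lo))
               for t, lo, hi in zip(tags, points, points[1:]))

def uniform_partition(a, b, n):
    """n + 1 equally spaced points from a to b."""
    return [a + (b - a) * i / n for i in range(n + 1)]

# Illustrative integrand f(x) = x and integrator alpha(x) = x^2 on [0, 1].
pts = uniform_partition(0.0, 1.0, 1000)
f_d_alpha = rs_sum(lambda x: x, lambda x: x * x, pts)
alpha_d_f = rs_sum(lambda x: x * x, lambda x: x, pts)
```

With these choices the first sum approximates 2/3 and the second 1/3, so their total is close to $f(1)\alpha(1) - f(0)\alpha(0) = 1$, in line with the integration by parts formula.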

The best way to understand the general integral is not to study a particular measure space, such as the line, but to restrict attention initially to particular classes of function. The simplest possible case is the indicator of a set. Then, every partition $\{E_i\}$ yields the same value for the sum of terms in (4.1), which is

$$\int 1_A\,d\mu = \int_A d\mu = \mu(A), \tag{4.6}$$

for any $A \in \mathcal{F}$. Note that if $A \notin \mathcal{F}$, the integral is undefined.

Another case of much importance is the following.

4.3 Theorem If $f = 0$ a.e.$[\mu]$, then $\int f\,d\mu = 0$.

Proof The theorem says there exists $C$ with $\mu(C^c) = 0$, such that $f(\omega) = 0$ for $\omega \in C$. For any partition $\{E_1,\dots,E_n\}$ let $E_i' = E_i \cap C$, and $E_i'' = E_i - E_i'$. By additivity of $\mu$,

$$\sum_i \Big(\inf_{\omega \in E_i} f(\omega)\Big)\mu(E_i) = \sum_i \Big(\inf_{\omega \in E_i} f(\omega)\Big)\mu(E_i') + \sum_i \Big(\inf_{\omega \in E_i} f(\omega)\Big)\mu(E_i'') = 0, \tag{4.7}$$

the first sum of terms disappearing because $f(\omega) = 0$, and the second disappearing by 3.6(i) since $\mu(E_i'') \leq \mu(C^c) = 0$ for each $i$. $\blacksquare$

A class of functions for which evaluation of the integral is simple, as their name suggests, is the non-negative simple functions.

4.4 Theorem Let $\varphi(\omega) = \sum_{i=1}^n \alpha_i 1_{E_i}(\omega)$, where $\alpha_i \geq 0$ for $i = 1,\dots,n$, and $E_1,\dots,E_n \in \mathcal{F}$ is a partition of $\Omega$. Then

$$\int \varphi\,d\mu = \sum_{i=1}^n \alpha_i \mu(E_i). \tag{4.8}$$

Proof Consider an arbitrary finite partition of $\Omega$, $A_1,\dots,A_m$, and define $\beta_j = \inf_{\omega \in A_j} \varphi(\omega)$. Then, using additivity of $\mu$,

$$\begin{aligned}
\sum_{j=1}^m \beta_j \mu(A_j) &= \sum_{j=1}^m \sum_{i=1}^n \beta_j \mu(A_j \cap E_i) \\
&\leq \sum_{i=1}^n \alpha_i \sum_{j=1}^m \mu(A_j \cap E_i) \\
&= \sum_{i=1}^n \alpha_i \mu(E_i),
\end{aligned} \tag{4.9}$$

where the inequality uses the fact that $\beta_j$ assumes the smallest value of $\alpha_i$ such that $A_j \cap E_i \neq \emptyset$, by definition. The theorem follows, given (4.1), since (4.9) holds as an equality for the case $m = n$ and $A_i = E_i$, $i = 1,\dots,n$. $\blacksquare$

So for functions with finite range, the integral is the sum of the possible values of $f$, weighted by the measures of the sets on which those values hold. Look at Fig. 3.1. The Lebesgue integral of the approximating function $f_{(2)}$ in the figure is the sum of the areas of the rectangular regions. To compute the Lebesgue-Stieltjes integral with respect to some measure $\mu$, one replaces the width of the sets


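$$= a\alpha_i + b\beta_j, \quad \omega \in A_i \cap B_j, \tag{4.17}$$

a simple function. Hence,

$$\begin{aligned}
\int (a\varphi + b\psi)\,d\mu &= \sum_i \sum_j (a\alpha_i + b\beta_j)\mu(A_i \cap B_j) \\
&= a\sum_i \alpha_i \sum_j \mu(A_i \cap B_j) + b\sum_j \beta_j \sum_i \mu(A_i \cap B_j) \\
&= a\sum_i \alpha_i \mu(A_i) + b\sum_j \beta_j \mu(B_j) \\
&= a\int \varphi\,d\mu + b\int \psi\,d\mu,
\end{aligned} \tag{4.18}$$

showing that linearity applies to simple functions. Now applying 4.6,

$$\begin{aligned}
\int (af + bg)\,d\mu &= \sup_{\varphi \leq f,\ \psi \leq g} \int (a\varphi + b\psi)\,d\mu \\
&= a\Big(\sup_{\varphi \leq f} \int \varphi\,d\mu\Big) + b\Big(\sup_{\psi \leq g} \int \psi\,d\mu\Big) \\
&= a\int f\,d\mu + b\int g\,d\mu.
\end{aligned} \tag{4.19}$$

To extend the result to general functions, note that

$$|af + bg| \leq |a| \cdot |f| + |b| \cdot |g|, \tag{4.20}$$

so (4.19) shows that $af + bg$ is integrable so long as $f$ and $g$ are integrable, and $a$ and $b$ are finite. The identity

$$af + bg = (af)^+ - (af)^- + (bg)^+ - (bg)^- \tag{4.21}$$

implies, applying (4.19), that

$$\int (af + bg)\,d\mu = \int (af)^+ d\mu - \int (af)^- d\mu + \int (bg)^+ d\mu - \int (bg)^- d\mu. \tag{4.22}$$

If $a \geq 0$, then $\int (af)^+ d\mu - \int (af)^- d\mu = a\big(\int f^+ d\mu - \int f^- d\mu\big) = a\int f\,d\mu$, whereas if $a < 0$, $\int (af)^+ d\mu - \int (af)^- d\mu = |a|\big(\int f^- d\mu - \int f^+ d\mu\big) = -|a|\int f\,d\mu = a\int f\,d\mu$. The same argument applies to the terms in $b$ and $g$. So (4.16) holds as required. $\blacksquare$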

Linearity is a very useful property. The first application is to show the invariance of the integral to the behaviour of functions on sets of measure 0, extending the basic result of 4.3.

4.8 Lemma Let $f$ and $g$ be integrable functions.
(i) If $f \leq g$ a.e.$[\mu]$, then $\int f\,d\mu \leq \int g\,d\mu$.

(ii) If $f = g$ a.e.$[\mu]$, then $\int f\,d\mu = \int g\,d\mu$.

Proof For (i), consider first the case $f = 0$. If $g \geq 0$ everywhere, $\int g\,d\mu \geq 0$ directly from (4.1). So suppose $g \geq 0$ a.e.$[\mu]$ and define

$$h(\omega) = \begin{cases} 0, & g(\omega) \geq 0 \\ \infty, & g(\omega) < 0. \end{cases}$$

Then $h = 0$ a.e.$[\mu]$ but $g + h \geq 0$ everywhere, and, applying 4.7,

$$0 \leq \int (g + h)\,d\mu = \int g\,d\mu + \int h\,d\mu = \int g\,d\mu, \tag{4.23}$$

since $\int h\,d\mu = 0$ by 4.3. Now replace $g$ by $g - f$ in the last argument to show $\int (g - f)\,d\mu \geq 0$, and hence $\int g\,d\mu \geq \int f\,d\mu$ by 4.7.

To prove (ii), let $h = f - g$ so that $h = 0$ a.e.$[\mu]$, and $\int h\,d\mu = 0$ by 4.3. Then $\int f\,d\mu = \int (g + h)\,d\mu = \int g\,d\mu + \int h\,d\mu = \int g\,d\mu$, where the second equality is by 4.7. $\blacksquare$

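These results permit the extension to the more commonly quoted version of the monotone convergence theorem.

4.9 Corollary If $f_n \geq 0$ and $f_n \uparrow f$ a.e.$[\mu]$, $\lim_{n\to\infty} \int f_n\,d\mu = \int f\,d\mu$. $\square$

Another implication of linearity is the following.

4.10 Modulus inequality $\int |f|\,d\mu \geq \big|\int f\,d\mu\big|$.

Proof $\int |f|\,d\mu = \int (f^+ + f^-)\,d\mu = \int f^+ d\mu + \int f^- d\mu \geq \big|\int f^+ d\mu - \int f^- d\mu\big| = \big|\int f\,d\mu\big|$. $\blacksquare$

In the form of 4.9, the monotone convergence theorem has several other useful corollaries.

4.11 Fatou's lemma If $f_n \geq 0$ a.e.$[\mu]$, then

$$\liminf_{n\to\infty} \int f_n\,d\mu \geq \int \liminf_{n\to\infty} f_n\,d\mu.$$

Proof Let $g_n = \inf_{k \geq n} f_k$, so that $\{g_n\}$ is a non-decreasing sequence, and $g_n \uparrow g = \liminf_n f_n$. Since $f_n \geq g_n$, $\int f_n\,d\mu \geq \int g_n\,d\mu$. Letting $n \to \infty$ on both sides of the inequality gives

$$\liminf_{n\to\infty} \int f_n\,d\mu \geq \lim_{n\to\infty} \int g_n\,d\mu = \int g\,d\mu = \int \liminf_{n\to\infty} f_n\,d\mu. \quad \blacksquare \tag{4.24}$$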
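The inequality in Fatou's lemma can be strict, and a two-block example makes this concrete. The sketch below is purely illustrative: the space is split into a set and its complement with assumed measures 1/3 and 2/3, and the sequence alternates between the two indicator functions. Because the sequence has period 2, each liminf reduces to a minimum over one period.

```python
from fractions import Fraction

# Hypothetical two-block space: Omega is the union of A and its complement Ac.
mu = {"A": Fraction(1, 3), "Ac": Fraction(2, 3)}

def f(n):
    """f_n is the indicator of A for even n and of Ac for odd n (values per block)."""
    return {"A": 1, "Ac": 0} if n % 2 == 0 else {"A": 0, "Ac": 1}

def integral(h):
    """Integral of a function that is constant on each block."""
    return sum(h[block] * mu[block] for block in mu)

# The sequence has period 2, so each liminf is the minimum over one period.
period = [f(0), f(1)]
liminf_of_integrals = min(integral(h) for h in period)
liminf_f = {block: min(h[block] for h in period) for block in mu}
integral_of_liminf = integral(liminf_f)
```

Here the pointwise liminf of the sequence is zero everywhere, so its integral vanishes, while the integrals themselves never fall below 1/3.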

4.12 Dominated convergence theorem If $f_n \to f$ a.e.$[\mu]$, and there exists $g$ such that $|f_n| \leq g$ a.e.$[\mu]$ for all $n$ and $\int g\,d\mu < \infty$, then $\int f_n\,d\mu \to \int f\,d\mu$.

Proof According to 4.8(i), $\int g\,d\mu < \infty$ implies $\int |f_n|\,d\mu < \infty$. Let $h_n = |f_n - f|$, such that $0 \leq h_n \leq 2g$ a.e.$[\mu]$ and $h_n \to 0$ a.e.$[\mu]$. Applying 4.3 to $\liminf_n h_n$, linearity, and Fatou's lemma,

$$2\int g\,d\mu = \int \liminf_{n\to\infty} (2g - h_n)\,d\mu \leq \liminf_{n\to\infty} \int (2g - h_n)\,d\mu = 2\int g\,d\mu - \limsup_{n\to\infty} \int h_n\,d\mu, \tag{4.25}$$

where the last equality uses (2.4). Clearly, $\limsup_{n\to\infty} \int h_n\,d\mu = 0$, and since $\int h_n\,d\mu \geq 0$ the modulus inequality implies

$$\lim_{n\to\infty} \Big|\int f_n\,d\mu - \int f\,d\mu\Big| \leq \lim_{n\to\infty} \int h_n\,d\mu = 0. \quad \blacksquare \tag{4.26}$$

Taking the case where the $g$ is replaced by a finite constant produces the following version, often more convenient:

4.13 Bounded convergence theorem If $\mu(\Omega) < \infty$, $f_n \to f$ a.e.$[\mu]$ and $|f_n| \leq B < \infty$ for all $n$, then $\lim_{n\to\infty} \int f_n\,d\mu = \int f\,d\mu < \infty$. $\square$

Theorem 4.7 extends by recursion from pairs to arbitrary finite sums of functions, and in particular we may assert that $\int \big(\sum_{i=1}^n g_i\big)\,d\mu = \sum_{i=1}^n \int g_i\,d\mu$. Put $f_n = \sum_{i=1}^n g_i$ and $\int f_n\,d\mu = \sum_{i=1}^n \int g_i\,d\mu$, where the $g_i$ are non-negative functions. Then, if $f_n \uparrow f = \sum_{i=1}^\infty g_i < \infty$ a.e., 4.9 also permits us to assert the following.

4.14 Corollary If $\{g_i\}$ is a sequence of non-negative functions,

$$\int \Big(\sum_{i=1}^\infty g_i\Big)\,d\mu = \sum_{i=1}^\infty \int g_i\,d\mu. \quad \square \tag{4.27}$$

By implication, the two sides of this equation are either both infinite, or finite and equal. This has a particular application to results involving $\sigma$-finite measures. Suppose we wish to evaluate an integral $\int g\,d\mu$ using a method that works for finite measures. To extend to the $\sigma$-finite case, choose a countable partition $\{\Omega_i\}$ of $\Omega$, such that $\mu(\Omega_i) < \infty$ for each $i$. Letting $g_i = 1_{\Omega_i} g$, note that $g = \sum_i g_i$, and $\int g\,d\mu = \sum_i \int g_i\,d\mu$ by (4.27).

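4.3 Product Measure and Multiple Integrals

Let $(\Omega,\mathcal{F},\mu)$ and $(\Xi,\mathcal{G},\nu)$ be measure spaces. In general, $(\Omega\times\Xi,\ \mathcal{F}\otimes\mathcal{G},\ \pi)$ might also be a measure space, with $\pi$ a measure on the sets of $\mathcal{F}\otimes\mathcal{G}$. In this case measures $\mu$ and $\nu$, defined by $\mu(F) = \pi(F\times\Xi)$ and $\nu(G) = \pi(\Omega\times G)$ respectively, are called the marginal measures corresponding to $\pi$.

Alternatively, suppose that $\mu$ and $\nu$ are given, and define the set function $\pi: \mathcal{R}_{\mathcal{F}\mathcal{G}} \mapsto \overline{\mathbb{R}}^{+}$, where $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ denotes the measurable rectangles of the space $\Omega\times\Xi$, by

$$\pi(F\times G) = \mu(F)\nu(G). \tag{4.28}$$

We will show that $\pi$ is a measure on $\mathcal{R}_{\mathcal{F}\mathcal{G}}$, called the product measure, and has an extension to $\mathcal{F}\otimes\mathcal{G}$, so that $(\Omega\times\Xi,\ \mathcal{F}\otimes\mathcal{G},\ \pi)$ is indeed a measure space. The first step in this demonstration is to define the mapping

$$T_\omega: \Xi \mapsto \Omega\times\Xi$$

by $T_\omega(\xi) = (\omega,\xi)$, so that, for $G \in \mathcal{G}$, $T_\omega(G) = \{\omega\}\times G$. For $E \in \mathcal{F}\otimes\mathcal{G}$, let

$$E_\omega = T_\omega^{-1}(E) = \{\xi: (\omega,\xi) \in E\} \subseteq \Xi. \tag{4.29}$$

The set $E_\omega$ can be thought of as the cross-section through $E$ at the element $\omega$. For any countable collection of $\mathcal{F}\otimes\mathcal{G}$-sets $\{E_j,\ j \in \mathbb{N}\}$,

$$\Big(\bigcup_j E_j\Big)_\omega = \Big\{\xi: (\omega,\xi) \in \bigcup_j E_j\Big\} = \bigcup_j \{\xi: (\omega,\xi) \in E_j\} = \bigcup_j (E_j)_\omega. \tag{4.30}$$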

For future reference, note the following.

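4.15 Lemma $T_\omega$ is a $\mathcal{G}/(\mathcal{F}\otimes\mathcal{G})$-measurable mapping for each $\omega \in \Omega$.

Proof We must show that $E_\omega \in \mathcal{G}$ whenever $E \in \mathcal{F}\otimes\mathcal{G}$. If $E = F\times G$ for $F \in \mathcal{F}$ and $G \in \mathcal{G}$, it is obvious that

$$E_\omega = \begin{cases} G, & \omega \in F \\ \emptyset, & \omega \notin F \end{cases} \ \in \mathcal{G}. \tag{4.31}$$

Since $\mathcal{F}\otimes\mathcal{G} = \sigma(\mathcal{R}_{\mathcal{F}\mathcal{G}})$, the lemma follows by 3.22. ∎

The second step is to show the following.

4.16 Theorem $\pi$ is a measure on $\mathcal{R}_{\mathcal{F}\mathcal{G}}$.

Proof Clearly $\pi$ is non-negative, and $\pi(\emptyset) = 0$, recalling that $F\times\emptyset = \emptyset\times G = \emptyset$ for any $F \in \mathcal{F}$ or $G \in \mathcal{G}$, and applying (4.28). It remains to show countable additivity. Let $\{E_j \in \mathcal{R}_{\mathcal{F}\mathcal{G}},\ j \in \mathbb{N}\}$ be a disjoint collection, such that there exist sets $F_j \in \mathcal{F}$ and $G_j \in \mathcal{G}$ with $E_j = F_j\times G_j$; and also suppose $E = \bigcup_j E_j \in \mathcal{R}_{\mathcal{F}\mathcal{G}}$, such that there exist sets $F$ and $G$ with $E = F\times G$. Any point $(\omega,\xi) \in F\times G$ belongs to one and only one of the sets $F_j\times G_j$, so that for any $\omega \in F$, the sets of the subcollection $\{G_j\}$ for which $\omega \in F_j$ must constitute a partition of $G$. Hence, applying (4.30) and (4.31),

$$\nu(E_\omega) = \nu\Big(\Big(\bigcup_j E_j\Big)_\omega\Big) = \nu\Big(\bigcup_j (E_j)_\omega\Big) = \nu\Big(\bigcup_{j:\ \omega \in F_j} G_j\Big) = \sum_j 1_{F_j}(\omega)\,\nu(G_j), \tag{4.32}$$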

where the additivity of $\nu$ can be applied since the sets $G_j$ appearing in this decomposition are disjoint. Since we can also write $\nu(E_\omega) = \nu(G)1_F(\omega)$, we find

$$\pi(E) = \mu(F)\nu(G) = \int \nu(G)1_F(\omega)\,d\mu(\omega) = \int \sum_j 1_{F_j}(\omega)\,\nu(G_j)\,d\mu(\omega) = \sum_j \mu(F_j)\nu(G_j) = \sum_j \pi(E_j) \tag{4.33}$$

as required, where the penultimate equality is by 4.14. ∎

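It is now straightforward to extend the measure from $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ to $\mathcal{F}\otimes\mathcal{G}$.

4.17 Theorem $(\Omega\times\Xi,\ \mathcal{F}\otimes\mathcal{G},\ \pi)$ is a measure space.

Proof $\mathcal{F}$ and $\mathcal{G}$ are $\sigma$-fields and hence semi-rings; hence $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ is a semi-ring by 3.19. The theorem follows from 4.16 and 3.8. ∎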

Iterating the preceding arguments (i.e. letting $(\Omega,\mathcal{F})$ and/or $(\Xi,\mathcal{G})$ be product spaces) allows the concept to be extended to products of higher order. In later chapters, product probability measures will embody the intuitive notion of statistical independence, although this is by no means the only application we shall meet. The following case has a familiar geometrical interpretation.

4.18 Example Lebesgue measure in the plane, $\mathbb{R}^2 = \mathbb{R}\times\mathbb{R}$, is defined for intervals by

$$m\big((a_1,b_1]\times(a_2,b_2]\big) = (b_1 - a_1)(b_2 - a_2). \tag{4.34}$$

Here the measurable rectangles include the actual geometrical rectangles (products of intervals), and $\mathcal{B}^2$, the Borel sets of the plane, is generated from these as a consequence of 3.20. By the foregoing reasoning, $(\mathbb{R}^2, \mathcal{B}^2, m)$ is a measure space in which the measure of a set is given by its area. □

We now construct integrals of functions $f(\omega,\xi)$ on the product space. The following lemma is a natural extension of 4.15, for it considers what we might think of as a cross-section through the mapping at a point $\omega \in \Omega$, yielding a function with domain $\Xi$.

4.19 Lemma Let $f: \Omega\times\Xi \mapsto \mathbb{R}$ be $(\mathcal{F}\otimes\mathcal{G})/\mathcal{B}$-measurable. Define $f_\omega(\xi) = f(\omega,\xi)$ for fixed $\omega \in \Omega$. Then $f_\omega: \Xi \mapsto \mathbb{R}$ is $\mathcal{G}/\mathcal{B}$-measurable.

Proof We can write

$$f_\omega(\xi) = f(\omega,\xi) = f(T_\omega(\xi)) = f\circ T_\omega(\xi). \tag{4.35}$$

By 4.15 and the remarks following 3.22, the composite function $f\circ T_\omega$ is $\mathcal{G}/\mathcal{B}$-measurable. ∎

Suppose we are able to integrate $f_\omega$ with respect to $\nu$ over $\Xi$. There are two questions of interest that arise here. First, is the resulting function $g(\omega) = \int_\Xi f_\omega\,d\nu$ $\mathcal{F}/\mathcal{B}$-measurable? And second, if $g$ is now integrated over $\Omega$, what is the relationship between this integral and the integral $\int_{\Omega\times\Xi} f\,d\pi$ over $\Omega\times\Xi$? The affirmative answer to the first of these questions, and the fact that the 'iterated' integral is identical with the 'double' integral where these exist, are the most important results for product spaces, known jointly as the Fubini theorem. Since iterated integration is an operation we tend to take for granted with multiple Riemann integrals, perhaps the main point needing to be stressed here is that this convenient property of product measures (and multivariate Lebesgue measure in particular) does not generalize to arbitrary measures on product spaces.

The first step is to let $f$ be the indicator of a set $E \in \mathcal{F}\otimes\mathcal{G}$. In this case $f_\omega$ is the indicator of the set $E_\omega$ defined in (4.29), and

$$\int_\Xi f_\omega\,d\nu = \nu(E_\omega) = g_E(\omega), \tag{4.36}$$

say. In view of 4.15, $E_\omega \in \mathcal{G}$ and the function $g_E: \Omega \mapsto \overline{\mathbb{R}}^{+}$ is well-defined, although, unless $\nu$ is a finite measure, it may take its values in the extended half line, as shown.

4.20 Lemma Let $\mu$ and $\nu$ be $\sigma$-finite. For all $E \in \mathcal{F}\otimes\mathcal{G}$, $g_E$ is $\mathcal{F}/\mathcal{B}$-measurable and

$$\int_\Omega g_E\,d\mu = \pi(E). \tag{4.37}$$

By implication, the two sides of the equality in (4.37) are either both infinite, or finite and equal.
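For measures concentrated on finitely many points, the content of (4.37) can be checked by direct enumeration. The sketch below is only an illustration under that assumption: the point masses `mu` and `nu`, the set `E`, and all function names are hypothetical and do not appear in the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite measures given by point masses (illustrative values only).
mu = {"a": Fraction(1, 2), "b": Fraction(3, 2)}
nu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(1, 3)}


def pi(E):
    """Product measure of a set E of pairs: sum of mu({w}) * nu({x})."""
    return sum(mu[w] * nu[x] for (w, x) in E)


def cross_section(E, w):
    """E_w = {x : (w, x) in E}, as in (4.29)."""
    return {x for (v, x) in E if v == w}


def g(E, w):
    """g_E(w) = nu(E_w), as in (4.36)."""
    return sum(nu[x] for x in cross_section(E, w))


def integral_of_g(E):
    """Integral of g_E with respect to mu, the left-hand side of (4.37)."""
    return sum(g(E, w) * mu[w] for w in mu)


# E is not a measurable rectangle; `rectangle` is the product {a} x {0, 1}.
E = {("a", 0), ("a", 2), ("b", 1)}
rectangle = set(product({"a"}, {0, 1}))
```

Exact rational arithmetic is used so that both sides of (4.37) can be compared with `==` rather than up to rounding error.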

Proof Assume first that the measures are finite. The theorem is proved for this case using the $\pi$-$\lambda$ theorem. Let $\mathcal{A}$ denote the collection of sets $E$ such that $g_E$ satisfies (4.37). $\mathcal{R}_{\mathcal{F}\mathcal{G}} \subseteq \mathcal{A}$, since if $E = F\times G$ then, by (4.31),

$$g_E(\omega) = \nu(G)1_F(\omega), \quad F \in \mathcal{F}, \tag{4.38}$$

and $\int g_E\,d\mu = \mu(F)\nu(G) = \pi(E)$ as required. We now show $\mathcal{A}$ is a $\lambda$-system. Clearly $\Omega\times\Xi \in \mathcal{A}$, so 1.25(a) holds. If $E_1, E_2 \in \mathcal{A}$ and $E_1 \subset E_2$, then, since $1_{E_2 - E_1} = 1_{E_2} - 1_{E_1}$,

$$g_{E_2 - E_1}(\omega) = \int_\Xi 1_{E_2}(\omega,\xi)\,d\nu(\xi) - \int_\Xi 1_{E_1}(\omega,\xi)\,d\nu(\xi) = g_{E_2}(\omega) - g_{E_1}(\omega), \tag{4.39}$$

an $\mathcal{F}/\mathcal{B}$-measurable function by 3.25, and so, by additivity of $\pi$,

$$\int_\Omega g_{E_2 - E_1}\,d\mu = \pi(E_2) - \pi(E_1) = \pi(E_2 - E_1), \tag{4.40}$$

showing that $\mathcal{A}$ satisfies 1.25(b). Finally, if $A_1$ and $A_2$ are disjoint so are $(A_1)_\omega$ and $(A_2)_\omega$, and $g_{A_1\cup A_2}(\omega) = g_{A_1}(\omega) + g_{A_2}(\omega)$. To establish 1.25(c), let $\{E_j \in \mathcal{A},\ j \in \mathbb{N}\}$ be a monotone sequence, with $E_j \uparrow E$. Define the disjoint collection $\{A_j\}$ with $A_1 = E_1$ and $A_j = E_j - E_{j-1}$, $j > 1$, so that $E = \bigcup_{j=1}^\infty A_j$ and $A_j \in \mathcal{A}$ by (4.39). By countable additivity of $\nu$,

$$g_E(\omega) = \sum_{j=1}^{\infty} g_{A_j}(\omega), \tag{4.41}$$

and

$$\int_\Omega \sum_{j=1}^{\infty} g_{A_j}(\omega)\,d\mu(\omega) = \sum_{j=1}^{\infty}\int_\Omega g_{A_j}(\omega)\,d\mu(\omega) = \sum_{j=1}^{\infty}\pi(A_j) = \pi(E), \tag{4.42}$$

where the first equality is by 4.14. This shows that $\mathcal{A}$ is a $\lambda$-system. Since $\mathcal{R}_{\mathcal{F}\mathcal{G}}$ is a semi-ring it is also a $\pi$-system, and $\mathcal{F}\otimes\mathcal{G} = \sigma(\mathcal{R}_{\mathcal{F}\mathcal{G}}) \subseteq \mathcal{A}$ by 1.27. This completes the proof for finite measures.

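To extend to the $\sigma$-finite case, let $\{\Omega_i\}$ and $\{\Xi_j\}$ be countable partitions of $\Omega$ and $\Xi$ with finite $\mu$-measure and $\nu$-measure respectively; then the collection $\{\Omega_i\times\Xi_j \in \mathcal{R}_{\mathcal{F}\mathcal{G}}\}$ forms a countable partition of $\Omega\times\Xi$ having finite measures, $\pi(\Omega_i\times\Xi_j) = \mu(\Omega_i)\nu(\Xi_j)$. For a set $E \in \mathcal{F}\otimes\mathcal{G}$, write $E_{ij} = E \cap (\Omega_i\times\Xi_j)$. Then by the last argument,

$$\int_{\Omega_i} g_{E_{ij}}\,d\mu = \pi(E_{ij}), \tag{4.43}$$

where $g_{E_{ij}}: \Omega_i \mapsto \mathbb{R}^{+}$ is defined by $g_{E_{ij}}(\omega) = \nu\big((E_{ij})_\omega\big)$, $\omega \in \Omega_i$. The sets $E_{ij}$ are disjoint and $g_E(\omega) = \sum_j g_{E_{ij}}(\omega)$ when $\omega \in \Omega_i$, or

$$g_E(\omega) = \sum_i 1_{\Omega_i}(\omega)\sum_j g_{E_{ij}}(\omega). \tag{4.44}$$

The sum on the right need not converge, and in that case $g_E(\omega) = +\infty$. However, $\mathcal{F}/\mathcal{B}$-measurability holds by 3.25/3.26, and

$$\int_\Omega g_E\,d\mu = \int_\Omega \sum_i 1_{\Omega_i}\sum_j g_{E_{ij}}\,d\mu = \sum_i\sum_j \int_{\Omega_i} g_{E_{ij}}\,d\mu = \sum_i\sum_j \pi(E_{ij}) = \pi(E), \tag{4.45}$$

using 4.14 and countable additivity. This completes the proof. ∎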

Now extend from indicator functions to non-negative functions:

4.21 Tonelli's theorem Let $\pi$ be a product measure with $\sigma$-finite marginal measures $\mu$ and $\nu$, and let $f: \Omega\times\Xi \mapsto \overline{\mathbb{R}}^{+}$ be $(\mathcal{F}\otimes\mathcal{G})/\mathcal{B}$-measurable. Define functions $f_\omega: \Xi \mapsto \overline{\mathbb{R}}^{+}$ by $f_\omega(\xi) = f(\omega,\xi)$, and let $g(\omega) = \int_\Xi f_\omega\,d\nu$. Then

(i) $g$ is $\mathcal{F}/\mathcal{B}$-measurable,

(ii) $\int_{\Omega\times\Xi} f\,d\pi = \int_\Omega\Big(\int_\Xi f_\omega\,d\nu\Big)d\mu$. □

In part (ii) it is again understood that the two sides of the equation are either finite and equal, or both infinite. Like the other results of this section, the theorem is symmetric in $(\Omega,\mathcal{F},\mu)$ and $(\Xi,\mathcal{G},\nu)$, and the complementary results given by interchanging the roles of the marginal spaces do not require a separate statement. The theorem holds even for measures that are not $\sigma$-finite, but this further complicates the proof.

Proof This is on the lines of 4.6. For a partition $\{E_1,\ldots,E_n\}$ of $\Omega\times\Xi$ into $\mathcal{F}\otimes\mathcal{G}$-sets let $f = \sum_i \alpha_i 1_{E_i}$, and then $f_\omega = \sum_i \alpha_i 1_{(E_i)_\omega}$ and $g = \sum_i \alpha_i g_{E_i}$ by 4.4. $g$ is $\mathcal{F}/\mathcal{B}$-measurable by 3.25, and 4.20 gives

$$\int_\Omega g\,d\mu = \sum_i \alpha_i\int_\Omega g_{E_i}\,d\mu = \sum_i \alpha_i\,\pi(E_i) = \int_{\Omega\times\Xi} f\,d\pi, \tag{4.46}$$

so that the theorem holds for simple functions. For general non-negative $f$, choose a monotone sequence of simple functions converging to $f$ as in 3.28, show measurability of $g$ in the limit using 3.26, and apply the monotone convergence theorem. ∎

Extending to general $f$ requires the additional assumption of integrability.

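4.22 Fubini's theorem Let $\pi$ be a product measure with $\sigma$-finite marginal measures $\mu$ and $\nu$; let $f: \Omega\times\Xi \mapsto \mathbb{R}$ be $(\mathcal{F}\otimes\mathcal{G})/\mathcal{B}$-measurable with

$$\int_{\Omega\times\Xi} |f(\omega,\xi)|\,d\pi(\omega,\xi) < \infty; \tag{4.47}$$

define $f_\omega: \Xi \mapsto \mathbb{R}$ by $f_\omega(\xi) = f(\omega,\xi)$; and let $g(\omega) = \int_\Xi f_\omega\,d\nu$. Then

(i) $f_\omega$ is $\mathcal{G}/\mathcal{B}$-measurable and integrable for $\omega \in A \subseteq \Omega$, with $\mu(\Omega - A) = 0$;

(ii) $g$ is $\mathcal{F}/\mathcal{B}$-measurable, and integrable on $A$;

(iii) $\int_{\Omega\times\Xi} f(\omega,\xi)\,d\pi(\omega,\xi) = \int_\Omega\Big(\int_\Xi f(\omega,\xi)\,d\nu(\xi)\Big)d\mu(\omega)$. □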

Proof Apart from the integrability, 4.19 shows (i) and Tonelli's theorem shows (ii) and (iii) for the functions $f^{+} = \max\{f,0\}$ and $f^{-} = f^{+} - f$, where $|f| = f^{+} + f^{-}$. But under (4.47), $|f(\omega,\xi)| < \infty$ except on a set of $\pi$-measure zero. With $A$ defined as the projection onto $\Omega$ of the set where $|f|$ is finite, (i), (ii) and (iii) hold for $f^{+}$ and $f^{-}$, with both sides of the equation finite in (iii). Since $f = f^{+} - f^{-}$, (i) extends to $f$ by 3.25, and (ii) and (iii) extend to $f$ by 4.7. ∎
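The role of the integrability condition (4.47) can be seen in a standard counterexample, sketched here under the assumption that both marginal measures are counting measure on the non-negative integers, so that integrals become sums. The array `a` is a hypothetical illustration, not taken from the text: its absolute values sum to infinity, and the two iterated sums disagree.

```python
def a(i, j):
    """Entry of a doubly infinite array: +1 on the diagonal, -1 just to its right."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i):
    """Sum over j of a(i, j); the only non-zero entries are at j = i and j = i + 1."""
    return sum(a(i, j) for j in range(i + 2))


def column_sum(j):
    """Sum over i of a(i, j); the only non-zero entries are at i = j - 1 and i = j."""
    return sum(a(i, j) for i in range(j + 1))


# Every row sums to 0, while column 0 sums to 1 and all other columns to 0,
# so the partial sums of the two iterated series are constant at 0 and 1.
rows_then_total = sum(row_sum(i) for i in range(200))
columns_then_total = sum(column_sum(j) for j in range(200))
```

Summing rows first gives 0 and summing columns first gives 1, which is consistent with the theorem because the array is not integrable with respect to the product of the counting measures.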

4.4 The Radon-Nikodym Theorem

Consider $\sigma$-finite measures $\mu$ and $\nu$ on a measurable space $(\Omega,\mathcal{F})$. $\mu$ is said to be absolutely continuous with respect to $\nu$ if $\nu(E) = 0$, for $E \in \mathcal{F}$, implies $\mu(E) = 0$. This relationship is written as $\mu \ll \nu$. If $\mu \ll \nu$ and $\nu \ll \mu$, the measures are said to be equivalent. If there exists a partition $(A, A^c)$ of $\Omega$, such that $\mu(A) = 0$ and $\nu(A^c) = 0$, then $\mu$ and $\nu$ are said to be mutually singular, written $\mu \perp \nu$. Mutual singularity is symmetric, such that $\mu \perp \nu$ means the same as $\nu \perp \mu$. The following result defines the Lebesgue decomposition of $\mu$ with respect to $\nu$.

4.23 Theorem If $\mu$ and $\nu$ are $\sigma$-finite measures, there exist measures $\mu_1$ and $\mu_2$ such that $\mu = \mu_1 + \mu_2$, $\mu_1 \perp \nu$, and $\mu_2 \ll \nu$. □

If there is a function $f: \Omega \mapsto \overline{\mathbb{R}}^{+}$ such that $\mu(E) = \int_E f\,d\nu$, it follows fairly directly (choose $E$ such that $\nu(E) = 0$) that $\mu \ll \nu$, and $f$ might be thought of as the derivative of one measure with respect to the other; we could even write $f = d\mu/d\nu$. The result that absolute continuity of $\mu$ with respect to $\nu$ implies the existence of such a function is the Radon-Nikodym theorem.

4.24 Radon-Nikodym theorem Let $\nu$ and $\mu_2$ be $\sigma$-finite measures and let $\mu_2 \ll \nu$. There exists an $\mathcal{F}/\mathcal{B}$-measurable function $f: \Omega \mapsto \overline{\mathbb{R}}^{+}$ such that $\mu_2(E) = \int_E f\,d\nu$ for all $E \in \mathcal{F}$. □

$f$ is called the Radon-Nikodym derivative of $\mu_2$ with respect to $\nu$. If $g$ is another such function and $\mu_2(E) = \int_E g\,d\nu$ for all $E \in \mathcal{F}$, then $\nu(f \ne g) = 0$, otherwise at least one of the sets $E_1 = \{\omega: f(\omega) > g(\omega)\}$ and $E_2 = \{\omega: f(\omega) < g(\omega)\}$ must contradict the definition.
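On a finite space both the Lebesgue decomposition and the Radon-Nikodym derivative can be written down explicitly, which may make the definitions more concrete. The sketch below assumes measures given by point masses on four points; the values in `mu` and `nu` and the helper names are hypothetical illustrations only.

```python
from fractions import Fraction

# Hypothetical point masses on a four-point space (illustrative values only).
mu = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(0), "w4": Fraction(1, 4)}
nu = {"w1": Fraction(1), "w2": Fraction(1, 2), "w3": Fraction(2), "w4": Fraction(0)}

# Lebesgue decomposition: mu_1 carries the mass where nu vanishes, mu_2 the rest.
mu_1 = {w: (mu[w] if nu[w] == 0 else Fraction(0)) for w in mu}
mu_2 = {w: (mu[w] if nu[w] > 0 else Fraction(0)) for w in mu}

# Radon-Nikodym derivative of mu_2 with respect to nu. Where nu vanishes the
# value is arbitrary (set to 0 here), since such points carry no nu-mass.
f = {w: (mu_2[w] / nu[w] if nu[w] > 0 else Fraction(0)) for w in mu}


def measure(m, E):
    """Measure of the set E under the point-mass measure m."""
    return sum(m[w] for w in E)


def integral(func, m, E):
    """Integral of func over E with respect to the point-mass measure m."""
    return sum(func[w] * m[w] for w in E)


# The set A below witnesses mu_1 being singular with respect to nu.
A = [w for w in mu if nu[w] > 0]
A_complement = [w for w in mu if nu[w] == 0]
```

Here `measure(mu_2, E)` agrees with `integral(f, nu, E)` for every subset `E`, while `mu_1` vanishes on `A` and `nu` vanishes on its complement.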

Proof of these results requires the concept of a signed measure.

4.25 Definition A signed measure on $(\Omega,\mathcal{F})$ is a set function $\chi: \mathcal{F} \mapsto \overline{\mathbb{R}}$ satisfying

(a) $\chi(\emptyset) = 0$.

(b) $\chi\big(\bigcup_j A_j\big) = \sum_j \chi(A_j)$ for any countable, disjoint collection $\{A_j \in \mathcal{F}\}$.

(c) Either $\chi < \infty$ or $\chi > -\infty$. □

For example, let $\mu$ and $\nu$ be non-negative measures on a space $(\Omega,\mathcal{F})$, with at least one of them finite. For a non-negative constant $r$, define

$$\chi(A) = \mu(A) - r\nu(A) \tag{4.48}$$

for any $A \in \mathcal{F}$. For disjoint $\{A_j\}$,

$$\chi\Big(\bigcup_{j=1}^{\infty} A_j\Big) = \mu\Big(\bigcup_{j=1}^{\infty} A_j\Big) - r\nu\Big(\bigcup_{j=1}^{\infty} A_j\Big) = \sum_{j=1}^{\infty}\big(\mu(A_j) - r\nu(A_j)\big), \tag{4.49}$$

so that countable additivity holds.

If $A$ is an $\mathcal{F}$-set with the property that $\chi(B) \ge 0$ for every $B \in \mathcal{F}$ with $B \subseteq A$, $A$ is called a positive set, a negative set being defined in the complementary manner. A set that is both positive and negative is called a null set. Be careful to distinguish between positive (negative, null) sets, and sets of positive measure (negative measure, measure zero). A set $A$ has measure zero if $\mu(A) = r\nu(A)$ in (4.48), but it need not be a null set. By the definition, any subset of a positive set is positive.

The following theorem defines the Hahn decomposition.

4.26 Theorem Let $\chi$ be a signed measure on a measurable space $(\Omega,\mathcal{F})$, having the property $\chi(A) < \infty$ for all $A \in \mathcal{F}$. There exists a partition of $\Omega$ into a positive set $A^{+}$ and a negative set $A^{-}$. □

Proof Let $\lambda = \sup \chi(A)$, where the supremum is taken over the positive sets of $\chi$. Choose a sequence of positive sets $\{A_n\}$ such that $\lim_{n\to\infty}\chi(A_n) = \lambda$ and let $A^{+} = \bigcup_n A_n$. To show that $A^{+}$ is also a positive set, consider any measurable $E \subseteq A^{+}$. Letting $B_1 = A_1$ and $B_n = A_n - A_{n-1}$, $n > 1$, the sequence $\{B_n\}$ is disjoint, positive since $B_n \subseteq A_n$ for each $n$, and $\bigcup_n B_n = A^{+}$. Likewise, if $E_n = E \cap B_n$ the sequence $\{E_n\}$ is disjoint, positive since $E_n \subseteq B_n$, and $\bigcup_n E_n = E$. Hence $\chi(E) = \sum_n \chi(E_n) \ge 0$, and since $E$ was arbitrary, $A^{+}$ is shown to be positive. $A^{+} - A_n$ being therefore positive,

$$\chi(A^{+}) = \chi(A_n) + \chi(A^{+} - A_n) \ge \chi(A_n), \quad \text{all } n, \tag{4.50}$$

and hence $\chi(A^{+}) \ge \lambda$, implying $\chi(A^{+}) = \lambda$.

Now let $A^{-} = \Omega - A^{+}$. We show, by contradiction, that $A^{-}$ has no subset $E$ with positive measure. Suppose there exists $E \subseteq A^{-}$ with $\chi(E) > 0$. By construction $E$ and $A^{+}$ are disjoint. Every subset of $A^{+}\cup E$ is the disjoint union of a subset of $A^{+}$ with a subset of $E$, so if $E$ is a positive set, so is $A^{+}\cup E$. By definition of $\lambda$,

$$\lambda \ge \chi(A^{+}\cup E) = \lambda + \chi(E), \tag{4.51}$$

which requires $\chi(E) = 0$, so $E$ cannot be a positive set. If $F$ is a subset of $E$, it is also a subset of $A^{-}$, and if positive it must have zero measure, by the argument just applied to $E$. The desired contradiction is obtained by showing that if $\chi(E) > 0$, $E$ must have a subset $F$ which is both positive and has positive measure.

The technique is to successively remove subsets of negative measure from $E$ until what is left has to be a positive set, and then to show that this remainder has positive measure. Let $n_1$ be the smallest integer such that there is a subset $E_1 \subseteq E$ with $\chi(E_1) < -1/n_1$, and define $F_1 = E - E_1$. Then let $n_2$ be the smallest integer such that there exists $E_2 \subseteq F_1$ with $\chi(E_2) < -1/n_2$. In general, for $k = 2,3,\ldots$, let $n_k$ be the smallest positive integer such that $F_{k-1}$ has a subset $E_k$ satisfying $\chi(E_k) < -1/n_k$, and let

$$F_k = E - \bigcup_{j=1}^{k} E_j. \tag{4.52}$$

If no such set exists for finite $n_k$, let $n_k = +\infty$ and $E_k = \emptyset$. The sequence $\{F_k\}$ is non-increasing, and so must converge to a limit $F$ as $k \to \infty$.

We may therefore write $E = F \cup \big(\bigcup_{k=1}^\infty E_k\big)$, where the sets on the right-hand side are mutually disjoint, and hence, by countable additivity,

$$\chi(E) = \chi(F) + \sum_{k=1}^{\infty}\chi(E_k) \le \chi(F) - \sum_{k=1}^{\infty}\frac{1}{n_k}. \tag{4.53}$$

Since $\chi(E) > 0$ it must be the case that $\chi(F) > 0$, but since $\chi(F) < \infty$ by assumption, it is also the case that $\sum_{k=1}^\infty (1/n_k) < \infty$, and hence $n_k \to \infty$ as $k \to \infty$. This means that $F$ contains no subset with negative measure, and is therefore a positive set having positive measure. ∎

For any set $B \in \mathcal{F}$, define $\chi^{+}(B) = \chi(A^{+}\cap B)$ and $\chi^{-}(B) = -\chi(A^{-}\cap B)$, such that $\chi(B) = \chi^{+}(B) - \chi^{-}(B)$. It is easy to verify that $\chi^{+}$ and $\chi^{-}$ are mutually singular, non-negative measures on $(\Omega,\mathcal{F})$. $\chi = \chi^{+} - \chi^{-}$ is called the Jordan decomposition of a signed measure. $\chi^{+}$ and $\chi^{-}$ are called the upper variation and lower variation of $\chi$, and the measure $|\chi| = \chi^{+} + \chi^{-}$ is called the total variation of $\chi$. The Jordan decomposition shows that all signed measures can be represented in the form of (4.48). Signed measures therefore introduce no new technical difficulties. We can integrate with respect to $\chi$ by taking the difference of the integrals with respect to $\chi^{+}$ and $\chi^{-}$.
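For a signed measure determined by finitely many point masses, a Hahn decomposition and the associated Jordan decomposition can be computed directly. The sketch below assumes such a discrete setting; the masses in `chi_mass` and the function names are hypothetical and serve only to illustrate the definitions.

```python
from fractions import Fraction

# Hypothetical signed point masses chi({w}) (illustrative values only).
chi_mass = {"w1": Fraction(3), "w2": Fraction(-1), "w3": Fraction(0), "w4": Fraction(-5, 2)}


def chi(B):
    """Signed measure of a set B of points."""
    return sum(chi_mass[w] for w in B)


# One valid Hahn decomposition; the zero-mass point could go on either side.
A_plus = {w for w, m in chi_mass.items() if m >= 0}
A_minus = set(chi_mass) - A_plus


def chi_plus(B):
    """Upper variation: chi restricted to the positive set."""
    return chi(A_plus & set(B))


def chi_minus(B):
    """Lower variation: minus chi restricted to the negative set."""
    return -chi(A_minus & set(B))


def total_variation(B):
    """Total variation: the sum of the upper and lower variations."""
    return chi_plus(B) + chi_minus(B)


B = {"w1", "w2", "w4"}
```

The decomposition is not unique here, since the point of zero mass could be assigned to either set without changing the upper or lower variation.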

We are now able to prove the Radon-Nikodym theorem. It is actually most convenient to derive the Lebesgue decomposition (4.23) in such a way that the Radon-Nikodym theorem emerges as a fairly trivial corollary. It is also easiest to begin with finite measures, and then extend the results to the $\sigma$-finite case.

4.27 Theorem Finite, non-negative measures $\nu$ and $\mu$ have a Lebesgue decomposition $\mu = \mu_1 + \mu_2$ where $\mu_1 \perp \nu$ and $\mu_2 \ll \nu$, and there exists an $\mathcal{F}/\mathcal{B}$-measurable function $f: \Omega \mapsto \mathbb{R}^{+}$ such that $\mu_2(E) = \int_E f\,d\nu$ for all $E \in \mathcal{F}$.

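Proof Let $\mathfrak{G}$ denote the class of all $\mathcal{F}/\mathcal{B}$-measurable functions $g: \Omega \mapsto \mathbb{R}^{+}$ for which $\int_E g\,d\nu \le \mu(E)$, all $E \in \mathcal{F}$. $\mathfrak{G}$ is not empty since 0 is a member. Let $\alpha = \sup_{g \in \mathfrak{G}}\int g\,d\nu$, so that $\alpha \le \mu(\Omega) < \infty$. We show there is an element of $\mathfrak{G}$ at which the supremum is attained. Either this element exists, and there is nothing further to show, or it is possible by definition of $\alpha$ to choose an element $g_n$ of $\mathfrak{G}$ satisfying $\alpha - 1/n \le \int g_n\,d\nu \le \alpha$, for each $n \in \mathbb{N}$. Generate a monotone sequence $\{f_n,\ n \in \mathbb{N}\}$ in $\mathfrak{G}$ as follows. Put $f_1 = g_1$ and define $f_n$ by $f_n(\omega) = \max\{f_{n-1}(\omega), g_n(\omega)\}$, so that $f_n \ge f_{n-1}$. Define the sets $A_n = \{\omega: f_{n-1}(\omega) > g_n(\omega)\}$ for $n = 2,3,\ldots$, and then if $f_{n-1} \in \mathfrak{G}$,

$$\int_E f_n\,d\nu = \int_{E\cap A_n} f_{n-1}\,d\nu + \int_{E\cap A_n^c} g_n\,d\nu \le \mu(E\cap A_n) + \mu(E\cap A_n^c) = \mu(E), \tag{4.54}$$

so that $f_n \in \mathfrak{G}$. Since $f_n \uparrow f$, it follows by the monotone convergence theorem that $\int_E f_n\,d\nu \to \int_E f\,d\nu \le \mu(E)$, and hence $f \in \mathfrak{G}$. And since $f_n \ge g_n$ so that

$$\alpha - \frac{1}{n} \le \int g_n\,d\nu \le \int f_n\,d\nu \le \alpha,$$

we must conclude that $\int f\,d\nu = \alpha$, as was required. Now define $\mu_2$ by

$$\mu_2(E) = \int_E f\,d\nu, \quad E \in \mathcal{F}. \tag{4.55}$$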

Evidently $\mu_2$ is a non-negative measure (for countable additivity consider the functions $f_j = 1_{E_j}f$, and use 4.14), and also $\mu_2 \ll \nu$. Define $\mu_1(E) = \mu(E) - \mu_2(E)$, which is non-negative by construction of $f$, and also a measure. It remains to show that $\mu_1 \perp \nu$.

Let $(A_n^{+}, A_n^{-})$ be a Hahn decomposition for the measure $\mu_1 - \nu/n$, for $n = 1,2,3,\ldots$. Then for $E \in \mathcal{F}$,

$$\mu(E\cap A_n^{+}) = \mu_1(E\cap A_n^{+}) + \mu_2(E\cap A_n^{+}) \ge \frac{1}{n}\nu(E\cap A_n^{+}) + \int_{E\cap A_n^{+}} f\,d\nu = \int_{E\cap A_n^{+}}\Big(f + \frac{1}{n}\Big)d\nu, \tag{4.56}$$

and hence

$$\mu(E) = \mu(E\cap A_n^{+}) + \mu(E\cap A_n^{-}) \ge \mu(E\cap A_n^{+}) + \mu_2(E\cap A_n^{-}) \ge \int_{E\cap A_n^{+}}\Big(f + \frac{1}{n}\Big)d\nu + \int_{E\cap A_n^{-}} f\,d\nu = \int_E f\,d\nu + \frac{1}{n}\nu(E\cap A_n^{+}). \tag{4.57}$$

Note from this inequality that $f + n^{-1}1_{A_n^{+}} \in \mathfrak{G}$, so that

$$\alpha \ge \int\Big(f + \frac{1}{n}1_{A_n^{+}}\Big)d\nu = \alpha + \frac{1}{n}\nu(A_n^{+}), \tag{4.58}$$

implying $\nu(A_n^{+}) = 0$. This holds for each $n \in \mathbb{N}$, so if $A = \bigcup_{n=1}^\infty A_n^{+}$, $\nu(A) = 0$. Note that $A^c = \bigcap_{n=1}^\infty A_n^{-} \subseteq A_n^{-}$ for every $n$, and so $\mu_1(A^c) \le \nu(A^c)/n$ for every $n$. Hence $\mu_1(A^c) = 0$, and so $\mu_1 \perp \nu$. ∎

It remains to extend this result to the $\sigma$-finite case.

Proof of 4.23 By $\sigma$-finiteness there exists a countable partition $\{\Omega_j\}$ of $\Omega$, such that $\nu(\Omega_j)$ and $\mu(\Omega_j)$ are finite for each $j$. If $\{A_j\}$ is any collection with finite measures whose union is $\Omega$, letting $\Omega_1 = A_1$ and $\Omega_j = A_j - A_{j-1}$ for $j > 1$ defines such a partition. If different collections with finite measures are known for $\nu$ and $\mu$, say $\{A_{\nu j}\}$ and $\{A_{\mu j}\}$, the collection containing all the $A_{\nu j}\cap A_{\mu k}$ for $j,k \in \mathbb{N}$ is countable and of finite measure with respect to both $\nu$ and $\mu$, and after re-indexing this collection can generate $\{\Omega_j\}$.

Consider the restrictions of $\mu$ and $\nu$ to the measurable spaces $(\Omega_j,\mathcal{F}_j)$, for $j \in \mathbb{N}$, where $\mathcal{F}_j = \{E\cap\Omega_j,\ E \in \mathcal{F}\}$. By countable additivity, $\mu(E) = \sum_j \mu(E\cap\Omega_j)$ with similar equalities for $\mu_1$, $\mu_2$, and $\nu$; by implication, the two sides are in each case either finite and equal, or both $+\infty$. If $\nu(E\cap\Omega_j) = 0$ implies $\mu_2(E\cap\Omega_j) = 0$ for each $j$, then $\nu(E) = 0$ implies $\mu_2(E) = 0$ for $E \in \mathcal{F}$, and $\mu_2 \ll \nu$. Similarly, let $(A_j, A_j^c)$ define partitions of the $\Omega_j$ such that $\mu_1(A_j) = \nu(A_j^c) = 0$; then $A = \bigcup_j A_j$ and $A^c = \bigcup_j A_j^c$ are disjoint unions, $\mu_1(A) = \sum_j \mu_1(A_j) = 0$, and $\nu(A^c) = \sum_j \nu(A_j^c) = 0$. Hence $\mu_1 \perp \nu$. ∎

The proof of the Radon-Nikodym theorem is now achieved by extending the other conclusion of 4.27 to the $\sigma$-finite case.

Proof of 4.24 In the countable partition of the last proof, 4.27 implies the existence of $\mathcal{F}_j/\mathcal{B}$-measurable non-negative $f_j$ such that

$$\mu(E\cap\Omega_j) = \mu_1(E\cap\Omega_j) + \mu_2(E\cap\Omega_j)$$

where

$$\mu_2(E\cap\Omega_j) = \int_{E\cap\Omega_j} f_j\,d\nu, \quad \text{all } E \in \mathcal{F}. \tag{4.59}$$

Define $f: \Omega \mapsto \mathbb{R}^{+}$ by

$$f(\omega) = \sum_j 1_{\Omega_j}(\omega)f_j(\omega). \tag{4.60}$$

This is a function since the $\Omega_j$ are disjoint, and is $\mathcal{F}/\mathcal{B}$-measurable since $f^{-1}(B) = \bigcup_j f_j^{-1}(B) = \bigcup_j E_j \in \mathcal{F}$ where $E_j \in \mathcal{F}_j$, for each $B \in \mathcal{B}$. Apply 4.14 to give

$$\mu_2(E) = \sum_j \mu_2(E\cap\Omega_j) = \sum_j \int_{E\cap\Omega_j} f_j\,d\nu = \int_E \sum_j 1_{\Omega_j}f_j\,d\nu = \int_E f\,d\nu. \tag{4.61}$$

∎

Consider the case where $\mu$ is absolutely continuous with respect to another measure $\nu$. If the Lebesgue decomposition with respect to $\nu$ is $\mu = \mu_1 + \mu_2$, $\nu(A) = 0$ implies $\mu(A) = 0$ which in turn implies $\mu_1(A) = 0$. But since $\mu_1 \perp \nu$, $\mu_1(A^c) = 0$ too. Thus, $\mu_1(\Omega) = 0$ and $\mu = \mu_2$. The absolute continuity of a measure implies the existence of a Radon-Nikodym derivative $f$ as an equivalent representation of the measure, given $\nu$, in the sense that $\mu(E) = \int_E f\,d\nu$ for any $E \in \mathcal{F}$. An important application of these results is to measures on the line.

4.28 Example Let $\nu$ in the last result be Lebesgue measure, $m$, and let $\mu$ be any other measure on the line. Clearly, $\mu_1 \perp m$ requires that $\mu_1(E) = 0$ except when $E$ is of Lebesgue measure 0. On the other hand, absolute continuity of $\mu_2$ with respect to $m$ implies that any set of 'zero length', any countable collection of isolated points for example, must have zero measure under $\mu_2$. If $\mu$ is absolutely continuous with respect to $m$, we may write the integral of a measurable function $g$ as

$$\int_{-\infty}^{+\infty} g(x)\,d\mu(x) = \int_{-\infty}^{+\infty} g(x)f(x)\,dx, \tag{4.62}$$

so that all integrals reduce to Lebesgue integrals. Here, $f$ is known as the density function of the measure $\mu$, and is an equivalent representation of $\mu$, with the relation

$$\mu(E) = \int_E f(x)\,dx \tag{4.63}$$

(the Lebesgue integral of $f$ over $E$) holding for each $E \in \mathcal{B}$. □

5 Metric Spaces

5.1 Distances and Metrics

Central to the properties of $\mathbb{R}$ studied in Chapter 2 was the concept of distance. For any real numbers $x$ and $y$, the Euclidean distance between them is the number $d_E(x,y) = |x - y| \in \mathbb{R}^{+}$. Generalizing this idea, a set (otherwise arbitrary) having a distance measure, or metric, defined for each pair of elements is called a metric space. Let $S$ denote such a set.

5.1 Definition A metric is a mapping $d: S\times S \mapsto \mathbb{R}^{+}$ having the properties

(a) $d(y,x) = d(x,y)$,

(b) $d(x,y) = 0$ iff $x = y$,

(c) $d(x,y) + d(y,z) \ge d(x,z)$ (triangle inequality).

A metric space $(S,d)$ is a set $S$ paired with metric $d$, such that conditions (a)-(c) hold for each pair of elements of $S$. □
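The three conditions of 5.1 can be tested mechanically on a finite sample of points, which is a convenient way to catch a candidate distance that fails one of them. The sketch below is only illustrative: the function names, the sample, and the two failing candidates are hypothetical, and passing on a sample does not of course prove the conditions for the whole space.

```python
from itertools import product


def is_metric_on(points, d, tol=1e-12):
    """Check 5.1(a)-(c) for all pairs and triples from a finite sample."""
    for x, y in product(points, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:        # (a) symmetry
            return False
        if (d(x, y) <= tol) != (x == y):         # (b) d(x, y) = 0 iff x = y
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:    # (c) triangle inequality
            return False
    return True


def d_euclid(x, y):
    """Euclidean distance on the line."""
    return abs(x - y)


def d_squared(x, y):
    """Squared distance: symmetric and zero only on the diagonal, but fails (c)."""
    return (x - y) ** 2


def d_modulus(x, y):
    """Distance between absolute values: fails (b), since d(1, -1) = 0."""
    return abs(abs(x) - abs(y))


sample = [-2.0, -1.0, 0.0, 1.0, 2.5]
results = {
    "euclid": is_metric_on(sample, d_euclid),
    "squared": is_metric_on(sample, d_squared),
    "modulus": is_metric_on(sample, d_modulus),
}
```

Only the Euclidean distance passes all three checks on this sample.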

If 5.1(a) and (c) hold, and $d(x,x) = 0$, but $d(x,y) = 0$ is possible when $x \ne y$, we would call $d$ a pseudo-metric. A fundamental fact is that if $(A,d)$ is a metric space and $B \subset A$, $(B,d)$ is also a metric space. If $\mathbb{Q}$ is the set of rational numbers, $\mathbb{Q} \subset \mathbb{R}$ and $(\mathbb{Q},d_E)$ is a metric space; another example is $([0,1],d_E)$.

While the Euclidean metric on $\mathbb{R}$ is the familiar case, and the proof that $d_E$ satisfies 5.1(a)-(c) is elementary, $d_E$ is not the only possible metric on $\mathbb{R}$.

5.2 Example For $x,y \in \mathbb{R}$ let

$$d_0(x,y) = \frac{|x - y|}{1 + |x - y|}. \tag{5.1}$$

It is immediate that 5.1(a) and (b) hold. To show (c), note that $|x - y| = d_0(x,y)/(1 - d_0(x,y))$. The inequality $a/(1 - a) + b/(1 - b) \ge c/(1 - c)$ simplifies to $a + b \ge c + ab(2 - c)$. We obtain 5.1(c) on putting $a = d_0(x,y)$, $b = d_0(y,z)$, $c = d_0(x,z)$, and using the fact that $0 \le d_0 \le 1$. Unlike the Euclidean metric, $d_0$ is defined for $x$ or $y = \pm\infty$. $(\overline{\mathbb{R}},d_0)$ is a metric space on the definition, while $\overline{\mathbb{R}}$ with the Euclidean metric is not. □

In the space $\mathbb{R}^2$ a larger variety of metrics is found.

5.3 Example The Euclidean distance on $\mathbb{R}^2$ is

$$d_E(x,y) = \|x - y\| = \big[(x_1 - y_1)^2 + (x_2 - y_2)^2\big]^{1/2}, \tag{5.2}$$

and $(\mathbb{R}^2,d_E)$ is the Euclidean plane. An alternative is the 'taxicab' metric,

$$d_T(x,y) = |x_1 - y_1| + |x_2 - y_2|. \tag{5.3}$$

$d_E$ is the shortest distance between two addresses in Manhattan as the crow flies, but $d_T$ is the shortest distance by taxi (see Fig. 5.1). The reader will note that $d_T$ and $d_E$ are actually the cases for $p = 1$ and $p = 2$ of a sequence of metrics on $\mathbb{R}^2$. He/she is invited to supply the definition for the case $p = 3$, and so for any $p$. The limiting case as $p \to \infty$ is the maximum metric,

$$d_M(x,y) = \max\{|x_1 - y_1|,\ |x_2 - y_2|\}. \tag{5.4}$$

All these distance measures can be shown to satisfy 5.1(a)-(c). Letting $\mathbb{R}^n = \mathbb{R}\times\mathbb{R}\times\cdots\times\mathbb{R}$ for any finite $n$, they can be generalized in the obvious fashion, to define metric spaces $(\mathbb{R}^n,d_E)$, $(\mathbb{R}^n,d_T)$, $(\mathbb{R}^n,d_M)$ and so forth. □
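The family of metrics indexed by $p$ can be written as a single function, which also makes it easy to see the maximum metric emerging for large $p$. The sketch below assumes the usual $p$-th root of the sum of $p$-th powers as the general definition; the function names and the test points are hypothetical illustrations.

```python
def d_p(x, y, p):
    """Distance based on the p-th root of summed p-th powers of coordinate gaps."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def d_max(x, y):
    """Maximum metric: the largest coordinate-wise gap."""
    return max(abs(a - b) for a, b in zip(x, y))


x, y = (0.0, 0.0), (3.0, 4.0)
taxicab = d_p(x, y, 1)    # the case p = 1
euclid = d_p(x, y, 2)     # the case p = 2
cubic = d_p(x, y, 3)      # the case p = 3
large_p = d_p(x, y, 50)   # already very close to the maximum metric
maximum = d_max(x, y)
```

For these points the distances decrease as $p$ grows, from the taxicab value down towards the maximum coordinate gap.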


Fig. 5.1

Metrics $d_1$ and $d_2$ on a space $S$ are said to be equivalent if, for each $x \in S$ and $\varepsilon > 0$, there is a $\delta > 0$ such that

$$d_1(x,y) < \delta \Rightarrow d_2(x,y) < \varepsilon$$

$$d_2(x,y) < \delta \Rightarrow d_1(x,y) < \varepsilon$$

for each $y \in S$. The idea here is that the two metrics confer essentially the same properties on the space, apart from a possible relabelling of points and axes. A metric that is a continuous, increasing function of another metric is equivalent to it; thus, if $d$ is any metric on $S$, it is equivalent to the bounded metric $d/(1 + d)$. $d_E$ and $d_0$ of 5.2 are equivalent in $\mathbb{R}$, as are $d_E$ and $d_M$ in $\mathbb{R}^2$. On the other hand, consider for any $S$ the discrete metric $d_D$, where for $x,y \in S$, $d_D(x,y) = 0$ if $x = y$, and 1 otherwise. $d_D$ is a metric, but $d_D$ and $d_E$ are not equivalent in $\mathbb{R}$.

In metric space theory, the properties of $\mathbb{R}$ outlined in §2.1 are revealed as a special case. Many definitions are the same, word for word, although other concepts are novel. In a metric space $(S,d)$ the concept of an open neighbourhood in $\mathbb{R}$ generalizes to the sphere or ball, a set $S_d(x,\varepsilon) = \{y: y \in S,\ d(x,y) < \varepsilon\}$, where $x \in S$ and $\varepsilon > 0$. We write simply $S(x,\varepsilon)$ when the context makes clear which metric is being adopted. In $(\mathbb{R}^2,d_E)$, $S(x,\varepsilon)$ is a circle with centre at $x$ and radius $\varepsilon$. In $(\mathbb{R}^2,d_T)$ it is a 'diamond' (rotated square) centred on $x$ with $\varepsilon$ the distance from $x$ to the vertices. In $(\mathbb{R}^2,d_M)$ it is a regular square centred on $x$, with sides of $2\varepsilon$. For $(\mathbb{R}^3,d_E)$ ... well, think about it!

An open set of $(S,d)$ is a set $A \subseteq S$ such that, for each $x \in A$, $\exists\ \delta > 0$ such that $S(x,\delta)$ is a subset of $A$. If metrics $d_1$ and $d_2$ are equivalent, a set is open in $(S,d_1)$ iff it is open in $(S,d_2)$. The theory of open sets of $\mathbb{R}$ generalizes straightforwardly. For example, the Borel field of $S$ is a well-defined notion, the smallest $\sigma$-field containing the open sets of $(S,d)$. Here is the general version of 2.4.

5.4 Theorem

(i) If $\mathcal{C}$ is any collection of open sets of $(S,d)$, then

$$C = \bigcup_{A \in \mathcal{C}} A$$

is open.

(ii) If $A$ and $B$ are open in $(S,d)$, then $A\cap B$ is open.

Proof (i) If $S(x,\varepsilon) \subseteq A$ and $A \in \mathcal{C}$, then $S(x,\varepsilon) \subseteq C$. Since such a ball exists by definition for all $x \in A$, all $A \in \mathcal{C}$, it follows that one exists for all $x \in C$.

(ii) If $S(x,\varepsilon_A)$ and $S(x,\varepsilon_B)$ are two spheres centred on $x$, then

$$S(x,\varepsilon_A)\cap S(x,\varepsilon_B) = S(x,\varepsilon), \tag{5.8}$$

where $\varepsilon = \min\{\varepsilon_A,\varepsilon_B\}$. If $x \in A$, $\exists\ S(x,\varepsilon_A) \subseteq A$ with $\varepsilon_A > 0$, and if $x \in B$, $\exists\ S(x,\varepsilon_B) \subseteq B$ similarly, with $\varepsilon_B > 0$. If $x \in A\cap B$, $S(x,\varepsilon) \subseteq A\cap B$, with $\varepsilon > 0$. ∎

The important thing to bear in mind is that openness is not preserved under arbitrary intersections.

A closure point of a set $A$ is a point $x \in S$ (not necessarily belonging to $A$) such that for all $\delta > 0$ $\exists\ y \in A$ with $d(x,y) < \delta$. The set of closure points of $A$, denoted $\bar{A}$, is called the closure of $A$. Closure points are also called adherent points, 'sticking to' a set though not necessarily belonging to it. If for some $\delta > 0$ the definition of a closure point is satisfied only for $y = x$, so that $S(x,\delta)\cap A = \{x\}$, $x$ is said to be an isolated point of $A$.

A boundary point of $A$ is a point $x \in \bar{A}$, such that for all $\delta > 0$ $\exists\ z \in A^c$ with $d(x,z) < \delta$. The set of boundary points of $A$ is denoted $\partial A$, and $\bar{A} = A\cup\partial A$. The interior of $A$ is $A^{o} = A - \partial A$. A closed set is one containing all its closure points, such that $\bar{A} = A$. An open set does not contain all of its closure points, since the boundary points do not belong to the set. The empty set $\emptyset$ and the space $S$ are both open and closed. A subset $B$ of $A$ is said to be dense in $A$ if $B \subseteq A \subseteq \bar{B}$.

A collection of sets $\mathcal{C}$ is called a covering for $A$ if $A \subseteq \bigcup_{B \in \mathcal{C}} B$. If each $B$ is open, it is called an open covering. A set $A$ is called compact if every open covering of $A$ contains a finite subcovering. $A$ is said to be relatively compact if $\bar{A}$ is compact. If $S$ is itself compact, $(S,d)$ is said to be a compact space. The remarks in §2.1 about compactness in $\mathbb{R}$ are equally relevant to the general case. $A$ is said to be bounded if $\exists\ x \in A$ and $0 < r < \infty$, such that $A \subseteq S(x,r)$; and also totally bounded (or precompact) if for every $\varepsilon > 0$ there exists a finite collection of points $x_1,\ldots,x_m$ (called an $\varepsilon$-net) such that the spheres $S(x_i,\varepsilon)$, $i = 1,\ldots,m$, form a covering for $A$. The $S(x_i,\varepsilon)$ can be replaced in this definition by their closures $\bar{S}(x_i,\varepsilon)$, noting that $\bar{S}(x_i,\varepsilon)$ is contained in $S(x_i,\varepsilon + \delta)$ for all $\delta > 0$. The points of the $\varepsilon$-net need not be elements of $A$. An attractive mental image is a region of $\mathbb{R}^2$ covered with little cocktail umbrellas of radius $\varepsilon$ (Fig. 5.2). Any set that is totally bounded is also bounded. In certain cases such as $(\mathbb{R}^n,d_E)$ the converse is also true, but this is not true in general.
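For a finite cloud of points in the plane, an $\varepsilon$-net can be built by a simple greedy pass, which gives a concrete picture of the 'cocktail umbrella' covering. The sketch below assumes the Euclidean metric and draws its centres from the point cloud itself (the definition does not require this); the function name and the random sample are hypothetical.

```python
import math
import random


def greedy_epsilon_net(points, eps):
    """Pick centres from `points` so that every point is within eps of a centre."""
    centres = []
    for p in points:
        # Keep p as a new centre only if no existing centre already covers it.
        if all(math.dist(p, c) >= eps for c in centres):
            centres.append(p)
    return centres


random.seed(0)  # fixed seed so the illustrative sample is reproducible
cloud = [(random.random(), random.random()) for _ in range(500)]
net = greedy_epsilon_net(cloud, 0.25)
```

Every point of `cloud` lies in the open ball of radius 0.25 around some element of `net`, and the centres are at least 0.25 apart from one another, which keeps the net finite for any bounded region of the plane.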


Fig. 5.2

5.5 Theorem If a set is relatively compact, it is totally bounded.

Proof Let $A$ be relatively compact, and consider the covering of $\bar{A}$ consisting of the $\varepsilon$-balls $S(x,\varepsilon)$ for all $x \in \bar{A}$. By the definition this contains a finite sub-cover $S(x_i,\varepsilon)$, $i = 1,\ldots,m$, which also covers $A$. Then $\{x_1,\ldots,x_m\}$ is an $\varepsilon$-net for $A$, and the theorem follows since $\varepsilon$ is arbitrary. ∎

The converse is true only when the space is complete; see 5.13.

5.2 Separability and Completeness

In thinking about metric spaces, it is sometimes helpful to visualize the analogue problem for $\mathbb{R}$, or at most for $\mathbb{R}^n$ with $n \le 3$, and use one's intuitive knowledge of those cases. But this trick can be misleading if the space in question is too alien to geometrical intuition.

A metric space is said to be separable if it contains a countable, dense subset. Separability is one of the properties that might be considered to characterize an '$\mathbb{R}$-like' space. The rational numbers $\mathbb{Q}$ are countable and dense in $\mathbb{R}$, so $\mathbb{R}$ is separable, as is $\mathbb{R}^n$. An alternative definition of a separable metric space is a metric space for which the Lindelöf property holds (see 2.7). This result can be given in the following form.

5.6 Theorem In a metric space $S$ the following three properties are equivalent:

(a) $S$ is separable.

(b) Every open set $A \subseteq S$ has the representation

$$A = \bigcup_{i=1}^{\infty} B_i, \quad B_i \in \mathcal{V},$$

where $\mathcal{V}$ is a countable collection of open spheres in $S$.

(c) Every open cover of a set in $S$ has a countable subcover. □

A collection $\mathcal{V}$ with property (b) is called a base of $S$, so that separability is equated in this theorem with the existence of a countable base for the space. In topology this property is called second-countability (see §6.2). (c) is the Lindelöf property.

Proof We first show that (a) implies (b). Let $V$ be the countable collection of spheres $\{S(x,r): x \in D,\ r \in \mathbb{Q}^+\}$, where $D$ is a countable, dense subset of $S$, and $\mathbb{Q}^+$ is the set of positive rationals. If $A$ is an open subset of $S$, then for each $x \in A$, $\exists\, \delta > 0$ such that $S(x,\delta) \subseteq A$. For any such $x$, choose $x_i \in D$ such that $d(x_i,x) < \delta/2$ (possible since $D$ is dense) and then choose rational $r_i$ to satisfy $d(x_i,x) < r_i < \delta/2$. Define $B_i = S(x_i,r_i) \in V$, and observe that
\[ B_i \subseteq S(x,\delta) \subseteq A. \tag{5.10} \]
Since $V$ as a whole is countable, the subcollection $\{B_i\}$ of all the sets that satisfy this condition for at least one $x \in A$ is also countable, and clearly $A \subseteq \bigcup_i B_i \subseteq A$, so $A = \bigcup_i B_i$.

Next we show that (b) implies (c). Since $V$ is countable we may index its elements as $\{V_j,\ j \in \mathbb{N}\}$. If $C$ is any collection of open sets covering $A$, choose a subcollection $\{C_j,\ j \in \mathbb{N}\}$, where $C_j$ is a set from $C$ which contains $V_j$ if such exists; otherwise let $C_j = \emptyset$. There exists a covering of $A$ by $V$-sets, as just shown, and each $V_j$ can itself be covered by other elements of $V$ with smaller radii, so that by taking small enough spheres we may always find an element of $C$ to contain them. Thus $A \subseteq \bigcup_j C_j$, and the Lindelöf property holds.

Finally, to show that (c) implies (a), consider the open cover of $S$ by the sets $\{S(x,1/n),\ x \in S\}$. If there exists for each $n$ a countable subcover $\{S(x_{nk},1/n),\ k \in \mathbb{N}\}$, for each $k$ there must be one or more indices $k'$ such that $d(x_{nk},x_{nk'}) < 1/n$. Since this must be true for every $n$, the countable set $\{x_{nk},\ k \in \mathbb{N},\ n \in \mathbb{N}\}$ must be dense in $S$. This completes the proof. ∎

The theorem has a useful corollary.

5.7 Corollary A totally bounded space is separable. □

Another important property is that subspaces of separable spaces are separable, which we show as follows.

5.8 Theorem If $(S,d)$ is a separable space and $A \subset S$, then $(A,d)$ is separable.

Proof Suppose $D$ is countable and dense in $S$. Construct the countable set $E$ by taking one point from each nonempty set
\[ A \cap S(y,r), \quad y \in D,\ r \in \mathbb{Q}^+. \tag{5.11} \]
For any $x \in A$ and $\delta > 0$, we may choose $y \in D$ such that $d(x,y) < \delta/2$. For every such $y$, choosing rational $r$ with $d(x,y) < r < \delta/2$ makes $A \cap S(y,r)$ nonempty, since it contains $x$; hence $\exists\, z \in E$ satisfying $z \in A \cap S(y,r)$ for $r < \delta/2$, so that $d(y,z) < \delta/2$. Thus
\[ d(x,z) \le d(x,y) + d(y,z) < \delta, \tag{5.12} \]
and since $x$ and $\delta$ are arbitrary it follows that $E$ is dense in $A$. ∎

This argument does not rule out the possibility that $A$ and $D$ are disjoint. The separability of the irrational numbers, $\mathbb{R} - \mathbb{Q}$, is a case in point.

On the other hand, certain conditions are incompatible with separability. A subset $A$ of a metric space $(S,d)$ is discrete if for each $x \in A$, $\exists\, \delta > 0$ such that $(S(x,\delta) - \{x\}) \cap A$ is empty. In other words, each element is an isolated point. The integers $\mathbb{Z}$ are a discrete set of $(\mathbb{R},d_E)$, for example. If $S$ is itself discrete, the discrete metric $d_D$ is equivalent to $d$.

5.9 Theorem If a metric space contains an uncountable discrete subset, it is not separable.

Proof This is immediate from 5.6. Let $A$ be discrete, and consider the open set $\bigcup_{x \in A} S(x,\varepsilon_x)$, where $\varepsilon_x$ is chosen small enough that the specified spheres form a disjoint collection. This is an open cover of $A$, and if $A$ is uncountable it has no countable subcover. ∎

The separability question arises when we come to define measures on metric spaces (see Chapter 26). Unless a space is separable, we cannot be sure that all of its Borel sets are measurable. The space $D_{[a,b]}$ discussed below (5.27) is an important example of this difficulty.

The concepts of sequence, limit, subsequence, and cluster point all extend from $\mathbb{R}$ to general metric spaces. A sequence $\{x_n\}$ of points in $(S,d)$ is said to converge to a limit $x$ if for all $\varepsilon > 0$ there exists $N_\varepsilon \ge 1$ such that
\[ d(x_n,x) < \varepsilon \quad \text{for all } n > N_\varepsilon. \tag{5.13} \]
Theorems 2.12 and 2.13 extend in an obvious way, as follows.

5.10 Theorem Every sequence on a compact subset of $S$ has one or more cluster points. □

5.11 Theorem If a sequence on a compact subset of $S$ has a unique cluster point, then it converges. □

The notion of a Cauchy sequence also remains fundamental. A sequence $\{x_n\}$ of points in a metric space $(S,d)$ is a Cauchy sequence if for all $\varepsilon > 0$, $\exists\, N_\varepsilon$ such that $d(x_n,x_m) < \varepsilon$ whenever $n > N_\varepsilon$ and $m > N_\varepsilon$. The novelty is that Cauchy sequences in a metric space do not always possess limits. It is possible that the point on which the sequence is converging lies outside the space. Consider the space $(\mathbb{Q},d_E)$. The sequence $\{x_n\}$, where $x_n = 1 + 1 + 1/2 + 1/6 + \ldots + 1/n! \in \mathbb{Q}$, is a Cauchy sequence since $|x_{n+1} - x_n| = 1/(n+1)! \to 0$; but of course, $x_n \to e$ (the base of the natural logarithms), an irrational number. A metric space $(S,d)$ is said to be complete if it contains the limits of all Cauchy sequences defined on it. $(\mathbb{R},d_E)$ is a complete space, while $(\mathbb{Q},d_E)$ is not.

Although compactness is a primitive notion which does not require the concept of a Cauchy sequence, we can nevertheless define it, following the idea in 2.12, in terms of the properties of sequences. This is often convenient from a practical point of view.
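The incompleteness of $(\mathbb{Q},d_E)$ can be illustrated with exact rational arithmetic. This is a sketch under the assumption that the partial sums start from the $1/0!$ term, so that the limit is $e$; the helper name `partial_sum` is illustrative.

```python
from fractions import Fraction
from math import e, factorial

def partial_sum(n):
    """x_n = sum_{k=0}^{n} 1/k!, held exactly as a rational number."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

xs = [partial_sum(n) for n in range(1, 15)]           # x_1, ..., x_14
gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]

# Every term is an element of Q, the successive gaps are 1/(n+1)!,
# and the terms approach the irrational number e.
print(xs[3], float(gaps[-1]), abs(float(xs[-1]) - e))
```

Each `xs[i]` is a `Fraction`, so the whole sequence stays inside $\mathbb{Q}$ even though its limit does not.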

5.12 Theorem The following statements about a metric space $(S,d)$ are equivalent:
(a) $S$ is compact.
(b) Every sequence in $S$ has a cluster point in $S$.
(c) $S$ is totally bounded and complete. □

Notice the distinction between completeness and compactness. In a complete space all Cauchy sequences converge, which says nothing about the behaviour of non-Cauchy sequences. But in a compact space, which is also totally bounded, all sequences contain Cauchy subsequences which converge in the space.

Proof We show in turn that (a) implies (b), (b) implies (c), and (c) implies (a). Suppose $S$ is compact. Let $\{x_n,\ n \in \mathbb{N}\}$ be a sequence in $S$, and define a decreasing sequence of subsets of $S$ by $B_n = \{x_k: k \ge n\}$. The sets $\bar B_n$ are closed, and the cluster points of the sequence, if any, compose the set $C = \bigcap_{n=1}^{\infty} \bar B_n = \bigl(\bigcup_{n=1}^{\infty} \bar B_n^c\bigr)^c$. If $C = \emptyset$, $S = \bigcup_{n=1}^{\infty} \bar B_n^c$, so that the open sets $\bar B_n^c$ are a cover for $S$, and by assumption these contain a finite subcover. This means that, for some $m < \infty$,
\[ S = \bigcup_{n=1}^{m} \bar B_n^c = \Bigl(\bigcap_{n=1}^{m} \bar B_n\Bigr)^c = \bar B_m^c. \]
This leads to the contradiction $\bar B_m = \emptyset$, so that $C$ must be nonempty. Hence, (a) implies (b).

Now suppose that every sequence has a cluster point in $S$. Considering the case of Cauchy sequences, it is clear that the space is complete; it remains to show that it is totally bounded. Suppose not: then there must exist an $\varepsilon > 0$ for which no $\varepsilon$-net exists; in other words, for every finite $n$ and points $\{x_1,\ldots,x_n\}$ there is a further point $x_{n+1}$ with $d(x_j,x_{n+1}) \ge \varepsilon$ for all $j \le n$. But letting $n \to \infty$ in this case, we have found a sequence whose points are pairwise at least $\varepsilon$ apart, and which therefore has no cluster point, which is again a contradiction. Hence, (b) implies (c).

Finally, let $C$ be an arbitrary open cover of $S$. We assume that $C$ contains no finite subcover of $S$, and obtain a contradiction. Since $S$ is totally bounded it must possess for each $n \ge 1$ a finite cover of the form
\[ B_{ni} = S(x_{ni},1/2^n), \quad i = 1,\ldots,k_n. \tag{5.14} \]

Fixing $n$, choose an $i$ for which $B_{ni}$ has no finite cover by $C$-sets (at least one such exists by hypothesis) and call this set $D_n$. For $n > 1$, $\{B_{ni} \cap D_{n-1}\}_{i=1}^{k_n}$ is also a covering for $D_{n-1}$, so $D_n$ can be chosen such that $D_n \cap D_{n-1}$ has no finite cover by $C$-sets, and accordingly is nonempty. Choose a sequence of points $\{x_n \in D_n,\ n \in \mathbb{N}\}$. Since $D_n$ is a ball of radius $1/2^n$ and contains $x_n$, $D_{n+1}$ is of radius $1/2^{n+1}$ and contains $x_{n+1}$, and the two balls intersect, the triangle inequality gives
\[ d(x_n,x_{n+1}) < \frac{2}{2^n} + \frac{2}{2^{n+1}} = \frac{3}{2^n}. \]
Hence $d(x_n,x_{n+m}) \le \sum_{j=n}^{n+m-1} d(x_j,x_{j+1}) < 6/2^n$ for every $m \ge 1$, so $\{x_n\}$ is a Cauchy sequence, and by completeness it has a limit $x \in S$. Since $C$ covers $S$, $x \in A$ for some $A \in C$, and since $A$ is open, $S(x,\varepsilon) \subseteq A$ for some $\varepsilon > 0$. Every point of $D_n$ lies within $2/2^n$ of $x_n$, so choosing $n$ large enough that $d(x_n,x) < \varepsilon/2$ and $2/2^n < \varepsilon/2$ ensures that $D_n \subseteq S(x,\varepsilon)$. But this means $D_n \subseteq A$, which is a contradiction since $D_n$ has no finite cover by $C$-sets. Hence $C$ contains a finite subcover, and (c) implies (a). ∎

In complete spaces, the set properties of relative compactness and precompactness are identical. The following is the converse of 5.5.

5.13 Corollary In a complete metric space, a totally bounded set $A$ is relatively compact.

Proof If $S$ is complete, every Cauchy sequence in $A$ has a limit in $S$, and all such points are closure points of $A$. The subspace $(\bar A,d)$ is therefore a complete space. It follows from 5.12 that if $A$ is totally bounded, $\bar A$ is compact. ∎

5.3 Examples

The following cases are somewhat more remote from ordinary geometric intuition than the ones we looked at above.

5.14 Example In §12.3 and subsequently we shall encounter $\mathbb{R}^\infty$, that is, infinite-dimensional Euclidean space. If $x = (x_1,x_2,\ldots) \in \mathbb{R}^\infty$, and $y = (y_1,y_2,\ldots) \in \mathbb{R}^\infty$ similarly, a metric for $\mathbb{R}^\infty$ is given by
\[ d_\infty(x,y) = \sum_{k=1}^{\infty} 2^{-k} d_0(x_k,y_k), \tag{5.15} \]
where $d_0$ is defined in (5.1). Like $d_0$, $d_\infty$ is a bounded metric with $d_\infty(x,y) \le 1$ for all $x$ and $y$. □
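A numerical sketch may help fix the behaviour of $d_\infty$. It assumes, purely for illustration, that $d_0(a,b) = |a-b|/(1+|a-b|)$; equation (5.1) is not reproduced in this section, and any metric on $\mathbb{R}$ bounded by 1 would serve equally well. Points of $\mathbb{R}^\infty$ are represented as functions from the index $k$ to the $k$th coordinate, and the names `d_inf`, `truncate`, and `y` are hypothetical.

```python
import math

def d0(a, b):
    # ASSUMED form of the bounded metric d_0 on R (stand-in for (5.1)).
    return abs(a - b) / (1.0 + abs(a - b))

def d_inf(x, y, terms=60):
    """First `terms` terms of (5.15); the omitted tail is at most 2**-terms."""
    return sum(2.0 ** -k * d0(x(k), y(k)) for k in range(1, terms + 1))

def y(k):
    return math.sqrt(k)        # an arbitrary point of R^infinity

def truncate(point, m, digits=3):
    """Coordinates rounded to `digits` decimals up to index m, zero afterwards."""
    return lambda k: round(point(k), digits) if k <= m else 0.0

for m in (2, 5, 10, 20):
    x = truncate(y, m)
    # The distance is bounded by (rounding error) + 2**-m.
    print(m, d_inf(x, y), 0.5e-3 + 2.0 ** -m)
```

The printed bound shrinks as `m` grows and the rounding is refined, so points with finitely many nonzero rational coordinates come arbitrarily close to `y` in this metric.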

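5.15 Theorem $(\mathbb{R}^\infty,d_\infty)$ is separable and complete.

Proof To show separability, consider the collection
\[ A_m = \{x = (x_1,x_2,\ldots): x_k \text{ rational if } k \le m,\ x_k = 0 \text{ otherwise}\} \subset \mathbb{R}^\infty. \tag{5.16} \]
$A_m$ is countable, and by 1.5 the collection $A = \bigcup_{m=1}^{\infty} A_m$ is also countable. For any $y \in \mathbb{R}^\infty$ and $\varepsilon > 0$, $\exists\, x \in A_m$ such that
\[ d_\infty(x,y) \le \sum_{k=1}^{m} 2^{-k} d_0(x_k,y_k) + \sum_{k=m+1}^{\infty} 2^{-k} d_0(0,y_k) \le \varepsilon + 2^{-m}. \tag{5.17} \]
Since the right-hand side can be made as small as desired by choice of $\varepsilon$ and $m$, $y$ is a closure point of $A$. Hence, $A$ is dense in $\mathbb{R}^\infty$.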

To show completeness, suppose $\{x_n = (x_{1n},x_{2n},\ldots),\ n \in \mathbb{N}\}$ is a Cauchy sequence in $\mathbb{R}^\infty$. Since $d_0(x_{kn},x_{km}) \le 2^k d_\infty(x_n,x_m)$ for any $k$, $\{x_{kn},\ n \in \mathbb{N}\}$ must be a Cauchy sequence in $\mathbb{R}$. Since
\[ d_\infty(x_n,x) \le \sum_{k=1}^{m} 2^{-k} d_0(x_{kn},x_k) + 2^{-m} \tag{5.18} \]
for all $m$, we can say that $x_n \to x = (x_1,x_2,\ldots) \in \mathbb{R}^\infty$ iff $x_{kn} \to x_k$ for each $k = 1,2,\ldots$; the completeness of $\mathbb{R}$ implies that $\{x_n\}$ has a limit in $\mathbb{R}^\infty$. ∎

5.16 Example Consider the infinite-dimensional cube, $[0,1]^\infty$; the Cartesian product of an infinite collection of unit intervals. The space $([0,1]^\infty,d_\infty)$ is separable by 5.8. We can also endow $[0,1]^\infty$ with the equivalent and in this case bounded metric,
\[ \rho_\infty(x,y) = \sum_{k=1}^{\infty} 2^{-k} |x_k - y_k|. \tag{5.19} \]
□

In a metric space $(S,d)$, where $d$ can be assumed bounded without loss of generality, define the distance between a point $x \in S$ and a subset $A \subseteq S$ as $d(x,A) = \inf_{y \in A} d(x,y)$. Then for a pair of subsets $A,B$ of $(S,d)$ define the function $d_H: 2^S \times 2^S \to \mathbb{R}^+$, where $2^S$ is the power set of $S$, by
\[ d_H(A,B) = \max\Bigl\{ \sup_{x \in B} d(x,A),\ \sup_{y \in A} d(y,B) \Bigr\}. \tag{5.20} \]
$d_H(A,B)$ is called the Hausdorff distance between sets $A$ and $B$.

5.17 Theorem Letting $\mathcal{H}_S$ denote the compact, nonempty subsets of $S$, $(\mathcal{H}_S,d_H)$ is a metric space.

Proof Clearly $d_H$ satisfies 5.1(a). It satisfies 5.1(b) since the sets of $\mathcal{H}_S$ are closed, although note that $d_H(A,\bar A) = 0$, so that $d_H$ is only a pseudo-metric for general subsets of $S$. To show 5.1(c), for any $x \in A$ and any $z \in C$ we have, by definition of $d(x,B)$ and the fact that $d$ is a metric,
\[ \sup_{x \in A} d(x,B) \le \sup_{x \in A} \{ d(x,z) + d(z,B) \}. \tag{5.21} \]
Since $C$ is compact, the infimum over $C$ of the expression in braces on the right-hand side above is attained at a point $z \in C$. We can therefore write
\[ \sup_{x \in A} d(x,B) \le \sup_{x \in A} \inf_{z \in C} \{ d(x,z) + d(z,B) \} \le \sup_{x \in A} d(x,C) + \sup_{z \in C} d(z,B). \tag{5.22} \]
Similarly, $\sup_{y \in B} d(y,A) \le \sup_{z \in C} d(z,A) + \sup_{y \in B} d(y,C)$, and hence,
\[
\begin{aligned}
d_H(A,B) &\le \max\Bigl\{ \sup_{x \in A} d(x,C) + \sup_{z \in C} d(z,B),\ \sup_{z \in C} d(z,A) + \sup_{y \in B} d(y,C) \Bigr\} \\
&\le \max\Bigl\{ \sup_{x \in A} d(x,C),\ \sup_{z \in C} d(z,A) \Bigr\} + \max\Bigl\{ \sup_{z \in C} d(z,B),\ \sup_{y \in B} d(y,C) \Bigr\} \\
&= d_H(A,C) + d_H(C,B).
\end{aligned}
\tag{5.23}
\]
∎


5.18 Example Let $S = \mathbb{R}$. The compact intervals with the Hausdorff metric define a complete metric space. Thus, $\{[0, 1 - 1/n],\ n \in \mathbb{N}\}$ is a Cauchy sequence which converges in the Hausdorff metric to $[0,1]$. This is the closure of the set $[0,1)$ which we usually regard as the limit of this sequence (compare 2.16), but although $[0,1) \notin \mathcal{H}_S$, $d_H([0,1),[0,1]) = 0$. □

Another case is where $S = (\mathbb{R}^2,d_E)$ and $\mathcal{H}_S$ contains the closed and bounded subsets of the Euclidean plane. To cultivate intuition about metric spaces, a useful exercise is to draw some figures on a sheet of paper and measure the Hausdorff distances between them, as in Fig. 5.3. For compact $A$ and $B$, $d_H(A,B) = 0$ if and only if $A = B$; compare this with another intuitive concept of the 'distance between two sets', $\inf_{x \in A, y \in B} d_E(x,y)$, which is zero if the sets touch or intersect.
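The contrast between the Hausdorff distance and the 'touching' distance can be checked on finite sets. The sketch below replaces the intervals $[0,1-1/n]$ and $[0,1]$ by grids of mesh $1/100$, which is an assumption made only to keep the suprema and infima finite maxima and minima; the function names are illustrative.

```python
from fractions import Fraction

def dist_point_set(x, A, d):
    """d(x, A): smallest distance from x to the finite nonempty set A."""
    return min(d(x, a) for a in A)

def hausdorff(A, B, d):
    """Hausdorff distance (5.20) between finite nonempty sets A and B."""
    return max(max(dist_point_set(x, A, d) for x in B),
               max(dist_point_set(y, B, d) for y in A))

def gap(A, B, d):
    """inf of d(a, b) over a in A, b in B; zero whenever the sets meet."""
    return min(d(a, b) for a in A for b in B)

def d_E(a, b):
    return abs(a - b)

unit = [Fraction(i, 100) for i in range(101)]      # grid on [0, 1]

def shrunk(n):
    # Grid on [0, 1 - 1/n]; n is assumed to divide 100.
    return [Fraction(i, 100) for i in range(101 - 100 // n)]

for n in (2, 4, 10, 50, 100):
    print(n, hausdorff(shrunk(n), unit, d_E), gap(shrunk(n), unit, d_E))
```

On these grids the Hausdorff distance comes out as $1/n$, shrinking as the intervals fill out $[0,1]$, while the touching distance is zero throughout and so cannot distinguish the sets.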

Fig. 5.3

5.4 Mappings on Metric Spaces

We have defined a function as a mapping which takes set elements to unique points of $\mathbb{R}$, but the term is also used where the codomain is a general metric space. Where the domain is another metric space, the results of §2.3 arise as special cases of the theory. Some of the following properties are generalizations of those given previously, while others are new. The terms mapping, transformation, etc., are again synonyms for function, but an extra usage is functional, which refers to the case where the domain is a space whose elements are themselves functions, with (usually) $\mathbb{R}$ as codomain. An example is the integral defined in §4.1.

The function $f: (S,d) \to (T,\rho)$ is said to be continuous at $x$ if for all $\varepsilon > 0$ $\exists\, \delta > 0$ such that
\[ \sup_{y \in S_d(x,\delta)} \rho(f(y),f(x)) < \varepsilon. \tag{5.24} \]
Here, $\delta$ may depend on $x$. Another way to state the condition is that for $\varepsilon > 0$ $\exists\, \delta > 0$ such that
\[ f(S_d(x,\delta)) \subseteq S_\rho(f(x),\varepsilon), \tag{5.25} \]
where $S_d$ and $S_\rho$ are respectively balls in $(S,d)$ and $(T,\rho)$. Similarly, $f$ is said to be uniformly continuous on a set $A \subseteq S$ if for all $\varepsilon > 0$, $\exists\, \delta > 0$ such that
\[ \sup_{x \in A}\ \sup_{y \in S_d(x,\delta) \cap A} \rho(f(y),f(x)) < \varepsilon. \tag{5.26} \]
Theorem 2.17 was a special case of the following important result.

5.19 Theorem For $A \subseteq T$, $f^{-1}(A)$ is open (closed) in $S$ whenever $A$ is open (closed) in $T$, iff $f$ is continuous at all points of $S$.

Proof Assume $A$ is open, and let $f(x) \in A$ for $x \in f^{-1}(A)$. We have $S_\rho(f(x),\varepsilon) \subseteq A$ for some $\varepsilon > 0$. By 1.2(iv) and continuity at $x$,
\[ S_d(x,\delta) \subseteq f^{-1}(f(S_d(x,\delta))) \subseteq f^{-1}(S_\rho(f(x),\varepsilon)) \subseteq f^{-1}(A). \tag{5.27} \]
If $A$ is open then $T - A$ is closed and $f^{-1}(T - A) = S - f^{-1}(A)$ by 1.2(iii), which is closed if $f^{-1}(A)$ is open. This proves sufficiency.

To prove necessity, suppose $f^{-1}(A)$ is open in $S$ whenever $A$ is open in $T$, and in particular $f^{-1}(S_\rho(f(x),\varepsilon))$ for $\varepsilon > 0$ is open in $S$. Since $x \in f^{-1}(S_\rho(f(x),\varepsilon))$, there is a $\delta > 0$ such that (5.25) holds. Use complements again for the case of closed sets. ∎

This property of inverse images under $f$ provides an alternative characterization of continuity, and in topological spaces provides the primary definition of continuity. The notion of Borel measurability discussed in §3.6 extends naturally to mappings between pairs of metric spaces, and the theorem establishes that continuous transformations are Borel-measurable.

The properties of functions on compact sets are of interest in a number of contexts. The essential results are as follows.

5.20 Theorem The continuous image of a compact set is compact.

Proof We show that, if $A \subseteq S$ is compact and $f$ is continuous, then $f(A)$ is compact. Let $C$ be an open covering of $f(A)$. Continuity of $f$ means that the sets $f^{-1}(B)$, $B \in C$, are open by 5.19, and their union covers $A$ by 1.2(ii). Since $A$ is compact, these sets contain a finite subcover, say, $f^{-1}(B_1),\ldots,f^{-1}(B_m)$. It follows that
\[ f(A) \subseteq f\Bigl(\bigcup_{j=1}^{m} f^{-1}(B_j)\Bigr) = \bigcup_{j=1}^{m} f(f^{-1}(B_j)) \subseteq \bigcup_{j=1}^{m} B_j, \tag{5.28} \]
where the equality is by 1.2(i) and the second inclusion by 1.2(v). Hence, $B_1,\ldots,B_m$ is a finite subcover of $f(A)$ by $C$-sets. Since $C$ is arbitrary, it follows that $f(A)$ is compact. ∎

5.21 Theorem If $f$ is continuous on a compact set, it is uniformly continuous on the set.

Proof Let $A \subseteq S$ be compact. Choose $\varepsilon > 0$, and for each $x \in A$, continuity at $x$ means that there exists a sphere $S_d(x,r)$ such that $\rho(f(x),f(y)) < \tfrac{1}{2}\varepsilon$ for each $y \in S_d(x,2r) \cap A$. These spheres cover $A$, and since $A$ is compact they contain a finite subcover, say $S_d(x_k,r_k)$, $k = 1,\ldots,m$. Let $\delta = \min_{1 \le k \le m} r_k$, and consider a pair of points $x,y \in A$ such that $d(x,y) < \delta$. Now, $y \in S_d(x_k,r_k)$ for some $k$, so that $\rho(f(x_k),f(y)) < \tfrac{1}{2}\varepsilon$, and also
\[ d(x_k,x) \le d(x_k,y) + d(x,y) < r_k + \delta \le 2r_k, \tag{5.29} \]
using the triangle inequality. Hence $\rho(f(x_k),f(x)) < \tfrac{1}{2}\varepsilon$, and
\[ \rho(f(x),f(y)) \le \rho(f(x),f(x_k)) + \rho(f(x_k),f(y)) < \varepsilon. \tag{5.30} \]
Since $\delta$ is independent of $x$ and $y$, $f$ is uniformly continuous on $A$. ∎
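The role of compactness in uniform continuity can be seen numerically by estimating, on a grid, the largest change in $f$ over pairs of points at most $\delta$ apart. This is only a sketch: the grid, the two test functions, and the helper name `modulus` are illustrative assumptions, and a finite grid can suggest but not prove uniform continuity.

```python
import math

def modulus(f, grid, delta):
    """Largest |f(x) - f(y)| over pairs in the sorted grid with |x - y| <= delta."""
    worst = 0.0
    for i in range(len(grid)):
        for j in range(i + 1, len(grid)):
            if grid[j] - grid[i] > delta:
                break                      # later points are even further away
            worst = max(worst, abs(f(grid[i]) - f(grid[j])))
    return worst

def inv(x):
    return 1.0 / x

N = 1000
closed = [i / N for i in range(N + 1)]         # grid on the compact set [0, 1]
half_open = [i / N for i in range(1, N + 1)]   # grid on (0, 1], which is not compact

for delta in (0.1015, 0.0105, 0.0015):
    print(delta, modulus(math.sqrt, closed, delta), modulus(inv, half_open, delta))
```

For the square root on $[0,1]$ the estimate falls with $\delta$, whereas for $1/x$ on $(0,1]$ it stays large however small $\delta$ is taken, because the points near zero are never controlled by a single $\delta$.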

If $f: S \to T$ is 1-1 onto, and $f$ and $f^{-1}$ are continuous, $f$ is called a homeomorphism, and $S$ and $T$ are said to be homeomorphic if such a function exists. If $S$ is homeomorphic with a subset of $T$, it is said to be embedded in $T$ by $f$. If $f$ also preserves distances so that $\rho(f(x),f(y)) = d(x,y)$ for each $x,y \in S$, it is called an isometry. Metrics $d_1$ and $d_2$ in a space $S$ are equivalent if and only if the identity mapping from $(S,d_1)$ to $(S,d_2)$ (the mapping which takes each point of $S$ into itself) is a homeomorphism.


5.22 Example If $d_\infty$ and $\rho_\infty$ are the metrics defined in (5.15) and (5.19) respectively, the mapping $g: (\mathbb{R}^\infty,d_\infty) \to ((0,1)^\infty,\rho_\infty)$, where $g = (g_1,g_2,\ldots)$ and
\[ g_i(x) = \frac{1}{2} + \frac{x_i}{2(1 + |x_i|)}, \quad i = 1,2,\ldots \tag{5.31} \]
is a homeomorphism. □
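The coordinate map in (5.31) and its inverse can be written down explicitly, which makes the one-to-one correspondence between $\mathbb{R}$ and $(0,1)$ easy to check. The sketch below works on a finite prefix of coordinates; the names `g_coord` and `g_coord_inv` are illustrative, and the inverse formula is derived here rather than taken from the text.

```python
import math

def g_coord(x):
    """Coordinate map of (5.31), taking R into (0, 1)."""
    return 0.5 + x / (2.0 * (1.0 + abs(x)))

def g_coord_inv(u):
    """Inverse of g_coord on (0, 1)."""
    t = 2.0 * u - 1.0            # t = x / (1 + |x|), so |t| < 1
    return t / (1.0 - abs(t))

def g(x):
    """Apply the coordinate map to a finite prefix of a point of R^infinity."""
    return [g_coord(xi) for xi in x]

sample = [-100.0, -1.0, 0.0, 0.5, 3.0, 1000.0]
image = g(sample)
back = [g_coord_inv(u) for u in image]
print(image)
print(back)
```

Both maps are continuous and strictly increasing in each coordinate, which is what one would expect of a homeomorphism acting coordinate by coordinate.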

Right and left continuity are not well defined notions for general metric spaces, but there is a concept of continuity which is 'one-sided' with respect to the range of the function. A function $f: (S,d) \to \mathbb{R}$ is said to be upper semicontinuous at $x$ if for each $\varepsilon > 0$ $\exists\, \delta > 0$ such that, for $y \in S$,
\[ d(x,y) < \delta \ \Rightarrow\ f(y) < f(x) + \varepsilon. \tag{5.32} \]
If $\{x_n\}$ is a sequence of points in $S$ and $d(x_n,x) \to 0$, upper semicontinuity implies $\limsup_n f(x_n) \le f(x)$. The level sets of the form $\{x: f(x) < \alpha\}$ are open for all $\alpha \in \mathbb{R}$ iff $f$ is upper semicontinuous everywhere on $S$. $f$ is lower semicontinuous iff $-f$ is upper semicontinuous, and $f$ is continuous at $x$ iff it is both upper and lower semicontinuous at $x$.

A function of a real variable is upper semicontinuous at $x$ if it jumps at $x$ with $f(x) > \max\{f(x-),f(x+)\}$; isolated discontinuities such as point A in Fig. 5.4 are not ruled out if this inequality is satisfied. On the other hand, upper semicontinuity fails at point B. Semicontinuity is not the same thing as right/left continuity except in the case of monotone functions; if $f$ is increasing, right (left) continuity is equivalent to upper (lower) semicontinuity, and the reverse holds for decreasing functions.
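Condition (5.32) can be probed on sample points around a jump. The following sketch uses four illustrative step-type functions that are not taken from the text; sampling can refute upper semicontinuity at a point but cannot establish it, so a `True` result is only suggestive.

```python
def usc_on_samples(f, x, eps, delta, n=1000):
    """Check f(y) < f(x) + eps at n + 1 sample points y with |x - y| < delta."""
    ys = [x + 0.999 * delta * (2 * k / n - 1) for k in range(n + 1)]
    return all(f(y) < f(x) + eps for y in ys)

def spike(x):            # f(0) lies above both one-sided limits
    return 1.0 if x == 0 else 0.0

def dip(x):              # f(0) lies below both one-sided limits
    return 0.0 if x == 0 else 1.0

def right_cts_step(x):   # increasing and right-continuous at 0
    return 1.0 if x >= 0 else 0.0

def left_cts_step(x):    # increasing and left-continuous at 0
    return 1.0 if x > 0 else 0.0

for delta in (1.0, 0.1, 0.001):
    print(delta,
          usc_on_samples(spike, 0.0, 0.5, delta),
          usc_on_samples(dip, 0.0, 0.5, delta),
          usc_on_samples(right_cts_step, 0.0, 0.5, delta),
          usc_on_samples(left_cts_step, 0.0, 0.5, delta))
```

The upward spike and the right-continuous increasing step pass the check for every $\delta$ tried, while the downward dip and the left-continuous increasing step fail it however small $\delta$ is, in line with the remarks on monotone functions.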

The concept of a Lipschitz condition generalizes to metric spaces. A function $f$ on $(S,d)$ satisfies a Lipschitz condition at $x \in S$ if for $\delta > 0$ $\exists\ M > 0$ such that, for any $y \in S_d(x,\delta)$,
$$\rho(f(x),f(y)) \le M h(d(x,y)) \tag{5.33}$$
where $h(\cdot): \mathbb{R}^+ \mapsto \mathbb{R}^+$ satisfies $h(d) \downarrow 0$ as $d \downarrow 0$. It satisfies a uniform Lipschitz condition if condition (5.33) holds uniformly, with fixed $M$, for all $x \in S$. The remarks following (2.9) apply equally here. Continuity is enforced by this condition with arbitrary $h$, and stronger smoothness conditions are obtained for special cases of $h$.

Fig. 5.4

5.5 Function Spaces

The non-Euclidean metric spaces met in later chapters are mostly spaces of real functions on an interval of $\mathbb{R}$. The elements of such spaces are graphs, subsets of $\mathbb{R}^2$. However, most of the relevant theory holds for functions whose domain is any metric space $(S,d)$, and accordingly, it is this more general case that we will study.

Let $C_S$ denote the set of all bounded continuous functions $f: S \mapsto \mathbb{R}$, and define
$$d_U(f,g) = \sup_{x \in S} |f(x) - g(x)|. \tag{5.34}$$
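As a rough numerical illustration of (5.34), the supremum can be approximated by a maximum over a finite sample of the domain. This is only a sketch: the grid, the three test functions and the name `sup_distance` are assumptions made for the example, and on an interval the grid maximum is merely a lower bound for $d_U$.

```python
import math


def sup_distance(f, g, points):
    """Largest |f(x) - g(x)| over a finite sample of the domain."""
    return max(abs(f(x) - g(x)) for x in points)


grid = [k / 100 for k in range(101)]          # sample of [0, 1]
f, g, h = math.sin, math.cos, (lambda x: x * x)

d_fg = sup_distance(f, g, grid)
d_gh = sup_distance(g, h, grid)
d_fh = sup_distance(f, h, grid)
```

On the sample, the three distances satisfy the triangle inequality, in line with the metric property of $d_U$.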

5.23 Theorem $d_U$ is a metric.

Proof Conditions 5.1(a) and (b) are immediate. To prove the triangle inequality write, given functions $f$, $g$ and $h \in C_S$,
$$d_U(f,h) = \sup_{x \in S} |f(x) - g(x) + g(x) - h(x)| \le \sup_{x \in S} |f(x) - g(x)| + \sup_{x \in S} |g(x) - h(x)| = d_U(f,g) + d_U(g,h). \tag{5.35}$$
∎

Let $U_S$ denote the subset of $C_S$ consisting of the uniformly continuous functions. If $S$ is compact, $C_S = U_S$ ...


a uniformly continuous restriction to $S$, and every $f \in U_S$ has a continuous extension to $\bar{S}$, say $\bar{f}$, constructed by setting $\bar{f}(x) = f(x)$ for $x \in S$ and $\bar{f}(x) = \lim_n f(x_n)$ for a sequence $\{x_n \in S\}$ converging to $x$, for each $x \in \bar{S} - S$. Note that for any pair $f, g \in U_S$, $d_U(\bar{f},\bar{g}) = d_U(f,g)$, so that the spaces $C_{\bar{S}}$ and $U_S$ are isometric. There are functions that are continuous on $S$ and not on $\bar{S}$, but these cannot be uniformly continuous.

The following is a basic property of $C_S$ which holds independently of the nature of the domain $S$.

5.24 Theorem $(C_S,d_U)$ is complete.

Proof Let $\{f_n\}$ be a Cauchy sequence in $C_S$; in other words, for $\varepsilon > 0$ $\exists\ N_\varepsilon \ge 1$ such that $d_U(f_n,f_m) \le \varepsilon$ for $n,m > N_\varepsilon$. Then for each $x \in S$, the sequences $\{f_n(x)\}$ satisfy $|f_n(x) - f_m(x)| \le d_U(f_n,f_m)$; these are Cauchy sequences in $\mathbb{R}$, and so have limits $f(x)$. In view of the definition of $d_U$, we may say that $f_n \to f$ uniformly in $S$. For any $x,y \in S$, the triangle inequality gives
$$|f(x) - f(y)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)|. \tag{5.36}$$
Fix $\varepsilon > 0$. Since $f_n \in C_S$, $\exists\ \delta > 0$ such that $|f_n(x) - f_n(y)| < \tfrac{1}{3}\varepsilon$ if $d(x,y) < \delta$. Also, by uniform convergence $\exists\ n$ large enough that
$$\max\{|f(x) - f_n(x)|,\ |f_n(y) - f(y)|\} < \tfrac{1}{3}\varepsilon, \tag{5.37}$$
so that $|f(x) - f(y)| < \varepsilon$. Hence $f \in C_S$, which establishes that $C_S$ is complete. ∎

Notice how this property holds by virtue of the uniform metric. It is easy to devise sequences of continuous functions converging to discontinuous limits, but none of these are Cauchy sequences. It is not possible for a continuous function to be arbitrarily close to a discontinuous function at every point of the domain.
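A standard illustration (not taken from the text) is the sequence $f_n(x) = x^n$ on $[0,1]$, whose pointwise limit is discontinuous at 1. The sketch below evaluates $|f_n(x) - f_{2n}(x)|$ at the point where $x^n = 1/2$, which shows that $d_U(f_n,f_{2n}) \ge 1/4$ for every $n$, so the sequence is not Cauchy under the uniform metric. The function names are illustrative.

```python
def f(n: int, x: float) -> float:
    """The n-th member of the sequence f_n(x) = x**n on [0, 1]."""
    return x ** n


def gap(n: int) -> float:
    """|f_n(x) - f_2n(x)| evaluated at x = 2**(-1/n), where x**n = 1/2."""
    x = 0.5 ** (1.0 / n)
    return abs(f(n, x) - f(2 * n, x))
```

Up to rounding, `gap(n)` equals 1/2 - 1/4 = 1/4 whatever the value of `n`.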

A number of the results to follow call for us to exhibit a continuous function which lies uniformly close to a function in $U_S$, but is fully specified by a finite collection of numbers. This is possible when the domain is totally bounded.

5.25 Theorem Let $(S,d)$ be a totally bounded metric space. For any $f \in U_S$, there exists for any $\varepsilon > 0$ a function $g \in U_S$, completely specified by points of the domain $x_1,\ldots,x_m$ and rational numbers $a_1,\ldots,a_m$, such that $d_U(f,g) < \varepsilon$. □

We specify rational numbers here, because this will allow us to assert in applications that the set of all possible $g$ is countable.

Proof By total boundedness of $S$, $\exists$ for $\delta > 0$ a finite $\delta$-net $\{x_1,\ldots,x_m\}$. For each $x_i$, let $A_i = \{x: d(x,x_i) \ge 2\delta\}$ and $B_i = \{x: d(x,x_i) \le \delta/2\}$, and define functions $g_i: S \mapsto [0,1]$ by
$$g_i(x) = \frac{d(x,A_i)}{d(x,A_i) + d(x,B_i)}, \tag{5.38}$$
where $d(x,A) = \inf_{y \in A} d(x,y)$. $d(x,A)$ is a uniformly continuous function of $x$ by construction, and $g_i(x)$ is also uniformly continuous, for the denominator is never less than $\tfrac{3}{2}\delta$. Then define
$$g(x) = \frac{\sum_{i=1}^m g_i(x) a_i}{\sum_{i=1}^m g_i(x)}. \tag{5.39}$$
Being a weighted average of the numbers $\{a_i\}$, $g(x)$ is bounded. Also, since $\{x_i\}$ is a $\delta$-net for $S$, there exists for every $x \in S$ some $i$ such that $d(x,A_i) \ge \delta$, as well as $d(x,B_i) \le d(x,x_i) \le \delta$, and hence such that $g_i(x) \ge \tfrac{1}{2}$. Therefore, $\sum_{i=1}^m g_i(x) \ge \tfrac{1}{2}$, and uniform continuity extends from the $g_i$ to $g$.

For arbitrary $f \in U_S$, fix $\varepsilon > 0$ and choose $\delta$ small enough that $|f(x) - f(y)| < \tfrac{1}{2}\varepsilon$ when $d(x,y) < 2\delta$, for any $x,y \in S$. Then fix $m$ large enough and choose $x_i$ and $a_i$ for $i = 1,\ldots,m$, so that the $S(x_i,\delta)$ cover $S$, and $|f(x_i) - a_i| < \tfrac{1}{2}\varepsilon$ for each $i$. Note that if $d(x,x_i) \ge 2\delta$ then $x \in A_i$ and $g_i(x) = 0$, so that in all cases
$$g_i(x)|f(x) - f(x_i)| \le \tfrac{1}{2} g_i(x)\varepsilon. \tag{5.40}$$
Hence
$$g_i(x)|f(x) - a_i| \le g_i(x)|f(x) - f(x_i)| + g_i(x)|f(x_i) - a_i| \le g_i(x)\varepsilon \tag{5.41}$$
for each $x \in S$ and each $i$, with strict inequality whenever $g_i(x) > 0$. We may conclude that
$$d_U(f,g) = \sup_{x \in S} |f(x) - g(x)| \le \sup_{x \in S} \frac{\sum_{i=1}^m g_i(x)|f(x) - a_i|}{\sum_{i=1}^m g_i(x)} < \varepsilon. \tag{5.42}$$
∎
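The construction in this proof can be tried out on a small example. The sketch below is an illustration under stated assumptions, not the author's code: the space $S$ is taken to be a finite sample of $[0,1]$ with the usual distance, the target function $f(x) = x^2$ and the choice $a_i = f(x_i)$ are hypothetical, the sets $B_i$ use radius $\delta/2$, and exact rational arithmetic is used so that set membership at the boundaries is unambiguous.

```python
from fractions import Fraction as Fr

S = [Fr(k, 40) for k in range(41)]       # finite sample of [0, 1], playing the role of S
delta = Fr(1, 10)
net = [Fr(k, 10) for k in range(11)]     # delta-net: each point of S is within delta of it


def dist_to_set(x, points):
    """d(x, A) = inf over y in A of |x - y| (A is finite and nonempty here)."""
    return min(abs(x - y) for y in points)


def g_i(x, xi):
    """Weight function of (5.38) for the net point xi."""
    A = [y for y in S if abs(y - xi) >= 2 * delta]
    B = [y for y in S if abs(y - xi) <= delta / 2]
    dA, dB = dist_to_set(x, A), dist_to_set(x, B)
    return dA / (dA + dB)


def g(x, a):
    """Weighted average of (5.39) with constants a[i]."""
    w = [g_i(x, xi) for xi in net]
    return sum(wi * ai for wi, ai in zip(w, a)) / sum(w)


f = lambda x: x * x                      # hypothetical target function
a = [f(xi) for xi in net]                # rational constants a_i = f(x_i)
sup_err = max(abs(f(x) - g(x, a)) for x in S)
```

Since $g_i(x) = 0$ whenever $|x - x_i| \ge 2\delta$, the value $g(x)$ only averages constants attached to net points within $0.2$ of $x$, which keeps `sup_err` below $0.4$ for this choice of $f$.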

The next result makes use of this approximation theorem, and is fundamental. It tells us (recalling the earlier discussion of separability) that spaces of continuous functions are not such alien objects from an analytic point of view as they might at first appear, at least when the domain is totally bounded.

5.26 Theorem
(i) If $(S,d)$ is totally bounded then $(U_S,d_U)$ is separable.
(ii) If $(S,d)$ is compact then $(C_S,d_U)$ is separable.

Proof We need only prove part (i), since for part (ii), $C_S = U_S$ by 5.21 and the same conclusion follows.

Fix $m$ and suitable points $\{x_1,\ldots,x_m\}$ of $S$ so as to define a countable family of functions $A_m = \{g_{mk}, k \in \mathbb{N}\}$, where the $g_{mk}$ are defined as in 5.25, and the index $k$ enumerates the countable collection of $m$-vectors $(a_1,\ldots,a_m)$ of rationals. For each $\varepsilon > 0$, there exists $m$ large enough that, for each $f \in U_S$, $d_U(f,g_{mk}) < \varepsilon$ for some $k$. By 1.5, $A = \lim_{m \to \infty} A_m$ is countable, and there exists $g_k \in A$ such that $d_U(f,g_k) < \varepsilon$ for every $\varepsilon > 0$. This means that $A$ is dense in $U_S$. ∎

To show that we cannot rely on these properties holding under more general circumstances, we exhibit a nonseparable function space.

5.27 Example For $S = [a,b]$, an interval of the real line, consider the metric space $(D_{[a,b]},d_U)$ of real, bounded cadlag functions of a real variable. Cadlag is a colourful French acronym (continue à droite, limites à gauche) to describe functions of a real variable which may have discontinuities, but are right continuous at every point, with the image of every decreasing sequence in $[a,b]$ containing its limit point; in other words, there is a limit point to the left of every point. Of course, $C_{[a,b]} \subset D_{[a,b]}$. Functions with completely arbitrary discontinuities form a larger class still, but one that for most purposes is too unstructured to permit a useful theory.

To show that $(D_{[a,b]},d_U)$ is not separable, consider the subset with elements
$$f_\theta(t) = \begin{cases} 0, & t < \theta \\ 1, & t \ge \theta \end{cases}, \quad \theta \in [a,b]. \tag{5.43}$$
This set is uncountable, containing as many elements as there are points in $[a,b]$. But $d_U(f_\theta,f_{\theta'}) = 1$ for $\theta \ne \theta'$, so it is also discrete. Hence $(D_{[a,b]},d_U)$ is not separable by 5.9. □
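The discreteness claim is easy to verify numerically. In the sketch below, which takes $[a,b] = [0,1]$ and a handful of arbitrary jump points purely for illustration, any two distinct step functions of the form (5.43) are found to be at uniform distance exactly 1.

```python
def step(theta):
    """f_theta of (5.43): 0 to the left of theta, 1 from theta onwards."""
    return lambda t: 0.0 if t < theta else 1.0


def sup_distance(f, g, points):
    """Largest |f(t) - g(t)| over a finite sample of the domain."""
    return max(abs(f(t) - g(t)) for t in points)


thetas = [0.1, 0.25, 0.5, 0.9]                  # arbitrary jump points in [0, 1]
points = [k / 1000 for k in range(1001)]        # sample of [0, 1]
distances = {
    (p, q): sup_distance(step(p), step(q), points)
    for p in thetas for q in thetas if p != q
}
```

Any sample point lying between the two jump points already attains the distance 1, since one function has jumped there and the other has not.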

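Let $A$ denote a collection of functions $f: (S,d) \mapsto (T,\rho)$. $A$ is said to be equicontinuous at $x \in S$ if $\forall\ \varepsilon > 0$ $\exists\ \delta > 0$ such that
$$\sup_{f \in A}\ \sup_{y \in S_d(x,\delta)} \rho(f(y),f(x)) < \varepsilon. \tag{5.44}$$
$A$ is also said to be uniformly equicontinuous if $\forall\ \varepsilon > 0$ $\exists\ \delta > 0$ such that
$$\sup_{f \in A}\ \sup_{x \in S}\ \sup_{y \in S_d(x,\delta)} \rho(f(y),f(x)) < \varepsilon. \tag{5.45}$$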
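A family can consist of uniformly continuous functions without being uniformly equicontinuous. The sketch below, an illustration that is not taken from the text, estimates on a grid the inner two suprema of (5.45) for the functions $x^n$ on $[0,1]$ with $\delta = 0.05$; the estimates approach 1 as $n$ grows, so no single $\delta$ serves the whole family. The helper name `modulus` and the grid size are assumptions.

```python
def modulus(f, k, steps=1000):
    """Grid estimate of sup |f(y) - f(x)| over |x - y| < k / steps on [0, 1].

    Only pairs of grid points whose index gap is below k are compared.
    """
    vals = [f(i / steps) for i in range(steps + 1)]
    return max(
        abs(vals[i + j] - vals[i])
        for j in range(1, k)
        for i in range(steps + 1 - j)
    )


# delta = 50 / 1000 = 0.05 for each member of the family x**n
powers = [modulus(lambda x, n=n: x ** n, 50) for n in (1, 10, 100, 1000)]
```

For $n = 1$ the estimate is just under $\delta$ itself, while for $n = 1000$ it is indistinguishable from 1.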

Equicontinuity is the property of a set of continuous functions (or uniformly continuous functions, as the case may be) which forbids limit points of the set to be not (uniformly) continuous. In the case when $A \subseteq C_S$ $(U_S)$ but $A$ is not (uniformly) equicontinuous, we cannot rule out the possibility that $\bar{A} \not\subseteq C_S$ $(U_S)$.

An important class of applications is to countable $A$, and if we restrict attention to the case $A = \{f_n, n \in \mathbb{N}\}$, $A \subseteq C_S$ (or $U_S$) may not be essential. If we are willing to tolerate discontinuity in at most a finite number of the cases, the following concept is the relevant one. A sequence of functions $\{f_n, n \in \mathbb{N}\}$ will be said to be asymptotically equicontinuous at $x$ if $\forall\ \varepsilon > 0$ $\exists\ \delta > 0$ such that
$$\limsup_{n \to \infty}\ \sup_{y \in S_d(x,\delta)} \rho(f_n(y),f_n(x)) < \varepsilon, \tag{5.46}$$
and asymptotically uniformly equicontinuous if $\forall\ \varepsilon > 0$ $\exists\ \delta > 0$ such that
$$\limsup_{n \to \infty}\ \sup_{x \in S}\ \sup_{y \in S_d(x,\delta)} \rho(f_n(y),f_n(x)) < \varepsilon. \tag{5.47}$$

If the functions $f_n$ are continuous for all $n$, $\limsup_{n \to \infty}$ can be replaced by $\sup_n$ in (5.46) and similarly for (5.47) when all the $f_n$ are uniformly continuous. In these circumstances, the qualifier asymptotic can be dropped.

The main result on equicontinuous sets is the Arzelà-Ascoli theorem. This designation covers a number of closely related results, but the following version, which is the one appropriate to our subsequent needs, identifies equicontinuity as the property of a set of bounded real-valued functions on a totally bounded domain which converts boundedness into total boundedness.

5.28 Arzelà-Ascoli theorem Let $(S,d)$ be a totally bounded metric space. A set $A \subseteq C_S$ is relatively compact under $d_U$ iff it is bounded and uniformly equicontinuous.

Proof Since $C_S$ is complete, total boundedness of $A$ is equivalent to relative compactness by 5.13. So to prove 'if', we assume boundedness and equicontinuity, and construct a finite $\varepsilon$-net for $A$.

It is convenient to define the modulus of continuity of $f$, that is, the function $w: C_S \times \mathbb{R}^+ \mapsto \mathbb{R}^+$ where
$$w(f,\delta) = \sup_{x \in S}\ \sup_{y \in S_d(x,\delta)} |f(y) - f(x)|. \tag{5.48}$$
Fix $\varepsilon > 0$, and choose $\delta$ (as is possible by uniform equicontinuity) such that
$$\sup_{f \in A} w(f,\delta) < \varepsilon. \tag{5.49}$$
Boundedness of $A$ under the uniform metric means that there exist finite real numbers $U$ and $L$ such that
$$L \le \inf_{f \in A,\, x \in S} f(x) \le \sup_{f \in A,\, x \in S} f(x) \le U. \tag{5.50}$$
Let $\{x_1,\ldots,x_m\}$ be a $\delta$-net for $S$, and construct the finite family
$$D_m = \{g_k,\ k = 1,\ldots,(v+1)^m\} \tag{5.51}$$
according to the recipe of 5.25, with the constants $a_i$ taken from the finite set $\{L + (U - L)u/v\}$, where $u$ and $v$ are integers with $v$ exceeding $(U - L)/\varepsilon$ and $u = 0,\ldots,v$. This set contains $v + 1$ real values between $U$ and $L$ which are less than $\varepsilon$ apart, so that $D_m$ has $(v+1)^m$ members, as indicated. Since the assumptions imply $A \subset U_S$, it follows by 5.25 that for every $f \in A$ there exists $g_k \in D_m$ with $d_U(f,g_k) < \varepsilon$. This shows that $D_m$ is an $\varepsilon$-net for $A$, and $A$ is totally bounded.

To prove 'only if', suppose $A$ is relatively compact, and hence totally bounded. Trivially, total boundedness implies boundedness, and it remains to show uniform equicontinuity. Consider for $\varepsilon > 0$ the set
$$B_k(\varepsilon) = \{f: w(f,1/k) < \varepsilon\}. \tag{5.52}$$
Uniform equicontinuity of $A$ is the condition that, for any $\varepsilon > 0$, there exists $k$ large enough that $A \subseteq B_k(\varepsilon)$. It is easily verified that
$$|w(f,\delta) - w(g,\delta)| \le 2d_U(f,g), \tag{5.53}$$
so that the function $w(\cdot,\delta): (C_S,d_U) \mapsto (\mathbb{R}^+,d_E)$ is continuous. $B_k(\varepsilon)$ is the inverse image under $w(\cdot,1/k)$ of the half-line $[0,\varepsilon)$ which is open in $\mathbb{R}^+$, and hence $B_k(\varepsilon)$ is open by 5.19. By definition of $C_S$, $w(f,1/k) \to 0$ as $k \to \infty$ for each $f \in C_S$. In other words, $w$ converges to 0 pointwise on $C_S$, which implies that the collection $\{B_k(\varepsilon), k \in \mathbb{N}\}$ must be an open covering for $C_S$, and hence for $\bar{A}$. But by hypothesis $\bar{A}$ is compact, every such covering of $\bar{A}$ has a finite subcover, and so $A \subseteq B_k(\varepsilon)$ for finite $k$, as required. ∎


6 Topology

6.1 Topological Spaces

Metric spaces form a subclass of a larger class of mathematical objects called topological spaces. These do not have a distance defined upon them, but the concepts of open set, neighbourhood, and continuous mapping are still well defined. Even though only metric spaces are encountered in the sequel (Part VI), much of the reasoning is essentially topological in character. An appreciation of the topological underpinnings is essential for getting to grips with the theory of weak convergence.

6.1 Definition A topological space $(X,\tau)$ is a set $X$ on which is defined a topology, a class of subsets $\tau$ called open sets having the following properties:
(a) $X \in \tau$, $\emptyset \in \tau$.
(b) If $\mathcal{C} \subseteq \tau$, then $\bigcup_{O \in \mathcal{C}} O \in \tau$.
(c) If $O_1 \in \tau$, $O_2 \in \tau$, then $O_1 \cap O_2 \in \tau$. □
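For a finite set the three conditions can be checked mechanically. The sketch below is an illustration with a hypothetical function name; it relies on the fact that, for a finite family, closure under pairwise unions and intersections extends by induction to all unions and finite intersections.

```python
from itertools import combinations


def is_topology(X, tau):
    """Check 6.1(a)-(c) for a family tau of subsets of a finite set X."""
    X = frozenset(X)
    tau = {frozenset(O) for O in tau}
    if X not in tau or frozenset() not in tau:
        return False                                    # condition (a)
    for O1, O2 in combinations(tau, 2):
        if O1 | O2 not in tau or O1 & O2 not in tau:    # conditions (b), (c)
            return False
    return True
```

For example, the family consisting of the empty set, {1} and {1, 2} is a topology on {1, 2}, whereas a family containing {1} and {2} but not {1, 2} fails condition (b).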

These three conditions define an open set, so that openness becomes a primitive concept of which the notion of $\varepsilon$-spheres around points is only one characterization. A metric induces a topology on a space because it is one way (though not the only way) of defining what an open set is, and all metric spaces are also topological spaces. On the other hand, some topological spaces may be made into metric spaces by defining a metric on them under which sets of $\tau$ are open in the sense defined in §5.1. Such spaces are called metrizable.

A subset of a topological space $(X,\tau)$ has a topology naturally induced on it by the parent space. If $A \subset X$, the collection $\tau_A = \{A \cap O: O \in \tau\}$ is called the relative topology for $A$. $(A,\tau_A)$ would normally be referred to as a subspace of $X$. If two topologies $\tau_1$ and $\tau_2$ are defined on a space and $\tau_1 \subset \tau_2$, then $\tau_1$ is said to be coarser, or weaker, than $\tau_2$, whereas $\tau_2$ is finer (stronger) than $\tau_1$. In particular, the power set of $X$ is a topology, called the discrete topology, whereas $\{\emptyset,X\}$ is called the trivial topology. Two metrics define the same topology on a space if and only if they are equivalent. If two points are close in one space, their images in the other space must be correspondingly close.

If a set $O$ is open, its complement $O^c$ on $X$ is said to be closed. The closure $\bar{A}$ of an arbitrary set $A \subseteq X$ is the intersection of all the closed sets containing $A$. As for metric spaces, a set $A \subseteq B$, for $B \subseteq X$, is said to be dense in $B$ if $B \subseteq \bar{A}$.

6.2 Theorem The intersection of any collection of closed sets is closed. $X$ and $\emptyset$ are both open and closed. □


However, an arbitrary union of closed sets need not be closed, just as an arbitrary intersection of open sets need not be open.

For given $x \in X$, a collection $\mathcal{V}_x$ of open sets is called a base for the point $x$ if for every open $O$ containing $x$ there is a set $B \in \mathcal{V}_x$ such that $x \in B$ and $B \subset O$. This is the generalization to topological spaces of the idea of a system of neighbourhoods or spheres in a metric space. A base for the topology $\tau$ on $X$ is a collection $\mathcal{V}$ of sets such that, for every $O \in \tau$, and every $x \in O$, there exists $B \in \mathcal{V}$ such that $x \in B \subset O$. The definition implies that any open set can be expressed as the union of sets from the base of the topology; a topology may be defined for a space by specifying a base collection, and letting the open sets be defined as the unions and finite intersections of the base sets. In the case of $\mathbb{R}$, for example, the open intervals form a base.


6.3 Theorem A collection $\mathcal{V}$ is a base for a topology $\tau$ on $X$ iff
(a) $\bigcup_{V \in \mathcal{V}} V = X$.
(b) $\forall\ B_1,B_2 \in \mathcal{V}$ and $x \in B_1 \cap B_2$, $\exists\ B_3 \in \mathcal{V}$ such that $x \in B_3 \subset B_1 \cap B_2$.

Proof Necessity of these conditions follows from the definitions of base and open set. For sufficiency, define a collection $\tau$ in terms of the base $\mathcal{V}$, as follows:
$$O \in \tau \text{ iff, for each } x \in O,\ \exists\ B \in \mathcal{V} \text{ such that } x \in B \subset O. \tag{6.1}$$
$\emptyset$ satisfies the condition in (6.1), and $X$ satisfies it given condition (a) of the theorem. If $\mathcal{C}$ is a collection of $\tau$-sets, $\bigcup_{O \in \mathcal{C}} O \in \tau$ since (6.1) holds in this case in respect of a base set $B$ corresponding to any set in $\mathcal{C}$ which contains $x$. And if $O_1,O_2 \in \tau$, and $x \in O_1 \cap O_2$, then, letting $B_1$ and $B_2$ be the base sets specified in (6.1) in respect of $x$ and $O_1$ and $O_2$ respectively, condition (b) implies that $x \in B_3 \subset O_1 \cap O_2$, which shows that $\tau$ is closed under finite intersections. Hence, $\tau$ is a topology for $X$. ∎
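On a finite set, the two conditions of 6.3 and the topology built from a base can be computed directly. The following sketch is illustrative only (the function names are hypothetical): `is_base` tests conditions (a) and (b), and `generated_topology` collects all unions of base sets, which by the argument of the proof is the topology defined by (6.1).

```python
from itertools import combinations


def is_base(X, base):
    """Conditions (a) and (b) of 6.3 for a family of subsets of a finite set X."""
    X = frozenset(X)
    base = [frozenset(B) for B in base]
    if frozenset().union(*base) != X:
        return False                                             # condition (a)
    for B1 in base:
        for B2 in base:
            for x in B1 & B2:
                if not any(x in B3 and B3 <= B1 & B2 for B3 in base):
                    return False                                 # condition (b)
    return True


def generated_topology(base):
    """All unions of base sets, including the empty union."""
    base = [frozenset(B) for B in base]
    return {
        frozenset().union(*sub)
        for r in range(len(base) + 1)
        for sub in combinations(base, r)
    }
```

With the base {1}, {2}, {1, 2, 3} on X = {1, 2, 3}, the generated topology has five members: the empty set, {1}, {2}, {1, 2} and X.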

The concept of base sets allows us to generalize two further notions familiar from metric spaces. The closure points (accumulation points) of a set $A$ in a topological space $(X,\tau)$ are the points $x \in X$ such that every set in the base of $x$ contains a point of $A$ (a point of $A$ other than $x$). An important exercise is to show that $x$ is a closure point of $A$ if and only if $x$ is in the closure of $A$.

We have generalizations of two other familiar concepts. A sequence $\{x_n\}$ of points in a topological space is said to converge to $x$ if, for every open set $O$ containing $x$, $\exists\ N \ge 1$ such that $x_n \in O$ for all $n \ge N$. And $x$ is called a cluster point of $\{x_n\}$ if, for every open $O$ containing $x$ and every $N \ge 1$, $x_n \in O$ for some $n \ge N$. In general topological spaces the notion of a convergent sequence is inadequate for characterizing basic properties such as the continuity of mappings, and is augmented by the concepts of net and filter. Because we deal mainly with metric spaces, we do not require these extensions (see e.g. Willard 1970: Ch. 4).

6.2 Countability and Compactness

The countability axioms provide one classification of topological spaces according, roughly speaking, to their degree of structure and amenability to the methods of analysis. A topological space is said to satisfy the first axiom of countability (to be first-countable) if every point of the space has a countable base. It satisfies the second axiom of countability (is second-countable) if the space as a whole has a countable base. Every metric space is first-countable in view of the existence of the countable base composed of open spheres, $S(x,1/n)$ for each $x$. More generally, sequences in first-countable spaces tend to behave in a similar manner to those in metric spaces, as the following theorem illustrates.

6.4 Theorem In a first-countable space, $x$ is a cluster point of a sequence $\{x_n, n \in \mathbb{N}\}$ iff there is a subsequence $\{x_{n_k}, k \in \mathbb{N}\}$ converging to $x$.

Proof Sufficiency is immediate. For necessity, the definition of a cluster point implies that $\exists\ n \ge N$ such that $x_n \in O$, for every open $O$ containing $x$ and every $N \ge 1$. Let the countable base of $x$ be the collection $\mathcal{V}_x = \{B_i, i \in \mathbb{N}\}$, and choose a monotone sequence of base sets $\{A_k, k \in \mathbb{N}\}$ containing $x$ (and hence nonempty) with $A_1 = B_1$, and $A_k \subset A_{k-1} \cap B_k$ for $k = 2,3,\ldots$; this is always possible by 6.3. Since $x$ is a cluster point, we may construct an infinite subsequence by taking $x_{n_k}$ as the next member of the sequence contained in $A_k$, for $k = 1,2,\ldots$ For every open set $O$ containing $x$, $\exists\ N \ge 1$ such that $x_{n_k} \in A_k \subseteq O$, for all $k \ge N$, and hence $x_{n_k} \to x$ as $k \to \infty$, as required. ∎

The point of quoting a result such as this has less to do with demonstrating a new property than with reminding us of the need for caution in assuming properties we take for granted in metric spaces. While the intuition derived from $\mathbb{R}$-like situations might lead us to suppose that the existence of a cluster point and a convergent subsequence amount to the same thing, this need not be true unless we can establish first-countability.

A topological space is said to be separable if it contains a countable dense subset. Second-countable spaces are separable. This fact follows directly on taking a point from each set in a countable base, and verifying that these points are dense in the space. The converse is not generally true, but it is true for metric spaces, where separability, second countability and the Lindelöf property (that every open cover of $X$ has a countable subcover) are all equivalent to one another. This is just what we showed in 5.6. More generally, we can say the following.

6.5 Theorem A second-countable space is both separable and Lindelöf.

Proof The proof of separability is in the text above. To prove the Lindelöf property, let $\mathcal{C}$ be an open cover of $X$, such that $\bigcup_{A \in \mathcal{C}} A = X$. For each $A \in \mathcal{C}$ and $x \in A$, we can find a base set $B_i$ such that $x \in B_i \subset A$. Since $\bigcup_{i=1}^\infty B_i = X$, we may choose a countable subcollection $A_i$, $i = 1,2,\ldots$ such that $B_i \subset A_i$ for each $i$, and hence $\bigcup_{i=1}^\infty A_i = X$. ∎

A topological space is said to be compact if every covering of the space by open sets has a finite subcover. It is said to be countably compact if each countable covering has a finite subcovering. And it is said to be sequentially compact if each sequence on the space has a convergent subsequence. Sometimes, compactness is more conveniently characterized in terms of the complements. The complements of an open cover of $X$ are a collection of closed sets whose intersection is empty; if and only if $X$ is compact, every such collection must have a finite subcollection with empty intersection. An equivalent way to state this proposition is in terms of the converse implication. A collection of closed sets is said to have the finite intersection property if no finite subcollection has an empty intersection. Thus:


6.6 Theorem $X$ is compact (countably compact) if and only if no collection (countable collection) of closed sets having the finite intersection property has an empty intersection. □

The following pair of theorems summarize important relationships between the different varieties of compactness.

6.7 Theorem A first-countable space $X$ is countably compact iff it is sequentially compact.

Proof Let the space be countably compact. Let $\{x_n, n \in \mathbb{N}\}$ be a sequence in $X$, and define the sets $B_n = \{x_n,x_{n+1},\ldots\}$, $n = 1,2,\ldots$ The collection of closed sets $\{\bar{B}_n, n \in \mathbb{N}\}$ clearly possesses the finite intersection property, and hence $\bigcap_n \bar{B}_n$ is nonempty by 6.6, which is another way of saying that $\{x_n\}$ has a cluster point. Since the sequence is arbitrary, sequential compactness follows by 6.4. This proves necessity.

For sufficiency, 6.4 implies that under sequential compactness, all sequences in $X$ have a cluster point. Let $\{C_i, i \in \mathbb{N}\}$ be a countable collection of closed sets having the finite intersection property such that $A_n = \bigcap_{i=1}^n C_i \ne \emptyset$, for every finite $n$. Consider a sequence $\{x_n\}$ chosen such that $x_n \in A_n$, and note since $\{A_n\}$ is monotone that $x_n \in A_m$ for all $n \ge m$; or in other words, $A_m$ contains the sequence $\{x_n, n \ge m\}$. Since $\{x_n\}$ has a cluster point $x$ and $A_m$ is closed, $x \in A_m$. This is true for every $m \in \mathbb{N}$, so that $\bigcap_{i=1}^\infty C_i$ is nonempty, and $X$ is countably compact by 6.6. ∎

6.8 Theorem A metric space $(S,d)$ is countably compact iff it is compact.

Proof Sufficiency is immediate. For necessity, we show first that if $S$ is countably compact, it is separable. A metric space is first-countable, hence countable compactness implies sequential compactness (6.7), which in turn implies that every sequence in $S$ has a cluster point (6.4). This must mean that for any $\varepsilon > 0$ there exists a finite $\varepsilon$-net $\{x_1,\ldots,x_m\}$ such that, for all $x \in S$, $d(x,x_k) < \varepsilon$, for some $k \in \{1,\ldots,m\}$; for otherwise, we can construct an infinite sequence $\{x_n\}$ with $d(x_n,x_{n'}) \ge \varepsilon$ for $n \ne n'$, contradicting the existence of a cluster point. Thus, for each $n \in \mathbb{N}$ there is a finite collection of points $A_n$ such that, for every $x \in S$, $d(x,y) < 2^{-n}$ for some $y \in A_n$. The set $D = \bigcup_{n=1}^\infty A_n$ is countable and dense in $S$, and $S$ is separable.

Separability in a metric space is equivalent by 5.6 to the Lindelöf property, that every open cover of $S$ has a countable subcover; but countable compactness implies that this countable subcover has a finite subcover in its turn, so that compactness is proved. ∎

Like separability and compactness, the notion of a continuous mapping may be defined in terms of a distance measure, but is really topological in character. In a pair of topological spaces $X$ and $Y$, the mapping $f: X \mapsto Y$ is said to be continuous if $f^{-1}(B)$ is open in $X$ when $B$ is open in $Y$, and closed in $X$ when $B$ is closed in $Y$. That in metric spaces this definition is equivalent to the more familiar one in terms of $\varepsilon$- and $\delta$-neighbourhoods follows from 5.19. The concepts of homeomorphism and embedding, as mappings that are respectively onto or into, and 1-1 continuous with continuous inverse, remain well defined. The following theorem gives two important properties of continuous maps.

6.9 Theorem Suppose there exists a continuous mapping $f$ from a topological space $X$ onto another space $Y$.
(i) If $X$ is separable, $Y$ is separable.
(ii) If $X$ is compact, $Y$ is compact.

Proof (i) The problem is to exhibit a countable, dense subset of $Y$. Consider $f(D)$ where $D$ is dense in $X$. If $\overline{f(D)}$ is the closure of $f(D)$, the inverse image $f^{-1}(\overline{f(D)})$ is closed by continuity of $f$, and contains $f^{-1}(f(D))$, and hence also contains $D$ by 1.2(iv). Since $\bar{D}$ is the smallest closed set containing $D$, and $X \subseteq \bar{D}$, it follows that $X \subseteq f^{-1}(\overline{f(D)})$. But since the mapping is onto, $Y = f(X) \subseteq f(f^{-1}(\overline{f(D)})) \subseteq \overline{f(D)}$, where the last inclusion is by 1.2(v). $f(D)$ is therefore dense in $Y$ as required. $f(D)$ is countable if $D$ is countable, and the conclusion follows directly.

(ii) Let $\mathcal{C}$ be an open cover of $Y$. Then $\{f^{-1}(B): B \in \mathcal{C}\}$ must be an open cover of $X$ by the definition. The compactness of $X$ means that it contains a finite subcover, say $f^{-1}(B_1),\ldots,f^{-1}(B_n)$ such that
$$Y = f(X) = f\Big(\bigcup_{j=1}^n f^{-1}(B_j)\Big) = \bigcup_{j=1}^n f(f^{-1}(B_j)) \subseteq \bigcup_{j=1}^n B_j, \tag{6.2}$$
where the third equality uses 1.2(ii) and the inclusion, 1.2(v). Hence $\mathcal{C}$ contains a finite subcover. ∎

Note the importance of the stipulation 'onto' in both these results. The extension of (ii) to the case of compact subsets of $X$ and $Y$ is obvious, and can be supplied by the reader.

Completeness, unlike separability, compactness, and continuity, is not a topological property. To define a Cauchy sequence it is necessary to have the concept of a distance between points. One of the advantages of defining a metric on a space is that the relatively weak notion of completeness provides some of the essential features of compactness in a wider class than the compact spaces.

6.3 Separation Properties

Another classification of topological spaces is provided by the separation axioms, which in one sense are more primitive than the countability axioms. They are indicators of the richness of a topology, in the sense of our ability to distinguish between different points of the space. From one point of view, they could be said to define a hierarchy of resemblances between topological spaces and metric spaces. Don't confuse separation with separability, which is a different concept altogether. A topological space $X$ is said to be:


- a $T_1$-space, iff $\forall\ x,y \in X$ with $x \ne y$ $\exists$ an open set containing $x$ but not $y$ and also an open set containing $y$ but not $x$;

- a Hausdorff (or $T_2$-) space, iff $\forall\ x,y \in X$ with $x \ne y$ $\exists$ disjoint open sets $O_1$ and $O_2$ in $X$ with $x \in O_1$ and $y \in O_2$;

- a regular space iff for each closed set $C$ and $x \notin C$ $\exists$ disjoint open sets $O_1$ and $O_2$ with $x \in O_1$ and $C \subset O_2$;

- a normal space iff, given disjoint closed sets $C_1$ and $C_2$, $\exists$ disjoint open sets $O_1$ and $O_2$ such that $C_1 \subset O_1$ and $C_2 \subset O_2$.

A regular $T_1$-space is called a $T_3$-space, and a normal $T_1$-space is called a $T_4$-space.
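The first two definitions are easy to test on finite spaces. The sketch below is illustrative (the function names are hypothetical, and the examples are standard ones rather than the author's): it checks the $T_1$ and Hausdorff properties for a topology given as a list of open sets, and sets up the two-point Sierpinski space and the two-point discrete space.

```python
def is_t1(X, tau):
    """T1: for each ordered pair x != y, some open set contains x but not y."""
    return all(
        any(x in O and y not in O for O in tau)
        for x in X for y in X if x != y
    )


def is_hausdorff(X, tau):
    """T2: distinct points have disjoint open neighbourhoods."""
    return all(
        any(x in O1 and y in O2 and not (O1 & O2) for O1 in tau for O2 in tau)
        for x in X for y in X if x != y
    )


X = {0, 1}
sierpinski = [frozenset(), frozenset({0}), frozenset({0, 1})]
discrete = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]
```

In the Sierpinski space every open set containing 1 also contains 0, so the space is not $T_1$; the discrete space separates its two points by the singletons.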

In a $T_1$-space, the singleton sets $\{x\}$ are always closed. In this case $y \in \{x\}^c$ whenever $y \ne x$, where $\{x\}^c$ is the complement of a closed set, and hence open. Conversely, if the $T_1$ property holds, every $y \ne x$ is contained in an open set not containing $x$, and the union of all these sets, also open by 6.1(b), is $\{x\}^c$. It is easy to see that $T_4$ implies $T_3$ implies $T_2$ implies $T_1$, although the reverse implications do not hold, and without the $T_1$ property, normality need not imply regularity. Metric spaces are always $T_4$, for there is no difficulty in constructing the sets specified in the definition out of unions of $\varepsilon$-spheres.

We have the following important links between separation, compactness, countability, and metrizability. The first two results are of general interest but will not be exploited directly in this book, so we forgo the proofs. The proof of 6.12 needs some as yet undefined concepts, and is postponed to §6.6 below.

6.10 Theorem A regular Lindelöf space is normal. □

6.11 Theorem A compact Hausdorff space is $T_4$. □

6.12 Urysohn's metrization theorem A second-countable $T_4$-space is metrizable. □

In fact, the conditions of the last theorem can be weakened, with $T_4$ replaced by $T_3$ in view of 6.10, since we have already shown that a second-countable space is Lindelöf (6.5).

The properties of functions from $X$ to the real line play an important role in defining the separation properties of a space. The key to these results is the remarkable Urysohn's lemma.

6.13 Urysohn's lemma A topological space $X$ is normal iff for any pair $A$ and $B$ of disjoint closed subsets there exists a continuous function $f: X \mapsto [0,1]$ such that $f(A) = 0$ and $f(B) = 1$. □

The function $f$ is called a separating function.


Proof This is by construction of the required function. Recall that the dyadic rationals $D$ are dense in $[0,1]$. We demonstrate the existence of a system of open sets $\{U_r, r \in D\}$ with the properties
$$A \subset U_r; \tag{6.3}$$
$$\bar{U}_r \cap B = \emptyset; \tag{6.4}$$
$$\bar{U}_s \subset U_r \text{ for } r > s. \tag{6.5}$$
Normality implies the existence of an open set $U_{1/2}$ (say) such that $U_{1/2}$ contains $A$ and $(\bar{U}_{1/2})^c$ contains $B$. The same story can be told with $(U_{1/2})^c$ replacing $B$ in the role of $C_2$ to define $U_{1/4}$, and then again with $\bar{U}_{1/2}$ replacing $A$ in the role of $C_1$ to define $U_{3/4}$. The argument extends by induction to generate sets $\{U_{m/2^n}, m = 1,\ldots,2^n - 1\}$ for any $n \in \mathbb{N}$, and the collection $\{U_r, r \in D\}$ is obtained on letting $n \to \infty$. It is easy to verify conditions (6.3)-(6.5) for this collection. Fig. 6.1 illustrates the construction for $n = 3$ when $A$ and $B$ are regions of the plane. One must imagine countably many more 'layers of the onion' in the limiting case.

[Fig. 6.1: nested open sets $U_{1/8}, U_{1/4}, U_{3/8}, U_{1/2}, U_{5/8}, U_{3/4}$ surrounding A, with B lying outside them]

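Now define $f: X \to [0,1]$ by

$$f(x) = \begin{cases} \inf\{r \in D: x \in U_r\}, & x \in \bigcup_{r \in D} U_r \\ 1, & x \notin \bigcup_{r \in D} U_r \end{cases} \tag{6.6}$$

where, in particular, $f(x) = 1$ for $x \in B$. Because of the monotone property (6.5), we have for any $\alpha \in (0,1]$

$$\{x: f(x) < \alpha\} = \{x: \inf\{r \in D: x \in U_r\} < \alpha\} = \{x: \exists\, r < \alpha \text{ such that } x \in U_r\} = \bigcup_{r < \alpha} U_r, \tag{6.7}$$

which is open. On the other hand, because D is dense in $[0,1]$ we can deduce that,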

for any $\beta \in [0,1)$,

$$\{x: f(x) \le \beta\} = \{x: \inf\{r \in D: x \in U_r\} \le \beta\} = \{x: x \in U_r\ \forall\, r > \beta\} = \bigcap_{r > \beta} U_r = \bigcap_{r > \beta} \bar{U}_r, \tag{6.8}$$

which is closed. Here, the final equality must hold to reconcile the following two facts: first that $U_r \subseteq \bar{U}_r$, and second that, for all $r > \beta$, there exists (since D is dense) $s \in D$ with $r > s > \beta$ and $\bar{U}_s \subseteq U_r$ by (6.5). We have therefore shown that, for $0 \le \beta < \alpha \le 1$,

$$\{x: \beta < f(x) < \alpha\} = \{x: f(x) < \alpha\} \cap \{x: f(x) \le \beta\}^c \tag{6.9}$$

is open, being the intersection of open sets. Since every open set of $[0,1]$ is a union of open intervals (see 2.5), it follows that $f^{-1}(A)$ is open in X whenever A is open in $[0,1]$, and accordingly f is continuous. It is immediate that $f(A) = 0$ and $f(B) = 1$ as required, and necessity is proved.

Sufficiency is simply a matter, given the existence of f with the indicated properties, of citing the two sets $f^{-1}([0,\tfrac{1}{2}))$ and $f^{-1}((\tfrac{1}{2},1])$, which are open in X, disjoint, and contain A and B respectively, so that X is normal. ∎

It is delightful the way this theorem conjures a continuous function out of thin air! It shows that the properties of real-valued functions provide a legitimate means of classifying the separation properties of the space.

In metric spaces, separating functions are obtained by a simple direct construction. If A and B are closed and disjoint subsets of a metric space $(S,d)$, the normality property implies the existence of $\delta > 0$ such that $\inf_{x \in A, y \in B} d(x,y) \ge \delta$. The required function is

$$f(x) = \frac{d(x,A)}{d(x,A) + d(x,B)} \tag{6.10}$$

where $d(x,A) = \inf_{y \in A} d(x,y)$, and $d(x,B)$ is defined similarly. The continuity of f follows since $d(x,A)$ and $d(x,B)$ are continuous in x, and the denominator in (6.10) is bounded below by $\delta$. A similar construction was used in the proof of 5.25.
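As a rough numerical illustration of (6.10), the following Python sketch takes A and B to be finite sets of reals (hence closed and disjoint) with $d(x,y) = |x - y|$. The function names and the example sets are assumptions made for this illustration only.

```python
def dist_to_set(x, points):
    """d(x, A) = inf over y in A of |x - y|; a min suffices for a finite set."""
    return min(abs(x - p) for p in points)


def separating_function(x, A, B):
    """Evaluate f(x) = d(x, A) / (d(x, A) + d(x, B)) as in (6.10)."""
    d_a = dist_to_set(x, A)
    d_b = dist_to_set(x, B)
    # The denominator is positive because A and B are closed and disjoint.
    return d_a / (d_a + d_b)


A = [0.0, 1.0]  # assumed example set
B = [3.0, 4.0]  # assumed example set, disjoint from A
values = [separating_function(x, A, B) for x in (0.0, 2.0, 3.0)]
```

Here f vanishes on A, equals 1 on B, and takes intermediate values in between, as the construction requires.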

The regularity property can be strengthened by requiring the existence of separating functions for closed sets C and points x. A topological space X is said to be completely regular if, for all closed $C \subset X$ and points $x \notin C$, $\exists$ a continuous function $f: X \to [0,1]$ with $f(C) = 0$ and $f(x) = 1$. A completely regular $T_1$-space is called a Tychonoff space or $T_{3\frac{1}{2}}$-space. As the tongue-in-cheek terminology suggests, a $T_4$-space is $T_{3\frac{1}{2}}$ (this is immediate from Urysohn's lemma) and a $T_{3\frac{1}{2}}$-space is clearly $T_3$, although the reverse implications do not hold. Being $T_4$, metric spaces are always $T_{3\frac{1}{2}}$.


6.4 Weak Topologies

Now let's go the other way and, instead of using a topology to define a class of real functions, use a class of functions to define a topology. Let X be a space and F a class of functions $f: X \to Y_f$, where the codomains $Y_f$ are topological spaces. The weak topology induced by F on X is the weakest topology under which every $f \in F$ is continuous. Recall, continuity means that $f^{-1}(A)$ is open in X whenever A is open in $Y_f$. We can also call the weak topology the topology generated by the base of sets $\mathcal{V}$ consisting of the inverse images of the open sets of the $Y_f$ under $f \in F$, together with the finite intersections of these sets. The inverse images themselves are called a sub-base for the topology, meaning that the sets of the topology can be generated from them by operations of union and finite intersection.

If we enlarge F we (potentially) increase the number of sets in this base and get a stronger topology, and if we contract F we likewise get a weaker topology. With a given F, any topology stronger than the weak topology contains a richer collection of open sets, so the elements of F must retain their continuity in this case, but weakening the topology further must by definition force some $f \in F$ to be discontinuous.

The class of cases in which $Y_f = \mathbb{R}$ for each f suggests using the concept of a weak topology to investigate the structure of a space. One way to represent the richness of a given topology $\tau$ on X is to ask whether $\tau$ contains, or is contained in, the weak topology generated by a particular collection of bounded real-valued functions on X. For example, complete regularity is the minimal condition which makes the sort of construction in 6.13 feasible. According to the next result, this is sufficient to allow the topology to be completely characterized in terms of bounded, continuous real-valued functions on the space.

6.14 Theorem If a topological space $(X,\tau)$ is completely regular, the topology $\tau$ is the weak topology induced by the set F of the separating functions.

Proof Let $\mathcal{U}$ denote the collection of inverse images of open sets under the functions of F, and let $\mathcal{T}$ denote the weak topology induced by F, such that the $\mathcal{U}$-sets, together with their finite intersections, form a base for $\mathcal{T}$. We show that $\mathcal{T} = \tau$.

For any $x \in X$, let $O \in \tau$ be an open set containing x. Then $O^c$ is closed, and by complete regularity there exists $f \in F$ taking values in $[0,1]$ with $f(x) = 1$ and $f(O^c) = 0$. The set $(\tfrac{1}{2},1]$ is open in $[0,1]$, and $B = f^{-1}((\tfrac{1}{2},1])$ is therefore an open set, containing x and disjoint with $O^c$ so that $B \subseteq O$. Since this holds for every such O, x has a base $\mathcal{V}_x$ consisting of inverse images of open sets under functions from F. Since x is arbitrary the collection $\mathcal{V} = \{\mathcal{V}_x, x \in X\}$ forms a base for $\tau$. It follows that $\tau \subseteq \mathcal{T}$.

On the other hand, $\mathcal{T}$ is by definition the weakest topology under which every $f \in F$ is continuous. Since $f \in F$ is a separating function and continuous under $\tau$, it also follows that $\mathcal{T} \subseteq \tau$. ∎


6.5 The Topology of Product Spaces

Let X and Y be a pair of topological spaces, and consider the product space $X \times Y$. The plane $\mathbb{R} \times \mathbb{R}$ and subsets thereof are the natural examples for appreciating the properties of product spaces, although it is a useful exercise to think up more exotic cases. An example always given in textbooks of topology is $C \times C$ where C is the unit circle; this space has the topology of the torus (doughnut).

Let the ordered pair $(x,y)$ be the generic element of $X \times Y$. The coordinate projections are the mappings $\pi_X: X \times Y \to X$ and $\pi_Y: X \times Y \to Y$, defined by

$$\pi_X(x,y) = x \tag{6.11}$$

$$\pi_Y(x,y) = y. \tag{6.12}$$

If X and Y are topological spaces, the coordinate projections can be used to generate a new topology on the product space. The product topology on $X \times Y$ is the weak topology induced by the coordinate projections.

The underlying idea here is very simple. If $A \subseteq X$ and $B \subseteq Y$ are open sets, the set $A \times B = (A \times Y) \cap (X \times B)$, where $A \times Y = \pi_X^{-1}(A)$ and $X \times B = \pi_Y^{-1}(B)$, will be regarded as open in $X \times Y$, and is called an open rectangle of $X \times Y$. The product topology on $X \times Y$ is the one having the open rectangles as a base. This means that two points $(x_1,y_1)$ and $(x_2,y_2)$ are close in $X \times Y$ provided $x_1$ is close to $x_2$ in X, and $y_1$ to $y_2$ in Y. Equivalently, it is the weakest topology under which the coordinate projections are continuous.

If the factors are metric spaces $(X,d_X)$ and $(Y,d_Y)$, several metrics can be constructed to induce the product topology on $X \times Y$, including

$$\rho((x_1,y_1),(x_2,y_2)) = \max\{d_X(x_1,x_2),\, d_Y(y_1,y_2)\} \tag{6.13}$$

and

$$\rho'((x_1,y_1),(x_2,y_2)) = d_X(x_1,x_2) + d_Y(y_1,y_2). \tag{6.14}$$

An open sphere in the space $(X \times Y, \rho)$, where $\rho$ is the metric in (6.13), also happens to be an open rectangle, for

$$S_\rho((x,y),\varepsilon) = S_{d_X}(x,\varepsilon) \times S_{d_Y}(y,\varepsilon), \tag{6.15}$$

but of course, this is not true for every metric.
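The contrast between (6.13) and (6.14) can be checked numerically. The sketch below assumes both factors carry the absolute-value metric and tests points on a small integer grid; all names are hypothetical and the grid is only a finite stand-in for the plane.

```python
def abs_metric(a, b):
    """Assumed factor metric: the usual distance on the real line."""
    return abs(a - b)


def rho_max(p, q):
    """Maximum metric of (6.13) on the product of two copies of the line."""
    return max(abs_metric(p[0], q[0]), abs_metric(p[1], q[1]))


def rho_sum(p, q):
    """Sum metric of (6.14)."""
    return abs_metric(p[0], q[0]) + abs_metric(p[1], q[1])


def in_rectangle(p, centre, eps):
    """Membership of the open rectangle S(x, eps) x S(y, eps) from (6.15)."""
    return abs_metric(p[0], centre[0]) < eps and abs_metric(p[1], centre[1]) < eps


grid = [(i, j) for i in range(-5, 6) for j in range(-5, 6)]
centre, eps = (0, 0), 3
# Under rho_max the open sphere coincides with the open rectangle on the grid.
max_sphere_is_rectangle = all(
    (rho_max(p, centre) < eps) == in_rectangle(p, centre, eps) for p in grid
)
# Under rho_sum it does not: (2, 2) lies in the rectangle but not in the sphere.
sum_sphere_is_rectangle = all(
    (rho_sum(p, centre) < eps) == in_rectangle(p, centre, eps) for p in grid
)
```

Both metrics induce the same topology, but only the sphere of the maximum metric has the rectangular shape.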

Since either X or Y may be a product space, the generalization of these results from two to any finite number of factors is straightforward. The generic element of the space $\times_{i=1}^{n} X_i$ is the n-tuple $(x_1,\dots,x_n: x_i \in X_i)$, and so on. But to deal with infinite collections of factor spaces, as we shall wish to do below, it is necessary to approach the product from a slightly different viewpoint. Let A denote an arbitrary index set, and $\{X_\alpha, \alpha \in A\}$ a collection of spaces indexed by A. The Cartesian product $X^A = \times_{\alpha \in A} X_\alpha$ is the collection of all the mappings $x: A \to \bigcup_{\alpha \in A} X_\alpha$ such that $x(\alpha) \in X_\alpha$ for each $\alpha \in A$. This definition contains that given in §1.1 as a special case, but is fundamentally more general in character. The coordinate projections are the mappings $\pi_\alpha: X^A \to X_\alpha$ with

$$\pi_\alpha(x) = x(\alpha), \tag{6.16}$$

but can also be defined as the images under x of the points $\alpha \in A$.

Thus, a point in the product space is a mapping, the one which generates the coordinate projections when it is evaluated at points of the domain A. In the case of a finite product, A can be the integers $1,\dots,n$. In a countable case $A = \mathbb{N}$, or a set equipotent with $\mathbb{N}$, and we should call x an infinite sequence, an element of $X^\infty$ (say). A familiar uncountable example is provided by a class of real-valued functions $x: \mathbb{R} \to \mathbb{R}$, so that $A = \mathbb{R}$. In this case, x associates each point $\alpha \in \mathbb{R}$ with a real number $x(\alpha)$, and defines an element of the product $\mathbb{R}^{\mathbb{R}}$.
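The idea that a point of a product space is a mapping on the index set can be mimicked with a dictionary. This is only a toy sketch with a three-element index set; the index labels and coordinate values are invented for illustration.

```python
def projection(alpha):
    """Return the coordinate projection pi_alpha, which sends a point x to x(alpha)."""
    return lambda x: x[alpha]


# A point of the product of X_a, X_b, X_c, stored as a mapping on the index set.
x = {"a": 1.5, "b": "red", "c": (0, 1)}
pi_b = projection("b")
# Evaluating every projection at x recovers the mapping x itself, as in (6.16).
coords = {alpha: projection(alpha)(x) for alpha in x}
```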

The product topology is now generalized as follows. Let $\{X_\alpha, \alpha \in A\}$ be an arbitrary collection of topological spaces. The Tychonoff topology (product topology) on the space $X^A$ has as base the finite-dimensional open rectangles, sets of the form $\times_{\alpha \in A} O_\alpha$, where the $O_\alpha \subseteq X_\alpha$ are open sets and $O_\alpha = X_\alpha$ except for at most a finite number of coordinates. These basic sets can be written as the intersections of finite collections of cylinders, say

$$B = \pi_{\alpha_1}^{-1}(O_{\alpha_1}) \cap \dots \cap \pi_{\alpha_m}^{-1}(O_{\alpha_m}), \tag{6.17}$$

for indices $\alpha_1,\dots,\alpha_m \in A$.

Let $\tau$ be a topology under which the coordinate projections are continuous. If $O_\alpha$ is open in $X_\alpha$, $\pi_\alpha^{-1}(O_\alpha) \in \tau$, and hence $\tau$ contains the Tychonoff topology. Since this is true for any such $\tau$, we can characterize the Tychonoff topology as the weak topology generated by the coordinate projections. The sets $\pi_\alpha^{-1}(O_\alpha)$ form the sub-base for the topology, whose finite intersections yield the base sets.

Something to keep in mind in these infinite product spaces is that, if any of the sets $X_\alpha$ are empty, $X^A$ is empty. Some of our results are true only for non-empty spaces, so for full rigour the stipulation that elements exist is desirable.

6.15 Example The space $(C,d_U)$ examined in §5.5 is an uncountable product space having the Tychonoff topology; the uniform metric is the generalization of the maximum metric $\rho$ of (6.13). Continuous functions are regarded as close to one another under $d_U$ only if they are close at every point of the domain. The subsequent usefulness of this characterization of $(C,d_U)$ stems mainly from the fact that the coordinate projections are known to be continuous. □

The two essential theorems on product spaces extend separability and compactness from the factor spaces to the product. The following theorem has a generalization to uncountable products, which we shall not pursue since this is harder to prove, and the countable case is sufficient for our purposes.

6.16 Theorem Finite or countable product spaces are separable under the product topology iff the factor spaces are separable.

Proof The proof for finite products is an easy implication of the countable case, hence consider $X^\infty = \times_{i=1}^{\infty} X_i$. Let $D_i = \{d_{i1}, d_{i2}, \dots\} \subseteq X_i$ be a countable dense set for each i, and construct a set $D \subseteq X^\infty$ by defining

$$F_m = \Big(\times_{i=1}^{m} D_i\Big) \times \Big(\times_{i=m+1}^{\infty} \{d_{i1}\}\Big) \tag{6.18}$$

for $m = 1,2,\dots$, and then letting $D = \bigcup_{m=1}^{\infty} F_m$. $F_m$ is equipotent with the set of m-tuples formed from the elements of the countable $D_1,\dots,D_m$, and is countable by induction from 1.4. Hence D is countable, as a countable union of countable sets.

We will show that D is dense in $X^\infty$. Let $B = \times_{i=1}^{\infty} O_i$ be a non-empty basic set, with $O_i$ open in $X_i$ and $O_i = X_i$ except for a finite number of coordinates. Choose m such that $O_i = X_i$ for $i > m$, and then

$$B \cap F_m = \Big(\times_{i=1}^{m} (O_i \cap D_i)\Big) \times \Big(\times_{i=m+1}^{\infty} \{d_{i1}\}\Big) \ne \emptyset, \tag{6.19}$$

recalling that the dense property implies $O_i \cap D_i \ne \emptyset$, for $i = 1,\dots,m$. Since $B \cap F_m \subseteq B \cap D$, it follows that B contains a point of D; and since B is an arbitrary basic set, D is dense in $X^\infty$ as required. ∎

One of the most powerful and important results in topology is Tychonoff's theorem, which states that arbitrary products of compact topological spaces are also compact, under the product topology. It will suffice here to prove the result for countable products of metric spaces, and this case can be dealt with using a more elementary and familiar line of argument. It is not necessary to specify the metrics involved, for we need the spaces to be metric solely to exploit the equivalence of compactness and sequential compactness.

6.17 Theorem A finite or countable product of separable metric spaces $(X_i,d_i)$ is compact under the product topology iff the factor spaces are compact.

Proof As before, the finite case follows easily from the countable case, so assume $X^\infty = \times_{i=1}^{\infty} X_i$, where the $X_i$ are separable spaces. In a metric space, which is first countable, compactness implies separability and is equivalent to sequential compactness by 6.8 and 6.7. Since $X_i$ is sequentially compact and first-countable, every sequence $\{x_{in}, n \in \mathbb{N}\}$ on $X_i$ has a cluster point $x_i$ on the space (6.4). Applying the diagonal argument of 2.36, there exists a single subsequence of integers, $\{n_k, k \in \mathbb{N}\}$, such that $x_{in_k} \to x_i$, for every i. Consider the subsequence in $X^\infty$, $\{x_{n_k}, k \in \mathbb{N}\}$ where $x_{n_k} = (x_{1n_k}, x_{2n_k}, \dots)$. In the product topology, $x_{n_k} \to x = (x_1, x_2, \dots)$ iff $x_{in_k} \to x_i$ for every i, which proves that $X^\infty$ is sequentially compact. $X^\infty$ can be endowed with the metric $\rho_\infty = \sum_{i=1}^{\infty} d_i/2^i$, which induces the product topology. $X^\infty$ is separable by 6.16, and sequential compactness is equivalent to compactness by 6.8 and 6.7, as above. This proves sufficiency.

Necessity follows from 6.9(ii), by continuity of the projections as before. ∎

6.18 Example The space $\mathbb{R}^\infty$ (see 5.14) is endowed with the Tychonoff topology if we take as the base sets of a point x the collection

$$N(x;k,\varepsilon) = \{y: |x_i - y_i| < \varepsilon,\ i = 1,\dots,k\}, \quad k \in \mathbb{N},\ \varepsilon > 0. \tag{6.20}$$

A point in $\mathbb{R}^\infty$ is close to x in this topology if many of its coordinates are close to those of x; another point is closer if either more coordinates are within $\varepsilon$ of each other, or the same coordinates are closer than $\varepsilon$, or both. The metric $d_\infty$ defined in (5.15) induces the topology of (6.20). If $\{x_n\}$ is a sequence in $\mathbb{R}^\infty$, $d_\infty(x_n,x) \to 0$ iff $\forall\, \varepsilon,k\ \exists\, N \ge 1$ such that $x_n \in N(x;k,\varepsilon)$ for all $n \ge N$. We already know that $\mathbb{R}^\infty$ is separable under $d_\infty$ (5.15), but now we can deduce this as a purely topological property, since $\mathbb{R}^\infty$ inherits separability from $\mathbb{R}$ by 6.16. □

The infinite cube $[0,1]^\infty$ shares the topology (6.20) with $\mathbb{R}^\infty$ and is a compact space by 6.17; to show this we can assign the Euclidean metric to the factor spaces $[0,1]$. The trick of metrizing a space to establish a topological property is frequently useful, and is one we shall exploit again below.

6.6 Embedding and Metrization

Let X be a topological space, and F a class of functions $f: X \to Y_f$. The evaluation map $e: X \to \times_{f \in F} Y_f$ is the mapping defined by

$$e(x)_f = f(x). \tag{6.21}$$

The class F may be quite general, but if it were finite we would think of $e(x)$ as the vector whose elements are the $f(x)$, $f \in F$. (6.21) could also be written $\pi_f \circ e = f$ where $\pi_f$ is the coordinate projection. A minor complication arises because f need not be onto $Y_f$, and $e(X) \subset \times_{f \in F} Y_f$ is possible. If A is a set of points in $Y_f$, the inverse projection $\pi_f^{-1}(A)$ may contain points not in $e(X)$. We therefore need to express the inverse image of A under f, in terms of e, as

$$f^{-1}(A) = (\pi_f \circ e)^{-1}(A) = e^{-1}\big(\pi_f^{-1}(A) \cap e(X)\big), \quad A \subseteq Y_f. \tag{6.22}$$

The importance of this concept stems from the fact that, under the right conditions, the evaluation map embeds X in the product space generated by it. It would be homeomorphic to it in the case $e(X) = \times_{f \in F} Y_f$.

6.19 Theorem Suppose the class F separates points of X, meaning that $f(x) \ne f(y)$ for some $f \in F$ whenever x and y are distinct points of X. If X is endowed with the weak topology induced by F, the evaluation map defines an embedding of X into $\times_{f \in F} Y_f$.
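For a finite class F the evaluation map is just the vector of function values, and the role of the point-separation condition can be seen on a toy example. The space X and the two functions below are assumptions chosen for illustration; only the one-to-one property is checked, not continuity.

```python
def evaluation_map(x, functions):
    """e(x) is the tuple whose coordinates are f(x) for each f in the class."""
    return tuple(f(x) for f in functions)


def parity(x):
    return x % 2


def half(x):
    return x // 2


X = [0, 1, 2, 3]
F = [parity, half]  # this class separates the points of X

images = [evaluation_map(x, F) for x in X]
is_one_to_one = len(set(images)) == len(X)

# With parity alone, 0 and 2 are not separated, so e fails to be 1-1.
parity_only = [evaluation_map(x, [parity]) for x in X]
```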

Proof It has to be shown that e is a 1-1 mapping from X onto a subset of $\times_{f \in F} Y_f$ which is continuous with continuous inverse. Since F separates points of X, e is 1-1, since $e(x) \ne e(y)$ whenever $f(x) \ne f(y)$ for some $f \in F$. To show continuity of e, note first that $f^{-1}(A)$ is open in X whenever A is open in $Y_f$ under the weak topology, and sets of the form $\pi_f^{-1}(A)$ are likewise open in $\times_{f \in F} Y_f$ with the product topology, since the projections are continuous. But $e^{-1}(\pi_f^{-1}(A)) = (\pi_f \circ e)^{-1}(A) = f^{-1}(A)$, so we can conclude that the inverse images under e of sets of the form $\pi_f^{-1}(A)$, $A \subseteq Y_f$, are open. Since inverse images preserve unions and intersections (see 1.2) the same property extends to the base sets of $\times_{f \in F} Y_f$, which are finite intersections of these inverse projections in the product topology, and hence e is continuous.

To show that $e^{-1}$ is continuous, let $B \subseteq X$ be a set of the form $f^{-1}(A)$ where A is open in $Y_f$. Since F defines the topology on X, we know this set to be open, and the finite intersections of such sets form a base for X by assumption. Since e is 1-1 and $e^{-1}$ a mapping, it will suffice to verify that their images under e are open in $e(X)$. Noting that B is a set of the type shown in (6.22), $e(B) = \pi_f^{-1}(A) \cap e(X)$, but since $\pi_f^{-1}(A)$ is open, $e(B)$ is open in $e(X)$ as required. ∎

The following is the (for us) most important case of Urysohn's embedding theorem.

6.20 Theorem A second-countable $T_4$-space $(X,\tau)$ can be embedded in $[0,1]^\infty$. □

The proof requires the sufficiency part of the following lemma.

6.21 Lemma Let $x \in X$, and let $O \subseteq X$ be any open set containing x. Iff X is a regular space, there exists an open set U with $x \in U \subseteq \bar{U} \subseteq O$.

Proof Let X be regular. If O is open and $x \in O$, there exist disjoint open sets U and C such that $x \in U$ and $O^c \subseteq C$, and hence $C^c \subseteq O$. Since $U \subseteq C^c$ by the disjointness and $C^c$ is closed, we have $\bar{U} \subseteq C^c \subseteq O$ and sufficiency is proved. To prove necessity, suppose $x \in U$ and $\bar{U} \subseteq O$. $O^c$ is a closed set not containing x, and $O^c \subseteq \bar{U}^c$ where U and $\bar{U}^c$ are disjoint open sets. Hence X is regular. ∎

Proof of 6.20 Let $\mathcal{V}$ be a countable base for $\tau$. Since the space is $T_4$ it is $T_3$ and hence regular. For any $x \in X$ and $B \in \mathcal{V}$ containing x, we have by 6.21 a $U \in \tau$ such that $x \in U \subseteq \bar{U} \subseteq B$, and also by definition of a base $\exists\, A \in \mathcal{V}$ with $x \in A \subseteq U \subseteq \bar{U}$, and hence $x \in \bar{A} \subseteq \bar{U} \subseteq B$. ($\bar{A}$ is the smallest closed set containing A, note.) Since $\mathcal{V}$ is countable, the collection of all such pairs, say

$$\mathcal{A} = \{(A,B): A \in \mathcal{V},\ B \in \mathcal{V};\ \bar{A} \subseteq B\}, \tag{6.23}$$

is countable, and so we can label its elements $(A,B)_i = (A_i,B_i)$, $i = 1,2,\dots$ Every $x \in X$ lies in $A_i$ for some $i \in \mathbb{N}$.

Since the space is normal, we have by Urysohn's lemma a separating function $f_i: X \to [0,1]$ for each element of $\mathcal{A}$, such that $f_i(\bar{A}_i) = 1$ and $f_i(B_i^c) = 0$. For each $x \in X$ and closed set C such that $x \notin C$, choose $(A_i,B_i)$ such that $x \in \bar{A}_i \subseteq B_i \subseteq C^c$, and then $f_i(x) = 1$ and $f_i(C) = 0$. These separating functions form a countable class F, a subset of U(X). Since the space is $T_1$, C can be a point $\{y\}$ so that F separates points. And since the space is $T_{3\frac{1}{2}}$ and hence completely regular, $\tau$ is the weak topology induced by F, by 6.14. It follows by 6.19 that the evaluation map for F embeds X into $[0,1]^\infty$. ∎

Recall that $[0,1]^\infty$ endowed with the metric $\rho_\infty$ defined in (5.19) is a compact metric space. It follows that $e(X)$, which is homeomorphic to X under the evaluation mapping by F, is a totally bounded metric space. It further follows that $(X,\tau)$ is metrizable, since, among other possibilities, it can be endowed with the metric under which the distance between points x and y of X is taken to be $\rho_\infty(e(x),e(y))$. We have therefore proved the Urysohn metrization theorem, 6.12.

The topology induced by this metric on $[0,1]^\infty$ is the Tychonoff topology. A base for a point $p = (p_1,p_2,\dots) \in [0,1]^\infty$ in this topology is provided by sets of the form

$$N(p;k,\varepsilon) = \{q \in [0,1]^\infty: |p_i - q_i| < \varepsilon,\ i = 1,\dots,k\}, \tag{6.24}$$

for some finite k, and $0 < \varepsilon < 1$, which is the same as (6.20). The topology induced on X by the embedding is accordingly generated by the base sets

$$N(x;k,\varepsilon) = \{y \in X: |f_i(x) - f_i(y)| < \varepsilon,\ i = 1,\dots,k\}, \tag{6.25}$$

which can be recognized as finite intersections of the inverse images, under functions from F, of $\varepsilon$-neighbourhoods of $\mathbb{R}$; this is indeed the weak topology induced by F. This further serves to remind us of the close link between product topologies and weak topologies.

Since metric spaces are $T_4$, separable metric spaces can be embedded in $[0,1]^\infty$ by 6.20. In this case the motivation is not metrization, but usually compactification, that is, to show that separable spaces can be topologized as totally bounded spaces. Both metrization and compactification are techniques with important applications in the theory of weak convergence, which we study in Chapter 26. Although the following theorem is a straightforward corollary of 6.20, the result is of sufficient interest to deserve its own proof; the main interest is to see how in metric spaces there always exists a ready-made collection of functions to define the weak topology.

6.22 Theorem A separable metric space $(S,d)$ is homeomorphic to a subset of $[0,1]^\infty$.

Proof Let $d_0 = d/(1 + d)$, which satisfies $0 \le d_0 \le 1$ and is equivalent to d, so that $(S,d_0)$ is homeomorphic to $(S,d)$. By separability there exists a countable set of points $\{z_i, i \in \mathbb{N}\}$ which is dense in S. Let a countable family of functions be defined by $f_i(x) = d_0(x,z_i)$, $i = 1,2,\dots$, and define an evaluation map $h: S \to [0,1]^\infty$ by

$$h(x) = (d_0(x,z_1), d_0(x,z_2), \dots). \tag{6.26}$$

We show that h is an embedding in $([0,1]^\infty, \rho_\infty)$ where $\rho_\infty(a,b) = \sum_{k=1}^{\infty} |a_k - b_k|/2^k$. If $\{x_n\}$ is a sequence in S converging to x, then for each k, $d_0(x_n,z_k) \to d_0(x,z_k)$. Accordingly, $\forall\, k,\varepsilon\ \exists\, N \ge 1$ such that $h(x_n) \in N(h(x);k,\varepsilon)$ for all $n \ge N$, $\rho_\infty(h(x_n),h(x)) \to 0$, and h is continuous at x. On the other hand, if $x_n \not\to x$, there exists $\varepsilon > 0$ such that $\forall\, N \ge 1$, $d_0(x_n,x) \ge \varepsilon$ for some $n \ge N$. Since $\{z_k\}$ is dense in S, there is a k for which $d_0(x,z_k) < \tfrac{1}{4}\varepsilon$, and hence $d_0(x_n,z_k) \ge \tfrac{3}{4}\varepsilon$ by the triangle inequality, so that $|d_0(x_n,z_k) - d_0(x,z_k)| \ge \tfrac{1}{2}\varepsilon$ and hence

$$\rho_\infty(h(x_n),h(x)) \ge \varepsilon/2^{k+1}. \tag{6.27}$$

Since this holds for some $n \ge N$ for every $N \ge 1$, it holds for infinitely many n, and $h(x_n) \not\to h(x)$. We have therefore shown that $h(x_n) \to h(x)$ if and only if $x_n \to x$. This is the property of a 1-1 continuous function with continuous inverse. ∎

But note too the alternative approach of transforming the distance functions into separating functions as in (6.10), and applying 6.20.
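The map h of (6.26) can only be computed in truncated form, but a finite sketch conveys the idea. The code below assumes $S = \mathbb{R}$ with $d(x,y) = |x - y|$ and replaces the dense sequence $\{z_k\}$ by a short hypothetical list, so it illustrates the formulas rather than the theorem itself.

```python
def d0(x, y):
    """Bounded metric d0 = d / (1 + d), with d the absolute-value metric (assumed)."""
    d = abs(x - y)
    return d / (1.0 + d)


def h(x, z):
    """Truncated evaluation map: the first len(z) coordinates of (6.26)."""
    return [d0(x, z_k) for z_k in z]


def rho_inf(a, b):
    """Truncated version of rho_inf(a, b) = sum over k of |a_k - b_k| / 2**k."""
    return sum(abs(a_k - b_k) / 2 ** k
               for k, (a_k, b_k) in enumerate(zip(a, b), start=1))


z = [0.0, 1.0, -1.0, 0.5, -0.5]  # finite stand-in for a countable dense set
near = rho_inf(h(0.1, z), h(0.0, z))
far = rho_inf(h(2.0, z), h(0.0, z))
```

Points that are close in S are mapped to points that are close under the truncated $\rho_\infty$, and every coordinate of $h(x)$ lies in $[0,1]$.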

II

PROBABILITY

7 Probability Spaces

7.1 Probability Measures

A random experiment is an action or observation whose outcome is uncertain in advance of its occurrence. Tosses of a coin, spins of a roulette wheel, and observations of the price of a stock are familiar examples. A probability space, the triple $(\Omega,\mathcal{F},P)$, is to be thought of as a mathematical model of a random experiment. $\Omega$ is the sample space, the set of all the possible outcomes of the experiment, called the random elements, individually denoted $\omega$. The collection $\mathcal{F}$ of random events is a $\sigma$-field of subsets of $\Omega$, the event $A \in \mathcal{F}$ being said to have occurred if the outcome of the experiment is an element of A. A measure P is assigned to the elements of $\mathcal{F}$, $P(A)$ being the probability of A. Formally, we have the following.

7.1 Definition A probability measure (p.m.) on a measurable space $(\Omega,\mathcal{F})$ is a set function $P: \mathcal{F} \to [0,1]$ satisfying the axioms of probability:

(a) $P(A) \ge 0$, for all $A \in \mathcal{F}$.

(b) $P(\Omega) = 1$.

(c) Countable additivity: for a disjoint collection $\{A_j \in \mathcal{F}, j \in \mathbb{N}\}$,

$$P\Big(\bigcup_j A_j\Big) = \sum_j P(A_j). \quad \Box$$
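On a finite sample space the axioms can be verified by brute force. The following sketch assumes a fair six-sided die with the power set as the collection of events; on a finite space countable additivity reduces to finite additivity, which is all that is checked here. The names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))  # outcomes of a fair die (assumed example)


def prob(event):
    """Equally likely outcomes: P(A) = |A| / |Omega|, kept exact with Fraction."""
    return Fraction(len(event), len(OMEGA))


def power_set(s):
    """All subsets of s; on a finite space this is the largest sigma-field."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


EVENTS = power_set(OMEGA)
nonnegative = all(prob(a) >= 0 for a in EVENTS)          # axiom (a)
normalized = prob(OMEGA) == 1                            # axiom (b)
additive = all(prob(a | b) == prob(a) + prob(b)          # axiom (c), pairwise
               for a in EVENTS for b in EVENTS if not (a & b))
```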

Under the frequentist interpretation of probability, $P(A)$ is the limiting case of the proportion of a long run of repeated experiments in which the outcome is in A. Alternatively, probability may be viewed as a subjective notion with $P(A)$ said to represent an observer's degree of belief that A will occur. For present purposes, the interpretation given to the probabilities has no relevance. The theory stands or falls by its mathematical consistency alone, although it is then up to us to decide whether the results accord with our intuition and are useful in the analysis of real-world problems.

Additional properties of P follow from the axioms.

7.2 Theorem If A, B, and $\{A_j, j \in \mathbb{N}\}$ are arbitrary $\mathcal{F}$-sets, then

(i) $P(A) \le 1$.

(ii) $P(A^c) = 1 - P(A)$.

(iii) $P(\emptyset) = 0$.

(iv) $A \subseteq B \Rightarrow P(A) \le P(B)$ (monotonicity).

(v) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

(vi) $P(\bigcup_j A_j) \le \sum_j P(A_j)$ (countable subadditivity).

(vii) $A_j \uparrow A$ or $A_j \downarrow A \Rightarrow P(A_j) \to P(A)$ (continuity). □

Most of these are properties of measures in general. The complementation property (ii) is special to P, although an analogous condition holds for any finite measure, with $P(\Omega)$ replacing 1 in the formula. (iii) confirms P is a measure, on the definition.

Proof Applying 7.1(a), (b), and (c),

$$P(A) + P(A^c) = P(A \cup A^c) = P(\Omega) = 1,$$

from which follow (i) and (ii), and also (iii) on setting $A = \Omega$. (iv)-(vi) follow by 3.3, and (vii) by 3.4. ∎

To create a probability space, probabilities are assigned to a basic class of events $\mathcal{C}$, according to a hypothesis about the mechanisms underlying the random outcome. For example, in coin or die tossing experiments we have the usual hypothesis of a fair coin or die, and hence of equally likely outcomes. Then, provided $\mathcal{C}$ is rich enough to be a determining class for the space, $(\Omega,\mathcal{F},P)$ exists by 3.8 (extension theorem) where $\mathcal{F} = \sigma(\mathcal{C})$.

7.3 Example Let $\mathcal{B}_{[0,1]} = \{B \cap [0,1], B \in \mathcal{B}\}$, where $\mathcal{B}$ is the Borel field on $\mathbb{R}$. Then $([0,1], \mathcal{B}_{[0,1]}, m)$, where m is Lebesgue measure, is a probability space, since $m([0,1]) = 1$. The random elements of this space are real numbers between 0 and 1, and a drawing from the distribution is called a random variable. It is said to be distributed uniformly on the unit interval. The inclusion or exclusion of the endpoints is optional, remembering that $m([0,1]) = m((0,1)) = 1$. □

The atoms of a p.m. are the outcomes (singleton sets of $\Omega$) that have positive probability. The following is true for finite measures generally but has special importance in the theory of distributions.

7.4 Theorem The atoms of a p.m. are at most countable.

Proof Let $\omega_1$ be an atom satisfying $P(\{\omega_1\}) \ge P(\{\omega\})$ for all $\omega \in \Omega$, let $\omega_2$ satisfy $P(\{\omega_2\}) \ge P(\{\omega\})$ for all $\omega \in \Omega - \{\omega_1\}$, and so forth, to generate a sequence with

$$P(\{\omega_1\}) \ge P(\{\omega_2\}) \ge P(\{\omega_3\}) \ge \dots \tag{7.3}$$

The partial sums $\sum_{i=1}^{n} P(\{\omega_i\})$ form a monotone sequence which cannot exceed $P(\Omega) = 1$, and therefore converges by 2.11, implying by 2.25 that $\lim_{n \to \infty} P(\{\omega_n\}) = 0$. All points with positive probability are therefore in the countable set $\{\omega_i, i \in \mathbb{N}\}$. ∎

Suppose a random experiment represented by the space $(\Omega,\mathcal{F},P)$ is modified so as to confine the possible outcomes to a subset of the sample space, say $A \subset \Omega$. For example, suppose we switch from playing roulette with a wheel having a zero slot to one without. The restricted probability space is derived as follows. Let $\mathcal{F}_A$ denote the collection $\{E \cap A, E \in \mathcal{F}\}$. $\mathcal{F}_A$ is a $\sigma$-field (compare 1.23) and is called the trace of $\mathcal{F}$ on A. Defining $P_A(E) = P(E)/P(A)$ for $E \in \mathcal{F}_A$, $P_A$ can be verified to be a p.m. The triple $(A,\mathcal{F}_A,P_A)$ is called the trace of $(\Omega,\mathcal{F},P)$ on A. This is similar to the restriction of a measure space to a subspace, except that the measure is renormalized so that it remains a p.m.

In everyday language, we are inclined to say that events may be 'impossible' or 'certain'. If such events are none the less elements of $\mathcal{F}$, and hence technically random, we convey the idea that they will occur or not occur with 'certainty' by assigning them probabilities of zero or one. The usage of the term 'certain' here is deliberately loose, as the quotation marks suggest. To say an event cannot occur because it has probability zero is different from saying it cannot occur because the outcomes it contains are not elements of $\Omega$. Similarly, to say an event has probability 1 is different from saying it is the event $\Omega$. In technical discussion we therefore make the nice distinction between sure, which means the latter, and almost sure, which means the former. An event E is said to occur almost surely (a.s.), or equivalently, with probability one (w.p.1) if $M = \Omega - E$ has probability measure zero. This terminology is synonymous with almost everywhere (a.e.) in the measure-theoretic context. When there is ambiguity about the p.m. being considered, the notation 'a.s.[P]' may be used.

7.2 Conditional Probability

A central issue of probability is the treatment of relationships. When random experiments generate a multi-dimensioned outcome (e.g. a poker deal generates several different hands) questions always arise about relationships between the different aspects of the experiment. The natural way to pose such questions is: 'if I observe only one facet of the outcome, does this change the probabilities I should assign to what is unobserved?' (Skilled poker players know the answer to this question, of course.)

The idea underlying conditional probability is that some but not all aspects of a random experiment have been observed. By eliminating some of the possible outcomes (those incompatible with our partial knowledge), we have to consider only a part of the sample space. In $(\Omega,\mathcal{F},P)$, suppose we have partial information about the outcome to the effect that 'the event $A$ has occurred', where $A \in \mathcal{F}$. How should this knowledge change the probabilities we attach to other events? Since the outcomes in $A^c$ are ruled out, the sample space is reduced from $\Omega$ to $A$. To generate probabilities on this restricted space, define the conditional probability of an event $B$ as $P(B \mid A) = P(A \cap B)/P(A)$, for $A, B \in \mathcal{F}$, $P(A) > 0$. $P(\cdot \mid A)$ satisfies the probability axioms as long as $P$ does and $P(A) > 0$. In particular, $P(A \mid A) = 1$, and $P(B^c \mid A) = 1 - P(B \mid A)$, since $B \cap A$ and $B^c \cap A$ are disjoint, and their union is $A$. The space $(A,\mathcal{F}_A,P_A)$, the trace of $(\Omega,\mathcal{F},P)$ on the set $A$, models the random experiment from the point of view of an observer who knows that $\omega \in A$. Events $A$ and $B$ are said to be dependent when $P(B \mid A) \neq P(B)$.
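The definition of $P(B \mid A)$ can be tried out on a small finite sample space. The sketch below is only an illustration: it assumes a roulette wheel with 37 equally likely slots, and the names `omega`, `prob` and `cond_prob` are hypothetical, not taken from the text.

```python
from fractions import Fraction

# Assumed sample space: slots 0..36 of a roulette wheel, all equally likely.
omega = set(range(37))

def prob(event):
    """P(E) under the uniform p.m. on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(b, a):
    """P(B | A) = P(A ∩ B) / P(A), defined only when P(A) > 0."""
    if prob(a) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(a)

A = omega - {0}                        # 'the zero slot did not come up'
B = {n for n in omega if n % 2 == 1}   # 'the number is odd'

p_b = prob(B)                          # unconditional probability of B
p_b_given_a = cond_prob(B, A)          # probability of B on the trace space
```

Here conditioning on $A$ changes the probability of $B$, so the two events are dependent in the sense just defined.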

In certain respects the conditioning concept seems a little improper. A context in which the components of the random outcome are revealed sequentially to an observer might appear relevant only to a subjective interpretation of probability, and lead a sceptical reader to call the neutrality of the mathematical theory into question. We might also protest that a random event is random, and has no business


defining a probability space. In practice, the applications of conditional probability in limit theory are usually quite remote from any considerations of subjectivity, but there is a serious point here, which is the difficulty of constructing a rigorous theory once we depart from the restricted goal of predicting random outcomes a priori.

The way we can overcome improprieties of this kind, and obtain a much more powerful theory into the bargain, is to condition on a class of events, a $\sigma$-subfield of $\mathcal{F}$. Given an event $B \in \mathcal{F}$, let the set function

$$P(B \mid \cdot): \mathcal{G} \mapsto [0,1]$$

represent the contingent probability to be assigned to $B$ after drawing an event $A$ from $\mathcal{G}$, where $\mathcal{G} \subseteq \mathcal{F}$. We can think of $\mathcal{G}$ as an information set in the sense that, for each $A \in \mathcal{G}$, an observer knows whether or not the outcome is in $A$. Since the elements of the domain are random events, we must think of $P(B \mid \mathcal{G})$ as itself a random outcome (a random variable, in the terminology of Chapter 8) derived from the restricted probability space $(\Omega,\mathcal{G},P)$. We may think of this space as a model of the action of an observer possessing information $\mathcal{G}$, who assigns the conditional probability $P(B \mid A)$ to $B$ when he observes the occurrence of $A$, viewed from the standpoint of another observer who has no prior information. $\mathcal{G}$ is a $\sigma$-field, because if we know an outcome is in $A$ we also know it is not in $A^c$, and if we know whether or not it is in $A_j$ for each $j = 1,2,3,\ldots$, we know whether or not it is in $\bigcup_j A_j$. The more sets there are in $\mathcal{G}$ the larger the volume of information, all the way from the trivial set $\mathcal{T} = \{\Omega,\emptyset\}$ (complete ignorance, with $P(B \mid \mathcal{T}) = P(B)$ a.s.) to the set $\mathcal{F}$ itself, which corresponds to almost sure knowledge of the outcome. In the latter case, $P(B \mid \mathcal{F}) = 1$ a.s. if $\omega \in B$, and 0 otherwise. If you know whether or not $\omega \in A$ for every $A \in \mathcal{F}$, you effectively know $\omega$.

7.3 Independence

A pair of events $A, B \in \mathcal{F}$ is said to be independent if $P(A \cap B) = P(A)P(B)$, or, equivalently, if

$$P(B \mid A) = P(B). \tag{7.4}$$

If, in a collection of events $\mathcal{C}$, (7.4) holds for every pair of distinct sets $A$ and $B$ from the collection, $\mathcal{C}$ is said to be pairwise independent. In addition, $\mathcal{C}$ is said to be totally independent if

$$P\Big(\bigcap_{A \in \mathcal{J}} A\Big) = \prod_{A \in \mathcal{J}} P(A) \tag{7.5}$$

for every subset $\mathcal{J} \subseteq \mathcal{C}$ containing two or more events. This is a stronger condition than pairwise independence. Suppose $\mathcal{C}$ consists of sets $A$, $B$, and $C$. Knowing that $B$ has occurred may not influence the probability we attach to $A$, and similarly for $C$; but the joint occurrence of $B$ and $C$ may none the less imply something about $A$. Pairwise independence implies that $P(A \cap B) = P(A)P(B)$, $P(A \cap C) = P(A)P(C)$, and $P(B \cap C) = P(B)P(C)$, but total independence would also require $P(A \cap B \cap C) = P(A)P(B)P(C)$.
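A standard illustration of the gap between pairwise and total independence uses two fair coin tosses. The sketch below assumes that setting (it is not an example from the text), and the helper names `prob` and `is_independent` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed experiment: two fair coin tosses, four equally likely outcomes.
omega = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}    # first toss is a head
B = {w for w in omega if w[1] == "H"}    # second toss is a head
C = {w for w in omega if w[0] != w[1]}   # exactly one head

def is_independent(events):
    """Check P(intersection) == product of P(event) for one sub-collection."""
    inter = set(omega)
    rhs = Fraction(1)
    for e in events:
        inter &= e
        rhs *= prob(e)
    return prob(inter) == rhs

# Every pair factorizes, but the triple does not.
pairwise = all(is_independent(pair) for pair in combinations([A, B, C], 2))
total = pairwise and is_independent([A, B, C])
```

Knowing that both $B$ and $C$ occurred rules out $A$ entirely here, although neither $B$ nor $C$ alone changes the probability of $A$.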


Here are two useful facts about independent events. In each theorem let $\mathcal{C}$ be a totally independent collection, satisfying $P\big(\bigcap_{A \in \mathcal{J}} A\big) = \prod_{A \in \mathcal{J}} P(A)$ for each subset $\mathcal{J} \subseteq \mathcal{C}$.

7.5 Theorem The collection $\mathcal{C}'$ which contains $A$ and $A^c$ for each $A \in \mathcal{C}$ is totally independent.

Proof It is sufficient to prove that the independence of $A$ and $B$ implies that of $A^c$ and $B$, for $B$ can denote any arbitrary intersection of sets from the collection and (7.5) will be satisfied, for either $A$ or $A^c$. This is certainly true, since if $P(A \cap B) = P(A)P(B)$, then

$$P(A^c \cap B) = P(A^c \cap B) + P(A \cap B) - P(A)P(B) = P(B) - P(A)P(B) = P(A^c)P(B). \ \blacksquare \tag{7.6}$$

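7.6 Theorem Let $\mathcal{B}$ be a countable disjoint collection, and let the collections consisting of $B_j$ and the sets of $\mathcal{C}$ be totally independent for each $j$. Then, if $B = \bigcup_j B_j$, the collection consisting of $B$ and $\mathcal{C}$ is also independent.

Proof Let $\mathcal{J}$ be any subset of $\mathcal{C}$. Using the disjointness of the sets of $\mathcal{B}$, and countable additivity,

$$P\Big(B \cap \bigcap_{A \in \mathcal{J}} A\Big) = P\Big(\bigcup_j \Big(B_j \cap \bigcap_{A \in \mathcal{J}} A\Big)\Big) = \sum_j P\Big(B_j \cap \bigcap_{A \in \mathcal{J}} A\Big) = \sum_j P(B_j) \prod_{A \in \mathcal{J}} P(A) = P(B) \prod_{A \in \mathcal{J}} P(A). \ \blacksquare$$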

7.4 Product Spaces

Questions of dependence and independence arise when multiple random experiments run in parallel, and product spaces play a natural role in the analysis of these issues. Let $(\Omega \times \Xi, \mathcal{F} \otimes \mathcal{G}, P)$ be a probability space where $\mathcal{F} \otimes \mathcal{G}$ is the $\sigma$-field generated by the measurable rectangles of $\Omega \times \Xi$, and $P(\Omega \times \Xi) = 1$. The random outcome is a pair $(\omega,\xi)$. This is no more than a case of the general theory of §7.1 (where the nature of $\omega$ is unspecified) except that it becomes possible to ask questions about the part of the outcome represented by $\omega$ or $\xi$ alone. $P_\Omega(F) = P(F \times \Xi)$ for $F \in \mathcal{F}$, and $P_\Xi(G) = P(\Omega \times G)$ for $G \in \mathcal{G}$, are called the marginal probabilities. $(\Omega,\mathcal{F},P_\Omega)$ and $(\Xi,\mathcal{G},P_\Xi)$ are probability spaces representing an incompletely observed random experiment, with $\omega$ or $\xi$, respectively, being the only things observed in an experiment generating $(\omega,\xi)$.

On the other hand, suppose we observe $\xi$ and subsequently consider the experiment of observing $\omega$. Knowing $\xi$ means that for each $\Omega \times G$ we know whether or not $(\omega,\xi)$ is in it. The conditional probabilities generated by this two-stage experiment can be written by a slight abuse of notation as $P(F \mid \mathcal{G})$, although strictly speaking the relevant events are the cylinders $F \times \Xi$, and the elements of the conditioning $\sigma$-field are $\Omega \times G$ for $G \in \mathcal{G}$, so that written out in full the notation would be $P(F \times \Xi \mid \Omega \times \mathcal{G})$. In this context, product measure assumes a special role as the model of independence. In $(\Omega \times \Xi, \mathcal{F} \otimes \mathcal{G}, P)$, the coordinate spaces are said to be independent when

$$P(F \times G) = P_\Omega(F)P_\Xi(G)$$

for each $F \in \mathcal{F}$ and $G \in \mathcal{G}$. Unity of the notation is preserved since $F \times G = (F \times \Xi) \cap (\Omega \times G)$. We can also write $P(F \times \Xi \mid \Omega \times G) = P_\Omega(F)$, or with a further slight abuse of notation $P(F \mid G) = P_\Omega(F)$, for any pair $F \in \mathcal{F}$ and $G \in \mathcal{G}$. Independence means that knowing $\xi$ does not affect the probabilities assigned to sets of $\mathcal{F}$. Since the measurable rectangles are a determining class for the space, the p.m. $P$ is entirely determined by the marginal measures.
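On finite coordinate spaces the product measure and its marginals can be written out directly. The following sketch assumes two small hypothetical coordinate spaces with made-up probabilities; `joint`, `rectangle`, `marginal_omega` and `marginal_xi` are illustrative names only.

```python
from fractions import Fraction
from itertools import product

# Assumed marginal p.m.s on two small coordinate spaces.
p_omega = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
p_xi = {1: Fraction(1, 3), 2: Fraction(2, 3)}

# Product measure, defined point by point on the product space.
joint = {(w, x): p_omega[w] * p_xi[x] for w, x in product(p_omega, p_xi)}

def P(event):
    """Probability of a set of (omega, xi) pairs."""
    return sum((joint[pt] for pt in event), Fraction(0))

def rectangle(F, G):
    """The measurable rectangle F x G."""
    return {(w, x) for w in F for x in G}

def marginal_omega(F):
    return P(rectangle(F, p_xi))       # P(F x Xi)

def marginal_xi(G):
    return P(rectangle(p_omega, G))    # P(Omega x G)

F, G = {"a"}, {2}
# Conditional probability of the cylinder F x Xi given Omega x G.
cond = P(rectangle(F, G)) / marginal_xi(G)
```

Under the product measure, `cond` coincides with the marginal probability of `F`, which is the sense in which knowing $\xi$ carries no information about $\omega$.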


8 Random Variables

8.1 Measures on the Line

Let $(\Omega,\mathcal{F},P)$ be a probability space. A real random variable (r.v.) is an $\mathcal{F}/\mathcal{B}$-measurable function $X: \Omega \mapsto \mathbb{R}$. That is to say, $X(\omega)$ induces an inverse mapping from $\mathcal{B}$ to $\mathcal{F}$ such that $X^{-1}(B) \in \mathcal{F}$ for every $B \in \mathcal{B}$, where $\mathcal{B}$ is the linear Borel field. The term '$\mathcal{F}$-measurable r.v.' may be used when the role of $\mathcal{B}$ is understood. The symbol $\mu$ will be generally used to denote a p.m. on the line, reserving $P$ for the p.m. on the underlying space.

Random variables therefore live in the space $(\mathbb{R},\mathcal{B},\mu)$, where $\mu$ is the derived measure such that $\mu(B) = P(X^{-1}(B)) = P(X \in B)$. The term distribution is synonymous with measure in this context. The properties of r.v.s are special cases of the results in Chapter 3; in particular, the contents of §3.6 should be reviewed in conjunction with this chapter. If $g: \mathbb{R} \mapsto \mathbb{R}$ is a Borel function, then $g \circ X(\omega) = g(X(\omega))$ is also a r.v., having derived p.m. $\mu g^{-1}$ according to 3.21.

If there is a set $S \in \mathcal{B}$ having the property $\mu(S) = 1$, the trace of $(\mathbb{R},\mathcal{B},\mu)$ on $S$ is equivalent to the original space in the sense that the same measure is assigned to $B$ and to $B \cap S$, for each $B \in \mathcal{B}$. Which space to work with is basically a matter of technical convenience. If $X$ is a r.v., it may be more satisfactory to say that the Borel function $X^2$ is a r.v. distributed on $\mathbb{R}^+$ than that it is distributed on $\mathbb{R}$ but takes values in $\mathbb{R}^+$ almost surely. One could substitute for $(\mathbb{R},\mathcal{B},\mu)$ the extended space $(\overline{\mathbb{R}},\overline{\mathcal{B}},\mu)$ (see 1.22), but note that assigning a positive probability to infinity does not lead to meaningful results. Random variables must be finite with probability 1. Thus $(\mathbb{R},\mathcal{B},\mu)$, the trace of $(\overline{\mathbb{R}},\overline{\mathcal{B}},\mu)$ on $\mathbb{R}$, is equivalent to it for nearly all purposes. However, while it is always finite a.s., a r.v. is not necessarily bounded a.s.; there may exist no constant $B$ such that $|X(\omega)| \le B$ for all $\omega \in C$, with $P(\Omega - C) = 0$. The essential supremum of $X$ is

$$\operatorname{ess\,sup} X = \inf\{x: P(|X| > x) = 0\},$$

and this may be either a finite number, or $+\infty$.

8.2 Distribution Functions

The cumulative distribution function (c.d.f.) of $X$ is the function $F: \overline{\mathbb{R}} \mapsto [0,1]$, where

$$F(x) = \mu((-\infty,x]) = P(X \le x), \quad x \in \overline{\mathbb{R}}. \tag{8.2}$$

We take the domain to be $\overline{\mathbb{R}}$ since it is natural to assign the values 0 and 1 to $F(-\infty)$ and $F(+\infty)$ respectively. No other values are possible so there is no contradiction in confining attention to just the points of $\mathbb{R}$. To specify a distribution for $X$ it is sufficient to assign a functional form for $F$; $\mu$ and $F$ are equivalent representations of the distribution, each useful for different purposes. To represent $\mu(A)$ in terms of $F$ for a set $A$ much more complicated than an interval would be cumbersome, but on the other hand, the graph of $F$ is an appealing way to display the characteristics of the distribution.

To see how probabilities are assigned to sets using $F$, start with the half-open interval $(x,y]$ for $x < y$. This is the intersection of the half-lines $(-\infty,y]$ and $(-\infty,x]^c = (x,+\infty)$. Let $A = (-\infty,x]$ and $B = (-\infty,y]$, so that $\mu(A) = F(x)$ and $\mu(B) = F(y)$; then

$$\mu((x,y]) = \mu(A^c \cap B) = 1 - \mu(A \cup B^c) = 1 - (\mu(A) + 1 - \mu(B)) = \mu(B) - \mu(A) = F(y) - F(x), \tag{8.3}$$

$A$ and $B^c$ being disjoint. The half-open intervals form a semi-ring (see 1.18), and from the results of §3.2 the measure extends uniquely to the sets of $\mathcal{B}$.

As an example of the extension, we determine $\mu(\{x\}) = P(X = x)$ for $x \in \mathbb{R}$ (compare 3.15). Putting $x = y$ in (8.3) will not yield this result, since $A \cap A^c = \emptyset$, not $\{x\}$. We could obtain $\{x\}$ as the intersection of $(-\infty,x]$ and $[x,+\infty) = (-\infty,x)^c$, but then there is no obvious way to find the probability for the open interval $(-\infty,x) = (-\infty,x] - \{x\}$. The solution to the problem is to consider the monotone sequence of half-lines $(-\infty, x - 1/n]$ for $n \in \mathbb{N}$. Since $(x - 1/n, x] = (-\infty, x - 1/n]^c \cap (-\infty,x]$, we have $\mu((x - 1/n, x]) = F(x) - F(x - 1/n)$, according to (8.3). Since $\{x\} = \bigcap_{n=1}^\infty (x - 1/n, x]$, $\{x\} \in \mathcal{B}$ and $\mu(\{x\}) = F(x) - F(x-)$, where $F(x-)$ is the left limit of $F$ at $x$. $F(x)$ exceeds $F(x-)$ (i.e. $F$ jumps) at the atoms of the distribution, points $x$ with $\mu(\{x\}) > 0$. We can deduce by the same kind of reasoning that $\mu((x,y)) = F(y-) - F(x)$, $\mu([x,y)) = F(y-) - F(x-)$, and that, generally, measures of open intervals are the same as those of closed intervals unless the endpoints are atoms of the distribution.
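The interval formulas above are easy to check on a distribution with one atom. The sketch below assumes a hypothetical mixed distribution (mass 1/2 at 0, the rest uniform on (0, 1)); the left limit `F_left` is written out by hand for this particular $F$ rather than computed in general.

```python
from fractions import Fraction

# Assumed c.d.f.: an atom of mass 1/2 at 0 plus mass 1/2 uniform on (0, 1).
def F(x):
    if x < 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    return Fraction(1, 2) + Fraction(x) / 2

def F_left(x):
    """Left limit F(x-), specific to the F defined above."""
    if x <= 0:
        return Fraction(0)
    if x > 1:
        return Fraction(1)
    return Fraction(1, 2) + Fraction(x) / 2

def mu_half_open(x, y):    # measure of (x, y], as in (8.3)
    return F(y) - F(x)

def mu_point(x):           # measure of {x}
    return F(x) - F_left(x)

def mu_open(x, y):         # measure of (x, y)
    return F_left(y) - F(x)

def mu_closed(x, y):       # measure of [x, y]
    return F(y) - F_left(x)
```

The open and closed intervals with endpoints 0 and 1 receive different measures here because the endpoint 0 is an atom; intervals whose endpoints are not atoms get the same measure either way.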

Certain characteristics imposed on the c.d.f. by its definition in terms of a measure were implicit in the above conclusions. The next three theorems establish these properties.

8.1 Theorem $F$ is non-negative and non-decreasing, with $F(-\infty) = 0$ and $F(+\infty) = 1$, and is increasing at $x \in \mathbb{R}$ iff every open neighbourhood of $x$ has positive measure.

Proof These are all direct consequences of the definition. Non-negativity is from (8.2), and monotonicity from 7.2(iv). $F$ is increasing at $x$ if $F(x + \varepsilon) > F(x - \varepsilon)$ for each $\varepsilon > 0$. To show the asserted sufficiency, we have for each such $\varepsilon$,

$$F(x + \varepsilon) - F(x - \varepsilon) \ge F((x + \varepsilon)-) - F(x - \varepsilon) = \mu(S(x,\varepsilon)). \tag{8.4}$$

For the necessity, suppose $\mu(S(x,\varepsilon)) = 0$ and note that, by monotonicity of $F$,

$$\mu(S(x,\varepsilon)) = F((x + \varepsilon)-) - F(x - \varepsilon) \ge F(x + \varepsilon/2) - F(x - \varepsilon/2). \ \blacksquare \tag{8.5}$$

The collection of points on which $F$ increases is known as the support of $\mu$. Its complement in $\mathbb{R}$, the largest set of zero measure, consists of points that must all lie in open neighbourhoods of zero measure, and hence must be open. The support of $\mu$ is accordingly a closed set.

8.2 Theorem F is right-continuous everywhere.

Proof For $x \in \mathbb{R}$ and $n \ge 1$, additivity of the p.m. implies

$$\mu((-\infty, x + 1/n]) = \mu((-\infty,x]) + \mu((x, x + 1/n]). \tag{8.6}$$

As $n \to \infty$, $\mu((-\infty, x + 1/n]) \downarrow \mu((-\infty,x])$ by continuity of the measure, and hence $\lim_{n\to\infty} \mu((x, x + 1/n]) = 0$. It follows that for $\varepsilon > 0$ there exists $N_\varepsilon$ such that $\mu((x, x + 1/n]) < \varepsilon$, and, accordingly,

$$\mu((-\infty,x]) \le \mu((-\infty, x + 1/n]) < \mu((-\infty,x]) + \varepsilon, \tag{8.7}$$

for $n \ge N_\varepsilon$. Hence $F(x+) = F(x)$, proving the theorem since $x$ was arbitrary. $\blacksquare$

If $F(x)$ had been defined as $\mu((-\infty,x))$, similar arguments would show that it was left-continuous in that case.


Fig. 8.1

8.3 Theorem $F$ has the decomposition

$$F(x) = F'(x) + F''(x) \tag{8.8}$$

where $F'(x)$ is a right-continuous step function with at most a countable number of jumps, and $F''(x)$ is everywhere continuous.

Proof By 7.4, the jump points of $F$ are at most countable. Letting $\{x_1, x_2, \ldots\}$ denote these points,

$$F'(x) = \sum_{x_i \le x} \big(F(x_i) - F(x_i-)\big) \tag{8.9}$$

is a step function with jumps at the points $x_i$, and $F''(x) = F(x) - F'(x)$ has $F''(x_i-) = F''(x_i)$ at each $x_i$ and is continuous everywhere. $\blacksquare$

Fig. 8.1 illustrates the decomposition.

This is not the only decomposition of $F$. The Lebesgue decomposition of $\mu$ with respect to Lebesgue measure on $\mathbb{R}$ (see 4.28) is $\mu = \mu_1 + \mu_2$ where $\mu_1$ is singular with respect to $m$ (is positive only on a set of Lebesgue measure 0) and $\mu_2$ is absolutely continuous with respect to Lebesgue measure. Recall that $\mu_2(A) = \int_A f(x)\,dx$, $A \in \mathcal{B}$, where $f$ is the associated Radon-Nikodym derivative (density function). If we decompose $F$ in the same way, such that $F_i(x) = \mu_i((-\infty,x])$ for $i = 1$ and 2, we may write $F_2(x) = \int_{-\infty}^x f(\xi)\,d\xi$, implying that $f(x) = dF_2/d\xi\,|_{\xi = x}$. This must hold for almost all $x$ (Lebesgue measure), and we call $F_2$ an absolutely continuous function, meaning it is differentiable almost everywhere on its domain. $F' \le F_1$ since $F_1$ may increase on a set of Lebesgue measure 0, and such sets can be uncountable, and hence larger than the set of atoms. It is customary to summarize these relations by decomposing $F''$ into two additive components, the absolutely continuous part $F_2$, and a component $F_3 = F'' - F_2$ which is continuous and also singular, constant except on a set of zero Lebesgue measure. This component can in most cases be neglected.

The collection of half-lines with rational endpoints generates $\mathcal{B}$ (1.21) and should be a determining class for measures on $(\mathbb{R},\mathcal{B})$. The following theorem establishes the fact that a c.d.f. defined on a dense subset of $\mathbb{R}$ is a unique representation of $\mu$.

8.4 Theorem Let $\mu$ be a finite measure on $(\mathbb{R},\mathcal{B})$ and $D$ a dense subset of $\mathbb{R}$. The function $G$ defined by

$$G(x) = \begin{cases} F(x) = \mu((-\infty,x]), & x \in D \\ F(x+), & x \in \mathbb{R} - D \end{cases} \tag{8.10}$$

is identical with $F$.

Proof By definition, $\mathbb{R} \subseteq \overline{D}$ and the points of $\mathbb{R} - D$ are all closure points of $D$. For each $x \in \mathbb{R}$, not excluding points in $\mathbb{R} - D$, there is a sequence of points in $D$ converging to $x$ (e.g. choose a point from $S(x,1/n) \cap D$ for $n \in \mathbb{N}$). Since $F$ is right-continuous everywhere on $\mathbb{R}$, $\mu((-\infty,x]) = F(x+)$ for each $x \in \mathbb{R} - D$. $\blacksquare$

Finally, we show that every $F$ corresponds to some $\mu$, as well as every $\mu$ to an $F$.

8.5 Theorem Let $F: \overline{\mathbb{R}} \mapsto [0,1]$ be a non-negative, non-decreasing, right-continuous function, with $F(-\infty) = 0$ and $F(+\infty) = 1$. There exists a unique p.m. $\mu$ on $(\mathbb{R},\mathcal{B})$ such that $F(x) = \mu((-\infty,x])$ for all $x \in \mathbb{R}$. □

Right continuity, as noted above, corresponds to the convention of defining $F$ by (8.2). If instead we defined $F(x) = \mu((-\infty,x))$, a left-continuous non-decreasing $F$ would represent a p.m.

Proof Consider the function $\phi: [0,1] \mapsto \overline{\mathbb{R}}$, defined by

$$\phi(u) = \inf\{x: u \le F(x)\}. \tag{8.11}$$

$\phi$ can be thought of as the inverse of $F$; $\phi(0) = -\infty$, $\phi(1) = +\infty$, and since $F$ is non-decreasing and right-continuous, $\phi$ is non-decreasing and left-continuous; $\phi$ is therefore Borel-measurable by 3.32(ii). According to 3.21, we may define a measure on $(\mathbb{R},\mathcal{B})$ by $m\phi^{-1}(B)$ for each $B \in \mathcal{B}$, where $m$ is Lebesgue measure on the Borel sets of $[0,1]$.

In particular, consider the class $\mathcal{C}$ of the half-open intervals $(a,b]$ for all $a,b \in \mathbb{R}$ with $a < b$. This is a semi-ring by 1.18, and $\sigma(\mathcal{C}) = \mathcal{B}$ by 1.21. Note that

$$\phi^{-1}((a,b]) = \{u: \inf\{x: u \le F(x)\} \in (a,b]\} = (F(a),F(b)]. \tag{8.12}$$

For each of these sets define the measure

$$\mu((a,b]) = m\phi^{-1}((a,b]) = F(b) - F(a). \tag{8.13}$$

The fact that this is a measure follows from the argument of the preceding paragraph. $\mathcal{C}$ is a determining class for $(\mathbb{R},\mathcal{B})$, and the measure has an extension by 3.8. It is a p.m. since $\mu(\mathbb{R}) = 1$, and is unique by 3.13. $\blacksquare$

The neat construction used in this proof has other applications in the theory of random variables, and will reappear in more elaborate form in §22.2. The graph of $\phi$ is found by rotating and reflecting the graph of $F$, sketched in Fig. 8.2; to see the former with the usual coordinates, turn the page on its side and view in a mirror.

Fig. 8.2

If $F$ has a discontinuity at $x$, then $\phi = x$ on the interval $(F(x-), F(x)]$, and $\phi^{-1}(\{x\}) = (F(x-), F(x)]$. Thus, $\mu(\{x\}) = m((F(x-), F(x)]) = F(x) - F(x-)$, as required. On the other hand, if an interval $(a,b]$ has measure 0 under $F$, $F$ is constant on this interval and $\phi$ has a discontinuity at $F(a) = F(b) = c$ (say). $\phi$ takes the value $a$ at this point, by left continuity. Note that $\phi^{-1}((a,b])$ contains at most the single point $c$, so that $\mu((a,b]) \le m(\{c\}) = 0$, as required.
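The construction $\mu = m\phi^{-1}$ can be made concrete for a discrete distribution, where $\phi$ is a step function. The sketch below assumes a hypothetical three-atom distribution and approximates Lebesgue measure on $(0,1]$ by an evenly spaced grid; `atoms`, `masses` and `phi` are illustrative names.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

# Assumed discrete distribution: atoms in increasing order and their masses.
atoms = [0, 1, 3]
masses = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
cum = list(accumulate(masses))   # values of F at the atoms: 1/5, 7/10, 1

def phi(u):
    """phi(u) = inf{x : u <= F(x)} for 0 < u <= 1, as in (8.11)."""
    if not 0 < u <= 1:
        raise ValueError("u must lie in (0, 1]")
    # First atom whose cumulative probability is at least u.
    return atoms[bisect_left(cum, u)]

# Grid approximation to Lebesgue measure on (0, 1]: share of grid points
# u = k/1000 that phi maps to each atom.
grid = [Fraction(k, 1000) for k in range(1, 1001)]
share = {a: Fraction(sum(1 for u in grid if phi(u) == a), len(grid))
         for a in atoms}
```

Each atom $x$ receives the length of the interval $(F(x-), F(x)]$, which is its jump in $F$. Applying `phi` to uniform draws on $(0,1]$ is the familiar inverse-transform way of simulating from $F$.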


8.3 Examples

Most of the distributions met with in practice are either discrete or continuous. A discrete distribution assigns zero probability to all but a countable set of points, with $F'' = 0$ in the decomposition of 8.3.

8.6 Example The Bernoulli (or binary) r.v. takes values 1 and 0 with fixed probabilities $p$ and $1 - p$. Think of it as a mapping from any probability space containing two elements, such as 'Success' and 'Failure', 'Yes' and 'No', etc. □

8.7 Example The binomial distribution with parameters $n$ and $p$ (denoted $B(n,p)$) is the distribution of the number of 1s obtained in $n$ independent drawings from the Bernoulli distribution, having the probability function

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0,\ldots,n. \ \Box \tag{8.14}$$

8.8 Example The limiting case of (8.14) with $p = \lambda/n$, as $n \to \infty$, is the Poisson distribution, having probability function

$$P(X = x) = \frac{1}{x!}\,e^{-\lambda}\lambda^x, \quad x = 0,1,2,\ldots \tag{8.15}$$

This is a discrete distribution with a countably infinite set of outcomes. □
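The passage from (8.14) to (8.15) can be seen numerically. The sketch below compares the two probability functions for an assumed value $\lambda = 2$ and increasing $n$; `binom_pmf`, `poisson_pmf` and `max_gap` are hypothetical helper names, and the comparison is restricted to the first few values of $x$.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Binomial probability function (8.14)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability function (8.15)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0   # assumed value of lambda for the illustration

def max_gap(n, upto=10):
    """Largest difference between B(n, lam/n) and Poisson(lam) over x = 0..upto."""
    return max(abs(binom_pmf(x, n, lam / n) - poisson_pmf(x, lam))
               for x in range(upto + 1))

gaps = [max_gap(n) for n in (10, 100, 10_000)]
```

The entries of `gaps` shrink as $n$ grows, in line with the limiting statement in 8.8.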

In a continuous distribution, $F$ is absolutely continuous with $F_1 = 0$ in the Lebesgue decomposition of the c.d.f. The derivative $f = dF/dx$ exists a.e.[$m$] on $\mathbb{R}$, and is called the probability density function (p.d.f.) of the p.m. According to the Radon-Nikodym theorem, the p.d.f. has the property that for each $E \in \mathcal{B}$,

$$\mu(E) = \int_E f(x)\,dx. \tag{8.16}$$

8.9 Example For the uniform distribution on $[0,1]$ (see 7.3),

$$F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le 1 \\ 1, & x > 1. \end{cases} \tag{8.17}$$

The p.d.f. is constant at 1 on the interval, but is undefined at 0 and 1. □

8.10 Example The standard normal or Gaussian distribution has p.d.f.

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < +\infty, \tag{8.18}$$

whose graph is the well-known bell-shaped curve with mode at zero. □

8.11 Example The Cauchy distribution has p.d.f.

$$f(x) = \frac{1}{\pi(1 + x^2)}, \quad -\infty < x < +\infty, \tag{8.19}$$

which, like the Gaussian, is symmetric with mode at 0. □

When it exists, the p.d.f. is the usual means of characterizing a distribution. A particularly useful trick is to be able to derive the distribution of $g(X)$ from that of $X$, when $g$ is a function of a suitable type.

8.12 Theorem Let $g: S \mapsto T$ be a 1-1 function onto $T$, where $S$ and $T$ are open subsets of $\mathbb{R}$, and let $h = g^{-1}$ be continuously differentiable with $dh/dy \neq 0$ for all $y \in T$. If $X$ is continuously distributed with p.d.f. $f_X$, and $Y = g(X)$, then $Y$ is continuously distributed with p.d.f.

$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{dh}{dy}\right|. \ \Box \tag{8.20}$$

The proof is an easy exercise in differential calculus. This result illustrates 3.21, but in most other cases it is a great deal harder than this to derive a closed form for a transformed distribution.
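Formula (8.20) can be checked numerically against a finite-difference derivative of the transformed c.d.f. The sketch below assumes, purely for illustration, $g(x) = e^x$ applied to a standard Gaussian $X$, so that $h(y) = \log y$ on $T = (0,\infty)$; all function names are hypothetical.

```python
from math import erf, exp, log, pi, sqrt

def f_X(x):
    """Standard Gaussian p.d.f. (8.18)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    """Standard Gaussian c.d.f., expressed through the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def h(y):
    return log(y)          # h = g^{-1} for the assumed g(x) = exp(x)

def dh_dy(y):
    return 1.0 / y

def f_Y(y):
    """Density of Y = g(X) from the change-of-variable formula (8.20)."""
    return f_X(h(y)) * abs(dh_dy(y))

def F_Y(y):
    """c.d.f. of Y: P(exp(X) <= y) = Phi(log y) because g is increasing."""
    return Phi(h(y))

def numeric_density(y, eps=1e-5):
    """Central-difference approximation to dF_Y/dy."""
    return (F_Y(y + eps) - F_Y(y - eps)) / (2 * eps)
```

At any $y > 0$ the two routes to the density, `f_Y(y)` and `numeric_density(y)`, should agree up to the discretization error of the finite difference.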

8.13 Example Generalize the uniform distribution (8.9) from $[0,1]$ to an arbitrary interval $[a,b]$. The transformation is linear,

$$y = a + (b - a)x, \tag{8.21}$$

so that $f_Y(y) = (b - a)^{-1}$ on $(a,b)$, by (8.20). The c.d.f. is defined on $[a,b]$ by

$$F(y) = (y - a)/(b - a). \tag{8.22}$$

Membership of the uniform family is denoted by $X \sim U[a,b]$. □

8.14 Example Linear transformations of the standard Gaussian r.v.,

$$X = \mu + \sigma Z, \quad \sigma > 0, \tag{8.23}$$

generate the Gaussian family of distributions, with p.d.f.s

$$f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < +\infty. \tag{8.24}$$

The location parameter $\mu$ and scale parameter $\sigma$ have better-known designations as moments of the distribution; see 9.4 and 9.7 below. Membership of the Gaussian family is denoted by $X \sim N(\mu,\sigma^2)$. □

8.15 Example A family of Cauchy distributions is generated from the standard Cauchy by linear transformations $X = \nu + \delta Z$, $\delta > 0$. The family of p.d.f.s with location parameter $\nu$ and scale parameter $\delta$ take the form

$$f(x;\nu,\delta) = \frac{1}{\pi\delta}\,\frac{1}{1 + \big((x - \nu)/\delta\big)^2}, \quad -\infty < x < +\infty. \ \Box \tag{8.25}$$

8.16 Example Consider the square of a standard Gaussian r.v. with $\mu = 0$ and $\sigma = 1$. Since the transformation is not monotone we cannot use 8.12 to determine the density, strictly speaking. But consider the 'half-normal' density,

$$f_{|Z|}(u) = \begin{cases} 2f(u), & u \ge 0 \\ 0, & u < 0 \end{cases} \tag{8.26}$$

where $f$ is given by (8.18). This is the p.d.f. of the absolute value of a Gaussian variable. The transformation $g(|u|) = u^2$ is 1-1, so the p.d.f. of $Z^2$ is

$$f_{Z^2}(u) = \frac{1}{\sqrt{2\pi}}\,e^{-u/2}u^{-1/2}, \quad 0 < u < \infty, \tag{8.27}$$

applying (8.20). This is the chi-squared distribution with one degree of freedom, or $\chi^2(1)$. It is a member (with $\alpha = p = \tfrac{1}{2}$) of the gamma family,

$$f(u;\alpha,p) = \frac{\alpha^p}{\Gamma(p)}\,u^{p-1}e^{-\alpha u}, \quad 0 < u < \infty,\ \alpha > 0,\ p > 0, \tag{8.28}$$

where $\Gamma(p) = \int_0^\infty \xi^{p-1}e^{-\xi}\,d\xi$ is the gamma function, having the properties $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$, and $\Gamma(n) = (n - 1)\Gamma(n - 1)$. □
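The claims in 8.16 lend themselves to a quick numerical check. The sketch below compares the density (8.27) with the gamma density (8.28) at $\alpha = p = 1/2$, and with a finite-difference derivative of $P(Z^2 \le u) = P(|Z| \le \sqrt{u})$; the function names are hypothetical and the rate parameterization of the gamma family is the one assumed in (8.28) above.

```python
from math import erf, exp, gamma, pi, sqrt

def chi2_1_pdf(u):
    """Density (8.27) of the square of a standard Gaussian, u > 0."""
    return exp(-u / 2) * u ** -0.5 / sqrt(2 * pi)

def gamma_pdf(u, alpha, p):
    """Gamma family density (8.28) with rate alpha and shape p."""
    return alpha**p / gamma(p) * u ** (p - 1) * exp(-alpha * u)

def chi2_1_cdf(u):
    """P(Z^2 <= u) = P(|Z| <= sqrt(u)) = erf(sqrt(u / 2))."""
    return erf(sqrt(u / 2))

def numeric_density(u, eps=1e-6):
    """Central-difference approximation to the derivative of the c.d.f."""
    return (chi2_1_cdf(u + eps) - chi2_1_cdf(u - eps)) / (2 * eps)
```

Evaluating the three functions at a few points $u > 0$ gives matching values, which is consistent with the identification of $\chi^2(1)$ as a gamma distribution.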

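8.4 Multivariate Distributions

In Euclidean $k$-space $\mathbb{R}^k$, the $k$-dimensional Borel field $\mathcal{B}^k$ is $\sigma(\mathcal{R}^k)$, where $\mathcal{R}^k$ denotes the measurable rectangles of $\mathbb{R}^k$, the sets of the form $B_1 \times B_2 \times \ldots \times B_k$ where $B_i \in \mathcal{B}$ for $i = 1,\ldots,k$. In a space $(\Omega,\mathcal{F},P)$, a random vector $(X_1,X_2,\ldots,X_k)' = X$ is a measurable mapping

$$X: \Omega \mapsto \mathbb{R}^k.$$

If $\mu$ is the derived measure such that $\mu(A) = P(E)$ for $A \in \mathcal{B}^k$ and $E = X^{-1}(A) \in \mathcal{F}$, the multivariate c.d.f., $F: \overline{\mathbb{R}}^k \mapsto [0,1]$, is defined for $(x_1,\ldots,x_k)' = x$ by

$$F(x) = \mu((-\infty,x_1] \times \ldots \times (-\infty,x_k]). \tag{8.29}$$

The extension proceeds much like the scalar case.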

8.17 Example Consider the random pair $(X,Y)$. Let $F(x,y) = \mu((-\infty,x] \times (-\infty,y])$. The measure of the half-open rectangle $(x, x + \Delta x] \times (y, y + \Delta y]$ is

$$\Delta F(x,y) = F(x + \Delta x, y + \Delta y) - F(x + \Delta x, y) - F(x, y + \Delta y) + F(x,y). \tag{8.30}$$

To show this, consider the four disjoint sets of $\mathbb{R}^2$ illustrated in Fig. 8.3:

$A = (x, x + \Delta x] \times (y, y + \Delta y]$, $\quad B = (-\infty,x] \times (y, y + \Delta y]$,

$C = (x, x + \Delta x] \times (-\infty,y]$, $\quad D = (-\infty,x] \times (-\infty,y]$.

$A$ is the set whose probability is sought. Since $P(A \cup B \cup C \cup D) = F(x + \Delta x, y + \Delta y)$, $P(B \cup D) = F(x, y + \Delta y)$, $P(C \cup D) = F(x + \Delta x, y)$, and $P(D) = F(x,y)$, the result is immediate from the probability axioms. □

Fig. 8.3

Extending the approach of 8.17 inductively, the measure of the $k$-dimensional rectangle $\mathop{\times}_{i=1}^k (x_i, x_i + \Delta x_i]$ can be shown to be

$$\Delta F(x_1,\ldots,x_k) = \sum_j (\pm F_j), \tag{8.31}$$

where the sum on the right has $2^k$ terms, and the $F_j$ are the values of $F$ at each of the vertices of the $k$-dimensional rectangle extending from $(x_1,\ldots,x_k)'$ with sides of length $\Delta x_i$, $i = 1,\ldots,k$. The sign pattern depends on $k$; if $k$ is odd, the $F_j$ having as arguments even numbers of upper vertices (points of the form $x_i + \Delta x_i$) take negative signs, and the others positive; while if $k$ is even, the $F_j$ with odd numbers of upper vertices as arguments are negative. Generalizing the monotonicity of the univariate c.d.f., $F$ must satisfy the condition that $\Delta F(x_1,\ldots,x_k)$ be non-negative for every choice of $(x_1,\ldots,x_k)' \in \mathbb{R}^k$ and $(\Delta x_1,\ldots,\Delta x_k)' \in (\mathbb{R}^k)^+$. Applying 3.19 inductively shows that the class of $k$-dimensional half-open rectangles is a semi-ring, so that the measure defined by $F$ extends to the sets of $\mathcal{B}^k$; hence $(\mathbb{R}^k,\mathcal{B}^k,\mu)$ is a probability space derived from $(\Omega,\mathcal{F},P)$.

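If the distribution is continuous with p.d.f. $f(x)$, Fubini's theorem gives

$$F(x) = \int_{(-\infty,x_1] \times \ldots \times (-\infty,x_k]} f(\xi_1,\ldots,\xi_k)\,d\xi_1 \ldots d\xi_k = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f(\xi_1,\ldots,\xi_k)\,d\xi_1 \ldots d\xi_k. \tag{8.32}$$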

Theorem 8.12 has the following generalization. A diffeomorphism (also, coordinate transformation) is a function $g: S \mapsto T$ ($S$ and $T$ open subsets of $\mathbb{R}^k$) which is 1-1 onto and continuously differentiable with $\det(\partial g/\partial x') \neq 0$ for all $x \in S$, where $\partial g/\partial x'$ is the Jacobian matrix whose $(i,j)$th element is $\partial g_j/\partial x_i$ for $i,j = 1,\ldots,k$. The inverse of a diffeomorphism is also continuously differentiable.

8.18 Theorem If $Y = g(X)$ where $g$ is a diffeomorphism, the p.d.f. of $Y$ is

$$f_Y(y) = f_X(g^{-1}(y))\,|J|, \tag{8.33}$$

where $J = \det(\partial g^{-1}/\partial y')$. □

This is a standard result in the theory of multiple Lebesgue integrals (see e.g. Apostol 1974: 15.10-15.12).

8.19 Example Letting $f$ denote the standard Gaussian p.d.f. (see 8.10), consider

$$\varphi(z) = \prod_{i=1}^k f(z_i) = (2\pi)^{-k/2}\exp\{-\tfrac{1}{2}z'z\}. \tag{8.34}$$

This is a $k$-dimensional p.d.f., and the corresponding random vector $Z = (Z_1,\ldots,Z_k)'$ is the standard Gaussian vector. The affine transformation

$$X = AZ + \mu, \tag{8.35}$$

where $A$ ($k \times k$ nonsingular) and $\mu$ ($k \times 1$) are constants, is 1-1 continuous with inverse $Z = A^{-1}(X - \mu)$, having $J = |A^{-1}| = 1/|A|$. Define $\Sigma = AA'$ such that $(A^{-1})'A^{-1} = (AA')^{-1} = \Sigma^{-1}$ and $\big|\,|A^{-1}|\,\big| = |\Sigma|^{-1/2}$, the positive square root being understood. Applying 8.18 produces

$$f(x) = \varphi(A^{-1}(x - \mu))\,\big|\,|A^{-1}|\,\big| = (2\pi)^{-k/2}\,\big|\,|A^{-1}|\,\big|\exp\{-\tfrac{1}{2}(x - \mu)'(A^{-1})'A^{-1}(x - \mu)\} = (2\pi)^{-k/2}|\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\}. \tag{8.36}$$

This is the multinormal p.d.f., depending on parameters $\mu$ and $\Sigma$. Every such distribution is generated by an affine transform applied to $Z$. Membership of the multinormal family is denoted $X \sim N(\mu,\Sigma)$. □

8.5 Independent Random Variables

Suppose that, out of a pair of r.v.s $(X,Y)$ on $(\mathbb{R}^2,\mathcal{B}^2,\mu)$, we are interested only in predicting $X$. In this situation the events of interest are the cylinder sets in $\mathbb{R}^2$, having the form $B \times \mathbb{R}$, $B \in \mathcal{B}$. The marginal distribution of $X$ is defined by $(\mathbb{R},\mathcal{B},\mu_X)$ where

$$\mu_X(A) = \mu(A \times \mathbb{R}) \tag{8.37}$$

for $A \in \mathcal{B}$. The associated marginal c.d.f. is $F_X(x) = F(x,+\infty)$.

The notion of independence defined in §7.4 specializes in the following way. $X$ and $Y$ are called independent r.v.s iff

$$\mu(A \times B) = \mu_X(A)\mu_Y(B) \tag{8.38}$$

for all pairs of events $A,B \in \mathcal{B}$, where $\mu_X$ is defined by (8.37) and $\mu_Y$ is analogous. Equivalently, $\mu$ is the product measure generated by $\mu_X$ and $\mu_Y$.

8.20 Theorem $X$ and $Y$ are independent iff for each $x,y \in \mathbb{R}$

$$F(x,y) = F_X(x)F_Y(y). \tag{8.39}$$

If the distribution is continuous the p.d.f. factorizes as

$$f(x,y) = f_X(x)f_Y(y). \ \Box \tag{8.40}$$

Proof Obviously, (8.39) holds whenever $\mu$ satisfies (8.38): take $A = (-\infty,x]$ and $B = (-\infty,y]$. The problem is to show that (8.39) is also sufficient for (8.38). Consider the half-open rectangles,

$$\mathcal{C} = \{(a,b] \times (c,d],\ a,c \in \overline{\mathbb{R}},\ b,d \in \mathbb{R}\}.$$

If and only if (8.39) holds,

$$\mu((a,b] \times (c,d]) = F(b,d) - F(b,c) - F(a,d) + F(a,c) = (F_X(b) - F_X(a))(F_Y(d) - F_Y(c)) = \mu_X((a,b])\mu_Y((c,d]), \tag{8.41}$$

where the first equality is by 8.17. $\mathcal{C}$ is a determining class for $(\mathbb{R}^2,\mathcal{B}^2)$, and $\mu$ is defined by the extension of the measure satisfying (8.41) for ordinary rectangles or, equivalently, satisfying (8.39). The Extension Theorem (uniqueness part) shows that this is identical with the product measure satisfying (8.38). The extension to p.d.f.s follows directly from the definition. $\blacksquare$

With more than two variables there are alternative independence concepts (compare §7.3). Variables $X_1,\ldots,X_k$ distributed on the space $(\mathbb{R}^k,\mathcal{B}^k,\mu)$ are said to be totally independent if

$$\mu\Big(\mathop{\times}_{i=1}^k A_i\Big) = \prod_{i=1}^k \mu_{X_i}(A_i) \tag{8.42}$$

for all $k$-tuples of events $A_1,\ldots,A_k \in \mathcal{B}$. By contrast, pairwise independence can hold between each pair $X_i, X_j$ without implying total independence of the set. Another way to think of total independence is in terms of a partitioning of a vector $X = (X_1,\ldots,X_k)'$ into subvectors $X_1$ ($j \times 1$) and $X_2$ ($(k - j) \times 1$) for $0 < j < k$. Under total independence, the measure of $X$ is always expressible as the product measure of the two subvectors, under all orderings and partitionings of the elements.


9 Expectations

9.1 Averages and Integrals

When it exists, the expectation, or mean, of a r.v. $X(\omega)$ in a probability space $(\Omega,\mathcal{F},P)$ is the integral

$$E(X) = \int_\Omega X(\omega)\,dP(\omega). \tag{9.1}$$

$E(X)$ measures the central tendency of the distribution of $X$. It is sometimes identified with the limiting value of the sample average of realized values $x_t$ drawn in $n$ identical random experiments,

$$\bar{x}_n = \frac{1}{n}\sum_{t=1}^n x_t, \tag{9.2}$$

as $n$ becomes large. However, the validity of this hypothesis depends on the method of repeating the experiment. See Part IV for the details, but suffice it to say at this point that the equivalence certainly holds if $E(X)$ exists and the random experiments are independent of one another.

The connection is most evident for simple random variables. If $X = \sum_j x_j 1_{E_j}$ where the $\{E_j\}$ are a partition of $\Omega$, then by 4.4,

$$E(X) = \sum_j x_j P(E_j). \tag{9.3}$$

When the probabilities are interpreted as relative frequencies of the events $E_j = \{\omega: X(\omega) = x_j\}$ in a large number of drawings from the distribution, (9.2) with large $n$ should approximate (9.3). The values $x_j$ will appear in the sum in a proportion roughly equal to their probability of occurrence.
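The relation between (9.2) and (9.3) can be illustrated by simulation. The sketch below assumes a fair six-sided die as the simple random variable and uses independent pseudo-random draws with a fixed seed; the names `values`, `probs` and `sample_mean` are illustrative.

```python
import random
from fractions import Fraction

# Assumed simple r.v.: the score on a fair die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# Expectation from formula (9.3): sum of x_j * P(E_j).
expectation = sum(x * p for x, p in zip(values, probs))

rng = random.Random(12345)   # fixed seed so the run is reproducible

def sample_mean(n):
    """Sample average (9.2) of n independent draws."""
    draws = rng.choices(values, weights=[float(p) for p in probs], k=n)
    return sum(draws) / n

xbar = sample_mean(200_000)
```

With independent draws and a large $n$, `xbar` lands close to `expectation`; how close, and under what sampling schemes, is the subject the text defers to Part IV.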

$E(X)$ has a dual characterization, as an abstract integral on the parent probability space and as a Lebesgue-Stieltjes integral on the line, under the derived distribution. It is equally correct to write either (9.1) or

$$E(X) = \int_{-\infty}^{+\infty} x\,dF(x). \tag{9.4}$$

Which of these representations is adopted is mainly a matter of convenience. If $1_A(\omega)$ is the indicator function of a set $A \in \mathcal{F}$, then

$$E(1_A X) = \int_A X\,dP = \int_{X(A)} x\,dF(x), \tag{9.5}$$

where $X(A) \in \mathcal{B}$ is the image of $A$ under $X$. Here the abstract integral is obviously the more direct and simple representation, but by the same token, the Stieltjes form is the natural way to represent integration over a set in $\mathcal{B}$.

If the distribution is discrete, $X$ is a simple function and the formula in (9.3) applies directly. Under the derived distribution,

$$E(X) = \sum_j x_j\,\mu(\{x_j\}), \tag{9.6}$$

where $x_j$, $j = 1,2,\ldots$, are the atoms of the distribution.

9.1 Example If $X$ is a Bernoulli variable (8.6), $E(X) = 1\cdot p + 0\cdot(1-p) = p$. □

9.2 Example If $X$ is Poisson (8.8),

$$E(X) = e^{-\lambda}\sum_{x=0}^{\infty} x\frac{\lambda^x}{x!} = \lambda e^{-\lambda}\sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!} = \lambda. \quad □ \tag{9.7}$$

For a continuous distribution, the Lebesgue-Stieltjes integral of $x$ coincides with the integral in ordinary Lebesgue measure of the function $xf(x)$.

9.3 Example For the uniform distribution on the interval $[a,b]$ (8.13),

$$E(X) = \frac{1}{b-a}\int_a^b x\,dx = \tfrac{1}{2}(a+b). \quad □ \tag{9.8}$$

9.4 Example For the Gaussian family (8.19),

$$E(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} x\,e^{-(x-\mu)^2/2\sigma^2}\,dx = \mu. \tag{9.9}$$

This can be shown by integration by parts, but for a neater proof see 11.8. □

In a mixed continuous-discrete distribution with atoms $x_1,x_2,\ldots$, we can use the decomposition $F = F_1 + F_2$ where $F_1(x) = \sum_{x_j \le x}\mu(\{x_j\})$ and $F_2(x)$ is absolutely continuous with derivative $f_2(x)$. Then

$$E(X) = \sum_j x_j\,\mu(\{x_j\}) + \int x f_2(x)\,dx. \tag{9.10}$$

The set of atoms has Lebesgue measure zero in $\mathbb{R}$, so there is no need to exclude these from the integral on the right-hand side of (9.10).

Some random variables do not have an expectation.

9.5 Example Recall the condition for integrability in (4.15), and note that for the Cauchy distribution (8.11),

$$\frac{1}{\pi}\int_{-a}^{+a}\frac{|x|}{1+x^2}\,dx \to \infty \quad \text{as } a \to \infty. \quad □ \tag{9.11}$$
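The divergence in (9.11) can be seen numerically. The sketch below is an illustration only: it compares a midpoint-rule approximation of the truncated integral with the closed form $\pi^{-1}\log(1+a^2)$, which follows from the antiderivative of $x/(1+x^2)$. The function names and the step count are hypothetical choices.

```python
import math

def truncated_abs_moment(a, steps=200_000):
    """Midpoint-rule value of (1/pi) * integral over [-a, a] of |x| / (1 + x^2)."""
    h = a / steps
    # The integrand is even, so integrate over [0, a] and double.
    total = sum(((i + 0.5) * h) / (1.0 + ((i + 0.5) * h) ** 2) for i in range(steps))
    return 2.0 * total * h / math.pi

def closed_form(a):
    # The antiderivative of x / (1 + x^2) is 0.5 * log(1 + x^2).
    return math.log(1.0 + a * a) / math.pi

if __name__ == "__main__":
    for a in (1.0, 1e3, 1e6):
        print(a, closed_form(a))  # grows without bound, like log(a)
```

The growth is only logarithmic in $a$, so the failure of integrability is slow but unmistakable.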


9.2 Expectations of Functions of X

If $X$ is a r.v. on the probability space $(\mathbb{R},\mathcal{B},\mu)$, and $g: \mathbb{R} \mapsto \mathbb{R}$ is a Borel function, $g \circ X = g(X)$ is a r.v. on the space $(\mathbb{R},\mathcal{B},\mu g^{-1})$, as noted in §8.1. This leads to the following dual characterization of the expectation of a function.

9.6 Theorem If $g$ is a Borel function,

$$E(g(X)) = \int g(x)\,d\mu(x) = \int y\,d\mu g^{-1}(y). \tag{9.12}$$

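Proof Define a sequence of simple functions $Z_{(n)}: \mathbb{R}^+ \mapsto \mathbb{R}^+$ by

$$Z_{(n)}(x) = \sum_{i=1}^{m}\frac{i-1}{2^n}1_{B_i}(x), \tag{9.13}$$

where $m = n2^n + 1$ and $B_i = [2^{-n}(i-1),\ 2^{-n}i)$ for $i = 1,\ldots,m$. Then, $Z_{(n)}(x) \uparrow x$ for $x \ge 0$, by arguments paralleling 3.28. According to 3.21, $(\mathbb{R},\mathcal{B},\mu g^{-1})$ is a measure space where $\mu g^{-1}(B) = \mu(g^{-1}(B))$ for $B \in \mathcal{B}$, and so by the monotone convergence theorem,

$$\int Z_{(n)}(y)\,d\mu g^{-1}(y) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu g^{-1}(B_i) \to \int_0^{\infty} y\,d\mu g^{-1}(y). \tag{9.14}$$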

Consider first the case of non-negative $g$. Let $1_B(x)$ be the indicator of the set $B \in \mathcal{B}$, and then if $g$ is Borel, so is the composite function

$$(1_B \circ g)(x) = \begin{cases} 1, & g(x) \in B \\ 0, & g(x) \notin B \end{cases} \;=\; 1_{g^{-1}(B)}(x). \tag{9.15}$$

Hence, consider the simple function

$$(Z_{(n)} \circ g)(x) = \sum_{i=1}^{m}\frac{i-1}{2^n}(1_{B_i} \circ g)(x) = \sum_{i=1}^{m}\frac{i-1}{2^n}1_{g^{-1}(B_i)}(x). \tag{9.16}$$

By the same arguments as before, $Z_{(n)} \circ g \uparrow g$, and $E(Z_{(n)} \circ g) \to E(g) = \int g\,d\mu$. However,

$$E(Z_{(n)} \circ g) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu(g^{-1}(B_i)) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu g^{-1}(B_i) = \int Z_{(n)}(y)\,d\mu g^{-1}(y), \tag{9.17}$$

and (9.12) follows from (9.14).

To extend the result to general $g$, consider the non-negative functions $g^+ = \max\{g,0\}$ and $g^- = g^+ - g$ separately. It is immediate that


$$E(Z_{(n)} \circ g^+) - E(Z_{(n)} \circ g^-) \to E(g^+) - E(g^-) = E(g), \tag{9.18}$$

so consider each component of this limit separately.

$$E(Z_{(n)} \circ g^+) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu((g^+)^{-1}(B_i)) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu(g^{-1}(B_i)) \to \int_0^{\infty} y\,d\mu g^{-1}(y), \tag{9.19}$$

where the second equality holds because $(g^+)^{-1}(B_i) = g^{-1}(B_i)$ for $i \ge 2$ since the elements of $B_i$ are all positive for these cases, whereas for $i = 1$ the term disappears. Similarly, $-Z_{(n)}(-x) \downarrow x$ for $x < 0$, and

$$-E(Z_{(n)} \circ g^-) = -\sum_{i=1}^{m}\frac{i-1}{2^n}\mu((g^-)^{-1}(B_i)) = -\sum_{i=1}^{m}\frac{i-1}{2^n}\mu(g^{-1}(-B_i)) \to \int_{-\infty}^{0} y\,d\mu g^{-1}(y), \tag{9.20}$$

where $-B_i = (-2^{-n}i,\ -2^{-n}(i-1)]$, and in this case the second equality holds because $(g^-)^{-1}(B_i) = g^{-1}(-B_i)$ for $i \ge 2$. Hence

$$E(Z_{(n)} \circ g^+) - E(Z_{(n)} \circ g^-) \to \int_{-\infty}^{\infty} y\,d\mu g^{-1}(y), \tag{9.21}$$

and the theorem follows in view of (9.18). ∎

The quantities $E(X^k)$, for integer $k \ge 1$, are called the moments of the distribution of $X$, and for $k > 1$ the central moments are defined by

$$E(X - E(X))^k = \sum_{j=0}^{k}\binom{k}{j}E(X^j)(-E(X))^{k-j}. \tag{9.22}$$

A familiar case is the variance, $\mathrm{Var}(X) = E(X - E(X))^2 = E(X^2) - E(X)^2$, the usual measure of dispersion about the mean. When the distribution is symmetric, with $P(X - E(X) \in A) = P(E(X) - X \in A)$ for each $A \in \mathcal{B}$, the odd-order central moments are all zero.

9.7 Example For the Gaussian case (8.14), the central moments are

$$E(X - \mu)^k = \begin{cases} \dfrac{k!\,\sigma^k}{2^{k/2}(k/2)!}, & k \text{ even}, \\[1ex] 0, & k \text{ odd}. \end{cases} \tag{9.23}$$

This formula may be derived after some manipulation from equation (11.22) below. $\mathrm{Var}(X) = \sigma^2$, and all the finite-order moments exist although the sequence increases monotonically. □
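Formula (9.23) can be checked against direct numerical integration of $z^k$ times the Gaussian density. This is an illustrative sketch, not the derivation the text refers to: the midpoint rule, the truncation at twelve standard deviations, and both function names are assumptions made here.

```python
import math

def gaussian_central_moment(k, sigma):
    """Closed form (9.23): k! sigma^k / (2^(k/2) (k/2)!) for even k, 0 for odd k."""
    if k % 2 == 1:
        return 0.0
    half = k // 2
    return math.factorial(k) * sigma ** k / (2 ** half * math.factorial(half))

def numeric_central_moment(k, sigma, width=12.0, steps=200_000):
    """Midpoint integral of z^k times the N(0, sigma^2) density on [-width*sigma, width*sigma]."""
    lo = -width * sigma
    h = 2.0 * width * sigma / steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += z ** k * math.exp(-z * z / (2.0 * sigma * sigma))
    return norm * total * h

if __name__ == "__main__":
    for k in range(1, 7):
        print(k, gaussian_central_moment(k, 1.5), numeric_central_moment(k, 1.5))
```

For $k = 2, 4, 6$ the closed form gives $\sigma^2$, $3\sigma^4$ and $15\sigma^6$.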


The existence of a moment of given order requires the existence of the corresponding absolute moment. If $E|X|^p < \infty$, for any real $p > 0$, $X$ is sometimes said to belong to the set $L_p$ (of functions Lebesgue-integrable to order $p$), or otherwise, to be $L_p$-bounded.

9.8 Example For $X \sim N(0,\sigma^2)$, we have, by (8.26),

$$E|X| = 2(2\pi\sigma^2)^{-1/2}\int_0^{\infty} x\,e^{-x^2/2\sigma^2}\,dx = (2/\pi)^{1/2}\sigma. \quad □ \tag{9.24}$$

Taking the corresponding root of the absolute moment is convenient for purposes of comparison (see 9.23) and for $X \in L_p$, the $L_p$-norm of $X$ is defined as

$$\|X\|_p = (E|X|^p)^{1/p}. \tag{9.25}$$

The Gaussian distribution possesses all finite-order moments according to (9.23), but its support is none the less the whole of $\mathbb{R}$, and its $p$-norms are not uniformly bounded. If $\|X\|_p$ has a finite limit as $p \to \infty$, it coincides with the essential supremum of $X$, so that a random variable belonging to $L_\infty$ is bounded almost surely.

9.3 Theorems for the Probabilist's Toolbox

The following inequalities for expected values are exploited in the proof of innumerable theorems in probability. The first is better known as Chebyshev's inequality for the special case $p = 2$.

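9.9 Markov's inequality For $\varepsilon > 0$ and $p > 0$,

$$P(|X| \ge \varepsilon) \le \frac{E|X|^p}{\varepsilon^p}. \tag{9.26}$$

Proof $\varepsilon^p P(|X| \ge \varepsilon) = \varepsilon^p\int_{|x| \ge \varepsilon} dF(x) \le \int_{|x| \ge \varepsilon}|x|^p\,dF(x) \le E|X|^p$. ∎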

This inequality does not bind unless $E|X|^p/\varepsilon^p < 1$, but it shows that if $E|X|^p < \infty$, the tail probabilities converge to zero at the rate $\varepsilon^{-p}$ as $\varepsilon \to \infty$. The order of $L_p$-boundedness measures the tendency of a distribution to generate outliers. The Markov inequality is a special case of (at least) two more general inequalities.
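To see how loose or tight the Markov bound can be, one can compare it with an exact tail probability. The sketch below assumes, purely for illustration, a unit exponential variable, for which $P(X \ge \varepsilon) = e^{-\varepsilon}$ and $E(X^p) = \Gamma(p+1)$; neither the distribution nor the function names come from the text.

```python
import math

def tail_probability(eps):
    # P(X >= eps) for a unit exponential variable.
    return math.exp(-eps)

def markov_bound(eps, p):
    # E|X|^p / eps^p, using E(X^p) = Gamma(p + 1) for the unit exponential.
    return math.gamma(p + 1.0) / eps ** p

if __name__ == "__main__":
    for eps in (0.5, 2.0, 10.0):
        for p in (1.0, 2.0, 4.0):
            print(eps, p, tail_probability(eps), markov_bound(eps, p))
```

For small $\varepsilon$ the bound exceeds 1 and is uninformative, while for large $\varepsilon$ a higher order $p$ gives the sharper bound.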

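9.10 Corollary For any event $A \in \mathcal{F}$,

$$\varepsilon^p\int_{A \cap \{|X| \ge \varepsilon\}} dP \le \int_A |X|^p\,dP. \tag{9.27}$$

Equivalently, $P(\{\omega: |X(\omega)| \ge \varepsilon\} \cap A) \le E(1_A|X|^p)/\varepsilon^p$.

Proof Obvious from 9.9. ∎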

9.11 Corollary Let $g: \mathbb{R} \mapsto \mathbb{R}$ be a function with the property that $x \ge a$ implies $g(x) \ge g(a) > 0$, for a given constant $a$. Then

$$P(X \ge a) \le \frac{E(g(X))}{g(a)}. \tag{9.28}$$

Proof $g(a)P(X \ge a) = g(a)\int_{x \ge a} dF(x) \le \int_{x \ge a} g(x)\,dF(x) \le E(g(X))$. ∎

An increasing function has the requisite property for all $a > 0$.

Let $I \subseteq \mathbb{R}$ be any interval. A function $\phi: I \mapsto \mathbb{R}$ is said to be convex on $I$ if

$$\phi((1-\lambda)x + \lambda y) \le (1-\lambda)\phi(x) + \lambda\phi(y) \tag{9.29}$$

for all $x, y \in I$ and $\lambda \in [0,1]$. If $-\phi$ is convex on $I$, $\phi$ is concave on $I$.

9.12 Jensen's inequality If a Borel function $\phi$ is convex on an interval $I$ containing the support of an integrable r.v. $X$, where $\phi(X)$ is also integrable,

$$\phi(E(X)) \le E(\phi(X)). \tag{9.30}$$

For a concave function the reverse inequality holds. □

The intuition here is easily grasped by thinking about a binary r.v. taking values $x_1$ with probability $p$ and $x_2$ with probability $1-p$. A convex $\phi$ is illustrated in Fig. 9.1. $E(X) = px_1 + (1-p)x_2$, whereas $E(\phi(X)) = p\phi(x_1) + (1-p)\phi(x_2)$. This point is mapped from $E(X)$ onto the vertical axis by the chord joining $x_1$ and $x_2$ on $\phi$, while $\phi(E(X))$ is mapped from the same point by $\phi$ itself.

Fig. 9.1
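The binary case just described is easy to compute directly. The following minimal sketch evaluates the gap $E(\phi(X)) - \phi(E(X))$ for a two-point variable; the function name `jensen_gap` and the particular values used are hypothetical illustrations.

```python
import math

def jensen_gap(phi, x1, x2, p):
    """Return E(phi(X)) - phi(E(X)) for X = x1 with probability p and x2 with probability 1 - p."""
    mean = p * x1 + (1.0 - p) * x2                      # E(X)
    mean_of_phi = p * phi(x1) + (1.0 - p) * phi(x2)     # E(phi(X)), the point on the chord
    return mean_of_phi - phi(mean)

if __name__ == "__main__":
    print(jensen_gap(math.exp, -1.0, 2.0, 0.3))             # convex: gap is non-negative
    print(jensen_gap(math.log, 1.0, 4.0, 0.5))              # concave: gap is non-positive
    print(jensen_gap(lambda x: x * x, -1.0, 3.0, 0.25))     # equals Var(X) for the square
```

For $\phi(x) = x^2$ the gap is exactly the variance of $X$, which is one way to remember the direction of the inequality.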

A proof of the inequality is obtained from the following lemma. Let $I^{\circ}$ denote the interior of $I$.

9.13 Lemma If $\phi$ is convex there exists a function $A(x)$ such that, for all $x \in I^{\circ}$ and $y \in I$,

$$A(x)(y - x) \le \phi(y) - \phi(x). \tag{9.31}$$

Proof A convex function possesses right and left derivatives at all points of $I^{\circ}$.

This follows because (9.29) implies for $h > 0$ that

$$\frac{\phi(x+\lambda h) - \phi(x)}{\lambda h} \le \frac{\phi(x+h) - \phi(x)}{h}, \quad \lambda \in (0,1]. \tag{9.32}$$

The sequence $\{n(\phi(x+1/n) - \phi(x)),\ n \in \mathbb{N}\}$ is decreasing, and has a limit $\phi'_+(x)$. In the case $h < 0$ the inequality in (9.32) is reversed, showing the existence of $\phi'_-(x)$ as the limit of an increasing sequence. Note that $\phi'_-(x) \le \phi'_+(x)$. Taking the limit as $\lambda \downarrow 0$ with $h > 0$ fixed in (9.32) and $y = x + h$ gives

$$\phi'_-(x)(y-x) \le \phi'_+(x)(y-x) \le \phi(y) - \phi(x), \tag{9.33}$$

whereas the parallel argument with $h < 0$ gives, for $y < x$,

$$\phi'_+(x)(y-x) \le \phi'_-(x)(y-x) \le \phi(y) - \phi(x). \tag{9.34}$$

Inequality (9.31) is therefore satisfied with (say) $A(x) = \phi'_+(x)$. ∎

Proof of 9.12 Set $x = E(X)$, $y = X$ in (9.31) to give

$$A(E(X))(X - E(X)) \le \phi(X) - \phi(E(X)). \tag{9.35}$$

Taking expectations of both sides gives inequality (9.30), since the left-hand side has expectation zero. ∎

Next, we have an alternative approach to bounding tail probabilities which yields the Markov inequality as a corollary.

9.14 Theorem If $X$ is a non-negative r.v. and $r > 0$,

$$E(X^r) = r\int_0^{\infty} x^{r-1}P(X > x)\,dx. \tag{9.36}$$

Proof Integration by parts gives, for some $b > 0$,

$$\int_0^b x^r\,dF(x) = b^rF(b) - r\int_0^b x^{r-1}F(x)\,dx = r\int_0^b x^{r-1}(F(b) - F(x))\,dx = r\int_0^b x^{r-1}P(x < X \le b)\,dx. \tag{9.37}$$

The theorem follows on letting $b$ tend to infinity. ∎

If the left-hand side of (9.37) diverges, so does the right, and in this sense the theorem is true whether or not $E(X^r)$ is finite.
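Identity (9.36) lends itself to a quick numerical check. The sketch below assumes a unit exponential variable, so that $P(X > x) = e^{-x}$ and $E(X^r) = \Gamma(r+1)$; the distribution, the truncation point `upper` and the midpoint rule are illustrative choices, not part of the theorem.

```python
import math

def moment_via_tail(r, upper=60.0, steps=600_000):
    """Midpoint approximation to r * integral over [0, upper] of x^(r-1) P(X > x), X unit exponential."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (r - 1.0) * math.exp(-x)  # x^(r-1) times the tail probability
    return r * total * h

if __name__ == "__main__":
    for r in (1.0, 2.0, 3.5):
        print(r, moment_via_tail(r), math.gamma(r + 1.0))
```

The two columns agree closely for $r \ge 1$; for $r < 1$ the integrand is unbounded at the origin and a plain midpoint rule would need more care.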

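9.15 Corollary If $X$ is non-negative and integrable,

$$\int_{\{X \ge \varepsilon\}} X\,dP = \varepsilon P(X \ge \varepsilon) + \int_{\varepsilon}^{\infty} P(X > x)\,dx. \tag{9.38}$$

Proof Apply 9.14 with $r = 1$ to the r.v. $1_{\{X \ge \varepsilon\}}X$. This gives

$$\int_{\{X \ge \varepsilon\}} X\,dP = \int_0^{\infty} P(1_{\{X \ge \varepsilon\}}X > x)\,dx = \int_0^{\varepsilon} P(1_{\{X \ge \varepsilon\}}X > x)\,dx + \int_{\varepsilon}^{\infty} P(X > x)\,dx = P(X \ge \varepsilon)\int_0^{\varepsilon} dx + \int_{\varepsilon}^{\infty} P(X > x)\,dx. \quad ∎ \tag{9.39}$$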

Not only does (9.38) give the Markov inequality on replacing non-negative $X$ by $|X|^p$ for $p > 0$ and arbitrary $X$, but the error in the Markov estimate of the tail probability is neatly quantified. Noting that $P(|X| \ge \varepsilon) = P(|X|^p \ge \varepsilon^p)$,

$$\varepsilon^p P(|X| \ge \varepsilon) = \int_{\{|X| \ge \varepsilon\}}|X|^p\,dP - \int_{\varepsilon^p}^{\infty} P(|X|^p > x)\,dx = E|X|^p - \int_{\{|X| < \varepsilon\}}|X|^p\,dP - \int_{\varepsilon^p}^{\infty} P(|X|^p > x)\,dx, \tag{9.40}$$

where both the subtracted terms on the right-hand side are non-negative.

9.4 Multivariate Distributions

From one point of view, the integral of a function of two or more random variables presents no special problems. For example, if $g: \mathbb{R}^2 \mapsto \mathbb{R}$ is Borel-measurable, meaning in this case that $g^{-1}(B) \in \mathcal{B}^2$ for every $B \in \mathcal{B}$, then $h(\omega) = g(X(\omega),Y(\omega))$ is just a $\mathcal{F}/\mathcal{B}$-measurable r.v., and

$$E(g(X,Y)) = \int_\Omega h(\omega)\,dP(\omega) \tag{9.41}$$

is its expectation, which involves no new ideas apart from the particular way in which the r.v. $h(\omega)$ happens to be defined.

Alternatively, the Lebesgue-Stieltjes form is

$$E(g(X,Y)) = \int_{\mathbb{R}^2} g(x,y)\,dF(x,y), \tag{9.42}$$

where $dF(x,y)$ is to be thought of as the limiting case of $\Delta F(x,y)$ defined in (8.30) as the rectangle tends to the differential of area. When the distribution is continuous, the integral is the ordinary integral of $g(x,y)f(x,y)$ with respect to Lebesgue product measure. According to Fubini's theorem, it is equivalent to an iterated integral, and may be written

$$E(g(X,Y)) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x,y)f(x,y)\,dy\,dx. \tag{9.43}$$

But caution must be exercised with formula (9.42) because this is not a double integral in general. It might seem more appropriate to write $d^2F(x,y)$ instead of $dF(x,y)$, but except in the continuous case this would not be correct. The abstract notation of (9.41) is often preferable, because it avoids these ambiguities.

In spite of these caveats, the expectation of a function of (say) $X$ alone can in every case be constructed with respect to either the marginal distribution or the joint distribution.

9.16 Theorem $E(g(X)) = \int_{\mathbb{R}^2} g(x)\,dF(x,y) = \int_{\mathbb{R}} g(x)\,dF_X(x)$.

Proof Define a function $g^*: \mathbb{R}^2 \mapsto \mathbb{R}$ by setting $g^*(x,y) = g(x)$, all $y \in \mathbb{R}$. $g^{*-1}(B)$ is a cylinder in $\mathbb{R}^2$ with base $g^{-1}(B) \in \mathcal{B}$ for $B \in \mathcal{B}$, and $g^*$ is $\mathcal{B}^2/\mathcal{B}$-measurable. For non-negative $g$, let

$$g^*_{(n)} = \sum_{i=1}^{m}\frac{i-1}{2^n}1_{E_i}, \tag{9.44}$$

where $m = n2^n + 1$ and $E_i = \{(x,y): 2^{-n}(i-1) \le g^*(x,y) < 2^{-n}i\} \in \mathcal{B}^2$. Since $E_i = A_i \times \mathbb{R}$ where $A_i = \{x: 2^{-n}(i-1) \le g(x) < 2^{-n}i\}$, and $\mu_X(A_i) = \mu(E_i)$,

$$E(g^*_{(n)}) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu(E_i) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu_X(A_i) = E(g_{(n)}). \tag{9.45}$$

By the monotone convergence theorem the left and right-hand members of (9.45) converge to $E(g^*) = \int g^*(x,y)\,dF(x,y)$ and $E(g) = \int g(x)\,dF_X(x)$ respectively. Extend from non-negative to general $g$ to complete the proof. ∎

The means and variances of $X$ and $Y$ are the leading cases of this result. We also have cross moments, and in particular, the covariance of $X$ and $Y$ is

$$\mathrm{Cov}(X,Y) = E((X - E(X))(Y - E(Y))) = E(XY) - E(X)E(Y). \tag{9.46}$$

Fubini's theorem suggests a characterization of pairwise independence:

9.17 Theorem If $X$ and $Y$ are independent r.v.s, $\mathrm{Cov}(\phi(X),\psi(Y)) = 0$ for all pairs of integrable Borel functions $\phi$ and $\psi$.

Proof Fubini's theorem gives

$$E(\phi(X)\psi(Y)) = \int_{\mathbb{R}^2}\phi(x)\psi(y)\,dF(x,y) = \int_{\mathbb{R}}\phi(x)\,dF_X(x)\int_{\mathbb{R}}\psi(y)\,dF_Y(y) = E(\phi(X))E(\psi(Y)). \quad ∎ \tag{9.47}$$

The condition is actually sufficient as well as necessary for independence, although this cannot be shown using the present approach; see 10.25 below.
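For discrete variables the factorization in (9.47) can be verified by direct summation. The sketch below assumes two independent variables with small, made-up probability mass functions; the function name `covariance_of_transforms` and the dictionaries are hypothetical.

```python
import math

def covariance_of_transforms(px, py, phi, psi):
    """Cov(phi(X), psi(Y)) for independent X, Y with pmfs px, py (dicts: value -> probability)."""
    e_phi = sum(p * phi(x) for x, p in px.items())
    e_psi = sum(q * psi(y) for y, q in py.items())
    # Under independence the joint pmf is the product of the marginals.
    e_prod = sum(p * q * phi(x) * psi(y) for x, p in px.items() for y, q in py.items())
    return e_prod - e_phi * e_psi

if __name__ == "__main__":
    px = {-1.0: 0.2, 0.0: 0.5, 2.0: 0.3}
    py = {1.0: 0.4, 3.0: 0.6}
    print(covariance_of_transforms(px, py, math.exp, lambda y: y ** 3))  # zero up to rounding
```

Any pair of functions can be substituted for `phi` and `psi`; the covariance stays at zero up to rounding because the double sum factorizes.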


Extending from the bivariate to the general k-dimensional case adds nothing of substance to the above, and is mainly a matter of appropriate notation. If $X$ is a random k-vector,

$$E(X) = \int x\,dF(x) \tag{9.48}$$

denotes the k-vector of expectations, $E(X_i)$ for $i = 1,\ldots,k$. The variance of a scalar r.v. generalizes to the covariance matrix of a random vector. The $k \times k$ matrix,

$$XX' = \begin{bmatrix} X_1^2 & X_1X_2 & \cdots & X_1X_k \\ X_2X_1 & X_2^2 & & \vdots \\ \vdots & & \ddots & \\ X_kX_1 & \cdots & & X_k^2 \end{bmatrix} \tag{9.49}$$

is called the outer product of $X$, and $E(XX')$ is the $k \times k$ positive semi-definite matrix whose elements are the expectations of the elements of $XX'$. The covariance matrix of $X$ is

$$\mathrm{Var}(X) = E[(X - E(X))(X - E(X))'] = E(XX') - E(X)E(X)'. \tag{9.50}$$

$\mathrm{Var}(X)$ is positive semi-definite, generalizing the non-negative property of a scalar variance. It is of full rank (notwithstanding that $XX'$ has rank 1) unless an element of $X$ is an exact linear function of the remainder. The following generalizes 4.7, the proof being essentially an exercise in interpreting the matrix formulae.

9.18 Theorem If $Y = BX + c$ where $X$ is a k-vector with $E(X) = \mu$ and $\mathrm{Var}(X) = \Sigma$, and $B$ and $c$ are respectively an $m \times k$ constant matrix and a constant m-vector, then

(i) $E(Y) = B\mu + c$.

(ii) $\mathrm{Var}(Y) = B\Sigma B'$. □

Note that if $m > k$, $\mathrm{Var}(Y)$ is singular, having rank $k$.
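The two formulae of 9.18, and the singularity remark, can be checked on a small discrete distribution. The sketch below treats four made-up points in the plane as equally likely values of $X$ and takes $m = 3 > k = 2$; the helper names and all numerical values are assumptions for illustration only.

```python
def matmul(A, B):
    # Product of two matrices stored as lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mean_and_var(points):
    """Mean vector and covariance matrix of an equiprobable discrete distribution."""
    n, k = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(k)]
    var = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            for j in range(k)] for i in range(k)]
    return mean, var

def det3(A):
    # Determinant of a 3 x 3 matrix by cofactor expansion along the first row.
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

xs = [[0.0, 1.0], [2.0, -1.0], [1.0, 3.0], [-2.0, 0.5]]   # values of X, k = 2
B = [[1.0, 0.0], [1.0, 1.0], [2.0, -1.0]]                  # m x k with m = 3
c = [0.5, -1.0, 3.0]
ys = [[sum(b * x for b, x in zip(row, p)) + ci for row, ci in zip(B, c)] for p in xs]

mu, sigma = mean_and_var(xs)
mean_y, var_y = mean_and_var(ys)
b_sigma_bt = matmul(matmul(B, sigma), transpose(B))       # B Sigma B'
```

Here `var_y` coincides with `b_sigma_bt`, and its determinant vanishes up to rounding because a $3 \times 3$ matrix of rank 2 is singular.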

9.19 Example If a random vector $Z = (Z_1,\ldots,Z_k)'$ is standard Gaussian (8.19), it is easy to verify, applying 9.17, that $E(Z) = 0$ and $E(ZZ') = I_k$. Applying 9.18 to the transformation in (8.35) produces $E(X) = \mu$ and

$$\mathrm{Var}(X) = E(X - \mu)(X - \mu)' = E(AZZ'A') = AE(ZZ')A' = AA' = \Sigma. \quad □$$

9.5 More Theorems for the Toolbox

The following collection of theorems, together with the Jensen and Markov inequalities of §9.3, constitute the basic toolbox for the proof of results in probability. The student will find that it will suffice to have his/her thumb in these pages to be able to follow a gratifyingly large number of the arguments to be encountered in subsequent chapters.

9.20 Cauchy-Schwartz inequality

$$[E(XY)]^2 \le E(X^2)E(Y^2), \tag{9.51}$$

with equality attained when $Y = cX$, $c$ a constant.

Proof By linearity of the integral,

$$E[(aX + Y)^2] = a^2E(X^2) + 2aE(XY) + E(Y^2) \ge 0$$

for any constant $a$. (9.51) follows on setting $a = -E(XY)/E(X^2)$, and holds as an equality if and only if $aX + Y = 0$. ∎

The correlation coefficient, $r_{XY} = \mathrm{Cov}(X,Y)/(\mathrm{Var}(X)\mathrm{Var}(Y))^{1/2}$, accordingly lies in the interval $[-1,+1]$.

The Cauchy-Schwartz inequality is a special case (for $p = 2$) of the following.

9.21 Hölder's inequality For any $p \ge 1$,

$$E|XY| \le \|X\|_p\|Y\|_q, \tag{9.52}$$

where $q = p/(p-1)$ if $p > 1$, and $q = \infty$ if $p = 1$.

Proof The proof for the case $p > 1$ requires a small lemma.

9.22 Lemma For any pair of non-negative numbers $a,b$,

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}. \tag{9.53}$$

Proof If either $a$ or $b$ is zero this is trivial. If both are positive, let $s = p\log a$ and $t = q\log b$. Inverting these relations gives $a = e^{s/p}$, $b = e^{t/q}$, $ab = e^{s/p + t/q}$, and (9.53) follows from the fact that $e^x$ is a convex function of $x$, noting $1/q = 1 - 1/p$ and applying (9.29). ∎

Choose $a = |X|/\|X\|_p$, $b = |Y|/\|Y\|_q$. For these choices, $E(a^p) = E(b^q) = 1$, and

$$\frac{E|XY|}{\|X\|_p\|Y\|_q} = E(ab) \le 1/p + 1/q = 1. \tag{9.54}$$

For the case $p = 1$, the inequality reduces to $E|XY| \le E|X|\,\mathrm{ess\,sup}\,|Y|$, which holds since $|Y| \le \mathrm{ess\,sup}\,|Y|$ a.s., by definition. ∎

The Hölder inequality spawns a range of useful corollaries and special cases, including the following.

9.23 Liapunov's inequality (norm inequality) If $r > p > 0$, then $\|X\|_r \ge \|X\|_p$.

Proof Let $Z = |X|^p$, $Y = 1$, $s = r/p$. Then, (9.52) gives $E|ZY| \le \|Z\|_s\|Y\|_{s/(s-1)}$, or

$$E(|X|^p) \le (E|X|^{ps})^{1/s} = (E|X|^r)^{p/r}. \quad ∎ \tag{9.55}$$

This result is also obtainable as a corollary of the Jensen inequality.

9.24 Corollary For each $A \in \mathcal{F}$ and $1/p + 1/q = 1$,

$$\int_A |XY|\,dP \le \left(\int_A |X|^p\,dP\right)^{1/p}\left(\int_A |Y|^q\,dP\right)^{1/q}.$$

Proof In (9.52), replace $X$ by $X1_A$ and $Y$ by $Y1_A$. ∎

Alternative variants of the result are not explicitly probabilistic in character.

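9.25 Corollary Let $x_1,\ldots,x_n$ and $y_1,\ldots,y_n$ be any sequences of numbers. Then

$$\sum_{i=1}^{n}|x_iy_i| \le \left(\sum_{i=1}^{n}|x_i|^p\right)^{1/p}\left(\sum_{i=1}^{n}|y_i|^q\right)^{1/q} \quad \text{for } 1/p + 1/q = 1. \quad □$$

9.26 Corollary Let $f(t)$ and $g(t)$ be Lebesgue-integrable functions of a real variable. Then

$$\int_{-\infty}^{+\infty}|f(t)g(t)|\,dt \le \left(\int_{-\infty}^{+\infty}|f(t)|^p\,dt\right)^{1/p}\left(\int_{-\infty}^{+\infty}|g(t)|^q\,dt\right)^{1/q} \quad \text{for } 1/p + 1/q = 1. \quad □$$

Proofs are left as an exercise. The sequences in 9.25 and the functions in 9.26 can be either real or complex-valued (see §11.2).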
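The sequence form in 9.25 is simple to evaluate. The sketch below computes both sides for given real sequences and an exponent $p > 1$; the function name `holder_sides` and the sample sequences are hypothetical.

```python
def holder_sides(xs, ys, p):
    """Left and right sides of the sequence form of the Hoelder inequality, for p > 1."""
    q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1
    lhs = sum(abs(x * y) for x, y in zip(xs, ys))
    rhs = (sum(abs(x) ** p for x in xs) ** (1.0 / p)
           * sum(abs(y) ** q for y in ys) ** (1.0 / q))
    return lhs, rhs

if __name__ == "__main__":
    xs = [1.0, -2.0, 3.0, 0.5]
    ys = [-4.0, 0.25, 2.0, 7.0]
    for p in (1.5, 2.0, 3.0, 10.0):
        print(p, holder_sides(xs, ys, p))
```

With $p = 2$ and $y_i$ proportional to $x_i$ the two sides coincide, which is the equality case of the Cauchy-Schwartz inequality.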

9.27 Minkowski's inequality For $r \ge 1$, $\|X + Y\|_r \le \|X\|_r + \|Y\|_r$.

Proof For $r = 1$ this follows direct from the triangle inequality,

$$|X + Y| \le |X| + |Y|, \tag{9.56}$$

on taking expectations. For $r > 1$, note that

$$E|X+Y|^r = E(|X+Y||X+Y|^{r-1}) \le E((|X| + |Y|)|X+Y|^{r-1}) = E(|X||X+Y|^{r-1}) + E(|Y||X+Y|^{r-1}). \tag{9.57}$$

Applying the Hölder inequality to the two right-hand-side terms yields

$$E|X+Y|^r \le (\|X\|_r + \|Y\|_r)(E|X+Y|^r)^{1-1/r}. \tag{9.58}$$

Cancelling $E|X+Y|^r$ and rearranging gives the result. ∎

By recursive application to the sum of $m$ variables, the Minkowski inequality generalizes directly to

$$\left\|\sum_{i=1}^{m} X_i\right\|_r \le \sum_{i=1}^{m}\|X_i\|_r \tag{9.59}$$

for $r \ge 1$. For an infinite series one can write

$$\left\|\sum_{i=1}^{\infty} X_i\right\|_r \le \sum_{i=1}^{m}\|X_i\|_r + \left\|\sum_{i=m+1}^{\infty} X_i\right\|_r. \tag{9.60}$$

If $\|\sum_{i=m+1}^{\infty} X_i\|_r \to 0$ as $m \to \infty$, it is permissible to conclude that

$$\left\|\sum_{i=1}^{\infty} X_i\right\|_r \le \sum_{i=1}^{\infty}\|X_i\|_r, \tag{9.61}$$

not ruling out the possibility that the right-hand side is infinite.

9.28 Loève's $c_r$ inequality For $r > 0$,

$$E\left|\sum_{i=1}^{m} X_i\right|^r \le c_r\sum_{i=1}^{m} E|X_i|^r, \tag{9.62}$$

where $c_r = 1$ when $r \le 1$, and $c_r = m^{r-1}$ when $r \ge 1$.

Proof This goes by proving the inequality

$$\left|\sum_{i=1}^{m} a_i\right|^r \le c_r\sum_{i=1}^{m}|a_i|^r \tag{9.63}$$

for real numbers $a_1,\ldots,a_m$, then substituting random variables and taking expectations. Since $|\sum_{i=1}^{m} a_i|^r \le (\sum_{i=1}^{m}|a_i|)^r$, it will suffice to let the $a_i$ be non-negative. For the case $0 < r \le 1$, $0 \le z_i \le 1$ implies $z_i^r \ge z_i$ and hence if $\sum_{i=1}^{m} z_i = 1$, $\sum_{i=1}^{m} z_i^r \ge 1$. (9.63) follows on putting $z_i = a_i/(\sum_{j=1}^{m} a_j)$. For $r \ge 1$, on the other hand, convexity implies directly that

$$\left(\frac{1}{m}\sum_{i=1}^{m} a_i\right)^r \le \frac{1}{m}\sum_{i=1}^{m} a_i^r. \quad ∎$$
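The deterministic inequality (9.63) at the heart of this proof can be evaluated for any list of reals. The sketch below returns both sides with $c_r$ chosen as in 9.28; the function name `cr_sides` and the test values are illustrative assumptions.

```python
def cr_sides(values, r):
    """Left and right sides of |sum a_i|^r <= c_r * sum |a_i|^r for r > 0."""
    m = len(values)
    c_r = 1.0 if r <= 1.0 else float(m) ** (r - 1.0)  # c_r = 1 or m^(r-1)
    lhs = abs(sum(values)) ** r
    rhs = c_r * sum(abs(a) ** r for a in values)
    return lhs, rhs

if __name__ == "__main__":
    for r in (0.3, 1.0, 2.0, 3.7):
        print(r, cr_sides([1.0, -2.5, 4.0, 0.3], r))
```

For $r \ge 1$ equality is attained when all the $a_i$ are equal, so the constant $m^{r-1}$ cannot be improved in general.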

9.29 Theorem If $X$, $Y$, and $Z$ are non-negative r.v.s satisfying $X \le a(Y + Z)$ a.s. for a constant $a > 0$, then, for any constant $M > 0$,

$$E(1_{\{X > M\}}X) \le 2a\bigl(E(1_{\{Y > M/2a\}}Y) + E(1_{\{Z > M/2a\}}Z)\bigr). \tag{9.64}$$

Proof If we can prove the almost sure inequality

$$1_{\{X > M\}}X \le 2a\bigl(1_{\{Y > M/2a\}}Y + 1_{\{Z > M/2a\}}Z\bigr), \text{ a.s.}, \tag{9.65}$$

the theorem will follow on taking expectations. $1_{\{X > M\}}X$ is the r.v. that is equal to $X$ if $X > M$ and 0 otherwise. If $X \le M$, (9.65) is immediate. At least one of the inequalities $Y \ge X/2a$, $Z \ge X/2a$ must hold, and if $X > M$, (9.65) is no less obviously true. ∎

9.6 Random Variables Depending on a Parameter

Let $G(\omega,\theta): \Omega \times \Theta \mapsto \mathbb{R}$, $\Theta \subseteq \mathbb{R}$, denote a random function of a real variable $\theta$, or in other words, a family of random variables indexed on points of the real line. The following results, due to Cramér (1946), are easy consequences of the dominated convergence theorem.

9.30 Theorem Suppose that for each $\omega \in C$, with $P(C) = 1$, $G(\omega,\theta)$ is continuous at a point $\theta_0$, and $|G(\omega,\theta)| < Y(\omega)$ for each $\theta$ in an open neighbourhood $N_0$ of $\theta_0$ where $E(Y) < \infty$. Then

$$\lim_{\theta \to \theta_0} E(G(\theta)) = E(G(\theta_0)). \tag{9.66}$$

Proof Passage to a limit $\theta_0$ through a continuum of points in $\Theta$, as indicated in (9.66), is implied by the convergence of a countable sequence in $\Theta$. Let $\{\theta_\nu, \nu \in \mathbb{N}\}$ be such a sequence, in $N_0$, converging to $\theta_0$. Putting $G_\nu(\omega) = G(\omega,\theta_\nu)$ defines a countable sequence of r.v.s, and $\limsup_\nu G_\nu(\omega)$ and $\liminf_\nu G_\nu(\omega)$ are r.v.s by 3.26. By continuity, they are equal to each other and to $G(\omega,\theta_0)$ for $\omega \in C$; in other words, $G(\theta_\nu) \to G(\theta_0)$ a.s. The result follows from the dominated convergence theorem. ∎

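9.31 Theorem If, for each $\omega \in C$ with $P(C) = 1$, $(dG/d\theta)(\omega)$ exists at a point $\theta_0$ and

$$\left|\frac{G(\omega,\theta_0 + h) - G(\omega,\theta_0)}{h}\right| < Y_1(\omega)$$

for $0 \le h < h_1$, where $E(Y_1) < \infty$, and $h_1$ is independent of $\omega$, then

$$\left.\frac{d}{d\theta}E(G)\right|_{\theta=\theta_0} = E\left(\left.\frac{dG}{d\theta}\right|_{\theta=\theta_0}\right). \tag{9.67}$$

Proof The argument goes like the preceding one, by considering a real sequence $\{h_\nu\}$ tending to zero through positive values and hence the sequence of r.v.s $\{H_\nu\}$ where $H_\nu = [G(\theta_0 + h_\nu) - G(\theta_0)]/h_\nu$, whose limit $H = (dG/d\theta)|_{\theta=\theta_0}$ exists by assumption. ∎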

The same sort of results hold for integrals. Fubini's Theorem provides the extension to general double integrals. The following result for Riemann integrals on intervals of the line is no more than a special case of Fubini, but it is useful to note the requisite assumptions in common notation with the above.

9.32 Theorem Suppose that for each $\omega \in C$, with $P(C) = 1$, $G(\omega,\theta)$ is continuous on a finite open interval $(a,b)$, and $|G(\omega,\theta)| < Y_2(\omega)$ for $a < \theta < b$, where $E(Y_2) < \infty$. Then

$$\int_a^b E(G(\theta))\,d\theta = E\left(\int_a^b G(\theta)\,d\theta\right). \tag{9.68}$$

If $\int_{-\infty}^{\infty}|G(\omega,\theta)|\,d\theta < Y_3(\omega)$ for $\omega \in C$ and $E(Y_3) < \infty$, (9.68) holds for either or both of $a = -\infty$ and $b = +\infty$.

Proof For the case of finite $a$ and $b$, consider $H(\omega,t) = \int_a^t G(\omega,\theta)\,d\theta$. This has the properties $|H(\omega,t)| < (b - a)Y_2(\omega)$, and $|dH/dt| = |G(\omega,t)| < Y_2(\omega)$, for each $t \in (a,b)$. Hence, $E(H)$ exists for each $t$, and by 9.31,

$$\frac{d}{dt}E(H) = E\left(\frac{dH}{dt}\right) = E(G(t)). \tag{9.69}$$

$E(G(t))$ is continuous on $(a,b)$ by the a.s. continuity of $G$ and 9.30, and hence

$$h(t) = \int_a^t E(G(\theta))\,d\theta - E(H(t)) \tag{9.70}$$

is differentiable on $(a,b)$, and $dh/dt = 0$ at each point by (9.69). But by definition, $H(\omega,a) = 0$ for $\omega \in C$, so that $h(a) = 0$, and hence $h(b) = 0$ which is equivalent to (9.68).

Under the stated integrability condition on $G(\omega,\theta)$, $\int_{-\infty}^{\infty} G(\omega,\theta)\,d\theta$ exists and is finite on $C$. Hence $H(\omega,t) = \int_{-\infty}^{t} G(\omega,\theta)\,d\theta$ is well defined and has an expectation for all $t \in \mathbb{R}$, and the argument above goes through with $a = -\infty$ and/or $b = \infty$. ∎


10 Conditioning

10.1 Conditioning in Product Measures

It is difficult to do justice to conditioning at an elementary level. Without resort to some measure-theoretic insights, one can get only so far with the theory before running into problems. There are none the less some relatively simple results which apply to a restricted (albeit important) class of distributions. We introduce the topic by way of this 'naive' approach, and so demonstrate the difficulties that arise, before going on to see how they can be resolved.

In the bivariate context, the natural question to pose is usually: 'if we know $X = x$, what is the best predictor of $Y$?' For a random real pair $\{X,Y\}$ on $(\Omega,\mathcal{F},P)$ we can evidently define (see §7.2) a class of conditional distribution functions for $Y$. For any $A \in \mathcal{B}$ such that $P(X \in A) > 0$, let

$$F(y \mid X \in A) = \frac{P(X \in A,\ Y \le y)}{P(X \in A)}. \tag{10.1}$$

This corresponds to the idea of working in the trace of $(\Omega,\mathcal{F},P)$ with respect to $A$, once $A$ is known to have occurred. Proceeding in this way, we can attempt to construct a theory of conditioning for random variables based on the c.d.f. We may tentatively define the conditional distribution function, $F(y|x)$, when it exists, as a mapping from $\mathbb{R}^2$ to $\mathbb{R}$ which for fixed $x \in \mathbb{R}$ is a non-decreasing, right-continuous function of $y$ with $F(-\infty|x) = 0$ and $F(+\infty|x) = 1$, and which for fixed $y \in \mathbb{R}$ satisfies the equation

$$P(X \in A,\ Y \le y) = \int_A F(y|x)\,dF_X(x) \tag{10.2}$$

for any $A \in \mathcal{B}$ (compare Rao 1965: §2a.8). Think of the graph of $F(y|x)$ in y-space as the profile of a 'slice' through the surface of the joint distribution function, parallel to the y-axis, at the point $x$.

However, much care is needed in interpreting this construction. Unlike the ordinary c.d.f., it does not represent a probability in general. If we try to interpret it as $P(Y \le y \mid X = x)$, we face the possibility that $P(X = x) = 0$, as in a continuous distribution. Since the integral of $F(y|x)$ over a set in the marginal distribution of $X$ yields a probability, as in (10.2), it might even be treated as a type of density function. Taking $A = \{X \le x\}$ shows that we would need

$$F(x,y) = \int_{-\infty}^{x} F(y|\xi)\,dF_X(\xi) = \int_{-\infty}^{x}\int_{-\infty}^{y} dF(\upsilon|\xi)\,dF_X(\xi) \tag{10.3}$$

to hold. Since $F(x,y)$ is an integral over $\mathbb{R}^2$, Fubini's theorem implies that $F(y|x)$ is well defined only when the integrals in (10.3) are with respect to a product measure.

If $X$ and $Y$ are independent we can say unambiguously (but not very usefully) that $F(y|x) = F_Y(y)$. $F(y|x)$ is also well defined for continuous distributions. Let $S_X$ denote the support of $X$ (the set on which $f_X > 0$); the conditional p.d.f. is

$$f(y|x) = \frac{f(x,y)}{f_X(x)}, \quad x \in S_X, \tag{10.4}$$

where $f_X(x)$ is the marginal density of $X$. We may validly write, for $A \in \mathcal{B} \cap S_X$,

$$P(X \in A,\ Y \le y) = \int_A\int_{-\infty}^{y} f(\upsilon|\xi)f_X(\xi)\,d\upsilon\,d\xi = \int_A F(y|x)f_X(x)\,dx, \tag{10.5}$$

where

$$F(y|x) = \int_{-\infty}^{y} f(\upsilon|x)\,d\upsilon. \tag{10.6}$$

The second equality of (10.5) follows by Fubini's theorem, since the function $f(x,y)$ is integrated with respect to Lebesgue product measure. However, (10.6) appears to exist by a trick, rather than to have a firm relationship with our intuition. The problem is that we cannot work with the trace $(A,\mathcal{F}_A,P_A)$ when $A = \{\omega: X(\omega) = x\}$ and $P(A) = 0$, because then $P_A = P/P(A)$ is undefined. It is not clear what it means to 'consider the case when $\{X = x\}$ has occurred' when this event fails to occur almost surely.

Except in special cases such as the above, the factorization $dF(y,x) = dF(y|x)\,dF_X(x)$ is not legitimate, but with this very important caveat we can define the mean and other moments of the conditional distribution. The conditional expectation of a measurable function $g(X,Y)$, given $X = x$, can be defined as

$$E(g(X,Y) \mid x) = \int_{-\infty}^{+\infty} g(x,y)\,dF(y|x), \tag{10.7}$$

also written as $E(g(X,Y) \mid X = x)$. The simplest case is where $g(X,Y)$ is just $Y$. $E(Y|x)$ is to be understood in terms of the attempt of an observer to predict $Y$ after the realization of $X$ has been observed. When $X$ and $Y$ are independent, $E(Y|x) = E(Y)$, where $E(Y)$ is the ordinary expectation of $Y$, also called the marginal or unconditional expectation. In this case, the knowledge that $X = x$ is no help in predicting $Y$.

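10.1 Example These concepts apply to the bivariate Gaussian distribution. From (8.36), the density is

$$f(x,y) = \frac{1}{2\pi(\sigma_{11}\sigma_{22} - \sigma_{12}^2)^{1/2}}\exp\left\{-\frac{1}{2}\,[x - \mu_1,\ y - \mu_2]\begin{bmatrix}\sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22}\end{bmatrix}^{-1}\begin{bmatrix}x - \mu_1 \\ y - \mu_2\end{bmatrix}\right\}$$

$$= \frac{1}{2\pi(\sigma_{11}\sigma_{22} - \sigma_{12}^2)^{1/2}}\exp\left\{-\frac{\bigl[(y - \mu_2) - \frac{\sigma_{12}}{\sigma_{11}}(x - \mu_1)\bigr]^2}{2\bigl(\sigma_{22} - \sigma_{12}^2/\sigma_{11}\bigr)} - \frac{(x - \mu_1)^2}{2\sigma_{11}}\right\}, \tag{10.8}$$

where the last equality is got by completing the square in the exponent. Evidently, $f(x,y) = f(y|x)f_X(x)$ where

$$f(y|x) = \frac{1}{\sqrt{2\pi}\,\bigl(\sigma_{22} - \sigma_{12}^2/\sigma_{11}\bigr)^{1/2}}\exp\left\{-\frac{\bigl[(y - \mu_2) - \frac{\sigma_{12}}{\sigma_{11}}(x - \mu_1)\bigr]^2}{2\bigl(\sigma_{22} - \sigma_{12}^2/\sigma_{11}\bigr)}\right\} \tag{10.9}$$

and

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma_{11}}}\exp\left\{-\frac{(x - \mu_1)^2}{2\sigma_{11}}\right\}. \tag{10.10}$$

Thus,

$$E(Y|x) = \mu_2 + \frac{\sigma_{12}}{\sigma_{11}}(x - \mu_1), \tag{10.11}$$

and

$$\mathrm{Var}(Y|x) = \sigma_{22} - \frac{\sigma_{12}^2}{\sigma_{11}}. \tag{10.12}$$

If $\sigma_{12} = 0$, $f(y|x)$ reduces to $f_Y(y)$, so that the joint density is the product of the marginals, and $X$ and $Y$ are independent. □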
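The conditional mean and variance in (10.11) and (10.12) can be recovered numerically from the joint density alone, by integrating over $y$ at a fixed $x$. The sketch below is an illustration under assumed parameter values; the function names, the midpoint rule and the integration range of twelve standard deviations of $Y$ are choices made here, not taken from the text.

```python
import math

def joint_density(x, y, mu1, mu2, s11, s12, s22):
    """Bivariate Gaussian density with means (mu1, mu2) and covariances s11, s12, s22."""
    det = s11 * s22 - s12 * s12
    dx, dy = x - mu1, y - mu2
    # Quadratic form using the explicit inverse of the 2 x 2 covariance matrix.
    quad = (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def conditional_moments(x, mu1, mu2, s11, s12, s22, width=12.0, steps=40_000):
    """Mean and variance of Y given X = x from midpoint integrals of f(x, y) over y."""
    half = width * math.sqrt(s22)
    h = 2.0 * half / steps
    ys = [mu2 - half + (i + 0.5) * h for i in range(steps)]
    w = [joint_density(x, y, mu1, mu2, s11, s12, s22) for y in ys]
    mass = sum(w)  # proportional to the marginal density f_X(x)
    mean = sum(y * wi for y, wi in zip(ys, w)) / mass
    var = sum((y - mean) ** 2 * wi for y, wi in zip(ys, w)) / mass
    return mean, var

if __name__ == "__main__":
    # Hypothetical parameters: mu = (1, -2), s11 = 2, s12 = 0.8, s22 = 1.5, and x = 2.5.
    print(conditional_moments(2.5, 1.0, -2.0, 2.0, 0.8, 1.5))
```

With these values the formulae give $E(Y|x) = -2 + 0.4 \times 1.5 = -1.4$ and $\mathrm{Var}(Y|x) = 1.5 - 0.64/2 = 1.18$.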

10.2 Conditioning on a Sigma Field

In view of the limitations of working directly with the distribution of $(X,Y)$, we pursue the approach introduced in §7.2, to represent partial knowledge of the distribution of $Y$ by specifying a σ-field of events $\mathcal{G} \subseteq \mathcal{F}$ such that, for each $G \in \mathcal{G}$, an observer knows whether or not the realized outcome belongs to $G$.

The idea of knowing the value of a random variable is captured by the concept of subfield measurability. A random variable $X(\omega): \Omega \mapsto \mathbb{R}$ is said to be measurable with respect to a σ-field $\mathcal{G} \subset \mathcal{F}$ if

$$X^{-1}(B) = \{\omega: X(\omega) \in B\} \in \mathcal{G}, \text{ for all } B \in \mathcal{B}. \tag{10.13}$$

The implication of the condition $\mathcal{G} \subset \mathcal{F}$ is that the r.v. $X$ is not a complete representation of the random outcome $\omega$. We denote by $\sigma(X)$ the intersection of all σ-fields with respect to which $X$ is measurable, called the σ-field generated by $X$. If, on being confronted with the distribution of a random pair $(X(\omega),Y(\omega))$, we learn that $X = x$, we shall know whether or not each of the events $G \in \sigma(X)$ has occurred by determining whether $X(G)$ contains $x$. The image of each $G \in \sigma(X)$ under the mapping

$$(X(\omega),Y(\omega)): \Omega \mapsto \mathbb{R}^2$$

is a cylinder set in $\mathbb{R}^2$, and the p.m. defined on $\sigma(X)$ is the marginal distribution of $X$.

10.2 Example The knowledge that $x_1 \le X \le x_2$ can be represented by

$$\mathcal{H} = \sigma\bigl(\{(-\infty,x]: x < x_1\},\ \{(x,+\infty): x > x_2\},\ \mathbb{R}\bigr).$$

Satisfy yourself that for every element of this σ-field we know whether or not $X$ belongs to the set; also that it contains all sets about which we possess this knowledge. The closer together $x_1$ and $x_2$ are, the more sets there are in $\mathcal{H}$. When $x_1 = x_2$, $\mathcal{H} = \sigma(X)$, and when $x_1 = -\infty$, $x_2 = +\infty$, $\mathcal{H} = \mathcal{T} = \{\emptyset,\mathbb{R}\}$. □

The relationships between transformations and subfield measurability are summarized in the next theorem, of which the first part is an easy consequence of the definitions but the second is trickier. If two random variables are measurable with respect to the same subfield, the implication is that they contain the same information; knowledge of one is equivalent to knowledge of the other. This means that every Borel set is the image of a Borel set under $g^{-1}$. This is a stronger condition than measurability, and requires that $g$ is an isomorphism. It suffices for $g$ to be a homeomorphism, although this is not necessary as was shown in 3.23.

10.3 Theorem Let $X$ be a r.v. on the space $(S,\mathcal{B}_S,\mu)$, and let $Y = g(X)$ where $g: S \mapsto T$ is a Borel function, with $S \subseteq \mathbb{R}$ and $T \subseteq \mathbb{R}$.

(i) $\sigma(Y) \subseteq \sigma(X)$.

(ii) $\sigma(Y) = \sigma(X)$ iff $g$ is a Borel-measurable isomorphism.

Proof Each $B \in \mathcal{B}_T$ has an image in $\mathcal{B}_S$ under $g^{-1}$, which in turn has an image in $\sigma(X)$ under $X^{-1}$. This proves (i).

To prove (ii), define a class of subsets of $S$, $\mathcal{C} = \{g^{-1}(B): B \in \mathcal{B}_T\}$. To every $A \subseteq S$ there corresponds (since $g$ is a mapping) a set $B \subseteq T$ such that $A = g^{-1}(B)$, and making this substitution gives

$$\mathcal{B}_S \subseteq \{A: g(A) \in \mathcal{B}_T\} = \{g^{-1}(B): g(g^{-1}(B)) \in \mathcal{B}_T\} = \mathcal{C}, \tag{10.14}$$

where the inclusion is by measurability of $g^{-1}$, and the second equality is because $g(g^{-1}(B)) = B$ for any $B \subseteq T$, since $g$ is 1-1 onto. It follows from (10.14) that

$$\{X^{-1}(A): A \in \mathcal{B}_S\} \subseteq \{X^{-1}(g^{-1}(B)): B \in \mathcal{B}_T\}. \tag{10.15}$$

If $Y$ is $\mathcal{G}$-measurable for some σ-field $\mathcal{G} \subseteq \mathcal{F}$ (such that $\mathcal{G}$ contains the sets of the right-hand member of (10.15)), then $X$ is also $\mathcal{G}$-measurable. In particular, $\sigma(X) \subseteq \sigma(Y)$. Part (i) then implies $\sigma(X) = \sigma(Y)$, proving sufficiency of the conditions.

To show the necessity, suppose first that $g$ is not 1-1 and $g(x_1) = g(x_2) = y$ (say) for $x_1 \ne x_2$. The sets $\{x_1\}$ and $\{x_2\}$ are elements of $\mathcal{B}_S$ but not of $\mathcal{C}$, which contains only $g^{-1}(\{y\}) = \{x_1\} \cup \{x_2\}$. Hence $\mathcal{B}_S \not\subseteq \mathcal{C}$, and there exists a $\mathcal{B}_S$-set $A$ for which there is no $\mathcal{B}_T$-set $B$ having the property $g^{-1}(B) = A$. This implies that $X^{-1}(A) \in \sigma(X)$ but $\notin \sigma(Y)$, so that $\sigma(Y) \subset \sigma(X)$.

We may therefore assume that $g$ is 1-1. If $g^{-1}$ is not Borel-measurable, then by definition there exists $A = g^{-1}(B) \in \mathcal{B}_S$ such that $g(A) = B \notin \mathcal{B}_T$, and hence $A \notin \mathcal{C}$; and again, $\mathcal{B}_S \not\subseteq \mathcal{C}$, so that $\sigma(Y) \subset \sigma(X)$ by the same argument. This completes the proof of necessity. ∎

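We should briefly note the generalization of these results to the vector case. A random vector $X(\omega): \Omega \mapsto \mathbb{R}^k$ is measurable with respect to $\mathcal{G} \subseteq \mathcal{F}$ if

$$X^{-1}(B) = \{\omega: X(\omega) \in B\} \in \mathcal{G}, \quad \text{each } B \in \mathcal{B}^k. \tag{10.16}$$

If $\sigma(X)$ is the σ-field generated by $X$, we have the following result.

10.4 Theorem Let $X$ be a random vector on the probability space $(S, \mathcal{B}_S^k, \mu)$ where $S \subseteq \mathbb{R}^k$ and $\mathcal{B}_S^k = \{B \cap S: B \in \mathcal{B}^k\}$, and consider a Borel function

$$Y = g(X): S \mapsto T, \quad T \subseteq \mathbb{R}^m. \tag{10.17}$$

(i) $\sigma(Y) \subseteq \sigma(X)$.

(ii) If $m = k$ and $g$ is 1-1 with Borel inverse, then $\sigma(Y) = \sigma(X)$.

Proof This follows the proof of 10.3 almost word for word, with the substitutions of $\mathcal{B}_S^k$ and $\mathcal{B}_T^k$ for $\mathcal{B}_S$ and $\mathcal{B}_T$, the vectors $X$ and $Y$ for the scalars $X$ and $Y$, and so forth. ∎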

10.3 Conditional Expectations

Let $Y$ be an integrable r.v. on $(\Omega, \mathcal{F}, P)$ and $\mathcal{G}$ a σ-field contained in $\mathcal{F}$. The term conditional expectation, and symbol $E(Y|\mathcal{G})$, can be used to refer to any integrable, $\mathcal{G}$-measurable r.v. having the property

$$\int_G E(Y|\mathcal{G})\,dP = \int_G Y\,dP = E(Y|G)P(G), \quad \text{all } G \in \mathcal{G}. \tag{10.18}$$

Intuitively, $E(Y|\mathcal{G})(\omega)$ represents the prediction of $Y(\omega)$ made by an observer having information $\mathcal{G}$, when the outcome $\omega$ is realized. The second equality of (10.18) supplies the definition of the constant $E(Y|G)$, although this need not exist unless $P(G) > 0$. The two extreme cases are $E(Y|\mathcal{F}) = Y$ a.s., and $E(Y|\mathcal{T}) = E(Y)$ a.s., where $\mathcal{T}$ denotes the trivial σ-field with elements $\{\Omega, \emptyset\}$. Note that $\Omega \in \mathcal{G}$, so integrability of $Y$ is necessary for the existence of $E(Y|\mathcal{G})$.

The conditional expectation is a slightly bizarre construction, not only a r.v., but evidently not even an integral. To demonstrate that an object satisfying (10.18) actually does exist, consider initially the case $Y \geq 0$, and define

$$\nu(G) = \int_G Y\,dP. \tag{10.19}$$

10.5 Theorem $\nu$ is a measure, and is absolutely continuous with respect to $P$.

Proof Clearly, $\nu(G) \geq 0$, and $P(G) = 0$ implies $\nu(G) = 0$. It remains to show countable additivity. If $\{G_j\}$ is a disjoint sequence, then

$$\nu\Big(\bigcup_j G_j\Big) = \int_{\bigcup_j G_j} Y\,dP = \sum_j \int_{G_j} Y\,dP = \sum_j \nu(G_j), \tag{10.20}$$

where the second equality holds under disjointness. ∎

So the implication of (10.18) for non-negative $Y$ turns out to be that $E(Y|\mathcal{G})$ is the Radon-Nikodym derivative of $\nu$ with respect to $P$. The extension from non-negative to general $Y$ is easy, since we can write $Y = Y^+ - Y^-$ where $Y^+$ and $Y^-$ are non-negative, and from (10.18), $E(Y|\mathcal{G}) = E(Y^+|\mathcal{G}) - E(Y^-|\mathcal{G})$, where both of the right-hand r.v.s are Radon-Nikodym derivatives.

The Radon-Nikodym theorem therefore establishes the existence of $E(Y|\mathcal{G})$; at any rate, it establishes the existence of at least one r.v. satisfying (10.18). It does not guarantee that there is only one such r.v., and in the event of non-uniqueness, we speak of the different versions of $E(Y|\mathcal{G})$. The possibility of multiple versions is rarely of practical concern since 10.5 assures us that they are all equal to one another a.s.[$P$], but it does make it necessary to qualify any statement we make about conditional expectations with the tag 'a.s.', to indicate that there may be sets of measure zero on which our assertions do not apply.
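The defining property (10.18) can be checked by hand on a finite sample space. The following is a minimal sketch, assuming that $\mathcal{G}$ is generated by a finite partition, so that a version of the conditional expectation is the probability-weighted average of $Y$ over each block. The die example and all function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def conditional_expectation(prob, y, partition):
    """Return a version of E(Y|G) as a dict omega -> value.

    G is assumed to be generated by `partition`. On a block of positive
    probability the value is the weighted average of Y over the block;
    on a null block any value yields a valid version, so 0 is used.
    """
    cond = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        avg = sum(prob[w] * y[w] for w in block) / mass if mass > 0 else Fraction(0)
        for w in block:
            cond[w] = avg
    return cond

# Hypothetical example: a fair die, Y = face value, G generated by {odd, even}.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
y = {w: Fraction(w) for w in omega}
partition = [{1, 3, 5}, {2, 4, 6}]

cond = conditional_expectation(prob, y, partition)

def integral(f, event):
    """Integral of f over `event` with respect to P."""
    return sum(prob[w] * f[w] for w in event)

# The sets of G are the unions of blocks: the empty set, each block, and Omega.
g_sets = [set(), {1, 3, 5}, {2, 4, 6}, set(omega)]
check_10_18 = all(integral(cond, g) == integral(y, g) for g in g_sets)
```

Taking the set $\Omega$ in the final check is the law of iterated expectations in miniature: the integral of the conditional expectation over the whole space equals $E(Y)$.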

In the bivariate context, $E(Y|\sigma(X))$, which we can write as $E(Y|X)$ when the context is clear, is interpreted as the prediction of $Y$ made by observers who observe $X$. This notion is related to (10.7) by thinking of $E(Y|x)$ as a drawing from the distribution of $E(Y|X)$.

10.6 Example In place of (10.11) we write

$$E(Y|X)(\omega) = \mu_2 + \frac{\sigma_{12}}{\sigma_{11}}(X(\omega) - \mu_1), \tag{10.21}$$

which is a function of $X(\omega)$, and hence a r.v. defined on the marginal distribution of $X$. $E(Y|X)$ is Gaussian with mean $\mu_2$ and variance of $\sigma_{12}^2/\sigma_{11}$. □
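The mean and variance claimed for the Gaussian conditional mean in (10.21) can be illustrated by simulation. This is a rough sketch using only the standard library; the parameter values, the seed and the sample size are assumptions made for the illustration and do not come from the text.

```python
import random

random.seed(0)

# Hypothetical parameter values, chosen only for illustration.
mu1, mu2 = 1.0, -2.0
sigma11, sigma12 = 4.0, 1.5      # Var(X) and Cov(X, Y)

def cond_mean(x):
    """E(Y|X) evaluated at X = x, following (10.21)."""
    return mu2 + (sigma12 / sigma11) * (x - mu1)

# Draw X from its marginal N(mu1, sigma11) and evaluate the predictor.
n = 200_000
draws = [cond_mean(random.gauss(mu1, sigma11 ** 0.5)) for _ in range(n)]

sample_mean = sum(draws) / n
sample_var = sum((d - sample_mean) ** 2 for d in draws) / n
theory_var = sigma12 ** 2 / sigma11   # sigma_12^2 / sigma_11
```

With these assumed values the theoretical variance is 0.5625, and the sample moments should lie close to $\mu_2$ and to that figure, up to simulation error.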

Making $E(Y|x)$ a point in a probability space on $\mathbb{R}$ circumvents the difficulty encountered previously with conditioning on events of probability 0, and our construction is valid for all distributions. It is possible to define $E(Y|G)$ when $P(G) = 0$. What is required is to exhibit a decreasing sequence $\{G_n \in \mathcal{G}\}$ with $P(G_n) > 0$ for every $n$, and $G_n \downarrow G$, such that the real sequence $\{E(Y|G_n)\}$ converges. This is why (10.7) works for continuous distributions. Take $G_n = [x, x + 1/n] \times \mathbb{R} \in \sigma(X)$, so that $G = \{x\} \times \mathbb{R}$. Using (10.4) in (10.18),

$$E(Y|G_n) = \frac{\int_{-\infty}^{+\infty}\int_x^{x+1/n} y f(\xi,y)\,d\xi\,dy}{\int_{-\infty}^{+\infty}\int_x^{x+1/n} f(\xi,y)\,d\xi\,dy} \to \int_{-\infty}^{+\infty} y f(y|x)\,dy = E(Y|x) \tag{10.22}$$

as $n \to \infty$. Fubini's theorem allows us to evaluate these double integrals one dimension at a time, and to take the limits with respect to $n$ inside the integrals with respect to $y$.

Conditional probability can sometimes generate paradoxical results, as the following case demonstrates.

10.7 Example Let $X$ be a drawing from the space $([0,1], \mathcal{B}_{[0,1]}, m)$, where $m$ is Lebesgue measure. Let $\mathcal{G} \subset \mathcal{B}_{[0,1]}$ denote the σ-field generated from the singletons $\{x\}$, $x \in [0,1]$. All countable unions of singletons have measure 0, while all complements have measure 1. Since either $P(G) = 0$ or $P(G) = 1$ for each $G \in \mathcal{G}$, it is clear from (10.18) that $E(X|\mathcal{G}) = E(X) = \frac{1}{2}$, a.s. However, consider the following argument. 'Since $\{x\} \in \mathcal{G}$, if we know whether or not $x \in G$ for each $G \in \mathcal{G}$, we know $x$. In particular, $\mathcal{G}$ contains knowledge of the outcome. It ought to be the case that $E(X|\mathcal{G}) = X$ a.s.' □

The mathematics are unambiguous, but there is evidently some difficulty with the idea that $\mathcal{G}$ should always represent partial knowledge. It must be accepted that the mathematical model may sometimes part company with intuition, and generate paradoxical results. Whether it is the model or the intuition that fails is a nice point for debate.

10.4 Some Theorems on Conditional Expectations

10.8 Law of iterated expectations (LIE)

$$E[E(Y|\mathcal{G})] = E(Y).$$

Proof Immediate from (10.18), setting $G = \Omega$. ∎

The intuitive idea that conditioning variables can be held 'as if constant' under the conditional distribution is confirmed by the following pair of results.

10.9 Theorem If $X$ is integrable and $\mathcal{G}$-measurable, then $E(X|\mathcal{G}) = X$, a.s.

(10.23)

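Proof Since $X$ is $\mathcal{G}$-measurable, $E^+ = \{\omega: X(\omega) > E(X|\mathcal{G})(\omega)\} \in \mathcal{G}$. If $P(E^+) > 0$, then

$$\int_{E^+} X\,dP - \int_{E^+} E(X|\mathcal{G})\,dP = \int_{E^+} (X - E(X|\mathcal{G}))\,dP > 0. \tag{10.24}$$

This contradicts (10.18), so $P(E^+) = 0$. By the same argument, $P(E^-) = 0$ where $E^- = \{\omega: X(\omega) < E(X|\mathcal{G})(\omega)\} \in \mathcal{G}$. ∎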

10.10 Theorem If $Y$ is $\mathcal{F}$-measurable and integrable, $X$ is $\mathcal{G}$-measurable for $\mathcal{G} \subseteq \mathcal{F}$, and $E|XY| < \infty$, then $E(XY|\mathcal{G}) = XE(Y|\mathcal{G})$ a.s.

Proof By definition, the theorem follows if

$$\int_G XE(Y|\mathcal{G})\,dP = \int_G XY\,dP, \quad \text{for all } G \in \mathcal{G}. \tag{10.25}$$

Let $X_{(n)} = \sum_{i=1}^n \alpha_i 1_{E_i}$ be a $\mathcal{G}$-measurable simple r.v., with $E_1,\ldots,E_n$ a partition of $\Omega$ and $E_i \in \mathcal{G}$ for each $i$. (10.25) holds for $X = X_{(n)}$ since, for all $G \in \mathcal{G}$,

$$\int_G X_{(n)}E(Y|\mathcal{G})\,dP = \sum_{i=1}^n \alpha_i \int_{G \cap E_i} E(Y|\mathcal{G})\,dP = \sum_{i=1}^n \alpha_i \int_{G \cap E_i} Y\,dP = \int_G X_{(n)}Y\,dP, \tag{10.26}$$

noting $G \cap E_i \in \mathcal{G}$ when $G \in \mathcal{G}$ and $E_i \in \mathcal{G}$.

Let $X \geq 0$ a.s., and let $\{X_{(n)}\}$ be a monotone sequence of simple $\mathcal{G}$-measurable functions converging to $X$ as in 3.28. Then $X_{(n)}Y \to XY$ a.s. and $|X_{(n)}Y| \leq |XY|$, where $E|XY| < \infty$ by assumption. Similarly, $X_{(n)}E(Y|\mathcal{G}) \to XE(Y|\mathcal{G})$ a.s., and

$$E|X_{(n)}E(Y|\mathcal{G})| = E|E(X_{(n)}Y|\mathcal{G})| \leq E[E(|X_{(n)}Y|\,|\mathcal{G})] = E|X_{(n)}Y| \leq E|XY| < \infty, \tag{10.27}$$

where the first inequality is the conditional modulus inequality, shown in 10.14 below, and the second equality is the LIE. It follows by the dominated convergence theorem that $\int_G X_{(n)}E(Y|\mathcal{G})\,dP \to \int_G XE(Y|\mathcal{G})\,dP$, and so (10.25) holds for non-negative $X$. The extension to general $\mathcal{G}$-measurable $X$ is got by putting

$$X = X^+ - X^- \tag{10.28}$$

where $X^+ = \max\{X, 0\} \geq 0$ and $X^- \geq 0$, and noting

$$E(XY|\mathcal{G}) = E(YX^+ - YX^-|\mathcal{G}) = E(YX^+|\mathcal{G}) - E(YX^-|\mathcal{G}) = (X^+ - X^-)E(Y|\mathcal{G}) = XE(Y|\mathcal{G}) \text{ a.s.,} \tag{10.29}$$

using (10.33) below and the result for non-negative $X$. ∎

$X$ does not need to be integrable for this result, but the following is an important application to integrable $X$.

10.11 Theorem If $Y$ is $\mathcal{F}$-measurable and integrable and $E(Y|\mathcal{G}) = E(Y)$ for $\mathcal{G} \subseteq \mathcal{F}$, then $\mathrm{Cov}(X,Y) = 0$ for integrable, $\mathcal{G}$-measurable $X$.

Proof From 10.8 and 10.10,

$$E(XY) = E[E(XY|\mathcal{G})] = E[XE(Y|\mathcal{G})]. \tag{10.30}$$

If $E(Y|\mathcal{G}) = E(Y)$ a.s. (a constant), then $E(XY) = E(X)E(Y)$. ∎

Note that in general $\mathrm{Cov}(X,Y)$ is defined only for square integrable r.v.s $X$ and $Y$. But $\mathrm{Cov}(X,Y) = 0$, or $E(XY) = E(X)E(Y)$, is a property which an integrable pair can satisfy.

The following is the result that justifies the characterization of the conditional mean as the optimal predictor of $Y$ given partial information. 'Optimal' is seen to have the specific connotation of minimizing the mean of the squared prediction errors.

10.12 Theorem Let $\hat{Y}$ denote any $\mathcal{G}$-measurable approximation to $Y$. Then

$$\|Y - E(Y|\mathcal{G})\|_2 \leq \|Y - \hat{Y}\|_2. \tag{10.31}$$

Proof $(Y - \hat{Y})^2 = [Y - E(Y|\mathcal{G})]^2 + 2[Y - E(Y|\mathcal{G})][E(Y|\mathcal{G}) - \hat{Y}] + [E(Y|\mathcal{G}) - \hat{Y}]^2$, and hence

$$E((Y - \hat{Y})^2|\mathcal{G}) = E[(Y - E(Y|\mathcal{G}))^2|\mathcal{G}] + [E(Y|\mathcal{G}) - \hat{Y}]^2 \text{ a.s.,} \tag{10.32}$$

noting that the conditional expectation of the cross-product disappears by definition of $E(Y|\mathcal{G})$, and 10.10. The proof is completed by taking unconditional expectations through (10.32) and using the LIE. ∎
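The optimality property (10.31) can be seen concretely on a finite sample space. The following is a sketch under the assumption that $\mathcal{G}$ is generated by a finite partition, so that a $\mathcal{G}$-measurable predictor is one that is constant on each block. The sample space, the variable $Y$ and the set of rival predictors are hypothetical choices for the illustration.

```python
from fractions import Fraction
from itertools import product

omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
y = {w: Fraction(w * w) for w in omega}     # Y = squared face value of a fair die
partition = [{1, 2}, {3, 4}, {5, 6}]        # blocks generating G

def block_mean(block):
    """Value of E(Y|G) on a block: the probability-weighted average of Y."""
    mass = sum(prob[w] for w in block)
    return sum(prob[w] * y[w] for w in block) / mass

def mse(block_values):
    """E(Y - Yhat)^2 for the predictor equal to block_values[i] on block i."""
    return sum(prob[w] * (y[w] - v) ** 2
               for block, v in zip(partition, block_values) for w in block)

best = [block_mean(b) for b in partition]
best_mse = mse(best)

# Rival G-measurable predictors: perturb the conditional mean on each block.
shifts = [Fraction(-1), Fraction(-1, 2), Fraction(0), Fraction(1, 2), Fraction(1)]
rivals = [[b + s for b, s in zip(best, shift)] for shift in product(shifts, repeat=3)]
never_beaten = all(mse(r) >= best_mse for r in rivals)
```

Only the unperturbed predictor attains the minimum, in line with the extra squared term on the right-hand side of (10.32).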

The foregoing are results that have no counterpart in ordinary integration theory, but we can often exploit the fact that the conditional expectation behaves like a 'real' expectation, apart from the standard caveat that we are dealing with r.v.s so that different behaviour is possible on sets of measure zero. Linearity holds, for example, since

$$E(aX + bY|\mathcal{G}) = aE(X|\mathcal{G}) + bE(Y|\mathcal{G}), \text{ a.s.,} \tag{10.33}$$

is a direct consequence of the definition in (10.18). The following are conditional versions of various results in Chapters 4 and 9. The first extends 4.5 and 4.12.

10.13 Lemma

(i) If $X = 0$ a.s., then $E(X|\mathcal{G}) = 0$ a.s.

(ii) If $X \leq Y$ a.s., then $E(X|\mathcal{G}) \leq E(Y|\mathcal{G})$ a.s.

(iii) If $X = Y$ a.s., then $E(X|\mathcal{G}) = E(Y|\mathcal{G})$ a.s.

Proof (i) follows directly from (10.18). To prove (ii), note that the hypothesis, (10.18) and 4.8(i) together imply

$$\int_G E(X|\mathcal{G})\,dP = \int_G X\,dP \leq \int_G Y\,dP = \int_G E(Y|\mathcal{G})\,dP$$

for all $G \in \mathcal{G}$. Since $A = \{\omega: E(X|\mathcal{G})(\omega) > E(Y|\mathcal{G})(\omega)\} \in \mathcal{G}$, it follows that $P(A) = 0$. The proof of (iii) uses 4.8(ii), and is otherwise identical to that of (ii). ∎

10.14 Conditional modulus inequality $|E(Y|\mathcal{G})| \leq E(|Y|\,|\mathcal{G})$ a.s.

Proof Note that $|Y| = Y^+ + Y^-$, where $Y^+$ and $Y^-$ are defined in (10.28). These are non-negative r.v.s so that $E(Y^+|\mathcal{G}) \geq 0$ a.s. and $E(Y^-|\mathcal{G}) \geq 0$ a.s. by 10.13(i) and (ii). For $\omega \in C$ with $P(C) = 1$,

$$|E(Y^+ - Y^-|\mathcal{G})(\omega)| = |E(Y^+|\mathcal{G})(\omega) - E(Y^-|\mathcal{G})(\omega)| \leq E(Y^+|\mathcal{G})(\omega) + E(Y^-|\mathcal{G})(\omega) = E(Y^+ + Y^-|\mathcal{G})(\omega), \tag{10.34}$$

where both the equalities are by linearity. ∎

10.15 Conditional monotone convergence theorem If $Y_n \leq Y$ and $Y_n \uparrow Y$ a.s., then $E(Y_n|\mathcal{G}) \uparrow E(Y|\mathcal{G})$ a.s.

Proof Consider the monotone sequence $Z_n = Y_n - Y$. Since $Z_n \leq 0$ and $Z_n \leq Z_{n+1}$, 10.13 implies that the sequence $\{E(Z_n|\mathcal{G})\}$ is non-positive and non-decreasing a.s., and hence converges a.s. By Fatou's Lemma,

$$\int_G \limsup_{n \to \infty} E(Z_n|\mathcal{G})\,dP \geq \limsup_{n \to \infty} \int_G E(Z_n|\mathcal{G})\,dP = \limsup_{n \to \infty} \int_G Z_n\,dP = 0 \tag{10.35}$$

for $G \in \mathcal{G}$, the first equality being by (10.18), and the second by regular monotone convergence. Choose $G = \{\omega: \limsup_n E(Z_n|\mathcal{G})(\omega) < 0\}$, which is in $\mathcal{G}$ by 3.26, and (10.35) implies that $P(G) = 0$. It follows that

$$\lim_{n \to \infty} E(Z_n|\mathcal{G}) = 0, \text{ a.s.} \tag{10.36}$$

∎

10.16 Conditional Fatou's lemma If $Y_n \geq 0$ a.s. then

$$\liminf_{n \to \infty} E(Y_n|\mathcal{G}) \geq E\Big(\liminf_{n \to \infty} Y_n \,\Big|\, \mathcal{G}\Big) \text{ a.s.}$$

Proof Put $Y_n' = \inf_{k \geq n} Y_k$ so that $Y_n'$ is non-decreasing, and converges to $Y' = \liminf_n Y_n$. Then $E(Y_n'|\mathcal{G}) \to E(Y'|\mathcal{G})$ by 10.15. $Y_n \geq Y_n'$, and hence $E(Y_n|\mathcal{G}) \geq E(Y_n'|\mathcal{G})$ a.s. by 10.13(ii). The theorem follows on letting $n \to \infty$. ∎

Extending the various other corollaries, such as the dominated convergence theorem, follows the pattern of the last results, and is left to the reader.

10.17 Conditional Markov inequality

$$P(\{|Y| \geq \varepsilon\}|\mathcal{G}) \leq \frac{E(|Y|^p|\mathcal{G})}{\varepsilon^p}, \text{ a.s.} \tag{10.37}$$

Proof By Corollary 9.10 we have

$$\varepsilon^p \int_G 1_{\{|Y| \geq \varepsilon\}}\,dP \leq \int_G |Y|^p\,dP, \quad G \in \mathcal{F}. \tag{10.38}$$

By definition, for $G \in \mathcal{G}$,

$$\int_G P(\{|Y| \geq \varepsilon\}|\mathcal{G})\,dP = \int_G 1_{\{|Y| \geq \varepsilon\}}\,dP, \tag{10.39}$$

and

$$\int_G E(|Y|^p|\mathcal{G})\,dP = \int_G |Y|^p\,dP. \tag{10.40}$$

Substituting (10.39) and (10.40) into (10.38), it follows that

$$\int_G [\varepsilon^p P(\{|Y| \geq \varepsilon\}|\mathcal{G}) - E(|Y|^p|\mathcal{G})]\,dP \leq 0. \tag{10.41}$$

The contents of the square brackets in (10.41) is a $\mathcal{G}$-measurable r.v. Let $G \in \mathcal{G}$ denote the set on which it is positive, and it is clear that $P(G) = 0$. ∎

10.18 Conditional Jensen's inequality Let a Borel function $\phi(\cdot)$ be convex on an interval $I$ containing the support of an $\mathcal{F}$-measurable r.v. $Y$ where $Y$ and $\phi(Y)$ are integrable. Then

$$\phi(E(Y|\mathcal{G})) \leq E(\phi(Y)|\mathcal{G}), \text{ a.s.} \tag{10.42}$$

Proof The proof applies 9.13. Setting $x = E(Y|\mathcal{G})$ and $y = Y$ in (9.31),

$$A(E(Y|\mathcal{G}))(Y - E(Y|\mathcal{G})) \leq \phi(Y) - \phi(E(Y|\mathcal{G})). \tag{10.43}$$

However, unlike $A(E(Y))$, $A(E(Y|\mathcal{G}))$ is a random variable. It is not certain that the left-hand side of (10.43) is integrable, so the proof cannot proceed exactly like that of 9.12. The extra trick is to replace $Y$ by $1_E Y$, where $E = \{\omega: E(Y|\mathcal{G})(\omega) \leq B\}$ for $B < \infty$. $E(Y|\mathcal{G})$ and hence also $1_E$ are $\mathcal{G}$-measurable random variables, so $E(1_E Y|\mathcal{G}) = 1_E E(Y|\mathcal{G})$ by 10.10, and

$$E(\phi(1_E Y)|\mathcal{G}) = E(\phi(Y)1_E + \phi(0)1_{E^c}|\mathcal{G}) = 1_E E(\phi(Y)|\mathcal{G}) + (1 - 1_E)\phi(0). \tag{10.44}$$

Thus, instead of (10.43), consider

$$A(E(1_E Y|\mathcal{G}))(1_E Y - 1_E E(Y|\mathcal{G})) \leq \phi(1_E Y) - \phi(1_E E(Y|\mathcal{G})). \tag{10.45}$$

The majorant side of (10.45) is integrable given that $\phi(Y)$ is integrable, and hence so is the minorant side. Application of 10.9 and 10.10 establishes that the conditional expectation of the latter term is zero almost surely, so with (10.44) we get

$$\phi(1_E E(Y|\mathcal{G})) \leq 1_E E(\phi(Y)|\mathcal{G}) + (1 - 1_E)\phi(0), \text{ a.s.} \tag{10.46}$$

Finally, let $B \to \infty$ so that $1_E \to 1$ to complete the proof. ∎

The following is a simple application of the last result which will have use subsequently.

10.19 Theorem Let $X$ be $\mathcal{G}$-measurable and $L_r$-bounded for $r \geq 1$. If $Y$ is $\mathcal{F}$-measurable, $X + Y$ is also $L_r$-bounded, and $E(Y|\mathcal{G}) = 0$ a.s., then

$$E|X + Y|^r \geq E|X|^r. \tag{10.47}$$

Proof Take expectations and apply the LIE to the inequality

$$E(|X + Y|^r|\mathcal{G}) \geq |E(X + Y|\mathcal{G})|^r = |X|^r \text{ a.s.} \tag{10.48}$$

∎

Finally, we can generalize the results of §9.6. It will suffice to illustrate with the case of differentiation under the conditional expectation.

10.20 Theorem Let a function $G(\omega,\theta)$ satisfy the conditions of 9.31. Then

$$E\left(\left.\frac{dG}{d\theta}\right|_{\theta=\theta_0} \,\Big|\, \mathcal{G}\right) = \left.\frac{d}{d\theta}E(G|\mathcal{G})\right|_{\theta=\theta_0}, \text{ a.s.} \tag{10.49}$$

Proof Take a countable sequence $\{h_\nu, \nu \in \mathbb{N}\}$ with $h_\nu \to 0$ as $\nu \to \infty$. By linearity of the conditional expectation,

$$E\left(\frac{G(\theta_0 + h_\nu) - G(\theta_0)}{h_\nu} \,\Big|\, \mathcal{G}\right) = \frac{E(G(\theta_0 + h_\nu)|\mathcal{G}) - E(G(\theta_0)|\mathcal{G})}{h_\nu} \text{ a.s.} \tag{10.50}$$

If $C_\nu \in \mathcal{G}$ is the set on which the equality in (10.50) holds, with $P(C_\nu) = 1$, the two sequences agree in the limit on the set $\bigcap_\nu C_\nu$, and $P(\bigcap_\nu C_\nu) = 1$ by 3.6. The left-hand side of (10.50) converges a.s. to the left-hand side of (10.49) by assumption, applying the conditional version of the dominated convergence theorem. Since whenever it exists the a.s. limit of the right-hand side of (10.50) is the right-hand side of (10.49) by definition, the theorem follows. ∎

10.5 Relationships between Subfields

$\mathcal{G}_1 \subseteq \mathcal{F}$ and $\mathcal{G}_2 \subseteq \mathcal{F}$ are independent subfields if, for every pair of events $G_1 \in \mathcal{G}_1$ and $G_2 \in \mathcal{G}_2$,

$$P(G_1 \cap G_2) = P(G_1)P(G_2). \tag{10.51}$$

Note that if $Y$ is measurable on $\mathcal{G}_1$ it is also measurable on any collection containing $\mathcal{G}_1$, and on $\mathcal{F}$ in particular. Theorems 10.10 and 10.11 cover cases where $Y$ as well as $X$ is measurable on a subfield.

10.21 Theorem Random variables $X$ and $Y$ are independent iff $\sigma(X)$ and $\sigma(Y)$ are independent.

Proof Under the inverse mapping in (10.13), $G_1 \in \sigma(X)$ if and only if $B_1 = X(G_1) \in \mathcal{B}$, with a corresponding condition for $\sigma(Y)$. It follows that (10.51) holds for each $G_1 \in \sigma(X)$, $G_2 \in \sigma(Y)$ iff $P(X \in B_1, Y \in B_2) = P(X \in B_1)P(Y \in B_2)$ for each $B_1 = X(G_1)$, $B_2 = Y(G_2)$. The 'only if' of the theorem then follows directly from the definition of $\sigma(X)$. The 'if' follows, given (8.38), from the fact that every $B_i \in \mathcal{B}$ has an inverse image in any subfield on which a r.v. is measurable. ∎

The 'only if' in the first line of this proof is essential. Independence of the subfields always implies independence of $X$ and $Y$, but the converse holds only for the infimal cases, $\sigma(X)$ and $\sigma(Y)$.

10.22 Theorem Let $Y$ be integrable and measurable on $\mathcal{G}_1$. Then $E(Y|\mathcal{G}) = E(Y)$ a.s. for all $\mathcal{G}$ independent of $\mathcal{G}_1$.

Proof Define the simple $\mathcal{G}_1$-measurable r.v.s $Y_{(n)} = \sum_{i=1}^n \gamma_i 1_{G_{1i}}$ on a partition $G_{11},\ldots,G_{1n}$ of $\Omega$ where $G_{1i} \in \mathcal{G}_1$, each $i$, with $Y_{(n)} \uparrow Y$ as in 3.28. Then

$$\int_G E(Y_{(n)}|\mathcal{G})\,dP = \int_G Y_{(n)}\,dP = \sum_{i=1}^n \gamma_i P(G \cap G_{1i}) = P(G)\sum_{i=1}^n \gamma_i P(G_{1i}) = P(G)E(Y_{(n)}), \quad \text{for all } G \in \mathcal{G}. \tag{10.52}$$

$E(Y_{(n)}) \to E(Y)$ by the monotone convergence theorem. $E(Y_{(n)}|\mathcal{G})$ is not a simple function, but $E(Y_{(n)}|\mathcal{G}) \uparrow E(Y|\mathcal{G})$ a.s. by 10.15, and

$$\int_G E(Y_{(n)}|\mathcal{G})\,dP \to \int_G E(Y|\mathcal{G})\,dP \tag{10.53}$$

by regular monotone convergence. Hence for $\mathcal{G}_1$-measurable $Y$,

$$\int_G E(Y|\mathcal{G})\,dP = P(G)E(Y) \quad \text{for all } G \in \mathcal{G}. \tag{10.54}$$

From the second equality of (10.18) it follows that $E(Y|G) = E(Y)$ for all $G \in \mathcal{G}$, which proves the theorem. ∎

10.23 Corollary If $X$ and $Y$ are independent, then $E(Y|X) = E(Y)$.

Proof Direct from 10.21 and 10.22, putting $\mathcal{G} = \sigma(X)$ and $\mathcal{G}_1 = \sigma(Y)$. ∎

10.24 Theorem A pair of σ-fields $\mathcal{G}_1 \subseteq \mathcal{F}$ and $\mathcal{G}_2 \subseteq \mathcal{F}$ are independent iff $\mathrm{Cov}(X,Y) = 0$ for every pair of integrable r.v.s $X$ and $Y$ such that $X$ is measurable on $\mathcal{G}_1$ and $Y$ is measurable on $\mathcal{G}_2$.

Proof By 10.22, independence implies the condition of 10.11 is satisfied for $\mathcal{G} = \mathcal{G}_1$, proving 'only if'. To prove 'if', consider $X = 1_{G_1}$, $G_1 \in \mathcal{G}_1$, and $Y = 1_{G_2}$ for $G_2 \in \mathcal{G}_2$. $X$ is $\mathcal{G}_1$-measurable and $Y$ is $\mathcal{G}_2$-measurable. For this case,

$$\mathrm{Cov}(X,Y) = P(G_1 \cap G_2) - P(G_1)P(G_2). \tag{10.55}$$

$\mathrm{Cov}(X,Y) = 0$ for every such pair implies $\mathcal{G}_1$ and $\mathcal{G}_2$ are independent by (10.51). ∎

10.25 Corollary Random variables $X$ and $Y$ are independent iff $\mathrm{Cov}(\phi(X),\psi(Y)) = 0$ for every pair of integrable Borel functions $\phi$ and $\psi$.

Proof By 10.3(i), $\phi(X)$ is measurable with respect to $\sigma(X)$ for all $\phi$, and $\psi(Y)$ is $\sigma(Y)$-measurable for all $\psi$. If and only if all these pairs are uncorrelated, it follows by 10.24 that $\sigma(X)$ and $\sigma(Y)$ are independent subfields. The result then follows by 10.21. ∎

An alternative proof of the necessity part is given in 9.17.

The next result generalizes the law of iterated expectations to subfields. We say that σ-fields $\mathcal{G}_1$ and $\mathcal{G}_2$ are nested if $\mathcal{G}_1 \subseteq \mathcal{G}_2$.
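The indicator-covariance identity (10.55), on which the 'if' part of 10.24 rests, is easy to check by direct enumeration. The following is a small sketch on a hypothetical sample space of two fair coin tosses; the events and function names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))        # outcomes of two coin tosses
prob = {w: Fraction(1, 4) for w in omega}

def p(event):
    """Probability of an event (a set of outcomes)."""
    return sum(prob[w] for w in event)

def cov_indicators(a, b):
    """Cov(1_A, 1_B), computed directly as E(1_A 1_B) - E(1_A)E(1_B)."""
    e_ab = sum(prob[w] for w in omega if w in a and w in b)
    return e_ab - p(a) * p(b)

first_head = {w for w in omega if w[0] == "H"}
second_head = {w for w in omega if w[1] == "H"}
any_head = first_head | second_head

cov_independent = cov_indicators(first_head, second_head)   # independent events
cov_dependent = cov_indicators(first_head, any_head)        # dependent events
```

The covariance vanishes for the two independent events and is non-zero for the dependent pair, matching $P(G_1 \cap G_2) - P(G_1)P(G_2)$ in each case.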

10.26 Theorem If $\mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F}$, then for $\mathcal{F}$-measurable $Y$,

(i) $E[E(Y|\mathcal{G}_2)|\mathcal{G}_1] = E(Y|\mathcal{G}_1)$ a.s.

(ii) $E[E(Y|\mathcal{G}_1)|\mathcal{G}_2] = E(Y|\mathcal{G}_1)$ a.s.

Proof By definition,

$$\int_G E[E(Y|\mathcal{G}_2)|\mathcal{G}_1]\,dP = \int_G E(Y|\mathcal{G}_2)\,dP \quad \text{for all } G \in \mathcal{G}_1. \tag{10.56}$$

But, since $G \in \mathcal{G}_1$ implies $G \in \mathcal{G}_2$, (10.18) and (10.56) imply that

$$\int_G E[E(Y|\mathcal{G}_2)|\mathcal{G}_1]\,dP = \int_G Y\,dP \quad \text{for all } G \in \mathcal{G}_1, \tag{10.57}$$

so that $E[E(Y|\mathcal{G}_2)|\mathcal{G}_1]$ is a version of $E(Y|\mathcal{G}_1)$, proving (i). Part (ii) is by 10.9, since $E(Y|\mathcal{G}_1)$ is a $\mathcal{G}_2$-measurable r.v. ∎

A simple application of the theorem is to a three-variable distribution. If $(X(\omega),Y(\omega),Z(\omega))$ is a random point in $\mathbb{R}^3$, measurable on $\mathcal{F}$, let $\sigma(Z)$ and $\sigma(X,Z)$ be the infimal σ-fields on which $Z$ and $(X,Z)$ respectively are measurable, and $\sigma(Z) \subseteq \sigma(X,Z) \subseteq \mathcal{F}$. Unifying notation by writing $E(Y|Z) = E(Y|\sigma(Z))$ and $E(Y|X,Z) = E(Y|\sigma(X,Z))$, 10.26 implies that

$$E[E(Y|X,Z)|Z] = E[E(Y|Z)|X,Z] = E(Y|Z). \tag{10.58}$$

Our final results derive from the conditional Jensen inequality.
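Both parts of 10.26 can be verified on a finite sample space with two nested partitions. This is a minimal sketch, assuming that the coarse partition generates $\mathcal{G}_1$ and that the fine partition, which refines it, generates $\mathcal{G}_2$. The probabilities and the variable $Y$ are hypothetical.

```python
from fractions import Fraction

omega = list(range(1, 9))
prob = {w: Fraction(w, 36) for w in omega}   # unequal weights; 1 + ... + 8 = 36
y = {w: Fraction(w * w) for w in omega}

coarse = [{1, 2, 3, 4}, {5, 6, 7, 8}]        # generates G1
fine = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]      # generates G2, which contains G1

def cond_exp(f, partition):
    """Version of E(f|G): weighted average of f over each block of the partition."""
    out = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        avg = sum(prob[w] * f[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

e_fine = cond_exp(y, fine)             # E(Y|G2)
e_coarse = cond_exp(y, coarse)         # E(Y|G1)
tower_i = cond_exp(e_fine, coarse)     # E[E(Y|G2)|G1], part (i)
tower_ii = cond_exp(e_coarse, fine)    # E[E(Y|G1)|G2], part (ii)
```

In this sketch both iterated expectations reduce to the coarse conditional expectation, whichever order the conditioning is applied in.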

10.27 Theorem Let $Y$ be an $\mathcal{F}$-measurable r.v. and $\mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F}$. If $\phi(\cdot)$ is convex,

$$E(\phi(E(Y|\mathcal{G}_2))) \geq E(\phi(E(Y|\mathcal{G}_1))). \tag{10.59}$$

Proof Applying 10.18 to the $\mathcal{G}_2$-measurable r.v. $E(Y|\mathcal{G}_2)$ gives

$$E(\phi(E(Y|\mathcal{G}_2))|\mathcal{G}_1) \geq \phi(E(E(Y|\mathcal{G}_2)|\mathcal{G}_1)) = \phi(E(Y|\mathcal{G}_1)) \text{ a.s.} \tag{10.60}$$

where the a.s. equality is by 10.26(i). The theorem follows on taking unconditional expectations and using the LIE. ∎

The application of interest here is the comparison of absolute moments. Since $|x|^p$ is convex for $p \geq 1$, the absolute moments of $E(Y|\mathcal{G}_2)$ exceed those of $E(Y|\mathcal{G}_1)$ when $\mathcal{G}_1 \subseteq \mathcal{G}_2$. In particular,

$$E[E(Y|\mathcal{G}_2)^2] \geq E[E(Y|\mathcal{G}_1)^2]. \tag{10.61}$$

Since $E(Y|\mathcal{G}_1)$ and $E(Y|\mathcal{G}_2)$ both have mean of $E(Y)$, (10.61) implies $\mathrm{Var}(E(Y|\mathcal{G}_2)) \geq \mathrm{Var}(E(Y|\mathcal{G}_1))$. Also, $E(E(Y|\mathcal{G}_i)^2) + E((Y - E(Y|\mathcal{G}_i))^2) = E(Y^2)$ for $i = 1$ or 2 (the expected cross-product vanishes by 10.10), so that an equivalent inequality is

$$E[(Y - E(Y|\mathcal{G}_2))^2] \leq E[(Y - E(Y|\mathcal{G}_1))^2]. \tag{10.62}$$

The interpretation is simple. $\mathcal{G}_1$ represents a smaller information set than $\mathcal{G}_2$, and if one predictor is based on more information than another, it exhibits more variation and the prediction error accordingly less variation. The extreme cases are $E(Y|\mathcal{F}) = Y$ and $E(Y|\mathcal{T}) = E(Y)$, with variances of $\mathrm{Var}(Y)$ and zero respectively. This generalizes a fundamental inequality, that a variance is non-negative, to the partial information case.

While (10.61) generalizes from the square to any convex function, (10.62) does not. However, there is the following norm inequality for prediction errors.

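10.28 Theorem If $Y$ is $\mathcal{F}$-measurable and $\mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F}$,

$$\|Y - E(Y|\mathcal{G}_2)\|_p \leq 2\|Y - E(Y|\mathcal{G}_1)\|_p, \quad p \geq 1. \tag{10.63}$$

Proof Let $\eta = Y - E(Y|\mathcal{G}_1)$. Then by 10.26(ii),

$$\eta - E(\eta|\mathcal{G}_2) = Y - E(Y|\mathcal{G}_1) - E(Y|\mathcal{G}_2) + E(E(Y|\mathcal{G}_1)|\mathcal{G}_2) = Y - E(Y|\mathcal{G}_2). \tag{10.64}$$

The theorem now follows, since

$$\|\eta - E(\eta|\mathcal{G}_2)\|_p \leq \|\eta\|_p + \|E(\eta|\mathcal{G}_2)\|_p \leq 2\|\eta\|_p \tag{10.65}$$

by, respectively, the Minkowski and conditional Jensen inequalities, and the LIE. ∎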

10.6 Conditional Distributions

The conditional probability of an event $A \in \mathcal{F}$ can evidently be defined as $P(A|\mathcal{G}) = E(1_A|\mathcal{G})$, where $1_A(\omega)$ is the indicator function of $A$. But is it therefore meaningful to speak of a conditional distribution on $(\Omega,\mathcal{F})$, which assigns probabilities $P(A|\mathcal{G})$ to each $A \in \mathcal{F}$? There are two ways to approach this question.

First, we can observe straightforwardly that conditional probabilities satisfy the axioms of probability except on sets of probability 0 and, in this sense, satisfactorily mimic the properties of true probabilities, just as was found for the expectations. Thus, we have the following.

10.29 Theorem

(i) $P(A|\mathcal{G}) \geq 0$ a.s., all $A \in \mathcal{F}$.

(ii) $P(\Omega|\mathcal{G}) = 1$ a.s.

(iii) For a countable collection of disjoint sets $A_j \in \mathcal{F}$,

$$P\Big(\bigcup_j A_j \,\Big|\, \mathcal{G}\Big) = \sum_j P(A_j|\mathcal{G}) \text{ a.s.} \tag{10.66}$$

Proof To prove (i), suppose there exists $G \in \mathcal{G}$ with $P(G) > 0$ and $P(A|\mathcal{G})(\omega) < 0$ for all $\omega \in G$. Then, by (10.18),

$$\int_G 1_A\,dP = \int_G P(A|\mathcal{G})\,dP < 0, \tag{10.67}$$

which is a contradiction, since the left-hand member is a probability. To prove (ii), note that $P(\Omega|\mathcal{G})$ is $\mathcal{G}$-measurable and let $G^+ \in \mathcal{G}$ denote the set of $\omega$ such that $P(\Omega|\mathcal{G})(\omega) > 1$. Suppose $P(G^+) > 0$. Then since $G^+ \cap \Omega = G^+$,

$$P(G^+) = \int_{G^+} dP < \int_{G^+} P(\Omega|\mathcal{G})\,dP = \int_{G^+ \cap \Omega} dP = P(G^+), \tag{10.68}$$

which is a contradiction. Hence, $P(G^+) = 0$. Repeating the argument for a set $G^-$ on which $P(\Omega|\mathcal{G})(\omega) < 1$ shows that $P(G^-) = 0$. For (iii), (10.18) gives, for any $G \in \mathcal{G}$,

$$\int_G P\Big(\bigcup_j A_j \,\Big|\, \mathcal{G}\Big)\,dP = \int_{G \cap (\bigcup_j A_j)} dP = \int_{\bigcup_j (G \cap A_j)} dP = \sum_j \int_{G \cap A_j} dP, \tag{10.69}$$

since the sets $G \cap A_j$ are disjoint if this is true of the $A_j$. By definition there exists a version of $P(A_j|\mathcal{G})$ such that for all $G \in \mathcal{G}$,

$$\int_{G \cap A_j} dP = \int_G P(A_j|\mathcal{G})\,dP, \tag{10.70}$$

and hence

$$\int_G P\Big(\bigcup_j A_j \,\Big|\, \mathcal{G}\Big)\,dP = \sum_j \int_G P(A_j|\mathcal{G})\,dP = \int_G \Big(\sum_j P(A_j|\mathcal{G})\Big)\,dP. \tag{10.71}$$

The left- and right-hand members of (10.71) define the same measure on $\mathcal{G}$ (see 10.5) and hence $P(\bigcup_j A_j|\mathcal{G}) = \sum_j P(A_j|\mathcal{G})$ a.s. by the Radon-Nikodym theorem. ∎

But there is also a more exacting criterion which we should consider. That is, does there exist, for fixed $\omega$, a p.m. $\mu_\omega$ on $(\Omega,\mathcal{F})$ which satisfies

$$\mu_\omega(A) = P(A|\mathcal{G})(\omega), \quad \text{each } A \in \mathcal{F} \tag{10.72}$$

for all $\omega \in C$ where $P(C) = 1$? If this condition holds, the fact that conditional expectations and probabilities behave like regular expectations and probabilities requires no separate proof, since the properties hold for $\mu_\omega$. If a family of p.m.s $\{\mu_\omega, \omega \in \Omega\}$ satisfying (10.72) does exist, it is said to define a regular conditional probability on $\mathcal{G}$.

However, the existence of regular conditioning is not guaranteed in every case, and counter-examples have been constructed (see e.g. Doob 1953: 623-4). The problem is this. In (10.66), there is allowed to exist for a given collection $\mathcal{A} = \{A_j \in \mathcal{F}\}$ an exceptional set, say $C_{\mathcal{A}}$ with $P(C_{\mathcal{A}}) = 0$, on which the equality fails. This in itself does not violate (10.72), but the set $C_{\mathcal{A}}$ is specific to $\mathcal{A}$, and since there are typically an uncountable number of countable subsets $\mathcal{A} \subseteq \mathcal{F}$, we cannot guarantee that $P(\bigcup_{\mathcal{A}} C_{\mathcal{A}}) = 0$, as would be required for $\mu_\omega$ both to be a p.m. and to satisfy (10.72).

This is not a particularly serious problem because the existence of the family $\{\mu_\omega\}$ has not been critical to our development of conditioning theory, but for certain purposes it is useful to know, as the next theorem shows, that p.m.s on the line do admit regular conditional distributions.

10.30 Theorem Given a space $(\Omega,\mathcal{F},P)$ and a subfield $\mathcal{G} \subseteq \mathcal{F}$, a random variable $Y$ has a regular conditional distribution defined by

$$F_Y(y|\mathcal{G})(\omega) = P((-\infty,y]|\mathcal{G})(\omega), \quad y \in \mathbb{R}, \tag{10.73}$$

for $\omega \in C$ with $P(C) = 1$, where $F_Y(\cdot|\mathcal{G})(\omega)$ is a c.d.f. for all $\omega \in \Omega$.

Proof Write $F_\omega^*(y)$ to denote a version of $P((-\infty,y]|\mathcal{G})(\omega)$. Let $M_{ij}$ denote the set of $\omega$ such that $F_\omega^*(r_i) > F_\omega^*(r_j)$ for $r_i, r_j \in \mathbb{Q}$ with $r_i < r_j$. Similarly, let $R_i$ denote the set of $\omega$ on which $\lim_{n \to \infty} F_\omega^*(r_i + 1/n) \neq F_\omega^*(r_i)$, $r_i \in \mathbb{Q}$. And finally, let $L$ denote the set of those $\omega$ for which $F_\omega^*(+\infty) \neq 1$ or $F_\omega^*(-\infty) \neq 0$. Then $C = (\bigcup_{i,j} M_{ij})^c \cap (\bigcup_i R_i)^c \cap L^c$ is the set of $\omega$ on which $F_\omega^*$ is monotone and right-continuous at all rational points of the line, with $F_\omega^*(+\infty) = 1$ and $F_\omega^*(-\infty) = 0$. For $y \in \mathbb{R}$ let

$$F_Y(y|\mathcal{G})(\omega) = \begin{cases} F_\omega^*(y), & y \in \mathbb{Q},\ \omega \in C \\ F_\omega^*(y+), & y \in \mathbb{R} - \mathbb{Q},\ \omega \in C \\ G(y), & \text{otherwise,} \end{cases} \tag{10.74}$$

where $G$ is an arbitrary c.d.f. In view of 10.29, $P(M_{ij}) = 0$ for each pair $i,j$, $P(R_i) = 0$ for each $i$ and $P(L) = 0$. (If need be, work in the completion of the space to define these probabilities.) Since this collection is countable, $P(C) = 1$, and in view of 8.4, $F_Y(\cdot|\mathcal{G})(\omega)$ is a c.d.f. which satisfies (10.73), as it was required to show. ∎

It is straightforward, at least in principle, to generalize this argument to multivariate distributions.

For $B \in \mathcal{B}$ it is possible to write

$$E(1_B|\mathcal{G})(\omega) = \int_B dF_Y(y|\mathcal{G})(\omega) \text{ a.s.,} \tag{10.75}$$

and the standard argument by way of simple functions and monotone convergence leads us full circle, to the representation

$$E(Y|\mathcal{G})(\omega) = \int_{-\infty}^{+\infty} y\,dF_Y(y|\mathcal{G})(\omega), \text{ a.s.} \tag{10.76}$$

If $\mathcal{G} = \sigma(X)$, we have constructions to parallel those of §10.1. Since no restriction had to be placed on the distribution to obtain this result, we have evidently found a way around the difficulties associated with the earlier definitions.

However, $F_Y(\cdot|\mathcal{G})(\omega)$ is something of a novelty, a c.d.f. that is a random element from a probability space. Intuitively, we must attempt to understand this as representing the subjective distribution of $Y(\omega)$ in the mind of the observer who knows whether or not $\omega \in G$ for each $G \in \mathcal{G}$. The particular case $F_Y(\cdot|\mathcal{G})(\omega)$ is the one of interest to the statistical modeller when the outcome $\omega$ is realized. Many random variables may be generated from the elements of $(\Omega,\mathcal{F},P)$, not only the outcome itself (in the bivariate case, the pair $Y(\omega), X(\omega)$) but also variables such as $E(Y|X)(\omega)$, and the quantiles of $F_Y(\cdot|X)(\omega)$. All these have to be thought of as different aspects of the same random experiment.

Let $X$ and $Y$ be r.v.s, and $\mathcal{G}$ a subfield with $\mathcal{G} \subseteq \mathcal{H}_X = \sigma(X)$ and $\mathcal{G} \subseteq \mathcal{H}_Y = \sigma(Y)$. We say that $X$ and $Y$ are independent conditional on $\mathcal{G}$ if

$$F_{XY}(x,y|\mathcal{G}) = F_X(x|\mathcal{G})F_Y(y|\mathcal{G}) \text{ a.s.} \tag{10.77}$$

This condition implies, for example, that $E(XY|\mathcal{G}) = E(X|\mathcal{G})E(Y|\mathcal{G})$ a.s. Let $\mu_\omega = \mu(\cdot,\omega)$ be the conditional measure such that

$$\mu_\omega(\{X \in (-\infty,x],\ Y \in (-\infty,y]\}) = F_{XY}(x,y|\mathcal{G})(\omega).$$

With $\omega$ fixed this is a regular p.m. by (the bivariate generalization of) 10.30, and $\mu_\omega(A \cap B) = \mu_\omega(A)\mu_\omega(B)$ for each $A \in \mathcal{H}_X$ and $B \in \mathcal{H}_Y$, by 10.21. In this sense, the subfields $\mathcal{H}_X$ and $\mathcal{H}_Y$ can be called conditionally independent.

10.31 Theorem If $X$ and $Y$ are independent conditional on $\mathcal{G}$, then

$$E(Y|\mathcal{H}_X) = E(Y|\mathcal{G}) \text{ a.s.} \tag{10.78}$$

Proof By independence of $\mathcal{H}_X$ and $\mathcal{H}_Y$ under $\mu_\omega$ we can write

$$\int_A E(Y|\mathcal{H}_X)\,d\mu_\omega = \int_A Y\,d\mu_\omega = \mu_\omega(A)\int Y\,d\mu_\omega, \quad A \in \mathcal{H}_X. \tag{10.79}$$

This is equivalent to

$$E(1_A E(Y|\mathcal{H}_X)|\mathcal{G})(\omega) = E(1_A Y|\mathcal{G})(\omega) = E(1_A E(Y|\mathcal{G})|\mathcal{G})(\omega) \text{ a.s.}[P], \tag{10.80}$$

where the first equality also follows from 10.26(i) and 10.10. Integrating over $\Omega$ with respect to $P$, noting $\Omega \in \mathcal{G}$, using 4.8(ii) and the LIE, we arrive at

$$\int_A E(Y|\mathcal{H}_X)\,dP = \int_A Y\,dP = \int_A E(Y|\mathcal{G})\,dP, \quad A \in \mathcal{H}_X. \tag{10.81}$$

This shows $E(Y|\mathcal{H}_X)$ is a version of $E(Y|\mathcal{G})$, completing the proof. ∎

Thus, while $E(Y|\mathcal{H}_X)$ is in principle $\mathcal{H}_X$-measurable, it is in fact almost surely [$P$] equal to a $\mathcal{G}$-measurable r.v. Needless to say, the whole argument is symmetric in $\mathcal{H}_X$ and $\mathcal{H}_Y$.

The idea we are capturing here is that, to an observer who possesses the information in $\mathcal{G}$ (knows whether $\omega \in G$ for each $G \in \mathcal{G}$), observing $X$ does not yield any additional information that improves his prediction of $Y$, and vice versa. This need not be true for an observer who does not possess prior information. Equation (10.77) shows that the predictors of $Y$ based on the smaller and larger information sets are the same a.s.[$P$], although this does not imply $E(Y|\mathcal{H}_X) = E(Y)$ a.s., so that $X$ and $Y$ are not independent in the ordinary sense.

11 Characteristic Functions

11.1 The Distribution of Sums of Random Variables

Let a pair of independent r.v.s $X$ and $Y$ have marginal c.d.f.s $F_X(x)$ and $F_Y(y)$. The c.d.f. of the sum $W = X + Y$ is given by the convolution of $F_X$ and $F_Y$, the function

$$F_X * F_Y(w) = \int_{-\infty}^{+\infty} F_X(w - y)\,dF_Y(y). \tag{11.1}$$

11.1 Theorem If r.v.s $X$ and $Y$ are independent, then

$$F_X * F_Y(w) = P(X + Y \leq w) = F_Y * F_X(w). \tag{11.2}$$

Proof Let $1_w(x,y)$ be the indicator function of the set $\{x,y: x \leq w - y\}$, so that $P(X + Y \leq w) = E(1_w(X,Y))$. By independence $F(x,y) = F_X(x)F_Y(y)$, so this is

$$\int_{\mathbb{R}^2} 1_w(x,y)\,dF(x,y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} 1_w(x,y)\,dF_X(x)\,dF_Y(y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{w-y} dF_X(x)\,dF_Y(y) = \int_{-\infty}^{+\infty} F_X(w - y)\,dF_Y(y), \tag{11.3}$$

where the first equality is by Fubini's theorem. This establishes the first equality in (11.2). Reversing the roles of $X$ and $Y$ in (11.3) establishes the second. ∎

For continuous distributions, the convolution $f = f_X * f_Y$ of p.d.f.s $f_X$ and $f_Y$ is

$$f(w) = \int_{-\infty}^{+\infty} f_X(w - y)f_Y(y)\,dy, \tag{11.4}$$

such that $\int_{-\infty}^{w} f(\xi)\,d\xi = F(w)$.

11.2 Example Let $X$ and $Y$ be independent drawings from the uniform distribution on $[0,1]$, so that $f_X(x) = 1_{[0,1]}(x)$. Applying (11.4) gives

$$f_{X+Y}(w) = \int_0^1 1_{[w-1,w]}(y)\,dy. \tag{11.5}$$

It is easily verified that the graph of this function forms an isosceles triangle with base $[0,2]$ and height 1. □
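The triangular shape in Example 11.2 can be checked numerically. The sketch below compares a closed form for the integral in the example (the length of the overlap between $[0,1]$ and $[w-1,w]$) with a midpoint-rule approximation to the convolution (11.4). The function names, the evaluation points and the step count are illustrative assumptions.

```python
def triangle_density(w):
    """Closed form of the integral: length of [0, 1] intersected with [w - 1, w]."""
    return max(0.0, min(1.0, w) - max(0.0, w - 1.0))

def convolution_density(w, steps=10_000):
    """Midpoint-rule approximation to the integral of f_X(w - y) f_Y(y) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        if 0.0 <= w - y <= 1.0:       # f_X(w - y) is 1 on [0, 1] and 0 elsewhere
            total += h
    return total

points = [0.0, 0.25, 0.5, 1.0, 1.5, 1.75, 2.0]
closed = [triangle_density(w) for w in points]
approx = [convolution_density(w) for w in points]
max_error = max(abs(c - a) for c, a in zip(closed, approx))
```

The density rises linearly from 0 at $w = 0$ to 1 at $w = 1$ and falls back to 0 at $w = 2$, and the two computations should agree up to the discretization error of the midpoint rule.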

This is the most direct result on the distribution of sums, but the formulae generated by applying the rule recursively are not easy to handle, and other approaches are preferred. The moment generating function (m.g.f.) of $X$, when it exists, is

$$M_X(t) = E(e^{tX}) = \int e^{tx}\,dF_X(x), \quad t \in \mathbb{R}, \tag{11.6}$$

where $e$ denotes the base of natural logarithms. (Integrals are taken over $(-\infty,+\infty)$ unless otherwise indicated.) If $X$ and $Y$ are independent,

$$M_{X+Y}(t) = \int\!\!\int e^{t(x+y)}\,dF_X(x)\,dF_Y(y) = \int e^{tx}\,dF_X(x) \int e^{ty}\,dF_Y(y) = M_X(t)M_Y(t). \tag{11.7}$$

This suggests a simple approach to analysing the distribution of independent sums. The difficulty is that the method is not universal, since the m.g.f. is not defined for every distribution. Considering the series expansion of $e^{tX}$, all the moments of $X$ must evidently exist. The solution to this problem is to replace the variable $t$ by $it$, where $i$ is the imaginary number, $\sqrt{-1}$. The characteristic function (ch.f.) of $X$ is defined as

$$\phi_X(t) = E(e^{itX}) = \int e^{itx}\,dF_X(x). \tag{11.8}$$

11.2 Complex Numbers

A complex number is $z = a + ib$, where $a$ and $b$ are real numbers and $i = \sqrt{-1}$. $a$ and $b$ are called the real and imaginary parts of the number, denoted $a = \mathrm{Re}(z)$ and $b = \mathrm{Im}(z)$. The complex conjugate of $z$ is the number $\bar{z} = a - ib$. Complex arithmetic is mainly a matter of carrying $i$ as an algebraic unknown, and replacing $i^2$ by $-1$, $i^3$ by $-i$, $i^4$ by 1, etc., wherever these appear in an expression.

One can represent $z$ as a point in the plane with Cartesian coordinates $a$ and $b$. The modulus or absolute value of $z$ is its Euclidean distance from the origin,

$$|z| = (z\bar{z})^{1/2} = (a^2 + b^2)^{1/2}. \tag{11.9}$$

Polar coordinates can also be used. Let the complex exponential be defined by

$$e^{i\theta} = \cos\theta + i\sin\theta \tag{11.10}$$

for real $\theta$. All the usual properties of the exponential function, such as multiplication by summing exponents (according to the rules of complex arithmetic) go through under this definition, and

$$|e^{i\theta}| = (\cos^2\theta + \sin^2\theta)^{1/2} = 1 \tag{11.11}$$


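for any $\theta$, by a standard trigonometric identity. We may therefore write $z = |z|e^{i\theta}$ where $\mathrm{Re}(z) = |z|\cos\theta$ and $\mathrm{Im}(z) = |z|\sin\theta$. Also note, by (11.11), that

$$|e^{z}| = |e^{\mathrm{Re}(z)+i\,\mathrm{Im}(z)}| = |e^{\mathrm{Re}(z)}|\,|e^{i\,\mathrm{Im}(z)}| = e^{\mathrm{Re}(z)}. \tag{11.12}$$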
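The identities (11.9) to (11.12) are easy to confirm with Python's standard `cmath` module. This is only an illustrative sketch; the particular number `z` and the helper names `polar_form` and `euler` are assumptions made for the example.

```python
import cmath
import math


def polar_form(z):
    """Return (|z|, theta) such that z = |z| * e^{i*theta}."""
    return abs(z), cmath.phase(z)


def euler(theta):
    """Complex exponential of (11.10): cos(theta) + i*sin(theta)."""
    return complex(math.cos(theta), math.sin(theta))


z = complex(3.0, -4.0)             # a = Re(z) = 3, b = Im(z) = -4
modulus, theta = polar_form(z)
rebuilt = modulus * euler(theta)   # reproduces z up to rounding error

print(modulus)                              # Euclidean distance from the origin, (11.9)
print(abs(euler(1.234)))                    # (11.11): modulus one for any real theta
print(abs(cmath.exp(z)), math.exp(z.real))  # (11.12): the two values coincide
```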

If $X$ and $Y$ are real random variables, $Z = X + iY$ is a complex-valued random variable. Its distribution is defined in the obvious way, in terms of a bivariate c.d.f., $F(x,y)$. In particular,

$$E(Z) = E(X) + iE(Y). \tag{11.13}$$

Whereas $E(Z)$ is a complex variable, $E|Z|$ is of course real, and since $|Z| \le |X| + |Y|$ by the triangle inequality, integrability of $X$ and $Y$ is sufficient for the integrability of $Z$. Many of the standard properties of expectations extend to the complex case in a straightforward way. One result needing proof, however, is the generalization of the modulus inequality.

11.3 Theorem If $Z$ is a complex random variable, $|E(Z)| \le E|Z|$.

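Proof Consider a complex-valued simple r.v.

$$Z_{(n)} = \sum_{j=1}^{n} (\alpha_j + i\beta_j)1_{E_j}, \tag{11.14}$$

where the $\alpha_j$ and $\beta_j$ are real non-negative constants and the $E_j \in \mathcal{F}$ for $j = 1,\ldots,n$ constitute a partition of $\Omega$. Write $P_j = P(E_j)$. Then

$$|E(Z_{(n)})|^2 = \Big(\sum_j \alpha_j P_j\Big)^2 + \Big(\sum_j \beta_j P_j\Big)^2 = \sum_j (\alpha_j^2 + \beta_j^2)P_j^2 + \sum_j\sum_{k \ne j} (\alpha_j\alpha_k + \beta_j\beta_k)P_jP_k, \tag{11.15}$$

whereas

$$(E|Z_{(n)}|)^2 = \Big(\sum_j (\alpha_j^2 + \beta_j^2)^{1/2}P_j\Big)^2 = \sum_j (\alpha_j^2 + \beta_j^2)P_j^2 + \sum_j\sum_{k \ne j} (\alpha_j^2 + \beta_j^2)^{1/2}(\alpha_k^2 + \beta_k^2)^{1/2}P_jP_k. \tag{11.16}$$

The modulus inequality holds for $Z_{(n)}$ if

$$0 \le (E|Z_{(n)}|)^2 - |E(Z_{(n)})|^2 = \sum_j\sum_{k \ne j} \big[(\alpha_j^2 + \beta_j^2)^{1/2}(\alpha_k^2 + \beta_k^2)^{1/2} - (\alpha_j\alpha_k + \beta_j\beta_k)\big]P_jP_k. \tag{11.17}$$

The coefficients of $P_jP_k$ in this expression are the differences of pairs of non-negative terms, and these differences are non-negative if and only if the differences of the squares are non-negative. But as required,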


$$(\alpha_j^2 + \beta_j^2)(\alpha_k^2 + \beta_k^2) - (\alpha_j\alpha_k + \beta_j\beta_k)^2 = \alpha_j^2\beta_k^2 + \alpha_k^2\beta_j^2 - 2\alpha_j\alpha_k\beta_j\beta_k = (\alpha_j\beta_k - \alpha_k\beta_j)^2 \ge 0. \tag{11.18}$$

This result extends to any complex r.v. having non-negative real and imaginary parts by letting $Z_{(n)} = X_{(n)} + iY_{(n)} \uparrow Z = X + iY$, using 3.28, and invoking the monotone convergence theorem. To extend to general integrable r.v.s, split $X$ and $Y$ into positive and negative parts, so that $Z = Z^+ - Z^-$, where $Z^+ = X^+ + iY^+$ with $X^+ \ge 0$ and $Y^+ \ge 0$, and $Z^- = X^- + iY^-$, with $X^- \ge 0$ and $Y^- \ge 0$. Noting that

$$|E(Z)| \le |E(Z^+ + Z^-)| \le E|Z^+ + Z^-| = E|Z| \tag{11.19}$$

completes the proof. ∎
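For a simple complex r.v. of the form (11.14), both sides of the modulus inequality are finite sums, so Theorem 11.3 can be checked directly. The sketch below is illustrative only; the function name and the sample values are assumptions, and the values are allowed to have negative parts, as in the general case.

```python
def modulus_inequality_gap(values, probs):
    """Return E|Z| - |E(Z)| for a simple complex r.v. taking values[j] with probability probs[j]."""
    mean = sum(p * z for z, p in zip(values, probs))
    mean_abs = sum(p * abs(z) for z, p in zip(values, probs))
    return mean_abs - abs(mean)


# Values pointing in different directions: the gap is strictly positive.
print(modulus_inequality_gap([1 + 2j, 3 - 1j, -2 + 0.5j], [0.2, 0.5, 0.3]))

# Values on a common ray from the origin: the inequality holds with equality (up to rounding).
print(modulus_inequality_gap([1 + 1j, 2 + 2j], [0.4, 0.6]))
```

Equality on a common ray corresponds to $\alpha_j\beta_k = \alpha_k\beta_j$ in (11.18).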

11.3 The Theory of Characteristic Functions

We are now equipped to study some of the properties of the characteristic function $\phi_X(t)$. The fact that it is defined for any distribution follows from the fact that $|e^{itx}| = 1$ for all $x$; $E(|e^{itX}|) = 1$ and $E(e^{itX})$ is finite regardless of the distribution of $X$. The real and imaginary parts of $\phi_X(t)$ are respectively $E(\cos tX)$ and $E(\sin tX)$.

11.4 Theorem If $E|X|^k < \infty$, then

$$\frac{d^k\phi_X(t)}{dt^k} = E\big((iX)^k e^{itX}\big). \tag{11.20}$$

Proof

$$\frac{\phi_X(t+h) - \phi_X(t)}{h} = \int_{-\infty}^{+\infty} \frac{e^{ix(t+h)} - e^{itx}}{h}\,dF(x), \tag{11.21}$$

where, using (11.10),

$$\frac{e^{i(t+h)x} - e^{itx}}{h} = \frac{\cos x(t+h) - \cos tx}{h} + i\,\frac{\sin x(t+h) - \sin tx}{h}.$$

The limits of the real and imaginary terms in this expression as $h \to 0$ are respectively $-x\sin tx$ and $i(x\cos tx)$, so the limit of the integrand in (11.21) is

$$x(i\cos tx - \sin tx) = ix(\cos tx + i\sin tx) = ixe^{itx}.$$

Since $|ixe^{itx}| = |x|$, the integral exists if $E|X| < \infty$. This proves (11.20) for the case $k = 1$. To complete the proof, the same argument can be applied inductively to the integrands $(ix)^{k-1}e^{itx}$ for $k = 2,3,\ldots$ ∎

It follows that the integer moments of the distribution can be obtained byrepeated differentiation with respect to t.


11.5 Corollary If $E|X|^k < \infty$, then

$$\left.\frac{d^k\phi_X(t)}{dt^k}\right|_{t=0} = i^k E(X^k). \quad □ \tag{11.22}$$
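As an illustration of (11.22), the first two moments of a discrete distribution can be recovered from numerical derivatives of its ch.f. at $t = 0$. This is a rough sketch, assuming central finite differences with a hypothetical step size `h`; the fair die is simply a convenient example.

```python
import cmath


def char_fn(t, support, probs):
    """Ch.f. of a discrete r.v.: sum over x of P(X = x) * e^{itx}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(support, probs))


def moment_from_chf(k, support, probs, h=1e-3):
    """Approximate E(X^k), k = 1 or 2, as i^{-k} times the k-th derivative of the ch.f. at 0."""
    phi = lambda t: char_fn(t, support, probs)
    if k == 1:
        deriv = (phi(h) - phi(-h)) / (2 * h)               # central first difference
    elif k == 2:
        deriv = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2   # central second difference
    else:
        raise ValueError("only k = 1 and k = 2 are implemented in this sketch")
    return (deriv / 1j**k).real


die = [1, 2, 3, 4, 5, 6]
fair = [1 / 6] * 6
print(moment_from_chf(1, die, fair))   # close to E(X) = 3.5
print(moment_from_chf(2, die, fair))   # close to E(X^2) = 91/6
```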

An alternative way to approach the last result is to construct a series expansion of the ch.f. with remainder, using Taylor's theorem. This gives rise to a very useful approximation theorem.

11.6 Theorem If $E|X|^k < \infty$, then

$$\left|\phi_X(t) - \sum_{j=0}^{k} \frac{(it)^j E(X^j)}{j!}\right| \le E\left[\min\left\{\frac{2|tX|^k}{k!}, \frac{|tX|^{k+1}}{(k+1)!}\right\}\right]. \tag{11.23}$$

Proof A function $f$ which is differentiable $k$ times has the expansion

$$f(t) = f(0) + tf'(0) + \frac{t^2f''(0)}{2} + \frac{t^3f'''(0)}{6} + \cdots + \frac{t^kf^{(k)}(\alpha t)}{k!},$$

where $0 \le \alpha \le 1$. The expansion of $f(t) = e^{itx}$ gives

$$e^{itx} = \sum_{j=0}^{k} \frac{(itx)^j}{j!} + y_k\frac{|tx|^k}{k!}, \tag{11.24}$$

where $y_k = i^k\,\mathrm{sgn}(tx)^k(e^{i\alpha tx} - 1)$, and

$$\mathrm{sgn}(tx) = \begin{cases} 1, & tx \ge 0 \\ -1, & tx < 0. \end{cases}$$

Applying (11.10) and (11.11), we can show that $|y_k| = (2 - 2\cos\alpha tx)^{1/2} \le 2$. However, by extending the expansion to term $k+1$, we also have

$$e^{itx} = \sum_{j=0}^{k} \frac{(itx)^j}{j!} + z_k\frac{|tx|^{k+1}}{(k+1)!}, \tag{11.25}$$

where $z_k = i^{k+1}\,\mathrm{sgn}(tx)^{k+1}e^{i\alpha' tx}$ for $0 \le \alpha' \le 1$, and $|z_k| = 1$. Given that both of (11.24) and (11.25) hold, we may conclude that

$$\left|e^{itx} - \sum_{j=0}^{k} \frac{(itx)^j}{j!}\right| \le \min\left\{\frac{2|tx|^k}{k!}, \frac{|tx|^{k+1}}{(k+1)!}\right\}. \tag{11.26}$$

The theorem now follows on replacing $x$ with the r.v. $X$ in (11.26), taking expectations and using the modulus inequality:


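$$\left|\phi_X(t) - \sum_{j=0}^{k} \frac{(it)^j E(X^j)}{j!}\right| \le E\left|e^{itX} - \sum_{j=0}^{k} \frac{(itX)^j}{j!}\right| \le E\left[\min\left\{\frac{2|tX|^k}{k!}, \frac{|tX|^{k+1}}{(k+1)!}\right\}\right]. \quad ∎ \tag{11.27}$$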
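The pointwise bound (11.26), on which the theorem rests, can be checked numerically over a grid of values $u = tx$. The following is a small sketch; the function names and the grid are illustrative assumptions.

```python
import cmath
from math import factorial


def taylor_error(u, k):
    """|e^{iu} - sum_{j=0}^{k} (iu)^j / j!|, the left-hand side of (11.26) with u = t*x."""
    partial = sum((1j * u) ** j / factorial(j) for j in range(k + 1))
    return abs(cmath.exp(1j * u) - partial)


def remainder_bound(u, k):
    """min{2|u|^k / k!, |u|^{k+1} / (k+1)!}, the right-hand side of (11.26)."""
    return min(2 * abs(u) ** k / factorial(k), abs(u) ** (k + 1) / factorial(k + 1))


for u in (0.1, 1.0, 5.0, 20.0):
    # For small |u| the second term of the minimum is the smaller one; for large |u| the first.
    print(u, taylor_error(u, 2), remainder_bound(u, 2))
```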

There is no need for $E|X|^{k+1}$ to exist for this theorem to hold, and we can think of it as giving the best approximation regardless of whether $|tX|$ is large or small. To interpret the expectation on the right-hand side of (11.27), note that, for any pair of non-negative, measurable functions $g$ and $h$,

$$E(\min\{g(X),h(X)\}) = \inf_{A \in \mathcal{B}} E\big(g(X)1_A(X) + h(X)1_{A^c}(X)\big), \tag{11.28}$$

the infimal set being the one containing those points $x$ on which $g(x) \le h(x)$. In particular, for any $\varepsilon \ge 0$, the set $A = \{|X| > \varepsilon\}$ belongs to the class over which the infimum in (11.28) is taken, and we get the further inequality,

$$E\min\left\{\frac{2|tX|^k}{k!}, \frac{|tX|^{k+1}}{(k+1)!}\right\} \le E\left(\frac{2|tX|^k}{k!}1_{\{|X|>\varepsilon\}}\right) + E\left(\frac{|tX|^{k+1}}{(k+1)!}1_{\{|X|\le\varepsilon\}}\right)$$

$$\le \begin{cases} \dfrac{2|t|^k}{k!}E\big(|X|^k 1_{\{|X|>\varepsilon\}}\big) + \dfrac{|t|^{k+1}\varepsilon^{k+1}}{(k+1)!} \\[2ex] \dfrac{2|t|^k}{k!}E\big(|X|^k 1_{\{|X|>\varepsilon\}}\big) + \dfrac{|t|^{k+1}}{(k+1)!}\,\varepsilon E|X|^k. \end{cases} \tag{11.29}$$

The second alternative on the right is obtained in view of the fact that $E(|X|^{k+1}1_{\{|X|\le\varepsilon\}}) = E(|X|^k|X|1_{\{|X|\le\varepsilon\}}) \le E|X|^k\varepsilon$. Both of these versions of the bound on the truncation error prove useful subsequently.

Two other properties of the characteristic function will be much exploited. First, for a pair of constants $a$ and $b$,

$$\phi_{aX+b}(t) = E(e^{it(aX+b)}) = e^{ibt}\phi_X(at). \tag{11.30}$$

The second is the counterpart of (11.7). For a pair of independent random variables $X$ and $Y$,

$$\phi_{X+Y}(t) = \int\!\!\int e^{it(x+y)}\,dF_X(x)\,dF_Y(y) = \int e^{itx}\,dF_X(x)\int e^{ity}\,dF_Y(y) = \phi_X(t)\phi_Y(t). \tag{11.31}$$

An interesting case of the last result is $Y = -X'$, where $X'$ is an independent drawing from the distribution of $X$. The distribution of $X - X'$ is the same as that of $X' - X$, and hence this r.v. is symmetric about 0. The ch.f. of $X - X'$ is real, since

$$\phi_X(t)\phi_X(-t) = |\phi_X(t)|^2, \tag{11.32}$$

in view of the fact that $\phi_X(-t) = E(e^{-itX}) = \overline{\phi_X(t)}$. It can be verified from the expansion in (11.23) that with a real ch.f., all the existing odd-order moments must be zero, the trademark of a symmetric distribution.

Considering more generally a sum $S = \sum_{j=1}^{n} X_j$ where $\{X_1,\ldots,X_n\}$ are a totally independent collection, recursive application of (11.31) yields

$$\phi_S(t) = \prod_{j=1}^{n} \phi_{X_j}(t). \tag{11.33}$$

To investigate the distribution of $S$, one need only establish the formulae linking the ch.f.s with the relevant c.d.f.s (or where appropriate p.d.f.s) which are known for the standard sampling distributions.

11.7 Example For the Poisson distribution (8.8),

$$\phi_X(t) = E(e^{itX}) = e^{-\lambda}\sum_{x=0}^{\infty} \frac{\lambda^x}{x!}e^{itx} = \exp\{\lambda(e^{it} - 1)\}. \quad □ \tag{11.34}$$
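The closed form in (11.34) can be compared with a truncated version of the defining series, and the product rule (11.33) can be illustrated at the same time. In the sketch below the truncation point `terms`, the parameter values, and the function names are assumptions made for the example.

```python
import cmath
from math import exp


def poisson_chf_series(t, lam, terms=200):
    """Truncated series: sum over x of P(X = x) * e^{itx} for a Poisson(lam) variable."""
    total = 0j
    weight = exp(-lam)            # P(X = 0)
    for x in range(terms):
        total += weight * cmath.exp(1j * t * x)
        weight *= lam / (x + 1)   # P(X = x + 1) from P(X = x)
    return total


def poisson_chf_closed(t, lam):
    """Closed form exp{lam * (e^{it} - 1)}."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))


t = 0.7
print(poisson_chf_series(t, 2.5), poisson_chf_closed(t, 2.5))
# By (11.31), the product of the two ch.f.s is the ch.f. of the sum of the independent variables;
# here it coincides with the Poisson ch.f. with parameter 1.0 + 2.5.
print(poisson_chf_closed(t, 1.0) * poisson_chf_closed(t, 2.5), poisson_chf_closed(t, 3.5))
```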

11.8 Example In the standard Gaussian case, (8.10),

$$\phi_Z(t) = E(e^{itZ}) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{itz - z^2/2}\,dz. \tag{11.35}$$

Completing the square yields $itz - z^2/2 = -(z - it)^2/2 - t^2/2$, and hence

$$\phi_Z(t) = e^{-t^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-(z-it)^2/2}\,dz = e^{-t^2/2}. \tag{11.36}$$

(The integral in the middle member has the value $\sqrt{2\pi}$ for any choice of $t$, note.) Accordingly, consider $X = \sigma Z + \mu$, whose p.d.f. is given by (8.24). Using (11.30), we obtain

$$\phi_X(t; \mu, \sigma^2) = e^{i\mu t - \sigma^2t^2/2}. \tag{11.37}$$

Equation (11.22) can be used to verify the moment formulae given in 9.4 and 9.7. With $\mu = 0$ the ch.f. is real, reflecting the symmetry of the distribution. □

11.9 Example The Cauchy distribution (8.11) has no integer moments. The ch.f. turns out to be $e^{-|t|}$, which is not differentiable at $t = 0$, as (11.22) would lead us to expect. The ch.f. for the Cauchy family (8.15) is

$$\phi_X(t; \nu, \delta) = e^{it\nu - \delta|t|}. \quad □ \tag{11.38}$$

The ch.f. is also defined for multivariate distributions. For a random vector $X$ ($m \times 1$) the ch.f. is

$$\phi_X(t) = E(\exp\{it'X\}), \tag{11.39}$$

where $t$ is an $m$-vector of arguments. This case will be especially important, not least because of the ease with which, by the generalization of (11.30), the ch.f. can be derived for an affine transformation of the vector. Let $Y = BX + d$ ($k \times 1$), where $B$ ($k \times m$) and $d$ ($k \times 1$) are constants, and then we have

$$\phi_Y(t) = E(\exp\{it'Y\}) = \exp\{it'd\}E(\exp\{it'BX\}) = \exp\{it'd\}\phi_X(B't). \tag{11.40}$$

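11.10 Example Let $X$ ($m \times 1$) be multinormal with p.d.f. as in (8.36). The ch.f. is

$$\phi_X(t; \mu, \Sigma) = \frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}}\int_{-\infty}^{+\infty}\!\!\cdots\!\int_{-\infty}^{+\infty} \exp\big\{it'x - \tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\big\}\,dx = \exp\big\{it'\mu - \tfrac{1}{2}t'\Sigma t\big\}. \tag{11.41}$$

The second equality is obtained as before by completing the square:

$$it'x - \tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu) = it'\mu - \tfrac{1}{2}t'\Sigma t - \tfrac{1}{2}(x - \mu - i\Sigma t)'\Sigma^{-1}(x - \mu - i\Sigma t),$$

where it can be shown that the exponential of the last term integrates over $\mathbb{R}^m$ to $(2\pi)^{m/2}|\Sigma|^{1/2}$. □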

11.4 The Inversion Theorem

Paired with (11.8) is a unique inverse transformation from $\phi(t)$ to $F(x)$, so that the ch.f. and c.d.f. are fully equivalent representations of the distribution. The chief step in the proof of this proposition is the construction of the inverse transformation, as follows.

11.11 Lemma If $\phi(t)$ is defined by (11.8), then

$$F(b) - F(a) = \lim_{T \to \infty} \frac{1}{2\pi}\int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\,\phi(t)\,dt \tag{11.42}$$

for any pair $a$ and $b$ of continuity points of $F$, with $a < b$. The multivariate generalization of this formula is

$$\Delta F(x_1,\ldots,x_k) = \lim_{T \to \infty} \frac{1}{(2\pi)^k}\int_{-T}^{T}\!\!\cdots\!\int_{-T}^{T} \left[\prod_{j=1}^{k} \frac{e^{-it_jx_j} - e^{-it_j(x_j + \Delta x_j)}}{it_j}\right]\phi(t_1,\ldots,t_k)\,dt_1\cdots dt_k, \tag{11.43}$$

where $\Delta F(x_1,\ldots,x_k)$ is defined in (8.31) and the vertices of the rectangle based at the point $x_1,\ldots,x_k$, with sides $\Delta x_j > 0$, are all continuity points of $F$. □

It can be verified using (11.10) that

$$\frac{e^{-ita} - e^{-itb}}{it} \to b - a \quad \text{as } t \to 0.$$

The integrals in (11.42) and (11.43) are therefore well defined in spite of including the point $t = 0$. Despite this, it is necessary to avoid writing (11.42) as

$$F(b) - F(a) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \frac{e^{-ita} - e^{-itb}}{it}\,\phi(t)\,dt, \tag{11.44}$$

because the Lebesgue integral on the right may not exist. For example, suppose the random variable is degenerate at the point 0; this means that $\phi(t) = e^{it\cdot 0} = 1$, and

$$\frac{1}{2\pi}\int_{-T}^{T} \left|\frac{e^{-ita} - e^{-itb}}{it}\right|dt \ge \frac{1}{\pi}\int_{1}^{T} \frac{1}{t}\,dt = \frac{1}{\pi}\log T, \tag{11.45}$$

so that the criterion for Lebesgue integrability over $(-\infty,+\infty)$ fails. However, the limits in (11.42) and (11.43) do exist, as the proof reveals.

Proof of 11.11 Only the univariate case will be proved, the multivariate extension being identical in principle. After substituting for $\phi$ in (11.42) we can interchange the order of integration by 9.32, whose continuity and a.s. boundedness requirements are certainly satisfied here:

$$\int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{2\pi it}\,\phi(t)\,dt = \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{2\pi it}\left(\int_{-\infty}^{+\infty} e^{itx}\,dF(x)\right)dt$$

$$= \int_{-\infty}^{+\infty}\left[\int_{-T}^{T} \frac{e^{it(x-a)} - e^{it(x-b)}}{2\pi it}\,dt\right]dF(x). \tag{11.46}$$

Using (11.10),

$$\int_{-T}^{T} \frac{e^{it(x-a)} - e^{it(x-b)}}{2\pi it}\,dt = \int_{0}^{T} \frac{\sin t(x-a)}{\pi t}\,dt - \int_{0}^{T} \frac{\sin t(x-b)}{\pi t}\,dt, \tag{11.47}$$

noting that the cosine is an even function, so that the terms containing cosines (which are also the imaginary terms) vanish in the integral. The limit as $T \to \infty$ of this expression is obtained from the standard formula

$$\int_{0}^{\infty} \frac{\sin tx}{\pi t}\,dt = \begin{cases} 1/2, & x > 0 \\ 0, & x = 0 \\ -1/2, & x < 0. \end{cases} \tag{11.48}$$

Substituting into (11.47) yields the result

$$\lim_{T \to \infty}\int_{-T}^{T} \frac{e^{it(x-a)} - e^{it(x-b)}}{2\pi it}\,dt = \begin{cases} 0, & x < a \text{ or } x > b \\ 1/2, & x = a \text{ or } x = b \\ 1, & a < x < b. \end{cases} \tag{11.49}$$

Letting $T \to \infty$ in (11.46) and applying the bounded convergence theorem now gives

$$\lim_{T \to \infty}\int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{2\pi it}\,\phi(t)\,dt = \int_{-\infty}^{+\infty}\big(\tfrac{1}{2}1_{\{a\}} + \tfrac{1}{2}1_{\{b\}} + 1_{(a,b)}\big)\,dF(x)$$

$$= \tfrac{1}{2}\big(F(b) + F(b-) - F(a) - F(a-)\big), \tag{11.50}$$

which reduces to $F(b) - F(a)$ when $a$ and $b$ are continuity points of $F$. ∎

Lemma 11.11 is the basic ingredient of the following key result, the one that primarily justifies our interest in characteristic functions.

11.12 Inversion theorem Distributions having the same ch.f. are the same.

Proof We give the proof for the univariate case only. By (11.42), the c.d.f.s of the two distributions are the same at every point which is a continuity point of both c.d.f.s. Since the set of jump points of each c.d.f. is countable by 8.3, their union is countable, and it follows by 2.10 that the set of continuity points is dense in $\mathbb{R}$. It then follows by 8.4 that the c.d.f.s are the same everywhere. ∎
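The inversion formula (11.42) can be tried out numerically. The sketch below assumes the standard Gaussian ch.f. $e^{-t^2/2}$ of (11.36), a finite truncation point `T`, and a midpoint rule whose nodes never coincide with $t = 0$; all names and tuning values are illustrative assumptions rather than anything prescribed by the text.

```python
import cmath
from math import erf, exp, pi, sqrt


def gaussian_chf(t):
    """Ch.f. of the standard Gaussian distribution."""
    return exp(-0.5 * t * t)


def inversion_increment(chf, a, b, T=10.0, steps=20000):
    """Midpoint-rule approximation to (1/2pi) * integral over [-T, T] of
    (e^{-ita} - e^{-itb}) / (it) * chf(t) dt, i.e. formula (11.42) before taking T to infinity."""
    h = 2 * T / steps
    total = 0j
    for i in range(steps):
        t = -T + (i + 0.5) * h   # with an even number of steps, no midpoint equals 0
        total += (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t) * chf(t)
    return (total * h / (2 * pi)).real


def gaussian_cdf(x):
    """Standard Gaussian c.d.f. via the error function, used only as a reference value."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


print(inversion_increment(gaussian_chf, -1.0, 1.0), gaussian_cdf(1.0) - gaussian_cdf(-1.0))
```

Because the Gaussian ch.f. decays very quickly, a moderate `T` suffices here; for slowly decaying ch.f.s the truncation would need more care.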

A simple application of the inversion theorem is to provide a proof of a wellknown result, that affine functions of Gaussian vectors are also Gaussian.

11.13 Example Let $X \sim N(\mu,\Sigma)$ ($m \times 1$) and $Y = BX + d$ ($n \times 1$), where $B$ ($n \times m$) and $d$ ($n \times 1$) are constants. Then by (11.40) and (11.41),

$$\phi_Y(t) = \exp\{it'd\}E(\exp\{it'BX\}) = \exp\big\{it'(B\mu + d) - \tfrac{1}{2}t'B\Sigma B't\big\}. \tag{11.51}$$

If $\mathrm{rank}(B\Sigma B') = n$ (implying $n \le m$), 11.12 implies that $Y$ has p.d.f.

$$f(y) = \frac{\exp\big\{-\tfrac{1}{2}(y - B\mu - d)'(B\Sigma B')^{-1}(y - B\mu - d)\big\}}{(2\pi)^{n/2}|B\Sigma B'|^{1/2}}. \tag{11.52}$$

If $\mathrm{rank}(B\Sigma B') < n$, (11.51) remains valid although (11.52) is not. But by the same arguments, every linear combination $c'Y$, where $c$ is $n \times 1$, is either scalar Gaussian with variance $c'B\Sigma B'c$, or identically zero, corresponding to the cases $B'c \ne 0$ and $B'c = 0$ respectively. In this case $Y$ is said to have a singular Gaussian distribution. □


11.5 The Conditional Characteristic Function

Let $Y$ be an $\mathcal{F}$-measurable r.v., and let $\mathcal{G} \subseteq \mathcal{F}$. The conditional ch.f. of $Y|\mathcal{G}$, $\phi_{Y|\mathcal{G}}(t)$, is for each $t$ a random variable having the property

$$\int_G \phi_{Y|\mathcal{G}}(t)\,dP = \int_G e^{itY}\,dP, \quad \text{all } G \in \mathcal{G}. \tag{11.53}$$

The conditional ch.f. shares the properties of the regular ch.f. whenever the theory of conditional expectations parallels that of ordinary expectations according to the results of Chapter 10. Its real and imaginary parts are, respectively, the $\mathcal{G}$-measurable random variables $E(\cos tY|\mathcal{G})$ and $E(\sin tY|\mathcal{G})$. It can be expanded as in 11.6, in terms of the existing conditional moments. If $X$ is $\mathcal{G}$-measurable, the conditional ch.f. of $X + Y$ is $\phi_{X+Y|\mathcal{G}}(t) = e^{itX}\phi_{Y|\mathcal{G}}(t)$, by 10.10. And if $Y$ is $\mathcal{G}_1$-measurable and $\mathcal{G}$ and $\mathcal{G}_1$ are independent subfields, then $\phi_{Y|\mathcal{G}}(t) = \phi_Y(t)$ a.s.

The conditional ch.f. is used to prove a useful inequality due to von Bahr and Esseen (1965). We start with a technical lemma which appears obscure at first sight, but turns out to have useful applications.

11.14 Lemma Suppose $E|Z|^r < \infty$, $0 < r < 2$. Then

$$E|Z|^r = K(r)\int_{-\infty}^{+\infty} \frac{1 - \mathrm{Re}(\phi_Z(t))}{|t|^{1+r}}\,dt, \tag{11.54}$$

where $K(r) = \left(\int_{-\infty}^{+\infty} \dfrac{1 - \cos u}{|u|^{1+r}}\,du\right)^{-1} = \pi^{-1}\Gamma(r+1)\sin(r\pi/2)$. □

The last equality, with $\Gamma(\cdot)$ denoting the gamma function, is a standard integral formula for $0 < r < 2$.

Proof The identity for real $z$,

$$|z|^r = K(r)\int_{-\infty}^{+\infty} \frac{1 - \cos zt}{|t|^{1+r}}\,dt, \tag{11.55}$$

is easily obtained by a change of variable in the integral on the right. The lemma follows on applying 9.32 and noting that $\mathrm{Re}(\phi_Z(t)) = E(\cos tZ)$. ∎

This equality also holds, for $\omega \in C$ with $P(C) = 1$, if $E(|Z|^r|\mathcal{G})(\omega)$ and $\phi_{Z|\mathcal{G}}(t)(\omega)$ are substituted for $E|Z|^r$ and $\phi_Z(t)$. In other words, the conditional $r$th moment and conditional ch.f. are almost surely related by the same formula.

So consider $L_r$-bounded r.v.s $Z$ and $X$, where $Z$ is $\mathcal{F}$-measurable, and $X$ is $\mathcal{G}$-measurable for $\mathcal{G} \subseteq \mathcal{F}$. Suppose that $\phi_{Z|\mathcal{G}}(t)$ is a real r.v. almost surely. Then for each $\omega \in \Omega$,

$$1 - \mathrm{Re}(\phi_{X+Z|\mathcal{G}}(t))(\omega) = 1 - \mathrm{Re}\big(e^{itX(\omega)}\phi_{Z|\mathcal{G}}(t)(\omega)\big)$$

$$= 1 - (\cos tX(\omega))(\phi_{Z|\mathcal{G}}(t)(\omega))$$

$$\le (1 - \cos tX(\omega)) + (1 - \phi_{Z|\mathcal{G}}(t)(\omega)), \tag{11.56}$$

the difference between the last two members being $(1 - \cos tX(\omega))(1 - \phi_{Z|\mathcal{G}}(t)(\omega))$, which is non-negative for all $\omega$. Hence, for $0 < r < 2$,

$$E(|X+Z|^r|\mathcal{G}) = K(r)\int_{-\infty}^{+\infty} \frac{1 - \mathrm{Re}(\phi_{X+Z|\mathcal{G}}(t))}{|t|^{1+r}}\,dt$$

$$\le K(r)\int_{-\infty}^{+\infty} \frac{1 - \cos tX}{|t|^{1+r}}\,dt + K(r)\int_{-\infty}^{+\infty} \frac{1 - \phi_{Z|\mathcal{G}}(t)}{|t|^{1+r}}\,dt$$

$$= |X|^r + E(|Z|^r|\mathcal{G}), \text{ a.s.,} \tag{11.57}$$

and taking expectations through yields

$$E|X+Z|^r \le E|X|^r + E|Z|^r. \tag{11.58}$$

For the case $0 < r \le 1$ this inequality holds by the $c_r$ inequality for general $Z$ and $X$, so it is the case $1 < r < 2$ that is of special interest here.

Generalizing from the remarks following (11.31), the condition that $\phi_{Z|\mathcal{G}}(t)$ be real a.s. can be fulfilled by letting $Z = Y - Y'$, where $Y$ and $Y'$ are identically distributed and independent, conditional on $\mathcal{G}$. Note that if $\mathcal{H} = \sigma(Y)$, then

$$E(Y'|\mathcal{H}) = E(Y'|\mathcal{G}), \text{ a.s.,} \tag{11.59}$$

by 10.31. Identical conditional distributions means simply that $F_Y(\cdot|\mathcal{G}) = F_{Y'}(\cdot|\mathcal{G})$ a.s., and equivalently that $\phi_{Y|\mathcal{G}}(t) = \phi_{Y'|\mathcal{G}}(t)$ a.s. Hence

$$\phi_{Y-Y'|\mathcal{G}}(t) = E(e^{itY}e^{-itY'}|\mathcal{G}) = E(e^{itY}|\mathcal{G})E(e^{-itY'}|\mathcal{G}) = |\phi_{Y|\mathcal{G}}(t)|^2, \text{ a.s.,} \tag{11.60}$$

where the right-hand side is a real r.v. Now, for each $\omega \in \Omega$, the following identity can be verified:

$$2\big(1 - \mathrm{Re}(\phi_{Y|\mathcal{G}}(t)(\omega))\big) = 1 - |\phi_{Y|\mathcal{G}}(t)(\omega)|^2 + |1 - \phi_{Y|\mathcal{G}}(t)(\omega)|^2. \tag{11.61}$$

Applying (11.60) and 11.14, and taking expectations, this yields the inequality

$$2E|Y|^r \ge E|Y - Y'|^r, \quad 0 < r < 2, \tag{11.62}$$

noting that the difference between the two sides here is the non-negative function of $r$, $K(r)\int_{-\infty}^{+\infty} E|1 - \phi_{Y|\mathcal{G}}(t)|^2/|t|^{1+r}\,dt$.

These arguments lead us to the following conclusion.

11.15 Theorem Suppose $E(Y|\mathcal{G}) = 0$ a.s. and $X$ is $\mathcal{G}$-measurable where $\mathcal{G} \subseteq \mathcal{H} = \sigma(Y)$, and both variables are $L_r$-bounded. Then

$$E|X+Y|^r \le E|X|^r + 2E|Y|^r, \quad 0 < r \le 2. \tag{11.63}$$

Proof Let $Y'$ be independent and identical with $Y$, conditional on $\mathcal{G}$. Applying (11.59), these conditions jointly imply $E(Y'|\mathcal{H}) = E(Y'|\mathcal{G}) = E(Y|\mathcal{G}) = 0$. Noting that $X + Y$ is $\mathcal{H}$-measurable, it follows by 10.19 (in applying this result be careful to note that $\mathcal{H}$ plays the role of the subfield here) that

$$E|X+Y|^r \le E|X + (Y - Y')|^r. \tag{11.64}$$

The conclusion for $1 < r < 2$ now follows on applying (11.58) for the case $Z = Y - Y'$, and then (11.62). The inequality holds for $0 < r \le 1$ by the $c_r$ inequality, and for $r = 2$ from elementary considerations since $E(XY) = 0$. In these latter cases the factor 2 in (11.63) can be omitted. ∎

This result can be iterated, given a sequence of r.v.s measurable on an increasing sequence of $\sigma$-fields. An easy application is to independent r.v.s $X_1,\ldots,X_n$, for which the condition $E(X_t|\sigma(X_1,\ldots,X_{t-1})) = 0$ certainly holds for $t = 2,\ldots,n$. Letting $S_n = \sum_{t=1}^{n} X_t$ and $\mathcal{H} = \sigma(X_1,\ldots,X_n)$, 11.15 yields

$$E|S_n|^r \le E|S_{n-1}|^r + 2E|X_n|^r \le 2\sum_{t=1}^{n} E|X_t|^r, \quad 0 < r < 2. \tag{11.65}$$

If the series on the majorant side converges, this inequality remains valid as $n \to \infty$. It may be contrasted for tightness with the $c_r$ inequality for general $X_t$, (9.62). In this case, 2 must be replaced by $n^{r-1}$ for $1 < r \le 2$, which is of no use for large $n$.
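The contrast between the bound in (11.65) and the $c_r$ bound can be made concrete in a case where $E|S_n|^r$ is computable exactly. The sketch below assumes independent symmetric $\pm 1$ variables (so that $E|X_t|^r = 1$ and the conditional means are zero); this choice of distribution and the function name are assumptions made purely for illustration.

```python
from math import comb


def abs_moment_sign_sum(n, r):
    """Exact E|S_n|^r when S_n is a sum of n independent variables equal to +1 or -1
    with probability 1/2 each (S_n = 2j - n with probability C(n, j) / 2^n)."""
    return sum(comb(n, j) * abs(2 * j - n) ** r for j in range(n + 1)) / 2 ** n


r = 1.5
for n in (1, 5, 50, 200):
    exact = abs_moment_sign_sum(n, r)
    bound_11_65 = 2 * n          # 2 * sum over t of E|X_t|^r
    bound_cr = n ** (r - 1) * n  # n^{r-1} * sum over t of E|X_t|^r
    print(n, exact, bound_11_65, bound_cr)
```

For large `n` the first bound grows linearly while the $c_r$ bound grows like $n^r$, which is the point made in the text.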

III

THEORY OF STOCHASTIC PROCESSES

12 Stochastic Processes

12.1 Basic Ideas and Terminology

Let $(\Omega,\mathcal{F},P)$ be a probability space, let $\mathbb{T}$ be any set, and let $\mathbb{R}^{\mathbb{T}}$ be the product space generated by taking a copy of $\mathbb{R}$ for each element of $\mathbb{T}$. Then, a stochastic process is a measurable mapping $x: \Omega \mapsto \mathbb{R}^{\mathbb{T}}$, where

$$x(\omega) = \{X_\tau(\omega),\ \tau \in \mathbb{T}\}. \tag{12.1}$$

$\mathbb{T}$ is called the index set, and the r.v. $X_\tau(\omega)$ is called a coordinate of the process. A stochastic process can also be characterized as a mapping from $\Omega \times \mathbb{T}$ to $\mathbb{R}$. However, the significant feature of the definition given is the requirement of joint measurability of the coordinates. Something more is implied than having $X_\tau(\omega)$ a measurable r.v. for each $\tau$.

Here, $\mathbb{T}$ is an arbitrary set which in principle need not even be ordered, although linear ordering characterizes the important cases. A familiar example is $\mathbb{T} = \{1,\ldots,k\}$, where $x$ is a random $k$-vector. Another important case of $\mathbb{T}$ is an interval of $\mathbb{R}$, such that $x(\omega)$ is a function of a real variable and $\mathbb{R}^{\mathbb{T}}$ the space of random functions. And when $\mathbb{T}$ is a countable subset of $\mathbb{R}$, $x = \{X_\tau(\omega),\ \tau \in \mathbb{T}\}$ defines a stochastic sequence. Thus, a stochastic sequence is a stochastic process whose index set is countable and linearly ordered. When the $X_\tau$ represent random observations equally spaced in time, no relevant information is lost by assigning a linear ordering through $\mathbb{N}$ or $\mathbb{Z}$, indicated by the notations $\{X_\tau(\omega)\}_{1}^{\infty}$ and $\{X_\tau(\omega)\}_{-\infty}^{\infty}$. The definition does not rule out $\mathbb{T}$ containing information about distances between the sequence coordinates, as when the observations are irregularly spaced in time with $\tau$ a real number representing elapsed time from a chosen origin, but cases of this kind will not be considered explicitly.

Familiarly, a time series is a time-ordered sequence of observations of (say) economic variables, although the term may extend to unobserved or hypothetical variables, such as the errors in a regression model. Time-series coordinates are labelled $t$. If a sample is defined as a time series of finite length $n$ (or more generally, a collection of such series for different variables) it is convenient to assume that samples are embedded in infinite sequences of 'potential' observations. Various mathematical functions of sample observations, statistics or estimators, will also be well known to the reader, characteristically involving a summation of terms over the coordinates. The sample moments of a time series, regression coefficients, log-likelihood functions and their derivatives, are standard examples. By letting $n$ take the values 1,2,3,..., these functions of $n$ observations generate what we may call derived sequences. The notion of a sequence in this case comes from the idea of analysing samples of progressively increasing size. The mathematical theory often does not distinguish between the types of sequence under consideration, and some of our definitions and results apply generally, but a clue to the usual application will be given by the choice of index symbol, $t$ or $n$ as the case may be.

A leading case which does not fall under the definition of a sequence is where $\mathbb{T}$ is partially ordered. When there are two dimensions to the observations, as in a panel data set having both a time dimension and a dimension over agents, $x$ may be called a random field. Such cases are not treated explicitly here, although in many applications one dimension is regarded as fixed and the sequence notion is adequate for asymptotic analysis. However, cases where $\mathbb{T}$ is either the product set $\mathbb{Z} \times \mathbb{N}$, or a subset thereof, are often met below in a different context. A triangular stochastic array is a doubly-indexed collection of random variables,

$$\begin{array}{llllll}
X_{11} & X_{12} & \cdots & X_{1k_1} & & \\
X_{21} & X_{22} & \cdots & \cdots & X_{2k_2} & \\
X_{31} & X_{32} & \cdots & \cdots & \cdots & X_{3k_3} \\
\vdots & & & & &
\end{array} \tag{12.2}$$

compactly written as $\{\{X_{nt}\}_{t=1}^{k_n}\}_{n=1}^{\infty}$, where $\{k_n\}_{n=1}^{\infty}$ is some increasing integer sequence. Array notation is called for when the points of a sample are subjected to scale transformations or the like, depending on the complete sample. A standard example is $\{\{X_{nt}\}_{t=1}^{n}\}_{n=1}^{\infty}$, where $X_{nt} = X_t/s_n$, and $s_n^2 = \sum_{t=1}^{n} \mathrm{Var}(X_t)$, or some similar function of the sample moments from 1 to $n$.
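The standard example of a triangular array can be written out for a short sample. This is a sketch under a stated assumption: $s_n$ is taken to be the standard deviation of the partial sum, so that $s_n^2$ is the sum of the first $n$ variances. The function name and the sample numbers are hypothetical.

```python
from math import sqrt


def triangular_array(x, variances):
    """Row n (n = 1, ..., len(x)) holds X_nt = X_t / s_n for t = 1, ..., n,
    where s_n^2 is assumed to be the sum of the first n variances."""
    rows = []
    cumulative_variance = 0.0
    for n in range(1, len(x) + 1):
        cumulative_variance += variances[n - 1]
        s_n = sqrt(cumulative_variance)
        rows.append([x[t] / s_n for t in range(n)])
    return rows


rows = triangular_array([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
for row in rows:
    print(row)
```

Each row rescales all earlier observations again, which is why a single index is not enough to label the terms.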

12.2 Convergence of Stochastic Sequences

Consider the functional expression $\{X_n(\omega)\}_{1}^{\infty}$ for a random sequence on the space $(\Omega,\mathcal{F},P)$. When evaluated at a point $\omega \in \Omega$ this denotes a realization of the sequence, the actual collection of real numbers generated when the outcome $\omega$ is drawn. It is natural to consider in the spirit of ordinary analysis whether this sequence converges to a limit, say $X(\omega)$. If this is the case for every $\omega \in \Omega$, we would say that $X_n \to X$ surely (or elementwise) where, if $X_n$ is an $\mathcal{F}/\mathcal{B}$-measurable r.v. for each $n$, then so is $X$, by 3.26.

But, except by direct construction, it is usually difficult to establish in terms of a given collection of distributional properties that a stochastic sequence converges surely to a limit. A much more useful notion (because more easily shown) is almost sure convergence. Let $C \subseteq \Omega$ be the set of outcomes such that, for every $\omega \in C$, $X_n(\omega) \to X(\omega)$ as $n \to \infty$. If $P(C) = 1$, the sequence is said to converge almost surely, or equivalently, with probability one. The notations $X_n \xrightarrow{as} X$, or $X_n \to X$ a.s., and a.s.lim $X_n = X$ are all used to denote almost sure convergence. A similar concept, of convergence almost everywhere (a.e.), was invoked in connection with the properties of integrals in §4.2. For many purposes, almost sure convergence can be thought of as yielding the same implications as sure convergence in probabilistic arguments.

However, attaching probabilities to the convergent set is not the only way in which stochastic convergence can be understood. Associated with any stochastic sequence are various non-stochastic sequences of variables and functions describing aspects of its behaviour, moments being the obvious case. Convergence of the stochastic sequence may be defined in terms of the ordinary convergence of an associated sequence. If the sequence $\{E(X_n - X)^2\}_{1}^{\infty}$ converges to zero, there is clearly a sense in which $X_n \to X$; this is called convergence in mean square. Or suppose that for any $\varepsilon > 0$, the probabilities of the events $\{\omega: |X_n(\omega) - X(\omega)| < \varepsilon\} \in \mathcal{F}$ form a real sequence converging to 1. This is another distinct convergence concept, so-called convergence in probability. In neither case is there any obvious way to attach a probability to the convergent set; this can even be zero! These issues are studied in Part IV.
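The remark that the convergent set can have probability zero may be illustrated with a standard construction that is not taken from this text: sliding indicator functions on $\Omega = [0,1)$ with Lebesgue measure. The sketch below assumes the indexing $n = 2^m + j$, $0 \le j < 2^m$; the names are hypothetical.

```python
def sliding_interval(n):
    """Interval [j / 2^m, (j + 1) / 2^m) attached to term n = 2^m + j, with 0 <= j < 2^m."""
    m = n.bit_length() - 1
    j = n - 2 ** m
    return j / 2 ** m, (j + 1) / 2 ** m


def x_n(n, omega):
    """X_n(omega) = 1 if omega lies in the n-th interval, and 0 otherwise."""
    lo, hi = sliding_interval(n)
    return 1 if lo <= omega < hi else 0


omega = 0.3
for m in range(6):
    block = range(2 ** m, 2 ** (m + 1))
    hits = [n for n in block if x_n(n, omega) == 1]
    width = sliding_interval(2 ** m)[1] - sliding_interval(2 ** m)[0]
    print(m, width, hits)
```

Here $P(X_n \ne 0)$ equals the interval width, which tends to zero, so $X_n \to 0$ in probability and in mean square; yet every $\omega$ is covered once in each block, so the realized sequence returns to 1 infinitely often and converges for no $\omega$.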

Another convergence concept relates to the sequence of marginal p.m.s of the coordinates, $\{\mu_n\}_{1}^{\infty}$, or equivalently the marginal c.d.f.s, $\{F_n\}_{1}^{\infty}$. Here we can consider conditions for convergence of the real sequences $\{\mu_n(A)\}_{1}^{\infty}$ for various sets $A \in \mathcal{B}$, or alternatively, of $\{F_n(x)\}_{1}^{\infty}$ for various $x \in \mathbb{R}$. In the latter case, uniform or pointwise convergence on $\mathbb{R}$ is a possibility, but these are relatively strong notions. It is sufficient for a theory of the limiting distribution if convergence is confined just to the continuity points of the limiting function $F$, or equivalently (as we shall show in Chapter 22) of $\mu_n(A)$, to sets $A$ having $\mu(\partial A) = 0$. This condition is referred to as the weak convergence of the distributions, and forms the subject of Part V.

12.3 The Probability Model

Some very important ideas are implicit in the notion of a stochastic sequence. Given the equipotency of $\mathbb{N}$ and $\mathbb{Z}$, it will suffice to consider the random element $\{X_\tau(\omega)\}_{1}^{\infty}$, $\omega \in \Omega$, mapping from a point of $\Omega$ to a point in infinite-dimensional Euclidean space, $\mathbb{R}^\infty$. From a probabilistic point of view, the entire infinite sequence corresponds to a single outcome $\omega$ of the underlying abstract probability space. In principle, a sampling exercise in this framework is the random drawing of a point in $\mathbb{R}^\infty$, called a realization or sample path of the random sequence; we may actually observe only a finite segment of this sequence, but the key idea is that a random experiment consists of drawing a complete realization. Repeated sampling means observing the same finite segment (relative to the origin of the index set) of different realizations, not different segments of the same realization.

The reason for this characterization of the random experiment will become clear in the sequel; for the moment we will just concentrate on placing this slightly outlandish notion of an infinite-dimensioned random element into perspective. To show that there is no difficulty in establishing a correspondence between a probability space of a familiar type and a random sequence, we discuss a simple example in some detail.

12.1 Example Consider a repeated game of coin tossing, generating a random sequence of heads and tails; if the game continues for ever, it will generate a sequence of infinite length. Let 1 represent a head and 0 a tail, and we have a random sequence of 1s and 0s. Such a sequence corresponds to the binary (base 2) representation of a real number; according to equation (1.15) there is a one-to-one correspondence between infinite sequences of coin tosses and points on the unit interval. On this basis, the fundamental space $(\Omega,\mathcal{F})$ for the coin tossing experiment can be chosen as $([0,1),\mathcal{B}_{[0,1)})$. The form of $P$ can be deduced in an elementary way from the stipulation that $P(\text{heads}) = P(\text{tails}) = 0.5$ (i.e. the coin is fair) and successive tosses are independent. For example, the events {tails on first toss} and {heads on first toss} are the images of the sets $[0,0.5)$ and $[0.5,1)$ respectively, whose measures must accordingly be 0.5 each. More generally, the probability that the first $n$ tosses in a sequence yield a given configuration of heads and tails out of the $2^n$ possible ones is equal in every case to $1/2^n$, so that each sequence is (in an appropriate limiting sense) 'equally likely'. The corresponding sets in $[0,1)$ of the binary expansions with the identical pattern of 0s and 1s in the first $n$ positions occupy intervals all of width precisely $1/2^n$ in the unit interval. The conclusion is that the probability measure of any interval is equal to its width. This is nothing but Lebesgue measure on the half-open interval $[0,1)$. □
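The correspondence in Example 12.1 between an initial pattern of tosses and an interval of width $1/2^n$ can be sketched in a few lines. The function name `cylinder_interval` is a hypothetical label; exact rational arithmetic is used so that the endpoints are not affected by rounding.

```python
from fractions import Fraction


def cylinder_interval(tosses):
    """Return (lo, hi) such that [lo, hi) is the set of points in [0, 1) whose binary
    expansion begins with the given pattern of tosses (1 = head, 0 = tail)."""
    lo = Fraction(0)
    for position, toss in enumerate(tosses, start=1):
        lo += Fraction(toss, 2 ** position)
    return lo, lo + Fraction(1, 2 ** len(tosses))


for pattern in ([0], [1], [1, 0, 1], [0, 0, 1, 1]):
    lo, hi = cylinder_interval(pattern)
    print(pattern, lo, hi, hi - lo)   # the width is the probability of the pattern
```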

This example can be elaborated from binary sequences to sequences of real variables without too much difficulty. There is an intimate connection between infinite random sequences and continuous probability distributions on the line, and understanding one class of problem is frequently an aid to understanding the other. The question often posed about the probability of some sequence predicted in advance being realized, say an infinite run of heads or a perpetual alternation of heads and tails, is precisely answered. In either the decimal or binary expansions, all the numbers whose digit sequences either terminate or, beyond some point, are found to cycle perpetually through a finite sequence belong to the set of rational numbers. Since the rationals have Lebesgue measure zero in the space of the reals, we have a proof that the probability of any such sequence occurring is zero.

Another well-known conundrum concerns the troupe of monkeys equipped with typewriters who, it is claimed, will eventually type out the complete works of Shakespeare. We can show that this event will occur with probability 1. For the sake of argument, assume that a single monkey types into a word processor, and his ASCII-encoded output takes the form of a string of bits (binary digits). Suppose Shakespeare's encoded complete works occupy $k$ bits, equivalent to $k/5$ characters allowing for a 32-character keyboard (upper-case only, but including some punctuation marks). This string is one of the $2^k$ possible strings of $k$ bits. Assuming that each such string is equally likely to arise in $k/5$ random key presses, the probability that the monkey will type Shakespeare without an error from scratch is exactly $2^{-k}$. However, the probability that the second string of $k$ bits it produces is the right one, given that the first one is wrong, is $(1 - 2^{-k})2^{-k}$ when the strings are independent. In general, the probability that the monkey will type Shakespeare correctly on the $(m+1)$th independent attempt, given that the first $m$ attempts were failures, is $(1 - 2^{-k})^m 2^{-k}$. All these events are disjoint, and summing their probabilities over all $m \ge 0$ yields

$$P(\text{monkey types Shakespeare eventually}) = 1.$$

In the meantime, of course, the industrious primate has produced much of the rest of world literature, not to mention a good many telephone books. It is also advisable to estimate the length of time we are likely to wait for the desired text to appear, which requires a further calculation. The average waiting time, expressed in units of the time taken to type $k$ bits, is $2^{-k}\sum_{m=1}^{\infty} m(1 - 2^{-k})^m = 2^k - 1$. If we scale down our ambitions and decide to be content with just 'TO BE OR NOT TO BE' ($5 \times 18 = 90$ bits), and the monkey takes 1 minute over each attempt, we shall wait on average $2.3 \times 10^{21}$ years. So the Complete Works don't really bear thinking about.

What we have shown is that almost every infinite string of bits contains every finite string somewhere in its length; but also, that the mathematical concept of 'almost surely' has no difficulty in coinciding with an everyday notion indistinguishable from 'never'. The example is frivolous, but it is useful to be reminded occasionally that limit theory deals in large numbers. A sense of perspective is always desirable in evaluating the claims of the theory.
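The arithmetic of the monkey example can be reproduced directly. This is a sketch under stated assumptions: the helper names are hypothetical, and a year is taken to be 365.25 days, which the text does not specify. It checks the geometric sum of success probabilities, the closed form $2^k - 1$ for the mean number of failed attempts, and the waiting time for the 90-bit phrase.

```python
from fractions import Fraction


def success_by_attempt(k, attempts):
    """P(the correct k-bit string appears within `attempts` independent tries)."""
    p = Fraction(1, 2 ** k)
    return sum((1 - p) ** m * p for m in range(attempts))


def mean_failures(k):
    """Closed form of 2^{-k} * sum_{m>=1} m (1 - 2^{-k})^m, namely 2^k - 1."""
    return 2 ** k - 1


def partial_mean(k, terms):
    """Truncated version of the same series, for a numerical cross-check."""
    p = 2.0 ** -k
    return p * sum(m * (1 - p) ** m for m in range(1, terms + 1))


# 'TO BE OR NOT TO BE': 18 characters at 5 bits each, one minute per attempt.
minutes = mean_failures(90)
years = minutes / (60 * 24 * 365.25)     # assumed year length of 365.25 days
```

The partial sums of `success_by_attempt` approach 1 as the number of attempts grows, while `years` shows how slowly that limit is approached in practice.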

The first technical challenge we face in the theory of stochastic processes is to handle distributions on $\mathbb{R}^\infty$. To construct the Borel field $\mathcal{B}^\infty$ of events on $\mathbb{R}^\infty$ we implicitly endow $\mathbb{R}^\infty$ with the Tychonoff, or product, topology. It is not essential to have absorbed the theory of §6.5 to make sense of the discussion that follows, but it may help to glance at Example 6.18 to see what this assumption implies.

Given a process $x = \{X_t\}_1^\infty$ we shall write

$$\pi_k(x) = (X_1, \ldots, X_k): \mathbb{R}^\infty \to \mathbb{R}^k \qquad (12.3)$$

for each $k \in \mathbb{N}$, to denote the $k$-dimensional coordinate projection. Let $\mathcal{C}$ denote the collection of finite-dimensional cylinder sets of $\mathbb{R}^\infty$, the sets

$$C = \{x \in \mathbb{R}^\infty : \pi_k(x) \in E,\ E \in \mathcal{B}^k,\ k \in \mathbb{N}\}. \qquad (12.4)$$

In other words, elements of $\mathcal{C}$ have the form

$$C = \pi_k^{-1}(E) \qquad (12.5)$$

for some $E \in \mathcal{B}^k$, and some finite $k$. Although we may wish to consider arbitrary finite-dimensional cylinders, there is no loss of generality in considering the projections onto just the first $k$ coordinates. Any finite-dimensional cylinder can be embedded in a cylinder of the form $\pi_k^{-1}(E)$, $E \in \mathcal{B}^k$, where $k$ is just the largest of the restricted coordinates. The distinguishing feature of an element of $\mathcal{C}$ is that at most a finite number of its coordinates are restricted.

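12.2 Theorem $\mathcal{C}$ is a field.

Proof First, the complement in $\mathbb{R}^\infty$ of a set $C$ defined by (12.4) is

$$C^c = \{x \in \mathbb{R}^\infty : \pi_k(x) \in E^c,\ E \in \mathcal{B}^k\} = \pi_k^{-1}(E^c), \qquad (12.6)$$

which is another element of $\mathcal{C}$, i.e. $C^c \in \mathcal{C}$. Second, consider the union of sets $C = \pi_k^{-1}(E) \in \mathcal{C}$ and $C' = \pi_k^{-1}(E') \in \mathcal{C}$ for $E, E' \in \mathcal{B}^k$. $C \cup C'$ is given by (12.4) with $E$ replaced by $E \cup E'$, and hence $C \cup C' \in \mathcal{C}$. Third, if $E \in \mathcal{B}^k$ and $E' \in \mathcal{B}^m$ for $m > k$, then $E \times \mathbb{R}^{m-k} \in \mathcal{B}^m$, and so the argument of the second case applies. ■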
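The three steps of the proof of 12.2 can be mirrored in a small data structure. The sketch below is illustrative only: the `Cylinder` class and its methods are hypothetical names, the base set $E$ is represented by an indicator function (its Borel measurability is assumed, not checked), and a point of $\mathbb{R}^\infty$ is represented by any finite prefix at least as long as the cylinder's dimension.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Cylinder:
    """The set {x : (x_1, ..., x_k) in E}, with E given by an indicator function."""
    k: int
    base: Callable[[Sequence[float]], bool]

    def __contains__(self, x):
        # Only the first k coordinates of x are ever inspected.
        return self.base(tuple(x[: self.k]))

    def complement(self):
        # Complement of pi_k^{-1}(E) is pi_k^{-1}(E^c), as in (12.6).
        return Cylinder(self.k, lambda v: not self.base(v))

    def union(self, other):
        # Embed both bases in R^m, m = max(k, k'), before taking the union.
        m = max(self.k, other.k)
        return Cylinder(
            m, lambda v: self.base(v[: self.k]) or other.base(v[: other.k])
        )


A = Cylinder(1, lambda v: 0 <= v[0] < 1)      # restricts coordinate 1 only
B = Cylinder(3, lambda v: v[2] > 5)           # restricts coordinate 3 only
x = (0.5, -2.0, 7.0, 100.0, -100.0)           # prefix of a point of R^infinity
```

Here `A.union(B)` is again a cylinder, of dimension 3, which is the embedding step used in the third part of the proof.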

Fig. 12.1

It is not easy to imagine sets in arbitrary numbers of dimensions, but good visual intuition is provided by thinking about one-dimensional cylinders in $\mathbb{R}^3$. Letting $(x, y, z)$ denote the coordinate directions, the one-dimensional cylinder generated by an interval of the $x$ axis is a region of 3-space bounded by two infinite planes at right angles to the $x$ axis (see Fig. 12.1 for a cut-away representation). A union of $x$-cylinders is another $x$-cylinder, a collection of parallel 'walls'. But the union and intersection of an $x$-cylinder with a $y$-cylinder are two-dimensional cylinder sets, a 'cross' and a 'column' respectively (see Fig. 12.2).

These examples show that the collection of cylinder sets in $\mathbb{R}^k$ for fixed $k$ is not a field; the intersection of three mutually orthogonal 'walls' in $\mathbb{R}^3$ is a bounded 'cube', not a cylinder set. The set of finite-dimensional cylinders is not closed under the operations of union and complementation (and hence intersection) except in an infinite-dimensional space. This fact is critical in considering $\sigma(\mathcal{C})$, the class obtained by adding the countable unions to $\mathcal{C}$. By the last-mentioned property of unions, $\sigma(\mathcal{C})$ includes sets of the form (12.4) with $k$ tending to infinity. Thus, we have the following theorem.


12.3 Theorem $\sigma(\mathcal{C}) = \mathcal{B}^\infty$, the Borel field of sets in $\mathbb{R}^\infty$ with the Tychonoff topology. □

Fig. 12.2

The condition of this result is something we can take for granted in the usual applications. Recalling that the Borel field of a space is the smallest $\sigma$-field containing the open sets, 12.3 is true by definition, since $\mathcal{C}$ is a sub-base for the product topology (see §6.5) and all the open sets of $\mathbb{R}^\infty$ are generated by unions and finite intersections of $\mathcal{C}$-sets. To avoid explicit topological considerations, the reader may like to think of 12.3 as providing the definition of $\mathcal{B}^\infty$.

One straightforward implication, since the coordinate projections are continuous mappings and hence measurable, is that, given a distribution on $(\mathbb{R}^\infty, \mathcal{B}^\infty)$, finite collections of sequence coordinates can always be treated as random vectors. But, while this is obviously a condition that will need to be satisfied, the real problem runs the other way. The only practical method we have of defining distributions for infinite sequences is to assign probabilities to finite collections of coordinates, after the manner of §8.4. The serious question is whether this can be done in a consistent manner, so that in particular, there is one and only one p.m. on $(\mathbb{R}^\infty, \mathcal{B}^\infty)$ that corresponds to a given set of the finite-dimensional distributions. The affirmative answer to this question is the famous Kolmogorov consistency theorem.

12.4 The Consistency Theorem

The goal is to construct a p.m. on $(\mathbb{R}^\infty, \mathcal{B}^\infty)$, and, following the approach of §3.2, the plausible first step in this direction is to assign probabilities to elements of $\mathcal{C}$. Let $\mu_k$ denote a p.m. on the space $(\mathbb{R}^k, \mathcal{B}^k)$, for $k = 1, 2, 3, \ldots$. We will say that this family of measures satisfies the consistency property if

$$\mu_k(E) = \mu_m(E \times \mathbb{R}^{m-k}) \qquad (12.7)$$

for $E \in \mathcal{B}^k$ and all $m > k > 0$. In other words, any $k$-dimensional distribution can be obtained from an $m$-dimensional distribution with $m > k$, by the usual operation of marginalization.
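A discrete illustration of the consistency property (12.7) may help. The sketch below assumes, purely for illustration, that $\mu_k$ is the law of $k$ independent Bernoulli($p$) coordinates with $p = 1/3$, so that all mass sits on $\{0,1\}^k$; the function names `mu` and `extend` are hypothetical. Exact rational arithmetic is used so that the two sides of (12.7) can be compared for equality.

```python
from fractions import Fraction
from itertools import product


def mu(k, event, p=Fraction(1, 3)):
    """mu_k(E) for k independent Bernoulli(p) coordinates; E is a set of k-tuples."""
    total = Fraction(0)
    for point in event:
        ones = sum(point)
        total += p ** ones * (1 - p) ** (k - ones)
    return total


def extend(event, k, m):
    """E x {0,1}^(m-k): the part of E x R^(m-k) that carries mass under mu_m."""
    return {pt + tail for pt in event for tail in product((0, 1), repeat=m - k)}


E = {(1, 0), (1, 1)}                 # an event in dimension k = 2
lhs = mu(2, E)                       # mu_k(E)
rhs = mu(5, extend(E, 2, 5))         # mu_m(E x R^(m-k)) with m = 5
```

Summing out the last $m - k$ coordinates returns the $k$-dimensional probability, which is the marginalization described above.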

The consistency theorem actually generalizes to stochastic processes with uncountable index sets $\mathbb{T}$ (see 27.1) but it is sufficient for present purposes to consider the countable case.

12.4 Kolmogorov's consistency theorem Suppose there exists a family of p.m.s $\{\mu_k\}$ which satisfy consistency condition (12.7). Then there exists a stochastic sequence $x = \{X_t, t \in \mathbb{N}\}$ on a probability space $(\mathbb{R}^\infty, \mathcal{B}^\infty, \mu)$ such that $\mu_k$ is the p.m. of the finite vector of coordinate functions $(X_1, \ldots, X_k)'$. □

The candidate measure for $x$ is defined for sets in $\mathcal{C}$ by

$$\mu(C) = \mu_k(E), \qquad (12.8)$$

where $C$ and $E$ are related by (12.4). The problem is to show that $\mu$ is a p.m. on $\mathcal{C}$. If this is the case, then, since $\mathcal{C}$ is a field and $\mathcal{B}^\infty = \sigma(\mathcal{C})$, we may appeal to the extension theorem (3.8+3.13) to establish the existence of a unique measure on $(\mathbb{R}^\infty, \mathcal{B}^\infty)$ which agrees with $\mu$ for all $C \in \mathcal{C}$. The theorem has a simple but important corollary.

12.5 Corollary $\mathcal{C}$ is a determining class for $(\mathbb{R}^\infty, \mathcal{B}^\infty)$. □

In other words, if $\mu$ and $\nu$ are two measures on $(\mathbb{R}^\infty, \mathcal{B}^\infty)$ and $\mu_k = \nu_k$ for every finite $k$, then $\mu = \nu$.

To prove the consistency theorem we require a technical lemma whose proof is beyond us at this stage. It is quite intuitive, however, and will be proved in a more general context as 26.23.

12.6 Lemma For every $E \in \mathcal{B}^k$ and $\delta > 0$ there exists $K$, a compact subset of $E$, such that $\mu_k(E - K) < \delta$. □

In other words, a p.m. on the space $(\mathbb{R}^k, \mathcal{B}^k)$ has nearly all of its mass confined to a compact set; this implies in particular the proposition asserted in §8.1, that random variables are finite almost surely.

Proof of 12.4 We will verify that $\mu$ of (12.8) satisfies the probability axioms with respect to elements of $\mathcal{C}$. When $E = \mathbb{R}^k$, $C = \mathbb{R}^\infty$ so that the first two probability axioms, 7.1(a) and (b), are certainly satisfied. To establish finite additivity, suppose we have $\mathcal{C}$-sets $C = \pi_k^{-1}(E)$ and $C' = \pi_m^{-1}(E')$ for $E \in \mathcal{B}^k$, $E' \in \mathcal{B}^m$ and $m \ge k$. If $C$ and $C'$ are disjoint,

$$\mu(C) + \mu(C') = \mu_k(E) + \mu_m(E') = \mu_m(E \times \mathbb{R}^{m-k}) + \mu_m(E') = \mu_m\big((E \times \mathbb{R}^{m-k}) \cup E'\big) = \mu(C \cup C'), \qquad (12.9)$$

where the second equality applies the consistency condition (12.7), and the third one uses the fact that $E \times \mathbb{R}^{m-k}$ and $E'$ are disjoint if $C$ and $C'$ are.

The remaining, relatively tricky, step is to extend finite additivity to countable additivity. This is done by proving continuity, which is an equivalent property according to 3.5. If and only if the measure is continuous, a monotone sequence $\{C_j \in \mathcal{C}\}$ such that $C_j \downarrow C$ or $C_j \uparrow C$ has the property $\mu(C_j) \to \mu(C)$. Since $C_j \uparrow C$ implies $C_j^c \downarrow C^c$ where $\mu(C^c) = 1 - \mu(C)$, it is sufficient to consider the decreasing case. And by considering the sequence $C_j - C$ there also is no loss of generality in setting $C = \emptyset$, so that continuity implies $\mu(C_j) \to 0$. To prove continuity, it is sufficient to show that if $\mu(C_j) \ge \varepsilon$ for some $\varepsilon > 0$, for every $j$, then $C$ is nonempty.

If $C_j \in \mathcal{C}$ for some $j \ge 1$, then $\mu(C_j) = \mu_{k(j)}(E_j)$ for some set $E_j \in \mathcal{B}^{k(j)}$, where $k(j)$ is the dimension of the cylinder $C_j$. By consistency, $\mu_{k(j)}(E_j) = \mu_m(E_j \times \mathbb{R}^{m-k(j)})$ for any $m > k(j)$, so there is no loss of generality in assuming that $k(1) \le k(2) \le \ldots \le k(j) \le \ldots$. We may therefore define sets $E_i^* \in \mathcal{B}^{k(j)}$, $i = 1, \ldots, j$, by setting $E_j^* = E_j$ and

$$E_i^* = E_i \times \mathbb{R}^{k(j)-k(i)}, \quad i = 1, \ldots, j-1. \qquad (12.10)$$

Since $\{C_j\}_{j=1}^\infty$ is a decreasing sequence, so is the sequence of $\mathcal{B}^{k(j)}$-sets $\{E_i^*\}_{i=1}^j$, for each $j \ge 1$.

Consider any fixed $j$. There exists, by 12.6, a compact set $K_j \subseteq E_j$ such that

$$\mu_{k(j)}(E_j - K_j) < \varepsilon/2^{j+1}. \qquad (12.11)$$

Define the sets $K_i^* = K_i \times \mathbb{R}^{k(j)-k(i)} \in \mathcal{B}^{k(j)}$ by analogy with the $E_i^*$, and so define

$$F_j = \bigcap_{i=1}^{j} K_i^* \in \mathcal{B}^{k(j)}. \qquad (12.12)$$

$F_j \subseteq E_j$, and hence $D_j \subseteq C_j$ where

$$D_j = \pi_{k(j)}^{-1}(F_j) \in \mathcal{C}. \qquad (12.13)$$

Applying 1.1(iii) and then 1.1(i), observe that

$$E_j - F_j = E_j \cap \bigcup_{i=1}^{j} K_i^{*c} = \bigcup_{i=1}^{j} (E_j - K_i^*) \subseteq \bigcup_{i=1}^{j} (E_i^* - K_i^*), \qquad (12.14)$$

where the inclusion is because the sequence $\{E_i^*\}_{i=1}^j$ is decreasing. Hence

$$\mu_{k(j)}(E_j - F_j) \le \sum_{i=1}^{j} \mu_{k(j)}(E_i^* - K_i^*) = \sum_{i=1}^{j} \mu_{k(i)}(E_i - K_i) < \varepsilon/2. \qquad (12.15)$$

The first inequality here is from (12.14) by finite subadditivity, which follows from finite additivity as a case of 3.3(iii). The equality applies consistency, and the second inequality applies the summation of $2^{-i-1}$. Since $E_j - F_j$ and $F_j$ are disjoint and $\mu_{k(j)}(E_j) = \mu(C_j) \ge \varepsilon$ by assumption, it follows from (12.15) that $\mu(D_j) = \mu_{k(j)}(F_j) > \varepsilon/2$, and accordingly that $D_j$ is nonempty.

Now we construct a point of $C$. Let $\{x(j), j \in \mathbb{N}\}$ denote a sequence of points of $\mathbb{R}^\infty$ with $x(j) \in D_j$ for each $j$, so that

$$(X_1(j), \ldots, X_{k(j)}(j)) = \pi_{k(j)}(x(j)) \in F_j. \qquad (12.16)$$

Note that for $m = 1, \ldots, j$,

$$(X_1(j), \ldots, X_{k(m)}(j)) = \pi_{k(m)}(x(j)) \in K_m, \qquad (12.17)$$

by (12.12), where $K_m$ is compact. Now let $m$ be fixed. Our reasoning ensures that (12.17) holds for each $j \ge m$. A set in $\mathbb{R}^{k(m)}$ is compact if and only if each of the coordinate sets in $\mathbb{R}$ is compact, so consider the bounded scalar sequences $\{X_i(j), j \ge m\}$ for $i = 1, \ldots, k(m)$. Each of these has a cluster point $X_i^*$, and we can use the diagonal method (2.36) to construct a single subsequence $\{j_n\}$ with the property that $X_i(j_n) \to X_i^*$ for each $i = 1, \ldots, k(m)$. By the compactness, $(X_1^*, \ldots, X_{k(m)}^*) \in K_m \subseteq E_m$. This is true for every $m \in \mathbb{N}$.

Consider the point $x^* \in \mathbb{R}^\infty$ defined by $\pi_{k(m)}(x^*) = (X_1^*, \ldots, X_{k(m)}^*)$ for each $m \in \mathbb{N}$. Since $x^* \in C_m$ for each $m$, we have $x^* \in \bigcap_{m=1}^{\infty} C_m = C$, as required. ■

This theorem shows that, if a p.m. satisfying (12.7) can be assigned to the finite-dimensional distributions of a sequence $x$, $x$ is a random element of a probability space $(\mathbb{R}^\infty, \mathcal{B}^\infty, \mu)$. We shall often wish to think of $(\mathbb{R}^\infty, \mathcal{B}^\infty, \mu)$ as derived from an abstract probability space $(\Omega, \mathcal{F}, P)$, and then we shall say that $x$ is $\mathcal{F}/\mathcal{B}^\infty$-measurable if $x^{-1}(E) \in \mathcal{F}$ for each event $E \in \mathcal{B}^\infty$. This statement implies the coordinates $X_t$ are $\mathcal{F}/\mathcal{B}$-measurable r.v.s for each $t$, but it also implies a great deal more than this, since it is possible to assign measures to events involving countably many sequence coordinates.

12.5 Uniform and Limiting Properties

Much the largest part of stochastic process theory has to do with the joint distributions of sets of coordinates, under the general heading of dependence. Before getting into these topics, we shall deal in the rest of this chapter with the various issues relating exclusively to the marginal distributions of the coordinates. Of special interest are conditions that limit the random behaviour of a sequence as the index tends to infinity. The concept of a uniform condition on the marginal distributions often plays a key role. Thus, a collection of r.v.s $\{X_\tau, \tau \in \mathbb{T}\}$ is said to be uniformly bounded in probability if, for any $\varepsilon > 0$, there exists $B_\varepsilon < \infty$ such that

$$\sup_{\tau} P(|X_\tau| \ge B_\varepsilon) < \varepsilon. \qquad (12.18)$$

It is also said to be uniformly $L_p$-bounded for $p > 0$, if

$$\sup_{\tau} \|X_\tau\|_p \le B < \infty. \qquad (12.19)$$

For the case $p = \infty$, (12.19) reduces to the condition $\sup_\tau |X_\tau| < \infty$ a.s. In this case we just say that the sequence is uniformly bounded a.s. For the case $p = 1$, we have $\sup_\tau E|X_\tau| < \infty$, and one might think it correct to refer to this property as 'uniform integrability'. Unfortunately, this term is already in use for a different concept (see the next section) and so must be avoided here. Speak of 'uniform $L_1$-boundedness' in this context.

To interpret these conditions, recall that in mathematics a property is said to hold uniformly if it holds for all members of a class of objects, including the limits of any convergent sequences in the class. Consider the case where the collection in question is itself a sequence, with $\mathbb{T} = \mathbb{N}$ and $\tau = t$. Random variables are finite with probability 1, and for each finite $t \in \mathbb{N}$, $P(|X_t| \ge B_{\varepsilon t}) < \varepsilon$ always holds for some $B_{\varepsilon t} < \infty$, for any $\varepsilon > 0$. The point of a uniform bound is to ensure that the constants $B_{\varepsilon t}$ are not having to get larger as $t$ increases. 'Bounded uniformly in $t$' is a different and stronger notion than 'bounded for all $t \in \mathbb{N}$', because, for example, the supremum of the set $\{\|X_t\|_p\}_1^\infty$ may lie outside the set. If $\|X_t\|_p \le B_t < \infty$ for every $t$, we would say that the sequence was $L_p$-bounded, but not uniformly $L_p$-bounded unless we also ruled out the possibility that $B_t \to \infty$ as $t \to \infty$ (or $-\infty$). Note that the statement '$\|X_t\|_p \le B$, $t \in \mathbb{N}$', where $B$ is the same finite constant for all $t$, is equivalent to '$\sup_t \|X_t\|_p \le B$', because the former condition must extend to any limit of $\{\|X_t\|_p\}$. But the 'sup' notation is less ambiguous, and a good habit to adopt.

The relationships between integrability conditions studied in §9.3 and §9.5 can be used here to establish a hierarchy of boundedness conditions. Uniform $L_r$-boundedness implies uniform $L_p$-boundedness for $r > p > 0$, by Liapunov's inequality. Also, uniform $L_p$-boundedness for any $p > 0$ implies uniform boundedness in probability; the Markov inequality gives

$$P(|X_\tau| \ge B) \le \frac{\|X_\tau\|_p^p}{B^p}. \qquad (12.20)$$

[...] terminology we sometimes speak of $L_0$-boundedness in the case of (12.18).

A standard shorthand (due to Mann and Wald 1943b) for the maximum rate of (positive or negative) increase of a stochastic sequence uses the notion of uniform boundedness in probability to extend the 'Big Oh' and 'Little Oh' notation for ordinary real sequences (see §2.6). If, for $\varepsilon > 0$, there exists $B_\varepsilon < \infty$ such that the stochastic sequence $\{X_n\}_1^\infty$ satisfies $\sup_n P(|X_n| > B_\varepsilon) < \varepsilon$, we write $X_n = O_p(1)$. If $\{Y_n\}_1^\infty$ is another sequence, either stochastic or nonstochastic, and $X_n/Y_n = O_p(1)$ we say that $X_n = O_p(Y_n)$, or in words, '$X_n$ is at most of order $Y_n$ in probability'. If $P(|X_n| > \varepsilon) \to 0$ as $n \to \infty$, we say that $X_n = o_p(1)$; more generally $X_n = o_p(Y_n)$ when $X_n/Y_n = o_p(1)$, or in words, '$X_n$ is of order less than $Y_n$ in probability'.

The main use of these notations is in manipulating small-order terms in an expression, without specifying them explicitly. Usually, $Y_n$ is a positive or negative power of $n$. To say that $X_n = o_p(1)$ is equivalent to saying that $X_n$ converges in probability to zero, following the terminology of §12.2. Sometimes $X_n = O_p(1)$ is defined by the condition that for each $\varepsilon > 0$ there exists $B_\varepsilon < \infty$ and an integer $N_\varepsilon \ge 1$ such that $P(|X_n| > B_\varepsilon) < \varepsilon$ for all $n \ge N_\varepsilon$. But $X_n$ is finite almost surely, and there necessarily exists (this is by 12.6) a constant $B_\varepsilon' < \infty$, possibly larger than $B_\varepsilon$, such that $P(|X_n| > B_\varepsilon') < \varepsilon$ for $1 \le n < N_\varepsilon$. For all practical purposes, the formulations are equivalent.
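As a concrete instance of the $O_p(1)$ definition, consider a case not taken from the text: $X_n = S_n/\sqrt{n}$, where $S_n$ is a sum of $n$ independent steps of $\pm 1$ with equal probability. Since $E(X_n^2) = 1$, Chebyshev's inequality supplies the bound $B_\varepsilon = \varepsilon^{-1/2}$ for every $n$. The sketch below (function name and the range of $n$ are illustrative assumptions) evaluates the tail probabilities exactly from the binomial distribution.

```python
from math import comb, sqrt


def tail_prob(n, B):
    """Exact P(|S_n| / sqrt(n) > B) for S_n a sum of n independent +/-1 steps."""
    # S_n = 2H - n, where H ~ Binomial(n, 1/2) counts the +1 steps.
    count = sum(comb(n, h) for h in range(n + 1) if abs(2 * h - n) > B * sqrt(n))
    return count / 2 ** n


eps = 0.05
B_eps = sqrt(1 / eps)                    # Chebyshev: P(|X_n| > B) <= 1/B^2 = eps
worst = max(tail_prob(n, B_eps) for n in range(1, 301))

# Without the 1/sqrt(n) scaling no single bound works: P(|S_n| > B_eps) grows with n.
unscaled = tail_prob(300, B_eps / sqrt(300))
```

Over the range examined, `worst` stays below `eps`, so the same constant serves for every $n$; the unscaled sum $S_n$ fails this test and is instead $O_p(n^{1/2})$.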


12.6 Uniform Integrability

If a r.v. $X$ is integrable, the contributions to the integral of extreme $X$ values must be negligible. In other words, if $E|X| < \infty$,

$$E(|X| 1_{\{|X| \ge M\}}) \to 0 \text{ as } M \to \infty. \qquad (12.21)$$

However, it is possible to construct uniformly $L_1$-bounded sequences $\{X_n\}$ which fail to satisfy (12.21) in the limit.

12.7 Example Define a stochastic sequence as follows: for $n = 1, 2, 3, \ldots$ let $X_n = 0$ with probability $1 - 1/n$, and $X_n = n$ with probability $1/n$. Note that $E(|X_n|) = n/n = 1$ for every $n$, and hence the sequence is uniformly $L_1$-bounded. But to have

$$\lim_{M \to \infty} E(|X_n| 1_{\{|X_n| \ge M\}}) = 0 \qquad (12.22)$$

uniformly in $n$ requires that for each $\varepsilon > 0$ there exists $M_\varepsilon$ such that $E(|X_n| 1_{\{|X_n| \ge M\}}) < \varepsilon$ for all $M > M_\varepsilon$, uniformly in $n$. Clearly, this condition fails, for $\varepsilon < 1$, in view of the cases $n > M_\varepsilon$. □

Something very strange is going on in this example. Although $E(X_n) = 1$ for any $n$, $X_n = 0$ with probability approaching 1 as $n \to \infty$. To be precise, we may show that $X_n \xrightarrow{pr} 0$ (see 18.15). The intuitive concept of expectation appears to fail when faced with r.v.s taking values approaching infinity with probabilities approaching zero.

The uniform integrability condition rules out this type of perverse behaviour in a sequence. The collection $\{X_\tau, \tau \in \mathbb{T}\}$ is said to be uniformly integrable if

$$\lim_{M \to \infty} \sup_{\tau \in \mathbb{T}} E(|X_\tau| 1_{\{|X_\tau| \ge M\}}) = 0. \qquad (12.23)$$

In our applications the collection in question is usually either a sequence or an array. In the latter case, uniform integrability of $\{X_{nt}\}$ (say) is defined by taking the supremum with respect to both $t$ and $n$.

The following is a collection of theorems on uniform integrability which will find frequent application later on; 12.8 in particular provides insight into why this concept is so important, since the last example shows that the conclusion does not generally hold without uniform integrability.

12.8 Theorem Let $\{X_n\}_1^\infty$ be a uniformly integrable sequence. If $X_n \xrightarrow{as} X$, then $E(X_n) \to E(X)$.

Proof Note that

$$E|X_n| = E(|X_n| 1_{\{|X_n| < M\}}) + E(|X_n| 1_{\{|X_n| \ge M\}}) \le M + E(|X_n| 1_{\{|X_n| \ge M\}}). \qquad (12.24)$$

By choosing $M$ large enough the second term on the right can be made uniformly small by assumption, and it follows that $E|X_n|$ is uniformly bounded. Fatou's lemma implies that $E|X| < \infty$, and $E(X)$ exists. Define $Y_n = |X_n - X|$, so that $Y_n \to 0$ a.s. Since $Y_n \le |X_n| + |X|$ by the triangle inequality, 9.29 gives

$$E(Y_n 1_{\{Y_n \ge M\}}) \le 2E(|X_n| 1_{\{|X_n| \ge M/2\}}) + 2E(|X| 1_{\{|X| \ge M/2\}}). \qquad (12.25)$$

The second right-hand-side term goes to zero as $M \to \infty$, so $\{Y_n\}$ is uniformly integrable if $\{X_n\}$ is. We may write

$$E(Y_n) = E(Y_n 1_{\{Y_n < M\}}) + E(Y_n 1_{\{Y_n \ge M\}}), \qquad (12.26)$$

and by the bounded convergence theorem there exists, for any $\varepsilon > 0$, $N_\varepsilon$ such that $E(Y_n 1_{\{Y_n < M\}}) < \varepsilon/2$ for $n > N_\varepsilon$, for $M < \infty$. $M$ can be chosen large enough that $E(Y_n 1_{\{Y_n \ge M\}}) < \varepsilon/2$ uniformly in $n$, so that $E(Y_n) < \varepsilon$ for $n > N_\varepsilon$, or, since $\varepsilon$ is arbitrary, $E(Y_n) \to 0$. But

$$E(Y_n) = E(|X_n - X|) \ge |E(X_n) - E(X)| \qquad (12.27)$$

by the modulus inequality, and the theorem follows. ■

The next theorem gives an alternative form for the condition in (12.23) which is often more convenient for establishing uniform integrability.

12.9 Theorem A collection $\{X_\tau, \tau \in \mathbb{T}\}$ of r.v.s on a probability space $(\Omega, \mathcal{F}, P)$ is uniformly integrable iff it is uniformly $L_1$-bounded and satisfies the following condition: $\forall\, \varepsilon > 0$, $\exists\, \delta > 0$ such that, for $E \in \mathcal{F}$,

$$P(E) < \delta \implies \sup_{\tau \in \mathbb{T}} \{E(|X_\tau| 1_E)\} < \varepsilon. \qquad (12.28)$$

Proof To show sufficiency, fix $\varepsilon > 0$ and $\tau \in \mathbb{T}$. By $L_1$-boundedness and the Markov inequality for $p = 1$,

$$P(|X_\tau| \ge M) \le \frac{E|X_\tau|}{M} < \infty, \qquad (12.29)$$

and for $M$ large enough, $P(|X_\tau| \ge M) < \delta$, for any $\delta > 0$. Choosing $\delta$ to satisfy (12.28), it follows since $\tau$ is arbitrary that

$$\sup_{\tau \in \mathbb{T}} \{E(|X_\tau| 1_{\{|X_\tau| \ge M\}})\} < \varepsilon, \qquad (12.30)$$

and (12.23) follows since $\varepsilon$ is arbitrary.

To show necessity, note that, for any $E \in \mathcal{F}$ and $\tau \in \mathbb{T}$,

$$E(|X_\tau| 1_E) = E(|X_\tau| 1_E 1_{\{|X_\tau| < M\}}) + E(|X_\tau| 1_E 1_{\{|X_\tau| \ge M\}}) \le M P(E) + E(|X_\tau| 1_{\{|X_\tau| \ge M\}}). \qquad (12.31)$$

Consider the suprema with respect to $\tau$ of each side of this inequality. For $\varepsilon > 0$, (12.23) implies there exists $M < \infty$ such that

$$\sup_{\tau} \{E(|X_\tau| 1_E)\} \le M P(E) + \tfrac{1}{2}\varepsilon. \qquad (12.32)$$

Uniform $L_1$-boundedness now follows on setting $E = \Omega$, and (12.28) also follows with $\delta < \varepsilon/2M$. ■

Another way to express condition (12.28) is to say that the measures $\nu_\tau(E) = \int_E |X_\tau| \, dP$ must be absolutely continuous with respect to $P$, uniformly in $\tau$.

Finally, we prove a result which shows why the uniform boundedness of moments of a given order may be important.

12.10 Theorem If

$$E|X_\tau|^{1+\theta} < \infty \qquad (12.33)$$

for $\theta > 0$, then $\lim_{M \to \infty} E(|X_\tau| 1_{\{|X_\tau| \ge M\}}) = 0$.

Proof Note that

$$E|X_\tau|^{1+\theta} \ge E(|X_\tau|^{1+\theta} 1_{\{|X_\tau| \ge M\}}) \ge M^\theta E(|X_\tau| 1_{\{|X_\tau| \ge M\}}) \qquad (12.34)$$

for any $\theta > 0$. The result follows on letting $M \to \infty$, since the majorant side of (12.34) is finite by (12.33). ■

Example 12.7 illustrated the fact that uniform $L_1$-boundedness is not sufficient for uniform integrability, but 12.10 shows that uniform $L_{1+\theta}$-boundedness is sufficient, for any $\theta > 0$. Adding this result to those of §12.5, we have established the hierarchy of uniform conditions summarized in the following theorem.

12.11 Theorem
Uniform boundedness a.s. $\Rightarrow$ uniform $L_p$-boundedness, $p > 1$
$\Rightarrow$ uniform integrability
$\Rightarrow$ uniform $L_p$-boundedness, $0 < p \le 1$
$\Rightarrow$ uniform boundedness in probability. □

None of the reverse implications hold.

13 Dependence

13.1 Shift Transformations

We now consider relationships between the different members of a sequence. In what ways, for example, might the joint distribution of $X_t$ and $X_{t-k}$ depend upon $k$, and upon $t$? To answer questions such as these it is sometimes helpful to think about the sequence in a new way. Having introduced the notion of a random sequence as a point in a probability space, there is a useful analogy between studying the relationships within a sequence and comparing different sequences, that is, different sample outcomes $\omega \in \Omega$.

In a probability space $(\Omega, \mathcal{F}, P)$, consider a 1-1 measurable mapping, $T: \Omega \to \Omega$ (onto). This is a rule for pairing each outcome with another outcome of the space, but if each $\omega \in \Omega$ maps into an infinite sequence, $T$ induces a mapping from one sequence to another. $T$ is called measure-preserving if $P(TE) = P(E)$ for all $E \in \mathcal{F}$.

The shift transformation for a sequence $\{X_t(\omega)\}_1^\infty$ is defined by

$$X_t(T\omega) = X_{t+1}(\omega). \qquad (13.1)$$

$T$ takes each outcome $\omega$ into the outcome under which the realized value of $X$ occurring in period $t$ now occurs in period $t - 1$, for every $t$. In effect, each coordinate of the sequence from $t = 2$ onwards is relabelled with the previous period's index. More generally we can write $X_t(T^k\omega) = X_{t+k}(\omega)$, the relationship between points in the sequence $k$ periods apart becoming a characteristic of the transformation $T^k$. Since $X_t$ is a r.v. for all $t$, both the shift transformation and its inverse $T^{-1}$, the backshift transformation, must be measurable.

Taken together, the single r.v. $X_1(\omega): \Omega \to \mathbb{R}$ and the shift transformation $T$ can be thought of as generating a complete description of the sequence $\{X_t(\omega)\}_1^\infty$. This can be seen as follows. Given $X_1(\omega)$, apply the transformation $T$ to $\omega$, and obtain $X_2 = X_1(T\omega)$. Doing this for each $\omega \in \Omega$ defines the mapping $X_2(\omega): \Omega \to \mathbb{R}$, and we are ready to get $X_3 = X_2(T\omega)$. Iterating the procedure generates as many points in the sequence as we require.

13.1 Example Consider 12.1. Let $\{X_t(\omega)\}_1^\infty$ be a sequence of coin tosses (with 1 for heads, 0 for tails) beginning 110100100011... (say). Somewhere on the interval $[0,1)$ of real numbers (in binary representation), there is also a sequence $\{X_t(\omega')\}_1^\infty$ beginning 10100100011..., identical to the sequence indexed by $\omega$ apart from the dropping of the initial digit and the backshift of the remainder by one position. Likewise there is another sequence $\{X_t(\omega'')\}_1^\infty$, a backshifted version of $\{X_t(\omega')\}_1^\infty$ beginning 0100100011...; and so forth. If we define the transformation $T$ by $T\omega = \omega'$, $T\omega' = \omega''$, etc., in keeping with (13.1), the sequence $\{X_t(\omega)\}_1^\infty$ can be constructed as the sequence $\{X_1(T^{t-1}\omega)\}_1^\infty$, that is, the sequence of first members of the sequences found by iterating the transformation, in this case beginning 1,1,0,... □
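The construction in Example 13.1 can be sketched in code using the convention of (13.1), under which $X_t(\omega) = X_1(T^{t-1}\omega)$. The assumptions here are loud ones: $\omega$ is represented by a terminating binary fraction holding only the first twelve digits quoted in the example, $T$ is taken to be the map $\omega \mapsto 2\omega \bmod 1$ (which drops the leading binary digit but is two-to-one on $[0,1)$ rather than 1-1), and all function names are illustrative.

```python
from fractions import Fraction


def X1(omega):
    """First binary digit of omega in [0, 1): 1 for heads, 0 for tails."""
    return int(omega >= Fraction(1, 2))


def T(omega):
    """Shift: drop the first binary digit, i.e. omega -> 2 * omega mod 1."""
    return (2 * omega) % 1


def sequence(omega, n):
    """Generate X_1, ..., X_n from X1 and T alone, via X_t(omega) = X_1(T^(t-1) omega)."""
    out = []
    for _ in range(n):
        out.append(X1(omega))
        omega = T(omega)
    return out


digits = "110100100011"                               # the opening tosses of the example
omega = Fraction(int(digits, 2), 2 ** len(digits))    # 0.110100100011 in binary
omega_prime = T(omega)                                # sequence with first digit dropped
```

Calling `sequence(omega, 12)` recovers the twelve tosses, and `sequence(omega_prime, 11)` gives the same tosses with the first one removed, so the whole sequence is indeed generated by the pair $(X_1, T)$.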


This device reveals, among other things, the complex structure of the probability space we are postulating. To each point $\omega \in \Omega$ there must correspond a countably infinite set of points $T^k\omega \in \Omega$, which reproduce the same sequence apart from the absolute date associated with $X_1$. The intertemporal properties of a sequence can then be treated as a comparison of two sequences, the original and the sequence lagged $k$ periods.

Econometricians attempt to make inferences about economic behaviour from recorded economic data. In time-series analysis, the sample available is usually a single realization of a random sequence, economic history as it actually occurred. Because we observe only one world, it is easy to make the mistake of looking on it as the whole sample space, whereas it is really only one of the many possible realizations the workings of historical chance might have generated. Indeed, in our probability model the whole economic universe is the counterpart of a single random outcome $\omega$; there is an important sense in which the time series analyst is a statistician who draws inferences from single data points! But although a single realization of the sequence can be treated as a mapping from a single $\omega$, it is linked to a countably infinite set of $\omega$s corresponding to the leads and lags of the sequence. A large part of our subsequent enquiry can be summarized as posing the question: is this set really rich enough to allow us to make inferences about $P$ from a single realization?

13.2 Independence and Stationarity

Independence and stationarity are the best-known restrictions on the behaviour of a sequence, but also the most stringent, from the point of view of describing economic time series. But while the emphasis in this book will mainly be on finding ways to relax these conditions, they remain important because of the many classic theorems in probability and limit theory which are founded on them.

The degree to which random variations of sequence coordinates are related to those of their neighbours in the time ordering is sometimes called the memory of a sequence; in the context of time-ordered observations, one may think in terms of the amount of information contained in the current state of the sequence about its previous states. A sequence with no memory is a rather special kind of object, because the ordering ceases to have significance. It is like the outcome of a collection of independent random experiments conducted in parallel, and indexed arbitrarily.

When a time ordering does nominally exist, we call such a sequence serially independent. Generalizing the theory of §8.6, a pair of sequences $(\{X_t(\omega)\}_1^\infty, \{Y_t(\omega)\}_1^\infty) \in \mathbb{R}^\infty \times \mathbb{R}^\infty$ is independent if, for all $E_1, E_2 \in \mathcal{B}^\infty$,

$$P(\{X_t\}_1^\infty \in E_1, \{Y_t\}_1^\infty \in E_2) = P(\{X_t\}_1^\infty \in E_1)\, P(\{Y_t\}_1^\infty \in E_2). \qquad (13.2)$$

Accordingly, a sequence $\{X_t(\omega)\}_1^\infty$ is serially independent if it is independent of $\{X_t(T^k\omega)\}_1^\infty$ for all $k \ne 0$. This is equivalent to saying that every finite collection of sequence coordinates is totally independent.

Serial independence is the simplest possible assumption about memory. Similarly, looking at the distribution of the sequence as a whole, the simplest treatment is to assume that the joint distribution of the coordinates is invariant with respect to the time index. A random sequence is called strictly stationary if the shift transformation is measure-preserving. This implies that the sequences $\{X_t\}_{t=1}^\infty$ and $\{X_{t+k}\}_{t=1}^\infty$ have the same joint distribution, for every $k > 0$.

Subject to the existence of particular moments, less restrictive versions of the condition are also commonly employed. Letting $\mu_t = E(X_t)$, and $\gamma_{mt} = \mathrm{Cov}(X_t, X_{t+m})$, consider those cases in which the sequence $\{\mu_t\}_{t=1}^\infty$, and also the array $\{\{\gamma_{mt}\}_{m=0}^\infty\}_{t=1}^\infty$, are well defined. If $\mu_t = \mu$, all $t$, we say the sequence is mean stationary. If a mean stationary sequence has $\gamma_{mt} = \gamma_m$ where $\{\gamma_m\}_0^\infty$ is a sequence of constants, it is called covariance stationary, or wide sense stationary.

If the marginal distribution of $X_t$ is the same for any $t$, the sequence $\{X_t\}$ is said to be identically distributed. This concept is different from stationarity, which also restricts the joint distribution of neighbours in the sequence. However, when a stochastic sequence is both serially independent and identically distributed (or i.i.d.), this suffices for stationarity. An i.i.d. sequence is like an arbitrarily indexed random sample drawn from some underlying population.

The following clutch of examples include both stationary and nonstationary cases.

13.2 Example Let the sequence $\{\varepsilon_t\}_{-\infty}^{\infty}$ be i.i.d. with mean 0 and variance $\sigma^2 < \infty$, and let $\{\theta_j\}_0^\infty$ be a square-summable sequence of constants. Then $\{X_t\}_{-\infty}^{\infty}$, where

$$X_t = \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j}, \qquad (13.3)$$

is a covariance stationary sequence, with $E(X_t) = 0$ and $E(X_t^2) = \sigma^2 \sum_{j=0}^{\infty} \theta_j^2$ for every $t$. This is the infinite-order moving average (MA($\infty$)) process. See §14.3 for additional details. □

13.3 Example If $\varepsilon_t$ is i.i.d. with mean 0, and

$$X_t = \cos at + \varepsilon_t \qquad (13.4)$$

for a constant $a$, then $E(X_t) = \cos at$, depending systematically on $t$. □
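The moments quoted in Example 13.2 are easy to evaluate for a specific choice of weights. The sketch below assumes, for illustration only, geometric weights $\theta_j = 0.5^j$ truncated at 60 terms and $\sigma^2 = 2$. The variance formula is the one in the text; the autocovariance formula $\gamma_m = \sigma^2 \sum_j \theta_j \theta_{j+m}$ is the standard extension of it and does not appear in the excerpt. The function name is hypothetical.

```python
def ma_autocovariance(theta, sigma2, m):
    """Cov(X_t, X_{t+m}) = sigma2 * sum_j theta_j * theta_{j+m} for a truncated MA."""
    return sigma2 * sum(theta[j] * theta[j + m] for j in range(len(theta) - m))


theta = [0.5 ** j for j in range(60)]    # square-summable weights (truncated)
sigma2 = 2.0                             # assumed innovation variance

gamma0 = ma_autocovariance(theta, sigma2, 0)   # E(X_t^2), about sigma2 / (1 - 0.25)
gamma1 = ma_autocovariance(theta, sigma2, 1)   # about sigma2 * 0.5 / (1 - 0.25)
```

Neither quantity involves $t$, which is what covariance stationarity requires; by contrast the mean $\cos at$ in Example 13.3 changes with $t$.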

13.4 Example Let $\{X_t\}$ be any stationary sequence with autocovariance sequence $\{\gamma_m, m \ge 0\}$. The sequence $\{X_0 + X_t\}$ has autocovariances given by the array

$$\{\gamma_0 + \gamma_m + \gamma_t + \gamma_{t+m},\ m \ge 0,\ t \ge 1\},$$

and hence it is nonstationary. □

13.5 Example Let $X$ be a r.v. which is symmetrically distributed about 0, with variance $\sigma^2$. If $X_t = (-1)^tX$, then $\{X_t\}_1^\infty$ is a stationary sequence. In particular, $E(X_t) = 0$, and $\mathrm{Cov}(X_t, X_{t+k}) = \sigma^2$ when $k$ is even and $-\sigma^2$ when $k$ is odd, independent of $t$. □

These examples show that, contrary to a common misconception, stationarity does not imply homogeneity in the appearance of a sequence, or the absence of periodic patterns. The essential feature is that any patterns in the sequence do not depend systematically on the time index. It is also important to distinguish between stationarity and limited dependence, although these notions are often closely linked. Example 13.4 is nonstationary in view of dependence on initial conditions. The square-summability condition in 13.2 allows us to show covariance stationarity, but is actually a limitation on the long-range dependence of the process. Treatments of time series modelling which focus exclusively on models in the linear MA class often fail to distinguish between these properties, but examples 13.3 and 13.5 demonstrate that there is no necessary connection between them.


Stationarity is a strong assumption, particularly for the description of empirical time series, where features like seasonal patterns are commonly found. It is useful to distinguish between 'local' nonstationarity, something we might think of as capable of elimination by local averaging of the coordinates, and 'global' nonstationarity, involving features such as persistent trends in the moments. If sequences $\{X_t\}_{t=1}^\infty$ and $\{X_{t+k}\}_{t=1}^\infty$ have the same distribution for some (not necessarily every) $k > 0$, it follows that $\{X_{t+2k}\}_{t=1}^\infty$ has the same distribution as $\{X_{t+k}\}_{t=1}^\infty$, and the same property extends to every integer multiple of $k$. Such a sequence accordingly has certain stationary characteristics, if we think in terms of the distributions of successive blocks of $k$ coordinates. This idea retains force even as $k \to \infty$. Consider the limit of a finite sequence of length $n$, divided into $[n^\alpha]$ blocks of length $[n^{1-\alpha}]$, plus any remainder, for some $\alpha$ between 0 and 1. (Note, $[x]$ here denotes the largest integer not exceeding $x$.) The number of blocks as well as their extent is going to infinity, and the stationarity (or otherwise) of the sequence of blocks in the limit is clearly an issue. Important applications of these ideas arise in Parts V and VI below.

It is convenient to formulate a definition embodying this concept in terms of moments. Thus, a zero-mean sequence will be said to be globally covariance stationary if the autocovariance sequences $\{\gamma_{mt}\}_{t=1}^\infty$ are Cesàro-summable for each $m \geq 0$, where the Cesàro sum is strictly positive in the case of the variances ($m = 0$). The following are a pair of contrasting counter-examples.

13.6 Example A sequence with $\gamma_{0t} \sim t^\beta$ is globally nonstationary for any $\beta \neq 0$. □

13.7 Example Consider the integer sequence beginning

$$1, 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, \ldots,$$

i.e. the value changes at points $t = 2^k$, $k = 1, 2, 3, \ldots$. The Cesàro sum of this sequence fails to converge as $n \to \infty$. It fluctuates eventually between the points 5/3 at $n = 2^k$, $k$ odd, and 4/3 at $n = 2^k$, $k$ even. A stochastic sequence having a variance sequence of this form is globally nonstationary. □
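The fluctuation described in Example 13.7 is easy to reproduce. The following is a minimal sketch using exact rational arithmetic; the function names `example_sequence` and `cesaro_means` are illustrative choices, not notation from the text.

```python
from fractions import Fraction


def example_sequence(n):
    """First n terms of 1,2,1,1,2,2,2,2,...: the value switches after t = 1, 2, 4, 8, ..."""
    terms, value, next_switch = [], 1, 1
    for t in range(1, n + 1):
        terms.append(value)
        if t == next_switch:
            value = 3 - value      # toggle between 1 and 2
            next_switch *= 2
    return terms


def cesaro_means(terms):
    """Running averages (x_1 + ... + x_n) / n, kept as exact fractions."""
    total, means = 0, []
    for n, x in enumerate(terms, start=1):
        total += x
        means.append(Fraction(total, n))
    return means


means = cesaro_means(example_sequence(2 ** 12))
for k in range(1, 13):
    # The average at n = 2**k drifts towards 5/3 for odd k and 4/3 for even k.
    print(k, float(means[2 ** k - 1]))
```

The averages at $n = 2^k$ satisfy the recursion $m_{k+1} = (m_k + v_k)/2$, with $v_k$ the value taken on the next block, which is why the two subsequences settle at the fixed points 5/3 and 4/3 rather than at a common limit.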


13.3 Invariant Events

The amount of dependence in a sequence is the chief factor determining how informative a realization of given length can be about the distribution that generated it. At one extreme, the i.i.d. sequence is equivalent to a true random sample. The classical theorems of statistics can be applied to this type of distribution. At the other extreme, it is easy to specify sequences for which a single realization can never reveal the parameters of the distribution to us, even in the limit as its length tends to infinity. This last possibility is what concerns us most, since we want to know whether averaging operations applied to sequences have useful limiting properties; whether, for example, parameters of the generation process can be consistently estimated in this way.

To clarify these issues, imagine repeated sampling of random sequences $\{X_t(\omega)\}_1^\infty$; in other words, imagine being given a function $X_1(\cdot)$ and transformation $T$, making repeated random drawings of $\omega$ from $\Omega$ and constructing the corresponding random sequences; 13.1 illustrates the procedure. Let the sample drawings be denoted $\omega_i$, $i = 1, \ldots, N$, and imagine constructing the average of the realizations at some fixed time $t_0$. The average $\bar X_{N,t_0} = N^{-1}\sum_{i=1}^N X_{t_0}(\omega_i)$ is called an ensemble average, which may be contrasted with the time average of a realization of length $n$ for some given $\omega_i \in \Omega$, $\bar X_n(\omega_i) = n^{-1}\sum_{t=1}^n X_t(\omega_i)$. Fig. 13.1 illustrates this procedure, showing a sample of three realizations of the sequence. The ensemble average is the average of the points falling on the vertical line labelled $t_0$. It is clear that the limits of the time average and the ensemble average as $n$ and $N$ respectively go to infinity are not in general the same; we might expect the ensemble average to tend to the marginal expectation $E(X_{t_0})$, but the time average will not do so except in special cases. If the sequence is nonstationary $E(X_{t_0})$ depends upon $t_0$, but even assuming stationarity, it is still possible that different realizations of the sequence depend upon random effects which are common to all $t$.

Fig. 13.1


In a probability space $(\Omega, \mathcal{F}, P)$ the event $E \in \mathcal{F}$ is said to be invariant under a transformation $T$ if $P(TE \,\triangle\, E) = 0$. The criterion for invariance is sometimes given as $TE = E$, but allowing the two events to differ by a set of measure zero does not change anything important in the theory. The set of events in $\mathcal{F}$ that are invariant under the shift transformation is denoted $\mathcal{I}$.

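13.8 Theorem $\mathcal{I}$ is a σ-field.

Proof Since $T$ is onto, $\Omega$ is clearly invariant. Since $T$ is also 1-1,

$$TE^c \,\triangle\, E^c = (TE)^c \,\triangle\, E^c = TE \,\triangle\, E \tag{13.5}$$

by definition. And, given $\{E_n \in \mathcal{I}, n \in \mathbb{N}\}$,

$$P(TE_n \,\triangle\, E_n) = P(TE_n - E_n) + P(E_n - TE_n) = 0 \tag{13.6}$$

for each $n$, and also

$$T\Big(\bigcup_n E_n\Big) \,\triangle\, \bigcup_n E_n = \bigcup_n TE_n \,\triangle\, \bigcup_n E_n, \tag{13.7}$$

using 1.2(i). By 1.1(i) and then 1.1(iii),

$$\bigcup_n TE_n - \bigcup_n E_n = \bigcup_n\Big[TE_n \cap \bigcap_m E_m^c\Big] \subseteq \bigcup_n (TE_n - E_n), \tag{13.8}$$

and similarly,

$$\bigcup_n E_n - \bigcup_n TE_n \subseteq \bigcup_n (E_n - TE_n). \tag{13.9}$$

The conclusion $P[T(\bigcup_n E_n) \,\triangle\, (\bigcup_n E_n)] = 0$ now follows by (13.6) and 3.6(ii), completing the proof. ∎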

An invariant random variable is one that is $\mathcal{I}$-measurable. An invariant r.v. $Z(\omega)$ has the property that $Z(T\omega) = Z(\omega)$, and an $\mathcal{I}$-measurable sequence $\{Z_t(\omega)\}_1^\infty$ is trivial in the sense that $Z_t(\omega) = Z_1(\omega)$ a.s. for every $t$. The invariant events and associated r.v.s constitute those aspects of the probability model that do not alter with the passage of time.

13.9 Example Consider the sequence $\{X_t(\omega)\}$ where $X_t(\omega) = Y_t(\omega) + Z(\omega)$, $\{Y_t(\omega)\}$ being a random sequence and $Z(\omega)$ a r.v. An example of an invariant event is $E = \{\omega: Z(\omega) \leq z,\ Y_t(\omega) \in \mathbb{R}\}$. Clearly $E$ and $TE$ are the same event, since $Z$ is the only thing subject to a condition.

Fig. 13.1 illustrates this case. If $\{Y_t(\omega)\}$ is a zero-mean stationary sequence, the figure illustrates the cases $Z(\omega_1) = Z(\omega_2) = 0$, and $Z(\omega_3) > 0$. Even if $E(Z) = 0$, the influence of $Z(\omega)$ in the time average is not 'averaged out' in the limit, as it will be from the ensemble average. □
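A small simulation makes the contrast in Example 13.9 concrete. This is only a sketch under stated assumptions: $Y_t$ and $Z$ are taken to be independent standard Gaussian draws (the example itself does not specify their distributions), and the function names are illustrative.

```python
import random


def simulate(n_paths, n_obs, seed=0):
    """Draw n_paths realizations of X_t = Y_t + Z, t = 1, ..., n_obs.

    Assumption for illustration: Y_t is i.i.d. N(0, 1) and Z is N(0, 1),
    drawn once per realization and then held fixed for every t.
    """
    rng = random.Random(seed)
    zs, paths = [], []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        zs.append(z)
        paths.append([rng.gauss(0.0, 1.0) + z for _ in range(n_obs)])
    return zs, paths


def time_average(path):
    """Average of one realization over t."""
    return sum(path) / len(path)


def ensemble_average(paths, t0):
    """Average over realizations at the fixed date t0 (0-based index)."""
    return sum(path[t0] for path in paths) / len(paths)


zs, paths = simulate(n_paths=200, n_obs=500)
for z, path in list(zip(zs, paths))[:3]:
    # Each time average settles near that realization's own Z, not near E(Z) = 0.
    print("Z =", round(z, 3), " time average =", round(time_average(path), 3))
print("ensemble average at t0 = 10:", round(ensemble_average(paths, 10), 3))
```

Each time average tracks the realized value of $Z$, while the ensemble average at a fixed date is close to $E(X_{t_0}) = 0$; lengthening a single realization cannot remove the invariant component.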

The behaviour of the time average as $n \to \infty$ is summarized by the following fundamental result.

13.10 Theorem (Doob 1953: th. X.2.1) Let a stationary sequence $\{X_t(\omega)\}_1^\infty$ be defined by a measurable mapping $X_1(\omega)$ and measure-preserving shift transformation $T$, such that $X_t(\omega) = X_1(T^{t-1}\omega)$, and let $S_n(\omega) = \sum_{t=1}^n X_t(\omega)$. If $E|X_1| < \infty$,

$$\lim_{n\to\infty} S_n(\omega)/n = E(X_1|\mathcal{I})(\omega), \text{ a.s.} \tag{13.10}$$

□

In words, the limiting case of the time average can be identified with the mean of the distribution conditional on the σ-field of invariant events.

The proof of 13.10 requires a technical lemma.

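13.11 Lemma Let $\Lambda(\beta) = \{\omega: \sup_n (S_n(\omega) - n\beta) > 0\}$. Then for any set $M \in \mathcal{I}$,

$$\int_{M\cap\Lambda(\beta)} X_1(\omega)\,dP(\omega) \geq \beta P(M \cap \Lambda(\beta)). \tag{13.11}$$

Proof We establish this for the case $\beta = 0$. To generalize to any real $\beta$, consider the sequence $\{X_t - \beta\}$, which is stationary if $\{X_t\}$ is stationary.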

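Write $\Lambda$ for $\Lambda(0)$, and let $\Lambda_j = \{\omega: \max_{1\leq k\leq j} S_k(\omega) > 0\}$, the set of outcomes for which the partial sum is positive at least once by time $j$. Note, the sequence $\{\Lambda_j\}$ is monotone and $\Lambda_j \uparrow \Lambda$ as $j \to \infty$. Also let

$$N_{nj} = T^{-j}\Lambda_{n-j} = \Big\{\omega: \max_{1\leq k\leq n-j} S_k(T^j\omega) > 0\Big\}. \tag{13.12}$$

Since

$$S_k(T^j\omega) = \sum_{t=1}^k X_t(T^j\omega) = \sum_{t=j+1}^{j+k} X_t(\omega) = S_{j+k}(\omega) - S_j(\omega), \tag{13.13}$$

by defining $S_0 = 0$ we may also write

$$N_{nj} = \Big\{\omega: \max_{j+1\leq k\leq n}\big(S_k(\omega) - S_j(\omega)\big) > 0\Big\}, \quad 0 \leq j \leq n-1. \tag{13.14}$$

This is the set of outcomes for which the partial sums of the coordinates from $j+1$ to $n$ are positive at least once, and we have the inequality (explained below)

$$\sum_{j=0}^{n-1} X_{j+1}(\omega)1_{N_{nj}}(\omega) \geq 0, \quad \text{all } \omega \in \Omega. \tag{13.15}$$

Integrating this sum over the invariant set $M$ gives

$$0 \leq \sum_{j=0}^{n-1}\int_{M\cap N_{nj}} X_{j+1}(\omega)\,dP(\omega) = \sum_{j=0}^{n-1}\int_{M\cap N_{nj}} X_1(T^j\omega)\,dP(\omega) = \sum_{j=0}^{n-1}\int_{M\cap\Lambda_{n-j}} X_1(\omega)\,dP(\omega) = \sum_{j=1}^{n}\int_{M\cap\Lambda_j} X_1(\omega)\,dP(\omega), \tag{13.16}$$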

where the passage from $N_{nj}$ to $\Lambda_{n-j}$ uses the fact that $\Lambda_{n-j} = \{\omega: T^{-j}\omega \in N_{nj}\}$ and the measure-preserving property of $T$, and the last equality is by reversing the order of summation. The dominated convergence theorem applied to $\{X_1 1_{M\cap\Lambda_j}\}$, with $|X_1|$ as the dominating function, yields

$$\int_{M\cap\Lambda_j} X_1(\omega)\,dP(\omega) \to \int_{M\cap\Lambda} X_1(\omega)\,dP(\omega). \tag{13.17}$$

This limit is equal to the Cesàro limit by 2.26, so that, as required,

$$\int_{M\cap\Lambda} X_1(\omega)\,dP(\omega) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^n\int_{M\cap\Lambda_j} X_1(\omega)\,dP(\omega) \geq 0. \tag{13.18}$$

∎

The inequality in (13.15) is not self-evident, but is justified as follows. The expression on the left is the sum containing only those $X_t$ having the property that in realization $\omega$, the partial sums from the point $t$ onwards are positive at least once, otherwise the $t$th contribution to the sum is 0. The sum includes only $X_t$ lying in segments of the sequence over which $S_k$ increases, so that their net contribution must be positive. It would be zero only in the case $X_t \leq 0$ for $1 \leq t \leq n$. Fig. 13.2 depicts a realization. 'o' shows values of $S_k(\omega)$ for $k = 1, \ldots, n$, so the $X_t(\omega)$ are the vertical separations between successive 'o'. '+' shows the running sum of the terms of (13.15). The coordinates where the $X_t$ are to be omitted from (13.15) are arrowed, the criterion being that there is no 'o' to the right which exceeds the current 'o'.

Fig. 13.2
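Since the inequality (13.15) holds realization by realization, it can be checked directly on arbitrary finite sequences of numbers. The sketch below does this with integer-valued draws so that the arithmetic is exact; the function name `retained_sum` and the choice of test sequences are illustrative assumptions.

```python
import random


def retained_sum(x):
    """Left-hand side of (13.15) for one realization x = [X_1, ..., X_n]."""
    s = [0]                        # s[k] holds S_k, with S_0 = 0
    for value in x:
        s.append(s[-1] + value)
    total = 0
    for j in range(len(x)):
        # The outcome lies in N_{nj} when some S_k, j+1 <= k <= n, exceeds S_j.
        if max(s[j + 1:]) > s[j]:
            total += x[j]          # x[j] is X_{j+1} in the text's indexing
    return total


rng = random.Random(0)
worst = min(
    retained_sum([rng.randint(-5, 5) for _ in range(40)]) for _ in range(1000)
)
print("smallest retained sum over 1000 random sequences:", worst)
print("all terms non-positive:", retained_sum([-3, -1, 0, -2]))
```

The retained sum is never negative, and it is zero when every term is non-positive, in line with the argument above.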

Proof of 13.10 The first step is to show that the sequence $\{S_n(\omega)/n\}_1^\infty$ converges almost surely to an invariant r.v. $S(\omega)$. Consider $\bar S(\omega) = \limsup_n S_n(\omega)/n$.

$$\bar S(T\omega) = \limsup_{n\to\infty}\left[\frac{S_{n+1}(\omega)}{n+1}\cdot\frac{n+1}{n} - \frac{X_1(\omega)}{n}\right] = \bar S(\omega), \tag{13.19}$$

so that $\bar S(\omega)$ is invariant, the same being true for $\underline S(\omega) = \liminf_n S_n(\omega)/n$. Hence, the event $M(\alpha,\beta) = \{\omega: \underline S(\omega) < \alpha < \beta < \bar S(\omega)\}$ is invariant. Since $\sup_n S_n(\omega)/n \geq \bar S(\omega)$, $M(\alpha,\beta) \subseteq \Lambda(\beta)$ where $\Lambda(\beta)$ is defined in the statement of 13.11. Hence, putting $M = M(\alpha,\beta)$ in the lemma gives

$$\int_{M(\alpha,\beta)} X_1(\omega)\,dP(\omega) \geq \beta P(M(\alpha,\beta)). \tag{13.20}$$

But now replace $X_t(\omega)$ in 13.11 by $-X_t(\omega)$, and observe that $M(\alpha,\beta) \subseteq \Lambda(-\alpha) = \{\omega: \sup_n(-S_n(\omega)/n) > -\alpha\}$. Hence we get

$$\int_{M(\alpha,\beta)} X_1(\omega)\,dP(\omega) \leq \alpha P(M(\alpha,\beta)). \tag{13.21}$$

Since the left-hand sides of (13.21) and (13.20) are equal and $\alpha < \beta$, it follows that $P(M(\alpha,\beta)) = 0$; that is, $\bar S(\omega) = \underline S(\omega) = S(\omega)$ with probability 1.

This completes the first stage of the proof. It is now required to show that $S = E(X_1|\mathcal{I})$ a.s., that is, according to equation (10.18), that

$$\int_M X_1\,dP = \int_M S\,dP, \quad \text{each } M \in \mathcal{I}. \tag{13.22}$$

Since $M$ is invariant,

$$\int_M X_1\,dP = \frac{1}{n}\sum_{t=1}^n\int_M X_t(\omega)\,dP(\omega) = \int_M \frac{S_n(\omega)}{n}\,dP(\omega), \tag{13.23}$$

and the issue hinges on the convergence of the right-hand member of (13.23) to $E(S1_M)$. Since the sequence $\{X_t\}$ is stationary and integrable, it is also uniformly integrable, and the same is true of the sequence $\{Y_t\}$, where $Y_t = X_t1_M$ and $M \in \mathcal{I}$. For $\varepsilon > 0$, it is possible by 12.9 to choose an event $E \in \mathcal{F}$ with $P(E) < \delta$, such that $\sup_t E(|Y_t|1_E) < \varepsilon$. For the same $E$, the triangle inequality gives

$$\int_E\Big|\frac{1}{n}\sum_{t=1}^n Y_t\Big|\,dP \leq \frac{1}{n}\sum_{t=1}^n\int_E |Y_t|\,dP < \varepsilon. \tag{13.24}$$

By the same argument, also using stationarity and integrability of $Y_t$,

$$\int\Big|\frac{1}{n}\sum_{t=1}^n Y_t\Big|\,dP \leq \frac{1}{n}\sum_{t=1}^n\int |Y_t|\,dP = E|Y_1| < \infty. \tag{13.25}$$

Hence by 12.9 the sequence $\{n^{-1}\sum_{t=1}^n Y_t\}$ is also uniformly integrable, where $n^{-1}\sum_{t=1}^n Y_t = n^{-1}S_n1_M$. If $n^{-1}S_n \to S$ a.s., it is clear that $n^{-1}S_n1_M \to S1_M$ a.s., so by 12.8,

$$\int_M \frac{S_n(\omega)}{n}\,dP(\omega) \to \int_M S\,dP. \tag{13.26}$$

Since $n$ is arbitrary, (13.26) and (13.23) together give (13.22), and the proof is complete. ∎

13.4 Ergodicity and Mixing

The property of a stationary sequence which ensures that the time average and the ensemble average have the same limit is ergodicity, which is defined in terms of the probability of invariant events. A measure-preserving transformation $T$ is ergodic if either $P(E) = 1$ or $P(E) = 0$ for all $E \in \mathcal{I}$, where $\mathcal{I}$ is the σ-field of invariant events under $T$. A stationary sequence $\{X_t(\omega)\}_{-\infty}^{+\infty}$ is said to be ergodic if $X_t(\omega) = X_1(T^{t-1}\omega)$ for every $t$, where $T$ is measure-preserving and ergodic. Some authors, such as Doob, use the term metrically transitive for ergodic. Events that are invariant under ergodic transformations either occur almost surely, or do not occur almost surely. In the case of 13.9, $Z$ must be a constant almost surely.

Intuitively, stationarity and ergodicity together are seen to be sufficient conditions for time averages and ensemble averages to converge to the same limit. Stationarity implies that, for example, $\mu = E(X_1(\omega))$ is the mean not just of $X_1$ but of any member of the sequence. The existence of events that are invariant under the shift transformation means that there are regions of the sample space which a particular realization of the sequence will never visit. If $P(TE \,\triangle\, E) = 0$, then the event $E^c$ occurs with probability 0 in a realization where $E$ occurs. However, if invariant events other than the trivial ones are ruled out, we ensure that a sequence will eventually visit all parts of the space, with probability 1. In this case time averaging and ensemble averaging are effectively equivalent operations.

The following corollary is the main reason for our interest in Theorem 13.10.

13.12 Ergodic theorem Let $\{X_t(\omega)\}_1^\infty$ be a stationary, ergodic, integrable sequence. Then

$$\lim_{n\to\infty} S_n(\omega)/n = E(X_1), \text{ a.s.} \tag{13.27}$$

Proof This is immediate from 13.10, since by ergodicity, $E(X_1|\mathcal{I}) = E(X_1)$ a.s. ∎


In an ergodic sequence, conditioning on events of probability zero or one is a trivial operation almost surely, in that the information contained in $\mathcal{I}$ is trivial. The ergodic theorem is an example of a law of large numbers, the first of several such theorems to be studied in later chapters. Unlike most of the subsequent examples this one is for stationary sequences. Its practical applications in econometrics are limited by the fact that the stationarity assumption is often inappropriate, but it is of much theoretical interest, because ergodicity is a very mild constraint on the dependence, as we now show.

A transformation that is measure-preserving eventually mixes up the outcomes in a non-invariant event $A$ with those in $A^c$. The measure-preserving property rules out mapping sets into proper subsets of themselves, so we can be sure that $TA \cap A^c$ is nonempty. Repeated iterations of the transformation generate a sequence of sets $\{T^kA\}$ containing different mixtures of the elements of $A$ and $A^c$. A positive dependence of $B$ on $A$ implies a negative dependence of $B$ on $A^c$; that is, if $P(A \cap B) > P(A)P(B)$ then $P(A^c \cap B) = P(B) - P(A \cap B) < P(B) - P(A)P(B) = P(A^c)P(B)$. Intuition suggests that the average dependence of $B$ on mixtures of $A$ and $A^c$ should tend to zero as the mixing-up proceeds. In fact, ergodicity can be characterized in just this kind of way, as the following theorem shows.


13.13 Theorem A measure-preserving shift transformation $T$ is ergodic if and only if, for any pair of events $A, B \in \mathcal{F}$,

$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n P(T^kA \cap B) = P(A)P(B). \tag{13.28}$$

Proof To show 'if' (that (13.28) implies ergodicity), let $A$ be an invariant event and $B = A$. Then $P(T^kA \cap B) = P(A)$ for all $k$, and hence the left-hand side of (13.28) is equal to $P(A)$. This gives $P(A) = P(A)^2$, implying $P(A) = 0$ or 1.

To show 'only if', apply the ergodic theorem to the indicator function of the sets $T^kA$, where $T$ is measure-preserving and ergodic, to give

$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n 1_{T^kA}(\omega) = P(A), \text{ a.s.} \tag{13.29}$$

But for any $B \in \mathcal{F}$,

$$\int\Big|\frac{1}{n}\sum_{k=1}^n 1_{T^kA}(\omega) - P(A)\Big|\,dP(\omega) \geq \int_B\Big|\frac{1}{n}\sum_{k=1}^n 1_{T^kA}(\omega) - P(A)\Big|\,dP(\omega) \geq \Big|\int_B\Big(\frac{1}{n}\sum_{k=1}^n 1_{T^kA}(\omega) - P(A)\Big)dP(\omega)\Big| = \Big|\frac{1}{n}\sum_{k=1}^n P(T^kA \cap B) - P(A)P(B)\Big|. \tag{13.30}$$

The sequence whose absolute value is the integrand in the left-hand member of (13.30) converges almost surely to zero as $n \to \infty$ by (13.29); it is bounded absolutely by $1 + P(A)$ uniformly in $n$, so is clearly uniformly integrable. Hence, the left-hand member of (13.30) converges to zero by 12.8, and the theorem follows. ∎

Following from this result, ergodicity of a stationary sequence is often associated with convergence to zero in Cesàro-sum of the autocovariances, and indeed, in a Gaussian sequence the conditions are equivalent.

13.14 Corollary If $\{X_t(\omega)\}_1^\infty$ is a stationary, ergodic, square-integrable sequence, then

$$\frac{1}{n}\sum_{k=1}^n \mathrm{Cov}(X_1, X_k) \to 0 \text{ as } n \to \infty. \tag{13.31}$$

Proof Setting $B = A$ and defining a real sequence by the indicators of $T^kA$, $X_k = 1_{T^kA}(\omega)$, (13.28) is equivalent to (13.31). First extend this result to sequences of simple r.v.s. Let $X_1(\omega) = \sum_i \alpha_i 1_{A_i}(\omega)$, so that $X_k(\omega) = X_1(T^{k-1}\omega) = \sum_i \alpha_i 1_{T^{1-k}A_i}(\omega)$. The main point to be established is that the difference between $X_2$ and a simple r.v. can be ignored in integration. In other words, the sets $T^{-1}A_i$ must form a partition of $\Omega$, apart possibly from sets of measure 0. Since $T$ is measure-preserving, $P(\bigcup_i T^{-1}A_i) = P(T^{-1}(\bigcup_i A_i)) = P(\bigcup_i A_i) = 1$, using 1.2(ii), and hence $P(\Omega - \bigcup_i T^{-1}A_i) = 0$. And since $\sum_i P(T^{-1}A_i) = \sum_i P(A_i) = 1$, additivity of the measure implies that the collection $\{T^{-1}A_i\}$ is also disjoint apart from possible sets of measure 0, verifying the required property.

This argument extends by induction to $X_k$ for any $k \in \mathbb{N}$. Hence,

$$E(X_1X_k) = E\Big(\sum_i\sum_j \alpha_i\alpha_j 1_{A_i}1_{T^{1-k}A_j}\Big) = \sum_i\sum_j \alpha_i\alpha_j P(A_i \cap T^{1-k}A_j) \tag{13.32}$$

(the sum being absolutely convergent by assumption), and by 13.13,

$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n E(X_1X_k) = \sum_i\sum_j \alpha_i\alpha_j \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n P(A_i \cap T^{1-k}A_j) = \sum_i\sum_j \alpha_i\alpha_j P(A_i)P(A_j) = E(X_1)^2, \tag{13.33}$$

where $E(X_1)^2 = E(X_1)E(X_k)$ for any $k$, by stationarity. The theorem extends to general sequences by the usual application of 3.28 and the monotone convergence theorem. ∎

This result might appear to mean that ergodicity implies some form of asymptotic independence of the sequence, since one condition under which (13.31) certainly holds is where $\mathrm{Cov}(X_1, X_k) \to 0$ as $k \to \infty$. But this is not so. The following example illustrates nicely what ergodicity implies and does not imply.

13.15 Example Let the probability space $(\Omega, \mathcal{F}, P)$ be defined by $\Omega = \{0, 1\}$, so that $\mathcal{F} = \{\emptyset, \{0\}, \{1\}, \{0,1\}\}$ and $P(\omega) = 0.5$ for $\omega = 0$ and $\omega = 1$. Let $T$ be the transformation that sets $T0 = 1$ and $T1 = 0$. In this setup a random sequence $\{X_t(\omega)\}$ may be defined by letting $X_1(\omega) = \omega$, and generating the sequence by iterating $T$. These sequences always consist of alternating 0s and 1s, but the initial value is randomly chosen with equal probabilities. Now, $T$ is measure-preserving; the invariant events are $\Omega$ and $\emptyset$, both trivial, so the sequence is ergodic. And it is easily verified that $\lim_{n\to\infty} n^{-1}\sum_{k=1}^n P(T^kA \cap B) = P(A)P(B)$ for every pair $A, B \in \mathcal{F}$. For instance, let $A = B = \{1\}$, and then $P(T^kA \cap B) = 0.5$ for $k$ even and 0 for $k$ odd, so that the limit is indeed 0.25 as required. You can verify, equivalently, that the ergodic theorem holds, since the time average of the sequence will always converge to 0.5, which is the same as the ensemble mean of $X_1(\omega)$. □
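The verification suggested in Example 13.15 can be carried out exhaustively, since the space has only four events. The following sketch enumerates them; the helper names `prob`, `shift` and `cesaro_average` are illustrative, and the averages are taken over an even number of terms so that they equal the limit exactly.

```python
from fractions import Fraction

# The four events of the sigma-field on Omega = {0, 1}.
EVENTS = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]


def T(w):
    """The transformation of the example: T0 = 1 and T1 = 0."""
    return 1 - w


def prob(event):
    """Each of the two outcomes has probability 1/2."""
    return Fraction(len(event), 2)


def shift(event, k):
    """The image T^k(event)."""
    out = frozenset(event)
    for _ in range(k):
        out = frozenset(T(w) for w in out)
    return out


def cesaro_average(a, b, n):
    """(1/n) * sum over k = 1..n of P(T^k A intersect B)."""
    return sum(prob(shift(a, k) & b) for k in range(1, n + 1)) / n


one = frozenset({1})
print([str(prob(shift(one, k) & one)) for k in range(1, 7)])  # oscillates
print(cesaro_average(one, one, 200))                          # averages out
```

The individual terms $P(T^kA \cap B)$ keep oscillating between 0 and 0.5, so only their Cesàro average settles at $P(A)P(B)$; this is the sense in which the sequence is ergodic without its coordinates becoming independent.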

In this example, $X_t$ is perfectly predictable once we know $X_1$, for any $t$. This shows that ergodicity does not imply independence between different parts of the sequence, even as the time separation increases. By contrast, a mixing sequence has this property. A measure-preserving, ergodic shift transformation $T$ is said to be mixing if, for each $A, B \in \mathcal{F}$,

$$\lim_{k\to\infty} P(T^kA \cap B) = P(A)P(B). \tag{13.34}$$

A stationary sequence $\{X_t\}_1^\infty$ is said to be mixing if $X_t(\omega) = X_1(T^{t-1}\omega)$ for each $t$, where $T$ is a mixing transformation.

Compare this condition with (13.28); Cesàro convergence of the sequence $\{P(T^kA \cap B), k \in \mathbb{N}\}$ has been replaced by actual convergence. To obtain a sound intuition about mixing transformations, one cannot do better than reflect on the following oft-quoted example, originally due to Halmos (1956).

13.16 Example Consider a dry martini initially poured as a layer of vermouth (10% of the volume) on top of the gin (90%). Let $G$ denote the gin, and $F$ an arbitrary small region of the fluid, so that $F \cap G$ is the gin contained in $F$. If $P(\cdot)$ denotes the volume of a set as a proportion of the whole, $P(G) = 0.9$ and $P(F \cap G)/P(F)$, the proportion of gin in $F$, is initially either 0 or 1. Let $T$ denote the operation of stirring the martini with a swizzle stick, so that $P(T^kF \cap G)/P(F)$ is the proportion of gin in $F$ after $k$ stirs. Assuming the fluid is incompressible, stirring is a measure-preserving transformation in that $P(T^kF) = P(F)$ for all $k$. If the stirring mixes the martini we would expect the proportion of gin in $T^kF$, which is $P(T^kF \cap G)/P(F)$, to tend to $P(G)$, so that each region $F$ of the martini eventually contains 90% gin. □

This is precisely condition (13.34). Repeated applications of a mixing transformation to an event $A$ should eventually mix outcomes in $A$ and $A^c$ so thoroughly that for large enough $k$ the composition of $T^kA$ gives no clues about the original $A$. Mixing in a real sequence implies that events such as $A = \{\omega: X_t(\omega) \leq a\}$ and $B = \{\omega: X_{t+k}(\omega) \leq b\}$ are becoming independent as $k$ increases. It is immediate, or virtually so, that for stationary mixing sequences the result of 13.14 can be strengthened to $\mathrm{Cov}(X_1, X_n) \to 0$ as $n \to \infty$.

13.5 Subfields and Regularity

We now introduce an alternative approach to studying dependence which considers the collection of σ-subfields of events generated by a stochastic sequence. This theory is fundamental to nearly everything we do subsequently, particularly because, unlike the ergodic theory of the preceding sections, it generalizes beyond the measure-preserving (stationary) case.

Consider a doubly infinite sequence $\{X_t, t \in \mathbb{Z}\}$ (not necessarily stationary) and define the family of subfields $\{\mathcal{F}_s^t, s \leq t\}$, where $\mathcal{F}_s^t = \sigma(X_s, \ldots, X_t)$ is the smallest σ-field on which the sequence coordinates from dates $s$ to $t$ are measurable. The sets of $\mathcal{F}_s^t$ can be visualized as the inverse images of $(t - s + 1)$-dimensional cylinder sets in $\mathcal{B}^\infty$; compare the discussion of §12.3, recalling that $\mathbb{N}$ and $\mathbb{Z}$ are equipotent. We can let one or other of the bounds tend to infinity, and a particularly important sub-family is the increasing sequence $\{\mathcal{F}_{-\infty}^t, t \in \mathbb{Z}\}$, which can be thought of as, in effect, 'the information contained in the sequence up to date $t$'. The σ-field on which the sequence as a whole is measurable is the limiting case $\mathcal{F}_{-\infty}^{+\infty} = \bigvee_t \mathcal{F}_{-\infty}^t$. In cases where the underlying probability model concerns just the sequence $\{X_t\}$ we shall identify $\mathcal{F}_{-\infty}^{+\infty}$ with $\mathcal{F}$.

Another interesting object is the remote σ-field (or tail σ-field), $\mathcal{F}_{-\infty} = \bigcap_t \mathcal{F}_{-\infty}^t$. This σ-field contains events about which we can learn something by observing any coordinate of the sequence, and it might plausibly be supposed that these events occurred at time $-\infty$, the 'remote' past when the initial conditions for the sequence were set. However, note that the set may be generated in other ways, such

O1+= or O5-t .RS t t , t t

One of the ways to characterize independence in a sequence is to say that any pair of non-overlapping subfields, of the form $\mathcal{F}_{t_2}^{t_1}$ and $\mathcal{F}_{t_4}^{t_3}$ where $t_1 \geq t_2 > t_3 \geq t_4$, are independent (see §10.5). One of the most famous early results in the theory of stochastic processes is Kolmogorov's 'zero-one law' for independent sequences. This theorem is usually given for the case of a sequence $\{X_t\}_1^\infty$, and the remote σ-field is defined in this case as $\bigcap_{t=1}^\infty \mathcal{F}_t^\infty$.

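13.17 Zero-one law If the sequence $\{X_t\}_1^\infty$ is independent, every remote event is trivial, having either probability 0 or probability 1.

Proof Let $A$ be a remote event, so that $A \in \mathcal{F}_\infty = \bigcap_{t=1}^\infty \mathcal{F}_t^\infty$, and let $\mathcal{G}$ be the collection of events having the property $P(A \cap B) = P(A)P(B)$ if $B \in \mathcal{G}$. By independence, $\mathcal{F}_{t+1}^\infty$ and $\mathcal{F}_1^t$ are independent subfields, so that $\mathcal{F}_1^t \subseteq \mathcal{G}$, for every $t$. We may therefore say that $\bigcup_{t=1}^\infty \mathcal{F}_1^t \subseteq \mathcal{G}$.

If $B, B' \in \bigcup_{t=1}^\infty \mathcal{F}_1^t$, then $B \in \mathcal{F}_1^t$ for some $t$, and $B' \in \mathcal{F}_1^{t'}$ for some $t'$, and if $t' \geq t$ (say) then $B \in \mathcal{F}_1^{t'}$ so $B \cap B' \in \mathcal{F}_1^{t'} \subseteq \bigcup_{t=1}^\infty \mathcal{F}_1^t$, and accordingly $\bigcup_{t=1}^\infty \mathcal{F}_1^t$ is a field. Moreover, if $\{B_j \in \mathcal{G}\}$ is a monotone sequence and $B_j \to B$, then $A \cap B_j$ is also monotone and converges to $A \cap B$ for any $A \in \mathcal{F}_\infty$, and $B \in \mathcal{G}$ by continuity of $P$, so $\mathcal{G}$ is a monotone class. By the argument of 1.24, $\mathcal{G}$ therefore contains the union of any countable collection of sets from $\bigcup_{t=1}^\infty \mathcal{F}_1^t$, and so $\mathcal{G} \supseteq \sigma(\bigcup_{t=1}^\infty \mathcal{F}_1^t) = \bigvee_{t=1}^\infty \mathcal{F}_1^t = \mathcal{F}$. However $A \in \mathcal{F}$, so $A \in \mathcal{G}$ and we may set $B = A$, giving $P(A) = P(A)^2$. Hence $P(A) = 0$ or 1. ∎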

The zero-one law shows us that for an independent sequence there are no events, other than trivial ones, that can be relevant to all sequence coordinates. But clearly, not only independent sequences have the zero-one property, and from our point of view the interesting problem is to identify the wider class of sequences that possess it.

A sequence $\{X_t\}_{-\infty}^{+\infty}$ is said to be regular or mixing if every remote event has probability 0 or 1. Regularity is the term adopted by Ibragimov and Linnik (1971), to whom the basics of this theory are due. In a suitably unified framework, this is essentially equivalent to the mixing concept defined in §13.4. The following theorem says that in a regular sequence, remote events must be independent of all events in $\mathcal{F}$. Note that trivial events are independent of themselves, on the definition.

13.18 Theorem (Ibragimov and Linnik 1971: th. 17.1.1) $\{X_t\}_{-\infty}^{+\infty}$ is regular if and only if, for every $B \in \mathcal{F}$,

$$\sup_{A \in \mathcal{F}_{-\infty}^t} |P(A \cap B) - P(A)P(B)| \to 0 \text{ as } t \to -\infty. \tag{13.35}$$

Proof To prove 'if', suppose $\exists\, E \in \mathcal{F}_{-\infty}$ with $0 < P(E) < 1$, so that $\{X_t\}_{-\infty}^{+\infty}$ is not regular. Then for every $t$, $E \in \mathcal{F}_{-\infty}^t$ and so

$$\sup_{A \in \mathcal{F}_{-\infty}^t} |P(A \cap E) - P(A)P(E)| \geq P(E) - P(E)^2 > 0, \tag{13.36}$$

which contradicts (13.35).

To prove 'only if', assume regularity and define random variables $\xi = 1_A - P(A)$ ($\mathcal{F}_{-\infty}^t$-measurable) and $\eta = 1_B - P(B)$ ($\mathcal{F}$-measurable), such that $P(A \cap B) - P(A)P(B) = E(\xi\eta)$. Then, by the Cauchy-Schwarz inequality,

$$|E(\xi\eta)| = |E(\xi E(\eta|\mathcal{F}_{-\infty}^t))| \leq \|\xi\|_2\,\|E(\eta|\mathcal{F}_{-\infty}^t)\|_2, \tag{13.37}$$

where the equality is by the law of iterated expectations because $A \in \mathcal{F}_{-\infty}^t$. Note that $\|\xi\|_2 \leq 1$. We show $\|E(\eta|\mathcal{F}_{-\infty}^t)\|_2 \to 0$ as $t \to -\infty$, which will complete the proof, since $A$ is an arbitrary element of $\mathcal{F}_{-\infty}^t$.

Consider the sequence $\{E(1_B|\mathcal{F}_{-\infty}^t)(\omega)\}$. For any $\omega \in \Omega$,

$$E(1_B|\mathcal{F}_{-\infty}^t)(\omega) \to E(1_B|\mathcal{F}_{-\infty})(\omega) \text{ as } t \to -\infty, \tag{13.38}$$

where by equation (10.18) and the zero-one property,

$$\int_E E(1_B|\mathcal{F}_{-\infty})(\omega)\,dP(\omega) = P(E \cap B) = \begin{cases} P(B), & P(E) = 1 \\ 0, & P(E) = 0 \end{cases}, \quad E \in \mathcal{F}_{-\infty}. \tag{13.39}$$

It is clear that setting $E(1_B|\mathcal{F}_{-\infty})(\omega) = P(B)$ a.s. agrees with the definition, so we may say that $E(1_B|\mathcal{F}_{-\infty}^t) \to P(B)$ a.s., or, equivalently, that

$$\big(E(1_B|\mathcal{F}_{-\infty}^t)(\omega) - P(B)\big)^2 \to 0, \text{ a.s.} \tag{13.40}$$

Since $|1_B(\omega) - P(B)| \leq 1$ for all $\omega \in \Omega$, $(E(1_B|\mathcal{F}_{-\infty}^t)(\omega) - P(B))^2$ is similarly bounded, uniformly in $t$. Uniform integrability of the sequence can therefore be assumed, and it follows from 12.8 that

$$\|E(\eta|\mathcal{F}_{-\infty}^t)\|_2 = \|E(1_B|\mathcal{F}_{-\infty}^t)(\omega) - P(B)\|_2 \to 0 \text{ as } t \to -\infty, \tag{13.41}$$

as required. ∎

In this theorem, it is less the existence of the limit than the passage to the limit, the fact that the supremum in (13.35) can be made small by choosing $-t$ large, that gives the result practical significance. When the only remote events are trivial, the dependence of $X_{t+k}$ on events in $\mathcal{F}_{-\infty}^t$, for fixed $t$, must eventually decline as $k$ increases. The zero-one law is an instant corollary of the necessity part of the theorem, since an independent sequence would certainly satisfy (13.35).


There is an obvious connection between the properties of invariance and remoteness. If $T$ is a measure-preserving shift transformation we have the following simple implication.

13.19 Theorem If $TA = A$, then $A \in \mathcal{F}_{-\infty}$.

Proof If $A \in \mathcal{F}_{-\infty}^t$, then $TA \in \mathcal{F}_{-\infty}^{t+1}$ and $T^{-1}A \in \mathcal{F}_{-\infty}^{t-1}$. If $TA = A$, $T^{-1}A = A$ and it follows immediately that $A \in \big(\bigcap_{s=t}^{\infty} \mathcal{F}_{-\infty}^s\big) \cap \big(\bigcap_{s=-\infty}^{t} \mathcal{F}_{-\infty}^s\big) = \mathcal{F}_{-\infty}$. ∎

The last result of this section establishes formally the relationship between regularity and ergodicity which has been implicit in the foregoing discussion.

13.20 Theorem (Ibragimov and Linnik 1971: cor. 17.1.1) If a stationary sequence $x(\omega) = \{X_t(\omega)\}_{-\infty}^{+\infty}$ is regular, it is also ergodic.

Proof Every set $A \in \mathcal{F}_{-\infty}^{+\infty}$ is contained in a set $A_t \in \mathcal{F}_{-t}^{t}$, with the sequence $\{A_t\}$ non-increasing and $A_t \downarrow A$. Thus, $A_t$ may be constructed as the inverse image under $x$ of the $(2t + 1)$-dimensional cylinder set whose base is the product of the coordinate sets for coordinates $-t, \ldots, t$ of $x(A) \subseteq \mathbb{R}^\infty$. The inclusion follows by 1.2(iv). By continuity of $P$, we can assume that $P(A_t) \to P(A)$.

Let $A$ be invariant. Using the measure-preserving property of $T$, we find

$$P(A_t \cap A) = P(A_t \cap T^{-k}A) = P(T^kA_t \cap A). \tag{13.42}$$

Since $k$ is arbitrary, regularity implies by (13.34) that $P(A_t \cap A) = P(A_t)P(A)$. Letting $t \to \infty$ yields $P(A) = P(A)^2$, so that $P(A) = 0$ or 1, as required. ∎

13.6 Strong and Uniform Mixing

The defect of mixing (regularity) as an operational concept is that remote events are of less interest than arbitrary events which happen to be widely separated in time. The extra ingredient we need for a workable theory is the concept of dependence between pairs of σ-subfields of events. There are several ways to characterize such dependence, but the following are the concepts that have been most commonly exploited in limit theory.

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\mathcal{G}$ and $\mathcal{H}$ be σ-subfields of $\mathcal{F}$; then

$$\alpha(\mathcal{G},\mathcal{H}) = \sup_{G \in \mathcal{G},\, H \in \mathcal{H}} |P(G \cap H) - P(G)P(H)| \tag{13.43}$$

is known as the strong mixing coefficient, and

$$\phi(\mathcal{G},\mathcal{H}) = \sup_{G \in \mathcal{G},\, H \in \mathcal{H};\, P(G) > 0} |P(H|G) - P(H)| \tag{13.44}$$

as the uniform mixing coefficient. These are alternative measures of the dependence between the subfields $\mathcal{G}$ and $\mathcal{H}$.

If the subfields $\mathcal{G}$ and $\mathcal{H}$ are independent, then $\alpha(\mathcal{G},\mathcal{H}) = 0$ and $\phi(\mathcal{G},\mathcal{H}) = 0$, and the converse is also true in the case of uniform mixing, although not for strong mixing. At first sight there may appear not much to choose between the definitions, but since they are set up in terms of suprema of the dependence measures over the sets of events in question, it is the extreme (and possibly anomalous) cases which define the characteristics of the mixing coefficients. The strong mixing concept is weaker than the uniform concept. Since

$|P(G \cap H) - P(G)P(H)| \le |P(H \mid G) - P(H)| \le \phi(\mathcal{G},\mathcal{H})$ (13.45)

for all $G \in \mathcal{G}$ and $H \in \mathcal{H}$, it is clear that $\alpha(\mathcal{G},\mathcal{H}) \le \phi(\mathcal{G},\mathcal{H})$. However, the following example shows how the two concepts differ more crucially.

13.21 Example Suppose that, for a sequence of subfields $\{\mathcal{G}_m\}_1^\infty$ and a subfield $\mathcal{H}$, $\alpha(\mathcal{G}_m,\mathcal{H}) \to 0$ as $m \to \infty$. This condition is compatible with the existence of sets $G_m \in \mathcal{G}_m$ and $H \in \mathcal{H}$ with the properties $P(G_m) = 1/m$, and $P(G_m \cap H) = a/m$ for $a \ne P(H)$. But $\phi(\mathcal{G}_m,\mathcal{H}) \ge |P(H \mid G_m) - P(H)| = |a - P(H)|$ for every $m \ge 1$, showing that the subfields $\mathcal{G}_m$ and $\mathcal{H}$ are not independent in the limit. □

Evidently, the strong mixing characterization of 'independence' does not rule out the possibility of dependence between negligible sets.
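The arithmetic of Example 13.21 can be tabulated directly. The sketch below assumes hypothetical values $a = 1/4$ and $P(H) = 1/2$, and evaluates only the contribution of the single pair $(G_m, H)$ to each supremum: the term entering $\alpha$ shrinks like $1/m$, while the term entering $\phi$ does not move.

```python
from fractions import Fraction

def example_13_21(m, a, p_h):
    """Contribution of the pair (G_m, H) to the alpha and phi suprema."""
    p_g = Fraction(1, m)                   # P(G_m) = 1/m
    p_gh = a * p_g                         # P(G_m and H) = a/m
    alpha_term = abs(p_gh - p_g * p_h)     # |P(G_m and H) - P(G_m)P(H)|
    phi_term = abs(p_gh / p_g - p_h)       # |P(H | G_m) - P(H)|
    return alpha_term, phi_term

# Hypothetical values: a = 1/4, P(H) = 1/2.
for m in (1, 10, 100, 1000):
    print(m, example_13_21(m, Fraction(1, 4), Fraction(1, 2)))
```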

$\alpha$ and $\phi$ are not the only mixing coefficients that can be defined, although they have proved the most popular in applications. Others that have appeared in the literature include

$\beta(\mathcal{G},\mathcal{H}) = \sup_{H \in \mathcal{H}} E\,|P(H \mid \mathcal{G}) - P(H)|$ (13.46)

and

$\rho(\mathcal{G},\mathcal{H}) = \sup_{\xi,\eta} \dfrac{|E(\xi\eta)|}{\|\xi\|_2\,\|\eta\|_2}$ (13.47)

where the latter supremum is taken with respect to all square integrable, zero mean, $\mathcal{G}$-measurable r.v.s $\xi$, and $\mathcal{H}$-measurable r.v.s $\eta$. To compare these alternatives, first let $\zeta(\omega) = P(H \mid \mathcal{G})(\omega) - P(H)$, so that

$\beta(\mathcal{G},\mathcal{H}) = \sup_{H \in \mathcal{H}} \int |\zeta|\,dP \;\ge\; \sup_{G \in \mathcal{G},\, H \in \mathcal{H}} \int_G |\zeta|\,dP \;\ge\; \sup_{G \in \mathcal{G},\, H \in \mathcal{H}} \left|\int_G \zeta\,dP\right| = \alpha(\mathcal{G},\mathcal{H}).$ (13.48)

Moreover, since for any sets $G \in \mathcal{G}$ and $H \in \mathcal{H}$, $\xi(\omega) = 1_G(\omega) - P(G)$ and $\eta(\omega) = 1_H(\omega) - P(H)$ are members of the set over which $\rho(\mathcal{G},\mathcal{H})$ is defined, and $E(\xi\eta) = P(G \cap H) - P(G)P(H)$ while $|\xi| \le 1$ and $|\eta| \le 1$ for these cases, it is also clear that $\rho \ge \alpha$. Thus, $\alpha$-mixing, notwithstanding its designation, is the weakest of these four 'strong' variants, although it is of course stronger than ordinary regularity characterized by trivial remote events. We also have $\beta \le \phi$, by an immediate corollary of the following result.


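13.22 Theorem $|P(H \mid \mathcal{G}) - P(H)| \le \phi(\mathcal{G},\mathcal{H})$ a.s., for all $H \in \mathcal{H}$. □

The main step in the proof of 13.22 is the following lemma.

13.23 Lemma Let $X$ be an almost surely bounded, $\mathcal{G}$-measurable r.v. Then

$\sup_{G \in \mathcal{G},\, P(G) > 0} \dfrac{1}{P(G)} \left|\int_G X\,dP\right| = \operatorname{ess\,sup} |X|.$ (13.49)

Proof $P(G)^{-1}\left|\int_G X\,dP\right| \le \operatorname{ess\,sup}|X|$, for any set $G$ in the designated class. For any $\varepsilon > 0$, consider the sets

$G^+ = \{\omega: X(\omega) \ge (\operatorname{ess\,sup}|X|) - \varepsilon\},$

$G^- = \{\omega: -X(\omega) \ge (\operatorname{ess\,sup}|X|) - \varepsilon\}.$

By definition of $\operatorname{ess\,sup}|X|$, both these sets belong to $\mathcal{G}$ and at least one of them is nonempty and has positive probability. Define the set

$G^* = \begin{cases} G^+, & P(G^+) \ge P(G^-), \\ G^-, & \text{otherwise,} \end{cases}$

and we may conclude that

$\dfrac{1}{P(G^*)} \left|\int_{G^*} X\,dP\right| \ge (\operatorname{ess\,sup}|X|) - \varepsilon.$ (13.50)

(13.49) now follows on letting $\varepsilon$ approach 0. ■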

Proof of 13.22 Put $X = P(H \mid \mathcal{G}) - P(H)$ in the lemma, noting that this is a $\mathcal{G}$-measurable r.v. lying between +1 and -1. Observe that, for any $G \in \mathcal{G}$, $P(G)^{-1}\left|\int_G X\,dP\right| = |P(H \mid G) - P(H)|$. Hence the lemma together with (13.44) implies that, for any $H \in \mathcal{H}$,

$\phi(\mathcal{G},\mathcal{H}) \ge \operatorname{ess\,sup}\,|P(H \mid \mathcal{G}) - P(H)| \ge |P(H \mid \mathcal{G}) - P(H)|,$ (13.51)

with probability 1. ■


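14 Mixing

14.1 Mixing Sequences of Random Variables

For a sequence $\{X_t(\omega)\}_{-\infty}^{\infty}$, let $\mathcal{F}_{-\infty}^{t} = \sigma(\dots,X_{t-2},X_{t-1},X_t)$ as in §13.5, and similarly define $\mathcal{F}_{t+m}^{\infty} = \sigma(X_{t+m},X_{t+m+1},X_{t+m+2},\dots)$. The sequence is said to be $\alpha$-mixing (or strong mixing) if $\lim_{m\to\infty}\alpha_m = 0$ where

$\alpha_m = \sup_t \alpha(\mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty}),$ (14.1)

and $\alpha$ is defined in (13.43). It is said to be $\phi$-mixing (or uniform mixing) if $\lim_{m\to\infty}\phi_m = 0$, where

$\phi_m = \sup_t \phi(\mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty}),$ (14.2)

and $\phi$ is defined in (13.44). $\phi$-mixing implies $\alpha$-mixing as noted in §13.6, while the converse does not hold. Another difference is that $\phi$-mixing is not time-reversible; in other words, it is not necessarily the case that $\sup_t \phi(\mathcal{F}_{t+m}^{\infty}, \mathcal{F}_{-\infty}^{t}) = \sup_t \phi(\mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty})$. By contrast, $\alpha$-mixing is time-reversible. If the sequence $\{X_t\}_{-\infty}^{+\infty}$ is $\alpha$-mixing, so is the sequence $\{Y_t\}_{-\infty}^{+\infty}$ where $Y_t = X_{-t}$.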

$\{X_t(\omega)\}_{-\infty}^{\infty}$ is also said to be absolutely regular if $\lim_{m\to\infty}\beta_m = 0$ where

$\beta_m = \sup_t \beta(\mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty})$ (14.3)

and $\beta$ is defined in (13.46). According to the results in §13.6, absolute regularity is a condition intermediate between strong mixing and uniform mixing. On the other hand, if $\{X_t\}_{-\infty}^{+\infty}$ is a stationary, $L_2$-bounded sequence, and $\mathcal{F}_{-\infty}^{0} = \sigma(\dots,X_{-1},X_0)$ and $\mathcal{F}_{m}^{+\infty} = \sigma(X_m,X_{m+1},\dots)$, the sequence is said to be completely regular if $\rho_m = \rho(\mathcal{F}_{-\infty}^{0}, \mathcal{F}_{m}^{+\infty}) \to 0$, where $\rho$ is defined in (13.47). In stationary Gaussian sequences, complete regularity is equivalent to strong mixing. Kolmogorov and Rozanov (1960) show that in this case

$\alpha_m \le \rho_m \le 2\pi\alpha_m.$ (14.4)

In a completely regular sequence, the autocovariances $\gamma_j = E(X_tX_{t-j})$ must tend to 0 as $j \to \infty$. A sufficient condition for complete regularity can be expressed in terms of the spectral density function. When it exists, the spectral density $f(\lambda)$ is the Fourier transform of the autocovariance function, that is to say,

$f(\lambda) = \dfrac{1}{2\pi}\sum_{j=-\infty}^{\infty} \gamma_j e^{ij\lambda}, \quad \lambda \in [-\pi,\pi].$ (14.5)

The theorem of Kolmogorov and Rozanov leads to the result proved by Ibragimov and Linnik (1971: th. 17.3.3), that a stationary Gaussian sequence is strong mixing when $f(\lambda)$ exists and is continuous and strictly positive, everywhere on $[-\pi,\pi]$.

This topic is something of a terminological minefield. 'Regularity' is an undescriptive term, and there does not seem to be unanimity among authors with regard to usage, complete regularity and absolute regularity sometimes being used synonymously. Nor is the list of mixing concepts given here by any means exhaustive. Fortunately, we shall be able to avoid this confusion by sticking with the strong and uniform cases. While there are some applications in which absolute regularity provides just the right condition, we shall not encounter any of these. Incidentally, the term 'weak mixing' might be thought appropriate as a synonym for regularity, but should be avoided as there is a risk of confusion with weak dependence, a term used, often somewhat imprecisely, to refer to sequences having summable covariances. Strongly dependent sequences may be stationary and mixing, but their covariances are non-summable. ('Weak' implies less dependence than 'strong' in this instance, not more!)

Confining our attention now to the strong and uniform mixing definitions, measures of the dependence in a sequence can be based in various ways on the rate at which the mixing coefficients $\alpha_m$ or $\phi_m$ tend to zero. To avoid repetition we will discuss just strong mixing, but the following remarks apply equally to the uniform mixing case, on substituting $\phi$ for $\alpha$ throughout. Since the collections $\mathcal{F}_{-\infty}^{t}$ and $\mathcal{F}_{t+m}^{\infty}$ are respectively non-decreasing in $t$ and non-increasing in $t$ and $m$, the sequence $\{\alpha_m\}$ is monotone. The rate of convergence is often quantified by a summability criterion, that for some number $\varphi > 0$, $\alpha_m \to 0$ sufficiently fast that

$\sum_{m=1}^{\infty} \alpha_m^{1/\varphi} < \infty.$ (14.6)

The term size has been coined to describe the rate of convergence of the mixing numbers, although different definitions have been used by different authors, and the terminology should be used with caution. One possibility is to say that the sequence is of size $-\varphi_0$ if the mixing numbers satisfy (14.6) with $\varphi = \varphi_0$. However, the commonest usage (see for example White 1984) is to say that a sequence is $\alpha$-mixing of size $-\varphi_0$ if $\alpha_m = O(m^{-\varphi})$ for some $\varphi > \varphi_0$. It is clear that such sequences are summable when raised to the power of $1/\varphi_0$, so that this concept of size is stronger than the summability concept. One temptation to be avoided is to define the size as '$-\varphi_0$, where $\varphi_0$ is the largest constant such that the $\alpha_m^{1/\varphi_0}$ are summable'; for no such number may exist.
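The last remark can be made concrete with a hypothetical sequence of mixing numbers $\alpha_m = m^{-2}$ (an assumption chosen purely for illustration). Then $\alpha_m^{1/\varphi} = m^{-2/\varphi}$, which by the $p$-series test is summable exactly when $\varphi < 2$; the admissible set is the open interval $(0,2)$, which has no largest member. A minimal sketch:

```python
import math

def summability_exponent(decay, phi):
    """Exponent e with alpha_m**(1/phi) = m**(-e), when alpha_m = m**(-decay)."""
    return decay / phi

def is_summable(decay, phi):
    """p-series test: the sum over m of m**(-e) converges exactly when e > 1."""
    return summability_exponent(decay, phi) > 1

def partial_sum(decay, phi, n_terms):
    """Partial sum of alpha_m**(1/phi) over m = 1, ..., n_terms."""
    e = summability_exponent(decay, phi)
    return math.fsum(m ** (-e) for m in range(1, n_terms + 1))

# Hypothetical mixing numbers alpha_m = m**(-2).
for phi in (1.0, 1.9, 2.0):
    print(phi, is_summable(2.0, phi), partial_sum(2.0, phi, 10_000))
```

The partial sums are only suggestive, of course; the convergence verdict rests on the $p$-series test, not on the numerics.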

Since mixing is not so much a property of the sequence $\{X_t\}$ as of the sequences of $\sigma$-fields generated by $\{X_t\}$, it holds for any random variables measurable on those $\sigma$-fields, such as measurable transformations of $X_t$. More generally, we have the following implication:

14.1 Theorem Let $Y_t = g(X_t,X_{t-1},\dots,X_{t-\tau})$ be a measurable function, for finite $\tau$. If $X_t$ is $\alpha$-mixing ($\phi$-mixing) of size $-\varphi$, then $Y_t$ is also.

Proof Let $\mathcal{G}_{-\infty}^{t} = \sigma(\dots,Y_{t-1},Y_t)$, and $\mathcal{G}_{t+m}^{\infty} = \sigma(Y_{t+m},Y_{t+m+1},\dots)$. Since $Y_t$ is measurable on any $\sigma$-field on which each of $X_t,X_{t-1},\dots,X_{t-\tau}$ are measurable, $\mathcal{G}_{-\infty}^{t} \subseteq \mathcal{F}_{-\infty}^{t}$ and $\mathcal{G}_{t+m}^{\infty} \subseteq \mathcal{F}_{t+m-\tau}^{\infty}$. Let $\alpha_{Y,m} = \sup_t \alpha(\mathcal{G}_{-\infty}^{t},\mathcal{G}_{t+m}^{\infty})$ and it follows that $\alpha_{Y,m} \le \alpha_{m-\tau}$ for $m \ge \tau$. With $\tau$ finite, $\alpha_{m-\tau} = O(m^{-\varphi})$ if $\alpha_m = O(m^{-\varphi})$ and the conclusion follows. The same argument follows word for word with '$\phi$' replacing '$\alpha$'. ■

14.2 Mixing Inequalities

Strong and uniform mixing are restrictions on the complete joint distribution of the sequence, and to make practical use of the concepts we must know what they imply about particular measures of dependence. This section establishes a set of fundamental moment inequalities for mixing processes. The main results bound the $m$-step-ahead predictions, $E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t})$. Mixing implies that, as we try to forecast the future path of a sequence from knowledge of its history to date, looking further and further forward, we will eventually be unable to improve on the predictor based solely on the distribution of the sequence as a whole, $E(X_{t+m})$. The r.v. $E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})$ is tending to zero as $m$ increases. We prove convergence of the $L_p$-norm.

14.2 Theorem (Ibragimov 1962) For $r \ge p \ge 1$ and with $\alpha_m$ defined in (14.1),

$\|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_p \le 2(2^{1/p} + 1)\,\alpha_m^{1/p-1/r}\,\|X_{t+m}\|_r.$ (14.7)

Proof To simplify notation, substitute $X$ for $X_{t+m}$, $\mathcal{G}$ for $\mathcal{F}_{-\infty}^{t}$, $\mathcal{H}$ for $\mathcal{F}_{t+m}^{\infty}$, and $\alpha$ for $\alpha_m$. It will be understood that $X$ is an $\mathcal{H}$-measurable random variable where $\mathcal{G}, \mathcal{H} \subseteq \mathcal{F}$. The proof is in two stages, first to establish the result for $|X| \le M_X < \infty$ a.s., and then to extend it to the case where $X$ is $L_r$-bounded for finite $r$. Define the $\mathcal{G}$-measurable r.v.

$\eta = \operatorname{sgn}\big(E(X \mid \mathcal{G}) - E(X)\big) = \begin{cases} 1, & E(X \mid \mathcal{G}) \ge E(X), \\ -1, & \text{otherwise.} \end{cases}$ (14.8)

Using 10.8 and 10.10,

$E|E(X \mid \mathcal{G}) - E(X)| = E[\eta(E(X \mid \mathcal{G}) - E(X))] = E[E(\eta X \mid \mathcal{G}) - \eta E(X)] = \operatorname{Cov}(\eta,X) = |\operatorname{Cov}(\eta,X)|.$ (14.9)

Let $Y$ be any $\mathcal{G}$-measurable r.v., such as $\eta$ for example. Noting that $\xi = \operatorname{sgn}(E(Y \mid \mathcal{H}) - E(Y))$ is $\mathcal{H}$-measurable, similar arguments give

$|\operatorname{Cov}(X,Y)| = |E(X(E(Y \mid \mathcal{H}) - E(Y)))| \le E(|X|\,|E(Y \mid \mathcal{H}) - E(Y)|) \le M_X E|E(Y \mid \mathcal{H}) - E(Y)| = M_X|\operatorname{Cov}(\xi,Y)|,$ (14.10)

where the first inequality is the modulus inequality. $\xi$ and $\eta$ are simple random variables taking only two distinct values each, so define the sets $A^+ = \{\eta = 1\}$, $A^- = \{\eta = -1\}$, $B^+ = \{\xi = 1\}$, and $B^- = \{\xi = -1\}$. Putting (14.9) and (14.10) together gives

$E|E(X \mid \mathcal{G}) - E(X)| \le M_X|\operatorname{Cov}(\eta,\xi)| = M_X|E(\xi\eta) - E(\xi)E(\eta)|$

$= M_X\big|[P(A^+ \cap B^+) + P(A^- \cap B^-) - P(A^+ \cap B^-) - P(A^- \cap B^+)] - [P(A^+)P(B^+) + P(A^-)P(B^-) - P(A^+)P(B^-) - P(A^-)P(B^+)]\big| \le 4M_X\alpha.$ (14.11)

Since $|E(X \mid \mathcal{G}) - E(X)| \le |E(X \mid \mathcal{G})| + |E(X)| \le 2M_X$, it follows that, for $p \ge 1$,

$\|E(X \mid \mathcal{G}) - E(X)\|_p \le 2M_X(2\alpha)^{1/p}.$ (14.12)

This completes the first part of the proof. The next step is to let $X$ be $L_r$-bounded. Choose a finite positive $M_X$, and define $X_1 = 1_{\{|X| \le M_X\}}X$ and $X_2 = X - X_1$. By the Minkowski inequality and (14.12),

$\|E(X \mid \mathcal{G}) - E(X)\|_p \le \|E(X_1 \mid \mathcal{G}) - E(X_1)\|_p + \|E(X_2 \mid \mathcal{G}) - E(X_2)\|_p \le 2M_X(2\alpha)^{1/p} + 2\|X_2\|_p,$ (14.13)

and the problem is to bound the second right-hand-side member. But

$\|X_2\|_p \le M_X^{1-r/p}\,\|X\|_r^{r/p}$ (14.14)

for $r \ge p$, so we arrive at

$\|E(X \mid \mathcal{G}) - E(X)\|_p \le 2M_X(2\alpha)^{1/p} + 2M_X^{1-r/p}\|X\|_r^{r/p}.$ (14.15)

Finally, choosing $M_X = \|X\|_r\,\alpha^{-1/r}$ and simplifying yields

$\|E(X \mid \mathcal{G}) - E(X)\|_p \le 2(2^{1/p} + 1)\,\alpha^{1/p-1/r}\,\|X\|_r,$

which is the required result. ■
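The final simplification is easy to check numerically. The sketch below evaluates the right-hand side of (14.15) at the truncation level $M_X = \|X\|_r\alpha^{-1/r}$ and compares it with the right-hand side of (14.7). The function names and the numerical inputs are hypothetical, chosen only to exercise the algebra.

```python
def truncation_bound(m_x, alpha, p, r, norm_r):
    """Right-hand side of (14.15) for a given truncation level M_X."""
    return 2 * m_x * (2 * alpha) ** (1 / p) + 2 * m_x ** (1 - r / p) * norm_r ** (r / p)

def ibragimov_bound(alpha, p, r, norm_r):
    """Right-hand side of (14.7)."""
    return 2 * (2 ** (1 / p) + 1) * alpha ** (1 / p - 1 / r) * norm_r

# Hypothetical inputs: alpha = 0.01, p = 2, r = 4, ||X||_r = 3.
alpha, p, r, norm_r = 0.01, 2.0, 4.0, 3.0
m_x = norm_r * alpha ** (-1 / r)          # the truncation level chosen in the proof
print(truncation_bound(m_x, alpha, p, r, norm_r), ibragimov_bound(alpha, p, r, norm_r))
```

With this choice the two terms of (14.15) are both proportional to $\alpha^{1/p-1/r}\|X\|_r$, which is what makes the bound collapse to the stated form.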

There is an easy corollary bounding the autocovariances of the sequence.

14.3 Corollary For $p > 1$ and $r \ge p/(p-1)$,

$|\operatorname{Cov}(X_t,X_{t+m})| \le 2(2^{1-1/p} + 1)\,\alpha_m^{1-1/p-1/r}\,\|X_t\|_p\,\|X_{t+m}\|_r.$ (14.16)

Proof

$|\operatorname{Cov}(X_t,X_{t+m})| = |E(X_tX_{t+m}) - E(X_t)E(X_{t+m})|$

$= |E(X_t[E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})])|$

$\le \|X_t\|_p\,\|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_{p/(p-1)}$

$\le 2(2^{1-1/p} + 1)\,\alpha_m^{1-1/p-1/r}\,\|X_t\|_p\,\|X_{t+m}\|_r,$ (14.17)

where the second equality is by 10.8 and 10.10, noting that $X_t$ is $\mathcal{F}_{-\infty}^{t}$-measurable, the first inequality is the Hölder inequality, and the second inequality is by 14.2. ■

14.4 Theorem (Serfling 1968: th. 2.2) For $r \ge p \ge 1$,

$\|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_p \le 2\phi_m^{1-1/r}\,\|X_{t+m}\|_r,$ (14.18)

where $\phi_m$ is defined in (14.2).

Proof The result is trivial for $r = p = 1$, so assume $r > 1$. The strategy is to prove the result initially for a sequence of simple r.v.s. Let $X_{t+m} = \sum_{i=1}^{k} x_i 1_{A_i}$, $A_i \in \mathcal{F}_{t+m}^{\infty}$, where $\mathcal{F}_{t+m}^{\infty} = \sigma(X_{t+m},X_{t+m+1},\dots)$. For some $\omega \in \Omega$ consider the random element $E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t})(\omega)$, although for clarity of notation the dependence on $\omega$ is not indicated. For $r > 1$ and $q = r/(r-1)$, we have

$|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})|^r = \Big|\sum_i x_i\big(P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i)\big)\Big|^r$

$\le \Big(\sum_i |x_i|\,|P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i)|\Big)^r$

$= \Big(\sum_i |x_i|\,|P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i)|^{1/r}\,|P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i)|^{1/q}\Big)^r$

$\le \sum_i |x_i|^r\,|P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i)|\,\Big(\sum_i |P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i)|\Big)^{r/q}$

$\le \big[E(|X_{t+m}|^r \mid \mathcal{F}_{-\infty}^{t}) + E|X_{t+m}|^r\big]\Big(\sum_i |P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i)|\Big)^{r/q}.$ (14.19)

The second inequality here is by 9.25. The sets $A_i$ partition $\Omega$, and $P(A_i \cup A_{i'} \mid \mathcal{F}_{-\infty}^{t}) = P(A_i \mid \mathcal{F}_{-\infty}^{t}) + P(A_{i'} \mid \mathcal{F}_{-\infty}^{t})$ a.s. and $P(A_i \cup A_{i'}) = P(A_i) + P(A_{i'})$ for $i \ne i'$. Letting $A_t^+$ denote the union of all those $A_i$ for which $P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i) \ge 0$, and $A_t^-$ the complement of $A_t^+$ on $\Omega$,

$\sum_i |P(A_i \mid \mathcal{F}_{-\infty}^{t}) - P(A_i)| = |P(A_t^+ \mid \mathcal{F}_{-\infty}^{t}) - P(A_t^+)| + |P(A_t^- \mid \mathcal{F}_{-\infty}^{t}) - P(A_t^-)|.$ (14.20)

By 13.22, the inequalities

$|P(A_t^+ \mid \mathcal{F}_{-\infty}^{t}) - P(A_t^+)| \le \phi_m, \qquad |P(A_t^- \mid \mathcal{F}_{-\infty}^{t}) - P(A_t^-)| \le \phi_m$

hold with probability 1. Substituting into (14.19) gives

$|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})|^r \le \big[E(|X_{t+m}|^r \mid \mathcal{F}_{-\infty}^{t}) + E|X_{t+m}|^r\big](2\phi_m)^{r/q}$ a.s. (14.21)

Taking expectations and using the law of iterated expectations then gives

$E|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})|^r \le 2E|X_{t+m}|^r\,(2\phi_m)^{r/q},$ (14.22)

and, raising both sides to the power $1/r$,

$\|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_r \le 2\|X_{t+m}\|_r\,\phi_m^{1-1/r}.$ (14.23)

Inequality (14.18) follows by Liapunov's inequality.

The result extends from simple to general r.v.s using the construction of 3.28. For any r.v. $X_{t+m}$ there exists a monotone sequence of simple r.v.s $\{X_{(k)t+m}, k \in \mathbb{N}\}$ such that $|X_{(k)t+m}(\omega) - X_{t+m}(\omega)| \to 0$ as $k \to \infty$, for all $\omega \in \Omega$. This convergence transfers a.s. to the sequences $\{E(X_{(k)t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{(k)t+m}), k \in \mathbb{N}\}$ by 10.15. Then, assuming $X_{t+m}$ is $L_r$-bounded, the inequality in (14.22) holds as $k \to \infty$ by the dominated convergence theorem applied to each side, with $|X_{t+m}|^r$ as the dominating function, thanks to 10.13(ii). This completes the proof. ■

The counterpart of 14.3 is obtained similarly.

14.5 Corollary For $r \ge 1$,

$|\operatorname{Cov}(X_{t+m},X_t)| \le 2\phi_m^{1/r}\,\|X_t\|_r\,\|X_{t+m}\|_{r/(r-1)},$ (14.24)

where, if $r = 1$, replace $\|X_{t+m}\|_{r/(r-1)}$ by $\|X_{t+m}\|_\infty = \operatorname{ess\,sup} X_{t+m}$.

Proof

$|\operatorname{Cov}(X_{t+m},X_t)| \le \|X_t\|_r\,\|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_{r/(r-1)} \le 2\phi_m^{1/r}\,\|X_t\|_r\,\|X_{t+m}\|_{r/(r-1)},$ (14.25)

where the first inequality corresponds to the one in (14.17), and the second one is by 14.4. ■

These results tell us a good deal about the behaviour of mixing sequences. A fundamental property is mean reversion. The mean deviation sequence $\{X_t - E(X_t)\}$ must change sign frequently when the rate of mixing is high. If the sequence exhibits persistent behaviour with $X_t - E(X_t)$ tending to have the same sign for a large number of successive periods, then $|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})|$ would likewise tend to be large for large $m$. If this quantity is small the sign of the mean deviation $m$ periods hence is unpredictable, indicating that it changes frequently.

But while mixing implies mean reversion, mean reversion need not imply mixing. Theorems 14.2 and 14.4 isolate the properties of greatest importance, but not the only ones. A sequence having the property that $\|\operatorname{Var}(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - \operatorname{Var}(X_{t+m})\|_p > 0$ is called conditionally heteroscedastic. Mixing also requires this sequence of norms to converge as $m \to \infty$, and similarly for other integrable functions of $X_{t+m}$.

Comparison of 14.2 and 14.4 also shows that being able to assert uniform mixing can give us considerably greater flexibility in applications with respect to the existence of moments. In (14.18), the rate of convergence of the left-hand side to zero with $m$ does not depend upon $p$, and in particular, $E|E(X_{t+m} \mid \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})|$ converges whenever $\|X_{t+m}\|_{1+\delta}$ exists for $\delta > 0$, a condition infinitesimally stronger than uniform integrability. In the corresponding inequality for $\alpha_m$ in 14.2, $p < r$ is required for the restriction to 'bite'. Likewise, 14.5 for the case $r = 2$ yields

$|\operatorname{Cov}(X_{t+m},X_t)| \le 2\phi_m^{1/2}\,\|X_t\|_2\,\|X_{t+m}\|_2,$ (14.26)

but to be useful 14.3 requires that either $X_t$ or $X_{t+m}$ be $L_{2+\delta}$-bounded, for $\delta > 0$. Mere existence of the variances will not suffice.
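The contrast between the two covariance bounds can be seen by evaluating their right-hand sides. The following sketch codes (14.16) and (14.24) as plain functions; the function names and the unit norms are hypothetical. With $p = r = 2$ the exponent on $\alpha_m$ in (14.16) is $1 - 1/2 - 1/2 = 0$, so the strong mixing bound does not shrink however small $\alpha_m$ is, whereas the uniform mixing bound with $r = 2$ decays like $\phi_m^{1/2}$.

```python
def alpha_cov_bound(alpha_m, p, r, norm_p, norm_r):
    """Right-hand side of (14.16); requires p > 1 and r >= p/(p-1)."""
    if not (p > 1 and r >= p / (p - 1)):
        raise ValueError("need p > 1 and r >= p/(p-1)")
    exponent = 1 - 1 / p - 1 / r
    return 2 * (2 ** (1 - 1 / p) + 1) * alpha_m ** exponent * norm_p * norm_r

def phi_cov_bound(phi_m, r, norm_r, norm_conj):
    """Right-hand side of (14.24) for r > 1; norm_conj is the L_{r/(r-1)} norm."""
    return 2 * phi_m ** (1 / r) * norm_r * norm_conj

# Hypothetical unit norms; only second moments assumed in the first two lines.
print(alpha_cov_bound(1e-2, 2, 2, 1.0, 1.0), alpha_cov_bound(1e-6, 2, 2, 1.0, 1.0))
print(phi_cov_bound(1e-2, 2, 1.0, 1.0), phi_cov_bound(1e-6, 2, 1.0, 1.0))
# With slightly more than second moments (r = 4) the alpha bound decays again.
print(alpha_cov_bound(1e-2, 2, 4, 1.0, 1.0), alpha_cov_bound(1e-6, 2, 4, 1.0, 1.0))
```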

14.3 Mixing in Linear Processes

A type of stochastic sequence $\{X_t\}_{-\infty}^{\infty}$ which arises very frequently in econometric modelling applications has the representation

$X_t = \sum_{j=0}^{q} \theta_j Z_{t-j}, \quad 0 \le q \le \infty,$ (14.27)

where $\{Z_t\}_{-\infty}^{+\infty}$ (called the innovations or shocks) is an independent stochastic sequence, and $\{\theta_j\}_{j=0}^{q}$ is a sequence of fixed coefficients. Assume without loss of generality that the $Z_t$ have zero means and that $\theta_0 = 1$. (14.27) is called a moving average process of order $q$ (MA($q$)). Into this class fall the finite-order autoregressive and autoregressive-moving average (ARMA) processes commonly used to model economic time series. We would clearly like to know when such sequences are mixing, by reference to the properties of the innovations and of the sequence $\{\theta_j\}$. Several authors have investigated this question, including Ibragimov and Linnik (1971), Chanda (1974), Gorodetskii (1977), Withers (1981a), Pham and Tran (1985), and Athreya and Pantula (1986a, 1986b).

Mixing is an asymptotic property, and when $q < \infty$ the sequence is mixing infinitely fast. This case is called $q$-dependence. The difficulties arise with the cases with $q = \infty$. Formally, we should think of the MA($\infty$) as the weak limit of a sequence of MA($q$) processes; the characteristic function of $X_t$ has the form

$\phi_{X_t}(\lambda) = \prod_{j=0}^{q} \phi_{Z_{t-j}}(\theta_j\lambda),$ (14.28)

and if $\phi_{X_t}(\lambda) \to \phi(\lambda)$ (pointwise in $\mathbb{R}$) as $q \to \infty$, where $\phi(\lambda)$ is a ch.f. and continuous at $\lambda = 0$, we may invert the latter according to 11.12, and identify the corresponding distribution as that of $X_t = \sum_{j=0}^{\infty}\theta_jZ_{t-j}$. The existence of the limit imposes certain conditions on the coefficient sequence $\{\theta_j\}_0^\infty$. We clearly need $|\theta_j| \to 0$ as $j \to \infty$, and for the variance of $X_t$ to exist, it is further necessary that the sequence be square-summable. Note that the solutions of finite-order ARMA processes are characterized by the approach of $|\theta_j|$ to 0 at an exponential rate, beyond a finite point in the sequence.

If $\{Z_t\}$ is i.i.d. with mean 0 and variance $\sigma^2$, $X_t$ is stationary and has spectral density function

$f(\lambda) = \dfrac{\sigma^2}{2\pi}\Big|\sum_{j=0}^{\infty}\theta_je^{i\lambda j}\Big|^2.$ (14.29)

The theorem of Ibragimov and Linnik cited in §14.1 yields the condition $\sum_{j=0}^{\infty}|\theta_j| < \infty$ as sufficient for strong mixing in the Gaussian case. However, another standard result (see Doob 1953: ch. X §8, or Ibragimov and Linnik 1971: ch. 16.7) states that every wide-sense stationary sequence admitting a spectral density has a (doubly-infinite) moving average representation with orthogonal increments and square summable coefficients.

But allowing more general distributions for the innovations yields surprising results. Contrary to what might be supposed, having the $\theta_j$ tend to zero even at an exponential rate is not sufficient by itself for strong mixing. Here is a simple illustration. Recall that the first-order autoregressive process $X_t = \rho X_{t-1} + Z_t$, $|\rho| < 1$, has the MA($\infty$) form with $\theta_j = \rho^j$, $j = 0,1,2,\dots$
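The correspondence between the autoregressive recursion and the moving average form with $\theta_j = \rho^j$ can be confirmed on a simulated path. The sketch below assumes a process started at zero before the first shock, so that the moving average sum is finite; the helper names, the seed and the Gaussian shocks are all illustrative choices, not part of the text.

```python
import random

def ar1_path(rho, shocks, x_init=0.0):
    """Iterate X_t = rho * X_{t-1} + Z_t from the starting value x_init."""
    x, out = x_init, []
    for z in shocks:
        x = rho * x + z
        out.append(x)
    return out

def ma_path(theta, shocks):
    """X_t = sum_{j=0}^{t} theta_j Z_{t-j}, using only the shocks available at time t."""
    return [sum(theta[j] * shocks[t - j] for j in range(t + 1))
            for t in range(len(shocks))]

rng = random.Random(0)                        # fixed seed, hypothetical shocks
shocks = [rng.gauss(0.0, 1.0) for _ in range(50)]
rho = 0.6
theta = [rho ** j for j in range(len(shocks))]
gap = max(abs(a - b) for a, b in zip(ar1_path(rho, shocks), ma_path(theta, shocks)))
print(gap)  # should be at rounding-error level
```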

14.6 Example Let $\{Z_t\}_0^\infty$ be an independent sequence of Bernoulli r.v.s, with $P(Z_t = 1) = P(Z_t = 0) = \tfrac{1}{2}$. Let $X_0 = Z_0$ and

$X_t = \tfrac{1}{2}X_{t-1} + Z_t = \sum_{j=0}^{t} 2^{-j}Z_{t-j}, \quad t = 1,2,3,\dots$ (14.30)

It is not difficult to see that the term

$\sum_{j=0}^{t} 2^{-j}Z_{t-j} = 2^{-t}\sum_{k=0}^{t} 2^{k}Z_k$ (14.31)

belongs for each $t$ to the set of dyadic rationals $W_t = \{k/2^t,\ k = 0,1,2,\dots,2^{t+1}-1\}$. Each element of $W_t$ corresponds to one of the $2^{t+1}$ possible drawings $\{Z_0,\dots,Z_t\}$, and has equal probability of $2^{-t-1}$. Iff $Z_0 = 0$,

$X_t \in B_t = \{k/2^t,\ k = 0,2,4,\dots,2(2^t - 1)\},$

whereas iff $Z_0 = 1$,

$X_t \in W_t - B_t = \{k/2^t,\ k = 1,3,5,\dots,2^{t+1}-1\}.$

It follows that $\{X_0 = 1\} \cap \{X_t \in B_t\} = \emptyset$, for every finite $t$. But it is clear that $P(X_t \in B_t) = P(X_0 = 0) = \tfrac{1}{2}$. Hence for every finite $m$,

$\alpha_m \ge |P(\{X_0 = 1\} \cap \{X_m \in B_m\}) - P(X_0 = 1)P(X_m \in B_m)| = \tfrac{1}{4},$ (14.32)

which contradicts $\alpha_m \to 0$. □
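Because the sample space in Example 14.6 is finite for each $m$, the lower bound in (14.32) can be verified by exhaustive enumeration of the $2^{m+1}$ equally likely drawings. The function name below is hypothetical; exact rational arithmetic avoids any rounding in the membership test for $B_m$.

```python
from fractions import Fraction
from itertools import product

def dependence_gap(m):
    """|P(X_0 = 1, X_m in B_m) - P(X_0 = 1) P(X_m in B_m)| by enumeration."""
    weight = Fraction(1, 2 ** (m + 1))     # each drawing (Z_0, ..., Z_m) is equally likely
    p_joint = p_a = p_b = Fraction(0)
    for z in product((0, 1), repeat=m + 1):
        x_m = sum(Fraction(z[m - j], 2 ** j) for j in range(m + 1))   # (14.30)
        k = x_m * 2 ** m                   # numerator of X_m written as k / 2^m
        in_b = (k % 2 == 0)                # B_m collects the even numerators
        is_a = (z[0] == 1)                 # X_0 = Z_0 = 1
        p_a += weight * is_a
        p_b += weight * in_b
        p_joint += weight * (is_a and in_b)
    return abs(p_joint - p_a * p_b)

print([dependence_gap(m) for m in range(1, 8)])
```

The gap does not depend on $m$, which is the content of the example: the parity of the numerator of $X_m$ remembers $Z_0$ forever.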

Since the process starts at $t = 0$ in this case it is not stationary, but the example is easily generalized to a wider class of processes, as follows.

14.7 Theorem (Andrews 1984) Let $\{Z_t\}_{-\infty}^{\infty}$ be an independent sequence of Bernoulli r.v.s, taking values 1 and 0 with fixed probabilities $p$ and $1 - p$. If $X_t = \rho X_{t-1} + Z_t$ for $\rho \in (0,\tfrac{1}{2}]$, $\{X_t\}_{-\infty}^{\infty}$ is not strong mixing. □

Note, the condition on $\rho$ is purely to expedite the argument. The theorem surely holds for other values of $\rho$, although this cannot be proved by the present approach.

Proof Write $X_{t+s} = \rho^sX_t + X_{ts}$ where

$X_{ts} = \sum_{j=0}^{s-1} \rho^jZ_{t+s-j}.$ (14.33)

The support of $X_{ts}$ is finite for finite $s$, having at most $2^s$ distinct members. Call this set $W_s$, so that $W_1 = \{0,1\}$, $W_2 = \{0,1,\rho,1+\rho\}$, and so on. In general, $W_{s+1}$ is obtained from $W_s$ by adding $\rho^s$ to each of its elements and forming the union of these elements with those of $W_s$; formally,

$W_{s+1} = W_s \cup \{w + \rho^s : w \in W_s\}, \quad s = 1,2,3,\dots$ (14.34)

For given $s$ denote the distinct elements of $W_s$ by $w_j$, ordered by magnitude with $w_1 < \dots < w_J$, for $J \le 2^s$.

Now suppose that $X_t \in (0,\rho)$, so that $\rho^sX_t \in (0,\rho^{s+1})$. This means that $X_{t+s}$ assumes a value between $w_j$ and $w_j + \rho^{s+1}$ for some $j$. Defining events $A = \{X_t \in (0,\rho)\}$ and $B_s = \{X_{t+s} \in \bigcup_{j=1}^{J}(w_j, w_j + \rho^{s+1})\}$, we have $P(B_s \mid A) = 1$ for any $s$, however large. To see that $P(A) > 0$, consider the case $Z_t = Z_{t-1} = Z_{t-2} = 0$ and $Z_{t-3} = 1$ and note that

$0 < \sum_{j=3}^{\infty}\rho^jZ_{t-j} \le \sum_{j=3}^{\infty}\rho^j = \dfrac{\rho^3}{1-\rho} < \rho$ (14.35)

for $\rho \in (0,\tfrac{1}{2}]$. So, unless $P(B_s) = 1$, strong mixing is contradicted.

The proof is completed by showing that the set $D = \{X_t \in (\rho,1)\}$ has positive probability, and is disjoint with $B_s$. $D$ occurs when $Z_t = 0$ and $Z_{t-1} = 1$, since then, for $\rho \in (0,\tfrac{1}{2}]$,

$\rho \le \sum_{j=1}^{\infty}\rho^jZ_{t-j} \le \sum_{j=1}^{\infty}\rho^j = \dfrac{\rho}{1-\rho} \le 1,$ (14.36)

and hence $P(D) > 0$. Suppose that

$\min_{j>1}(w_j - w_{j-1}) \ge \rho^{s-1}.$ (14.37)

Then, if $D$ occurs,

$w_j + \rho^{s+1} \le w_j + \rho^sX_t < w_j + \rho^{s-1} \le w_{j+1},$ (14.38)

hence, $X_{t+s} = w_j + \rho^sX_t \notin \bigcup_{j=1}^{J}(w_j, w_j + \rho^{s+1})$, or in other words, $B_s \cap D = \emptyset$.

The assertion in (14.37) is certainly true when $s = 1$, so consider the following inductive argument. Suppose the distance between two points in $W_s$ is at least $\rho^{s-1}$. Then by (14.34), the smallest distance between two points of $W_{s+1}$ cannot be less than the smaller of $\rho^s$ and $\rho^{s-1} - \rho^s$. But when $\rho \in (0,\tfrac{1}{2}]$, $\rho^s \le \tfrac{1}{2}\rho^{s-1}$, which implies $\rho^{s-1} - \rho^s \ge \rho^s$. It follows that (14.37) holds for every $s$. ■
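The spacing property (14.37), on which the disjointness of $B_s$ and $D$ rests, can be checked exactly for small $s$ by building the support sets from the recursion (14.34). This is a sketch with hypothetical function names; the value $\rho = 7/10$ is included only to show that the spacing argument, as opposed to the theorem itself, breaks down once $\rho > \tfrac{1}{2}$.

```python
from fractions import Fraction

def support(rho, s):
    """W_s built by the recursion (14.34), starting from W_1 = {0, 1}."""
    w = {Fraction(0), Fraction(1)}
    for k in range(1, s):
        w |= {x + rho ** k for x in w}     # W_{k+1} = W_k union (W_k + rho^k)
    return sorted(w)

def min_gap(points):
    """Smallest distance between neighbouring points of a sorted list."""
    return min(b - a for a, b in zip(points, points[1:]))

for rho in (Fraction(1, 2), Fraction(1, 3), Fraction(7, 10)):
    checks = [min_gap(support(rho, s)) >= rho ** (s - 1) for s in range(1, 9)]
    print(rho, checks)
```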

These results may appear surprising when one thinks of the rate at which $\rho^s$ approaches 0 with $s$; but if so, this is because we are unconsciously thinking about the problem of predicting gross features of the distribution of $X_{t+s}$ from time $t$, things like $P(X_{t+s} \le x \mid A)$ for fixed $x$, for example. The notable feature of the sets $B_s$ is their irrelevance to such concerns, at least for large $s$. What we have shown is that from a practical viewpoint the mixing concept has some undesirable features. The requirement of a decline of dependence is imposed over all events, whereas in practice it might serve our purposes adequately to tolerate certain uninteresting events, such as the $B_s$ defined above, remaining dependent on the initial conditions even at long range.

In the next section we will derive some sufficient conditions for strong mixing, and it turns out that certain smoothness conditions on the marginal distributions of the increments will be enough to rule out this kind of counter-example. But now consider uniform mixing.

14.8 Example Consider an AR(1) process with i.i.d. increments,

$X_t = \rho X_{t-1} + Z_t, \quad 0 < \rho < 1,$

in which the marginal distribution of $Z_t$ has unbounded support. We show that $\{X_t\}$ is not uniform mixing. For $\delta > 0$ choose a positive constant $M$ to satisfy

$P\Big(\sum_{j=0}^{m-1}\rho^jZ_{m-j} < -M\Big) < \delta.$ (14.39)

Then consider the events

$A = \{X_0 \ge \rho^{-m}(L + M)\} \in \mathcal{F}_{-\infty}^{0},$

$B = \{X_m < L\} \in \mathcal{F}_{m}^{+\infty},$

where $L$ is large enough that $P(B) \ge 1 - \delta$. We show $P(A) > 0$ for every $m$. Let $p_K = P(Z_0 < K)$, for any constant $K$. Since $Z_0$ has unbounded support, either $p_K < 1$ for every $K > 0$ or, at worst, this holds after substituting $\{-Z_t\}$ for $\{Z_t\}$ and hence $\{-X_t\}$ for $\{X_t\}$. $p_K < 1$ for all $K$ implies, by stationarity, $P(X_{-1} < 0) = P(X_0 < 0) < 1$. Since $\{X_0 < K\} \subseteq \{Z_0 < K\} \cup (\{Z_0 \ge K\} \cap \{X_{-1} < 0\})$, independence of the $\{Z_t\}$ implies that

$P(X_0 < K) \le p_K + (1 - p_K)P(X_{-1} < 0) < 1.$ (14.40)

So $P(A) > 0$, since $K$ is arbitrary. Since $X_m = \rho^mX_0 + \sum_{j=0}^{m-1}\rho^jZ_{m-j}$, it is clear that

$P(B \mid A) = P\Big(\rho^mX_0 + \sum_{j=0}^{m-1}\rho^jZ_{m-j} < L \;\Big|\; \rho^mX_0 \ge L + M\Big) < \delta$ (14.41)

by (14.39). Hence $\phi_m \ge |P(B \mid A) - P(B)| > 1 - 2\delta$, and since $\delta$ is arbitrary, this means $\phi_m = 1$ for every $m$. □
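For Gaussian shocks the quantities in Example 14.8 have closed forms, so the lower bound on $\phi_m$ can be evaluated directly. The sketch below assumes a stationary Gaussian AR(1) with unit-variance shocks (an assumption made for illustration), in which case $X_0$ and $X_m$ have standard deviation $(1-\rho^2)^{-1/2}$ and the partial sum in (14.39) has standard deviation $((1-\rho^{2m})/(1-\rho^2))^{1/2}$ and is independent of $X_0$. It uses `statistics.NormalDist` from the Python standard library; the function name is hypothetical.

```python
from statistics import NormalDist

def phi_lower_bound(rho, m, delta):
    """Lower bound on phi_m, and the threshold defining A, for a Gaussian AR(1)."""
    std = NormalDist()
    z = std.inv_cdf(1 - delta / 2)                          # P(N(0,1) < -z) = delta/2 < delta
    s_x = (1 / (1 - rho ** 2)) ** 0.5                        # sd of X_0 and of X_m
    s_m = ((1 - rho ** (2 * m)) / (1 - rho ** 2)) ** 0.5     # sd of the sum in (14.39)
    big_m = z * s_m                                          # M satisfying (14.39)
    big_l = z * s_x                                          # L with P(X_m < L) >= 1 - delta
    p_b = std.cdf(big_l / s_x)                               # P(B)
    p_b_given_a_cap = std.cdf(-big_m / s_m)                  # upper bound on P(B | A)
    threshold = (big_l + big_m) / rho ** m                   # A = {X_0 >= threshold}
    return p_b - p_b_given_a_cap, threshold

for m in (1, 5, 20):
    print(m, phi_lower_bound(0.8, m, 0.05))
```

The threshold grows like $\rho^{-m}$, so $A$ becomes an extremely rare event as $m$ increases; the uniform mixing coefficient is nevertheless pinned near 1 because it conditions on $A$ rather than weighting by its probability.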

Processes with Gaussian increments fall into the category covered by this example, and if $\phi$-mixing fails in the first-order AR case it is pretty clear that counter-examples exist for more general MA($\infty$) cases too. The conditions for uniform mixing in linear processes are evidently extremely tough, perhaps too tough for this mixing condition to be very useful. In the applications to be studied in later chapters, most of the results are found to hold in some form for strong mixing processes, but the ability to assert uniform mixing usually allows a relaxation of conditions elsewhere in the problem, so it is still desirable to develop the parallel results for the uniform case.

The strong restrictions needed to ensure processes are mixing, which these examples point to (to be explored further in the next section), threaten to limit the usefulness of the mixing concept. However, technical infringements like the ones demonstrated are often innocuous in practice. Only certain aspects of mixing, encapsulated in the concept of a mixingale, are required for many important limit results to hold. These are shared with so-called near-epoch dependent functions of mixing sequences, which include cases like 14.7. The theory of these dependence concepts is treated in Chapters 16 and 17. While Chapter 15 contains some necessary background material for those chapters, the interested reader might choose to skip ahead at this point to find out how, in essence, the difficulty will be resolved.

14.4 Sufficient Conditions for Strong and Uniform Mixing

The problems in the counter-examples above are with the form of the marginal shock distributions: discrete or unbounded, as the case may be. For strong mixing, a degree of smoothness of the distributions appears necessary in addition to summability conditions on the coefficients of linear processes. Several sufficient conditions have been derived, both for general MA($\infty$) processes and for autoregressive and ARMA processes. The sufficiency result for strong mixing proved below is based on the theorems of Chanda (1974) and Gorodetskii (1977). These conditions are not the weakest possible in all circumstances, but they have the virtues of generality and comparative ease of verification.

14.9 Theorem Let $X_t = \sum_{j=0}^{\infty}\theta_jZ_{t-j}$ define a random sequence $\{X_t\}_{-\infty}^{\infty}$, where, for either $0 < r \le 2$ or $r$ an even positive integer,

(a) $Z_t$ is uniformly $L_r$-bounded, independent, continuous with p.d.f. $f_{Z_t}$, and

$\sup_t \int_{-\infty}^{+\infty} |f_{Z_t}(z + a) - f_{Z_t}(z)|\,dz \le M|a|, \quad M < \infty,$ (14.42)

whenever $|a| \le \delta$, for some $\delta > 0$;

(b) $\sum_{t=0}^{\infty} G_t(r)^{1/(1+r)} < \infty$, where

$G_t(r) = \begin{cases} 2\sum_{j=t}^{\infty}|\theta_j|^r, & r \le 2, \\ 2^{r-1}\Big(\sum_{j=t}^{\infty}\theta_j^2\Big)^{r/2}, & r \ge 2; \end{cases}$ (14.43)

(c) $\theta(x) = \sum_{j=0}^{\infty}\theta_jx^j \ne 0$ for all complex numbers $x$ with $|x| \le 1$.

Then $\{X_t\}$ is strong mixing with $\alpha_m = O\Big(\sum_{t=m+1}^{\infty} G_t(r)^{1/(1+r)}\Big)$.

Before proceeding to the proof, we must discuss the implications of these three conditions in a bit more detail. Condition 14.9(a) may be relaxed somewhat, as we show below, but we begin with this case for simplicity. The following lemma extends the condition to the joint distributions under independence.

14.10 Lemma Inequality (14.42) implies that for $|a_t| \le \delta$, $t = 1,\dots,k$,

$\int_{\mathbb{R}^k}\Big|\prod_{t=1}^{k}f_{Z_t}(z_t + a_t) - \prod_{t=1}^{k}f_{Z_t}(z_t)\Big|\,dz_1\cdots dz_k \le M\sum_{t=1}^{k}|a_t|.$ (14.44)

Proof Using Fubini's theorem,

$\int_{\mathbb{R}^k}\Big|\prod_{t=1}^{k}f_{Z_t}(z_t + a_t) - \prod_{t=1}^{k}f_{Z_t}(z_t)\Big|\,dz_1\cdots dz_k$

$\le \int_{\mathbb{R}^k}|f_{Z_1}(z_1 + a_1) - f_{Z_1}(z_1)|\prod_{t=2}^{k}f_{Z_t}(z_t + a_t)\,dz_1\cdots dz_k + \int_{\mathbb{R}^k}f_{Z_1}(z_1)\Big|\prod_{t=2}^{k}f_{Z_t}(z_t + a_t) - \prod_{t=2}^{k}f_{Z_t}(z_t)\Big|\,dz_1\cdots dz_k$

$\le M|a_1| + \int_{\mathbb{R}^{k-1}}\Big|\prod_{t=2}^{k}f_{Z_t}(z_t + a_t) - \prod_{t=2}^{k}f_{Z_t}(z_t)\Big|\,dz_2\cdots dz_k.$

The lemma follows on applying the same inequality to the second term on the right, iteratively for $t = 2,\dots,k$. ■

Condition 14.9(b) is satisfied when $|\theta_j| \ll j^{-\mu}$ for $\mu > 1 + 2/r$ when $r \le 2$ and $\mu > 3/2 + 1/r$ when $r \ge 2$. The double definition of $G_t(r)$ is motivated by the fact that for cases with $r \le 2$ we use the von Bahr-Esseen inequality (11.15) to bound a certain sequence in the proof, whereas with $r > 2$ we rely on Lemma 14.11 below. Since the latter result requires $r$ to be an even integer, the conditions in the theorem are to be applied in practice by taking $r$ as the nearest even integer below the highest existing absolute moment. Gorodetskii (1977) achieves a further weakening of these summability conditions for $r > 2$ by the use of an inequality due to Nagaev and Fuk (1971). We will forgo this extension, both because proof of the Nagaev-Fuk inequalities represents a rather complicated detour, and because the present version of the theorem permits a generalization (Corollary 14.13) which would otherwise be awkward to implement.
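For geometrically declining coefficients the tail sums in (14.43) have closed forms, which makes condition 14.9(b) and the implied mixing rate easy to evaluate. The sketch below assumes $\theta_j = \rho^j$ with $0 < \rho < 1$ (the AR(1) case, taken as an illustration) and uses $\sum_{j \ge t}\rho^{jr} = \rho^{tr}/(1-\rho^r)$; the function names and the truncation length are hypothetical, and the result is a bound on $\alpha_m$ only up to the unspecified constant in the $O(\cdot)$.

```python
def g_t(rho, t, r):
    """G_t(r) of (14.43) for theta_j = rho**j, via closed-form geometric tail sums."""
    if r <= 2:
        return 2 * rho ** (t * r) / (1 - rho ** r)
    return 2 ** (r - 1) * (rho ** (2 * t) / (1 - rho ** 2)) ** (r / 2)

def mixing_rate_bound(rho, m, r, terms=500):
    """Truncated sum over t = m+1, ..., m+terms of G_t(r)**(1/(1+r))."""
    return sum(g_t(rho, t, r) ** (1 / (1 + r)) for t in range(m + 1, m + 1 + terms))

# Hypothetical case rho = 0.5, r = 2: successive bounds shrink by rho**(r/(1+r)).
b = [mixing_rate_bound(0.5, m, 2) for m in range(0, 6)]
print(b)
print([b[i + 1] / b[i] for i in range(5)], 0.5 ** (2 / 3))
```

The geometric decline of the bound reflects the exponential decay of the coefficients; with only polynomial decay $|\theta_j| \ll j^{-\mu}$ the same sums decline polynomially, which is where the thresholds on $\mu$ come from.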

Define $W_t = \sum_{j=0}^{t-1}\theta_jZ_{t-j}$ and $V_t = \sum_{j=t}^{\infty}\theta_jZ_{t-j}$, so that $X_t = W_t + V_t$, and $W_t$ and $V_t$ are independent. Think of $V_t$ as the $\mathcal{F}_{-\infty}^{0}$-measurable 'tail' of $X_t$, whose contribution to the sum should become negligible as $t \to \infty$.

14.11 Lemma If the sequence $\{Z_s\}$ is independent with zero mean, then

$E(V_t^{2m}) \le 2^{2m-1}\Big(\sum_{j=t}^{\infty}\theta_j^2\Big)^m \sup_{s \le 0} E(Z_s^{2m})$ (14.45)

for each positive integer $m$ such that $\sup_{s \le 0}E(Z_s^{2m}) < \infty$.

Proof First consider the case where the r.v.s $Z_{t-j}$ are symmetrically distributed, meaning that $-Z_{t-j}$ and $Z_{t-j}$ have the same distributions. In this case all existing odd-order integer moments about 0 are zero, and

$E\Big(\sum_{j=t}^{t+k}\theta_jZ_{t-j}\Big)^{2m} = \sum_{j_1=t}^{t+k}\cdots\sum_{j_{2m}=t}^{t+k}\theta_{j_1}\cdots\theta_{j_{2m}}E(Z_{t-j_1}\cdots Z_{t-j_{2m}})$

$= \sum_{j_1=t}^{t+k}\cdots\sum_{j_m=t}^{t+k}\theta_{j_1}^2\cdots\theta_{j_m}^2E(Z_{t-j_1}^2\cdots Z_{t-j_m}^2)$

$\le \Big(\sum_{j=t}^{t+k}\theta_j^2\Big)^m \sup_{s \le 0}E(Z_s^{2m}).$ (14.46)

The second equality holds since $E(Z_{t-j_1}\cdots Z_{t-j_{2m}})$ vanishes unless the factors form matching pairs, and the inequality follows since, for any r.v. $Y$ possessing the requisite moments, $E(Y^{j+k}) \ge E(Y^j)E(Y^k)$ (i.e., $\operatorname{Cov}(Y^j,Y^k) \ge 0$) for $j,k > 0$. The result for symmetrically distributed $Z_s$ follows on letting $k \to \infty$.

For general $Z_s$, let $Z_s'$ be distributed identically as, and independent of, $Z_s$, for each $s \le 0$. Then $V_t' = \sum_{j=t}^{\infty}\theta_jZ_{t-j}'$ is independent of $V_t$, and $V_t - V_t'$ has symmetrically distributed independent increments $Z_{t-j} - Z_{t-j}'$. Hence

$E(V_t^{2m}) \le E(V_t - V_t')^{2m} \le \Big(\sum_{j=t}^{\infty}\theta_j^2\Big)^m \sup_{s \le 0}E(Z_s - Z_s')^{2m} \le 2^{2m-1}\Big(\sum_{j=t}^{\infty}\theta_j^2\Big)^m \sup_{s \le 0}E(Z_s^{2m}),$ (14.47)

where the first inequality is by 10.19, the second by (14.46), and the third is the $c_r$ inequality. ■

Lastly, consider condition 14.9(c). This is designed to pin down the properties of the inverse transformation, taking us from the coordinates of $\{X_t\}$ to those of $\{Z_t\}$. It ensures that the function of a complex variable $\theta(x)$ possesses an analytic inverse $\tau(x) = \sum_{j=0}^{\infty}\tau_j x^j$ for $|x| \le 1$. The particular property needed and implied by the condition is that the coefficient sequence $\{\tau_j\}$ is absolutely summable. If $X_t = \sum_{j=0}^{\infty}\theta_j Z_{t-j}$, under 14.9(c) the inverse representation is also defined, as $Z_t = \sum_{j=0}^{\infty}\tau_j X_{t-j}$. Note that $\tau_0 = 1$ if $\theta_0 = 1$. An effect of 14.9(c) is to rule out 'over-differenced' cases, as for example where $\theta(x) = \theta_1(x)(1 - x)$ with $\theta_1(\cdot)$ a summable polynomial. The differencing transformation does not yield a mixing process in general, the exception being where it reverses the previous integration of a mixing process.

For a finite number of terms the transformation is conveniently expressed using matrix notation. Let

$$A_n = \begin{bmatrix}
1 & 0 & \cdots & \cdots & 0 \\
\theta_1 & 1 & \ddots & & \vdots \\
\vdots & \theta_1 & \ddots & \ddots & \vdots \\
\theta_{n-2} & & \ddots & 1 & 0 \\
\theta_{n-1} & \theta_{n-2} & \cdots & \theta_1 & 1
\end{bmatrix} \quad (n \times n), \tag{14.48}$$

so that the equations $x_t = \sum_{j=0}^{t-1}\theta_j z_{t-j}$, $t = 1,\ldots,n$, can be written $x = A_n z$ where $x = (x_1,\ldots,x_n)'$ and $z = (z_1,\ldots,z_n)'$.

$A_n^{-1}$ is also lower triangular, with elements $\tau_j$ replacing $\theta_j$ for $j = 0,\ldots,n-1$. If $v = (v_1,\ldots,v_n)'$ the vector $\nu = A_n^{-1}v$ has elements $\sum_{j=0}^{t-1}\tau_j v_{t-j}$ for $t = 1,\ldots,n$. These operations can in principle be taken to the limit as $n \to \infty$, subject to 14.9(c).
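The relation between the matrix of (14.48) and the coefficients of $1/\theta(x)$ can be checked on a small case. The following is a minimal sketch, not part of the original text: it assumes $\theta_0 = 1$, takes an MA(1) polynomial $\theta(x) = 1 + x/2$ purely as an illustration, and uses hypothetical helper names (`inverse_coefficients`, `lower_toeplitz`, `matmul`).

```python
from fractions import Fraction

def inverse_coefficients(theta, n):
    """First n coefficients tau_j of 1/theta(x), assuming theta[0] == 1."""
    tau = [Fraction(1)]
    for j in range(1, n):
        # The coefficient of x^j in theta(x) * tau(x) must vanish for j >= 1.
        acc = sum((theta[i] * tau[j - i]
                   for i in range(1, j + 1) if i < len(theta)), Fraction(0))
        tau.append(-acc)
    return tau

def lower_toeplitz(coeffs, n):
    """n x n lower triangular matrix with coeffs[i - j] in position (i, j)."""
    return [[coeffs[i - j] if 0 <= i - j < len(coeffs) else Fraction(0)
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 5
theta = [Fraction(1), Fraction(1, 2)]   # illustrative MA(1): theta(x) = 1 + x/2
tau = inverse_coefficients(theta, n)    # 1, -1/2, 1/4, -1/8, 1/16
A = lower_toeplitz(theta, n)            # the matrix A_n of (14.48)
A_inv = lower_toeplitz(tau, n)          # candidate inverse built from the tau_j
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
product = matmul(A, A_inv)
print(product == identity)
```

Exact rational arithmetic is used so that the comparison with the identity matrix is not affected by rounding.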

Proof of 14.9 Without loss of generality, the object is to show that the σ-fields $\mathcal{F}_{-\infty}^{0} = \sigma(\ldots,X_{-1},X_0)$ and $\mathcal{F}_{m+1}^{\infty} = \sigma(X_{m+1},X_{m+2},\ldots)$ are independent as $m \to \infty$. The result does not depend on the choice of origin for the indices. This is shown for a sequence $\{X_t\}_{t=1-p}^{m+k}$ for finite $p$ and $k$, and since $k$ and $p$ are arbitrary, it then follows by the consistency theorem (12.4) that there exists a sequence $\{X_t\}_{-\infty}^{\infty}$ whose finite-dimensional distributions possess the property for every $k$ and $p$. This sequence is strong mixing on the definition.

Define a $p + m + k$-vector $X = (X_0', X_1', X_2')'$ where $X_0 = (X_{1-p},\ldots,X_0)'$ $(p \times 1)$, $X_1 = (X_1,\ldots,X_m)'$ $(m \times 1)$, and $X_2 = (X_{m+1},\ldots,X_{m+k})'$ $(k \times 1)$, and also vectors $W = (W_1', W_2')'$ and $V = (V_1', V_2')'$ such that $X_1 = W_1 + V_1$ and $X_2 = W_2 + V_2$. (The elements of $W$ and $V$ are defined above 14.11.) The vectors $X_0$ and $V$ are independent of $W$. Now, use the notation $\mathcal{F}_s^t = \sigma(X_s,\ldots,X_t)$ and define the following sets:

$G = \{\omega: X_0(\omega) \in C\} \in \mathcal{F}_{1-p}^{0}$, for some $C \in \mathcal{B}^p$,

$H = \{\omega: X_2(\omega) \in D\} \in \mathcal{F}_{m+1}^{m+k}$, for some $D \in \mathcal{B}^k$,

$E = \{\omega: V_2(\omega) \in B\} \in \mathcal{F}_{-\infty}^{0}$,

where $B = \{v_2: |v_2| \le \eta\} \in \mathcal{B}^k$, $|v_2|$ denotes the vector whose elements are the absolute values of $v_2$, and $\eta = (\eta_{m+1},\ldots,\eta_{m+k})'$ is a vector of positive constants. Also define

$D - v_2 = \{w_2: w_2 + v_2 \in D\} \in \mathcal{B}^k$.

$H$ may be thought of as the random event that has occurred when, first, $V_2 = v_2$ is realized, and then $W_2 \in D - v_2$. By independence, the joint c.d.f. of the variables $(W_2, V_2, X_0)$ factorizes as $F = F_{W_2}F_{V_2,X_0}$ (say) and we can write

$$P(H) = P(X_2 \in D) = \int\!\!\int\!\!\int_{\{w_2 + v_2 \in D\}} dF(w_2, v_2, x_0) = \int\!\!\int \pi(v_2)\,dF_{V_2,X_0}(v_2, x_0), \tag{14.49}$$

where

$$\pi(v_2) = P(W_2 \in D - v_2) = \int_{D - v_2} dF_{W_2}(w_2). \tag{14.50}$$

These definitions set the scene for the main business of the proof, which is to show that events $G$ and $H$ are tending to independence as $m$ becomes large. Given $\mathcal{F}_{1-p}^{m+k}/\mathcal{B}^{p+m+k}$-measurability of $X$, this is sufficient for the result, since $C$ and $D$ are arbitrary. By the same reasoning that gave (14.49), we have

$$P(G \cap H \cap E) = \int_C\int_B \pi(v_2)\,dF_{V_2,X_0}(v_2, x_0). \tag{14.51}$$

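Define $\pi^* = \sup_{v_2 \in B}\pi(v_2)$ and $\pi_* = \inf_{v_2 \in B}\pi(v_2)$, and (14.51) implies

$$\pi_* P(G \cap E) \le P(G \cap H \cap E) \le \pi^* P(G \cap E). \tag{14.52}$$

Hence we have the bounds

$$P(G \cap H) = P(G \cap H \cap E) + P(G \cap H \cap E^c) \le \pi^* P(G) + P(E^c), \tag{14.53}$$

and similarly, since $\pi_* \le 1$,

$$\begin{aligned}
P(G \cap H) &\ge \pi_* P(G \cap E) + P(G \cap H \cap E^c) \\
&= \pi_* P(G) - \pi_* P(G \cap E^c) + P(G \cap H \cap E^c) \\
&\ge \pi_* P(G) - P(E^c).
\end{aligned} \tag{14.54}$$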

Choosing $G = \Omega$ (i.e., $C = \mathbb{R}^p$) in (14.53) and (14.54) gives in particular

$$\pi_* - P(E^c) \le P(H) \le \pi^* + P(E^c), \tag{14.55}$$

and combining all these inequalities yields

$$|P(G \cap H) - P(G)P(H)| \le \pi^* - \pi_* + 2P(E^c). \tag{14.56}$$

Write $W = A_{m+k}Z$, where $Z = (Z_1,\ldots,Z_{m+k})'$ and $A_{m+k}$ is defined by (14.48). Since $|A_{m+k}| = 1$ and the $\{Z_1,\ldots,Z_{m+k}\}$ are independent, the change of variable formula from 8.18 yields the result that $W$ is continuously distributed with

$$f_W(w) = f_Z(z) = \prod_{t=1}^{m+k} f_{Z_t}(z_t). \tag{14.57}$$

Define $B' = \{v: v_1 = 0,\ v_2 \in B\} \in \mathcal{B}^{m+k}$. Then the following relations hold:

$$\begin{aligned}
\pi^* - \pi_* &\le 2\sup_{v_2 \in B}|\pi(v_2) - \pi(0)| \\
&\le 2\sup_{v_2 \in B}\int_D |f_{W_2}(w_2 + v_2) - f_{W_2}(w_2)|\,dw_2 \\
&\le 2\sup_{v \in B'}\int_{\mathbb{R}^{m+k}} |f_W(w + v) - f_W(w)|\,dw \\
&= 2\sup_{v \in B'}\int_{\mathbb{R}^{m+k}} \Big|\prod_{t=1}^{m+k} f_{Z_t}(z_t + \nu_t) - \prod_{t=1}^{m+k} f_{Z_t}(z_t)\Big|\,dz \\
&\le 2M\sup_{v \in B'}\sum_{t=m+1}^{m+k}|\nu_t|,
\end{aligned} \tag{14.58}$$

where it is understood in the final inequality (which is by 14.10) that $|\nu_t| \le \delta$, where $\delta$ is defined in condition 14.9(a). The equality substitutes $\nu = A_{m+k}^{-1}v$, and uses the fact that $\nu_1 = 0$ if $v_1 = 0$ by lower triangularity of $A_{m+k}$. For $v \in B'$, note that

$$\sum_{t=m+1}^{m+k}|\nu_t| = \sum_{t=m+1}^{m+k}\Big|\sum_{j=0}^{t-m-1}\tau_j v_{t-j}\Big| \le \sum_{t=m+1}^{m+k}\Big(\sum_{j=0}^{t-m-1}|\tau_j|\eta_{t-j}\Big) \le \Big(\sum_{j=0}^{\infty}|\tau_j|\Big)\sum_{t=m+1}^{m+k}\eta_t, \tag{14.59}$$

assuming $\eta$ has been chosen with elements small enough that the terms in parentheses in the penultimate member do not exceed $\delta$. This is possible by condition 14.9(c).

For the final step, choose $r$ to be the largest order of absolute moment if this does not exceed 2, and the largest even integer moment, otherwise. Then

$$P(E^c) = P(|V_2| > \eta) = P\Big(\bigcup_{t=m+1}^{m+k}\{|V_t| > \eta_t\}\Big) \le \sum_{t=m+1}^{m+k}P(|V_t| > \eta_t) \le \sum_{t=m+1}^{m+k}E|V_t|^r\eta_t^{-r}, \tag{14.60}$$

by the Markov inequality, and

$$E|V_t|^r \le \sup_s E|Z_s|^r\,G_t(r), \tag{14.61}$$

where $G_t$ is given by (14.43), applying 11.15 for $r \le 2$ (see (11.65) for the required extension) and Lemma 14.11 for $r > 2$. Substituting inequalities (14.58), (14.59), (14.60), and (14.61) into (14.56) yields

$$|P(G \cap H) - P(G)P(H)| \ll \sum_{t=m+1}^{m+k}\big(\eta_t + G_t(r)\eta_t^{-r}\big). \tag{14.62}$$

Since $G_t \downarrow 0$ by 14.9(b), it is possible to choose $m$ large enough that (14.59) and hence (14.62) hold with $\eta_t = G_t(r)^{1/(1+r)} = G_t(r)\eta_t^{-r}$ for each $t > m$. We obtain

$$|P(G \cap H) - P(G)P(H)| \ll \sum_{t=m+1}^{m+k}G_t(r)^{1/(1+r)} \le \sum_{t=m+1}^{\infty}G_t(r)^{1/(1+r)}, \tag{14.63}$$

where the right-hand sum is finite by 14.9(b), and goes to zero as $m \to \infty$. This completes the proof. ∎
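The choice of $\eta_t$ in the last step equates the two terms of the summand in (14.62). A minimal numerical sketch of that balancing follows; the values of $G_t(r)$ are hypothetical placeholders rather than values computed from (14.43), and the optional `beta` argument anticipates the summand of (14.72), where $\eta_t$ is raised to a power $\beta$.

```python
import math

def balanced_eta(G, r, beta=1.0):
    """Choice of eta that equates eta**beta and G * eta**(-r)."""
    return G ** (1.0 / (beta + r))

def bound_term(eta, G, r, beta=1.0):
    """Summand eta**beta + G * eta**(-r), as in (14.62) when beta = 1."""
    return eta ** beta + G * eta ** (-r)

G_values = [0.5, 0.1, 0.01, 1e-4]   # hypothetical, decreasing values of G_t(r)
r = 2.0
for G in G_values:
    eta = balanced_eta(G, r)
    # With the balanced choice the summand equals 2 * G**(1 / (1 + r)).
    print(G, eta, bound_term(eta, G, r), 2 * G ** (1 / (1 + r)))
```

With this choice both terms are of the same order, which is why the mixing size is governed by $G_t(r)^{1/(1+r)}$.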

It is worth examining this argument with care to see how violation of the conditions can lead to trouble. According to (14.56), mixing will follow from two conditions: the obvious one is that the tail component $V_2$, the $\mathcal{F}_{-\infty}^{0}$-measurable part of $X_2$, becomes negligible; that is, $P(E)$ gets close to 1 when $m$ is large, even when $\eta$ is allowed to approach 0. But in addition, to have $\pi^* - \pi_*$ disappear, $P(W_2 \in D - v_2)$ must approach a unique limit as $v_2 \to 0$, for any $D$, and whatever the path of convergence. When the distribution has atoms, it is easy to devise examples where this requirement fails. In 14.6, the set $B_t$ becomes $W_t - B_t$ on being translated a distance of $2^{-t}$. For such a case these probabilities evidently do not converge, in the limiting case as $t \to \infty$.

However, this is a sufficiency result, and it remains unclear just how much more than the absence of atoms is strictly necessary. Consider an example where the distribution is continuous, having differentiable p.d.f., but condition (14.42) none the less fails.

14.12 Example Let $f(z) = C_0 z^{-2}\sin^2(z^4)$, $z \in \mathbb{R}$. This is non-negative, continuous everywhere, and bounded by $C_0 z^{-2}$ and hence integrable. By choice of $C_0$ we can have $\int_{-\infty}^{+\infty} f(z)\,dz = 1$, so $f$ is a p.d.f. By the mean value theorem,

$$|f(z + a) - f(z)| = |a|\,|f'(z + \alpha(z)a)|, \quad \alpha(z) \in [0,1], \tag{14.64}$$

where $f'(z) = 8C_0 z\sin(z^4)\cos(z^4) - 2C_0\sin^2(z^4)z^{-3}$. But note that $\int_{-\infty}^{+\infty}|f'(z)|\,dz = \infty$ and hence,

$$\frac{1}{|a|}\int_{-\infty}^{+\infty}|f(z + a) - f(z)|\,dz \to \infty \text{ as } |a| \to 0, \tag{14.65}$$

which contradicts (14.42). The problem is that the density is varying too rapidly in the tails of the distribution, and $|f(z + a) - f(z)|$ does not diminish rapidly enough in these regions as $a \to 0$.
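The derivative formula used in Example 14.12 can be checked against a finite-difference approximation. This is only a sanity-check sketch: the normalising constant is set to 1 (an assumption that does not affect the comparison), and the evaluation points and step size are arbitrary choices.

```python
import math

C0 = 1.0  # normalising constant left at 1; it only rescales both sides

def f(z):
    """Density of Example 14.12 up to the constant C0."""
    return C0 * math.sin(z ** 4) ** 2 / z ** 2

def f_prime(z):
    """Closed-form derivative quoted in the example."""
    return (8 * C0 * z * math.sin(z ** 4) * math.cos(z ** 4)
            - 2 * C0 * math.sin(z ** 4) ** 2 / z ** 3)

def central_difference(func, z, h=1e-6):
    """Symmetric finite-difference approximation to the derivative."""
    return (func(z + h) - func(z - h)) / (2 * h)

for z in (0.7, 1.3, 2.1):
    print(z, f_prime(z), central_difference(f, z))
```

The leading term of the derivative has amplitude growing like $8C_0|z|$, which is the source of the non-integrability of $|f'|$ noted in the example.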

The rate of divergence in (14.65) can be estimated.¹⁵ For fixed (small) $a$, $|f(z + a) - f(z)|$ is at a local maximum at points at which $\sin^2((z + a)^4) = 1$ (or 0) and $\sin^2(z^4) = 0$ (or 1), or in other words where $(z + a)^4 - z^4 = 4az^3 + O(a^2) = \pm\pi/2$. The solutions to these approximate relations can be written as $z = \pm C_1|a|^{-1/3}$ for $C_1 > 0$. At these points we can write, again approximately (orders of magnitude are all we need here),

$$|f(z + a) - f(z)| \le |f(z)| \le C_0C_1^{-2}|a|^{2/3}.$$

The integral is bounded within the interval $[-C_1|a|^{-1/3},\ C_1|a|^{-1/3}]$ by $4C_0C_1^{-1}|a|^{1/3}$, the area of the rectangle having height $2C_0C_1^{-2}|a|^{2/3}$. Outside the interval, $f$ is bounded by $C_0z^{-2}$, and the integral over this region is bounded by

$$2C_0\int_{C_1|a|^{-1/3}}^{+\infty}z^{-2}\,dz = 2C_0C_1^{-1}|a|^{1/3}.$$

Adding up the approximations yields

$$\int_{-\infty}^{+\infty}|f(z + a) - f(z)|\,dz \le M|a|^{1/3} \tag{14.66}$$

for $M < \infty$. □

The rate of divergence is critical for relaxing the conditions. Suppose instead of (14.42) that

$$\int_{-\infty}^{+\infty}|f(z + a) - f(z)|\,dz \le Mh(|a|), \quad |a| \le \delta, \tag{14.67}$$

could be shown sufficient, where $h(\cdot)$ is an arbitrary increasing function with $h(|a|) \downarrow 0$ as $|a| \downarrow 0$. Since

$$\int_{-\infty}^{+\infty}|f(z + a) - f(z)|\,dz \le 2\int_{-\infty}^{+\infty}f(z)\,dz \tag{14.68}$$

for any $a$, (14.67) effectively holds for any p.d.f., by the dominated convergence theorem.¹⁶ Simple continuity of the distributions would suffice.

This particular result does not seem to be available, but it is possible to relax 14.9(a) substantially, at the cost of an additional restriction on the moving average coefficients.

14.13 Corollary Modify the conditions of 14.9 as follows: for $0 < \beta \le 1$, assume that

(a') $Z_t$ is uniformly $L_r$-bounded, independent, and continuously distributed with p.d.f. $f_{Z_t}$, and

$$\sup_t\int_{-\infty}^{+\infty}|f_{Z_t}(z + a) - f_{Z_t}(z)|\,dz \le M|a|^{\beta}, \quad M < \infty, \tag{14.69}$$

whenever $|a| \le \delta$, for some $\delta > 0$;

(b') $\sum_{t=0}^{\infty}G_t(r)^{\beta/(\beta+r)} < \infty$, where $G_t(r)$ is defined in (14.43);

(c') $1/\theta(x) = \tau(x) = \sum_{j=0}^{\infty}\tau_jx^j$ for $|x| \le 1$, and $\sum_{j=0}^{\infty}|\tau_j|^{\beta} < \infty$.

Then $X_t$ is strong mixing with $\alpha_m = O\big(\sum_{t=m+1}^{\infty}G_t(r)^{\beta/(\beta+r)}\big)$.

Proof This follows the proof of 14.9 until (14.58), which becomes

$$\pi^* - \pi_* \le 2M\sup_{v \in B'}\sum_{t=m+1}^{m+k}|\nu_t|^{\beta}, \tag{14.70}$$

applying the obvious extension of Lemma 14.10. Note that

$$\sum_{t=m+1}^{m+k}|\nu_t|^{\beta} = \sum_{t=m+1}^{m+k}\Big|\sum_{j=0}^{t-m-1}\tau_jv_{t-j}\Big|^{\beta} \le \Big(\sum_{j=0}^{\infty}|\tau_j|^{\beta}\Big)\sum_{t=m+1}^{m+k}\eta_t^{\beta}, \tag{14.71}$$

using (9.63), since $0 < \beta \le 1$. Applying assumption 14.13(c'),

$$|P(G \cap H) - P(G)P(H)| \ll \sum_{t=m+1}^{m+k}\big(\eta_t^{\beta} + G_t(r)\eta_t^{-r}\big), \tag{14.72}$$

and the result is obtained as before, but in this case setting $\eta_t = G_t(r)^{1/(\beta+r)}$. ∎

Condition 14.13(b') is satisfied when $|\theta_j| \ll j^{-\mu}$ for $\mu > 1/\beta + 2/r$ when $r \le 2$, and $\mu > 1/2 + 1/r + 1/\beta$ when $r \ge 2$, which shows how the summability restrictions have to be strengthened when $\beta$ is close to 0. This is none the less a useful extension because there are important cases where 14.13(b') and 14.13(c') are easily satisfied. In particular, if the process is finite-order ARMA, both $|\theta_j|$ and $|\tau_j|$ either decline geometrically or vanish beyond some finite $j$, and (b') and (c') both hold.
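The decay thresholds just quoted are easy to tabulate. The sketch below simply encodes the two cases stated in the text; the function name `min_decay_exponent` is a hypothetical label, and setting `beta=1` recovers the thresholds given for condition 14.9(b).

```python
from fractions import Fraction

def min_decay_exponent(r, beta=1):
    """Infimum of mu for which |theta_j| << j**(-mu) satisfies 14.13(b')."""
    r, beta = Fraction(r), Fraction(beta)
    if r <= 2:
        return 1 / beta + 2 / r                  # case r <= 2
    return Fraction(1, 2) + 1 / r + 1 / beta     # case r >= 2

# The two branches agree at r = 2, and the threshold rises as beta falls.
for r in (1, 2, 4):
    for beta in (1, Fraction(1, 2), Fraction(1, 3)):
        print(r, beta, min_decay_exponent(r, beta))
```

For example, with $r = 4$ the threshold is $7/4$ when $\beta = 1$ and $11/4$ when $\beta = 1/2$.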

Condition 14.13(a') is a strengthening of continuity since there exist functions $h(\cdot)$ which are slowly varying at 0, that is, which approach 0 more slowly than any positive power of the argument. Look again at 14.12, and note that setting $\beta = 1/3$ will satisfy condition 14.13(a') according to (14.65). It is easy to generalize the example. Putting $f(z) = C\sin^2(z^k)z^{-2}$ for $k \ge 4$, the earlier argument is easily modified to show that the integral converges at the rate $|a|^{1/(k-1)}$, and this choice of $\beta$ is appropriate. But for $f(z) = C\sin^2(e^z)z^{-2}$ the integral converges more slowly than $|a|^{\beta}$ for all $\beta > 0$, and condition 14.13(a') fails.

To conclude this chapter, we look at the case of uniform mixing. Manipulating inequalities (14.52)-(14.55) yields

$$|P(H \mid G) - P(H)| \le \pi^* - \pi_* + P(E^c)\Big(1 + \frac{1}{P(G)}\Big), \tag{14.73}$$

which shows that uniform mixing can fail unless $P(E) = 1$ for all $m$ exceeding a finite value. Otherwise, we can always construct a sequence of events $G$ whose probability is positive but approaching 0 no slower than $P(E^c)$. When the support of $(X_{1-p},\ldots,X_0)$ is unbounded this kind of thing can occur, as illustrated by 14.8. The essence of this example does not depend on the AR(1) model, and similar cases could be constructed in the general MA(∞) framework. Sufficient conditions must include a.s. boundedness of the distributions, and the summability conditions are also modified. We will adapt the extended version of the strong mixing condition in 14.13, although it is easy to deduce the relationship between these conditions and 14.9 by setting $\beta = 1$ below.

14.14 Theorem Modify the conditions of 14.13 as follows. Let (a') and (c') hold as before, but replace (b') by

(b'') $\sum_{t=0}^{\infty}\big(\sum_{j=t}^{\infty}|\theta_j|\big)^{\beta} < \infty$,

and add

(d) $\{Z_t\}$ is uniformly bounded a.s.

Then $\{X_t\}$ is uniform mixing with $\phi_m = O\big(\sum_{t=m+1}^{\infty}\big(\sum_{j=t}^{\infty}|\theta_j|\big)^{\beta}\big)$.

Proof Follow the proof of 14.9 up to (14.55), but replace (14.56) by (14.73). By condition 14.14(d), there exists $K < \infty$ such that $\sup_t|Z_t| < K$ a.s., and hence $|V_t| < K\sum_{j=t}^{\infty}|\theta_j|$ a.s. It further follows, recalling the definition of $E$, that $P(E) = 1$ when $\eta_t = K\sum_{j=t}^{\infty}|\theta_j|$ for $t = m+1,\ldots,m+k$. Substituting directly into (14.73) from (14.70) and (14.71), and making this choice of $\eta_t$, gives (for any $G$ with $P(G) > 0$)

$$|P(H \mid G) - P(H)| \ll \sum_{t=m+1}^{m+k}\Big(\sum_{j=t}^{\infty}|\theta_j|\Big)^{\beta}. \tag{14.74}$$

The result now follows by the same considerations as before. ∎

These summability conditions are tougher than in 14.13. Letting $r \to \infty$ in the latter case for comparability, 14.13(b') is satisfied when $|\theta_j| = O(j^{-\mu})$ for $\mu > 1/2 + 1/\beta$, while the corresponding implication of 14.14(b'') is $\mu > 1 + 1/\beta$.

15 Martingales

15.1 Sequential Conditioning

It is trivial to observe that the arrow of time is unidirectional. Even though we can study a sample realization ex post, we know that, when a random sequence is generated, the 'current' member $X_t$ is determined in an environment in which the previous members, $X_{t-k}$ for $k > 0$, are given and conditionally fixed, whereas the members following remain contingent. The past is known, but the future is unknown. The operation of conditioning sequentially on past events is therefore of central importance in time-series modelling. We characterize partial knowledge by specifying a σ-subfield of events from $\mathcal{F}$, for which it is known whether each of the events belonging to it has occurred or not. The accumulation of information by an observer as time passes is represented by an increasing sequence of σ-fields, $\{\mathcal{F}_t\}_{-\infty}^{\infty}$, such that $\ldots \subseteq \mathcal{F}_{-1} \subseteq \mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}$.

If $X_t$ is a random variable that is $\mathcal{F}_t$-measurable for each $t$, $\{\mathcal{F}_t\}_{-\infty}^{\infty}$ is said to be adapted to the sequence $\{X_t\}_{-\infty}^{\infty}$. The pairs $\{X_t, \mathcal{F}_t\}_{-\infty}^{\infty}$ are called an adapted sequence. Setting $\mathcal{F}_t = \sigma(X_s, -\infty < s \le t)$ defines the minimal adapted sequence, but $\mathcal{F}_t$ typically has the interpretation of an observer's information set, and can contain more information than the history of a single variable. When $X_t$ is integrable, the conditional expectations $E(X_t \mid \mathcal{F}_{t-1})$ are defined, and can be thought of as the optimal predictors of $X_t$ from the point of view of observers looking one period ahead (compare 10.12).

Consider an adapted sequence $\{S_n, \mathcal{F}_n\}_{-\infty}^{\infty}$ on a probability space $(\Omega, \mathcal{F}, P)$, where $\{\mathcal{F}_n\}$ is an increasing sequence. If the properties

$$E|S_n| < \infty, \tag{15.1}$$

$$E(S_n \mid \mathcal{F}_{n-1}) = S_{n-1}, \text{ a.s.}, \tag{15.2}$$

hold for every $n$, the sequence is called a martingale. In old-fashioned gambling parlance, a martingale was a policy of attempting to recoup a loss by doubling one's stake on the next bet, but the modern usage of the term in probability theory is closer to describing a gambler's worth in the course of a sequence of fair bets. In view of (10.18), an alternative version of condition (15.2) is

$$\int_A S_n\,dP = \int_A S_{n-1}\,dP, \text{ each } A \in \mathcal{F}_{n-1}. \tag{15.3}$$

Sometimes the sequence has a finite initial index, and may be written $\{S_n, \mathcal{F}_n\}_1^{\infty}$ where $S_1$ is an arbitrary integrable r.v.

15.1 Example Let $\{X_t\}_1^{\infty}$ be an i.i.d. integrable sequence with zero mean. If $S_n = \sum_{t=1}^{n}X_t$ and $\mathcal{F}_n = \sigma(X_n, X_{n-1},\ldots,X_1)$, $\{S_n, \mathcal{F}_n\}_1^{\infty}$ is a martingale, also known as a random walk sequence. Note that $E|S_n| \le \sum_{t=1}^{n}E|X_t| < \infty$. □
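Property (15.2) can be verified exactly for the simplest random walk of Example 15.1 by enumerating histories. The sketch below assumes increments equal to $\pm 1$ with probability 1/2 each, which is one illustrative choice of zero-mean i.i.d. distribution; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

STEPS = (-1, 1)  # zero-mean increments, each with probability 1/2 (an assumption)

def conditional_mean_next(history):
    """E(S_n | X_1, ..., X_{n-1}) for the +/-1 random walk, by enumeration."""
    s_prev = sum(history)
    # Average S_n = S_{n-1} + X_n over the equally likely values of X_n.
    return sum(Fraction(s_prev + x, len(STEPS)) for x in STEPS)

def is_martingale_up_to(n):
    """Check E(S_m | F_{m-1}) = S_{m-1} on every history of length below n."""
    return all(conditional_mean_next(h) == sum(h)
               for m in range(n) for h in product(STEPS, repeat=m))

print(is_martingale_up_to(6))
```

Because the conditional mean is computed history by history, the check concerns the almost-sure equality in (15.2) and not merely equality of unconditional means.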

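15.2 Example Let $Z$ be an integrable, $\mathcal{F}/\mathcal{B}$-measurable, zero-mean r.v., $\{\mathcal{F}_n\}_{-\infty}^{\infty}$ an increasing sequence of σ-fields with $\lim_{n\to\infty}\mathcal{F}_n = \mathcal{F}$, and $S_n = E(Z \mid \mathcal{F}_n)$. Then

$$E(S_n \mid \mathcal{F}_{n-1}) = E(E(Z \mid \mathcal{F}_n) \mid \mathcal{F}_{n-1}) = E(Z \mid \mathcal{F}_{n-1}) = S_{n-1}, \tag{15.4}$$

where the second equality is by 10.26(i). $E|S_n| \le E|Z| < \infty$ by 10.27, so $S_n$ is a martingale. □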

Following on the last definition, a martingale difference (m.d.) sequence $\{X_t, \mathcal{F}_t\}_{-\infty}^{\infty}$ is an adapted sequence on $(\Omega, \mathcal{F}, P)$ satisfying the properties

$$E|X_t| < \infty, \tag{15.5}$$

$$E(X_t \mid \mathcal{F}_{t-1}) = 0, \text{ a.s.}, \tag{15.6}$$

for every $t$. Evidently, if $\{S_n\}$ is a martingale and $X_t = S_t - S_{t-1}$, then $\{X_t\}$ is a m.d. Conversely, we may define a martingale as the partial sum of a sequence of m.d.s, as in 15.1 (an independent integrable sequence with zero mean is clearly a m.d.). However, if $X_t$ has positive variance uniformly in $t$, condition (15.1) holds for all finite $n$ but not uniformly in $n$. To define a martingale by $S_n = \sum_{t=-\infty}^{n}X_t$ can therefore lead to difficulties. Example 15.2 shows how a martingale can arise without reference to summation of a difference sequence.

It is important not to misunderstand the force of the integrability requirement in (15.1). After all, if we observe $S_{n-1}$, predicting $S_n$ might seem to be just a matter of knowing something about the distribution of the increment. The problem is that we cannot treat $E(S_n \mid S_{n-1},\ldots)$ as a random variable without integrability of $S_n$. Conditioning on $S_{n-1}$ is not the same proposition as treating it as a constant, which entails restricting the probability space entirely to the set of repeated random drawings of $X_n$. The latter problem has no connection with the theory of random sequences.

A fundamental result is that a m.d. is uncorrelated with any measurable function of its lagged values.

15.3 Theorem If $\{X_t, \mathcal{F}_t\}$ is a m.d., then

$$\mathrm{Cov}(X_t, \phi(X_{t-1}, X_{t-2},\ldots)) = 0,$$

where $\phi$ is any Borel-measurable, integrable function of the arguments.

Proof By 10.11 (see also the remarks following), noting that $\phi(X_{t-1}, X_{t-2},\ldots)$ is $\mathcal{F}_{t-1}$-measurable. ∎

15.4 Corollary If $\{X_t, \mathcal{F}_t\}$ is a m.d., then $E(X_tX_{t-k}) = 0$, for all $t$ and all $k \ne 0$.

Proof Put $\phi = X_{t-k}$ in 15.3. For $k < 0$, redefine the subscripts, putting $t' = t - k$ and $t' - |k| = t$, so as to make the two cases equivalent. ∎
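A small exact computation illustrates 15.3 and 15.4 for a m.d. that is not an independent sequence. The construction $X_t = \varepsilon_t(1 + \varepsilon_{t-1})$, with $\varepsilon_t$ i.i.d. and equal to $\pm 1$ with probability 1/2, is a hypothetical example chosen for this sketch and does not appear in the text; it is a m.d. because $E(\varepsilon_t) = 0$ and $\varepsilon_t$ is independent of the past.

```python
from fractions import Fraction
from itertools import product

def expectation(func):
    """Exact expectation over three i.i.d. fair +/-1 draws (e0, e1, e2)."""
    outcomes = list(product((-1, 1), repeat=3))
    return sum(Fraction(func(*e)) for e in outcomes) / len(outcomes)

def x_prev(e0, e1, e2):
    """X_{t-1} = e1 * (1 + e0)."""
    return e1 * (1 + e0)

def x_curr(e0, e1, e2):
    """X_t = e2 * (1 + e1)."""
    return e2 * (1 + e1)

def covariance(g, h):
    """Cov(g, h) under the uniform distribution on the eight outcomes."""
    return expectation(lambda *e: g(*e) * h(*e)) - expectation(g) * expectation(h)

# X_t is uncorrelated with functions of X_{t-1}, as 15.3 requires ...
print(covariance(x_curr, x_prev))
print(covariance(x_curr, lambda *e: x_prev(*e) ** 2))
# ... but a function of X_t can be correlated with X_{t-1}, so the sequence
# is not independent.
print(covariance(lambda *e: x_curr(*e) ** 2, x_prev))
```

The last covariance being non-zero shows that the m.d. property restricts the conditional mean only, and not the whole conditional distribution.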

One might think of the m.d. property as intermediate between uncorrelatedness and independence in the hierarchy of constraints on dependence. However, note the asymmetry with respect to time. Reversing the time ordering of an independent sequence yields another independent sequence, and likewise a reversed uncorrelated sequence is uncorrelated; but a reversed m.d. is not a m.d. in general.

The Doob decomposition of an integrable sequence $\{S_n, \mathcal{F}_n\}_0^{\infty}$ is

$$S_n = M_n + A_n, \tag{15.7}$$

where $A_0 = 0$, $M_0 = S_0$, and

$$M_n = M_{n-1} + S_n - E(S_n \mid \mathcal{F}_{n-1}), \tag{15.8}$$

$$A_n = A_{n-1} + E(S_n \mid \mathcal{F}_{n-1}) - S_{n-1}. \tag{15.9}$$

$A_n$ is an $\mathcal{F}_{n-1}$-measurable sequence called the predictable component of $S_n$. Writing $\Delta S_n = Y_n$ and $\Delta M_n = X_n$, we find $\Delta A_n = E(Y_n \mid \mathcal{F}_{n-1})$, and

$$X_n = Y_n - E(Y_n \mid \mathcal{F}_{n-1}). \tag{15.10}$$

$X_n$ is known as a centred sequence, and also as the innovation sequence of $S_n$. It is adapted if $\{Y_n, \mathcal{F}_n\}_1^{\infty}$ is, and since $E|Y_n| < \infty$ by assumption,

$$E|X_n| \le E|Y_n| + E|E(Y_n \mid \mathcal{F}_{n-1})| \le E|Y_n| + E(E(|Y_n| \mid \mathcal{F}_{n-1})) = 2E|Y_n| < \infty, \tag{15.11}$$

by (respectively) Minkowski's inequality, the conditional modulus inequality, and the LIE. Since it is evident that $E(X_n \mid \mathcal{F}_{n-1}) = 0$, $\{X_n, \mathcal{F}_n\}_1^{\infty}$ is a m.d. and so $\{M_n, \mathcal{F}_n\}_0^{\infty}$ is a martingale.
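The recursions (15.8) and (15.9) can be traced on a concrete case. The sketch below is an illustration under stated assumptions: it takes $S_n = W_n^2$, where $W_n$ is a fair $\pm 1$ random walk started at 0 (a choice made here, not in the text), so that $E(S_n \mid \mathcal{F}_{n-1}) = W_{n-1}^2 + 1$ is available in closed form.

```python
from itertools import product

def doob_decomposition(steps):
    """Doob decomposition of S_n = W_n**2 along one path of a fair +/-1 walk W.

    Returns the lists (S, M, A) for n = 0, ..., len(steps).
    """
    w = 0
    S, M, A = [0], [0], [0]            # S_0 = M_0 = 0 and A_0 = 0
    for x in steps:
        # E(S_n | F_{n-1}) averages (w + 1)**2 and (w - 1)**2 with weight 1/2.
        cond_mean = ((w + 1) ** 2 + (w - 1) ** 2) // 2
        s_prev = w * w
        w += x
        s_new = w * w
        M.append(M[-1] + s_new - cond_mean)    # recursion (15.8)
        A.append(A[-1] + cond_mean - s_prev)   # recursion (15.9)
        S.append(s_new)
    return S, M, A

S, M, A = doob_decomposition((1, 1, -1, 1, 1))
print(S, M, A)
```

In this case the predictable component is deterministic, $A_n = n$, and the martingale component is $M_n = W_n^2 - n$.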

Martingales play an indispensable role in modern probability theory, because m.d.s behave in many important respects like independent sequences. Independence is the simplifying property which permitted the 'classical' limit results, laws of large numbers and central limit theorems, to be proved. But independence is a constraint on the entire joint distribution of the sequence. The m.d. property is a much milder restriction on the memory and yet, as we shall see in later chapters, most limit theorems which hold for independent sequences can also be proved for m.d.s, with few if any additional restrictions on the marginal distributions. For time series applications, it makes sense to go directly to the martingale version of any result of interest, unless of course a still weaker assumption will suffice. We will rarely need a stronger one.

Should we prefer to avoid the use of a definition involving σ-fields on an abstract probability space, it is possible to represent a martingale difference as, for example, a sequence with the property

$$E(X_t \mid X_{t-1}, X_{t-2},\ldots) = 0 \text{ a.s.} \tag{15.12}$$

When a random variable appears in a conditioning set it is to be understood as representing the corresponding minimal σ-subfield, in this case $\sigma(X_{t-1}, X_{t-2},\ldots)$.

This is appealing at an elementary level since it captures the notion of information available to an observer, in this case the sequence realization to date. But since, as we have seen, the conditioning information can extend more widely than the history of the sequence itself, this type of notation is relatively clumsy. Suppose we have a vector sequence $\{(X_t, Z_t)\}$, and $X_t$, though not necessarily $Z_t$, is a m.d. with respect to $\mathcal{F}_t = \sigma(X_t, Z_t, X_{t-1}, Z_{t-1},\ldots)$ in the sense of (15.6). This case is distinct from (15.12), and shows that that definition is inadequate, although (15.6) implies (15.12). More important, the representation of conditioning information is not unique, and we have seen (10.3(ii)) that any measurably isomorphic transformation of the conditioning variables contains the same information as the original variables. Indeed, the information need not even be represented by a variable, but is merely knowledge of the occurrence/non-occurrence of certain abstract events.

15.2 Extensions of the Martingale Concept

An adapted triangular array $\{\{X_{nt}, \mathcal{F}_{nt}\}_{t=1}^{k_n}\}_{n=1}^{\infty}$, where $\{k_n\}_{n=1}^{\infty}$ is some increasing sequence of integers, for which

$$E|X_{nt}| < \infty, \tag{15.13}$$

$$E(X_{nt} \mid \mathcal{F}_{n,t-1}) = 0 \text{ a.s.} \tag{15.14}$$

for each $t = 1,\ldots,k_n$ and $n \ge 1$, is called a martingale difference array. In many applications we would have just $k_n = n$. The double subscripting of the subfield $\mathcal{F}_{nt}$ may be superfluous if the information content of the array does not depend on $n$, with $\mathcal{F}_{nt} = \mathcal{F}_t$ for each $n$, but the additional generality given by the definition is harmless and could be useful. The sequence $\{S_n, \mathcal{F}_n\}_1^{\infty}$ where $S_n = \sum_{t=1}^{k_n}X_{nt}$ and $\mathcal{F}_n = \mathcal{F}_{n,k_n}$ is not a martingale, but the properties of martingales can be profitably used to analyse its behaviour. Consider the case $S_n = n^{-1/2}\sum_{t=1}^{n}X_t$ where $\{X_t, \mathcal{F}_t\}$ is a m.d. Such scaling by sample size may ensure that the distribution of $S_n$ has a non-degenerate limit. $S_n$ is not a martingale since

$$E(S_n \mid \mathcal{F}_{n-1}) = [(n-1)/n]^{1/2}S_{n-1}, \tag{15.15}$$

but each column of the m.d. array

$$\begin{array}{ccccc}
X_1 & 2^{-1/2}X_1 & 3^{-1/2}X_1 & 4^{-1/2}X_1 & \cdots \\
 & 2^{-1/2}X_2 & 3^{-1/2}X_2 & 4^{-1/2}X_2 & \cdots \\
 & & 3^{-1/2}X_3 & 4^{-1/2}X_3 & \cdots \\
 & & & 4^{-1/2}X_4 & \cdots \\
 & & & & \ddots
\end{array} \tag{15.16}$$

is a m.d. sequence, and $S_n$ is the sum of column $n$. It is a term in a martingale sequence even though this is not the sequence $\{S_n\}$.

An adapted sequence $\{S_n, \mathcal{F}_n\}_{-\infty}^{\infty}$ of $L_1$-bounded variables satisfying

$$E(S_{n+1} \mid \mathcal{F}_n) \ge S_n \text{ a.s.} \tag{15.17}$$

is called a submartingale, in which case $X_n = S_n - S_{n-1}$ is a submartingale difference, having the property $E(X_{n+1} \mid \mathcal{F}_n) \ge 0$ a.s. In the Doob decomposition of a submartingale, the predictable sequence $A_n$ is non-decreasing. Reversing the inequality defines a supermartingale, although, since $-S_n$ is a supermartingale whenever $S_n$ is a submartingale, this is a minor extension. A supermartingale might represent a gambler's worth when a sequence of bets is unfair because of a house percentage. The generic term semimartingale covers all the possibilities.

15.5 Theorem Let $\phi(\cdot): \mathbb{R} \to \mathbb{R}$ be continuous and convex. If $\{S_n, \mathcal{F}_n\}$ is a martingale and $E|\phi(S_n)| < \infty$, then $\{\phi(S_n), \mathcal{F}_n\}$ is a submartingale. If $\phi$ is also non-decreasing, $\{\phi(S_n), \mathcal{F}_n\}$ is a submartingale if $\{S_n, \mathcal{F}_n\}$ is a submartingale.

Proof For the martingale case,

$$E(\phi(S_{n+1}) \mid \mathcal{F}_n) \ge \phi(E(S_{n+1} \mid \mathcal{F}_n)) = \phi(S_n) \text{ a.s.} \tag{15.18}$$

by the conditional Jensen inequality (10.18). For the submartingale case, '=' becomes '≥' in (15.18) when $x_1 \le x_2 \Rightarrow \phi(x_1) \le \phi(x_2)$. ∎

If $\{X_t, \mathcal{F}_t\}_1^{\infty}$ is a (sub)martingale difference, $\{Z_t, \mathcal{F}_t\}_0^{\infty}$ any adapted sequence, and

$$S_n = \sum_{t=1}^{n}X_tZ_{t-1}, \tag{15.19}$$

then $\{S_n, \mathcal{F}_n\}_1^{\infty}$ is a (sub)martingale since

$$E(S_{n+1} \mid \mathcal{F}_n) = \sum_{t=1}^{n}X_tZ_{t-1} + Z_nE(X_{n+1} \mid \mathcal{F}_n) = S_n \ (\ge S_n). \tag{15.20}$$

We might think of $X_t$ as the random return on a stake of 1 unit in a sequence of bets, and the sequence $\{Z_t\}$ as representing a betting system, a rule based on information available at time $t - 1$ for deciding how many units to bet in the next game. The implication of (15.20) is that, if the basic game (in which the same stake is bet every time) is fair, there is no betting system (based on no more than information about past play) that can turn it into a game favouring the player, or for that matter, a game favouring the house into a fair game.

For an increasing sequence $\{\mathcal{F}_t\}$ of σ-subfields of $(\Omega, \mathcal{F}, P)$, a stopping time $\tau(\omega)$ is a random integer having the property $\{\omega: t = \tau(\omega)\} \in \mathcal{F}_t$. The classic example is a gambling policy which entails withdrawing from the game whenever a certain condition depending only on the outcomes to date (such as one's losses exceeding some limit, or a certain number of successive wins) is realized. If $\tau$ is the random variable defined as the first time the said condition is met in a sequence of bets, it is a stopping time.

Let $\tau$ be a stopping time of $\{\mathcal{F}_n\}$, and consider

$$S_{n\wedge\tau} = \begin{cases} S_n, & n \le \tau \\ S_\tau, & n > \tau \end{cases} \tag{15.21}$$

where $n \wedge \tau$ stands for $\min\{n, \tau\}$. $\{S_{n\wedge\tau}, \mathcal{F}_n\}_{n=1}^{\infty}$ is called a stopped process.

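15.6 Theorem If $\{S_n, \mathcal{F}_n\}_1^{\infty}$ is a martingale (submartingale), then $\{S_{n\wedge\tau}, \mathcal{F}_n\}_1^{\infty}$ is a martingale (submartingale).

Proof Since $\{\mathcal{F}_n\}_1^{\infty}$ is increasing, $\{\omega: k = \tau(\omega)\} \in \mathcal{F}_n$ for $k \le n$, and hence also $\{\omega: n < \tau(\omega)\} \in \mathcal{F}_n$, by complementation. Write $S_{n\wedge\tau} = \sum_{k=1}^{n}S_k1_{\{k=\tau\}} + S_n1_{\{n<\tau\}}$, where the indicator functions are all $\mathcal{F}_n$-measurable. It follows by 3.25 and 3.33 that $S_{n\wedge\tau}$ is $\mathcal{F}_n$-measurable, and

$$E|S_{n\wedge\tau}| \le \sum_{k=1}^{n}E|S_k1_{\{k=\tau\}}| + E|S_n1_{\{n<\tau\}}| \le \sum_{k=1}^{n}E|S_k| + E|S_n| < \infty, \quad n \ge 1. \tag{15.22}$$

If $\{S_n, \mathcal{F}_n\}_1^{\infty}$ is a martingale then for $A \in \mathcal{F}_n$, applying (15.3),

$$\begin{aligned}
\int_A S_{(n+1)\wedge\tau}\,dP &= \int_{A\cap\{\tau \le n\}}S_\tau\,dP + \int_{A\cap\{n<\tau\}}S_{n+1}\,dP \\
&= \int_{A\cap\{\tau \le n\}}S_\tau\,dP + \int_{A\cap\{n<\tau\}}S_n\,dP = \int_A S_{n\wedge\tau}\,dP,
\end{aligned} \tag{15.23}$$

showing that $\{S_{n\wedge\tau}, \mathcal{F}_n\}_1^{\infty}$ is a martingale. The submartingale case follows easily on replacing the second equality by the required inequality in (15.23). ∎

The general conclusion is that a gambler cannot alter the basic fairness characteristics of a game, whatever gambling policy (betting system plus stopping rule) he or she selects.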
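The conclusion can be illustrated with the doubling policy mentioned at the start of the chapter, treated here as a betting system combined with a stopping rule. This is a sketch under assumptions made for the illustration: each game returns $\pm 1$ per unit stake with probability 1/2, the stake starts at 1 and doubles after each loss, and play stops at the first win.

```python
from fractions import Fraction
from itertools import product

def stopped_worth(outcomes):
    """Worth after playing the doubling system on one path, stopped at tau.

    outcomes: tuple of +1 (win) / -1 (loss) returns per unit stake.
    The stake starts at 1, doubles after each loss, and play stops at the
    first win (the stopping time tau).
    """
    worth, stake = 0, 1
    for x in outcomes:
        worth += stake * x
        if x == 1:          # first win: stop playing
            break
        stake *= 2          # next stake depends on past play only
    return worth

def expected_stopped_worth(n):
    """Exact expectation of the stopped worth over all 2**n equally likely paths."""
    paths = list(product((-1, 1), repeat=n))
    return sum(Fraction(stopped_worth(p)) for p in paths) / len(paths)

for n in range(1, 9):
    print(n, expected_stopped_worth(n))
```

On any path containing a win the player ends one unit ahead, but the rare path of $n$ straight losses costs $2^n - 1$ units, and the two effects cancel exactly in expectation for every finite $n$.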

All these concepts have a natural extension to random vectors. An adapted sequence $\{X_t, \mathcal{F}_t\}_{-\infty}^{\infty}$ is defined to be a vector martingale difference if and only if $\{\lambda'X_t, \mathcal{F}_t\}_{-\infty}^{\infty}$ is a scalar m.d. sequence for all conformable fixed vectors $\lambda \ne 0$. It has the property

$$E(X_{t+1} \mid \mathcal{F}_t) = 0. \tag{15.24}$$

The one thing to remember is that a vector martingale difference is not the same thing as a vector of martingale differences. A simple counter-example is the two-element vector $(X_t, X_{t-1})'$, where $X_t$ is a m.d.; $\{\lambda_1X_t + \lambda_2X_{t-1}, \mathcal{F}_t\}$ is an adapted sequence, but

$$E(\lambda_1X_t + \lambda_2X_{t-1} \mid \mathcal{F}_{t-1}) = \lambda_2X_{t-1} \ne 0,$$

so it is not a m.d. On the other hand,

$$E(\lambda_1X_{t+1} + \lambda_2X_t \mid \mathcal{F}_{t-1}) = 0,$$

but $\{\lambda_1X_{t+1} + \lambda_2X_t, \mathcal{F}_t\}$ is not adapted, since $X_{t+1}$ is not $\mathcal{F}_t$-measurable.

15.3 Martingale Convergence

Applying 15.5 to the case $\phi(\cdot) = |\cdot|^p$ and taking unconditional expectations shows that every martingale or submartingale has the property

$$E|S_{n+1}|^p \ge E|S_n|^p, \quad p \ge 1. \tag{15.25}$$

By 2.11 the sequence of $p$th absolute moments converges as $n \to \infty$, either to a finite limit or to $+\infty$. In the case where the $L_1$-norms are uniformly bounded, (sub)martingales also exhibit a substantially stronger property; they converge, almost surely, to some point which is random in the sense of having a distribution over realizations, but does not change from one time period $t$ to the next.

The intuition is reasonably transparent. $\{\mathcal{F}_n\}$ is an increasing sequence of σ-fields which converges to a limit $\mathcal{F}_\infty \subseteq \mathcal{F}$, the σ-field that contains $\mathcal{F}_n$ for every $n$. Since $E(S_n \mid \mathcal{F}_n) = S_n$, the convergence of the sequence $\{\mathcal{F}_n\}$ implies that of a uniformly bounded sequence with the property $E(S_{n+1} \mid \mathcal{F}_n) \ge S_n$, so long as these expectations remain well-defined in the limit. Thus, we have the following.

15.7 Theorem If $\{S_n, \mathcal{F}_n\}_1^{\infty}$ is a submartingale sequence and $\sup_nE|S_n| \le M < \infty$, then $S_n \to S$ a.s. where $S$ is an $\mathcal{F}$-measurable random variable with $E|S| \le M$. □

The proof of 15.7, due to Doob, makes use of a result called the upcrossing inequality, which is proved as a preliminary lemma. Considering the path of a submartingale through time, an upcrossing of an interval $[\alpha,\beta]$ is a succession of steps starting at or below $\alpha$ and terminating at or above $\beta$. To complete more than one upcrossing, there must be one and only one intervening downcrossing, so downcrossings do not require separate consideration. Fig. 15.1 shows two upcrossings of $[\alpha,\beta]$, spanning the periods marked by dots on the abscissa.

[Fig. 15.1: path of a sequence showing two upcrossings of $[\alpha,\beta]$, with the values of $Y_k$ marked along the abscissa]

Let the r.v. $Y_k$ be the indicator of an upcrossing. To be precise, set $Y_1 = 0$, and then, for $k = 2,3,\ldots,n$,

$$Y_k = \begin{cases}
0 & \text{if either } Y_{k-1} = 0,\ S_{k-1} > \alpha, \text{ or } Y_{k-1} = 1,\ S_{k-1} \ge \beta, \\
1 & \text{if either } Y_{k-1} = 0,\ S_{k-1} \le \alpha, \text{ or } Y_{k-1} = 1,\ S_{k-1} < \beta.
\end{cases} \tag{15.26}$$

The values of $Y_k$ appear at the bottom of Fig. 15.1. Observe that an upcrossing begins the period after $S_k$ falls to or below $\alpha$, and ends at the first step thereafter where $\beta$ is reached or exceeded. $Y_k$ is a function of $S_{k-1}$ and an $\mathcal{F}_{k-1}$-measurable random variable.

The number of upcrossings of $[\alpha,\beta]$ up to time $n$ of the sequence $\{S_n(\omega)\}_1^{n}$, to be denoted $U_n(\omega)$, is an $\mathcal{F}_n$-measurable random variable. The sequence $\{U_n(\omega)\}_1^{\infty}$ is monotone, but it satisfies the following condition.
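The recursion (15.26) translates directly into code. The following sketch computes the indicators $Y_k$ and counts completed upcrossings for a single realization; the path used is a hypothetical sequence of values chosen for illustration, and an upcrossing is counted as complete at the first step, while the indicator equals 1, at which the path reaches or exceeds $\beta$.

```python
def upcrossing_indicators(path, alpha, beta):
    """Y_k of (15.26) for path = [S_1, ..., S_n]; entry k - 1 holds Y_k."""
    y = [0]                                          # Y_1 = 0
    for k in range(1, len(path)):
        prev_y, prev_s = y[k - 1], path[k - 1]
        if prev_y == 0:
            y.append(1 if prev_s <= alpha else 0)    # start once S falls to alpha
        else:
            y.append(1 if prev_s < beta else 0)      # continue until beta is reached
    return y

def count_upcrossings(path, alpha, beta):
    """Number of upcrossings of [alpha, beta] completed by time n."""
    y = upcrossing_indicators(path, alpha, beta)
    return sum(1 for k in range(1, len(path)) if y[k] == 1 and path[k] >= beta)

path = [0, -1, 2, 3, -2, 1, 4]                 # hypothetical realization S_1, ..., S_7
print(upcrossing_indicators(path, 0, 2))       # [0, 1, 1, 0, 0, 1, 1]
print(count_upcrossings(path, 0, 2))           # 2
```

On this path the steps taken while the indicator equals 1, measured on the floored path $\max\{S_k, \alpha\}$, sum to 6, which is at least $(\beta - \alpha)$ times the number of upcrossings.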

15.8 Upcrossing inequality The number of upcrossings of $[\alpha,\beta]$ by a submartingale $\{S_n, \mathcal{F}_n\}_1^{\infty}$ satisfies

$$E(U_n) \le \frac{E|S_n| + |\alpha|}{\beta - \alpha}. \tag{15.27}$$

Proof Define $S_n' = \max\{S_n, \alpha\}$, a continuous, convex, non-decreasing function of $S_n$, such that $\{S_n', \mathcal{F}_n\}$ is an adapted sequence and also a submartingale. $U_n$ is the number of upcrossings up to $n$ for $\{S_n'\}$ as well as for $\{S_n\}$. Write

$$S_n' - S_1' = \sum_{k=2}^{n}X_k' = \sum_{k=2}^{n}Y_kX_k' + \sum_{k=2}^{n}(1 - Y_k)X_k', \tag{15.28}$$

where $Y_k$ is from (15.26), and $X_k'$ is a submartingale difference. Then

$$E\Big(\sum_{k=2}^{n}(1 - Y_k)X_k'\Big) = \sum_{k=2}^{n}\int_{\{Y_k=0\}}X_k'\,dP = \sum_{k=2}^{n}\int_{\{Y_k=0\}}E(X_k' \mid \mathcal{F}_{k-1})\,dP \ge 0, \tag{15.29}$$

using the definition of a conditional expectation in the second equality (recalling that $Y_k$ is $\mathcal{F}_{k-1}$-measurable), and the submartingale property, to give the inequality. We have therefore shown that

$$E(S_n' - S_1') \ge E\Big(\sum_{k=2}^{n}Y_kX_k'\Big). \tag{15.30}$$

$\sum_{k=2}^{n}Y_kX_k'$ is the sum of the steps made during upcrossings, by definition of $Y_k$. Since the sum of the $X_k'$ over an upcrossing equals at least $\beta - \alpha$ by definition, we must have

$$\sum_{k=2}^{n}Y_kX_k' \ge (\beta - \alpha)U_n, \tag{15.31}$$

where $U_n$ is the number of upcrossings completed by time $n$. Taking the expectation of (15.31) and substituting (15.30), we obtain, as required,

$$\begin{aligned}
(\beta - \alpha)E(U_n) &\le E(S_n' - S_1') \le E(S_n' - \alpha) \\
&= \int_{\{S_n > \alpha\}}(S_n - \alpha)\,dP \\
&\le E|S_n - \alpha| \le E|S_n| + |\alpha|. \quad ∎
\end{aligned} \tag{15.32}$$

The upcrossing inequality contains the implication that, if the sequence is uniformly bounded in $L_1$, the expected number of upcrossings is finite, even as $n \to \infty$. This is the heart of the convergence proof, for it means that the sequence has to be settling down somewhere beyond a certain point.
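The bookkeeping behind (15.26) and (15.31) can be checked mechanically. The following is a minimal sketch, not part of the original text: the function names, the $\pm 1$ random walk and the band $[-2, 2]$ are illustrative assumptions. It builds the indicators $Y_k$, counts completed upcrossings, and confirms on one simulated path that the steps taken during upcrossings of $S' = \max\{S,\alpha\}$ sum to at least $(\beta-\alpha)U_n$.

```python
import random


def upcrossing_indicators(path, alpha, beta):
    """Indicators Y_1, ..., Y_n of (15.26) for a path S_1, ..., S_n."""
    y = [0]  # Y_1 = 0 by convention
    for k in range(1, len(path)):
        s_prev, y_prev = path[k - 1], y[-1]
        if y_prev == 0:
            y.append(1 if s_prev <= alpha else 0)  # an upcrossing starts
        else:
            y.append(1 if s_prev < beta else 0)    # it continues until beta is reached
    return y


def count_upcrossings(path, alpha, beta):
    """Number U_n of upcrossings of [alpha, beta] completed by the end of the path."""
    count, in_progress = 0, False
    for s in path:
        if not in_progress and s <= alpha:
            in_progress = True
        elif in_progress and s >= beta:
            count += 1
            in_progress = False
    return count


def upcrossing_step_sum(path, alpha, beta):
    """Left-hand side of (15.31): sum of Y_k * X'_k, where S' = max(S, alpha)."""
    y = upcrossing_indicators(path, alpha, beta)
    capped = [max(s, alpha) for s in path]
    return sum(y[k] * (capped[k] - capped[k - 1]) for k in range(1, len(path)))


# Hypothetical illustration: a +/-1 random walk and the band [-2, 2].
rng = random.Random(0)
walk, s = [], 0
for _ in range(500):
    s += rng.choice((-1, 1))
    walk.append(s)
u_n = count_upcrossings(walk, -2, 2)
assert upcrossing_step_sum(walk, -2, 2) >= (2 - (-2)) * u_n  # inequality (15.31)
```

The check uses integer steps so that the comparison is exact; with real-valued paths a small tolerance would be needed.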

Proof of 15.7 Fix $\alpha$ and $\beta > \alpha$. By 15.8,

$$E(U_n) \le \frac{E|S_n| + |\alpha|}{\beta - \alpha} \le \frac{M + |\alpha|}{\beta - \alpha} < \infty. \tag{15.33}$$

For $\omega \in \Omega$, $\{U_n(\omega)\}_1^\infty$ is a positive, non-decreasing sequence and either diverges to $+\infty$ or converges to a finite limit $U(\omega)$ as $n \to \infty$. Divergence for $\omega \in C$ with $P(C) > 0$ would imply $E(U_n) \to \infty$, which contradicts (15.33), so $U_n \to U$ a.s., where $E(U) < \infty$.

Define $\bar{S}(\omega) = \limsup_{n\to\infty} S_n(\omega)$ and $\underline{S}(\omega) = \liminf_{n\to\infty} S_n(\omega)$. If $\underline{S}(\omega) < \alpha < \beta < \bar{S}(\omega)$, the interval $[\alpha,\beta]$ is crossed an infinite number of times as $n \to \infty$, so it must be the case that $P(\underline{S} < \alpha < \beta < \bar{S}) = 0$. This is true for any pair $\alpha,\beta$. Hence consider

$$\{\omega: \underline{S}(\omega) < \bar{S}(\omega)\} = \bigcup_{\alpha,\beta} \{\omega: \underline{S}(\omega) < \alpha < \beta < \bar{S}(\omega)\}, \tag{15.34}$$

where the union on the right is taken over rational values of $\alpha$ and $\beta$. Evidently, $P(\underline{S} < \bar{S}) = 0$ by 3.6(ii), which is the same as $\underline{S} = \bar{S} = S$ a.s., where $S$ is the limit of $\{S_n\}$. Finally, note that

$$E|S| \le \liminf_{n\to\infty} E|S_n| \le \sup_n E|S_n| \le M, \tag{15.35}$$

where the first inequality is from Fatou's lemma and the last is by assumption. This completes the proof. ∎

Of the examples quoted earlier, 15.1 does not satisfy the conditions of 15.7. A random walk does not converge, but wanders forever with a variance that is an increasing function of time. But in 15.2, $X_t$ is of course converging to $Z$.

15.9 Corollary Let $\{S_n,\mathcal{F}_n\}_{-\infty}^{\infty}$ be a doubly infinite martingale. Then $S_n \to S_{-\infty}$ a.s. as $n \to -\infty$, where $S_{-\infty}$ is an $L_1$-bounded r.v.

Proof Let $U_{-n}$ denote the number of upcrossings of $[\alpha,\beta]$ performed by the sequence $\{S_j,\ -1 \ge j \ge -n\}$. The argument of 15.8 shows that

$$E(U_{-n}) \le \frac{E|S_{-1}| + |\alpha|}{\beta - \alpha}, \quad \text{all } n \ge 1. \tag{15.36}$$

Arguments precisely analogous to those of 15.7 show that

$$P\left(\liminf_{n\to-\infty} S_n < \limsup_{n\to-\infty} S_n\right) = 0, \tag{15.37}$$

so that the limit $S_{-\infty}$ exists a.s. The sequence $\{E|S_n|\}_{-\infty}^{-1}$ is non-negative, non-increasing as $n$ decreases by (15.25), and $E|S_{-1}| < \infty$ by definition of a martingale. Hence $E|S_{-\infty}| < \infty$. ∎

If a martingale does not converge, it must not be thought of as converging in $\bar{\mathbb{R}}$, of heading off to $+\infty$ or $-\infty$, never to return. This is an event that occurs only with probability 0. Subject to the increments having a suitably bounded distribution, a nonconvergent martingale eventually visits all regions of the real line, almost surely.

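15.10 Theorem Let $\{X_t,\mathcal{F}_t\}$ be a m.d. sequence with $E(\sup_t |X_t|) < \infty$, and let $S_n = \sum_{t=1}^n X_t$. If $C = \{\omega: S_n(\omega) \text{ converges}\}$, and

$$E = \{\omega: \text{either } \inf\nolimits_n S_n(\omega) > -\infty \text{ or } \sup\nolimits_n S_n(\omega) < \infty\},$$

then $P(E - C) = 0$.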

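Proof For a constant $M > 0$, define the stopping time $\tau_M(\omega)$ as the smallest integer $n$ such that $S_n(\omega) > M$, if one exists, and $\tau_M(\omega) = \infty$ otherwise. The stopped process $\{S_{n\wedge\tau_M},\mathcal{F}_{n\wedge\tau_M}\}_{n=1}^\infty$ is a martingale (15.6), and $S_{(n-1)\wedge\tau_M} \le M$ for all $n$. Letting $S^+_{n\wedge\tau_M} = \max\{S_{n\wedge\tau_M}, 0\}$,

$$S^+_{n\wedge\tau_M} \le S^+_{(n-1)\wedge\tau_M} + X^+_{n\wedge\tau_M} \le M + \sup_n |X_n|. \tag{15.38}$$

Since $E(S_{n\wedge\tau_M}) = 0$, $E|S_{n\wedge\tau_M}| = 2E(S^+_{n\wedge\tau_M})$, and hence $\sup_n E|S_{n\wedge\tau_M}| < \infty$, and $S_{n\wedge\tau_M}$ converges a.s., by 15.7. And since $S_{n\wedge\tau_M}(\omega) = S_n(\omega)$ on the set $\{\omega: \sup_n S_n(\omega) \le M\}$, $S_n(\omega)$ converges a.s. on the same set. Letting $M \to \infty$, and then applying the same argument to $-S_n$, we obtain the conclusion that $S_n(\omega)$ converges a.s. on the set $E$; that is, $P(C \cap E) = P(E)$, from which the theorem follows. ∎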

Note that $E^c = \{\omega: \sup_n S_n(\omega) = +\infty,\ \inf_n S_n(\omega) = -\infty\}$. Since $P(E^c) = P((C \cap E)^c) = P(C^c \cup E^c)$, a direct consequence of the theorem is that $C^c = E^c \cup N$ where $P(N) = 0$, which is the claim made above.

15.4 Convergence and the Conditional Variances

If $\{S_n\}$ is a square-integrable martingale with differences $\{X_n\}$,

$$E(S_n^2 \mid \mathcal{F}_{n-1}) = E(S_{n-1}^2 + X_n^2 + 2X_nS_{n-1} \mid \mathcal{F}_{n-1}) \ge S_{n-1}^2,$$

and $\{S_n^2\}$ is a submartingale. The Doob decomposition of the sequence of squares has the form $S_n^2 = M_n + A_n$ where $\Delta M_n = S_n^2 - E(S_n^2 \mid \mathcal{F}_{n-1})$, and $\Delta A_n = E(X_n^2 \mid \mathcal{F}_{n-1})$. The sequence $\{A_n\}$ is called the quadratic variation of $\{S_n\}$. The following theorem reveals an intimate link between martingale convergence and the summability of the conditional variances; the latter property implies the former almost surely, and in particular, if $\sum_{t=1}^\infty E(X_t^2 \mid \mathcal{F}_{t-1}) < \infty$ a.s. then $S_n \to S$ a.s.

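15.11 Theorem Let $\{X_t,\mathcal{F}_t\}_1^\infty$ be a m.d. sequence, and $S_n = \sum_{t=1}^n X_t$. If

$$D = \left\{\omega: \sum_{t=1}^\infty E(X_t^2 \mid \mathcal{F}_{t-1})(\omega) < \infty\right\} \in \mathcal{F},$$

$$C = \{\omega: S_n(\omega) \text{ converges}\} \in \mathcal{F},$$

then $P(D - C) = 0$.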

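Proof Fix $M > 0$, and define the stopping time $\tau_M(\omega)$ as the smallest value of $n$ having the property

$$\sum_{t=1}^{n+1} E(X_t^2 \mid \mathcal{F}_{t-1})(\omega) \ge M. \tag{15.39}$$

If there is no finite integer with this property then $\tau_M(\omega) = \infty$. If $D_M = \{\omega: \tau_M(\omega) = \infty\}$, $D = \lim_{M\to\infty} D_M$. The r.v. $1_{\{\tau_M \ge n\}}(\omega)$ is $\mathcal{F}_{n-1}$-measurable, since it is known at time $n - 1$ whether the inequality in (15.39) is true. Define the stopped process

$$S_{n\wedge\tau_M} = \sum_{t=1}^n X_t 1_{\{\tau_M \ge t\}}. \tag{15.40}$$

$S_{n\wedge\tau_M}$ is a martingale by 15.6. The increments are orthogonal, and

$$\sup_n E(S_{n\wedge\tau_M}^2) = \sup_n E\left(\sum_{t=1}^n X_t^2 1_{\{\tau_M \ge t\}}\right) = \sup_n E\left(\sum_{t=1}^n 1_{\{\tau_M \ge t\}} E(X_t^2 \mid \mathcal{F}_{t-1})\right) \le M, \tag{15.41}$$

where the final inequality holds for the expectation since it holds for each $\omega \in \Omega$ by definition of $\tau_M(\omega)$. By Liapunov's inequality,

$$\sup_n E|S_{n\wedge\tau_M}| \le \sup_n \|S_{n\wedge\tau_M}\|_2 \le M^{1/2},$$

and hence $S_{n\wedge\tau_M}$ converges a.s., by 15.7. If $\omega \in D_M$, $S_{n\wedge\tau_M}(\omega) = S_n(\omega)$ for every $n \in \mathbb{N}$, and hence $S_n(\omega)$ converges, except for $\omega$ in a set of zero measure. That is, $P(D_M \cap C) = P(D_M)$. The theorem follows on taking complements, and then letting $M \to \infty$. ∎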

15.12 Example To get an idea of what convergence entails, consider the case of $\{X_t\}$ an i.i.d. sequence (compare 15.1). Then $\{X_t/a_t\}$ is a m.d. sequence for any sequence $\{a_t\}$ of positive constants. Since $E(X_t^2 \mid \mathcal{F}_{t-1}) = E(X_t^2) = \sigma^2$, a constant which we assume finite, $S_n = \sum_{t=1}^n X_t/a_t$ is an a.s. convergent martingale whenever $\sum_{t=1}^\infty 1/a_t^2 < \infty$. For example, $a_t = t$ would satisfy the requirement. □
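A small simulation illustrates Example 15.12. This is a sketch under assumptions that are not in the text: the shocks are taken to be fair $\pm 1$ draws (so $\sigma^2 = 1$), and the function names and sample sizes are hypothetical. With $a_t = t$ the summed variances stay below $\pi^2/6$, and the late part of a simulated path barely moves.

```python
import math
import random


def weighted_partial_sums(n, a, rng):
    """Path S_1, ..., S_n of S_n = sum_t X_t / a(t), with X_t fair +/-1 draws."""
    s, path = 0.0, []
    for t in range(1, n + 1):
        s += rng.choice((-1.0, 1.0)) / a(t)
        path.append(s)
    return path


def summed_variances(n, a, sigma2=1.0):
    """sum_{t<=n} E(X_t^2) / a(t)^2 = sigma2 * sum_t a(t)^(-2)."""
    return sigma2 * sum(1.0 / a(t) ** 2 for t in range(1, n + 1))


rng = random.Random(1)
path = weighted_partial_sums(5000, lambda t: t, rng)
# Oscillation of the second half of the path; small when the path has settled.
late_range = max(path[2500:]) - min(path[2500:])
print(summed_variances(5000, lambda t: t), math.pi ** 2 / 6, late_range)
```

Replacing `lambda t: t` by `lambda t: 1.0` gives the random walk of 15.1, whose summed variances grow linearly in `n`.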

In the almost sure case of Theorem 15.11 (when $P(C) = P(D) = 1$), the summability of the conditional variances transfers to that of the ordinary variances, $\sigma_t^2 = E(X_t^2)$. Also when $E(\sup_t X_t^2) < \infty$, the summability of the conditional variances is almost equivalent to the summability of the $X_t^2$ themselves. These are consequences of the following pair of useful results.

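15.13 Theorem Let $\{Z_t\}$ be any non-negative stochastic sequence.

(i) $\sum_{t=1}^\infty E(Z_t) < \infty$ if and only if $\sum_{t=1}^\infty E(Z_t \mid \mathcal{F}_{t-1}) < \infty$ a.s.

(ii) If $E(\sup_t Z_t) < \infty$ then $P(D \,\triangle\, E) = 0$, where

$$D = \left\{\omega: \sum_{t=1}^\infty E(Z_t \mid \mathcal{F}_{t-1})(\omega) < \infty\right\},$$

$$E = \left\{\omega: \sum_{t=1}^\infty Z_t(\omega) < \infty\right\}.$$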

Proof (i) The first of the sums is the expected value of the second, so the 'only if' part is immediate. Since $E(Z_t \mid \mathcal{F}_{t-1})$ is undefined unless $E(Z_t) < \infty$, we may assume $\sum_{t=1}^n E(Z_t) < \infty$ for each finite $n$. These partial sums form a monotone series which either converges to a finite limit or diverges to $+\infty$. Suppose $\sum_{t=1}^n E(Z_t \mid \mathcal{F}_{t-1})$ converges a.s., implying (by the Cauchy criterion) that $\sum_{t=n+1}^{n+m} E(Z_t \mid \mathcal{F}_{t-1}) \to 0$ a.s. as $m \wedge n \to \infty$. Then $\sum_{t=n+1}^{n+m} E(Z_t) \to 0$ by the monotone convergence theorem, so that by the same criterion $\sum_{t=1}^n E(Z_t) \to \sum_{t=1}^\infty E(Z_t) < \infty$, as required.

(ii) Define the m.d. sequence $X_t = Z_t - E(Z_t \mid \mathcal{F}_{t-1})$, and let $S_n = \sum_{t=1}^n X_t$. Clearly $\sup_n S_n(\omega) \le \sum_{t=1}^\infty Z_t(\omega)$, and if the majorant side of this inequality is finite, $S_n(\omega)$ converges in almost every case, by 15.10. Given the definition of $X_t$, this implies in turn that $\sum_{t=1}^\infty E(Z_t \mid \mathcal{F}_{t-1})(\omega) < \infty$. In other words, $P(E - D) = 0$. Now apply the same argument to $-X_t = E(Z_t \mid \mathcal{F}_{t-1}) - Z_t$ to show that the reverse implication holds almost surely, and $P(D - E) = 0$ also. ∎

15.5 Martingale Inequalities

Of the many interesting results that can be proved for martingales, certain inequalities are essential tools of limit theory. Of particular importance are maximal inequalities, which place bounds on the extreme behaviour a sequence is capable of over a succession of steps. We prove two related results of this type. The first, a sophisticated cousin of the Markov inequality, was originally proved by Kolmogorov for the case where $\{X_t\}$ is an independent sequence rather than a m.d., and in this form is known as Kolmogorov's inequality.

15.14 Theorem Let $\{S_n,\mathcal{F}_n\}_1^\infty$ be a martingale. For any $p \ge 1$,

$$P\left(\max_{1\le k\le n} |S_k| > \varepsilon\right) \le \frac{E|S_n|^p}{\varepsilon^p}. \tag{15.42}$$

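Proof Define the events $A_1 = \{\omega: |S_1(\omega)| > \varepsilon\}$, and for $k = 2,\ldots,n$,

$$A_k = \left\{\omega: \max_{1\le j<k} |S_j(\omega)| \le \varepsilon,\ |S_k(\omega)| > \varepsilon\right\} \in \mathcal{F}_k.$$

The collection $A_1,\ldots,A_n$ is disjoint, and

$$\bigcup_{k=1}^n A_k = \left\{\max_{1\le k\le n} |S_k| > \varepsilon\right\}. \tag{15.43}$$

Since $A_k \subseteq \{|S_k| > \varepsilon\}$, the Markov inequality (9.10) gives

$$P(A_k) \le \varepsilon^{-p} E(|S_k|^p 1_{A_k}). \tag{15.44}$$

By 15.5, $|S_n|^p$ for $p \ge 1$ is a submartingale, so $|S_k|^p \le E(|S_n|^p \mid \mathcal{F}_k)$ a.s., for $1 \le k \le n$. Since $A_k \in \mathcal{F}_k$, it follows that

$$E(|S_k|^p 1_{A_k}) \le E\big(E(|S_n|^p \mid \mathcal{F}_k) 1_{A_k}\big) = E(|S_n|^p 1_{A_k}), \tag{15.45}$$

where the equality applies (10.18). Noting $\sum_{k=1}^n 1_{A_k} = 1_{\bigcup_{k=1}^n A_k}$, we obtain from (15.43)-(15.45), as required,

$$P\left(\max_{1\le k\le n} |S_k| > \varepsilon\right) = \sum_{k=1}^n P(A_k) \le \varepsilon^{-p} \sum_{k=1}^n E(|S_n|^p 1_{A_k}) = \varepsilon^{-p} E\big(|S_n|^p 1_{\bigcup_{k=1}^n A_k}\big) \le \varepsilon^{-p} E(|S_n|^p). \ ∎ \tag{15.46}$$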

The second result converts the probability bound of 15.14 into a moment inequality.

15.15 Doob's inequality Let $\{S_n,\mathcal{F}_n\}_1^\infty$ be a martingale. For $p > 1$,

$$E\left(\max_{1\le k\le n} |S_k|^p\right) \le \left(\frac{p}{p-1}\right)^p E|S_n|^p. \tag{15.47}$$

Proof Consider the penultimate member of (15.46) for the case $p = 1$, that is,

$$P\left(\max_{1\le k\le n} |S_k| > \varepsilon\right) \le \varepsilon^{-1} E\big(|S_n| 1_{\{\max_{1\le k\le n}|S_k| > \varepsilon\}}\big), \tag{15.48}$$

and apply the following ingenious lemma involving integration by parts.

15.16 Lemma Let $X$ and $Y$ be non-negative r.v.s. If $P(X > \varepsilon) \le \varepsilon^{-1} E(Y 1_{\{X > \varepsilon\}})$ for all $\varepsilon > 0$, then $E(X^p) \le [p/(p-1)]^p E(Y^p)$, for $p > 1$.

Proof Letting $F_X$ denote the marginal c.d.f. of $X$, and integrating by parts, using $d(1 - F_X) = -dF_X$ and $[\xi^p(1 - F_X(\xi))]_0^\infty = 0$,

$$E(X^p) = \int_0^\infty \xi^p\,dF_X(\xi) = -\int_0^\infty \xi^p\,d(1 - F_X(\xi)) = p\int_0^\infty \xi^{p-1}(1 - F_X(\xi))\,d\xi = p\int_0^\infty \xi^{p-1} P(X > \xi)\,d\xi. \tag{15.49}$$

Define the function $1_{\{x > \xi\}}(x) = 1$ when $x > \xi$, and 0 otherwise. Letting $F_{X,Y}$ denote the joint c.d.f. of $X$ and $Y$ and substituting the assumption of the lemma into (15.49), we have

$$\begin{aligned} E(X^p) &\le p\int_0^\infty \xi^{p-2} E(Y 1_{\{X > \xi\}})\,d\xi \\ &= p\int_0^\infty \xi^{p-2}\left(\int_{(\mathbb{R}^2)^+} y\,1_{\{x > \xi\}}(x)\,dF_{X,Y}(x,y)\right)d\xi \\ &= p\int_{(\mathbb{R}^2)^+} y\left(\int_0^x \xi^{p-2}\,d\xi\right)dF_{X,Y}(x,y) \\ &= \frac{p}{p-1}\int_{(\mathbb{R}^2)^+} y\,x^{p-1}\,dF_{X,Y}(x,y) \\ &= \frac{p}{p-1} E(YX^{p-1}). \end{aligned} \tag{15.50}$$

Here $(\mathbb{R}^2)^+$ denotes the non-negative orthant of $\mathbb{R}^2$, or $[0,\infty) \times [0,\infty)$. The second equality is permitted by Tonelli's theorem, noting that the function $F_{X,Y}$ defines a $\sigma$-finite product measure on $(\mathbb{R}^3)^+$. By Hölder's inequality,

$$E(YX^{p-1}) \le E^{1/p}(Y^p)\,E^{(p-1)/p}(X^p).$$

Substituting into the majorant side of (15.50) and simplifying gives the result. ∎

To complete the proof of 15.15, apply the lemma to (15.48) to yield (15.47), putting $X = \max_{1\le k\le n} |S_k|$ and $Y = |S_n|$. ∎
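For a concrete check of 15.14 and 15.15 with $p = 2$, both sides can be computed exactly for a short $\pm 1$ random walk by enumerating every path. This is only an illustrative sketch; the walk, the horizon $n = 10$, the threshold $\varepsilon = 4$ and the function name are assumptions made here, not part of the text.

```python
from itertools import product


def exact_maximal_stats(n, eps):
    """Enumerate all 2**n equally likely +/-1 paths of length n.

    Returns (P(max_k |S_k| > eps), E(S_n^2), E(max_k S_k^2)).
    """
    hits, sum_sq, sum_max_sq = 0, 0, 0
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for x in steps:
            s += x
            running_max = max(running_max, abs(s))
        hits += running_max > eps
        sum_sq += s * s
        sum_max_sq += running_max * running_max
    total = 2 ** n
    return hits / total, sum_sq / total, sum_max_sq / total


prob, second_moment, max_second_moment = exact_maximal_stats(10, 4)
print(prob, second_moment / 4 ** 2)           # the two sides of (15.42), p = 2
print(max_second_moment, 4 * second_moment)   # the two sides of (15.47), p = 2
```

The factor 4 in the last line is $(p/(p-1))^p$ evaluated at $p = 2$.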

Because of the orthogonality of the differences, we have the interesting property of a martingale $\{S_n\}_1^\infty$ that

$$E(S_n^2) = E\left(\sum_{t=1}^n X_t^2\right), \tag{15.51}$$

where, with $S_0 = 0$, $X_t = S_t - S_{t-1}$. This lets us extend the last two inequalities for the case $p = 2$, to link $P(\max_{1\le k\le n} |S_k| > \varepsilon)$ and $E(\max_{1\le k\le n} S_k^2)$ directly with the variance of the increments. It would be most useful if this type of property extended to other values of $p$, in particular for $p \in (0,2)$.

One approach to this problem is the von Bahr-Esseen inequality of §11.5. Obviously, 11.15 has a direct application to martingales.

15.17 Theorem If $\{X_t,\mathcal{F}_t\}_1^\infty$ is a m.d. sequence and $S_n = \sum_{t=1}^n X_t$,

$$E|S_n|^p \le 2\sum_{t=1}^n E|X_t|^p \tag{15.52}$$

for $0 < p \le 2$.

Proof This is by iterating 11.15 with $Y = X_n$, $\mathcal{G} = \mathcal{F}_{n-1}$, and $X = S_{n-1}$, as in the argument leading to (11.65); note that the latter holds for m.d. sequences just as for independent sequences. ∎

Another route to this type of result is Burkholder's inequality (Burkholder 1973).

15.18 Theorem Let $\{S_n,\mathcal{F}_n\}_1^\infty$ be a martingale with increments $X_t = S_t - S_{t-1}$, and $S_0 = 0$. For $0 < p \le 1$ there exist positive constants $c_p$ and $C_p$, depending only on $p$, such that

$$c_p E\left(\sum_{t=1}^n X_t^2\right)^p \le E|S_n|^{2p} \le C_p E\left(\sum_{t=1}^n X_t^2\right)^p. \tag{15.53}$$

On the majorant side, this extends by the $c_r$ inequality to

$$E|S_n|^{2p} \le C_p E\left(\sum_{t=1}^n X_t^2\right)^p \le C_p \sum_{t=1}^n E|X_t|^{2p}, \quad 0 < p \le 1, \tag{15.54}$$

which differs from (15.52) only in the specified constant (see note 18). In fact, the Burkholder inequality holds for $p > 1$ also, although we shall not need to use this result and extending the proof is fairly laborious. Concavity of $(\cdot)^p$ becomes convexity, so that the arguments have to be applied in reverse. Readers may like to attempt this as an exercise.

The proof employs the following non-probabilistic lemma.

15.19 Lemma Let $\{y_t\}_1^\infty$ be a sequence of non-negative numbers with $y_1 > 0$, and let $Y_n = \sum_{t=1}^n y_t$ for $n \ge 1$ and $Y_0 = 0$. Then, for $0 < p \le 1$,

$$Y_n^p \le y_1^p + p\sum_{t=2}^n Y_{t-1}^{p-1} y_t \le (1 + B_p) Y_n^p, \tag{15.55}$$

where $B_p \ge 0$ is a finite constant depending only on $p$.
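The left-hand inequality of (15.55) is easy to check numerically. The sketch below is illustrative only (the helper name and the test sequence are assumptions); it returns $Y_n^p$ together with the middle member of (15.55), so that their ratio can be inspected for any non-negative sequence with $y_1 > 0$.

```python
def lemma_15_19_terms(y, p):
    """Return (Y_n^p, y_1^p + p * sum_{t>=2} Y_{t-1}^(p-1) * y_t) for 0 < p <= 1."""
    if y[0] <= 0:
        raise ValueError("the lemma requires y_1 > 0")
    running_total = y[0]          # Y_1 = y_1
    middle = y[0] ** p
    for y_t in y[1:]:
        middle += p * running_total ** (p - 1) * y_t   # uses Y_{t-1}
        running_total += y_t                           # now Y_t
    return running_total ** p, middle


lower, middle = lemma_15_19_terms([1.0, 1.0, 1.0, 1.0], 0.5)
print(lower, middle, middle / lower)   # the ratio is bounded by 1 + B_p
```

For $p = 1$ the two returned values coincide, matching the remark that $B_p = 0$ in that case.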

Proof For $p = 1$ this is trivial, with $B_p = 0$. Otherwise, expand $Y_n^p = (Y_{n-1} + y_n)^p$ in a Taylor series of first order to get

$$Y_n^p = Y_{n-1}^p + p(Y_{n-1} + \theta_n y_n)^{p-1} y_n, \tag{15.56}$$

where $\theta_n \in [0,1]$. Solving the difference equation in (15.56) yields

$$Y_n^p = y_1^p + p\sum_{t=2}^n (Y_{t-1} + \theta_t y_t)^{p-1} y_t. \tag{15.57}$$

Defining

$$K_t = Y_{t-1}^{p-1} - (Y_{t-1} + \theta_t y_t)^{p-1}, \tag{15.58}$$

we obtain the result by showing that

$$0 \le p\sum_{t=2}^n K_t y_t \le B_p Y_n^p. \tag{15.59}$$

The left-hand inequality is immediate. For the right-hand one, note that $y^r - x^r \le (y - x)^r$, for $y > x > 0$ and $0 < r \le 1$ (see (9.63)). Hence, (15.58) implies that

$$K_t = \frac{1}{Y_{t-1}^{1-p}} - \frac{1}{(Y_{t-1} + \theta_t y_t)^{1-p}} \le \big(Y_{t-1}(Y_{t-1} + \theta_t y_t)\big)^{p-1} \theta_t^{1-p} y_t^{1-p}. \tag{15.60}$$

It follows that

$$0 \le K_t y_t \le Y_{t-1}^{2p-2} y_t^{2-p}, \tag{15.61}$$

and hence

$$Y_n^{-p}\, p\sum_{t=2}^n K_t y_t \le p\sum_{t=2}^n (y_t/Y_{t-1})^{2-2p} (y_t/Y_n)^p = p\sum_{t=2}^n (y_t'/Y_{t-1}')^{2-2p} (y_t')^p \le B_p(n), \tag{15.62}$$

where $y_t' = y_t/Y_n$ for $t = 1,\ldots,n$ is a collection of non-negative numbers summing to 1, $Y_t' = \sum_{s=1}^t y_s'$, and $B_p(n)$ denotes the supremum of the indicated sum over all such collections, given $p$ and $n$.

The terms $y_t'/Y_{t-1}' = y_t/Y_{t-1}$ for $t \ge 2$ are finite since $y_1 > 0$. If at most a finite number of the $y_t$ are positive the majorant side of (15.62) is certainly finite, so assume otherwise. Without loss of generality, by reinterpreting $n$ if necessary as the number of nonzero terms, we can also assume $y_t > 0$ for every $t$. Then, $y_t'/Y_{t-1}' = O(t^{-1})$ and $y_t' = O(n^{-1})$, and applying 2.27 yields the result $B_p(n) = O(1)$ for all $p \in (0,1)$. Putting $B_p = \sup_n B_p(n) < \infty$ completes the proof. ∎

Proof of 15.18 Put $A_n = \sum_{t=1}^n X_t^2$, and for $\varepsilon > 0$ and $\delta > 0$, set

$$Y_n = \varepsilon + S_n^2 + \delta(\varepsilon + A_n) = (1 + \delta)(\varepsilon + A_n) + 2\sum_{t=2}^n S_{t-1}X_t, \tag{15.63}$$

so that in the notation of 15.19, $y_t = Y_t - Y_{t-1} = (1 + \delta)X_t^2 + 2S_{t-1}X_t$ for $t \ge 2$, with $y_1 = (1 + \delta)(\varepsilon + X_1^2) > 0$. Then by the left-hand inequality of (15.55),

$$Y_n^p \le (1 + \delta)^p(\varepsilon + X_1^2)^p + (1 + \delta)p\sum_{t=2}^n Y_{t-1}^{p-1}X_t^2 + 2p\sum_{t=2}^n Y_{t-1}^{p-1}S_{t-1}X_t. \tag{15.64}$$

However, $\varepsilon$ is arbitrary in (15.64), and we may allow it to approach 0. Taking expectations through, using the law of iterated expectations, and the facts that $E(X_t \mid \mathcal{F}_{t-1}) = 0$ and that $(\cdot)^{p-1}$ is decreasing in its argument, we obtain

$$\begin{aligned} E(S_n^2 + \delta A_n)^p &\le E\left[(1 + \delta)^p|X_1|^{2p} + (1 + \delta)p\sum_{t=2}^n (S_{t-1}^2 + \delta A_{t-1})^{p-1}X_t^2\right] \\ &\le E\left[(1 + \delta)^p|X_1|^{2p} + (1 + \delta)\delta^{p-1}p\sum_{t=2}^n A_{t-1}^{p-1}X_t^2\right]. \end{aligned} \tag{15.65}$$

But if we put now $Y_n = \varepsilon + A_n$, with $y_1 = \varepsilon + X_1^2$ and $y_t = X_t^2$ for $t \ge 2$, the right-hand inequality of (15.55) yields (again, as the limiting case as $\varepsilon \downarrow 0$)

$$E\left[|X_1|^{2p} + p\sum_{t=2}^n A_{t-1}^{p-1}X_t^2\right] \le (1 + B_p)E(A_n^p), \tag{15.66}$$

and since $(1 + \delta)^p \le (1 + \delta)\delta^{p-1}$, this combines with (15.65) to give

$$(1 + \delta)\delta^{p-1}(1 + B_p)E(A_n^p) \ge E(S_n^2 + \delta A_n)^p \ge 2^{p-1}\big[E|S_n|^{2p} + \delta^pE(A_n^p)\big], \tag{15.67}$$

where the second inequality is by the concavity of the function $(\cdot)^p$ for $p \le 1$. Rearrangement yields

$$E|S_n|^{2p} \le \delta^p\big[2^{1-p}(1 + B_p)(1 + \delta)/\delta - 1\big]E(A_n^p), \tag{15.68}$$

which is the right-hand inequality in (15.53), where $C_p$ is given by choosing $\delta$ to minimize the expression on the majorant side of (15.68).

In a similar manner, combining the right-hand inequality of (15.55) with $Y_n = \varepsilon + S_n^2$ with (15.65) and (15.67), and using concavity, yields

$$\begin{aligned} (1 + \delta)(1 + B_p)E|S_n|^{2p} &\ge (1 + \delta)E\left[|X_1|^{2p} + p\sum_{t=2}^n |S_{t-1}|^{2(p-1)}X_t^2\right] \\ &\ge E\left[(1 + \delta)^p|X_1|^{2p} + (1 + \delta)p\sum_{t=2}^n (S_{t-1}^2 + \delta A_{t-1})^{p-1}X_t^2\right] \\ &\ge E(S_n^2 + \delta A_n)^p \\ &\ge 2^{p-1}\big[E|S_n|^{2p} + \delta^pE(A_n^p)\big], \end{aligned} \tag{15.69}$$

which rearranges as

$$E|S_n|^{2p} \ge \delta^p\big[2^{1-p}(1 + \delta)(1 + B_p) - 1\big]^{-1}E(A_n^p), \tag{15.70}$$

which is the left-hand inequality of (15.53), with $c_p$ given by choosing $\delta$ to maximize the expression on the minorant side. ∎

For the case $p = 1$, $B_p = 0$ identically in (15.55) and $c_1 = C_1 = 1$ for any $\delta$, reproducing the known orthogonality property.

Our final result is a so-called exponential inequality. This gives a probability bound for martingale processes whose increments are a.s. bounded, which is accordingly related directly to the bounding constants, rather than to absolute moments.

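15.20 Theorem If $\{X_t,\mathcal{F}_t\}_1^\infty$ is a m.d. sequence with $|X_t| \le B_t$ a.s., where $\{B_t\}$ is a sequence of positive constants, and $S_n = \sum_{t=1}^n X_t$,

$$P(|S_n| > \varepsilon) \le 2\exp\left\{-\varepsilon^2 \Big/ 2\sum_{t=1}^n B_t^2\right\}. \ □ \tag{15.71}$$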

This is due, in a slightly different form, to Azuma (1967), although the corresponding result for independent sequences is Hoeffding's inequality (Hoeffding 1963). The chief interest of these results is the fact that the tail probabilities decline exponentially as $\varepsilon$ increases. To fix ideas, consider the case $B_t = B$ for all $t$, so that the probability bound becomes $P(|S_n| > \varepsilon) \le 2\exp\{-\varepsilon^2/2nB^2\}$. This inequality is trivial when $n$ is small, since of course $P(|S_n| > nB) = 0$ by construction. However, choosing $\varepsilon = O(n^{1/2})$ allows us to estimate the tail probabilities associated with the quantity $n^{-1/2}S_n$. The fact that these are becoming exponential suggests an interesting connection with the central limit results to be studied in Chapter 24.
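To see how the bound in (15.71) compares with an actual tail, consider the simplest bounded m.d. sequence, independent fair $\pm 1$ steps, for which $B_t = 1$ and the tail probability is an exact binomial sum. The sketch below is illustrative; the function names and the values $n = 100$, $\varepsilon = 20$ are assumptions made here.

```python
import math


def exact_tail(n, eps):
    """P(|S_n| > eps) when S_n is a sum of n independent fair +/-1 steps."""
    # With k up-steps, S_n = 2k - n.
    count = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) > eps)
    return count / 2 ** n


def azuma_bound(eps, bounds):
    """Right-hand side of (15.71) for bounding constants B_1, ..., B_n."""
    return 2.0 * math.exp(-eps ** 2 / (2.0 * sum(b * b for b in bounds)))


n, eps = 100, 20   # eps is of order n**0.5, as in the discussion above
print(exact_tail(n, eps), azuma_bound(eps, [1.0] * n))
```

The bound is not tight, but it decays at the same exponential rate in $\varepsilon^2/n$ as the exact tail.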

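Proof of 15.20 By convexity, every $x \in (-B_t,B_t)$ satisfies

$$e^{\alpha x} \le \frac{(B_t + x)e^{\alpha B_t} + (B_t - x)e^{-\alpha B_t}}{2B_t} \tag{15.72}$$

for any $\alpha > 0$. Hence by the m.d. property,

$$E(e^{\alpha X_t} \mid \mathcal{F}_{t-1}) \le \tfrac{1}{2}(e^{\alpha B_t} + e^{-\alpha B_t}) \le \exp\{\tfrac{1}{2}\alpha^2B_t^2\} \text{ a.s.}, \tag{15.73}$$

where the second inequality can be verified using the series expansion of the exponential function. Now employ a neat recursion of 10.10:

$$\begin{aligned} E(e^{\alpha S_n} \mid \mathcal{F}_{n-1}) &= E(e^{\alpha S_{n-1} + \alpha X_n} \mid \mathcal{F}_{n-1}) \\ &= e^{\alpha S_{n-1}} E(e^{\alpha X_n} \mid \mathcal{F}_{n-1}) \\ &\le e^{\alpha S_{n-1}} \exp\{\tfrac{1}{2}\alpha^2B_n^2\} \text{ a.s.} \end{aligned} \tag{15.74}$$

Generalizing this idea yields

$$\begin{aligned} E(e^{\alpha S_n}) &= E\big(E(\ldots E(E(e^{\alpha S_n} \mid \mathcal{F}_{n-1}) \mid \mathcal{F}_{n-2}) \ldots \mid \mathcal{F}_1)\big) \\ &\le \exp\{\tfrac{1}{2}\alpha^2B_n^2\}\, E\big(E(\ldots E(e^{\alpha S_{n-1}} \mid \mathcal{F}_{n-2}) \ldots \mid \mathcal{F}_1)\big) \\ &\le \exp\Big\{\tfrac{1}{2}\alpha^2\sum_{t=1}^n B_t^2\Big\}. \end{aligned} \tag{15.75}$$

Combining (15.75) with the generalized Markov inequality 9.11 gives

$$P(S_n > \varepsilon) \le \exp\Big\{-\alpha\varepsilon + \tfrac{1}{2}\alpha^2\sum_{t=1}^n B_t^2\Big\} \tag{15.76}$$

for $\varepsilon > 0$, which for the choice $\alpha = \varepsilon/\sum_{t=1}^n B_t^2$ becomes

$$P(S_n > \varepsilon) \le \exp\Big\{-\varepsilon^2 \Big/ 2\sum_{t=1}^n B_t^2\Big\}. \tag{15.77}$$

The result follows on repeating the argument of (15.75)-(15.76) in respect of $-S_n$ and summing the two inequalities. ∎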

A practical application of this sort of result is to team it with a truncation or uniform integrability argument, under which the probabilities of the bound $B$ being exceeded can also be suitably controlled.

16 Mixingales

16.1 Definition and Examples

Martingale differences are sequences of a rather special kind. One-step-ahead unpredictability is not a feature we can always expect to encounter in observed time series. In this chapter we generalize to a concept of asymptotic unpredictability.

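16.1 Definition On a probability space $(\Omega,\mathcal{F},P)$, the sequence of pairs $\{X_t,\mathcal{F}_t\}_{-\infty}^{\infty}$, where $\{\mathcal{F}_t\}$ is an increasing sequence of $\sigma$-subfields of $\mathcal{F}$ and the $X_t$ are integrable r.v.s, is called an $L_p$-mixingale if, for $p \ge 1$, there exist sequences of non-negative constants $\{c_t\}_{-\infty}^{\infty}$ and $\{\zeta_m\}_0^\infty$ such that $\zeta_m \to 0$ as $m \to \infty$, and

$$\|E(X_t \mid \mathcal{F}_{t-m})\|_p \le c_t\zeta_m \tag{16.1}$$

$$\|X_t - E(X_t \mid \mathcal{F}_{t+m})\|_p \le c_t\zeta_{m+1} \tag{16.2}$$

hold for all $t$, and $m \ge 0$. □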

A martingale difference is a mixingale having $\zeta_m = 0$ for all $m > 0$. Indeed, 'mixingale differences' might appear the more logical terminology, but for the fact that the counterpart of the martingale (i.e. the cumulation of a mixingale sequence) does not play any direct role in this theory. The present terminology, due to Donald McLeish who invented the concept, is standard. Many of the results of this chapter are basically due to McLeish, although his theorems are for the case $p = 2$.

Unlike martingales, mixingales form a very general class of stochastic processes; many of the processes for which limit theorems are known to hold can be characterized as mixingales, although supplementary conditions are generally needed. Note that mixingales are not adapted sequences, in general. $X_t$ is not assumed to be $\mathcal{F}_t$-measurable, although if it is, (16.2) holds trivially for every $m \ge 0$. The mixingale property captures the idea that the sequence $\{\mathcal{F}_s\}$ contains progressively more information about $X_t$ as $s$ increases; in the remote past nothing is known according to (16.1), whereas in the remote future everything will eventually be known according to (16.2).

The constants $c_t$ are scaling factors to make the choice of $\zeta_m$ scale-independent, and multiples of $\|X_t\|_p$ will often fulfil this role. As for mixing processes (see §14.1), we usually say that the sequence is of size $-\varphi_0$ if $\zeta_m = O(m^{-\varphi})$ for $\varphi > \varphi_0$. However, the discussion following (14.6) also applies to this case.

16.2 Example Consider a linear process

$$X_t = \sum_{j=-\infty}^{\infty} \theta_jU_{t-j}, \tag{16.3}$$

where $\{U_s\}_{-\infty}^{\infty}$ is an $L_p$-bounded martingale difference sequence, with $p \ge 1$. Also let $\mathcal{F}_t = \sigma(U_s,\ s \le t)$. Then

$$E(X_t \mid \mathcal{F}_{t-m}) = \sum_{j=m}^{\infty} \theta_jU_{t-j}, \text{ a.s.} \tag{16.4}$$

$$X_t - E(X_t \mid \mathcal{F}_{t+m}) = \sum_{j=m+1}^{\infty} \theta_{-j}U_{t+j}, \text{ a.s.} \tag{16.5}$$

Assuming $\{U_s\}_{-\infty}^{\infty}$ to be uniformly $L_p$-bounded, the Minkowski inequality shows that (16.1) and (16.2) are satisfied with $c_t = \sup_s \|U_s\|_p$ for every $t$, and $\zeta_m = \sum_{j=m}^{\infty}(|\theta_j| + |\theta_{-j}|)$. $\{X_t,\mathcal{F}_t\}$ is therefore an $L_p$-mixingale if $\sum_{j=m}^{\infty}(|\theta_j| + |\theta_{-j}|) \to 0$ as $m \to \infty$, and hence if the coefficients $\{\theta_j\}_{-\infty}^{\infty}$ are absolutely summable. The 'one-sided' process in which $\theta_j = 0$ for $j < 0$ arises more commonly in the econometric modelling context. In this case $X_t$ is $\mathcal{F}_t$-measurable and $X_t - E(X_t \mid \mathcal{F}_{t+m}) = 0$ a.s., but we may set $c_t = \sup_{s\le t} \|U_s\|_p$ which may increase with $t$, and does not have to be bounded in the limit to satisfy the definition. To prove $X_t$ integrable, given integrability of the $U_s$, requires the absolute summability of the coefficients, and in this sense, integrability is effectively sufficient for a linear process to be an $L_1$-mixingale. □
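The mixingale coefficients of Example 16.2 can be tabulated directly from the weights. The following sketch is illustrative only: the function name, the truncation point and the geometric weights $\theta_j = 0.5^j$ ($j \ge 0$) are assumptions, and the infinite sum is replaced by a finite one.

```python
def zeta(theta, m, cutoff):
    """Truncated zeta_m = sum_{j=m}^{cutoff} (|theta(j)| + |theta(-j)|).

    `theta` maps an integer lag j (positive or negative) to the weight theta_j.
    Following the formula in the text literally, the j = 0 term counts
    |theta_0| twice, so the illustration below uses m >= 1.
    """
    return sum(abs(theta(j)) + abs(theta(-j)) for j in range(m, cutoff + 1))


def geometric_weights(j):
    """Hypothetical one-sided weights: theta_j = 0.5**j for j >= 0, else 0."""
    return 0.5 ** j if j >= 0 else 0.0


for m in (1, 2, 3, 10):
    print(m, zeta(geometric_weights, m, 200))   # equals 2 * 0.5**m up to truncation
```

Geometric weights give $\zeta_m$ that decays faster than any power of $m$, so such a process is a mixingale of any size.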

We could say that mixingales are to mixing processes as martingale differences are to independent processes; in each case, a restriction on arbitrary dependence is replaced by a restriction on a simple type of dependence, predictability of the level of the process. Just as martingale differences need not be independent, so mixingales need not be mixing. However, application of 14.2 shows that a mixing zero-mean process is an adapted $L_p$-mixingale for some $p \ge 1$ with respect to the subfields $\mathcal{F}_t = \sigma(X_t,X_{t-1},\ldots)$, provided it is bounded in the relevant norm.

To be precise, the mean deviations of any $L_r$-bounded sequence which is $\alpha$-mixing of size $-\varphi$, for $r > 1$, form an $L_p$-mixingale of size $-\varphi(1/p - 1/r)$ for $p$ satisfying $1 \le p < r$. If the process is also $\phi$-mixing of size $-\varphi$, application of 14.4 tightens up the mixingale size. The mean deviations of a $\phi$-mixing $L_r$-bounded sequence of size $-\varphi$ is an $L_p$-mixingale of size $-\varphi(1 - 1/r)$ for $1 \le p \le r$. The reader can supply suitable definitions of $c_t$ in each case. It is interesting that the indicated mixingale size is lower (absolutely) than the mixing size, except only in the $\phi$-mixing sequence having finite sup-norm ($L_r$-bounded for all $r$). Although these relative sizes could be an artefact of the inequalities which can be proved, rather than the sharpest available, this is not an unreasonable result. If a sequence has so many outliers that it fails to possess higher-order moments, it would not be surprising to find that it can be predicted further into the future than a sequence with the same dependence structure but more restricted variations.

The next examples show the type of case arising in the sequel.

16.3 Example An $L_r$-bounded, zero-mean adapted sequence is an $L_2$-mixingale of size $-\tfrac{1}{2}$ if either $r > 2$ and the sequence is $\alpha$-mixing of size $-r/(r-2)$, or $r \ge 2$ and it is $\phi$-mixing of size $-r/2(r-1)$. □

16.4 Example Consider for any $j \ge 0$ the adapted zero-mean sequence

$$\{X_tX_{t+j} - \sigma_{t,t+j},\ \mathcal{F}_{t+j}\},$$

where $\sigma_{t,t+j} = E(X_tX_{t+j})$, and $\{X_t\}$ is defined as in 16.3. By 14.1 this is $\alpha$-mixing ($\phi$-mixing) of the same size as $X_t$ for finite $j$, and is $L_{r/2}$-bounded, since

$$\|X_tX_{t+j}\|_{r/2} \le \|X_t\|_r\|X_{t+j}\|_r,$$

by the Cauchy-Schwartz inequality. Assuming $r > 2$ and applying 14.2, this is an $L_1$-mixingale of size $-1$ in the $\alpha$-mixing case. To get this result under $\phi$-mixing also requires a size of $-r/(r-2)$, by 14.4, but such a sequence is also $\alpha$-mixing of size $-r/(r-2)$ so there is no separate result for the $\phi$-mixing case. □

Mixingales generalize naturally from sequences to arrays.

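16.5 Definition The integrable array $\{\{X_{nt},\mathcal{F}_{nt}\}_{t=-\infty}^{\infty}\}_{n=1}^{\infty}$ is an $L_p$-mixingale if, for $p \ge 1$, there exists an array of non-negative constants $\{c_{nt}\}_{t=-\infty}^{\infty}$, and a non-negative sequence $\{\zeta_m\}_0^\infty$ such that $\zeta_m \to 0$ as $m \to \infty$, and

$$\|E(X_{nt} \mid \mathcal{F}_{n,t-m})\|_p \le c_{nt}\zeta_m \tag{16.6}$$

$$\|X_{nt} - E(X_{nt} \mid \mathcal{F}_{n,t+m})\|_p \le c_{nt}\zeta_{m+1} \tag{16.7}$$

hold for all $t$, $n$, and $m \ge 0$. □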

The other details of the definition are as in 16.1. All the relevant results for mixingales can be proved for either the sequence or the array case, and the proofs generally differ by no more than the inclusion or exclusion of the extra subscript. Unless the changes are more fundamental than this, we generally discuss the sequence case, and leave the details of the array case to the reader.

One word of caution. This is a low-level property adapted to the easy proof of convergence theorems, but it is not a useful construct at the level of time-series modelling. Although examples such as 16.4 can be exhibited, the mixingale property is not generally preserved under transformations, in the manner of 14.1 for example. Mixingales have too little structure to permit results of that sort. The mixingale concept is mainly useful in conjunction with either mixing assumptions, or approximation results of the kind to be studied in Chapter 17. There we will find that the mixingale property holds for processes for which quite general results on transformations are available.

16.2 Telescoping Sum Representations

Mixingale theory is useful mainly because of an ingenious approximation method. A sum of mixingales is 'nearly' a martingale process, involving a remainder which can be neglected asymptotically under various assumptions limiting the dependence. For the sake of brevity, let $E_sX_t$ stand for $E(X_t \mid \mathcal{F}_s)$. Then note the simple identity, for any integrable random variable $X_t$ and any $m \ge 1$,

$$X_t = \sum_{k=-m}^{m} (E_{t+k}X_t - E_{t+k-1}X_t) + E_{t-m-1}X_t + (X_t - E_{t+m}X_t). \tag{16.8}$$

Verify that each term on the right-hand side of (16.8) appears twice with opposite signs, except for $X_t$. For any $k$, the sequence

$$\{E_{t+k}X_t - E_{t+k-1}X_t,\ \mathcal{F}_{t+k}\}_{t=1}^{\infty}$$

is a martingale difference, since $E_{t+k-1}(E_{t+k}X_t - E_{t+k-1}X_t) = 0$ by the LIE. When $\{X_t,\mathcal{F}_t\}$ is a mixingale, the remainder terms can be made negligible by taking $m$ large enough. Observe that $\{E_{t+m}X_t,\mathcal{F}_{t+m}\}_{m=-\infty}^{\infty}$ is a martingale, and since $\sup_m E|E_{t+m}X_t| \le E|X_t| < \infty$ by 10.14, it converges a.s. both as $m \to \infty$ and as $m \to -\infty$, by 15.7 and 15.9, respectively. In view of the fact that $\|E_{t-m}X_t\|_p \to 0$ and $\|X_t - E_{t+m}X_t\|_p \to 0$, the respective a.s. limits must be 0 and $X_t$, and hence we are able to assert that

$$X_t = \sum_{k=-\infty}^{\infty} (E_{t+k}X_t - E_{t+k-1}X_t), \text{ a.s.} \tag{16.9}$$

Letting $S_n = \sum_{t=1}^n X_t$ we similarly have the decomposition

$$S_n = \sum_{k=-m}^{m} Y_{nk} + \sum_{t=1}^n E_{t-m-1}X_t + \sum_{t=1}^n (X_t - E_{t+m}X_t) \tag{16.10}$$

where

$$Y_{nk} = \sum_{t=1}^n (E_{t+k}X_t - E_{t+k-1}X_t), \tag{16.11}$$

and the processes $\{Y_{nk},\mathcal{F}_{n+k}\}$ are martingales for each $k$. By taking $m$ large enough, for fixed $n$, the remainders can again be made as small as desired. The advantage of this approach is that martingale properties can be exploited in studying the convergence characteristics of sequences of the type $S_n$. Results of this type are elaborated in §16.3 and §16.4.
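The identity (16.8) can be made concrete for a one-sided moving average, where the conditional expectations are available in closed form. The sketch below assumes a finite-order process $X_t = \sum_{j=0}^{q}\theta_jU_{t-j}$ with m.d. shocks, so that $E_sX_t$ simply drops the shocks dated after $s$; the function names, weights and Gaussian shocks are hypothetical choices for illustration.

```python
import random


def cond_exp(theta, u, t, s):
    """E(X_t | F_s) for X_t = sum_{j=0}^{q} theta[j] * u[t - j].

    With {U_s} a martingale difference sequence and F_s generated by shocks
    dated s or earlier, future shocks have conditional mean zero, so only the
    terms with t - j <= s survive.
    """
    return sum(th * u[t - j] for j, th in enumerate(theta) if t - j <= s)


def telescoping_parts(theta, u, t, m):
    """The three pieces on the right-hand side of (16.8)."""
    md_sum = sum(cond_exp(theta, u, t, t + k) - cond_exp(theta, u, t, t + k - 1)
                 for k in range(-m, m + 1))
    past_remainder = cond_exp(theta, u, t, t - m - 1)
    x_t = cond_exp(theta, u, t, t)   # X_t itself, since the process is one-sided
    future_remainder = x_t - cond_exp(theta, u, t, t + m)
    return md_sum, past_remainder, future_remainder


rng = random.Random(7)
theta = [1.0, 0.6, 0.3, 0.1]                   # hypothetical MA(3) weights
shocks = [rng.gauss(0.0, 1.0) for _ in range(20)]
t = 10
x_t = cond_exp(theta, shocks, t, t)
for m in (1, 2, 3):
    parts = telescoping_parts(theta, shocks, t, m)
    print(m, parts, abs(sum(parts) - x_t))     # the pieces add back up to X_t
```

Once $m$ reaches the order of the moving average, both remainders vanish and $X_t$ is exactly a finite sum of martingale differences.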

If the sequence $\{X_t\}$ is stationary, the constants $\{c_t\}$ can be set to 1 with no loss of generality. In this case, a modified form of telescoping sum actually yields a representation of a partial sum of mixingales as a single martingale process, plus a remainder whose behaviour can be suitably controlled by limiting the dependence.

16.6 Theorem (after Hall and Heyde 1980: th. 5.4) Let $\{X_t,\mathcal{F}_t\}$ be a stationary $L_1$-mixingale of size $-1$. There exists the decomposition

$$X_t = W_t + Z_t - Z_{t+1}, \tag{16.12}$$

where $E|Z_1| < \infty$ and $\{W_t,\mathcal{F}_t\}$ is a stationary m.d. sequence. □

Summing (16.12) over $t = 1,\ldots,n$ gives

$$S_n = Y_n + Z_1 - Z_{n+1}, \tag{16.13}$$

where $Y_n = \sum_{t=1}^n W_t$ and $\{Y_n,\mathcal{F}_n\}$ is a martingale.

Proof Start with the identity

$$X_t = W_{mt} + Z_{mt} - Z_{m,t+1}, \tag{16.14}$$

where, for $m \ge 1$,

$$W_{mt} = \sum_{s=-m}^{m} (E_tX_{t+s} - E_{t-1}X_{t+s}) + E_tX_{t+m+1} + X_{t-m-1} - E_{t-1}X_{t-m-1}, \tag{16.15}$$

$$Z_{mt} = \sum_{s=0}^{m} (E_{t-1}X_{t+s} - X_{t-s-1} + E_{t-1}X_{t-s-1}). \tag{16.16}$$

As in (16.8), every term appears twice with different sign in (16.14), except for $X_t$. Consider the limiting cases of these random variables as $m \to \infty$, to be designated $W_t$ and $Z_t$ respectively. By stationarity,

$$E|E_{t-1}X_{t+s}| = E|E_{t-s-1}X_t|$$

and

$$E|X_{t-s-1} - E_{t-1}X_{t-s-1}| = E|X_t - E_{t+s}X_t|;$$

hence, applying the triangle inequality,

$$E|Z_t| \le \sum_{s=0}^{\infty} E|E_{t-s-1}X_t| + \sum_{s=0}^{\infty} E|X_t - E_{t+s}X_t| \le 2\sum_{s=1}^{\infty} \zeta_s < \infty. \tag{16.17}$$

Writing $W_t = X_t - Z_t + Z_{t+1}$, note that

$$E|W_t| \le E|X_t| + 2E|Z_t| < \infty, \tag{16.18}$$

and it remains to show that $W_t$ is a m.d. sequence. Applying 10.26(i) to (16.15),

$$E_{t-1}W_{mt} = E_{t-1}X_{t+m+1} \text{ a.s.}, \tag{16.19}$$

and stationarity and (16.1) imply that

$$E|E_{t-1}X_{t+m+1}| = E|E_{t-m-2}X_t| \to 0 \tag{16.20}$$

as $m \to \infty$, so that $E|E_{t-1}W_{mt}| \to 0$ also. Anticipating a result from the theory of stochastic convergence (18.6), this means that every subsequence $\{m_k,\ k \in \mathbb{N}\}$ contains a further subsequence $\{m_{k(j)},\ j \in \mathbb{N}\}$ such that $|E_{t-1}W_{m_{k(j)},t}| \to 0$ a.s. as $j \to \infty$. Since $W_{m_{k(j)},t} \to W_t$ for every such subsequence, it is possible to conclude that $E(W_t \mid \mathcal{F}_{t-1}) = 0$ a.s. This completes the proof. ∎

The technical argument in the final paragraph of this proof can be better appreciated after studying Chapter 18. It is neither possible nor necessary in this approach to assert that $E(W_{mt} \mid \mathcal{F}_{t-1}) \to 0$ a.s.

Note how taking conditional expectations of (16.12) yields

$$E(X_t \mid \mathcal{F}_{t-1}) = Z_t - Z_{t+1} \text{ a.s.} \tag{16.21}$$

It follows that $W_t$ is almost surely equal to the centred r.v. $X_t - E(X_t \mid \mathcal{F}_{t-1})$.

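16.7 Example Consider the linear process from 16.2, with $\{U_t\}$ a stationary integrable sequence. Then $X_t$ is stationary, and

$$E|X_t| \le E|U_1| \sum_{j=-\infty}^{\infty} |\theta_j| < \infty.$$

If the coefficients satisfy a stronger summability condition, i.e.

$$\sum_{m=1}^{\infty}\sum_{j=m}^{\infty} (|\theta_j| + |\theta_{-j}|) = \sum_{m=1}^{\infty} m|\theta_m| + \sum_{m=1}^{\infty} m|\theta_{-m}| < \infty, \tag{16.22}$$

then $X_t$ is an $L_1$-mixingale of size $-1$. By a rearrangement of terms we obtain the decomposition of (16.12) with

$$W_t = \left(\sum_{j=-\infty}^{\infty} \theta_j\right)U_t \tag{16.23}$$

and

$$Z_t = \sum_{m=1}^{\infty}\left[\left(\sum_{j=m}^{\infty} \theta_j\right)U_{t-m} - \left(\sum_{j=m}^{\infty} \theta_{-j}\right)U_{t+m-1}\right], \tag{16.24}$$

where $E|Z_t| < \infty$ by (16.22). □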
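The rearrangement in Example 16.7 can be verified numerically in the one-sided, finite-order case, where $\theta_j = 0$ for $j < 0$ and for $j > q$, so that the second inner sum in (16.24) vanishes. This is a sketch under those assumptions; the function name, the weights and the shocks are hypothetical.

```python
import random


def linear_process_decomposition(theta, u, t):
    """Return (X_t, W_t, Z_t, Z_{t+1}) for X_t = sum_{j=0}^{q} theta[j] * u[t - j].

    W_t follows (16.23) and Z_t follows (16.24), specialised to one-sided
    weights so that only the terms in U_{t-m} remain.
    """
    q = len(theta) - 1
    tail = [sum(theta[m:]) for m in range(q + 2)]   # tail[m] = sum_{j>=m} theta_j

    def z(r):
        return sum(tail[m] * u[r - m] for m in range(1, q + 1))

    x_t = sum(theta[j] * u[t - j] for j in range(q + 1))
    w_t = tail[0] * u[t]
    return x_t, w_t, z(t), z(t + 1)


rng = random.Random(11)
theta = [1.0, 0.5, 0.25]                       # hypothetical weights
shocks = [rng.gauss(0.0, 1.0) for _ in range(12)]
x_t, w_t, z_t, z_next = linear_process_decomposition(theta, shocks, 5)
print(x_t, w_t + z_t - z_next)                 # the two sides of (16.12)
```

Here $W_t$ is the shock scaled by the sum of all weights, and the correction $Z_t - Z_{t+1}$ telescopes away when the $X_t$ are summed, as in (16.13).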

16.3 Maximal Inequalities

As with martingales, maximal inequalities are central to applications of the mixingale concept in limit theory. The basic idea of these results is to extend Doob's inequality (15.15) by exploiting the representation as a telescoping sum of martingale differences. McLeish's idea is to let $m$ go to $\infty$ in (16.10), and accordingly write

$$S_n = \sum_{k=-\infty}^{\infty} Y_{nk}, \text{ a.s.} \tag{16.25}$$

16.8 Lemma Suppose (xalyhas the representation in (16.25).Let tJklX-x be asummable collection of non-negative real numbers, with ak = 0 if l'ak = 0 a.s., andak > 0 otherwise. For any p > 1,

# x p-1

# u # 77a >-cf-1Fl y zl#. (16.26)E max IsjI k n- 1Lsjs n b=-x ak>o


Proof For a real sequence $\{x_k\}_{-\infty}^{\infty}$ and positive real sequence $\{a_k\}_{-\infty}^{\infty}$, let $K = \sum_{k=-\infty}^{\infty} a_k$ and note that

$$\Bigl|\sum_{k=-\infty}^{\infty} x_k\Bigr|^p = K^p \Bigl|\sum_{k=-\infty}^{\infty} \frac{x_k}{a_k}\,\frac{a_k}{K}\Bigr|^p \le K^{p-1} \sum_{k=-\infty}^{\infty} a_k^{1-p} |x_k|^p, \tag{16.27}$$

where the weights $a_k/K$ sum to unity, and the inequality follows by the convexity of the power transformation (Jensen's inequality). Clearly, (16.27) remains true if the terms corresponding to zero $x_k$ are omitted from the sums, and for these cases set $a_k = 0$ without loss of generality. Put $x_k = Y_{jk}$, take the max over $1 \le j \le n$, and then take expectations, to give

$$E\Bigl(\max_{1\le j\le n} |S_j|^p\Bigr) \le \Bigl(\sum_{k=-\infty}^{\infty} a_k\Bigr)^{p-1} \sum_{a_k>0} a_k^{1-p}\, E\Bigl(\max_{1\le j\le n} |Y_{jk}|^p\Bigr). \tag{16.28}$$

To get (16.26), apply Doob's inequality on the right-hand side. ∎
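The weighted Jensen step (16.27) is easy to probe numerically. The sketch below is illustrative only: the helper name `weighted_power_bound` and the random inputs are assumptions introduced here. It returns both sides of (16.27) for finite sequences, and the case $x_k = a_k$ shows that the bound holds with equality when the ratios $x_k/a_k$ are constant.

```python
import random


def weighted_power_bound(x, a, p):
    """Return (lhs, rhs) of (16.27) for finite sequences x and positive weights a."""
    total = sum(a)                                   # K in the text
    lhs = abs(sum(x)) ** p
    rhs = total ** (p - 1) * sum(ak ** (1 - p) * abs(xk) ** p
                                 for xk, ak in zip(x, a))
    return lhs, rhs


random.seed(0)
weights = [random.uniform(0.1, 2.0) for _ in range(25)]
values = [random.gauss(0.0, 1.0) for _ in range(25)]
lhs, rhs = weighted_power_bound(values, weights, p=1.5)
print(lhs <= rhs)

# Equality case: x_k proportional to a_k.
lhs_eq, rhs_eq = weighted_power_bound(weights, weights, p=3.0)
```

Choosing the weights close to proportional to $|x_k|$ tightens the bound, which is the freedom the subsequent theorems exploit through the choice of $\{a_k\}$.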

This lemma yields the key step in the proof of the next theorem, a maximal inequality for $L_2$-mixingales. This may not appear a very appealing result at first sight, but of course the interesting applications arise by judicious choice of the sequence $\{a_k\}$.

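16.9 Theorem (McLeish 1975a: th. 1.6) Let $\{X_t, \mathcal{F}_t\}_{-\infty}^{\infty}$ be an $L_2$-mixingale, let $S_n = \sum_{t=1}^{n} X_t$, and let $\{a_k\}_0^\infty$ be any summable sequence of positive reals. Then

$$E\Bigl(\max_{1\le j\le n} S_j^2\Bigr) \le 8\Bigl(\sum_{k=0}^{\infty} a_k\Bigr)\Bigl((\zeta_0^2 + \zeta_1^2)a_0^{-1} + 2\sum_{k=1}^{\infty} \zeta_k^2 (a_k^{-1} - a_{k-1}^{-1})\Bigr) \sum_{t=1}^{n} c_t^2. \tag{16.29}$$

Proof To get a doubly infinite sequence $\{a_k\}_{-\infty}^{\infty}$, put $a_{-k} = a_k$ for $k > 0$. Then, applying 16.8 for the case $p = 2$,

$$E\Bigl(\max_{1\le j\le n} S_j^2\Bigr) \le 4\Bigl(\sum_{k=-\infty}^{\infty} a_k\Bigr) \sum_{k=-\infty}^{\infty} a_k^{-1} E(Y_{nk}^2). \tag{16.30}$$

Since the terms making up $Y_{nk}$ are martingale differences and pairwise uncorrelated, we have

$$E(Y_{nk}^2) = \sum_{t=1}^{n} E(E_{t+k}X_t - E_{t+k-1}X_t)^2. \tag{16.31}$$

Now, $E(E_{t+k}X_t\,E_{t+k-1}X_t) = E\bigl(E_{t+k-1}(E_{t+k}X_t)\,E_{t+k-1}X_t\bigr) = E\bigl[(E_{t+k-1}X_t)^2\bigr]$ by the LIE, from which it follows that

$$E(E_{t+k}X_t - E_{t+k-1}X_t)^2 = E\bigl[(E_{t+k}X_t)^2 - (E_{t+k-1}X_t)^2\bigr]. \tag{16.32}$$

Also let $Z_{tk} = X_t - E_{t+k}X_t$, and it is similarly easy to verify that

$$E(E_{t+k}X_t - E_{t+k-1}X_t)^2 = E(Z_{t,k-1} - Z_{tk})^2 = E(Z_{t,k-1}^2 - Z_{tk}^2). \tag{16.33}$$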


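Now apply Abel's partial summation formula (2.25), to get

$$\begin{aligned}
\sum_{k=-\infty}^{\infty} a_k^{-1} E(Y_{nk}^2) &= \sum_{t=1}^{n}\sum_{k=-\infty}^{\infty} a_k^{-1} E(E_{t+k}X_t - E_{t+k-1}X_t)^2 \\
&= \sum_{t=1}^{n}\Bigl(\sum_{k=0}^{\infty} a_k^{-1} E\bigl[(E_{t-k}X_t)^2 - (E_{t-k-1}X_t)^2\bigr] + \sum_{k=1}^{\infty} a_k^{-1} E(Z_{t,k-1}^2 - Z_{tk}^2)\Bigr) \\
&= \sum_{t=1}^{n}\Bigl(a_0^{-1} E\bigl[(E_tX_t)^2\bigr] + \sum_{k=1}^{\infty} E\bigl[(E_{t-k}X_t)^2\bigr](a_k^{-1} - a_{k-1}^{-1}) \\
&\qquad\quad + a_1^{-1} E(Z_{t0}^2) + \sum_{k=1}^{\infty} E(Z_{tk}^2)(a_{k+1}^{-1} - a_k^{-1})\Bigr),
\end{aligned} \tag{16.34}$$

where the second equality follows by substituting (16.32) for the cases $k \le 0$, and (16.33) for the cases $k > 0$. (16.29) now follows, noting from (16.1) that $E\bigl[(E_{t-k}X_t)^2\bigr] \le c_t^2\zeta_k^2$ and from (16.2) that $E(Z_{tk}^2) \le c_t^2\zeta_{k+1}^2$. ∎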

Putting

$$K = 8\sum_{k=0}^{\infty} a_k \Bigl(a_0^{-1}(\zeta_0^2 + \zeta_1^2) + 2\sum_{k=1}^{\infty} \zeta_k^2 (a_k^{-1} - a_{k-1}^{-1})\Bigr), \tag{16.35}$$

this result poses the question, whether there exists a summable sequence $\{a_k\}_0^\infty$ such that $K < \infty$. There is no loss of generality in letting the sequence $\{\zeta_k\}_0^\infty$ be monotone. If $\zeta_m = 0$ for $m < \infty$, then $\zeta_{m+j} = 0$ for all $j > 0$, and in this case one may choose $a_k = 1$, $k = 0,\dots,m+1$, and $K$ reduces to $8(m+1)(\zeta_0^2 + \zeta_1^2)$. Alternatively, consider the case where $\zeta_k > 0$ for every $k$. If we put $a_0 = \zeta_0$, and then define the recursion

$$a_k = \frac{\zeta_k\bigl[(\zeta_k^2 + 4a_{k-1}^2)^{1/2} - \zeta_k\bigr]}{2a_{k-1}}, \tag{16.36}$$

$a_k$ is real and positive if this is true of $a_{k-1}$, and the relation

$$a_k^{-1} - a_{k-1}^{-1} = \zeta_k^{-2} a_k \tag{16.37}$$

is satisfied for each $k$. Since $a_0^{-1}(\zeta_0^2 + \zeta_1^2) \le 2a_0$, we have

$$K = 8\sum_{k=0}^{\infty} a_k \Bigl(a_0^{-1}(\zeta_0^2 + \zeta_1^2) + 2\sum_{k=1}^{\infty} a_k\Bigr) \le 16\Bigl(\sum_{k=0}^{\infty} a_k\Bigr)^2. \tag{16.38}$$

In this case, for $k > 0$ we find

$$\zeta_k^{-2} = (a_k^{-1} - a_{k-1}^{-1})a_k^{-1} \le a_k^{-2} - a_{k-1}^{-2} \tag{16.39}$$

so that

$$\sum_{k=0}^{m} \zeta_k^{-2} \le \zeta_0^{-2} + \sum_{k=1}^{m} (a_k^{-2} - a_{k-1}^{-2}) = a_m^{-2}. \tag{16.40}$$

Substituting into (16.38), we get

$$K \le 16\Bigl(\sum_{m=0}^{\infty}\Bigl(\sum_{k=0}^{m} \zeta_k^{-2}\Bigr)^{-1/2}\Bigr)^2. \tag{16.41}$$
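The recursion (16.36) and the relations (16.37) and (16.40) it is designed to satisfy can be verified numerically. The following is a minimal sketch under stated assumptions: the function name `recursion_weights` and the test sequence $\zeta_k = (k+1)^{-3/4}$ are introduced here purely for illustration and do not appear in the text.

```python
import math


def recursion_weights(zeta):
    """Weights a_k from a_0 = zeta_0 and the recursion (16.36).

    zeta is assumed to be a positive, non-increasing list of mixingale numbers.
    """
    a = [zeta[0]]
    for k in range(1, len(zeta)):
        z, prev = zeta[k], a[-1]
        a.append(z * (math.sqrt(z * z + 4.0 * prev * prev) - z) / (2.0 * prev))
    return a


zeta = [(k + 1) ** -0.75 for k in range(200)]   # illustrative choice only
a = recursion_weights(zeta)

# Largest relative error in relation (16.37): 1/a_k - 1/a_{k-1} = a_k / zeta_k^2.
rel_err = max(abs((1 / a[k] - 1 / a[k - 1]) - a[k] / zeta[k] ** 2)
              / (a[k] / zeta[k] ** 2) for k in range(1, len(a)))

# Inequality (16.40): the partial sums of zeta_k^{-2} are dominated by a_m^{-2}.
partial, holds = 0.0, True
for m in range(len(a)):
    partial += zeta[m] ** -2
    holds = holds and partial <= a[m] ** -2 * (1 + 1e-9)
print(rel_err, holds)
```

Formula (16.36) is the positive root of the quadratic $a_{k-1}a_k^2 + \zeta_k^2 a_k - \zeta_k^2 a_{k-1} = 0$ obtained by clearing denominators in (16.37), which is why positivity of $a_{k-1}$ carries over to $a_k$.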

This result links the maximal inequality directly with the issue of the summability of the mixingale coefficients. In particular, we have the following corollary.

16.10 Corollary Let $\{X_t, \mathcal{F}_t\}$ be an $L_2$-mixingale of size $-\tfrac12$. Then

$$E\Bigl(\max_{1\le j\le n} S_j^2\Bigr) \le K \sum_{t=1}^{n} c_t^2, \tag{16.42}$$

where $K < \infty$.

Proof If $\zeta_k = O(k^{-1/2-\delta})$ for $\delta > 0$, as the theorem imposes, then $\sum_{k=1}^{m} \zeta_k^{-2} = O(m^{2+2\delta})$ and $\bigl(\sum_{k=1}^{m} \zeta_k^{-2}\bigr)^{-1/2} = O(m^{-1-\delta})$, and hence is summable over $m$. The theorem follows by (16.41). ∎

However, it should be noted that the condition

$$\sum_{m=0}^{\infty}\Bigl(\sum_{k=0}^{m} \zeta_k^{-2}\Bigr)^{-1/2} < \infty \tag{16.43}$$

is weaker than $\zeta_k = O(k^{-1/2-\delta})$. Consider the case $\zeta_k = (k+2)^{-1/2}(\log(k+2))^{-1-\varepsilon}$ for $\varepsilon > 0$, so that $k^{1/2+\delta}\zeta_k \to \infty$ for every $\delta > 0$. Then

$$\sum_{k=0}^{m} \zeta_k^{-2} = \sum_{k=0}^{m} (k+2)(\log(k+2))^{2+2\varepsilon} \le (m+2)^2(\log(m+2))^{2+2\varepsilon}, \tag{16.44}$$

and (16.43) follows by 2.31. One may therefore prefer to define the notion of 'size $= -\tfrac12$' in terms of the summability condition (16.43), rather than by orders of magnitude in $m$. However, in a practical context assigning an order of magnitude to $\zeta_m$ is a convenient way to bound the dependence, and we shall find in the sequel that these summability arguments are greatly simplified when the order-of-magnitude calculus can be routinely applied.

Theorem 16.9 has no obvious generalization from the $L_2$-mixingale case to general $L_p$ for $p > 1$, as in 15.15, because (16.31) hinges on the uncorrelatedness of the terms. But because second moments may not exist in the cases under consideration, a comparable result for $1 < p < 2$ would be valuable. This is attainable by a slightly different approach, although at the cost of raising the mixingale size from $-\tfrac12$ to $-1$; in other words, the mixingale numbers will need to be summable.

16.11 Theorem Let $\{X_t, \mathcal{F}_t\}_{-\infty}^{\infty}$ be an $L_p$-mixingale, $1 < p < 2$, of size $-1$, and let $S_n = \sum_{t=1}^{n} X_t$; then

$$E\Bigl(\max_{1\le j\le n} |S_j|^p\Bigr) \le 4^p C_p \Bigl(\frac{p}{p-1}\Bigr)^p \Bigl(\sum_{k=0}^{\infty} \zeta_k\Bigr)^p \sum_{t=1}^{n} c_t^p, \tag{16.45}$$

where $C_p$ is a positive constant.

Proof Let $Y_{nk}$ be defined as in (16.11), and apply Burkholder's inequality (15.18) and then Loève's $c_r$ inequality with $r = p/2 \in (\tfrac12, 1)$ to obtain

$$E|Y_{nk}|^p \le C_p\, E\Bigl|\sum_{t=1}^{n} (E_{t+k}X_t - E_{t+k-1}X_t)^2\Bigr|^{p/2} \le C_p \sum_{t=1}^{n} E|E_{t+k}X_t - E_{t+k-1}X_t|^p. \tag{16.46}$$

Now we have the mixingale inequalities,

$$\|E_{t+k}X_t - E_{t+k-1}X_t\|_p \le \|E_{t+k}X_t\|_p + \|E_{t+k-1}X_t\|_p \le 2c_t\zeta_{-k} \tag{16.47}$$

for $k \le 0$ and

$$\|E_{t+k}X_t - E_{t+k-1}X_t\|_p = \|Z_{t,k-1} - Z_{tk}\|_p \le \|Z_{t,k-1}\|_p + \|Z_{tk}\|_p \le 2c_t\zeta_k \tag{16.48}$$

for $k > 0$, where $Z_{tk}$ is defined above (16.33). Hence,

$$E|Y_{nk}|^p \le 2^p C_p\, \zeta_{|k|}^p \sum_{t=1}^{n} c_t^p \tag{16.49}$$

(put $\zeta_0 = 1$), and substitution in (16.26), with $\{a_k\}_0^\infty$ a positive sequence and $a_{-k} = a_k$, gives

$$E\Bigl(\max_{1\le j\le n} |S_j|^p\Bigr) \le 4^p C_p \Bigl(\frac{p}{p-1}\Bigr)^p \Bigl(\sum_{k=0}^{\infty} a_k\Bigr)^{p-1} \sum_{k=0}^{\infty} a_k^{1-p}\zeta_k^p \sum_{t=1}^{n} c_t^p. \tag{16.50}$$

Both $a_k$ and $a_k^{1-p}\zeta_k^p$ can be summable for $p > 1$ only in the case $\zeta_k = O(a_k)$, and the conclusion follows. ∎

A case of special importance is the linear process of 16.2. Here we can specialize 16.11 as follows:

16.12 Corollary For $1 < p \le 2$,

(i) if $X_t = \sum_{j=-\infty}^{\infty} \theta_j U_{t-j}$, then

$$E\Bigl(\max_{1\le j\le n} |S_j|^p\Bigr) \le C_p \Bigl(\frac{p}{p-1}\Bigr)^p \Bigl(|\theta_0| + \sum_{k=1}^{\infty} (|\theta_k| + |\theta_{-k}|)\Bigr)^p n \sup_s E|U_s|^p;$$

(ii) if $X_t = \sum_{j=0}^{\infty} \theta_j U_{t-j}$, then

$$E\Bigl(\max_{1\le j\le n} |S_j|^p\Bigr) \le C_p \Bigl(\frac{p}{p-1}\Bigr)^p \Bigl(\sum_{k=0}^{\infty} |\theta_k|\Bigr)^p \sum_{t=1}^{n} \sup_{s\le t} E|U_s|^p.$$

Proof In this case, $E_{t-k}X_t - E_{t-k-1}X_t = \theta_k U_{t-k}$. Letting $\{a_k\}_0^\infty$ be any non-negative constant sequence and $a_{-k} = a_k$,

$$\sum_{k=-\infty}^{\infty} a_k^{1-p} E|Y_{nk}|^p \le C_p \sum_{k=-\infty}^{\infty} a_k^{1-p} |\theta_k|^p \sum_{t=1}^{n} c_t^p, \tag{16.51}$$

where $c_t = \sup_s \|U_s\|_p$ in case (i), and $c_t = \sup_{s\le t} \|U_s\|_p$ in case (ii). Choosing $a_k = |\theta_k|$ and substituting in (16.26) yields the results. ∎

Recall that the mixingale coefficients in this case are $\zeta_m = \sum_{j=m}^{\infty} (|\theta_j| + |\theta_{-j}|)$, so linearity yields a dramatic relaxation of the conditions for the inequalities to be satisfied. Absolute summability of the $\theta_j$ is sufficient. This corresponds simply to $\zeta_m \to 0$. A mixingale size of zero suffices. Moreover, there is no separate result for $L_2$-bounded linear processes. Putting $p = 2$ yields a result that is correspondingly superior in terms of mixingale size restrictions to 16.11.

16.4 Uniform Square-integrability

One of the most important of McLeish's mixingale theorems is a further consequence of 16.9. It is not a maximal inequality, but belongs to the same family of results and has a related application. The question at issue is the uniform integrability of the sequence of squared partial sums.

16.13 Theorem (from McLeish 1975b: lemma 6.5; 1977: lemma 3.5) Let $\{X_t, \mathcal{F}_t\}$ be an $L_2$-mixingale of size $-\tfrac12$, $S_n = \sum_{t=1}^{n} X_t$, and $v_n^2 = \sum_{t=1}^{n} c_t^2$, where $c_t$ is defined in (16.1)–(16.2). If the sequence $\{X_t^2/c_t^2\}_{t=1}^{\infty}$ is uniformly integrable, then so is the sequence $\{\max_{1\le j\le n} S_j^2/v_n^2\}_{n=1}^{\infty}$.

Proof A preliminary step is to decompose $X_t$ into three components. Choose positive numbers $B$ and $m$ (to be specified below), let $1_t^B = 1_{\{|X_t| \le Bc_t\}}$, and then define

$$U_t = X_t - E_{t+m}X_t + E_{t-m}X_t \tag{16.52}$$

$$Y_t = E_{t+m}X_t1_t^B - E_{t-m}X_t1_t^B \tag{16.53}$$

$$Z_t = E_{t+m}X_t(1 - 1_t^B) - E_{t-m}X_t(1 - 1_t^B), \tag{16.54}$$

such that $X_t = U_t + Y_t + Z_t$. This decomposition allows us to exploit the following collection of properties. (To verify these, use various results from Chapter 10 on conditional expectations, and consider the cases $k \ge m$ and $k < m$ separately.) First,

$$E(E_{t-k}U_t)^2 = E(E_{t-(k\vee m)}X_t)^2 \le c_t^2\zeta_{k\vee m}^2, \tag{16.55}$$

$$E(U_t - E_{t+k}U_t)^2 = E(X_t - E_{t+(k\vee m)}X_t)^2 \le c_t^2\zeta_{(k\vee m)+1}^2, \tag{16.56}$$

for $k \ge 0$, where $k \vee m = \max\{k, m\}$. Second,

$$E(E_{t-k}Y_t)^2 = E\bigl[(E_{t-k}X_t1_t^B)^2 - (E_{t-(k\vee m)}X_t1_t^B)^2\bigr], \tag{16.57}$$

$$E(Y_t - E_{t+k}Y_t)^2 = E\bigl[(E_{t+m}X_t1_t^B)^2 - (E_{t+(k\wedge m)}X_t1_t^B)^2\bigr], \tag{16.58}$$

where $k \wedge m = \min\{k, m\}$. The terms are both zero if $k \ge m$ and are otherwise bounded by $E(X_t^2 1_t^B) \le B^2c_t^2$. Third,

$$E(E_{t-k}Z_t)^2 = E\bigl[(E_{t-k}X_t(1 - 1_t^B))^2 - (E_{t-(k\vee m)}X_t(1 - 1_t^B))^2\bigr], \tag{16.59}$$

$$E(Z_t - E_{t+k}Z_t)^2 = E\bigl[(E_{t+m}X_t(1 - 1_t^B))^2 - (E_{t+(k\wedge m)}X_t(1 - 1_t^B))^2\bigr], \tag{16.60}$$

where the terms are zero for $k \ge m$ and bounded by $E(X_t^2(1 - 1_t^B))$ otherwise. Note that $E(X_t^2(1 - 1_t^B))/c_t^2 = E\bigl((X_t/c_t)^2 1_{\{|X_t/c_t| > B\}}\bigr) \to 0$ as $B \to \infty$ uniformly in $t$, by the assumption of uniform integrability.

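The inequality

$$\Bigl(\sum_{t=1}^{j} X_t\Bigr)^2 \le 3\Bigl[\Bigl(\sum_{t=1}^{j} U_t\Bigr)^2 + \Bigl(\sum_{t=1}^{j} Y_t\Bigr)^2 + \Bigl(\sum_{t=1}^{j} Z_t\Bigr)^2\Bigr] \tag{16.61}$$

for $1 \le j \le n$ follows from substituting $X_t = U_t + Y_t + Z_t$, multiplying out, and applying the Cauchy-Schwarz inequality. For brevity, write

$$x_j = S_j^2/v_n^2, \quad u_j = \Bigl(\sum_{t=1}^{j} U_t\Bigr)^2\Big/v_n^2, \quad y_j = \Bigl(\sum_{t=1}^{j} Y_t\Bigr)^2\Big/v_n^2, \quad z_j = \Bigl(\sum_{t=1}^{j} Z_t\Bigr)^2\Big/v_n^2.$$

Then (16.61) is equivalent to $x_j \le 3(u_j + y_j + z_j)$, for each $j = 1,\dots,n$. Also let $\hat{x}_n = \max_{1\le j\le n} x_j$, and define $\hat{u}_n$, $\hat{y}_n$, and $\hat{z}_n$ similarly; then clearly,

$$\hat{x}_n \le 3(\hat{u}_n + \hat{y}_n + \hat{z}_n). \tag{16.62}$$

For any r.v. $X \ge 0$ and constant $M > 0$, introduce the notation $E_M(X) = E(1_{\{X > M\}}X)$, so that the object of the proof is to show that $\sup_n E_M(\hat{x}_n) \to 0$ as $M \to \infty$. As a consequence of (16.62) and 9.29,

$$E_M(\hat{x}_n) \le 3E_{M/3}(\hat{u}_n + \hat{y}_n + \hat{z}_n) \le 6\bigl(E(\hat{u}_n) + E_{M/6}(\hat{y}_n) + E(\hat{z}_n)\bigr). \tag{16.63}$$

We now show that for any $\varepsilon > 0$, each of the expectations on the right-hand side of (16.63) can be bounded by $\varepsilon$ by choosing $M$ large enough. First consider $E(\hat{u}_n)$; given (16.55) and (16.56), and assuming $\zeta_m = O(m^{-1/2-\delta})$, we can apply 16.9 to this case, setting $a_k = m^{-1-\delta}$ for $k \le m$ and $a_k = k^{-1-\delta}$ for $k > m$. Applying (16.29) with $\sum_{t=1}^{j} U_t$ substituted for $S_j$ in that expression produces

$$E(\hat{u}_n) \le 8\Bigl((m+1)m^{-1-\delta} + \sum_{k=m+1}^{\infty} k^{-1-\delta}\Bigr)\Bigl(2m^{1+\delta}\zeta_m^2 + 2\sum_{k=m+1}^{\infty} \zeta_k^2\bigl(k^{1+\delta} - (k-1)^{1+\delta}\bigr)\Bigr) = O(m^{-2\delta}), \tag{16.64}$$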

where the order of magnitude in $m$ follows from 2.27(iii). Evidently we can choose $m$ large enough that $E(\hat{u}_n) < \varepsilon$. Henceforth, let $m$ be fixed at this value.

A similar argument is applied to $E(\hat{z}_n)$, but in view of (16.59) and (16.60) we may choose $a_k = 1$, $k = 0,\dots,m$, and $a_k = 0$ otherwise. Write, formally, $E(E_{t-k}Z_t)^2 \le c_t^2\zeta_k^2$ and $E(Z_t - E_{t+k}Z_t)^2 \le c_t^2\zeta_{k+1}^2$, where

$$\zeta_k^2 = \begin{cases} \max_{1\le t\le n} E\bigl((X_t/c_t)^2 1_{\{|X_t/c_t| > B\}}\bigr), & k < m, \\ 0, & k \ge m, \end{cases} \tag{16.65}$$

and then application of (16.29) leads to

$$E(\hat{z}_n) \le 16(m+1) \max_{1\le t\le n} E\bigl((X_t/c_t)^2 1_{\{|X_t/c_t| > B\}}\bigr). \tag{16.66}$$

This term goes to zero as $B \to \infty$, so let $B$ be fixed at a value large enough that $E(\hat{z}_n) < \varepsilon$.

For the remaining term, notice that $Y_t = \sum_{k=-m+1}^{m} \xi_{tk}$, where

$$\xi_{tk} = E_{t+k}X_t1_t^B - E_{t+k-1}X_t1_t^B. \tag{16.67}$$

For each $k$, $\{\xi_{tk}, \mathcal{F}_{t+k}\}$ is a m.d. sequence. If 16.8 is applied for the case $p = 4$ and $a_k = 1$ for $|k| \le m$, 0 otherwise, we obtain (not forgetting that for $y_j \ge 0$, $\max_j\{y_j^2\} = (\max_j\{y_j\})^2$)

$$E(\hat{y}_n^2) = \frac{1}{v_n^4} E\Bigl(\max_{1\le j\le n}\Bigl(\sum_{t=1}^{j} Y_t\Bigr)^4\Bigr) \le \frac{1}{v_n^4}\Bigl(\frac{4}{3}\Bigr)^4 (2m+1)^3 \sum_{k=-m}^{m} E(Y_{nk}^4), \tag{16.68}$$

where $Y_{nk} = \sum_{t=1}^{n} \xi_{tk}$. Now, given $Y_{nk} = Y_{n-1,k} + \xi_{nk}$, we have the recursion

$$E(Y_{nk}^4) = E(Y_{n-1,k}^4) + 4E(Y_{n-1,k}^3\xi_{nk}) + 6E(Y_{n-1,k}^2\xi_{nk}^2) + 4E(Y_{n-1,k}\xi_{nk}^3) + E(\xi_{nk}^4). \tag{16.69}$$

The $\xi_{tk}$ are bounded absolutely by $2Bc_t$; hence consider the terms on the right-hand side of (16.69). The second one vanishes, by the m.d. property. For the third one, we have

$$E(Y_{n-1,k}^2\xi_{nk}^2) \le E(Y_{n-1,k}^2)(2Bc_n)^2 \le (2B)^4 v_{n-1}^2 c_n^2, \tag{16.70}$$

and for the fourth one, note that by the Cauchy-Schwarz inequality,

$$E|Y_{n-1,k}\xi_{nk}^3| \le (2B)^4 v_{n-1} c_n^3. \tag{16.71}$$

Making these substitutions into (16.69) and solving the implied inequality recursively yields

$$E(Y_{nk}^4) \le (2B)^4\Bigl(6\sum_{t=2}^{n} v_{t-1}^2c_t^2 + 4\sum_{t=2}^{n} v_{t-1}c_t^3 + \sum_{t=1}^{n} c_t^4\Bigr) \le 11(2B)^4 v_n^4. \tag{16.72}$$

Plugging this bound into (16.68), and applying the inequality $E_a(X) \le E(X^2)/a$ for $X \ge 0$ and $a > 0$, yields finally

$$E_{M/6}(\hat{y}_n) \le \frac{6}{M} E(\hat{y}_n^2) \le \frac{6}{M}\Bigl(\frac{4}{3}\Bigr)^4 (2m+1)^4\, 11(2B)^4. \tag{16.73}$$

By choice of $M$, this quantity can be made smaller than $\varepsilon$. Thus, according to (16.63) we have shown that $E_M(\hat{x}_n) < 18\varepsilon$ for large enough $M$, or, equivalently,

$$E_M(\hat{x}_n) \to 0 \text{ as } M \to \infty. \tag{16.74}$$

By assumption, the foregoing argument applies uniformly in $n$, so the proof is complete. ∎

The array version of this result, which is effectively identical, is quoted for the record.

16.14 Corollary Let $\{X_{nt}, \mathcal{F}_{nt}\}$ be an $L_2$-mixingale array of size $-\tfrac12$, and let $S_{nj} = \sum_{t=1}^{j} X_{nt}$ and $v_n^2 = \sum_{t=1}^{k_n} c_{nt}^2$, where $c_{nt}$ is given by (16.6)–(16.7); if $\{X_{nt}^2/c_{nt}^2\}$ is uniformly integrable, $\{\max_{1\le j\le k_n} S_{nj}^2/v_n^2\}_{n=1}^{\infty}$ is uniformly integrable.

Proof As for 16.13, after inserting the subscript $n$ as required. ∎


17 Near-Epoch Dependence

17.1 Definitions and Examples

As noted in §14.3, the mixing concept has a serious drawback from the viewpoint of applications in time-series modelling, in that a function of a mixing sequence (even an independent sequence) that depends on an infinite number of lags and/or leads of the sequence is not generally mixing. Let

$$X_t = g_t(\dots, V_{t-1}, V_t, V_{t+1}, \dots), \tag{17.1}$$

where $V_t$ is a vector of mixing processes. The idea to be developed in this chapter is that although $X_t$ may not be mixing, if it depends almost entirely on the 'near epoch' of $\{V_t\}$ it will often have properties permitting the application of limit theorems, of which the mixingale property is the most important.

This idea goes back to Ibragimov (1962), and has been formalized in different ways by Billingsley (1968), McLeish (1975a), Bierens (1983), Gallant and White (1988), Andrews (1988), and Pötscher and Prucha (1991a), among others. The following definitions encompass and extend most existing ones. Consider first a definition for sequences.

17.1 Definition For a stochastic sequence $\{V_t\}_{-\infty}^{+\infty}$, possibly vector-valued, on a probability space $(\Omega, \mathcal{F}, P)$, let $\mathcal{F}_{t-m}^{t+m} = \sigma(V_{t-m},\dots,V_{t+m})$, such that $\{\mathcal{F}_{t-m}^{t+m}\}_{m=0}^{\infty}$ is an increasing sequence of $\sigma$-fields. If, for $p > 0$, a sequence of integrable r.v.s $\{X_t\}_{-\infty}^{+\infty}$ satisfies

$$\|X_t - E(X_t \mid \mathcal{F}_{t-m}^{t+m})\|_p \le d_t\nu_m, \tag{17.2}$$

where $\nu_m \to 0$, and $\{d_t\}_{-\infty}^{+\infty}$ is a sequence of positive constants, $X_t$ will be said to be near-epoch dependent in $L_p$-norm ($L_p$-NED) on $\{V_t\}_{-\infty}^{+\infty}$. □

Many results in this literature are proved for the case $p = 2$ (Gallant and White, 1988, for example) and the term near-epoch dependence, without qualification, may be used in this case. As for mixingales, there is an extension to the array case.

17.2 Definition For a stochastic array $\{\{V_{nt}\}_{t=-\infty}^{\infty}\}_{n=1}^{\infty}$, possibly vector-valued, on a probability space $(\Omega, \mathcal{F}, P)$, let $\mathcal{F}_{n,t-m}^{t+m} = \sigma(V_{n,t-m},\dots,V_{n,t+m})$. If an integrable array $\{\{X_{nt}\}_{t=-\infty}^{\infty}\}_{n=1}^{\infty}$ satisfies

$$\|X_{nt} - E(X_{nt} \mid \mathcal{F}_{n,t-m}^{t+m})\|_p \le d_{nt}\nu_m, \tag{17.3}$$

where $\nu_m \to 0$, and $\{d_{nt}\}$ is an array of positive constants, it is said to be $L_p$-NED on $\{V_{nt}\}$. □

We discuss the sequence case below with the extensions to the array case being easily supplied when needed. The size terminology which has been defined for mixing processes and mixingales is also applicable here. We will say that the sequence or array is $L_p$-NED of size $-\varphi_0$ when $\nu_m = O(m^{-\varphi})$ for $\varphi > \varphi_0$.

According to the Minkowski and conditional modulus inequalities,

$$\|X_t - E(X_t \mid \mathcal{F}_{t-m}^{t+m})\|_p \le \|X_t - \mu_t\|_p + \|E(X_t - \mu_t \mid \mathcal{F}_{t-m}^{t+m})\|_p \le 2\|X_t - \mu_t\|_p, \tag{17.4}$$

where $\mu_t = E(X_t)$. The role of the sequence $\{d_t\}$ in (17.2) is usually to account for the possibility of trending moments, and when $\|X_t - \mu_t\|_p$ is uniformly bounded, we should expect to set $d_t$ equal to a finite constant for all $t$. However, a drawback with the definition is that $\{d_t\}$ can always be chosen in such a way that

$$\inf_t \frac{\|X_t - E(X_t \mid \mathcal{F}_{t-m}^{t+m})\|_p}{d_t} = 0$$

for every $m$, so that the near-epoch dependence property can break down in the limit without violating (17.2). Indeed, (17.2) might not hold except with such a choice of constants. In many applications this would represent an undesirable weakening of the condition, which can be avoided by imposing the requirement $d_t \le 2\|X_t - \mu_t\|_p$, or for the array case, $d_{nt} \le 2\|X_{nt} - \mu_{nt}\|_p$. Under this restriction we can set $\nu_m \le 1$ with no loss of generality.

Near-epoch dependence is not an alternative to a mixing assumption; it is a property of the mapping from $\{V_t\}$ to $\{X_t\}$, not of the random variables themselves. The concept acquires importance when $\{V_t\}$ is a mixing process, because then $\{X_t\}$ inherits certain useful characteristics. Note that $E(X_t \mid \mathcal{F}_{t-m}^{t+m})$ is a finite-lag, Borel-measurable function of a mixing process and hence is also mixing, by 14.1. Near-epoch dependence implies that $\{X_t\}$ is 'approximately' mixing in the sense of being well approximated by a mixing process. And as we show below, a near-epoch-dependent function of a mixing process, subject to suitable restrictions on the moments, can be a mixingale, so that the various inequalities of §16.2 can be exploited in this case.

From the point of view of applications, near-epoch dependence captures nicely the characteristics of a stable dynamic econometric model in which a dependent variable $X_t$ depends mainly on the recent histories of a collection of explanatory variables or shock processes $V_t$, which might be assumed to be mixing. The symmetric dependence on past and future embodied in the definition of an $L_p$-NED function has no obvious relevance to this case, but it is at worst a harmless generalization. In fact, such cases do arise in various practical contexts, such as the application of two-sided seasonal adjustment procedures, or similar smoothing filters; since most published seasonally adjusted time series are the output of a two-sided filter, none of these variables is strictly measurable without reference to future events.

17.3 Example Let $\{V_t\}_{-\infty}^{+\infty}$ be a zero-mean, $L_p$-bounded scalar sequence, and define

$$X_t = \sum_{j=-\infty}^{\infty} \theta_j V_{t-j}. \tag{17.5}$$

Then, by the Minkowski inequality,

$$\|X_t - E(X_t \mid \mathcal{F}_{t-m}^{t+m})\|_p = \Bigl\|\sum_{j=m+1}^{\infty}\bigl(\theta_j(V_{t-j} - E_{t-m}^{t+m}V_{t-j}) + \theta_{-j}(V_{t+j} - E_{t-m}^{t+m}V_{t+j})\bigr)\Bigr\|_p \le d_t\nu_m, \tag{17.6}$$

where $\nu_m = \sum_{j=m+1}^{\infty} (|\theta_j| + |\theta_{-j}|)$, and $d_t = 2\sup_s \|V_s\|_p$, all $t$. Clearly, $\nu_m \to 0$ if the sequence $\{\theta_j\}$ is absolutely summable, and $\nu_m$ is of size $-\varphi_0$ if $|\theta_j| + |\theta_{-j}| = O(j^{-1-\varphi})$ for $\varphi > \varphi_0$. In the one-sided case with $\theta_j = 0$ for $j < 0$, we may put $d_t = 2\sup_{s\le t} \|V_s\|_p$, which may be an increasing function of $t$; compare 16.2. □
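The NED numbers $\nu_m$ of the linear example are simple tail sums and can be tabulated directly. The sketch below is illustrative only: the helper name `ned_numbers`, the truncation point `cutoff` and the geometric coefficients $\theta_j = \rho^{|j|}$ are assumptions made here, not part of the text. For that choice the tail sum has the closed form $2\rho^{m+1}/(1-\rho)$, which gives a convenient check and shows the geometric decay that corresponds to size $-\infty$.

```python
def ned_numbers(theta, max_m, cutoff):
    """Tail sums nu_m = sum_{j=m+1}^{cutoff} (|theta_j| + |theta_{-j}|), m = 0..max_m.

    theta is a function of the integer lag; cutoff (> max_m) truncates the
    infinite sum, which is harmless when the coefficients decay quickly.
    """
    tail = 0.0
    nu = [0.0] * (max_m + 1)
    for j in range(cutoff, 0, -1):          # accumulate from the far tail inwards
        tail += abs(theta(j)) + abs(theta(-j))
        if j - 1 <= max_m:
            nu[j - 1] = tail                # tail now covers lags j, j+1, ..., cutoff
    return nu


rho = 0.5                                    # illustrative value
nu = ned_numbers(lambda j: rho ** abs(j), max_m=10, cutoff=200)
closed_form = [2 * rho ** (m + 1) / (1 - rho) for m in range(11)]
print(nu[:4])
```

Replacing the geometric coefficients with hyperbolic ones, $|\theta_j| = O(j^{-1-\varphi})$, would give tail sums of order $m^{-\varphi}$, provided the cutoff is taken large enough for the truncation error to be negligible.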

The second example, suggested by Gallant and White (1988), illustrates how near-epoch dependence generalizes to a wide class of lag functions subject to a dynamic stability condition, analogous to the summability condition in the linear example.

17.4 Example Let $\{V_t\}$ be an $L_p$-bounded stochastic sequence for $p \ge 2$ and let a sequence $\{X_t\}$ be generated by the nonlinear difference equation

$$X_t = f_t(V_t, X_{t-1}), \tag{17.7}$$

where $\{f_t(\cdot,\cdot)\}$ is a sequence of differentiable functions satisfying

$$\sup_{v,x}\Bigl|\frac{\partial f_t(v,x)}{\partial x}\Bigr| \le b < 1. \tag{17.8}$$

As a function of $x$, $f_t$ is called a contraction mapping. Abstracting from the stochastic aspect of the problem, write $v_t$ as the dummy first argument of $f_t$. By repeated substitution, we have

$$f_t = f_t(v_t, f_{t-1}(v_{t-1}, f_{t-2}(v_{t-2},\dots))) = g_t(v_t, v_{t-1}, v_{t-2},\dots), \tag{17.9}$$

and, by the chain rule for differentiation of composite functions,

$$\Bigl|\frac{\partial g_t}{\partial v_{t-j}}\Bigr| \le b^j\Bigl|\frac{\partial f_{t-j}}{\partial v_{t-j}}\Bigr|. \tag{17.10}$$

Define a $\mathcal{F}_{t-m}^{t}/\mathcal{B}$-measurable approximation to $g_t$ by replacing the arguments with lag exceeding $m$ by zeros:

$$g_t^m(v_t,\dots,v_{t-m}) = g_t(v_t,\dots,v_{t-m},0,0,\dots). \tag{17.11}$$

By a Taylor expansion about 0 with respect to $v_{t-j}$ for $j > m$,

$$g_t - g_t^m = \sum_{j=m+1}^{\infty}\Bigl(\frac{\partial g_t}{\partial v_{t-j}}\Bigr)^{*} v_{t-j}, \tag{17.12}$$

where $*$ denotes evaluation of the derivatives at points in the intervals $[0, v_{t-j}]$. Now define the stochastic sequence $\{X_t\}$ by evaluating $g_t$ at $(V_t, V_{t-1},\dots)$. Note that

$$\|X_t - E(X_t \mid \mathcal{F}_{t-m}^{t+m})\|_2 \le \|X_t - g_t^m(V_t,\dots,V_{t-m})\|_2 \tag{17.13}$$

by 10.12. The Minkowski inequality, (17.12), and then (17.10) further imply that

$$\|X_t - g_t^m\|_2 = \Bigl\|\sum_{j=m+1}^{\infty} G_{t-j}V_{t-j}\Bigr\|_2 \le \sum_{j=m+1}^{\infty} \|G_{t-j}V_{t-j}\|_2 \le \sum_{j=m+1}^{\infty} b^j\|F_{t-j}V_{t-j}\|_2 \le \frac{b^{m+1}}{1-b}\sup_{j>m}\|F_{t-j}V_{t-j}\|_2, \tag{17.14}$$

where $G_{t-j}$ is the random variable defined by evaluating $(\partial g_t/\partial v_{t-j})^{*}$ with respect to the random point $(V_{t-j}, V_{t-j-1},\dots)$, and $F_{t-j}$ bears the corresponding relationship with $\partial f_{t-j}/\partial v_{t-j}$. $X_t$ is therefore $L_2$-NED of size $-\infty$, with constants $d_t \ll \sup_{s\le t}\|F_sV_s\|_2$, if this norm exists. In particular, Hölder's inequality allows us to make this derivation whenever $\|V_t\|_{2r}$ and $\|F_t\|_{2r/(r-1)}$ exist for $r > 1$, and also if $\|V_t\|_2 < \infty$ and $F_t$ is a.s. bounded. □
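The geometric truncation bound in (17.14) can be illustrated pathwise. The sketch below rests on assumptions chosen here for illustration and not taken from the text: the map $f(v, x) = v + b\tanh(x)$, which satisfies (17.8) and has $\partial f/\partial v = 1$, uniformly distributed shocks, and the function names. With zero shocks the state stays at the fixed point $x = 0$, so replacing shocks at lags beyond $m$ by zeros is the same as starting the recursion at zero $m$ periods back. A long history of 400 shocks stands in for the infinite past.

```python
import math
import random


def iterate(shocks, b):
    """Run x <- v + b*tanh(x) from x = 0 over the shocks, oldest first."""
    x = 0.0
    for v in shocks:
        x = v + b * math.tanh(x)
    return x


def truncation_errors(shocks, b, max_m):
    """|g_t - g_t^m| for m = 0..max_m, using the full history as a proxy for g_t."""
    full = iterate(shocks, b)
    return [abs(full - iterate(shocks[-(m + 1):], b)) for m in range(max_m + 1)]


random.seed(0)
b = 0.6                                            # contraction modulus, illustrative
history = [random.uniform(-1.0, 1.0) for _ in range(400)]
errors = truncation_errors(history, b, max_m=20)
bounds = [b ** (m + 1) / (1 - b) * max(abs(v) for v in history)
          for m in range(21)]
print(all(e <= bd for e, bd in zip(errors, bounds)))
```

Because the derivative with respect to the shock is identically one in this map, the bound reduces to $b^{m+1}/(1-b)$ times the largest shock in absolute value; a map with an unbounded shock derivative would need the moment conditions mentioned at the end of the example.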

17.2 Near-Epoch Dependence and Mixingales

The usefulness of the near-epoch dependence concept is due largely to the next theorem.

17.5 Theorem Let $\{X_t\}_{-\infty}^{\infty}$ be an $L_r$-bounded zero-mean sequence, for $r > 1$.

(i) Let $\{V_t\}$ be $\alpha$-mixing of size $-a$. If $X_t$ is $L_p$-NED of size $-b$ on $\{V_t\}$ for $1 \le p < r$ with constants $\{d_t\}$, $\{X_t, \mathcal{F}_{-\infty}^{t}\}$ is an $L_p$-mixingale of size $-\min\{b, a(1/p - 1/r)\}$ with constants $c_t \ll \max\{\|X_t\|_r, d_t\}$.

(ii) Let $\{V_t\}$ be $\phi$-mixing, of size $-a$. If $X_t$ is $L_p$-NED of size $-b$ on $\{V_t\}$ for $1 \le p \le r$ with constants $\{d_t\}$, $\{X_t, \mathcal{F}_{-\infty}^{t}\}$ is an $L_p$-mixingale of size $-\min\{b, a(1 - 1/r)\}$, with constants $c_t \ll \max\{\|X_t\|_r, d_t\}$.

Proof For brevity we write $E_s^t(\cdot) = E(\cdot \mid \mathcal{F}_s^t)$ where $\mathcal{F}_s^t = \sigma(V_s,\dots,V_t)$. Also, for $m \ge 1$, let $k = [m/2]$, the largest integer not exceeding $m/2$. By the Minkowski inequality,

$$\|E_{-\infty}^{t-m}X_t\|_p \le \|E_{-\infty}^{t-m}(X_t - E_{t-k}^{t+k}X_t)\|_p + \|E_{-\infty}^{t-m}(E_{t-k}^{t+k}X_t)\|_p, \tag{17.15}$$

and we bound each of the right-hand-side terms. First,

$$\|E_{-\infty}^{t-m}(X_t - E_{t-k}^{t+k}X_t)\|_p \le \bigl[E\bigl(E_{-\infty}^{t-m}|X_t - E_{t-k}^{t+k}X_t|^p\bigr)\bigr]^{1/p} = \|X_t - E_{t-k}^{t+k}X_t\|_p \le d_t\nu_k, \tag{17.16}$$

using the conditional Jensen inequality and law of iterated expectations. Second, $E_{t-k}^{t+k}X_t$ is a finite-lag measurable function of $V_{t-k},\dots,V_{t+k}$ and hence mixing of the same size as $\{V_t\}$ for finite $k$. Hence for part (i), we have, from 14.2,

$$\|E_{-\infty}^{t-m}(E_{t-k}^{t+k}X_t)\|_p \le 6\alpha_k^{1/p-1/r}\|E_{t-k}^{t+k}X_t\|_r \le 6\alpha_k^{1/p-1/r}\|X_t\|_r. \tag{17.17}$$

Combining (17.16) and (17.17) into (17.15) yields

$$\|E_{-\infty}^{t-m}X_t\|_p \le \max\{\|X_t\|_r, d_t\}\zeta_m, \tag{17.18}$$

where $\zeta_m = 6\alpha_k^{1/p-1/r} + \nu_k$. Also, applying 10.28 gives

$$\|X_t - E_{-\infty}^{t+m}X_t\|_p \le 2\|X_t - E_{t-m}^{t+m}X_t\|_p \le 2d_t\nu_m \le 2d_t\zeta_m. \tag{17.19}$$

Since $\zeta_m$ is of size $-\min\{b, a(1/p - 1/r)\}$, part (i) of the theorem holds with $c_t \ll \max\{\|X_t\|_r, d_t\}$. The proof of part (ii) is identical except that in place of (17.17) we must substitute, by 14.4,

$$\|E_{-\infty}^{t-m}(E_{t-k}^{t+k}X_t)\|_p \le 2\phi_k^{1-1/r}\|E_{t-k}^{t+k}X_t\|_r \le 2\phi_k^{1-1/r}\|X_t\|_r. \tag{17.20}$$

∎

Let us also state the following corollary for future reference.

17.6 Corollary Let $\{\{X_{nt}\}_{t=-\infty}^{\infty}\}_{n=1}^{\infty}$ be an $L_r$-bounded zero-mean array, $r > 1$.

(i) If $X_{nt}$ is $L_p$-NED of size $-b$ for $1 \le p < r$ with constants $\{d_{nt}\}$ on an array $\{V_{nt}\}$ which is $\alpha$-mixing of size $-a$, then $\{X_{nt}, \mathcal{F}_{n,-\infty}^{t}\}$ is an $L_p$-mixingale of size $-\min\{b, a(1/p - 1/r)\}$, with respect to constants $c_{nt} \ll \max\{\|X_{nt}\|_r, d_{nt}\}$.

(ii) If $X_{nt}$ is $L_p$-NED of size $-b$ for $1 \le p \le r$ with constants $\{d_{nt}\}$ on an array $\{V_{nt}\}$ which is $\phi$-mixing of size $-a$, then $\{X_{nt}, \mathcal{F}_{n,-\infty}^{t}\}$ is an $L_p$-mixingale of size $-\min\{b, a(1 - 1/r)\}$, with respect to constants $c_{nt} \ll \max\{\|X_{nt}\|_r, d_{nt}\}$.

Proof Immediate on inserting $n$ before the $t$ subscript wherever required in the last proof. ∎

The replacement of $V_t$ by $V_{nt}$ and $\mathcal{F}_s^t$ by $\mathcal{F}_{ns}^t$ is basically a formality here, since none of our applications will make use of it. The role of the array notation will always be to indicate a transformation by a function of sample size, typically the normalization of the partial sums to zero mean and unit variance, and in these cases $\mathcal{F}_{ns}^t = \mathcal{F}_s^t$ for all $n$.

Reconsider the AR process of 14.7. As a special case of 17.3, it is clear that in that example $X_t$ is $L_p$-NED of size $-\infty$ on $Z_t$, an independent process, and hence is an $L_p$-mixingale of size $-\infty$, for every $p > 0$. There is no need to impose smoothness assumptions on the marginal distributions to obtain these properties, which will usually be all we need to apply limit theory to the process.

These results allow us to fine-tune assumptions on the rates of mixing and near-epoch dependence to ensure specific low-level properties which are needed to prove convergence theorems. Among the most important of these is summability of the sequences of autocovariances. If for example we have a sum of terms, $S_n = \sum_{t=1}^{n} X_t$,

we should often like to know at what rate the variance of this sum grows with $n$. Assuming $E(X_t) = 0$ with no loss of generality,

$$E(S_n^2) = \sum_{t=1}^{n} E(X_t^2) + 2\sum_{t=1}^{n-1}\sum_{m=1}^{n-t} E(X_tX_{t+m}),$$

a sum of $n^2$ terms. If the sequence $\{X_t\}$ is uncorrelated only the $n$ variances appear and, assuming uniformly bounded moments, $E(S_n^2) = O(n)$. For general dependent processes, summability of the sequences of autocovariances $\{E(X_tX_{t-j}), j \in \mathbb{N}\}$ implies that on a global scale the sequence behaves like an uncorrelated sequence, in the sense that, again, $E(S_n^2) = O(n)$.

For future reference, here is the basic result on absolute summability. To fulfil subsequent requirements, this incorporates two easy generalizations. First we consider a pair $(X_t, Y_t)$, which effectively permits a generalization to random vectors since any element of an autocovariance matrix can be considered. To deal with the leading case discussed above we simply set $Y_t = X_t$. Second, we frame the result in such a way as to accommodate trending moments, imposing $L_r$-boundedness but not uniform $L_r$-boundedness. It is also noted that, like previous results, this extends trivially to the array case.
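The claim that summable autocovariances give $E(S_n^2) = O(n)$ can be illustrated with the variance formula for partial sums of a zero-mean sequence. The sketch below is an illustration under assumptions made here: a stationary sequence with autocovariance function $\gamma(m) = \rho^m$ (as for a unit-variance AR(1) process) and the function name `variance_of_sum`; neither is taken from the text. In this case $E(S_n^2)/n$ approaches $(1+\rho)/(1-\rho)$.

```python
def variance_of_sum(gamma, n):
    """E(S_n^2) for a zero-mean stationary sequence with autocovariances gamma(m).

    Implements sum_t gamma(0) + 2 * sum_{t=1}^{n-1} sum_{m=1}^{n-t} gamma(m).
    """
    total = n * gamma(0)
    for t in range(1, n):
        for m in range(1, n - t + 1):
            total += 2.0 * gamma(m)
    return total


rho = 0.5                                             # illustrative value
n = 500
ratio = variance_of_sum(lambda m: rho ** m, n) / n    # near (1 + rho) / (1 - rho)
white = variance_of_sum(lambda m: 1.0 if m == 0 else 0.0, n)
print(ratio, white)
```

With non-summable autocovariances, for example $\gamma(m) = (1+m)^{-1/2}$, the same function returns values that grow faster than $n$, which is the situation the summability result is designed to exclude.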

17.7 Theorem Let $\{X_t, Y_t\}$ be a pair of sequences, each $L_{r/(r-1)}$-NED of size $-1$, with respect to constants $\{d_t^X, d_t^Y\}$ for $r > 2$, on either (i) an $\alpha$-mixing process of size $-r/(r-2)$, or (ii) a $\phi$-mixing process of size $-r/(r-1)$, where $d_t^X \ll \|X_t\|_r$ and $d_t^Y \ll \|Y_t\|_r$. Then the sequences

$$\Bigl\{\frac{|E(X_tY_{t+m})|}{\|X_t\|_r\|Y_{t+m}\|_r},\ m \in \mathbb{N}\Bigr\} \tag{17.21}$$

are summable for each $t$. Also, if arrays $\{X_{nt}, Y_{nt}\}$ are similarly $L_{r/(r-1)}$-NED of size $-1$ with respect to constants $\{d_{nt}^X, d_{nt}^Y\}$ with $d_{nt}^X \ll \|X_{nt}\|_r$ and $d_{nt}^Y \ll \|Y_{nt}\|_r$, the sequences

$$\Bigl\{\frac{|E(X_{nt}Y_{n,t+m})|}{\|X_{nt}\|_r\|Y_{n,t+m}\|_r},\ m \in \mathbb{N}\Bigr\} \tag{17.22}$$

are summable for each $n$ and $t$. □

Since $r > 2$, the constants appearing in (17.21) and (17.22) are smaller (absolutely) than the autocorrelations, and the latter need not converge at the same rate. But notice too that $r/(r-1) < 2$, so it is always sufficient for the result if the functions are $L_2$-NED.

Proof As before, let $E_s^t(\cdot) = E(\cdot \mid \mathcal{F}_s^t)$, and let $k = [m/2]$. By the triangle inequality,

$$|E(X_tY_{t+m})| \le |E(Y_{t+m}(X_t - E_{t-k}^{t+k}X_t))| + |E(Y_{t+m}E_{t-k}^{t+k}X_t)|. \tag{17.23}$$

The modulus and Hölder inequalities give

$$|E(Y_{t+m}(X_t - E_{t-k}^{t+k}X_t))| \le \|Y_{t+m}\|_r\|X_t - E_{t-k}^{t+k}X_t\|_{r/(r-1)} \le \|Y_{t+m}\|_r\, d_t^X\nu_k^X, \tag{17.24}$$

where $\nu_k^X$ is of size $-1$. Also, applying 17.5(i) with $p = r/(r-1)$, $a = r/(r-2)$, and $b = 1$,

$$\begin{aligned}
|E(Y_{t+m}E_{t-k}^{t+k}X_t)| &= |E(E_{t-k}^{t+k}(Y_{t+m})\,E_{t-k}^{t+k}X_t)| \\
&\le \|E_{t-k}^{t+k}Y_{t+m}\|_{r/(r-1)}\|E_{t-k}^{t+k}X_t\|_r \\
&\le \|E_{-\infty}^{t+k}Y_{t+m}\|_{r/(r-1)}\|X_t\|_r \\
&\le \|X_t\|_r\, c_{t+m}^Y\zeta_k^Y,
\end{aligned} \tag{17.25}$$

where $c_t^Y \ll \max\{\|Y_t\|_r, d_t^Y\}$ and $\zeta_k^Y$ is of size $-1$. Combining the inequalities in (17.24) and (17.25) in (17.23) gives

$$|E(X_tY_{t+m})| \le \max\{\|Y_{t+m}\|_r d_t^X,\ \|X_t\|_r c_{t+m}^Y\}(\nu_k^X + \zeta_k^Y) \ll \|X_t\|_r\|Y_{t+m}\|_r\,\xi_m, \tag{17.26}$$

where $\xi_m = \nu_k^X + \zeta_k^Y$ is of size $-1$. This completes the proof of (i). The proof of (ii) is similar using 17.5(ii) with $p = a = r/(r-1)$ and $b = 1$.

For the array generalization, simply insert the $n$ subscript after every random variable and scaling constant. The argument is identical except that 17.6 is applied in place of 17.5. ∎

17.3 Near-Epoch Dependence and Transformations

Suppose that $(X_{1t},\dots,X_{vt})' = X_t = g(\dots, V_{t-1}, V_t, V_{t+1},\dots)$ is a $v$-vector of $L_p$-NED functions, and interest focuses on the scalar sequence $\{\phi_t(X_t)\}$, where $\phi_t: \mathbb{T} \mapsto \mathbb{R}$, $\mathbb{T} \subseteq \mathbb{R}^v$, is a $\mathcal{B}^v/\mathcal{B}$-measurable function. We may presume that, under certain conditions on the function, $\phi_t(X_t)$ will be near-epoch dependent on $\{V_t\}$ if the elements of $X_t$ are. This setup subsumes the important case $v = 1$, in which the question at issue is the effect of nonlinear transformations on the NED property. The dependence of the functional form $\phi_t(\cdot)$ on $t$ is only occasionally needed, but is worth making explicit.

The first cases we look at are the sums and products of pairs of sequences, for which specialized results exist.

17.8 Theorem Let $X_t$ and $Y_t$ be $L_p$-NED on $\{V_t\}$ of respective sizes $-\varphi_X$ and $-\varphi_Y$. Then $X_t + Y_t$ is $L_p$-NED of size $-\min\{\varphi_X, \varphi_Y\}$.

Proof Minkowski's inequality gives

$$\|(X_t + Y_t) - E_{t-m}^{t+m}(X_t + Y_t)\|_p \le \|X_t - E_{t-m}^{t+m}X_t\|_p + \|Y_t - E_{t-m}^{t+m}Y_t\|_p \le d_t^X\nu_m^X + d_t^Y\nu_m^Y \le d_t\nu_m, \tag{17.27}$$

where $d_t = \max\{d_t^X, d_t^Y\}$ and $\nu_m = \nu_m^X + \nu_m^Y = O(m^{-\min\{\varphi_X,\varphi_Y\}})$. ∎

A variable that is $L_q$-NED is $L_p$-NED for $1 \le p \le q$, by the Liapunov inequality, so there is no loss of generality in equating the orders of norm in this result. The same consideration applies to the next theorem.

Theory of Stochtic Processes

17.9 Theorem Let Xt and l be Q-NED on (FrJof respective sizes -x and -tpj..

Then, XtYt is AI-NED of size-.rninttpsfpxl

.

Proof By rearrangement, and applications of the triangle and Cauchy-schwartzinequalities, we have

E IXtn - F7':(mF,) j

= E jxtyt- x,l+-rlyfl+ (me1+-,1F,- (F7;m)(M+-''''''J',))

- Fj+-,1((-Y,- eX,1m)(1', - A'7': F,)) j<111,11211l', - 17,11lI2+ IIe'7''',''l',II2II-Y,- e'7lx)llz

+ Ilxr- e7,,,'''.&II, 11F, - el+-,,,'''Frll,

< 11.1,11zdkivmY+ 11ir,11ztl%-I + f'',v,,,Ft/'/v,l

<dtvm, (17.28)

=maxt IIx,II2t//,11F,1128,,dtidxtl andWhere dt1' X F X gtyyj-rninltpxrli gVm = Mm + Mm + MmTm =

.

ln both of the last results we would like to be qble to set 1$ = Xtwjfor somefinite j. A slight moditication of the argument is required here.

17.10 Theorem If $X_t$ is $L_p$-NED on $\{V_t\}$, so is $X_{t+j}$ for $0 < j < \infty$.

Proof If $X_t$ is $L_p$-NED, then

$$\|X_{t+j} - E(X_{t+j} \mid \mathcal{F}_{t-j-m}^{t+j+m})\|_p \le 2\|X_{t+j} - E(X_{t+j} \mid \mathcal{F}_{t+j-m}^{t+j+m})\|_p \le d_t'\nu_m, \tag{17.29}$$

using 10.28, where $d_t' = 2d_{t+j}$. We can write

$$\|X_{t+j} - E(X_{t+j} \mid \mathcal{F}_{t-m}^{t+m})\|_p \le d_t'\nu_m', \tag{17.30}$$

where

$$\nu_m' = \begin{cases} \nu_0, & m \le j, \\ \nu_{m-j}, & m > j, \end{cases}$$

and $\nu_m'$ is of size $-\varphi$ if $\nu_m$ is of size $-\varphi$. ∎

Putting the last two results together gives the following corollary.

17.11 Corollary If $X_t$ and $Y_t$ are $L_2$-NED of size $-\varphi_X$ and $-\varphi_Y$, $X_tY_{t+k}$ is $L_1$-NED of size $-\min\{\varphi_X, \varphi_Y\}$. □

By considering $Z_t = X_{t-[k/2]}Y_{t+k-[k/2]}$, the $L_1$-NED numbers can be given here as

$$\nu_m' = \begin{cases} \nu_0, & m \le [k/2] + 1, \\ \nu_{m-[k/2]-1}, & m > [k/2] + 1, \end{cases}$$

where $\nu_m = \nu_m^X + \nu_m^Y + \nu_m^X\nu_m^Y$, and the constants are $4d_{t-[k/2]}^X d_{t+k-[k/2]}^Y$, assuming that $d_t^X$ and $d_t^Y$ are not smaller than the corresponding $L_2$ norms.

All these results extend to the array case as before, by simply including the extra subscript throughout. Corollary 17.11 should be compared with 17.7, and care taken not to confuse the two. In the former we have $k$ fixed and finite, whereas the latter result deals with the case as $m \to \infty$. The two theorems naturally complement each other in applying truncation arguments to infinite sums of products. Applications will arise in subsequent chapters.

More general classes of function can be treated under an assumption of continuity, but in this case we can deal only with cases where $\phi_t(X_t)$ is $L_2$-NED. Let

$$\phi_t(x): \mathbb{T} \mapsto \mathbb{R}, \quad \mathbb{T} \subseteq \mathbb{R}^v$$

be a function of $v$ real variables, and use the taxicab metric on $\mathbb{R}^v$,

$$\rho(x^1, x^2) = \sum_{i=1}^{v}|x_i^1 - x_i^2|, \tag{17.31}$$

to measure the distance between points $x^1$ and $x^2$. We consider a set of results that impose restrictions of differing severity on the types of function allowed, but offer a trade-off with the severity of the moment restrictions. To begin with, impose the uniform Lipschitz condition,

$$|\phi_t(X^1) - \phi_t(X^2)| \le B_t\rho(X^1, X^2) \quad \text{a.s.}, \tag{17.32}$$

where $B_t$ is a finite constant.

17.12 Theorem Let $X_{it}$ be $L_2$-NED of size $-a$ on $\{V_t\}$ for $i = 1,\dots,v$, with constants $d_{it}$. If (17.32) holds, $\{\phi_t(X_t)\}$ is also $L_2$-NED on $\{V_t\}$ of size $-a$, with constants a finite multiple of $\max_i\{d_{it}\}$.

Proof Let $\tilde\phi_t$ denote a $\mathcal{F}_{t-m}^{t+m}$-measurable approximation to $\phi_t(X_t)$. Then

$$\|\phi_t(X_t) - E_{t-m}^{t+m}\phi_t(X_t)\|_2 \le \|\phi_t(X_t) - \tilde\phi_t\|_2 \tag{17.33}$$

by 10.12. Since $\phi_t(E_{t-m}^{t+m}X_t)$ is an $\mathcal{F}_{t-m}^{t+m}$-measurable random variable, (17.33) holds for this choice of $\tilde\phi_t$, and by Minkowski's inequality,

$$\begin{aligned}
\|\phi_t(X_t) - E_{t-m}^{t+m}\phi_t(X_t)\|_2
&\le B_t\|\rho(X_t, E_{t-m}^{t+m}X_t)\|_2 \\
&\le B_t\sum_{i=1}^{v}\|X_{it} - E_{t-m}^{t+m}X_{it}\|_2 \\
&\le B_t\sum_{i=1}^{v}d_{it}\nu_{im} \\
&\le d_t\nu_m,
\end{aligned} \tag{17.34}$$

where $d_t = vB_t\max_i\{d_{it}\}$ and $\nu_m = v^{-1}\sum_{i=1}^{v}\nu_{im}$, the latter sequence being of size $-a$ by assumption. ∎

If we can assume only that the $X_{it}$ are $L_p$-NED on $\{V_t\}$ for some $p \in [1,2)$, this argument fails. There is a way to get the result, however, if the functions $\phi_t$ are bounded almost surely.

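17.13 Theorem Let $X_{it}$ be $L_p$-NED of size $-a$ on $\{V_t\}$, for $1 \le p \le 2$, with constants $d_{it}$, $i = 1,\dots,v$. Suppose that, for each $t$, $|\phi_t(X_t)| \le M < \infty$ a.s., and also that

$$|\phi_t(X^1) - \phi_t(X^2)| \le \min\{B_t\rho(X^1, X^2),\, 2M\} \quad \text{a.s.}, \tag{17.35}$$

where $B_t < \infty$. Then $\{\phi_t(X_t)\}$ is $L_2$-NED on $\{V_t\}$ of size $-ap/2$, with constants a finite multiple of $\max_i\{d_{it}^{p/2}\}$.

Proof For brevity, write $\phi^i = \phi_t(X^i)$, $i = 1,2$, and let $Z = B_t\rho(X^1,X^2)/2M$ so that $|\phi^1 - \phi^2| \le 2M\min\{Z, 1\}$. Then

$$\begin{aligned}
E(\phi^1 - \phi^2)^2
&= \int_{\{Z \le 1\}}(\phi^1 - \phi^2)^2\,dP + \int_{\{Z > 1\}}(\phi^1 - \phi^2)^2\,dP \\
&\le (2M)^2\Big(\int_{\{Z \le 1\}}Z^2\,dP + \int_{\{Z > 1\}}dP\Big) \\
&\le (2M)^2E(Z^p) \\
&= B_{1t}^2\,E\rho(X^1, X^2)^p,
\end{aligned} \tag{17.36}$$

where $B_{1t} = B_t^{p/2}(2M)^{1-p/2}$. The third step uses $Z^2 \le Z^p$ on $\{Z \le 1\}$ and $1 < Z^p$ on $\{Z > 1\}$, since $p \le 2$. Combining (17.33) with (17.36), we can write

$$\begin{aligned}
\|\phi_t(X_t) - E_{t-m}^{t+m}\phi_t(X_t)\|_2
&\le B_{1t}\|\rho(X_t, E_{t-m}^{t+m}X_t)\|_p^{p/2} \\
&\le B_{1t}\Big(\sum_{i=1}^{v}\|X_{it} - E_{t-m}^{t+m}X_{it}\|_p\Big)^{p/2} \\
&\le B_{1t}\Big(\sum_{i=1}^{v}d_{it}\nu_{im}\Big)^{p/2} \\
&\le d_t\nu_m,
\end{aligned} \tag{17.37}$$

where $d_t = B_{1t}v^{p/2}\max_i\{d_{it}^{p/2}\}$ and $\nu_m = (v^{-1}\sum_{i=1}^{v}\nu_{im})^{p/2}$, which is of size $-ap/2$ by assumption. ∎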

An important example of this case (with $v = 1$) is the truncation of $X_t$, although this must be defined as a continuous transformation.

17.14 Example For $M > 0$ let

$$\phi(x) = \begin{cases} x, & |x| \le M \\ M, & x > M \\ -M, & x < -M \end{cases} \tag{17.38}$$

or, equivalently, $\phi(x) = x1_{\{|x| \le M\}} + M(x/|x|)1_{\{|x| > M\}}$. In this case

$$|\phi(X^1) - \phi(X^2)| \le |X^1 - X^2|,$$

so set $B = 1$, and 17.13 can be used to show that $\{\phi(X_t)\}$ is $L_2$-NED if $\{X_t\}$ is $L_p$-NED. The more conventional truncation,

$$X_t1_{\{|X_t| \le M\}} = \begin{cases} X_t, & |X_t| \le M \\ 0, & \text{otherwise}, \end{cases} \tag{17.39}$$

cannot be shown to be near-epoch dependent by this approach, because of the lack of continuity. □

A further variation on 17.12 is to relax the Lipschitz condition (17.32) by letting the scale factor $B$ be a possibly unbounded function of the random variables. Assume

$$|\phi_t(X^1) - \phi_t(X^2)| \le B_t(X^1, X^2)\rho(X^1, X^2) \quad \text{a.s.}, \tag{17.40}$$

where, for each $t$,

$$B_t(x^1, x^2): \mathbb{T} \times \mathbb{T} \mapsto \mathbb{R}^+ \tag{17.41}$$

is a non-negative, Borel-measurable function. To deal with this case requires a lemma due to Gallant and White (1988).

17.15 Lemma Let $B$ and $\rho$ be non-negative r.v.s and assume $\|\rho\|_q < \infty$, $\|B\|_{q/(q-1)} < \infty$, and $\|B\rho\|_r < \infty$, for $q \ge 1$, and $r > 2$. Then

$$\|B\rho\|_2 \le 2\big(\|\rho\|_q^{r-2}\,\|B\|_{q/(q-1)}^{r-2}\,\|B\rho\|_r^{r}\big)^{1/2(r-1)}. \tag{17.42}$$

Proof Define

$$C = \big(\|\rho\|_q\|B\|_{q/(q-1)}\|B\rho\|_r^{-r}\big)^{1/(1-r)}, \tag{17.43}$$

and let $B_1 = 1_{\{B\rho \le C\}}B$. Then by the Minkowski inequality,

$$\|B\rho\|_2 \le \|B_1\rho\|_2 + \|(B - B_1)\rho\|_2. \tag{17.44}$$

The right-hand-side terms are bounded by the same quantity. First,

$$\|B_1\rho\|_2 = \Big(\int_{\{B\rho \le C\}}(B\rho)^2\,dP\Big)^{1/2} \le C^{1/2}\Big(\int B\rho\,dP\Big)^{1/2} \le C^{1/2}\big(\|\rho\|_q\|B\|_{q/(q-1)}\big)^{1/2}, \tag{17.45}$$

applying the Hölder inequality. Second,

$$\|(B - B_1)\rho\|_2 = \Big(\int_{\{B\rho > C\}}(B\rho)^2\,dP\Big)^{1/2} \le C^{1-r/2}\Big(\int_{\{B\rho > C\}}(B\rho)^r\,dP\Big)^{1/2} \le C^{1-r/2}\|B\rho\|_r^{r/2}, \tag{17.46}$$

where the first inequality follows from $r > 2$ and $B\rho/C > 1$. Substituting for $C$ in (17.45) and (17.46) and applying (17.44) yields the result. ∎
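Because the lemma holds for any probability measure, it must in particular hold for the empirical measure of a finite sample, which gives a quick sanity check of (17.42). The sketch below is illustrative only: the distributions chosen for $B$ and $\rho$, the sample size, and the values $q = 1.5$, $r = 3$ are assumptions, not taken from the text.

```python
import random

def norm(values, p):
    """Empirical L_p norm: (mean of |x|^p)^(1/p)."""
    return (sum(abs(x) ** p for x in values) / len(values)) ** (1.0 / p)

def gallant_white_bound(B, rho, q, r):
    """Right-hand side of (17.42), evaluated under the empirical measure."""
    B_rho = [b * x for b, x in zip(B, rho)]
    inner = (norm(rho, q) ** (r - 2)
             * norm(B, q / (q - 1)) ** (r - 2)
             * norm(B_rho, r) ** r)
    return 2.0 * inner ** (1.0 / (2.0 * (r - 1.0)))

rng = random.Random(1)                               # fixed seed for reproducibility
n = 5000
B = [rng.expovariate(1.0) for _ in range(n)]         # hypothetical non-negative B
rho = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]   # hypothetical non-negative rho
q, r = 1.5, 3.0                                      # q >= 1 and r > 2, as the lemma requires

lhs = norm([b * x for b, x in zip(B, rho)], 2)
rhs = gallant_white_bound(B, rho, q, r)
print("||B rho||_2 =", lhs, " bound =", rhs)
```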

The general result is then as follows.

17.16 Theorem Let $\{X_t\}$ be a $v$-dimensioned random sequence, of which each element is $L_2$-NED of size $-a$ on $\{V_t\}$, and suppose that $\phi_t(X_t)$ is $L_2$-bounded. Suppose further that for $1 \le q \le 2$,

$$\|\rho(X_t, E_{t-m}^{t+m}X_t)\|_q < \infty,$$

$$\|B_t(X_t, E_{t-m}^{t+m}X_t)\|_{q/(q-1)} < \infty,$$

and for $r > 2$,

$$\|B_t(X_t, E_{t-m}^{t+m}X_t)\rho(X_t, E_{t-m}^{t+m}X_t)\|_r < \infty.$$

Then $\{\phi_t(X_t)\}$ is $L_2$-NED on $\{V_t\}$ of size $-a(r-2)/2(r-1)$.

Proof For ease of notation, write $\rho$ for $\rho(X_t, E_{t-m}^{t+m}X_t)$ and $B$ for $B_t(X_t, E_{t-m}^{t+m}X_t)$. As in the previous two theorems, the basic inequality (17.33) is applied, but now we have

$$\|\phi_t(X_t) - E_{t-m}^{t+m}\phi_t(X_t)\|_2 \le \|\phi_t(X_t) - \phi_t(E_{t-m}^{t+m}X_t)\|_2 \le \|B\rho\|_2 \le 2\|\rho\|_q^{(r-2)/2(r-1)}\|B\|_{q/(q-1)}^{(r-2)/2(r-1)}\|B\rho\|_r^{r/2(r-1)}, \tag{17.47}$$

where the last step is by 17.15. For $q \le 2$,

$$\|\rho\|_q \le \|\rho\|_2 \le \sum_{i=1}^{v}\|X_{it} - E_{t-m}^{t+m}X_{it}\|_2 \le \sum_{i=1}^{v}d_{it}\nu_{im} \le d_t\nu_m, \tag{17.48}$$

where $d_t = v\max_i\{d_{it}\}$ and $\nu_m = v^{-1}\sum_{i=1}^{v}\nu_{im}$, which is of size $-a$ by assumption. Hence, under the stated assumptions,

$$\|\phi_t(X_t) - E_{t-m}^{t+m}\phi_t(X_t)\|_2 \le d_t'\nu_m^{(r-2)/2(r-1)}, \tag{17.49}$$

where $d_t' = 2\|B\|_{q/(q-1)}^{(r-2)/2(r-1)}\|B\rho\|_r^{r/2(r-1)}d_t^{(r-2)/2(r-1)}$. ∎

Observe the important role of 17.15 in tightening this result. Without it, the best we could do in (17.47) would be to apply Hölder's inequality directly, to obtain

$$\|B\rho\|_2 \le \|\rho\|_{2q}\|B\|_{2q/(q-1)}, \quad q \ge 1. \tag{17.50}$$

The minimum requirement for this inequality to be useful is that $B$ is bounded almost surely, permitting the choice $q = 1$, which is merely the case covered by 17.12 with the constant scale factors set to $\operatorname{ess\,sup} B_t(X^1, X^2)$.

The following application of this theorem may be contrasted with 17.9. The moment conditions have to be strengthened by a factor of at least 2 to ensure that the product of $L_2$-NED functions is also $L_2$-NED, rather than just $L_1$-NED. There is also a penalty in terms of the $L_2$-NED size which does not occur in the other case.

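17.17 Example Let $\boldsymbol{X}_t = (X_t, Y_t)$ and $\phi_t(\boldsymbol{X}_t) = X_tY_t$. Assume that $\|X_t\|_{2r} < \infty$ and $\|Y_t\|_{2r} < \infty$ for $r > 2$, and that $X_t$ and $Y_t$ are $L_2$-NED on $\{V_t\}$ of size $-a$. Then

$$\begin{aligned}
|X^1Y^1 - X^2Y^2| &\le |X^1|\,|Y^1 - Y^2| + |X^1 - X^2|\,|Y^2| \\
&\le (|X^1| + |Y^2|)(|Y^1 - Y^2| + |X^1 - X^2|) \\
&= B(\boldsymbol{X}^1, \boldsymbol{X}^2)\rho(\boldsymbol{X}^1, \boldsymbol{X}^2),
\end{aligned} \tag{17.51}$$

defining $B$ and $\rho$. For any $q$ in the range $[4/3, 4]$, the assumptions imply

$$\|B(\boldsymbol{X}^1, \boldsymbol{X}^2)\|_{q/(q-1)} \le \|X^1\|_{q/(q-1)} + \|Y^2\|_{q/(q-1)} < \infty, \tag{17.52}$$

$$\|\rho(\boldsymbol{X}^1, \boldsymbol{X}^2)\|_q \le \|Y^1\|_q + \|Y^2\|_q + \|X^1\|_q + \|X^2\|_q < \infty, \tag{17.53}$$

and

$$\begin{aligned}
\|B(\boldsymbol{X}^1, \boldsymbol{X}^2)\rho(\boldsymbol{X}^1, \boldsymbol{X}^2)\|_r
&\le \|X^1\|_{2r}\|Y^1\|_{2r} + \|X^1\|_{2r}\|Y^2\|_{2r} + \|X^1\|_{2r}^2 \\
&\quad + \|X^1\|_{2r}\|X^2\|_{2r} + \|Y^2\|_{2r}\|Y^1\|_{2r} + \|Y^2\|_{2r}^2 \\
&\quad + \|Y^2\|_{2r}\|X^1\|_{2r} + \|Y^2\|_{2r}\|X^2\|_{2r} < \infty.
\end{aligned} \tag{17.54}$$

Putting $\boldsymbol{X}^1 = \boldsymbol{X}_t$ and $\boldsymbol{X}^2 = E_{t-m}^{t+m}\boldsymbol{X}_t$, the conditions of 17.16 are satisfied for $q$ in the range $[4/3, 2]$ and $X_tY_t$ is $L_2$-NED of size $-a(r-2)/2(r-1)$. □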

17.4 Approximability

In §17.2 we showed that $L_p$-NED functions of mixing processes were mixingales, and most of the subsequent applications will exploit this fact. Another way to look at the $L_p$-NED property is in terms of the existence of a finite lag approximation to the process. The conditional mean $E(X_t\,|\,\mathcal{F}_{t-m}^{t+m})$ can be thought of as a function of the variables $V_{t-m},\dots,V_{t+m}$, and if $\{V_t\}$ is a mixing sequence so is $\{E(X_t\,|\,\mathcal{F}_{t-m}^{t+m})\}$, by 14.1. One approach to proving limit theorems is to team a limit theorem for mixing processes with a proof that the difference between the actual sequence and its approximating sequence can be neglected. This is an alternative way to overcome the problem that lag functions of mixing processes need not be mixing.

But once this idea occurs to us, it is clear that the conditional mean might not be the only function to possess the desired approximability property. More generally, we might introduce a definition of the following sort. Letting $V_t$ be $l \times 1$, we shall think of $h_t^m: \mathbb{R}^{l(2m+1)} \mapsto \mathbb{R}$ as a $\mathcal{F}_{t-m}^{t+m}/\mathcal{B}$-measurable function, where $\mathcal{F}_{t-m}^{t+m} = \sigma(V_{t-m},\dots,V_{t+m})$.

17.18 Definition The sequence $\{X_t\}$ will be called $L_p$-approximable ($p > 0$) on the sequence $\{V_t\}$ if for each $m \in \mathbb{N}$ there exists a sequence $\{h_t^m\}$ of $\mathcal{F}_{t-m}^{t+m}$-measurable random variables, and

$$\|X_t - h_t^m\|_p \le d_t\nu_m, \tag{17.55}$$

where $\{d_t\}$ is a non-negative constant sequence, and $\nu_m \to 0$ as $m \to \infty$. $\{X_t\}$ will also be said to be approximable in probability (or $L_0$-approximable) on $\{V_t\}$ if there exist $\{h_t^m\}$, $\{d_t\}$, and $\{\nu_m\}$ as above such that, for every $\delta > 0$,

$$P(|X_t - h_t^m| > d_t\delta) \le \nu_m. \tag{17.56}$$ □

The usual size terminology can be applied here. There is also the usual extension to arrays, by inclusion of the additional subscript wherever appropriate.

If a sequence is $L_p$-approximable for $p > 0$, then by the Markov inequality

$$P(|X_t - h_t^m| > d_t\delta) \le (d_t\delta)^{-p}\|X_t - h_t^m\|_p^p \le \nu_m', \tag{17.57}$$

where $\nu_m' = \delta^{-p}\nu_m^p$; hence an $L_p$-approximable process is also $L_0$-approximable. An $L_p$-NED sequence is $L_p$-approximable, although only in the case $p = 2$ are we able to claim (from 10.12) that $E(X_t\,|\,\mathcal{F}_{t-m}^{t+m})$ is the best $L_p$-approximator in the sense that the $p$-norms in (17.55) are smaller than for any alternative choice of $h_t^m$.

17.19 Example Consider the linear process of 17.3. The function

$$h_t^m = \sum_{j=-m}^{m}\theta_jV_{t-j} \tag{17.58}$$

is different from $E(X_t\,|\,\mathcal{F}_{t-m}^{t+m})$ unless $\{V_t\}$ is an independent process, but is also an $L_p$-approximator for $X_t$ since

$$\|X_t - h_t^m\|_p = \Big\|\sum_{j=m+1}^{\infty}(\theta_jV_{t-j} + \theta_{-j}V_{t+j})\Big\|_p \le d_t\nu_m, \tag{17.59}$$

where $\nu_m = \sum_{j=m+1}^{\infty}(|\theta_j| + |\theta_{-j}|)$ and $d_t = \sup_s\|V_s\|_p$. □

17.20 Example In 17.4, the functions $g_t^m$ are $L_p$-approximators for $X_t$, of infinite size, whenever sup,sf IIF,F,IIp < ∞. □
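The approximation numbers $\nu_m$ of 17.19 are easy to tabulate once the weights are specified. As a minimal sketch, assume (hypothetically, this choice is not in the text) geometric two-sided weights $\theta_j = \lambda^{|j|}$ with $\lambda = 0.5$; the tail sum then has the closed form $2\lambda^{m+1}/(1-\lambda)$, which the truncated numerical sum should reproduce.

```python
def nu(m, theta, j_max=10_000):
    """Truncated version of nu_m = sum_{j > m} (|theta_j| + |theta_{-j}|).

    j_max is an illustrative cut-off for the infinite sum.
    """
    return sum(abs(theta(j)) + abs(theta(-j)) for j in range(m + 1, j_max + 1))

lam = 0.5                                  # hypothetical decay parameter

def theta(j):
    """Hypothetical geometric weights theta_j = lam^|j|."""
    return lam ** abs(j)

nus = [nu(m, theta) for m in range(6)]
closed_form = [2 * lam ** (m + 1) / (1 - lam) for m in range(6)]
for m, (a, b) in enumerate(zip(nus, closed_form)):
    print(m, a, b)
```

With these weights $\nu_m$ halves each time $m$ increases by one, so the truncated sum $h_t^m$ approaches $X_t$ in $L_p$ norm at a geometric rate.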

One reason why approximability might have advantages over the $L_p$-NED property is the ease of handling transformations. As we found above, transferring the $L_p$-NED property to transformations of the original functions can present difficulties, and impose undesirable moment restrictions. With approximability, these difficulties can be largely overcome. The first step is to show that, subject to $L_r$-boundedness, a sequence that is approximable in probability is also $L_p$-approximable for $p < r$, and moreover, that the approximator functions can be bounded for each finite $m$. The following is adapted from Pötscher and Prucha (1991a).

17.21 Theorem Suppose $\{X_t\}$ is $L_r$-bounded, $r > 1$, and $L_0$-approximable by $h_t^m$. Then for $0 < p < r$ it is $L_p$-approximable by $\tilde h_t^m = h_t^m1_{\{|h_t^m| \le d_tM_m\}}$, where $M_m < \infty$ for each $m \in \mathbb{N}$.

Proof Since $h_t^m$ is an $L_0$-approximator of $X_t$, we may choose a positive sequence $\{\delta_m\}$ such that $\delta_m \to 0$ and yet $P(|X_t - h_t^m| > d_t\delta_m) \le \nu_m \to 0$ as $m \to \infty$. Also choose a sequence of numbers $\{M_m\}$ having the properties $M_m \to \infty$, but $M_m\nu_m \to 0$. For example, $M_m = \nu_m^{-1/2}$ would serve. There is no loss of generality in assuming $\sup_mM_m^{-1} \le 1$. By Minkowski's inequality we are able to write

$$\|X_t - \tilde h_t^m\|_p \le A_{tm}^1 + A_{tm}^2 + A_{tm}^3, \tag{17.60}$$

where

$$A_{tm}^1 = \|(X_t - \tilde h_t^m)1_{\{|X_t - h_t^m| > d_t\delta_m\}}\|_p,$$

$$A_{tm}^2 = \|(X_t - \tilde h_t^m)1_{\{|X_t - h_t^m| \le d_t\delta_m,\ |h_t^m| > d_tM_m\}}\|_p,$$

$$A_{tm}^3 = \|(X_t - \tilde h_t^m)1_{\{|X_t - h_t^m| \le d_t\delta_m,\ |h_t^m| \le d_tM_m\}}\|_p.$$

Hlder's inequality implies that

11A'1'11,S II#II,qIIl'II?v&-l),> 1. (17.61)chooseq = r/# and apply (17.61)to Amitfor i = 1,2,3. Noting that 112,- /')'IIr<IImIIr+dtMm,again by Minkowski's inequality, and that 11lsllfx= #(F), we obtain

thefollowing inequalities. First,

41,,,,/ Il-Y,- l'7IIr#(I-Y, - /'71> dtnm)

f t4(IIA/IIrM,,,-1 + 1)MmMm. (17.62)Second, observe that

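$$\{|X_t - h_t^m| \le d_t\delta_m,\ |h_t^m| > d_tM_m\} \subseteq \{|X_t| > d_t(M_m - \delta_m)\},$$

and hence, when $M_m > \delta_m$,

$$P(|X_t - h_t^m| \le d_t\delta_m,\ |h_t^m| > d_tM_m) \le P(|X_t| > d_t(M_m - \delta_m)) \le \|X_t\|_r^rd_t^{-r}(M_m - \delta_m)^{-r} \tag{17.63}$$

by the Markov inequality, so that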

Al,,, < 11.2,-:'IIr#(11,-/T1< dtnm, I/')'I > dtsl)

<(IIAlIr+J,M,?,)II-Y,IIrrtrfr(Az,,-

,,,)-r

S (IIA/4Ilrr+'+ II.X',/f,II3ArJ',,,(M,,,- nm4-r. (17.64)-1 b 1 ) And lastly,(The final inequality is f'rom replacing Mm y .

$$A_{tm}^3 \le d_t\delta_m, \tag{17.65}$$

in view of the fact that $\tilde h_t^m = h_t^m$ on the set $\{|h_t^m| \le d_tM_m\}$. We have therefore established that

$$\|X_t - \tilde h_t^m\|_p \le d_t'\nu_m', \tag{17.66}$$

where

v'

= M v,n+MmMm - m)-r+

and %'m' --) 0 by assumption, since r > 1, and.'

d; = ztf/rnax(lI.Y,/#,IIr,Il.Xr/ttllrr+l,IlA/t4ll rr, 1). .

(17.67)

(17.68)If o-approximability is satistied with dt <f IIx,II,,then J; = 2.

The value of this result is that we have only to show that the transformation of an $L_0$-approximable variable is also $L_0$-approximable, and establish the existence of the requisite moments, to preserve $L_p$-approximability under the transformation. Consider the Lipschitz condition specified in (17.40). The conditions that need to be imposed on $B(\cdot,\cdot)$ are notably weaker than those in 17.16 for the $L_p$-NED case.

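17.22 Theorem Let $h_t^m = (h_{1t}^m,\dots,h_{vt}^m)'$ be the $L_0$-approximator of $X_t = (X_{1t},\dots,X_{vt})'$ of size $-\varphi$. If $\phi_t: \mathbb{R}^v \mapsto \mathbb{R}$ satisfies (17.40), and $E(B_t(X_t, h_t^m)^{\varepsilon}) < \infty$ for $\varepsilon > 0$, then $\phi_t(X_t)$ is $L_0$-approximable of size $-\varphi$.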

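Proof Fix $\delta > 0$ and $M > 0$, and define $d_t = \sum_{i=1}^{v}d_{it}$. The Markov inequality gives

$$\begin{aligned}
P(|\phi_t(X_t) - \phi_t(h_t^m)| > d_t\delta)
&\le P(B_t(X_t, h_t^m)\rho(X_t, h_t^m) > d_t\delta,\ B_t(X_t, h_t^m) > M) \\
&\quad + P(B_t(X_t, h_t^m)\rho(X_t, h_t^m) > d_t\delta,\ B_t(X_t, h_t^m) \le M) \\
&\le E(B_t(X_t, h_t^m)^{\varepsilon})/M^{\varepsilon} + P(\rho(X_t, h_t^m) > d_t\delta/M).
\end{aligned} \tag{17.69}$$

Since $M$ is arbitrary the first term on the majorant side can be made as small as desired. The proof is completed by noting that

$$\begin{aligned}
P(\rho(X_t, h_t^m) > d_t\delta/M)
&= P\Big(\sum_{i=1}^{v}|X_{it} - h_{it}^m| > d_t\delta/M\Big) \\
&\le P\Big(\bigcup_{i=1}^{v}\{|X_{it} - h_{it}^m| > d_{it}\delta/M\}\Big) \\
&\le \sum_{i=1}^{v}P(|X_{it} - h_{it}^m| > d_{it}\delta/M) \\
&\le \sum_{i=1}^{v}\nu_{im} \to 0 \text{ as } m \to \infty.
\end{aligned} \tag{17.70}$$ ∎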

It might seem as if teaming 17.22 with 17.21 would allow us to show that, given an $L_r$-bounded, $L_2$-NED sequence, $r > 2$, which is accordingly $L_2$-approximable and hence $L_0$-approximable, any transformation satisfying the conditions of 17.22 is $L_2$-approximable, and therefore also $L_2$-NED, by 10.12. The catch with circumventing the moment restrictions of 17.16 in this manner is that it is not possible to specify the $L_2$-NED size of the transformed sequence. In (17.67), one cannot put a bound on the rate at which the sequence $\{\delta_m\}$ may converge without specifying the distributions of the $X_{it}$ in greater detail. However, if it is possible to do this in a given application, we have here an alternative route to dealing with transformations.

Pötscher and Prucha (1991a), to whom the concepts of this section are due, define approximability in a slightly different way, in terms of the convergence of the Cesàro-sums of the $p$-norms or probabilities. These authors say that $X_t$ is $L_p$-approximable ($p > 0$) if

$$\limsup_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}\|X_t - h_t^m\|_p \to 0 \text{ as } m \to \infty, \tag{17.71}$$

and is $L_0$-approximable if, for every $\delta > 0$,

$$\limsup_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}P(|X_t - h_t^m| > \delta) \to 0 \text{ as } m \to \infty. \tag{17.72}$$

It is clear that we might choose to define near-epoch dependence and the mixingale property in an analogous manner, leading to a whole class of alternative convergence results. Comparing these alternatives, it turns out that neither definition dominates, each permitting a form of behaviour by the sequences which is ruled out by the other. If (17.55) holds, we may write

$$\limsup_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}\|X_t - h_t^m\|_p \le \Big(\limsup_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}d_t\Big)\nu_m \to 0 \tag{17.73}$$

so long as the limsup on the majorant side is bounded. On the other hand, if (17.71) holds we may define

$$\nu_m = \limsup_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}\|X_t - h_t^m\|_p, \tag{17.74}$$

and then $d_t = \sup_m\{\|X_t - h_t^m\|_p/\nu_m\}$ will satisfy 17.18 so long as it is finite for finite $t$. Evidently, (17.71) permits the existence of a set of sequence coordinates for which the $p$-norms fail to converge to 0 with $m$, so long as these are ultimately negligible, accumulating at a rate strictly less than $n$ as $n$ increases. On the other hand, (17.55) permits trending moments, with for example $d_t = O(t^{\lambda})$, $\lambda > 0$, which would contradict (17.71).

Similarly, for $\delta_m > 0$, and $\nu_m > 0$, define $d_{tm}$ by the relation

$$P(|X_t - h_t^m| > d_{tm}\delta_m) = \nu_m, \tag{17.75}$$

and then, allowing $\nu_m \to 0$ and $\delta_m \to 0$, define $d_t = \sup_md_{tm}$. (17.56) is satisfied if $d_t < \infty$ for each finite $t$; this latter condition need not hold under (17.72). On the other hand, (17.72) could fail in cases where, for fixed $\delta$ and every $m$, $P(|X_t - h_t^m| > \delta)$ is tending to unity as $t \to \infty$.

IV

THE LAW OF LARGE NUMBERS

18 Stochastic Convergence

18.1 Almost Sure Convergence

Almost sure convergence was defined formally in §12.2. Sometimes the condition is stated in the form

$$P\Big(\limsup_{n\to\infty}|X_n - X| > \varepsilon\Big) = 0, \text{ for all } \varepsilon > 0. \tag{18.1}$$

Yet another way to express the same idea is to say that $P(C) = 1$ where, for each $\omega \in C$ and any $\varepsilon > 0$, $|X_n(\omega) - X(\omega)| > \varepsilon$ at most a finite number of times as we pass down the sequence. This is also written as

$$P(|X_n - X| > \varepsilon, \text{ i.o.}) = 0, \text{ all } \varepsilon > 0, \tag{18.2}$$

where i.o. stands for 'infinitely often'.

Note that the probability in (18.2) is assigned to an attribute of the whole sequence, not to a particular $n$. One way to grasp the 'infinitely often' idea is to consider the event $\bigcup_{n=m}^{\infty}\{|X_n - X| > \varepsilon\}$; in words, 'the event that has occurred whenever $\{|X_n - X| > \varepsilon\}$ occurs for at least one $n$ beyond a given point $m$ in the sequence'. If this event occurs for every $m$, no matter how large, $\{|X_n - X| > \varepsilon\}$ occurs infinitely often. In other words,

$$\{|X_n - X| > \varepsilon, \text{ i.o.}\} = \bigcap_{m=1}^{\infty}\bigcup_{n=m}^{\infty}\{|X_n - X| > \varepsilon\} = \limsup_{n\to\infty}\{|X_n - X| > \varepsilon\}. \tag{18.3}$$

Useful facts about this set and its complement are contained in the following lemma.

18.1 Lemma Let $\{E_n \in \mathcal{F}\}_1^{\infty}$ be an arbitrary sequence. Then

(i) $P(\limsup_{n\to\infty}E_n) = \lim_{n\to\infty}P\big(\bigcup_{m=n}^{\infty}E_m\big)$.

(ii) $P(\liminf_{n\to\infty}E_n) = \lim_{n\to\infty}P\big(\bigcap_{m=n}^{\infty}E_m\big)$.

Proof The sequence $\{\bigcup_{m=n}^{\infty}E_m\}_{n=1}^{\infty}$ is decreasing monotonically to $\limsup E_n$. Part (i) therefore follows by 3.4. Part (ii) follows in exactly the same way, since the sequence $\{\bigcap_{m=n}^{\infty}E_m\}_{n=1}^{\infty}$ increases monotonically to $\liminf E_n$. ∎

A fundamental tool in proofs of a.s. convergence is the Borel-Cantelli lemma. This has two parts, the 'convergence' part and the 'divergence' part. The former is the most useful, since it yields a very general sufficient condition for convergence, whereas the second part, which generates a necessary condition for convergence, requires independence of the sequence.

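18.2 Borel-Cantelli lemma

(i) For an arbitrary sequence of events $\{E_n \in \mathcal{F}\}_1^{\infty}$,

$$\sum_{n=1}^{\infty}P(E_n) < \infty \Rightarrow P(E_n, \text{ i.o.}) = 0. \tag{18.4}$$

(ii) For a sequence $\{E_n \in \mathcal{F}\}_1^{\infty}$ of independent events,

$$\sum_{n=1}^{\infty}P(E_n) = \infty \Rightarrow P(E_n, \text{ i.o.}) = 1. \tag{18.5}$$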

Proof By countable subadditivity,

$$P\Big(\bigcup_{n=m}^{\infty}E_n\Big) \le \sum_{n=m}^{\infty}P(E_n). \tag{18.6}$$

The premise in (18.4) is that the majorant side of (18.6) is finite for $m = 1$. This implies $\sum_{n=m}^{\infty}P(E_n) \to 0$ as $m \to \infty$ (by 2.25), which further implies

$$\lim_{m\to\infty}P\Big(\bigcup_{n=m}^{\infty}E_n\Big) = 0. \tag{18.7}$$

Part (i) now follows by part (i) of 18.1.

To prove (ii), note by 7.5 that the collection $\{E_n^c \in \mathcal{F}\}_1^{\infty}$ is independent; hence for any $m > 0$, and $m' > m$,

$$P\Big(\bigcap_{n=m}^{m'}E_n^c\Big) = \prod_{n=m}^{m'}P(E_n^c) = \prod_{n=m}^{m'}(1 - P(E_n)) \le \exp\Big\{-\sum_{n=m}^{m'}P(E_n)\Big\} \to 0 \text{ as } m' \to \infty, \tag{18.8}$$

by hypothesis, since $e^{-x} \ge 1 - x$. (18.8) holds for all $m$, so

$$P(\liminf E_n^c) = \lim_{m\to\infty}P\Big(\bigcap_{n=m}^{\infty}E_n^c\Big) = 0, \tag{18.9}$$

by 18.1(ii). Hence,

$$P(E_n \text{ i.o.}) = P(\limsup E_n) = 1 - P(\liminf E_n^c) = 1. \tag{18.10}$$ ∎
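The two bounds driving the proof, (18.6) and (18.8), can be evaluated directly for specific probability sequences. The sketch below is an illustration under assumed inputs: the choices $P(E_n) = 1/n^2$ (summable) and $P(E_n) = 1/n$ (not summable, events taken to be independent) are hypothetical and are not drawn from the text.

```python
import math

def tail_sum(p, m, m_prime):
    """Sum of p(n) for m <= n <= m', the majorant in (18.6)."""
    return sum(p(n) for n in range(m, m_prime + 1))

def prob_none_occur(p, m, m_prime):
    """P(no E_n occurs for m <= n <= m') when the E_n are independent."""
    out = 1.0
    for n in range(m, m_prime + 1):
        out *= 1.0 - p(n)
    return out

def summable(n):
    """Hypothetical P(E_n) = 1/n^2; the series converges."""
    return 1.0 / n ** 2

def divergent(n):
    """Hypothetical P(E_n) = 1/n; the series diverges."""
    return 1.0 / n

m = 10
for m_prime in (100, 1_000, 100_000):
    none = prob_none_occur(divergent, m, m_prime)
    bound = math.exp(-tail_sum(divergent, m, m_prime))
    print(m_prime, "P(none of E_m..E_m') =", none, " exp bound =", bound)

# Convergence part: the union bound on the tail shrinks as m grows.
for start in (10, 100, 1_000):
    print(start, "tail union bound =", tail_sum(summable, start, 100_000))
```

In the divergent case the product telescopes to $(m-1)/m'$, which tends to zero as $m'$ grows, in line with (18.8). In the summable case the union bound from (18.6) is already small for moderate $m$.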

To appreciate the role of this result (the convergence part) in showing a.s. convergence, consider the particular case

$$E_n = \{\omega: |X_n(\omega) - X(\omega)| > \varepsilon\}.$$

If $\sum_{n=1}^{\infty}P(E_n) < \infty$, the condition $P(E_n) > 0$ can hold for at most a finite number of $n$. The lemma shows that $P(E_n \text{ i.o.})$ has to be zero to avoid a contradiction.

Yet another way to characterize a.s. convergence is suggested by the following theorem.

18.3 Theorem $\{X_n\}$ converges a.s. to $X$ if and only if for all $\varepsilon > 0$

$$\lim_{m\to\infty}P\Big(\sup_{n \ge m}|X_n - X| \le \varepsilon\Big) = 1. \tag{18.11}$$

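Proof Let

$$A_m(\varepsilon) = \bigcap_{n=m}^{\infty}\{\omega: |X_n(\omega) - X(\omega)| \le \varepsilon\} \in \mathcal{F}, \tag{18.12}$$

and then (18.11) can be written in the form $\lim_{m\to\infty}P(A_m(\varepsilon)) = 1$. The sequence $\{A_m(\varepsilon)\}_1^{\infty}$ is non-decreasing, so $A_m(\varepsilon) = \bigcup_{j=1}^{m}A_j(\varepsilon)$; letting $A(\varepsilon) = \bigcup_{m=1}^{\infty}A_m(\varepsilon)$, (18.11) can be stated as $P(A(\varepsilon)) = 1$.

Define the set $C$ by the property that, for each $\omega \in C$, $\{X_n(\omega)\}_1^{\infty}$ converges. That is, for $\omega \in C$,

$$\exists\, m(\omega) \text{ such that } \sup_{n \ge m(\omega)}|X_n(\omega) - X(\omega)| \le \varepsilon, \text{ for all } \varepsilon > 0. \tag{18.13}$$

Evidently, $\omega \in C \Rightarrow \omega \in A_m(\varepsilon)$ for some $m$, so that $C \subseteq A(\varepsilon)$. Hence $P(C) = 1$ implies $P(A(\varepsilon)) = 1$, proving 'only if'.

To show 'if', assume $P(A(\varepsilon)) = 1$ for all $\varepsilon > 0$. Set $\varepsilon = 1/k$ for positive integer $k$, and define

$$A^* = \bigcap_{k=1}^{\infty}A(1/k) = \Big(\bigcup_{k=1}^{\infty}A(1/k)^c\Big)^c. \tag{18.14}$$

The second equality here is 1.1(iv). By 3.6(ii), $P(A^*) = 1 - P\big(\bigcup_{k=1}^{\infty}A(1/k)^c\big) = 1$. But every element of $A^*$ is a convergent outcome in the sense of (18.13), hence $A^* \subseteq C$, and the conclusion follows. ∎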

The last theorem characterizes a.s. convergence in terms of the uniform proximity of the tail sequences $\{|X_n(\omega) - X(\omega)|\}_{n=m}^{\infty}$ to zero, on a set $A_m$ whose measure approaches 1 as $m \to \infty$. A related, but distinct, result establishes a direct link between a.s. convergence and uniform convergence on subsets of $\Omega$.

18.4 Egoroff's Theorem If and only if $X_n \xrightarrow{\text{a.s.}} X$, there exists for every $\delta > 0$ a set $C(\delta)$ with $P(C(\delta)) \ge 1 - \delta$, such that $X_n(\omega) \to X(\omega)$ uniformly on $C(\delta)$.

Proof To show 'only if', suppose $X_n(\omega)$ converges uniformly on sets $C(1/k)$, $k = 1,2,3,\dots$ The sequence $\{C(1/k), k \in \mathbb{N}\}$ can be chosen as non-decreasing by monotonicity of $P$, and $P\big(\bigcup_{k=1}^{\infty}C(1/k)\big) = 1$ by continuity of $P$. To show 'if', let

$$A_m(\delta) = \bigcap_{n=k(m)}^{\infty}\{\omega: |X_n(\omega) - X(\omega)| < 1/m\}, \tag{18.15}$$

$k(m)$ being chosen to satisfy the condition $P(A_m(\delta)) \ge 1 - 2^{-m}\delta$. In view of a.s. convergence and 18.3, the existence of finite $k(m)$ is assured for each $m$. Then if

$$C(\delta) = \bigcap_{m=1}^{\infty}A_m(\delta), \tag{18.16}$$

convergence is uniform on $C(\delta)$ by construction; that is, for every $\omega \in C(\delta)$, $|X_n(\omega) - X(\omega)| < 1/m$ for $n \ge k(m)$, for each $m > 0$. Applying 1.1(iii) and subadditivity, we find, as required,

$$P(C(\delta)) = 1 - P\Big(\bigcup_{m=1}^{\infty}A_m(\delta)^c\Big) \ge 1 - \sum_{m=1}^{\infty}(1 - P(A_m(\delta))) \ge 1 - \delta. \tag{18.17}$$ ∎

18.2 Convergence in Probability

In spite of its conceptual simplicity, the theory of almost sure convergence cannot easily be appreciated without a grasp of probability fundamentals, and traditionally, an alternative convergence concept has been preferred in econometric theory. If, for any $\varepsilon > 0$,

$$\lim_{n\to\infty}P(|X_n - X| > \varepsilon) = 0, \tag{18.18}$$

$X_n$ is said to converge in probability (in pr.) to $X$. Here the convergent sequences are specified to be, not random elements $\{X_n(\omega)\}_1^{\infty}$, but the nonstochastic sequences $\{P(|X_n - X| > \varepsilon)\}_1^{\infty}$. The probability of the convergent subset of $\Omega$ is left unspecified. However, the following relation is immediate from 18.3, since (18.11) implies (18.18).

18.5 Theorem If $X_n \xrightarrow{\text{a.s.}} X$ then $X_n \xrightarrow{\text{pr}} X$. □

The converse does not hold. Convergence in probability imposes a limiting condition on the marginal distribution of the $n$th member of the sequence as $n \to \infty$. The probability that the deviation of $X_n$ from $X$ is negligible approaches 1 as we move down the sequence. Almost sure convergence, on the other hand, requires that beyond a certain point in the sequence the probability that deviations are negligible from there on approaches 1. While it may not be intuitively obvious that a sequence can converge in pr. but not a.s., in 18.16 below we show that convergence in pr. is compatible with a.s. nonconvergence.
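A standard textbook construction (sometimes called the 'typewriter' sequence) makes the gap between the two modes concrete. It is offered here only as an illustrative sketch and is not necessarily the example the text develops later; the function names and the sample point $\omega = 0.3$ are assumptions. On $[0,1)$ with Lebesgue measure, write $n = 2^k + j$ with $0 \le j < 2^k$ and let $X_n$ be the indicator of $[j/2^k, (j+1)/2^k)$.

```python
def typewriter(n: int, omega: float) -> int:
    """X_n(omega) for n >= 1: indicator of [j / 2**k, (j + 1) / 2**k),
    where n = 2**k + j and 0 <= j < 2**k."""
    k = n.bit_length() - 1
    j = n - (1 << k)
    return int(j / 2 ** k <= omega < (j + 1) / 2 ** k)

def prob_nonzero(n: int) -> float:
    """P(|X_n| > eps) for any 0 < eps < 1: the length of the interval."""
    return 2.0 ** -(n.bit_length() - 1)

omega = 0.3                     # an arbitrary sample point in [0, 1)
hits = [n for n in range(1, 2 ** 12) if typewriter(n, omega) == 1]

print("P(|X_n| > eps) at n = 2**11:", prob_nonzero(2 ** 11))
print("indices n < 2**12 with X_n(omega) = 1:", hits)
```

The probabilities $P(|X_n| > \varepsilon) = 2^{-k}$ tend to zero, so $X_n \to 0$ in probability, yet every $\omega$ falls in exactly one interval in each block $2^k \le n < 2^{k+1}$, so $X_n(\omega) = 1$ infinitely often and the sequence converges at no sample point.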

However, convergence in probability is equivalent to a.s. convergence on a subsequence; given a sequence that converges in pr., it is always possible, by throwing away some of the members of the sequence, to be left with an a.s. convergent sequence.

18.6 Theorem $X_n \xrightarrow{\text{pr}} X$ if and only if every subsequence $\{X_{n_k}, k \in \mathbb{N}\}$ contains a further subsequence $\{X_{n_{k(j)}}, j \in \mathbb{N}\}$ which converges a.s. to $X$.

Proof To prove 'only if': suppose $P(|X_n - X| > \varepsilon) \to 0$ for any $\varepsilon > 0$. This means that, for any sequence of integers $\{n_k, k \in \mathbb{N}\}$, $P(|X_{n_k} - X| > \varepsilon) \to 0$. Hence for each $j \in \mathbb{N}$ there exists an integer $k(j)$ such that

$$P(|X_{n_k} - X| > 1/j) < 2^{-j}, \text{ all } k \ge k(j). \tag{18.19}$$

Since this sequence of probabilities is summable over $j$, we conclude from the first Borel-Cantelli lemma that

$$P(|X_{n_{k(j)}} - X| > 1/j \text{ i.o.}) = 0. \tag{18.20}$$

It follows, by consideration of the infinite subsequences $\{j \ge J\}$ for $J > 1/\varepsilon$, that $P(|X_{n_{k(j)}} - X| > \varepsilon \text{ i.o.}) = 0$ for every $\varepsilon > 0$, and hence the subsequence $\{X_{n_{k(j)}}\}$ converges a.s. as required.

To prove 'if': if $\{X_n\}$ does not converge in probability, there must exist a subsequence $\{n_k\}$ such that $\inf_k\{P(|X_{n_k} - X| > \varepsilon)\} \ge \varepsilon$, for some $\varepsilon > 0$. This rules out convergence in pr. on any subsequence of $\{n_k\}$, which rules out convergence a.s. on the same subsequence, by 18.5. ∎
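The selection step behind (18.19) can be sketched in a simplified setting. Assume, hypothetically, that for a fixed $\varepsilon$ the probabilities are $p_n = P(|X_n - X| > \varepsilon) = 1/n$, which tend to zero but are not summable. The text's argument lets the threshold $1/j$ vary with $j$; this sketch holds $\varepsilon$ fixed for brevity. Picking indices $k(j)$ with $p_{k(j)} < 2^{-j}$ yields a subsequence whose probabilities are summable, which is what the first Borel-Cantelli lemma needs.

```python
def first_index_below(p, threshold, start=1):
    """Smallest n >= start with p(n) < threshold, for a non-increasing p."""
    n = start
    while p(n) >= threshold:
        n += 1
    return n

def p(n):
    """Hypothetical P(|X_n - X| > eps) = 1/n: tends to 0, not summable."""
    return 1.0 / n

ks = []
start = 1
for j in range(1, 11):
    k_j = first_index_below(p, 2.0 ** -j, start)
    ks.append(k_j)
    start = k_j + 1             # keep the selected indices strictly increasing

total = sum(p(k) for k in ks)
print("selected indices k(j):", ks)
print("sum of probabilities along the subsequence:", total)
```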

18.3 Transformations and Convergence

The following set of results on convergence, a.s. and in pr., are fundamental tools of asymptotic theory. For completeness they are given for the vector case, even though most of our own applications are to scalar sequences. A random $k$-vector $X_n$ is said to converge a.s. (in pr.) to a vector $X$ if each element of $X_n$ converges a.s. (in pr.) to the corresponding element of $X$.

18.7 Lemma $X_n \to X$ a.s. (in pr.) if and only if $\|X_n - X\| \to 0$ a.s. (in pr.).$^{19}$

Proof Take first the case of a.s. convergence. The relation $\|X_n - X\| \xrightarrow{\text{a.s.}} 0$ may be expressed as

$$P\Big(\lim_{n\to\infty}\sum_{i=1}^{k}(X_{ni} - X_i)^2 < \varepsilon^2\Big) = 1 \tag{18.21}$$

for any $\varepsilon > 0$. But (18.21) implies that

$$P\Big(\lim_{n\to\infty}|X_{ni} - X_i| < \varepsilon,\ i = 1,\dots,k\Big) = 1, \tag{18.22}$$

proving 'if'. To prove 'only if', observe that if (18.22) holds, $P(\lim_{n\to\infty}\|X_n - X\| < k^{1/2}\varepsilon) = 1$, for any $\varepsilon > 0$. To get the proof for convergence in pr., replace $P(\lim_{n\to\infty}\dots)$ everywhere by $\lim_{n\to\infty}P(\dots)$, and the arguments are identical. ∎

There are three different approaches, established in the following theorems, to the problem of preserving convergence (a.s. or in pr.) under transformations.

18.8 Theorem Let $g: \mathbb{R}^k \mapsto \mathbb{R}$ be a Borel function, let $C_g \subseteq \mathbb{R}^k$ be the set of continuity points of $g$, and assume $P(X \in C_g) = 1$.

(i) If $X_n \xrightarrow{\text{a.s.}} X$ then $g(X_n) \xrightarrow{\text{a.s.}} g(X)$.

(ii) If $X_n \xrightarrow{\text{pr}} X$ then $g(X_n) \xrightarrow{\text{pr}} g(X)$.

Proof For case (i), there is by hypothesis a set $D \in \mathcal{F}$, with $P(D) = 1$, such that $X_n(\omega) \to X(\omega)$, each $\omega \in D$. Continuity and 18.7 together imply that $g(X_n(\omega)) \to g(X(\omega))$ for each $\omega \in X^{-1}(C_g) \cap D$. This set has probability 1 by 3.6(iii).

To prove (ii), analogous reasoning shows that, for each $\varepsilon > 0$, $\exists\, \delta > 0$ such that

$$\{\omega: \|X_n(\omega) - X(\omega)\| < \delta\} \cap X^{-1}(C_g) \subseteq \{\omega: |g(X_n(\omega)) - g(X(\omega))| < \varepsilon\}. \tag{18.23}$$

Note that if $P(B) = 1$ then for any $A \in \mathcal{F}$,

$$P(A \cap B) = 1 - P(A^c \cup B^c) \ge P(A) - P(B^c) = P(A) \tag{18.24}$$

by de Morgan's law and subadditivity of $P$. In particular, when $P(X \in C_g) = 1$, (18.23) and monotonicity imply

$$P(\|X_n - X\| < \delta) \le P(|g(X_n) - g(X)| < \varepsilon). \tag{18.25}$$

Taking the limit of each side of the inequality, the minorant side tends to 1 by hypothesis. ∎

We may also have cases where only the difference of two sequences is convergent.

18.9 Theorem Let $\{X_n\}$ and $\{Z_n\}$ be sequences of random $k$-vectors (not necessarily converging) and $g$ the function defined in 18.8, and let $P(X_n \in C_g) = P(Z_n \in C_g) = 1$ for every $n$.

(i) If $\|X_n - Z_n\| \xrightarrow{\text{a.s.}} 0$ then $|g(X_n) - g(Z_n)| \xrightarrow{\text{a.s.}} 0$.

(ii) If $\|X_n - Z_n\| \xrightarrow{\text{pr}} 0$ then $|g(X_n) - g(Z_n)| \xrightarrow{\text{pr}} 0$.

Proof Put $E_n^X = X_n^{-1}(C_g)$, $E_n^Z = Z_n^{-1}(C_g)$, $E^X = \bigcap_{n=1}^{\infty}E_n^X$, and $E^Z = \bigcap_{n=1}^{\infty}E_n^Z$. $P(E^X) = P(E^Z) = 1$, by assumption and 3.6(iii). Also let $D$ be the set on which $\|X_n - Z_n\|$ converges. The proof is now a straightforward variant of the preceding one, with the set $E^X \cap E^Z$ playing the role of $X^{-1}(C_g)$. ∎

The third result specifies convergence to a constant limit, but relaxes the continuity requirements.

18.10 Theorem Let $g: \mathbb{R}^k \mapsto \mathbb{R}$ be a Borel function, continuous at $a$.

(i) If $X_n \xrightarrow{\text{a.s.}} a$ then $g(X_n) \xrightarrow{\text{a.s.}} g(a)$.

(ii) If $X_n \xrightarrow{\text{pr}} a$ then $g(X_n) \xrightarrow{\text{pr}} g(a)$.

Proof By hypothesis there is a set $D \in \mathcal{F}$, with $P(D) = 1$, such that $X_n(\omega) \to a$, each $\omega \in D$. Continuity implies $g(X_n(\omega)) \to g(a)$ for $\omega \in D$, proving (i). Likewise,

$$\{\omega: \|X_n(\omega) - a\| < \delta\} \subseteq \{\omega: |g(X_n(\omega)) - g(a)| < \varepsilon\}, \tag{18.26}$$

and (ii) follows much as in the preceding theorems. ∎

Theorem 18.10(ii) is commonly known as Slutsky's theorem (Slutsky 1925).

These results have a vast range of applications, and represent one of the chief reasons why limit theory is useful. Having established the convergence of one set of statistics, such as the first few empirical moments of a distribution, one can then deduce the convergence of any continuous function of these. Many commonly used estimators fall into this category.

18.11 Example Let $A_n$ be a random matrix whose elements converge a.s. (in pr.) to a limit $A$. Since the matrix inversion mapping is continuous everywhere, the results $\text{a.s.}\lim A_n^{-1} = A^{-1}$ ($\text{plim}\, A_n^{-1} = A^{-1}$) follow on applying the preceding results element by element. □
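The continuity of matrix inversion can be illustrated with a deterministic stand-in for the convergent sequence. In the sketch below the matrix $A$ and the perturbation $A_n = A + n^{-1}\mathbf{1}\mathbf{1}'$ are hypothetical choices made purely for illustration, and the $2 \times 2$ inverse is computed by the explicit formula, which assumes a nonsingular matrix.

```python
def inv2(a):
    """Inverse of a nonsingular 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

A = [[2.0, 1.0], [1.0, 3.0]]            # hypothetical limit matrix
A_inv = inv2(A)

errors = []
for n in (1, 10, 100, 1000):
    # Hypothetical sequence: every element of A_n differs from A by 1/n.
    A_n = [[A[i][j] + 1.0 / n for j in range(2)] for i in range(2)]
    errors.append(max_abs_diff(inv2(A_n), A_inv))

print(errors)
```

As the elements of $A_n$ approach those of $A$, every element of $A_n^{-1}$ approaches the corresponding element of $A^{-1}$, which is the element-by-element statement in the example.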

The following is a useful supplementary result, for a case not covered by the Slutsky theorem because $Y_n$ is not required to converge in any sense.

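18.12 Theorem Let a sequence $\{Y_n\}_1^{\infty}$ be bounded in probability (i.e., $O_p(1)$ as $n \to \infty$); if $X_n \xrightarrow{\text{pr}} 0$, then $X_nY_n \xrightarrow{\text{pr}} 0$.

Proof For a constant $B > 0$, define $Y_n^B = Y_n1_{\{|Y_n| \le B\}}$. The event $\{|X_nY_n| \ge \varepsilon\}$ for $\varepsilon > 0$ is expressible as a disjoint union:

$$\{|X_nY_n| \ge \varepsilon\} = \{|X_n||Y_n^B| \ge \varepsilon\} \cup \{|X_n||Y_n - Y_n^B| \ge \varepsilon\}. \tag{18.27}$$

For any $\varepsilon > 0$, $\{|X_n||Y_n^B| \ge \varepsilon\} \subseteq \{|X_n| \ge \varepsilon/B\}$, and

$$P(|X_n||Y_n^B| \ge \varepsilon) \le P(|X_n| \ge \varepsilon/B) \to 0. \tag{18.28}$$

By the $O_p(1)$ assumption there exists, for each $\delta > 0$, $B_{\delta} < \infty$ such that $P(|Y_n - Y_n^{B_{\delta}}| > 0) < \delta$ for $n \in \mathbb{N}$. Since $\{|X_n||Y_n - Y_n^B| \ge \varepsilon\} \subseteq \{|Y_n - Y_n^B| > 0\}$, (18.27) and additivity imply, putting $B = B_{\delta}$ in (18.28), that

$$\lim_{n\to\infty}P(|X_nY_n| \ge \varepsilon) \le \delta. \tag{18.29}$$

The theorem follows since both $\varepsilon$ and $\delta$ are arbitrary. ∎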

18.4 Convergence in $L_p$ Norm

Recall that when $E(|X_n|^p) < \infty$, we have said that $X_n$ is $L_p$-bounded. Consider, for $p > 0$, the sequence $\{\|X_n - X\|_p\}_1^{\infty}$. If $\|X_n\|_p < \infty$, all $n$, and $\lim_{n\to\infty}\|X_n - X\|_p = 0$, $X_n$ is said to converge in $L_p$ norm to $X$ (write $X_n \xrightarrow{L_p} X$). When $p = 2$ we speak of convergence in mean square (m.s.).

Convergence in probability is sometimes called $L_0$-convergence, terminology which can be explained by the fact that $L_p$-convergence implies $L_q$-convergence for $0 < q < p$ by Liapunov's inequality, together with the following relationship, which is immediate from the Markov inequality.

18.1 Theorem lf Xn-1f,+

X for any p > 0, then Xn .T.LA X. n

The converse does not follow in general, but see the following theorem.

18.14 Theorem If Xn.-#-(..>

X, and ( IL IPITis uniformly integrable, then Xn -f:E.-)X.


Proof For $\varepsilon > 0$,

$$E|X_n - X|^p = E\big(1_{\{|X_n - X|^p > \varepsilon\}}|X_n - X|^p\big) + E\big(1_{\{|X_n - X|^p \le \varepsilon\}}|X_n - X|^p\big) \le E\big(1_{\{|X_n - X|^p > \varepsilon\}}|X_n - X|^p\big) + \varepsilon. \tag{18.30}$$

Convergence in pr. means that $P(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$. Uniform integrability therefore implies, by 12.9, that the expectation on the majorant side of (18.30) converges to zero. The theorem follows since $\varepsilon$ is arbitrary. ∎

We proved the a.s. counterpart of this result, in effect, as 12.8, whose conclusion can be written as: $|X_n - X| \xrightarrow{as} 0$ implies $E|X_n - X| \to 0$. The extension from the $L_1$ case to the $L_p$ case is easily obtained by applying 18.8(i) to the case $g(\cdot) = |\cdot|^p$.

One of the useful features of $L_p$ convergence is that the $L_p$ norms of $X_n - X$ define a sequence of constants whose order of magnitude in $n$ may be determined, providing a measure of the rate of approach to the limit. We will say for example that $X_n$ converges to $X$ in mean square at the rate $n^k$ if $\|X_n - X\|_2 = O(n^{-k})$, but not $o(n^{-k})$. This is useful in that the scaled random variable $n^k(X_n - X)$ may be non-degenerate in the limit, in the sense of having positive but finite limiting variance. Determining this rate of convergence is often the first step in the analysis of limiting distributions, as discussed in Part V below.

18.5 Examples

Convergence in pr. is a weak mode of convergence in that without side conditions it does not imply, yet is implied by, a.s. convergence and $L_p$ convergence. However, there is no implication from a.s. convergence to $L_p$ convergence, or vice versa. A good way to appreciate the distinctions is to consider 'pathological' cases where one or other mode of convergence fails to hold.

18.15 Example Look again at 12.7, in which $X_n = 0$ with probability $1 - 1/n$, and $X_n = n$ with probability $1/n$, for $n = 1,2,3,\ldots$ A convenient model for this sequence is to let $\omega$ be a drawing from the space $([0,1], \mathcal{B}_{[0,1]}, m)$ where $m$ is Lebesgue measure, and define the random variable

$$X_n(\omega) = \begin{cases} n, & \omega \in [0, 1/n), \\ 0, & \text{otherwise.} \end{cases} \tag{18.31}$$

The set $\{\omega: \lim_n X_n(\omega) \ne 0\}$ consists of the point $\{0\}$, and has p.m. zero, so that $X_n \xrightarrow{as} 0$ according to (18.1). But $E|X_n|^p = 0\cdot(1 - 1/n) + n^p/n = n^{p-1}$. It will be recalled that this sequence is not uniformly integrable. It fails to converge in $L_p$ for any $p > 1$, but for the case $p = 1$ we obtain $E(X_n) = 1$ for every $n$. The limiting expectation of $X_n$ is therefore different from its almost sure limit. □

The same device can be used to define a.s. convergent sequences which do not converge in $L_p$ for any $p > 0$. It is left to the reader to construct examples.
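The moment calculation in Example 18.15 can be checked with exact rational arithmetic. The following is a minimal sketch; the function names `abs_moment` and `x_n` are illustrative and not part of the text, and only integer $p$ is handled.

```python
from fractions import Fraction


def abs_moment(n: int, p: int) -> Fraction:
    """E|X_n|^p when X_n = n with probability 1/n and 0 otherwise (integer p)."""
    return Fraction(n) ** p * Fraction(1, n)


def x_n(n: int, omega: float) -> int:
    """X_n(omega) as in (18.31): n on [0, 1/n), zero elsewhere on [0, 1]."""
    return n if 0 <= omega < 1 / n else 0


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # First moment stays at 1, second moment grows like n,
        # while the realized value at a fixed omega > 0 is eventually 0.
        print(n, abs_moment(n, 1), abs_moment(n, 2), x_n(n, 0.05))
```

For any fixed $\omega > 0$ the path is zero once $n > 1/\omega$, yet the first moment never moves from 1, which is the gap between the almost sure limit and the limiting expectation described above.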


18.16 Example Let a sequence be generated as follows: $X_1 = 1$ with probability 1; $(X_2, X_3)$ are either $(0,1)$ or $(1,0)$ with equal probability; $(X_4, X_5, X_6)$ are chosen from $(1,0,0)$, $(0,1,0)$, $(0,0,1)$ with equal probability; and so forth. For $k = 1,2,3,\ldots$ the next $k$ members of the sequence are randomly selected such that one of them is unity, the others zero. Hence, for $n$ in the range $[\tfrac12 k(k-1) + 1, \tfrac12 k(k+1)]$, $P(X_n = 1) = 1/k$, as well as $E|X_n|^p = 1/k$ for $p > 0$. Since $k \to \infty$ as $n \to \infty$, it is clear that $X_n$ converges to zero both in pr. and in $L_p$ norm. But since, for any $n$, $X_{n+j} = 1$ a.s. for infinitely many $j$,

$$P(|X_n| \ge \varepsilon, \text{ i.o.}) = 1 \tag{18.32}$$

for $0 < \varepsilon \le 1$. The sequence not only fails to converge a.s., but actually converges with probability 0.

Consider also the sequence $\{k^{1/r}X_n\}$, whose members are either 0 or $k^{1/r}$ in the range $[\tfrac12 k(k-1) + 1, \tfrac12 k(k+1)]$. Note that $E(|k^{1/r}X_n|^p) = k^{p/r-1}$, and by suitable choice of $r$ we can produce a sequence that does not converge in $L_p$ for $p > r$. With $r = 1$ we have $E(kX_n) = 1$ for all $n$, but as in 18.15, the sequence is not uniformly integrable. The limiting expectation of the sequence exists, but is different from the probability limit. □
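The block construction of Example 18.16 is easy to simulate. The sketch below is illustrative only: `block_index` and `sample_path` are hypothetical helper names, and the seeded generator is just a convenience for reproducibility.

```python
import random


def block_index(n: int) -> int:
    """Return k such that k(k-1)/2 + 1 <= n <= k(k+1)/2."""
    k = 1
    while k * (k + 1) // 2 < n:
        k += 1
    return k


def sample_path(num_blocks: int, rng: random.Random) -> list:
    """Draw one path: block k has length k and contains exactly one 1."""
    path = []
    for k in range(1, num_blocks + 1):
        block = [0] * k
        block[rng.randrange(k)] = 1  # position of the single 1 is uniform
        path.extend(block)
    return path


if __name__ == "__main__":
    path = sample_path(20, random.Random(0))
    # Every block contributes a 1, so the path never settles at zero,
    # although P(X_n = 1) = 1 / block_index(n) shrinks to zero.
    print(sum(path), len(path), 1 / block_index(len(path)))
```

Each simulated path contains a 1 in every block, so it returns to 1 infinitely often, while the marginal probability $1/k$ of a 1 at position $n$ goes to zero; this is the distinction between convergence in pr. and a.s. convergence that the example illustrates.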

In these non-uniformly integrable cases in which the sequence converges in $L_1$ but not in $L_{1+\theta}$ for any $\theta > 0$, one can see the expectation remaining formally well-defined in the limit, but breaking down in the sense of losing its intuitive interpretation as the limit of a sample average. Example 18.15 is a version of the well-known St Petersburg Paradox. Consider a game of chance in which the player announces a number $n \in \mathbb{N}$, and bets that a succession of coin tosses will produce $n$ heads before tails comes up, the pay-off for a correct prediction being £$2^{n+1}$. The probability of winning is $2^{-n-1}$, so the expected winnings are £1; that is to say, it is a 'fair game' if the stake is fixed at £1. The sequence of random winnings $X_n$ generated by choosing $n = 1,2,3,\ldots$ is exactly the process specified in 18.15. If $n$ is chosen to be a very large number, a moment's reflection shows that the probability limit is a much better guide to one's prospective winnings in a finite number of plays than the expectation. The paradox that with large $n$ no one would be willing to bet on this apparently fair game has been explained by appeal to psychological notions such as risk aversion, but it would appear to be an adequate explanation that, for large enough $n$, the expectation is simply not a practical predictor of the outcome.

18.6 Laws of Large Numbers

Let $\{X_t\}_1^\infty$ be a stochastic sequence and define $\bar X_n = n^{-1}\sum_{t=1}^n X_t$. Suppose that $E(X_t) = \mu_t$ and $n^{-1}\sum_{t=1}^n \mu_t \to \mu$, with $|\mu| < \infty$; this is trivial in the mean-stationary case in which $\mu_t = \mu$ for all $t$. In this simple setting, the sequence is said to obey the weak law of large numbers (WLLN) when $\bar X_n \xrightarrow{pr} \mu$, and the strong law of large numbers (SLLN) when $\bar X_n \xrightarrow{as} \mu$.

These statements of the LLNs are standard and familiar, but as characterizations


of a class of convergence results they are rather restrictive. We can set $\mu = 0$ with no loss of generality, by simply considering the centred sequence $\{X_t - \mu_t\}_1^\infty$; centring is generally a good idea, because then it is no longer necessary for the time average of the means to converge in the manner specified. We can quite easily have $n^{-1}\sum_{t=1}^n \mu_t \to \infty$ at the same time that $n^{-1}\sum_{t=1}^n (X_t - \mu_t) \to 0$. In such cases the law of large numbers requires a modified interpretation, since it ceases to make sense to speak of convergence of the sequence of sample means.

More general modes of convergence also exist. It is possible that $\bar X_n$ does not converge in the manner specified, even after centring, but that there exists a sequence of positive constants $\{a_n\}_1^\infty$ such that $a_n \uparrow \infty$ and $a_n^{-1}\sum_{t=1}^n X_t \to 0$. Results below will subsume these possibilities, and others too, in a fully general array formulation of the problem. If $\{\{X_{nt}\}_{t=1}^{k_n}\}_{n=1}^\infty$ is a triangular stochastic array with $\{k_n\}_1^\infty$ an increasing integer sequence, we will discuss conditions for

$$S_n = \sum_{t=1}^{k_n} X_{nt} \xrightarrow{pr} 0. \tag{18.33}$$

A result in this form can be specialized to the familiar case with $X_{nt} = a_n^{-1}(X_t - \mu_t)$ and $a_n = k_n = n$, but there are important applications where the greater generality is essential.

We have already encountered two cases where the strong law of large numbers applies. According to 13.12, $\bar X_n \xrightarrow{as} \mu = E(X_1)$ when $\{X_t\}$ is a stationary ergodic sequence and $E|X_1| < \infty$. We can illustrate the application of this type of result by an example in which the sequence is independent, which is sufficient for ergodicity.

18.17 Example Consider a sequence of independent Bernoulli variables $X_t$ with $P(X_t = 1) = P(X_t = 0) = \tfrac12$; that is, of coin tosses expressed in binary form (see 12.1). The conditions of the ergodic theorem are clearly satisfied, and we can conclude that $n^{-1}\sum_{t=1}^n X_t \xrightarrow{as} E(X_t) = \tfrac12$. This is called Borel's normal number theorem, a normal number being defined as one in which 0s and 1s occur in its binary expansion with equal frequency, in the limit. The normal number theorem therefore states that almost every point of the unit interval is a normal number; that is, the set of normal numbers has Lebesgue measure 1.

Any number with a terminating expansion is clearly non-normal and we know that all such numbers are rationals; however, rationals can be normal, as for example $\tfrac13$, which has the binary expansion 0.01010101010101... This is a different result from the well-known zero measure of the rationals, and is much stronger, because the non-normal numbers include irrationals, and form an uncountable set. For example, any number with a binary expansion of the form $0.11b_1 11b_2 11b_3 11\ldots$ where the $b_i$ are arbitrary digits is non-normal; yet this set can be put into 1-1 correspondence with the expansions $0.b_1b_2b_3\ldots$, in other words, with the points of the whole interval. The set of non-normal numbers is equipotent with the reals, but it none the less has Lebesgue measure 0. □
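The two rational cases mentioned in Example 18.17 can be verified exactly. This is a small sketch using Python's `fractions` module; `binary_digits` and `ones_frequency` are illustrative names, and the digit algorithm always returns the terminating expansion when one exists.

```python
from fractions import Fraction


def binary_digits(x: Fraction, n: int) -> list:
    """First n binary digits of x in [0, 1), computed without rounding."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)  # next digit is 1 exactly when doubling reaches 1
        digits.append(bit)
        x -= bit
    return digits


def ones_frequency(x: Fraction, n: int) -> Fraction:
    """Relative frequency of 1s among the first n binary digits of x."""
    return Fraction(sum(binary_digits(x, n)), n)


if __name__ == "__main__":
    print(binary_digits(Fraction(1, 3), 8))    # alternating 0, 1 pattern
    print(ones_frequency(Fraction(1, 3), 1000))  # frequency for 1/3
    print(ones_frequency(Fraction(1, 2), 1000))  # terminating expansion
```

The frequency for $\tfrac13$ is exactly one half at every even $n$, while for a terminating expansion such as $\tfrac12$ it decays like $1/n$, so the former is normal in the sense defined above and the latter is not.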

A useful fact to remember is that the stationary ergodic property is preserved under measurable transformations; that is, if $\{X_t\}$ is stationary and ergodic, so is the sequence $\{g(X_t)\}$ whenever $g: \mathbb{R} \mapsto \mathbb{R}$ is a measurable function. For example, we only need to know that $E(X_1^2) < \infty$ to be able to assert that $n^{-1}\sum_{t=1}^n X_t^2 \xrightarrow{as} E(X_1^2)$. The ergodic theorem serves to establish the strong law for most stationary sequences we are likely to encounter; recall from §13.5 that ergodicity is a weaker property than regularity or mixing. The interesting problems in stochastic convergence arise when the distributions of sequence coordinates are heterogeneous, so that it is not trivial to assume that averaging of coordinates is a stable procedure in the limit.

Another result we know of which yields a strong law is the martingale convergence theorem (15.7), which has the interpretation that $a_n^{-1}\sum_{t=1}^n X_t \xrightarrow{as} 0$ whenever $\{\sum_{t=1}^n X_t\}$ is a submartingale with $E|\sum_{t=1}^n X_t| < \infty$ uniformly in $n$, and $a_n \to \infty$. This particular strong law needs to be combined with additional results to give it a broad application, but this is readily done, as we shall show in §20.3.

But, lest the law of large numbers appear an altogether trivial problem, it might also be a good idea to exhibit some cases where convergence fails to occur.

18.18 Example Let $\{X_t\}$ denote a sequence of independent Cauchy random variables with characteristic function $\phi_{X_t}(\lambda) = e^{-|\lambda|}$ for each $t$ (11.9). It is easy to verify using formulae (11.30) and (11.33) that $\phi_{\bar X_n}(\lambda) = e^{-n|\lambda|/n} = e^{-|\lambda|}$. According to the inversion theorem, the average of $n$ independent Cauchy variables is also a Cauchy variable. This result holds for any $n$, contradicting the possibility that $\bar X_n$ could converge to a constant. □
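The characteristic function identity in Example 18.18 can be confirmed numerically. The sketch below assumes the standard rule that the mean of $n$ independent variables with common ch.f. $\phi$ has ch.f. $\phi(\lambda/n)^n$ (presumably what the cited formulae express); the function names are illustrative.

```python
import math


def cauchy_cf(lam: float) -> float:
    """Characteristic function of the standard Cauchy law: exp(-|lam|)."""
    return math.exp(-abs(lam))


def mean_cf(lam: float, n: int) -> float:
    """Ch.f. of the mean of n independent standard Cauchy variables.

    Uses phi(lam / n) ** n, the rule for scaled sums of independent terms.
    """
    return cauchy_cf(lam / n) ** n


if __name__ == "__main__":
    for n in (1, 5, 100):
        # The two columns agree up to floating-point rounding for every n.
        print(n, mean_cf(1.7, n), cauchy_cf(1.7))
```

Because the ch.f. of the mean does not depend on $n$, averaging does nothing to concentrate the distribution, which is the failure of the law of large numbers described in the example.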

18.19 Example Consider a process

$$X_t = \sum_{s=1}^t \psi_s Z_s = X_{t-1} + \psi_t Z_t, \quad t = 1,2,3,\ldots \tag{18.34}$$

with $X_0 = 0$, where $\{Z_t\}_1^\infty$ is an independent stationary sequence with mean 0 and variance $\sigma^2$, and $\{\psi_s\}_1^\infty$ is a sequence of constant coefficients. Notice, these are indexed with the absolute date rather than the lag relative to time $t$, as in the linear processes considered in §14.3. For $m > 0$,

$$\operatorname{Cov}(X_t, X_{t+m}) = \operatorname{Var}(X_t) = \sigma^2\sum_{s=1}^t \psi_s^2. \tag{18.35}$$

For $\{X_t\}_1^\infty$ to be uniformly $L_2$-bounded requires $\sum_{s=1}^\infty \psi_s^2 < \infty$; in this case the effect of the innovations declines to zero with $t$ and $X_t$ approaches a limiting random variable $X$, say. Without the square-summability assumption, $\operatorname{Var}(X_t) \to \infty$. An example of the latter case is the random walk process, in which $\psi_s = 1$, all $s$. Since $\operatorname{Cov}(X_1, X_t) = \psi_1^2\sigma^2$ for every $t$, these processes are not mixing. $\bar X_n$ has zero mean, but

$$\operatorname{Var}(\bar X_n) = \frac{1}{n^2}\left(\sum_{t=1}^n \operatorname{Var}(X_t) + 2\sum_{t=2}^n\sum_{s=1}^{t-1} \operatorname{Var}(X_s)\right). \tag{18.36}$$

If $\sum_{s=1}^\infty \psi_s^2 < \infty$, then $\lim_{n\to\infty}\operatorname{Var}(\bar X_n) = \sigma^2\sum_{s=1}^\infty \psi_s^2$; otherwise $\operatorname{Var}(\bar X_n) \to \infty$. In either case the sequence $\{\bar X_n\}$ fails to converge to a fixed limit, being either stochastic asymptotically, or divergent. □
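Formula (18.36) can be evaluated directly for the two cases distinguished in Example 18.19. The following is a sketch under the example's assumptions; `var_sample_mean` is an illustrative name, and the geometric coefficients $\psi_s = 2^{-s}$ are a hypothetical choice of a square-summable sequence.

```python
def var_sample_mean(psi, sigma2=1.0):
    """Var of the sample mean via (18.36), with Var(X_t) = sigma2 * sum_{s<=t} psi_s^2."""
    n = len(psi)
    var_x = []
    total = 0.0
    for c in psi:
        total += sigma2 * c * c
        var_x.append(total)  # var_x[t-1] holds Var(X_t)
    diag = sum(var_x)
    # Cov(X_s, X_t) = Var(X_s) for s < t, summed over all pairs s < t.
    cross = 2.0 * sum(sum(var_x[:t]) for t in range(1, n))
    return (diag + cross) / n**2


if __name__ == "__main__":
    for n in (10, 100, 400):
        random_walk = var_sample_mean([1.0] * n)
        summable = var_sample_mean([2.0 ** -s for s in range(1, n + 1)])
        # Random walk variance grows with n; the summable case settles
        # near sigma2 * sum psi_s^2 = 1/3 rather than shrinking to zero.
        print(n, random_walk, summable)
```

For the random walk the expression reduces to $(n+1)(2n+1)/(6n)$, which diverges, while in the square-summable case the variance of the mean approaches a positive constant; in neither case does it vanish.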

These counter-examples illustrate the fact that, to obey the law of large numbers, a sequence must satisfy regularity conditions relating to two distinct factors: the probability of outliers (limited by bounding absolute moments) and the degree of dependence between coordinates. In 18.18 we have a case where the mean fails to exist, and in 18.19 an example of long-range dependence. In neither case can $\bar X_n$ be thought of as a sample statistic which is estimating a parameter of the underlying distribution in any meaningful fashion. In Chapters 19 and 20 we devise sets of regularity conditions sufficient for weak and strong laws to operate, constraining both characteristics in different configurations. The necessity of a set of regularity conditions is usually hard to prove (the exception is when the sequences are independent), but various configurations of mixing and $L_p$-boundedness conditions can be shown to be sufficient. These results usually exhibit a trade-off between the two dimensions of regularity; the stronger the moment restrictions are, the weaker dependence restrictions can be, and vice versa.

One word of caution before we proceed to the theorems. In §9.1 we sought to motivate the idea of an expectation by viewing it as the limit of the empirical average. There is a temptation to attempt to define an expectation as such a limit; but to do so would inevitably involve us in circular reasoning, since the arguments establishing convergence are couched in the language of probability. The aim of the theory is to establish convergence in particular sampling schemes. It cannot, for example, be used to validate the frequentist interpretation of probability. However, it does show that axiomatic probability yields predictions that accord with the frequentist model, and in this sense the laws of large numbers are among the most fundamental results in probability theory.


19 Convergence in $L_p$ Norm

19.1 Weak Laws by Mean-square Convergence

This chapter surveys a range of techniques for proving (mainly) weak laws of large numbers, ranging from classical results to recent additions to the literature. The common theme in these results is that they depend on showing convergence in $L_p$-norm, where in general $p$ lies in the interval $[1,2]$. Initially we consider the case $p = 2$. The regularity conditions for these results relate directly to the variances and covariances of the process. While for subsequent results these moments will not need to exist, the $L_2$ case is of interest both because the conditions are familiar and intuitive, and because in certain respects the results available are more powerful.

Consider a stochastic sequence $\{X_t\}_1^\infty$, with sequence of means $\{\mu_t\}_1^\infty$, and variances $\{\sigma_t^2\}_1^\infty$. There is no loss of generality in setting $\mu_t = 0$ by simply considering the case of $\{X_t - \mu_t\}_1^\infty$, but to focus the discussion on a familiar case, let us initially assume $\bar\mu_n = n^{-1}\sum_{t=1}^n \mu_t \to \mu$ (finite), and so consider the question, what are sufficient conditions for $E(\bar X_n - \mu)^2 \to 0$? An elementary relation is

$$E(\bar X_n - \mu)^2 = \operatorname{Var}(\bar X_n) + (E(\bar X_n) - \mu)^2, \tag{19.1}$$

where the second term on the right-hand side converges to zero by definition of $\mu$. Thus the question becomes: when does $\operatorname{Var}(\bar X_n) \to 0$? We have

$$\operatorname{Var}(\bar X_n) = E\left(n^{-1}\sum_{t=1}^n (X_t - \mu_t)\right)^2 = n^{-2}\left(\sum_{t=1}^n \sigma_t^2 + 2\sum_{t=2}^n\sum_{s=1}^{t-1} \sigma_{ts}\right), \tag{19.2}$$

where $\sigma_t^2 = \operatorname{Var}(X_t)$ and $\sigma_{ts} = \operatorname{Cov}(X_t, X_s)$. Suppose, to make life simple, we assume that the sequence is uncorrelated, with $\sigma_{ts} = 0$ for $t \ne s$ in (19.2). Then we have the following well-known result.

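19.1 Theorem If $\{X_t\}_1^\infty$ is an uncorrelated sequence and

$$\sum_{t=1}^\infty t^{-2}\sigma_t^2 < \infty, \tag{19.3}$$

then $\bar X_n \xrightarrow{L_2} \mu$.

Proof This is an application of Kronecker's lemma (2.35), by which (19.3) implies $\operatorname{Var}(\bar X_n) = n^{-2}\sum_{t=1}^n \sigma_t^2 \to 0$. ∎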

This result yields a weak law of large numbers by application of 18.13, known as Chebyshev's theorem. An (amply) sufficient condition for (19.3) is that the variances are uniformly bounded with, say, $\sup_t \sigma_t^2 \le B < \infty$. Wide-sense stationary sequences fall into this class. In such cases we have $\operatorname{Var}(\bar X_n) = O(n^{-1})$. But since all we need is $\operatorname{Var}(\bar X_n) = o(1)$, $\sigma_t^2 \to \infty$ is evidently permissible. If $\sigma_t^2 \sim t^{1-\delta}$ for $\delta > 0$, $\sum_{t=1}^\infty t^{-2}\sigma_t^2$ has terms of $O(t^{-1-\delta})$ and therefore converges by 2.27.
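The case of growing variances can be illustrated numerically. The sketch below takes $\sigma_t^2 = t^{1-\delta}$ exactly (a hypothetical special case of the order-of-magnitude statement above) and evaluates $\operatorname{Var}(\bar X_n) = n^{-2}\sum_{t=1}^n \sigma_t^2$ for an uncorrelated sequence; the function name is illustrative.

```python
def var_mean_uncorrelated(n: int, delta: float) -> float:
    """n^{-2} * sum_{t<=n} t^(1-delta): variance of the mean when sigma_t^2 = t^(1-delta)."""
    return sum(t ** (1.0 - delta) for t in range(1, n + 1)) / n**2


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        # delta = 1 is the bounded-variance case (rate 1/n);
        # delta = 0.5 lets the variances grow but the mean still converges.
        print(n, var_mean_uncorrelated(n, 1.0), var_mean_uncorrelated(n, 0.5))
```

With $\delta = 0.5$ the variance of the mean behaves like $\tfrac23 n^{-1/2}$, so it still vanishes, only more slowly than the $n^{-1}$ rate of the bounded case.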

Looking at (19.2) again, it is also clear that uncorrelatedness is an unnecessarily tough condition. It will suffice if the magnitude of the covariances can be suitably controlled. Imposing uniform $L_2$-boundedness to allow the maximum relaxation of constraints on dependence, the Cauchy-Schwarz inequality tells us that $|\sigma_{ts}| \le B$ for all $t$ and $s$. Rearranging the formula in (19.2),

$$\begin{aligned}
\operatorname{Var}(\bar X_n) &= \frac{1}{n^2}\left(\sum_{t=1}^n \sigma_t^2 + 2\sum_{t=2}^n \sigma_{t,t-1} + 2\sum_{t=3}^n \sigma_{t,t-2} + \cdots + 2\sigma_{n1}\right) \\
&\le \frac{1}{n^2}\sum_{t=1}^n \sigma_t^2 + \frac{2}{n^2}\sum_{m=1}^{n-1}\sum_{t=m+1}^n |\sigma_{t,t-m}| \\
&\le \frac{B}{n} + \frac{2}{n^2}\sum_{m=1}^{n-1}(n - m)B_m,
\end{aligned} \tag{19.4}$$

where $B_m = \sup_t |\sigma_{t,t-m}|$, and $B_m \le B$, all $m \ge 1$. This suggests the following variant on 19.1.

19.2 Theorem If $\{X_t\}_1^\infty$ is a uniformly $L_2$-bounded sequence, and $\sum_{m=1}^\infty m^{-1}B_m < \infty$ where $B_m = \sup_t |\sigma_{t,t-m}|$, then $\bar X_n \xrightarrow{L_2} \mu$.

Proof Since $(n - m)/n < 1$, it is sufficient by (19.4) to show the convergence of $n^{-1}\sum_{m=1}^{n-1} B_m$ to zero. This follows immediately from the stated condition and Kronecker's lemma. ∎

A sufficient condition, in view of 2.30, is $B_m = O((\log m)^{-1-\delta})$, $\delta > 0$, a very mild restriction on the autocovariances.

There are two observations that we might make about these results. The first is to point to the trade-off between the dimensions of dependence and the growth of the variances. Theorems 19.1 and 19.2 are easily combined, and it is found that by tightening the rate at which the covariances diminish the variances can grow faster, and vice versa. The reader can explore these possibilities using the rather simple techniques of the above proofs, although remember that the $|\sigma_{t,t-m}|$ will need to be treated as growing with $t$ as well as diminishing with $m$. Analogous trade-offs are derived in a different context below.
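The bound in (19.4) can be compared with the exact variance in a simple stationary case. The sketch below assumes covariances $\sigma_{ts} = \rho^{|t-s|}$ with unit variances, a hypothetical structure chosen only for illustration, so that $B = 1$ and $B_m = |\rho|^m$; the function names are not from the text.

```python
def exact_var(n: int, rho: float) -> float:
    """Var of the sample mean when Cov(X_t, X_s) = rho^|t-s| (assumed structure)."""
    return sum(rho ** abs(t - s) for t in range(n) for s in range(n)) / n**2


def bound_19_4(n: int, rho: float) -> float:
    """Right-hand side of (19.4) with B = 1 and B_m = |rho|^m."""
    tail = sum((n - m) * abs(rho) ** m for m in range(1, n))
    return 1.0 / n + 2.0 * tail / n**2


if __name__ == "__main__":
    for rho in (0.5, -0.5):
        for n in (10, 50, 200):
            # Positive rho attains the bound; negative rho falls below it
            # because the covariances partly cancel.
            print(rho, n, exact_var(n, rho), bound_19_4(n, rho))
```

With positive $\rho$ the bound holds with equality, since taking absolute values changes nothing; with negative $\rho$ the exact variance is strictly smaller, which shows that (19.4) is a worst-case statement about dependence.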

The order of magnitude in $n$ of $\operatorname{Var}(\bar X_n)$, which depends on these factors, can be thought of as a measure of the rate of convergence. With no correlation and bounded variances, convergence is at the rate $n^{-1/2}$ in the sense that $\operatorname{Var}(\bar X_n) = O(n^{-1})$; but from (19.4), $B_m = O(m^{-\delta})$ implies that $\operatorname{Var}(\bar X_n) = O(n^{-\delta})$. If convergence rates are thought of as indicating the number of sample observations required to get $\bar X_n$ close to $\mu$ with high confidence, the weakest sufficient conditions evidently yield convergence only in a notional sense. It is less easy in some of the more general results below to link explicitly the rate of convergence with the degree of dependence and/or nonstationarity; this is always an issue to keep in mind.

Mixing sequences have the property that the covariances tend to zero, and the mixing inequalities of §14.2 give the following corollary to 19.2.

19.3 Corollary If $\{X_t\}_1^\infty$ is either (i) uniformly $L_2$-bounded and uniform mixing with

$$\sum_{m=1}^\infty m^{-1}\phi_m^{1/2} < \infty \tag{19.5}$$

or (ii) uniformly $L_{2+\delta}$-bounded for $\delta > 0$, and strong mixing with

$$\sum_{m=1}^\infty m^{-1}\alpha_m^{\delta/(2+\delta)} < \infty, \tag{19.6}$$

then $\bar X_n \xrightarrow{L_2} \mu$.

Proof For part (i), 14.5 for the case $r = 2$ yields the inequality $B_m \le 2B\phi_m^{1/2}$. For part (ii), 14.3 for the case $p = r = 2 + \delta$ yields $B_m \le 6\|X_t\|_{2+\delta}^2\alpha_m^{\delta/(2+\delta)}$. Noting that $B \le \|X_t\|_{2+\delta}^2$, the conditions of 19.2 are satisfied in either case. ∎

A sufficient condition for 19.3(i) is $\phi_m = O((\log m)^{-2-\varepsilon})$ for any $\varepsilon > 0$. For 19.3(ii), $\alpha_m = O((\log m)^{-(1+2/\delta)(1+\varepsilon)})$ for $\varepsilon > 0$ is sufficient. In the size terminology of §14.1, mixing of any size will ensure these conditions. The most significant cost of using the strong mixing condition is that simple existence of the variances is not sufficient. This is not of course to say that no weak law exists for $L_2$-bounded strong mixing processes, but more subtle arguments, such as those of §19.4, are needed for the proof.

19.2 Almost Sure Convergence by the Method of Subsequences

Almost sure convergence does not follow from convergence in mean square (a counter-example is 18.16), but a clever adaptation of the above techniques yields a result. The proof of the following theorems makes use of the method of subsequences, exploiting the relation between convergence in pr. and convergence a.s. demonstrated in 18.6.

Mainly for the sake of clarity, we first prove the result for the uncorrelated case. Notice how the conditions have to be strengthened, relative to 19.1.

19.4 Theorem If $\{X_t\}_1^\infty$ is uniformly $L_2$-bounded and uncorrelated, $\bar X_n \xrightarrow{as} \mu$. □

A natural place to start in a sufficiency proof of the strong law is with the convergence part of the Borel-Cantelli lemma. The Chebyshev inequality yields, under the stated conditions,

$$P(|\bar X_n - \bar\mu_n| > \varepsilon) \le \frac{\operatorname{Var}(\bar X_n)}{\varepsilon^2} \le \frac{B}{n\varepsilon^2} \tag{19.7}$$

for $B < \infty$, with the probability on the left-hand side going to zero with the right-hand side as $n \to \infty$. One approach to the problem of bounding the quantity $P(|\bar X_n - \bar\mu_n| > \varepsilon, \text{ i.o.})$ would be to add up the inequalities in (19.7) over $n$. Since the partial sums of $1/n$ form a divergent sequence, a direct attack on these lines does not succeed. However, $\sum_{n=1}^\infty n^{-2} \approx 1.64$, and we can add up the subsequence of the probabilities in (19.7), for $n = 1,4,9,16,\ldots$, as follows.

Proof of 19.4 By (19.7),

$$\sum_{n^2} P(|\bar X_{n^2} - \bar\mu_{n^2}| > \varepsilon) \le \frac{1.64B}{\varepsilon^2} < \infty. \tag{19.8}$$

Now 18.2(i) yields the result that the subsequence $\{\bar X_{n^2}, n \in \mathbb{N}\}$ converges a.s. The proof is completed by showing that the maximum deviation of the omitted terms from the nearest member of $\{\bar X_{n^2}\}$ also converges in mean square. For each $n$ define

$$D_{n^2} = \max_{n^2 \le k < (n+1)^2} |\bar X_k - \bar X_{n^2}| \tag{19.9}$$

and consider the variance of $D_{n^2}$. Given the assumptions, the sequence of the $\operatorname{Var}(\bar X_n) = (1/n^2)\sum_{t=1}^n \sigma_t^2$ tends monotonically to zero. For $n^2 < k < (n+1)^2$, rearrangement of the terms produces

$$\bar X_k - \bar X_{n^2} = \left(\frac{n^2}{k} - 1\right)\bar X_{n^2} + \frac{1}{k}\sum_{t=n^2+1}^k X_t, \tag{19.10}$$

and when the sequence is uncorrelated the two terms on the right are also uncorrelated. Hence

$$\begin{aligned}
\operatorname{Var}(\bar X_k - \bar X_{n^2}) &= \left(1 - \frac{n^2}{k}\right)^2 \operatorname{Var}(\bar X_{n^2}) + \frac{1}{k^2}\sum_{t=n^2+1}^k \sigma_t^2 \\
&\le \left(1 - \frac{n^2}{k}\right)^2 \frac{B}{n^2} + (k - n^2)\frac{B}{k^2} \\
&= B\left(\frac{1}{n^2} - \frac{1}{k}\right) < B\left(\frac{1}{n^2} - \frac{1}{(n+1)^2}\right).
\end{aligned} \tag{19.11}$$

$\operatorname{Var}(D_{n^2})$ cannot exceed the last term in (19.11), and

$$\sum_{n^2}\big(n^{-2} - (n+1)^{-2}\big) < \sum_n \big(n^{-2} - (n+1)^{-2}\big) = 1, \tag{19.12}$$

so the Chebyshev inequality gives

$$\sum_{n^2} P(D_{n^2} > \varepsilon) < \frac{B}{\varepsilon^2} < \infty, \tag{19.13}$$

and the subsequence $\{D_{n^2}, n \in \mathbb{N}\}$ also converges a.s. $D_{n^2} \ge |\bar X_k - \bar X_{n^2}|$ for any $k$ between $n^2$ and $(n+1)^2$, and hence, by the triangle inequality,

$$\begin{aligned}
|\bar X_k - \bar\mu_k| &\le |\bar X_{n^2} - \bar\mu_{n^2}| + |\bar X_k - \bar X_{n^2}| + |\bar\mu_k - \bar\mu_{n^2}| \\
&\le |\bar X_{n^2} - \bar\mu_{n^2}| + D_{n^2} + |\bar\mu_k - \bar\mu_{n^2}|.
\end{aligned} \tag{19.14}$$

The sequences on the majorant side are positive and converge a.s. to zero, hence so does their sum. But (19.14) holds for $n^2 \le k < (n+1)^2$ for $\{n^2, n \in \mathbb{N}\}$, so that $k$ ranges over every integer value. We must conclude that $\bar X_n \xrightarrow{as} \mu$. ∎
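The algebra behind (19.11) and the two constants used in the proof (the telescoping sum in (19.12) and the value 1.64 for $\sum n^{-2}$) can be checked mechanically. This is a verification sketch only; the function names are illustrative.

```python
from fractions import Fraction


def bound_terms(n: int, k: int, B: Fraction = Fraction(1)) -> Fraction:
    """Second line of (19.11): (1 - n^2/k)^2 * B/n^2 + (k - n^2) * B/k^2."""
    n2 = n * n
    return (1 - Fraction(n2, k)) ** 2 * B / n2 + (k - n2) * B / Fraction(k) ** 2


def closed_form(n: int, k: int, B: Fraction = Fraction(1)) -> Fraction:
    """Third line of (19.11): B * (1/n^2 - 1/k)."""
    return B * (Fraction(1, n * n) - Fraction(1, k))


def telescoped(N: int) -> Fraction:
    """Partial sum of n^{-2} - (n+1)^{-2} for n = 1..N; tends to 1 as in (19.12)."""
    return sum(Fraction(1, n * n) - Fraction(1, (n + 1) ** 2) for n in range(1, N + 1))


if __name__ == "__main__":
    print(bound_terms(3, 12) == closed_form(3, 12))   # identity in (19.11)
    print(float(telescoped(1000)))                     # close to 1
    print(sum(1.0 / (n * n) for n in range(1, 100001)))  # close to pi^2/6
```

The identity holds exactly for every $k$ between $n^2$ and $(n+1)^2$, so the only inequality in the last line of (19.11) comes from replacing $k$ by $(n+1)^2$.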

We can generalize the same technique to allow autocorrelation.

19.5 Corollary If $\{X_t\}_1^\infty$ is uniformly $L_2$-bounded, and

$$B^* = \sum_{m=1}^\infty B_m < \infty, \tag{19.15}$$

where $B_m = \sup_t |\sigma_{t,t-m}|$, then $\bar X_n \xrightarrow{as} \mu$. □

Note how much tougher these conditions are than those of 19.2. It will suffice here for $B_m = O(m^{-1}(\log m)^{-1-\delta})$ for $\delta > 0$. Instead of, in effect, having the autocovariances merely decline to zero, we now require their summability.

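Proof of 19.5 By (19.4), $\operatorname{Var}(\bar X_n) \le (B + 2B^*)/n$ and hence equation (19.7) holds in the modified form,

$$\sum_{n^2} P(|\bar X_{n^2} - \bar\mu_{n^2}| > \varepsilon) \le \frac{1.64(B + 2B^*)}{\varepsilon^2} < \infty. \tag{19.16}$$

Instead of (19.11) we have, on multiplying out and taking expectations,

$$\begin{aligned}
\operatorname{Var}(\bar X_k - \bar X_{n^2}) &= \operatorname{Var}\left(\frac{1}{k}\sum_{t=n^2+1}^k X_t - \left(1 - \frac{n^2}{k}\right)\bar X_{n^2}\right) \\
&= \left(1 - \frac{n^2}{k}\right)^2 \operatorname{Var}(\bar X_{n^2}) + \frac{1}{k^2}\left(\sum_{t=n^2+1}^k \sigma_t^2 + 2\sum_{t=n^2+2}^k\sum_{m=1}^{t-n^2-1} \sigma_{t,t-m}\right) \\
&\quad - \frac{2}{k}\left(1 - \frac{n^2}{k}\right)\frac{1}{n^2}\sum_{t=n^2+1}^k\sum_{m=t-n^2}^{t-1} \sigma_{t,t-m}.
\end{aligned} \tag{19.17}$$

The first term on the right-hand side is bounded by $(1 - n^2/k)^2(B + 2B^*)/n^2$, the second by $(k - n^2)(B + 2B^*)/k^2$, and the third (absolutely) by $2(1 - n^2/k)^2B^*$. Adding together these latter terms and simplifying yields

$$\begin{aligned}
\operatorname{Var}(\bar X_k - \bar X_{n^2}) &< \left(\frac{1}{n^2} - \frac{1}{k}\right)B + 2\left[\left(\frac{1}{n^2} - \frac{1}{k}\right) + \left(1 - \frac{n^2}{k}\right)^2\right]B^* \\
&< \left(\frac{1}{n^2} - \frac{1}{(n+1)^2}\right)B + 2\left[\left(\frac{1}{n^2} - \frac{1}{(n+1)^2}\right) + \left(1 - \frac{n^2}{(n+1)^2}\right)^2\right]B^*.
\end{aligned} \tag{19.18}$$

Note, $(1 - n^2/(n+1)^2)^2 = O(n^{-2})$ so the term in $B^*$ is summable. In place of (19.13) we can write

$$\sum_{n^2} P(D_{n^2} > \varepsilon) < \frac{B + K_1B^*}{\varepsilon^2} < \infty, \tag{19.19}$$

where $K_1$ is a finite constant. From here on the proof follows that of 19.4. ∎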

Again there is a straightforward extension to mixing sequences by direct analogy with 19.3.

19.6 Corollary If $\{X_t\}_1^\infty$ is either (i) uniformly $L_2$-bounded and uniform mixing with

$$\sum_{m=1}^\infty \phi_m^{1/2} < \infty \tag{19.20}$$

or (ii) uniformly $L_{2+\delta}$-bounded for $\delta > 0$, and strong mixing with

$$\sum_{m=1}^\infty \alpha_m^{\delta/(2+\delta)} < \infty, \tag{19.21}$$

then $\bar X_n \xrightarrow{as} \mu$. □

Let it be emphasized that these results have no pretensions to being sharp! They are given here as an illustration of technique, and also to define the limits of this approach to strong convergence. In Chapter 20 we will see how they can be improved upon.

19.3 A Martingale Weak Law

We now want to relax the requirement of finite variances, and prove $L_p$-convergence for $p < 2$. The basic idea underlying these results is a truncation argument. Given a sequence $\{X_t\}_1^\infty$ which we assume to have mean 0, define $Y_t = 1_{\{|X_t| \le B\}}X_t$, which equals $X_t$ when $|X_t| \le B < \infty$, and 0 otherwise. Letting $Z_t = X_t - Y_t$, the 'tail component' of $X_t$, notice that $E(Z_t) = -E(Y_t)$ by construction, and $\bar X_n = \bar Y_n + \bar Z_n$. Since $Y_t$ is a.s. bounded and possesses all its moments, arguments of the type used in §19.1 might be brought to bear to show that $\bar Y_n \xrightarrow{L_2} \mu_Y$ (say). Some other approach must then be used to show that $\bar Z_n \xrightarrow{L_p} \mu_Z = -\mu_Y$. An obvious technique is to assume uniform integrability of $\{|X_t|^p\}$. In this case, $\sup_t E|Z_t|^p$ can be made as small as desired by choosing $B$ large enough, leading (via the Minkowski inequality, for example) to an $L_p$-convergence result for $\bar Z_n$.
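The truncation decomposition described above is simple to express in code. The sketch below works on a fixed list of numbers standing in for realized values of $X_t$; the function name `truncate` and the sample values are illustrative assumptions.

```python
def truncate(xs, B):
    """Split each x into y = x * 1{|x| <= B} and the tail component z = x - y."""
    ys = [x if abs(x) <= B else 0.0 for x in xs]
    zs = [x - y for x, y in zip(xs, ys)]
    return ys, zs


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


if __name__ == "__main__":
    xs = [0.5, -3.0, 1.5, 10.0, -0.2]
    ys, zs = truncate(xs, B=2.0)
    # The bounded part never exceeds B in absolute value, and the sample
    # means add up: mean(xs) equals mean(ys) + mean(zs).
    print(ys, zs, mean(xs), mean(ys) + mean(zs))
```

The bounded part can be handled by second-moment arguments, while the tail part is nonzero only on the rare large observations, which is why a uniform integrability condition is the natural way to control it.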

A different approach to limiting dependence is called for here. We cannot assume that $Y_t$ is serially uncorrelated just because $X_t$ is. The serial independence assumption would serve, but is rather strong. However, if we let $X_t$ be a martingale difference, a mild strengthening of uncorrelatedness, this property can also be passed on to $Y_t$, after a centring adjustment. This is the clever idea behind the next result, based on a theorem of Y. S. Chow (1971). Subsequently (see §19.4) the m.d. assumption can be relaxed to a mixingale assumption.

We will take this opportunity to switch to an array formulation. The theorems are easily specialized to the case of ordinary sample averages (see §18.6), but in subsequent chapters, array results will be indispensable.


19.7 Theorem Let $\{X_{nt}, \mathcal{F}_{nt}\}$ be a m.d. array, $\{c_{nt}\}$ a positive constant array, and $\{k_n\}$ an increasing integer sequence with $k_n \uparrow \infty$. If, for $1 \le p \le 2$,

(a) $\{|X_{nt}/c_{nt}|^p\}$ is uniformly integrable,

(b) $\limsup_{n\to\infty} \sum_{t=1}^{k_n} c_{nt} < \infty$, and

(c) $\lim_{n\to\infty} \sum_{t=1}^{k_n} c_{nt}^2 = 0$,

then $\sum_{t=1}^{k_n} X_{nt} \xrightarrow{L_p} 0$. □

The leading specialization of this result is where $X_{nt} = X_t/a_n$, where $\{X_t, \mathcal{F}_t\}$ is a m.d. sequence with $\mathcal{F}_{nt} = \mathcal{F}_t$ and $\{a_n\}$ is a positive constant sequence. This deserves stating as a corollary, since the formulation can be made slightly more transparent.

19.8 Corollary Suppose $\{X_t, \mathcal{F}_t\}_1^\infty$ is a m.d. sequence, and $\{b_t\}$, $\{a_n\}$, and $\{k_n\}$ are constant positive sequences with $a_n \uparrow \infty$ and $k_n \uparrow \infty$, and satisfying

(a) $\{|X_t/b_t|^p\}$ is uniformly integrable, $1 \le p \le 2$,

(b) $\sum_{t=1}^{k_n} b_t = O(a_n)$, and

(c) $\sum_{t=1}^{k_n} b_t^2 = o(a_n^2)$;

then $a_n^{-1}\sum_{t=1}^{k_n} X_t \xrightarrow{L_p} 0$. □

Proof Immediate from 19.7, defining $X_{nt} = X_t/a_n$ and $c_{nt} = b_t/a_n$. ∎

Be careful to distinguish the constants $a_n$ and $k_n$. Although both are equal to $n$ in the sample-average case, more generally their roles are quite different. The case with $k_n$ different from $n$ typically arises in 'blocking' arguments, where the array coordinates are generated from successive blocks of underlying sequence coordinates. We might have $k_n = [n^\alpha]$ for $\alpha \in (0,1)$ ($[x]$ denoting the largest integer below $x$) where the length of a block does not exceed $[n^{1-\alpha}]$. For an application of this sort see §24.4.

Conditions 19.8(b) and (c) together imply $a_n \uparrow \infty$, so this does not need to be separately asserted. To form a clear idea of the role of the assumptions, it is helpful to suppose that $b_t$ and $a_n$ are regularly varying functions of their arguments. It is easily verified by 2.27 that the conditions are observed if $b_t \sim t^\beta$ for any $\beta \ge -1$, by choosing $a_n \sim n^{1+\beta}$ for $\beta > -1$, and $a_n \sim \log n$ for $\beta = -1$. In particular, setting $b_t = 1$ for all $t$, $a_n = k_n = n$ yields

$$\|\bar X_n\|_p \to 0. \tag{19.22}$$

Choosing $a_n = \sum_{t=1}^{k_n} b_t$ will automatically satisfy condition (b), and condition (c) will also hold when $b_t = O(t^\beta)$. On the other hand, a case where the conditions fail is where $b_1 = 1$ and, for $t > 1$, $b_t = \sum_{s=1}^{t-1} b_s = 2^{t-2}$. In this case condition (b) imposes the requirement $b_n = O(a_n)$, so that $b_n^2 = O(a_n^2)$, contradicting condition (c). The growth rate of $b_t$ exceeds that of $t^\beta$ for every $\beta > 0$.
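Conditions (b) and (c) of Corollary 19.8 can be explored numerically for particular weight sequences. The sketch below fixes $k_n = n$ and $a_n = \sum_{t=1}^n b_t$, so that (b) holds trivially, and reports the ratio $\sum_{t=1}^n b_t^2 / a_n^2$ that (c) requires to vanish; the function names are illustrative.

```python
def c_ratio(b, n: int) -> float:
    """sum_{t<=n} b_t^2 / a_n^2 with a_n = sum_{t<=n} b_t (assumes k_n = n)."""
    bs = [b(t) for t in range(1, n + 1)]
    a_n = sum(bs)
    return sum(x * x for x in bs) / a_n**2


def doubling(t: int) -> float:
    """b_1 = 1 and b_t = b_1 + ... + b_{t-1} = 2^(t-2) for t > 1."""
    return 1.0 if t == 1 else 2.0 ** (t - 2)


if __name__ == "__main__":
    for n in (10, 20, 40):
        # Polynomial weights drive the ratio to zero; the doubling
        # sequence leaves it near 1/3, so condition (c) fails.
        print(n, c_ratio(lambda t: 1.0, n), c_ratio(lambda t: float(t) ** 2, n),
              c_ratio(doubling, n))
```

For $b_t = 1$ the ratio is exactly $1/n$, and for $b_t = t^2$ it behaves like $9/(5n)$, whereas for the doubling sequence the last term alone contributes a fixed share of $a_n^2$, which is the failure described in the text.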

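Proof of 19.7 Uniform integrability implies that

$$\sup_{n,t} E\big(|X_{nt}/c_{nt}|^p 1_{\{|X_{nt}/c_{nt}| > M\}}\big) \to 0 \text{ as } M \to \infty.$$

One may therefore find, for $\varepsilon > 0$, a constant $B_\varepsilon < \infty$ such that

$$\sup_{n,t}\big\{\|X_{nt}1_{\{|X_{nt}| > B_\varepsilon c_{nt}\}}\|_p / c_{nt}\big\} \le \varepsilon. \tag{19.23}$$

Define $Y_{nt} = X_{nt}1_{\{|X_{nt}| \le B_\varepsilon c_{nt}\}}$, and $Z_{nt} = X_{nt} - Y_{nt}$. Then since $E(X_{nt}|\mathcal{F}_{n,t-1}) = 0$,

$$X_{nt} = Y_{nt} - E(Y_{nt}|\mathcal{F}_{n,t-1}) + Z_{nt} - E(Z_{nt}|\mathcal{F}_{n,t-1}).$$

By the Minkowski inequality,

$$\left\|\sum_{t=1}^{k_n} X_{nt}\right\|_p \le \left\|\sum_{t=1}^{k_n}\big(Y_{nt} - E(Y_{nt}|\mathcal{F}_{n,t-1})\big)\right\|_p + \left\|\sum_{t=1}^{k_n}\big(Z_{nt} - E(Z_{nt}|\mathcal{F}_{n,t-1})\big)\right\|_p. \tag{19.24}$$

Consider each of these right-hand-side terms. First,

$$\begin{aligned}
\left\|\sum_{t=1}^{k_n}\big(Y_{nt} - E(Y_{nt}|\mathcal{F}_{n,t-1})\big)\right\|_p &\le \left\|\sum_{t=1}^{k_n}\big(Y_{nt} - E(Y_{nt}|\mathcal{F}_{n,t-1})\big)\right\|_2 \\
&= \left(\sum_{t=1}^{k_n} E\big(Y_{nt} - E(Y_{nt}|\mathcal{F}_{n,t-1})\big)^2\right)^{1/2} \\
&\le \left(\sum_{t=1}^{k_n} EY_{nt}^2\right)^{1/2} \le B_\varepsilon\left(\sum_{t=1}^{k_n} c_{nt}^2\right)^{1/2}.
\end{aligned} \tag{19.25}$$

The first inequality in (19.25) is Liapunov's inequality, and the equality follows because $\{Y_{nt} - E(Y_{nt}|\mathcal{F}_{n,t-1})\}$ is a m.d., and hence orthogonal. Second,

$$\begin{aligned}
\left\|\sum_{t=1}^{k_n}\big(Z_{nt} - E(Z_{nt}|\mathcal{F}_{n,t-1})\big)\right\|_p &\le \sum_{t=1}^{k_n}\|Z_{nt}\|_p + \sum_{t=1}^{k_n}\|E(Z_{nt}|\mathcal{F}_{n,t-1})\|_p \\
&\le 2\sum_{t=1}^{k_n}\|Z_{nt}\|_p \le 2\varepsilon\sum_{t=1}^{k_n} c_{nt}.
\end{aligned} \tag{19.26}$$

The second inequality here follows because

$$E|E(Z_{nt}|\mathcal{F}_{n,t-1})|^p \le E\big(E(|Z_{nt}|^p \,|\, \mathcal{F}_{n,t-1})\big) = E|Z_{nt}|^p,$$

from, respectively, the conditional Jensen inequality and the law of iterated expectations. The last is by (19.23).

It follows by (c) that for $\varepsilon > 0$ there exists $N_\varepsilon \ge 1$ such that, for $n \ge N_\varepsilon$,

$$\sum_{t=1}^{k_n} c_{nt}^2 \le \varepsilon^2B_\varepsilon^{-2}. \tag{19.27}$$

Putting together (19.24) with (19.25), (19.26), and (19.27) shows that

$$\left\|\sum_{t=1}^{k_n} X_{nt}\right\|_p \le B\varepsilon \tag{19.28}$$

for $n \ge N_\varepsilon$, where $B = 1 + 2\sup_n\sum_{t=1}^{k_n} c_{nt} < \infty$, by condition (b). Since $\varepsilon$ is arbitrary, this completes the proof. ∎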

The weak law for martingale differences follows directly, on applying 18.13.

19.9 Corollary Under the conditions of 19.7 or 19.8, $\sum_{t=1}^{k_n} X_{nt} \xrightarrow{pr} 0$. □

If we take the case $p = 1$ and set $c_{nt} = 1/n$ and $k_n = n$ as above, we get the result that uniform integrability of $\{X_t\}$ is sufficient for convergence in probability of the sample mean $\bar X_n$. This cannot be significantly weakened even if the martingale difference assumption is replaced by independence. If we assume identically distributed coordinates, the explicit requirement of uniform integrability can be dropped and $L_1$-boundedness is enough; but of course, this is only because the uniform property is subsumed under the stationarity.

You may have observed that (b) in 19.7 can be replaced by

(b') $\limsup_{n\to\infty} k_n^{p-1}\sum_{t=1}^{k_n} c_{nt}^p < \infty$.

It suffices for the two terms on the majorant side of (19.24) to converge in $L_p$, and the $c_r$ inequality can be used instead of the Minkowski inequality in (19.26) to obtain

$$E\Big|\sum_{t=1}^{k_n}\big(Z_{nt}-E(Z_{nt}\mid\mathcal{F}_{n,t-1})\big)\Big|^p \le 2\varepsilon^p(2k_n)^{p-1}\sum_{t=1}^{k_n} c_{nt}^p. \tag{19.29}$$

However, the gain in generality here is notional. Condition (b') requires that $\limsup_{n\to\infty} k_n c_{nt} < \infty$, and if this is true the same property obviously extends to $k_n^{-1}\sum_{t=1}^{k_n}(k_n c_{nt})$. For concreteness, put $c_{nt} = b_t/a_n$ as in 19.8 with $b_t \sim t^{\beta}$ and $a_n \sim n^{\gamma}$, where $\beta$ and $\gamma$ can be any real constants. With $k_n \sim n^{\alpha}$ for $\alpha > 0$, note that the majorant side of (19.29) is bounded if $\alpha(1+\beta) - \gamma \le 0$, independent of the value of $p$. This condition is automatically satisfied as an equality by setting $a_n = \sum_{t=1}^{k_n} b_t$, but note how the choice of $a_n$ can accommodate different choices of $k_n$.

None the less, in some situations condition (b) is stronger than what we know to be sufficient. For the case $p = 2$ it can be omitted, in addition to weakening the martingale difference assumption to uncorrelatedness, and uniform integrability to simple $L_2$-boundedness. Here is the array version of 19.1, with the conditions cast in the framework of 19.7 for comparability, although all they do is to ensure that the variance of the partial sums goes to zero.


19.10 Corollary If $\{X_{nt}\}$ is a zero-mean stochastic array with $E(X_{nt}X_{ns}) = 0$ for $t \ne s$, and

(a) $\{X_{nt}/c_{nt}\}$ is uniformly $L_2$-bounded, and

(b) $\lim_{n\to\infty}\sum_{t=1}^{k_n} c_{nt}^2 = 0$,

then $\sum_{t=1}^{k_n} X_{nt} \xrightarrow{L_2} 0$. □

19.4 A Mixingale Weak Law

To generalize the last results from martingale differences to mixingales is not too difficult. The basic tool is the 'telescoping series' argument developed in §16.2. The array element $X_{nt}$ can be decomposed into a finite sum of martingale differences, to which 19.7 can be applied, and two residual components which can be treated as negligible. The following result, from Davidson (1993a), is an extension to the heterogeneous case of a theorem due to Andrews (1988).

19.11 Theorem Let the array $\{X_{nt}, \mathcal{F}_{nt}\}_{-\infty}^{\infty}$ be a $L_1$-mixingale with respect to a constant array $\{c_{nt}\}$. If

(a) $\{X_{nt}/c_{nt}\}$ is uniformly integrable,

(b) $\limsup_{n\to\infty}\sum_{t=1}^{k_n} c_{nt} < \infty$, and

(c) $\lim_{n\to\infty}\sum_{t=1}^{k_n} c_{nt}^2 = 0$,

where $k_n$ is an increasing integer-valued function of $n$ and $k_n \uparrow \infty$, then $\sum_{t=1}^{k_n} X_{nt} \xrightarrow{L_1} 0$. □

There is no restriction on the size here. It suffices simply for the mixingale coefficients to tend to zero. The remarks following 19.7 apply here in just the same way. In particular, if $X_t$ is a $L_1$-mixingale sequence and $\{X_t/b_t\}$ is uniformly integrable for positive constants $\{b_t\}$, the theorem holds for $X_{nt} = X_t/a_n$ and $c_{nt} = b_t/a_n$ where $a_n = \sum_{t=1}^{n} b_t$. Theorems 14.2 and 14.4 give us the corresponding results for mixing sequences, and 17.5 and 17.6 for NED processes. It is sufficient for, say, $X_{nt}$ to be $L_r$-bounded for $r > 1$, and $L_p$-NED, for $p \ge 1$, on a $\alpha$-mixing process. Again, no size restrictions need to be specified. Uniform integrability of $\{X_{nt}/c_{nt}\}$ will obtain in those cases where $\|X_{nt}\|_r$ is finite for $r > 1$ and each $t$, and the NED constants likewise satisfy $d_{nt} \ll \|X_{nt}\|_r$.

A simple lemma is required for the proof:

19.12 Lemma If the array $\{X_{nt}/c_{nt}\}$ is uniformly integrable for $p \ge 1$, so is the array $\{E_{t-j}X_{nt}/c_{nt}\}$ for $j > 0$.

Proof By the necessity part of 12.9, for any $\varepsilon > 0$ $\exists\,\delta > 0$ such that

$$\sup_{n,t}\Big\{\sup_{E}\int_E |X_{nt}/c_{nt}|\,dP\Big\} < \varepsilon, \tag{19.30}$$

where the inner supremum is taken over all $E \in \mathcal{F}$ satisfying $P(E) < \delta$. Since $\mathcal{F}_{n,t-j} \subseteq \mathcal{F}$, (19.30) also holds when the supremum is taken over $E \in \mathcal{F}_{n,t-j}$ satisfying $P(E) < \delta$. For any such $E$,

$$\int_E |X_{nt}/c_{nt}|\,dP = \int_E E_{t-j}|X_{nt}/c_{nt}|\,dP \ge \int_E |E_{t-j}X_{nt}/c_{nt}|\,dP, \tag{19.31}$$

by definition of $E_{t-j}(\cdot)$, and the conditional Jensen inequality (10.18). We may accordingly say that, for $\varepsilon > 0$ $\exists\,\delta > 0$ such that

$$\sup_{n,t}\Big\{\sup_{E}\int_E |E_{t-j}X_{nt}/c_{nt}|\,dP\Big\} < \varepsilon, \tag{19.32}$$

taking the inner supremum over $E \in \mathcal{F}_{n,t-j}$ satisfying $P(E) < \delta$. Since $E_{t-j}X_{nt}$ is $\mathcal{F}_{n,t-j}$-measurable, uniform integrability holds by the sufficiency part of 12.9. ∎

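Proof of 19.11 Fix an integer $j$ and let

$$Y_{nj} = \sum_{t=1}^{k_n}\big(E_{t+j}X_{nt} - E_{t+j-1}X_{nt}\big).$$

The sequence $\{Y_{nj}, \mathcal{F}_{n,n+j}\}_{n=1}^{\infty}$ is a martingale, for each $j$. Since the array $\{(E_{t+j}X_{nt} - E_{t+j-1}X_{nt})/c_{nt}\}$ is uniformly integrable by (a) and 19.12, it follows by (b) and (c) and 19.7 that

$$Y_{nj} \xrightarrow{L_1} 0. \tag{19.33}$$

We now express $\sum_{t=1}^{k_n} X_{nt}$ as a telescoping sum. For any $M \ge 1$,

$$\sum_{j=1-M}^{M-1} Y_{nj} = \sum_{t=1}^{k_n} E_{t+M-1}X_{nt} - \sum_{t=1}^{k_n} E_{t-M}X_{nt}, \tag{19.34}$$

and hence

$$\sum_{t=1}^{k_n} X_{nt} = \sum_{j=1-M}^{M-1} Y_{nj} + \sum_{t=1}^{k_n}\big(X_{nt} - E_{t+M-1}X_{nt}\big) + \sum_{t=1}^{k_n} E_{t-M}X_{nt}.$$

The triangle inequality and the $L_1$-mixingale property now give

$$E\Big|\sum_{t=1}^{k_n} X_{nt}\Big| \le \sum_{j=1-M}^{M-1} E|Y_{nj}| + \sum_{t=1}^{k_n} E|X_{nt} - E_{t+M-1}X_{nt}| + \sum_{t=1}^{k_n} E|E_{t-M}X_{nt}| \tag{19.35}$$

$$\le \sum_{j=1-M}^{M-1} E|Y_{nj}| + 2\zeta_M\sum_{t=1}^{k_n} c_{nt}. \tag{19.36}$$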

According to the assumptions, for $\varepsilon > 0$ there exists $M_\varepsilon$ such that the second member on the right-hand side of (19.36) is smaller than $\tfrac12\varepsilon$ for $M \ge M_\varepsilon$. By choosing $n$ large enough, the sum of $2M-1$ terms on the right-hand side of (19.36) can be made smaller than $\tfrac12\varepsilon$ for any finite $M$, by (19.33). So, by choosing $M \ge M_\varepsilon$ we have $E\big|\sum_{t=1}^{k_n} X_{nt}\big| < \varepsilon$ when $n$ is large enough. The theorem is now proved since $\varepsilon$ is arbitrary. ∎

A comparison with the results of §19.1 is instructive. In an $L_2$-bounded process, the $L_2$-mixingale property would be a stronger form of dependence restriction than the limiting uncorrelatedness specified in 19.2, just as the martingale property is stronger than simple uncorrelatedness. The value of the present result is the substantial weakening of the moment conditions.

19.5 Approximable Processes

There remains the possibility of cases in which the mixingale property is not easily established, perhaps because of a nonlinear transformation of a $L_p$-NED process which cannot be shown to preserve the requisite moments for application of the results in §17.3. In such cases the theory of §17.4 may yield a result. On the assumption that the approximator sequence is mixing, so that its mean deviations converge in probability by 19.11, it will be sufficient to show that this implies the convergence of the approximable sequence. This is the object of the following theorem.

19.13 Theorem Suppose that, for each $m \in \mathbb{N}$, $\{h^m_{nt}\}$ is a stochastic array and the centred array $\{h^m_{nt} - E(h^m_{nt})\}$ satisfies the conditions of 19.11. If the array $\{X_{nt}\}$ is $L_1$-approximable by $\{h^m_{nt}\}$ with respect to a constant array $\{d_{nt}\}$, and $\limsup_{n\to\infty}\sum_{t=1}^{k_n} d_{nt} \le B < \infty$, then $\sum_{t=1}^{k_n} X_{nt} \xrightarrow{pr} 0$. □

Establishing the conditions of the theorem will typically be achieved using 17.21, by showing that $X_{nt}$ is $L_r$-bounded for $r > 1$, and approximable in probability on $h^m_{nt}$ for each $m$, the latter being $m$-order lag functions of a mixing array of any size.

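Proof Since

$$\Big|\sum_{t=1}^{k_n} X_{nt}\Big| \le \Big|\sum_{t=1}^{k_n}(X_{nt} - h^m_{nt})\Big| + \Big|\sum_{t=1}^{k_n}\big(h^m_{nt} - E(h^m_{nt})\big)\Big| + \Big|\sum_{t=1}^{k_n} E(h^m_{nt})\Big| \tag{19.37}$$

by the triangle inequality, we have for $\delta > 0$

$$\begin{aligned}
P\Big(\Big|\sum_{t=1}^{k_n} X_{nt}\Big| > \delta\Big) \le{}& P\Big(\Big|\sum_{t=1}^{k_n}(X_{nt} - h^m_{nt})\Big| > \tfrac{\delta}{3}\Big) \\
&+ P\Big(\Big|\sum_{t=1}^{k_n}\big(h^m_{nt} - E(h^m_{nt})\big)\Big| > \tfrac{\delta}{3}\Big) + P\Big(\Big|\sum_{t=1}^{k_n} E(h^m_{nt})\Big| > \tfrac{\delta}{3}\Big)
\end{aligned} \tag{19.38}$$

by subadditivity, since the event whose probability is on the minorant side implies at least one of those on the majorant. By the Markov inequality,

$$P\Big(\Big|\sum_{t=1}^{k_n}(X_{nt} - h^m_{nt})\Big| > \tfrac{\delta}{3}\Big) \le \frac{3}{\delta}E\Big|\sum_{t=1}^{k_n}(X_{nt} - h^m_{nt})\Big| \le \frac{3}{\delta}\sum_{t=1}^{k_n} E|X_{nt} - h^m_{nt}| \le \frac{3}{\delta}\sum_{t=1}^{k_n} d_{nt}\nu_m. \tag{19.39}$$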

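$P\big(\big|\sum_{t=1}^{k_n} E(h^m_{nt})\big| > \delta/3\big)$ is equal to either 0 or 1, according to whether the non-stochastic inequality holds or does not hold. By the fact that $E(X_{nt}) = 0$ and $L_1$-approximability,

$$|E(h^m_{nt})| = |E(X_{nt}) - E(h^m_{nt})| \le E|X_{nt} - h^m_{nt}| \le d_{nt}\nu_m, \tag{19.40}$$

and hence

$$\Big|\sum_{t=1}^{k_n} E(h^m_{nt})\Big| \le \sum_{t=1}^{k_n}|E(h^m_{nt})| \le \sum_{t=1}^{k_n} d_{nt}\nu_m \le B\nu_m. \tag{19.41}$$

We therefore find that for each $m \in \mathbb{N}$

$$\begin{aligned}
\limsup_{n\to\infty} P\Big(\Big|\sum_{t=1}^{k_n} X_{nt}\Big| > \delta\Big)
&\le \frac{3B}{\delta}\nu_m + \limsup_{n\to\infty} P\Big(\Big|\sum_{t=1}^{k_n}\big(h^m_{nt} - E(h^m_{nt})\big)\Big| > \tfrac{\delta}{3}\Big) + 1_{\{B\nu_m > \delta/3\}} \\
&= \frac{3B}{\delta}\nu_m + 1_{\{B\nu_m > \delta/3\}},
\end{aligned} \tag{19.42}$$

by the assumption that $h^m_{nt}$ satisfies the WLLN for each $m \in \mathbb{N}$. The proof is completed by letting $m \to \infty$. ∎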


20 The Strong Law of Large Numbers

20.1 Technical Tricks for Proving LLNs

In this chapter we explore the strong law under a range of different assumptions, from independent sequences to near-epoch dependent functions of mixing processes.

Many of the proofs are based on one or more of a collection of ingenious technical lemmas, and we begin by studying these results. The reader has the option of skipping ahead to §20.2, and referring back as necessary, but there is something to be said for forming an impression of the method of attack at the outset. These theorems are found in several different versions in the literature, usually in a form adapted to the particular problem in hand. Here we will take note of the minimal conditions needed to make each trick work.

We start with the basic convergence result that shows why maximal inequalities (for example, 15.14, 15.15, 16.9, and 16.11) are important.

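20.1 Convergence lemma Let $\{X_t\}_1^{\infty}$ be a stochastic sequence on a probability space $(\Omega, \mathcal{F}, P)$, and let $S_n = \sum_{t=1}^{n} X_t$, and $S_0 = 0$. For $\omega \in \Omega$, let

$$M(\omega) = \inf_{m}\Big\{\sup_{j>m}|S_j(\omega) - S_m(\omega)|\Big\}. \tag{20.1}$$

If $P(M > \varepsilon) = 0$ for all $\varepsilon > 0$, then $S_n \xrightarrow{as} S$.

Proof By the Cauchy criterion for convergence, the realization $\{S_n(\omega)\}$ converges if we can find an $m$ such that $|S_j - S_m| \le \varepsilon$ for all $j > m$, for all $\varepsilon > 0$; in other words, it converges if $M(\omega) \le \varepsilon$, for all $\varepsilon > 0$. ∎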

This result is usually applied in the following way.

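20.2 Corollary Let $\{c_t\}_1^{\infty}$ be a sequence of constants, and suppose there exists $p > 0$ such that, for every $m \ge 0$ and $n > m$, and every $\varepsilon > 0$,

$$P\Big(\max_{m<j\le n}|S_j - S_m| > \varepsilon\Big) \le \frac{K}{\varepsilon^p}\sum_{t=m+1}^{n} c_t^p, \tag{20.2}$$

where $K$ is a finite constant. If $\sum_{t=1}^{\infty} c_t^p < \infty$, then $S_n \xrightarrow{as} S$.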

Proof Since $\{c_t^p\}$ is summable it follows by 2.25 that $\lim_{m\to\infty}\sum_{t=m+1}^{\infty} c_t^p = 0$. Let $M$ be the r.v. in (20.1). By definition, $M \le \sup_{j>m}|S_j - S_m|$ for any $m > 0$, and hence

$$P(M > \varepsilon) \le \lim_{m\to\infty} P\Big(\sup_{j>m}|S_j - S_m| > \varepsilon\Big) \le \frac{K}{\varepsilon^p}\lim_{m\to\infty}\sum_{t=m+1}^{\infty} c_t^p = 0, \tag{20.3}$$

where the final inequality is the limiting case of (20.2). 20.1 completes the proof. ∎

Notice how this proof does not make a direct appeal to the Borel-Cantelli lemma to get a.s. convergence. The method is closer to that of 18.3. The essential trick with a maximal inequality is to put a bound on the probability of all occurrences of a certain type of event as we move down the sequence, by specifying a probability for the most extreme of them.
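To make the idea of bounding "the most extreme" event concrete, here is a small Monte Carlo sketch. It is an illustration under stated assumptions rather than anything taken from the text: the steps are independent $\pm 1$ coin flips (so each has variance 1), and the bound checked is of the form (20.2) with $p = 2$, $K = 1$ and $c_t^2 = \mathrm{Var}(X_t)$, which is the Kolmogorov-type case the chapter later invokes through 15.14. The function name `max_exceed_freq`, the seed and the sample sizes are hypothetical choices.

```python
import random


def max_exceed_freq(n, eps, reps, seed=0):
    """Estimate P(max_{j<=n} |S_j| > eps) for a simple +/-1 random walk by simulation."""
    rng = random.Random(seed)          # private generator, so results are reproducible
    hits = 0
    for _ in range(reps):
        s, running_max = 0, 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            running_max = max(running_max, abs(s))
        if running_max > eps:
            hits += 1
    return hits / reps


n, eps = 100, 25
freq = max_exceed_freq(n, eps, reps=2000)
bound = n / eps ** 2                   # (K / eps^p) * sum of variances, with K = 1, p = 2
print("simulated frequency:", freq, " bound:", bound)
```

The simulated frequency should sit well below the bound; the inequality is crude, but it controls the whole path up to time $n$ with a single moment calculation.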

Since $S$ is finite almost surely, $\bar{X}_n \xrightarrow{as} 0$ is an instant corollary of 20.2. However, the result can be also used in a more subtle way in conjunction with Kronecker's lemma. If $\sum_{t=1}^{n} Y_t$ converges a.s., where $\{Y_t\} = \{X_t/a_t\}$ and $\{a_t\}$ is a sequence of positive constants with $a_n \uparrow \infty$, it follows that $a_n^{-1}\sum_{t=1}^{n} X_t \xrightarrow{as} 0$. This is of course a much weaker condition than the convergence of $\sum_{t=1}^{n} X_t$ itself. Most applications feature $a_t = t$, but the more general formulation also has uses.
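A deterministic example may help to show what Kronecker's lemma delivers. The sketch below is only an illustration with hypothetical choices: it takes $x_t = (-1)^t\sqrt{t}$ and $a_t = t$, so that $\sum_t x_t/a_t = \sum_t (-1)^t/\sqrt{t}$ is a convergent alternating series while $\sum_t x_t$ itself does not converge, and it reports both the weighted series and the normalized sum $a_n^{-1}\sum_{t\le n} x_t$.

```python
import math


def kronecker_demo(n):
    """Return (sum_{t<=n} x_t/a_t, a_n^{-1} sum_{t<=n} x_t) for x_t = (-1)^t sqrt(t), a_t = t."""
    weighted = 0.0   # partial sum of x_t / a_t = (-1)^t / sqrt(t)
    total = 0.0      # partial sum of x_t itself
    for t in range(1, n + 1):
        x = (-1) ** t * math.sqrt(t)
        weighted += x / t
        total += x
    return weighted, total / n


for n in (10 ** 2, 10 ** 4, 10 ** 5):
    print(n, kronecker_demo(n))
```

The first component settles down as $n$ grows, and the second shrinks towards zero, as the lemma predicts.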

There is a standard device for extending a.s. convergence to a wider class of sequences, once it has been proved for a given class: the method of equivalent sequences. Sequences $\{X_t\}_1^{\infty}$ and $\{Y_t\}_1^{\infty}$ are said to be equivalent if

$$\sum_{t=1}^{\infty} P(X_t \ne Y_t) < \infty. \tag{20.4}$$

By the first Borel-Cantelli lemma (18.2(i)), (20.4) implies $P(X_t \ne Y_t, \text{ i.o.}) = 0$. In other words, only on a set of probability measure zero are there more than a finite number of $t$ for which $X_t(\omega) \ne Y_t(\omega)$.

20.3 Theorem If $X_t$ and $Y_t$ are equivalent, $\sum_{t=1}^{n}(X_t - Y_t)$ converges a.s.

Proof By definition of equivalence and 18.2(i) there exists a subset $C$ of $\Omega$, with $P(\Omega - C) = 0$, and with the following property: for all $\omega \in C$, there is a finite $n_0(\omega)$ such that $X_t(\omega) = Y_t(\omega)$ for $t > n_0(\omega)$. Hence

$$\sum_{t=1}^{n}\big(X_t(\omega) - Y_t(\omega)\big) = \sum_{t=1}^{n_0(\omega)}\big(X_t(\omega) - Y_t(\omega)\big), \quad \forall\, n \ge n_0(\omega),$$

and the sum converges, for all $\omega \in C$. ∎

The equivalent sequences concept is often put to use by means of the following theorem.

20.4 Theorem Let $\{X_t\}_1^{\infty}$ be a zero-mean random sequence satisfying

$$\sum_{t=1}^{\infty} E|X_t|^p/a_t^p < \infty \tag{20.5}$$

for some $p \ge 1$, and a sequence of positive constants $\{a_t\}$. Then, putting $1_t^a$ for the indicator function $1_{\{|X_t| \le a_t\}}(\omega)$,

$$\sum_{t=1}^{\infty} P(|X_t| > a_t) < \infty, \tag{20.6}$$

$$\sum_{t=1}^{\infty} |E(X_t 1_t^a)|/a_t < \infty, \tag{20.7}$$

and for any $r \ge p$,

$$\sum_{t=1}^{\infty} E(|X_t|^r 1_t^a)/a_t^r < \infty. \tag{20.8}$$ □

The idea behind this result may be apparent. The indicator function is used to truncate a sequence, replacing a member by 0 if it exceeds a given absolute bound. The ratio of the truncated sequence to the bound cannot exceed 1 and possesses all its absolute moments, while inequality (20.6) tells us that the truncated sequence is equivalent to the original under condition (20.5). Proving a strong law under (20.5) can therefore be accomplished by proving a strong law for a truncated sequence, subject to (20.7) and (20.8).

Proof of Theorem 20.4 We prove the following three inequalities:

$$P(|X_t| > a_t) = E(1 - 1_t^a) \le E\big(|X_t|^p(1 - 1_t^a)\big)/a_t^p \le E|X_t|^p/a_t^p. \tag{20.9}$$

Here the inequalities are because $|X_t(\omega)|^p/a_t^p > 1$ for $\omega \in \{|X_t| > a_t\}$, and because $E(|X_t|^p 1_t^a)$ is non-negative, respectively.

$$\begin{aligned}
|E(X_t 1_t^a)|/a_t &= |E(X_t(1 - 1_t^a))|/a_t \\
&\le E\big(|X_t|(1 - 1_t^a)\big)/a_t \\
&\le E\big(|X_t|^p(1 - 1_t^a)\big)/a_t^p \\
&\le E|X_t|^p/a_t^p.
\end{aligned} \tag{20.10}$$

The equality in (20.10) is because $E(X_t) = 0$, hence $E(X_t 1_t^a) = -E(X_t(1 - 1_t^a))$. The first inequality is the modulus inequality, and the second is because on the event $\{|X_t| > a_t\}$, $(|X_t|/a_t)^p \ge |X_t|/a_t$ for $p \ge 1$. Finally, by similar arguments to the above,

$$E(|X_t|^r 1_t^a)/a_t^r \le E(|X_t|^p 1_t^a)/a_t^p \ \text{ for } p \le r, \qquad \le E|X_t|^p/a_t^p. \tag{20.11}$$

The theorem follows on summing over $t$. ∎
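The three inequalities (20.9)-(20.11) can be verified exactly on a small discrete distribution. The sketch below is a hypothetical illustration, not part of the proof: the support, the probabilities, the bound $a$ and the exponents $p \le r$ are arbitrary choices, made so that the variable has mean zero, and `truncation_terms` is an invented helper name.

```python
def truncation_terms(values, probs, a, p, r):
    """Left-hand sides of (20.9), (20.10), (20.11) and their common bound E|X|^p / a^p."""
    pairs = list(zip(values, probs))
    tail = sum(q for x, q in pairs if abs(x) > a)                         # P(|X| > a)
    trunc_mean = abs(sum(x * q for x, q in pairs if abs(x) <= a)) / a     # |E(X 1{|X|<=a})| / a
    trunc_r = sum(abs(x) ** r * q for x, q in pairs if abs(x) <= a) / a ** r
    bound = sum(abs(x) ** p * q for x, q in pairs) / a ** p               # E|X|^p / a^p
    return tail, trunc_mean, trunc_r, bound


# A zero-mean example: -4*0.2 + 0*0.2 + 1*0.4 + 2*0.2 = 0.
values = [-4.0, 0.0, 1.0, 2.0]
probs = [0.2, 0.2, 0.4, 0.2]
tail, trunc_mean, trunc_r, bound = truncation_terms(values, probs, a=3.0, p=1.5, r=2.0)
print(tail, trunc_mean, trunc_r, bound)
```

Each of the first three numbers is no larger than the fourth, which is the per-term statement that summing over $t$ turns into (20.6)-(20.8).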

There are a number of variations on this basic result. The first is a version for martingale differences in terms of the one-step-ahead conditional moments, where the weight sequence is also allowed to be stochastic. The style of this result is appropriate to the class of martingale limit theorems we shall examine in §20.4, in which we establish almost-sure equivalence between sets on which certain conditions obtain and on which sequences converge.

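20.5 Corollary Let $\{X_t, \mathcal{F}_t\}$ be a m.d. sequence, let $\{W_t\}$ be a sequence of positive $\mathcal{F}_{t-1}$-measurable r.v.s, and for some $p \ge 1$ let

$$D = \Big\{\omega: \sum_{t=1}^{\infty} E(|X_t|^p \mid \mathcal{F}_{t-1})(\omega)/W_t^p(\omega) < \infty\Big\} \in \mathcal{F}. \tag{20.12}$$

Also define

$$D_1 = \Big\{\omega: \sum_{t=1}^{\infty} P(|X_t| > W_t \mid \mathcal{F}_{t-1})(\omega) < \infty\Big\} \in \mathcal{F}, \tag{20.13}$$

$$D_2 = \Big\{\omega: \sum_{t=1}^{\infty} \big|E(X_t 1_{\{|X_t| \le W_t\}} \mid \mathcal{F}_{t-1})(\omega)\big|/W_t(\omega) < \infty\Big\} \in \mathcal{F}, \tag{20.14}$$

$$D_3 = \Big\{\omega: \sum_{t=1}^{\infty} E(|X_t|^r 1_{\{|X_t| \le W_t\}} \mid \mathcal{F}_{t-1})(\omega)/W_t^r(\omega) < \infty\Big\} \in \mathcal{F}, \tag{20.15}$$

and let $D' = D_1 \cap D_2 \cap D_3$. Then $P(D - D') = 0$. In particular, if $P(D) = 1$ then $P(D') = 1$. □

Proof It suffices to prove the three inequalities (20.9), (20.10), and (20.11) for the case of conditional expectations. Noting that $E(X_t \mid \mathcal{F}_{t-1}) = 0$ a.s. and using the fact that $W_t$ is $\mathcal{F}_{t-1}$-measurable, all of these go through unchanged, except that the conditional modulus inequality 10.14 is used to get (20.14). It follows that almost every $\omega \in D$ is in $D'$. ∎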

Another version of this theorem uses a different truncation, with the truncated variable chosen to be a continuous function of $X_t$; see 17.13 to appreciate why this variation might be useful.

20.6 Corollary Let $\{X_t\}_1^{\infty}$ be a zero-mean random sequence satisfying (20.5) for $p \ge 1$. Define

$$Y_t = X_t 1_t^a/a_t + (X_t/|X_t|)(1 - 1_t^a) = \begin{cases} X_t/a_t, & |X_t| \le a_t \\ 1, & X_t > a_t \\ -1, & X_t < -a_t. \end{cases} \tag{20.16}$$

Then,

$$\sum_{t=1}^{\infty} |E(Y_t)| < \infty, \tag{20.17}$$

$$\sum_{t=1}^{\infty} E|Y_t|^r < \infty, \quad r \ge p. \tag{20.18}$$ □

Proof Write $\pm a_t$ to denote $a_t X_t/|X_t|$. Inequalities (20.10) and (20.11) of 20.4 are adapted as follows.

$$\begin{aligned}
|E(Y_t)| &= \big|E\big(X_t 1_t^a + (1 - 1_t^a)(\pm a_t)\big)\big|/a_t \\
&= \big|E\big((X_t - (\pm a_t))(1 - 1_t^a)\big)\big|/a_t \\
&\le E\big(|X_t|(1 - 1_t^a)\big)/a_t + E|1 - 1_t^a| \\
&\le E\big(|X_t|^p(1 - 1_t^a)\big)/a_t^p + P(|X_t| > a_t) \\
&\le 2E(|X_t|^p)/a_t^p.
\end{aligned} \tag{20.19}$$

The second equality in (20.19) is again because $E(X_t) = 0$. The first inequality is an application of the modulus inequality and triangle inequalities in succession, and the last one uses (20.9). By similar arguments, except that here the $c_r$ inequality is used in the second line, we have

$$\begin{aligned}
E|Y_t|^r &= E\big|X_t 1_t^a + (1 - 1_t^a)(\pm a_t)\big|^r/a_t^r \\
&\le 2^{r-1}\big(E|X_t 1_t^a|^r/a_t^r + E|1 - 1_t^a|^r\big) \\
&\le 2^{r-1}\big(E(|X_t|^p 1_t^a)/a_t^p + P(|X_t| > a_t)\big) \ \text{ for } p \le r \\
&\le 2^r E|X_t|^p/a_t^p.
\end{aligned} \tag{20.20}$$

The theorem follows on summing over $t$ as before. ∎

Clearly, 20.5 could be adapted to this case if desired, but that extension will not be needed for our results.

The last extension is relatively modest, but permits summability conditions for norms to be applied.

20.7 Corollary (20.6), (20.7), (20.8), (20.17), and (20.18) all continue to hold if (20.5) is replaced by

$$\sum_{t=1}^{\infty} (E|X_t|^p)^{1/q}/a_t^{p/q} < \infty \tag{20.21}$$

for any $q \ge 1$.

Proof The modified forms of (20.9), and of (20.19) and (20.20) (say) are

$$P(|X_t| > a_t) \le P(|X_t| > a_t)^{1/q} \le (E|X_t|^p/a_t^p)^{1/q}, \tag{20.22}$$

$$|E(Y_t)| \le |E(Y_t)|^{1/q} \le 2^{1/q}(E|X_t|^p/a_t^p)^{1/q}, \tag{20.23}$$

$$E|Y_t|^r \le (E|Y_t|^r)^{1/q} \le 2^{r/q}(E|X_t|^p/a_t^p)^{1/q}, \tag{20.24}$$

where in each case the first inequality is because the left-hand-side member does not exceed 1. ∎

For example, by choosing $p = q$ the condition that the sequence $\{\|X_t/a_t\|_p\}$ is summable is seen to be sufficient for 20.4 and 20.6.

20.2 The Case of Independence

The classic results on strong convergence are for the case of independent sequences. The following is the 'three series theorem' of Kolmogorov:

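20.8 Three series theorem Let $\{X_t\}$ be an independent sequence, and $S_n = \sum_{t=1}^{n} X_t$. $S_n \xrightarrow{as} S$ if and only if the following conditions hold for some fixed $a > 0$:

$$\sum_{t=1}^{\infty} P(|X_t| > a) < \infty, \tag{20.25}$$

$$\sum_{t=1}^{\infty} E(1_{\{|X_t| \le a\}}X_t) < \infty, \tag{20.26}$$

$$\sum_{t=1}^{\infty} \mathrm{Var}(1_{\{|X_t| \le a\}}X_t) < \infty. \tag{20.27}$$ □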

Since the event $\{S_n \to S\}$ is the same as the event $\{S_{n+1} \to S\}$, convergence is invariant to shift transformations. It is a remote event by 13.19 and hence in independent sequences occurs with probability either 0 or 1, according to 13.17. 20.8 gives the conditions under which the probability is 1, rather than 0. The theorem has the immediate corollary that $S_n/a_n \xrightarrow{as} 0$, whenever $a_n \uparrow \infty$.

The basic idea of these proofs is to prove the convergence result for the truncated variables $1_{\{|X_t| \le a\}}X_t$, and then use the equivalent sequences theorem to extend it to $X_t$ itself. In view of 20.4, the condition

$$\sum_{t=1}^{\infty} E|X_t|^p < \infty, \quad 1 \le p \le 2, \tag{20.28}$$

is sufficient for convergence, although not necessary. Another point to notice about the proof is that the necessity part does not assign a value to $a$. Convergence implies that (20.25)-(20.27) hold for every $a > 0$.
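The three conditions are easy to evaluate for a simple family of independent sequences. The sketch below is a hypothetical illustration: it assumes $X_t = \pm b_t$ with each sign having probability 1/2, so that the truncated means are zero by symmetry, and compares $b_t = 1/t$ with $b_t = t^{-1/2}$. The helper name `three_series` and the truncation level $a = 1$ are assumptions made for the example.

```python
def three_series(b, a, n):
    """Partial sums to n of the series in (20.25)-(20.27) for X_t = +/- b(t), signs equiprobable."""
    tail = sum(1.0 for t in range(1, n + 1) if b(t) > a)        # P(|X_t| > a) is 0 or 1 here
    mean = 0.0                                                  # truncated means vanish by symmetry
    var = sum(b(t) ** 2 for t in range(1, n + 1) if b(t) <= a)  # Var of the truncated variable
    return tail, mean, var


for n in (10 ** 2, 10 ** 4):
    print(n, three_series(lambda t: 1.0 / t, 1.0, n), three_series(lambda t: t ** -0.5, 1.0, n))
```

With $b_t = 1/t$ the variance series stays below $\pi^2/6$, so all three series converge and the random signed sum converges a.s.; with $b_t = t^{-1/2}$ the variance series is the harmonic series and grows without bound, so by the theorem $S_n$ cannot converge a.s.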

Proof of 20.8 Write $Y_t = 1_{\{|X_t| \le a\}}X_t$, so that the summands in (20.26) and (20.27) are respectively the means and variances of $Y_t$. The sequence $\{Y_t - E(Y_t)\}$ is independent and hence a martingale difference, so that $S_n' - S_m' = \sum_{t=m+1}^{n}(Y_t - E(Y_t))$ is a martingale for fixed $m \ge 0$, and $\sum_{t=m+1}^{n}\mathrm{Var}(Y_t) = \mathrm{Var}(S_n' - S_m')$. Theorem 15.14 combined with 20.2, setting $p = 2$ in each case and putting $c_t^2 = \mathrm{Var}(Y_t)$ and $K = 1$, together yield the result that $S_n' \xrightarrow{as} S'$ when (20.27) holds. If (20.26) holds, this further implies that $\sum_{t=1}^{n} Y_t$ converges. And then if (20.25) holds the sequences $\{X_t\}$ and $\{Y_t\}$ are equivalent, and so $S_n \xrightarrow{as} S$, by 20.3. This proves sufficiency of the three conditions.

Conversely, suppose $S_n \xrightarrow{as} S$. By 2.25 applied to $X_t(\omega)$ for each $\omega \in \Omega$, it follows that $\lim_{m\to\infty}\sum_{t=m}^{\infty} X_t = 0$ a.s. This means that $P(|X_t| > a, \text{ i.o.}) = 0$, for any $a > 0$, and so (20.25) must follow by the divergence part of the Borel-Cantelli lemma (18.2(ii)). 20.3 then assures us that $\sum_{t=1}^{n} Y_t$ also converges a.s.

Write $s_n^2 = \sum_{t=1}^{n}\mathrm{Var}(Y_t)$. If $s_n^2 \to \infty$ as $n \to \infty$, $\sum_{t=1}^{n}(Y_t - E(Y_t))/s_n$ fails to converge, but is asymptotically distributed as a standard Gaussian r.v. (This is the central limit theorem - see 23.6.) This fact contradicts the possibility of $\sum_{t=1}^{n} Y_t$ converging, so we conclude that $s_n^2$ is bounded in the limit, which is equivalent to (20.27).

Finally, consider the sequence $\{Y_t - E(Y_t)\}$. This has mean zero, the same variance as $Y_t$, and $P(|Y_t - E(Y_t)| > 2a) = 0$ for all $t$. Hence, it satisfies the conditions (20.25)-(20.27) (in respect of the constant $2a$) and the sufficiency part of the theorem implies that $\sum_{t=1}^{n}(Y_t - E(Y_t))$ converges. And since $\sum_{t=1}^{n} Y_t$ converges, (20.26) must hold. This completes the proof of necessity. ∎

The sufficiency part of this result is subsumed under the weaker conditions of 20.10 below, and is now mainly of historical interest; it is the necessity proof that is interesting, since it has no counterpart in the LLNs for dependent sequences. In these cases we cannot use the divergence part of the Borel-Cantelli lemma, and it appears difficult to rule out special cases in which convergence is achieved with arbitrary moment conditions. Incidentally, Kolmogorov originally proved the maximal inequality of 15.14, cited in the proof, for the independent case; but again, his result can now be subsumed under the case of martingale differences, and does not need to be quoted separately.

Another reason why the independent case is of interest is because of the following very elegant result due to Lévy. This shows that, when we are dealing with partial sums of independent sequences, the concepts of weak and strong convergence coincide.

20.9 Theorem When $\{X_t\}$ is an independent sequence and $S_n = \sum_{t=1}^{n} X_t$, $S_n \xrightarrow{pr} S$ if and only if $S_n \xrightarrow{as} S$.

Proof Sufficiency is by 18.5. It is the necessity that is unique to the particular case cited. Let $S_{mn} = \sum_{t=m+1}^{n} X_t$, and for some $\varepsilon > 0$ consider the various ways in which the event $\{|S_{mn}| > \varepsilon\}$ can occur. In particular, consider the disjoint collection

$$\Big\{\max_{m<j\le k-1}|S_{mj}| \le 2\varepsilon,\ |S_{mk}| > 2\varepsilon\Big\}, \quad k = m+1, \ldots, n.$$

For each $k$, this is the event that the sum from $m$ onwards exceeds $2\varepsilon$ absolutely for the first time at time $k$, and thus

$$\bigcup_{k=m+1}^{n}\Big\{\max_{m<j\le k-1}|S_{mj}| \le 2\varepsilon,\ |S_{mk}| > 2\varepsilon\Big\} = \Big\{\max_{m<j\le n}|S_{mj}| > 2\varepsilon\Big\}, \tag{20.29}$$

where the sets of the union are disjoint. It is also the case that

$$\bigcup_{k=m+1}^{n}\Big\{\max_{m<j\le k-1}|S_{mj}| \le 2\varepsilon,\ |S_{mk}| > 2\varepsilon\Big\} \cap \{|S_{kn}| \le \varepsilon\} \subseteq \{|S_{mn}| > \varepsilon\}, \tag{20.30}$$

where the inclusion is ensured by imposing the extra condition for each $k$. The events in this union are still disjoint, and by the assumption of an independent sequence they are the intersections of independent pairs of events. On applying (20.29), we can conclude from (20.30) that

$$\begin{aligned}
P\Big(\max_{m<j\le n}|S_{mj}| > 2\varepsilon\Big)\min_{m<k\le n} P(|S_{kn}| \le \varepsilon)
&\le \sum_{k=m+1}^{n} P\Big(\max_{m<j\le k-1}|S_{mj}| \le 2\varepsilon,\ |S_{mk}| > 2\varepsilon\Big)P(|S_{kn}| \le \varepsilon) \\
&\le P(|S_{mn}| > \varepsilon).
\end{aligned} \tag{20.31}$$

If $S_n \xrightarrow{pr} S$, there exists by definition $m \ge 1$ such that

$$P(|S_{mn}| > \varepsilon) < \varepsilon \tag{20.32}$$

for all $n > m$. According to (20.32), the second factor on the minorant side of (20.31) is at least as great as $1 - \varepsilon$, so for $0 < \varepsilon < 1$,

$$P\Big(\max_{m<j\le n}|S_{mj}| > 2\varepsilon\Big) \le \frac{\varepsilon}{1 - \varepsilon}. \tag{20.33}$$

Letting $n \to \infty$ and then $m \to \infty$, the theorem now follows by 18.3. ∎

This equivalence of weak and strong results is one of the chief benefits stemming from the independence assumption. Since the three-series theorem is equivalent to a weak law according to 20.9, we also have necessary conditions for convergence in probability. As far as sufficiency results go, however, practically nothing is lost by passing from the independent to the martingale case, and since showing convergence is usually of disproportionately greater importance than showing nonconvergence, the absence of necessary conditions may be regarded as a small price to pay.

However, a feature of the three series theorem that is common to all the strong law results of this chapter is that it is not an array result. Being based on the convergence lemma, all these proofs depend on teaming a convergent stochastic sequence with an increasing constant sequence, such that their ratio goes to zero. Although the results can be written down in array form, there is no counterpart of the weak law of 19.7, more general than its specialization in 19.8.

20.3 Martingale Strong Laws

Martingale limit results are remarkably powerful. So long as a sequence is a martingale difference, no further restrictions on its dependence are required and the moment assumptions called for are scarcely tougher than those imposed in the independent case. Moreover, while the m.d. property is stronger than the uncorrelatedness assumed in §19.2, the distinction is very largely technical. Given the nature of econometric time-series models, we are usually able to assert that a sequence is uncorrelated because it is a m.d., basically a sequence which is not forecastable in mean one step ahead. The case when it is uncorrelated with its own past values but not with some other function of lagged information could arise, but would be in the nature of a special case.

The results in this section and the next one are drawn or adapted chiefly from Stout (1974) and Hall and Heyde (1980), although many of the ideas go back to Doob (1953). We begin with a standard SLLN for $L_2$-bounded sequences.

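20.10 Theorem Let $\{X_t, \mathcal{F}_t\}_1^{\infty}$ be a m.d. sequence with variance sequence $\{\sigma_t^2\}$, and $\{a_t\}$ a positive constant sequence with $a_t \uparrow \infty$. $S_n/a_n \xrightarrow{as} 0$ if

$$\sum_{t=1}^{\infty} \sigma_t^2/a_t^2 < \infty. \tag{20.34}$$ □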

There are (at least) two ways to prove this result. The first is to use the martingale convergence theorem (15.7) directly, and the second is to combine the maximal inequality of 15.14 with the convergence lemma 20.2. In effect, the second line of argument provides an alternative proof of martingale convergence for the square-integrable case, providing an interesting comparison of techniques.

First proof Define $T_n = \sum_{t=1}^{n} X_t/a_t$, so that $\{T_n, \mathcal{F}_n\}$ is a square-integrable martingale. We can say, using the norm inequality and orthogonality of $\{X_t\}$,

$$\sup_n E|T_n| \le \sup_n (E T_n^2)^{1/2} = \Big(\sum_{t=1}^{\infty} \sigma_t^2/a_t^2\Big)^{1/2} < \infty, \tag{20.35}$$

leading directly to the conclusion $T_n \to T$ a.s., by 15.7. Now apply the Kronecker lemma to the sequences $\{T_n(\omega)\}$ for $\omega \in \Omega$, to show that $S_n/a_n \xrightarrow{as} 0$. ∎

Second proof For $m \ge 0$, $\{T_n - T_m, \mathcal{F}_n\}$ is a martingale with

$$E(T_n - T_m)^2 = \sum_{t=m+1}^{n} \sigma_t^2/a_t^2. \tag{20.36}$$

Apply 15.14 for $p = 2$, and then 20.2 with $c_t^2 = \sigma_t^2/a_t^2$. Finally, apply the Kronecker lemma as before. ∎

Compare this result with 19.4. If $\mathrm{Var}(X_t) = \sigma_t^2 \le B < \infty$, say, then setting $a_t = t$, we have $\sum_{t=1}^{\infty} \sigma_t^2/t^2 \le B\sum_{t=1}^{\infty} 1/t^2 \approx 1.64B < \infty$, and the condition of the theorem is satisfied, hence $\bar{X}_n = S_n/n \xrightarrow{as} 0$, the same conclusion as before. But the conditions on the variances are now a lot weaker, and in effect we have converted the weak law of 19.1 into a strong law, at the small cost of substituting the m.d. assumption for orthogonality. As an example of the general formulation, suppose the sequence satisfies

$$\sum_{t=1}^{\infty} E(X_t^2)/t^4 < \infty. \tag{20.37}$$

We cannot then rely upon $\bar{X}_n$ converging to zero, but (putting $a_t = t^2$) we can show that $n^{-2}\sum_{t=1}^{n} X_t = \bar{X}_n/n \xrightarrow{as} 0$.
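The bounded-variance case with $a_t = t$ can be illustrated numerically. The following sketch is an assumption-laden example rather than part of the text: it takes independent $\pm 1$ steps (a particular m.d. sequence with $\sigma_t^2 = 1$, so $B = 1$), checks that the series in (20.34) is close to $\pi^2/6 \approx 1.645$, and traces one simulated path of $S_n/n$. The names `variance_series` and `sample_mean_path` and the seed are hypothetical.

```python
import math
import random


def variance_series(n, sigma2=lambda t: 1.0):
    """Partial sum to n of sigma_t^2 / a_t^2 with a_t = t, the series in (20.34)."""
    return sum(sigma2(t) / t ** 2 for t in range(1, n + 1))


def sample_mean_path(n, seed=1):
    """One simulated path of S_t / t, t = 1..n, for independent +/-1 steps."""
    rng = random.Random(seed)
    s, path = 0.0, []
    for t in range(1, n + 1):
        s += 1.0 if rng.random() < 0.5 else -1.0
        path.append(s / t)
    return path


cond = variance_series(100000)
path = sample_mean_path(20000)
print("series:", cond, " pi^2/6:", math.pi ** 2 / 6)
print("S_n/n at n = 100, 1000, 20000:", path[99], path[999], path[-1])
```

A single path proves nothing about almost sure convergence, of course; the point is only to show the summability condition being met and the sample mean behaving as the theorem says it must.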

The limitation of 20.10 is that it calls for square integrability. The next step is to use 20.4 to extend it to the class of cases that satisfy

$$\sum_{t=1}^{\infty} E|X_t|^p/a_t^p < \infty \tag{20.38}$$

for $1 \le p \le 2$, and some $\{a_t\} \uparrow \infty$. It is important to appreciate that (20.38) for $p < 2$ is not a weaker condition than for $p = 2$, and the latter does not imply the former. For contrast, consider $p = 1$. The Kronecker lemma applied to (20.38) implies that

$$\frac{1}{a_n}\sum_{t=1}^{n} E|X_t| \to 0. \tag{20.39}$$

For $a_n \sim n$, such a sequence has got to be zero or very close to it most of the time. In fact, there is a trivially direct proof of convergence. Applying the monotone convergence theorem (4.9),

$$E\Big(\lim_{n\to\infty} a_n^{-1}|S_n|\Big) \le E\Big(\lim_{n\to\infty} a_n^{-1}\sum_{t=1}^{n}|X_t|\Big) = \lim_{n\to\infty} a_n^{-1}\sum_{t=1}^{n} E|X_t|. \tag{20.40}$$

For any random variable $X$, $E|X| = 0$ if and only if $X = 0$ a.s. Nothing more is needed to show that $S_n/a_n$ converges, regardless of other conditions.

Thus, having latitude in the value of $p$ for which the theorem may hold is really a matter of being able to trade off the existence of absolute moments against the rate of damping necessary to make them summable. We may meet interesting cases in which (20.38) holds for $p < 2$ only rarely, but since this extension is available at small extra cost in complexity, it makes sense to take advantage of it.

20.11 Theorem If $\{X_t, \mathcal{F}_t\}_1^{\infty}$ is a m.d. sequence satisfying (20.38) for $1 \le p \le 2$, $S_n/a_n \xrightarrow{as} 0$.

Proof Let $Y_t = 1_{\{|X_t| \le a_t\}}X_t$ and note that $\{X_t\}$ and $\{Y_t\}$ are equivalent under (20.38), by 20.4. $Y_t$ is also $\mathcal{F}_t$-measurable, and hence the centred sequence $\{Z_t, \mathcal{F}_t\}$, where $Z_t = Y_t - E(Y_t \mid \mathcal{F}_{t-1})$, is a m.d. Now,

$$\begin{aligned}
E(Z_t^2) &= E\big(E(Z_t^2 \mid \mathcal{F}_{t-1})\big) \\
&= E\big(E(Y_t^2 \mid \mathcal{F}_{t-1}) - E(Y_t \mid \mathcal{F}_{t-1})^2\big) \\
&= E(Y_t^2) - E\big(E(Y_t \mid \mathcal{F}_{t-1})^2\big).
\end{aligned} \tag{20.41}$$

According to 20.4 with $r = 2$, (20.38) implies that $\sum_{t=1}^{\infty} E(Y_t^2)/a_t^2 < \infty$ and so, since $E(Z_t^2) \le E(Y_t^2)$ by (20.41),

$$\sum_{t=1}^{\infty} E(Z_t^2)/a_t^2 < \infty. \tag{20.42}$$

By 20.10, this is sufficient for $\sum_{t=1}^{n} Z_t/a_t \xrightarrow{as} S_1$, where $S_1$ is some random variable. But

$$\sum_{t=1}^{n} Z_t/a_t = \sum_{t=1}^{n} Y_t/a_t - \sum_{t=1}^{n} E(Y_t \mid \mathcal{F}_{t-1})/a_t. \tag{20.43}$$

By 15.13(i), (20.38) is equivalent to

$$\sum_{t=1}^{\infty} E(|X_t|^p \mid \mathcal{F}_{t-1})/a_t^p < \infty, \text{ a.s.} \tag{20.44}$$

According to 20.5, (20.44) implies that $\sum_{t=1}^{\infty} |E(Y_t \mid \mathcal{F}_{t-1})|/a_t < \infty$, a.s. Absolute convergence of a series implies convergence by 2.24, so we may say that $\sum_{t=1}^{n} E(Y_t \mid \mathcal{F}_{t-1})/a_t \xrightarrow{as} S_2$. Hence, $\sum_{t=1}^{n} Y_t/a_t \xrightarrow{as} S_1 + S_2$ and so $a_n^{-1}\sum_{t=1}^{n} Y_t \xrightarrow{as} 0$ by the Kronecker lemma. It follows by 20.3 and the equivalence of $X_t$ and $Y_t$ implied by (20.38) that $S_n/a_n \xrightarrow{as} 0$. ∎

Notice that in this proof there are no short cuts through the martingale convergence theorem. While we know that $\sum_{t=1}^{n} X_t/a_t$ is a martingale, the problem is to establish that it is uniformly $L_1$-bounded, given only information about the joint distribution of $\{X_t\}$, in the form of (20.38). We have to go by way of a result for $p = 2$ to exploit orthogonality, which is where the truncation arguments come in handy.

20.4 Conditional Variances and Random Weighting

A feature of martingale theory exploited in the last theorem is the possibility of relating convergence to the behaviour of the sequences of one-step-ahead conditional moments; we now extend this principle to the conditional variances $E(X_t^2 \mid \mathcal{F}_{t-1})$. The elegant results of this section contain those such as 20.10 and 20.11.

The conditional variance of a centred coordinate is the variance of the innovation, that is, of $X_t - E(X_t \mid \mathcal{F}_{t-1})$, and in some circumstances it may be more natural to place restrictions on the behaviour of the innovations than on the original sequence. In regression models, for example, the innovations may correspond to the regression disturbances. Moreover, the fact that the conditional moments are $\mathcal{F}_{t-1}$-measurable random variables, so that any constraint upon them is probabilistic, permits a generalization of the concept of convergence, following the results of §15.4; our confidence in the summability of the weighted conditional variances translates into a probability that the sequence converges, in the manner of the following theorem. A nice refinement is that the constant weight sequence $\{a_t\}$ can be replaced by a sequence of $\mathcal{F}_{t-1}$-measurable random weights.

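20.12 Theorem Let $\{X_t, \mathcal{F}_t\}_1^{\infty}$ be a m.d. sequence, $\{W_t\}$ a non-decreasing sequence of positive, $\mathcal{F}_{t-1}$-measurable r.v.s, and $S_n = \sum_{t=1}^{n} X_t$. Then

$$P\Big(\Big\{\sum_{t=1}^{\infty} E(X_t^2 \mid \mathcal{F}_{t-1})/W_t^2 < \infty\Big\} \cap \{W_t \uparrow \infty\} - \{S_n/W_n \to 0\}\Big) = 0. \tag{20.45}$$ □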

The last statement is perhaps a little opaque, but roughly translated it says that the probability of convergence, of the event $\{S_n/W_n \to 0\}$, is not less than that of the intersection of the two other events in (20.45). In particular, when one probability is 1, so is the other.

Proof If $\{X_t\}$ is a m.d. sequence so is $\{X_t/W_t\}$, since $W_t$ is $\mathcal{F}_{t-1}$-measurable, and $T_n = \sum_{t=1}^n X_t/W_t$ is a martingale. For $\omega \in \Omega$, if $T_n(\omega) \to T(\omega)$ and $W_n(\omega) \uparrow \infty$ then $S_n(\omega)/W_n(\omega) \to 0$ by Kronecker's lemma. Applying 15.11 completes the proof. ∎

See how this result contains 20.10, corresponding to the case of a fixed, divergent weight sequence and a.s. summability. As before, we now weaken the summability conditions from conditional variances to $p$th absolute moments for $1 \le p < 2$. However, to exploit 20.5 outside the almost sure case requires a modification to the equivalent sequences argument (20.3), as follows.

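20.13 Theorem If $\{X_t\}$ and $\{Y_t\}$ are sequences of $\mathcal{F}_t$-measurable r.v.s,

$$P\left(\left\{\sum_{t=1}^\infty P(X_t \ne Y_t \mid \mathcal{F}_{t-1}) < \infty\right\} - \left\{\sum_{t=1}^\infty (X_t - Y_t) \text{ converges}\right\}\right) = 0. \tag{20.46}$$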

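Proof Let $E_t = \{X_t \ne Y_t\} \in \mathcal{F}_t$, so that $P(X_t \ne Y_t \mid \mathcal{F}_{t-1}) = E(1_{E_t} \mid \mathcal{F}_{t-1})$. According to 15.13(ii),

$$P\left(\left\{\omega: \sum_{t=1}^\infty E(1_{E_t} \mid \mathcal{F}_{t-1})(\omega) < \infty\right\} \,\Delta\, \left\{\omega: \sum_{t=1}^\infty 1_{E_t}(\omega) < \infty\right\}\right) = 0. \tag{20.47}$$

But $\sum_{t=1}^\infty 1_{E_t}(\omega) < \infty$ means that the number of coordinates for which $X_t(\omega) \ne Y_t(\omega)$ is finite, and hence $\left|\sum_{t=1}^\infty (X_t(\omega) - Y_t(\omega))\right| < \infty$. (20.47) therefore implies (20.46). ∎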

Now we are able to prove the following extension of 20.11.

20.14 Theorem For $1 \le p \le 2$, let $E_1 = \left\{\sum_{t=1}^\infty E(|X_t|^p \mid \mathcal{F}_{t-1})/W_t^p < \infty\right\}$ and $E_2 = \{W_t \uparrow \infty\}$. Under the conditions of 20.12,

$$P\bigl((E_1 \cap E_2) - \{S_n/W_n \to 0\}\bigr) = 0. \tag{20.48}$$

Proof The basic line of argument follows closely that of 20.11. As before, let $Y_t = 1_{\{|X_t| \le W_t\}} X_t$, so that $Z_t = Y_t - E(Y_t \mid \mathcal{F}_{t-1})$ is a m.d. and

$$E(Z_t^2 \mid \mathcal{F}_{t-1}) = E(Y_t^2 \mid \mathcal{F}_{t-1}) - \bigl(E(Y_t \mid \mathcal{F}_{t-1})\bigr)^2 \le E(Y_t^2 \mid \mathcal{F}_{t-1}), \text{ a.s.} \tag{20.49}$$

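Applying 20.5 and the last inequality,

$$P\left(E_1 - \left\{\sum_{t=1}^\infty E(Z_t^2 \mid \mathcal{F}_{t-1})/W_t^2 < \infty\right\}\right) = 0. \tag{20.50}$$

It follows by 15.11 and the fact that $E_1 - C \subseteq (E_1 - D) \cup (D - C)$ that

$$P\left(E_1 - \left\{\sum_{t=1}^n Z_t/W_t \to S_1\right\}\right) = 0, \tag{20.51}$$

where $S_1$ is some a.s. finite random variable. A second application of 20.5 gives

$$P\left(E_1 - \left\{\sum_{t=1}^\infty |E(Y_t \mid \mathcal{F}_{t-1})|/W_t < \infty\right\}\right) = 0, \tag{20.52}$$

which is equivalent (by 2.24) to

$$P\left(E_1 - \left\{\sum_{t=1}^n E(Y_t \mid \mathcal{F}_{t-1})/W_t \to S_2\right\}\right) = 0, \tag{20.53}$$

where $S_2$ is another a.s. finite r.v. And a third application of 20.5 together with 20.13 gives

$$P\left(E_1 - \left\{\sum_{t=1}^n (X_t - Y_t)/W_t \to S_3\right\}\right) = 0, \tag{20.54}$$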

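for some a.s. finite r.v. $S_3$. Now (20.51), (20.53), (20.54), the definition of $Z_t$, the Kronecker lemma and some more set algebra yield, as required,

$$0 = P\left(\left(E_1 - \left\{\sum_{t=1}^n X_t/W_t \to S_1 + S_2 + S_3\right\}\right) \cap E_2\right) = P\left((E_1 \cap E_2) - \left\{W_n^{-1}\sum_{t=1}^n X_t \to 0\right\}\right) = P\bigl((E_1 \cap E_2) - \{S_n/W_n \to 0\}\bigr). \tag{20.55}$$ ∎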

20.5 Two Strong Laws for Mixingales

The martingale difference assumption is specialized, and the last results are not sufficient to support a general treatment of dependent processes, although they are the central prop. The key to extending them, as in the weak law case, is the mixingale concept. In this section we contrast two approaches to proving mixingale strong convergence. The first applies a straightforward generalization of the methods introduced by McLeish (1975a); see also Hansen (1991, 1992a) for related results. We have two versions of the theorem to choose from, a milder constraint on the dependence being available in return for the existence of second moments.

20.15 Theorem Let the sequence $\{X_t, \mathcal{F}_t\}_{-\infty}^\infty$ be an $L_p$-mixingale with respect to constants $\{c_t\}$, for either

(i) $p = 2$, with mixingale size $-\tfrac{1}{2}$, or

(ii) $1 < p < 2$, with mixingale size $-1$.

If $\sum_{t=1}^\infty c_t^p < \infty$ then $S_n \xrightarrow{a.s.} S$.

Proof We have the maximal inequality,

$$E\left(\max_{1 \le j \le n} |S_j|^p\right) \le K \sum_{t=1}^n c_t^p, \tag{20.56}$$

where $K$ is a finite constant. This is by 16.10 in the case of (i) and 16.11 in case (ii). By relabelling coordinates it can be expressed in the form

$$E\left(\max_{m < j \le n} |S_j - S_m|^p\right) \le K \sum_{t=m+1}^n c_t^p \tag{20.57}$$

for any choice of $m$ and $n$. Moreover,

$$P\left(\max_{m < j \le n} |S_j - S_m| > \varepsilon\right) = P\left(\max_{m < j \le n} |S_j - S_m|^p > \varepsilon^p\right) \le \frac{1}{\varepsilon^p} E\left(\max_{m < j \le n} |S_j - S_m|^p\right) \tag{20.58}$$

by the Markov inequality. Inequalities (20.57) and (20.58) combine to yield (20.2), and the convergence lemma 20.2 now yields the result. ∎

We can add the usual corollary from Kronecker's lemma.

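20.16 Corollary Let $\{X_t/a_t, \mathcal{F}_t\}_{-\infty}^\infty$ satisfy either (i) or (ii) of 20.15 with respect to constants $\{c_t/a_t\}$, for a positive sequence $\{a_t\}$ with $a_t \uparrow \infty$. If

$$\sum_{t=1}^\infty c_t^p/a_t^p < \infty, \tag{20.59}$$

then $S_n/a_n \xrightarrow{a.s.} 0$. □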
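The corollary can be illustrated by simulation in the simplest case. The sketch below assumes i.i.d. $\pm 1$ signs (a martingale difference sequence, so the mixingale conditions hold trivially with $c_t = 1$), $p = 2$ and $a_t = t$, so that $\sum_t c_t^2/a_t^2 = \sum_t t^{-2} < \infty$. The function name `sample_mean_path` and the seed are hypothetical choices made for reproducibility; nothing here is taken from the original text.

```python
import random

def sample_mean_path(n, seed=0):
    """Return the path S_t / t, t = 1..n, for i.i.d. +/-1 increments."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    s = 0.0
    path = []
    for t in range(1, n + 1):
        s += rng.choice((-1.0, 1.0))  # zero-mean, bounded martingale difference
        path.append(s / t)
    return path

if __name__ == "__main__":
    path = sample_mean_path(20000)
    for t in (10, 100, 1000, 20000):
        print(t, path[t - 1])
```

A single simulated path proves nothing about almost sure convergence, but it shows the normalized sum staying close to zero once $t$ is large.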

The second result exploits a novel and remarkably powerful argument due to R. M. de Jong (1992).

20.17 Theorem Let $\{X_t, \mathcal{F}_t\}$ be an $L_r$-bounded, $L_1$-mixingale with respect to constants $\{c_t\}$ for $r \ge 1$, and let $\{a_t\}$, $\{B_t\}$ be positive constant sequences, and $\{M_t\}$ a positive integer sequence, with $a_n \uparrow \infty$. If

$$\sum_{n=1}^\infty M_n \exp\left\{-\frac{\varepsilon^2 a_n^2}{32 M_n^2 \sum_{t=1}^n B_t^2}\right\} < \infty, \tag{20.60}$$

$$\sum_{t=1}^\infty B_t^{1-r} E|X_t|^r / a_t < \infty, \tag{20.61}$$

$$\sum_{t=1}^\infty \zeta_{M_t} c_t / a_t < \infty, \tag{20.62}$$

where $\{\zeta_m\}_{m=0}^\infty$ are the mixingale coefficients, then $S_n/a_n \xrightarrow{a.s.} 0$. □

Here $\{B_t\}$ and $\{M_t\}$ are chosen freely to satisfy the conditions, given $\{a_t\}$ and $\{c_t\}$, which suggests a considerable amount of flexibility in application. The sequence $\{B_t\}$ will be used to define a truncation of $\{X_t\}$, the role which was played by $\{a_t\}$ in 20.11. The most interesting of the conditions is (20.62), which explicitly trades off the rate of decrease of the mixingale numbers with that of the sequence $\{c_t/a_t\}$. This approach is in contrast with the McLeish method of defining separate summability conditions for the moments and mixingale numbers, as detailed in §16.3.

Proof Writing $1_t^B$ for $1_{\{|X_t| \le B_t\}}$, start by noting that

$$E_{t+j} X_t = E_{t+j} 1_t^B X_t + E_{t+j}(1 - 1_t^B) X_t.$$

Hence, we have the identity

$$X_t = \bigl(E_{t+M_t-1} 1_t^B X_t - E_{t-M_t} 1_t^B X_t\bigr) + \bigl(E_{t+M_t-1}(1 - 1_t^B) X_t - E_{t-M_t}(1 - 1_t^B) X_t\bigr) + \bigl(X_t - E_{t+M_t-1} X_t\bigr) + E_{t-M_t} X_t, \tag{20.63}$$

and, by the usual 'telescoping sum' argument,

$$E_{t+M_t-1} 1_t^B X_t - E_{t-M_t} 1_t^B X_t = \sum_{j=1-M_t}^{M_t-1} Z_{jt}, \tag{20.64}$$

where $Z_{jt} = E_{t+j} 1_t^B X_t - E_{t+j-1} 1_t^B X_t$ and $\{Z_{jt}, \mathcal{F}_{t+j}\}$ is a m.d. sequence. Note that $|Z_{jt}| \le 2B_t$ a.s., by a double application of 10.13(ii). Summing yields

$$\sum_{t=1}^n X_t = \sum_{t=1}^n \sum_{j=1-M_t}^{M_t-1} Z_{jt} + \sum_{t=1}^n E_{t+M_t-1}(1 - 1_t^B) X_t - \sum_{t=1}^n E_{t-M_t}(1 - 1_t^B) X_t + \sum_{t=1}^n (X_t - E_{t+M_t-1} X_t) + \sum_{t=1}^n E_{t-M_t} X_t$$

$$= S_{1n} + S_{2n} - S_{3n} + S_{4n} + S_{5n}. \tag{20.65}$$

The object will be to show that $S_{kn}/a_n \xrightarrow{a.s.} 0$ for $k = 1,\dots,5$.

Starting with $S_{1n}$, the main task is to reorganize the double sum. It can be verified by inspection that


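$$\sum_{t=1}^n \sum_{j=1-M_t}^{M_t-1} Z_{jt} = \sum_{j=1-M_n}^{M_n-1} \sum_{t=q_j}^n Z_{jt} = \sum_{j=1-M_n}^{M_n-1} \sum_{t=1}^n Z_{jt} - \left(\sum_{j=1-M_n}^{-M_1} + \sum_{j=M_1}^{M_n-1}\right) \sum_{t=1}^{q_j-1} Z_{jt}, \tag{20.66}$$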

where $q_j = 1$ for $-M_1 < j < M_1$, and $q_j = t$ for $-M_t < j \le -M_{t-1}$ and $M_{t-1} \le j < M_t$, for $t = 2,\dots,n$. Note that, for arbitrary numbers $x_1,\dots,x_k$, $\left\{\left|\sum_{i=1}^k x_i\right| > \varepsilon\right\} \subseteq \bigcup_{i=1}^k \{|x_i| > \varepsilon/k\}$. Hence by subadditivity, and Azuma's inequality (15.20),

Mn - 1 n

P I.S3nI > an f X P XZy, > anzlnMnj=1 -Mn >1

-M; Mn- 1 qj- 1

+ 7) + X P Xz, > anelnun'=1-Mn =Af2 >1

,z n

f zfaexp-rzlalnl

(32152'-'NXBlt>1

-Af2 Mn - 1 qj- l

252/32A/2 ; #2+ :j + )j exp-e

n n ,

'=1-Mn .j=Jf2 >1

$$\le 4M_n \exp\left\{-\frac{\varepsilon^2 a_n^2}{32 M_n^2 \sum_{t=1}^n B_t^2}\right\}. \tag{20.67}$$

Under (20.60), these probabilities are summable over $n$ and so $S_{1n}/a_n \xrightarrow{a.s.} 0$ by the first Borel-Cantelli lemma.

Now let $\{Y_t\}$ be any integrable sequence and define

$$S'_n = \sum_{t=1}^n Y_t/a_t. \tag{20.68}$$

By the Markov inequality,

$$P\left(\max_{m < j \le n} |S'_j - S'_m| > \varepsilon\right) \le \frac{1}{\varepsilon} E\left(\max_{m < j \le n} |S'_j - S'_m|\right) \le \frac{1}{\varepsilon} \sum_{t=m+1}^n E|Y_t|/a_t. \tag{20.69}$$

If

$$\sum_{t=1}^\infty E|Y_t|/a_t < \infty, \tag{20.70}$$

then $S'_n \xrightarrow{a.s.} S'$ by an application of 20.2, and hence $a_n^{-1}\sum_{t=1}^n Y_t \xrightarrow{a.s.} 0$ by Kronecker's lemma. We apply this result to each of the remaining terms. For $S_{2n}$, put $Y_t = E_{t+M_t-1} X_t(1 - 1_t^B)$, and note that

$$\sum_{t=1}^n E|Y_t|/a_t \le \sum_{t=1}^n E|(1 - 1_t^B) X_t|/a_t \le \sum_{t=1}^n B_t^{1-r} E|X_t|^r/a_t, \tag{20.71}$$

using the fact that $|X_t(1 - 1_t^B)/B_t| \le |X_t(1 - 1_t^B)/B_t|^r$. $S_{3n}$ is dealt with in exactly the same way. For $S_{4n}$ and $S_{5n}$, put successively $Y_t = X_t - E_{t+M_t-1} X_t$ and $Y_t = E_{t-M_t} X_t$, and by the mixingale assumption,

$$\sum_{t=1}^n E|Y_t|/a_t \le \sum_{t=1}^n (c_t/a_t)\zeta_{M_t}. \tag{20.72}$$

The proof is completed by noting that the majorant terms of (20.71) and (20.72) are bounded in the limit by assumption. ∎

The conditions of 20.17 are rather difficult to apply and interpret. We will restrict them very slightly, to derive a simple summability condition which can be compared directly with 20.15.

20.18 Corollary Let $\{X_t, \mathcal{F}_t\}$ be an $L_r$-bounded, $L_1$-mixingale of size $-\varphi_0$ with respect to constants $\{c_t\}$. If $\{c_t\}$ and $\{a_t\}$ are positive, regularly varying sequences of constants with $\|X_t\|_r \ll c_t$, and $a_n \uparrow \infty$, and

$$\sum_{t=1}^\infty (c_t/a_t)^{\xi_0} < \infty \tag{20.73}$$

where

$$\xi_0 = \frac{2r\varphi_0 + 2(r-1)}{(1+r)\varphi_0 + 2(r-1)}, \tag{20.74}$$

then $S_n/a_n \xrightarrow{a.s.} 0$.

Proof Define $o_t = (\log t)^{-1-\delta}$ for $\delta > 0$. This is slowly varying at infinity by 2.28, and the sequence $\{o_t/t\}$ is summable by 2.31. Apply the conditions of 20.17 with the added stipulation that $\{B_t\}$ and $\{M_t\}$ are regularly varying, increasing sequences, and so consider the conditions for summability of a series of the form $\sum_n U_1(n)\exp\{-\eta U_2(n)\}$, for $\eta > 0$. Since $\sum_n o_n/n$ converges, summability follows from $(n/o_n) U_1(n) \exp\{-\eta U_2(n)\} \to 0$. Taking logarithms, this is equivalent to

$$\log n - \log(o_n) + \log U_1(n) - \eta U_2(n) \to -\infty. \tag{20.75}$$

Since $U(n) = n^{\rho} L(n)$ where $L(n)$ is slowly varying, this condition has the form

$$\log n - \log(o_n) + \rho_1 \log n + \log(L_1(n)) - \eta n^{\rho_2} L_2(n) \to -\infty, \tag{20.76}$$

where $\rho_1$ and $\rho_2$ are non-negative constants and $L_1(n)$ and $L_2(n)$ are slowly varying. The terms $\log(o_n)$ and $\log(L_1(n))$ can be neglected here. Put $\rho_2 = 0$ and $L_2(n) = 1/o_n = (\log n)^{1+\delta}$, and the condition reduces to

$$\bigl(1 + \rho_1 - \eta(\log n)^{\delta}\bigr)\log n \to -\infty, \tag{20.77}$$

which holds for all $\rho_1$, for any $\eta > 0$ and $\delta > 0$. Condition (20.60) is therefore satisfied (recalling that $\{B_t\}$ is monotone) if

$$n M_n^2 B_n^2 / a_n^2 \ll o_n. \tag{20.78}$$

Similarly, conditions (20.61) and (20.62) are satisfied if, respectively,

$$B_t^{1-r}\|X_t\|_r^r/a_t \ll B_t^{1-r} c_t^r/a_t \ll o_t/t, \tag{20.79}$$

and

$$M_t^{-\varphi} c_t/a_t \ll o_t/t, \quad \varphi > \varphi_0. \tag{20.80}$$

We can identify the bounding cases of $B_t$ and $M_t$ by replacing the second order-of-magnitude inequality sign in (20.79), and that in (20.80), by equalities, leaving the required scaling constants implicit. Solving for $M_t$ and $B_t$ in this way, substituting into (20.78), and simplifying yields the condition

$$(c_t/a_t)^{\xi} \ll o_t/t, \tag{20.81}$$

where $\xi = [2r\varphi + 2(r-1)]/[(1+r)\varphi + 2(r-1)]$. This is sufficient for (20.60), (20.61), and (20.62) to hold.
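The simplification leading to (20.81) is left implicit. One way to carry it out is sketched below, under the assumptions that $r > 1$, that scaling constants are ignored, and that the bounding cases of (20.79) and (20.80) hold with equality; the shorthand $x_t = c_t/a_t$ is introduced here only for the calculation.

```latex
\begin{align*}
B_t^{1-r} c_t^r/a_t = o_t/t
  &\;\Longrightarrow\;
  B_t^2 = \left(\frac{c_t^r\, t}{a_t\, o_t}\right)^{2/(r-1)}
        = x_t^{2r/(r-1)}\, a_t^2 \left(\frac{t}{o_t}\right)^{2/(r-1)},\\
M_t^{-\varphi} c_t/a_t = o_t/t
  &\;\Longrightarrow\;
  M_t^2 = x_t^{2/\varphi} \left(\frac{t}{o_t}\right)^{2/\varphi},\\
\frac{t\, M_t^2 B_t^2}{a_t^2} \ll o_t
  &\;\Longleftrightarrow\;
  x_t^{\,2/\varphi + 2r/(r-1)} \ll \left(\frac{o_t}{t}\right)^{1 + 2/\varphi + 2/(r-1)},\\
\xi = \frac{2/\varphi + 2r/(r-1)}{1 + 2/\varphi + 2/(r-1)}
  &= \frac{2r\varphi + 2(r-1)}{(1+r)\varphi + 2(r-1)} .
\end{align*}
```

The last line multiplies numerator and denominator by $\varphi(r-1)$, which reproduces the expression for $\xi$ given with (20.81).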

Since $c_t$ and $a_t$ are specified to be regularly varying, there exist non-negative constants $\rho_3$, $\rho_4$, and slowly varying functions $L_3$ and $L_4$ such that $c_t = t^{\rho_3} L_3(t)$ and $a_t = t^{\rho_4} L_4(t)$. The assumption that $\{(c_t/a_t)^{\xi_0}\}$ is summable implies that $(\rho_3 - \rho_4)\xi_0 \le -1$. But $\varphi > \varphi_0$ implies $\xi > \xi_0$, so that $(\rho_3 - \rho_4)\xi < -1$, which in turn implies (20.81). This completes the proof. ∎

Noting that $1 \le \xi_0 \le 2$, the condition in (20.73) may be compared with (20.59). Put $\varphi_0 = \tfrac{1}{2}$ and $r = 2$ and we obtain $\xi_0 = 8/7$, whereas with $\varphi_0 = 1$, we get $\xi_0 = 2(2r-1)/(3r-1)$ which does not exceed $r$ in the relevant range, taking values between 1 when $r = 1$ and $6/5$ when $r = 2$. Square-summability of $c_t/a_t$ is sufficient only in the limit as both $\varphi_0 \to \infty$ and $r \to \infty$. Thus, this theorem does not contain 20.16. On the other hand, in the cases where $\{c_t\}$ is uniformly bounded and $a_t = t$, we need only $\xi_0 > 1$, so that any $r > 1$ and $\varphi_0 > 0$ will serve. These dependence restrictions are on a par with those of the $L_1$ convergence law of 19.10, and a striking improvement on 20.16. The case $r = 1$ is not permitted for sample averages, but is compatible with $a_t = t(\log t)^{1+\delta}$ for $\delta > 0$. In other words, the theorem shows that

$$\frac{1}{n(\log n)^{1+\delta}} \sum_{t=1}^n X_t \to 0 \text{ a.s.} \tag{20.82}$$

This amounts to saying that the sequence of sample means is almost surely slowly varying as $n \to \infty$; it could diverge, but no faster than a power of $\log n$.
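The numerical values quoted in this comparison can be reproduced with exact rational arithmetic. The sketch below simply evaluates the exponent of (20.74); the function name `xi0` is a hypothetical label, and the argument values are the ones discussed in the text.

```python
from fractions import Fraction

def xi0(phi0, r):
    """Exponent of (20.74): (2 r phi0 + 2(r-1)) / ((1+r) phi0 + 2(r-1))."""
    phi0, r = Fraction(phi0), Fraction(r)
    return (2 * r * phi0 + 2 * (r - 1)) / ((1 + r) * phi0 + 2 * (r - 1))

if __name__ == "__main__":
    print(xi0(Fraction(1, 2), 2))   # size -1/2 with r = 2
    print(xi0(1, 2))                # size -1 with r = 2
    print(xi0(1, 1))                # r = 1 gives exponent 1
    print(float(xi0(10**6, 10**6))) # approaches 2 as both arguments grow
```

Using `Fraction` keeps the arithmetic exact, so the output can be compared directly with the fractions in the text.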

20.6 Near-Epoch Dependent and Mixing Processes

In view of the last results, there are two possible approaches to the NED case. It turns out that neither approach dominates the other in terms of permissible conditions. We begin with the simplest of the arguments, the straightforward extension of 20.18.

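20.19 Theorem Let a sequence $\{X_t\}_{-\infty}^\infty$ with means $\{\mu_t\}_{-\infty}^\infty$ be $L_p$-NED of size $-b$, for $1 \le p \le 2$, with constants $d_t \ll \|X_t - \mu_t\|_p$, on a (possibly vector-valued) sequence $\{V_t\}_{-\infty}^\infty$ which is $\alpha$-mixing ($\varphi$-mixing) of size $-a$. If

$$\sum_{t=1}^\infty \|(X_t - \mu_t)/a_t\|_q^{\xi} < \infty \tag{20.83}$$

for $q > p$ in the $\alpha$-mixing case ($q \ge p$ in the $\varphi$-mixing case) where

$$\xi = \min\left\{\frac{2qb + 2(q-1)}{(1+q)b + 2(q-1)},\ \frac{2q(a+1)}{(1+q)a + 2q}\right\}, \tag{20.84}$$

then $a_n^{-1}\sum_{t=1}^n (X_t - \mu_t) \xrightarrow{a.s.} 0$.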

Proof By 17.5, $\{X_t - \mu_t\}$ is an $L_1$-mixingale of size $-\min\{b, a(1 - 1/q)\}$ with respect to constants $\{c_t\}$, with $c_t \ll \|X_t - \mu_t\|_q$. This is by 17.5(i) in the $\alpha$-mixing case and by 17.5(ii) in the $\varphi$-mixing case. The theorem follows by 20.18, after substituting for $\varphi_0$ in (20.74) and simplifying. ∎

This permits arbitrary mixing and NED sizes and arbitrary moment restrictions, so long as (20.83) holds with $\xi$ arbitrarily close to 1. By letting $b \to \infty$ one obtains a result for mixing sequences, and by letting $a \to \infty$ a result for sequences that are $L_p$-NED on an independent underlying process. Interestingly, in each of these special cases $\xi$ ranges over the interval $(1, 2q/(q+1))$ as the mixing/$L_p$-NED size is allowed to range from zero to $-\infty$.
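The behaviour of the exponent in these special cases can be tabulated directly. The sketch below assumes, following the proof of 20.19, that the relevant mixingale size is $\min\{b, a(1 - 1/q)\}$ and that this value is substituted for the size in (20.74) with $r = q$; the function name `xi_ned` is hypothetical.

```python
from fractions import Fraction

def xi_ned(q, a, b):
    """Exponent for an L_p-NED process: plug phi0 = min{b, a(1 - 1/q)} into (20.74) with r = q."""
    q, a, b = Fraction(q), Fraction(a), Fraction(b)
    phi0 = min(b, a * (1 - 1 / q))  # assumed L_1-mixingale size (sign dropped)
    return (2 * q * phi0 + 2 * (q - 1)) / ((1 + q) * phi0 + 2 * (q - 1))

if __name__ == "__main__":
    q = 2
    print(xi_ned(q, 1, Fraction(1, 2)))          # a = 1, b = 1/2
    print(float(xi_ned(q, 10**9, 10**9)))        # close to 2q/(q+1) = 4/3
    print(float(xi_ned(q, Fraction(1, 10**9), 1)))  # close to 1
```

With $q = 2$ the exponent moves from near 1 (very weak size restrictions) towards $2q/(q+1) = 4/3$ (very strong ones), matching the interval stated above.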

By contrast, a result based on 20.15 would be needed if we could claim only square-summability of the sequence $\{\|(X_t - \mu_t)/a_t\|_p\}$ for finite $p$; this rules out (20.83) for any choices of $a$ and $b$. The first of these results comes directly by applying 17.5.

20.20 Theorem For real numbers $b$, $p$ and $r$, let a sequence $\{X_t\}_{-\infty}^\infty$ with means $\{\mu_t\}_{-\infty}^\infty$ be $L_p$-NED of size $-b$ on a sequence $\{V_t\}_{-\infty}^\infty$, with constants $d_t \ll \|X_t - \mu_t\|_p$. For a positive constant sequence $\{a_t\} \uparrow \infty$, let $\{(X_t - \mu_t)/a_t\}$ be uniformly $L_r$-bounded, and let

$$\sum_{t=1}^\infty \|(X_t - \mu_t)/a_t\|_r^p < \infty. \tag{20.85}$$

Then $a_n^{-1}\sum_{t=1}^n (X_t - \mu_t) \xrightarrow{a.s.} 0$ in each of the following cases:

(i) $b = \tfrac{1}{2}$, $p = 2$, $r > 2$, $\{V_t\}$ is $\alpha$-mixing of size $-r/(r-2)$;

(ii) $b = 1$, $1 < p < 2$, $r > p$, $\{V_t\}$ is $\alpha$-mixing of size $-pr/(r-p)$;

(iii) $b = \tfrac{1}{2}$, $p = 2$, $r \ge 2$, $\{V_t\}$ is $\varphi$-mixing of size $-r/2(r-1)$;

(iv) $b = 1$, $1 < p < 2$, $r \ge p$, $\{V_t\}$ is $\varphi$-mixing of size $-r/(r-1)$.

Proof By 17.5, conditions (i)-(iv) are all sufficient for $\{(X_t - \mu_t)/a_t, \mathcal{F}_t\}$ to be an $L_p$-mixingale of size $-b$, where $\mathcal{F}_t = \sigma(V_s, s \le t)$. The mixingale constants are $c_t/a_t \ll \max\{d_t, \|X_t - \mu_t\|_r\}/a_t = \|X_t - \mu_t\|_r/a_t$. The theorem follows by 20.16. ∎
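The four mixing sizes in the theorem are exactly those that deliver a mixingale size of $-b$. The sketch below checks this arithmetic under an assumption about 17.5 that is not restated in this section: that an $L_p$-NED process of size $-b$ on an $\alpha$-mixing ($\varphi$-mixing) process of size $-a$ is an $L_p$-mixingale of size $-\min\{b, a(1/p - 1/r)\}$ (respectively $-\min\{b, a(1 - 1/r)\}$). The function name `mixingale_size` and the sample values of $p$ and $r$ are hypothetical.

```python
from fractions import Fraction

def mixingale_size(b, a, p, r, mixing):
    """Magnitude of the mixingale size under the assumed form of 17.5."""
    b, a, p, r = (Fraction(v) for v in (b, a, p, r))
    if mixing == "alpha":
        return min(b, a * (1 / p - 1 / r))
    if mixing == "phi":
        return min(b, a * (1 - 1 / r))
    raise ValueError("mixing must be 'alpha' or 'phi'")

if __name__ == "__main__":
    r = Fraction(4)
    print(mixingale_size(Fraction(1, 2), r / (r - 2), 2, r, "alpha"))       # case (i)
    p = Fraction(3, 2)
    print(mixingale_size(1, p * r / (r - p), p, r, "alpha"))                # case (ii)
    print(mixingale_size(Fraction(1, 2), r / (2 * (r - 1)), 2, r, "phi"))   # case (iii)
    print(mixingale_size(1, r / (r - 1), p, r, "phi"))                      # case (iv)
```

In each case the mixing term equals $b$ exactly, so neither the NED size nor the mixing size is slack.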

As an example, let $X_t$ possess moments of all orders and be $L_2$-NED of size $-\tfrac{1}{2}$, on an $\alpha$-mixing process of size close to $-1$ (letting $r \to \infty$). Summability of the terms $\mathrm{Var}(X_t/a_t)$ is sufficient by (20.85). The same numbers yield $\xi = 8/7$ on putting $q = 2$ and $a = 1$ in (20.84), which is not far from requiring summability of the $L_2$-norms.

However, this theorem requires $L_r$-boundedness, which if $r$ is small constrains the permitted mixing size, as well as offering poor NED size characteristics for cases with $p < 2$. It can be improved upon in these situations by introducing a truncation argument. The third of our strong laws is the following.

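20.21 Theorem Let a sequence $\{X_t\}_{-\infty}^\infty$ with means $\{\mu_t\}_{-\infty}^\infty$ be $L_p$-NED of size $-1/p$ for $1 \le p \le 2$, with constants $d_t \ll \|X_t - \mu_t\|_p$, on a sequence $\{V_t\}_{-\infty}^\infty$ which is either

(i) $\alpha$-mixing of size $-r/(r-2)$ for $r > 2$, or

(ii) $\varphi$-mixing of size $-r/2(r-1)$ for $r > 1$, and $r \ge p$;

and for $q$ with $p \le q \le r$ and a constant positive sequence $\{a_t\} \uparrow \infty$, let

$$\sum_{t=1}^\infty \|(X_t - \mu_t)/a_t\|_q^{\min\{p,\, 2q/r\}} < \infty; \tag{20.86}$$

then $a_n^{-1}\sum_{t=1}^n (X_t - \mu_t) \xrightarrow{a.s.} 0$. □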

Note the different roles of the three constants specified in the conditions. $p$ controls the size of the NED numbers, $q$ is the minimum order of moment required to exist, and $r$ controls the mixing of $\{V_t\}$. The distribution of $X_t$ does not otherwise depend on $r$.

Proof The strategy is to show that there is a sequence equivalent to $\{(X_t - \mu_t)/a_t\}$, and satisfying the conditions of 20.15(i). As in 20.6, let

$$Y_t = \frac{X_t - \mu_t}{a_t} 1_t^a \pm (1 - 1_t^a), \tag{20.87}$$

where $1_t^a = 1_{\{|X_t - \mu_t| \le a_t\}}$ and '$\pm$' denotes '$+$' if $X_t > \mu_t$, '$-$' otherwise. Note that $(X_t - \mu_t)/a_t$ is $L_p$-NED with constants $d_t/a_t$, and $Y_t$ is a continuous function of $(X_t - \mu_t)/a_t$ with $|Y_t| \le 1$ a.s. Applying 17.13 shows that $Y_t$ is $L_2$-NED on $\{V_t\}$ of size $-\tfrac{1}{2}$, with constants $2^{1-p/2}(d_t/a_t)^{p/2}$. Since $\|Y_t\|_r < \infty$ for every finite $r$, it further follows by 17.5 that if $\mathcal{F}_t = \sigma(V_{t-s}, s \ge 0)$, $\{Y_t - E(Y_t), \mathcal{F}_t\}$ is an $L_2$-mixingale of size $-\tfrac{1}{2}$ with constants

$$c_t \ll \max\{(d_t/a_t)^{p/2}, \|Y_t\|_r\}. \tag{20.88}$$

Here, $d_t \le 2\|X_t - \mu_t\|_q$ for any $q \ge p$, and $\|Y_t\|_r \le 2(\|X_t - \mu_t\|_q/a_t)^{q/r}$ for any $q \le r$ by the second inequality of (20.24). Condition (20.86) is therefore sufficient for the sequence $\{c_t^2\}$ to be summable, and $\{Y_t - E(Y_t)\}$ satisfies the conditions of 20.15(i). We can conclude that $\sum_{t=1}^n (Y_t - E(Y_t)) \xrightarrow{a.s.} S_1$, where $S_1$ is some random variable.

According to 20.6, condition (20.86) is sufficient for $\sum_{t=1}^\infty |E(Y_t)| < \infty$. The series $\sum_{t=1}^n E(Y_t)$ therefore converges to a finite limit by 2.24, say $\sum_{t=1}^n E(Y_t) \to C_1$, and

$$\sum_{t=1}^n Y_t \xrightarrow{a.s.} S_1 + C_1. \tag{20.89}$$

Inequalities (20.22) and (20.6) further imply that $Y_t$ and $(X_t - \mu_t)/a_t$ are equivalent sequences, and hence

$$\sum_{t=1}^n \bigl((X_t - \mu_t)/a_t - Y_t\bigr) \xrightarrow{a.s.} S_2, \tag{20.90}$$

where $S_2$ is another random variable, by 20.3. We conclude that

$$\sum_{t=1}^n (X_t - \mu_t)/a_t \xrightarrow{a.s.} S_1 + S_2 + C_1 = S_3, \tag{20.91}$$

say. It follows by Kronecker's lemma that $a_n^{-1}\sum_{t=1}^n (X_t - \mu_t) \xrightarrow{a.s.} 0$, the required conclusion. ∎

Here is a final, more specialized, result. The linear function of martingale differences with summable coefficients is a case of particular interest since it unifies our two approaches to the strong law.

20.22 Theorem Let $X_t = \sum_{j=-\infty}^\infty \theta_j U_{t-j}$ where $\{U_t\}$ is a uniformly $L_p$-bounded m.d. sequence with $p > 1$, and $\sum_{j=-\infty}^\infty |\theta_j| < \infty$. Then

$$n^{-1}\sum_{t=1}^n X_t \xrightarrow{a.s.} 0.$$

Proof $Y_t = X_t/t$ is an $L_p$-mixingale with $c_t \ll 1/t$ and arbitrary size. It was shown in 16.12 that the maximal inequality (20.56) holds for this case. Application of the convergence lemma and Kronecker's lemma lead directly to the result. Alternatively, apply 20.18 to $X_t$ with $a_t = t$. ∎

In these results, four features summarize the relevant characteristics of the stochastic process: the order of existing moments, the summability characteristics of the moments, and the sizes of the mixing and near-epoch dependence numbers. The way in which the currently available theorems trade off these features suggests that some unification should be possible. The McLeish-style argument is revealed by de Jong's approach to be excessively restrictive with respect to the dependence conditions it imposes, whereas the tough summability conditions the latter's theorem requires may also be an artefact of the method adopted. The repertoire of dependent strong laws is currently being extended (de Jong, 1994) in work as yet too recent for incorporation in this book.


21 Uniform Stochastic Convergence

21.1 Stochastic Functions on a Parameter Space

The setting for this chapter is the class of functions

$$f: \Omega \times \Theta \mapsto \overline{\mathbb{R}},$$

where $(\Omega, \mathcal{F}, \mu)$ is a measure space, and $(\Theta, \rho)$ is a metric space. We write $f(\omega,\theta)$ to denote the real value assumed by $f$ at the point $(\omega,\theta)$, which is a random variable for fixed $\theta$. But $f(\omega,\cdot)$, alternatively written just $f(\omega)$, is not a random variable, but a random element of a space of functions.

Econometric analysis is very frequently concerned with this type of object. Log-likelihoods, sums of squares, and other criterion functions for the estimation of econometric models, and also the first and second derivatives of these criterion functions, are all the subject of important convergence theorems on which proofs of consistency and the derivation of limiting distributions are based. Except in a restricted class of linear models, all of these are typically functions both of the model parameters and of random data.

To deal with convergence on a function space, it is necessary to have a criterion by which to judge when two functions are close to one another. In this chapter we examine the questions posed by stochastic convergence (almost sure or in probability) when the relevant space of functions is endowed with the uniform metric. A class of set functions that are therefore going to be central to our discussion have the form $f^*: \Omega \mapsto \overline{\mathbb{R}}$ where

$$f^*(\omega) = \sup_{\theta \in \Theta} f(\omega,\theta). \tag{21.1}$$

For example, if $g$ and $h$ are two stochastic functions whose uniform proximity is at issue, we would be interested in the supremum of

$$f(\omega,\theta) = |g(\omega,\theta) - h(\omega,\theta)|.$$

An important technical problem arises here which ought to be confronted at the outset. We have not so far given any results that would justify treating $f^*$ as a random variable, when $(\Theta, \rho)$ may be an arbitrary metric space. We can write

$$\{\omega: f^*(\omega) > x\} = \bigcup_{\theta \in \Theta} \{\omega: f(\omega,\theta) > x\}, \tag{21.2}$$

and the results of 3.26 show that $\{\omega: f^*(\omega) > x\} \in \mathcal{F}$ when $\{\omega: f(\omega,\theta) > x\} \in \mathcal{F}$ for each $\theta$, when $\Theta$ is a countable set. But typically $\Theta$ is a subset of $(\mathbb{R}^k, d_E)$ or something of the kind, and is uncountable.


This is one of a class of measurability problems having ramifications far beyond the uniform convergence issue, and to handle it properly requires a mathematical apparatus going beyond what is covered in Chapter 3. We shall not attempt to deal with this question in depth, and will offer no proofs in this instance. We will merely outline the main features of the theory required for its solution. The essential step is to recognize that the set on the left-hand side of (21.2) can be expressed as a projection.

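Let $\mathcal{B}_\Theta$ denote the Borel field of subsets of $\Theta$, that is, the smallest $\sigma$-field containing the sets of $\Theta$ that are open with respect to $\rho$. Then let $(\Omega \times \Theta, \mathcal{F} \otimes \mathcal{B}_\Theta)$ denote the product space endowed with the product $\sigma$-field (the $\sigma$-field generated from the measurable rectangles of $\mathcal{F}$ and $\mathcal{B}_\Theta$), and suppose that $f(\cdot,\cdot)$ is $\mathcal{F} \otimes \mathcal{B}_\Theta/\mathcal{B}$-measurable. Observe that, if

$$A_x = \{(\omega,\theta): f(\omega,\theta) > x\} \in \mathcal{F} \otimes \mathcal{B}_\Theta, \tag{21.3}$$

the projection of $A_x$ into $\Omega$ is

$$E_x = \{\omega: f(\omega,\theta) > x,\ \theta \in \Theta\} = \{\omega: f^*(\omega) > x\}. \tag{21.4}$$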

In view of 3.24, measurability of $f^*$ is equivalent to the condition that $E_x \in \mathcal{F}$ for rational $x$. Projections are not as a rule measurable transformations, but under certain conditions it can be shown that $E_x \in \mathcal{F}^\mu$, where $(\Omega, \mathcal{F}^\mu, \bar{\mu})$ is the completion of the probability space.

The key notion is that of an analytic set. A standard reference on this topic is Dellacherie and Meyer (1978); see also Dudley (1989: ch. 13), and Stinchcombe and White (1992). The latter authors provide the following definition. Letting $(\Omega, \mathcal{F})$ be a measurable space, a set $E \subset \Omega$ is called $\mathcal{F}$-analytic if there exists a compact metric space $(\Theta, \rho)$ such that $E$ is the projection onto $\Omega$ of a set $A \in \mathcal{F} \otimes \mathcal{B}_\Theta$. The collection of $\mathcal{F}$-analytic sets is written $\mathcal{A}(\mathcal{F})$. Also, a function $f: \Omega \mapsto \overline{\mathbb{R}}$ is called $\mathcal{F}$-analytic if $\{\omega: f(\omega) \le x\} \in \mathcal{A}(\mathcal{F})$ for each $x \in \overline{\mathbb{R}}$.

Since every $E \in \mathcal{F}$ is the projection of $E \times \Theta \in \mathcal{F} \otimes \mathcal{B}_\Theta$, $\mathcal{F} \subseteq \mathcal{A}(\mathcal{F})$. A measurable set (or function) is therefore also analytic. $\mathcal{A}(\mathcal{F})$ is not in general a $\sigma$-field, although it can be shown to be closed under countable unions and countable intersections. The conditions under which an image under projection is known to be analytic are somewhat weaker than the definition might suggest, and it will actually suffice to let $(\Theta, \mathcal{B}_\Theta)$ be a Souslin space, that is, a space that is measurably isomorphic to an analytic subset of a compact metric space. A sufficient condition, whose proof can be extracted from the results in Stinchcombe and White (1992), is the following:

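21.1 Theorem Let $(\Omega, \mathcal{F})$ be a measurable space and $(\Theta, \mathcal{B}_\Theta)$ a Souslin space. If $B \in \mathcal{A}(\mathcal{F} \otimes \mathcal{B}_\Theta)$, the projection of $B$ onto $\Omega$ is in $\mathcal{A}(\mathcal{F})$. □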

Now given the measurable space $(\Omega, \mathcal{F})$, define $\mathcal{F}^U = \bigcap_\mu \mathcal{F}^\mu$, where $(\Omega, \mathcal{F}^\mu, \bar{\mu})$ is the completion of the probability space $(\Omega, \mathcal{F}, \mu)$ (see 3.7) and the intersection is taken over all p.m.s $\mu$ defined on the space. The elements of $\mathcal{F}^U$ are called universally measurable sets. The key conclusion, from Dellacherie and Meyer (1978: III.33(a)), is the following.

21.2 Theorem For a measurable space $(\Omega, \mathcal{F})$,

$$\mathcal{A}(\mathcal{F}) \subseteq \mathcal{F}^U. \tag{21.5}$$ □

Since by definition $\mathcal{F}^U \subseteq \mathcal{F}^\mu$ for any choice of $\mu$, it follows that the analytic sets of $\mathcal{F}$ are measurable under the completion of $(\Omega, \mathcal{F}, \mu)$ for any choice of $\mu$. In other words, if $E$ is analytic there exist $A, B \in \mathcal{F}$ such that $A \subseteq E \subseteq B$ and $\mu(A) = \mu(B)$. In this sense we say that analytic sets are 'nearly' measurable. All the standard probabilistic arguments, and in particular the values of integrals, will be unaffected by this technical non-measurability, and we can ignore it. We can legitimately treat $f^*(\omega)$ as a random variable, provided the conditions on $\Theta$ are observed and we can assume $f(\cdot,\cdot)$ to be (near-)$\mathcal{F} \otimes \mathcal{B}_\Theta/\mathcal{B}$-measurable.

An analytic subset of a compact space need not be compact but must be totally bounded. It is convenient that we do not have to insist on compactness of the parameter space, since the latter is often required to be open, thanks to strict inequality constraints (think of variances, stable roots of polynomials and the like). In the convergence results below, we find that $\Theta$ will in any case have to be totally bounded for completely different reasons: to ensure equicontinuity; to ensure that the stochastic functions have bounded moments; and that when a stochastic criterion function is being optimized with respect to $\theta$, the optimum is usually required to lie almost surely in the interior of a compact set. Hence, total boundedness is not an extra restriction in practice.

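The measurability condition on $f(\omega,\theta)$ might be verifiable using an argument from simple functions. It is certainly necessary by 4.19 that the cross-section functions $f(\cdot,\theta): \Omega \mapsto \overline{\mathbb{R}}$ and $f(\omega,\cdot): \Theta \mapsto \overline{\mathbb{R}}$ be, respectively, $\mathcal{F}/\mathcal{B}$-measurable for each $\theta \in \Theta$ and $\mathcal{B}_\Theta/\mathcal{B}$-measurable for each $\omega \in \Omega$. For a finite partition $\{\Theta_1,\dots,\Theta_m\}$ of $\Theta$ by $\mathcal{B}_\Theta$-sets, consider the functions

$$f_{(m)}(\omega,\theta) = f(\omega,\theta_j), \quad \theta \in \Theta_j,\ j = 1,\dots,m, \tag{21.6}$$

where $\theta_j$ is a point of $\Theta_j$. If $E_j^x = \{\omega: f(\omega,\theta_j) \le x\} \in \mathcal{F}$ for each $j$ then

$$A_x = \{(\omega,\theta): f_{(m)}(\omega,\theta) \le x\} = \bigcup_j E_j^x \times \Theta_j \in \mathcal{F} \otimes \mathcal{B}_\Theta, \tag{21.7}$$

being a finite union of measurable rectangles. Since this is true for any $x$, $f_{(m)}$ is $\mathcal{F} \otimes \mathcal{B}_\Theta/\mathcal{B}$-measurable. The question to be addressed in any particular case is whether a sequence of such partitions can be constructed such that $f_{(m)} \to f$ as $m \to \infty$.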

Henceforth we shall assume without further comment that suprema of stochastic functions are random variables. The following result should be carefully noted, not least because of its deceptive similarity to the monotone convergence theorem, although this inequality goes the opposite way. The monotone convergence theorem concerns the expectation of the supremum of a class of functions $\{f_n(\omega)\}$, whereas the present one is more precisely concerned with the envelope of a class of functions, the function $f^*(\omega)$ which assumes the value $\sup_{\theta \in \Theta} f(\omega,\theta)$ at each point of $\Omega$.

21.3 Theorem $\displaystyle \sup_{\theta \in \Theta} E(f(\theta)) \le E\left(\sup_{\theta \in \Theta} f(\theta)\right)$.

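Proof Appealing to 3.28, it will suffice to prove this inequality for simple functions. A simple function depending on $\theta$ has the form

$$\varphi(\omega,\theta) = \sum_{i=1}^m \alpha_i(\theta) 1_{E_i}(\omega) = \alpha_i(\theta), \quad \omega \in E_i. \tag{21.8}$$

Defining $\alpha_i^* = \sup_{\theta \in \Theta} \alpha_i(\theta)$,

$$\sup_{\theta \in \Theta} \varphi(\omega,\theta) = \alpha_i^*, \quad \omega \in E_i. \tag{21.9}$$

Hence

$$\sup_{\theta \in \Theta} E(\varphi(\theta)) - E\left(\sup_{\theta \in \Theta} \varphi(\theta)\right) = \sup_{\theta \in \Theta} \sum_{i=1}^m (\alpha_i(\theta) - \alpha_i^*) P(E_i) \le 0, \tag{21.10}$$

where the final inequality is by definition of $\alpha_i^*$. ∎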
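The inequality is easy to check by hand for a small simple function. The sketch below assumes a finite parameter set and a finite partition $E_1,\dots,E_m$, with `alpha[k][i]` holding the value $\alpha_i(\theta_k)$ and `probs[i]` holding $P(E_i)$; the function names and the numbers in the example are hypothetical.

```python
def sup_of_expectations(alpha, probs):
    """sup over theta of E[phi(theta)] for a simple function phi."""
    return max(sum(a * p for a, p in zip(row, probs)) for row in alpha)

def expectation_of_sup(alpha, probs):
    """E[sup over theta of phi(theta)]: form the envelope alpha*_i on each E_i first."""
    envelope = [max(row[i] for row in alpha) for i in range(len(probs))]
    return sum(a * p for a, p in zip(envelope, probs))

if __name__ == "__main__":
    # Two parameter values, two equally likely events; each theta is "good" on one event.
    alpha = [[1.0, 0.0],
             [0.0, 1.0]]
    probs = [0.5, 0.5]
    print(sup_of_expectations(alpha, probs), expectation_of_sup(alpha, probs))
```

In this example no single $\theta$ attains the envelope on both events, which is why the inequality is strict.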

21.2 Pointwise and Uniform Stochastic Convergence

Consider the convergence (a.s., in pr., in $L_p$, etc.) of the sequence $\{Q_n(\theta)\}$ to a limit function $Q(\theta)$. Typically this is a law-of-large-numbers-type problem, with

$$Q_n(\theta) = \sum_{t=1}^n q_{nt}(\theta) \tag{21.11}$$

(we use array notation for generality, but the case $q_{nt} = q_t/n$ may usually be assumed), and $Q(\theta) = \lim_{n \to \infty} E(Q_n(\theta))$. Alternatively, we may want to consider the case $G_n(\theta) \to 0$ where

$$G_n(\theta) = \sum_{t=1}^n \bigl(q_{nt}(\theta) - E(q_{nt}(\theta))\bigr). \tag{21.12}$$

By considering (21.12) we divide the problem into two parts, the stochastic convergence of the sum of the mean deviations to zero, and the nonstochastic convergence assumed in the definition of $Q(\theta)$. This raises the separate question of whether the latter convergence is uniform, which is a matter for the problem at hand and will not concern us here.

As we have seen in previous chapters, obedience to a law of large numbers calls for both the boundedness and the dependence of the sequence to be controlled. In the case of a function on $\Theta$, the dependence question presents no extra difficulty; for example, if $q_{nt}(\theta_1)$ is a mixing or near-epoch dependent array of a given class, the property will generally be shared by $q_{nt}(\theta_2)$, for any $\theta_1, \theta_2 \in \Theta$. But the existence of particular moments is clearly not independent of $\theta$. If there exists a positive array $\{D_{nt}\}$ such that $|q_{nt}(\theta)| \le D_{nt}$ for all $\theta \in \Theta$, and $\|D_{nt}\|_r < \infty$, uniformly in $t$ and $n$, $q_{nt}(\theta)$ is said to be $L_r$-dominated. To ensure pointwise convergence on $\Theta$, we need to postulate the existence of a dominating array. There is no problem if the $q_{nt}(\theta)$ are bounded functions of $\theta$. More generally it is necessary to bound $\Theta$, but since $\Theta$ will often have to be bounded for a different set of reasons, this does not necessarily present an additional restriction.

Given restrictions on the dependence plus suitable domination conditions, pointwise stochastic convergence follows by considering $\{G_n(\theta)\}$ as an ordinary stochastic sequence, for each $\theta \in \Theta$. However, this line of argument does not guarantee that there is a minimum rate of convergence which applies for all $\theta$, the condition of uniform convergence. If pointwise convergence of $\{G_n(\theta)\}$ to the limit $G(\theta)$ is defined by

$$G_n(\theta) \to 0 \text{ (a.s., in } L_p, \text{ or in pr.), each } \theta \in \Theta, \tag{21.13}$$

a sequence of stochastic functions $\{G_n(\theta)\}$ is said to converge uniformly (a.s., in $L_p$, or in pr.) on $\Theta$ if

$$\sup_{\theta \in \Theta} |G_n(\theta)| \to 0 \text{ (a.s., in } L_p, \text{ or in pr.)}. \tag{21.14}$$

To appreciate the difference, consider the following example.

21.4 Example Let $\Theta = [0,\infty)$, and define a zero-mean array $\{g_{nt}(\theta)\}$ where

$$g_{nt}(\theta) = \frac{h_t}{n} + \begin{cases} Z\theta, & 0 \le \theta \le 1/2n \\ Z(1/n - \theta), & 1/2n < \theta \le 1/n \\ 0, & 1/n < \theta < \infty \end{cases} \tag{21.15}$$

where $\{h_t\}$ is a zero-mean stochastic sequence, and $Z$ is a binary r.v. with $P(Z = 1) = P(Z = -1) = \tfrac{1}{2}$. Then $G_n(\theta) = \sum_{t=1}^n g_{nt}(\theta) = H_n + K_n(\theta)$, where $H_n = n^{-1}\sum_{t=1}^n h_t$ and

$$K_n(\theta) = \begin{cases} Zn\theta, & 0 \le \theta \le 1/2n \\ Z(1 - n\theta), & 1/2n < \theta \le 1/n \\ 0, & 1/n < \theta < \infty. \end{cases} \tag{21.16}$$

We assume $H_n \xrightarrow{a.s.} 0$. Since $G_n(\theta) = H_n$ for $\theta > 1/n$ as well as for $\theta = 0$, $G_n(\theta) \xrightarrow{a.s.} 0$ for each fixed $\theta \in \Theta$. In other words, $G_n(\theta)$ converges pointwise to zero, a.s. However, $\sup_{\theta \in \Theta} |K_n(\theta)| = |\tfrac{1}{2}Z| = \tfrac{1}{2}$ for every $n \ge 1$. Because $H_n$ converges a.s. there will exist $N$ such that $|H_n| < \tfrac{1}{4}$ for all $n \ge N$, with probability 1. You can verify that when $|H_n| < \tfrac{1}{4}$ the supremum on $\Theta$ of $|H_n + K_n(\theta)|$ is always attained at the point $\theta = 1/2n$. Hence, with probability 1,

$$\sup_{\theta \in \Theta} |G_n(\theta)| = |H_n + \tfrac{1}{2}Z| \ \text{for } n \ge N, \quad \to \tfrac{1}{2} \ \text{as } n \to \infty. \tag{21.17}$$

It follows that the uniform a.s. limit of $G_n(\theta)$ is not zero. Similarly, for $n \ge N$ and any $0 < \varepsilon < \tfrac{1}{2}$,

$$P\Big(\sup_{\theta \in \Theta} |G_n(\theta)| \ge \varepsilon\Big) = P(|H_n + \tfrac{1}{2}Z| \ge \varepsilon) \to P(|\tfrac{1}{2}Z| \ge \varepsilon) = 1, \tag{21.18}$$

so that the uniform probability limit is not zero either, although the pointwise probability limit must equal the pointwise a.s. limit. □
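The contrast between pointwise and uniform behaviour in Example 21.4 can be checked numerically. The following is only a sketch under simplifying assumptions that are not part of the example: $Z$ is fixed at 1 and $H_n$ is set to 0, so that $G_n = K_n$. The function name `k_n` is hypothetical.

```python
def k_n(theta: float, n: int, z: float = 1.0) -> float:
    """Tent function K_n of (21.16), with Z held fixed at z (an assumption)."""
    if theta < 0.0:
        raise ValueError("theta must lie in [0, infinity)")
    if theta <= 1.0 / (2 * n):
        return z * n * theta
    if theta <= 1.0 / n:
        return z * (1.0 - n * theta)
    return 0.0


theta_fixed = 0.01
for n in (10, 100, 1000, 10000):
    pointwise = k_n(theta_fixed, n)      # equals zero once 1/n <= theta_fixed
    at_peak = k_n(1.0 / (2 * n), n)      # value at the moving point theta = 1/(2n)
    print(n, pointwise, at_peak)
```

At any fixed $\theta$ the first printed value is eventually zero, while the value at the moving point $\theta = 1/2n$ does not shrink, which is the failure of uniform convergence described in the example.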

Our first result on uniform a.s. convergence is a classic of the probability literature, the Glivenko-Cantelli theorem. This is also of interest as being a case outside the class of functions we shall subsequently consider. For a collection of identically distributed r.v.s $\{X_1(\omega),\ldots,X_n(\omega)\}$ on the probability space $(\Omega,\mathcal{F},P)$, the empirical distribution function is defined as

$$F_n(x,\omega) = \frac{1}{n}\sum_{t=1}^n 1_{(-\infty,x]}(X_t(\omega)). \tag{21.19}$$

In other words, the random variable $F_n(x,\omega)$ is the relative frequency of the variables in the set not exceeding $x$. A natural question to pose is whether (and in what sense) $F_n$ converges to $F$, the true marginal c.d.f. for the distribution.

For fixed $x$, $\{F_n(x,\omega)\}_1^\infty$ is a stochastic sequence, the sample mean of $n$ Bernoulli-distributed random variables which take the value 1 with probability $F(x)$ and 0 otherwise. If these form a stationary ergodic sequence, for example, we know that $F_n(x,\omega) \to F(x)$ a.s. for each $x \in \mathbb{R}$. We may say that the strong law of large numbers holds pointwise on $\mathbb{R}$ in such a case. Convergence is achieved at $x$ for all $\omega \in C_x$, where $P(C_x) = 1$. The problem is that to say that the functions $F_n$ converge a.s. requires that a.s. convergence is achieved at each of an uncountable set of points. We cannot appeal to 3.6(iii) to claim that $P(\bigcap_{x \in \mathbb{R}} C_x) = 1$, and hence the assertion that $F_n(x,\omega) \to F(x)$ with probability 1 at a point $x$ not specified beforehand cannot be proved in this manner. This is a problem for a.s. convergence additional to the possibility of convergence breaking down at certain points of the parameter space, illustrated by 21.4. However, uniform convergence is the condition that suffices to rule out either difficulty.

In this case, thanks to the special form of the c.d.f. which as we know is bounded, monotone, and right-continuous, uniform convergence can be proved by establishing a.s. convergence just at a countable collection of points of $\mathbb{R}$.

21.5 Glivenko-Cantelli theorem If $F_n(x,\omega) \to F(x)$ a.s. pointwise, for $x \in \mathbb{R}$, then

$$\sup_{x} |F_n(x,\omega) - F(x)| \to 0 \ \text{a.s.} \ \Box \tag{21.20}$$

Proof First define, in parallel with $F_n$,

$$F_n'(x,\omega) = \frac{1}{n}\sum_{t=1}^n 1_{(-\infty,x)}(X_t(\omega)), \tag{21.21}$$

and note that $F_n'(x,\omega) \to F(x-)$ for all $\omega$ in a set $C_x'$, where $P(C_x') = 1$. For an integer $m > 1$ let

$$x_{jm} = \inf\{x \in \mathbb{R}: F(x) \ge j/m\}, \quad j = 1,\ldots,m-1, \tag{21.22}$$

and also let $x_{0m} = -\infty$ and $x_{mm} = +\infty$, so that, by construction,

$$F(x_{jm}-) - F(x_{j-1,m}) \le 1/m, \quad j = 1,\ldots,m. \tag{21.23}$$

Lastly let

$$M_{mn}(\omega) = \max_{1 \le j \le m} \max\big\{ |F_n(x_{jm},\omega) - F(x_{jm})|,\ |F_n'(x_{jm},\omega) - F(x_{jm}-)| \big\}. \tag{21.24}$$

Then, for $j = 1,\ldots,m$ and $x \in [x_{j-1,m}, x_{jm})$,

$$\begin{aligned} F(x) - \frac{1}{m} - M_{mn}(\omega) &\le F(x_{j-1,m}) - M_{mn}(\omega) \\ &\le F_n(x_{j-1,m},\omega) \le F_n(x,\omega) \le F_n'(x_{jm},\omega) \\ &\le F(x_{jm}-) + M_{mn}(\omega) \le F(x) + \frac{1}{m} + M_{mn}(\omega). \end{aligned} \tag{21.25}$$

That is to say, $|F_n(x,\omega) - F(x)| \le 1/m + M_{mn}(\omega)$ for every $x \in \mathbb{R}$.

By pointwise strong convergence we may say that $\lim_{n\to\infty} M_{mn}(\omega) = 0$ for finite $m$, and hence that $\lim_{n\to\infty} \sup_x |F_n(x,\omega) - F(x)| \le 1/m$, for all $\omega \in C_m^*$, where

$$C_m^* = \bigcap_{j=1}^m \big(C_{x_{jm}} \cap C_{x_{jm}}'\big). \tag{21.26}$$

But $P(\lim_{m\to\infty} C_m^*) = 1$ by 3.6(iii), and this completes the proof. ∎
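A small simulation gives a feel for the uniform convergence asserted in 21.5. This is only an illustrative sketch: the data are assumed to be independent draws from the uniform distribution on $[0,1]$, so that $F(x) = x$ there, and the function name `sup_distance_uniform` is hypothetical. The supremum is computed at the order statistics, using both the value of the empirical c.d.f. and its left limit, in the spirit of $F_n$ and $F_n'$ in the proof.

```python
import random


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| when F is the U(0,1) c.d.f., F(x) = x on [0, 1].

    At the i-th order statistic the empirical c.d.f. jumps from (i-1)/n
    (its left limit) to i/n, so the supremum is attained at one of these
    2n candidate values.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - x, x - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )


rng = random.Random(12345)  # fixed seed so that the run is reproducible
for n in (100, 1000, 10000, 100000):
    d_n = sup_distance_uniform([rng.random() for _ in range(n)])
    print(n, round(d_n, 4))
```

The printed distances should shrink as $n$ grows; the sketch does not, of course, prove anything about the rate.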

Another, quite separate problem calling for uniform convergence is when a sample statistic is not merely a stochastic function of parameters, but is to be evaluated at a random point in the parameter space. Estimates of covariance matrices of estimators generally have this character, for example. One way such estimates are obtained is as the inverted negative Hessian matrix of the associated sample log-likelihood function, evaluated at estimated parameter values. The problem of proving consistency involves two distinct stochastic convergence phenomena, and it does not suffice to appeal to an ordinary law of large numbers to establish convergence to the true function evaluated at the true point. The following theorem gives sufficient conditions for the double convergence to hold.

21.6 Theorem Let $(\Omega,\mathcal{F},P)$ be a probability space and $(\Theta,\rho)$ a metric space, and let $Q_n: \Theta \times \Omega \to \mathbb{R}$ be $\mathcal{F}/\mathcal{B}$-measurable for each $\theta \in \Theta$. If

(a) $\theta_n^* \xrightarrow{pr} \theta_0$, and
(b) $Q_n(\theta) \xrightarrow{pr} Q(\theta)$ uniformly on an open set $B_0$ containing $\theta_0$, where $Q(\theta)$ is a nonstochastic function continuous at $\theta_0$,

then $Q_n(\theta_n^*) \xrightarrow{pr} Q(\theta_0)$. □

Proof Uniform convergence in probability of $Q_n$ on $B_0$ implies that, for any $\varepsilon > 0$ and $\delta > 0$, there exists $N_1 \ge 1$ large enough that, for $n \ge N_1$,

$$P\Big(\sup_{\theta \in B_0} |Q_n(\theta) - Q(\theta)| < \tfrac{1}{2}\varepsilon\Big) \ge 1 - \tfrac{1}{4}\delta. \tag{21.27}$$

Also, since $\theta_n^* \xrightarrow{pr} \theta_0$, there exists $N_2$ such that, for $n \ge N_2$,

$$P(\theta_n^* \in B_0) \ge 1 - \tfrac{1}{4}\delta. \tag{21.28}$$

To consider the joint occurrence of these two events, use the elementary relation

$$P(A \cap B) \ge P(A) + P(B) - 1. \tag{21.29}$$

Since

$$\{\theta_n^* \in B_0\} \cap \Big\{\sup_{\theta \in B_0} |Q_n(\theta) - Q(\theta)| < \tfrac{1}{2}\varepsilon\Big\} \subseteq \big\{|Q_n(\theta_n^*) - Q(\theta_n^*)| < \tfrac{1}{2}\varepsilon\big\}, \tag{21.30}$$

for $n \ge \max(N_1,N_2)$,

$$P\big(|Q_n(\theta_n^*) - Q(\theta_n^*)| < \tfrac{1}{2}\varepsilon\big) \ge 2(1 - \tfrac{1}{4}\delta) - 1 = 1 - \tfrac{1}{2}\delta. \tag{21.31}$$

Using continuity at $\theta_0$ and 18.10(ii), there exists $N_3$ large enough that, for $n \ge N_3$,

$$P\big(|Q(\theta_n^*) - Q(\theta_0)| < \tfrac{1}{2}\varepsilon\big) \ge 1 - \tfrac{1}{2}\delta. \tag{21.32}$$

By the triangle inequality,

$$|Q_n(\theta_n^*) - Q(\theta_n^*)| + |Q(\theta_n^*) - Q(\theta_0)| \ge |Q_n(\theta_n^*) - Q(\theta_0)|, \tag{21.33}$$

and hence

$$\big\{|Q_n(\theta_n^*) - Q(\theta_n^*)| < \tfrac{1}{2}\varepsilon\big\} \cap \big\{|Q(\theta_n^*) - Q(\theta_0)| < \tfrac{1}{2}\varepsilon\big\} \subseteq \big\{|Q_n(\theta_n^*) - Q(\theta_0)| < \varepsilon\big\}. \tag{21.34}$$

Applying (21.29) again gives, for $n \ge \max(N_1,N_2,N_3)$,

$$P\big(|Q_n(\theta_n^*) - Q(\theta_0)| < \varepsilon\big) \ge 1 - \delta. \tag{21.35}$$

The theorem follows since $\delta$ and $\varepsilon$ are arbitrary. ∎

Notice why we need uniform convergence here. Pointwise convergence would not allow us to assert (21.27) for a single $N_1$ which works for all $\theta \in B_0$. There would be the risk of a sequence of points existing in $B_0$ on which $N_1$ is diverging. Suppose $\theta_0 = 0$ and $G_n(\theta) = Q_n(\theta) - Q(\theta)$ in 21.4. A sequence approaching $\theta_0$, say $\{1/m, m \in \mathbb{N}\}$, has this property; we should have

$$P\big(|Q_n(1/m) - Q(1/m)| < \tfrac{1}{2}\varepsilon\big) \ge 1 - \tfrac{1}{4}\delta \tag{21.36}$$

for arbitrary $\varepsilon > 0$ and $\delta > 0$, only for $n > m$. Therefore we would not be able to claim the existence of a finite $n$ for which (21.31) holds, and the proof collapses.
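To make the double convergence of 21.6 concrete, here is a minimal sketch under assumptions that are not in the text: the data are taken to be i.i.d. $N(0,1)$, the criterion is $Q_n(\theta) = n^{-1}\sum_t (X_t - \theta)^2$, with limit $Q(\theta) = 1 + \theta^2$, and the random evaluation point $\theta_n^*$ is the sample mean, so $\theta_0 = 0$. Here $Q_n(\theta) - Q(\theta) = (n^{-1}\sum_t X_t^2 - 1) - 2\theta\, n^{-1}\sum_t X_t$, which converges to zero uniformly on any bounded open set containing 0. The names `q_n` and `q_limit` are hypothetical.

```python
import random


def q_n(theta, xs):
    """Sample criterion Q_n(theta) = mean of (x - theta)^2."""
    return sum((x - theta) ** 2 for x in xs) / len(xs)


def q_limit(theta):
    """Limit function Q(theta) = 1 + theta^2 under the assumed N(0, 1) data."""
    return 1.0 + theta ** 2


rng = random.Random(2024)  # fixed seed for reproducibility
for n in (100, 10000, 100000):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    theta_star = sum(xs) / n                  # random evaluation point
    print(n, round(q_n(theta_star, xs), 4), q_limit(0.0))
```

The second printed column, $Q_n(\theta_n^*)$, should approach $Q(\theta_0) = 1$ in the third.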

In this example, the sequence of functions $\{G_n(\theta)\}$ is continuous for each $n$, but the continuity breaks down in the limit. This points to a link between uniform convergence and continuity. We had no need of continuity to prove the Glivenko-Cantelli theorem, but the c.d.f. is rather a special type of function, with its behaviour at discontinuities (and elsewhere) subject to tight limitations. In the wider class of functions, not necessarily bounded and monotone, continuity is the condition that has generally been exploited to get uniform convergence results.

21.3 Stochastic Equicontinuity

Example 21.4 is characterized by the breakdown of continuity in the limit of the sequence of continuous functions. We may conjecture that to impose continuity uniformly over the sequence would suffice to eliminate failures of uniform convergence. A natural comparison to draw is with the uniform integrability property of sequences, but we have to be careful with our terminology because, of course, uniform continuity is a well-established term for something completely different.

The concept we require is equicontinuity, or, to be more precise, asymptotic uniform equicontinuity; see (5.47). Our results will be based on the following version of the Arzelà-Ascoli theorem (5.28).

21.7 Theorem Let $\{f_n(\theta), n \in \mathbb{N}\}$ be a sequence of (nonstochastic) functions on a totally bounded parameter space $(\Theta,\rho)$. Then, $\sup_{\theta \in \Theta} |f_n(\theta)| \to 0$ if and only if $f_n(\theta) \to 0$ for all $\theta \in \Theta_0$, where $\Theta_0$ is a dense subset of $\Theta$, and $\{f_n\}$ is asymptotically uniformly equicontinuous. □

The set $F = \{f_n, n \in \mathbb{N}\} \cup \{0\}$, endowed with the uniform metric, is a subspace of $(C_\Theta, d_U)$, and by definition, convergence of $f_n$ to 0 in the uniform metric is the same thing as uniform convergence on $\Theta$. According to 5.12, compactness of $F$ is equivalent to the property that every sequence in $F$ has a cluster point. In view of the pointwise convergence, the cluster point must be unique and equal to 0, so that the conclusion of this theorem is really identical with the Arzelà-Ascoli theorem, although the method of proof will be adapted to the present case.

Where convenient, we shall use the notation

$$w(f_n,\delta) = \sup_{\theta \in \Theta}\ \sup_{\theta' \in S(\theta,\delta)} |f_n(\theta') - f_n(\theta)|. \tag{21.37}$$

The function $w(f_n,\cdot): \mathbb{R}^+ \to \mathbb{R}^+$ is called the modulus of continuity of $f_n$. Asymptotic uniform equicontinuity of the sequence $\{f_n\}$ is the property that $\limsup_n w(f_n,\delta) \downarrow 0$ as $\delta \downarrow 0$.
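The modulus of continuity in (21.37) can be approximated on a grid, which makes it easy to see what asymptotic uniform equicontinuity rules out. The sketch below is illustrative only: it takes $\Theta = [0,1]$, uses the tent functions $K_n$ of Example 21.4 with $Z$ fixed at 1, and compares them with a hypothetical well-behaved sequence $f_n(\theta) = \theta/n$. Because only grid points are compared, `modulus` returns a lower bound on the true $w(f,\delta)$; all function names are assumptions of the sketch.

```python
def modulus(f, grid, delta):
    """Grid lower bound for w(f, delta) = sup |f(theta') - f(theta)| over
    pairs of grid points with |theta' - theta| < delta.

    `grid` must be sorted in increasing order.
    """
    values = [f(t) for t in grid]
    best = 0.0
    for i, t in enumerate(grid):
        j = i + 1
        while j < len(grid) and grid[j] - t < delta:
            best = max(best, abs(values[j] - values[i]))
            j += 1
    return best


def tent(n):
    """K_n of Example 21.4 with Z = 1 (held fixed for the illustration)."""
    def f(theta):
        if theta <= 1.0 / (2 * n):
            return n * theta
        if theta <= 1.0 / n:
            return 1.0 - n * theta
        return 0.0
    return f


def flat(n):
    """A comparison sequence f_n(theta) = theta / n (hypothetical)."""
    return lambda theta: theta / n


grid = [i / 1000 for i in range(1001)]        # [0, 1] in steps of 0.001
for n in (5, 50, 500):
    print(n, modulus(tent(n), grid, 0.05), modulus(flat(n), grid, 0.05))
```

For the tent functions the modulus at a fixed $\delta$ does not fall below $\tfrac{1}{2}$ once $1/2n < \delta$, so the limsup in $n$ cannot be made small by shrinking $\delta$; for the comparison sequence it tends to zero.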

Proof of 21.7 To prove 'if': given $\varepsilon > 0$, there exists by assumption $\delta > 0$ to satisfy

$$\limsup_{n\to\infty} w(f_n,\delta) < \varepsilon. \tag{21.38}$$

Since $\Theta$ is totally bounded, it has a cover $\{S(\theta_i,\delta/2), i = 1,\ldots,m\}$. For each $i$, choose $\tilde{\theta}_i \in \Theta_0$ such that $\rho(\theta_i,\tilde{\theta}_i) < \delta/2$ (possible because $\Theta_0$ is dense in $\Theta$) and note that $\{S(\tilde{\theta}_i,\delta), i = 1,\ldots,m\}$ is also a cover for $\Theta$. Every $\theta \in \Theta$ is contained in $S(\tilde{\theta}_i,\delta)$ for some $i$, and for this $i$,

$$|f_n(\theta)| \le \sup_{\theta' \in S(\tilde{\theta}_i,\delta)} |f_n(\theta')| \le \sup_{\theta' \in S(\tilde{\theta}_i,\delta)} |f_n(\theta') - f_n(\tilde{\theta}_i)| + |f_n(\tilde{\theta}_i)|. \tag{21.39}$$

We can therefore write

$$\begin{aligned} \sup_{\theta \in \Theta} |f_n(\theta)| &\le \max_{1 \le i \le m}\ \sup_{\theta' \in S(\tilde{\theta}_i,\delta)} |f_n(\theta') - f_n(\tilde{\theta}_i)| + \max_{1 \le i \le m} |f_n(\tilde{\theta}_i)| \\ &\le w(f_n,\delta) + \max_{1 \le i \le m} |f_n(\tilde{\theta}_i)|. \end{aligned} \tag{21.40}$$

Sufficiency follows on taking the limsup of both sides of this inequality.

'Only if' follows simply from the facts that uniform convergence entails pointwise convergence, and that

$$w(f_n,\delta) \le 2 \sup_{\theta \in \Theta} |f_n(\theta)|. \ \blacksquare \tag{21.41}$$

To apply this result to the stochastic convergence problem, we must define concepts of stochastic equicontinuity. Several such definitions can be devised, of which we shall give only two: respectively, a weak convergence (in pr.) and a strong convergence (a.s.) variant. Let $(\Theta,\rho)$ be a metric space and $(\Omega,\mathcal{F},P)$ a probability space, and let $\{G_n(\theta,\omega), n \in \mathbb{N}\}$ be a sequence of stochastic functions $G_n: \Theta \times \Omega \to \mathbb{R}$, $\mathcal{F}/\mathcal{B}$-measurable for each $\theta \in \Theta$. The sequence is said to be asymptotically uniformly stochastically equicontinuous (in pr.) if for all $\varepsilon > 0$ $\exists\ \delta > 0$ such that

$$\limsup_{n\to\infty} P(w(G_n,\delta) \ge \varepsilon) < \varepsilon. \tag{21.42}$$

And it is said to be strongly asymptotically uniformly stochastically equicontinuous if for all $\varepsilon > 0$ $\exists\ \delta > 0$ such that

$$P\Big(\limsup_{n\to\infty} w(G_n,\delta) \ge \varepsilon\Big) = 0. \tag{21.43}$$

Clearly, there is a bit of a terminology problem here! The qualifiers 'asymptotic' and 'uniform' will be adopted in all the applications in this chapter, so let these be understood, and let us speak simply of stochastic equicontinuity and strong stochastic equicontinuity. The abbreviations s.e. and s.s.e. will sometimes be used.

21.4 Generic Uniform Convergence

Uniform convergence results and their application in econometrics have been researched by several authors including Hoadley (1971), Bierens (1989), Andrews (1987a, 1992), Newey (1991), and Pötscher and Prucha (1989, 1994). The material in the remainder of this chapter is drawn mainly from the work of Andrews and Pötscher and Prucha, who have pioneered alternative approaches to deriving 'generic' uniform convergence theorems, applicable in a variety of modelling situations.

These methods rely on establishing a stochastic equicontinuity condition. Thus, once we have 21.7, the proof of uniform almost sure convergence is direct.

21.8 Theorem Let $\{G_n(\theta), n \in \mathbb{N}\}$ be a sequence of stochastic real-valued functions on a totally bounded metric space $(\Theta,\rho)$. Then

$$\sup_{\theta \in \Theta} |G_n(\theta)| \xrightarrow{a.s.} 0 \tag{21.44}$$

if and only if

(a) $G_n(\theta) \xrightarrow{a.s.} 0$ for each $\theta \in \Theta_0$, where $\Theta_0$ is a dense subset of $\Theta$,
(b) $\{G_n\}$ is strongly stochastically equicontinuous.

Proof Because $(\Theta,\rho)$ is totally bounded it is separable (5.7) and $\Theta_0$ can be chosen to be a countable set, say $\Theta_0 = \{\theta_k, k \in \mathbb{N}\}$. Condition (a) means that for $k = 1,2,\ldots$ there is a set $C_k$ with $P(C_k) = 1$ such that $G_n(\theta_k,\omega) \to 0$ for $\omega \in C_k$. Condition (b) means that the sequences $\{G_n(\cdot,\omega)\}$ are asymptotically equicontinuous for all $\omega \in C'$, with $P(C') = 1$. By the sufficiency part of 21.7, $\sup_{\theta \in \Theta} |G_n(\theta,\omega)| \to 0$ for $\omega \in C^* = \bigcap_{k=1}^\infty C_k \cap C'$. $P(C^*) = 1$ by 3.6(iii), proving 'if'.

'Only if' follows from the necessity part of 21.7 applied to $\{G_n(\cdot,\omega)\}$ for each $\omega \in C^*$. ∎

The corresponding 'in probability' result follows very similar lines. The proof cannot exploit 21.7 quite so directly, but the family resemblance in the arguments will be noted.

21.9 Theorem Let $\{G_n(\theta), n \in \mathbb{N}\}$ be a sequence of stochastic real-valued functions on a totally bounded metric space $(\Theta,\rho)$. Then

$$\sup_{\theta \in \Theta} |G_n(\theta)| \xrightarrow{pr} 0 \tag{21.45}$$

if and only if

(a) $G_n(\theta) \xrightarrow{pr} 0$ for each $\theta \in \Theta_0$, where $\Theta_0$ is a dense subset of $\Theta$,
(b) $\{G_n\}$ is stochastically equicontinuous.

Proof To show 'if', let $\{S(\theta_i,\delta), i = 1,\ldots,m\}$ with $\theta_i \in \Theta_0$ be a finite cover for $\Theta$. This exists by the assumption of total boundedness and the argument used in the proof of 21.7. Then,

$$\begin{aligned} P\Big(\sup_{\theta \in \Theta} |G_n(\theta)| \ge 2\varepsilon\Big) &\le P\Big(\max_{1 \le i \le m}\ \sup_{\theta' \in S(\theta_i,\delta)} \big(|G_n(\theta') - G_n(\theta_i)| + |G_n(\theta_i)|\big) \ge 2\varepsilon\Big) \\ &\le P(w(G_n,\delta) \ge \varepsilon) + P\Big(\max_{1 \le i \le m} |G_n(\theta_i)| \ge \varepsilon\Big) \\ &= P(w(G_n,\delta) \ge \varepsilon) + P\Big(\bigcup_{i=1}^m \{|G_n(\theta_i)| \ge \varepsilon\}\Big) \\ &\le P(w(G_n,\delta) \ge \varepsilon) + \sum_{i=1}^m P(|G_n(\theta_i)| \ge \varepsilon), \end{aligned} \tag{21.46}$$

where we used the fact that

$$\{x + y \ge 2\varepsilon\} \subseteq \{x \ge \varepsilon\} \cup \{y \ge \varepsilon\} \tag{21.47}$$

for real numbers $x$ and $y$, to get the second inequality. Taking the limsup of both sides of (21.46), (a) and (b) imply that

$$\limsup_{n\to\infty} P\Big(\sup_{\theta \in \Theta} |G_n(\theta)| \ge 2\varepsilon\Big) < \varepsilon. \tag{21.48}$$

To prove 'only if', pointwise convergence follows immediately from uniform convergence, so it remains to show that s.e. holds; but this follows easily in view of the fact (see (21.41)) that

$$P(w(G_n,\delta) \ge \varepsilon) \le P\Big(\sup_{\theta \in \Theta} |G_n(\theta)| \ge \varepsilon/2\Big). \ \blacksquare \tag{21.49}$$

There is no loss of generality in considering the case $G_n \to 0$ in these theorems. We can just as easily apply them to the case where $G_n(\theta) = Q_n(\theta) - \bar{Q}_n(\theta)$ and $\bar{Q}_n$ is a nonstochastic function which may really depend on $n$, or just be a limit function so that $\bar{Q}_n = \bar{Q}$. In the former case there is no need for $\bar{Q}_n$ to converge, as long as $Q_n - \bar{Q}_n$ does. Applying the triangle inequality and taking complements in (21.47), we obtain

$$\{w(Q_n,\delta) < \varepsilon\} \cap \{w(\bar{Q}_n,\delta) < \varepsilon\} \subseteq \{w(Q_n - \bar{Q}_n,\delta) < 2\varepsilon\}. \tag{21.50}$$

This means that $\{Q_n - \bar{Q}_n\}$ is s.e., or s.s.e. as the case may be, provided that $\{Q_n\}$ is s.e., or s.s.e., and $\{\bar{Q}_n\}$ is asymptotically equicontinuous in the ordinary sense of §5.5. This extension of 21.8 is obvious, and in 21.9 we can insert the step

$$P(w(Q_n - \bar{Q}_n,\delta) \ge 2\varepsilon) \le P(w(Q_n,\delta) \ge \varepsilon) + 1_{\{w(\bar{Q}_n,\delta) \ge \varepsilon\}} \tag{21.51}$$

into (21.46), where the second term on the right is 0 or 1 depending on whether the indicated nonstochastic condition holds, and this term will vanish when $n \ge N$ for some $N \ge 1$, by assumption.

The s.e. and s.s.e. conditions may not be particularly easy to verify directly, and the existence of Lipschitz-type sufficient conditions could then be very convenient. Andrews (1992) suggests conditions of the following sort.

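21.10 Theorem Suppose there exists $N \ge 1$ such that

$$|Q_n(\theta') - Q_n(\theta)| \le B_n h(\rho(\theta,\theta')), \ \text{a.s.} \tag{21.52}$$

holds for all $\theta,\theta' \in \Theta$ and $n \ge N$, where $h$ is nonstochastic and $h(x) \downarrow 0$ as $x \downarrow 0$, and $\{B_n\}$ is a stochastic sequence not depending on $\theta$. Then

(i) $\{Q_n\}$ is s.e. if $B_n = O_p(1)$.
(ii) $\{Q_n\}$ is s.s.e. if $\limsup_n B_n < \infty$, a.s.

Proof The definitions imply that $w(Q_n,\delta) \le B_n h(\delta)$ a.s. for $n \ge N$. To prove (i), note that, for any $\varepsilon > 0$ and $\delta > 0$,

$$\limsup_{n\to\infty} P(w(Q_n,\delta) \ge \varepsilon) \le \limsup_{n\to\infty} P(B_n \ge \varepsilon/h(\delta)). \tag{21.53}$$

By definition of $O_p(1)$, the right-hand side can be made arbitrarily small by choosing $\varepsilon/h(\delta)$ large enough. In particular, fix $\varepsilon > 0$, and then by definition of $h$ we may take $\delta$ small enough that $\limsup_{n\to\infty} P(B_n \ge \varepsilon/h(\delta)) < \varepsilon$.

For (ii), we have in the same way that, for small enough $\delta$,

$$P\Big(\limsup_{n\to\infty} w(Q_n,\delta) \ge \varepsilon\Big) \le P\Big(\limsup_{n\to\infty} B_n \ge \varepsilon/h(\delta)\Big) < \varepsilon. \ \blacksquare \tag{21.54}$$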
A sufficient condition for $B_n = O_p(1)$ is to have $B_n$ uniformly bounded in $L_1$ norm, i.e., $\sup_n E|B_n| < \infty$ (see 12.11), and it is sufficient for $\limsup_n B_n$ to be a.s. bounded if, in addition to this, $B_n - E(B_n) \xrightarrow{a.s.} 0$.

The conditions of 21.10 offer a striking contrast in restrictiveness. Think of (21.52) as a continuity condition, which says that $Q_n(\theta')$ must be close to $Q_n(\theta)$ when $\theta'$ is close to $\theta$. When $Q_n$ is stochastic these conditions are very hard to satisfy for fixed $B_n$, because random changes of scale may lead the condition to be violated from time to time even if $Q_n(\theta,\omega)$ is a continuous function for all $\omega$ and $n$. The purpose of the factor $B_n$ is to allow for such random scale variations.

Under s.e., we require that the probability of large variations declines as their magnitude increases; this is what $O_p(1)$ means. But in the s.s.e. case, the requirement that $\{B_n\}$ be bounded a.s. except for at most a finite number of terms implies that $\{Q_n\}$ must satisfy the same condition. This is very restrictive. It means for example that $Q_n(\theta)$ cannot be Gaussian, nor have any other distribution with infinite support. In such a case, no matter what $\{B_n\}$ and $h$ were chosen, the condition in (21.52) would be violated eventually. It does not matter that the probability of large deviations might be extremely small, because over an infinite number of sequence coordinates they will still arise with probability 1.

Thus, strong uniform convergence is a phenomenon confined, as far as we are able to show, to a.s. bounded sequences. Although (21.52) is only a sufficient condition, it can be verified that this feature of s.s.e. is implicit in the definition. This fact puts the relative merits of working with strong and weak laws of large numbers in a new light. The former are simply not available in many important cases. Fortunately, 'in probability' results are often sufficient for the purpose at hand, for example, determining the limits in distribution of estimators and sample statistics; see §25.1 for more details.

Supposing $(\Theta,\rho) \subseteq (\mathbb{R}^k, d_E)$, suppose further that $Q_n(\theta)$ is differentiable a.s. at each point of $\Theta$; to be precise, we must specify differentiability a.s. at each point of an open convex set $\Theta^*$ containing $\Theta$. (A set $B \subseteq \mathbb{R}^k$ is said to be convex if $x \in B$ and $y \in B$ imply $\lambda x + (1-\lambda)y \in B$ for $\lambda \in (0,1)$.) The mean value theorem yields the result that, at a pair of points $\theta,\theta' \in \Theta^*$,

$$Q_n(\theta) - Q_n(\theta') = \sum_{i=1}^k \frac{\partial Q_n}{\partial\theta_i}\bigg|_{\theta=\theta^*} (\theta_i - \theta_i'), \ \text{a.s.,} \tag{21.55}$$

where $\theta^* \in \Theta^*$ is a point on the line segment joining $\theta$ and $\theta'$, which exists by convexity of $\Theta^*$. Applying the Cauchy-Schwarz inequality, we get

$$|Q_n(\theta) - Q_n(\theta')| \le \Bigg(\sum_{i=1}^k \bigg(\frac{\partial Q_n}{\partial\theta_i}\bigg|_{\theta=\theta^*}\bigg)^2\Bigg)^{1/2} \|\theta - \theta'\| \le B_n \|\theta - \theta'\| \ \text{a.s.,} \tag{21.56}$$

where

$$B_n = \sup_{\theta^* \in \Theta^*} \bigg\|\frac{\partial Q_n}{\partial\theta}\bigg|_{\theta=\theta^*}\bigg\|. \tag{21.57}$$

Here $\|\cdot\|$ denotes the Euclidean length, and $\partial Q_n/\partial\theta$ is the gradient vector whose elements are the partials of $Q_n$ with respect to the $\theta_i$. Clearly, (21.52) is satisfied by taking $h$ as the identity function, and $B_n$ defined in (21.57) is a random variable for all $n$. Subject to this condition, and $B_n$ satisfying the conditions specified in 21.10, a.s. differentiability emerges as a sufficient condition for s.e.

21.5 Uniform Laws of Large Numbers

In the last section it was shown that stochastic equicontinuity (strong or in pr.) is a necessary and sufficient condition to go from pointwise to uniform convergence (strong or in pr.). The next task is to find sufficient conditions for stochastic equicontinuity when $\{Q_n(\theta)\}$ is a sequence of partial sums, and hence to derive uniform laws of large numbers. There are several possible approaches to this problem, of which perhaps the simplest is to establish the Lipschitz condition of 21.10.

21.11 Theorem Let $\{\{q_{nt}(\omega,\theta)\}_{t=1}^n\}_{n=1}^\infty$ denote a triangular array of real stochastic functions with domain $(\Theta,\rho)$, satisfying, for $N \ge 1$,

$$|q_{nt}(\theta') - q_{nt}(\theta)| \le B_{nt} h(\rho(\theta,\theta')), \ \text{a.s.,} \tag{21.58}$$

for all $\theta,\theta' \in \Theta$ and $n \ge N$, where $h$ is nonstochastic and $h(x) \downarrow 0$ as $x \downarrow 0$, and $\{B_{nt}\}$ is a stochastic array not depending on $\theta$ with $\sum_{t=1}^n E(B_{nt}) = O(1)$. If $Q_n(\theta) = \sum_{t=1}^n q_{nt}(\theta)$, then

(i) $Q_n$ is s.e.;
(ii) $Q_n$ is s.s.e. if $\sum_{t=1}^n (B_{nt} - E(B_{nt})) \xrightarrow{a.s.} 0$.

Proof For (i) it is only necessary by 21.10(i) and the triangle inequality to establish that $\sum_{t=1}^n B_{nt} = O_p(1)$. This follows from the stated condition by the Markov inequality. Likewise, (ii) follows directly from 21.10(ii). ∎
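The Lipschitz condition (21.58) is easy to check in simple cases. As a hedged illustration, not drawn from the text, take $q_{nt}(\theta) = n^{-1}\cos(\theta X_t)$ with i.i.d. $N(0,1)$ data. Since $|\cos a - \cos b| \le |a - b|$, the condition holds with $h(x) = x$ and $B_{nt} = |X_t|/n$, and $\sum_t E(B_{nt}) = E|X_t| = O(1)$. The sketch verifies the implied bound $|Q_n(\theta') - Q_n(\theta)| \le (\sum_t B_{nt})\,|\theta' - \theta|$ on a grid; the names `q_sum`, `b_n` and `worst_ratio` are hypothetical.

```python
import math
import random

rng = random.Random(7)   # fixed seed for reproducibility
n = 500
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # assumed i.i.d. N(0, 1) data


def q_sum(theta):
    """Q_n(theta) = sum_t q_nt(theta), with q_nt(theta) = cos(theta * X_t) / n."""
    return sum(math.cos(theta * x) for x in xs) / n


# B_nt = |X_t| / n, so the sum over t is the sample mean of |X_t|.
b_n = sum(abs(x) for x in xs) / n

grid = [i / 20 for i in range(-40, 41)]        # theta in [-2, 2]
values = [q_sum(t) for t in grid]
worst_ratio = max(
    abs(values[i] - values[j]) / abs(grid[i] - grid[j])
    for i in range(len(grid))
    for j in range(i)
)
print(round(worst_ratio, 4), "<=", round(b_n, 4))
```

The largest difference quotient found on the grid cannot exceed the random Lipschitz factor, whatever the sample, because the bound is a pathwise inequality.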

A second class of conditions is obtained by applying a form of s.e. to the summands. For these results we need to specify $G_n$ to be an unweighted average of $n$ functions, since the conditions to be imposed take the form of Cesàro summability of certain related sequences. It is convenient to confine attention to the case

$$G_n(\omega,\theta) = \frac{1}{n}\sum_{t=1}^n \big(q_t(X_t(\omega),\theta) - E(q_t(X_t,\theta))\big), \tag{21.59}$$

where $X_t \in \mathbb{X}$ is a random element drawn from the probability space $(\mathbb{X},\mathcal{X},\mu_t)$. Typically, though not necessarily, $X_t$ is a vector of real r.v.s with $\mathbb{X}$ a subset of $\mathbb{R}^m$, $m \ge 1$, $\mathcal{X}$ being the restriction of $\mathcal{B}^m$ to $\mathbb{X}$. The point here is not to restrict the form of the functional relation between $q_t$ and $\omega$, but to specify the existence of marginal derived measures $\mu_t$, with $\mu_t(A) = P(X_t \in A)$ for $A \in \mathcal{X}$. The usual context will have $G_n$ the sample average of functions that are stochastic through their dependence on some kind of data set, indexed on $t$. The functions themselves, not just their arguments, can be different for different $t$.

We must find conditions on both the functions $q_t(\cdot,\cdot)$ and the p.m.s $\mu_t$ which yield the s.e. condition on $G_n$. The first stage of the argument is to establish conditions on the stochastic functions $q_t(\theta)$ which have to be satisfied for s.e. to hold. Andrews (1992) gives the following result.

21.12 Theorem If

(a) there exists a positive stochastic sequence $\{d_t\}$ satisfying

$$\sup_{\theta \in \Theta} |q_t(\theta)| \le d_t, \ \text{all } t \tag{21.60}$$

and

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n E\big(d_t 1_{\{d_t > M\}}\big) \to 0 \ \text{as } M \to \infty; \tag{21.61}$$

(b) for every $\varepsilon > 0$, there exists $\delta > 0$ such that

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P(w(q_t,\delta) > \varepsilon) < \varepsilon; \tag{21.62}$$

then $G_n$ is s.e. □

Condition (21.61) is an interesting Cesàro-sum variation on uniform integrability, and actual uniform integrability of $\{d_t\}$ is sufficient, although not necessary. Condition (a) is a domination condition, while condition (b) is called by Andrews termwise stochastic equicontinuity.

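Proof Given $\varepsilon > 0$, choose $M$ such that $\limsup_{n\to\infty} n^{-1}\sum_{t=1}^n E\big(2d_t 1_{\{2d_t > M\}}\big) < \tfrac{1}{6}\varepsilon^2$, and then $\delta$ such that

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P\big(w(q_t,\delta) > \tfrac{1}{6}\varepsilon^2\big) < \tfrac{1}{6}M^{-1}\varepsilon^2. \tag{21.63}$$

The first thing to note is that

$$w(q_t - E(q_t),\delta) \le w(q_t,\delta) + w(E(q_t),\delta) \le w(q_t,\delta) + E(w(q_t,\delta)), \tag{21.64}$$

where the last inequality is an application of 21.3. Applying (21.64) and using Markov's inequality,

$$\begin{aligned} P(w(G_n,\delta) > \varepsilon) &\le P\Big(\frac{1}{n}\sum_{t=1}^n w(q_t - E(q_t),\delta) > \varepsilon\Big) \\ &\le P\Big(\frac{1}{n}\sum_{t=1}^n \big(w(q_t,\delta) + E(w(q_t,\delta))\big) > \varepsilon\Big) \\ &\le \frac{2}{n\varepsilon}\sum_{t=1}^n E(w(q_t,\delta)) \\ &= \frac{2}{n\varepsilon}\sum_{t=1}^n E\Big(w(q_t,\delta)\big(1_{\{w(q_t,\delta) \le \varepsilon^2/6\}} + 1_{\{\varepsilon^2/6 < w(q_t,\delta) \le M\}} + 1_{\{w(q_t,\delta) > M\}}\big)\Big), \end{aligned} \tag{21.65}$$

where the indicator functions in the last member add up to 1. Using the fact that $w(q_t,\delta) \le 2d_t$, and hence $\{w(q_t,\delta) > M\} \subseteq \{2d_t > M\}$, and taking the limsup, we now obtain

$$\limsup_{n\to\infty} P(w(G_n,\delta) > \varepsilon) \le \frac{2}{\varepsilon}\Bigg[\frac{\varepsilon^2}{6} + M \limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P\Big(w(q_t,\delta) > \frac{\varepsilon^2}{6}\Big) + \limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n E\big(2d_t 1_{\{2d_t > M\}}\big)\Bigg] < \varepsilon \tag{21.66}$$

in view of the values chosen for $M$ and $\delta$. ∎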

Clearly, whether condition 21.12(a) is satisfied depends on both the distribution of $X_t$ and functional form of $q_t(\cdot)$. But something relatively general can be said about termwise s.e. (condition 21.12(b)). Assume, following Pötscher and Prucha (1989), that

$$q_t(x,\theta) = \sum_{k=1}^p r_{kt}(x) s_{kt}(x,\theta), \tag{21.67}$$

where $r_{kt}: \mathbb{X} \to \mathbb{R}$, and $s_{kt}(\cdot,\theta): \mathbb{X} \to \mathbb{R}$ for fixed $\theta$, are $\mathcal{X}/\mathcal{B}$-measurable functions. The idea here is that we can be more liberal in the behaviour allowed to the factors $r_{kt}$ as functions of $X_t$ than to the factors $s_{kt}$; discontinuities are permitted, for example. To be exact, we shall be content to have the $r_{kt}$ uniformly $L_1$-bounded in Cesàro mean:

$$\sup_n \frac{1}{n}\sum_{t=1}^n E|r_{kt}(X_t)| \le B < \infty, \quad k = 1,\ldots,p. \tag{21.68}$$

As to the factors $s_{kt}(x,\theta)$, we need these to be asymptotically equicontinuous for a sufficiently large set of $x$ values. Assume there is a sequence of sets $\{K_m \in \mathcal{X}, m = 1,2,\ldots\}$, such that

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n \mu_t(K_m^c) \to 0 \ \text{as } m \to \infty, \tag{21.69}$$

and that for each $m \ge 1$ and $\varepsilon > 0$, there exists $\delta > 0$ such that

$$\limsup_{n\to\infty}\ \sup_{x \in K_m} w(s_{kt}(x,\cdot),\delta) < \varepsilon, \quad k = 1,\ldots,p. \tag{21.70}$$

Notice that (21.70) is a nonstochastic equicontinuity condition, but under condition (21.69) it holds (as one might say) 'almost surely, on average' when the r.v. $X_t$ is substituted into the formula.

These conditions suffice to give termwise s.e., and hence can be used to prove s.e. of $G_n$ by application of 21.12.

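21.13 Theorem If $q_t(X_t,\theta)$ is defined by (21.67), and (21.68), (21.69), and (21.70) hold, then for every $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P(w(q_t,\delta) > \varepsilon) < \varepsilon. \tag{21.71}$$

Proof Fix $\varepsilon > 0$, and first note that

$$\begin{aligned} P(w(q_t,\delta) > \varepsilon) &\le P\Big(\sum_{k=1}^p |r_{kt}|\, w(s_{kt},\delta) > \varepsilon\Big) \\ &\le P\Big(\bigcup_{k=1}^p \big\{|r_{kt}|\, w(s_{kt},\delta) > \varepsilon/p\big\}\Big) \\ &\le \sum_{k=1}^p P\big(|r_{kt}|\, w(s_{kt},\delta) > \varepsilon/p\big) \\ &\le \sum_{k=1}^p \Big[P\big(|r_{kt}|\, w(s_{kt},\delta) 1_{K_m} > \varepsilon/2p\big) + P\big(|r_{kt}|\, w(s_{kt},\delta) 1_{K_m^c} > \varepsilon/2p\big)\Big]. \end{aligned} \tag{21.72}$$

Consider any one of these $p$ terms. Choose $m$ large enough that

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n \mu_t(K_m^c) < \frac{\varepsilon}{2p}, \tag{21.73}$$

and for this $m$ choose $\delta$ small enough that

$$\limsup_{n\to\infty}\ \sup_{x \in K_m} w(s_{kt}(x,\cdot),\delta) < \frac{\varepsilon^2}{4Bp^2}. \tag{21.74}$$

Then, by the Markov inequality,

$$\begin{aligned} \limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P\Big(|r_{kt}|\, w(s_{kt},\delta) 1_{K_m} > \frac{\varepsilon}{2p}\Big) &\le \limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P\Big(|r_{kt}| > \frac{2Bp}{\varepsilon}\Big) \\ &\le \limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n E|r_{kt}|\,\frac{\varepsilon}{2Bp} \le \frac{\varepsilon}{2p}, \end{aligned} \tag{21.75}$$

and by (21.73),

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P\Big(|r_{kt}|\, w(s_{kt},\delta) 1_{K_m^c} > \frac{\varepsilon}{2p}\Big) \le \limsup_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P(X_t \in K_m^c) < \frac{\varepsilon}{2p}. \tag{21.76}$$

Substituting these bounds into (21.72) yields the result. ∎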


V

THE CENTRAL LIMIT THEOREM


22 Weak Convergence of Distributions

22.1 Basic Concepts

The objects we examine in this part of the book are not sequences of random variables, but sequences of marginal distribution functions. There will of course be associated sequences of r.v.s generated from these distributions, but the concept of convergence arising here is quite distinct. Formally, if $\{F_n\}_1^\infty$ is a sequence of c.d.f.s, we say that the sequence converges weakly to a limit $F$ if $F_n(x) \to F(x)$ pointwise for each $x \in C$, where $C \subseteq \mathbb{R}$ is the set of points at which $F$ is continuous. Then, if $X_n$ has c.d.f. $F_n$ and $X$ has c.d.f. $F$, we say that $X_n$ converges in distribution to $X$. These terms are in practice used more or less interchangeably for the distributions and associated r.v.s.

Equivalent notations for weak convergence are $F_n \Rightarrow F$, and $X_n \xrightarrow{D} X$. Although the latter notation is customary, it is also slightly irregular, since to say a sequence of r.v.s converges in distribution means only that the limiting r.v. has the given distribution. If both $X$ and $Y$ have the distribution specified by $F$, then $X_n \xrightarrow{D} X$ and $X_n \xrightarrow{D} Y$ are equivalent statements. Moreover, we write things like $X_n \xrightarrow{D} N(0,1)$ to indicate that the limiting distribution is standard Gaussian, although '$N(0,1)$' is shorthand for 'a r.v. having the standard Gaussian distribution'; it does not denote a particular r.v. Also used by some authors is the notation '$\xrightarrow{L}$' standing for 'convergence in probability law', but we avoid this form because of possible confusion with convergence in $L_p$-norm.

Pointwise convergence of the distribution functions is all that is needed, remembering that $F$ is non-decreasing, bounded by 0 and 1, and that every point is either a continuity point or a jump point. It is possible that $F$ could possess a jump at a point $x_0$ which is a continuity point of $F_n$ for all finite $n$, and in these cases $F_n(x_0)$ does not have a unique limit since any point between $F(x_0-)$ and $F(x_0)$ is a candidate. But the jump points of $F$ are at most countable in number, and according to 8.4 the true $F$ can be constructed by assigning the value $F(x_0)$ at every jump point $x_0$; hence, the above definition is adequate.

If $\mu$ represents the corresponding probability measure such that $F(x) = \mu((-\infty,x])$ for each $x \in \mathbb{R}$, we know (see §8.2) that $\mu$ and $F$ are equivalent representations of the same measure, and similarly for $\mu_n$ and $F_n$. Hence, the statement $\mu_n \Rightarrow \mu$ is equivalent to $F_n \Rightarrow F$. The corresponding notion of weak convergence for the sequence of measures $\{\mu_n\}$ is given by the following theorem.

22.1 Theorem $\mu_n \Rightarrow \mu$ iff $\mu_n(A) \to \mu(A)$ for every $A \in \mathcal{B}$ for which $\mu(\partial A) = 0$. □

The proof of this theorem is postponed to a later point in the development. Note meanwhile that the exclusion of events whose boundary points have positive probability corresponds to the exclusion of jump points of $F$, where the events in question have the form $\{(-\infty,x]\}$.

Just as the theory of the expectation is an application of the general theory of integrals, so the theory of weak convergence is a general theory for sequences of finite measures. The results below do not generally depend upon the condition $\mu_n(\mathbb{R}) = 1$ for their validity, provided definitions are adjusted appropriately. However, a serious concern of the theory is whether a sequence of distribution functions has a distribution function as its limit; more specifically, should it follow because $\mu_n(\mathbb{R}) = 1$ for every $n$ that $\mu(\mathbb{R}) = 1$? This is a question that is taken up in §22.5. Meanwhile, the reader should not be distracted by the use of the convenient notations $E(\cdot)$ and $P(\cdot)$ from appreciating the generality of the theory.

22.2 Example Consider the sequence of binomial distributions $\{B(n,\lambda/n), n = 1,2,3,\ldots\}$, where the probability of $x$ successes in $n$ Bernoulli trials is given by

$$P(X_n = x) = \binom{n}{x} (\lambda/n)^x (1 - \lambda/n)^{n-x}, \quad x = 0,\ldots,n \tag{22.1}$$

(see 8.7). Here, $\lambda$ is a constant parameter, so that the probability of a success falls in inverse proportion to the number of trials. Note that $E(X_n) = \lambda$ for every $n$. For fixed $x$, $\binom{n}{x} n^{-x} \to 1/x!$ as $n \to \infty$, and taking the binomial expansion of $(1 - \lambda/n)^n$ shows that $(1 - \lambda/n)^n \to e^{-\lambda}$ as $n \to \infty$, whereas $(1 - \lambda/n)^{-x} \to 1$. We may therefore conclude that

$$P(X_n = x) \to \frac{\lambda^x}{x!} e^{-\lambda}, \quad x = 0,1,2,\ldots, \tag{22.2}$$

and accordingly,

$$F_n(a) = \sum_{0 \le x \le a} P(X_n = x) \to e^{-\lambda} \sum_{0 \le x \le a} \frac{\lambda^x}{x!} \tag{22.3}$$

at all points $a < \infty$. Thus the limit (and hence the weak limit) of the sequence $\{B(n,\lambda/n)\}$ is the Poisson distribution with parameter $\lambda$. □
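The convergence in (22.2) is easy to inspect numerically. The following sketch computes the largest gap between the binomial probabilities (22.1) and their Poisson limits over $x = 0,\ldots,10$; the choice $\lambda = 2$, the range of $x$, and the function names are assumptions made for the illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, lam: float) -> float:
    """P(X_n = x) for X_n ~ B(n, lam / n), as in (22.1)."""
    p = lam / n
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """Limit probability lam^x e^{-lam} / x!, as in (22.2)."""
    return lam ** x * exp(-lam) / factorial(x)


lam = 2.0
for n in (10, 100, 1000, 10000):
    gap = max(abs(binomial_pmf(x, n, lam) - poisson_pmf(x, lam)) for x in range(11))
    print(n, gap)
```

The printed gap should shrink roughly in proportion to $1/n$, in line with the pointwise limits used in the example.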

22.3 Example A sequence of discrete distributions on $[0,1]$ is defined by

$$P(X_n = x) = \begin{cases} 1/n, & x = i/n, \quad i = 1,\ldots,n \\ 0, & \text{otherwise.} \end{cases} \tag{22.4}$$

This sequence actually converges weakly to Lebesgue measure $m$ on $[0,1]$, although this fact may be less than obvious; it will be demonstrated below. For any $x \in [0,1]$, $\mu_n([0,x]) = [nx]/n \to x = m([0,x])$, where $[nx]$ denotes the largest integer not exceeding $nx$. There are sets for which convergence fails, notably the set $\mathbb{Q}_{[0,1]}$ of all rationals in $[0,1]$, in view of the fact that $\mu_n(\mathbb{Q}_{[0,1]}) = 1$ for every $n$, and $m(\mathbb{Q}_{[0,1]}) = 0$. But $\partial\mathbb{Q}_{[0,1]} = [0,1]$ and $m(\partial(\mathbb{Q}_{[0,1]})) = 1$, thus the definition of weak convergence in 22.1 is not violated. □
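The interval probabilities in Example 22.3 can be computed exactly with rational arithmetic. This is a small sketch; the function name `mu_n_interval` and the choice $x = 1/3$ are assumptions for illustration.

```python
from fractions import Fraction


def mu_n_interval(x: Fraction, n: int) -> Fraction:
    """mu_n([0, x]): mass 1/n on each atom i/n, i = 1, ..., n, lying in [0, x]."""
    atoms_in_interval = sum(1 for i in range(1, n + 1) if Fraction(i, n) <= x)
    return Fraction(atoms_in_interval, n)


x = Fraction(1, 3)
for n in (10, 100, 1000):
    print(n, float(mu_n_interval(x, n)), float(x))
```

The computed mass differs from $x$ by at most $1/n$. Every atom $i/n$ is rational, which is why the set of rationals in $[0,1]$ keeps measure 1 under each $\mu_n$ even though its Lebesgue measure is zero.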


Although convergence in distribution is fundamentally different from convergence a.s. and in pr., the latter imply the former. In the next result, $\xrightarrow{a.s.}$ can be substituted for $\xrightarrow{pr}$, by 18.5.

22.4 Theorem If $X_n \xrightarrow{pr} X$, then $X_n \xrightarrow{D} X$.

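Proof For $\varepsilon > 0$, we have

\[ \begin{aligned} P(X_n \le x) &= P(\{X_n \le x\} \cap \{|X_n - X| \le \varepsilon\}) + P(\{X_n \le x\} \cap \{|X_n - X| > \varepsilon\}) \\ &\le P(X \le x + \varepsilon) + P(|X_n - X| > \varepsilon), \end{aligned} \tag{22.5} \]

where the events whose probabilities appear on the right-hand side of the inequality contain (and hence are at least as probable as) the corresponding events on the left. $P(|X_n - X| > \varepsilon) \to 0$ by hypothesis, and hence

\[ \limsup_{n\to\infty} P(X_n \le x) \le P(X \le x + \varepsilon). \tag{22.6} \]

Similarly,

\[ \begin{aligned} P(X \le x - \varepsilon) &= P(\{X \le x - \varepsilon\} \cap \{|X_n - X| \le \varepsilon\}) + P(\{X \le x - \varepsilon\} \cap \{|X_n - X| > \varepsilon\}) \\ &\le P(X_n \le x) + P(|X_n - X| > \varepsilon), \end{aligned} \tag{22.7} \]

and so

\[ P(X \le x - \varepsilon) \le \liminf_{n\to\infty} P(X_n \le x). \tag{22.8} \]

Since $\varepsilon$ is arbitrary, it follows that $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$ at every point $x$ for which $P(X = x) = 0$, such that $\lim_{\varepsilon \downarrow 0} P(X \le x - \varepsilon) = P(X \le x)$. This condition is equivalent to weak convergence. ∎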

The converse of 22.4 is not true in general, but the two conditions are equivalent when the probability limit in question is a constant. A degenerate distribution has the form

\[ F(x) = \begin{cases} 0, & x < a, \\ 1, & x \ge a. \end{cases} \tag{22.9} \]

If a random variable is converging to a constant, its c.d.f. converges to the step function (22.9), through a sequence of the sort illustrated in Fig. 22.1.

22.5 Theorem $X_n$ converges in probability to a constant $a$ iff its c.d.f. converges to a step function with jump at $a$.

Proof For any $\varepsilon > 0$,

\[ P(|X_n - a| \le \varepsilon) = P(a - \varepsilon \le X_n \le a + \varepsilon) = F_n(a + \varepsilon) - F_n((a - \varepsilon)-). \tag{22.10} \]

Convergence to a step function with jump at $a$ implies $\lim_{n\to\infty} F_n(a + \varepsilon) = F(a + \varepsilon) = 1$, and similarly $\lim_{n\to\infty} F_n((a - \varepsilon)-) = F((a - \varepsilon)-) = 0$ for all $\varepsilon > 0$. The sufficiency part follows from (22.10) and the definition of convergence in probability. For the necessity, let the left-hand side of (22.10) have a limit of 1 as $n \to \infty$, for all $\varepsilon > 0$. This implies

\[ \lim_{n\to\infty} \left[ F_n(a + \varepsilon) - F_n((a - \varepsilon)-) \right] = 1. \tag{22.11} \]

Since $0 \le F \le 1$, (22.11) will be satisfied for all $\varepsilon > 0$ only if $F(a) = 1$ and $F(a-) = 0$, which defines the function in (22.9). ∎

[Fig. 22.1: c.d.f.s $F_n$ plotted against $x$ for $1 < n_1 < n_2 < \cdots$, approaching the step function ($n = \infty$) with jump at $a$.]

22.2 The Skorokhod Representation Theorem

Notwithstanding the fact that $X_n \xrightarrow{D} X$ does not imply $X_n \xrightarrow{a.s.} X$, whenever a sequence of distributions $\{F_n\}$ converges weakly to $F$ one can construct a sequence of r.v.s with distributions $F_n$, which converges almost surely to a limit having distribution $F$. Shown by Skorokhod (1956) in a more general context (see §26.6), this is an immensely useful fact for proving results about weak convergence.

Consider the sequence $\{F_n\}$ converging to $F$. Each of these functions is a monotone mapping from $\mathbb{R}$ to the interval $[0,1]$. The idea is to invert this mapping. Let a random variable $\omega$ be defined on the probability space $([0,1], \mathcal{B}_{[0,1]}, m)$, where $\mathcal{B}_{[0,1]}$ is the Borel field on the unit interval and $m$ is the Lebesgue measure. Define for $\omega \in [0,1]$

\[ Y_n(\omega) = \inf\{x : \omega \le F_n(x)\}. \tag{22.12} \]

In words, $Y_n$ is the random variable obtained by using the inverse distribution function to map from the uniform distribution on $[0,1]$ onto $\mathbb{R}$, taking care of any discontinuities in $F_n^{-1}(\omega)$ (corresponding to intervals with zero probability mass under $F_n$) by taking the infimum of the eligible values. $Y_n$ is therefore a non-decreasing, left-continuous function. Fig. 22.2 illustrates the construction, essentially the same as used in the proof of 8.5 (compare Fig. 8.2). When $F_n$ has discontinuities it is only possible to assert (by right-continuity) that $F_n(Y_n(\omega)) \ge \omega$, whereas $Y_n(F_n(x)) \le x$, by left-continuity of $Y_n$.
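The construction in (22.12) can be made concrete for a discrete distribution. The sketch below is only an illustration: the three-point distribution, the names `cdf` and `skorokhod`, and the use of a grid of $\omega$ values to stand in for Lebesgue measure are all assumptions introduced here.

```python
from bisect import bisect_left, bisect_right

# Hypothetical three-point distribution. Dyadic probabilities keep the
# floating-point comparisons below exact.
POINTS = [-1.0, 0.0, 2.0]
CUM = [0.25, 0.75, 1.0]  # value of F at each support point

def cdf(x):
    """Right-continuous c.d.f. F(x) = P(X <= x)."""
    k = bisect_right(POINTS, x)  # number of support points <= x
    return CUM[k - 1] if k > 0 else 0.0

def skorokhod(omega):
    """Y(omega) = inf{x : omega <= F(x)} for omega in (0, 1], as in (22.12)."""
    if not 0.0 < omega <= 1.0:
        raise ValueError("omega must lie in (0, 1]")
    return POINTS[bisect_left(CUM, omega)]  # first support point with F(x) >= omega

# Share of an equally spaced grid of omega values with Y(omega) <= a,
# compared with F(a).
N = 1000
grid = [k / N for k in range(1, N + 1)]
for a in POINTS:
    share = sum(skorokhod(w) <= a for w in grid) / N
    print(a, share, cdf(a))
```

For this example the grid share and $F(a)$ coincide at each support point, and the two inequalities $F(Y(\omega)) \ge \omega$ and $Y(F(x)) \le x$ can be checked directly with the same two functions.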

The first important feature of the Skorokhod construction is that, for any constant $a \in \mathbb{R}$,

\[ P(Y_n(\omega) \le a) = P(\omega \le F_n(a)) = F_n(a), \tag{22.13} \]

where the last equality follows from the fact that $\omega$ is uniformly distributed on $[0,1]$. Thus, $F_n$ is the c.d.f. of $Y_n$. Letting $F$ be a c.d.f. and $Y$ the r.v. corresponding to $F$ according to (22.12), the second important feature of the construction is contained in the following result.

[Fig. 22.2]

22.6 Theorem If $F_n \Rightarrow F$ then $Y_n \to Y$ a.s.[$m$] as $n \to \infty$. □

In working through the proof, it may be helpful to check each assertion about the functions $F$ and $Y$ against the example in Fig. 22.3. This represents the extreme case where $F$, and hence also $Y$, is a step function; of course, if $F$ is everywhere continuous and increasing, the mappings are 1-1 and the problem becomes trivial.

Proof Let $\omega$ be any continuity point of $Y$, excluding the end points 0 and 1. For any $\varepsilon > 0$, choose $x$ as a continuity point of $F$ satisfying $Y(\omega) - \varepsilon < x < Y(\omega)$. Given the countability of the discontinuities of $F$, such a point will always exist, and according to the definition of $Y$, it must have the property $F(x) < \omega$. If $F_n(x) \to F(x)$, there will be $n$ large enough that $F_n(x) < \omega$, and hence $x < Y_n(\omega)$, by definition. We therefore have

\[ Y(\omega) - \varepsilon < x < Y_n(\omega). \tag{22.14} \]

Without presuming that $\lim_{n\to\infty} Y_n(\omega)$ exists, since $\varepsilon$ is arbitrary (22.14) allows us to conclude that $\liminf_{n\to\infty} Y_n(\omega) \ge Y(\omega)$.

Next, choose $y$ as a continuity point of $F$ satisfying $Y(\omega) < y < Y(\omega) + \varepsilon$. The properties of $F$ give $\omega \le F(Y(\omega)) \le F(y)$. For large enough $n$ we must also have $\omega < F_n(y)$, and hence, again by definition of $Y_n$,

\[ Y_n(\omega) \le y < Y(\omega) + \varepsilon. \tag{22.15} \]

In the same way as before, we may conclude that $\limsup_{n\to\infty} Y_n(\omega) \le Y(\omega)$. The superior and inferior limits are therefore equal, and $\lim_{n\to\infty} Y_n(\omega) = Y(\omega)$.

[Fig. 22.3: step functions $F(x)$ and $F_n(x)$, with the levels $\omega$ and $\omega'$ on the vertical axis and the points $Y(\omega) - \varepsilon$, $x$, $Y(\omega)$, $y$, $Y(\omega) + \varepsilon$ and $Y_n(\omega')$ on the horizontal axis; the points marked A and B show the values taken at discontinuities.]

This result only holds for continuity points of $Y$. However, there is a 1-1 correspondence between the discontinuity points of $Y$ and intervals having zero probability under $\mu$ in $\mathbb{R}$. A collection of disjoint intervals on the line is at most countable (1.11), and hence the discontinuities of $Y$ (plus the points 0 and 1) are countable, and have Lebesgue measure zero. Hence, $Y_n \to Y$ w.p.1 [$m$], as asserted. ∎

In Fig. 22.3, notice how both functions take their values at the discontinuities at the points marked A and B. Thus, $F(Y(\omega)) = \omega' > \omega$. Inequality (22.15) holds for $\omega$, but need not hold for $\omega'$, a discontinuity point. A counter-example is the sequence of functions $F_n$ obtained by vertical translations of the fixed graph from below, as illustrated. In this case $Y_n(\omega') > Y(\omega') + \varepsilon$ for every $n$.

22.7 Corollary Define random variables $Y_n'$, so that $Y_n'(\omega) = Y_n(\omega)$ at each $\omega$ where the function is continuous, and $Y_n'(\omega) = 0$ at discontinuity points and at $\omega = 0$ and 1. Define $Y'$ similarly. If $F_n \Rightarrow F$ then $Y_n'(\omega) \to Y'(\omega)$ for every $\omega \in [0,1]$, and $F_n$ and $F$ are the distribution functions of $Y_n'$ and $Y'$.

Proof The convergence for every $\omega$ is immediate. The equivalence of the distributions follows from 8.4, since the discontinuity points are countable and their complement is dense in $[0,1]$, by 2.10. ∎

In the form given, 22.6 does not generalize very easily to distributions in $\mathbb{R}^k$ for $k > 1$, although a generalization does exist. This can be deduced as a special case of 26.25, which derives the Skorokhod representation for distributions on general metric spaces of suitable type.

A final point to observe about Skorokhod's representation is its generalization to any finite measure. If $F_n$ is a non-decreasing right-continuous function with codomain $[a,b]$, (22.12) defines a function $Y_n(\omega)$ on a measure space $([a,b], \mathcal{B}_{[a,b]}, m)$, where $m$ is Lebesgue measure as before. With appropriate modifications, all the foregoing remarks continue to apply in this case.

The following application of the Skorokhod representation yields a different, but equivalent, characterization of weak convergence.

22.8 Theorem $X_n \xrightarrow{D} X$ iff

\[ \lim_{n\to\infty} E(f(X_n)) = E(f(X)) \tag{22.16} \]

for every bounded, continuous real function $f$. □

The necessity half of this result is known as the Helly-Bray theorem.

Proof To prove sufficiency, construct an example. For $a \in \mathbb{R}$ and $\delta > 0$, let

\[ f(x) = \begin{cases} 1, & x \le a - \delta, \\ (a - x)/\delta, & a - \delta < x \le a, \\ 0, & x > a. \end{cases} \tag{22.17} \]

We call this the 'smoothed indicator' of the set $(-\infty, a]$. (See Fig. 22.4.) It is a continuous function with the properties

\[ F_n(a - \delta) \le \int f\,dF_n \le F_n(a), \quad \text{all } n, \tag{22.18} \]

\[ F(a - \delta) \le \int f\,dF \le F(a). \tag{22.19} \]

By hypothesis, $\int f\,dF_n \to \int f\,dF$, and hence

\[ \limsup_{n\to\infty} F_n(a - \delta) \le \int f\,dF \le \liminf_{n\to\infty} F_n(a). \tag{22.20} \]

Letting $\delta \to 0$, combining (22.19) and (22.20) yields

\[ \limsup_{n\to\infty} F_n(a-) \le F(a), \tag{22.21} \]

\[ F(a-) \le \liminf_{n\to\infty} F_n(a). \tag{22.22} \]

These inequalities show that $\lim_n F_n(a)$ exists and is equal to $F(a)$ whenever $F(a-) = F(a)$, that is, $F_n \Rightarrow F$.

To prove necessity, let $f$ be a bounded function whose points of discontinuity are contained in a set $D_f$, where $\mu(D_f) = 0$, $\mu$ being the p.m. such that $F(x) = \mu((-\infty, x])$. When $F_n \Rightarrow F$ ($F_n$ being the c.d.f. of $X_n$ and $F$ that of $X$), $Y_n'(\omega) \to Y'(\omega)$ for every $\omega \in [0,1]$, where $Y_n'(\omega)$ and $Y'(\omega)$ are the Skorokhod variables defined in 22.7. Since $m(\omega : Y'(\omega) \in D_f) = \mu(D_f) = 0$, $f(Y_n') \to f(Y')$ a.s.[$m$] by 18.8(i). The bounded convergence theorem then implies $E(f(Y_n')) \to E(f(Y'))$, or

\[ \int f(y)\,d\mu_n(y) \to \int f(y)\,d\mu(y), \tag{22.23} \]

where $\mu_n$ is the p.m. corresponding to $F_n$. But 9.6 allows us to write

\[ \int f(y)\,d\mu_n(y) = \int y\,d\mu_n f^{-1}(y) = \int x\,d\mu_n f^{-1}(x) = E(f(X_n)), \tag{22.24} \]

with a similar equality for $E(f(X))$. (The trivial change of dummy argument from $y$ to $x$ is just to emphasize the equivalence of the two formulations.) Hence we have $E(f(X_n)) \to E(f(X))$. The result certainly holds for the case $D_f = \emptyset$, so 'only if' is proved. ∎
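The sandwich property (22.19) of the smoothed indicator can be verified directly for a simple discrete distribution. The following sketch is illustrative only: the four-point distribution and the helper names `cdf` and `expect` are assumptions made for the example.

```python
def smoothed_indicator(x, a, delta):
    """The function f of (22.17): 1 up to a - delta, then linear down to 0 at a."""
    if x <= a - delta:
        return 1.0
    if x <= a:
        return (a - x) / delta
    return 0.0

# Hypothetical four-point distribution with equal weights.
POINTS = [-1.0, 0.0, 0.5, 2.0]
PROBS = [0.25, 0.25, 0.25, 0.25]

def cdf(x):
    """F(x) = P(X <= x) for the discrete distribution above."""
    return sum(p for pt, p in zip(POINTS, PROBS) if pt <= x)

def expect(f):
    """E f(X), i.e. the integral of f with respect to F."""
    return sum(p * f(pt) for pt, p in zip(POINTS, PROBS))

a, delta = 0.75, 0.5
mid = expect(lambda x: smoothed_indicator(x, a, delta))
print(cdf(a - delta), mid, cdf(a))  # prints 0.5 0.625 0.75
```

Shrinking `delta` squeezes the middle term towards $F(a)$ from below whenever $a$ is a continuity point, which is the step used to pass from (22.20) to (22.21) and (22.22).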

[Fig. 22.4: the smoothed indicator of $(-\infty, a]$, equal to 1 up to $a - \delta$ and falling linearly to 0 at $a$.]

Notice how the proof cleverly substitutes $([0,1], \mathcal{B}_{[0,1]}, m)$ for the fundamental probability space $(\Omega, \mathcal{F}, P)$ generating $\{X_n\}$, exploiting the fact that the derived distributions are the same. This result does not say that the expectations converge only for bounded continuous functions; it is simply that convergence is implied at least for all members of this large class of functions. The theorem also holds if we substitute any subclass of the class of bounded continuous functions which contains at least the smoothed indicator functions of half-lines, for example the bounded uniformly continuous functions.

22.9 Example We now give the promised proof of weak convergence for 22.3. Clearly, in that example,

\[ \int f\,d\mu_n = \frac{1}{n} \sum_{i=1}^{n} f(i/n). \tag{22.25} \]

The limit of the expression on the right of (22.25) as $n \to \infty$ is by definition the Riemann integral of $f$ on the unit interval. Since this agrees with the Lebesgue integral, we have a proof of weak convergence in this case. □
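The Riemann-sum form (22.25) is simple to evaluate. The sketch below assumes, purely for illustration, the bounded continuous test function $f = \cos$, whose integral over $[0,1]$ is $\sin 1$; the function name is hypothetical.

```python
import math

def integral_wrt_mu_n(f, n):
    """Right-hand side of (22.25): the integral of f under the measure mu_n of 22.3."""
    return sum(f(i / n) for i in range(1, n + 1)) / n

target = math.sin(1.0)  # Lebesgue (and Riemann) integral of cos over [0, 1]
for n in (10, 100, 1000, 10000):
    print(n, integral_wrt_mu_n(math.cos, n) - target)
```

The printed discrepancies shrink towards zero as $n$ grows, as (22.16) requires for this particular $f$.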

We shall subsequently require the generalization of Theorem 22.8 to general finite measures. This will be stated as a corollary, the modifications to the proof being left to the reader to supply; it is mainly a matter of modifying the notation to suit.

22.10 Corollary Let $\{F_n\}$ be a sequence of bounded, non-decreasing, right-continuous functions. $F_n \Rightarrow F$ if and only if

\[ \int f\,dF_n \to \int f\,dF \tag{22.26} \]

for every bounded, continuous real function $f$. □

Another proof which was deferred earlier can now be given.

Proof of 22.1 To show sufficiency, consider $A = (-\infty, x]$, for which $\partial A = \{x\}$. Weak convergence is defined by the condition $\mu_n((-\infty, x]) \to \mu((-\infty, x])$ whenever $\mu(\{x\}) = 0$. To show necessity, consider in the necessity part of 22.8 the case $f(x) = 1_A(x)$ for any $A \in \mathcal{B}$. The discontinuity points of this function are contained in the set $\partial A$, and if $\mu(\partial A) = 0$, we have $\mu_n(A) \to \mu(A)$ as a case of (22.16), when $F_n \Rightarrow F$. ∎

22.3 Weak Convergence and Transformations

The next result might be thought of as the weak convergence counterpart of 18.8.

22.11 Continuous mapping theorem Let $h: \mathbb{R} \mapsto \mathbb{R}$ be Borel-measurable with discontinuity points confined to a set $D_h$, where $\mu(D_h) = 0$. If $\mu_n \Rightarrow \mu$, then $\mu_n h^{-1} \Rightarrow \mu h^{-1}$.

Proof By the argument used to prove the Helly-Bray theorem, $h(Y_n') \to h(Y')$ a.s.[$m$]. It follows from 22.4 that $h(Y_n') \xrightarrow{D} h(Y')$. But

\[ m(\omega : h(Y_n'(\omega)) \in A) = m(\omega : Y_n'(\omega) \in h^{-1}(A)) = \mu_n h^{-1}(A) \tag{22.27} \]

for each $A \in \mathcal{B}$, using 3.21. Similarly, $m(h(Y') \in A) = \mu h^{-1}(A)$. According to the definition of weak convergence, $h(Y_n') \xrightarrow{D} h(Y')$ is equivalent to $\mu_n h^{-1} \Rightarrow \mu h^{-1}$. ∎

22.12 Corollary If $h$ is the function of 22.11 and $X_n \xrightarrow{D} X$, then $h(X_n) \xrightarrow{D} h(X)$.

Proof Immediate from the theorem, given that $X_n \sim F_n$ and $X \sim F$. ∎

22.13 Example If $X_n \xrightarrow{D} N(0,1)$, then $X_n^2 \xrightarrow{D} \chi^2(1)$. □

Our second result on transformations is from Cramér (1946), and is sometimes called Cramér's theorem:

22.14 Cramér's theorem If $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{pr} a$ (constant), then

(i) $(X_n + Y_n) \xrightarrow{D} (X + a)$;

(ii) $Y_n X_n \xrightarrow{D} aX$;

(iii) $X_n / Y_n \xrightarrow{D} X/a$, for $a \ne 0$.

Proof This is by an extension of the type of argument used in 22.4.

\[ \begin{aligned} P(X_n + Y_n \le x) &= P(X_n + Y_n \le x,\ |Y_n - a| < \varepsilon) + P(X_n + Y_n \le x,\ |Y_n - a| \ge \varepsilon) \\ &\le P(X_n \le x - a + \varepsilon) + P(|Y_n - a| \ge \varepsilon). \end{aligned} \tag{22.28} \]

Similarly,

\[ \begin{aligned} P(X_n \le x - a - \varepsilon) &= P(X_n \le x - a - \varepsilon,\ |Y_n - a| < \varepsilon) + P(X_n \le x - a - \varepsilon,\ |Y_n - a| \ge \varepsilon) \\ &\le P(X_n + Y_n \le x) + P(|Y_n - a| \ge \varepsilon), \end{aligned} \tag{22.29} \]

and putting these inequalities together, we have

\[ P(X_n \le x - a - \varepsilon) - P(|Y_n - a| \ge \varepsilon) \le P(X_n + Y_n \le x) \le P(X_n \le x - a + \varepsilon) + P(|Y_n - a| \ge \varepsilon). \tag{22.30} \]

Let $F_{X_n}$ and $F_{X_n+Y_n}$ denote the c.d.f.s of $X_n$ and $X_n + Y_n$ respectively, and let $F_X$ be the c.d.f. of $X$, such that $F_X(x) = \lim_{n\to\infty} F_{X_n}(x)$ at all continuity points of $F_X$. Since $\lim_{n\to\infty} P(|Y_n - a| \ge \varepsilon) = 0$ for all $\varepsilon > 0$ by assumption, (22.30) implies

\[ F_X(x - a - \varepsilon) \le \liminf_{n\to\infty} F_{X_n+Y_n}(x) \le \limsup_{n\to\infty} F_{X_n+Y_n}(x) \le F_X(x - a + \varepsilon). \tag{22.31} \]

Taking $\varepsilon$ arbitrarily close to zero shows that

\[ \lim_{n\to\infty} F_{X_n+Y_n}(x) = F_X(x - a) = F_{X+a}(x) \tag{22.32} \]

whenever $x - a$ is a continuity point of $F_X$. This proves (i).

To prove (ii), suppose first that $a > 0$. By taking $\varepsilon > 0$ small enough we can ensure $a - \varepsilon > 0$, and applying the type of argument used in (i) with obvious variations, we obtain the inequalities

\[ P(X_n(a + \varepsilon) \le x) - P(|Y_n - a| \ge \varepsilon) \le P(X_n Y_n \le x) \le P(X_n(a - \varepsilon) \le x) + P(|Y_n - a| \ge \varepsilon). \tag{22.33} \]

Taking limits gives

\[ F_X(x/(a + \varepsilon)) \le \liminf_{n\to\infty} F_{X_n Y_n}(x) \le \limsup_{n\to\infty} F_{X_n Y_n}(x) \le F_X(x/(a - \varepsilon)), \tag{22.34} \]

and thus

\[ \lim_{n\to\infty} F_{X_n Y_n}(x) = F_X(x/a) = F_{aX}(x). \tag{22.35} \]

If $a < 0$, replace $Y_n$ by $-Y_n$ and $a$ by $-a$, repeat the preceding argument, and then apply 22.12. And if $a = 0$, (22.33) becomes

\[ P(X_n \varepsilon \le x) - P(|Y_n| \ge \varepsilon) \le P(X_n Y_n \le x) \le P(-X_n \varepsilon \le x) + P(|Y_n| \ge \varepsilon). \tag{22.36} \]

For $x > 0$, this yields in the limit $F_{X_n Y_n}(x) = 1$, and for $x < 0$, $F_{X_n Y_n}(x) = 0$, which defines the degenerate distribution with the mass concentrated at 0. In this case $X_n Y_n \xrightarrow{pr} 0$ in view of 22.5. (Alternatively, see 18.12.)

To prove (iii) it suffices to note by 18.10(ii) that $\operatorname{plim} 1/Y_n = 1/a$ if $a \ne 0$. Replacing $Y_n$ by $1/Y_n$ in (ii) yields the result directly. ∎
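Part (iii) of Cramér's theorem can be illustrated by simulation. In this sketch, which uses only the standard library, $X_n$ is taken to be exactly $N(0,1)$ and $Y_n$ is the mean of $n$ independent $U[0,1]$ draws, so that $a = 1/2$ and the limit of $X_n/Y_n$ should be $N(0,4)$. The seed, the sample sizes and the function name are arbitrary assumptions.

```python
import random
import statistics

def ratio_draw(rng, n):
    """One draw of X_n / Y_n, with X_n ~ N(0,1) and Y_n the mean of n U[0,1] draws."""
    x_n = rng.gauss(0.0, 1.0)                      # converges in distribution trivially
    y_n = sum(rng.random() for _ in range(n)) / n  # converges in probability to 1/2
    return x_n / y_n

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [ratio_draw(rng, n=100) for _ in range(2000)]

# The limit X / a with a = 1/2 is N(0, 4): mean near 0, variance near 4.
print(statistics.fmean(draws), statistics.pvariance(draws))
```

A sample variance close to 4 is what the theorem leads one to expect; a Monte Carlo check of this kind is suggestive rather than conclusive.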

22.4 Convergence of Moments and Characteristic Functions

Paralleling the sequence of distribution functions, there may be sequences of moments. If $X_n \xrightarrow{D} X$ where the c.d.f. of $X$ is $F$, then $E(X) = \int x\,dF(x)$, where it exists, is sometimes called the asymptotic expectation of $X_n$. There is a temptation to write $E(X) = \lim_{n\to\infty} E(X_n)$, but there are cases where $E(X_n)$ does not exist for any finite $n$ while $E(X)$ exists, and also cases where $E(X_n)$ exists for every $n$ but $E(X)$ does not. This usage is therefore best avoided except in specific circumstances when the convergence is known to obtain.

Theorem 22.8 assures us that expectations of bounded random variables converge under weak convergence of the corresponding measures. The following theorems indicate how far this result can be extended to more general cases. Recall that $E|X|$ is defined for every $X$, although it may take the value $+\infty$.

22.15 Theorem If $X_n \xrightarrow{D} X$ then $E|X| \le \liminf_{n\to\infty} E|X_n|$. □

Proof The function $h_\alpha(x) = |x| 1_{\{|x| \le \alpha\}}$ is real and bounded. If $P(|X| = \alpha) = 0$, it follows by 22.11 that $h_\alpha(X_n) \xrightarrow{D} h_\alpha(X)$, and from 22.8 (letting $f$ be the identity function which is bounded in this case) that

\[ E(h_\alpha(X)) = \lim_{n\to\infty} E(h_\alpha(X_n)) \le \liminf_{n\to\infty} E|X_n|. \tag{22.37} \]

The result follows on letting $\alpha$ approach $+\infty$ through continuity points of the distribution. ∎

The following theorem gives a sufficient condition for $E(X)$ to exist, given that $E(X_n)$ exists for each $n$.

22.16 Theorem If $X_n \xrightarrow{D} X$ and $\{X_n\}$ is uniformly integrable, then $E|X| < \infty$ and $E(X_n) \to E(X)$.

Proof Let $Y_n$ and $Y$ be the Skorokhod variables of (22.12), so that $Y_n \xrightarrow{a.s.} Y$. Since $X_n$ and $Y_n$ have the same distribution, uniform integrability of $\{X_n\}$ implies that of $\{Y_n\}$. Hence we can invoke 12.8 to show that $E(Y_n) \to E(Y)$, $Y$ being integrable. Reversing the argument then gives $E|X| < \infty$ and $E(X_n) \to E(X)$ as required. ∎

Uniform integrability is a sufficient condition, and although where it fails the existence of $E(X)$ may not be ruled out, 12.7 showed that its interpretation is questionable in these circumstances.

A sequence of complex r.v.s which is always uniformly integrable is $\{e^{itX_n}\}$, for any sequence $\{X_n\}$, since $|e^{itX_n}| = 1$. Given the sequence of characteristic functions $\{\phi_{X_n}(t)\}$, we therefore know that if $F_n \Rightarrow F$, then

\[ \phi_{X_n}(t) \to \phi_X(t) \tag{22.38} \]

(pointwise on $\mathbb{R}$), where the indicated limit should be the characteristic function associated with $F$. In view of the inversion theorem (11.12), we could then say that $X_n \xrightarrow{D} X$ only if (22.38) holds, where $\phi_X(t)$ is the ch.f. of $X$. However, it is the 'if' rather than the 'only if' that is the point of interest here. If a sequence of characteristic functions converges pointwise to a limit, under what circumstances can we be sure that the limit is a ch.f., in the sense that inverting yields a c.d.f.? A sufficient condition for this is provided by the so-called Lévy continuity theorem:

22.17 Lévy continuity theorem Suppose that $\{F_n\}$ is a sequence of c.d.f.s and $F_n \Rightarrow F$, where $F$ is any non-negative, bounded, non-decreasing, right-continuous function. If

\[ \phi_n(t) = \int_{-\infty}^{+\infty} e^{itx}\,dF_n(x) \to \phi(t), \tag{22.39} \]

and $\phi(t)$ is continuous at the point $t = 0$, then $F$ is a c.d.f. (i.e., $\int dF = 1$) and $\phi$ is its ch.f. □

The fact that the conditions imposed on the limit $F$ in this theorem are not unreasonable will be established by the Helly selection theorem, to be discussed in the next section.

Proof Note that $\phi_n(0) = \int dF_n = 1$ for any $n$, by (22.39) and the fact that $F_n$ is a c.d.f. For $\nu > 0$,

\[ \frac{1}{\nu} \int_0^{\nu} \phi_n(t)\,dt = \frac{1}{\nu} \int_0^{\nu} \int_{-\infty}^{+\infty} e^{itx}\,dF_n(x)\,dt = \int_{-\infty}^{+\infty} \frac{e^{i\nu x} - 1}{i\nu x}\,dF_n(x), \tag{22.40} \]

the change in order of integration being permitted by 9.32. By 22.10, which extends to complex-valued functions by linearity of the integral, we have, as $n \to \infty$,

\[ \int_{-\infty}^{+\infty} \frac{e^{i\nu x} - 1}{i\nu x}\,dF_n(x) \to \int_{-\infty}^{+\infty} \frac{e^{i\nu x} - 1}{i\nu x}\,dF(x) = \frac{1}{\nu} \int_0^{\nu} \phi(t)\,dt, \tag{22.41} \]

where the equality is by (22.40) and the definition of $\phi$. Since $\phi$ is continuous at $t = 0$, $\lim_{\nu \to 0} \nu^{-1} \int_0^{\nu} \phi(t)\,dt = \phi(0)$, and since $\phi_n \to \phi$ we must have $\phi(0) = 1$. It follows from (22.41) that

\[ 1 = \lim_{\nu \to 0} \int_{-\infty}^{+\infty} \frac{e^{i\nu x} - 1}{i\nu x}\,dF(x) = \int_{-\infty}^{+\infty} dF. \tag{22.42} \]

In view of the other conditions imposed, this means $F$ is a c.d.f. It follows by 22.8 that $\phi(t) = E(e^{itX})$ where $X$ is a random variable having c.d.f. $F$. This completes the proof. ∎

The continuity theorem provides the basic justification for investigating limiting distributions by evaluating the limits of sequences of ch.f.s, and then using the inversion theorem of §11.5. The next two chapters are devoted to developing these methods. Here we will take the opportunity to mention one useful application, a result similar to 22.4 which may also be proved as a corollary.

22.18 Theorem If $|X_n - Z_n| \xrightarrow{pr} 0$ and $\{X_n\}$ converges in distribution, then $\{Z_n\}$ converges in distribution to the same limit.

Proof $|e^{itX_n} - e^{itZ_n}| \xrightarrow{pr} 0$ by 18.9(ii). Since $|e^{itx}| = 1$ these functions are $L_\infty$-bounded, and the sequence $\{|e^{itX_n} - e^{itZ_n}|\}$ is uniformly integrable. So by 18.14, $|e^{itX_n} - e^{itZ_n}| \xrightarrow{L_1} 0$. However, the complex modulus inequality (11.3) gives

\[ E|e^{itX_n} - e^{itZ_n}| \ge |E(e^{itX_n}) - E(e^{itZ_n})|, \tag{22.43} \]

so that a further consequence is $|\phi_{X_n}(t) - \phi_{Z_n}(t)| \to 0$ as $n \to \infty$, pointwise on $\mathbb{R}$. Given the assumption of weak convergence, the conclusion now follows from the inversion theorem. ∎

To get the alternative proof of 22.4, set $Z_n = X$ for each $n$.

22.5 Criteria for Weak Convergence

Not every sequence of c.d.f.s has a c.d.f. as its limit. Counter-examples are easy to construct.

22.19 Example Consider the uniform distribution on the interval $[-n, n]$, such that $F_n(a) = \frac{1}{2}(1 + a/n)$, $-n \le a \le n$. Then $F_n(a) \to \frac{1}{2}$ for all $a \in \mathbb{R}$. □

22.20 Example Consider the degenerate r.v., $X_n = n$ w.p.1. The c.d.f. is a step function with jump at $n$. $F_n(a) \to 0$, all $a \in \mathbb{R}$. □

Although $F_n$ is a c.d.f. for all $n$, in neither of these cases is the limit $F$ a c.d.f., in the sense that $F(a) \to 1\ (0)$ as $a \to \infty\ (-\infty)$. Nor does intuition suggest to us that the limiting distributions are well defined. The difficulty in the first example is that the probability mass is getting smeared out evenly over an infinite support, so that the density is tending everywhere to zero. It does not make sense to define a random variable which can take any value in $\mathbb{R}$ with equal probability, any more than it does to make a random variable infinite almost surely, which is the limiting case of the second example.
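Both counter-examples can be tabulated in a few lines. The sketch below is illustrative; the function names, the interval $(-100, 100]$ and the range of $n$ are assumptions chosen for the example. It shows that the mass each sequence places on a fixed finite interval does not stay near 1 as $n$ grows.

```python
def uniform_cdf(a, n):
    """c.d.f. of the uniform distribution on [-n, n] (Example 22.19)."""
    if a < -n:
        return 0.0
    if a > n:
        return 1.0
    return 0.5 * (1.0 + a / n)

def point_mass_cdf(a, n):
    """c.d.f. of the degenerate r.v. X_n = n w.p.1 (Example 22.20)."""
    return 1.0 if a >= n else 0.0

def min_mass(cdf, a, b, n_values):
    """Smallest mass F_n(b) - F_n(a) that any listed n places on (a, b]."""
    return min(cdf(b, n) - cdf(a, n) for n in n_values)

print(uniform_cdf(3.0, 10**6), point_mass_cdf(3.0, 10**6))
print(min_mass(uniform_cdf, -100.0, 100.0, range(1, 10001)))
print(min_mass(point_mass_cdf, -100.0, 100.0, range(1, 10001)))
```

Widening the interval only postpones the escape of mass to larger $n$; no single finite interval works for the whole sequence.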

In view of these pathological cases, it is important to establish the conditions under which a sequence of measures can be expected to converge weakly. The condition that ensures the limit is well-defined is called uniform tightness. A sequence $\{\mu_n\}$ of p.m.s on $\mathbb{R}$ is uniformly tight if, for any $\varepsilon > 0$, there exists a finite interval $(a,b]$ such that $\inf_n \mu_n((a,b]) > 1 - \varepsilon$. Equivalently, if $\{F_n\}$ is the sequence of c.d.f.s corresponding to $\{\mu_n\}$, uniform tightness is the condition that for $\varepsilon > 0$ $\exists$ $a, b$ with $b - a < \infty$ and

\[ \inf_n \left( F_n(b) - F_n(a) \right) > 1 - \varepsilon. \tag{22.44} \]

It is easy to see that examples 22.19 and 22.20 both fail to satisfy the uniform tightness condition. However, we can show that, provided a sequence of p.m.s $\{\mu_n\}$ is uniformly tight, it does converge to a limit $\mu$ which is a p.m. This terminology derives from the designation tight for a measure with the property that for every $\varepsilon > 0$ there is a compact set $K_\varepsilon$ such that $\mu(K_\varepsilon^c) \le \varepsilon$. Every p.m. on $(\mathbb{R}, \mathcal{B})$ is tight, although this is not necessarily the case in more general probability spaces. See §26.5 for details on this.

An essential ingredient in this argument is a classic result in analysis, Helly's selection theorem.

22.21 Helly's selection theorem If $\{F_n\}$ is any sequence of c.d.f.s, there exists a subsequence $\{n_k, k = 1,2,\ldots\}$ such that $F_{n_k} \Rightarrow F$, where $F$ is bounded, non-decreasing and right-continuous, and $0 \le F \le 1$.

Proof Consider the bounded array $\{\{F_n(x_i), n \in \mathbb{N}\}, i \in \mathbb{N}\}$, where $\{x_i, i \in \mathbb{N}\}$ is an enumeration of the rationals. By 2.36, this array converges on a subsequence, so that $\lim_{k\to\infty} F_{n_k}(x_i) = F^*(x_i)$ for every $i$. Note that $F^*(x_{i_1}) \le F^*(x_{i_2})$ whenever $x_{i_1} < x_{i_2}$, since this property is satisfied by $F_n$ for every $n$. Hence consider the non-decreasing function on $\mathbb{R}$,

\[ F(x) = \inf_{x_i > x} F^*(x_i). \tag{22.45} \]

Clearly $0 \le F^*(x_i) \le 1$ for all $i$, since the $F_{n_k}(x_i)$ have this property for every $k$. By definition of $F$, for $x \in \mathbb{R}$ $\exists$ $x_i > x$ such that $F(x) \le F^*(x_i) < F(x) + \varepsilon$ for any $\varepsilon > 0$, showing that $F$ is right-continuous since $F^*(x_i) = F(x_i)$. Further, for continuity points $x$ of $F$, there exist $x_{i_1} < x$ and $x_{i_2} > x$ such that

\[ F(x) - \varepsilon < F^*(x_{i_1}) \le F^*(x_{i_2}) < F(x) + \varepsilon. \tag{22.46} \]

The following inequalities hold in respect of these points for every $k$:

\[ F^*(x_{i_1}) = \lim_{k\to\infty} F_{n_k}(x_{i_1}) \le \liminf_{k\to\infty} F_{n_k}(x) \le \limsup_{k\to\infty} F_{n_k}(x) \le \lim_{k\to\infty} F_{n_k}(x_{i_2}) = F^*(x_{i_2}). \tag{22.47} \]

Combining (22.46) with (22.47),

\[ F(x) - \varepsilon < \liminf_{k\to\infty} F_{n_k}(x) \le \limsup_{k\to\infty} F_{n_k}(x) < F(x) + \varepsilon. \tag{22.48} \]

Since $\varepsilon$ is arbitrary, $\lim_{k\to\infty} F_{n_k}(x) = F(x)$ at all continuity points of $F$. ∎

The only problem here is that $F$ need not be a c.d.f., as in 22.19 and 22.20. We need to ensure that $F(x) \to 1\ (0)$ as $x \to \infty\ (-\infty)$, and tightness is the required property.

22.22 Theorem Let $\{F_n\}$ be a sequence of c.d.f.s. If

(a) $F_{n_k} \Rightarrow F$ for every convergent subsequence $\{n_k\}$, and

(b) the sequence is uniformly tight,

then $F_n \Rightarrow F$, where $F$ is a c.d.f. Condition (b) is also necessary. □

Helly's theorem tells us that $\{F_n\}$ has a cluster point $F$. Condition (a) requires that this $F$ be the unique cluster point, regardless of the subsequence chosen, and the argument of 2.13 applied pointwise to $\{F_n\}$ implies that $F$ is the actual limit of the sequence. Uniform tightness is necessary and sufficient for this limit $F$ to be a c.d.f.

Proof of 22.22 Let $x$ be a continuity point of $F$, and suppose $F_n(x) \nrightarrow F(x)$. Then $|F_n(x) - F(x)| \ge \varepsilon > 0$ for an infinite subsequence of integers, say $\{n_k, k \in \mathbb{N}\}$. Define a sequence of c.d.f.s by $F_k' = F_{n_k}$, $k = 1,2,\ldots$ According to Helly's theorem, this sequence contains a convergent subsequence, $\{k_i, i \in \mathbb{N}\}$, say, such that $F_{k_i}' \Rightarrow F'$. But by (a), $F' = F$, and we have a contradiction, given how the subsequence $\{k_i\}$ was constructed. Hence, $F_n \Rightarrow F$.

Since $F_n$ is a c.d.f. for every $n$, $F_n(b) - F_n(a) > 1 - \varepsilon$ for some $b - a < \infty$, for any $\varepsilon > 0$. Since $F_n \to F$ at continuity points, increase $b$ and reduce $a$ as necessary to make them continuity points of $F$. Assuming uniform tightness, we have by (22.44) that $F(b) - F(a) > 1 - \varepsilon$, as required. It follows that $\lim_{x\to\infty} F(x) = 1$ and $\lim_{x\to-\infty} F(x) = 0$. Given the monotonicity and right continuity of $F$ established by Helly's theorem, this means that $F$ is a c.d.f.

On the other hand, if the sequence is not uniformly tight, $F(b) - F(a) \le 1 - \varepsilon$ for some $\varepsilon > 0$, and every $b > a$. Letting $b \to +\infty$ and $a \to -\infty$, we have $F(+\infty) - F(-\infty) \le 1 - \varepsilon < 1$. Hence, either $F(+\infty) < 1$ or $F(-\infty) > 0$ or both, and $F$ is not a c.d.f. ∎

The role of the continuity theorem (22.17) should now be apparent. Helly's theorem ensures that the limit $F$ of a sequence of c.d.f.s has all the properties of a c.d.f. except possibly that of $\int dF = 1$. Uniform tightness ensures this property, and the continuity of the limiting ch.f. at the origin can now be interpreted as a sufficient condition for tightness of the sequence. It is of interest to note what happens in the case of our counter-examples. The ch.f. corresponding to example 22.19 is

\[ \phi_n(\nu) = (2n)^{-1} \int_{-n}^{n} (\cos \nu x + i \sin \nu x)\,dx = \frac{\sin \nu n}{\nu n}. \tag{22.49} \]

We may show (use l'Hôpital's rule) that $\phi_n(0) = 1$ for every $n$, whereas $\phi_n(\nu) \to 0$ for all $\nu \ne 0$. In the case of 22.20 we get

\[ \phi_n(\nu) = \cos \nu n + i \sin \nu n, \tag{22.50} \]

which fails to converge except at the point $\nu = 0$.
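The behaviour of the two characteristic functions (22.49) and (22.50) can be inspected numerically. This is a minimal sketch with hypothetical function names; the value $\nu = 0.5$ is an arbitrary choice.

```python
import cmath
import math

def chf_uniform(v, n):
    """Ch.f. (22.49) of the uniform distribution on [-n, n], with value 1 at v = 0."""
    return 1.0 if v == 0 else math.sin(v * n) / (v * n)

def chf_point_mass(v, n):
    """Ch.f. (22.50) of X_n = n w.p.1: cos(vn) + i sin(vn)."""
    return cmath.exp(1j * v * n)

for n in (1, 10, 100, 1000):
    # The first column tends to 0 while the value at the origin stays at 1,
    # so the pointwise limit is discontinuous at 0. The second column keeps
    # modulus 1 and moves around the unit circle without settling.
    print(n, chf_uniform(0.5, n), chf_point_mass(0.5, n))
```

In the first case the pointwise limit exists but jumps at the origin, and in the second no limit exists away from the origin, so in neither case are the conditions of the continuity theorem met.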

22.6 Convergence of Random Sums

Most of the important weak convergence results concern sequences of partial sums of a random array $\{X_{nt}, t = 1,\ldots,n, n \in \mathbb{N}\}$. Let

\[ S_n = \sum_{t=1}^{n} X_{nt} \tag{22.51} \]

and consider the distributions of the sequence $\{S_n\}$ as $n \to \infty$. The array notation (double indexing) permits a normalization depending on $n$ to be introduced. Central limit theorems, the cases in which typically $X_{nt} = n^{-1/2} X_t$, and $S_n$ converges to the Gaussian distribution, are to be examined in detail in the following chapters, but these are not the only possibility.

22.23 Example The $B(n, \lambda/n)$ distribution is the distribution of the sum of $n$ independent Bernoulli random variables $X_{nt}$, where $P(X_{nt} = 1) = \lambda/n$ and $P(X_{nt} = 0) = 1 - \lambda/n$. From 22.2 we know that in this case

\[ S_n = \sum_{t=1}^{n} X_{nt} \xrightarrow{D} \text{Poisson with mean } \lambda. \tag{22.52} \]

□

From 11.1 we know that the distribution of a sum of independent r.v.s is given by the convolution of the distributions of the summands. The weak limits of independent sum distributions therefore have to be expressible as infinite convolutions. The class of distributions that have such a representation is necessarily fairly limited. A distribution $F$ is called infinitely divisible if for every $n \in \mathbb{N}$ there exists a distribution $F_n$ such that $F$ has a representation as the $n$-fold convolution

\[ F = F_n * F_n * \cdots * F_n. \tag{22.53} \]

In view of (11.33), infinite divisibility implies a corresponding multiplicative rule for the ch.f.s.

22.24 Example For the Poisson distribution, $\phi_X(t; \lambda) = \exp\{\lambda(e^{it} - 1)\}$ from (11.34), and

\[ \phi_X(t; \lambda) = \left( \exp\{(\lambda/n)(e^{it} - 1)\} \right)^n = \left( \phi_X(t; \lambda/n) \right)^n. \tag{22.54} \]

The sum of $n$ independent Poisson variates having parameter $\lambda/n$ is therefore a Poisson variate with parameter $\lambda$. □
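The multiplicative rule (22.54) can be confirmed numerically with complex arithmetic. The function name and the particular values of $t$, $\lambda$ and $n$ in this sketch are illustrative assumptions.

```python
import cmath

def poisson_chf(t, lam):
    """Ch.f. of the Poisson distribution: exp{lam (e^{it} - 1)}."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1.0))

t, lam, n = 0.7, 3.0, 12
lhs = poisson_chf(t, lam)
rhs = poisson_chf(t, lam / n) ** n  # n-fold product of the ch.f. with parameter lam/n
print(abs(lhs - rhs))
```

The printed difference is at the level of floating-point rounding error, as (22.54) predicts.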

In certain infinitely divisible distributions, $F_n$ and $F$ have a special relationship, expressed through their characteristic functions. A distribution with ch.f. $\phi_X$ is called stable, with index $p$, if for each $n$,

\[ (\phi_X(t))^n = e^{itb(n)} \phi_X(n^{1/p} t), \quad 0 < p \le 2, \tag{22.55} \]

where $b(n)$ is some function of $n$. According to (11.30), the right-hand side of (22.55) is the ch.f. of the r.v. $b(n) + n^{1/p} X$; that is, the sum of $n$ independent drawings from the distribution is a drawing from the same distribution apart from a change of scale and origin. If a stable distribution is also symmetric about zero, it can be shown that the ch.f. must belong to the family of real-valued functions having the form

\[ \phi_X(t) = \exp\{-a|t|^p\}, \quad a \ge 0. \tag{22.56} \]

22.25 Example The Cauchy distribution is stable with $p = 1$ and $b(n) = 0$, having ch.f. $\phi_X(t; \nu, \delta) = \exp\{it\nu - \delta|t|\}$, from (11.38). If $X_t \sim C(\nu, \delta)$ for $t = 1,\ldots,n$, then $\bar{X}_n \sim C(\nu, \delta)$. This result reflects the fact already noted (see 18.18) that Cauchy variates fail to observe the law of large numbers. □

22.26 Example For Gaussian $X$, $\phi_X(t; \mu, \sigma^2) = \exp\{i\mu t - \frac{1}{2}\sigma^2 t^2\}$ by (11.37). This is stable with index $p = 2$ and $b(n) = (n - n^{1/2})\mu$. With $\mu = 0$ we have symmetry about 0, and obtain the formula in (22.56) with $a = \frac{1}{2}\sigma^2$. Thus, if $X_t \sim N(0, \sigma^2)$ for $t = 1,\ldots,n$, then $n^{-1/2} \sum_{t=1}^{n} X_t \sim N(0, \sigma^2)$. □
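The stability relation (22.55) can be checked for the two examples just given. The sketch below is illustrative: the helper `stable_rhs` and all parameter values are assumptions introduced here, and agreement at one value of $t$ is a sanity check, not a proof.

```python
import cmath

def cauchy_chf(t, nu, delta):
    """Ch.f. of the Cauchy distribution C(nu, delta), as in 22.25."""
    return cmath.exp(1j * t * nu - delta * abs(t))

def gaussian_chf(t, mu, sigma2):
    """Ch.f. of N(mu, sigma2), as in 22.26."""
    return cmath.exp(1j * mu * t - 0.5 * sigma2 * t * t)

def stable_rhs(chf, t, n, p, b):
    """Right-hand side of (22.55): exp(i t b(n)) * chf(n**(1/p) * t)."""
    return cmath.exp(1j * t * b) * chf(n ** (1.0 / p) * t)

t, n = 0.3, 9

# Cauchy: p = 1, b(n) = 0.
c = lambda s: cauchy_chf(s, nu=1.0, delta=2.0)
print(abs(c(t) ** n - stable_rhs(c, t, n, p=1, b=0.0)))

# Gaussian: p = 2, b(n) = (n - n**0.5) * mu.
mu = 1.5
g = lambda s: gaussian_chf(s, mu=mu, sigma2=2.0)
print(abs(g(t) ** n - stable_rhs(g, t, n, p=2, b=(n - n**0.5) * mu)))
```

Both printed differences are at rounding-error level, in line with the indices $p = 1$ and $p = 2$ stated in the examples.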

It turns out that a stable distribution with index $p < 2$ possesses absolute moments of order at most $r < p$. Thus, the Gaussian is the only stable law for which a variance exists.

This last fact is most important in motivating the central limit theorem. Each of the foregoing examples illustrates the possibility of a limit law for sums. A stable distribution naturally has the potential to act as an attractor for arbitrarily distributed sums. While stable convergence laws do operate in cases where the summands possess absolute moments only up to order $r < 2$, the key result, to be studied in detail in the following chapters, is that the Gaussian acts as the unique attractor for the distributions of scaled sums of arbitrary r.v.s having zero mean and finite variances.

23 The Classical Central Limit Theorem

23.1 The i.i.d. Case

The 'normal law of error' is justly the most famous result in statistics, and to the susceptible mind has an almost mystical fascination. If a sequence of random variables $\{X_t\}_1^\infty$ have means of zero, and the partial sums $\sum_{t=1}^{n} X_t$, $n = 1,2,3,\ldots$ have variances $s_n^2$ tending to infinity with $n$ although finite for each finite $n$, then, subject to rather mild additional conditions on the distributions and the sampling process,

\[ S_n = \frac{1}{s_n} \sum_{t=1}^{n} X_t \xrightarrow{D} N(0,1). \tag{23.1} \]

This is the central limit theorem (CLT). Establishing sets of sufficient conditions is the main business of this chapter and the next, but before getting into the formal results it might be of interest to illustrate the operation of the CLT as an approximation theorem. Particularly if the distribution of the $X_t$ is symmetric, the approach to the limit can be very rapid.

23.1 Example In 11.2 we derived the distribution of the sum of two independent $U[0,1]$ drawings. Similarly, the sum of three such drawings has density

$$f_{X+Y+Z}(w) = \int_0^1\!\!\int_0^1 1_{[w-z-1,\,w-z]}(x)\,dx\,dz, \tag{23.2}$$

which is plotted in Fig. 23.1. This function is actually piecewise quadratic (the three segments are on $[0,1]$, $[1,2]$ and $[2,3]$ respectively), but lies remarkably close to the density of the Gaussian r.v. having the same mean and variance as $X + Y + Z$ (also plotted). The sum of 10 or 12 independent uniform r.v.s is almost indistinguishable from a Gaussian variate; indeed, the formula $S = \sum_{i=1}^{12} X_i - 6$, which has mean 0 and variance 1 when $X_i \sim U[0,1]$ and independent, provides a simple and perfectly adequate device for simulating a standard Gaussian variate in computer modelling exercises. □

23.2 Example For a contrast in the manner of convergence consider the $B(n,p)$ distribution, the sum of $n$ Bernoulli($p$) variates for fixed $p \in (0,1)$. The probabilities for $p = \tfrac{1}{2}$ and $n = 20$ are plotted in Fig. 23.2, together with the Gaussian density with matching mean and variance. These distributions are of course discrete for every finite $n$, and continuous only in the limit. The correspondence of the ordinates is remarkably close, although remember that for $p \ne \tfrac{1}{2}$ the binomial distribution is not symmetric and the convergence is correspondingly slower. This example should be compared with 22.2, the non-Gaussian limit in the latter case being obtained by having $p$ decline as a function of $n$. □
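Example 23.1 lends itself to a short numerical illustration. The following Python sketch is not from the original text, and its function names are hypothetical. It codes the piecewise quadratic density of the sum of three $U[0,1]$ variables in its standard closed form, the matching Gaussian density, and the "sum of 12 uniforms minus 6" device.

```python
import math
import random


def density_sum_of_three_uniforms(w):
    """Density of X + Y + Z for independent U[0,1] variables (piecewise quadratic)."""
    if 0.0 <= w <= 1.0:
        return 0.5 * w ** 2
    if 1.0 < w <= 2.0:
        return 0.5 * (-2.0 * w ** 2 + 6.0 * w - 3.0)
    if 2.0 < w <= 3.0:
        return 0.5 * (3.0 - w) ** 2
    return 0.0


def gaussian_density(x, mean, variance):
    """Density of N(mean, variance) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def approx_standard_normal(rng):
    """S = (sum of 12 independent U[0,1] draws) - 6, with mean 0 and variance 1."""
    return sum(rng.random() for _ in range(12)) - 6.0
```

The sum of three uniforms has mean 3/2 and variance 1/4, so the relevant comparison is `gaussian_density(w, 1.5, 0.25)`; the two curves differ most at the mode.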

Fig. 23.1 Density of the sum of three $U[0,1]$ variables, plotted with the $N(\tfrac{3}{2},\tfrac{1}{4})$ density.

Fig. 23.2 Probabilities of the Bin(20, 0.5) distribution, plotted with the $N(10,5)$ density.
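The comparison plotted in Fig. 23.2 can be reproduced in numbers. This is a minimal Python sketch, not part of the original text, with hypothetical function names; it needs `math.comb` (Python 3.8 or later). It measures the largest gap between the binomial probabilities and the ordinates of the Gaussian density with matching mean $np$ and variance $np(1-p)$.

```python
import math


def binomial_pmf(k, n, p):
    """P(B = k) for B ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def matching_gaussian_density(k, n, p):
    """Density at k of the Gaussian with mean n*p and variance n*p*(1-p)."""
    mean, variance = n * p, n * p * (1.0 - p)
    return math.exp(-0.5 * (k - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def max_ordinate_gap(n, p):
    """Largest absolute difference between the two sets of ordinates, k = 0,...,n."""
    return max(abs(binomial_pmf(k, n, p) - matching_gaussian_density(k, n, p))
               for k in range(n + 1))
```

Calling `max_ordinate_gap(20, 0.5)` and `max_ordinate_gap(20, 0.1)` gives a rough sense of how much the asymmetric case lags behind the symmetric one at the same $n$.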

Proofs of the CLT, like proofs of stochastic convergence, depend on establishing properties for certain nonstochastic sequences. Previously we considered sample points $|X_n(\omega) - X(\omega)|$ for $\omega \in C$ with $P(C) = 1$, probabilities $P(|X_n - X| > \varepsilon)$, and moments $E|X_n - X|^p$, as different sequences to be shown to converge to 0 to establish the convergence of $X_n$ to $X$, respectively a.s., in pr., or in $L_p$. In the present case we consider the expectations of certain functions of the $S_n$; the key result is Theorem 22.8. The practical trick is to find functions that will fingerprint the limiting distribution conclusively. The characteristic function is by common consent the convenient choice, since we can exploit the multiplicative property for independent sums. This is not the only possible method though, and the reader can find an alternative approach in Pollard (1984: III.4), for example.

The simplest case is where the sequence $\{X_t\}$ is both stationary and independently drawn.

23.3 Lindeberg-Lévy theorem If $\{X_t\}_1^\infty$ is an i.i.d. sequence having zero mean and variance $\sigma^2$,

$$S_n = n^{-1/2}\sum_{t=1}^n X_t/\sigma \xrightarrow{D} N(0,1). \tag{23.3}$$ □

Proof The ch.f.s $\phi_X(\lambda)$ of $X_t$ are identical for all $t$, so from (11.30) and (11.33),

$$\phi_{S_n}(\lambda) = \left(\phi_X(\lambda\sigma^{-1}n^{-1/2})\right)^n. \tag{23.4}$$

Applying 11.6 with $k = 2$ yields the expansion

$$\left|\phi_X(\lambda\sigma^{-1}n^{-1/2}) - 1 + \frac{\lambda^2}{2n}\right| \le E\min\left\{\frac{\lambda^2X_t^2}{\sigma^2n},\ \frac{|\lambda|^3|X_t|^3}{6\sigma^3n^{3/2}}\right\}, \tag{23.5}$$

which makes it possible to write, for fixed $\lambda$,

$$\phi_X(\lambda\sigma^{-1}n^{-1/2}) = 1 - \frac{\lambda^2}{2n} + O(n^{-3/2}). \tag{23.6}$$

Applying the binomial expansion, $(1 + a/n)^n = \sum_{j=0}^n\binom{n}{j}(a/n)^j \to \sum_{j=0}^\infty a^j/j! = e^a$ as $n \to \infty$, and setting $a = -\tfrac{1}{2}\lambda^2 + O(n^{-1/2})$, we find

$$\lim_{n\to\infty}\phi_{S_n}(\lambda) = e^{-\lambda^2/2}. \tag{23.7}$$

Comparing this formula with (11.36), the limiting ch.f. is revealed as that of the $N(0,1)$ variable. We then appeal to the inversion theorem (11.12) to establish that the limiting distribution is necessarily Gaussian. ∎
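The convergence in (23.7) can be watched directly for distributions whose ch.f. is known in closed form. The sketch below is an illustration in Python, not part of the original text; the two distributions (symmetric $\pm 1$ and $U[-1,1]$) and the function names are choices made here for convenience.

```python
import math


def chf_standardized_sum_rademacher(lam, n):
    """Ch.f. of S_n when X_t = +1 or -1 with probability 1/2 each.

    Here sigma = 1 and the ch.f. of X_t is cos(lam), so (23.4) gives cos(lam/sqrt(n))**n.
    """
    return math.cos(lam / math.sqrt(n)) ** n


def chf_standardized_sum_uniform(lam, n):
    """Ch.f. of S_n when X_t ~ U[-1, 1]: sigma**2 = 1/3 and the ch.f. of X_t is sin(u)/u."""
    sigma = math.sqrt(1.0 / 3.0)
    u = lam / (sigma * math.sqrt(n))
    phi = math.sin(u) / u if u != 0.0 else 1.0
    return phi ** n


def gaussian_limit(lam):
    """Limiting ch.f. exp(-lam**2/2) from (23.7)."""
    return math.exp(-0.5 * lam ** 2)
```

Both functions are real-valued because the chosen distributions are symmetric about zero, which is also the case in which the text remarks that the approach to the limit is fastest.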

Be careful to note how the existence of $E|X_t|^3$ is not necessary for the expansion in (23.6) to hold. The 'min' function whose expectation appears on the majorant side of (23.5) is unquestionably of $O(n^{-3/2})$, but also clearly integrable for each $n$ on the assumption of finite variance.

The Lindeberg-Lévy theorem imposes strong assumptions, but offers the benefit of a simple and transparent proof. All the key features of the central limit property are discernible. In (23.6), the expansion of the ch.f. of $n^{-1/2}X_t$ consists either of terms common to every centred distribution with finite variance, or of terms that can be neglected asymptotically, a fact that ensures that the limiting sum distribution is invariant to the component distributions. The imaginary part of $\phi_X(\lambda\sigma^{-1}n^{-1/2})$ is of smaller order than the real part, which would appear to require a symmetric limit by the remarks following (11.31). The coincidence of these facts with the fact that the centred Gaussian is the only stable symmetric distribution having a second moment appears to rule out any alternative to the central limit property under the specified conditions, that is, zero mean and finite variance. The earlier remark that symmetry of the distribution of $n^{-1/2}X_t$ improves the rate of convergence to the limit can be also appreciated here. If we can assume $E(X_t^3) = 0$, the expansion in (23.5) can be taken to third order, and the remainder in (23.6) is of $O(n^{-2})$.

On the other hand, if the variance does not exist the expansion of (23.6) fails. Indeed, in the specific case we know of, in which the $X_t$ are centred Cauchy, $n^{1/2}\bar{X}_n = O_p(n^{1/2})$; the sequence of distributions of $\{n^{1/2}\bar{X}_n\}$ is not tight, and there is no weak convergence. The limit law for the sum would itself be Cauchy under the appropriate scaling of $n^{-1}$.

The distinction between convergence in distribution and convergence in probability, and in particular the fact that the former does not imply the latter, can be demonstrated here by means of a counter-example. Consider the sequence $\{X_t\}_1^\infty$ defined in the statement of the Lindeberg-Lévy theorem, and the corresponding $S_n$ in (23.3).

23.4 Theorem $S_n$ does not converge in probability.

Proof If it was true that $\operatorname{plim}_{n\to\infty}S_n = Z$, it would also be the case that $\operatorname{plim}_{n\to\infty}S_{2n} = Z$, implying

$$\operatorname*{plim}_{n\to\infty}(S_{2n} - S_n) = 0. \tag{23.8}$$

We will show that (23.8) is false. We have

$$S_{2n} = (2n)^{-1/2}\left(\sum_{t=1}^n X_t/\sigma + \sum_{t=n+1}^{2n}X_t/\sigma\right) = (S_n + S_n')/\sqrt{2}, \tag{23.9}$$

where $S_n' = n^{-1/2}\sum_{t=n+1}^{2n}X_t/\sigma$, hence

$$S_{2n} - S_n = S_n(1/\sqrt{2} - 1) + S_n'/\sqrt{2}. \tag{23.10}$$

According to the Lindeberg-Lévy theorem, $\phi_{S_n}(\lambda) \to \exp\{-\tfrac{1}{2}\lambda^2\}$ and $\phi_{S_n'}(\lambda) \to \exp\{-\tfrac{1}{2}\lambda^2\}$. Since no $X_t$ is contained in both sums, $S_n$ and $S_n'$ are independent, each with mean zero and variance 1 (both sums being standardized by $\sigma$ as in (23.3)). Noting $(1/\sqrt{2} - 1)^2 + (1/\sqrt{2})^2 = 2 - \sqrt{2}$ and applying the properties of ch.f.s,

$$\phi_{S_{2n}-S_n}(\lambda) \to \exp\{-\tfrac{1}{2}\lambda^2(2 - \sqrt{2})\}. \tag{23.11}$$

In other words,

$$S_{2n} - S_n \xrightarrow{D} N(0,\ 2 - \sqrt{2}), \tag{23.12}$$

which is the required contradiction of (23.8). ∎

Compare the sequence $\{S_n(\omega)\}_1^\infty$ with, say, $\{X(\omega) + Y(\omega)/n\}_1^\infty$, $\omega \in \Omega$, where $X$ and $Y$ are random variables. For given $\omega$, the latter sequence converges to a fixed limit, $X(\omega)$. On the other hand, each new contribution to $S_n$ has equal weight with the others, ensured by re-scaling the sum as the sample size increases. For given $\omega$, $S_n(\omega) = n^{-1/2}\sum_{t=1}^n X_t(\omega)/\sigma$ is not a convergent sequence, for as 23.4 shows $S_{2n}$ is not necessarily close to $S_n$ no matter how large $n$ becomes. Weak convergence of the distribution functions does not imply convergence of the random sequence.
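A small simulation makes the point of 23.4 concrete: the difference $S_{2n} - S_n$ does not shrink as $n$ grows, and its variance stays near $2 - \sqrt{2} \approx 0.586$. The Python sketch below is illustrative only and not from the original text. Gaussian summands with $\sigma = 1$ are an assumption made here for convenience, and the function names are hypothetical.

```python
import math
import random


def standardized_sum(xs, sigma=1.0):
    """S_n = n**-0.5 * sum(X_t) / sigma, as in (23.3)."""
    return sum(xs) / (sigma * math.sqrt(len(xs)))


def simulate_differences(n, reps, seed=0):
    """Draw `reps` independent realizations of S_2n - S_n.

    Assumption: X_t ~ N(0, 1), chosen only to keep the example short.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(2 * n)]
        diffs.append(standardized_sum(xs) - standardized_sum(xs[:n]))
    return diffs


def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)
```

Comparing `sample_variance(simulate_differences(10, 4000))` with the same quantity at `n = 50` should show no tendency towards zero.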

Characteristic function-based arguments can also be used to show convergence in distribution to a degenerate limit. The following is a well-known proof of the weak law of large numbers for i.i.d. sequences, which circumvents the need to show $L_1$ convergence.

23.5 Khinchine's theorem If $\{X_t\}_1^\infty$ is an identically and independently distributed sequence with finite mean $\mu$, then $\bar{X}_n = n^{-1}\sum_{t=1}^n X_t \xrightarrow{pr} \mu$.

Proof The characteristic function of $\bar{X}_n$ has the form

$$\phi_{\bar{X}_n}(\lambda) = \left(\phi_X(\lambda/n)\right)^n, \tag{23.13}$$

where, by application of the argument used for 23.3, $\phi_X(\lambda/n) = 1 + i\lambda\mu/n + O(\lambda^2/n^2)$. Letting $n \to \infty$ we find, by analogy with (23.7),

$$\lim_{n\to\infty}\phi_{\bar{X}_n}(\lambda) = e^{i\lambda\mu}. \tag{23.14}$$

But $E(e^{i\lambda X}) = e^{i\lambda\mu}$ only for the case where $X = \mu$ with probability 1. The distribution is degenerate, and convergence in probability follows from 22.5. ∎
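The limit (23.14) can be illustrated with any distribution whose ch.f. is available in closed form. The Python sketch below is not part of the original text; it uses the exponential distribution with mean $\mu$, whose ch.f. is $1/(1 - i\lambda\mu)$, as an assumed example, and the function names are hypothetical.

```python
import cmath


def chf_exponential(lam, mu):
    """Ch.f. of the exponential distribution with mean mu: 1 / (1 - i*lam*mu)."""
    return 1.0 / (1.0 - 1j * lam * mu)


def chf_sample_mean(lam, mu, n):
    """Ch.f. of the mean of n i.i.d. exponential(mu) draws, (phi_X(lam/n))**n as in (23.13).

    The power is evaluated as exp(n * log(.)), which equals the n-th power for integer n.
    """
    return cmath.exp(n * cmath.log(chf_exponential(lam / n, mu)))


def degenerate_chf(lam, mu):
    """Ch.f. of the degenerate distribution at mu: exp(i*lam*mu)."""
    return cmath.exp(1j * lam * mu)
```

The gap `abs(chf_sample_mean(lam, mu, n) - degenerate_chf(lam, mu))` should fall roughly in proportion to $1/n$ for this distribution.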

23.2 Independent Heterogeneous Sequences

The Lindeberg-Lévy theorem imposes conditions which are too strong for the result to have wide practical applications in econometrics. In the remainder of this chapter we retain the assumption of independence (to be relaxed in Chapter 24), but allow the summands to have different distributions.

In this theory it is convenient to work with normalized variables, such that the partial sums always have unit variance. This entails a double indexing scheme. Define the triangular array

$$\{X_{nt},\ t = 1,\ldots,n,\ n \in \mathbb{N}\},$$

the elements having zero mean and variances $\sigma_{nt}^2$, such that if

$$S_n = \sum_{t=1}^n X_{nt}, \tag{23.15}$$

then (under independence)

$$E(S_n^2) = \sum_{t=1}^n \sigma_{nt}^2 = 1. \tag{23.16}$$

Typically we would have $X_{nt} = (Y_t - \mu_t)/s_n$, where $\{Y_t\}$ is the 'raw' sequence under study, with means $\{\mu_t\}$, and $s_n^2 = \sum_{t=1}^n E(Y_t - \mu_t)^2$. In this case $\sigma_{nt}^2 = E(Y_t - \mu_t)^2/s_n^2$, so that these variances sum to unity by construction. It is also possible to have $X_{nt} = (Y_{nt} - \mu_{nt})/s_n$, the double indexing of the mean arising in situations where the sequence depends on a parameter whose value in turn depends on $n$. This case arises, for example, in the study of the limiting distributions of test statistics under a sequence of 'local' deviations from the null hypothesis, the device known as Pitman drift.

The existence of each variance $\sigma_{nt}^2$ is going to be a necessary baseline condition in all the theorems, just as the existence of the common variance $\sigma^2$ was required in the Lindeberg-Lévy theorem. However, with heterogeneity, not even uniformly bounded variances are sufficient to get a central limit result. If the $Y_t$ are identically distributed, we do not have to worry about a small (i.e. finite) number of members of the sequence exhibiting such extreme behaviour as to influence the distribution of the sum as a whole, even in the limit. But in a heterogeneous sequence this is possible, and could interfere with convergence to the normal, which usually depends on the contribution of each individual member of the sequence being negligible.

The standard result for independent, non-identically distributed sequences is the Lindeberg-Feller theorem, which establishes that a certain condition on the distributions of the summands is sufficient, and in some circumstances also necessary. Lindeberg is credited with the sufficiency part, and Feller the necessity; we look at the latter in the next section.

23.6 Lindeberg theorem Let the array $\{X_{nt}\}$ be independent with zero mean and variance sequence $\{\sigma_{nt}^2\}$ satisfying (23.16). Then, $S_n \xrightarrow{D} N(0,1)$ if

$$\lim_{n\to\infty}\sum_{t=1}^n\int_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2\,dP = 0, \quad\text{for all } \varepsilon > 0. \tag{23.17}$$ □

Equation (23.17) is known as the Lindeberg condition.

The proof of the Lindeberg theorem requires a couple of purely mechanical lemmas.

23.7 Lemma If $x_1,\ldots,x_n$ and $y_1,\ldots,y_n$ are collections of complex numbers with $|x_t| \le 1$ and $|y_t| \le 1$ for $t = 1,\ldots,n$, then

$$\left|\prod_{t=1}^n x_t - \prod_{t=1}^n y_t\right| \le \sum_{t=1}^n|x_t - y_t|. \tag{23.18}$$

Proof For $n = 2$,

$$|x_1x_2 - y_1y_2| = |(x_1 - y_1)x_2 + (x_2 - y_2)y_1| \le |x_1 - y_1||x_2| + |x_2 - y_2||y_1| \le |x_1 - y_1| + |x_2 - y_2|. \tag{23.19}$$

The general case follows easily by induction. ∎

23.8 Lemma If $z$ is a complex number and $|z| \le \tfrac{1}{2}$, then $|e^z - 1 - z| \le |z|^2$.

Proof Using the triangle inequality,

$$|e^z - 1 - z| = \left|\sum_{j=2}^\infty\frac{z^j}{j!}\right| \le |z|^2\sum_{j=0}^\infty\frac{|z|^j}{(j+2)!}. \tag{23.20}$$

Since $\sum_{j=0}^\infty 2^{-j} = 2$, the infinite series on the right-hand side cannot exceed 1. ∎
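Both lemmas are easy to spot-check on random complex inputs. The following Python sketch is an informal check and not part of the original text; the helper names are hypothetical. Each `*_gap` function returns the right-hand side minus the left-hand side of the corresponding inequality, so a nonnegative value (up to rounding) is what the lemmas predict.

```python
import cmath
import math
import random


def random_point_in_disc(rng, radius):
    """A complex number drawn uniformly from the disc of the given radius."""
    r = radius * math.sqrt(rng.random())
    return cmath.rect(r, 2.0 * math.pi * rng.random())


def product(values):
    """Product of a sequence of complex numbers."""
    out = 1.0 + 0.0j
    for v in values:
        out *= v
    return out


def lemma_23_7_gap(xs, ys):
    """sum|x_t - y_t| - |prod x_t - prod y_t|, as in (23.18)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) - abs(product(xs) - product(ys))


def lemma_23_8_gap(z):
    """|z|**2 - |exp(z) - 1 - z|, nonnegative when |z| <= 1/2."""
    return abs(z) ** 2 - abs(cmath.exp(z) - 1.0 - z)
```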

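Proof of 23.6 We have to show that $\phi_{S_n}(\lambda) \to e^{-\lambda^2/2}$ as $n \to \infty$. The difference is bounded by

$$\begin{aligned}\left|\phi_{S_n}(\lambda) - e^{-\lambda^2/2}\right| &= \left|\prod_{t=1}^n\phi_{X_{nt}}(\lambda) - \prod_{t=1}^n e^{-\lambda^2\sigma_{nt}^2/2}\right| \\ &\le \left|\prod_{t=1}^n\phi_{X_{nt}}(\lambda) - \prod_{t=1}^n(1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2)\right| + \left|\prod_{t=1}^n e^{-\lambda^2\sigma_{nt}^2/2} - \prod_{t=1}^n(1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2)\right|,\end{aligned} \tag{23.21}$$

where the equality is by definition, using the fact that $\sum_{t=1}^n\sigma_{nt}^2 = 1$, and the inequality is the triangle inequality. The proof will be complete if we can show that each of the right-hand side terms converges to zero.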

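The integrals in (23.17) may be expressed in the form $E(1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2)$, and

$$\begin{aligned}\sigma_{nt}^2 &= E(1_{\{|X_{nt}|\le\varepsilon\}}X_{nt}^2) + E(1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2) \\ &\le \varepsilon^2 + E(1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2) \to \varepsilon^2 \text{ as } n \to \infty, \text{ all } 1 \le t \le n,\end{aligned} \tag{23.22}$$

since the Lindeberg condition implies that the second term on the right-hand side of the inequality (which is positive) goes to zero; since $\varepsilon$ can be chosen arbitrarily small, this shows that

$$\max_{1\le t\le n}\sigma_{nt}^2 \to 0. \tag{23.23}$$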

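In the first of the two terms on the majorant side of (23.21), the ch.f.s are all less than 1 in modulus, and by taking $n$ large enough we can make $|1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2| \le 1$ for any fixed value of $\lambda$. Hence by 23.7,

$$\left|\prod_{t=1}^n\phi_{X_{nt}}(\lambda) - \prod_{t=1}^n(1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2)\right| \le \sum_{t=1}^n\left|\phi_{X_{nt}}(\lambda) - (1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2)\right|. \tag{23.24}$$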

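To break down the terms on the majorant side of (23.24), note that 11.6 for the case $k = 2$, combined with (11.29), yields

$$\begin{aligned}\left|\phi_{X_{nt}}(\lambda) - (1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2)\right| &\le E\left(\min\{\lambda^2X_{nt}^2,\ \tfrac{1}{6}|\lambda|^3|X_{nt}|^3\}\right) \\ &\le \lambda^2E(1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2) + \tfrac{1}{6}|\lambda|^3\varepsilon\sigma_{nt}^2.\end{aligned} \tag{23.25}$$

Hence, recalling $\sum_{t=1}^n\sigma_{nt}^2 = 1$,

$$\sum_{t=1}^n\left|\phi_{X_{nt}}(\lambda) - (1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2)\right| \le \lambda^2\sum_{t=1}^n E(1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2) + \tfrac{1}{6}|\lambda|^3\varepsilon \to \tfrac{1}{6}|\lambda|^3\varepsilon \text{ as } n \to \infty, \tag{23.26}$$

since the first majorant-side term vanishes by the Lindeberg condition. Since $\varepsilon$ is arbitrary, this limit can be made as small as desired.

Similarly, for the second term of (23.21), take $n$ large enough so that

$$\left|\prod_{t=1}^n e^{-\lambda^2\sigma_{nt}^2/2} - \prod_{t=1}^n(1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2)\right| \le \sum_{t=1}^n\left|e^{-\lambda^2\sigma_{nt}^2/2} - 1 + \tfrac{1}{2}\lambda^2\sigma_{nt}^2\right|. \tag{23.27}$$

Setting $z = -\tfrac{1}{2}\lambda^2\sigma_{nt}^2$ (a real number, actually) in 23.8 and applying the result to the majorant side of (23.27) gives

$$\left|\prod_{t=1}^n e^{-\lambda^2\sigma_{nt}^2/2} - \prod_{t=1}^n(1 - \tfrac{1}{2}\lambda^2\sigma_{nt}^2)\right| \le \tfrac{1}{4}\lambda^4\sum_{t=1}^n\sigma_{nt}^4. \tag{23.28}$$

But,

$$\sum_{t=1}^n\sigma_{nt}^4 \le \left(\max_{1\le t\le n}\sigma_{nt}^2\right)\sum_{t=1}^n\sigma_{nt}^2 = \max_{1\le t\le n}\sigma_{nt}^2 \to 0 \text{ as } n \to \infty, \tag{23.29}$$

by (23.23). The proof is therefore complete. ∎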

The Lindeberg condition is subtle, and its implications for the behaviour of random sequences call for careful interpretation. This will be easier if we look at the case $X_{nt} = X_t/s_n$, where $X_t$ has mean 0 and variance $\sigma_t^2$, and $s_n^2 = \sum_{t=1}^n\sigma_t^2$. Then the Lindeberg condition becomes

$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{t=1}^n\int_{\{|X_t|>s_n\varepsilon\}}X_t^2\,dP = 0, \quad\text{for all } \varepsilon > 0. \tag{23.30}$$

One point easily verified is that, when the summands are identically distributed, $s_n = \sqrt{n}\sigma$ and (23.30) reduces to $\lim_{n\to\infty}\sigma^{-2}E(X_1^2 1_{\{|X_1|>\sqrt{n}\sigma\varepsilon\}}) = 0$. The Lindeberg condition then holds if and only if $X_1$ has finite variance, so that the Lindeberg theorem contains the Lindeberg-Lévy as a special case.

The problematical cases the Lindeberg condition is designed to exclude are those where the behaviour of a finite subset of sequence elements dominates all the others, even in the limit. This can occur either by the sequence becoming excessively disorderly in the limit, or (the other side of the same coin, really) by its being not disorderly enough, beyond a certain point.

Thus, the condition clearly fails if the variance sequence $\{\sigma_t^2\}$ is tending to zero in such a way that $s_n^2 = \sum_{t=1}^n\sigma_t^2$ is bounded in $n$. On the other hand, if $s_n^2 \to \infty$, then $s_n\varepsilon \to \infty$ for any fixed positive $\varepsilon$, and the Lindeberg condition resembles a condition of 'average' uniform integrability of $\{X_t^2\}$. The sum of the terms $E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2)$ must grow less fast than $s_n^2$, no matter how close $\varepsilon$ is to zero.

The following is a counter-example (compare 12.7).

23.9 Example Let $X_t = Y_t - E(Y_t)$ where $Y_t = 0$ with probability $1 - t^{-2}$, and $t$ with probability $t^{-2}$. Thus $E(Y_t) = t^{-1}$ and $X_t$ is converging to a degenerate r.v., equal to 0 with probability 1, although $\operatorname{Var}(Y_t) = 1 - t^{-2}$ for every $t$. The Lindeberg condition fails here. $s_n^2 = n - \sum_{t=1}^n t^{-2}$, and for $0 < \varepsilon \le 1$ we certainly have $t > s_n\varepsilon$ whenever $t > n^{1/2}$. Therefore,

$$\frac{1}{s_n^2}\sum_{t=1}^n\int_{\{|X_t|>s_n\varepsilon\}}X_t^2\,dP \ge \frac{1}{n - \sum_{t=1}^n t^{-2}}\sum_{t=[n^{1/2}]+1}^n(t - t^{-1})^2t^{-2} \to 1 \tag{23.31}$$

as $n \to \infty$. And indeed, we can show rather easily that the CLT fails here. For any $n_0 \ge 1$, if we put $B_0 = \sum_{t=1}^{n_0}t$, then

$$\sup_n P\left(\sum_{t=1}^n Y_t > B_0\right) \le P\left(\bigcup_{t=n_0+1}^\infty\{Y_t > 0\}\right) \le \sum_{t=n_0+1}^\infty t^{-2}, \tag{23.32}$$

where the majorant side can be made as small as desired by choosing $n_0$ large enough. It follows that $\sum_{t=1}^n Y_t = O_p(1)$ and hence, since we also have $\sum_{t=1}^n E(Y_t) = O(\log n)$, that $\sum_{t=1}^n X_t/s_n \xrightarrow{pr} 0$, confirming that the CLT does not operate in this case. □
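Because $X_t$ in Example 23.9 takes only two values, the Lindeberg sum in (23.30) can be evaluated exactly rather than bounded. The Python sketch below is illustrative and not part of the original text; the function name is hypothetical. It uses the fact that $X_t = t - t^{-1}$ with probability $t^{-2}$ and $X_t = -t^{-1}$ otherwise.

```python
import math


def lindeberg_sum_example_23_9(n, eps):
    """Exact (1/s_n^2) * sum_t E(1{|X_t| > s_n*eps} X_t^2) for Example 23.9 (n >= 2)."""
    s2 = sum(1.0 - t ** -2 for t in range(1, n + 1))  # s_n^2 = n - sum_t t^-2
    cutoff = math.sqrt(s2) * eps
    total = 0.0
    for t in range(1, n + 1):
        p = t ** -2                        # P(Y_t = t)
        big, small = t - 1.0 / t, 1.0 / t  # |X_t| on the events {Y_t = t} and {Y_t = 0}
        if big > cutoff:
            total += p * big ** 2
        if small > cutoff:
            total += (1.0 - p) * small ** 2
    return total / s2
```

Evaluating the function at increasing $n$ for a fixed `eps` shows the sum approaching 1 instead of 0, in line with (23.31).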

Uniform square-integrability is neither sufficient nor necessary for the Lindeberg condition, so parallels must be drawn with caution. However, the following theorem gives a simple sufficient condition.

23.10 Theorem If $\{X_t^2\}$ is uniformly integrable, and $s_n^2/n \ge B > 0$ for all $n$, then (23.30) holds.

Proof For any $n$ and $\varepsilon > 0$, the latter assumption implies

$$\frac{1}{s_n^2}\sum_{t=1}^n E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2) \le \frac{1}{B}\max_{1\le t\le n}\{E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2)\}. \tag{23.33}$$

Hence

$$\begin{aligned}\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{t=1}^n E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2) &\le \frac{1}{B}\limsup_{n\to\infty}\max_{1\le t\le n}\{E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2)\} \\ &\le \frac{1}{B}\sup_t\lim_{n\to\infty}E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2) = 0,\end{aligned} \tag{23.34}$$

where the last equality follows by uniform integrability. ∎

There is no assumption here that the sequence is independent. The conditions involve only the sequence of marginal distributions of the variables. None the less, the conditions are stronger than necessary, and a counter-example as well as further discussion of the conditions appears in §23.4.

The following is a popular version of the CLT for independent processes.

23.11 Liapunov's theorem A sufficient condition for (23.17) is

$$\lim_{n\to\infty}\sum_{t=1}^n E|X_{nt}|^{2+\delta} = 0 \quad\text{for some } \delta > 0. \tag{23.35}$$ □

Proof For $\delta > 0$ and $\varepsilon > 0$,

$$E|X_{nt}|^{2+\delta} \ge E(1_{\{|X_{nt}|>\varepsilon\}}|X_{nt}|^{2+\delta}) \ge \varepsilon^\delta E(1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2). \tag{23.36}$$

The theorem follows since, if $\lim_{n\to\infty}\varepsilon^\delta\sum_{t=1}^n E(1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2) = 0$ for fixed $\varepsilon > 0$, then the same holds with $\varepsilon^\delta$ replaced by 1. ∎

Condition (23.35) is called the Liapunov condition, although this term is also used to refer to Liapunov's original result, in which the condition was cast in terms of integer moments, i.e.

$$\lim_{n\to\infty}\sum_{t=1}^n E|X_{nt}|^3 = 0. \tag{23.37}$$

Although stronger than necessary, the Liapunov condition has the advantage of being more easily checkable, at least in principle, than the Lindeberg condition, as the following example illustrates.

23.12 Theorem Liapunov's condition holds if $s_n^2/n > 0$ uniformly in $n$ and $E|X_t|^{2+\delta} < \infty$ uniformly in $t$, $\delta > 0$.

Proof Under the stated conditions,

$$\frac{1}{s_n^{2+\delta}}\sum_{t=1}^n E|X_t|^{2+\delta} \le \frac{nB_1}{s_n^{2+\delta}} \le \frac{B_1}{B_2^{1+\delta/2}}\,n^{-\delta/2} \tag{23.38}$$

for all $n$, where $B_1 \ge \sup_t E|X_t|^{2+\delta}$ and $B_2 \le \inf_n s_n^2/n$. Then

$$\lim_{n\to\infty}\frac{1}{s_n^{2+\delta}}\sum_{t=1}^n E|X_t|^{2+\delta} = 0 \tag{23.39}$$

follows immediately. ∎

Note that these conditions imply those of 23.10, by 12.10. It is sufficient to avoid the 'knife-edge' condition in which variances exist but no moments even fractionally higher, provided the sum of those variances is also $O(n)$.
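The Liapunov ratio in (23.38) is straightforward to compute when the moments of the summands are known. The following Python sketch is illustrative and not part of the original text; the function names and the inputs (lists of absolute moments and variances supplied by the user) are assumptions of this example.

```python
def liapunov_ratio(abs_moments, variances, delta):
    """(1/s_n^(2+delta)) * sum_t E|X_t|^(2+delta), with s_n^2 = sum of the variances.

    abs_moments[t] is E|X_t|^(2+delta); variances[t] is Var(X_t).
    """
    s2 = sum(variances)
    return sum(abs_moments) / s2 ** (1.0 + delta / 2.0)


def liapunov_bound(b1, b2, n, delta):
    """Majorant B_1 * n^(-delta/2) / B_2^(1+delta/2) of the ratio, as in (23.38)."""
    return b1 * n ** (-delta / 2.0) / b2 ** (1.0 + delta / 2.0)
```

For $n$ i.i.d. symmetric $\pm 1$ variables (unit variance and unit third absolute moment) with $\delta = 1$, the ratio equals $n^{-1/2}$ and coincides with the bound for $B_1 = B_2 = 1$.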

23.3 Feller's Theorem and Asymptotic Negligibility

We said above that the Lindeberg condition is sufficient and sometimes also necessary. The following result specifies the side condition which implies necessity.

23.13 Feller's theorem Let $\{X_{nt}\}$ be an independent sequence with zero mean and variance sequence $\{\sigma_{nt}^2\}$. If $S_n \xrightarrow{D} N(0,1)$ and

$$\max_{1\le t\le n}P(|X_{nt}| > \varepsilon) \to 0 \text{ as } n \to \infty, \text{ any } \varepsilon > 0, \tag{23.40}$$

the Lindeberg condition must hold. □

The proof of Feller's theorem is rather fiddly and mechanical, but necessary conditions are rare and difficult to obtain in this theory, and it is worth a little study for that reason alone. Several of the arguments are of the type used already in the sufficiency part.

Proof Since $\sigma_{nt}^2 \to 0$ for every $t$, the series expansion of the ch.f. suggests that $|\phi_{X_{nt}}(\lambda) - 1|$ converges to zero for each $t$. In fact, we can show that the sum of the squares of these terms converges, and this is the first step in the proof. Applying 11.6 for $k = 0$ and $k = 1$ respectively, we can assert that

$$|\phi_{X_{nt}}(\lambda) - 1| \le E(\min\{2,\ |\lambda||X_{nt}|\}) \le 2P(|X_{nt}| > \varepsilon) + \varepsilon|\lambda| \tag{23.41}$$

and

$$|\phi_{X_{nt}}(\lambda) - 1| \le E(\min\{2|\lambda||X_{nt}|,\ \tfrac{1}{2}\lambda^2X_{nt}^2\}) \le \tfrac{1}{2}\lambda^2\sigma_{nt}^2. \tag{23.42}$$

In each case the second inequality is by (11.29), setting $\varepsilon = 0$ for (23.42). Squaring $|\phi_{X_{nt}}(\lambda) - 1|$, adding up over $t$, and substituting from the inequalities, remembering $\sum_{t=1}^n\sigma_{nt}^2 = 1$, we obtain

$$\begin{aligned}\sum_{t=1}^n|\phi_{X_{nt}}(\lambda) - 1|^2 &\le \max_{1\le t\le n}|\phi_{X_{nt}}(\lambda) - 1|\sum_{t=1}^n|\phi_{X_{nt}}(\lambda) - 1| \\ &\le \tfrac{1}{2}\lambda^2\left(2\max_{1\le t\le n}P(|X_{nt}| > \varepsilon) + \varepsilon|\lambda|\right) \to \tfrac{1}{2}\varepsilon|\lambda|^3 \text{ as } n \to \infty,\end{aligned} \tag{23.43}$$

using (23.40) and (23.41).

This result is used to show that $\phi_{S_n}(\lambda) = \prod_{t=1}^n\phi_{X_{nt}}(\lambda)$ can be approximated by $\exp\{\sum_{t=1}^n(\phi_{X_{nt}}(\lambda) - 1)\}$. Note that if $z = re^{i\theta}$ and $r \le 1$, then $|e^{z-1}| = e^{r\cos\theta - 1} \le 1$, using (11.12). Lemma 23.7 can therefore be applied, and for some $n$ large enough,

$$\left|\exp\left\{\sum_{t=1}^n(\phi_{X_{nt}}(\lambda) - 1)\right\} - \prod_{t=1}^n\phi_{X_{nt}}(\lambda)\right| \le \sum_{t=1}^n\left|\exp\{\phi_{X_{nt}}(\lambda) - 1\} - \phi_{X_{nt}}(\lambda)\right| \le \sum_{t=1}^n|\phi_{X_{nt}}(\lambda) - 1|^2, \tag{23.44}$$

where the second inequality is an application of 23.8 with $z = \phi_{X_{nt}}(\lambda) - 1$. The condition of the lemma can be satisfied for large enough $n$ according to (23.42) and (23.23). By hypothesis $\phi_{S_n}(\lambda) \to e^{-\lambda^2/2}$, so by choosing $\varepsilon$ arbitrarily small in (23.43), (23.44) implies that $\exp\{\sum_{t=1}^n(\phi_{X_{nt}}(\lambda) - 1)\} \to e^{-\lambda^2/2}$. The limit being a positive real number, this is equivalent to

$$\log\left|\exp\left\{\sum_{t=1}^n(\phi_{X_{nt}}(\lambda) - 1)\right\}\right| = \sum_{t=1}^n E(\cos\lambda X_{nt} - 1) \to -\tfrac{1}{2}\lambda^2, \tag{23.45}$$

using (11.12) and (11.13) to get the equality.

Taking the real part of the expansion in (11.24) up to $k = 2$ gives $\cos x = 1 - \tfrac{1}{2}x^2\cos\alpha x$ for $0 \le \alpha \le 1$, so that $\tfrac{1}{2}x^2 - 1 + \cos x \ge 0$ for any $x$. Fix $\varepsilon > 0$ and choose $\lambda > 2/\varepsilon$, so that the contents of the parentheses on the minorant side below is positive. Then we have

$$\begin{aligned}\left(\frac{\lambda^2}{2} - \frac{2}{\varepsilon^2}\right)\sum_{t=1}^n\int_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2\,dP &\le \sum_{t=1}^n\int_{\{|X_{nt}|>\varepsilon\}}(\tfrac{1}{2}\lambda^2X_{nt}^2 - 2)\,dP \\ &\le \sum_{t=1}^n\int_{\{|X_{nt}|>\varepsilon\}}(\tfrac{1}{2}\lambda^2X_{nt}^2 - 1 + \cos\lambda X_{nt})\,dP \\ &\le \sum_{t=1}^n E(\tfrac{1}{2}\lambda^2X_{nt}^2 - 1 + \cos\lambda X_{nt}) \to 0,\end{aligned} \tag{23.46}$$

where the last inequality holds since the integrand is positive by construction for every $X_{nt}$, and the convergence is from (23.45) after substituting $\sum_{t=1}^n\sigma_{nt}^2 = 1$. Since $\varepsilon$ is arbitrary, the Lindeberg condition must hold according to (23.46). ∎

Condition (23.40) is a condition of 'asymptotic negligibility', under which no single summand may be so influential as to dominate the sum as a whole. The chief reason why we could have $\phi_{S_n}(\lambda) \to e^{-\lambda^2/2}$ without the Lindeberg condition, unless (23.40) holds, is that a finite number of summands dominating all the others could happen to be individually Gaussian. The following example illustrates.

23.14 Example Let $X_t \sim N(0,\sigma_t^2)$ where $\sigma_t^2 = 2^t$. Note that $s_n^2 = \sum_{t=1}^n 2^t = 2^n\sum_{j=0}^{n-1}2^{-j} = 2^{n+1} - 2$ and $X_{nn} = X_n/s_n \sim N(0,\ 2^n/(2^{n+1} - 2))$. Clearly $S_n \sim N(0,1)$ for every $n$, by the linearity property of the Gaussian, but condition (23.40) fails. The Lindeberg condition also fails, since

$$\sum_{t=1}^n\int_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2\,dP \ge \int_{\{|X_{nn}|>\varepsilon\}}X_{nn}^2\,dP \to \tfrac{1}{2}E(1_{\{|Z|>\sqrt{2}\varepsilon\}}Z^2) > 0, \tag{23.47}$$

where $Z$ is a standard Gaussian variate. □
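The failure of asymptotic negligibility in Example 23.14 can be seen from two numbers: the share of the total variance carried by the last term, and the probability that this term exceeds a fixed $\varepsilon$. The Python sketch below is illustrative only and not part of the original text; the function names are hypothetical.

```python
import math


def variance_share_of_last_term(n):
    """Var(X_nn) = 2^n / (2^(n+1) - 2) when sigma_t^2 = 2^t."""
    return 2.0 ** n / (2.0 ** (n + 1) - 2.0)


def prob_last_term_exceeds(n, eps):
    """P(|X_nn| > eps) for the Gaussian X_nn, using P(|N(0, s^2)| > eps) = erfc(eps/(s*sqrt(2)))."""
    sd = math.sqrt(variance_share_of_last_term(n))
    return math.erfc(eps / (sd * math.sqrt(2.0)))
```

The share tends to 1/2, so the probability settles at a positive constant instead of vanishing as (23.40) would require.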

A condition related to (23.40) is

$$P\left(\max_{1\le t\le n}|X_{nt}| > \varepsilon\right) \to 0 \text{ as } n \to \infty, \text{ any } \varepsilon > 0, \tag{23.48}$$

which says that the largest $X_{nt}$ converges in probability to zero.

23.15 Theorem (23.48) implies (23.40).

Proof (23.48) is the same as $P(\max_{1\le t\le n}|X_{nt}| \le \varepsilon) \to 1$. But

$$P\left(\max_{1\le t\le n}|X_{nt}| \le \varepsilon\right) = P\left(\bigcap_{t=1}^n\{|X_{nt}| \le \varepsilon\}\right) \le \min_{1\le t\le n}P(|X_{nt}| \le \varepsilon) = 1 - \max_{1\le t\le n}P(|X_{nt}| > \varepsilon), \tag{23.49}$$

where the inequality is by monotonicity of $P$. If the first member of (23.49) converges to 1, so does the last. ∎

Also, interestingly enough, we have the following.

23.16 Theorem The Lindeberg condition implies (23.48).

Proof Another way to write (23.17) (interchanging the order of summation and integration) is

$$\sum_{t=1}^n 1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2 \xrightarrow{L_1} 0, \quad\text{all } \varepsilon > 0. \tag{23.50}$$

According to 18.13 this implies $\sum_{t=1}^n 1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2 \xrightarrow{pr} 0$, or, equivalently,

$$P\left(\sum_{t=1}^n 1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2 > \varepsilon^2\right) \to 0 \text{ as } n \to \infty, \tag{23.51}$$

for any $\varepsilon > 0$. But notice that

$$\left\{\omega: \sum_{t=1}^n 1_{\{|X_{nt}|>\varepsilon\}}(\omega)X_{nt}^2(\omega) > \varepsilon^2\right\} = \left\{\omega: \max_{1\le t\le n}|X_{nt}(\omega)| > \varepsilon\right\}, \tag{23.52}$$

so (23.51) is equivalent to (23.48). ∎

Note that the last two results hold generally, and do not impose independence on the sequence.

The foregoing theorems establish a network of implications which it may be helpful to summarize symbolically. Let

L = the Lindeberg condition;
I = independence of the sequence;
AG = asymptotic Gaussianity ($\phi_{S_n}(\lambda) \to e^{-\lambda^2/2}$);
AN = asymptotic negligibility (condition (23.40)); and
PM = $\max_{1\le t\le n}|X_{nt}| \xrightarrow{pr} 0$ (condition (23.48)).

Then we have the implications

$$\text{L} + \text{I} \Rightarrow \text{AG} + \text{PM} + \text{I} \Rightarrow \text{AG} + \text{AN} + \text{I} \Rightarrow \text{L} + \text{I}, \tag{23.53}$$

where the first implication is the Lindeberg theorem and 23.16, the second is by 23.15, and the third is by the Feller theorem. Under independence, conditions L, AG + PM, and AG + AN are therefore equivalent to one another.

However, this is not quite the end of the story. The following example shows the possibility of a true CLT operating under asymptotic negligibility, without the Lindeberg condition.

23.17 Example Let $X_t = \tfrac{1}{2}$ and $-\tfrac{1}{2}$ with probabilities $\tfrac{1}{2}(1 - t^{-2})$ each, and $t$ and $-t$ with probabilities $\tfrac{1}{2}t^{-2}$ each, so that $E(X_t) = 0$ and $\operatorname{Var}(X_t) = \tfrac{5}{4} - \tfrac{1}{4}t^{-2}$, and $s_n^2 = \tfrac{5}{4}n + O(1)$. This case has similar characteristics to 23.9. Since $|X_{nt}| = ts_n^{-1}$ with probability $t^{-2}$ and $\tfrac{1}{2}s_n^{-1}$ otherwise, we have for any $\varepsilon > 0$ that, whenever $n$ is large enough that $\varepsilon s_n > \tfrac{1}{2}$,

$$P(|X_{nt}| > \varepsilon) \le n_\varepsilon^{-2}, \tag{23.54}$$

where $n_\varepsilon$ is the smallest integer such that $n_\varepsilon s_n^{-1} > \varepsilon$. Since $n_\varepsilon = O(n^{1/2})$, (23.40) holds in this case. However, the argument used in 23.9 shows that the Lindeberg condition is not satisfied. $E(1_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2) = s_n^{-2}$ for $t \ge n_\varepsilon$, and hence

$$\sum_{t=1}^n\int_{\{|X_{nt}|>\varepsilon\}}X_{nt}^2\,dP \ge (n - n_\varepsilon)s_n^{-2} \to \tfrac{4}{5} > 0. \tag{23.55}$$

However, consider the random sequence $\{W_t\}$, where $W_t = X_t$ when $|X_t| = \tfrac{1}{2}$ and $W_t = 0$ otherwise. As $t$ increases, $W_t$ tends to a centred Bernoulli variate with $p = \tfrac{1}{2}$, and defining $W_{nt} = W_t/s_n$, it is certainly the case that

$$\sum_{t=1}^n W_{nt} \xrightarrow{D} N(0,\tfrac{1}{5}). \tag{23.56}$$

However, $|X_t - W_t|$ is distributed like $Y_t$ in 23.9, and applying (23.32) shows that $\sum_{t=1}^n|X_t - W_t| = O_p(1)$, and hence $|\sum_{t=1}^n X_{nt} - \sum_{t=1}^n W_{nt}| \le \sum_{t=1}^n|X_{nt} - W_{nt}| = O_p(n^{-1/2})$. It follows that $\sum_{t=1}^n X_{nt} \xrightarrow{D} N(0,\tfrac{1}{5})$, according to 22.18. □

A CLT therefore does operate in this case. Feller's theorem is not contradicted because the limit is not the standard Gaussian. The clue to this apparent paradox is that the sequence is not uniformly square-integrable, having a component which contributes to the variance asymptotically in spite of vanishing in probability. In these circumstances $S_n$ can have a 'variance' of 1 for every $n$ despite the fact that its limiting distribution has a variance of $\tfrac{1}{5}$!
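The split of the unit variance in Example 23.17 between the bounded component and the rare large jumps can be computed exactly. The Python sketch below is illustrative and not part of the original text. It assumes the reading of the example in which $X_t = \pm\tfrac{1}{2}$ with total probability $1 - t^{-2}$ and $X_t = \pm t$ with total probability $t^{-2}$; the function names are hypothetical.

```python
def s2(n):
    """s_n^2 = sum_t Var(X_t), with Var(X_t) = 5/4 - 1/(4 t^2)."""
    return sum(1.25 - 0.25 * t ** -2 for t in range(1, n + 1))


def variance_share_bounded_part(n):
    """Var(sum_t W_nt): W_t = X_t on {|X_t| = 1/2}, else 0, so Var(W_t) = (1 - t^-2)/4."""
    return sum(0.25 * (1.0 - t ** -2) for t in range(1, n + 1)) / s2(n)


def variance_share_jumps(n):
    """Share of s_n^2 due to the values +t and -t: each t contributes t^2 * t^-2 = 1."""
    return n / s2(n)
```

The two shares add up to 1 for every $n$ and tend to 1/5 and 4/5 respectively, which matches the limiting variance in (23.56) and the limit in (23.55).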

23.4 The Case of Trending Variances

The Lindeberg-Feller theorems do not impose uniform integrability or $L_r$-boundedness conditions, for any $r$. A trending variance sequence, with no uniform bound, is compatible with the Lindeberg condition. It would be sufficient if, for example, $\sigma_t^2 < \infty$ for each finite $t$ and the unit-variance sequence $\{X_t/\sigma_t\}$ is uniformly square-integrable, provided that the variances do not grow so fast that the largest of them dominates the Cesàro sum of the sequence. The following is an extension of the sufficient condition of 23.10.

23.18 Theorem Suppose $\{X_t^2/c_t^2\}$ is uniformly integrable, where $\{c_t\}$ is a sequence of positive constants. Then $\{X_t\}$ satisfies the Lindeberg condition, (23.30), if

$$\sup_n nM_n^2/s_n^2 = C < \infty, \tag{23.57}$$

where $M_n = \max_{1\le t\le n}c_t$. □

One way to construct the $c_t$ might be as $\max\{1,\sigma_t\}$. The variances of the transformed sequence are then bounded by 1, but $\sigma_t^2 = 0$ is not ruled out for some $t$.

Proof of 23.18 The inequality of (23.33) extends to

$$\begin{aligned}\frac{1}{s_n^2}\sum_{t=1}^n E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2) &\le \frac{n}{s_n^2}\max_{1\le t\le n}c_t^2E\left(1_{\{|X_t/c_t|>s_n\varepsilon/c_t\}}(X_t/c_t)^2\right) \\ &\le C\max_{1\le t\le n}\left\{E\left(1_{\{|X_t/c_t|>s_n\varepsilon/M_n\}}(X_t/c_t)^2\right)\right\}.\end{aligned} \tag{23.58}$$

The analogous modification of (23.34) then gives

$$\sup_t\lim_{n\to\infty}E\left(1_{\{|X_t/c_t|>s_n\varepsilon/M_n\}}(X_t/c_t)^2\right) = 0. \tag{23.59}$$ ∎

Notice how (23.57) restricts the growth of the variances whether this be positive or negative. Regardless of the choice of $\{c_t\}$, it requires that $s_n^2/n > 0$ uniformly in $n$. It permits the $c_t$ to grow without limit so long as they are finite for all $t$, so the variances can do the same; but the rate of increase must not be so rapid as to have a single coordinate dominate the whole sequence. If we let $c_t = \max\{1,\sigma_t\}$ as above, (23.57) is satisfied (according to 2.27) when $\sigma_t^2 \sim t^\alpha$ for any $\alpha \ge 0$, but not when $\sigma_t^2 \sim 2^t$.
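The contrast between polynomial and geometric variance growth in condition (23.57) is easy to tabulate. The Python sketch below is illustrative and not part of the original text; the function name is hypothetical, and the choice $c_t = \max\{1,\sigma_t\}$ follows the suggestion made in the text.

```python
def dominance_ratio(variances):
    """n * M_n^2 / s_n^2 from (23.57), with c_t = max(1, sigma_t).

    `variances` holds sigma_1^2, ..., sigma_n^2, so c_t^2 = max(1, sigma_t^2).
    """
    n = len(variances)
    m2 = max(max(1.0, v) for v in variances)  # M_n^2
    return n * m2 / sum(variances)
```

With `variances = [t ** alpha for t in range(1, n + 1)]` the ratio settles near $\alpha + 1$, whereas with `[2.0 ** t for t in range(1, n + 1)]` it grows like $n/2$ and so is unbounded.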

In fact, the conditions of 23.18 are stronger than necessary in the case of decreasing variances. The variance sequence may actually decline to zero without violating the Lindeberg condition, but in this case it is not possible to state a general sufficient condition on the sequence. If $\sigma_t^2 \sim t^\alpha$ with $-1 < \alpha < 0$, we would have to replace (23.33) by the condition

$$\frac{1}{s_n^2}\sum_{t=1}^n E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2) \le \frac{1}{Bn^\alpha}\max_{1\le t\le n}E(1_{\{|X_t|>s_n\varepsilon\}}X_t^2), \tag{23.60}$$

where $B = \inf_n(s_n^2/n^{1+\alpha}) > 0$ by assumption (note, $s_n^2 \sim n^{1+\alpha}$ under independence). Convergence of the majorant side of (23.60) to zero as $n \to \infty$ is not ruled out, but depends on the distribution of the $X_t$.

The following example illustrates both possibilities.

23.19 Example Let (X,) be a zero-mean independent sequence with Xt - Ilt-ftx,falf 1 (x such that c2 =

lG either growing with f (a > 0) or decliningor some rea , t !. ,

with t (a < 0). However, Xt is Ax-bounded for tinitef (see8.13). The integralsin (23.30)each take the fonn

1 tz21(jtI>,ac)( tt,

zta t,,-t

(23.61)

Page 399: Stochastic Limit Theory: An Introduction for Econometricians (Advanced Texts in Econometrics)

The Classical Central Limit Theorem 379

2 1 n 2a n 2a 2a+1 1 1where sn = !.Zz=1T. Now, Zz=l'r = On ) for a > -r and Otlog n) for a = -z

(2.27). Condition (23.57)is satisfied when (x 0. Note that (23.61)is zero ifn 2a)1n a. n

,r2a)1/2

jaster than n for all G 0, and hence(Yw1T E, > t , ( wl gCOWS

(23.61) vanishes in the limit for every t in these cases, and the Lindebergg t p,n ja te ascondition is satisfied. But if Xt - &(- ,2 1, grows at t e same ra

n 2231/2 the above argument does nOt apply, and the Lindeberg condition(Zw1 '

fails. Note how condition (23.57)is violated in this case.However, the fact that condition (23.57)is not necessal'y is evident from the

fact that the variance sum diverges at a positive rate when Xt - ug-ftxltX) for any1

(x - even though the variance sequence itself goes to zero. lt can be verifiedthat (23.61)vanishes in the limit, and accordingly the Lindeberg conditiop holds,

1 2 i bounded in the limit andfor these cases too. On the other hand, if (x < -,, sn s(23.17) becomes

:3 '' 1 tc'

lim:7 1(ItI>4s,x)t2d,- a/a (,n-yx >1 -.t

(23.62)

where $s^2 = \lim_{n\to\infty} s_n^2 < \infty$, and by choice of small enough $\varepsilon$, (23.62) can be made arbitrarily close to 1. This is the other extreme at which the Lindeberg condition fails. □


24 CLTs for Dependent Processes

24.1 A General Convergence Theorem

The results of this chapter are derived from the following fundamental theorem, due to McLeish (1974).

24.1 Theorem Let $\{Z_{ni},\ i = 1,\ldots,r_n,\ n \in \mathbb{N}\}$ denote a zero-mean stochastic array, where $r_n$ is a positive, increasing integer-valued function of $n$, and let

$$T_{r_n} = \prod_{i=1}^{r_n} (1 + i\lambda Z_{ni}), \quad \lambda > 0. \tag{24.1}$$

Then $S_{r_n} = \sum_{i=1}^{r_n} Z_{ni} \xrightarrow{D} N(0,1)$ if the following conditions hold:

(a) $T_{r_n}$ is uniformly integrable,

(b) $E(T_{r_n}) \to 1$ as $n \to \infty$,

(c) $\sum_{i=1}^{r_n} Z_{ni}^2 \xrightarrow{pr} 1$ as $n \to \infty$,

(d) $\max_{1 \le i \le r_n} |Z_{ni}| \xrightarrow{pr} 0$ as $n \to \infty$. □

There are a number of features requiring explanation here, regarding both the theorem and the way it has been expressed. This is a generic result in which the elements of the array need not be data points in the conventional way, so that their number $r_n$ does not always correspond with the number of sample observations, $n$. $r_n = n$ is a leading case, but see 24.6 for another possibility.

It is interesting to note that the Lindeberg condition is not imposed in 24.1, nor is anything specific assumed about the dependence of the sequence. Condition 24.1(d) is condition PM defined in (23.48), and by 23.16 it follows from the Lindeberg condition. We noted in (23.53) that under independence the condition is equivalent to the Lindeberg condition in cases where (as we shall prove here) the central limit theorem holds. But without independence, conditions 24.1(a)-(d) need not imply the Lindeberg condition.

Proof of 24.1 Consider the series expansion of the logarithmic function, defined for $|x| < 1$,

$$\log(1 + x) = x - \tfrac12 x^2 + \tfrac13 x^3 - \cdots$$

Although a complex number does not possess a unique logarithm, the arithmetic identity obtained by taking the exponential of both sides of this equation is well-defined when $x$ is complex. The formula yields


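$$1 + i\lambda Z_{ni} = \exp\{i\lambda Z_{ni}\}\exp\{\tfrac12 \lambda^2 Z_{ni}^2 + r(i\lambda Z_{ni})\}, \tag{24.2}$$

where the remainder satisfies $|r(x)| \le |x|^3$ for $|x| < 1$. Multiplying up the terms for $i = 1,\ldots,r_n$ yields

$$\exp\{i\lambda S_{r_n}\} = T_{r_n} U_{r_n},$$

where $T_{r_n}$ is defined in (24.1) and

$$U_{r_n} = \exp\Big\{-\tfrac12 \lambda^2 \sum_{i=1}^{r_n} Z_{ni}^2 - \sum_{i=1}^{r_n} r(i\lambda Z_{ni})\Big\}. \tag{24.3}$$

Taking expectations produces

$$\phi_{S_{r_n}}(\lambda) = E(T_{r_n} U_{r_n}) = e^{-\lambda^2/2} E(T_{r_n}) + E\big(T_{r_n}(U_{r_n} - e^{-\lambda^2/2})\big), \tag{24.4}$$

so given condition (b) of the theorem, $\phi_{S_{r_n}}(\lambda) \to e^{-\lambda^2/2}$ if

$$\lim_{n\to\infty} E\big|T_{r_n}(U_{r_n} - e^{-\lambda^2/2})\big| = 0. \tag{24.5}$$

The sequence

$$T_{r_n}(U_{r_n} - e^{-\lambda^2/2}) = \exp\{i\lambda S_{r_n}\} - T_{r_n} e^{-\lambda^2/2} \tag{24.6}$$

is uniformly integrable in view of condition (a), the first term on the right-hand side having unit modulus. So in view of 18.14, it suffices to show that

$$\operatorname*{plim}_{n\to\infty} T_{r_n}(U_{r_n} - e^{-\lambda^2/2}) = 0. \tag{24.7}$$

Since $T_{r_n}$ is clearly $O_p(1)$, the problem reduces, by 18.12, to showing that $\operatorname{plim}_{n\to\infty} U_{r_n} = e^{-\lambda^2/2}$, and for this in turn it suffices, by condition (c), if

$$\operatorname*{plim}_{n\to\infty} \sum_{i=1}^{r_n} r(i\lambda Z_{ni}) = 0. \tag{24.8}$$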

To show that this convergence obtains, we have by the triangle inequality

$$\Big|\sum_{i=1}^{r_n} r(i\lambda Z_{ni})\Big| \le |\lambda|^3 \sum_{i=1}^{r_n} |Z_{ni}|^3 \le |\lambda|^3 \Big(\max_{1 \le i \le r_n} |Z_{ni}|\Big) \sum_{i=1}^{r_n} Z_{ni}^2. \tag{24.9}$$

The result now follows from conditions (c) and (d), and 18.12. ∎

It is instructive to compare this proof with that of the Lindeberg theorem. A different series approximation of the ch.f. is used, and the assumption from independence, that

$$\phi_{S_n}(\lambda) = \prod_{t=1}^{n} \phi_{X_{nt}}(\lambda),$$

is avoided. Of course, we have yet to show that conditions 24.1(a) and 24.1(b) hold under convenient and plausible assumptions about the sequence. The rest of this chapter is devoted to this question. 24.1(b) will turn out to result from suitable restrictions on the dependence. 24.1(a) can be shown to follow from a more primitive moment condition by an argument based on the 'equivalent sequences' idea.
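The factorization $\exp\{i\lambda S_{r_n}\} = T_{r_n}U_{r_n}$ used in the proof of 24.1 can be checked numerically. The following is only a sketch under assumptions that are not in the text: the array is taken to be $Z_{ni} = \pm n^{-1/2}$ with independent random signs, and the seed, sample size, and value of $\lambda$ are arbitrary illustrative choices.

```python
import numpy as np

def mcleish_factors(z, lam):
    """Return (T, U) such that exp(i*lam*sum(z)) = T * U, following (24.1) and (24.3)."""
    z = np.asarray(z, dtype=float)
    t = np.prod(1.0 + 1j * lam * z)        # T_{r_n} as defined in (24.1)
    u = np.exp(1j * lam * z.sum()) / t     # U_{r_n}, recovered from the factorization
    return t, u

# Hypothetical illustration: random signs scaled by n^{-1/2} (an assumption, not from the text).
rng = np.random.default_rng(0)
n, lam = 10_000, 1.5
z = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)

t, u = mcleish_factors(z, lam)
sum_sq = float(np.sum(z**2))                      # quantity in condition (c)
max_abs = float(np.max(np.abs(z)))                # quantity in condition (d)
gap = abs(u - np.exp(-0.5 * lam**2 * sum_sq))     # distance of U from exp(-lam^2 * sum_sq / 2)
cubic_bound = float(np.sum(np.abs(lam * z)**3))   # middle member of (24.9)
```

In this example the sum of squares equals 1 up to rounding and the largest term is $n^{-1/2}$, so the gap between $U_{r_n}$ and $e^{-\lambda^2/2}$ is controlled by the small cubic term, as (24.9) suggests.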

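24.2 Theorem For an array $\{Z_{ni}\}$, let

$$\tilde Z_{ni} = Z_{ni}\, 1\Big(\sum_{k=1}^{i-1} Z_{nk}^2 \le 2\Big). \tag{24.10}$$

(i) The sequence $\tilde T_{r_n} = \prod_{i=1}^{r_n} (1 + i\lambda \tilde Z_{ni})$ is uniformly integrable if

$$\sup_n E\Big(\max_{1 \le i \le r_n} Z_{ni}^2\Big) < \infty. \tag{24.11}$$

And if $\sum_{i=1}^{r_n} Z_{ni}^2 \xrightarrow{pr} 1$, then

(ii) $\sum_{i=1}^{r_n} \tilde Z_{ni}^2 \xrightarrow{pr} 1$;

(iii) $\tilde S_{r_n} = \sum_{i=1}^{r_n} \tilde Z_{ni}$ has the same limiting distribution as $S_{r_n}$.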

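Proof Let

$$J_n = \begin{cases} \min\{j : \sum_{i=1}^{j} Z_{ni}^2 > 2\}, & \text{if } \sum_{i=1}^{r_n} Z_{ni}^2 > 2, \\ r_n, & \text{otherwise,} \end{cases} \tag{24.12}$$

such that $\tilde Z_{ni} = 0$, if at all, from the point $i = J_n + 1$ onwards. Note that

$$|\tilde T_{r_n}|^2 = \prod_{i=1}^{r_n} (1 + \lambda^2 \tilde Z_{ni}^2) = \prod_{i=1}^{J_n} (1 + \lambda^2 Z_{ni}^2). \tag{24.13}$$

The terms $\lambda^2 Z_{ni}^2$ are real and positive. The inequality $1 + x \le e^x$ for $x \ge 0$ implies that $\prod_i (1 + x_i) \le \prod_i e^{x_i}$ for $x_i \ge 0$. Hence,

$$|\tilde T_{r_n}|^2 = \Big[\prod_{i=1}^{J_n - 1} (1 + \lambda^2 Z_{ni}^2)\Big](1 + \lambda^2 Z_{nJ_n}^2) \le \exp\Big\{\lambda^2 \sum_{i=1}^{J_n - 1} Z_{ni}^2\Big\}(1 + \lambda^2 Z_{nJ_n}^2) \le e^{2\lambda^2}(1 + \lambda^2 Z_{nJ_n}^2), \tag{24.14}$$

where the last inequality is by definition of $J_n$. Then by (24.11),

$$\sup_n E|\tilde T_{r_n}|^2 \le e^{2\lambda^2}\Big(1 + \lambda^2 \sup_n E(Z_{nJ_n}^2)\Big) < \infty. \tag{24.15}$$

Uniform boundedness of $E|\tilde T_{r_n}|^2$ is sufficient for uniform integrability of $\tilde T_{r_n}$, proving (i). Since by construction $\sum_i \tilde Z_{ni}^2$ can differ from $\sum_i Z_{ni}^2$ only when $\sum_i Z_{ni}^2 > 2$,

$$P\Big(\sum_i \tilde Z_{ni}^2 \ne \sum_i Z_{ni}^2\Big) \le P\Big(\sum_i Z_{ni}^2 > 2\Big) \le P\Big(\Big|\sum_i Z_{ni}^2 - 1\Big| > \varepsilon\Big), \text{ for } 0 < \varepsilon < 1,$$
$$\to 0 \text{ as } n \to \infty, \tag{24.16}$$

by assumption, which proves (ii). In addition,

$$P(\tilde Z_{ni} \ne Z_{ni}, \text{ some } 1 \le i \le r_n) \le P\Big(\sum_{i=1}^{r_n} Z_{ni}^2 > 2\Big) \to 0 \text{ as } n \to \infty,$$

so $|\tilde S_{r_n} - S_{r_n}| \xrightarrow{pr} 0$, and by 22.18, $\tilde S_{r_n}$ and $S_{r_n}$ have the same limiting distribution, proving (iii). ∎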
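The truncation device of 24.2 is easy to experiment with. The sketch below implements (24.10) and checks the bound (24.14) on a deliberately badly scaled array; the function names, seed, and test data are hypothetical choices made for illustration only.

```python
import numpy as np

def truncate_array(z):
    """Apply (24.10): keep Z_i only while the sum of squares of earlier terms is at most 2."""
    z = np.asarray(z, dtype=float)
    # Sum of squares of the terms strictly before position i (zero for the first term).
    prior = np.concatenate(([0.0], np.cumsum(z**2)[:-1]))
    return z * (prior <= 2.0)

def modulus_sq_T(z, lam):
    """|T|^2 = prod(1 + lam^2 z_i^2), the squared modulus of prod(1 + i*lam*z_i)."""
    return float(np.prod(1.0 + (lam * np.asarray(z, dtype=float))**2))

# Hypothetical data: 50 standard normal draws, so the sum of squares is far above 2.
rng = np.random.default_rng(1)
z = rng.standard_normal(50)
lam = 0.7

z_tilde = truncate_array(z)
# Right-hand side of (24.14), with Z_{nJ_n}^2 replaced by the larger max_i Z_{ni}^2.
bound = np.exp(2 * lam**2) * (1 + lam**2 * np.max(z**2))
```

Whatever the input, the truncated product is dominated by `bound`, which is why a moment condition on $\max_i Z_{ni}^2$ alone delivers uniform integrability.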

24.2 The Martingale Case

Although it permits a law of large numbers, uncorrelatedness is not a strong enough assumption to yield a central limit result. But the martingale difference assumption is similar to uncorrelatedness for practical purposes, and is attractive in other ways too. The next theorem shows how 24.1 applies to this case.

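24.3 Theorem Let $\{X_{nt}, \mathcal{F}_{nt}\}$ be a martingale difference array with finite unconditional variances $\{\sigma_{nt}^2\}$, and $\sum_{t=1}^n \sigma_{nt}^2 = 1$. If

(a) $\sum_{t=1}^n X_{nt}^2 \xrightarrow{pr} 1$, and

(b) $\max_{1 \le t \le n} |X_{nt}| \xrightarrow{pr} 0$,

then $S_n = \sum_{t=1}^n X_{nt} \xrightarrow{D} N(0,1)$.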

Proof We use 24.1 and 24.2, setting $r_n = n$, $i = t$, and $Z_{ni} = X_{nt}$. Conditions (a) and (b) are the same as (c) and (d) of 24.1, so it remains to show that the other conditions of 24.1 are satisfied; not actually by $X_{nt}$, but by an equivalent sequence in the sense of 24.2(iii).

If $T_n = \prod_{t=1}^n (1 + i\lambda X_{nt})$, we show that $\lim_{n\to\infty} E(T_n) = 1$ when $\{X_{nt}\}$ is a m.d. array. By repeated multiplying out,

$$T_n = \prod_{t=1}^n (1 + i\lambda X_{nt}) = T_{n-1} + i\lambda T_{n-1} X_{nn} = \cdots = 1 + i\lambda \sum_{t=1}^n T_{t-1} X_{nt}, \tag{24.17}$$

with $T_0 = 1$. $T_{t-1} = \prod_{s=1}^{t-1} (1 + i\lambda X_{ns})$ is an $\mathcal{F}_{n,t-1}$-measurable r.v., so by the LIE,

$$E(T_n) = 1 + i\lambda \sum_{t=1}^n E(T_{t-1} X_{nt}) = 1 + i\lambda \sum_{t=1}^n E\big(T_{t-1} E(X_{nt} \mid \mathcal{F}_{n,t-1})\big) = 1. \tag{24.18}$$

This is an exact result for any $n$, so certainly holds in the limit.

If $X_{nt}$ is a m.d., so is $\tilde X_{nt} = X_{nt}\, 1\big(\sum_{k=1}^{t-1} X_{nk}^2 \le 2\big)$, and this satisfies 24.1(b) as above, and certainly also 24.1(d) according to condition (b) of the theorem. Since $\sum_{t=1}^n E(X_{nt}^2) = 1$, condition (24.11) holds for $X_{nt}$. Hence, $\tilde X_{nt}$ satisfies 24.1(a) and 24.1(c) according to 24.2(i) and (ii), and so obeys the CLT. The theorem now follows by 24.2(iii). ∎
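The exactness of (24.18) can be confirmed by brute force on a small example. The sketch below assumes a hypothetical martingale difference array built from independent fair signs $e_t$, with $X_{nt} = e_t(1 + \tfrac12 e_{t-1})/\sqrt{n}$, so that the scale of each term depends only on the past; this construction is an illustrative assumption and is not taken from the text. Averaging $T_n$ over all $2^n$ equally likely sign paths gives $E(T_n)$ exactly.

```python
import itertools
import numpy as np

def expected_T(n, lam):
    """Exact E(T_n) for a small m.d. array, by enumerating all 2^n sign paths."""
    total = 0.0 + 0.0j
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        e = np.array(signs)
        # Scale of X_t depends only on the previous shock, so E(X_t | past) = 0.
        scale = np.ones(n)
        scale[1:] = 1.0 + 0.5 * e[:-1]
        x = e * scale / np.sqrt(n)
        total += np.prod(1.0 + 1j * lam * x)   # T_n along this path
    return total / 2**n                         # each path has probability 2^{-n}

value = expected_T(8, 1.3)
```

The result equals 1 up to rounding error even though the terms are dependent, which is the feature of the martingale difference property that the proof exploits.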

This theorem holds for independent sequences as a special case of m.d. sequences, but the conditions are slightly stronger than those of the Lindeberg theorem. Under independence, we know by (23.53) that 24.3(b) is equivalent to the Lindeberg condition when the CLT holds. However, 24.3(a) is not a consequence of the Lindeberg condition. For the purpose of discussion, assume that $X_{nt} = X_t/s_n$, where $s_n^2 = \sum_{t=1}^n \sigma_t^2$ and $\sigma_t^2 = E(X_t^2)$. Under independence, a sufficient extra condition for 24.3(a) is that the sequence $\{X_t^2/\sigma_t^2\}$ be uniformly integrable. In this case, independence of $\{X_t\}$ implies independence of $\{X_t^2\}$, $\{X_t^2 - \sigma_t^2, \mathcal{F}_t\}$ is a m.d., and 19.8 (put $a_n = s_n^2$ and $b_t = \sigma_t^2$) gives sufficient conditions for $s_n^{-2} \sum_{t=1}^n (X_t^2 - \sigma_t^2) \xrightarrow{pr} 0$. This is equivalent to 24.3(a). But of course, it is not the case that $\{X_t^2 - \sigma_t^2, \mathcal{F}_t\}$ is a m.d. merely because $\{X_t, \mathcal{F}_t\}$ is a m.d. If 24.3(a) cannot be imposed in any other manner, we should have to require $E(X_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2$, a significant strengthening of the assumptions.

On the other hand, the theorem does not rule out trending variances. Following the approach of §23.5, we can obtain such a case as follows.

24.4 Theorem If $\{X_t, \mathcal{F}_t\}$ is a square-integrable m.d. sequence and $E(X_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2$ a.s., and there exists a sequence of positive constants $\{c_t\}$ such that $\{X_t^2/c_t^2\}$ is uniformly integrable and

$$\sup_n n M_n^2 / s_n^2 < \infty, \tag{24.19}$$

where $M_n = \max_{1 \le t \le n} c_t$, conditions 24.3(a) and 24.3(b) hold for $X_{nt} = X_t/s_n$.

Proof By 23.18 the sequence $\{X_t\}$ satisfies the Lindeberg condition, and hence, 24.3(b) holds by 23.16. Note that neither of these results imposes restrictions on the dependence of the sequence. To get 24.3(a), apply 19.8 to the m.d. sequence $\{X_t^2 - \sigma_t^2\}$, putting $p = 1$, $b_t = c_t^2$, and $a_n = s_n^2$. The sequence $\{(X_t^2 - \sigma_t^2)/c_t^2\}$ is uniformly integrable on the assumptions, and note that $\sum_{t=1}^n b_t \le n M_n^2 = O(s_n^2)$ and that $\sum_{t=1}^n b_t^2 \le n M_n^4 = o(s_n^4)$, both as consequences of (24.19). The conditions of 19.8 are therefore satisfied, and the required result follows by 19.9. ∎


24.3 Stationary Ergodic Sequences

It is easy to see that any stationary ergodic martingale difference having finite variance satisfies the conditions of 24.3. Under stationarity, finite variance is sufficient for the Lindeberg condition, which ensures 24.3(b) by 23.16, and 24.3(a) follows from the ergodicity by 13.12.

The interest of this case stems from the following result, attributed by Hall and Heyde (1980: 137) to unpublished work of M. I. Gordin.

24.5 Theorem Let $\{X_t, \mathcal{F}_t\}$ be a stationary ergodic $L_1$-mixingale of size $-1$, and if $S_n = \sum_{t=1}^n X_t$ assume that

$$\limsup_{n\to\infty} n^{-1/2} E|S_n| < \infty. \tag{24.20}$$

Then $n^{-1/2} E|S_n| \to \lambda$, $0 \le \lambda < \infty$. If $\lambda > 0$, $n^{-1/2} S_n \xrightarrow{D} N(0, \tfrac12 \pi \lambda^2)$. □

Notice that the assumptions for this CLT do not include $E(X_1^2) < \infty$.

Proof Let $X_t^B = X_t 1_{\{|X_t| \le B\}}$. The centred sequence $Y_t^B = X_t^B - E(X_t^B \mid \mathcal{F}_{t-1})$ is a stationary ergodic m.d. with bounded variance $\sigma_B^2 \le B^2$, and hence $n^{-1/2} \sum_{t=1}^n Y_t^B \xrightarrow{D} N(0, \sigma_B^2)$, by 24.3. Further, the m.d. orthogonality property implies

$$E\Big(n^{-1/2} \sum_{t=1}^n Y_t^B\Big)^2 = \sigma_B^2. \tag{24.21}$$

The sequence $\{n^{-1/2} |\sum_{t=1}^n Y_t^B|\}$ is therefore uniformly integrable (12.11), and by the continuous mapping theorem it converges in distribution to the half-Gaussian limit (see 8.16); hence, by 9.8,

$$n^{-1/2} E\Big|\sum_{t=1}^n Y_t^B\Big| \to \sigma_B (2/\pi)^{1/2}. \tag{24.22}$$

Now define $Y_t = X_t - E(X_t \mid \mathcal{F}_{t-1})$, corresponding to $Y_t^B$ for $B = \infty$, and apply the decomposition of 16.6 to write $X_t = Y_t + Z_t - Z_{t+1}$, where $Y_t$ is a stationary ergodic m.d. and $Z_t$ is stationary with $E|Z_1| < \infty$. Hence $S_n = \sum_{t=1}^n Y_t + Z_1 - Z_{n+1}$, and by (24.20) there exists $A < \infty$ such that

$$\limsup_{n\to\infty} n^{-1/2} E\Big|\sum_{t=1}^n Y_t\Big| = \limsup_{n\to\infty} n^{-1/2} E|S_n - Z_1 + Z_{n+1}| \le \limsup_{n\to\infty} n^{-1/2}\big(E|S_n| + 2E|Z_1|\big) < A. \tag{24.23}$$

Noting that $\sigma_B^2 = E\big((Y_1^B)^2\big)$, $\{\sigma_B^2,\ B = 1, 2, \ldots\}$ is a monotone sequence converging either to a finite limit $\sigma^2$ (say) or to $+\infty$. In the latter case, in view of (24.22) there would exist $N$ such that $n^{-1/2} E|\sum_{t=1}^n Y_t| > A$ for $n \ge N$, which contradicts (24.23), so we can conclude that $\sigma^2 < \infty$. Taking $B = \infty$ in (24.22) yields, in view of the fact that $n^{-1/2} E|Z_1 - Z_{n+1}| \to 0$,

$$n^{-1/2} E|S_n| = n^{-1/2} E\Big|\sum_{t=1}^n Y_t + Z_1 - Z_{n+1}\Big| \to \sigma (2/\pi)^{1/2}. \tag{24.24}$$

Hence, $\lambda = \sigma (2/\pi)^{1/2}$. Since $Y_t$ is now known to be a stationary ergodic m.d. with finite variance $\sigma^2$, and $\sigma^2 > 0$, 24.3 and 22.18 imply that $n^{-1/2} S_n \xrightarrow{D} N(0, \sigma^2)$, which completes the proof. ∎
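The link between the limit $\lambda$ of $n^{-1/2}E|S_n|$ and the limiting variance rests on the mean of the half-Gaussian distribution, $E|N(0,\sigma^2)| = \sigma(2/\pi)^{1/2}$, which inverts to $\sigma^2 = \tfrac12\pi\lambda^2$. A minimal sketch of this correspondence follows; the function names, the seed, and the value of $\sigma$ are hypothetical, and the Monte Carlo figure is only an approximation.

```python
import math
import numpy as np

def lambda_from_sigma(sigma):
    """Mean absolute value of a N(0, sigma^2) variate: sigma * sqrt(2/pi)."""
    return sigma * math.sqrt(2.0 / math.pi)

def variance_from_lambda(lam):
    """Invert the relation above: sigma^2 = pi * lam^2 / 2."""
    return 0.5 * math.pi * lam**2

# Hypothetical Monte Carlo check with an arbitrary sigma and seed.
rng = np.random.default_rng(2)
sigma = 1.7
draws = sigma * rng.standard_normal(400_000)
mc_mean_abs = float(np.mean(np.abs(draws)))   # should be close to lambda_from_sigma(sigma)
```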

This result can be thought of as the counterpart for dependent sequences of the Lindeberg-Lévy theorem, but unlike that case, we do not have to assume explicitly that $E(X_1^2) < \infty$. Independence of $\{X_t\}$ enforces the condition $Z_t = 0$ for all $t$, and then the conditions of the theorem imply $E(X_1^2) = \sigma^2$. It might appear that the existence of dependence weakens the moment restrictions required for the CLT, but this gain is more technical than real, for it is not obvious how to construct a stationary sequence such that $X_1$ is not square-integrable but $Y_1$ is. The most useful implication is that the independence assumption can be replaced by arbitrary local dependence (controlled by the mixingale assumption) without weakening any of the conclusions of the Lindeberg-Lévy theorem.

24.4 The CLT for NED Functions of Strong Mixing Processes

The traditional approach to the problem of general dependence is the so-called method of 'Bernstein sums' (Bernstein 1927). That is, break up $S_n$ into blocks (partial sums), and consider the sequence of blocks. Each block must be so large, relative to the rate at which the memory of the sequence decays, that the degree to which the next block can be predicted from current information is negligible; but at the same time, the number of blocks must increase with $n$ so that a CLT argument can be applied to this derived sequence. It would suffice to require the sequence of blocks to approach independence in the limit, but a result can also be obtained if it behaves asymptotically like a martingale difference. This is the approach we adopt.

The theorem we prove (from Davidson 1992, 1993b) is given in two versions. The first, 24.6, is fully general but the conditions are complicated and not very intuitive. The second, 24.7, is a special case, whose conditions are simpler but cover almost all the possibilities. The exceptional cases for which 24.6 is essential are those in which the variances of the process tend to 0 as $t$ increases.

24.6 Theorem Let $\{X_{nt},\ t = 1,\ldots,n,\ n \ge 1\}$ be a triangular stochastic array, let $\{V_{nt},\ -\infty < t < \infty,\ n \ge 1\}$ be a (possibly vector-valued) stochastic array, and let $\mathcal{F}_{n,t-m}^{t+m} = \sigma(V_{ns},\ t - m \le s \le t + m)$. Also, let $S_n = \sum_{t=1}^n X_{nt}$. Suppose the following assumptions hold:

(a) $X_{nt}$ is $\mathcal{F}_{n,-\infty}^{t}$-measurable, with $E(X_{nt}) = 0$ and $E(S_n^2) = 1$.

(b) There exists a positive constant array $\{c_{nt}\}$ such that $\sup_{n,t} \|X_{nt}/c_{nt}\|_r < \infty$ for $r > 2$.

(c) $X_{nt}$ is $L_2$-NED of size $-1$ on $\{V_{nt}\}$ with respect to the constants $\{c_{nt}\}$ specified in (b), and $\{V_{nt}\}$ is $\alpha$-mixing of size $-(1 + 2\theta) r/(r - 2)$, $0 \le \theta < \tfrac12$.

(d) Letting $b_n = [n^{1-\alpha}]$ and $r_n = [n/b_n]$ for some $\alpha \in (0,1]$, and defining $M_{ni} = \max_{(i-1)b_n < t \le i b_n} \{c_{nt}\}$ for $i = 1,\ldots,r_n$ and $M_{n,r_n+1} = \max_{r_n b_n < t \le n} \{c_{nt}\}$, the following conditions hold:

$$\max_{1 \le i \le r_n + 1} M_{ni} = o(b_n^{-1/2}), \tag{24.25}$$

$$\sum_{i=1}^{r_n+1} M_{ni} = O(b_n^{\theta - 1/2}), \tag{24.26}$$

where $\theta$ is given in (c), and

$$\sum_{i=1}^{r_n+1} M_{ni}^2 = O(b_n^{-1}). \tag{24.27}$$

Then, $S_n \xrightarrow{D} N(0,1)$. □

24.7 Corollary The conclusion of 24.6 holds if assumptions (c) and (d) are replaced by

(c') $X_{nt}$ is $L_2$-NED of size $-1$ on $\{V_{nt}\}$, which is $\alpha$-mixing of size $-r/(r - 2)$.

(d') Letting $M_n = \max_{1 \le t \le n} \{c_{nt}\}$,

$$\sup_n n M_n^2 < \infty. \; □ \tag{24.28}$$

Note that (c') is just 24.6(c) with $\theta = 0$. If (24.28) holds, $M_{ni} = O(n^{-1/2})$ for each $i$. Since $r_n b_n \sim n$, (24.25) and (24.27) hold in this case for any $\alpha$ in $(0,1]$. While (24.26) holds only for a strictly positive choice of $\theta$, with $0 < \alpha \le 2\theta/(2\theta + 1)$, 24.7(c') entails satisfaction of 24.6(c) for some $\theta > 0$. Hence, 24.6 contains 24.7 as a special case.
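The restriction $0 < \alpha \le 2\theta/(2\theta+1)$ comes from comparing powers of $n$. The sketch below makes the simplifying assumptions $M_{ni} = n^{-1/2}$, $b_n = n^{1-\alpha}$ and $r_n = n^{\alpha}$ exactly (ignoring integer parts and constants) and compares the exponents on the two sides of (24.25)-(24.27); the function names and the tolerance are hypothetical.

```python
def alpha_max(theta):
    """Largest alpha compatible with (24.26) when M_ni = O(n^{-1/2})."""
    return 2.0 * theta / (2.0 * theta + 1.0)

def conditions_under_24_28(alpha, theta, tol=1e-12):
    """Compare exponents of n in (24.25)-(24.27), assuming M_ni = n^{-1/2} exactly."""
    # (24.25): n^{-1/2} = o(n^{-(1-alpha)/2}) needs a strict inequality of exponents.
    c25 = -0.5 + tol < -0.5 * (1.0 - alpha)
    # (24.26): sum of r_n terms is n^{alpha - 1/2}; bound is n^{(1-alpha)(theta - 1/2)}.
    c26 = alpha - 0.5 <= (1.0 - alpha) * (theta - 0.5) + tol
    # (24.27): sum of r_n terms is n^{alpha - 1}; bound is n^{-(1-alpha)}.
    c27 = alpha - 1.0 <= -(1.0 - alpha) + tol
    return c25 and c26 and c27
```

For instance, with $\theta = \tfrac14$ the largest admissible value is $\alpha = \tfrac13$, and with $\theta = 0$ no positive $\alpha$ passes the comparison for (24.26).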

The adaptation assumption in 24.6(a) will be needed because of the asymptotic m.d. property the Bernstein blocks must possess; see 24.19 below for the application. This assumption says that $X_{nt}$ must not depend on future values of the underlying mixing process, $V_{ns}$ for $s > t$. In econometric applications at least, such an assumption would typically be innocuous. The remaining parts of condition (a) specify the assumed normalization.

The roles of the remaining conditions are not particularly transparent in either version of the result, and in reviewing these it will be helpful to keep in mind the leading case with $X_{nt} = (Y_t - \mu_t)/s_n$ where $s_n^2 = E\big(\sum_{t=1}^n (Y_t - \mu_t)\big)^2$, although more general interpretations are possible, as noted in §23.2. In this case it would often be legitimate to choose

$$c_{nt} = \max\{\sigma_t, 1\}/s_n, \tag{24.29}$$

where $\sigma_t^2$ is the variance of $Y_t$. The $c_{nt}$ have to be thought of as tending to zero with $n$, although possibly growing or shrinking with $t$ also, subject to 24.6(d) or 24.7(d'). Because autocorrelation of the sequence is not ruled out, $s_n^2$ is no longer just the partial sum of the variances, but is defined as


$$s_n^2 = \sum_{t=1}^n \sigma_t^2 + 2 \sum_{t=2}^n \sum_{k=1}^{t-1} \sigma_{t,t-k}, \tag{24.30}$$

where $\sigma_{t,t-k} = \mathrm{Cov}(Y_t, Y_{t-k})$.

Assumptions 24.6(c) or 24.7(c') imply, by the norm inequality, that

$$\big\|X_{nt} - E(X_{nt} \mid \mathcal{F}_{n,t-m}^{t+m})\big\|_p \le c_{nt} \nu_m, \tag{24.31}$$

for $0 < p \le 2$, where $\nu_m$ is of size $-1$. The following lemma is an immediate consequence of 17.6.

24.8 Lemma Under assumptions 24.6(a), (b), and (c), $\{X_{nt}, \mathcal{F}_{n,-\infty}^{t}\}$ is an $L_p$-mixingale of size $-\min\{1, (1 + 2\theta)(r - p)/p(r - 2)\}$ for $1 \le p \le 2$, with constants $\{c_{nt}\}$. □

In particular, when $p = 2$ the size is $-(\tfrac12 + \theta)$, and when $p = r/(r - 1)$ the size is $-1$. Under the assumptions of 24.7 the same conclusions apply, except that $\theta = 0$.

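There is also a more subtle implication which happens to be very convenient for the present result. This is the following.

24.9 Lemma Under 24.6(a) and (b), plus either 24.6(c) or 24.7(c'),

$$E\big|E(X_{nt} X_{n,t+k} \mid \mathcal{F}_{n,-\infty}^{t-m}) - \sigma_{nt,t+k}\big| \le c_{nt} c_{n,t+k}\, \xi_m \tag{24.32}$$

and

$$E\big|X_{nt} X_{n,t+k} - E(X_{nt} X_{n,t+k} \mid \mathcal{F}_{n,-\infty}^{t+k+m})\big| \le c_{nt} c_{n,t+k}\, \xi_{m+1} \tag{24.33}$$

for each $k \in \mathbb{N}$, where $\sigma_{nts} = E(X_{nt} X_{ns})$, and $\xi_m = O(m^{-1-\delta})$ for $\delta > 0$. □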

These inequalities are convenient for the subsequent analysis, but in effect the lemma asserts that for each fixed $k$ the products $X_{nt} X_{n,t+k}$, after centring, form $L_1$-mixingales of size $-1$ with constants given by $\max\{c_{nt}^2, c_{n,t+k}^2\}$. One of these might be written as, say, $\{U_{nt}, \mathcal{F}_{n,-\infty}^{t}\}$, where

$$U_{nt} = X_{n,t-[k/2]} X_{n,t+k-[k/2]} - \sigma_{n,t-[k/2],t+k-[k/2]}.$$

The mixingale coefficients here are $\psi_m = \xi_0$ for $m = 0,\ldots,[k/2]$, and $\psi_m = \xi_{m-[k/2]}$ for $m > [k/2]$.

Proof of 24.9 The array $\{X_{nt} X_{n,t+k}\}$ is $L_1$-NED on $\{V_{nt}\}$ of size $-1$ by 17.11. The conclusion then follows by 17.6(i), noting that any constant factors generated in forming the inequalities have been absorbed into $\xi_m$ in (24.32) and (24.33). ∎

Now consider 24.6(d) and 24.7(d'). These assumptions permit global nonstationarity (see §13.2). This is a fact worthy of emphasis, because in the functional CLT (see Chapters 27 and 29 for details) global stationarity is a requirement. In this respect the ordinary CLT is notably unrestrictive, provided we normalize by $s_n$ as we do here. The following cases illustrate what is allowed.

24.10 Example Let $\{Y_t\}$ be an independent sequence with variances $\sigma_t^2 \sim t^{\beta}$ for any $\beta \ge 0$ (compare 13.6). It is straightforward to verify that assumption 24.7(d') is satisfied for $X_{nt} = Y_t/s_n$ where $c_{nt} = \sigma_t/s_n$, and in this case $s_n^2 = \sum_{t=1}^n \sigma_t^2$. It is however violated when $\sigma_t^2 \sim 2^t$, a case that is incompatible with the asymptotic negligibility of individual summands (compare 23.19). It is also violated when $\beta < 0$ (see below). □
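The three cases in Example 24.10 can be compared by computing $nM_n^2 = n \max_t \sigma_t^2 / \sum_t \sigma_t^2$ directly. The sketch below assumes independence and the choice $c_{nt} = \sigma_t/s_n$ from the example; the helper names and the particular values of $n$ and $\beta$ are hypothetical.

```python
import numpy as np

def n_times_mn_sq(variances):
    """n * M_n^2 with c_nt = sigma_t / s_n and s_n^2 equal to the sum of the variances."""
    v = np.asarray(variances, dtype=float)
    return len(v) * v.max() / v.sum()

def power_variances(n, beta):
    """Variances sigma_t^2 = t^beta for t = 1, ..., n."""
    return np.arange(1, n + 1, dtype=float) ** beta

growing = n_times_mn_sq(power_variances(1000, 1.0))       # beta = 1: stays bounded
geometric = n_times_mn_sq(2.0 ** np.arange(1, 201))       # sigma_t^2 = 2^t: grows like n/2
shrinking_small = n_times_mn_sq(power_variances(100, -0.5))
shrinking_large = n_times_mn_sq(power_variances(10_000, -0.5))   # beta < 0: grows with n
```

With $\beta = 1$ the quantity equals $2n/(n+1)$ and so remains below 2, whereas in the geometric and the $\beta < 0$ cases it increases without bound as $n$ grows, in line with the failure of 24.7(d').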

24.11 Example Let $\{Y_t\}$ be an independent sequence with variance sequence generated by the scheme described in 13.7. Putting $X_{nt} = Y_t/s_n$, 24.7(d') is satisfied with $c_{nt} = 1/s_n$ for all $t$. □

Among the cases that 24.7(d') rules out are asymptotically degenerate sequences, having $\sigma_t^2 \to 0$ as $t \to \infty$. In these cases, $\max_{1 \le t \le n} \sigma_t^2 = O(1)$ as $n \to \infty$, but it will usually be the case that $s_n^2/n \to 0$. It is certain of these cases that assumption 24.6(d) is designed to allow. To see what is going on here, it is easiest to think in terms of the array $\{c_{nt}\}$ as varying regularly with $n$ and $t$, within certain limits to be determined. We have the following lemma, whose conditions are somewhat more general than we require, but are easily specialized.

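24.12 Lemma Suppose $c_{nt}^2 \sim t^{\beta} n^{-\gamma-1}$ for $\beta, \gamma \in \mathbb{R}$. Then, 24.6(d) holds iff

$$\beta \le \gamma \tag{24.34}$$

and

$$\beta < 2[\theta + \gamma(1 + \theta)]. \; □ \tag{24.35}$$

Notice that $\beta$ and $\gamma$ can be of either sign, subject to the indicated constraints.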

Proof We establish that an $\alpha \in (0,1]$ exists such that each of conditions (24.25)-(24.27) are satisfied. We have either $M_{ni}^2 \sim (i b_n)^{\beta} n^{-\gamma-1}$ for $\beta \ge 0$, or $M_{ni}^2 \sim ((i-1) b_n + 1)^{\beta} n^{-\gamma-1}$ for $\beta < 0$, but in both cases

$$\sum_{i=1}^{r_n} M_{ni}^2 \sim r_n^{1+\beta} b_n^{\beta} n^{-\gamma-1} \sim n^{(1+\beta)\alpha + (1-\alpha)\beta - \gamma - 1}. \tag{24.36}$$

Simplifying shows that condition (24.34) is necessary and sufficient for (24.27), independently of the value of $\alpha$, note. Next, (24.25) is equivalent to

$$\max_{1 \le t \le n} c_{nt}^2 = o(n^{\alpha-1}), \tag{24.37}$$

which (since the maximum is at $t = n$ for $\beta \ge 0$, and $t = 1$ otherwise) imposes the requirement

$$\max\{\beta, 0\} - \gamma < \alpha. \tag{24.38}$$

In view of (24.34) this constraint only binds if $\gamma < 0$, and is equivalent to just

$$-\gamma < \alpha. \tag{24.39}$$

Also,

$$\sum_{i=1}^{r_n} M_{ni} \sim r_n^{1+\beta/2} b_n^{\beta/2} n^{-(\gamma+1)/2} \sim n^{\alpha(1+\beta/2) + ((1-\alpha)\beta - \gamma - 1)/2}, \tag{24.40}$$

and (24.26) reduces to

$$\alpha \le \frac{2\theta + \gamma - \beta}{1 + 2\theta}. \tag{24.41}$$

The existence of $\alpha$ satisfying (24.41) and (24.39) requires two strict inequalities to be satisfied by $\beta$, $\gamma$, and $\theta$, but since $\theta$ is positive the first of these, which is $2\theta > \beta - \gamma$, holds by (24.34). The second is $-\gamma(1 + 2\theta) < 2\theta + \gamma - \beta$, which is the same as (24.35). It follows that (24.34) and (24.35) are equivalent to 24.6(d), as asserted. ∎

In terms of the leading case with $X_{nt} = (Y_t - \mu_t)/s_n$, with $s_n^2$ given by (24.30), consider first the case $\sigma_t^2 \sim t^{\beta}$, $\beta \ge 0$. To have $c_{nt}$ monotone in $t$ (not essential, but analytically convenient), we could often set

$$c_{nt} = \max_{1 \le s \le t} \{\sigma_s\}/s_n. \tag{24.42}$$

In this case, under 24.6(b) and 24.7(c'), the conditions of 17.7 are satisfied. Note however that $\|Y_t - \mu_t\|_r \sim \sigma_t$ is required by 24.6(b). Since $\{\zeta_m\}$ is of size $-1$ we have, substituting into (24.30),

$$s_n^2 \ll \sum_{t=1}^n t^{\beta} + 2 \sum_{t=2}^n \sum_{k=1}^{t-1} t^{\beta/2} (t - k)^{\beta/2} \zeta_k = O(n^{1+\beta}). \tag{24.43}$$

This only provides an upper bound, and condition 24.7(c') alone does not exclude the possibility that $s_n^2 \sim n^{1+\gamma}$ for some $\gamma < \beta$. However, compliance with condition (24.34), which follows in turn from 24.7(d') (note, $M_n^2 = c_{nn}^2$ in this case), enforces $\gamma = \beta$. This condition says that the variance of the sum must grow no more slowly than the sum of the variances.

Now consider the case $\sigma_t^2 \sim t^{\beta}$ with $\beta < 0$. Here, we might be able to set

$$c_{nt} = \sup_{s \ge t} \{\sigma_s\}/s_n. \tag{24.44}$$

Under 24.7(c') and (24.34) we would again have $c_{nt}^2 \sim t^{\beta} n^{-\beta-1}$, but here $M_n^2 = c_{nt^*}^2$ for some $t^* < \infty$; hence $M_n^2 \sim n^{-\beta-1}$ and 24.7(d') ceases to hold. However, with $\beta = \gamma$, condition (24.35) reduces to

$$\beta > \frac{-2\theta}{1 + 2\theta}, \tag{24.45}$$

and it is possible for the conditions of 24.12 to be satisfied, although only with $\theta > 0$. As $\theta$ is increased, limiting the dependence, $\alpha$ can be increased according to (24.41) and this allows smaller $\beta$, increasing the permitted rate of degeneration. As $\theta$ approaches $\tfrac12$, so that the mixing size approaches $-2r/(r - 2)$, $\beta$ may approach $-\tfrac12$ with $\alpha$ also approaching $\tfrac12$.
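The constraints of Lemma 24.12 can be turned into a small calculator for the admissible range of $\alpha$. The sketch below assumes the conditions in the form $\beta \le \gamma$, $\max\{0, -\gamma\} < \alpha$, and $\alpha \le \min\{1, (2\theta + \gamma - \beta)/(1 + 2\theta)\}$; the function names are hypothetical and the open lower endpoint is reported as a plain number.

```python
def alpha_interval(beta, gamma, theta):
    """Return (lower, upper) with lower < alpha <= upper admissible, or None if empty."""
    if beta > gamma:                      # (24.34) fails
        return None
    lower = max(0.0, -gamma)              # alpha > 0 together with (24.39)
    upper = min(1.0, (2.0 * theta + gamma - beta) / (1.0 + 2.0 * theta))  # (24.41)
    return (lower, upper) if lower < upper else None

def beta_lower_bound(theta):
    """Bound (24.45) for the case beta = gamma: beta must exceed -2*theta/(1 + 2*theta)."""
    return -2.0 * theta / (1.0 + 2.0 * theta)
```

With $\theta = 0.4$ the bound (24.45) is about $-0.444$, so $\beta = \gamma = -0.3$ leaves a non-empty interval for $\alpha$ while $\beta = \gamma = -0.45$ does not.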

These conclusions are summarized in the following corollary. Part (i) is a case of 24.7, and part (ii) a case of 24.6.

24.13 Corollary Let $X_{nt} = (Y_t - \mu_t)/s_n$ where $\sigma_t^2 \sim t^{\beta}$ and $s_n^2 \sim n^{1+\beta}$. If either

(i) $0 \le \beta < \infty$, and 24.6(b) and 24.7(c') hold with $c_{nt}$ defined in (24.42); or

(ii) there exists $\theta$ such that (24.45), 24.6(b) and 24.6(c) hold with $c_{nt}$ defined by (24.44);

then $S_n \xrightarrow{D} N(0,1)$. □

By an apparent artefact of the proof, $t^{-1/2}$ represents a limit on the permitted rate of degeneration of the variances. We may conjecture that with mixing sizes exceeding $2r/(r - 2)$, the CLT can hold with $\beta \le -\tfrac12$, but a different method of attack would be necessary to obtain this result. Also, both the above cases appear to require that $\alpha \in (0, \tfrac12)$, and it is not clear whether larger values of $\alpha$ might be possible generally. The plausibility of both these conjectures is strengthened by the existence of the following special case.

24.14 Corollary If conditions 24.6(a) and (b) hold, and also

(c'') $\{X_{nt}, \mathcal{F}_{n,-\infty}^{t}\}$ is a martingale difference array, and

(d'') conditions (24.25) and (24.27) hold with $b_n = 1$,

then $S_n \xrightarrow{D} N(0,1)$. □

In the case $X_{nt} = (Y_t - \mu_t)/s_n$, where by the m.d. assumption $s_n^2 = \sum_{t=1}^n \sigma_t^2$, note that (24.27) is satisfied with $b_n = 1$ by construction, and (24.25) requires only that $s_n^2 \to \infty$, so that $\sigma_t^2 \sim t^{\beta}$ is permitted for any $\beta \ge -1$ under 24.14(d''). This result may be compared with 24.4, whose conditions it extends rather as 24.6 extends 24.7. The proof will be given below after the intermediate results for the proof of 24.6 have been established. As an example, let $Y_t - \mu_t$ be a m.d. with $\sigma_t^2 = \sigma^2/t$ where $\sigma^2$ is constant, so that $s_n^2 \sim \sigma^2 \log n$. Corollary 24.14 establishes that

$$(\log n)^{-1/2} \sum_{t=1}^n (Y_t - \mu_t) \xrightarrow{D} N(0, \sigma^2).$$

The limit on the permissible rate of degeneration here is set by the requirement $s_n^2 \to \infty$ as $n \to \infty$. If the variances are summable, the central limit theorem surely fails. Here is a well-known case where the non-summability condition is violated.

24.15 Example Consider the sequence $\{Y_t\}$ of first differences, with $Y_1 = Z_1$ and

$$Y_t = Z_t - Z_{t-1}, \quad t = 2, 3, \ldots,$$

where $\{Z_t\}$ is a zero-mean, uniformly $L_r$-bounded, independent sequence, with $r > 2$. Here $\{Y_t\}$ satisfies 24.6(a)-(c), but $\sum_{t=1}^n Y_t = Z_n$ and $s_n^2 = \mathrm{Var}(Z_n) = O(1)$. □
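The telescoping in Example 24.15 is immediate to verify numerically. In this sketch the $Z_t$ are taken to be standard normal, which is an assumption made purely for illustration.

```python
import numpy as np

# Hypothetical independent zero-mean sequence Z_1, ..., Z_n.
rng = np.random.default_rng(3)
z = rng.standard_normal(1000)

# Y_1 = Z_1 and Y_t = Z_t - Z_{t-1} for t >= 2.
y = np.diff(z, prepend=0.0)

# Partial sums of Y telescope back to Z, so S_n = Z_n never accumulates variance.
partial_sums = np.cumsum(y)
```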

24.5 Proving the CLT by the Bernstein Blocking Method

We shall prove only 24.6, the arguments being at worst a mild extension of those required for 24.7. We show that with a suitable Bernstein-type blocking scheme the blocks will satisfy the conditions of 24.1. In effect, we show that the blocks behave like martingale differences. In most applications of the Bernstein approach alternating big and small blocks are defined, with the small blocks containing of the order of $[n^{(1-\alpha)\beta}]$ summands for some $\beta \in (0,1)$, small enough that their omission is negligible in the limit but increasing with $n$ so that the big blocks are asymptotically independent. Our martingale difference approximation method has the advantage that the small blocks can be dispensed with.

Define $b_n$ and $r_n$ as in condition 24.6(d) for some $\alpha \in (0,1)$, and let

$$Z_{ni} = \sum_{t=(i-1)b_n+1}^{i b_n} X_{nt}, \quad i = 1,\ldots,r_n, \tag{24.46}$$

such that

$$S_n = \sum_{t=1}^n X_{nt} = \sum_{i=1}^{r_n} Z_{ni} + (X_{n,r_n b_n+1} + \cdots + X_{nn}). \tag{24.47}$$

The final fragment has fewer than $b_n$ terms, and is asymptotically negligible in the sense that $b_n r_n / n \to 1$.
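The blocking scheme of (24.46)-(24.47) can be sketched as follows. The function name, the small guard against floating-point error in computing $[n^{1-\alpha}]$, and the test data are all assumptions introduced for illustration.

```python
import math
import numpy as np

def bernstein_blocks(x, alpha):
    """Split x into r_n blocks of length b_n = [n^(1 - alpha)] plus a final fragment."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Small guard so that values such as 99.99999999999997 are not floored to 99.
    b_n = max(1, math.floor(n ** (1.0 - alpha) + 1e-9))
    r_n = n // b_n
    z = x[: r_n * b_n].reshape(r_n, b_n).sum(axis=1)   # block sums Z_n1, ..., Z_{n r_n}
    tail = x[r_n * b_n:]                                # final fragment, fewer than b_n terms
    return z, tail, b_n, r_n

# Hypothetical data: x_t = t for t = 1, ..., 1000, with alpha = 1/2.
x = np.arange(1.0, 1001.0)
z, tail, b_n, r_n = bernstein_blocks(x, 0.5)
```

Here $b_n = 31$ and $r_n = 32$, so the blocks cover 992 of the 1000 terms and the final fragment has 8 terms.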

Our method is to show the existence of $\alpha > 0$ for which an array $\{Z_{ni}\}$ can be constructed according to (24.46), such that 24.1(c) is satisfied, and such that the truncated sequence $\{\tilde Z_{ni}\}$ defined in (24.10) satisfies the other conditions of 24.1. This will be sufficient to prove the theorem, since then $\sum_{i=1}^{r_n} \tilde Z_{ni}^2 \xrightarrow{pr} 1$ according to 24.2(ii), and $Z_{ni}$ and $\tilde Z_{ni}$ are equivalent sequences when 24.1(c) holds, by 24.2(iii).

Since $\{X_{nt}\}$ is an $L_2$-mixingale array of size $-\tfrac12$ according to 24.8, we may apply 16.14 to establish that the sequences

$$\Big\{\max_{(i-1)b_n < j \le i b_n} \Big(\sum_{t=(i-1)b_n+1}^{j} X_{nt}\Big)^2 \Big/ v_{ni}^2,\ n \in \mathbb{N}\Big\} \tag{24.48}$$

are uniformly integrable, where $v_{ni}^2 = \sum_{t=(i-1)b_n+1}^{i b_n} c_{nt}^2$. At least this follows directly in the case $i = 1$, and it is also clear that the result generalizes to any $i$, for, although the starting point $(i - 1) b_n + 1$ is increasing with $n$, the $n$th coordinate of the sequence in (24.48) can be embedded in a sequence with fixed starting point which is uniformly integrable by 16.14. This holds uniformly in $n$ and $i$, and it follows in particular that the array $\{Z_{ni}^2/v_{ni}^2,\ 1 \le i \le r_n,\ n \ge 1\}$ is uniformly integrable.

This result leads to the following theorem.

24.16 Theorem Under assumptions 24.6(a)-(d), $\{Z_{ni}\}$ satisfies the Lindeberg condition.

Proof For any $i$, $v_{ni}^2 \le b_n M_{ni}^2 \to 0$ as $n \to \infty$ by (24.25) and hence, for $\varepsilon > 0$,

$$\max_{1 \le i \le r_n} \frac{E\big(1_{\{|Z_{ni}| > \varepsilon\}} Z_{ni}^2\big)}{b_n M_{ni}^2} \le \max_{1 \le i \le r_n} \frac{E\big(1_{\{|Z_{ni}/v_{ni}| > \varepsilon/v_{ni}\}} Z_{ni}^2\big)}{v_{ni}^2} \to 0 \text{ as } n \to \infty \tag{24.49}$$

by uniform integrability. The conclusion,

$$\sum_{i=1}^{r_n} E\big(1_{\{|Z_{ni}| > \varepsilon\}} Z_{ni}^2\big) \le \max_{1 \le i \le r_n} \frac{E\big(1_{\{|Z_{ni}| > \varepsilon\}} Z_{ni}^2\big)}{b_n M_{ni}^2} \sum_{i=1}^{r_n} b_n M_{ni}^2 \to 0 \text{ as } n \to \infty, \tag{24.50}$$

now follows since the sum of $r_n$ terms on the majorant side is $O(1)$ from (24.27). ∎

This theorem leads in turn to two further crucial results. First, by way of 23.16 we know that condition 24.1(d) is satisfied (and if this holds for $Z_{ni}$ it is certainly true for $\tilde Z_{ni}$); and second, note that for any $\varepsilon > 0$

$$\max_{1 \le i \le r_n} Z_{ni}^2 \le \varepsilon^2 + \sum_{i=1}^{r_n} 1_{\{|Z_{ni}| > \varepsilon\}} Z_{ni}^2. \tag{24.51}$$

Taking expectations of both sides of this inequality, and then the limit as $n \to \infty$, shows that (24.11) holds, and therefore that $\tilde T_{r_n}$ is uniformly integrable (that is, condition 24.1(a) holds) by 24.2(i). This leaves two results yet to be proven, $E(\tilde T_{r_n}) \to 1$ (corresponding to 24.1(b) for the truncated array), and 24.1(c). By the latter result we shall establish parts (ii) and (iii) of 24.2, and then in view of 24.2(iii) the proof will be complete.
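Inequality (24.51) holds path by path: the largest squared term is either at most $\varepsilon^2$ or else is one of the terms counted in the sum. A small sketch follows, with hypothetical data and function name.

```python
import numpy as np

def max_versus_tail_sum(z, eps):
    """Return both sides of (24.51): max z_i^2 and eps^2 + sum of z_i^2 over |z_i| > eps."""
    z = np.asarray(z, dtype=float)
    lhs = float(np.max(z**2))
    rhs = eps**2 + float(np.sum(z**2 * (np.abs(z) > eps)))
    return lhs, rhs

# Hypothetical array: 200 normal draws scaled by n^{-1/2}.
rng = np.random.default_rng(4)
z = rng.standard_normal(200) / np.sqrt(200)
pairs = [max_versus_tail_sum(z, eps) for eps in (0.01, 0.05, 0.1, 0.5)]
```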

We tackle 24.1(c) first. Consider

$$\sum_{i=1}^{r_n} Z_{ni}^2 - 1 = \sum_{i=1}^{r_n} \Big(\sum_{t=(i-1)b_n+1}^{i b_n} X_{nt}\Big)^2 - 1 = A_n - B_n, \tag{24.52}$$

say, where

$$A_n = \sum_{i=1}^{r_n} \big(Z_{ni}^2 - E(Z_{ni}^2)\big) = \sum_{i=1}^{r_n} \Big[\sum_{t=(i-1)b_n+1}^{i b_n} (X_{nt}^2 - \sigma_{nt}^2) + 2 \sum_{t=(i-1)b_n+1}^{i b_n - 1} \sum_{k=1}^{i b_n - t} (X_{nt} X_{n,t+k} - \sigma_{nt,t+k})\Big], \tag{24.53}$$

and $B_n = B_n' + B_n''$ where

$$B_n' = 2 \sum_{i=1}^{r_n} \sum_{t=(i-1)b_n+1}^{i b_n} \sum_{k=i b_n - t + 1}^{n-t} \sigma_{nt,t+k}, \tag{24.54}$$

$$B_n'' = \sum_{t=r_n b_n+1}^{n-1} \Big(\sigma_{nt}^2 + 2 \sum_{k=1}^{n-t} \sigma_{nt,t+k}\Big) + \sigma_{nn}^2. \tag{24.55}$$

Here, $\sigma_{nts} = E(X_{nt} X_{ns})$, and recall that $E(S_n^2) = \sum_{t=1}^{n-1} \big(\sigma_{nt}^2 + 2 \sum_{k=1}^{n-t} \sigma_{nt,t+k}\big) + \sigma_{nn}^2 = 1$. It may be helpful to visualize these expressions as made up of elements from outer product and covariance matrices divided into $r_n^2$ blocks of dimension $b_n \times b_n$, with a border corresponding to the final $n - r_n b_n$ terms, if any; see Fig. 24.1. The terms in $A_n$ correspond to the $r_n$ diagonal blocks, and $B_n'$ and $B_n''$ contain the remaining covariances, those from the off-diagonal and final blocks.

[Figure: matrix partitioned into diagonal blocks $1, 2, 3, \ldots, r_n$ of dimension $b_n$, with the remaining off-diagonal and border region labelled $B_n$.]

Fig. 24.1

$A_n$ is stochastic and so must be shown to converge in probability, whereas the components of $B_n$ are nonstochastic. The nonstochastic part of the problem is the more straightforward, so we start with this.

24.17 Theorem Under assumptions 24.6(a)-(d), $B_n \to 0$.

Proof Since $r > 2$, $r/(r - 1) < 2$ and conditions 24.6(b) and (c) imply by 17.7 that

$$|\sigma_{nts}| \le c_{nt} c_{ns} \zeta_{|t-s|}, \tag{24.56}$$

where $\{\zeta_m\}$ is a constant sequence of size $-1$. Hence,

$$|B_n'| \le 2 \sum_{i=1}^{r_n} \sum_{l=i+1}^{r_n+1} \sum_{j=1}^{b_n} \sum_{k=1}^{b_n} |\sigma_{n,(i-1)b_n+j,(l-1)b_n+k}| \ll \sum_{i=1}^{r_n} \sum_{l=i+1}^{r_n+1} M_{ni} M_{nl} \Big(\sum_{j=1}^{b_n} \sum_{k=1}^{b_n} \zeta_{(l-i)b_n+k-j}\Big). \tag{24.57}$$

To determine the order of magnitude of the majorant expression, verify that the terms in the parentheses are $O\big(b_n^{1-\delta} (l - i)^{-1-\delta}\big)$ for some $\delta > 0$. Changing the order of summation and putting $k = l - i$ allows us to write

$$|B_n'| \ll b_n^{1-\delta} \sum_{k=1}^{r_n} \Big(\sum_{i=1}^{r_n+1-k} M_{ni} M_{n,i+k}\Big) k^{-1-\delta}. \tag{24.58}$$

But for every $k$ in this sum, the Cauchy-Schwarz inequality and (24.27) give

$$\sum_{i=1}^{r_n+1-k} M_{ni} M_{n,i+k} \le \sum_{i=1}^{r_n+1} M_{ni}^2 = O(b_n^{-1}), \tag{24.59}$$

so that (24.58) implies, as required, that

$$|B_n'| = O(b_n^{-\delta}) = O(n^{-(1-\alpha)\delta}). \tag{24.60}$$

To complete the proof we can also show, by a similar kind of argument but applying (24.25), that

$$B_n'' = O(M_{n,r_n+1}^2 b_n) = o(1). \; ∎ \tag{24.61}$$

To solve the stochastic part of the problem, decompose the terms $Z_{ni}^2 - E(Z_{ni}^2)$ in (24.53) into individual mixingale components, each indexed by $i = 1,\ldots,r_n$. For a pair of positive integers $j$ and $k$, let

$$W_{ni}(j,k) = X_{n,(i-1)b_n+j}X_{n,(i-1)b_n+j+k} - \sigma_{n,(i-1)b_n+j,(i-1)b_n+j+k}. \qquad (24.62)$$

It is an immediate consequence of 24.9, specifically of (24.32) and (24.33), that for fixed $j$ and $k$ the triangular array

$$\{W_{ni}(j,k),\ \mathcal{F}_{n,-\infty}^{ib_n};\ 1 \le i \le r_n,\ n \ge N(j,k)\}, \qquad (24.63)$$

where $N(j,k) = \min\{n: r_n \ge 1,\ b_n \ge j + k\}$, is an $L_1$-mixingale of size $-1$, with mixingale coefficients $\psi_0 = \xi_0$ and $\psi_p = \xi_{(p-1)b_n+j}$ for $p \ge 1$, and constants

$$a_{ni}(j,k) = c_{n,(i-1)b_n+j}\,c_{n,(i-1)b_n+j+k}. \qquad (24.64)$$

Substituting from (24.62) into (24.53), we have the sum of $b_n^2$ such mixingales,

$$Z_{ni}^2 - E(Z_{ni}^2) = \sum_{j=1}^{b_n-1}\Bigl(W_{ni}(j,0) + 2\sum_{k=1}^{b_n-j}W_{ni}(j,k)\Bigr) + W_{ni}(b_n,0). \qquad (24.65)$$

Although this definition entails considering $k$ of order $b_n$ as $n \to \infty$, note that the inter-block dependence of the summands does not depend on $k$. The designation of 'mixingale' is convenient here, but it need not be taken more literally than the inequalities (24.32) and (24.33) require. The crucial conclusion is that a weak law of large numbers can be applied to these terms.

24.18 Theorem Under assumptions 24.6(a)-(d), $A_n \xrightarrow{L_1} 0$. □

The object here is to show that the array

$$\{Z_{ni}^2 - E(Z_{ni}^2),\ \mathcal{F}_{n,-\infty}^{ib_n};\ 1 \le i \le r_n,\ r_n \ge 1\} \qquad (24.66)$$

is an $L_1$-mixingale with constants $\{a_{ni}\}$ which satisfy conditions (a), (b), and (c) of 19.11. The proof could be simplified in the present case by using the fact that $Z_{ni}$ is $\mathcal{F}_{n,-\infty}^{ib_n}$-measurable (by 24.6(a)) so that the minorant side of (24.68) below actually disappears identically for $p \ge 0$. But since the result could find other applications, we establish the mixingale property formally, without appealing to this assumption.

2 d 1 ing 24-9 term by term, we obtainProof By multiplying out Zni an app y

E )Ezlni - Ftzazjlj@-(.f-*bn).

j


bn- 1

f 77 E jElWnii,ot l'-tx-*bn)

Ijzc1

bn-j

+ lTE jEwniqvk) ITf-f-*bn)

jk=1

+ E lEWnibn'ot lF.-f''pbbnt

jbn- 1

rxMnli y)(1+ lbn-./))j(p-jpu+j

+ jpx ,

j=3(24.67)

and similarly,

E lZnli- Eznli lFxl'+*bnj jbn - 1 bn

-.j

t<Mnli 77 )(;,+1)s-y+2F7((p+1)u-j-k+ Lgn, (24.68)j=1 k=1

-1-8 f 45 > 0 Write formally, tznjv; to denote the larger of thewhere (j = 0j ) or . ,

two majorant expressions in (24.67)and (24.68),such that vp*--> 0 and ani isfixed by setting v; = 1. Evaluating (24.68)at p = 1 and (24.68)at p = 0 respec-tively gives

bn - 1

E 1E(Z?lj- Ezln jTxf-1)9n)

j<<Mytj y'q(1 + zlbt-y))./-1

-8

+ y-u1-8j=1

lb<<sl'nin,

and also, putting j' = bn-j

and k' = j' - kbn- 1 j'- 1

2 - F(Z2 j5 i bn) <<M1-') .4-1-8

+ 27)/c'-1- + 1E Zni ni -- ni

.j'=l k'=.*

1<<sfnibn.

Hence, $a_{ni} = BM_{ni}^2b_n$ for some finite constant $B$. Since

$$\sum_{i=1}^{r_n}M_{ni}^4 \le \max_{1\le i\le r_n+1}M_{ni}^2\sum_{i=1}^{r_n}M_{ni}^2 = o(b_n^{-2}) \qquad (24.71)$$

in view of (24.27) and (24.25), these constants satisfy conditions 19.11(b) and (c). And since $Z_{ni}^2/(b_nM_{ni}^2) \le Z_{ni}^2/v_{ni}^2$ where $\{Z_{ni}^2/v_{ni}^2\}$ is uniformly integrable, they also satisfy condition 19.11(a). It follows that $A_n \xrightarrow{L_1} 0$, and the proof is complete. ∎

(24.69)

(24.70)

This brings us to the final step in the argument, establishing the asymptotic m.d. property of the Bernstein blocks.

24.19 Theorem Under 24.6(a)-(d), $\lim_{n\to\infty}E(T_{r_n}) = 1$.

Proof Applying (24.17),

$$T_{r_n} = \prod_{i=1}^{r_n}(1 + \mathrm{i}\lambda\tilde Z_{ni}) = 1 + \mathrm{i}\lambda\sum_{i=1}^{r_n}T_{i-1}\tilde Z_{ni}, \qquad (24.72)$$

where $T_{i-1} = \prod_{j=1}^{i-1}(1 + \mathrm{i}\lambda\tilde Z_{nj})$ is an $\mathcal{F}_{n,-\infty}^{(i-1)b_n}$-measurable r.v. by 24.6(a), and hence

$$E(T_{r_n}) = 1 + \mathrm{i}\lambda\sum_{i=1}^{r_n}E(T_{i-1}\tilde Z_{ni}) = 1 + \mathrm{i}\lambda\sum_{i=1}^{r_n}E\bigl(T_{i-1}E(\tilde Z_{ni}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\bigr). \qquad (24.73)$$

By the Cauchy-Schwarz inequality,

$$\sum_{i=1}^{r_n}E\bigl|T_{i-1}E(\tilde Z_{ni}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\bigr| \le \sum_{i=1}^{r_n}\|T_{i-1}\|_2\,\|E(\tilde Z_{ni}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\|_2, \qquad (24.74)$$

where $\|T_{i-1}\|_2$ is uniformly bounded by (24.15), which follows in turn by (24.11), so the result hinges on the rate of approach to zero of $\|E(\tilde Z_{ni}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\|_2$. This cannot be less than that for $Z_{ni}$, so consider, more conveniently, the latter case.
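The expansion of the product in (24.72) into a sum is a telescoping identity: each partial product differs from its predecessor by $\mathrm{i}\lambda T_{i-1}\tilde Z_{ni}$. The following is a minimal numerical sketch of that identity only, not of the proof; the function name and the sample values are hypothetical.

```python
def product_and_telescoped_sum(z, lam):
    """Return (prod_i (1 + i*lam*z_i), 1 + i*lam*sum_i T_{i-1} z_i).

    T_0 is the empty product, equal to 1. The two returned values
    should agree up to floating-point rounding.
    """
    t_prev = 1 + 0j      # T_{i-1}, starting from T_0 = 1
    total = 0j           # running sum of T_{i-1} * z_i
    for z_i in z:
        total += t_prev * z_i
        t_prev *= 1 + 1j * lam * z_i
    return t_prev, 1 + 1j * lam * total


if __name__ == "__main__":
    prod, expanded = product_and_telescoped_sum([0.3, -0.1, 0.7, 0.2], lam=0.5)
    print(abs(prod - expanded))
```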

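$$\|E(Z_{ni}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\|_2 = \Bigl(E\Bigl[\sum_{t=(i-1)b_n+1}^{ib_n}E(X_{nt}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\Bigr]^2\Bigr)^{1/2}$$

$$\le \Bigl(\sum_{t=(i-1)b_n+1}^{ib_n}E\bigl(E(X_{nt}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\bigr)^2 + 2\sum_{t=(i-1)b_n+1}^{ib_n-1}\sum_{k=1}^{ib_n-t}\bigl|E\bigl(E(X_{nt}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})E(X_{n,t+k}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\bigr)\bigr|\Bigr)^{1/2}. \qquad (24.75)$$

Applying 24.8,

$$E\bigl(E(X_{nt}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\bigr)^2 \le c_{nt}^2\zeta_{t-(i-1)b_n}^2 \qquad (24.76)$$

and

$$\bigl|E\bigl(E(X_{nt}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})E(X_{n,t+k}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\bigr)\bigr| \le \|E(X_{nt}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\|_2\,\|E(X_{n,t+k}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\|_2 \le c_{nt}\zeta_{t-(i-1)b_n}c_{n,t+k}\zeta_{t-(i-1)b_n+k}, \qquad (24.77)$$

where $\zeta_j = O(j^{-1/2-\mu})$ for $\mu > 0$, under 24.6(c). Hence,

$$\|E(Z_{ni}|\mathcal{F}_{n,-\infty}^{(i-1)b_n})\|_2 \ll M_{ni}\Bigl(\sum_{j=1}^{b_n}\zeta_j^2 + 2\sum_{j=1}^{b_n-1}\zeta_j\sum_{k=j+1}^{b_n}\zeta_k\Bigr)^{1/2}, \qquad (24.78)$$

where the sum of squares is $O(1)$ and the double sum is $O(b_n^{1-2\mu})$. Applying (24.26), assumption 24.6(d) implies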


rn rnA'(/ Ek /1 sni-l *n44I- A7A'/',,tn(max(!?,,t'-2'z)'211)'Xl -1

n ,-x ,

f=1 =1

tl-g jy-l/z j; (g4yo)= Otmaxtha , n )) = ol . .

This ensures that 71F(h-1Z-- 0 as n-->

x, which is the desired result. .

Proof of 24.6 We have established that $\sum_{i=1}^{r_n}Z_{ni} = \sum_{t=1}^{r_nb_n}X_{nt} \xrightarrow{D} N(0,1)$. There remains the formality of extending the same conclusion to $S_n$, but this is easy since

$$\Bigl|S_n - \sum_{t=1}^{r_nb_n}X_{nt}\Bigr| = |X_{n,r_nb_n+1} + \cdots + X_{nn}| = O_p(r_n^{-1/2}) \qquad (24.80)$$

and $S_n$ has the same limiting distribution by 22.18. The proof of 24.6 is therefore complete. ∎

It remains just to modify the proof for the martingale difference case, as was promised above.

Proof of 24.14 It is easily seen that 24.16 holds for $r_n = n$ and $b_n = 1$. In 24.17, $B_n = 0$ identically since $\xi_k = 0$ for $k > 0$ in (24.57). In 24.18 one may put $Z_{ni} = X_{nt}$, and the conditions for 19.10 follow directly from the assumptions. Lastly, 24.19 holds since the sum in (24.79) vanishes identically under the martingale difference assumption. The proof is completed just as for 24.6. ∎


25 Some Extensions

25.1 The CLT with Estimated Normalization

The results of the last two chapters, applied to the case $X_{nt} = X_t/s_n$ where $E(X_t) = 0$ and $s_n^2 = E(\sum_{t=1}^nX_t)^2$, would not be particularly useful if it were necessary to know the sequences $\{\sigma_t^2\}$, and $\{\sigma_{t,t-k}\}$ for $k \ge 1$, in order to apply them. Obviously, the relevant normalizing constants must be estimated in practice.

Consider the independent case initially, and let $S_n = \sum_{t=1}^nX_t/s_n$ where $s_n^2 = \sum_{t=1}^n\sigma_t^2$. Also let $\hat s_n^2 = \sum_{t=1}^nX_t^2$, and we may write

$$\hat S_n = \frac{1}{\hat s_n}\sum_{t=1}^nX_t = d_nS_n, \qquad (25.1)$$

where $d_n = s_n/\hat s_n$. If $d_n \xrightarrow{pr} 1$, we could appeal to 22.14 to show that $\hat S_n \xrightarrow{D} N(0,1)$ whenever $S_n \xrightarrow{D} N(0,1)$. The interesting question is whether the minimal conditions sufficient for the CLT are also sufficient for the relevant convergence in probability.

If the sequence is stationary as well as independent, existence of the variance $\sigma^2$ is sufficient for both the CLT (23.3) and for $n^{-1}\sum_{t=1}^nX_t^2 \xrightarrow{as} \sigma^2$ (applying 23.5 to $X_t^2$). In the heterogeneous case, we do not have a weak law of large numbers for $\{X_t^2\}$ based solely on the Lindeberg condition. However, the various sufficient conditions for the Lindeberg condition given in Chapter 23, based on uniform integrability, are sufficient for a WLLN. Without loss of generality, take the case of possibly trending variances.

25.1 Theorem If $\{X_t\}$ is an independent sequence satisfying the conditions of 23.18, then

$$\frac{\hat s_n^2}{s_n^2} \xrightarrow{pr} 1. \qquad (25.2)$$

Proof Consider the sequence $\{(X_t^2 - \sigma_t^2)/c_t^2\}$. By assumption this has zero mean, is independent (and hence an m.d.), and uniformly integrable. The conditions of 19.8, with $p = 1$, $b_t = c_t^2$, and $a_n = s_n^2$, are satisfied since, under the conditions of 23.18,

$$\sum_{t=1}^nc_t^2 \le nM_n^2 = O(s_n^2), \qquad (25.3)$$

where $M_n = \max_{1\le t\le n}\{c_t\}$. Hence

$$E\Bigl|\frac{\sum_{t=1}^n(X_t^2 - \sigma_t^2)}{s_n^2}\Bigr| = E\Bigl|\frac{\hat s_n^2}{s_n^2} - 1\Bigr| \to 0, \qquad (25.4)$$

which is sufficient for convergence in probability. ∎

When the sequence $\{X_t\}$ is a martingale difference, supplementary conditions are needed for $\{X_t^2\}$ to obey a WLLN, but these turn out to be the same as are needed for the martingale CLTs of 24.3 and 24.4. In fact, condition 24.3(a) corresponds precisely to the required result. We have, immediately, the following theorem.

25.2 Theorem Let $\{X_t, \mathcal{F}_t\}$ be a m.d., and let the conditions of 24.3 or 24.4 be satisfied; then (25.2) holds. □

Although we have spoken of estimating the variance, there is clearly no necessity for $\{s_n^2/n\}$, or any other sequence apart from $\{d_n\}$, to converge. Example 24.11 is a case in point. In those globally covariance stationary cases (see §13.2) where $s_n^2/n$ converges to a finite positive constant, say $\bar\sigma^2$, the 'average variance' of the sequence, we conventionally refer to $\hat s_n^2/n$ as a consistent estimator of $\bar\sigma^2$. But more generally, the same terminology can always be applied to $\hat s_n^2$ with respect to $s_n^2$, in the sense of (25.2).

Alternative variance estimators can sometimes be defined which exploit the particular structure of the random sequence. In regression analysis, we typically apply the CLT to sequences of the form $X_t = W_tU_t$ where $\{U_t, \mathcal{F}_t\}$ is assumed to be a m.d. with fixed variance $\sigma^2$ (the disturbance), and where $W_t$ (a regressor) is $\mathcal{F}_{t-1}$-measurable. In this case, $\{X_t, \mathcal{F}_t\}$ is a m.d. with variances $\sigma_t^2 = \sigma^2E(W_t^2)$, which suggests the estimator $\tilde s_n^2 = (n^{-1}\sum_{t=1}^nU_t^2)(\sum_{t=1}^nW_t^2)$ for $s_n^2$. This is the usual approach in regression analysis, but of course the method is not robust to the failure of the fixed-variance assumption. By contrast, $\hat s_n^2 = \sum_{t=1}^nW_t^2U_t^2$ possesses the property cited in (25.2) under the stated conditions, regardless of the distributions of the sequence. The latter type of estimator is termed heteroscedasticity-consistent.$^{26}$

Now consider the case of general dependence. The complicating factor here is that $s_n^2$ contains covariances as well as variances, and $\hat s_n^2$ is no longer a suitable estimator. A sample analogue of $s_n^2$ must include terms of the form $X_tX_{t+j}$ for $|j| \ge 1$ as well as for $j = 0$, but the problem is to know how many of these to include. If we include all of them, in other words for $j = 1 - t,\ldots,n - t$, the resulting sum is equal to $(\sum_{t=1}^nX_t)^2$, and the ratio of this quantity to $s_n^2$ is converging, not in probability to 1, but in distribution to $\chi^2(1)$. For consistent estimation we must make use of the knowledge that all but a finite number of the covariances are arbitrarily close to 0, and omit the corresponding sample products from the sum.
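The contrast drawn above between the fixed-variance estimator and the heteroscedasticity-consistent one can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the text: the regressor is a hypothetical nonstochastic sequence alternating between 0.5 and 2.0, the disturbance is Gaussian with standard deviation equal to the regressor (so the fixed-variance assumption fails), and the function names are invented for the illustration.

```python
import random


def variance_estimators(w, u):
    """Return (fixed_variance, heteroscedasticity_consistent) estimates
    of s_n^2 for the sequence X_t = W_t * U_t."""
    n = len(w)
    # Fixed-variance form: (n^-1 sum U_t^2) * (sum W_t^2).
    fixed = (sum(ut * ut for ut in u) / n) * sum(wt * wt for wt in w)
    # Heteroscedasticity-consistent form: sum W_t^2 U_t^2.
    robust = sum((wt * ut) ** 2 for wt, ut in zip(w, u))
    return fixed, robust


def simulate(n=20000, seed=0):
    """Return both estimates as ratios to the true s_n^2 (assumed design)."""
    rng = random.Random(seed)
    w = [0.5 if t % 2 == 0 else 2.0 for t in range(n)]
    # Var(U_t) = W_t^2 here, so Var(X_t) = W_t^4 and s_n^2 = sum W_t^4.
    u = [wt * rng.gauss(0.0, 1.0) for wt in w]
    s2 = sum(wt ** 4 for wt in w)
    fixed, robust = variance_estimators(w, u)
    return fixed / s2, robust / s2


if __name__ == "__main__":
    fixed_ratio, robust_ratio = simulate()
    print(f"fixed-variance / s_n^2 = {fixed_ratio:.3f}")
    print(f"robust / s_n^2         = {robust_ratio:.3f}")
```

In this design the robust ratio should settle near 1, while the fixed-variance ratio settles near $(E W^2)^2/E W^4$, well below 1.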

Similarly to the m.d. case, the conditions of 24.6 contain the required convergence result. Consider (24.46) and (24.47) where $X_{nt} = X_t/s_n$, but now write

$$Z_{ni}^* = s_nZ_{ni} = \sum_{t=(i-1)b_n+1}^{ib_n}X_t, \quad i = 1,\ldots,r_n. \qquad (25.5)$$

In the proof of the CLT the construction of the Bernstein blocks was purely conceptual, but we might consider the option of actually computing them. The sum of squares of the unweighted blocks, $\hat s_{n1}^2 = \sum_{i=1}^{r_n}Z_{ni}^{*2}$, is consistent for $s_n^2$ in the sense that

$$\frac{\hat s_{n1}^2}{s_n^2} = \sum_{i=1}^{r_n}Z_{ni}^2 \xrightarrow{pr} 1, \qquad (25.6)$$

according to 24.17 and 24.18. An important rider to this proposal is that 24.17 and 24.18 were proved using only (24.25) and (24.27), so that, as noted previously, any $\alpha \in (0,1)$ will serve to construct the blocks. In the context of the conditions of 24.12 at least, the only constraint imposed by 24.6(d) is represented by (24.39), which puts a possible lower bound on $\alpha$ in the case of decreasing variances ($\gamma > 0$), but no upper bound strictly less than 1. It is sufficient for consistency if $b_n$ goes to infinity at a positive rate, and we are not bound to use the $\alpha$ that satisfies the conditions of the CLT to construct the blocks in (25.6).
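Computing the block estimator of (25.5) and (25.6) is mechanical once a block length is chosen. The following is a minimal sketch, not the author's procedure: the helper `block_length`, which takes $b_n$ to be the integer part of $n^{1-\alpha}$, is an assumption about how the block length is tied to $\alpha$, and both function names are hypothetical.

```python
def block_variance_estimator(x, b):
    """Sum of squared block sums over the r = n // b consecutive blocks
    of length b. The final n - r*b observations are discarded."""
    r = len(x) // b
    return sum(sum(x[i * b:(i + 1) * b]) ** 2 for i in range(r))


def block_length(n, alpha):
    """Hypothetical rule: b_n = integer part of n^(1 - alpha), at least 1."""
    return max(1, int(n ** (1.0 - alpha)))


if __name__ == "__main__":
    data = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 4.0]
    for b in (1, 3, 7):
        print(b, block_variance_estimator(data, b))
```

With block length 1 the estimator reduces to the sum of squares, and with a single block of length $n$ it reduces to the squared sum, the two extremes discussed in the text.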

But although consistent, $\hat s_{n1}^2$ is not the obvious choice of estimator. It would be more natural to follow authors such as Newey and West (1987) and Gallant and White (1988), inter alia, who consider estimators based on all the cross-products $X_tX_{t-k}$ for $t = k + 1,\ldots,n$ and $k = 0,\ldots,b_n$. In terms of the array representation of Fig. 24.1, these are the elements in the diagonal band of width $2b_n$, rather than the diagonal blocks only. (In this context, $b_n$ is referred to as the band width.) The simplest such estimator is

$$\hat s_{n2}^2 = \sum_{t=1}^nX_t^2 + 2\sum_{k=1}^{b_n}\sum_{t=k+1}^nX_tX_{t-k}. \qquad (25.7)$$

25.3 Theorem Under the conditions of 24.6, applied to $X_t/s_n$, $\hat s_{n2}^2/s_n^2 \xrightarrow{pr} 1$.

Proof Let $X_{nt} = X_t/s_n$ in (24.53), so that $s_n^2A_n$ denotes the same sum constructed from the $X_t$ in place of the $X_{nt}$. The difference between $A_n$ and $(\hat s_{n2}^2 - E(\hat s_{n2}^2))/s_n^2$ is the quantity

ra - 1 ibn14* = - 2n 2

Sn f=1 /=(/-1),n+1

ibn-t- 1

xtxt-k- okt-t=l

n bn- 1

77 (x2,- c2,) + 277 xtxt-k-c,,,-,)t-nrnbn-j k=1

(25.8)

The components of this sum correspond to the $r_n - 1$ 'triangles' which separate the diagonal blocks in Fig. 24.1, each containing $\tfrac{1}{2}b_n(b_n - 1)$ terms, plus the terms from the lower-right corner blocks. Reasoning closely analogous to the proof of 24.18 shows that $A_n^* \xrightarrow{L_1} 0$. The sums of the corresponding covariances converge absolutely to 0 by 24.17, since they are components of $B_n'$ in (24.54), and it follows that

$$\frac{|\hat s_{n2}^2 - \hat s_{n1}^2|}{s_n^2} \xrightarrow{L_1} 0.$$

The theorem therefore follows from 24.18. ∎

Since this estimator uses sample data which are discarded in $\hat s_{n1}^2$, there are informal reasons for preferring it in small samples. But there is a problem in that (the chosen notation notwithstanding) $\hat s_{n2}^2$ is not a square, and not always non-negative except in the limit. This difficulty can be overcome by inserting fixed weights in the sum, as in

$$\hat s_{n3}^2 = \sum_{t=1}^nX_t^2 + 2\sum_{k=1}^{b_n}w_{nk}\sum_{t=k+1}^nX_tX_{t-k}. \qquad (25.9)$$

Suppose $w_{nk} \to 1$ as $n \to \infty$ for every $k \le K$, for every fixed $K < \infty$. Then

$$\frac{|\hat s_{n2}^2 - \hat s_{n3}^2|}{s_n^2} = \frac{2}{s_n^2}\Bigl|\sum_{k=1}^{b_n}(1 - w_{nk})\sum_{t=k+1}^nX_tX_{t-k}\Bigr| \xrightarrow{pr} r(K), \qquad (25.10)$$

where

$$r(K) = 2\operatorname*{plim}_{n\to\infty}\frac{1}{s_n^2}\Bigl|\sum_{k=K+1}^{b_n}(1 - w_{nk})\sum_{t=k+1}^nX_tX_{t-k}\Bigr|. \qquad (25.11)$$

Since $r(K)$ can be made as small as desired by taking $K$ large enough, in view of 25.3 and 24.18, $\hat s_{n3}^2$ is consistent when the weights have this property. It remains to see if they can also be chosen in such a way that (25.9) is a sum of squares. Following Gallant and White (1988), choose $b_n + 1$ real numbers $a_{n1},\ldots,a_{n,b_n+1}$, satisfying $\sum_{j=1}^{b_n+1}a_{nj}^2 = 1$, and consider the $n + b_n$ variables

$$Y_{n1} = a_{n1}X_1,$$
$$Y_{n2} = a_{n1}X_2 + a_{n2}X_1,$$
$$\vdots$$
$$Y_{n,b_n+1} = a_{n1}X_{b_n+1} + \cdots + a_{n,b_n+1}X_1,$$
$$Y_{n,b_n+2} = a_{n1}X_{b_n+2} + \cdots + a_{n,b_n+1}X_2,$$
$$\vdots$$
$$Y_{nn} = a_{n1}X_n + \cdots + a_{n,b_n+1}X_{n-b_n},$$
$$Y_{n,n+1} = a_{n2}X_n + \cdots + a_{n,b_n+1}X_{n-b_n+1},$$
$$\vdots$$
$$Y_{n,n+b_n} = a_{n,b_n+1}X_n.$$

Observe that

$$\sum_{t=1}^{n+b_n}Y_{nt}^2 = \sum_{j=1}^{b_n+1}a_{nj}^2\sum_{t=1}^nX_t^2 + 2\sum_{j=2}^{b_n+1}a_{nj}a_{n,j-1}\sum_{t=2}^nX_tX_{t-1} + 2\sum_{j=3}^{b_n+1}a_{nj}a_{n,j-2}\sum_{t=3}^nX_tX_{t-2} + \cdots + 2a_{n,b_n+1}a_{n1}\sum_{t=b_n+1}^nX_tX_{t-b_n}, \qquad (25.12)$$

which shows that any weights of the form $w_{nk} = \sum_{j=k+1}^{b_n+1}a_{nj}a_{n,j-k}$, $k = 0,\ldots,b_n$, impose non-negativity, and also give $w_{n0} = 1$. A case that fulfils the consistency requirement is $a_{nj} = (b_n + 1)^{-1/2}$, all $j$, which yields $w_{nk} = 1 - k/(b_n + 1)$.

Variance estimators having the general form

$$\sum_{t=1}^n\sum_{s=1}^nw(|t - s|/b_n)X_tX_s$$

are known as kernel estimators. The function $w(x) = 1 - |x|$ for $|x| < 1$, 0 otherwise, which corresponds to $\hat s_{n3}^2$, is the Bartlett kernel. The estimator $\hat s_{n2}^2$, by contrast, uses the truncated kernel, $w(x) = 1_{\{|x| \le 1\}}$. Other possibilities exist; see Andrews (1991) for details.
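The weighted estimator (25.9) and the sum-of-squares representation (25.12) can be checked against each other numerically. The following is a minimal sketch with hypothetical function names; it uses the equal-weight choice $a_{nj} = (b_n+1)^{-1/2}$, so the weights are $w_{nk} = 1 - k/(b_n+1)$, and it includes a small data set for which the truncated-kernel estimator is negative.

```python
import math


def kernel_variance_estimator(x, b, weight):
    """sum_t x_t^2 + 2 * sum_{k=1}^{b} weight(k) * sum_{t=k+1}^{n} x_t x_{t-k}."""
    n = len(x)
    total = sum(v * v for v in x)
    for k in range(1, b + 1):
        cross = sum(x[t] * x[t - k] for t in range(k, n))
        total += 2.0 * weight(k) * cross
    return total


def truncated(b):
    """Truncated kernel: weight 1 on every lag up to b."""
    return lambda k: 1.0


def bartlett(b):
    """Bartlett weights w_k = 1 - k / (b + 1)."""
    return lambda k: 1.0 - k / (b + 1.0)


def sum_of_squares_form(x, b):
    """sum_t Y_t^2 with Y_t = a * (x_t + ... + x_{t-b}), a = (b+1)^(-1/2),
    treating observations outside the sample as absent."""
    a = 1.0 / math.sqrt(b + 1.0)
    n = len(x)
    total = 0.0
    for t in range(n + b):                      # Y_1, ..., Y_{n+b}
        lo, hi = max(0, t - b), min(n - 1, t)   # sample indices entering Y
        total += (a * sum(x[lo:hi + 1])) ** 2
    return total


if __name__ == "__main__":
    x = [1.0, -1.0, 1.0, -1.0]
    print(kernel_variance_estimator(x, 1, truncated(1)))   # negative
    print(kernel_variance_estimator(x, 1, bartlett(1)))    # non-negative
    print(sum_of_squares_form(x, 1))
```

Because the Bartlett-weighted estimator coincides with a sum of squares, it cannot be negative, whereas the truncated version can be.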

One other point to note is that much of the literature on covariance matrix estimation relies on $L_2$-convergence results for the sums and products, and accordingly requires $L_r$-boundedness of the variables for $r \ge 4$. The present results hold under the milder moment conditions sufficient for the CLT, by using a weak law of large numbers based on 19.11. See Hansen (1992b) for a comparable approach to the problem.

25.2 The CLT with Random Norming

Here is a problem not unconnected with the last one. We will discuss it in the context of the m.d. case for simplicity, but it clearly exists more generally.

Consider a m.d. sequence $\{X_t\}$ which instead of (25.2) has the property

$$\frac{\sum_{t=1}^nX_t^2}{s_n^2} \xrightarrow{pr} \eta^2, \qquad (25.13)$$

where $\eta^2$ is a random variable. This might arise in the following manner. Let $X_t = W_tU_t$ where $\{U_t, \mathcal{F}_t\}_{-\infty}^{\infty}$ is a m.d. and $\{W_t\}_{-\infty}^{\infty}$ a sequence of r.v.s which are measurable with respect to the remote $\sigma$-field $\mathcal{R} = \bigcap_{t=-\infty}^{\infty}\mathcal{F}_t$. The implication is that $W_t$ is 'strongly exogenous' (see Engle et al. 1983) with respect to the generation mechanism of $U_t$. Then

$$E(X_t|\mathcal{F}_{t-1}) = W_tE(U_t|\mathcal{F}_{t-1}) = 0 \text{ a.s.}, \qquad (25.14)$$

since $\mathcal{R} \subseteq \mathcal{F}_{t-1}$ for every $t$, hence $X_t$ is a m.d. Provided the $W_t$ are distributed in such a way that $\sum_{t=1}^nW_t^2U_t^2/s_n^2 \xrightarrow{pr} 1$, the analysis can proceed exactly as in §24.2. There is no practical need to draw a distinction between nonstochastic $W_t$ and $\mathcal{R}$-measurable $W_t$. But this need not be the case, as the following example shows.

25.4 Example Anticipating some distributional results from Part VI, let $W_t = \sum_{s=1}^tV_s$ where $\{V_t\}$ is a stationary m.d. sequence with $E(V_s^2) = \tau^2$ for all $s$. Also assume, for simplicity, that $E(U_t^2) = \sigma^2$, all $t$. Then, $E(W_t^2) = t\tau^2$ and $s_n^2 = \tfrac{1}{2}n(n + 1)\tau^2\sigma^2$. If we further assume that $\{V_t\}$ satisfies the conditions of 24.3, it will be shown below (see 27.14) that

$$\frac{1}{\tau^2n^2}\sum_{t=1}^nW_t^2 \xrightarrow{D} \int_0^1B(r)^2dr, \qquad (25.15)$$

where $B(r)$ is a Brownian motion process. Under the distribution conditional upon $\mathcal{R}$, we may treat the sequence $\{W_t\}$ as nonstochastic and apply the weak LLN (by an argument paralleling 25.2) to say that

$$\operatorname*{plim}_{n\to\infty}\frac{\sum_{t=1}^nW_t^2U_t^2}{s_n^2} = 2\operatorname*{plim}_{n\to\infty}\frac{\sum_{t=1}^nW_t^2U_t^2}{n(n+1)\tau^2\sigma^2} = 2\lim_{n\to\infty}\frac{\sum_{t=1}^nW_t^2}{n^2\tau^2} = \eta^2, \qquad (25.16)$$

where $\eta^2$ denotes the limit indicated. Under the joint distribution of $\{U_t\}$ and $\{W_t\}$, $\eta^2$ is a drawing from the limiting distribution specified in (25.15). □

The application of the CLT to $\{X_t\}$ can proceed under the conditional distribution (defined according to 10.30), replacing each expectation by $E(\cdot\,|\mathcal{R})$. Let $\sigma_t^2 = E(U_t^2|\mathcal{R})$, defining an $\mathcal{R}$-measurable random sequence, so that

$$E\Bigl(\sum_{t=1}^nX_t^2\Big|\mathcal{R}\Bigr) = \sum_{t=1}^nW_t^2\sigma_t^2. \qquad (25.17)$$

We can then apply a result for heterogeneously distributed sequences, such as 24.4, letting $c_t = W_t\sigma_t$. Assuming that $E(X_t^2|\mathcal{F}_{t-1}) = W_t^2\sigma_t^2$, and that condition (24.19) is almost surely satisfied, the conditional CLT takes the form

$$\lim_{n\to\infty}E\Bigl[\exp\Bigl\{\frac{\mathrm{i}\lambda\sum_{t=1}^nX_t}{(\sum_{t=1}^nW_t^2\sigma_t^2)^{1/2}}\Bigr\}\Big|\mathcal{R}\Bigr] = e^{-\lambda^2/2} \text{ a.s.} \qquad (25.18)$$

But if we normalize by the unconditional variance, the situation is quite different. $W_t$ must be treated as stochastic and $E(X_t^2|\mathcal{F}_{t-1}) \ne E(X_t^2)$, so the conditions of 24.4 are violated. However, if (25.13) holds with $\eta$ an $\mathcal{R}$-measurable r.v., then according to 22.14(ii) the conditional distribution has the property

$$\frac{1}{s_n}\sum_{t=1}^nX_t\,\Big|\,\mathcal{R} \xrightarrow{D} N(0,\eta^2), \text{ a.s.} \qquad (25.19)$$

(see §10.6 for the relevant theory). This result can also be expressed in the form

$$\lim_{n\to\infty}E\Bigl[\exp\Bigl\{\frac{\mathrm{i}\lambda}{s_n}\sum_{t=1}^nX_t\Bigr\}\Big|\mathcal{R}\Bigr] = e^{-\lambda^2\eta^2/2} \text{ a.s.} \qquad (25.20)$$

Hence, the limiting unconditional distribution is given by

$$\lim_{n\to\infty}E\Bigl[\exp\Bigl\{\frac{\mathrm{i}\lambda}{s_n}\sum_{t=1}^nX_t\Bigr\}\Bigr] = E(e^{-\lambda^2\eta^2/2}). \qquad (25.21)$$

This is a novel central limit result, because we have established that $\sum_{t=1}^nX_t/s_n$ is not asymptotically Gaussian. The right-hand side of (25.21) is the ch.f. of a mixed Gaussian distribution. One may visualize this distribution by noting that drawings can be generated in the following way. First, draw $\eta^2$ from the appropriate distribution, for example the functional of Brownian motion defined in (25.15); then draw a standard Gaussian variate and multiply it by $\eta$. If $X$ is mixed Gaussian with respect to a marginal c.d.f. $G(\eta)$ (say), and $\phi_\eta$ is the Gaussian density with mean 0 and variance $\eta^2$, the moments of the distribution are easily computed, and as well as $E(X) = E(X^3) = 0$ we find

$$E(X^2) = \int_0^\infty\int_{-\infty}^\infty x^2\phi_\eta(x)\,dx\,dG(\eta) = \int_0^\infty\eta^2\,dG(\eta) = E(\eta^2). \qquad (25.22)$$

However, the kurtosis is non-Gaussian, for

$$E(X^4) = \int_0^\infty\int_{-\infty}^\infty x^4\phi_\eta(x)\,dx\,dG(\eta) = 3\int_0^\infty\eta^4\,dG(\eta) = 3E(\eta^4), \qquad (25.23)$$

where the right-hand side is in general different from $3E(\eta^2)^2$ (see 9.7).
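The moment formulas (25.22) and (25.23) are easy to evaluate when $\eta^2$ has a discrete distribution. The sketch below is an illustration under that assumption only (the text's $\eta^2$ is a functional of Brownian motion); the function names and the two-point distribution in the usage example are hypothetical. The second function follows the two-step drawing recipe described in the text.

```python
import math
import random


def mixed_gaussian_moments(eta2_values, probs):
    """Moments of X = eta * Z, with Z ~ N(0,1) independent of eta, when
    eta^2 takes the listed values with the listed probabilities.
    Returns (E X^2, E X^4, E X^4 / (E X^2)^2)."""
    e_eta2 = sum(p * v for v, p in zip(eta2_values, probs))
    e_eta4 = sum(p * v * v for v, p in zip(eta2_values, probs))
    second = e_eta2           # (25.22): E(X^2) = E(eta^2)
    fourth = 3.0 * e_eta4     # (25.23): E(X^4) = 3 E(eta^4)
    return second, fourth, fourth / second ** 2


def draw_mixed_gaussian(rng, eta2_values, probs):
    """Draw eta^2 first, then multiply a standard Gaussian draw by eta."""
    eta2 = rng.choices(eta2_values, weights=probs)[0]
    return math.sqrt(eta2) * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    print(mixed_gaussian_moments([0.5, 1.5], [0.5, 0.5]))
    print(mixed_gaussian_moments([1.0], [1.0]))
    print(draw_mixed_gaussian(random.Random(1), [0.5, 1.5], [0.5, 0.5]))
```

When $\eta^2$ is degenerate the kurtosis ratio equals the Gaussian value 3; any genuine randomness in $\eta^2$ raises it, since $E(\eta^4) > E(\eta^2)^2$.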

25.3 The Multivariate CLT

An array $\{X_{nt}\}$ of $p$-vectors of random variables is said to satisfy the CLT if the joint distribution of $S_n = \sum_{t=1}^nX_{nt}$ converges weakly to the multivariate Gaussian. In its multivariate version, the central limit theorem contributes a new and powerful approximation result. Given a vector of stochastic processes exhibiting arbitrary (contemporaneous) dependence among themselves, we can show that there exist different linear combinations of the processes which are asymptotically independent of one another (uncorrelated Gaussian variables being of course independent). This is a fundamental result in the general theory of asymptotic inference in econometrics.

The main step in the solution to the multivariate problem sometimes goes by the name of the 'Cramér-Wold device'.

25.5 Cramér-Wold theorem A vector random sequence $\{S_n\}_1^\infty$, $S_n \in \mathbb{R}^k$, converges in distribution to a random vector $S$ if and only if $\lambda'S_n \xrightarrow{D} \lambda'S$ for every fixed $k$-vector $\lambda \ne 0$.

Proof For given $\lambda$ the characteristic function of the scalar $\lambda'S_n$ is $E(\exp\{\mathrm{i}\nu\lambda'S_n\}) = \phi_{\lambda'S_n}(\nu)$. By the Lévy continuity theorem (22.17), $\lambda'S_n \xrightarrow{D} \lambda'S$ if and only if $\phi_{\lambda'S_n}(\nu) \to \phi_{\lambda'S}(\nu)$ and $\phi_{\lambda'S}$ is continuous at $\nu = 0$. Since $\lambda$ is arbitrary, we can put $t = \nu\lambda$, and obtain

$$E(\exp\{\mathrm{i}t'S_n\}) \to E(\exp\{\mathrm{i}t'S\}) = \phi_S(t), \qquad (25.24)$$

(say) where by assumption the convergence is pointwise on $\mathbb{R}^k$. By (11.39), the left-hand side of (25.24) is the ch.f. of $S_n$, and the right-hand side is the ch.f. of $S$. The continuity of $\phi_S$ at the origin is ensured by the continuity of $\phi_{\lambda'S}$ at 0 for all $\lambda$, and it follows that $S_n \xrightarrow{D} S$. ∎

Now let $\{X_t\}$ be a sequence of random vectors, and let $\Sigma_n$ be the variance matrix of $\sum_{t=1}^nX_t$. Being symmetric and positive semidefinite by construction, this matrix possesses the factorization

$$\Sigma_n = C_n\Lambda_nC_n' = L_nL_n', \qquad (25.25)$$

where $L_n = C_n\Lambda_n^{1/2}$, $C_n$ and $\Lambda_n$ being respectively the eigenvector matrix (satisfying $C_nC_n' = C_n'C_n = I_p$) and the diagonal, non-negative matrix of eigenvalues. Let $X_{nt} = L_n^-X_t$, where if $\Sigma_n$ has full rank then $L_n^- = \Lambda_n^{-1/2}C_n'$, so that $L_n^-\Sigma_nL_n^{-\prime} = I_p$. However, $\Sigma_n$ need not have full rank for every finite $n$. If it is singular with

$$\Lambda_n = \begin{bmatrix}\Lambda_{1n} & 0\\ 0 & 0\end{bmatrix},$$

let $L_n^- = \begin{bmatrix}\Lambda_{1n}^{-1/2}C_{n1}'\\ 0\end{bmatrix}$ where $C_{n1}$ is the appropriate submatrix of $C_n$. In this case, $L_n^-\Sigma_nL_n^{-\prime}$ has either ones or zeros on the diagonal. We do however require $\Sigma_n$ to be asymptotically of full rank, in the sense that $L_n^-\Sigma_nL_n^{-\prime} \to I_p$. If $S_n = \sum_{t=1}^nX_{nt}$, then for any $p$-vector $\lambda$ with $\lambda'\lambda = 1$, we have $E(\lambda'S_n)^2 \to 1$. If this condition fails, and there exists $\lambda \ne 0$ such that $E(\lambda'S_n)^2 \to 0$, the asymptotic distribution of $S_n$ is said to be singular. In this case, some elements of the limiting vector are linear combinations of the remainder. Their distribution is therefore determined, and nothing is lost by dropping these variables from the analysis.

To obtain the multivariate CLT it is necessary to show that the scalar sequences $\{\lambda'X_{nt}\}$ satisfy the ordinary scalar CLT, for any $\lambda$. If sufficient conditions hold for $\lambda'S_n \xrightarrow{D} N(0,1)$, the Cramér-Wold theorem allows us to say that $S_n \xrightarrow{D} S$, and it remains to determine the distribution of $S$. For any $\lambda$, the ch.f. of $\lambda'S$ is

$$\phi_{\lambda'S}(\nu) = E(\exp\{\mathrm{i}\nu\lambda'S\}) = e^{-\nu^2/2}. \qquad (25.26)$$

But letting $t = \nu\lambda$ be a vector of length $\nu$, it follows from (11.41) that (25.26) is the ch.f. of a standard multi-Gaussian vector. (Recall that $\lambda'\lambda = 1$.) By the inversion theorem we get the required result that $S \sim N(0, I_p)$. We have therefore proved the following theorem.

25.6 Theorem Let $\{X_t\}$ be a stochastic sequence of $p$-vectors and let $\Sigma_n = E\bigl((\sum_{t=1}^nX_t)(\sum_{t=1}^nX_t)'\bigr)$. If $L_n^-\Sigma_nL_n^{-\prime} \to I_p$ and $\sum_{t=1}^n\lambda'L_n^-X_t \xrightarrow{D} N(0,1)$ for every $\lambda$ satisfying $\lambda'\lambda = 1$, then

$$\sum_{t=1}^nL_n^-X_t \xrightarrow{D} N(0, I_p). \qquad (25.27)$$

In this result the elements of $\Sigma_n$ need not have the same orders of magnitude in $n$. The variances can be tending to infinity for some elements of $X_t$, and to zero for others, within the bounds set by the Lindeberg condition. However, in the case when all of the elements of $\Sigma_n$ have the same order of magnitude, say $n^\delta$ for some $\delta > 0$ such that $n^{-\delta}\Sigma_n \to \Sigma$, a finite constant matrix, it is easy to manipulate (25.27) into the form

$$n^{-\delta/2}\sum_{t=1}^nX_t \xrightarrow{D} N(0, \Sigma). \qquad (25.28)$$
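The normalizing matrix $L_n^- = \Lambda_n^{-1/2}C_n'$ of the full-rank case can be written down in closed form for $p = 2$, which makes the property $L_n^-\Sigma_nL_n^{-\prime} = I_p$ easy to verify by hand or by machine. This is a sketch for the $2 \times 2$ positive definite case only, with hypothetical function names; for general $p$ one would normally call a library eigenvalue routine instead.

```python
import math


def whitening_factor_2x2(a, b, c):
    """For positive definite Sigma = [[a, b], [b, c]], return the rows of
    Lambda^(-1/2) C', using the closed-form eigendecomposition."""
    if b == 0.0:
        # Sigma is already diagonal: eigenvectors are the unit vectors.
        return [[1.0 / math.sqrt(a), 0.0], [0.0, 1.0 / math.sqrt(c)]]
    mean = (a + c) / 2.0
    half = math.hypot((a - c) / 2.0, b)
    rows = []
    for lam in (mean + half, mean - half):      # the two eigenvalues
        v = (b, lam - a)                        # an eigenvector for lam
        scale = math.hypot(v[0], v[1]) * math.sqrt(lam)
        rows.append([v[0] / scale, v[1] / scale])
    return rows


def sandwich(l, a, b, c):
    """Return L Sigma L' for a 2x2 matrix L and Sigma = [[a, b], [b, c]]."""
    sigma = [[a, b], [b, c]]
    ls = [[sum(l[i][k] * sigma[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(ls[i][k] * l[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    factor = whitening_factor_2x2(2.0, 0.6, 1.0)
    print(sandwich(factor, 2.0, 0.6, 1.0))   # close to the identity matrix
```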

Techniques for estimating the normalization factors generalize naturally from the scalar case discussed in §25.1, just like the CLT itself. Consider the m.d. case in which $\Sigma_n = \sum_{t=1}^nE(X_tX_t')$, and assume this matrix has rank $p$ asymptotically in the sense defined above. Under the assumptions of 25.2,

$$\frac{\alpha'(\sum_{t=1}^nX_tX_t')\alpha}{\alpha'\Sigma_n\alpha} \xrightarrow{pr} 1 \qquad (25.29)$$

for any $\alpha$ with $\alpha'\alpha = 1$, where the ratio is always well defined on taking $n$ large enough, by assumption. This suggests that the positive semidefinite matrix $\hat\Sigma_n = \sum_{t=1}^nX_tX_t'$ is the natural estimator for $\Sigma_n$.

To be more precise: (25.29) says that $P(|\alpha'\hat\Sigma_n\alpha/\alpha'\Sigma_n\alpha - 1| > \varepsilon)$ can be made as small as desired for arbitrary $\alpha \ne 0$ by taking $n$ large enough, since the normalization to unit length cancels in the ratio. This is therefore true in the particular case $\alpha^* = L_n^{-\prime}\alpha$, and since $\alpha^{*\prime}\Sigma_n\alpha^* = 1$, we are able to conclude that $\alpha'L_n^-\hat\Sigma_nL_n^{-\prime}\alpha \xrightarrow{pr} 1$. We can further deduce from this fact that $L_n^-\hat\Sigma_nL_n^{-\prime} \xrightarrow{pr} I_p$. To show this, note that if a matrix $B$ ($p \times p$) is nonsingular, and $g(\alpha) = \alpha'B\alpha/\alpha'\alpha = 1$ for every $\alpha \ne 0$, $g$ has the gradient vector $g' = 2(B\alpha - \alpha)/\alpha'\alpha$, for any $\alpha$, and the system of equations $B\alpha - \alpha = 0$ has the unique solution $B = I_p$. If $\hat L_n$ is the factorization of $\hat\Sigma_n$, since $\Sigma_n$ is asymptotically of rank $p$ it follows by 18.10(ii) that $\hat L_n^-L_n \xrightarrow{pr} I_p$, and we arrive at the desired conclusion, for comparison with (25.27):

$$\sum_{t=1}^n\hat L_n^-X_t \xrightarrow{D} N(0, I_p). \qquad (25.30)$$

The extension to general dependence is a matter of estimating $\Sigma_n$ by a generalization of the consistent methods discussed in §25.1, either $\hat\Sigma_{n1} = \sum_{i=1}^{r_n}Z_{ni}^*Z_{ni}^{*\prime}$ where $Z_{ni}^* = \sum_{t=(i-1)b_n+1}^{ib_n}X_t$, or, letting weights $\{w_{nk}\}$ represent the Bartlett kernel,

$$\hat\Sigma_{n3} = \sum_{t=1}^nX_tX_t' + \sum_{k=1}^{b_n}w_{nk}\sum_{t=k+1}^n(X_tX_{t-k}' + X_{t-k}X_t'). \qquad (25.31)$$

The latter matrix is assuredly positive semidefinite, since $\alpha'\hat\Sigma_{n3}\alpha \ge 0$ with arbitrary $\alpha$ by application of (25.12) with $\alpha'X_t$ in place of $X_t$.

25.4 Error Estimation

There are a number of celebrated theorems in the classical probability literature on the rates at which the deviations of distributions from their weak limits, and also stochastic sequences from their almost sure limits, decline with $n$. Most are for the independent case, and the extensions to general dependent sequences are not much researched to date. These results will not be treated in detail, but it is useful to know of their existence. For details and proofs the reader is referred to texts such as Chung (1974) and Loève (1977).

If $\{F_n\}$ is a sequence of c.d.f.s and $F_n \Rightarrow \Phi$ (the Gaussian c.d.f.), the Berry-Esséen theorem sets limits on the largest deviation of $F_n$ from $\Phi$. The setting for this result is the integer-moment case of the Liapunov CLT; see (23.37).

25.7 Berry-Esséen theorem Let $\{X_t\}$ be a zero-mean, independent, $L_3$-bounded random sequence, with variances $\{\sigma_t^2\}$, let $s_n^2 = \sum_{t=1}^n\sigma_t^2$, and let $F_n$ be the c.d.f. of $S_n = \sum_{t=1}^nX_t/s_n$. There exists a constant $C > 0$ such that, for all $n$,

$$\sup_x|F_n(x) - \Phi(x)| \le C\sum_{t=1}^nE|X_t|^3/s_n^3. \qquad (25.32)$$

The measure of distance between functions $F_n$ and $\Phi$ appearing on the left-hand side of (25.32) is the uniform metric (see §5.5). As was noted in §23.1, convergence to the Gaussian limit can be very rapid, with favourable choice of $F_n$. The Berry-Esséen bounds represent the 'worst case' scenario, the slowest rate of uniform convergence of the c.d.f.s to be expected over all sampling distributions having third absolute moments. For the uniformly $L_3$-bounded case in which $s_n^2 = O(n)$, inequality (25.32) establishes convergence at the rate $n^{1/2}$.
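The $n^{1/2}$ rate can be seen exactly in a simple case. The sketch below computes the uniform distance between $\Phi$ and the c.d.f. of a standardized sum of i.i.d. centred Bernoulli(1/2) variables, and compares it with the right-hand side of (25.32). The theorem only asserts that some constant $C$ exists; the default value 0.56 used here is an assumption taken from later literature on the non-identically distributed case, and the function names are hypothetical.

```python
import math


def phi(x):
    """Standard Gaussian c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_distance_binomial(n):
    """sup_x |F_n(x) - Phi(x)| for S_n = sum_t (B_t - 1/2) / s_n, with B_t
    i.i.d. Bernoulli(1/2) and s_n^2 = n/4. F_n is a step function, so the
    supremum is attained at a jump point, from the left or from the right."""
    s_n = math.sqrt(n / 4.0)
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        x = (k - n / 2.0) / s_n
        left = cdf                              # value just below the jump
        cdf += math.comb(n, k) / 2 ** n         # value at the jump
        worst = max(worst, abs(left - phi(x)), abs(cdf - phi(x)))
    return worst


def berry_esseen_bound(n, c=0.56):
    """Right-hand side of (25.32) for the same sequence: E|X_t|^3 = 1/8 and
    s_n^3 = (n/4)^(3/2). The constant c is an assumed value."""
    return c * n * 0.125 / (n / 4.0) ** 1.5


if __name__ == "__main__":
    for n in (100, 400):
        print(n, sup_distance_binomial(n), berry_esseen_bound(n))
```

Quadrupling $n$ should roughly halve the distance, in line with the rate stated in the text.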

Another famous set of results on rates of convergence goes under the name of the law of the iterated logarithm (LIL). These results yield error bounds for the strong law of large numbers although they tell us something important about the rate of weak convergence as well. The best known is the following.

25.8 Hartman-Wintner theorem If $\{X_t\}$ is i.i.d. with mean $\mu$ and variance $\sigma^2$, then

$$\limsup_{n\to\infty}\frac{\sum_{t=1}^n(X_t - \mu)}{\sigma(2n\log\log n)^{1/2}} = 1 \text{ a.s.} \qquad (25.33)$$ □

Notice the extraordinary delicacy of this result, being an equality. It is equivalent to the condition that, for any $\varepsilon > 0$, both

$$P\Bigl(\frac{\sum_{t=1}^n(X_t - \mu)}{\sigma(2n\log\log n)^{1/2}} \ge 1 + \varepsilon, \text{ i.o.}\Bigr) = 0, \qquad (25.34)$$

and

$$P\Bigl(\frac{\sum_{t=1}^n(X_t - \mu)}{\sigma(2n\log\log n)^{1/2}} \ge 1 - \varepsilon, \text{ i.o.}\Bigr) = 1. \qquad (25.35)$$

In words, infinitely many of these sequence coordinates will come arbitrarily close to 1, but no more than a finite number will exceed it, almost surely. By symmetry, there is a similar a.s. bound of $-1$ on the liminf of the sequence.

Under these assumptions, $n^{-1/2}\sum_{t=1}^n(X_t - \mu)/\sigma \xrightarrow{D} N(0,1)$ according to the Lindeberg-Lévy theorem, and so is asymptotically supported on the whole real line. On the other hand, $n^{-1}\sum_{t=1}^n(X_t - \mu)/\sigma \to 0$ a.s. by the law of large numbers. It is clear there is a function of $n$, lying between 1 and $n^{1/2}$, representing the 'knife-edge' between the degenerate and non-degenerate asymptotic distributions, being the smallest scaling factor which frustrates a.s. convergence on being applied to the sequence of means. The Hartman-Wintner law tells us that the knife-edge is precisely $n^{1/2}(2\log\log n)^{-1/2}$.

A feel for the precision involved can be grasped by trying some numbers: $(2\log\log 10^{99})^{1/2} = 3.3$! A check with the tabulation of the standard Gaussian probabilities will show that 3.3 is far enough into the tail that the probability of exceeding it is arbitrarily close to zero. What the LIL reveals is that for the scaled partial sums this probability is zero for some $n$ not exceeding $10^{99}$, although not for yet larger $n$. Be careful to note how this is true even if the $X_t$ variables have the whole real line as their support.
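The arithmetic in the preceding paragraph is quick to reproduce. This sketch evaluates the factor $(2\log\log n)^{1/2}$ at $n = 10^{99}$ and the standard Gaussian upper tail probability at 3.3; the function names are hypothetical, and natural logarithms are assumed throughout.

```python
import math


def lil_scale(n):
    """(2 log log n)^(1/2), the extra factor in the LIL normalization."""
    return math.sqrt(2.0 * math.log(math.log(n)))


def gaussian_upper_tail(x):
    """P(Z > x) for a standard Gaussian Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    print(round(lil_scale(10 ** 99), 2))
    print(gaussian_upper_tail(3.3))
```

The factor grows so slowly that even at $n = 10^{99}$ it has only reached about 3.3, a point beyond which the Gaussian tail probability is already below one in a thousand.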

For nonstationary sequences there is the following version of the LIL.

25.9 Theorem (Chung 1974: th. 7.5.1) Let $\{X_t\}$ be independent and $L_3$-bounded with variance sequence $\{\sigma_t^2\}$, and let $s_n^2 = \sum_{t=1}^n\sigma_t^2$. Then (25.33) holds if for $\varepsilon > 0$,

$$\sum_{t=1}^nE|X_t|^3 = O\bigl(s_n^3/(\log s_n)^{1+\varepsilon}\bigr).$$ □

Generalizations to martingale differences also exist; see Stout (1974) and Hall and Heyde (1980) inter alia for further details.

VI THE FUNCTIONAL CENTRAL LIMIT THEOREM


26 Weak Convergence in Metric Spaces

26.1 Probability Measures on a Metric Space

In any topological space $S$, for which open and closed subsets are defined, the Borel field of $S$ is defined as the smallest $\sigma$-field containing the open sets (and hence also the closed sets) of $S$. In this chapter we are concerned with the properties of measurable spaces $(S,\mathcal{S})$, where $S$ is a metric space endowed with a metric $d$, and $\mathcal{S}$ will always be taken to be the Borel field of $S$.

If a probability measure $\mu$ is defined on the elements of $\mathcal{S}$, we obtain a probability space $((S,d),\mathcal{S},\mu)$, and an element $x \in S$ is referred to as a random element. As in the theory of random variables, it is often convenient to specify an underlying probability space $(\Omega,\mathcal{F},P)$ and let $((S,d),\mathcal{S},\mu)$ be a derived space with the property $\mu(A) = P(x^{-1}(A))$ for each $A \in \mathcal{S}$, where

$$x: \Omega \mapsto (S,d)$$

is a measurable mapping. We shall often write $S$ for $(S,d)$ when the choice of metric is understood, but it is important to keep in mind that $d$ matters in this theory, because $\mathcal{S}$ is not invariant to the choice of metric; unless $d_1$ and $d_2$ are equivalent metrics, the open sets of $(S,d_1)$ are not the same as those of $(S,d_2)$.

A property of measure spaces that is sometimes useful to assume is regularity (yet another usage of an overworked word, not to be confused with regularity of sequences etc.): $(S,\mathcal{S},\mu)$ is called a regular measure space (or $\mu$ a regular measure with respect to $(S,\mathcal{S})$) if for each $A \in \mathcal{S}$ and each $\varepsilon > 0$ there exists an open set $O_\varepsilon$ and a closed set $C_\varepsilon$ such that

$$C_\varepsilon \subseteq A \subseteq O_\varepsilon \tag{26.1}$$

and

$$\mu(O_\varepsilon - C_\varepsilon) < \varepsilon. \tag{26.2}$$

Happily, as the following theorem shows, this condition can be relied upon when $S$ is a metric space.

26.1 Theorem On a metric space $((S,d),\mathcal{S})$, every measure is regular.

Proof Call a set $A \in \mathcal{S}$ regular if it satisfies (26.1) and (26.2). The first step is to show that any closed set is regular. Let $A_n = \{x: d(A,x) < 1/n\}$, $n = 1,2,3,\ldots$ denote a family of open sets. (Think of $A$ with a 'halo' of width $1/n$.) When $A$ is closed we may write $A = \bigcap_{n=1}^\infty A_n$, and $A_n \downarrow A$ as $n \to \infty$. By continuity of the measure this means $\mu(A_n - A) \to 0$. For any $\varepsilon > 0$ there therefore exists $N$ such that $\mu(A_N - A) < \varepsilon$. Choosing $O_\varepsilon = A_N$ and $C_\varepsilon = A$ shows that $A$ is regular.
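
The shrinking-halo step can be illustrated numerically. The following is a minimal sketch for one particular case chosen for illustration (not taken from the text): $S = \mathbb{R}$, $\mu$ the standard Gaussian measure, and $A = [0,1]$, so that $A_n = (-1/n,\, 1 + 1/n)$. The helper names are hypothetical.

```python
import math

def std_normal_cdf(x: float) -> float:
    """C.d.f. of the standard Gaussian distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def halo_excess(n: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """mu(A_n - A) for A = [lo, hi] and A_n = (lo - 1/n, hi + 1/n)."""
    left = std_normal_cdf(lo) - std_normal_cdf(lo - 1.0 / n)
    right = std_normal_cdf(hi + 1.0 / n) - std_normal_cdf(hi)
    return left + right

for n in (1, 10, 100, 1000):
    print(n, halo_excess(n))
```

The printed values decrease towards zero, so for any $\varepsilon > 0$ a suitable $N$ can be read off directly.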


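Since $S$ is both open and closed, it is clearly regular. If a set $A$ is regular, so is its complement, since $O_\varepsilon^c$ is closed, $C_\varepsilon^c$ is open, $O_\varepsilon^c \subseteq A^c \subseteq C_\varepsilon^c$ and $C_\varepsilon^c - O_\varepsilon^c = O_\varepsilon - C_\varepsilon$. If we can show that the class of regular sets is also closed under countable unions, we will have shown that every Borel set is regular, which is the required result. Let $A_1, A_2, \ldots$ be regular sets, and define $A = \bigcup_{n=1}^\infty A_n$. Fixing $\varepsilon > 0$, let $O_{n\varepsilon}$ and $C_{n\varepsilon}$ be open and closed sets respectively, satisfying

$$C_{n\varepsilon} \subseteq A_n \subseteq O_{n\varepsilon} \tag{26.3}$$

and

$$\mu(O_{n\varepsilon} - C_{n\varepsilon}) < \varepsilon/2^{n+1}. \tag{26.4}$$

Let $O_\varepsilon = \bigcup_{n=1}^\infty O_{n\varepsilon}$, which is open, and $A \subseteq O_\varepsilon$. Also let $C_\varepsilon = \bigcup_{n=1}^\infty C_{n\varepsilon}$, where the latter set is not necessarily closed, but $C_\varepsilon^k = \bigcup_{n=1}^k C_{n\varepsilon}$, where $k$ is finite, is closed, and $C_\varepsilon^k \subseteq A$; and since $C_\varepsilon^k \uparrow C_\varepsilon$, continuity of the measure implies that $k$ can be chosen large enough that $\mu(C_\varepsilon - C_\varepsilon^k) < \varepsilon/2$. For such a $k$,

$$\mu(O_\varepsilon - C_\varepsilon^k) \le \mu(O_\varepsilon - C_\varepsilon) + \mu(C_\varepsilon - C_\varepsilon^k) \le \sum_{n=1}^\infty \mu(O_{n\varepsilon} - C_{n\varepsilon}) + \mu(C_\varepsilon - C_\varepsilon^k) < \varepsilon. \tag{26.5}$$

It follows that $A$ is regular, and this completes the proof. ∎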

Often the theory of random variables has a straightforward generalization to the case of random elements. Consider the properties of mappings, for example. If $(S,d)$ and $(\mathbb{T},\rho)$ are metric spaces with Borel fields $\mathcal{S}$ and $\mathcal{T}$, and $f: S \mapsto \mathbb{T}$ is a function, there is a natural extension of 3.32(i), as follows.

26.2 Theorem If $f$ is continuous, it is Borel-measurable.

Proof Direct from 5.19 and 3.22, and the fact that $\mathcal{S}$ and $\mathcal{T}$ contain the open sets of $S$ and $\mathbb{T}$ respectively. ∎

Let $((S,d),\mathcal{S})$ and $((\mathbb{T},\rho),\mathcal{T})$ be two measurable spaces, and let $h: S \mapsto \mathbb{T}$ define a measurable mapping, such that $A \in \mathcal{T}$ implies that $h^{-1}(A) \in \mathcal{S}$; then each measure $\mu$ on $S$ has the property that $\mu h^{-1}$, defined by

$$\mu h^{-1}(A) = \mu(h^{-1}(A)), \quad A \in \mathcal{T}, \tag{26.6}$$

is a measure on $((\mathbb{T},\rho),\mathcal{T})$. This is just an application of 3.21, which does not use topological properties of the spaces and deals solely with the set mappings involved.
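
For a measure with finitely many atoms, the construction in (26.6) amounts to relabelling the atoms and pooling the masses that land on the same image point. A minimal sketch, with a made-up measure and the mapping $h(x) = |x|$ chosen purely for illustration:

```python
from collections import defaultdict

def pushforward(mu: dict, h) -> dict:
    """Return the image measure mu h^{-1} of a finitely supported measure mu.

    `mu` maps each atom x to its mass; the result maps each image point h(x)
    to the total mass of its inverse image.
    """
    out = defaultdict(float)
    for x, mass in mu.items():
        out[h(x)] += mass
    return dict(out)

mu = {-2: 0.1, -1: 0.2, 0: 0.3, 1: 0.25, 2: 0.15}
mu_h = pushforward(mu, abs)

# Direct check of (26.6) for the set A = {1, 2}.
A = {1, 2}
lhs = sum(mass for y, mass in mu_h.items() if y in A)
rhs = sum(mass for x, mass in mu.items() if abs(x) in A)
print(mu_h, lhs, rhs)
```

Nothing topological is used here, which matches the remark that only the set mappings matter.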

However, the theory also presents some novel difficulties. A fundamental one concerns measurability. It is not always possible to assign probabilities to the Borel sets of a metric space - not, at least, without violating the axiom of choice.

26.3 Example Consider the space $(D_{[0,1]}, d_U)$, the case of 5.27 with $a = 0$ and $b = 1$. Recall that the random elements $f_\theta$ specified by (5.43) are each at a mutual distance of 1 from one another. Hence, the spheres $B(f_\theta, \tfrac{1}{2})$ are all disjoint, and any union of them is an open set (5.4). This means that the Borel field $\mathcal{D}_{[0,1]}$ on $(D_{[0,1]}, d_U)$ contains all of these sets. Suppose we attempt to construct a probability space on $((D_{[0,1]}, d_U), \mathcal{D}_{[0,1]})$ which assigns a uniform distribution to the $f_\theta$, such that $\mu(\{f_\theta: a < \theta \le b\}) = b - a$ for $0 \le a < b \le 1$. Superficially this appears to be a perfectly reasonable project. The problem is formally identical to that of constructing the uniform distribution on $[0,1]$. But there is one crucial difference: here, sets of $f_\theta$ functions corresponding to every subset of the interval are elements of $\mathcal{D}_{[0,1]}$. We know that there are subsets of $[0,1]$ that are not Lebesgue-measurable unless the axiom of choice is violated; see 3.17. Hence, there is no consistent way of constructing the probability space $((D_{[0,1]}, d_U), \mathcal{D}_{[0,1]}, \mu)$, where $\mu$ assigns the uniform measure to sets of $f_\theta$ elements. This is merely a simple case, but any other scheme for assigning probabilities to these events would founder in a similar way. □

There is no reason why we should not assign probabilities consistently to smaller $\sigma$-fields which exclude such odd cases, and in the case of $(D_{[0,1]}, d_U)$ the so-called projection $\sigma$-field will serve this purpose (see §28.1 below for details). The point is that with spaces like this we have to move beyond the familiar intuitions of the random variable case to avoid contradictions.

The space $(D_{[0,1]}, d_U)$ is of course nonseparable, and nonseparability is the source of the difficulty encountered in the last example. The characteristic of a separable metric space which matters most in the present theory is the following.

26.4 Theorem In a separable metric space, there exists a countable collection $\mathcal{V}$ of open spheres, such that $\sigma(\mathcal{V})$ is the Borel field.

Proof This is direct from 5.6, $\mathcal{V}$ being any collection of spheres $S(x,r)$, where $x$ ranges over a countable dense subset of $S$ and $r$ over the positive rationals. ∎

The possible failure of the extension of a p.m. to $(S,\mathcal{S})$ is avoided when there is a countable set which functions as a determining class for the space. Measurability difficulties on $\mathbb{R}$ were avoided in Chapter 3 by sticking to the Borel sets (which are generated from countable collections of intervals, you may recall) and this dictum extends to other metric spaces so long as they are separable.

Another situation where separability is a useful property is the construction of product spaces. In §3.4 some aspects of measures on product spaces were discussed, but we can now extend the theory in the light of the additional structure contributed by the product topology. Let $(S,\mathcal{S})$ and $(\mathbb{T},\mathcal{T})$ be a pair of measurable topological spaces, with $\mathcal{S}$ and $\mathcal{T}$ the respective Borel fields. If $\mathcal{R}$ denotes the set of open rectangles of $S \times \mathbb{T}$, and $\mathcal{S} \otimes \mathcal{T} = \sigma(\mathcal{R})$, we have the following result.

26.5 Theorem If $S$ and $\mathbb{T}$ are separable spaces, $\mathcal{S} \otimes \mathcal{T}$ is the Borel field of $S \times \mathbb{T}$ with the product topology.

Proof Under the product topology, $\mathcal{R}$ is a base for the open sets (see §6.5). Since $S \times \mathbb{T}$ is separable by 6.16, any open set of $S \times \mathbb{T}$ can be generated as a countable union of $\mathcal{R}$-sets. It follows that any $\sigma$-field containing $\mathcal{R}$ also contains the open sets of $S \times \mathbb{T}$, and in particular, $\mathcal{S} \otimes \mathcal{T}$ contains the Borel field. Since the sets of $\mathcal{R}$ are open, it is also true that any $\sigma$-field containing the open sets of $S \times \mathbb{T}$ also contains $\mathcal{R}$, and it follows likewise that the Borel field contains $\mathcal{S} \otimes \mathcal{T}$. ∎

If either $S$ or $\mathbb{T}$ is nonseparable, the last result does not generally hold. A counter-example is easily exhibited.

26.6 Example Consider the space $(D_{[0,1]} \times D_{[0,1]}, \rho_U)$, where $\rho_U$ is the max metric defined by (6.13) with $d_U$ for each of the component metrics. Let $E$ denote the union of the open balls $B((x_\theta, y_\theta), \tfrac{1}{2})$ over $\theta \in [0,1]$, where $x_\theta$ and $y_\theta$ are functions of the form $f_\theta$ in (5.43). In this metric the sets $B((x_\theta, y_\theta), \tfrac{1}{2})$ are mutually disjoint rectangles, of which $E$ is the uncountable union; if $\mathcal{R}$ denotes the open rectangles of $(D_{[0,1]} \times D_{[0,1]}, \rho_U)$, $E \notin \sigma(\mathcal{R})$, even though $E$ is in the Borel field of $D_{[0,1]} \times D_{[0,1]}$, being an open set. □

The importance of this last result is shown by the following case. Given a probability space $(\Omega,\mathcal{F},P)$, let $x$ and $y$ be random elements of derived probability spaces $((S,d),\mathcal{S},\mu_x)$ and $((S,d),\mathcal{S},\mu_y)$. Implicitly, the pair $(x,y)$ can always be thought of as a random element of a product space of which $\mu_x$ and $\mu_y$ are the marginal measures. Since $x$ and $y$ are points in the same metric space, for given $\omega \in \Omega$ the distance $d(x(\omega),y(\omega))$ is a well-defined non-negative real number. The question of obvious interest is whether $d$ is also a measurable function on $(\Omega,\mathcal{F})$. This we can answer as follows.

26.7 Theorem If $(S,d)$ is a separable space, $d(x,y)$ is a random variable.

Proof The inverse image of a rectangle $A \times B$ under the mapping

$$(x,y): \Omega \mapsto S \times S$$

lies in $\mathcal{F}$, being the intersection of the $\mathcal{F}$-sets $x^{-1}(A)$ and $y^{-1}(B)$. The mapping is therefore $\mathcal{F}/\mathcal{S} \otimes \mathcal{S}$-measurable by 3.22. But under separability, $\mathcal{S} \otimes \mathcal{S}$ is the Borel field of $S \times S$ according to 26.5. Hence $(x,y)(\omega) = (x(\omega),y(\omega))$ is an $\mathcal{F}$/Borel-measurable random element of $S \times S$. If the space $S \times S$ is endowed with the product topology, the function

$$d: S \times S \mapsto \mathbb{R}^+$$

is continuous by construction, and this mapping is also Borel-measurable. The composite mapping

$$d \circ (x,y): \Omega \mapsto \mathbb{R}^+$$

is therefore $\mathcal{F}/\mathcal{B}$-measurable, and the theorem follows. ∎

26.2 Measures and Expectations

As well as taking care to avoid measurability problems, we must learn to do without various analytical tools which proved fundamental in the study of random variables, in particular the c.d.f. and ch.f. as representations of the distribution. These handy constructions are available only for r.v.s. However, if $U_S$ is the set of bounded, uniformly continuous real functions $f: S \mapsto \mathbb{R}$, the expectations

$$E(f) = \int_S f\,d\mu, \quad f \in U_S \tag{26.7}$$

are always well defined. (From now on, the domain of integration will be understood to be $S$ unless otherwise specified.)

The theory makes use of this family of expectations to fingerprint a distribution uniquely, a device that works regardless of the nature of the underlying space. While there is no single all-purpose function that will do this job, like $e^{i\lambda x}$ in the case $X \in \mathbb{R}$, the expectations in (26.7) play a role in this theory analogous to that of the ch.f. in the earlier theory.

As a preliminary, we give here a pair of lemmas which establish the unique representation of a measure on $(S,\mathcal{S})$ in terms of expectations of real functions on $S$. The first establishes the uniqueness of the representation by integrals.

26.8 Lemma If $\mu$ and $\nu$ are measures on $((S,d),\mathcal{S})$ ($\mathcal{S}$ the Borel field), and

$$\int f\,d\mu = \int f\,d\nu, \quad \text{all } f \in U_S, \tag{26.8}$$

then $\mu = \nu$.

Proof We show that $U_S$ contains an element for which (26.8) directly yields the conclusion. Let $B \in \mathcal{S}$ be closed, and define $B_n = \{x: d(x,B) < 1/n\}$. Think of $B_n$ as $B$ with an open halo of width $1/n$. $B_n \downarrow B$ as $n \to \infty$, $B$ and $B_n^c$ are closed and mutually disjoint, and $\inf_{x \in B,\, y \in B_n^c} d(x,y) \ge 1/n$ for each $n$. Let $g_{B_n^c,B} \in U_S$ be a separating function such that $g_{B_n^c,B}(x) = 0$ for $x \in B_n^c$ and 1 for $x \in B$ (see 6.13). Then

$$\mu(B) \le \int g_{B_n^c,B}\,d\mu = \int g_{B_n^c,B}\,d\nu = \int_{B_n} g_{B_n^c,B}\,d\nu \le \nu(B_n), \tag{26.9}$$

where the last inequality is because $g_{B_n^c,B}(x) \le 1$. Letting $n \to \infty$, we have $\mu(B) \le \nu(B)$. But $\mu$ and $\nu$ can be interchanged, so $\mu(B) = \nu(B)$. This holds for all closed sets, which form a determining class for the space, so the result follows. ∎
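
To make the separating function concrete, here is a minimal sketch for $S = \mathbb{R}$ and $B = [0,1]$. It uses the standard quotient $d(x,B_n^c)/\big(d(x,B_n^c) + d(x,B)\big)$; that this is the particular construction intended in 6.13 is an assumption, and the helper names are hypothetical.

```python
def dist_to_interval(x: float, lo: float, hi: float) -> float:
    """d(x, B) for the closed interval B = [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

def separating_function(x: float, n: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Equals 1 on B = [lo, hi] and 0 off the halo B_n = (lo - 1/n, hi + 1/n).

    Assumed form: d(x, B_n^c) / (d(x, B_n^c) + d(x, B)). The denominator is
    never zero because B and B_n^c are at least 1/n apart.
    """
    d_B = dist_to_interval(x, lo, hi)
    # Distance from x to the complement of the open halo (zero outside the halo).
    d_Bnc = max(0.0, min(x - (lo - 1.0 / n), (hi + 1.0 / n) - x))
    return d_Bnc / (d_Bnc + d_B)

for x in (-0.5, -0.125, 0.0, 0.5, 1.125, 1.25):
    print(x, separating_function(x, n=4))
```

With $n = 4$ the function rises linearly from 0 at $x = -0.25$ to 1 at $x = 0$, stays at 1 on $B$, and falls back to 0 at $x = 1.25$.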

Since $U_S \subseteq C_S$, the set of all continuous functions on $S$, this result remains true if we substitute $C_S$ for $U_S$; the point is that $U_S$ is the smallest class of general functions for which it holds, by virtue of the fact that it contains the required separating function for each closed set.

The second result, although intuitively very plausible, is considerably deeper. Given a p.m. $\mu$ on a space $S$, define $\Lambda(f) = \int f\,d\mu$ for $f \in U_S$. We know that $\Lambda$ is a functional on $U_S$ with the following properties:

$$f(x) \ge 0, \text{ all } x \in S \;\Rightarrow\; \Lambda(f) \ge 0 \tag{26.10}$$

$$f(x) = 1, \text{ all } x \in S \;\Rightarrow\; \Lambda(f) = 1 \tag{26.11}$$

$$\Lambda(af_1 + bf_2) = a\Lambda(f_1) + b\Lambda(f_2), \quad f_1, f_2 \in U_S,\; a,b \in \mathbb{R}, \tag{26.12}$$

where (26.11) holds since $\int d\mu = 1$, and (26.12) is the linearity property of integrals. The following lemma states that on compact spaces the implication also runs the other way.

26.9 Lemma Let $S$ be a compact metric space, and let $\Lambda(f): U_S \mapsto \mathbb{R}$ define a functional satisfying (26.10)-(26.12). There exists a unique p.m. $\mu$ on $(S,\mathcal{S})$ satisfying $\int f\,d\mu = \Lambda(f)$, each $f \in U_S$. □

In other words, functionals $\Lambda$ and measures $\mu$ are uniquely paired. At a later stage we use this result to establish the existence of a measure (the limit of a sequence) by exhibiting the corresponding $\Lambda$ functional. We shall not attempt to give a proof of this result here; see Parthasarathy (1967: ch. 2.5) for the details. Note that because $S$ is compact, $U_S$ and $C_S$ coincide here; see 5.21.

26.3 Weak Convergence

Consider $\mathbb{M}$, the set of all probability measures on $((S,d),\mathcal{S})$. As a matter of fact, we can extend our results to cover the set of all finite measures, and there are a couple of cases in the sequel where we shall want to apply the results of this chapter to measures $\mu$ where $\int d\mu \ne 1$. However, the modifications required for the extension are trivial. It is helpful in the proofs to have an agreed normalization, and $\int d\mu = 1$ is as good as any, so let $\mathbb{M}$ be the p.m.s, while keeping the possibility of generalization in mind.

Weak convergence concerns the properties of sequences in $\mathbb{M}$, and it is mathematically convenient to approach this problem by treating $\mathbb{M}$ as a topological space. The natural means of doing this is to define a collection of real-valued functions on $\mathbb{M}$, and adopt the weak topology that they induce. And in view of (26.7), a natural class to consider is the integrals of bounded, continuous real-valued functions with respect to the elements of $\mathbb{M}$.

For a point $\mu \in \mathbb{M}$, define the base sets

$$V_\mu(k,f_1,\ldots,f_k,\varepsilon) = \left\{\nu: \nu \in \mathbb{M},\; \left|\int f_i\,d\nu - \int f_i\,d\mu\right| < \varepsilon,\; i = 1,\ldots,k\right\}, \tag{26.13}$$

where $f_i \in U_S$ for each $i$, and $\varepsilon > 0$. By ranging over all the possible $f_1,\ldots,f_k$ and $\varepsilon$, for each $k \in \mathbb{N}$, (26.13) defines a collection of open neighbourhoods of $\mu$. The base collection $V_\mu(k,f_1,\ldots,f_k,\varepsilon)$, $\mu \in \mathbb{M}$, defines the weak topology on $\mathbb{M}$.

The idea is that two measures are close to one another when the expectations of various elements of $U_S$ are close to one another. The more functions this applies to, and the closer they are, the closer are the measures. This is not the consequence of some more fundamental notion of closeness, but is the defining property itself. This simple yet remarkable application illustrates the power of the topological ideas developed in Chapter 6. The weak topology is the basic trick which allows distributions on general metric spaces to be handled by a single theory.
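
Membership of a base set of the form (26.13) is easy to test when the measures have finitely many atoms. The following is a minimal sketch on $S = \mathbb{R}$; the three test functions (all bounded and uniformly continuous) and the example measures are arbitrary choices made for illustration.

```python
import math

def integral(f, measure: dict) -> float:
    """Integral of f with respect to a finitely supported measure {atom: mass}."""
    return sum(f(x) * mass for x, mass in measure.items())

def in_weak_neighbourhood(nu: dict, mu: dict, functions, eps: float) -> bool:
    """True if nu lies in the base set V_mu(k, f_1, ..., f_k, eps)."""
    return all(abs(integral(f, nu) - integral(f, mu)) < eps for f in functions)

test_functions = [math.sin, math.cos, math.tanh]
mu = {0.0: 1.0}        # unit mass at 0
nu_near = {0.01: 1.0}  # unit mass at 0.01
nu_far = {1.0: 1.0}    # unit mass at 1

print(in_weak_neighbourhood(nu_near, mu, test_functions, eps=0.05))
print(in_weak_neighbourhood(nu_far, mu, test_functions, eps=0.05))
```

Adding more functions to `test_functions`, or shrinking `eps`, gives a smaller neighbourhood, which is the sense in which the measures become closer.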

Given a concept of closeness, we have immediately a companion concept of convergence. A sequence of measures $\{\mu_n, n \in \mathbb{N}\}$ is said to converge in the weak topology, or converge weakly, to a limit $\mu$, written $\mu_n \Rightarrow \mu$, if, for every neighbourhood $V_\mu$, $\exists\, N$ such that $\mu_n \in V_\mu$ for all $n \ge N$. If $x_n$ is a random element from a probability space $(S,\mathcal{S},\mu_n)$, and $\mu_n \Rightarrow \mu$, we shall say that $x_n$ converges in distribution to $x$ and write $x_n \xrightarrow{D} x$, where $x$ is a random element from $(S,\mathcal{S},\mu)$. Essentially, the same caveats noted in §22.1 apply in the use of this terminology.

The following theorem shows that there are several ways to characterize weak convergence.

26.10 Theorem The following conditions are equivalent to one another:
(a) $\mu_n \Rightarrow \mu$.
(b) $\int f\,d\mu_n \to \int f\,d\mu$ for every $f \in U_S$.
(c) $\limsup_n \mu_n(C) \le \mu(C)$ for every closed set $C \in \mathcal{S}$.
(d) $\liminf_n \mu_n(B) \ge \mu(B)$ for every open set $B \in \mathcal{S}$.
(e) $\lim_n \mu_n(A) = \mu(A)$ for every $A \in \mathcal{S}$ for which $\mu(\partial A) = 0$. □
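
The inequalities in (c) and (d) can be strict, and the boundary condition in (e) cannot be dropped. A minimal sketch with an example chosen for illustration: $\mu_n$ is the unit mass at $1/n$ on $\mathbb{R}$ and $\mu$ is the unit mass at 0, so that $\mu_n \Rightarrow \mu$. Sets are represented by their indicator functions.

```python
def mu_n(n: int, indicator) -> float:
    """Measure of a set under the unit mass at 1/n."""
    return 1.0 if indicator(1.0 / n) else 0.0

def mu(indicator) -> float:
    """Measure of a set under the unit mass at 0."""
    return 1.0 if indicator(0.0) else 0.0

closed_C = lambda x: x == 0.0             # C = {0}: closed, mu(C) = 1
open_B = lambda x: 0.0 < x < 2.0          # B = (0, 2): open, mu(B) = 0
continuity_A = lambda x: -1.0 < x < 1.0   # boundary {-1, 1} has mu-measure 0

ns = range(2, 1001)
print(max(mu_n(n, closed_C) for n in ns), mu(closed_C))          # (c): 0 <= 1
print(min(mu_n(n, open_B) for n in ns), mu(open_B))              # (d): 1 >= 0
print({mu_n(n, continuity_A) for n in ns}, mu(continuity_A))     # (e): equality
```

The sets $C$ and $B$ both have the atom of $\mu$ on their boundary, which is why their measures fail to converge to the limit values.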

The equivalence of (a) and (b), and of (a) and (e), were proved for the case of measures on the line as 22.8 and 22.1 respectively; in that case weak convergence was identified with the convergence of the sequence of c.d.f.s, but this characterization has no counterpart here. A noteworthy consequence of the theorem is the fact that the sets (26.13) are not the only way to generate the topology of weak convergence. The alternative corresponding to part (e) of the theorem, for example, is the system of neighbourhoods,

$$V'_\mu(k,A_1,\ldots,A_k,\varepsilon) = \{\nu: \nu \in \mathbb{M},\; |\nu(A_i) - \mu(A_i)| < \varepsilon,\; i = 1,\ldots,k\}, \tag{26.14}$$

where $A_i \in \mathcal{S}$, $i = 1,\ldots,k$ and $\mu(\partial A_i) = 0$.

Proof of 26.10 This theorem is proved by showing the circular set of implications, (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (c),(d) $\Rightarrow$ (e) $\Rightarrow$ (a). The first is by definition. To show that (b) $\Rightarrow$ (c), we can use the device of 26.8; let $B$ be any closed set in $\mathcal{S}$, and put $B_m = \{x: d(x,B) < 1/m\}$, so that $B$ and $B_m^c$ are closed and $\inf_{x \in B,\, y \in B_m^c} d(x,y) \ge 1/m$. Letting $g_{B_m^c,B} \in U_S$ be the separating function defined above (26.9), we have

$$\limsup_{n\to\infty} \mu_n(B) \le \limsup_{n\to\infty} \int g_{B_m^c,B}\,d\mu_n = \int g_{B_m^c,B}\,d\mu = \int_{B_m} g_{B_m^c,B}\,d\mu \le \mu(B_m), \tag{26.15}$$

where the first equality is by (b). (c) now follows on letting $m \to \infty$.

(c) $\Rightarrow$ (d) is immediate since every closed set is the complement of an open set relative to $S$, and $\mu(S) = 1$.

To show (c) and (d) $\Rightarrow$ (e): for any $A \in \mathcal{S}$, $A^o \subseteq A \subseteq \bar{A}$, where $A^o$ is open and $\bar{A}$ is closed, and $\partial A = \bar{A} - A^o$. When $\mu(\partial A) = 0$, from (c),

$$\limsup_{n\to\infty} \mu_n(A) \le \limsup_{n\to\infty} \mu_n(\bar{A}) \le \mu(\bar{A}) = \mu(A), \tag{26.16}$$

and from (d),

$$\liminf_{n\to\infty} \mu_n(A) \ge \liminf_{n\to\infty} \mu_n(A^o) \ge \mu(A^o) = \mu(A), \tag{26.17}$$

hence $\lim_n \mu_n(A) = \mu(A)$.

The one relatively tricky step is to show (e) $\Rightarrow$ (a). Let $f \in U_S$, and define (what is easily verified to be) a measure, $\mu^f$, on the real line $(\mathbb{R},\mathcal{B})$ by

$$\mu^f(B) = \mu(\{x: f(x) \in B\}), \quad B \in \mathcal{B}. \tag{26.18}$$

$f$ is bounded, so there exists an interval $(a,b)$ such that $a < f(x) < b$, all $x \in S$. Recall that a distribution on $(\mathbb{R},\mathcal{B})$ has at most a countable number of atoms. Also, a finite interval can be divided into a finite collection of disjoint subintervals of width not exceeding $\varepsilon$, for any $\varepsilon > 0$. Therefore it is possible to choose $m$ points $t_j$, with $a = t_0 < t_1 < \ldots < t_m = b$, such that $t_j - t_{j-1} < \varepsilon$, and $\mu^f(\{t_j\}) = 0$, for each $j$. Use these to construct a simple r.v.

$$g_m(x) = \sum_{j=1}^m t_{j-1} 1_{A_j}(x), \tag{26.19}$$

where $A_j = \{x: t_{j-1} \le f(x) < t_j\}$, and note that $\sup_x |f(x) - g_m(x)| < \varepsilon$. Thus,

$$\left|\int f\,d\mu_n - \int f\,d\mu\right| \le \int |f - g_m|\,d\mu_n + \int |f - g_m|\,d\mu + \left|\int g_m\,d\mu_n - \int g_m\,d\mu\right| \le 2\varepsilon + \sum_{j=1}^m |t_{j-1}|\,|\mu_n(A_j) - \mu(A_j)|. \tag{26.20}$$

Since $\mu(\partial A_j) = 0$ by the choice of $t_j$, so that $\lim_n \mu_n(A_j) = \mu(A_j)$ for each $j$ by (e),

$$\limsup_{n\to\infty} \left|\int f\,d\mu_n - \int f\,d\mu\right| \le 2\varepsilon. \tag{26.21}$$

Since $\varepsilon$ can be chosen arbitrarily small, (a) follows and the proof is complete. ∎
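
The simple-function approximation behind (26.19) can be sketched as follows. This illustration uses equally spaced cut points and so ignores the extra requirement that the $t_j$ avoid the atoms of $\mu^f$; the function $f = \tanh$, the interval $(a,b) = (-1,1)$ and the helper name are assumptions made for the example.

```python
import math

def simple_approximation(f, a: float, b: float, eps: float):
    """Return g with g(x) = t_{j-1} whenever t_{j-1} <= f(x) < t_j.

    The cut points t_0 = a < t_1 < ... < t_m = b are equally spaced with
    spacing strictly less than eps (atoms of mu^f are NOT avoided here).
    """
    m = math.floor((b - a) / eps) + 1   # enough cells to make each narrower than eps
    width = (b - a) / m
    cuts = [a + j * width for j in range(m + 1)]

    def g(x: float) -> float:
        y = f(x)
        j = min(int((y - a) / width), m - 1)  # index of the cell containing f(x)
        return cuts[j]

    return g

eps = 0.1
g = simple_approximation(math.tanh, -1.0, 1.0, eps)
grid = [i / 100.0 for i in range(-300, 301)]
print(max(abs(math.tanh(x) - g(x)) for x in grid))
```

The printed maximum error is below `eps`, in line with $\sup_x |f(x) - g_m(x)| < \varepsilon$.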

A convergence-determining class for $(S,d)$ is a class of sets $\mathcal{U} \subseteq \mathcal{S}$ which satisfies the following condition: if $\mu_n(A) \to \mu(A)$ for every $A \in \mathcal{U}$ with $\mu(\partial A) = 0$, then $\mu_n \Rightarrow \mu$. This notion may be helpful for establishing weak convergence in cases where the conditions of 26.10 are difficult to show directly. The following theorem is just such an example.

26.11 Theorem If $\mathcal{U}$ is a class of sets which is closed under finite intersections, and such that every open set is a finite or countable union of $\mathcal{U}$-sets, then $\mathcal{U}$ is convergence-determining.

Proof We first show that the measures $\mu_n$ converge for a finite union of $\mathcal{U}$-sets $A_1,\ldots,A_m$. Applying the inclusion-exclusion formula (3.4),

$$\mu_n\!\left(\bigcup_{j=1}^m A_j\right) = \sum_{k=1}^{2^m-1} \pm\,\mu_n(C_k), \tag{26.22}$$

where the sets $C_k$ consist of the $A_j$ and all their mutual intersections and hence are in $\mathcal{U}$ whenever the $A_j$ are, and '$\pm$' indicates that the sign of the term is given in accordance with (3.4). By hypothesis, therefore,

$$\mu_n\!\left(\bigcup_{j=1}^m A_j\right) \to \mu\!\left(\bigcup_{j=1}^m A_j\right). \tag{26.23}$$

To extend this result to a countable union $B = \bigcup_{j=1}^\infty A_j$, note that continuity of $\mu$ implies $\mu(\bigcup_{j=1}^m A_j) \uparrow \mu(B)$ as $m \to \infty$, so for any $\varepsilon > 0$ a finite $m$ may be chosen large enough that $\mu(B) - \mu(\bigcup_{j=1}^m A_j) < \varepsilon$. Then

$$\liminf_{n\to\infty} \mu_n(B) \ge \liminf_{n\to\infty} \mu_n\!\left(\bigcup_{j=1}^m A_j\right) = \mu\!\left(\bigcup_{j=1}^m A_j\right) > \mu(B) - \varepsilon. \tag{26.24}$$

Since $\varepsilon$ is arbitrary and (26.24) holds for any open $B \in \mathcal{S}$ by hypothesis on $\mathcal{U}$, condition (d) of 26.10 is satisfied. ∎
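
The inclusion-exclusion step in (26.22) expresses the measure of a union through the measures of the $2^m - 1$ non-empty intersections. A minimal sketch for a measure on a finite set, with made-up masses and sets:

```python
from itertools import combinations

def measure(mu: dict, subset) -> float:
    """Measure of a finite set under the atomic measure {atom: mass}."""
    return sum(mu[x] for x in subset)

def union_by_inclusion_exclusion(mu: dict, sets) -> float:
    """Measure of the union of `sets`, summing over all non-empty intersections."""
    total = 0.0
    for r in range(1, len(sets) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0   # odd-order terms enter with a plus
        for group in combinations(sets, r):
            total += sign * measure(mu, frozenset.intersection(*group))
    return total

mu = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
sets = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]

by_formula = union_by_inclusion_exclusion(mu, sets)
direct = measure(mu, frozenset.union(*sets))
print(by_formula, direct)
```

Because every term on the right is the measure of a finite intersection, convergence of $\mu_n$ on an intersection-closed class carries over to finite unions term by term.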

A convergence-determining class must also be a determining class for the space (see §3.2). But caution is necessary since the converse does not hold, as the following counter-example given by Billingsley (1968) shows.

26.12 Example Consider the family of p.m.s $\{\mu_n\}$ on the half-open unit interval $[0,1)$ with $\mu_n$ assigning unit measure to the singleton set $\{1 - 1/n\}$. That is, $\mu_n(\{1 - 1/n\}) = 1$. Evidently, $\{\mu_n\}$ does not have a weak limit. The collection $\mathcal{C}$ of half-open intervals $[a,b)$ for $0 < a < b < 1$ generates the Borel field of $[0,1)$, and so is a determining class. But $\mu_n([a,b)) \to 0$ for every fixed $a > 0$ and $b < 1$, and the p.m. $\mu$ for which $\mu(\{0\}) = 1$ has the property that $\mu([a,b)) = 0$ for all $a > 0$. It is therefore valid to write

$$\mu_n(A) \to \mu(A), \quad \text{all } A \in \mathcal{C}, \tag{26.25}$$

even though $\mu_n \not\Rightarrow \mu$ in this case, so $\mathcal{C}$ is not convergence-determining. □
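
The counter-example can be checked directly. In the sketch below the interval $[0.2, 0.9)$ and the test function $f(x) = x$ (bounded and uniformly continuous on $[0,1)$) are illustrative choices: the interval measures converge to zero, but the integrals of $f$ do not converge to $\int f\,d\mu = 0$.

```python
def mu_n_interval(n: int, a: float, b: float) -> float:
    """mu_n([a, b)) where mu_n is the unit mass at 1 - 1/n."""
    return 1.0 if a <= 1.0 - 1.0 / n < b else 0.0

def integral_of_identity(n: int) -> float:
    """Integral of f(x) = x under mu_n, which is just f(1 - 1/n)."""
    return 1.0 - 1.0 / n

a, b = 0.2, 0.9
print([mu_n_interval(n, a, b) for n in (2, 5, 20, 100, 1000)])
print([integral_of_identity(n) for n in (2, 5, 20, 100, 1000)])
```

The first list is eventually all zeros, agreeing with (26.25), while the second tends to 1, so condition (b) of 26.10 fails for the candidate limit $\mu$.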

The last topic we need to consider in this section is the preservation of weak convergence under mappings from one metric space to another. Since $\mu_n \Rightarrow \mu$ means $\int f\,d\mu_n \to \int f\,d\mu$ for any $f \in U_S$, it is clear, since $f \circ h \in U_S$ when $h$ is continuous, that $\int f(h(x))\,d\mu_n(x) \to \int f(h(x))\,d\mu(x)$. Writing $y$ for $h(x)$, we have the result

$$\int f(y)\,d\mu_n h^{-1}(y) \to \int f(y)\,d\mu h^{-1}(y). \tag{26.26}$$

So much is direct, and relatively trivial. But what we can also show, and is often much more useful, is that mappings that are 'almost' continuous have the same property. This is the continuous mapping theorem proper, the desired generalization of 22.11.

26.13 Continuous mapping theorem Let $h: S \mapsto \mathbb{T}$ be a measurable function, and let $D_h \subseteq S$ be the set of discontinuity points of $h$. If $\mu_n \Rightarrow \mu$ and $\mu(D_h) = 0$, then $\mu_n h^{-1} \Rightarrow \mu h^{-1}$.

Proof Let $C$ be a closed subset of $\mathbb{T}$. Recalling that $(A)^-$ denotes the closure of $A$,

$$\begin{aligned}
\limsup_{n\to\infty} \mu_n h^{-1}(C) &= \limsup_{n\to\infty} \mu_n(h^{-1}(C)) \le \limsup_{n\to\infty} \mu_n((h^{-1}(C))^-) \\
&\le \mu((h^{-1}(C))^-) \le \mu(h^{-1}(C) \cup D_h) \\
&\le \mu(h^{-1}(C)) + \mu(D_h) = \mu(h^{-1}(C)) = \mu h^{-1}(C),
\end{aligned} \tag{26.27}$$

noting for the third inequality that $(h^{-1}(C))^- \subseteq h^{-1}(C) \cup D_h$; i.e., a closure point of $h^{-1}(C)$ is either in $h^{-1}(C)$ or is not a continuity point of $h$. The second inequality is by 26.10(c), and the conclusion follows similarly. ∎
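
The role of the condition $\mu(D_h) = 0$ can be seen in a two-case sketch. The step function $h$ and the point-mass sequences are illustrative choices, not taken from the text; the image of a unit mass at $x$ under $h$ is the unit mass at $h(x)$.

```python
def h(x: float) -> float:
    """A step function whose only discontinuity point is 0, so D_h = {0}."""
    return 1.0 if x > 0.0 else 0.0

def image_atom(location: float) -> float:
    """Location of the atom of (unit mass at `location`) h^{-1}."""
    return h(location)

# Case 1: mu_n = unit mass at 1/n, mu = unit mass at 0, so mu(D_h) = 1.
images_bad = {image_atom(1.0 / n) for n in range(1, 1001)}
limit_bad = image_atom(0.0)

# Case 2: mu_n = unit mass at 0.5 + 1/n, mu = unit mass at 0.5, so mu(D_h) = 0.
images_good = {image_atom(0.5 + 1.0 / n) for n in range(1, 1001)}
limit_good = image_atom(0.5)

print(images_bad, limit_bad)    # image measures sit at 1, limit image sits at 0
print(images_good, limit_good)  # image measures and limit image all sit at 1
```

In the first case the image measures are all the unit mass at 1 while $\mu h^{-1}$ is the unit mass at 0, so weak convergence is lost; in the second case the theorem applies and the images agree.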

26.4 Metrizing the Space of Measures

We can now outline the strategy for determining the weak limit of a sequence of measures $\{\mu_n\}$ on $(S,\mathcal{S})$. The problem falls into two parts. One of these is to determine the limits of the sequences $\{\mu_n(A)\}$ for each $A \in \mathcal{C}$, where $\mathcal{C}$ is a determining class for the space. This part of the programme is specific to the particular space under consideration. The other part, which is quite general, is to verify conditions under which the sequence of measures as a whole has a weak limit. Without this reassurance, the convergence of measures of elements of $\mathcal{C}$ is not generally sufficient to ensure that the extensions to $\mathcal{S}$ also converge. It is this second aspect of the problem that we focus on here.

It is sufficient if every sequence of measures on the space is shown to have a cluster point. If a subsequence converges to a limit, this must agree with the unique ordinary limit we have (by assumption) established for the determining class. Our goal is achieved by finding conditions under which the relevant topological space of measures is sequentially compact (see §6.2). This is similar to what Billingsley (1968) calls 'relative' compactness, and the required results can be derived in his framework. However, we shall follow Prokhorov (1956) and Parthasarathy (1967) in making $\mathbb{M}$ a metric space which will under appropriate circumstances be compact. The following theorem shows that this project is feasible; the basic idea is an application of the embedding theorem (6.20/6.22).

26.14 Theorem (Parthasarathy 1967: th. II.6.2) If and only if $(S,d)$ is separable, $\mathbb{M}$ can be metrized as a separable space and embedded in $[0,1]^\infty$.

Proof Assume $(S,d)$ is separable. The first task is to show that $U_S$ is also separable. According to 6.22, $S$ can be metrized as a totally bounded space $(S,d')$ where $d'$ is equivalent to $d$. Let $\bar{S}$ denote the completion of $S$ under $d'$ (including the limits of all Cauchy sequences on $S$) and then $\bar{S}$ is a compact space (5.12). The space of continuous functions $C_{\bar{S}}$ is accordingly separable under the uniform metric (5.26(ii)).

Now, every continuous function on a compact set is also uniformly continuous (5.21), so that $U_{\bar{S}} = C_{\bar{S}}$. Moreover, the spaces $U_{\bar{S}}$ and $U_S$ are isometric (see §5.5) and if the former is separable so is the latter.

Let $\{g_m, m \in \mathbb{N}\}$ be a dense subset of $U_S$, and define the mapping $T: \mathbb{M} \mapsto \mathbb{R}^\infty$ by

$$T(\mu) = \left(\int g_1\,d\mu, \int g_2\,d\mu, \ldots\right). \tag{26.28}$$

The object is to show that $T$ embeds $\mathbb{M}$ in $\mathbb{R}^\infty$. Suppose $T(\mu) = T(\nu)$, so that $\int g_m\,d\mu = \int g_m\,d\nu$ for all $m$. Since $\{g_m\}$ is dense in $U_S$, $f \in U_S$ implies that

$$\left|\int f\,d\mu - \int g_m\,d\mu\right| \le \int |f - g_m|\,d\mu \le d_U(f,g_m) < \varepsilon \tag{26.29}$$

for some $m$, and every $\varepsilon > 0$. (The second inequality is because $\int d\mu = 1$, note.) The same inequalities hold for $\nu$, and hence we may say that $\int f\,d\mu = \int f\,d\nu$ for all $f \in U_S$. It follows by 26.8 that $\mu = \nu$, so $T$ is 1-1.

Continuity of $T$ follows from the equivalence of (a) and (b) in 26.10. To show that $T^{-1}$ is continuous, let $\{\mu_n\}$ be a sequence of measures and assume $T(\mu_n) \to T(\mu)$. For $f \in U_S$ and any $m \ge 1$,

$$\left|\int f\,d\mu_n - \int f\,d\mu\right| = \left|\int (f - g_m)\,d\mu_n + \int (g_m - f)\,d\mu + \int g_m\,d\mu_n - \int g_m\,d\mu\right| \le 2d_U(f,g_m) + \left|\int g_m\,d\mu_n - \int g_m\,d\mu\right|. \tag{26.30}$$

Since the second term of the majorant side converges to zero by assumption,

$$\limsup_{n\to\infty} \left|\int f\,d\mu_n - \int f\,d\mu\right| \le 2d_U(f,g_m) < 2\varepsilon \tag{26.31}$$

for some $m$, and $\varepsilon > 0$, by the right-hand inequality of (26.29). Hence $\lim_n |\int f\,d\mu_n - \int f\,d\mu| = 0$, and $\mu_n \Rightarrow \mu$ by 26.10(b).

We have therefore shown that $\mathbb{M}$ is homeomorphic with the set $T(\mathbb{M}) \subseteq \mathbb{R}^\infty$, and $\mathbb{R}^\infty$ is homeomorphic to $[0,1]^\infty$ as noted in 5.22. The distance $d_\infty$ between the images of points of $\mathbb{M}$ under $T$ defines a metric on $\mathbb{M}$ which induces the weak topology. The space $T(\mathbb{M})$ with the product topology is separable (see 6.16), so applying 6.9(i) to $T^{-1}$ yields the result that $\mathbb{M}$ is separable. This completes the sufficiency part of the proof.

The necessity part requires a lemma, which will be needed again later on. Let $p_x \in \mathbb{M}$ be the degenerate p.m. with unit mass at $x$, that is, $p_x(\{x\}) = 1$ and $p_x(S - \{x\}) = 0$, and so let $D = \{p_x: x \in S\} \subseteq \mathbb{M}$.

26.15 Lemma The topological spaces $S$ and $D$ are homeomorphic.

Proof The mapping $p: S \mapsto D$ taking points $x \in S$ to points $p_x \in D$ is clearly 1-1, onto. For $f \in C_S$, $\int f\,dp_x = f(x)$, and $x_n \to x$ implies $f(x_n) \to f(x)$ and hence $p_{x_n} \Rightarrow p_x$ by 26.10, establishing continuity of $p$. Conversely, suppose $x_n \not\to x$. There is then an open set $A$ containing $x$, such that for every $N \in \mathbb{N}$, $x_n \in S - A$ for some $n \ge N$. Let $f$ be a separating function such that $f(x) = 0$, $f(y) = 1$ for $y \in S - A$, and $0 \le f \le 1$. Then $\int f\,dp_{x_n} = 1$ for each such $n$ and $\int f\,dp_x = 0$, so $p_{x_n} \not\Rightarrow p_x$. This establishes continuity of $p^{-1}$, and $p$ is a homeomorphism, as required. ∎

Proof of 26.14, continued Now suppose $\mathbb{M}$ is a separable metric space. It can be embedded in a subset of $[0,1]^\infty$, and the subsets of $\mathbb{M}$ are homeomorphic to their images in $[0,1]^\infty$ under the embedding, which are separable sets, and hence are themselves separable (again, by 6.16 and 6.9(i)). Since $D \subseteq \mathbb{M}$, $D$ is separable and hence $S$ must be separable since it is homeomorphic to $D$ by 26.15. This proves necessity. ∎

The last theorem showed that $\mathbb{M}$ is metrizable, but did not exhibit a specific metric on $\mathbb{M}$. Note that different collections of functions $\{g_m\}$ yield different metrics, given how $d_\infty$ is defined. Another approach to the problem is to construct such a metric directly, and one such was proposed by Prokhorov (1956). For a set $A \in \mathcal{S}$, define the open set $A^\delta = \{x: d(x,A) < \delta\}$, that is, '$A$ with a $\delta$-halo'. The Prokhorov distance between measures $\mu, \nu \in \mathbb{M}$ is

$$L(\mu,\nu) = \inf\{\delta > 0: \mu(A^\delta) + \delta \ge \nu(A), \text{ all } A \in \mathcal{S}\}. \tag{26.32}$$

Since $\mathcal{S}$ contains complements and $\mu(S) = \nu(S) = 1$, it must be the case, unless $\mu = \nu$, that $\mu(A) \ge \nu(A)$ for some sets $A \in \mathcal{S}$, and $\mu(A) < \nu(A)$ for others. The idea of the Prokhorov distance is to focus on the latter cases, and see how much has to be added to both the sets and their measures, to reverse all the inequalities. When the measures are close this amount should be small, but you might like to convince yourself that both the adjustments are necessary to get the desired properties. As we show below, $L$ is a metric, and hence is symmetric in $\mu$ and $\nu$. The properties are most easily appreciated in the case of measures on the real line, in which case the metric has the representation in terms of the c.d.f.s,

$$L^*(F_1,F_2) = \inf\{\delta > 0: F_2(x - \delta) - \delta \le F_1(x) \le F_2(x + \delta) + \delta,\; \forall x \in \mathbb{R}\}, \tag{26.33}$$

for c.d.f.s $F_1$ and $F_2$. This is also known as Lévy's metric.
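
Formula (26.33) lends itself to a numerical approximation: for a trial $\delta$ the two-sided condition is checked on a grid of $x$ values, and $\delta$ is located by bisection, since the condition, once satisfied, remains satisfied for all larger $\delta$. This is a minimal sketch only; the grid, the tolerance and the step-function c.d.f.s are assumptions for illustration, and the answer is accurate only up to the grid spacing.

```python
def step_cdf(jump_at: float):
    """C.d.f. of the unit mass at `jump_at`."""
    return lambda x: 1.0 if x >= jump_at else 0.0

def levy_condition(F1, F2, delta: float, grid) -> bool:
    """Check F2(x - d) - d <= F1(x) <= F2(x + d) + d at every grid point."""
    return all(F2(x - delta) - delta <= F1(x) <= F2(x + delta) + delta for x in grid)

def levy_distance(F1, F2, grid, tol: float = 1e-6) -> float:
    """Approximate the infimum in (26.33) by bisection over delta in (0, 1]."""
    lo, hi = 0.0, 1.0   # delta = 1 always satisfies the condition for c.d.f.s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if levy_condition(F1, F2, mid, grid):
            hi = mid
        else:
            lo = mid
    return hi

grid = [i / 1000.0 for i in range(-2000, 2001)]
F1, F2 = step_cdf(0.0), step_cdf(0.3)
print(levy_distance(F1, F2, grid), levy_distance(F2, F1, grid))
```

For two unit masses 0.3 apart both orderings give a value close to 0.3, which is consistent with the symmetry established in 26.16.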

Fig. 26.1

Fig. 26.1 sketches this case, and $F_1$ has been given a discontinuity, so that the form of the bounding functions $F_2(x + \delta) + \delta$ and $F_2(x - \delta) - \delta$ can be easily discerned. Any c.d.f. lying wholly within the region defined by these extremes, such as the one shown, is within $\delta$ of $F_2$ in the $L^*$ metric.

26.16 Theorem L is a metric.

Proof $L(\mu,\nu) = L(\nu,\mu)$ is not obvious from the definition; but for any $\delta > 0$ consider $B = (A^\delta)^c$. If $x \in A$ then $d(x,y) \ge \delta$ for each $y \in B$, whereas if $x \in B^\delta$, $d(x,y) < \delta$ for some $y \in B$; or in other words, $B^\delta \subseteq A^c$. If $L(\mu,\nu) \le \delta$, then

$$\mu(A^c) + \delta \ge \mu(B^\delta) + \delta \ge \nu(B). \tag{26.34}$$

Subtracting both sides of (26.34) from 1 gives

$$\mu(A) \le \nu(B^c) + \delta = \nu(A^\delta) + \delta, \tag{26.35}$$

and hence $L(\nu,\mu) \le \delta$. This means there is no $\delta$ for which $L(\nu,\mu) > \delta \ge L(\mu,\nu)$, nor, by symmetry, for which $L(\mu,\nu) > \delta \ge L(\nu,\mu)$, and equality follows.

It is immediate that $L(\mu,\nu) = 0$ if $\mu = \nu$. To show the converse holds, note that if $L(\mu,\nu) = 0$, $\mu(A^{1/n}) + 1/n \ge \nu(A)$ for $A \in \mathcal{S}$, and any $n \in \mathbb{N}$. If $A$ is closed, $A^{1/n} \downarrow A$ as $n \to \infty$. By continuity of $\mu$, $\mu(A) = \lim_n(\mu(A^{1/n}) + 1/n) \ge \nu(A)$, and by symmetry, $\nu(A) = \lim_n(\nu(A^{1/n}) + 1/n) \ge \mu(A)$ likewise. It follows that $\mu(A) = \nu(A)$ for all closed $A$. Since the closed sets are a determining class, $\mu = \nu$.

Finally, for measures $\mu$, $\nu$, and $\tau$ let $L(\mu,\nu) = \delta$ and $L(\nu,\tau) = \eta$. Then for any $A \in \mathcal{S}$,

$$\mu(A) \le \nu(A^\delta) + \delta \le \tau((A^\delta)^\eta) + \delta + \eta \le \tau(A^{\delta+\eta}) + \delta + \eta, \tag{26.36}$$

where the last inequality holds because

$$(A^\delta)^\eta = \{x: d(x,A^\delta) < \eta\} \subseteq \{x: d(x,A) < \delta + \eta\} = A^{\delta+\eta}, \tag{26.37}$$

the inclusion being valid since $d$ satisfies the triangle inequality. Hence $L(\mu,\tau) \le \delta + \eta = L(\mu,\nu) + L(\nu,\tau)$. ∎

We can also show that $L$ induces the topology of weak convergence.

26.17 Theorem If $\{\mu_n\}$ is a sequence of measures in $\mathbb{M}$, $\mu_n \Rightarrow \mu$ if and only if $L(\mu_n,\mu) \to 0$.

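Proof To show 'if', suppose $L(\mu_n,\mu) \to 0$. For each closed set $A \in \mathcal{S}$, $\limsup_n \mu_n(A) \le \mu(A^{\delta}) + \delta$ for every $\delta > 0$, and hence, letting $\delta \downarrow 0$, $\limsup_n \mu_n(A) \le \mu(A)$ by continuity. Weak convergence follows by (c) of 26.10. To show 'only if', consider for $A \in \mathcal{S}$ and fixed $\delta$ the bounded function

$$f_A(x) = \max\left\{0,\ 1 - \frac{d(x,A)}{\delta}\right\}. \tag{26.38}$$

Note that $f_A(x) = 1$ for $x \in A$, $0 < f_A(x) \le 1$ for $x \in A^{\delta}$, and $f_A(x) = 0$ for $x \notin A^{\delta}$. Since

$$|f_A(x) - f_A(y)| \le \frac{|d(x,A) - d(y,A)|}{\delta} \le \frac{d(x,y)}{\delta}, \tag{26.39}$$

independent of $A$, the family $\{f_A, A \in \mathcal{S}\}$ is uniformly equicontinuous (see §5.5) and so is a subset of $U_S$. If $\mu_n \Rightarrow \mu$, then by 26.10(b),

$$\Delta_n = \sup_{A \in \mathcal{S}} \left|\int f_A\,d\mu_n - \int f_A\,d\mu\right| \to 0. \tag{26.40}$$

Hence, $n$ can be chosen large enough that $\Delta_n \le \delta$, for any $\delta > 0$. For this $n$ or larger,

$$\mu_n(A) \le \int f_A\,d\mu_n \le \int f_A\,d\mu + \Delta_n \le \mu(A^{\delta}) + \delta, \tag{26.41}$$

or, equivalently, $L(\mu_n,\mu) \le \delta$. It follows that $L(\mu_n,\mu) \to 0$. ∎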
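For measures with finite support on the line, the distance $L$ can be computed by brute force, which gives a concrete feel for the two theorems above. The following is only a sketch under stated assumptions: the measures are dictionaries mapping support points to masses, the enlargement $A^{\delta}$ is taken to be the open set $\{x: |x - a| < \delta \text{ for some } a \in A\}$, and the helper names (`enlarge_mass`, `one_sided_ok`, `prokhorov`) are hypothetical. It is enough to test subsets $A$ of the support of the first measure, since points outside the support add nothing to $\mu(A)$ and can only increase $\nu(A^{\delta})$.

```python
from itertools import combinations


def enlarge_mass(points_A, nu, delta):
    """Mass nu assigns to the open delta-enlargement of the finite set A."""
    return sum(w for y, w in nu.items()
               if any(abs(y - a) < delta for a in points_A))


def one_sided_ok(mu, nu, delta):
    """Check mu(A) <= nu(A^delta) + delta for every nonempty A in supp(mu)."""
    support = list(mu)
    for r in range(1, len(support) + 1):
        for A in combinations(support, r):
            mass_A = sum(mu[a] for a in A)
            if mass_A > enlarge_mass(A, nu, delta) + delta + 1e-12:
                return False
    return True


def prokhorov(mu, nu, tol=1e-9):
    """Bisection for the infimum of the admissible delta in (0, 1].

    The admissibility condition is monotone in delta, and delta = 1
    always satisfies it for probability measures.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if one_sided_ok(mu, nu, mid) and one_sided_ok(nu, mu, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Point masses at 0, 0.3 and 0.5 (illustrative data only).
p0, p3, p5 = {0.0: 1.0}, {0.3: 1.0}, {0.5: 1.0}
print(prokhorov(p0, p3), prokhorov(p3, p5), prokhorov(p0, p5))
```

For point masses at $a$ and $b$ with $|a - b| < 1$ the distance is just $|a - b|$, so the triangle inequality of (26.36) can be checked directly on the three values printed.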

It is possible to establish the theory of convergence on $\mathbb{M}$ by working explicitly in the metric space $(\mathbb{M}, L)$. However, we will follow the approach of Varadarajan (1958), of working in the equivalent space derived in 26.14. The treatment in this section and the following one draws principally on Parthasarathy (1967). The Prokhorov metric has an application in a different context, in §28.5.

The next theorem leads on from 26.14 by answering the crucial question: when is $\mathbb{M}$ compact?

26.18 Theorem (Parthasarathy 1967: th. II.6.4) $\mathbb{M}$ is compact if and only if $S$ is compact.

Proof First, let $S$ be compact, and recall that in this case $C_S = U_S$ (5.21), and $C_S$ is separable (5.26(ii)). For simplicity of notation write just $C$ for $C_S$, and write $0$ for that element of $C$ which takes the value 0 everywhere in $S$. Let $\bar{S}_C(0,1)$ denote the closed unit sphere around $0$ in $C$, such that $\sup_t |f(t)| \le 1$ for all $f \in \bar{S}_C(0,1)$, and let $\{g_m, m \in \mathbb{N}\}$ be a sequence that is dense in $\bar{S}_C(0,1)$. For this sequence of functions, the map $T$ defined in (26.28) is a homeomorphism taking $\mathbb{M}$ into $T(\mathbb{M})$, a subset of the compact space $[-1,1]^{\infty}$. This follows by the argument used in 26.14. It must be shown that $T(\mathbb{M})$ is closed and therefore compact. Let $\{\mu_n\}$ be a sequence of measures in $\mathbb{M}$ such that $T(\mu_n) \to y \in [-1,1]^{\infty}$. What we have to show to prove sufficiency is that $y \in T(\mathbb{M})$. Since the mapping $T^{-1}$ onto $\mathbb{M}$ is continuous, this would imply (6.9(ii)) that $\mathbb{M}$ itself is compact.

Write $\Lambda_n(f) = \int f\,d\mu_n$, and note that, since $|\int f\,d\mu_n| \le \sup_t |f(t)| \le 1$, this defines a functional

$$\Lambda_n(f): \bar{S}_C(0,1) \mapsto [-1,1]. \tag{26.42}$$

In this notation we have $T(\mu_n) = (\Lambda_n(g_1), \Lambda_n(g_2), \ldots)$.

Since $\bar{S}_C(0,1)$ is compact and $\{g_m\}$ is dense in it, we can choose for every $f \in \bar{S}_C(0,1)$ a subsequence $\{g_{m_k}, k \in \mathbb{N}\}$ converging to $f$. Then, as in (26.30),

$$|\Lambda_n(f) - \Lambda_{n'}(f)| \le 2d_U(f,g_{m_k}) + |\Lambda_n(g_{m_k}) - \Lambda_{n'}(g_{m_k})|. \tag{26.43}$$

The second term of the majorant side contains a coordinate of $T(\mu_n) - T(\mu_{n'})$ and converges to 0 as $n$ and $n' \to \infty$ by assumption. Letting $k \to \infty$, we obtain, as in (26.31),

$$\lim_{n,n' \to \infty} |\Lambda_n(f) - \Lambda_{n'}(f)| = 0. \tag{26.44}$$

This says that $\{\Lambda_n\}$ is a Cauchy sequence of real functionals on $[-1,1]$, and so must have a limit $\Lambda$; in particular, $y = (\Lambda(g_1), \Lambda(g_2), \ldots)$.

It is easy to verify that each $\Lambda_n(f)$, and hence also $\Lambda(f)$, satisfy conditions (26.10)-(26.12) for $f \in \bar{S}_C(0,1)$. Since for every $f \in C$ there is a constant $c > 0$ such that $cf \in \bar{S}_C(0,1)$, we may further say, by (26.12), that $\Lambda(cf) = c\Lambda^*(f)$ where $\Lambda^*(\cdot)$ is a functional on $C$ which must also satisfy (26.10)-(26.12). From 26.9, there exists a unique $\mu \in \mathbb{M}$ such that $\Lambda^*(f) = \int f\,d\mu$, $f \in C$. Hence, we may write $y = T(\mu)$. It follows that $T(\mathbb{M})$ contains its limit points, and being also bounded is compact; and since $T^{-1}$ is a homeomorphism, $\mathbb{M}$ is also compact. This completes the proof of sufficiency.


To prove necessity, consider $D = \{p_x: x \in S\} \subseteq \mathbb{M}$, the set shown to be homeomorphic to $S$ in 26.15. If $D$ is compact, then so is $S$. $D$ is totally bounded when $\mathbb{M}$ is compact, so by 5.12 it suffices to show completeness. Every sequence in $D$ is the image of a sequence $\{x_n \in S\}$, and can be written as $\{p_{x_n}\}$, so suppose $p_{x_n} \Rightarrow q \in \mathbb{M}$. If $x_n \to x \in S$, then $q = p_x \in D$ by 26.15, so it suffices to show that $x_n \not\to x$ is impossible.

The possibility that $\{x_n\}$ has two or more distinct cluster points in $S$ is ruled out by the assumption $p_{x_n} \Rightarrow q$, so $x_n \not\to x$ means that the sequence has no cluster points in $S$. We assume this, and obtain a contradiction. Let $E = \{x_1, x_2, \ldots\} \subseteq S$ be the set of the sequence coordinates, and let $E_1$ be any infinite subset of $E$. If the sequence has no cluster points, every point $y \in E_1$ is isolated, in that $E_1 \cap S(y,\varepsilon) - \{y\}$ is empty for some $\varepsilon > 0$. Otherwise, there would have to exist a sequence $\{y_n \in E_1\}$ such that $y_n \in S(y,1/n)$ for every $n$, and $y$ would be a cluster point of $\{x_n\}$ contrary to assumption. A set containing only isolated points is closed, so $E_1$ is closed and, by 26.10(c),

$$q(E_1) \ge \limsup_{n \to \infty} p_{x_n}(E_1) = 1, \tag{26.45}$$

where the equality must obtain since $E_1$ contains $x_n$ for some $n \ge N$, for every $N \in \mathbb{N}$. Since $q \in \mathbb{M}$, this has to mean $q(E_1) = 1$. But clearly we can choose another subset from $E$, say $E_1'$, such that $E_1$ and $E_1'$ are disjoint, and the same logic would give $q(E_1') = 1$. This is impossible. The contradiction is shown, concluding the proof. ∎

26.5 Tightness and Convergence

In §22.5 we met the idea of a tight probability measure, as one whose mass is concentrated on a compact subset of the sample space. Formally, a measure $\mu$ on a space $(S,\mathcal{S})$ is said to be tight if, for every $\varepsilon > 0$, there exists a compact set $K_{\varepsilon} \in \mathcal{S}$ such that $\mu(K_{\varepsilon}^c) \le \varepsilon$. Let $\Pi \subseteq \mathbb{M}$ denote any family of measures. The family $\Pi$ is said to be uniformly tight if $\sup_{\mu \in \Pi} \mu(K_{\varepsilon}^c) \le \varepsilon$.
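The distinction between tightness of each member and uniform tightness of a family can be illustrated on the line. The sketch below is an illustration under assumptions not taken from the text: the families are unit-variance Gaussians, the candidate compact set is $K = [-c, c]$, only finitely many members of each family are sampled, and the helper names are hypothetical. A family with means confined to $[-1,1]$ leaves negligible mass outside a single $K$, whereas a family whose means drift off to infinity does not.

```python
from statistics import NormalDist


def mass_outside(mean, c):
    """P(|X| > c) for X ~ N(mean, 1), i.e. the mass outside K = [-c, c]."""
    d = NormalDist(mean, 1.0)
    return d.cdf(-c) + (1.0 - d.cdf(c))


def worst_tail(means, c):
    """Supremum over the (finite sample of the) family of the mass outside K."""
    return max(mass_outside(m, c) for m in means)


bounded_family = [m / 10 for m in range(-10, 11)]  # means in [-1, 1]
drifting_family = list(range(0, 50))               # means 0, 1, ..., 49

print(worst_tail(bounded_family, 5.0))   # small: one K serves every member
print(worst_tail(drifting_family, 5.0))  # close to 1: no single K works
```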

Tightness is a property of general measures, although we shall concentrate here on the case of p.m.s. In the applications below, $\Pi$ typically represents the sequence of p.m.s associated with a stochastic sequence $\{X_n\}_1^{\infty}$. If a p.m. $\mu$ is tight, then of course $\mu(K_{\varepsilon}) \ge 1 - \varepsilon$ for compact $K_{\varepsilon}$. In §22.5 uniform tightness of a sequence of p.m.s on the line was shown to be a necessary condition for weak convergence of the sequence, and here we shall obtain the same result for any metric space that is separable and complete. The first result needed is the following.

26.19 Theorem (Parthasarathy 1967: th. II.3.2) When $S$ is separable and complete, every p.m. on the space is tight. □

Notice, this proves the earlier assertion that every measure on $(\mathbb{R},\mathcal{B})$ is tight, given that $\mathbb{R}$ is a separable, complete space. Another lemma is needed for the proof, and also subsequently.


26.20 Lemma Let $S$ be a complete space, and let

$$K = \bigcap_{n=1}^{\infty} \bigcup_{i=1}^{j_n} \bar{S}_{ni}, \tag{26.46}$$

where $S_{ni}$ is a sphere of radius $1/n$ in $S$, $\bar{S}_{ni}$ is its closure, and $j_n$ is a finite integer for each $n$. Then $K$ is compact.

Proof Being covered by a finite collection of the $\bar{S}_{ni}$ for each $n$, $K$ is totally bounded. If $\{x_j, j \in \mathbb{N}\}$ is a Cauchy sequence in $K$, completeness of $S$ implies that $x_j \to x \in S$. For each $n$, since $K \subseteq \bigcup_{i=1}^{j_n} \bar{S}_{ni}$, infinitely many of the sequence coordinates must lie in $K_n = K \cap \bar{S}_{nk}$ for some $k$, $1 \le k \le j_n$. Since $\bar{S}_{nk}$ has radius $1/n$, taking $n$ to the limit leads to the conclusion that $\bigcap_n K_n = \{x\}$, and hence $x \in K$; $K$ is therefore complete, and the lemma follows by 5.12. ∎

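Proof of 26.19 By separability, a covering of $S$ by $1/n$-balls $S_n = S(x,1/n)$, $x \in S$, has a countable subcover, say $\{S_{ni}, i \in \mathbb{N}\}$, for each $n = 1,2,\ldots$. Fix $n$. For any $\varepsilon > 0$ there must exist $j_n$ large enough that $\mu(A_n) \ge 1 - \varepsilon/2^n$, where $A_n = \bigcup_{i=1}^{j_n} S_{ni}$; otherwise we would have $\mu(\bigcup_{i=1}^{\infty} S_{ni}) = \mu(S) \le 1 - \varepsilon/2^n$, which is a contradiction since $\mu$ is a p.m.

Given $\varepsilon$, choose $A_n$ in this manner for each $n$ and let $K_{\varepsilon} = \bigcap_{n=1}^{\infty} K_n$ where $K_n = \bigcup_{i=1}^{j_n} \bar{S}_{ni}$. Then $K_{\varepsilon}$ is compact by 26.20. Further, since

$$K_{\varepsilon}^c = \left(\bigcap_{n=1}^{\infty} K_n\right)^c = \bigcup_{n=1}^{\infty} (K_n)^c, \tag{26.47}$$

and noting that $\mu((K_n)^c) = 1 - \mu(K_n) \le 1 - \mu(A_n) \le \varepsilon/2^n$, we have

$$\mu(K_{\varepsilon}^c) \le \sum_{n=1}^{\infty} \mu((K_n)^c) \le \varepsilon \sum_{n=1}^{\infty} 1/2^n = \varepsilon, \tag{26.48}$$

or, in other words, $\mu(K_{\varepsilon}) \ge 1 - \varepsilon$. ∎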
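The construction in this proof can be traced numerically on the line. The sketch below makes several assumptions that are not in the text: $\mu$ is the standard Gaussian, the countable cover for each $n$ consists of the open intervals $S(i/n, 1/n)$ taken in the order $i = 0, \pm 1, \pm 2, \ldots$, and the intersection over $n$ is truncated at a hypothetical `n_max`. With these choices $A_n$ is the open interval $(-(m+1)/n, (m+1)/n)$ and $K_n$ is its closure, so the intersection is simply the shortest of the closed intervals.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard Gaussian, standing in for the p.m. mu


def half_width(n, eps):
    """Half-width (m+1)/n of the smallest symmetric union of 1/n-balls
    S(i/n, 1/n), |i| <= m, whose mass is at least 1 - eps / 2**n."""
    target = 1.0 - eps / 2 ** n
    m = 0
    while PHI.cdf((m + 1) / n) - PHI.cdf(-(m + 1) / n) < target:
        m += 1
    return (m + 1) / n


def compact_set(eps, n_max=20):
    """Half-width c of K = [-c, c], the intersection of K_1, ..., K_{n_max}."""
    return min(half_width(n, eps) for n in range(1, n_max + 1))


c = compact_set(0.1)
print(c, PHI.cdf(c) - PHI.cdf(-c))  # mass of K, to compare with 1 - eps
```

The allocation $\varepsilon/2^n$ is what makes the complements summable to at most $\varepsilon$ in (26.48); any other summable sequence would serve equally well.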

Before moving on, note that the promised proof of 12.6 can be obtained as a corollary of 26.19.

26.21 Corollary Let $(S,\mathcal{S},\mu)$ be a separable complete probability space. For any $E \in \mathcal{S}$, there is for any $\varepsilon > 0$ a compact subset $K$ of $E$ such that $\mu(E - K) < \varepsilon$.

Proof Let the compact set $\Delta \in \mathcal{S}$ satisfy $\mu(\Delta) > 1 - \varepsilon/2$, as is possible by 26.19, and let $(\Delta,\mathcal{S}_{\Delta},\mu_{\Delta})$ denote the trace of $(S,\mathcal{S},\mu)$ on $\Delta$. This is a compact space, such that every set in $\mathcal{S}_{\Delta}$ is totally bounded. By regularity of the measure (26.1) there exists for any $A \in \mathcal{S}_{\Delta}$ an open set $A' \supseteq A$ such that $\mu_{\Delta}(A' - A) < \varepsilon/2$. Moving to the complements, $A'^c$ is a closed, and hence compact, set contained in $A^c$. But $A^c - A'^c = A' - A$ and $\mu(A' - A) = \mu_{\Delta}(A' - A)\mu(\Delta) < \varepsilon/2$.

Now for any set $E \in \mathcal{S}$ let $A = (E \cap \Delta)^c$, and let $K = A'^c$, and this argument shows that there is a compact subset $K$ of $E \cap \Delta$ (and hence of $E$) such that $\mu((E \cap \Delta) - K) < \varepsilon/2$. Since $\mu(E \cap \Delta^c) \le \mu(\Delta^c) \le \varepsilon/2$, $\mu(E - K) < \varepsilon$, as required. ∎


Lemma 12.6 follows from this result on noting that $\mathbb{R}$ is a separable complete space.

Theorem 26.19 tells us that on a separable complete space, every measure $\mu_n$ of a sequence is tight. It remains to be established whether the same property applies to the weak limit of any such sequence. Here the reader should review examples 22.19 and 22.20 to appreciate how this need not be the case. The next theorem is a partial parallel of 22.22, although the latter result goes further in giving sufficient conditions for a weak limit to exist. Here we merely establish the possibility of weak convergence, via an application of theorems 5.10 and 5.11, by showing the link between uniform tightness and compactness.

26.22 Theorem (Parthasarathy 1967: th. II.6.7) Let $(S,d)$ be a separable complete space, and let $\Pi \subseteq \mathbb{M}$ be a family of p.m.s on $(S,\mathcal{S})$. $\Pi$ is compact if and only if it is uniformly tight.

Proof Since $(S,d)$ is separable, it is homeomorphic to a subset of $[0,1]^{\infty}$, by 6.22. Accordingly, there exists a metric $d'$ equivalent to $d$ such that $(S,d')$ is relatively compact. In this metric, let $\hat{S}$ be a compact space containing $S$ and let $\hat{\mathcal{S}}$ be the Borel field on $\hat{S}$. We cannot assume that $S \in \hat{\mathcal{S}}$, but $\mathcal{S}$, the Borel field of $S$, is the trace of $\hat{\mathcal{S}}$ on $S$.

Define a family of measures $\hat{\Pi}$ on $\hat{S}$ such that, for $\mu \in \Pi$, $\hat{\mu}(A) = \mu(A \cap S)$, $\hat{\mu} \in \hat{\mathbb{M}}$, for each $A \in \hat{\mathcal{S}}$. To prove that $\Pi$ is compact, we show that a sequence of measures $\{\mu_n, n \in \mathbb{N}\}$ from $\Pi$ has a cluster point in $\Pi$. Consider the counterpart sequence $\{\hat{\mu}_n, n \in \mathbb{N}\}$ in $\hat{\Pi}$. Since $\hat{S}$ is compact, $\hat{\mathbb{M}}$ is compact by 26.18, so this sequence has one or more cluster points in $\hat{\mathbb{M}}$. Let $\nu$ be such a cluster point. The object is to show that there exists a p.m. $\mu \in \Pi$ such that $\hat{\mu} = \nu$.

Tightness of $\Pi$ means that for every integer $r$ there is a compact set $K_r \subseteq S$ such that $\mu(K_r) \ge 1 - 1/r$, for all $\mu \in \Pi$. Being closed in $\hat{S}$, $K_r \in \hat{\mathcal{S}}$ and $\hat{\mu}(K_r) = \mu(K_r \cap S) = \mu(K_r)$, all $\mu \in \Pi$. Since $K_r$ is closed we have for some subsequence $\{n_k, k \in \mathbb{N}\}$

$$\nu(K_r) \ge \limsup_{k \to \infty} \hat{\mu}_{n_k}(K_r) \ge 1 - 1/r, \tag{26.49}$$

by 26.10(c). Since $\bigcup_r K_r \in \hat{\mathcal{S}}$, we have in particular that $\nu(\bigcup_r K_r) = 1$. Now, suppose we let $\nu^*(S)$ denote the outer measure of $S$ in terms of coverings by $\hat{\mathcal{S}}$-sets. Since $\bigcup_r K_r \subseteq S$, we must have $\nu^*(S) \ge \nu^*(\bigcup_r K_r) = \nu(\bigcup_r K_r) = 1$. Applying 3.10, note that $S$ is $\nu$-measurable since the inequality in (3.19) becomes

$$\nu^*(B \cap S) \le \nu^*(B), \tag{26.50}$$

which holds for all $B \subseteq \hat{S}$. Since $\mathcal{S}$ is the trace of $\hat{\mathcal{S}}$ on $S$, all the sets of $\mathcal{S}$ are accordingly $\nu$-measurable and there exists a p.m. $\mu \in \Pi$ such that $\hat{\mu} = \nu$, as required. For any closed subset $C$ of $S$, there exists a closed $D \subseteq \hat{S}$ such that $C = D \cap S$, so the assertions $\limsup_k \hat{\mu}_{n_k}(D) \le \hat{\mu}(D)$ and $\limsup_k \mu_{n_k}(C) \le \mu(C)$ are equivalent, and hence, by 26.10, $\mu_{n_k} \Rightarrow \mu$. This means that $\{\mu_n\}$ has a convergent subsequence, proving sufficiency.

Notice that completeness of $S$ is not needed for this part of the proof.

To prove necessity, assume $\Pi$ is compact. Letting $\{S_{ni}, i \in \mathbb{N}\}$ be a countable

covering of $S$ by $1/n$-spheres, and $\{j_n, n \in \mathbb{N}\}$ any increasing subsequence of integers, define $K_n = \bigcup_{i=1}^{j_n} \bar{S}_{ni}$. We show first that the assumption, $\exists\, \mu \in \Pi$ such that, for $\varepsilon > 0$,

$$\mu(K_n) \le 1 - \varepsilon, \text{ all } n, \tag{26.51}$$

leads to a contradiction, and so has to be false. If (26.51) is true for at least one element of (compact) $\Pi$, there is a convergent sequence $\{\mu_k, k \in \mathbb{N}\}$ in $\Pi$, with $\mu_k \Rightarrow \mu$, such that it holds for all $\mu_k$. (Even if there is only one such element, we can put $\mu_k = \mu$, all $k$.) Fix $m$. By 26.10,

$$\mu(K_m) \le \limsup_{k \to \infty} \mu_k(K_m) \le 1 - \varepsilon. \tag{26.52}$$

Letting $m \to \infty$ yields $\mu(S) \le 1 - \varepsilon$, which is a contradiction. Putting $\varepsilon/2^n$ in place of $\varepsilon$, we may therefore assert

$$\mu(K_n) > 1 - \varepsilon/2^n, \text{ all } n, \text{ all } \mu \in \Pi. \tag{26.53}$$

Letting $K_{\varepsilon} = \bigcap_{n=1}^{\infty} K_n$, this set is compact by 26.20 ($S$ being complete) and it follows as in (26.48) above that $\mu(K_{\varepsilon}) > 1 - \varepsilon$. Since $\mu$ is an arbitrary element of $\Pi$, the family is uniformly tight. ∎

We conclude this section with a useful result for measures on product spaces. See §7.4 for a discussion of the marginal measures.

26.23 Theorem A p.m. $\mu$ on the space $(S \times T, \mathcal{S} \otimes \mathcal{T})$ with the product topology is tight iff the marginal p.m.s $\mu_x$ and $\mu_y$ are tight.

Proof For a set $K \subseteq S \times T$, let $K_x = \pi_x(K)$ denote the projection of $K$ onto $S$. Since the projection is continuous (see §6.5), $K_x$ is compact if $K$ is compact (5.20). Since

$$\mu_x(K_x) = \mu(K_x \times T) \ge \mu(K), \tag{26.54}$$

tightness of $\mu$ implies tightness of $\mu_x$. Repeating the argument for $\mu_y$ proves the necessity. For sufficiency we have to show that there exists a compact set $K \in \mathcal{S} \otimes \mathcal{T}$, having measure exceeding $1 - \varepsilon$. Consider the set $K = A \times B$ where $A \in \mathcal{S}$ and $\mu_x(A) > 1 - \varepsilon/2$, and $B \in \mathcal{T}$ where $\mu_y(B) > 1 - \varepsilon/2$. Note that

$$K^c = (A \times B^c) \cup (A^c \times B) \cup (A^c \times B^c), \tag{26.55}$$

where the sets of the union on the right are disjoint. Thus,

$$\mu(K^c) \le \mu(A^c \times B) + \mu(A \times B^c) + 2\mu(A^c \times B^c) = \mu(A^c \times T) + \mu(S \times B^c) = \mu_x(A^c) + \mu_y(B^c) < \varepsilon. \tag{26.56}$$

If $A$ and $B$ are compact they are separable in the relative topologies generated from $S$ and $T$ (5.7), and hence $K$ is compact by 6.17. ∎
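The bound in (26.56) is easy to check on a small discrete example. The sketch below assumes a joint p.m. given as a dictionary over pairs $(x,y)$; the function name and the data are hypothetical and serve only to illustrate that $\mu(K^c) \le \mu_x(A^c) + \mu_y(B^c)$ for $K = A \times B$.

```python
def complement_bound(joint, A, B):
    """Return (mu(K^c), mu_x(A^c) + mu_y(B^c)) for the rectangle K = A x B.

    `joint` maps pairs (x, y) to probabilities; A and B are sets of
    x-values and y-values respectively.
    """
    mass_K = sum(p for (x, y), p in joint.items() if x in A and y in B)
    mass_Ac = sum(p for (x, y), p in joint.items() if x not in A)  # mu_x(A^c)
    mass_Bc = sum(p for (x, y), p in joint.items() if y not in B)  # mu_y(B^c)
    return 1.0 - mass_K, mass_Ac + mass_Bc


# Illustrative joint distribution on {0,1,2} x {0,1,2}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.2, (2, 2): 0.1}
outside, bound = complement_bound(joint, {0, 1}, {0, 1})
print(outside, bound)
```

The bound is generally not sharp, since the mass of $A^c \times B^c$ is counted twice on the right-hand side.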


26.6 Skorokhod's Representation

Considering a sequence of random elements, we can now give a generalization of some familiar ideas from the theory of random variables. Recall from 26.7 that separability ensures that the distance functions in the following definitions are r.v.s.

Let $\{x_n\}$ be a sequence of random elements and $x$ a given random element of a separable space $(S,\mathcal{S})$. If

$$d(x_n(\omega),x(\omega)) \to 0 \text{ for } \omega \in C, \text{ with } P(C) = 1, \tag{26.57}$$

we say that $x_n$ converges almost surely to $x$, and also write $x_n \xrightarrow{\text{a.s.}} x$. Also, if

$$P(d(x_n,x) \ge \varepsilon) \to 0, \text{ all } \varepsilon > 0, \tag{26.58}$$

we say that $x_n$ converges in probability to $x$, and write $x_n \xrightarrow{\text{pr}} x$. A.s. convergence is sufficient for convergence in probability, which in turn is sufficient for $x_n \xrightarrow{D} x$. A case subsumed in the above definition is where $x = a$ with probability 1, $a$ being a fixed element of $S$.

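We now have the following result generalizing 22.18.

26.24 Theorem Given a probability space $(\Omega,\mathcal{F},P)$, let $\{x_n(\omega)\}_1^{\infty}$ and $\{y_n(\omega)\}_1^{\infty}$ be random sequences on a separable space $(S,\mathcal{S})$. If $x_n \xrightarrow{D} x$ and $d(x_n,y_n) \xrightarrow{\text{pr}} 0$, then $y_n \xrightarrow{D} x$.

Proof Let $A \in \mathcal{S}$ be a closed set, and for $\varepsilon > 0$ put $A^{\varepsilon} = \{x: d(x,A) \le \varepsilon\} \in \mathcal{S}$, also a closed set for each $\varepsilon$, and $A^{\varepsilon} \downarrow A$ as $\varepsilon \downarrow 0$. Since

$$\{\omega: y_n(\omega) \in A\} \subseteq \{\omega: x_n(\omega) \in A^{\varepsilon}\} \cup \{\omega: d(x_n(\omega),y_n(\omega)) \ge \varepsilon\},$$

we have

$$P(y_n \in A) \le P(x_n \in A^{\varepsilon}) + P(d(x_n,y_n) \ge \varepsilon), \tag{26.59}$$

and, letting $n \to \infty$,

$$\limsup_{n \to \infty} P(y_n \in A) \le \limsup_{n \to \infty} \mu_n(A^{\varepsilon}) \le \mu(A^{\varepsilon}), \tag{26.60}$$

where $\mu_n$ is the measure associated with $x_n$, $\mu$ the measure associated with $x$, and the second inequality of (26.60) is by hypothesis on $\{x_n\}$ and 26.10(c). Since this inequality holds for every $\varepsilon > 0$, we have

$$\limsup_{n \to \infty} P(y_n \in A) \le \mu(A), \tag{26.61}$$

by continuity of the measure. This is sufficient for the result by 26.10. ∎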

In §22.2, we showed that the weak convergence of a sequence of distributions on the line implies the a.s. convergence of a sequence of random variables. This is the Skorokhod representation of weak convergence. That result was in fact a special case of the final theorem of this chapter.

26.25 Theorem (Skorokhod 1956: 3.1) Let $\{\mu_n\}$ be a sequence of measures on the separable, complete metric space $(S,d)$. There exists a sequence of $\mathcal{B}_{[0,1]}/\mathcal{S}$-measurable functions

$$x_n: [0,1] \mapsto S$$

such that $\mu_n(A) = m(\{\omega: x_n(\omega) \in A\})$ for each $A \in \mathcal{S}$, where $m$ is Lebesgue measure. If $\mu_n \Rightarrow \mu$, there exists a function $x(\omega)$ such that $\mu(A) = m(\{\omega: x(\omega) \in A\})$ for each $A \in \mathcal{S}$, and $d(x_n(\omega),x(\omega)) \to 0$ a.s.[$m$] as $n \to \infty$.

Proof This is by construction of the functions $x_n(\omega)$. For some $k \in \mathbb{N}$ let $\{x_i^{(k)}, i \in \mathbb{N}\}$ denote a countable collection of points in $S$ such that, for every $x \in S$, $d(x,x_i^{(k)}) \le 1/2^{k+1}$ for some $i$. Such sequences exist for every $k$ by separability. Let $S(x_i^{(k)},r_k)$, for $1/2^{k+1} < r_k < 1/2^k$, denote a system of spheres in $S$ having the property $\mu(\partial S(x_i^{(k)},r_k)) = 0$ for every $i$. An $r_k$ satisfying this condition exists, since there can be at most a countable number of points $r$ such that $\mu(\partial S(x_i^{(k)},r)) > 0$ for one or more $i$; this fact follows from 7.4.

For given $k$, the system $\{S(x_i^{(k)},r_k), i \in \mathbb{N}\}$ covers $S$, and accordingly the sets

$$D_i^k = S(x_i^{(k)},r_k) - \bigcup_{j=1}^{i-1} S(x_j^{(k)},r_k), \quad i \in \mathbb{N}, \tag{26.62}$$

form a partition of $S$. By letting each of the $k$ integers $i_1,\ldots,i_k$ range independently over $\mathbb{N}$, define the countable collection of sets

$$S_{i_1,\ldots,i_k} = D_{i_1}^1 \cap D_{i_2}^2 \cap \ldots \cap D_{i_k}^k \in \mathcal{S}. \tag{26.63}$$

Each $S_{i_1,\ldots,i_k}$ is a subset of a sphere of radius $r_k < 1/2^k$, and $\mu(\partial S_{i_1,\ldots,i_k}) = 0$. By construction, any pair $S_{i_1,\ldots,i_k}$ and $S_{i_1',\ldots,i_k'}$ are disjoint unless $i_j = i_j'$, $j = 1,\ldots,k$. Fixing $i_1,\ldots,i_{k-1}$ we have

$$\bigcup_{i_k=1}^{\infty} S_{i_1,\ldots,i_{k-1},i_k} = S_{i_1,\ldots,i_{k-1}}, \tag{26.64}$$

and in particular,

$$\bigcup_{i_1=1}^{\infty} S_{i_1} = S. \tag{26.65}$$

That is to say, for any $k$ the collection $\{S_{i_1,\ldots,i_k}\}$ forms a partition of $S$, which gets finer as $k$ increases. These sets are not all required to be non-empty.

For any $n \in \mathbb{N}$ and $k \in \mathbb{N}$, define a partition of $[0,1]$ into intervals $\Delta^{(n)}_{i_1,\ldots,i_k}$, where it is understood that $\Delta^{(n)}_{i_1,\ldots,i_k}$ lies to the left of $\Delta^{(n)}_{i_1',\ldots,i_k'}$ if $i_j = i_j'$ for $j = 1,\ldots,r-1$ and $i_r < i_r'$ for some $r$, and the lengths of the segments equal the probabilities $\mu_n(S_{i_1,\ldots,i_k})$.

We are now ready to define a measurable mapping from $[0,1]$ to $S$. Choose an element $x_{i_1,\ldots,i_k}$ from each non-empty $S_{i_1,\ldots,i_k}$, and for $\omega \in [0,1]$ put

$$x_n^k(\omega) = x_{i_1,\ldots,i_k} \text{ if } \omega \in \Delta^{(n)}_{i_1,\ldots,i_k}. \tag{26.66}$$

Note that by construction $d(x_n^k(\omega),x_n^{k+m}(\omega)) \le 1/2^{k-1}$ for $m \ge 1$, and taking $k = 1,2,\ldots$ defines a Cauchy sequence in $S$ which is convergent since $S$ is a complete space. Denote its limit by $x_n(\omega)$.

To show that $x_n(\omega)$ is a random element with distribution defined by $\mu_n$, it is sufficient to verify that

$$\mu_n(A) = P(x_n \in A) = m(\{\omega: x_n(\omega) \in A\}) \tag{26.67}$$

for, at least, all $A \in \mathcal{S}$ such that $\mu_n(\partial A) = 0$. If we let $A^{(k)}$ denote the union of all $S_{i_1,\ldots,i_k} \subseteq A$ and $A'^{(k)}$ the union of all $S_{i_1,\ldots,i_k} \subseteq A^c$, it is clear that $A^{(k)} \subseteq A \subseteq (A'^{(k)})^c$, and that (26.67) holds in respect of $A^{(k)}$ and $A'^{(k)}$. Let

$$C^{(k)} = \{x: d(x,\partial A) \le 1/2^k\}, \tag{26.68}$$

so that $(A'^{(k)})^c - A^{(k)} \subseteq C^{(k)}$. Since $\mu_n(C^{(k)}) \to \mu_n(\partial A) = 0$ as $k \to \infty$, it follows that $\mu_n((A'^{(k)})^c - A^{(k)}) \to 0$ and hence $\mu_n(A^{(k)}) \to \mu_n(A)$. This proves (26.67).

It remains to show that, if $\mu_n \Rightarrow \mu$, then $x_n \to x$ a.s.[$m$]. Since the length of $\Delta^{(n)}_{i_1,\ldots,i_k}$ equals $\mu_n(S_{i_1,\ldots,i_k})$, we can conclude that the sequence of intervals $\{\Delta^{(n)}_{i_1,\ldots,i_k}\}$ has a limit $\Delta_{i_1,\ldots,i_k}$ as $n \to \infty$. Pick an interior point $\omega$ of $\Delta_{i_1,\ldots,i_k}$ and note that $x^k(\omega)$ meets the condition $x^k(\omega) \in S_{i_1,\ldots,i_k}$, by definition. Then for $N$ large enough we can be sure that, for $n \ge N$, $\omega \in \Delta^{(n)}_{i_1,\ldots,i_k}$ and hence $d(x_n^k(\omega),x^k(\omega)) < 1/2^{k-1}$. Letting $k \to \infty$, we conclude that $d(x_n(\omega),x(\omega)) \le \varepsilon$ for any $\varepsilon > 0$ whenever $n$ is large enough. We cannot draw this conclusion for the boundary points of the $\Delta_{i_1,\ldots,i_k}$, but these are at most countable even as $k \to \infty$, and have Lebesgue measure 0. This completes the proof. ∎
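When $S = \mathbb{R}$ the representation reduces to the quantile transform of §22.2, and this special case is simple enough to sketch in code. The example below rests on assumptions of our own choosing: $\mu_n$ is taken to be $N(1/n, (1 + 1/n)^2)$, so that $\mu_n \Rightarrow N(0,1)$, and the function names are hypothetical. Each $x_n$ is defined on the common space $(0,1)$ with Lebesgue measure and has distribution $\mu_n$, and the point of the illustration is that $x_n(\omega) \to x(\omega)$ for each fixed $\omega$.

```python
from statistics import NormalDist


def x_n(omega, n):
    """Inverse c.d.f. of N(1/n, (1 + 1/n)^2) evaluated at omega in (0, 1)."""
    return NormalDist(1.0 / n, 1.0 + 1.0 / n).inv_cdf(omega)


def x_limit(omega):
    """Inverse c.d.f. of the weak limit N(0, 1)."""
    return NormalDist(0.0, 1.0).inv_cdf(omega)


for omega in (0.1, 0.5, 0.9):
    gaps = [abs(x_n(omega, n) - x_limit(omega)) for n in (10, 100, 1000)]
    print(omega, gaps)  # gaps shrink with n at every fixed omega
```

In this Gaussian case the gap is $|1 + z|/n$, where $z$ is the standard normal quantile at $\omega$, so the convergence holds at every $\omega \in (0,1)$ and not merely almost surely.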

The construction of §22.2 is now revealed as a particularly elegant special case, since the mapping from $[0,1]$ to $S$ is none other than the inverse of the c.d.f. when $S = \mathbb{R}$. In his 1956 paper, Skorokhod goes on to use this theorem to prove convergence results in spaces such as $D_{[0,1]}$. We shall not use his approach directly, but this is a useful trick which has a variety of potential applications, just as in the case of $\mathbb{R}$. One of these will be encountered in Chapter 30;


27 Weak Convergence in a Function Space

27.1 Measures on Function Spaces

This chapter is mainly about the space of continuous functions on the unit interval, but an important preliminary is to consider the space $R_{[0,1]}$ of all real functions on $[0,1]$. We shall tend to write just $R$ for this space, for brevity, when the context is clear. In this chapter and the following ones we also tend to use the symbols $x$, $y$ etc. to denote functions, and $t$, $s$ etc. to denote their arguments, instead of $f$, $g$ and $x$, $y$ respectively as in previous chapters. This is conventional, and reflects the fact that the objects under consideration are usually to be interpreted as empirical processes in the time domain. Thus,

$$x: [0,1] \mapsto \mathbb{R}$$

will be the function which assumes the value $x(t)$ at the point $t$.

In what follows the element $x$ will typically be stochastic, a measurable mapping from a probability space $(\Omega,\mathcal{F},P)$. We may legitimately write

$$x: \Omega \mapsto R,$$

assigning $x(\omega)$ as the image of the element $\omega$, but also

$$x: \Omega \times [0,1] \mapsto \mathbb{R},$$

where $x(\omega,t)$ denotes the value of $x$ at $(\omega,t)$. We may also write $x(t)$ to denote the ordinate at $t$ where dependence on $\omega$ is left implicit. The potential ambiguity should be resolved by the context. Sometimes one writes $x_t$ to denote the random ordinate where $x_t(\omega) = x(\omega,t)$, but we avoid this as far as possible, given our use of the subscript notation in the context of a sequence with countable domain.

The notion of evaluating the function at a point is formalized as a projection mapping. The coordinate projections are the mappings $\pi_t: R_{[0,1]} \mapsto \mathbb{R}$, where $\pi_t(x) = x(t)$. The projections define cylinder sets in $R$; for example, the set $\pi_t^{-1}(a)$, $a \in \mathbb{R}$, is the collection of all functions on $[0,1]$ which pass through the point of the plane with coordinates $(a,t)$. This sort of thing is familiar from §12.3, and the union or intersection of a collection of $k$ such cylinders with different coordinates is a $k$-dimensional cylinder; the difference is that the number of coordinates we have to choose from here is uncountable.

Let $\{t_1,\ldots,t_k\}$ be any finite collection of points in $[0,1]$, and let

$$\pi_{t_1,\ldots,t_k}(x) = (\pi_{t_1}(x),\ldots,\pi_{t_k}(x)) \in \mathbb{R}^k \tag{27.1}$$

denote the $k$-vector of projections from these coordinates. The sets of the collection

$$\mathcal{H} = \{\pi_{t_1,\ldots,t_k}^{-1}(B) \subseteq R_{[0,1]}: B \in \mathcal{B}^k,\ t_1,\ldots,t_k \in [0,1],\ k \in \mathbb{N}\} \tag{27.2}$$

are called the finite-dimensional sets of $R_{[0,1]}$. It is easy to verify that $\mathcal{H}$ is a field. The projection $\sigma$-field is defined as $\mathcal{P} = \sigma(\mathcal{H})$.

Fig. 27.1 shows a few of the elements of a rather simple $\mathcal{H}$-set, with $k = 1$, and $B$ an interval $(a,b]$ of $\mathbb{R}$. The set $H = \pi_{t_1}^{-1}((a,b]) \in \mathcal{H}$ consists of all those functions that succeed in passing through a hole of width $b - a$ in a barrier erected at the point $t_1$ of the interval. Similarly, the set of all the functions passing through holes in two such barriers, at $t_1$ and $t_2$, is the image under $\pi_{t_1,t_2}^{-1}$ of a rectangle in the plane, and so forth.

Fig. 27.1

If the domain of the function had been countable, the projection $\sigma$-field $\mathcal{P}$ would be effectively the same collection as $\mathcal{B}^{\infty}$ of 12.3. But since the domain is uncountable, $\mathcal{P}$ is strictly smaller than the Borel field on $R$. The sets of example 26.3 are Borel sets but are not in $\mathcal{P}$, since their elements are restricted at uncountably many points of the interval. As that example showed, the Borel sets of $R$ are not generally measurable; but $(R,\mathcal{P})$ is a measurable space, as we now show.

Define for $k = 1,2,3,\ldots$ the family of finite-dimensional p.m.s $\mu_{t_1,\ldots,t_k}$ on $(\mathbb{R}^k,\mathcal{B}^k)$, indexed on the collection of all the $k$-vectors of indices,

$$\{(t_1,\ldots,t_k): t_j \in [0,1],\ j = 1,\ldots,k\}.$$

This family will be required to satisfy two consistency properties. The first is

$$\mu_{t_1,\ldots,t_k}(E) = \mu_{t_1,\ldots,t_m}(E \times \mathbb{R}^{m-k}) \tag{27.3}$$

for $E \in \mathcal{B}^k$ and all $m > k > 0$. In other words, a $k$-dimensional distribution can be obtained from an $m$-dimensional distribution with $m > k$, by the usual operation of marginalization. This is simply the generalization to arbitrary collections of coordinates of condition (12.7). The second is

$$\mu_{t_1,\ldots,t_k} = \mu_{t_{p(1)},\ldots,t_{p(k)}}\varphi^{-1}, \tag{27.4}$$

where $p(1),\ldots,p(k)$ is a permutation of the integers $1,\ldots,k$ and $\varphi: \mathbb{R}^k \mapsto \mathbb{R}^k$ denotes the (measurable) transformation which reorders the elements of a $k$-vector according to the inverse permutation; that is, $\varphi(x_{p(1)},\ldots,x_{p(k)}) = x_1,\ldots,x_k$. This condition basically means that re-ordering the vector elements transforms the measure in the way we would expect if the indices were $1,\ldots,k$ instead of $t_1,\ldots,t_k$.
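A family for which both consistency conditions can be checked by hand is the zero-mean Gaussian family with covariance $\min(s,t)$, the finite-dimensional distributions of Brownian motion. This example is our own assumption and is not introduced at this point in the text. Since a zero-mean Gaussian law is determined by its covariance matrix, (27.3) reduces to the statement that deleting coordinates deletes the matching rows and columns, and (27.4) to the statement that permuting the time indices permutes rows and columns. The helper names below are hypothetical.

```python
def bm_cov(times):
    """Covariance matrix of (B(t_1), ..., B(t_k)) for standard Brownian motion."""
    return [[min(s, t) for t in times] for s in times]


def marginalize(cov, keep):
    """Covariance of the Gaussian marginal on the coordinate positions `keep`."""
    return [[cov[i][j] for j in keep] for i in keep]


def permute(cov, p):
    """Covariance after reordering the coordinates by the permutation p."""
    k = len(p)
    return [[cov[p[i]][p[j]] for j in range(k)] for i in range(k)]


times = [0.2, 0.5, 0.9]
# Analogue of (27.3): the marginal of the 3-dimensional law on the first two
# coordinates is the 2-dimensional member of the family.
print(marginalize(bm_cov(times), [0, 1]) == bm_cov(times[:2]))
# Analogue of (27.4): permuting coordinates gives the member of the family
# indexed by the permuted time points.
p = [2, 0, 1]
print(permute(bm_cov(times), p) == bm_cov([times[i] for i in p]))
```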

The following extends the consistency theorem, 12.4.

27.1 Theorem For any family of finite-dimensional p.m.s $\{\mu_{t_1,\ldots,t_k}\}$ satisfying conditions (27.3) and (27.4), there exists a unique p.m. $\mu$ on $(R,\mathcal{P})$, such that $\mu_{t_1,\ldots,t_k} = \mu\pi_{t_1,\ldots,t_k}^{-1}$ for each finite collection of indices.

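Proof Let $T$ denote the set of countable sequences of real numbers from $[0,1]$; that is, $\tau \in T$ if $\tau = \{s_j \in [0,1], j \in \mathbb{N}\}$. Define the projections $\pi_{\tau}: R \mapsto \mathbb{R}^{\infty}$ by

$$\pi_{\tau}(x) = (x(s_1), x(s_2), \ldots). \tag{27.5}$$

For any $\tau$, write $\nu_n^{\tau} = \mu_{s_1,\ldots,s_n}$ for $n = 1,2,\ldots$ Then by 12.4, which applies thanks to (27.3), there exist p.m.s $\nu^{\tau}$ on $(\mathbb{R}^{\infty},\mathcal{B}^{\infty})$ such that $\nu_n^{\tau} = \nu^{\tau}\pi_n^{-1}$, where $\pi_n(y)$ is the projection of the first $n$ coordinates of $y$, for $y \in \mathbb{R}^{\infty}$. Consistency requires that $\nu_n^{\tau} = \nu_n^{\tau'}$ if sequences $\tau$ and $\tau'$ have their first $n$ coordinates the same. Since evidently $\mathcal{P} \subseteq \{\pi_{\tau}^{-1}(B): B \in \mathcal{B}^{\infty}, \tau \in T\}$, we may define a p.m. $\mu$ on $(R,\mathcal{P})$ by setting

$$\mu(\pi_{\tau}^{-1}(B)) = \nu^{\tau}(B) \tag{27.6}$$

for each $B \in \mathcal{B}^{\infty}$. No extension is necessary here, since the measure is uniquely defined for each element of $\mathcal{P}$.

It remains to show that the family $\{\mu_{t_1,\ldots,t_k}\}$ corresponds to the finite-dimensional distributions of $\mu$. For any $t_1,\ldots,t_k$ there exists $\tau \in T$ such that $\{t_1,\ldots,t_k\} \subseteq \{s_1,\ldots,s_n\}$, for some $n$ large enough. Construct a mapping $\psi: \mathbb{R}^n \mapsto \mathbb{R}^k$ by first applying a permutation $p$ to the indices $s_1,\ldots,s_n$ which sets $x(s_{p(i)}) = x(t_i)$ for $i = 1,\ldots,k$, and then projecting from $\mathbb{R}^n$ to $\mathbb{R}^k$ by suppressing the indices $s_{p(k+1)},\ldots,s_{p(n)}$. The consistency properties imply that

$$\mu_{t_1,\ldots,t_k} = \mu_{s_1,\ldots,s_n}\psi^{-1} = \nu_n^{\tau}\psi^{-1} = \nu^{\tau}(\psi \circ \pi_n)^{-1} = \mu(\psi \circ \pi_n \circ \pi_{\tau})^{-1}. \tag{27.7}$$

Since $\psi \circ \pi_n \circ \pi_{\tau} = \pi_{t_1,\ldots,t_k}$, a projection, $\mu_{t_1,\ldots,t_k}$ is a finite-dimensional distribution of $\mu$. ∎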

If we have a scheme for assigning a joint distribution to any finite collection of coordinate functions $\{x(t_1),\ldots,x(t_k)\}$ with rational coordinates, this can be extended, according to the theorem, to define a unique measure on $(R,\mathcal{P})$. These p.m.s are called the finite-dimensional distributions of the stochastic process $x$. The sets generated by considering this vector of real r.v.s are elements of $\mathcal{H}$, and hence there is a corollary which exactly parallels 12.5.

27.2 Corollary $\mathcal{H}$ is a determining class for $(R,\mathcal{P})$. □


27.2 The Space C

Visualize an element of $C_{[0,1]}$, the space of continuous real-valued functions on $[0,1]$, as a curve drawn by the pen of a seismograph or similar instrument, as it traverses a sheet of paper of unit width, making arbitrary movements up and down, but never being lifted from the paper. Since $[0,1]$ is a compact set, the elements of $C_{[0,1]}$ are actually uniformly continuous.

To get an idea why distributions on $C_{[0,1]}$ might be of interest to us, imagine observing a realization of a stochastic sequence $\{S_j(\omega)\}_1^n$, from a probability space $(\Omega,\mathcal{F},P)$, for some finite $n$. A natural way to study these data is to display them on a page or a computer screen. We would typically construct a graph of $S_j$ against the integer values of $j$ from 1 to $n$ on the abscissa, the discrete points being joined up with ruled lines to produce a 'time plot', the kind of thing shown in Fig. 27.2.

Fig. 27.2

We will then have done rather more than just drawn a picture; by connecting the points we have defined a random continuous function, a random drawing (the word here operates in both its senses!) from the space $C_{[1,n]}$. It is convenient, and there is obviously no loss of generality, if instead of plotting the points at unit intervals we plot them at intervals of $1/(n - 1)$; in other words, let the width of the paper or computer screen be set at unity by choice of units of measurement. Also, relocating the origin at 0, we obtain by this means an element of $C_{[0,1]}$, a member of the subclass of piecewise linear functions, with formula

$$x(t) = (i - tm)x((i - 1)/m) + (1 + tm - i)x(i/m) \tag{27.8}$$

for $t \in [(i - 1)/m,\ i/m]$, and $i = 1,\ldots,m$, $m = n - 1$. The points $x(i/m) \in \mathbb{R}$ for $i = 0,\ldots,m$ are the $m + 1$ vertices of the function.

In effect, we have defined a measurable mapping between points of $\mathbb{R}^n$ and elements of $C_{[0,1]}$, and hence a family of distributions on $C_{[0,1]}$ derived from $(\Omega,\mathcal{F},P)$, indexed on $n$. The specific problem to be studied is the distribution of these graphs as $n$ tends to infinity, under particular assumptions about the sequence $\{S_j\}$. When $\{S_j\}$ is a sequence of scaled partial sums of independent or asymptotically independent random variables, we shall obtain a useful generalization of the central limit theorem.
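Formula (27.8) translates directly into code. The following is a minimal sketch: the function name `piecewise_linear` and the sample ordinates are hypothetical, and the only convention added is that the index $i$ is found as the ceiling of $tm$, clipped to $1,\ldots,m$ so that $t = 0$ falls in the first segment.

```python
import math


def piecewise_linear(values):
    """Return the function x on [0, 1] with vertices x(i/m) = values[i],
    i = 0, ..., m, joined by straight lines as in (27.8)."""
    m = len(values) - 1

    def x(t):
        # Segment index i with t in [(i - 1)/m, i/m].
        i = min(max(math.ceil(t * m), 1), m)
        return (i - t * m) * values[i - 1] + (1 + t * m - i) * values[i]

    return x


# Four plotted points, hence m = 3 segments (illustrative data only).
x = piecewise_linear([0.0, 1.0, -1.0, 2.0])
print(x(0.0), x(1 / 3), x(0.5), x(1.0))
```

The two weights $(i - tm)$ and $(1 + tm - i)$ are non-negative on the segment and sum to one, so $x(t)$ is a convex combination of the two adjacent vertices.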

As in §5.5, we metrize $C_{[0,1]}$ with the uniform metric

$$d_U(x,y) = \sup_t |x(t) - y(t)|. \tag{27.9}$$

Imagine tying two pens to a rod, so that moving the rod up and down as it traverses a sheet of paper draws a band of fixed width. The uniform distance $d_U(x,y)$ between two elements of $C_{[0,1]}$ is the width of the narrowest such band that will contain both curves at all points. We will henceforth tend to write $C$ for $(C_{[0,1]},d_U)$ when the context is clear.
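For two piecewise linear functions sharing the same vertices $i/m$, the difference $x - y$ is itself piecewise linear, so the supremum in (27.9) is attained at a vertex and the uniform distance can be computed exactly from the ordinates. The sketch below, with hypothetical function names and sample data, compares this exact value with a brute-force evaluation on a fine grid.

```python
def d_uniform_vertices(xs, ys):
    """Exact uniform distance between piecewise linear functions with
    common vertices i/m and ordinates xs[i], ys[i]."""
    return max(abs(a - b) for a, b in zip(xs, ys))


def d_uniform_grid(xs, ys, grid=1200):
    """Brute-force approximation of sup_t |x(t) - y(t)| on a grid of [0, 1]."""
    m = len(xs) - 1

    def interp(v, t):
        i = min(int(t * m), m - 1)   # left vertex of the segment containing t
        lam = t * m - i
        return (1 - lam) * v[i] + lam * v[i + 1]

    return max(abs(interp(xs, k / grid) - interp(ys, k / grid))
               for k in range(grid + 1))


xs = [0.0, 1.0, -1.0, 2.0]
ys = [0.5, 0.0, 0.0, 1.0]
print(d_uniform_vertices(xs, ys), d_uniform_grid(xs, ys))
```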

$C$ is a complete space by 5.24, and, since $[0,1]$ is compact, is also separable by 5.26(ii). In this case an approximating function for any element of $C$, fully determined by its values at a finite number of points of the interval (compare 5.25) is available in the form of a piecewise linear function. A set $\Pi_m = \{t_1,\ldots,t_m\}$ satisfying $0 = t_0 < t_1 < \ldots < t_m = 1$ is called a partition of $[0,1]$. This is a slight abuse of language, an abbreviated way of saying that the collection defines such a partition into subintervals, say, $A_i = [t_{i-1},t_i)$ for $i = 1,\ldots,m - 1$ together with $A_m = [t_{m-1},1]$. The norm

$$\|\Pi_m\| = \max_{1 \le i \le m} \{t_i - t_{i-1}\} \tag{27.10}$$

is called the fineness of the partition, and a refinement of $\Pi_m$ is any partition of which $\Pi_m$ is a proper subset. We could similarly refer to $\min_{1 \le i \le m}\{t_i - t_{i-1}\}$ as the coarseness of $\Pi_m$.

The following approximation lemma specializes 5.25 with the partition $\Pi_{2^n} = \{i/2^n, i = 1,\ldots,2^n\}$ for $n \ge 1$ playing the role of the $\delta$-net on the domain, with in this case $\delta < 1/2^n$.

27.3 Theorem Given $x \in C$, let $y_n \in C$ be piecewise linear, having $2^n + 1$ vertices, with

$$\max_{1 \le i \le 2^n} |x(2^{-n}i) - y_n(2^{-n}i)| < \tfrac{1}{2}\varepsilon. \tag{27.11}$$

There exists $n$ large enough that $d_U(x,y_n) < \varepsilon$.

Proof Write $A_i = [2^{-n}(i - 1),\ 2^{-n}i]$, $i = 1,\ldots,2^n$. (Inclusion of both endpoints is innocuous here.) Applying (27.8) we find that, for $t \in A_i$, $y_n(t) = \lambda y_n(t') + (1 - \lambda)y_n(t'')$ where $t' = 2^{-n}(i - 1)$, $t'' = 2^{-n}i$ and $\lambda = i - 2^n t$. Noting that

$$|x(t) - y_n(t)| \le \lambda|x(t) - x(t')| + (1 - \lambda)|x(t) - x(t'')| + \lambda|x(t') - y_n(t')| + (1 - \lambda)|x(t'') - y_n(t'')|, \tag{27.12}$$

and that for $n$ large enough, $\sup_{s,t \in A_i} |x(s) - x(t)| < \tfrac{1}{2}\varepsilon$ by continuity, it follows that for such $n$,

$$d_U(x,y_n) = \max_{1 \le i \le 2^n} \sup_{t \in A_i} |x(t) - y_n(t)| < \varepsilon. \tag{27.13}$$

∎
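The content of 27.3 can be seen numerically by interpolating a fixed continuous function at the dyadic points and watching the uniform error fall as $n$ grows. The sketch below assumes the particular choice $x(t) = \sin(2\pi t)$ and takes $y_n$ to agree with $x$ exactly at the vertices, so that the left-hand side of (27.11) is zero; the supremum is approximated on a finite grid, and the function names are hypothetical.

```python
import math


def dyadic_interpolant(x, n):
    """Piecewise linear y_n agreeing with x at the dyadic points i / 2**n."""
    m = 2 ** n
    verts = [x(i / m) for i in range(m + 1)]

    def y(t):
        i = min(max(math.ceil(t * m), 1), m)   # t lies in [(i-1)/m, i/m]
        lam = i - t * m
        return lam * verts[i - 1] + (1 - lam) * verts[i]

    return y


def sup_error(x, n, grid=4096):
    """Grid approximation of d_U(x, y_n)."""
    y = dyadic_interpolant(x, n)
    return max(abs(x(i / grid) - y(i / grid)) for i in range(grid + 1))


def wave(t):
    return math.sin(2 * math.pi * t)


for n in (1, 3, 6):
    print(n, sup_error(wave, n))
```

With $n = 1$ the vertices sit at the zeros of the sine wave, so $y_1$ is identically zero and the error is the full amplitude; refinement then drives the error towards zero.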


Note that as $n \to \infty$, $\Pi_{2^n} \to \mathbb{D}$ (the dyadic rationals). There is the following important implication.

27.4 Theorem If $x,y \in C$ and $x(t) = y(t)$ whenever $t \in \mathbb{D}$, then $x = y$.

Proof Let $z_n$ be piecewise linear with $z_n(t) = x(t) = y(t)$ for $t \in \Pi_{2^n}$. By assumption, such a $z_n$ exists for every $n \in \mathbb{N}$. Fix $\varepsilon$, and by taking $n$ large enough that $\max\{d_U(x,z_n),\ d_U(y,z_n)\} < \tfrac{1}{2}\varepsilon$, as is possible by 27.3, we can conclude by the triangle inequality that $d_U(x,y) < \varepsilon$. Since $\varepsilon$ is arbitrary it follows that $d_U(x,y) = 0$, and hence $x = y$ since $d_U$ is a metric. ∎

The continuity of certain elements of $R_{[0,1]}$, particularly the limits of sequences of functions, is a crucial feature of several of the limit arguments to follow. An important tool is the modulus of continuity of a function $x \in R_{[0,1]}$, the monotone function $w_x: (0,1] \mapsto \mathbb{R}^+$ defined by

$$w_x(\delta) = \sup_{|s-t|<\delta} |x(s) - x(t)|. \qquad (27.14)$$

$w_x$ has already been encountered in the more general context of the Arzelà-Ascoli theorem in §5.5. It tells us how rapidly $x$ may change over intervals of width $\delta$. Setting $\delta = 1$, for example, defines the range of $x$. But in particular, the fact that the $x$ are uniformly continuous functions implies that, for every $x \in C$,

$$w_x(\delta) \downarrow 0 \text{ as } \delta \downarrow 0. \qquad (27.15)$$

For fixed $\delta$, we may think of $w_x(\delta) = w(x,\delta)$ as a function on the domain $C$. Since $|w(x,\delta) - w(y,\delta)| \le 2d_U(x,y)$, $w(x,\delta)$ is continuous on $C$, and hence a measurable function of $x$.
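A grid version of the modulus of continuity is easy to compute, and it makes the Lipschitz bound $|w(x,\delta) - w(y,\delta)| \le 2d_U(x,y)$ concrete. The sketch below is illustrative only: the function name, the two sample functions and the restriction of the supremum to grid points are assumptions.

```python
def modulus_of_continuity(values, delta):
    """Grid analogue of w_x(delta) in (27.14).

    values[k] is assumed to hold x(k / n) with n = len(values) - 1; the
    supremum is taken only over grid points s, t with |s - t| < delta.
    """
    n = len(values) - 1
    best = 0.0
    for lag in range(1, n + 1):
        if lag / n >= delta:              # keep the inequality |s - t| < delta strict
            break
        for k in range(n + 1 - lag):
            best = max(best, abs(values[k + lag] - values[k]))
    return best

n = 100
x = [(k / n) ** 2 for k in range(n + 1)]                      # x(t) = t^2
y = [(k / n) ** 2 + 0.05 * (k / n) for k in range(n + 1)]     # a nearby function
d_u = max(abs(a - b) for a, b in zip(x, y))                   # grid version of d_U(x, y)

for delta in (0.02, 0.105, 0.2, 1.0):
    print(delta, modulus_of_continuity(x, delta), modulus_of_continuity(y, delta))
```

On this grid the modulus increases with $\delta$, and the two moduli never differ by more than `2 * d_u`.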

The following is the version of the Arzelà-Ascoli theorem relevant to $C$.

27.5 Theorem A set $A \subset C$ is relatively compact iff

$$\sup_{x\in A} |x(0)| < \infty, \qquad (27.16)$$

$$\lim_{\delta\to 0}\ \sup_{x\in A} w_x(\delta) = 0. \quad □ \qquad (27.17)$$

These conditions together impose total boundedness and uniform equicontinuity on $A$. Consider, for some $t \in [0,1]$ and $k \in \mathbb{N}$,

$$|x(t)| \le |x(0)| + \sum_{i=1}^{k} \left| x\!\left(\frac{it}{k}\right) - x\!\left(\frac{(i-1)t}{k}\right) \right|. \qquad (27.18)$$

Equality (27.17) implies that for large enough $k$, $\sup_{x\in A} w_x(1/k) < \infty$. Therefore (27.16) and (27.17) together imply that

$$\sup_{t}\ \sup_{x\in A} |x(t)| < \infty. \qquad (27.19)$$

In other words, all the elements of $A$ must be contained in a band of finite width around 0. This theorem is therefore a straightforward corollary of 5.28.


27.3 Measures on C

We now see how 27.1 specializes when we restrict the class of functions under consideration to the members of $C$. The open spheres of $C$ are sets with the form

$$S(x,r) = \{y \in C: d_U(x,y) < r\} \qquad (27.20)$$

for $x \in C$.

Fig. 27.3

Such sets can be visualized as a bundle of continuous graphs, with radius $r$ and the function $x$ at the core, traversing the unit interval - for example all the functions lying within the shaded band in Fig. 27.3. We shall write $\mathcal{B}_C$ for the Borel field on $C$, and since $(C,d_U)$ is separable each open set has a countable covering by open spheres and $\mathcal{B}_C$ can be thought of as the $\sigma$-field generated by the open spheres of $C$. Each open sphere can be represented as a countable union of closed spheres,

$$S(x,r) = \bigcup_{n=1}^{\infty} \bar{S}(x, r - 1/n), \qquad (27.21)$$

and hence $\mathcal{B}_C$ is also the $\sigma$-field generated from the closed spheres.

Now consider the coordinate projections on $C$. Happily we know these to be continuous (see 6.15), and hence the image of an open (closed) finite-dimensional rectangle under the inverse projection mapping is an open (closed) element of $\mathcal{P}$. Letting $\mathcal{H}_C = \{H \cap C: H \in \mathcal{H}\}$ with $\mathcal{H}$ defined in (27.1), and so defining $\mathcal{P}_C = \sigma(\mathcal{H}_C)$, we have the following important property:

27.6 Theorem $\mathcal{B}_C = \mathcal{P}_C$.

Proof Let

$$H_k(x,\alpha) = \Big\{y \in C: \max_{1\le i\le 2^k} |y(2^{-k}i) - x(2^{-k}i)| < \alpha\Big\} \in \mathcal{H}_C, \qquad (27.22)$$

and so let

$$\bigcap_{k=1}^{\infty} H_k(x,\alpha) = \Big\{y \in C: \sup_{t\in\mathbb{D}} |y(t) - x(t)| \le \alpha\Big\} \in \mathcal{P}_C, \qquad (27.23)$$

where $\mathbb{D}$ denotes the dyadic rationals. Note that we cannot rely on the inequality in (27.22) remaining strict in the limit, but we can say by 27.4 that

$$\bigcap_{k=1}^{\infty} H_k(x,\alpha) = \bar{S}(x,\alpha), \qquad (27.24)$$

where $\bar{S}$ is the closure of $S$. Using (27.21), we obtain

$$S(x,r) = \bigcup_{n=1}^{\infty} \bar{S}(x, r - 1/n). \qquad (27.25)$$

It follows that the open spheres of $C$ lie in $\mathcal{P}_C$, and so $\mathcal{B}_C \subseteq \mathcal{P}_C$.

Fig. 27.4

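To show $\mathcal{P}_C \subseteq \mathcal{B}_C$ consider, for $\alpha \in \mathbb{R}$ and $t_0 \in [0,1]$, functions $x_n \in C$ defined by the restriction to $[0,1]$ of the functions on $\mathbb{R}$,

$$x_n(t) = \begin{cases} \alpha + n(n + 1/n)(t + 1/n - t_0), & t_0 - 1/n \le t < t_0 \\ \alpha + n(n + 1/n)(t_0 + 1/n - t), & t_0 \le t < t_0 + 1/n \\ \alpha, & \text{otherwise.} \end{cases} \qquad (27.26)$$

Every element $y$ of the set $S(x_n,n) \in \mathcal{B}_C$ has the property $y(t_0) > \alpha$. (This is the shaded region in Fig. 27.4.) Note that

$$G(\alpha,t_0) = \{y \in C: \pi_{t_0}(y) > \alpha\} = \bigcup_{n=1}^{\infty} S(x_n,n) \in \mathcal{B}_C. \qquad (27.27)$$

Now, $G(\alpha,t_0)$ is an element of the collection $\mathcal{H}_{Ct_0}$ where for general $t$ we define

$$\mathcal{H}_{Ct} = \{\pi_t^{-1}(B),\ B \in \mathcal{B}\}. \qquad (27.28)$$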


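In words, the elements of $\mathcal{H}_{Ct}$ are the sets of continuous functions $x$ having $x(t) \in B$, for each $B \in \mathcal{B}$. In view of parts (ii) and (iii) of 1.2, and the fact that $\mathcal{B}$ can be generated by the collection of open half-lines $(\alpha,\infty)$, it is easy to see that $\mathcal{H}_{Ct}$ is the $\sigma$-field generated from the sets of the form $G(\alpha,t)$ for fixed $t$ and $\alpha \in \mathbb{R}$. Moreover, $\mathcal{P}_C$ is the $\sigma$-field generated by $\{\mathcal{H}_{Ct},\ t \in [0,1]\}$. Since $G(\alpha,t) \in \mathcal{B}_C$ for any $\alpha$ and $t$ by (27.27), it follows that $\mathcal{H}_{Ct} \subseteq \mathcal{B}_C$ for each $t$, and hence $\mathcal{P}_C \subseteq \mathcal{B}_C$. ∎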


It will be noted that the limit $x_\infty(t)$ of (27.26) is not an element of $C$, taking the value $\alpha$ at all points except $t_0$, and $+\infty$ at $t_0$. Of course, $\{x_n\}$ is not a Cauchy sequence. However, the countable union of open spheres in (27.27) is an open set (the inverse projection of the open half line) and omits this point.

$\mathcal{P}_C$ is the projection $\sigma$-field on $C$ with respect to arbitrary points of the continuum $[0,1]$, but consider the collection $\mathcal{P}'_C = \{H \cap C: H \in \mathcal{P}'\}$, where $\mathcal{P}'$ is the collection of cylinder sets of $R_{[0,1]}$ having rational coordinates as a base. In other words, the sets of $\mathcal{P}'$ contain functions whose values $x(t)$ are unrestricted except at rational $t$. Since elements of $C$ which agree on the rational coordinates agree everywhere by 27.4,

$$\mathcal{P}'_C = \mathcal{P}_C. \qquad (27.29)$$

This argument is just an alternative route to the conclusion (from 6.22) that $C$ is homeomorphic to a subset of $\mathbb{R}^\infty$. However, it is not true that $\mathcal{P} = \mathcal{P}'$, because $\mathcal{P}$ is generated from the projections of every point of the continuum $[0,1]$, and arbitrary functions can be distinct in spite of agreeing on rational $t$.

Evidently $(C,\mathcal{B}_C)$ is a measurable space, and according to 27.2 and 27.6, $\mathcal{H}_C$ is a determining class for the space. In other words, the finite-dimensional distributions of a space of continuous functions uniquely determine a p.m. on the space. Every p.m. on $R_{[0,1]}$ must satisfy the consistency conditions, but the elements of $C$ have the special property that $x(t_1)$ and $x(t_2)$ are close together whenever $t_1$ and $t_2$ are close together, and this puts a further restriction on the class of finite-dimensional distributions which can generate distributions on $C$. Such distributions must have the property that for any $\varepsilon > 0$, $\exists\ \delta > 0$ such that

$$|t_1 - t_2| < \delta \ \Rightarrow\ \mu(\{x: |x(t_1) - x(t_2)| < \varepsilon\}) = 1. \qquad (27.30)$$

The class of p.m.s in $(C,\mathcal{B}_C)$, whose finite-dimensional distributions satisfy this requirement, will be denoted $\mathcal{M}_C$. Note that, thanks to 26.14, we are able to treat $\mathcal{M}_C$ as a separable metric space. This fact will be most important below.

27.4 Brownian Motion

The original and best-known example of a p.m. on $C$, whose theory is due to Norbert Wiener (Wiener 1923), is also the one that matters most from our point of view, since in the theory of weak convergence it plays the role of the attractor measure which the Gaussian distribution plays on the line. It is in fact the natural generalization of that distribution to function spaces.

27.7 Definition Wiener measure $W$ is the p.m. on $(C,\mathcal{B}_C)$ having these properties:

(a) $W(x(0) = 0) = 1$;

(b) $W(x(t) \le a) = \dfrac{1}{\sqrt{2\pi t}} \displaystyle\int_{-\infty}^{a} e^{-\xi^2/2t}\, d\xi$, $\quad 0 < t \le 1$;

(c) for every partition $\{t_1,\dots,t_k\}$ of $[0,1]$, the increments $x(t_1) - x(t_0),\ x(t_2) - x(t_1),\ \dots,\ x(t_k) - x(t_{k-1})$ are totally independent. □

Parts (a) and (b) of the definition give the marginal distributions of the coordinate functions, while condition (c) fixes their joint distribution. Any finite collection of process coordinates $\{x(t_i),\ i = 1,\dots,k\}$ has the multivariate Gaussian distribution, with $x(t_j) \sim N(0,t_j)$, and $E(x(t_j)x(t_{j'})) = \min\{t_j,t_{j'}\}$. Hence, $x(t_1) - x(t_2) \sim N(0, |t_1 - t_2|)$, which agrees with the requirements of continuity. This full specification of the finite-dimensional distributions suffices to define a unique measure on $(C,\mathcal{B}_C)$. This does not amount to proving that such a measure exists, but we shall show this below; see 27.15.

$W$ may equally well be defined on the interval $[0,b)$ for any $b > 0$, including $b = \infty$, but the cases with $b \ne 1$ will not usually concern us here.

A random element distributed according to $W$ is called a Wiener process or a Brownian motion process. The latter term refers to the use of this p.m. as a mathematical model of the random movements of pollen grains suspended in water resulting from thermal agitation of the water molecules, first observed by the botanist Robert Brown. In practice, the terms Wiener process and Brownian motion tend to be used synonymously. The symbol $W$ conventionally stands for the p.m., and we also follow convention in using the symbol $B$ to denote a random element from the derived probability space $(C,\mathcal{B}_C,W)$. In terms of the underlying probability space $(\Omega,\mathcal{F},P)$ on which we assume $B: \Omega \mapsto C$ to be an $\mathcal{F}/\mathcal{B}_C$-measurable mapping, we have $W(E) = P(B \in E)$ for each set $E \in \mathcal{B}_C$.

The continuous graph of a random element of Brownian motion, $B(\omega)$ for $\omega \in \Omega$, is quite a remarkable object (see Fig. 27.5). It belongs to the class of geometrical forms named fractals (Mandelbrot 1983). These are curves possessing the property of self-similarity, meaning essentially that their appearance is invariant to scaling operations. It is straightforward to verify from the definition that if $B$ is a Brownian motion so is $B^*$, where

$$B^*(\omega,t) = k^{-1/2}\big(B(\omega, s + kt) - B(\omega, s)\big) \qquad (27.31)$$

for any $s \in [0,1)$ and $k \in (0, 1 - s]$. Varying $s$ and $k$ can be thought of as 'zooming in' on the portion of the process from $s$ to $s + k$.

The key property is the one contained in part (c) of the definition, that of independent increments. A little thought is required to see what this means. In the definition, the points $t_1,\dots,t_k$ may be arbitrarily close together. Considering a pair of points $t$ and $t + \Delta$, the increment $B(\omega,t + \Delta) - B(\omega,t)$ is Gaussian with variance $\Delta$, and independent of $B(\omega,t)$. Symmetry of the Gaussian density implies that

$$P\big(\omega: (B(\omega,t+\Delta) - B(\omega,t))(B(\omega,t) - B(\omega,t-\Delta)) < 0\big) = \tfrac{1}{2}$$

for $\Delta \le t \le 1 - \Delta$ and every $\Delta > 0$. This is compatible with continuity, but completely rules out smoothness; in any realization of the process, almost every point of the graph is a corner, and has no tangent. This property is also apparent when we attempt to differentiate $B(\omega)$. Note from the definition that

$$\frac{B(\omega,t+h) - B(\omega,t)}{h} \sim N(0, 1/h). \qquad (27.32)$$

The sequence of measures defined by letting $h \to 0$ in (27.32) is not uniformly tight, and fails to converge to any limit. To be precise, the probability that the difference quotients in (27.32) fall in any finite interval is zero in the limit, another way of saying that the sample path $x(t,\omega)$ is non-differentiable at $t$, almost surely.

A way to think about Brownian motion which makes its relation to the problem of weak convergence fairly explicit is as the limit of the sequence of partial sums of $n$ independent standard Gaussian r.v.s, scaled by $n^{-1/2}$. Note that

$$\xi_j(\omega) = n^{1/2}\big(B(\omega,j/n) - B(\omega,(j-1)/n)\big) \sim N(0,1) \qquad (27.33)$$

for $j = 1,\dots,n$ and $B(\omega,j/n) = n^{-1/2}\sum_{i=1}^{j}\xi_i(\omega)$. By taking $n$ large enough, we can express $B(\omega,t)$ in this form for any rational $t$, and by a.s. continuity of the process we may write

$$B(\omega,t) = \lim_{n\to\infty} n^{-1/2}\sum_{i=1}^{[nt]} \xi_i(\omega) \quad \text{a.s.} \qquad (27.34)$$

for any $t \in [0,1]$, where $[nt]$ denotes the integer part of $nt$.

Consider the expected sum of the absolute increments contributing to $B(t)$. According to 9.8, $|\xi_i|$ has mean $(2/\pi)^{1/2}$ and variance $1 - 2/\pi$, and so by independence of the increments the r.v. $A_n(t) = n^{-1/2}\sum_{i=1}^{[nt]}|\xi_i|$ has mean $[nt](2/n\pi)^{1/2} = m(t,n)$ (say) and variance at most $1 - 2/\pi$. Applying Chebyshev's inequality, we have that for $t > 0$,

$$P\big(A_n(t) > \tfrac{1}{2}m(t,n)\big) \ge P\big(|A_n(t) - m(t,n)| \le \tfrac{1}{2}m(t,n)\big) \ge 1 - \frac{4(1 - 2/\pi)}{m(t,n)^2}. \qquad (27.35)$$

Since $m(t,n) = O(n^{1/2})$, $A_n(t) \to \infty$ a.s. for all $t > 0$. This means that the random element $B(\omega)$ is a function of unbounded variation, almost surely. Since $\lim_{n\to\infty}A_n(t)$ is the total distance supposedly travelled by a Brownian particle as it traverses the interval from 0 to $t$, and this turns out to be infinite for $t > 0$, Brownian motion cannot be taken as a literal description of such things as particles undergoing thermal agitation. Rather, it provides a simple limiting approximation to actual behaviour when the increments are small.
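The growth of the absolute variation can be seen in a small simulation. This is only a sketch under stated assumptions: the function names and the seed are arbitrary choices, and Python's standard `random.Random.gauss` stands in for the i.i.d. $N(0,1)$ increments $\xi_i$.

```python
import math
import random

def scaled_absolute_variation(n, t, rng):
    """A_n(t) = n**-0.5 * (sum of |xi_i| for i <= [nt]), with xi_i drawn i.i.d. N(0, 1)."""
    k = int(n * t)
    return sum(abs(rng.gauss(0.0, 1.0)) for _ in range(k)) / math.sqrt(n)

def mean_m(n, t):
    """m(t, n) = [nt] * (2 / (n * pi)) ** 0.5, the mean of A_n(t)."""
    return int(n * t) * math.sqrt(2.0 / (n * math.pi))

rng = random.Random(12345)   # arbitrary seed, used only to make runs repeatable
for n in (100, 10_000, 250_000):
    print(n, round(scaled_absolute_variation(n, 1.0, rng), 2), round(mean_m(n, 1.0), 2))
```

Each simulated value stays close to $m(1,n)$, which itself grows like $n^{1/2}$, so the total path length of the approximating process diverges as the partition is refined.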

Standard Brownian motion is merely the leading member of an extensive family of a.s. continuous processes on $[0,1]$ having Gaussian characteristics. For example, if we multiply $B$ by a constant $\sigma > 0$, we obtain what is called a Brownian motion with variance $\sigma^2$. Adding the deterministic function $\mu t$ to the process defines a Brownian motion with drift $\mu$. Thus, $X(t) = \sigma B(t) + \mu t$ represents a family of processes having independent increments $X(t) - X(s) \sim N(\mu(t - s),\ \sigma^2|t - s|)$.

More elaborate generalizations of Brownian motion include the following.

27.8 Example Let $X(t) = B(t^{1+\beta})$ for $-1 < \beta < \infty$. $X$ is a Brownian motion which has been subjected to stretching and squeezing of the time domain. Like $B$, it is a.s. continuous with independent Gaussian increments. It can be thought of as the limit of a partial sum process whose increments have trending variance. Suppose $\xi_i(\omega) \sim N(0, (1+\beta)i^\beta)$, which means the variances are tending to 0 if $\beta < 0$, or to infinity if $\beta > 0$. Then $n^{-1-\beta}\sum_{i=1}^{[nt]}E(\xi_i^2) \to t^{1+\beta}$, and

$$n^{-(1+\beta)/2}\sum_{i=1}^{[nt]} \xi_i(\omega) \to B(\omega, t^{1+\beta}) \quad \text{a.s.} \quad □ \qquad (27.36)$$

27.9 Example Let $X(t) = \theta(t)B(t)$ where $\theta: [0,1] \mapsto \mathbb{R}$ is any continuous deterministic function, and $B$ is a Brownian motion. For $s < t$,

$$X(t) - X(s) = \theta(t)(B(t) - B(s)) + (\theta(t) - \theta(s))B(s), \qquad (27.37)$$

which means that the increments of this process, while Gaussian, are not independent. It can be thought of as the almost sure limit as $n \to \infty$ of a double partial sum process,

$$n^{-1/2}\sum_{i=1}^{[nt]}\Big(\theta(i/n)\xi_i + \big(\theta(i/n) - \theta((i-1)/n)\big)\sum_{j=1}^{i-1}\xi_j(\omega)\Big), \qquad (27.38)$$

where $\xi_i \sim N(0,1)$. □

27.10 Example Letting $B$ denote standard Brownian motion on $[0,\infty)$, define

$$X(t) = e^{-\beta t}B(e^{2\beta t}) \qquad (27.39)$$

for fixed $\beta > 0$. This is a zero-mean Gaussian process, having dependent increments like 27.9. The remarkable feature of this process is that it is stationary, with $X(t) \sim N(0,1)$ for all $t > 0$, and

$$E(X(t)X(s)) = e^{-\beta(t+s)}e^{2\beta\min\{t,s\}} = e^{-\beta|t-s|}. \qquad (27.40)$$

This is the Ornstein-Uhlenbeck process. □
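The stationarity claim can be checked directly from the Brownian covariance $E(B(u)B(v)) = \min\{u,v\}$. The short sketch below does this numerically; the function names and the sample values of $t$, $s$ and $\beta$ are illustrative assumptions.

```python
import math

def brownian_cov(u, v):
    """Covariance of standard Brownian motion: E(B(u)B(v)) = min(u, v)."""
    return min(u, v)

def ou_cov(t, s, beta):
    """E(X(t)X(s)) for X(t) = exp(-beta * t) * B(exp(2 * beta * t)), as in (27.39)."""
    scale = math.exp(-beta * (t + s))
    return scale * brownian_cov(math.exp(2 * beta * t), math.exp(2 * beta * s))

beta = 0.7
for t, s in [(0.3, 1.1), (5.3, 6.1), (2.0, 2.0)]:
    print(t, s, ou_cov(t, s, beta), math.exp(-beta * abs(t - s)))
```

The two printed columns agree for each pair, and pairs with the same gap $|t - s|$ give the same covariance, as stationarity requires.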

27.11 Example The Brownian bridge is the process $B^0 \in C$ where

$$B^0(t) = B(t) - tB(1), \quad t \in [0,1]. \qquad (27.41)$$

This is a Brownian motion tied down at both ends, and has $E(B^0(t)B^0(s)) = \min\{t,s\} - ts$. A natural way to think about $B^0$ is as the limit of the partial sums of a mean-deviation process, that is

$$B^0(t,\omega) = \lim_{n\to\infty} n^{-1/2}\sum_{i=1}^{[nt]}\Big(\xi_i(\omega) - \frac{1}{n}\sum_{j=1}^{n}\xi_j(\omega)\Big) \quad \text{a.s.} \qquad (27.42)$$

where $\xi_i(\omega) \sim N(0,1)$. □
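The covariance $\min\{t,s\} - ts$ already appears exactly at finite $n$ when $nt$ and $ns$ are integers. The sketch below verifies this with exact rational arithmetic by writing each mean-deviation partial sum as a weighted sum of the $\xi_j$; the function names are hypothetical, and the $\xi_j$ are assumed only to be uncorrelated with unit variance.

```python
from fractions import Fraction

def mean_deviation_weights(n, t):
    """Weights c_j such that n**-0.5 * sum_j c_j * xi_j is the partial sum in (27.42)."""
    k = int(n * t)                                   # [nt]
    return [(1 if j < k else 0) - Fraction(k, n) for j in range(n)]

def finite_n_covariance(n, t, s):
    """Exact covariance of the two partial sums for uncorrelated unit-variance xi_j."""
    a = mean_deviation_weights(n, t)
    b = mean_deviation_weights(n, s)
    return sum(p * q for p, q in zip(a, b)) / n

t, s = Fraction(1, 4), Fraction(3, 5)
print(finite_n_covariance(1000, t, s), min(t, s) - t * s)
```

Setting $t = 1$ makes every weight zero, which is the discrete counterpart of the bridge being tied down at the right-hand end.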


We have asserted the existence of Wiener measure, but we have not so far offered a proof. The consistency theorem (27.1) establishes the existence of a measure on $(R_{[0,1]},\mathcal{P})$ whose finite-dimensional distributions satisfy conditions (a)-(c) of 27.7, so we might attempt to construct a continuous process having these properties. Consider

$$Y_n(t,\omega) = n^{-1/2}\Big(\sum_{i=1}^{[nt]}\xi_i(\omega) + (nt - [nt])\xi_{[nt]+1}(\omega)\Big), \qquad (27.43)$$

where $\xi_i \sim N(0,1)$ and the set $\{\xi_1,\dots,\xi_n\}$ are totally independent. For given $\omega$, $Y_n(\cdot,\omega)$ is a piecewise linear function of the type sketched in Fig. 27.2, although with $Y_n(0,\omega) = 0$ (the $\xi_i$ represent the vertical distances from one vertex to the next), and is an element of $C$. $Y_n(t)$ is Gaussian with mean 0, and

$$E(Y_n(t)^2) = n^{-1}\big([nt] + (nt - [nt])^2\big) = t + n^{-1}\big(([nt] - nt) + (nt - [nt])^2\big) = t + K(n,t)/n, \qquad (27.44)$$

(say) where $|K(n,t)| < 2$. Moreover, the Gaussian pair $Y_n(t)$ and $Y_n(t + s + n^{-1}) - Y_n(t + n^{-1})$, $s > 0$, are independent. Extrapolating the same argument to general collections of non-overlapping increments, it becomes clear that $Y_n(t) \xrightarrow{D} N(0,t)$, and more generally that if $Y_n \xrightarrow{D} Y$, then $Y$ is a stochastic process whose finite-dimensional distributions match those of $W$. Fig. 27.5, which plots the partial sums of around 8000 (computer-generated) independent random numbers, shows the typical appearance of a realization of the process approaching the limit.
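The variance calculation for the interpolated partial-sum process can be reproduced exactly. In the sketch below, which uses hypothetical function names and rational arithmetic, the variance is the sum of squared weights on the independent $N(0,1)$ terms, and the remainder $K(n,t)$ works out to $f^2 - f$ with $f = nt - [nt]$, so it lies between $-1/4$ and $0$.

```python
from fractions import Fraction

def variance_of_Y(n, t):
    """Exact E(Y_n(t)**2) for the interpolated process: squared weights on independent N(0, 1) terms."""
    k = int(n * t)              # [nt]
    frac = n * t - k            # nt - [nt], the weight on xi_{[nt]+1}
    return (k + frac ** 2) / Fraction(n)

def K(n, t):
    """The remainder n * (E(Y_n(t)**2) - t)."""
    return n * (variance_of_Y(n, t) - t)

t = Fraction(1, 3)
for n in (1, 2, 5, 10, 100):
    print(n, variance_of_Y(n, t), K(n, t))
```

Since $K(n,t)$ is bounded, the variance converges to $t$ at rate $1/n$, which is all that the argument for the finite-dimensional distributions needs.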

Fig. 27.5

This argument does not show that the measure on $(C,\mathcal{B}_C)$ corresponding to $Y$ actually is $W$. There are attributes of the sample paths of the process which are not specified by the finite-dimensional distributions. According to the continuous mapping theorem, $Y_n \xrightarrow{D} Y$ would imply that $h(Y_n) \xrightarrow{D} h(Y)$ for any a.s. continuous function $h$. For example, $\sup_t|Y_n(t)|$ is such a function, and there are no grounds from the arguments considered above for supposing that $\sup_t|Y_n(t)| \xrightarrow{D} \sup_t|Y(t)|$.

However, if we are able to show that the sequence of measures corresponding to $Y_n$ converges to a unique limit, this can only be $W$, since the finite-dimensional cylinder sets of $C$ are a determining class for distributions on $(C,\mathcal{B}_C)$. This is what we were able to conclude from 27.6, in view of 27.1. This question is taken up in the next section, and the proof of existence will eventually emerge as a corollary to the main weak convergence result in §27.6.

27.5 Weak Convergence on C

Let $\{\mu_n\}$ be a sequence of probability measures in $\mathcal{M}_C$. For example, consider the distributions associated with a sequence like $\{Y_n,\ n \in \mathbb{N}\}$, whose elements are defined in (27.43). According to 26.22, the necessary and sufficient condition for the family $\{\mu_n\}$ to be compact, and hence to possess (by 5.10) a cluster point in $\mathcal{M}_C$, is that it is uniformly tight. Theorem 27.5 provides us with the relevant compactness criteria. The message of the following theorem is that the uniform tightness of measures on $C$ is equivalent to boundedness at the origin and continuity arising with sufficiently high probability, in the limit. Since tightness is the concentration of the mass of the distribution in a compact set, this is just a stochastic version of the Arzelà-Ascoli theorem.

27.12 Theorem (Billingsley 1968: th. 8.2) $\{\mu_n\}$ is uniformly tight iff there exists $N \in \mathbb{N}$ such that, for all $\eta > 0$ and for all $n \ge N$,

(a) there exists $M < \infty$ such that

$$\mu_n(\{x: |x(0)| > M\}) \le \eta; \qquad (27.45)$$

(b) for each $\varepsilon > 0$, there exists $\delta \in (0,1)$ such that

$$\mu_n(\{x: w_x(\delta) \ge \varepsilon\}) \le \eta. \quad □ \qquad (27.46)$$

Condition (b) is a form of stochastic equicontinuity (compare §21.3). It is easier to appreciate the connection with the notions of equicontinuity defined in §5.5 if we write it in the form $P(w(X_n,\delta) \ge \varepsilon) \le \eta$, where $\{X_n\}$ is the sequence of stochastic functions on $[0,1]$ having derived measures $\mu_n$. Asymptotic equicontinuity is sufficient in this application, and the conditions need hold only over $n \ge N$, for some finite $N$. Since $C$ is a separable complete space, each individual member of $\{\mu_n\}$ is tight, and for uniform tightness it suffices to show that the conditions hold 'in the tail'.

Proof of 27.12 To prove the necessity, let $\{\mu_n\}$ be uniformly tight, and for $\eta > 0$ choose a compact set $K$ with $\mu_n(K) > 1 - \eta$. By 27.5, there exist $M < \infty$ and $\delta \in (0,1)$ such that

$$K \subseteq \{x: |x(0)| \le M\} \cap \{x: w_x(\delta) < \varepsilon\} \qquad (27.47)$$

for any $\varepsilon > 0$. Applying the De Morgan law,

$$\eta \ge \mu_n(K^c) \ge \mu_n(\{x: |x(0)| > M\} \cup \{x: w_x(\delta) \ge \varepsilon\}) \ge \max\{\mu_n(\{x: |x(0)| > M\}),\ \mu_n(\{x: w_x(\delta) \ge \varepsilon\})\}. \qquad (27.48)$$

Hence (27.45) and (27.46) hold, for all $n \in \mathbb{N}$.

Write $\mu^*(\cdot)$ as shorthand for $\sup_{n\ge N}\mu_n(\cdot)$. To prove sufficiency, consider for $k = 1,2,\dots$ the sets

$$A_k = \{x: w_x(\delta_k) < 1/k\}, \qquad (27.49)$$

where $\{\delta_k\}$ is a sequence chosen so that $\mu^*(A_k^c) \le \theta/2^{k+1}$ for $\theta > 0$. This is possible by condition (b). Also set $B = \{x: |x(0)| \le M\}$, where $M$ is chosen so that $\mu^*(B^c) \le \theta/2$, which is possible by condition (a). Then define a closed set $K = \big(\bigcap_{k=1}^{\infty}A_k \cap B\big)^-$, and note that conditions (27.16) and (27.17) hold for the case $A = K$. Hence by 27.5, $K$ is compact. But

$$\mu^*(K^c) \le \mu^*\Big(\bigcup_{k=1}^{\infty}A_k^c \cup B^c\Big) \le \sum_{k=1}^{\infty}\mu^*(A_k^c) + \mu^*(B^c) \le \theta\sum_{k=1}^{\infty}\frac{1}{2^{k+1}} + \frac{\theta}{2} = \theta. \qquad (27.50)$$

This last inequality is to be read as $\sup_{n\ge N}\mu_n(K^c) \le \theta$, or equivalently $\inf_{n\ge N}\mu_n(K) \ge 1 - \theta$. Since $\theta$ is arbitrary, and every individual $\mu_n$ is tight by 26.19, in particular for $1 \le n < N$, it follows that the sequence $\{\mu_n\}$ is uniformly tight. ∎

The following lemma is a companion to the last result, supplying in conjunction with it a relatively primitive sufficient condition for uniform tightness.

27.13 Lemma (adapted from Billingsley 1968: th. 8.3) Suppose that, for some $\delta \in (0,1)$,

$$\sup_{0\le t\le 1-\delta}\mu_n\Big(\Big\{x: \sup_{t\le s\le t+\delta}|x(s) - x(t)| \ge \tfrac{1}{2}\varepsilon\Big\}\Big) \le \tfrac{1}{2}\eta\delta. \qquad (27.51)$$

Then (27.46) holds.

Proof Fixing $\delta$, consider the partition $\{t_1,\dots,t_r\}$ of $[0,1]$, for $r = 1 + [1/\delta]$, where $t_i = i\delta$ for $i = 1,\dots,r-1$ and $t_r = 1$. Thus, for $\tfrac{1}{2} < \delta < 1$ we have $r = 2$ and the partition $\{\delta, 1\}$, for $\tfrac{1}{3} < \delta \le \tfrac{1}{2}$ we have $r = 3$ and the partition $\{\delta, 2\delta, 1\}$, and so on. The width of these intervals is at most $\delta$. A given interval $[t,t']$ with $|t' - t| \le \delta$ must either lie within an interval of the partition, or at most overlap two adjoining intervals; it cannot span three or more. In the event that $|x(t') - x(t)| \ge \varepsilon$, $x$ must change absolutely by at least $\tfrac{1}{2}\varepsilon$ in at least one of the interval(s) overlapping $[t,t']$, and the probability of the latter event is at least that of the former. In other words, considering all such intervals,

$$\mu_n(\{x: w_x(\delta) \ge \varepsilon\}) \le \mu_n\Big(\bigcup_{i=1}^{r}\Big\{x: \sup_{s,s'\in[t_{i-1},t_i]}|x(s') - x(s)| \ge \tfrac{1}{2}\varepsilon\Big\}\Big) \le \sum_{i=1}^{r}\mu_n\Big(\Big\{x: \sup_{s,s'\in[t_{i-1},t_i]}|x(s') - x(s)| \ge \tfrac{1}{2}\varepsilon\Big\}\Big) \le r(\tfrac{1}{2}\eta\delta) \le \eta, \qquad (27.52)$$

where the third of these inequalities applies (27.51), and the final one follows because $r\delta < 2$. ∎
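The partition used in the proof is simple to construct, and its two properties (interval widths at most $\delta$, and $r\delta < 2$) can be confirmed mechanically. The following sketch uses exact rational arithmetic; the function names are illustrative assumptions, and $t_0 = 0$ is taken as the left end of the first interval.

```python
from fractions import Fraction
import math

def lemma_partition(delta):
    """Return [t_1, ..., t_r] with r = 1 + [1/delta], t_i = i * delta for i < r, and t_r = 1."""
    r = 1 + math.floor(1 / delta)
    return [i * delta for i in range(1, r)] + [Fraction(1)]

def widths(points):
    """Widths of the intervals [t_{i-1}, t_i], taking t_0 = 0."""
    previous = [Fraction(0)] + points[:-1]
    return [b - a for a, b in zip(previous, points)]

for delta in (Fraction(3, 4), Fraction(2, 5), Fraction(1, 7), Fraction(3, 100)):
    points = lemma_partition(delta)
    r = len(points)
    print(delta, r, max(widths(points)) <= delta, r * delta < 2)
```

The bound $r\delta < 2$ is what converts the per-interval probability $\tfrac{1}{2}\eta\delta$ into the overall bound $\eta$.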

These results provoke a technical query over measurability. In §21.1 we indicated difficulties with standard measure theory in showing that functions such as $\sup_{t\le s\le t+\delta}|x(s) - x(t)|$ in (27.51), and $w_x(\delta)$ in (27.46), are random variables. However, it is possible to show that sets such as the one in (27.51) are $\mathcal{F}$-analytic, and hence nearly measurable. In other words, complacency about this issue can be justified. The same qualification can be taken as implicit wherever such sets arise below.

27.6 The Functional Central Limit Theorem

Let $S_{n0} = 0$ and $S_{nj} = \sum_{i=1}^{j}U_{ni}$ for $j = 1,\dots,n$, where $\{U_{ni}\}$ is a zero-mean stochastic array, normalized so that $E(S_{nn}^2) = 1$. As in the previous applications of array notation, in Part V and elsewhere, the leading example is $U_{ni} = U_i/s_n$, where $\{U_i\}$ is a zero-mean sequence and $s_n^2 = E(\sum_{i=1}^{n}U_i)^2$. Define an element $Y_n$ of $C_{[0,1]}$, somewhat as in (27.43) above, as follows:

$$Y_n(t) = S_{n,[nt]} + (nt - [nt])U_{n,[nt]+1} = S_{n,j-1} + (nt - j + 1)U_{nj} \quad \text{for } (j-1)/n \le t < j/n,\ j = 1,\dots,n; \qquad (27.53)$$

$$Y_n(1) = S_{nn}. \qquad (27.54)$$

This is the type of process sketched in Fig. 27.2. The question of whether the distribution of $Y_n$ possesses a weak limit as $n \to \infty$ is the one we now address.

The interpolation terms in $Y_n(t)$ are necessary to generate a continuous function, but from an algebraic point of view they are a nuisance; dropping them, we obtain

$$X_n(t) = S_{n,[nt]} = S_{n,j-1} \quad \text{for } (j-1)/n \le t < j/n,\ j = 1,\dots,n, \qquad (27.55)$$

$$X_n(1) = S_{nn}. \qquad (27.56)$$
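The two processes are easy to build from a single row of the array. The sketch below, in which the function names and the sample row `u` are illustrative assumptions, evaluates the step version and the interpolated version at a point and shows that they differ by at most the size of one increment.

```python
def partial_sums(u):
    """Return [S_n0, S_n1, ..., S_nn] for the row u = [U_n1, ..., U_nn]."""
    sums = [0.0]
    for value in u:
        sums.append(sums[-1] + value)
    return sums

def step_process(u, t):
    """X_n(t) = S_{n,[nt]}: the partial sum without the interpolation term."""
    return partial_sums(u)[int(len(u) * t)]

def interpolated_process(u, t):
    """Y_n(t) = S_{n,[nt]} + (nt - [nt]) * U_{n,[nt]+1}: the continuous version."""
    n = len(u)
    k = int(n * t)
    if k >= n:                       # t = 1: no interpolation term is needed
        return partial_sums(u)[n]
    return partial_sums(u)[k] + (n * t - k) * u[k]

u = [0.5, -0.25, 0.125, 0.25]        # hypothetical row with n = 4
for t in (0.0, 0.375, 0.5, 0.9, 1.0):
    print(t, step_process(u, t), interpolated_process(u, t))
```

The gap between the two values at any $t$ is $(nt - [nt])|U_{n,[nt]+1}|$, so it is negligible whenever the largest increment is.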

If conditions of the type discussed in Chapters 23 and 24 are imposed on $\{U_{ni}\}$, $X_n(1) \xrightarrow{D} N(0,1)$ as $n \to \infty$. If for example $U_i \sim$ i.i.d.$(0,\sigma^2)$, so that $U_{ni} = U_i/s_n$ where $s_n = n^{1/2}\sigma$, this is just the Lindeberg-Levy theorem. However, the Lindeberg-Levy theorem yields additional conclusions which are less often remarked; it is easy to verify that, for each distinct pair $t_1,t_2 \in [0,1]$,

$$X_n(t_2) - X_n(t_1) \xrightarrow{D} N(0, |t_2 - t_1|). \qquad (27.57)$$

Since non-overlapping partial sums of independent variates are independent, we find for example that, for any $0 \le t_1 < t_2 < t_3 \le 1$, $X_n(t_2) - X_n(t_1)$ and $X_n(t_3) - X_n(t_2)$ converge to a pair of independent Gaussian variates with variances $t_2 - t_1$ and $t_3 - t_2$, so that their sum $X_n(t_3) - X_n(t_1)$ is asymptotically Gaussian with variance $t_3 - t_1$, as required. Under our assumptions,

$$|Y_n(t) - X_n(t)| = (nt - [nt])|U_{n,[nt]+1}| \xrightarrow{pr} 0, \qquad (27.58)$$

so that $Y_n(t)$ and $X_n(t)$ have the same asymptotic distribution. Since $Y_n(0) = 0$, the finite-dimensional distributions of $Y_n$ converge to those of a Brownian motion process as $n \to \infty$.

As noted in §27.4, this is not a sufficient condition for the convergence of the p.m.s of $Y_n$ to Wiener measure. But with the aid of 27.12 we can prove that $\{Y_n\}$ is uniformly tight, and hence that the sequence has at least one cluster point in $\mathcal{M}_C$. Since all such points must have the finite-dimensional distributions of $W$, and the finite-dimensional cylinders are a determining class for $(C,\mathcal{B}_C)$, $W$ must be the weak limit of the sequence. This convergence will be expressed either by writing $\mu_n \Rightarrow W$, or, more commonly in what follows, by $Y_n \xrightarrow{D} B$.

This type of result is called a functional central limit theorem (FCLT), although the term invariance principle is also used. The original FCLT for i.i.d. increments (the generalization of the Lindeberg-Levy theorem) is known as Donsker's theorem (Donsker 1951). Using the results of previous chapters, in particular 24.3, we shall generalize the theorem to the case of a heterogeneously distributed martingale difference, although the basic idea is the same.

27.14 Theorem Let $Y_n$ be defined by (27.53) and (27.54), where $\{U_{ni},\mathcal{F}_{ni}\}$ is a martingale difference array with variance array $\{\sigma_{ni}^2\}$, and $\sum_{i=1}^{n}\sigma_{ni}^2 = 1$. If

(a) $\sum_{i=1}^{n}U_{ni}^2 \xrightarrow{pr} 1$,

(b) $\max_{1\le i\le n}|U_{ni}| \xrightarrow{pr} 0$,

(c) $\lim_{n\to\infty}\sum_{i=1}^{[nt]}\sigma_{ni}^2 = t$, for all $t \in [0,1]$,

then $Y_n \xrightarrow{D} B$. □

Conditions (a) and (b) reproduce the corresponding conditions of 24.3, and their role is to establish the finite-dimensional distributions of the process, via the conventional CLT. Condition (c) is a global stationarity condition (see §13.2) which has no counterpart in the CLT conditions of Chapter 24. Its effect is to rule out cases such as 24.10 and 24.11. By simple subtraction, the condition is sufficient for

$$\lim_{n\to\infty}\sum_{i=[nt]+1}^{[ns]}\sigma_{ni}^2 = s - t, \qquad (27.59)$$

for $0 \le t < s \le 1$.

Proof of 27.14 Conditions 24.3(a) and 24.3(b) are satisfied, on writing $U_{ni}$ for $X_{nt}$. In view of the last remarks, the finite-dimensional distributions of $Y_n$ converge to those of $W$, and it remains to prove that $\{Y_n\}$ is uniformly tight (i.e., that the sequence of p.m.s of the $Y_n$ is uniformly tight).

Define, for positive integers $k$ and $m$ with $k + m \le n$, $s_{nkm}^2 = \sum_{i=k+1}^{k+m}\sigma_{ni}^2 = E(S_{n,k+m} - S_{nk})^2$, where $S_{n,k+j} - S_{nk} = \sum_{i=k+1}^{k+j}U_{ni}$. The maximal inequality for martingales in 15.14 implies that, for $\lambda > 0$,

$$P\Big(\max_{1\le j\le m}|S_{n,k+j} - S_{nk}| > \lambda s_{nkm}\Big) \le \frac{E|S_{n,k+m} - S_{nk}|^p}{\lambda^p s_{nkm}^p}. \qquad (27.60)$$

In particular, set $k = [nt]$ and $m = [n\delta]$ for fixed $\delta \in (0,1)$ and $t \in [0,1-\delta]$, so that $m$ increases with $n$, and then we may say that $(S_{n,k+m} - S_{nk})/s_{nkm} \xrightarrow{D} Z \sim N(0,1)$, by 24.3. For given positive numbers $\eta$ and $\varepsilon$, choose $\lambda$ satisfying

$$\lambda > \max\{\varepsilon/8,\ 256E|Z|^3/\eta\varepsilon^2\}, \qquad (27.61)$$

and consider the case $\delta = \varepsilon^2/64\lambda^2 < 1$. There must exist $N_0 \ge 1$ for which, with $n \ge N_0$, the Gaussian approximation is sufficiently close that

$$\frac{E|S_{n,k+m} - S_{nk}|^3}{\lambda^3 s_{nkm}^3} \le \frac{\eta\varepsilon^2}{256\lambda^2} = \tfrac{1}{4}\eta\delta. \qquad (27.62)$$

Also observe from (27.59) that $\lim_{n\to\infty}s_{nkm}^2 = \delta$. For the choice of $\delta$ indicated there exists $N_1 \ge 1$ such that $\lambda s_{nkm} < \tfrac{1}{4}\varepsilon$ for $n \ge N_1$, and hence, combining (27.60) for $p = 3$ with (27.62), such that

$$P\Big(\max_{1\le j\le m}|S_{n,k+j} - S_{nk}| \ge \tfrac{1}{4}\varepsilon\Big) \le \tfrac{1}{4}\eta\delta, \qquad (27.63)$$

for $n \ge \max\{N_0,N_1\}$. Now,

$$Y_n(s) - Y_n(t) = S_{n,[ns]} - S_{n,[nt]} + R_n(s,t), \qquad (27.64)$$

for $s > t$, from (27.53) and (27.54), where

$$R_n(s,t) = (ns - [ns])U_{n,[ns]+1} - (nt - [nt])U_{n,[nt]+1}. \qquad (27.65)$$

For $t \in [0,1-\delta]$, there exists $s' \in [t, t+\delta]$ such that

$$|Y_n(s') - Y_n(t)| = \sup_{t\le s\le t+\delta}|Y_n(s) - Y_n(t)|. \qquad (27.66)$$

There also exists $N_2 \ge 1$ such that, for $n \ge N_2$, $[ns'] \le [nt] + [n\delta]$ and hence

$$|S_{n,[ns']} - S_{n,[nt]}| \le \max_{1\le j\le[n\delta]}|S_{n,[nt]+j} - S_{n,[nt]}|. \qquad (27.67)$$

It follows (also invoking the triangle inequality) that for $n \ge N_2$,

$$|Y_n(s') - Y_n(t)| = |S_{n,[ns']} - S_{n,[nt]} + R_n(s',t)| \le \max_{1\le j\le[n\delta]}|S_{n,[nt]+j} - S_{n,[nt]}| + |R_n(s',t)|. \qquad (27.68)$$

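By condition (b) of the theorem, $P(\sup_{s,t}|R_n(s,t)| \ge \tfrac{1}{4}\varepsilon) \to 0$ as $n \to \infty$, and hence there exists $N_3 \ge 1$ such that, for $n \ge N_3$,

$$P(|R_n(s',t)| \ge \tfrac{1}{4}\varepsilon) \le \tfrac{1}{4}\eta\delta. \qquad (27.69)$$

Inequalities (27.69) and (27.63) jointly imply that, for all $t \in [0,1-\delta]$ and $n \ge N^* = \max\{N_0,N_1,N_2,N_3\}$,

$$\begin{aligned} P(|Y_n(s') - Y_n(t)| \ge \tfrac{1}{2}\varepsilon) &\le P\Big(\max_{1\le j\le[n\delta]}|S_{n,[nt]+j} - S_{n,[nt]}| + |R_n(s',t)| \ge \tfrac{1}{2}\varepsilon\Big) \\ &\le P\Big(\Big\{\max_{1\le j\le[n\delta]}|S_{n,[nt]+j} - S_{n,[nt]}| \ge \tfrac{1}{4}\varepsilon\Big\} \cup \{|R_n(s',t)| \ge \tfrac{1}{4}\varepsilon\}\Big) \\ &\le P\Big(\max_{1\le j\le[n\delta]}|S_{n,[nt]+j} - S_{n,[nt]}| \ge \tfrac{1}{4}\varepsilon\Big) + P(|R_n(s',t)| \ge \tfrac{1}{4}\varepsilon) \\ &\le \tfrac{1}{2}\eta\delta. \end{aligned} \qquad (27.70)$$

The conclusion may be written as

$$\sup_{0\le t\le 1-\delta}P\Big(\sup_{t\le s\le t+\delta}|Y_n(s) - Y_n(t)| \ge \tfrac{1}{2}\varepsilon\Big) \le \tfrac{1}{2}\eta\delta, \quad n \ge N^*. \qquad (27.71)$$

Note that (27.51) is identical with (27.71) for the case $\mu_n(A) = P(Y_n \in A)$, and that $\eta$ and $\varepsilon$ are arbitrary. Therefore, uniform tightness of the corresponding sequence of measures follows by 27.12 and 27.13. This completes the proof. ∎

We conclude this section with the result promised in §27.4:

27.15 Corollary Wiener measure exists. □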

The existence is actually proved in 27.14, since we derived a unique limiting distribution which satisfied the specifications of 27.7. The points which are conveniently highlighted by a separate statement are that the tightness argument developed to prove 27.14 holds independently of the existence of $W$ as such, and that the central limit theorem plays no role in the proof of existence.


Proof Consider the process $Y_n$ of (27.43). We shall show that, on putting $U_{ni} = n^{-1/2}\xi_i$, conditions 27.14(a), (b), and (c) are satisfied. It will follow by the reasoning of 27.12 that the associated sequence of measures is uniformly tight, and possesses a limit. This limit has been shown above by direct calculation to have the finite-dimensional distributions specified by 27.7, which will conclude the proof.

Condition 27.14(c) holds by construction. Condition 27.14(a) follows from an application of the weak law of large numbers (e.g. Khinchine's theorem), recalling that since $\xi_i$ is Gaussian $\{\xi_i^2\}$ is an independent sequence possessing all its moments. Finally, condition 27.14(b) holds by 23.16 if the collection $\{\xi_1,\dots,\xi_n\}$ satisfy the Lindeberg condition, which is obvious given their Gaussianity and 23.10. ∎

27.7 The Multivariate Case

We would like to extend these results to vector-valued processes, and there is no difficulty in extending the approach of §25.3. Define the space $C_{[0,1]^m}$, which we write as $C^m$ for brevity, as the space of continuous vector functions

$$x = (x_1,\dots,x_m)': [0,1]^m \mapsto \mathbb{R}^m,$$

where $[0,1]^m$ and $\mathbb{R}^m$ are the Cartesian products of $m$ copies of $[0,1]$ and $\mathbb{R}$ respectively. $C^m$ is itself the product of $m$ copies of $C$. It can be endowed with a metric such as

$$d_U^m(x,y) = \max_{1\le j\le m}\{d_U(x_j,y_j)\}, \qquad (27.72)$$

which induces the product topology, and coordinate projections remain continuous. Since $C$ is separable, $C^m$ is also separable by 6.16, and $\mathcal{B}_C^m = \mathcal{B}_C \otimes \mathcal{B}_C \otimes \dots \otimes \mathcal{B}_C$ (the $\sigma$-field generated by the open rectangles of $[0,1]^m$) is the Borel field of $C^m$ by $m$-fold iteration of 26.5. $(C^m,\mathcal{B}_C^m)$ is therefore a measurable space.

Let

$$\mathcal{H}^m = \{\pi_{t_1,\dots,t_k}^{-1}(B) \subseteq C^m: B \in \mathcal{B}^{mk},\ t_1,\dots,t_k \in [0,1],\ k \in \mathbb{N}\}, \qquad (27.73)$$

denote the finite-dimensional sets of $C^m$. Again thanks to the product topology, $\mathcal{H}^m$ is the field generated from the sets in the product of $m$ copies of $\mathcal{H}_C$.

27.16 Theorem $\mathcal{H}_C^m$ is a determining class for $(C^m, \mathcal{B}_C^m)$.

Proof An open sphere in $\mathcal{B}_C^m$ is a set

$$S(x,\alpha) = \{y \in C^m: d_U^m(x,y) < \alpha\} = \Big\{y \in C^m: \max_{1\le j\le m}\sup_t |y_j(t) - x_j(t)| < \alpha\Big\}. \tag{27.74}$$

The set

$$H_k(x,\alpha) = \Big\{y \in C^m: \max_{1\le j\le m}\max_{1\le i\le 2^k} |y_j(2^{-k}i) - x_j(2^{-k}i)| < \alpha\Big\} \tag{27.75}$$

is an element of $\mathcal{H}_C^m$. It follows by the argument of 27.6 that

$$S(x,r) = \bigcup_{n=1}^{\infty}\bigcap_{k=1}^{\infty} H_k(x, r - 1/n) \in \sigma(\mathcal{H}_C^m), \tag{27.76}$$

and hence, that $\mathcal{B}_C^m \subseteq \sigma(\mathcal{H}_C^m)$. Since $\mathcal{H}_C^m$ is a field, the result follows by the extension theorem. ■

It is also straightforward to show that $\mathcal{B}_C^m = \mathcal{P}_C^m$, by a similar generalization from 27.6, but the above is all that is required for the present purpose.

A leading example of a measure on $(C^m, \mathcal{B}_C^m)$ is $W^m$, the p.m. of $m$-dimensional standard Brownian motion. An $m$-vector $B$ distributed according to $W^m$ has as its elements $m$ mutually independent Brownian motions, such that

$$B(t) \sim N(0, tI_m), \tag{27.77}$$

where $I_m$ is the $m \times m$ identity matrix, and the process has independent increments with

$$E\big((B(s) - B(t))(B(s) - B(t))'\big) = (s - t)I_m \tag{27.78}$$

for $0 \le t < s \le 1$. The following general result can now be proved.

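27.17 Theorem Let $\{U_{ni}, \mathcal{F}_{ni}\}$ be an $m$-vector martingale difference array with variance matrix array $\{\Sigma_{ni}\}$, such that $\sum_{i=1}^{n}\Sigma_{ni} = I_m$. Then let

$$Y_n(t) = S_{n,j-1} + (nt - j + 1)U_{nj} \quad \text{for } (j-1)/n \le t < j/n, \tag{27.79}$$

for $j = 1,\ldots,n$, and $Y_n(1) = S_{nn}$, where $S_{n0} = 0$ and

$$S_{nj} = \sum_{i=1}^{j} U_{ni}, \quad j = 1,\ldots,n. \tag{27.80}$$

If

(a) $\sum_{i=1}^{n} U_{ni}U_{ni}' \xrightarrow{pr} I_m$,

(b) $\max_{1\le i\le n} U_{ni}'U_{ni} \xrightarrow{pr} 0$,

(c) $\lim_{n\to\infty}\sum_{i=1}^{[nt]}\Sigma_{ni} = tI_m$, for all $t \in [0,1]$,

then $Y_n \xrightarrow{D} B$. □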
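The construction in (27.79) and (27.80) can be sketched numerically. The following is a minimal illustration only, assuming i.i.d. Gaussian increments scaled by $n^{-1/2}$ (one simple array satisfying the variance normalization); the function name `interpolated_partial_sum` and the sample sizes are hypothetical choices, not part of the text.

```python
import numpy as np

def interpolated_partial_sum(u, t):
    """Evaluate Y_n(t) as in (27.79) for an (n, m) array of increments u."""
    n = u.shape[0]
    # s[j] holds S_nj, with s[0] = S_n0 = 0, as in (27.80).
    s = np.vstack([np.zeros(u.shape[1]), np.cumsum(u, axis=0)])
    if t >= 1.0:
        return s[n]                      # Y_n(1) = S_nn
    j = int(np.floor(n * t)) + 1         # index with (j - 1)/n <= t < j/n
    # u[j - 1] is U_nj because Python arrays are zero-based.
    return s[j - 1] + (n * t - j + 1) * u[j - 1]

# Assumed example: i.i.d. N(0, I_m / n) increments, so the variances sum to I_m.
rng = np.random.default_rng(0)
n, m = 200, 2
u = rng.standard_normal((n, m)) / np.sqrt(n)
y_half = interpolated_partial_sum(u, 0.5)
```

At the grid points $t = j/n$ the function returns the partial sum $S_{nj}$, and between grid points it interpolates linearly, which is what keeps the path in $C^m$.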

Proof Consider for an $m$-vector $\lambda$ of unit length the scalar process $\lambda'Y_n$, having increments $\lambda'U_{ni}$. By definition, $\{\lambda'U_{ni}, \mathcal{F}_{ni}\}$ is a scalar martingale difference array with variance sequence $\lambda'\Sigma_{ni}\lambda$. It is easily verified that all the conditions of 27.14 are satisfied, and so $\lambda'Y_n \xrightarrow{D} B$. This holds for any choice of $\lambda$. In particular, $\lambda'Y_n(t) \xrightarrow{D} N(0,t)$, with similar conclusions regarding all the finite-dimensional distributions of the process. It follows by the Cramér-Wold theorem that

$$Y_n(t) \xrightarrow{D} N(0, tI_m), \tag{27.81}$$

with similar conclusions regarding all the finite-dimensional distributions of the process; these are identical to the finite-dimensional distributions of $W^m$. Since $\mathcal{H}_C^m$ is a determining class for $(C^m, \mathcal{B}_C^m)$, any weak limit of the p.m.s of $\{Y_n\}$ can only be $W^m$. It remains to show that these p.m.s are uniformly tight. But this is true provided the marginal p.m.s of the process are uniformly tight, by 26.23. Picking $\lambda$ to be the $j$th column of $I_m$ for $j = 1,\ldots,m$ and applying the argument of 27.14 shows that this condition holds, and completes the proof. ■

The arguments of §25.3 can be extended to convert 27.17 into an unusually powerful limit result. The conditions of the theorem are easily generalized, by replacing 27.17(c) by

$$\lim_{n\to\infty}\sum_{i=1}^{[nt]}\Sigma_{ni} = t\Sigma,$$

where $\Sigma$ is an arbitrary variance matrix. Defining $L^-$ such that $L^-\Sigma L^{-\prime} = I_m$ as in 25.6, 27.17 holds for the transformed vector process $Z_n = L^-Y_n$. The limit of the process $Y_n$ itself can then be determined by applying the continuous mapping theorem. This is a linear combination of independent Brownian motions, the finite-dimensional distributions of which are jointly Gaussian by 11.13. We call it an $m$-dimensional correlated Brownian motion, having covariance matrix $\Sigma$, and denoted $B(\Sigma)$. The result is written in the form

$$Y_n \xrightarrow{D} B(\Sigma). \tag{27.82}$$

An invariance principle can be used in this way to convert propositions about dependence between stochastic processes converging to Brownian motion into more tractable results about correlation in large samples. Given an arbitrarily related set of such processes, there always exist linear combinations of the set which are asymptotically independent of one another.²⁸
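The whitening step $Z_n = L^-Y_n$ can be illustrated with a short sketch. It assumes a hypothetical $2 \times 2$ covariance matrix and takes $L^-$ to be the inverse of the Cholesky factor of $\Sigma$; this is one valid choice satisfying $L^-\Sigma L^{-\prime} = I_m$, and not necessarily the construction used in 25.6.

```python
import numpy as np

sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])           # hypothetical covariance matrix

# Assumed choice of L^-: if sigma = C C' (Cholesky), then
# C^{-1} sigma C^{-1}' = I, so C^{-1} satisfies the requirement.
chol = np.linalg.cholesky(sigma)
l_minus = np.linalg.inv(chol)

rng = np.random.default_rng(1)
n = 5000
# Increments with covariance sigma / n, so the partial sums have variance t * sigma.
u = rng.multivariate_normal(np.zeros(2), sigma / n, size=n)
y_path = np.cumsum(u, axis=0)            # Y_n evaluated at t = j/n
z_path = y_path @ l_minus.T              # Z_n = L^- Y_n, row by row
```

Multiplying `z_path` by the transpose of `chol` recovers `y_path`, which mirrors the use of the continuous mapping theorem to pass from the limit of $Z_n$ back to the limit of $Y_n$.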


28
Cadlag Functions

28.1 The Space D

The proof of the FCLT in the last chapter was made more complicated by the presence of terms necessary to ensure that the random functions under consideration lay in the space $C$. Since these terms were shown to be asymptotically negligible, it might reasonably be asked whether they are needed. Why not, in other words, work directly with $X_n$ of (27.55) and (27.56), instead of $Y_n$ of (27.53) and (27.54)? Fig. 28.1 shows (apart from the omission of the point $X_n(1)$, to be explained below) the graph of the process $X_n$ corresponding to the $Y_n$ sketched in Fig. 27.2. $X_n$ as shown does not lie in $C_{[0,1]}$, but it does lie in $D_{[0,1]}$, the space of cadlag functions on the unit interval (see 5.27), of which $C_{[0,1]}$ is a subset. Henceforth, we will write $D$ to mean $D_{[0,1]}$ when there is no risk of confusion with other usages.

Fig. 28.1

As shown in 26.3, $D$ is not a separable space under the uniform metric, which means that the convergence theory of Chapter 26 will not apply to $(D,d_U)$. $d_U$ is not the only metric that can be defined on $D$, and it is worth investigating alternatives because, once the theory can be shown to work on $D$ in the same kind of way that it does on $C$, a great simplification is achieved.

Abandoning $d_U$ is not the only way of overcoming measurability problems. Another approach is simply to agree to exclude the pathological cases from the field of events under consideration. This can be achieved by working with the $\sigma$-field $\mathcal{P}_D$, the restriction to $D$ of the projection $\sigma$-field (see §27.1). In contrast with the case of $C$, $\mathcal{P}_D \subset \mathcal{B}_D$ (compare 27.6) and all the awkward cases such as uncountable discrete subsets are excluded from $\mathcal{P}_D$, while all the ones likely to arise in our theory (which exclusively concerns convergence to limit points lying in $C$) are included. Studying measures on the space $((D,d_U),\mathcal{P}_D)$ is an interesting line of attack, proposed originally by Dudley (1966, 1967) and described in detail in the book by Pollard (1984).

While this approach represents a large potential simplification (much of the present chapter could be dispensed with), an early decision has to be made about which line to adopt; there is little overlap between this theory and the methods pioneered by Skorokhod (1956, 1957), Prokhorov (1956), and Billingsley (1968), which involve metrizing $D$ as a separable complete space. Although the technical overheads of the latter approach are greater, it has the advantage that, once the investment is made, the probabilistic environment is familiar; at whatever remove, one is still working in an analogue of Euclidean space for which all sorts of useful topological and metric properties are known to hold. There is scope for debate on the relative merits of the two approaches, but we follow the majority of subsequent authors who take their cue from Billingsley's work.

The possibility of metrizing $D$ as a separable space depends crucially on the fact that in $D$ the permitted departures from continuity are of a relatively limited kind. The only ones possible are jump discontinuities (also called 'discontinuities of the first kind'): points $t$ at which $|x(t) - x(t-)| > 0$. There is no possibility of isolated discontinuity points $t$ at which both $|x(t) - x(t-)|$ and $|x(t) - x(t+)|$ are positive, because that would contradict right-continuity. There is however the possibility that $x(1)$ is isolated; it will be necessary to discard this point, and let $x(1) = x(1-)$. This is a little unfortunate, but since we shall be studying convergence to a limit lying in $C_{[0,1]}$ (e.g., $B$), it will not change anything material. We adopt the following definition.

28.1 Definition $D_{[0,1]}$ is the space of functions satisfying the following conditions:

(a) $x(t+)$ exists for $t \in [0,1)$;
(b) $x(t-)$ exists for $t \in (0,1]$;
(c) $x(t) = x(t+)$, $t < 1$, and $x(1) = x(1-)$. □

The first theorem shows how, under these conditions, the maximum number of jumps is limited.

28.2 Theorem There exists, for all $x \in D$ and every $\varepsilon > 0$, a finite partition $\{t_1,\ldots,t_r\}$ of $[0,1]$ with the property

$$\sup_{s,t \in [t_{i-1},t_i)} |x(t) - x(s)| < \varepsilon \tag{28.1}$$

for each $i = 1,\ldots,r$. □

Proof This is by showing that $t_r = 1$ for a collection $\{t_1,\ldots,t_r\}$ satisfying (28.1), with $t_0 = 0$. For given $x$ and $\varepsilon$ let $\tau = \sup\{t_r\}$, the supremum being taken over all these collections. Since $x(t-)$ exists for all $t > 0$, $\tau$ belongs to the set; that is, there exists $r$ such that $\tau = t_r$.

Suppose $t_r < 1$, and consider the point $t_r + \delta \le 1$, for some $\delta > 0$. By definition of $t_r$, $|x(t_r) - x(t_{r-1})| \ge \varepsilon$. Hence consider the interval $[t_r, t_r + \delta)$. By choice of $\delta$ we can ensure by right continuity that $|x(t_r + \delta) - x(t_r)| < \varepsilon$. Hence there exists an $(r+1)$-fold collection satisfying the conditions of the theorem. We must have $\tau \ge t_{r+1} = t_r + \delta$, and the assertion that $t_r = \tau$ is contradicted. It follows that $t_r = 1$. ■

This elementary but slightly startling result shows that the number of jump points at which $|x(t) - x(t-)|$ exceeds any given positive number is at most finite. The number of jumps such that $|x(t) - x(t-)| > 1/n$ is finite for every $n$, and the entire set of discontinuities is a countable union of finite sets, hence countable. Further, we see that

$$\sup_t |x(t)| < \infty, \tag{28.2}$$

since for any $t \in [0,1]$, $x(t)$ is expressible according to (28.1) as a finite sum of finite increments.

The modulus of continuity $w_x(\delta)$ in (27.14) provides a means of discriminating between functions in $C_{[0,1]}$ and functions outside the space. For just the same reasons, it is helpful to have a means of discriminating between cadlag functions and those with arbitrary discontinuities. For $\delta \in (0,1)$, let $\Pi_\delta$ denote a partition $\{t_1,\ldots,t_r\}$ with $r \le [1/\delta]$ and $\min_i\{t_i - t_{i-1}\} > \delta$, and then define

$$w_x'(\delta) = \inf_{\Pi_\delta}\Big\{\max_{1\le i\le r}\ \sup_{s,t \in [t_{i-1},t_i)} |x(t) - x(s)|\Big\}. \tag{28.3}$$

Let's attempt to say this in English! $w_x'(\delta)$ is the smallest value, over all partitions of $[0,1]$ coarser than $\delta$, of the largest change in $x$ within an interval of the partition. This notion differs from, and weakens, that of $w_x(\delta)$, in that $w_x'(\delta)$ can be small even if the points $t_i$ are jump points such that $w_x(\delta)$ would be large. For $\delta < \frac{1}{2}$ there is always a partition $\Pi_\delta$ in which $t_i - t_{i-1} < 2\delta$ for each $i$, so that for any $x \in D$,

$$w_x'(\delta) \le w_x(2\delta) \tag{28.4}$$

for $\delta < \frac{1}{2}$. So obviously, $\lim_{\delta\to 0} w_x'(\delta) = 0$ for any $x \in C$. On the other hand,

$$\lim_{\delta\to 0} w_x'(\delta) = 0 \tag{28.5}$$

is a property which holds for elements of $D$, but not for more general functions.

28.3 Theorem If and only if $x \in D$, $\exists\,\delta$ such that $w_x'(\delta) < \varepsilon$, for any $\varepsilon > 0$. □

Proof Sufficiency is immediate from 28.2. Necessity follows from the fact that if $x \notin D$ there is a point other than 1 at which $x$ is not right-continuous; in other words, a point $t$ at which $|x(t) - x(t+)| \ge \varepsilon$ for some $\varepsilon > 0$. Choose arbitrary $\delta$ and consider (28.3). If $t \ne t_i$ for any $i$, then $w_x'(\delta) \ge \varepsilon$ by definition. But even if $t = t_i$ for some $i$, $t_i \in [t_i, t_{i+1})$ and $|x(t_i) - x(t_i+)| \ge \varepsilon$, and again $w_x'(\delta) \ge \varepsilon$. ■

28.2 Metrizing D

Recall the difficulty presented by the existence of uncountable discrete sets in $(D,d_U)$, such as the sets of functions

$$x_\theta(t) = \begin{cases} 0, & 0 \le t < \theta \\ 1, & \theta \le t \le 1, \end{cases} \tag{28.6}$$

the case of (5.43) with $a = 0$ and $b = 1$. We need a topology in which $x_\theta$ and $x_{\theta'}$ are regarded as close when $|\theta - \theta'|$ is small. Skorokhod (1956) devised a metric with this property.

Let $\Lambda$ denote the collection of all homeomorphisms $\lambda: [0,1] \mapsto [0,1]$ with $\lambda(0) = 0$ and $\lambda(1) = 1$; think of these as the set of increasing graphs connecting the opposite corners of the unit square (see Fig. 28.2). The Skorokhod J1 metric is defined as

$$d_S(x,y) = \inf_{\lambda\in\Lambda}\Big\{\varepsilon > 0: \sup_t |\lambda(t) - t| \le \varepsilon,\ \sup_t |x(t) - y(\lambda(t))| \le \varepsilon\Big\}. \tag{28.7}$$

In his 1956 paper Skorokhod proposes four metrics, denoted J1, J2, M1, and M2. We shall not be concerned with the others, and will refer to $d_S$, as is customary, as 'the' Skorokhod metric.

Fig. 28.2

It is easy to verify that $d_S$ is a metric, if you note that $\sup_t |\lambda(t) - t| = \sup_t |t - \lambda^{-1}(t)|$ and $\sup_t |x(t) - y(\lambda(t))| = \sup_t |x(\lambda^{-1}(t)) - y(t)|$, where $\lambda^{-1} \in \Lambda$ if $\lambda \in \Lambda$. While in the uniform metric two functions are close only if their vertical separation is uniformly small, the Skorokhod metric also takes into account the possibility that the horizontal separation is small. If $x$ is uniformly close to $y$ except that it jumps slightly before or slightly after $y$, the functions would be considered close as measured by $d_S$, if not by $d_U$.

Consider $x_\theta$ in (28.6), and another element $x_{\theta+\delta}$. The uniform distance between these elements is 1, as noted above. To calculate the Skorokhod distance, note that the quantity in braces in (28.7) will be 1 for any $\lambda$ for which $\lambda(\theta) \ne \theta + \delta$. Confining consideration to the subclass of $\Lambda$ with $\lambda(\theta) = \theta + \delta$, choose a case where $|\lambda(t) - t| \le \delta$ (for example, the graph $\{t, \lambda(t)\}$ obtained by joining the three points $(0,0)$, $(\theta, \theta + \delta)$, and $(1,1)$ with straight lines will fulfil the definition) and hence,

$$d_S(x_\theta, x_{\theta+\delta}) = \delta. \tag{28.8}$$

This distance approaches zero smoothly as $\delta \downarrow 0$, which might conform better to our intuitive idea of 'proximity' than the uniform metric in these circumstances.
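The calculation leading to (28.8) can be checked on a grid. This is a minimal sketch under assumed values $\theta = 0.25$ and $\delta = 0.125$ (dyadic, so the grid arithmetic is exact); it evaluates only the particular piecewise-linear $\lambda$ described above and does not search over all of $\Lambda$.

```python
import numpy as np

theta, delta = 0.25, 0.125                # assumed dyadic values
t = np.linspace(0.0, 1.0, 1025)           # grid containing theta and theta + delta

def step(location, s):
    """Step function of (28.6): 0 before the jump, 1 from the jump onwards."""
    return (s >= location).astype(float)

# Piecewise-linear lambda through (0,0), (theta, theta + delta), (1,1).
lam = np.where(
    t < theta,
    t * (theta + delta) / theta,
    (theta + delta) + (t - theta) * (1.0 - theta - delta) / (1.0 - theta),
)

# Uniform distance between the two step functions (no time change).
uniform_gap = np.max(np.abs(step(theta, t) - step(theta + delta, t)))
# The two quantities appearing inside the braces of (28.7) for this lambda.
time_shift = np.max(np.abs(lam - t))
value_gap = np.max(np.abs(step(theta, t) - step(theta + delta, lam)))
```

For this $\lambda$ the vertical discrepancy vanishes and the time distortion equals $\delta$, while the uniform distance stays at 1, in line with the discussion around (28.8).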

28.4 Theorem On $C$, $d_S$ and $d_U$ are equivalent metrics.

Proof Obviously $d_S(x,y) \le d_U(x,y)$, since the latter corresponds to the case where $\lambda$ is the identity function in (28.7). On the other hand, for any $\lambda$,

$$d_U(x,y) \le \sup_t |x(t) - y(\lambda(t))| + \sup_t |y(\lambda(t)) - y(t)|. \tag{28.9}$$

Suppose $y$ is uniformly continuous. For every $\varepsilon > 0$ there must exist $\delta > 0$ such that, if $d_S(x,y) < \delta$ (and hence $\sup_t |\lambda(t) - t| < \delta$), then $\sup_t |y(\lambda(t)) - y(t)| < \varepsilon$. In other words,

$$d_S(x,y) < \delta \implies d_U(x,y) < \delta + \varepsilon. \tag{28.10}$$

The criteria of (5.5) and (5.6) are therefore satisfied. Uniform continuity is equivalent to continuity on $[0,1]$, and so the stated inequalities hold for all $y \in C$. ■

The following result explains our interest in the Skorokhod metric.

28.5 Theorem $(D,d_S)$ is separable.

Proof As usual, this is shown by exhibiting a countable dense subset. The counterpart in $D$ of the piecewise linear function defined for $C$ is the piecewise constant function (as in Fig. 28.1) defined as

$$y(t) = y(t_i), \quad t \in [t_i, t_{i+1}), \quad i = 0,\ldots,m-1, \tag{28.11}$$

where the $y(t_i)$ are specified real numbers. For some $n \in \mathbb{N}$, define the set $A_n$ as the countable collection of the piecewise constant functions of form (28.11), with $t_i = i/2^n$ for $i = 0,\ldots,2^n - 1$, and $y(t_i)$ assuming rational values for each $i$. Letting $A$ denote the limit of the sequence $\{A_n\}$, $A$ is a set of functions taking rational values at a set of points indexed on the dyadic rationals $\mathbb{D}$, and hence is countable by 1.5.

According to 28.2, there exists for $x \in D$ a finite partition $\{t_1,\ldots,t_m\}$ of $[0,1]$, such that, for each $i$,

$$\sup_{s,t \in [t_{i-1},t_i)} |x(s) - x(t)| < \varepsilon.$$

Let $y$ be a piecewise constant function constructed on the same intervals, assuming rational values $y_1,\ldots,y_m$ where $y_i$ differs by no more than $\varepsilon$ from a value assumed by $x$ on $[t_i,t_{i+1})$. Then, $d_S(x,y) < 2\varepsilon$. Now, given $n \ge 1$, choose $z \in A_n$ such that $z_j = y_i$ when $j/2^n \in [t_i,t_{i+1})$. Since $\mathbb{D}$ is dense in $[0,1]$, $d_S(y,z) \to 0$ as $n \to \infty$. Hence, $d_S(x,z) \le d_S(x,y) + d_S(y,z)$ is tending to a value not exceeding $2\varepsilon$. Since by taking $m$ large enough $\varepsilon$ can be made as small as desired, $x$ is a closure point of $A$. And since $x$ was arbitrary, we have shown that $A$ is dense in $D$. ■

Notice how this argument would fail under the uniform metric in the cases where $x$ has discontinuities at one or more of the points $t_i$. Then, $d_U(y,z)$ will be small only if the two sets of intervals overlap precisely, such that $t_i = j/2^n$ for some $j$. If $t_i$ were irrational, this would not occur for any finite $n$, since $j/2^n$ is rational. Under these circumstances $x$ would fail to be a closure point of $A$. This shows why we need the Skorokhod topology (that is, the topology induced by the Skorokhod metric) to ensure separability.

Working with $d_S$ will none the less complicate matters somewhat. For one thing, $d_S$ does not generate the Tychonoff topology, and the coordinate projections are not in general continuous mappings. The fact that $x$ and $y$ are close in the Skorokhod metric does not imply that $x(t)$ is close to $y(t)$ for every $t$, the examples of $x_\theta$ and $x_{\theta+\delta}$ cited above being a case in point. We must therefore find alternative ways of showing that the projections are measurable.

Fig. 28.3

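And there is another serious problem: $(D,d_S)$ is not complete. This is easily seen by considering the sequence of elements $\{x_n\}$ where

$$x_n(t) = \begin{cases} 1, & t \in [\tfrac{1}{2}, \tfrac{1}{2} + \tfrac{1}{n}) \\ 0, & \text{otherwise} \end{cases} \tag{28.12}$$

(see Fig. 28.3). The limit of this sequence is a function having an isolated point of discontinuity at $\frac{1}{2}$, and hence is not in $D$. However, to calculate $d_S(x_n,x_m)$, $\lambda$ must be chosen so that $\lambda(\frac{1}{2}) = \frac{1}{2}$ and $\lambda(\frac{1}{2} + \frac{1}{n}) = \frac{1}{2} + \frac{1}{m}$; the distance is 1 for any other choice. The piecewise-linear graph with vertices at $(0,0)$, $(\frac{1}{2},\frac{1}{2})$, $(\frac{1}{2} + \frac{1}{n}, \frac{1}{2} + \frac{1}{m})$, and $(1,1)$ fulfils the definition, and satisfies (28.7). It appears that $d_S(x_n,x_m) = |\frac{1}{n} - \frac{1}{m}|$, and so $\{x_n\}$ is a Cauchy sequence.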

28.3 Billingsley's Metric

The solution to this problem is to devise a metric that is equivalent to $d_S$ (in the sense of generating the same topology, and hence a separable space) but in which sequences such as the one in (28.12) are not Cauchy sequences. Ingenious alternatives have been suggested by different authors. The following is due to Billingsley (1968), from which source the results of this section and the next are adapted.

Let $\Lambda$ be the collection of homeomorphisms $\lambda$ from $[0,1]$ to $[0,1]$ with $\lambda(0) = 0$ and $\lambda(1) = 1$, and satisfying

$$\|\lambda\| = \sup_{t \ne s}\left|\log\frac{\lambda(t) - \lambda(s)}{t - s}\right| < \infty. \tag{28.13}$$

Here, $\|\cdot\|: \Lambda \mapsto \mathbb{R}^+$ is a functional measuring the maximum deviation of the gradient of $\lambda$ from 1, so that in particular $\|\lambda\| = 0$ for the case $\lambda(t) = t$. The set $\Lambda$ is like the one defined for the Skorokhod metric with the added proviso that $\|\lambda\|$ be finite; both $\lambda$ and $\lambda^{-1}$ must be strictly increasing functions. Then define

$$d_B(x,y) = \inf_{\lambda\in\Lambda}\Big\{\varepsilon > 0: \|\lambda\| \le \varepsilon,\ \sup_t |x(t) - y(\lambda(t))| \le \varepsilon\Big\}. \tag{28.14}$$

We review the essential properties of $d_B$.

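28.6 Theorem $d_B$ is a metric.

Proof $d_B(x,y) = 0$ iff $x = y$ is immediate. $d_B(x,y) = d_B(y,x)$ is also easy once it is noted that $\|\lambda^{-1}\| = \|\lambda\|$. To show the triangle inequality, note that

$$\begin{aligned}
\|\lambda_1\| + \|\lambda_2\| &\ge \sup_{t \ne s}\left\{\left|\log\frac{\lambda_1(t) - \lambda_1(s)}{t - s}\right| + \left|\log\frac{\lambda_2(t') - \lambda_2(s')}{t' - s'}\right|\right\} \\
&\ge \sup_{t \ne s}\left|\log\frac{\lambda_1(t) - \lambda_1(s)}{t - s} + \log\frac{\lambda_2(t') - \lambda_2(s')}{t' - s'}\right| \\
&= \sup_{t \ne s}\left|\log\frac{(\lambda_1(t) - \lambda_1(s))(\lambda_2(t') - \lambda_2(s'))}{(t - s)(t' - s')}\right|
\end{aligned} \tag{28.15}$$

for arbitrary $t'$ and $s'$. On setting $t' = \lambda_1(t)$ and $s' = \lambda_1(s)$, we obtain

$$\|\lambda_1\| + \|\lambda_2\| \ge \|\lambda_2 \circ \lambda_1\|, \tag{28.16}$$

where $\lambda_2 \circ \lambda_1(t) = \lambda_2(\lambda_1(t))$, and $\lambda_2 \circ \lambda_1$ is clearly an element of $\Lambda$. Since

$$\sup_t |x(t) - z(\lambda_2(\lambda_1(t)))| \le \sup_t |x(t) - y(\lambda_1(t))| + \sup_t |y(t) - z(\lambda_2(t))| \tag{28.17}$$

by the ordinary triangle inequality for points of $\mathbb{R}$, the condition $d_B(x,z) \le d_B(x,y) + d_B(y,z)$ follows from the definition. ■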

Next we explore the relationship between $d_S$ and $d_B$, and verify that they are equivalent metrics. Inequalities going in both directions can be derived provided the distances are sufficiently small. Given functions $x$ and $y$ for which $d_B(x,y) = \varepsilon < \frac{1}{2}$, consider $\lambda \in \Lambda$ satisfying the definition of $d_B$ for this pair, such that, in particular, $\|\lambda\| \le \varepsilon$. Since $\lambda(0) = 0$, there evidently must exist $t \in (0,1]$ such that $|\log(\lambda(t)/t)| \le \|\lambda\|$, or

$$e^{-\varepsilon} \le \frac{\lambda(t)}{t} \le e^{\varepsilon}. \tag{28.18}$$

Using the series expansion of $e^{\varepsilon}$, we find $e^{\varepsilon} - 1 \le 2\varepsilon$ for $\varepsilon \le \frac{1}{2}$, and $e^{-\varepsilon} - 1 \ge -2\varepsilon$ similarly, which implies that

$$-2\varepsilon \le \frac{\lambda(t)}{t} - 1 \le 2\varepsilon \tag{28.19}$$

or, $|\lambda(t) - t| \le 2\varepsilon$. And in view of our assumption about $\lambda$, $\sup_t |x(t) - y(\lambda(t))| \le \varepsilon$ and hence $d_S(x,y)$ cannot exceed $2\varepsilon$. In other words,

$$d_S(x,y) \le 2d_B(x,y) \tag{28.20}$$

whenever $d_B(x,y) < \frac{1}{2}$.
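The two exponential bounds used between (28.18) and (28.20) are easy to confirm numerically. The sketch below checks them on an assumed grid of 500 values in $(0, \frac{1}{2}]$; the grid is an arbitrary illustration and is no substitute for the series argument.

```python
import math

# Grid of epsilon values in (0, 1/2]; the spacing 0.001 is an arbitrary choice.
grid = [k / 1000 for k in range(1, 501)]

# e^eps - 1 <= 2*eps and e^(-eps) - 1 >= -2*eps on the grid.
upper_ok = all(math.exp(e) - 1.0 <= 2.0 * e for e in grid)
lower_ok = all(math.exp(-e) - 1.0 >= -2.0 * e for e in grid)

# Largest value of (e^eps - 1) / eps on the grid, to show the slack in the bound.
worst_ratio = max((math.exp(e) - 1.0) / e for e in grid)
```

The ratio is increasing in $\varepsilon$, so its largest value on the grid occurs at $\varepsilon = \frac{1}{2}$ and stays below 2, leaving room in the constant of (28.20).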

Now consider a function $\mu \in \Lambda$ which is piecewise-linear with vertices at the points of a partition $\Pi_\delta$, as defined above (28.3) for a suitable choice of $\delta$ to be specified. The slope of $\mu$ is equal to $(\mu(t_i) - \mu(t_{i-1}))/(t_i - t_{i-1})$ on the intervals $[t_{i-1},t_i)$, where $t_i - t_{i-1} > \delta$. Notice that, if $\sup_t |\mu(t) - t| \le \delta^2$,

$$\left|\frac{\mu(t_i) - \mu(t_{i-1})}{t_i - t_{i-1}} - 1\right| \le \frac{|\mu(t_i) - t_i| + |\mu(t_{i-1}) - t_{i-1}|}{t_i - t_{i-1}} \le 2\delta. \tag{28.21}$$

For $|x| < \frac{1}{2}$, the series expansion $\log(1 + x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \cdots$ implies

$$|\log(1 + x)| \le \max\{|x|, |x - x^2|\} \le 2|x|. \tag{28.22}$$

Substituting for $x$ in (28.22) the quantity whose absolute value is the minorant side of (28.21), we must conclude that, if $\sup_t |\mu(t) - t| \le \delta^2$ for $0 < \delta \le \frac{1}{4}$, then

$$\|\mu\| \le 4\delta. \tag{28.23}$$

Now, suppose $d_S(x,y) = \delta^2$, which means there exists $\lambda \in \Lambda$ satisfying $\sup_t |\lambda(t) - t| \le \delta^2$ and $\sup_t |x(t) - y(\lambda(t))| \le \delta^2$. Choose $\mu$ as the piecewise linear function with $\mu(t_i) = \lambda(t_i)$ for $i = 0,\ldots,r$. The function $\lambda^{-1}\mu$ is 'tied down' to the diagonal at the points of the partition; that is, it is increasing on the intervals $[t_{i-1},t_i)$ with $\lambda^{-1}\mu(t) \in [t_{i-1},t_i)$ if and only if $t \in [t_{i-1},t_i)$. Therefore, choosing $\Pi_\delta$ to correspond to the definition of $w_x'(\delta)$, we can say

$$|x(t) - y(\mu(t))| \le |x(t) - x(\lambda^{-1}\mu(t))| + |x(\lambda^{-1}\mu(t)) - y(\mu(t))| \le w_x'(\delta) + \delta^2. \tag{28.24}$$

Putting this together with (28.23) gives for $0 < \delta \le \frac{1}{4}$ the inequality

$$d_B(x,y) \le \max\{4\delta,\ w_x'(\delta) + \delta^2\} \le w_x'(\delta) + 4\delta. \tag{28.25}$$

Since for $x \in D$ we may make $w_x'(\delta)$ arbitrarily small by choice of $\delta$, we have

$$d_B(x,y) \le 4d_S(x,y)^{1/2} \tag{28.26}$$

whenever $d_S(x,y)$ is sufficiently small. We may conclude as follows.

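28.7 Theorem In $D$, metrics $d_B$ and $d_S$ are equivalent.

Proof Given $\varepsilon > 0$, choose $\delta \le \frac{1}{4}$, and also small enough that $w_x'(\delta) + 4\delta \le \varepsilon$. Then, for $\eta \le \min\{\delta^2, \frac{1}{2}\varepsilon\}$,

$$d_B(x,y) < \eta \implies d_S(x,y) < \varepsilon, \tag{28.27}$$
$$d_S(x,y) < \eta \implies d_B(x,y) < \varepsilon, \tag{28.28}$$

by (28.20) and (28.25) respectively. The criteria of (5.5) and (5.6) are therefore satisfied. ■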

Equivalence means that the two metrics induce the same topology on $D$ (the Skorokhod topology). Given a sequence of elements $\{x_n\}$, $d_B(x_n,x) \to 0$ if and only if $d_S(x_n,x) \to 0$, whenever $x \in D$. But it does not imply that $\{x_n\}$ is a Cauchy sequence in $(D,d_B)$ whenever it is a Cauchy sequence in $(D,d_S)$, because the latter space is incomplete and a sequence may have its limit outside the space. It is clear in particular that $d_B(x_n,x) \to 0$ only if $d_S(x_n,x) \to 0$ and $\lim_{\delta\to 0} w_x'(\delta) = 0$.

For example, the sequence of functions in (28.12) is not a Cauchy sequence in $(D,d_B)$. To define $d_B(x_n,x_m)$ (for $n \ge 3$, $m \ge 4$) it is necessary to find the element of $\Lambda$ for which $\lambda(\frac{1}{2}) = \frac{1}{2}$ and $\lambda(\frac{1}{2} + \frac{1}{n}) = \frac{1}{2} + \frac{1}{m}$, and whose gradient deviates as little as possible from 1. This is obviously the same piecewise-linear function, with vertices at the points $(0,0)$, $(\frac{1}{2},\frac{1}{2})$, $(\frac{1}{2} + \frac{1}{n}, \frac{1}{2} + \frac{1}{m})$ and $(1,1)$, as defined for $d_S$. But the maximum gradient is $n/m$, corresponding to the segment connecting the second and third vertices. $d_B(x_n,x_m) = \min\{1, |\log(n/m)|\}$, which does not approach zero for large $n$ and $m$ (set $m = 2n$, for example).
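The contrast between the two metrics on the sequence (28.12) can be made concrete. The sketch below computes, for the piecewise-linear $\lambda$ with the four vertices just described, the time distortion $\sup_t|\lambda(t) - t|$ and the quantity $\|\lambda\|$ of (28.13). The function name `example_lambda_stats` is hypothetical, and the code evaluates this one $\lambda$ only; it does not carry out the infimum in (28.7) or (28.14).

```python
import math

def example_lambda_stats(n, m):
    """Time distortion and log-slope norm of the lambda matching x_n to x_m."""
    ts = [0.0, 0.5, 0.5 + 1.0 / n, 1.0]   # horizontal coordinates of the vertices
    ls = [0.0, 0.5, 0.5 + 1.0 / m, 1.0]   # vertical coordinates of the vertices
    slopes = [(ls[i + 1] - ls[i]) / (ts[i + 1] - ts[i]) for i in range(3)]
    # For a piecewise-linear lambda every chord slope is a weighted average of
    # segment slopes, so the supremum in (28.13) is attained on a segment.
    norm = max(abs(math.log(s)) for s in slopes)
    # lambda(t) - t is piecewise linear, so its largest absolute value is at a vertex.
    shift = max(abs(l - t) for l, t in zip(ls, ts))
    return shift, norm

# Take m = 2n, as suggested in the text, for a few assumed values of n.
results = {n: example_lambda_stats(n, 2 * n) for n in (10, 100, 1000)}
```

With $m = 2n$ the time distortion shrinks like $1/(2n)$, while the log-slope norm stays at $\log 2$, which is why the sequence is Cauchy under $d_S$ but not under $d_B$.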

28.8 Theorem The space $(D,d_B)$ is complete.

Proof Let $\{y_k, k \in \mathbb{N}\}$ be a Cauchy sequence in $(D,d_B)$ satisfying $d_B(y_k,y_{k+1}) < 1/2^k$, implying the existence of a sequence of functions $\{\mu_k \in \Lambda\}$ with

$$\sup_t |y_k(t) - y_{k+1}(\mu_k(t))| < 1/2^k \tag{28.29}$$
$$\|\mu_k\| < 1/2^k. \tag{28.30}$$

It follows from (28.20) that $\sup_t |\mu_{k+m}(t) - t| \le 1/2^{k+m-1}$ for $m > 0$. Define $\mu_{k,m} = \mu_{k+m} \circ \mu_{k+m-1} \circ \cdots \circ \mu_k$, also an element of $\Lambda$ for each finite $m$; the sequence $\{\mu_{k,m}, m = 1,2,\ldots\}$ is a Cauchy sequence in $(C,d_U)$ because

$$\sup_t |\mu_{k,m+1}(t) - \mu_{k,m}(t)| = \sup_s |\mu_{k+m+1}(s) - s| \le 1/2^{k+m}. \tag{28.31}$$

Since $(C,d_U)$ is complete there exists a limit function $\lambda_k = \lim_{m\to\infty}\mu_{k,m}$. To show that $\lambda_k \in \Lambda$, it is sufficient to show that $\|\lambda_k\| < \infty$. But by (28.16),

$$\|\mu_{k,m}\| = \|\mu_{k+m} \circ \cdots \circ \mu_{k+1} \circ \mu_k\| \le \sum_{j=0}^{m}\|\mu_{k+j}\| < \sum_{j=0}^{m}\frac{1}{2^{k+j}} < \frac{1}{2^{k-1}} \tag{28.32}$$

for any $m$, so $\|\lambda_k\| \le 1/2^{k-1}$. Note that $\lambda_k = \lambda_{k+1} \circ \mu_k$, so that $\lambda_{k+1}^{-1} = \mu_k \circ \lambda_k^{-1}$, and hence by (28.29),

$$\sup_t |y_k(\lambda_k^{-1}(t)) - y_{k+1}(\lambda_{k+1}^{-1}(t))| = \sup_s |y_k(s) - y_{k+1}(\mu_k(s))| < 1/2^k. \tag{28.33}$$

So consider the sequence $\{y_k \circ \lambda_k^{-1} \in D, k \in \mathbb{N}\}$. According to (28.33) this is actually a Cauchy sequence in $(D,d_U)$. But the latter space is complete; this is easily shown as a corollary of 5.24, whose proof shows completeness of $(C,d_U)$ without using any of the properties of $C$, so that it applies without modification to the case of $D$. Hence $y_k \circ \lambda_k^{-1}$ has a limit $y \in D$. Since this means both that $\sup_t |y_k(t) - y(\lambda_k(t))| = \sup_t |y_k(\lambda_k^{-1}(t)) - y(t)| \to 0$ and that $\|\lambda_k\| = \|\lambda_k^{-1}\| \to 0$, $d_B(y_k,y) \to 0$ and so $\{y_k\}$ has a limit $y$ in $(D,d_B)$.

We began by assuming that $\{y_k\}$ was a Cauchy sequence with $d_B(y_k,y_{k+1}) < 1/2^k$. But this involves no loss of generality because it suffices to show that any Cauchy sequence $\{x_n, n \in \mathbb{N}\}$ contains a convergent subsequence $\{y_k = x_{n_k}, k \in \mathbb{N}\}$. Clearly, a Cauchy sequence cannot have a cluster point which is not a limit point. Every Cauchy sequence contains a subsequence with the required property; if $d_B(x_n,x_{n+1}) \le 1/g(n) \to 0$ (say), choosing $n_k \ge g^{-1}(2^k)$ is appropriate. This completes the proof. ■

28.4 Measures on D

We write $\mathcal{B}_D$ for the Borel field on $(D,d_B)$. Henceforth, we will also write just $D$ to denote $(D,d_B)$, and will indicate the metric specifically only if one different from $d_B$ is intended. The basic property we need the measurable space $(D,\mathcal{B}_D)$ to possess is that measures can be fully specified by the finite-dimensional sets. An argument analogous to 27.6 is called for, although some elaboration will be needed. In particular, we have to show, without appealing to continuity of the projections, that the finite-dimensional distributions are well-defined and that there are finite-dimensional sets which constitute a determining class for $(D,\mathcal{B}_D)$.

We start with a lemma. Define the field of finite-dimensional sets of $D$ as $\mathcal{H}_D = \{H \cap D: H \in \mathcal{H}\}$, where $\mathcal{H}$ was defined in §27.1.

28.9 Lemma Given $x \in D$, $\alpha > 0$, and any $t_1,\ldots,t_m \in [0,1]$, let

$$H_m(x,\alpha) = \Big\{y \in D: \exists\,\lambda \in \Lambda \text{ s.t. } \|\lambda\| < \alpha,\ \max_{1\le i\le m}|y(t_i) - x(\lambda(t_i))| < \alpha\Big\}. \tag{28.34}$$

Then $H_m(x,\alpha) \in \mathcal{H}_D$.

Proof Since $H_m(x,\alpha) \subseteq D$, all we have to show according to (27.2) is that $\pi_{t_1,\ldots,t_m}(H_m(x,\alpha)) \in \mathcal{B}^m$. This is the set whose elements are $(y(t_1),\ldots,y(t_m))$ for each $y \in H_m(x,\alpha)$. To identify these, first define the set

$$A_m(x,\alpha) = \{(x(\lambda(t_1)),\ldots,x(\lambda(t_m))): \lambda \in \Lambda,\ \|\lambda\| < \alpha\} \subseteq \mathbb{R}^m. \tag{28.35}$$

Then it is apparent that

$$\pi_{t_1,\ldots,t_m}(H_m(x,\alpha)) = \Big\{(b_1,\ldots,b_m): \max_{1\le i\le m}|a_i - b_i| < \alpha,\ (a_1,\ldots,a_m) \in A_m(x,\alpha)\Big\} \subseteq \mathbb{R}^m. \tag{28.36}$$

In words, this is the set $A_m(x,\alpha)$ with an open $\alpha$-halo, and it is an open set. It therefore belongs to $\mathcal{B}^m$. ■

To compare the present situation with that for $C$, it may be helpful to look at the case $k = 1$. The one-dimensional projection $\pi_t(H_t(x,\alpha))$, where

$$H_t(x,\alpha) = \{y \in D: \exists\,\lambda \in \Lambda \text{ s.t. } \|\lambda\| < \alpha,\ |y(t) - x(\lambda(t))| < \alpha\}, \tag{28.37}$$

is in general different from $S(x(t),\alpha)$, that is, the interval of width $2\alpha$ centred on $x(t)$. If $x$ is continuous at $t$ the difference between these two sets can be made arbitrarily small by taking $\alpha$ small enough, and at these points the projections are in fact continuous. Since the discontinuity points are at most countable, they can be ignored in specifying finite-dimensional distributions for $x$, as will be apparent in the next theorem.

However, the point that matters here is that we have the material for the extension of a measure to $(D,\mathcal{B}_D)$ from the finite-dimensional distributions. It is easily verified that $\mathcal{H}_D$, like $\mathcal{H}$, is a field. The final link in the chain is to show that $\mathcal{H}_D$ is a determining class for $(D,\mathcal{B}_D)$.

28.10 Theorem (cf. Billingsley 1968: Th. 14.5) $\mathcal{B}_D = \sigma(\mathcal{H}_D)$.

Proof An open sphere in $(D,d_B)$ is a set of the form

$$S(x,\alpha) = \{y \in D: d_B(x,y) < \alpha\} = \Big\{y \in D: \exists\,\lambda \in \Lambda \text{ s.t. } \|\lambda\| < \alpha,\ \sup_t |y(t) - x(\lambda(t))| < \alpha\Big\} \tag{28.38}$$

for $x \in D$, $\alpha > 0$. Since these sets generate $\mathcal{B}_D$, it will suffice to show they can be constructed by countable unions and complements (and hence also countable intersections) of sets in $\mathcal{H}_D$. Let $H(x,\alpha) = \bigcap_{k=1}^{\infty} H_k(x,\alpha)$ where $H_k(x,\alpha)$ is a set with the form of $H_m$ defined in (28.34), but with $m = 2^k - 1$ and $t_i = i/2^k$, so that the set $\{t_1,\ldots,t_{2^k-1}\}$ converges on $\mathbb{D}$ (the dyadic rationals) as $k \to \infty$. Consider $y \in H(x,\alpha)$. Since $y \in H_k(x,\alpha)$ for every $k$, we may choose a sequence $\{\lambda_k\}$ such that, for each $k \ge 1$,

$$\|\lambda_k\| < \alpha, \tag{28.39}$$
$$\max_{1\le i\le 2^k-1}|y(2^{-k}i) - x(\lambda_k(2^{-k}i))| < \alpha. \tag{28.40}$$

Making use of the fortuitous fact that $\lambda_k$ has the properties of a c.d.f. on $[0,1]$, Helly's theorem (22.21) may be applied to show that there is a subsequence $\{\lambda_{k_n}, n \in \mathbb{N}\}$ converging to a limit function $\lambda$ which is non-decreasing on $[0,1]$ with $\lambda(0) = 0$ and $\lambda(1) = 1$. $\lambda$ is necessarily in $\Lambda$, satisfying

$$\|\lambda\| \le \alpha \tag{28.41}$$

according to (28.39). And in view of (28.40), and the facts that $\lambda_{k_n}(t) \to \lambda(t)$ and $x$ is right-continuous on $[0,1)$, it must also satisfy either $|y(t) - x(\lambda(t))| \le \alpha$ or $|y(t) - x(\lambda(t)-)| \le \alpha$ for every $t \in \mathbb{D}$. Since $\mathbb{D}$ is dense in $[0,1]$, this is equivalent to

$$\sup_t |y(t) - x(\lambda(t))| \le \alpha. \tag{28.42}$$

The limiting inequalities (28.41) or (28.42) cannot be relied on to be strict, but comparing with (28.38) we can conclude that $y \in \bar{S}(x,\alpha)$. This holds for all such $y$, so that $H(x,\alpha) \subseteq \bar{S}(x,\alpha)$. Put $\alpha = r - 1/n$, and take the countable union to give

$$\bigcup_{n=1}^{\infty} H(x, r - 1/n) \subseteq \bigcup_{n=1}^{\infty} \bar{S}(x, r - 1/n) = S(x,r). \tag{28.43}$$

It is also evident on comparing (28.34) with (28.38) that $S(x,\alpha) \subseteq H_k(x,\alpha)$ for $\alpha > 0$. Again, put $\alpha = r - 1/n$, and

$$S(x,r) = \bigcup_{n=1}^{\infty} S(x, r - 1/n) \subseteq \bigcup_{n=1}^{\infty} H(x, r - 1/n). \tag{28.44}$$

It follows that, for any $x \in D$ and $r > 0$, $S(x,r) = \bigcup_{n=1}^{\infty}\bigcap_{k=1}^{\infty} H_k(x, r - 1/n)$ where $H_k(x, r - 1/n) \in \mathcal{H}_D$. This completes the proof. ■

The defining of measures on $(D,\mathcal{B}_D)$ is now possible by arguments that broadly parallel those for $C$. The one pitfall we may encounter when assigning measures to finite-dimensional sets is that the coordinate projections of $\mathcal{B}_D$ sets may have no 'natural' interpretation in terms of observed increments of the random process. For example, suppose $X_n \in D$ is the process defined in (27.55) and (27.56), with respect to the underlying space $(\Omega,\mathcal{F},P)$. It is not necessarily the case that $\pi_t(X_n)$ is measurable with respect to $\mathcal{F}_{n,[nt]} = \sigma(U_{ni}, i \le [nt])$, as casual intuition might suggest. A $\mathcal{B}_D$-set like $H_t(x,\alpha)$ in (28.37) is the image under the mapping $X_n: \Omega \mapsto D$ of a set $E \in \mathcal{F}$; in fact we could write $E = X_n^{-1}(\pi_t^{-1}(B))$ where $B \in \mathcal{B}$. But $E$ depends on the value that $x$ assumes at $\lambda(t)$, and if $\lambda(t) > t$ then $E$ cannot be in $\mathcal{F}_{n,[nt]}$.

However, this difficulty goes away for processes lying in $C$ almost surely. In view of 28.4, we may 'embed' $((C,d_U),\mathcal{B}_C)$ in $((D,d_B),\mathcal{B}_D)$, and a p.m. defined on the former space can be extended to the latter, with support in $C$. In particular, Wiener measure is defined on $(D,\mathcal{B}_D)$ by simply augmenting the conditions in 27.7 with the stipulation that $W(x \in C) = 1$.

28.5 Prokhorov's Metric
The material of this section is not essential to the development since Billingsley's metric is all that we need to work successfully in $D$. But it is interesting to compare it with the alternative approach due to Prokhorov (1956). We begin with an alternative approach to defining a continuity modulus for cadlag functions. Let

$$\tilde w_x(\delta) = \max\Bigl\{ \sup_{t-\delta < t' \le t \le t'' < t+\delta} \min\bigl\{|x(t') - x(t)|,\ |x(t'') - x(t)|\bigr\},\ \sup_{0<t<\delta} |x(t) - x(0)|,\ \sup_{1-\delta<t<1} |x(t) - x(1)| \Bigr\}. \tag{28.45}$$

Again, it may be helpful to restate this definition in English. The idea is that, for every $t \in [\delta, 1-\delta]$, a pair of adjacent intervals of width $\delta$ are constructed around the point, and we determine the maximum change over each of these intervals; the smaller of these two values measures the $\delta$-continuity at point $t$, and this quantity is supped over the interval. This means that the function can jump discontinuously without affecting $\tilde w_x(\delta)$, so long as no two jumps are too close together. The exceptions are the two points 0 and 1, which for $\tilde w_x(\delta) \to 0$ must be true continuity points from the right and left respectively.
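The verbal description above can be tried out numerically. The following is only a sketch under stated assumptions: the helper name `cadlag_modulus`, the grid size, and the test functions are hypothetical, and the suprema are taken over grid points only, so the value approximates the modulus in (28.45) rather than computing it exactly.

```python
def cadlag_modulus(x, delta, n_grid=1000):
    """Grid approximation to the modulus in (28.45) for a function x on [0, 1].

    Hypothetical helper: assumes delta >= 1 / n_grid, and takes the
    suprema over grid points only.
    """
    vals = [x(i / n_grid) for i in range(n_grid + 1)]
    k = int(round(delta * n_grid))  # number of grid steps in a width-delta window

    # First term: for t in [delta, 1 - delta], the smaller of the largest
    # change over (t - delta, t] and the largest change over [t, t + delta).
    interior = 0.0
    for i in range(k, n_grid - k + 1):
        left = max(abs(vals[j] - vals[i]) for j in range(i - k + 1, i + 1))
        right = max(abs(vals[j] - vals[i]) for j in range(i, i + k))
        interior = max(interior, min(left, right))

    # Second and third terms: one-sided continuity at the end points 0 and 1.
    at_zero = max((abs(vals[j] - vals[0]) for j in range(1, k)), default=0.0)
    at_one = max((abs(vals[j] - vals[-1]) for j in range(n_grid - k + 1, n_grid)),
                 default=0.0)
    return max(interior, at_zero, at_one)


one_jump = lambda t: 1.0 if t >= 0.5 else 0.0          # a single jump at 1/2
two_jumps = lambda t: 1.0 if 0.5 <= t < 0.55 else 0.0  # jumps 0.05 apart

print(cadlag_modulus(one_jump, 0.1))    # an isolated jump is not detected
print(cadlag_modulus(two_jumps, 0.1))   # two jumps closer than delta are detected
print(cadlag_modulus(two_jumps, 0.02))  # but not once delta is below their spacing
```

In this sketch an isolated jump leaves the modulus at zero, while two jumps closer together than $\delta$ produce a modulus equal to the jump size, which is the behaviour described in the text.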

The following theorem parallels 28.3.

28.11 Theorem If and only if $x \in D$,

$$\lim_{\delta \to 0} \tilde w_x(\delta) = 0. \tag{28.46}$$

Proof Suppose $x \in D$. By 28.1(c), the second and third terms under the 'max' in (28.45) definitely go to zero with $\delta$. Hence consider the first term. Let $\{t_k, t_k', t_k''\}$ denote the sequence of points at which the supremum is attained on setting $\delta = 1/k$ for $k = 1,2,\ldots$ Assume $t_k \to t$. (If need be consider a convergent subsequence.) Then $t_k' \to t$ and $t_k'' \to t$. Since $x(t) = x(t+)$, this implies $|x(t_k'') - x(t_k)| \to 0$, which proves sufficiency.

Now suppose $\tilde w_x(\delta) \to \tilde w_x(0) > 0$. Since

$$\tilde w_x(0) = \max\Bigl\{|x(0) - x(0+)|,\ |x(1) - x(1-)|,\ \sup_t \min\bigl\{|x(t) - x(t-)|,\ |x(t) - x(t+)|\bigr\}\Bigr\},$$

it follows that $x \notin D$, proving necessity. ∎

Now define the function

$$W_x(z) = \begin{cases} \tilde w_x(e^{z}+), & z < 0, \\ \tilde w_x(1), & z \ge 0. \end{cases} \tag{28.47}$$

This is non-decreasing, right-continuous, bounded below by 0 and above by $\tilde w_x(1)$. It therefore defines a finite measure on $\mathbb{R}$, just as a c.d.f. defines a p.m. on $\mathbb{R}$.

By defining a family of measures in this way (indexed on $x$) on a separable space, we can exploit the fact that a space of measures is metrizable. In fact, we can use Lévy's metric $L^*$ defined in (26.33). The Prokhorov metric for $D$ is

$$d_P(x,y) = d_H(\Gamma_x,\Gamma_y) + L^*(W_x,W_y), \tag{28.48}$$

where $\Gamma_x$ and $\Gamma_y$ are the graphs of $x$ and $y$ and $d_H$ is the Hausdorff metric.

The idea here should be clear. With the first term alone, we should obtain a property similar to that of the Skorokhod metric; if we write $d_E(x(t),\Gamma_y) = \inf_{t'} d_E(x(t),y(t'))$, then

$$d_H(\Gamma_x,\Gamma_y) = \max\Bigl\{\sup_t d_E(x(t),\Gamma_y),\ \sup_{t'} d_E(\Gamma_x,y(t'))\Bigr\}. \tag{28.49}$$

In words, the smallest Euclidean distances between $x(t)$ and a point of $y$, and $y(t)$ and a point of $x$, are supped over $t$. For comparison, the Skorokhod metric minimizes the greater of the horizontal and vertical distances separating points on $\Gamma_x$ and $\Gamma_y$ in the plane, subject to the constraints imposed on the choice of $\lambda$, such as continuity. In cases such as the functions $x_\delta$ of (28.6), $x_\delta$ and $x_0$ are close in $(D,d_H)$ when $\delta$ is small. (Think in terms of the distances the graphs would have to be moved to fit over one another.)

The purpose of the second term is to ensure completeness. By 28.11, $\lim_{z\to-\infty} W_x(z) = 0$ if and only if $x \in D$; otherwise this limit will be strictly positive. Unlike the case of $(D,d_H)$, it is not possible to have a Cauchy sequence in $(D,d_P)$ approaching a point outside the space. It can be shown that $d_P$ is equivalent to $d_S$, and hence of course also to $d_B$, and that the space $(D,d_P)$ is complete. The proofs of these propositions can be found in Parthasarathy (1967). For practical purposes, therefore, there is nothing to choose between $d_P$ and $d_B$.
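To make the Hausdorff term in (28.49) concrete, here is a minimal sketch that compares two step functions whose jumps lie 0.05 apart. Everything in it is an assumption made for illustration: the helper names `graph`, `hausdorff`, and `step_at`, the grid, and the choice of functions, which merely stand in for the kind of example the text alludes to. The graphs are sampled on a grid and are not completed with vertical segments at the jumps.

```python
import math

def graph(x, n_grid=400):
    """Sampled graph {(t, x(t))} of x on an equally spaced grid of [0, 1]."""
    return [(i / n_grid, x(i / n_grid)) for i in range(n_grid + 1)]

def hausdorff(gx, gy):
    """Hausdorff distance, as in (28.49), between two finite planar point sets."""
    def one_sided(a, b):
        # sup over points p of a of the Euclidean distance from p to the set b
        return max(min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a)
    return max(one_sided(gx, gy), one_sided(gy, gx))

def step_at(c):
    """Cadlag step function jumping from 0 to 1 at c (hypothetical example)."""
    return lambda t: 1.0 if t >= c else 0.0

x0, xd = step_at(0.5), step_at(0.55)
d_h = hausdorff(graph(x0), graph(xd))
d_u = max(abs(x0(i / 400) - xd(i / 400)) for i in range(401))
print(d_h, d_u)  # Hausdorff distance is about 0.05; the uniform distance is 1.0
```

The two functions are far apart in the uniform metric but close in the Hausdorff sense, because one graph needs to be shifted only a short horizontal distance to fit over the other.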

28.6 Compactness and Tightness in D
The remaining task is to characterize the compact sets of $D$, in parallel with the earlier application of the Arzelà-Ascoli theorem for $C$.

28.12 Theorem (Billingsley 1968: th. 14.3) A set $A \subset D$ is relatively compact in $(D,d_B)$ if and only if

$$\sup_{x \in A}\sup_t |x(t)| < \infty, \tag{28.50}$$

$$\lim_{\delta \to 0}\sup_{x \in A} w_x'(\delta) = 0. \quad \Box \tag{28.51}$$

This theorem obviously parallels 27.5 but there are significant differences in the conditions. The modulus of continuity $w_x'$ appears in place of $w_x$, which is a weakening of the previous conditions, but, on the other hand, (28.50) replaces (27.16). Instead of $\sup_t |x(t)|$ we could write $d_B(|x|,0)$, where 0 denotes the element of $D$ which is identically zero everywhere on $[0,1]$. It is no longer sufficient to bound the elements at one point of the interval to ensure that they are bounded everywhere: the whole element must be bounded.

A feature of the proof that follows, which is basically similar to that of 5.28, is that we can avoid invoking completeness of the space until, so to speak, the last moment. The sufficiency argument establishing total boundedness of $A$ is couched in terms of the more tractable Skorokhod metric, and then we can exploit the equivalence of $d_S$ with a complete metric such as $d_B$ to get the compactness of $\bar A$. The argument for necessity also uses $d_S$ to prove upper semicontinuity of $w_x'(\delta)$, a property that, as we show, implies (28.51) when the space is compact.

Proof of 28.12 Let $\sup_{x\in A}\sup_t |x(t)| = M$. To show sufficiency, fix $\varepsilon > 0$ and choose $m$ as the smallest integer such that both $1/m < \tfrac12\varepsilon$ and $\sup_{x\in A} w_x'(1/m) < \tfrac12\varepsilon$. Such an $m$ exists by (28.51). Construct the finite collection $E_m$ of piecewise constant functions, whose values at the discontinuity points $t = j/m$ for $j = 0,\ldots,m-1$ are drawn from the set $\{M(2u/v - 1),\ u = 0,1,\ldots,v\}$, where $v$ is an integer exceeding $2M/\varepsilon$; hence, $E_m$ has $(v+1)^m$ different elements. This set is shown to be an $\varepsilon$-net for $A$.

Given the definition of $m$, one can choose for $x \in A$ a partition $\Pi_{1/m} = \{t_0,\ldots,t_r\}$, defined as above (28.3), to satisfy

$$\max_{1\le i\le r}\ \sup_{s,t\in[t_{i-1},t_i)} |x(t) - x(s)| < \tfrac12\varepsilon. \tag{28.52}$$

For $i = 0,\ldots,r-1$ let $j_i$ be the integer such that $j_i/m \le t_i < (j_i+1)/m$, noting that, since the $t_i$ are at a distance more than $1/m$, there is at most one of them in any one of the intervals $[j/m,(j+1)/m)$, $j = 0,\ldots,m-1$. Choose a piecewise linear function $\lambda \in \Lambda$ with vertices $\lambda(j_i/m) = t_i$, $i = 0,\ldots,r$. Since $|t_i - j_i/m| \le 1/m$, $\max_{0\le i\le r}|\lambda(j_i/m) - j_i/m| < \tfrac12\varepsilon$, and the linearity of $\lambda$ between these points means that

$$\sup_t |\lambda(t) - t| \le \tfrac12\varepsilon. \tag{28.53}$$

By construction, $\lambda$ maps points in $[j/m,(j+1)/m)$ into $[t_i,t_{i+1})$ whenever $j_i \le j < j_{i+1}$, and since $x$ varies by at most $\tfrac12\varepsilon$ over intervals $[t_i,t_{i+1})$, the composite function $x\circ\lambda$ can vary by at most $\tfrac12\varepsilon$ over intervals $[j/m,(j+1)/m)$. An example with $m = 10$ and $r = 4$ is sketched in Fig. 28.4; here, $j_1 = 2$, $j_2 = 4$ and $j_3 = 6$. The points $t_0,\ldots,t_4$ must be more than a distance 1/10 apart in this instance. One can therefore choose $y \in E_m$ such that

$$|y(j/m) - x(\lambda(j/m))| < \tfrac12\varepsilon, \quad j = 0,\ldots,m-1. \tag{28.54}$$

Fig. 28.4

Since $y(t) = y(j/m)$ for $t \in [j/m,(j+1)/m)$, we have by (28.52) and (28.54),

$$\sup_t |y(t) - x(\lambda(t))| \le \max_{0\le j\le m-1} |y(j/m) - x(\lambda(j/m))| + \max_{0\le j\le m-1}\ \sup_{t\in[j/m,(j+1)/m)} |x(\lambda(j/m)) - x(\lambda(t))| < \varepsilon. \tag{28.55}$$

Together, (28.55) and (28.53) imply $d_S(x,y) \le \varepsilon$, showing that $E_m$ is an $\varepsilon$-net for $A$ as required. This proves that $A$ is totally bounded in $(D,d_S)$. But since $d_S$ and $d_B$ are equivalent (28.7), $A$ is also totally bounded in $(D,d_B)$; in particular, if $E_m$ is an $\varepsilon$-net for $A$ in $(D,d_S)$, then we can find $\eta$ such that it is also an $\eta$-net for $A$ in $(D,d_B)$ according to (28.27) and (28.28), where $\eta$ can be set arbitrarily small. Since $(D,d_B)$ is complete, $\bar A$ is therefore compact, proving sufficiency.

When $A$ is totally bounded it is bounded, proving the necessity of (28.50). To show the necessity of (28.51), we show that the functions $w'(x,1/m) = w_x'(1/m)$ are upper semicontinuous on $(D,d_S)$ for each $m$. This means that the sets $B_m = \{x: w_x'(1/m) < \varepsilon\}$ are open in $(D,d_S)$ for each $\varepsilon > 0$. By equivalence of the metrics, they are also open in $(D,d_B)$. In this case, for any such $\varepsilon$, the sets $\{B_m,\ m \in \mathbb{N}\}$ are an open covering for $D$ by 28.3. Any compact subset of $D$ then has a finite subcovering, or in other words, if $\bar A$ is compact there is an $m$ such that $\bar A \subset B_m$. By definition of $B_m$, this implies that (28.51) holds.

To show upper semicontinuity, fix $\varepsilon > 0$, $\delta > 0$, and $x \in D$, and choose a partition $\Pi_\delta$ satisfying

$$\max_{1\le i\le r}\ \sup_{s,t\in[t_{i-1},t_i)} |x(t) - x(s)| < w_x'(\delta) + \tfrac12\varepsilon. \tag{28.56}$$

Also choose $\eta < \tfrac14\varepsilon$, and small enough that

$$\min_{1\le i\le r}\,(t_i - t_{i-1}) > \delta + 2\eta. \tag{28.57}$$

Our object is to show, after (5.32), that if $y \in D$ and $d_S(x,y) < \eta$ then

$$w_y'(\delta) < w_x'(\delta) + \varepsilon. \tag{28.58}$$

If $d_S(x,y) < \eta$ there is $\lambda \in \Lambda$ such that

$$\sup_t |y(\lambda(t)) - x(t)| < \eta \tag{28.59}$$

and

$$\sup_t |\lambda(t) - t| < \eta. \tag{28.60}$$

Letting $s_i = \lambda(t_i)$, (28.57) and (28.60) and the triangle inequality imply that

$$\min_{1\le i\le r}\,(s_i - s_{i-1}) > \min_{1\le i\le r}\,(t_i - t_{i-1}) - 2\eta > \delta. \tag{28.61}$$

If both $s$ and $t$ lie in $[t_{i-1},t_i)$, $\lambda(s)$ and $\lambda(t)$ must both lie in $[s_{i-1},s_i)$. It follows by (28.56), (28.59), and the choice of $\eta$ that

$$\max_{1\le i\le r}\ \sup_{s,t\in[s_{i-1},s_i)} |y(s) - y(t)| < w_x'(\delta) + \varepsilon. \tag{28.62}$$

In view of (28.61), this shows that (28.58) holds, and since $\varepsilon$ and $x$ are arbitrary the proof is complete. ∎

This result is used to characterize uniform tightness of a sequence in $D$. The next theorem directly parallels 27.12. We need completeness for this argument to avoid having to prove tightness of every $\mu_n$, so it is necessary to specify an appropriate metric. Without loss of generality, we can cite $d_B$ where required.

28.13 Theorem (Billingsley 1968: th. 15.2) A sequence $\{\mu_n\}$ of p.m.s on $((D,d_B),\mathcal{B}_D)$ is uniformly tight iff there exists $N \in \mathbb{N}$ such that, for all $n \ge N$,

(a) for each $\eta > 0$ there exists $M$ such that

$$\mu_n(\{x: \sup_t |x(t)| > M\}) \le \eta; \tag{28.63}$$

(b) for each $\varepsilon > 0$, $\eta > 0$ there exists $\delta \in (0,1)$ such that

$$\mu_n(\{x: w_x'(\delta) \ge \varepsilon\}) \le \eta. \quad \Box \tag{28.64}$$

Proof Let $\{\mu_n\}$ be uniformly tight, and for $\eta > 0$ choose a compact set $K$ with $\mu_n(K) > 1 - \eta$. By 28.12 there exist $M < \infty$ and $\delta \in (0,1)$ such that

$$K \subseteq \{x: \sup_t |x(t)| \le M\} \cap \{x: w_x'(\delta) < \varepsilon\} \tag{28.65}$$

for any $\varepsilon > 0$. Inequalities (28.63) and (28.64) follow for $n \in \mathbb{N}$, proving necessity.

The object is now to find a set satisfying the conditions of 28.12, whose closure $K$ satisfies $\sup_{n\ge N}\mu_n(K) > 1 - \theta$ for some $N \in \mathbb{N}$ and all $\theta > 0$. Because $(D,d_B)$ is a complete separable space, each $\mu_n$ is tight (26.19) and the above is sufficient for uniform tightness. As in 27.12, let $\mu^*$ stand for $\sup_{n\ge N}\mu_n$. For $\theta > 0$, define

$$A_k = \{x: w_x'(\delta_k) < 1/k\}, \tag{28.66}$$

where $\{\delta_k\}$ is chosen so that $\mu^*(A_k) > 1 - \theta/2^{k+1}$, possible by condition (b). Also set $B = \{x: \sup_t |x(t)| \le M\}$ such that $\mu^*(B) > 1 - \theta/2$, possible by condition (a). Let $K = \bigl(\bigcap_{k=1}^{\infty} A_k \cap B\bigr)^-$, and note that $K$ satisfies the conditions in (28.50) and (28.51), and hence is compact by 28.12. With these definitions, the argument follows that of 27.12 word for word. ∎

The last result of this chapter concerns an issue of obvious relevance to the functional CLT: how to characterize a sequence in $D$ which is converging to a limit in $C$. Since in all our applications the weak limit we desire to establish is in $C$, no other case has to be considered here. The modulus of continuity $w_x$ is the natural medium for expressing this property of a sequence. Essentially, the following theorem amounts to the result that the sufficiency part of 27.12 holds in $(D,d_B)$ just as in $(C,d_U)$.

28.14 Theorem (Billingsley 1968: th. 15.5) Let $\{\mu_n\}$ be a sequence of measures on $((D,d_B),\mathcal{B}_D)$. If there exists $N \in \mathbb{N}$ such that, for $n \ge N$,

(a) for each $\eta > 0$ there is a finite $M$ such that

$$\mu_n(\{x: |x(0)| > M\}) \le \eta; \tag{28.67}$$

(b) for each $\varepsilon > 0$, $\eta > 0$ there is a $\delta \in (0,1)$ such that

$$\mu_n(\{x: w_x(\delta) \ge \varepsilon\}) \le \eta; \tag{28.68}$$

then $\{\mu_n\}$ is uniformly tight, and if $\mu$ is any cluster point of the sequence, $\mu(C) = 1$. □

Proof By (28.4), if (28.68) holds for a given $\delta$ then (28.64) holds for $\delta/2$. Let $k = [1/\delta] + 1$ (so that $k\delta > 1$) where $\delta > 0$ is specified by condition (b). Then according to (28.68), $\mu_n(\{x: |x(ti/k) - x(t(i-1)/k)| \ge \varepsilon\}) \le \eta$ for $i = 1,\ldots,k$, and $t \in [0,1]$. We have noted previously that

$$|x(t)| \le |x(0)| + \sum_{i=1}^{k}\Bigl|x\Bigl(\frac{ti}{k}\Bigr) - x\Bigl(\frac{t(i-1)}{k}\Bigr)\Bigr|, \tag{28.69}$$

where each of the $k$ intervals indicated has width less than $\delta$. It follows by (28.68) and (28.67) that

$$\mu_n(\{x: \sup_t |x(t)| > M + k\varepsilon\}) \le \mu_n(\{x: |x(0)| > M\}) \le \eta, \tag{28.70}$$

so that (28.63) also holds for finite $M$. The conditions of 28.13 are therefore satisfied, proving uniform tightness.

Let $\mu$ be a cluster point such that $\mu_{n_k} \Rightarrow \mu$ for some subsequence $\{n_k,\ k \in \mathbb{N}\}$. Defining $A = \{x: w_x(\delta) \ge \varepsilon\}$, consider the open set $A^\circ$, the interior of $A$; for example, $x \in A^\circ$ if $w_x(\delta/2) \ge 2\varepsilon$. Then by (d) of 26.10, and (28.68),

$$\mu(A^\circ) \le \liminf_{k\to\infty} \mu_{n_k}(A^\circ) \le \eta. \tag{28.71}$$

Hence $\mu(B) \le \eta$ for any set $B \subseteq A^\circ$. Since $\varepsilon$ and $\eta$ are arbitrary here, it is possible to choose a decreasing sequence $\{\delta_j\}$ such that $\mu(B_j) < 1/j$, where $B_j = \{x: w_x(\delta_j) \ge 1/j\}$. For each $m \ge 1$, $\mu\bigl(\bigcap_{j=m}^{\infty} B_j\bigr) = 0$, and so, by subadditivity, $\mu(B) = 0$ where $B = \liminf B_j$. But suppose $x \in B^c$, where $B^c = \bigcap_{m=1}^{\infty}\bigcup_{j=m}^{\infty} B_j^c$ is the set

$$\{x: w_x(\delta_j) < 1/j,\ \text{some } j \ge m;\ \text{all } m \in \mathbb{N}\}.$$

Since $\{\delta_j\}$ is monotonic, it must be the case that $\lim_{\delta\to 0} w_x(\delta) = 0$ for this $x$. Hence $B^c \subseteq C$, and since $\mu(B^c) = 1$, $\mu(C) = 1$ follows. ∎

29 FCLTs for Dependent Variables

29.1 The Distribution of Continuous Functions on D
A surprising fact about Wiener measure is that definition 27.7 is actually redundant; if part (b) of that definition is replaced by the specification merely of the first two moments of $x(t)$, Gaussianity of $x$ must follow. This fact leads to a class of functional CLTs of considerably greater power and generality than is possible with the approach of §27.6.

29.1 Theorem (Billingsley 1968: th. 19.1) Let $X$ be a random element of $D[0,1]$ with the following properties:

(a) $E(X(t)) = 0$, $E(X(t)^2) = t$, $0 \le t \le 1$.

(b) $P(X \in C) = 1$.

(c) For any partition $\{t_1,\ldots,t_k\}$ of $[0,1]$, the increments $X(t_2) - X(t_1)$, $X(t_3) - X(t_2)$, ..., $X(t_k) - X(t_{k-1})$ are totally independent.

Then $X \sim B$. □

This is a remarkable theorem, in the apparent triviality of the conditions; if anelement of D is a.s. continuous, independence of its increments is equivalent toGaussianity! The essential insight it provides is that continuity of the samplepaths is equivalent to the Lindeberg condition being satisfied by the increments.

The virtuosity of Billingsley's proof is also remarkable. The two preliminary lemmas are technical, and in the second case the proof is rather lengthy; the reader might prefer to take this one on trust initially. If $\xi_1,\ldots,\xi_m$ is a random sequence, and we define $S_j = \sum_{i=1}^{j}\xi_i$ for $1 \le j \le m$, and $S_0 = 0$, the problem is to bound the probability of $|S_m|$ exceeding a given value. The lemmas are obviously designed to work together to this end.

29.2 Lemma $|S_m| \le 2\max_{0\le j\le m}\min\{|S_j|,\ |S_m - S_j|\} + \max_{1\le j\le m}|\xi_j|$.

Proof Let $I \subseteq \{0,\ldots,m\}$ denote the set of integers $k$ for which $|S_k| < |S_m - S_k|$. If $S_m = 0$ the lemma holds, and if $S_m \ne 0$ then $m \notin I$. On the other hand, $0 \in I$. It follows that there is a $k \notin I$ such that $k - 1 \in I$. For this choice of $k$,

$$|S_m| \le |S_m - S_k| + |S_k| \le |S_m - S_k| + |S_{k-1}| + |\xi_k| \le 2\max_{0\le j\le m}\min\{|S_j|,\ |S_m - S_j|\} + \max_{1\le j\le m}|\xi_j|. \quad ∎ \tag{29.1}$$
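Since Lemma 29.2 is a purely deterministic inequality about partial sums, it can be spot-checked on arbitrary sequences. The sketch below does this; the function name `lemma_29_2_gap`, the Gaussian increments, and the seed are illustrative assumptions, and a numerical check of this kind is of course no substitute for the proof.

```python
import random

def lemma_29_2_gap(xi):
    """Right-hand side minus left-hand side of Lemma 29.2 for increments xi."""
    m = len(xi)
    s = [0.0]
    for v in xi:
        s.append(s[-1] + v)  # partial sums S_0 = 0, S_1, ..., S_m
    inner = max(min(abs(s[j]), abs(s[m] - s[j])) for j in range(m + 1))
    return 2 * inner + max(abs(v) for v in xi) - abs(s[m])

random.seed(0)  # arbitrary seed, for reproducibility only
gaps = [lemma_29_2_gap([random.gauss(0, 1) for _ in range(random.randint(1, 30))])
        for _ in range(2000)]
print(min(gaps) >= -1e-12)  # the gap should never be negative, up to rounding
```

The bound is attained with equality for a single increment ($m = 1$), where the first term on the right vanishes and the second equals $|S_1|$.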


The second lemma is a variation on the maximal inequality for partial sums.

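29.3 Lemma (Billingsley 1968: th. 12.1) If

$$E\bigl((S_j - S_i)^2(S_k - S_j)^2\bigr) \le \Bigl(\sum_{l=i+1}^{k} b_l\Bigr)^2, \quad j = i,\ldots,k, \tag{29.2}$$

for each pair $i,k$ with $0 \le i \le k \le m$, where $\{b_1,\ldots,b_m\}$ is a collection of positive numbers, then $\exists\, K > 0$ such that, for all $\alpha > 0$ and all $m$,

$$P\Bigl(\max_{0\le j\le m}\min\{|S_j|,\ |S_m - S_j|\} \ge \alpha\Bigr) \le \frac{KB^2}{\alpha^4}, \tag{29.3}$$

where $B = \sum_{j=1}^{m} b_j$.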

Proof For $0 \le i \le j \le k \le m$ and $\alpha > 0$, we have

$$P\bigl(\min\{|S_j - S_i|,\ |S_k - S_j|\} \ge \alpha\bigr) = P\bigl(\{|S_j - S_i| \ge \alpha\} \cap \{|S_k - S_j| \ge \alpha\}\bigr) \le P\bigl(|S_j - S_i|\,|S_k - S_j| \ge \alpha^2\bigr) \le \frac{1}{\alpha^4}\Bigl(\sum_{l=i+1}^{k} b_l\Bigr)^2, \tag{29.4}$$

where Chebyshev's inequality and (29.2) give the final inequality. If $m = 1$, the minorant side of (29.3) is zero. If $m = 2$, (29.4) with $i = 0$ and $k = 2$ yields

$$P\bigl(\max\{0,\ \min\{|S_1|,\ |S_2 - S_1|\}\} \ge \alpha\bigr) \le \frac{(b_1 + b_2)^2}{\alpha^4}, \tag{29.5}$$

so that (29.3) holds for $K = 1$ and hence for any $K \ge 1$.

The proof now proceeds by induction. Assuming there is a $K$ for which (29.3) holds when $m$ is replaced by any integer between 1 and $m - 1$, we show it holds for $m$ itself, with the same $K$. The basic idea is to split the sum into two parts, each with fewer than $m$ terms, obtain valid inequalities for each part, and combine these. Choose $h$ to be the largest integer such that $\sum_{j=1}^{h-1} b_j \le B/2$ (the sum is zero if $h = 1$); it is easy to see that $\sum_{j=h+1}^{m} b_j \le B/2$ also (the sum being zero if $h = m$). First define

$$U_1 = \max_{0\le j\le h-1}\min\{|S_j|,\ |S_{h-1} - S_j|\}, \tag{29.6}$$

$$D_1 = \min\{|S_{h-1}|,\ |S_m - S_{h-1}|\}. \tag{29.7}$$

Evidently,

$$P(U_1 \ge \alpha) \le \frac{K}{\alpha^4}\Bigl(\sum_{j=1}^{h-1} b_j\Bigr)^2 \le \frac{KB^2}{4\alpha^4}, \tag{29.8}$$

by the induction hypothesis. Also, by (29.4) with $i = 0$ and $k = m$,

$$P(D_1 \ge \alpha) \le \frac{B^2}{\alpha^4}. \tag{29.9}$$

The object is now to show that

$$\min\{|S_j|,\ |S_m - S_j|\} \le U_1 + D_1, \quad 0 \le j \le h - 1. \tag{29.10}$$

If $|S_j| \le U_1$, (29.10) holds, hence suppose $|S_{h-1} - S_j| \le U_1$, the only other possibility according to (29.6). If $D_1 = |S_{h-1}|$, then

$$\min\{|S_j|,\ |S_m - S_j|\} \le |S_j| \le |S_{h-1} - S_j| + |S_{h-1}| \le U_1 + D_1.$$

And if $D_1 = |S_m - S_{h-1}|$ then again,

$$\min\{|S_j|,\ |S_m - S_j|\} \le |S_m - S_j| \le |S_{h-1} - S_j| + |S_m - S_{h-1}| \le U_1 + D_1.$$

Hence (29.10) holds in all cases. Now, for $0 \le \mu \le 1$,

$$P(U_1 + D_1 \ge \alpha) \le P\bigl(\{U_1 \ge \mu\alpha\} \cup \{D_1 \ge (1-\mu)\alpha\}\bigr) \le P(U_1 \ge \mu\alpha) + P(D_1 \ge (1-\mu)\alpha) \le \frac{B^2}{\alpha^4}\Bigl(\frac{K}{4\mu^4} + \frac{1}{(1-\mu)^4}\Bigr); \tag{29.11}$$

choosing $\mu$ to minimize $K/4\mu^4 + 1/(1-\mu)^4$ yields $\mu = (\tfrac14 K)^{1/5}/\bigl(1 + (\tfrac14 K)^{1/5}\bigr)$ (use calculus). Back-substituting for $\mu$ and simplifying yields, for $K \ge 2\bigl(1 - (\tfrac12)^{1/5}\bigr)^{-5} = 55{,}021$,

$$P(U_1 + D_1 \ge \alpha) \le \frac{B^2\bigl[(\tfrac14 K)^{1/5} + 1\bigr]^5}{\alpha^4} \le \frac{KB^2}{2\alpha^4}. \tag{29.12}$$
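The calculus step behind (29.11) and (29.12) is easy to confirm numerically. The sketch below is an illustration only (the function names and the trial values of $K$ are assumptions): it evaluates the bracket in (29.11) at the closed-form minimizer, checks that the minimum equals $[(\tfrac14 K)^{1/5} + 1]^5$, and recomputes the threshold on $K$.

```python
def bound_factor(mu, K):
    """The bracket in (29.11): K / (4 mu^4) + 1 / (1 - mu)^4."""
    return K / (4 * mu**4) + 1 / (1 - mu)**4

def minimizer(K):
    """Closed-form minimizer mu = a / (1 + a), with a = (K / 4)^(1/5)."""
    a = (K / 4) ** 0.2
    return a / (1 + a)

K_min = 2 * (1 - 0.5 ** 0.2) ** -5  # threshold on K quoted in the text
print(round(K_min))                 # 55021

K = 60000.0                         # a trial value above the threshold
mu = minimizer(K)
minimum = bound_factor(mu, K)
print(minimum <= bound_factor(mu - 0.01, K), minimum <= bound_factor(mu + 0.01, K))
print(minimum <= K / 2)             # the final inequality in (29.12) holds for this K
```

For a trial value below the threshold, such as $K = 50000$, the same computation gives a minimum larger than $K/2$, so the last inequality in (29.12) fails; this is why the induction needs such a large constant.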

According to (29.10), we have bounded $\min\{|S_j|,\ |S_m - S_j|\}$ in the range $0 \le j \le h - 1$. To do the same for the range $h \le j \le m$, define

$$U_2 = \max_{h\le j\le m}\min\{|S_j - S_h|,\ |S_m - S_j|\}, \tag{29.13}$$

$$D_2 = \min\{|S_h|,\ |S_m - S_h|\}. \tag{29.14}$$

It can be verified by variants of the previous arguments that

$$\min\{|S_j|,\ |S_m - S_j|\} \le U_2 + D_2, \quad h \le j \le m, \tag{29.15}$$

and also that

$$P(U_2 + D_2 \ge \alpha) \le \frac{KB^2}{2\alpha^4} \tag{29.16}$$

for the same choice of $K$. Combining (29.16) with (29.12), we obtain

$$P\Bigl(\max_{0\le j\le m}\min\{|S_j|,\ |S_m - S_j|\} \ge \alpha\Bigr) \le P\bigl(\max\{U_1 + D_1,\ U_2 + D_2\} \ge \alpha\bigr) = P\bigl(\{U_1 + D_1 \ge \alpha\} \cup \{U_2 + D_2 \ge \alpha\}\bigr) \le P(U_1 + D_1 \ge \alpha) + P(U_2 + D_2 \ge \alpha) \le \frac{KB^2}{\alpha^4}. \quad ∎ \tag{29.17}$$

Proof of 29.1 Let the characteristic function of $X(t)$ be

$$\phi(t,\lambda) = E\bigl(e^{i\lambda X(t)}\bigr). \tag{29.18}$$

We can write, by (11.25),

$$e^{iu} = 1 + iu - \tfrac12 u^2 + r(u), \tag{29.19}$$

where $|r(u)| \le |u|^3$. We shall write either $\Delta_{s,t}$ or $\Delta(s,t)$, as is most convenient, to denote $X(s) - X(t)$ for $0 \le t \le s \le 1$. Observe that by conditions (a) and (c) of the theorem, $E(\Delta_{s,t}^2) = s - t$. Hence,

$$\phi(t+h,\lambda) - \phi(t,\lambda) = E\bigl[e^{i\lambda X(t)}\bigl(e^{i\lambda\Delta_{t+h,t}} - 1\bigr)\bigr] = E\bigl[e^{i\lambda X(t)}\bigl(i\lambda\Delta_{t+h,t} - \tfrac12\lambda^2\Delta_{t+h,t}^2 + r(\lambda\Delta_{t+h,t})\bigr)\bigr] = \phi(t,\lambda)\bigl(-\tfrac12\lambda^2 h + E(r(\lambda\Delta_{t+h,t}))\bigr), \tag{29.20}$$

where the last equality is because $X(t)$ and $\Delta_{t+h,t}$ are independent by condition (c). Since $|E(r(\lambda\Delta_{t+h,t}))| \le |\lambda|^3 E|\Delta_{t+h,t}|^3$, it follows that

$$\Bigl|\frac{\phi(t+h,\lambda) - \phi(t,\lambda)}{h} + \tfrac12\lambda^2\phi(t,\lambda)\Bigr| \le \frac{|\lambda|^3 E|\Delta_{t+h,t}|^3}{h}. \tag{29.21}$$

Now, suppose that

$$\lim_{h\downarrow 0}\frac{1}{h}E|\Delta_{t+h,t}|^3 = 0. \tag{29.22}$$

It will then follow that, for all $0 \le t < 1$, $\phi$ possesses a right-hand derivative,

$$\lim_{h\downarrow 0}\frac{\phi(t+h,\lambda) - \phi(t,\lambda)}{h} = -\tfrac12\lambda^2\phi(t,\lambda). \tag{29.23}$$

Further, for $h > 0$ and $h \le t \le 1$, (29.21) holds at the point $t - h$, so by considering a path to the limit through such points we may also conclude that

$$\lim_{h\downarrow 0}\frac{\phi(t,\lambda) - \phi(t-h,\lambda)}{h} = -\tfrac12\lambda^2\phi(t-,\lambda). \tag{29.24}$$

Since $\phi(t-,\lambda) = \phi(t,\lambda)$ because $\phi$ is continuous in $t$, by condition (b) of the theorem, $\phi$ is differentiable on $(0,1)$ and

$$\frac{\partial\phi}{\partial t} = -\tfrac12\lambda^2\phi(t,\lambda). \tag{29.25}$$

This differential equation is well known to have the solution

$$\phi(t,\lambda) = \phi(0,\lambda)e^{-t\lambda^2/2}, \quad t \ge 0. \tag{29.26}$$

(Verify this by differentiating $\log\phi$ with respect to $t$.) Since $X(0) = 0$ a.s., $\phi(0,\lambda) = 1$, and applying the inversion theorem we conclude that $X(t) \sim N(0,t)$ for each $t \in (0,1)$. By continuity of $\phi$ at 1, the result also extends to $t = 1$.
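The claim that the $N(0,t)$ characteristic function solves the differential equation (29.25) can also be checked by finite differences. This is a minimal sketch; the function names, the step size, and the grid of $(t,\lambda)$ values are arbitrary choices made for illustration.

```python
import math

def phi(t, lam):
    """Candidate solution (29.26) with phi(0, lam) = 1: the N(0, t) ch.f."""
    return math.exp(-0.5 * lam**2 * t)

def ode_residual(t, lam, h=1e-6):
    """Central-difference estimate of d(phi)/dt + (1/2) lam^2 phi, cf. (29.25)."""
    dphi_dt = (phi(t + h, lam) - phi(t - h, lam)) / (2 * h)
    return dphi_dt + 0.5 * lam**2 * phi(t, lam)

worst = max(abs(ode_residual(t / 10, lam / 2))
            for t in range(1, 10) for lam in range(-6, 7))
print(worst < 1e-6)  # the residual vanishes up to discretization error
```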

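Hence, the task is to prove (29.22). This requires the application of 29.2 and 29.3. For some finite $m$ let $\xi_j = \Delta(t + hj/m,\ t + h(j-1)/m)$ for $j = 1,\ldots,m$. By assumption, the $\xi_j$ are independent r.v.s with variances of $h/m$. If $S_j = \sum_{i=1}^{j}\xi_i = \Delta(t + jh/m,\ t)$, then

$$E\bigl((S_j - S_i)^2(S_k - S_j)^2\bigr) = (j - i)(k - j)(h/m)^2 \le \bigl((k - i)h/m\bigr)^2. \tag{29.27}$$

By 29.3, setting $b_j = h/m$, we have

$$P\Bigl(\max_{0\le j\le m}\min\Bigl\{\bigl|\Delta\bigl(t + \tfrac{j}{m}h,\ t\bigr)\bigr|,\ \bigl|\Delta\bigl(t + h,\ t + \tfrac{j}{m}h\bigr)\bigr|\Bigr\} \ge \alpha\Bigr) \le \frac{Kh^2}{\alpha^4}. \tag{29.28}$$

Hence by 29.2,

$$P(|\Delta(t+h,t)| \ge \alpha) \le P\Bigl(2\max_{0\le j\le m}\min\Bigl\{\bigl|\Delta\bigl(t + \tfrac{j}{m}h,\ t\bigr)\bigr|,\ \bigl|\Delta\bigl(t + h,\ t + \tfrac{j}{m}h\bigr)\bigr|\Bigr\} + \max_{1\le j\le m}\bigl|\Delta\bigl(t + \tfrac{j}{m}h,\ t + \tfrac{j-1}{m}h\bigr)\bigr| \ge \alpha\Bigr) \le \frac{K^*h^2}{\alpha^4} + P\Bigl(\max_{1\le j\le m}\bigl|\Delta\bigl(t + \tfrac{j}{m}h,\ t + \tfrac{j-1}{m}h\bigr)\bigr| \ge \tfrac12\alpha\Bigr), \tag{29.29}$$

where $K^* = 4^4K$. Letting $m \to \infty$, the second term of the majorant member must go to zero, since $X \in C$ with probability 1 by condition (b), so we can say that

$$P(|\Delta_{t+h,t}| \ge \alpha) \le \frac{K^*h^2}{\alpha^4}. \tag{29.30}$$

We may now use 9.15 to give

$$E|\Delta_{t+h,t}|^3 = \int_{\{|\Delta_{t+h,t}|^3\le\varepsilon\}}|\Delta_{t+h,t}|^3\,dP + \varepsilon P(|\Delta_{t+h,t}|^3 > \varepsilon) + \int_\varepsilon^\infty P(|\Delta_{t+h,t}|^3 > \xi)\,d\xi \le \varepsilon + \int_\varepsilon^\infty P(|\Delta_{t+h,t}| \ge \xi^{1/3})\,d\xi \le \varepsilon + K^*h^2\int_\varepsilon^\infty \xi^{-4/3}\,d\xi = \varepsilon + \frac{3K^*h^2}{\varepsilon^{1/3}}. \tag{29.31}$$

Choose $\varepsilon = (K^*)^{3/4}h^{3/2}$ to minimize the last member above, and we obtain

$$E|\Delta_{t+h,t}|^3 \le 4(K^*)^{3/4}h^{3/2}. \tag{29.32}$$

This condition verifies (29.22), and completes the proof. ∎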
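The final optimization over $\varepsilon$ can be confirmed in a few lines. In this sketch the function name and the numerical values of $K^*$ and $h$ are hypothetical, chosen only to exercise the formulae: it evaluates the last member of (29.31) at $\varepsilon = (K^*)^{3/4}h^{3/2}$ and compares the result with the bound in (29.32).

```python
def third_moment_bound(eps, K_star, h):
    """Last member of (29.31): eps + 3 K* h^2 / eps^(1/3)."""
    return eps + 3 * K_star * h**2 / eps ** (1 / 3)

K_star, h = 4**4 * 55021.0, 0.01     # hypothetical values for illustration
eps_star = K_star**0.75 * h**1.5     # minimizing choice of eps quoted in the text
best = third_moment_bound(eps_star, K_star, h)

print(abs(best - 4 * K_star**0.75 * h**1.5) <= 1e-9 * best)   # matches (29.32)
print(best < third_moment_bound(0.5 * eps_star, K_star, h),
      best < third_moment_bound(2.0 * eps_star, K_star, h))   # nearby eps do worse
```

Dividing the bound in (29.32) by $h$ leaves a multiple of $h^{1/2}$, which is what makes the limit in (29.22) equal to zero.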

Notice how (29.30) is a substantial strengthening of the Chebyshev inequality, which gives merely $P(|\Delta_{t+h,t}| \ge \alpha) \le h/\alpha^2$. We have not assumed the existence of the third moment at the outset; this emerges (along with the Gaussianity) from the assumption of independent increments of arbitrarily small width, which allows us to take (29.29) to the limit.

29.2 Asymptotic Independence

Let $\{X_n\}_{n=1}^{\infty}$ denote a stochastic sequence in $(D,\mathcal{B}_D)$. We say that $X_n$ has asymptotically independent increments if, for any collection of points $\{s_i,t_i,\ i = 1,\ldots,r\}$ such that

$$0 \le s_1 \le t_1 < s_2 \le t_2 < \ldots < s_r \le t_r \le 1,$$

and all collections of linear Borel sets $B_1,\ldots,B_r \in \mathcal{B}$,

$$P\bigl(X_n(t_i) - X_n(s_i) \in B_i,\ i = 1,\ldots,r\bigr) \to \prod_{i=1}^{r} P\bigl(X_n(t_i) - X_n(s_i) \in B_i\bigr) \tag{29.33}$$

as $n \to \infty$. Notice that in this definition, gaps of positive width are allowed to separate the increments, which will be essential to establish asymptotic independence in the partial sums of mixing sequences. The gaps can be arbitrarily small, however, and continuity allows us to ignore them as we see below.

Given this idea, we have the following consequence of 29.1.

29.4 Theorem Let $\{X_n\}_{n=1}^{\infty}$ have the following properties:

(a) The increments are asymptotically independent.

(b) For any $\varepsilon > 0$ and $\eta > 0$, $\exists\ \delta \in (0,1)$ s.t. $\limsup_{n\to\infty} P(w(X_n,\delta) \ge \varepsilon) \le \eta$.

(c) $\{X_n(t)^2\}_{n=1}^{\infty}$ is uniformly integrable for each $t \in [0,1]$.

(d) $E(X_n(t)) \to 0$ and $E(X_n(t)^2) \to t$ as $n \to \infty$, each $t \in [0,1]$.

Then $X_n \xrightarrow{D} B$. □

Be careful to note that $w(\cdot,\delta)$ in (b) is the modulus of continuity of (27.14), not $w'$ of (28.3).

Proof Condition (b), and the fact that $E|X_n(0)| \to 0$ by (d), imply by 28.14 that the associated sequence of p.m.s is uniformly tight. Theorem 26.22 then implies that the latter sequence is compact, and so has one or more cluster points. To complete the proof, we show that all such cluster points must have the characteristics of Wiener measure, and hence that the sequence has this p.m. as its unique weak limit.

Consider the properties the limiting p.m. must possess. Writing $X$ for the random element, 28.14 also gives $P(X \in C) = 1$. Uniform integrability of $X_n(t)^2$, and hence of $X_n(t)$, implies that $E(X(t)) = 0$ and $E(X(t)^2) = t$, by 22.16. By condition (a) we may say that the increments $X(t_1) - X(s_1),\ldots,X(t_r) - X(s_r)$ are totally independent according to (29.33). Specifically, consider increments $X(t_i) - X(s_i)$ and $X(t_{i+1}) - X(s_{i+1})$ for the case where $s_{i+1} = t_i + 1/m$. By a.s. continuity,

$$\lim_{m\to\infty}\bigl(X(t_{i+1}) - X(t_i + 1/m)\bigr) = X(t_{i+1}) - X(t_i) \quad \text{w.p. 1}, \tag{29.34}$$

so that asymptotic independence extends to contiguous increments. All the conditions of 29.1 are therefore satisfied by $X$, and $X \sim B$. ∎

Our aim is now to get a FCLT for partial-sum processes by linking up the asymptotic independence idea with our established characterization of a dependent increment process; that is to say, as a near-epoch dependent function of a mixing process. Making this connection is perhaps the biggest difficulty we still have to surmount. An approach comparable to the 'blocking' argument used in the CLTs of §24.4 is needed, and in the present context we can proceed by mapping an infinite sequence into $[0,1]$ and identifying the increments with asymptotically independent blocks of summands. This is a particularly elegant route to the result. However, an asymptotic martingale difference-type property of the type exploited in 24.6 is not going to work in our present approach to the problem. While the terms of a mixing process (of suitable size) can be 'blocked' so that the blocks are asymptotically independent (more or less by definition of mixing), mixingale theory will not serve here; near-epoch dependent functions can be dealt with only by a direct approximation argument.

What we shall show is that, if the difference between two stochastic processes is $o_p(1)$ and one of them exhibits asymptotic independence, so must the other, in a sense to be defined. Near-epoch dependent functions can be approximated in the required way by their near-epoch conditional expectations, where the latter are functions of mixing variables. This result is established in the following lemma in terms of the independence of a pair of sequences, which in the application will be adjacent increments of a partial sum process.

29.5 Lemma (Wooldridge and White 1986: Lemma A.3) If $\{Y_{jn}\}$ and $\{Z_{jn}\}$ are real stochastic sequences, and

(a) $Y_{jn} - Z_{jn} \xrightarrow{pr} 0$, for $j = 1,2$;

(b) $Y_{jn} \xrightarrow{D} Y_j$ for $j = 1,2$;

(c) for any $A_1, A_2 \in \mathcal{B}$,

$$P(\{Z_{1n} \in A_1\} \cap \{Z_{2n} \in A_2\}) \to P(Z_{1n} \in A_1)P(Z_{2n} \in A_2) \tag{29.35}$$

as $n \to \infty$;

then

$$P(\{Y_{1n} \in B_1\} \cap \{Y_{2n} \in B_2\}) \to P(Y_1 \in B_1)P(Y_2 \in B_2) \tag{29.36}$$

for all $Y_j$-continuity sets (sets $B_j \in \mathcal{B}$ such that $P(Y_j \in \partial B_j) = 0$) for $j = 1,2$.

Proof Considering $(Z_{1n},Z_{2n})$ and $(Y_{1n},Y_{2n})$ as points of $\mathbb{R}^2$ with the Euclidean metric, (a) implies $d_E((Z_{1n},Z_{2n}),(Y_{1n},Y_{2n})) \xrightarrow{pr} 0$, and by an application of 26.24, (b) implies both $(Z_{1n},Z_{2n}) \xrightarrow{D} (Y_1,Y_2)$ and $Z_{jn} \xrightarrow{D} Y_j$, $j = 1,2$. Write

$$P(\{Z_{1n} \in B_1\} \cap \{Z_{2n} \in B_2\}) = \mu_n(B_1 \times B_2), \tag{29.37}$$

where $\mu_n$ is the measure associated with the element $(Z_{1n},Z_{2n})$. If $\mu$ is the measure associated with $(Y_1,Y_2)$, define the marginal measures $\mu^j$ by $\mu^j(B_j) = P(Y_j \in B_j)$; then $\mu^j(\partial B_j) = 0$ for $j = 1,2$ implies $\mu(\partial(B_1 \times B_2)) = 0$, in view of the fact that

$$\partial(B_1 \times B_2) \subseteq (\partial B_1 \times \mathbb{R}) \cup (\mathbb{R} \times \partial B_2). \tag{29.38}$$

Applying (e) of 26.10, it follows from the weak convergence of the joint distributions that, for all $Y_j$-continuity sets $B_j$,

$$P(\{Z_{1n} \in B_1\} \cap \{Z_{2n} \in B_2\}) = \mu_n(B_1 \times B_2) \to \mu(B_1 \times B_2) = P(\{Y_1 \in B_1\} \cap \{Y_2 \in B_2\}). \tag{29.39}$$

And by the weak convergence of both sets of marginal distributions it follows that, for these same $B_j$,

$$P(Z_{1n} \in B_1)P(Z_{2n} \in B_2) \to P(Y_1 \in B_1)P(Y_2 \in B_2). \tag{29.40}$$

This completes the proof, since the limits of the left-hand sides of (29.39) and (29.40) are the same by condition (c). ∎

29.3 The FCLT for NED Functions of Mixing Processes

From 29.4 to a general invariance principle for dependent sequences is only a short step, even though some of the details in the following version of the result are quite fiddly. This is basically the one given by Wooldridge and White (1988).

29.6 Theorem Let $\{U_{ni}\}$ be a zero-mean stochastic array, $\{c_{ni}\}$ an array of positive constants, and $\{K_n(t),\ n \in \mathbb{N}\}$ a sequence of integer-valued, right-continuous, increasing functions of $t$, with $K_n(0) = 0$ for all $n$, and $K_n(t) - K_n(s) \to \infty$ as $n \to \infty$ if $t > s$. Also define $X_n^K(t) = \sum_{i=1}^{K_n(t)} U_{ni}$. If

(a) $E(U_{ni}) = 0$;

(b) $\sup_{i,n}\|U_{ni}/c_{ni}\|_r < \infty$, for $r > 2$;

(c) $U_{ni}$ is $L_2$-NED of size $-\gamma$, for $\tfrac12 \le \gamma \le 1$, with respect to the constants $\{c_{ni}\}$, on an array $\{V_{ni}\}$ which is $\alpha$-mixing of size $-r/(r-2)$;

(d) $\sup_{t\in[0,1),\,\delta\in(0,1-t]}\ \limsup_{n\to\infty}\ \dfrac{v_n^2(t,\delta)}{\delta} < \infty$, where $v_n^2(t,\delta) = \sum_{i=K_n(t)+1}^{K_n(t+\delta)} c_{ni}^2$;

(e) $\max_{1\le i\le K_n(1)} c_{ni} = o(K_n(1)^{\gamma-1})$, where $\gamma$ is defined in (c);

(f) $E(X_n^K(t)^2) \to t$ as $n \to \infty$, for each $t \in [0,1]$;

then $X_n^K \xrightarrow{D} B$. □

Right-continuity of $K_n$ ensures that $v_n^2(t,\delta) \to 0$ as $\delta \to 0$, if we agree that a sum is equal to zero whenever the lower limit exceeds the upper.

If $\gamma$ is set to 1 in condition (c), condition (e) can be omitted. It is important to emphasize that this statement of the assumptions, while technically correct, is somewhat misleading in that condition (c) is not the only constraint on the dependence. In the leading cases discussed below, condition (f) will as a rule imply an $L_2$-NED size of $-1$.

Theorem 29.6 is very general, and it may help in getting to grips with it to extract a more basic and easily comprehended set of sufficient conditions. What we might think of as the 'standard' case of the FCLT - that of convergence of a partial sum process to Wiener measure - corresponds to the case $K_n(t) = [nt]$. We will omit the $K$ superscript to denote this case, writing $X_n(t) = \sum_{i=1}^{[nt]} U_{ni}$. The full conditions of the theorem allow different modes of convergence to be defined for various kinds of heterogeneous processes, and these issues are taken up again in §29.4 below. But it might be a good plan to focus initially on the case $X_n(t)$, mentally making the required substitutions of $[nt]$ for $K_n$ in the formulae.

In particular, consider the case $U_{ni} = U_i/s_n$ where

$$s_n^2 = E\Bigl(\sum_{i=1}^{n} U_i\Bigr)^2 = \sum_{i=1}^{n}\sigma_i^2 + 2\sum_{i=1}^{n-1}\sum_{m=1}^{n-i}\sigma_{i,i+m}, \tag{29.41}$$

with $\sigma_i^2 = \mathrm{Var}(U_i)$ and $\sigma_{i,i+m} = \mathrm{Cov}(U_i,U_{i+m})$. Also, require that $\sup_i\|U_i\|_r < \infty$, $r > 2$. Then we may choose $c_{ni} = 1/s_n$, and with $K_n(t) = [nt]$, condition 29.6(d) reduces to the requirement that $s_n^2/n > 0$ uniformly in $n$. In this case, 29.6(e) is satisfied for $\gamma = \tfrac12$. If in addition $s_n^2/n \to \sigma^2 < \infty$, then $E(X_n(t)^2) = s_{[nt]}^2/s_n^2 \to t$ and 29.6(f) also holds. These conclusions are summarized in the following corollary.

29.7 Corollary Let the sequence $\{U_i\}$ have mean zero, be uniformly $L_r$-bounded, and $L_2$-NED of size $-\tfrac12$ on an $\alpha$-mixing process of size $-r/(r-2)$, and let $X_n(t) = n^{-1/2}\sum_{i=1}^{[nt]} U_i$. If $n^{-1}E\bigl(\sum_{i=1}^{n} U_i\bigr)^2 \to \sigma^2$, $0 < \sigma^2 < \infty$, then $X_n \xrightarrow{D} \sigma B$. □

Be careful to note that $\sigma^2 = \bar\sigma^2 + 2\sum_{m=1}^{\infty}\lambda_m$, where $\bar\sigma^2 = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n}\sigma_i^2$ and $\lambda_m = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n-m}\sigma_{i,i+m}$. This and not $\bar\sigma^2$ is the variance of the limiting Brownian motion, notwithstanding the fact that $B$ has independent increments. The condition $s_n^2/n \to \sigma^2$ has two parts. The first is that the limits $\bar\sigma^2$ and $\lambda_m$, for $m = 1,2,3,\ldots$, all exist, which is the condition of global wide-sense stationarity discussed in §13.2. Examples where this condition is violated are provided by 24.10 and 24.11. The second is that $\sum_{m=1}^{\infty}\lambda_m < \infty$, for which it is sufficient that $\sum_{m=1}^{\infty}|\sigma_{i,i+m}| < \infty$ for each $i$. According to 17.7 this follows from condition 29.6(c), with the additional requirement that $\gamma = 1$.
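The distinction between $\sigma^2$ and $\bar\sigma^2$ is easy to see in a stationary example. The sketch below evaluates (29.41) for a hypothetical MA(1) sequence $U_i = e_i + \theta e_{i-1}$ with unit-variance innovations, for which $\mathrm{Var}(U_i) = 1 + \theta^2$, the lag-one covariance is $\theta$, and all higher-order covariances are zero. The function name and the value of $\theta$ are assumptions made for illustration.

```python
def s2_n(n, var, autocov):
    """s_n^2 from (29.41) for a stationary sequence.

    var is Var(U_i); autocov(m) is Cov(U_i, U_{i+m}) for m >= 1.
    """
    total = n * var
    for i in range(1, n):              # i = 1, ..., n - 1
        for m in range(1, n - i + 1):  # m = 1, ..., n - i
            total += 2 * autocov(m)
    return total

theta = 0.5                                   # hypothetical MA(1) coefficient
var = 1 + theta**2                            # Var(U_i) = 1.25
autocov = lambda m: theta if m == 1 else 0.0  # only the lag-one covariance is nonzero

n = 400
print(s2_n(n, var, autocov) / n)  # 2.2475, approaching (1 + theta)^2 = 2.25, not 1.25
```

Here $\bar\sigma^2 = 1.25$ and $\lambda_1 = 0.5$, so $\sigma^2 = 1.25 + 2(0.5) = 2.25$, which is the limit that $s_n^2/n$ approaches.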

The complications of 29.6 are chiefly to accommodate global nonstationarity. The following is such a case.

29.8 Example Let the sequence $\{U_i\}$ have variances $\sigma_i^2 \sim i^\beta$, and (just for simplicity's sake) be serially uncorrelated. Then, $s_n^2 = O(n^{1+\beta})$ and choosing $K_n(t) = [nt^{1/(1+\beta)}]$ will serve to satisfy conditions 29.6(d) and 29.6(f). □
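As a rough numerical check of this choice of $K_n$, the sketch below computes $E(X_n^K(t)^2) = s_{K_n(t)}^2/s_n^2$ under the simplifying assumption that the variances are exactly $\sigma_i^2 = i^\beta$ (the example only requires this up to asymptotic equivalence). The function name is hypothetical.

```python
import math


def second_moment(n, t, beta):
    """s_{K_n(t)}^2 / s_n^2 for uncorrelated U_i with Var(U_i) = i**beta
    and K_n(t) = [n * t**(1/(1+beta))]."""
    k = math.floor(n * t ** (1.0 / (1.0 + beta)))

    def s2(m):
        # variance of the sum of the first m terms (no covariances by assumption)
        return sum(i ** beta for i in range(1, m + 1))

    return s2(k) / s2(n)


if __name__ == "__main__":
    for t in (0.1, 0.3, 0.7, 1.0):
        print(t, second_moment(20000, t, beta=1.0))
```

For large $n$ the printed values are close to $t$, as condition 29.6(f) requires.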


It is instructive to compare the conditions of 29.6 with those of 24.6 and 24.7. Since $X_n(1) \xrightarrow{D} B(1) \sim N(0,1)$, the two theorems give alternative sets of conditions for the central limit theorem. Although they are stated in very different terms, conditions 29.6(d) and (e) clearly have a role analogous to 24.6(d). While 24.6 required an $L_2$-NED size of $-1$, it was pointed out above how the same condition is generally enforced by 29.6(f). However, 29.6(f) itself has no counterpart in the CLT conditions. It is not clear how tough this restriction is, given our free choice of $K_n$, and this is a question we attempt to shed light on in §29.4. What is clear is that the convergence of the partial sum process $X_n$ to $B$ requires stronger conditions than are required just for the convergence of $X_n(1)$ to $B(1)$, which is the CLT.

Proof of 29.6 We will establish that the conditions of 29.4 hold for the sequence $\{X_n^K\}$. Condition 29.4(d) holds directly, by the present conditions (a) and (f).

Conditions (a), (b), and (c) imply by 17.6(i) that $\{U_{ni},\mathcal{F}_{ni}\}$ is an $L_2$-mixingale of size $-\tfrac12$ with respect to the scaling constants $\{c_{ni}\}$, where $\mathcal{F}_{ni} = \sigma(V_{n,i-j},\ j \ge 0)$. In view of the uniform $L_r$-boundedness with $r > 2$, the array $\{U_{ni}^2/c_{ni}^2\}$ is uniformly integrable. If we let $k = K_n(t)$ and $m = K_n(t+\delta) - K_n(t)$ for $\delta \in (0,1-t]$, it follows by 16.14 (which holds irrespective of shifts in the coordinate index) that the set

$$\Big\{\max_{1\le j\le m}\frac{(S_{n,k+j} - S_{nk})^2}{v_n^2(t,\delta)},\ n \ge 1\Big\} \tag{29.42}$$

is uniformly integrable, for any $t$ and $\delta$. Further, because of condition (d) we may assume there is a positive constant $M < \infty$ such that for any $t \in [0,1)$ and any $\delta \in (0,1-t]$, there exists $N(t,\delta) \ge 1$ with the property $v_n^2(t,\delta)/\delta < M$ for $n \ge N(t,\delta)$. Therefore the set

$$\Big\{\max_{1\le j\le m}\frac{(S_{n,k+j} - S_{nk})^2}{\delta},\ n \ge N(t,\delta)\Big\} \tag{29.43}$$

is also uniformly integrable. If $N^* = \sup_{t,\delta}N(t,\delta)$, condition (d) implies that $N^*$ is finite.

Taking the case $t = 0$ and hence $k = 0$ and $m = K_n(\delta)$ in (29.43) (but then writing $t$ in place of $\delta$ for consistency of notation), we deduce uniform integrability of $\{X_n^K(t)^2\}_{n=1}^\infty$ for any $t \in (0,1]$ (the summands from 1 to $N(0,t) - 1$ can be included by condition (b)). In other words, condition 29.4(c) holds for $\{X_n^K\}_{n=1}^\infty$.

Note that $P(|X| > 1) \le E(X^2 1_{\{|X|>1\}})$ for any square-integrable r.v. $X$. Therefore, the uniform integrability of (29.43) implies that for any $\delta \in (0,1)$, any $t \le 1-\delta$, and any $\varepsilon > 0$ and $\eta > 0$, $\exists\ \lambda > 0$ large enough that for $n \ge N^*$,

$$P\Big(\max_{1\le j\le m}|S_{n,k+j} - S_{nk}| > \lambda\sqrt{\delta}\Big) \le \frac{\eta\varepsilon^2}{4\lambda^2}, \tag{29.44}$$

where $k$ and $m$ are defined as before. The argument now follows similar lines to the proof of 27.14. For the case $\delta = \varepsilon^2/4\lambda^2$, (29.44) implies


$$\sup_{0\le t\le 1-\delta}P\Big(\sup_{t\le s\le t+\delta}|X_n^K(s) - X_n^K(t)| \ge \tfrac12\varepsilon\Big) \le \eta\delta,\quad n \ge N^*, \tag{29.45}$$

which is identical to (27.71). Condition 29.4(b) now follows by 27.13, as before.

The final step is to show asymptotic independence. Whereas the theorem requires us to show that (29.33) holds for any $r$, since the argument is based on the mixing property and the linear separation of the increments, it will suffice to show independence for adjacent pairs of increments $(i, i+1)$ having $t_i < s_{i+1}$. The extension to the general case is easy in principle, though tedious to write out.

Hence we consider, without loss of generality, the pair of variables

$$Y_{jn} = X_n^K(t_j) - X_n^K(s_j) = \sum_{i=K_n(s_j)+1}^{K_n(t_j)}U_{ni},\quad j = 1 \text{ and } 2, \tag{29.46}$$

where $0 \le s_1 < t_1 < s_2 < t_2 \le 1$. We cannot show asymptotic independence of $Y_{1n}$ and $Y_{2n}$ directly because the increment process need not be mixing, but there is an approximation argument direct from the NED property. Defining $\mathcal{F}_{n,j}^{k} = \sigma(V_{nj},\ldots,V_{nk})$, the r.v. $E(Y_{1n}\mid\mathcal{F}_{n,-\infty}^{K_n(t_1)})$ is $\mathcal{F}_{n,-\infty}^{K_n(t_1)}$-measurable, and similarly $E(Y_{2n}\mid\mathcal{F}_{n,K_n(s_2)}^{\infty})$ is $\mathcal{F}_{n,K_n(s_2)}^{\infty}$-measurable. By assumption (c),

$$\sup_{A\in\mathcal{F}_{n,-\infty}^{K_n(t_1)},\ B\in\mathcal{F}_{n,K_n(s_2)}^{\infty}}|P(A\cap B) - P(A)P(B)| = \alpha_{K_n(s_2)-K_n(t_1)} \to 0 \text{ as } n\to\infty \tag{29.47}$$

whenever $t_1 < s_2$, where the events $A$ include those of the form $\{E(Y_{1n}\mid\mathcal{F}_{n,-\infty}^{K_n(t_1)}) \in E\}$ for $E \in \mathcal{B}$, and similarly events $B$ include those of the type $\{E(Y_{2n}\mid\mathcal{F}_{n,K_n(s_2)}^{\infty}) \in E'\}$. These conditional expectations are asymptotically independent r.v.s, and it remains to show that $Y_{1n}$ and $Y_{2n}$ share the same property.

We show that the conditions of 29.5 are satisfied when $Z_{1n} = E(Y_{1n}\mid\mathcal{F}_{n,-\infty}^{K_n(t_1)})$ and $Z_{2n} = E(Y_{2n}\mid\mathcal{F}_{n,K_n(s_2)}^{\infty})$. This is sufficient in view of the fact that the continuity sets are a convergence-determining class for the sequences $\{Y_{jn}\}$, by 26.10(e). The argument of the preceding paragraph has already established condition 29.5(c). To show condition 29.5(a) we have the inequalities

$$\begin{aligned}
\big\|Y_{1n} - E(Y_{1n}\mid\mathcal{F}_{n,-\infty}^{K_n(t_1)})\big\|_2 &\le \sum_{i=K_n(s_1)+1}^{K_n(t_1)}\big\|U_{ni} - E(U_{ni}\mid\mathcal{F}_{n,-\infty}^{K_n(t_1)})\big\|_2\\
&\le 2\sum_{i=K_n(s_1)+1}^{K_n(t_1)}\big\|U_{ni} - E(U_{ni}\mid\mathcal{F}_{n,2i-K_n(t_1)}^{K_n(t_1)})\big\|_2\\
&\le 2\sum_{i=K_n(s_1)+1}^{K_n(t_1)}c_{ni}\,\nu_{K_n(t_1)-i}\\
&\le 2\max_{K_n(s_1)<i\le K_n(t_1)}c_{ni}\sum_{m=0}^{K_n(t_1)-K_n(s_1)-1}\nu_m\\
&\to 0 \text{ as } n\to\infty,
\end{aligned} \tag{29.48}$$

where we have applied Minkowski's inequality, 10.28, and finally assumptions (c) and (e), and 2.27. This implies that $Y_{1n} - E(Y_{1n}\mid\mathcal{F}_{n,-\infty}^{K_n(t_1)}) \xrightarrow{L_2} 0$. Note that condition (d) implies that $\sup_i c_{ni} \to 0$ as $n \to \infty$, so in the case $\gamma = 1$, (e) can be dispensed with. By the same reasoning, $Y_{2n} - E(Y_{2n}\mid\mathcal{F}_{n,K_n(s_2)}^{\infty}) \xrightarrow{L_2} 0$ also.

Since we have established that conditions 29.4(b) and 29.4(d) hold, we know that the sequence of measures associated with $\{X_n^K\}$ is uniformly tight, and so contains at least one convergent subsequence $\{n_k, k \in \mathbb{N}\}$, say, such that $X_{n_k}^K \xrightarrow{D} X^K$ (say) as $k \to \infty$, where $P(X^K \in C) = 1$. It follows that the continuous mapping theorem applies to the coordinate projections $\pi_t(X^K) = X^K(t)$, and we may assert that $X_{n_k}^K(t) \xrightarrow{D} X^K(t)$. Confining attention to this subsequence, condition 29.5(b) is satisfied for the case $Y_{n_k,j} = X_{n_k}^K(t_j) - X_{n_k}^K(s_j)$. All the conditions of 29.5 have now been confirmed, so these increments are asymptotically independent in the sense of (29.36). But since this is true for every convergent subsequence $\{n_k\}$, we can conclude that the weak limit of $\{X_n^K\}$ has asymptotically independent increments whenever it exists. All the conditions of 29.4 are therefore fulfilled by $\{X_n^K\}$, and the proof is complete. ∎

It is possible to relax the moment conditions of this theorem if we substitute a uniform mixing condition for the strong mixing in condition (c).

29.9 Theorem Let $\{U_{ni}\}$, $\{c_{ni}\}$, $\{K_n(t)\}$, and $\{X_n^K\}$ be defined as in 29.6; assume that conditions 29.6(a), (d), (e) and (f) hold, but replace conditions 29.6(b) and (c) by the following:

(b') $\sup_{i,n}\|U_{ni}/c_{ni}\|_r < \infty$, for $r \ge 2$, and $\{U_{ni}^2/c_{ni}^2\}$ is uniformly integrable;

(c') $U_{ni}$ is $L_2$-NED of size $-\gamma$, for $\tfrac12 \le \gamma \le 1$, with respect to constants $\{c_{ni}\}$, on an array $\{V_{ni}\}$ which is $\phi$-mixing of size $-r/2(r-1)$, for $r \ge 2$;

then $X_n^K \xrightarrow{D} B$. □

The uniform integrability stipulation in (b') is required only for the case $r = 2$, and the difference between this and the $\alpha$-mixing case is that this value of $r$ is permitted, corresponding to a $\phi$-mixing size of $-1$.

Proof By 17.6(ii), $\{U_{ni}\}$ is again an $L_2$-mixingale of size $-\tfrac12$ in this case. The same arguments as before establish that conditions 29.4(b), (c) and (d) hold; and, since $\alpha_m \le \phi_m$, condition (29.47) remains valid so that asymptotic independence also holds by the same arguments as before. ∎

29.4 Transformed Brownian Motion

To develop a fully general theory of weak convergence of partial sum processes, permitting global heterogeneity of the increments with possibly trending moments, and particularly to accommodate the multivariate case, we shall need to extend the class of limit processes beyond ordinary Brownian motion. The desired generalization has already been introduced as example 27.8, but now we consider the theory of these processes a little more formally. A transformed (or variance-transformed) Brownian motion $B_\eta$ will be defined as a stochastic process on $[0,1]$ with finite-dimensional distributions given by

$$B_\eta(t) \sim B(\eta(t)),\quad t \in [0,1], \tag{29.49}$$

where $B$ is a Brownian motion and $\eta$ is an increasing homeomorphism on $[0,1]$ with $\eta(0) = 0$. The increments of this process, $B_\eta(t) - B_\eta(s)$ for $0 \le s < t \le 1$, are therefore independent and Gaussian with mean 0 and variance $\eta(t) - \eta(s)$. Since $\eta(1)$ must be finite, the condition $\eta(1) = 1$ can be achieved by a trivial normalization.

To appreciate the relevance of these processes, consider, as was done in §27.4, the characterization of $B$ as the limit of a partial-sum process with independent Gaussian summands. Here we let the variance of the terms change with time. Suppose $\xi_i \sim N(0,\sigma_i^2)$, and let $s_n^2 = E\big(\sum_{i=1}^n\xi_i\big)^2 = \sum_{i=1}^n\sigma_i^2$. Also suppose the variance sequence $\{\sigma_i^2\}_{i=1}^\infty$ has the property that, for each $t \in [0,1]$,

$$\frac{s_{[nt]}^2}{s_n^2} \to \eta(t) \text{ as } n \to \infty, \tag{29.50}$$

where the limit function $\eta: [0,1] \mapsto [0,1]$ is continuous and strictly increasing everywhere. In this case, according to the definition of $B_\eta$, we have

$$\frac{1}{s_n}\sum_{i=1}^{[nt]}\xi_i \xrightarrow{D} B_\eta(t) \tag{29.51}$$

for each $t \in [0,1]$.

What mode of evolution of the variances might satisfy (29.50), and give rise to this limiting result? In what we called, in §13.2, the globally stationary case, where the sequence $\{\sigma_i^2\}_1^\infty$ is Cesàro-summable and the Cesàro limit is strictly positive, it is fairly easy to see that $\eta(t) = t$ is the only possible limit for (29.50). This conclusion extends to any case where the variances are uniformly bounded and the limit exists; however, the fact that uniform boundedness of the variances is not sufficient is illustrated by 24.11. (Try evaluating the sequence in (29.50) for this case.)

Alternatively, consider the example in 27.8. It may be surprising to find that (for the case $-1 < \beta < 0$) the partial sums have a well defined limit process even when the Cesàro limit of the variances is 0. However, 27.8 is more general than it may at first appear. Define a continuous function on $[0,\infty)$ by

$$g(v) = s_{[v]}^2 + (v - [v])\sigma_{[v]+1}^2. \tag{29.52}$$

If $s_n^2$ satisfies (29.50), $g$ is regularly varying at infinity according to 2.32. Here $g$ is piecewise linear, with right derivative $g'(n) = \sigma_{n+1}^2$, so that $g(n+1) = g(n) + g'(n)$ for integer $n$, and note that by 2.33 (which holds for right derivatives) $g'$ is also regularly varying. The variance process of 27.8 can be generalized at most by the inclusion of a slowly varying component.

This is the situation for the case of unweighted partial sums, as in (29.53), the one that probably has the greatest relevance for applications. But remember there are other ways to define the limit of a partial-sum process, using an array formulation. There need only exist a sequence $\{g_n\}_1^\infty$ of strictly increasing functions on the integers such that $g_n([nt])/g_n(n) \to \eta(t)$, and the partial sums of the array $\{\xi_{ni}\}$, where $\xi_{ni} \sim N(0,\sigma_{ni}^2)$ and

$$\sigma_{ni}^2 = \big(g_n(i) - g_n(i-1)\big)/g_n(n), \tag{29.53}$$

will converge to $B_\eta$. And since such a sequence can always be generated by setting $g_n([nt]) = \eta(t)a_n$, where $a_n$ is any monotone positive real sequence, any desired member of the family $B_\eta$ can be constructed from Gaussian increments in this manner.
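The array construction just described can be sketched in a few lines of Python. This is a minimal illustration, assuming the particular choice $g_n(i) = a_n\eta(i/n)$ with $\eta(1) = 1$, so that the increment variances reduce to $\eta(i/n) - \eta((i-1)/n)$; the function names and the example $\eta(t) = t^{3/2}$ are hypothetical and not taken from the text.

```python
import random


def increment_variances(eta, n):
    """Variances eta(i/n) - eta((i-1)/n), i = 1..n, i.e. (29.53) with g_n(i) = a_n * eta(i/n)."""
    return [eta(i / n) - eta((i - 1) / n) for i in range(1, n + 1)]


def variance_at(eta, n, t):
    """Variance of the partial sum of the first [nt] independent increments."""
    k = int(n * t)
    return sum(increment_variances(eta, n)[:k])


def simulate_path(eta, n, rng):
    """One draw of the partial-sum process on the grid i/n, built from Gaussian increments."""
    path, level = [0.0], 0.0
    for v in increment_variances(eta, n):
        level += rng.gauss(0.0, v ** 0.5)
        path.append(level)
    return path


if __name__ == "__main__":
    eta = lambda t: t ** 1.5          # hypothetical increasing homeomorphism of [0, 1]
    print(variance_at(eta, 1000, 0.5), eta(0.5))
    print(simulate_path(eta, 5, random.Random(0)))
```

Because the variances telescope, the variance of the partial sum at $t$ equals $\eta([nt]/n)$, which is the property that identifies the limit as $B_\eta$.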

The results obtained in §29.1 and §29.2 are now found to have generalizations from $B$ to the class $B_\eta$. For 29.1 we have the following corollary.

29.10 Corollary Let condition 29.1(a) be replaced by

(a') $E(X(t)) = 0$, $E(X(t)^2) = \eta(t)$, $0 \le t \le 1$.

Then $X \sim B_\eta$. □

Proof Define $X^*(t) = X(\eta^{-1}(t))$ and apply 29.1 to $X^*$. $\eta^{-1}(\cdot)$ is continuous, so condition 29.1(b) continues to hold. Strict monotonicity ensures that if $\{t_1,\ldots,t_m\}$ define arbitrary non-overlapping intervals, so also do $\{\eta^{-1}(t_1),\ldots,\eta^{-1}(t_m)\}$, so 29.1(c) continues to hold. ∎

Similarly, for 29.4 there is the following corollary.

29.11 Corollary Let the conditions (a), (b), and (c) of 29.4 hold, and instead of condition 29.4(d) assume

(d') $E(X_n(t)) \to 0$ and $E(X_n(t)^2) \to \eta(t)$ as $n \to \infty$, each $t \in [0,1]$.

Then $X_n \xrightarrow{D} B_\eta$. □

Proof The argument in the proof of 29.4 shows that the conditions of 29.10 hold for $X$. ∎

29.12 Example Let $\{U_i\}$ denote a sequence satisfying the conditions of 29.7, with the extra stipulation that the $L_2$-NED size is $-1$. Define the cadlag process

$$X_n(t) = \frac{1}{\sigma n^{3/2}}\sum_{j=1}^{[nt]}jU_j. \tag{29.54}$$

This differs from the process $n^{-1/2}\sum_{j=1}^{[nt]}U_j/\sigma$ only by the multiplication of the summands by constant weights $j/n$, taking values between $1/n$ and 1. The arguments of 29.6 show that conditions 29.4(a), (b), and (c) are satisfied for this case, and it remains to check 29.11(d'). We show that

$$E(X_n(t)^2) \to \tfrac13t^3. \tag{29.55}$$

Choose a monotone sequence $\{b_n \in \mathbb{N}\}$ such that $b_n \to \infty$ but $b_n/n \to 0$; $b_n = [n^{1/2}]$ will do. Putting $r_n = [nt/b_n]$ for $t \in (0,1]$ and $n$ large enough that $r_n \ge 1$, we have

$$X_n(t) = \frac{1}{\sigma n^{3/2}}\Bigg(\sum_{i=1}^{r_n}\ \sum_{j=(i-1)b_n+1}^{ib_n}jU_j + \sum_{j=r_nb_n+1}^{[nt]}jU_j\Bigg). \tag{29.56}$$

The terms in this sum have the decomposition

$$\sum_{j=(i-1)b_n+1}^{ib_n}jU_j = ib_nS_{ni} - b_nS_{ni}^*, \tag{29.57}$$

in which $S_{ni} = \sum_{j=(i-1)b_n+1}^{ib_n}U_j$, and $S_{ni}^* = \sum_{j=(i-1)b_n+1}^{ib_n}a_{nj}U_j$, where $a_{nj} = (ib_n - j)/b_n \in [0,1)$. The assumptions, and 17.7, imply that $b_n^{-1}E(S_{ni}^2) \to \sigma^2$ for each $i = 1,\ldots,r_n$, and that $b_n^{-1}|E(S_{ni}S_{ni'})| = O(|i-i'|^{-1-\delta})$ for $\delta > 0$. Neither $\limsup_n b_n^{-1}E(S_{ni}^{*2})$ nor $\limsup_n b_n^{-1}|E(S_{ni}S_{ni}^*)|$ exceed $\sigma^2$, whereas $b_n^{-1}|E(S_{ni}^*S_{ni'}^*)|$ and $b_n^{-1}|E(S_{ni}S_{ni'}^*)|$ are of $O(|i-i'|^{-1-\delta})$. The same results apply to $S_{n,r_n+1}$ and $S_{n,r_n+1}^*$, the analogous terms corresponding to the residual sum in (29.56).

Thus, consider $E(X_n(t)^2)$. Multiplying out the square of (29.56) after substituting (29.57), we have three types of summand: those involving squares and products of the $S_{ni}$ ($(r_n+1)^2$ terms); those involving squares and products of the $S_{ni}^*$ ($(r_n+1)^2$ terms); and those involving products of $S_{ni}^*$ with $S_{ni}$ ($2(r_n+1)^2$ terms). The terms of the second type are each of $O(b_n^3n^{-3}) = O(r_n^{-3})$, and this block vanishes asymptotically. The terms in the third block (given $ib_n = O(n)$) are of $O(b_n^2n^{-2}) = O(r_n^{-2})$, and hence this block also vanishes. This leaves the terms of the first type, and this block has the form

$$E\Big(\frac{1}{\sigma n^{3/2}}\sum_{i=1}^{r_n+1}ib_nS_{ni}\Big)^2 = \frac{b_n^3}{n^3}\sum_{i=1}^{r_n+1}i^2\,\frac{E(S_{ni}^2)}{b_n\sigma^2} + \frac{2b_n^3}{n^3}\sum_{i=1}^{r_n+1}\sum_{m=1}^{i-1}i(i-m)\frac{E(S_{ni}S_{n,i-m})}{b_n\sigma^2}. \tag{29.58}$$

Noting that $r_nb_n/n \to t$, applying standard summation formulae and taking the limit yields (29.55). Thus, according to 29.11, $X_n \xrightarrow{D} B_\eta$ where $\eta(t) = \tfrac13t^3$. □

There is an intimate connection between the generalization from $B$ to $B_\eta$ and

the style of the result in 29.6. The latter theorem does not establish the convergence of the partial sum sequences $X_n(t) = \sum_{i=1}^{[nt]}U_{ni}$, either to $B$ or to any other limit.

In fact there are two distinct possibilities. In the first, $K_n(t)/n$ converges to $\eta^{-1}(t)$ for $t \in [0,1]$, for some $\eta$ as in (29.49). If this holds, there is no loss of generality in setting $K_n(t) = [n\eta^{-1}(t)]$, and under condition 29.6(f) this has the implication

$$E(X_n(t)^2) = E\big(X_n^K(\eta(t))^2\big) \to \eta(t). \tag{29.59}$$

In other words, $X_n \xrightarrow{D} B_\eta$ by 29.11. Example 29.8 is a case in point, for which $\eta(t) = t^{1+\beta}$. In these cases the convergence of the process $\{X_n^K\}$ to Brownian motion can be also represented as the convergence of the partial sum process $\{X_n\}$ to $B_\eta$.

On the other hand, it is possible that no such $\eta$ exists, and the partial sums have no weak limit, as the following case demonstrates.

29.13 Example Let a sequence $\{U_i\}$ have the property

$$U_i\ \begin{cases} = 0 \text{ a.s.}, & 2^{2k} \le i < 2^{2k+1},\ k = 0,1,2,3,\ldots\\ \sim (0,\sigma^2), & \text{otherwise.}\end{cases}$$

Thus, $U_1 = 0$, $U_4 = U_5 = U_6 = U_7 = 0$, $U_{16} = U_{17} = \ldots = U_{31} = 0$, and so forth. Let $U_{ni} = U_i/s_n$ as before, and put $X_n(t) = \sum_{i=1}^{[nt]}U_{ni}$. Then, observe that for $\tfrac12 < t \le 1$,

$$X_n(t) = X_n(\tfrac12) \text{ with probability } \begin{cases} 1 & \text{when } n = 2^{k+1} - 1 \text{ for } k \text{ even},\\ 0 & \text{when } n = 2^{k+1} - 1 \text{ for } k \text{ odd}.\end{cases}$$

Since this 'cycling' in the behaviour of $X_n$ is present however large $n$ is, $X_n$ does not possess a limit in distribution.

However, let $K_n(t)$ be the integer that satisfies

$$\sum_{i=1}^{K_n(t)}1\big(2^{2k-1} \le i < 2^{2k},\ k \in \mathbb{N}\big) = [nt], \tag{29.60}$$

where $1(\cdot)$ is the indicator function, equal to 1 when $i$ is in the indicated range and 0 otherwise. With this arrangement, $n$ counts the actual number of increments in the sum, while $K_n(1)$ counts the nominal number, including the zeros; $K_{n+1}(1) = K_n(1) + 1$ except when $K_n(1) = 2^{2k} - 1$, in which case $K_{n+1}(1) = 2^{2k+1}$. The conditions of 29.6 are satisfied with $\eta(t) = t$, and $X_n^K \xrightarrow{D} B$. □

Incidentally, since condition 29.6(f) imposes

$$E\big(X_n^K(1)^2\big) = E\Big(\sum_{i=1}^{K_n(1)}U_{ni}\Big)^2 \to 1, \tag{29.61}$$

one might expect that $K_n(1)/n \to 1$. The last example shows that this is not necessarily the case.
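The behaviour of $K_n(1)$ in Example 29.13 is easy to tabulate. The sketch below is illustrative only: it takes $K_n(1)$ to be the smallest integer satisfying (29.60) at $t = 1$ (an assumption about how ties are resolved), and the function names are hypothetical.

```python
def is_active(i):
    """True when U_i is not identically zero, i.e. 2**(2k-1) <= i < 2**(2k) for some k >= 1.
    Such integers are exactly those whose binary representation has an even number of digits."""
    return i >= 2 and i.bit_length() % 2 == 0


def nominal_count(n):
    """Smallest K such that exactly n of the indices 1..K are active (K_n(1) in the example)."""
    k, seen = 0, 0
    while seen < n:
        k += 1
        seen += is_active(k)
    return k


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 11, 42, 43):
        # the ratio K_n(1)/n keeps cycling instead of settling at 1
        print(n, nominal_count(n), nominal_count(n) / n)
```

The jumps occur exactly where the text says they should: after $K_n(1) = 2^{2k} - 1$ the next value is $2^{2k+1}$, so the ratio $K_n(1)/n$ oscillates rather than converging to 1.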

To get multivariate versions of 29.6 and 29.9, as we undertake in the nextsection, it will be necessary to restate these theorems in a slightly more generalform, following the lines of 29.10 and 29.11.

29.14 Corollary Let conditions 29.6(a), (b), (c), (d), and (e) hold, and replace 29.6(f) by

(f') $E(X_n^K(t)^2) \to \eta(t)$ as $n \to \infty$, for each $t \in [0,1]$;

then $X_n^K \xrightarrow{D} B_\eta$. The same modification in 29.9 leads to the same result. □

The main practical reason why this extension is needed is that we shall wish to specify $K_n$ in advance, rather than tailor it to a particular process; the same choice will need to work for a range of different processes - to be precise, for every linear combination of a vector of processes, for each of which a compatible $\eta$ will need to exist. However, the fact that partial sum processes may converge to limits different from simple Brownian motion may be of interest for its own sake, so that 29.14 (with $K_n(t) = [nt]$) becomes the more appropriate form of the FCLT. See Theorem 30.2 below for a case in point.

29.5 The Multivariate Case


To extend the FCLT to vector processes requires an approach similar in principle to that of §27.7. However, the results of this chapter have so far been obtained, unlike those of §27, without explicit derivation of the finite-dimensional distributions. It has not been necessary to use the results of §28.4 at any point. Because we have to rely on the Cramér-Wold device to go from univariate to multivariate limits, it is now necessary to consider the finite dimensional sets of $D$, and indeed to generalize the results of §28.4. This section draws on Phillips and Durlauf (1986).

We define $D^m$ as the space of $m$-vectors of cadlag functions, which we endow with the metric

$$d_B^m(x,y) = \max_{1\le j\le m}\{d_B(x_j,y_j)\}, \tag{29.62}$$

where $d_B$ is the Billingsley metric as before. $d_B^m$ induces the product topology, and the separability of $(D,d_B)$ implies both separability of $(D^m,d_B^m)$ and also that $\mathcal{B}_D^m = \mathcal{B}_D \otimes \mathcal{B}_D \otimes \ldots \otimes \mathcal{B}_D$ is the Borel field of $(D^m,d_B^m)$. Also let

$$\mathcal{H}_D^m = \big\{\pi_{t_1,\ldots,t_k}^{-1}(B) \subseteq D^m:\ B \in \mathcal{B}^{mk},\ t_1,\ldots,t_k \in [0,1],\ k \in \mathbb{N}\big\} \tag{29.63}$$

be the finite-dimensional sets of $D^m$, the field generated from the product of $m$ copies of $\mathcal{H}_D$. The following theorem extends 28.10 in a way which closely parallels the extension of 27.6 to 27.16.

29.15 Theorem $\mathcal{H}_D^m$ is a determining class for $(D^m,\mathcal{B}_D^m)$.

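Proof An open sphere in $\mathcal{B}_D^m$ is

$$\begin{aligned}
S(x,\alpha) &= \{y \in D^m:\ d_B^m(x,y) < \alpha\}\\
&= \Big\{y \in D^m:\ \exists\ \lambda \in \Lambda \text{ s.t. } \|\lambda\| < \alpha,\ \max_{1\le j\le m}\sup_t|y_j(t) - x_j(\lambda(t))| < \alpha\Big\}.
\end{aligned} \tag{29.64}$$

Define, for $\{t_1,\ldots,t_k \in [0,1],\ k \in \mathbb{N}\}$,

$$H_k(x,\alpha) = \Big\{y \in D^m:\ \exists\ \lambda \in \Lambda \text{ s.t. } \|\lambda\| < \alpha,\ \max_{1\le j\le m}\max_{1\le i\le k}|y_j(t_i) - x_j(\lambda(t_i))| < \alpha\Big\} \in \mathcal{H}_D^m. \tag{29.65}$$

It follows by direct generalization of the argument of 28.10 that, for any $x \in D^m$ and $r > 0$,

$$S(x,r) = \bigcup_{n=1}^\infty\bigcap_{k=1}^\infty H_k(x, r - 1/n) \in \sigma(\mathcal{H}_D^m). \tag{29.66}$$

Hence, $\mathcal{B}_D^m \subseteq \sigma(\mathcal{H}_D^m)$ as required. ∎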

The following can be thought of as a generic multivariate convergence theorem, in that the weak limit specified need only be a.s. continuous. It is not necessarily $B^m$.

29.16 Theorem Let $X_n \in D^m$ be an $m$-vector of random elements. $X_n \xrightarrow{D} X$, where $P(X \in C^m) = 1$, iff $\lambda'X_n \xrightarrow{D} \lambda'X$ for every fixed $\lambda$ with $\lambda'\lambda = 1$. □

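Proof If $x_j \in D$, $j = 1,\ldots,m$, $\sum_{j=1}^m\lambda_jx_j$ possesses a left limit and is continuous on the right, since for $t \in [0,1)$,

$$\lim_{\varepsilon\downarrow0}\sum_{j=1}^m\lambda_jx_j(t+\varepsilon) = \sum_{j=1}^m\lambda_j\lim_{\varepsilon\downarrow0}x_j(t+\varepsilon) = \sum_{j=1}^m\lambda_jx_j(t). \tag{29.67}$$

Hence, $x = (x_1,\ldots,x_m)' \in D^m$ implies $\lambda'x \in D$. It follows that $\lambda'X_n$ is a random element of $D$. It is clear similarly that $x \in C^m$ implies $\lambda'x \in C$, and hence $P(\lambda'X \in C) = 1$.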

To prove sufficiency, let $\mu_n^\lambda$ denote the sequence of measures corresponding to $\lambda'X_n$, and assume $\mu_n^\lambda \Rightarrow \mu^\lambda$. Fix $t_1,\ldots,t_k \in [0,1]$, for finite $k$. Noting that $\pi_{t_1,\ldots,t_k}^{-1}(B) \in \mathcal{H}_D \subseteq \mathcal{B}_D$ for each $B \in \mathcal{B}^k$ (see 28.10), the projections are measurable and $\nu_n^\lambda = \mu_n^\lambda\pi_{t_1,\ldots,t_k}^{-1}$ is a measure on $(\mathbb{R}^k,\mathcal{B}^k)$. Although $\pi_{t_1,\ldots,t_k}$ is not continuous (see the discussion in §28.2), the stipulation $\mu^\lambda(C) = 1$ implies that the discontinuity points have $\mu^\lambda$-measure 0, and hence $\nu_n^\lambda \Rightarrow \nu^\lambda$ by the continuous mapping theorem (26.13). Since $\nu_n^\lambda$ is the p.m. of a $k$-vector of r.v.s, and $\lambda$ is arbitrary, the Cramér-Wold theorem (25.5) implies that $\nu_n \Rightarrow \nu$, where $\nu_n = \mu_n\pi_{t_1,\ldots,t_k}^{-1}$ is the p.m. of an $mk$-vector, the distribution of $X_n(t_1),\ldots,X_n(t_k)$. Since $t_1,\ldots,t_k$ are arbitrary, the finite dimensional distributions of $X_n$ converge.

To complete the proof of sufficiency, we must show that $\{\mu_n\}$ is uniformly tight. Choose $\lambda = e_j$, the vector with 1 in position $j$ and 0 elsewhere, to show that $X_{nj} \xrightarrow{D} X_j$; this means the marginal p.m.s are uniformly tight, and so $\{\mu_n\}$ is uniformly tight by 26.23. Then $X_n \xrightarrow{D} X$ by 29.15.

To show necessity, on the other hand, simply apply the continuous mapping theorem to the continuous functional $h(x) = \lambda'x$. ∎

Although this is a general result, note the importance of the requirement $\mu(C^m) = 1$. It is easy to devise a counter-example where this condition is violated, in which case convergence fails.

29.17 Example Suppose $\mu$ is the p.m. on $(D,\mathcal{B}_D)$ which assigns probability 1 to elements $x$ with

$$x(t) = \begin{cases}0, & t < \tfrac12\\ 1, & t \ge \tfrac12.\end{cases}$$

Also, let $\mu_n$ assign probability 1 to elements with

$$x_n(t) = \begin{cases}0, & t < \tfrac12 + \tfrac1n\\ 1, & t \ge \tfrac12 + \tfrac1n.\end{cases}$$

If $X_{1n} \sim \mu$, all $n$, and $X_{2n} \sim \mu_n$, then clearly $(X_{1n},X_{2n}) \xrightarrow{D} (X_1,X_2) = (x,x)$ w.p.1. But $X_{1n} - X_{2n}$ is equal w.p.1 to the function in (28.12), which does not converge in $(D,d_B)$. □

Now we are ready to state the main result. Let $\{B_\eta(\Omega)\}$ denote the family of $m \times 1$ vector transformed-Brownian motion processes on $[0,1]$, whose members are defined by a vector of homeomorphisms $(\eta_1,\ldots,\eta_p)'$ and a covariance matrix $\Omega$ ($m \times m$). If $X \sim B_\eta(\Omega)$, the finite-dimensional distributions of $X$ are jointly Gaussian with independent increments, zero mean, and

$$E(X(t)X(t)') = DH(t)D',$$

where $D$ ($m \times p$) has rank $p$, $DD' = \Omega$ and $H(t) = \mathrm{diag}(\eta_1(t),\ldots,\eta_p(t))$, with $H(1) = I_p$. In other words, the $j$th element of $X$ may be expressed as a linear combination $\sum_{k=1}^pd_{jk}Z_k$ where $Z = (Z_1,\ldots,Z_p)'$ is a vector of independent processes with $Z_k \sim B_{\eta_k}$. With $p < m$, a singular limit is possible. Note, $Z = (D'D)^{-1}D'X$.

29.18 Theorem Let $\{U_{ni}\}$ be an array of zero-mean stochastic $m$-vectors. For an increasing, integer-valued right-continuous function $K_n(\cdot)$, define $X_n^K(t) = \sum_{i=1}^{K_n(t)}U_{ni}$, and suppose that

(a) For each fixed $m$-vector $\lambda$ satisfying $\lambda'\lambda = 1$, there exists a scalar array $\{c_{ni}^\lambda\}$ and a homeomorphism $\eta^\lambda$ on $[0,1]$ with $\eta^\lambda(0) = 0$ and $\eta^\lambda(1) = 1$, such that the conditions of 29.14 hold for the arrays $\{\lambda'U_{ni}\}$ and $\{c_{ni}^\lambda\}$, with respect to $\eta^\lambda$.

(b) Letting $H$ be defined as above with elements $\eta_j$ denoting $\eta^\lambda$ for the case $\lambda = e_j$ ($j$th column of the identity matrix), for $j = 1,\ldots,p$,

$$E\big(X_n^K(t)X_n^K(t)'\big) \to DH(t)D' \text{ as } n \to \infty. \tag{29.68}$$

Then $X_n^K \xrightarrow{D} X \sim B_\eta(\Omega)$. □

A point already indicated above is that under these conditions $K_n$ must be the same function for each $\lambda$, and must satisfy condition 29.6(d) in each case as well as 29.14(f'). The condition $\eta^\lambda(1) = 1$ can always be achieved by a renormalization, and simply requires that differences in scale of the vector elements be absorbed in the matrix $D$.

Proof Consider first the case $m = p$ and $D = I_m$. Condition (a) is sufficient under 29.14 for $\lambda'X_n^K \xrightarrow{D} B_{\eta^\lambda}$, where this limit is a.s. continuous, for each $\lambda$. The convergence of the joint distribution of $X_n^K$ now follows by 29.16. The form of the marginal distributions is implied by 29.14, independence of the vector elements following from condition (b). If $D \ne I_m$, the theorem can be applied to the array $\{(D'D)^{-1}D'U_{ni}\}$, for which the limit in (29.68) is $H(t)$ as before. Since linear transformations preserve Gaussianity, the general conclusion follows by the continuous mapping theorem. ∎

Theorem 29.18 is a highly general result, and the interest lies in establishing how the conditions might come to be satisfied in practice. While we permit $K_n(t) \ne [nt]$ to allow cases like 29.13, these are of relatively small importance, and it will simplify the discussion if it is conducted in terms of the case $K_n(t) = [nt]$. The $K$ superscript can then be dropped and $X_n$ becomes the vector of ordinary partial sum processes.

Even then, the result has considerable generality thanks to the array formulation, and its interpretation requires care. We can invoke the decomposition

$$\Omega = \Sigma + \Lambda + \Lambda', \tag{29.69}$$

where

$$\Sigma = \lim_{n\to\infty}\sum_{i=1}^nE(U_{ni}U_{ni}'), \tag{29.70}$$

$$\Lambda = \lim_{n\to\infty}\sum_{i=2}^n\sum_{m=1}^{i-1}E(U_{n,i-m}U_{ni}'). \tag{29.71}$$

But it should be observed that the conditions of 29.18 do not explicitly impose summability of the covariances. While $\Sigma$ and $\Lambda$ are finite by construction, without summability it would be possible to have $\Sigma = 0$. We noted previously that condition 29.6(f) appeared to impose summability, but it remains a conjecture that the more general 29.14(f') must always do the same. This conjecture is supported by the need for a summability condition in 24.6, whose conclusion must hold whenever 29.14 holds for the partial sums, but is yet to be demonstrated. Replacing 29.14(f') with more primitive conditions on the increment processes would be a useful extension of the present results, but would probably be difficult at the present level of generality.

Note that $\Omega$, not $\Sigma$, is the covariance of the process, notwithstanding the fact that $B_\eta(\Omega)$ is a process with independent increments. The condition $\Omega = I$, such that the elements of $B_\eta$ are independent, neither implies nor is implied by the contemporaneous uncorrelatedness of the $U_{ni}$. While uncorrelatedness at all lags is sufficient, with $\Sigma = I$ and $\Lambda = 0$, it is important to note that when the elements of $U_{ni}$ are related arbitrarily (contemporaneously and/or with a lag) there always exists a linear transformation $(D'D)^{-1}D'$, under which the elements of the limiting process are independent of one another.

As we did for the scalar case, we review some of the simplest sets of sufficient conditions. Let $U_{ni} = S_n^{-1/2}U_i$ where $S_n = E\big[\big(\sum_{i=1}^nU_i\big)\big(\sum_{i=1}^nU_i\big)'\big]$. For this choice, $D = I$ is imposed automatically. If $\{U_i\}$ is uniformly $L_r$-bounded, choose $c_{ni}^\lambda = (\lambda'S_n^{-1}\lambda)^{1/2}$. Then $\lambda'U_{ni}/c_{ni}^\lambda$ is a linear combination of the $U_i$ with weights summing to 1 and $\sup_{n,i}\|\lambda'U_{ni}/c_{ni}^\lambda\|_r < \infty$ holds for any $\lambda$, so that conditions (a) and (b) of 29.6 are satisfied. The multivariate analogue of 29.7 is then easily obtained:

29.19 Corollary Let $\{U_i\}$ be a zero-mean, uniformly $L_r$-bounded $m$-vector sequence, with each element $L_2$-NED of size $-\tfrac12$ on an $\alpha$-mixing process of size $-r/(r-2)$; and assume $n^{-1}S_n \to \Omega < \infty$. If $X_n(t) = n^{-1/2}\sum_{i=1}^{[nt]}U_i$, then $X_n \xrightarrow{D} B(\Omega)$. □

Compare this formulation with (27.82), and as with 29.7, note the important difference from the martingale difference case, with $\Omega$ taking the place of $\Sigma$. It is also worth reiterating how the statement of conditions is potentially misleading, given that the last one is typically hard to fulfil without a NED size of $-1$.

Somewhat trickier is the case of trending moments, where different elements of the vector may even be subject to different trends. The discussion here will have some close parallels with §24.4. Diagonalize $S_n$ as

$$S_n = C_nM_nC_n', \tag{29.72}$$

where $M_n$ is diagonal of rank $m$, and $C_nC_n' = C_n'C_n = I_m$. Assume, to fix ideas, that $C_n \to C$, which can be thought of as imposing a form of global stationarity on the cross-correlations. Then $S_n^{-1/2} = M_n^{-1/2}C_n'$ and

$$E\big(X_n(t)X_n(t)'\big) = M_n^{-1/2}C_n'E\Big(\sum_{i=1}^{[nt]}U_i\sum_{i=1}^{[nt]}U_i'\Big)C_nM_n^{-1/2} \approx M_n^{-1/2}M_{[nt]}M_n^{-1/2}, \tag{29.73}$$

where the approximation is got by setting $C_n$ to $C$, and can be made as good as desired by taking $n$ large enough. The status of conditions 29.18(a) and (b) must be checked by evaluating the elements of $M_n$ in (29.73). An example is the best way to illustrate the possibilities.

29.20 Example Let $m = 2$, and assume $E(U_iU_{i-m}') = 0$ for $m \ne 0$, but let

$$E(U_iU_i') = C\begin{bmatrix}i^{\beta_1} & 0\\ 0 & i^{\beta_2}\end{bmatrix}C' \tag{29.74}$$

for fixed $C$. Then, $M_n = \mathrm{diag}(n^{\beta_1+1}, n^{\beta_2+1})$ and $H(t) = \mathrm{diag}(t^{\beta_1+1}, t^{\beta_2+1})$. For $\beta_1,\beta_2 > -1$, $t^{\beta_1+1}$ and $t^{\beta_2+1}$ are increasing homeomorphisms on the unit interval, and condition 29.18(b) is satisfied. It remains to check 29.18(a). Condition 29.14(f') holds for the array $\{\lambda'U_{ni}\}$ with respect to

$$\eta^\lambda(t) = \lambda_1^2t^{\beta_1+1} + \lambda_2^2t^{\beta_2+1}, \tag{29.75}$$

which, since $\lambda_1^2 + \lambda_2^2 = 1$, is an increasing homeomorphism on the unit interval with $\eta^\lambda(1) = 1$ and $\eta^\lambda(0) = 0$ whenever $\beta_1,\beta_2 > -1$. Assuming that 29.6(b) holds for

$$c_{ni}^\lambda = \|\lambda'U_{ni}\|_2 = \Big(\lambda_1^2\frac{i^{\beta_1}}{n^{\beta_1+1}} + \lambda_2^2\frac{i^{\beta_2}}{n^{\beta_2+1}}\Big)^{1/2}, \tag{29.76}$$

we can check conditions 29.6(d) and 29.6(e). The latter holds for $\gamma = \tfrac12$. We also find that

$$\frac{v_n^2(t,\delta)}{\delta} = \frac1\delta\sum_{i=[nt]+1}^{[n(t+\delta)]}(c_{ni}^\lambda)^2 \approx \frac{\lambda_1^2\big((t+\delta)^{\beta_1+1} - t^{\beta_1+1}\big) + \lambda_2^2\big((t+\delta)^{\beta_2+1} - t^{\beta_2+1}\big)}{\delta} \to \lambda_1^2(\beta_1+1)t^{\beta_1} + \lambda_2^2(\beta_2+1)t^{\beta_2} < \infty \tag{29.77}$$

as $\delta \to 0$, where the approximation is as good as desired with large enough $n$. Condition 29.6(d) is satisfied, and hence 29.18(a) holds. This completes the verification of the conditions. □


30 Weak Convergence to Stochastic Integrals

30.1 Weak Limit Results for Random Functionals

The main task of this chapter is to treat an important corollary to the functional central limit theorem: convergence of a particular class of partial sums to a limit distribution which can be identified with a stochastic integral with respect to Brownian motion, or another Gaussian process. But before embarking on this topic, we first review another class of results involving integrals, superficially similar to what follows, but actually different and rather more straightforward. There will, in fact, turn out to be an unexpected correspondence in certain cases between the results obtained by each approach.

For a probability space $(\Omega,\mathcal{F},P)$, we are familiar with the notion of a measurable mapping

$$f: \Omega \mapsto C,$$

where $C$ is $C_{[0,1]}$ as usual. We now want to extend measurability to functionals on $C$, and especially to integrals. Let $F(t) = \int_0^tf\,ds: C \mapsto \mathbb{R}$ denote the ordinary Riemann integral of $f$ over $[0,t]$.

30.1 Theorem If $f$ is $\mathcal{F}/\mathcal{B}_C$-measurable, the composite mapping

$$F(t)\circ f: \Omega \mapsto \mathbb{R}$$

is $\mathcal{F}/\mathcal{B}$-measurable for $t \in [0,1]$.

Proof It is sufficient to show that $F(t)$ is continuous on $(C,d_U)$. This follows since, for $G(t) = \int_0^tg\,ds$, $g \in C$, and $0 \le t \le 1$,

$$|F(t) - G(t)| \le \int_0^t|f - g|\,ds \le \sup_s|f(s) - g(s)|.\ ∎ \tag{30.1}$$

This shows that $F(t)$ is a random variable for any $t$. Now, writing

$$F: C \mapsto C$$

as the mapping whose range is the set of functions assuming the values $F(t)$ at $t$, it can further be shown that $F$ is a new random function whose distribution is uniquely found by extension from the finite-dimensional distributions, just as for $f$. The same reasoning extends to $F_2(t) = \int_0^tF\,ds$, to $F_3$, and so on.

Other important examples of measurable functionals under $d_U$ include the extrema, $\sup_t\{f(t)\}$ and $\inf_t\{f(t)\}$. As a simple example of technique, here is an ingenious argument which shows that if $B$ is standard Brownian motion, $\sup_t\{B(t)\}$ has the half-normal distribution (see (8.26)). Consider the partial sum process $S_n = \sum_{i=1}^n \xi_i$, where the $\xi_i$ are independent binary r.v.s with $P(\xi_i = 1) = P(\xi_i = -1) = \tfrac12$. Straightforward enumeration of the sample space shows that

$$P\Big(\max_{1\le i\le n} S_i \ge a_n\Big) = 2P(S_n > a_n) + P(S_n = a_n) \tag{30.2}$$

for any $a_n \ge 0$ (see Billingsley 1968: ch. 2.10). Since this holds for any $n$, on putting $a_n = \sqrt{n}\,\alpha$ the FCLT implies that the limiting case of (30.2) applies to $B$, in respect of any constant $\alpha \ge 0$. This also defines the limit in distribution of $\sup_t\{X_n(t)\}$ for every process $X_n$ satisfying the conditions of 29.4. This is a neat demonstration of the method of extending a limit result from a special case to a general case, using an invariance principle.
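The enumeration argument behind (30.2) can be checked directly for small $n$. The following is a minimal sketch, not taken from the text: the function name `reflection_counts` is hypothetical, and the check is restricted to integer levels $a \ge 1$ (at $a = 0$ the starting value $S_0 = 0$ would have to be included in the maximum for the identity to hold). Since every path of $n$ steps has probability $2^{-n}$, the identity can be verified in path counts.

```python
from itertools import accumulate, product


def reflection_counts(n, a):
    """Count +/-1 paths of length n by maximum and by end point.

    Returns (hit, above, equal): the numbers of paths with
    max_{1<=i<=n} S_i >= a, with S_n > a, and with S_n == a.
    """
    hit = above = equal = 0
    for steps in product((-1, 1), repeat=n):
        partial_sums = list(accumulate(steps))
        hit += max(partial_sums) >= a
        above += partial_sums[-1] > a
        equal += partial_sums[-1] == a
    return hit, above, equal


if __name__ == "__main__":
    n = 10
    for a in range(1, n + 1):
        hit, above, equal = reflection_counts(n, a)
        # Each path has probability 2**-n, so (30.2) is an identity in counts.
        print(a, hit, 2 * above + equal)
```

The two printed counts should agree for each level, which is the finite-sample statement that the FCLT then carries over to $B$.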

Limit results for the integrals (i.e., sample means) of partial-sum processes, or continuous functions thereof, are obtained by the straightforward method of teaming a functional central limit theorem with the continuous mapping theorem.

30.2 Theorem Let $S_{n0} = 0$ and $S_{nj} = \sum_{i=1}^{j} U_{ni}$ for $j = 1,\ldots,n-1$. If $X_n(t) = S_{n,[nt]}$, assume that $X_n \xrightarrow{D} B_\eta$ (see 29.11). For any continuous function $g: \mathbb{R} \mapsto \mathbb{R}$,

$$\frac{1}{n}\sum_{j=0}^{n-1} g(S_{nj}) \xrightarrow{D} \int_0^1 g(B_\eta(t))\,dt. \tag{30.3}$$

Proof Formally,

$$g(S_{nj}) = n\,g(S_{nj})\int_{j/n}^{(j+1)/n} dt = n\int_{j/n}^{(j+1)/n} g(X_n(t))\,dt. \tag{30.4}$$

Hence,

$$\frac{1}{n}\sum_{j=0}^{n-1} g(S_{nj}) = \sum_{j=0}^{n-1}\int_{j/n}^{(j+1)/n} g(X_n(t))\,dt = \int_0^1 g(X_n(t))\,dt. \tag{30.5}$$

Since $\int_0^1 g(x(t))\,dt$, $x \in C$, is a continuous mapping from $C$ to $\mathbb{R}$, the result follows by the continuous mapping theorem (26.13). ∎

Note how $g(S_{nn})$ is omitted from these sums in accordance with the convention that elements of $D$ are right-continuous. Since the limit process is continuous almost surely, its inclusion would change nothing material. These results illustrate the importance of having 29.14 (with $K_n(t) = [nt]$) as an alternative to 29.6 as a representation of the invariance principle. The processes $X_n^K(t)$ are defined in $[0,1]$, and cannot be mapped onto the integers $1,\ldots,n$ by setting $t = j/n$. There is no obvious way of defining the sample average of $g(X_n^K)$ in the manner of (30.3), and for this purpose the partial-sum process $X_n$ with limit $B_\eta$ has no substitute.

The leading cases of $g(\cdot)$ include the identity function, and the square. For the former case, 30.2 should be compared with 29.12. Observe that $\sum_{j=1}^{n-1} S_{nj} = \sum_{i=1}^{n-1}(n-i)U_{ni}$. If $U_{ni} = n^{-1/2}U_i/\sigma$, reversing the order of summation in 29.12 shows, in effect, that $n^{-1}\sum_{j=1}^{n-1} S_{nj} \xrightarrow{D} B_\eta(1)$, for the case $\eta(t) = t^3$. In other words, $\int_0^1 B\,dt \sim N(0,\tfrac13)$.

... limit for the case $g(\cdot) = (\cdot)^2$. These limit results do not generally yield closed formulae for the c.d.f., so there are no exact tabulations of the percentage points such as we have for the Gaussian case. Their main practical value is in letting us know that the limits exist. Applications in statistical inference usually involve estimating the percentiles of the distributions by Monte Carlo simulation; in other words, tabulating random variates generated as the averages of large but finite samples of $g$ evaluated at a Gaussian drawing, to approximate integrals of $g(B_\eta)$. Knowledge of the weak convergence assures us that such approximations can be made as close as desired by taking $n$ large enough.
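The Monte Carlo procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed method: the increments are taken to be i.i.d. standard Gaussian so that the limit is ordinary Brownian motion ($\eta(t) = t$), and the names `average_functional` and `simulate`, the sample size and the number of replications are all hypothetical choices. The simulated variance of the sample mean of the partial sums should be close to $\tfrac13$, and the mean of the sample mean of squares close to $E\int_0^1 B^2\,dt = \int_0^1 t\,dt = \tfrac12$.

```python
import random
import statistics


def average_functional(n, g, rng):
    """Return (1/n) * sum_{j=0}^{n-1} g(S_nj) for a Gaussian partial-sum path.

    S_nj = n**-0.5 * (u_1 + ... + u_j) with u_i i.i.d. N(0, 1), S_n0 = 0.
    """
    s = 0.0
    total = 0.0
    scale = n ** -0.5
    for _ in range(n):
        total += g(s)          # uses S_n0, ..., S_{n,n-1}; S_nn is omitted
        s += scale * rng.gauss(0.0, 1.0)
    return total / n


def simulate(n, reps, g, seed=0):
    """Draw `reps` independent replications of the averaged functional."""
    rng = random.Random(seed)
    return [average_functional(n, g, rng) for _ in range(reps)]


if __name__ == "__main__":
    draws_id = simulate(200, 2000, lambda x: x)
    draws_sq = simulate(200, 2000, lambda x: x * x)
    print("variance of integral of B:", statistics.pvariance(draws_id))
    print("mean of integral of B^2:", statistics.fmean(draws_sq))
    # Simulated upper 5% point of the integral of B^2 (no closed form is used).
    print("95th percentile:", statistics.quantiles(draws_sq, n=20)[-1])
```

The last line is the kind of tabulated percentile the text refers to; its accuracy depends on both the path length and the number of replications.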

Given a basic repertoire of limit results, it is not difficult to find the distributions of other limit processes and random variables in the same manner. To take a simple case, if $\{U_i\}$ is a sequence with constant variance $\sigma^2$, and $n^{-1/2}S_{[nt]}/\sigma \xrightarrow{D} B(t)$ where $S_{[nt]} = \sum_{i=1}^{[nt]} U_i$, we can deduce from the continuous mapping theorem that the partial sums of the sample mean deviations converge to the Brownian bridge; i.e.,

$$\frac{1}{\sigma n^{1/2}}\sum_{i=1}^{[nt]}(U_i - \bar{U}_n) \xrightarrow{D} B(t) - tB(1) = B^o(t), \tag{30.6}$$

where $\bar{U}_n = n^{-1}\sum_{i=1}^{n} U_i$. On the other hand, if we express the partial sum process itself in mean deviations, $S_j - \bar{S}_n$ where $\bar{S}_n = n^{-1}\sum_{j=0}^{n-1} S_j$, we find convergence according to

$$\frac{1}{\sigma n^{1/2}}\big(S_{[nt]} - \bar{S}_n\big) \xrightarrow{D} B(t) - \int_0^1 B\,ds. \tag{30.7}$$

The limit process on the right-hand side of (30.7) is the de-meaned Brownian motion. One must be careful to distinguish the last two cases. The integral of the latter over $[0,1]$ is identically zero. The mean square of the mean deviations converges similarly, according to

$$\frac{1}{\sigma^2 n^2}\sum_{j=0}^{n-1}(S_j - \bar{S}_n)^2 \xrightarrow{D} \int_0^1 B^2\,ds - \Big(\int_0^1 B\,ds\Big)^2. \tag{30.8}$$
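The distinction between the two mean-deviation constructions can be seen in finite samples. The sketch below is illustrative only (the function name `bridge_and_demeaned` is hypothetical and the normalization by $\sigma n^{1/2}$ is omitted): the partial sums of the deviations $U_i - \bar{U}_n$ are tied down to zero at both ends, like the Brownian bridge, whereas the de-meaned partial sums $S_j - \bar{S}_n$ are not tied down but average to zero over $j$.

```python
import random


def bridge_and_demeaned(u):
    """Return the two mean-deviation partial-sum sequences for increments u.

    bridge[j]   = sum_{i<=j} (u_i - ubar),  j = 0..n   (as in (30.6))
    demeaned[j] = S_j - Sbar,               j = 0..n-1 (as in (30.7)),
    where Sbar = (1/n) * sum_{j=0}^{n-1} S_j. Normalisation is omitted.
    """
    n = len(u)
    ubar = sum(u) / n
    s = [0.0]
    for x in u:
        s.append(s[-1] + x)            # s[j] = S_j, with s[0] = 0
    bridge = [s[j] - j * ubar for j in range(n + 1)]
    sbar = sum(s[:n]) / n
    demeaned = [s[j] - sbar for j in range(n)]
    return bridge, demeaned


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.gauss(0.0, 1.0) for _ in range(500)]
    bridge, demeaned = bridge_and_demeaned(u)
    print("bridge end points:", bridge[0], bridge[-1])
    print("average of de-meaned sums:", sum(demeaned) / len(demeaned))
```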

There is also an easy generalization of these results to vector processes. The following is the vector counterpart of the leading cases of 30.2, the details of whose proof the reader can readily supply.

30.3 Corollary Let $\{\boldsymbol{U}_{ni}\}$ satisfy the conditions of 29.18. If $\boldsymbol{S}_{nj} = \sum_{i=1}^{j}\boldsymbol{U}_{ni}$, then

$$\frac{1}{n}\sum_{j=1}^{n-1}\boldsymbol{S}_{nj} \xrightarrow{D} \int_0^1 \boldsymbol{B}_\eta\,dt, \tag{30.9}$$

$$\frac{1}{n}\sum_{j=1}^{n-1}\boldsymbol{S}_{nj}\boldsymbol{S}_{nj}' \xrightarrow{D} \int_0^1 \boldsymbol{B}_\eta\boldsymbol{B}_\eta'\,dt. \tag{30.10}$$

Note in particular that for $\boldsymbol{B}$, the $m$-dimensional standard Brownian motion,

$$\int_0^1 \boldsymbol{B}\,dt \sim N(\boldsymbol{0},\tfrac13\boldsymbol{I}_m).$$

The same approach of applying the continuous mapping theorem yields an important result involving the product of the partial-sum process with its increment. The limits obtained do not appear at first sight to involve stochastic integrals, although there will turn out to be an intimate connection.

30.4 Theorem Let the assumptions of 30.2 hold, with $\eta(1) = 1$. Then

$$\sum_{j=1}^{n-1} S_{nj}U_{n,j+1} \xrightarrow{D} \tfrac12\big(\chi^2(1) - \sigma^2\big), \tag{30.11}$$

where $\sigma^2 = \lim_{n\to\infty}\sum_{j=1}^{n}\sigma_{nj}^2$.

Proof Letting $S_{nj} = \sum_{i=1}^{j} U_{ni} = S_{n,j-1} + U_{nj}$, note the identity

$$S_{n,j+1}^2 = S_{nj}^2 + 2S_{nj}U_{n,j+1} + U_{n,j+1}^2. \tag{30.12}$$

Summing from 0 to $n-1$, setting $S_{n0} = 0$, yields

$$S_{nn}^2 = \sum_{j=0}^{n-1}\big(S_{n,j+1}^2 - S_{nj}^2\big) = 2\sum_{j=1}^{n-1} S_{nj}U_{n,j+1} + \sum_{j=1}^{n} U_{nj}^2, \tag{30.13}$$

and hence

$$\sum_{j=1}^{n-1} S_{nj}U_{n,j+1} = \tfrac12\Big(S_{nn}^2 - \sum_{j=1}^{n} U_{nj}^2\Big). \tag{30.14}$$

Under the assumptions, $S_{nn} \xrightarrow{D} B_\eta(1) \sim N(0,1)$ and $\sum_{j=1}^{n} U_{nj}^2 \xrightarrow{pr} \sigma^2$. The result follows on applying the continuous mapping theorem and 22.14(i). ∎

This is an unexpectedly universal result, for it actually does not depend on the FCLT at all for its validity. It is true so long as $\{U_{ni}\}$ satisfies the conditions for a CLT. Since $\sigma^2 = 1 - 2\lambda$ where $\lambda = \lim_{n\to\infty}\sum_{i=2}^{n}\sum_{m=1}^{i-1} E(U_{n,i-m}U_{ni})$, the left-hand side of (30.11) has a mean of zero in the limit if and only if the sequence $\{U_{ni}\}$ is serially uncorrelated.
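Because (30.14) is a purely algebraic identity, it can be confirmed exactly for any finite sequence of increments, with no distributional assumptions. The sketch below uses exact rational arithmetic; the function name `both_sides` and the test increments are hypothetical, and the list entry `u[j]` stands for $U_{n,j+1}$.

```python
from fractions import Fraction
import random


def both_sides(u):
    """Evaluate the two sides of (30.14) for increments u_1, ..., u_n."""
    n = len(u)
    s = [Fraction(0)]
    for x in u:
        s.append(s[-1] + x)                 # s[j] = S_nj, with s[0] = S_n0 = 0
    # Left side: sum_{j=1}^{n-1} S_nj * U_{n,j+1}; u[j] holds U_{n,j+1}.
    lhs = sum(s[j] * u[j] for j in range(1, n))
    # Right side: (S_nn^2 - sum_j U_nj^2) / 2.
    rhs = (s[n] ** 2 - sum(x * x for x in u)) / 2
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    u = [Fraction(rng.randint(-9, 9), 7) for _ in range(25)]
    print(both_sides(u))
```

This makes concrete why the limit depends only on a CLT for $S_{nn}$ and a law of large numbers for the squared increments.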

There is again a generalization to the vector case, although only in a restricted sense. Let $\boldsymbol{S}_{nj} = \sum_{i=1}^{j}\boldsymbol{U}_{ni}$, and then generalizing (30.12) we have the identity

$$\boldsymbol{S}_{n,j+1}\boldsymbol{S}_{n,j+1}' = \boldsymbol{S}_{nj}\boldsymbol{S}_{nj}' + \boldsymbol{S}_{nj}\boldsymbol{U}_{n,j+1}' + \boldsymbol{U}_{n,j+1}\boldsymbol{S}_{nj}' + \boldsymbol{U}_{n,j+1}\boldsymbol{U}_{n,j+1}'. \tag{30.15}$$

Summing and taking limits in the same manner as before leads to the following result.

30.5 Theorem Let $\{\boldsymbol{U}_{ni}\}$ satisfy the conditions of 29.18. Then

$$\sum_{j=1}^{n-1}\boldsymbol{S}_{nj}\boldsymbol{U}_{n,j+1}' + \sum_{j=1}^{n-1}\boldsymbol{U}_{n,j+1}\boldsymbol{S}_{nj}' \xrightarrow{D} \boldsymbol{B}_\eta(1)\boldsymbol{B}_\eta(1)' - \boldsymbol{\Sigma}. \tag{30.16}$$

∎

Details of the proof are left to the reader. The peculiarity of this result is that it does not lead to a limiting distribution for the stochastic matrix $\sum_{j=1}^{n-1}\boldsymbol{S}_{nj}\boldsymbol{U}_{n,j+1}'$. This must be obtained by an entirely different approach, which is explored in §30.4.

30.2 Stochastic Processes in Continuous Time

To understand how stochastic integrals are constructed requires some additional theory for continuous stochastic processes on $[0,1]$. Much of this material is a natural analogue of the results for random sequences studied in Part III.

A filtration is a collection $\{\mathcal{F}(t), t \in [0,1]\}$ of $\sigma$-subfields of events in a complete probability space $(\Omega,\mathcal{F},P)$ with the property

$$\mathcal{F}(t) \subseteq \mathcal{F}(s) \text{ when } t \le s. \tag{30.17}$$

The filtration $\{\mathcal{F}(t)\}$ is said to be right-continuous if

$$\mathcal{F}(t) = \mathcal{F}(t+) = \bigcap_{s>t}\mathcal{F}(s). \tag{30.18}$$

A stochastic process $X = \{X(t), t \in [0,1]\}$ is said to be adapted to $\{\mathcal{F}(t)\}$ if $X(t)$ is $\mathcal{F}(t)$-measurable for each $t$ (compare §15.1). Note that right-continuity of the filtration is not the same thing as right-continuity of $X$, but if $X \in D$ (which will be the case in all our examples) adaptation of $X(t)$ to $\mathcal{F}(t)$ implies adaptation to $\mathcal{F}(t+)$ and there is typically no loss of generality in assuming (30.18).

A stronger notion of measurability is needed for defining stochastic integrals of the $X$ process. $\{X(t)\}$ is said to be progressively measurable with respect to $\{\mathcal{F}(t)\}$ if the mappings

$$X(\cdot,\cdot): \Omega \times [0,t] \mapsto \mathbb{R}$$

are $\mathcal{F}(t)\otimes\mathcal{B}_{[0,t]}/\mathcal{B}$-measurable, for each $t \in [0,1]$. Every progressively measurable process is adapted (just consider the rectangles $E \times [0,t]$ for $E \in \mathcal{F}(t)$) but the converse is not always true; with arbitrary functions, measurability problems can arise. However, we do have the following result.

30.6 Theorem An adapted cadlag process is progressively measurable.

Proof For an adapted process $X \in D$ and any $t \in [0,1]$, define the simple process on $[0,t]$:

$$X_{(n)}(\omega,s) = X(\omega,2^{-n}k),\quad s \in [2^{-n}(k-1),\,2^{-n}k),\quad k = 1,\ldots,[2^n t], \tag{30.19}$$

with $X_{(n)}(\omega,t) = X(\omega,t)$. $X_{(n)}$ need not be adapted, but it is a right-continuous function on $\Omega \times [0,t]$. If $A_k = \{\omega: X(\omega,2^{-n}k) \le x\} \in \mathcal{F}(t)$, then

$$A_x = \{(\omega,s): X_{(n)}(\omega,s) \le x\} = \bigcup_k \big(A_k \times [2^{-n}(k-1),\,2^{-n}k)\big) \cup \big(\{\omega: X(\omega,t) \le x\} \times \{t\}\big) \tag{30.20}$$

is a finite union of measurable rectangles, and so $A_x \in \mathcal{F}(t)\otimes\mathcal{B}_{[0,t]}$. This is true for each $x \in \mathbb{R}$, and hence $X_{(n)}$ is $\mathcal{F}(t)\otimes\mathcal{B}_{[0,t]}/\mathcal{B}$-measurable. Fix $\omega$ and $s$, and note that for each $n$

$$X_{(n)}(\omega,s) = X(\omega,u), \tag{30.21}$$

where $u > s$, and $u \downarrow s$ as $n \to \infty$. Since $X(\omega,u) \to X(\omega,s)$ by right-continuity, it follows that $X_{(n)}(\omega,s) \to X(\omega,s)$ everywhere on $\Omega \times [0,t]$ and hence $X$ is $\mathcal{F}(t)\otimes\mathcal{B}_{[0,t]}/\mathcal{B}$-measurable (apply 3.26). This holds for any $t$, and the theorem follows. ∎

Since we are dealing with time as a continuum, we can think of the moment at which some event in the evolution of $X$ occurs as a real random variable. For example, the first time $X(t)$ exceeds some positive constant $M$ in absolute value is

$$T(\omega) = \inf_{t\in[0,1]}\{t: |X(\omega,t)| > M\}. \tag{30.22}$$

$T(\omega)$ is called a stopping time of the filtration $\{\mathcal{F}(t), t \in [0,1]\}$ if $\{\omega: T(\omega) \le t\} \in \mathcal{F}(t)$ (compare §15.2). It is a simple exercise to show that, if $X$ is progressively measurable, so is the stopped process $X^T$ where $X^T(t) = X(t \wedge T)$.

Let $X \in D$, and let $X(t)$ be an $\mathcal{F}(t)$-measurable r.v. for each $t \in [0,1]$. The adapted pair $\{X(t),\mathcal{F}(t)\}$ is said to be a martingale in continuous time if

$$\sup_t E|X(t)| < \infty, \tag{30.23}$$

$$E(X(s)\mid\mathcal{F}(t)) = X(t) \text{ a.s.}[P],\quad 0 \le t \le s \le 1. \tag{30.24}$$

It is called a semimartingale (sub- or super-) if (30.23) plus one of the inequalities

$$E(X(s)\mid\mathcal{F}(t)) \ge (\le)\; X(t) \text{ a.s.}[P],\quad 0 \le t \le s \le 1 \tag{30.25}$$

hold. One way to generate a continuous-time martingale is by mapping a discrete-time martingale $\{S_j,\mathcal{F}_j\}_1^n$ into $[0,1]$, rather in the manner of (27.55) and (27.56). If we let $X(t) = S_{[nt]+1}$, this is a right-continuous simple function which jumps at the points where $[nt] = nt$. It is $\mathcal{F}(t)$-measurable where $\mathcal{F}(t) = \mathcal{F}_{[nt]+1}$, and the collection $\{\mathcal{F}(t), 0 \le t < 1\}$ is right-continuous.

Properties of the martingale can often be generalized from the discrete case. The following result extends the maximal inequalities of 15.14 and 15.15.

30.7 Theorem Let $\{(X(t),\mathcal{F}(t)),\ t \in [0,1]\}$ be a martingale. Then

(i) $P\Big(\sup_{s\in[0,t]}|X(s)| > \varepsilon\Big) \le \dfrac{E|X(t)|^p}{\varepsilon^p}$, $p \ge 1$ (Kolmogorov inequality).

(ii) $E\Big(\sup_{s\in[0,t]}|X(s)|^p\Big) \le \Big(\dfrac{p}{p-1}\Big)^p E|X(t)|^p$, $p > 1$ (Doob inequality).

Proof These inequalities hold if they hold for the supremum over the interval $[0,t)$, noting in (i) that the case $s = t$ is just the Chebyshev inequality. Given a discrete martingale $\{S_k,\mathcal{F}_k\}_{k=1}^{m}$ with $m = [2^n t]$, define a continuous martingale $X_{(n)}$ on $[0,t]$ as in the previous paragraph, by setting $X_{(n)}(s) = S_{[2^n s]+1}$ for $s \in [0,t)$, with $X_{(n)}(t) = X_{(n)}(t-) = S_{[2^n t]}$. The inequalities hold for $X_{(n)}$ by 15.14 and 15.15, noting that

$$\sup_{s\in[0,t]}|X_{(n)}(s)|^p = \max_{1\le k\le m}|S_k|^p \tag{30.26}$$

for $p \ge 1$. Now, given an arbitrary continuous martingale $\{X(t),\mathcal{F}(t)\}$, a discrete martingale is defined by setting

$$\{S_k,\mathcal{F}_k\} = \{X(2^{-n}k),\mathcal{F}(2^{-n}k)\},\quad k = 1,\ldots,[2^n t]. \tag{30.27}$$

For this case we have $X_{(n)}(s) = X(u)$ for $u = 2^{-n}([2^n s] + 1)$, so that $u \downarrow s$ as $n \to \infty$. Hence $X_{(n)}(s) \to X(s)$ for $s \in [0,t)$, by right continuity. ∎
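The discrete-time inequalities on which this proof rests can be verified exactly in a simple case. The following sketch is only an illustration with hypothetical names and parameters: it enumerates all paths of a symmetric $\pm1$ random walk (a discrete martingale with $E S_m^2 = m$) and returns the exact quantities entering the Kolmogorov and Doob inequalities for $p = 2$, for which the Doob constant is $(p/(p-1))^p = 4$.

```python
from fractions import Fraction
from itertools import accumulate, product


def maximal_moments(m, eps):
    """Exact quantities for a +/-1 random walk S_1, ..., S_m (a martingale).

    Returns (P(max_k |S_k| > eps), E max_k S_k^2, E S_m^2).
    """
    paths = 2 ** m
    exceed = 0
    sum_max_sq = 0
    sum_end_sq = 0
    for steps in product((-1, 1), repeat=m):
        partial = list(accumulate(steps))
        peak = max(abs(s) for s in partial)
        exceed += peak > eps
        sum_max_sq += peak * peak
        sum_end_sq += partial[-1] ** 2
    return (Fraction(exceed, paths), Fraction(sum_max_sq, paths),
            Fraction(sum_end_sq, paths))


if __name__ == "__main__":
    prob, e_max_sq, e_end_sq = maximal_moments(10, 4)
    print("Kolmogorov:", prob, "<=", e_end_sq / 4 ** 2)
    print("Doob (p=2):", e_max_sq, "<=", 4 * e_end_sq)
```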

The class of martingale processes we shall be mainly concerned with satisfy two extra conditions: almost sure continuity ($P(X \in C) = 1$), and square integrability. A martingale $X$ is said to be square-integrable if $E(X(t)^2) < \infty$ for each $t \in [0,1]$. For such processes, the inequality

$$E(X(s)^2\mid\mathcal{F}(t)) = X(t)^2 + E\big([X(s) - X(t)]^2\mid\mathcal{F}(t)\big) \ge X(t)^2 \tag{30.28}$$

holds a.s.[P] for $s \ge t$ in view of (30.24), and it follows that $X^2$ is a submartingale. The Doob-Meyer (DM) decomposition of an integrable submartingale, when it exists, is the unique decomposition

$$X(t) = M(t) + A(t), \tag{30.29}$$

where $M$ is a martingale and $A$ an integrable increasing process.

The DM decomposition has been shown to exist, with $M$ uniformly integrable, if the set $\{X(\tau), \tau \in \mathcal{T}\}$ is uniformly integrable, where $\mathcal{T}$ denotes the set of stopping times of $\{\mathcal{F}(t)\}$ (see e.g. Karatzas and Shreve 1988: th. 4.10). In particular, suppose there exists for a martingale $\{X(t),\mathcal{F}(t)\}$ an increasing, adapted stochastic process $\{\langle X\rangle(t),\mathcal{F}(t)\}$ on $[0,1]$, whose conditionally expected variations match those of $X^2$ almost surely; that is,

$$E(\langle X\rangle(s)\mid\mathcal{F}(t)) - \langle X\rangle(t) = E(X(s)^2\mid\mathcal{F}(t)) - X(t)^2 \text{ a.s.}[P] \tag{30.30}$$

for $s \ge t$. Rearranging (30.30) gives

$$E(X(s)^2 - \langle X\rangle(s)\mid\mathcal{F}(t)) = X(t)^2 - \langle X\rangle(t), \text{ a.s.}[P], \tag{30.31}$$

which shows that $\{X(t)^2 - \langle X\rangle(t),\mathcal{F}(t)\}$ is a martingale, and this process accordingly defines the DM decomposition of $X^2$. An increasing adapted process $\{\langle X\rangle(t),\mathcal{F}(t)\}$ satisfying (30.30), which is unique a.s. if it exists, is called the quadratic variation process of $X$.

30.8 Example The Brownian motion process $B$ is a square-integrable martingale with respect to the filtration $\mathcal{F}(t) = \sigma(B(s), s \le t)$. The martingale property is an obvious consequence of the independence of the increments of $B$. A special feature of $B$ is that the quadratic variation process is deterministic. Definition 27.7 implies that, for $s \ge t$,

$$E(B(s)^2\mid\mathcal{F}(t)) - B(t)^2 = E\big([B(s) - B(t)]^2\mid\mathcal{F}(t)\big) = s - t, \text{ a.s.}[P], \tag{30.32}$$

and rearrangement of the equality shows that $B(t)^2 - t$ is a martingale; that is, $\langle B\rangle(t) = t$. □

Two additional pieces of terminology often arise in this context. A Markov process is an adapted process $\{X(t),\mathcal{F}(t)\}$ having the property

$$P(X(t+s) \in A\mid\mathcal{F}(t)) = P(X(t+s) \in A\mid\sigma(X(t))) \text{ a.s.}[P] \tag{30.33}$$

for $A \in \mathcal{B}$ and $t, s \ge 0$. This means that all the information capable of predicting the future path of a Markov process is contained in its current realized value. A diffusion process is a Markov process having continuous sample paths. The sample paths of a diffusion process must be describable in terms of a stochastic mechanism generating infinitesimal increments, although these need not be independent or identically distributed, nor for that matter Gaussian. A Brownian motion, however, is both a Markov process and a diffusion process. We shall not pursue these generalizations very far, but works such as Cox and Miller (1965) or Karatzas and Shreve (1988) might be consulted for further details.

The family $B_\eta$ defined in (29.49) are diffusion processes. They are also martingales, and it is easy to verify that in this case $\langle B_\eta\rangle = \eta$. However, a diffusion process need not be a martingale. An example with increments that are Gaussian but not independent is $X(t) = \theta(t)B(t)$ (see 27.9). Observe that

$$E(X(t+s) - X(t)\mid\mathcal{F}(t)) = (\theta(t+s) - \theta(t))B(t) = \Big(\frac{\theta(t+s)}{\theta(t)} - 1\Big)X(t) \ne 0, \text{ a.s.}[P]. \tag{30.34}$$

A larger class of diffusion processes is defined by the scheme $X(t) = \theta(t)B_\eta(t)$, for eligible choices of $\theta$ and $\eta$. The Ornstein-Uhlenbeck process (27.10) is another example. However, the class $B_\eta$ is the only one we shall be concerned with here.

30.3 Stochastic Integrals

In this section we introduce a class of stochastic integrals on $[0,1]$. Let $\{M(t),\mathcal{F}(t)\}$ denote a martingale having a deterministic quadratic variation process $\langle M\rangle$. For a function $f \in D$, satisfying a prescribed set of properties to be detailed below, a stochastic process on $[0,1]$ will be represented by

$$I(\omega,t) = \int_0^t f(\omega,\tau)\,dM(\omega,\tau),\quad t \in [0,1], \tag{30.35}$$

more compactly written as $I(t) = \int_0^t f\,dM$. The notation corresponds, for fixed $\omega$, to what we would use for the Riemann-Stieltjes integral of $f(\omega,\cdot)$ over $[0,t]$ with respect to $M(\omega,\cdot)$. However, it is important to appreciate that, for almost every $\omega$, this Riemann-Stieltjes integral does not exist; quite simply, we have not required $M(\omega,\cdot)$ to be of bounded variation, and the example of Brownian motion shows that this requirement could fail for almost all $\omega$. Hence, a different interpretation of the process $I$ is called for.

The results we shall obtain are actually available for a larger class of integrator functions, including martingales whose quadratic variation is a stochastic process. However, it is substantially easier to prove the existence of the integral for the case indicated, and this covers the applications of interest to us.

We assume the existence of a filtration $\{\mathcal{F}(t), t \in [0,1]\}$, on a probability space $(\Omega,\mathcal{F},P)$. Let

$$\alpha: [0,1] \mapsto \mathbb{R}$$

be a positive, increasing element of $D$, and let $\alpha(0) = 0$ and $\alpha(1) = 1$, with no loss of generality as it turns out. For any $t \in (0,1]$, the restriction of $\alpha$ to $[0,t]$ induces a finite Lebesgue-Stieltjes measure. That is to say, $\alpha$ is a c.d.f., and the function $\int_B d\alpha(s)$ assigns a measure to each $B \in \mathcal{B}_{[0,t]}$. Accordingly we can define on the product space $(\Omega \times [0,t],\ \mathcal{F}(t)\otimes\mathcal{B}_{[0,t]})$ the product measure $\mu_\alpha$, where

$$\mu_\alpha(A) = \int_\Omega\int_0^t 1_A(\omega,s)\,d\alpha(s)\,dP(\omega) = E\int_0^t 1_A(\omega,s)\,d\alpha(s) \tag{30.36}$$

for each $A \in \mathcal{F}(t)\otimes\mathcal{B}_{[0,t]}$.

Let $L_\alpha$ denote the class of functions

$$f: \Omega \times [0,1] \mapsto \mathbb{R}$$

which are (a) progressively measurable (and hence adapted to $\{\mathcal{F}(t)\}$), and (b) square-integrable with respect to $\mu_\alpha$; that is to say, $\|f\| < \infty$ where

$$\|f\| = \Big(E\int_0^1 f^2\,d\alpha\Big)^{1/2}. \tag{30.37}$$

It is then easy to verify that $\|f - g\|$ is a pseudo-metric on $L_\alpha$. While $\|f - g\| = 0$ does not guarantee that $f(\omega) = g(\omega)$ for every $\omega \in \Omega$, it does imply that the integrals of $f$ and $g$ with respect to $\alpha$ will be equal almost surely $[P]$. In this case we call functions $f$ and $g$ equivalent.

The chief technical result we need is to show that a class of simple functions is dense in $L_\alpha$. Let $E_\alpha \subset L_\alpha$ denote the class such that $f(t) = f(t_k)$ for $t \in [t_k,t_{k+1})$, $k = 0,\ldots,m-1$ and $f(1) = f(t_{m-1})$, where $\{t_1,\ldots,t_m\} = \Pi_m$ is a partition of $[0,1]$ for some $m \in \mathbb{N}$.

30.9 Lemma (after Kopp 1984) For each $f \in L_\alpha$, there exists a sequence $\{f_{(n)} \in E_\alpha, n \in \mathbb{N}\}$ with $\|f_{(n)} - f\| \to 0$ as $n \to \infty$.

Proof Let the domain of $f$ be extended to $\mathbb{R}$ by setting $f(t) = 0$ for $t \notin [0,1]$. By square-integrability, $\int_{-\infty}^{+\infty} f(\omega,t)^2\,d\alpha(t) < \infty$ a.s.[P], and

$$\int_{-\infty}^{+\infty}\big(f(\omega,t+h) - f(\omega,t)\big)^2\,d\alpha(t) \to 0 \text{ a.s.}[P] \text{ as } h \to 0;$$

hence

$$\lim_{h\to0} E\int_{-\infty}^{+\infty}\big(f(t+h) - f(t)\big)^2\,d\alpha(t) = 0 \tag{30.38}$$

by the bounded convergence theorem. This holds for any sequence of points going to 0, so, given a partition $\Pi_{m(n)}$ such that $\|\Pi_{m(n)}\| \to 0$ as $n \to \infty$, and $t \in [0,1]$, consider the case $h = k_n(t) - t$, where

$$k_n(t) = t_i,\quad t \in [t_i,t_{i+1}),\quad i = 0,\ldots,m-1, \tag{30.39}$$

$$k_n(1) = t_{m-1}. \tag{30.40}$$

Clearly, $k_n(t) \to t$. Hence, (30.38) implies that

$$\begin{aligned}
\int_{-\infty}^{+\infty} E\Big(\int_{-1}^{1}\big(f(s+k_n(t)) - f(s+t)\big)^2 d\alpha(t)\Big)d\alpha(s)
&= \int_{-1}^{1} E\Big(\int_{-\infty}^{+\infty}\big(f(s+k_n(t)) - f(s+t)\big)^2 d\alpha(s)\Big)d\alpha(t)\\
&= \int_{-1}^{1} E\Big(\int_{-\infty}^{+\infty}\big(f(s+k_n(t)-t) - f(s)\big)^2 d\alpha(s)\Big)d\alpha(t)\\
&\to 0 \text{ as } n \to \infty,
\end{aligned} \tag{30.41}$$

where the first equality is an application of Fubini's theorem. Since the inner integral on the left-hand side is non-negative, (30.41) implies

$$\lim_{n\to\infty} E\int_{-1}^{1}\big(f(s+k_n(t)) - f(s+t)\big)^2\,d\alpha(t) = 0 \tag{30.42}$$

for almost all $s \in \mathbb{R}$. Fixing $s \in [0,1]$ and making a change of variable from $t$ to $t - s$ gives

$$\lim_{n\to\infty} E\int_{s-1}^{1+s}\big(f(t+l_n(t)) - f(t)\big)^2\,d\alpha(t) = 0, \tag{30.43}$$

where $l_n(t) = k_n(t-s) - (t-s)$. Define a function

$$f_{(n)}(t) = \begin{cases} f(t+l_n(t)), & t + l_n(t) \in [0,1],\\ 0, & \text{otherwise,}\end{cases} \tag{30.44}$$

noting that $f_{(n)}(t) = f(t_i+s)$ for $t \in [t_i+s,\,t_{i+1}+s) \cap [0,1]$ and hence $f_{(n)} \in E_\alpha$. Given (30.43), the proof is completed by noting that $[0,1] \subset [s-1,\,1+s]$, and hence

$$E\Big(\int_0^1\big(f_{(n)}(t) - f(t)\big)^2 d\alpha(t)\Big) \le E\Big(\int_{s-1}^{1+s}\big(f_{(n)}(t) - f(t)\big)^2 d\alpha(t)\Big) \le E\Big(\int_{s-1}^{1+s}\big(f(t+l_n(t)) - f(t)\big)^2 d\alpha(t)\Big). \tag{30.45}$$


'sls - kz2(1)- 1).

o z(30.54)

Put $t = 1$ and compare this with 30.4. It is apparent (and will be proved rigorously in 30.13 below) that the limit in (30.11) can be expressed as $\int_0^1 B\,dB + \lambda$, where $\lambda = \tfrac12(1 - \sigma^2)$ as before. □

A form of Itô's rule holds for a general class of continuous semi-martingales. The proof of the general result is lengthy (see for example Karatzas and Shreve 1988 or McKean 1969 for details) and we will give the proof just for the case of 30.11, to avoid complications with the possible unboundedness of $g''$. However, there is little extra difficulty in extending from ordinary Brownian motion to the class of diffusion processes $B_\eta$.

30.12 Theorem $\int_0^t B_\eta\,dB_\eta = \tfrac12\big(B_\eta(t)^2 - \eta(t)\big)$, a.s.

Proof Let $\Pi_n$ denote the partition of $[0,t]$ in which $t_j = tj/n$ for $j = 1,\ldots,n$. Use Taylor expansions to second order to obtain the identity

$$\begin{aligned}
B_\eta(t)^2 &= \sum_{j=0}^{n-1}\big(B_\eta(t_{j+1})^2 - B_\eta(t_j)^2\big)\\
&= 2\sum_{j=0}^{n-1} B_\eta(t_j)\big(B_\eta(t_{j+1}) - B_\eta(t_j)\big) + \sum_{j=0}^{n-1}\big(B_\eta(t_{j+1}) - B_\eta(t_j)\big)^2.
\end{aligned} \tag{30.55}$$

We show the $L_2$ convergence of each of the sums in the right-hand member. $B_\eta 1_{[0,t]} \in L_\eta$, so define $p_n \in E_\eta$ by

$$p_n(s) = B_\eta(t_j),\quad s \in [t_j,t_{j+1}),\quad j = 0,\ldots,n-1, \tag{30.56}$$

and $p_n(s) = B_\eta(t)$ for $t \le s \le 1$. This is a construction similar to that used in the proof of 30.9, and $\|p_n - B_\eta\| \to 0$ as $n \to \infty$. We may write

$$\sum_{j=0}^{n-1} B_\eta(t_j)\big(B_\eta(t_{j+1}) - B_\eta(t_j)\big) = \sum_{j=0}^{n-1}\int_{t_j}^{t_{j+1}} B_\eta(t_j)\,dB_\eta(s) = \int_0^t p_n(s)\,dB_\eta(s), \tag{30.57}$$

and it follows that

$$E\Big(\sum_{j=0}^{n-1} B_\eta(t_j)\big(B_\eta(t_{j+1}) - B_\eta(t_j)\big) - \int_0^t B_\eta\,dB_\eta\Big)^2 = E\Big(\int_0^t (p_n - B_\eta)\,dB_\eta\Big)^2 = \big\|(p_n - B_\eta)1_{[0,t]}\big\|^2 \to 0. \tag{30.58}$$

Considering the second sum on the right-hand side of (30.55), we have

$$\begin{aligned}
E\Big(\sum_{j=0}^{n-1}\big(B_\eta(t_{j+1}) - B_\eta(t_j)\big)^2 - \eta(t)\Big)^2
&= E\Big(\sum_{j=0}^{n-1}\Big[\big(B_\eta(t_{j+1}) - B_\eta(t_j)\big)^2 - \big(\eta(t_{j+1}) - \eta(t_j)\big)\Big]\Big)^2\\
&= \sum_{j=0}^{n-1} E\Big(\big(B_\eta(t_{j+1}) - B_\eta(t_j)\big)^2 - \big(\eta(t_{j+1}) - \eta(t_j)\big)\Big)^2\\
&= 2\sum_{j=0}^{n-1}\big(\eta(t_{j+1}) - \eta(t_j)\big)^2\\
&\le 2\eta(t)\max_{0\le j\le n-1}\{\eta(t_{j+1}) - \eta(t_j)\}\\
&\to 0 \text{ as } n \to \infty.
\end{aligned} \tag{30.59}$$

The second equality here is due to the fact that the cross-products disappear in expectation, thanks to the law of iterated expectations and the fact that

$$E\big[\big(B_\eta(t_{j+1}) - B_\eta(t_j)\big)^2\mid\mathcal{F}(t_j)\big] = \eta(t_{j+1}) - \eta(t_j). \tag{30.60}$$

The third equality applies the Gaussianity of the increments, together with 9.7 for the fourth moments, and the inequality uses the continuity of $\eta$ and the fact that $\|\Pi_n\| \to 0$.

Thus, $B_\eta(t)^2$ can be decomposed as the sum of sequences converging in $L_2$-norm to, respectively, $2\int_0^t B_\eta\,dB_\eta$ and $\eta(t)$. However, according to 18.6, $L_2$ convergence implies convergence with probability 1 on a subsequence $\{n_k\}$. Since the choice of partitions is arbitrary so long as $\|\Pi_{n_k}\| \to 0$, the theorem follows. ∎

The special step in this result is of course (30.59). In a continuous function of bounded variation, the sum of the squared increments is dominated by the largest increment and so must vanish by continuity, just as happens with $\eta(t)$ in the last line of the expression. It is because it is of unbounded variation a.s. that the same sort of thing does not happen with $B_\eta$.
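The decomposition in (30.55) lends itself to a simple numerical illustration. The sketch below is an assumption-laden example, not the author's construction: it takes ordinary Brownian motion ($\eta(t) = t$), simulates it on the grid $t_j = tj/n$ with independent $N(0,t/n)$ increments, and returns the left-endpoint sum and the sum of squared increments. The function name `ito_sums` and the grid size are hypothetical. The algebraic identity holds exactly on every path, while the sum of squared increments should be close to $t$ for large $n$.

```python
import random


def ito_sums(n, t=1.0, seed=0):
    """Simulate B on the grid t_j = t*j/n and return the pieces of (30.55).

    Returns (B(t), sum_j B(t_j) * dB_j, sum_j dB_j**2), where
    dB_j = B(t_{j+1}) - B(t_j) are independent N(0, t/n) draws.
    """
    rng = random.Random(seed)
    sd = (t / n) ** 0.5
    b = 0.0
    left_sum = 0.0     # left-endpoint sum approximating the integral of B dB
    quad_var = 0.0     # sum of squared increments, approximating <B>(t) = t
    for _ in range(n):
        db = rng.gauss(0.0, sd)
        left_sum += b * db
        quad_var += db * db
        b += db
    return b, left_sum, quad_var


if __name__ == "__main__":
    b, left_sum, quad_var = ito_sums(20000)
    print("B(1)^2                 :", b * b)
    print("2*left_sum + quad_var  :", 2 * left_sum + quad_var)
    print("quadratic variation    :", quad_var)
    print("left_sum vs (B^2 - 1)/2:", left_sum, 0.5 * (b * b - 1.0))
```

Evaluating the integrand at the left end point of each interval matters here: it is what makes the approximating sum correspond to the simple function $p_n$ of (30.56).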

30.4 Convergence to Stochastic Integrals

Let $\{U_{nj}\}$ and $\{W_{nj}\}$ be a pair of stochastic arrays, let $X_n(t) = \sum_{j=1}^{[nt]} U_{nj}$ and $Y_n(t) = \sum_{j=1}^{[nt]} W_{nj}$, and suppose that $(X_n,Y_n) \xrightarrow{D} (B_X,B_Y)$ where $B_X$ and $B_Y$ are a pair of transformed Brownian motions from the class $B_\eta$, with quadratic variation processes $\eta^X$ and $\eta^Y$, the latter being homeomorphisms on the unit interval. In what follows it is always possible for fixing ideas to think of $B_X$ and $B_Y$ as simple Brownian motions, having $\eta^X(t) = \eta^Y(t) = t$. However, the extensions required to relax this assumption are fairly trivial. The problem we wish to consider is the convergence of the partial sums

$$G_n = \sum_{j=1}^{n-1}\Big(\sum_{i=1}^{j} U_{ni}\Big)W_{n,j+1} = \sum_{j=1}^{n-1} X_n(j/n)\big(Y_n((j+1)/n) - Y_n(j/n)\big). \tag{30.61}$$

This problem differs from those of §30.1 because it cannot be deduced merely from combining the functional CLT with the continuous mapping theorem. None the less, it is possible to show that the convergence holds under the conditions of 29.14.

The following theorem generalizes one given by Chan and Wei (1988). See inter alia Strasser (1986), Phillips (1988), Kurtz and Protter (1991), Hansen (1992c), for alternative approaches to this type of result.

30.13 Theorem Let $\{U_{nj},W_{nj}\}$ be a $(2 \times 1)$ stochastic array satisfying the conditions of 29.18 for the case $K_n(t) = [nt]$. In addition, assume that both $U_{nj}$ and $W_{nj}$ are $L_2$-NED of size $-1$ on $\{V_{ni}\}$. Then

$$G_n \xrightarrow{D} \int_0^1 B_X\,dB_Y + \Lambda_{XY}, \tag{30.62}$$

where, with $\lambda_{njk} = E(U_{nj}W_{nk})$,

$$\Lambda_{XY} = \lim_{n\to\infty}\sum_{i=1}^{n-1}\sum_{m=0}^{i-1}\lambda_{n,i-m,i+1}. \tag{30.63}$$

An admissible case here is $U_{nj} = W_{nj}$, in which case the relevant joint distribution is singular.

Setting the $L_2$-NED size at $-1$ ensures that the covariances are summable in the sense of 17.7. This strengthening of the conditions of 29.18 is typically only nominal, in the light of the discussions that follow 29.6 and 29.18. However, be careful to see that summability is not required to ensure that $|\Lambda_{XY}| < \infty$, which holds under the conditions of 29.18 merely by choice of normalization. Its role will become apparent in the course of the proof.

Proof The main ingredient of this proof is the Skorokhod representation theorem, 26.25, which at crucial steps in the argument allows us to deduce weak convergence from the a.s. convergence of a random sequence, and vice versa. Let $(X_n,Y_n)$ be an element of the separable, complete metric space $(D^2,d_B^2)$ (see §29.5). Since

$$(X_n,Y_n) \xrightarrow{D} (B_X,B_Y) \tag{30.64}$$

by 29.18, Skorokhod's theorem implies the existence of a sequence $\{(X^n,Y^n) \in D^2, n \in \mathbb{N}\}$ such that $(X_n,Y_n)$ is distributed like $(X^n,Y^n)$, and $d_B^2((X^n,Y^n),(B_X,B_Y)) \xrightarrow{a.s.} 0$. According to Egoroff's theorem (18.4) and the equivalence of $d_S$ and $d_B$ in $D$, (30.64) implies that, for a set $C_\varepsilon \in \mathcal{F}$ with $P(C_\varepsilon) \ge 1 - \varepsilon$,

$$\sup_{\omega\in C_\varepsilon} d_S^2\big((X^n(\omega),Y^n(\omega)),(B_X(\omega),B_Y(\omega))\big) \to 0 \tag{30.65}$$

for each $\varepsilon > 0$. Since $B_X$ is a.s. continuous, there exists a set $E_X$ with $P(E_X) = 1$ and the following property: if $\omega \in E_X$, then for any $\eta > 0$ there is a constant $\delta > 0$ such that if $d_S(X^n(\omega),B_X(\omega)) \le \delta$,

$$\sup_t|X^n(\omega,t) - B_X(\omega,t)| \le \sup_t|X^n(\omega,t) - B_X(\omega,\lambda(t))| + \sup_t|B_X(\omega,t) - B_X(\omega,\lambda(t))| < \delta + \eta, \tag{30.66}$$

where $\lambda(\cdot)$ is the function from (28.7). The same result holds for $Y^n$ in respect of a set $E_Y$ with $P(E_Y) = 1$. It follows from (30.65) that, for $\omega \in C_\varepsilon^* = C_\varepsilon \cap E_X \cap E_Y$,

$$d_U^2\big((X^n(\omega),Y^n(\omega)),(B_X(\omega),B_Y(\omega))\big) = \delta_n \to 0, \tag{30.67}$$

where the equality defines $\delta_n$. Note too that $P(C_\varepsilon^*) = P(C_\varepsilon)$.

For each member of an increasing integer subsequence $\{k_n, n \in \mathbb{N}\}$, choose an ordered subset $\{n_1,n_2,\ldots,n_{k_n}\}$ of the integers $1,\ldots,n$, with $n_{k_n} = n$, such that $\min_{1\le j\le k_n}\{n_j - n_{j-1}\} \to \infty$. Use these sets to define partitions of $[0,1]$, $\Pi_n = \{t_1,\ldots,t_{k_n}\}$, where $t_j = n_j/n$. Assume that $\{k_n\}$ is increasing slowly enough that $k_n\delta_n^2 \to 0$ and $k_n/n \to 0$, but note that provided $k_n \uparrow \infty$ it is always possible to have $\|\Pi_n\| \to 0$. For example, choosing $n_j = [nj/k_n]$ will satisfy these conditions. The main steps to be taken are now basically two. Define

$$G_n^* = \sum_{j=1}^{k_n} X_n(t_{j-1})\big(Y_n(t_j) - Y_n(t_{j-1})\big), \tag{30.68}$$

and also let $G^{*n}$ represent the same expression except that the Skorokhod variables $X^n$ and $Y^n$ are substituted for $X_n$ and $Y_n$. In view of 22.18 and the fact that $G^{*n}$ and $G_n^*$ have the same distribution, to establish $G_n^* \xrightarrow{D} \int_0^1 B_X\,dB_Y$ it will suffice to prove that

$$\Big|G^{*n} - \int_0^1 B_X\,dB_Y\Big| \xrightarrow{pr} 0. \tag{30.69}$$

The proof will then be completed, in view of 22.14(i), by showing

$$G_n - G_n^* \xrightarrow{pr} \Lambda_{XY}. \tag{30.70}$$

The Cauchy-Schwarz inequality and (30.67) give, for each $\omega \in C_\varepsilon^*$,

$$\begin{aligned}
\Big(\sum_{j=1}^{k_n}\big(X^n(\omega,t_{j-1}) - B_X(\omega,t_{j-1})\big)\big(Y^n(\omega,t_j) - Y^n(\omega,t_{j-1})\big)\Big)^2
&\le \sum_{j=1}^{k_n}\big(X^n(\omega,t_{j-1}) - B_X(\omega,t_{j-1})\big)^2\sum_{j=1}^{k_n}\big(Y^n(\omega,t_j) - Y^n(\omega,t_{j-1})\big)^2\\
&\le k_n\delta_n^2\sum_{j=1}^{k_n}\big(Y^n(\omega,t_j) - Y^n(\omega,t_{j-1})\big)^2.
\end{aligned} \tag{30.71}$$

Also the assumptions on $Y_n$, and equivalence of the distributions, imply that

$$E\big(Y^n(t_j) - Y^n(t_{j-1})\big)^2 = E\Big(\sum_{i=n_{j-1}+1}^{n_j} W_{ni}\Big)^2 \to \eta^Y(t_j) - \eta^Y(t_{j-1}) < \infty \tag{30.72}$$

and hence from (30.71),

$$E\Big(\sum_{j=1}^{k_n}\big(X^n(t_{j-1}) - B_X(t_{j-1})\big)\big(Y^n(t_j) - Y^n(t_{j-1})\big)1_{C_\varepsilon^*}\Big)^2 \to 0. \tag{30.73}$$

Closely similar arguments give

$$E\Big(\sum_{j=1}^{k_n}\big(Y^n(t_j) - B_Y(t_j)\big)\big(B_X(t_j) - B_X(t_{j-1})\big)1_{C_\varepsilon^*}\Big)^2 \to 0 \tag{30.74}$$

and also

$$E\Big(B_X(1)\big(Y^n(1) - B_Y(1)\big)1_{C_\varepsilon^*}\Big)^2 \to 0. \tag{30.75}$$

We now use the method of 'summation by parts'; given arbitrary real numbers $\{a_j,b_j,\alpha_j,\beta_j,\ j = 1,\ldots,k\}$ with $a_0 = b_0 = \alpha_0 = \beta_0 = 0$, we have the identity

$$\sum_{j=1}^{k} a_{j-1}(b_j - b_{j-1}) - \sum_{j=1}^{k}\alpha_{j-1}(\beta_j - \beta_{j-1}) = \sum_{j=1}^{k}(a_{j-1} - \alpha_{j-1})(b_j - b_{j-1}) + \alpha_k(b_k - \beta_k) - \sum_{j=1}^{k}(b_j - \beta_j)(\alpha_j - \alpha_{j-1}). \tag{30.76}$$

Put $k = k_n$, $a_j = X^n(\omega,t_j)$, $b_j = Y^n(\omega,t_j)$, $\alpha_j = B_X(\omega,t_j)$, and $\beta_j = B_Y(\omega,t_j)$. Then the left-hand side of (30.76) corresponds to $G^{*n} - P_n$, where

$$P_n = \sum_{j=1}^{k_n} B_X(t_{j-1})\big(B_Y(t_j) - B_Y(t_{j-1})\big) = \sum_{j=1}^{k_n}\int_{t_{j-1}}^{t_j} B_X(t_{j-1})\,dB_Y(t), \tag{30.77}$$

and the squares of the right-hand side terms correspond to the integrands in (30.73), (30.74), and (30.75). Since $\varepsilon$ is arbitrary, $P(C_\varepsilon^*)$ can be set arbitrarily close to 1, so that each of these terms vanishes in $L_2$-norm. We may conclude that $|G^{*n} - P_n| \xrightarrow{pr} 0$. So, to get (30.69), it suffices to show that $|P_n - \int_0^1 B_X\,dB_Y| \xrightarrow{pr} 0$. But

$$\begin{aligned}
E\Big(P_n - \int_0^1 B_X\,dB_Y\Big)^2 &= E\Big(\sum_{j=1}^{k_n}\int_{t_{j-1}}^{t_j}\big(B_X(t_{j-1}) - B_X(t)\big)\,dB_Y(t)\Big)^2\\
&= \sum_{j=1}^{k_n}\int_{t_{j-1}}^{t_j}\big(\eta^X(t) - \eta^X(t_{j-1})\big)\,d\eta^Y(t)\\
&\le \max_{1\le j\le k_n}\big|\eta^X(t_j) - \eta^X(t_{j-1})\big|\,\eta^Y(1) \to 0
\end{aligned} \tag{30.78}$$

where the second equality applies (30.51) and then Fubini's theorem, and the convergence is by continuity of $\eta^X$. This completes the proof of (30.69).
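The summation-by-parts identity (30.76) used in this step is elementary, and can be checked exactly on arbitrary integer sequences. The sketch below assumes the form of the identity in which the right-hand side consists of the three terms matched to (30.73), (30.74) and (30.75); the function name `summation_by_parts` and the test data are hypothetical. Each list holds $x_0,\ldots,x_k$ with $x_0 = 0$.

```python
import random


def summation_by_parts(a, b, alpha, beta):
    """Return (lhs, rhs) of (30.76); each list holds x_0, ..., x_k with x_0 = 0."""
    k = len(a) - 1
    idx = range(1, k + 1)
    lhs = (sum(a[j - 1] * (b[j] - b[j - 1]) for j in idx)
           - sum(alpha[j - 1] * (beta[j] - beta[j - 1]) for j in idx))
    rhs = (sum((a[j - 1] - alpha[j - 1]) * (b[j] - b[j - 1]) for j in idx)
           + alpha[k] * (b[k] - beta[k])
           - sum((b[j] - beta[j]) * (alpha[j] - alpha[j - 1]) for j in idx))
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    k = 12
    seqs = [[0] + [rng.randint(-20, 20) for _ in range(k)] for _ in range(4)]
    print(summation_by_parts(*seqs))
```

In the proof, the first list plays the role of $X^n(\omega,t_j)$, the second of $Y^n(\omega,t_j)$, and the last two of $B_X(\omega,t_j)$ and $B_Y(\omega,t_j)$.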

To show (30.70), we use the fact that

$$Y_n(t_j) - Y_n(t_{j-1}) = \sum_{i=n_{j-1}}^{n_j-1}\big(Y_n((i+1)/n) - Y_n(i/n)\big),$$

and so

$$\begin{aligned}
G_n - G_n^* &= \sum_{j=1}^{k_n}\Big[\sum_{i=n_{j-1}}^{n_j-1} X_n(i/n)\big(Y_n((i+1)/n) - Y_n(i/n)\big) - X_n(t_{j-1})\big(Y_n(t_j) - Y_n(t_{j-1})\big)\Big]\\
&= \sum_{j=1}^{k_n}\sum_{i=n_{j-1}}^{n_j-1}\big(X_n(i/n) - X_n(t_{j-1})\big)\big(Y_n((i+1)/n) - Y_n(i/n)\big)\\
&= \sum_{j=1}^{k_n}\sum_{i=n_{j-1}}^{n_j-1}\sum_{m=0}^{i-n_{j-1}} U_{n,i-m}W_{n,i+1}\\
&= \sum_{j=1}^{k_n}\sum_{m=0}^{n_j-n_{j-1}-1}\sum_{i=m+n_{j-1}}^{n_j-1} U_{n,i-m}W_{n,i+1},
\end{aligned} \tag{30.79}$$

where we have formally set $U_{n0} = 0$. The final equality represents the shift from summing the elements of a triangular array by rows to summing by diagonals, and we use whichever of the two versions of the expression is most convenient. In view of (30.63), $G_n - G_n^* - \Lambda_{XY} = A_n - B_n$, where (summing by diagonals)

$$A_n = \sum_{j=1}^{k_n}\sum_{m=0}^{n_j-n_{j-1}-1}\sum_{i=m+n_{j-1}}^{n_j-1}\big(U_{n,i-m}W_{n,i+1} - \lambda_{n,i-m,i+1}\big) \tag{30.80}$$

and (summing by rows)

$$B_n = \sum_{j=1}^{k_n}\sum_{i=n_{j-1}}^{n_j-1}\sum_{m=i-n_{j-1}+1}^{i-1}\lambda_{n,i-m,i+1}. \tag{30.81}$$

The problem is to show that both $A_n \xrightarrow{pr} 0$ and $B_n \to 0$.
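The change from summing by rows to summing by diagonals in (30.79) is a re-indexing of the same finite set of terms, which can be confirmed mechanically for a single block $j$. The sketch below takes the index ranges as read from (30.79), namely rows $i = n_{j-1},\ldots,n_j-1$ with $m = 0,\ldots,i-n_{j-1}$, against diagonals $m = 0,\ldots,n_j-n_{j-1}-1$ with $i = m+n_{j-1},\ldots,n_j-1$; these ranges, the function names and the test summand are assumptions made for illustration.

```python
def sum_by_rows(term, n_prev, n_j):
    """Sum term(i, m) over i = n_prev..n_j-1 and, within row i, m = 0..i-n_prev."""
    return sum(term(i, m)
               for i in range(n_prev, n_j)
               for m in range(0, i - n_prev + 1))


def sum_by_diagonals(term, n_prev, n_j):
    """Sum the same terms over m = 0..n_j-n_prev-1 and i = m+n_prev..n_j-1."""
    return sum(term(i, m)
               for m in range(0, n_j - n_prev)
               for i in range(m + n_prev, n_j))


if __name__ == "__main__":
    # An arbitrary integer-valued summand standing in for U_{n,i-m} * W_{n,i+1}.
    def term(i, m):
        return (i + 1) ** 2 * (3 * m + 1) + i * m

    print(sum_by_rows(term, 4, 11), sum_by_diagonals(term, 4, 11))
```

Fixing $m$ and summing along a diagonal is what allows the terms $U_{n,i-m}W_{n,i+1}$ to be treated, for each lag $m$, as a single array indexed by $i$.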

Choose a finite integer $N$ and break up $A_n$ into $N+1$ additive components $A_{n0},\ldots,A_{nN}$, where by taking $n$ large enough that $N \le \min_{1\le j\le k_n}(n_j - n_{j-1})$ we can write

$$
A_{nm} = \sum_{j=1}^{k_n}\sum_{i=m+n_{j-1}}^{n_j-1}\big(U_{n,i-m}W_{n,i+1} - \sigma_{n,i-m,i+1}\big) \qquad (30.82)
$$

for $m = 0,\ldots,N-1$, and

$$
A_{nN} = \sum_{j=1}^{k_n}\sum_{m=N}^{n_j-n_{j-1}-1}\sum_{i=m+n_{j-1}}^{n_j-1}\big(U_{n,i-m}W_{n,i+1} - \sigma_{n,i-m,i+1}\big). \qquad (30.83)
$$

For fixed finite $m$, the process $\{U_{n,i-m}W_{n,i+1} - \sigma_{n,i-m,i+1}\}$ is, according to 17.11 and 17.6, an $L_1$-mixingale array of size $-1$ with respect to the constant array $\{c^U_{n,i-m}c^W_{n,i+1}\}$, where $\{c^U_{ni}\}$ and $\{c^W_{ni}\}$ are the constants specified by 29.18 for $\lambda = (1,0)'$ and $(0,1)'$ respectively. We next show that the conditions of 19.11 are satisfied by these terms, so that

$$
A_{nm}^* = \sum_{i=m+1}^{n-1}\big(U_{n,i-m}W_{n,i+1} - \sigma_{n,i-m,i+1}\big) \xrightarrow{L_1} 0. \qquad (30.84)
$$

First, for $r > 2$ in the $\alpha$-mixing case (29.6) or for $r \ge 2$ in the $\phi$-mixing case (29.9),

$$
\begin{aligned}
\sup_{i,n} E\left|\frac{U_{n,i-m}W_{n,i+1} - \sigma_{n,i-m,i+1}}{c^U_{n,i-m}c^W_{n,i+1}}\right|^{r/2}
&\le 2^{r/2}\sup_{i,n} E\left|\frac{U_{n,i-m}W_{n,i+1}}{c^U_{n,i-m}c^W_{n,i+1}}\right|^{r/2} \\
&\le 2^{r/2}\sup_{i,n}\left\|\frac{U_{n,i-m}}{c^U_{n,i-m}}\right\|_r^{r/2}\left\|\frac{W_{n,i+1}}{c^W_{n,i+1}}\right\|_r^{r/2} < \infty, \qquad (30.85)
\end{aligned}
$$

where the first inequality makes use successively of Loève's $c_r$ inequality and Jensen's inequality, the second one is by the Cauchy-Schwarz inequality, and the finiteness is because the arrays satisfy either 29.6(b) or 29.9(b') by assumption. In the latter case, note that the assumptions include uniform square-integrability. Therefore the array

$$
\frac{U_{n,i-m}W_{n,i+1} - \sigma_{n,i-m,i+1}}{c^U_{n,i-m}c^W_{n,i+1}}
$$

is uniformly integrable in either case, and condition 19.11(a) is met. Next, the arrays $\{c^U_{ni}\}$ and $\{c^W_{ni}\}$ satisfy condition 29.6(d) by assumption, which by the Cauchy-Schwarz inequality implies that

$$
\sup_{t\in[0,1),\ \delta\in(0,1-t]}\ \limsup_{n\to\infty}\ \frac{1}{\delta}\sum_{i=[nt]+m+1}^{[n(t+\delta)]-1} c^U_{n,i-m}c^W_{n,i+1} < \infty. \qquad (30.86)
$$

Setting $t = 0$ and $\delta = 1$ in (30.86) gives

$$
\limsup_{n\to\infty}\sum_{i=m+1}^{n-1} c^U_{n,i-m}c^W_{n,i+1} < \infty, \qquad (30.87)
$$

which is condition 19.11(b), whereas setting $\delta = 1/n$ gives

$$
\max_{m+1\le i\le n-1}\big\{c^U_{n,i-m}c^W_{n,i+1}\big\} = O(1/n), \qquad (30.88)
$$

and (30.87) and (30.88) together imply

$$
\sum_{i=m+1}^{n-1}\big(c^U_{n,i-m}c^W_{n,i+1}\big)^2 \to 0, \qquad (30.89)
$$

which is condition 19.11(c). So $A_{nm}^* \xrightarrow{L_1} 0$ is proved. But for $m \ge 1$, according to (30.82),

$$
E|A_{nm}^* - A_{nm}| \le \sum_{j=1}^{k_n}\sum_{i=n_{j-1}}^{m+n_{j-1}-1} E\big|U_{n,i-m}W_{n,i+1} - \sigma_{n,i-m,i+1}\big| = O(k_n/n), \qquad (30.90)
$$

where the order of magnitude is by (30.85) and (30.88), so $A_{nm} \xrightarrow{L_1} 0$ also holds for each $m = 0,\ldots,N-1$. Similarly,

Un,i-mWn,i-vj-

n,f-,,,,sl I S 2 Il,,,f-,,,,s1l,

and applying 17.7 yields

$$
E|A_{nN}| = O\Big(\sum_{j=1}^{k_n}\sum_{m=N}^{n_j-n_{j-1}-1}\sum_{i=m+n_{j-1}}^{n_j-1} c^U_{n,i-m}c^W_{n,i+1}\zeta_{m+1}\Big) = O(N^{-\delta}) \qquad (30.91)
$$

for some $\delta > 0$, where the order of magnitude follows by a combination of (30.87) with the fact that the sequence $\{\zeta_m\}$ is of size $-1$, according to the mixing and $L_2$-NED size assumptions. Thus, $\lim_{n\to\infty}E|A_n| \le \lim_{n\to\infty}E|A_{nN}|$, which by taking $N$ large enough can be made as close to 0 as desired. In the same manner, recalling $N \le \min_{1\le j\le k_n}(n_j - n_{j-1})$,

$$
|B_n| = O\Big(\sum_{j=1}^{k_n}\sum_{i=n_{j-1}}^{n_j-1}\sum_{m=i-n_{j-1}+1}^{i-1} c^U_{n,i-m}c^W_{n,i+1}\zeta_{m+1}\Big) = O(N^{-\delta}). \qquad (30.92)
$$

It follows that $G_n - G_n^* - \Lambda_{XY} \xrightarrow{pr} 0$, and this completes the proof of (30.70), and of the theorem. ∎


Now let $\{U_{ni}\}$ ($m \times 1$) be a vector array satisfying the conditions of 29.18, plus the extra condition that the $L_2$-NED size of the increments is $-1$ for each element. Since 30.13 holds for each element paired with each element, including itself, the argument may be generalized in the following manner.

30.14 Theorem Let $S_{nj} = \sum_{i=1}^{j} U_{ni}$. Then

$$
\sum_{j=1}^{n-1} S_{nj}U_{n,j+1}' \xrightarrow{D} \int_0^1 B_\eta\,dB_\eta' + \Lambda \quad (m \times m). \qquad (30.93)
$$

Proof For arbitrary $m$-vectors of unit length, $\lambda$ and $\mu$, the scalar arrays $\{\lambda'U_{ni}\}$ and $\{\mu'U_{n,j+1}\}$ satisfy the conditions of 30.13. Letting $G_n$ denote the matrix on the left-hand side of (30.93), and $G$ the matrix on the right-hand side, the result $\lambda'G_n\mu \xrightarrow{D} \lambda'G\mu$ is therefore given. A well-known matrix formula (see e.g. Magnus and Neudecker 1988: th. 2.2) yields

$$
\lambda'G_n\mu = (\mu' \otimes \lambda')\,\mathrm{Vec}\,G_n, \qquad (30.94)
$$

where $\mu' \otimes \lambda'$ is the Kronecker product of the vectors, the row vector $(\mu_1\lambda_1,\ldots,\mu_1\lambda_m,\mu_2\lambda_1,\ldots,\ldots,\mu_m\lambda_m)$ $(1 \times m^2)$, and $\mathrm{Vec}\,G_n$ $(m^2 \times 1)$ is the vector consisting of the columns of $G_n$ stacked one above the other. $\mu' \otimes \lambda'$ is of unit length, and applying the Cramér-Wold theorem (25.5) in respect of (30.94) implies that $G_n \xrightarrow{D} G$, as asserted in (30.93). ∎

This result is to be compared with 30.5. Between them they provide the intriguing incidental information that

$$
\int_0^1 B_\eta\,dB_\eta' + \int_0^1 dB_\eta\,B_\eta' = B_\eta(1)B_\eta(1)' - \Omega. \qquad (30.95)
$$

(Note that the stochastic matrix on the right has rank 1.) Of the two, 30.14 is much the stronger result, since it derives from the FCLT and is specific to the pattern of the increment variances.
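The matrix formula (30.94) used in the proof of 30.14 is purely algebraic, so it can be verified on arbitrary numbers. The sketch below is an illustration only; the helper names `vec`, `kron_vectors` and `bilinear` and the integer test data are assumptions of the example. The vectors are not normalized to unit length, since the identity itself does not require it.

```python
import random


def vec(G):
    """Stack the columns of the matrix G (a list of rows) into one flat list."""
    rows, cols = len(G), len(G[0])
    return [G[i][j] for j in range(cols) for i in range(rows)]


def kron_vectors(mu, lam):
    """Kronecker product mu' (x) lam' of two vectors, as a flat list."""
    return [mu_j * lam_i for mu_j in mu for lam_i in lam]


def bilinear(lam, G, mu):
    """Compute the scalar lam' G mu directly."""
    return sum(lam[i] * G[i][j] * mu[j]
               for i in range(len(lam)) for j in range(len(mu)))


# Hypothetical integer data, so that both sides can be compared exactly.
rng = random.Random(1)
m = 3
G = [[rng.randint(-4, 4) for _ in range(m)] for _ in range(m)]
lam = [rng.randint(-4, 4) for _ in range(m)]
mu = [rng.randint(-4, 4) for _ in range(m)]

lhs = bilinear(lam, G, mu)
rhs = sum(k * g for k, g in zip(kron_vectors(mu, lam), vec(G)))
print(lhs == rhs)
```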

Between them, 30.3 and 30.14 provide the basic theoretical tools necessary to analyse the linear regression model in variables generated as partial-sum processes (integrated processes). See Phillips and Durlauf (1986), and Park and Phillips (1988, 1989) among many other recent references.
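The relation (30.95) has an exact finite-sample counterpart for partial sums, which can be a useful check when simulating regressions of this kind: with $S_0 = 0$, $\sum_{j=1}^{n-1}(S_jU_{j+1}' + U_{j+1}S_j') = S_nS_n' - \sum_{j=1}^{n}U_jU_j'$. The sketch below verifies this identity; the function names and the integer increments are assumptions of the example and are not taken from the text.

```python
import random


def outer(x, y):
    """Outer product x y' of two vectors, as a list of rows."""
    return [[xi * yj for yj in y] for xi in x]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def partial_sum_identity(U):
    """Return both sides of the finite-sample identity for increments U."""
    m = len(U[0])
    S = [0] * m                          # S_0 = 0
    G = [[0] * m for _ in range(m)]      # accumulates S_j U_{j+1}'
    Q = [[0] * m for _ in range(m)]      # accumulates U_j U_j'
    for u in U:                          # here u is U_{j+1} and S is S_j
        G = mat_add(G, outer(S, u))      # the j = 0 term is zero
        Q = mat_add(Q, outer(u, u))
        S = [s + x for s, x in zip(S, u)]
    left = mat_add(G, transpose(G))
    right = mat_sub(outer(S, S), Q)
    return left, right


# Hypothetical integer increments (m = 2, n = 50) for an exact comparison.
rng = random.Random(2)
U = [[rng.randint(-3, 3) for _ in range(2)] for _ in range(50)]
left, right = partial_sum_identity(U)
print(left == right)
```

The identity holds for any increments whatever; only the limit statement in (30.95) depends on the assumptions of the theorem.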


Notes

1. See Billingsley (1979, 1986). The definition of a $\lambda$-system is given as 1.25 in Billingsley (1979), and as 1.26 in Billingsley (1986).

2. The 'prime' symbol $'$ denotes transposition. $x$ is a column vector, written as a row for convenience.

3. An affine transformation is a linear transformation $x \mapsto Ax$ followed by a translation, addition of a constant vector $b$. By an accepted abuse of terminology, such transformations tend to be referred to as 'linear'.

4. That is, $|x + y| \le |x| + |y|$. See §5.1 for more details.

5. The notations $\int f\,d\mu$, $\int f(\omega)\,d\mu(\omega)$, or simply $\int f$ when the relevant measure is understood, are used synonymously by different authors.

6. I thank Elizabeth Boardman for supplying this proof.

7. Elizabeth Boardman also suggested this proof.

8. If there is a subset $N \subset \Omega$ such that either $N$ or $N^c$ is contained in every $\mathcal{F}$-set, the elements of $N$ cease to be distinguishable as different outcomes. An equivalent model of the random experiment is obtained by redefining $\Omega$ to have $N$ itself as an element, replacing its individual members.

9. Random variables may also be complex-valued; see §11.2.

10. In statements of general definitions and results we usually consider the case of a one-sided sequence $\{X_t\}_1^\infty$. There is no difficulty in extending the concepts to the case $\{X_t\}_{-\infty}^{\infty}$, and this is left implicit, except when the mapping from the index set plays a specific role in the argument.

11. We adhere somewhat reluctantly to the convention of defining size as a negative number. The use of terms such as 'large size' to mean a slow (or rapid?) rate of mixing can obviously lead to confusion, and is best avoided.

12. This is a problem of the weak convergence of distributions; see §22.4 for further details.

13. This is similar to an example due to Athreya and Pantula (1986a).

14. In the theory of functions of a complex variable, an analytic function is one possessing finite derivatives everywhere in its domain.

15. I am grateful to Graham Brightwell for this argument.

16. For convergence to fail, the discontinuities of $f$ (which must be Borel measurable) would have to occupy a set of positive Lebesgue measure.

17. Conventionally, and for ease of notation, the symbol $\mathcal{F}_t$ is used here to denote what has been previously written as $\mathcal{F}_{-\infty}^{t}$. No confusion need arise, since a $\sigma$-subfield bearing a time subscript but no superscript will always be interpreted in this way.

18. Some quoted versions of this result (e.g. Hall and Heyde 1980) are for $p > 1$, whereas the present version, adapted from Karatzas and Shreve (1988), extends to $0 < p \le 1$ as well.

19. The norm, or length, of a $k$-vector $X$ is $\|X\| = (X'X)^{1/2}$. To avoid confusion with the $L_2$ norm of a r.v., the latter is always written with a subscript.

20. The original St Petersburg Paradox, enunciated by Daniel Bernoulli in 1738, considered a game in which the player wins £$2^{n-1}$ if the first head appears on the $n$th toss for any $n$. The expected winnings in this case are infinite, but the principle involved is the same in either case. See Shafer (1988).

21. See the remarks following 3.18. It is true that in topological spaces projections are continuous, and hence measurable, under the product topology (see §6.5), but of course, the abstract space $(\Omega, \mathcal{F})$ lacks topological structure and this reasoning does not apply.

22. Since $\theta$ is here a real $k$-vector it is written in bold face by convention, notwithstanding that $\theta$ is used to denote the generic element of $(\Theta, \rho)$, in the abstract.

23. This is the basis of a method for generating random numbers having a distribution $F$. Take a drawing from the uniform distribution on $[0,1]$ (i.e., a random string of digits with a decimal point placed in front) and apply the transformation $F^{-1}$ (or $Y$) to give a drawing from the desired distribution.

24. $\lambda$ is used here as the argument of the ch.f. instead of the $t$ used in Chapter 11, to avoid confusion with the time subscript.

25. The symbol $i$ appearing as a factor in these expressions denotes $\sqrt{-1}$. The context distinguishes the use of the same symbol as an array index.

26. In practice, of course, $U_t$ usually has to be estimated by a residual $\hat{U}_t$, depending on consistent estimates of model parameters. In this case, a result such as 21.6 is also required to show convergence.

27. More precisely, of course, $B$ models the projection of the motion of a particle in three-dimensional space onto an axis of the coordinate system.

28. A cautionary note: these combinations cannot be constructed as residuals from least squares regressions. If $\Sigma$ has full rank, the regression of one element of L onto the rest yields coefficients which are asymptotically random. $\Sigma$ must be estimated from the increments using the methods discussed in §25.1.

29. Compare Wooldridge and White (1988: Prop. 4.1). Wooldridge and White's result is incorrect as stated, since they omit the stipulation of almost sure
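The random-number method described in note 23 can be sketched as follows. This is a minimal illustration, assuming the unit-rate exponential distribution as the target (so that $F^{-1}(u) = -\log(1-u)$); the choice of distribution, the function names and the seed are assumptions of the example rather than anything specified in the note.

```python
import math
import random


def exponential_inverse_cdf(u, rate=1.0):
    """Inverse of F(x) = 1 - exp(-rate * x), defined for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate


def draw(inverse_cdf, rng):
    """Transform one uniform drawing on [0, 1) by the inverse c.d.f."""
    return inverse_cdf(rng.random())


# Hypothetical usage: 100,000 drawings from the unit exponential distribution.
rng = random.Random(42)
sample = [draw(exponential_inverse_cdf, rng) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)
print(sample_mean)  # should lie close to the population mean of 1
```

Any continuous, strictly increasing $F$ with a computable inverse could be substituted for the exponential case used here.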

Page 537: Stochastic Limit Theory: An Introduction for Econometricians (Advanced Texts in Econometrics)

References

Amemiya, Takeshi (1985), Advanced Econometrics, Basil Blackwell, Oxford.

Andrews, Donald W. K. (1984), 'Non-strong mixing autoregressive processes', Journal of Applied Probability 21, 930-4.

Andrews, Donald W. K. (1987a), 'Consistency in nonlinear econometric models: a generic uniform law of large numbers', Econometrica 55, 1465-71.

Andrews, Donald W. K. (1988), 'Laws of large numbers for dependent non-identically distributed random variables', Econometric Theory 4, 458-67.

Andrews, Donald W. K. (1991), 'Heteroscedasticity and autocorrelation consistent covariance matrix estimation', Econometrica 59, 817-58.

Andrews, Donald W. K. (1992), 'Generic uniform convergence', Econometric Theory 8, 241-57.

Apostol, Tom M. (1974), Mathematical Analysis (2nd edn.), Addison-Wesley, Menlo Park.

Ash, R. (1972), Real Analysis and Probability, Academic Press, New York.

Athreya, Krishna B. and Pantula, Sastry G. (1986a), 'Mixing properties of Harris chains and autoregressive processes', Journal of Applied Probability 23, 880-92.

Athreya, Krishna B. and Pantula, Sastry G. (1986b), 'A note on strong mixing of ARMA processes', Statistics and Probability Letters 4, 187-90.

Azuma, K. (1967), 'Weighted sums of certain dependent random variables', Tohoku Mathematical Journal 19, 357-67.

Barnsley, Michael (1988), Fractals Everywhere, Academic Press, Boston.

Bates, Charles and White, Halbert (1985), 'A unified theory of consistent estimation for parametric models', Econometric Theory 1, 151-78.

Bernstein, S. (1927), 'Sur l'extension du théorème du calcul des probabilités aux sommes de quantités dépendantes', Mathematische Annalen 97, 1-59.

Bierens, Herman (1983), 'Uniform consistency of kernel estimators of a regression function under generalized conditions', Journal of the American Statistical Association 77, 699-707.

Bierens, Herman (1989), 'Least squares estimation of linear and nonlinear ARMAX models under data heterogeneity', Working Paper, Department of Econometrics, Free University of Amsterdam.


Billingsley, Patrick (1968), Convergence of Probability Measures, John Wiley, New York.

Billingsley, Patrick (1979), Probability and Measure, John Wiley, New York.

Borowski, E. J. and Borwein, J. M. (1989), The Collins Reference Dictionary of Mathematics, Collins, London and Glasgow.

Bradley, Richard C., Bryc, W. and Janson, S. (1987), 'On dominations between measures of dependence', Journal of Multivariate Analysis 23, 312-29.

Breiman, Leo (1968), Probability, Addison-Wesley, Reading, Mass.

Brown, B. M. (1971), 'Martingale central limit theorems', Annals of Mathematical Statistics 42, 59-66.

Burkholder, D. L. (1973), 'Distribution function inequalities for martingales', Annals of Probability 1, 19-42.

Chan, N. H. and Wei, C. Z. (1988), 'Limiting distributions of least squares estimates of unstable autoregressive processes', Annals of Statistics 16, 367-401.

Chanda, K. C. (1974), 'Strong mixing properties of linear stochastic processes', Journal of Applied Probability 11, 401-8.

Chow, Y. S. (1971), 'On the $L_p$ convergence for $n^{-1/p}S_n$, $0 < p < 2$', Annals of Mathematical Statistics 36, 393-4.

Chow, Y. S. and Teicher, H. (1978), Probability Theory: Independence, Interchangeability and Martingales, Springer-Verlag, Berlin.

Chung, Kai Lai (1974), A Course in Probability Theory (2nd edn.), Academic Press, Orlando, Fla.

Cox, D. R. and Miller, H. D. (1965), The Theory of Stochastic Processes, Methuen, London.

Cramér, Harald (1946), Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ.

Davidson, James (1992), 'A central limit theorem for globally nonstationary near-epoch dependent functions of mixing processes', Econometric Theory 8, 313-29.

Davidson, James (1993a), 'An $L_1$-convergence theorem for heterogeneous mixingale arrays with trending moments', Statistics and Probability Letters 16, 301-4.

Davidson, James (1993b), 'The central limit theorem for globally non-stationary near-epoch dependent functions of mixing processes: the asymptotically degenerate case', Econometric Theory 9, 402-12.


de Jong, R. M. (1992), 'Laws of large numbers for dependent heterogeneous processes', Working Paper, Free University of Amsterdam (forthcoming in Econometric Theory, 1995).

de Jong, R. M. (1994), 'A strong law for $L_2$-mixingale sequences', Working Paper, Department of Econometrics, University of Tilburg.

Dellacherie, C. and Meyer, P.-A. (1978), Probabilities and Potential, North-Holland, Amsterdam.

Dhrymes, Phoebus J. (1989), Topics in Advanced Econometrics, Springer-Verlag, New York.

Dieudonné, J. (1969), Foundations of Modern Analysis, Academic Press, New York and London.

Domowitz, I. and White, H. (1982), 'Misspecified models with dependent observations', Journal of Econometrics 20, 35-58.

Donsker, M. D. (1951), 'An invariance principle for certain probability limit theorems', Memoirs of the American Mathematical Society 6, 1-12.

Doob, J. L. (1953), Stochastic Processes, John Wiley, New York; Chapman & Hall, London.

Dudley, R. M. (1966), 'Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces', Illinois Journal of Mathematics 10, 109-26.

Dudley, R. M. (1967), 'Measures on non-separable metric spaces', Illinois Journal of Mathematics 11, 109-26.

Dudley, R. M. (1989), Real Analysis and Probability, Wadsworth and Brooks/Cole, Pacific Grove, Calif.

Dvoretsky, A. (1972), 'Asymptotic normality of sums of dependent random variables', in Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, ii, University of California Press, Berkeley, Calif., 513-35.

Eberlein, Ernst and Taqqu, Murad S. (eds.) (1986), Dependence in Probability and Statistics: a Survey of Recent Results, Birkhäuser, Boston.

Engle, R. F., Hendry, D. F. and Richard, J.-F. (1983), 'Exogeneity', Econometrica 51, 277-304.

Feller, W. (1971), An Introduction to Probability Theory and its Applications, ii, John Wiley, New York.

Gallant, A. Ronald (1987), Nonlinear Statistical Models, John Wiley, New York.

Gallant, A. Ronald and White, Halbert (1988), A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models, Basil Blackwell, Oxford.


Gastwirth, Joseph L. and Rubin, Herman (1975), 'The asymptotic distribution theory of the empiric CDF for mixing stochastic processes', Annals of Statistics 3, 809-24.

Gnedenko, B. V. (1967), The Theory of Probability (4th edn.), Chelsea Publishing, New York.

Gorodetskii, V. V. (1977), 'On the strong mixing property for linear sequences', Theory of Probability and its Applications 22, 411-13.

Halmos, Paul R. (1956), Lectures in Ergodic Theory, Chelsea Publishing, New York.

Halmos, Paul R. (1960), Naive Set Theory, Van Nostrand Reinhold, New York.

Halmos, Paul R. (1974), Measure Theory, Springer-Verlag, New York.

Hall, P. and Heyde, C. C. (1980), Martingale Limit Theory and its Application, Academic Press, New York and London.

Hannan, E. J. (1970), Multiple Time Series, John Wiley, New York.

Hansen, L. P. (1982), 'Large sample properties of generalized method of moments estimators', Econometrica 50, 1029-54.

Hansen, Bruce E. (1991), 'Strong laws for dependent heterogeneous processes', Econometric Theory 7, 213-21.

Hansen, Bruce E. (1992a), 'Errata', Econometric Theory 8, 421-2.

Hansen, Bruce E. (1992b), 'Consistent covariance matrix estimation for dependent heterogeneous processes', Econometrica 60, 967-72.

Hansen, Bruce E. (1992c), 'Convergence to stochastic integrals for dependent heterogeneous processes', Econometric Theory 8, 489-500.

Herrndorf, Norbert (1984), 'A functional central limit theorem for weakly dependent sequences of random variables', Annals of Probability 12, 141-53.

Herrndorf, Norbert (1985), 'A functional central limit theorem for strongly mixing sequences of random variables', Z. Wahrscheinlichkeitstheorie verw. Gebiete 69, 540-50.

Hoadley, Bruce (1971), 'Asymptotic properties of maximum likelihood estimators for the independent not identically distributed case', Annals of Mathematical Statistics 42, 1977-91.

Hoeffding, W. (1963), 'Probability inequalities for sums of bounded random variables', Journal of the American Statistical Association 58, 13-30.

Ibragimov, I. A. (1962), 'Some limit theorems for stationary processes', Theory of Probability and its Applications 7, 349-82.


Ibragimov, I. A. (1965), 'On the spectrum of stationary Gaussian sequences satisfying the strong mixing condition. I: Necessary conditions', Theory of Probability and its Applications 10, 85-106.

Ibragimov, I. A. and Linnik, Yu. V. (1971), Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff, Groningen.

Iosifescu, M. and Theodorescu, R. (1969), Random Processes and Learning, Springer-Verlag, Berlin.

Karatzas, Ioannis and Shreve, Steven E. (1988), Brownian Motion and Stochastic Calculus, Springer-Verlag, New York.

Kelley, John L. (1955), General Topology, Springer-Verlag, New York.

Kingman, J. F. C. and Taylor, S. J. (1966), Introduction to Measure and Probability, Cambridge University Press, London and New York.

Kolmogorov, A. N. (1950), Foundations of the Theory of Probability, Chelsea Publishing, New York (published in German as Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer-Verlag, Berlin, 1933).

Kolmogorov, A. N. and Rozanov, Yu. A. (1960), 'On strong mixing conditions for stationary Gaussian processes', Theory of Probability and its Applications 5, 204-8.

Kopp, P. E. (1984), Martingales and Stochastic Integrals, Cambridge University Press.

Kurtz, T. G. and Protter, P. (1991), 'Weak limit theorems for stochastic integrals and stochastic differential equations', Annals of Probability 19, 1035-70.

Loève, Michel (1977), Probability Theory, i (4th edn.), Springer-Verlag, New York.

Lukacs, Eugene (1975), Stochastic Convergence (2nd edn.), Academic Press, New York.

Magnus, J. R., and Neudecker, H. (1988), Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley, Chichester.

Mandelbrot, Benoit B. (1983), The Fractal Geometry of Nature, W. H. Freeman, New York.

Mann, H. B. and Wald, A. (1943a), 'On the statistical treatment of linear stochastic difference equations', Econometrica 11, 173-220.

Mann, H. B. and Wald, A. (1943b), 'On stochastic limit and order relationships', Annals of Mathematical Statistics 14, 390-402.

McKean, H. P., Jr. (1969), Stochastic Integrals, Academic Press, New York.

McLeish, D. L. (1974), 'Dependent central limit theorems and invariance principles', Annals of Probability 2,4, 620-8.


McLeish, D. L. (1975a), 'A maximal inequality and dependent strong laws', Annals of Probability 3,5, 329-39.

McLeish, D. L. (1975b), 'Invariance principles for dependent variables', Z. Wahrscheinlichkeitstheorie verw. Gebiete 32, 165-78.

McLeish, D. L. (1977), 'On the invariance principle for nonstationary mixingales', Annals of Probability 5,4, 616-21.

Nagaev, S. V. and Fuk, A. Kh. (1971), 'Probability inequalities for sums of independent random variables', Theory of Probability and its Applications 6, 643-60.

Newey, W. K. (1991), 'Uniform convergence in probability and stochastic equicontinuity', Econometrica 59, 1161-8.

Newey, W. K. and West, K. (1987), 'A simple positive definite heteroskedasticity and correlation consistent covariance matrix', Econometrica 55, 703-8.

Park, J. Y. and Phillips, P. C. B. (1988), 'Statistical inference in regressions with integrated processes, Part 1', Econometric Theory 4, 468-98.

Park, J. Y. and Phillips, P. C. B. (1989), 'Statistical inference in regressions with integrated processes, Part 2', Econometric Theory 5, 95-132.

Parthasarathy, K. R. (1967), Probability Measures on Metric Spaces, Academic Press, New York and London.

Pham, Tuan D. and Tran, Lanh T. (1985), 'Some mixing properties of time series models', Stochastic Processes and their Applications 19, 297-303.

Phillips, P. C. B. (1988), 'Weak convergence to the matrix stochastic integral $\int_0^1 B\,dB'$', Journal of Multivariate Analysis 24, 252-64.

Phillips, P. C. B. and Durlauf, S. N. (1986), 'Multiple time series regression with integrated processes', Review of Economic Studies 53, 473-95.

Pollard, David (1984), Convergence of Stochastic Processes, Springer-Verlag, New York.

Pötscher, B. M. and Prucha, I. R. (1989), 'A uniform law of large numbers for dependent and heterogeneous data processes', Econometrica 57, 675-84.

Pötscher, B. M. and Prucha, I. R. (1994), 'Generic uniform convergence and equicontinuity concepts for random functions: an exploration of the basic structure', Journal of Econometrics 60, 23-63.

Pötscher, B. M. and Prucha, I. R. (1991a), 'Basic structure of the asymptotic theory in dynamic nonlinear econometric models, Part I: Consistency and approximation concepts', Econometric Reviews 10, 125-216.


Pötscher, B. M. and Prucha, I. R. (1991b), 'Basic structure of the asymptotic theory in dynamic nonlinear econometric models, Part II: Asymptotic normality', Econometric Reviews 10, 253-325.

Prokhorov, Yu. V. (1956), 'Convergence of random processes and limit theorems in probability theory', Theory of Probability and its Applications 1, 157-213.

Rao, C. Radhakrishna (1973), Linear Statistical Inference and its Applications (2nd edn.), John Wiley, New York.

Révész, Pál (1968), The Laws of Large Numbers, Academic Press, New York.

Rosenblatt, M. (1956), 'A central limit theorem and a strong mixing condition', Proceedings of the National Academy of Science, USA, 42, 43-7.

Rosenblatt, M. (1972), 'Uniform ergodicity and strong mixing', Z. Wahrscheinlichkeitstheorie verw. Gebiete 24, 79-84.

Rosenblatt, M. (1978), 'Dependence and asymptotic independence for random processes', in Studies in Probability Theory (ed. M. Rosenblatt), Mathematical Association of America, Washington DC.

Royden, H. L. (1968), Real Analysis, Macmillan, New York.

Seneta, E. (1976), Regularly Varying Functions, Springer-Verlag, Berlin.

Serfling, R. J. (1968), 'Contributions to central limit theory for dependent variables', Annals of Mathematical Statistics 39, 1158-75.

Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics, John Wiley, New York.

Shafer, G. (1988), 'The St Petersburg Paradox', in Encyclopaedia of the Statistical Sciences, viii (ed. S. Kotz and N. L. Johnson), John Wiley, New York.

Shiryayev, A. N. (1984), Probability, Springer-Verlag, New York.

Skorokhod, A. V. (1956), 'Limit theorems for stochastic processes', Theory of Probability and its Applications 1, 261-90.

Skorokhod, A. V. (1957), 'Limit theorems for stochastic processes with independent increments', Theory of Probability and its Applications 2, 138-71.

Slutsky, E. (1925), 'Über stochastische Asymptoten und Grenzwerte', Math. Annalen 5, 93.

Stinchcombe, M. B. and White, H. (1992), 'Some measurability results for extrema of random functions over random sets', Review of Economic Studies 59, 495-514.

Stone, Charles (1963), 'Weak convergence of stochastic processes defined on semi-infinite time intervals', Proceedings of the American Mathematical Society 14, 694-6.


Stout, W. F. (1974), Almost Sure Convergence, Academic Press, New York.

Strasser, H. (1986), 'Martingale difference arrays and stochastic integrals', Probability Theory and Related Fields 72, 83-98.

Varadarajan, V. S. (1958), 'Weak convergence of measures on separable metric spaces', Sankhya 19, 15-22.

von Bahr, Bengt, and Esseen, Carl-Gustav (1965), 'Inequalities for the $r$th absolute moment of a sum of random variables, $1 \le r \le 2$', Annals of Mathematical Statistics 36, 299-303.

White, Halbert (1984), Asymptotic Theory for Econometricians, Academic Press, New York.

White, Halbert and Domowitz, I. (1984), 'Nonlinear regression with dependent observations', Econometrica 52, 143-62.

Wiener, Norbert (1923), 'Differential space', Journal of Mathematical Physics 2, 131-74.

Willard, Stephen (1970), General Topology, Addison-Wesley, Reading, Mass.

Withers, C. S. (1981a), 'Conditions for linear processes to be strong-mixing', Z. Wahrscheinlichkeitstheorie verw. Gebiete 57, 477-80.

Withers, C. S. (1981b), 'Central limit theorems for dependent variables, I', Z. Wahrscheinlichkeitstheorie verw. Gebiete 57, 509-34.

Wooldridge, Jeffrey M. and White, Halbert (1986), 'Some invariance principles and central limit theorems for dependent heterogeneous processes', University of California (San Diego) Working Paper.

Wooldridge, Jeffrey M. and White, Halbert (1988), 'Some invariance principles and central limit theorems for dependent heterogeneous processes', Econometric Theory 4, 210-30.


Index

α-mixing, see strong mixing
Abel's partial summation formula 34, 254
absolute convergence 31
absolute moments 132
absolute value 162
absolutely continuous function 120
absolutely continuous measure 69
absolutely regular sequence 209
abstract integral 128
accumulation point 21, 94
adapted process 500
adapted sequence 229
addition modulo 1 46
adherent point 77
affine transformation 126
Aleph nought 8
algebra of sets 14
almost everywhere 38, 113
almost sure convergence 178, 281
    method of subsequences 295
    uniform 331
almost surely 113
analytic function 328
analytic set 328
    of C 449
Andrews, D. W. K. 216, 261, 301, 336, 338, 341
antisymmetric relation 5
Apostol, T. M. 29, 32, 33, 126
approximable in probability 274
approximable process 273
    weak law of large numbers 304
approximately mixing 262
AR process, see autoregressive process
ARMA process 215
array 33
array convergence 35
Arzelà-Ascoli theorem 91, 335, 439, 447, 469
asymptotic equicontinuity 90
asymptotic expectation 357
asymptotic independence 479
asymptotic negligibility 375
asymptotic uniform equicontinuity 90, 335
asymptotic unpredictability 247
Athreya, Krishna B. 215
atom 37
    of a distribution 129
    of a p.m. 112
autocovariance 193
autocovariances, summability 266
autoregressive process 215
    non-strong mixing 216
    non-uniform mixing 218
autoregressive-moving average process 215
axiom of choice 47
axioms of probability
Azuma, K. 245
Azuma's inequality 245

β-mixing 207
backshift transformation 191
ball 76
band width 401
Bartlett kernel 403, 407
base 79
    for point 94
    for space of measures 418
    for topology 93
Bernoulli distribution 122
    expectation 129
Bernoulli r.v.s 216
Bernstein sums 386, 401
Berry-Esseen theorem 407
betting system 233-4
Bierens, Herman 261, 336
Big Oh 31, 187
Billingsley, Patrick 17, 18, 261, 421, 422, 447, 448, 457, 466, 469, 472, 473, 474-5, 496
Billingsley metric 462
    equivalent to Skorokhod metric 464
binary expansion 10
binary r.v. 122, 133
binary sequence 180
binomial distribution 122, 348, 364
bivariate Gaussian 144
blocking argument 194, 299
Borel field
    of C 440
    of D 465
    infinite-dimensional 181
    real line 22, 16
    metric space 77, 413
    topological space 413
Borel function 55, 57, 117
    expectation of 130
Borel sets 47
Borel-Cantelli lemma 282, 295, 307
Borel's normal number theorem 290
boundary point 21, 77
bounded convergence theorem 64
bounded function 28
bounded set 22, 77
bounded variation 29
Brown, Robert 443
Brownian bridge 445
    mean deviations 498
Brownian motion 443
    de-meaned 498
    distribution of supremum 496
    transformed 445, 486
    vector 454
    with drift 444
Burkholder's inequality 242

C 437c.d.f., see cumulative distribution

functioncadlag function 90, 456cadlag process 488

progressively measurable 500

Index

Cantor, G. 10cardinal number 8cardinality 8Cartesian product 5, 83, 102Cauchy criterion 25Cauchy distribution 123

as ktabledistribution 362characteristic function 167

no expectation 129Cauchy family 124Cauchy sequence 80-2, 97

real numbers 25vectors 29

Cauchy-schwartz inequality 138central limit theorem

ergodic Ll-mixingales 385functional 450, 480independent sequences 368martingale differences 383NED functions of mixing

processes 386three series theorem 312trending variances 379

central moments 131central tendency 128centred sequence 231Cesro sum 31ch.f., see characteristic functionChan, N. H. 510Chanda, K. C. 215, 219characteristic function 53, 162

derivatives 164independent sums 166multivariate distributions 168series expansion 165weak convergence 357

Chebyshev's inequality 132Chebyshev's theorem 293chi-squared distribution 124chord 133Chow, Y. S. 298Chung, K. L. 407, 409closed under set operations 14closed interval 11closed set 77

Page 547: Stochastic Limit Theory: An Introduction for Econometricians (Advanced Texts in Econometrics)

Index

. . . . . 7(:('

jz.' t: :

J''f( ' . E

.

t''. .. ..

q'...

529

real line 21closure point 20, 77, 94cluster point 24, 80, 94coarse topology 93coarseness 438codomain 6coin tossing 180, 191collection 3compact 95compact set 22, 77compact space 77compactificaton 107complement 3complete measure space 39complete space 80completely regular sequence 209completely regular space 100completeness 97completion 39complex conjugate 162complex number 162composite mapping 7concave function 133conditional

characteristic function 171-3distribution function 143expectation 144, 147

linearity 151optimal predictor 150variance of 156versions of 148

Fatou's lemma 152Jensen's inequality 153Markov inequality 152modulus inequality 151monotone convergence theorem 152probability 113variance 238, 316

conditionally heteroscedastic 214consistency properties 183, 435consistency theorem

function spaces 436

sequences 184contingent probability 114continuity 112

of a measure 38continuous distribution 122continuous function 27, 84, 436continuous mapping 97continuous mapping theorem 355, 497

metric spaces 421continuous time martingale 501continuous time process 500continuous truncation 271, 309continuously differentiable 29contraction mapping 263

converge absolutely 31

convergence 94
    almost sure 178, 281, 331
    in distribution 347, 367
    in Lp norm 287, 331
    in mean square 179, 287
    in probability 179, 284, 349, 359, 367
    in probability law 347
    metric space 80
    on a subsequence 284
    real sequence 23
    space of probability measures 418
    stochastic function 333
    transformations 285
    uniform 30, 331
    weak 179
    with probability 1 179, 331
convergence lemma 306
convergence-determining class 420
convex function 133, 339
convolution 161
coordinate 177
coordinate projections 102, 434
coordinate space 48
coordinate transformation 126
correlation coefficient 138
correspondence 6
countability axioms 94
countable additivity 36
countable set 8
countable subadditivity 37, 111
countably compact 95


covariance 136
covariance matrix 137
covariance stationarity 193
covering 22, 77
Cox, D. R. 503
cr inequality 140
Cramér, Harald 141
Cramér-Wold device 405, 455, 490, 516
Cramér's theorem 355
cross moments 136
cumulative distribution function 117
cylinder set 48, 115
    of R 434

D 456
    Billingsley metric on 464

Davidson, James 301, 386
de Jong, R. M. 319, 326
de Morgan's laws 4
decimal expansion 10
decreasing function 29
decreasing sequence
    of real numbers 23
    of sets 12
degenerate distribution 349
degree of belief 111
Dellacherie, C. 328
dense 23
dense set 77
density function 74, 120
denumerable 8
dependent events 113
derivative 29
    conditional expectation 153
    expectation 141
derived sequence 177
determining class 40, 121, 127, 420
diagonal argument 10
diagonal method 35
diffeomorphism 126
difference, set 3
differentiable 29
differential calculus 28


diffusion process 503
discontinuity, jump 457
discontinuity of first kind 457
discrete distribution 122, 129
discrete metric 76
discrete subset 80
discrete topology 93
disjoint 3
distribution
domain 6
dominated convergence theorem 63
Donsker, M. D. 450
Donsker's theorem 450
Doob, J. L. 196, 216, 235, 314
Doob decomposition 231
Doob's inequality 241
    continuous time 501
Doob-Meyer decomposition 502
double integral 66, 136
drift 444
Dudley, R. M. 328, 457
Durlauf, S. N. 490, 516
dyadic rationals 26, 99, 439
dynamic stability 263
Dynkin, E. B. 18
Dynkin's π-λ theorem 18

ε-neighbourhood 20, see also sphere
ε-net 78
Egoroff's theorem 282, 510
element, set 3
embedding 86, 97, 105
empirical distribution function 332
empty set 3, 77
Engle, R. F. 403
ensemble average 195
envelope 329
equally likely 180
equicontinuity 90, 335
    stochastic 336
    strong stochastic 336
equipotent 6
equivalence class 5, 46
equivalence relation 5
equivalent measures 69


equivalent metrics 76
equivalent sequences 307, 382
ergodic theorem 200
    law of large numbers 291
ergodicity 199
    asymptotic independence 202
    Cesàro-summability of autocovariances 201
Essen, Carl-Gustav 171
essential supremum 117, 132
estimator 177
Euclidean distance 20, 75
Euclidean k-space 23
Euclidean metric 75, 105
evaluation map 105
even numbers 8
exogenous 403
expectation 128
exponential function 162
exponential inequality 245
extended functions 52
extended real line 12
extended space 117
extension theorem 184
    existence part 40
    uniqueness part 44, 127

φ-mixing, see uniform mixing
F-analytic sets 449
factor space 48
fair game 289
Fatou's lemma 63
    conditional 152
Feller, W. 32
Feller's theorem 373
field of sets 14
filter 94
filtration 500
fine topology 93
fineness 438
finite additivity 36
finite dimensional cylinder sets 181, 435
finite dimensional distributions
    of C 440, 442, 446
    of D 466
    Wiener measure 446
finite intersection property 95
finite measure 36
first countable 95
fractals 443
frequentist model 111
    law of large numbers 292
Fubini's theorem 66, 69, 125
Fuk, A. Kh. 220
function 6
    convex 339
    of a real variable 27
    of bounded variation 29

function space 84, 434
    nonseparable 89
functional 84
functional central limit theorem 450, 480
    martingale differences 440
    multivariate 454, 490
    NED functions of strong mixing processes 481
    NED functions of uniform mixing processes 485

Gallant, A. Ronald 261, 263, 271, 401-2
gambling policy 233
gamma function 124
Gaussian distribution 123
    characteristic function 167
    stable distribution 363
Gaussian family 123
    expectation 129
    moments 131
Gaussian vector 126
generic uniform convergence 336
geometric series 31
Glivenko-Cantelli theorem 332
global stationarity 194, 388, 450, 486
Gordin, M. 385
Gorodetskii, V. 215, 219, 220
graph 6


Hahn decomposition 70, 72
half line 11, 21, 52, 118
half-Gaussian distribution 385
half-normal density 124
half-open interval 11, 15, 118
Hall, P. 250, 314, 385, 409
Halmos, Paul R. 202
Hansen, Bruce E. 318, 403, 510
Hartman-Wintner theorem 408
Hausdorff metric 83, 469
Hausdorff space 98
Heine-Borel theorem 23
Helly-Bray theorem 353
Helly's selection theorem 360
Heyde, C. C. 250, 314, 385, 409
Hoadley, Bruce 336
Hoeffding's inequality 245
Hölder's inequality 138
homeomorphism 27, 51, 86, 97, 105

i.i.d. 193
Ibragimov, I. A. 204-5, 210, 211, 215, 216, 261
identically distributed 193
image 6
imaginary number 162
inclusion-exclusion formula 37, 420
increasing function 29
increasing sequence
    of real numbers 23
    of sets 12
independence 114
independent Brownian motions 455
independent r.v.s 127, 154-5, 161
independent sequence 192
    strong law of large numbers 311
independent subfields 154
index set 3, 177
indicator function 53, 128
inferior limit
    of real sequence 25
    of set sequence 13
infimum 12
infinite run of heads 180

infinite-dimensional cube 83, 105
infinite-dimensional Euclidean space 83, 104
infinitely divisible distribution 362
initial conditions 194
inner measure 41
innovation sequence 215, 231
integers 9
integrability 61
integral 57
integration by parts 58, 129, 134
interior 21, 77
intersection 3
interval 11
into 6
invariance principle 450, 497
invariant event 195
invariant r.v. 196
inverse image 6
inverse projection 48
inversion theorem 168-70
irrational numbers 10, 80
isolated point 21, 77
isometry 86
isomorphic spaces 51
isomorphism 145
iterated integral 66, 135
Itô integral 507

J1 metric 459
Jacobian matrix 126
Jensen's inequality 133
    conditional 153
Jordan decomposition 71
jump discontinuities 457
jump points 120

Karatzas, Ioannis 502-3, 508
kernel estimator 403
Khinchine's theorem 368
Kolmogorov, A. N. 209-10, 311-2
Kolmogorov consistency theorem 184
Kolmogorov's inequality 240


Kolmogorov's zero-one law 204
Kopp, P. E. 504
Kronecker product 516
Kronecker's lemma 34, 293, 307
Kurtz, T. G. 510

λ-system 18
l'Hôpital's rule 361
Lp convergence 287
    uniform 331
Lp norm 132
Lp-approximable 274
Lp-bounded 132
Lp-dominated 330
largest element 5
latent variables 177
law of iterated expectations 149
law of large numbers 200, 289
    Cauchy r.v.s 291
    definition of expectation 292
    frequentist model 292
    random walk 291
    uniform 340
law of the iterated logarithm 408
Lebesgue decomposition 69, 72

    probability measure 120
Lebesgue integral 57
Lebesgue measure 37, 45-6, 74, 112
    plane 66
    product measure 135
Lebesgue-integrable r.v.s 132
Lebesgue-Stieltjes integral 57-8, 128
left-continuity 27
left-hand derivative 29
Lévy, P. 312
Lévy continuity theorem 358
Lévy's metric 424, 468
lexicographic ordering 10
Liapunov condition 373
Liapunov's inequality 139
Liapunov's theorem 372
LIE 149
liminf
    of real sequence 25
    of set sequence 13
limit 80
    expectation of 141
    of set sequence 13

limit point 26
limsup
    of real sequence 25
    of set sequence 13
Lindeberg condition 369, 371, 380
    asymptotic negligibility 376
    uniform integrability 372
Lindeberg theorem 369
Lindeberg-Feller theorem, see Lindeberg theorem; Feller's theorem
Lindeberg-Lévy theorem 366
    FCLT 449
Lindelöf property 78-79, 95
Lindelöf space 98
Lindelöf's covering theorem 22
linear ordering 5
linear process 193, 247, 252

    strong law of large numbers 326
linearity
    of conditional expectation 151
    of integral 62
Linnik, Yu. V. 204-5, 210, 215, 216
Lipschitz condition 28, 86, 269
    stochastic 338
Little Oh 31, 187
Loève, M. 32, 407
Loève's cr inequality 140
log-likelihood 177
lower integral 57
lower semicontinuity 86
lower variation 71

MA process, see moving average process

Magnus, J. R. 516
Mandelbrot, Benoit 443
Mann, H. B. 187
mapping 6
marginal c.d.f. 126
marginal distributions of a sequence 186
marginal measures 64
marginal probability measures 115
    tightness 430
Markov inequality 132, 135
    conditional 152
Markov process 503
martingale 229
    continuous time 501
    convergence 235
martingale difference array 232
    weak law of large numbers 298
martingale difference sequence 230
    strong law of large numbers 314
maximal inequalities
    for linear processes 256
    for martingales 240
    for mixingales 252

maximum metric 76
McKean, H. P., Jr. 508
McLeish, D. L. 247, 261, 318, 380
mean 128
mean reversion 214
mean stationarity 193
mean value theorem 340
mean-square convergence 293
measurability of suprema 327, 449
measurable function 117
measurable isomorphism 51
measurable rectangle 50, 48
measurable set 41
measurable space 36
measurable transformation 50
measure 36
measure space 36
measure-preserving transformation 191
    mixes outcomes 200
memory 192
method of subsequences 295
metric 75
metric space 75, 96
metrically transitive 199
metrizable space 93
metrization 107


Meyer, P.-A. 328
Miller, H. D. 503
Minkowski's inequality 139
mixed continuous-discrete distribution 129
mixed Gaussian distribution 404
mixing 202
    inequalities 211-4
    MA processes 215
    martini example 202
    measurable functions 211
mixing process
    strong law of large numbers 323
mixing sequence 204
mixing size 210, see also size
mixingale 247
    stationary 250
    strong law of large numbers 318
    weak law of large numbers 301
mixingale array 249
modulus 162
modulus inequality 63
    complex r.v.s 163
    conditional 151
modulus of continuity 91, 335, 439, 479

    cadlag functions 458, 468
moment generating function 162
moments 131
    Gaussian distribution 123
monkeys typing Shakespeare 180
monotone class 17
monotone convergence theorem 60, &.
    conditional 152
    envelope theorem 329
monotone function 29
monotone sequence
    of real numbers 23
    of sets 12
monotonicity
    of measure 37
    of p.m. 111
Monte Carlo 498
moving average process 193
    mixing 215


    strong law of large numbers 326
multinormal distribution, see multivariate Gaussian
multinormal p.d.f. 126
multivariate c.d.f. 125
multivariate FCLT 490
multivariate Gaussian
    affine transformations 170
    characteristic function 168
mutually singular measures 69

Nagaev, S. V. 220
naïve set theory 47
natural numbers 8
near-epoch dependence 261
    mixingales 264
    transformations 267
near-epoch dependent process
    strong law of large numbers 323
    weak law of large numbers 302
nearly measurable set 329
NED, see near-epoch dependence
negative set 70
neighbourhood 20, see also sphere
nested subfields 155
net 94
Neudecker, H. 516
Newey, W. K. 336, 401
non-decreasing function 29
non-decreasing sequence
    of real numbers 23
    of sets 12
non-increasing function 29
non-increasing sequence
    of real numbers 23
    of sets 12
non-measurable function 55
non-measurable set 46
norm inequality 139
    for prediction errors 157
normal distribution 123
normal law of error 364
normal number theorem 290
normal space 98
null set 3, 70

odd-order moments 131
one-dimensional cylinder 182
one-to-one 6
onto 6, 97
open covering 22, 77
open interval 11
open mapping 27
open rectangles 102
open set 77, 93
    of real line 20
order-preserving mapping 6
ordered pairs 7
orders of magnitude 31
origin 10
Ornstein-Uhlenbeck process 445
    diffusion process 503
outer measure 41
outer product 137

π-λ theorem 18, 44, 49, 67
π-system 18
p.d.f., see probability density function
p.m., see probability measure
pairwise independence 114
pairwise independent r.v.s 127, 136
Pantula, Sastry G. 215
parameter space 327
Park, J. Y. 516
Parthasarathy, K. R. 418, 422, 426, 427, 429, 469
partial knowledge 145
partial ordering 5
partial sums 31
partition 3
    of [0,1) 438
permutation of indices 436
Pham, Tuan D. 215
Phillips, P. C. B. 490, 510, 516
piecewise linear functions 437
pigeon hole principle 8
Pitman drift 369
pointwise convergence 30
    stochastic 331


Poisson distribution 122, 348
    characteristic function 167
    expectation 129
    infinitely divisible 362
polar coordinates 162
Pollard, David 457
positive semi-definite 137
positive set 70
Pötscher, B. M. 261, 274, 277, 336, 342
power set 13
precompact set 78
predictable component 231
probability density function 122
probability measure 111
    weak topology 418
probability space 111, 117
product measure 64
product space 7, 48, 102, 115
product topology 102
    function spaces 453
product, set 5
progressively measurable 500
progressively measurable functions 504
projection 8, 48, 50
    suprema 328
    not measurable 328
projection σ-field 415, 435
    of C 440
    of D 456
Prokhorov, Yu. V. 422, 423, 457
Prokhorov metric 424, 467
Protter, P. 510
Prucha, I. 261, 274, 277, 336, 342
pseudo-metric 75, 504

q-dependence 215
quadratic variation 238, 502
    deterministic 503

ρ-mixing 207
R, the space 434
r.v., see random variable
Radon-Nikodym derivative 70, 74, 120, 148
Radon-Nikodym theorem 70, 72-4, 122
random element 413, 111
random event 111
random experiment 111, 128
random field 178
random pair 124
random sequence 177

    memory of 192
random variable 112, 117
random vector 137
random walk 230, 291
random weighting 316
range 6
Rao, C. R. 143
rate of convergence 294
rational numbers 9
real numbers 10
real plane 11
real-valued function 27, 87, 434
realization 179
refinement 438
reflexive relation 5
regression coefficients
regular measure 413
regular sequence 204
regular space 98
regularly varying function 32
relation 5
relative compactness 422
relative frequencies 128
relative topology 20, 93
relatively compact 77
remote σ-field 203
repeated sampling 179
restriction of a measure space 37
Riemann integral 57
    expectation 141
    of a random function 496
Riemann zeta function 32
Riemann-Stieltjes integral 58
    and stochastic integral 503
right continuous 27
    c.d.f. 119-20
    filtration 500


right-hand derivative 29
ring 14
Rozanov, Yu. A. 209-10

σ-algebra 15
σ-field 15
σ-finite measure 36
St Petersburg paradox 289
sample 177
sample average 128
sample path 179
sample space 111
scale transformation 178
seasonal adjustment 262
second countable 79, 95
Serfling, R. J. 213
self-similarity 443
semi-algebra 15
semi-ring 15, 40, 48, 49, 118
semimartingale 233
    in continuous time 501
separable set 78
separable space 95
separating function 98
separation axioms 97
sequence 9, 94
    metric space 80
    real 23
sequentially compact 95, 422
serial independence 192
series 31
set 3
set function 36
set of measure zero 38, 70
shift transformation 191
shocks 215
Shreve, Steven E. 502-3, 508
signed measure 70
simple function 53
    approximation by 54
    integral of 59
simple random variables 128
singleton 11
singular Gaussian distribution
size
    mixing 210
    mixingale 247
    near-epoch dependence 262
Skorokhod, A. V. 350, 457, 459
Skorokhod metric 459, 469
Skorokhod representation 350, 431, 510
Skorokhod topology 461
slowly varying function 32
Slutsky's theorem 287
smallest element 5
Souslin space 328
spectral density function 209
    MA process 215
sphere 76
    of C 440
    of D 466
stable distribution 362
stationarity 193
step function 119, 349
Stinchcombe, M. B. 328
stochastic convergence, see under convergence
stochastic equicontinuity 336
    termwise 341
stochastic integral 503
stochastic process 177
    continuous time 500
stochastic sequence 177
stopped process 234
stopping rule 234
stopping time 233
    filtration 501
Stout, W. F. 314, 409
Strasser, H. 510
strict ordering 5
strict stationarity 193
strong law of large numbers 289
    for independent sequence 310
    for Lp-bounded sequence 295, 312, 314
    for martingales 314-7
    for mixingales 319-23
    for NED functions of mixing processes 324-6
strong mixing
    autoregressive processes 216
    coefficient 206
    negligible events 207
    smoothness of the p.d.f. 225
    sufficient conditions in MA processes 219-27
    see also mixing
strong mixing sequence 209
    law of large numbers 295, 298
strong topology 93
strongly dependent sequences 210
strongly exogenous 403
sub-base 101, 103
subcovering 22
subfield measurability 145
subjective probability 111
submartingale 233
    continuous time 501
subsequence 24, 80
subset 3
subspace 20, 93
summability 31
superior limit
    of real sequence 25
    of set sequence 13
supermartingale 233
    continuous time 501
support 37, 119
supremum 12
sure convergence 178
symmetric difference 3
symmetric relation 5

T1-space 98
T2-space 98
T3-space 98
T3½-space 100
T4-space 98
tail sums 31
taxicab metric 75
Taylor's theorem 165
telescoping sum 250
    proof of weak law 301
termwise stochastic equicontinuity 341
three series theorem 311
tight measure 360, 427
time average 195
    and ensemble average 200
    limit under stationarity 196
time plot 437
time series 177
Toeplitz's lemma 34, 35
Tonelli's theorem 68
topological space 93
topology 93
    real line 20
    weak convergence 418
torus 102
total independence 114
total variation 72
totally bounded 78, 79
totally independent r.v.s 127
trace 112
Tran, Lanh T. 215
transformation 6
transformed Brownian motion 486
    diffusion process 503
transitive relation 5
trending moments 262
triangle inequality 31, 139
triangular array 34
    stochastic 178
trivial topology 93
truncated kernel 403
truncation 298, 308
    continuous 271, 309
two-dimensional cylinder 182
Tychonoff space 100
Tychonoff topology 103, 181, 461
Tychonoff's theorem 104

uncorrelated sequence 230, 293
uncountable 10
uniform conditions 186
uniform continuity 28, 85
uniform convergence 30
uniform distribution 112, 122, 364


    convolution 161
    expectation 129
uniform equicontinuity 90
uniform integrability 188
    squared partial sums 257
uniform laws of large numbers 340
uniform Lipschitz condition 28, 87
uniform metric 87
    parameter space 327
uniform mixing
    autoregressive process 218
    coefficient 206
    moving average process 227
    see also mixing
uniform mixing sequence 209
    strong law of large numbers 298
    weak law of large numbers 295
uniform stochastic convergence 331
uniform tightness 359, 427

    Arzelà-Ascoli theorem 447
    measures on C 447, 448
uniformly bounded
    a.s. 186
    in Lp norm 132, 186
    in probability 186
union 3
universally measurable set 328
upcrossing inequality 236
upper integral 57
upper semicontinuity 86, 470
upper variation 71
Urysohn's embedding theorem 106
Urysohn's lemma 98
Urysohn's metrization theorem 98
usual topology of the line 20

Varadarajan, V. S. 425
variable 117
variance 131


    sample mean 293
variance-transformed Brownian motion 486
vector 29
vector Brownian motion 498
vector martingale difference 234
Venn diagram 4
versions of conditional expectation 148
von Bahr, Bengt 171

Wald, A. 187
weak convergence 179, 347
    in metric space 418
    of sums 361
weak dependence 210
weak law of large numbers 289

    for L1-approximable process 304
    for L1-mixingale 302
    for L2-bounded sequence 293
    for partial sums 312

weak topology 93, 101
    space of probability measures 418
Wei, C. Z. 510
well-ordered 5
West, K. 401
White, Halbert 210, 261, 263, 271, 328, 401-2, 480-1
wide-sense stationarity 193
Wiener, Norbert 442
Wiener measure 442, 474
    existence of 446, 452
with probability 1 113
Withers, C. S. 215
Wooldridge, Jeffrey M. 480-1

zero-one law 204
