Reproducing Kernel Hilbert Spaces in Probability and Statistics || Measures and Random Measures


Chapter 4

MEASURES AND RANDOM MEASURES

1. INTRODUCTION

Since its foundation by Borel and Lebesgue around the year 1900, the modern theory of measure, generalizing the basic notions of length, area and volume, has become one of the major fields in Pure and Applied Mathematics. In all human activities one collects measurements subject to variability, leading to the classical concepts of Probability and Mathematical Statistics that can model "observations": random variables, samples, point processes. All of them belong to Measure Theory. The study of point processes and more generally of random measures has recently undergone considerable development. It requires sophisticated mathematical tools. The very definition of random measures raises delicate problems, as does the need for a notion of closeness between them.

Here we will first adopt a naive point of view, starting with Dirac measures and showing how reproducing kernels can be used to represent measures in functional spaces (Section 1). Then we will exploit the embedding of classes of measures in RKHS (Sections 6 and 7) to define inner products on sets of measures. Finally we will show how random measures can be treated as random variables taking their values in RKHS (Section 9). Applications will be given to empirical and Donsker measures and to Berry-Esseen bounds (Section 8). Sections 2, 3, 4 and 5 deal with properties of variables taking their values in RKHS (measurability, gaussian measure, weak convergence in the set of probabilities over a RKHS, integrability).

A. Berlinet et al., Reproducing Kernel Hilbert Spaces In Probability and Statistics

© Springer Science+Business Media New York 2004


1.1. DIRAC MEASURES

Let $E$ be a fixed non-empty set and $\mathcal{T}$ be a $\sigma$-algebra of subsets of $E$. By a measure we mean a countably additive function $\mu$ from $\mathcal{T}$ into $\mathbb{C}$ such that $\mu(\emptyset) = 0$. This means that for any subset $I$ of $\mathbb{N}$ and any family $\{A_i, i \in I\}$ of pairwise disjoint elements of $\mathcal{T}$ one has

$$\mu\Big(\bigcup_{i \in I} A_i\Big) = \sum_{i \in I} \mu(A_i).$$

At some places we consider set functions satisfying the above property only with finite index set $I$. We call them finitely additive measures. The simplest example of a measure on $(E, \mathcal{T})$ is the Dirac measure $\delta_x$ defined for $x$ in $E$ by

$$\delta_x(A) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

where $A$ belongs to $\mathcal{T}$. Any measurable complex function $f$ on $E$ is integrable with respect to $\delta_x$ and we have

$$\int f(t)\, d\delta_x(t) = f(x).$$

The Dirac measure $\delta_x$ is a probability measure on $(E, \mathcal{T})$ assigning the mass 1 to the set $\{x\}$. When $f$ belongs to some Hilbert space $\mathcal{H}$ of functions on $E$ with reproducing kernel $K$, integrating $f$ with respect to $\delta_x$ or computing $\langle f, K(\cdot, x) \rangle$ gives the same result $f(x)$, the value of $f$ at the point $x$. The mapping

$$\delta_y \longmapsto K(\cdot, y)$$

embeds in $\mathcal{H}$ the set of Dirac measures on $E$ and, if the function $K(x, \cdot)$ is measurable, the value $K(x, y)$ of the function $K(\cdot, y)$ at the point $x$ can be written as the integral

$$\int K(x, t)\, d\delta_y(t). \qquad (4.1)$$

More generally if $x_1, \ldots, x_n$ are $n$ distinct points in $E$ and $a_1, \ldots, a_n$ are $n$ non null real numbers, a linear combination

$$\sum_{i=1}^{n} a_i\, \delta_{x_i}$$

of Dirac measures puts the mass $a_i$ (positive or negative) at the point $x_i$.

Such a linear combination can assign positive, negative or null measures to elements of $\mathcal{T}$. It is called a signed measure. As $a_1, \ldots, a_n$ are all different from 0, the support of the measure $\sum_{i=1}^{n} a_i\, \delta_{x_i}$ is equal to the finite set $\{x_1, \ldots, x_n\}$. For any measurable function $f$ one has

$$\int f\, d\Big(\sum_{i=1}^{n} a_i\, \delta_{x_i}\Big) = \sum_{i=1}^{n} a_i\, f(x_i) = \sum_{i=1}^{n} a_i\, e_{x_i}(f),$$

where $e_{x_i}$ is the evaluation functional at the point $x_i$. This extends the previous remark on Dirac measures and exhibits the connection between RKHS and measures with finite support. Actually any Hilbert space $\mathcal{H}$ of functions on $E$ with reproducing kernel $K$ contains, as a dense subset, the set $\mathcal{H}_0$ of linear combinations

$$\sum_{i=1}^{n} a_i\, K(\cdot, x_i), \quad n \ge 1, \ (a_1, \ldots, a_n) \in \mathbb{C}^n, \ (x_1, \ldots, x_n) \in E^n,$$

with the property that, for any measurable $f$ in $\mathcal{H}$,

$$\Big\langle f, \sum_{i=1}^{n} a_i\, K(\cdot, x_i) \Big\rangle = \sum_{i=1}^{n} a_i\, f(x_i) = \int f\, d\mu,$$

where

$$\mu = \sum_{i=1}^{n} a_i\, \delta_{x_i}$$

is the discrete measure putting the mass $a_i$ at the point $x_i$. In Section 2 we will see that any element of $\mathcal{H}$ is measurable whenever the kernel $K$ is measurable. Thus

the dense subset $\mathcal{H}_0$ can be seen as the set of representers in $\mathcal{H}$ of measures on $E$ with finite support.

The mapping

$$\sum_{i=1}^{n} a_i\, \delta_{x_i} \longmapsto \sum_{i=1}^{n} a_i\, K(\cdot, x_i)$$

embeds in $\mathcal{H}$ the set of measures on $E$ with finite support and, with the same measurability condition on $K(x, \cdot)$ as above, the value

$$\sum_{i=1}^{n} a_i\, K(x, x_i)$$

of the function $\sum_{i=1}^{n} a_i\, K(\cdot, x_i)$ at the point $x$ can be written as the integral

$$\int K(x, t)\, d\Big(\sum_{i=1}^{n} a_i\, \delta_{x_i}\Big)(t) = \int K(x, t)\, d\mu(t). \qquad (4.2)$$

Now, a measure $\mu$ on $E$ being given, suppose that all integrals

$$I_\mu(x) = \int K(x, t)\, d\mu(t), \quad x \in E,$$

exist and that they define a function $I_\mu$ which belongs to $\mathcal{H}$. Generalizing (4.1) and (4.2) we can define the representer in $\mathcal{H}$ of the measure $\mu$ as being equal to the function $I_\mu$.

In this way, if all functions $K(x, \cdot)$, $x \in E$, are measurable, we can define a mapping

$$I : \mathcal{M} \longrightarrow \mathcal{H}, \qquad \mu \longmapsto I_\mu = \int K(\cdot, t)\, d\mu(t),$$

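where $\mathcal{M}$ is the set of signed measures $\mu$ for which the function $I_\mu$ exists and belongs to $\mathcal{H}$. The set $\mathcal{M}$ always contains the set $\mathcal{M}_0$ of measures with finite support.

Properties of integrals of $\mathcal{H}$-valued mappings will be described in Section 5. At this stage assume that inner product and integrals can be exchanged. Then we can write formally, for $\mu$ and $\nu$ in $\mathcal{M}$,

$$\begin{aligned}
\langle I_\mu, I_\nu \rangle_{\mathcal{H}} &= \Big\langle \int K(\cdot, t)\, d\mu(t), \int K(\cdot, s)\, d\nu(s) \Big\rangle_{\mathcal{H}} \\
&= \int \Big\langle K(\cdot, t), \int K(\cdot, s)\, d\nu(s) \Big\rangle_{\mathcal{H}}\, d\mu(t) \\
&= \int \Big( \int \langle K(\cdot, t), K(\cdot, s) \rangle_{\mathcal{H}}\, d\nu(s) \Big)\, d\mu(t) \\
&= \int \Big( \int K(s, t)\, d\nu(s) \Big)\, d\mu(t).
\end{aligned}$$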

Finally, assuming the validity of the Fubini formula for product measures, one gets

the inner product of the representers in $\mathcal{H}$ of two measures $\mu$ and $\nu$ is equal to the integral of the kernel of $\mathcal{H}$ with respect to the product measure $\mu \otimes \nu$,

$$\langle I_\mu, I_\nu \rangle_{\mathcal{H}} = \int K\, d(\mu \otimes \nu).$$
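For measures with finite support the integral of the kernel with respect to the product measure reduces to a double sum, which makes the statement easy to check numerically. The following is a minimal sketch under stated assumptions: the Gaussian kernel and the helper names `gaussian_kernel`, `representer` and `measure_inner_product` are illustrative choices, not taken from the text.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on the real line (an illustrative choice of K)."""
    return np.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def representer(points, masses, kernel=gaussian_kernel):
    """Return the function I_mu = sum_i a_i K(., x_i) for mu = sum_i a_i delta_{x_i}."""
    points = np.asarray(points, dtype=float)
    masses = np.asarray(masses, dtype=float)

    def i_mu(x):
        return float(np.sum(masses * kernel(x, points)))

    return i_mu


def measure_inner_product(points_mu, masses_mu, points_nu, masses_nu,
                          kernel=gaussian_kernel):
    """Integral of K w.r.t. the product of two finitely supported measures.

    For mu = sum_i a_i delta_{x_i} and nu = sum_j b_j delta_{y_j} this is
    the double sum sum_i sum_j a_i b_j K(x_i, y_j).
    """
    x = np.asarray(points_mu, dtype=float)[:, None]
    y = np.asarray(points_nu, dtype=float)[None, :]
    gram = kernel(x, y)  # matrix of values K(x_i, y_j)
    a = np.asarray(masses_mu, dtype=float)
    b = np.asarray(masses_nu, dtype=float)
    return float(a @ gram @ b)
```

Taking the second measure to be a Dirac measure at a point `y` recovers the value of the representer at `y`, which is the finite-support form of (4.2).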

Under suitable assumptions the mapping $I$ will therefore allow us to apply RKHS methods to measures.

Some of the first studies considering inner products on sets of measures and applications in Probability and Statistics were carried out in the years 1975-1980 at the University of Lille under the leadership of Bosq. Guilbart (1977a, 1977b, 1978a, 1978b, 1979) studied the relationships between reproducing kernels and inner products on the space $\mathcal{M}$ of signed measures on a measurable space. He exploited the embedding of $\mathcal{M}$ in a RKHS and characterized the inner products inducing the weak topology on sets of measures. Guilbart also proved the continuity of projections with respect to the reproducing kernel defining the inner product. He proved a Glivenko-Cantelli theorem that he applied to estimation and hypothesis testing. Berlinet (1980a, 1980b) studied the weak convergence in the set of probabilities on a RKHS, the measurability and the integrability of RKHS valued variables. The first applications to random measures were given by Bosq (1979) who considered the prediction of a RKHS valued variable and by Berlinet who considered the representers in RKHS of random measures with finite support and proved a Central Limit Theorem and strong approximation results. These last results extended those of Ibero (1979, 1980) who had considered spaces of Schwartz distributions and sets of differentiable functions on compact sets. Then, the application of RKHS methods to the study of general random measures was started by Suquet, supervised at the beginning by Jacob. Suquet (1990, 1993) used sequences of functions characterizing measures to study inner products on sets of signed measures and convergence of random measures. He studied particular cases in which signed measures are represented in the RKHS associated with Brownian motion (1993) and proved Berry-Esseen type theorems. Suquet and Oliveira (1994, 1995) applied RKHS methods to prove invariance principles under positive dependence and for non stationary mixing variables. Bensaid (1992a, 1992b) exploited the embedding theorems in the study of point processes and their nonparametric prediction.

We will review basic results and applications. For further developments the reader is referred to the above references.

1.2. GENERAL APPROACH

The definition of the mapping $I$ (embedding $\mathcal{M}$ into $\mathcal{H}$) in the above subsection is a consequence of the simple particular property that integrating with respect to a Dirac measure is equivalent to evaluating at some point. In the present subsection we will arrive at a similar definition through a general approach which can be used in any context where a RKHS framework has to be designed. This approach is implied in the Introduction to Chapter 1. It is based on the fact that every Hilbert space is isometric to some space $\ell^2(X)$ (space of square summable sequences indexed by the set $X$, see Chapter 1). So when a problem involves elements of some abstract set $S$ the first attempt to shift it into a hilbertian framework consists in associating an element of a space $\ell^2(X)$ to any element of the originally given set $S$. If any element $s$ of $S$ is characterized by a family $\{s_\alpha, \alpha \in X\}$ of complex numbers satisfying

$$\sum_{\alpha \in X} |s_\alpha|^2 < \infty,$$

the mapping

$$S \longrightarrow \ell^2(X), \qquad s \longmapsto \{s_\alpha, \alpha \in X\}$$

defines the natural embedding of $S$ into $\ell^2(X)$. Let us see now how to apply this general methodology to sets of measures.

A measure $\mu$ on $(E, \mathcal{T})$ is characterized by the set of its values

$$\{\mu(A) : A \in \mathcal{T}\} = \Big\{ \int 1_A\, d\mu : A \in \mathcal{T} \Big\}$$

or more generally by a set of integrals

$$\Big\{ \int f\, d\mu : f \in \mathcal{F} \Big\}$$

where $\mathcal{F}$ is some family of functions.

For instance a probability measure $P$ on $\mathbb{R}^d$ is uniquely determined by its characteristic function

$$\phi_P(t) = \int_{\mathbb{R}^d} e^{i \langle t, x \rangle}\, dP(x),$$

or equivalently by the set of integrals of the family

$$\mathcal{F} = \{ e^{i \langle t, \cdot \rangle} : t \in \mathbb{R}^d \}.$$
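As a small illustration of a measure being described through the integrals of a family of functions, the characteristic function of a finitely supported probability measure on the real line is a finite sum of complex exponentials. This is only a sketch for the case $d = 1$; the function name `characteristic_function` is a hypothetical helper introduced here.

```python
import numpy as np


def characteristic_function(points, probs, t):
    """phi_P(t) = sum_j p_j exp(i t x_j) for P = sum_j p_j delta_{x_j} on the real line."""
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Row k of the matrix holds exp(i t_k x_j) for every support point x_j.
    return np.exp(1j * np.outer(t, points)) @ probs


# Two probability measures with the same mean (zero) but different laws:
# the symmetric two-point measure on {-1, 1} and the Dirac measure at 0.
t_grid = np.linspace(-3.0, 3.0, 7)
phi_two_point = characteristic_function([-1.0, 1.0], [0.5, 0.5], t_grid)
phi_dirac = characteristic_function([0.0], [1.0], t_grid)
```

The two arrays agree at `t = 0`, where every characteristic function equals 1, and differ elsewhere, in line with the fact that distinct probability measures have distinct characteristic functions.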

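Sets of power functions, of continuous bounded functions and many other families $\mathcal{F}$ can be considered to study measures and their closeness or convergence.

Suppose that to deal with some problem related to a set $\mathcal{M}$ of signed measures on a measurable space $(E, \mathcal{T})$ we can consider a set $\mathcal{F}$ of complex functions on $E$ and the families of integrals

$$I_\mu = \Big\{ \int f\, d\mu : f \in \mathcal{F} \Big\}$$

where $\mu$ belongs to $\mathcal{M}$. If, for any $\mu$ in $\mathcal{M}$, we have

$$\sum_{f \in \mathcal{F}} \Big| \int f\, d\mu \Big|^2 < \infty,$$

we can work in the Hilbert space $\ell^2(\mathcal{F})$. The inner product of $I_\mu$ and $I_\nu$ in this space is given by

$$\langle I_\mu, I_\nu \rangle_{\ell^2(\mathcal{F})} = \sum_{f \in \mathcal{F}} \Big( \int f\, d\mu \Big) \Big( \int \overline{f}\, d\nu \Big).$$

Assuming again that we may apply the Fubini theorem and exchange sum and integral one gets

$$\begin{aligned}
\langle I_\mu, I_\nu \rangle_{\ell^2(\mathcal{F})} &= \sum_{f \in \mathcal{F}} \Big( \int f \otimes \overline{f}\, d(\mu \otimes \nu) \Big) \\
&= \int \Big( \sum_{f \in \mathcal{F}} f \otimes \overline{f} \Big)\, d(\mu \otimes \nu).
\end{aligned}$$

Here $I_\mu$ and $I_\nu$ are not functions on $E$. They are sequences of complex numbers indexed by the class $\mathcal{F}$ or equivalently they can be considered as functions on $\mathcal{F}$. Setting formally

$$K = \sum_{f \in \mathcal{F}} f \otimes \overline{f}$$

we get through the general approach the same expression as in the above subsection

$$\langle I_\mu, I_\nu \rangle_{\ell^2(\mathcal{F})} = \int K\, d(\mu \otimes \nu). \qquad (4.3)$$

We know by Theorem 14 that Formula (4.3) holds true whenever $\mathcal{F}$ can be chosen as a complete orthonormal system in some separable RKHS $\mathcal{H}$ with reproducing kernel $K$.

Let us now illustrate the beginning of this section by a simple example.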

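1.3. THE EXAMPLE OF MOMENTS

Let $E = [0, 0.5]$, $\mathcal{T}$ be its Borel $\sigma$-algebra and $\mathcal{M}$ be the set of signed measures on $(E, \mathcal{T})$. Any element $\mu$ of the set $\mathcal{M}$ is characterized by the sequence $I_\mu = \{\mu_i : i \in \mathbb{N}\}$ where

$$\mu_i = \int_E x^i\, d\mu(x)$$

is the moment of order $i$ of $\mu$. Here the class $\mathcal{F}$ is equal to the set of monomials $\{x^i : i \in \mathbb{N}\}$. As we have

$$\forall i \in \mathbb{N}, \quad 0 \le \mu_i \le 2^{-i} \mu(E),$$

the sequence $I_\mu$ is in $\ell^2(\mathbb{N})$. Identifying $\mu$ and $I_\mu$ we get, by using Fubini theorem and exchanging sum and integral (the integrated functions are nonnegative),

$$\langle \mu, \nu \rangle_{\mathcal{M}} = \sum_{i \in \mathbb{N}} \mu_i\, \nu_i = \int \Big( \sum_{i \in \mathbb{N}} x^i y^i \Big)\, d(\mu \otimes \nu)(x, y) = \int_{E \times E} \frac{1}{1 - xy}\, d(\mu \otimes \nu)(x, y).$$

For $a$ in $E$ and $\nu = \delta_a$ we have

$$\forall i \in \mathbb{N}, \quad \nu_i = a^i$$

and

$$\langle \mu, \delta_a \rangle_{\mathcal{M}} = \int_E \frac{1}{1 - ax}\, d\mu(x) = \sum_{i \in \mathbb{N}} \mu_i\, a^i.$$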

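Like the sequence of moments $I_\mu = \{\mu_i, i \in \mathbb{N}\}$, the analytic function

$$\varphi_\mu : E \longrightarrow \mathbb{R}, \qquad x \longmapsto \varphi_\mu(x) = \sum_{i \in \mathbb{N}} \mu_i\, x^i$$

characterizes the measure $\mu$. From above it follows that the set of functions $\Phi = \{\varphi_\mu, \mu \in \mathcal{M}\}$ endowed with the inner product

$$\langle \varphi_\mu, \varphi_\nu \rangle_\Phi = \sum_{i \in \mathbb{N}} \mu_i\, \nu_i$$

induced by the inner product of $\ell^2(\mathbb{N})$ is a prehilbertian space with reproducing kernel

$$K(x, y) = \frac{1}{1 - xy} = \langle \varphi_{\delta_x}, \varphi_{\delta_y} \rangle_\Phi = \langle I_{\delta_x}, I_{\delta_y} \rangle_{\ell^2(\mathbb{N})} = \langle \delta_x, \delta_y \rangle_{\mathcal{M}}.$$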
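The identity between the $\ell^2$ inner product of moment sequences and the integral of $1/(1 - xy)$ can be checked numerically for finitely supported measures on $[0, 0.5]$. The sketch below truncates the moment sequence at a finite order, which is an approximation: since $xy \le 1/4$ on $E \times E$, the neglected tail is geometrically small. The function names are hypothetical helpers.

```python
import numpy as np


def moments(points, masses, order):
    """Moments mu_i = sum_j a_j x_j**i for i = 0, ..., order - 1 of a discrete measure."""
    points = np.asarray(points, dtype=float)
    masses = np.asarray(masses, dtype=float)
    powers = points[None, :] ** np.arange(order)[:, None]  # entry (i, j) is x_j**i
    return powers @ masses


def moment_inner_product(points_mu, masses_mu, points_nu, masses_nu, order=60):
    """Truncated l2 inner product sum_{i < order} mu_i nu_i of the moment sequences."""
    mu = moments(points_mu, masses_mu, order)
    nu = moments(points_nu, masses_nu, order)
    return float(mu @ nu)


def kernel_inner_product(points_mu, masses_mu, points_nu, masses_nu):
    """Integral of K(x, y) = 1 / (1 - x y) w.r.t. the product of two discrete measures."""
    x = np.asarray(points_mu, dtype=float)[:, None]
    y = np.asarray(points_nu, dtype=float)[None, :]
    gram = 1.0 / (1.0 - x * y)
    a = np.asarray(masses_mu, dtype=float)
    b = np.asarray(masses_nu, dtype=float)
    return float(a @ gram @ b)
```

For two Dirac measures at $0.5$, both computations give $1/(1 - 0.25) = 4/3$ up to the truncation error.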

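In the present context the distance of two signed measures on $E$ is equal to the $\ell^2$-distance of their sequences of moments.

Let us now summarize the first section of the present chapter. We have seen how a set $\mathcal{M}$ of signed measures on a measurable space $(E, \mathcal{T})$ can be embedded in a RKHS $\mathcal{H}$ of functions on $E$ with reproducing kernel $K$. Under suitable assumptions we have the following formula

$$\langle \mu, \nu \rangle_{\mathcal{M}} = \langle I_\mu, I_\nu \rangle_{\mathcal{H}} = \int K\, d(\mu \otimes \nu), \qquad (4.4)$$

a particular case of which is

$$\langle \delta_x, \delta_y \rangle_{\mathcal{M}} = K(x, y), \quad (x, y) \in E \times E. \qquad (4.5)$$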

Formula (4.4) was derived in a formal way to give the reader a first idea of the application of RKHS methods to measure theory. We now have a set of problems to analyze more precisely.

1) Under what conditions are our formal derivations valid? (Otherwise stated, under what conditions can an inner product $\langle \cdot, \cdot \rangle_{\mathcal{M}}$ be defined on a set $\mathcal{M}$ of signed measures?)

2) How does the inner product depend on the reproducing kernel?

3) Is any inner product on a set of measures of the kind defined above?

4) What can be the limit of a sequence of measures converging in the sense of the inner product?

5) What are the relationships between the topology induced on $\mathcal{M}$ by the inner product and other topologies on $\mathcal{M}$ such as the weak topology?

6) What kind of results can be obtained through RKHS methods?

We will deal with these problems in Sections 6 and 7. Applications will be given in Section 8.

Before that let us give some basic results on measurability and integrability of RKHS-valued variables.

2. MEASURABILITY OF RKHS-VALUED VARIABLES

As seen in Section 1, to embed a set $\mathcal{M}$ of measures on $E$ into a RKHS $\mathcal{H}$ of functions the measurability of all functions $K(\cdot, x)$, $x \in E$, is required. Therefore to study random measures as random elements of $\mathcal{H}$ one needs measurability criteria for $\mathcal{H}$-valued variables. The present section deals with questions of measurability.

As above $\mathcal{T}$ is a $\sigma$-algebra on $E$, $\mathcal{H}_0$ is the subspace of $\mathcal{H}$ spanned by $(K(\cdot, t))_{t \in E}$ and $\mathcal{B}_{\mathcal{H}}$ is the Borel $\sigma$-algebra of $\mathcal{H}$ ($\mathcal{B}_{\mathcal{H}}$ is generated by the open sets). We will suppose in general that $\mathcal{H}$ is a separable space. For any $g$ in $\mathcal{H}$, $\langle \cdot, g \rangle$ will denote the continuous linear mapping

$$\mathcal{H} \longrightarrow \mathbb{C}, \qquad \varphi \longmapsto \langle \varphi, g \rangle.$$

First of all we characterize the random variables with values in $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$.

THEOREM 88 The Borel $\sigma$-algebra of a separable RKHS is generated by the evaluation functionals.

Proof. Let $\mathcal{B}$ be the $\sigma$-algebra generated by $(e_t)_{t \in E}$. The continuity of the evaluation functionals implies $\mathcal{B} \subset \mathcal{B}_{\mathcal{H}}$. Now, for any $f \in \mathcal{H}$ we have

$$\|f\| = \sup_{\|g\| \le 1} |\langle f, g \rangle| = \sup_{\|g\| \le 1;\ g \in \mathcal{D}_0} |\langle f, g \rangle|,$$

where $\mathcal{D}_0$ is a countable subset of $\mathcal{H}_0$ dense in $\mathcal{H}$ (see Lemma 11). If $g \in \mathcal{D}_0$ the function $\langle \cdot, g \rangle$ is $\mathcal{B} - \mathcal{B}_{\mathbb{R}}$ measurable since it can be written as a linear combination of evaluation functionals. Let $r \in \mathbb{R}_+$ and $f_0 \in \mathcal{H}$. Then

$$\{f : \|f - f_0\| \le r\} = \bigcap_{\|g\| \le 1;\ g \in \mathcal{D}_0} \{f : |\langle f - f_0, g \rangle| \le r\}.$$

The above right-hand side is a countable intersection of elements of $\mathcal{B}$. Therefore it belongs to $\mathcal{B}$. It follows that any closed ball is in $\mathcal{B}$. $\mathcal{H}$ is separable thus any open set is a countable union of closed balls. Consequently $\mathcal{B}_{\mathcal{H}} \subset \mathcal{B}$. ∎

It is worth noting that the proof of Theorem 88 is valid for any total sequence.

THEOREM 89 Let $(g_i)_{i \in I}$ be a total sequence in a separable Hilbert space $\mathcal{H}$. Then the Borel $\sigma$-algebra of $\mathcal{H}$ is generated by the linear forms

$$(\langle \cdot, g_i \rangle)_{i \in I}.$$

Proof. One has to prove that the space $\mathcal{H}'_0$ spanned by the sequence $(g_i)_{i \in I}$ contains a countable subset $\mathcal{D}'_0$ which is dense in $\mathcal{H}$. The rest is done by mimicking the proof of Theorem 88. ∎

From Theorem 88 it follows that $\mathcal{B}_{\mathcal{H}}$ is the set of intersections $\mathcal{H} \cap B$ where $B$ runs through the product $\sigma$-algebra of $\mathbb{C}^E$ (generated by the evaluation functionals on $\mathbb{C}^E$) and that we have the following corollaries.

COROLLARY 12 Let $(\Omega, \mathcal{A})$ be some measurable space. A mapping

$$X : (\Omega, \mathcal{A}) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad \omega \longmapsto X(\omega, \cdot)$$

is measurable if and only if for any $t \in E$ the function

$$X(\cdot, t) = \langle X, K(\cdot, t) \rangle$$

is a complex random variable.

COROLLARY 13 A random function on a measurable space $(\Omega, \mathcal{A})$ taking its values in $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ is equivalent to a stochastic process $(X_t)_{t \in E}$ on $(\Omega, \mathcal{A})$ with trajectories in $\mathcal{H}$.

Let $\Psi_K$ denote the mapping

$$E \longrightarrow \mathcal{H}, \qquad t \longmapsto \Psi_K(t) = K(\cdot, t).$$

The following theorem states the equivalence between the measurability of $K$, as a function of two variables, the measurability of $\Psi_K$ and the measurability of all elements of $\mathcal{H}$.

THEOREM 90 The following four conditions are equivalent.
a) $K$ is $\mathcal{T} \otimes \mathcal{T} - \mathcal{B}_{\mathbb{C}}$ measurable.
b) $\Psi_K$ is measurable.
c) $\forall t \in E$, $K(\cdot, t)$ is measurable.
d) Every element of $\mathcal{H}$ is measurable.

Proof. We will prove the sequence of implications a) $\Rightarrow$ b) $\Rightarrow$ c) $\Rightarrow$ d) $\Rightarrow$ a).

a) $\Rightarrow$ b) As $K$ is $\mathcal{T} \otimes \mathcal{T} - \mathcal{B}_{\mathbb{C}}$ measurable, for any fixed $s$, the mapping

$$t \longmapsto K(s, t) = \langle K(\cdot, t), K(\cdot, s) \rangle \qquad (4.6)$$

is also measurable (Rudin, 1975).

b) $\Rightarrow$ c) Same argument as above.

c) $\Rightarrow$ d) From assumption c) any element of $\mathcal{H}_0$ is a measurable function. Being a pointwise limit of a sequence of measurable functions any element of $\mathcal{H}$ is also measurable.

d) $\Rightarrow$ a) Let $(e_i)_{i \in \mathbb{N}}$ be an orthonormal basis of $\mathcal{H}$. From Theorem 14 we have

$$K(s, t) = \sum_{i \in \mathbb{N}} e_i(s)\, \overline{e_i(t)}.$$

As every function $e_i$ is measurable, $K$ is $\mathcal{T} \otimes \mathcal{T} - \mathcal{B}_{\mathbb{C}}$ measurable. ∎

3. GAUSSIAN MEASURE ON RKHS

3.1. GAUSSIAN MEASURE AND GAUSSIAN PROCESS

Corollary 13 puts forward the equivalence between random variables with values in RKHS and stochastic processes with trajectories in such a space. Another consequence of the continuity of evaluation functionals on RKHS is the equivalence in those spaces between the notion of gaussian process and the notion of gaussian measure. Rajput and Cambanis (1972) have shown this equivalence in some functional spaces (spaces of continuous functions, of absolutely continuous functions, $L^2$-spaces). As they pointed out, their results extend to any space of functions on which the evaluation functionals are continuous. We give hereafter the proof for RKHS. For the correspondence between cylinder set measures and random functions in a more general setting see Mourier (1965).

DEFINITION 28 A stochastic process $(X_t)_{t \in E}$ is said to be gaussian if any finite linear combination of the real variables $X_t$, $t \in E$, is a real gaussian random variable.

DEFINITION 29 A probability measure $\mu$ on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ is said to be gaussian if for any $g \in \mathcal{H}$ the linear form $\langle \cdot, g \rangle$ is a real gaussian random variable on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}}, \mu)$.

The equivalence between gaussian process and gaussian measure is formulated in the following theorem.

THEOREM 91
a) If $(X_t)_{t \in E}$ is a gaussian process defined on $(\Omega, \mathcal{A}, P)$ with trajectories

$$E \longrightarrow \mathbb{R}, \qquad t \longmapsto X_t(\omega) = X(\omega, t), \quad \omega \in \Omega,$$

belonging to $\mathcal{H}$ then the measure $P X^{-1}$ induced on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ by the random variable

$$X : (\Omega, \mathcal{A}, P) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad \omega \longmapsto X_\cdot(\omega) = X(\omega, \cdot)$$

is a gaussian measure.
b) Conversely, for any gaussian measure $\mu$ on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ there exists a probability space $(\Omega, \mathcal{A}, P)$ on which can be defined a gaussian process $(X_t)_{t \in E}$ with trajectories in $\mathcal{H}$ such that $P X^{-1} = \mu$.

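Proof.
a) We have to prove that for any $g \in \mathcal{H}$ the real random variable

$$\langle \cdot, g \rangle : (\mathcal{H}, \mathcal{B}_{\mathcal{H}}, P X^{-1}) \longrightarrow (\mathbb{R}, \mathcal{B}_{\mathbb{R}}), \qquad f \longmapsto \langle f, g \rangle$$

is gaussian. If $B \in \mathcal{B}_{\mathbb{R}}$,

$$P X^{-1}(\langle \cdot, g \rangle \in B) = P(\langle X, g \rangle \in B),$$

thus we have to prove that $\langle X, g \rangle$ is gaussian. Since $\mathcal{H} = \overline{\mathcal{H}_0}$ the function $g$ is the limit in $\mathcal{H}$ of a sequence of functions

$$g_n = \sum_{i=0}^{i_n} a_i^n\, K(\cdot, t_i^n), \quad n \in \mathbb{N}.$$

Thus

$$\langle X, g \rangle = \lim_{n \to \infty} \sum_{i=0}^{i_n} a_i^n\, X_{t_i^n}$$

and $\langle X, g \rangle$ is gaussian, as the limit (everywhere) of a sequence of gaussian random variables.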

b) For $t \in E$ and $f \in \mathcal{H}$ let

$$X_t(f) = f(t).$$

Let $k \in \mathbb{N}^*$, $(t_1, \ldots, t_k) \in E^k$, $(a_1, \ldots, a_k) \in \mathbb{R}^k$. The mapping

$$\sum_{i=1}^{k} a_i\, X_{t_i} : (\mathcal{H}, \mathcal{B}_{\mathcal{H}}, \mu) \longrightarrow (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$$

is the continuous linear form on $\mathcal{H}$ represented by

$$\sum_{i=1}^{k} a_i\, K(\cdot, t_i).$$

Hence it is a gaussian real random variable. $(X_t)_{t \in E}$ is a gaussian process with trajectories in $\mathcal{H}$ such that $X_\cdot(f) = f$. If $X$ is the associated random function (Corollary 13) we have

$$\mu X^{-1}(B) = \mu\{g : X_\cdot(g) \in B\} = \mu(B)$$

and

$$\mu X^{-1} = \mu,$$

so that the process $(X_t)_{t \in E}$ on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}}, \mu)$ is appropriate to get the conclusion. ∎

3.2. CONSTRUCTION OF GAUSSIAN MEASURES

To construct a gaussian measure on a separable Hilbert space $\mathcal{H}$ one can use a general method which consists in choosing an orthonormal basis $(f_i)$ in $\mathcal{H}$, a sequence of real numbers $(\sigma_i)$ in $\ell^2(\mathbb{N})$, a sequence $(\xi_i)$ of independent real random variables on some probability space $(\Omega, \mathcal{A}, P)$ with the same $N(0, 1)$ distribution and in setting

$$X : (\Omega, \mathcal{A}, P) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad \omega \longmapsto X(\omega) = \sum_{i=1}^{\infty} \sigma_i\, \xi_i(\omega)\, f_i.$$

$X$ is well defined since the series converges almost surely. The measure $P X^{-1}$ is a centered gaussian measure on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$. Conversely, any centered gaussian measure on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ is of this kind (Lifshits (1995), Example 5 p. 81). If $\mathcal{H}$ is a RKHS, $X$ defines a gaussian process.

However, as mentioned in Chapter 3, Subsection 4.4, putting conditions on normality of continuous linear forms can lead to building finitely additive measures on the ring of cylinder sets which cannot be extended to countably additive measures on the whole space $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$. A classical example is the Gauss measure, i.e. the cylindrical measure assigning the distribution $N(0, \|x\|^2)$ to the linear form represented by $x$. This kind of difficulty gave rise to the theory of abstract Wiener spaces for which the reader is referred to Gross (1965, 1970) for a basic exposition.

Another important feature of gaussian measures is defined through the notion of "kernel" of such a measure. For the definition of the kernel of a measure and developments in the infinite dimensional setting the reader is referred to Lifshits (1995).
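The series construction at the start of this subsection can be simulated by truncating the sum and working with coordinates in the orthonormal basis. The sketch below is only an illustration under stated assumptions: the truncation level (200 terms), the choice $\sigma_i = 1/i$ and the test vector `g` are arbitrary, and the helper name `sample_coefficients` is hypothetical. It compares the empirical mean of the squared norm with $\sum_i \sigma_i^2$ and the empirical variance of a linear form with $\sum_i \sigma_i^2 g_i^2$.

```python
import numpy as np


def sample_coefficients(sigma, n_samples, rng):
    """Coordinates (sigma_i * xi_i) of X = sum_i sigma_i xi_i f_i in the basis (f_i)."""
    xi = rng.standard_normal((n_samples, sigma.size))  # independent N(0, 1) variables
    return xi * sigma


rng = np.random.default_rng(0)
sigma = 1.0 / np.arange(1, 201)  # square summable sequence, truncated at 200 terms
coeffs = sample_coefficients(sigma, 20000, rng)

# Empirical mean of ||X||^2, to be compared with sum_i sigma_i^2.
mean_sq_norm = float(np.mean(np.sum(coeffs ** 2, axis=1)))

# A linear form <X, g> for g with coordinates (1, -2, 0.5, 0, 0, ...):
# it is centered with variance sum_i sigma_i^2 g_i^2.
g = np.zeros(sigma.size)
g[:3] = [1.0, -2.0, 0.5]
pairing = coeffs @ g
empirical_var = float(np.var(pairing))
expected_var = float(np.sum((sigma * g) ** 2))
```

Square summability of $(\sigma_i)$ is what keeps the expected squared norm finite; replacing `sigma` by a constant sequence makes `mean_sq_norm` grow with the truncation level, which matches the obstruction described for the Gauss measure.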

To study random variables with values in a RKHS and their convergence one needs some background on weak convergence in the set of probability measures on a RKHS. This is the aim of the next section. We denote by $\Pr(\mathcal{H})$ the set of probability measures on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$.

4. WEAK CONVERGENCE IN $\Pr(\mathcal{H})$

Recall the definition of weak convergence of measures on a topological space.

DEFINITION 30 Let $\mathcal{E}$ be a topological space and $\mathcal{B}$ its Borel $\sigma$-algebra. Let $\mathcal{M}$ be the set of signed measures on $(\mathcal{E}, \mathcal{B})$ and let $\mu \in \mathcal{M}$. A sequence $(\mu_n)_{n \in \mathbb{N}}$ of elements of $\mathcal{M}$ is said to be weakly convergent to $\mu$ as $n \to \infty$ if and only if

$$\int f\, d\mu_n \longrightarrow \int f\, d\mu$$

for any bounded continuous real function $f$ defined on $\mathcal{E}$.

The sets of finite dimension play a key role in the study of measures on functional spaces. They are defined as follows.

DEFINITION 31 For $(t_1, \ldots, t_k) \in E^k$, $\pi_{t_1, \ldots, t_k}$ denotes the mapping

$$\mathcal{H} \longrightarrow \mathbb{R}^k, \qquad f \longmapsto (f(t_1), \ldots, f(t_k)).$$

A subset of $\mathcal{H}$ is a set of finite dimension (or a cylinder) if and only if it can be written as $\pi^{-1}_{t_1, \ldots, t_k}(B)$, where $k \in \mathbb{N}^*$ and $B \in \mathcal{B}_{\mathbb{R}^k}$.

THEOREM 92 The class $\mathcal{F}$ of finite dimensional sets is a determining class, i.e. if two probabilities take the same values on $\mathcal{F}$, they are equal.

Proof. It is enough to prove that
a) $\mathcal{F}$ is a boolean algebra,
b) the $\sigma$-algebra $\sigma(\mathcal{F})$ generated by $\mathcal{F}$ is equal to $\mathcal{B}_{\mathcal{H}}$.

a) is a straightforward consequence of the definition of $\mathcal{F}$.

To prove b) first remark that the mappings $\pi_{t_1, \ldots, t_k}$ are continuous on $\mathcal{H}$ because the evaluation functionals have this property. Hence $\mathcal{F} \subset \mathcal{B}_{\mathcal{H}}$ and $\sigma(\mathcal{F}) \subset \mathcal{B}_{\mathcal{H}}$.

As in the proof of Theorem 88 one can write any closed ball as a countable intersection of sets of the form

$$\{f : |\langle f - f_0, g \rangle| \le r\}$$

where $g \in \mathcal{D}_0$. Such a set is of finite dimension: writing

$$g = \sum_{i=1}^{k} a_i\, K(\cdot, t_i)$$

we have

$$\{f : |\langle f - f_0, g \rangle| \le r\} = \Big\{f : \Big|\sum_{i=1}^{k} a_i\, \langle f - f_0, K(\cdot, t_i) \rangle\Big| \le r\Big\} = \pi^{-1}_{t_1, \ldots, t_k}(B)$$

where $B$ is equal to $\phi^{-1}([0, r])$ and $\phi$ is the measurable mapping

$$\mathbb{R}^k \longrightarrow \mathbb{R}_+, \qquad (\alpha_1, \ldots, \alpha_k) \longmapsto \Big|\sum_{i=1}^{k} a_i\, (\alpha_i - f_0(t_i))\Big|.$$

Thus the closed balls are in $\sigma(\mathcal{F})$ and $\mathcal{B}_{\mathcal{H}} \subset \sigma(\mathcal{F})$. ∎

Remark In general the class $\mathcal{F}$ does not determine the convergence: one may have

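$$P_n(A) \longrightarrow P(A), \quad \forall A \in \mathcal{F},$$

for a sequence $(P_n)_{n \in \mathbb{N}}$ of $\Pr(\mathcal{H})$ which does not converge weakly to $P$. Let us illustrate this remark by the following example.

Example
Let $\mathcal{H} = H^2([0, 1]) = \{f : f$ and $f'$ are absolutely continuous on $[0, 1]$ and $f'' \in L^2([0, 1])\}$ endowed with the inner product

$$\langle f, g \rangle = f(0) g(0) + f'(0) g'(0) + \langle f'', g'' \rangle_{L^2([0,1])}.$$

$\mathcal{H}$ is a RKHS with kernel $K$ defined on $[0, 1]^2$ by

$$K(s, t) = \begin{cases} 1 + st + \dfrac{s^2 t}{2} - \dfrac{s^3}{6} & \text{if } s < t \\[2mm] 1 + st + \dfrac{s t^2}{2} - \dfrac{t^3}{6} & \text{if } t \le s \end{cases}$$

(see Chapters 6 and 7).

For $n \in \mathbb{N}^*$ let $f_n$ be the function of $\mathcal{H}$ defined by

$$f_n(x) = \begin{cases} nx\, (2 - nx)^3 & \text{if } x \in [0, 2/n] \\ 0 & \text{if } x \in [2/n, 1]. \end{cases}$$

For any $n \in \mathbb{N}^*$, $f_n$ satisfies

$$f_n(0) = f_n\Big(\frac{2}{n}\Big) = f_n'\Big(\frac{2}{n}\Big) = f_n''\Big(\frac{2}{n}\Big) = f_n'\Big(\frac{1}{2n}\Big) = 0, \qquad f_n\Big(\frac{1}{n}\Big) = 1$$

and

$$\max_{x \in [0,1]} f_n(x) = f_n\Big(\frac{1}{2n}\Big) = \frac{27}{16}.$$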

Figure 4.1: The functions $f_1$, $f_2$, $f_3$, $f_{10}$ over $[0, 1]$.

Figure 4.2: The functions $K(0.8, t)$ and $K(t, t)$ over $[0, 1]$.

$K$ is a bounded kernel and

$$\max_{t \in [0,1]} K(t, t) = \frac{7}{3}.$$

Hence, by the Cauchy-Schwarz inequality, convergence in $\mathcal{H}$ implies uniform convergence on $[0, 1]$:

$$\forall g \in \mathcal{H},\ \forall t \in [0, 1], \quad |g(t)| = |\langle g, K(\cdot, t) \rangle| \le \|g\| \sup_{t \in [0,1]} (K(t, t))^{1/2} \le \sqrt{\frac{7}{3}}\, \|g\|.$$

Clearly the sequence $(f_n)_{n \in \mathbb{N}^*}$ does not converge uniformly to the null function on $[0, 1]$. Hence it does not converge to the null function in $\mathcal{H}$. Therefore $(\delta_{f_n})$ does not converge weakly to $\delta_0$. However, if $(t_1, \ldots, t_k) \ne (0, \ldots, 0)$ and $(2/n) \le \min\{t_1, \ldots, t_k\}$, we have

$$\delta_{f_n}\big(\pi^{-1}_{t_1, \ldots, t_k}(B)\big) = \begin{cases} 1 & \text{if } (0, \ldots, 0) \in B \\ 0 & \text{if } (0, \ldots, 0) \notin B, \end{cases}$$

thus for $n$ large enough we have

$$\delta_{f_n}\big(\pi^{-1}_{t_1, \ldots, t_k}(B)\big) = \delta_0\big(\pi^{-1}_{t_1, \ldots, t_k}(B)\big).$$

The present example also shows that a sequence of functions in a RKHS $\mathcal{H}$ can converge pointwise to a function of $\mathcal{H}$ without converging in the norm sense.
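The behaviour described in this example can be checked numerically. The sketch below rests on two assumptions that should be read as reconstructions rather than as the authors' formulas: it takes $f_n(x) = nx(2 - nx)^3$ on $[0, 2/n]$ and $0$ elsewhere, which is one function satisfying the listed conditions, and it uses $K(t, t) = 1 + t^2 + t^3/3$, the diagonal of the kernel obtained from the Taylor formula with integral remainder for the stated inner product.

```python
import numpy as np


def f_n(x, n):
    """Assumed form of f_n: n x (2 - n x)^3 on [0, 2/n], zero on [2/n, 1]."""
    u = n * np.asarray(x, dtype=float)
    return np.where(u <= 2.0, u * (2.0 - u) ** 3, 0.0)


def kernel_diag(t):
    """Assumed diagonal K(t, t) = 1 + t^2 + t^3 / 3 of the kernel of H^2([0, 1])."""
    t = np.asarray(t, dtype=float)
    return 1.0 + t ** 2 + t ** 3 / 3.0


grid = np.linspace(0.0, 1.0, 200001)

# Sup norms stay at 27/16 for every n: no uniform convergence to zero.
sup_norms = [float(f_n(grid, n).max()) for n in (1, 2, 3, 10)]

# At a fixed point t > 0 the values vanish as soon as 2/n <= t: pointwise convergence.
values_at_fixed_point = [float(f_n(0.3, n)) for n in (1, 2, 5, 10, 100)]

# Constant in the Cauchy-Schwarz bound |g(t)| <= ||g|| sqrt(K(t, t)).
max_kernel_diag = float(kernel_diag(grid).max())
```

Under these assumptions `sup_norms` stays at $27/16$ while `values_at_fixed_point` is eventually zero, which is the pointwise but not uniform convergence used in the example.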

4.1. WEAK CONVERGENCE CRITERION

As the evaluation functionals on RKHS are continuous one can get a criterion of weak convergence similar to the one in $C([0, 1])$.

THEOREM 93 Let $P$ and $\{P_n : n \in \mathbb{N}\}$ be elements of $\Pr(\mathcal{H})$. The sequence $(P_n)_{n \in \mathbb{N}}$ converges weakly to $P$ if and only if

a) $(P_n)_{n \in \mathbb{N}}$ is tight:

$$\forall \varepsilon > 0,\ \exists K \text{ compact such that } \forall n \in \mathbb{N},\ P_n(K) > 1 - \varepsilon,$$

b) $\forall k \in \mathbb{N}^*$, $\forall (t_1, \ldots, t_k) \in E^k$, $P_n \pi^{-1}_{t_1, \ldots, t_k} \Longrightarrow P \pi^{-1}_{t_1, \ldots, t_k}$.

Proof. This result is well known for the space $C([0, 1])$ (Billingsley, 1968). It can be proved for $\mathcal{H}$ in the same way: a) is equivalent, by Prohorov's theorem, to the relative compactness of the sequence $(P_n)_{n \in \mathbb{N}}$. Therefore Theorem 93 is a consequence of the continuity of the mappings $\pi_{t_1, \ldots, t_k}$ and of Theorem 92.

5. INTEGRATION OF $\mathcal{H}$-VALUED RANDOM VARIABLES

In this section we review basic definitions and properties about integration of RKHS valued random variables. One of the most useful results is the possibility of interchanging integrals and continuous linear forms. For a detailed exposition see Hille and Phillips (1957).

5.1. NOTATION. DEFINITIONS

Let $X : (\Omega, \mathcal{A}, P) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ be a random variable. If $Z$ is a real function defined on $\Omega$, if $A \in \mathcal{A}$ and if $Z 1_A$ is $P$-integrable, we denote

$$E_A(Z) = \int_A Z\, dP$$

the integral of $Z$ on $A$ and, if $Z$ is integrable,

$$E(Z) = E_\Omega(Z).$$

DEFINITION 32 (WEAK INTEGRAL) Let $A \in \mathcal{A}$. $X$ is weakly integrable on $A$ if

$$\forall f \in \mathcal{H}, \quad \langle X, f \rangle \text{ is integrable on } A$$

and if there exists $x^0_A \in \mathcal{H}$ such that

$$\forall f \in \mathcal{H}, \quad E_A(\langle X, f \rangle) = \langle x^0_A, f \rangle.$$

$x^0_A$ is called the weak integral of $X$ on $A$ and denoted

$$\int_A X\, dP.$$

$X$ is said to be Pettis-integrable if it is weakly integrable on $\Omega$.

From the definition it is clear that

$$\forall f \in \mathcal{H}, \quad \int_A \langle X, f \rangle\, dP = \langle x^0_A, f \rangle = \Big\langle \int_A X\, dP, f \Big\rangle.$$

DEFINITION 33 (STRONG INTEGRAL) Let $A \in \mathcal{A}$. $X$ is strongly integrable on $A$ if $\|X\|$ is integrable on $A$. $X$ is said to be Bochner-integrable if it is strongly integrable on $\Omega$ (then $X$ is strongly integrable on any element of $\mathcal{A}$).

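As $|\langle X, f \rangle| \le \|X\| \|f\|$, if $X$ is strongly integrable on $A$, there exists $x_A$ in $\mathcal{H}$,

$$x_A = \int_A X\, dP,$$

such that

$$\forall f \in \mathcal{H}, \quad E_A(\langle X, f \rangle) = \langle x_A, f \rangle.$$

Thus $X$ is weakly integrable on $A$ and $x^0_A = x_A$. For a $\mathcal{H}$-valued random variable, Bochner integrability implies Pettis integrability (with equality of the integrals) but the converse is not true, as shown in the following example.

Example Let $(\Omega, \mathcal{A}, P) = (\mathbb{N}^*, \mathcal{P}(\mathbb{N}^*), P)$ with $P(n) = 2^{-n}$, $n \in \mathbb{N}^*$, and let $\mathcal{H} = \ell^2(\mathbb{N})$. As seen in Chapter 1 the reproducing kernel of $\ell^2(\mathbb{N})$ is given by

$$K(i, j) = \delta_{ij}.$$

Let

$$X : (\mathbb{N}^*, \mathcal{P}(\mathbb{N}^*), P) \longrightarrow (\ell^2(\mathbb{N}), \mathcal{B}_{\ell^2(\mathbb{N})}), \qquad i \longmapsto X(i) = \frac{S_i}{P(i)}\, K(\cdot, i)$$

where $S$ is the element of $\ell^2(\mathbb{N})$ such that

$$\forall i \in \mathbb{N}^*, \quad S_i = \frac{1}{i}.$$

• $X$ is clearly measurable and we have

$$\|X(i)\| = \frac{1}{i P(i)}$$

and

$$\sum_{i=1}^{\infty} \|X(i)\|\, P(i) = \sum_{i=1}^{\infty} \frac{1}{i} = \infty.$$

Thus $X$ is not strongly integrable.

• Now let $h \in \ell^2(\mathbb{N})$. We have

$$\langle h, X(i) \rangle = \Big\langle h, \frac{1}{i P(i)}\, K(\cdot, i) \Big\rangle = \frac{h(i)}{i P(i)}$$

and

$$\int_A \langle h, X(i) \rangle\, dP(i) = \sum_{i \in A} \frac{h(i)}{i} = \langle h, S_A \rangle$$

where $S_A$ is the orthogonal projection of $S$ on the subspace of $\ell^2(\mathbb{N})$ spanned by the evaluations $\{e_i : i \in A\}$. It is given by

$$S_A = \sum_{i \in A} \langle S, K(\cdot, i) \rangle\, K(\cdot, i) = \sum_{i \in A} \frac{1}{i}\, K(\cdot, i).$$

It follows that $X$ is weakly integrable.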
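The contrast between the two notions of integrability in this example shows up in partial sums: the sums of $\|X(i)\| P(i)$ are harmonic and diverge, while the sums of $\langle h, X(i) \rangle P(i)$ converge for a square summable $h$. The sketch below is a numerical illustration only; the truncation levels and the helper names are arbitrary choices, and the test sequence $h = S$ is just one convenient element of $\ell^2(\mathbb{N})$.

```python
import numpy as np


def probability(i):
    """P(i) = 2**(-i) on the positive integers."""
    return 0.5 ** i


def norm_partial_sum(n_terms):
    """sum_{i <= n} ||X(i)|| P(i) with ||X(i)|| = 1 / (i P(i)): the harmonic sum."""
    i = np.arange(1, n_terms + 1, dtype=float)
    norms = 1.0 / (i * probability(i))
    return float(np.sum(norms * probability(i)))


def pairing_partial_sum(h, n_terms):
    """sum_{i <= n} <h, X(i)> P(i) = sum_{i <= n} h(i) / i for A = {1, ..., n}."""
    i = np.arange(1, n_terms + 1, dtype=float)
    pairings = h(i) / (i * probability(i))
    return float(np.sum(pairings * probability(i)))


def s_sequence(i):
    """The element S of the example, S_i = 1 / i, used here as the test sequence h."""
    return 1.0 / i


harmonic_50 = norm_partial_sum(50)
harmonic_500 = norm_partial_sum(500)
pairing_500 = pairing_partial_sum(s_sequence, 500)
```

With $h = S$ the pairing sums approach $\sum_i 1/i^2 = \pi^2/6$, whereas the norm sums keep growing like $\log n$.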

5.2. INTEGRABILITY OF $X$ AND OF $\{X_t : t \in E\}$

If the random variable

$$X : (\Omega, \mathcal{A}, P) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}})$$

is (weakly or strongly) integrable, any real random variable

$$X_t 1_A = \langle X, K(\cdot, t) \rangle 1_A, \quad A \in \mathcal{A},$$

is also integrable and the function

$$E \longrightarrow \mathbb{R}, \qquad t \longmapsto E_A(X_t) = \int_A X_t\, dP$$

is the integral of $X$ over $A$. It follows that the function $E_A(X_\cdot)$ belongs to $\mathcal{H}$.

Conversely, if every $X_t$ is integrable, $X$ (which is therefore measurable) is not necessarily weakly integrable. As shown in the following example, the function $E(X_\cdot)$ may not belong to $\mathcal{H}$.

As in the example of the above subsection take $\mathcal{H} = \ell^2(\mathbb{N})$. Let

$$X : (\mathbb{N}^*, \mathcal{P}(\mathbb{N}^*), P) \longrightarrow (\ell^2(\mathbb{N}), \mathcal{B}_{\ell^2(\mathbb{N})}), \qquad i \longmapsto X(i) = 2^i K(\cdot, i)$$

with $P(n) = 2^{-n}$, $n \in \mathbb{N}^*$.

Let $j \in \mathbb{N}^*$. The random function $\langle X, K(\cdot, j) \rangle$ transforms the integer $i$ either into 0 if $i \ne j$ or into $2^j$ if $i = j$. Therefore

$$E(\langle X, K(\cdot, j) \rangle) = 1.$$

Clearly the constant sequence $\{E(\langle X, K(\cdot, j) \rangle) : j \in \mathbb{N}^*\}$ does not belong to $\mathcal{H}$ and therefore $X$ is not weakly integrable on $\mathbb{N}^*$.

We will give in Theorem 96 a necessary and sufficient condition for the weak integrability of $X$. Let us begin with two theorems on linear forms on $\mathcal{H}$ defined by integrals.

THEOREM 94 For any $A \in \mathcal A$ the following conditions are equivalent.
a) The mapping

$$\mathcal E_A : E \to \mathbb R, \qquad t \mapsto E_A X_t$$

belongs to $\mathcal H$.
b) The mapping

$$\varphi_A : \mathcal H_0 \to \mathbb R, \qquad f \mapsto E_A \langle X, f \rangle$$

is a continuous linear form on $\mathcal H_0$.
If these conditions are satisfied then the representer of $\varphi_A$ in $\mathcal H$ is equal to $\mathcal E_A$.

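Proof.
a) $\Rightarrow$ b)
Let

$$f = \sum_{i=1}^{n} a_i K(\cdot, t_i)$$

be any element of $\mathcal H_0$. Then

$$E_A \langle X, f \rangle = E_A \sum_{i=1}^{n} a_i X_{t_i} = \sum_{i=1}^{n} a_i E_A X_{t_i} = \sum_{i=1}^{n} a_i \mathcal E_A(t_i) = \langle \mathcal E_A, f \rangle.$$

Hence $\varphi_A$ is continuous on $\mathcal H_0$.

b) $\Rightarrow$ a)
It follows from the Hahn-Banach Theorem that $\varphi_A$ can be extended to a continuous linear form $\psi_A$ over $\mathcal H$ with the same norm as $\varphi_A$. Let $f_A$ be the representer of $\psi_A$ in $\mathcal H$. We then have, for $t$ in $E$,

$$f_A(t) = \langle f_A, K(\cdot, t) \rangle = \psi_A(K(\cdot, t)) = \varphi_A(K(\cdot, t)) = E_A X_t$$

so that $\mathcal E_A = f_A$ belongs to $\mathcal H$. ■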

THEOREM 95 Let $A$ in $\mathcal A$ be such that for any $t$ in $E$, $E_A X_t$ exists. Let

$$I_A : \mathcal H_0 \to \mathbb R^+, \qquad f \mapsto \int_A |\langle X, f \rangle| \, dP.$$

The following three conditions are equivalent.
a) $I_A$ is continuous at $0$.
b) $I_A$ is continuous.
c) $I_A$ is Lipschitz continuous.

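Proof. $I_A$ is well defined: if

$$f = \sum_{i=1}^{n} a_i K(\cdot, t_i)$$

then

$$\langle X, f \rangle = \sum_{i=1}^{n} a_i X_{t_i} \quad \text{is integrable on } A.$$

Clearly we have c) $\Rightarrow$ b) $\Rightarrow$ a). It remains to prove that c) is implied by a).
Suppose that $I_A$ is continuous at $0$. There exists $\eta > 0$ such that

$$\|f\| \le \eta \implies I_A(f) \le 1.$$

Hence, if $f \neq 0$,

$$I_A(f) = \frac{\|f\|}{\eta}\, I_A\Big(\eta \frac{f}{\|f\|}\Big) \le \frac{1}{\eta} \|f\|. \qquad (4.7)$$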

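Now, let $f$ and $g$ in $\mathcal H_0$. We have

$$\begin{aligned}
|I_A(f) - I_A(g)| &= \big| E_A\big( |\langle X, f \rangle| - |\langle X, g \rangle| \big) \big| \\
&\le E_A \big|\, |\langle X, f \rangle| - |\langle X, g \rangle| \,\big| \\
&\le E_A |\langle X, f - g \rangle| = I_A(f - g) \\
&\le \frac{1}{\eta} \|f - g\|,
\end{aligned}$$

where the last inequality follows from (4.7). Hence the Lipschitz continuity of $I_A$ follows.

Now we are in a position to give a sufficient condition for weak integrability.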

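THEOREM 96 Let $A$ in $\mathcal A$. If the mapping

$$\mathcal E_A : E \to \mathbb R, \qquad t \mapsto E_A X_t$$

is an element of $\mathcal H$ and if $I_A$ is continuous at $0$ then $X$ is weakly integrable on $A$ and $\int_A X \, dP = \mathcal E_A$.

Proof. Let $f \in \mathcal H$. We have to prove that $\langle X, f \rangle$ is integrable on $A$ and that $E_A \langle X, f \rangle = \langle \mathcal E_A, f \rangle$.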

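Let $(f_n)_{n \in \mathbb N}$ be a sequence of elements of $\mathcal H_0$ converging in $\mathcal H$ to the function $f$. By Theorem 95 we can write

$$\int_A |\langle X, f_n \rangle - \langle X, f_m \rangle| \, dP = I_A(f_n - f_m) \le \eta\, \|f_n - f_m\|_{\mathcal H_0}$$

where $\eta$ is a positive constant. Therefore $(\langle X, f_n \rangle)_{n \in \mathbb N}$ is a Cauchy sequence in $L^1(A)$, the space of $P$-integrable functions over $A$. As it converges everywhere to $\langle X, f \rangle$, it converges also to $\langle X, f \rangle$ in $L^1(A)$. It follows that $\langle X, f \rangle$ is integrable over $A$ and that

$$E_A \langle X, f \rangle = \lim_{n \to \infty} E_A \langle X, f_n \rangle.$$

By Theorem 94,

$$E_A \langle X, f_n \rangle = \langle \mathcal E_A, f_n \rangle$$

and

$$E_A \langle X, f \rangle = \lim_{n \to \infty} \langle \mathcal E_A, f_n \rangle = \langle \mathcal E_A, f \rangle.$$

■

We will end this section by giving in Theorem 98 a necessary and sufficient condition for weak integrability of RKHS valued random variables. For that we need a definition and a theorem of Hille and Phillips (1957).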

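DEFINITION 34 A set function $B$ defined on a probability space $(\Omega, \mathcal A, P)$ and taking its values in $\mathcal H$ is said to be absolutely continuous if and only if, for any $\varepsilon > 0$ there exists $\eta_\varepsilon > 0$ such that

$$\forall A \in \mathcal A \quad \big( P(A) < \eta_\varepsilon \implies \|B(A)\| < \varepsilon \big).$$

THEOREM 97 If $X : (\Omega, \mathcal A, P) \to (\mathcal H, \mathcal B_{\mathcal H})$ is weakly integrable the set function

$$\mathcal A \to \mathcal H, \qquad A \mapsto X_A = \int_A X \, dP$$

is absolutely continuous on $(\Omega, \mathcal A, P)$.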

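THEOREM 98 A random variable $X : (\Omega, \mathcal A, P) \to (\mathcal H, \mathcal B_{\mathcal H})$ is weakly integrable if and only if the following two conditions are satisfied.

a) For any $A \in \mathcal A$ the mapping

$$\mathcal E_A : E \to \mathbb R, \qquad t \mapsto E_A X_t$$

is an element of $\mathcal H$.
b) The set function

$$\mathcal A \to \mathcal H, \qquad A \mapsto \mathcal E_A$$

is absolutely continuous on $(\Omega, \mathcal A, P)$.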

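Proof. First suppose that $X$ is weakly integrable. Then for any $A$ in $\mathcal A$ and any $t$ in $E$ we have

$$\mathcal E_A(t) = \int_A \langle X, K(\cdot, t) \rangle \, dP = X_A(t)$$

and a) is true. By Theorem 97 b) is also satisfied.
Let us now turn to the converse. a) and b) are supposed to be true. First note that $X$ is measurable since the $X_t$'s are measurable. We will prove that for $A \in \mathcal A$ the mapping $I_A$ is continuous at $0$. Then the weak integrability of $X$ will follow by Theorem 96.
Let $(f_n)_{n \in \mathbb N}$ be a sequence of $\mathcal H_0$ converging to $0$ in the norm sense. Let us show that $(I_A(f_n))$ tends to $0$ as $n$ tends to infinity.
Let $\varepsilon > 0$. On the one hand, by b) there exists $\eta_1(\varepsilon) > 0$ such that, for any $B \in \mathcal A$,

$$P(B) < \eta_1(\varepsilon) \implies \|\mathcal E_B\| < \varepsilon^{1/2}.$$

On the other hand $(\langle X, f_n \rangle)_{n \in \mathbb N}$ converges to $0$ everywhere and therefore in probability. So there exists $N(\varepsilon)$ such that

$$n \ge N(\varepsilon) \implies \begin{cases} P\big(|\langle X, f_n \rangle| > \tfrac{\varepsilon}{2}\big) < \eta_1(\varepsilon) \\ \text{and} \\ \|f_n\| < \dfrac{\varepsilon^{1/2}}{4}. \end{cases}$$

For $n \in \mathbb N$, let

$$A_n^+(\varepsilon) = A \cap \big\{ |\langle X, f_n \rangle| > \tfrac{\varepsilon}{2} \big\} \cap \{ \langle X, f_n \rangle \ge 0 \}$$

and

$$A_n^-(\varepsilon) = A \cap \big\{ |\langle X, f_n \rangle| > \tfrac{\varepsilon}{2} \big\} \cap \{ \langle X, f_n \rangle < 0 \}.$$

We have

$$\begin{aligned}
I_A(f_n) &= \int_A |\langle X, f_n \rangle| \, dP \\
&\le \int_{A \cap \{|\langle X, f_n \rangle| > \varepsilon/2\}} |\langle X, f_n \rangle| \, dP + \frac{\varepsilon}{2} \\
&\le \int_{A_n^+(\varepsilon)} \langle X, f_n \rangle \, dP - \int_{A_n^-(\varepsilon)} \langle X, f_n \rangle \, dP + \frac{\varepsilon}{2}.
\end{aligned}$$

Now, if $n \ge N(\varepsilon)$, we have the following inequalities

$$\|\mathcal E_{A_n^+(\varepsilon)}\| < \varepsilon^{1/2}, \qquad \|\mathcal E_{A_n^-(\varepsilon)}\| < \varepsilon^{1/2}, \qquad \|f_n\| < \frac{\varepsilon^{1/2}}{4}$$

so that, by Theorem 94 and the Cauchy-Schwarz inequality,

$$I_A(f_n) \le \|\mathcal E_{A_n^+(\varepsilon)}\| \, \|f_n\| + \|\mathcal E_{A_n^-(\varepsilon)}\| \, \|f_n\| + \frac{\varepsilon}{2} < \varepsilon.$$

■

Sufficient conditions for a given function on $E$ to belong to $\mathcal H$ (condition a)) can be found in Duc-Jacquet (1973).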

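6. INNER PRODUCTS ON SETS OF MEASURES

In the rest of this chapter we will denote by
$(E, \mathcal T)$ a measurable space,
$\mathcal M$ the space of signed measures on $(E, \mathcal T)$,
$\mathcal M^+$ the subset of $\mathcal M$ of positive measures,
$\mathcal P$ the set of probability measures on $(E, \mathcal T)$,
$\mathcal M_0$ the space of measures with finite support,
$\mathcal P_0 = \mathcal P \cap \mathcal M_0$,
$K$ a real bounded measurable reproducing kernel on $E \times E$,
$\mathcal H$ the RKHS with kernel $K$,
$\mathcal H_0$ the subspace of $\mathcal H$ spanned by $(K(\cdot, x))_{x \in E}$.

From the properties of $K$, the mapping

$$(E, \mathcal T, \mu) \to (\mathcal H, \mathcal B_{\mathcal H}), \qquad x \mapsto K(\cdot, x)$$

is strongly integrable for all $\mu$ in $\mathcal M$ and we can define a mapping

$$\mathcal I : \mathcal M \to \mathcal H, \qquad \mu \mapsto \mathcal I_\mu = \int K(\cdot, x) \, d\mu(x).$$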

We will suppose that $K$ is such that the functions $\mathcal I_\mu$ and $\mathcal I_\nu$ are different if $\mu$ and $\nu$ are not equal. A consequence is that

$$(\mathcal I_\mu = 0) \iff (\mu = 0)$$

since $(\mu = 0)$ implies $(\mathcal I_\mu = 0)$. The next lemma expresses that this property is shared by kernels that can be written

$$K(x, y) = \sum_{i=0}^{\infty} f_i(x) f_i(y) \qquad (4.8)$$

where the set of functions $\{f_i : i \in \mathbb N\}$ characterizes signed measures on $(E, \mathcal T)$, as is the case with the monomials in the moments example of Subsection 1.3.

DEFINITION 35 A set of complex functions $\{f_i : i \in \mathbb N\}$ on $E$ is said to characterize (or to determine, see Billingsley, 1968) the elements of $\mathcal M$ if and only if for any $\mu$ in $\mathcal M$

$$\Big( \forall i \in \mathbb N, \ \int f_i \, d\mu = 0 \Big) \implies (\mu = 0).$$

It is clearly equivalent to say that two different signed measures $\mu$ and $\nu$ produce two different sequences $\{\int f_i \, d\mu : i \in \mathbb N\}$ and $\{\int f_i \, d\nu : i \in \mathbb N\}$.

LEMMA 19 Let $K$ be a kernel satisfying (4.8) with a set of functions $\{f_i : i \in \mathbb N\}$ characterizing the signed measures. Then

$$(\mathcal I_\mu = 0) \iff (\mu = 0).$$

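Proof. By the properties of integrals

$$\begin{aligned}
\langle \mathcal I_\mu, \mathcal I_\nu \rangle_{\mathcal H} &= \Big\langle \int K(\cdot, x) \, d\mu(x), \int K(\cdot, x) \, d\nu(x) \Big\rangle_{\mathcal H} \\
&= \int \Big\langle K(\cdot, y), \int K(\cdot, x) \, d\nu(x) \Big\rangle_{\mathcal H} d\mu(y) \\
&= \int \Big( \int K(x, y) \, d\nu(x) \Big) d\mu(y) \\
&= \int \Big( \int \sum_{i=0}^{\infty} f_i(x) f_i(y) \, d\nu(x) \Big) d\mu(y).
\end{aligned}$$

On the other hand

$$\Big| \sum_{i=0}^{n} f_i(x) f_i(y) \Big| \le \frac{1}{2} \sum_{i=0}^{n} \big( f_i(x)^2 + f_i(y)^2 \big) \le \frac{1}{2} \big( K(x, x) + K(y, y) \big) \le \sup K.$$

Using the Hahn-Jordan decomposition of $\mu$ and the Lebesgue dominated convergence Theorem one gets

$$\|\mathcal I_\mu\|^2 = \sum_{i=0}^{\infty} \Big( \int f_i \, d\mu \Big)^2$$

and the conclusion follows. ■

Let us now state the fundamental link between reproducing kernels and inner products on sets of measures.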

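THEOREM 99 The mapping

$$\mathcal M \times \mathcal M \to \mathbb R, \qquad (\mu, \nu) \mapsto \langle \mu, \nu \rangle_{\mathcal M} = \langle \mathcal I_\mu, \mathcal I_\nu \rangle_{\mathcal H} = \int K(x, y) \, d(\mu \otimes \nu)(x, y)$$

defines an inner product on $\mathcal M$ for which $\mathcal M_0$ is dense in $\mathcal M$.
Conversely let $\ll \cdot, \cdot \gg$ be an inner product on $\mathcal M$ for which $\mathcal M_0$ is dense in $\mathcal M$. Suppose that the function $K(x, y) = \ll \delta_x, \delta_y \gg$ is measurable and bounded on $E \times E$ and that the corresponding mapping $\mathcal I$ is one-to-one from $\mathcal M$ to $\mathcal I(\mathcal M)$. Then there exists a RKHS $\mathcal H$ with kernel $K$ and a unique linear mapping

$$h : \mathcal I(\mathcal M) \to \mathcal H$$

such that

$$\ll \mu, \nu \gg_{\mathcal M} = \langle h(\mathcal I(\mu)), h(\mathcal I(\nu)) \rangle_{\mathcal H} = \Big\langle h\Big( \int K(\cdot, x) \, d\mu(x) \Big), h\Big( \int K(\cdot, x) \, d\nu(x) \Big) \Big\rangle_{\mathcal H}.$$

If the mapping $x \mapsto \delta_x$ has for any $\mu$ a weak integral equal to $\mu$ then the mapping $h$ is equal to the identity and we have

$$\begin{aligned}
\ll \mu, \nu \gg_{\mathcal M} &= \Big\langle \int K(\cdot, x) \, d\mu(x), \int K(\cdot, x) \, d\nu(x) \Big\rangle_{\mathcal H} \\
&= \int K \, d(\mu \otimes \nu) \quad \text{(by Fubini theorem)} \\
&= \langle \mu, \nu \rangle.
\end{aligned}$$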

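Proof. Direct part.
From its very definition it is clear that the mapping $\langle \cdot, \cdot \rangle$ is bilinear. As $K$ is symmetric and $(\mu \otimes \nu)$-integrable we have

$$\begin{aligned}
\langle \mu, \nu \rangle &= \int K(x, y) \, d(\mu \otimes \nu)(x, y) \\
&= \int K(x, y) \, d(\nu \otimes \mu)(y, x) \\
&= \int K(y, x) \, d(\nu \otimes \mu)(y, x) \\
&= \langle \nu, \mu \rangle
\end{aligned}$$

and $\langle \cdot, \cdot \rangle$ is symmetric.
The positive definiteness follows from

$$\langle \mu, \mu \rangle_{\mathcal M} = \|\mathcal I_\mu\|^2_{\mathcal H}$$

and the equivalence

$$(\mathcal I_\mu = 0) \iff (\mu = 0).$$

If $\mu$ and $\nu$ are two probabilities we have

$$|\langle \mu, \nu \rangle| = \Big| \int K(x, y) \, d(\mu \otimes \nu)(x, y) \Big| \le \sup |K|.$$

As we have

$$\langle \mu, \nu \rangle_{\mathcal M} = \langle \mathcal I_\mu, \mathcal I_\nu \rangle_{\mathcal H}$$

the mapping $\mathcal I$ is an isometry between $\mathcal M$ and $\mathcal I(\mathcal M)$ and the denseness of $\mathcal M_0$ in $\mathcal M$ is a consequence of the denseness of $\mathcal H_0$ in $\mathcal H$.

Converse part.
By linearity we have

$$\ll \mu, \nu \gg_{\mathcal M} = \int K \, d(\mu \otimes \nu) = \langle \mathcal I_\mu, \mathcal I_\nu \rangle_{\mathcal H} \qquad (4.9)$$

for $\mu$ and $\nu$ belonging to the space $\mathcal M_0$ so that the restriction of the mapping $\mathcal I$ is an isometry from $\mathcal M_0$ onto $\mathcal I(\mathcal M_0) = \mathcal H_0$, but there is no reason for Formula (4.9) to hold for any elements $\mu$ and $\nu$ in $\mathcal M$ (see Exercise 1). Let $f$ be an element of $\mathcal I(\mathcal M)$ and $\mu = \mathcal I^{-1}(f)$. As $\mathcal M_0$ is dense in $\mathcal M$ there exists a sequence $(\mu_n)_{n \in \mathbb N}$ in $\mathcal M_0$ converging to $\mu$. The isometry $\mathcal I$ transforms this converging sequence into a Cauchy sequence in $\mathcal H$ converging to some element $g_f$. Define

$$h : \mathcal I(\mathcal M) \to \mathcal H, \qquad f \mapsto h(f) = g_f.$$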

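$h$ is well defined, linear, and we have

$$\langle h(\mathcal I(\mu)), h(\mathcal I(\nu)) \rangle_{\mathcal H} = \lim_{n \to +\infty} \langle \mathcal I(\mu_n), \mathcal I(\nu_n) \rangle_{\mathcal H} = \lim_{n \to +\infty} \ll \mu_n, \nu_n \gg_{\mathcal M} = \ll \mu, \nu \gg_{\mathcal M}.$$

This ends the first part of the converse proof.
Now, suppose that for any $\mu$ in $\mathcal M$ we have

$$\int \delta_x \, d\mu(x) = \mu.$$

Then

$$\begin{aligned}
\ll \mu, \nu \gg_{\mathcal M} &= \Big\ll \int \delta_x \, d\mu(x), \int \delta_x \, d\nu(x) \Big\gg_{\mathcal M} \\
&= \int \Big\ll \delta_x, \int \delta_y \, d\nu(y) \Big\gg_{\mathcal M} d\mu(x) \\
&= \int \Big( \int \ll \delta_x, \delta_y \gg_{\mathcal M} d\nu(y) \Big) d\mu(x) \\
&= \int K \, d(\mu \otimes \nu) \\
&= \Big\langle \int K(\cdot, x) \, d\mu(x), \int K(\cdot, x) \, d\nu(x) \Big\rangle_{\mathcal H}
\end{aligned}$$

and $h$ is the identity operator. ■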
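For finitely supported measures the inner product of Theorem 99 reduces to a finite double sum, and for a kernel of the form (4.8) it can be compared with the sum of products of "moments" obtained in the proof of Lemma 19. The following sketch is an illustration only: it assumes the truncated monomial family $f_i(x) = x^i$, $i = 0, \dots, d$, on the real line, and all function names are hypothetical.

```python
import numpy as np

def feature_matrix(points, degree):
    """Rows index the points, columns hold f_i(x) = x**i for i = 0..degree."""
    return np.vander(np.asarray(points, dtype=float), degree + 1, increasing=True)

def kernel(x, y, degree):
    """K(x, y) = sum_i f_i(x) f_i(y): a truncated instance of (4.8)."""
    return feature_matrix(x, degree) @ feature_matrix(y, degree).T

def inner_product(a, x, b, y, degree=6):
    """<mu, nu>_M for mu = sum_j a_j delta_{x_j} and nu = sum_k b_k delta_{y_k}."""
    return float(np.asarray(a, dtype=float) @ kernel(x, y, degree) @ np.asarray(b, dtype=float))

def moment_vector(a, x, degree=6):
    """The vector (int f_i dmu)_i for the finitely supported measure mu."""
    return np.asarray(a, dtype=float) @ feature_matrix(x, degree)

if __name__ == "__main__":
    a, x = [0.5, -1.0, 0.5], [0.1, 0.4, 0.9]   # a signed measure mu
    b, y = [1.0, 2.0], [0.3, 0.7]              # a positive measure nu
    print(inner_product(a, x, b, y))
    print(float(moment_vector(a, x) @ moment_vector(b, y)))
    print(inner_product(a, x, a, x))           # squared norm of mu
```

The first two printed values agree up to rounding, which is the finite-sum counterpart of $\langle \mathcal I_\mu, \mathcal I_\nu \rangle_{\mathcal H} = \sum_i \int f_i \, d\mu \int f_i \, d\nu$. With a truncated family the kernel only separates measures whose first $d + 1$ moments differ, so this is a sketch of the mechanism rather than of a characterizing kernel.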

Remark
Consider a random variable $X$ taking its values in $(E, \mathcal T)$ with unknown probability $P_X$. Then the random Dirac measure $\delta_X$ can be seen as an estimator of $P_X$ based on one observation $X$. For $\mu = P_X$ the condition given on the mapping $x \mapsto \delta_x$ is equivalent to the unbiasedness of the estimator $\delta_X$ since its (weak) expectation is

$$\int \delta_x \, d\mu(x) = \mu.$$

7. INNER PRODUCT AND WEAK TOPOLOGY

A major tool in the study of measures is the weak topology. It is therefore important to compare topologies on $\mathcal M$ defined by inner products and the weak topology.
In this section, adapted from Guilbart (1978a), $E$ is a separable metric space and $\mathcal T$ is its Borel $\sigma$-algebra. Let $\langle \cdot, \cdot \rangle_{\mathcal M}$ be an inner product on $\mathcal M$ such that the function $K(x, y) = \langle \delta_x, \delta_y \rangle_{\mathcal M}$ is bounded on $E \times E$. Recall that, by the Cauchy-Schwarz inequality, $K(x, y)$ is bounded on $E \times E$ if and only if $K(x, x)$ is bounded on $E$.
The following theorem gives two important properties of the function $K$ when the inner product induces the weak topology on the set $\mathcal P$ of probabilities on $(E, \mathcal T)$.

THEOREM 100 If the topology induced on $\mathcal P$ by the inner product $\langle \cdot, \cdot \rangle_{\mathcal M}$ coincides with the weak topology then the function $K$ is continuous and

$$\langle \mu, \nu \rangle_{\mathcal M} = \int K \, d(\mu \otimes \nu). \qquad (4.10)$$

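Proof. Let $(x_n)_{n \in \mathbb N}$ and $(y_n)_{n \in \mathbb N}$ be two sequences in $E$ converging respectively to $x$ and $y$. Then the sequences $(\delta_{x_n})_{n \in \mathbb N}$ and $(\delta_{y_n})_{n \in \mathbb N}$ converge respectively to $\delta_x$ and $\delta_y$ in the sense of the weak topology (Parthasarathy, 1967) and therefore in the sense of the inner product. As we have

$$\begin{aligned}
K(x, y) - K(x_n, y_n) &= \langle \delta_x, \delta_y \rangle_{\mathcal M} - \langle \delta_{x_n}, \delta_{y_n} \rangle_{\mathcal M} \\
&= \langle \delta_x - \delta_{x_n}, \delta_y \rangle_{\mathcal M} - \langle \delta_{x_n}, \delta_{y_n} - \delta_y \rangle_{\mathcal M},
\end{aligned}$$

we can write

$$|K(x, y) - K(x_n, y_n)| \le \|\delta_x - \delta_{x_n}\|_{\mathcal M} \, \|\delta_y\|_{\mathcal M} + \|\delta_{x_n}\|_{\mathcal M} \, \|\delta_{y_n} - \delta_y\|_{\mathcal M}$$

and the continuity of $K$ follows.
Relation (4.10) is satisfied for Dirac measures and, by linearity, in the set $\mathcal M_0$ of measures with finite support. Now let $(\mu, \nu) \in \mathcal M^2$. As $\mathcal M_0$ is dense in $\mathcal M$ in the sense of the weak topology there exist two sequences $(\mu_n)_{n \in \mathbb N}$ and $(\nu_n)_{n \in \mathbb N}$ of elements of $\mathcal M_0$ converging weakly respectively to $\mu$ and $\nu$. By hypothesis this convergence also occurs in the sense of the norm on $\mathcal M$ so that the sequence of inner products

$$\langle \mu_n, \nu_n \rangle_{\mathcal M} = \int K \, d(\mu_n \otimes \nu_n)$$

tends to $\langle \mu, \nu \rangle$.
On the other hand $K$ is bounded and continuous and $(\mu_n \otimes \nu_n)_{n \in \mathbb N}$ tends weakly to $\mu \otimes \nu$, therefore

$$\int K \, d(\mu_n \otimes \nu_n) \ \text{tends to} \ \int K \, d(\mu \otimes \nu).$$

Hence, (4.10) is proved. ■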

Now let us see under what conditions the converse of Theorem 100 holds true. For this we need Lemma 20 on orthonormal systems in $\mathcal H$ characterizing signed measures on $E$ and Lemma 21 on weak convergence.

LEMMA 20 Let $\mathcal H$ be a Hilbert space of functions defined on a compact metric space $(E, d)$ with continuous reproducing kernel $K$. Then any orthonormal system $(e_i)$ in $\mathcal H$ characterizing signed measures is total in the set $C_b(E, \mathbb C)$ of bounded continuous complex functions on $E$ endowed with the sup norm.

Proof. From Corollary 5 it is clear that $\mathcal H \subset C_b(E, \mathbb C)$. Let $(e_i)$ be any orthonormal system in $\mathcal H$ characterizing signed measures. The closed subspace of $C_b(E, \mathbb C)$ spanned by $(e_i)$ is denoted by $S$. Suppose that there exists an element $\varphi$ of $C_b(E, \mathbb C)$ that does not belong to $S$. By the Hahn-Banach theorem (Rudin, 1975) there exists a continuous linear form $L$ on $C_b(E, \mathbb C)$ which vanishes on $S$ and takes a non-zero value at $\varphi$. By the Riesz representation theorem there exists $\mu$ in $\mathcal M$ such that

$$L(f) = \int f \, d\mu.$$

As $L(\varphi) \neq 0$, the measure $\mu$ is not null and yet

$$\forall i \in \mathbb N, \quad \int e_i \, d\mu = 0.$$

We get a contradiction. Hence, $\varphi$ does not exist and $S = C_b(E, \mathbb C)$. The system $(e_i)$ is total in $C_b(E, \mathbb C)$. ■

LEMMA 21 A sequence $(e_i)$ which is total in $C_b(E, \mathbb C)$ determines the weak convergence of probability measures, i.e. for any sequence $(\mu_n)$ in $\mathcal P$ and any $\mu$ in $\mathcal P$ the weak convergence of $(\mu_n)$ to $\mu$ is equivalent to

$$\forall i \in \mathbb N, \quad \int e_i \, d\mu_n \to \int e_i \, d\mu \ \text{ as } n \to \infty. \qquad (4.11)$$

Proof. Condition (4.11) is clearly necessary by definition of the weak convergence.
Let us now prove its sufficiency. By linearity, for any $f$ belonging to the vector space $\mathcal E$ spanned by $(e_i)$ we have

$$\int f \, d\mu_n \to \int f \, d\mu \ \text{ as } n \to \infty.$$

Let $\varepsilon > 0$ and let $g$ be any element of $C_b(E, \mathbb C)$. For $f$ in $\mathcal E$ such that

$$\sup |g - f| \le \varepsilon$$

we can write

$$\Big| \int g \, d\mu_n - \int g \, d\mu \Big| \le \Big| \int g \, d\mu_n - \int f \, d\mu_n \Big| + \Big| \int f \, d\mu_n - \int f \, d\mu \Big| + \Big| \int f \, d\mu - \int g \, d\mu \Big|$$

and therefore

$$\Big| \int g \, d\mu_n - \int g \, d\mu \Big| \le \varepsilon + \Big| \int f \, d\mu_n - \int f \, d\mu \Big| + \varepsilon.$$

As the second term in the above right hand side member tends to zero as $n \to \infty$, we get that

$$\int g \, d\mu_n \to \int g \, d\mu.$$

The conclusion follows. ■

We end this section by stating the converse of Theorem 100 in the case where $E$ is a compact metric space. The case where $E$ is a non compact separable metric space is treated by Guilbart (1978a).

THEOREM 101 Suppose that $E$ is a compact metric space, that the function $K(x, y) = \langle \delta_x, \delta_y \rangle_{\mathcal M}$ is continuous and that

$$\forall (\mu, \nu) \in \mathcal M^2, \quad \langle \mu, \nu \rangle_{\mathcal M} = \int K \, d(\mu \otimes \nu).$$

Then the topology induced on $\mathcal P$ by the inner product $\langle \cdot, \cdot \rangle_{\mathcal M}$ coincides with the weak topology.

Proof. Applying Corollary 5 we can write

$$\forall s \in E, \ \forall t \in E, \quad K(s, t) = \sum_{i=0}^{\infty} e_i(t) \, e_i(s), \qquad (4.12)$$

where the convergence is uniform on $E \times E$, $(e_i)$ is an orthonormal system in $\mathcal H$ and each $e_i$ is uniformly continuous and bounded. For any signed measure $\mu$ on $(E, \mathcal T)$ we have

$$\begin{aligned}
\|\mu\|^2_{\mathcal M} &= \int K \, d(\mu \otimes \mu) \\
&= \int \Big( \sum_{i=0}^{\infty} e_i(x) e_i(y) \Big) d(\mu \otimes \mu)(x, y) \\
&= \sum_{i=0}^{\infty} \Big( \int e_i \, d\mu \Big)^2
\end{aligned}$$

by uniform convergence and the Fubini theorem. It follows that the system $(e_i)$ characterizes signed measures and, by Lemma 20 and 21, that the weak convergence in $\mathcal M$ of a sequence $(\mu_n)$ to some element $\mu$ is equivalent to

$$\forall i \in \mathbb N, \quad \int e_i \, d\mu_n \to \int e_i \, d\mu \ \text{ as } n \to \infty. \qquad (4.13)$$

Hence it is clear that convergence in the sense of the inner product implies weak convergence.
Conversely, if $(\mu_n)$ converges weakly to $\mu$ then $(\mu_n \otimes \mu_n)$ converges weakly to $\mu \otimes \mu$. The function $K$ being bounded and continuous this implies the convergence of

$$\|\mu_n\|^2 = \int K \, d(\mu_n \otimes \mu_n)$$

to $\|\mu\|^2$. Together with (4.13) this implies convergence in the sense of the inner product. ■

As we have seen the functions $e_i$ appearing in the decomposition of the reproducing kernel play a key role in many proofs. When those functions satisfy an additional condition on their upper bounds it is possible to derive a Glivenko-Cantelli type theorem for random variables taking their values in $E$ (see Guilbart, 1977a and Exercise 3). Such a theorem, with rate of convergence, is basic in applications to statistical estimation and hypothesis testing.

8. APPLICATION TO NORMAL APPROXIMATION

In the present section we give an example of application of the RKHS methodology to the normal approximation of partial sums of random variables with rates of convergence (Berry-Esseen theorems). It originates from a paper by Suquet (1994). The space of measures is embedded in a reproducing kernel Hilbert space and in an $L^2$ space using an integral representation of the kernel. Then the weak convergence of probability measures can be metrized through a suitable choice of the kernel and rates of convergence in the Central Limit Theorem can be easily derived.
Let $X_1, \dots, X_n$ be independent real random variables with mean zero and finite moments of order 3. Denote by $\sigma_j$ the standard deviation of $X_j$, $1 \le j \le n$,

$$S_n = \sum_{i=1}^{n} X_i \quad \text{and} \quad S_n^* = \frac{S_n}{s_n}$$

where

$$s_n = \Big( \sum_{i=1}^{n} \sigma_i^2 \Big)^{1/2}$$

is the standard deviation of $S_n$.
Then the Berry-Esseen theorem (Shorack and Wellner, 1986) provides an upper bound for the Kolmogorov distance $\|F_n^* - F\|_\infty$ between the distribution function $F_n^*$ of $S_n^*$ and the distribution function $F$ of $N(0, 1)$.

THEOREM 102 (BERRY-ESSEEN THEOREM) There exists a universal constant $C$ such that

$$\|F_n^* - F\|_\infty = \sup_{x \in \mathbb R} |P(S_n^* \le x) - P(Z \le x)| \le C s_n^{-3} \sum_{j=1}^{n} E|X_j|^3 \qquad (4.14)$$

where $Z$ has distribution $N(0, 1)$.

Other distances have been considered (Rachev, 1991) for which similar bounds have been proved. Our goal here is to show that the RKHS framework is well adapted to prove such results.
Consider a non negative real function $q$ integrable with respect to the Lebesgue measure $\lambda$. The functions $\exp(ix\,\cdot)$, $x \in \mathbb R$, belong to the space $L^2(q)$ of square integrable functions with respect to the measure with density $q$ on $\mathbb R$. Hence, by Lemma 1, the function

$$K(x, y) = \langle \exp(ix\,\cdot), \exp(iy\,\cdot) \rangle_{L^2(q)} = \int \exp(iu(x - y)) \, q(u) \, d\lambda(u)$$

is a reproducing kernel on $\mathbb R$. Denote by $d_{\mathcal M}$ the distance on the space $\mathcal M$ of bounded signed measures associated with $K$, by $\mathcal L(S_n)$ the probability distribution of $S_n$ and let

$$a(q) = \int u^6 q(u) \, \lambda(du).$$

Then we have the following bound (Suquet, 1994).

THEOREM 103 Suppose that $X_1, \dots, X_n$ are independent real random variables with mean zero and finite moments of order 3 and that the function $q$ satisfies

$$0 < a(q) < \infty.$$

Then the distance $d_{\mathcal M}$ metrizes the weak topology on the set of probability measures and we have

Under the same hypotheses, by considering the variables $X_j / s_n$, one easily gets the following bound

which takes the form

if the variables $X_1, \dots, X_n$ satisfy, for $1 \le i \le n$,

$$E|X_i|^3 = \tau \quad \text{and} \quad (E X_i^2)^{1/2} = \sigma.$$

For the proof of Theorem 103 and extensions to multivariate and dependent cases the reader is referred to Suquet (1994).
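The integral representation of the kernel used in this section can be made concrete for one particular weight. As an illustrative assumption (this choice of $q$ is not taken from the text), let $q$ be the standard normal density. Then $K(x, y)$ is the characteristic function of $N(0, 1)$ evaluated at $x - y$, that is $\exp(-(x - y)^2 / 2)$, and $a(q)$ is the sixth moment of $N(0, 1)$, equal to 15. The sketch below checks both values with a plain Riemann sum; the grid and function names are hypothetical.

```python
import numpy as np

# Quadrature grid on [-12, 12]; the Gaussian tails beyond it are negligible.
U = np.linspace(-12.0, 12.0, 24001)
DU = U[1] - U[0]

def q(u):
    """Assumed weight function: the standard normal density."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel(x, y):
    """K(x, y) = int exp(iu(x - y)) q(u) du; q is even, so only the cosine part remains."""
    return float(np.sum(np.cos(U * (x - y)) * q(U)) * DU)

def a_of_q():
    """a(q) = int u^6 q(u) du."""
    return float(np.sum(U ** 6 * q(U)) * DU)

if __name__ == "__main__":
    print(kernel(0.3, 1.1), np.exp(-0.5 * (0.3 - 1.1) ** 2))
    print(a_of_q())
```

With this choice $0 < a(q) < \infty$, so the weight satisfies the condition required of $q$ in Theorem 103.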

9. RANDOM MEASURES

At first sight it seems natural to define a random measure as a random variable with values in a set of measures $\mathcal M$ equipped with some $\sigma$-algebra. However, the definition of this $\sigma$-algebra, possibly derived from some topology on $\mathcal M$, is not a simple matter and the resulting theory involves delicate mathematical questions (Kallenberg (1983), Karr (1986), Geffroy and Zeboulon (1975), Jacob (1978, 1979)).
As the theory of Hilbert space valued random variables is much easier to handle there is a great temptation to use the embedding

$$\mathcal I : \mathcal M \to \mathcal H, \qquad \mu \mapsto \int K(\cdot, x) \, d\mu(x)$$

introduced in this chapter to define and study random measures. But even in this framework difficult questions immediately come up. How to characterize the elements of $\mathcal I(\mathcal M)$ among the elements of $\mathcal H$? What can be the limit of a sequence of elements of $\mathcal I(\mathcal M)$? How to characterize the $\sigma$-algebras on $\mathcal H$ containing the set $\mathcal I(\mathcal M)$? From Theorem 88 the Borel $\sigma$-algebra $\mathcal B_{\mathcal H}$ on a separable RKHS is generated by the evaluation functionals but there is a priori no reason for $\mathcal B_{\mathcal H}$ to contain $\mathcal I(\mathcal M)$ (even if $\mathcal I(\mathcal M)$ is a dense subspace!). Under some additional conditions $\mathcal I(\mathcal M)$ can be proved to be a Borel set when $\mathcal M$ is either the set of signed measures or the set of positive measures on $(E, \mathcal B_E)$ where $E$ is a locally compact or separable metric space (Suquet, 1993).

Let us adopt the same route as in the above sections and start with Dirac and finite support measures. As we will see this route leads to a functional theory rather than a set theory of random measures.
The notion of random measure on a measurable space $(E, \mathcal T)$ can be introduced through the simple and understandable example of the Dirac measure $\delta_Y$, where

$$Y : (\Omega, \mathcal A, P) \to (E, \mathcal T)$$

is a random variable. For any set $A$ in the $\sigma$-algebra $\mathcal T$, we have

$$\delta_Y(A) = \int 1_A \, d\delta_Y = 1_A(Y) = \begin{cases} 1 & \text{if } Y \in A \\ 0 & \text{if } Y \notin A \end{cases}$$

and, more generally, for any measurable function

$$f : (E, \mathcal T) \to (\mathbb C, \mathcal B_{\mathbb C}),$$

the random integral $\int f \, d\delta_Y$ is equal to the random variable

$$f(Y) : (\Omega, \mathcal A, P) \to (\mathbb C, \mathcal B_{\mathbb C}).$$

$\mathcal H$ being a RKHS of functions on $E$ with measurable kernel $K$, $K(\cdot, Y)$ is a $\mathcal H$-valued random variable. More generally, two triangular arrays being given (all random variables are defined on $(\Omega, \mathcal A, P)$)
• one of complex random variables

$$A_{1,n}, A_{2,n}, \dots, A_{k(n),n},$$

• one of random points in $E$

$$Y_{1,n}, Y_{2,n}, \dots, Y_{k(n),n},$$

the sums

$$\mu_n = \sum_{i=1}^{k(n)} A_{i,n} \, \delta_{Y_{i,n}}$$

form a sequence of measures on $E$ with finite support associating with $f$ a random integral

$$\int f \, d\mu_n = \sum_{i=1}^{k(n)} A_{i,n} f(Y_{i,n}).$$

Each of these measures is represented by the $\mathcal H$-valued random variable

$$\sum_{i=1}^{k(n)} A_{i,n} K(\cdot, Y_{i,n})$$

in the sense that for any $f$ in $\mathcal H$, we have

$$\int f \, d\mu_n = \sum_{i=1}^{k(n)} A_{i,n} f(Y_{i,n}) = \Big\langle f, \sum_{i=1}^{k(n)} A_{i,n} K(\cdot, Y_{i,n}) \Big\rangle_{\mathcal H}.$$

Note that the dimension $k(n)$ of the above triangular arrays can itself be a random variable so that the present setting covers many examples of sequences of discrete random measures. Let us briefly mention a few of them.

Example 1. Empirical measure. The empirical measure

$$\mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{Y_i}$$

associated with a sample $(Y_1, \dots, Y_n)$ of $n$ random variables is dealt with in Subsection 9.1 below. It corresponds to

$$k(n) = n, \quad Y_{i,n} = Y_i \quad \text{and} \quad A_{i,n} = \frac{1}{n}.$$

Example 2. Donsker measure. It can be written as

$$\mu_n = \frac{1}{\sqrt n} \sum_{i=1}^{n} Y_i \, \delta_{\{i/n\}}$$

and therefore corresponds to

$$k(n) = n, \quad Y_{i,n} = i/n \quad \text{and} \quad A_{i,n} = \frac{Y_i}{\sqrt n}.$$

The random functions involved in the Donsker theorem (Billingsley, 1968) can be written as integrals with respect to Donsker measures. Main applications are invariance principles in RKHS and $L^2$ spaces (Suquet, 1993, Suquet and Oliveira, 1994).

Example 3. Point process. A point process on $E$ can be defined as

$$\mu_n = \sum_{i=1}^{N} \delta_{Y_i}$$

where $N$ is the random number of points $Y_1, \dots, Y_N$ falling into $E$. It corresponds to

$$k(n) = N, \quad Y_{i,n} = Y_i \quad \text{and} \quad A_{i,n} = 1.$$

Other models can be considered, for instance thinned point processes for which each observation $Y_i$ is deleted with some probability $p(Y_i)$.
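The representation of a finite support random measure by the $\mathcal H$-valued variable $\sum_i A_{i,n} K(\cdot, Y_{i,n})$ can be sketched numerically for the empirical and Donsker measures. The code below is an illustration under assumptions that are not in the text: $E = \mathbb R$, a Gaussian kernel, a test function $f = \sum_j c_j K(\cdot, t_j)$ in $\mathcal H_0$, and hypothetical helper names throughout.

```python
import numpy as np

def gauss_kernel(s, t, bandwidth=1.0):
    """Gram matrix K(s_i, t_j) of an assumed Gaussian kernel on the real line."""
    s = np.asarray(s, dtype=float)[:, None]
    t = np.asarray(t, dtype=float)[None, :]
    return np.exp(-0.5 * ((s - t) / bandwidth) ** 2)

def random_integral(f, weights, points):
    """int f dmu_n = sum_i A_{i,n} f(Y_{i,n})."""
    return float(np.dot(weights, f(points)))

def representer_inner_product(coef, centers, weights, points):
    """<f, sum_i A_{i,n} K(., Y_{i,n})>_H for f = sum_j c_j K(., t_j)."""
    return float(coef @ gauss_kernel(centers, points) @ weights)

rng = np.random.default_rng(0)
n = 50
sample = rng.normal(size=n)                      # Y_1, ..., Y_n

coef = np.array([1.0, -2.0, 0.5])                # c_j (hypothetical)
centers = np.array([-1.0, 0.0, 2.0])             # t_j (hypothetical)
f = lambda y: gauss_kernel(y, centers) @ coef    # f = sum_j c_j K(., t_j)

# Example 1: A_{i,n} = 1/n, Y_{i,n} = Y_i.
empirical = (np.full(n, 1.0 / n), sample)
# Example 2: A_{i,n} = Y_i / sqrt(n), Y_{i,n} = i/n.
donsker = (sample / np.sqrt(n), np.arange(1, n + 1) / n)

if __name__ == "__main__":
    for name, (w, pts) in (("empirical", empirical), ("donsker", donsker)):
        print(name, random_integral(f, w, pts),
              representer_inner_product(coef, centers, w, pts))
```

Both columns coincide because of the reproducing property $\langle K(\cdot, t_j), K(\cdot, y) \rangle_{\mathcal H} = K(t_j, y)$; the only thing that changes between the examples is the pair of arrays $(A_{i,n}, Y_{i,n})$.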

RKHS methods can be used for estimating and predicting point processes (Bensaid, 1992a) and random measures (Bosq, 1979). In the following subsection we consider the empirical measure. Strong approximations of empirical processes will be presented in Chapter 5.

9.1. THE EMPIRICAL MEASURE AS $\mathcal H$-VALUED VARIABLE

The present subsection deals with the simple case of the empirical measure. It can serve as an introduction to the general theory of random measures as RKHS valued random variables.
Let $Y : (\Omega, \mathcal A, P) \to (E, \mathcal T)$ be a random variable with unknown probability measure $\mu$ on $E$. For any $t \in E$ the Dirac measure $\delta_t$ defines a continuous linear form on $\mathcal H$:

$$f \mapsto \int f \, d\delta_t = f(t)$$

which is nothing else than the evaluation functional $e_t$ represented in $\mathcal H$ by the function $K(\cdot, t)$. Thus the random Dirac measure $\delta_Y$ is represented in $\mathcal H$ by the variable $K(\cdot, Y)$. Let $(Y_i)_{i \ge 1}$ be a sequence of independent random variables taking their values in $(E, \mathcal T)$, defined on $(\Omega, \mathcal A, P)$ with common probability measure $\mu$. The natural estimate of $\mu$ associated with this sample is the empirical measure

$$\mu_n = \frac{1}{n} \sum_{k=1}^{n} \delta_{Y_k}.$$

It is represented in $\mathcal H$ by

$$\mathcal I_{\mu_n} = \frac{1}{n} \sum_{k=1}^{n} K(\cdot, Y_k).$$

The RKHS theory provides a functional framework to study how $\mu_n$ approximates $\mu$ or rather how its representer in $\mathcal H$ approximates the representer of $\mu$ which is

$$\mathcal I_\mu = \int K(\cdot, x) \, d\mu(x).$$

From Corollary 12 the mapping

$$(\Omega, \mathcal A, P) \to (\mathcal H, \mathcal B_{\mathcal H}), \qquad \omega \mapsto \frac{1}{n} \sum_{k=1}^{n} K(\cdot, Y_k(\omega))$$

is measurable for any $n \in \mathbb N^*$ if and only if $K(t, Y)$ is a real random variable for any $t \in E$ (to simplify we consider here that $\mathcal H$ is a space of real functions on $E$). When this last condition is fulfilled the empirical measure can be considered as a $\mathcal H$-valued random variable. This is the case if the mapping

$$F_K : (E, \mathcal T) \to (\mathcal H, \mathcal B_{\mathcal H}), \qquad t \mapsto K(\cdot, t)$$

is measurable.
In the rest of this subsection we will make the assumption that $K(\cdot, Y)$ is a second order random variable and that two elements $f$ and $g$ of $\mathcal H$ have the same integral with respect to $\mu$ if and only if $f = g$.

9.1.1 INTEGRABLE KERNELS

Let us summarize the assumptions made on $K$, $\mathcal H$ and $\mu$:

(H1) $K$ is $(\mathcal T^{\otimes 2} - \mathcal B_{\mathbb R})$-measurable.
(H2) The mapping

$$E \to \mathbb R^+, \qquad x \mapsto K(x, x)$$

belongs to $L^1(\mu)$.
(H3) The null function is the only function in $\mathcal H$ that is null $\mu$-almost everywhere.

As $F_K(x) = K(\cdot, x)$, $\|F_K(x)\|^2 = K(x, x)$ and we have

$$\int K(x, x) \, d\mu(x) = \int \|F_K(x)\|^2 \, d\mu(x) = \int \|K(\cdot, Y)\|^2 \, dP.$$

$K(\cdot, Y)$ is a second order random variable on $(\Omega, \mathcal A, P)$ if and only if $F_K$ is a second order random variable on $(E, \mathcal T, \mu)$ and this is equivalent to hypothesis (H2).

Before giving properties of the natural estimate of $\mathcal I_\mu = \int K(\cdot, x) \, d\mu(x)$ let us draw some consequences of our assumptions. For definitions of hilbertian subspaces and Schwartz kernels see Subsection 6.1 in Chapter 1.

THEOREM 104 $\mathcal H$ is a hilbertian subspace of $L^2(\mu)$.

Proof. Let $g \in \mathcal H$. As we have

$$|g(x)|^2 = |\langle g, K(\cdot, x) \rangle_{\mathcal H}|^2 \le \|g\|^2_{\mathcal H} \, K(x, x)$$

the function $g$ belongs to $L^2(\mu)$ and

$$\|g\|_{L^2(\mu)} \le \|g\|_{\mathcal H} \Big( \int K(x, x) \, d\mu(x) \Big)^{1/2}.$$

The conclusion follows. ■
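The norm inequality in the proof of Theorem 104 can be checked on a small discrete example. This is a sketch under assumptions that are not in the text: $\mu$ is a probability with finitely many atoms on the real line, $K$ is a Gaussian kernel (so $K(x, x) = 1$), and $g = \sum_j c_j K(\cdot, t_j)$ with hypothetical coefficients and centers.

```python
import numpy as np

def gauss_kernel(s, t):
    """Gram matrix of an assumed Gaussian kernel K(s, t) = exp(-(s - t)^2 / 2)."""
    s = np.asarray(s, dtype=float)[:, None]
    t = np.asarray(t, dtype=float)[None, :]
    return np.exp(-0.5 * (s - t) ** 2)

def rkhs_norm(coef, centers):
    """||g||_H for g = sum_j c_j K(., t_j): sqrt(c^T G c) with G the Gram matrix."""
    value = float(coef @ gauss_kernel(centers, centers) @ coef)
    return float(np.sqrt(max(value, 0.0)))

def l2_norm(coef, centers, atoms, probs):
    """||g||_{L^2(mu)} for the discrete probability mu = sum_i p_i delta_{x_i}."""
    values = gauss_kernel(atoms, centers) @ coef      # g(x_i)
    return float(np.sqrt(np.dot(probs, values ** 2)))

def diagonal_integral(atoms, probs):
    """int K(x, x) dmu(x), the quantity required to be finite in (H2)."""
    return float(np.dot(probs, np.diag(gauss_kernel(atoms, atoms))))

coef = np.array([1.0, -2.0, 0.5])
centers = np.array([-1.0, 0.0, 2.0])
atoms = np.linspace(-2.0, 2.0, 9)
probs = np.full(9, 1.0 / 9.0)

if __name__ == "__main__":
    lhs = l2_norm(coef, centers, atoms, probs)
    rhs = rkhs_norm(coef, centers) * np.sqrt(diagonal_integral(atoms, probs))
    print(lhs, rhs)
```

The first printed value does not exceed the second, as the bound $|g(x)|^2 \le \|g\|^2_{\mathcal H} K(x, x)$ integrated against $\mu$ predicts.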

For any square integrable real function $g$ on $(E, \mathcal T, \mu)$, denote by $\dot g$ its class in $L^2(\mu)$. Assumption (H3) implies that the natural embedding

$$\mathcal H \to L^2(\mu), \qquad g \mapsto \dot g$$

is an isomorphism of Hilbert spaces between $\mathcal H$ and its image $\dot{\mathcal H}$.

THEOREM 105 For any $g$ in $L^2(\mu)$, the mapping

$$\mathcal H \to \mathbb R, \qquad f \mapsto \int f g \, d\mu$$

is a continuous linear form on $\mathcal H$. It is represented in $\mathcal H$ by

$$\mathcal I_{g.\mu} = \int K(\cdot, x) g(x) \, d\mu(x),$$

where $g.\mu$ stands for the measure with density $g$ with respect to $\mu$.

Proof. Let $g$ in $L^2(\mu)$. Then the mapping

$$E \to \mathcal H, \qquad x \mapsto g(x) K(\cdot, x)$$

is defined $\mu$-almost everywhere, measurable and Bochner integrable by Assumption (H2). Thus

$$\int K(\cdot, x) g(x) \, d\mu(x)$$

does exist and belongs to $\mathcal H$. By the properties of the Bochner integral we have, for any $f$ in $\mathcal H$,

$$\Big\langle f, \int K(\cdot, x) g(x) \, d\mu(x) \Big\rangle_{\mathcal H} = \int \langle f, K(\cdot, x) g(x) \rangle \, d\mu(x) = \int f g \, d\mu$$

and the theorem follows. ■

Remark. The continuity of the linear form

$$\mathcal H \to \mathbb R, \qquad f \mapsto \int f g \, d\mu = \langle f, g \rangle_{L^2(\mu)}$$

is an immediate consequence of Theorem 104. By Lemma 10 this continuous linear form is represented in $\mathcal H$ by the function

$$t \mapsto \int K(t, x) g(x) \, d\mu(x).$$

Hence we can conclude that the weak integral

$$\int K(\cdot, x) g(x) \, d\mu(x)$$

exists. Then Assumption (H2) has to be invoked to obtain strong integrability.
We are now in a position to describe the Schwartz kernel of $\mathcal H$.

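THEOREM 106 The mapping

$$N : L^2(\mu) \to \mathcal H, \qquad g \mapsto N(g) = \mathcal I_{g.\mu} = \int K(\cdot, x) g(x) \, d\mu(x)$$

is the Schwartz kernel of $\mathcal H$ considered as hilbertian subspace of $L^2(\mu)$.

Proof. The kernel $L$ of $\mathcal H$ is characterized by

$$\forall g \in L^2(\mu), \ \forall f \in \mathcal H, \quad \langle L(g), f \rangle_{\mathcal H} = \langle g, f \rangle_{L^2(\mu)}.$$

From Theorem 105 it follows that

$$\forall g \in L^2(\mu), \ \forall f \in \mathcal H, \quad \langle N(g), f \rangle_{\mathcal H} = \int f g \, d\mu = \langle g, f \rangle_{L^2(\mu)},$$

thus $L = N$. ■

It is worth noting that the restriction to $\mathcal H$ of the Schwartz kernel $N$ is the covariance operator $C_{K(\cdot, Y)}$ of the $\mathcal H$-valued random variable $K(\cdot, Y)$. To see this, let $f$ and $g$ be two elements of $\mathcal H$. Then we have

$$\langle C_{K(\cdot, Y)}(g), f \rangle_{\mathcal H} = E(f(Y) g(Y)) = \int f g \, d\mu = \langle \mathcal I_{g.\mu}, f \rangle_{\mathcal H} = \langle N(g), f \rangle_{\mathcal H}$$

and therefore

$$\forall g \in \mathcal H, \quad C_{K(\cdot, Y)}(g) = N(g).$$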

See Exercise 7.
In the particular case where the inner product of $\mathcal H$ coincides with the inner product of $L^2(\mu)$, $\mathcal I_{g.\mu}$ is the orthogonal projection $\Pi_{\mathcal H}(g)$ of the function $g$ on $\mathcal H$. It is easily seen by writing

$$\mathcal I_{g.\mu} = \int K(\cdot, x) g(x) \, d\mu(x) = \int K(\cdot, x) \, \Pi_{\mathcal H}(g)(x) \, d\mu(x) = \Pi_{\mathcal H}(g).$$

This situation is encountered when unknown functions are estimated by orthogonal functions methods.
The norm of $\mathcal I_{g.\mu}$ can be computed as an integral of the kernel as stated in the following theorem.

THEOREM 107 For any $g$ in $L^2(\mu)$, $K$ is $g.\mu \otimes g.\mu$ integrable and

$$\|\mathcal I_{g.\mu}\|^2 = \int K \, (g \otimes g) \, d(\mu \otimes \mu).$$

Proof. Let $(x, y) \in E^2$. The integrability of $K$ with respect to $g.\mu \otimes g.\mu$ follows from the inequality

$$|K(x, y) g(x) g(y)| \le \|K(\cdot, x)\| \, \|K(\cdot, y)\| \, |g(x)| \, |g(y)|.$$

Now,

$$\|\mathcal I_{g.\mu}\|^2 = \Big\langle \int K(\cdot, x) g(x) \, d\mu(x), \int K(\cdot, y) g(y) \, d\mu(y) \Big\rangle_{\mathcal H} = \int K \, (g \otimes g) \, d(\mu \otimes \mu)$$

by the properties of the inner product and integrals and the Fubini theorem. ■

When $K$ is bounded the condition in (H2) is satisfied for any $\mu$ in the space $\mathcal M$ of bounded measures on $E$. Thus for any real bounded measurable function $g$ on $E$ one can define a linear mapping

$$\mathcal I[g] : \mathcal M \to \mathcal H, \qquad \mu \mapsto \mathcal I[g](\mu) = \mathcal I_{g.\mu} = \int K(\cdot, x) g(x) \, d\mu(x).$$

The case where $g$ is identically the constant $1$ on $E$ provides, under suitable assumptions, the embedding of $\mathcal M$ in $\mathcal H$ which is exploited in the present chapter.

9.1.2 ESTIMATION OF $\mathcal I_\mu$

Recall that the probability $\mu$ is unknown and that we estimate its representer

$$\mathcal I_\mu = \int K(\cdot, x) \, d\mu(x)$$

from a sequence $(Y_i)_{i \ge 1}$ of independent random variables with probability law $\mu$ on $E$.
The random variables $K(\cdot, Y_k) : (\Omega, \mathcal A, P) \to (\mathcal H, \mathcal B_{\mathcal H})$ are integrable, independent and have the same distribution. Let, for $n \ge 1$,

$$S_n = \sum_{k=1}^{n} K(\cdot, Y_k)$$

and

$$\Lambda_n = \sqrt n \Big( \frac{S_n}{n} - \mathcal I_\mu \Big).$$

Since

$$\mathcal I_\mu = \int K(\cdot, Y_k) \, dP = E(K(\cdot, Y_k))$$

we have, by the strong law of large numbers, almost sure convergence of $S_n / n$ towards $\mathcal I_\mu$ as $n$ tends to infinity. Now, as $K(\cdot, Y)$ is a second order $\mathcal H$-valued variable one can prove by using the Hilbert space version of the Central Limit Theorem that the sequence $(\Lambda_n)_{n \ge 1}$ converges weakly to a centered gaussian variable (Ledoux and Talagrand, 1991). We give hereafter a direct proof of this result to illustrate the simplicity of calculations in RKHS and conditions of relative compactness.
For $n \ge 1$, $\lambda_n$ will denote the law of probability of $\Lambda_n$ on $\mathcal H$.
Let us first prove two lemmas.

LEMMA 22 $\Lambda_n$ is a second order random variable and

Proof. $\Lambda_n$ is a sum of second order random variables. Hence it is a second order variable. As

we have


Expanding the inner product in the first sum we get


But

and

Therefore

x E 1£,

and the formula follows.

LEMMA 23 The sequence $(\lambda_n)_{n \in \mathbb N^*}$ is relatively compact in $\Pr(\mathcal H)$ equipped with the weak topology.

Proof. In the case where $\mathcal H$ has a finite dimension, the conclusion follows from Lemma 22 and Tchebychev inequality:

Let $\varepsilon > 0$. For $R$ large enough the closed ball $B^*(0, R)$ is a compact set with $\lambda_n$-measure greater than $1 - \varepsilon$, for any $n \ge 1$.
In the case where $\mathcal H$ has infinite dimension, a sufficient condition for relative compactness is (Parthasarathy (1967) and Suquet (1994)):

$$\sup_{n \ge 1} \int r_N(x) \, d\lambda_n(x) \to 0 \ \text{ as } N \to \infty$$

where

$$r_N(x) = \sum_{i=N}^{\infty} \langle x, e_i \rangle^2_{\mathcal H},$$

and $(e_i)_{i \in \mathbb N}$ is an orthonormal basis in $\mathcal H$. Let $n \ge 1$. As $\Lambda_n$ is a second order variable, $\langle \Lambda_n, e_i \rangle^2_{\mathcal H}$ is $P$-integrable for all $i \ge 0$ and

$$\sum_{i=0}^{\infty} E \langle \Lambda_n, e_i \rangle^2_{\mathcal H} = E(\|\Lambda_n\|^2). \qquad (4.15)$$

Now,

If $k \neq \ell$

hence

$$\int \langle \Lambda_n, e_i \rangle^2_{\mathcal H} \, dP = \int \big[ e_i(Y) - \langle \mathcal I_\mu, e_i \rangle_{\mathcal H} \big]^2 dP = \operatorname{Var}(e_i(Y)) = E(e_i(Y)^2) - \langle \mathcal I_\mu, e_i \rangle^2.$$

The last quantity is independent of $n$ so that $E(r_N(\Lambda_n))$ is independent of $n$ and tends to $0$ with $1/N$ since it is the remainder of order $N$ of the series in (4.15). ■

From Lemma 22 and 23 we get the following weak convergence theorem.

THEOREM 108 The sequence (An)n>l of laws of probability of the ran­dom variables (An)n>l converges weakly to a centered gaussian probabil­ity on 1i with covariance function given by

C(J,g) = JfgdJl- JfdJl Jq du, (J,g) E 1i2.

Proof. For n ;:::: 1 the characteristic functional of An is denoted ~n' For9 in 1i we have

~n(g) = Jexp (i < f, 9 >1£) dAn(J)

E(exp(i < An,g >1£))

E (exp (In) It.9 (Y, l- <:[.,9 >1<])

[E (exp ()n[g(y)- < IJ1-' 9 >1£]) ) rLet <PZg be the characteristic function of the real variable

Since

Measures and mndom measures

Zg is a centered variable. Now,

E(Z;) = Var(g(Y)) = E(g(y)2)_ < t.;9 >~

therefore a Taylor series expansion at the point 0 yields

CPZg(t) = 1 _ Var(;(Y)) t 2 + o(t 2 )

and

\ ( ) _ [ (_1)]n _( _var(g(y)))n (~)An 9 - 'PZg rz; - 1 2 + 0 .

yn n n

Hence

231

lim >'n(g) = exp (_ var(g(y))) .n~oo 2

Applying Lemma 2.1 of Parthasarathy (1967) we can conclude that thereexists AO in Pr(Ji) such that

An -+ AO weakly as n --t 00.

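As the characteristic functional of $\lambda_0$ is given by

$$\hat{\lambda}_0(g) = \exp\left(-\tfrac{1}{2} \mathrm{Var}(g(Y))\right), \qquad g \in \mathcal{H},$$

$\lambda_0$ is a centered gaussian distribution on $\mathcal{H}$. Let $C$ be its covariance function, $S$ be its covariance operator and let $f$ and $g$ be in $\mathcal{H}$. We have

$$C(f, g) = \langle Sf, g \rangle_{\mathcal{H}}, \qquad \langle Sg, g \rangle_{\mathcal{H}} = \mathrm{Var}(g(Y))$$

and

$$\langle Sf, g \rangle_{\mathcal{H}} = \tfrac{1}{2} [\langle S(f + g), f + g \rangle_{\mathcal{H}} - \langle Sf, f \rangle_{\mathcal{H}} - \langle Sg, g \rangle_{\mathcal{H}}].$$

Hence

$$C(f, g) = \tfrac{1}{2} [\mathrm{Var}((f + g)(Y)) - \mathrm{Var}(f(Y)) - \mathrm{Var}(g(Y))] = \mathrm{Cov}(f(Y), g(Y)) = E(fg(Y)) - E(f(Y)) \, E(g(Y)),$$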

and the conclusion follows. •
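As a numerical sanity check of the covariance formula of Theorem 108, the following Python sketch simulates the pairings $\langle \Lambda_n, f \rangle$ and $\langle \Lambda_n, g \rangle$, written as $n^{-1/2} \sum_k [h(Y_k) - \langle I_\mu, h \rangle]$ as in the proof, and compares their empirical covariance with $C(f, g)$. Everything concrete here is an illustrative assumption and not taken from the text: $E = [0, 1]$, $\mu$ the uniform law, $K(x, y) = \min(x, y)$, $f = K(\cdot, 0.3)$, $g = K(\cdot, 0.7)$, the sample sizes and all function names.

```python
import math
import random


def kernel(x, y):
    """Assumed kernel K(x, y) = min(x, y) on E = [0, 1] (illustrative choice)."""
    return min(x, y)


def integrate(h, m=100_000):
    """Midpoint rule for the integral of h against the uniform law on [0, 1]."""
    return sum(h((j + 0.5) / m) for j in range(m)) / m


def pairing(h, mean_h, ys):
    """<Lambda_n, h> = n^{-1/2} * sum_k [h(Y_k) - <I_mu, h>]."""
    return sum(h(y) - mean_h for y in ys) / math.sqrt(len(ys))


def empirical_covariance(f, g, n=50, reps=2000, seed=0):
    """Monte Carlo estimate of Cov(<Lambda_n, f>, <Lambda_n, g>)."""
    rng = random.Random(seed)
    mean_f, mean_g = integrate(f), integrate(g)
    acc = 0.0
    for _ in range(reps):
        ys = [rng.random() for _ in range(n)]
        # Both pairings are centered, so the mean of the product
        # estimates their covariance.
        acc += pairing(f, mean_f, ys) * pairing(g, mean_g, ys)
    return acc / reps


def f(y):
    return kernel(y, 0.3)


def g(y):
    return kernel(y, 0.7)


# C(f, g) = int f g dmu - int f dmu * int g dmu, computed by quadrature.
exact = integrate(lambda y: f(y) * g(y)) - integrate(f) * integrate(g)
estimate = empirical_covariance(f, g)
print(exact, estimate)
```

Under these assumptions the two printed numbers should be close, and the agreement does not depend on $n$, since the covariance operators of $\Lambda_n$ and of $K(\cdot, Y) - I_\mu$ coincide.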

The covariance operator $S$ of $\lambda_0$ and the covariance operators of the variables $\Lambda_n$ and $(K(\cdot, Y) - I_\mu)$ are equal. Their kernel (in the sense of Definition 5 of Chapter 1) is the function associating with $(s, t)$ in $E^2$ the real

$$C(K(\cdot, t), K(\cdot, s)) = E[K(t, Y) K(s, Y)] - I_\mu(t) I_\mu(s)$$

$$= \int K(t, x) K(s, x) \, d\mu(x) - \int K(t, x) \, d\mu(x) \int K(s, x) \, d\mu(x).$$


9.2. CONVERGENCE OF RANDOM MEASURES

After the works by Bosq (1979) and Berlinet (1980), Suquet (1986) presented a general framework to define and study random measures as RKHS valued random variables. A first gain over the classical theory is that there is no need to define a priori a topology on $E$. The counterpart is that we need an inner product on the set $\mathcal{M}$ of signed measures on $(E, \mathcal{T})$ or equivalently a mapping from $\mathcal{M}$ into some RKHS $\mathcal{H}$ of functions on $E$. The hilbertian construction of random measures carried out by Suquet needs a metric on the set $E$. Three cases are distinguished in his paper according to whether $E$ is compact, locally compact or separable. In the classical theory $E$ is usually supposed to be locally compact with countable basis (Kallenberg, 1983). In this subsection we will limit ourselves to the case where $E$ is a compact metric space and $\mathcal{T}$ is its Borel $\sigma$-algebra and give one theorem about convergence in law. For weaker conditions and convergence with respect to other stochastic modes the reader is referred to Suquet (1993).

As before we consider a sequence $(e_i)$ of measurable functions characterizing signed measures on $(E, \mathcal{T})$ (Definition 35) and satisfying

$$\sum_{i=0}^{\infty} \|e_i\|_\infty^2 < \infty.$$

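As $E$ is a compact space, it is always possible to build such a sequence from a sequence that is total in the separable space of continuous functions on $E$.

Defining the function $K$ on $E^2$ by

$$K(x, y) = \sum_{i=0}^{\infty} e_i(x) e_i(y)$$

we can define an inner product $\langle \cdot, \cdot \rangle_{\mathcal{M}}$ on $\mathcal{M}$ by setting

$$\langle \mu, \nu \rangle_{\mathcal{M}} = \int K \, d(\mu \otimes \nu) = \sum_{i=0}^{\infty} \left(\int e_i \, d\mu\right) \left(\int e_i \, d\nu\right).$$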
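The two expressions of the inner product above can be compared numerically on measures with finite support. The sketch below is only an illustration under assumptions that are not in the text: $E = [0, 1]$, $e_i(x) = (x/2)^i$ (so that $\sum_i \|e_i\|_\infty^2 < \infty$ and $K(x, y) = 1/(1 - xy/4)$ in closed form), a series truncated at 80 terms, and hypothetical function names. A signed measure $\sum_j a_j \delta_{x_j}$ is represented as a list of pairs `(x_j, a_j)`.

```python
def e(i, x):
    """Assumed basis function e_i(x) = (x / 2)^i on E = [0, 1]."""
    return (x / 2.0) ** i


def kernel(x, y):
    """Closed form of sum_i e_i(x) e_i(y) = sum_i (x*y/4)^i for x, y in [0, 1]."""
    return 1.0 / (1.0 - x * y / 4.0)


def inner_kernel(mu, nu):
    """<mu, nu>_M computed as the integral of K against mu (x) nu."""
    return sum(a * b * kernel(x, y) for x, a in mu for y, b in nu)


def inner_series(mu, nu, terms=80):
    """<mu, nu>_M computed as sum_i (int e_i dmu)(int e_i dnu), truncated."""
    total = 0.0
    for i in range(terms):
        int_mu = sum(a * e(i, x) for x, a in mu)
        int_nu = sum(b * e(i, y) for y, b in nu)
        total += int_mu * int_nu
    return total


mu = [(0.2, 1.0), (0.8, -1.0)]  # delta_{0.2} - delta_{0.8}
nu = [(0.5, 2.0), (1.0, 0.5)]   # 2 delta_{0.5} + 0.5 delta_{1}

print(inner_kernel(mu, nu), inner_series(mu, nu))
print(inner_kernel(mu, mu))  # squared norm of mu, strictly positive here
```

With this choice of $(e_i)$ the truncation error is of order $4^{-80}$, so both computations agree up to rounding.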

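The function $K$ is the reproducing kernel of a space $\mathcal{H}$, the mapping

$$I : \mathcal{M} \to \mathcal{H}, \qquad \mu \mapsto \int K(\cdot, y) \, d\mu(y)$$

is an isometry from $\mathcal{M}$ onto $I(\mathcal{M})$ and we have

$$\forall f \in \mathcal{H}, \ \forall \mu \in \mathcal{M}, \qquad \langle f, I(\mu) \rangle_{\mathcal{H}} = \int f \, d\mu.$$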


$I(\mathcal{M})$ is dense in $\mathcal{H}$ and the topology defined by $\langle \cdot, \cdot \rangle_{\mathcal{M}}$ coincides with the weak topology. The sets $I(\mathcal{M})$ and $I(\mathcal{M}^+)$, $\mathcal{M}^+$ denoting the set of bounded positive measures on $E$, belong to the $\sigma$-algebra $\mathcal{B}_{\mathcal{H}}$ of $\mathcal{H}$. This last property is not true in the general case. It is of great importance for it makes the following definition possible.

DEFINITION 36 A random measure (respectively a positive random measure) on $(E, \mathcal{T})$ based on a probability space $(\Omega, \mathcal{A}, P)$ is a random variable defined on $(\Omega, \mathcal{A}, P)$ and taking almost surely its values in $I(\mathcal{M})$ (respectively $I(\mathcal{M}^+)$) equipped with the $\sigma$-algebra induced by $\mathcal{B}_{\mathcal{H}}$.

It follows from Corollary 12 that a variable $\mu^{(\cdot)}$ defined on $(\Omega, \mathcal{A}, P)$ taking almost surely its values in $I(\mathcal{M})$ is a random measure if and only if for any $y \in E$ the function

$$\int K(x, y) \, d\mu^{(\cdot)}(x)$$

defines a real random variable on $(\Omega, \mathcal{A}, P)$. This last condition is equivalent to the measurability of

$$\int f \, d\mu^{(\cdot)}$$

for any element $f$ of $\mathcal{H}$. Indeed, in the present setting the role of the random variables $\{\int f \, d\mu^{(\cdot)} : f \in \mathcal{H}\}$ is similar to the role played by the variables $\{\mu^{(\cdot)}(A) : A \in \mathcal{C}\}$ in the classical theory of random measures, $\mathcal{C}$ being some subset of $\mathcal{T}$.

Now, the notion of convergence of a sequence of random measures, with respect to some stochastic mode (almost surely, in probability, in law) can be easily derived from the same notion of convergence of Hilbert valued random variables. However, few theorems characterize convergence by means of convergence of the random variables $\{\int f \, d\mu^{(\cdot)} : f \in \mathcal{H}\}$ without an additional condition of compactness. We cite the following theorem under the hypotheses made in this subsection.

THEOREM 109 Let $(\mu_n^{(\cdot)})$ be a sequence of positive random measures and $\mu^{(\cdot)}$ be a positive random measure. Then $(\mu_n^{(\cdot)})$ converges in law to $\mu^{(\cdot)}$ if and only if for any $f$ in $\mathcal{H}$ the sequence of real random variables $(\int f \, d\mu_n^{(\cdot)})$ converges in law to $\int f \, d\mu^{(\cdot)}$.

See Suquet (1993) for the proof and other modes of convergence.
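To make the idea of a random measure viewed as an RKHS valued variable more concrete, here is a small Python sketch for the empirical measure $P_n$ of i.i.d. observations. It is an illustration under assumptions that are not in the text: $E = [0, 1]$, observations uniform on $E$ with law $\mu$, $e_i(x) = (x/2)^i$ with the series truncated at 30 terms, and hypothetical function names. It estimates the mean of $\|I(P_n) - I(\mu)\|^2 = \sum_i (\int e_i \, dP_n - \int e_i \, d\mu)^2$ and compares it with the exact value $\frac{1}{n} \sum_i \mathrm{Var}(e_i(X))$. This is convergence in quadratic mean, a different mode from the convergence in law of Theorem 109.

```python
import random

TERMS = 30  # truncation level of the series defining K (assumption)


def e(i, x):
    """Assumed basis function e_i(x) = (x / 2)^i on E = [0, 1]."""
    return (x / 2.0) ** i


def mean_e(i):
    """Integral of e_i against the uniform law on [0, 1]: 2^{-i} / (i + 1)."""
    return 0.5 ** i / (i + 1)


def var_e(i):
    """Var(e_i(X)) = E[(X/2)^{2i}] - (E[(X/2)^i])^2 for X uniform on [0, 1]."""
    return 0.25 ** i / (2 * i + 1) - mean_e(i) ** 2


def squared_distance(sample):
    """||I(P_n) - I(mu)||^2 for the empirical measure P_n of the sample."""
    n = len(sample)
    return sum(
        (sum(e(i, x) for x in sample) / n - mean_e(i)) ** 2
        for i in range(TERMS)
    )


def mean_squared_distance(n, reps=300, seed=1):
    """Monte Carlo estimate of E ||I(P_n) - I(mu)||^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += squared_distance([rng.random() for _ in range(n)])
    return total / reps


def expected_squared_distance(n):
    """Exact value (1/n) * sum_i Var(e_i(X)) of E ||I(P_n) - I(mu)||^2."""
    return sum(var_e(i) for i in range(TERMS)) / n


print(mean_squared_distance(50), expected_squared_distance(50))
```

Under these assumptions the expected squared distance decreases exactly like $1/n$, which the simulation reproduces up to Monte Carlo error.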


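10. EXERCISES

1 From Guilbart (1978a).

Let $E = [0, 0.5]$, $\mathcal{T}$ be its Borel $\sigma$-algebra and $\mathcal{M}$ be the set of signed measures on $(E, \mathcal{T})$, as in the moments example (Subsection 1.3). By the Lebesgue decomposition Theorem, any element $\mu$ of $\mathcal{M}$ can be written

$$\mu = S(\mu) + s(\mu)$$

where $s(\mu)$ is a measure with finite or countable support and $S(\mu)$ is a measure giving the mass 0 to any singleton. Consider the mapping

$$\langle\langle \cdot, \cdot \rangle\rangle_{\mathcal{M}} : \mathcal{M} \times \mathcal{M} \to \mathbb{R}, \qquad (\mu, \nu) \mapsto \langle \mu, \nu \rangle + \langle S(\mu), S(\nu) \rangle$$

so that we can write

$$\langle\langle \mu, \nu \rangle\rangle_{\mathcal{M}} = \sum_{i=0}^{\infty} \mu_i \nu_i + \sum_{i=0}^{\infty} S(\mu)_i S(\nu)_i$$

where

$$\mu_i = \int x^i \, d\mu(x)$$

denotes the moment of order $i$ of the measure $\mu$.

Prove that $\langle\langle \cdot, \cdot \rangle\rangle_{\mathcal{M}}$ is an inner product on $\mathcal{M}$ but that

$$\|\mu\|_{\mathcal{M}}^2 \neq \int \langle\langle \delta_x, \delta_y \rangle\rangle_{\mathcal{M}} \, d(\mu \otimes \mu)(x, y)$$

whenever $s(\mu) = 0$ and $S(\mu) \neq 0$.

Prove that $\mathcal{M}_0$ is not dense in $\mathcal{M}$.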

2 From Guilbart (1978a).

Let $(E, \mathcal{T})$ be a measurable space and $\mathcal{K}$ be the set of bounded measurable reproducing kernels on $E \times E$ such that the formula

$$\langle \mu, \nu \rangle_K = \int K \, d(\mu \otimes \nu)$$

defines an inner product on the set $\mathcal{M}$ of signed measures on $E$. Let $\mu_1, \ldots, \mu_n$ be $n$ linearly independent elements of $\mathcal{M}$, $\mathcal{M}_n$ be the subspace of $\mathcal{M}$ spanned by $\mu_1, \ldots, \mu_n$ and $\Pi_K$ be the projection onto $\mathcal{M}_n$ in the sense of $\langle \cdot, \cdot \rangle_K$. The set $\mathcal{K}$ is endowed with the uniform metric

$$d_u(K_1, K_2) = \sup_{(x, y) \in E \times E} |K_1(x, y) - K_2(x, y)|.$$

1) Show that, for any $\nu$ in $\mathcal{M}$, the mapping

$$\pi_\nu : K \mapsto \Pi_K(\nu)$$

is continuous.

2) Let $\mathbb{K}$ be a subset of $\mathcal{K}$ such that for any $K$ in $\mathbb{K}$, the matrix

$$(\langle \mu_i, \mu_j \rangle_K)_{1 \leq i, j \leq n}$$

has its determinant lower bounded by some constant $a > 0$. Show that, for any $\nu$ in $\mathcal{M}$, the restriction of the mapping $\pi_\nu$ to $\mathbb{K}$ is Lipschitz continuous.

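3) Let $\|\cdot\|$ be any norm on $\mathcal{M}_n$ and $\mathfrak{M}$ be a part of $\mathcal{M}$ such that

$$\sup_{\mu \in \mathfrak{M}} |\mu|(E) < \infty$$

where $|\mu| = \mu^+ + \mu^-$ is the total variation of $\mu$ (Rudin, 1975). Prove the existence of a constant $C$ such that

$$\sup_{\nu \in \mathfrak{M}} \|\Pi_{K_1}(\nu) - \Pi_{K_2}(\nu)\| \leq C \, d_u(K_1, K_2).$$

4) Let $(K_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{K}$ converging pointwise to $K$. Then

$\forall \nu \in \mathcal{M},$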

3 Glivenko-Cantelli type theorem. From Guilbart (1977a).

Let $(E, \mathcal{T})$ be a measurable space and $(X_i)_{i \geq 1}$ be a sequence of independent random variables defined on a probability space $(\Omega, \mathcal{A}, P)$, taking their values in $(E, \mathcal{T})$ and having the same probability distribution $P_{X_1}$. We denote by

$$P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$$

the empirical measure associated with $X_1, \ldots, X_n$.

Suppose that $(f_i)_{i \geq 1}$ is a sequence of bounded measurable real functions defined on $(E, \mathcal{T})$ such that

$$\sup_{i \geq 1, \, x \in E} \frac{|f_i(x)|}{a_i^{1/2}} < \infty$$

where $(a_i)$ is a sequence of positive numbers with finite sum.

Let $K$ be defined on $E \times E$ by

$$K(x, y) = \sum_{i=1}^{\infty} f_i(x) f_i(y).$$

Finally suppose that the formula

$$\langle \mu, \nu \rangle_{\mathcal{M}} = \int K \, d(\mu \otimes \nu)$$

defines an inner product on the space $\mathcal{M}$ of signed measures on $(E, \mathcal{T})$.

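1) Prove that the mapping

is measurable.

2) Let $\varepsilon$ be a positive number and define the following "neighborhood" of $P_{X_1}$ in $\mathcal{M}$:

$$V = \left\{\nu : \left|\int g_i \, d\nu - \int g_i \, dP_{X_1}\right| < \varepsilon, \ \forall i \in \{1, \ldots, \ell\}\right\},$$

where $g_1, \ldots, g_\ell$ are $\ell$ bounded measurable real functions defined on $(E, \mathcal{T})$. Prove that

$$P[\forall n \geq N, \ P_n \in V] \geq 1 - \frac{c}{N},$$

where

$$c = \frac{8 \ell M_0^2}{\varepsilon^2} \quad \text{and} \quad M_0 = \max_{1 \leq i \leq \ell} \sup_{x \in E} |g_i(x)|.$$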

3) Prove that

where

$$M = \sup_{i \geq 1, \, x \in E} \frac{|f_i(x)|}{a_i^{1/2}}$$

and $n_0$ satisfies

4 From Guilbart (1977a).

Let $(\alpha_i)$ be a sequence of positive numbers with finite sum and let, for $x$ and $y$ in the set $\mathbb{R}^+$ of non negative real numbers,

$$K(x, y) = \sum_{i=0}^{\infty} \alpha_i \exp(-ix) \exp(-iy).$$

(4.16)

The set $\mathbb{R}^+$ is equipped with the Euclidean distance and its Borel $\sigma$-algebra is denoted by $\mathcal{B}_{\mathbb{R}^+}$. Let $\mathcal{M}$ be the space of signed measures on $(\mathbb{R}^+, \mathcal{B}_{\mathbb{R}^+})$.

1) Prove that the mapping defined on $\mathcal{M}^2$ by

$$(\mu, \nu) \mapsto \langle \mu, \nu \rangle_{\mathcal{M}} = \int K \, d(\mu \otimes \nu)$$

is an inner product on $\mathcal{M}$ and that the topology associated with this inner product coincides with the weak topology.

2) Prove the existence of a constant $k > 0$ such that

5 From Suquet (1994).

Let $E$ be a metric space and $\mathcal{M}$ be the space of signed measures on its Borel $\sigma$-algebra $\mathcal{T}$. Let $p$ be a positive measure on some measurable space $(U, \mathcal{U})$ and $r$ a complex function defined on $E \times U$ such that

$$\sup_{x \in E} \|r(x, \cdot)\|_{L^2(U, p)} < \infty.$$

Define $K$ on $E \times E$ by

$$K(x, y) = \int r(x, u) r(y, u) \, dp(u).$$

1) Show that $K$ is a bounded reproducing kernel.

2) Denote by $\mathcal{H}$ the Hilbert space of functions on $E$ with reproducing kernel $K$. Prove that a complex function $h$ defined on $E$ belongs to $\mathcal{H}$ if and only if there exists an element $\varphi$ of $L^2(p)$ such that

$$h(x) = \int \varphi(u) r(x, u) \, dp(u). \qquad (4.17)$$

3) Let $\mathcal{R}$ be the closed subspace of $L^2(U, p)$ spanned by the functions $\{r(x, \cdot) : x \in E\}$. Show that there is a unique element $\varphi$ of $\mathcal{R}$ satisfying (4.17). Denote it by $\Phi(h)$ and prove that the mapping

$$\Phi : \mathcal{H} \to \mathcal{R}, \qquad f \mapsto \Phi(f)$$

is an isometry.

4) Let $\mu$ be an element of $\mathcal{M}$. Show that the function $r(\cdot, u)$ is $\mu$-integrable except possibly for a set $E_\mu$ of $u$ such that $p(E_\mu) = 0$. In other words $r(\cdot, u)$ is $\mu$-integrable $p$-almost everywhere.

5) Now suppose that for any $\mu$ in $\mathcal{M}$

$$\left(\int r(x, \cdot) \, d\mu(x) = 0 \ \ p\text{-a.s.}\right) \Longrightarrow (\mu = 0).$$

Show that the mapping defined on $\mathcal{M}^2$ by

$$(\mu, \nu) \mapsto \langle \mu, \nu \rangle_{\mathcal{M}} = \int K \, d(\mu \otimes \nu)$$

is an inner product on $\mathcal{M}$.

6 From Suquet (1994).

Under the assumptions of Subsection 9.2 prove that

1) for any random measure $\mu^{(\cdot)}$ the function

$$\int f \, d\mu^{(\cdot)}$$

defines a real random variable whenever $f$ is continuous on $E$.

2) for any random measure $\mu^{(\cdot)}$, its total variation, its positive part and its negative part are also random measures.

7 The notation and the assumptions are those of Subsection 9.1.1) Let v belong to the set M of bounded signed measures on E.

Prove that 'l1K is v-integrable and that the set

$$\left\{\int K(\cdot, t) \, d\nu(t) : \nu \in \mathcal{M}\right\}$$

is a dense subset of $\mathcal{H}$.

2) Prove that the image $\operatorname{Im} N$ of the Schwartz kernel

$$N : L^2(\mu) \to L^2(\mu), \qquad g \mapsto \int K(\cdot, t) g(t) \, d\mu(t)$$

is dense in $\mathcal{H}$.

3) Prove that an element $f$ of $\mathcal{H}$ belongs to $\operatorname{Im} N$ if and only if the linear form $l_f : k \mapsto \langle k, f \rangle_{\mathcal{H}}$ is continuous on $\mathcal{H}$ for the topology induced by the topology of $L^2(\mu)$.

8 The notation is the same as in Subsection 9.1 but stronger assumptions are made. $E$ is supposed to be a compact topological space with Borel $\sigma$-algebra $\mathcal{T}$ and the reproducing kernel $K$ is supposed to be continuous on $E^2$. The measure $\mu$ is supposed to have a support equal to $E$. Then the conditions in H1, H2 and H3 are clearly satisfied. As $K$ belongs to $L^2(\mu \otimes \mu)$ the Schwartz kernel $N$ (see Exercise 7) is a Hilbert-Schmidt operator. Therefore there exists an orthonormal basis $(h_n)_{n \in \mathbb{N}}$ of $L^2(\mu)$ such that each $h_n$ is an eigenfunction of $N$ associated with an eigenvalue $\lambda_n$ and

$$\sum_{n=0}^{\infty} \lambda_n^2 < \infty.$$

Moreover the sequence $(\sqrt{\lambda_n} \, h_n)_{n \in \mathbb{N}_0}$, where $\mathbb{N}_0 = \{n \mid \lambda_n \neq 0\}$, is an orthonormal basis of $\mathcal{H}$ and

$$K(s, t) = \sum_{n \in \mathbb{N}_0} \lambda_n h_n(s) h_n(t),$$

the last series converging in $\mathcal{H}$,

$$\forall t \in E, \qquad K(\cdot, t) = \sum_{n \in \mathbb{N}_0} \lambda_n h_n(t) \, h_n,$$

but also uniformly on $E^2$ by Mercer's theorem.

1) Prove that an element $f$ of $\mathcal{H}$ belongs to $\operatorname{Im} N$ if and only if

$$\sum_{n \in \mathbb{N}_0} \langle f, h_n \rangle_{\mathcal{H}}^2 < \infty.$$

2) Prove that the condition given above is equivalent to

$$\sum_{n \in \mathbb{N}_0} \frac{\langle f, h_n \rangle_{L^2(\mu)}^2}{\lambda_n^2} < \infty.$$

3) Let $S_1$ (respectively $S_2$) be the closed subspace of $L^2(\mu)$ spanned by $(K(\cdot, s))_{s \in E}$ (respectively $(h_n)_{n \in \mathbb{N}_0}$). Let $S_3$ be the subspace of $L^2(\mu)$ orthogonal to the null space of $N$. Prove that

$$S_1 = S_2 = S_3.$$

9 Let $K$ be a real reproducing kernel on a metric space $E$. Suppose that $K$ is bounded, measurable and defines an inner product on the set $\mathcal{M}$ of signed measures on $E$. Prove that $K$ has a unique support equal to $E$ (see Chapter 1, Subsection 4.3 for the definition of the support of a reproducing kernel).

