Chapter 4
MEASURES AND RANDOM MEASURES
1. INTRODUCTION
Since its foundation by Borel and Lebesgue around the year 1900 the modern theory of measure, generalizing the basic notions of length, area and volume, has become one of the major fields in Pure and Applied Mathematics. In all human activities one collects measurements subject to variability, which leads to the classical concepts of Probability and Mathematical Statistics used to model "observations": random variables, samples, point processes. All of them belong to Measure Theory.
The study of point processes and more generally of random measures has recently undergone considerable development. It requires sophisticated mathematical tools. The very definition of random measures raises delicate problems, as does the need for a notion of closeness between them.
Here we will first adopt a naive point of view, starting with Dirac measures and showing how reproducing kernels can be used to represent measures in functional spaces (Section 1). Then we will exploit the embedding of classes of measures in RKHS (Sections 6 and 7) to define inner products on sets of measures. Finally we will show how random measures can be treated as random variables taking their values in RKHS (Section 9). Applications will be given to empirical and Donsker measures and to Berry-Esseen bounds (Section 8). Sections 2, 3, 4 and 5 deal with properties of variables taking their values in RKHS (measurability, gaussian measure, weak convergence in the set of probabilities over a RKHS, integrability).
A. Berlinet et al., Reproducing Kernel Hilbert Spaces In Probability and Statistics
© Springer Science+Business Media New York 2004
1.1. DIRAC MEASURES
Let $E$ be a fixed non-empty set and $\mathcal{T}$ be a $\sigma$-algebra of subsets of $E$. By a measure we mean a countably additive function $\mu$ from $\mathcal{T}$ into $\mathbb{C}$ such that $\mu(\emptyset) = 0$. This means that for any subset $I$ of $\mathbb{N}$ and any family $\{A_i, i \in I\}$ of pairwise disjoint elements of $\mathcal{T}$ one has
$$\mu\Big(\bigcup_{i \in I} A_i\Big) = \sum_{i \in I} \mu(A_i).$$
At some places we consider set functions satisfying the above property only with finite index set $I$. We call them finitely additive measures.
The simplest example of measure on $(E, \mathcal{T})$ is the Dirac measure $\delta_x$ defined for $x$ in $E$ by
$$\delta_x(A) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$
where $A$ belongs to $\mathcal{T}$. Any measurable complex function $f$ on $E$ is integrable with respect to $\delta_x$ and we have
$$\int f(t)\, d\delta_x(t) = f(x).$$
The Dirac measure $\delta_x$ is a probability measure on $(E, \mathcal{T})$ assigning the mass 1 to the set $\{x\}$. When $f$ belongs to some Hilbert space $\mathcal{H}$ of functions on $E$ with reproducing kernel $K$, integrating $f$ with respect to $\delta_x$ or computing $\langle f, K(., x) \rangle$ gives the same result $f(x)$, the value of $f$ at the point $x$. The mapping
$$\delta_y \longmapsto K(., y)$$
embeds in $\mathcal{H}$ the set of Dirac measures on $E$ and, if the function $K(x, .)$ is measurable, the value $K(x, y)$ of the function $K(., y)$ at the point $x$ can be written as the integral
$$\int K(x, t)\, d\delta_y(t). \qquad (4.1)$$
More generally if $x_1, \ldots, x_n$ are $n$ distinct points in $E$ and $a_1, \ldots, a_n$ are $n$ nonzero real numbers, a linear combination
$$\sum_{i=1}^{n} a_i \delta_{x_i}$$
of Dirac measures puts the mass $a_i$ (positive or negative) at the point $x_i$. Such a linear combination can assign positive, negative or null measures to elements of $\mathcal{T}$. It is called a signed measure. As $a_1, \ldots, a_n$ are all different from 0, the support of the measure $\sum_{i=1}^{n} a_i \delta_{x_i}$ is equal to the finite set $\{x_1, \ldots, x_n\}$. For any measurable function $f$ one has
$$\int f\, d\Big(\sum_{i=1}^{n} a_i \delta_{x_i}\Big) = \sum_{i=1}^{n} a_i f(x_i) = \sum_{i=1}^{n} a_i e_{x_i}(f),$$
where $e_{x_i}$ is the evaluation functional at the point $x_i$. This extends the previous remark on Dirac measures and exhibits the connection between RKHS and measures with finite support. Actually any Hilbert space $\mathcal{H}$ of functions on $E$ with reproducing kernel $K$ contains, as a dense subset, the set $\mathcal{H}_0$ of linear combinations
$$\sum_{i=1}^{n} a_i K(., x_i), \quad n \geq 1, \quad (a_1, \ldots, a_n) \in \mathbb{C}^n, \quad (x_1, \ldots, x_n) \in E^n,$$
with the property that, for any measurable $f$ in $\mathcal{H}$,
$$\Big\langle f, \sum_{i=1}^{n} a_i K(., x_i) \Big\rangle = \sum_{i=1}^{n} a_i f(x_i) = \int f\, d\mu,$$
where
$$\mu = \sum_{i=1}^{n} a_i \delta_{x_i}$$
is the discrete measure putting the mass $a_i$ at the point $x_i$. In Section 2 we will see that any element of $\mathcal{H}$ is measurable whenever the kernel $K$ is measurable. Thus
the dense subset $\mathcal{H}_0$ can be seen as the set of representers in $\mathcal{H}$ of measures on $E$ with finite support.
The mapping
$$\sum_{i=1}^{n} a_i \delta_{x_i} \longmapsto \sum_{i=1}^{n} a_i K(., x_i)$$
embeds in $\mathcal{H}$ the set of measures on $E$ with finite support and, with the same measurability condition on $K(x, .)$ as above, the value
$$\sum_{i=1}^{n} a_i K(x, x_i)$$
of the function $\sum_{i=1}^{n} a_i K(., x_i)$ at the point $x$ can be written as the integral
$$\int K(x, t)\, d\Big(\sum_{i=1}^{n} a_i \delta_{x_i}\Big)(t) = \int K(x, t)\, d\mu(t). \qquad (4.2)$$
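The embedding of measures with finite support can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian kernel is one possible choice of reproducing kernel on the real line, and the names `gaussian_kernel` and `representer` are hypothetical, not taken from the text.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on the real line (an illustrative choice of K)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def representer(weights, points, kernel=gaussian_kernel):
    """Return the function x -> sum_i a_i K(x, x_i), i.e. the integral (4.2)
    of K(x, .) with respect to the measure sum_i a_i * delta_{x_i}."""
    def i_mu(x):
        return sum(a * kernel(x, xi) for a, xi in zip(weights, points))
    return i_mu

# mu = 2*delta_0 - 1*delta_1: a signed measure with support {0, 1}.
i_mu = representer([2.0, -1.0], [0.0, 1.0])
value_at_half = i_mu(0.5)

# A single Dirac measure delta_y is represented by K(., y), as in (4.1).
i_dirac = representer([1.0], [0.3])
```

Here `i_dirac(x)` coincides with `gaussian_kernel(x, 0.3)` for every `x`, which is the content of (4.1) for this particular kernel.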
Now, a measure $\mu$ on $E$ being given, suppose that all integrals
$$I_\mu(x) = \int K(x, t)\, d\mu(t), \quad x \in E$$
exist and that they define a function $I_\mu$ which belongs to $\mathcal{H}$. Generalizing (4.1) and (4.2) we can define the representer in $\mathcal{H}$ of the measure $\mu$ as being equal to the function $I_\mu$.
In this way, if all functions $K(x, .)$, $x \in E$, are measurable, we can define a mapping
$$I : \mathcal{M} \longrightarrow \mathcal{H}, \qquad \mu \longmapsto I_\mu = \int K(., t)\, d\mu(t),$$
where $\mathcal{M}$ is the set of signed measures $\mu$ for which the function $I_\mu$ exists and belongs to $\mathcal{H}$. The set $\mathcal{M}$ always contains the set $\mathcal{M}_0$ of measures with finite support.
Properties of integrals of $\mathcal{H}$-valued mappings will be described in Section 5. At this stage assume that inner product and integrals can be exchanged. Then we can write formally, for $\mu$ and $\nu$ in $\mathcal{M}$,
$$\begin{aligned}
\langle I_\mu, I_\nu \rangle_{\mathcal{H}} &= \Big\langle \int K(., t)\, d\mu(t), \int K(., s)\, d\nu(s) \Big\rangle_{\mathcal{H}} \\
&= \int \Big\langle K(., t), \int K(., s)\, d\nu(s) \Big\rangle_{\mathcal{H}}\, d\mu(t) \\
&= \int \Big( \int \langle K(., t), K(., s) \rangle_{\mathcal{H}}\, d\nu(s) \Big)\, d\mu(t) \\
&= \int \Big( \int K(s, t)\, d\nu(s) \Big)\, d\mu(t).
\end{aligned}$$
Finally, assuming the validity of the Fubini formula for product measures, one gets:
the inner product of the representers in $\mathcal{H}$ of two measures $\mu$ and $\nu$ is equal to the integral of the kernel of $\mathcal{H}$ with respect to the product measure $\mu \otimes \nu$.
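For two measures with finite support the integral of the kernel with respect to the product measure reduces to a double sum, so the statement can be checked directly. The sketch below assumes a Gaussian kernel (an illustrative choice) and represents a measure as a list of (mass, point) pairs; the names `inner` and `squared_distance` are hypothetical.

```python
import math

def kernel(s, t):
    """Gaussian kernel, used here only as an example of a reproducing kernel."""
    return math.exp(-0.5 * (s - t) ** 2)

def inner(mu, nu):
    """<I_mu, I_nu> = sum_i sum_j a_i b_j K(x_i, y_j) for finite-support measures.

    Each measure is a list of (mass, point) pairs with real masses.
    """
    return sum(a * b * kernel(x, y) for a, x in mu for b, y in nu)

def squared_distance(mu, nu):
    """Squared norm of I_mu - I_nu, expanded by bilinearity of the inner product."""
    return inner(mu, mu) - 2.0 * inner(mu, nu) + inner(nu, nu)

mu = [(0.5, 0.0), (0.5, 1.0)]   # uniform probability on {0, 1}
nu = [(1.0, 0.5)]               # Dirac measure at 0.5
```

With these definitions `inner([(1.0, x)], [(1.0, y)])` returns `kernel(x, y)`, the inner product of two Dirac measures, and `squared_distance(mu, nu)` gives one possible measure of closeness between `mu` and `nu`.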
Under suitable assumptions the mapping $I$ will therefore allow us to apply RKHS methods to measures.
Some of the first studies considering inner products on sets of measures and applications in Probability and Statistics were carried out in the years 1975-1980 at the University of Lille under the leadership of Bosq. Guilbart (1977a, 1977b, 1978a, 1978b, 1979) studied the relationships between reproducing kernels and inner products on the space $\mathcal{M}$ of signed measures on a measurable space. He exploited the embedding of $\mathcal{M}$ in a RKHS and characterized the inner products inducing the weak topology on sets of measures. Guilbart also proved the continuity of projections with respect to the reproducing kernel defining the inner product. He proved a Glivenko-Cantelli theorem that he applied to estimation and hypothesis testing. Berlinet (1980a, 1980b) studied the weak convergence in the set of probabilities on a RKHS, the measurability and the integrability of RKHS valued variables. The first applications to random measures were given by Bosq (1979), who considered the prediction of a RKHS valued variable, and by Berlinet, who considered the representers in RKHS of random measures with finite support and proved a Central Limit Theorem and strong approximation results. These last results extended those of Ibero (1979, 1980), who had considered spaces of Schwartz distributions and sets of differentiable functions on compact sets. Then, the application of RKHS methods to the study of general random measures was started by Suquet, supervised at the beginning by Jacob. Suquet (1990, 1993) used sequences of functions characterizing measures to study inner products on sets of signed measures and convergence of random measures. He studied particular cases in which signed measures are represented in the RKHS associated with Brownian motion (1993) and proved Berry-Esseen type theorems. Suquet and Oliveira (1994, 1995) applied RKHS methods to prove invariance principles under positive dependence and for non stationary mixing variables. Bensaid (1992a, 1992b) exploited the embedding theorems in the study of point processes and their nonparametric prediction.
We will review basic results and applications. For further developments the reader is referred to the above references.
1.2. GENERAL APPROACH
The definition of the mapping $I$ (embedding $\mathcal{M}$ into $\mathcal{H}$) in the above subsection is a consequence of the simple particular property that integrating with respect to a Dirac measure is equivalent to evaluating at some point. In the present subsection we will arrive at a similar definition through a general approach which can be used in any context where a RKHS framework has to be designed. This approach is implicit in the Introduction to Chapter 1. It is based on the fact that every Hilbert space is isometric to some space $\ell^2(X)$ (space of square summable sequences indexed by the set $X$, see Chapter 1). So when a problem involves elements of some abstract set $S$ the first attempt to shift it into a hilbertian framework consists in associating an element of a space $\ell^2(X)$ with every element of the originally given set $S$. If any element $s$ of $S$ is characterized by a family $\{s_\alpha, \alpha \in X\}$ of complex numbers satisfying
$$\sum_{\alpha \in X} |s_\alpha|^2 < \infty,$$
the mapping
$$S \longrightarrow \ell^2(X), \qquad s \longmapsto \{s_\alpha, \alpha \in X\}$$
defines the natural embedding of $S$ into $\ell^2(X)$. Let us see now how to apply this general methodology to sets of measures.
A measure $\mu$ on $(E, \mathcal{T})$ is characterized by the set of its values
$$\{\mu(A) : A \in \mathcal{T}\} = \Big\{\int 1_A\, d\mu : A \in \mathcal{T}\Big\}$$
or more generally by a set of integrals
$$\Big\{\int f\, d\mu : f \in \mathcal{F}\Big\}$$
where $\mathcal{F}$ is some family of functions.
For instance a probability measure $P$ on $\mathbb{R}^d$ is uniquely determined by its characteristic function
$$\phi_P(t) = \int_{\mathbb{R}^d} e^{i \langle t, x \rangle}\, dP(x),$$
or equivalently by the set of integrals of the family
$$\mathcal{F} = \{ e^{i \langle t, . \rangle} : t \in \mathbb{R}^d \}.$$
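Sets of power functions, of continuous bounded functions and many other families $\mathcal{F}$ can be considered to study measures and their closeness or convergence.
Suppose that to deal with some problem related to a set $\mathcal{M}$ of signed measures on a measurable space $(E, \mathcal{T})$ we can consider a set $\mathcal{F}$ of complex functions on $E$ and the families of integrals
$$I_\mu = \Big\{\int f\, d\mu : f \in \mathcal{F}\Big\}$$
where $\mu$ belongs to $\mathcal{M}$. If, for any $\mu$ in $\mathcal{M}$, we have
$$\sum_{f \in \mathcal{F}} \Big| \int f\, d\mu \Big|^2 < \infty,$$
we can work in the Hilbert space $\ell^2(\mathcal{F})$. The inner product of $I_\mu$ and $I_\nu$ in this space is given by
$$\langle I_\mu, I_\nu \rangle_{\ell^2(\mathcal{F})} = \sum_{f \in \mathcal{F}} \Big(\int f\, d\mu\Big) \Big(\int \bar{f}\, d\nu\Big).$$
Assuming again that we may apply the Fubini theorem and exchange sum and integral one gets
$$\langle I_\mu, I_\nu \rangle_{\ell^2(\mathcal{F})} = \sum_{f \in \mathcal{F}} \int f \otimes \bar{f}\, d(\mu \otimes \nu) = \int \Big(\sum_{f \in \mathcal{F}} f \otimes \bar{f}\Big)\, d(\mu \otimes \nu).$$
Here $I_\mu$ and $I_\nu$ are not functions on $E$. They are sequences of complex numbers indexed by the class $\mathcal{F}$ or equivalently they can be considered as functions on $\mathcal{F}$. Setting formally
$$K = \sum_{f \in \mathcal{F}} f \otimes \bar{f}$$
we get through the general approach the same expression as in the above subsection
$$\langle I_\mu, I_\nu \rangle = \int K\, d(\mu \otimes \nu). \qquad (4.3)$$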
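The exchange of sum and integral is elementary when the family $\mathcal{F}$ is finite and the measures have finite support, which gives a simple numerical check of the general approach. The sketch below assumes a hypothetical finite family of complex exponentials $\{e^{ikx} : k = -2, \ldots, 2\}$ and real masses; the names `integrals`, `l2_inner` and `kernel_inner` are illustrative only.

```python
import cmath

FREQUENCIES = range(-2, 3)  # hypothetical finite family F = {x -> exp(i k x)}

def integrals(measure):
    """I_mu: the integrals of each f in F with respect to a finite-support measure.

    The measure is a list of (mass, point) pairs with real masses.
    """
    return [sum(a * cmath.exp(1j * k * x) for a, x in measure) for k in FREQUENCIES]

def l2_inner(mu, nu):
    """Inner product of I_mu and I_nu in l2(F)."""
    return sum(u * v.conjugate() for u, v in zip(integrals(mu), integrals(nu)))

def K(x, y):
    """K(x, y) = sum over f in F of f(x) * conj(f(y))."""
    return sum(cmath.exp(1j * k * (x - y)) for k in FREQUENCIES)

def kernel_inner(mu, nu):
    """Integral of K with respect to the product measure, as a double sum."""
    return sum(a * b * K(x, y) for a, x in mu for b, y in nu)

mu = [(1.0, 0.2), (-0.5, 1.3)]
nu = [(2.0, 0.7)]
```

Both `l2_inner(mu, nu)` and `kernel_inner(mu, nu)` compute the same complex number up to rounding, since for real masses the integral of $\bar{f}$ with respect to $\nu$ is the conjugate of the integral of $f$.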
We know by Theorem 14 that Formula (4.3) holds true whenever $\mathcal{F}$ can be chosen as a complete orthonormal system in some separable RKHS $\mathcal{H}$ with reproducing kernel $K$.
Let us now illustrate the beginning of this section by a simple example.
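1.3. THE EXAMPLE OF MOMENTS
Let $E = [0, 0.5]$, $\mathcal{T}$ be its Borel $\sigma$-algebra and $\mathcal{M}$ be the set of signed measures on $(E, \mathcal{T})$. Any element $\mu$ of the set $\mathcal{M}$ is characterized by the sequence $I_\mu = \{\mu_i : i \in \mathbb{N}\}$ where
$$\mu_i = \int_E x^i\, d\mu(x)$$
is the moment of order $i$ of $\mu$. Here the class $\mathcal{F}$ is equal to the set of monomials $\{x^i : i \in \mathbb{N}\}$. As we have
$$\forall i \in \mathbb{N}, \quad 0 \leq \mu_i \leq 2^{-i} \mu(E),$$
the sequence $I_\mu$ is in $\ell^2(\mathbb{N})$. Identifying $\mu$ and $I_\mu$ we get, by using Fubini theorem and exchanging sum and integral (the integrated functions are nonnegative),
$$\langle \mu, \nu \rangle = \sum_{i \in \mathbb{N}} \mu_i \nu_i = \int_{E \times E} \frac{1}{1 - xy}\, d(\mu \otimes \nu)(x, y).$$
For $a$ in $E$ and $\nu = \delta_a$ we have
$$\forall i \in \mathbb{N}, \quad \nu_i = a^i$$
and
$$\langle \mu, \delta_a \rangle = \int_E \frac{1}{1 - ax}\, d\mu(x) = \sum_{i \in \mathbb{N}} \mu_i a^i.$$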
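Like the sequence of moments $I_\mu = \{\mu_i, i \in \mathbb{N}\}$, the entire function
$$\varphi_\mu : E \longrightarrow \mathbb{R}, \qquad x \longmapsto \varphi_\mu(x) = \sum_{i \in \mathbb{N}} \mu_i x^i$$
characterizes the measure $\mu$. From above it follows that the set of functions $\Phi = \{\varphi_\mu, \mu \in \mathcal{M}\}$ endowed with the inner product
$$\langle \varphi_\mu, \varphi_\nu \rangle_\Phi = \sum_{i \in \mathbb{N}} \mu_i \nu_i$$
induced by the inner product of $\ell^2(\mathbb{N})$ is a prehilbertian subspace with reproducing kernel
$$K(x, y) = \frac{1}{1 - xy} = \langle \varphi_{\delta_x}, \varphi_{\delta_y} \rangle_\Phi = \langle I_{\delta_x}, I_{\delta_y} \rangle_{\ell^2(\mathbb{N})} = \langle \delta_x, \delta_y \rangle_{\mathcal{M}}.$$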
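The identity between the sum of products of moments and the integral of the kernel $1/(1 - xy)$ can be verified numerically for measures with finite support in $[0, 0.5]$. The following is a minimal sketch: the function names and the truncation order `N` are assumptions made for the illustration, and the series is truncated, which is harmless here because $|xy| \leq 1/4$.

```python
def moments(measure, order):
    """First `order` moments mu_i = sum_k a_k * x_k**i of a finite-support measure.

    The measure is a list of (mass, point) pairs with points in [0, 0.5].
    """
    return [sum(a * x ** i for a, x in measure) for i in range(order)]

def kernel(x, y):
    """Reproducing kernel of the moment example, K(x, y) = 1 / (1 - x*y)."""
    return 1.0 / (1.0 - x * y)

mu = [(1.0, 0.1), (-2.0, 0.4)]
nu = [(0.5, 0.5), (1.5, 0.2)]

N = 60  # truncation order of the moment series (illustrative choice)
series = sum(m * n for m, n in zip(moments(mu, N), moments(nu, N)))
closed_form = sum(a * b * kernel(x, y) for a, x in mu for b, y in nu)
```

The two quantities `series` and `closed_form` agree up to rounding, because each term $a_k b_l \sum_i (x_k y_l)^i$ is a geometric series with sum $a_k b_l / (1 - x_k y_l)$.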
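In the present context the distance of two signed measures on $E$ is equal to the $\ell^2$-distance of their sequences of moments.
Let us now summarize the first section of the present chapter. We have seen how a set of signed measures on a measurable space $(E, \mathcal{T})$ can be embedded in a RKHS $\mathcal{H}$ of functions on $E$ with reproducing kernel $K$. Under suitable assumptions we have the following formula
$$\langle \mu, \nu \rangle_{\mathcal{M}} = \langle I_\mu, I_\nu \rangle_{\mathcal{H}} = \int K\, d(\mu \otimes \nu), \qquad (4.4)$$
a particular case of which is
$$\langle \delta_x, \delta_y \rangle_{\mathcal{M}} = K(x, y), \quad (x, y) \in E \times E. \qquad (4.5)$$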
Formula (4.4) was derived in a formal way to give the reader a first idea of the application of RKHS methods to measure theory. We now have a set of problems to analyze more precisely.
1) Under what conditions are our formal derivations valid? (Otherwise stated, under what conditions can an inner product $\langle ., . \rangle_{\mathcal{M}}$ be defined on a set $\mathcal{M}$ of signed measures?)
2) How does the inner product depend on the reproducing kernel?
3) Is any inner product on a set of measures of the kind defined above?
4) What can be the limit of a sequence of measures converging in the sense of the inner product?
5) What are the relationships between the topology induced on $\mathcal{M}$ by the inner product and other topologies on $\mathcal{M}$ such as the weak topology?
6) What kind of results can be obtained through RKHS methods?
We will deal with these problems in Sections 6 and 7. Applications will be given in Section 8.
Before that let us give some basic results on measurability and integrability of RKHS-valued variables.
2. MEASURABILITY OF RKHS-VALUED VARIABLES
As seen in Section 1, to embed a set $\mathcal{M}$ of measures on $E$ into a RKHS $\mathcal{H}$ of functions the measurability of all functions $K(., x)$, $x \in E$, is required. Therefore to study random measures as random elements of $\mathcal{H}$ one needs measurability criteria for $\mathcal{H}$-valued variables. The present section deals with questions of measurability.
As above $\mathcal{T}$ is a $\sigma$-algebra on $E$, $\mathcal{H}_0$ is the subspace of $\mathcal{H}$ spanned by $(K(., t))_{t \in E}$ and $\mathcal{B}_{\mathcal{H}}$ is the Borel $\sigma$-algebra of $\mathcal{H}$ ($\mathcal{B}_{\mathcal{H}}$ is generated by the open sets). We will suppose in general that $\mathcal{H}$ is a separable space. For any $g$ in $\mathcal{H}$, $\langle ., g \rangle$ will denote the continuous linear mapping
$$\mathcal{H} \longrightarrow \mathbb{C}, \qquad \varphi \longmapsto \langle \varphi, g \rangle.$$
First of all we characterize the random variables with values in $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$.
THEOREM 88 The Borel $\sigma$-algebra of a separable RKHS is generated by the evaluation functionals.

Proof. Let $\mathcal{B}$ be the $\sigma$-algebra generated by $(e_t)_{t \in E}$. The continuity of the evaluation functionals implies $\mathcal{B} \subset \mathcal{B}_{\mathcal{H}}$. Now, for any $f \in \mathcal{H}$ we have
$$\|f\| = \sup_{\|g\| \leq 1} |\langle f, g \rangle| = \sup_{\|g\| \leq 1;\ g \in \mathcal{D}_0} |\langle f, g \rangle|,$$
where $\mathcal{D}_0$ is a countable subset of $\mathcal{H}_0$ dense in $\mathcal{H}$ (see Lemma 11). If $g \in \mathcal{D}_0$ the function $\langle ., g \rangle$ is $\mathcal{B} - \mathcal{B}_{\mathbb{R}}$ measurable since it can be written as a linear combination of evaluation functionals. Let $r \in \mathbb{R}^+$ and $f_0 \in \mathcal{H}$. Then
$$\{f : \|f - f_0\| \leq r\} = \bigcap_{\|g\| \leq 1;\ g \in \mathcal{D}_0} \{f : |\langle f - f_0, g \rangle| \leq r\}.$$
The above right-hand side is a countable intersection of elements of $\mathcal{B}$. Therefore it belongs to $\mathcal{B}$. It follows that any closed ball is in $\mathcal{B}$. $\mathcal{H}$ is separable, thus any open set is a countable union of closed balls. Consequently $\mathcal{B}_{\mathcal{H}} \subset \mathcal{B}$. ■
It is worth noting that the proof of Theorem 88 is valid for any total sequence.

THEOREM 89 Let $(g_i)_{i \in I}$ be a total sequence in a separable Hilbert space $\mathcal{H}$. Then the Borel $\sigma$-algebra of $\mathcal{H}$ is generated by the linear forms $(\langle ., g_i \rangle)_{i \in I}$.

Proof. One has to prove that the space $\mathcal{H}_0'$ spanned by the sequence $(g_i)_{i \in I}$ contains a countable subset $\mathcal{D}_0'$ which is dense in $\mathcal{H}$. This is done by mimicking the proof of Theorem 88. ■
From Theorem 88 it follows that $\mathcal{B}_{\mathcal{H}}$ is the set of intersections $\mathcal{H} \cap B$ where $B$ runs through the product $\sigma$-algebra of $\mathbb{C}^E$ (generated by the evaluation functionals on $\mathbb{C}^E$) and that we have the following corollaries.

COROLLARY 12 Let $(\Omega, \mathcal{A})$ be some measurable space. A mapping
$$X : (\Omega, \mathcal{A}) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad \omega \longmapsto X(\omega, .)$$
is measurable if and only if for any $t \in E$ the function
$$X(., t) = \langle X, K(., t) \rangle$$
is a complex random variable.

COROLLARY 13 A random function on a measurable space $(\Omega, \mathcal{A})$ taking its values in $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ is equivalent to a stochastic process $(X_t)_{t \in E}$ on $(\Omega, \mathcal{A})$ with trajectories in $\mathcal{H}$.
Let $\Psi_K$ denote the mapping
$$E \longrightarrow \mathcal{H}, \qquad t \longmapsto \Psi_K(t) = K(., t).$$
The following theorem states the equivalence between the measurability of $K$, as function of two variables, the measurability of $\Psi_K$ and the measurability of all elements of $\mathcal{H}$.

THEOREM 90 The following four conditions are equivalent.
a) $K$ is $\mathcal{T} \otimes \mathcal{T} - \mathcal{B}_{\mathbb{C}}$ measurable.
b) $\Psi_K$ is measurable.
c) $\forall t \in E$, $K(., t)$ is measurable.
d) every element of $\mathcal{H}$ is measurable.

Proof. We will prove the sequence of implications a) $\Rightarrow$ b) $\Rightarrow$ c) $\Rightarrow$ d) $\Rightarrow$ a).
a) $\Rightarrow$ b) As $K$ is $\mathcal{T} \otimes \mathcal{T} - \mathcal{B}_{\mathbb{C}}$ measurable, for any fixed $s$, the mapping
$$t \longmapsto K(s, t) = \langle K(., t), K(., s) \rangle \qquad (4.6)$$
is also measurable (Rudin, 1975).
b) $\Rightarrow$ c) Same argument as above.
c) $\Rightarrow$ d) From assumption c) any element of $\mathcal{H}_0$ is a measurable function. Being a pointwise limit of a sequence of measurable functions any element of $\mathcal{H}$ is also measurable.
d) $\Rightarrow$ a) Let $(e_i)_{i \in \mathbb{N}}$ be an orthonormal basis of $\mathcal{H}$. From Theorem 14 we have
$$K(s, t) = \sum_{i \in \mathbb{N}} e_i(s)\, \overline{e_i(t)}.$$
As every function $e_i$ is measurable, $K$ is $\mathcal{T} \otimes \mathcal{T} - \mathcal{B}_{\mathbb{C}}$ measurable. ■
3. GAUSSIAN MEASURE ON RKHS
3.1. GAUSSIAN MEASURE AND GAUSSIAN PROCESS
Corollary 13 puts forward the equivalence between random variables with values in RKHS and stochastic processes with trajectories in such a space. Another consequence of the continuity of evaluation functionals on RKHS is the equivalence in those spaces between the notion of gaussian process and the notion of gaussian measure. Rajput and Cambanis (1972) have shown this equivalence in some functional spaces (spaces of continuous functions, of absolutely continuous functions, $L^2$-spaces). As they pointed out their results extend to any space of functions on which the evaluation functionals are continuous. We give hereafter the proof for RKHS. For the correspondence between cylinder set measures and random functions in a more general setting see Mourier (1965).

DEFINITION 28 A stochastic process $(X_t)_{t \in E}$ is said to be gaussian if any finite linear combination of the real variables $X_t$, $t \in E$, is a real gaussian random variable.

DEFINITION 29 A probability measure $\mu$ on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ is said to be gaussian if for any $g \in \mathcal{H}$ the linear form $\langle ., g \rangle$ is a real gaussian random variable on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}}, \mu)$.

The equivalence between gaussian process and gaussian measure is formulated in the following theorem.

THEOREM 91
a) If $(X_t)_{t \in E}$ is a gaussian process defined on $(\Omega, \mathcal{A}, P)$ with trajectories
$$E \longrightarrow \mathbb{R}, \qquad t \longmapsto X_t(\omega) = X(\omega, t), \quad \omega \in \Omega,$$
belonging to $\mathcal{H}$ then the measure $PX^{-1}$ induced on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ by the random variable
$$X : (\Omega, \mathcal{A}, P) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad \omega \longmapsto X_.(\omega) = X(\omega, .)$$
is a gaussian measure.
b) Conversely, for any gaussian measure $\mu$ on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ there exists a probability space $(\Omega, \mathcal{A}, P)$ on which can be defined a gaussian process $(X_t)_{t \in E}$ with trajectories in $\mathcal{H}$ such that $PX^{-1} = \mu$.
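Proof.
a) We have to prove that for any $g \in \mathcal{H}$ the real random variable
$$\langle ., g \rangle : (\mathcal{H}, \mathcal{B}_{\mathcal{H}}, PX^{-1}) \longrightarrow (\mathbb{R}, \mathcal{B}_{\mathbb{R}}), \qquad f \longmapsto \langle f, g \rangle$$
is gaussian. If $B \in \mathcal{B}_{\mathbb{R}}$,
$$PX^{-1}(\langle ., g \rangle \in B) = P(\langle X, g \rangle \in B),$$
thus we have to prove that $\langle X, g \rangle$ is gaussian. Since $\mathcal{H} = \overline{\mathcal{H}_0}$ the function $g$ is the limit in $\mathcal{H}$ of a sequence of functions
$$g_n = \sum_{i=0}^{i_n} a_i^n K(., t_i^n), \quad n \in \mathbb{N}.$$
Thus
$$\langle X, g \rangle = \lim_{n \to \infty} \sum_{i=0}^{i_n} a_i^n X_{t_i^n}$$
and $\langle X, g \rangle$ is gaussian, as the limit (everywhere) of a sequence of gaussian random variables.
b) For $t \in E$ and $f \in \mathcal{H}$ let
$$X_t(f) = f(t).$$
Let $k \in \mathbb{N}^*$, $(t_1, \ldots, t_k) \in E^k$, $(a_1, \ldots, a_k) \in \mathbb{R}^k$. The mapping
$$\sum_{i=1}^{k} a_i X_{t_i} : (\mathcal{H}, \mathcal{B}_{\mathcal{H}}, \mu) \longrightarrow (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$$
is the continuous linear form on $\mathcal{H}$ represented by
$$\sum_{i=1}^{k} a_i K(., t_i).$$
Hence it is a gaussian real random variable. $(X_t)_{t \in E}$ is a gaussian process with trajectories in $\mathcal{H}$ such that $X_.(f) = f$. If $X$ is the associated random function (Corollary 13) we have
$$\mu X^{-1}(B) = \mu\{g : X_.(g) \in B\} = \mu(B)$$
and
$$\mu X^{-1} = \mu,$$
so that the process $(X_t)_{t \in E}$ on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}}, \mu)$ is appropriate to get the conclusion. ■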
3.2. CONSTRUCTION OF GAUSSIAN MEASURES
To construct a gaussian measure on a separable Hilbert space $\mathcal{H}$ one can use a general method which consists in choosing an orthonormal basis $(f_i)$ in $\mathcal{H}$, a sequence of real numbers $(\sigma_i)$ in $\ell^2(\mathbb{N})$, a sequence $(\xi_i)$ of independent real random variables on some probability space $(\Omega, \mathcal{A}, P)$ with the same $N(0, 1)$ distribution and to set
$$X : (\Omega, \mathcal{A}, P) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad \omega \longmapsto X(\omega) = \sum_{i=1}^{\infty} \sigma_i\, \xi_i(\omega)\, f_i.$$
$X$ is well defined since the series converges almost surely. The measure $PX^{-1}$ is a centered gaussian measure on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$. Conversely, any centered gaussian measure on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ is of this kind (Lifshits (1995), Example 5 p. 81).
If $\mathcal{H}$ is a RKHS, $X$ defines a gaussian process.
However, as mentioned in Chapter 3, Subsection 4.4, putting conditions on normality of continuous linear forms can lead to build finitely additive measures on the ring of cylinder sets which cannot be extended to countably additive measures on the whole space $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$. A classical example is the Gauss measure, i.e. the cylindrical measure assigning the distribution $N(0, \|x\|^2)$ to the linear form represented by $x$. This kind of difficulty gave rise to the theory of abstract Wiener spaces for which the reader is referred to Gross (1965, 1970) for a basic exposition.
Another important feature of gaussian measures is defined through the notion of "kernel" of such a measure. For the definition of the kernel of a measure and developments in the infinite dimensional setting the reader is referred to Lifshits (1995).
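The series construction of a centered gaussian element can be imitated numerically by truncating the series and working with the coordinates of $X$ in the basis $(f_i)$. The sketch below is an illustration under stated assumptions: the weights $\sigma_i = 1/i$, the truncation at 200 terms, the seed and the function name `sample_gaussian_element` are all choices made here, not taken from the text.

```python
import random

def sample_gaussian_element(sigmas, rng):
    """Coordinates of X = sum_i sigma_i * xi_i * f_i in the basis (f_i), truncated.

    The xi_i are independent N(0, 1) draws from the generator `rng`.
    """
    return [s * rng.gauss(0.0, 1.0) for s in sigmas]

rng = random.Random(0)                      # fixed seed, for reproducibility only
sigmas = [1.0 / i for i in range(1, 201)]   # square-summable weights (assumed)

draws = [sample_gaussian_element(sigmas, rng) for _ in range(2000)]

# Empirical mean of ||X||^2 compared with its expectation sum_i sigma_i^2.
mean_sq_norm = sum(sum(c * c for c in x) for x in draws) / len(draws)
expected_sq_norm = sum(s * s for s in sigmas)
```

Since the $\xi_i$ are independent with unit variance, $E\|X\|^2 = \sum_i \sigma_i^2$, which is finite exactly when $(\sigma_i)$ is square summable; the empirical mean `mean_sq_norm` should be close to `expected_sq_norm`.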
To study random variables with values in a RKHS and their convergence one needs some background on weak convergence in the set of probability measures on a RKHS. This is the aim of the next section. We denote by $\mathrm{Pr}(\mathcal{H})$ the set of probability measures on $(\mathcal{H}, \mathcal{B}_{\mathcal{H}})$.

4. WEAK CONVERGENCE IN $\mathrm{Pr}(\mathcal{H})$
Recall the definition of weak convergence of measures on a topological space.

DEFINITION 30 Let $\mathcal{E}$ be a topological space and $\mathcal{B}$ its Borel $\sigma$-algebra. Let $\mathcal{M}$ be the set of signed measures on $(\mathcal{E}, \mathcal{B})$ and let $\mu \in \mathcal{M}$. A sequence $(\mu_n)_{n \in \mathbb{N}}$ of elements of $\mathcal{M}$ is said to be weakly convergent to $\mu$ as $n \to \infty$ if and only if
$$\int f\, d\mu_n \longrightarrow \int f\, d\mu$$
for any bounded continuous real function $f$ defined on $\mathcal{E}$.
The sets of finite dimension play a key role in the study of measures on functional spaces. They are defined as follows.

DEFINITION 31 For $(t_1, \ldots, t_k) \in E^k$, $\pi_{t_1, \ldots, t_k}$ denotes the mapping
$$\mathcal{H} \longrightarrow \mathbb{R}^k, \qquad f \longmapsto (f(t_1), \ldots, f(t_k)).$$
A subset of $\mathcal{H}$ is a set of finite dimension (or a cylinder) if and only if it can be written as $\pi^{-1}_{t_1, \ldots, t_k}(B)$, where $k \in \mathbb{N}^*$ and $B \in \mathcal{B}_{\mathbb{R}^k}$.

THEOREM 92 The class $\mathcal{F}$ of finite dimensional sets is a determining class, i.e. if two probabilities take the same values on $\mathcal{F}$, they are equal.

Proof. It is enough to prove that
a) $\mathcal{F}$ is a boolean algebra,
b) the $\sigma$-algebra $\sigma(\mathcal{F})$ generated by $\mathcal{F}$ is equal to $\mathcal{B}_{\mathcal{H}}$.
a) is a straightforward consequence of the definition of $\mathcal{F}$.
To prove b) first remark that the mappings $\pi_{t_1, \ldots, t_k}$ are continuous on $\mathcal{H}$ because the evaluation functionals have this property. Hence $\mathcal{F} \subset \mathcal{B}_{\mathcal{H}}$ and $\sigma(\mathcal{F}) \subset \mathcal{B}_{\mathcal{H}}$.
As in the proof of Theorem 88 one can write any closed ball as a countable intersection of sets of the form
$$\{f : |\langle f - f_0, g \rangle| \leq r\}$$
where $g \in \mathcal{D}_0$. Such a set is of finite dimension: writing
$$g = \sum_{i=1}^{k} a_i K(., t_i)$$
we have
$$\{f : |\langle f - f_0, g \rangle| \leq r\} = \Big\{f : \Big|\sum_{i=1}^{k} a_i \langle f - f_0, K(., t_i) \rangle\Big| \leq r\Big\} = \pi^{-1}_{t_1, \ldots, t_k}(B)$$
where $B$ is equal to $\xi^{-1}([0, r])$ and $\xi$ is the measurable mapping
$$\mathbb{R}^k \longrightarrow \mathbb{R}^+, \qquad (\alpha_1, \ldots, \alpha_k) \longmapsto \Big|\sum_{i=1}^{k} a_i (\alpha_i - f_0(t_i))\Big|.$$
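Thus the closed balls are in $\sigma(\mathcal{F})$ and $\mathcal{B}_{\mathcal{H}} \subset \sigma(\mathcal{F})$. ■
Remark In general the class $\mathcal{F}$ does not determine the convergence: one may have
$$\forall A \in \mathcal{F}, \quad P_n(A) \longrightarrow P(A)$$
for a sequence $(P_n)_{n \in \mathbb{N}}$ of $\mathrm{Pr}(\mathcal{H})$ which does not converge weakly to $P$. Let us illustrate this remark by the following example.
Example
Let $\mathcal{H} = H^2([0, 1]) = \{f : f$ and $f'$ are absolutely continuous on $[0, 1]$ and $f'' \in L^2([0, 1])\}$ endowed with the inner product
$$\langle f, g \rangle = f(0) g(0) + f'(0) g'(0) + \langle f'', g'' \rangle_{L^2([0, 1])}.$$
$\mathcal{H}$ is a RKHS with kernel $K$ defined on $[0, 1]^2$ by
$$K(s, t) = \begin{cases} 1 + st + \dfrac{s^2 t}{2} - \dfrac{s^3}{6} & \text{if } s < t \\[2mm] 1 + st + \dfrac{s t^2}{2} - \dfrac{t^3}{6} & \text{if } t \leq s \end{cases}$$
(see Chapters 6 and 7).
For $n \in \mathbb{N}^*$ let $f_n$ be the function of $\mathcal{H}$ defined by
$$f_n(x) = \begin{cases} nx\,(2 - nx)^3 & \text{if } x \in [0, 2/n] \\ 0 & \text{if } x \in [2/n, 1]. \end{cases}$$
For any $n \in \mathbb{N}^*$, $f_n$ satisfies
$$f_n(0) = f_n\Big(\frac{2}{n}\Big) = f_n'\Big(\frac{2}{n}\Big) = f_n''\Big(\frac{2}{n}\Big) = f_n'\Big(\frac{1}{2n}\Big) = 0, \quad f_n\Big(\frac{1}{n}\Big) = 1$$
and
$$\max_{x \in [0, 1]} f_n(x) = f_n\Big(\frac{1}{2n}\Big) = \frac{27}{16}.$$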
Figure 4.1: The functions $f_1$, $f_2$, $f_3$, $f_{10}$ over $[0, 1]$.
Figure 4.2: The functions $K(0.8, t)$ and $K(t, t)$ over $[0, 1]$.
$K$ is a bounded kernel and
$$\max_{t \in [0, 1]} K(t, t) = \frac{7}{3}.$$
Hence, by the Cauchy-Schwarz inequality, convergence in $\mathcal{H}$ implies uniform convergence on $[0, 1]$:
$$\forall g \in \mathcal{H}, \ \forall t \in [0, 1], \quad |g(t)| = |\langle g, K(., t) \rangle| \leq \|g\| \sup_{t \in [0, 1]} (K(t, t))^{1/2} \leq \sqrt{7/3}\, \|g\|.$$
Clearly the sequence $(f_n)_{n \in \mathbb{N}^*}$ does not converge uniformly to the null function on $[0, 1]$. Hence it does not converge to the null function in $\mathcal{H}$. Therefore $(\delta_{f_n})$ does not converge weakly to $\delta_0$. However, if $(t_1, \ldots, t_k) \neq (0, \ldots, 0)$ and $(2/n) \leq \min\{t_1, \ldots, t_k\}$, we have
$$\delta_{f_n}\big(\pi^{-1}_{t_1, \ldots, t_k}(B)\big) = \begin{cases} 1 & \text{if } (0, \ldots, 0) \in B \\ 0 & \text{if } (0, \ldots, 0) \notin B \end{cases}$$
thus for $n$ large enough we have
$$\delta_{f_n}\big(\pi^{-1}_{t_1, \ldots, t_k}(B)\big) = \delta_0\big(\pi^{-1}_{t_1, \ldots, t_k}(B)\big).$$
The present example also shows that a sequence of functions in a RKHS $\mathcal{H}$ can converge pointwise to a function of $\mathcal{H}$ without converging in the norm sense.
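The gap between pointwise and norm convergence can also be observed in a simpler space than $H^2([0, 1])$. The sketch below deliberately swaps in the space $\ell^2(\mathbb{N})$ with kernel $K(i, j) = \delta_{ij}$, where the functions $e_n = K(., n)$ converge pointwise to the null function while keeping norm 1. The truncation length and the names `unit_sequence` and `l2_norm` are assumptions made for the illustration.

```python
def unit_sequence(n, length):
    """First `length` coordinates of e_n, the function t -> K(t, n) on N."""
    return [1.0 if t == n else 0.0 for t in range(length)]

def l2_norm(f):
    """Euclidean norm of a finitely supported sequence."""
    return sum(c * c for c in f) ** 0.5

LENGTH = 50  # truncation level, chosen only for the illustration
t = 3        # a fixed evaluation point

# Values e_n(t) for n = 0, 1, ...: they vanish as soon as n > t.
values_at_t = [unit_sequence(n, LENGTH)[t] for n in range(LENGTH)]

# Norms ||e_n||: they stay equal to 1, so e_n does not tend to 0 in norm.
norms = [l2_norm(unit_sequence(n, LENGTH)) for n in range(LENGTH)]
```

At every fixed point `t` the sequence of values is eventually zero, whereas the list `norms` is constant, so no subsequence converges to the null function in the norm sense.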
4.1. WEAK CONVERGENCE CRITERION
As the evaluation functionals on RKHS are continuous one can get a criterion of weak convergence similar to the one available in $C([0, 1])$.

THEOREM 93 Let $P$ and $\{P_n : n \in \mathbb{N}\}$ be elements of $\mathrm{Pr}(\mathcal{H})$. The sequence $(P_n)_{n \in \mathbb{N}}$ converges weakly to $P$ if and only if
a) $(P_n)_{n \in \mathbb{N}}$ is tight:
$$\forall \varepsilon > 0, \ \exists K \text{ compact such that } \forall n \in \mathbb{N}, \ P_n(K) > 1 - \varepsilon,$$
b) $\forall k \in \mathbb{N}^*$, $\forall (t_1, \ldots, t_k) \in E^k$, $P_n \pi^{-1}_{t_1, \ldots, t_k} \Longrightarrow P \pi^{-1}_{t_1, \ldots, t_k}$.

Proof. This result is well known for the space $C([0, 1])$ (Billingsley, 1968). It can be proved for $\mathcal{H}$ in the same way: a) is equivalent, by Prohorov's theorem, to the relative compactness of the sequence $(P_n)_{n \in \mathbb{N}}$. Therefore Theorem 93 is a consequence of the continuity of the mappings $\pi_{t_1, \ldots, t_k}$ and of Theorem 92. ■
5. INTEGRATION OF $\mathcal{H}$-VALUED RANDOM VARIABLES
In this section we review basic definitions and properties about integration of RKHS valued random variables. One of the most useful results is the possibility of interchanging integrals and continuous linear forms. For a detailed exposition see Hille and Phillips (1957).

5.1. NOTATION. DEFINITIONS
Let $X : (\Omega, \mathcal{A}, P) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ be a random variable. If $Z$ is a real function defined on $\Omega$, if $A \in \mathcal{A}$ and if $Z 1_A$ is $P$-integrable, we denote by
$$E_A(Z) = \int_A Z\, dP$$
the integral of $Z$ on $A$ and, if $Z$ is integrable,
$$E(Z) = E_\Omega(Z).$$

DEFINITION 32 (WEAK INTEGRAL) Let $A \in \mathcal{A}$. $X$ is weakly integrable on $A$ if
$$\forall f \in \mathcal{H}, \quad \langle X, f \rangle \text{ is integrable on } A$$
and if there exists $x_A^0 \in \mathcal{H}$ such that
$$\forall f \in \mathcal{H}, \quad E_A(\langle X, f \rangle) = \langle x_A^0, f \rangle.$$
$x_A^0$ is called the weak integral of $X$ on $A$ and denoted
$$\int_A X\, dP.$$
$X$ is said to be Pettis-integrable if it is weakly integrable on $\Omega$.

From the definition it is clear that
$$\forall f \in \mathcal{H}, \quad \int_A \langle X, f \rangle\, dP = \langle x_A^0, f \rangle = \Big\langle \int_A X\, dP, f \Big\rangle.$$

DEFINITION 33 (STRONG INTEGRAL) Let $A \in \mathcal{A}$. $X$ is strongly integrable on $A$ if $\|X\|$ is integrable on $A$. $X$ is said to be Bochner-integrable if it is strongly integrable on $\Omega$ (then $X$ is strongly integrable on any element of $\mathcal{A}$).

As $|\langle X, f \rangle| \leq \|X\| \|f\|$, if $X$ is strongly integrable on $A$, there exists $x_A$ in $\mathcal{H}$,
$$x_A = \int_A X\, dP,$$
such that
$$\forall f \in \mathcal{H}, \quad E_A(\langle X, f \rangle) = \langle x_A, f \rangle.$$
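Thus $X$ is weakly integrable on $A$ and $x_A^0 = x_A$. For an $\mathcal{H}$-valued random variable, Bochner integrability implies Pettis integrability (with equality of the integrals) but the converse is not true, as shown in the following example.
Example Let $(\Omega, \mathcal{A}, P) = (\mathbb{N}^*, \mathcal{P}(\mathbb{N}^*), P)$ with $P(n) = 2^{-n}$, $n \in \mathbb{N}^*$, and let $\mathcal{H} = \ell^2(\mathbb{N})$. As seen in Chapter 1 the reproducing kernel of $\ell^2(\mathbb{N})$ is given by
$$K(i, j) = \delta_{ij}.$$
Let
$$X : (\mathbb{N}^*, \mathcal{P}(\mathbb{N}^*), P) \longrightarrow (\ell^2(\mathbb{N}), \mathcal{B}_{\ell^2(\mathbb{N})}), \qquad i \longmapsto X(i) = \frac{S_i}{P(i)} K(., i)$$
where $S$ is the element of $\ell^2(\mathbb{N})$ such that
$$\forall i \in \mathbb{N}^*, \quad S_i = \frac{1}{i}.$$
• $X$ is clearly measurable and we have
$$\|X(i)\| = \frac{1}{i P(i)}$$
and
$$\sum_{i=1}^{\infty} \|X(i)\| P(i) = \sum_{i=1}^{\infty} \frac{1}{i} = \infty.$$
Thus $X$ is not strongly integrable.
• Now let $h \in \ell^2(\mathbb{N})$. We have
$$\langle h, X(i) \rangle = \Big\langle h, \frac{1}{i P(i)} K(., i) \Big\rangle = \frac{h(i)}{i P(i)}$$
and
$$\int_A \langle h, X(i) \rangle\, dP(i) = \sum_{i \in A} \frac{h(i)}{i} = \langle h, S_A \rangle$$
where $S_A$ is the orthogonal projection of $S$ on the subspace of $\ell^2(\mathbb{N})$ spanned by the evaluations $\{e_i : i \in A\}$. It is given by
$$S_A = \sum_{i \in A} \langle S, K(., i) \rangle K(., i) = \sum_{i \in A} \frac{1}{i} K(., i).$$
It follows that $X$ is weakly integrable.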
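The arithmetic behind this example can be followed numerically: the partial sums of $\|X(i)\| P(i)$ are those of the harmonic series and keep growing, while the coordinates $1/i$ of the weak integral are square summable. The sketch below is only an illustration; the function names and the truncation levels are assumptions.

```python
import math

def P(i):
    """Probability of the integer i, P(i) = 2**(-i)."""
    return 2.0 ** (-i)

def norm_X(i):
    """||X(i)|| where X(i) = K(., i) / (i * P(i)) and ||K(., i)|| = 1."""
    return 1.0 / (i * P(i))

def strong_partial_sum(n):
    """sum_{i<=n} ||X(i)|| P(i): the partial sums of the harmonic series."""
    return sum(norm_X(i) * P(i) for i in range(1, n + 1))

def weak_integral_sq_norm(n):
    """sum_{i<=n} S_i**2 with S_i = 1/i: bounded above by pi**2 / 6."""
    return sum((1.0 / i) ** 2 for i in range(1, n + 1))

growth = strong_partial_sum(500) - strong_partial_sum(50)
bounded = weak_integral_sq_norm(500)
```

The quantity `growth` is roughly $\log(500/50)$ and would keep increasing with the upper limit, which reflects the failure of Bochner integrability, whereas `bounded` stays below $\pi^2/6$, in agreement with $S \in \ell^2(\mathbb{N})$.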
5.2. INTEGRABILITY OF $X$ AND OF $\{X_t : t \in E\}$
If the random variable
$$X : (\Omega, \mathcal{A}, P) \longrightarrow (\mathcal{H}, \mathcal{B}_{\mathcal{H}})$$
is (weakly or strongly) integrable, any real random variable
$$X_t 1_A = \langle X, K(., t) \rangle 1_A, \quad A \in \mathcal{A},$$
is also integrable and the function
$$E \longrightarrow \mathbb{R}, \qquad t \longmapsto E_A(X_t) = \int_A X_t\, dP$$
is the integral of $X$ over $A$. It follows that the function $E_A(X_.)$ belongs to $\mathcal{H}$.
Conversely, if any $X_t$ is integrable, $X$ (which is therefore measurable) is not necessarily weakly integrable. As shown in the following example, the function $E(X_.)$ may not belong to $\mathcal{H}$.
As in the example of the above subsection take $\mathcal{H} = \ell^2(\mathbb{N})$. Let
$$X : (\mathbb{N}^*, \mathcal{P}(\mathbb{N}^*), P) \longrightarrow (\ell^2(\mathbb{N}), \mathcal{B}_{\ell^2(\mathbb{N})}), \qquad i \longmapsto X(i) = 2^i K(., i)$$
with $P(n) = 2^{-n}$, $n \in \mathbb{N}^*$.
Let $j \in \mathbb{N}^*$. The random function $\langle X, K(., j) \rangle$ transforms the integer $i$ either into 0 if $i \neq j$ or into $2^j$ if $i = j$. Therefore
$$E(\langle X, K(., j) \rangle) = 1.$$
Clearly the constant sequence $\{E(\langle X, K(., j) \rangle) : j \in \mathbb{N}^*\}$ does not belong to $\mathcal{H}$ and therefore $X$ is not weakly integrable on $\mathbb{N}^*$.
We will give in Theorem 96 a necessary and sufficient condition for the weak integrability of $X$. Let us begin with two theorems on linear forms on $\mathcal{H}$ defined by integrals.
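THEOREM 94 For any $A \in \mathcal{A}$ the following conditions are equivalent.
a) The mapping
$$\mathcal{E}_A : E \longrightarrow \mathbb{R}, \qquad t \longmapsto E_A(X_t)$$
belongs to $\mathcal{H}$.
b) The mapping
$$\varphi_A : \mathcal{H}_0 \longrightarrow \mathbb{R}, \qquad f \longmapsto E_A \langle X, f \rangle$$
is a continuous linear form on $\mathcal{H}_0$.
If these conditions are satisfied then the representer of $\varphi_A$ in $\mathcal{H}$ is equal to $\mathcal{E}_A$.

Proof.
a) $\Rightarrow$ b) Let
$$f = \sum_{i=1}^{n} a_i K(., t_i)$$
be any element of $\mathcal{H}_0$. Then
$$E_A \langle X, f \rangle = E_A \sum_{i=1}^{n} a_i X_{t_i} = \sum_{i=1}^{n} a_i E_A X_{t_i} = \sum_{i=1}^{n} a_i \mathcal{E}_A(t_i) = \langle \mathcal{E}_A, f \rangle.$$
b) $\Rightarrow$ a) It follows from Hahn-Banach Theorem that $\varphi_A$ can be extended to a continuous linear form $\psi_A$ over $\mathcal{H}$ with the same norm as $\varphi_A$. Let $f_A$ be the representer of $\psi_A$ in $\mathcal{H}$. We then have, for $t$ in $E$,
$$f_A(t) = \langle f_A, K(., t) \rangle = \psi_A(K(., t)) = \varphi_A(K(., t)) = E_A X_t$$
so that $\mathcal{E}_A = f_A$ belongs to $\mathcal{H}$. ■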
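THEOREM 95 Let $A$ in $\mathcal{A}$ be such that for any $t$ in $E$, $E_A X_t$ exists. Let
$$I_A : \mathcal{H}_0 \longrightarrow \mathbb{R}^+, \qquad f \longmapsto \int_A |\langle X, f \rangle|\, dP.$$
The following three conditions are equivalent.
a) $I_A$ is continuous at 0.
b) $I_A$ is continuous.
c) $I_A$ is Lipschitz continuous.

Proof. $I_A$ is well defined: if
$$f = \sum_{i=1}^{n} a_i K(., t_i)$$
then
$$\langle X, f \rangle = \sum_{i=1}^{n} a_i X_{t_i} \text{ is integrable on } A.$$
Clearly we have c) $\Rightarrow$ b) $\Rightarrow$ a). It remains to prove that c) is implied by a).
Suppose that $I_A$ is continuous at 0. There exists $\eta > 0$ such that
$$\|f\| \leq \eta \Longrightarrow I_A(f) \leq 1.$$
Hence, if $f \neq 0$,
$$I_A(f) = \frac{\|f\|}{\eta}\, I_A\Big(\frac{\eta f}{\|f\|}\Big) \leq \frac{1}{\eta} \|f\|. \qquad (4.7)$$
Now, let $f$ and $g$ in $\mathcal{H}_0$. We have
$$\begin{aligned}
|I_A(f) - I_A(g)| &= \big|E_A\big(|\langle X, f \rangle| - |\langle X, g \rangle|\big)\big| \\
&\leq E_A \big||\langle X, f \rangle| - |\langle X, g \rangle|\big| \\
&\leq E_A |\langle X, f - g \rangle| = I_A(f - g) \\
&\leq \frac{1}{\eta} \|f - g\|,
\end{aligned}$$
where the last inequality follows from (4.7). Hence the Lipschitz continuity of $I_A$ follows. ■
Now we are in a position to give a sufficient condition for weak integrability.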
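THEOREM 96 Let $A$ in $\mathcal{A}$. If
$$\mathcal{E}_A : E \longrightarrow \mathbb{R}, \qquad t \longmapsto E_A(X_t)$$
is an element of $\mathcal{H}$ and if $I_A$ is continuous at 0 then $X$ is weakly integrable on $A$ and $\int_A X\, dP = \mathcal{E}_A$.

Proof. Let $f \in \mathcal{H}$. We have to prove that $\langle X, f \rangle$ is integrable on $A$ and that $E_A \langle X, f \rangle = \langle \mathcal{E}_A, f \rangle$.
Let $(f_n)_{n \in \mathbb{N}}$ be a sequence of elements of $\mathcal{H}_0$ converging in $\mathcal{H}$ to the function $f$. By Theorem 95 we can write
$$\int_A |\langle X, f_n \rangle - \langle X, f_m \rangle|\, dP = I_A(f_n - f_m) \leq \eta\, \|f_n - f_m\|_{\mathcal{H}_0}$$
where $\eta$ is a positive constant. Therefore $(\langle X, f_n \rangle)_{n \in \mathbb{N}}$ is a Cauchy sequence in $L^1(A)$, the space of $P$-integrable functions over $A$. As it converges everywhere to $\langle X, f \rangle$, it converges also to $\langle X, f \rangle$ in $L^1(A)$. It follows that $\langle X, f \rangle$ is integrable over $A$ and that
$$E_A \langle X, f \rangle = \lim_{n \to \infty} E_A \langle X, f_n \rangle.$$
By Theorem 94,
$$E_A \langle X, f_n \rangle = \langle \mathcal{E}_A, f_n \rangle$$
and
$$\lim_{n \to \infty} \langle \mathcal{E}_A, f_n \rangle = \langle \mathcal{E}_A, f \rangle. \quad ■$$
We will end this section by giving in Theorem 98 a necessary and sufficient condition for weak integrability of RKHS valued random variables. For that we need a definition and a theorem of Hille and Phillips (1957).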
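DEFINITION 34 A set function $B$ defined on a probability space $(\Omega, \mathcal{A}, P)$ and taking its values in $\mathcal{H}$ is said to be absolutely continuous if and only if, for any $\varepsilon > 0$ there exists $\eta_\varepsilon > 0$ such that
\[ \forall A \in \mathcal{A} \quad \big(P(A) < \eta_\varepsilon \implies \|B(A)\| < \varepsilon\big). \]
THEOREM 97 If $X : (\Omega, \mathcal{A}, P) \to (\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ is weakly integrable the set function
\[ \mathcal{A} \to \mathcal{H}, \qquad A \mapsto X_A = \int_A X\, dP \]
is absolutely continuous on $(\Omega, \mathcal{A}, P)$.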
THEOREM 98 A random variable $X : (\Omega, \mathcal{A}, P) \to (\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ is weakly integrable if and only if the following two conditions are satisfied.
a) For any $A \in \mathcal{A}$ the mapping
\[ \mathcal{E}_A : t \mapsto E_A X_t \]
is an element of $\mathcal{H}$.
b) The set function
\[ A \mapsto \mathcal{E}_A \]
is absolutely continuous on $(\Omega, \mathcal{A}, P)$.
Proof. First suppose that $X$ is weakly integrable. Then for any $A$ in $\mathcal{A}$ and any $t$ in $E$ we have
\[ \mathcal{E}_A(t) = \int_A \langle X, K(\cdot, t)\rangle\, dP = X_A(t) \]
and a) is true. By Theorem 97 b) is also satisfied.
Let us now turn to the converse. a) and b) are supposed to be true. First note that $X$ is measurable since the $X_t$'s are measurable. We will prove that for $A \in \mathcal{A}$ the mapping $I_A$ is continuous at 0. Then the weak integrability of $X$ will follow by Theorem 96.
Let $(f_n)_{n\in\mathbb{N}}$ be a sequence of $\mathcal{H}_0$ converging to 0 in the norm sense. Let us show that $(I_A(f_n))$ tends to 0 as $n$ tends to infinity.
Let $\varepsilon > 0$. On the one hand, by b) there exists $\eta_1(\varepsilon) > 0$ such that, for any $B \in \mathcal{A}$,
\[ P(B) < \eta_1(\varepsilon) \implies \|\mathcal{E}_B\| < \varepsilon^{1/2}. \]
On the other hand $(\langle X, f_n\rangle)_{n\in\mathbb{N}}$ converges to 0 everywhere and therefore in probability. So there exists $N(\varepsilon)$ such that
\[ n \ge N(\varepsilon) \implies \begin{cases} P\big(|\langle X, f_n\rangle| > \tfrac{\varepsilon}{2}\big) < \eta_1(\varepsilon) \\ \text{and} \\ \|f_n\| < \tfrac{\varepsilon^{1/2}}{4}. \end{cases} \]
For $n \in \mathbb{N}$, let
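\[ A_n^+(\varepsilon) = A \cap \big\{|\langle X, f_n\rangle| > \tfrac{\varepsilon}{2}\big\} \cap \{\langle X, f_n\rangle \ge 0\} \]
and
\[ A_n^-(\varepsilon) = A \cap \big\{|\langle X, f_n\rangle| > \tfrac{\varepsilon}{2}\big\} \cap \{\langle X, f_n\rangle < 0\}. \]
We have
\[ \begin{aligned}
I_A(f_n) &= \int_A |\langle X, f_n\rangle|\, dP \\
&\le \int_{A \cap \{|\langle X, f_n\rangle| > \varepsilon/2\}} |\langle X, f_n\rangle|\, dP + \frac{\varepsilon}{2} \\
&\le \int_{A_n^+(\varepsilon)} \langle X, f_n\rangle\, dP - \int_{A_n^-(\varepsilon)} \langle X, f_n\rangle\, dP + \frac{\varepsilon}{2}.
\end{aligned} \]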
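Now, if $n \ge N(\varepsilon)$, we have the following inequalities
\[ \|\mathcal{E}_{A_n^+(\varepsilon)}\| < \varepsilon^{1/2}, \qquad \|\mathcal{E}_{A_n^-(\varepsilon)}\| < \varepsilon^{1/2}, \qquad \|f_n\| < \frac{\varepsilon^{1/2}}{4}, \]
so that, writing each integral above as an inner product with $f_n$ and applying the Cauchy-Schwarz inequality,
\[ I_A(f_n) \le \langle \mathcal{E}_{A_n^+(\varepsilon)}, f_n\rangle - \langle \mathcal{E}_{A_n^-(\varepsilon)}, f_n\rangle + \frac{\varepsilon}{2} < 2\,\varepsilon^{1/2}\,\frac{\varepsilon^{1/2}}{4} + \frac{\varepsilon}{2} = \varepsilon. \]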
∎
Sufficient conditions for a given function on $E$ to belong to $\mathcal{H}$ (condition a)) can be found in Duc-Jacquet (1973).
6. INNER PRODUCTS ON SETS OF MEASURES
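In the rest of this chapter we will denote by
$(E, \mathcal{T})$ a measurable space,
$\mathcal{M}$ the space of signed measures on $(E, \mathcal{T})$,
$\mathcal{M}^+$ the subset of $\mathcal{M}$ of positive measures,
$\mathcal{P}$ the set of probability measures on $(E, \mathcal{T})$,
$\mathcal{M}_0$ the space of measures with finite support,
$\mathcal{P}_0 = \mathcal{P} \cap \mathcal{M}_0$,
$K$ a real bounded measurable reproducing kernel on $E \times E$,
$\mathcal{H}$ the RKHS with kernel $K$,
$\mathcal{H}_0$ the subspace of $\mathcal{H}$ spanned by $(K(\cdot, x))_{x\in E}$.
From the properties of $K$, the mapping
\[ (E, \mathcal{T}, \mu) \to (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad x \mapsto K(\cdot, x) \]
is strongly integrable for all $\mu$ in $\mathcal{M}$ and we can define a mapping
\[ I : \mathcal{M} \to \mathcal{H}, \qquad \mu \mapsto I_\mu = \int K(\cdot, x)\, d\mu(x). \]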
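For measures with finite support the mapping $I$ can be written out explicitly, which may help to fix ideas. The following is a minimal sketch under stated assumptions: the Gaussian kernel is only one possible bounded measurable kernel, and the names `gaussian_kernel`, `embed` and `inner_product` are hypothetical helpers introduced for the illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """A bounded reproducing kernel on the real line (illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def embed(measure, kernel=gaussian_kernel):
    """Return I_mu = sum_i a_i K(., x_i) for mu = sum_i a_i delta_{x_i}.

    `measure` is a list of (point, weight) pairs; weights may be negative.
    """
    def representer(t):
        return sum(a * kernel(t, x) for x, a in measure)
    return representer


def inner_product(mu, nu, kernel=gaussian_kernel):
    """<I_mu, I_nu>_H = sum_i sum_j a_i b_j K(x_i, y_j) (reproducing property)."""
    return sum(a * b * kernel(x, y) for x, a in mu for y, b in nu)


mu = [(0.0, 0.5), (1.0, 0.5)]      # a probability measure with two atoms
nu = [(0.5, 1.0), (2.0, -0.3)]     # a signed measure
I_mu = embed(mu)
# Integrating the function I_mu against nu gives the same number as <I_mu, I_nu>_H.
integral_of_I_mu_dnu = sum(b * I_mu(y) for y, b in nu)
```

Here `integral_of_I_mu_dnu` and `inner_product(mu, nu)` agree up to rounding, which is the finite-support case of the identity $\langle I_\mu, I_\nu\rangle_{\mathcal{H}} = \int K\, d(\mu \otimes \nu)$.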
We will suppose that $K$ is such that the functions $I_\mu$ and $I_\nu$ are different if $\mu$ and $\nu$ are not equal. A consequence is that
\[ (I_\mu = 0) \iff (\mu = 0) \]
since $(\mu = 0)$ implies $(I_\mu = 0)$. The next lemma expresses that this property is shared by kernels that can be written
\[ K(x, y) = \sum_{i=0}^{\infty} f_i(x) f_i(y) \qquad (4.8) \]
where the set of functions $\{f_i : i \in \mathbb{N}\}$ characterizes signed measures on $(E, \mathcal{T})$, as is the case with the monomials in the moments example of Subsection 1.3.
DEFINITION 35 A set of complex functions $\{f_i : i \in \mathbb{N}\}$ on $E$ is said to characterize (or to determine, see Billingsley, 1968) the elements of $\mathcal{M}$ if and only if for any $\mu$ in $\mathcal{M}$
\[ \Big(\forall i \in \mathbb{N},\ \int f_i\, d\mu = 0\Big) \implies (\mu = 0). \]
It is clearly equivalent to say that two different signed measures $\mu$ and $\nu$ produce two different sequences $\{\int f_i\, d\mu : i \in \mathbb{N}\}$ and $\{\int f_i\, d\nu : i \in \mathbb{N}\}$.
LEMMA 19 Let $K$ be a kernel satisfying (4.8) with a set of functions $\{f_i : i \in \mathbb{N}\}$ characterizing the signed measures. Then
\[ (I_\mu = 0) \iff (\mu = 0). \]
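Proof. By the properties of integrals
\[ \begin{aligned}
\langle I_\mu, I_\nu\rangle_{\mathcal{H}} &= \Big\langle \int K(\cdot, x)\, d\mu(x), \int K(\cdot, x)\, d\nu(x) \Big\rangle_{\mathcal{H}} \\
&= \int \Big\langle K(\cdot, y), \int K(\cdot, x)\, d\nu(x) \Big\rangle_{\mathcal{H}} d\mu(y) \\
&= \int \Big( \int K(x, y)\, d\nu(x) \Big) d\mu(y) \\
&= \int \Big( \int \sum_{i=0}^{\infty} f_i(x) f_i(y)\, d\nu(x) \Big) d\mu(y).
\end{aligned} \]
On the other hand
\[ \Big| \sum_{i=0}^{n} f_i(x) f_i(y) \Big| \le \frac{1}{2} \sum_{i=0}^{n} \big(f_i(x)^2 + f_i(y)^2\big) \le \frac{1}{2} \big(K(x, x) + K(y, y)\big) \le \sup K. \]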
Using the Hahn-Jordan decomposition of $\mu$ and the Lebesgue dominated convergence theorem one gets
\[ \|I_\mu\|^2 = \sum_{i=0}^{\infty} \Big( \int f_i\, d\mu \Big)^2 \]
and the conclusion follows. ∎
Let us now state the fundamental link between reproducing kernels and inner products on sets of measures.
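THEOREM 99 The mapping
\[ \mathcal{M} \times \mathcal{M} \to \mathbb{R}, \qquad (\mu, \nu) \mapsto \langle \mu, \nu\rangle_{\mathcal{M}} = \langle I_\mu, I_\nu\rangle_{\mathcal{H}} = \int K(x, y)\, d(\mu \otimes \nu)(x, y) \]
defines an inner product on $\mathcal{M}$ for which $\mathcal{M}_0$ is dense in $\mathcal{M}$.
Conversely let $\langle\!\langle \cdot, \cdot \rangle\!\rangle$ be an inner product on $\mathcal{M}$ for which $\mathcal{M}_0$ is dense in $\mathcal{M}$. Suppose that the function $K(x, y) = \langle\!\langle \delta_x, \delta_y \rangle\!\rangle$ is measurable and bounded on $E \times E$ and that the corresponding mapping $I$ is one-to-one from $\mathcal{M}$ to $I(\mathcal{M})$. Then there exists a RKHS $\mathcal{H}$ with kernel $K$ and a unique linear mapping
\[ h : I(\mathcal{M}) \to \mathcal{H} \]
such that
\[ \langle\!\langle \mu, \nu \rangle\!\rangle_{\mathcal{M}} = \langle h(I(\mu)), h(I(\nu))\rangle_{\mathcal{H}} = \Big\langle h\Big(\int K(\cdot, x)\, d\mu(x)\Big), h\Big(\int K(\cdot, x)\, d\nu(x)\Big) \Big\rangle_{\mathcal{H}}. \]
If the mapping $x \mapsto \delta_x$ has for any $\mu$ a weak integral equal to $\mu$, then the mapping $h$ is equal to the identity and we have
\[ \begin{aligned}
\langle\!\langle \mu, \nu \rangle\!\rangle_{\mathcal{M}} &= \Big\langle \int K(\cdot, x)\, d\mu(x), \int K(\cdot, x)\, d\nu(x) \Big\rangle_{\mathcal{H}} \\
&= \int K\, d(\mu \otimes \nu) \quad \text{(by Fubini theorem)} \\
&= \langle \mu, \nu \rangle.
\end{aligned} \]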
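Proof. Direct part.
From its very definition it is clear that the mapping $\langle \cdot, \cdot \rangle$ is bilinear. As $K$ is symmetric and $(\mu \otimes \nu)$-integrable we have
\[ \begin{aligned}
\langle \mu, \nu \rangle &= \int K(x, y)\, d(\mu \otimes \nu)(x, y) \\
&= \int K(x, y)\, d(\nu \otimes \mu)(y, x) \\
&= \int K(y, x)\, d(\nu \otimes \mu)(y, x) \\
&= \langle \nu, \mu \rangle
\end{aligned} \]
and $\langle \cdot, \cdot \rangle$ is symmetric.
The positive definiteness follows from
\[ \langle \mu, \mu \rangle_{\mathcal{M}} = \|I_\mu\|_{\mathcal{H}}^2 \ge 0 \]
and the equivalence
\[ (I_\mu = 0) \iff (\mu = 0). \]
If $\mu$ and $\nu$ are two probabilities we have
\[ |\langle \mu, \nu \rangle| \le \Big| \int K(x, y)\, d(\mu \otimes \nu)(x, y) \Big| \le \sup |K|. \]
As we have
\[ \langle \mu, \nu \rangle_{\mathcal{M}} = \langle I_\mu, I_\nu \rangle_{\mathcal{H}} \]
the mapping $I$ is an isometry between $\mathcal{M}$ and $I(\mathcal{M})$ and the denseness of $\mathcal{M}_0$ in $\mathcal{M}$ is a consequence of the denseness of $\mathcal{H}_0$ in $\mathcal{H}$.
Converse part.
By linearity we have
\[ \langle\!\langle \mu, \nu \rangle\!\rangle_{\mathcal{M}} = \int K\, d(\mu \otimes \nu) \qquad (4.9) \]
for $\mu$ and $\nu$ belonging to the space $\mathcal{M}_0$ so that the restriction of the mapping $I$ is an isometry from $\mathcal{M}_0$ onto $I(\mathcal{M}_0) = \mathcal{H}_0$, but there is no reason for Formula (4.9) to hold for any elements $\mu$ and $\nu$ in $\mathcal{M}$ (see Exercise 1). Let $f$ be an element of $I(\mathcal{M})$ and $\mu = I^{-1}(f)$. As $\mathcal{M}_0$ is dense in $\mathcal{M}$ there exists a sequence $(\mu_n)_{n\in\mathbb{N}}$ in $\mathcal{M}_0$ converging to $\mu$. The isometry $I$ transforms this converging sequence into a Cauchy sequence in $\mathcal{H}$ converging to some element $g_f$. Define
\[ h : I(\mathcal{M}) \to \mathcal{H}, \qquad f \mapsto h(f) = g_f. \]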
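$h$ is well defined, linear, and we have
\[ \begin{aligned}
\langle h(I(\mu)), h(I(\nu)) \rangle_{\mathcal{H}} &= \lim_{n\to+\infty} \langle I(\mu_n), I(\nu_n) \rangle_{\mathcal{H}} \\
&= \lim_{n\to+\infty} \langle\!\langle \mu_n, \nu_n \rangle\!\rangle_{\mathcal{M}} \\
&= \langle\!\langle \mu, \nu \rangle\!\rangle_{\mathcal{M}}.
\end{aligned} \]
This ends the first part of the converse proof.
Now, suppose that for any $\mu$ in $\mathcal{M}$ we have
\[ \int \delta_x\, d\mu(x) = \mu. \]
Then
\[ \begin{aligned}
\langle\!\langle \mu, \nu \rangle\!\rangle_{\mathcal{M}} &= \Big\langle\!\Big\langle \int \delta_x\, d\mu(x), \int \delta_x\, d\nu(x) \Big\rangle\!\Big\rangle_{\mathcal{M}} \\
&= \int \Big\langle\!\Big\langle \delta_x, \int \delta_y\, d\nu(y) \Big\rangle\!\Big\rangle_{\mathcal{M}} d\mu(x) \\
&= \int \Big( \int \langle\!\langle \delta_x, \delta_y \rangle\!\rangle_{\mathcal{M}}\, d\nu(y) \Big) d\mu(x) \\
&= \int K\, d(\mu \otimes \nu) \\
&= \Big\langle \int K(\cdot, x)\, d\mu(x), \int K(\cdot, x)\, d\nu(x) \Big\rangle_{\mathcal{H}}
\end{aligned} \]
and $h$ is the identity operator. ∎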
Remark
Consider a random variable $X$ taking its values in $(E, \mathcal{T})$ with unknown probability $P_X$. Then the random Dirac measure $\delta_X$ can be seen as an estimator of $P_X$ based on one observation $X$. For $\mu = P_X$ the condition given on the mapping $x \mapsto \delta_x$ is equivalent to the unbiasedness of the estimator $\delta_X$ since its (weak) expectation is
\[ \int \delta_x\, d\mu(x) = \mu. \]
7. INNER PRODUCT AND WEAK TOPOLOGY
A major tool in the study of measures is the weak topology. It is therefore important to compare topologies on $\mathcal{M}$ defined by inner products and the weak topology.
In this section, adapted from Guilbart (1978a), $E$ is a separable metric space and $\mathcal{T}$ is its Borel $\sigma$-algebra. Let $\langle \cdot, \cdot \rangle_{\mathcal{M}}$ be an inner product on $\mathcal{M}$ such that the function $K(x, y) = \langle \delta_x, \delta_y \rangle_{\mathcal{M}}$ is bounded on $E \times E$. Recall that, by the Cauchy-Schwarz inequality, $K(x, y)$ is bounded on $E \times E$ if and only if $K(x, x)$ is bounded on $E$.
The following theorem gives two important properties of the function $K$ when the inner product induces the weak topology on the set $\mathcal{P}$ of probabilities on $(E, \mathcal{T})$.
THEOREM 100 If the topology induced on $\mathcal{P}$ by the inner product $\langle \cdot, \cdot \rangle_{\mathcal{M}}$ coincides with the weak topology then the function $K$ is continuous and
\[ \langle \mu, \nu \rangle_{\mathcal{M}} = \int K\, d(\mu \otimes \nu). \qquad (4.10) \]
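Proof. Let $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ be two sequences in $E$ converging respectively to $x$ and $y$. Then the sequences $(\delta_{x_n})_{n\in\mathbb{N}}$ and $(\delta_{y_n})_{n\in\mathbb{N}}$ converge respectively to $\delta_x$ and $\delta_y$ in the sense of the weak topology (Parthasarathy, 1967) and therefore in the sense of the inner product. As we have
\[ \begin{aligned}
K(x, y) - K(x_n, y_n) &= \langle \delta_x, \delta_y \rangle_{\mathcal{M}} - \langle \delta_{x_n}, \delta_{y_n} \rangle_{\mathcal{M}} \\
&= \langle \delta_x - \delta_{x_n}, \delta_y \rangle_{\mathcal{M}} - \langle \delta_{x_n}, \delta_{y_n} - \delta_y \rangle_{\mathcal{M}},
\end{aligned} \]
we can write, by the Cauchy-Schwarz inequality,
\[ |K(x, y) - K(x_n, y_n)| \le \|\delta_x - \delta_{x_n}\|_{\mathcal{M}}\, \|\delta_y\|_{\mathcal{M}} + \|\delta_{x_n}\|_{\mathcal{M}}\, \|\delta_{y_n} - \delta_y\|_{\mathcal{M}} \]
and the continuity of $K$ follows.
Relation (4.10) is satisfied for Dirac measures and, by linearity, in the set $\mathcal{M}_0$ of measures with finite support. Now let $(\mu, \nu) \in \mathcal{M}^2$. As $\mathcal{M}_0$ is dense in $\mathcal{M}$ in the sense of the weak topology there exist two sequences $(\mu_n)_{n\in\mathbb{N}}$ and $(\nu_n)_{n\in\mathbb{N}}$ of elements of $\mathcal{M}_0$ converging weakly respectively to $\mu$ and $\nu$. By hypothesis this convergence also occurs in the sense of the norm on $\mathcal{M}$ so that the sequence of inner products
\[ \langle \mu_n, \nu_n \rangle_{\mathcal{M}} = \int K\, d(\mu_n \otimes \nu_n) \]
tends to $\langle \mu, \nu \rangle_{\mathcal{M}}$.
On the other hand $K$ is bounded and continuous and $(\mu_n \otimes \nu_n)_{n\in\mathbb{N}}$ tends weakly to $\mu \otimes \nu$, therefore
\[ \int K\, d(\mu_n \otimes \nu_n) \ \text{tends to}\ \int K\, d(\mu \otimes \nu). \]
Hence, (4.10) is proved. ∎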
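The first step of the proof can be checked numerically on Dirac measures. The sketch below assumes a Gaussian kernel on the real line (an illustrative choice, not imposed by the theorem) and uses the hypothetical helper `squared_distance_between_diracs`; it computes $\|\delta_{x_n} - \delta_x\|_{\mathcal{M}}^2 = K(x_n, x_n) - 2K(x_n, x) + K(x, x)$ for $x_n = x + 1/n$.

```python
import math


def gaussian_kernel(x, y):
    """A bounded continuous kernel on the real line, chosen only for illustration."""
    return math.exp(-((x - y) ** 2) / 2.0)


def squared_distance_between_diracs(a, b, kernel=gaussian_kernel):
    """||delta_a - delta_b||^2 for the inner product <mu, nu> = int K d(mu x nu)."""
    return kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)


x = 0.3
distances = [squared_distance_between_diracs(x + 1.0 / n, x) for n in (1, 10, 100, 1000)]
```

The values in `distances` decrease towards 0 as $x_n \to x$, in line with the continuity of $K$. By contrast the total variation distance between $\delta_{x_n}$ and $\delta_x$ stays equal to 2 whenever $x_n \neq x$, so the norm induced by a continuous kernel behaves like the weak topology and not like the total variation topology.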
Now let us see under what conditions the converse of Theorem 100 holds true. For this we need Lemma 20 on orthonormal systems in $\mathcal{H}$ characterizing signed measures on $E$ and Lemma 21 on weak convergence.
LEMMA 20 Let $\mathcal{H}$ be a Hilbert space of functions defined on a compact metric space $(E, d)$ with continuous reproducing kernel $K$. Then any orthonormal system $(e_i)$ in $\mathcal{H}$ characterizing signed measures is total in the set $C_b(E, \mathbb{C})$ of bounded continuous complex functions on $E$ endowed with the sup norm.
Proof. From Corollary 5 it is clear that $\mathcal{H} \subset C_b(E, \mathbb{C})$. Let $(e_i)$ be any orthonormal system in $\mathcal{H}$ characterizing signed measures. The closed subspace of $C_b(E, \mathbb{C})$ spanned by $(e_i)$ is denoted by $S$. Suppose that there exists an element $\varphi$ of $C_b(E, \mathbb{C})$ that does not belong to $S$. By the Hahn-Banach theorem (Rudin, 1975) there exists a continuous linear form $L$ on $C_b(E, \mathbb{C})$ which vanishes on $S$ and takes a non-zero value at $\varphi$. By the Riesz representation theorem there exists $\mu$ in $\mathcal{M}$ such that
\[ L(f) = \int f\, d\mu. \]
As $L(\varphi) \neq 0$, the measure $\mu$ is not null and yet
\[ \forall i \in \mathbb{N}, \quad \int e_i\, d\mu = 0. \]
We get a contradiction. Hence, $\varphi$ does not exist and $S = C_b(E, \mathbb{C})$. The system $(e_i)$ is total in $C_b(E, \mathbb{C})$. ∎
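LEMMA 21 A sequence $(e_i)$ which is total in $C_b(E, \mathbb{C})$ determines the weak convergence of probability measures, i.e. for any sequence $(\mu_n)$ in $\mathcal{P}$ and any $\mu$ in $\mathcal{P}$ the weak convergence of $(\mu_n)$ to $\mu$ is equivalent to
\[ \forall i \in \mathbb{N}, \quad \int e_i\, d\mu_n \to \int e_i\, d\mu \ \text{as}\ n \to \infty. \qquad (4.11) \]
Proof. Condition (4.11) is clearly necessary by definition of the weak convergence.
Let us now prove its sufficiency. By linearity, for any $f$ belonging to the vector space $\mathcal{E}$ spanned by $(e_i)$ we have
\[ \int f\, d\mu_n \to \int f\, d\mu \ \text{as}\ n \to \infty. \]
Let $\varepsilon > 0$ and let $g$ be any element of $C_b(E, \mathbb{C})$. For $f$ in $\mathcal{E}$ such that
\[ \sup |g - f| \le \varepsilon \]
we can write
\[ \Big| \int g\, d\mu_n - \int g\, d\mu \Big| \le \Big| \int g\, d\mu_n - \int f\, d\mu_n \Big| + \Big| \int f\, d\mu_n - \int f\, d\mu \Big| + \Big| \int f\, d\mu - \int g\, d\mu \Big| \]
and therefore, $\mu_n$ and $\mu$ being probability measures,
\[ \Big| \int g\, d\mu_n - \int g\, d\mu \Big| \le \varepsilon + \Big| \int f\, d\mu_n - \int f\, d\mu \Big| + \varepsilon. \]
As the second term in the above right hand side member tends to zero as $n \to \infty$, we get that
\[ \int g\, d\mu_n \to \int g\, d\mu. \]
The conclusion follows.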
∎
We end this section by stating the converse of Theorem 100 in the case where $E$ is a compact metric space. The case where $E$ is a non compact separable metric space is treated by Guilbart (1978a).
THEOREM 101 Suppose that $E$ is a compact metric space, that the function $K(x, y) = \langle \delta_x, \delta_y \rangle_{\mathcal{M}}$ is continuous and that
\[ \langle \mu, \nu \rangle_{\mathcal{M}} = \int K\, d(\mu \otimes \nu). \]
Then the topology induced on $\mathcal{P}$ by the inner product $\langle \cdot, \cdot \rangle_{\mathcal{M}}$ coincides with the weak topology.
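Proof. Applying Corollary 5 we can write
\[ \forall s \in E,\ \forall t \in E, \quad K(s, t) = \sum_{i=0}^{\infty} e_i(t)\, e_i(s), \qquad (4.12) \]
where the convergence is uniform on $E \times E$, $(e_i)$ is an orthonormal system in $\mathcal{H}$ and each $e_i$ is uniformly continuous and bounded. For any signed measure $\mu$ on $(E, \mathcal{T})$ we have
\[ \begin{aligned}
\|\mu\|_{\mathcal{M}}^2 &= \int K\, d(\mu \otimes \mu) \\
&= \int \Big( \sum_{i=0}^{\infty} e_i(x) e_i(y) \Big) d(\mu \otimes \mu)(x, y) \\
&= \sum_{i=0}^{\infty} \Big( \int e_i\, d\mu \Big)^2
\end{aligned} \]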
by uniform convergence and the Fubini theorem. It follows that the system $(e_i)$ characterizes signed measures and, by Lemmas 20 and 21, that the weak convergence in $\mathcal{M}$ of a sequence $(\mu_n)$ to some element $\mu$ is equivalent to
\[ \forall i \in \mathbb{N}, \quad \int e_i\, d\mu_n \to \int e_i\, d\mu \ \text{as}\ n \to \infty. \qquad (4.13) \]
Hence it is clear that convergence in the sense of the inner product implies weak convergence.
Conversely, if $(\mu_n)$ converges weakly to $\mu$ then $(\mu_n \otimes \mu_n)$ converges weakly to $\mu \otimes \mu$. The function $K$ being bounded and continuous this implies the convergence of
\[ \|\mu_n\|^2 = \int K\, d(\mu_n \otimes \mu_n) \]
to $\|\mu\|^2$. Together with (4.13) this implies convergence in the sense of the inner product. ∎
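The expansion of $\|\mu\|_{\mathcal{M}}^2$ used in this proof can be illustrated on a case where everything is explicit. The sketch below assumes $E = [0, 0.5]$ and takes $e_i(t) = t^i$, so that $K(s, t) = \sum_i s^i t^i = 1/(1 - st)$; this is meant to mirror the moments example mentioned in the text, and the function names are hypothetical.

```python
def moment_kernel(s, t):
    """K(s, t) = sum_{i>=0} s^i t^i = 1 / (1 - s t), valid for |s t| < 1."""
    return 1.0 / (1.0 - s * t)


def squared_norm_via_kernel(measure):
    """int K d(mu x mu) for mu = sum_j a_j delta_{x_j}, given as (point, weight) pairs."""
    return sum(a * b * moment_kernel(x, y) for x, a in measure for y, b in measure)


def squared_norm_via_moments(measure, n_terms=80):
    """Truncated series sum_i (int t^i dmu)^2; the tail is negligible on [0, 0.5]."""
    total = 0.0
    for i in range(n_terms):
        moment = sum(a * x ** i for x, a in measure)
        total += moment ** 2
    return total


mu = [(0.0, 1.0), (0.25, -2.0), (0.5, 1.5)]   # a signed measure with three atoms
```

Both `squared_norm_via_kernel(mu)` and `squared_norm_via_moments(mu)` return the same value up to rounding, so the norm of a measure is here the $\ell^2$ norm of its sequence of moments.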
As we have seen, the functions $e_i$ appearing in the decomposition of the reproducing kernel play a key role in many proofs. When those functions satisfy an additional condition on their upper bounds it is possible to derive a Glivenko-Cantelli type theorem for random variables taking their values in $E$ (see Guilbart, 1977a and Exercise 3). Such a theorem, with rate of convergence, is basic in applications to statistical estimation and hypothesis testing.
8. APPLICATION TO NORMAL APPROXIMATION
In the present section we give an example of application of the RKHS methodology to the normal approximation of partial sums of random variables with rates of convergence (Berry-Esseen theorems). It originates from a paper by Suquet (1994). The space of measures is embedded in a reproducing kernel Hilbert space and in an $L^2$ space using an integral representation of the kernel. Then the weak convergence of probability measures can be metrized through a suitable choice of the kernel and rates of convergence in the Central Limit Theorem can be easily derived.
Let $X_1, \dots, X_n$ be independent real random variables with mean zero and finite moments of order 3. Denote by $\sigma_j$ the standard deviation of $X_j$, $1 \le j \le n$,
\[ S_n = \sum_{j=1}^{n} X_j \quad \text{and} \quad S_n^* = \frac{S_n}{s_n} \]
where
\[ s_n = \Big( \sum_{j=1}^{n} \sigma_j^2 \Big)^{1/2} \]
is the standard deviation of $S_n$.
Then the Berry-Esseen theorem (Shorack and Wellner, 1986) provides an upper bound for the Kolmogorov distance $\|F_n^* - F\|_\infty$ between the distribution function $F_n^*$ of $S_n^*$ and the distribution function $F$ of $N(0, 1)$.
THEOREM 102 (BERRY-ESSEEN THEOREM) There exists a universal constant $C$ such that
\[ \|F_n^* - F\|_\infty = \sup_{x \in \mathbb{R}} |P(S_n^* \le x) - P(Z \le x)| \le C\, s_n^{-3} \sum_{j=1}^{n} E|X_j|^3 \qquad (4.14) \]
where $Z$ has distribution $N(0, 1)$.
Other distances have been considered (Rachev, 1991) for which similar bounds have been proved. Our goal here is to show that the RKHS framework is well adapted to prove such results.
Consider a non negative real function $q$ integrable with respect to the Lebesgue measure $\lambda$. The functions $\exp(ix\,\cdot)$, $x \in \mathbb{R}$, belong to the space $L^2(q)$ of square integrable functions with respect to the measure with density $q$ on $\mathbb{R}$. Hence, by Lemma 1, the function
\[ K(x, y) = \langle \exp(ix\,\cdot), \exp(iy\,\cdot) \rangle_{L^2(q)} = \int \exp(iu(x - y))\, q(u)\, d\lambda(u) \]
is a reproducing kernel on $\mathbb{R}$. Denote by $d_{\mathcal{M}}$ the distance on the space $\mathcal{M}$ of bounded signed measures associated with $K$, by $\mathcal{L}(S_n)$ the probability distribution of $S_n$ and let
\[ a(q) = \int u^6 q(u)\, \lambda(du). \]
Then we have the following bound (Suquet, 1994).
THEOREM 103 Suppose that $X_1, \dots, X_n$ are independent real random variables with mean zero and finite moments of order 3 and that the function $q$ satisfies
\[ 0 < a(q) < \infty. \]
Then the distance $d_{\mathcal{M}}$ metrizes the weak topology on the set of probability measures and we have
Under the same hypotheses, by considering the variables $X_j / s_n$, one easily gets the following bound
which takes the form
if the variables $X_1, \dots, X_n$ satisfy, for $1 \le i \le n$,
\[ E|X_i|^3 = \tau \quad \text{and} \quad (E X_i^2)^{1/2} = \sigma. \]
For the proof of Theorem 103 and extensions to multivariate and dependent cases the reader is referred to Suquet (1994).
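To make the integral representation of the kernel concrete, here is a minimal sketch assuming that $q$ is the standard normal density (an illustrative choice; any non negative integrable $q$ with $0 < a(q) < \infty$ would do). In that case $K(x, y) = \exp(-(x - y)^2/2)$ and $a(q) = 15$, and both values can be recovered by a simple quadrature. The helper names and the truncation of the integrals to $[-12, 12]$ are assumptions of the sketch.

```python
import math


def q(u):
    """Standard normal density, an illustrative choice of the weight function q."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)


def trapezoid(func, lower, upper, n_steps):
    """Composite trapezoidal rule on [lower, upper]."""
    step = (upper - lower) / n_steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, n_steps):
        total += func(lower + k * step)
    return total * step


def kernel(x, y):
    """K(x, y) = int exp(iu(x - y)) q(u) du; the imaginary part vanishes since q is even."""
    return trapezoid(lambda u: math.cos(u * (x - y)) * q(u), -12.0, 12.0, 4800)


def a_of_q():
    """a(q) = int u^6 q(u) du, computed on a truncated range."""
    return trapezoid(lambda u: u ** 6 * q(u), -12.0, 12.0, 4800)


def squared_distance(mu, nu):
    """d_M(mu, nu)^2 for finite-support measures given as (point, weight) pairs,
    using the closed form of K for this particular q."""
    diff = list(mu) + [(x, -w) for x, w in nu]
    return sum(a * b * math.exp(-((x - y) ** 2) / 2.0) for x, a in diff for y, b in diff)
```

With this choice `kernel(0.3, 1.7)` agrees with `math.exp(-0.98)` and `a_of_q()` with 15 up to quadrature error, and `squared_distance` gives the squared distance $d_{\mathcal{M}}^2$ between two discrete measures.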
9. RANDOM MEASURES
At first sight it seems natural to define a random measure as a random variable with values in a set of measures $\mathcal{M}$ equipped with some $\sigma$-algebra. However, the definition of this $\sigma$-algebra, possibly derived from some topology on $\mathcal{M}$, is not a simple matter and the resulting theory involves delicate mathematical questions (Kallenberg (1983), Karr (1986), Geffroy and Zeboulon (1975), Jacob (1978, 1979)).
As the theory of Hilbert space valued random variables is much easier to handle there is a great temptation to use the embedding
\[ I : \mathcal{M} \to \mathcal{H}, \qquad \mu \mapsto \int K(\cdot, x)\, d\mu(x) \]
introduced in this chapter to define and study random measures. But even in this framework difficult questions immediately come up. How to characterize the elements of $I(\mathcal{M})$ among the elements of $\mathcal{H}$? What can be the limit of a sequence of elements of $I(\mathcal{M})$? How to characterize the $\sigma$-algebras on $\mathcal{H}$ containing the set $I(\mathcal{M})$? From Theorem 88 the Borel $\sigma$-algebra $\mathcal{B}_{\mathcal{H}}$ on a separable RKHS is generated by the evaluation functionals but there is a priori no reason for $\mathcal{B}_{\mathcal{H}}$ to contain $I(\mathcal{M})$ (even if $I(\mathcal{M})$ is a dense subspace!). Under some additional conditions $I(\mathcal{M})$ can be proved to be a Borel set when $\mathcal{M}$ is either the set of signed measures or the set of positive measures on $(E, \mathcal{B}_E)$ where $E$ is a locally compact or separable metric space (Suquet, 1993).
Let us adopt the same route as in the above sections and start with Dirac and finite support measures. As we will see this route leads to a functional theory rather than a set theory of random measures.
The notion of random measure on a measurable space $(E, \mathcal{T})$ can be introduced through the simple and understandable example of the Dirac measure $\delta_Y$, where
\[ Y : (\Omega, \mathcal{A}, P) \to (E, \mathcal{T}) \]
is a random variable. For any set $A$ in the $\sigma$-algebra $\mathcal{T}$, we have
\[ \delta_Y(A) = \int 1_A\, d\delta_Y = 1_A(Y) = \begin{cases} 1 & \text{if } Y \in A \\ 0 & \text{if } Y \notin A \end{cases} \]
and, more generally, for any measurable function
\[ f : (E, \mathcal{T}) \to (\mathbb{C}, \mathcal{B}_{\mathbb{C}}), \]
the random integral $\int f\, d\delta_Y$ is equal to the random variable
\[ f(Y) : (\Omega, \mathcal{A}, P) \to (\mathbb{C}, \mathcal{B}_{\mathbb{C}}). \]
$\mathcal{H}$ being a RKHS of functions on $E$ with measurable kernel $K$, $K(\cdot, Y)$ is a $\mathcal{H}$-valued random variable. More generally, two triangular arrays being given (all random variables are defined on $(\Omega, \mathcal{A}, P)$)
• one of complex random variables
\[ A_{1,n}, A_{2,n}, \dots, A_{k(n),n}, \]
• one of random points in $E$
\[ Y_{1,n}, Y_{2,n}, \dots, Y_{k(n),n}, \]
the sums
\[ \mu_n = \sum_{i=1}^{k(n)} A_{i,n}\, \delta_{Y_{i,n}} \]
form a sequence of measures on $E$ with finite support, associating with $f$ a random integral
\[ \int f\, d\mu_n = \sum_{i=1}^{k(n)} A_{i,n}\, f(Y_{i,n}). \]
Each of these measures is represented by the $\mathcal{H}$-valued random variable
\[ \sum_{i=1}^{k(n)} A_{i,n}\, K(\cdot, Y_{i,n}) \]
in the sense that for any $f$ in $\mathcal{H}$, we have
\[ \int f\, d\mu_n = \sum_{i=1}^{k(n)} A_{i,n}\, f(Y_{i,n}) = \Big\langle f, \sum_{i=1}^{k(n)} A_{i,n}\, K(\cdot, Y_{i,n}) \Big\rangle_{\mathcal{H}}. \]
Note that the dimension $k(n)$ of the above triangular arrays can itself be a random variable so that the present setting covers many examples of sequences of discrete random measures. Let us briefly mention a few of them.
Example 1. Empirical measure. The empirical measure
\[ \mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{Y_i} \]
associated with a sample $(Y_1, \dots, Y_n)$ of $n$ random variables is dealt with in Subsection 9.1 below. It corresponds to
\[ k(n) = n, \quad Y_{i,n} = Y_i \quad \text{and} \quad A_{i,n} = \frac{1}{n}. \]
Example 2. Donsker measure. It can be written as
\[ \mu_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i\, \delta_{\{i/n\}} \]
and therefore corresponds to
\[ k(n) = n, \quad Y_{i,n} = i/n \quad \text{and} \quad A_{i,n} = \frac{Y_i}{\sqrt{n}}. \]
The random functions involved in the Donsker theorem (Billingsley, 1968) can be written as integrals with respect to Donsker measures. Main applications are invariance principles in RKHS and $L^2$ spaces (Suquet, 1993, Suquet and Oliveira, 1994).
Example 3. Point process. A point process on $E$ can be defined as
\[ \mu_n = \sum_{i=1}^{N} \delta_{Y_i} \]
where $N$ is the random number of points $Y_1, \dots, Y_N$ falling into $E$. It corresponds to
\[ k(n) = N, \quad Y_{i,n} = Y_i \quad \text{and} \quad A_{i,n} = 1. \]
Other models can be considered, for instance thinned point processes for which each observation $Y_i$ is deleted with some probability $p(Y_i)$.
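The three examples share the same data structure: a finite list of weights $A_{i,n}$ attached to points $Y_{i,n}$. The sketch below is a hypothetical illustration of that structure (the function names, the Gaussian sample and the representation of a measure as a list of `(weight, point)` pairs are assumptions); it builds each measure and computes the random integral $\int f\, d\mu_n = \sum_i A_{i,n} f(Y_{i,n})$.

```python
import math
import random


def integrate(f, measure):
    """Integral of f against sum_i A_i delta_{Y_i}; `measure` is a list of (weight, point)."""
    return sum(weight * f(point) for weight, point in measure)


def empirical_measure(sample):
    """Weights 1/n at the observations Y_1, ..., Y_n."""
    n = len(sample)
    return [(1.0 / n, y) for y in sample]


def donsker_measure(sample):
    """Weights Y_i / sqrt(n) at the deterministic points i/n, i = 1, ..., n."""
    n = len(sample)
    return [(y / math.sqrt(n), (i + 1) / n) for i, y in enumerate(sample)]


def thinned_point_process(points, deletion_probability, rng):
    """Unit masses at the points that survive independent deletion with probability p(Y_i)."""
    return [(1.0, y) for y in points if rng.random() >= deletion_probability(y)]


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(5)]
mean_of_sample = integrate(lambda y: y, empirical_measure(sample))
normalized_sum = integrate(lambda t: 1.0, donsker_measure(sample))
```

Integrating the identity against the empirical measure returns the sample mean, and integrating the constant 1 against the Donsker measure returns the normalized partial sum $n^{-1/2}\sum_i Y_i$.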
RKHS methods can be used for estimating and predicting point processes (Bensaid, 1992a) and random measures (Bosq, 1979). In the following subsection we consider the empirical measure. Strong approximations of empirical processes will be presented in Chapter 5.
9.1. THE EMPIRICAL MEASURE AS $\mathcal{H}$-VALUED VARIABLE
The present subsection deals with the simple case of the empirical measure. It can serve as an introduction to the general theory of random measures as RKHS valued random variables.
Let $Y : (\Omega, \mathcal{A}, P) \to (E, \mathcal{T})$ be a random variable with unknown probability measure $\mu$ on $E$. For any $t \in E$ the Dirac measure $\delta_t$ defines a continuous linear form on $\mathcal{H}$:
\[ f \mapsto \int_E f\, d\delta_t = f(t) \]
which is nothing else than the evaluation functional $e_t$ represented in $\mathcal{H}$ by the function $K(\cdot, t)$. Thus the random Dirac measure $\delta_Y$ is represented in $\mathcal{H}$ by the variable $K(\cdot, Y)$. Let $(Y_i)_{i \ge 1}$ be a sequence of independent random variables taking their values in $(E, \mathcal{T})$, defined on $(\Omega, \mathcal{A}, P)$ with common probability measure $\mu$. The natural estimate of $\mu$ associated with this sample is the empirical measure
\[ \mu_n = \frac{1}{n} \sum_{k=1}^{n} \delta_{Y_k}. \]
It is represented in $\mathcal{H}$ by
\[ \frac{1}{n} \sum_{k=1}^{n} K(\cdot, Y_k). \]
The RKHS theory provides a functional framework to study how $\mu_n$ approximates $\mu$ or rather how its representer in $\mathcal{H}$ approximates the representer of $\mu$ which is
\[ I_\mu = \int K(\cdot, x)\, d\mu(x). \]
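As a numerical illustration of this approximation, the sketch below assumes $E = \mathbb{R}$, $\mu = N(0, 1)$ and the Gaussian kernel $K(t, x) = \exp(-(t - x)^2/2)$; none of these choices comes from the text. Under these assumptions $I_\mu(t) = \exp(-t^2/4)/\sqrt{2}$ in closed form, so the representer of the empirical measure can be compared with its target pointwise.

```python
import math
import random


def gaussian_kernel(t, x):
    """Illustrative bounded kernel on the real line."""
    return math.exp(-((t - x) ** 2) / 2.0)


def empirical_representer(sample):
    """Representer (1/n) sum_k K(., Y_k) of the empirical measure of `sample`."""
    n = len(sample)

    def representer(t):
        return sum(gaussian_kernel(t, y) for y in sample) / n

    return representer


def exact_representer(t):
    """I_mu(t) = E K(t, Y) for Y ~ N(0, 1) and the kernel above."""
    return math.exp(-t * t / 4.0) / math.sqrt(2.0)


rng = random.Random(12345)
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
I_n = empirical_representer(sample)
errors = [abs(I_n(t) - exact_representer(t)) for t in (-1.0, 0.0, 0.5, 2.0)]
```

With 20000 observations the pointwise errors are of the order of a few thousandths, which is what the $n^{-1/2}$ rate of the central limit theorem suggests.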
From Corollary 12 the mapping
\[ (\Omega, \mathcal{A}, P) \to (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad \omega \mapsto \frac{1}{n} \sum_{k=1}^{n} K(\cdot, Y_k(\omega)) \]
is measurable for any $n \in \mathbb{N}^*$ if and only if $K(t, Y)$ is a real random variable for any $t \in E$ (to simplify we consider here that $\mathcal{H}$ is a space of real functions on $E$). When this last condition is fulfilled the empirical measure can be considered as a $\mathcal{H}$-valued random variable. This is the case if the mapping
\[ \Psi_K : (E, \mathcal{T}) \to (\mathcal{H}, \mathcal{B}_{\mathcal{H}}), \qquad t \mapsto K(\cdot, t) \]
is measurable.
In the rest of this subsection we will make the assumption that $K(\cdot, Y)$ is a second order random variable and that two elements $f$ and $g$ of $\mathcal{H}$ have the same integral with respect to $\mu$ if and only if $f = g$.
9.1.1 INTEGRABLE KERNELS
Let us summarize the assumptions made on $K$, $\mathcal{H}$ and $\mu$:
(H1) $K$ is $(\mathcal{T}^{\otimes 2} - \mathcal{B}_{\mathbb{R}})$-measurable.
(H2) The mapping
\[ E \to \mathbb{R}^+, \qquad x \mapsto K(x, x) \]
belongs to $L^1(\mu)$.
(H3) The null function is the only function in $\mathcal{H}$ that is null $\mu$-almost everywhere.
As $\Psi_K(x) = K(\cdot, x)$, $\|\Psi_K(x)\|^2 = K(x, x)$ and we have
\[ \int K(x, x)\, d\mu(x) = \int \|\Psi_K(x)\|^2\, d\mu(x) = \int \|K(\cdot, Y)\|^2\, dP. \]
$K(\cdot, Y)$ is a second order random variable on $(\Omega, \mathcal{A}, P)$ if and only if $\Psi_K$ is a second order random variable on $(E, \mathcal{T}, \mu)$ and this is equivalent to hypothesis (H2).
Before giving properties of the natural estimate of $I_\mu = \int K(\cdot, x)\, d\mu(x)$ let us draw some consequences of our assumptions. For definitions of hilbertian subspaces and Schwartz kernels see Subsection 6.1 in Chapter 1.
THEOREM 104 $\mathcal{H}$ is a hilbertian subspace of $L^2(\mu)$.
Proof. Let $g \in \mathcal{H}$. As we have
\[ |g(x)|^2 = |\langle g, K(\cdot, x) \rangle_{\mathcal{H}}|^2 \le \|g\|_{\mathcal{H}}^2\, K(x, x) \]
the function $g$ belongs to $L^2(\mu)$ and
\[ \|g\|_{L^2(\mu)} \le \|g\|_{\mathcal{H}} \Big( \int K(x, x)\, d\mu(x) \Big)^{1/2}. \]
The conclusion follows.
∎
For any square integrable real function $g$ on $(E, \mathcal{T}, \mu)$, denote by $\tilde{g}$ its class in $L^2(\mu)$. Assumption (H3) implies that the natural embedding
\[ \mathcal{H} \to L^2(\mu), \qquad g \mapsto \tilde{g} \]
is an isomorphism of Hilbert spaces between $\mathcal{H}$ and its image.
THEOREM 105 For any $g$ in $L^2(\mu)$, the mapping
\[ \mathcal{H} \to \mathbb{R}, \qquad f \mapsto \int f g\, d\mu \]
is a continuous linear form on $\mathcal{H}$. It is represented in $\mathcal{H}$ by
\[ I_{g.\mu} = \int K(\cdot, x)\, g(x)\, d\mu(x), \]
where $g.\mu$ stands for the measure with density $g$ with respect to $\mu$.
Proof. Let $g$ in $L^2(\mu)$. Then the mapping
\[ E \to \mathcal{H}, \qquad x \mapsto g(x) K(\cdot, x) \]
is defined $\mu$-almost everywhere, measurable and Bochner integrable by Assumption (H2). Thus
\[ \int K(\cdot, x)\, g(x)\, d\mu(x) \]
does exist and belongs to $\mathcal{H}$. By the properties of Bochner integral we have, for any $f$ in $\mathcal{H}$,
\[ \Big\langle f, \int K(\cdot, x) g(x)\, d\mu(x) \Big\rangle_{\mathcal{H}} = \int \langle f, K(\cdot, x) g(x) \rangle\, d\mu(x) = \int f g\, d\mu \]
and the theorem follows. ∎
Remark The continuity of the linear form
\[ \mathcal{H} \to \mathbb{R}, \qquad f \mapsto \int f g\, d\mu = \langle f, g \rangle_{L^2(\mu)} \]
is an immediate consequence of Theorem 104. By Lemma 10 this continuous linear form is represented in $\mathcal{H}$ by the function
\[ t \mapsto \int K(t, x)\, g(x)\, d\mu(x). \]
Hence we can conclude that the weak integral
\[ \int K(\cdot, x)\, g(x)\, d\mu(x) \]
exists. Then Assumption (H2) has to be invoked to obtain strong integrability.
We are now in a position to describe the Schwartz kernel of $\mathcal{H}$.
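THEOREM 106 The mapping
\[ N : L^2(\mu) \to \mathcal{H}, \qquad g \mapsto N(g) = I_{g.\mu} = \int K(\cdot, x)\, g(x)\, d\mu(x) \]
is the Schwartz kernel of $\mathcal{H}$ considered as hilbertian subspace of $L^2(\mu)$.
Proof. The kernel $L$ of $\mathcal{H}$ is characterized by
From Theorem 105 it follows that
thus $L = N$. ∎
It is worth noting that the restriction to $\mathcal{H}$ of the Schwartz kernel $N$ is the covariance operator $C_{K(\cdot, Y)}$ of the $\mathcal{H}$-valued random variable $K(\cdot, Y)$. To see this, let $f$ and $g$ be two elements of $\mathcal{H}$. Then we have
\[ \begin{aligned}
\langle C_{K(\cdot, Y)}(g), f \rangle_{\mathcal{H}} &= E(f(Y) g(Y)) = \int f g\, d\mu \\
&= \langle I_{g.\mu}, f \rangle_{\mathcal{H}} = \langle N(g), f \rangle_{\mathcal{H}}
\end{aligned} \]
and therefore
\[ \forall g \in \mathcal{H}, \quad C_{K(\cdot, Y)}(g) = N(g). \]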
See Exercise 7.
In the particular case where the inner product of $\mathcal{H}$ coincides with the inner product of $L^2(\mu)$, $I_{g.\mu}$ is the orthogonal projection $\Pi_{\mathcal{H}}(g)$ of the function $g$ on $\mathcal{H}$. It is easily seen by writing
\[ \begin{aligned}
I_{g.\mu} &= \int K(\cdot, x)\, g(x)\, d\mu(x) \\
&= \int K(\cdot, x)\, \Pi_{\mathcal{H}}(g)(x)\, d\mu(x) \\
&= \Pi_{\mathcal{H}}(g).
\end{aligned} \]
This situation is encountered when unknown functions are estimated by orthogonal functions methods.
The norm of $I_{g.\mu}$ can be computed as an integral of the kernel as stated in the following theorem.
THEOREM 107 For any $g$ in $L^2(\mu)$, $K$ is $g.\mu \otimes g.\mu$ integrable and
\[ \|I_{g.\mu}\|^2 = \int K\, d(g.\mu \otimes g.\mu). \]
Proof. Let $(x, y) \in E^2$. The integrability of $K$ with respect to $g.\mu \otimes g.\mu$ follows from the inequality
\[ |K(x, y)\, g(x)\, g(y)| \le \|K(\cdot, x)\|\, \|K(\cdot, y)\|\, |g(x)|\, |g(y)|. \]
Now,
\[ \begin{aligned}
\|I_{g.\mu}\|^2 &= \Big\langle \int K(\cdot, x)\, g(x)\, d\mu(x), \int K(\cdot, y)\, g(y)\, d\mu(y) \Big\rangle_{\mathcal{H}} \\
&= \int K\, (g \otimes g)\, d(\mu \otimes \mu)
\end{aligned} \]
by the properties of the inner product and integrals and Fubini theorem.
When $K$ is bounded the condition in (H2) is satisfied for any $\mu$ in the space $\mathcal{M}$ of bounded measures on $E$. Thus for any real bounded measurable function $g$ on $E$ one can define a linear mapping
\[ I[g] : \mathcal{M} \to \mathcal{H}, \qquad \mu \mapsto I[g](\mu) = I_{g.\mu} = \int K(\cdot, x)\, g(x)\, d\mu(x). \]
The case where $g$ is identically the constant 1 on $E$ provides under suitable assumptions the embedding of $\mathcal{M}$ in $\mathcal{H}$ which is exploited in the present chapter.
9.1.2 ESTIMATION OF $I_\mu$
Recall that the probability $\mu$ is unknown and that we estimate its representer
\[ I_\mu = \int K(\cdot, x)\, d\mu(x) \]
from a sequence $(Y_i)_{i \ge 1}$ of independent random variables with probability law $\mu$ on $E$.
The random variables $K(\cdot, Y_k) : (\Omega, \mathcal{A}, P) \to (\mathcal{H}, \mathcal{B}_{\mathcal{H}})$ are integrable, independent and have the same distribution. Let, for $n \ge 1$,
\[ S_n = \sum_{k=1}^{n} K(\cdot, Y_k) \]
and
\[ \Lambda_n = \sqrt{n}\, \Big( \frac{S_n}{n} - I_\mu \Big). \]
Since
\[ I_\mu = \int K(\cdot, Y_k)\, dP = E(K(\cdot, Y_k)) \]
we have, by the strong law of large numbers, almost sure convergence of $S_n / n$ towards $I_\mu$ as $n$ tends to infinity. Now, as $K(\cdot, Y)$ is a second order $\mathcal{H}$-valued variable one can prove by using the Hilbert space version of the Central Limit Theorem that the sequence $(\Lambda_n)_{n \ge 1}$ converges weakly to a centered gaussian variable (Ledoux and Talagrand, 1991). We give hereafter a direct proof of this result to illustrate the simplicity of calculations in RKHS and conditions of relative compactness.
For $n \ge 1$, $\lambda_n$ will denote the law of probability of $\Lambda_n$ on $\mathcal{H}$.
Let us first prove two lemmas.
LEMMA 22 $\Lambda_n$ is a second order random variable and
Proof. $\Lambda_n$ is a sum of second order random variables. Hence it is a second order variable. As
we have
Expanding the inner product in the first sum we get
But
and
Therefore
x E 1£,
and the formula follows.
LEMMA 23 The sequence $(\lambda_n)_{n \in \mathbb{N}^*}$ is relatively compact in $\Pr(\mathcal{H})$ equipped with the weak topology.
Proof. In the case where $\mathcal{H}$ has a finite dimension, the conclusion follows from Lemma 22 and Tchebychev inequality:
Let $\varepsilon > 0$. For $R$ large enough the closed ball $B^*(0, R)$ is a compact set with $\lambda_n$-measure greater than $1 - \varepsilon$, for any $n \ge 1$.
In the case where $\mathcal{H}$ has infinite dimension, a sufficient condition for relative compactness is (Parthasarathy (1967) and Suquet (1994)):
\[ \sup_{n \ge 1} \int r_N(x)\, d\lambda_n(x) \to 0 \ \text{as}\ N \to \infty \]
where
\[ r_N(x) = \sum_{i=N}^{\infty} \langle x, e_i \rangle_{\mathcal{H}}^2, \]
and $(e_i)_{i \in \mathbb{N}}$ is an orthonormal basis in $\mathcal{H}$. Let $n \ge 1$. As $\Lambda_n$ is a second order variable, $\langle \Lambda_n, e_i \rangle_{\mathcal{H}}^2$ is $P$-integrable for all $i \ge 0$ and
\[ \sum_{i=0}^{\infty} E \langle \Lambda_n, e_i \rangle_{\mathcal{H}}^2 = E(\|\Lambda_n\|^2). \qquad (4.15) \]
Now,
If $k \neq \ell$
hence
\[ \begin{aligned}
\int \langle \Lambda_n, e_i \rangle_{\mathcal{H}}^2\, dP &= \int [e_i(Y) - \langle I_\mu, e_i \rangle_{\mathcal{H}}]^2\, dP = \mathrm{Var}(e_i(Y)) \\
&= E(e_i(Y))^2 - \langle I_\mu, e_i \rangle_{\mathcal{H}}^2.
\end{aligned} \]
The last quantity is independent of $n$ so that $E(r_N(\Lambda_n))$ is independent of $n$ and tends to 0 as $N$ tends to infinity since it is the remainder of order $N$ of the series in (4.15).
From Lemmas 22 and 23 we get the following weak convergence theorem.
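THEOREM 108 The sequence $(\lambda_n)_{n \ge 1}$ of laws of probability of the random variables $(\Lambda_n)_{n \ge 1}$ converges weakly to a centered gaussian probability on $\mathcal{H}$ with covariance function given by
\[ C(f, g) = \int f g\, d\mu - \int f\, d\mu \int g\, d\mu, \qquad (f, g) \in \mathcal{H}^2. \]
Proof. For $n \ge 1$ the characteristic functional of $\lambda_n$ is denoted $\hat{\lambda}_n$. For $g$ in $\mathcal{H}$ we have
\[ \begin{aligned}
\hat{\lambda}_n(g) &= \int \exp(i \langle f, g \rangle_{\mathcal{H}})\, d\lambda_n(f) \\
&= E(\exp(i \langle \Lambda_n, g \rangle_{\mathcal{H}})) \\
&= E\Big( \exp\Big( \frac{i}{\sqrt{n}} \sum_{k=1}^{n} [g(Y_k) - \langle I_\mu, g \rangle_{\mathcal{H}}] \Big) \Big) \\
&= \Big[ E\Big( \exp\Big( \frac{i}{\sqrt{n}} [g(Y) - \langle I_\mu, g \rangle_{\mathcal{H}}] \Big) \Big) \Big]^n.
\end{aligned} \]
Let $\varphi_{Z_g}$ be the characteristic function of the real variable
\[ Z_g = g(Y) - \langle I_\mu, g \rangle_{\mathcal{H}}. \]
Since
\[ E(g(Y)) = \int g\, d\mu = \langle I_\mu, g \rangle_{\mathcal{H}}, \]
$Z_g$ is a centered variable. Now,
\[ E(Z_g^2) = \mathrm{Var}(g(Y)) = E(g(Y)^2) - \langle I_\mu, g \rangle_{\mathcal{H}}^2 \]
therefore a Taylor series expansion at the point 0 yields
\[ \varphi_{Z_g}(t) = 1 - \frac{\mathrm{Var}(g(Y))}{2}\, t^2 + o(t^2) \]
and
\[ \hat{\lambda}_n(g) = \Big[ \varphi_{Z_g}\Big( \frac{1}{\sqrt{n}} \Big) \Big]^n = \Big( 1 - \frac{\mathrm{Var}(g(Y))}{2n} + o\Big( \frac{1}{n} \Big) \Big)^n. \]
Hence
\[ \lim_{n \to \infty} \hat{\lambda}_n(g) = \exp\Big( -\frac{\mathrm{Var}(g(Y))}{2} \Big). \]
Applying Lemma 2.1 of Parthasarathy (1967) we can conclude that there exists $\lambda_0$ in $\Pr(\mathcal{H})$ such that
\[ \lambda_n \to \lambda_0 \ \text{weakly as}\ n \to \infty. \]
As the characteristic functional of $\lambda_0$ is given by
\[ \hat{\lambda}_0(g) = \exp\Big( -\frac{1}{2} \mathrm{Var}(g(Y)) \Big), \qquad g \in \mathcal{H}, \]
$\lambda_0$ is a centered gaussian distribution on $\mathcal{H}$. Let $C$ be its covariance function, $S$ be its covariance operator and let $f$ and $g$ in $\mathcal{H}$. We have
\[ C(f, g) = \langle S f, g \rangle_{\mathcal{H}}, \qquad \langle S g, g \rangle_{\mathcal{H}} = \mathrm{Var}(g(Y)) \]
and
\[ \langle S f, g \rangle_{\mathcal{H}} = \frac{1}{2} \big[ \langle S(f + g), f + g \rangle_{\mathcal{H}} - \langle S f, f \rangle_{\mathcal{H}} - \langle S g, g \rangle_{\mathcal{H}} \big]. \]
Hence
\[ \begin{aligned}
C(f, g) = \langle S f, g \rangle_{\mathcal{H}} &= \frac{1}{2} \big[ \mathrm{Var}((f + g)(Y)) - \mathrm{Var}(f(Y)) - \mathrm{Var}(g(Y)) \big] \\
&= \mathrm{Cov}(f(Y), g(Y)) \\
&= E(f g(Y)) - E(f(Y))\, E(g(Y)),
\end{aligned} \]
and the conclusion follows. ∎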
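The covariance function of the limit can be checked by simulation, since $\langle \Lambda_n, g \rangle_{\mathcal{H}} = \sqrt{n}\,\big(\tfrac{1}{n}\sum_k g(Y_k) - \int g\, d\mu\big)$ by the reproducing property. The sketch below is only an illustration under assumptions that are not in the text: $Y$ is uniform on four points, $K$ is a Gaussian kernel, and $f$, $g$ are kernel sections; all function names are hypothetical.

```python
import math
import random

ATOMS = [0.0, 0.5, 1.0, 1.5]   # Y is uniform on these points (illustrative choice)


def kernel_section(t):
    """g = K(., t) for the Gaussian kernel, an element of H."""
    return lambda y: math.exp(-((y - t) ** 2) / 2.0)


def mean(values):
    return sum(values) / len(values)


def limit_covariance(f, g):
    """C(f, g) = int f g dmu - int f dmu int g dmu, exact for the uniform law on ATOMS."""
    return (mean([f(y) * g(y) for y in ATOMS])
            - mean([f(y) for y in ATOMS]) * mean([g(y) for y in ATOMS]))


def projection_of_lambda_n(g, sample):
    """<Lambda_n, g>_H = sqrt(n) * (sample mean of g - int g dmu)."""
    n = len(sample)
    return math.sqrt(n) * (mean([g(y) for y in sample]) - mean([g(y) for y in ATOMS]))


def simulate(f, g, n=50, replications=4000, seed=0):
    """Monte Carlo covariance of (<Lambda_n, f>, <Lambda_n, g>) over independent samples."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(replications):
        sample = [rng.choice(ATOMS) for _ in range(n)]
        pairs.append((projection_of_lambda_n(f, sample), projection_of_lambda_n(g, sample)))
    mean_f = mean([p[0] for p in pairs])
    mean_g = mean([p[1] for p in pairs])
    return mean([(p[0] - mean_f) * (p[1] - mean_g) for p in pairs])


f = kernel_section(0.2)
g = kernel_section(1.5)
empirical_cov = simulate(f, g)
exact_cov = limit_covariance(f, g)
```

The simulated covariance `empirical_cov` is close to `exact_cov`, which is negative here because $f$ is decreasing and $g$ is increasing over the four atoms.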
The covariance operator $S$ of $\lambda_0$ and the covariance operators of the variables $\Lambda_n$ and $(K(\cdot, Y) - I_\mu)$ are equal. Their kernel (in the sense of Definition 5 of Chapter 1) is the function associating with $(s, t)$ in $E^2$ the real
\[ \begin{aligned}
C(K(\cdot, t), K(\cdot, s)) &= E[K(t, Y) K(s, Y)] - I_\mu(t)\, I_\mu(s) \\
&= \int K(t, x) K(s, x)\, d\mu(x) - \int K(t, x)\, d\mu(x) \int K(s, x)\, d\mu(x).
\end{aligned} \]
9.2. CONVERGENCE OF RANDOM MEASURES
After the works by Bosq (1979) and Berlinet (1980), Suquet (1986) presented a general framework to define and study random measures as RKHS valued random variables. A first gain over the classical theory is that there is no need to define a priori a topology on $E$. The counterpart is that we need an inner product on the set $\mathcal{M}$ of signed measures on $(E, \mathcal{T})$ or equivalently a mapping from $\mathcal{M}$ into some RKHS $\mathcal{H}$ of functions on $E$. The hilbertian construction of random measures carried out by Suquet needs a metric on the set $E$. Three cases are distinguished in his paper according to whether $E$ is compact, locally compact or separable. In the classical theory $E$ is usually supposed to be locally compact with countable basis (Kallenberg, 1983). In this subsection we will limit ourselves to the case where $E$ is a compact metric space and $\mathcal{T}$ is its Borel $\sigma$-algebra and give one theorem about convergence in law. For weaker conditions and convergence with respect to other stochastic modes the reader is referred to Suquet (1993).
As before we consider a sequence $(e_i)$ of measurable functions characterizing signed measures on $(E, \mathcal{T})$ (Definition 35) and satisfying
\[ \sum_{i=0}^{\infty} \|e_i\|_\infty^2 < \infty. \]
As $E$ is a compact space, it is always possible to build such a sequence from a sequence that is total in the separable space of continuous functions on $E$.
Defining the function $K$ on $E^2$ by
\[ K(x, y) = \sum_{i=0}^{\infty} e_i(x)\, e_i(y) \]
we can define an inner product $\langle \cdot, \cdot \rangle_{\mathcal{M}}$ on $\mathcal{M}$ by setting
\[ \langle \mu, \nu \rangle_{\mathcal{M}} = \int K\, d(\mu \otimes \nu) = \sum_{i=0}^{\infty} \Big( \int e_i\, d\mu \Big) \Big( \int e_i\, d\nu \Big). \]
The function $K$ is the reproducing kernel of a space $\mathcal{H}$, the mapping
\[ I : \mathcal{M} \to \mathcal{H}, \qquad \mu \mapsto \int K(\cdot, y)\, d\mu(y) \]
is an isometry from $\mathcal{M}$ onto $I(\mathcal{M})$ and we have
\[ \forall f \in \mathcal{H},\ \forall \mu \in \mathcal{M}, \quad \langle f, I(\mu) \rangle_{\mathcal{H}} = \int f\, d\mu. \]
I(M) is dense in H and the topology defined by < ., . >M coincides withthe weak topology. The sets I(M) and I(M+) , M+ denoting the set ofbounded positive measures on E , belong to the a-algebra B1£ of H, Thislast property is not true in the general case. It is of great importancefor it makes the following definition possible.
DEFINITIO N 36 A random measure (respectively a posit ive random measure) on (E,7) based on a probability space (0, A , P) is a random variable defined on (0, A, P) and taking almost surely its values in I(M)(respectively I(M+)) equipped with the a-algebra induced by B1l.
It follows from Corollary 12 that a variable $\mu^{(\cdot)}$ defined on $(\Omega, \mathcal{A}, P)$ taking almost surely its values in $\mathcal{I}(\mathcal{M})$ is a random measure if and only if for any $y \in E$ the function

$$\omega \mapsto \int K(x, y) \, d\mu^{(\omega)}(x)$$

defines a real random variable on $(\Omega, \mathcal{A}, P)$. This last condition is equivalent to the measurability of

$$\omega \mapsto \int f \, d\mu^{(\omega)}$$

for any element $f$ of $\mathcal{H}$. Indeed, in the present setting the role of the random variables $\{\int f \, d\mu^{(\cdot)} : f \in \mathcal{H}\}$ is similar to the role played by the variables $\{\mu^{(\cdot)}(A) : A \in \mathcal{C}\}$ in the classical theory of random measures, $\mathcal{C}$ being some subset of $\mathcal{T}$.

Now, the notion of convergence of a sequence of random measures with respect to some stochastic mode (almost surely, in probability, in law) can be easily derived from the same notion of convergence of Hilbert-valued random variables. Theorems that characterize convergence by means of convergence of the random variables $\{\int f \, d\mu^{(\cdot)} : f \in \mathcal{H}\}$ without an additional compactness condition are rare, however. We cite the following theorem under the hypothesis made in this subsection.
THEOREM 109 Let $(\mu_n^{(\cdot)})$ be a sequence of positive random measures and $\mu^{(\cdot)}$ be a positive random measure. Then $(\mu_n^{(\cdot)})$ converges in law to $\mu^{(\cdot)}$ if and only if for any $f$ in $\mathcal{H}$ the sequence of real random variables $(\int f \, d\mu_n^{(\cdot)})$ converges in law to $\int f \, d\mu^{(\cdot)}$.
See Suquet (1993) for the proof and other modes of convergence.
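The description of a random measure through the real random variables $\int f \, d\mu^{(\cdot)}$ can be illustrated by simulation. The Python sketch below rests on assumptions that are not in the text: $E = [0, 1]$, the illustrative kernel $K(x, y) = 1/(1 - xy/4)$, a random measure given by the empirical measure of uniform draws, and a hypothetical test function $f = 1.5\,K(\cdot, 0.2) - 0.5\,K(\cdot, 0.8)$ in $\mathcal{H}$. For each realization it evaluates $\int f \, d\mu$ directly and through the values of the embedded function $\mathcal{I}(\mu)$ at the points $0.2$ and $0.8$.

```python
import numpy as np

def kernel(x, y):
    # Illustrative kernel on [0, 1] (an assumption): K(x, y) = 1 / (1 - x*y/4).
    return 1.0 / (1.0 - x * y / 4.0)

rng = np.random.default_rng(0)
n_draws, n_atoms = 500, 20
ys = np.array([0.2, 0.8])        # points y_j defining the test function f
coeffs = np.array([1.5, -0.5])   # f = sum_j coeffs[j] * K(., ys[j])

int_f = np.empty(n_draws)                   # realizations of  int f dmu
embed_at_ys = np.empty((n_draws, ys.size))  # realizations of (I(mu)(y_1), I(mu)(y_2))
for k in range(n_draws):
    atoms = rng.uniform(0.0, 1.0, size=n_atoms)   # one realization omega
    weights = np.full(n_atoms, 1.0 / n_atoms)     # positive measure of total mass 1
    f_at_atoms = kernel(atoms[:, None], ys[None, :]) @ coeffs
    int_f[k] = weights @ f_at_atoms                                  # int f dmu
    embed_at_ys[k] = kernel(ys[:, None], atoms[None, :]) @ weights   # I(mu)(y_j)

# By linearity and symmetry of K, int f dmu = sum_j coeffs[j] * I(mu)(y_j).
max_gap = float(np.max(np.abs(int_f - embed_at_ys @ coeffs)))
print(max_gap)
```

In this sketch each realization of the real variable $\int f \, d\mu^{(\cdot)}$ is recovered from finitely many evaluations of the embedded function, which is the mechanism behind the measurability criterion stated after Definition 36.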
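10. EXERCISES

1 From Guilbart (1978a).
Let $E = [0, 0.5]$, $\mathcal{T}$ be its Borel $\sigma$-algebra and $\mathcal{M}$ be the set of signed measures on $(E, \mathcal{T})$, as in the moments example (Subsection 1.3). By the Lebesgue decomposition Theorem, any element $\mu$ of $\mathcal{M}$ can be written

$$\mu = S(\mu) + s(\mu)$$

where $s(\mu)$ is a measure with finite or countable support and $S(\mu)$ is a measure giving the mass 0 to any singleton. Consider the mapping

$$\langle\langle \cdot , \cdot \rangle\rangle_{\mathcal{M}} : \mathcal{M} \times \mathcal{M} \to \mathbb{R}, \qquad (\mu, \nu) \mapsto \langle \mu, \nu \rangle + \langle S(\mu), S(\nu) \rangle$$

so that we can write

$$\langle\langle \mu, \nu \rangle\rangle_{\mathcal{M}} = \sum_{i=0}^{\infty} \mu_i \nu_i + \sum_{i=0}^{\infty} S(\mu)_i \, S(\nu)_i$$

where

$$\mu_i = \int x^i \, d\mu(x)$$

denotes the moment of order $i$ of the measure $\mu$.
Prove that $\langle\langle \cdot , \cdot \rangle\rangle_{\mathcal{M}}$ is an inner product on $\mathcal{M}$ but that

$$\|\mu\|_{\mathcal{M}}^2 \neq \int \langle\langle \delta_x, \delta_y \rangle\rangle_{\mathcal{M}} \, d(\mu \otimes \mu)(x, y)$$

whenever $s(\mu) = 0$ and $S(\mu) \neq 0$.
Prove that $\mathcal{M}_0$ is not dense in $\mathcal{M}$.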
2 From Guilbart (1978a).
Let $(E, \mathcal{T})$ be a measurable space and $\mathcal{K}$ be the set of bounded measurable reproducing kernels on $E \times E$ such that the formula

$$\langle \mu, \nu \rangle_K = \int K \, d(\mu \otimes \nu)$$

defines an inner product on the set $\mathcal{M}$ of signed measures on $E$.
Let $\mu_1, \ldots, \mu_n$ be $n$ linearly independent elements of $\mathcal{M}$, $\mathcal{M}_n$ be the subspace of $\mathcal{M}$ spanned by $\mu_1, \ldots, \mu_n$ and $\Pi_K$ be the projection onto $\mathcal{M}_n$ in the sense of $\langle \cdot , \cdot \rangle_K$. The set $\mathcal{K}$ is endowed with the uniform metric

$$d_u(K_1, K_2) = \sup_{(x, y) \in E \times E} |K_1(x, y) - K_2(x, y)|.$$
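1) Show that, for any $\nu$ in $\mathcal{M}$, the mapping

$$\pi_\nu : \mathcal{K} \to \mathcal{M}_n, \qquad K \mapsto \Pi_K(\nu)$$

is continuous.

2) Let $\mathbb{K}$ be a subset of $\mathcal{K}$ such that for any $K$ in $\mathbb{K}$, the matrix

$$\left( \langle \mu_i, \mu_j \rangle_K \right)_{1 \le i, j \le n}$$

has its determinant lower bounded by some constant $a > 0$.
Show that, for any $\nu$ in $\mathcal{M}$, the restriction of the mapping $\pi_\nu$ to $\mathbb{K}$ is Lipschitz continuous.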
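3) Let $\| \cdot \|$ be any norm on $\mathcal{M}_n$ and $M$ be a subset of $\mathcal{M}$ such that

$$\sup_{\mu \in M} |\mu|(E) < \infty$$

where $|\mu| = \mu^+ + \mu^-$ is the total variation of $\mu$ (Rudin, 1975).
Prove the existence of a constant $C$ such that

$$\sup_{\nu \in M} \| \Pi_{K_1}(\nu) - \Pi_{K_2}(\nu) \| \le C \, d_u(K_1, K_2).$$

4) Let $(K_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{K}$ converging pointwise to $K$. Then

$\forall \nu \in \mathcal{M}$,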
3 Glivenko-Cantelli type theorem. From Guilbart (1977a).
Let $(E, \mathcal{T})$ be a measurable space and $(X_i)_{i \ge 1}$ be a sequence of independent random variables defined on a probability space $(\Omega, \mathcal{A}, P)$, taking their values in $(E, \mathcal{T})$ and having the same probability distribution $P_{X_1}$. We denote by

$$P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$$

the empirical measure associated with $X_1, \ldots, X_n$.
Suppose that $(f_i)_{i \ge 1}$ is a sequence of bounded measurable real functions defined on $(E, \mathcal{T})$ such that

$$\sup_{i \ge 1, \, x \in E} \frac{|f_i(x)|}{a_i^{1/2}} < \infty$$
where $(a_i)$ is a sequence of positive numbers with finite sum.
Let $K$ be defined on $E \times E$ by

$$K(x, y) = \sum_{i=1}^{\infty} f_i(x) f_i(y).$$

Finally suppose that the formula

$$\langle \mu, \nu \rangle_{\mathcal{M}} = \int K \, d(\mu \otimes \nu)$$

defines an inner product on the space $\mathcal{M}$ of signed measures on $(E, \mathcal{T})$.
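1) Prove that the mapping

is measurable.

2) Let $\varepsilon$ be a positive number and define the following "neighborhood" of $P_{X_1}$ in $\mathcal{M}$:

$$V = \left\{ \nu : \left| \int g_i \, d\nu - \int g_i \, dP_{X_1} \right| < \varepsilon, \ \forall i \in \{1, \ldots, \ell\} \right\},$$

where $g_1, \ldots, g_\ell$ are $\ell$ bounded measurable real functions defined on $(E, \mathcal{T})$.
Prove that

$$P\left[ \forall n \ge N, \ P_n \in V \right] \ge 1 - \frac{c}{N},$$

where

$$c = \frac{8 \ell M_0^2}{\varepsilon^2} \quad \text{and} \quad M_0 = \max_{1 \le i \le \ell} \, \sup_{x \in E} |g_i(x)|.$$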
3) Prove that

where

$$M = \sup_{i \ge 1, \, x \in E} \frac{|f_i(x)|}{a_i^{1/2}}$$

and $n_0$ satisfies
4 From Guilbart (1977a).
Let $(\alpha_i)$ be a sequence of positive numbers with finite sum

and let, for $x$ and $y$ in the set $\mathbb{R}^+$ of nonnegative real numbers,

$$K(x, y) = \sum_{i=0}^{\infty} \alpha_i \exp(-ix) \exp(-iy). \qquad (4.16)$$

The set $\mathbb{R}^+$ is equipped with the Euclidean distance and its Borel $\sigma$-algebra is denoted by $\mathcal{B}_{\mathbb{R}^+}$. Let $\mathcal{M}$ be the space of signed measures on $(\mathbb{R}^+, \mathcal{B}_{\mathbb{R}^+})$.

1) Prove that the mapping defined on $\mathcal{M}^2$ by

$$(\mu, \nu) \mapsto \langle \mu, \nu \rangle_{\mathcal{M}} = \int K \, d(\mu \otimes \nu)$$

is an inner product on $\mathcal{M}$ and that the topology associated with this inner product coincides with the weak topology.

2) Prove the existence of a constant $k > 0$ such that
5 From Suquet (1994).
Let $E$ be a metric space and $\mathcal{M}$ be the space of signed measures on its Borel $\sigma$-algebra $\mathcal{T}$. Let $\rho$ be a positive measure on some measurable space $(U, \mathcal{U})$ and $r$ a complex function defined on $E \times U$ such that

$$\sup_{x \in E} \| r(x, \cdot) \|_{L^2(U, \rho)} < \infty.$$

Define $K$ on $E \times E$ by

$$K(x, y) = \int r(x, u) \, \overline{r(y, u)} \, d\rho(u).$$

1) Show that $K$ is a bounded reproducing kernel.

2) Denote by $\mathcal{H}$ the Hilbert space of functions on $E$ with reproducing kernel $K$. Prove that a complex function $h$ defined on $E$ belongs to $\mathcal{H}$ if and only if there exists an element $\varphi$ of $L^2(\rho)$ such that

$$h(x) = \int \varphi(u) \, r(x, u) \, d\rho(u). \qquad (4.17)$$

3) Let $\mathcal{R}$ be the closed subspace of $L^2(U, \rho)$ spanned by the functions $\{ r(x, \cdot) : x \in E \}$. Show that there is a unique element $\varphi$ of $\mathcal{R}$ satisfying (4.17). Denote it by $\varphi(h)$ and prove that the mapping

$$\Phi : \mathcal{H} \to \mathcal{R}, \qquad f \mapsto \varphi(f)$$

is an isometry.

4) Let $\mu$ be an element of $\mathcal{M}$. Show that the function $r(\cdot, u)$ is $\mu$-integrable except possibly for a set $E_\mu$ of $u$ such that $\rho(E_\mu) = 0$. In other words $r(\cdot, u)$ is $\mu$-integrable $\rho$-almost everywhere.

5) Now suppose that for any $\mu$ in $\mathcal{M}$

$$\left( \int r(x, \cdot) \, d\mu(x) = 0 \quad \rho\text{-a.s.} \right) \Longrightarrow (\mu = 0).$$

Show that the mapping defined on $\mathcal{M}^2$ by

$$(\mu, \nu) \mapsto \langle \mu, \nu \rangle_{\mathcal{M}} = \int K \, d(\mu \otimes \nu)$$

is an inner product on $\mathcal{M}$.
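A concrete instance may help to fix ideas for Exercise 5. The Python sketch below assumes, beyond what the exercise states, that $E = U = \mathbb{R}$, that $\rho$ is the standard Gaussian probability and that $r(x, u) = \exp(iux)$. With the conjugate taken on the second factor, the kernel is then the characteristic function of $\rho$ evaluated at $x - y$, that is $K(x, y) = \exp(-(x - y)^2 / 2)$. The integral over $u$ is approximated by Gauss-Hermite quadrature (NumPy's `hermegauss`, whose weight function is $\exp(-u^2/2)$), and the weights `w` of the discrete signed measure are arbitrary.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes/weights for the weight exp(-u**2 / 2); dividing by sqrt(2*pi)
# turns them into weights for the standard Gaussian probability rho (an assumption).
nodes, weights = hermegauss(60)
weights = weights / np.sqrt(2.0 * np.pi)

def r(x, u):
    """Assumed feature function r(x, u) = exp(i*u*x), returned with shape (len(x), len(u))."""
    return np.exp(1j * np.outer(x, u))

def kernel_quadrature(x, y):
    """K(x, y) = int r(x, u) conj(r(y, u)) drho(u), approximated by quadrature."""
    return (r(x, nodes) * weights) @ np.conj(r(y, nodes)).T

x = np.linspace(-1.0, 1.0, 5)
K_num = kernel_quadrature(x, x)
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# Squared norm of the discrete signed measure sum_j w[j] * delta_{x[j]}.
w = np.array([1.0, -1.0, 0.5, 0.0, -0.5])
norm2 = float(np.real(w @ K_num @ w))
print(np.max(np.abs(K_num - K_exact)), norm2)
```

Under these assumptions the quadrature reproduces the Gaussian kernel to high accuracy on the chosen points, and the quadratic form $\int K \, d(\mu \otimes \mu)$ is nonnegative, as an inner product requires.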
6 From Suquet (1994).
Under the assumptions of Subsection 9.2 prove that

1) for any random measure $\mu^{(\cdot)}$ the function

$$\omega \mapsto \int f \, d\mu^{(\omega)}$$

defines a real random variable whenever $f$ is continuous on $E$;

2) for any random measure $\mu^{(\cdot)}$, its total variation, its positive part and its negative part are also random measures.
7 The notation and the assumptions are those of Subsection 9.1.

1) Let $\nu$ belong to the set $\mathcal{M}$ of bounded signed measures on $E$. Prove that $\Psi_K$ is $\nu$-integrable and that the set

$$\left\{ \int K(\cdot, t) \, d\nu(t) : \nu \in \mathcal{M} \right\}$$

is a dense subset of $\mathcal{H}$.

2) Prove that the image $\operatorname{Im} N$ of the Schwartz kernel

$$N : L^2(\mu) \to L^2(\mu), \qquad g \mapsto \int K(\cdot, t) \, g(t) \, d\mu(t)$$

is dense in $\mathcal{H}$.

3) Prove that an element $f$ of $\mathcal{H}$ belongs to $\operatorname{Im} N$ if and only if the linear form $l_f : k \mapsto \langle k, f \rangle_{\mathcal{H}}$ is continuous on $\mathcal{H}$ for the topology induced by the topology of $L^2(\mu)$.
8 The notation is the same as in Subsection 9.1 but stronger assumptions are made. $E$ is supposed to be a compact topological space with Borel $\sigma$-algebra $\mathcal{T}$ and the reproducing kernel $K$ is supposed to be continuous on $E^2$. The measure $\mu$ is supposed to have a support equal to $E$. Then the conditions in H1, H2 and H3 are clearly satisfied. As $K$ belongs to $L^2(\mu \otimes \mu)$ the Schwartz kernel $N$ (see Exercise 7) is a Hilbert-Schmidt operator. Therefore there exists an orthonormal basis $(h_n)_{n \in \mathbb{N}}$ of $L^2(\mu)$ such that each $h_n$ is an eigenfunction of $N$ associated with an eigenvalue $\lambda_n$ and

$$\sum_{n=0}^{\infty} \lambda_n^2 < \infty.$$

Moreover the sequence $(\sqrt{\lambda_n} \, h_n)_{n \in \mathbb{N}_0}$, where $\mathbb{N}_0 = \{ n \mid \lambda_n \neq 0 \}$, is an orthonormal basis of $\mathcal{H}$ and

$$K(s, t) = \sum_{n \in \mathbb{N}_0} \lambda_n h_n(s) h_n(t),$$

the last series converging in $\mathcal{H}$,

$$\forall t \in E, \qquad K(\cdot, t) = \sum_{n \in \mathbb{N}_0} \lambda_n h_n(t) \, h_n,$$

but also uniformly on $E^2$ by Mercer's theorem.

1) Prove that an element $f$ of $\mathcal{H}$ belongs to $\operatorname{Im} N$ if and only if

$$\sum_{n \in \mathbb{N}_0} \langle f, h_n \rangle_{\mathcal{H}}^2 < \infty.$$

2) Prove that the condition given above is equivalent to

$$\sum_{n \in \mathbb{N}_0} \frac{\langle f, h_n \rangle_{L^2(\mu)}^2}{\lambda_n^2} < \infty.$$

3) Let $S_1$ (respectively $S_2$) be the closed subspace of $L^2(\mu)$ spanned by $(K(\cdot, s))_{s \in E}$ (respectively $(h_n)_{n \in \mathbb{N}_0}$). Let $S_3$ be the subspace of $L^2(\mu)$ orthogonal to the null space of $N$. Prove that

$$S_1 = S_2 = S_3.$$
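The spectral facts recalled in Exercise 8 can be checked exactly in a finite setting. The Python sketch below assumes, as an illustration only, that $E$ is a grid of $m = 50$ points of $[0, 1]$ (a compact space), that $\mu$ is the uniform probability on this grid (so its support is $E$) and that $K(s, t) = \exp(-|s - t|)$. The operator $N$ is then the matrix $K/m$, and the inner product of $\mathcal{H}$ between two functions $f$ and $g$ on the grid is $f^\top K^{-1} g$.

```python
import numpy as np

m = 50
t = np.linspace(0.0, 1.0, m)                  # assumed finite compact set E
K = np.exp(-np.abs(t[:, None] - t[None, :]))  # assumed continuous kernel on E x E

# With mu uniform on the grid, (N g)(s) = (1/m) * sum_j K(s, t_j) g(t_j),
# so N acts on L2(mu) as the symmetric matrix K / m.
lam, V = np.linalg.eigh(K / m)   # eigenvalues lambda_n and Euclidean-orthonormal eigenvectors
H = np.sqrt(m) * V               # columns h_n, rescaled to be orthonormal in L2(mu)

gram_L2 = H.T @ H / m            # Gram matrix of (h_n) in L2(mu)
K_mercer = (H * lam) @ H.T       # sum_n lambda_n h_n(s) h_n(t)

B = H * np.sqrt(lam)             # columns sqrt(lambda_n) * h_n
gram_rkhs = B.T @ np.linalg.solve(K, B)   # Gram matrix in H, using <f, g>_H = f^T K^{-1} g

print(lam.min(), np.max(np.abs(K_mercer - K)))
```

In this finite case all eigenvalues are positive, the expansion reproduces $K$ up to rounding error, and both Gram matrices are close to the identity, in line with the orthonormality of $(h_n)$ in $L^2(\mu)$ and of $(\sqrt{\lambda_n} \, h_n)$ in $\mathcal{H}$.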
9 Let $K$ be a real reproducing kernel on a metric space $E$. Suppose that $K$ is bounded, measurable and defines an inner product on the set $\mathcal{M}$ of signed measures on $E$. Prove that $K$ has a unique support equal to $E$ (see Chapter 1, Subsection 4.3, for the definition of the support of a reproducing kernel).