
Covariance identities and expansions via Stein’s method

Yvik Swan (joint with Marie Ernst and Gesine Reinert)

Symposium in Memory of Charles Stein

Charles Stein, 1920 – 2016

A statistician of genius who was also a committed political activist

Although never very prolific as a writer (∗), something he attributed to a combination of laziness and perfectionism, Professor Stein was known as “the Einstein of [Stanford’s] Statistics Department”.

His legacy in the field lives on in the names of Stein’s lemma, Stein’s paradox, and Stein’s method.

(∗) 39 publications on MathSciNet


https://www.timeshighereducation.com/people/obituary-charles-stein-1920-2016

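Stein’s method is a toolbox for estimating discrepancies between probability distributions.

The starting point is the identity

$$\mathbb{E}[N f(N)] = \sigma^2\, \mathbb{E}[f'(N)] \ \text{ for all } f \iff N \sim \phi(x) = (2\pi\sigma^2)^{-1/2} e^{-x^2/(2\sigma^2)}$$

Crucial are the following:

• Stein equations
$$\sigma^2 f'(x) - x f(x) = h(x) - \mathbb{E}[h(N)]$$

• Solutions
$$f_h(x) = \frac{1}{\sigma^2 \phi(x)} \int_{-\infty}^{x} \big(h(u) - \mathbb{E}[h(N)]\big)\, \phi(u)\, du$$
In particular,
$$\frac{\phi'(x)}{\phi(x)} = -\frac{x}{\sigma^2} \iff -\frac{1}{\phi(x)} \int_{-\infty}^{x} u\, \phi(u)\, du = \sigma^2$$

• Bounds on $\sup_h \|f_h\|_\infty$, $\sup_h \|f_h'\|_\infty$, $\sup_h \|f_h''\|_\infty$ (a.k.a. Stein factors)

• Couplings: exchangeable pairs, zero bias and size bias, ...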
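A minimal numerical sanity check of the Gaussian Stein equation and its solution, written in Python with the standard library only. It takes the illustrative test function h = I[· ≤ z], uses the closed form of $f_h(x) = (\sigma^2\phi(x))^{-1}\int_{-\infty}^x (h(u) - \mathbb{E}[h(N)])\phi(u)\,du$ for this h, and evaluates the residual $\sigma^2 f_h'(x) - x f_h(x) - (h(x) - \mathbb{E}[h(N)])$ with a central finite difference. The function names and the step size are hypothetical choices, not taken from the talk.

```python
import math

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def normal_cdf(x, sigma):
    """Distribution function of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def stein_solution(x, z, sigma):
    """f_h(x) = (sigma^2 phi(x))^{-1} int_{-inf}^x (h(u) - E h(N)) phi(u) du
    for the test function h = 1[. <= z]; the integral has a closed form."""
    Fz, Fx = normal_cdf(z, sigma), normal_cdf(x, sigma)
    integral = Fx * (1.0 - Fz) if x <= z else Fz * (1.0 - Fx)
    return integral / (sigma**2 * normal_pdf(x, sigma))

def stein_residual(x, z, sigma, eps=1e-5):
    """sigma^2 f_h'(x) - x f_h(x) - (h(x) - E h(N)), with f_h' from a central
    difference; x should stay at distance > eps from the kink at z."""
    f = lambda t: stein_solution(t, z, sigma)
    deriv = (f(x + eps) - f(x - eps)) / (2 * eps)
    h = 1.0 if x <= z else 0.0
    return sigma**2 * deriv - x * f(x) - (h - normal_cdf(z, sigma))

print(stein_residual(0.4, 1.3, 2.0))  # close to 0, up to finite-difference error
```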

Chen obtained

$$\mathbb{E}[Z f(Z)] = \mathbb{E}[\lambda f(Z+1)] \ \text{ for all } f \iff Z \sim p(x) = e^{-\lambda}\lambda^x / x!$$

and the corresponding equation, bounds, etc.
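A quick check of Chen’s Poisson identity, as a Python sketch: it evaluates $\mathbb{E}[Zf(Z)] - \lambda\mathbb{E}[f(Z+1)]$ by truncated summation. The gap should vanish for Z ∼ Po(λ) and, for a generic f, not for another law with the same mean such as Bin(20, λ/20). The truncation point, the test functions and the helper names are illustrative assumptions.

```python
import math

def poisson_pmf(x, lam):
    """Po(lam) mass function, computed on the log scale to avoid overflow."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def binomial_pmf(x, n, theta):
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)

def chen_gap(f, lam, pmf, support):
    """E[Z f(Z)] - lam * E[f(Z + 1)] for Z with mass function `pmf` on `support`."""
    lhs = sum(x * f(x) * pmf(x) for x in support)
    rhs = lam * sum(f(x + 1) * pmf(x) for x in support)
    return lhs - rhs

lam = 3.0
poisson = lambda x: poisson_pmf(x, lam)
binomial = lambda x: binomial_pmf(x, 20, lam / 20)   # same mean, different law
indicator_one = lambda x: float(x == 1)

# Poisson: truncating at x < 150 leaves a negligible tail for lam = 3
print(chen_gap(indicator_one, lam, poisson, range(150)))   # ~ 0 up to rounding
print(chen_gap(indicator_one, lam, binomial, range(21)))   # clearly non-zero
```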

The method has been extended in multiple directions, and there now exists an entire ecosystem of “Stein-[...]”:

• operators,
• equations,
• factors,
• couplings,
• kernels,
• ...

There ought to be a mug.

The Chen-Stein mug


There are now Stein identities (and bounds) for

• specific laws: Dickman, gamma, stable, beta, multinomial, multinormal, Wishart, multivariate gamma, ...
$$\mathbb{E}[(\beta X - \alpha(1-X)) f(X)] = \mathbb{E}[X(1-X) f'(X)]$$
$$\mathbb{E}[\theta(n-X) f(X)] = \mathbb{E}[(1-\theta) X f(X-1)]$$
$$\mathbb{E}[X f(X)] = \mathbb{E}[X f''(X) + f'(X)]$$
...

• families of laws: infinitely divisible, diffusions, Pearson, Ord, exponential family, elliptical distributions

• Much of this can be written in a coupling framework:
$$\mathbb{E}\big[G\,(f(W') - f(W))\big] = \mathbb{E}\big[(W - \mathbb{E}[W])\, f(W)\big]$$
with exchangeable pairs, zero bias, size bias, X-P bias, ...
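The first two identities can be checked numerically. The Python sketch below (standard library only) verifies $\mathbb{E}[(\beta X - \alpha(1-X)) f(X)] = \mathbb{E}[X(1-X) f'(X)]$ for X ∼ Beta(α, β) with Simpson’s rule, and $\mathbb{E}[\theta(n-X) f(X)] = \mathbb{E}[(1-\theta) X f(X-1)]$ for X ∼ Bin(n, θ) exactly with `fractions.Fraction`. Parameter values, test functions and helper names are illustrative choices; integer α, β are used only so that the densities are smooth and the quadrature is accurate.

```python
import math
from fractions import Fraction

def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    step = (b - a) / n
    total = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * step) for i in range(1, n))
    return total * step / 3

def beta_gap(alpha, beta, f, df):
    """E[(beta X - alpha (1 - X)) f(X)] - E[X (1 - X) f'(X)] for X ~ Beta(alpha, beta)."""
    norm = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    dens = lambda x: x ** (alpha - 1) * (1 - x) ** (beta - 1) / norm
    lhs = simpson(lambda x: (beta * x - alpha * (1 - x)) * f(x) * dens(x), 0.0, 1.0)
    rhs = simpson(lambda x: x * (1 - x) * df(x) * dens(x), 0.0, 1.0)
    return lhs - rhs

def binomial_gap(n, theta, f):
    """E[theta (n - X) f(X)] - E[(1 - theta) X f(X - 1)] for X ~ Bin(n, theta), exactly."""
    pmf = [math.comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(n + 1)]
    lhs = sum(theta * (n - x) * f(x) * pmf[x] for x in range(n + 1))
    rhs = sum((1 - theta) * x * f(x - 1) * pmf[x] for x in range(n + 1))
    return lhs - rhs

print(beta_gap(2, 3, lambda x: x**2, lambda x: 2 * x))          # ~ 0
print(binomial_gap(7, Fraction(1, 3), lambda x: x**3 - 2 * x))  # 0
```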

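All such equations are of the form

$$X \sim \mu \iff \mathbb{E}[\mathcal{A} f(X)] = \mathbb{E}[\mathcal{B} f(X^\star)] \ \text{ for all } f \in \mathcal{F} \qquad (1)$$

for $\mathcal{A}$, $\mathcal{B}$, $X^\star$ and $\mathcal{F}$ well chosen.

The heuristic: under (1), for $Y \sim \nu$ another measure,

$$\mathcal{S}(\mu, \nu, \mathcal{F}) = \sup_{f \in \mathcal{F}} \big|\mathbb{E}[\mathcal{A} f(Y)] - \mathbb{E}[\mathcal{B} f(Y^\star)]\big|$$

ought to capture the difference between µ and ν.

As we know, there are many settings with choices of $\mathcal{A}$, $\mathcal{B}$, $X^\star$ and $\mathcal{F}$ for which $\mathcal{S}(\mu, \nu, \mathcal{F})$ captures essential features of the dissimilarity.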

1 The density approach

2 Stein identities

3 A first representation and first-order bounds

4 A second representation and higher-order expansions

5 A to do list

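Stein’s operators for ϕ are

$$f \mapsto \mathcal{T}_\phi f(x) = \frac{(f(x)\phi(x))'}{\phi(x)} = f'(x) - \frac{x}{\sigma^2} f(x)$$

$$h \mapsto \mathcal{L}_\phi h(x) = \frac{1}{\phi(x)} \int_{-\infty}^{x} \big(h(u) - \mathbb{E}[h(N)]\big)\, \phi(u)\, du$$

Figure: The function $x \mapsto \mathcal{L}_\phi \mathbb{I}[\cdot \le z](x)$ and its first two derivatives for z = 0 (panel (a)) and z = 1 (panel (b)).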


It is not hard to extrapolate to a general density p of X on ℝ. Stein’s operators are

$$f \mapsto \mathcal{T}_p f(x) = \frac{(f(x)p(x))'}{p(x)} = f'(x) + \frac{p'(x)}{p(x)} f(x)$$

$$h \mapsto \mathcal{L}_p h(x) = \frac{1}{p(x)} \int_{x_0}^{x} \big(h(u) - \mathbb{E}[h(X)]\big)\, p(u)\, du$$

Figure: The function $x \mapsto \mathcal{L}_p \mathbb{I}[\cdot \le z](x)$ and its first two derivatives for z = 0 (panel (a)) and z = 1 (panel (b)), when X ∼ Beta(1/2, 1/2) and $x_0 = 0$.


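It is formally easy to “correct” these problems. Let

$$\Delta^\ell f(x) = \frac{f(x+\ell) - f(x)}{\ell} = \begin{cases} \Delta^+ f(x) = f(x+1) - f(x) & \text{if } \ell = 1,\\ \Delta^- f(x) = f(x) - f(x-1) & \text{if } \ell = -1,\\ f'(x) & \text{if } \ell = 0.\end{cases}$$

Definition 1 (Density approach – differential operator)

The canonical ℓ-Stein operator is the linear operator $\mathcal{T}^\ell_p : \mathcal{F}^\ell(p) \to \mathcal{F}^{(0)}(p)$ defined by

$$\mathcal{T}^\ell_p f(x) = \frac{\Delta^\ell (f p)(x)}{p(x)}$$

on $\mathcal{F}^\ell(p)$, the set of all functions f such that $\mathcal{T}^\ell_p f \in L^1(p)$ and $\mathbb{E}[\mathcal{T}^\ell_p f(X)] = 0$.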


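Example 1

• If X ∼ N(0, σ²) then
$$\mathcal{T}^0_\phi f(x) = \frac{(f(x)\phi(x))'}{\phi(x)} = f'(x) - \frac{x}{\sigma^2} f(x)$$

• If X ∼ Po(λ) then
$$\mathcal{T}^+_\lambda f(x) = \frac{\Delta^+(f(x)p(x))}{p(x)} = \frac{\lambda}{x+1} f(x+1) - f(x) \quad \text{with } f(0) = 0$$
$$\mathcal{T}^-_\lambda f(x) = \frac{\Delta^-(f(x)p(x))}{p(x)} = f(x) - \frac{x}{\lambda} f(x-1)$$

• If X ∼ Bin(n, θ) then
$$\mathcal{T}^+_{n,\theta} f(x) = \frac{\theta}{1-\theta}\,\frac{n-x}{x+1}\, f(x+1) - f(x) \quad \text{with } f(0) = 0$$
$$\mathcal{T}^-_{n,\theta} f(x) = f(x) - \frac{1-\theta}{\theta}\,\frac{x}{n-(x-1)}\, f(x-1) \quad \text{with } f(n) = 0.$$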
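To make Definition 1 and Example 1 concrete in the binomial case, the following Python sketch (exact rational arithmetic; hypothetical helper names) implements $\mathcal{T}^\pm_p f = \Delta^\pm(fp)/p$ directly, compares it with the closed forms, and shows that the boundary conditions f(0) = 0 (for $\mathcal{T}^+$) and f(n) = 0 (for $\mathcal{T}^-$) are exactly what makes $\mathbb{E}[\mathcal{T}^\ell_p f(X)]$ vanish.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, theta):
    """x -> P(X = x) for X ~ Bin(n, theta); zero off {0, ..., n}."""
    return lambda x: comb(n, x) * theta**x * (1 - theta) ** (n - x) if 0 <= x <= n else 0

def T_plus(p, f):
    """Canonical (+1)-Stein operator: Delta^+(f p)(x) / p(x)."""
    return lambda x: (f(x + 1) * p(x + 1) - f(x) * p(x)) / p(x)

def T_minus(p, f):
    """Canonical (-1)-Stein operator: Delta^-(f p)(x) / p(x)."""
    return lambda x: (f(x) * p(x) - f(x - 1) * p(x - 1)) / p(x)

n, theta = 6, Fraction(2, 5)
p = binom_pmf(n, theta)
mean = lambda F: sum(F(x) * p(x) for x in range(n + 1))

f = lambda x: Fraction(x * (x + 3))          # f(0) = 0 but f(n) != 0
g = lambda x: Fraction((n - x) * (x + 1))    # g(n) = 0 but g(0) != 0

print(mean(T_plus(p, f)), mean(T_minus(p, g)))  # both 0: boundary conditions hold
print(mean(T_minus(p, f)) == f(n) * p(n))       # True: f(n) != 0 leaves a boundary term
```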


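Often, “good” operators are obtained through the substitution $f(x) \mapsto c(x) f(x - \ell)$, leading to

$$\mathcal{A} f(x) = \mathcal{T}^\ell_p\big(c(\cdot) f(\cdot - \ell)\big)(x) = (\mathcal{T}^\ell_p c)(x)\, f(x) + c(x)\, \Delta^{-\ell} f(x),$$

where $\mathcal{A}$ acts on a generic class (Goldstein and Reinert, JAP 2013; Döbler, EJP 2015).

Example 2

• If X ∼ N(0, σ²) then c(x) = σ² and $\mathcal{A} f(x) = \sigma^2 f'(x) - x f(x)$.

• If X ∼ Po(λ) then
c(x) = x (ℓ = 1) and $\mathcal{A}^+ f(x) = \lambda f(x) - x f(x-1)$
c(x) = λ (ℓ = −1) and $\mathcal{A}^- f(x) = \lambda f(x+1) - x f(x)$

• If X ∼ Bin(n, θ) then
c(x) = (1 − θ)x (ℓ = 1) and $\mathcal{A}^+ f(x) = \theta(n-x) f(x) - (1-\theta) x f(x-1)$
c(x) = θ(n − x) (ℓ = −1) and $\mathcal{A}^- f(x) = \theta(n-x) f(x+1) - (1-\theta) x f(x)$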
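The product rule behind Example 2 can be checked pointwise. This Python sketch works with Po(λ) only through the ratios p(x+1)/p(x) = λ/(x+1) and p(x−1)/p(x) = x/λ, so no truncation is needed, and verifies that $\mathcal{T}^+_p(c(\cdot)f(\cdot-1))(x) = \lambda f(x) - x f(x-1)$ for c(x) = x and $\mathcal{T}^-_p(c(\cdot)f(\cdot+1))(x) = \lambda f(x+1) - x f(x)$ for c(x) = λ. The value of λ, the test function and the names are illustrative.

```python
from fractions import Fraction

lam = Fraction(7, 2)

def T_plus(g, x):
    """T^+_p g(x) = Delta^+(g p)(x) / p(x) for p = Po(lam), via p(x+1)/p(x) = lam/(x+1)."""
    return g(x + 1) * lam / (x + 1) - g(x)

def T_minus(g, x):
    """T^-_p g(x) = Delta^-(g p)(x) / p(x) for p = Po(lam), via p(x-1)/p(x) = x/lam."""
    return g(x) - g(x - 1) * x / lam

f = lambda x: Fraction(x * x + 1, x + 5)   # arbitrary test function, defined for x >= -4

# ell = +1, c(x) = x: apply T^+ to c(.) f(. - 1)
A_plus = lambda x: T_plus(lambda y: y * f(y - 1), x)
# ell = -1, c(x) = lam: apply T^- to c(.) f(. + 1)
A_minus = lambda x: T_minus(lambda y: lam * f(y + 1), x)

print(all(A_plus(x) == lam * f(x) - x * f(x - 1) for x in range(25)))   # True
```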


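One-slide Stein’s density approach

Given X ∼ p:

• choose c and solve
$$h(x) - \mathbb{E}[h(X)] = (\mathcal{T}^\ell_p c)(x)\, f^c_h(x) + c(x)\, \Delta^{-\ell} f^c_h(x)$$

• consider
$$\mathbb{E}[h(W)] - \mathbb{E}[h(X)] = \mathbb{E}\big[(\mathcal{T}^\ell_p c)(W)\, f^c_h(W) + c(W)\, \Delta^{-\ell} f^c_h(W)\big]$$

• given W ∼ q, note that
$$\mathbb{E}\big[(\mathcal{T}^\ell_q c)(W)\, f^c_h(W) + c(W)\, \Delta^{-\ell} f^c_h(W)\big] = 0$$

• reap
$$|\mathbb{E}[h(W)] - \mathbb{E}[h(X)]| \le \mathbb{E}\big[|(\mathcal{T}^\ell_p c)(W) - (\mathcal{T}^\ell_q c)(W)|\, |f^c_h(W)|\big] \le \sup_{h \in \mathcal{H}} \|f^c_h\|_\infty\, \mathbb{E}\big[|(\mathcal{T}^\ell_p c)(W) - (\mathcal{T}^\ell_q c)(W)|\big]$$

If there exists c such that $\|f^c_h\|_\infty$ is bounded uniformly in h ∈ H, this yields a useful bound.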


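Recall

$$|\mathbb{E}[h(W)] - \mathbb{E}[h(X)]| \le \sup_{h \in \mathcal{H}} \|f^c_h\|_\infty\, \mathbb{E}\big[|(\mathcal{T}^\ell_p c)(W) - (\mathcal{T}^\ell_q c)(W)|\big]$$

Example 3 (ℓ = 0, support ℝ, mean 0)

Let H be the set of all 1-Lipschitz functions and

$$\mathrm{Wass}(X, W) = \sup_{h \in \mathcal{H}} |\mathbb{E}h(W) - \mathbb{E}h(X)|.$$

• Taking c = 1 (so that $\mathcal{T}^0_p 1 = p'/p$) and using $\mathbb{E}|Z| \le \sqrt{\mathbb{E}[Z^2]}$,
$$\mathrm{Wass}(X, W) \le \sup_{h \in \mathcal{H}} \|f^1_h\|_\infty \sqrt{\mathbb{E}\Big[\Big(\frac{p'(W)}{p(W)} - \frac{q'(W)}{q(W)}\Big)^2\Big]},$$
which is the Fisher information distance when p = ϕ.

• One can also optimise in c to get
$$\mathrm{Wass}(X, W) \le \inf_{c \in \mathcal{F}} \sup_{h \in \mathcal{H}} \|f^c_h\|_\infty \sqrt{\mathbb{E}\big[\big((\mathcal{T}^\ell_p c)(W) - (\mathcal{T}^\ell_q c)(W)\big)^2\big]}.$$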


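Now focus on the other operator:

$$h \mapsto \mathcal{L}_\phi h(x) = \frac{1}{\phi(x)} \int_{-\infty}^{x} \big(h(u) - \mathbb{E}[h(N)]\big)\, \phi(u)\, du$$

Let p have support [a, b]. Given h ∈ L¹(p) we define

$$\mathcal{L}^0_p h(x) = \frac{1}{p(x)} \int_a^x \big(h(u) - \mathbb{E}[h(X)]\big)\, p(u)\, du = \frac{1}{p(x)} \int_x^b \big(\mathbb{E}[h(X)] - h(u)\big)\, p(u)\, du$$

$$\mathcal{L}^-_p h(x) = \frac{1}{p(x)} \sum_{j=a}^{x} \big(h(j) - \mathbb{E}[h(X)]\big)\, p(j) = \frac{1}{p(x)} \sum_{j=x+1}^{b} \big(\mathbb{E}[h(X)] - h(j)\big)\, p(j)$$

$$\mathcal{L}^+_p h(x) = \frac{1}{p(x)} \sum_{j=a}^{x-1} \big(h(j) - \mathbb{E}[h(X)]\big)\, p(j) = \frac{1}{p(x)} \sum_{j=x}^{b} \big(\mathbb{E}[h(X)] - h(j)\big)\, p(j)$$

Definition 2 (Density approach – integral operator)

The operator $h \mapsto \mathcal{L}^\ell_p h(x) = f_h(x)$ is the canonical inverse Stein operator. By construction,

$$\mathcal{L}^\ell_p \mathcal{T}^\ell_p f(x) = f(x), \qquad \mathcal{T}^\ell_p \mathcal{L}^\ell_p h(x) = h(x) - \mathbb{E}[h(X)].$$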
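A sketch of the discrete inverse operators for X ∼ Bin(n, θ), in Python with exact fractions. It checks that the lower-sum and upper-sum expressions for $\mathcal{L}^\pm_p$ agree, that $\mathcal{T}^\ell_p \mathcal{L}^\ell_p h = h - \mathbb{E}[h(X)]$, and that $\mathcal{L}^\ell_p \mathcal{T}^\ell_p f = f$ for f satisfying the boundary condition of $\mathcal{F}^\ell(p)$ (f(0) = 0 for ℓ = +1, f(n) = 0 for ℓ = −1). Parameters, test functions and names are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

n, theta = 8, Fraction(3, 10)
support = range(n + 1)

def p(x):
    """Bin(n, theta) mass function, zero off the support."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x) if 0 <= x <= n else 0

def expect(h):
    return sum(h(j) * p(j) for j in support)

def L(h, ell):
    """Inverse Stein operator L^ell_p (ell = +1 or -1), lower-sum form."""
    m = expect(h)
    def Lh(x):
        upper = x - 1 if ell == 1 else x   # j <= x - 1 for ell = +1, j <= x for ell = -1
        return sum((h(j) - m) * p(j) for j in range(0, upper + 1)) / p(x)
    return Lh

def L_upper(h, ell):
    """Equivalent upper-sum form: (1/p(x)) sum over j >= x (ell = +1) or j >= x + 1 (ell = -1)."""
    m = expect(h)
    def Lh(x):
        start = x if ell == 1 else x + 1
        return sum((m - h(j)) * p(j) for j in range(start, n + 1)) / p(x)
    return Lh

def T(f, ell):
    """Stein operator T^ell_p; the product f p is set to 0 off the support."""
    fp = lambda x: f(x) * p(x) if 0 <= x <= n else 0
    if ell == 1:
        return lambda x: (fp(x + 1) - fp(x)) / p(x)
    return lambda x: (fp(x) - fp(x - 1)) / p(x)

h = lambda x: Fraction(x * x - 3 * x + 1)
print([T(L(h, 1), 1)(x) == h(x) - expect(h) for x in support])   # all True
```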


Definition 3 (Generalized Stein kernel)

The function $\tau^\ell_p := -\mathcal{L}^\ell_p \mathrm{Id}$ is the (ℓ-)Stein kernel of p.

In particular, if h(x) = Id(x) and ℓ = 0, then

$$-\mathcal{L}^0_p h(x) = \tau^0_p(x) = \frac{1}{p(x)} \int_x^\infty (u - \mathbb{E}[X])\, p(u)\, du$$

is the classical Stein kernel of p (Stein, 1986).

Example 4

ℓ = 0

• If X ∼ N(0, σ²) then $\tau^0_p(x) = \sigma^2$.

• If $X \sim p(x) \propto (1 + x^2)^{-\beta}$ then $\tau^0_p(x) = \dfrac{1 + x^2}{2(\beta - 1)}$ if β > 1.

ℓ = ±1

• If X ∼ Po(λ) then $\tau^+_p(x) = x$ and $\tau^-_p(x) = \lambda$.

• If X ∼ Bin(n, θ) then $\tau^+_p(x) = (1-\theta)x$ and $\tau^-_p(x) = \theta(n - x)$.
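The discrete Stein kernels can be computed straight from $\tau^\ell_p = -\mathcal{L}^\ell_p \mathrm{Id}$ using the lower-sum form of $\mathcal{L}^\pm_p$. In this Python sketch (exact fractions; hypothetical names) the Poisson weights are left unnormalised, since the factor $e^{-\lambda}$ cancels in the ratio, and the results are compared with x and λ (Poisson, ℓ = +1 and ℓ = −1) and with (1 − θ)x and θ(n − x) (binomial).

```python
from fractions import Fraction
from math import comb, factorial

def stein_kernel(w, mean, x, ell):
    """tau^ell(x) = -L^ell Id(x) = -(1/w(x)) * sum_{j <= x - [ell = 1]} (j - mean) w(j).
    The weights w may be unnormalised: the normalising constant cancels."""
    upper = x - 1 if ell == 1 else x
    return -sum((j - mean) * w(j) for j in range(0, upper + 1)) / w(x)

lam = Fraction(5, 2)
poisson_w = lambda j: lam**j / factorial(j)     # e^{-lam} omitted on purpose

n, theta = 9, Fraction(1, 4)
binom_w = lambda j: comb(n, j) * theta**j * (1 - theta) ** (n - j)

print(stein_kernel(binom_w, n * theta, 3, -1))  # 3/2, i.e. theta * (n - 3)
```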


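One-slide Stein’s density approach

Given X ∼ p:

• choose η and solve
$$h(x) - \mathbb{E}[h(X)] = \eta(x)\, f^\eta_h(x) + \mathcal{L}^\ell_p \eta(x)\, \Delta^{-\ell} f^\eta_h(x)$$
(η is taken centred under p and under the law q of W introduced below, because $\mathcal{T}^\ell_p \mathcal{L}^\ell_p \eta = \eta - \mathbb{E}[\eta(X)]$)

• consider
$$\mathbb{E}[h(W)] - \mathbb{E}[h(X)] = \mathbb{E}\big[\eta(W)\, f^\eta_h(W) + \mathcal{L}^\ell_p \eta(W)\, \Delta^{-\ell} f^\eta_h(W)\big]$$

• given W ∼ q, note that
$$\mathbb{E}\big[\eta(W)\, f^\eta_h(W) + \mathcal{L}^\ell_q \eta(W)\, \Delta^{-\ell} f^\eta_h(W)\big] = 0$$

• reap
$$|\mathbb{E}[h(W)] - \mathbb{E}[h(X)]| \le \mathbb{E}\big[|\mathcal{L}^\ell_p \eta(W) - \mathcal{L}^\ell_q \eta(W)|\, |\Delta^{-\ell} f^\eta_h(W)|\big] \le \sup_{h \in \mathcal{H}} \|\Delta^{-\ell} f^\eta_h\|_\infty\, \mathbb{E}\big[|\mathcal{L}^\ell_p \eta(W) - \mathcal{L}^\ell_q \eta(W)|\big]$$

If there exists η such that $\|\Delta^{-\ell} f^\eta_h\|_\infty$ is bounded uniformly in h ∈ H, this yields a useful bound.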


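Example 5

Let X ∼ p have finite mean. Let H be the set of all 1-Lipschitz functions and

$$\mathrm{Wass}(X, W) = \sup_{h \in \mathcal{H}} |\mathbb{E}h(W) - \mathbb{E}h(X)|.$$

Then

• taking η(x) = x,
$$\mathrm{Wass}(X, W) \le \sup_{h \in \mathcal{H}} \|\Delta^{-\ell} f^{\mathrm{Id}}_h\|_\infty \sqrt{\mathbb{E}\big[(\tau^\ell_p(W) - \tau^\ell_q(W))^2\big]},$$
which is the Stein discrepancy;

• one can also optimise in η to get
$$\mathrm{Wass}(X, W) \le \inf_\eta \sup_{h \in \mathcal{H}} \|\Delta^{-\ell} f^\eta_h\|_\infty \sqrt{\mathbb{E}\big[(\mathcal{L}^\ell_p \eta(W) - \mathcal{L}^\ell_q \eta(W))^2\big]}.$$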
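As an illustration of the Stein discrepancy term only (the supremum factor of the Wasserstein bound is not computed), the Python sketch below takes p = Po(λ), q = Bin(n, λ/n) and ℓ = −1, computes both kernels from $\tau^\ell = -\mathcal{L}^\ell \mathrm{Id}$, and evaluates $\mathbb{E}[(\tau^-_p(W) - \tau^-_q(W))^2]$ for W ∼ q exactly. The choice of this Poisson/binomial pair is ours; the closed form $(\lambda/n)^2(\lambda(1 - \lambda/n) + \lambda^2)$ used in the check follows from $\tau^-_p = \lambda$ and $\tau^-_q(x) = \theta(n - x)$.

```python
import math
from fractions import Fraction
from math import comb, factorial

def minus_kernel(w, mean, x):
    """tau^-(x) = -(1/w(x)) sum_{j <= x} (j - mean) w(j); w may be unnormalised."""
    return -sum((j - mean) * w(j) for j in range(x + 1)) / w(x)

def stein_discrepancy_sq(n, lam):
    """E[(tau^-_p(W) - tau^-_q(W))^2] with p = Po(lam), q = Bin(n, lam/n), W ~ q."""
    theta = lam / n
    q = [comb(n, j) * theta**j * (1 - theta) ** (n - j) for j in range(n + 1)]
    pois_w = lambda j: lam**j / factorial(j)
    binom_w = lambda j: q[j]
    return sum(
        (minus_kernel(pois_w, lam, x) - minus_kernel(binom_w, lam, x)) ** 2 * q[x]
        for x in range(n + 1)
    )

lam = Fraction(2)
for n in (5, 10, 40):
    print(n, math.sqrt(stein_discrepancy_sq(n, lam)))   # decreases roughly like 1/n
```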


Versions of this general story are available in several recent papers, for instance Ley, Reinert, and Swan (PS 2017, AOAP 2017) and Mijoule, Reinert, and Swan (arXiv 2018).

See also Reinert (IMS LN 2003), Stein et al. (IMS LN 2004), Chatterjee and Shao (AOAP 2011), Goldstein and Reinert (JAP 2013), Döbler (EJP 2015), Arras and Houdré (Springer Briefs 2018, EJP 2019), Gaunt, Mackey and his team, Upadhye, Vydas, Cekanavicius (Bernoulli 2017), Fang, Shao, and Xu (PTRF 2019), ...

and many more.



2 Stein identities

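Under weak conditions on p, the operators $\mathcal{T}^\ell_p$ and $\mathcal{L}^\ell_p$ automatically provide natural identities for the target.

Indeed, by construction, for all $f \in \mathcal{F}^\ell(p)$,

$$\mathrm{Cov}\big[\mathcal{T}^\ell_p f(X), g(X)\big] = -\mathbb{E}\big[f(X)\, \Delta^{-\ell} g(X)\big]$$

for all g such that [some conditions depending on f].

Applying this to $\mathcal{L}^\ell_p f$ and using $\mathcal{T}^\ell_p \mathcal{L}^\ell_p f = f - \mathbb{E}[f(X)]$ leads to:

Lemma 1 (Covariance identity)

For all f, g we have

$$\mathrm{Cov}[f(X), g(X)] = \mathbb{E}\big[-\mathcal{L}^\ell_p f(X)\, \Delta^{-\ell} g(X)\big]$$

under explicit and non-stringent conditions on f, g.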
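Lemma 1 is easy to test on a finite support. The Python sketch below (exact fractions; illustrative parameters and names) compares Cov[f(X), g(X)] with $\mathbb{E}[-\mathcal{L}^\ell_p f(X)\, \Delta^{-\ell} g(X)]$ for X ∼ Bin(n, θ), using the backward difference for ℓ = +1 and the forward difference for ℓ = −1. On a finite support no extra integrability conditions are needed, and the values of g just outside the support are multiplied by zero.

```python
from fractions import Fraction
from math import comb

n, theta = 7, Fraction(2, 7)
p = [comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(n + 1)]
support = range(n + 1)

def expect(f):
    return sum(f(x) * p[x] for x in support)

def covariance(f, g):
    ef, eg = expect(f), expect(g)
    return sum((f(x) - ef) * (g(x) - eg) * p[x] for x in support)

def minus_L(f, x, ell):
    """-L^ell_p f(x) = (1/p(x)) sum over j >= x (ell = +1) or j >= x + 1 (ell = -1) of (f(j) - E f) p(j)."""
    m = expect(f)
    start = x if ell == 1 else x + 1
    return sum((f(j) - m) * p[j] for j in range(start, n + 1)) / p[x]

def diff(g, x, ell):
    """Delta^{-ell} g(x): backward difference for ell = +1, forward for ell = -1."""
    return g(x) - g(x - 1) if ell == 1 else g(x + 1) - g(x)

def lemma1_rhs(f, g, ell):
    return sum(minus_L(f, x, ell) * diff(g, x, ell) * p[x] for x in support)

f = lambda x: Fraction(x**3 - 4 * x)
g = lambda x: Fraction(1, x + 3)   # evaluated at -1 and n + 1 by the differences
print(covariance(f, g) == lemma1_rhs(f, g, 1), covariance(f, g) == lemma1_rhs(f, g, -1))  # True True
```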



3 A first representation and first-order bounds

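Given ℓ ∈ {−1, 0, 1}, we introduce

$$\chi^\ell(x, y) = \mathbb{I}\big[x \le y - \mathbb{I}[\ell = 1]\big] = \begin{cases} \mathbb{I}[x \le y] & \ell = 0,\\ \mathbb{I}[x \le y] & \ell = -1,\\ \mathbb{I}[x < y] & \ell = 1.\end{cases}$$

Define

$$K^\ell_p(x, x') = \mathbb{E}\big[\chi^\ell(X, x)\,\chi^\ell(X, x')\big] - \mathbb{E}\big[\chi^\ell(X, x)\big]\,\mathbb{E}\big[\chi^\ell(X, x')\big].$$

Properties:

• $K^\ell_p(x, x')$ is symmetric and non-negative.

• The function $x' \mapsto p^\star_x(x') = -K^\ell_p(x, x')\, \Delta^{-\ell}\mathcal{T}^\ell_p 1(x') / p(x)$ is a density for all x (Menz and Otto, AOP 2013, when ℓ = 0); it is non-negative as soon as $\mathcal{T}^\ell_p 1$ is non-increasing.

• Often, it is also unimodal, and $K^\ell_p(x, x')/p(x)$ is bounded.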


Figure: The functions $x \mapsto K^\ell_p(x, x')/p(x)$ for fixed x′, with p (a) the standard normal distribution, (b) the beta distribution with parameters 1.3 and 2.4, (c) the binomial distribution with parameters (50, 0.2), and (d) the Poisson distribution with parameter 20.


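Lemma 2 (Representation formula II (ERS19a))

For all $f \in \mathrm{dom}(\Delta^{-\ell})$ such that $f \in L^1(p)$, we have

$$-\mathcal{L}^\ell_p f(x) = \mathbb{E}\Big[\frac{K^\ell_p(X, x)}{p(X)\,p(x)}\, \Delta^{-\ell} f(X)\Big]. \qquad (2)$$

Formula (2) is already available when ℓ = 0; see Saumard (arXiv 2019).

Combining (2) with Lemma 1, we deduce first-order Hoeffding-type covariance identities: with X′ an independent copy of X,

$$\mathrm{Cov}[f(X), g(X)] = \mathbb{E}\Big[\Delta^{-\ell} f(X)\, \frac{K^\ell_p(X, X')}{p(X)\,p(X')}\, \Delta^{-\ell} g(X')\Big].$$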
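Representation formula (2) and the Hoeffding-type identity can be checked exactly on a binomial law. This Python sketch builds $K^\ell_p$ from its definition as a covariance of indicators and compares both sides for ℓ = ±1; since X has mass function p, the expectation in (2) reduces to $\sum_y K^\ell_p(y, x)\,\Delta^{-\ell} f(y)/p(x)$. Parameters, test functions and names are illustrative.

```python
from fractions import Fraction
from math import comb

n, theta = 6, Fraction(1, 3)
p = [comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(n + 1)]
support = range(n + 1)

def chi(ell, x, y):
    """chi^ell(x, y) = 1[x <= y - 1[ell = 1]] on the integers."""
    return 1 if x <= y - (1 if ell == 1 else 0) else 0

def K(ell, x, y):
    """K^ell_p(x, y) = Cov[chi^ell(X, x), chi^ell(X, y)] for X ~ p."""
    exy = sum(chi(ell, t, x) * chi(ell, t, y) * p[t] for t in support)
    ex = sum(chi(ell, t, x) * p[t] for t in support)
    ey = sum(chi(ell, t, y) * p[t] for t in support)
    return exy - ex * ey

def diff(f, x, ell):
    """Delta^{-ell} f(x): backward difference if ell = +1, forward difference if ell = -1."""
    return f(x) - f(x - 1) if ell == 1 else f(x + 1) - f(x)

def minus_L(f, x, ell):
    """-L^ell_p f(x) from its definition (upper-sum form)."""
    m = sum(f(j) * p[j] for j in support)
    start = x if ell == 1 else x + 1
    return sum((f(j) - m) * p[j] for j in range(start, n + 1)) / p[x]

def minus_L_via_K(f, x, ell):
    """Right-hand side of (2)."""
    return sum(K(ell, y, x) * diff(f, y, ell) for y in support) / p[x]

def covariance(f, g):
    ef = sum(f(x) * p[x] for x in support)
    eg = sum(g(x) * p[x] for x in support)
    return sum((f(x) - ef) * (g(x) - eg) * p[x] for x in support)

def cov_via_K(f, g, ell):
    """Hoeffding-type identity: sum over y, z of Delta f(y) K(y, z) Delta g(z)."""
    return sum(diff(f, y, ell) * K(ell, y, z) * diff(g, z, ell) for y in support for z in support)

f = lambda x: Fraction(x**2 + 2 * x)
g = lambda x: Fraction(1, x + 2)
print(all(minus_L(f, x, 1) == minus_L_via_K(f, x, 1) for x in support))   # True
```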

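Theorem 1 (Weighted covariance inequality)

Fix a decreasing function h ∈ L¹(p). For all f, g,

$$|\mathrm{Cov}[f(X), g(X)]| \le \sqrt{\mathbb{E}\Big[(\Delta^{-\ell} f(X))^2\, \frac{-\mathcal{L}^\ell_p h(X)}{\Delta^{-\ell} h(X)}\Big]}\, \sqrt{\mathbb{E}\Big[(\Delta^{-\ell} g(X))^2\, \frac{-\mathcal{L}^\ell_p h(X)}{\Delta^{-\ell} h(X)}\Big]},$$

with equality if and only if f, g ∝ h.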


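Taking f = g, we get the weighted Poincaré inequality

$$\mathrm{Var}[f(X)] \le \mathbb{E}\Big[(\Delta^{-\ell} f(X))^2\, \frac{-\mathcal{L}^\ell_p h(X)}{\Delta^{-\ell} h(X)}\Big] =: \mathbb{E}\big[(\Delta^{-\ell} f(X))^2\, \Gamma^\ell_{1,p} h(X)\big],$$

where h is any decreasing function in L²(p), and equality holds iff f(x) ∝ h(x).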
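A small exact check of the weighted Poincaré inequality, as a Python sketch for X ∼ Bin(n, θ) with ℓ = +1. With h(x) = −x the weight reduces to the Stein kernel (1 − θ)x, and equality holds for f affine in h; h(x) = −x³ serves as a second decreasing weight function. All parameter values, test functions and names are illustrative.

```python
from fractions import Fraction
from math import comb

n, theta = 8, Fraction(2, 5)
p = [comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(n + 1)]

def expect(f):
    return sum(f(x) * px for x, px in enumerate(p))

def var(f):
    m = expect(f)
    return expect(lambda x: (f(x) - m) ** 2)

def back_diff(f, x):
    """Delta^- f(x) = f(x) - f(x - 1), the difference paired with ell = +1."""
    return f(x) - f(x - 1)

def weight(h):
    """Gamma^+_{1,p} h(x) = -L^+_p h(x) / Delta^- h(x), for decreasing h."""
    m = expect(h)
    def gamma(x):
        minus_L = sum((h(j) - m) * p[j] for j in range(x, n + 1)) / p[x]
        return minus_L / back_diff(h, x)
    return gamma

def poincare_rhs(f, h):
    gamma = weight(h)
    return expect(lambda x: back_diff(f, x) ** 2 * gamma(x))

h_lin = lambda x: Fraction(-x)
h_cub = lambda x: Fraction(-x**3)
f1 = lambda x: Fraction(x * x)
print(var(f1) <= poincare_rhs(f1, h_lin))   # True
```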

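Example 6

Let N ∼ ϕ be standard normal and ℓ = 0 (the weight is unchanged when h is replaced by −h, so increasing h may be used as well). Then

• h(x) = x gives $\Gamma^0_1 \mathrm{Id}(x) = 1$ and (Chernoff bound) $\mathrm{Var}[f(N)] \le \mathbb{E}[(f'(N))^2]$;

• $h(x) = x e^{bx^2/2}$ gives $\Gamma^0_1 h(x) = ((1-b)(1+bx^2))^{-1}$ when 0 ≤ b < 1, and
$$(1-b)\,\mathrm{Var}[f(N)] \le \mathbb{E}\Big[\frac{(f'(N))^2}{1 + bN^2}\Big].$$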
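The weight in the second bullet of Example 6 can be confirmed by quadrature. The Python sketch below computes $-\mathcal{L}_\phi h(x)/h'(x)$ for $h(x) = x e^{bx^2/2}$ with Simpson’s rule on a truncated range (an assumption that is harmless here given the Gaussian-type decay of the integrand) and compares it with $((1-b)(1+bx^2))^{-1}$; b = 0 recovers the Chernoff weight 1. Names and grid sizes are illustrative.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    step = (b - a) / n
    total = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * step) for i in range(1, n))
    return total * step / 3

def gamma_weight(x, b, width=30.0):
    """Gamma^0_1 h(x) = -L_phi h(x) / h'(x) for h(u) = u exp(b u^2 / 2), N ~ N(0, 1), 0 <= b < 1.

    h is odd, so E[h(N)] = 0 and -L_phi h(x) = (1/phi(x)) int_x^inf h(u) phi(u) du.
    h(u) phi(u) = u exp(-(1 - b) u^2 / 2) / sqrt(2 pi) is evaluated in one piece to
    avoid overflow, and the infinite range is truncated at x + width."""
    c = 1.0 / math.sqrt(2.0 * math.pi)
    tail = simpson(lambda u: u * math.exp(-0.5 * (1.0 - b) * u * u) * c, x, x + width)
    phi_x = c * math.exp(-0.5 * x * x)
    dh = math.exp(0.5 * b * x * x) * (1.0 + b * x * x)
    return tail / phi_x / dh

print(gamma_weight(0.8, 0.0))   # close to 1 (Chernoff)
```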


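The weights

$$\Gamma^\ell_{1,p} h(x) = \frac{-\mathcal{L}^\ell_p h(x)}{\Delta^{-\ell} h(x)}, \qquad h \searrow,$$

seem to be good. Taking

• h(x) = −x, then $\Gamma^\ell_{1,p} h = \tau^\ell_p$, the Stein kernel (Cacoullos identity);

• $h = \mathcal{T}^\ell_p 1$ decreasing (e.g. p log-concave when ℓ = 0), then $\mathcal{L}^\ell_p h = 1$, so $\Gamma^\ell_{1,p} h = -1/\Delta^{-\ell}\mathcal{T}^\ell_p 1$ and we get a Brascamp–Lieb type inequality;

• we easily get asymmetric Brascamp–Lieb inequalities as in Menz and Otto;

• bounding $\Gamma^\ell_{1,p} h(x)$ by its maximum, we obtain that for any nice positive function σ²,
$$\lambda(p, \sigma^2) := \inf_f \frac{\mathbb{E}\big[(\Delta^{-\ell} f(X))^2\, \sigma^2(X)\big]}{\mathrm{Var}[f(X)]} \ge \sup_{h \in \mathrm{IF}} \inf_x \frac{-\Delta^{-\ell}\big(\mathcal{T}^\ell_p(\sigma^2 \Delta^{-\ell} h)\big)(x)}{\Delta^{-\ell} h(x)}.$$

When ℓ = 0, this is the spectral gap estimate from Joulin and Bonnefont (ESAIM 2016).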


4 A second representation and higher-order expansions

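Lemma 3 (Representation formula I (ERS19a))

Let X₁, X₂ be independent copies of X ∼ p. Then, for all f ∈ L¹(p), we have

$$-\mathcal{L}^\ell_p f(x) = \frac{1}{p(x)}\, \mathbb{E}\big[(f(X_2) - f(X_1))\, \chi^\ell(X_1, x)\, \chi^{-\ell}(x, X_2)\big] \qquad (3)$$

with $\chi^\ell(x, y) = \mathbb{I}[x \le y - \mathbb{I}[\ell = 1]]$.

In particular,

$$-\mathcal{L}^0_p f(x) = \frac{1}{p(x)}\, \mathbb{E}\big[(f(X_2) - f(X_1))\, \mathbb{I}[X_1 \le x \le X_2]\big]$$

$$-\mathcal{L}^+_p f(x) = \frac{1}{p(x)}\, \mathbb{E}\big[(f(X_2) - f(X_1))\, \mathbb{I}[X_1 < x \le X_2]\big]$$

$$-\mathcal{L}^-_p f(x) = \frac{1}{p(x)}\, \mathbb{E}\big[(f(X_2) - f(X_1))\, \mathbb{I}[X_1 \le x < X_2]\big]$$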
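Lemma 3 can likewise be verified exactly on a finite support. The Python sketch below (exact fractions; illustrative parameters and names) evaluates the double expectation over independent X₁, X₂ ∼ Bin(n, θ) and compares it with the tail-sum definition of $-\mathcal{L}^\ell_p f$ for ℓ = ±1.

```python
from fractions import Fraction
from math import comb

n, theta = 7, Fraction(3, 8)
p = [comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(n + 1)]
support = range(n + 1)

def chi(ell, x, y):
    """chi^ell(x, y) = 1[x <= y - 1[ell = 1]] on the integers."""
    return 1 if x <= y - (1 if ell == 1 else 0) else 0

def minus_L_direct(f, x, ell):
    """-L^ell_p f(x) from the tail-sum definition."""
    m = sum(f(j) * p[j] for j in support)
    start = x if ell == 1 else x + 1
    return sum((f(j) - m) * p[j] for j in range(start, n + 1)) / p[x]

def minus_L_pairs(f, x, ell):
    """(1/p(x)) E[(f(X2) - f(X1)) chi^ell(X1, x) chi^{-ell}(x, X2)], X1, X2 iid ~ p."""
    total = sum(
        (f(x2) - f(x1)) * chi(ell, x1, x) * chi(-ell, x, x2) * p[x1] * p[x2]
        for x1 in support
        for x2 in support
    )
    return total / p[x]

cube = lambda x: Fraction(x**3 - x)
print(all(minus_L_direct(cube, x, -1) == minus_L_pairs(cube, x, -1) for x in support))  # True
```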


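The formula

$$-\mathcal{L}^\ell_p f(x) = \frac{1}{p(x)}\, \mathbb{E}\big[(f(X_2) - f(X_1))\, \chi^\ell(X_1, x)\, \chi^{-\ell}(x, X_2)\big]$$

is useful.

Returning to the Stein equations

$$\mathcal{A} f(x) = \mathcal{L}^\ell_p \eta(x)\, \Delta^{-\ell} f(x) + \eta(x) f(x) = h(x) - \mathbb{E}[h(X)],$$

solutions are often of the form

$$f_h(x) = \frac{\mathcal{L}^\ell_p h(x)}{\mathcal{L}^\ell_p \eta(x)}$$

for η fixed (typically η(x) = x).

We immediately obtain

$$f_h(x) = \frac{\mathbb{E}\big[(h(X_2) - h(X_1))\, \chi^\ell(X_1, x)\, \chi^{-\ell}(x, X_2)\big]}{\mathbb{E}\big[(\eta(X_2) - \eta(X_1))\, \chi^\ell(X_1, x)\, \chi^{-\ell}(x, X_2)\big]},$$

yielding uniform and non-uniform bounds (also on $\Delta^{-\ell} f_h$; see Mijoule (2019))!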


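Let h be monotone and let $\ell = (\ell_1, \ell_2, \ldots)$ be a sequence with entries in {−1, 1}, or identically 0.

• Starting with some function f : ℝ → ℝ, we recursively define
$$f_0 = f, \qquad f_1(x) = \frac{\Delta^{-\ell_1} f_0(x)}{\Delta^{-\ell_1} h(x)}, \qquad f_i(x) = \frac{\Delta^{-\ell_i} f_{i-1}(x)}{\Delta^{-\ell_i} h(x)}, \quad i \ge 2.$$
In particular, if h(x) = −x then $f_i = (-1)^i\, \Delta^{-\ell_i} \cdots \Delta^{-\ell_1} f$.

• For any sequence $(x_j)_{j \ge 1}$ we let $\Phi^\ell_0(x_1, x_2) = 1$ and
$$\Phi^\ell_n(x_1, x_3, \ldots, x_{2n-1}, x_{2n+1}, x_{2n+2}, x_{2n}, \ldots, x_2) = \frac{\chi^{\ell}_{2}(x_{2n+1}, x_{2n+2})}{\prod_{i=3}^{2n+2} p(x_i)} \prod_{i=1}^{n} \chi^{\ell_i}(x_{2i-1}, x_{2i+1})\, \chi^{-\ell_i}(x_{2i+2}, x_{2i}).$$
For instance, when ℓ ≡ 0, $\Phi^\ell_1(x_1, x_3, x_4, x_2) = \mathbb{I}[x_1 \le x_3 \le x_4 \le x_2] / (p(x_3)\, p(x_4))$.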


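Theorem 2 (Papathanasiou-type expansion (ERS19b))

For all f, g,

$$\mathrm{Cov}[f(X), g(X)] = \sum_{k=1}^{n} (-1)^{k-1}\, \mathbb{E}\Big[\Delta^{-\ell_k} f_{k-1}(X)\, \Delta^{-\ell_k} g_{k-1}(X)\, \frac{\Gamma^\ell_k(h)(X)}{\Delta^{-\ell_k} h(X)}\Big] + (-1)^n R^\ell_n(h),$$

where

$$\Gamma^\ell_k h(x) = \mathbb{E}\Big[(h_k(X_{2k}) - h_k(X_{2k-1})) \prod_{i=1}^{k-1} \Delta^{-\ell_i} h_i(X_{2i+1})\, \Delta^{-\ell_i} h_i(X_{2i+2})\; \Phi^{\ell_k}_p(X_{2k-1}, x, X_{2k})\, \Phi^\ell_{k-1}(X_1, \ldots, X_{2k-1}, X_{2k}, \ldots, X_2)\Big]$$

and $R^\ell_n(h) \ge 0$ is explicit.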


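Corollary 1 (Weighted Poincaré upper and corresponding lower bounds)

Under natural assumptions,

$$\mathbb{E}\Big[\Delta^{-\ell_1} f(X)\, \Delta^{-\ell_1} g(X)\, \frac{\Gamma^{\ell_1}_1 h_1(X)}{\Delta^{-\ell_1} h_1(X)}\Big] - \mathbb{E}\Big[\Delta^{-\ell_2}\Big(\frac{\Delta^{-\ell_1} f(X)}{\Delta^{-\ell_1} h(X)}\Big)\, \Delta^{-\ell_2}\Big(\frac{\Delta^{-\ell_1} g(X)}{\Delta^{-\ell_1} h(X)}\Big)\, \frac{\Gamma^{\ell_1, \ell_2}_2(h_1, h_2)(X)}{\Delta^{-\ell_2} h_2(X)}\Big]$$
$$\le \mathrm{Cov}[f(X), g(X)] \le \mathbb{E}\Big[\Delta^{-\ell_1} f(X)\, \Delta^{-\ell_1} g(X)\, \frac{\Gamma^{\ell_1}_1 h_1(X)}{\Delta^{-\ell_1} h_1(X)}\Big].$$

Clearly $\Gamma^\ell_1 h(x)$ is the same as the first-order weight above. What about the other ones?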


• Fix ℓ = (0, 0, ...) and let h be non-decreasing. Then
$$\Gamma^0_k h(x) = \frac{1}{k!\,(k-1)!}\, \frac{1}{p(x)}\, \mathbb{E}\Big[(h(X_2) - h(x))^{k-1}\, (h(x) - h(X_1))^{k-1}\, (h(X_2) - h(X_1))\, \mathbb{I}[X_1 \le x \le X_2]\Big].$$
If h(x) = x, we recover the classical result by Papathanasiou (SPL 1988).

• The discrete case is explicit as well but quite ugly; we have found no general neat expression except when h(x) = x, in which case, if $\ell \in \{-1, 1\}^\infty$, then
$$\Gamma^\ell_k(x) = \frac{1}{k!\,(k-1)!\,p(x)}\, \mathbb{E}\Big[(X_2 - x - b_k + 1)^{[k-1]}\, (x - X_1 - a_k + 1)^{[k-1]}\, (X_2 - X_1)\, \mathbb{I}[X_1 + a_k \le x \le X_2 - b_k]\Big],$$
where $f^{[k-1]}$ is the discrete factorial and $a_k = k - b_k$ counts the number of +1’s. Taking $\ell_i = -1$, we recover the result from Afendras et al. (Sankhya 2007).
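For h(x) = x the double expectation in the continuous formula of the first bullet factorises: writing X₂ − X₁ = (X₂ − x) + (x − X₁) gives $A_k B_{k-1} + A_{k-1} B_k$ with $A_m = \mathbb{E}[(X - x)^m; X \ge x]$ and $B_m = \mathbb{E}[(x - X)^m; X \le x]$. The Python sketch below uses this rearrangement (ours, not taken from the slides) to evaluate $\Gamma^0_k$ for N(0, 1) by one-dimensional Simpson quadrature, and checks it against 1/k!, the constant expected in the Gaussian case (τ_p ≡ 1 and δ = 0 in the Pearson formula that follows). Truncation width, grid size and names are assumptions.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    step = (b - a) / n
    total = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * step) for i in range(1, n))
    return total * step / 3

def phi(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def gamma_k_normal(k, x, width=12.0):
    """Gamma^0_k h(x) for h(x) = x and X ~ N(0, 1), via A_k B_{k-1} + A_{k-1} B_k.
    The integrals are truncated to [x - width, x + width] (a numerical assumption)."""
    A = lambda m: simpson(lambda y: (y - x) ** m * phi(y), x, x + width)
    B = lambda m: simpson(lambda y: (x - y) ** m * phi(y), x - width, x)
    numerator = A(k) * B(k - 1) + A(k - 1) * B(k)
    return numerator / (math.factorial(k) * math.factorial(k - 1) * phi(x))

for k in range(1, 5):
    print(k, gamma_k_normal(k, 0.8) * math.factorial(k))   # close to 1 for every k
```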

Under various assumptions these weights become even more explicit:

• If X ∼ p is integrated Pearson distributed with Stein kernel $\tau_p(x) = \delta x^2 + \beta x + \gamma$, then
$$\Gamma^0_k(x) = \frac{\tau_p(x)^k}{k! \prod_{j=0}^{k-1} (1 - j\delta)}.$$
This is already known (Johnson, SD 1993).

• If X ∼ p is cumulative Ord distributed with $\tau^-_p(x) = \delta x^2 + \beta x + \gamma$ (and hence $\tau^+_p(x) = x(\delta x + \beta + 1)$), then
$$\Gamma^\ell_k(x) = \frac{1}{k! \prod_{j=0}^{k-1} (1 - j\delta)}\, \big(\tau^+_p(x)\big)^{[a_k]}\, \big(\tau^-_p(x)\big)^{[b_k]}.$$
This is already known in the case $\ell_i = -1$ (Afendras et al., Sankhya 2007).


Neat examples to conclude

• If X ∼ Laplace(0, 1) (i.e. $p(x) = e^{-|x|}/2$ on ℝ) then
$$\Gamma^0_k(x) = \sum_{j=0}^{k} \frac{|x|^j}{j!}.$$

• If X ∼ Rayleigh(0, 1) (i.e. $p(x) = x e^{-x^2/2}$ on ℝ₊) then $\tau^0_p(x)$ does not take on an agreeable form. Nevertheless, the choice h(x) = x² leads to
$$\frac{\Gamma^0_k h(x)}{h'(x)} = \frac{2^{k-2}}{k!}\, x^{2(k-1)}.$$

• If X ∼ Cauchy(0, 1), taking h(x) = arctan(x) leads to
$$\frac{\Gamma^0_k h(x)}{h'(x)} = \frac{(1 + x^2)^2}{4^k\, (k+1)!\, k!}\, \big(\pi^2 - 4\arctan(x)^2\big)^k.$$
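The three closed forms above can be checked numerically. For monotone h, writing h(X₂) − h(X₁) = (h(X₂) − h(x)) + (h(x) − h(X₁)) gives $\Gamma^0_k h(x) = (A_k B_{k-1} + A_{k-1} B_k)/(k!(k-1)!\,p(x))$ with $A_m = \mathbb{E}[(h(X) - h(x))^m; X \ge x]$ and $B_m = \mathbb{E}[(h(x) - h(X))^m; X \le x]$; this rearrangement is ours. The Python sketch below evaluates these by Simpson quadrature, splitting at the kink of the Laplace density, and for the Cauchy law working with Y = arctan X, which is uniform on (−π/2, π/2). It compares the Cauchy case with $(1+x^2)^2(\pi^2 - 4\arctan^2 x)^k / (4^k (k+1)!\, k!)$. Truncation ranges, grid sizes and names are assumptions.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    step = (b - a) / n
    total = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * step) for i in range(1, n))
    return total * step / 3

def gamma_k(k, x, dens, h, lo, hi, p_x=None, kinks=()):
    """Gamma^0_k h(x) = (A_k B_{k-1} + A_{k-1} B_k) / (k! (k-1)! p(x)) for Y with density
    `dens` on [lo, hi]; p_x defaults to dens(x), `kinks` are non-smooth points of dens."""
    def integral(g, a, b):
        cuts = [a] + [c for c in sorted(kinks) if a < c < b] + [b]
        return sum(simpson(g, u, v) for u, v in zip(cuts, cuts[1:]))
    A = lambda m: integral(lambda y: (h(y) - h(x)) ** m * dens(y), x, hi)
    B = lambda m: integral(lambda y: (h(x) - h(y)) ** m * dens(y), lo, x)
    if p_x is None:
        p_x = dens(x)
    return (A(k) * B(k - 1) + A(k - 1) * B(k)) / (
        math.factorial(k) * math.factorial(k - 1) * p_x)

laplace = lambda y: 0.5 * math.exp(-abs(y))
rayleigh = lambda y: y * math.exp(-0.5 * y * y)
identity = lambda y: y
square = lambda y: y * y

def cauchy_ratio(k, x):
    """Gamma^0_k h(x) / h'(x) for Cauchy(0, 1) and h = arctan, computed in Y = arctan(X)
    (uniform on (-pi/2, pi/2)); the denominator uses the Cauchy density at x."""
    g = gamma_k(k, math.atan(x), lambda t: 1.0 / math.pi, identity,
                -math.pi / 2, math.pi / 2, p_x=1.0 / (math.pi * (1.0 + x * x)))
    return g * (1.0 + x * x)   # divide by h'(x) = 1 / (1 + x^2)

print(gamma_k(2, 1.0, laplace, identity, -40.0, 40.0, kinks=(0.0,)))   # close to 2.5
```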



5 A to do list

Some questions:

• Connect with the works of Arras and Houdré for SD, and with Fathi’s higher-order centered Stein kernels.
• Other “iterated operators”.
• Multivariate representations for Γ and/or even τ, $\mathcal{L}^\ell_p$.
• What happens for distributions with higher-order operators?
• Polynomials, Sturm–Liouville, weights in Poincaré inequalities, spectral gap, ...


G. Afendras, N. Papadatos, and V. Papathanasiou. The discrete Mohr and Noll inequality with applications to variance bounds. Sankhya, 69(2):162–189, 2007.

T. Cacoullos. On upper and lower bounds for the variance of a function of a random variable. The Annals of Probability, 10(3):799–809, 1982.

L. H. Chen. An inequality for the multivariate normal distribution. Journal of Multivariate Analysis, 12(2):306–315, 1982.

L. H. Chen. Poincaré-type inequalities via stochastic integrals. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 69(2):251–277, 1985.

R. W. Johnson. A note on variance bounds for a function of a Pearson variate. Statist. Decisions, 11(3):273–278, 1993.

M. Ledoux. L’algèbre de Lie des gradients itérés d’un générateur markovien—développements de moyennes et entropies. Ann. Sci. École Norm. Sup., 28(4):435–460, 1995.

V. Papathanasiou. Variance bounds by a generalization of the Cauchy-Schwarz inequality. Statistics & Probability Letters, 7(1):29–33, 1988.
