Covariance identities and expansions via Stein’s method
Yvik Swan (joint with Marie Ernst and Gesine Reinert)
Symposium in Memory of Charles Stein
Charles Stein, 1920 – 2016
A statistician of genius who was also a committed political activist
Although never very prolific as a writer (∗), something he attributed to a combination of laziness and perfectionism, Professor Stein was known as “the Einstein of [Stanford’s] Statistics Department”.
His legacy in the field lives on in the names of Stein’s lemma, Stein’s paradox, and Stein’s method.
(∗) 39 publications on MathSciNet
Stein’s method is a toolbox for estimating discrepancies between probability distributions.
The starting point is the identity
\[ E[N f(N)] = \sigma^2 E[f'(N)] \ \text{for all } f \iff N \sim \varphi(x) = (2\pi\sigma^2)^{-1/2} e^{-x^2/(2\sigma^2)} \]
Crucial are the following:
- Stein equations:
\[ \sigma^2 f'(x) - x f(x) = h(x) - E[h(N)] \]
- Solutions:
\[ f_h(x) = \frac{1}{\sigma^2 \varphi(x)} \int_{-\infty}^{x} (h(u) - E[h(N)]) \varphi(u)\,du \]
  In particular,
\[ \frac{\varphi'(x)}{\varphi(x)} = -\frac{x}{\sigma^2} \iff -\frac{1}{\varphi(x)} \int_{-\infty}^{x} u \varphi(u)\,du = \sigma^2 \]
- Bounds on $\sup_h |f_h|$, $\sup_h |f_h'|$, $\sup_h |f_h''|$ (a.k.a. Stein factors)
- Couplings: exchangeable pairs, zero bias, size bias, ...
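As a quick sanity check of the starting identity, here is a minimal sketch that compares both sides of E[N f(N)] = σ² E[f′(N)] by numerical integration. The test function f = sin, the value σ = 1.5, the helper names and the Simpson rule are all illustrative assumptions, not part of the talk.

```python
import math

def simpson(fn, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = fn(a) + fn(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fn(a + i * h)
    return total * h / 3

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def stein_identity_sides(f, fprime, sigma, cutoff=12.0):
    """Return (E[N f(N)], sigma^2 E[f'(N)]) for N ~ N(0, sigma^2)."""
    lo, hi = -cutoff * sigma, cutoff * sigma
    lhs = simpson(lambda x: x * f(x) * normal_pdf(x, sigma), lo, hi)
    rhs = sigma ** 2 * simpson(lambda x: fprime(x) * normal_pdf(x, sigma), lo, hi)
    return lhs, rhs

# Illustrative choice: f = sin, f' = cos, sigma = 1.5.
lhs, rhs = stein_identity_sides(math.sin, math.cos, sigma=1.5)
print(lhs, rhs)
```

For this choice both sides have the closed form σ² exp(−σ²/2), which gives an independent check on the quadrature.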
Chen obtained
\[ E[Z f(Z)] = E[\lambda f(Z+1)] \ \text{for all } f \iff Z \sim p(x) = e^{-\lambda} \lambda^x / x! \]
and the corresponding equation, bounds, etc.
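Chen's Poisson identity can be checked the same way with a truncated sum. The sketch below assumes λ = 3, a truncation of the support at 200 and an arbitrary test function; none of these choices come from the talk.

```python
import math

def poisson_pmf(lam, kmax):
    """Return [p(0), ..., p(kmax)] for Po(lam), computed recursively."""
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs

def chen_identity_sides(f, lam, kmax=200):
    """Return (E[Z f(Z)], E[lam f(Z + 1)]) with the support truncated at kmax."""
    probs = poisson_pmf(lam, kmax)
    lhs = sum(k * f(k) * p for k, p in enumerate(probs))
    rhs = sum(lam * f(k + 1) * p for k, p in enumerate(probs))
    return lhs, rhs

# Illustrative test function; any f with moderate growth would do.
lhs_chen, rhs_chen = chen_identity_sides(lambda k: math.sqrt(k + 1.0), lam=3.0)
print(lhs_chen, rhs_chen)
```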
The method has been extended in multiple directions and there now exists an entire ecosystem of “Stein-[...]”:
- operators,
- equations,
- factors,
- couplings,
- kernels,
- ...
There ought to be a mug.
The Chen-Stein mug
There are now Stein identities (and bounds) for
- specific laws: Dickman, gamma, stable, beta, multinomial, multinormal, Wishart, multivariate gamma, ...
\[ E[(\beta X - \alpha(1-X)) f(X)] = E[X(1-X) f'(X)] \]
\[ E[\theta(n-X) f(X)] = E[(1-\theta) X f(X-1)] \]
\[ E[X f(X)] = E[X f''(X) + f'(X)] \]
  ...
- families of laws: infinitely divisible, diffusions, Pearson, Ord, exponential family, elliptical distributions
- Much of this can be written in a coupling framework:
\[ E[G(f(W') - f(W))] = E[(W - E[W]) f(W)] \]
  with exchangeable pairs, zero bias, size bias, X − P bias, ...
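All such equations are of the form
\[ X \sim \mu \ \text{if and only if} \ E[\mathcal{A} f(X)] = E[\mathcal{B} f(X^\star)] \ \text{for all } f \in \mathcal{F} \qquad (1) \]
for $\mathcal{A}$, $\mathcal{B}$, $X^\star$ and $\mathcal{F}$ well chosen.
The heuristic: under (1), for $Y \sim \nu$ with $\nu$ another measure,
\[ S(\mu, \nu, \mathcal{F}) = \sup_{f \in \mathcal{F}} |E[\mathcal{A} f(Y)] - E[\mathcal{B} f(Y^\star)]| \]
ought to capture the difference between $\mu$ and $\nu$.
As we know, there are many settings where there exist choices of $\mathcal{A}$, $\mathcal{B}$, $X^\star$ and $\mathcal{F}$ for which $S(\mu, \nu, \mathcal{F})$ captures essential features of the dissimilarity.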
1 The density approach
2 Stein identities
3 A first representation and first-order bounds
4 A second representation and higher-order expansions
5 A to do list
Stein’s operators for $\varphi$ are
\[ f \mapsto T_\varphi f(x) = \frac{(f(x)\varphi(x))'}{\varphi(x)} = f'(x) - x f(x) \]
\[ h \mapsto L_\varphi h(x) = \frac{1}{\varphi(x)} \int_{-\infty}^{x} (h(u) - E[h(N)]) \varphi(u)\,du \]
Figure: The function $x \mapsto L_\varphi I[\cdot \le z](x)$ and its first two derivatives for z = 0 (Figure (a)), z = 1 (Figure (b)).
It is not hard to extrapolate to a general density p of X on $\mathbb{R}$.
Stein’s operators are
\[ f \mapsto T_p f(x) = \frac{(f(x)p(x))'}{p(x)} = f'(x) + \frac{p'(x)}{p(x)} f(x) \]
\[ h \mapsto L_p h(x) = \frac{1}{p(x)} \int_{x_0}^{x} (h(u) - E[h(X)]) p(u)\,du \]
Figure: The function $x \mapsto L_p I[\cdot \le z](x)$ and its first two derivatives for z = 0 (Figure (a)), z = 1 (Figure (b)) when $X \sim \mathrm{Beta}(1/2, 1/2)$ and $x_0 = 0$.
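It is formally easy to “correct” these problems.
Let
\[ \Delta^\ell f(x) = \frac{f(x+\ell) - f(x)}{\ell} = \begin{cases} \Delta^+ f(x) = f(x+1) - f(x) & \text{if } \ell = 1 \\ \Delta^- f(x) = f(x) - f(x-1) & \text{if } \ell = -1 \\ f'(x) & \text{if } \ell = 0 \end{cases} \]
Definition 1 (Density approach – differential operator)
The canonical $\ell$-Stein operator is the linear operator $T_p^\ell : \mathcal{F}(p) \to \mathcal{F}^{(0)}(p)$ defined by
\[ T_p^\ell f(x) = \frac{\Delta^\ell (fp)(x)}{p(x)} \]
on $\mathcal{F}^\ell(p)$, the class of all functions such that $T_p^\ell f \in L^1(p)$ and $E[T_p^\ell f(X)] = 0$.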
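Example 1
- If $X \sim \mathcal{N}(0, \sigma^2)$ then
\[ T_\varphi^0 f(x) = \frac{(f(x)\varphi(x))'}{\varphi(x)} = f'(x) - \frac{x}{\sigma^2} f(x) \]
- If $X \sim \mathrm{Po}(\lambda)$ then
\[ T_\lambda^+ f(x) = \frac{\Delta^+(f(x)p(x))}{p(x)} = \frac{\lambda}{x+1} f(x+1) - f(x) \ \text{with } f(0) = 0 \]
\[ T_\lambda^- f(x) = \frac{\Delta^-(f(x)p(x))}{p(x)} = f(x) - \frac{x}{\lambda} f(x-1) \]
- If $X \sim \mathrm{Bin}(n, \theta)$ then
\[ T_{n,\theta}^+ f(x) = \frac{\theta}{1-\theta} \frac{n-x}{x+1} f(x+1) - f(x) \ \text{with } f(0) = 0 \]
\[ T_{n,\theta}^- f(x) = f(x) - \frac{1-\theta}{\theta} \frac{x}{n-(x-1)} f(x-1) \ \text{with } f(n) = 0. \]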
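The role of the boundary conditions in Example 1 can be seen numerically. The following sketch, assuming n = 12, θ = 0.3 and arbitrary test functions (all illustrative), evaluates the mean of the two binomial canonical operators; it also shows that dropping f(0) = 0 leaves the residual −p(0) f(0) for the forward operator.

```python
import math

def binomial_pmf(n, theta):
    """Return [p(0), ..., p(n)] for Bin(n, theta)."""
    return [math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]

def t_plus(f, x, n, theta):
    """Forward canonical operator: (theta/(1-theta)) (n-x)/(x+1) f(x+1) - f(x)."""
    return theta / (1 - theta) * (n - x) / (x + 1) * f(x + 1) - f(x)

def t_minus(f, x, n, theta):
    """Backward canonical operator: f(x) - ((1-theta)/theta) x/(n-x+1) f(x-1)."""
    return f(x) - (1 - theta) / theta * x / (n - x + 1) * f(x - 1)

def mean_under(op, f, n, theta):
    """E[op f(X)] for X ~ Bin(n, theta)."""
    probs = binomial_pmf(n, theta)
    return sum(op(f, x, n, theta) * probs[x] for x in range(n + 1))

n, theta = 12, 0.3
f_plus = lambda x: x * math.cos(x)          # satisfies f(0) = 0
f_minus = lambda x: (n - x) * math.exp(-x)  # satisfies f(n) = 0
mean_plus = mean_under(t_plus, f_plus, n, theta)
mean_minus = mean_under(t_minus, f_minus, n, theta)
# Without the boundary condition the telescoping sum leaves -p(0) f(0).
mean_no_boundary = mean_under(t_plus, lambda x: 1.0 + x, n, theta)
print(mean_plus, mean_minus, mean_no_boundary)
```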
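Often, “good” operators are obtained through
\[ f(x) \mapsto c(x) f(x-\ell) \]
leading to
\[ \mathcal{A} f(x) = T_p^\ell (c(x) f(x-\ell)) = (T_p^\ell c(x)) f(x) + c(x) (\Delta^{-\ell} f(x)) \]
where $\mathcal{A}$ acts on a generic class (Goldstein Reinert JAP 2013, Döbler EJP 2015).
Example 2
- If $X \sim \mathcal{N}(0, \sigma^2)$ then $c(x) = \sigma^2$ and $\mathcal{A} f(x) = \sigma^2 f'(x) - x f(x)$
- If $X \sim \mathrm{Po}(\lambda)$ then
  $c(x) = x$ and $\mathcal{A}^+ f(x) = \lambda f(x+1) - x f(x)$
  $c(x) = \lambda$ and $\mathcal{A}^- f(x) = \lambda f(x) - x f(x-1)$
- If $X \sim \mathrm{Bin}(n, \theta)$ then
  $c(x) = (1-\theta)x$ and $\mathcal{A}^+ f(x) = \theta(n-x) f(x+1) - (1-\theta) x f(x)$
  $c(x) = \theta(n-x)$ and $\mathcal{A}^- f(x) = \theta(n-x) f(x) - (1-\theta) x f(x-1)$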
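The discrete product rule behind the standardisation step can be verified pointwise. This is a sketch for the Poisson case with forward operator (ℓ = 1), assuming λ = 2.5, c(x) = x and an arbitrary f; with these choices the operator reduces to the form λ f(x) − x f(x−1) that appears in Example 2.

```python
import math

LAM = 2.5  # illustrative Poisson parameter

def t_plus(g, x, lam=LAM):
    """Canonical forward Poisson operator applied to g at x."""
    return lam / (x + 1) * g(x + 1) - g(x)

def standardised(c, f, x, lam=LAM):
    """Left-hand side: T^+ applied to the function y -> c(y) f(y - 1)."""
    return t_plus(lambda y: c(y) * f(y - 1), x, lam)

def product_rule(c, f, x, lam=LAM):
    """Right-hand side: (T^+ c)(x) f(x) + c(x) (f(x) - f(x - 1))."""
    return t_plus(c, x, lam) * f(x) + c(x) * (f(x) - f(x - 1))

c = lambda x: float(x)
f = lambda x: math.sin(x) + 0.1 * x * x
gaps = [abs(standardised(c, f, x) - product_rule(c, f, x)) for x in range(30)]
closed_form = [abs(standardised(c, f, x) - (LAM * f(x) - x * f(x - 1))) for x in range(30)]
print(max(gaps), max(closed_form))
```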
One-slide Stein’s density approach
Given $X \sim p$:
- choose c and solve
\[ h(x) - E[h(X)] = (T_p^\ell c(x)) f_h^c(x) + c(x) (\Delta^{-\ell} f_h^c(x)) \]
- consider
\[ E[h(W)] - E[h(X)] = E[(T_p^\ell c(W)) f_h^c(W) + c(W) (\Delta^{-\ell} f_h^c(W))] \]
- given $W \sim q$, note that
\[ E[(T_q^\ell c(W)) f_h^c(W) + c(W) (\Delta^{-\ell} f_h^c(W))] = 0 \]
- reap
\[ |E[h(W)] - E[h(X)]| \le E\left[ |T_p^\ell c(W) - T_q^\ell c(W)| \, |f_h^c(W)| \right] \le \sup_{h \in \mathcal{H}} \|f_h^c\| \, E\left[ |T_p^\ell c(W) - T_q^\ell c(W)| \right] \]
If there exists c such that $|f_h^c|$ is uniformly bounded then this is good.
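Recall
\[ |E[h(W)] - E[h(X)]| \le \sup_{h \in \mathcal{H}} \|f_h^c\| \, E\left[ |T_p^\ell c(W) - T_q^\ell c(W)| \right] \]
Example 3 ($\ell = 0$, support $\mathbb{R}$, mean 0)
Let $\mathcal{H}$ be the set of all Lipschitz-1 functions and
\[ \mathrm{Wass}(X, W) = \sup_{h \in \mathcal{H}} |E h(W) - E h(X)| \]
- c = 1 and
\[ \mathrm{Wass}(X, W) \le \sup_{h \in \mathcal{H}} \|f_h^1\| \sqrt{E\left[ \left( \frac{p'(W)}{p(W)} - \frac{q'(W)}{q(W)} \right)^2 \right]} \]
  which is the Fisher information distance when $p(x) = \varphi(x)$.
- one can also optimise in c to get
\[ \mathrm{Wass}(X, W) \le \inf_{c \in \mathcal{F}} \sup_{h \in \mathcal{H}} \|f_h^c\| \sqrt{E\left[ (T_p^\ell c(W) - T_q^\ell c(W))^2 \right]} \]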
Now focus on the other operator:
\[ h \mapsto L_\varphi h(x) = \frac{1}{\varphi(x)} \int_{-\infty}^{x} (h(u) - E[h(X)]) \varphi(u)\,du \]
Let p have support [a, b]. Given $h \in L^1(p)$ we define
\[ L_p^0 h(x) = \frac{1}{p(x)} \int_a^x (h(u) - E[h(X)]) p(u)\,du = \frac{1}{p(x)} \int_x^b (E[h(X)] - h(u)) p(u)\,du \]
\[ L_p^- h(x) = \frac{1}{p(x)} \sum_{j=a}^{x} (h(j) - E[h(X)]) p(j) = \frac{1}{p(x)} \sum_{j=x+1}^{b} (E[h(X)] - h(j)) p(j) \]
\[ L_p^+ h(x) = \frac{1}{p(x)} \sum_{j=a}^{x-1} (h(j) - E[h(X)]) p(j) = \frac{1}{p(x)} \sum_{j=x}^{b} (E[h(X)] - h(j)) p(j) \]
Definition 2 (Density approach – integral operator)
The operator $h \mapsto L_p^\ell h(x) = f_h(x)$ is the canonical inverse Stein operator.
By construction
\[ L_p^\ell T_p^\ell f(x) = f(x) \]
\[ T_p^\ell L_p^\ell h(x) = h(x) - E[h(X)] \]
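The two inversion relations can be checked exactly on a finite support. The sketch below uses the forward pair (T⁺, L⁺) for Bin(10, 0.4); the parameters, the test functions and the helper names are illustrative assumptions.

```python
import math

N_TRIALS, THETA = 10, 0.4
PMF = [math.comb(N_TRIALS, k) * THETA ** k * (1 - THETA) ** (N_TRIALS - k)
       for k in range(N_TRIALS + 1)]

def mean(h):
    return sum(h(k) * PMF[k] for k in range(N_TRIALS + 1))

def inverse_plus(h, x):
    """L^+ h(x): sum over j = 0..x-1 of (h(j) - E h) p(j), divided by p(x)."""
    m = mean(h)
    return sum((h(j) - m) * PMF[j] for j in range(x)) / PMF[x]

def t_plus(f, x):
    """T^+ f(x) = p(x+1)/p(x) f(x+1) - f(x), with p(x+1) = 0 beyond the support."""
    ratio = PMF[x + 1] / PMF[x] if x < N_TRIALS else 0.0
    return ratio * f(x + 1) - f(x)

def t_plus_of_inverse(h, x):
    """T^+ (L^+ h)(x) = (p(x+1) L^+h(x+1) - p(x) L^+h(x)) / p(x)."""
    # At the right endpoint the upper term is the full centred sum, which is zero.
    upper = PMF[x + 1] * inverse_plus(h, x + 1) if x < N_TRIALS else 0.0
    return (upper - PMF[x] * inverse_plus(h, x)) / PMF[x]

h = lambda k: math.log(1.0 + k)
f = lambda k: k * (k - 3.0)  # satisfies f(0) = 0
h_bar = mean(h)
err_TL = max(abs(t_plus_of_inverse(h, x) - (h(x) - h_bar)) for x in range(N_TRIALS + 1))
err_LT = max(abs(inverse_plus(lambda j: t_plus(f, j), x) - f(x)) for x in range(N_TRIALS + 1))
print(err_TL, err_LT)
```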
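Definition 3 (Generalized Stein kernel)
The function $\tau_p^\ell := -L_p^\ell \mathrm{Id}$ is the ($\ell$-)Stein kernel of p.
In particular if $h(x) = \mathrm{Id}(x)$ and $\ell = 0$ then
\[ -L_p^0 h(x) = \tau_p^0(x) = \frac{1}{p(x)} \int_x^\infty (u - E[X]) p(u)\,du \]
is the classical Stein kernel of p (Stein, 1986).
Example 4
$\ell = 0$
- If $X \sim \mathcal{N}(0, \sigma^2)$ then $\tau_p^0(x) = \sigma^2$
- If $X \sim p(x) \propto (1+x^2)^{-\beta}$ then $\tau_p^0(x) = \frac{1+x^2}{2(\beta-1)}$ if $\beta > 1$
$\ell = \pm 1$ (with $L_p^+$ summing up to $x-1$ and $L_p^-$ summing up to $x$, as in the definitions of the inverse operators)
- If $X \sim \mathrm{Po}(\lambda)$ then $\tau_p^+(x) = x$ and $\tau_p^-(x) = \lambda$.
- If $X \sim \mathrm{Bin}(n, \theta)$ then $\tau_p^+(x) = (1-\theta)x$ and $\tau_p^-(x) = \theta(n-x)$.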
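The two discrete binomial kernels, θ(n−x) and (1−θ)x, can be recovered directly from the definition as minus the inverse operator applied to the identity. In this sketch (illustrative parameters n = 10, θ = 0.4) the partial sum running up to x produces the former and the partial sum stopping at x−1 produces the latter.

```python
import math

def binomial_pmf(n, theta):
    return [math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]

def kernel_from_partial_sums(n, theta, include_x):
    """-(1/p(x)) * sum of (j - E X) p(j) over j <= x (include_x) or j < x (otherwise)."""
    probs = binomial_pmf(n, theta)
    mean = n * theta
    values = []
    for x in range(n + 1):
        top = x + 1 if include_x else x
        values.append(-sum((j - mean) * probs[j] for j in range(top)) / probs[x])
    return values

n, theta = 10, 0.4
sum_to_x = kernel_from_partial_sums(n, theta, include_x=True)
sum_to_x_minus_1 = kernel_from_partial_sums(n, theta, include_x=False)
print(sum_to_x[:3], sum_to_x_minus_1[:3])
```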
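One-slide Stein’s density approach
Given $X \sim p$:
- choose $\eta$ and solve
\[ h(x) - E[h(X)] = \eta(x) f_h^\eta(x) + L_p^\ell \eta(x) (\Delta^{-\ell} f_h^\eta(x)) \]
- consider
\[ E[h(W)] - E[h(X)] = E[\eta(W) f_h^\eta(W) + L_p^\ell \eta(W) (\Delta^{-\ell} f_h^\eta(W))]. \]
- given $W \sim q$, note that
\[ E[\eta(W) f_h^\eta(W) + L_q^\ell \eta(W) (\Delta^{-\ell} f_h^\eta(W))] = 0 \]
- reap
\[ |E[h(W)] - E[h(X)]| \le E\left[ |L_p^\ell \eta(W) - L_q^\ell \eta(W)| \, |\Delta^{-\ell} f_h^\eta(W)| \right] \le \sup_{h \in \mathcal{H}} \|\Delta^{-\ell} f_h^\eta\| \, E\left[ |L_p^\ell \eta(W) - L_q^\ell \eta(W)| \right] \]
If there exists $\eta$ such that $|\Delta^{-\ell} f_h^\eta|$ is uniformly bounded then this is good.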
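Example 5
Let $X \sim p$ have finite mean. Let $\mathcal{H}$ be the set of all Lipschitz-1 functions and
\[ \mathrm{Wass}(X, W) = \sup_{h \in \mathcal{H}} |E h(W) - E h(X)| \]
then
- $\eta(x) = x$ and
\[ \mathrm{Wass}(X, W) \le \sup_{h \in \mathcal{H}} \|\Delta^{-\ell} f_h^{\mathrm{Id}}\| \sqrt{E\left[ (\tau_p(W) - \tau_q(W))^2 \right]} \]
  which is the Stein discrepancy.
- one can also optimise in $\eta$ to get
\[ \mathrm{Wass}(X, W) \le \inf_{\eta} \sup_{h \in \mathcal{H}} \|\Delta^{-\ell} f_h^\eta\| \sqrt{E\left[ (L_p^\ell \eta(W) - L_q^\ell \eta(W))^2 \right]} \]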
Versions of this general story are available in different recent papers.
For instance Ley, Reinert, and Swan (PS 2017, AOAP 2017) and Mijoule, Reinert, and Swan (arXiv 2018).
See also Reinert (IMS LN 2003), Stein et al. (IMS LN 2004), Chatterjee and Shao (AOAP 2011), Goldstein Reinert (JAP 2013), Döbler (EJP 2015), Arras and Houdré (Springer Briefs 2018, EJP 2019), Gaunt, Mackey and his team, Upadhye, Vydas, Cekanavicius (Bernoulli 2017), Fang, Shao, Xu (PTRF, 2019), ...
and many more.
2 Stein identities
Under weak conditions on p, the operators $T_p^\ell$ and $L_p^\ell$ automatically provide natural identities for the target.
Indeed, by construction, for all $f \in \mathcal{F}(p)$
\[ \mathrm{Cov}[T_p^\ell f(X), g(X)] = -E[f(X) \Delta^{-\ell} g(X)] \]
for all g such that [some conditions depending on f].
This leads to:
Lemma 1 (Covariance identity)
For all f, g we have
\[ \mathrm{Cov}[f(X), g(X)] = E[-L_p^\ell f(X) \, \Delta^{-\ell} g(X)] \]
under explicit and non-stringent conditions on f, g.
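A finite-support check of the covariance identity, as a sketch: it takes ℓ = +1, so the inverse operator is the forward one and g is differenced backwards, with X ∼ Bin(10, 0.4). The parameters and test functions are illustrative assumptions.

```python
import math

N_TRIALS, THETA = 10, 0.4
PMF = [math.comb(N_TRIALS, k) * THETA ** k * (1 - THETA) ** (N_TRIALS - k)
       for k in range(N_TRIALS + 1)]

def mean(h):
    return sum(h(k) * PMF[k] for k in range(N_TRIALS + 1))

def minus_inverse_plus(f, x):
    """-L^+ f(x) = (1/p(x)) * sum over j >= x of (f(j) - E f) p(j)."""
    m = mean(f)
    return sum((f(j) - m) * PMF[j] for j in range(x, N_TRIALS + 1)) / PMF[x]

def covariance(f, g):
    return mean(lambda k: f(k) * g(k)) - mean(f) * mean(g)

def stein_side(f, g):
    """E[-L^+ f(X) (g(X) - g(X - 1))]: ell = +1 and a backward difference on g."""
    return sum(minus_inverse_plus(f, x) * (g(x) - g(x - 1)) * PMF[x]
               for x in range(N_TRIALS + 1))

f = lambda k: math.sqrt(k + 1.0)
g = lambda k: math.sin(0.5 * k)
cov_direct = covariance(f, g)
cov_stein = stein_side(f, g)
print(cov_direct, cov_stein)
```

The value g(−1) enters the sum but is multiplied by a centred total that vanishes, so any extension of g below the support gives the same result.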
3 A first representation and first-order bounds
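Given $\ell \in \{-1, 0, 1\}$, we introduce
\[ \chi^\ell(x, y) = I[x \le y - I[\ell = 1]] = \begin{cases} I[x \le y] & \ell = 0, \\ I[x \le y] & \ell = -1, \\ I[x < y] & \ell = 1 \end{cases} \]
Define
\[ K_p^\ell(x, x') = E[\chi^\ell(X, x) \chi^\ell(X, x')] - E[\chi^\ell(X, x)] \, E[\chi^\ell(X, x')] \]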
Properties:
- $K_p^\ell(x, x')$ is symmetric and positive.
- The function $x' \mapsto p_x^\star(x') = -\frac{K_p^\ell(x, x')}{p(x)} \Delta^{-\ell} T_p^\ell 1(x')$ is a density for all x (Menz Otto 2013, AOP when $\ell = 0$).
- Often, it is also unimodal and $K_p^\ell(x, x')/p(x)$ is bounded.
Figure: The functions $x \mapsto K_p^\ell(x, x')/p(x)$ for fixed $x'$ and p the standard normal distribution, Figure (a); beta distribution with parameters 1.3 and 2.4, Figure (b); binomial distribution with parameters (50, 0.2), Figure (c); Poisson distribution with parameter 20, Figure (d).
Lemma 2 (Representation formula II (ERS19a))
For all $f \in \mathrm{dom}(\Delta^{-\ell})$ such that $f \in L^1(p)$ we have
\[ -L_p^\ell f(x) = E\left[ \frac{K_p^\ell(X, x)}{p(X) p(x)} \Delta^{-\ell} f(X) \right]. \qquad (2) \]
Formula (2) is already available when $\ell = 0$, see (Saumard, arXiv 2019).
We deduce first-order Hoeffding-type covariance identities
\[ \mathrm{Cov}[f(X), g(X)] = E\left[ \Delta^{-\ell} f(X) \frac{K_p^\ell(X, X')}{p(X) p(X')} \Delta^{-\ell} g(X') \right]. \]
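The Hoeffding-type identity can be tested on a small binomial example. The sketch below takes ℓ = −1 (so differences are forward ones) and assumes X′ is an independent copy of X, in which case the weights p(X) p(X′) cancel and the expectation becomes a plain double sum over the support; parameters and test functions are illustrative.

```python
import math

N_TRIALS, THETA = 8, 0.3
PMF = [math.comb(N_TRIALS, k) * THETA ** k * (1 - THETA) ** (N_TRIALS - k)
       for k in range(N_TRIALS + 1)]
CDF = [sum(PMF[: k + 1]) for k in range(N_TRIALS + 1)]

def kernel(x, y):
    """K(x, y) = P(X <= min(x, y)) - P(X <= x) P(X <= y), the case ell = -1."""
    return CDF[min(x, y)] - CDF[x] * CDF[y]

def hoeffding_side(f, g):
    """Double sum of (f(x+1) - f(x)) K(x, y) (g(y+1) - g(y)) over the support."""
    total = 0.0
    for x in range(N_TRIALS + 1):
        for y in range(N_TRIALS + 1):
            total += (f(x + 1) - f(x)) * kernel(x, y) * (g(y + 1) - g(y))
    return total

def covariance(f, g):
    mean_f = sum(f(k) * PMF[k] for k in range(N_TRIALS + 1))
    mean_g = sum(g(k) * PMF[k] for k in range(N_TRIALS + 1))
    return sum((f(k) - mean_f) * (g(k) - mean_g) * PMF[k] for k in range(N_TRIALS + 1))

f = lambda k: k ** 1.5
g = lambda k: math.cos(0.4 * k)
cov_direct = covariance(f, g)
cov_kernel = hoeffding_side(f, g)
min_kernel = min(kernel(x, y) for x in range(N_TRIALS + 1) for y in range(N_TRIALS + 1))
print(cov_direct, cov_kernel, min_kernel)
```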
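Theorem 1 (Weighted covariance inequality)
Fix $h \in L^1(p)$ a decreasing function. For all f, g
\[ |\mathrm{Cov}[f(X), g(X)]| \le \sqrt{E\left[ (\Delta^{-\ell} f(X))^2 \frac{-L_p^\ell h(X)}{\Delta^{-\ell} h(X)} \right]} \sqrt{E\left[ (\Delta^{-\ell} g(X))^2 \frac{-L_p^\ell h(X)}{\Delta^{-\ell} h(X)} \right]} \]
with equality if and only if $f, g \propto h$.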
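Taking f = g we get the weighted Poincaré inequality
\[ \mathrm{Var}[f(X)] \le E\left[ (\Delta^{-\ell} f(X))^2 \frac{-L_p^\ell h(X)}{\Delta^{-\ell} h(X)} \right] =: E\left[ (\Delta^{-\ell} f(X))^2 \, \Gamma_{1,p}^\ell h(X) \right] \]
where h is any decreasing function in $L^2(p)$ and equality holds iff $f(x) \propto h(x)$.
Example 6
Let $N \sim \varphi$ and $\ell = 0$. Then
- $h(x) = x$ gives $\Gamma_1^0 \mathrm{Id}(x) = 1$ and the Chernoff bound: $\mathrm{Var}[f(N)] \le E[(f'(N))^2]$.
- $h(x) = x e^{bx^2/2}$ gives $\Gamma_1^0 h(x) = ((1-b)(1+bx^2))^{-1}$ when $b < 1$ and
\[ (1-b) \mathrm{Var}[f(N)] \le E\left[ \frac{(f'(N))^2}{1 + bN^2} \right] \]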
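Both inequalities of Example 6 can be probed numerically for the standard normal. This sketch assumes f = sin as a test function, b = 0.25 and a Simpson quadrature (all illustrative); it also checks the equality case f = h for the weighted bound.

```python
import math

def simpson(fn, a, b, n=6000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = fn(a) + fn(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fn(a + i * h)
    return total * h / 3

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def expect(fn, cutoff=16.0):
    return simpson(lambda x: fn(x) * phi(x), -cutoff, cutoff)

def variance(f):
    m = expect(f)
    return expect(lambda x: (f(x) - m) ** 2)

def weighted_energy(fprime, b):
    """E[f'(N)^2 / (1 + b N^2)]; b = 0 gives the unweighted energy E[f'(N)^2]."""
    return expect(lambda x: fprime(x) ** 2 / (1.0 + b * x * x))

b = 0.25
var_sin = variance(math.sin)
plain_bound = weighted_energy(math.cos, 0.0)
weighted_gap = weighted_energy(math.cos, b) - (1 - b) * var_sin

# Equality case f = h with h(x) = x exp(b x^2 / 2).
h = lambda x: x * math.exp(b * x * x / 2)
h_prime = lambda x: (1 + b * x * x) * math.exp(b * x * x / 2)
equality_gap = weighted_energy(h_prime, b) - (1 - b) * variance(h)
print(var_sin, plain_bound, weighted_gap, equality_gap)
```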
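The weights
\[ \Gamma_{1,p}^\ell h(x) = \frac{-L_p^\ell h(x)}{\Delta^{-\ell} h(x)}, \quad h \searrow \]
seem to be good.
Taking
- $h(x) = -x$ then $\Gamma_{1,p}^\ell h = \tau_p^\ell$ the Stein kernel (Cacoullos identity).
- if $h = T_p^\ell 1$ is decreasing (e.g. p log-concave when $\ell = 0$) then $\Gamma_1^\ell h = 1$ and we get a Brascamp-Lieb type inequality.
- we easily get asymmetric Brascamp-Lieb as in Menz-Otto.
- bounding $\Gamma_1^\ell h(x)$ by its maximum we obtain that for any nice positive function $\sigma^2$,
\[ \lambda(p, \sigma^2) := \inf_f \left\{ \frac{E\left[ (\Delta^{-\ell} f(X))^2 \sigma^2(X) \right]}{\mathrm{Var}[f(X)]} \right\} \ge \sup_{h \in \mathrm{IF}} \inf \frac{\Delta^{-\ell}(T_p^\ell(\sigma^2 \Delta^{-\ell} h))}{\Delta^{-\ell} h}. \]
  When $\ell = 0$ this is the spectral gap estimate from Joulin and Bonnefont (ESAIM 2016).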
4 A second representation and higher-order expansions
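Lemma 3 (Representation formula I (ERS19a))
Let $X, X_1, X_2$ be independent copies of $X \sim p$. Then, for all $f \in L^1(p)$ we have
\[ -L_p^\ell f(x) = \frac{1}{p(x)} E\left[ (f(X_2) - f(X_1)) \, \chi^\ell(X_1, x) \, \chi^{-\ell}(x, X_2) \right] \qquad (3) \]
with $\chi^\ell(x, y) = I[x \le y - I[\ell = 1]]$.
In particular
\[ -L_p^0 f(x) = \frac{1}{p(x)} E\left[ (f(X_2) - f(X_1)) \, I[X_1 \le x \le X_2] \right] \]
\[ -L_p^+ f(x) = \frac{1}{p(x)} E\left[ (f(X_2) - f(X_1)) \, I[X_1 < x \le X_2] \right] \]
\[ -L_p^- f(x) = \frac{1}{p(x)} E\left[ (f(X_2) - f(X_1)) \, I[X_1 \le x < X_2] \right] \]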
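The forward case of the representation formula can be checked by brute force on a finite support. This sketch compares the partial-sum definition of −L⁺f with the two-copy expectation for Bin(9, 0.45); the parameters and the test function are illustrative assumptions.

```python
import math

N_TRIALS, THETA = 9, 0.45
PMF = [math.comb(N_TRIALS, k) * THETA ** k * (1 - THETA) ** (N_TRIALS - k)
       for k in range(N_TRIALS + 1)]
SUPPORT = range(N_TRIALS + 1)

def minus_inverse_plus(f, x):
    """-L^+ f(x) = (1/p(x)) * sum over j >= x of (f(j) - E f) p(j)."""
    m = sum(f(j) * PMF[j] for j in SUPPORT)
    return sum((f(j) - m) * PMF[j] for j in range(x, N_TRIALS + 1)) / PMF[x]

def two_copy_representation(f, x):
    """(1/p(x)) E[(f(X2) - f(X1)) 1{X1 < x <= X2}] for independent copies X1, X2."""
    total = 0.0
    for x1 in SUPPORT:
        for x2 in SUPPORT:
            if x1 < x <= x2:
                total += (f(x2) - f(x1)) * PMF[x1] * PMF[x2]
    return total / PMF[x]

f = lambda k: math.exp(-0.3 * k) + k
max_gap = max(abs(minus_inverse_plus(f, x) - two_copy_representation(f, x))
              for x in SUPPORT)
print(max_gap)
```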
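The formula
\[ -L_p^\ell f(x) = \frac{1}{p(x)} E\left[ (f(X_2) - f(X_1)) \, \chi^\ell(X_1, x) \, \chi^{-\ell}(x, X_2) \right] \]
is useful.
Returning to the Stein equations
\[ \mathcal{A} f(x) = L_p^\ell \eta(x) \Delta^{-\ell} f(x) + \eta(x) f(x) = h(x) - E[h(X)], \]
solutions are often of the form
\[ f_h(x) = \frac{L_p^\ell h(x)}{L_p^\ell \eta(x)} \]
for $\eta$ fixed (typically $\eta(x) = x$).
We immediately obtain
\[ f_h(x) = \frac{E\left[ (h(X_2) - h(X_1)) \, \chi^\ell(X_1, x) \, \chi^{-\ell}(x, X_2) \right]}{E\left[ (\eta(X_2) - \eta(X_1)) \, \chi^\ell(X_1, x) \, \chi^{-\ell}(x, X_2) \right]} \]
yielding uniform and non-uniform bounds (also on $\Delta^{-\ell} f_h$, see Mijoule (2019))!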
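Let h be monotone and $\ell = (\ell_1, \ell_2, \ldots)$ be a sequence in $\{-1, 1\}$ or $\{0\}$.
- Starting with some function $f : \mathbb{R} \to \mathbb{R}$, we recursively define
\[ f_0 = f, \quad f_1(x) = \frac{\Delta^{-\ell_1} f_0(x)}{\Delta^{-\ell_1} h(x)}, \quad f_i(x) = \frac{\Delta^{-\ell_i} f_{i-1}(x)}{\Delta^{-\ell_i} h(x)}, \ i \ge 2 \]
  In particular if $h(x) = -x$ then $f_i(x) = \Delta^{\ell_1, \ell_2, \ldots, \ell_i} h(x)$.
- For any sequence $(x_j)_{j \ge 1}$ we let $\Phi_0^\ell(x_1, x_2) = 1$ and
\[ \Phi_n^\ell(x_1, x_3, \ldots, x_{2n-1}, x_{2n+1}, x_{2n+2}, x_{2n}, \ldots, x_2) = \frac{1}{\prod_{i=3}^{2n+2} p(x_i)} \, \chi^{\ell^2}(x_{2n+1}, x_{2n+2}) \prod_{i=1}^{n} \chi^{\ell_i}(x_{2i-1}, x_{2i+1}) \, \chi^{-\ell_i}(x_{2i+2}, x_{2i}). \]
  For instance $\Phi_1^\ell(x_1, x_3, x_4, x_2) = \frac{I[x_1 \le x_3 \le x_4 \le x_2]}{p(x_3) p(x_4)}$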
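Theorem 2 (Papathanasiou type expansion ERS19b)
For all f, g
\[ \mathrm{Cov}[f(X), g(X)] = \sum_{k=1}^{n} (-1)^{k-1} E\left[ \Delta^{-\ell_k} f_{k-1}(X) \, \Delta^{-\ell_k} g_{k-1}(X) \, \frac{\Gamma_k^\ell(h)(X)}{\Delta^{-\ell} h(X)} \right] + (-1)^n R_n^\ell(h) \]
where
\[ \Gamma_k^\ell h(x) = E\left[ (h_k(X_{2k}) - h_k(X_{2k-1})) \prod_{i=1}^{k-1} \Delta^{-\ell_i} h_i(X_{2i+1}) \, \Delta^{-\ell_i} h_i(X_{2i+2}) \; \Phi_p^{\ell_k}(x_{2k-1}, x, x_{2k}) \, \Phi_{k-1}^\ell(X_1, \ldots, X_{2k-1}, X_{2k}, \ldots, X_2) \right] \]
and where $R_n^\ell(h) \ge 0$ is explicit.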
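Corollary 1 (Weighted Poincaré upper and corresponding lower bounds)
Under natural assumptions
\[ E\left[ \Delta^{-\ell_1} f(X) \, \Delta^{-\ell_1} g(X) \, \frac{\Gamma_1^{\ell_1} h_1(X)}{\Delta^{-\ell_1} h_1(X)} \right] - E\left[ \Delta^{-\ell_2}\left( \frac{\Delta^{-\ell_1} f(X)}{\Delta^{-\ell_1} h(X)} \right) \Delta^{-\ell_2}\left( \frac{\Delta^{-\ell_1} f(X)}{\Delta^{-\ell_1} h(X)} \right) \frac{\Gamma_2^{\ell_1, \ell_2}(h_1, h_2)(X)}{\Delta^{-\ell_2} h_2(X)} \right] \]
\[ \le \mathrm{Cov}[f(X), g(X)] \le E\left[ \Delta^{-\ell_1} f(X) \, \Delta^{-\ell_1} g(X) \, \frac{\Gamma_1^{\ell_1} h_1(X)}{\Delta^{-\ell_1} h_1(X)} \right]. \]
Clearly $\Gamma_1^\ell h(x)$ is the same as the previous one. What about the other ones?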
- Fix $\ell = (0, 0, \ldots)$ and let h be non-decreasing. Then
\[ \Gamma_k^0 h(x) = \frac{1}{k!(k-1)!} \frac{1}{p(x)} E\left[ (h(X_2) - h(x))^{k-1} (h(x) - h(X_1))^{k-1} (h(X_2) - h(X_1)) \, I[X_1 \le x \le X_2] \right]. \]
  If $h(x) = x$ we recover the classical result by Papathanasiou (SPL 1988).
- The discrete case is explicit as well but quite ugly; we have found no general neat expression except when $h(x) = x$, in which case if $\ell \in \{-1, 1\}^\infty$ then
\[ \Gamma_k^\ell(x) = \frac{1}{k!(k-1)! \, p(x)} E\left[ (X_2 - x - b_k + 1)^{[k-1]} (x - X_1 - a_k + 1)^{[k-1]} (X_2 - X_1) \, I[X_1 + a_k \le x \le X_2 - b_k] \right] \]
  where $f^{[k-1]}$ is the discrete factorial and $a_k = k - b_k$ counts the number of +1.
  Taking $\ell_i = -1$ we recover the result from Afendras et al (Sankhya 2007).
Under various assumptions these weights become even more explicit:
- If $X \sim p$ is integrated Pearson distributed with Stein kernel $\tau_p(x) = \delta x^2 + \beta x + \gamma$ then
\[ \Gamma_k^0(x) = \frac{\tau_p(x)^k}{k! \prod_{j=0}^{k-1} (1 - j\delta)} \]
  This is already known, see Johnson (SD 1993).
- If $X \sim p$ is cumulative Ord distributed with $\tau_p^-(x) = \delta x^2 + \beta x + \gamma$ (and hence $\tau_p^+(x) = x(\delta x + \beta + 1)$), then
\[ \Gamma_k^\ell(x) = \frac{1}{k! \prod_{j=0}^{k-1} (1 - j\delta)} \left( \tau_p^+(x) \right)^{[a_k]} \left( \tau_p^-(x) \right)^{[b_k]} \]
  This is already known in the case $\ell_i = -1$, see Afendras et al (Sankhya 2007).
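For the standard normal the Pearson formula gives weights 1/k! (Stein kernel 1, δ = 0), so the expansion becomes an alternating sum of E[f⁽ᵏ⁾(N) g⁽ᵏ⁾(N)]/k!. The sketch below checks this on polynomials, where the sum terminates once the derivatives vanish; the coefficient-list representation and helper names are illustrative assumptions.

```python
import math

def derivative(coeffs):
    """Derivative of a polynomial given as [a0, a1, a2, ...]."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def multiply(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def gaussian_moment(m):
    """E[N^m] for N standard normal: 0 for odd m, (m - 1)!! for even m."""
    if m % 2:
        return 0.0
    return float(math.prod(range(1, m, 2)))

def expect(coeffs):
    return sum(a * gaussian_moment(m) for m, a in enumerate(coeffs))

def covariance(p, q):
    return expect(multiply(p, q)) - expect(p) * expect(q)

def expansion(p, q, n_terms):
    """Sum over k = 1..n of (-1)^(k-1) E[p^(k)(N) q^(k)(N)] / k!."""
    total, dp, dq = 0.0, p, q
    for k in range(1, n_terms + 1):
        dp, dq = derivative(dp), derivative(dq)
        if not dp or not dq:  # all further derivatives vanish
            break
        total += (-1) ** (k - 1) * expect(multiply(dp, dq)) / math.factorial(k)
    return total

cube = [0.0, 0.0, 0.0, 1.0]   # x^3
other = [1.0, 2.0, 0.0, 1.0]  # 1 + 2x + x^3
partial_sums = [expansion(cube, cube, n) for n in (1, 2, 3)]
print(covariance(cube, cube), partial_sums, expansion(cube, other, 3))
```

For f = g = x³ the partial sums 27, 9, 15 bracket the variance 15 from above and below before hitting it exactly, in line with the alternating bounds of Corollary 1.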
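Neat examples to conclude
- If $X \sim \mathrm{Laplace}(0, 1)$ (i.e. $p(x) = e^{-|x|}/2$ on $\mathbb{R}$) then
\[ \Gamma_k^0(x) = \sum_{j=0}^{k} \frac{|x|^j}{j!}. \]
- If $X \sim \mathrm{Rayleigh}(0, 1)$ (i.e. $p(x) = x e^{-x^2/2}$ on $\mathbb{R}^+$) then $\tau_p^0(x)$ does not take on an agreeable form. Nevertheless the choice $h(x) = x^2$ leads to
\[ \frac{\Gamma_k^0 h(x)}{h'(x)} = \frac{2^{k-2}}{k!} x^{2(k-1)}. \]
- If $X \sim \mathrm{Cauchy}(0, 1)$, taking $h(x) = \arctan(x)$ leads to
\[ \frac{\Gamma_k^0(x)}{h'(x)} = \frac{1}{4^k (k+1)! \, k!} (1 + x^2)^2 \left( \pi^2 - 4 \arctan(x)^2 \right)^k. \]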
5 A to do list
Some questions:
- Connect with the works of Arras and Houdré for SD, Fathi’s higher order centered Stein kernels.
- Other “iterated operators”.
- Multivariate representation for $\Gamma$ and/or even $\tau$, $L_p^\ell$.
- What happens for distributions with higher order operators?
- Polynomials, Sturm-Liouville, weights in Poincaré, spectral gap, ...
G. Afendras, N. Papadatos, and V. Papathanasiou.
The discrete Mohr and Noll inequality with applications to variance bounds. Sankhya, 69(2):162–189, 2007.
T. Cacoullos.
On upper and lower bounds for the variance of a function of a random variable. The Annals of Probability, 10(3):799–809, 1982.
L. H. Chen.
An inequality for the multivariate normal distribution. Journal of Multivariate Analysis, 12(2):306–315, 1982.
L. H. Chen.
Poincaré-type inequalities via stochastic integrals. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 69(2):251–277, 1985.
R. W. Johnson.
A note on variance bounds for a function of a Pearson variate. Statist. Decisions, 11(3):273–278, 1993.
M. Ledoux.
L’algèbre de Lie des gradients itérés d’un générateur markovien – développements de moyennes et entropies. Ann. Sci. École Norm. Sup., 28(4):435–460, 1995.
V. Papathanasiou.
Variance bounds by a generalization of the Cauchy-Schwarz inequality. Statistics & Probability Letters, 7(1):29–33, 1988.