Optimal nonparametric inference for discretely observed compound Poisson processes
Alberto J. Coca*, Statslab, University of Cambridge
IMA Conference on Inverse Problems from Theory to Application, University of Cambridge, UK
19th September 2017
*Supported by DPMMS' 2016 EPSRC Doctoral Prize and H2020 ERC grant "UQMSI" with project ID 647812
| Introduction | | CPPs | | Inference for disc. obs. CPPs | | References |
Outline
1 Introduction
2 Compound Poisson processes
3 Inference for discretely observed compound Poisson processes
4 References
Alberto J. Coca, University of Cambridge Optimal nonparametric inference for discretely observed CPPs
(Functional/nonparametric) inverse problems
General idea: to characterise the system/underlying causes giving rise to a set of indirect measurements/observations.
Examples: recovery of potentials and initial conditions from solutions to PDEs, of images from MRI measurements, and of the distribution of a random variable corrupted by noise.
General form: g = G(f), where g is an 'observed' function, G is an operator and f is the objective 'unobserved' function.
Difficulties: G non-invertible and the degree of ill-posedness, G non-linear, etc. Linear problems are better understood (e.g. via the SVD of G), but there is a lack of general techniques for non-linear problems.
General objective: to develop general and optimal methodology to solve (linear or non-linear) statistical inverse problems and to quantify the uncertainty of such solutions.
(Functional/nonparametric) statistical inverse problems: examples
Example (deconvolution problem): we wish to estimate a density f from a sample (X_k)_{k=1}^n of it, but we observe

    Y_k = X_k + ε_k,   k = 1, ..., n,

where (ε_k)_{k=1}^n are i.i.d. with common distribution P independent of the X_k's; thus, we observe a sample of H(f) = G(f) = f ∗ dP and this is a (linear) inverse problem.

Example (additive Gaussian noise): we wish to estimate a function f when observing G(f) at (x_k)_{k=1}^n corrupted by standard Gaussian noise; i.e. if (ε_k)_{k=1}^n i.i.d. ∼ N(0, 1) and W is Gaussian white noise,

    Y_k = G(f)(x_k) + ε_k,   with continuous analogue   Y = G(f) + (1/√n) W;

under enough assumptions and with abuse of notation, we observe a sample of H(f) := N(G(f), (1/n) Id), where G may be a compact operator, turning this into an inverse problem.
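The deconvolution example can be sketched numerically: divide the empirical characteristic function of the Y_k by the (known) characteristic function of the noise and invert the Fourier transform with a spectral cut-off. A minimal sketch, where the Gaussian noise level σ = 0.3, the exponential target density and the cut-off |u| ≤ 8 are all illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 5000, 0.3
X = rng.exponential(1.0, n)          # 'unobserved' sample with density f
Y = X + rng.normal(0.0, sigma, n)    # observed sample from G(f) = f * dP

u = np.linspace(-8.0, 8.0, 401)      # frequency grid; |u| <= 8 is the cut-off
du = u[1] - u[0]
phi_n = np.exp(1j * np.outer(u, Y)).mean(axis=1)   # empirical char. fn of Y
phi_eps = np.exp(-0.5 * (sigma * u) ** 2)          # char. fn of N(0, sigma^2)

# Fourier inversion of phi_n / phi_eps on a spatial grid (Riemann sum)
x = np.linspace(-1.0, 5.0, 121)
f_hat = ((phi_n / phi_eps) * np.exp(-1j * np.outer(x, u))).sum(axis=1).real * du / (2 * np.pi)
```

The cut-off regularises the division by φ_ε, which is tiny at high frequencies; this is the usual trade-off between bias and noise amplification in a mildly ill-posed linear problem.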
(Functional/nonparametric) statistical inverse problems
Setting: H(f) = H(G(f)) is a probability density we sample from.
Techniques: SVD, regularisation methods, Fourier methods, Bayesian methods, etc. They heavily exploit the form of G.
Bayesian approach: propose a 'prior' probability measure Π on the functional space H of f; let the data Y 'update' it through Bayes' rule, giving the 'posterior' probability measure

    Π(h | Y) ∝ ∏_{k=1}^n H(h)(Y_k) Π(h),   h ∈ H;

advantages: only the forward problem needs to be evaluated, optimal solutions in many unrelated problems, and uncertainty quantification; disadvantage: the form of G is used to propose Π.
Objective: propose implementable priors, using as little understanding of G as possible, for which we can rigorously show optimality (minimax rates of convergence and Cramér–Rao bounds).
Construction and properties
How does a trajectory of a compound Poisson process look?
Construction
Let (N_t)_{t≥0} be a 1-dim. Poisson process with intensity λ > 0;
let X_1, X_2, ... be i.i.d. real-valued random variables with common probability distribution P;
assume this sequence is independent of the Poisson process.

Then, a 1-dimensional (zero-drift) compound Poisson process with intensity λ and jump size distribution P can be written as

    C_t = ∑_{j=1}^{N_t} X_j,   t ≥ 0,   with the convention ∑_{j=1}^0 X_j = 0, so C_0 = 0 a.s.

CPPs are Markov and, in particular, Lévy processes (LPs, more details later): the textbook example of pure-jump LPs.
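The construction above translates directly into simulation: draw N_T, then N_T i.i.d. jump sizes. A minimal sketch, where λ = 0.5, T = 10 and the N(0, 1) jump law are illustrative choices:

```python
import numpy as np

def cpp_path(lam, jump_sampler, T, rng):
    """Jump times and running values of a compound Poisson process on [0, T]."""
    n_jumps = rng.poisson(lam * T)                 # N_T ~ Poisson(lam * T)
    times = np.sort(rng.uniform(0.0, T, n_jumps))  # given N_T, jump times are i.i.d. Unif(0, T)
    jumps = jump_sampler(n_jumps, rng)             # i.i.d. jump sizes X_1, ..., X_{N_T} ~ P
    return times, np.cumsum(jumps)                 # value after the j-th jump: sum_{i<=j} X_i

rng = np.random.default_rng(1)
times, values = cpp_path(lam=0.5, jump_sampler=lambda k, r: r.normal(0.0, 1.0, k),
                         T=10.0, rng=rng)
# between jumps the path is constant; C_0 = 0 before the first jump
```

Plotting `values` against `times` as a step function gives exactly the piecewise-constant trajectories shown on the previous slide.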
Applications
Compound Poisson processes are the basic model for systems with random shocks that come 'out of the blue'.
Numerous applications: seismology, storage theory (resources, ecosystems, etc.), queuing and renewal theory.
Nonparametric inference on them (discretely observed) has received much attention lately: Buchmann and Grübel (2003, 2004), C. (2017), Comte et al. (2014, 2015), Duval (2013, 2014), Duval and Hoffmann (2011), van Es et al. (2007), Gugushvili (2007), Gugushvili et al. (2015a, 2015b), Nickl and Reiß (2012), Nickl et al. (2016), Trabs (2014), etc.
Furthermore, any LP can be approximated arbitrarily well by CPPs, and estimation of CPPs can be used to estimate LPs.
LPs are used in a myriad of applications within finance, biology and engineering.
Setting
Assume throughout that λ and P are unknown. In most practical situations C := (C_t)_{t≥0} is not observed continuously.
Instead, we observe C_∆, ..., C_{n∆} for some ∆ > 0 and n ∈ N.
Figure: C_∆, ..., C_{n∆} with ∆ = 2.5 and n = 4 (λ = 0.5, P = N(0, 1)).
How can we infer λ and P with such incomplete information?
CPPs as LPs and nonlinear inverse problem
(C_t)_{t≥0} is a Lévy process. Therefore, it has independent and stationary increments.
In particular, Y_k := C_{k∆} − C_{(k−1)∆}, k = 1, ..., n, are independent copies of Y := Y_1 := C_∆ := ∑_{j=1}^{N_∆} X_j (C_0 = 0).

This is a nonlinear inverse problem: we observe a random variable X corrupted by a random number of independent copies of itself ⇒ auto-convolution!
Indeed, we observe a sample from the distribution of Y,

    g = G(f) = e^{−∆∫_R f} ∑_{j=0}^∞ ∆^j f^{∗j}/j!,   where f = λ dP is the Lévy density.

Question: what is the ill-posedness?
The Fourier transform F is a unitary operator, so we resort to the spectral approach to find the answer and to construct an estimator.
The spectral approach: no ill-posedness and estimator
The Fourier transform of g, i.e. the characteristic function of Y (∼ the noise), is

    F[G(f)](u) = E_g[e^{iuY}] = e^{∆(Ff(u) − ∫_R f)} =: ϕ(u),   u ∈ R.

Due to ‖Ff‖_{L^∞} ≤ ∫_R f, we have inf_{u∈R} |ϕ(u)| ≥ e^{−2∆∫_R f} > 0, so there is no ill-posedness in the rates ⇒ a great toy example: rates only depend on f and d.
Furthermore, an estimator for f can be constructed from the above. Writing

    ϕ(u) = exp( ∆( Ff(u) − F[δ_0](u) ∫_R f ) ),

taking the distinguished logarithm Log (well-defined since ϕ never vanishes) and inverting the Fourier transform gives, as measures,

    (1/∆) F^{−1}[Log ϕ](dx) = f(dx) − δ_0(dx) ∫_R f;

restricting away from the origin removes the atom at 0, so

    f(x) = (1/∆) 1_{R\{0}}(x) F^{−1}[Log ϕ](x).

Replacing ϕ by its empirical counterpart, band-limiting and excising a shrinking neighbourhood of the origin yields the estimator

    f_n(x) := (1/∆) 1_{(−τ_n,τ_n)^C}(x) F^{−1}[(Log ϕ_n) FK_{h_n}](x),

where Log is the distinguished logarithm, τ_n → 0 as n → ∞, ϕ_n(u) := (1/n) ∑_{k=1}^n e^{iuY_k} is the empirical characteristic function of the increments, and K_{h_n} := (1/h_n) K(·/h_n), with K a band-limited kernel function and h_n → 0. There is an issue if ϕ_n(u) = 0 somewhere on [−h_n^{−1}, h_n^{−1}]; more details in C. (in prep.).
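The estimator f_n can be sketched numerically: form the empirical characteristic function of the increments, take a continuous (distinguished) logarithm, band-limit and invert. In this sketch the sharp frequency cut-off standing in for FK_{h_n}, the grid sizes, the excision radius τ = 0.6 and the recovery of λ from the tail of Log ϕ_n are illustrative simplifications, not the tuning of the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, Delta, n = 0.5, 0.5, 5000
N = rng.poisson(lam * Delta, n)
Y = np.array([rng.normal(0.0, 1.0, k).sum() for k in N])  # increments Y_k, P = N(0,1)

u = np.linspace(-10.0, 10.0, 801)                  # frequencies; h_n^{-1} = 10
du = u[1] - u[0]
phi_n = np.exp(1j * np.outer(u, Y)).mean(axis=1)   # empirical characteristic function
# distinguished logarithm: continuous branch of log phi_n (phi_n(0) = 1)
log_phi = np.log(np.abs(phi_n)) + 1j * np.unwrap(np.angle(phi_n))

# Log phi(u) = Delta * (Ff(u) - lam) and Ff(u) -> 0, so the tail recovers lam
lam_hat = -log_phi[0].real / Delta

# Fourier inversion, then excise (-tau, tau) where the -lam * delta_0 atom sits
x = np.linspace(-4.0, 4.0, 161)
f_n = (log_phi * np.exp(-1j * np.outer(x, u))).sum(axis=1).real * du / (2 * np.pi * Delta)
f_n[np.abs(x) < 0.6] = 0.0
```

Away from the origin, f_n approximates the Lévy density f = λ × (density of P); the excised neighbourhood absorbs the smeared δ_0 atom.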
f_n is minimax-optimal for L^p-norms
For I ⊆ R, assume f is in the Hölder(–Zygmund if s is an integer) space

    C^s(I) ⊇ { h ∈ C^{⌊s⌋}(I) : ‖h‖_{L^∞(I)} + ‖D^{⌊s⌋}h‖_{L^∞(I)} + sup_{x,y∈I} |D^{⌊s⌋}h(x) − D^{⌊s⌋}h(y)| / |x − y|^{s−⌊s⌋} < ∞ };

note that the minimax rate for this problem is r_n := (∆n)^{−s/(2s+1)}.

Theorem (C. (in prep.)). Under assumptions on τ_n, h_n and K, for any p ∈ [1, ∞),

    E_{g^n}[‖f_n − f‖_{L^p(I)}] ≲ r_n.

Furthermore, for any L > 0, M_n → ∞ and p ∈ [1, 2], and if s > 1/p,

    Pr( ‖f_n − f‖_{L^p(I)} ≥ M_n r_n ) ≤ e^{−L∆n r_n^2}.
Posterior for Besov prior contracts at minimax rate for Lp-norms
Define the random wavelet series
    v(x) = ∑_{k=0}^{2^{J_0}−1} u_k φ_k(x) + ∑_{j=J_0}^∞ ∑_{l=0}^{2^j−1} 2^{−j(s+1/2)} u_{j,l} ψ_{j,l}(x),   x ∈ I,

where J_0 is fixed, u_k, u_{j,l} i.i.d. ∼ Unif(−B, B) with B > 0 fixed, and {φ_k, ψ_{j,l}} is a CDV basis. Then, take as priors the induced laws on C(I) of exp(v) and of v_0 + v for v_0 > 0 sufficiently large. These do not depend on G!

Theorem (C. (in prep.)). Under the previous assumptions, define H := { h ∈ L^1(R) ∩ L^∞(R) : inf_I h ≥ A, h ≡ 0 on R\I } for 0 < A < ∞ fixed. If f ∈ H and p ∈ [1, 2],

    E_{g^n} Π( h ∈ H : ‖h − f‖_{L^p(I)} ≤ M_n r_n (log ∆n)^{s/(2s+1)} | Y ) → 1   ∀ M_n → ∞.

Key ingredients: Π places enough mass around any element of C^s(I); and there exist statistical tests with small enough errors.
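A draw from a prior of this type can be sketched by truncating the series at a finite level and substituting the Haar basis for the boundary-corrected CDV wavelets; the truncation level Jmax, the values of s and B, and the basis choice are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
s, B, J0, Jmax = 1.0, 1.0, 0, 8
x = np.linspace(0.0, 1.0, 512, endpoint=False)

def haar_psi(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where(t < 0.5, 1.0, -1.0) * ((t >= 0.0) & (t < 1.0))

v = rng.uniform(-B, B) * np.ones_like(x)           # scaling-function part at level J0
for j in range(J0, Jmax):
    for l in range(2 ** j):
        u_jl = rng.uniform(-B, B)                  # u_{j,l} ~ Unif(-B, B)
        # term 2^{-j(s+1/2)} u_{j,l} psi_{j,l}(x),  psi_{j,l}(x) = 2^{j/2} psi(2^j x - l)
        v += 2.0 ** (-j * (s + 0.5)) * u_jl * 2.0 ** (j / 2) * haar_psi(2 ** j * x - l)

prior_draw = np.exp(v)                             # positive draw: the exp(v) prior
```

The coefficient decay 2^{−j(s+1/2)} with bounded u_{j,l} is what places the draws in a ball of the Besov/Hölder scale of smoothness s, and exp(v) enforces the positivity needed for a Lévy density.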
F_n satisfies a Donsker theorem and is Cramér–Rao optimal
Let F_n(x) := ∫_{−∞}^x f_n and F(x) := ∫_{−∞}^x f. Assume that

    |F(x) − F(y)| ≲ min{ (log |x − y|^{−1})^{−α}, 1 }   for all x, y ∈ R and some α > 2,

and

    ∫_R log^β( max{|x|, e} ) F(dx) < ∞   for some β > 2.

Theorem (C. (2017)). Under mild assumptions on τ_n, h_n and K,

    √n ( F_n − F ) →_D G_g in L^∞(R),

where G_g is a centred Gaussian process on R with optimal covariance

    Σ_{x,y} := (1/∆^2) ∫_R ( h_x ∗ F^{−1}[1/G(f)(−·)](z) ) ( h_y ∗ F^{−1}[1/G(f)(−·)](z) ) G(f)(dz),

with h_x := 1_{(−∞,x]} 1_{R\{0}} and F^{−1}[G(f)^{−1}(−·)] a finite signed measure.
A Bernstein–von Mises theorem
Minimax contraction rates hold. The score operator is

    S : L^2(f) → L^2_0(G(f)),   h ↦ ∆( d((hf) ∗ G(f))/dG(f) − ∫_R h df );

issue: we do not know how to control the denominator ⇒ work on I = (−1/2, 1/2] as a compact group with addition modulo 1. Need ∆ ∫_I f < π.
Take as prior the induced law on C(I) of exp(v), where v := ∑_{j≤J} ∑_{k≤2^j−1} a_j u_{j,k} ψ_{j,k}, with {ψ_{j,k}} a periodised basis.

Theorem (Nickl and Söhl (in prep.), C. (in prep.)). Taking a_j = 2^{−j} j^{−2} and 2^J ∼ (n/log n)^{1/(2s+1)}, or as in the contraction result, and Π_n the image measure of Π(· | Y) under h ↦ √n(h − h_n),

    E_{g^n} σ_{M_0}( Π_n, L_{M_0}(G_g) ) → 0.
Implementation of the posterior distribution
Literature on MCMC sampling of infinite-dimensional objects is growing rapidly: Beskos, Girolami, Roberts, Stuart and collaborators. Monte Carlo is not dimension-dependent: discretise at the end!
In particular, Vollmer (2014) gives a proposal for uniform Besov priors and shows dimension-independent errors if the likelihood is bounded. This, together with an independence sampler, can be used.
Issue: evaluating G(f) = e^{−∆∫f} ∑_{j=0}^∞ ∆^j f^{∗j}/j!. Note that

    dG(f)/dμ = e^{−∆∫f} 1_{{0}} + e^{−∆∫f} F^{−1}[e^{∆Ff} − 1].

This can be approximated with the FFT, but inexactly ⇒ pseudo-marginal MCMC!
In this programme of using Bayesian methods for inverse problems with general priors and showing BvM theorems, we have to compute the score operator. Much of the newest literature above concerns Langevin or Hamiltonian methods.
Some interesting work left to do...
Relevant references: Bayesian approach
Castillo, I. and Nickl, R. (2013). Nonparametric Bernstein–von Mises theorems in Gaussian white noise. Ann. Statist. 41, 1999–2028.
Castillo, I. and Nickl, R. (2014). On the Bernstein–von Mises phenomenon for nonparametric Bayes procedures. Ann. Statist. 42, 1941–1969.
Coca, A. J. Optimal inference on the Lévy density of a compound Poisson process observed discretely at arbitrary frequencies. In preparation.
Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in L^r-metrics, 1 ≤ r ≤ ∞. Ann. Statist. 39(6), 2883–2911.
Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28(2), 500–531.
Monard, F., Nickl, R. and Paternain, G. P. (2017). Efficient nonparametric Bayesian inference for X-ray transforms. arXiv:1708.06332.
Nickl, R. (2017). Bernstein–von Mises theorems for statistical inverse problems I: Schrödinger equation. arXiv:1707.01764.
Nickl, R. and Söhl, J. Bernstein–von Mises theorems for statistical inverse problems II: compound Poisson processes observed discretely at low frequencies. In preparation.
Vollmer, S. J. (2014). Dimension-independent MCMC sampling for inverse problems with non-Gaussian priors. arXiv:1302.2213v3.
Relevant references: inference for discretely observed CPPs
Buchmann, B. and Grübel, R. (2003). Decompounding: an estimation problem for Poisson random sums. Ann. Statist. 31, 1054–1074.
Coca, A. J. (2017). Efficient nonparametric inference for discretely observed compound Poisson processes. Probab. Theory Related Fields, online first. doi:10.1007/s00440-017-0761-5.
Comte, F., Duval, C. and Genon-Catalot, V. (2014). Nonparametric density estimation in compound Poisson processes using convolution power estimators. Metrika 77(1), 163–183.
Duval, C. (2013). Density estimation for compound Poisson processes from discrete data. Stochastic Process. Appl. 123, 3963–3986.
van Es, B., Gugushvili, S. and Spreij, P. (2007). A kernel type nonparametric density estimator for decompounding. Bernoulli 13, 672–694.
Nickl, R. and Reiß, M. (2012). A Donsker theorem for Lévy measures. J. Funct. Anal. 263, 3306–3332.
Nickl, R., Reiß, M., Söhl, J. and Trabs, M. (2016). High-frequency Donsker theorems for Lévy measures. Probab. Theory Related Fields 164(1), 61–108.
Trabs, M. (2014). Information bounds for inverse problems with application to deconvolution and Lévy models. Ann. Inst. Henri Poincaré Probab. Stat. 51(4), 1620–1650.
Thanks for your attention!