Optimal nonparametric inference for discretely observed compound Poisson processes
Alberto J. Coca*, Statslab, University of Cambridge
IMA Conference on Inverse Problems from Theory to Application, University of Cambridge, UK
19th September 2017
*Supported by DPMMS' 2016 EPSRC Doctoral Prize and H2020 ERC grant "UQMSI" with project ID 647812
| Introduction | | CPPs | | Inference for disc. obs. CPPs | | References |
Outline
1 Introduction
2 Compound Poisson processes
3 Inference for discretely observed compound Poisson processes
4 References
Alberto J. Coca, University of Cambridge Optimal nonparametric inference for discretely observed CPPs
(Functional/nonparametric) inverse problems
General idea: to characterise the system/underlying causes giving rise to a set of indirect measurements/observations.
Examples: recovery of potentials and initial conditions from solutions to PDEs, of images from MRI measurements, and of the distribution of a random variable corrupted by noise.
General form: g = G(f), where g is an 'observed' function, G is an operator and f is the objective 'unobserved' function.
Difficulties: G non-invertible and the degree of ill-posedness, G non-linear, etc. Linear problems are better understood (e.g. via the SVD of G), but there is a lack of general techniques for non-linear problems.
General objective: to develop general and optimal methodology to solve (linear or non-linear) statistical inverse problems and to quantify the uncertainty of such solutions.
(Functional/nonparametric) statistical inverse problems: examples
Example (deconvolution problem): we wish to estimate a density f from a sample (X_k)_{k=1}^n of it, but we observe

    Y_k = X_k + ε_k,   k = 1, ..., n,

where (ε_k)_{k=1}^n are i.i.d. with common distribution P independent of the X_k's; thus, we observe a sample of H(f) = G(f) = f ∗ dP and this is a (linear) inverse problem.

Example (additive Gaussian noise): we wish to estimate a function f when observing G(f) at (x_k)_{k=1}^n corrupted by standard Gaussian noise; i.e. if (ε_k)_{k=1}^n i.i.d. ∼ N(0, 1) and W is Gaussian white noise,

    Y_k = G(f)(x_k) + ε_k,   with continuous analogue   Y = G(f) + (1/√n) W;

under enough assumptions and with abuse of notation, we observe a sample of H(f) := N(G(f), (1/n) Id), where G may be a compact operator, turning this into an inverse problem.
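The deconvolution example can be sketched numerically: divide the empirical characteristic function of the Y_k by the (known) characteristic function of the noise and invert the Fourier transform with a spectral cut-off. A minimal sketch, where the Gaussian noise level σ = 0.3, the exponential target density and the cut-off |u| ≤ 8 are all illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 5000, 0.3
X = rng.exponential(1.0, n)          # 'unobserved' sample with density f
Y = X + rng.normal(0.0, sigma, n)    # observed sample from G(f) = f * dP

u = np.linspace(-8.0, 8.0, 401)      # frequency grid; |u| <= 8 is the cut-off
du = u[1] - u[0]
phi_n = np.exp(1j * np.outer(u, Y)).mean(axis=1)   # empirical char. fn of Y
phi_eps = np.exp(-0.5 * (sigma * u) ** 2)          # char. fn of N(0, sigma^2)

# Fourier inversion of phi_n / phi_eps on a spatial grid (Riemann sum)
x = np.linspace(-1.0, 5.0, 121)
f_hat = ((phi_n / phi_eps) * np.exp(-1j * np.outer(x, u))).sum(axis=1).real * du / (2 * np.pi)
```

The cut-off regularises the division by φ_ε, which is tiny at high frequencies; this is the usual trade-off between bias and noise amplification in a mildly ill-posed linear problem.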
(Functional/nonparametric) statistical inverse problems
Setting: H(f) = H(G(f)) is a probability density we sample from.
Techniques: SVD, regularisation methods, Fourier methods, Bayesian methods, etc. They heavily exploit the form of G.
Bayesian approach: propose a 'prior' probability measure Π on the functional space H of f; let the data Y 'update' it through Bayes' rule, giving the 'posterior' probability measure

    Π(h | Y) ∝ ∏_{k=1}^n H(h)(Y_k) Π(h),   h ∈ H;

advantages: only the forward problem needs to be evaluated, optimal solutions in many unrelated problems, and uncertainty quantification; disadvantage: the form of G is used to propose Π.
Objective: propose implementable priors, using as little understanding of G as possible, for which we can rigorously show optimality (minimax rates of convergence and Cramér–Rao bounds).
Construction and properties
How does a trajectory of a compound Poisson process look?
Construction
Let (N_t)_{t≥0} be a 1-dim. Poisson process with intensity λ > 0;
let X_1, X_2, ... be i.i.d. real-valued random variables with common probability distribution P;
assume this sequence is independent of the Poisson process.

Then, a 1-dimensional (zero-drift) compound Poisson process with intensity λ and jump size distribution P can be written as

    C_t = ∑_{j=1}^{N_t} X_j,   t ≥ 0,   with the convention ∑_{j=1}^0 X_j = 0, so C_0 = 0 a.s.

CPPs are Markov and, in particular, Lévy processes (LPs, more details later): the textbook example of pure-jump LPs.
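The construction above translates directly into simulation: draw N_T, then N_T i.i.d. jump sizes. A minimal sketch, where λ = 0.5, T = 10 and the N(0, 1) jump law are illustrative choices:

```python
import numpy as np

def cpp_path(lam, jump_sampler, T, rng):
    """Jump times and running values of a compound Poisson process on [0, T]."""
    n_jumps = rng.poisson(lam * T)                 # N_T ~ Poisson(lam * T)
    times = np.sort(rng.uniform(0.0, T, n_jumps))  # given N_T, jump times are i.i.d. Unif(0, T)
    jumps = jump_sampler(n_jumps, rng)             # i.i.d. jump sizes X_1, ..., X_{N_T} ~ P
    return times, np.cumsum(jumps)                 # value after the j-th jump: sum_{i<=j} X_i

rng = np.random.default_rng(1)
times, values = cpp_path(lam=0.5, jump_sampler=lambda k, r: r.normal(0.0, 1.0, k),
                         T=10.0, rng=rng)
# between jumps the path is constant; C_0 = 0 before the first jump
```

Plotting `values` against `times` as a step function gives exactly the piecewise-constant trajectories shown on the previous slide.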
Applications
Compound Poisson processes are the basic model for systems with random shocks that come 'out of the blue'.
Numerous applications: seismology, storage theory (resources, ecosystems, etc.), queuing and renewal theory.
Nonparametric inference on them (discretely observed) has received much attention lately: Buchmann and Grübel (2003, 2004), C. (2017), Comte et al. (2014, 2015), Duval (2013, 2014), Duval and Hoffmann (2011), van Es et al. (2007), Gugushvili (2007), Gugushvili et al. (2015a, 2015b), Nickl and Reiß (2012), Nickl et al. (2016), Trabs (2014), etc.
Furthermore, any LP can be approximated arbitrarily well by CPPs, and estimation of CPPs can be used to estimate LPs.
LPs are used in a myriad of applications within finance, biology and engineering.
Setting
Assume throughout that λ and P are unknown. In most practical situations C := (C_t)_{t≥0} is not observed continuously.
Instead, we observe C_∆, ..., C_{n∆} for some ∆ > 0 and n ∈ N.
Figure: C_∆, ..., C_{n∆} with ∆ = 2.5 and n = 4 (λ = 0.5, P = N(0, 1)).
How can we infer λ and P with such incomplete information?
CPPs as LPs and nonlinear inverse problem
(C_t)_{t≥0} is a Lévy process. Therefore, it has independent and stationary increments.
In particular, Y_k := C_{k∆} − C_{(k−1)∆}, k = 1, ..., n, are independent copies of Y := Y_1 := C_∆ := ∑_{j=1}^{N_∆} X_j (C_0 = 0).

This is a nonlinear inverse problem: we observe a random variable X corrupted by a random number of independent copies of itself ⇒ auto-convolution!
Indeed, we observe a sample from the distribution of Y,

    g = G(f) = e^{−∆∫_R f} ∑_{j=0}^∞ ∆^j f^{∗j}/j!,   where f = λ dP is the Lévy density.

Question: what is the ill-posedness?
The Fourier transform F is a unitary operator, so we resort to the spectral approach to find the answer and to construct an estimator.
The spectral approach: no ill-posedness and estimator
The Fourier transform of g, i.e. the characteristic function of Y (∼ the noise), is

    F[G(f)](u) = E_g[e^{iuY}] = e^{∆(Ff(u) − ∫_R f)} =: ϕ(u),   u ∈ R.

Due to ‖Ff‖_{L^∞} ≤ ∫_R f, we have inf_{u∈R} |ϕ(u)| ≥ e^{−2∆∫_R f} > 0, so there is no ill-posedness in the rates ⇒ a great toy example: rates only depend on f and d.
Furthermore, an estimator for f can be constructed from the above. Writing

    ϕ(u) = exp( ∆( Ff(u) − F[δ_0](u) ∫_R f ) ),

taking the distinguished logarithm Log (well-defined since ϕ never vanishes) and inverting the Fourier transform gives, as measures,

    (1/∆) F^{−1}[Log ϕ](dx) = f(dx) − δ_0(dx) ∫_R f;

restricting away from the origin removes the atom at 0, so

    f(x) = (1/∆) 1_{R\{0}}(x) F^{−1}[Log ϕ](x).

Replacing ϕ by its empirical counterpart, band-limiting and excising a shrinking neighbourhood of the origin yields the estimator

    f_n(x) := (1/∆) 1_{(−τ_n,τ_n)^C}(x) F^{−1}[(Log ϕ_n) FK_{h_n}](x),

where Log is the distinguished logarithm, τ_n → 0 as n → ∞, ϕ_n(u) := (1/n) ∑_{k=1}^n e^{iuY_k} is the empirical characteristic function of the increments, and K_{h_n} := (1/h_n) K(·/h_n), with K a band-limited kernel function and h_n → 0. There is an issue if ϕ_n(u) = 0 somewhere on [−h_n^{−1}, h_n^{−1}]; more details in C. (in prep.).
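The estimator f_n can be sketched numerically: form the empirical characteristic function of the increments, take a continuous (distinguished) logarithm, band-limit and invert. In this sketch the sharp frequency cut-off standing in for FK_{h_n}, the grid sizes, the excision radius τ = 0.6 and the recovery of λ from the tail of Log ϕ_n are illustrative simplifications, not the tuning of the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, Delta, n = 0.5, 0.5, 5000
N = rng.poisson(lam * Delta, n)
Y = np.array([rng.normal(0.0, 1.0, k).sum() for k in N])  # increments Y_k, P = N(0,1)

u = np.linspace(-10.0, 10.0, 801)                  # frequencies; h_n^{-1} = 10
du = u[1] - u[0]
phi_n = np.exp(1j * np.outer(u, Y)).mean(axis=1)   # empirical characteristic function
# distinguished logarithm: continuous branch of log phi_n (phi_n(0) = 1)
log_phi = np.log(np.abs(phi_n)) + 1j * np.unwrap(np.angle(phi_n))

# Log phi(u) = Delta * (Ff(u) - lam) and Ff(u) -> 0, so the tail recovers lam
lam_hat = -log_phi[0].real / Delta

# Fourier inversion, then excise (-tau, tau) where the -lam * delta_0 atom sits
x = np.linspace(-4.0, 4.0, 161)
f_n = (log_phi * np.exp(-1j * np.outer(x, u))).sum(axis=1).real * du / (2 * np.pi * Delta)
f_n[np.abs(x) < 0.6] = 0.0
```

Away from the origin, f_n approximates the Lévy density f = λ × (density of P); the excised neighbourhood absorbs the smeared δ_0 atom.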
f_n is minimax-optimal for L^p-norms
For I ⊆ R, assume f is in the Hölder(–Zygmund if s is an integer) space

    C^s(I) ⊇ { h ∈ C^{⌊s⌋}(I) : ‖h‖_{L^∞(I)} + ‖D^{⌊s⌋}h‖_{L^∞(I)} + sup_{x,y∈I} |D^{⌊s⌋}h(x) − D^{⌊s⌋}h(y)| / |x − y|^{s−⌊s⌋} < ∞ };

note that the minimax rate for this problem is r_n := (∆n)^{−s/(2s+1)}.

Theorem (C. (in prep.)). Under assumptions on τ_n, h_n and K, for any p ∈ [1, ∞),

    E_{g^n}[‖f_n − f‖_{L^p(I)}] ≲ r_n.

Furthermore, for any L > 0, M_n → ∞ and p ∈ [1, 2], and if s > 1/p,

    Pr( ‖f_n − f‖_{L^p(I)} ≥ M_n r_n ) ≤ e^{−L∆n r_n^2}.
Posterior for Besov prior contracts at minimax rate for Lp-norms
Define the random wavelet series
    v(x) = ∑_{k=0}^{2^{J_0}−1} u_k φ_k(x) + ∑_{j=J_0}^∞ ∑_{l=0}^{2^j−1} 2^{−j(s+1/2)} u_{j,l} ψ_{j,l}(x),   x ∈ I,

where J_0 is fixed, u_k, u_{j,l} i.i.d. ∼ Unif(−B, B) with B > 0 fixed, and {φ_k, ψ_{j,l}} is a CDV basis. Then, take as priors the induced laws on C(I) of exp(v) and of v_0 + v for v_0 > 0 sufficiently large. These do not depend on G!

Theorem (C. (in prep.)). Under the previous assumptions, define H := { h ∈ L^1(R) ∩ L^∞(R) : inf_I h ≥ A, h ≡ 0 on R\I } for 0 < A < ∞ fixed. If f ∈ H and p ∈ [1, 2],

    E_{g^n} Π( h ∈ H : ‖h − f‖_{L^p(I)} ≤ M_n r_n (log ∆n)^{s/(2s+1)} | Y ) → 1   ∀ M_n → ∞.

Key ingredients: Π places enough mass around any element of C^s(I); and there exist statistical tests with small enough errors.
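A draw from a prior of this type can be sketched by truncating the series at a finite level and substituting the Haar basis for the boundary-corrected CDV wavelets; the truncation level Jmax, the values of s and B, and the basis choice are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
s, B, J0, Jmax = 1.0, 1.0, 0, 8
x = np.linspace(0.0, 1.0, 512, endpoint=False)

def haar_psi(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where(t < 0.5, 1.0, -1.0) * ((t >= 0.0) & (t < 1.0))

v = rng.uniform(-B, B) * np.ones_like(x)           # scaling-function part at level J0
for j in range(J0, Jmax):
    for l in range(2 ** j):
        u_jl = rng.uniform(-B, B)                  # u_{j,l} ~ Unif(-B, B)
        # term 2^{-j(s+1/2)} u_{j,l} psi_{j,l}(x),  psi_{j,l}(x) = 2^{j/2} psi(2^j x - l)
        v += 2.0 ** (-j * (s + 0.5)) * u_jl * 2.0 ** (j / 2) * haar_psi(2 ** j * x - l)

prior_draw = np.exp(v)                             # positive draw: the exp(v) prior
```

The coefficient decay 2^{−j(s+1/2)} with bounded u_{j,l} is what places the draws in a ball of the Besov/Hölder scale of smoothness s, and exp(v) enforces the positivity needed for a Lévy density.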
F_n satisfies a Donsker theorem and is Cramér–Rao optimal
Let F_n(x) := ∫_{−∞}^x f_n and F(x) := ∫_{−∞}^x f. Assume that

    |F(x) − F(y)| ≲ min{ (log |x − y|^{−1})^{−α}, 1 }   for all x, y ∈ R and some α > 2,

and

    ∫_R log^β( max{|x|, e} ) F(dx) < ∞   for some β > 2.

Theorem (C. (2017)). Under mild assumptions on τ_n, h_n and K,

    √n ( F_n − F ) →_D G_g in L^∞(R),

where G_g is a centred Gaussian process on R with optimal covariance

    Σ_{x,y} := (1/∆^2) ∫_R ( h_x ∗ F^{−1}[1/G(f)(−·)](z) ) ( h_y ∗ F^{−1}[1/G(f)(−·)](z) ) G(f)(dz),

with h_x := 1_{(−∞,x]} 1_{R\{0}} and F^{−1}[G(f)^{−1}(−·)] a finite signed measure.
A Bernstein–von Mises theorem
Minimax contraction rates hold. The score operator is

    S : L^2(f) → L^2_0(G(f)),   h ↦ ∆( d((hf) ∗ G(f))/dG(f) − ∫_R h df );

issue: we do not know how to control the denominator ⇒ work on I = (−1/2, 1/2] as a compact group with addition modulo 1. Need ∆ ∫_I f < π.
Take as prior the induced law on C(I) of exp(v), where v := ∑_{j≤J} ∑_{k≤2^j−1} a_j u_{j,k} ψ_{j,k}, with {ψ_{j,k}} a periodised basis.

Theorem (Nickl and Söhl (in prep.), C. (in prep.)). Taking a_j = 2^{−j} j^{−2} and 2^J ∼ (n/log n)^{1/(2s+1)}, or as in the contraction result, and Π_n the image measure of Π(· | Y) under h ↦ √n(h − h_n),

    E_{g^n} σ_{M_0}( Π_n, L_{M_0}(G_g) ) → 0.
Implementation of the posterior distribution
Literature on MCMC sampling of infinite-dimensional objects is growing rapidly: Beskos, Girolami, Roberts, Stuart and collaborators. Monte Carlo is not dimension-dependent: discretise at the end!
In particular, Vollmer (2014) gives a proposal for uniform Besov priors and shows dimension-independent errors if the likelihood is bounded. This, together with an independence sampler, can be used.
Issue: evaluating G(f) = e^{−∆∫f} ∑_{j=0}^∞ ∆^j f^{∗j}/j!. Note that

    dG(f)/dμ = e^{−∆∫f} 1_{{0}} + e^{−∆∫f} F^{−1}[e^{∆Ff} − 1].

This can be approximated with the FFT, but inexactly ⇒ pseudo-marginal MCMC!
In this programme of using Bayesian methods for inverse problems with general priors and showing BvM theorems, we have to compute the score operator. Much of the newest literature above concerns Langevin or Hamiltonian methods.
Some interesting work left to do...
Relevant references: Bayesian approach
Castillo, I. and Nickl, R. (2013). Nonparametric Bernstein–von Mises theorems in Gaussian white noise. Ann. Statist. 41, 1999–2028.
Castillo, I. and Nickl, R. (2014). On the Bernstein–von Mises phenomenon for nonparametric Bayes procedures. Ann. Statist. 42, 1941–1969.
Coca, A. J. Optimal inference on the Lévy density of a compound Poisson process observed discretely at arbitrary frequencies. In preparation.
Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in L^r-metrics, 1 ≤ r ≤ ∞. Ann. Statist. 39(6), 2883–2911.
Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28(2), 500–531.
Monard, F., Nickl, R. and Paternain, G. P. (2017). Efficient nonparametric Bayesian inference for X-ray transforms. arXiv:1708.06332.
Nickl, R. (2017). Bernstein–von Mises theorems for statistical inverse problems I: Schrödinger equation. arXiv:1707.01764.
Nickl, R. and Söhl, J. Bernstein–von Mises theorems for statistical inverse problems II: compound Poisson processes observed discretely at low frequencies. In preparation.
Vollmer, S. J. (2014). Dimension-independent MCMC sampling for inverse problems with non-Gaussian priors. arXiv:1302.2213v3.
Relevant references: inference for discretely observed CPPs
Buchmann, B. and Grübel, R. (2003). Decompounding: an estimation problem for Poisson random sums. Ann. Statist. 31, 1054–1074.
Coca, A. J. (2017). Efficient nonparametric inference for discretely observed compound Poisson processes. Probab. Theory Related Fields, online first. doi:10.1007/s00440-017-0761-5.
Comte, F., Duval, C. and Genon-Catalot, V. (2014). Nonparametric density estimation in compound Poisson processes using convolution power estimators. Metrika 77(1), 163–183.
Duval, C. (2013). Density estimation for compound Poisson processes from discrete data. Stochastic Process. Appl. 123, 3963–3986.
van Es, B., Gugushvili, S. and Spreij, P. (2007). A kernel type nonparametric density estimator for decompounding. Bernoulli 13, 672–694.
Nickl, R. and Reiß, M. (2012). A Donsker theorem for Lévy measures. J. Funct. Anal. 263, 3306–3332.
Nickl, R., Reiß, M., Söhl, J. and Trabs, M. (2016). High-frequency Donsker theorems for Lévy measures. Probab. Theory Related Fields 164(1), 61–108.
Trabs, M. (2014). Information bounds for inverse problems with application to deconvolution and Lévy models. Ann. Inst. Henri Poincaré Probab. Stat. 51(4), 1620–1650.
Thanks for your attention!