Date post: 17-Aug-2021
Statistical Learning with Hawkes Processes and new Matrix Concentration Inequalities, 48èmes Journées de Statistique de la SFdS, Stéphane Gaïffas
Transcript
Page 1: Statistical Learning with Hawkes Processes and new Matrix … · 2021. 5. 27. · Genomics[Reynaud-Bouret and Schbath 2010] High-frequency Finance[Bacry et al. 2013] Terrorist activity[Mohler

Statistical Learning with Hawkes Processes and new Matrix Concentration Inequalities

48èmes Journées de Statistique de la SFdS

Stéphane Gaïffas


1 Introduction

2 Sparse and Low Rank MHP

3 New matrix concentration inequalities

4 Faster inference: a dedicated mean field approximation

5 A more direct approach: cumulants matching


Introduction

You have users of a system

You want to quantify their level of interactions

You don’t want to rely only on declared interactions: they are often outdated and unrelated to the users’ actual activity

You really want levels of interaction driven by users’ actions, using the patterns of their timestamps

Example 1: Twitter. Timestamps of users’ messages. Find something better than the graph given by links of type “user 1 follows user 2”

Example 2: MemeTracker. Publication times of articles on websites/blogs, with hyperlinks. Quantify the influence of the publication activity of websites on the others.

Introduction

[Figure: two image panels captioned “From:” and “Build:”]


Introduction

Data: large number of irregular timestamped events recorded in continuous time

Activity of users on a social network [DARPA Twitter Bot Challenge 2016, etc.]

High-frequency variations of signals in finance [Bacry et al. 2013]

Earthquakes and aftershocks in geophysics [Ogata 1998]

Crime activity (!?!) [Mohler 2011 and the PredPol startup]

Genomics, Neurobiology [Reynaud-Bouret et al. 2010]

Methods: in the context of social networks, survival analysis and modeling based on counting processes [Gomez et al. 2013, 2015], [Xu et al. 2016]


Introduction

Setting

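For each node $i \in I = \{1, \ldots, d\}$ we have a set $Z^i$ of events

Any $\tau \in Z^i$ is the occurrence time of an event related to $i$

Counting process

Put $N_t = [N_t^1 \cdots N_t^d]^\top$, where

$$N_t^i = \sum_{\tau \in Z^i} \mathbf{1}_{\tau \le t}$$

Intensity

Stochastic intensities $\lambda_t = [\lambda_t^1 \cdots \lambda_t^d]^\top$, where $\lambda_t^i$ is the intensity of $N_t^i$:

$$\lambda_t^i = \lim_{dt \to 0} \frac{P(N_{t+dt}^i - N_t^i = 1 \mid \mathcal{F}_t)}{dt}$$

$\lambda_t^i$ = instantaneous rate of event occurrence at time $t$ for node $i$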

λt characterizes the distribution of Nt [Daley et al. 2007]

Patterns can be captured by putting structure on λt


The Multivariate Hawkes Process (MHP)

Scaling

We observe Nt on [0,T ]. “Asymptotics” in T → +∞. d is “large”

The Hawkes process

A particular structure for λt : auto-regression

$N_t$ is called a Hawkes process [Hawkes 1971] if

$$\lambda_t^i = \mu_i + \sum_{j=1}^d \int_0^t \varphi_{ij}(t - t') \, dN_{t'}^j,$$

$\mu_i \in \mathbb{R}_+$: exogenous intensity

$\varphi_{ij}$: non-negative, integrable and causal (support in $\mathbb{R}_+$) functions

The $\varphi_{ij}$ are called kernels. They encode the impact of an action by node $j$ on the activity of node $i$

Captures auto-excitation and cross-excitation across nodes, a phenomenon observed in social networks [Crane et al. 2008]


Stability condition of the MHP

Stability condition

Introduce the matrix $G$ with entries

$$G^{ij} = \int_0^{+\infty} \varphi_{ij}(t) \, dt$$

Its spectral norm $\|G\|$ must satisfy $\|G\| < 1$ to ensure stability of the process (and stationarity)
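
As a rough illustration of the stability condition, the sketch below estimates the spectral norm of a small matrix of kernel integrals by power iteration and compares it with 1. The helper names `spectral_norm` and `is_stable` and the example matrices are made up for this example; the talk does not say how the norm is computed in practice.

```python
import math

def spectral_norm(G, n_iter=500):
    """Estimate the largest singular value of G by power iteration on G^T G.

    Assumes G has non-negative entries (as for Hawkes kernel integrals), so
    the uniform start vector is not orthogonal to the top singular vector.
    """
    rows, cols = len(G), len(G[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-norm start vector
    sigma = 0.0
    for _ in range(n_iter):
        # u = G v, then w = G^T u = (G^T G) v
        u = [sum(G[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(G[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        sigma = math.sqrt(norm_w)  # |G^T G v| tends to sigma_max^2 for unit v
        v = [x / norm_w for x in w]
    return sigma

def is_stable(G):
    """Check the condition ||G|| < 1 (hypothetical helper)."""
    return spectral_norm(G) < 1.0

print(is_stable([[0.3, 0.2], [0.1, 0.4]]))
print(is_stable([[0.9, 0.9], [0.9, 0.9]]))
```

Power iteration is used here only to keep the sketch dependency-free; any SVD routine would do the same job.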


A brief history of MHP

Brief history

Introduced in Hawkes 1971

Earthquakes and geophysics [Kagan and Knopoff 1981], [Zhuang et al. 2012]

Genomics [Reynaud-Bouret and Schbath 2010]

High-frequency Finance [Bacry et al. 2013]

Terrorist activity [Mohler et al. 2011, Porter and White 2012]

Neurobiology [Hansen et al. 2012]

Social networks [Crane and Sornette 2008], [Zhou et al. 2013]

And even FPGA-based implementation [Guo and Luk 2013]


Estimation for MHP

Parametric estimation (Maximum likelihood)

First work [Ogata 1978]

and [Simma and Jordan 2010], [Zhou et al. 2013]
→ Expectation-Maximization (EM) algorithms, with priors

Non parametric estimation

[Marsan and Lengliné 2008], generalized by [Lewis and Mohler 2010]
→ EM for penalized likelihood function
→ Univariate Hawkes processes

[Reynaud-Bouret et al. 2011]
→ ℓ1-penalization over a dictionary

[Bacry and Muzy 2014]
→ Another approach: Wiener-Hopf equations, larger datasets


MHP in large dimension

What for?

Infer influence and causality directly from actions of users

Exploit the hidden lower-dimensional structure of model parameters for inference/prediction

Dimension d is large. We want:

a simple parametric model on µ = [µi ] and ϕ = [ϕij ]

a tractable and scalable optimization problem

to encode some prior assumptions using (convex) penalization


A simple parametrization of the MHP

Simple parametrization

Consider

$$\varphi_{ij}(t) = a_{ij} \times \alpha_{ij} e^{-\alpha_{ij} t}$$

$a_{ij}$ = level of interaction between nodes $i$ and $j$

$\alpha_{ij}$ = decay rate of the instantaneous excitation of node $i$ by node $j$ (characteristic lifetime $1/\alpha_{ij}$)

The matrix $A = [a_{ij}]_{1 \le i,j \le d}$ is understood as a weighted adjacency matrix of mutual influence of nodes $\{1, \ldots, d\}$

$A$ is non-symmetric: “oriented graph”


A simple parametrization of the MHP

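We end up with intensities

$$\lambda_{\theta,t}^i = \mu_i + \int_{(0,t)} \sum_{j=1}^d a_{ij} \alpha_{ij} e^{-\alpha_{ij}(t-s)} \, dN_s^j$$

for $i \in \{1, \ldots, d\}$, where $\theta = [\mu, A, \alpha]$ with

baselines $\mu = [\mu_1 \cdots \mu_d]^\top \in \mathbb{R}_+^d$

interactions $A = [a_{ij}]_{1 \le i,j \le d} \in \mathbb{R}_+^{d \times d}$

decays $\alpha = [\alpha_{ij}]_{1 \le i,j \le d} \in \mathbb{R}_+^{d \times d}$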
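
To make the exponential parametrization concrete, here is a minimal sketch that evaluates the intensity of one node at a given time from lists of past event times. The function name `hawkes_intensity`, the data layout (one sorted list of timestamps per node) and the numerical values are assumptions made for the example.

```python
import math

def hawkes_intensity(i, t, events, mu, A, alpha):
    """Intensity of node i at time t with exponential kernels.

    events[j] is the list of event times of node j. Only events in the open
    interval (0, t) contribute, matching the integral over (0, t).
    """
    lam = mu[i]
    for j, times in enumerate(events):
        for s in times:
            if 0.0 < s < t:
                lam += A[i][j] * alpha[i][j] * math.exp(-alpha[i][j] * (t - s))
    return lam

# One node, one past event at time 1.0 (illustrative values).
mu, A, alpha = [0.5], [[0.8]], [[2.0]]
events = [[1.0]]
print(hawkes_intensity(0, 1.0, events, mu, A, alpha))  # baseline only
print(hawkes_intensity(0, 1.5, events, mu, A, alpha))  # baseline plus decayed jump
```

The double loop costs one pass over all past events per evaluation; a recursive update of the exponential sums would be cheaper, at the price of a less direct reading of the formula.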


A simple parametrization of the MHP

For d = 1 the intensity λθ,t looks like this:


Goodness-of-fit functionals

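Minus log-likelihood

$$-\ell_T(\theta) = \sum_{i=1}^d \Big\{ \int_0^T \lambda_{\theta,t}^i \, dt - \int_0^T \log \lambda_{\theta,t}^i \, dN_t^i \Big\}$$

Least-squares

$$R_T(\theta) = \sum_{i=1}^d \Big\{ \int_0^T (\lambda_{\theta,t}^i)^2 \, dt - 2 \int_0^T \lambda_{\theta,t}^i \, dN_t^i \Big\}$$

with

$$\lambda_{\theta,t}^i = \mu_i + \sum_{j=1}^d a_{ij} \alpha_{ij} \int_{(0,t)} e^{-\alpha_{ij}(t-s)} \, dN_s^j$$

where $\theta = [\mu, A, \alpha]$ with $\mu = [\mu_i]$, $A = [a_{ij}]$, $\alpha = [\alpha_{ij}]$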
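
For the exponential kernel both integrals in the minus log-likelihood have closed forms. The sketch below works this out for $d = 1$ only, using the standard recursion for the sum of decayed past events. The function name and the sample timestamps are illustrative assumptions, not taken from the talk.

```python
import math

def neg_log_likelihood(times, T, mu, a, alpha):
    """Minus log-likelihood for d = 1 and kernel a * alpha * exp(-alpha * t).

    times must be sorted and contained in (0, T].
    """
    compensator = mu * T  # integral of the intensity over [0, T]
    log_sum = 0.0         # sum of log-intensities at the event times
    decayed = 0.0         # sum over past events s of exp(-alpha * (t_k - s))
    previous = None
    for t in times:
        if previous is not None:
            decayed = math.exp(-alpha * (t - previous)) * (decayed + 1.0)
        log_sum += math.log(mu + a * alpha * decayed)
        # each event adds a * (1 - exp(-alpha * (T - t))) to the compensator
        compensator += a * (1.0 - math.exp(-alpha * (T - t)))
        previous = t
    return compensator - log_sum

print(neg_log_likelihood([1.0, 2.0, 3.0], T=4.0, mu=0.5, a=0.4, alpha=1.5))
```

With $a = 0$ the function reduces to the minus log-likelihood of a homogeneous Poisson process, $\mu T - n \log \mu$, which is a convenient sanity check.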


A simple framework

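Put $\|\lambda_\theta\|_T^2 = \langle \lambda_\theta, \lambda_\theta \rangle_T$ with

$$\langle \lambda_\theta, \lambda_{\theta'} \rangle_T = \frac{1}{T} \sum_{i=1}^d \int_{[0,T]} \lambda_{\theta,t}^i \lambda_{\theta',t}^i \, dt$$

so that least-squares (normalized by $1/T$) writes

$$R_T(\theta) = \|\lambda_\theta\|_T^2 - \frac{2}{T} \sum_{i=1}^d \int_{[0,T]} \lambda_{\theta,t}^i \, dN_t^i$$

It is natural: if $N$ has ground truth intensity $\lambda^*$ then

$$E[R_T(\theta)] = E\|\lambda_\theta\|_T^2 - 2 E\langle \lambda_\theta, \lambda^* \rangle_T = E\|\lambda_\theta - \lambda^*\|_T^2 - E\|\lambda^*\|_T^2,$$

where we used the “signal + noise” decomposition (Doob-Meyer):

$$dN_t^i = \lambda_t^{*,i} \, dt + dM_t^i$$

with $M^i$ a martingale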
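
The expectation identity can be checked in two short steps. This is only a sketch: it assumes that $\lambda_\theta$ is predictable and integrable enough for the stochastic integral against $dM^i$ to have zero mean, which is not shown here.

```latex
\begin{align*}
E\Big[\frac{1}{T}\sum_{i=1}^d \int_{[0,T]} \lambda^i_{\theta,t}\, dN^i_t\Big]
&= E\Big[\frac{1}{T}\sum_{i=1}^d \int_{[0,T]} \lambda^i_{\theta,t}\,\lambda^{*,i}_t\, dt\Big]
 + \underbrace{E\Big[\frac{1}{T}\sum_{i=1}^d \int_{[0,T]} \lambda^i_{\theta,t}\, dM^i_t\Big]}_{=\,0}
 = E\langle \lambda_\theta, \lambda^* \rangle_T, \\
E[R_T(\theta)]
&= E\|\lambda_\theta\|_T^2 - 2E\langle \lambda_\theta, \lambda^*\rangle_T
 = E\|\lambda_\theta - \lambda^*\|_T^2 - E\|\lambda^*\|_T^2 .
\end{align*}
```

The last equality is the usual expansion of the square; the term $E\|\lambda^*\|_T^2$ does not depend on $\theta$, so minimizing $E[R_T(\theta)]$ amounts to minimizing the distance to $\lambda^*$.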


A simple framework

A strong assumption: assume that

ϕij(t) = aijhij(t)

for known $h_{ij}$, meaning that

$$\lambda_{\theta,t}^i = \mu_i + \int_{(0,t)} \sum_{j=1}^d a_{ij} h_{ij}(t - s) \, dN_s^j,$$

where $\theta = [\mu, A]$ with $\mu = [\mu_1, \ldots, \mu_d]^\top$ and $A = [a_{ij}]_{1 \le i,j \le d}$

However

Most papers using high-dimensional MHP assume $h_{ij}(t) = \alpha e^{-\alpha t}$ for a known $\alpha$!

e.g. [Yang and Zha 2013], [Zhou et al. 2013], [Farajtabar et al. 2015]

More on this problem later


Prior encoding by penalization

Prior assumptions

Some users are basically inactive and react only if stimulated:

µ is sparse

Everybody does not interact with everybody:

A is sparse

Interactions have community structure, possibly overlapping, a small number of factors explain interactions:

A is low-rank


Prior encoding by penalization

Standard convex relaxations [Tibshirani (01), Srebro et al. (05), Bach (08), Candès & Recht (08), etc.]

Convex relaxation of $\|A\|_0 = \sum_{ij} \mathbf{1}_{A_{ij} > 0}$ is the $\ell_1$-norm:

$$\|A\|_1 = \sum_{ij} |A_{ij}|$$

Convex relaxation of the rank is the trace-norm:

$$\|A\|_* = \sum_j \sigma_j(A) = \|\sigma(A)\|_1$$

where $\sigma_1(A) \ge \cdots \ge \sigma_d(A)$ are the singular values of $A$


Prior encoding by penalization

So, we use the following penalizations

Use $\ell_1$ penalization on $\mu$

Use $\ell_1$ penalization on $A$

Use trace-norm penalization on $A$

[but other choices might be interesting...]

NB1: to induce sparsity AND low-rank on $A$, we use the mixed penalization

$$A \mapsto \gamma_* \|A\|_* + \gamma_1 \|A\|_1$$

NB2: recent work by Richard et al. (2013): a much better way to induce sparsity and low-rank than this


Sparse and low-rank matrices

[Figure: unit balls $\{A : \|A\|_* \le 1\}$, $\{A : \|A\|_1 \le 1\}$ and $\{A : \|A\|_1 + \|A\|_* \le 1\}$]

The balls are computed on the set of $2 \times 2$ symmetric matrices, which is identified with $\mathbb{R}^3$.
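
For $2 \times 2$ symmetric matrices both norms have simple closed forms, which makes it easy to test membership in the three balls. The sketch below identifies the matrix $[[x, y], [y, z]]$ with the point $(x, y, z)$; the function names are illustrative assumptions.

```python
import math

def l1_norm_sym2(x, y, z):
    """Entrywise l1-norm of the symmetric matrix [[x, y], [y, z]]."""
    return abs(x) + 2.0 * abs(y) + abs(z)

def trace_norm_sym2(x, y, z):
    """Trace norm of [[x, y], [y, z]].

    For a symmetric matrix the singular values are the absolute values of
    the eigenvalues, which are mean +/- radius below.
    """
    mean = (x + z) / 2.0
    radius = math.sqrt(((x - z) / 2.0) ** 2 + y ** 2)
    return abs(mean + radius) + abs(mean - radius)

def in_mixed_ball(x, y, z):
    """Membership in {A : ||A||_1 + ||A||_* <= 1}."""
    return l1_norm_sym2(x, y, z) + trace_norm_sym2(x, y, z) <= 1.0

# The rank-one matrix of all ones: large l1-norm, comparatively small trace norm.
print(l1_norm_sym2(1.0, 1.0, 1.0), trace_norm_sym2(1.0, 1.0, 1.0))
```

Comparing the identity matrix with the all-ones matrix shows the two penalties pulling in different directions: the first is sparse but full-rank, the second is rank-one but dense.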


Algorithm

We end up with the problem

$$\hat\theta \in \operatorname{argmin}_{\theta \in \mathbb{R}_+^d \times \mathbb{R}_+^{d \times d}} \big\{ R_T(\theta) + \mathrm{pen}(\theta) \big\},$$

with mixed penalizations

$$\mathrm{pen}(\theta) = \tau_1 \|\mu\|_1 + \gamma_1 \|A\|_1 + \gamma_* \|A\|_*$$

But there is the “features scaling” problem

Features scaling is necessary for “linear approaches” in supervised learning

No features and labels here!

⇒ Can be solved here by fine tuning of the penalization terms


Algorithm

Consider instead

$$\hat\theta \in \operatorname{argmin}_{\theta \in \mathbb{R}_+^d \times \mathbb{R}_+^{d \times d}} \big\{ R_T(\theta) + \mathrm{pen}(\theta) \big\},$$

where this time

$$\mathrm{pen}(\theta) = \|\mu\|_{1,w} + \|A\|_{1,W} + w_* \|A\|_*$$

Penalization tuned by data-driven weights $w$, $W$ and $w_*$ to solve the “scaling” problem

Comes from sharp controls of the noise terms, using new probabilistic tools

Ugly (but computationally easy) formulas


Numerical experiment

Toy example: take matrix A as

Numerical experiment: dimension 10, 210 parameters

[Figure panels: no penalization; ℓ1 penalization; trace-norm penalization; ℓ1 + trace-norm penalization]

Numerical experiment: dimension 100, 20100 parameters

[Figure panels: no penalization; ℓ1 penalization; trace-norm penalization; ℓ1 + trace-norm penalization]

Numerical experiment: dimension 100, 20100 parameters

[Figure panels: ground truth; ℓ1; ℓ1 + trace; no pen; w-ℓ1; w-ℓ1 + trace]

Numerical experiment: dimension 100, 20100 parameters

[Figure panels: ℓ1 vs w-ℓ1; ℓ1 + trace vs w-ℓ1 + trace. Estimation error of A]

Numerical experiment: dimension 100, 20100 parameters

[Figure panel: ℓ1 + trace vs w-ℓ1 + trace. AUC for support selection of A]


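Theoretical results

A sharp oracle inequality

Recall $\langle \lambda_1, \lambda_2 \rangle_T = \frac{1}{T} \sum_{i=1}^d \int_0^T \lambda_{1,t}^i \lambda_{2,t}^i \, dt$ and $\|\lambda\|_T^2 = \langle \lambda, \lambda \rangle_T$

Assume RE in our setting (Restricted Eigenvalues), a mandatory assumption to obtain fast rates for convex-relaxation based procedures

Theorem. We have

$$\|\lambda_{\hat\theta} - \lambda^*\|_T^2 \le \inf_\theta \Big\{ \|\lambda_\theta - \lambda^*\|_T^2 + \kappa(\theta)^2 \Big( \frac{5}{4} \|(w)_{\mathrm{supp}(\mu)}\|_2^2 + \frac{9}{8} \|(W)_{\mathrm{supp}(A)}\|_F^2 + \frac{9}{8} w_*^2 \, \mathrm{rank}(A) \Big) \Big\}$$

with a probability larger than $1 - 146 e^{-x}$.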


Theoretical results

Roughly, $\hat\theta$ achieves an optimal tradeoff between approximation and complexity given by

$$\frac{\|\mu\|_0 \log d}{T} \max_i N^i([0,T])/T + \frac{\|A\|_0 \log d}{T} \max_{ij} v_T^{ij} + \frac{\mathrm{rank}(A) \log d}{T} \lambda_{\max}(V_T)$$

Complexity measured both by sparsity and rank

Convergence has shape $(\log d)/T$, where $T$ = length of the observation interval

These terms are balanced by “empirical variance” terms


Theoretical results

Data-driven weights come from new “empirical” Bernstein’s inequalities, entrywise and for the operator norm of the noise $Z_T$ (a matrix martingale)

Leads to a data-driven scaling of penalization: deals correctly with the inhomogeneity of information over nodes

Noise term is

$$Z_t = \int_0^t \mathrm{diag}[dM_s] \, H_s,$$

with $H_t$ a predictable process with entries

$$(H_t)_{ij} = \int_{(0,t)} h_{ij}(t - s) \, dN_s^j$$

We need to control $\frac{1}{T} \|Z_T\|_{\mathrm{op}}$


Theoretical results

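A consequence of our new concentration inequalities (more after):

$$P\Big[ \frac{\|Z_t\|_{\mathrm{op}}}{t} \ge \sqrt{\frac{2v(x + \log(2d))}{t}} + \frac{b(x + \log(2d))}{3t}, \quad b_t \le b, \quad \lambda_{\max}(V_t) \le v \Big] \le e^{-x},$$

for any $v, x, b > 0$, where

$$V_t = \frac{1}{t} \int_0^t \|H_s\|_{2,\infty}^2 \begin{bmatrix} \mathrm{diag}[\lambda_s^*] & 0 \\ 0 & H_s^\top \mathrm{diag}[H_s H_s^\top]^{-1} \mathrm{diag}[\lambda_s^*] H_s \end{bmatrix} ds$$

and $b_t = \sup_{s \in [0,t]} \|H_s\|_{2,\infty}$ ($\|\cdot\|_{2,\infty}$ = maximum $\ell_2$ row norm)

Useless for statistical learning! The event $\lambda_{\max}(V_t) \le v$ is annoying and $V_t$ is not observable (depends on $\lambda^*$)!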


Theoretical results

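Theorem [Something better]. For any $x > 0$, we have

$$\frac{\|Z_t\|_{\mathrm{op}}}{t} \le 8 \sqrt{\frac{(x + \log d + \hat\ell_{x,t}) \, \lambda_{\max}(V_t)}{t}} + \frac{(x + \log d + \hat\ell_{x,t})(10.34 + 2.65 \, b_t)}{t}$$

with a probability larger than $1 - 84.9 e^{-x}$, where

$$V_t = \frac{1}{t} \int_0^t \|H_s\|_{2,\infty}^2 \begin{bmatrix} \mathrm{diag}[dN_s] & 0 \\ 0 & H_s^\top \mathrm{diag}[H_s H_s^\top]^{-1} \mathrm{diag}[dN_s] H_s \end{bmatrix}$$

and small ugly term:

$$\hat\ell_{x,t} = 4 \log\log\Big( \frac{2\lambda_{\max}(V_t) + 2(4 + b_t^2/3)x}{x} \vee e \Big) + 2 \log\log\big( b_t^2 \vee e \big).$$

This is a non-commutative deviation inequality with observable variance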


New matrix concentration inequalities

Main tool: new concentration inequalities for matrix martingales in continuous time

Introduce

$$Z_t = \int_0^t A_s (C_s \odot dM_s) B_s,$$

where $\{A_t\}$, $\{C_t\}$ and $\{B_t\}$ are predictable and where $\{M_t\}_{t \ge 0}$ is a “white” matrix martingale, in the sense that $[\mathrm{vec}\, M]_t$ is diagonal

NB: entries of $Z_t$ are given by

$$(Z_t)_{i,j} = \sum_{k=1}^p \sum_{l=1}^q \int_0^t (A_s)_{i,k} (C_s)_{k,l} (B_s)_{l,j} (dM_s)_{k,l}.$$


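New matrix concentration inequalities

$\langle M \rangle_t$ = entrywise predictable quadratic variation, so that $M_t^{\odot 2} - \langle M \rangle_t$ is a martingale

The vectorization operator $\mathrm{vec} : \mathbb{R}^{p \times q} \to \mathbb{R}^{pq}$ stacks vertically the columns of $X$

$\langle \mathrm{vec}\, M \rangle_t$ is the $pq \times pq$ matrix with entries that are all pairwise quadratic covariations, so that $\mathrm{vec}(M_t) \mathrm{vec}(M_t)^\top - \langle \mathrm{vec}\, M \rangle_t$ is a martingale.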


New matrix concentration inequalities

$M_t = M_t^c + M_t^d$, where $M_t^c$ is a continuous martingale and $M_t^d$ is a purely discontinuous martingale. Its (entrywise) quadratic variation is defined as

$$[M]_t = \langle M^c \rangle_t + \sum_{0 \le s \le t} (\Delta M_s)^2, \qquad (1)$$

and its quadratic covariation by

$$[\mathrm{vec}\, M]_t = \langle \mathrm{vec}\, M^c \rangle_t + \sum_{0 \le s \le t} \mathrm{vec}(\Delta M_s) \mathrm{vec}(\Delta M_s)^\top.$$

We say that $M$ is purely discontinuous if the process $\langle \mathrm{vec}\, M^c \rangle_t$ is identically the zero matrix.


New matrix concentration inequalities

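Concentration for purely discontinuous matrix martingales:

$M_t$ is purely discontinuous and we have

$$\langle M \rangle_t = \int_0^t \lambda_s \, ds$$

for a non-negative and predictable intensity process $\{\lambda_t\}_{t \ge 0}$.

Standard moment assumptions (subexponential tails)

Introduce

$$V_t = \int_0^t \|A_s\|_{\infty,2}^2 \|B_s\|_{2,\infty}^2 W_s \, ds$$

where

$$W_t = \begin{bmatrix} W_t^1 & 0 \\ 0 & W_t^2 \end{bmatrix}, \qquad (2)$$

$$W_t^1 = A_t \, \mathrm{diag}[A_t^\top A_t]^{-1} \, \mathrm{diag}\big[(C_t^{\odot 2} \odot \lambda_t) \mathbf{1}\big] A_t^\top$$

$$W_t^2 = B_t^\top \mathrm{diag}[B_t B_t^\top]^{-1} \, \mathrm{diag}\big[(C_t^{\odot 2} \odot \lambda_t)^\top \mathbf{1}\big] B_t$$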


New matrix concentration inequalities

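Introduce also

$$b_t = \sup_{s \in [0,t]} \|A_s\|_{\infty,2} \|B_s\|_{2,\infty} \|C_s\|_\infty.$$

Theorem.

$$P\Big[ \|Z_t\|_{\mathrm{op}} \ge \sqrt{2v(x + \log(m + n))} + \frac{b(x + \log(m + n))}{3}, \quad b_t \le b, \quad \lambda_{\max}(V_t) \le v \Big] \le e^{-x},$$

First result of this type for matrix martingales in continuous time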


New matrix concentration inequalities

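Corollary. Let $\{N_t\}$ be a $p \times q$ matrix whose entries $(N_t)_{i,j}$ are independent inhomogeneous Poisson processes with intensities $(\lambda_t)_{i,j}$. Consider the martingale $M_t = N_t - \Lambda_t$, where $\Lambda_t = \int_0^t \lambda_s \, ds$, and let $\{C_t\}$ be deterministic and bounded. Then

$$\Big\| \int_0^t C_s \odot d(N_s - \Lambda_s) \Big\|_{\mathrm{op}} \le \sqrt{2 \Big( \Big\| \int_0^t C_s^{\odot 2} \odot \lambda_s \, ds \Big\|_{1,\infty} \vee \Big\| \int_0^t C_s^{\odot 2} \odot \lambda_s \, ds \Big\|_{\infty,1} \Big)(x + \log(p + q))} + \frac{\sup_{s \in [0,t]} \|C_s\|_\infty (x + \log(p + q))}{3}$$

holds with a probability larger than $1 - e^{-x}$.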


New matrix concentration inequalities

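Corollary. Even more particular: $N$ is a random $p \times q$ matrix whose entries $N_{i,j}$ are independent Poisson variables with intensities $\lambda_{i,j}$. With a probability larger than $1 - e^{-x}$, we have

$$\|N - \lambda\|_{\mathrm{op}} \le \sqrt{2 (\|\lambda\|_{1,\infty} \vee \|\lambda\|_{\infty,1})(x + \log(p + q))} + \frac{x + \log(p + q)}{3}.$$

To our knowledge, not previously stated in the literature

NB: In the Gaussian case: variance depends on the maximum $\ell_2$ norm of rows and columns (cf. Tropp (2011))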
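
The right-hand side of the Poisson bound is easy to evaluate numerically. The sketch below does so under one stated assumption: the norms $\|\lambda\|_{1,\infty}$ and $\|\lambda\|_{\infty,1}$ are read as the largest row sum and the largest column sum of the intensity matrix (the slides do not spell out the convention, and since the bound takes the maximum of the two, the order does not matter). The function name is hypothetical.

```python
import math

def poisson_opnorm_bound(lam, x):
    """Deviation bound for ||N - lam||_op, with lam a p x q list of lists.

    Assumption: the variance term is max(largest row sum, largest column sum)
    of the non-negative intensities.
    """
    p, q = len(lam), len(lam[0])
    max_row_sum = max(sum(row) for row in lam)
    max_col_sum = max(sum(lam[i][j] for i in range(p)) for j in range(q))
    variance = max(max_row_sum, max_col_sum)
    level = x + math.log(p + q)
    return math.sqrt(2.0 * variance * level) + level / 3.0

# Bound holding with probability at least 1 - exp(-3) for a 2 x 2 example.
print(poisson_opnorm_bound([[1.0, 2.0], [3.0, 4.0]], x=3.0))
```

The variance term scales with sums of intensities along rows and columns, which is the Poisson counterpart of the $\ell_2$ row and column norms mentioned for the Gaussian case.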


New matrix concentration inequalities

We have as well a non-commutative Hoeffding’s inequality when $M_t$ has continuous paths (allowing Itô’s formula...), with a similar variance term

Tools from stochastic calculus, use of the dilation operator and some classical matrix inequalities about the trace exponential and the SDP order.

A difficult proposition: a control of the quadratic variation of the pure jump process

$$U_t^u = \sum_{0 \le s \le t} \big( e^{u \Delta S(Z_s)} - u \Delta S(Z_s) - I \big)$$

given by

$$\langle U^\xi \rangle_t \preceq \int_0^t \frac{\varphi\big( \xi \|A_s\|_{\infty,2} \|B_s\|_{2,\infty} \|C_s\|_\infty \big)}{\|C_s\|_\infty^2} W_s \, ds,$$

where $\varphi(x) = e^x - x - 1$.


Mean-field inference for Hawkes

Going back to maximum-likelihood estimation, with d very large

For inference, exploit the fact that d is large

⇒ use a Mean-Field approximation! (from Delattre et al. 2015)

[Figure: simulation results. One panel shows $\lambda_t^1/\Lambda^1$ against $t/T$ for $d = 1$, $d = 16$ and $d = 128$; the other shows the fluctuations $E^{1/2}[(\lambda_t^1/\Lambda^1 - 1)^2]$ against $d$, with a $d^{-1/2}$ reference line]

When $d$ is large, we have

$$\lambda_t^i \approx \Lambda^i \quad \text{with} \quad \Lambda^i = E[dN_t^i]/dt$$


Mean-field inference for Hawkes

Use the quadratic approximation

$$\log \lambda_t^i \approx \log \Lambda^i + \frac{\lambda_t^i - \Lambda^i}{\Lambda^i} - \frac{(\lambda_t^i - \Lambda^i)^2}{2 (\Lambda^i)^2}$$

in the log-likelihood

⇒ Reduces inference to linear systems
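
The quadratic approximation is the second-order Taylor expansion of the logarithm around the mean intensity. The short sketch below compares it with the exact logarithm for a few relative fluctuations; the function name and the values are illustrative only.

```python
import math

def log_quadratic(lam, Lam):
    """Second-order expansion of log(lam) around the mean intensity Lam."""
    r = (lam - Lam) / Lam  # relative fluctuation
    return math.log(Lam) + r - 0.5 * r * r

Lam = 2.0
for lam in (2.0, 2.1, 2.5, 3.0):
    error = abs(log_quadratic(lam, Lam) - math.log(lam))
    print(lam, error)
```

The error grows like the cube of the relative fluctuation, so the approximation is only as good as the fluctuations of the intensity around its mean are small.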

[Figure: fluctuations $E^{1/2}[(\lambda_t^1/\Lambda^1 - 1)^2]$ plotted against $\|\Phi\|$ and $d$]


Mean-field inference for Hawkes

No clean proof yet (only on toy example)

But it works very well empirically

Mean-field inference for Hawkes

[Figure: relative error $E^{1/2}[(\alpha_{\mathrm{inf}}/\alpha_{\mathrm{tr}} - 1)^2]$ against $T$, in two panels ($\alpha = 0.3$ and $\alpha = 0.7$), with curves for $d = 4$, $d = 8$, $d = 16$, $d = 32$ and a $T^{-1/2}$ reference line]

Mean-field inference for Hawkes

It is faster by several orders of magnitude than state-of-the-art solvers

[Figure: minus log-likelihood $-\log P(N_t \mid \theta_{\mathrm{inf}})$ and relative error $E^{1/2}[(\alpha_{\mathrm{inf}}/\alpha_{\mathrm{tr}} - 1)^2]$ against computational time (s), for BFGS, EM, CF and MF]


Cumulants matching for MHP

Some thoughts

Our original motivation for MHP is influence and causality recovery of nodes

Knowledge of the full parametrization of MHP is of little interest by itself

Idea

Let’s not estimate the kernels ϕij , but their integrals only!

Nonparametric approach, no structure imposed on the kernels

Let’s not use a dictionary either (over-parametrization)

A way more direct approach


Cumulants matching for MHP

We want to estimate $G = [g^{ij}]$ where

$$g^{ij} = \int_0^{+\infty} \varphi_{ij}(u) \, du \ge 0 \quad \text{for } 1 \le i, j \le d$$

Remark

$g^{ij}$ = average total number of events of node $i$ whose direct ancestor is a given event of node $j$

Introducing $N_t^{i \leftarrow j}$, which counts the number of events of $i$ whose direct ancestor is an event of $j$, we can prove that

$$E[dN_t^{i \leftarrow j}] = g^{ij} E[dN_t^j] = g^{ij} \Lambda^j \, dt$$

Consequence

$G$ describes mutual influences between nodes, but also their direct causal relationship (Granger)

Recall the stability condition $\|G\| < 1$, which entails that $I - G$ is invertible


Cumulants matching for MHP

Cumulant matching method for estimation of $G$

Compute estimates of the third order cumulants of the process

Find $G$ that matches these empirical cumulants

Highly non-convex problem: polynomial of order 10 with respect to the entries of $(I - G)^{-1}$

Actually not so hard: local minima turn out to be good (as in the deep learning literature)

Cumulant matching is quite powerful for latent topic models, such as Latent Dirichlet Allocation [Bach et al. 2015]


Cumulants matching for MHP

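The cumulants up to order three can be estimated as

$$\Lambda^i = \frac{1}{T} \sum_{\tau \in Z^i} 1 = \frac{N_T^i}{T}$$

$$C^{ij} = \frac{1}{T} \sum_{\tau \in Z^i} \big( N_{\tau+H}^j - N_{\tau-H}^j - 2H\Lambda^j \big)$$

$$K^{ijk} = \frac{1}{T} \sum_{\tau \in Z^i} \big( N_{\tau+H}^j - N_{\tau-H}^j - 2H\Lambda^j \big) \big( N_{\tau+H}^k - N_{\tau-H}^k - 2H\Lambda^k \big) - \frac{\Lambda^i}{T} \sum_{\tau \in Z^j} \big( N_{\tau+2H}^k - N_{\tau-2H}^k - 4H\Lambda^k \big) + 2 \frac{\Lambda^i}{T} \sum_{\tau \in Z^j} \sum_{\substack{\tau' \in Z^k \\ \tau - 2H \le \tau' < \tau}} (\tau - \tau') - 4H^2 \Lambda^i \Lambda^j \Lambda^k.$$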
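
The first two estimators only need window counts around each event, which can be read off sorted timestamp lists with a binary search. The sketch below covers the mean intensity and the integrated covariance only (the third-order term follows the same pattern). Function names are illustrative, and boundary effects near $0$ and $T$ are ignored.

```python
from bisect import bisect_right

def count_in_window(times, lo, hi):
    """Number of events of the sorted list `times` in (lo, hi], i.e. N_hi - N_lo."""
    return bisect_right(times, hi) - bisect_right(times, lo)

def mean_intensity(times, T):
    """Estimate of Lambda^i: number of events divided by the observation length."""
    return len(times) / T

def integrated_covariance(times_i, times_j, T, H):
    """Estimate of C^{ij} with half-window H, following the formula above."""
    lam_j = mean_intensity(times_j, T)
    total = 0.0
    for tau in times_i:
        total += count_in_window(times_j, tau - H, tau + H) - 2.0 * H * lam_j
    return total / T

# Illustrative timestamps for two nodes observed on [0, 10].
times_i = [5.0]
times_j = [4.5, 5.2, 9.0]
print(mean_intensity(times_j, 10.0))
print(integrated_covariance(times_i, times_j, 10.0, 1.0))
```

The half-window `H` has to be chosen by the user; it should be large compared with the support of the kernels and small compared with `T`.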


Cumulants matching for MHP

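Defining $R = (I - G)^{-1}$ we can make a link between the cumulants and $G$

$$\Lambda^i \, dt = E(dN_t^i)$$

$$C^{ij} \, dt = \int_{\tau \in (-\infty, +\infty)} \big( E(dN_t^i \, dN_{t+\tau}^j) - E(dN_t^i) E(dN_{t+\tau}^j) \big)$$

$$K^{ijk} \, dt = \iint_{\tau, \tau' \in (-\infty, +\infty)} \big( E(dN_t^i \, dN_{t+\tau}^j \, dN_{t+\tau'}^k) + 2 E(dN_t^i) E(dN_{t+\tau}^j) E(dN_{t+\tau'}^k) - E(dN_t^i \, dN_{t+\tau}^j) E(dN_{t+\tau'}^k) - E(dN_t^i \, dN_{t+\tau'}^k) E(dN_{t+\tau}^j) - E(dN_{t+\tau}^j \, dN_{t+\tau'}^k) E(dN_t^i) \big),$$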


Cumulants matching for MHP

and

$$\Lambda^i = \sum_{m=1}^d R^{im} \mu_m$$

$$C^{ij} = \sum_{m=1}^d \Lambda^m R^{im} R^{jm}$$

$$K^{ijk} = \sum_{m=1}^d \big( R^{im} R^{jm} C^{km} + R^{im} C^{jm} R^{km} + C^{im} R^{jm} R^{km} - 2 \Lambda^m R^{im} R^{jm} R^{km} \big)$$

Why order three and not two?

integrated covariance (order two) contains only symmetric information, and is thus unable to provide causal information

the skewness of the process breaks the symmetry between past and future so as to uniquely fix $G$
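
The three relations above translate directly into code. The sketch below computes $\Lambda$, $C$ and the components $K^{iij}$ from a given $R$ and $\mu$; the function name and the one-dimensional example are assumptions for illustration.

```python
def cumulants_from_R(R, mu):
    """Return (Lam, C, Kc) where Kc[i][j] holds the component K^{iij}."""
    d = len(mu)
    Lam = [sum(R[i][m] * mu[m] for m in range(d)) for i in range(d)]
    C = [[sum(Lam[m] * R[i][m] * R[j][m] for m in range(d)) for j in range(d)]
         for i in range(d)]
    # K^{iij}: the general formula with the first two indices set equal.
    Kc = [[sum(R[i][m] ** 2 * C[j][m]
               + 2.0 * R[i][m] * C[i][m] * R[j][m]
               - 2.0 * Lam[m] * R[i][m] ** 2 * R[j][m]
               for m in range(d))
           for j in range(d)]
          for i in range(d)]
    return Lam, C, Kc

# d = 1 with kernel integral g = 0.5, so R = 1 / (1 - g) = 2, and mu = 1.
Lam, C, Kc = cumulants_from_R([[2.0]], [1.0])
print(Lam, C, Kc)
```

In dimension one the output can be checked by hand: $\Lambda = \mu R$, $C = \Lambda R^2$ and $K = 3R^2 C - 2\Lambda R^3$.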


Cumulants matching for MHP

Our algorithm [NPHC: Non Parametric Hawkes Cumulant]

Compute estimators of $d^2$ third-order cumulant components $\{K^{iij}\}_{1 \le i,j \le d}$ (not $d^3$!). Put them in $\hat K^c$

Find

$$\hat R \in \operatorname{argmin}_R \|K^c(R) - \hat K^c\|_2^2$$

using a first-order stochastic gradient descent algorithm (AdaGrad in our case)

Set

$$\hat G = I - \hat R^{-1}$$
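
The last step of the algorithm, recovering $G$ from an estimate of $R$, is a single matrix inversion. The sketch below uses a small Gauss-Jordan elimination to stay dependency-free; in practice a linear algebra library would be used. The names `invert` and `G_from_R` and the example matrix are hypothetical, and the optimization step itself is not reproduced.

```python
def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def G_from_R(R):
    """Return G = I - R^{-1}."""
    n = len(R)
    R_inv = invert(R)
    return [[(1.0 if i == j else 0.0) - R_inv[i][j] for j in range(n)]
            for i in range(n)]

# Round trip: start from G, form R = (I - G)^{-1}, then recover G.
G = [[0.2, 0.1], [0.0, 0.3]]
R = invert([[1.0 - 0.2, -0.1], [0.0, 1.0 - 0.3]])
print(G_from_R(R))
```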


Cumulants matching for MHP

A toy example

[Figure panels: (a) HMLE with β = β2; (b) HMLE with β = β1; (c) NPHC; (d) true G]


Cumulants matching for MHP

Experiments with MemeTracker dataset

keep the 100 most active sites

contains publication times of articles in many websites/blogs, with hyperlinks

roughly 8 million events

Use hyperlinks to establish an estimated ground truth for the matrix G


Cumulants matching for MHP

Metrics

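Relative Error

$$\mathrm{RelErr}(A, B) = \frac{1}{d^2} \sum_{i,j} \Big( \frac{|a_{ij} - b_{ij}|}{|a_{ij}|} \mathbf{1}_{a_{ij} \ne 0} + |b_{ij}| \mathbf{1}_{a_{ij} = 0} \Big)$$

Mean Kendall Rank Correlation

$$\mathrm{MRankCorr}(A, B) = \frac{1}{d} \sum_{i=1}^d \mathrm{RankCorr}([a_{i\bullet}], [b_{i\bullet}]),$$

where

$$\mathrm{RankCorr}(x, y) = \frac{1}{d(d-1)/2} \big( N_{\mathrm{concordant}}(x, y) - N_{\mathrm{discordant}}(x, y) \big)$$

with $N_{\mathrm{concordant}}(x, y)$ = number of pairs $(i, j)$ such that $x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$, and $N_{\mathrm{discordant}}(x, y)$ defined conversely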
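
Both metrics are short to implement. The sketch below assumes that the relative term of RelErr is only used where the reference entry is non-zero and that tied pairs count as neither concordant nor discordant; both conventions are assumptions of this example. Matrices are plain lists of lists.

```python
def rel_err(A, B):
    """Relative error of B with respect to the reference matrix A."""
    d = len(A)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if A[i][j] != 0:
                total += abs(A[i][j] - B[i][j]) / abs(A[i][j])
            else:
                total += abs(B[i][j])
    return total / d ** 2

def rank_corr(x, y):
    """Kendall rank correlation of two vectors of equal length d >= 2."""
    d = len(x)
    concordant = discordant = 0
    for i in range(d):
        for j in range(i + 1, d):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (d * (d - 1) / 2)

def mean_rank_corr(A, B):
    """Average of the row-wise Kendall rank correlations."""
    return sum(rank_corr(a, b) for a, b in zip(A, B)) / len(A)

A = [[1.0, 0.0], [2.0, 4.0]]
B = [[1.5, 0.1], [2.0, 2.0]]
print(rel_err(A, B), mean_rank_corr(A, B))
```

The pairwise loop is quadratic in $d$ per row, which is fine for $d = 100$ as in the MemeTracker experiment.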


Cumulants matching for MHP

Results on MemeTracker

Comparable relative error (exact values hard to recover), but very strong improvement in rank correlation

Method       HMLE (β1)   HMLE (β2)   HMLE (β3)   NPHC
RelErr       0.153       0.154       0.156       0.147
MRankCorr    0.035       0.032       0.029       0.184

Favourably hand-chosen parameters for parametric Hawkes: β1 = 4.2 · 10^−4, β2 = 8.3 · 10^−4, β3 = 1.6 · 10^−3


Cumulants matching for MHP

Results on Power Law kernels

A numerically very difficult problem

Very strong improvement over the parametric Hawkes with exponential kernels (impossible to fit polynomial kernels on a large dataset)

Method       HMLE (β = β2)   HMLE (β = β1)   NPHC
RelErr       0.200           0.106           0.013
MRankCorr    0.19            0.32            0.47


Thank you!

