Risk Process based on General
Compound Hawkes Process and its
Implementation with Real Data
Gabriela Zeller
University of Calgary
Calgary, Alberta, Canada
'Hawks Seminar' Talk
Dept. of Math. & Stat.
Calgary, Canada
August 07, 2018
Outline of Presentation
• The Classical Risk Model
• Hawkes Process Background
• RMGCHP: Theoretical Results
• RMGCHP: Implementation with Real Data
• Evaluation and Next Steps
• References
The Classical Risk Model
This section gives some mathematical background on the classical risk
model, such as common assumptions and approaches.
The Cramer-Lundberg Model - Background
The classical risk model or compound-Poisson risk model was
introduced in 1903 by Filip Lundberg and has the following form:
$$R(t) = u + ct - \sum_{i=1}^{N_t} Y_i$$
where u denotes the initial capital, c denotes the (continuous)
premium rate and the number of claims in the interval [0, t) is a
homogeneous Poisson process N_t with rate λ.
The claims Y_i are i.i.d. positive random variables with distribution
function G and mean µ_G, independent of (N_t).
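The model above can be simulated directly. The following is a minimal sketch, assuming exponentially distributed claims (the model itself allows any G); the function names and the Monte Carlo estimator are illustrative and not part of the talk.

```python
import random

def simulate_ruin(u, c, lam, claim_mean, horizon, rng):
    """Simulate one path of R(t) = u + c*t - (sum of claims) on [0, horizon].

    Returns (ruined, ruin_time); ruin_time is None if no ruin occurs.
    Assumes c >= 0 and Exp-distributed claims with mean claim_mean.
    """
    t = 0.0
    total_claims = 0.0
    while True:
        t += rng.expovariate(lam)              # Exp(lam) gap to the next claim
        if t > horizon:
            return False, None
        total_claims += rng.expovariate(1.0 / claim_mean)
        # Between claims R(t) does not decrease, so ruin can only occur at a claim.
        if u + c * t - total_claims < 0:
            return True, t

def estimate_ruin_probability(u, c, lam, claim_mean, horizon, n_paths, seed=0):
    """Fraction of simulated paths that are ruined before the horizon."""
    rng = random.Random(seed)
    ruins = sum(simulate_ruin(u, c, lam, claim_mean, horizon, rng)[0]
                for _ in range(n_paths))
    return ruins / n_paths
```

A call such as `estimate_ruin_probability(5.0, 1.2, 1.0, 1.0, 50.0, 10000)` gives a crude finite-horizon estimate; its accuracy depends on the number of paths.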
The Cramer-Lundberg Model - Computations
Quantities of interest that have been extensively studied:
• Ruin time: τ = inf{t > 0 : R(t) < 0}, where inf ∅ = ∞
• Probability of ruin up to time t:
$$\Phi(u, t) = P(\tau \le t \mid R(0) = u) = P\Big(\inf_{0 < s \le t} R(s) < 0\Big)$$
• Probability of ultimate ruin:
$$\Phi(u) = \lim_{t \to \infty} \Phi(u, t) = P\Big(\inf_{t > 0} R(t) < 0\Big)$$
For practical purposes, it is sometimes more convenient to use
the survival probability δ(u) = 1 − Φ(u).
• Severity of ruin: |R(τ)|
• Distribution of the time to ruin (given ruin): P(τ < t | τ < ∞)
• Net Profit Condition:
When is the mean income strictly larger than the mean outflow?
• Premium Principle:
How should the premium be set against the insurance risk?
The Cramer-Lundberg Model - Net Profit Condition
The first interest of an insurance company is to avoid a state
where ruin occurs with probability 1.
• If c < λµ_G, ruin is unavoidable, that is Φ(u) = 1.
• If c = λµ_G, ruin also occurs with probability 1.
(This result requires some deep theory of random walks, see e.g.
Mikosch (2000).)
• Thus, the obvious condition for solvency of the company is
the net profit condition c > λµ_G, which is met by setting the premium
rate to c = (1+θ)λµ_G, where θ > 0 is called the safety loading.
The Cramer-Lundberg Model - Ruin Probability
The following is known for the survival probability (see e.g.
Asmussen and Albrecher (2010)):
Theorem 1 The survival probability δ(u) is continuous and
differentiable everywhere (except on the countable set where G is not
continuous) and satisfies the following integro-differential equation:
$$c\,\delta'(u) = \lambda \Big[ \delta(u) - \int_0^u \delta(u - y)\, dG(y) \Big] \qquad (1)$$
For a general distribution G, equation (1) is not analytically solvable.
The Cramer-Lundberg Model - Ruin Probability
However, for claims following an Exp(γ) distribution, the survival
(resp. ruin) probability can be explicitly computed as:
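$$\delta(u) = 1 - \frac{\lambda}{c\gamma}\, e^{(\lambda/c - \gamma)u} \quad \text{and} \quad \Phi(u) = \frac{\lambda}{c\gamma}\, e^{(\lambda/c - \gamma)u}$$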
This convenience is one of the reasons why claims are often
assumed to be i.i.d. Exp(γ) distributed for insurance applications
(for the light-tailed case).
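As a quick numerical companion to the closed form above, the sketch below evaluates Φ(u) for Exp(γ) claims. The function name is illustrative, and returning 1 when the net profit condition fails is the convention stated earlier in this section.

```python
import math

def ruin_probability_exp(u, c, lam, gamma):
    """Ultimate ruin probability for Exp(gamma) claims (mean 1/gamma)."""
    if c <= lam / gamma:          # net profit condition c > lam * mu_G violated
        return 1.0
    return lam / (c * gamma) * math.exp((lam / c - gamma) * u)
```

With c = (1+θ)λ/γ the value at u = 0 is 1/(1+θ), so a safety loading of 25% still leaves a ruin probability of 0.8 for a company that starts without capital.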
The Cramer-Lundberg Model - Ruin Probability
As equation (1) cannot be solved analytically for the general
case, this has motivated seeking bounds for the ruin probability.
To this end, Lundberg introduced the adjustment coefficient
R > 0 as the unique positive root of
$$h(r) = 0, \qquad h(r) = \lambda (M_Y(r) - 1) - cr,$$
where M_Y(r) = E[e^{rY_1}] is the m.g.f. of (Y_i), which is assumed to
be finite for all 0 < r < γ ≤ ∞ with lim_{r→γ−} M_Y(r) = ∞.
The intuition here is that for r ∈ ℝ such that M_Y(r) < ∞ we can
show that the process {e^{−rR_t − h(r)t}}_{t≥0} is a martingale.
The Cramer-Lundberg Model - Ruin Probability
The adjustment coefficient is connected to the ruin probability
through the following theorem (see e.g. Asmussen and Albrecher
(2010)):
Theorem 2 (Lundberg Inequality)
$$\Phi(u) \le e^{-Ru}, \quad \forall u \ge 0$$
where R is the adjustment coefficient.
For the special case of Exp(γ)-distributed claim sizes, R = γ − λ/c
and the inequality gives
$$\Phi(u) \le e^{(\lambda/c - \gamma)u}$$
which decays in u exactly when the net profit condition c > λ/γ
holds in this case.
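For claim distributions without a closed-form root, R can be found numerically. The sketch below uses plain bisection on h(r); the function name, the `r_upper` argument and the example parameters are assumptions made for illustration.

```python
def adjustment_coefficient(mgf, lam, c, r_upper, iterations=200):
    """Bisection for the positive root R of h(r) = lam*(mgf(r) - 1) - c*r.

    Assumes the net profit condition, so that h < 0 on (0, R) and h > 0 on
    (R, r_upper]; r_upper must be chosen with h(r_upper) > 0.
    """
    def h(r):
        return lam * (mgf(r) - 1.0) - c * r

    lo, hi = 0.0, r_upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies at or to the left of mid
    return 0.5 * (lo + hi)

# Example with Exp(gamma) claims, where M_Y(r) = gamma / (gamma - r) for r < gamma.
gamma, lam, c = 1.0, 1.0, 2.0
R = adjustment_coefficient(lambda r: gamma / (gamma - r), lam, c,
                           r_upper=0.999 * gamma)
```

In this example the numerical root agrees with the closed form γ − λ/c = 0.5.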
The Cramer-Lundberg Model - Ruin Probability
Using the adjustment coefficient, we can furthermore describe
the asymptotic behavior of the ruin probability.
Theorem 3 Assume that the adjustment coefficient R exists
and that
$$\frac{\lambda}{c} \int_0^\infty x\, e^{Rx} (1 - G(x))\, dx < \infty$$
Then
$$\lim_{u \to \infty} \Phi(u)\, e^{Ru} = \frac{c - \lambda \mu_G}{\lambda M_Y'(R) - c}$$
The Cramer-Lundberg Model - Diffusion Approximation
For a general distribution function, ruin probabilities can be
estimated using the following diffusion approximation.
Proposition 1 Let (R^{(n)}_t) be a sequence of Cramer-Lundberg
processes with initial capital u, claim arrival intensities λ^{(n)} = nλ,
claim size distributions G^{(n)}(x) = G(√n x) and premium rates
$$c^{(n)} = \Big(1 + \frac{c - \lambda\mu_G}{\lambda\mu_G \sqrt{n}}\Big) \lambda^{(n)} \mu_G^{(n)} = c + (\sqrt{n} - 1)\lambda\mu_G$$
Let µ_G = ∫_0^∞ y dG(y) and assume that µ_{2,G} = ∫_0^∞ y² dG(y) < ∞.
Then
$$R^{(n)}_t \xrightarrow{d} u + W_t$$
where (W_t) is a (c − λµ_G, λµ_{2,G})-Brownian motion and the
convergence is in distribution in the topology of uniform convergence
on finite intervals.
The Cramer-Lundberg Model - Diffusion Approximation
Let τ^{(n)} denote the ruin time of (R^{(n)}_t) and
τ = inf{t ≥ 0 : u + W_t < 0} the ruin time of the diffusion limit.
Proposition 2 Let (R^{(n)}_t) and (W_t) be as above. Then
$$\lim_{n \to \infty} P[\tau^{(n)} \le t] = P[\tau \le t]$$
and
$$\lim_{n \to \infty} P[\tau^{(n)} < \infty] = P[\tau < \infty]$$
The Cramer-Lundberg Model - Diffusion Approximation
The idea is to approximate P[τ^{(1)} ≤ t] by P[τ ≤ t] (finite-horizon
ruin probability) and P[τ^{(1)} < ∞] by P[τ < ∞] (infinite-horizon ruin
probability).
Thus, we use the following result for the Brownian motion, where Φ
denotes the standard normal cdf (not the ruin probability):
Proposition 3 Let (W_t) be a (m, σ²)-Brownian motion with
m > 0 and τ = inf{t ≥ 0 : u + W_t < 0}. Then
$$P[\tau < \infty] = e^{-2um/\sigma^2}$$
and
$$P[\tau \le t] = 1 - \Phi\Big(\frac{mt + u}{\sigma\sqrt{t}}\Big) + e^{-2um/\sigma^2}\, \Phi\Big(\frac{mt - u}{\sigma\sqrt{t}}\Big)$$
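The two formulas of Proposition 3 are easy to evaluate with the error function. This is a small sketch with illustrative function names; for the Cramer-Lundberg approximation one would plug in m = c − λµ_G and σ² = λµ_{2,G}.

```python
import math

def std_normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bm_ruin_ultimate(u, m, sigma2):
    """P[tau < infinity] for a (m, sigma2)-Brownian motion started at u, m > 0."""
    return math.exp(-2.0 * u * m / sigma2)

def bm_ruin_finite(u, m, sigma2, t):
    """P[tau <= t] for a (m, sigma2)-Brownian motion started at u."""
    s = math.sqrt(sigma2 * t)
    return (1.0 - std_normal_cdf((m * t + u) / s)
            + bm_ruin_ultimate(u, m, sigma2) * std_normal_cdf((m * t - u) / s))
```

As a sanity check, the finite-horizon value equals 1 for u = 0 and approaches the ultimate ruin probability as t grows.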
The Cramer-Lundberg Model - Extensions
The Cramer-Lundberg risk model is a convenient foundation for
classical risk theory, but it does not reflect the dependencies of
incoming claims in reality. Thus, several extensions have been
proposed:
The Sparre-Andersen model / renewal risk model introduced in
1957 replaces the Poisson process with a general renewal process
N_t, thus allowing for claim inter-arrival times with arbitrary
distribution functions. However, the lengths of the intervals
between subsequent arrivals stay independent.
More recently, risk models using a Cox process (with Poisson shot
noise) have been studied as they incorporate time-dependent
intensity influenced by exogenously caused jumps ("shocks",
environmental factors).
The Cramer-Lundberg Model - Extensions
However, it has been observed that effects like contagion and
clustering in financial contexts are often caused endogenously.
Therefore, Hawkes processes have gained attention due to their
ability to reflect endogenously caused jumps of the intensity
function.
Hawkes processes have e.g. been successfully applied to construct
stock price models including financial contagion, to model
mid-price changes in high-frequency trading and in order book
flow modelling (Bacry (2015) gives a good overview).
Hawkes Processes
This section gives an introduction to Hawkes processes.
First, we give the definition, explain the conditional intensity function and
highlight the immigration-birth representation of a Hawkes process.
We then focus on the special case of a Hawkes process with exponentially
decaying intensity and give some properties of the number of jumps of such
a process over a fixed interval.
The last part of this section focuses on simulation, parameter estimation
and goodness of fit testing of a Hawkes model.
Hawkes Process: Counting Process
The Hawkes process introduced by Hawkes (1971) is a simple
point process that can model a sequence of arrivals over time,
e.g. trade orders, bank defaults or incoming claims.
The counting process N(t) refers to the number of arrivals over
time, and the corresponding point process (t1, t2, ...) refers to the
arrival times (thus the "jumps" of N(t)).
Consider a counting process (N(t) : t ≥ 0) with history
(H(t) : t ≥ 0) that satisfies:
$$P(N(t+h) - N(t) = m \mid H(t)) = \begin{cases} \lambda^*(t)h + o(h), & m = 1 \\ o(h), & m > 1 \\ 1 - \lambda^*(t)h + o(h), & m = 0 \end{cases}$$
Hawkes Process: Conditional Intensity Function
The Hawkes process is self-exciting and exhibits clustering
and long memory. This is reflected in the conditional intensity
function:
$$\lambda^*(t) = \lambda + \int_0^t \mu(t - s)\, dN(s)$$
where λ > 0 is the background intensity and µ(·) > 0 is the
excitation function describing how much the intensity is affected by
past jumps.
Using the observed sequence of arrival times (t_1, t_2, ..., t_k) up to
time t, the conditional intensity can be written as:
$$\lambda^*(t) = \lambda + \sum_{t_i < t} \mu(t - t_i)$$
Hawkes Process: Conditional Intensity Function
We can see that each new arrival causes the conditional intensity
function to jump up instantaneously, then decay back towards the
background intensity until it jumps again at the next arrival. The
intensity depends on the whole past history of the process.
[Figure placeholder]
Hawkes Process: Conditional Intensity Function
Of course we can analogously define a multi-dimensional Hawkes
process:
Definition 1 A Hawkes process is a counting process N_t such
that the intensity vector can be written as
$$\lambda^i_t = \mu^i + \sum_{j=1}^{D} \int \phi^{ij}(t - t')\, dN^j_{t'} \qquad (2)$$
where the quantity µ = {µ^i}_{i=1}^{D} is a vector of exogenous
intensities and Φ(t) = {φ^{ij}(t)}_{i,j=1}^{D} is a matrix kernel which is
component-wise positive (φ^{ij}(t) ≥ 0 for each 1 ≤ i, j ≤ D),
component-wise causal (if t < 0, φ^{ij}(t) = 0 for each 1 ≤ i, j ≤ D)
and each component φ^{ij}(t) belongs to the space of L¹-integrable
functions.
Hawkes Process: Immigration-Birth-Representation
A Hawkes process X can be represented as a Poisson cluster
process with the following structure:
a) Immigrants (cluster centers) are distributed according to a
homogeneous Poisson process I with points X_i ∈ (0, ∞) and
intensity λ > 0.
b) Each immigrant (of generation 0) X_i generates a cluster
C_i = C_{X_i}, a random set with the following branching structure:
Given generations 0, 1, ..., n in C_i, each Y ∈ C_i (of generation n)
generates a Poisson process on (Y, ∞) of offspring (of generation
n + 1) with intensity function µ(· − Y), where µ : (0, ∞) → [0, ∞)
is a non-negative Borel function called fertility rate.
c) Given the immigrants, the centered clusters
C_i − X_i = {Y − X_i : Y ∈ C_i} for X_i ∈ I are i.i.d. and independent
of I.
d) X is the union of all clusters ⋃_i C_i.
Hawkes Process: Immigration-Birth-Representation
Let 0 < n := ∫_0^∞ µ(s) ds < 1.
If an immigrant enters the system at time t_i ∈ ℝ, they produce
offspring at rate µ(t − t_i) at future times t > t_i. Their offspring
(called first generation) again produce offspring (second
generation) and so on; members of all generations are called
descendants of the original arrival.
Let Z_i denote the random number of offspring in the i-th
generation (where Z_0 = 1 denotes the immigrant).
Then E[Z_i] = n^i and the expected number of descendants for
one immigrant is
$$E\Big[\sum_{i=1}^{\infty} Z_i\Big] = \sum_{i=1}^{\infty} E[Z_i] = \sum_{i=1}^{\infty} n^i = \begin{cases} \dfrac{n}{1-n}, & n < 1 \\ \infty, & n \ge 1 \end{cases}$$
Hawkes Process: Immigration-Birth-Representation
We can see that the condition n < 1 avoids explosion of the
process (where one immigrant would be expected to generate
infinitely many descendants).
Most properties of the Hawkes process rely on this so-called
stationarity, which we will always assume to hold from now on.
Note that for n ∈ (0,1), the branching ratio n can be interpreted
as the probability that a random arrival was generated
endogenously (a child), as we can look at the ratio of descendants over
the whole "family" (descendants + original immigrant):
$$\frac{E\big[\sum_{i=1}^{\infty} Z_i\big]}{1 + E\big[\sum_{i=1}^{\infty} Z_i\big]} = \frac{\frac{n}{1-n}}{1 + \frac{n}{1-n}} = n$$
Hawkes Process: Immigration-Birth-Representation
The immigration-birth representation is interesting for us due to
its interpretation in the risk model context:
We observe standard claims which arrive according to the points
of I and trigger other claims according to the branching struc-
ture described before.
The fertility rate is monotonically decreasing, so the process has a
self-exciting structure.
This risk process is closely related to the shot-noise Cox model
(doubly stochastic Poisson model) studied e.g. in Albrecher
(2006), with the main difference that the other model incorporates
time-dependent intensity influenced by exogenously caused
jumps ("shocks", environmental factors) as opposed to
endogenously caused clustering in the Hawkes model.
Hawkes Process: Compensator
Definition 2 (Compensator) For a counting process N(·) the
non-decreasing function
$$\Lambda(t) = \int_0^t \lambda^*(s)\, ds$$
is called the compensator of the process.
Note that M(t) := N(t) − Λ(t) is a local H(t)-martingale.
Exponential Hawkes Process: Definition
We now want to make a concrete choice for the excitation function
of the Hawkes process, where we choose the most commonly
used exponential decay, thus µ(t) = αe^{−βt}.
This might seem restrictive (and it is), but until now (almost)
all applications of Hawkes processes use this specification as it
simplifies many theoretical derivations, as we will see in the
following. Thus, from now on our conditional intensity function
will be:
$$\lambda^*(t) = \lambda + \alpha \sum_{t_i < t} e^{-\beta(t - t_i)}$$
Thus, each arrival makes the intensity immediately jump up by
α and over time the impact of the arrival decays exponentially
at rate β.
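To make the jump-and-decay behavior concrete, here is a minimal sketch that evaluates the exponential conditional intensity for a given list of arrival times. The function name and the sample values are illustrative only.

```python
import math

def exp_hawkes_intensity(t, arrivals, lam, alpha, beta):
    """lambda*(t) = lam + alpha * sum over t_i < t of exp(-beta * (t - t_i))."""
    return lam + alpha * sum(math.exp(-beta * (t - ti))
                             for ti in arrivals if ti < t)

# Hypothetical arrival times and parameters.
arrivals = [1.0, 2.0]
before = exp_hawkes_intensity(0.5, arrivals, lam=0.2, alpha=0.5, beta=1.0)
later = exp_hawkes_intensity(3.0, arrivals, lam=0.2, alpha=0.5, beta=1.0)
```

Before the first arrival the intensity equals the background rate; afterwards each past arrival contributes α damped by its age.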
Exponential Hawkes Process: Definition
For the special case of exponentially decaying intensity, given
an initial condition λ*(0) = λ_0, the conditional intensity process
satisfies the SDE
$$d\lambda^*(t) = \beta(\lambda - \lambda^*(t))\, dt + \alpha\, dN(t) \qquad (3)$$
Also note that the joint process X(t) = (λ*(t), N(t)) is a Markov
process on the state space D = ℝ₊ × ℕ, which allows Da Fonseca
et al. (2014) to derive useful results about properties of
the number of jumps of a Hawkes process over a fixed interval,
described in the following.
Exponential Hawkes Process: Properties
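Proposition 4 Given a Hawkes process X(t) = (λ*(t), N(t)) with
dynamics given by (3), the long-run expected value of the number
of jumps during an interval of length τ is given by:
$$\lim_{t \to \infty} E[N(t+\tau) - N(t)] = \frac{\lambda}{1 - \alpha/\beta}\, \tau = \lim_{t \to \infty} E[\lambda^*(t)]\, \tau \qquad (4)$$
The variance is given by:
$$V(\tau) = \lim_{t \to \infty} E[(N(t+\tau) - N(t))^2] - E[N(t+\tau) - N(t)]^2 \qquad (5)$$
$$= \frac{\lambda}{1 - \alpha/\beta} \Big( \tau \Big(\frac{1}{1 - \alpha/\beta}\Big)^2 + \Big(1 - \Big(\frac{1}{1 - \alpha/\beta}\Big)^2\Big) \frac{1 - e^{-\tau(\beta - \alpha)}}{\beta - \alpha} \Big) \qquad (6)$$
The covariance for two non-overlapping intervals of length τ with
lag δ > 0 is given by:
$$\mathrm{Cov}(\tau, \delta) = \lim_{t \to \infty} E[(N(t+\tau) - N(t))(N(t+2\tau+\delta) - N(t+\tau+\delta))] \qquad (7)$$
$$\qquad - E[N(t+\tau) - N(t)]\, E[N(t+2\tau+\delta) - N(t+\tau+\delta)] \qquad (8)$$
$$= \frac{\lambda\beta\alpha(2\beta - \alpha)(e^{(\alpha-\beta)\tau} - 1)^2}{2(\alpha - \beta)^4}\, e^{(\alpha-\beta)\delta} \qquad (9)$$
Note that taking the limit for t → ∞ (thus putting the process
into its long-run stationary regime) to remove the dependence on
the initial value λ_0 again requires the stability of the
process, which for exponential intensity means
∫_0^∞ αe^{−βs} ds = α/β < 1.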
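Formulas (4), (6) and (9) translate directly into code. The sketch below uses illustrative function names and assumes α < β; dividing the covariance by the variance gives the autocorrelation of jump counts over intervals of length τ.

```python
import math

def count_mean(lam, alpha, beta, tau):
    """Long-run expected number of jumps in an interval of length tau, eq. (4)."""
    return lam / (1.0 - alpha / beta) * tau

def count_variance(lam, alpha, beta, tau):
    """Long-run variance of the number of jumps, eq. (6)."""
    k = 1.0 / (1.0 - alpha / beta)
    decay = -math.expm1(-tau * (beta - alpha)) / (beta - alpha)
    return lam * k * (tau * k ** 2 + (1.0 - k ** 2) * decay)

def count_covariance(lam, alpha, beta, tau, delta):
    """Long-run covariance for two intervals of length tau with lag delta, eq. (9)."""
    e1 = math.expm1((alpha - beta) * tau)
    return (lam * beta * alpha * (2.0 * beta - alpha) * e1 ** 2
            / (2.0 * (alpha - beta) ** 4) * math.exp((alpha - beta) * delta))

def count_autocorrelation(lam, alpha, beta, tau, delta):
    """Autocorrelation of jump counts as covariance over variance."""
    return (count_covariance(lam, alpha, beta, tau, delta)
            / count_variance(lam, alpha, beta, tau))
```

For α = 0 the process is Poisson and the variance reduces to λτ, which is a convenient check of the implementation.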
Exponential Hawkes Process: Properties
Proposition 5 A direct consequence of the last result is the
autocorrelation function of the number of jumps over intervals
of length τ separated by a time lag of δ:
$$\mathrm{Acf}(\tau, \delta) = \frac{e^{-2\beta\tau}(e^{\alpha\tau} - e^{\beta\tau})^2\, \alpha(\alpha - 2\beta)}{2\big(\alpha(\alpha - 2\beta)(e^{(\alpha-\beta)\tau} - 1) + \beta^2\tau(\alpha - \beta)\big)}\, e^{(\alpha-\beta)\delta} \qquad (10)$$
Note that this expression is always positive for α < β (stationarity
condition) and it is exponentially decaying with the lag δ.
Exponential Hawkes Process: Simulation
In order to simulate a Hawkes process, there are different
possible methods. We elect to use the modified thinning algorithm
introduced in Ogata (1981) and described again in Laub et al.
(2015).
The original thinning algorithm was used by Lewis et al. (1979)
to simulate a non-homogeneous Poisson process with time-dependent
rate λ(t) by generating a homogeneous Poisson process with rate
M > λ(·) and probabilistically removing points so that the re-
maining points satisfy the intensity λ(t).
An analogous approach that requires updating the upper bound
M during the simulation can be used to simulate a Hawkes pro-
cess.
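The following is a minimal sketch of thinning for the exponential kernel, written from the description above rather than copied from Ogata (1981) or Laub et al. (2015). It relies on the fact that the intensity only decays between arrivals, so its current value is a valid upper bound M until the next accepted point. Function and variable names are illustrative.

```python
import math
import random

def simulate_exp_hawkes(lam, alpha, beta, horizon, rng):
    """Simulate arrival times on (0, horizon] of a Hawkes process with
    intensity lam + alpha * sum exp(-beta * (t - t_i)), by thinning."""
    arrivals = []
    t = 0.0
    excitation = 0.0                      # alpha * sum exp(-beta*(t - t_i)) at time t
    while True:
        upper = lam + excitation          # bound M: intensity cannot exceed this
        w = rng.expovariate(upper)        # candidate gap from a rate-M Poisson process
        excitation *= math.exp(-beta * w) # decay the excitation to the candidate time
        t += w
        if t > horizon:
            return arrivals
        # Accept the candidate with probability lambda*(t) / M.
        if rng.random() * upper <= lam + excitation:
            arrivals.append(t)
            excitation += alpha           # each accepted arrival raises the intensity
```

For example, `simulate_exp_hawkes(0.5, 0.8, 1.0, 200.0, random.Random(1))` returns an increasing list of arrival times; with α = 0 the routine reduces to simulating a homogeneous Poisson process.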
Exponential Hawkes Process: Parameter Estimation
Given a set of arrival times (t_1, t_2, ..., t_k) assumed to come from
a Hawkes process, we would like to generate parameter estimates
(λ, α, β) by the method of maximum likelihood estimation. We
start by citing the result from Daley, Vere-Jones (2003):
Theorem 4 (Hawkes Process Likelihood) Let N(·) be a regular
point process on [0, T] for some finite T > 0 and let t_1, ..., t_k
be a realisation of N(·) over [0, T]. Then the likelihood L of N(·)
is expressible in the form
$$L = \Big(\prod_{i=1}^{k} \lambda^*(t_i)\Big) \exp\Big(-\int_0^T \lambda^*(s)\, ds\Big) \qquad (11)$$
Exponential Hawkes Process: Parameter Estimation
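Given the likelihood function from (11), the log-likelihood for
the interval [0, t_k] is given as
$$l = \sum_{i=1}^{k} \log(\lambda^*(t_i)) - \int_0^{t_k} \lambda^*(s)\, ds = \sum_{i=1}^{k} \log(\lambda^*(t_i)) - \Lambda(t_k) \qquad (12)$$
For a Hawkes process with exponential decay, the compensator
can be explicitly computed as
$$\Lambda(t_k) = \lambda t_k - \frac{\alpha}{\beta} \sum_{i=1}^{k} \big[e^{-\beta(t_k - t_i)} - 1\big] \qquad (13)$$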
Exponential Hawkes Process: Parameter Estimation
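In order to make the computation feasible, the term
$$A(i) = \begin{cases} 0, & i = 1 \\ \sum_{j=1}^{i-1} e^{-\beta(t_i - t_j)} = e^{-\beta(t_i - t_{i-1})}(1 + A(i-1)), & i \in \{2, \dots, k\} \end{cases}$$
is introduced, such that the log-likelihood (12) of a Hawkes
process with exponentially decaying intensity is given by
$$l = \sum_{i=1}^{k} \log(\lambda + \alpha A(i)) - \lambda t_k + \frac{\alpha}{\beta} \sum_{i=1}^{k} \big[e^{-\beta(t_k - t_i)} - 1\big] \qquad (14)$$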
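The recursion for A(i) gives a single-pass evaluation of (14). The sketch below is illustrative (the function name is an assumption); its output could be passed, negated, to any numerical optimizer to obtain the maximum likelihood estimates.

```python
import math

def exp_hawkes_loglik(arrivals, lam, alpha, beta):
    """Log-likelihood (14) on [0, t_k] for sorted arrival times t_1 < ... < t_k."""
    t_end = arrivals[-1]
    a = 0.0                  # A(1) = 0
    loglik = 0.0
    prev = None
    for t in arrivals:
        if prev is not None:
            # A(i) = exp(-beta * (t_i - t_{i-1})) * (1 + A(i-1))
            a = math.exp(-beta * (t - prev)) * (1.0 + a)
        loglik += math.log(lam + alpha * a)
        prev = t
    # Closed-form compensator (13) evaluated at t_k.
    compensator = lam * t_end - alpha / beta * sum(
        math.exp(-beta * (t_end - t)) - 1.0 for t in arrivals)
    return loglik - compensator
```

For α = 0 the expression collapses to the Poisson log-likelihood k log λ − λ t_k, which gives a simple correctness check.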
Exponential Hawkes Process: Parameter Estimation
In general, maximum likelihood estimation is very effective;
its consistency, asymptotic normality and efficiency
were proved in Ogata (1978).
However, for real datasets there are several significant challenges
such as bias for small sample sizes, a high number of local optima
and O(k) complexity for large sample sizes, which made Filimonov
et al. (2013) state that
"Our overall conclusion is that calibrating the Hawkes process
is akin to an excursion within a minefield
that requires expert and careful testing before any conclusive
step can be taken."
Exponential Hawkes Process: Goodness of Model Fit
After estimating parameters, the next important step is assessing
the goodness of fit of the Hawkes process model for real data.
An essential result for this is the following theorem from e.g.
Brown et al. (2002).
Theorem 5 (Random Time Change Theorem) Let {t_1, t_2, ..., t_k}
be a realisation over time [0, T] from a point process with
conditional intensity function λ*(·). If λ*(·) is positive over [0, T]
and Λ(T) < ∞ a.s., then the transformed points
{t*_1, ..., t*_k} = {Λ(t_1), ..., Λ(t_k)} form a Poisson process with
unit rate.
As we know the closed form of the compensator (13), we can
test the quality of the parameter estimation by transforming the
original time points and performing standard goodness of fit tests
for a unit rate Poisson process on the transformed data points.
Exponential Hawkes Process: Goodness of Model Fit
As suggested in Laub et al. (2015), we use the following steps:
• QQ-plot of transformed interarrival times {t*_1, t*_2 − t*_1, ...}
against the Exp(1)-distribution
• Independence plot of the points (U_{i+1}, U_i), where
U_i = F(t*_i − t*_{i−1}) = 1 − e^{−(t*_i − t*_{i−1})}
• Checking for autocorrelation in the sequence of transformed
interarrival times
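The steps above can be sketched as follows. The code computes the transformed interarrival times via the compensator (13) and the pairs (U_{i+1}, U_i); the Kolmogorov-Smirnov distance to Exp(1) is added here as a numerical stand-in for the QQ-plot and is not part of the procedure described in the talk. All function names are illustrative.

```python
import math

def exp_hawkes_compensator(t, arrivals, lam, alpha, beta):
    """Lambda(t) from (13), summing over the arrivals t_i <= t."""
    return lam * t - alpha / beta * sum(
        math.exp(-beta * (t - ti)) - 1.0 for ti in arrivals if ti <= t)

def transformed_gaps(arrivals, lam, alpha, beta):
    """Interarrival times of the transformed points t*_i = Lambda(t_i)."""
    transformed = [exp_hawkes_compensator(t, arrivals, lam, alpha, beta)
                   for t in arrivals]
    return [b - a for a, b in zip([0.0] + transformed, transformed)]

def independence_pairs(gaps):
    """Points (U_{i+1}, U_i) with U_i = 1 - exp(-gap_i)."""
    u = [1.0 - math.exp(-g) for g in gaps]
    return list(zip(u[1:], u[:-1]))

def ks_distance_exp1(gaps):
    """Kolmogorov-Smirnov distance between the gaps and the Exp(1) cdf."""
    xs = sorted(gaps)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 1.0 - math.exp(-x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

Under a well-fitting model the gaps should look like an Exp(1) sample and the pairs should scatter uniformly over the unit square.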
Risk Model with Hawkes Process and RMGCHP
Now we combine the first two sections by studying a risk model where
claims arrive according to a Hawkes process.
First, we mention some results obtained in this area by past work. We then
introduce the Risk Model with General Compound Hawkes Processes
(RMGCHP) suggested by Swishchuk (2017) and the theoretical results
obtained for it (Law of Large Numbers and Functional Central Limit
Theorem).
Risk Model with Hawkes Process
The first work to consider a risk model with Hawkes claims
arrivals was Stabile et al. (2010), who derive the asymptotic
behavior of infinite and finite horizon ruin probabilities and
asymptotically efficient simulation laws, using the fact that the
compound Hawkes process fulfils a large deviation principle and
assuming light-tailed claims.
Their work was extended by Zhu (2013), who considered
(subexponential) heavy-tailed claims.
Risk Model with Hawkes Process
Dassios and Zhao (2012) consider a risk process with the arrival
of claims modelled by a dynamic contagion process, generalising
the Hawkes process and the Cox process with shot noise intensity
and thus including both self-excited and externally excited jumps.
They derive generalisations of the Cramer-Lundberg inequality,
Lundberg's equation, some asymptotics as well as bounds for the
probability of ruin, with special attention on the case of
exponential jumps.
Dassios and Jang (2012) study a bivariate shot noise self-exciting
process for insurance, including a constant rate of exponential
decay that could be interpreted as the time value of money.
They derive theoretical distributional properties and use numer-
ical examples to show that this point process could be used for
the modelling of discounted aggregate losses from catastrophic
events.
Risk Model with Hawkes Process
Cheng and Seol (2018) derive diffusion approximations and thus
expressions for the ruin probabilities of risk models with Hawkes
claims arrivals, providing numerical examples for exponential and
Gamma-distributed jumps.
They construct the diffusion approximation analogously to the
classical case and find that the diffusion limit is a Gaussian
process that can be decomposed into a centered Gaussian process
and an independent Brownian motion.
Contrary to the classical case, the variance function of the
diffusion limit is nonlinear in t in general, and can be computed
explicitly for a Hawkes process with exponential decay.
Risk Model Based On General Compound Hawkes Process
As a generalisation of the classical risk model, Swishchuk (2017)
proposes a risk model with general compound Hawkes process
(RMGCHP):
$$R(t) = u + ct - \sum_{k=1}^{N_t} a(X_k) \qquad (15)$$
where u denotes the initial capital, c denotes the (continuous)
premium rate and the number of claims in the interval [0, t) is a
Hawkes process N_t.
The claim sizes X_k follow a continuous-time Markov chain on
state space X = {1, ..., n} independent of N_t and a(x) is a
continuous and bounded function on X. Special cases of this RMGCHP
would be a(X_k) := X_k following a Markov chain and a(X_k) := X_k
i.i.d.
RMGCHP: Theoretical Results
The first important result of Swishchuk (2017) is a Law of Large
Numbers for the RMGCHP:
Theorem 6 (LLN) Let R(t) be the risk model defined in (15),
and let X_k be a Markov chain with state space X and stationary
probabilities π*_k. We suppose that 0 < µ = ∫_0^∞ µ(s) ds < 1. Then
$$\lim_{t \to \infty} \frac{R(t)}{t} = c - a^* \frac{\lambda}{1 - \mu} \qquad (16)$$
where a* = Σ_{k∈X} a(k) π*_k.
RMGCHP: Theoretical Results
From this Law of Large Numbers follow the net profit condition
and premium principle:
Corollary 1 (Net Profit Condition)
$$c > a^* \frac{\lambda}{1 - \mu} \qquad (17)$$
Corollary 2 (Premium Principle)
$$c = (1 + \theta)\, a^* \frac{\lambda}{1 - \mu} \qquad (18)$$
where θ is the safety loading.
RMGCHP: Theoretical Results
The second important result in Swishchuk (2017) is the following
Functional Central Limit Theorem:
Theorem 7 (FCLT) Let R(t) be the risk model defined in (15),
and let X_k be an ergodic Markov chain with stationary probabilities
π*_k. We suppose that 0 < µ = ∫_0^∞ µ(s) ds < 1 and
∫_0^∞ s µ(s) ds < ∞.
Then:
$$\lim_{t \to \infty} \frac{R(t) - (ct - a^* N(t))}{\sqrt{t}} \overset{D}{=} \sigma\, \Phi(0,1)$$
(or in Skorokhod topology (see Skorokhod (1965))):
$$\lim_{n \to \infty} \frac{R(nt) - (cnt - a^* N(nt))}{\sqrt{n}} \overset{D}{=} \sigma W(t) \qquad (19)$$
where Φ(0,1) denotes a standard normal random variable and W(t) is
a standard Wiener process. The constants are defined as follows:
$$\sigma := \sigma^* \sqrt{\lambda/(1 - \mu)}, \qquad (\sigma^*)^2 := \sum_{i \in X} \pi^*_i\, \nu(i)$$
$$a^* := \sum_{i \in X} \pi^*_i\, a(i), \qquad b(i) := a^* - a(i)$$
$$\nu(i) := b(i)^2 + \sum_{j \in X} (g(j) - g(i))^2 P(i,j) - 2 b(i) \sum_{j \in X} (g(j) - g(i)) P(i,j)$$
$$g := (P + \Pi^* - I)^{-1} (b(1), \dots, b(n))^T$$
where P is the transition matrix for X_k and Π* is the matrix of
stationary probabilities of P.
RMGCHP: Theoretical Results
The FCLT allows us to approximate the risk process R(t) by the
jump-diffusion process D(t):
$$R(t) \approx u + ct - a^* N(t) + \sigma W(t) =: u + D(t)$$
where a* and σ are defined as above, N(t) is a Hawkes process
and W(t) is a standard Wiener process.
The rate of approximation is given by
$$E|R(t) - (ct - a^* N(t)) - \sigma W(t)| \le \frac{1}{\sqrt{t}}\, C(c, a^*, \sigma, \lambda, \mu)$$
or
$$E|R(nt) - (cnt - a^* N(nt)) - \sigma W(t)| \le \frac{1}{\sqrt{n}}\, C(c, a^*, \sigma, \lambda, \mu, T)$$
from Swishchuk (2000).
RMGCHP: Theoretical Results
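We use the diffusion approximation to calculate the ruin
probability in a finite time interval and the ultimate ruin probability.
Here Φ denotes the standard normal cdf.
Theorem 8 (Finite horizon ruin probability)
$$\Psi(u, \tau) = \Phi\Big(-\frac{u + (c - a^*\lambda/(1-\mu))\tau}{\sigma\sqrt{\tau}}\Big) + e^{-\frac{2(c - a^*\lambda/(1-\mu))}{\sigma^2} u}\, \Phi\Big(-\frac{u - (c - a^*\lambda/(1-\mu))\tau}{\sigma\sqrt{\tau}}\Big)$$
Theorem 9 (Ultimate ruin probability)
$$\Psi(u) = e^{-\frac{2(c - a^*\lambda/(1-\mu))}{\sigma^2} u}$$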
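The premium principle (18) and the two ruin formulas above can be evaluated as follows. This is a sketch with illustrative names: `mu_hat` stands for the integral ∫_0^∞ µ(s) ds, and σ is passed in as a given input rather than computed from the Markov chain.

```python
import math

def std_normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rmgchp_premium(a_star, lam, mu_hat, theta):
    """Premium rate c = (1 + theta) * a_star * lam / (1 - mu_hat), eq. (18)."""
    return (1.0 + theta) * a_star * lam / (1.0 - mu_hat)

def rmgchp_ruin_ultimate(u, c, a_star, lam, mu_hat, sigma):
    """Ultimate ruin probability from Theorem 9."""
    drift = c - a_star * lam / (1.0 - mu_hat)
    return math.exp(-2.0 * drift * u / sigma ** 2)

def rmgchp_ruin_finite(u, c, a_star, lam, mu_hat, sigma, tau):
    """Finite horizon ruin probability from Theorem 8."""
    drift = c - a_star * lam / (1.0 - mu_hat)
    s = sigma * math.sqrt(tau)
    return (std_normal_cdf(-(u + drift * tau) / s)
            + rmgchp_ruin_ultimate(u, c, a_star, lam, mu_hat, sigma)
            * std_normal_cdf(-(u - drift * tau) / s))
```

With the premium set by (18), the drift equals θ a* λ/(1 − µ), so a larger safety loading lowers both ruin probabilities.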
Implementation of RMGCHP with Empirical Data
We would now like to implement the results for RMGCHP with real data
provided by ERGO Group.
We first explain the dataset and data preparation steps.
We then show that a Poisson model is not suitable for this data, thus we
estimate parameters for a Hawkes model and test the goodness of fit of the
obtained model with respect to real claim arrival times.
Based on the obtained Hawkes model, we use the results from Swishchuk
(2017) to calculate premium rates and ruin probabilities using the diffusion
approximation.
In order to test the appropriateness of the approximation, we then compare
the standard deviation of the empirical process (for large timescales) with
the theoretical diffusion coefficients.
Empirical Data: Description
ERGO Group has kindly sent five very comprehensive data sets,
one for each reporting year 2010 to 2014.
These include claims that were reported to the company in the
respective year, although the claim might have occurred in an
earlier year and might cause payments from the reporting year
on into other years in the future. The data sets are about claims
from various kinds of 'Rechtsschutzversicherung' (legal expenses
insurance, i.e. insurance covering claims from legal disputes such
as lawyer fees).
Empirical Data: Description
Each data set consists of the following columns:
VNR: Policy Number (e.g. one client)
SCHDLNR: Running number of claim per policy (a client can
have several claims over the years)
SCHDDAT: Date of claim occurrence (this can be earlier than
the reporting year)
ZAHLUNG: Claim payment (one claim can result in several pay-
ments over time, this is the main point of interest)
ZHLGDATUM: Date of the claim payment (temporal structure
of this is main point of interest)
LEISTUNGSART: Type of insurance (there are many classes of
legal expenses insurance, in total 44 classes in this dataset)
STATUS: Has the case been closed or is it still open?
MELDJAHR: Reporting Year
Empirical Data: Description
Example of the dataset from the reporting year 2010:
[Table excerpt not reproduced]
The first three rows belong to the same policy and claim
occurrence (on 28.03.2002). The claim was reported in 2010
and has led to three payments by the insurer on future dates
(06.05., 06.07. and 11.08.2010) with different claim sizes (745.78,
502.78 and 4.43). The case has been closed.
Empirical Data: Preparation
For a selected subset we extract the columns ZAHLUNG (claim
payment) and ZHLGDATUM (claim payment date) and modify
ZHLGDATUM as follows:
For the reporting year 2010, the starting point 0 is chosen as
31.12.2009 and the time scale is in days. Each date is thus as-
signed a number (days from the start), e.g. 15.01.2010 → 15 or
03.02.2010 → 34.
For fitting data from the reporting year 2011 only, the date
31.12.2010 is chosen as 0 and so on.
For fitting data from all reporting years simultaneously, the
starting point is 31.12.2009 and numbers are assigned continuously,
i.e. 01.01.2011 → 366.
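The date conversion described above can be sketched with the standard library. The function name and the assumption that dates are stored as DD.MM.YYYY strings are illustrative; the actual column format in the ERGO data is not shown in the talk.

```python
from datetime import date, datetime

def day_number(date_string, origin=date(2009, 12, 31)):
    """Number of days between origin (time 0) and a date given as DD.MM.YYYY."""
    d = datetime.strptime(date_string, "%d.%m.%Y").date()
    return (d - origin).days
```

This reproduces the examples in the text: 15.01.2010 maps to 15, 03.02.2010 to 34 and 01.01.2011 to 366.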
Empirical Data: Preparation
On some days, there is more than one occurrence, which is
impossible for a simple point process. In this case, occurrences are
spread evenly (deterministically) over the day, e.g. if there
are 3 occurrences on day 15, they are assigned 15, 15.33 and
15.66.
This is questionable, but there is no precedent in the literature for
insurance data, and distributing occurrences uniformly at random
over the day (as has been done over milliseconds for financial
data) leads to parameter estimates that vary strongly from run to
run.
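The deterministic spreading of same-day occurrences can be sketched as below. The function name is illustrative, and the input is assumed to be a list of integer day numbers.

```python
from collections import Counter

def spread_ties(day_numbers):
    """Spread n occurrences on the same day evenly: day, day + 1/n, ..., day + (n-1)/n."""
    counts = Counter(day_numbers)
    times = []
    for day in sorted(counts):
        n = counts[day]
        times.extend(day + j / n for j in range(n))
    return times
```

Three occurrences on day 15 become 15, 15 + 1/3 and 15 + 2/3, matching the values 15, 15.33 and 15.66 quoted above up to rounding.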
For the following demonstration, we choose the dataset of claims
of type "Firmen-Arbeitsschutz" which have been reported in the
years 2010 to 2014 and which have occurred exactly three years
before their reporting (thus between 2007 and 2011).
This dataset has 1205 events over a period of T = 2882 days.
Empirical Data: Check of Poisson Assumption
First, we plot the number of payment occurrences per week (7
days) over the whole time period and inspect if clustering can
be seen.
Empirical Data: Check of Poisson Assumption
To justify the use of a Hawkes process, we next plot the
inter-arrival times of claim occurrences against an exponential
distribution in a QQ-plot. If the claims arrived according to a
(memoryless) Poisson process, the inter-arrival times would be
exponentially distributed and we should see a good fit.
→ At this point, we can conclude that a classical risk model
would not be appropriate.
Empirical Data: Parameter Estimation
We fit parameters λ, α, β using maximum likelihood
optimization, where we constrain the parameters by (0,0,0) and
(500,500,500) and vary starting values for each parameter between 0
and 10.
This seems reasonable as λ for a Hawkes process should be
smaller than the "Poisson" rate of 1205/2882 = 0.4181 events per
time step, and α and β should not be huge either due to the
rather low frequency of events per time step.
Parameter estimates are stable within this range as long as the
starting values of λ and β are not both much higher than the one
for α, in which case the optimizer returns α = 0, which would not
be a Hawkes process.
The estimates for this specific dataset are λ = 0.0186, α =
0.0238, β = 0.0249.
We observe α/β < 1, indicating stability of the process, although α
and β are quite close together.
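A quick plausibility check of these estimates uses the stationary mean intensity λ/(1 − α/β) from (4), which should be close to the empirical event rate. The variable names below are illustrative; the numbers are the ones quoted above.

```python
# Estimates and dataset size quoted in the text.
lam, alpha, beta = 0.0186, 0.0238, 0.0249
n_events, n_days = 1205, 2882

branching_ratio = alpha / beta                     # must be < 1 for stability
stationary_rate = lam / (1.0 - branching_ratio)    # long-run events per day
empirical_rate = n_events / n_days                 # observed events per day

print(f"branching ratio  {branching_ratio:.4f}")
print(f"stationary rate  {stationary_rate:.4f}")
print(f"empirical rate   {empirical_rate:.4f}")
```

The branching ratio is about 0.956, and the implied long-run rate of roughly 0.42 events per day is close to the empirical 0.418, so the fitted model reproduces the overall claim frequency.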
Empirical Data: Goodness of Fit
As a first goodness of fit test, we transform arrival times using
the closed-form compensator with the estimated parameters and
plot their interarrival times against an Exp(1)-distribution.
Note: In the literature on fitting Hawkes processes to empirical
data, this criterion is never shown to test the goodness of fit (as
it is generally difficult to get a good fit here).
Empirical Data: Goodness of Fit
To test independence of transformed interarrival times, we plot
the points (U_k, U_{k+1}) as described above.
Ideally, we should see uniformly scattered points here.
Note: Again, this criterion is generally difficult to meet for
empirical data.
Empirical Data: Goodness of Fit
As described above, it is generally difficult to meet the standard
goodness of fit criteria for empirical data.
For most empirical datasets, the goodness of model fit as
described above per se is mediocre. Realistically, we should be
doubtful whether the data follows an exponential Hawkes
process (as this is a restrictive assumption).
However, even if the data does not strictly follow a Hawkes
process with exponential intensity, it might still be possible to
describe the characteristics of the data well with a Hawkes process!
Empirical Data: Goodness of Fit
To this end, we compare the expected value and the
autocorrelation of the number of jumps on an interval for the empirical
data with the theoretical values derived by Da Fonseca et al.
(2014).
We find that for all datasets the Hawkes process estimates the
number of jumps over any interval very well.
For some datasets, although the fit from the QQ-plot above
might not be very good, the Hawkes process almost perfectly
matches the autocorrelation for intervals of 7 (14, 28) days and
time lags ranging from 7 to 180 (240) days.
Note that a Poisson process would assume the autocorrelation
to be 0 for all lags, which is clearly not the case for empirical
data.
Empirical Data: Goodness of Fit
For our dataset, we compare the theoretical autocorrelation of
the number of jumps of a Hawkes process with λ = 0.0186, α =
0.0238, β = 0.0249 (line) with the autocorrelation of the number
of jumps over the same intervals and lags for the empirical arrival
times (points). Interval lengths are 28 days and lags range from
7 to 180 by steps of 7.
Empirical Data: Goodness of Fit
We observe that the empirical autocorrelation decreases with the
time lag (as it should for a Hawkes process), but unfortunately the
decrease is not exponential. The slow decay might indicate that a
Hawkes process with power-law decay would be more suitable. However,
many authors in the literature state that, although empirical
evidence generally favours power-law decay, they choose to work with
exponential decay as it is analytically much more convenient. As our
work is the first one with empirical insurance data, using
exponential decay could thus be justified.
However, this slow autocorrelation decay favours |β − α| ≈ 0, which
distorts the ratio α/β away from the empirical branching ratio
('primary' claims/immigrants vs. 'secondary' claims/children).
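As a small worked example, the quantities mentioned above can be computed directly from the estimates quoted in the talk. The sketch uses two standard facts for an exponential kernel: the branching ratio is α/β, and the covariance density decays at rate β − α. The variable names are illustrative.

```python
import math

lam, alpha, beta = 0.0186, 0.0238, 0.0249  # estimates quoted in the talk

branching_ratio = alpha / beta              # expected direct children per claim
decay_rate = beta - alpha                   # decay rate of the covariance density
mean_cluster_size = 1.0 / (1.0 - branching_ratio)  # immigrant plus all descendants
half_life_days = math.log(2.0) / decay_rate        # time for the covariance to halve

print(f"alpha/beta = {branching_ratio:.4f}, beta - alpha = {decay_rate:.4f}")
print(f"mean cluster size = {mean_cluster_size:.1f}, "
      f"half-life = {half_life_days:.0f} days")
```

A fitted branching ratio this close to 1 would mean that almost all claims are classified as 'children', which is the distortion described above.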
Empirical Data: Risk Process
After fitting a Hawkes process to the empirical claim times, we
would like to simulate the risk process and implement the results
from Swishchuk (2017). Again, the RMGCHP is defined in (15) as
$$ R(t) = u + ct - \sum_{k=1}^{N_t} a(X_k) $$
where u denotes the initial capital, c denotes the (continuous)
premium rate and the number of claims in the interval [0, t) is a
Hawkes process $N_t$.
The claim sizes $X_k$ follow a continuous-time Markov chain on the
state space X = {1, ..., n} independent of $(N_t)$, and a(x) is a
continuous and bounded function on X. Special cases of this
RMGCHP are $a(X_k) := X_k$ following a Markov chain and
$a(X_k) := X_k$ i.i.d.
Empirical Data: Risk Process
We perform all the following computations for i.i.d. claim sizes
with two states as well as for claims following a Markov chain with
2, 3, 4 and 10 states. To choose the function values $a(X_k)$, we
follow the quantile-based approach used in Swishchuk et al. (2017),
described below.
For a Markov chain with n states, we divide the empirical claim
sizes along n quantiles and assign to each 'section' its conditional
mean claim size. To obtain the transition matrix, we count the
empirical frequencies of transitions from one claim-size state to
the next.
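The quantile-based construction described above can be sketched as follows. The function name `fit_claim_chain`, the equal-probability quantile cut points and the toy lognormal claims are assumptions for illustration; the sketch also assumes that every state occurs at least once, so that no row of the count matrix is empty.

```python
import numpy as np

def fit_claim_chain(claims, n_states):
    """Quantile bins -> conditional mean claim sizes and transition matrix."""
    claims = np.asarray(claims, dtype=float)
    # interior cut points splitting the sample into n_states quantile bins
    cuts = np.quantile(claims, np.linspace(0.0, 1.0, n_states + 1)[1:-1])
    states = np.searchsorted(cuts, claims, side="right")  # labels 0..n_states-1
    # conditional mean claim size in each bin: the values a(1), ..., a(n)
    a = np.array([claims[states == s].mean() for s in range(n_states)])
    # count transitions between consecutive claims and normalise each row
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    P = counts / counts.sum(axis=1, keepdims=True)
    return a, P, states

# Toy data (hypothetical, not the insurance dataset)
rng = np.random.default_rng(1)
toy_claims = rng.lognormal(mean=6.5, sigma=0.9, size=1200)
a_toy, P_toy, states_toy = fit_claim_chain(toy_claims, n_states=4)
```

By construction, weighting the conditional means by the observed state frequencies reproduces the overall sample mean of the claims.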
Empirical Data: Risk Process
For the example dataset with four states, for instance, we obtain
the conditional mean claim sizes
$$ a = (a(1), a(2), a(3), a(4)) = (157.788, 512.2633, 1043.4839, 2277.3009) $$
and the transition matrix
$$ P = \begin{pmatrix}
0.2947020 & 0.2384106 & 0.2450331 & 0.2218543 \\
0.2748344 & 0.2748344 & 0.2384106 & 0.2119205 \\
0.2408027 & 0.2575251 & 0.2575251 & 0.2441472 \\
0.1926910 & 0.2325581 & 0.2524917 & 0.3222591
\end{pmatrix} $$
Empirical Data: Risk Process
In the next step, we find the stationary distribution
$$ \pi^* = (\pi^*_1, \pi^*_2, \pi^*_3, \pi^*_4) = (0.2508306, 0.2508306, 0.2483389, 0.25) $$
Then we can compute the value of $a^*$ required for the LLN (16) and
FCLT (19):
$$ a^* = \sum_{k \in X} a(k)\,\pi^*_k = \cdots = 996.5322 $$
Note that this value is very close to the mean claim size of
996.5711, as should be expected for our choice of $a(X_k)$ and
the stationary distribution above.
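The two numbers above can be checked numerically from the quoted values of a and P. The sketch below solves π P = π together with Σ π_k = 1 by least squares; the helper name `stationary_distribution` is an assumption, and other methods (for example an eigenvector computation) would serve equally well.

```python
import numpy as np

a = np.array([157.788, 512.2633, 1043.4839, 2277.3009])
P = np.array([
    [0.2947020, 0.2384106, 0.2450331, 0.2218543],
    [0.2748344, 0.2748344, 0.2384106, 0.2119205],
    [0.2408027, 0.2575251, 0.2575251, 0.2441472],
    [0.1926910, 0.2325581, 0.2524917, 0.3222591],
])

def stationary_distribution(P):
    """Solve pi (P - I) = 0 with sum(pi) = 1 in the least-squares sense."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

pi_star = stationary_distribution(P)
a_star = float(a @ pi_star)  # weighted mean claim size under pi*
print(pi_star, a_star)
```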
Empirical Data: Risk Process
As we have no information on the initial capital u or on the premium
rate c, we set the initial capital for our dataset to u = 100 and
calculate the premium rate using the expected value principle in
(18) with safety loading θ = 0.1:
$$ c = (1 + \theta)\, a^* \frac{\lambda}{1 - \mu} = 461.5339 $$
where $\mu = \alpha/\beta$ for the exponential kernel.
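The premium rate can be reproduced from the values quoted in the talk, taking μ = α/β for the exponential kernel. Variable names are illustrative.

```python
lam, alpha, beta = 0.0186, 0.0238, 0.0249  # fitted Hawkes parameters
a_star, theta = 996.5322, 0.1              # mean claim size and safety loading

mu = alpha / beta                          # integral of the exponential kernel
claim_rate = lam / (1.0 - mu)              # long-run expected claims per day
c = (1.0 + theta) * a_star * claim_rate    # expected value principle

print(f"claims per day = {claim_rate:.4f}, premium rate c = {c:.4f}")
```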
Empirical Data: Risk Process
Given u and c, we can now plot the empirical risk process using
the actual claim sizes and claim occurrence times.
Note that the final capital at time T for the empirical process is
130272.5 (provided that ruin did not occur before time T).
Empirical Data: Risk Process
Analogously, we can simulate risk processes using a Hawkes process
with the estimated parameters λ = 0.0186, α = 0.0238, β = 0.0249
for the arrival times and a simulated Markov chain with transition
matrix P and claim sizes $a(X_k)$ for the different states.
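One way to carry out such a simulation is sketched below. It uses Ogata's thinning algorithm for the exponential Hawkes arrivals and a discrete-time chain stepped once per claim. This is an illustrative choice and not necessarily the procedure used for the results in this talk; the function names, the uniform initial state and the row renormalisation of P are assumptions.

```python
import numpy as np

def simulate_hawkes_exp(lam, alpha, beta, horizon, rng):
    """Thinning for the intensity lam + sum_i alpha * exp(-beta * (t - t_i))."""
    t, excitation, arrivals = 0.0, 0.0, []
    while True:
        bound = lam + excitation          # intensity only decays until next arrival
        wait = rng.exponential(1.0 / bound)
        t += wait
        if t >= horizon:
            break
        excitation *= np.exp(-beta * wait)  # decay to the candidate time
        if rng.uniform() * bound <= lam + excitation:
            arrivals.append(t)              # accept the candidate as a claim
            excitation += alpha
    return np.array(arrivals)

def simulate_chain_claims(a, P, n_claims, rng):
    """Claim sizes a(X_k) along a Markov chain with transition matrix P."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum(axis=1, keepdims=True)  # rounded rows may not sum exactly to 1
    state = rng.integers(len(a))          # assumed: uniform initial state
    claims = np.empty(n_claims)
    for k in range(n_claims):
        claims[k] = a[state]
        state = rng.choice(len(a), p=P[state])
    return claims

def simulate_risk_path(u, c, horizon, lam, alpha, beta, a, P, rng):
    arrivals = simulate_hawkes_exp(lam, alpha, beta, horizon, rng)
    claims = simulate_chain_claims(a, P, len(arrivals), rng)
    capital_after_claims = u + c * arrivals - np.cumsum(claims)
    final_capital = u + c * horizon - claims.sum()
    # capital only drops at claim times, so ruin can be checked there
    ruined = bool(len(claims) > 0 and capital_after_claims.min() < 0)
    return arrivals, capital_after_claims, final_capital, ruined

a = np.array([157.788, 512.2633, 1043.4839, 2277.3009])
P = np.array([
    [0.2947020, 0.2384106, 0.2450331, 0.2218543],
    [0.2748344, 0.2748344, 0.2384106, 0.2119205],
    [0.2408027, 0.2575251, 0.2575251, 0.2441472],
    [0.1926910, 0.2325581, 0.2524917, 0.3222591],
])
rng = np.random.default_rng(7)
arrivals, path, final_capital, ruined = simulate_risk_path(
    u=100.0, c=461.5339, horizon=2882.0,
    lam=0.0186, alpha=0.0238, beta=0.0249, a=a, P=P, rng=rng)
```

Repeating the last call L times gives the simulated paths used in the comparisons that follow.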
Empirical Data: Risk Process
To evaluate how well the simulated risk process fits the empirical
one, we can use the following criteria adapted from Zhang (2016).
First, we look at how well the simulated process $\hat{R}(t)$
matches the empirical one in terms of fluctuations:
$$ S(L) = \frac{1}{L} \sum_{i=1}^{L} \frac{\max_t \hat{R}_i(t) - \min_t \hat{R}_i(t)}{\max_t R(t) - \min_t R(t)} $$
where L is the number of simulated paths.
Furthermore, we consider how far the final capital of the simulated
paths is from the empirical one by considering
$$ F(L) = \frac{1}{L} \sum_{i=1}^{L} \frac{\hat{R}_i(T)}{R(T)} $$
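Both criteria are simple averages of ratios and could be computed as follows. The function names and the toy paths are illustrative assumptions; each path is taken to be an array of capital values on a common time grid.

```python
import numpy as np

def fluctuation_ratio(sim_paths, emp_path):
    """S(L): mean ratio of simulated to empirical range (max - min)."""
    emp_range = np.max(emp_path) - np.min(emp_path)
    ratios = [(np.max(p) - np.min(p)) / emp_range for p in sim_paths]
    return float(np.mean(ratios))

def final_capital_ratio(sim_finals, emp_final):
    """F(L): mean ratio of simulated to empirical final capital."""
    return float(np.mean(np.asarray(sim_finals, dtype=float) / emp_final))

# Toy example with two simulated paths (hypothetical numbers)
emp_path = np.array([0.0, 4.0, 10.0])
sim_paths = [np.array([0.0, 20.0, 15.0]), np.array([0.0, 5.0, 10.0])]
S = fluctuation_ratio(sim_paths, emp_path)
F = final_capital_ratio([p[-1] for p in sim_paths], emp_path[-1])
```

Values of S(L) and F(L) close to 1 indicate that the simulated paths fluctuate and end up similarly to the empirical path.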
Empirical Data: Risk Process
We compare the results for a "classical" Poisson risk model with
i.i.d. claim sizes and three Hawkes models: Hawkes process arrivals
with i.i.d. claims (two states) and with Markov chain claims (2 and
10 states). Results for 3 and 4 states are very similar and are
omitted here. We use L = 1000 simulations.
We can see that the Poisson model underestimates the fluctuations,
whereas a Hawkes model (independently of the claim size modelling)
overestimates them. Furthermore, a Hawkes model unfortunately
systematically overestimates the final capital (fc).
Empirical Data: Risk Process
Model                  S(L)      F(L)     Mean (fc)   Std. (fc)
Poisson                0.40817   1.0335   134639.5    31063.27
Hawkes (i.i.d.)        1.8706    3.6356   473626.1    449849.9
Hawkes (2-state MC)    1.9065    3.6543   476054.1    470973.7
Hawkes (10-state MC)   1.8716    3.5501   462483.5    460574.8
Empirical Data: Risk Process
We would now like to implement the results from the FCLT and
diffusion approximation (19) derived in Swishchuk (2017). To
this end, we first compute the values of $a^*$, $\sigma^*$ and the
diffusion coefficient $\sigma = \sigma^* \sqrt{\lambda / (1 - \alpha/\beta)}$
for MC claims with different numbers of states.
Model                    a*         σ*         σ
Hawkes (i.i.d.)          996.5711   708.753    459.8908
Hawkes (2 states MC)     996.0189   718.3519   466.1193
Hawkes (3 states MC)     996.7685   832.8063   540.3856
Hawkes (4 states MC)     996.5322   898.3323   582.9037
Hawkes (10 states MC)    996.6928   1002.043   650.199
Note that a∗ stays constant (which makes sense considering our
choice of a(Xk)), but σ∗ and accordingly σ increase with the
number of states.
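The last column of the table follows from the second-to-last one by multiplying with the square root of λ/(1 − α/β). A short check with the values quoted above (the dictionary of σ* values is copied from the table; names are illustrative):

```python
import math

lam, alpha, beta = 0.0186, 0.0238, 0.0249
sigma_star = {
    "Hawkes (i.i.d.)": 708.753,
    "Hawkes (2 states MC)": 718.3519,
    "Hawkes (3 states MC)": 832.8063,
    "Hawkes (4 states MC)": 898.3323,
    "Hawkes (10 states MC)": 1002.043,
}

scale = math.sqrt(lam / (1.0 - alpha / beta))  # common factor for all models
sigma = {model: s * scale for model, s in sigma_star.items()}

for model, value in sigma.items():
    print(f"{model:24s} sigma = {value:.4f}")
```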
Empirical Data: Ruin Probabilities
Given the diffusion approximation from Swishchuk (2017) with the
diffusion coefficients σ calculated before, we can use (8) to
calculate ruin probabilities over intervals of length T.
We compare the obtained numbers with ruin probabilities obtained
from L = 1000 simulations of the risk process until time T.
Model                  Theoretical RP   Simulated RP
Hawkes (i.i.d.)        0.6724           0.176
Hawkes (2-state MC)    0.6993           0.199
Hawkes (3-state MC)    0.7502           0.182
Hawkes (4-state MC)    0.781162         0.19
Hawkes (10-state MC)   0.8199346        0.189
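To judge whether the gap in the table could be Monte Carlo noise, one can attach a binomial standard error to each simulated ruin probability. This is a generic check added for illustration, treating the 1000 simulated paths as independent; the helper name is an assumption.

```python
import math

def mc_standard_error(p_hat, n_paths):
    """Binomial standard error of a ruin frequency from n_paths simulations."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_paths)

table = {  # model: (theoretical RP, simulated RP), copied from the table above
    "Hawkes (i.i.d.)": (0.6724, 0.176),
    "Hawkes (2-state MC)": (0.6993, 0.199),
    "Hawkes (3-state MC)": (0.7502, 0.182),
    "Hawkes (4-state MC)": (0.781162, 0.19),
    "Hawkes (10-state MC)": (0.8199346, 0.189),
}

gaps_in_se = {}
for model, (theoretical, simulated) in table.items():
    se = mc_standard_error(simulated, 1000)
    gaps_in_se[model] = (theoretical - simulated) / se
    print(f"{model:22s} se = {se:.4f}, gap = {gaps_in_se[model]:.1f} standard errors")
```

The standard errors are of the order of 0.01, so the differences in the table are far too large to be explained by simulation noise alone.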
Empirical Data: Error Estimation
To understand the discrepancies between theoretical and empirical
ruin probabilities, we would like to assess the appropriateness of
the diffusion approximation. We proceed as suggested in Swishchuk
et al. (2017).
Given the FCLT
$$ \lim_{n \to \infty} \frac{R(nt) - (cnt - a^* N(nt))}{\sqrt{n}} \overset{D}{=} \sigma W(t) $$
we would like to compare the standard deviation of the RHS
multiplied by $\sqrt{n}$, that is
$\sqrt{n}\,\sqrt{t}\,\sigma^* \sqrt{\lambda / (1 - \alpha/\beta)}$,
to its counterpart on the LHS, that is the standard deviation of
the process
$$ R(nt) - (cnt - a^* N(nt)) = u - \sum_{k=1}^{N(nt)} (a(X_k) - a^*) \tag{20} $$
Empirical Data: Error Estimation
For the empirical dataset, we choose t to be our original time
scale of one day and first let n run from 7 to 1435 in steps of 7.
For each n, we compute the increments of the process over
consecutive intervals of length nt,
$$ \big(R(int) - (c\,int - a^* N(int))\big) - \big(R((i-1)nt) - (c\,(i-1)nt - a^* N((i-1)nt))\big), \quad i = 1, 2, \ldots $$
so that in the first step we consider, for example, intervals of
one week.
We then compare the standard deviation of these values to the
theoretical standard deviation obtained from the RHS of (19)
multiplied by $\sqrt{n}$.
Note that this approximation should naturally only be accurate
for large n; however, due to our relatively short time frame of
2882 days, for large n the LHS of (19) is based on only a few
observations.
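The comparison described above can be sketched as follows, using the representation (20) so that only the arrival times and the centred claim values a(X_k) − a* are needed. Function names are assumptions. The toy data are a Poisson process with centred normal marks, for which the theoretical value with α = 0 applies; they are not the claims data.

```python
import numpy as np

def increment_std(arrivals, centered_claims, horizon, n):
    """Std of increments of u - sum_{k <= N(s)} (a(X_k) - a*) over windows of length n."""
    edges = np.arange(0.0, horizon + 1e-9, n)
    n_at_edge = np.searchsorted(arrivals, edges, side="right")  # N(edge)
    cum = np.concatenate(([0.0], np.cumsum(centered_claims)))
    process = -cum[n_at_edge]        # process value minus the constant u
    return float(np.diff(process).std(ddof=1))

def theoretical_std(n, t, sigma_star, lam, alpha, beta):
    """sqrt(n) * sqrt(t) * sigma* * sqrt(lam / (1 - alpha/beta))."""
    return float(np.sqrt(n * t) * sigma_star * np.sqrt(lam / (1.0 - alpha / beta)))

# Toy check: Poisson arrivals (alpha = 0) with centred claims of std 50
rng = np.random.default_rng(3)
lam_p, horizon_p, sd_claim = 2.0, 10_000.0, 50.0
gaps = rng.exponential(1.0 / lam_p, size=int(1.5 * lam_p * horizon_p))
toy_arrivals = np.cumsum(gaps)
toy_arrivals = toy_arrivals[toy_arrivals < horizon_p]
toy_centered = rng.normal(0.0, sd_claim, size=toy_arrivals.size)

emp = increment_std(toy_arrivals, toy_centered, horizon_p, n=5)
theo = theoretical_std(5, 1.0, sd_claim, lam_p, alpha=0.0, beta=1.0)
```

With few windows (large n relative to the horizon) the empirical value becomes noisy, which is the caveat raised above.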
Empirical Data: Error Estimation
The following plot shows the empirical standard deviation (points)
for different models against the theoretical standard deviation
(lines).
The empirical standard deviation is only plotted for a model with
i.i.d. claims (two sizes), as the results look very similar for all
models on this scale.
Empirical Data: Error Estimation
We have to keep in mind that empirical standard deviations for
large values of nt are based on very few samples.
If we look at the standard deviations for a sequence of n from 1 to
35 (five weeks) in steps of 1 day, we obtain the following picture,
where now all values are based on at least 80 observations.
The first plot shows the result for a Hawkes model with i.i.d.
claims (two sizes).
Empirical Data: Error Estimation
We now compare the results for Hawkes models with Markov
chain claims with different numbers of states.
Evaluation and Next Steps
In the following, I would like to summarize some concerns about
the results obtained so far and possible next steps:
1. Parameter estimation for the Hawkes process leads to good
model fits (according to the QQ-plot of transformed interarrival
times and the independence plot) only for a minority of datasets.
Note: Most applications of Hawkes processes to empirical data
do not show the Hawkes process fit per se, but use other criteria
for model fit (e.g. autocorrelation fit or signature plot for
financial data).
We could thus use other criteria, e.g. good ruin probability
approximations or error estimates for the diffusion approximation.
Evaluation and Next Steps
2. Data description and autocorrelation plot suggest that a
Hawkes process (with exponential decay) is not perfectly suitable.
As we can see from the empirical datasets, the response to an
initial claim arrival is not instantaneous, but there is usually a
time period between related claim payments (weeks or months
on a total timescale of seven years, thus we cannot make it close
to instantaneous by compressing the timescale).
Considering that our work is the first one with Hawkes processes
for empirical insurance data, its use could probably still be
justified, but we have to consider this when evaluating risk model
fits.
Next Steps: Power Law Kernel
One possibility is to fit a Hawkes process with a power-law kernel
instead of an exponential kernel.
In this case, the conditional intensity function is
$$ \lambda^*(t) = \lambda + \sum_{t_i < t} \frac{k}{(t - t_i + c)^{1+\eta}} $$
where k, c, η > 0.
Here the interpretation of the parameters is less straightforward,
but roughly one could say that k corresponds to α, describing the
upward jump of the intensity caused by an arrival (the magnitude of
its influence), η corresponds to β, determining how long-lasting the
impact of the arrival is, and c describes a temporal shift that
keeps the intensity bounded when $(t - t_i)$ is close to 0.
Next Steps: Power Law Kernel
For a power-law kernel, the stationarity condition is
$$ \frac{k\,c^{-\eta}}{\eta} < 1 $$
and the compensator is given by
$$ \Lambda(t) = \int_0^t \lambda^*(s)\,ds = \lambda t + \frac{k}{\eta} \sum_{t_j < t} \left( c^{-\eta} - (t - t_j + c)^{-\eta} \right) $$
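A minimal sketch of the three quantities above (intensity, compensator and the left-hand side of the stationarity condition), written directly from the formulas. Function names and the toy arrival times are assumptions.

```python
import numpy as np

def powerlaw_intensity(t, arrivals, lam, k, c, eta):
    """lambda*(t) = lam + sum_{t_i < t} k / (t - t_i + c)^(1 + eta)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    gaps = t[:, None] - np.asarray(arrivals, dtype=float)[None, :]
    past = gaps > 0                       # only arrivals strictly before t count
    safe_gaps = np.where(past, gaps, 0.0)
    kernel = np.where(past, k * (safe_gaps + c) ** (-(1.0 + eta)), 0.0)
    return lam + kernel.sum(axis=1)

def powerlaw_compensator(t, arrivals, lam, k, c, eta):
    """Lambda(t) = lam*t + (k/eta) * sum_{t_j < t} (c^-eta - (t - t_j + c)^-eta)."""
    arrivals = np.asarray(arrivals, dtype=float)
    gaps = t - arrivals[arrivals < t]
    return float(lam * t + (k / eta) * np.sum(c ** (-eta) - (gaps + c) ** (-eta)))

def powerlaw_branching_ratio(k, c, eta):
    """Integral of the kernel; stationarity requires a value below 1."""
    return k * c ** (-eta) / eta

toy_arrivals = np.array([1.0, 2.5, 4.0])   # hypothetical arrival times
params = dict(lam=0.5, k=1.0, c=1.0, eta=2.0)
grid = np.linspace(0.0, 10.0, 200_001)
values = powerlaw_intensity(grid, toy_arrivals, **params)
# trapezoidal integral of the intensity, to compare with the closed form
numeric = float(np.sum((values[1:] + values[:-1]) / 2.0 * np.diff(grid)))
closed_form = powerlaw_compensator(10.0, toy_arrivals, **params)
```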
Next Steps: Power Law Kernel
Thus, we can derive the log-likelihood function analogously to the
exponential case in Laub et al. (2015) given in (14), and get
$$ l = \sum_{i=1}^{n} \log\Big( \lambda + k \sum_{j=1}^{i-1} (t_i - t_j + c)^{-(1+\eta)} \Big) - \lambda T + \frac{k}{\eta} \sum_{i=1}^{n} \big( (T - t_i + c)^{-\eta} - c^{-\eta} \big) $$
where $(t_1, ..., t_n)$ are the observed arrivals over the interval
[0, T].
Unfortunately, in this case there is no easy way to avoid the
double summation in the first part of the log-likelihood function,
which makes the optimization very slow for a large number of
arrivals. (In the exponential case, we used a recursive
computation derived by Ozaki (1979).)
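The log-likelihood above can be written down directly; the sketch below evaluates the double sum with an n × n array, which makes the quadratic cost explicit. The function name and the toy data are assumptions, and this is not necessarily the implementation used for the estimates in this talk.

```python
import numpy as np

def powerlaw_loglik(params, arrivals, horizon):
    """Log-likelihood of a power-law Hawkes process observed on [0, horizon]."""
    lam, k, c, eta = params
    t = np.asarray(arrivals, dtype=float)
    gaps = t[:, None] - t[None, :]                     # t_i - t_j, O(n^2) memory
    earlier = np.tril(np.ones(gaps.shape, dtype=bool), -1)  # pairs with j < i
    safe_gaps = np.where(earlier, gaps, 0.0)
    excitation = np.where(earlier, (safe_gaps + c) ** (-(1.0 + eta)), 0.0).sum(axis=1)
    log_term = np.sum(np.log(lam + k * excitation))
    # minus the compensator Lambda(T)
    tail = np.sum((horizon - t + c) ** (-eta) - c ** (-eta))
    return float(log_term - lam * horizon + (k / eta) * tail)

toy_arrivals = np.array([0.7, 1.1, 1.2, 3.9, 4.0, 7.5])  # hypothetical data
value = powerlaw_loglik((0.5, 1.0, 1.0, 2.0), toy_arrivals, horizon=10.0)
```

For estimation, the negative of this function could be passed to a bounded optimizer; for long arrival sequences the n × n array would have to be replaced by a loop over blocks.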
Next Steps: Power Law Kernel
One startling observation so far for constrained MLE with a
power-law kernel is that for simulated data, even when the correct
parameters are given as starting values, the estimated parameters
(λ, k, c, η) are rather far from the true ones, although the
conditional intensity function over time looks quite similar.
We simulate a Hawkes process with power-law kernel and parameters
(λ, k, c, η) = (0.5, 1, 1, 2) over the interval [0, 100]. With an
upper bound of 500 for each parameter, the parameters are estimated
as (λ, k, c, η) = (0.3959, 499.9999, 2.8507, 4.8385). We plot the
conditional intensity function for both sets of parameters.
Next Steps: Power Law Kernel
For the empirical dataset from above, we calculate parameter
estimates (λ, k, c, η) using the lower bound (0, 0, 0, 0) and varying
the upper bound for each parameter up to 5000.
We find that λ, c and η are mostly independent of the starting
values, but k is always estimated at the upper bound (due to
the long computation time, a large number of runs with different
values was not feasible). We nevertheless plot the intensities
with bounds 500 and 5000 and compare them with the intensity
from the exponential kernel estimated before.
We find that over the whole time period [0, 2882] the intensities
look quite similar; however, if we zoom in on the period [0, 200],
we observe that the power-law kernel favours a faster decay than
the exponential kernel (which would contradict our intention in
using it). This might of course be caused by the problems in
estimating k and could be studied further.
Evaluation and Next Steps
3. Risk model computations and ruin probabilities are inaccurate
when comparing theoretical formulas and simulations.
This holds for all datasets and even for simulated data when
using the correct parameters as estimates.
A possible reason for this might be that α ≈ β, so the Hawkes
process is close to unstable and thus outcomes are very "volatile"
across simulations.
Evaluation and Next Steps
4. Claim size modelling with a Markov chain is not consistent
with the empirical process.
Because of the way the empirical insurance claim portfolio is
constructed (various claims with more than one payment are bundled
together on the same timescale), claim sizes that directly follow
each other in the overall portfolio are not necessarily related, as
they generally do not come from the same claim process. Thus, the
use of a Markov chain that connects claim sizes is not really
justifiable. One possibility would be to use i.i.d. exponentially
distributed claims (as is commonly done in the insurance literature).
Next Steps: Claim Sizes as i.i.d. Exp(γ)
The risk model would have the following form:
$$ R(t) = u + ct - \sum_{i=1}^{N_t} Y_i $$
where u is the initial capital, c is the premium rate and the
number of claims in the interval [0, t) is a Hawkes process $N_t$.
The claims $Y_i$ are i.i.d. Exp(γ)-distributed random variables with
mean $\mu_G = 1/\gamma$, independent of $(N_t)$.
Next Steps: Claim Sizes as i.i.d. Exp(γ)
To check whether this approach would be suitable for our empirical
data, we first plot the empirical c.d.f. of the claim sizes against
an Exp(γ) distribution, where we choose γ = 1/mean(claim sizes).
Next Steps: Claim Sizes as i.i.d. Exp(γ)
As two further qualitative tests we could again use a QQ-plot
against the Exponential distribution and the probability integral
transform which should ideally lead to i.i.d. Unif [0,1] variables
(independence plot).
Both qualitative tests lead us to believe that modelling claim sizes
as i.i.d. exponentially distributed would be a suitable approach
for this dataset.
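The ingredients of these checks can be sketched as follows: the rate estimate γ = 1/mean, the probability integral transform, the theoretical quantiles for a QQ-plot, and (as an added numerical summary not used in the talk) the Kolmogorov-Smirnov distance between the empirical and fitted c.d.f. The function name and the toy data are assumptions.

```python
import numpy as np

def exponential_fit_check(claims):
    """Fit Exp(gamma) by the mean and return diagnostics for c.d.f. and QQ plots."""
    x = np.sort(np.asarray(claims, dtype=float))
    n = len(x)
    gamma = 1.0 / x.mean()
    pit = 1.0 - np.exp(-gamma * x)       # should look Unif[0,1] if the fit is good
    # Kolmogorov-Smirnov distance between empirical and fitted c.d.f.
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    ks = float(max(np.max(upper - pit), np.max(pit - lower)))
    # theoretical quantiles to plot against the sorted claims in a QQ-plot
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical_quantiles = -np.log(1.0 - probs) / gamma
    return gamma, ks, pit, theoretical_quantiles, x

# Toy data (hypothetical): exponential claims and clearly non-exponential claims
rng = np.random.default_rng(5)
gamma_exp, ks_exp, *_ = exponential_fit_check(rng.exponential(1000.0, size=5000))
gamma_uni, ks_uni, *_ = exponential_fit_check(rng.uniform(900.0, 1100.0, size=5000))
```

A small distance, as for the first toy sample, is consistent with exponential claims; the second sample illustrates how a poor fit shows up.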
Next Steps: Claim Sizes as i.i.d. Exp(γ)
If we compute the diffusion approximation for this model based
on the sequence of n = (1, ..., 35) days, we get the following
result. Note that in this case the value of $\sigma^*$ on the RHS of
equation (19) is $\sigma^* = \mathrm{sd}(X_k) = \sqrt{\mathrm{Var}(X_k)} = 1/\gamma$,
which is simply the mean of the empirical claim sizes.
Next Steps: Markov-Modulated Claimsizes
For other empirical datasets, the assumption of i.i.d. Exp(γ)
claim sizes does not seem to hold.
Thus, as a generalization, one could consider studying a risk
model where claim sizes are assumed to come from two different
exponential distributions, say Exp(γ1) and Exp(γ2), depending
on the state of an underlying (unobservable) Markov chain (with
two states). Thus, we would have a process $(Z_t, Y_t)$ where $(Z_t)$
is a Markov chain on the state space S = {1, 2} and $(Y_t)$ are the
claim sizes of the risk model. The distribution of an incoming
claim at time t would only depend on the state $Z_t$ and be
independent of other (Y·).
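A minimal sketch of how claim sizes from such a model could be simulated, assuming a continuous-time two-state chain with hypothetical switching rates `q` and hypothetical claim rates `gammas`. States are labelled 0 and 1 in the code (corresponding to 1 and 2 above), and the arrival times are arbitrary toy values rather than Hawkes arrivals.

```python
import numpy as np

def simulate_two_state_chain(q, horizon, rng, start=0):
    """Switch times and states of a two-state chain; q[s] is the rate of leaving s."""
    times, states = [0.0], [start]
    t, s = 0.0, start
    while True:
        t += rng.exponential(1.0 / q[s])
        if t >= horizon:
            break
        s = 1 - s
        times.append(t)
        states.append(s)
    return np.array(times), np.array(states)

def modulated_claims(arrivals, switch_times, states, gammas, rng):
    """Claim at time t is Exp(gammas[Z_t]); Z_t is the state in force at t."""
    idx = np.searchsorted(switch_times, arrivals, side="right") - 1
    z = states[idx]
    claims = rng.exponential(1.0 / np.asarray(gammas, dtype=float)[z])
    return claims, z

rng = np.random.default_rng(11)
horizon = 1000.0
toy_arrivals = np.sort(rng.uniform(0.0, horizon, size=5000))  # hypothetical arrivals
switch_times, chain_states = simulate_two_state_chain((0.1, 0.1), horizon, rng)
gammas = (1.0 / 200.0, 1.0 / 2000.0)     # small claims in state 0, large in state 1
claims, z = modulated_claims(toy_arrivals, switch_times, chain_states, gammas, rng)
```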
Next Steps: Markov-Modulated Claimsizes
A general case of such a semi-Markov risk model (for Poisson
arrivals) was studied in Reinhard (1984), and asymptotic non-ruin
probabilities were obtained explicitly for the case above (two
states and exponential distributions).
More recently, Lu and Li (2005) gave non-ruin probabilities for
more general claim size distributions.
Asmussen (1986) studies risk theory in a Markovian environment,
considering a variety of methods for assessing the ruin
probabilities, in particular Cramér-Lundberg approximations and
diffusion approximations with correction terms.
Next Steps: Diffusion Approximation
The very recent work of Cheng and Seol (2018) also studies a
diffusion approximation for a risk model with Hawkes claims
arrivals and exponentially distributed claim sizes. They construct
the diffusion approximation analogously to the one described for
the classical risk model in Chapter 1 and find that the limit is a
Gaussian process with non-linear variance function (as opposed to
the Brownian motion limit in the Poisson case).
Next Steps: Diffusion Approximation
They start with the risk model
$$ R(t) = u + ct - \sum_{i=1}^{N_t} Y_i $$
where $(Y_i)$ are i.i.d. claims with $E[Y_1] = m_1 < \infty$ and
$E[Y_1^2] = m_2 < \infty$, independent of the claims arrival process
$(N_t)$, which is assumed to follow a stationary Hawkes process with
intensity
$$ \lambda^*(t) = \lambda + \sum_{t_i < t} \mu(t - t_i). $$
Let $\mu = \int_0^\infty \mu(t)\,dt < \infty$.
Next Steps: Diffusion Approximation
Similarly to the standard diffusion approximation in the
"classical" case, they construct a sequence of processes
$(R^\lambda)$ (where λ is the background rate of the Hawkes process)
as
$$ R^\lambda(t) = u + c_\lambda t - \sum_{i=1}^{N^\lambda_t} \frac{1}{\sqrt{\lambda}} Y_i $$
where $c_\lambda = \frac{\sqrt{\lambda}\, m_1}{1 - \mu} + k$ for a
constant k > 0 and $N^\lambda_t$ describes the number of arrivals of
a Hawkes process with background rate λ.
They study the limit for λ → ∞.
Next Steps: Diffusion Approximation
Intuitively, we go from a process with few, big jumps to a process
with smaller jumps at a higher rate. To illustrate this, we choose
a Hawkes process with exponential intensity, i.e.
$\lambda^*(t) = \lambda + \sum_{t_i < t} \alpha e^{-\beta(t - t_i)}$
and $\mu = \alpha/\beta$.
We assume $(Y_i)$ to be i.i.d. Exp(γ)-distributed with mean
$m_1 = 1/\gamma$.
We define the sequence of risk processes
$$ R^{(n)}(t) = u + c^{(n)} t - \sum_{i=1}^{N^{(n)}_t} Y^{(n)}_i $$
where $N^{(n)}_t$ is the number of arrivals of a Hawkes process with
background intensity nλ and $(Y^{(n)}_i)$ are i.i.d.
Exp($\sqrt{n}\gamma$)-distributed, i.e.
$E[Y^{(n)}_i] =: m^{(n)}_1 = m_1/\sqrt{n}$.
Next Steps: Diffusion Approximation
The premium rate is set as
$c^{(n)} = \frac{m_1}{1 - \alpha/\beta}\big((1 + \theta)\lambda + \sqrt{n} - 1\big)$,
where θ is again the safety loading from the expected value
principle (note that for n = 1 we obtain the premium rate we had
used before for the "original" process).
We then let n → ∞.
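A short numerical check of this premium rule, using the example values (λ, α, β) = (1, 2, 5), γ = 1/4 and θ = 0.1 that appear later in the talk. The function names are assumptions, and the reading of the formula with the term √n − 1 is taken from the slide as reconstructed above. For these values, with λ = 1, the difference between premium rate and expected claim outflow stays constant in n.

```python
import math

def premium_rate(n, lam, alpha, beta, gamma, theta):
    """c^(n) = m1 / (1 - alpha/beta) * ((1 + theta) * lam + sqrt(n) - 1)."""
    m1 = 1.0 / gamma
    return m1 / (1.0 - alpha / beta) * ((1.0 + theta) * lam + math.sqrt(n) - 1.0)

def mean_claim_outflow(n, lam, alpha, beta, gamma):
    """Arrival rate n*lam / (1 - alpha/beta) times mean claim size m1 / sqrt(n)."""
    return (n * lam / (1.0 - alpha / beta)) * (1.0 / (gamma * math.sqrt(n)))

lam, alpha, beta, gamma, theta = 1.0, 2.0, 5.0, 0.25, 0.1
for n in (1, 4, 100, 10_000):
    c_n = premium_rate(n, lam, alpha, beta, gamma, theta)
    drift = c_n - mean_claim_outflow(n, lam, alpha, beta, gamma)
    print(f"n = {n:6d}: c(n) = {c_n:9.4f}, net drift = {drift:.4f}")
```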
Next Steps: Diffusion Approximation
The following plot shows realisations of the risk process for
different choices of n for a Hawkes process with initial parameters
(λ, α, β) = (1, 2, 5) and initial claim sizes according to an
exponential distribution with γ = 1/4. The initial capital is set to
u = 10 and the safety loading is θ = 0.1.
Next Steps: Diffusion Approximation
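Cheng and Seol (2018) prove the following theorem:
Theorem 10 Let $X^\lambda_t = \sum_{i=1}^{N^\lambda_t} Y_i$ be the
aggregate claims process and assume that μ(·) is a decreasing
function with $\int_0^\infty t\,\mu(t)\,dt < \infty$.
Then, as λ → ∞,
$$ \frac{X^\lambda_t - \frac{\lambda m_1 t}{1 - \mu}}{\sqrt{\lambda}} \to G \tag{21} $$
weakly in $(D([0,\infty), \mathbb{R}), J_1)$, where G is a mean-zero
almost surely continuous Gaussian process with covariance function
(t ≥ s)
$$ \mathrm{Cov}(G(t), G(s)) = m_1^2 \int_s^t \int_0^s \phi(u - v)\,dv\,du + \frac{m_2\, s}{1 - \mu} + 2 m_1^2 \int_0^s \int_0^{t_2} \phi(t_2 - t_1)\,dt_1\,dt_2 \tag{22} $$
where $\phi : [0,\infty) \to [0,\infty)$ satisfies the integral
equation
$$ \phi(t) = \frac{\mu(t)}{1 - \mu} + \int_0^\infty \mu(t + v)\phi(v)\,dv + \int_0^t \mu(t - v)\phi(v)\,dv \tag{23} $$
As a result,
$$ R^\lambda_t \to u + kt - G(t) \tag{24} $$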
Next Steps: Diffusion Approximation
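For the special case of a Hawkes process with exponential
intensity, i.e. $\mu(t) = \alpha e^{-\beta t}$, they compute
explicitly (for t ≥ 0)
$$ \phi(t) = \frac{\alpha\beta(2\beta - \alpha)}{2(\beta - \alpha)^2}\, e^{-(\beta - \alpha)t} $$
and thus
$$ \mathrm{Var}(G(t)) = \Big( \frac{\beta}{\beta - \alpha}\, m_2 + \frac{\alpha\beta(2\beta - \alpha)}{(\beta - \alpha)^3}\, m_1^2 \Big) t - m_1^2\, \frac{\alpha\beta(2\beta - \alpha)}{(\beta - \alpha)^4} \big( 1 - e^{-(\beta - \alpha)t} \big) \tag{25} $$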
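As a consistency check, the closed form (25) can be compared with a numerical evaluation of (22) at s = t, where the first term vanishes. The sketch below uses the example values α = 2, β = 5 and Exp(1/4) claims (so m1 = 4 and m2 = 2/γ² = 32); function names and the grid size are assumptions.

```python
import numpy as np

def phi(t, alpha, beta):
    """phi(t) for the exponential kernel mu(t) = alpha * exp(-beta * t)."""
    d = beta - alpha
    return alpha * beta * (2 * beta - alpha) / (2 * d ** 2) * np.exp(-d * t)

def var_G(t, alpha, beta, m1, m2):
    """Closed-form variance (25)."""
    d = beta - alpha
    coef = alpha * beta * (2 * beta - alpha)
    linear = (beta / d * m2 + coef / d ** 3 * m1 ** 2) * t
    return linear - m1 ** 2 * coef / d ** 4 * (1.0 - np.exp(-d * t))

def var_G_numeric(t, alpha, beta, m1, m2, n_grid=2001):
    """(22) with s = t: m2*t/(1-mu) + 2*m1^2 * int_0^t int_0^t2 phi(t2-t1) dt1 dt2."""
    grid = np.linspace(0.0, t, n_grid)
    h = grid[1] - grid[0]
    vals = phi(grid, alpha, beta)
    # inner integral int_0^{t2} phi(s) ds on the grid (cumulative trapezoid rule)
    inner = np.concatenate(([0.0], np.cumsum((vals[1:] + vals[:-1]) / 2.0 * h)))
    outer = np.sum((inner[1:] + inner[:-1]) / 2.0 * h)
    mu = alpha / beta
    return m2 * t / (1.0 - mu) + 2.0 * m1 ** 2 * outer

alpha, beta, gamma = 2.0, 5.0, 0.25
m1, m2 = 1.0 / gamma, 2.0 / gamma ** 2
closed = var_G(3.0, alpha, beta, m1, m2)
numeric = var_G_numeric(3.0, alpha, beta, m1, m2)
```

For large t the exponential term becomes negligible and the variance grows essentially linearly, with the slope given by the bracket in (25).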
Next Steps: Diffusion Approximation
We choose the same parameters as above, (λ, α, β) = (1, 2, 5),
$(Y_i)$ ~ Exp(1/4), u = 10 and θ = 0.1, and simulate 500 runs of
the risk process $R^{(n)}_t$ for n = 100 over the interval [0, 100].
At each time step, we compare the variance over the 500
realisations (black line) with the variance from (25) (red line)
and find that they are very similar.
Next Steps: Diffusion Approximation
Cheng and Seol (2018) further derive approximations for the
ruin probability based on the diffusion approximation, but
unfortunately they can only be computed numerically. In their
paper, they provide numerical examples for exponentially and
gamma-distributed claim sizes (based on simulated data).
Next Steps: Optimal Investment
Recall: Theorem 7 derived by Swishchuk (2017) allows us to
approximate the risk process R(t) by the jump-diffusion process
D(t):
$$ R(t) \approx u + ct - N(t)a^* + \sigma W(t) := u + D(t) \tag{26} $$
where D(t) is
$N\big((c - a^* \tfrac{\lambda}{1 - \mu})\,t,\ \sigma^2 t\big)$-distributed
and σ is the diffusion coefficient from (19).
Next Steps: Optimal Investment
Recall: In his presentation during the Hawks Seminar on July
4, Mohamed briefly mentioned the work of Browne (1995) on
optimal investment for an insurer who can invest in a risky asset
$(S_t)$ following
$$ dS_t = S_t(\mu\,dt + \sigma\,dW_t) $$
when the insurer's risk process is described by
$$ R(t) = u + ct + \beta W^{(1)}_t \tag{27} $$
where $W^{(1)}_t$ is a Brownian motion such that
$E[W_t W^{(1)}_t] = \rho t$ and ρ is the correlation coefficient.
Next Steps: Optimal Investment
As the risk of bankruptcy cannot be eliminated in this model,
Browne solves for the strategy that minimizes the probability of
ruin for an insurer with exponential utility function. He shows
that this strategy is equivalent to maximizing utility from
terminal wealth, given that there is no risk-free asset.
This work is interesting for us, as the representation of our risk
process in (26) is closely related to the risk process (27) used in
Browne (1995). We hope to apply the same steps from his work to
find an optimal strategy for a risk model based on Hawkes claims
arrivals (which to the best of our knowledge has not been addressed
so far).
References
• Albrecher, H. et al. (2006). Ruin probabilities and aggregate
claims distributions for shot noise Cox processes.
• Asmussen, S. (1986). Risk theory in a Markovian environment.
• Asmussen, S. and Albrecher, H. (2010). Ruin Probabilities.
• Bacry, E. et al. (2015). Hawkes processes in finance.
• Brown, E. et al. (2002). The time-rescaling theorem and its
application to neural spike train data analysis.
• Browne, S. (1995). Optimal Investment Policies for a Firm
with a Random Risk Process: Exponential Utility and Minimizing
the Probability of Ruin.
• Cheng, Z. and Seol, Y. (2018). Gaussian Approximation of a Risk
Model with Stationary Hawkes Arrivals of Claims.
• Cramer, H. (1955). Collective Risk Theory: A Survey of
the Theory from the Point of View of the Theory of Stochastic
Processes.
• Da Fonseca, J. and Zaatour, R. (2014). Hawkes process:
Fast calibration, application to trade clustering, and diffusive
limit.
• Daley, D. and Vere-Jones, D. (2013). An Introduction to the
Theory of Point Processes.
• Dassios, A. and Jang, J. (2012). A bivariate shot noise
self-exciting process for insurance.
• Dassios, A. and Zhao, H. (2012). Ruin by Dynamic Contagion
Claims.
• Filimonov, V. and Sornette, D. (2013). Apparent criticality
and calibration issues in the Hawkes self-excited point process
model: application to high-frequency financial data.
• Hawkes, A. (1971). Point Spectra of Some Mutually Exciting
Point Processes.
• Jaisson, T. and Rosenbaum, M. (2014). Limit theorems for
nearly unstable Hawkes processes.
• Laub, J. et al. (2015). Hawkes Processes.
• Lu, Y. and Li, S. (2005). On the Probability of Ruin in a
Markov-modulated Risk Model.
• Lundberg, F. (1903). Approximerad Framställning av
Sannolikhetsfunktionen, Återförsäkring av Kollektivrisker.
• Mikosch, T. and Samorodnitsky, G. (2000). Ruin Probability
with Claims modelled by a stationary ergodic stable process.
• Ogata, Y. (1981). Information Theory.
• Reinhard, J. (1984). On a Class of Semi-Markov Risk Models
Obtained as Classical Risk Models in a Markovian Environment.
• Stabile, G. et al. (2010). Risk Processes with Non-stationary
Hawkes Claims Arrivals.
• Swishchuk, A. (2017). Risk Model Based on General Compound
Hawkes Processes.
• Swishchuk, A. et al. (2017). General Semi-Markov Model
for Limit Order Books.
• Swishchuk, A. et al. (2017, preprint). Compound Hawkes
Processes in Limit Order Books.
• Swishchuk, A. (2000). Random Evolutions and Their
Applications: New Trends.
• Zhang, C. (2016). Modeling High Frequency Data Using
Hawkes Processes with Power-law Kernels.
• Zhu, L. (2013). Ruin probabilities for risk processes with
non-stationary arrivals and subexponential claims.
The End
Thank You!
Q&A time!