Engineering Applications of POT, GEV and Compound Poisson Models: A Unified Approach via Bayesian Thinking
Eric Parent, Jacques Bernier & Jean-Jacques Boreux
UMR 518 Math. Info. App. ENGREF/INRA/INAPG
MORSE team (MOdélisation, Risque, Statistique, Environnement)
Statistical Modeling of Extremes in Data Assimilation and Filtering Approaches, Strasbourg, 23-26 June 2008
Some nice things to do with an exponentially marked Poisson process
• Application 1: fully observed, Pickands’ POT!
• Application 2: max observed, GEV (Gumbel)
• Application 3: sum observed instead of max
• Application 4: Application 3 + go hierarchical
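Two parameters only for the trajectory of a marked Poisson process
[Figure: trajectory of the marked process on [0, T], with marks X_i, X_{i+1}, X_{i+2}, ..., X_n above the threshold u_0]
Number of events on [0, T]:
$[N_T = n \mid \mu, T] = \frac{(\mu T)^n}{n!} \exp(-\mu T)$
Exponential marks above the threshold:
$[X_i \le x \mid X_i > u_0, \rho] = 1 - \exp(-\rho (x - u_0))$
With an additional shape parameter $\beta$:
$[X_i \le x \mid X_i > u_0, \rho, \beta] = 1 - \exp\left(\frac{\rho}{\beta} \log(1 - \beta (x - u_0))\right)$ if $\beta \ne 0$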
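The two-parameter trajectory can be simulated directly from its definition. The sketch below is only an illustration: the function name and the numerical values of µ, ρ, u_0 and the horizon are hypothetical and do not come from the slides. It draws exponential waiting times (which yields Poisson counts) and attaches an exponential exceedance to each event.

```python
import random


def simulate_marked_poisson(mu, rho, u0, horizon, rng):
    """Simulate event times and marks of the process on [0, horizon].

    mu: event rate per unit of time, rho: rate of the exponential
    exceedances above the threshold u0 (all values are illustrative).
    """
    times, marks = [], []
    t = rng.expovariate(mu)  # exponential waiting times give Poisson counts
    while t < horizon:
        times.append(t)
        marks.append(u0 + rng.expovariate(rho))  # X - u0 ~ Exp(rho)
        t += rng.expovariate(mu)
    return times, marks


rng = random.Random(2008)
horizon = 1000.0
times, marks = simulate_marked_poisson(mu=4.0, rho=0.5, u0=10.0,
                                       horizon=horizon, rng=rng)
rate_hat = len(times) / horizon                           # should be near mu
mean_excess = sum(m - 10.0 for m in marks) / len(marks)   # should be near 1 / rho
print(rate_hat, mean_excess)
```

With a long horizon, the empirical rate and the mean exceedance should be close to µ and 1/ρ respectively.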
Application 1: Rains at Bar/Seine
The exponentially marked point Poisson process is completely observed
[Figure: directed acyclic graph. Parameters: ρ and µ. Observables: N_j and X_ij, for j in 1:m and i in 1:n_j. Generic scheme: parameters θ, observable Y]
[Figure: number of rainy events at Bar/Seine (15 July-15 August): histogram of the total number of rain events, with the expected values for a Poisson distribution with average number = 4]
[Figure: water quantities during rainy events, against the starting date of the rainy events]
Bayesian inference easy! The conjugacy « miracle »
[Figure: directed acyclic graph with plots of the prior densities. Number of events N: Poisson with parameter µ, Gamma prior on µ. Quantity of water during each rainy event X: Exponential with parameter ρ, Gamma prior on ρ]
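The conjugacy mentioned here means that the posterior distributions stay in the Gamma family, so the update reduces to adding sufficient statistics to the prior parameters. The following sketch assumes Gamma priors parameterized by shape and rate; the counts, the prior values and the function names are hypothetical, chosen only to make the arithmetic visible.

```python
import random


def gamma_poisson_update(shape, rate, counts):
    """Gamma(shape, rate) prior on mu, Poisson counts observed per period."""
    return shape + sum(counts), rate + len(counts)


def gamma_exponential_update(shape, rate, exceedances):
    """Gamma(shape, rate) prior on rho, exponential exceedances observed."""
    return shape + len(exceedances), rate + sum(exceedances)


rng = random.Random(1)
counts = [3, 5, 4, 2, 6]  # hypothetical numbers of events in 5 periods
exceedances = [rng.expovariate(0.5) for _ in range(sum(counts))]  # toy marks

mu_shape, mu_rate = gamma_poisson_update(1.0, 1.0, counts)
rho_shape, rho_rate = gamma_exponential_update(1.0, 1.0, exceedances)

# random.gammavariate takes (shape, scale), hence the 1 / rate
mu_draws = [rng.gammavariate(mu_shape, 1.0 / mu_rate) for _ in range(2000)]
print(mu_shape / mu_rate, rho_shape / rho_rate)  # posterior means
```

No MCMC is needed in this fully observed case: the posterior of µ and the posterior of ρ are available in closed form and can be sampled independently.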
Application 2: Max annual flow at Bar/Seine
The exponentially marked point Poisson process is incompletely observed (only the max per year)
[Figure: daily flows of the river Seine at Bar/Seine (x axis: days; y axis: flows)]
[Figure: floods and yearly maxima of the river Seine at Bar/Seine (x axis: days; y axis: floods)]
The exponentially marked Poisson process is only partially observed
[Figure: directed acyclic graph. Parameters: ρ and µ. Latent variables: N_j and X_ij, for j in 1:m and i in 1:n_j. Observed: Y_j]
$Y_j = \max_i (X_{ij})$
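Why the yearly maximum of this process follows a Gumbel law can be checked in two lines. The derivation below is a sketch that assumes µ is the yearly event rate, that the marks are independent of the number of events, and that y is at or above the threshold u_0:

```latex
\begin{aligned}
P(Y_j \le y)
  &= \sum_{n=0}^{\infty} e^{-\mu} \frac{\mu^{n}}{n!}
     \left(1 - e^{-\rho (y - u_0)}\right)^{n} \\
  &= \exp\left(-\mu \, e^{-\rho (y - u_0)}\right), \qquad y \ge u_0 .
\end{aligned}
```

This is a Gumbel distribution function with location $u_0 + \log(\mu)/\rho$ and scale $1/\rho$, so the two parameters of the latent process carry over directly to the model for the maxima.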
Formulation: hierarchical scheme
• θ: parameter
• Z: latent variable, with distribution [Z|θ]
• Y: observable, with distribution [Y|Z, θ]
Likelihood: $[Y \mid \theta] = \int_Z [Y \mid Z, \theta] \times [Z \mid \theta] \, dZ$
Bayesian inference by Data Augmentation = « reconstruction »
[X|Y, N, ρ, µ] is a truncated exponential!
[Figure: directed acyclic graph with plots of the prior densities. Event number N: Poisson with parameter µ, Gamma prior on µ. Quantity of water during each event X: Exponential with parameter ρ, Gamma prior on ρ. Observed: the annual max Y]
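One « reconstruction » step for a single year can be sketched as follows. Given the observed maximum y and current values of µ and ρ, the number of events other than the maximum is Poisson with mean µ F(y), where F(y) = 1 - exp(-ρ(y - u_0)), and each of these events is a truncated exponential on (u_0, y). The function names are hypothetical, and the values of µ and ρ below are illustrative only; the threshold 147 and the maximum 1163 are the values quoted for 1955 in the deck.

```python
import math
import random


def sample_poisson(rng, mean):
    """Count unit-rate exponential arrivals falling in [0, mean]."""
    n, t = 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def sample_truncated_exp(rng, rho, u0, upper):
    """Inverse-CDF draw of a mark above u0 constrained to stay below upper."""
    mass = 1.0 - math.exp(-rho * (upper - u0))
    u = rng.random()
    return u0 - math.log(1.0 - u * mass) / rho


def augment_year(rng, y_max, mu, rho, u0):
    """Draw the latent events of one year given its observed maximum."""
    f_y = 1.0 - math.exp(-rho * (y_max - u0))
    n_others = sample_poisson(rng, mu * f_y)  # N - 1 | y ~ Poisson(mu * F(y))
    others = [sample_truncated_exp(rng, rho, u0, y_max) for _ in range(n_others)]
    return 1 + n_others, sorted(others + [y_max], reverse=True)


rng = random.Random(1955)
n_1955, floods_1955 = augment_year(rng, y_max=1163.0, mu=4.0, rho=0.006, u0=147.0)
print(n_1955, floods_1955)
```

In a full Gibbs sampler this step would alternate with conjugate Gamma updates of µ and ρ computed on the reconstructed events.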
Estimates from the maxima of a Poisson process with exponential marks (Gumbel model)
[Figure: joint posterior distribution of µ and ρ; Corr(ρ, µ) = 0.53]
[Figure: posterior distribution of the flood with a one-century return period]
Predictions thanks to the latent Poisson process with exponential marks
• Probable number of floods during year 1955 (given y1955 = 1163): an observable inferred as a latent variable!
• What might have been the magnitude of the second flood in 1955 (given the max y1955 = 1163)? A latent variable again!
[Figure: posterior distributions of ρ and µ under the two models]
• Poisson point process with exponential marks: 123 events, 31 years, 21144 mm/10 accumulated above a threshold of 147 mm/10
• Gumbel model for the annual max
Large events of 1955: 3 floods
• 17 January: 1163 mm/10
• 11 February: 583 mm/10
• 28 February: 282 mm/10
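The summary statistics quoted for the fully observed process (123 events, 31 years, 21144 accumulated above the threshold of 147) are enough for a back-of-the-envelope check. The sketch below assumes that the accumulated quantity is the sum of the exceedances above the threshold, uses hypothetical vague Gamma priors, and plugs the posterior means into the Gumbel formula for the annual maximum; it is not the authors' computation.

```python
import math

n_events, n_years, total_excess, u0 = 123, 31, 21144.0, 147.0
a = b = c = d = 0.001  # hypothetical vague Gamma(shape, rate) priors

mu_mean = (a + n_events) / (b + n_years)        # posterior mean of mu (events per year)
rho_mean = (c + n_events) / (d + total_excess)  # posterior mean of rho


def gumbel_return_level(period, mu, rho, threshold):
    """Level y solving exp(-mu * exp(-rho * (y - threshold))) = 1 - 1 / period."""
    return threshold + math.log(mu / -math.log(1.0 - 1.0 / period)) / rho


q100 = gumbel_return_level(100.0, mu_mean, rho_mean, u0)
print(mu_mean, rho_mean, q100)
```

Under these assumptions the plug-in one-century level lands close to the 1955 maximum of 1163, which gives a feel for how exceptional that flood was.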
Historical records of the Garonne at Mas d'Agenais
Date         Height (m)   Estimated flow (m3/s)
April 1770   10.34        7000 to 7400
Sept. 1772   -            6300
March 1783   -            7000 to 7200
May 1827     -            6500
May 1835     -            6400
Jan. 1843    -            6500
June 1855    9.96         7000
May 1856     9.62         6200
June 1856    9.88         6600
June 1875    10.56        7000 to 7500 (perhaps 8000)
Jan. 1879    9.62         6300
Feb. 1879    10.02        7000
May 1918     9.51         6000
March 1927   9.97         6700
March 1930   10.72        7000 to 7500 (perhaps 8000)
March 1935   9.95         6700
Feb. 1952    10.26        6000 to 7000
Jan. 1955    9.32         5200 to 5700
[Figure: exceedances of the Garonne above 2500 m3/s. y axis: flow q(i) (m3/s), from 2500 to 7000. x axis: dates of the 24 events, from 8 April 1913 to 29 November 1974]
Floods and latent variables
[Figure: time axis with ticks at 1900 and 2000, showing the historical floods, the missing values Z and the systematic data]
Data Augmentation algorithm
• Generate K, the number of missing data
  - Poisson pdf with parameter µH
• Draw Z for the missing flood data
  - Truncated GEV with parameter θ
  - Same pdf structure as the already collected data
• Compute the posterior pdf of the parameters θ on the augmented sample
Bayesian estimation with historical data
[Figure: posterior histograms of k, alpha and mu]
[Figure: mean quantiles and 70% credibility intervals against the return period]
Estimation on historical data
[Figure: posterior histograms of the quantiles Q10, Q100 and Q1000]
         Mean      Median    Upper bound 95%   Lower bound 95%
mu       2.33      2.32      2.64              2.01
k        0.22      0.22      0.32              0.11
alpha    1477.63   1468.41   1739.05           1234.42
Q10      5808.35   5807.08   6049.74           5564.52
Q100     7205.26   7166.80   7759.84           6805.37
Q1000    8060.79   7967.45   9101.08           7375.53
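The link between the three parameters and the design quantiles can be made explicit with a small plug-in computation. The sketch below assumes that mu is the yearly rate of exceedances of the 2500 m3/s threshold and that k and alpha are the shape and scale of a generalized Pareto law for the exceedances, so that the annual maximum has distribution function exp(-mu (1 - k (x - u) / alpha)^(1/k)). This parameterization is an assumption, not something the slides state, and the function name is hypothetical.

```python
import math


def pot_return_level(period, mu, k, alpha, threshold):
    """Annual-maximum quantile of probability 1 - 1 / period under the assumed model."""
    yearly_tail = -math.log(1.0 - 1.0 / period)  # mu * P(X > x) at the quantile
    return threshold + (alpha / k) * (1.0 - (yearly_tail / mu) ** k)


# posterior means reported in the table for mu, k and alpha
mu, k, alpha, threshold = 2.33, 0.22, 1477.63, 2500.0
q10, q100, q1000 = (pot_return_level(t, mu, k, alpha, threshold)
                    for t in (10.0, 100.0, 1000.0))
print(q10, q100, q1000)
```

The plug-in values come out within about one percent of the reported posterior means of Q10, Q100 and Q1000. They need not match exactly, since the quantile evaluated at the posterior mean of the parameters differs from the posterior mean of the quantile.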
Conclusions: Mas d'Agenais
Each cell gives the estimate without historical data, then with historical data.
         Median         Upper bound 95%   Lower bound 95%
Q10      5520 / 5810    5990 / 6050       5190 / 5565
Q100     7004 / 7170    8450 / 7760       6350 / 6805
Q1000    8020 / 7970    10880 / 9100      6930 / 7375
Historical data hardly change the estimate of the central value but, in this case, they halve the estimation uncertainty of the design values.
Maximum likelihood:
         Estimate   Upper bound 95%   Lower bound 95%
Q10      5510       5940              5090
Q100     6810       7550              6070
Q1000    7550       8730              6370
Application 3: Monthly rains at Ghezala dam, Tunisia
The exponentially marked point Poisson process is incompletely observed (only the sum per period)
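The exponentially marked Poisson process is only partially observed
[Figure: directed acyclic graph. Parameters: ρ and µ. Latent variables: N_j and X_ij, for j in 1:m and i in 1:n_j. Observed: Y_j]
$Y_j = \sum_{i=1}^{n_j} X_{ij}$
Formulation: hierarchical scheme (again…)
• θ: parameter
• Z: latent variable, with distribution [Z|θ]
• Y: observable, with distribution [Y|Z, θ]
Likelihood: $[Y \mid \theta] = \int_Z [Y \mid Z, \theta] \times [Z \mid \theta] \, dZ$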
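The « loi des fuites » (leakage law): Y ~ fuitepdf(ρ, µ)
• Definition: compound Poisson sum of exponential marks
$Y = \sum_{i=1}^{N} X_i$; $X_i \sim \exp(\rho)$, $N \sim \mathrm{Pois}(\mu)$
• Other definition: Poisson convolution of gamma variates (given N = k ≥ 1, the sum of k exponential marks is a Gamma(k, ρ) variate)
• Density for ρ = 1: an atom at y = 0 plus a continuous part for y > 0
$[Y = y \mid \rho = 1, \mu] = e^{-\mu} \, 1_{y = 0} + 1_{y > 0} \sum_{k=1}^{\infty} \frac{e^{-\mu} \mu^{k}}{k!} \, \frac{y^{k-1} e^{-y}}{(k-1)!}$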
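The continuous part of this density is a Poisson-weighted series of gamma densities, which is easy to evaluate numerically. The sketch below writes the k-th term for a general ρ as e^{-µ} µ^k / k! times ρ^k y^{k-1} e^{-ρy} / (k-1)!, truncates the series at a fixed number of terms (an assumption that is adequate for moderate µρy), and checks that the continuous part integrates to 1 - e^{-µ}. The function name is hypothetical.

```python
import math


def fuites_density(y, mu, rho=1.0, terms=200):
    """Continuous part of the density for y > 0 (the atom at 0 has mass exp(-mu))."""
    if y <= 0.0:
        raise ValueError("the continuous part is defined for y > 0 only")
    log_base = -mu - rho * y - math.log(y)
    log_c = math.log(mu * rho * y)
    total = 0.0
    for k in range(1, terms + 1):
        # log of e^{-mu} mu^k / k! * rho^k y^(k-1) e^{-rho y} / (k-1)!
        log_term = log_base + k * log_c - math.lgamma(k + 1) - math.lgamma(k)
        total += math.exp(log_term)
    return total


mu, rho, step = 2.0, 1.0, 0.01
# midpoint rule on (0, 60); the tail beyond 60 is negligible for these values
mass = sum(fuites_density((i + 0.5) * step, mu, rho) for i in range(6000)) * step
print(mass, 1.0 - math.exp(-mu))
```

Working on the log scale with `math.lgamma` avoids overflow in the factorials when many terms are kept.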
Simulations
    N = poissrnd(mu, 1, 1000);        % number of events in each period
    y = zeros(size(N));               % periods without any event keep y = 0
    for i = 1:length(N)
        if N(i) > 0
            y(i) = gamrnd(N(i), 1);   % sum of N(i) Exp(1) marks, i.e. rho = 1
        end
    end
[Figure: histograms of simulated values for µ = 1.2, ρ = 1 (50 data points); µ = 2, ρ = 1 (1000 data points); µ = 14, ρ = 1 (500 data points)]
Real data: monthly rain in a bucket at Ghezala dam, February and August
[Figure: distribution of monthly rainfall at Ghezala dam in February (x axis: monthly rain in mm, 0 to 300)]
[Figure: distribution of monthly rainfall at Ghezala dam in August (x axis: monthly rain in mm, 0 to 60)]
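Loi des fuites
• Moments
$E(Y \mid \rho, \mu) = \mu \times \rho^{-1}$
$\mathrm{Var}(Y \mid \rho, \mu) = 2 \mu \times \rho^{-2}$
• Characteristic function (for ρ = 1)
$\psi_Y(s) = E(e^{isY} \mid \rho, \mu)$
$\psi_Y(s) = e^{-\mu} + \sum_{k=1}^{\infty} \frac{e^{-\mu} \mu^{k}}{k! \, (k-1)!} \int_0^{\infty} e^{isy} \, y^{k-1} e^{-y} \, dy$
$\psi_Y(s) = \exp\left(\mu \, \frac{is}{1 - is}\right)$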
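The two moments follow from the standard identities for a random sum of independent terms. As a sketch, assuming N is independent of the marks, with E(X) = 1/ρ and Var(X) = 1/ρ² for exponential marks of rate ρ:

```latex
\begin{aligned}
E(Y) &= E(N)\,E(X_1) = \mu \, \rho^{-1}, \\
\mathrm{Var}(Y) &= E(N)\,\mathrm{Var}(X_1) + \mathrm{Var}(N)\,E(X_1)^2
   = \mu \, \rho^{-2} + \mu \, \rho^{-2} = 2 \mu \, \rho^{-2}.
\end{aligned}
```

The variance is therefore always twice the mean divided by ρ, which gives a quick moment-based check on fitted values of µ and ρ.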
Bayesian inference not difficult!
[Figure: directed acyclic graph with plots of the prior densities. µ with its prior, latent number of events K (Poisson); ρ with its prior, observed total Y given K (Gamma)]
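The conditional structure of the graph suggests a Gibbs sampler that alternates between the latent counts K and the two parameters. The sketch below is one possible implementation, not the authors' code: it assumes Gamma(shape, rate) priors with hypothetical values, truncates the support of K at 60, and runs on simulated data. Given K, the updates of µ and ρ are conjugate; given µ, ρ and a total y > 0, the weight of K = k is proportional to (µρy)^k / (k! (k-1)!).

```python
import math
import random

K_MAX = 60  # truncation of the support of K: an assumption of this sketch
LOG_FACT = [0.0] + [math.lgamma(k + 1) + math.lgamma(k) for k in range(1, K_MAX + 1)]


def sample_poisson(rng, mean):
    """Count unit-rate exponential arrivals falling in [0, mean]."""
    n, t = 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def sample_k(rng, y, mu, rho):
    """Draw the latent number of events given the total y."""
    if y == 0.0:
        return 0  # a zero total can only come from zero events
    log_c = math.log(mu * rho * y)
    log_w = [k * log_c - LOG_FACT[k] for k in range(1, K_MAX + 1)]
    top = max(log_w)
    weights = [math.exp(v - top) for v in log_w]
    return rng.choices(range(1, K_MAX + 1), weights=weights)[0]


def gibbs(ys, n_iter, rng, a=1.0, b=1.0, c=1.0, d=1.0):
    """Alternate K | (mu, rho, y) with the conjugate updates of mu and rho."""
    mu, rho, draws = 1.0, 1.0, []
    for _ in range(n_iter):
        ks = [sample_k(rng, y, mu, rho) for y in ys]
        mu = rng.gammavariate(a + sum(ks), 1.0 / (b + len(ys)))
        rho = rng.gammavariate(c + sum(ks), 1.0 / (d + sum(ys)))
        draws.append((mu, rho))
    return draws


rng = random.Random(7)
# simulated totals with mu = 2 and rho = 0.5 (toy data, not the Ghezala series)
ys = [sum((rng.expovariate(0.5) for _ in range(sample_poisson(rng, 2.0))), 0.0)
      for _ in range(60)]
draws = gibbs(ys, n_iter=300, rng=rng)
kept = draws[100:]  # discard a short burn-in
post_mean_ratio = sum(m / r for m, r in kept) / len(kept)
print(post_mean_ratio, sum(ys) / len(ys))
```

The ratio µ/ρ, which is the mean of Y, is well identified and should stay close to the sample mean; µ and ρ taken separately are strongly correlated in the posterior, so longer runs are advisable in practice.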
Ghezala: posterior parameter distributions (MCMC, Gibbs)
[Figure: Ghezala dam, February and August: posterior probability densities of the parameters ρ ("ro") and µ ("mu"), each shown as an MCMC histogram and as a Rao-Blackwell estimate]
• In February: $\hat{\mu} = 7$; $\hat{\rho} = 0.065$; $\hat{\rho}^{-1} = 15.4$ mm
• In August: $\hat{\mu} = 0.9$; $\hat{\rho} = 0.12$; $\hat{\rho}^{-1} = 8.3$ mm
The leakage model (modèle des fuites)
• Various hydrological situations (zero-inflated Ghezala summer distribution)
• Help of Tom Bayes (relying on conditional structure)
• Convenient interpretation of parameters + credibility bands
• In February: $\hat{\mu} = 7$; $\hat{\rho} = 0.065$; $\hat{\rho}^{-1} = 15.4$ mm
• In August: $\hat{\mu} = 0.9$; $\hat{\rho} = 0.12$; $\hat{\rho}^{-1} = 8.3$ mm
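The interpretation of the parameters can be illustrated with simple arithmetic on the point estimates quoted for Ghezala (µ̂ = 7 and ρ̂⁻¹ = 15.4 mm in February; µ̂ = 0.9 and ρ̂⁻¹ = 8.3 mm in August). This is a plug-in calculation, ignoring posterior uncertainty, that uses P(Y = 0) = e^{-µ} and E(Y) = µ/ρ:

```latex
\begin{aligned}
\text{February:}\quad & P(Y = 0) = e^{-7} \approx 0.0009, &
  E(Y) &= 7 \times 15.4 \approx 108 \text{ mm}, \\
\text{August:}\quad & P(Y = 0) = e^{-0.9} \approx 0.41, &
  E(Y) &= 0.9 \times 8.3 \approx 7.5 \text{ mm}.
\end{aligned}
```

A dry month is thus practically impossible in February and occurs roughly four times out of ten in August, which is the zero inflation that the model captures with µ alone.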
Predictive validation of the « loi des fuites »
[Figure: Ghezala dam, August: modelled and non-parametric (+) predictive distributions. x axis: values of the leakage variable (0 to 60); y axis: -log10 of the exceedance probability]
Cumulative predictive distributions: annual, over 10 years and over 50 years
[Figure: Ghezala dam, February: non-exceedance probability of the annual value (P1), of the minimum over 10 years (P10) and of the minimum over 50 years (P50); x axis from 0 to 300]
[Figure: Ghezala dam, August: annual predictive distribution (x axis from 0 to 60); the minimum over 10 and over 50 years is 0 with probability 1]
Application 4: Trawl data in St. Lawrence bay, Canada
A compound Poisson distribution for zero-inflated data
Context
• Yearly scientific catches with a trawl in the bay
Data collection
• Every September since 1971
[Figure: map of the bay near Moncton (60° to 66° W, 46° to 49° N) showing the survey strata 401-403 and 415-439]
Sampling design
• Standard trawl sweep of 3.24 km
• Biomass weighed
stratum  starfish  depth  temperature  longitude  latitude  totconsum   primary
416      0.074     140.5  2.6          -63.631    48.653    0.04041808  glacial drift
416      0.026     135.5  3.2          -63.6857   48.6887   0.05543526  glacial drift
416      0.021     128    2.8          -63.9293   48.6447   0.02925672  coarse sand
416      0         100.5  1.9          -64.0977   48.5487   0.02270026  pelite
416      0         125.5  3.4          -63.3615   48.5197   0.03399162  glacial drift
415      0         200    4.4          -63.8143   48.851    0.15000818  pelite
415      0         228.5  4.6          -63.9892   48.8862   0.29882976  pelite
415      0         190    4.1          -64.253    48.9933   0.28090004  pelite
415      0         313.5  5.4          -63.8677   48.9662   0.16415286  pelite
415      0         251    5.1          -63.6463   48.8453   0.37630839  pelite
415      0         207    4.7          -63.1627   48.5592   0.17525881  fine sand
403      0         24     14.8         -61.6037   45.8392   0.64985972  gravel with occasional sand patches
403      0         26.5   17.6         -61.6212   45.7772   2.11126564  pelite
403      0         18.5   19.3         -61.7698   45.7037   -           gravel with occasional sand patches
403      0.024     28     13.4         -61.8452   45.8712   -           pelite
402      0         15     16.3         -63.2725   45.8683   -           pelite
402      0.001     17     16.8         -63.4572   45.9542   -           pelite
402      0         19.5   18.2         -63.4303   46.0482   -           pelite
401      0.002     19.5   13.9         -62.73     46.4845   -           gravel with occasional sand patches
401      0         30.5   3.7          -63.253    46.5448   -           coarse sand
401      0         19     11.7         -63.7232   46.6475   -           coarse sand
401      0         25     11.4         -63.8482   46.8368   -           coarse sand
401      0.456     26.5   14.9         -61.999    46.5003   -           gravel with occasional sand patches
totconsum for the last ten rows (only nine values are recoverable, so their row alignment is uncertain): 1.31965306, 0.69245353, 0.05808044, 0.01246536, 0.09576385, 0.5569532, 0.15432443, 0.13635805, 0.19407908
The challenges…
• Define a random structure to represent bottom-trawl survey data
• Find a parsimonious alternative to a mixture model with a delta distribution
Hypotheses behind the « loi des fuites »
[Figure: maps of the survey strata (401-403, 415-439), with clumps of biomass scattered within a stratum]
• 1 stratum = 1 homogeneous life zone
• Each symbol on the map = a clump with a random biomass quantity
• Case 1: some biomass is collected = the sum of the biomass in each clump
• Case 2: no biomass is caught
Spatial model 1: Regionalisation
[Figure: map of the survey strata and directed acyclic graph. Hyperparameters a, b, c, d; stratum parameters µ_i and ρ_i; latent N_{i,k} and X_{i,k,j}; observed Y_{i,k}]
• Between-strata variability: hyperparameters a, b, c, d
• Within-strata variability: the « loi des fuites »
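The two levels of variability can be made concrete with a small generative sketch. It assumes, as one plausible reading of the graph, that µ_i follows a Gamma law with hyperparameters (a, b) and ρ_i a Gamma law with hyperparameters (c, d), both in the shape/rate convention; the hyperparameter values, the number of strata and the function names are hypothetical.

```python
import random


def sample_poisson(rng, mean):
    """Count unit-rate exponential arrivals falling in [0, mean]."""
    n, t = 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def simulate_strata(n_strata, hauls_per_stratum, a, b, c, d, rng):
    """Draw stratum parameters, then trawl catches within each stratum."""
    data = []
    for _ in range(n_strata):
        mu_i = rng.gammavariate(a, 1.0 / b)   # between-strata variability
        rho_i = rng.gammavariate(c, 1.0 / d)
        hauls = []
        for _ in range(hauls_per_stratum):
            n_ik = sample_poisson(rng, mu_i)  # clumps met by haul k
            # within-strata variability: sum of exponential clump biomasses
            y_ik = sum((rng.expovariate(rho_i) for _ in range(n_ik)), 0.0)
            hauls.append(y_ik)
        data.append((mu_i, rho_i, hauls))
    return data


rng = random.Random(1971)
data = simulate_strata(n_strata=5, hauls_per_stratum=8,
                       a=2.0, b=2.0, c=3.0, d=1.0, rng=rng)
for mu_i, rho_i, hauls in data:
    print(round(mu_i, 2), round(rho_i, 2), sum(1 for y in hauls if y == 0.0))
```

Strata with a small µ_i produce many exact zeros, which is how the hierarchy reproduces zero-inflated catches without a separate mixture component.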
Inference of a mixed effect model with hierarchical submodels
[Figure: directed acyclic graph with plots of the prior densities. Hyperparameters (a, b) and (c, d); µs with the Poisson layer for K; ρs with the Gamma layer for Y]
• Each stratum s is a fishing zone (µs, ρs)
• Inference: Bayesian techniques, EM algorithms
[Figure: maps of the urchins in 2002 and of the starfish in 2002, and adjacency map of the strata]
Spatial model 2: IAR on a lattice
• Trend
• Spatial variability: Gaussian CAR
• Non-spatial residual
• Neighbouring coherency between strata (Gaussian CAR)
Inference on µi
[Figure: map of the posterior means of µi; posterior distributions of exp(-µi)]
• Latent field for local biomass abundance
• Latent field for local chance of a clump
Spatial model 3: A three-stage hierarchical model
[Figure: urchins, 2000: locations (longitude from -64 to -61, latitude from 46 to 49) of the hauls with zero catch]
Model structure
• Latent log-Gaussian field for the local chance of a clump:
$\log(\mu) \sim N(\beta, \Sigma)$, $\Sigma(i, i') = \sigma^2 \exp(-\phi \, d(i, i')^{\kappa})$
• Hyperparameters: β, σ², φ, κ
• Uniform prior, with a range chosen so that the correlation is < 0.66 at the maximum distance and < 0.99 at the minimum distance
• Unif(0.05, 1.95)
• ρ: Gamma prior; the clump biomass abundance is assumed stationary
[Figure: plots of the prior densities]
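A draw from such a latent field can be sketched with a covariance matrix and a Cholesky factor. The code below assumes the powered exponential form σ² exp(-φ d^κ) with Euclidean distance between sites; the site coordinates and the values of β, σ², φ and κ are hypothetical, and the function names are not from the slides.

```python
import math
import random


def powered_exp_cov(coords, sigma2, phi, kappa):
    """Covariance matrix with entries sigma2 * exp(-phi * d(i, j) ** kappa)."""
    n = len(coords)
    return [[sigma2 * math.exp(-phi * math.dist(coords[i], coords[j]) ** kappa)
             for j in range(n)] for i in range(n)]


def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def sample_log_gaussian_field(coords, beta, sigma2, phi, kappa, rng):
    """Draw mu at each site, with log(mu) ~ N(beta, Sigma)."""
    low = cholesky(powered_exp_cov(coords, sigma2, phi, kappa))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    return [math.exp(beta + sum(low[i][k] * z[k] for k in range(i + 1)))
            for i in range(len(coords))]


rng = random.Random(2000)
sites = [(-64.0, 48.5), (-63.5, 48.8), (-63.0, 47.2), (-62.0, 46.5), (-61.6, 45.8)]
cov = powered_exp_cov(sites, sigma2=0.5, phi=1.0, kappa=1.0)
mu_field = sample_log_gaussian_field(sites, beta=0.0, sigma2=0.5,
                                     phi=1.0, kappa=1.0, rng=rng)
print(mu_field)
```

Nearby sites receive correlated values of µ, so neighbouring hauls share a similar chance of meeting a clump.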
[Figure: urchins, 2000: mean of the latent Gaussian field (x axis: longitude; y axis: latitude)]
[Figure: urchins, 2000: median of the predictive spatial distribution of the biomass (x axis: longitude; y axis: latitude)]
The « loi des fuites »: conclusions
• Parsimonious
• Conceptual interpretation
• Inference benefits from conditional structure
• Infinitely divisible
• Lego brick for various constructions
Further tracks
• Multivariate constructions
• Spatio-dynamic modelling
• Normal convolutions, following Calder, Wikle, Craigmile…
• Gamma convolutions, following Wolpert, R.L. and Ickstadt, K. (1998). Poisson/Gamma random field models for spatial statistics. Biometrika, 85, 251-267.
• Markov point processes, following Berliner
Conditional modelling strategy in Hierarchical Bayesian Models
• Modelling: capacity to accommodate complexity
Parameters, state process and observation process can be modelled independently
• Parameters θ = (θ1, θ2), with prior [θ]
• Z: latent process (individuals, populations, time, space)
[Z|θ]: random process of the states Z conditioned upon θ
• Y: observations
[Y|Z, θ]: random process of the observations conditioned upon Z and θ
• Modelling follows p(effect|cause)
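The three layers combine into a single joint distribution, and inference amounts to reversing the conditioning. Written in the bracket notation of the deck, as a sketch:

```latex
\begin{aligned}
[Y, Z, \theta] &= [Y \mid Z, \theta] \times [Z \mid \theta] \times [\theta], \\
[Z, \theta \mid Y] &= \frac{[Y \mid Z, \theta] \times [Z \mid \theta] \times [\theta]}{[Y]}
   \;\propto\; [Y \mid Z, \theta] \times [Z \mid \theta] \times [\theta].
\end{aligned}
```

Each factor on the right is one of the conditional models specified separately, which is why a layer can be revised without touching the others.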
General comments
● Easy for modelling
➢ thinking conditionally breaks an uneasy problem into tractable pieces
➢ convenient to incorporate knowledge within the various layers of the structure
➢ easy to adapt/improve because locally defined
➢ mind over-modelling...
● Handy for inference
➢ inputs and outputs are of the same nature: probability distributions
➢ probabilistic formalization by reversing the conditioning
➢ software available (AppliBUGS ENGREF since 2007)
Some good reading
• Redécouvrir la théorie du Risque en Environnement. Revue de l'AIGREF, special issue on risks, 2003, pp. 18-24.
• Bernier J., Parent E., Boreux J.J. (2000), Statistique pour l'Environnement. Lavoisier, Eds TEC et DOC.
• Bernier J., Parent E. (2007), Le Raisonnement Bayésien. Springer Verlag France.