Threshold Selection for Precipitation Extremes
Uli Schneider∗ and Philippe Naveau∗∗
EGU, April 27, 2004
∗Geophysical Statistics Project, NCAR
∗∗Dept. of Applied Mathematics, University of Colorado, and Lab. des Sciences du Climat et de l’Environnement, CNRS
Outline
Extreme value theory (threshold models, advantages and limitations)
A new approach – folding
Conclusions
Extreme value theory
In classical statistics: model the AVERAGE behavior of a process.
In extreme value theory: model the EXTREME behavior (the tail of a distribution).
Usually deal with very small data sets!
Threshold models
Model exceedances over a high threshold u: X − u | X > u.
[Figure: Daily precipitation for Boulder, Colorado [1/100 in], 1950 to 2000. x-axis: years; y-axis: daily precipitation; threshold u marked.]
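The exceedances X − u | X > u can be extracted from a series as in the minimal sketch below. The function name `exceedances` and the sample values are made up for illustration; they are not the Boulder record.

```python
def exceedances(values, u):
    """Return the excesses x - u for every observation x strictly above u."""
    return [x - u for x in values if x > u]


# Hypothetical daily precipitation values in 1/100 in (illustration only).
daily = [0, 12, 0, 85, 130, 40, 0, 210, 95, 5]
excesses = exceedances(daily, 90)
print(excesses)  # [40, 120, 5]
```

Only three of the ten made-up observations survive the threshold, which is why threshold models usually work with very small data sets.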
The generalized Pareto distribution (GPD)
As u → ∞, the distribution of Y := X − u | X > u converges to
$$H(y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}.$$
H(y) is called the “Generalized Pareto” distribution (GPD). It has 2 parameters:
shape parameter ξ
scale parameter σ
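The GPD distribution function H can be evaluated directly, as in the sketch below. The function name `gpd_cdf` is an illustrative choice. The exponential form used at ξ = 0 is the standard limit of H as ξ → 0, and the upper-endpoint check applies when ξ < 0.

```python
import math


def gpd_cdf(y, xi, sigma):
    """H(y) = 1 - (1 + xi*y/sigma)^(-1/xi) for y >= 0."""
    if y < 0:
        return 0.0
    if abs(xi) < 1e-12:
        # Limit xi -> 0: exponential distribution with scale sigma.
        return 1.0 - math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        # For xi < 0 the support ends at y = -sigma/xi.
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


print(gpd_cdf(1.0, 1.0, 1.0))  # 0.5
```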
Extrapolation beyond the data
Return levels (quantiles) outside the data range are often the quantity of interest:
Given m, what is the return level z that is exceeded with probability 1/m?
$$P(X > z) = \frac{1}{m}$$
E.g. for precipitation: the “infamous” 100-year flood
Easy to compute once the parameters of the model are estimated.
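As a sketch of that computation: under the threshold model, P(X > z) = ζ_u (1 + ξ(z − u)/σ)^(−1/ξ) for z > u, where ζ_u = P(X > u). Setting this equal to 1/m and solving gives z = u + (σ/ξ)((m ζ_u)^ξ − 1), or z = u + σ log(m ζ_u) when ξ = 0. The symbol ζ_u and the function name `return_level` are notation introduced here, not taken from the slides, and m is counted in observations.

```python
import math


def return_level(m, u, xi, sigma, zeta_u):
    """Level z exceeded with probability 1/m under a GPD tail above u.

    zeta_u is the probability of exceeding the threshold u (assumed known
    or estimated separately, e.g. by the fraction of observations above u).
    """
    if m * zeta_u <= 1.0:
        raise ValueError("1/m must be smaller than zeta_u, so that z > u")
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m * zeta_u)
    return u + sigma / xi * ((m * zeta_u) ** xi - 1.0)


z = return_level(m=40, u=10.0, xi=0.5, sigma=2.0, zeta_u=0.1)
print(z)  # 14.0
```

For a level defined in years, m would be the number of years times the number of observations per year.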
Advantages and limitations
From a theoretical viewpoint
(+) “universal” approach
(-) asymptotic result: convergence in u and sample size might be very slow.
From a statistical viewpoint
Choosing the threshold: a trade-off. A high threshold yields a better GPD approximation, whereas a low threshold leaves more data points.
Goodness of fit: Is it reasonable to remove observations in order to fit a pre-fixed model?
From a scientific viewpoint: The threshold determines the question: What is an extreme value?
Folding – idea
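Main idea: Want to use information from the data below the threshold as well.
[Diagram: data above u are kept where they are; data below u are moved above u. Values below u are sent by F to [0, F(u)), mapped uniformly onto [F(u), 1), and sent back by F^{-1}.]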
Folding – formula
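[Diagram: X → F → [0, F(u)) → unif. → [F(u), 1) → F^{-1} → Y(u)]
$$Y(u) := \begin{cases} F^{-1}\left(\dfrac{\bar{F}(u)}{F(u)}\,F(X) + F(u)\right) & \text{if } X \le u, \\ X & \text{if } X > u, \end{cases} \qquad \text{where } \bar{F} = 1 - F.$$
If X ∼ F(x), then Y(u) has the same distribution as X | X > u.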
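The folding map can be sketched for a case where F is known. The example below assumes a unit exponential distribution purely for illustration; the names `fold`, `exp_cdf` and `exp_quantile` are hypothetical helpers, not part of the talk.

```python
import math


def fold(x, u, cdf, quantile):
    """Folding map Y(u): leave x > u unchanged, move x <= u above u."""
    if x > u:
        return x
    f_u = cdf(u)
    # F(x) lies in [0, F(u)]; rescale it onto [F(u), 1] and map back.
    p = (1.0 - f_u) / f_u * cdf(x) + f_u
    return quantile(p)


def exp_cdf(x):
    """Unit exponential cdf (an assumed F for this illustration)."""
    return 1.0 - math.exp(-x)


def exp_quantile(p):
    """Inverse of exp_cdf."""
    return -math.log(1.0 - p)


u = math.log(2.0)                 # F(u) = 0.5
x_low = -math.log(0.75)           # F(x_low) = 0.25, so x_low < u
y = fold(x_low, u, exp_cdf, exp_quantile)
print(y > u, fold(2.0, u, exp_cdf, exp_quantile))
```

Here F(x_low) = 0.25 is rescaled to 0.75, so the folded value is the 0.75 quantile, log 4, which lies above u. An observation exactly at u would map to probability 1; for a continuous F that event has probability zero, so the sketch does not treat it specially.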
Problem: F is unknown.
Idea
Estimate F (in the “middle” of the distribution) with the empirical cdf F_n.
Estimate F^{-1} with a “preliminary” GPD.
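One way these two estimates could be combined is sketched below. It is an assumption-laden illustration, not the authors' implementation: the preliminary GPD parameters `xi` and `sigma` are taken as given, the tail quantile inverts F(x) = 1 − (1 − F_n(u))(1 + ξ(x − u)/σ)^(−1/ξ), and ranks are shifted by 0.5 so that no folded point lands at probability 1.

```python
import math
from bisect import bisect_right


def gpd_tail_quantile(p, u, xi, sigma, p_u):
    """Quantile at level p >= p_u when the tail above u follows a GPD."""
    ratio = (1.0 - p) / (1.0 - p_u)
    if abs(xi) < 1e-12:
        return u - sigma * math.log(ratio)
    return u + sigma / xi * (ratio ** (-xi) - 1.0)


def fold_sample(sample, u, xi, sigma):
    """Fold every observation <= u above u using F_n and a preliminary GPD."""
    xs = sorted(sample)
    n = len(xs)
    k = bisect_right(xs, u)            # number of observations <= u
    if k == n:
        raise ValueError("no observations above the threshold u")
    p_u = k / n                        # empirical estimate F_n(u)
    folded = []
    for x in sample:
        if x > u:
            folded.append(x)           # already above u: keep it here
        else:
            r = bisect_right(xs, x)    # rank of x among the sample
            # Rescale the (shifted) empirical cdf value onto [F_n(u), 1).
            p = p_u + (1.0 - p_u) * (r - 0.5) / k
            folded.append(gpd_tail_quantile(p, u, xi, sigma, p_u))
    return folded


print(fold_sample([1, 2, 3, 4], u=2, xi=0.5, sigma=1.0))
```

The folded sample has as many points above u as the original sample has in total, and these can then be used for the final GPD fit.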
Folding – simulation results
[Figure: RETURN LEVELS. Normal dist. with 100 data points (“F unknown”). x-axis: m (years); y-axis: return levels. Red = "true", green = with folding, white = conventional.]
[Figure: RETURN LEVELS. Normal dist. with 100 data points (“F known”). x-axis: m (years); y-axis: return levels. Red = "true", green = with folding, white = conventional.]
[Figure: RETURN LEVELS. Cauchy dist. with 100 data points (“F unknown”). x-axis: m (years); y-axis: return levels. Red = "true", green = with folding, white = conventional.]
[Figure: RETURN LEVELS. Cauchy dist. with 100 data points (“F known”). x-axis: m (years); y-axis: return levels. Red = "true", green = with folding, white = conventional.]
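The "true" return-level curves in such a simulation are quantiles of the simulated distribution at probability 1 − 1/m. The sketch below assumes a standard normal and a standard Cauchy distribution (location 0, scale 1); the slides do not state the parameters used, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def normal_return_level(m):
    """z with P(X > z) = 1/m for a standard normal X (assumed parameters)."""
    return NormalDist().inv_cdf(1.0 - 1.0 / m)


def cauchy_return_level(m):
    """z with P(X > z) = 1/m for a standard Cauchy X (assumed parameters)."""
    # Cauchy quantile: Q(p) = tan(pi * (p - 1/2)), here with p = 1 - 1/m.
    return math.tan(math.pi * (0.5 - 1.0 / m))


for m in (20, 50, 100):
    print(m, normal_return_level(m), cauchy_return_level(m))
```

The Cauchy levels grow far faster in m than the normal ones, which is what makes it the heavy-tailed test case.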
Folding – An analytical result
Assuming that ξ = 0, it can be shown that the variance of the estimator for σ using the folding procedure, Var(σ̂), for a GPD(ξ = 0, σ) can be reduced compared to the conventional estimator.
$$\mathrm{Var}(\hat{\sigma}) \le \frac{\sigma}{n}$$
The reduction in variance is a function of the threshold u and the “quality” of the approximation for F.
Simulation results suggest that the folding works better for heavy-tailed (Cauchy) distributions.
Conclusions
Increasing the threshold according to model fit diagnostics may be misleading in assessing the quality of the fit.
Using more information from below the threshold seems to yield more robust estimates.
Using the folding procedure may lead to more freedom to “define” extreme values in applications.
APPENDIX – GPD convergence
[Figure: shape parameter for normal distribution (simulated).]