NANYANG TECHNOLOGICAL UNIVERSITY
SCHOOL OF
ELECTRICAL AND ELECTRONIC ENGINEERING
MARKOV RANDOM FIELDS GENERALIZED PARETO DISTRIBUTION
FOR MULTI-SITE DATASETS (MRF-GP)
SUBMITTED BY
ZHOU QIAO
A Final Year Project presented to
Nanyang Technological University, Singapore
In Partial Fulfillment of the Requirements for the Degree of Bachelor of
Engineering (Electrical & Electronic Engineering)
Year: 2011/2012
Supervised by Prof Justin Dauwels
Assessed by Prof Chua Chin Seng
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF CHAPTERS
CHAPTER 1: INTRODUCTION
1.1 Motivation
1.2 Scope and Objectives
1.2.1 Research Scope
1.2.2 Research Objectives
1.3 Main Results
1.4 Thesis Organization
CHAPTER 2: MODELING EXTREME EVENTS
2.1 Overview
2.2 Parameter Dependence Modeling
2.2.1 Directional Model
2.2.2 Seasonal Model
2.2.3 Spatial Model
2.3 Model Discussions and Limitations
CHAPTER 3: PRELIMINARIES
3.1 Extreme Value Modeling
3.1.1 Extreme Value Theory
3.1.2 Generalized Extreme Value Model
3.1.3 Peak Over Threshold (POT) Model: Generalized Pareto Distribution
3.2 Thin Membrane Gaussian Graphical Model
3.2.1 Markov Random Field
3.2.2 Gaussian Graphical Model (GGM)
3.2.2.1 GGM Basics
3.2.3 Thin Membrane Model
3.3 Estimation Method
3.3.1 Maximum Likelihood (ML)
3.3.2 Maximum A Posteriori (MAP)
CHAPTER 4: NON-PARAMETRIC METHOD TO MODEL SPATIAL DEPENDENCE
4.1 Model Construction
4.1.1 Introduction to the MRF-GP Model
4.1.2 Local Data Fitting
4.1.3 Threshold Optimization Method
4.1.3.1 Threshold Selection
4.1.4 GP Parameter Optimization
4.2 Smoothing Parameter Selection
4.2.1 Cross-validation (CV)
4.2.2 Maximum A Posteriori (MAP)
4.2.3 Iterative Conditional Modes (ICM)
4.2.4 Limitations
CHAPTER 5: SMOOTHING PARAMETER SELECTION USING EXPECTATION MAXIMIZATION (EM)
5.1 Model in Matrix Form
5.1.1 The Prior Model: Gauss-Markov Random Field
5.1.2 Conditional Distribution
5.1.3 The Posterior Distribution
5.1.4 Measurements Bootstrapping
5.1.5 Threshold Smoothing
5.1.6 GP Parameter Smoothing
5.2 The Exponential Family
5.3 Expectation Maximization Algorithm
5.3.1 Introduction to Expectation Maximization
5.3.2 Jensen's Inequality
5.3.3 The EM Algorithm
5.3.4 Expectation Maximization in MRF-GP
5.4 Model Implementation by EM
5.4.1 The Expectation Step
5.4.2 The Maximization Step
5.5 Discussions
CHAPTER 6: RESULTS AND DISCUSSIONS
6.1 Parameter Initialization and Data Generation
6.2 Theoretical Expectations
6.3 Threshold Sensitivity and Uncertainty Analysis
6.4 Smoothing Sensitivity and Uncertainty Analysis
6.5 The Sizing Effect
CHAPTER 7: CONCLUSION AND RECOMMENDATIONS
7.1 Summary of the Contributions
7.2 Recommendations for Future Work
REFERENCES
APPENDIX: LIST OF CODES
ABSTRACT
In this thesis, the author proposes a new nonparametric approach to model the extreme behavior of multi-site time-series data, taking into account the covariate dependence of neighboring sites using Gaussian Markov random fields.
The modeling of extreme or catastrophic events has grown in popularity and significance in recent years, especially in areas such as weather forecasting, flood measurement and environmental assessment. Conventional methods fit the extreme data to the Generalized Pareto (GP) distribution locally, at each site separately. However, dependence between neighboring sites is evident. We define as extreme those marginal events whose observed values exceed a threshold, where the initial threshold surface is inferred using quantile regression. We first fit the threshold exceedances to the GP distribution, which in theory has good asymptotic properties when the threshold is sufficiently high. We posit that covariate dependence exists in the underlying system, and hence that prediction precision can be significantly enhanced if the dependence structure is properly taken into account. We use the locally fitted results as the initial estimates of the GP parameters. Further, we assume that the observed data are latent variables corrupted by zero-mean Gaussian noise whose unknown variance is learned from the data using bootstrapping. The thin membrane model provides the prior information and also serves as the penalty function added to the underlying distributions, controlled by a set of smoothing parameters. The parameters of the GP distribution are smoothed according to the sites' locations and learned from the data using expectation maximization. Results of a simulation study demonstrate the superiority of MRF-GP over the locally fitted models. Sensitivity and uncertainty analyses are also performed to inspect the precision of the model's inference. In the future, we aim to enhance the model's performance by extending the monoscale Gaussian graphical model to a multiscale one, which captures long-range dependence by introducing several coarser scales.
Keywords: Gauss-Markov Random Field, GP Distribution, Maximum a Posteriori, Thin Membrane Model, Covariate Effects, Spatial Dependence, Expectation Maximization
ACKNOWLEDGEMENT
The author would like to dedicate this page to everyone who was involved in this project and helped me along the way to make it a success.
First of all, I would like to express my deepest gratitude to my FYP supervisor, Professor Justin Dauwels, for his consistent insights, consultation, mentorship, motivation and facilities support throughout the project.
The author thanks Dr. Yu Hang for his inspirational guidance and motivation. I acknowledge the enlightening discussions with Mr Choo Zheng and Miss Wang Xueou and thank them for their kind collaboration and helpful feedback. I further acknowledge the support of Shell International Research and the Massachusetts Institute of Technology (MIT), as well as the insightful advice and previous work of Philip Jonathan and his research team.
Furthermore, I want to express my sincere gratitude to all the teaching faculty and academic staff who have taught and helped me during my undergraduate study at Nanyang Technological University. I thank Prof Zhang Qing for bringing me into the research world through the URECA program, Prof Er Meng Joo for his insightful teaching and guidance in the Design and Innovation Project, and Prof Nicolas Privault for his encouragement and mentoring in the Stochastic Process course.
Last but not least, special thanks go to Prof Chua Chin Seng for taking the time to assess my project. I also acknowledge Mr Tan from the Machine Learning Lab and the other officers in the Biomedical Research Lab for their technical support.
LIST OF FIGURES
Figure 2.2.1: Different methods to prove the superiority of the directional model
Figure 2.2.2: Empirical density of storm peak events in the Gulf of Mexico
Figure 2.2.3: Observations of the strong positive correlation between neighboring sites
Figure 3.2.2.2: The conditional independence of the Gaussian graphical model
Figure 3.2.3.1: Simplified version of the 6 by 13 grid structure
Figure 4.1.2 (a): The locally fit threshold surface
Figure 4.1.2 (b): The locally fit shape parameter surface
Figure 4.1.2 (c): The locally fit scale parameter surface
Figure 6.1: Threshold surface constructed from the quadratic model
Figure 6.3.1: GP parameters changed with threshold for locally fitted model (sample = 1250)
Figure 6.3.2: GP parameters changed with threshold for MRF-GP model (sample = 1250)
Figure 6.4.1: GP parameters changed with for locally fitted model (sample = 1250)
Figure 6.4.2: GP parameters changed with for MRF-GP model (sample = 1250)
Figure 6.5.1: GP parameters changed with threshold for locally fitted model (sample = 315)
Figure 6.5.2: GP parameters changed with threshold for MRF-GP model (sample = 315)
Figure 6.5.3: GP parameters changed with for locally fitted model (sample = 315)
Figure 6.5.4: GP parameters changed with for MRF-GP model (sample = 315)
Figure 7.2: Multiscale Gaussian graphical grid structure (3D visualization from MATLAB)
LIST OF CHAPTERS
CHAPTER 1: INTRODUCTION
In this chapter, the author first introduces the background of the topic of concern, followed by the motivation for conducting this research. After this, the scope and objectives of this project are explained and the main results are briefly summarized. Finally, the organization of this thesis is presented.
CHAPTER 2: LITERATURE REVIEW
This chapter starts with an overview of previous studies on extreme value modeling. We introduce two types of modeling frameworks that address the estimation of the dependence structure of multivariate datasets. Specifically, we first briefly discuss the conditional approach, which captures the probabilistic dependence between pairs of locations. After that, we introduce various modeling approaches that handle covariate dependence. Three currently popular parametric covariate-dependent models are presented in detail, namely the directional, seasonal and spatial models. A discussion of the limitations of these existing modeling techniques ends this chapter.
CHAPTER 3: PRELIMINARIES
This chapter covers the major preliminary topics that are essential to understanding the proposed MRF-GP model. The concept of extreme value modeling and its major categories are briefly illustrated. We then concentrate on the underlying thin membrane Gaussian graphical model, explaining its key terminology and concepts in detail. Various estimation methods are discussed at the end of this chapter.
CHAPTER 4: NON-PARAMETRIC METHOD TO MODEL SPATIAL
DEPENDENCE
In this chapter, the proposed non-parametric MRF-GP model is presented. First, the motivation and procedures for model construction are introduced in detail. Following the introduction, various smoothing parameter selection techniques are demonstrated. To optimize the model parameters, several major estimation and optimization methods are examined, including cross-validation, maximum a posteriori estimation and iterative conditional modes. However, each of them has flaws that cannot be resolved easily, so the limitations of these methods are discussed. The chapter ends by motivating other selection methods.
CHAPTER 5: SMOOTHING PARAMETER SELECTION USING
EXPECTATION MAXIMIZATION
In this chapter, another smoothing parameter selection method, Expectation Maximization (EM), is proposed, motivated by the limitations of the above-mentioned estimation and optimization approaches. The MRF-GP model, with partial modifications, is restated in matrix form for the convenience of further mathematical operations. Some preliminary concepts needed to present the EM algorithm for parameter selection are then introduced. The chapter concludes with the detailed model implementation procedures using the EM algorithm.
CHAPTER 6: RESULTS AND DISCUSSIONS
In this chapter, the results of the implemented model are presented and scrutinized. First, the approaches used to initialize the parameters and thereby generate the data are introduced. Second, the theoretical expectations that the performance of the MRF-GP model should satisfy are discussed. After the model construction criteria are fixed, various sensitivity and uncertainty tests are performed to assess the robustness of the model and, additionally, to demonstrate its superiority over other suggested frameworks. Specifically, threshold sensitivity and uncertainty tests are performed, and the MRF-GP model is shown to be superior to the locally fit results, with less sensitivity and uncertainty with respect to the threshold level. Following that, smoothing parameter sensitivity and uncertainty measurements are conducted and the superiority of our model is confirmed. We repeat the above procedures with reduced sample sizes to test the sizing effect. A summary of these procedures and results is provided at the end of the chapter.
CHAPTER 7: CONCLUSION AND RECOMMENDATIONS
In this chapter, the objectives and conclusions of our research are summarized and my contributions to it are specified. Some limitations of the research are acknowledged and justified, followed by my future action plans. Finally, my recommendations to future researchers in this field are presented.
* Chapter 7 concludes the whole thesis.
CHAPTER 1: INTRODUCTION
In this chapter, the author first introduces the background of the topic of concern, followed by the motivation for conducting this research. After this, the objectives of this project are explained and the main results are briefly summarized. Finally, the organization of this thesis is presented.
1.1 Motivation
The modeling of extreme or catastrophic events has grown in popularity and significance in recent years. In the UK, for example, a series of severe fluvial flooding events has seriously affected communities in different parts of the country, and these issues have been taken up by the authorities. For the coordination of flood mitigation and risk assessment activities, knowledge of the characteristics of fluvial flooding (extreme river flows) is essential, especially the probability that one river will flood in the following days given that another river is already in flood. Such information allows precautions to be taken when this probability is high enough, substantially reducing catastrophic losses.
Another example is the Gulf of Mexico, where extreme sea states are the main focus. Extreme sea states are often associated with hurricanes, and modeling these extreme events is of high importance to the development of offshore facilities. Environmental design criteria for offshore facilities in this area have inherent uncertainties and dependencies, which can be functions of climate variability in covariates including time and space, and of storm direction and track. A directional model allows offshore facilities to be designed with different criteria for different directions, which is both economical and safe. Modeling temporal dependence can tell us which season is best for activities such as offshore drilling and tourism. Meanwhile, modeling spatial dependence can give a global view of the extreme sea states across the Gulf of Mexico. Knowledge of the characteristics of the sea states in the Gulf of Mexico is therefore crucial, and reliable extreme value models must incorporate covariate effects properly.
Driven by this strong demand, several research works have investigated and attempted to model the dependence structure of multi-site time-series extreme datasets. To the best of our knowledge, the vast majority of them adopted parametric approaches to model the underlying distributions. However, parametric models have several inherent disadvantages. When one can confidently claim that the data of interest are drawn from a specified probability model, parametric statistics can provide satisfactory information about the underlying system. On the other hand, their performance deteriorates significantly when the underlying distribution is unknown and no specified model is guaranteed to fit the data well. In addition, a parametric model requires more assumptions than a non-parametric one. Furthermore, since an exhaustive search for the optimal parametric form is not practically possible, a wrong or sub-optimal model choice often leads to significantly biased results.
In practice, for the purpose of extreme event inference or catastrophic modeling, non-parametric models are often preferred, since the underlying distribution is almost always unknown and parametric models tend to provide suboptimal solutions and bias the conclusions.
Therefore, motivated by the rising significance of and demand for extreme value modeling and catastrophic event prediction, the author seeks to address these pressing issues. Further, given the drawbacks of the currently dominant parametric approaches, the author proposes a non-parametric model, MRF-GP, to address these pain points.
1.2 Scope and Objectives
1.2.1 Research Scope
Due to constraints of time, funding, access to research facilities and other factors, the author acknowledges the limitations of the research scope. Future enhancements of the model for higher estimation precision are included in our future action plans.
1.2.2 Research Objectives
In this thesis, the author will accomplish the following research objectives:
To review the existing research on extreme value modeling and analyze its pros and cons;
To propose a non-parametric model for catastrophic event prediction and inference, followed by essential case studies;
To validate that the simulation results are in line with theory;
To demonstrate the superiority of the proposed MRF-GP model through supporting analysis;
To discuss its applications and recommendations for future research.
1.3 Main Results
The numerical results obtained from the simulation studies are in line with theoretical expectations, with acceptable variations and deviations. Two sets of comparison studies against locally fit results demonstrate the superiority of the proposed approach. The detailed results and discussions are given in Chapter 6.
1.4 Thesis Organization
This thesis consists of seven chapters. Chapter 1 introduces the topic of concern, extreme event modeling, and explains the motivation for conducting this project in detail. After that, the project scope and objectives are briefly presented. The major research results are also included in this chapter.
In Chapter 2, I give an overview of previous research on the topics of interest and elaborate on several prevalent approaches and models selected from a large body of research. Furthermore, some limitations of these existing estimation frameworks are discussed, motivating the proposed non-parametric MRF-GP model.
In Chapter 3, the essential preliminary theories and topics are briefly covered. We first discuss the families of extreme value models. Following that, the thin membrane Gaussian graphical model is introduced. A selection of estimation methods, including maximum likelihood and maximum a posteriori estimation, is also presented in this chapter.
In Chapter 4, the proposed non-parametric MRF-GP model is presented in its theoretical form. The motivation and procedures for model construction are introduced and various smoothing parameter selection techniques are demonstrated. However, these parameter selection approaches have evident flaws, and ignoring these drawbacks would lead to misleading results. The chapter ends by motivating other selection methods.
In Chapter 5, another smoothing parameter selection method is proposed, motivated by these limitations. The MRF-GP model, with partial modifications, is restated in matrix form for the convenience of further mathematical operations. Some preliminary concepts needed to present the EM algorithm for parameter selection are then introduced, and the chapter concludes with the detailed model implementation procedures using the EM algorithm.
In Chapter 6, the major results are presented with detailed discussion and analysis. The sensitivity and uncertainty of the estimation and prediction by the MRF-GP model are elaborated. The chapter ends with the conclusion that MRF-GP is superior.
In Chapter 7, the final conclusions and recommendations are made. Limitations of the research are acknowledged and some suggestions for future research are proposed. A final summary of the topics of concern ends the thesis.
CHAPTER 2: MODELING EXTREME EVENTS
This chapter starts with an overview of previous studies on extreme value modeling. We introduce two types of modeling frameworks that address the estimation of the dependence structure of multivariate datasets. Specifically, we first briefly discuss the conditional approach, which captures the probabilistic dependence between pairs of locations. After that, we introduce various modeling approaches that handle covariate dependence. Three currently popular parametric covariate-dependent models are presented in detail, namely the directional, seasonal and spatial models. A discussion of the limitations of these existing modeling techniques ends this chapter.
2.1 Overview
A large body of statistical research has investigated covariate dependence in multi-site time-series datasets for extreme value analysis¹, for instance Davison and Smith [1990] and Robinson and Tawn [1997]. The literature we review focuses mainly on offshore facility design criteria for hurricane-dominated regions such as the Gulf of Mexico. The covariates considered span a large family, although spatial location, direction and season are of primary concern.
Ledford and Tawn [1997] and Heffernan and Tawn [2004] discuss the modeling of jointly dependent extremes using a conditional approach, in which extrapolation methods for limited samples are introduced to enhance statistical accuracy. Scotto and Guedes-Soares [2000] describe modeling using non-linear thresholds. Spatial models for extremes, designed to estimate predictive distributions, are proposed by Coles and Casson [1998], Casson and Coles [1999] and Coles and Tawn [1996, 2005]. For spatial applications, a spatio-directional model for extreme waves in the Gulf of Mexico is designed by Philip Jonathan and Kevin Ewans [2009]. Paul Northrop and Philip Jonathan [2010] further discuss spatially-dependent non-stationary extremes with applications in the same region. Regarding directionality, a large body of work is also available, such as the offshore facility design criteria proposed by Jonathan and Ewans [2007], Ewans and Jonathan [2007] and Jonathan, Ewans and Forristall [2008], the last of which presents a detailed model comparison. Addressing seasonal dependence, Anderson et al. [2001] performed a seasonal analysis and asserted that the advantage of adopting a model that incorporates covariate dependence is apparent, unless it can be shown statistically that a model ignoring covariate effects is no less appropriate, so that the effort saved by omitting the covariate analysis outweighs the gain in statistical accuracy. Motivated by Anderson et al., Chavez-Demoulin and Davison [2005] and Coles [2001] provide insights into the design of non-homogeneous Poisson models in which extremal properties are modeled as functions of covariates. Chavez-Demoulin and Davison also demonstrated the application of a block bootstrapping approach for uncertainty analysis.
¹ Philip Jonathan, Kevin Ewans (2008). Modelling The Seasonality of Extreme Waves In the Gulf of Mexico. Shell Technology Centre Thornton and Shell International Exploration and Production. Proceedings of OMAE 2008, the 27th International Conference on Offshore Mechanics and Arctic Engineering.
We carefully studied related work on covariate dependence before designing our own model, the Gauss-MRF-GP model. In this section, three major branches of covariate analysis are studied, namely directional, seasonal and spatial dependence. Leading models for each concern are illustrated, and a summary of the previous work and its implications ends the discussion. The proposed MRF-GP model is presented in the subsequent chapters.
2.2 Parameter Dependence Modeling
2.2.1 Directional Model
In this model, an extreme value model is built that takes the directionality of the data into account. Based on the assumption that the samples from neighboring sites are independent and identically distributed, threshold exceedances of the marginal distributions of the random variables are fitted to the Generalized Pareto (GP) distribution family:
$$P(X \le x \mid X > u) = 1 - \left(1 + \frac{\gamma\,(x - u)}{\sigma}\right)^{-1/\gamma}, \qquad x > u \qquad (1)$$
where u is a specified threshold, γ is the shape parameter and σ > 0 is the scale parameter; the limit γ → 0 gives the exponential distribution. Maximum likelihood estimation is used to estimate γ and σ from a sample of data.
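To make Eq. (1) concrete, the following is a minimal sketch of fitting γ and σ by maximum likelihood to the exceedances y = x - u. It is not the code of the cited studies: the function names (`gp_neg_log_lik`, `fit_gp`, `sample_gp`), the synthetic data (γ = 0.2, σ = 1.5, u = 3) and the use of SciPy's Nelder-Mead optimizer are assumptions made here for illustration. Differentiating Eq. (1) gives the negative log-likelihood n log σ + (1 + 1/γ) Σ log(1 + γ y_i / σ), which the sketch minimizes.

```python
import numpy as np
from scipy.optimize import minimize


def gp_neg_log_lik(params, y):
    """Negative GP log-likelihood of exceedances y = x - u > 0.

    params = (gamma, log_sigma); sigma is optimized on the log scale so that
    it stays positive without explicit constraints.
    """
    gamma, log_sigma = params
    sigma = np.exp(log_sigma)
    if abs(gamma) < 1e-8:
        # gamma -> 0 limit of Eq. (1): exponential distribution
        return len(y) * log_sigma + np.sum(y) / sigma
    z = 1.0 + gamma * y / sigma
    if np.any(z <= 0):
        return np.inf  # some exceedance lies outside the GP support
    return len(y) * log_sigma + (1.0 + 1.0 / gamma) * np.sum(np.log(z))


def fit_gp(x, u):
    """Maximum likelihood estimates (gamma, sigma) for exceedances of threshold u."""
    y = x[x > u] - u
    start = np.array([0.1, np.log(np.mean(y))])
    res = minimize(gp_neg_log_lik, start, args=(y,), method="Nelder-Mead")
    gamma_hat, log_sigma_hat = res.x
    return gamma_hat, np.exp(log_sigma_hat)


def sample_gp(gamma, sigma, u, n, rng):
    """Draw n GP variates above u by inverting the CDF in Eq. (1)."""
    p = rng.uniform(size=n)
    return u + sigma / gamma * ((1.0 - p) ** (-gamma) - 1.0)


# Hypothetical synthetic exceedances with known parameters
rng = np.random.default_rng(0)
x = sample_gp(gamma=0.2, sigma=1.5, u=3.0, n=5000, rng=rng)
gamma_hat, sigma_hat = fit_gp(x, u=3.0)
print(f"gamma_hat = {gamma_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

Returning infinity outside the support lets the simplex search simply discard invalid points. With 5000 exceedances the estimates should land close to the generating values, although the exact digits depend on the random seed.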
7
In the directional extreme value model, γ and σ vary smoothly with direction θ ∈ [0°, 360°) and are expressed in Fourier form. The parameters are estimated using roughness-penalized maximum likelihood estimation, where the optimal value of the roughness coefficient is chosen by cross-validation. For simplicity, directional sectors that partition [0°, 360°) are used instead of all directions. The cumulative distribution of the maximum storm peak significant wave height $H_S^{sp}$ in any directional sector is modeled and discussed, and simulation studies are also provided.
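The Fourier form and the roughness penalty described above can be sketched as follows. The assumptions are ours, not the cited papers': each of γ(θ) and σ(θ) is a truncated Fourier series in θ (in radians), roughness is measured as the integral of the squared second derivative over [0, 2π), and the penalized objective is the GP negative log-likelihood of Eq. (1) with direction-dependent parameters plus λ times the total roughness. The coefficient layout and the function names are hypothetical.

```python
import numpy as np


def fourier_param(coeffs, theta):
    """Evaluate a0 + sum_k (a_k cos(k theta) + b_k sin(k theta)).

    coeffs = [a0, a1, b1, a2, b2, ...]; theta is in radians.
    """
    theta = np.asarray(theta, dtype=float)
    value = np.full_like(theta, float(coeffs[0]))
    n_harmonics = (len(coeffs) - 1) // 2
    for k in range(1, n_harmonics + 1):
        a_k, b_k = coeffs[2 * k - 1], coeffs[2 * k]
        value += a_k * np.cos(k * theta) + b_k * np.sin(k * theta)
    return value


def roughness(coeffs):
    """Integral over [0, 2 pi) of the squared second derivative of the series.

    By orthogonality of sines and cosines this equals
    pi * sum_k k^4 (a_k^2 + b_k^2); the constant term a0 is not penalized.
    """
    n_harmonics = (len(coeffs) - 1) // 2
    total = sum(
        k ** 4 * (coeffs[2 * k - 1] ** 2 + coeffs[2 * k] ** 2)
        for k in range(1, n_harmonics + 1)
    )
    return np.pi * total


def penalized_nll(gamma_coeffs, sigma_coeffs, y, theta, lam):
    """Roughness-penalized GP negative log-likelihood.

    y: exceedances over the threshold; theta: direction of each exceedance;
    lam: smoothing (roughness) parameter. Assumes gamma(theta) != 0 everywhere.
    """
    gamma = fourier_param(gamma_coeffs, theta)
    sigma = fourier_param(sigma_coeffs, theta)
    if np.any(sigma <= 0):
        return np.inf  # scale must be positive in every direction
    z = 1.0 + gamma * y / sigma
    if np.any(z <= 0):
        return np.inf  # an observation lies outside the GP support
    nll = np.sum(np.log(sigma) + (1.0 + 1.0 / gamma) * np.log(z))
    return nll + lam * (roughness(gamma_coeffs) + roughness(sigma_coeffs))


# Roughness of a two-harmonic series: pi * (0.09 + 0.04 + 16 * (0.01 + 0.0025))
print(f"{roughness([1.0, 0.3, -0.2, 0.1, 0.05]):.4f}")  # prints 1.0367
```

In a full fit, the coefficients would be optimized for each candidate λ, with λ chosen by cross-validation as the text describes. Because the series is periodic, the same construction could serve a seasonal covariate by mapping the day of year onto [0, 2π).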
Jonathan, Ewans and Forristall [9] compared in detail the directional extreme value model with the constant model, an extreme value model that assumes extremal characteristics are constant with direction. The superiority of the directional extreme value model was demonstrated by several methods, summarized in Figure 2.2.1.
Different methods to demonstrate the superiority of the directional model (using artificial data)

| Method | Directional model | Constant model |
|---|---|---|
| Fit GP to the dataset | γ and σ vary with threshold as expected; a relatively low threshold suffices for a GP fit | γ and σ do not vary with threshold as expected; only a sufficiently high threshold guarantees a GP fit |
| Draw empirical cdf | The quality of the fit of the two models cannot easily be distinguished | (common to both models) |
| Likelihood ratio test | The probability of rejecting the constant model in favour of the directional model is high for all but the highest threshold | (common to both models) |
| Estimate the quartiles of the distribution of the 100-year maximum | In excellent agreement with theory (the underlying distribution of the artificial data) over a range of thresholds | Estimates vary with threshold |

Figure 2.2.1 Different methods to demonstrate the superiority of the directional model
The directional extreme value model also extends to models of other covariate effects,
such as seasonality, by replacing the direction parameter with the covariate of interest
and making minor adjustments. The comparison of the directional and constant models
shows that, when a dataset depends strongly on direction, taking directionality into
account gives better results than ignoring the direction parameter and treating the
extremal characteristics as constant in all directions.
2.2.2 Seasonal Model²
Statistics of threshold exceedances also depend on the season. In this model, a
non-homogeneous Poisson process is adopted to capture the seasonal dependence,
followed by a study of storm peak events in the Gulf of Mexico.
The extreme tail behavior above the threshold is characterized using the Generalized
Pareto distribution. The GP parameters and the rate of occurrence of threshold
exceedances vary seasonally, and the seasonally varying threshold is estimated
separately. The model parameters are smoothed by roughness-penalized maximum
likelihood, and cross-validation is used to select the optimal degree of roughness.
Capturing covariate effects in extreme storm peaks is crucial for the design of offshore
facilities. It has been shown that design criteria estimated with a model incorporating
covariate effects are more precise than estimates from a model that ignores them³.
The data used in this study are significant wave heights from the GOMOS Gulf of Mexico
hindcast study (Oceanweather, 2005), spanning 1900 to September 2005 inclusive. Data
analysis shows that the effects of storm peak direction and season on extreme sea states
are interrelated (Figure 2.2.2).
² Philip Jonathan, Kevin Ewans (2008). Modelling the Seasonality of Extreme Waves in the Gulf of Mexico.
³ Philip Jonathan, Kevin Ewans (2007). The effect of directionality on extreme wave design criteria. Shell
Research Limited & Shell International Exploration and Production.
Figure 2.2.2: Empirical density of storm peak events in the Gulf of Mexico. Darker shading
represents higher density (Jonathan & Ewans, 2008)
To model the seasonality, the authors adopt a variable threshold µ, rather than a fixed
threshold, to reflect the seasonal dependence observed in the extreme data.
2.2.3 Spatial Model
In environmental applications, the extremes of a variable of interest, for example in
multi-site datasets, are frequently non-stationary, varying systematically in space. In
such cases, non-negligible inter-site dependence is commonly observed and should be
estimated accurately. However, threshold selection can be problematic for modern
extreme value models, particularly when the extremes are non-stationary. Paul Northrop
and Philip Jonathan proposed a method that infers this dependence structure implicitly,
together with a new approach to selecting a covariate-dependent threshold using a
quantile regression model. They argue that, if the inter-site extremes are clearly
non-stationary, a non-constant threshold should be set so that the non-stationarity is
properly reflected. On this basis, they use quantile regression to determine the threshold
at each location.
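A simplified sketch of site-specific thresholds that share a common exceedance probability is shown below. It replaces Northrop and Jonathan's covariate-based quantile regression with a plain empirical quantile computed separately at each site, so it is an assumption-laden stand-in rather than their method; the site labels and data are hypothetical.

```python
import math

def empirical_quantile(values, q):
    """Empirical q-quantile, interpolating linearly between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no data")
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def site_thresholds(data_by_site, q):
    """One threshold per site, all sharing (approximately) the exceedance probability 1 - q."""
    return {site: empirical_quantile(vals, q) for site, vals in data_by_site.items()}

def exceedance_fraction(values, u):
    """Fraction of observations strictly above the threshold u."""
    return sum(1 for v in values if v > u) / len(values)
```

With q = 0.9, for instance, each site's threshold is exceeded by roughly 10% of that site's observations, even though the threshold values themselves differ from site to site.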
To illustrate the model, the authors consider the stochastic behavior of extreme sea
states, represented by significant wave heights, using hindcast data from 72 locations in
the Gulf of Mexico. They argue that modelling the wave data from all sites
simultaneously improves the precision of estimation, and the strong positive correlation
observed between neighboring sites (Figure 2.2.3) supports this hypothesis.
Figure 2.2.3 Observations of the strong positive correlation between neighboring sites
In the paper, the authors fit spatially dependent regression models using the
methodology of Chandler and Bate (2007) to handle the covariate dependence in the data,
assuming that seasonality and directionality are irrelevant covariates. The marginal
distributions of the local maxima are modelled using the Generalized Extreme Value
(GEV) distribution (Jenkinson, 1955), under the assumption that the limiting distribution
is non-degenerate and that the maxima $\{M_1, \ldots, M_n\}$ are independent and
identically distributed.
Then, for a sufficiently high threshold u, given that there is an exceedance, the excess
X − u approximately follows a Generalized Pareto distribution. Further, the authors note
that the parameterization of this model is invariant to the chosen threshold, which is
advantageous when a non-constant, location-based threshold is used. Selecting the
thresholds so that every site has the same exceedance probability, i.e. choosing each
site's threshold $u_i$ so that $P(X_i > u_i)$ takes the same value at all sites, is a naturally
sensible criterion.
Then, assuming that data from different clusters are independent, one can derive the
log-likelihood of the model.
However, this construction is valid only if there is no inter-site dependence and the
marginal distributions are stationary, both of which are questionable. To address this
problem, the authors enhance the point process model by handling the dependence
structure implicitly.
2.3 Model Discussions and Limitations
These models show that covariate dependence does exist in many datasets, and that both
theorists and practitioners need to address it to improve the precision of their estimates.
First, comparing the seasonal model with models that ignore seasonality shows clear
advantages for models that incorporate covariate dependence, unless it can be shown
statistically that a model ignoring the covariate effect is no less appropriate, so that the
effort saved by skipping the covariate analysis outweighs the loss in statistical accuracy.
Second, comparing the directional and constant models shows that, when a dataset
depends strongly on direction, taking directionality into account gives better results than
ignoring the direction parameter and treating the extremal characteristics as constant in
all directions.
Third, comparing the spatial model proposed by Paul Northrop and Philip Jonathan, in
which the thresholds at different sites vary with the marginal covariate characteristics so
as to achieve a constant extreme quantile (exceedance probability), with the constant
model also confirms the existence of covariate effects. The authors argue that, in the
presence of non-stationary patterns, a constant exceedance probability is a more logical
basis for the threshold than a constant physical value; such non-stationarity is indeed
present in the majority of multi-site time-series data in practice.
It is therefore reasonable to expect that taking covariate effects into account improves
the precision of estimates of spatial characteristics. However, as elaborated in the first
chapter, all of these approaches are parametric. The generic disadvantages of parametric
modeling are therefore inherent in them, and the resulting models may be biased or
suboptimal.
In sum, motivated by the significance of covariate dependence in the modeling process
and by the drawbacks of the existing approaches, we propose a non-parametric MRF-GP
model, characterized by the Markov property, to address these shortcomings.
CHAPTER 3: PRELIMINARIES
This chapter covers the preliminary topics that are essential to understanding our
generalized Pareto distribution model with a Markov random field prior (MRF-GP). The
concept of extreme value modeling and its major categories are introduced briefly. We
then concentrate on the underlying thin membrane Gaussian graphical model, explaining
the key terminology and concepts in detail. Various estimation methods are discussed at
the end of this chapter.
3.1 Extreme Value Modeling
3.1.1 Extreme Value Theory
Extreme value theory is a branch of statistics that deals with extreme deviations from the
median of a probability distribution. There are two main approaches in this theory.
The first theorem of extreme value theory is the Fisher–Tippett–Gnedenko theorem
(Fisher and Tippett, 1928; Gnedenko, 1943), which governs the asymptotic distribution of
extreme order statistics⁴. In 1958, Emil Julius Gumbel noted that, for any well-behaved
continuous initial distribution, only a few models are needed to describe the asymptotic
distribution of its extremes. Specifically, the maxima of i.i.d. univariate samples, after
proper renormalization, converge (whenever they converge to a non-degenerate limit) to
one of three distribution families: the Gumbel, Fréchet or Weibull distribution.
In contrast, the second theorem of extreme value theory, the Pickands–Balkema–de Haan
theorem, gives the asymptotic distribution of the exceedances of a random variable X
over a high threshold when its true distribution is unknown⁵.
The main difference between the two theorems lies in how the data to be modelled are
generated. In the first theorem, all values are already maxima (for example, one
maximum per block of observations), and the model is fitted to the full range of these
maxima. The second theorem, on the other hand, applies only to data that exceed a
specified threshold. This approach is used heavily in the insurance and reinsurance
industry, where a company is concerned only with payouts above a threshold.

⁴ http://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%93Gnedenko_theorem
⁵ Balkema, A., and Laurens de Haan (1974). "Residual life time at great age", Annals of Probability, 2,
792–804.
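The two ways of extracting extreme data can be contrasted with a short Python sketch: block maxima feed the first theorem (and the GEV model), while threshold excesses feed the second (and the GP model). The helper names and the toy series are hypothetical, and no declustering of dependent exceedances is attempted.

```python
def block_maxima(series, block_size):
    """Maximum of each complete block, e.g. annual maxima: input for the GEV (first theorem)."""
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]

def threshold_excesses(series, u):
    """Excesses x - u of every observation above u: input for the GP (second theorem)."""
    return [x - u for x in series if x > u]

series = [1, 5, 2, 8, 3, 4, 9, 1]
print(block_maxima(series, 4))        # [8, 9]
print(threshold_excesses(series, 4))  # [1, 4, 5]
```

Block maxima keep exactly one value per block, whereas the threshold approach can keep several large values from the same block, which is why it makes fuller use of the extreme data.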
3.1.2 Generalized Extreme Value Model
In statistics, the generalized extreme value (GEV) distribution, derived directly from
extreme value theory, is a family of probability distributions that combines the Gumbel,
Fréchet and Weibull distributions in a single generic form. It is the limiting distribution
of properly normalized maxima of a sequence of i.i.d. random variables and can therefore
be used to estimate unknown extreme value distributions. It has the form:

$$G(x) = \exp\left\{-\left[1 + \gamma\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\gamma}\right\}, \qquad 1 + \gamma\left(\frac{x - \mu}{\sigma}\right) > 0, \qquad (2)$$

where μ is the location parameter, σ > 0 is the scale parameter and γ is the shape
parameter. The shape parameter governs the tail behavior of the distribution, and the
GEV family can accordingly be divided into three sub-families, the Gumbel, Fréchet and
Weibull distributions, corresponding to a shape parameter that is zero, positive or
negative, respectively.
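Equation (2) and its three sub-families can be evaluated with a single function, sketched below in Python. The cut-off used to treat γ as zero and the function names are our own illustrative choices.

```python
import math

def gev_cdf(x, mu, sigma, gamma):
    """GEV distribution function of equation (2); gamma == 0 gives the Gumbel limit."""
    if sigma <= 0:
        raise ValueError("scale sigma must be positive")
    s = (x - mu) / sigma
    if abs(gamma) < 1e-12:                  # Type 1: Gumbel
        return math.exp(-math.exp(-s))
    z = 1.0 + gamma * s
    if z <= 0:
        # Outside the support: below the lower end point for Frechet (gamma > 0),
        # above the upper end point for Weibull (gamma < 0).
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-z ** (-1.0 / gamma))

def gev_type(gamma):
    """Name of the sub-family selected by the sign of the shape parameter."""
    if abs(gamma) < 1e-12:
        return "Gumbel"
    return "Frechet" if gamma > 0 else "Weibull"
```

Because the general formula tends to the Gumbel form as γ → 0, a very small non-zero γ gives almost the same value as the Gumbel branch; the explicit branch simply avoids dividing by zero.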
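3.1.2.1 Type 1: Gumbel Distribution
When the shape parameter is equal to zero (the limit γ → 0 of equation (2)), the
generalized extreme value distribution reduces to the Gumbel sub-family:

$$G(x) = \exp\left\{-\exp\left[-\left(\frac{x - \mu}{\sigma}\right)\right]\right\}, \qquad -\infty < x < \infty \qquad (3)$$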
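3.1.2.2 Type 2: Fréchet Distribution
When the shape parameter is positive (γ > 0), the generalized extreme value distribution
follows the Fréchet sub-family:

$$G(x) = \begin{cases} 0, & x \le \mu - \sigma/\gamma, \\ \exp\left\{-\left[1 + \gamma\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\gamma}\right\}, & x > \mu - \sigma/\gamma \end{cases} \qquad (4)$$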
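3.1.2.3 Type 3: Weibull Distribution
When the shape parameter is negative (γ < 0), the generalized extreme value distribution
follows the Weibull sub-family:

$$G(x) = \begin{cases} \exp\left\{-\left[1 + \gamma\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\gamma}\right\}, & x < \mu - \sigma/\gamma, \\ 1, & x \ge \mu - \sigma/\gamma \end{cases} \qquad (5)$$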
3.1.3 Peaks Over Threshold (POT) Model: Generalized Pareto Distribution
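Given an unknown distribution F of a random variable X, one is interested in estimating
the distribution of the values of X above a threshold u, the conditional excess distribution:

$$F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)}, \qquad 0 \le y \le x_F - u, \qquad (6)$$

where y is the non-negative excess over the threshold and $x_F \le \infty$ is the right end
point of F. Pickands⁶ and Balkema and de Haan showed that, for i.i.d. random variables
$X_1, X_2, \ldots$ whose distribution F lies in the domain of attraction of a GEV
distribution, the conditional excess distribution $F_u(y)$ is well approximated by the
Generalized Pareto distribution when the threshold u is sufficiently high.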
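A quick numerical check of equation (6) is sketched below. For an exponential parent distribution, $F_u(y)$ equals the GP distribution with γ = 0 and σ equal to the reciprocal of the rate for every threshold u (the memoryless property), so the GP approximation is exact in this special case. The rate of 0.5 and the grid of u and y values are arbitrary choices for illustration.

```python
import math

def conditional_excess_cdf(F, u, y):
    """Equation (6): F_u(y) = (F(u + y) - F(u)) / (1 - F(u))."""
    return (F(u + y) - F(u)) / (1.0 - F(u))

def exp_cdf(x, rate=0.5):
    """Exponential parent distribution (illustrative choice of rate)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def gp_cdf_gamma0(y, sigma):
    """GP distribution function in the gamma -> 0 limit (exponential excesses)."""
    return 1.0 - math.exp(-y / sigma)

# Largest discrepancy between F_u(y) and the GP with gamma = 0, sigma = 1 / rate = 2.
max_err = max(abs(conditional_excess_cdf(exp_cdf, u, y) - gp_cdf_gamma0(y, 2.0))
              for u in (0.5, 2.0, 10.0) for y in (0.1, 1.0, 5.0))
print(max_err < 1e-10)  # True: the GP limit is exact for an exponential parent
```

For most other parent distributions the match is only approximate and improves as u increases, which is why the threshold must be chosen sufficiently high.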
The extreme value model we adopt in this thesis is the Generalized Pareto distribution
(GPD). In modeling extreme events, the marginal distributions of the extremes of individual
variables are inferred from empirical data. Threshold exceedances of the marginal
distributions are then fitted to the GPD, which, by the Pickands–Balkema–de Haan
theorem, is the natural asymptotic model for the tail of a distribution⁷. The cumulative
distribution function of the GP distribution is given in equation (1):

$$F(x \mid x > u) = 1 - \left[1 + \frac{\gamma\,(x - u)}{\sigma}\right]^{-1/\gamma}, \qquad x > u,$$

where u is a specified threshold, γ is the shape parameter and σ > 0 is the scale
parameter.

⁶ Balkema, A., and Laurens de Haan (1974). "Residual life time at great age", Annals of Probability, 2,
792–804.
3.2 Thin Membrane Gaussian Graphical Model
3.2.1 Markov Random Field
A Markov random field (MRF) is a multi-dimensional stochastic process defined on a
discrete lattice; it is a spatial analogue of a Markov chain. Like a Bayesian network, it is a
model with a Markov property used to represent dependencies between variables. In an
MRF, the sites of interest interact with one another via a neighborhood system
N = {N(i) : i ∈ V}. Mathematically:

$$P\left(x_i \mid x_{V \setminus \{i\}}\right) = P\left(x_i \mid x_{N(i)}\right),$$

where N(i) is the set of sites neighboring site i and V∖{i} contains all sites except site i
itself. In other words, the conditional distribution of a particular site, given all other sites,
depends only on its neighboring sites: given its neighbors, a site is conditionally
independent of every other site in the MRF.
In addition, an MRF is a Gaussian MRF if the joint distribution of its sites is multivariate
normal. In general, however, the dependence structure can follow any family of
distributions. In this thesis, we adopt the Gaussian MRF.

⁷ Philip Jonathan, Kevin Ewans and George Forristall (2008). "Statistical estimation of extreme ocean
environments: the requirement for modeling directionality and other covariate effects," Ocean
Engineering 35, 1211–1225.
In sum, an MRF is a natural framework for modeling covariate dependence, since the
events occurring at the individual sites of the region under consideration are intuitively
interdependent. It is therefore advantageous to use an MRF to model the dependence
structure of the parameters of the marginal GP distributions. In this way the estimates are
smoothed across sites, which we expect to give a better estimate of reality.
3.2.2 Gaussian Graphical Model (GGM)
3.2.2.1 GGM Basics
In general, a graphical model represents a Markov random field graphically. In graph
theory terms, a probability distribution can be captured by a graph G consisting of nodes
V and directed or undirected edges E. Conventionally, every node v is associated with a
random variable $x_v$, giving a random vector $\mathbf{x} = (x_1, \ldots, x_N)$, where N is
the number of nodes, and the statistical dependences among the nodes are represented
by the edges. Probabilistic graphical models thus use a graph-based representation as the
foundation for encoding a complete distribution over a multi-dimensional space⁸.
There are two common types of graphical models: directed and undirected. The former
are often referred to as Bayesian networks, or belief networks. Classical examples of this
type of directed acyclic graph include hidden Markov models and neural networks. In this
category, each node is assumed to be conditionally independent of its non-descendants
given its parents (the vertices pointing directly to the node via a directed edge), so that the
joint distribution factorizes as:

$$p(\mathbf{x}) = \prod_{v \in V} p\left(x_v \mid x_{pa(v)}\right),$$

where pa(v) is the set of parents of node v. In this thesis, however, we focus mainly on
undirected graphical models, specifically the Markov random field (MRF) introduced above,
in which edges have no direction. If the joint distribution of all nodes of interest is
Gaussian, the MRF is a Gauss–Markov random field (GMRF), and the pdf of the Gaussian
process u is defined as:

$$p(\mathbf{u} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{u} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{u} - \boldsymbol{\mu})\right\} \qquad (7)$$

where μ is the mean vector and Σ, of dimension N × N, is the covariance matrix of the data,
with $\Sigma_{ij} = \mathrm{Cov}(u_i, u_j)$; Σ is always positive definite.

⁸ http://en.wikipedia.org/wiki/Graphical_model
3.2.2.2 The Conditional Independence
One important property of the Gaussian graphical model is its conditional independence
structure. To illustrate this point, consider the sample graphical model in Figure 3.2.2.2:
Figure 3.2.2.2 The conditional independence of the Gaussian Graphical Model
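As shown in the figure, we group the nodes of the simple graphical model into three
blocks (A, B and C) and study its dependence structure. In an undirected graphical
model, nodes that are not joined by an edge need not be uncorrelated, but they are
conditionally independent given the remaining nodes; more generally, two blocks are
conditionally independent given a third block if every path between them passes through
that third block. Since every path from A to C passes through B, blocks A and C are
conditionally independent given B. Mathematically:

$$p(x_A, x_C \mid x_B) = p(x_A \mid x_B)\, p(x_C \mid x_B)$$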
When the dependence structure is Gaussian, conditional dependence and independence
can be read directly from the precision matrix $J = \Sigma^{-1}$: nodes i and j are
conditionally independent given all other nodes if and only if $J_{ij} = 0$.
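For a Gaussian with mean μ and precision matrix J, the conditional distribution of one node given all others has mean $\mu_i - J_{ii}^{-1}\sum_{j \ne i} J_{ij}(x_j - \mu_j)$ and variance $1/J_{ii}$, so only the neighbors of i (the j with $J_{ij} \ne 0$) enter. The Python sketch below checks this on a hypothetical three-node chain A, B, C. The covariance matrix, the inverse of J, has a non-zero (A, C) entry, showing that A and C are correlated even though they are conditionally independent given B.

```python
def conditional_mean_var(J, mu, x, i):
    """Mean and variance of x_i given all other components, for a Gaussian with precision J."""
    s = sum(J[i][j] * (x[j] - mu[j]) for j in range(len(x)) if j != i)
    return mu[i] - s / J[i][i], 1.0 / J[i][i]

# Chain A - B - C: J[0][2] == 0, so A and C are conditionally independent given B.
J = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
mu = [0.0, 0.0, 0.0]

# Changing x_C leaves the conditional distribution of x_A unchanged.
print(conditional_mean_var(J, mu, [0.0, 1.0, 5.0], 0))   # (0.5, 0.5)
print(conditional_mean_var(J, mu, [0.0, 1.0, -3.0], 0))  # (0.5, 0.5)

# The covariance J^{-1} is nevertheless dense: Sigma[0][2] = 0.25 != 0.
Sigma = [[0.75, 0.5, 0.25],
         [0.5, 1.0, 0.5],
         [0.25, 0.5, 0.75]]
```

This is why the precision matrix, rather than the covariance matrix, is the natural object for encoding the neighborhood structure of an MRF.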
3.2.3 Thin Membrane Model
In particular, the thin membrane model, a Gaussian MRF, is selected in this thesis as the
prior distribution. Assuming that the Gaussian MRF under consideration is smooth overall,
while allowing a certain degree of discontinuity, the thin membrane model penalizes
differences between neighboring nodes through equation (8):

$$p(\mathbf{x} \mid \alpha) \propto \exp\left(-\frac{\alpha}{2} \sum_{i \in V} \sum_{j \in N(i),\, j > i} (x_i - x_j)^2\right) \qquad (8)$$

where x is the Gaussian random process, parameterized by its mean and covariance
matrix; N(i) is the set of neighbors of site i (each neighboring pair is counted once); and α
is a parameter specifying the strength of the inter-site penalty, which may itself follow a
prior distribution.
3.2.3.1 The Grid Structure
The system we are interested in can be simplified into a 6 by 13 grid (Figure 3.2.3.1),
with sites indexed from 1 to 78. Each index represents an individual site, at which
marginal extreme events are observed and are to be estimated using the MRF-GP model.
The grid is an instance of a graphical model characterized by the Markov property. For
instance, site 1 is directly connected only to sites 2 and 14. Thus, according to the thin
membrane model, the conditional distribution of site 1, given all other sites, depends only
on the values at sites 2 and 14, regardless of what occurs at the other sites.
 1  2  3  4  5  6  7  8  9 10 11 12 13
14 15 16 17 18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63 64 65
66 67 68 69 70 71 72 73 74 75 76 77 78
Figure 3.2.3.1: Simplified version of the 6 by 13 grid structure
3.2.3.2 The Precision Matrix
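The dependence structure of the thin membrane model can be expressed in matrix form
through its precision matrix J, which plays the role of the inverse covariance matrix
$\Sigma^{-1}$ in equation (7):

$$J = \alpha J_p \qquad (9)$$

where α is the smoothing parameter controlling the strength of the penalty and $J_p$ is the
matrix encoding the adjacency structure of the grid (its graph Laplacian), given elementwise
by equation (10):

$$[J_p]_{mn} = \begin{cases} |N(n)|, & m = n, \\ -1, & m \in N(n), \\ 0, & \text{otherwise}. \end{cases} \qquad (10)$$

With this definition, the thin membrane prior (8) can be written in Gaussian form as
$p(\mathbf{x} \mid \alpha) \propto \exp\left(-\tfrac{1}{2}\mathbf{x}^T J \mathbf{x}\right)$.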
The diagonal element $[J_p]_{nn} = |N(n)|$ is the number of sites adjacent to site n,
n ∈ [1, p] (p is 78 in our case); on the 6 by 13 grid it equals 2, 3 or 4. The off-diagonal
elements are −1 if an edge connects the two sites and zero otherwise. The matrix is sparse.
Since each of its rows sums to zero, $J_p$ itself is singular: the thin membrane prior is
improper and is unchanged if the same constant is added at every site. The posterior
precision matrix becomes invertible once a likelihood term with positive precision at each
site is added. J implements the thin membrane model and characterizes the spatial
structure of the multi-site data, and the smoothing parameter α controls the strength of the
dependence. This prior describes our knowledge of the system before any observations
are made.
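The construction of $J_p$ for the 6 by 13 grid can be sketched in a few lines of Python. The sketch assumes the row-by-row site numbering of Figure 3.2.3.1 (so that site 1 neighbors sites 2 and 14) and uses dense nested lists purely for readability, where a real implementation would store the matrix sparsely. It checks the facts stated above: the degrees are 2, 3 or 4, every row sums to zero, and $\mathbf{x}^T J_p \mathbf{x}$ equals the sum of squared neighbor differences with each pair counted once.

```python
def grid_neighbors(rows=6, cols=13):
    """N(n) for sites numbered 1..rows*cols row by row, with 4-neighbour connectivity."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            n = r * cols + c + 1
            nbrs[n] = set()
            if c > 0:
                nbrs[n].add(n - 1)
            if c < cols - 1:
                nbrs[n].add(n + 1)
            if r > 0:
                nbrs[n].add(n - cols)
            if r < rows - 1:
                nbrs[n].add(n + cols)
    return nbrs

def thin_membrane_matrix(nbrs):
    """Dense J_p of equation (10): |N(n)| on the diagonal, -1 for neighbours, 0 elsewhere."""
    p = len(nbrs)
    Jp = [[0.0] * p for _ in range(p)]
    for n, ns in nbrs.items():
        Jp[n - 1][n - 1] = float(len(ns))
        for m in ns:
            Jp[n - 1][m - 1] = -1.0
    return Jp

def quad_form(Jp, x):
    """x^T J_p x."""
    p = len(x)
    return sum(x[i] * Jp[i][j] * x[j] for i in range(p) for j in range(p))

def pair_energy(nbrs, x):
    """Sum of (x_i - x_j)^2 over neighbouring pairs, each pair counted once."""
    return sum((x[n - 1] - x[m - 1]) ** 2 for n, ns in nbrs.items() for m in ns if m > n)

nbrs = grid_neighbors()
Jp = thin_membrane_matrix(nbrs)
print(sorted(nbrs[1]))                             # [2, 14]
print(sorted({len(ns) for ns in nbrs.values()}))   # [2, 3, 4]
print(all(abs(sum(row)) < 1e-12 for row in Jp))    # True: J_p is singular
```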
3.3 Estimation Method
3.3.1 Maximum Likelihood (ML)
Given a statistical model with unknown parameters, maximum likelihood estimation (MLE)
can be used to estimate those parameters. For example, suppose a random variable of
interest follows a normal distribution with unknown mean and variance. MLE treats the
mean and variance as free variables and finds the parameter values that make the
observed outcome most probable⁹. More generally, MLE provides a unified approach to
parameter estimation: for a set of observed data and a given model, it selects the
parameter values under which the observed data have the greatest probability or, in other
words, the values that maximize the likelihood function.
To apply this method, we first write the joint probability density function of the observed
data $x_1, \ldots, x_n$, parameterized by the parameter vector θ:

$$f(x_1, \ldots, x_n \mid \theta)$$
We now view this function from a different angle: we regard the observed values
$x_1, \ldots, x_n$ as fixed and treat the parameter vector θ as a variable that can vary
freely. The resulting function is called the likelihood function; for independent and
identically distributed observations it takes the form of equation (11):

$$\mathcal{L}(\theta \mid x_1, \ldots, x_n) = f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \qquad (11)$$

In practice, the log-likelihood is more convenient to work with, so the likelihood is usually
transformed to logarithmic form:

$$\ln \mathcal{L}(\theta \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)$$

Maximizing this function, $\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} \ln \mathcal{L}(\theta \mid x_1, \ldots, x_n)$, gives the
parameter values under which the observed data are most probable.

⁹ http://en.wikipedia.org/wiki/Maximum_likelihood
Maximum likelihood estimation has several advantages. First, under standard regularity
conditions the estimator is consistent and asymptotically efficient, so it converges well as
the sample size increases for many conventional distribution families. Second, it is simple
and easy to implement.
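For the normal example above, the maximum likelihood estimates have a closed form: the sample mean and the sample variance with divisor n. The Python sketch below computes them and evaluates the normal log-likelihood, so one can check that perturbing either estimate lowers the log-likelihood; the data values are arbitrary.

```python
import math

def normal_log_lik(data, mu, var):
    """Log-likelihood of i.i.d. N(mu, var) observations: the logarithm of equation (11)."""
    n = len(data)
    return (-0.5 * n * math.log(2.0 * math.pi * var)
            - sum((x - mu) ** 2 for x in data) / (2.0 * var))

def normal_mle(data):
    """Closed-form MLE: sample mean and the divide-by-n sample variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = normal_mle(data)
print(mu_hat, var_hat)  # 5.0 4.0
```

Note that the ML variance estimate divides by n rather than n − 1, so it is slightly biased in small samples, although it is consistent as n grows.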
3.3.2 Maximum A Posteriori (MAP)
A maximum a posteriori (MAP) estimate is a mode of the posterior distribution and is
closely related to Fisher's method of maximum likelihood¹⁰. It can be used to obtain a
point estimate of an unobserved quantity from empirical data.
MAP differs from MLE mainly in that it also incorporates a prior distribution over the
quantity being estimated. The prior encodes knowledge about the system that is available
before, rather than derived from, the current observations. Because of this a priori
knowledge, MAP estimation can be regarded as a regularization of maximum likelihood
estimation, and it can mitigate over-fitting of the underlying model.

¹⁰ http://en.wikipedia.org/wiki/Maximum_a_posteriori
If x is the observed sample from the unknown population and θ is the model parameter
underlying the system, the distribution of the observed outcomes is:

$$x \sim f(x \mid \theta)$$

From a different perspective, if we treat θ as a random variable that can vary freely while
x is a fixed set of quantities, then applying Bayes' theorem gives the posterior distribution
of θ:

$$p(\theta \mid x) = \frac{f(x \mid \theta)\, g(\theta)}{p(x)}$$

The MAP estimate of θ is the value of θ at the mode of the posterior distribution, which
can be written as:

$$\hat{\theta}_{\mathrm{MAP}}(x) = \arg\max_{\theta} p(\theta \mid x) = \arg\max_{\theta} \frac{f(x \mid \theta)\, g(\theta)}{p(x)} = \arg\max_{\theta} f(x \mid \theta)\, g(\theta) \qquad (12)$$

where g(θ) is the prior distribution of θ; the last step holds because p(x) does not depend
on θ. Notably, the MAP estimate coincides with the ML estimate when the prior is uniform,
which is the case when there is no prior information.
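A minimal sketch of equation (12) in a case with a closed-form answer: observations $x_i \sim \mathcal{N}(\theta, s^2)$ with known $s^2$ and a Gaussian prior $\theta \sim \mathcal{N}(m_0, t^2)$. The MAP estimate is $(\sum_i x_i/s^2 + m_0/t^2)/(n/s^2 + 1/t^2)$; it shrinks the sample mean towards $m_0$, and as $t^2 \to \infty$ (an increasingly flat prior) it tends to the ML estimate, the sample mean. The conjugate Gaussian model and the function name are assumptions chosen for illustration.

```python
def map_normal_mean(data, noise_var, prior_mean, prior_var):
    """MAP estimate (equation (12)) of theta for x_i ~ N(theta, noise_var), theta ~ N(prior_mean, prior_var)."""
    n = len(data)
    posterior_precision = n / noise_var + 1.0 / prior_var
    return (sum(data) / noise_var + prior_mean / prior_var) / posterior_precision

data = [1.0, 2.0, 3.0]                         # sample mean (the MLE) is 2.0
print(map_normal_mean(data, 1.0, 0.0, 1.0))    # 1.5: shrunk towards the prior mean 0
print(map_normal_mean(data, 1.0, 0.0, 1e12))   # ~2.0: nearly flat prior, MAP ~ MLE
```

The Gaussian prior contributes a quadratic penalty to the log-posterior, which is the same mechanism by which the thin membrane prior regularizes the estimates in Chapter 4.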
CHAPTER 4: NON-PARAMETRIC METHOD TO MODEL SPATIAL DEPENDENCE
In this chapter, the proposed non-parametric MRF-GP model is presented. First, the
motivation and procedure for model construction are introduced in detail. Next, various
techniques for selecting the smoothing parameter are described. To optimize the model
parameters, several major estimation and optimization methods are examined, including
cross-validation, maximum a posteriori estimation and iterated conditional modes. Each of
them has flaws that cannot easily be resolved, and these limitations are discussed. The
chapter closes with the motivation for alternative selection methods.
4.1 Model Construction
In this section, we propose a Gauss-MRF based GP distribution model to handle
specifically the spatial dependence structure present in multi-site datasets. A brief
introduction to the model is presented first, followed by the model-building procedure.
The learning and inference algorithm is summarized at the end of the section, followed by
a summary of the model.
4.1.1 Introduction to the MRF-GP Model
Motivated by the large body of previous work on incorporating covariate effects into
statistical prediction models, in this thesis we propose a new approach to handle the
spatial dependence present in multi-site time-series data, in which the observed samples
are spatially non-stationary. We first assume that the data at each location are
independent and identically distributed, following a generalized Pareto distribution with
unknown parameters; we then handle the covariate effect implicitly to reflect the spatial
dependence. To start with, local thresholds are selected at a pre-determined quantile level
common to all sites, and the local GP parameters (shape and scale) are estimated from the
samples above these thresholds to serve as initial estimates. However, each observed
marginal threshold is inherently corrupted by Gaussian noise: it is modelled as the hidden
underlying threshold plus noise of unknown variance. To address this issue, we introduce
the Gauss-MRF based thin membrane model, which serves as the roughness penalty that
regularizes the threshold smoothing. The maximum a posteriori (MAP) estimate is
computed with the interior point method, chosen for its simplicity.
For each site, the wave heights exceeding the smoothed threshold are assumed to follow
the Generalized Pareto (GP) distribution, characterized by its shape and scale parameters.
We build the joint distribution of the GP parameters, again incorporating the roughness
penalty given by the thin membrane model, and apply MAP estimation directly to
determine the smoothed model parameters. The smoothed results are then compared with
the locally fitted model and discussed.
4.1.2 Local Data Fitting
Locally fitted thresholds and GP parameters are needed as initial values for the interior
point method that computes the MAP estimate of all unknown parameters.
To obtain the local fits, the data at each site are fitted to the Generalized Pareto
distribution above a pre-determined site-specific threshold, with all thresholds sharing the
same exceedance probability. The GP distribution is given by equation (1):

$$F(x \mid x > u) = 1 - \left[1 + \frac{\gamma\,(x - u)}{\sigma}\right]^{-1/\gamma}, \qquad x > u,$$

where x denotes the multi-site data of interest, γ is the shape parameter, σ is the scale
parameter and u is the site-specific threshold.
The resulting model parameters (threshold, shape and scale) are shown in
Figures 4.1.2 (a), (b) and (c), respectively.
Figure 4.1.2 (a): The locally fit threshold surface
Figure 4.1.2 (b): The locally fit shape parameters surface
Figure 4.1.2 (c): The locally fit scale parameters surface
4.1.3 Threshold Optimization Method
4.1.3.1 Threshold Selection
Proper threshold selection is crucial for the model to be practically useful. Northrop and
Jonathan¹¹ suggested that, if non-stationary behavior is observed in the data, a
covariate-dependent, non-stationary threshold should be adopted to reflect it. Therefore,
before performing dependence handling and smoothing, we fix a constant exceedance
probability instead of a constant threshold.
The standard approach to threshold selection is to fit the covariate-dependent model over
a sufficiently wide range of exceedance probabilities (quantile levels) and look for stability
in the parameter estimates (Northrop and Jonathan, 2010).
Theoretically, if the designed model is applicable given a properly chosen latent threshold
surface with a certain tolerance for variation, the resulting GP parameters after smoothing
must satisfy the following criteria to be reliable:
First, at each site the estimate of the shape parameter γ should be almost constant as the
threshold varies. Second, the corresponding scale parameter σ should vary linearly with
the threshold, with slope approximately equal to γ. Third, an arbitrarily chosen percentile
of the wave heights should be nearly constant across threshold values; only then can one
confidently say that the model is a good estimate of the underlying distribution.
These properties can be regarded as model-building criteria: the better a model complies
with them, the higher its quality.
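The first two criteria follow from a threshold-stability property of the GP distribution: if the excesses over u are GP with shape γ and scale $\sigma_u$, then the excesses over any higher threshold u′ are GP with the same γ and scale $\sigma_{u'} = \sigma_u + \gamma(u' - u)$, a straight line in u′ with slope γ. The Python sketch below verifies this numerically for γ ≠ 0 by comparing the conditional survival function implied by the fit at u with the GP at u′; the parameter values are arbitrary.

```python
def gp_survival(y, gamma, sigma):
    """P(Y > y) for a GP excess Y >= 0 with gamma != 0."""
    z = 1.0 + gamma * y / sigma
    return z ** (-1.0 / gamma) if z > 0 else 0.0

def shifted_scale(sigma_u, gamma, u, u_new):
    """GP scale implied at a higher threshold u_new: linear in u_new with slope gamma."""
    return sigma_u + gamma * (u_new - u)

def stability_error(gamma, sigma_u, u, u_new, xs):
    """Largest gap between the GP fitted at u, conditioned on X > u_new, and the GP at u_new."""
    s_new = shifted_scale(sigma_u, gamma, u, u_new)
    base = gp_survival(u_new - u, gamma, sigma_u)
    return max(abs(gp_survival(x - u, gamma, sigma_u) / base
                   - gp_survival(x - u_new, gamma, s_new)) for x in xs)

print(stability_error(0.2, 1.5, 3.0, 4.0, [4.5, 5.0, 8.0]) < 1e-12)    # True
print(stability_error(-0.1, 2.0, 3.0, 4.0, [4.5, 5.0, 10.0]) < 1e-12)  # True
```

In practice the fitted γ and σ at each threshold only approximate these relations, so the criteria are checked for approximate, not exact, constancy and linearity.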
4.1.3.2 The Thin Membrane Model
First, we smooth the thresholds of the different sites using the thin membrane model
introduced above. Assuming that the Gaussian MRF under consideration is smooth overall,
while allowing a certain degree of discontinuity, the thin membrane model (equation (8))
penalizes differences between neighboring nodes:

$$p(\mathbf{x} \mid \alpha) \propto \exp\left(-\frac{\alpha}{2} \sum_{i \in V} \sum_{j \in N(i),\, j > i} (x_i - x_j)^2\right)$$

where x is the Gaussian random process, parameterized by its mean and covariance
matrix; N(i) is the set of neighbors of site i; and α is a parameter specifying the strength of
the inter-site penalty, which may itself follow a prior distribution.

¹¹ Paul Northrop and Philip Jonathan (2010). Modeling spatially-dependent non-stationary extremes with
application to hurricane-induced wave heights.
This penalty term is added to the smoothing process as a roughness penalty that handles
the inter-site dependence implicitly. Such penalty functions are commonly adopted in
recent work because many multi-site datasets are non-stationary but smooth overall: the
parameters of the extreme marginal behavior vary systematically with the covariates,
which matches intuition. Without the penalty term, the estimates may vary abruptly from
site to site and the precision of estimation deteriorates. To demonstrate the role of the
penalty strength (stiffness) parameter α, we analyze two extreme scenarios that reveal the
model's limiting behavior:
Scenario one: α = 0
When α is set to zero, the penalty term is effectively removed from the smoothing process,
so the resulting parameters are not smoothed at all and simply follow the local fits.
Optimizing this model then reduces to pure likelihood maximization, and the result
coincides with the locally fitted parameters.
Scenario two: $\alpha \to \infty$
Conversely, when the smoothing parameter tends to infinity, the model behaves in the opposite way. The penalty term, which measures the stiffness of the parametric surface, becomes extremely large, leaving no flexibility for the parameters to vary across locations. All parameters therefore converge to the same value: the result is complete smoothing, and the surface formed by the parameter values across sites is flat. This is not desirable either.
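The two limiting scenarios can be checked numerically. The following Python sketch is an illustration, not the thesis implementation: it assumes Gaussian observation noise with a common variance $\tau^2$ (as in equation (13) below), a hypothetical 6 × 13 grid of 78 sites with 4-neighbour adjacency, and made-up locally fitted thresholds. Maximizing the Gaussian log-likelihood minus the thin-membrane penalty gives a linear system, which is solved here for $\alpha = 0$ and for a very large $\alpha$.

```python
import numpy as np


def grid_laplacian(rows, cols):
    """Graph Laplacian of a rows x cols grid with 4-neighbour adjacency."""
    p = rows * cols
    K = np.zeros((p, p))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for rr, cc in ((r + 1, c), (r, c + 1)):   # each edge is visited once
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    K[i, j] = K[j, i] = -1.0
                    K[i, i] += 1.0
                    K[j, j] += 1.0
    return K


def smooth(u, K, alpha, tau2=1.0):
    """Maximise -||u - v||^2 / (2 tau2) - alpha * sum_i sum_{j in N(i)} (v_i - v_j)^2.

    The double sum visits every edge twice, so it equals 2 v^T K v and the
    stationarity condition is (I + 4 alpha tau2 K) v = u.
    """
    return np.linalg.solve(np.eye(len(u)) + 4.0 * alpha * tau2 * K, u)


rng = np.random.default_rng(0)
K = grid_laplacian(6, 13)                      # hypothetical 78-site grid
u = 3.0 + rng.normal(0.0, 0.5, size=78)        # hypothetical local thresholds

v_local = smooth(u, K, alpha=0.0)              # Scenario one: no smoothing
v_flat = smooth(u, K, alpha=1e6)               # Scenario two: alpha -> infinity
print(np.abs(v_local - u).max())               # ~0: local fits returned unchanged
print(v_flat.max() - v_flat.min())             # ~0: a flat surface
print(v_flat.mean(), u.mean())                 # the overall mean is preserved
```

The flat limit keeps the mean because the constant vector lies in the null space of the penalty, so only the differences between sites are shrunk.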
For the threshold, our goal is to find the value that best meets the model-building criteria above; the same applies to the selection of the GP parameters. It is therefore important to tune $\alpha_v$, the stiffness of the threshold surface, to find a threshold level at which the covariate effects are properly handled.
Assuming that the underlying threshold follows the thin membrane framework, we obtain:
$$p(v \mid \alpha_v) \propto \exp\Big\{-\alpha_v \sum_{i \in V} \sum_{j \in N(i)} (v_i - v_j)^2\Big\}$$
where $v$ is the underlying threshold to be estimated, $\alpha_v$ is the stiffness parameter for the threshold, $V$ is the collection of data sites ($i \in V$), and $N(i)$ is the set of neighboring sites of site $i$.
4.1.3.3 Joint Threshold Distribution
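The observed threshold $u$, which can be determined by quantile regression, is the underlying threshold $v$ corrupted by Gaussian noise with zero mean and unknown variance $\tau^2$. Therefore, $u_i \sim N(v_i, \tau^2)$:
$$p(u_i \mid v_i) = \frac{1}{\sqrt{2\pi\tau^2}} \exp\left(-\frac{(u_i - v_i)^2}{2\tau^2}\right) \tag{13}$$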
It is worth mentioning that the variance $\tau^2$ controls the flexibility of the threshold. Viewing the threshold over the grid as a surface, the variance quantifies how far the observed threshold may deviate from the underlying one: when the variance is large, the threshold can move freely around the underlying threshold. Furthermore, since we have no prior information about the threshold stiffness parameter $\alpha_v$, we assume a uniform distribution for it. We therefore obtain:
$$p(v, \alpha_v) = p(v \mid \alpha_v)\, p(\alpha_v), \qquad p(\alpha_v) \propto 1$$
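Substituting these distributions, we obtain the joint distribution with respect to the underlying threshold $v$:
$$p(v, \alpha_v \mid u) \propto \prod_{i \in V} \frac{1}{\sqrt{2\pi\tau^2}} \exp\left(-\frac{(u_i - v_i)^2}{2\tau^2}\right) \cdot \exp\Big\{-\alpha_v \sum_{i \in V} \sum_{j \in N(i)} (v_i - v_j)^2\Big\}\, p(\alpha_v) \tag{14}$$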
4.1.3.4 Threshold Smoothing
Various optimization methods can be used to find the optimal threshold level. In our case, we apply the interior-point method provided by MATLAB to maximize equation (14). Since the distribution of the observed threshold, $p(u)$, does not affect the optimization result, it can be ignored. We therefore maximize:
$$p(v, \alpha_v \mid u) = \frac{p(u \mid v)\, p(v \mid \alpha_v)\, p(\alpha_v)}{p(u)} \propto \prod_{i \in V} \frac{1}{\sqrt{2\pi\tau^2}} \exp\left(-\frac{(u_i - v_i)^2}{2\tau^2}\right) \cdot \exp\Big\{-\alpha_v \sum_{i \in V} \sum_{j \in N(i)} (v_i - v_j)^2\Big\}\, p(\alpha_v),$$
where $p(\alpha_v)$ is uniform, $u$ is the observed threshold, and $v$ is the underlying threshold characterized by the Markov random field.
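The thesis maximizes this posterior with MATLAB's interior-point solver. As an illustrative Python stand-in (not the original code), the sketch below minimizes the negative log-posterior for a fixed $\alpha_v$ with SciPy's bound-constrained L-BFGS-B method, which is a quasi-Newton method rather than an interior-point method, and supplies the analytic gradient. It then checks the answer against the linear system obtained by setting the gradient to zero. The grid, the noise variance $\tau^2$, and the thresholds are hypothetical; $p(\alpha_v)$ is constant and drops out for fixed $\alpha_v$.

```python
import numpy as np
from scipy.optimize import minimize


def grid_laplacian(rows, cols):
    """Graph Laplacian of a rows x cols grid with 4-neighbour adjacency."""
    p = rows * cols
    K = np.zeros((p, p))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    K[i, j] = K[j, i] = -1.0
                    K[i, i] += 1.0
                    K[j, j] += 1.0
    return K


def neg_log_posterior(v, u, K, alpha, tau2):
    """-log p(v | u, alpha) up to a constant, together with its gradient.

    Gaussian misfit sum_i (u_i - v_i)^2 / (2 tau2) plus the thin-membrane
    penalty alpha * sum_i sum_{j in N(i)} (v_i - v_j)^2 = 2 alpha v^T K v.
    """
    resid = v - u
    f = 0.5 * resid @ resid / tau2 + 2.0 * alpha * v @ K @ v
    g = resid / tau2 + 4.0 * alpha * K @ v
    return f, g


rng = np.random.default_rng(1)
K = grid_laplacian(6, 13)                     # hypothetical 78-site grid
u = 3.0 + rng.normal(0.0, 0.5, size=78)       # hypothetical observed thresholds
alpha_v, tau2 = 0.5, 0.25

res = minimize(neg_log_posterior, x0=u, args=(u, K, alpha_v, tau2), jac=True,
               method="L-BFGS-B", bounds=[(0.0, None)] * len(u),
               options={"ftol": 1e-14, "gtol": 1e-10})
v_hat = res.x

# Closed-form check: a zero gradient gives (I / tau2 + 4 alpha K) v = u / tau2
v_exact = np.linalg.solve(np.eye(len(u)) / tau2 + 4.0 * alpha_v * K, u / tau2)
print(np.abs(v_hat - v_exact).max())
```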
4.1.4 GP Parameter Optimization
The locally fitted threshold and GP parameters are needed as initial values for finding the MAP estimate of all the unknown parameters with the interior-point method.
4.1.4.1 The Thin Membrane Model
Next, we smooth the GP parameters. The thin membrane model is again chosen as the distribution of the GP parameters $(\gamma, \sigma)$ across sites. Each site is connected to its 4 immediate neighbors (fewer for sites on the border of the grid), as characterized by the Markov property:
$$p(\gamma \mid \alpha_\gamma) \propto \exp\Big(-\alpha_\gamma \sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2\Big) \tag{15}$$
and, similarly:
$$p(\sigma \mid \alpha_\sigma) \propto \exp\Big(-\alpha_\sigma \sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2\Big) \tag{16}$$
where $V$ is the set of all sites and $N(i)$ denotes the neighbors of site $i$.
4.1.4.2 Joint Parametric Distribution
For each site, the wave heights over the threshold are GP distributed, with the probability density function (pdf) given in equation (1):
$$f(y \mid \gamma, \sigma, u) = \frac{1}{\sigma}\left[1 + \gamma\,\frac{y - u}{\sigma}\right]^{-\frac{1}{\gamma} - 1}$$
Moreover, since we have no prior information about the distributions of $\alpha_\gamma$ and $\alpha_\sigma$, we choose uniform distributions covering their possible ranges:
$$p(\alpha_\gamma) \propto 1, \qquad p(\alpha_\sigma) \propto 1$$
4.1.4.3 GP Parametric Smoothing
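To find the optimal parametric model, we maximize the joint posterior of the GP parameters. Since the distribution of the original dataset, $p(y)$, does not change, it has no effect on the optimization result and can safely be neglected. Our goal is therefore to maximize:
$$p(\gamma, \sigma, \alpha_\gamma, \alpha_\sigma \mid y) \propto \alpha_\gamma^{p/2} \exp\Big\{-\alpha_\gamma \sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2\Big\}\, \alpha_\sigma^{p/2} \exp\Big\{-\alpha_\sigma \sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2\Big\} \prod_{i \in V} \prod_{k=1}^{n_i} \frac{1}{\sigma_i}\left[1 + \gamma_i\,\frac{y_{ik} - u_i}{\sigma_i}\right]^{-\frac{1}{\gamma_i} - 1} \tag{17}$$
where $p$ is the number of sites, $y_{ik}$ is the $k$-th of the $n_i$ exceedances at site $i$, and $u_i$ is the threshold at site $i$. The factors $\alpha_\gamma^{p/2}$ and $\alpha_\sigma^{p/2}$ come from the normalizing constants of the Gaussian priors (15) and (16).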
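Conventionally, we take the logarithm of both sides:
$$\log p(\gamma, \sigma, \alpha_\gamma, \alpha_\sigma \mid y) = -\alpha_\gamma \sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2 + \frac{p}{2}\log\alpha_\gamma - \alpha_\sigma \sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2 + \frac{p}{2}\log\alpha_\sigma + \sum_{i \in V} \sum_{k=1}^{n_i} \left\{-\log\sigma_i - \left(\frac{1}{\gamma_i} + 1\right)\log\left[1 + \gamma_i\,\frac{y_{ik} - u_i}{\sigma_i}\right]\right\} + \text{const} \tag{18}$$
Our next step is therefore to maximize this log-likelihood function.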
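A minimal Python sketch of the log-posterior (18), assuming the exceedances $z_{ik} = y_{ik} - u_i$ are supplied per site and that the neighbourhood structure is given as a list of undirected edges (both are hypothetical input formats, not the thesis code). The GP term is cross-checked against SciPy's `genpareto`, whose shape parameter `c` plays the role of $\gamma$.

```python
import numpy as np
from scipy.stats import genpareto


def gp_loglik(z, gamma, sigma):
    """GP log-likelihood of exceedances z >= 0 (shape gamma, scale sigma)."""
    t = 1.0 + gamma * z / sigma
    if sigma <= 0 or np.any(t <= 0):
        return -np.inf                       # outside the GP support
    if abs(gamma) < 1e-12:                   # exponential limit gamma -> 0
        return -len(z) * np.log(sigma) - np.sum(z) / sigma
    return -len(z) * np.log(sigma) - (1.0 / gamma + 1.0) * np.sum(np.log(t))


def neighbour_sum(x, edges):
    """sum_i sum_{j in N(i)} (x_i - x_j)^2; each undirected edge appears twice."""
    i, j = edges[:, 0], edges[:, 1]
    return 2.0 * np.sum((x[i] - x[j]) ** 2)


def log_posterior(gamma, sigma, alpha_g, alpha_s, exceedances, edges):
    """Equation (18) up to an additive constant."""
    p = len(gamma)
    lp = (-alpha_g * neighbour_sum(gamma, edges) + 0.5 * p * np.log(alpha_g)
          - alpha_s * neighbour_sum(sigma, edges) + 0.5 * p * np.log(alpha_s))
    for z, g, s in zip(exceedances, gamma, sigma):
        lp += gp_loglik(z, g, s)
    return lp


rng = np.random.default_rng(2)
edges = np.array([[0, 1], [1, 2], [2, 3]])    # hypothetical 4-site chain
exc = [genpareto.rvs(-0.3, scale=4.4, size=200, random_state=rng) for _ in range(4)]
gam, sig = np.full(4, -0.3), np.full(4, 4.4)
print(log_posterior(gam, sig, 10.0, 10.0, exc, edges))
```

With identical parameters at every site the penalty terms vanish, so only the $\frac{p}{2}\log\alpha$ terms and the GP log-likelihoods remain.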
4.2 Smoothing Parameter Selection
In this subsection, we apply cross-validation, maximum a posteriori estimation, and iterated conditional modes to meet the maximization objective in the smoothing parameter selection. However, none of the three methods gave satisfactory results in practice. The limitations and disadvantages of the three approaches are summarized at the end of the subsection.
4.2.1 Cross-validation (CV)
Cross-validation, sometimes called rotation estimation [12], is a commonly used technique for assessing predictive performance on a dataset. It is mainly adopted when the goal is prediction and one wants to assess the accuracy of the predictions. Generally, the first step in CV is to partition a data sample into complementary subsets (the number of partitions is called the number of folds). The model is then fitted on some subsets (the training set) and validated on the remaining subsets (the validation set) to test its performance. To reduce variability, multiple rounds of cross-validation are performed over the different partitions, and the validation results are averaged over the rounds.
To illustrate, in k-fold cross-validation the dataset is first partitioned into k subsets, of which k − 1 are treated as the training set and the remaining one as the validation set. The model is fitted to the training data and the validation result is recorded. This procedure is repeated k times, until each subset has been used exactly once for validation, and the performances are averaged over the rounds. The implementation of this methodology for our MRF-GP model is studied further in the subsequent section.
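The k-fold procedure described above can be sketched in a few lines of Python. This is a generic illustration rather than the thesis implementation: `fit` and `score` are hypothetical user-supplied callables, and in the usage example the "model" is simply the sample mean, scored by the negative held-out squared error.

```python
import numpy as np


def k_fold_cv(data, k, fit, score, rng):
    """Average validation score over k folds; each fold is validated exactly once."""
    folds = np.array_split(rng.permutation(len(data)), k)
    scores = []
    for f, val_idx in enumerate(folds):
        train_idx = np.concatenate([folds[g] for g in range(k) if g != f])
        model = fit(data[train_idx])                 # fit on the k - 1 training folds
        scores.append(score(model, data[val_idx]))   # validate on the held-out fold
    return float(np.mean(scores)), folds


rng = np.random.default_rng(3)
data = rng.gumbel(loc=2.0, scale=0.5, size=500)      # hypothetical wave heights
cv_score, folds = k_fold_cv(
    data, k=5,
    fit=np.mean,                                             # "model" = sample mean
    score=lambda m, held_out: -np.mean((held_out - m) ** 2),
    rng=rng,
)
print(cv_score)
```

To select a smoothing parameter, one would run this once per candidate value and keep the candidate with the best average score.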
[12] Seymour Geisser (1993). Predictive Inference. New York: Chapman and Hall. ISBN 0412034719.
4.2.2 Maximum a Posteriori (MAP)
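To maximize the log-likelihood function, we can directly apply maximum a posteriori (MAP) estimation by setting the partial derivative with respect to each unknown parameter to zero. Let $z_{ik} = y_{ik} - u_i$ denote the exceedances at site $i$. The factor 4 in the penalty terms arises because each pair of neighbors appears twice in the double sum.
To determine $\gamma_i$, we set the partial derivative with respect to $\gamma_i$ to 0:
$$\frac{\partial \log p}{\partial \gamma_i} = -4\alpha_\gamma \sum_{j \in N(i)} (\gamma_i - \gamma_j) + \sum_{k=1}^{n_i} \left[\frac{1}{\gamma_i^2}\log\left(1 + \frac{\gamma_i z_{ik}}{\sigma_i}\right) - \left(\frac{1}{\gamma_i} + 1\right)\frac{z_{ik}/\sigma_i}{1 + \gamma_i z_{ik}/\sigma_i}\right] = 0 \tag{19}$$
To determine $\sigma_i$, we set the partial derivative with respect to $\sigma_i$ to 0:
$$\frac{\partial \log p}{\partial \sigma_i} = -4\alpha_\sigma \sum_{j \in N(i)} (\sigma_i - \sigma_j) + \sum_{k=1}^{n_i} \left[-\frac{1}{\sigma_i} + \frac{(1 + \gamma_i)\, z_{ik}}{\sigma_i(\sigma_i + \gamma_i z_{ik})}\right] = 0 \tag{20}$$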
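To determine $\alpha_\gamma$, we set the partial derivative with respect to $\alpha_\gamma$ to 0:
$$\frac{\partial \log p}{\partial \alpha_\gamma} = -\sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2 + \frac{p}{2\alpha_\gamma} = 0 \tag{21}$$
To determine $\alpha_\sigma$, we set the partial derivative with respect to $\alpha_\sigma$ to 0:
$$\frac{\partial \log p}{\partial \alpha_\sigma} = -\sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2 + \frac{p}{2\alpha_\sigma} = 0 \tag{22}$$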
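Analytic derivatives such as (19) and (20) are easy to get wrong, so a finite-difference check is a useful safeguard before handing them to an optimizer. The Python sketch below (an illustration, not the thesis code) implements the GP part of (19) and (20) for one site and the gradient of the thin-membrane penalty, and compares both with central differences. The exceedances are simulated by inverse-CDF sampling with hypothetical values $\gamma = -0.3$, $\sigma = 4.4$, and a hypothetical 4-site chain defines the neighbours.

```python
import numpy as np


def gp_loglik(z, g, s):
    """GP log-likelihood for one site (gamma != 0, all 1 + g z / s > 0)."""
    return -len(z) * np.log(s) - (1.0 / g + 1.0) * np.sum(np.log1p(g * z / s))


def gp_grad(z, g, s):
    """GP terms of equations (19) and (20): d/dgamma and d/dsigma."""
    t = 1.0 + g * z / s
    d_g = np.sum(np.log(t) / g**2 - (1.0 / g + 1.0) * (z / s) / t)
    d_s = np.sum(-1.0 / s + (1.0 + g) * z / (s * (s + g * z)))
    return d_g, d_s


def penalty(x, alpha, nbrs):
    """-alpha * sum_i sum_{j in N(i)} (x_i - x_j)^2."""
    return -alpha * sum((x[i] - x[j]) ** 2 for i, nb in enumerate(nbrs) for j in nb)


def penalty_grad(x, alpha, nbrs):
    """Each pair appears twice in the double sum, hence the factor 4."""
    return np.array([-4.0 * alpha * sum(x[i] - x[j] for j in nb)
                     for i, nb in enumerate(nbrs)])


rng = np.random.default_rng(4)
g0, s0 = -0.3, 4.4
z = (s0 / g0) * ((1.0 - rng.uniform(size=500)) ** (-g0) - 1.0)  # GP(g0, s0) draws

g, s, h = -0.25, 4.0, 1e-6
num_g = (gp_loglik(z, g + h, s) - gp_loglik(z, g - h, s)) / (2 * h)
num_s = (gp_loglik(z, g, s + h) - gp_loglik(z, g, s - h)) / (2 * h)
print(gp_grad(z, g, s), (num_g, num_s))

nbrs = [[1], [0, 2], [1, 3], [2]]                 # hypothetical 4-site chain
x = np.array([0.1, -0.2, 0.05, 0.3])
num_p = np.array([(penalty(x + h * e, 2.0, nbrs) - penalty(x - h * e, 2.0, nbrs)) / (2 * h)
                  for e in np.eye(4)])
print(penalty_grad(x, 2.0, nbrs), num_p)
```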
We implement this process with MATLAB's fmincon function, which searches for a constrained minimum of a scalar function of several variables, starting from a given initial value; this is generally referred to as constrained nonlinear optimization [13]. However, this built-in function has drawbacks that make the MAP approach impractical in our case; they are discussed at the end of this subsection.
4.2.3 Iterated Conditional Modes (ICM)
Since maximizing the joint probability of a Markov random field directly is difficult, Besag (1986) proposed an alternative method called iterated conditional modes (ICM). ICM is a deterministic algorithm that seeks the parameter configuration maximizing the joint probability of a Markov random field by iteratively maximizing the local conditional probabilities [14]. However, the method has inherent drawbacks concerning its efficiency and convergence, which are discussed in detail below. First, the threshold is selected and smoothed before we start estimating the GP parameters.
To estimate the GP parameters, we first need expressions for the smoothing parameters $\alpha_\gamma$ and $\alpha_\sigma$. Recalling equations (21) and (22), we derive:
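$$\alpha_\gamma = \frac{p}{2\sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2} \tag{23}$$
$$\alpha_\sigma = \frac{p}{2\sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2} \tag{24}$$
[13] MATLAB documentation, fmincon. http://www.caspur.it/risorse/softappl/doc/matlab_help/toolbox/optim/fmincon.html
[14] Wikipedia, "Iterated conditional modes". http://en.wikipedia.org/wiki/Iterated_conditional_modes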
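In our case, since the region of interest consists of 78 sites (grid cells), we substitute $p = 78$ into equations (23) and (24):
$$\alpha_\gamma = \frac{78}{2\sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2} = \frac{39}{\sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2}$$
and
$$\alpha_\sigma = \frac{78}{2\sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2} = \frac{39}{\sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2}$$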
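Evaluating these two expressions at the locally fitted GP parameters gives the initial values for the ICM iterations. These values are then substituted back into the log-posterior (18), on which the optimization is performed:
$$\log p(\gamma, \sigma, \alpha_\gamma, \alpha_\sigma \mid y) = -\alpha_\gamma \sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2 + \frac{p}{2}\log\alpha_\gamma - \alpha_\sigma \sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2 + \frac{p}{2}\log\alpha_\sigma + \sum_{i \in V} \sum_{k=1}^{n_i} \left\{-\log\sigma_i - \left(\frac{1}{\gamma_i} + 1\right)\log\left[1 + \gamma_i\,\frac{y_{ik} - u_i}{\sigma_i}\right]\right\} + \text{const}$$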
We apply the interior-point method to this function to update the GP parameters, and the procedure is repeated until the parameter values converge. However, the result is disappointing. In this simulation, because of the large coefficients of the logarithmic terms, $\frac{p}{2}\log\alpha_\gamma$ and $\frac{p}{2}\log\alpha_\sigma$ (with $p = 78$), these two terms outweigh the penalty terms. As a result, the maximization drives $\alpha_\gamma$ and $\alpha_\sigma$ toward large values, limited only by the self-defined upper bound, which is clearly a case of over-smoothing. In effect, the result resembles the constant model, in which all parameter values are constant with respect to the covariates.
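The over-smoothing can be reproduced in a simplified setting. In the Python sketch below (an illustration under stated assumptions, not the thesis code), a Gaussian likelihood with unit noise variance stands in for the GP likelihood, a hypothetical 78-site chain stands in for the grid, and $\alpha$ is replaced by its ICM update (23). Shrinking the field toward its mean, $x_\varepsilon = m + \varepsilon(y - m)$, makes the profiled objective grow without bound, because $\alpha = p/(2\varepsilon^2 S_y) \to \infty$ as $\varepsilon \to 0$ and the $\frac{p}{2}\log\alpha$ term dominates.

```python
import numpy as np


def chain_laplacian(p):
    """Graph Laplacian of a path graph with p nodes (hypothetical neighbourhood)."""
    K = 2.0 * np.eye(p) - np.eye(p, k=1) - np.eye(p, k=-1)
    K[0, 0] = K[-1, -1] = 1.0
    return K


def alpha_icm(x, K):
    """ICM update (23): alpha = p / (2 S), S = sum_i sum_{j in N(i)} (x_i - x_j)^2 = 2 x^T K x."""
    return len(x) / (2.0 * (2.0 * x @ K @ x))


def profiled_objective(x, y, K):
    """Gaussian log-likelihood + (p/2) log(alpha) - alpha * S, with alpha set by (23)."""
    a = alpha_icm(x, K)
    S = 2.0 * x @ K @ x
    return -0.5 * np.sum((y - x) ** 2) + 0.5 * len(x) * np.log(a) - a * S


p = 78
K = chain_laplacian(p)
y = np.sin(np.arange(p))            # hypothetical observed field
m = y.mean()
for eps in (1.0, 1e-1, 1e-3, 1e-6):
    x = m + eps * (y - m)           # progressively flatter fields
    print(f"eps={eps:g}  alpha={alpha_icm(x, K):.3g}  objective={profiled_objective(x, y, K):.2f}")
```

Since $\alpha S = p/2$ at the ICM update, the objective behaves like $-p\log\varepsilon$ plus a bounded data term, which is why the iterations run into the upper bound on $\alpha$.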
4.2.4 Limitations
First, the low efficiency and bias of the results limit the practical usefulness of the above approaches. Examining the posterior distribution we want to maximize, we find that the smoothing parameters tend to diverge to infinity, limited only by the upper bound set in the fmincon function. This is because we treat the smoothing parameters as constants, rather than as random variables that are correlated with the choice of the GP parameters. The cause and impact of this bias are explained in more detail in Chapter 5.
Second, these methods tend to converge to local maxima during optimization. This is because the log-likelihood of the Generalized Pareto distribution, which we adopt for the marginal fitting, is nonlinear and not concave in its parameters, so the maximum is not guaranteed to be unique. As a result, the estimates, which are sensitive to the choice of initial values, may converge to a local maximum, leading to a sub-optimal solution.
Given these shortcomings, we investigated the situation further. Cross-validation tends to behave like the locally fitted model, while MAP and ICM tend to converge to the constant model. What we want is a solution between pure smoothing and pure maximization.
CHAPTER 5: SMOOTHING PARAMETER SELECTION USING
EXPECTATION MAXIMIZATION (EM)
In this chapter, another smoothing parameter selection method, Expectation Maximization (EM), is proposed, motivated by the limitations of the estimation and optimization approaches above. Since the results of cross-validation and iterated conditional modes were not satisfactory, we adopt EM instead. The MRF-GP model, with some modifications, is restated in matrix form for convenience in the subsequent derivations, and some preliminary concepts are introduced to support the EM algorithm. We first introduce the prior model adopted in the algorithm and the distribution functions on which EM is based. The sample size at individual locations is enhanced by a site-averaging approach, and the initial model parameters are learned from the data by bootstrapping. The implementation of the EM algorithm is then described in detail, and a discussion of convergence and model robustness concludes the chapter.
5.1 Model in Matrix Form
5.1.1 The Prior Model: Gauss-Markov Random Field
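Our prior model is still the thin membrane model introduced in Chapter 2:
$$p(x \mid \alpha) \propto \exp\Big\{-\alpha \sum_{i \in V} \sum_{j \in N(i)} (x_i - x_j)^2\Big\}$$
Rewritten in vector form, we have:
$$p(x \mid \alpha) \propto \exp\left(-\frac{\alpha}{2}\, x^{\mathsf T} K x\right) \tag{25}$$
where $\alpha$ is the smoothing parameter controlling the penalty strength, and $K$ is the adjacency matrix, which has the form of equation (10). Since $x^{\mathsf T} K x = \frac{1}{2}\sum_{i \in V}\sum_{j \in N(i)} (x_i - x_j)^2$, the two forms differ only by a constant factor, which is absorbed into $\alpha$.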
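$$K = \begin{bmatrix} d_1 & k_{12} & \cdots & k_{1p} \\ k_{21} & d_2 & \cdots & k_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ k_{p1} & k_{p2} & \cdots & d_p \end{bmatrix}$$
The diagonal coefficient $d_n$ is the number of sites adjacent to site $n$, $n \in [1, p]$ ($p = 78$ in our case). The off-diagonal element $k_{nm}$ is zero if no edge connects sites $n$ and $m$, and minus one if such an edge exists. This matrix is sparse. Because each row sums to zero, $K$ is singular (the constant vector lies in its null space), so the prior alone is improper; the posterior matrix $J$ introduced in Section 5.1.3 is nevertheless invertible. $K$ implements the thin membrane model and characterizes the spatial structure of the multi-site data, while the smoothing parameter $\alpha$ controls the strength of the dependence. The prior describes the initial state of the system of interest, characterized by the Markov property, before any observations are made.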
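The structure of $K$ is straightforward to build and check. The Python sketch below assumes, purely for illustration, that the 78 sites form a 6 × 13 rectangular grid with 4-neighbour adjacency (the actual grid of Chapter 3 may differ). It confirms the properties stated above: diagonal entries equal to the number of neighbours, −1 for each edge, rows summing to zero (so $K$ has rank $p - 1$ for a connected grid), and $x^{\mathsf T} K x$ equal to half of the double sum in the thin membrane model.

```python
import numpy as np


def grid_neighbours(rows, cols):
    """4-neighbour lists N(i) for sites numbered row by row."""
    nbrs = []
    for r in range(rows):
        for c in range(cols):
            nb = [rr * cols + cc
                  for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                  if 0 <= rr < rows and 0 <= cc < cols]
            nbrs.append(nb)
    return nbrs


def adjacency_matrix(nbrs):
    """K[n, n] = number of neighbours of n; K[n, m] = -1 if n and m are adjacent."""
    p = len(nbrs)
    K = np.zeros((p, p))
    for n, nb in enumerate(nbrs):
        K[n, n] = len(nb)
        K[n, nb] = -1.0
    return K


nbrs = grid_neighbours(6, 13)          # hypothetical 78-site layout
K = adjacency_matrix(nbrs)
x = np.random.default_rng(5).normal(size=len(nbrs))
double_sum = sum((x[i] - x[j]) ** 2 for i, nb in enumerate(nbrs) for j in nb)
print(np.linalg.matrix_rank(K), x @ K @ x, 0.5 * double_sum)
```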
Based on the MRF-GP model, our prior distribution, which belongs to the exponential family, can be expressed in vector form as:
$$p(x \mid \alpha) = \frac{\sqrt{\det(\alpha K)}}{\sqrt{(2\pi)^p}} \exp\left(-\frac{1}{2}\, x^{\mathsf T} \alpha K x\right) \tag{26}$$
5.1.2 Conditional Distribution
This part of the model characterizes the behavior of the observed variable given the underlying field. According to the MRF-GP model, the conditional distribution is:
$$p(y \mid x) = \frac{1}{(\sqrt{2\pi})^p \sqrt{\det R}} \exp\left(-\frac{1}{2}(y - x)^{\mathsf T} R^{-1} (y - x)\right) \tag{27}$$
where $R$ is a diagonal (hence sparse) matrix whose diagonal elements are the variances at each site:
$$R = \begin{bmatrix} \tau_1^2 & & \\ & \ddots & \\ & & \tau_p^2 \end{bmatrix} \tag{28}$$
In other words, the observation is the underlying random variable $x$ corrupted by Gaussian noise, whose variance is determined by bootstrapping: it is learned from the data through quantile regression and varies systematically in space. This distribution clearly belongs to the exponential family.
5.1.3 The Posterior Distribution
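According to our MRF-GP model, the posterior distribution that we want to maximize is governed by:
$$p(x, \alpha \mid y) = \frac{p(y \mid x)\, p(x \mid \alpha)\, p(\alpha)}{p(y)} \propto \prod_{i \in V} \frac{1}{\sqrt{2\pi\tau_i^2}} \exp\left(-\frac{(y_i - x_i)^2}{2\tau_i^2}\right) \exp\Big\{-\alpha \sum_{i \in V} \sum_{j \in N(i)} (x_i - x_j)^2\Big\}\, p(\alpha)$$
In vector form,
$$p(x \mid y, \alpha) \propto \exp\left(-\frac{1}{2} x^{\mathsf T} \alpha K x\right) \exp\left(-\frac{1}{2}(y - Cx)^{\mathsf T} R^{-1} (y - Cx)\right) \propto \exp\left(-\frac{1}{2} x^{\mathsf T} \big(C^{\mathsf T} R^{-1} C + \alpha K\big) x + \big(C^{\mathsf T} R^{-1} y\big)^{\mathsf T} x\right) \tag{29}$$
We define $J = C^{\mathsf T} R^{-1} C + \alpha K$ and $h = C^{\mathsf T} R^{-1} y$. $C$ is called the selection matrix; it selects the sites at which observations are available. When no data are missing, $C$ is the identity matrix with the same dimension as the underlying field. $J$ is regarded as the posterior adjacency (information) matrix. Since $C^{\mathsf T} R^{-1} C$ is a diagonal matrix, adding the conditional term preserves the sparsity structure of the original system [15], which is desirable. With $C$ equal to the identity matrix, the posterior distribution simplifies to:
$$p(x \mid y, \alpha) \propto \exp\left(-\frac{1}{2} x^{\mathsf T} J x + h^{\mathsf T} x\right), \qquad J = R^{-1} + \alpha K, \quad h = R^{-1} y \tag{30}$$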
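The role of the selection matrix $C$ can be made concrete with a short Python sketch. It is illustrative only: the chain neighbourhood, the set of observed sites, the noise variances $\tau_i^2$, and $\alpha$ are all hypothetical. Building $J = C^{\mathsf T} R^{-1} C + \alpha K$ and $h = C^{\mathsf T} R^{-1} y$ and solving $J\hat{x} = h$ gives the posterior mean at every site, including interpolated values at unobserved sites; with no missing data, $C$ is the identity and $J$ reduces to $R^{-1} + \alpha K$.

```python
import numpy as np


def chain_laplacian(p):
    """Graph Laplacian of a path graph with p nodes (hypothetical neighbourhood)."""
    K = 2.0 * np.eye(p) - np.eye(p, k=1) - np.eye(p, k=-1)
    K[0, 0] = K[-1, -1] = 1.0
    return K


def selection_matrix(observed, p):
    """C: one row per observed site, with a 1 in that site's column."""
    C = np.zeros((len(observed), p))
    C[np.arange(len(observed)), observed] = 1.0
    return C


def posterior_information(y, observed, tau2, K, alpha):
    """J = C^T R^{-1} C + alpha K and h = C^T R^{-1} y, with R = diag(tau2) for observed sites."""
    C = selection_matrix(observed, K.shape[0])
    R_inv = np.diag(1.0 / tau2)
    return C.T @ R_inv @ C + alpha * K, C.T @ R_inv @ y


p, alpha = 10, 1.0
K = chain_laplacian(p)
observed = np.array([0, 1, 2, 4, 5, 7, 9])         # sites 3, 6 and 8 are missing
y = np.array([1.0, 1.2, 1.1, 1.5, 1.6, 1.4, 1.3])  # hypothetical observations
tau2 = np.full(len(observed), 0.2)

J, h = posterior_information(y, observed, tau2, K, alpha)
x_hat = np.linalg.solve(J, h)                      # posterior mean at all 10 sites
print(np.round(x_hat, 3))
```

At a missing site the corresponding row of $J$ contains only the penalty, so the estimate there is the average of its neighbours.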
5.1.4 Measurement Bootstrapping
In this section, we present a bootstrapping method to infer the uncertainties of the extreme value model parameters (the GP parameters in our case) and of the thresholds directly, given n observations at each location. Bootstrapping is a standard approach in statistical inference: it measures parameter uncertainty by re-sampling the original data sample at random [16].
[15] Myung Jin Choi and Alan S. Willsky (2007). Multiscale Gaussian Graphical Models and Algorithms for Large-Scale Inference. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 77 Massachusetts Ave., Cambridge, MA 02139, USA.
[16] Philip Jonathan and Kevin Ewans (2008). Uncertainties in extreme wave height estimates for hurricane-dominated regions. OMAE-06-1067.
[17] Philip Jonathan, Kevin Ewans and George Forristall (2008). Statistical estimation of extreme ocean environments: the requirement for modeling directionality and other covariate effects.
Some works assume that the Gaussian noise in the Markov random field is uniform throughout the region of interest [17], with zero mean and a constant variance independent of space or other covariates. In this work, we propose that the noise variance is not uniform; rather, it varies systematically with the covariates and can be learned from the data by bootstrapping. The estimated variances of the threshold and GP parameters are then used in the EM inference step. The 95% confidence intervals of the GP parameters are also inferred, and the results are discussed in Chapter 6. The general procedure for estimating the parameter variances and uncertainties is as follows:
1. Estimate the GP parameters $(\gamma, \sigma)$ and the threshold $u$ from the whole original data sample; these serve as the initial values.
2. Generate m sub-samples for each site, $\{y_i^{(1)}, \dots, y_i^{(m)}\}$, by re-sampling the n observations at that site at random with replacement (assuming no missing data). Here $y_i$ denotes the original observations at site $i$, $i = 1, \dots, p$, with $p$ determined by the grid structure, and $Y = \{y_i^{(j)}\}$ ($i = 1, \dots, p$; $j = 1, \dots, m$) is the collection of all sub-samples.
3. For each site, estimate the GP parameters $(\gamma, \sigma)$ and the threshold level (keeping the same quantile) for each of the m sub-samples, and record the vector $\theta_i = \{\theta_i^{(1)}, \dots, \theta_i^{(m)}\}$, where $\theta_i^{(j)} = (\gamma_i^{(j)}, \sigma_i^{(j)}, u_i^{(j)})$ is the set of parameters for sub-sample $j$. Typically, m is of the order of 1000; we use m = 3000.
4. Obtain the variance estimates for $\gamma$, $\sigma$ and $u$, respectively, as the sample variances of the entries of $\theta_i$.
5. To estimate the 95% confidence interval of each model parameter, find the critical values $(\theta_{0.025}, \theta_{0.975})$, which are the 2.5% and 97.5% quantiles of the bootstrap parameter vectors, respectively.
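Steps 2 to 5 can be sketched in Python as follows. This is an illustration under assumptions, not the thesis code: the data at each site are hypothetical Gumbel-distributed wave heights, only the threshold (here the 90% quantile, an arbitrary choice) is bootstrapped, and the estimator argument could equally be a GP fitting routine returning $(\gamma, \sigma)$. The per-site bootstrap variances form the diagonal of $R$ used in the EM step.

```python
import numpy as np


def bootstrap(sample, estimator, m, rng):
    """Re-sample with replacement m times; return replicates, variance and 95% interval."""
    n = len(sample)
    reps = np.array([estimator(sample[rng.integers(0, n, size=n)]) for _ in range(m)])
    lo, hi = np.quantile(reps, [0.025, 0.975], axis=0)   # percentile interval (step 5)
    return reps, reps.var(axis=0, ddof=1), (lo, hi)      # variance (step 4)


def threshold(x):
    """Same quantile at every site (the 0.9 level is a hypothetical choice)."""
    return np.quantile(x, 0.9)


rng = np.random.default_rng(6)
sites = [rng.gumbel(loc=2.0 + 0.1 * i, scale=0.5, size=1250) for i in range(3)]  # hypothetical

R_diag, intervals = [], []
for y_i in sites:
    reps, var, ci = bootstrap(y_i, threshold, m=3000, rng=rng)
    R_diag.append(var)
    intervals.append(ci)
R = np.diag(R_diag)                                      # noise covariance for the EM step
print(R_diag, intervals)
```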
5.1.5 Threshold Smoothing
The observation vector of thresholds is obtained by computing the same quantile at every site. The noise covariance matrix is assumed to be diagonal, with the variance at each site estimated by the bootstrap approach introduced by Jonathan [18] (see Section 5.1.4 for details). We select the threshold smoothing parameter $\alpha_v$ by trial and error, in order to measure the effect of the roughness of the threshold surface on the GP parameter surface.
[18] P. Jonathan and K. Ewans (2007). Uncertainties in extreme wave height estimates for hurricane-dominated regions. Journal of Offshore Mechanics and Arctic Engineering, vol. 129/1, August 2007.
For each selected smoothing parameter $\alpha_v$, we update the estimate of the threshold based on its posterior distribution, using the initial value obtained from quantile regression and the variances inferred from the bootstrap. This estimator, $\hat{v}$, is defined as the value that maximizes the posterior distribution $p(v \mid u)$ within the Expectation Maximization algorithm (see Section 5.3 for details):
$$p(v \mid u) \propto \exp\left(-\frac{1}{2} v^{\mathsf T} \big(R^{-1} + \alpha_v K\big) v + u^{\mathsf T} R^{-1} v\right) \tag{31}$$
We therefore apply the MAP method to equation (31) to find the optimal estimate of $v$. In practice, we set its derivative to zero:
$$-\big(R^{-1} + \alpha_v K\big) v + R^{-1} u = 0$$
Hence we obtain:
$$\hat{v} = \big(R^{-1} + \alpha_v K\big)^{-1} R^{-1} u \tag{32}$$
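Equation (32) is a sparse linear solve, so it scales well to larger grids. The Python sketch below (illustrative, not the thesis implementation) builds $K$ for a hypothetical 6 × 13 grid with SciPy sparse matrices, uses hypothetical site-specific bootstrap variances for $R$, and solves $(R^{-1} + \alpha_v K)\hat{v} = R^{-1}u$ rather than forming the inverse explicitly.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def grid_laplacian_sparse(rows, cols):
    """Sparse graph Laplacian K of a rows x cols grid (4-neighbour adjacency)."""
    idx = np.arange(rows * cols).reshape(rows, cols)
    src = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])  # left/upper end of each edge
    dst = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])    # right/lower end
    p = rows * cols
    A = sp.coo_matrix((np.ones(len(src)), (src, dst)), shape=(p, p))
    A = (A + A.T).tocsr()                                             # symmetric adjacency
    degree = np.asarray(A.sum(axis=1)).ravel()
    return (sp.diags(degree) - A).tocsc()


def smooth_threshold(u, var, K, alpha):
    """Equation (32): v_hat = (R^{-1} + alpha K)^{-1} R^{-1} u with R = diag(var)."""
    R_inv = sp.diags(1.0 / var)
    return spsolve((R_inv + alpha * K).tocsc(), R_inv @ u)


rng = np.random.default_rng(7)
K = grid_laplacian_sparse(6, 13)                   # hypothetical 78-site grid
u = 3.0 + rng.normal(0.0, 0.3, size=78)            # hypothetical observed thresholds
var = rng.uniform(0.01, 0.1, size=78)              # hypothetical bootstrap variances
v_hat = smooth_threshold(u, var, K, alpha=2.0)
print(np.round(v_hat[:5], 3))
```

Sites with a large bootstrap variance are pulled more strongly toward their neighbours, while precisely estimated sites stay close to their observed thresholds.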
5.1.6 GP Parameter Smoothing
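Based on the smoothed threshold, we estimate the observed values of the GP parameters $\gamma$ and $\sigma$ by maximum likelihood. The smoothed GP parameters are then estimated by the same procedure as in the previous section:
$$\hat{\gamma} = \big(R_\gamma^{-1} + \alpha_\gamma K\big)^{-1} R_\gamma^{-1} \tilde{\gamma} \tag{33}$$
and
$$\hat{\sigma} = \big(R_\sigma^{-1} + \alpha_\sigma K\big)^{-1} R_\sigma^{-1} \tilde{\sigma} \tag{34}$$
where $\tilde{\gamma}$ and $\tilde{\sigma}$ are the locally fitted (observed) GP parameters and $R_\gamma$, $R_\sigma$ are their bootstrap variance matrices. However, unlike $\alpha_v$, the smoothing parameters $\alpha_\gamma$ and $\alpha_\sigma$ are learned from the data using the Expectation Maximization algorithm introduced in Section 5.3.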
5.2 The Exponential Family
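Since the prior and conditional distributions are given by
$$p(x \mid \alpha) = \frac{\sqrt{\det(\alpha K)}}{\sqrt{(2\pi)^p}} \exp\left(-\frac{1}{2} x^{\mathsf T} \alpha K x\right)$$
and
$$p(y \mid x) = \frac{1}{(\sqrt{2\pi})^p \sqrt{\det R}} \exp\left(-\frac{1}{2}(y - x)^{\mathsf T} R^{-1} (y - x)\right),$$
we can derive the joint distribution of $x$ and $y$ as
$$p(x, y \mid \alpha) = p(x \mid \alpha)\, p(y \mid x) = \exp\Big(-\frac{1}{2} x^{\mathsf T} \big(\alpha K + R^{-1}\big) x + y^{\mathsf T} R^{-1} x - \frac{1}{2} y^{\mathsf T} R^{-1} y + \frac{1}{2}\log\det(\alpha K) - \frac{1}{2}\log\det R - p\log(2\pi)\Big) \tag{35}$$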
The joint distribution therefore has exponential-family form. Note that we will use the convexity properties of the exponential family in the subsequent parameter selection process.
5.3 Expectation Maximization Algorithm
5.3.1 Introduction to Expectation Maximization
The EM algorithm is an iterative method for estimating parameters in latent variable models. It is a general-purpose algorithm for finding maximum likelihood estimates with good convergence properties: each iteration is guaranteed not to decrease the likelihood, and the convexity properties of the exponential family make the iterations particularly well behaved.
Let $y = (y_1, \dots, y_p)$, with $p = 78$, be the observed data, $x = (x_1, \dots, x_p)$ the hidden random variables, and $\theta$ the set of model parameters. We want to find the parameter set that makes the current observations most probable. The log-likelihood function to maximize is therefore:
$$l(\theta) = \log p(y \mid \theta) = \log \int p(x, y \mid \theta)\, dx \tag{36}$$
where
$$p(y \mid \theta) = \int p(y \mid x, \theta)\, p(x \mid \theta)\, dx = \int p(x, y \mid \theta)\, dx \tag{37}$$
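5.3.2 Jensen's Inequality [19]
Every convex function satisfies Jensen's inequality, which states that for $\lambda \in [0, 1]$ and any convex function $f(x)$:
$$\lambda f(x_1) + (1 - \lambda) f(x_2) \ge f\big(\lambda x_1 + (1 - \lambda) x_2\big) \tag{38}$$
The result is not limited to two points. Given non-negative weights $\lambda_1, \dots, \lambda_n$ with $\lambda_1 + \dots + \lambda_n = 1$, it generalizes to:
$$\lambda_1 f(x_1) + \lambda_2 f(x_2) + \dots + \lambda_n f(x_n) \ge f(\lambda_1 x_1 + \lambda_2 x_2 + \dots + \lambda_n x_n)$$
Interpreting the weights as probabilities, one readily derives:
$$E[f(X)] \ge f(E[X]) \tag{39}$$
For a concave function such as the logarithm, the inequality is reversed: $E[\log X] \le \log E[X]$. This conclusion is used in the derivation of the EM algorithm in the next section.
[19] Kin Y. Li (2000). Mathematical Excalibur, Volume 5, Number 4.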
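Both directions of Jensen's inequality can be checked numerically on any sample. In the short Python sketch below, the gamma-distributed sample is arbitrary; because the empirical distribution is itself a probability distribution, the inequalities hold exactly for the sample averages, not just approximately.

```python
import numpy as np

rng = np.random.default_rng(8)
z = rng.gamma(shape=2.0, scale=1.5, size=10_000)   # any positive random variable

# Convex f(z) = z**2: E[f(Z)] >= f(E[Z])
convex_lhs, convex_rhs = np.mean(z**2), np.mean(z) ** 2
# Concave log (the case used by EM): E[log Z] <= log E[Z]
concave_lhs, concave_rhs = np.mean(np.log(z)), np.log(np.mean(z))
print(convex_lhs >= convex_rhs, concave_lhs <= concave_rhs)   # True True
```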
5.3.3 The EM Algorithm
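Since the logarithm is concave, applying Jensen's inequality (39) in its reversed form gives a lower bound on the log-likelihood for any distribution $q(x)$ over the hidden variables:
$$l(\theta) = \log p(y \mid \theta) = \log \int q(x)\, \frac{p(x, y \mid \theta)}{q(x)}\, dx \ge \int q(x) \log \frac{p(x, y \mid \theta)}{q(x)}\, dx \equiv F(q, \theta)$$
Choosing $q(x) = p(x \mid y, \theta^{(t)})$, the bound becomes
$$F(q, \theta) = \int p\big(x \mid y, \theta^{(t)}\big) \log p(x, y \mid \theta)\, dx - \int p\big(x \mid y, \theta^{(t)}\big) \log p\big(x \mid y, \theta^{(t)}\big)\, dx \tag{40}$$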
Since the lower bound never exceeds the log-likelihood, the likelihood can be increased by maximizing the lower bound, and the choice of $q$ that makes the bound tight is the posterior distribution $p(x \mid y, \theta)$. To verify this, we substitute $q(x) = p(x \mid y, \theta)$ and find that equality holds, as expected:
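$$F(q, \theta) = \int p(x \mid y, \theta) \log \frac{p(x, y \mid \theta)}{p(x \mid y, \theta)}\, dx = \int p(x \mid y, \theta) \log p(y \mid \theta)\, dx = \log p(y \mid \theta) \int p(x \mid y, \theta)\, dx = l(\theta)$$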
Here $l(\theta)$ is the log-likelihood one hopes to maximize and $F(q, \theta)$ is the lower bound, or EM objective function, given by Jensen's inequality. Therefore, as one maximizes the objective function, the log-likelihood is also driven upward. In summary, each iteration consists of two standard steps:
In the Expectation step, one updates the distribution $q$ conditional on the observations while keeping $\theta$ fixed:
$$q^{(t+1)} = \arg\max_q F\big(q, \theta^{(t)}\big) = p\big(x \mid y, \theta^{(t)}\big)$$
In the Maximization step, one fixes $q$ and maximizes the lower bound:
$$\theta^{(t+1)} = \arg\max_\theta F\big(q^{(t+1)}, \theta\big)$$
5.3.4 Expectation Maximization in MRF-GP
The Expectation Step
In the E-step, we fix the smoothing parameters and update the estimate of the latent variables based on the posterior distribution under the previous parameter values, choosing $q^{(t+1)}(x) = p(x \mid y, \theta^{(t)})$ and estimating $\hat{x} = E[x \mid y]$. The resulting estimator is the value that maximizes the posterior distribution $p(x \mid y)$ (for a Gaussian posterior, the mean and the mode coincide). We therefore apply the MAP method to $p(x \mid y)$ to find the optimal estimate of $x$. In practice, we set its derivative to zero:
$$\frac{\partial \log p(x \mid y)}{\partial x} = 0$$
The Maximization Step
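In this step, we maximize the lower bound $F(q, \theta)$ defined in (40):
$$F\big(q^{(t+1)}, \theta\big) = \int q^{(t+1)}(x) \log \frac{p(x, y \mid \theta)}{q^{(t+1)}(x)}\, dx = \int p\big(x \mid y, \theta^{(t)}\big) \log p(x, y \mid \theta)\, dx - \int p\big(x \mid y, \theta^{(t)}\big) \log p\big(x \mid y, \theta^{(t)}\big)\, dx = \int p\big(x \mid y, \theta^{(t)}\big) \log p(x, y \mid \theta)\, dx + H\big(q^{(t+1)}\big) \tag{41}$$
The second term is the entropy (uncertainty) $H$ of $q^{(t+1)}$ and does not depend on $\theta$. We therefore define $Q(\theta, \theta^{(t)})$ as the first term of the above equation and estimate the $\theta$ that maximizes
$$Q\big(\theta, \theta^{(t)}\big) = E_{x \mid y, \theta^{(t)}}\big[\log p(x, y \mid \theta)\big] \tag{42}$$
These steps are repeated until convergence, which is supported by the monotone behavior of EM and the convexity properties of the exponential family.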
5.4 Model Implementation by EM
In our model, two smoothing parameters need to be inferred:
$$\theta = \{\alpha_\gamma, \alpha_\sigma\} \tag{43}$$
Our objective is to estimate the smoothing parameters that best explain the observed data. In other words, given the observation $y$, we seek the parameters $\theta$ that maximize the log-likelihood:
$$l(\theta) = \log p(y \mid \theta) = \log \int p(x, y \mid \theta)\, dx = \log \int q(x)\, \frac{p(x, y \mid \theta)}{q(x)}\, dx$$
5.4.1 The Expectation Step
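In the E-step, we choose $q^{(t+1)}(x) = p(x \mid y, \theta^{(t)})$ and estimate $\hat{x} = E[x \mid y]$, obtaining:
$$\hat{x} = E[x \mid y] = \int x\, p\big(x \mid y, \theta^{(t)}\big)\, dx = J^{-1} h \tag{44}$$
where $\hat{x}$ is the updated estimate, $y$ is the observed value, $J = R^{-1} + \alpha K$, and $h = R^{-1} y$. The error covariance matrix $P$, the inverse of $J$, measures the uncertainty of the updated estimate:
$$P = E\big[(x - \hat{x})(x - \hat{x})^{\mathsf T} \mid y\big] = J^{-1} \tag{45}$$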
Note that this process involves a matrix inverse, which may be problematic or intractable for large sets of variables. The expression for $J$ is $R^{-1} + \alpha K$. $R^{-1}$ is the information matrix of the observations in diagonal form and is therefore always invertible. Because $\alpha K$ is positive semidefinite, $J = R^{-1} + \alpha K$ is positive definite, and hence invertible, for any $\alpha \ge 0$; however, it becomes badly conditioned as $\alpha$ keeps growing. To handle this case, we take $\alpha$ to be infinite, since its value in fact increases exponentially during the iterations. We then only need to choose the $x$ that maximizes $-(y - x)^{\mathsf T} R^{-1} (y - x)$, subject to the constraint that all the components of $x$ are equal.
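The constrained problem in the $\alpha \to \infty$ limit has a closed-form answer: with $R = \mathrm{diag}(\tau_i^2)$, maximizing $-(y - x)^{\mathsf T} R^{-1} (y - x)$ over $x = c\,\mathbf{1}$ gives the precision-weighted mean $c = \sum_i (y_i/\tau_i^2) / \sum_i (1/\tau_i^2)$. The Python sketch below (illustrative; the chain neighbourhood, data, and variances are hypothetical) compares this with the solution of $(R^{-1} + \alpha K)x = R^{-1}y$ at a very large $\alpha$.

```python
import numpy as np


def chain_laplacian(p):
    """Graph Laplacian of a path graph with p nodes (hypothetical neighbourhood)."""
    K = 2.0 * np.eye(p) - np.eye(p, k=1) - np.eye(p, k=-1)
    K[0, 0] = K[-1, -1] = 1.0
    return K


def flat_limit(y, tau2):
    """argmax of -(y - x)^T R^{-1} (y - x) subject to x_1 = ... = x_p, R = diag(tau2)."""
    w = 1.0 / tau2
    return np.full_like(y, np.sum(w * y) / np.sum(w))


rng = np.random.default_rng(9)
p = 20
y = rng.normal(1.0, 0.5, size=p)            # hypothetical observations
tau2 = rng.uniform(0.1, 1.0, size=p)        # hypothetical noise variances
K = chain_laplacian(p)

x_big_alpha = np.linalg.solve(np.diag(1.0 / tau2) + 1e8 * K, y / tau2)
print(np.abs(x_big_alpha - flat_limit(y, tau2)).max())   # small: the two agree
```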
5.4.2 The Maximization Step
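The Maximization step finds the next parameter that maximizes $F(q^{(t+1)}, \alpha)$, given by:
$$F\big(q^{(t+1)}, \alpha\big) = \int q^{(t+1)}(x) \log \frac{p(x, y \mid \alpha)}{q^{(t+1)}(x)}\, dx$$
As discussed in the previous section, this is equivalent to maximizing the Q-function:
$$Q\big(\alpha, \alpha^{(t)}\big) = E_{x \mid y, \alpha^{(t)}}\big[\log p(x, y \mid \alpha)\big] = \frac{p}{2}\log\alpha - \frac{\alpha}{2} E_{x \mid y, \alpha^{(t)}}\big[x^{\mathsf T} K x\big] + \text{const} \tag{46}$$
where
$$E_{x \mid y, \alpha^{(t)}}\big[x^{\mathsf T} K x\big] = \operatorname{tr}\Big(K\big(P + \hat{x}\hat{x}^{\mathsf T}\big)\Big) = \hat{x}^{\mathsf T} K \hat{x} + \operatorname{tr}(KP) \tag{47}$$
To maximize the Q-function, we set its derivative with respect to $\alpha$ to 0:
$$\frac{\partial Q}{\partial \alpha} = \frac{p}{2\alpha} - \frac{1}{2}\operatorname{tr}\Big(K\big(P + \hat{x}\hat{x}^{\mathsf T}\big)\Big) = 0$$
and therefore
$$\alpha^{(t+1)} = \frac{p}{\operatorname{tr}\Big(K\big(P + \hat{x}\hat{x}^{\mathsf T}\big)\Big)} \tag{48}$$
where $p$ is the number of sites, which is 78 in our case based on the grid structure designed in Chapter 3.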
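Putting (44), (45) and (48) together gives a compact EM loop for a single smoothing parameter. The Python sketch below is an illustration under assumptions rather than the thesis code: the chain neighbourhood, data, and noise variances are hypothetical, and `dof` defaults to $p$ as in (48) (with the singular Laplacian $K$, its rank $p - 1$ is an alternative choice). It also evaluates $\log p(y \mid \alpha)$ up to a constant, $\frac{\mathrm{dof}}{2}\log\alpha - \frac{1}{2}\log\det J + \frac{1}{2}h^{\mathsf T}J^{-1}h$, to confirm the monotone-likelihood property of EM.

```python
import numpy as np


def chain_laplacian(p):
    """Graph Laplacian of a path graph with p nodes (hypothetical neighbourhood)."""
    K = 2.0 * np.eye(p) - np.eye(p, k=1) - np.eye(p, k=-1)
    K[0, 0] = K[-1, -1] = 1.0
    return K


def log_marginal(alpha, y, tau2, K, dof):
    """log p(y | alpha) up to an alpha-independent constant."""
    J = np.diag(1.0 / tau2) + alpha * K
    h = y / tau2
    _, logdet = np.linalg.slogdet(J)
    return 0.5 * dof * np.log(alpha) - 0.5 * logdet + 0.5 * h @ np.linalg.solve(J, h)


def em_alpha(y, tau2, K, alpha=1.0, n_iter=30, dof=None):
    dof = len(y) if dof is None else dof
    h = y / tau2                                   # h = R^{-1} y
    history = []
    for _ in range(n_iter):
        history.append(log_marginal(alpha, y, tau2, K, dof))
        J = np.diag(1.0 / tau2) + alpha * K        # E-step: posterior information
        P = np.linalg.inv(J)                       # error covariance, equation (45)
        x_hat = P @ h                              # posterior mean, equation (44)
        alpha = dof / (x_hat @ K @ x_hat + np.trace(K @ P))   # M-step, equation (48)
    return alpha, x_hat, history


rng = np.random.default_rng(10)
p = 78
K = chain_laplacian(p)
y = np.sin(np.linspace(0.0, 3.0, p)) + rng.normal(0.0, 0.3, size=p)  # hypothetical
tau2 = np.full(p, 0.09)
alpha_hat, x_hat, history = em_alpha(y, tau2, K)
print(alpha_hat, history[-1] - history[0])
```

Explicit inversion is acceptable at this size; for larger grids one would use sparse factorizations and estimate the trace term separately.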
5.5 Discussions
In our model, the GP parameters are estimated with this algorithm. At each iteration, the difference between the updated and the previous likelihood is guaranteed to be non-negative. Therefore, since the distribution to be maximized is Gaussian (log-concave) in the latent variables, convergence is assured and the risk of settling at a local maximum is greatly reduced. In addition, this approach is more reliable than the other implementations: with the ICM method, the estimated smoothing parameter is driven to infinity, which is unrealistic. This can be seen by examining the likelihood we attempted to maximize:
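$$\log p(\gamma, \sigma, \alpha_\gamma, \alpha_\sigma \mid y) = -\alpha_\gamma \sum_{i \in V} \sum_{j \in N(i)} (\gamma_i - \gamma_j)^2 + \frac{p}{2}\log\alpha_\gamma - \alpha_\sigma \sum_{i \in V} \sum_{j \in N(i)} (\sigma_i - \sigma_j)^2 + \frac{p}{2}\log\alpha_\sigma + \sum_{i \in V} \sum_{k=1}^{n_i} \left\{-\log\sigma_i - \left(\frac{1}{\gamma_i} + 1\right)\log\left[1 + \gamma_i\,\frac{y_{ik} - u_i}{\sigma_i}\right]\right\} + \text{const}$$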
The ICM algorithm assumes that the smoothing parameters $\alpha_\gamma$ and $\alpha_\sigma$ are constants independent of the GP parameters. Under this assumption, we are free to choose any combination of $\alpha_\gamma$, $\alpha_\sigma$, $\gamma$ and $\sigma$ to maximize the objective, ignoring a dependence structure that should not be ignored. One natural choice is then to drive $\alpha_\gamma$ and $\alpha_\sigma$ to infinity while driving $\sum_{i \in V}\sum_{j \in N(i)} (\gamma_i - \gamma_j)^2$ and $\sum_{i \in V}\sum_{j \in N(i)} (\sigma_i - \sigma_j)^2$ to zero, which achieves the maximization. However, all the GP parameters across the different sites are then forced to be equal, and the parameter surface is flat. This is unrealistic and misleading.
In fact, the smoothing parameters are not constants. They are random variables that are interrelated with the choice of GP parameters, and treating them independently leads to the consequence above. The EM algorithm takes this dependence into account by treating the smoothing parameters as random variables, as reflected in the formula:
$$\hat{x}^{(t+1)} = E\big[x \mid y, \alpha^{(t)}\big]$$
Here $\hat{x}^{(t+1)}$ is the next estimate of the GP parameters, and it contains the information of $\alpha^{(t)}$. In this way, the relationship between $\alpha$ and the GP parameters is established, and the distribution of the smoothing parameter is properly taken into account.
CHAPTER 6: RESULTS AND DISCUSSIONS
In this chapter, the results of the implemented model are presented and examined. First, the approaches used to initialize the parameters and to generate the data are introduced. Second, the theoretical expectations that the MRF-GP model should satisfy are set out. With these criteria fixed, various sensitivity and uncertainty tests are performed to assess the robustness of the model and to demonstrate its advantages over other frameworks. Specifically, threshold sensitivity and uncertainty tests show that the MRF-GP model outperforms the locally fitted results, with lower sensitivity and uncertainty with respect to the threshold level. Smoothing parameter sensitivity and uncertainty tests are then conducted and confirm the advantages of our model. We repeat these procedures with reduced sample sizes to test the effect of sample size. A summary of the procedures and results is provided at the end of this chapter.
6.1 Parameters Initialization and Data Generation
The artificial data are generated according to the fitted model proposed by P. Northrop and P. Jonathan [20], in which the threshold is quadratic in longitude and latitude while the GP parameters $\gamma$ and $\sigma$ are constant. Here, we select a threshold surface that is quadratic in longitude and latitude, shown in Fig. 6.1, and set the GP parameters to $(\gamma, \sigma) = (-0.3, 4.4)$. Specifically, the threshold surface (Fig. 6.1) is constructed according to the polynomial form:
( ) ( ) ( ) ( ) ( ) ( )
………………... (49)
where $x$ and $y$ are the longitude and latitude, respectively; two of the terms are linear functions of the form $(x - a)$ and $(y - b)$, one is a quadratic function, and the remaining symbols are coefficients with fixed values. Since our simulation study is based on synthetic data only, we assign arbitrary values to these coefficients.
[20] P. Northrop and P. Jonathan. Modeling spatially-dependent non-stationary extremes with application to hurricane-induced wave heights. Department of Statistical Science, University College London.
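A minimal Python sketch of this data-generation step, under explicit assumptions: the 6 × 13 grid, the longitude and latitude ranges, and the quadratic coefficients are hypothetical placeholders (the text only states that the coefficients were assigned arbitrarily), and each site's 1250 values are drawn as the threshold plus GP exceedances with $(\gamma, \sigma) = (-0.3, 4.4)$. SciPy's `genpareto` uses the shape `c`, which matches $\gamma$ in equation (1).

```python
import numpy as np
from scipy.stats import genpareto

rows, cols, n = 6, 13, 1250                       # hypothetical grid; 1250 samples per site
lon, lat = np.meshgrid(np.linspace(0.0, 12.0, cols), np.linspace(0.0, 4.0, rows))

# Hypothetical quadratic threshold surface in longitude and latitude
u = 2.0 + 0.02 * (lon - 6.0) ** 2 + 0.1 * (lat - 2.0) ** 2 + 0.01 * (lon - 6.0) * (lat - 2.0)

gamma, sigma = -0.3, 4.4                          # constant GP parameters
rng = np.random.default_rng(11)
exceedances = genpareto.rvs(gamma, scale=sigma, size=(u.size, n), random_state=rng)
data = u.ravel()[:, None] + exceedances           # one row of samples per site
print(data.shape, u.min(), u.max())
```

With a negative shape parameter the GP has a finite upper endpoint, $u + \sigma/|\gamma|$, so every simulated value lies in a bounded band above the local threshold.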
Fig. 6.1: Threshold surface constructed from the quadratic model
6.2 Theoretical Expectations
In theory, if the model is appropriate and the latent threshold surface is properly chosen with some tolerance for variation, the smoothed GP parameters must satisfy the following criteria to be reliable. For the shape parameter $\gamma$ at each site, the estimates obtained at different threshold values should be almost constant. For the corresponding scale parameter $\sigma$, the estimates should vary linearly with the threshold, with a slope approximately equal to $\gamma$. In addition, a randomly chosen percentile of wave heights should be nearly constant across threshold values; if so, one can confidently say that the model is a good estimate of the underlying distribution.
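These criteria follow from the threshold-stability property of the GP distribution: if the exceedances of $u$ are GP$(\gamma, \sigma_u)$, then the exceedances of a higher threshold $u'$ are GP$(\gamma, \sigma_u + \gamma(u' - u))$, so the shape is unchanged and the scale is linear in the threshold with slope $\gamma$. The Python sketch below checks this identity numerically with SciPy's `genpareto`; the values of $u$, $u'$, $\gamma$ and $\sigma_u$ are arbitrary.

```python
import numpy as np
from scipy.stats import genpareto

gamma, sigma_u, u, u_new = -0.3, 4.4, 2.0, 5.0
sigma_new = sigma_u + gamma * (u_new - u)          # 4.4 - 0.3 * 3 = 3.5

z = np.linspace(0.0, 5.0, 11)                      # exceedances of the new threshold
d = u_new - u
# P(Y > u_new + z | Y > u_new) under the original GP fitted at threshold u
conditional = genpareto.sf(d + z, gamma, scale=sigma_u) / genpareto.sf(d, gamma, scale=sigma_u)
# Survival function of GP(gamma, sigma_new) at z
direct = genpareto.sf(z, gamma, scale=sigma_new)
print(np.max(np.abs(conditional - direct)))        # ~0
```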
To test the performance of our model, we perform the following analysis.
6.3 Threshold Sensitivity and Uncertainty Analysis
For the first simulation study, we generate 1250 samples for each site. Our task here is to assess the quality of the MRF-GP model fit. For a fixed value of α_u, which controls the strength of the smoothing penalty on the threshold in the thin membrane model, we examine how γ and σ vary and compare the results with those of the locally fitted model. To demonstrate the advantage of our method, we measure how well it meets the model construction criteria stated above in practice. The shape parameter γ should remain steady as a function of threshold, while the scale parameter σ should vary linearly with the threshold, with gradient equal to γ.
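The two criteria can be summarised numerically for a single site. The hypothetical helper below (not part of the thesis code) takes GP estimates at several thresholds and returns the spread of the shape estimates, the least-squares slope of the scale estimates against threshold, and the mean shape estimate. Under the criteria above, the spread should be close to zero and the slope close to the mean shape estimate.

```python
def stability_diagnostics(thresholds, gamma_hat, sigma_hat):
    """Return (gamma_range, sigma_slope, mean_gamma) for one site.

    gamma_range: max - min of the shape estimates (ideally ~0).
    sigma_slope: least-squares slope of scale vs threshold (ideally ~mean_gamma).
    """
    n = len(thresholds)
    if n < 2 or len(gamma_hat) != n or len(sigma_hat) != n:
        raise ValueError("need >= 2 thresholds and matching estimate lists")
    mean_u = sum(thresholds) / n
    mean_s = sum(sigma_hat) / n
    sxx = sum((u - mean_u) ** 2 for u in thresholds)
    if sxx == 0.0:
        raise ValueError("thresholds must not all be equal")
    sxy = sum((u - mean_u) * (s - mean_s) for u, s in zip(thresholds, sigma_hat))
    mean_gamma = sum(gamma_hat) / n
    return max(gamma_hat) - min(gamma_hat), sxy / sxx, mean_gamma
```

Applied to the locally fitted and MRF-GP estimates for the same site, a smaller γ range and a slope closer to the mean γ would indicate the better fit in the sense used here.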
To carry out these assessments, we first plot the GP parameter estimates as functions of threshold for the locally fitted and MRF-GP models; the results for site 3 are shown in Fig. 6.3.1 and Fig. 6.3.2, respectively. The other sites behave similarly.
Fig. 6.3.1: GP parameters versus threshold for the locally fitted model. 6.3.1(a): shape parameter; 6.3.1(b): scale parameter (Sample Size = 1250)
[Fig. 6.3.1 plot: (a) shape parameter vs. threshold; (b) scale parameter vs. threshold; each panel shows the estimate value, the 95% uncertainty interval and the expected behavior]
Fig. 6.3.2: GP parameters versus threshold for the MRF-GP model. 6.3.2(a): shape parameter; 6.3.2(b): scale parameter (Sample Size = 1250)
Theoretically, if the underlying distribution strictly follows the GP form, varying the threshold does not affect the shape parameter estimates, while the scale parameter is a linear function of the threshold with gradient equal to the shape parameter. Inspecting the figures above, the locally fitted estimates deviate more strongly from this expected behavior than the MRF-GP estimates. Under this (strong) assumption, we conclude that our model fits the synthetic data well and outperforms the locally fitted model. Additionally, compared with the locally fitted model, the MRF-GP model gives parameter estimates closer to the theoretical values with less variation, as reflected by its narrower 95% uncertainty band.
6.4 Smoothing Sensitivity and Uncertainty Analysis
To investigate the sensitivity of the model to the smoothing parameters, we keep the threshold quantile level fixed while varying the smoothing parameter for the threshold (α_u). The purpose of this analysis is to show that our model is less sensitive to this parameter than the locally fitted model. In this case study, we set the threshold quantile to 0.4. We plot the GP parameters of all sites as functions of the smoothing parameter α_u; the results are shown in Figures 6.4.1 and 6.4.2, respectively.
[Fig. 6.3.2 plot: (a) shape parameter vs. threshold; (b) scale parameter vs. threshold; each panel shows the estimate value, the 95% uncertainty interval and the expected behavior]
Fig. 6.4.1: GP parameters versus α_u for the locally fitted model (Sample Size = 1250)
Fig. 6.4.2: GP parameters versus α_u for the MRF-GP model (Sample Size = 1250)
[Figs. 6.4.1 and 6.4.2 plots: for each model, (a) shape parameter vs. log(u); (b) scale parameter vs. log(u)]
Northrop and Jonathan [5] show that, for a properly chosen threshold smoothing parameter (determined by α_u in our model), a constant model is sufficient to fit both the γ surface and the σ surface. In other words, a properly constructed threshold surface, controlled by α_u, may lead to stiffer GP parameter surfaces. As the two figures above show, our model performs better: over a suitable range of α_u (bounded by 100 in this case), both the γ surface and the σ surface are flat.
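The flattening effect can be reproduced with the closed-form threshold update used in Main.m, Thrh = (α_u·Jp + R⁻¹)⁻¹R⁻¹·Thr0 with R = diag(Noise_Var). The Python sketch below builds the thin-membrane matrix for a small grid and applies this update for increasing α. The grid size, values and noise variances are illustrative, and a dense Gaussian elimination is used only to keep the example dependency-free. Because every row of the thin-membrane matrix sums to zero, the precision-weighted mean of the values is preserved, so as α grows the smoothed surface approaches a constant equal to that weighted mean, i.e. the constant model.

```python
def thin_membrane(rows, cols):
    """Thin-membrane matrix of a 4-neighbour grid (cf. thin_membrain.m)."""
    p = rows * cols
    J = [[0.0] * p for _ in range(p)]
    for i in range(rows):
        for j in range(cols):
            a = i * cols + j
            for ni, nj in ((i + 1, j), (i, j + 1)):  # neighbour below / right
                if ni < rows and nj < cols:
                    b = ni * cols + nj
                    J[a][b] -= 1.0
                    J[b][a] -= 1.0
                    J[a][a] += 1.0
                    J[b][b] += 1.0
    return J


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[k]] for k, row in enumerate(A)]  # augmented copy
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def smooth(values, noise_var, J, alpha):
    """MAP field x = (alpha*J + R^-1)^-1 R^-1 y with R = diag(noise_var)."""
    p = len(values)
    A = [[alpha * J[a][b] + (1.0 / noise_var[a] if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    return solve(A, [values[a] / noise_var[a] for a in range(p)])


J = thin_membrane(2, 3)
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for alpha in (0.0, 1.0, 100.0, 1e6):
    x = smooth(y, [1.0] * 6, J, alpha)
    print(alpha, round(max(x) - min(x), 4))  # spread of the smoothed surface
```

With equal noise variances, α = 0 returns the raw values unchanged, and very large α returns a nearly flat surface at their mean; the useful range lies in between, which is why the choice of α_u matters.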
6.5 The Effect of Sample Size
To illustrate the effect of sample size, we repeat the above analysis with the sample size reduced to 315 per site.
First, we compare the threshold sensitivity of the MRF-GP model with that of the locally fitted model; the GP parameters versus threshold level for the locally fitted and MRF-GP models are plotted in Figures 6.5.1 and 6.5.2, respectively.
Fig. 6.5.1: GP parameters versus threshold for the locally fitted model (Sample Size = 315)
[Fig. 6.5.1 plot: (a) shape parameter vs. threshold; (b) scale parameter vs. threshold; each panel shows the estimate value, the 95% uncertainty interval and the expected behavior]
Fig. 6.5.2: GP parameters versus threshold for the MRF-GP model (Sample Size = 315)
As the figures show, our model still gives better estimates. Its performance is acceptable, with small deviations from the theoretical values, as long as there are enough exceedances (threshold levels below about 8). However, the performance understandably deteriorates at high threshold levels, since the limited number of exceedances is not enough to guarantee consistent estimates. Moreover, because of the reduced sample size, the 95% uncertainty interval is wider than for a sample size of 1250.
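The widening of the uncertainty band with fewer samples is what a bootstrap estimate should show. The Python sketch below computes a percentile bootstrap interval by resampling with replacement, in the spirit of boostrap.m and Thr_boostrap.m in the Appendix (which, however, summarise the bootstrap replicates by their variance rather than by percentiles). The data, number of resamples and seed are illustrative assumptions, and the linear-interpolation quantile used here need not match MATLAB's quantile convention exactly.

```python
import random


def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def bootstrap_interval(data, stat, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(data), resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    return quantile(reps, tail), quantile(reps, 1.0 - tail)


def sample_quantile(q):
    """Statistic returning the empirical q-quantile, e.g. a site threshold."""
    return lambda xs: quantile(sorted(xs), q)


rng = random.Random(42)
waves = [rng.expovariate(1.0) for _ in range(1250)]  # illustrative data only
for n in (1250, 315):
    lo, hi = bootstrap_interval(waves[:n], sample_quantile(0.4))
    print(n, round(hi - lo, 3))  # interval width for each sample size
```

For estimators whose standard error scales like 1/√n, reducing the sample from 1250 to 315 should roughly double the interval width, consistent with the wider bands in Figures 6.5.1 and 6.5.2.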
In addition, we examine the sensitivity to the smoothing parameter α_u; the GP parameters versus α_u for the locally fitted and MRF-GP models are plotted in Figures 6.5.3 and 6.5.4, respectively.
[Fig. 6.5.2 plot: (a) shape parameter vs. threshold; (b) scale parameter vs. threshold; each panel shows the estimate value, the 95% uncertainty interval and the expected behavior]
Fig. 6.5.3: GP parameters versus α_u for the locally fitted model (Sample Size = 315)
Fig. 6.5.4: GP parameters versus α_u for the MRF-GP model (Sample Size = 315)
As the figures above illustrate, our model maintains good performance even with the smaller sample size, and it clearly outperforms the locally fitted model.
[Figs. 6.5.3 and 6.5.4 plots: for each model, (a) shape parameter vs. log(u); (b) scale parameter vs. log(u)]
CHAPTER 7: CONCLUSION AND RECOMMENDATIONS
In this chapter, the objective and conclusions of our research are summarized and my contribution is specified. Some limitations of the research are acknowledged and justified, followed by my future action plans. Finally, my recommendations to future researchers in this field are presented. This chapter concludes the thesis.
7.1 Summary of the Contributions
The modeling of extreme or catastrophic events over a threshold has gained popularity and significance in recent years. Earlier methods fit the extreme data to the Generalized Pareto (GP) distribution locally, site by site. However, the dependence between neighboring sites is evident. In this thesis, the author proposes a nonparametric method that handles the spatial dependence between neighboring sites using Gaussian Markov random fields.
The thesis starts with an introduction to extreme value and catastrophe modeling, followed by a detailed literature review discussing the disadvantages and limitations of existing approaches. Motivated by the severity of catastrophic events, the growing demand for knowledge of extreme values in critical applications, and the drawbacks of existing methods, the author proposes the new MRF-GP approach to model the peak-over-threshold (POT) distribution nonparametrically.
To prepare for the model construction and analysis, the essential background is presented, including the categorization of extreme value models, the concept of thin membrane Gaussian graphical models, and estimation methods. The algorithm of the proposed MRF-GP model is then described at length, and several popular model optimization methods are discussed and tested. The method takes the marginal distribution at each individual site to be a Generalized Pareto distribution, in agreement with previous approaches. It first assumes that the observed values are the underlying latent random variables corrupted by zero-mean Gaussian noise with unknown variance. It then regularizes the estimates with a thin membrane Gaussian graphical model, which captures the spatial dependence characterized by the Markov random field and is controlled by a set of smoothing parameters. However, the initial implementation results were disappointing and biased, leading to various sub-optimal solutions. After careful inspection of the causes of these failures, the author concludes that the issue largely results from the sensitivity and difficulty of selecting the smoothing parameters. Motivated by these findings, the author proposes an alternative parameter selection method based on the expectation-maximization (EM) algorithm.
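The EM update for the smoothing parameter can be sketched as follows. With observations y at the p sites, noise covariance R = diag(noise variances) and a thin-membrane prior with precision αJ, the E-step computes the posterior covariance Σ = (αJ + R⁻¹)⁻¹ and mean x = ΣR⁻¹y, and the M-step sets α = p / (tr(JΣ) + xᵀJx), mirroring the update in EM_Smth.m in the Appendix. The Python version below is an illustrative, dependency-free sketch (dense Gauss-Jordan inversion, small grids only); the tolerance, iteration cap and test data are assumptions, and it omits the constant-field fallback that EM_Smth.m uses when α diverges.

```python
def thin_membrane(rows, cols):
    """Thin-membrane matrix of a 4-neighbour grid (cf. thin_membrain.m)."""
    p = rows * cols
    J = [[0.0] * p for _ in range(p)]
    for i in range(rows):
        for j in range(cols):
            a = i * cols + j
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols:
                    b = ni * cols + nj
                    J[a][b] -= 1.0
                    J[b][a] -= 1.0
                    J[a][a] += 1.0
                    J[b][b] += 1.0
    return J


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A is not modified)."""
    n = len(A)
    M = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


def em_smoothing(values, noise_var, J, tol=1e-4, max_iter=500):
    """EM estimate of the smoothing parameter alpha and the smoothed field."""
    p = len(values)
    alpha = 0.0
    x = list(values)
    for _ in range(max_iter):
        A = [[alpha * J[a][b] + (1.0 / noise_var[a] if a == b else 0.0)
              for b in range(p)] for a in range(p)]
        cov = inverse(A)  # E-step: posterior covariance of the latent field
        x = [sum(cov[a][b] * values[b] / noise_var[b] for b in range(p))
             for a in range(p)]  # E-step: posterior mean
        quad = sum(x[a] * J[a][b] * x[b] for a in range(p) for b in range(p))
        trace = sum(J[a][b] * cov[b][a] for a in range(p) for b in range(p))
        new_alpha = p / (trace + quad)  # M-step, as in EM_Smth.m
        if abs(new_alpha - alpha) < tol:
            return x, new_alpha
        alpha = new_alpha
    return x, alpha  # iteration cap reached (alpha may be diverging)
```

Starting from α = 0 means the first E-step simply returns the raw estimates, after which each M-step adjusts α to the roughness that remains; if the data are consistent with a flat field, α grows without bound, which is the case EM_Smth.m handles by switching to a constant fit.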
After implementing the model with the EM algorithm, the results were recorded and discussed; they match the theoretical expectations well. Specifically, the γ estimates for each site are almost constant regardless of the choice of threshold level. In addition, the σ estimates behave almost linearly with the threshold, with gradient approximately equal to the corresponding γ. Compared with the locally fitted results, the proposed MRF-GP model shows clear superiority in all aspects examined, with smaller deviations and fluctuations from the expected values. Furthermore, the performance remains satisfactory when the sample size is reasonably reduced.
In sum, the author achieved the objective of this research project on extreme event modeling and accomplished the following tasks independently:
Reviewed a large body of existing research on extreme value modeling and analyzed its pros and cons;
Proposed a nonparametric model for catastrophic event prediction and inference based on the designed model construction criteria;
Implemented the model using various approaches in MATLAB, followed by essential case studies and simulation results;
Validated the simulation results to ensure consistency with the theory;
Demonstrated the superiority of the proposed MRF-GP model with supporting analysis;
Discussed its applications and recommendations for future research.
7.2 Recommendations for future works
We plan to apply this method to the real hindcast dataset of the Gulf of Mexico. Since the dataset is temporarily unavailable, we have requested it from Jonathan²¹. We will apply the model to these data as soon as they are available.
In future, the estimation precision can be enhanced in various ways. For instance, we will use the multiscale model suggested by Myung Jin Choi²², which captures long-range dependency by introducing several coarser scales (Figure 7.2).
Figure 7.2: Multiscale Gaussian graphical grid structure in a snapshot (3D visualization from MATLAB)
²¹ P. Northrop, P. Jonathan, “Modeling spatially-dependent non-stationary extremes with application to hurricane-induced wave heights”, Department of Statistical Science, University College London.
²² M. J. Choi, A. S. Willsky, “Multiscale Gaussian graphical model and algorithms for large-scale inference”, Statistical Signal Processing, 2007.
REFERENCES
[1]. Philip Jonathan, Kevin Ewans (2008). Modeling the seasonality of extreme waves in the Gulf of Mexico. Shell Technology Centre Thornton and Shell International Exploration and Production. Proceedings of OMAE 2008, the 27th International Conference on Offshore Mechanics and Arctic Engineering.
[2]. Philip Jonathan, Kevin Ewans (2007). The effect of directionality on extreme wave
design criteria. Shell Research Limited & Shell International Exploration and
Production.
[3]. Balkema, A., and Laurens de Haan (1974). "Residual life time at great age", Annals
of Probability, 2, 792–804.
[4]. Philip Jonathan, Kevin Ewans and George Forristall (2008). Statistical estimation of extreme ocean environments: the requirement for modeling directionality and other covariate effects. Ocean Engineering 35 (2008) 1211-1225.
[5]. Paul Northrop and Philip Jonathan (2010). Modeling spatially-dependent non-
stationary extremes with application to hurricane-induced wave heights.
[6]. Geisser, Seymour (1993). Predictive Inference. New York: Chapman and Hall.
ISBN 0412034719.
[7]. Myung Jin Choi and Alan S. Willsky (2007). Multiscale Gaussian Graphical Models
and Algorithms for large-scale inference. Massachusetts Institute of Technology,
Electrical and Computer Science. 77 Massachusetts Ave., Cambridge, MA 02139,
USA.
[8]. Philip Jonathan, Kevin Ewans (2008). Uncertainties in extreme wave height estimates for hurricane-dominated regions. OMAE-06-1067.
[9]. P. Jonathan, K. Ewans (2007). “Uncertainties in extreme wave height estimates for
hurricane-dominated regions”, Journal of offshore mechanics and arctic engineering,
vol. 129/1, August, 2007.
[10]. Kin Y. Li (2000). Mathematical Excalibur. Volume 5, Number 4.
[11]. P. Northrop, P. Jonathan, “Modeling spatially-dependent non-stationary extremes
with application to hurricane-induced wave heights”, Publisher: Department of
Statistical Science, University College London
[12]. Lian Heng (2011). MAS453-Data mining: Session 4: Frequentist and Bayesian
statistics, lecture notes, Nanyang Technological University, Singapore.
[13]. Caroline Keef, Jonathan Tawn and Cecilia Svensson (2009). Spatial risk
assessment for extreme river flows. Appl. Statist (2009) 58, Part 5, pp.
[14]. Caroline Keef, Jonathan Tawn and Cecilia Svensson (2009). Spatial dependence
in extreme river flows and precipitation for Great Britain. Journal of Hydrology 378
(2009) 240-252.
[15]. Sean Borman (2004). The expectation maximization algorithm: a short tutorial.
[16]. Kevin Ewans and Philip Jonathan (2008). The effect of directionality on Northern
North Sea extreme wave design criteria. Nov 2008, Vol. 130/ 041604-1, journal of
offshore mechanics and arctic engineering.
[17]. Philip Jonathan and Kevin Ewans (2009). A spatio-directional model for extreme waves in the Gulf of Mexico. Proceedings of OMAE 2009, the 28th International Conference on Offshore Mechanics and Arctic Engineering, 31 May-4 June 2009, Honolulu, U.S.A.
APPENDIX: LIST OF CODES
1. Main.m
clear; %clc; %matlabpool 4
%% read data
load synthetic_data;
XDat=PkHs(1:1250,:);
[n,p]=size(XDat);
%% predefine parameters
NEP=0.6; %0.4;%[0:0.01:0.1,0.12:0.02:0.24,0.26:0.04:0.3,0.35:0.05:0.6]; threshold quantile level
N=3000; %number of bootstrap resamples
alpha_u_array=0:25:500; %threshold smoothing parameters to sweep
Grid=reshape(1:78,13,6)';
Jp=thin_membrain(Grid);
Thrh_array=zeros(length(alpha_u_array),78);
Gammah_array=zeros(length(alpha_u_array),78);
Sigmah_array=zeros(length(alpha_u_array),78);
Gamma0_array=zeros(length(alpha_u_array),78);
Sigma0_array=zeros(length(alpha_u_array),78);
alpha_s=zeros(1,length(alpha_u_array));
alpha_g=zeros(1,length(alpha_u_array));
%% site averaging
%XDat=site_average(XDat0,Grid);
X=boostrap(XDat,N);
[Thr0,Noise_Var]=Thr_boostrap(XDat,NEP,X); %initial values and variances (Thr_boostrap takes XDat,NEP,X)
%%
for i=1:length(alpha_u_array)
    alpha_u=alpha_u_array(i);
    %% smooth threshold
    %[Thrh,alpha_u]=EM_Smth(Thr0,Noise_Var,Jp);
    %options=optimset('MaxFunEvals',1e20,'MaxIter',1e20,'TolFun',1e-10,'TolProjCG',1e-10);
    Thrh=(((alpha_u*Jp+diag(Noise_Var.^-1))\diag(Noise_Var.^-1))*Thr0')';
    %Thrh=fminsearch(@(Prm)Smth_Post_func(Prm,Thr0,Noise_Var,alpha_u,Jp),ones(1,78),options);
    Thrh_array(i,:)=Thrh;
    %% smooth GP parameters
    [Gamma0,Sigma0,G_Var,S_Var]=GPPrm_boostrap(XDat,Thrh,N,X);
    %parfor j=1:p
    %[Gamma0_array(i,j),Sigma0_array(i,j)]=X_gpfit(XDat{j},Thrh);
    %end
    Gamma0_array(i,:)=Gamma0;
    Sigma0_array(i,:)=Sigma0;
    [Sigmah,alpha_s(i)]=EM_Smth(Sigma0,S_Var,Jp);
    [Gammah,alpha_g(i)]=EM_Smth(Gamma0,G_Var,Jp);
    Gammah_array(i,:)=Gammah;
    Sigmah_array(i,:)=Sigmah;
end
2. X_GpRnd.m
function y=X_GpRnd(n,Gmm,Sgm,Thr)
%function y=X_GpRnd(n,Gmm,Sgm,Thr);
%
%Philip Jonathan, Statistics & Chemometrics, Thornton
%Kevin Ewans, EP Projects, Rijswijk
%
%ShellX V1.R2.M1 20100912
%
%Generates random numbers from specified Generalised Pareto Distribution
%
%Input
%n   1 x 1 Number of random drawings
%Gmm 1 x 1 GaMMa value
%Sgm 1 x 1 SiGMa value
%Thr 1 x 1 THReshold value
%
%Output
%y   n x 1 array of GP random numbers
%
%History
%20100912 - V1.R2.M1
r=rand(n,1); %n is sample size
if Gmm==0 %manage Gmm=0: exponential limit of the GP
    y=Thr-Sgm*log(r);
else
    y=(Sgm/Gmm)*(r.^(-Gmm)-1)+Thr; %random numbers from GP (inverse CDF)
end
%Normal completion
return;
3. X_gpfit.m
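function [Gamma,Sigma]=X_gpfit(XDat,Thrh)
%Fits a GP distribution to the excesses over the threshold at each site
p=size(XDat,2);
Gamma=zeros(1,p);
Sigma=zeros(1,p);
for j=1:p
    tX=XDat(:,j)-Thrh(j); %excesses over the site threshold
    tX=tX(tX>0);
    tPrm=gpfit(tX); %tPrm(1): shape, tPrm(2): scale
    Gamma(j)=tPrm(1);
    Sigma(j)=tPrm(2);
end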
4. Smth_Post_func.m
function f=Smth_Post_func(Prm,val0,NV,alpha,Jp)
f=(val0-Prm)*diag(NV.^-1)*(val0-Prm)'+Prm*alpha*Jp*Prm';
5. Post_func.m
function f=Post_func(Prm,val0,NV) %,Jp)
f=(val0-Prm*ones(1,78))*diag(NV.^-1)*(val0-Prm*ones(1,78))'; %+Prm(1:end-1)*Prm(end)*Jp*Prm(1:end-1)'-78*log(Prm(end));
6. thin_membrain.m
function Jp=thin_membrain(Grid)
%Thin-membrane precision matrix (graph Laplacian) of a 4-neighbour grid:
%Jp(k,k) = number of neighbours of site k, Jp(k,l) = -1 for neighbours k,l
[m,n]=size(Grid);
Jp=zeros(m*n);
for i=1:m
    for j=1:n
        k=Grid(i,j);
        if i<m %link to the site below
            Jp(k,Grid(i+1,j))=-1; Jp(Grid(i+1,j),k)=-1;
        end
        if j<n %link to the site on the right
            Jp(k,Grid(i,j+1))=-1; Jp(Grid(i,j+1),k)=-1;
        end
    end
end
Jp=Jp+diag(-sum(Jp,2)); %diagonal = degree of each site (2, 3 or 4)
7. boostrap.m
function X = boostrap(XDat,N)
[n,p]=size(XDat);
X=zeros(n,p,N);
for i=1:N %2:N-1
    I=floor(rand(n,1)*n)+1; %random row indices in 1..n (with replacement)
    I(I==n+1)=n;
    X(:,:,i)=XDat(I,:);
    %X(:,:,i+1)=XDat(I(l+1:2*l),:);
end
8. Thr_boostrap.m
function [Thr,Noise_Var] = Thr_boostrap (XDat,NEP,X)
N=size(X,3); p=size(XDat,2);
Qx_bt=zeros(N,p); Thr=quantile(XDat,NEP);
for i=1:N Qx_bt(i,:)=quantile(X(:,:,i),NEP); end
Noise_Var=var(Qx_bt);
9. EM_Var.m
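function [valh,varh]=EM_Var(val0,alpha,Jp)
%EM estimate of the noise level for a fixed smoothing parameter alpha
%(var0/varh act as an inverse variance: they multiply eye(p))
p=size(Jp,1);
var0=1;
while 1
    x=(alpha*Jp+var0*eye(p))\val0'*var0; %posterior mean of the latent field
    varh=p/(sum((val0'-x).^2)+trace(eye(p)/(alpha*Jp+var0*eye(p))));
    if abs(varh-var0)<1e-4
        break;
    else
        var0=varh;
    end
end
valh=x';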
10. EM_Smth.m
function [valh,alphah]=EM_Smth(val0,Noise_Var,Jp)
%EM estimate of the thin-membrane smoothing parameter alpha
p=size(Jp,1);
alpha0=0;
Rinv=diag(Noise_Var.^-1);
while 1
    x=((alpha0*Jp+Rinv)\Rinv)*val0'; %E-step: posterior mean
    alphah=p/(trace(Jp/(alpha0*Jp+Rinv))+x'*Jp*x); %M-step update of alpha
    if abs(alphah-alpha0)<1e-4 || rcond(alpha0*Jp+Rinv)<1e-16
        break;
    else
        alpha0=alphah;
    end
end
if rcond(alpha0*Jp+Rinv)>1e-16
    valh=x';
else
    %alpha diverges: fall back to a constant (fully smoothed) fit
    options=optimset('MaxFunEvals',1e20,'MaxIter',1e20,'TolFun',1e-10,'TolProjCG',1e-10);
    Prmh=fminsearch(@(Prm)Post_func(Prm,val0,Noise_Var),1,options);
    valh=Prmh*ones(1,p);
    alphah=inf;
end