Essays in Applied Bayesian Particle and Markov Chain
Monte Carlo Techniques in Time Series Econometrics
2014-9
Nima Nonejad
PhD Thesis
DEPARTMENT OF ECONOMICS AND BUSINESS
AARHUS UNIVERSITY DENMARK
Essays in Applied Bayesian Particle and Markov
Chain Monte Carlo Techniques in Time Series
Econometrics
By Nima Nonejad
A PhD thesis submitted to
School of Business and Social Sciences, Aarhus University,
in partial fulfillment of the requirements of
the PhD degree in
Economics and Business
October 2014
Contents
Contents i
Preface iv
Summary v
Resume vii
1. A Mixture Innovation HAR Model for Structural Breaks and Long Memory 1
1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2. Modeling Structural Breaks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1. Change-point model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2. Mixture innovation model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3. Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.4. Breaks in the conditional variance . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.5. Model comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.6. Calculating the predictive likelihood and the predictive mean . . . . . . . . . . 7
1.3. Realized Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4. Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5. Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.1. Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.2. Full sample estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.3. Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7. Prior Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.7.1. CPHAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.7.2. MHAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.8. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2. Long Memory and Structural Breaks in Realized Volatility: An Irreversible MS Approach 25
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2. Change-point ARFIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3. Bayesian Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.1. Breaks in µ and d . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.2. Only breaks in σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.3. Bayes factors and marginal likelihood computation . . . . . . . . . . . . . . . . 32
2.4. Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.1. Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.2. Change-point identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.3. Parameter estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.4. Deviance information criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4.5. Higher number of change points . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.6. Sample size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5. Realized Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.6. Application to S&P 500 Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.6.1. Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.6.2. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.6.3. Robustness to minimum duration restrictions . . . . . . . . . . . . . . . . . . . 46
2.6.4. Prior sensitivity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6.5. Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6.6. Structural breaks and GARCH effects . . . . . . . . . . . . . . . . . . . . . . . 49
2.7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3. A Survey of PMCMC Techniques for Unobserved Component Time Series Models 51
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2. Markov Chains and Particle Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2.1. A Particle filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2.2. Bayes factors and marginal likelihood computation . . . . . . . . . . . . . . . . 57
3.3. Stochastic Volatility Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4. Unobserved Components Model of US Inflation . . . . . . . . . . . . . . . . . . . . . . 62
3.5. Long Memory with Stochastic Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.5.1. Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5.2. US core inflation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5.3. Subsample estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.5.4. Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.5.5. Parameter sensitivity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.6. Unobserved Components Model with SVM Effects . . . . . . . . . . . . . . . . . . . . 77
3.7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4. PG-AS for SV Models with: Heavy Tails, in Mean Effects, Leverage and Structural Breaks 81
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2. Why is PG-AS Useful? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3. Particle Gibbs with Ancestor Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.1. Model comparison using the output from PG-AS . . . . . . . . . . . . . . . . . 88
4.4. Dow Jones Industrial Average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.5. Structural Breaks and PG-AS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.5.1. Simulation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.5.2. Real output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.5.3. Structural break ARFIMA-SV model . . . . . . . . . . . . . . . . . . . . . . . . 99
4.6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
A. Appendix for Chapter 1 105
A.1. Estimation of the change-point model . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
A.2. Marginal likelihood computation for the change-point model . . . . . . . . . . . . . . . 106
A.3. Estimation of the MIA model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
B. Appendix for Chapter 2 109
B.1. A direct approach for evaluating the likelihood function . . . . . . . . . . . . . . . . . 109
C. Appendix for Chapter 3 110
C.1. Simulation evidence: SV-MA(1) and SVM . . . . . . . . . . . . . . . . . . . . . . . . . 110
C.2. Particle Gibbs with ancestor sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
D. Appendix for Chapter 4 114
D.1. Prior sensitivity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
D.2. Sensitivity of PG-AS with respect to M . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Bibliography 117
Preface
In many econometric problems there are no obvious ways to address (a) nonlinearities, (b) time variation in the model parameters, or (c) high-dimensional models. These extensions are often necessary from a theoretical point of view but very difficult to implement in practice. However, by capitalizing on general advances in computer speed and computational power, Bayes’ theorem can be used to generate effective estimation procedures that handle (a), (b) and (c). The performance of these procedures can then be evaluated using non-Bayesian criteria. Furthermore, these techniques enable researchers to build relatively complicated econometric models which in turn fit economic theory better. However, interest in these methods varies greatly between learning institutions. One major reason may be a general lack of undergraduate or graduate courses on Bayesian methods and Markov chain Monte Carlo techniques1.
In light of the above discussion, this dissertation aims to illustrate the attractive computational and qualitative aspects of Bayesian inference. This is done by using Gibbs sampling, particle Gibbs with ancestor sampling and particle marginal Metropolis-Hastings to draw from the analytically intractable posterior density defined by Bayes’ theorem. In each chapter, I either extend a recent estimation procedure or develop more flexible computational techniques so that one can estimate new, or extend already known, econometric models.
I would like to thank the School of Business and Social Sciences at Aarhus University as well as the Center for Research in the Econometric Analysis of Time Series (CREATES), funded by the Danish National Research Foundation, for hosting me and providing me with excellent research facilities, a stimulating intellectual environment and generous financial support. Finally, a number of people have contributed to the making of this dissertation. First and foremost, I would like to thank my adviser Asger Lunde for his guidance and support. I am also very grateful to Niels Haldrup, Henning Bunzel and CREATES for providing me with the necessary computational facilities.
Updated preface
The pre-defence took place on October 2, 2014 in the presence of the jury composed of Eric Hillebrand (Aarhus University and CREATES), Jim Griffin (University of Kent), Michel van der Wel (Erasmus University Rotterdam and CREATES) and Asger Lunde. The jury made a number of constructive comments and suggestions to improve the papers contained in the dissertation. In this updated version, I have tried to implement as many of these as possible given the time constraint.
Nima Nonejad
Aarhus, October 2014
1This is of course very strange because there is no shortage of good textbooks which cover the advances that have revolutionized the field of Bayesian econometrics since the late 1980s. Among these are textbooks such as Bauwens, Lubrano and Richard (1999), Koop (2003), Lancaster (2004) and Geweke (2005).
Summary
Each chapter of this dissertation stands as an independent article and is therefore self-contained.
The analyses demonstrate the usefulness of combining Bayesian methods with simulation methods
such as Gibbs sampling, particle Gibbs with ancestor sampling and particle marginal Metropolis-
Hastings with special focus on statistical analysis of economic time series data. I provide motivation
for the econometric techniques used in each individual chapter. All necessary computational details
are provided in the Appendices.
Chapter 1 proposes a flexible model that is able to simultaneously approximate long-memory be-
havior and incorporate structural breaks in the model parameters. This is achieved by combining the
mixture innovation (MIA) specification of Gerlach et al. (2000) with the heterogeneous autoregres-
sive (HAR) model of Corsi (2009). It is an open question if the MIA specification will perform well
when dealing with the sort of structural changes present in realized volatility data. The main purpose
of this chapter is to shed light on this question. Therefore, I compare the performance of the MIA specification with an existing method for modeling structural breaks, namely, the change-point (CP) specification of Chib (1998). I believe that applying the MIA specification to realized volatility data and comparing its performance with the change-point specification, which in the literature is considered a “state-of-the-art” structural break model, is the most important contribution that I provide. To my knowledge, no work has been done on comparing the out-of-sample forecasting performance of these methods. The methods differ in important aspects, for example, in their treatment of the break process: the change-point specification imposes the restriction that a precise number of breaks occurs in the sample, whereas the MIA specification treats the number of breaks as unknown.
In an extensive empirical evaluation involving several volatility series, I demonstrate the presence of structural breaks and their importance for forecasting. However, one cannot establish that there is
one single forecasting method which can be recommended universally. That is, for some series MIA
outperforms the change-point specification, whereas for other series CP performs better than MIA.
Chapter 2 builds on Raggi and Bordignon (2012). Specifically, the aforementioned paper considers estimating a two-state Markov switching ARFIMA (autoregressive fractionally integrated moving average) model using Markov chain Monte Carlo techniques. Chapter 2, on the other hand, considers a more challenging specification, namely, a structural break specification of the ARFIMA model. Structural breaks occur through irreversible Markov switching, or so-called change-point dynamics. The parameters subject to structural breaks and the unobserved states, which determine the positions of the structural breaks, are sampled from the joint posterior density by drawing from their respective conditional posteriors using Gibbs sampling and Metropolis-Hastings. Furthermore, instead of using traditional approaches to evaluate the likelihood function, I extend previous work on precision-based algorithms and provide a direct approach to evaluating the likelihood function. I believe that incorporating these features within the ARFIMA setting is the most important contribution that I provide.
An extensive Monte Carlo experiment is conducted to investigate whether the proposed method works
well in identifying the data generating parameters, the true structural break dates and the correct number of structural breaks. With regard to the last point, I compare the ability of the marginal likelihood (ML) and the deviance information criterion (DIC) to detect the correct number of structural breaks. Results show that both ML and DIC perform very well in identifying the true number of structural breaks.
The higher the number of parameters that are affected by a break, the more likely it is that structural
breaks are correctly identified. Finally, it becomes more difficult to identify one or more breaks when
only the persistence parameter, d, changes. Applied to daily S&P 500 data from 2000 to 2009, one
finds strong evidence in favor of four structural breaks. The evidence in favor of structural breaks
is robust to different specifications including a GARCH specification for the conditional volatility of
realized volatility.
Chapter 3 is inspired by Bos (2011), Bos et al. (2012), Creal (2012) and Flury and Shephard (2011). It provides a contribution to the field of sequential Monte Carlo methods. Specifically, this chapter
details particle Markov chain Monte Carlo (PMCMC) techniques for analysis of unobserved component
time series models. The aim of this chapter is to describe the basic steps of PMCMC together with
details on implementation of some of the key algorithms in Ox2. I show that PMCMC allows one to
develop a general and flexible framework for estimation, forecasting and model comparison. On the
other hand, estimating the same types of models using “pure” Gibbs sampling would require relatively
more programming effort.
Chapter 4 is the last chapter of the dissertation. In this chapter, I apply a relatively new tool in the family of sequential Monte Carlo methods, which is particularly useful for inference in stochastic volatility models, namely, particle Gibbs with ancestor sampling (PG-AS), suggested in Lindsten et al. (2012). I apply PG-AS to the challenging class of stochastic volatility models of increasing complexity, including leverage and in-mean effects. I provide applications that demonstrate the flexibility of PG-AS under these different circumstances and justify applying it in practice.
Finally, I also provide applications where I combine discrete structural breaks with the stochastic volatility model framework using both simulated and macroeconomic time series data. Structural breaks are modeled through irreversible Markov switching, or so-called change-point dynamics. I estimate model parameters, log-volatilities and change-point dates conditional on a fixed number of change points. For each of these specifications, ML and DIC are calculated. They are then used to determine the number of structural breaks. For instance, I model the changing time series characteristics of the postwar monthly US core inflation rate using a structural break ARFIMA model with stochastic volatility. I allow for structural breaks in the level, the AR and MA parameters, the persistence parameter, d, the level of volatility, the persistence of volatility and the volatility of volatility. Overall, compared to constant parameter models, structural break specifications provide better in-sample fit and produce better out-of-sample point and density forecasts.
Each of the subsequent chapters has been submitted to peer-reviewed field journals. Chapter 1 has
been submitted to Journal of Forecasting. Chapter 2 and an earlier version of Chapter 3 have been
submitted to Journal of Financial Econometrics and Journal of Time Series Econometrics. Chapter 4
has been submitted to Studies in Nonlinear Dynamics and Econometrics.
2Both of the mentioned papers provide excellent introductions to particle filtering. Furthermore, the authors provide the software associated with their work. Hence, the idea behind this chapter is to do the same, however, for PMCMC. This way, other researchers can replicate the results using my codes. The choice of Ox is mainly because it is a popular software package among econometricians.
Resume
This dissertation consists of four independent chapters. The common thread running through them is the modeling of time series by means of simulation methods such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC). How should one model nonlinearities, the inclusion of unobserved components, structural breaks in model parameters and other difficult technical features? How can one obtain reliable estimation and inference? In this dissertation, I attempt to contribute to solving these problems.
The first two chapters propose a strategy for estimation and forecasting of realized volatility by means of Gibbs sampling and Metropolis-Hastings. I examine whether estimation and forecast precision can be improved by accounting for structural breaks in the model parameters. On the basis of a series of simulations as well as empirical applications, I conclude that neglecting time variation in the model parameters can be a serious problem that may reduce forecast precision, and that the proposed methods help remedy this.
The last two chapters examine the possibility of obtaining robust estimation of models with unobserved components by means of particle Gibbs with ancestor sampling (PG-AS) and particle marginal Metropolis-Hastings. Both chapters focus mainly on the technical properties of these methods. Under simple parametric assumptions, I am able to estimate the models and produce macroeconomic forecasts. I document once again the presence of time-varying parameters in macroeconomic data. Again, forecast precision is improved by accounting for time variation in the model parameters.
1. A Mixture Innovation Heterogeneous
Autoregressive Model for Structural Breaks and
Long Memory
Author: Nima Nonejad
Abstract: We propose a flexible model that is able to approximate long-memory behavior and in-
corporate structural breaks in the model parameters. Our model is an extension of the heterogeneous
autoregressive (HAR) model, which is designed to model and forecast volatility of financial time series. In an extensive empirical evaluation involving several volatility series, we demonstrate the presence of structural breaks and their importance for forecasting. We find that the choice of how to model
break processes is important in achieving good forecast performance. Furthermore, structural break
specifications perform better than simple, rolling window forecasts.
Keywords: Bayesian inference, forecasting, mixture innovation models, realized volatility
(JEL: C11, C22, C51, C53)
1.1. Introduction
Measuring and modeling the conditional variance or volatility of financial time series is an important
issue in econometrics. General approaches of estimating volatility are based on parametric models
such as generalized autoregressive conditional heteroskedasticity (GARCH) models proposed by Engle
(1982) and Bollerslev (1986), or stochastic volatility (SV) models as in Kim et al. (1998). However,
during the past two decades, the approach of using improved measures of expost volatility constructed
from high-frequency data has become very popular. This measure is called realized volatility (RV),
see Andersen et al. (2001), and Barndorff-Nielsen and Shephard (2002 a,b) for a formal discussion.
This paper proposes a simple model that merges long-memory dynamics and nonlinearities. The
specification that is put forward is a generalization of the heterogeneous autoregressive (HAR) model,
see Corsi (2009). The HAR model has been applied with success in modeling and forecasting realized
volatility, see Andersen et al. (2007). Our model is called the mixture innovation heterogeneous au-
toregressive model, MHAR. It combines ingredients from HAR and mixture innovation (MIA) models,
see Gerlach et al. (2000), Giordani and Kohn (2008) and Groen et al. (2012). This approach builds on the state-space representation, modeling breaks through mixture distributions in the state innovations of linear Gaussian state-space models. Indeed, this approach is very intuitive and has several desirable features: it admits a random number of breaks in the sample, it can jointly model small, possibly frequent breaks and large, possibly less frequent breaks, and it allows different parameters to change at different points in time.
It is an open question if the MHAR model will perform well when dealing with the sort of structural
changes present in realized volatility data. The main purpose of this paper is to shed light on this
question. Therefore, we compare the performance of the MIA specification with an existing method
for modeling structural breaks in realized volatility data, namely, the change-point specification of
Chib (1998). Henceforth, we refer to this specification as CPHAR, see Liu and Maheu (2008). In
addition, we also include alternative forecasting procedures such as: recursively estimating the HAR
model and a random walk time-varying parameter HAR model.
We believe that applying the MIA specification to realized volatility data and comparing its performance with the change-point specification, which in the literature is considered a “state-of-the-art” structural break model, is the most important contribution that we provide. To our knowledge, no work has been done on comparing the out-of-sample performance of these methods. MHAR and CPHAR differ in important aspects, for example, in their treatment of the break process. Specifically, the “general” change-point specification imposes the restriction that a precise number of breaks occurs in the sample. Furthermore, all model parameters change simultaneously due to a structural break. For the MIA specification, however, the number of breaks is treated as unknown, and model parameters are allowed to change separately at different points in time.
For eleven realized volatility series between 2004 and 2009, we consider the aforementioned models
and produce daily, weekly and biweekly forecasts. We evaluate forecast performance using two criteria:
predictive likelihood (PL), see Geweke (2005), and root mean squared error (RMSE). Both criteria
are easily attainable within the Bayesian estimation procedure. It turns out that these two loss
functions lead to similar qualitative conclusions. Overall, structural breaks play an important role for
forecasting in all of the volatility series that we consider. Specifically, we find that each structural break
specification outperforms the HAR model, regardless of criterion or forecast horizon. Furthermore, the
MHAR model with time-varying volatility tends to perform better than the other models, especially
for longer forecast horizons. However, we find that no single method can be recommended universally, i.e., for all series and all forecast horizons. For some series, CPHAR performs better than the MHAR specifications, and vice versa.
The structure of this paper is as follows. Section 1.2 discusses the econometric issues for Bayesian
estimation. Section 1.3 reviews the theory behind the volatility measures used in this paper. Section
1.4 briefly presents the HAR model. Details on the data are presented in Section 1.5. Section 1.6
discusses the empirical results. Finally, the last section concludes.
1.2. Modeling Structural Breaks
1.2.1. Change-point model
The models considered in this paper use the framework of a Gaussian linear regression model

yt = Xt−1β + εt,    εt ∼ N(0, σ²),    (1.2.1)

for t = 1, ..., T. Let YT = (y1, ..., yT)′ be a vector of size T, and let XT be a T × k matrix of regressors with rows Xt−1, which can also include lags of yt. Obviously, different structural break models vary in the way they model breaks in (1.2.1) by allowing β and possibly σ² to vary over time. To begin with, we focus mainly on structural breaks in the regression coefficients, β, and assume that σ² is fixed through time. However, breaks in σ² can also be modeled rather easily. We consider two main approaches to modeling structural breaks in β and later in (β, σ²).
We start by considering the change-point (or structural break) specification proposed by Chib (1998).
This specification uses a hidden Markov model with a restricted transition matrix to model the change
points. A test for the number of structural breaks is then a test of the dimension of the hidden Markov
chain. Model parameters and change points are jointly estimated conditional on a fixed number of
change points. Bayes factors are then used to compare the evidence for the number of breaks.
Assume that there are m − 1, m ∈ {1, 2, ...}, change points at unknown times, Ωm = {τ1, τ2, ..., τm−1}. Separated by those change points, there are m different regimes. Thus,

βt = { β1     if t < τ1,
       β2     if τ1 ≤ t < τ2,
       ⋮
       βm−1   if τm−2 ≤ t < τm−1,
       βm     if τm−1 ≤ t.    (1.2.2)
The density of observation yt, t = 1, ..., T, depends on βj, j = 1, 2, ..., m, whose value changes at the change points, Ωm, and on σ². Let S = (s1, ..., sT)′ denote the unobserved state sequence, where st = j indicates that yt is from regime j and follows the conditional distribution p(yt | βj, σ², Yt−1). The one-step-ahead transition probability matrix for st, P, is assumed to be

Pr(st = j | st−1 = j) = pj,    Pr(st = j + 1 | st−1 = j) = 1 − pj    (1.2.3)
for j = 1, ..., m − 1, and Pr(st = m | st−1 = m) = 1. The other elements of P are set to zero. Hence, if regime j holds at time t − 1, then at time t the process can either remain in regime j (with probability pj), or a break occurs and the process moves to regime j + 1 (with probability 1 − pj). Once the last regime is reached, one stays there forever. This structure enforces the ordering (1.2.2) on the change points1. Technical details about estimation of (1.2.1) using (1.2.2) are provided in the Appendix2.
As previously mentioned, the change-point regression model requires a fixed number of structural breaks to occur3. In this paper we follow Pesaran et al. (2006), Liu and Maheu (2008) and Bauwens et al. (2011) and estimate models for different numbers of structural breaks (0 to 7 in our empirical application). Then, we compare results across these models using the marginal likelihood criterion. Specifically, let models with i and j structural breaks be denoted by Mi and Mj, respectively. For each specification, we can calculate the marginal likelihood, p(y1, ..., yT | Mi), following Chib (1995). We then rank models by means of their Bayes factors, BFij = p(y1, ..., yT | Mi)/p(y1, ..., yT | Mj). Large values of BFij indicate that the data support Mi over Mj, see Kass and Raftery (1995). In the Appendix, we provide details on how to compute the marginal likelihood for the change-point regression model.
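The ranking step can be sketched as follows. The log marginal likelihood values below are purely illustrative placeholders, not estimates from the paper; working with logs avoids numerical underflow when the marginal likelihoods themselves are tiny:

```python
import math

# Hypothetical log marginal likelihoods for models with 0, 1, 2 and 3
# breaks (placeholder numbers, not results from the paper).
log_ml = {0: -512.4, 1: -498.1, 2: -499.0, 3: -501.3}

def log_bayes_factor(i, j, log_ml):
    """log BF_ij = log p(y_1,...,y_T | M_i) - log p(y_1,...,y_T | M_j)."""
    return log_ml[i] - log_ml[j]

# The preferred specification maximizes the marginal likelihood.
best = max(log_ml, key=log_ml.get)
# A large positive log Bayes factor favors M_i over M_j.
lbf_10 = log_bayes_factor(1, 0, log_ml)
bf_10 = math.exp(lbf_10)
```

With these placeholder values the one-break model is preferred, and exponentiating the log Bayes factor recovers BF10 on the scale used by Kass and Raftery (1995).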
1.2.2. Mixture innovation model
An alternative specification that allows for structural breaks in the regression parameters of (1.2.1)
can be defined in the following way

yt = Xt−1βt + εt,    εt ∼ N(0, σ²),    (1.2.4)

where βt = (β1t, ..., βkt)′ is a vector of time-varying regression parameters. Each element of βt in (1.2.4) evolves according to

βit = βit−1 + κitηit,    ηit ∼ N(0, qi²),    i = 1, ..., k,    (1.2.5)

where κit ∈ {0, 1} is an unobserved process with Pr(κit = 1) = πi, Kt = (κ1t, ..., κkt)′, K = {K1, ..., KT} and B = {β1, ..., βT}. Accordingly, the ith regression coefficient, βit, remains the same as its previous value, βit−1, unless κit = 1, in which case it changes by ηit.
This specification implies that each βit in (1.2.5) is allowed to change every time period, but it does not necessarily need to change at all. Also, changes in the separate parameters are not restricted to coincide, as in the change-point model. Rather, changes in each βit are allowed to occur at different points in time. Furthermore, whereas for the change-point model we specify the number of structural breaks by comparing the ML of models with different numbers of change points, here we estimate only one specification and allow the data to determine the nature of the structural breaks in each element of βt. In the Appendix, we provide details on how to estimate the MIA model using Gibbs sampling.
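To make the mechanics of (1.2.5) concrete, the following sketch simulates a coefficient path under the MIA specification. The break probabilities and innovation scales are hypothetical, and this is our illustration rather than the estimation code from the Appendix:

```python
import numpy as np

def simulate_mia_path(T, beta0, pi, q, rng=None):
    """Simulate beta_t under (1.2.5): beta_it = beta_it-1 + kappa_it * eta_it,
    with Pr(kappa_it = 1) = pi_i and eta_it ~ N(0, q_i^2).

    Breaks in the k coefficients occur independently, in contrast to the
    change-point model, where all parameters break simultaneously."""
    rng = np.random.default_rng(rng)
    beta0 = np.asarray(beta0, dtype=float)
    k = beta0.size
    beta = np.empty((T, k))
    beta[0] = beta0
    for t in range(1, T):
        kappa = rng.random(k) < np.asarray(pi)   # break indicators kappa_it
        eta = rng.normal(0.0, q, size=k)         # break sizes eta_it
        beta[t] = beta[t - 1] + kappa * eta
    return beta

# k = 2: rare breaks in the first coefficient, none in the second.
path = simulate_mia_path(500, beta0=[1.0, -0.5], pi=[0.02, 0.0], q=[0.5, 0.5], rng=1)
```

Note that setting πi = 0 reproduces a constant coefficient, while πi = 1 recovers a random walk time-varying parameter model, so the specification nests both extremes.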
1Theoretically, we could allow for breaks in each element of β to occur independently. In this case, st will be a k × 1 discrete random variable with the first element controlling breaks in β1, the second element controlling breaks in β2 and so on. Furthermore, in order to ease the notation, conditioning on XT is suppressed.
2Notice that our specification is identical to Liu and Maheu (2008) but differs slightly from the hierarchical prior specification (on the conditional mean, variance and the prior on the regime durations) of Pesaran et al. (2006). However, since we perform direct forecasting it does not make any difference which specification is used.
3Koop and Potter (2007) argue that (1.2.3) may be restrictive in some situations. They suggest the use of the more flexible Poisson distribution for the durations. However, in order to avoid the very heavy additional computational cost inherent to the use of the Poisson prior, we choose only to implement the change-point model using (1.2.3).
1. A Mixture Innovation HAR Model for Structural Breaks and Long Memory
1.2.3. Monte Carlo
The plan of our Monte Carlo study is to judge the performance of CP and MIA under different
circumstances. Specifically, we use a model very similar to (1.2.1) and consider three cases: no breaks;
one break where all parameters change at the same time; and two breaks, where the first break in each
parameter occurs at a different point in time while the last break occurs at the same time. This setting provides us with information on
the tendency of each model to give false signals of structural breaks and misinterpret the magnitude
of parameter changes. For instance, as stated in Section 1.2.1, we must specify the number of change
points for the CP model, and all the elements in β change whenever a structural break occurs. On the
other hand, changes in the separate parameters are not restricted to coincide for the MIA specification.
Thus, it is interesting to find out how well CP performs compared to MIA when structural breaks in
each parameter occur at different points in time. In a similar fashion, it is also interesting to find out
how well MIA performs compared to CP or simple OLS when there are no structural breaks in the
parameters of the data generating process (DGP).
For each DGP, 100 samples of 500 observations are generated. The true values of β are given in
Table 1.1. We choose Xt−1 ∼ N (0, I2) and σ2 = 0.01 in all of our simulations. For each Monte
Carlo repetition, we report: (a) the full in-sample root mean squared error (RMSE) of CP (MIA) over
OLS, and (b) the in-sample root mean squared error of CP (MIA) over OLS using only data from
the last structural break till the end of the sample, “last regime”. This requires exact knowledge of
the true date of the last break. Therefore, it is not available outside simulation studies. As shown
in the second column of Table 1.1, there is little to lose from using the MIA specification when there
are no structural breaks in the DGP. The increase in the RMSE of MIA over OLS is only about 5%.
We estimate models with zero up to five change points for the CP specification. When there are no
structural breaks in the DGP, the CP specification with no change points obtains the highest ML value
at every Monte Carlo repetition.4
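A stripped-down version of one Case 1 replication, fitting only the OLS benchmarks, illustrates the RMSE comparison (the CP and MIA columns of Table 1.1 require full posterior simulation and are not reproduced here; parameter values follow the Monte Carlo design above):

```python
import numpy as np

rng = np.random.default_rng(1)
T, sigma = 500, 0.1                       # sigma^2 = 0.01, as in the simulations
X = rng.standard_normal((T, 2))           # X_{t-1} ~ N(0, I_2)
# Case 1: one break at t = 250, where both coefficients change at once.
beta = np.where(np.arange(T)[:, None] < 250, [0.8, -0.4], [0.2, 0.1])
y = (X * beta).sum(axis=1) + sigma * rng.standard_normal(T)

def ols_rmse(Xs, ys):
    b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return np.sqrt(np.mean((ys - Xs @ b) ** 2))

full_rmse = ols_rmse(X, y)                # pooled OLS ignores the break
last_rmse = ols_rmse(X[250:], y[250:])    # "last regime" uses the true break date
```

Pooled OLS averages the two regimes and fits poorly, whereas the (infeasible) last-regime fit recovers an RMSE close to the error standard deviation, mirroring the near-unity "last regime" ratios in Table 1.1.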
On the other hand, compared to OLS, large gains are obtained for both models in the presence
of structural breaks. When there is one structural break in the DGP, CP and MIA perform very
similarly. Furthermore, both models perform almost as well as if the date of the structural break
was known. For instance, the RMSEs of CP/OLS and MIA/OLS for “last regime” are almost equal
to 1. For Case 2, we see that MIA performs slightly better than CP. More importantly, for CP, we
discover that the specification with four change points obtains the highest ML value at every Monte
Carlo repetition, even though there are only two structural breaks in the DGP. Recall that in this
case, the first structural break in each parameter occurs at a different point in time. Clearly, CP has
some difficulty incorporating this feature, as changes in the parameters are restricted to coincide,
which makes inference difficult. On average, the first change point is estimated at t = 125, the second at
t = 200 and the third at t = 350. Furthermore, CP also estimates a fourth change point at t = 400.
Evidently, this is incorrect as there are no structural breaks at t = 400 in the DGP. However, the
average Monte Carlo estimates of β are almost identical in the last two regimes. On the other hand,
MIA does not encounter any difficulty incorporating this feature, and is able to detect the break dates.
Finally, CP (MIA) performs almost as well as if the date of the last structural break was known. The
increase in the RMSE of CP (MIA) over OLS for the last regime is 0.5% (0.8%).
4 The numbers inside the parentheses indicate the best change-point specification over the total number of Monte Carlo repetitions. In each case, the log(BF) in favor of the correct change-point specification is about 20 to 30.
Table 1.1.: Monte Carlo results

                     Case 0             Case 1               Case 2
Number of breaks     0                  1                    2
β1                   0.8                0.8, 0.2             0.8, 0.2, 0.8
β2                   -0.4               -0.4, 0.1            -0.4, 0.1, -0.4
True break dates     -                  t = 250 for β1       t = 125, 350 for β1
                                        t = 250 for β2       t = 200, 350 for β2
Full sample RMSE
CP/OLS               1.012              0.314                0.377
                     (100/100, 0 CP)    (100/100, 1 CP)      (100/100, 4 CP)
MIA/OLS              1.048              0.326                0.365
Last regime RMSE
CP/OLS               -                  1.002                1.005
                                        (100/100, 1 CP)      (100/100, 4 CP)
MIA/OLS              -                  1.004                1.008
This table reports the RMSE ratio of CP and MIA over OLS using both the entire sample as well as only data
from the last break point till the end of the sample, “last regime”. The numbers inside the parentheses indicate
the number of times over the Monte Carlo repetitions when the specific change-point specification obtains the
highest ML value. CP denotes the number of change points that are conditioned on.
1.2.4. Breaks in the conditional variance
In this section, we briefly describe how to model structural breaks in the conditional variance, σ2,
for the CP and MIA specifications. For the change-point model, the conditional posterior density of
θ_j = (β_j′, σ_j²)′ depends only on the observations in regime j = 1, ..., m. Therefore, let
y_j = {y_t : s_t = j}, X_j = {X_{t−1} : s_t = j} and use standard Gibbs sampling methods for the linear model.
Once we have sampled β_j for regime j, we can use ε_j = y_j − X_j β_j and sample σ_j² from the inverse
Gamma density, see Liu and Maheu (2008). Ideally, it would be desirable to allow σ² to vary
independently from β. However, incorporating this feature is computationally more demanding and would
probably not provide any significant improvements.
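The inverse Gamma step for σ_j² can be sketched as follows; a minimal illustration assuming the IG(4/2, 0.2/2) prior later used in Section 1.6.1 (the helper name `draw_sigma2` and the Gamma-reciprocal trick are ours, not from the thesis):

```python
import numpy as np

def draw_sigma2(resid, a0=4.0, b0=0.2, rng=None):
    """One Gibbs step for sigma_j^2 in regime j: with prior IG(a0/2, b0/2)
    and regime residuals e_j, the conjugate conditional posterior is
    IG((a0 + n_j)/2, (b0 + e_j'e_j)/2)."""
    rng = rng if rng is not None else np.random.default_rng()
    a_post = (a0 + resid.size) / 2.0
    b_post = (b0 + resid @ resid) / 2.0
    # If G ~ Gamma(shape=a, scale=1/b), then 1/G ~ IG(a, b).
    return 1.0 / rng.gamma(a_post, 1.0 / b_post)
```

With many observations in a regime, the draw concentrates near the residual variance, as the conjugate update implies.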
Modeling structural breaks in σ² for the MIA model is a bit more complicated. In this paper we
take the same approach as Giordani and Kohn (2008) and Groen et al. (2012):

- Initialize the sampler with a time series of conditional variances, σ²_1, ..., σ²_T.

- Conditional on σ²_1, ..., σ²_T, draw K, B, q²_1, ..., q²_k and π_1, ..., π_k from their respective conditional
posteriors. Compute the residual for time t as ε_t = y_t − X_{t−1}β_t = σ_t u_t, where u_t ∼ N(0, 1). We
can then square both sides and take the logarithm, such that log ε²_t = log σ²_t + log u²_t, where log u²_t is
log χ²_1 distributed and can be very accurately approximated by a mixture of Normals with seven
components, see Kim et al. (1998). We follow the stochastic volatility literature and incorporate
structural breaks in σ² as log σ²_t = log σ²_{t−1} + κ^SV_t η_t, where η_t ∼ N(0, σ²_η).

Here, κ^SV_t = 0, 1 and evolves independently from K_t. As before, we can use the conditioning features
of the Gibbs sampler to sample log σ²_t, κ^SV_t, t = 1, ..., T, π^SV and σ²_η from their respective conditional
posteriors. Specifically, we draw K^SV = (κ^SV_1, ..., κ^SV_T)′ using the algorithm of Gerlach et al. (2000).
Thereafter, we sample log σ²_t, t = 1, ..., T, using Carter and Kohn (1994) conditional on K^SV and σ²_η.
1.2.5. Model comparison
In this paper we compare the performance of models using a specific out-of-sample period. Consider
the universe M = (M_1, ..., M_n) of models. Let p(y_t | θ_k, Y_{t−1}, M_k) denote the conditional data density
of model M_k given Y_{t−1} and the model parameters, θ_k. Conditional on Y_{t−1} = (y_1, ..., y_{t−1})′, the
predictive likelihood (PL) of model M_k for y_t, ..., y_T, t < T, is defined as

p(y_t, \ldots, y_T \mid Y_{t-1}, M_k) = \int_{\Theta_k} p(y_t, \ldots, y_T \mid \theta_k, Y_{t-1}, M_k)\, p(\theta_k \mid Y_{t-1}, M_k)\, d\theta_k. \qquad (1.2.6)

Note that if t = 1 this would be the marginal likelihood, in which case (1.2.6) becomes

p(y_1, \ldots, y_T \mid M_k) = \int_{\Theta_k} p(y_1, \ldots, y_T \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k,

where p(y_1, ..., y_T | θ_k, M_k) is the likelihood and p(θ_k | M_k) is the prior density of model M_k. Hence,
the sum of log-predictive likelihoods can be interpreted as a measure similar to the logarithm of the
marginal likelihood, but one that ignores the initial t − 1 observations. The predictive likelihood indicates
how well model M_k accounts for the realizations y_t, ..., y_T, such that the best model is the one which
achieves the maximum value. Hence, it can be used to order models according to their predictive
abilities. Moreover, (1.2.6) is simply the product of the individual predictive likelihoods

p(y_t, \ldots, y_T \mid Y_{t-1}, M_k) = \prod_{s=t}^{T} p(y_s \mid Y_{s-1}, M_k), \qquad (1.2.7)
where each of the terms, p (ys | Ys−1,Mk), has parameter uncertainty integrated out. We can compare
the relative value of density forecasts using the realized data, yt, ..., yT , with the predictive likelihoods
for two or more models.
The Bayesian approach also allows for the comparison and ranking of models by predictive Bayes
factors, PBF. As before, suppose we have n different models denoted by Mk, k = 1, ..., n. The PBF
for yt, ..., yT and M1 versus M2 is
PBF12 = p (yt, ..., yT | Yt−1,M1) /p (yt, ..., yT | Yt−1,M2) .
It provides an estimate of the relative evidence for model M1 versus M2 over yt, ..., yT. PBFs
incorporate Occam's razor in that they penalize highly parameterized models that do not deliver improved predictive
content. Kass and Raftery (1995) recommend considering twice the logarithm of PBF for model
comparison. Evidence in favor of model M1 can be interpreted as: not worth more than a bare mention
for 0 ≤ 2 log (PBF12) < 2; positive for 2 ≤ 2 log (PBF12) < 6; strong for 6 ≤ 2 log (PBF12) < 10 and
very strong for 2 log (PBF12) > 10, see Kass and Raftery (1995).
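The Kass and Raftery (1995) scale is easy to apply once the sums of log predictive likelihoods are in hand; a small helper (the input values are hypothetical, and negative values of the statistic simply mean the evidence favors M2):

```python
def kass_raftery(sum_lpl_1, sum_lpl_2):
    """Map 2*log(PBF_12), built from sums of log predictive likelihoods,
    onto the Kass and Raftery (1995) evidence scale for M1 over M2."""
    stat = 2.0 * (sum_lpl_1 - sum_lpl_2)      # 2 log(PBF_12)
    if stat < 2:
        label = "not worth more than a bare mention"
    elif stat < 6:
        label = "positive"
    elif stat < 10:
        label = "strong"
    else:
        label = "very strong"
    return stat, label

# Hypothetical log-PL sums for two models over the out-of-sample period:
stat, label = kass_raftery(-250.3, -254.1)
```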
1.2.6. Calculating the predictive likelihood and the predictive mean
Calculating the predictive likelihood within a Gibbs sampling scheme is easy. We can simply use the
predictive decomposition along with the output from the Gibbs sampler, θ(1), ..., θ(N). Specifically,
each term on the right-hand side of (1.2.7) can be consistently estimated from the Gibbs sampler
output as

p(y_t \mid Y_{t-1}, M_k) \approx \frac{1}{N} \sum_{i=1}^{N} p\left(y_t \mid \theta_k^{(i)}, Y_{t-1}, M_k\right). \qquad (1.2.8)

For example, in the context of (1.2.1), θ^(i) = (β^(i)′, σ^{2(i)})′ and p(y_t | θ^(i), Y_{t−1}) denotes the Normal
density with mean X_{t−1}β^(i) and variance σ^{2(i)}, evaluated at y_t. The Gibbs sampler draws are obtained
based on the information set, Y_{t−1}. As a new observation enters the information set, the posterior is
updated through a new round of Gibbs sampling. The predictive density, p(y_{t+1} | Y_t), can then be
calculated in a similar manner.
We can also compare forecasts of models based on the predictive mean. Similar to the predictive
likelihood, the predictive mean can be computed using the Gibbs draws. For instance, in the context
of (1.2.1), we calculate the predictive mean of y_t conditional on Y_{t−1} as

E[y_t \mid Y_{t-1}] \approx \frac{1}{N} \sum_{i=1}^{N} X_{t-1} \beta^{(i)}. \qquad (1.2.9)
Calculating (1.2.8) or (1.2.9) for the change-point and MIA specifications is a bit more complicated
because one must also consider uncertainty regarding the timing of the structural breaks. This
uncertainty is accounted for by using draws of S^(i), K^(i) and K^{SV(i)}, i = 1, ..., N. All of these quantities
are available from the Gibbs output.
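Equations (1.2.8) and (1.2.9) amount to averaging a Normal density and a conditional mean over the stored draws. A minimal sketch, with simulated draws standing in for real Gibbs output (`predictive_quantities` is our illustrative name):

```python
import numpy as np

def predictive_quantities(y_t, x_tm1, beta_draws, sigma2_draws):
    """Rao-Blackwellized estimates (1.2.8)-(1.2.9): average the Normal density
    and the conditional mean over draws (beta^(i), sigma^2(i))."""
    mu = beta_draws @ x_tm1                  # X_{t-1} beta^(i), one value per draw
    dens = np.exp(-0.5 * (y_t - mu) ** 2 / sigma2_draws) \
        / np.sqrt(2.0 * np.pi * sigma2_draws)
    return dens.mean(), mu.mean()            # p(y_t | Y_{t-1}), E[y_t | Y_{t-1}]

# Hypothetical draws standing in for one round of Gibbs output:
rng = np.random.default_rng(3)
beta_draws = rng.normal([0.1, 0.5], 0.02, size=(5000, 2))
sigma2_draws = 0.01 + 0.001 * rng.random(5000)
pl, pm = predictive_quantities(0.2, np.array([1.0, 0.15]), beta_draws, sigma2_draws)
```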
1.3. Realized Volatility
Assume that the price process belongs to the class of special semi-martingales, which is a very broad
class of processes including Ito and jump processes. Andersen et al. (2001), and Barndorff-Nielsen and
Shephard (2002a, b) show that the quadratic variation of the process, defined as integrated
volatility plus a jump component, provides a natural measure of ex post volatility. Hence, consider the
following logarithmic price process

dp(t) = \mu(t)\, dt + \sigma(t)\, dW(t) + J(t)\, dq(t), \quad 0 \le t \le T, \qquad (1.3.1)

where μ(t) is the drift term, σ(t) is the stochastic volatility process, W(t) is a standard Wiener
process, q(t) is a counting process with jump intensity λ(t), such that dq(t) = 1 corresponds to a jump
at time t and dq(t) = 0 to no jump, and J(t) refers to the size of a realized jump.
The increment in quadratic variation from time 0 to t is

QV_t = \int_0^t \sigma^2(s)\, ds + \sum_{0 \le s \le t,\; dq(s)=1} J^2(s),
where the first term, integrated volatility, is from the continuous component of equation (1.3.1), and
the second term is the contribution from discrete jumps. To consider estimation of QVt, the daily
time interval is normalized to unity and divided into n periods. Each period has length Δ = 1/n.
The Δ-period return is defined as r_{t,j} = p(t + jΔ) − p(t + (j − 1)Δ), j = 1, ..., n. The daily return is
simply given as r_t = \sum_{j=1}^{n} r_{t,j}. Andersen et al. (2001), and Barndorff-Nielsen and Shephard (2002a,
b) apply the following estimator, called realized volatility, defined as

RV_t = \sum_{j=1}^{n} r_{t,j}^2 \xrightarrow{p} QV_t.
Therefore, RV_t is the relevant quantity to focus on with regards to the modeling and forecasting of
volatility. Barndorff-Nielsen and Shephard (2004) also show how the continuous component can be separated
from the jump component of volatility. They define realized bipower variation as

RBP_t = \mu_1^{-2} \sum_{j=2}^{n} |r_{t,j-1}|\,|r_{t,j}|,

where \mu_1 = \sqrt{2/\pi}. As n \to \infty, we have that

RBP_t \xrightarrow{p} \int_0^t \sigma^2(s)\, ds.
The difference between RV_t and RBP_t is an estimate of the daily jump component. Market
microstructure dynamics contaminate the price process with noise. In some instances the noise can be
time dependent and may be correlated with the efficient price. Hence, RV_t can be a biased and
inconsistent estimator of QV_t. Hansen and Lunde (2006) provide a bias correction to realized volatility by
using autocovariances of intraday returns in the following way

RV_t^q = \sum_{j=1}^{n} r_{t,j}^2 + 2 \sum_{w=1}^{q} \left(1 - \frac{w}{1+q}\right) \sum_{j=1}^{n-w} r_{t,j}\, r_{t,j+w},
where q is a small non-negative integer, and we set q = 1 in this paper. Market microstructure also
contaminates bipower variation. We follow Andersen et al. (2007) and adjust bipower variation by
using staggered returns as
RBP_t = \frac{\pi}{2}\, \frac{n}{n-2} \sum_{j=3}^{n} |r_{t,j-2}|\,|r_{t,j}|.

In the following, RV_t^q is referred to as RV_t and the staggered bipower variation is referred to as RBP_t.
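The three estimators above can be sketched in a few lines (an illustration under the paper's choices q = 1 and staggered returns; `realized_measures` is our name, and `r` holds one day of intraday log-returns):

```python
import numpy as np

def realized_measures(r, q=1):
    """Realized variance with q-lag autocovariance correction (RV^q),
    staggered bipower variation, and the implied jump component."""
    n = r.size
    rv = np.sum(r ** 2)
    for w in range(1, q + 1):                # Hansen-Lunde bias correction
        rv += 2.0 * (1.0 - w / (1.0 + q)) * np.sum(r[:n - w] * r[w:])
    # Staggered bipower variation: pair |r_{t,j-2}| with |r_{t,j}|.
    rbp = (np.pi / 2.0) * (n / (n - 2.0)) * np.sum(np.abs(r[2:]) * np.abs(r[:-2]))
    jump = max(0.0, rv - rbp)                # daily jump estimate
    return rv, rbp, jump
```

With a 5-minute grid over a 6.5-hour session, n = 78 returns enter each day's computation.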
1.4. Model
An important feature of the time series of RV is the strong serial dependence, see for instance Andersen
et al. (2003), Corsi (2009), Koopman et al. (2005) and Ghysels et al. (2006). Corsi (2009) shows that
the heterogeneous autoregressive (HAR) model can capture the strong persistence in the data with a
simple linear structure. The HAR model provides a flexible way to model and forecast realized
volatility. According to the framework of Corsi (2009), partial volatility is defined as the volatility generated
by a certain market component. The model is then an additive cascade of different partial volatil-
ities, see for instance Muller et al. (1997). By straightforward recursive substitution of the partial
volatilities, Corsi (2009) shows that the additive volatility cascade leads to a simple restricted linear
autoregressive model. The HAR model can approximate many of the features of realized volatility,
including long-memory. Our benchmark HAR model is given as
y_{t,h} = \beta_0 + \beta_J J_{t-1} + \beta_d y_{t-1,1} + \beta_w y_{t-5,5} + \beta_m y_{t-22,22} + \varepsilon_{t,h}, \quad \varepsilon_{t,h} \sim N(0, \sigma^2), \qquad (1.4.1)

where y_{t,h} = h^{-1} \sum_{i=1}^{h} RV_{t+i-1} is the average realized volatility h ≥ 1 periods ahead. Evidently,
y_{t,1} = y_t. This model postulates that three factors affect y_{t,h}: daily volatility, y_{t−1,1}, weekly volatility,
y_{t−5,5}, and monthly volatility, y_{t−22,22}. For h = 1, we have a HAR model for the daily volatility,
for h = 5 a HAR model for the weekly volatility, and so forth. Following Andersen et al. (2007),
we also include a jump term in (1.4.1), defined as J_t = max{0, RV_t − RBP_t}. The HAR model with a jump
component can be cast into a standard regression form, y_t = X_{t−1}β + ε_t, where y_t = y_{t,h}, ε_t = ε_{t,h},
β = (β_0, β_J, β_d, β_w, β_m)′ and X_{t−1} = [1, J_{t−1}, y_{t−1,1}, y_{t−5,5}, y_{t−22,22}]. Finally, besides (1.4.1), we can also
estimate structural-break versions of the HAR model using the techniques from Section 1.2.
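Constructing y_{t,h} and the regressor matrix is mechanical; a sketch (`har_design` is an illustrative helper, with the first 22 observations used only as conditioning lags, as in Section 1.5):

```python
import numpy as np

def har_design(rv, jump, h=1):
    """Build y_{t,h} and X_{t-1} = [1, J_{t-1}, y_{t-1,1}, y_{t-5,5}, y_{t-22,22}]
    for the HAR-with-jumps regression (1.4.1). `rv` holds daily y_t."""
    T = len(rv)
    rows, ys = [], []
    for t in range(22, T - h + 1):
        ys.append(np.mean(rv[t:t + h]))                  # y_{t,h}: h-period average
        rows.append([1.0, jump[t - 1], rv[t - 1],        # constant, J_{t-1}, daily
                     np.mean(rv[t - 5:t]),               # weekly average
                     np.mean(rv[t - 22:t])])             # monthly average
    return np.array(ys), np.array(rows)
```

Stacking these rows gives exactly the regression form y_t = X_{t−1}β + ε_t used for the direct h-step forecasts.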
1.5. Data
The data consists of high-frequency observations of trades on the S&P 500 index using the Spyder
(SPY) fund, Boeing (BA), Bank of America (BAC), Caterpillar (CAT), General Electric (GE), IBM,
Johnson & Johnson (JNJ), JP Morgan (JPM), Pepsi (PEP), Walmart (WMT) and Exxon (XOM)
from January 2, 2004 to December 31, 2009, for a total of 1511 trading days.
The cleaning of the data is carried out using the steps in Barndorff-Nielsen et al. (2009). After
cleaning, a 5-minute grid from 9:30 to 16:00 is constructed using the previous-tick method, see Hansen
and Lunde (2006). From this grid, 5-minute intraday log-returns are constructed. The log-returns
are then used to construct realized volatility and realized bipower variation. Conditioning on the first
22 observations, the final data consists of T = 1488 observations. We define y_{t,h} as \sqrt{252\, RV_{t,h}}/100.
Table 1.2 presents summary statistics for yt.
Table 1.2.: Summary statistics

Series   mean    median   std. dev.   min     max
BA       0.225   0.130    0.189       0.069   1.319
BAC      0.303   0.365    0.152       0.048   3.646
CAT      0.266   0.167    0.212       0.081   1.609
GE       0.222   0.203    0.151       0.041   1.977
IBM      0.181   0.116    0.148       0.045   1.309
JNJ      0.138   0.084    0.118       0.035   1.266
JPM      0.285   0.262    0.178       0.056   2.426
PEP      0.159   0.094    0.137       0.052   1.514
SPY      0.136   0.108    0.103       0.028   1.349
WMT      0.184   0.099    0.160       0.051   1.414
XOM      0.206   0.125    0.176       0.057   1.996
Summary statistics, yt, January 2, 2004 to December 31, 2009.
In total 1510 observations.
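The previous-tick grid construction described above can be sketched as follows (a minimal sketch; the actual cleaning follows the full procedure of Barndorff-Nielsen et al. (2009), and `previous_tick` is our illustrative name):

```python
import numpy as np

def previous_tick(tick_times, tick_prices, grid_times):
    """Previous-tick sampling: at each grid point, take the last observed
    price at or before that time (grid points before the first tick fall
    back to the first price)."""
    idx = np.searchsorted(tick_times, grid_times, side="right") - 1
    return tick_prices[np.clip(idx, 0, None)]

# A 5-minute grid over the 9:30-16:00 session (times in seconds) has 79 points,
# hence 78 intraday returns per day.
grid = np.arange(9.5 * 3600, 16.0 * 3600 + 1, 300)
```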
1.6. Results
1.6.1. Priors
For the HAR, CPHAR and MHAR models, estimation and forecasting are performed conditional on the
following priors:

HAR: β ∼ N(0, 100I), σ² ∼ IG(4/2, 0.2/2)

CPHAR: β_j ∼ N(0, 100I), σ²_j ∼ IG(4/2, 0.2/2), p_j ∼ Beta(20, 0.1), j = 1, ..., m, p_m = 1

MHAR: q²_i ∼ IG(4/2, 0.2/2), σ² ∼ IG(4/2, 0.2/2), π_i ∼ Beta(0.5, 37), i = d, w, m,

where IG(·/2, ·/2) stands for the inverse Gamma density with E[σ²_j] = E[σ²] = E[q²_i] = 0.10, see
Kim and Nelson (1999). In this setting, the priors on the regression coefficients, σ²_j, σ² and q²_i are
uninformative, while the priors on p_j and π_i tend to favor infrequent structural breaks. In Section 1.7,
we conduct a prior sensitivity analysis for CPHAR as well as for MHAR. Overall, we see that results
are slightly sensitive to the prior settings on π for MHAR, whereas results are very robust with respect to
different hyperparameter values on P for CPHAR.
Table 1.3.: Change-point dates based on the full sample
Series   # CP   Dates
BA       2      07-25-07  09-11-08
BAC      3      02-26-07  10-30-07  09-11-08
CAT      2      07-17-07  09-02-08
GE       3      05-14-04  06-18-07  09-08-08
IBM      4      07-18-07  03-24-08  09-05-08  12-09-08
JNJ      2      12-31-07  09-05-08
JPM      2      07-18-07  09-02-08
PEP      2      07-18-07  09-15-08
SPY      4      02-27-07  03-19-08  09-08-08  12-29-08
WMT      2      09-29-06  09-09-08
XOM      4      07-19-07  02-12-08  09-08-08  12-08-08
This table reports the change-point dates for each series. The first column lists the volatility series.
The second column lists the number of the change points (CP) that are conditioned on. The third
column shows the change-point dates. The change-point dates are defined as the first observation
of a new regime, using the mode of {S^(i)}_{i=1}^N. Our sample period runs from February 2, 2004 to
December 31, 2009 (1488 observations).
Finally, besides these models, we also estimate a MHAR model with time-varying volatility, see Section
1.2.4. Henceforth, we refer to this model as MHAR-SV. For this model, we choose the same prior
hyperparameter values as the MHAR model for q²_i and π_i. With regards to the additional parameters, we
let π^SV ∼ Beta(0.5, 42) and σ²_η ∼ IG(4/2, 10/2). We also experiment with different hyperparameter
values on π^SV and σ²_η. However, we do not find any significant changes worth mentioning.
1.6.2. Full sample estimation
We estimate daily CPHAR, MHAR and MHAR-SV models using the volatility series listed in Table
1.2. This gives us a better understanding of the nature and dates of possible breaks.
For each series, we estimate the CPHAR model from 0 to 7 change points. We relax the assumption
of homoskedastic errors of Section 1.2.1 and thus incorporate structural breaks in σ2 as well as in β,
see Section 1.2.4. We then choose the change-point specification with the highest marginal likelihood
value. With regards to the MHAR specifications, we find that βdt, βwt and βmt capture the
structural changes in yt, since the estimates of β0t and βJt are basically constant through time. Therefore, we
choose to estimate the intercept and the jump component as constant coefficients. Hence, the
MHAR (MHAR-SV) model is respecified as

y_{t,h} = \beta_0 + \beta_J J_{t-1} + \beta_{dt} y_{t-1,1} + \beta_{wt} y_{t-5,5} + \beta_{mt} y_{t-22,22} + \varepsilon_{t,h},

where y_{t,h} = \sqrt{252\, RV_{t,h}}/100. Estimating β_0 and β_J within a Gibbs sampling scheme is straightforward
as we can sample these parameters from their Gaussian conditional posterior using sampled draws of
the regression parameters and the conditional variance. In Table 1.3 we report structural break dates
found in daily CPHAR models using the complete sample. In our calculations, break dates are defined
as the first observation of the new regime, using the mode of the posterior draws of S, {S^(i)}_{i=1}^N. For
most of the series, we find evidence in favor of 2 change points. On the other hand, we find evidence in
favor of 4 change points for IBM, SPY and XOM. Overall, there is clear evidence that our realized
volatility series are subject to structural breaks. Accordingly, all of our series have at least one
structural break when modeled using the CP specification. Furthermore, for all series, we do not find
any posterior uncertainty regarding the number of change points.
The next question is how large the parameter changes are when breaks occur and which
parameters are most affected. Table 1.4 reports posterior mean and standard deviation estimates of the
parameters for the CPHAR equations for BAC, IBM and SPY. We get similar results for the other
series. Focusing on these series, we observe that the more sensitive parameters are βd, βw and σ2. As
expected, σ² increases during the financial crisis of 2008. For IBM and SPY, the posterior estimate
of σ2 decreases substantially in the last regime which starts from December 2008 till the end of the
sample. In general, changes in the regression parameters, β, are less spectacular than changes in σ2.
For IBM and SPY, βd and βw change, while β0 and βJ basically remain constant across regimes. We
picture the change-point dates for BAC, IBM and SPY in Figure 1.1.
We do not obtain exact change-point dates or point estimates of βdt, βwt and βmt for the MHAR
specifications. Instead, we plot the structural break probabilities, κdt, κwt, κmt, along with estimates
of βdt, βwt and βmt, for t = 1, ..., T , using the mean of the Gibbs sampler draws. Results for BAC,
IBM and SPY are given in Figures 1.2 to 1.7. Overall, looking at the break probabilities and estimates
of the regression coefficients clearly confirms that structural breaks occurred during the fall of 2008 and
the beginning of 2009. The behavior of K for BAC and JPM, which are the only two bank series, is
very similar. For the remaining series, the behavior of K is very similar to IBM and SPY. On the
other hand, the magnitude and the direction of the changes in β tend to differ for different assets.
Therefore, we cannot confirm any general trends in the changes in β for the series that we consider.
When we allow for structural breaks in the volatility of realized volatility, the behavior of K shows
that the regression coefficients tend to behave more smoothly with more frequent and smaller changes
in the beginning of the sample for BAC and JPM. Furthermore, we clearly confirm an almost linear
increase in the level of σ2 since the beginning of 2007 for all of the series. This increase is then followed
by a subsequent gradual fall from the beginning of 2009 till the end of the sample.
Table 1.4.: Parameter estimates, CPHAR model

Posterior mean (standard deviation)
Regime    β0        βJ        βd        βw        βm        σ2
BAC
1         0.234     -0.190    0.395     0.198     0.084     0.009
          (0.011)   (0.137)   (0.046)   (0.070)   (0.081)   (0.001)
2         0.273     -0.467    0.492     0.297     0.057     0.038
          (0.012)   (0.210)   (0.084)   (0.130)   (0.120)   (0.004)
3         0.312     -0.382    0.652     0.265     -0.030    0.174
          (0.011)   (0.155)   (0.075)   (0.101)   (0.078)   (0.017)
4         0.315     -0.200    0.433     0.321     0.189     1.032
          (0.026)   (0.187)   (0.066)   (0.101)   (0.087)   (0.079)
IBM
1         0.190     -0.313    0.208     0.498     -0.063    0.011
          (0.006)   (0.112)   (0.045)   (0.069)   (0.079)   (0.001)
2         0.230     -0.606    0.438     0.195     0.131     0.055
          (0.006)   (0.147)   (0.089)   (0.157)   (0.156)   (0.006)
3         0.210     -0.029    0.319     0.386     -0.020    0.018
          (0.006)   (0.244)   (0.115)   (0.198)   (0.173)   (0.002)
4         0.3469    -0.263    0.224     0.418     0.008     0.339
          (0.056)   (0.328)   (0.139)   (0.242)   (0.189)   (0.062)
5         0.217     -0.320    0.349     0.489     0.081     0.021
          (0.003)   (0.181)   (0.079)   (0.113)   (0.078)   (0.001)
SPY
1         0.137     -0.229    0.231     0.480     0.049     0.004
          (0.003)   (0.135)   (0.069)   (0.107)   (0.074)   (0.001)
2         0.155     -0.298    0.553     0.219     0.062     0.025
          (0.004)   (0.215)   (0.083)   (0.123)   (0.132)   (0.004)
3         0.149     -0.001    0.497     0.309     -0.103    0.007
          (0.003)   (0.223)   (0.107)   (0.144)   (0.104)   (0.001)
4         0.260     -0.172    0.279     0.415     -0.045    0.309
          (0.053)   (0.334)   (0.132)   (0.215)   (0.180)   (0.055)
5         0.151     -0.197    0.340     0.484     0.152     0.015
          (0.003)   (0.151)   (0.081)   (0.098)   (0.094)   (0.001)

This table reports posterior means and standard deviations (indicated inside the
parentheses) of model parameters from the preferred CPHAR model. Our sample
period runs from February 2, 2004 to December 31, 2009.
Figure 1.1.: Change-point dates, CPHAR model

[Three rows of panels for BAC, IBM and SPY. Left: annual realized volatility. Right: change-point dates.]
Figure 1.2.: Posterior estimates, MHAR model, BAC

[Left: posterior structural break probabilities, κdt, κwt and κmt. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt.]
Figure 1.3.: Posterior estimates, MHAR model, IBM

[Left: posterior structural break probabilities, κdt, κwt and κmt. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt.]
Figure 1.4.: Posterior estimates, MHAR model, SPY

[Left: posterior structural break probabilities, κdt, κwt and κmt. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt.]
Figure 1.5.: Posterior estimates, MHAR-SV model, BAC

[Left: posterior structural break probabilities, κdt, κwt, κmt and κtSV. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt, and the time-varying conditional variance, σ2t.]
Figure 1.6.: Posterior estimates, MHAR-SV model, IBM

[Left: posterior structural break probabilities, κdt, κwt, κmt and κtSV. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt, and the time-varying conditional variance, σ2t.]
Figure 1.7.: Posterior estimates, MHAR-SV model, SPY

[Left: posterior structural break probabilities, κdt, κwt, κmt and κtSV. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt, and the time-varying conditional variance, σ2t.]
1.6.3. Forecasts
In this section, we briefly explain how we forecast with the different structural break models.
Thereafter, we present results for forecasting daily (h = 1), weekly (h = 5) and biweekly (h = 10) realized
volatility using the direct method of forecasting, see Marcellino et al. (2005), and Liu and Maheu
(2009).5 In general, we carry out a forecasting exercise for a specific out-of-sample period. This means
that we first estimate the models with an initial sample and forecast. We then add one data point,
estimate and forecast again, until we have consumed all the out-of-sample data. The following is a list
of the forecasting models used in this paper along with their acronyms
5 Liu and Maheu (2009) take a very similar approach to forecasting realized volatility within a Bayesian model averaging context. Furthermore, they justify using the direct method of forecasting both with regards to forecasts based on the predictive likelihood as well as forecasts based on the predictive mean.
1. M1: HAR: constant parameter HAR model.
2. M2: CPHAR: structural break HAR model using the specification of Chib (1998).
3. M3: MHAR: mixture innovation HAR model.
4. M4: MHAR-SV: MHAR model with structural breaks in the volatility of realized volatility.
5. M5: TVPHAR: random walk time-varying parameter HAR model. This specification is a
standard time-varying parameter model where βit = βit−1 + ηit, for i = d, w, m. As before, we
estimate the intercept and the jump component as constant coefficients. This specification
assumes typically small and gradual breaks in βt. Finally, notice that TVPHAR is a restricted
version of MHAR with κdt = κwt = κmt = 1, for t = 1, ..., T.
Tables 1.5 to 1.7 present results of our forecasting exercise for h = 1, h = 5 and h = 10, respectively.
For each series listed in Table 1.2, we choose the out-of-sample period from February 2, 2008 to
December 31, 2009, for a total of 483 observations. We first estimate the models using the initial
sample and forecast. Then, we add one data point, update and forecast again, until the end of the
out-of-sample data. This strategy works for HAR, MHAR, MHAR-SV and TVPHAR specifications
as we do not need to specify the number of structural breaks over the out-of-sample data.
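The expanding-window scheme just described can be sketched generically (`fit` and `predict` are placeholders for one full round of posterior sampling and the predictive-likelihood and predictive-mean computations of Section 1.2.6; the real exercise re-runs MCMC at every step):

```python
import numpy as np

def expanding_window(y, X, t0, fit, predict):
    """Generic recursive exercise: estimate on data up to t-1, forecast y_t,
    add the observation, and repeat until the out-of-sample data are consumed."""
    log_pl, point = [], []
    for t in range(t0, len(y)):
        draws = fit(y[:t], X[:t])                  # condition on Y_{t-1} only
        dens, mean = predict(y[t], X[t], draws)    # p(y_t | Y_{t-1}), E[y_t | Y_{t-1}]
        log_pl.append(np.log(dens))
        point.append(mean)
    return float(np.sum(log_pl)), np.array(point)
```

The accumulated sum of log predictive likelihoods and the vector of point forecasts are exactly the inputs to the log(PBF) and RMSE comparisons reported in Tables 1.5 to 1.7.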
In the context of forecasting with CPHAR, we follow Bauwens et al. (2011) and proceed as follows: for the first out-of-sample observation at time t, we calculate the marginal likelihood for various numbers of change points, 0, ..., K∗, where K∗ ∈ {1, 2, ...}, using Yt−1. Thereafter, we choose the optimal change-point number, K∗t−1, as the specification with the highest ML. We calculate the predictive likelihood, p(yt | Yt−1, M2), and the predictive mean, E[yt | Yt−1, M2], using the parameters associated with specification K∗t−1. We then add one data point, calculate marginal likelihoods for 0, ..., K∗t−1 + 1 change points, choose the optimal change-point number, K∗t, and repeat the above forecasting procedure to obtain p(yt+1 | Yt, M2) and E[yt+1 | Yt, M2]. Thus, we allow the optimal change-point number to vary over time, as the number of regimes can increase as time goes by.
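The recursive selection of the change-point number can be sketched as follows; select_num_breaks and make_log_ml are illustrative names of ours, and the log_ml callable stands in for the full CPHAR estimation and ML computation, which is not reproduced here:

```python
import numpy as np

def select_num_breaks(log_ml, k_max):
    """Pick the change-point count in {0, ..., k_max} with the highest
    (log) marginal likelihood; log_ml(k) stands in for estimating the
    CPHAR model with k change points and computing its ML."""
    values = [log_ml(k) for k in range(k_max + 1)]
    return int(np.argmax(values))

def recursive_selection(y, make_log_ml, k_init=0):
    """Expanding-window loop: at each date t, re-select K* on data up to
    t - 1, allowing at most one regime more than the previous optimum."""
    k_star, path = k_init, []
    for t in range(1, len(y)):
        k_star = select_num_breaks(make_log_ml(y[:t]), k_star + 1)
        path.append(k_star)  # the forecast for y[t] is made under k_star
    return path
```

In practice log_ml(k) would involve a full posterior simulation per k; the structure of the loop is the point here.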
We report the logarithm of PBF for M2, ...,M5 versus M1 and the ratio of RMSE for M2, ...,M5
over M1 for the out-of-sample data. Overall, we see that structural break specifications outperform
the HAR model both in terms of the predictive likelihood and point forecasts, especially as the forecast
horizon lengthens. For instance, for SPY, the log (PBF) in favor of MHAR over HAR is 50.54 for
h = 1, 60.64 for h = 5 and 72.54 for h = 10 (64.58 for h = 1, 62.84 for h = 5 and 74.50 for h = 10 for
CPHAR). With regards to point forecasts, we find that MHAR and CPHAR on average outperform
the HAR model by 5% to 10% for h = 1, 10% to 25% for h = 5 and h = 10. The TVPHAR model
outperforms the HAR model in terms of the predictive likelihood, regardless of forecast horizon. On
the other hand, TVPHAR performs slightly worse than HAR in terms of point forecasts for h = 1.
We also compare results across the models that allow for structural breaks. Here, we obtain very interesting results. For example, when we compare CPHAR with MHAR, we find that for some series and forecast horizons the CPHAR model performs better, while for other series the MHAR model performs better. For instance, we find that for BAC, MHAR outperforms CPHAR regardless of forecast horizon or criterion. Compared to CPHAR, the log (PBF) in favor of MHAR is 5.29 for h = 1, 15.08 for h = 5 and 29.61 for h = 10. In terms of point forecasts, compared to CPHAR, we see a reduction of 20% for h = 1, 24% for h = 5 and 26% for h = 10 in RMSE when we use the MHAR model. On
the other hand, we see that CPHAR outperforms MHAR by 8.14 for h = 1, 1.87 for h = 5 and 4.98
for h = 10 in terms of log (PBF) for XOM. Furthermore, these models tend to perform better than
TVPHAR in terms of the predictive likelihood, regardless of forecast horizon. However, TVPHAR
performs relatively better, especially in terms of point forecasts for h = 5 and h = 10. The difference
between the predictive likelihood and the predictive mean is that the predictive likelihood criterion
takes into account the whole shape of the predictive density, whereas the predictive mean does not.
Finally, the MHAR-SV model tends to dominate its homoskedastic counterpart as well as the
CPHAR model. Density forecasts show the most improvements, while point forecasts often show
only modest gains over MHAR or CPHAR. For example, MHAR-SV improves upon MHAR with
increases in the log(PL) of 7.17 for h = 1, 1.07 for h = 5 and 4.01 for h = 10 for SPY.
Table 1.5.: Out-of-Sample forecast results, RVt,h, h = 1
                     log(PBF)                                RMSE
Series    CPHAR     MHAR   MHAR-SV   TVPHAR      CPHAR    MHAR   MHAR-SV   TVPHAR
BA        20.09    17.52     19.23    17.51       0.97    0.98      0.97     1.03
BAC       90.14    95.44    101.27    92.62       0.97    0.78      0.77     0.83
CAT       26.36    22.76     26.23    17.90       0.96    1.00      0.98     1.01
GE        81.62    75.51     83.02    74.00       0.96    0.89      0.88     0.91
IBM       36.77    26.88     32.38    24.63       0.94    0.97      0.96     1.01
JNJ       28.68    36.35     41.68    35.83       0.95    0.86      0.89     1.01
JPM       56.06    62.00     65.36    56.68       0.97    0.86      0.85     0.90
PEP       42.01    47.43     52.68    49.19       0.96    0.88      0.87     0.94
SPY       64.58    50.54     57.72    47.93       0.93    0.91      0.90     1.00
WMT       19.12    27.06     35.62    27.31       0.95    0.86      0.86     0.95
XOM       47.61    39.47     45.67    37.73       0.93    0.89      0.88     0.99
This table reports the log-predictive Bayes factor, log (PBF), of the model of interest versus the HAR model, and the
out-of-sample root mean squared error, RMSE, for the predictive mean of the model of interest over the HAR model.
The out-of-sample period is from February 2, 2008 to December 31, 2009.
Table 1.6.: Out-of-Sample forecast results, RVt,h, h = 5
                     log(PBF)                                RMSE
Series    CPHAR     MHAR   MHAR-SV   TVPHAR      CPHAR    MHAR   MHAR-SV   TVPHAR
BA        42.80    43.25     41.97    38.92       0.86    0.88      0.85     0.89
BAC      110.81   125.89    126.28   110.12       0.91    0.69      0.67     0.81
CAT       47.09    50.31     48.67    46.19       0.85    0.87      0.86     0.87
GE       114.89   117.22    114.02   105.46       0.81    0.80      0.77     0.96
IBM       43.69    44.08     49.02    38.29       0.89    0.92      0.91     0.85
JNJ       58.30    53.73     64.30    50.34       0.87    0.89      0.84     1.02
JPM       91.58    96.45     99.46    86.80       0.79    0.75      0.71     0.91
PEP       65.13    63.61     70.49    58.66       0.90    0.85      0.84     1.04
SPY       62.84    60.64     61.71    59.08       0.88    0.93      0.87     0.93
WMT       49.99    47.73     50.56    43.12       0.87    0.84      0.82     1.07
XOM       55.79    53.92     59.26    49.13       0.90    0.85      0.80     0.89
This table reports the log-predictive Bayes factor, log (PBF), of the model of interest versus the HAR model, and the
out-of-sample root mean squared error, RMSE, for the predictive mean of the model of interest over the HAR model.
The out-of-sample period is from February 2, 2008 to December 31, 2009.
Table 1.7.: Out-of-Sample forecast results, RVt,h, h = 10
                     log(PBF)                                RMSE
Series    CPHAR     MHAR   MHAR-SV   TVPHAR      CPHAR    MHAR   MHAR-SV   TVPHAR
BA        43.70    55.75     55.53    46.81       0.80    0.80      0.81     0.75
BAC      120.79   150.40    144.17   127.49       0.85    0.63      0.64     0.72
CAT       64.62    72.21     64.59    67.34       0.77    0.79      0.77     0.78
GE       128.75   135.01    124.37   115.61       0.71    0.70      0.72     0.85
IBM       55.58    57.53     57.30    54.22       0.83    0.85      0.82     0.73
JNJ       66.98    68.59     79.90    62.66       0.82    0.82      0.79     1.10
JPM       96.42   107.35    107.19    84.41       0.72    0.68      0.68     0.80
PEP       72.52    72.06     78.91    65.56       0.86    0.83      0.80     1.09
SPY       74.50    72.54     76.54    68.31       0.81    0.86      0.84     0.79
WMT       56.41    58.51     64.77    51.20       0.88    0.82      0.81     0.97
XOM       68.19    63.20     69.12    60.65       0.86    0.77      0.72     0.80
This table reports the log-predictive Bayes factor, log (PBF), of the model of interest versus the HAR model, and the
out-of-sample root mean squared error, RMSE, for the predictive mean of the model of interest over the HAR model.
The out-of-sample period is from February 2, 2008 to December 31, 2009.
1.7. Prior Sensitivity Analysis
1.7.1. CPHAR
In this section, sensitivity of the results to the prior specification is evaluated by investigating alternative prior hyperparameter values on the transition probabilities, pj ∼ Beta (a0, b0), j = 1, ..., m − 1, keeping the prior values of the other parameters the same as in Section 1.6.1. This parameter is important because it controls the duration of each regime. Models with different hyperparameter values on βj and σ²j are also estimated; results are very similar to those given in Tables 1.5 to 1.7.
We repeat the forecasting exercise of Section 1.6.3 using the SPY data, while experimenting with different values of a0 and b0. Results are reported in Table 1.8. The first alternative prior, pj ∼ Beta (0.1, 0.1), is relatively flat, while the last, pj ∼ Beta (100, 0.1), is very tight. For instance, under pj ∼ Beta (100, 0.1), we assume a priori that the expected duration of each regime is about 1000 days before we see the data. As before, results overwhelmingly suggest existence of structural breaks over the out-of-sample data as log (PBF M2,M1) > 0. Furthermore, the choice of prior hyperparameter values on P is of limited importance.
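The implied prior duration can be checked with a one-line calculation (a sketch under the approximation that the mean regime duration 1/(1 − p) is evaluated at the prior mean of p):

```python
def expected_duration(a0, b0):
    """Approximate prior expected regime duration under p ~ Beta(a0, b0):
    a regime with stay-probability p lasts 1 / (1 - p) periods on average,
    evaluated here at the prior mean E[p] = a0 / (a0 + b0)."""
    p_mean = a0 / (a0 + b0)
    return 1.0 / (1.0 - p_mean)
```

For Beta(100, 0.1) this gives roughly 1000 days, matching the statement above, while a flat Beta(0.5, 0.5) prior implies only two days.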
Table 1.8.: Prior sensitivity analysis, CPHAR model, SPY
                                 log(PBF)                      RMSE
Prior                     h = 1   h = 5   h = 10     h = 1   h = 5   h = 10
pj ∼ Beta (0.1, 0.1)      64.28   62.71    72.60      0.93    0.88     0.81
pj ∼ Beta (8, 0.1)        64.13   62.72    73.27      0.94    0.88     0.82
pj ∼ Beta (10, 0.1)       64.06   65.00    72.60      0.94    0.88     0.81
pj ∼ Beta (8, 2)          63.70   63.89    72.61      0.94    0.88     0.81
pj ∼ Beta (20, 2)         64.57   65.10    72.60      0.93    0.88     0.81
pj ∼ Beta (100, 0.1)      64.56   65.16    74.70      0.93    0.88     0.81
This table reports the log-predictive Bayes factor, log (PBF), of the model of interest versus the HAR model, and the
out-of-sample root mean squared error, RMSE, for the predictive mean of the model of interest over the HAR model.
The results are for the CPHAR model considering six different prior hyperparameter values on P . The out-of-sample
period is from February 2, 2008 to December 31, 2009.
1.7.2. MHAR
We consider sensitivity of the results to prior hyperparameter values on πi, where πi ∼ Beta (ai0, bi0),
for i = d,w,m. The forecasting exercise of Section 1.6.3 is repeated using the SPY data. We experi-
ment with different prior hyperparameter values on πi, keeping the other hyperparameter values the
same as in Section 1.6.1. Results are presented in Table 1.9. These different prior hyperparameter values overall yield fairly similar results. However, for πi ∼ Beta (2, 8), we obtain somewhat better results in terms of log (PBF) and RMSE for h = 10.
Table 1.9.: Prior sensitivity analysis, MHAR model, SPY
                                 log(PBF)                      RMSE
Prior                     h = 1   h = 5   h = 10     h = 1   h = 5   h = 10
πi ∼ Beta (0.5, 27)       50.58   60.87    73.80      0.88    0.93     0.86
πi ∼ Beta (0.5, 47)       50.54   61.24    73.22      0.89    0.93     0.86
πi ∼ Beta (0.5, 8)        50.04   61.34    73.96      0.90    0.93     0.86
πi ∼ Beta (2, 8)          49.12   61.79    71.69      0.90    0.92     0.82
πi ∼ Beta (0.5, 100)      50.45   60.23    73.35      0.91    0.93     0.87
πi ∼ Beta (0.5, 1000)     50.54   60.96    72.12      0.92    0.93     0.87
This table reports the log-predictive Bayes factor, log (PBF), of the model of interest versus the HAR model, and the
out-of-sample root mean squared error, RMSE, for the predictive mean of the model of interest over the HAR model.
The results are for the MHAR model considering different prior hyperparameter values on π. The out-of-sample period
is from February 2, 2008 to December 31, 2009.
1.8. Conclusion
In this paper we compare different forecasting procedures which allow for structural breaks in the
model parameters using realized volatility data. Our set of forecasting models is divided into three groups: a constant parameter model, HAR; one which formally specifies the number of structural breaks, CPHAR; and those which determine the nature of structural changes in the parameters using mixture innovation (MIA) and random walk specifications.
The empirical application provides some interesting results. First, we add to the literature establishing the existence of structural breaks in realized volatility data. Second, our results also show the
importance of using a forecasting method that allows for some sort of structural changes in the model
parameters. Furthermore, perhaps as expected, we cannot establish that there is one single forecasting
method that can be recommended universally. On the contrary, we find that for some series the MHAR
model outperforms the CPHAR model, whereas for other series the CPHAR model works better.
Finally, when we account for structural breaks in the volatility of realized volatility in the MHAR
model, we find that this specification tends to dominate its homoskedastic counterpart as well as
the CPHAR model. Density forecasts show the most improvements, while point forecasts show only
modest gains compared to competing specifications.
2. Long Memory and Structural Breaks in
Realized Volatility: An Irreversible Markov
Switching Approach
Author: Nima Nonejad
Abstract: We propose an ARFIMA (autoregressive fractionally integrated moving average) model
that is able to capture long memory and incorporate structural breaks in the model parameters. We
model structural breaks through irreversible Markov switching or so-called change-point dynamics.
Monte Carlo simulations demonstrate that our approach is effective in estimating the model parameters and in identifying and dating structural breaks. Applied to daily S&P 500 data, we find evidence of
four structural breaks. The evidence of structural breaks is robust to different specifications including
a GARCH specification for the conditional volatility of realized volatility.
Keywords: change points, Gibbs sampling, long memory, structural breaks
(JEL: C11, C22, C52, G10)
2.1. Introduction
Measuring and modeling volatility is a very important issue in many pricing and risk management
problems. Recently, a new observable measure of volatility, called realized volatility (RV), has been proposed. Realized volatility uses high-frequency data information and has been shown to be an accurate estimate of ex-post volatility. RV is constructed from the sum of intraday squared returns,
and converges to quadratic variation for a broad class of continuous time models. Many empirical
features of RV are very well-documented in recent literature. For instance, a detailed review has been
provided by McAleer and Medeiros (2008). One of the most well-known and relevant properties of
RV is the strong serial dependence, see Andersen et al. (2001) and Andersen et al. (2007). For
this reason, long-memory models such as the ARFIMA (autoregressive fractionally integrated moving
average) model have been applied to RV data. Other strategies to model serial dependencies for
RV have also been proposed. For instance, Barndorff-Nielsen and Shephard (2002a) model serial
dependencies for RV through a superposition of ARMA(1,1) processes, and Corsi (2009) has introduced
the Heterogeneous Autoregressive model (HAR) given by a combination of volatilities measured over
different time horizons.
In this paper we provide a Bayesian analysis of structural breaks in daily S&P 500 realized volatility.
We propose an ARFIMA model in which the level, persistence and volatility of realized volatility
parameters are subject to structural breaks. The basis of the analysis is on an ARFIMA model
which builds on the Hidden Markov Chain (HMC) formulation of the multiple change-point model
proposed by Chib (1998). Breaks are captured through an integer-valued state variable, st, that tracks
the regime from which a particular observation, yt, is drawn. st is modeled as a discrete first order
Markov process with a constrained transition probability matrix. At each point in time, st can either
remain in the current state or jump to the next state.¹
We investigate specifications which allow all parameters, as well as only a subset of parameters, to change due to structural breaks. This allows one to isolate the impact of structural breaks on individual parameters and to use all data in the estimation of parameters that are not affected by structural breaks. Each change-point ARFIMA model is estimated conditional on 0, 1, ..., m breaks occurring.
For each of these specifications, the marginal likelihood (ML) and the deviance information criterion
(DIC) are calculated. They are then used to determine the number of change points. Specifically, we
can compare marginal likelihoods using Bayes factors, and use differences in DIC between different
specifications to compare models or to determine the number of structural breaks. It is important to
note that DIC can be considered as a compelling alternative to ML. Furthermore, calculation of DIC
in our MCMC scheme is trivial as the likelihood with st, t = 1, ..., T , integrated out is easily obtainable
using the algorithm of Chan (2013).
Our contributions in this paper are two-fold. First, we provide an efficient Markov chain Monte
Carlo sampling scheme to draw st, t = 1, ..., T , and the parameters within each regime, θk, k = 1, ...,m,
from their respective conditional posteriors. Furthermore, instead of using traditional approaches to
evaluate the likelihood function such as Chan and Palma (1998), we build upon previous works on
precision-based algorithms as in Chan and Jeliazkov (2009) and Chan (2013), using a direct approach
to evaluate the likelihood function. We believe that incorporating the precision-based algorithm of
¹ Many papers have studied testing for structural breaks, or directly modeling parameter change. See, for instance, Andreou and Ghysels (2002), Ray and Tsay (2002) and Engle and Rangel (2005).
Chan and Jeliazkov (2009) and Chan (2013) along with the change-point specification of Chib (1998)
within the ARFIMA setting is the most important contribution that we provide. Furthermore, we also
conduct an extensive Monte Carlo experiment to investigate if our methods work well in identifying
data generating parameters, true structural break dates and the correct number of structural breaks.
With regards to the last point, we compare the ability of ML and DIC to detect the correct number
of structural breaks. Our simulations, based on empirically reasonable scenarios, show that ML and
DIC perform very well in identifying the true number of structural breaks. The higher the number
of parameters that are affected by a break, the more likely it is that structural breaks are correctly
identified. Finally, it becomes more difficult to identify one or more breaks when only the persistence
parameter, d, changes.
Empirical results for S&P 500 volatility provide strong evidence in favor of four structural breaks
based on data from January 2nd, 2000 to December 31st, 2009, for a total of 2515 trading days. The
effect of structural breaks is mainly confined to the conditional mean and variance with weaker evidence
that the persistence parameter is also subject to structural breaks. Finally, in order to investigate if
the existence of breaks is spurious due to neglected conditional variance dynamics, we also consider
breaks in an ARFIMA-GARCH model. Again, the evidence is strong in favor of structural breaks,
and the estimated change-point dates are close to those of the change-point ARFIMA model.
The structure of this paper is as follows. In Section 2.2 we present the change-point ARFIMA
model. Bayesian estimation techniques and model comparison methods are presented in Section 2.3.
Section 2.4 presents the Monte Carlo results. We briefly review the theory behind realized volatility
in Section 2.5. Section 2.6 is the application to S&P 500 volatility, while Section 2.7 concludes. An appendix explains how to evaluate the likelihood using the algorithm of Chan (2013).
2.2. Change-point ARFIMA Model
Consider the following ARFIMA model

yt = µ + (1 − L)^{−d} εt,   εt ∼ N(0, σ²),   (2.2.1)

for t = 1, ..., T , where yt is the actual observation, L is the lag operator such that Lεt = εt−1, and d determines the long-memory property of yt. The fractional difference operator, (1 − L)^{−d}, in (2.2.1) is defined as (1 − L)^{−d} = Σ^∞_{j=0} Γ(j + d)(Γ(j + 1)Γ(d))^{−1} L^j , where Γ(·) is the Gamma function. Equation (2.2.1) is a generalization of the moving average (MA) model to non-integer values of d. Specifically, if d > 0, the process is said to have long memory, since the autocorrelations die out at a hyperbolic rate. For 0 < d < 0.5, (2.2.1) is a stationary long-memory process with a non-summable autocorrelation function. For d = 0, we have yt = µ + εt.
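The MA weights πj = Γ(j + d)/(Γ(j + 1)Γ(d)) can be computed without Gamma-function overflow via the recursion πj = πj−1 (j − 1 + d)/j; a short sketch (our naming):

```python
import numpy as np

def frac_diff_weights(d, M):
    """MA weights pi_j = Gamma(j+d) / (Gamma(j+1) Gamma(d)) of (1-L)^{-d},
    truncated at lag M, computed by the stable recursion
    pi_j = pi_{j-1} * (j - 1 + d) / j with pi_0 = 1."""
    pi = np.empty(M + 1)
    pi[0] = 1.0
    for j in range(1, M + 1):
        pi[j] = pi[j - 1] * (j - 1 + d) / j
    return pi
```

Sanity checks: d = 0 reduces to white noise around µ (all higher weights zero), while d = 1 gives the unit-root accumulation with all weights equal to one.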
There are many ways to estimate (2.2.1), see Beran (1994) and Robinson (2003). In this paper we
focus on MCMC methods (in particular Gibbs sampling) for inference. We rely on the main idea of
Chan and Palma (1998). Specifically, Chan and Palma (1998) consider an approximation of (2.2.1)
based on a truncation lag of order M . Thereafter, the likelihood is computed using the Kalman
filter. However, instead of using the Kalman filter, we take a different approach to evaluate the
likelihood function. Our approach extends previous works on precision-based algorithms, see Chan
and Jeliazkov (2009) and Chan (2013). The aforementioned method exploits the special structure of
(2.2.1), particularly that the covariance matrix for the joint distribution of YT = (y1, ..., yT)′ is sparse, i.e. it contains only a few non-zero elements. By exploiting the sparse structure of the covariance matrix of YT , we are able to develop an easy and fast method for evaluating the likelihood function.

Conditional on the model parameters, θ = (µ, d, σ²)′, and M , we can write (2.2.1) as YT = u + Hε, where u = µι, ι is a T × 1 vector of ones, ε = (ε1, ..., εT)′ ∼ N(0, S_YT) with S_YT = σ²I_T , and H is a T × T lower triangular matrix with ones on the main diagonal (zeros above the diagonal are omitted),

H = [ 1
      π1     1
      π2     π1     1
      ⋮             ⋱     ⋱
      πM    πM−1    ···    π1     1
      0      πM    πM−1    ···    π1     1
      ⋮             ⋱      ⋱             ⋱     ⋱
      0      0      ···    πM    πM−1    ···    π1     1 ],

where πj = Γ(j + d)/(Γ(j + 1)Γ(d)). Using the algorithm of Chan (2013), it is shown in the Appendix that p(YT | θ, M) has a closed form solution given as

log p(YT | θ, M) = −(T/2) log(2π) − (T/2) log(σ²) − (1/2)(YT − u)′ Ω_YT⁻¹ (YT − u),   (2.2.2)

where Ω_YT = H S_YT H′ and p(·) denotes the density of all random quantities. Allowing for structural
breaks in the parameters of (2.2.1) is straightforward using the change-point structure proposed by
Chib (1998). This specification uses a hidden Markov model with a restricted transition matrix to
model change points. A test for the number of breaks is then a test of the dimension of the hidden
Markov chain. θ and the unobserved states are jointly estimated conditional on a fixed number of
change points. ML and DIC are then used to compare evidence for the number of structural breaks.
In the following, we review a Gibbs sampling approach for the model in which all the elements in θ
are subject to structural breaks. Thereafter, we discuss restricted models in which some parameters
are restricted to be constant across structural breaks. Specifically, the restricted specifications allow
us to determine which parameters of the model are most likely affected by a structural break.
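As an illustration of the closed form (2.2.2), the following sketch (our own, not the thesis code) builds the banded H from the truncated weights πj; since |H| = 1, log|Ω_YT| = T log σ² and the quadratic form reduces to ε′ε/σ² after one triangular solve. A dense solve is used for simplicity, whereas the precision-based algorithm exploits the band structure:

```python
import numpy as np

def arfima_loglik(y, mu, d, sigma2, M):
    """Evaluate log p(Y_T | theta, M) of (2.2.2) for the truncated model
    Y_T = mu + H eps, eps ~ N(0, sigma2 I). H is unit lower triangular
    with the weights pi_j on its first M sub-diagonals, so |H| = 1 and
    the quadratic form is eps'eps / sigma2 after solving H eps = Y_T - mu."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    pi = np.empty(M + 1)
    pi[0] = 1.0
    for j in range(1, M + 1):
        pi[j] = pi[j - 1] * (j - 1 + d) / j
    H = np.eye(T)
    for j in range(1, min(M, T - 1) + 1):
        H += np.diag(np.full(T - j, pi[j]), k=-j)
    eps = np.linalg.solve(H, y - mu)
    return -0.5 * T * np.log(2.0 * np.pi * sigma2) - 0.5 * eps @ eps / sigma2
```

With d = 0 this collapses to the sum of independent N(µ, σ²) log-densities, which provides an easy check.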
Assume that there are m − 1, m ∈ {1, 2, ...}, change points at unknown times, τ1, τ2, ..., τm−1. Separated by those change points, there are m different phases. The density of yt depends on θk = (µk, dk, σ²k)′, k = 1, 2, ..., m, whose value changes at the change points, τ1, τ2, ..., τm−1, and remains constant otherwise:

θt =  θ1      if t < τ1
      θ2      if τ1 ≤ t < τ2
      ⋮
      θm−1    if τm−2 ≤ t < τm−1
      θm      if τm−1 ≤ t.        (2.2.3)
Let S = (s1, ..., sT)′, where st = k indicates that yt is from regime k. The one-step-ahead transition
probability matrix for st is given as

P = [ p11    p12    0      ···           0
      0      p22    p23    ···           0
      ⋮              ⋱      ⋱            ⋮
      0      ···    0      pm−1,m−1    pm−1,m
      0      0      ···    0             1 ],   (2.2.4)

where plk = Pr(st = k | st−1 = l), with k = l or k = l + 1, is the probability of moving from regime l at time t − 1 to regime k at time t. Equation (2.2.4) ensures that given st = k at time t, in the next period, t + 1, st+1 either remains in the same state or jumps to the next state. For instance, given st = k, one has st+1 = k or st+1 = k + 1, with pk,k + pk,k+1 = 1. Once the last regime is reached, one stays there forever, that is, pm,m = 1. This structure enforces the ordering (2.2.3) on the change points.²
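The restricted matrix (2.2.4) is straightforward to construct; a small sketch (our naming):

```python
import numpy as np

def changepoint_transition_matrix(p_stay):
    """Build the restricted m x m matrix of (2.2.4): from regime k one
    either stays with probability p_stay[k] or moves on to regime k + 1;
    the final regime m is absorbing (p_mm = 1)."""
    m = len(p_stay) + 1
    P = np.zeros((m, m))
    for k, p in enumerate(p_stay):
        P[k, k] = p          # stay in regime k
        P[k, k + 1] = 1.0 - p  # jump to the next regime
    P[m - 1, m - 1] = 1.0    # last regime is absorbing
    return P
```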
2.3. Bayesian Estimation
To conduct model estimation, we jointly estimate the long-memory dynamics, P and S. However,
model estimation is not straightforward. First, S is not observable. Second, there is no standard way
to draw µk or dk from its conditional posterior density. Although the joint posterior density of the
model, p (P, θ,M, S | YT ), is not a well-known density, samples from it can be obtained using Gibbs
sampling and Metropolis-Hastings (M-H).
The parameters are divided into four blocks: S, θ = {θk}^m_{k=1}, M and P . The Gibbs sampler requires the following steps: first, choose starting values for P , θ and M , i.e. P(0), θ(0), M(0), and set i = 1. Then iterate over

1. S(i) | P(i−1), θ(i−1), M(i−1), YT .

2. {θ(i)_k}^m_{k=1} | M(i−1), S(i), YT .

3. M(i) | θ(i), S(i), YT .

4. P(i) | S(i).

5. Set i = i + 1 and go to the first step.
Notice that in step 2 of iteration i of the Gibbs sampler, each element of θk is updated one-at-a-time.
After dropping a set of burn-in samples, the remaining draws are collected for inference. For N large
enough, any function of interest can be consistently estimated. For instance,

f̂(θ) = (1/N) Σ^N_{i=1} f(θ(i))

is a consistent estimate of E[f(θ) | YT ], the posterior mean of f(θ). We run the chain from different
starting values and compute convergence diagnostics such as Geweke (1992) to ensure that the draws
² Ray and Tsay (2002) exploit the state-space representation of (2.2.1) to derive inference for long-memory processes with random level shifts. Raggi and Bordignon (2010) consider a two-state Markov switching ARFIMA model, i.e. m = 2, p21 ≠ 0, 0 < p22 < 1, and use a permutation scheme to identify the two states.
have converged to p (P, θ,M, S | YT ). Below, more details are provided on each step of the Gibbs
sampling procedure.
Step 1: Simulation of S | P, θ,M, YT . Chib (1998) shows that a joint draw of S can be achieved
using
p(S | P, θ, M, YT) = p(sT | P, θ, M, YT) ∏^{T−1}_{t=1} p(st | st+1, P, θ, M, Yt)   (2.3.1)
in which one samples sequentially from each density on the right-hand side of (2.3.1) beginning with
p (sT | P, θ,M, YT ), and then p (st | st+1, P, θ,M, Yt), t = T − 1, ..., 1. At each step, one conditions on
the previously drawn state, st+1, until a full draw of S is obtained. The individual densities in (2.3.1)
are obtained based on the following steps:
(a) Initialization: At t = 1, set p (s1 = 1 | P, θ,M, Y1) = 1.
(b) Compute the Hamilton (1989) filter, p(st = k | P, θ, M, Yt). This involves a prediction and an update step, in which one iterates on the following for t = 2, ..., T :

p(st = k | P, θ, M, Yt−1) = Σ^k_{l=k−1} p(st−1 = l | P, θ, M, Yt−1) plk,   k = 1, ..., m,   (2.3.2)

p(st = k | P, θ, M, Yt) = [ p(st = k | P, θ, M, Yt−1) p(yt | θ, M, Yt−1, st = k) ] / [ Σ^m_{l=1} p(st = l | P, θ, M, Yt−1) p(yt | θ, M, Yt−1, st = l) ],   k = 1, ..., m.
The last equation is obtained from Bayes’ rule. Note that in (2.3.2) the summation is only from k− 1
to k, due to the restricted nature of the transition matrix. Furthermore, p (yt | θ,M, Yt−1, st = k) has
a closed form solution, see (2.2.2).
(c) Finally, Chib (1998) shows that the individual densities in (2.3.1) are
p (st | st+1, P, θ,M, Yt) ∝ p (st | P, θ,M, Yt) p (st+1 | st, P ) .
Thus, given sT = m, st is drawn backwards over t = T − 1, T − 2, ..., 2 as

st | st+1, P, θ, M, Yt =  st+1        with probability ct,
                          st+1 − 1    with probability 1 − ct,

where, with st+1 = k,

ct = p(st = k | P, θ, M, Yt) p(st+1 = k | st = k, P) / Σ^k_{l=k−1} p(st = l | P, θ, M, Yt) p(st+1 = k | st = l, P).
Finally, note that p (s1 = 1 | s2, P, θ,M, Y1) = 1.
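Steps (a) to (c) can be condensed into a short sketch (our naming; lik[t, k] stands for the conditional density p(yt | θ, M, Yt−1, st = k), which in the model comes from (2.2.2)):

```python
import numpy as np

def ffbs_states(lik, P, rng):
    """Chib (1998) draw of S: forward Hamilton filter over the restricted
    transition matrix P, then backward sampling from s_T = m down to s_1.
    lik[t, k] is the conditional density of y_t in regime k (caller-supplied)."""
    T, m = lik.shape
    filt = np.zeros((T, m))
    filt[0, 0] = 1.0                      # initialization: s_1 = 1 w.p. one
    for t in range(1, T):
        pred = filt[t - 1] @ P            # prediction step
        upd = pred * lik[t]
        filt[t] = upd / upd.sum()         # update step (Bayes' rule)
    s = np.zeros(T, dtype=int)
    s[T - 1] = m - 1                      # last observation is in regime m
    for t in range(T - 2, -1, -1):        # backward draws given s_{t+1}
        probs = filt[t] * P[:, s[t + 1]]
        s[t] = rng.choice(m, p=probs / probs.sum())
    return s + 1                          # 1-based regime labels
```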
Step 2: Simulation of θk |M,S, YT . For each regime, the conditional posterior of θk depends only on
information in regime k. Furthermore, compared to σ2k, sampling µk and dk is more complicated since
their conditional posteriors do not have closed form. Therefore, the Metropolis-Hastings algorithm is
used. Let Yk = {yt : st = k} denote the observations in regime k. We sample µk and dk, k = 1, ..., m, one-at-a-time. For example, µk is sampled in the following way:
1. Sample a candidate, µ∗k, from a random walk proposal, q(µ∗k | µ(i−1)_k) ∼ N(µ(i−1)_k, Σk), where Σk is chosen by the researcher to ensure a sufficient acceptance rate. We follow Koop (2003, page 98) and adjust Σk to obtain an acceptance rate of roughly 30 to 40%. We do this by experimenting with different values of Σk until we find one which yields a reasonable acceptance rate.

2. Define the acceptance probability of µ∗k as

a_MH(µ∗k, µ(i−1)_k) = min{ 1, [ p(µ∗k | d(i−1)_k, σ²(i−1)_k, M(i−1), Y(i)_k) q(µ(i−1)_k | µ∗k) ] / [ p(µ(i−1)_k | d(i−1)_k, σ²(i−1)_k, M(i−1), Y(i)_k) q(µ∗k | µ(i−1)_k) ] }.   (2.3.3)

3. Draw u from U(0, 1). If u ≤ a_MH(µ∗k, µ(i−1)_k), set µ(i)_k = µ∗k; else set µ(i)_k = µ(i−1)_k.
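Since the random walk proposal is symmetric, q(µ∗k | µ(i−1)_k) = q(µ(i−1)_k | µ∗k) and the q terms in (2.3.3) cancel. A generic one-parameter sketch of steps 1 to 3 (our naming; logpost stands for the log conditional posterior, which is not derived here):

```python
import numpy as np

def rw_metropolis(logpost, x0, scale, n, rng):
    """Random-walk Metropolis for one scalar parameter: propose
    x* ~ N(x, scale^2) and accept with probability min(1, posterior ratio);
    the symmetric proposal cancels in the acceptance ratio."""
    draws = np.empty(n)
    x, lp = x0, logpost(x0)
    accepted = 0
    for i in range(n):
        x_new = x + scale * rng.standard_normal()
        lp_new = logpost(x_new)
        if np.log(rng.uniform()) <= lp_new - lp:  # M-H accept/reject
            x, lp = x_new, lp_new
            accepted += 1
        draws[i] = x
    return draws, accepted / n
```

Tuning scale up lowers the acceptance rate and vice versa, which is the adjustment of Σk described in step 1.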
Finally, σ²k | µk, dk, M, Yk ∼ IG(νk/2, lk/2), where IG(·/2, ·/2) stands for the inverse Gamma density, see Kim and Nelson (1999), νk = Tk + ν0 and lk = ε′k εk + l0. Tk is the number of observations in regime k and εk = {εt : st = k}. ν0 and l0 are the prior hyperparameter values.
Step 3: Simulation of M | θ, S, YT . In order to sample M from its conditional posterior, we use the same method as in Raggi and Bordignon (2012). A truncation parameter, M∗, is proposed from a discretized Laplace proposal, q(M∗ | M(i−1)) = (λ/2) exp(−λ|M∗ − M(i−1)|), where λ = 0.1 in order to obtain small moves. The Metropolis-Hastings acceptance probability is given as

a_MH(M∗, M(i−1)) = min{ 1, [ p(M∗ | θ(i), S(i), YT) q(M(i−1) | M∗) ] / [ p(M(i−1) | θ(i), S(i), YT) q(M∗ | M(i−1)) ] }.

Step 4: Simulation of P | S. Assume that pkk ∼ Beta(a0, b0). The conditional posterior for each diagonal component of P is then pkk | S ∼ Beta(a0 + nkk, b0 + 1), k = 1, ..., m − 1, where nkk is the number of one-step transitions from state k to state k in the sequence S.
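Step 4 is a conjugate update; a small sketch of the counting involved (our naming, with 1-based regime labels as in the text):

```python
import numpy as np

def transition_posteriors(s, a0, b0):
    """Posterior Beta parameters (a0 + n_kk, b0 + 1) for each non-absorbing
    diagonal element p_kk of P, where n_kk counts one-step stays in regime
    k along the sampled path S (1-based regime labels)."""
    m = int(np.max(s))
    out = []
    for k in range(1, m):  # the last regime is absorbing; p_mm = 1 is fixed
        n_kk = int(np.sum((s[:-1] == k) & (s[1:] == k)))
        out.append((a0 + n_kk, b0 + 1))
    return out
```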
2.3.1. Breaks in µ and d

Suppose that only µ and d are subject to structural breaks. Thus, we have µk and dk for k = 1, ..., m, while the conditional variance, σ², is modeled as a non-time-varying parameter. Modeling this specification is straightforward because, as before, we can use the conditioning properties of the Gibbs sampler.

Specifically, in order to sample µ(i)_k and d(i)_k for k = 1, ..., m, we use S(i), σ²(i−1), M(i−1) and Y(i)_k = {yt : s(i)_t = k}, and perform M-H to obtain µ(i)_k and d(i)_k using (2.3.3). Conditional on S(i), µ(i)_1, ..., µ(i)_m, d(i)_1, ..., d(i)_m, M(i−1) and YT , we then draw σ²(i) from the inverse Gamma density. The remaining parameters, M(i) and P(i), are also sampled conditional on S(i), µ(i)_k, d(i)_k, k = 1, ..., m, σ²(i) and YT , using Step 3 and Step 4 from the previous section.
2.3.2. Only breaks in σ²

Now suppose only σ² changes between regimes, while µ and d are constant. In this case, we draw σ²(i)_k from the inverse Gamma density, IG(νk/2, lk/2), for k = 1, ..., m, using S(i) from Step 1, µ(i−1), d(i−1), M(i−1) and YT . Thereafter, we stack the σ²(i)_k using st = k, and construct a vector of time-varying conditional variances, σ²(i)_t, t = 1, ..., T . To complete the cycle for θ(i), we sample µ(i) and then d(i) conditional on σ²(i)_1, ..., σ²(i)_T , M(i−1) and YT .

Finally, note that we can consider breaks in the conditional variance with only partial breaks in µ or d by combining the methods in the last two subsections.
2.3.3. Bayes factors and marginal likelihood computation

Let ℳ denote a model parametrization in which some or all parameters are subject to breaks. The marginal likelihood (ML) of model ℳ is defined as

p(YT | ℳ) = ∫ p(YT | P, θ, M, ℳ) p(P, θ, M | ℳ) dP dθ dM.   (2.3.4)

The marginal likelihood is a measure of the success the model has in accounting for the data after parameter uncertainty has been integrated out over the prior, p(P, θ, M | ℳ). p(YT | P, θ, M, ℳ) is the likelihood function with S integrated out. It is calculated as

log p(YT | P, θ, M, ℳ) = Σ^T_{t=1} log p(yt | P, θ, M, Yt−1, ℳ),   (2.3.5)

where

p(yt | P, θ, M, Yt−1, ℳ) = Σ^m_{k=1} p(yt | θ, M, Yt−1, st = k, ℳ) p(st = k | P, θ, M, Yt−1, ℳ).   (2.3.6)

The last term on the right-hand side of (2.3.6) is computed from (2.3.2). In the following, the model index, ℳ, is suppressed for conciseness. As we shall see in the next sections, (2.3.5) is essential with regards to marginal likelihood (ML) and DIC computation.
In order to compute ML, we rely on the method of Gelfand and Dey (1994), henceforth G-D, see Geweke (2005) and Liu and Maheu (2008).³ In general, the G-D method is based on the quantity

(1/N) Σ^N_{i=1} g(θ(i)) / [ p(YT | θ(i), M(i)) p(θ(i), M(i)) ] → p(YT)⁻¹ as N → ∞.   (2.3.7)

It applies to any posterior simulator, no matter what algorithm is used. The prior, p(θ(i), M(i)), can be evaluated directly, and p(YT | θ(i), M(i)) is calculated by substituting θ(i) into the likelihood function, (2.3.5) and (2.3.6). Gelfand and Dey (1994) show that if g(θ) is thin-tailed relative to p(YT | θ, M) p(θ, M), then (2.3.7) is bounded and the estimator is consistent. Following Geweke (2005), the truncated Normal distribution, TN(θ∗, Σ∗), is used for g(θ), where θ∗ and Σ∗ are the posterior sample moments, θ∗ = N⁻¹ Σ^N_{i=1} θ(i) and Σ∗ = N⁻¹ Σ^N_{i=1} (θ(i) − θ∗)(θ(i) − θ∗)′, and g is evaluated only when θ(i) is in the domain of the truncated Normal. The domain, Θ, is defined as

Θ = { θ : (θ − θ∗)′ (Σ∗)⁻¹ (θ − θ∗) ≤ χ²_α(z) },
³ Ideally, we would prefer to compute ML using the method of Chib (1995). However, calculating ML using the method of Chib (1995) is computationally very demanding, as we sample µk, dk, k = 1, ..., m, using M-H.
2. Long Memory and Structural Breaks in Realized Volatility: An Irreversible MS Approach
where z is the dimension of the parameter vector and χ²_α(z) is the αth percentile of the Chi-squared distribution with z degrees of freedom. In practice, 0.75, 0.95 and 0.99 are popular choices for α. High values of α work best, since more draws are then included in estimating (2.3.7).
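The G-D estimator and its truncation step can be sketched in a few lines. The function below is a minimal illustration, assuming posterior draws and log-likelihood/log-prior evaluations at each draw are already available; all names are hypothetical and the computation is done in logs for numerical stability.

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import logsumexp

def gelfand_dey_log_ml(draws, log_lik, log_prior, alpha=0.99):
    """Gelfand-Dey log marginal likelihood estimate.

    draws: (N, z) array of posterior draws; log_lik, log_prior: length-N
    arrays evaluated at each draw; alpha: truncation percentile.
    """
    N, z = draws.shape
    theta_star = draws.mean(axis=0)
    centered = draws - theta_star
    sigma_star = centered.T @ centered / N
    # Quadratic form (theta - theta*)' (Sigma*)^{-1} (theta - theta*)
    q = np.einsum('ij,jk,ik->i', centered, np.linalg.inv(sigma_star), centered)
    inside = q <= chi2.ppf(alpha, df=z)          # truncation domain Theta
    # log of the truncated Normal density g, normalized by alpha
    _, logdet = np.linalg.slogdet(sigma_star)
    log_g = -np.log(alpha) - 0.5 * (z * np.log(2 * np.pi) + logdet + q)
    # Average g / (likelihood * prior) over the N draws (zero outside Theta),
    # which converges to 1 / p(Y_T); return log p(Y_T)
    log_terms = log_g[inside] - log_lik[inside] - log_prior[inside]
    return -(logsumexp(log_terms) - np.log(N))
```

As a sanity check, for a conjugate Normal model (prior θ ~ N(0, 1), one observation y | θ ~ N(θ, 1) with y = 0) the exact log marginal likelihood is −0.5 log(4π), which the estimator recovers from posterior draws.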
We estimate (2.3.7) using the aforementioned values of α. In general, we find that different values of α lead to very similar results, see Section 2.4.2. It was also suggested to compute the marginal likelihood using the method of Sims et al. (2008). As pointed out in Sims et al. (2008), the G-D method may not work for models with time-varying parameters, as the posterior density tends to be non-Gaussian. We therefore also calculate ML for the empirical part using the method of Sims et al. (2008). However, we do not find any significant qualitative changes compared to G-D, and we choose to retain the G-D values. Furthermore, the Monte Carlo results clearly indicate that G-D correctly identifies the true model in the presence of structural breaks.
Once we have calculated the marginal likelihood for different specifications, we can compare models across the number of regimes, as well as the type of breaks (restricted versus unrestricted), using Bayes factors. The Bayes factor (BF) for model ℳ_A versus model ℳ_B is BF_AB = p(Y_T | ℳ_A)/p(Y_T | ℳ_B). BF_AB is the factor by which the data considers ℳ_A more probable than ℳ_B. Kass and Raftery (1995) suggest interpreting the evidence for ℳ_A as: not worth more than a bare mention for 1 ≤ BF_AB < 3; positive for 3 ≤ BF_AB < 20; strong for 20 ≤ BF_AB < 150; and very strong for BF_AB ≥ 150. On a log scale, log(BF_AB) > 0 is evidence in favor of ℳ_A versus ℳ_B.
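The Kass and Raftery (1995) categories quoted above can be encoded in a small, purely illustrative helper:

```python
def kass_raftery(bf):
    """Map a Bayes factor for model A vs. model B to the Kass-Raftery
    evidence categories for model A (illustrative helper)."""
    if bf < 1:
        return "evidence favors the other model"
    if bf < 3:
        return "not worth more than a bare mention"
    if bf < 20:
        return "positive"
    if bf < 150:
        return "strong"
    return "very strong"
```

For example, a log Bayes factor of 3.03 corresponds to a Bayes factor of exp(3.03) ≈ 20.7 and hence "strong" evidence.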
2.4. Monte Carlo
In this section, a set of Monte Carlo simulations is conducted to investigate the ability of the change-point ARFIMA model to detect the correct number of change points. The effect of different sample sizes and the ability of the deviance information criterion (DIC) to detect the correct number of change points are also considered. Specifically, we compare the performance of DIC with ML to find out whether DIC identifies the true data-generating model as reliably as ML. We do this because computing DIC for the change-point ARFIMA model is almost trivial, see Section 2.4.4. However, as pointed out by Spiegelhalter et al. (2002), we must be cautious about using ML as a basis against which to assess DIC: ML addresses how well the prior has predicted the observed data, whereas DIC addresses how well the posterior might predict future data generated by the same parameters that give rise to the observed data, Y_T.
Table 2.1.: Change-point model specifications

Model index   Parameters that change from a break
M0            None
M1            µ
M2            d
M3            σ²
M4            µ, d
M5            µ, σ²
M6            All parameters

This table labels the various change-point specifications. The first column is the model index. The second column lists the parameters that change due to structural breaks.
2.4.1. Setting
The ARFIMA model based on equation (2.2.1), in which change points affect one or more model parameters, is considered. Table 2.1 lists all the specifications used in the simulations and the empirical application4. Specifically, M0 is the simple ARFIMA model without any structural changes, M1 to M5 are models in which different parameters change, and M6 is the model in which all parameters change from a structural break. For each model, a sample of T = 1000 observations is generated. The true models considered include cases of no, one and two change points. In settings with change points, the positions of the change points follow a Uniform distribution, U. For instance, when there is one change point, its position follows U(0.25 × T, 0.75 × T). When there are two change points, the first follows U(0.25 × T, 0.40 × T) and the second follows U(0.60 × T, 0.80 × T). This setting allows us to account for the randomness of the change points while ensuring sufficient observations in each regime to conduct estimation.
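The simulation design for the break positions can be sketched as follows; the interval bounds follow the text, and the function name is a hypothetical placeholder.

```python
import numpy as np

def draw_break_dates(T, n_breaks, rng):
    """Draw change-point positions for the Monte Carlo design described
    above (0, 1 or 2 breaks; intervals as in the text)."""
    if n_breaks == 0:
        return []
    if n_breaks == 1:
        # One break: uniform on (0.25T, 0.75T)
        return [int(rng.uniform(0.25 * T, 0.75 * T))]
    # Two breaks: first on (0.25T, 0.40T), second on (0.60T, 0.80T),
    # guaranteeing enough observations in every regime
    return [int(rng.uniform(0.25 * T, 0.40 * T)),
            int(rng.uniform(0.60 * T, 0.80 * T))]
```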
The parameter values of the data generating process (DGP) under different scenarios are listed in Table 2.2. For example, for M1 only µ changes, while the other parameters remain constant. In specifications M4 to M6, the time series properties change greatly and the change points should be identified rather easily. To make our simulation empirically realistic, we select the parameter values of the DGP to reflect periods with increasing and decreasing levels of µ, d and σ².
We specify the priors as: dk ∼ N(0, 100) truncated such that 0 < dk < 0.5, µk ∼ N(0, 100) and σ²k ∼ IG(4/2, 0.02/2), k = 1, ..., m. A suitable prior for M is the truncated Poisson distribution with M ∈ {Mmin, ..., Mmax}, where in this paper Mmin = 10 and Mmax = 50, see Raggi and Bordignon (2012). Finally, we assume that pkk ∼ Beta(8, 0.1), k = 1, ..., m − 1. In this setting, the priors on dk, µk, σ²k and M are very uninformative, while the prior on pkk favors infrequent structural breaks. We conduct a prior sensitivity analysis with regard to the S&P 500 data and report the results in Section 2.6.4⁵.
Table 2.2.: Parameter values for Monte Carlo simulations

Parameter  Regime   M0    M1    M2    M3    M4    M5    M6
µ          1        1     1     1     1     1     1     1
           2        1     2     1     1     2     2     2
           3        1     0.8   1     1     0.8   0.8   0.8
d          1        0.3   0.3   0.3   0.3   0.3   0.3   0.3
           2        0.3   0.3   0.45  0.3   0.45  0.3   0.45
           3        0.3   0.3   0.05  0.3   0.05  0.3   0.05
σ²         1        0.1   0.1   0.1   0.1   0.1   0.1   0.1
           2        0.1   0.1   0.1   0.18  0.1   0.18  0.18
           3        0.1   0.1   0.1   0.36  0.1   0.36  0.36

This table lists the parameter values for the Monte Carlo simulations. The first column lists the parameters and the second column the index of the regimes. The first row is the model index. If there is one break, then the DGP parameters are taken first from regime 1 and then from regime 2.
4 We could also include the d, σ² combination in which both d and σ² change from a break. However, in order to maintain a small number of models and avoid high computational costs, we choose not to include this specification.
5 Overall, results are robust to different hyperparameters on pkk.
Table 2.3.: Change-point identification by marginal likelihood

                                 Frequency by ML
DGP                   # of CP   0 CP   1 CP   2 CP   3 CP
M1 (µ)                0         100    0      0      0
                      1         0      99     1      0
                      2         0      0      100    0
M2 (d)                0         87     0      0      13
                      1         1      86     5      8
                      2         0      0      95     5
M3 (σ²)               0         100    0      0      0
                      1         0      100    0      0
                      2         0      1      99     0
M4 (µ, d)             0         100    0      0      0
                      1         0      97     3      0
                      2         0      0      98     2
M5 (µ, σ²)            0         100    0      0      0
                      1         1      99     0      0
                      2         0      2      98     0
M6 (all parameters)   0         100    0      0      0
                      1         0      100    0      0
                      2         0      0      100    0

The first column lists the true model along with the parameters that change due to a structural break. CP, change points; ML, marginal likelihood. The "0 CP" column displays the number of times in the repetitions when the specification with no change point has the highest ML, etc. Each row sums to 100. In this table, we set α = 0.99.
Table 2.4.: Model comparison using ML

                     Frequency by ML for different values of α
# of CP   measure    0 CP   1 CP   2 CP   3 CP
DGP, M1
0         α = 0.75   100    0      0      0
          α = 0.95   100    0      0      0
          α = 0.99   100    0      0      0
1         α = 0.75   0      99     1      0
          α = 0.95   0      99     1      0
          α = 0.99   0      99     1      0
2         α = 0.75   0      0      100    0
          α = 0.95   0      0      100    0
          α = 0.99   0      0      100    0
DGP, M5
0         α = 0.75   100    0      0      0
          α = 0.95   100    0      0      0
          α = 0.99   100    0      0      0
1         α = 0.75   1      99     0      0
          α = 0.95   1      99     0      0
          α = 0.99   1      99     0      0
2         α = 0.75   0      2      98     0
          α = 0.95   0      2      98     0
          α = 0.99   0      2      98     0
DGP, M6
0         α = 0.75   100    0      0      0
          α = 0.95   100    0      0      0
          α = 0.99   100    0      0      0
1         α = 0.75   0      100    0      0
          α = 0.95   0      100    0      0
          α = 0.99   0      100    0      0
2         α = 0.75   0      0      100    0
          α = 0.95   0      0      100    0
          α = 0.99   0      0      100    0

The evidence for the number of change points is determined according to ML. The "0 CP" column reports the number of times in the 100 repetitions when the specification with no change point has the best performance; similarly for the other columns.
2.4.2. Change-point identification
It is assumed that the model specification Mi is known, but the number and dates of the structural breaks are not. For each draw from the data generating process (DGP), the change-point ARFIMA model is estimated assuming 0, 1, 2 and 3 structural breaks. Evidence for the number of break points is then ranked according to the highest marginal likelihood (ML). Thereafter, new data from the DGP are generated and the procedure is repeated, until 100 repetitions are completed. The frequency over repetitions with which each specification is best according to the marginal likelihood criterion is then reported. Initially, we calculate ML using G-D with α = 0.99.
Table 2.3 lists the results for each specification. For example, the second row for M1 shows that for a DGP with one change point, ML correctly identifies one change point in 99 of the repetitions, while in 1 it incorrectly selects two change points. The next entry in the table repeats this for a DGP with two change points; here, two change points are correctly identified 100 times. Overall, the change-point ARFIMA model works very well. When there is no change point, this is correctly selected most of the time. Looking at these cases (first row in each panel), they are: 100/100 for M1, 87/100 for M2 and 100/100 for M3-M6. When the process contains change points, the marginal likelihood method correctly identifies their existence in most cases. For example, the probability of correctly identifying instability of the process is 0.99 ((86 + 5 + 8)/100) for M2 with one change point, and is 1 for two change points. The correct number of change points is found most of the time, and many of the frequencies are close to 100. However, the relatively smaller numbers associated with M2 show that ML is less powerful when there are changes only in d6. For DGPs where d is subject to structural breaks, the breaks become easier to identify when more parameters undergo a change; for example, compare M2 with the better performance of M4, in which both µ and d change from a structural break. The best results in the table correspond to models M3-M6.
We also investigate the ability of G-D to identify the true model for different values of α. Besides α = 0.99, we therefore repeat our Monte Carlo experiment with α = 0.75 and α = 0.95. Results for M1, M5 and M6 are summarized in Table 2.4. We get identical results for these specifications, and for M0, M2, M3 and M4 (not reported) as well. However, the Bayes factor in favor of the true model in each case varies slightly across the values of α. Overall, G-D provides a very reliable method for the identification of structural breaks using the ML criterion.
2.4.3. Parameter estimates
Given a full MCMC run, we calculate the mean, median and mode of θ_k^(i), i = 1, ..., N, for each regime. We then take the mean of these quantities over the number of Monte Carlo repetitions. We also consider the posterior deviation (POSDEV) of the parameters in each regime, defined as

POSDEV = √[ (1/R)(1/N) Σ_{h=1}^{R} Σ_{i=1}^{N} (θ_{k,h}^{(i)} − θ_k)² ],
6 We also get very similar results using DIC, see Section 2.4.4 for further results on the ability of DIC to detect the correct number of structural breaks. In cases where too many change points are selected, we find that parameter estimates in the last regime sometimes suffer from biases, whereas in other cases they do not. This depends, of course, on the position of the estimated change points.
Table 2.5.: Monte Carlo parameter estimates

Regime  Parameter  True   Mean    Median  Mode    POSDEV
DGP, M4, T = 500
1       µ          1      0.9880  0.9877  0.9812  0.0771
2       µ          2      1.9883  1.9883  1.9455  0.1393
3       µ          0.8    0.7985  0.7985  0.7941  0.0269
1       d          0.3    0.2798  0.2782  0.2511  0.0783
2       d          0.45   0.4223  0.4258  0.4117  0.0519
3       d          0.05   0.0854  0.0780  0.0752  0.1194
1-3     σ²         0.1    0.0953  0.0951  0.0762  0.0074
DGP, M4, T = 1000
1       µ          1      0.9967  0.9968  0.9878  0.0575
2       µ          2      2.0012  2.0014  1.9961  0.1164
3       µ          0.8    0.8018  0.8019  0.8014  0.0245
1       d          0.3    0.2983  0.2974  0.2846  0.0458
2       d          0.45   0.4430  0.4453  0.4437  0.0267
3       d          0.05   0.0686  0.0643  0.0705  0.0336
1-3     σ²         0.1    0.0988  0.0986  0.0847  0.0047
DGP, M4, T = 2000
1       µ          1      1.0033  1.0040  0.9956  0.0407
2       µ          2      1.9896  1.9900  1.9929  0.0735
3       µ          0.8    0.7979  0.7978  0.7957  0.0101
1       d          0.3    0.3089  0.3082  0.2983  0.0264
2       d          0.45   0.4479  0.4490  0.4446  0.0239
3       d          0.05   0.0638  0.0621  0.0551  0.0241
1-3     σ²         0.1    0.0991  0.0993  0.0898  0.0030

This table reports the true value of the DGP parameters along with the mean, median, mode and posterior deviation (POSDEV) of θ_k for the Monte Carlo simulations, generated from M4, 2 CP, with parameters as indicated, using T = 500, T = 1000 and T = 2000.
where θ_{k,h}^{(i)} is the ith posterior draw of θ_k at the hth Monte Carlo iteration, and θ_k is the vector of the true DGP parameters in regime k. We summarize results for M4 with 2 change points for T = 500, T = 1000 and T = 2000 in Table 2.5. In each case, the first 1000 samples are discarded and the next 5000 are used for posterior inference. Overall, we see that the change-point ARFIMA model works very well. On average, the parameter estimates are very close to their true values. In the simulations, we estimate M at 25 to 30. Compared to T = 500, as we increase the number of observations in the DGP to T = 1000, the POSDEV of each parameter drops on average by 10 to 40%. The POSDEV of each parameter drops even more when we increase the sample size to T = 2000⁷.
7 We find that our estimation method also correctly identifies the true position of the change points for T = 500. Furthermore, as correctly pointed out by a referee, the period between each change point increases with T and thus contributes to the reduction in POSDEV.
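The POSDEV measure defined above can be computed directly. A minimal sketch, assuming the posterior draws for regime k across all Monte Carlo repetitions are stored as an (R, N, p) array (hypothetical storage layout):

```python
import numpy as np

def posdev(theta_draws, theta_true):
    """Posterior deviation: root of (1/R)(1/N) times the double sum of
    squared deviations from the true parameter values.

    theta_draws: (R, N, p) array -- R Monte Carlo repetitions, N posterior
    draws, p parameters; theta_true: (p,) array of true DGP values.
    Returns one POSDEV value per parameter.
    """
    dev = (theta_draws - theta_true) ** 2       # (R, N, p)
    return np.sqrt(dev.mean(axis=(0, 1)))       # average over R and N, then root
```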
Table 2.6.: Model comparison using different criteria

                    Frequency by DIC and ML
# of CP   measure   0 CP   1 CP   2 CP   3 CP
DGP, M1
0         DIC       100    0      0      0
          ML        100    0      0      0
1         DIC       0      98     0      2
          ML        0      99     1      0
2         DIC       0      0      98     2
          ML        0      0      100    0
DGP, M3
0         DIC       100    0      0      0
          ML        100    0      0      0
1         DIC       0      93     0      7
          ML        0      100    0      0
2         DIC       0      0      96     4
          ML        0      1      99     0
DGP, M6
0         DIC       84     12     2      2
          ML        100    0      0      0
1         DIC       0      96     2      2
          ML        0      100    0      0
2         DIC       0      0      97     3
          ML        0      0      100    0

The evidence for the number of change points is determined according to DIC and ML. The "0 CP" column reports the number of times in the repetitions when the specification with no change point has the best performance.
2.4.4. Deviance information criterion
Another approach to comparing the evidence for the number of change points is the deviance information criterion (DIC) of Spiegelhalter et al. (2002). Calculation of DIC in an MCMC scheme is trivial. Contrary to AIC or BIC, DIC does not require maximization over the parameter space. DIC combines the likelihood, p(Y_T | P, θ, M), with a penalty term, pD, which describes the complexity of the model and corrects the deviance's propensity towards models with more parameters. More precisely, pD = D̄(P, θ, M) − D(P̄, θ̄, M̄), where D̄(P, θ, M) is approximated by N^{−1} Σ_{i=1}^{N} −2 log p(Y_T | P^(i), θ^(i), M^(i)) and D(P̄, θ̄, M̄) = −2 log p(Y_T | P̄, θ̄, M̄), with P̄, θ̄ and M̄ estimated from the Gibbs output using the mean or mode of the posterior draws. The DIC is defined as D(P̄, θ̄, M̄) + 2pD. The best model is the one with the smallest DIC. However, it is difficult to say what constitutes a significant difference in DIC. Very roughly, differences of more than 10 arguably rule out the model with the higher DIC.
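Given log-likelihood values at each posterior draw and at a posterior point estimate, the DIC definition above reduces to a few lines. The function below is a minimal sketch with hypothetical inputs; in the chapter, the likelihood is (2.3.5).

```python
import numpy as np

def dic(log_lik_draws, log_lik_at_point_estimate):
    """DIC = D(point estimate) + 2 * pD, with
    pD = mean deviance over draws - deviance at the point estimate."""
    d_bar = np.mean(-2.0 * np.asarray(log_lik_draws))   # posterior mean deviance
    d_hat = -2.0 * log_lik_at_point_estimate            # deviance at point estimate
    p_d = d_bar - d_hat                                 # effective number of parameters
    return d_hat + 2.0 * p_d                            # equivalently, d_bar + p_d
```

For example, draws with log-likelihoods −10 and −12 and a point-estimate log-likelihood of −10 give D̄ = 22, D̂ = 20, pD = 2 and DIC = 24.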
The DIC is calculated for specifications M1, M3 and M6, and the results are summarized in Table 2.6. Similar results are obtained for the other cases. The number of times out of 100 repetitions that a specific change-point model is selected as best according to DIC and ML is reported. For instance, in the top row for DGP M6 with no change point, in 84 of 100 repetitions the no change-point model has the smallest DIC, in 12/100 the one change-point model is best, and in 2/100 each the two and three change-point models are best. When there are two structural breaks in the DGP for M6, DIC correctly identifies the true model 97/100 times, whereas 3/100 times the three change-point model is best. For this DGP, ML identifies the correct model 100/100 times.
2.4.5. Higher number of change points
In Section 2.4.2 we evaluated the performance of the change-point ARFIMA model under 0, 1 and 2 structural breaks. In this section, we evaluate its performance when the DGP contains more change points. Specifically, we set T = 2000 and simulate data containing 4 change points. As before, the positions of the change points follow a Uniform distribution. We consider specifications M1, M3, M4 and M6. For each specification, we estimate the change-point ARFIMA model assuming 0, ..., 5 structural breaks.
We find that DIC and ML correctly identify the true model in most cases. However, we are also interested in whether our method is able to correctly identify the positions of the change points. Therefore, for each Monte Carlo iteration, we calculate the positions of the change points using the mode of {S^(i)}_{i=1}^{N}, and compare them with the true positions. Specifically, for each change point, we calculate

DIFF_k = (1/R) Σ_{h=1}^{R} |ĉp_k^(h) − cp_k^(h)|
Table 2.7.: Dating the change points

                                     DIFF
DGP                  # CP   1st CP   2nd CP   3rd CP   4th CP
M1 (µ)               4      0.0053   0.0105   0        0
M3 (σ²)              4      2.2000   4.9375   4.3500   1.3250
M4 (µ, d)            4      0.3913   0.2391   0        0
M6 (all parameters)  4      0.0822   0.6027   0.1644   0

This table reports the average difference (for each change point) between the estimated change-point date and the true change-point date. The definition of DIFF is given in the text.
for k = 1, ..., m − 1, where ĉp_k^(h) is the kth estimated change-point date and cp_k^(h) is the true position of the kth change point at the hth Monte Carlo iteration. For the results to be meaningful, we focus only on specifications that have the correct number of change points, i.e. four change points.
Table 2.7 reports DIFF for each specification when the underlying process contains 4 change points. For instance, for M6 we miss the correct date of the second change point by 0.6 time periods on average (roughly one day, if we work with daily data). On the other hand, we identify the correct date of the last change point at every Monte Carlo iteration, as |ĉp_4^(1) − cp_4^(1)| = |ĉp_4^(2) − cp_4^(2)| = ... = |ĉp_4^(R) − cp_4^(R)| = 0. The same happens for the third and fourth change points for M1 and M4.
2.4.6. Sample size
In order to assess the robustness of our methods with respect to different sample sizes, the more challenging specifications are considered with sample sizes of 500, 1000 and 2000. Results for M2 with one and two change points, along with M4 with two change points, are reported in Table 2.8. Increasing T improves identification of the true number of change points, and the distribution becomes more concentrated on the true model. For instance, for DGP M2, 2 CP with T = 500, we identify the correct specification only 54/100 times, whereas for T = 1000 the correct specification is selected 95/100 times and for T = 2000, 98/100 times. We obtain very similar results for DIC and for the other model specifications, which we do not report here.
Overall, we find that DIC can be considered a very compelling alternative to ML. Furthermore, the change-point ARFIMA model works very well in identifying the true dates of the change points. Specifications M1, M4 and M6 are more accurate than M3.
Table 2.8.: Effect of sample size on identification of change points

                       Frequency by ML
Sample size   0 CP   1 CP   2 CP   3 CP
DGP, M2, 1 CP
500           9      76     6      9
1000          1      86     5      8
2000          0      95     3      2
DGP, M2, 2 CP
500           0      0      54     46
1000          0      0      95     5
2000          0      0      98     2
DGP, M4, 2 CP
500           0      0      83     17
1000          0      0      98     2
2000          0      0      98     2

The "0 CP" column records the number of times when the model with no change point has the highest marginal likelihood. The "1 CP" column records the number of times when the model with one change point has the largest marginal likelihood.
2.5. Realized Volatility
Suppose that, within day t, the logarithmic price of a given asset follows a continuous-time diffusion process

dp(t + s) = µ(t + s)dt + σ(t + s)dW(t + s),  0 ≤ s ≤ 1,  t = 1, 2, ...,

where p(t + s) is the logarithmic price at time t + s, µ(t + s) is the drift component, σ(t + s) is the instantaneous volatility and W(t + s) is a standard Brownian motion. In addition, suppose that σ(t + s) is orthogonal to W(t + s), such that there is no leverage effect. This assumption is standard in the realized volatility literature. Andersen et al. (2003) and Barndorff-Nielsen and Shephard (2002a) show that daily returns, defined as r_t = p(t) − p(t − 1), are conditionally Gaussian. That is,

r_t | F_t ∼ N( ∫₀¹ µ(t + s − 1)ds, ∫₀¹ σ²(t + s − 1)ds ).

The true volatility for day t is defined as IV_t = ∫₀¹ σ²(t + s − 1)ds and is known as the integrated volatility. In the absence of microstructure noise, realized volatility is a consistent estimator of IV_t as the intraday sampling frequency goes to infinity. Realized volatility (RV) is constructed as the sum of intraday squared returns, Σ_{j=1}^{n} r²_{j,t}, where r_{j,t} = p_{j,t} − p_{j−1,t}, p_{j,t} is the jth intraday price and n is the number of intra-daily observations. As pointed out by, for example, Andersen et al. (2003), RV_t is more efficient than traditional measures of volatility, such as daily squared returns.
Market microstructure dynamics contaminate the price process with noise. Hence, RV_t can be a biased and inconsistent estimator of IV_t, see Hansen and Lunde (2006) for more details on the effects of market microstructure noise on volatility estimation. In order to reduce these effects, we employ a kernel-based estimator that utilizes autocovariances of intraday returns. Specifically, we follow Hansen and Lunde (2006) and provide a bias correction to realized volatility in the following way:

RV_t^q = Σ_{j=1}^{n} r²_{t,j} + 2 Σ_{w=1}^{q} (1 − w/(1 + q)) Σ_{j=1}^{n−w} r_{t,j} r_{t,j+w},

where q is a small positive integer, and we set q = 1. Henceforth, RV_t^q is referred to as RV_t.
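The bias-corrected estimator above translates directly into code. A sketch, taking one day's vector of intraday returns (the function name is a placeholder; q = 1 as in the chapter):

```python
import numpy as np

def realized_vol_q(r, q=1):
    """Hansen-Lunde bias-corrected realized volatility for one day.

    r: 1-D array of intraday returns; q: number of autocovariance lags."""
    rv = np.sum(r ** 2)                               # plain realized volatility
    for w in range(1, q + 1):
        weight = 1.0 - w / (1.0 + q)                  # Bartlett-type weight
        rv += 2.0 * weight * np.sum(r[:-w] * r[w:])   # lag-w autocovariance term
    return rv
```

With q = 1 this reduces to Σ r² plus the first-order autocovariance term: for returns (1, 2, 3), RV^1 = 14 + (1·2 + 2·3) = 22.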
2.6. Application to S&P 500 Volatility
2.6.1. Data
The empirical application is based on S&P 500 index data from the S&P's Depositary Receipts fund. The data consist of 5-minute intraday observations from January 2nd, 2000 to December 31st, 2009, for a total of T = 2515 trading days. The data are cleaned using the steps in Barndorff-Nielsen et al. (2009). After cleaning, a 5-minute grid from 9:30 to 16:00 is constructed using the previous-tick method, see Hansen and Lunde (2006). From this grid, 5-minute intraday returns are constructed and used to build realized volatility. Following Raggi and Bordignon (2012), the annualized realized standard deviation, y_t = √(252·RV_t)/100, is considered.
However, there are outliers in y_t, so a single outlier risks being wrongly identified as a separate regime8. To rule this out, we follow Kim et al. (2005) and Liu and Maheu (2008) and impose the assumption that each regime lasts at least 66 days. Specifically, whenever we simulate a draw of S^(i) that contains a regime shorter than 66 days, we discard it and resample until every regime is at least 66 days long. We find that our results are robust to different duration restrictions, see Section 2.6.3. The first 1000 draws are discarded and the next 5000 are used for posterior inference.
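The discard-and-resample step described above is a simple rejection loop. A sketch, where `sample_states` is a hypothetical placeholder for the model's actual state simulator; counting occurrences gives regime durations here because regimes do not recur in the change-point model.

```python
import numpy as np

def draw_states_with_min_duration(sample_states, min_dur=66, max_tries=1000):
    """Accept a candidate regime path only if every regime lasts at least
    min_dur days; otherwise discard it and resample."""
    for _ in range(max_tries):
        s = np.asarray(sample_states())                   # candidate path S^(i)
        _, durations = np.unique(s, return_counts=True)   # days per regime
        if durations.min() >= min_dur:
            return s                                      # all regimes long enough
    raise RuntimeError("no admissible draw within max_tries")
```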
Table 2.9.: Model comparison by marginal likelihood and DIC for S&P 500 volatility

        M1                  M2                  M3                  M4                  M5                  M6
# CP    µ                   d                   σ²                  µ, d                µ, σ²               All parameters
0       -1946.84 (3860.65)  -1946.84 (3860.65)  -1946.84 (3860.65)  -1946.84 (3860.65)  -1946.84 (3860.65)  -1946.84 (3860.65)
1       -1942.64 (3852.58)  -1942.42 (3851.81)  -1918.87 (3778.32)  -1951.32 (3854.58)  -1927.45 (3803.89)  -1929.60 (3801.83)
2       -1938.71 (3845.17)  -1946.55 (3859.81)  -1549.75 (2777.14)  -1941.20 (3836.48)  -1378.12 (2705.06)  -1531.20 (2701.23)
3       -1933.58 (3814.83)  -1945.08 (3854.55)  -1504.93 (2728.29)  -1949.16 (3816.87)  -1283.65 (2514.36)  -1379.35 (2511.32)
4       -1915.71 (3781.36)  -1945.45 (3853.67)  -1404.15 (2562.78)  -1921.52 (3764.23)  -1257.10 (2360.45)  -1290.64 (2378.46)
5       -1933.43 (3819.57)  -1945.58 (3855.29)  -1670.04 (2859.83)  -1937.13 (3774.48)  -1259.75 (2368.46)  -1999.48 (3898.11)
6       -1940.84 (3833.59)  -1947.59 (3858.50)  -1682.08 (2983.70)  -1941.23 (3801.87)  -1737.02 (2653.20)  -2053.65 (4109.60)

This table reports the log-marginal likelihood with α = 0.99 and DIC (in parentheses) for different ARFIMA models. The first row lists the index of the models. The second row lists the parameters that are subject to structural breaks within each specification. The first column is the number of change points (CP) that are conditioned on.
2.6.2. Results
To conduct estimation, we use the same priors as in Section 2.4.1. We investigate models under the structural change configurations in Table 2.1. Table 2.9 displays log(ML) and DIC (in parentheses) for specifications with zero up to six change points.
The results suggest the existence of four change points according to both ML and DIC. The log-marginal likelihood for no change point is −1946.84, and most specifications with structural breaks improve on this. The difference between the best structural break specification (M5, 4 CP) and M0 is large, with a Bayes factor of exp(689.74) in favor of four structural breaks. This is very strong evidence. For all model settings except M2, the largest ML (lowest DIC) occurs at four change points. There is some posterior support for four change points for M2, but it is outperformed by its one change-point counterpart with a Bayes factor of exp(3.03). We also compare models across parameter specifications. The highest log(ML) and lowest DIC across all cases are −1257.10 and 2360.45, respectively, for M5 with four change points. Considering the second largest log(ML) in Table 2.9, which is −1290.64 for M6 with four change points, the Bayes factor of M5 versus M6 is exp(33.54). Therefore, we conclude that the effect of the breaks is mainly in µ and σ².
8 This is a very common problem in RV data, as realized volatility is by nature very noisy.
Compared to M5 and M6, our results also suggest that incorporating changes only in µ, d, σ², or the combination µ, d, worsens ML (DIC) considerably, indicating the need to model breaks in both µ and σ². For instance, compare the performance of M3 with M5: while both models point to the 4 CP specification as the best performing, the Bayes factor of M5 versus M3 is exp(147.05).
Table 2.10.: Parameter estimates for S&P 500 volatility

M5, 4 CP
Parameter   mean      95% interval         RB      Geweke   M-H ratio
d           0.4445    [0.4181, 0.4722]     6.86    -0.50    0.35
µ1          0.1890    [0.1685, 0.2089]     4.20    -1.22    0.37
µ2          0.0945    [0.0876, 0.1015]     10.11   -0.15    0.34
µ3          0.1554    [0.1266, 0.1836]     6.09    -0.61    0.41
µ4          0.3922    [0.3270, 0.4589]     9.82    1.11     0.42
µ5          0.1702    [0.1517, 0.1890]     4.45    -2.19    0.34
σ²1         0.3238    [0.2944, 0.3572]     1.03    0.76
σ²2         0.0501    [0.0450, 0.0549]     1.76    0.58
σ²3         0.1806    [0.1515, 0.2152]     0.88    1.41
σ²4         0.6809    [0.5345, 0.8755]     1.02    0.01
σ²5         0.0822    [0.0676, 0.0999]     1.38    -0.52
M           31        [24, 38]             15.41   -1.85    0.28
DIC         2360.45
log(ML)     -1257.10

ARFIMA-GARCH, 3 CP
Parameter   mean      95% interval         RB      Geweke   M-H ratio
d           0.4806    [0.4612, 0.4897]     7.95    -0.19    0.32
µ1          0.1128    [0.0669, 0.1590]     7.76    -0.61    0.39
γ2          1.1014    [0.8552, 1.3937]     7.04    0.01     0.39
γ3          1.6845    [1.2693, 2.2280]     7.04    -0.01    0.37
γ4          0.9303    [0.7590, 1.1251]     4.53    -0.21    0.33
ω           0.0029    [0.0023, 0.0036]     3.71    -1.04    0.42
a           0.1644    [0.1449, 0.1833]     4.95    -1.95    0.42
b           0.8313    [0.8124, 0.8505]     4.87    1.47     0.42
DIC         2375.11
log(ML)     -1261.83

This table reports posterior means (mean), 95% credibility intervals (in brackets), inefficiency factors (RB), Geweke's convergence statistics (Geweke), M-H acceptance ratios, DIC and log(ML) using α = 0.99 for M5, 4 CP and ARFIMA-GARCH, 3 CP. Parameters associated with each regime are labeled with subscripts 1, ..., m.
Figure 2.1.: Change-point dates for M5, 4 CP

[Figure: S&P 500 annualized volatility, fitted volatility, conditional mean and estimated change-point dates, indicated as vertical lines, for M5, 4 CP, 2000-2009.]
Figure 2.1 displays the data, fitted volatility and the estimated change-point dates, shown as vertical lines using the posterior mode of {S^(i)}_{i=1}^{N} for M5, 4 CP. The change-point dates are: 04-10-2003, 07-17-2007, 09-16-2008 and 03-19-2009.
Several events may have contributed to the changes in volatility dynamics. For instance, the last two break points occur during the financial crisis (fall of 2008 and spring of 2009). Furthermore, 09-16-2008 and 03-18-2009, the day before the last change point, are both associated with FOMC announcements. The first and second break points correspond roughly to the beginning of the second Iraq invasion in 2003 and the beginning of the subprime crisis in the US in 20079. Finally, in the last two phases, we see a structural exponential decay of volatility. The fitted values show that our change-point specification is also able to capture this feature.
Table 2.10 reports summary statistics for the posterior distribution of the key parameters of M5, 4 CP, together with some diagnostics. Specifically, after discarding the first 1000 iterations, we collect the final sample and compute: the posterior means of θ and M, 95% credibility intervals (in brackets), inefficiency measures (RB), Geweke's convergence statistics, Metropolis-Hastings acceptance ratios, DIC and log(ML). RB is the variance of the posterior sample draws when accounting for correlation between iterations, relative to the variance without accounting for correlation. In these calculations, a bandwidth, B, of 100 is used, see Kim et al. (1998) for further background on this measure.
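One common way to compute such an inefficiency factor is as the ratio of the long-run variance of the chain to its i.i.d. variance, truncated at the bandwidth. The sketch below uses a Bartlett kernel with B = 100; note that Kim et al. (1998) use a Parzen window, so this is a simpler, illustrative variant rather than the chapter's exact computation.

```python
import numpy as np

def inefficiency_factor(chain, bandwidth=100):
    """Inefficiency factor: 1 + 2 * sum of kernel-weighted autocorrelations
    up to the bandwidth (Bartlett weights). A value near 1 indicates an
    i.i.d.-like chain; large values indicate strong autocorrelation."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    gamma0 = np.dot(x, x) / n                     # lag-0 autocovariance
    rb = 1.0
    for j in range(1, bandwidth + 1):
        gamma_j = np.dot(x[:-j], x[j:]) / n       # lag-j autocovariance
        rb += 2.0 * (1.0 - j / (bandwidth + 1.0)) * gamma_j / gamma0
    return rb
```

For a chain of independent draws, the estimate is close to 1; for a highly persistent MCMC chain, it is much larger, as for M with RB = 15.41 in Table 2.10.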
None of the density intervals for the parameters include 0. The order of integration, d, is estimated at 0.44 (compared to 0.48 for M0), which implies that the S&P 500 data exhibit long-memory behavior. Furthermore, the results suggest that there are no structural breaks in d, as can be seen from the relatively smaller ML values for M2, M4 and M6 compared to M5. When we look at the parameters
9 The beginning of the second Iraq invasion is associated with a lower level of volatility, which seems counterintuitive. We cannot find an explanation for this phenomenon.
that change from a break, we find sensible differences across regimes. For instance, µk increases from 0.15 to 0.39 in the regime that covers the financial crisis. Subsequently, µk falls to 0.17 from the last change point to the end of the sample. The same happens for the conditional volatility of realized volatility, σk. Specifically, we estimate σ4 at 0.82 during the 2008 financial crisis, twice as large as σ3, which we estimate at 0.42. Thereafter, σk falls to 0.28 from the last change point to the end of the sample.
2.6.3. Robustness to minimum duration restrictions
In this section, we test the robustness of our results to the minimum regime duration. We follow Liu and Maheu (2008) and estimate the best model, M5, under different minimum duration lengths. Besides the minimum of 66 days (3 months), we consider the following duration lengths: 44 days (2 months), 88 days (4 months), 110 days (5 months) and 132 days (6 months).
Table 2.11 reports ML values for M5 from 1 to 6 change points under the mentioned duration
lengths. Overall, we see that the marginal likelihoods are almost identical across different cases,
except for duration lengths of 44 and 132 days. First, when we set the minimum duration length to
44, we obtain the exact same ML values as in Table 2.9 for 1 to 4 change points. Second, for 5 and 6
change points, we get lower ML values. However, this does not change the main conclusion.
For 132 days, we find evidence in favor of 3 change points at: 04-10-2003, 07-25-2007 and 03-19-
2009. Furthermore, the ML values for M5, 4 to 6 CP differ significantly from those in Table 2.9.
However, this is understandable. The explanation is as follows: in Section 2.6.2, we found evidence
of four structural breaks. Furthermore, the last two break dates occur at 09-16-2008 and 03-19-2009.
Accordingly, there are 127 days between these two dates. Hence, when we set the minimum duration
length to 132, we automatically force the model to find different break dates. In fact, for the
specification with four change points, we find that the last two breaks occur at 09-05-2008
and at 03-19-2009, whereas for the other duration lengths they occur at 09-16-2008 and 03-19-2009.
Consequently, we obtain different ML values. Furthermore, these results indicate that the M5, 4 CP
estimate of the third change-point date under the minimum 132 days restriction, 09-05-2008, worsens
ML considerably.
Overall, the marginal likelihoods are almost identical across the different duration restrictions. The
first four specifications favor four change points with the exact same change-point dates.10
Table 2.11.: Robustness to minimum regime duration lengths
# CP   44 days      66 days      88 days      110 days     132 days
       (2 months)   (3 months)   (4 months)   (5 months)   (6 months)
1      -1927.45     -1927.45     -1927.45     -1927.45     -1927.45
2      -1378.12     -1378.12     -1380.89     -1378.12     -1378.12
3      -1283.65     -1283.65     -1283.65     -1283.65     -1283.65
4      -1257.10     -1257.10     -1257.10     -1257.10     -1662.56
5      -2033.78     -1259.75     -1259.75     -1259.75     -1542.71
6      -2036.12     -1737.02     -1737.02     -1737.02     -1702.11
This table compares log(ML) values for M5 under different minimum regime durations. Each column
corresponds to a different lower bound on the number of observations in each regime.
10We obtain very similar results for DIC.
2.6.4. Prior sensitivity analysis
In this section, sensitivity of the results to the prior specification is evaluated by experimenting with
different prior hyperparameter values on the transition probabilities, pkk, keeping the prior hyperparameter
values of the other parameters the same as in Section 2.4.1. pkk is one of the key parameters of the
model because it controls the duration of each regime in the state sequence, S.
In Table 2.12 we experiment with different hyperparameter values on pkk, and report log(ML) for
each of these hyperparameter values by estimating M5 from 1 to 6 change points. For instance,
the first alternative prior is pkk ∼ Beta (0.5, 0.5) which is relatively flat. With this prior, we find
evidence of three change points. They are given as: 04-10-2003, 07-25-2007 and 03-19-2009. For
pkk ∼ Beta (0.5, 0.5), compared to M5, 4 CP, the Bayes factor in favor of M5, 3 CP is exp (1.35),
which is very weak evidence. Hence, contrary to the results in Table 2.9, we do not have substantial
posterior evidence in favor of 3 change points. For pkk ∼ Beta (10, 2), we find strong evidence in favor
of four change points. Furthermore, the change-point dates correspond exactly to those in Section
2.6.2. Finally, for pkk ∼ Beta (20, 0.1), which is a relatively tight prior, we find that M5, 5 CP
performs best.11 The first break occurs at 07-26-2002, the second at 04-14-2003, while the remaining
break dates correspond exactly to those in Section 2.6.2. Evidently, there remains some uncertainty
regarding the correct number of breaks for very uninformative and very tight priors on pkk. However,
the results overwhelmingly suggest the existence of structural breaks during the financial crisis of
2008/2009. Finally, we also experiment with different prior hyperparameter values on µk, dk and σ²k.
Overall, we obtain very similar results.
Table 2.12.: Prior sensitivity analysis
# CP pkk ∼ Beta (0.5, 0.5) pkk ∼ Beta (8, 0.1) pkk ∼ Beta (10, 2) pkk ∼ Beta (20, 0.1)
1      -1928.12     -1927.45     -1935.21     -1927.35
2      -1384.52     -1378.12     -1393.53     -1379.41
3      -1290.85     -1283.65     -1669.50     -1616.89
4      -1292.20     -1257.10     -1289.23     -1444.74
5      -1869.91     -1259.75     -1869.29     -1237.21
6      -1905.09     -1737.02     -1888.65     -1604.79
This table compares log(ML) values for different prior hyperparameter values on pkk. The prior hyperparameter values of the other parameters are fixed according to Section 2.4.1.
2.6.5. Forecasts
In this section, we compare the out-of-sample performance of M5 (break) with M0 (no-break). Specif-
ically, we compare the out-of-sample predictive likelihood (PL) and the predictive mean between these two
models. Given data up to time t−1, Yt−1 = (y1, ..., yt−1)′, the predictive likelihood, p (yt, ..., yT | Yt−1),
is the predictive density evaluated at the realized outcomes, yt, ..., yT , t ≤ T , see Geweke (2005). It
contains the out-of-sample prediction record of a particular model, making it the essential quantity of
interest for model evaluation. For instance, the predictive likelihood for M5 is given as

p (yt, ..., yT | Yt−1, M5) = ∫ p (yt, ..., yT | P, θM, Yt−1, M5) p (P, θM | Yt−1, M5) dP dθM. (2.6.1)
11Specifically, pkk ∼ Beta (20, 0.1) means that we assume an expected duration of each regime of about 201 days before we see the data.
(2.6.1) is the product of the individual predictive likelihoods

p (yt, ..., yT | Yt−1, M5) = ∏_{s=t}^{T} p (ys | Ys−1, M5).
If t = 1, this would be the marginal likelihood and (2.6.1) reduces to (2.3.4). Hence, the sum of log-
predictive likelihoods can be interpreted as a measure similar to the logarithm of the marginal likeli-
hood, but ignoring the contribution of the initial t − 1 observations. The predictive likelihood can be used to order models
according to their predictive abilities. In a similar fashion to Bayes factors, one can also compare the
performance of models based on a specific out-of-sample period by predictive Bayes factors, PBF. The
PBF for model A versus B is given as PBFAB = p (yt, .., yT | Yt−1,MA) /p (yt, .., yT | Yt−1,MB), and
summarizes the relative evidence of the two models over the out-of-sample data, yt, ..., yT . Calculating
the predictive likelihood within a Gibbs sampling scheme is easy. We can simply use the output from
the Gibbs sampler. These draws are obtained based on the information set, Yt−1. As a new observa-
tion enters the information set, the posterior is updated through a new round of Gibbs sampling and
p (yt+1 | Yt,MA) can then be calculated.
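The accumulation of predictive Bayes factors from the stored one-step-ahead predictive densities can be summarized in a short sketch. The following Python snippet is ours, and the density values in the example are hypothetical placeholders, not estimates from this chapter:

```python
import math

def log_predictive_bayes_factor(pl_a, pl_b):
    """Log predictive Bayes factor of model A versus model B.

    pl_a, pl_b: per-observation predictive densities p(y_s | Y_{s-1}, M)
    for s = t, ..., T. Returns log PBF_AB, i.e. the difference of the
    accumulated log predictive likelihoods."""
    return sum(math.log(p) for p in pl_a) - sum(math.log(p) for p in pl_b)

# Hypothetical predictive densities for a three-day out-of-sample window
pl_break = [0.40, 0.35, 0.50]
pl_nobreak = [0.30, 0.25, 0.45]
print(log_predictive_bayes_factor(pl_break, pl_nobreak))
```

A positive value favors model A; on the log scale the evidence accumulates additively as each new observation enters the information set.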
In the context of forecasting with M5, we want the optimal number of change points to vary over
the out-of-sample period, as the number of change points can increase as time goes by. Accordingly, we
adopt the following strategy: for the first out-of-sample observation at time t, we calculate the marginal
likelihood for various numbers of change points, 1, ..., K, using Yt−1. Thereafter, we choose the opti-
mal number of change points, Kt−1, using ML. We calculate the predictive likelihood, p (yt | Yt−1, M5),
and the predictive mean, E [yt | Yt−1, M5], using the parameters associated with specification Kt−1.
Thereafter, we extend the out-of-sample data with one observation, calculate marginal likelihoods for
1, ..., Kt−1 + 1 change points, choose the optimal number of change points, Kt, and repeat the above
forecasting procedure to obtain p (yt+1 | Yt, M5) and E [yt+1 | Yt, M5]. We choose the out-of-sample period
from January 23rd, 2006 till the end of the sample. It is also interesting to consider out-of-sample
point forecasts of yt based on the predictive mean. Therefore, we also report mean absolute error
(MAE) and root mean squared error (RMSE) for the predictive mean. The out-of-sample period
corresponds exactly to the period used to calculate PL. Furthermore, in addition to MAE and RMSE,
forecasts are also compared using the linear exponential (LINEX) loss function of Zellner (1986). This
loss function is defined as L (yt, ŷt) = bLINEX [exp (aLINEX (ŷt − yt)) − aLINEX (ŷt − yt) − 1], where
ŷt is the point forecast. L (yt, ŷt) penalizes overprediction (underprediction) more heavily for aLINEX > 0
(aLINEX < 0). We use bLINEX = 1, with aLINEX = 1 and aLINEX = −1 in our calculations.
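The three loss measures are straightforward to compute from the realized series and the point forecasts. A minimal Python sketch (the function name is ours, and the series in the usage example are made up for illustration):

```python
import math

def forecast_losses(y, yhat, a_linex=1.0, b_linex=1.0):
    """Return (MAE, RMSE, average LINEX) for point forecasts yhat of y.

    LINEX loss: b * [exp(a * e) - a * e - 1] with e = yhat_t - y_t, so
    overprediction is penalized more heavily for a > 0 and
    underprediction for a < 0."""
    errors = [f - o for f, o in zip(yhat, y)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    linex = sum(b_linex * (math.exp(a_linex * e) - a_linex * e - 1.0)
                for e in errors) / n
    return mae, rmse, linex

# Made-up volatility series and one-day-ahead forecasts, for illustration only
y = [0.30, 0.45, 0.25, 0.60]
yhat = [0.35, 0.40, 0.30, 0.50]
print(forecast_losses(y, yhat, a_linex=1.0, b_linex=1.0))
```

Note that the LINEX loss is asymmetric: flipping the sign of a_linex reverses which side of the forecast error is penalized more.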
Overall, the break model offers improvements compared to the no-break model. For one observation
out-of-sample, (T = t), log (PBF) = 2.26; for 1 month, (T = t + 21), log (PBF) = 5.14; and for 3 months,
(T = t + 65), log (PBF) = 59.06, each in favor of the break specification. The improvements continue
till the end of the sample, see Table 2.13. Finally, Table 2.13 also displays out-of-sample results for one-day
ahead point forecasts for the no-break and the break model. The break model offers improvements in
terms of MAE and RMSE compared to the no-break model. When the LINEX loss function is used, the
break model also provides gains in terms of point forecasts. However, compared to density forecasts,
point forecasts show only modest improvements. The difference between the predictive likelihood and
the predictive mean is that the predictive likelihood criterion takes into account the whole shape of
the density, whereas the predictive mean does not.
Table 2.13.: Out-of-sample forecasts for S&P 500 volatility
Model       MAE      RMSE     LINEX (aLINEX = 1,    LINEX (aLINEX = −1,    log(PL)
                              bLINEX = 1)           bLINEX = 1)
No-break    0.3553   0.6241   0.2582                3.2456                 -923.73
Break       0.3403   0.5993   0.2475                2.5680                 -544.48
This table reports mean absolute error (MAE), root mean squared error (RMSE) and average LINEX for the forecasts based on the predictive mean for one-day ahead. Furthermore, the one-day ahead log-predictive likelihood, log(PL), is also reported.
2.6.6. Structural breaks and GARCH effects
In this section, we model changes in the volatility of realized volatility, see Bollerslev et al. (2007),
Corsi et al. (2008), Liu and Maheu (2008). To investigate if the presence of structural breaks is due
to neglected conditional variance dynamics, we consider breaks in an ARFIMA-GARCH model. The
ARFIMA model extended to include a GARCH structure is
(1 − L)^d (yt − µ) = γst σt et,   et ∼ N (0, 1),   σ²t = ω + a e²t−1 + b σ²t−1, (2.6.2)
where the restrictions ω > 0, a > 0, b > 0 and a + b < 1 are imposed. The parameter, γst, is a scaling constant,
which has a direct effect on the unconditional volatility of yt. In the following, γ1 = 1, and it is
assumed that γk > 0 for k = 2, ..., m. Thus, in regime 1, this is a standard ARFIMA model with
GARCH effects, see Baillie et al. (1996), while in later regimes, the conditional variance of yt can be
larger or smaller than σ²t depending, of course, on whether γst > 1 or γst < 1. As noted by Liu and Maheu
(2008), the advantage of this specification is that one can model permanent changes in the volatility
of realized volatility but avoid the path dependence in σ²t induced by parameter changes in ω, a, and
b.12 Equation (2.6.2) is estimated using the AR representation of the ARFIMA model. The likelihood
is evaluated using the method of Beran (1994). Let θ = (µ, d, ω, a, b, γ2, ..., γm)′. The Gibbs sampler
requires iterating over:
1. S(i) | P (i−1), θ(i−1), YT .
2. θ(i) | S(i), YT .
3. P (i) | S(i).
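Conditional on S, the likelihood of (2.6.2) needed in step 2 can be evaluated with a standard GARCH-type recursion. A minimal Python sketch, assuming the fractionally filtered series u_t = (1 − L)^d (y_t − µ) has already been computed from the truncated AR representation (the function name and start-up choices are ours, not the thesis'):

```python
import math

def arfima_garch_loglik(u, s, omega, a, b, gamma):
    """Log-likelihood of u_t = gamma[s_t] * sigma_t * e_t with
    sigma_t^2 = omega + a * e_{t-1}^2 + b * sigma_{t-1}^2, as in (2.6.2).

    u: fractionally filtered data (assumed precomputed); s: 0-based regime
    indicators; gamma: scaling constants with gamma[0] = 1 for regime 1."""
    sigma2 = omega / (1.0 - a - b)   # start from the unconditional variance
    e_prev2 = 1.0                    # e_0^2 initialized at its expectation
    loglik = 0.0
    for t in range(len(u)):
        if t > 0:
            sigma2 = omega + a * e_prev2 + b * sigma2
        var_t = gamma[s[t]] ** 2 * sigma2   # conditional variance of u_t
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(var_t)
                          + u[t] ** 2 / var_t)
        e_prev2 = u[t] ** 2 / var_t * gamma[s[t]] ** 2 / gamma[s[t]] ** 2 \
            if False else u[t] ** 2 / (gamma[s[t]] ** 2 * sigma2)
    return loglik
```

Because the scaling enters only through γ_{s_t}, the σ²_t recursion itself is unaffected by regime changes, which is precisely what avoids the path dependence discussed below.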
We use Metropolis-Hastings to sample each element of θ. For the GARCH parameters, we sample
ψ = (ω, a, b)′ all-at-once using the Independence Chain Metropolis-Hastings algorithm. Specifically,
conditional on S, γ2, ..., γm, d and µ, at each iteration of the Gibbs sampler, we maximize
the likelihood of (2.6.2) with respect to ψ, and specify the candidate generating density as
q (ψ) ∼ N (ψML, c · var (ψML)), where c ∈ R+. The priors of θ are independent Normals with mean 0 and
variance 100, truncated (except for µ) to satisfy the restrictions on each parameter. Furthermore, we
ensure that a + b < 1 by resampling a(i) > 0 and b(i) > 0 until a(i) + b(i) < 1. We follow Section
2.6.2 and estimate a stable ARFIMA-GARCH model, as well as its structural break version with 1
to 5 change points. We find strong evidence in favor of structural breaks for (2.6.2). Specifically, we
find that the specification with 3 change points performs best. The change-point dates are given as:
11-20-2007, 09-04-2008 and 03-19-2009.
12Liu and Maheu (2008) consider a similar HAR-GARCH specification.
Parameter estimates of the ARFIMA-GARCH specification conditional on 3 change points are
listed in Table 2.10. Compared to M5, 4 CP, we find that d increases (from 0.44 to 0.48). The scaling
parameter, γst, also changes between regimes. Specifically, γst rises during the second and third phases,
which start from late 2007 and last till 03-18-2009. γst falls from 1.68 for 09-04-2008/03-18-2009 to
0.93 for 03-19-2009/12-31-2009. Finally, the unconditional volatility of volatility, √(γ²st ω/ (1 − a − b)),
increases during the financial crisis of 2008 (1.91 for the period 09-04-2008 to 03-18-2009 compared
to 0.82 for the period 11-20-2007 to 09-03-2008). Thereafter, √(γ²st ω/ (1 − a − b)) falls to 0.58 from the
last change point till the end of the sample.
2.7. Conclusion
We present a Bayesian method for joint analysis of long-memory and structural breaks using change-
point ARFIMA models. We estimate different specifications and determine the number of change
points using ML and DIC. Monte Carlo simulations demonstrate that our MCMC sampler works very
well as the estimated parameters are close to their true values. Furthermore, we find that ML and
DIC are powerful in detecting and dating structural breaks.
Applying the model to daily S&P 500 data from 2000 to 2009 shows that there is robust evidence
in favor of four structural breaks. We demonstrate that accounting for structural breaks improves
density and point forecasts. Finally, an ARFIMA model with GARCH effects is also considered. This
model provides evidence in favor of three structural breaks.
3. A Survey of Particle Markov Chain Monte Carlo
Techniques for Unobserved Component Time
Series Models
Author: Nima Nonejad
Abstract: This paper details particle Markov chain Monte Carlo (PMCMC) techniques for anal-
ysis of unobserved component time series models using several economic data sets. The objective of
this paper is to explain the basics of the methodology and provide computational applications that
justify applying PMCMC in practice. For instance, we use PMCMC to estimate stochastic volatil-
ity models with leverage effect, Student-t distributed errors and serial dependence. We also model
changing time series characteristics of monthly US inflation rate by considering a long-memory au-
toregressive fractionally integrated moving average (ARFIMA) model with its conditional variance
modeled by a stochastic volatility process with Gaussian and Student-t distributed errors.
Keywords: Bayes, Gibbs, Metropolis-Hastings, particle filter, unobserved components
(JEL: C11, C22, C63)
3. A Survey of PMCMC Techniques for Unobserved Component Time Series Models
3.1. Introduction
This paper is inspired by Bos (2011), Creal (2012) and Flury and Shephard (2011). It contributes to
the field of computational sequential Monte Carlo methods1. Specifically, we analyze
different economic data sets using particle Markov chain Monte Carlo (PMCMC) techniques, see
Andrieu et al. (2010), Flury and Shephard (2011). The aim of this paper is not to focus on the
properties of PMCMC compared to Gibbs sampling and maximum likelihood, nor to provide very
thorough analyses of empirical data. Instead, we aim to describe the basic steps of PMCMC, together
with details on implementation of some of the key algorithms2. These algorithms are chosen for the
insights that they provide. They are not always the most advanced, quickest, or most efficient way
of programming. Rather, we aim to show that PMCMC provides a very compelling framework for
estimating unobserved component (UC) time series models. We illustrate these methods on different
problems, producing rather generic methods.
In traditional MCMC (particularly Gibbs sampling) applications of UC models, we use either a
single-state procedure, see Jacquier et al. (1994), mixture samplers as in Kim et al. (1998) and Omori
et al. (2007), or an accept-reject Metropolis-Hastings procedure as in Chan (2014) to draw the latent
states. In the PMCMC framework, we can estimate these types of models by (a): targeting the
posterior with the latent states integrated out. Thereafter, we can sample the model parameters
using Metropolis-Hastings, or (b): we can sample the latent states directly using the nonlinear or
non-Gaussian framework via a conditional particle filter. In general, (a) is referred to as the particle
marginal Metropolis-Hastings sampler. Andrieu et al. (2010) show that when an unbiased estimate
of the likelihood is used inside a Metropolis-Hastings algorithm, the estimation error makes no difference
to the equilibrium distribution of the algorithm, see Flury and Shephard (2011) for more details. In
a similar fashion, Andrieu et al. (2010) term the second estimation procedure, (b), the particle Gibbs
sampler, in which one sequentially samples the latent states (using a particle approximation of the
conditional posterior of the latent states) and the model parameters. In this paper we focus mainly
on implementation of the particle marginal Metropolis-Hastings sampler, PMMH. However, we also
provide an empirical application where we apply particle Gibbs, PG.3 We believe that applying the
PMCMC methodology to unobserved component models is the most important contribution that we
provide. As we shall see, for these types of models, PMCMC requires limited design effort on the
user’s part, especially if one desires to change some features in a particular model.
The main concepts of PMCMC are briefly introduced in Section 3.2. The initial model in Section
3.3 is the standard stochastic volatility (SV) model with Gaussian errors applied to a financial data
set concerning daily OMXC20 returns. Next, we consider different well-known extensions of the SV
model. The first extension is a SV model with Student-t distributed errors. In the second extension,
we incorporate a leverage effect by modeling a correlation parameter between measurement and state
errors. In the third extension, we implement a model that has both stochastic volatility and moving
average errors, see Chan (2013). The fourth extension is PMMH implementation of the stochastic
1Bos (2011) and Creal (2012) provide excellent introductions to particle filtering, both in terms of theory and computation. In fact, both authors provide the software associated with their work. Hence, the idea behind this article is to do the same, however, for PMCMC, which is gaining traction in time series econometrics. This way, other researchers can replicate our results using the codes that we provide.
2The choice of Ox is mainly because it is a popular program among econometricians.
3Furthermore, we address possible technical complications of PG by using the conditional particle filter with ancestor sampling of Lindsten et al. (2012), which has been shown to be computationally more robust and flexible.
volatility in mean model of Koopman and Hol Uspensky (2002). In this specification, the unobserved
volatility process appears in both the conditional mean and the conditional variance. We show that
estimating this specification is also very straightforward using PMMH. Finally, we consider a two-factor
stochastic volatility model as in Harvey et al. (1994) and Shephard (1996). We show that PMMH
provides a straightforward procedure for estimation and marginal likelihood (ML) calculation of these
models. Specifically, computing the marginal likelihood is relatively easy using the method of Gelfand
and Dey (1994) as the integrated likelihood is easily available from the particle filter. Thereafter,
we reconsider the unobserved components model of the US inflation rate, see Stock and Watson
(2007), Grassi and Proietti (2010). We estimate different specifications of the unobserved components
model using PMMH. Model selection is again carried out by comparing marginal likelihoods between
models. Results indicate that the specification in which the volatility of both the regular and irregular
components of inflation evolve according to SV processes performs best.
Finally, we show that it is also relatively easy to estimate more complicated models using PMCMC.
First, we estimate an autoregressive fractionally integrated moving average (ARFIMA) model with
time-varying volatility using monthly postwar US core inflation data from 1957 to 2013. Our initial
model is an ARFIMA model with time-varying volatility modeled as a Gaussian SV process, see Bos
et al. (2012). Next, we consider a well-known extension, and model volatility using a SV model with
Student-t distributed innovations. Furthermore, model estimation is comparatively fast, so we are
able to obtain rolling estimates necessary for forecasting and parameter sensitivity analysis. Second,
we follow Chan (2014), and develop an unobserved components model where the stochastic volatility
process has a direct and time-varying impact on the variable of interest. However, Chan (2014) uses
a traditional Gibbs sampling approach, whereas we use particle Gibbs. We show that we are able
to sample the latent states all-at-once using the nonlinear state-space form. On the other hand,
Chan (2014) draws these latent states sequentially. Furthermore, Chan (2014) is forced to implement
an accept-reject Metropolis-Hastings procedure to draw the volatility process from its conditional
posterior as the conditional state-space model cannot be written in linear form.
The main concepts of PMCMC with computational focus on Metropolis-Hastings and the particle
filter are presented in Section 3.2. In Sections 3.3 to 3.6, we present several applications to demonstrate
the performance of the algorithms. Finally, the last section concludes.
3.2. Markov Chains and Particle Filters
Consider the simplest formulation of the stochastic volatility (SV) model
yt = exp (αt/2) εt, εt ∼ N (0, 1) (3.2.1)
αt+1 = µ+ ρ (αt − µ) + σηt, ηt ∼ N (0, 1) , (3.2.2)
where yt is the observed data, α1:T = (α1, ..., αT)′ are the unobserved log-volatilities, µ is the drift in
the state equation, σ is the volatility of log-volatility and ρ is the persistence parameter. Typically,
we would impose that |ρ| < 1 so that we have a stationary process with the initial condition
α1 ∼ N (µ, σ²/(1 − ρ²)). Let θ = (µ, ρ, σ²)′ and YT = (y1, ..., yT)′. This model has been heavily analyzed in
time series econometrics, see for example Kim et al. (1998). Equations (3.2.1)-(3.2.2) are an example
time series econometrics, see for example Kim et al. (1998). Equations (3.2.1)-(3.2.2) are an example
of a nonlinear state-space model where the measurement equation is nonlinear in α1:T . Furthermore,
while sampling θ ∼ p (θ | α1:T , YT ) is relatively easy, sampling α1:T ∼ p (α1:T | θ, YT ) is often difficult.
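Simulating from (3.2.1)-(3.2.2) is straightforward and useful for checking any sampler against known parameter values. A minimal Python sketch (the function name and defaults are ours):

```python
import math
import random

def simulate_sv(T, mu, rho, sigma, seed=0):
    """Simulate y_1:T and alpha_1:T from the SV model (3.2.1)-(3.2.2)."""
    rng = random.Random(seed)
    # Stationary initial condition: alpha_1 ~ N(mu, sigma^2 / (1 - rho^2))
    alpha = mu + sigma / math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    ys, alphas = [], []
    for _ in range(T):
        alphas.append(alpha)
        ys.append(math.exp(alpha / 2.0) * rng.gauss(0.0, 1.0))         # (3.2.1)
        alpha = mu + rho * (alpha - mu) + sigma * rng.gauss(0.0, 1.0)  # (3.2.2)
    return ys, alphas
```

Data simulated this way with, say, ρ close to one and small σ reproduces the volatility clustering typical of daily returns.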
Within the Gibbs sampling framework, the most popular approach to estimating (3.2.1)-(3.2.2) is
the so-called auxiliary mixture sampler, see Kim et al. (1998). The idea is to approximate the SV
model using a mixture of linear Gaussian models. Specifically, we can square both sides of (3.2.1), take
the logarithm, such that y∗t = αt + ε∗t, where y∗t = log y²t and ε∗t = log ε²t. Kim et al. (1998) show that ε∗t
can be approximated by a seven-component Gaussian mixture density. We can then write this mixture
density in terms of an auxiliary random variable, zt ∈ {1, ..., 7}, that serves as the mixture component
indicator. Hence, ε∗t | zt = i ∼ N (mi, s²i), where Pr (zt = i) = ωi. The values of mi, s²i and ωi,
i = 1, ..., 7 are all fixed and given in Kim et al. (1998). Using this Gaussian mixture approximation,
we can express the SV model as a linear Gaussian state-space model. Bayesian estimation can be
performed using standard Gibbs sampling techniques, see Kim et al. (1998). Finally, using this
approach, we sample from the augmented posterior, p (θ, α1, ..., αT , z1, ..., zT | YT ), i.e. augmented
to include z1, ..., zT and not from p (θ, α1, ..., αT | YT ). Within the PMCMC framework, specifically
PMMH, we need not use the above approximation. On the contrary, we can approach estimating
(3.2.1)-(3.2.2) by using the predictive decomposition
p (YT | θ) = ∏_{t=1}^{T} p (yt | Yt−1, θ). (3.2.3)
In most general cases, we do not have a closed-form expression for p (yt | Yt−1, θ), and can therefore only
approximate it. In this paper, we use simulations to obtain an unbiased estimate of each term on the
right-hand side of (3.2.3). This is carried out using a particle filter, see Section 3.2.1. We then use one
of the main results of Andrieu et al. (2010). Their result states that when an unbiased estimated
likelihood, p̂ (YT | θ), is used inside a MCMC algorithm, the estimation error makes no difference to
the equilibrium distribution of the algorithm, the posterior distribution, p (θ | YT) ∝ p (YT | θ) p (θ).
Thus, using (3.2.3) and the prior, p (θ), we can sample θ using Metropolis-Hastings (M-H), see
equation (13) of Andrieu et al. (2010)4. The general M-H algorithm follows the steps:
1. Initialize: start with a vector of parameters, θ(0), and set i = 1.

2. Draw a candidate value, θ∗ ∼ q(θ | θ(i−1)).

3. Obtain p̂ (YT | θ∗) via θ∗ and a particle filter. Accept θ∗, p̂ (YT | θ∗) with probability

   aMH(θ∗, θ(i−1)) = min{1, [p̂ (YT | θ∗) p (θ∗) q(θ(i−1) | θ∗)] / [p̂ (YT | θ(i−1)) p(θ(i−1)) q(θ∗ | θ(i−1))]}. (3.2.4)

4. If θ∗ is accepted, set θ(i) = θ∗ and p̂ (YT | θ(i)) = p̂ (YT | θ∗); else retain θ(i−1) and p̂ (YT | θ(i−1)).
   Set i = i + 1 and repeat from Step 2.
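In practice, the accept/reject decision is carried out on the log scale to avoid numerical overflow in the likelihood ratio. A minimal Python sketch for a symmetric random walk proposal, where the q terms cancel; here log_post stands for the user's log p̂(YT | θ) + log p(θ), and the names are ours:

```python
import math
import random

def mh_accept(log_post_cand, log_post_curr, rng):
    """Metropolis accept/reject for a symmetric (random walk) proposal:
    accept with probability min(1, exp(log_post_cand - log_post_curr))."""
    log_a = min(0.0, log_post_cand - log_post_curr)
    return math.log(rng.random()) <= log_a

rng = random.Random(1)
# A candidate with a higher log-posterior is always accepted
print(mh_accept(-100.0, -105.0, rng))
```

For the Independence Chain variant discussed below, the (log) proposal densities at the candidate and current draws would simply be added to the two sides of the comparison.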
The candidate density, q(θ | θ(i−1)), can be chosen freely, though a density which is related to the
target density would lead to better acceptance rates. We start by using the Random Walk Metropolis-
Hastings algorithm, see Koop (2003). Thus, we generate θ∗ from q(θ | θ(i−1)), a multivariate Normal
4We follow the same framework as Flury and Shephard (2011). Thus, we sample θ and not the pair (θ, α1:T) as in Andrieu et al. (2010), which requires a very minor modification in the codes. Furthermore, there are no differences in estimation results between these two approaches. However, the first framework is computationally a little easier.
3. A Survey of PMCMC Techniques for Unobserved Component Time Series Models
55
density with expectation, θ(i−1), and a prespecified covariance matrix, Σq. We follow Koop (2003,
page 98) and adjust Σq to get acceptance rates roughly around 30 to 40%. Furthermore, to allow for
faster convergence and even better mixing, we follow So et al. (2005). Thus, we perform the Random
Walk M-H algorithm for the first N1 of the total N PMMH iterations, and form the sample mean and
covariance, (θ̄, Σ̄), of the draws {θ(i)}, i = 1, ..., N1. Then, we perform the following Independence
Chain M-H algorithm for the remaining N2 = N − N1 iterations:

1. At iteration i, where i ≥ N1 + 1, generate θ∗ from N(θ̄, Σ̄).

2. Obtain p̂ (YT | θ∗) via θ∗ and a particle filter. Accept θ∗, p̂ (YT | θ∗) with probability

   aMH(θ∗, θ(i−1)) = min{1, [p̂ (YT | θ∗) p (θ∗) q(θ(i−1))] / [p̂ (YT | θ(i−1)) p(θ(i−1)) q(θ∗)]}, (3.2.5)

   where q(·) denotes the N(θ̄, Σ̄) density evaluated at its argument. Set i = i + 1 and repeat
   from Step 1.
Finally, we specify the priors as: µ ∼ N (0, 1), (ρ+ 1) /2 ∼ Beta (20, 1.5) and σ2 ∼ IG (4/2, 0.02/2),
where IG (v0/2, s0/2) denotes the inverse Gamma density, see Kim and Nelson (1999).
Table 3.1.: A particle filter in Ox
funcParticleFilter(const vhpf, const vESS, const vlogpdf, const vparam, const vy,
const iN)
{
    ...
    vw=ones(iN,1)*(1/iN);
    vlogpdf[0][0]=0;                              // No likelihood contribution
    [vh]=funcInParticles(vparam,iN);              // Initial particles
    for(i=1; i<iT; i++)                           // iT: number of observations
    {
        vA=funcResample(vw',iN);                  // Resample
        vh=funcDrawP(vh[vA'],vparam,vy[i][0],iN); // Draw from q()
        vE=vy[i]./(exp(0.5*vh));
        vtau=exp(-0.5.*vh).*densn(vE);            // Compute likelihood
        vlogpdf[0][i][0]=log(meanc(vtau));        // Take logs
        vw=vtau./sumc(vtau);                      // Normalize
        if (ismissing(vw))                        // Reset weights if missing
            vw=ones(iN,1)/iN;
        vhpf[0][i][0]=sumc(vw.*vh);               // Store mean
        vESS[0][i][0]=1/sumc(vw.^2);              // Store ESS
        if (vESS[0][i][0]<0.5*iN)                 // Resample if ESS < iN/2
        {
            vA=funcResample(vw',iN);
            vh=funcDrawP(vh[vA'],vparam,vy[i][0],iN);
            vw=ones(iN,1)*(1/iN);
        }
    }
    return 1;
}
3.2.1. A Particle filter
As stated in Section 3.2, we obtain p (YT | θ) by employing a particle filter. The particle filter is a
sequential simulation device for filtering of non-Gaussian, nonlinear state-space models. It can be
thought of as a generalization of the Kalman filter. Both the particle and Kalman filters produce
filtered estimates of α1:T and p (yt | Yt−1, θ) for t = 1, ..., T . In the Kalman filter case all these
quantities are exact, whereas in the particle filter case they are simulation-based estimates.
The main idea of the particle filter is to sample a cloud of particles, α(j)t, j = 1, ..., M, such that
together they describe the density of the state variable at time t conditional on Yt. At each t, we
propagate the particles, α(j)t, and update their associated weights. This way, we prevent accumulation
of errors by eliminating unpromising particles. In the following, we give a brief description of a very
general particle filter that we use throughout this paper. The reader is referred to Doucet et al. (2000)
and Creal (2012) for more details on particle filtering. Our particle filter scheme is as follows:
1. Set t = 1 and l0 = 0. Draw α(1)t, ..., α(M)t from αt | θ.

2. Compute τ(j)t = p(yt | α(j)t, Yt−1, θ) and w(j)t = τ(j)t / ∑_{k=1}^{M} τ(k)t for j = 1, ..., M.

3. Resample {α(1)t, ..., α(M)t} with probabilities w(1)t, ..., w(M)t. First, draw u ∼ U (0, 1). Let
   x(j) = u/M + (j − 1)/M for j = 1, ..., M, and find indices i(1), ..., i(M) such that
   ∑_{k=1}^{i(j)−1} w(k)t < x(j) ≤ ∑_{k=1}^{i(j)} w(k)t. We refer to this step as the
   “(multinomial) resampling step” for future reference.

4. Sample α(j)t+1 ∼ αt+1 | α(i(j))t, θ for j = 1, ..., M.

5. Compute lt (θ) = lt−1 (θ) + log(M⁻¹ ∑_{j=1}^{M} τ(j)t). Set t = t + 1 and go to Step 2.
From the particle filter, the filtered particles can be collected as output. We can use
ᾱt = ∑_{j=1}^{M} w(j)t α(j)t as an estimate of E [αt | Yt, θ]. An intuitive implementation of a very
basic particle filter used throughout this paper is provided in Table 3.1.
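For readers who do not use Ox, the same scheme can be sketched in Python. The sketch below (all names are ours) is a bootstrap filter for the SV model of Section 3.2 that, for brevity, resamples at every step rather than only when the ESS drops, unlike Table 3.1; it returns the log-likelihood estimate lT(θ):

```python
import math
import random

def pf_loglik(y, mu, rho, sigma, M=500, seed=0):
    """Bootstrap particle filter estimate of log p(Y_T | theta) for the
    SV model (3.2.1)-(3.2.2), with M particles."""
    rng = random.Random(seed)
    sd0 = sigma / math.sqrt(1.0 - rho ** 2)
    alpha = [mu + sd0 * rng.gauss(0.0, 1.0) for _ in range(M)]   # Step 1
    loglik = 0.0
    for yt in y:
        # Step 2: tau_j = p(y_t | alpha_j, theta), an N(0, exp(alpha_j)) density
        tau = [math.exp(-0.5 * (math.log(2.0 * math.pi) + a
                                + yt * yt * math.exp(-a))) for a in alpha]
        mean_tau = sum(tau) / M
        loglik += math.log(mean_tau)             # Step 5: likelihood contribution
        w = [t_ / (M * mean_tau) for t_ in tau]  # normalized weights
        # Step 3: multinomial resampling via the cumulative weights
        cdf, c = [], 0.0
        for wj in w:
            c += wj
            cdf.append(c)
        idx = []
        for _ in range(M):
            u = rng.random()
            j = 0
            while j < M - 1 and cdf[j] < u:
                j += 1
            idx.append(j)
        # Step 4: propagate the resampled particles through the state equation
        alpha = [mu + rho * (alpha[j] - mu) + sigma * rng.gauss(0.0, 1.0)
                 for j in idx]
    return loglik
```

Plugging pf_loglik into the M-H acceptance ratio of Section 3.2 then gives a complete, if slow, PMMH sampler for this model.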
Del Moral (2004, Theorem 7.4.2) shows that E[p̂ (YT | θ)] = p (YT | θ), where p̂ (YT | θ) = exp (lT (θ)).
Therefore, since the particle filter provides us with an unbiased estimate of p (YT | θ), we can use the
result in Andrieu et al. (2010) and replace p (YT | θ) with p̂ (YT | θ) inside a MCMC sampling scheme.
Thereafter, we sample θ, which has a lower dimension than (θ, α1:T), using M-H, see Flury and
Shephard (2011).
Basic code for evaluating aMH(θ∗, θ(i−1)) itself could be implemented as in Table 3.2. The routine
funcMetropolisHastings() draws θ∗ and evaluates aMH(θ∗, θ(i−1)) using (3.2.4) for i = 1, ..., N1 and
(3.2.5) for i = N1 + 1, ..., N. The function funcPosterior() evaluates p (θ∗ | YT) as θ∗ is generated.
In order to evaluate p (θ∗ | YT), we need to run the particle filter using θ∗. For i = 1, ..., N1, we specify
the random walk variance, Σq, as vtune. As stated before, in order to get reasonable acceptance
probabilities, we experiment with several values for vtune. For instance, for the Gaussian SV
model, we set: ∆µ(i) = 0.2828 ε(i)1, ∆ρ(i) = 0.01 ε(i)2 and ∆σ2(i) = 0.01 ε(i)3, where ε(i)k ∼ N (0, 1)
for k = 1, ..., 3. Thereafter, we follow Section 3.2 and perform Independence Chain Metropolis-Hastings
for the remaining N2 draws. The denominator of aMH is evaluated using θ(i−1), p(θ(i−1)) and
p̂ (YT | θ(i−1)), which are available from the previous iteration. We complete the M-H step by drawing
u from the standard Uniform distribution, U (0, 1). If aMH(θ∗, θ(i−1)) ≥ u, we set θ(i) = θ∗ and
p̂ (YT | θ(i)) = p̂ (YT | θ∗); else we retain θ(i−1) and p̂ (YT | θ(i−1)). Thereafter, we take another
PMMH iteration and move along the chain. After dropping a set of burn-in samples, the remaining
draws are collected for inference.
3. A Survey of PMCMC Techniques for Unobserved Component Time Series Models
57
Table 3.2.: PMMH scheme in Ox

funcMetropolisHastings(const mprior, const vy, const vparamo, const vparamq,
                       const mcov, const dlogpdfo, const iN, const idumth)
...
// Draw the candidate, vparamp
[vparamp] = funcDrawparam(vparamq, mcov, idumth);
// Evaluate the posterior at the new (dnum) and old (dden) parameter value
[dnum, dden, dlogpdfp] =
    funcPosterior(vparamp, vparamo, dlogpdfo, vy, mcov, mprior, iN);
idum = 0;
vparamnew = vparamo;      // vparamo is the old parameter value
dlogpdfnew = dlogpdfo;    // dlogpdfo is the old likelihood value
// The new parameters start out equal to the old;
// if dalpha > du they are replaced by the candidate
dalpha = min(1, exp(dnum - dden));  // Calculate aMH
du = ranu(1, 1);                    // Draw a standard uniform number
if (dalpha > du)                    // Accept the candidate
{
    vparamnew = vparamp;
    dlogpdfnew = dlogpdfp;
    idum = 1;                       // Record that the draw was accepted
}
return {vparamnew, dlogpdfnew, idum};
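The same accept/reject logic can be sketched in Python as a language-agnostic analogue of the Ox routine above; the function names and arguments here are illustrative, not part of the thesis code.

```python
import numpy as np

def pmmh_step(theta_old, logpost_old, draw_proposal, log_posterior, rng):
    """One PMMH accept/reject step. log_posterior(theta) should return
    log p(theta) + log p_hat(Y_T | theta), where p_hat(Y_T | theta) comes
    from a fresh particle-filter run at theta."""
    theta_new = draw_proposal(theta_old, rng)   # candidate theta*
    logpost_new = log_posterior(theta_new)      # runs the particle filter
    # a_MH = min(1, exp(num - den)); proposal terms cancel for a symmetric
    # random walk proposal
    if rng.uniform() < np.exp(min(0.0, logpost_new - logpost_old)):
        return theta_new, logpost_new, 1        # accepted
    return theta_old, logpost_old, 0            # rejected
```

Iterating this step N times, storing the draws, and discarding the burn-in reproduces the scheme described in the text.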
3.2.2. Bayes factors and marginal likelihood computation
The main output from the particle filter is the log-likelihood contribution of y_t with α_t integrated out. The sum of the log-likelihood contributions delivers the estimated log-likelihood of the data with the unobserved states integrated out, p̂(Y_T | θ). This quantity can then be used to compute the marginal likelihood (ML) for model M. The ML measures the success the model has in accounting for the data after the parameter uncertainty has been integrated out over the prior. It is defined as

p(Y_T | M) = ∫_Θ p(Y_T | θ, M) p(θ | M) dθ.

In the following, the model index, M, is suppressed for conciseness. Gelfand and Dey (1994) propose a compelling and general method to calculate the ML. It is efficient and uses the same routines when calculating the ML for different models. The Gelfand-Dey (G-D) estimate of the marginal likelihood is based on

(1/N) Σ_{i=1}^N g(θ^(i)) / [p(Y_T | θ^(i)) p(θ^(i))] → p(Y_T)^{−1} as N → ∞, (3.2.6)
where, as before, N is the number of PMMH iterations. G-D show that if g(θ^(i)) is thin-tailed relative to p(Y_T | θ^(i)) p(θ^(i)), then (3.2.6) is bounded and the estimator is consistent. Following Geweke (2005), a (truncated) Normal distribution, N(θ*, Σ*), is used for g(θ). θ* and Σ* are the posterior sample moments, calculated as

θ* = (1/N) Σ_{i=1}^N θ^(i) and Σ* = (1/N) Σ_{i=1}^N (θ^(i) − θ*)(θ^(i) − θ*)′

whenever θ^(i) is in the domain of the truncated Normal. The domain, Θ, is defined as

Θ = {θ : (θ − θ*)′ (Σ*)^{−1} (θ − θ*) ≤ χ²_a(z)},

where z is the dimension of the parameter vector and χ²_a(z) is the a-th percentile of the Chi-squared distribution with z degrees of freedom. In practice, 0.75, 0.95 and 0.99 are popular selections for a.
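A NumPy sketch of the G-D estimator follows; it takes the PMMH output (draws, particle-filter log-likelihoods and log-priors) and the χ²_a(z) critical value as inputs. The function name and interface are illustrative, not from the thesis.

```python
import numpy as np

def gelfand_dey_logml(draws, loglik, logprior, a, chi2_crit):
    """Gelfand-Dey log marginal likelihood from N posterior draws (N x z).
    loglik[i] = log p_hat(Y_T | theta_i), logprior[i] = log p(theta_i),
    chi2_crit = a-th percentile of chi2 with z d.o.f. (e.g. 7.81 for
    z = 3, a = 0.95)."""
    N, z = draws.shape
    theta_star = draws.mean(axis=0)
    dev = draws - theta_star
    Sigma_star = dev.T @ dev / N
    _, logdet = np.linalg.slogdet(Sigma_star)
    Sinv = np.linalg.inv(Sigma_star)
    # Mahalanobis distance of each draw; keep draws inside the domain
    maha = np.einsum('ij,jk,ik->i', dev, Sinv, dev)
    keep = maha <= chi2_crit
    # log g(theta): truncated N(theta*, Sigma*), renormalized by mass a
    log_g = -np.log(a) - 0.5 * (z * np.log(2 * np.pi) + logdet + maha[keep])
    # log of the summands in (3.2.6), combined with a log-sum-exp
    terms = log_g - loglik[keep] - logprior[keep]
    m = terms.max()
    log_inv_ml = m + np.log(np.exp(terms - m).sum() / N)
    return -log_inv_ml            # log p(Y_T)
```

As a sanity check, for a conjugate model (one observation y = 0 from N(θ, 1) with prior θ ∼ N(0, 1)) the estimator recovers the analytical log ML, −(1/2) log(4π).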
Once the marginal likelihood for different specifications has been calculated, we can compare them using Bayes factors, BF. The relative evidence for M_A versus M_B is

BF_{AB} = p(y_1, ..., y_T | M_A) / p(y_1, ..., y_T | M_B).

Kass and Raftery (1995) recommend considering 2 log BF_{AB} for model comparison. They suggest the following rule-of-thumb for the support of M_A based on 2 log BF_{AB}: 0 to 2 is not worth more than a bare mention, 2 to 6 is positive, 6 to 10 is strong, and greater than 10 is very strong evidence.
3.3. Stochastic Volatility Models
In this section, we estimate the standard stochastic volatility (SV) model along with different extensions. The first of these extensions we label the SVt model. Here, ε_t ∼ St(v), where St stands for the Student-t distribution with v > 2 degrees of freedom. In the second extension, we incorporate a leverage effect by letting φ denote the correlation coefficient between ε_t and η_t. We shall refer to this model as SVL. Notice that in both cases only small adjustments are needed in the codes. With regards to the SVt model, we can follow Bollerslev (1987) and set

p(y_t | α_t, Y_{t−1}, θ) = [Γ((v+1)/2) / (Γ(v/2) √((v−2)π) σ_t)] (1 + y_t² / ((v−2) σ_t²))^{−(v+1)/2},

where σ_t = exp(α_t/2).
We then follow the sampling steps as before. On the other hand, if we were to use "pure" Gibbs sampling to estimate the SVt model, we would be forced to convert the model into a conditionally Gaussian state-space model by letting ε_t = λ_t^{−1/2} e_t, where e_t ∼ N(0, 1) and λ_t ∼ G(v/2, v/2). We would then follow the steps in Chib et al. (2002) and sample from the augmented posterior, p(θ, v, α_{1:T}, z_1, ..., z_T, λ_1, ..., λ_T | Y_T), where, as before, z_t serves as the mixture component indicator.
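Within PMMH, by contrast, only the weight computation in step 2 of the particle filter changes. A NumPy sketch of the scaled Student-t weights (the Bollerslev (1987) parameterization above) might look as follows; the function name is illustrative.

```python
import numpy as np
from math import lgamma

def t_weights(y_t, alpha, v):
    """tau_t^(j) for the SVt model: scaled Student-t density with
    variance exp(alpha_j) and v > 2 degrees of freedom, evaluated at y_t."""
    sig = np.exp(alpha / 2.0)
    # Normalizing constant Gamma((v+1)/2) / (Gamma(v/2) sqrt((v-2) pi))
    c = np.exp(lgamma((v + 1) / 2) - lgamma(v / 2)) / np.sqrt((v - 2) * np.pi)
    return (c / sig) * (1.0 + y_t**2 / ((v - 2) * sig**2)) ** (-(v + 1) / 2)
```

Because the density is parameterized to have variance exp(α), it integrates to one in y for any fixed α and v, which is easy to verify numerically.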
For the SVL model, we need only rewrite (3.2.2) as

α_{t+1} = µ + ρ(α_t − µ) + σφ y_t exp(−α_t/2) + σ√(1 − φ²) ξ_t.
Here, we use that η_t = φ ε_t + √(1 − φ²) ξ_t, where ξ_t ∼ N(0, 1), and we need to sample an additional parameter, φ. We choose the same priors as in Section 3.2 for µ, ρ and σ². With regards to the additional parameters, we let v ∼ Exp(0.2), where Exp denotes the Exponential distribution, and we let φ ∼ TN_{]−1,1[}(0, 1), where TN refers to the Normal distribution truncated to the indicated domain. The prior on φ thus assumes that φ lies between −1 and 1. Furthermore, we ensure that |φ^(i)| < 1 (respectively, v^(i) > 2) by resampling φ^(i) (respectively, v^(i)) until the restriction is satisfied.
We can also expand the plain stochastic volatility model by allowing the errors in the measurement
equation to follow a moving average (MA) process of order m, see Chan (2013). This means that the
errors in the measurement equation are no longer serially independent as for the plain SV model. In
this paper we choose a simpler specification and set m = 1. Hence, our model is given as
y_t = e^{α_t/2} ε_t + ψ_1 e^{α_{t−1}/2} ε_{t−1}, ε_t ∼ N(0, 1),

where α_t follows (3.2.2). As before, we impose that |ρ| < 1 and α_1 ∼ N(µ, σ²/(1 − ρ²)). We also ensure that the root of the characteristic polynomial associated with the MA coefficient, ψ_1, is outside the unit circle. Notice that under the standard stochastic volatility model, the conditional variance of y_t is simply e^{α_t}. However, under the moving average variant, the conditional variance of y_t is given by e^{α_t} + ψ_1² e^{α_{t−1}}, see Chan (2013). Thus, the conditional variance for SV-MA(1) is time-varying through two channels: the moving average term ψ_1² e^{α_{t−1}} and α_t, which evolves according to (3.2.2). Estimating this model is straightforward, as we again only need to make small adjustments in the codes. With regards to ψ_1, we let ψ_1 ∼ TN_{]−1,1[}(0, 1).
The flexibility of PMCMC, and specifically PMMH, can be used to estimate other attractive specifications of the stochastic volatility model. For instance, consider the stochastic volatility in mean (SVM) model of Koopman and Hol Uspensky (2002), where e^{α_t/2} appears in both the conditional mean and the conditional volatility. We define the SVM model as

y_t = λ exp(α_t/2) + exp(α_t/2) ε_t, ε_t ∼ N(0, 1),

where α_t follows (3.2.2). Estimation of this extension is nontrivial using "pure" Gibbs sampling. This is because drawing α_{1:T} ∼ p(α_{1:T} | θ, λ, Y_T) is computationally more demanding, since the model cannot be written in linear state-space form. However, within the PMMH context, estimating the SVM model is straightforward. In fact, we note that p(y_t | α_t, Y_{t−1}, θ, λ) ∼ N(λ exp(α_t/2), exp(α_t)). Incorporating this specification in the particle filter is very easy: we only need to modify step 2 of the algorithm. Thus, we use τ_t^(j) = N(λ exp(α_t^(j)/2), exp(α_t^(j))), j = 1, ..., M. Furthermore, we sample an additional parameter in the M-H step, namely λ, where λ ∼ N(0, 1). We present simulation results for SV-MA(1) and SVM in the Appendix. Finally, we estimate a two-factor SV model, TFSV:

y_t = exp((µ + α_t + α_{2t})/2) ε_t, ε_t ∼ N(0, 1)
α_{t+1} = ρ α_t + σ η_t, η_t ∼ N(0, 1)
α_{2,t+1} = ρ_2 α_{2t} + σ_2 η_{2t}, η_{2t} ∼ N(0, 1),

with |ρ| < 1, α_1 ∼ N(0, σ²/(1 − ρ²)) and |ρ_2| < 1, α_{21} ∼ N(0, σ_2²/(1 − ρ_2²)).
Estimating the two-factor SV model using PMMH is straightforward. First, we collect all the parameters in θ = (µ, ρ, ρ_2, σ², σ_2²)′. Then, we only need to modify our particle filter such that we draw two sets of particles (one for α_t and one for α_{2t}) instead of one. This part is also very easy and costs almost nothing in terms of computation. We specify the random walk increments in the M-H part as Δµ^(i) = 0.3162 ε_1^(i), Δσ^{2(i)} = 0.01 ε_2^(i) and Δσ_2^{2(i)} = 0.03 ε_3^(i). With regards to sampling ρ and ρ_2, we follow Fouque et al. (2010). First, we impose the restriction ρ^(i) > ρ_2^(i), which is needed for identification. We then draw ρ and ρ_2 from truncated Normal densities. For instance, ρ* ∼ TN_{]−1,1[}(b, B^{−1}), where b = (1/B) Σ_{t=1}^T α_t^{(i−1)} α_{t−1}^{(i−1)} and B = Σ_{t=1}^T (α_{t−1}^{(i−1)})² (see footnote 5). Simultaneously with ρ* and ρ_2*, we generate µ*, σ^{2*} and σ_2^{2*}. Thereafter, we accept or reject θ*. We specify the priors as

µ ∼ N(0, 1),
ρ ∼ TN_{]−1,1[}(0, 1), ρ_2 ∼ TN_{]−1,1[}(0, 1),
σ² ∼ IG(4/2, 0.02/2), σ_2² ∼ IG(4/2, 0.02/2).
The top panel of Figure 3.1 displays the daily OMX Copenhagen 20 (OMXC20) index for the period 1/2/2006-12/30/2010, followed by the daily returns and the filtered estimates of σ_t = exp(α_t/2), t = 1, ..., T, for SVL. From these figures, strong differences in return and volatility are immediately apparent (see footnote 6). For instance, the top left panel shows a sharp decrease in the OMXC20 index towards the end of 2008. At the same time, the third panel in the top row of Figure 3.1 shows an increase in σ_t. We set M = 1000, and run our samplers for N = 20000 M-H iterations (see footnote 7). After discarding the first 10000 iterations, we collect the final sample and compute: the posterior mean of θ, θ̄, 95% credibility intervals (indicated inside the brackets), the inefficiency measures, R_B, the log-likelihood that results from the particle filter, the logarithm of the marginal likelihood, log(ML), for a = 0.75 and a = 0.99, and the M-H acceptance ratio. The inefficiency measures display the relative variance of the posterior sample draws when adapting for correlation between iterations, as compared to the variance without accounting for correlation. In these calculations, we follow Bos (2011), choosing a bandwidth, B, of 100, see Kim et al. (1998).
Results are summarized in Table 3.3. Overall, we find that the Gaussian SVL model performs best in terms of the marginal likelihood criterion. The 2logBF of SVL versus SV is 7.8, which indicates strong evidence in favor of the SVL model (see footnote 8). Compared to SVt, the 2logBF in favor of SVL is 12.8, which is very strong evidence. The values of ρ close to one confirm strong daily volatility persistence, in accordance with typical estimates reported in the literature. Notice that the persistence increases slightly as the fat-tailed error distribution is introduced (ρ = 0.9810 and ρ = 0.9852 for SV and SVt, respectively), and drops from ρ = 0.9810 for SV to ρ = 0.9777 for SVL. In the SVt model, the distribution of the degrees of freedom parameter is centered around 14.20 with a standard deviation of 2.28. The posterior mean of φ for the SVL model is −0.22, negative as expected. We also find that
5. We could also choose a Beta(., .) prior for ρ and ρ_2. However, we find that ρ_2 is very sensitive with regards to the hyperparameter values of Beta(., .). Therefore, we choose a more uninformative prior, i.e. TN_{]−1,1[}(0, 1).
6. OMX Copenhagen 20 (OMXC20) is the Copenhagen Stock Exchange's leading share index. The index consists of the 20 most actively traded shares on the Copenhagen Stock Exchange.
7. We experiment with different values of M to find out its effect by examining the corresponding Markov chain, see the Appendix. We can also set M to obtain a specified level of the variance of p̂(Y_T | θ) for a given θ.
8. An advantage of using Bayes factors is that they automatically incorporate Occam's razor, in that they penalize highly parametrized models that do not deliver improved fit.
Table 3.3.: Estimation results of SV models

Parameter        SV                SVt               SVL                SV-MA(1)          SVM               TFSV
µ                0.3526 (7.54)     0.2562 (7.38)     0.4221 (6.89)      0.3779 (7.78)     0.3754 (7.43)     0.3598 (6.23)
                 [0.1112,0.5895]   [-0.0482,0.5508]  [0.2046,0.6436]    [0.0831,0.6703]   [0.0685,0.6708]   [0.0821,0.6332]
ρ                0.9810 (8.27)     0.9852 (5.33)     0.9777 (6.97)      0.9817 (8.86)     0.9805 (9.63)     0.9776 (7.13)
                 [0.9743,0.9887]   [0.9783,0.9914]   [0.9716,0.9838]    [0.9730,0.9893]   [0.9727,0.9882]   [0.9671,0.9881]
ρ2                                                                                                          0.0458 (10.66)
                                                                                                            [-0.0642,0.1576]
σ²               0.0296 (9.13)     0.0221 (6.11)     0.0319 (7.36)      0.0285 (7.97)     0.0311 (9.16)     0.0373 (6.59)
                 [0.0199,0.0394]   [0.0149,0.0297]   [0.0231,0.0401]    [0.0183,0.0395]   [0.0203,0.0421]   [0.0236,0.0512]
σ²2                                                                                                         0.0981 (6.31)
                                                                                                            [0.0276,0.1728]
φ                                                    -0.2216 (7.04)
                                                     [-0.3202,-0.1226]
ψ1                                                                      0.0045 (7.63)
                                                                        [-0.0296,0.0376]
λ                                                                                        0.0563 (6.37)
                                                                                        [0.0223,0.0892]
v                                  14.2019 (8.06)
                                   [9.9591,18.8712]
log(L)           -2113.6           -2114.2           -2106.1            -2113.4           -2112.1           -2113.2
log(ML), a=0.75  -2125.4           -2127.9           -2121.5            -2129.1           -2128.0           -2186.3
log(ML), a=0.99  -2125.1           -2127.6           -2121.2            -2129.1           -2127.7           -2186.1
M-H ratio        0.35              0.36              0.35               0.33              0.33              0.34

This table reports estimation results for different stochastic volatility model specifications. R_B is indicated inside the parentheses. log(L): log-likelihood, log(ML): log marginal likelihood using the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio.
ψ_1 = 0.0045 for the SV-MA(1) model. Compared to SV-MA(1), the 2logBF in favor of SVL is 15.8. On the other hand, we estimate λ at 0.0563 with a standard deviation of 0.0168. However, compared to SV or SVL, SVM does not offer any improvement in terms of ML. Finally, TFSV performs worst in terms of ML. The estimates of µ, ρ and σ² are relatively close to those for the plain SV model. On the other hand, the estimates of ρ_2 and σ_2² show that the second factor, α_{2t}, is very close to being a white noise process.

We report the Markov chain output for µ | Y_T, ρ | Y_T, σ² | Y_T and ψ | Y_T, along with the posterior densities and autocorrelation functions of these parameters, in Figure 3.1. The chain mixes well, with relatively fast decaying autocorrelation functions.
Figure 3.1.: Estimation results, Gaussian SVL model

[Figure: the top row shows the OMXC20 index, the daily returns and E(σ_t | y_1, ..., y_T) over 2006-2012; the remaining panels show the particle filter log-likelihood over the iterations and the Markov chains, posterior densities and autocorrelation functions of µ, ρ, σ² and ψ.]

Markov chain, posterior density and autocorrelation function of the parameters for the Gaussian SVL model. Notice that the first 10000 iterations are considered as the burn-in period and are therefore discarded. Note: for graphical output, the professional version of Ox is needed. Alternatively, the updated GnuDraw package of Bos (2013) can be used.
3.4. Unobserved Components Model of US Inflation
In this section, we reconsider the unobserved components model of Stock and Watson (2007), and provide a computational framework for this model using PMMH. This model provides a simple yet sufficient framework for discussing the main stylized facts concerning inflation. Specifically, the model postulates a decomposition of observed inflation into two components: the regular component, which captures the trend in inflation, and the irregular component, which captures the deviations of inflation from its trend value. We start from a specification where the components are driven by disturbances whose variances are constant over time. Thereafter, we consider specifications in which the components are driven by disturbances whose variances evolve over time according to a stationary stochastic volatility
process. Finally, we carry out systematic model selection by comparing the marginal likelihoods
implied by the different models of inflation volatility.
We focus on quarterly inflation rates constructed from the seasonally adjusted consumer price index, made available by FRED (Federal Reserve Economic Data). We denote the quarterly series by CPI_t. The annualized quarterly inflation rate, y_t, t = 1, ..., T, is computed as y_t = 400 ln(CPI_t / CPI_{t−1}). For the analysis, we use data from 1952:q1-2013:q1. In the following, the most general specification of the unobserved component (UC) model is defined as

y_t = α_t + ε_t, ε_t ∼ N(0, σ_ε²), (3.4.1)
α_{t+1} = α_t + η_t, η_t ∼ N(0, σ_η²), (3.4.2)
see Grassi and Proietti (2010). This model contains two parameters, θ = (σ_ε², σ_η²)′, along with a vector of unobserved states, α_{1:T}. We let σ_ε² ∼ IG(4/2, 0.02/2) and σ_η² ∼ IG(4/2, 0.02/2). We also provide extensions of (3.4.1) and (3.4.2) by incorporating stochastic volatility effects in σ_ε², or in both σ_ε² and σ_η². First, let ε_t ∼ N(0, e^{h_{1t}}), where

h_{1,t+1} = µ_1 + ρ_1 (h_{1t} − µ_1) + σ_1 η_{1t}, η_{1t} ∼ N(0, 1).
Hence, we add a second unobserved state, which describes the evolution of the volatility of the irregular component of inflation. We shall refer to this model as UC-SVm. Finally, we add a third unobserved state, which describes the volatility of α_t, i.e. η_t ∼ N(0, e^{h_{2t}}). Henceforth, we refer to this model as UC-SV (see footnote 9). UC-SV and UC-SVm both have a special structure. For instance, for the UC-SV model, conditional on h_{kt}, k = 1, 2, the remaining model is a linear Gaussian state-space model, where α_{1:T} can be integrated out analytically using the Kalman filter. This is known as Rao-Blackwellization in the literature, because it is an implication of the Rao-Blackwell theorem, see Robert and Casella (2004). When this is possible, the state vectors can be separated. Particles are only simulated for h_{kt}^{(j)}, k = 1, 2, and conditional on these, α_t, t = 1, ..., T, can be integrated out analytically using the prediction and updating steps of the Kalman filter.
In the PMMH procedure, we modify our particle filter using the approach of Creal (2012). Thereafter, we sample θ in one block. On the other hand, estimating UC-SV using "pure" Gibbs sampling would require more programming effort. For instance, using a Gibbs sampling approach, we would proceed by cycling through the following steps:
1. α_{1:T} | h_{1,1:T}, h_{2,1:T}, Y_T.
2. h_{1,1:T} | µ_1, ρ_1, σ_1², α_{1:T}, z_{1,1:T}, Y_T, where h_{k,1:T} = h_{k1}, ..., h_{kT}.
3. h_{2,1:T} | µ_2, ρ_2, σ_2², α_{1:T}, z_{2,1:T}.
4. z_{1,1:T} | h_{1,1:T}, α_{1:T}, Y_T.
5. z_{2,1:T} | h_{2,1:T}, α_{1:T}, see Kim et al. (1998).
6. Finally, we sample θ element-by-element, see Kim et al. (1998) and Grassi and Proietti (2010).
9. The specification of the stochastic volatility processes differs only slightly from Stock and Watson (2007), who assume a random walk process for h_{kt}, k = 1, 2. In this paper we follow Grassi and Proietti (2010).
Below, we provide the steps of the modified particle filter for the UC-SV model, see Creal (2012) for
further background on Rao-Blackwellization.
1. For t = 1, draw α_1^(1), ..., α_1^(M) and P_1^(1), ..., P_1^(M), where P_t is the covariance of α_t and is obtained from the Kalman filter, see Kim and Nelson (1999). Draw h_{k1}^(1), ..., h_{k1}^(M) for k = 1, 2, and set τ_1^(1), ..., τ_1^(M) = 1/M.

2. For t = 2, ..., T, use the prediction step of the Kalman filter. Thereafter, obtain the prediction errors and variances, {v_t^(j), f_t^(j)}_{j=1}^M.

3. Compute τ_t^(j) = N(v_t^(j), f_t^(j)) and normalize w_t^(j) = τ_t^(j) / Σ_{k=1}^M τ_t^(k).

4. Resample the M particles {α_t^(j), P_t^(j), h_{1t}^(j), h_{2t}^(j)}_{j=1}^M with probabilities {w_t^(j)}_{j=1}^M and set w_t^(j) = 1/M.

5. Draw h_{kt}^(1), ..., h_{kt}^(M), k = 1, 2, and run the updating step of the Kalman filter on each of these particles to obtain {α_t^(j), P_t^(j)}_{j=1}^M.
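A minimal NumPy sketch of this Rao-Blackwellized filter follows. Particles are drawn only for (h_{1t}, h_{2t}); conditional on them, α_t is handled by a scalar Kalman recursion per particle. The initialization of the Kalman mean and variance is an assumption (a vague prior), and the function name is illustrative.

```python
import numpy as np

def rb_pf_ucsv(y, mu1, rho1, s2_1, mu2, rho2, s2_2, M=500, seed=0):
    """Rao-Blackwellized particle filter for the UC-SV model:
    y_t = alpha_t + eps_t, eps_t ~ N(0, exp(h1_t));
    alpha_{t+1} = alpha_t + eta_t, eta_t ~ N(0, exp(h2_t)).
    Returns the log-likelihood estimate."""
    rng = np.random.default_rng(seed)
    # Volatility particles from their stationary distributions
    h1 = mu1 + np.sqrt(s2_1 / (1 - rho1**2)) * rng.standard_normal(M)
    h2 = mu2 + np.sqrt(s2_2 / (1 - rho2**2)) * rng.standard_normal(M)
    a = np.zeros(M)            # Kalman mean of alpha_t (assumed vague prior)
    P = np.full(M, 25.0)       # Kalman variance of alpha_t (assumption)
    loglik = 0.0
    for t in range(len(y)):
        # Prediction error and variance per particle; Gaussian weight
        v = y[t] - a
        f = P + np.exp(h1)
        tau = np.exp(-0.5 * (np.log(2 * np.pi * f) + v**2 / f))
        loglik += np.log(tau.mean())
        w = tau / tau.sum()
        # Resample all particle-specific quantities together
        x = (rng.uniform() + np.arange(M)) / M
        idx = np.minimum(np.searchsorted(np.cumsum(w), x), M - 1)
        a, P, h1, h2, v, f = a[idx], P[idx], h1[idx], h2[idx], v[idx], f[idx]
        # Kalman updating step, then prediction for alpha_{t+1}
        k = P / f
        a = a + k * v
        P = P * (1 - k) + np.exp(h2)
        # Propagate the volatility particles to t+1
        h1 = mu1 + rho1 * (h1 - mu1) + np.sqrt(s2_1) * rng.standard_normal(M)
        h2 = mu2 + rho2 * (h2 - mu2) + np.sqrt(s2_2) * rng.standard_normal(M)
    return loglik
```

Because α_t is integrated out analytically, the particle dimension stays at two regardless of T, which is the source of the efficiency gain discussed in the text.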
We compare the performance of UC, UC-SVm and UC-SV using the marginal likelihoods. The results reported in Table 3.4 indicate that the UC-SV model performs best.
Table 3.4.: Estimation results of unobserved components (UC) models

Parameter        UC                UC-SVm            UC-SV
µ1                                 0.0602 (4.08)     -0.0764 (8.47)
                                   [-0.4104,0.5197]  [-0.3733,0.2223]
µ2                                                   -0.5886 (4.95)
                                                     [-0.9465,-0.2391]
ρ1                                 0.9456 (4.54)     0.9541 (7.84)
                                   [0.9067,0.9792]   [0.9278,0.9779]
ρ2                                                   0.9815 (7.21)
                                                     [0.9676,0.9938]
σ²ε              3.2280 (5.09)
                 [2.9026,3.5680]
σ²1                                0.1153 (3.93)     0.0844 (8.97)
                                   [0.0408,0.2215]   [0.0383,0.1406]
σ²2              2.0021 (4.57)     0.5090 (3.59)     0.0141 (7.75)
                 [1.2772,2.7866]   [0.3449,0.6826]   [0.0051,0.0255]
log(L)           -592.28           -469.35           -465.07
log(ML), a=0.75  -638.30           -497.77           -484.86
log(ML), a=0.99  -638.03           -497.50           -484.58
M-H ratio        0.42              0.48              0.40

This table reports estimation results for different UC models. R_B is indicated inside the parentheses. log(L): log-likelihood, log(ML): log marginal likelihood using the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio. Note: σ²2 = σ²η for UC and UC-SVm.
The filtered estimate of α_{1:T} and the filtered estimates of the volatilities are all available from the particle filter. They are pictured in Figure 3.2, together with y_t. These estimates largely confirm the results of Stock and Watson (2007) and Grassi and Proietti (2010). The volatility of the irregular component, exp(h_{1t}/2), increases during the high-inflation periods of the 1970s, while the volatility of the regular component, exp(h_{2t}/2), is relatively more stable. Specifically, exp(h_{2t}/2) has decreased substantially after 1982. The decrease in exp(h_{kt}/2), k = 1, 2, since the early 1980s and throughout the 1990s has been documented in a range of studies, and is often labeled "the Great Moderation". Furthermore, exp(h_{1t}/2) shows that the increase in volatility of the inflation rate during the last recession is mainly concentrated in the irregular component. Finally, we find that incorporating the steps in So et al. (2005) contributes to reducing R_B significantly. Specifically, R_B for each parameter drops by a factor of 2 to 3. For instance, compare the PMMH results of this section with the PG results of Section 3.6, where θ is sampled element-by-element using standard Gibbs techniques.
Figure 3.2.: Estimation results, UC-SV model

[Figure: four panels over 1952-2012.]

Top panels: inflation, filtered estimates of the trend inflation and the 1st difference of inflation. Bottom panels: volatility of the irregular and regular components of inflation.
3.5. Long Memory with Stochastic Volatility
In this section, we model the changing time series characteristics of the US inflation rate by considering a heteroskedastic ARFIMA model similar to Bos et al. (2012). However, Bos et al. (2012) apply maximum likelihood methods, whereas we use PMCMC. To the best of our knowledge, no work has been done on estimating long-memory models with SV effects using PMMH. Our ARFIMA(p,d,q)-SV model for the time series, y_t, is given by

Φ(L)(1 − L)^d (y_t − τ_t) = Ψ(L) σ_{εt} ε_t, ε_t ∼ N(0, 1), (3.5.1)
σ_{εt} = exp(α_t/2),
α_{t+1} = µ + ρ(α_t − µ) + σ η_t, η_t ∼ N(0, 1), (3.5.2)
where Φ(L) = 1 − φ_1 L − ... − φ_p L^p and Ψ(L) = 1 + ψ_1 L + ... + ψ_q L^q are autoregressive (AR) and moving average (MA) polynomials in the lag operator, L, with L^k y_t = y_{t−k} for k = 0, 1, ..., and integer orders p ≥ 0, q ≥ 0. Initially, the disturbance term, ε_t, is normally and identically distributed with expectation 0 and variance 1. However, we also consider a case where ε_t follows a heavy-tailed distribution, see Section 3.5.2. The fractional difference operator, (1 − L)^d, with d ∈ R, is given by

(1 − L)^d = Σ_{j=0}^∞ (d choose j) (−L)^j.

Here, we assume that −1 < d < 0.5. We further assume that the equations Ψ(x) = 0 and Φ(x) = 0 do not have common roots. In the standard ARFIMA model, the mean and variance are constant through time, that is, τ_t = τ and σ_{εt}² = σ_ε² for t = 1, ..., T. In this paper, we assume specific time-varying functions for τ_t and σ_{εt}². We specify τ_t as a regression model to capture seasonal variation in the time series through autoregressive coefficients. Hence, we set τ_t = X_t′β, where X_t is an m × 1 vector of lagged values of y_t, and β is an m × 1 vector of unknown autoregressive coefficients. As stated before, we model σ_{εt} in (3.5.1) as a SV process. Hence, we let σ_{εt} = exp(α_t/2), where α_t follows (3.5.2).
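The binomial coefficients of (1 − L)^d can be computed with a simple recursion, which is how fractional differencing is typically implemented in practice. A NumPy sketch (truncating the infinite sum at the start of the sample; the function names are illustrative):

```python
import numpy as np

def fracdiff_coefs(d, n):
    """First n coefficients pi_j of (1-L)^d = sum_j pi_j L^j, via the
    recursion pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return pi

def fracdiff(y, d):
    """Apply (1-L)^d to a finite series, truncating the sum at lag t."""
    n = len(y)
    pi = fracdiff_coefs(d, n)
    return np.array([np.dot(pi[:t + 1], y[t::-1]) for t in range(n)])
```

For d = 1 the recursion collapses to the ordinary first difference (π_0 = 1, π_1 = −1, π_j = 0 for j ≥ 2), and for 0 < d < 0.5 the coefficients decay hyperbolically, which generates the long-memory autocorrelation pattern discussed above.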
3.5.1. Monte Carlo
We conduct a set of Monte Carlo simulations to judge the performance of PMMH for the ARFIMA-SV model. We successively generate data from the ARFIMA-SV model, re-estimate it, and compare the estimated parameters with the parameters of the data generating process (DGP).
We focus on the ARFIMA(p,d,0)-SV model and start with a data series of T = 500 observations, which is in line with the actual data in the empirical application. As we are mostly interested in the ability of PMMH to estimate the persistence parameter of inflation, d, we concentrate on the ARFIMA(1,d,0)-SV model. We follow Bos et al. (2012), and set the true DGP parameters for y_t as d = 0.3 and φ = 0.4. The mean of the stochastic volatility sequence is set to µ = −1.2. The conditional variance of the volatility of y_t is set to σ² = 0.05, and the persistence parameter of volatility, ρ, is set to 0.97.
For each DGP, we first generate α_t according to the SV dynamics with α_1 ∼ N(µ, σ²/(1 − ρ²)). Thereafter, we define ε_t ∼ N(0, σ_{εt}²), and generate y_t through the ARFIMA dynamics using d and φ. For each of the R = 100 Monte Carlo iterations, we estimate the ARFIMA-SV model using 10000 M-H iterations with a burn-in of 1000 and M = 1000, conditional on the following priors for the model parameters:

d ∼ TN_{]−1,0.5[}(0, 1), φ ∼ TN_{]−1,1[}(0, 1),
µ ∼ N(0, 1), (ρ + 1)/2 ∼ Beta(20, 1.5),
σ² ∼ IG(4/2, 0.02/2).
Given a full run, we calculate the mean, median and mode of the posterior draws, θ^(i), i = 1, ..., N. We then take the mean of these quantities over the R replications. Finally, we also consider the root mean squared error (RMSE) for each parameter, defined as

RMSE = √[ (1/R) Σ_{h=1}^R ( (1/N) Σ_{i=1}^N θ_h^(i) − θ )² ],
where θ_h^(i) is the ith posterior draw of the hth Monte Carlo iteration and θ is the true DGP parameter. Results are summarized in Table 3.5. The mean, median and mode of d, φ, µ, σ² and d + φ are close to the true DGP values. As we increase T, we obtain more precise estimates of the parameters. Compared to T = 500, the RMSE of each parameter drops on average by 10−40% for T = 1000. From the simulations it is very clear that d and φ are strongly related: the correlation between d and φ is on average −0.75. This shows that distinguishing between short-run and long-run correlation, as modeled by φ and d, respectively, is inherently difficult, see Sowell (1992) and Bos et al. (2012). We repeat this simulation with T = 200 to see if a relatively smaller sample size leads to more dispersed estimates. Still, we find that d + φ is well identified: on average, d + φ = 0.68, with similar results for the median and mode. However, we find that it becomes relatively harder to identify d and φ separately. Furthermore, we are able to obtain decent estimates of µ and ρ, while σ² displays high variability between simulations. Compared to T = 500, the RMSE of σ² increases to 0.03.
Table 3.5.: Monte Carlo evidence for ARFIMA-SV

Parameter   True value   Mean      Median    Mode      RMSE
T = 200
d           0.30         0.2111    0.2245    0.2011    0.1590
φ           0.40         0.4660    0.4578    0.4653    0.1511
µ           -1.20        -0.8008   -0.8551   -0.6575   0.6531
ρ           0.97         0.9522    0.9601    0.9639    0.0528
σ²          0.05         0.0223    0.0183    0.0123    0.0308
d + φ       0.70         0.6771    0.6772    0.6664    0.0796
T = 500
d           0.30         0.2675    0.2751    0.2403    0.1017
φ           0.40         0.4403    0.4334    0.4570    0.1122
µ           -1.20        -0.9761   -1.0248   -0.9559   0.4214
ρ           0.97         0.9715    0.9742    0.9761    0.0123
σ²          0.05         0.0356    0.0322    0.0259    0.0210
d + φ       0.70         0.7077    0.7079    0.7035    0.0444
T = 1000
d           0.30         0.2906    0.2931    0.2823    0.0675
φ           0.40         0.4136    0.4114    0.4202    0.0862
µ           -1.20        -1.1917   -1.1166   -1.1121   0.2502
ρ           0.97         0.9736    0.9749    0.9760    0.0107
σ²          0.05         0.0412    0.0388    0.0341    0.0163
d + φ       0.70         0.7043    0.7047    0.7029    0.0378

This table reports the true DGP values together with the mean, median, mode and root mean squared error (RMSE) of the estimated parameters over the Monte Carlo simulations, generated from ARFIMA(1,d,0)-SV with model parameters as indicated, using samples of T = 200, T = 500 and T = 1000.
Table 3.6.: Model comparison by marginal likelihood

            G-D (a = 0.75)      G-D (a = 0.95)      G-D (a = 0.99)
            M1      M2          M1      M2          M1      M2
T = 200     0       100         0       100         0       100
                    (28.07)             (28.16)             (28.09)
T = 500     0       100         0       100         0       100
                    (67.72)             (67.75)             (67.69)
T = 1000    0       100         0       100         0       100
                    (142.11)            (142.17)            (142.03)

This table reports the number of times across the repetitions that each model obtains the highest ML. In each case, the data is generated according to ARFIMA(1,d,0)-SV. M1: ARFIMA(1,d,0); M2: ARFIMA(1,d,0)-SV. a is the a-th percentile of the Chi-squared distribution used to calculate the ML. The numbers in parentheses indicate the average logBF in favor of ARFIMA(1,d,0)-SV over the repetitions.
Finally, we also investigate the ability of G-D to identify the true model. Therefore, for each Monte Carlo iteration, we also estimate a homoscedastic ARFIMA(1,d,0) model using "pure" Gibbs sampling and compare the MLs (see footnote 10). We report the frequency over the repetitions with which each specification is best according to ML, using a = 0.75, 0.95 and 0.99. Results are summarized in Table 3.6.
Overall, we see that G-D performs very well. For each T , G-D identifies ARFIMA-SV as the best
performing model 100 out of 100 times. The numbers inside the parentheses report the average logBF
of ARFIMA-SV versus ARFIMA over the 100 Monte Carlo iterations for each a. These numbers
are fairly similar across different values of a. As expected, the logBFs clearly indicate a better fit of
ARFIMA-SV compared to ARFIMA.
3.5.2. US core inflation
We apply our model to a monthly time series of inflation, using the US City Average core consumer price index of the Bureau of Labor Statistics (BLS). This series, which is labeled CUUR0000SA0L1E, excludes the direct effects of price changes for food and energy. We denote the series by P_t and use data from 1957:1 until 2013:5, for a total of 676 months (see footnote 11). Following Bos et al. (2012), we construct the monthly US core inflation as p_t = 100 log(P_t/P_{t−1}). To adjust for part of the seasonality in the series, we regress the inflation on a set of seasonal dummies, D, as in p = Dβ + u. Instead of using the original inflation, p_t, we use y_t = u_t + p̄, where u_t is the residual after adjusting the inflation for the major
10. Throughout this paper, we estimate the homoscedastic ARFIMA specifications using "pure" Gibbs sampling.
11. Our sample is considerably longer than the sample used in Bos et al. (2012). We choose the longer sample period, 1957:1-2013:5, mainly because it provides us with what we think is a sufficient number of observations for each window (265 months) with regards to forecasting and recursive analysis.
3. A Survey of PMCMC Techniques for Unobserved Component Time Series Models
69
seasonal effects at time t, and p is the average inflation level (Bos et al., 2012). Figure 3.3 shows time
series plots of the price index, Pt, yt, the sample autocorrelation functions of yt and changes in yt, ∆yt.
Clearly, time variation in the mean and volatility are apparent from the time series plots of yt. The
autocorrelation function shows strong persistence and seasonal patterns even after regressing on the
seasonal dummies. Therefore, we also include seasonal effects in our models through AR coefficients.
As previously mentioned, yt exhibits time-variation in its conditional mean and volatility. Intuitively,
the 1970s show the highest level of inflation along with the greatest volatility and persistence in the
mean and volatility of inflation. The reduction in the volatility of inflation since the mid 1980s is very
noticeable. There are also outliers in yt, for example 1980:7. We use a dummy variable for this month
in the estimation procedure (Bos et al., 2012).
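As a concrete illustration, the dummy regression p = Dβ + u and the construction yt = ut + p can be sketched as follows (a hypothetical Python/NumPy sketch on synthetic data; the thesis computations use Ox):

```python
import numpy as np

def seasonally_adjust(p):
    """Regress monthly inflation p_t on 12 seasonal dummies, p = D beta + u,
    and return y_t = u_t + p_bar (residual plus the average inflation level)."""
    T = len(p)
    D = np.zeros((T, 12))
    D[np.arange(T), np.arange(T) % 12] = 1.0         # month-of-year dummies
    beta, *_ = np.linalg.lstsq(D, p, rcond=None)     # OLS seasonal means
    u = p - D @ beta                                 # deseasonalized residual
    return u + p.mean()                              # re-center at the average level

# synthetic check: a pure seasonal pattern around a mean of 0.3, plus noise
rng = np.random.default_rng(0)
p = np.tile(np.linspace(-0.2, 0.2, 12), 20) + 0.3 + 0.05 * rng.standard_normal(240)
y = seasonally_adjust(p)
```

Since the dummies span the constant, the residuals sum to zero and yt keeps the same sample mean as pt while the fixed seasonal pattern is removed.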
Figure 3.3.: Time series characteristics of the US core inflation rate

[Figure: four panels, (a)-(d).]

Panel (a): monthly time series of US core price index, Pt (1982-84=100), (b): monthly core inflation adjusted for fixed seasonals, yt, (c): sample autocorrelation function of yt (100 lags), (d): sample autocorrelation function of ∆yt.
Besides ARFIMA-SV, we also provide an extension by assuming that εt follows a heavy-tailed distribu-
tion. We model this feature by assuming that εt ∼ St (v), where, as before, St stands for the Student-t
distribution with v > 2 degrees of freedom. We label this model ARFIMA-SVt. As before, we
set M = 1000, and run the sampler for N = 20000 M-H iterations. After discarding the first 10000
iterations, we collect the final sample and compute: the posterior mean of θ, 95% credibility intervals,
RBs, the log-likelihood that results from the particle filter, the logarithm of the marginal likelihood,
log(ML) for a = 0.75, a = 0.99, and the M-H acceptance ratio. Table 3.7 presents estimation results
for: a homoscedastic ARFIMA model, ARFIMA-SV and ARFIMA-SVt. We use the same priors as
in Section 3.5.1 for d, µ, ρ and σ2. With regards to the additional parameter in the ARFIMA-SVt
Table 3.7.: Estimation results of ARFIMA models

Parameter        ARFIMA              ARFIMA-SV           ARFIMA-SVt
                 θ (RB)              θ (RB)              θ (RB)
d                0.3799 (8.14)       0.2931 (7.04)       0.2946 (6.28)
                 [0.3209,0.4407]     [0.2549,0.3297]     [0.2631,0.3251]
φ11              0.1305 (22.32)      0.1446 (5.04)       0.1459 (6.51)
                 [0.0655,0.1980]     [0.1064,0.1810]     [0.1157,0.1770]
φ12              0.2542 (21.70)      0.3791 (5.65)       0.3747 (5.83)
                 [0.1861,0.3213]     [0.3407,0.4172]     [0.3447,0.4060]
β0               0.1639 (7.12)       0.2252 (5.63)       0.2169 (6.83)
                 [0.0278,0.2949]     [0.1409,0.3101]     [0.1443,0.2904]
β(i80:7)         -1.0294 (9.77)      -1.1333 (6.06)      -1.1113 (6.13)
                 [-1.3461,-0.7176]   [-1.3727,-0.8897]   [-1.3058,-0.9236]
µ                                    -3.4733 (6.92)      -3.4761 (7.58)
                                     [-3.8827,-3.0422]   [-3.8589,-3.0660]
ρ                                    0.9845 (7.55)       0.9837 (8.48)
                                     [0.9766,0.9898]     [0.9748,0.9896]
σ2                                   0.0191 (6.94)       0.0188 (8.92)
                                     [0.0129,0.0268]     [0.0127,0.0256]
σ2ε              0.0318 (1.39)
                 [0.0285,0.0354]
v                                                        21.9569 (7.68)
                                                         [14.4626,29.9132]

log(L)           201.93              264.53              261.79
log(ML), a=0.75  183.59              236.13              228.91
log(ML), a=0.99  183.54              236.38              229.17
M-H ratio                            0.38                0.38

This table reports estimation results for different ARFIMA(12,d,0) type models. β(i80:7) is included in the mean equation. RB is indicated inside the parentheses. log(L): log-likelihood, log(ML): log-marginal likelihood using the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio.
model, we let p (v) ∼ Exp (5). Furthermore, we experiment with different AR lags. We find φ11 and
φ12 to be very significant. Thus, we let p (φ1) ∼ N (0, 1), ..., p (φ12) ∼ N (0, 1), and employ rejection
sampling to ensure that the roots of (1 − φ1^(i)L − ... − φ12^(i)L^12) = 0 lie outside the unit circle. However,
φ1, ..., φ10 are not significant in our applications and are therefore fixed at zero.
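The stationarity check behind the rejection step can be sketched as follows (an illustrative Python sketch, not the thesis' Ox code; the φ11, φ12 values are the ARFIMA-SV posterior means from Table 3.7):

```python
import numpy as np

def ar_roots_outside_unit_circle(phi):
    """True if all roots of 1 - phi_1 L - ... - phi_p L^p = 0 lie outside
    the unit circle, i.e. the AR part is stationary."""
    coeffs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))  # ascending in L
    roots = np.roots(coeffs[::-1])        # np.roots expects descending powers
    return bool(np.all(np.abs(roots) > 1.0))

phi = np.zeros(12)
phi[10], phi[11] = 0.1446, 0.3791   # phi_11, phi_12 posterior means (Table 3.7)
stationary = ar_roots_outside_unit_circle(phi)   # draws failing this are rejected
```

In a rejection-sampling step, any proposed draw of (φ1, ..., φ12) for which this check fails would simply be discarded and redrawn.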
We find that the Gaussian ARFIMA-SV model performs best in terms of the marginal likelihood
criterion. In fact, 2logBF in favor of ARFIMA-SV versus ARFIMA is 105.68, which is interpreted as
a very significant improvement. In the ARFIMA-SVt model, the distribution of v is centered around
22 with a standard deviation of 3.94. However, compared to ARFIMA-SVt, the marginal likelihood
indicates strong evidence in favor of the ARFIMA-SV model. For the ARFIMA model, the order of
integration, d, is estimated at 0.38. This implies that US core inflation exhibits long-memory behavior.
φ12 captures the main seasonal effects. The mean inflation, β0, is estimated at 0.16%. The residual
standard error, σε, is large at 0.18% per month. The inflation rate in 1980:7 is a negative additive
outlier and very significant. When we compare ARFIMA with the ARFIMA-SV model, we find that
d drops from 0.38 to 0.29. Furthermore, φ11, φ12 increase from 0.13 and 0.25 to 0.15 and 0.38. The
estimate of β0 is also affected, being more precisely estimated at a slightly higher value. The SV
component itself is nearly nonstationary as the autoregressive coefficient of volatility, ρ, is close to
one, and the conditional volatility of volatility, σ, is well identified at 0.15. The average volatility,
exp (µ/2), is at 0.17% per month. We plot the filtered estimates of σεt, t = 1, ..., T in the top left panel
of Figure 3.4. The volatility decrease since the early 1980s is noticeable and persistent. As stated
earlier, this period is often referred to as “the Great Moderation”.
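The quoted 2logBF can be reproduced directly from the log(ML) values in Table 3.7 (a small Python check; reading the result against the conventional Kass-Raftery scale, where 2logBF above 10 counts as very strong evidence):

```python
# log marginal likelihoods from Table 3.7 (a = 0.99 column)
log_ml = {"ARFIMA": 183.54, "ARFIMA-SV": 236.38, "ARFIMA-SVt": 229.17}

# 2 logBF of ARFIMA-SV against the homoscedastic ARFIMA
two_log_bf = 2.0 * (log_ml["ARFIMA-SV"] - log_ml["ARFIMA"])

# on the Kass-Raftery scale, 2 logBF > 10 is very strong evidence
very_strong = two_log_bf > 10.0
```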
Table 3.8.: ARFIMA-SV, ARIMA-SV and IMA-SV results for ∆yt

Parameter        ARFIMA-SV           ARIMA-SV            IMA-SV
                 θ (RB)              θ (RB)              θ (RB)
d−1              -0.6615 (6.34)
                 [-0.6866,-0.6376]
ψ1                                   -0.8422 (4.14)      -0.8374 (3.20)
                                     [-0.8701,-0.8148]   [-0.8593,-0.8141]
φ11              0.1506 (6.87)       0.1021 (4.13)
                 [0.1251,0.1768]     [0.0768,0.1268]
φ12              0.3808 (6.61)       0.3494 (4.19)
                 [0.3540,0.4089]     [0.3237,0.3732]
β(i80:7)         -1.0278 (5.50)      -1.2310 (4.87)      -1.0449 (3.72)
                 [-1.1657,-0.8890]   [-1.3352,-1.1250]   [-1.2352,-0.8491]
β(i80:8)         1.0356 (4.75)       0.9648 (4.72)       0.7907 (4.33)
                 [0.8930,1.1806]     [0.8477,1.0846]     [0.5908,0.9863]
µ                -3.5039 (7.12)      -3.3937 (3.58)      -3.4204 (4.26)
                 [-3.8044,-3.2207]   [-3.7573,-3.0070]   [-3.6665,-3.1639]
ρ                0.9849 (10.11)      0.9807 (4.55)       0.9746 (3.87)
                 [0.9799,0.9895]     [0.9702,0.9892]     [0.9629,0.9853]
σ2               0.0181 (7.66)       0.0249 (4.25)       0.0242 (3.49)
                 [0.0126,0.0235]     [0.0169,0.0341]     [0.0179,0.0308]

log(L)           260.22              256.57              208.36
log(ML), a=0.75  228.76              223.90              185.31
log(ML), a=0.99  229.01              224.16              185.56
M-H ratio        0.39                0.40                0.40

This table reports estimation results for: ARFIMA(12,d,0)-SV, ARIMA(12,0,1)-SV and IMA(0,1)-SV. β(i80:7) and β(i80:8) are included in the mean equation. RB is indicated inside the parentheses. log(L): log-likelihood, log(ML): log-marginal likelihood using the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio.
We follow Bos et al. (2012), and compare the estimates of the ARFIMA-SV model with three other
specifications. First, we present estimates for the same ARFIMA-SV model as in Table 3.7, but now
for changes in inflation rate, ∆yt. Hence, in this specification, we estimate d − 1. Furthermore, β0
drops out of the model. With regards to β80:7, we choose to separate the dummy variable and its lag.
The second column of Table 3.8 reports results for this model. We find that d−1 equals −0.66, nearly
equivalent to d = 0.29 for the ARFIMA-SV model. Furthermore, other parameter estimates are very
similar to parameter estimates of the ARFIMA-SV model for yt. We also compare our estimates with
those from an ARIMA-SV model, i.e. (3.5.1) with p = 12, d = 1 and q = 1. This model corresponds to
an ARIMA(12,0,1)-SV model for ∆yt. As before, we find φ1, ..., φ10 to be non-significant. Therefore,
they are set to zero. As noted in Stock and Watson (2007), changes in the long-run persistence of this
model are captured by the MA parameter, ψ1. The fourth column of Table 3.8 reports results for the
ARIMA(12,1,1)-SV model. We find that ψ1 = −0.84. Finally, the last two columns of Table 3.8 report
results for an IMA(1,1)-SV model. This specification is equivalent to the unobserved components (UC)
model of Stock and Watson (2007). The estimate of ψ1 is equal to −0.83, which is almost identical to
the estimate of ψ1 for the ARIMA(12,1,1)-SV model. However, we see that leaving out φ11 and φ12
results in a considerably lower ML value.
Figure 3.4.: Estimation results on US core inflation rate, ARFIMA(12,d,0)-SV

[Figure: four panels, (a)-(d).]

Panel (a): Filtered volatility estimates, ARFIMA(12,d,0)-SV, (b): autocorrelation functions of d, φ11 and φ12, (c): d | YT against iteration (after a burn-in of 10000), and (d): φ11 | YT, φ12 | YT against iteration (after a burn-in of 10000).
3.5.3. Subsample estimation
In order to show the effects of the SV extension on the estimates and demonstrate the presence of structural
breaks in θ, we present estimation results for several models for two different samples. Table 3.9 reports
estimation results for: a homoscedastic ARFIMA(12,d,0), ARFIMA(12,d,0)-SV and ARIMA(12,1,1)-
SV from 1957:1-1980:12 and 1981:1-2013:5, respectively12.
For the first sample, 1957:1-1980:12, we find that d is now closer to the nonstationary value of 0.5,
(0.42 for ARFIMA and 0.39 for ARFIMA-SV). Furthermore, the estimates of ρ are smaller than for
the full sample for both ARFIMA-SV and ARIMA-SV. We find that β0s are also higher for both
models, but so are their posterior standard deviations. There is still a very significant difference in
the marginal likelihood values between the ARFIMA model and the ARFIMA-SV model. Compared
to the full sample, the logBF in favor of ARFIMA-SV is however smaller.
Finally, the right side of Table 3.9 displays results for the period 1981:1-2013:5. The d parameter
is much smaller than the d parameter for the first period. On the other hand, ρ rises from 0.73 for
the first sample period to 0.96 for the second sample period. The unconditional volatility of volatility,
12Contrary to Bos et al. (2012), we choose to include 1983 and 1984 in the second subsample period.
Table 3.9.: Subsample estimation results for yt

                 1957:1-1980:12                                            1981:1-2013:5
Parameter        ARFIMA            ARFIMA-SV         ARIMA-SV            ARFIMA            ARFIMA-SV         ARIMA-SV
                 θ (RB)            θ (RB)            θ (RB)              θ (RB)            θ (RB)            θ (RB)
d                0.4155 (2.61)     0.3899 (3.04)                         0.2444 (3.73)     0.2195 (2.44)
                 [0.3361,0.4895]   [0.3457,0.4349]                       [0.1581,0.3433]   [0.1737,0.2638]
ψ1                                                   -0.7296 (2.58)                                         -0.9375 (2.86)
                                                     [-0.7733,-0.6848]                                      [-0.9639,-0.9081]
φ11              0.0889 (2.32)     0.1523 (3.26)     0.0680 (2.17)       0.1454 (3.44)     0.1191 (2.85)     0.0928 (2.84)
                 [-0.0201,0.2001]  [0.0899,0.2135]   [0.0203,0.1130]     [0.0697,0.2247]   [0.0794,0.1590]   [0.0534,0.1330]
φ12              0.1602 (2.77)     0.2743 (2.79)     0.2085 (2.25)       0.3585 (3.37)     0.4397 (2.95)     0.4447 (2.78)
                 [0.0481,0.2686]   [0.2093,0.3370]   [0.1606,0.2555]     [0.2764,0.4394]   [0.3982,0.4807]   [0.4027,0.4851]
β0               0.2609 (2.20)     0.4612 (2.92)                         0.1140 (4.64)     0.1995 (2.32)
                 [0.0286,0.4807]   [0.2830,0.6403]                       [0.0544,0.1788]   [0.1422,0.2565]
β(i80:7)         -1.0307 (1.97)    -1.0504 (2.84)    -1.2369 (2.26)
                 [-1.4249,-0.6557] [-1.2715,-0.8450] [-1.3963,-1.0772]
β(i80:8)                                             0.9572 (2.15)
                                                     [0.7865,1.1345]
µ                                  -3.1597 (3.02)    -3.1921 (2.11)                        -4.2723 (2.49)    -4.2619 (2.62)
                                   [-3.2536,-3.0648] [-3.2775,-3.1038]                     [-4.3946,-4.1497] [-4.3971,-4.1287]
ρ                                  0.7293 (3.38)     0.7543 (2.71)                         0.9551 (2.86)     0.9595 (2.83)
                                   [0.6107,0.8400]   [0.6414,0.8579]                       [0.9459,0.9641]   [0.9516,0.9672]
σ2                                 0.0061 (3.51)     0.0680 (3.13)                         0.0066 (2.78)     0.0070 (3.43)
                                   [0.0027,0.0108]   [0.0203,0.1130]                       [0.0035,0.0105]   [0.0038,0.0109]
σ2ε              0.0485 (1.81)                                           0.0180 (1.11)
                 [0.0411,0.0572]                                         [0.0156,0.0207]

log(L)           25.82             45.71             49.04               221.74            240.36            232.95
log(ML), a=0.75  8.17              23.54             24.24               207.31            213.35            207.34
log(ML), a=0.99  8.08              23.79             24.76               207.32            213.61            207.58
M-H ratio                          0.50              0.52                                  0.57              0.56

This table reports estimation results for different models over two subsamples, 1957:1-1980:12 and 1981:1-2013:5. β(i80:7) and β(i80:8) are included in the mean equation. RB is indicated inside the parentheses. log(L): log-likelihood, log(ML): log-marginal likelihood. M-H ratio: Metropolis-Hastings acceptance ratio.
σ/√(1 − ρ2), rises from 0.12 for the first subsample to 0.28 for the second subsample13. The AR
parameters also show changes compared to the first subsample and the full sample. Finally, according
to the marginal likelihood criterion, we find that the contribution of the SV component is relatively
smaller in the second subsample. We also present results for the ARIMA-SV model from 1957:1-
1980:12 and 1981:1-2013:5 in Table 3.9. These results should be compared with Table 3.8. We see
that the most significant change is the shift in ψ1 which changes from −0.73 for 1957:1-1980:12 to
−0.94 for 1981:1-2013:5. The marginal likelihood of the ARIMA-SV model is slightly higher than
the ARFIMA-SV marginal likelihood for the first sample and lower than the ARFIMA-SV marginal
likelihood for the second sample.
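The quoted unconditional volatilities of volatility can be approximately reproduced from the ARFIMA-SV posterior means in Table 3.9 (a small Python check; the residual differences are due to rounding of the reported posterior means):

```python
import math

def uncond_vol_of_vol(sigma2, rho):
    """Stationary standard deviation sigma / sqrt(1 - rho^2) of the AR(1)
    log-volatility process."""
    return math.sqrt(sigma2) / math.sqrt(1.0 - rho * rho)

# ARFIMA-SV posterior means of (sigma^2, rho) from Table 3.9
first = uncond_vol_of_vol(0.0061, 0.7293)    # 1957:1-1980:12, roughly 0.11-0.12
second = uncond_vol_of_vol(0.0066, 0.9551)   # 1981:1-2013:5, roughly 0.27-0.28
```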
Overall, we have shown that PMMH provides a very compelling and computationally fast frame-
work for estimation and model comparison. We conclude that we can qualitatively replicate already
established facts on structural changes in θ since the Great Moderation.
3.5.4. Forecasts
In this section, we perform a recursive out-of-sample forecasting exercise to evaluate the performance
of the models listed in Table 3.7. Given data up to time t − 1, Yt−1 = (y1, ..., yt−1)′, the predictive
likelihood, p (yt, ..., yT | Yt−1), is the predictive density evaluated at the realized outcome, yt, ..., yT,
t ≤ T , see Geweke (2005). The predictive likelihood contains the out-of-sample prediction record of
a particular model, making it the essential quantity of interest for model evaluation. The predictive
likelihood for model MA is given as
p (yt, ..., yT | Yt−1, MA) = ∫_ΘA p (yt, ..., yT | Yt−1, θA, MA) p (θA | Yt−1, MA) dθA. (3.5.3)
Notice that the terms on the right-hand side of (3.5.3) have parameter uncertainty integrated out. If
t = 1, this would be the marginal likelihood and (3.5.3) changes to

p (y1, ..., yT | MA) = ∫_ΘA p (y1, ..., yT | θA, MA) p (θA | MA) dθA,

where p (y1, ..., yT | θA, MA) is the likelihood and p (θA | MA) is the prior for model MA. Hence,
the sum of log-predictive likelihoods can be interpreted as a measure similar to the logarithm of the
marginal likelihood, but ignoring the initial t − 1 observations. The predictive likelihood (PL) can
be used to order models according to their predictive abilities. In a similar fashion to Bayes factors
which is based on all of the data, one can also compare the performance of models based on a specific
out-of-sample period by predictive Bayes factors. The predictive Bayes factor for model A versus B
is PBFAB = p (yt, ..., yT | Yt−1, MA) / p (yt, ..., yT | Yt−1, MB), and summarizes the relative evidence of
the two models over the out-of-sample data, yt, ..., yT . Equation (3.5.3) is simply the product of the
individual predictive likelihoods

p (yt, ..., yT | Yt−1, MA) = ∏_{s=t}^{T} p (ys | Ys−1, MA). (3.5.4)
13 At first, this may seem counterintuitive. However, we believe that this result is due to the choice of our subsamples. Furthermore, we also find that var (YT) = 47.78 for 1957:1-1980:12 and var (YT) = 87.83 for 1981:1-2013:5.
Calculating (3.5.4) within a PMMH sampling scheme is easy. We can use the predictive decomposition
along with the output from M-H. Specifically, each term on the right-hand side of (3.5.4) can be
consistently estimated as
p (ys | Ys−1, MA) ≈ (1/N) ∑_{i=1}^{N} p (ys | Ys−1, θA^(i), MA). (3.5.5)
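Each term of this form is a plain Monte Carlo average over posterior draws. A minimal sketch (hypothetical Python; in practice the per-draw densities come from the particle filter, and a log-sum-exp guards against underflow when averaging densities stored in logs):

```python
import math

def log_mean_exp(log_vals):
    """log of the average of exp(log_vals), computed stably (log-sum-exp)."""
    m = max(log_vals)
    return m + math.log(sum(math.exp(v - m) for v in log_vals)) - math.log(len(log_vals))

# per-draw log predictive densities log p(y_s | Y_{s-1}, theta^(i)) -- a toy set
log_p = [-1.40, -1.35, -1.45, -1.38]
log_pred_lik = log_mean_exp(log_p)   # estimate of log p(y_s | Y_{s-1})
# summing such terms over s = t, ..., T gives the log of the product decomposition
```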
We can also compare forecasts of models based on the predictive likelihood, the predictive mean, E [yt | Yt−1, MA], can be computed using θA^(1), ..., θA^(N). For instance, in the context of the ARFIMA-SV model, we have that

(yt − β0) = ∑_{l=1}^{t−1} πl (yt−l − β0) + σεt εt  and  ∑_{l=1}^{t−1} πl L^l = (Φ(L)/Ψ(L)) (1 − L)^d.

Thus, we can calculate the predictive mean of yt based on Yt−1 as

E [yt | Yt−1, MA] ≈ (1/N) ∑_{i=1}^{N} [ β0^(i) + ∑_{l=1}^{t−1} πl^(i) (yt−l − β0^(i)) ].
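For illustration, the predictive-mean computation for a single posterior draw can be sketched for the pure fractional case ARFIMA(0,d,0), where the π weights follow from the binomial expansion of (1 − L)^d (a hypothetical Python sketch; the thesis model additionally has AR and MA parts through Φ(L) and Ψ(L)):

```python
import numpy as np

def pi_weights(d, n):
    """First n AR(infinity) weights of an ARFIMA(0,d,0): from
    (1 - L)^d y_t = eps_t we get y_t = sum_l pi_l y_{t-l} + eps_t,
    with pi_l = -c_l and c_l the binomial coefficients of (1 - L)^d."""
    c = np.empty(n + 1)
    c[0] = 1.0
    for k in range(1, n + 1):
        c[k] = c[k - 1] * (k - 1 - d) / k
    return -c[1:]

def predictive_mean(y_hist, beta0, d):
    """One-step predictive mean E[y_t | Y_{t-1}] for a single draw of
    (beta0, d); averaging over posterior draws gives the estimator above."""
    pi = pi_weights(d, len(y_hist))
    dev = y_hist[::-1] - beta0          # y_{t-1} - beta0, y_{t-2} - beta0, ...
    return beta0 + float(pi @ dev)
```

Note that π1 = d and π2 = d(1 − d)/2, so the hyperbolically decaying weights give long memory a direct role in the point forecast.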
For each model, we produce h-step ahead forecasts with h = 1, h = 4 and h = 8 using a rolling
window with a width of 265 months. We choose the out-of-sample period from 1979:3 till the end of
the sample for a total of 411 observations. Specifically, given Yt, t ≥ 265, we implement our sampling
scheme, obtain posterior draws of θA and compute p (yt+h | Yt, MA), h = 1, 4, 8 using (3.5.5). We
also calculate E [yt+h | Yt, MA], using at each step h = 2, ... the previously obtained forecasts up to h − 1.
As a new observation enters the information set, the posterior is updated through a new round of
sampling and the forecasting procedure is implemented.
The predictive likelihood of the models in Section 3.5.2 along with several benchmark alternatives
are displayed in the left panel of Table 3.10. The ARFIMA-SV models are estimated using PMCMC,
i.e. PMMH, while AR(5), AR(10), ARMA(1,1) and ARFIMA(12,d,0) are estimated using “pure”
Gibbs sampling.
The log-predictive likelihood of ARFIMA(12,d,0)-SV is larger than all the other models for h = 1.
The evidence for h = 4 and h = 8 is also strong. However, the performance of ARFIMA(12,d,0)-SV
deteriorates as the forecast horizon lengthens. We also find that ARMA(1,1) is dominated by all the other
models, performing poorly even compared with AR(5) or AR(10).
Although we focus on the predictive likelihood to measure predictive content, it is also interesting
to consider the out-of-sample point forecasts based on the predictive mean. Therefore, the right panel
of Table 3.10 reports the root mean squared error (RMSE) for the predictive mean. The out-of-sample
period corresponds exactly to the period used to calculate the predictive likelihood. As before, we find
that ARFIMA(12,d,0)-SV performs very well against all the other models. It is the top performer
for h = 4 and h = 8 and second best for h = 1. Compared to the homoscedastic ARFIMA model,
ARFIMA(12,d,0)-SV and ARFIMA(12,d,0)-SVt offer improvements in terms of out-of-sample point
forecasts. However, the improvements they offer are quite modest.
In order to perform a joint evaluation of the forecasts, the methodology of Hansen et al. (2011),
termed the Model Confidence Set (MCS), is applied. The appealing feature of the MCS approach is that
it allows for a user-defined criterion of “best”. Furthermore, it does not require a benchmark model for
comparison. In terms of implementation, the Ox package, Mulcom, provided by the authors is used.
As before, results clearly indicate that ARFIMA(12,d,0)-SV and ARFIMA(12,d,0)-SVt perform very
well. In terms of density forecasts, ARFIMA(12,d,0)-SV is the only model that belongs to the 5% MCS
for h = 1, h = 4 and h = 8, i.e. ARFIMA(12,d,0)-SV performs significantly better than all the other
models. In terms of point forecasts, there is no significant difference between ARFIMA(12,d,0)-SV
and ARFIMA(12,d,0)-SVt. These models are the only models that belong to the 5% MCS.
Table 3.10.: Out-of-sample forecast results, yt+h

                        log(PL)                          RMSE
Model                   h = 1     h = 4     h = 8        h = 1      h = 4      h = 8
AR(5)                   127.77    62.48     60.15        0.1828     0.2152     0.2149
AR(10)                  128.61    78.21     89.55        0.1826     0.2069     0.2025
ARMA(1,1)               111.28    44.19     34.22        0.1893     0.2219     0.2249
ARFIMA(12,d,0)          158.97    124.86    125.18       0.1721     0.1913     0.1885
ARFIMA(12,d,0)-SV       199.69(∗) 159.75(∗) 150.93(∗)    0.1699(∗)  0.1882(∗)  0.1861(∗)
ARFIMA(12,d,0)-SVt      192.88    151.91    139.95       0.1698(∗)  0.1887(∗)  0.1869(∗)

This table reports the log-predictive likelihood, log(PL), and out-of-sample root mean squared error (RMSE) for the predictive mean. The out-of-sample period is from 1979:3 to 2013:5. An asterisk, (∗), signifies that the model belongs to the 5% MCS of Hansen et al. (2011).
Figure 3.5.: Rolling window parameter estimates, ARFIMA(12,d,0)-SV

[Figure: four panels, (a)-(d).]

Panel (a): β0, (b): d, (c): φ11 and (d): φ12. Window width: 265 months. First period: 1957:1-1979:2, last period: 1991:5-2013:5. The solid lines represent parameter estimates. The dashed lines denote the 95% credibility intervals.
3.5.5. Parameter sensitivity analysis
In this section, we follow Bos et al. (2012) and perform parameter sensitivity analysis using rolling
estimates of θ. We use rolling estimates for window lengths of 265 months. We show recursive
estimates of β0, d, φ11 and φ12 along with their respective 95% credibility intervals in Figure 3.5. The
fractional integration order, d, captures the long-memory behavior, φ11 and φ12 capture the short-
memory behavior including seasonality. The values for 1957:1 correspond to the estimation period
1957:1-1979:2. Panel (a) shows that β0 changes only gradually: it fluctuates around 0.4 until the
mid 1980s and thereafter drops to 0.2. Recursive estimates of d show that d gradually drops from
0.4 at the start of the sample to about 0.1 towards the end of the sample. The estimate of d is 0.16
for the last subsample which runs from 1991:5 to 2013:5. There is cautious evidence that the long-memory
characteristics of US inflation might not have remained significant after the Great Moderation. It is
also clear that current long-run persistence is lower than in the period before the Great Moderation.
On the other hand, φ12 increases almost as the mirror image of d. Finally, except for a small peak in
the early 1980s, φ11 remains relatively constant around 0.1 throughout the sample.
Overall, recursive estimates of the parameters show slowly drifting trends. As expected, most
significant changes occur during the Great Moderation. We seem to agree with Stock and Watson
(2007) that US inflation may have become harder to forecast as the persistence and hence the month-
to-month memory has dropped.
3.6. Unobserved Components Model with SVM Effects
In this section, we follow Chan (2014) and Section 3.4. As before, the goal of this section is to
demonstrate the flexibility of PMCMC to adapt to more complicated model structures. We define the
unobserved components model with stochastic volatility in mean effects, UC-SVM, as
yt = αt + λt exp (ht/2) + exp (ht/2) εt, εt ∼ N (0, 1)
αt+1 = αt + ση ηt, ηt ∼ N (0, 1)
λt+1 = λt + σζ ζt, ζt ∼ N (0, 1)
ht+1 = µ + ρ (ht − µ) + σ ξt, ξt ∼ N (0, 1),
where the innovations εt, ηt, ζt and ξt are mutually independent.
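The UC-SVM recursions can be simulated directly (a hypothetical Python sketch; the parameter values are loosely based on the square roots of the posterior variance estimates in Table 3.11, not the thesis' Ox code):

```python
import numpy as np

def simulate_uc_svm(T, mu, rho, sigma, sigma_eta, sigma_zeta, seed=0):
    """Simulate the UC-SVM model: random-walk trend alpha_t, random-walk
    loading lambda_t, AR(1) log-volatility h_t, and observation
    y_t = alpha_t + lambda_t * exp(h_t/2) + exp(h_t/2) * eps_t."""
    rng = np.random.default_rng(seed)
    alpha, lam, h, y = (np.zeros(T) for _ in range(4))
    h[0] = mu                                   # start log-volatility at its mean
    for t in range(T):
        vol = np.exp(h[t] / 2.0)
        y[t] = alpha[t] + lam[t] * vol + vol * rng.standard_normal()
        if t + 1 < T:
            alpha[t + 1] = alpha[t] + sigma_eta * rng.standard_normal()
            lam[t + 1] = lam[t] + sigma_zeta * rng.standard_normal()
            h[t + 1] = mu + rho * (h[t] - mu) + sigma * rng.standard_normal()
    return y, alpha, lam, h

y, alpha, lam, h = simulate_uc_svm(200, mu=0.75, rho=0.95, sigma=0.35,
                                   sigma_eta=0.07, sigma_zeta=0.15)
```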
This specification extends the UC-SVm model discussed in Section 3.4. In UC-SVM, the log-volatility,
ht, enters the conditional mean equation with a time-varying loading coefficient, λt. Modeling this
specification within the PMCMC framework is straightforward. For instance, we can implement the
particle Gibbs sampler of Andrieu et al. (2010). Alternatively, we can use a Rao-Blackwellization
scheme similar to Section 3.4. Conditional on ht the remaining model is a linear Gaussian state-space
model where (α1:T , λ1:T ) can be integrated out analytically using the Kalman filter, and we can sample
θ using M-H. Furthermore, estimating the above model is very difficult using “pure” Gibbs sampling,
see Chan (2014). For instance, Chan (2014) first samples (α1:T , λ1:T ) conditional on h1:T , θ and YT .
However, conditional on (α1:T , λ1:T ), θ and YT , the model cannot be written in linear state-space form.
Therefore, there is no obvious way to sample h1:T from its conditional posterior. Consequently, Chan
(2014) adopts an accept-reject M-H procedure and samples h1:T conditional on α1:T , λ1:T , θ, YT .
On the other hand, within the PG framework, we act as if we are operating within a traditional
Gibbs sampling scheme. Furthermore, in step 1, we can draw (α1:T , λ1:T , h1:T ) all-at-once using for
example the conditional SMC algorithm of Andrieu et al. (2010). Thus, we proceed by cycling through
the following steps.
1. α1, ..., αT , λ1, ..., λT and h1, ..., hT | θ, YT .
2. µ | ρ, σ2, h1, ..., hT .
3. ρ | µ, σ2, h1, ..., hT .
4. σ2 | µ, ρ, h1, ..., hT .
5. σ2η | α1, ..., αT .
6. σ2ζ | λ1, ..., λT .
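Steps 5 and 6 are standard conjugate variance updates. A sketch of step 5 (hypothetical Python; the IG(v0/2, s0/2) parametrization of the v0 = 4, s0 = 0.02 prior is an assumption, as the text does not spell it out):

```python
import numpy as np

def draw_sigma2_eta(alpha, v0=4.0, s0=0.02, rng=None):
    """One Gibbs draw of sigma^2_eta | alpha_{1:T}: with an inverse-gamma
    IG(v0/2, s0/2) prior, the random-walk increments eta_t give a conjugate
    inverse-gamma posterior."""
    if rng is None:
        rng = np.random.default_rng()
    incr = np.diff(alpha)                       # alpha_{t+1} - alpha_t
    shape = (v0 + len(incr)) / 2.0
    scale = (s0 + np.sum(incr ** 2)) / 2.0
    # a draw from IG(shape, scale) is 1 / Gamma(shape, 1/scale)
    return 1.0 / rng.gamma(shape, 1.0 / scale)

# sanity check on a simulated random walk with true increment variance 0.25
alpha = np.cumsum(0.5 * np.random.default_rng(1).standard_normal(5001))
draw = draw_sigma2_eta(alpha, rng=np.random.default_rng(2))
```

Step 6 has the same form with the increments of λ1:T in place of those of α1:T.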
As stated above, we can use the particle Gibbs (PG) sampler of Andrieu et al. (2010) to estimate the
above model by cycling through steps 1-6. However, Lindsten et al. (2012) propose a relatively new
method which is computationally as elegant and in some cases even more robust than PG, namely,
particle Gibbs with ancestor sampling, PG-AS. This approach builds on the PG sampler proposed by
Andrieu et al. (2010). In PG, we start by running a sequential Monte Carlo (SMC) sampler in which
one particle trajectory is set deterministically to a reference trajectory that is specified a priori. After
a complete run of the SMC algorithm, a new trajectory is obtained by selecting one of the particle
trajectories with probabilities given by their importance weights. The effect of the reference trajectory
is that the target distribution of the resulting Markov kernel remains invariant, regardless of M , see
Andrieu et al. (2010). However, in some cases, PG can suffer from a serious drawback, which is that
the underlying mixing can be very poor when there is path degeneracy in the SMC sampler14. PG-AS
alleviates possible path degeneracy problems in a very computationally elegant fashion. Specifically,
the original PG kernel is modified using a so-called ancestor sampling step. Even though this is a
small modification of the algorithm, improvements in mixing can be quite considerable, see Lindsten
et al. (2012). The reader is referred to Lindsten et al. (2012) and the Appendix for more details
regarding PG-AS.
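The ancestor-sampling step itself reduces to a single weighted draw. A minimal sketch for a Gaussian random-walk state transition (hypothetical Python, not the CPF-AS implementation of Lindsten et al. (2012)):

```python
import numpy as np

def ancestor_index(particles_prev, weights, x_ref, sigma, rng):
    """Ancestor-sampling step: draw the ancestor of the reference state
    x_ref at time t with probability proportional to
    w_{t-1}^i * f(x_ref | x_{t-1}^i), here for a Gaussian random-walk
    transition x_t = x_{t-1} + sigma * noise."""
    log_w = np.log(weights) - 0.5 * ((x_ref - particles_prev) / sigma) ** 2
    log_w -= log_w.max()                  # stabilize before exponentiating
    prob = np.exp(log_w)
    prob /= prob.sum()
    return int(rng.choice(len(particles_prev), p=prob))

# the reference state is far more compatible with the first particle
idx = ancestor_index(np.array([0.0, 10.0]), np.array([0.5, 0.5]),
                     x_ref=0.1, sigma=1.0, rng=np.random.default_rng(1))
```

By reconnecting the reference trajectory to a freshly sampled ancestor at every time step, this draw is what breaks the path degeneracy that plain PG can suffer from.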
Accordingly, at iteration i of PG-AS, we consider the nonlinear framework of our UC model and
sample (α1:T^(i), λ1:T^(i), h1:T^(i)) ∼ p (α1:T, λ1:T, h1:T | θ, YT) using the conditional particle filter with ancestor
sampling (CPF-AS) of Lindsten et al. (2012), see the Appendix. The elements in θ(i) are sampled
one-at-a-time using standard Gibbs sampling techniques conditional on α1:T^(i), λ1:T^(i), h1:T^(i), YT.15
We assume independent priors for θ. Specifically, µ ∼ N (0, 1) and (ρ+ 1) /2 ∼ Beta (20, 1.5). As
before, the variance parameters are IG distributed with v0 = 4 and s0 = 0.02. Our data consists of UK
quarterly seasonally adjusted CPI inflation from 1955q1 to 2013q1. Specifically, given the quarterly
CPI number, CPIt, we use yt = 400ln (CPIt/CPIt−1) as the CPI inflation rate. Table 3.11 reports
posterior mean estimates, 95% credibility intervals (indicated inside the brackets) and inefficiency
factors for:
14 In some cases this problem can be addressed by adding a backward simulation step to the PG sampler, yielding a method denoted as PG with backward simulation, see Lindsten and Schon (2013). In order to avoid this step and provide an overall robust method for estimation, we choose to follow Lindsten et al. (2012). Therefore, we use the conditional particle filter with ancestor sampling of Lindsten et al. (2012).
15 We could also first sample α1:T^(i), λ1:T^(i) | h1:T^(i−1), θ^(i−1), YT and then h1:T^(i) | α1:T^(i), λ1:T^(i), θ^(i−1), YT. However, we find that this approach leads to very similar results as our original approach.
1. M1: UC: a plain UC model, i.e. (3.4.1)-(3.4.2).
2. M2: UC-SVm: an UC model with SV effects in σ2ε .
3. M3: UC-SVM-λ-const: an UC model with SV in mean effects where λ1 = λ2 = ... = λT .
4. M4: UC-SVM: an UC model with SV in mean effects.
Table 3.11.: Estimation results, unobserved components models, UK inflation

Parameter        UC                 UC-SVm             UC-SVM-λ-const     UC-SVM
                 θ (RB)             θ (RB)             θ (RB)             θ (RB)
λ                                                      1.8612 (28.56)
                                                       [1.5312,2.6859]
µ                                   0.7530 (1.82)      0.7891 (1.95)      0.7374 (1.95)
                                    [-0.2216,3.6716]   [-0.2991,3.1355]   [-0.0892,3.2277]
ρ                                   0.9482 (18.95)     0.9708 (11.45)     0.9508 (17.25)
                                    [0.8996,0.9979]    [0.9437,0.9998]    [0.9063,0.9991]
σ2                                  0.2345 (36.49)     0.1039 (30.41)     0.1361 (33.26)
                                    [0.0953,1.1590]    [0.0527,0.4320]    [0.0714,0.4013]
σ2ε              10.1848 (1.18)
                 [8.6739,15.6103]
σ2η              0.7456 (22.28)     0.2914 (34.50)     0.0116 (50.37)     0.0051 (39.67)
                 [0.4113,3.7689]    [0.1282,1.2156]    [0.0033,0.1065]    [0.0023,0.0266]
σ2ζ                                                                       0.0264 (49.50)
                                                                          [0.0095,0.2204]

log(L)           -610.20            -534.87            -523.53            -513.22
log(ML), a=0.75  -663.49            -566.22            -541.07            -531.19
log(ML), a=0.99  -663.24            -565.96            -540.93            -530.94

This table reports estimation results for different UC models. RB is indicated inside the parentheses. log(L): log-likelihood, log(ML): log-marginal likelihood for the corresponding value of a.
Furthermore, the PG-AS sampler provides us with p (YT | θ,Mk), k = 1, ..., 4. Thus, as before, we
are able to compute the marginal likelihood values for each model using G-D. On the other hand, p (YT | θ, Mk) is not as easily obtainable within a traditional Gibbs sampling scheme. For instance,
Chan (2014) adopts the approach in Koop et al. (2010) and instead computes the dynamic posterior
probabilities for M4 versus M2 over the data.
We see that UC-SVM performs best in terms of ML. For instance, compared to UC-SVm, which is
equivalent to UC-SVM when λt = 0, t = 1, ..., T , the logBF in favor of UC-SVM is 35.02. This is
very strong evidence. Furthermore, compared to UC-SVM-λ-const, the logBF in favor of UC-SVM
is 9.99, which confirms posterior evidence in favor of time-variation in λ. In Figure 3.6, we plot the
evolution of αt, λt and exp (ht). Intuitively, the estimates of exp (ht) are as expected. The inflation
volatility increases during the 1970s and subsequently stabilizes since the beginning of the 1980s. We
also report posterior estimates of αt and λt, t = 1, ..., T . Evidently, there is substantial time-variation
in these estimates. Estimates of λt highlight the importance of including this component in the model.
During the 1970s, λt is between 1.5 and 3.5, whereas λt becomes much smaller after the early 1980s16.
16 However, for PG-AS, (on average) the RB of each parameter is higher than for PMMH. As stated in Section 3.4, this is because we incorporate the steps in So et al. (2005) within the PMMH procedure, thus automatically reducing RB. Furthermore, the inefficiency factors of θ using PG-AS are relatively close to those of the Gibbs sampling approach of Chan (2014).
Figure 3.6.: Estimation results, UC-SVM, UK inflation rate

[Figure: four panels.]

Top panels: inflation, filtered trend and estimates of the conditional variance of inflation, exp (ht). Bottom panels: evolution of αt and λt, t = 1, ..., T, with their respective 95% credibility intervals.
3.7. Conclusion
In this paper we present algorithms and implementations for analyzing different data sets using Ox in
combination with particle Markov chain Monte Carlo techniques. We briefly describe implementation
of these techniques in Section 3.2. We provide several examples in Sections 3.3 to 3.6. We show how
to estimate stochastic volatility models with different extensions. Thereafter, we focus on PMMH
estimation of unobserved components models with time-varying volatility using US inflation data.
Results using quarterly inflation data show that extending the unobserved components model towards
a model with time-varying volatility both in the irregular and regular components of inflation provides
improvements in terms of the marginal likelihood criterion.
We also show that it is relatively easy to estimate more complicated models using PMCMC. For
instance, we estimate different ARFIMA-SV type models. We show that our methods work very
well in identifying the true data generating parameters through Monte Carlo simulations. We find
that the ARFIMA-SV model performs best in terms of ML. Overall, we find that the SV part of
the model clearly captures the Great Moderation. Sensitivity analysis using rolling estimates of the
parameters for the ARFIMA-SV model provides a clear distinction between parameter changes in the
level, long-run dynamics and changes in parameters for the short-run dynamics. In terms of out-of-
sample forecasts, ARFIMA(12,d,0)-SV obtains the highest PL for h = 1, h = 4 and h = 8. In terms
of RMSE, ARFIMA(12,d,0)-SV is the top performer for h = 4 and h = 8.
4. Particle Gibbs with Ancestor Sampling for
Stochastic Volatility Models with: Heavy Tails,
in Mean Effects, Leverage, Serial Dependence
and Structural Breaks
Author: Nima Nonejad
Abstract: Particle Gibbs with ancestor sampling (PG-AS) is a new tool in the family of sequen-
tial Monte Carlo methods. We apply PG-AS to the challenging class of stochastic volatility models
with increasing complexity, including leverage and in mean effects. We provide applications that
demonstrate the flexibility of PG-AS under these different circumstances and justify applying it in
practice. We also combine discrete structural breaks within the stochastic volatility model framework.
For instance, we model changing time series characteristics of monthly postwar US core inflation rate
using a structural break autoregressive fractionally integrated moving average (ARFIMA) model with
stochastic volatility. We allow for structural breaks in the level, long and short-memory parameters
with simultaneous breaks in the level, persistence and the conditional volatility of the volatility of
inflation.
Keywords: ancestor sampling, Bayes, particle filtering, structural breaks
(JEL: C11, C22, C58, C63)
4.1. Introduction
Stochastic volatility (SV) models are widely used for modeling financial time series data, see Jacquier
et al. (1994), Kim et al. (1998), Chib et al. (2002), Koopman and Hol Uspensky (2002), Berg et al.
(2004), Omori et al. (2007), Nakajima and Omori (2012), and Chan and Grant (2014). Furthermore,
time series models with SV specification have also become important in macroeconometric modeling
as they provide a flexible framework for estimation and interpretation of time variation in the volatility
of macroeconomic time series, see for instance Cogley and Sargent (2005), Primiceri (2005), Koop and
Korobilis (2010), Chan (2013) and Chan (2014).
Bayesian inference of these models generally relies on MCMC techniques. Typically, the main
practical difficulty in estimating SV models lies in simulating from the conditional posterior of the
latent volatility process. In general, there is no unified way to draw these latent volatilities. For
instance, methods that work for simple SV specifications do not work for specifications that allow for
leverage or SV in mean effects, see Omori et al. (2007) and Chan (2014). Modifying the algorithms
to accommodate these features often requires considerable programming effort, and can in some cases
prove inefficient, producing highly autocorrelated draws.
Recently, in their seminal paper, Andrieu et al. (2010) propose a novel combination called Particle
MCMC (PMCMC) which uses sequential Monte Carlo (SMC) techniques to design efficient high-
dimensional proposal distributions for MCMC algorithms. One of the main features of the PMCMC
method is that it can use the output of an SMC method, targeting the marginal density of the model
parameters as a proposal distribution for a Metropolis-Hastings update. Among recent works, Flury
and Shephard (2011) and Whiteley et al. (2010) use this technique to estimate stochastic volatility
and other time series models.
In this paper, we provide a unified Bayesian methodology for estimating stochastic volatility models
(and models whose time-varying volatility is modeled as an SV process) with heavy tails, in mean
effects, leverage and structural breaks. We apply a relatively new tool in the family of SMC methods,
which is particularly useful for inference in SV models, namely, particle Gibbs with ancestor sampling
(PG-AS), suggested in Lindsten et al. (2012). PG-AS is similar to the particle Gibbs (PG) sampler
proposed by Andrieu et al. (2010). In PG, we start by running a sequential Monte Carlo sampler in
which one particle trajectory is set deterministically to a reference trajectory that is specified a priori.
After a complete run of the SMC algorithm, a new trajectory is obtained by selecting one of the
particle trajectories with probabilities given by their importance weights. The effect of the reference
trajectory is that the target distribution of the resulting Markov kernel remains invariant, regardless
of the number of particles used in the underlying SMC algorithm. However, PG can suffer from a
serious drawback, which is that the underlying mixing can be very poor when there is path degeneracy
in the SMC sampler. In most cases, this problem can be addressed by adding a backward simulation
step to the PG sampler, yielding a method denoted as PG with backward simulation, see Whiteley
et al. (2010) and Lindsten and Schon (2013)1. PG-AS alleviates the path degeneracy problem in a
1In Andrieu et al. (2010), Whiteley suggests adding a backward step that enables exploration of all possible ancestral lineages. This approach can be considered as an alternative to the ancestor sampling part. However, this step will increase computation time. Furthermore, the purpose of this paper is not to compare the performance of PG-AS with other methods. On the contrary, we seek to show that PG-AS provides a general, simple and unified way to draw the latent states without the need to add or modify backward steps. Finally, there is a consensus in the community that PG-AS outperforms, or at the very least performs as well as, the original PG with its modifications, see Lindsten et al. (2012) and Lindsten et al. (2014).
very computationally elegant fashion. Specifically, the original PG kernel is modified using a so-called
ancestor sampling step. This way the same effect as backward sampling is achieved, but without the
need to run an explicit backward pass.
The main objective of this paper is to show that PG-AS provides a very compelling and computa-
tionally easy framework for estimating rather advanced models. We illustrate this on three specific
problems, producing rather generic methods. As we shall see, for both the financial and
macroeconomic models that we consider, PG-AS requires limited design effort on the practitioner’s
part, especially if one desires to change some features in a particular model. On the other hand,
estimating the same types of models using “traditional” Gibbs sampling would require relatively more
programming effort2. Furthermore, contrary to typical Gibbs sampling applications which often re-
quire additional sampling steps to obtain the integrated likelihood, this quantity is easily obtainable
using PG-AS as the integrated likelihood is directly available from the conditional particle filter with
ancestor sampling output, see Section 4.3.
The initial model in Section 4.4 is the standard stochastic volatility (SV) model with Gaussian errors
applied to a financial data set concerning daily Dow Jones Industrial Average (DJIA) returns. Next,
we consider different well-known extensions of the SV model. First, we incorporate a leverage effect
by modeling a correlation parameter between the measurement and state errors. Second, we provide a
PG-AS implementation of the stochastic volatility in mean model of Koopman and Hol Uspensky
(2002); in this specification, the unobserved volatility process appears in both the conditional mean
and the conditional variance. Third, we implement a model that has both stochastic
volatility and moving average errors, see Chan (2013). Finally, we also consider a SV model with
Student-t distributed errors, a heavy-tailed SV model with stochastic volatility in mean effects, and
a SV model with heavy-tailed moving average errors, see Chan and Hsiao (2013). We show that
PG-AS provides a straightforward procedure for estimation, marginal likelihood (ML) and deviance
information criterion (DIC) calculation of these models3.
In the second part of the paper, we provide extensions where we combine discrete structural breaks
within the SV framework using macroeconomic time series data. We provide a general methodology
for modeling and forecasting in the presence of structural breaks caused by unpredictable changes
to model parameters. In our settings, structural breaks are modeled through irreversible Markov
switching, or so-called change-point dynamics, see Chib (1998). We estimate model parameters, log-volatilities and change-point dates conditional on a fixed number of change points. For each of these
specifications, ML and DIC are calculated. They are then used to determine the optimal number of
change points, see Liu and Maheu (2008).
First, we consider modeling the real US GDP growth rate and document a structural break in
its volatility since the 1980s, see Gordon and Maheu (2008), among others. The flexibility of PG-AS allows us to capture the relationship between the GDP growth rate and GDP volatility by
incorporating stochastic volatility in mean effects within the change-point model. Overall, besides a one-time structural break
in the volatility of real US GDP growth rate in 1984, our results also point to a gradual volatility
2With “traditional” Gibbs sampling, we refer to sampling methods that draw the latent states using either a single-state procedure, see Jacquier et al. (1994), mixture samplers as in Kim et al. (1998) and Omori et al. (2007) in the context of the SV model with leverage effect, or an accept-reject Metropolis-Hastings procedure as in, for instance, Chan (2014) in the context of the SV model with in mean effects.
3See Chan and Grant (2014) for a broader discussion on the advantages of using DIC based on the integrated likelihood instead of the conditional DIC, i.e. DIC conditional on the latent volatilities.
reduction in the 1960s followed by a subsequent increase in the 1970s. Furthermore, we find that the
relationship between the GDP growth rate and GDP volatility, as measured by the SV in mean feedback,
has a stronger negative impact on the GDP growth rate after the structural break.4 In terms of point forecasts,
structural break specifications tend to dominate their constant parameter counterparts. However,
these improvements are quite modest.
Second, we model changing time series characteristics of postwar monthly US core inflation rate
using a structural break autoregressive fractionally integrated moving average model with stochastic
volatility. We allow for structural breaks in the level, autoregressive (AR), moving average (MA)
parameters, long-memory parameter, d, contemporaneously with breaks in the level, persistence and
the conditional volatility of the volatility of inflation. We find evidence of structural breaks in the
dynamics of US core inflation rate and show that we can qualitatively reproduce well-known empirical
facts regarding the dynamics of US inflation rate. As expected, most significant changes in the
model parameters occur during the Great Moderation. Furthermore, there is also cautious evidence
that the long-memory characteristics of US inflation might not have remained significant after the
Great Moderation. In comparison to a model that assumes no breaks, we find that our break model
performs better in terms of density and point forecasts.
Overall, we believe that applying PG-AS to time series SV models, especially structural break
specifications is the most important contribution that we provide. To our knowledge, no attempts
have been made to use PG-AS in the econometric analysis of these types of models. The remainder of
this paper is organized as follows. In Section 4.2 we intuitively explain the advantages of PG-AS, mainly from
a computational point of view. Section 4.3 describes the steps of PG-AS. We present our empirical
applications in Sections 4.4 and 4.5. Finally, the last section concludes.
4.2. Why is PG-AS Useful?
In this section we provide an intuitive justification for applying PG-AS in practice. We identify two
common weaknesses in existing Gibbs sampling methods, see (1) and (2) below. We then argue how
PG-AS can provide solutions to these problems. In Section 4.3 we provide technical details on PG-AS,
especially the conditional particle filter with ancestor sampling, CPF-AS.
Given a time series of observations, YT = (y1, ..., yT )′, a very plain stochastic volatility (SV) model
consists of a measurement equation,
yt = µ+√γ exp (ht/2) εt, εt ∼ N (0, 1) , (4.2.1)
that describes the distribution of the data given the log-volatilities, h1:T = (h1, ..., hT )′, where
ht = µh + φh (ht−1 − µh) + σhζt, ζt ∼ N (0, 1) , (4.2.2)
Equation (4.2.2) models the period-to-period variation of the volatilities as a Markov process. Typi-
cally, we let corr (εt, ζt) = 0. The parameter, µh is the drift term in the state equation, γ = exp (µh)
plays the role of a constant scaling factor, σh is the volatility of log-volatility and φh is the persistence
4Our change-point stochastic volatility in mean model is not by any means restricted only to GDP data. In fact, we believe that this model can be used to analyze US inflation rate data, providing a flexible alternative to recent works on quarterly US inflation rate data such as Chan (2014) and Eisenstat and Strachan (2014).
parameter. In most cases, we impose that $|\phi_h| < 1$ such that we have a stationary process with the
initial condition $h_1 \sim N\left(\mu_h, \sigma_h^2/(1-\phi_h^2)\right)$. For identification reasons, we must either set $\gamma$ equal to
1 and leave $\mu_h$ unrestricted, or fix $\mu_h$ at zero and estimate $\gamma > 0$, see Kim et al. (1998). Thus, we choose
to set $\gamma = 1$ and leave $\mu_h$ unrestricted. Finally, we collect all model parameters in $\theta = (\mu, \mu_h, \phi_h, \sigma_h^2)'$.
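As a concrete illustration, (4.2.1)-(4.2.2) with $\gamma = 1$ can be simulated in a few lines. The sketch below is in Python rather than the Ox used in this thesis, and the default parameter values in the signature are illustrative, not estimates:

```python
import numpy as np

def simulate_sv(T, mu=0.0, mu_h=-1.0, phi_h=0.97, sigma_h=0.2, seed=0):
    """Simulate y_t = mu + exp(h_t/2) eps_t, where h_t follows the AR(1)
    state equation (4.2.2); gamma is fixed at 1 for identification."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    # stationary initial condition: h_1 ~ N(mu_h, sigma_h^2 / (1 - phi_h^2))
    h[0] = mu_h + sigma_h / np.sqrt(1.0 - phi_h**2) * rng.standard_normal()
    for t in range(1, T):
        h[t] = mu_h + phi_h * (h[t - 1] - mu_h) + sigma_h * rng.standard_normal()
    y = mu + np.exp(h / 2.0) * rng.standard_normal(T)
    return y, h
```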
It has been observed that the plain SV model is too restrictive in many financial applications. For
instance, the SV model that is studied in the bulk of the literature typically assumes that the measure-
ment and state equation disturbances are Gaussian and uncorrelated. Accordingly, it is interesting to
extend the SV model in different ways, for instance: (a) with a fat-tailed distribution of the conditional
mean innovations, εt, t = 1, ..., T , (b) with a “leverage” effect, in which εt and ζt are correlated, (c)
with SV in mean effects and (d) allowing for breaks in the model parameters.
Over the years, many papers have dealt with incorporating these extensions, see Chib et al. (2002),
Berg et al. (2004), Jacquier et al. (2004), Omori et al. (2007), Nakajima and Omori (2012) and Chan
(2014). However, from a computational point of view, practitioners often face two main challenges
when dealing with the above extensions.
1. While sampling the model parameters, θ, from their conditional posteriors is relatively easy,
sampling h1:T ∼ p (h1:T | θ, YT ) is often difficult. Specifically, the degree of difficulty which
one encounters with regards to sampling h1:T ∼ p (h1:T | θ, YT ) depends on the specific model
structure at hand. For instance, we can estimate SV and its fat-tailed extension by using the
so-called auxiliary mixture sampler, see Kim et al. (1998) and Chib et al. (2002). However,
this approach is rather model specific, and cannot be easily generalized to estimate SV models
with leverage or with in mean effects. Over the years, some papers have dealt with Gibbs
sampling estimation of these extensions by either extending the auxiliary mixture sampler of
Kim et al. (1998), see Omori et al. (2007), or adopting an accept-reject Metropolis-Hastings
procedure to draw h1:T from its conditional posterior, see Chan (2014). However, these extensions
require relatively more programming effort. More importantly, one cannot use a unified sampling
algorithm in order to accommodate (a), (b) and (c).
2. Even though sampling h1:T ∼ p (h1:T | θ, YT ) is possible within a Gibbs sampling scheme, ob-
taining p (YT | θ) which is necessary for ML and DIC computation is usually very cumbersome.
Often, one resorts to running additional sampling algorithms in order to obtain p (YT | θ), see
Kim et al. (1998), Chib et al. (2002) and Chan and Grant (2014).
On the other hand, for PG-AS, we can maintain the same program structure, incorporating minor
changes in the codes by directly changing the measurement or state equations inside the conditional
particle filter algorithm, see Section 4.3. At the same time, p (YT | θ) is directly available from the
particle filter, which can then be used to compute ML and DIC, see Section 4.3.1. Furthermore,
as stated in Section 4.1, PG-AS alleviates possible path degeneracy problems encountered in other
PMCMC algorithms in a very computationally elegant fashion through the ancestor step.
As correctly suggested by a referee, in our settings, we have the flexibility to sample some parameters
conditional on h1:T and YT , whereas we can sample other parameters marginally, i.e. conditional only
on YT . For instance, for the SV model, we can start by drawing h(i)1:T | θ(i−1), YT using the conditional
particle filter with ancestor sampling, see Section 4.3. Then, we can draw µ(i) | h(i)1:T , YT . However,
instead of sampling µ(i)h | φ
(i−1)h , σ
2(i−1)h , h
(i)1:T , φ
(i)h | µ
(i)h , σ
2(i−1)h , h
(i)1:T and σ
2(i)h | µ(i)
h , φ(i)h , h
(i)1:T element-
by-element, we can sample µ(i)h , φ
(i)h , σ
2(i)h | µ(i), YT in one-block without conditioning on h
(i)1:T using the
particle marginal Metropolis-Hastings approach of Andrieu et al. (2010). Specifically, we draw the
candidate $\varphi^* = (\mu_h^*, \phi_h^*, \sigma_h^{2*})'$ from a random walk proposal, $\varphi^* = \varphi^{(i-1)} + c\varepsilon^{(i)}$. The prior of $\varphi$,
$p(\varphi) \propto p(\mu_h)\, p(\phi_h)\, p(\sigma_h^2)$, is evaluated directly. Finally, for the PMMH step, we run two additional
particle filters to obtain the estimates of $p(Y_T \mid \mu^{(i)}, \varphi^*)$ and $p(Y_T \mid \mu^{(i)}, \varphi^{(i-1)})$ in order to perform
Metropolis-Hastings (M-H) to either accept or reject $\varphi^*$,5 see equation (13) in Andrieu et al. (2010)
and Flury and Shephard (2011).
As we shall see, the above approach is very advantageous in the context of estimating the SV model
with leverage effect, SVL, see Section 4.3. Specifically, due to the leverage effect, ρh, the conditional
posteriors of µh, φh and σ2h for the SVL model are different than those given in Kim et al. (1998), see
Nakajima and Omori (2012). Therefore, we would need to make substantial modifications to the code
if we were to draw these parameters element-by-element. The PG-AS/PMMH combination
avoids the need for such major code modifications.
4.3. Particle Gibbs with Ancestor Sampling
In the following, we describe the steps of the conditional particle filter with ancestor sampling (CPF-AS) which is used to draw $h_{1:T}$ from $p(h_{1:T} \mid \theta, Y_T)$, see Lindsten et al. (2012).6 Consider (4.2.1)-(4.2.2), let $i = 1, \ldots, N$ denote the number of Gibbs sampling iterations, $j = 1, \ldots, M$ denote the number
of particles, and let $p(y_t \mid \theta, h_t, Y_{t-1})$ denote the density of $y_t$ given $\theta$, $h_t$ and $Y_{t-1}$. Finally, let $h_{1:T}^{(i-1)}$
be the fixed reference trajectory of $h_{1:T}$ sampled at iteration $i-1$ of the Gibbs sampler. The steps of
CPF-AS (the particle filter conditional on $h_{1:T}^{(i-1)}$) are as follows:
1. If t = 1:
(a) Draw $h_1^{(j)} \mid \theta$ for $j = 1, \ldots, M-1$ and set $h_1^{(M)} = h_1^{(i-1)}$.
(b) Set $w_1^{(j)} = \tau_1^{(j)} / \sum_{k=1}^{M} \tau_1^{(k)}$, where $\tau_1^{(j)} = p\left(y_1 \mid \theta, h_1^{(j)}, Y_0\right)$ for $j = 1, \ldots, M$.
2. Else, for t = 2 to T do:
(a) Resample $\left\{h_{t-1}^{(j)}\right\}_{j=1}^{M-1}$ using indices $a_t^{(j)}$, where $p\left(a_t^{(j)} = k\right) \propto w_{t-1}^{(k)}$.
(b) Draw $h_t^{(j)} \mid h_{t-1}^{(a_t^{(j)})}, \theta$ for $j = 1, \ldots, M-1$.
(c) Set $h_t^{(M)} = h_t^{(i-1)}$.
(d) Draw $a_t^{(M)}$ from $p\left(a_t^{(M)} = j\right) \propto w_{t-1}^{(j)}\, p\left(h_t^{(i-1)} \mid h_{t-1}^{(j)}, \theta\right)$.
(e) Set $h_{1:t}^{(j)} = \left(h_{1:t-1}^{(a_t^{(j)})}, h_t^{(j)}\right)$ and $w_t^{(j)} = \tau_t^{(j)} / \sum_{k=1}^{M} \tau_t^{(k)}$, where $\tau_t^{(j)} = p\left(y_t \mid \theta, h_t^{(j)}, Y_{t-1}\right)$.
3. End for.
4. Sample $h_{1:T}^{(i)} \mid \theta, Y_T$ with $p\left(h_{1:T}^{(i)} = h_{1:T}^{(j)} \mid \theta, Y_T\right) \propto w_T^{(j)}$.
5Intuitively, the latter approach should be more efficient in the sense that sampling $\varphi$ marginally in one block will probably produce draws of $\varphi$ that are relatively less autocorrelated. In fact, we estimate the SV model using both methods and compare the inefficiency factors for each procedure. Overall, compared to sampling the volatility parameters element-by-element conditional on $h_{1:T}$, we find that sampling $\varphi$ all-at-once using PMMH reduces the inefficiency factors of $\phi_h$ and $\sigma_h^2$. On the other hand, the inefficiency factor of $\mu_h$ increases.
6As correctly pointed out by a referee, it is important to note that we are actually drawing from $p(h_{1:T}, a_{1:T}^{(M)} \mid \theta, Y_T)$, where we draw $a_{1:T}^{(M)}$ in step (d). Thus, from a technical point of view, we are not drawing from the true conditional posterior, $p(h_{1:T} \mid \theta, Y_T)$, but from a close approximation, $p(h_{1:T}, a_{1:T}^{(M)} \mid \theta, Y_T)$. However, in order to ease the notation burden and avoid unnecessary confusion, we use the notation $p(h_{1:T} \mid \theta, Y_T)$ in the text.
Notice that CPF-AS is akin to a standard particle filter, but with the difference that $h_{1:T}^{(M)}$ is specified
a priori and serves as a reference trajectory. Hence, we use only $M-1$ particles at each step.
Furthermore, whereas in the particle Gibbs algorithm of Andrieu et al. (2010) we set $a_t^{(M)} = M$, in
PG-AS we sample a new value for the index variable, $a_t^{(M)}$, in the ancestor sampling step, (d). Even
though this is a small modification of the algorithm, the improvements in mixing can be quite considerable,
see Lindsten et al. (2012) and Lindsten et al. (2014).
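For the plain SV model (4.2.1)-(4.2.2), the steps above can be sketched as follows. This is a minimal Python illustration (the thesis itself uses Ox), which also accumulates the log of the integrated likelihood estimate used for model comparison in Section 4.3.1:

```python
import numpy as np

def cpf_as(y, h_ref, theta, M=100, rng=None):
    """One sweep of the conditional particle filter with ancestor sampling
    for the plain SV model, conditional on the reference trajectory h_ref
    from the previous Gibbs iteration. Returns a sampled trajectory and an
    estimate of log p(Y_T | theta)."""
    mu, mu_h, phi_h, sigma_h = theta
    rng = rng or np.random.default_rng()
    T = len(y)
    h = np.empty((M, T))       # particle values h_t^(j)
    traj = np.empty((M, T))    # stored trajectories h_{1:t}^(j)

    def log_tau(ht, t):        # log p(y_t | theta, h_t)
        return -0.5 * (np.log(2 * np.pi) + ht + (y[t] - mu)**2 * np.exp(-ht))

    # t = 1: M-1 draws from the stationary prior; slot M holds the reference
    sd0 = sigma_h / np.sqrt(1.0 - phi_h**2)
    h[:M - 1, 0] = mu_h + sd0 * rng.standard_normal(M - 1)
    h[M - 1, 0] = h_ref[0]
    logw = log_tau(h[:, 0], 0)
    w = np.exp(logw - logw.max()); w /= w.sum()
    traj[:, 0] = h[:, 0]
    loglik = logw.max() + np.log(np.mean(np.exp(logw - logw.max())))

    for t in range(1, T):
        # (a) resample ancestor indices for particles 1..M-1
        a = rng.choice(M, size=M, p=w)
        # (b) propagate from the state equation
        h[:M - 1, t] = (mu_h + phi_h * (h[a[:M - 1], t - 1] - mu_h)
                        + sigma_h * rng.standard_normal(M - 1))
        # (c) slot M keeps the reference value
        h[M - 1, t] = h_ref[t]
        # (d) ancestor sampling for the reference particle
        logpa = (np.log(w + 1e-300)
                 - 0.5 * ((h_ref[t] - mu_h - phi_h * (h[:, t - 1] - mu_h))
                          / sigma_h)**2)
        pa = np.exp(logpa - logpa.max()); pa /= pa.sum()
        a[M - 1] = rng.choice(M, p=pa)
        # (e) update trajectories and weights; accumulate log p(y_t | ...)
        traj = traj[a]; traj[:, t] = h[:, t]
        logw = log_tau(h[:, t], t)
        w = np.exp(logw - logw.max()); w /= w.sum()
        loglik += logw.max() + np.log(np.mean(np.exp(logw - logw.max())))

    # step 4: return one trajectory with probability proportional to w_T
    return traj[rng.choice(M, p=w)].copy(), loglik
```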
Extending the plain SV model within the PG-AS framework is often very straightforward. For
instance, assume that εt ∼ St (v), where St stands for the Student-t distribution with v > 2 degrees
of freedom. For this specification, at the ith iteration of the PG-AS sampler, we can follow Bollerslev
(1987) and use
$$p(y_t \mid \theta, h_t, Y_{t-1}) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{(v-2)\pi}}\,\frac{1}{\sigma_t}\left(1 + \frac{(y_t - \mu)^2}{(v-2)\,\sigma_t^2}\right)^{-(v+1)/2} \qquad (4.3.1)$$
inside the CPF-AS algorithm, where $\sigma_t = \exp(h_t/2)$, and obtain $h_{1:T}^{(i)}$. Conditional on $h_{1:T}^{(i)}$ and $Y_T$, we can perform Independence Chain Metropolis-Hastings (M-H) to sample $\mu$ and $v$. For instance, in order to sample $v^{(i)} \sim p(v \mid \mu^{(i)}, h_{1:T}^{(i)}, Y_T)$, we generate a candidate, $v^*$, from $q(v) \sim TN_{]2,\infty[}(v_{ML}, V)$, where $TN_{]2,\infty[}$ stands for the truncated Normal density on the domain $]2,\infty[$. $v_{ML}$ is obtained by maximizing (4.3.1) with respect to $v$ using the already obtained values of $h_{1:T}^{(i)}$ and $\mu^{(i)}$. We set $V = c \cdot \mathrm{var}(v_{ML})$, where $c \in \mathbb{R}^{+}$, and fine-tune $V$ by adjusting $c$ such that we obtain a decent M-H acceptance ratio of around 50 to 60%. The M-H acceptance probability is given as
$$a_{MH}\left(v^*, v^{(i-1)}\right) = \min\left\{1,\; \frac{p\left(v^* \mid \mu^{(i)}, h_{1:T}^{(i)}, Y_T\right) q\left(v^{(i-1)}\right)}{p\left(v^{(i-1)} \mid \mu^{(i)}, h_{1:T}^{(i)}, Y_T\right) q\left(v^*\right)}\right\}. \qquad (4.3.2)$$
We draw $u$ from $U(0,1)$ and accept $v^*$, i.e. $v^{(i)} = v^*$, if $a_{MH}(v^*, v^{(i-1)}) > u$; else $v^{(i)} = v^{(i-1)}$.
Thereafter, we sample $\varphi$ element-by-element, see Chib et al. (2002). For the SV models with heavy-tailed errors, we also sample $(\mu_h, \phi_h, \sigma_h^2)'$ using PMMH. However, we do not find any differences relative to
drawing these parameters conditional on $h_{1:T}$ using standard Gibbs sampling techniques.
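A hedged sketch of the Independence Chain M-H step for $v$ is given below. Two simplifications relative to the text are worth flagging: $v_{ML}$ is located by a grid search rather than a numerical optimizer, and the proposal scale `sd` is a fixed tuning constant standing in for $V = c \cdot \mathrm{var}(v_{ML})$:

```python
import math
import numpy as np

def log_lik_t(v, y, mu, h):
    """Log of the Bollerslev (1987) Student-t density (4.3.1), summed over t,
    with sigma_t^2 = exp(h_t) (gamma = 1)."""
    z2 = (y - mu)**2 * np.exp(-h)
    const = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
             - 0.5 * math.log((v - 2) * math.pi))
    return np.sum(const - 0.5 * h - (v + 1) / 2 * np.log1p(z2 / (v - 2)))

def sample_v(v_prev, y, mu, h, sd=2.0, rng=None):
    """Independence-chain M-H draw of the degrees of freedom v, with a Normal
    proposal truncated to ]2, inf[ centered at a grid-based ML estimate."""
    rng = rng or np.random.default_rng()
    grid = np.linspace(2.1, 128.0, 400)
    v_ml = grid[np.argmax([log_lik_t(v, y, mu, h) for v in grid])]

    def log_q(v):  # truncated-Normal log-density on ]2, inf[
        z = (v - v_ml) / sd
        log_tail = math.log(0.5 * math.erfc((2.0 - v_ml) / (sd * math.sqrt(2))))
        return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi) - log_tail

    v_star = 0.0
    while v_star <= 2.0:                  # draw from TN_{]2,inf[} by rejection
        v_star = v_ml + sd * rng.standard_normal()
    log_a = (log_lik_t(v_star, y, mu, h) + log_q(v_prev)
             - log_lik_t(v_prev, y, mu, h) - log_q(v_star))
    return v_star if math.log(rng.uniform()) < min(0.0, log_a) else v_prev
```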
Incorporating leverage or stochastic volatility in mean effects is also very straightforward using
PG-AS. First, we note that when the disturbances are conditionally Gaussian, we can write $\zeta_t$ as
$\zeta_t = \rho_h \varepsilon_t + \sqrt{1-\rho_h^2}\,\xi_t$, where $\xi_t \sim N(0,1)$ and $\mathrm{corr}(\varepsilon_t, \xi_t) = 0$. Thus, (4.2.2) can be reformulated as
$$h_t = \mu_h + \phi_h\left(h_{t-1} - \mu_h\right) + \rho_h \sigma_h \left(y_{t-1} - \mu\right) \exp\left(-h_{t-1}/2\right) + \sigma_h \sqrt{1-\rho_h^2}\,\xi_t.$$
Hence, the model adopts the above Gaussian nonlinear state-space form, where the parameter $\rho_h$
measures the leverage effect. Alternatively, we can write the measurement error, $\varepsilon_t$, as $\varepsilon_t = \rho_h \zeta_t + \sqrt{1-\rho_h^2}\,\xi_t$. This way, $y_t \mid \zeta_t \sim N\left(\rho_h \exp(h_t/2)\,\zeta_t,\; (1-\rho_h^2)\exp(h_t)\right)$, where $h_t$ follows (4.2.2),
see Malik and Pitt (2011). Thereafter, we proceed to sample $\mu \mid h_{1:T}, Y_T$ and then sample $\varphi = (\mu_h, \phi_h, \sigma_h^2, \rho_h)'$ all-at-once conditional on $\mu$ and $Y_T$ using PMMH. We also ensure that $|\phi_h^{(i)}| < 1$,
$\sigma_h^{2(i)} > 0$ and $|\rho_h^{(i)}| < 1$ by resampling these parameters until the conditions are satisfied. Furthermore,
we use the following random walk proposals for the parameters: $\Delta\mu_h^{(i)} = 0.3162\,\varepsilon_1^{(i)}$, $\Delta\phi_h^{(i)} = 0.01\,\varepsilon_2^{(i)}$,
$\Delta\sigma_h^{2(i)} = 0.01\,\varepsilon_3^{(i)}$ and $\Delta\rho_h^{(i)} = 0.03\,\varepsilon_4^{(i)}$, where $\varepsilon_k \sim N(0,1)$ for $k = 1, \ldots, 4$. This way, we obtain an M-H
acceptance ratio of around 35 to 40%, see also Flury and Shephard (2011). However, contrary to Flury
and Shephard (2011), we sample $\varphi$ all-at-once and not element-by-element. The latter approach is
computationally very demanding, as after drawing $h_{1:T}$ and $\mu$, we would need to run a particle filter scheme
eight times at each PG-AS iteration in order to sample $\varphi$.
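The PMMH update for $\varphi$ described above can be sketched as follows. The step sizes are those quoted in the text; `loglik_fn` is a placeholder for a particle-filter estimate of $\log p(Y_T \mid \mu, \varphi)$, and, as in the text, it is called twice per update (once at the candidate and once at the current value):

```python
import numpy as np

def pmmh_step(phi_prev, loglik_fn, log_prior, rng=None):
    """One PMMH update for phi = (mu_h, phi_h, sigma_h^2, rho_h)' of the SVL
    model. Proposals violating |phi_h| < 1, sigma_h^2 > 0 or |rho_h| < 1 are
    resampled until the constraints hold."""
    rng = rng or np.random.default_rng()
    steps = np.array([0.3162, 0.01, 0.01, 0.03])   # step sizes from the text
    while True:
        phi_star = phi_prev + steps * rng.standard_normal(4)
        if abs(phi_star[1]) < 1 and phi_star[2] > 0 and abs(phi_star[3]) < 1:
            break
    loglik_star = loglik_fn(phi_star)   # particle filter at the candidate
    loglik_curr = loglik_fn(phi_prev)   # particle filter at the current value
    log_a = (loglik_star + log_prior(phi_star)
             - loglik_curr - log_prior(phi_prev))
    if np.log(rng.uniform()) < min(0.0, log_a):
        return phi_star
    return phi_prev
```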
In a similar fashion, incorporating SV in mean effects (SVM) within the PG-AS context is also
very easy. For this specification, we note that $p(y_t \mid \mu, \lambda, \varphi, h_t, Y_{t-1}) \sim N\left(\mu + \lambda \exp(h_t), \exp(h_t)\right)$.
Thus, we only need to modify step (e) of CPF-AS: we set $\tau_t^{(j)} = N\left(\mu + \lambda \exp(h_t^{(j)}), \exp(h_t^{(j)})\right)$,
$j = 1, \ldots, M-1$, instead of $\tau_t^{(j)} = N\left(\mu, \exp(h_t^{(j)})\right)$ in the case of the plain SV model. We then sample
$\varphi = (\mu_h, \phi_h, \sigma_h^2)'$ element-by-element conditional on $h_{1:T}$. The pair $(\mu, \lambda)$ is sampled in one block
from its Gaussian conditional posterior. Finally, we can combine SVM effects with heavy tails by
using (4.3.1), (4.3.2) and modifying step (e) of CPF-AS.
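The SVM modification amounts to one changed line in the weight computation of CPF-AS. A Python sketch of the modified log-weights, using the same notation as the algorithm above:

```python
import numpy as np

def log_tau_svm(y_t, h, mu, lam):
    """Log particle weights for the SV-in-mean model: the Normal log-density
    of y_t with mean mu + lam*exp(h_t) and variance exp(h_t), evaluated at
    each particle's h_t (the modified step (e) of CPF-AS)."""
    var = np.exp(h)
    return -0.5 * (np.log(2 * np.pi) + h + (y_t - mu - lam * var)**2 / var)
```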
4.3.1. Model comparison using the output from PG-AS
One of the main outputs from CPF-AS is the likelihood of $Y_T$ with $h_{1:T}$ integrated out, $p(Y_T \mid \theta)$. This quantity is the product of the individual integrated likelihood contributions,
$$p(Y_T \mid \theta) = \prod_{t=1}^{T} p\left(y_t \mid \theta, Y_{t-1}\right). \qquad (4.3.3)$$
For instance, (4.3.3) can be used to compute the marginal likelihood (ML) for a particular model. The
marginal likelihood is defined as
$$p(Y_T) = \int_{\Theta} p(Y_T \mid \theta)\, p(\theta)\, d\theta. \qquad (4.3.4)$$
Equation (4.3.4) is a measure of the success the model has in accounting for the data after the
parameter uncertainty has been integrated out over the prior, $p(\theta)$. Gelfand and Dey (1994) propose
a very compelling and general method to calculate ML. It is efficient and utilizes the same routines
when calculating ML for different models. The Gelfand-Dey (G-D) estimate of ML is given as
$$\frac{1}{N} \sum_{i=1}^{N} \frac{g\left(\theta^{(i)}\right)}{\hat{p}\left(Y_T \mid \theta^{(i)}\right) p\left(\theta^{(i)}\right)} \to p(Y_T)^{-1} \text{ as } N \to \infty, \qquad (4.3.5)$$
where an estimate of $p(Y_T \mid \theta)$, $\hat{p}(Y_T \mid \theta) = \prod_{t=1}^{T} M^{-1} \sum_{j=1}^{M} \tau_t^{(j)}$, is directly available from CPF-AS
and $E\left[\hat{p}(Y_T \mid \theta)\right] = p(Y_T \mid \theta)$, see Flury and Shephard (2011). Gelfand and Dey (1994) show that
if $g(\theta^{(i)})$ is thin-tailed relative to $p(Y_T \mid \theta^{(i)})\, p(\theta^{(i)})$, then (4.3.5) is bounded and the estimator is
consistent. Following Geweke (2005), the truncated Normal distribution, $TN(\theta^*, \Sigma^*)$, is used for $g(\theta)$.
$\theta^*$ and $\Sigma^*$ are the posterior sample moments, calculated as
$$\theta^* = \frac{1}{N} \sum_{i=1}^{N} \theta^{(i)} \quad \text{and} \quad \Sigma^* = \frac{1}{N} \sum_{i=1}^{N} \left(\theta^{(i)} - \theta^*\right)\left(\theta^{(i)} - \theta^*\right)'$$
whenever $\theta^{(i)}$ is in the domain of the truncated Normal. This domain, $\hat{\Theta}$, is defined as
$$\hat{\Theta} = \left\{\theta : \left(\theta^{(i)} - \theta^*\right)' \left(\Sigma^*\right)^{-1} \left(\theta^{(i)} - \theta^*\right) \le \chi_\alpha^2(z)\right\},$$
where $z$ is the dimension of the parameter vector and $\chi_\alpha^2(z)$ is the $\alpha$th percentile of the Chi-squared distribution with $z$ degrees of freedom. In practice, 0.5, 0.75, 0.95 and 0.99 are popular
selections for $\alpha$. Once the marginal likelihood for different specifications has been calculated, we
can compare them using Bayes factors, BF. The relative evidence for model $\mathcal{M}_A$ versus $\mathcal{M}_B$ is
$BF_{AB} = p(Y_T \mid \mathcal{M}_A)/p(Y_T \mid \mathcal{M}_B)$. Kass and Raftery (1995) recommend considering twice the
logarithm of the Bayes factor for model comparison and suggest the following rule-of-thumb for support for $\mathcal{M}_A$
based on $2\log BF_{AB}$: 0 to 2 is not worth more than a bare mention, 2 to 6 is positive, 6 to 10 is strong,
and greater than 10 is very strong.
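The G-D estimator can be sketched on the log scale as follows, given posterior draws and their particle-filter log-likelihoods. The function name and the SciPy dependency are implementation choices, not part of the text:

```python
import numpy as np
from scipy.stats import chi2, multivariate_normal

def gelfand_dey_log_ml(theta_draws, log_lik, log_prior, alpha=0.95):
    """Gelfand-Dey estimate of the log marginal likelihood from posterior
    draws theta_draws (N x z), their log-likelihoods log_lik and log-priors
    log_prior, using Geweke's truncated-Normal choice of g(theta)."""
    N, z = theta_draws.shape
    theta_star = theta_draws.mean(axis=0)
    dev = theta_draws - theta_star
    sigma_star = dev.T @ dev / N
    # keep only draws inside the chi-squared(alpha) ellipsoid
    mahal = np.einsum('ij,jk,ik->i', dev, np.linalg.inv(sigma_star), dev)
    inside = mahal <= chi2.ppf(alpha, df=z)
    # normalized truncated-Normal log-density (divide by alpha for truncation)
    log_g = (multivariate_normal.logpdf(theta_draws, theta_star, sigma_star)
             - np.log(alpha))
    # log of (1/N) sum g/(lik*prior); note (4.3.5) estimates the INVERSE ML
    terms = np.where(inside, log_g - log_lik - log_prior, -np.inf)
    m = terms.max()
    log_inv_ml = m + np.log(np.exp(terms - m).sum() / N)
    return -log_inv_ml
```

As a sanity check, when the "likelihood" is identically one, the marginal likelihood is one and the estimator should return approximately zero on the log scale.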
We can also use p (YT | θ) and compute the deviance information criterion (DIC) of Spiegelhalter et
al. (2002). DIC is a compelling alternative to AIC or BIC, and it can be applied to nested or non-nested
models. Calculation of DIC in a PG-AS scheme is trivial. Contrary to AIC or BIC, DIC does not
require maximization over the parameter space. DIC is a combination of p (YT | θ) and a penalty term,
pD. This term describes the complexity of the model and serves as a penalization term that corrects
deviance's propensity toward models with more parameters. More precisely, $p_D = \overline{D(\theta)} - D(\tilde{\theta})$, where
$\overline{D(\theta)}$ is approximated by $N^{-1}\sum_{i=1}^{N} -2\log p\left(Y_T \mid \theta^{(i)}\right)$ and $D(\tilde{\theta}) = -2\log p\left(Y_T \mid \tilde{\theta}\right)$. $\tilde{\theta}$ is estimated
from the PG-AS output using the mean or mode of the posterior draws. The DIC is defined as
$DIC = D(\tilde{\theta}) + 2p_D$. It is worth noting that the best model is the one with the smallest DIC. Very
roughly, for differences of more than 10, we might definitely rule out the model with the higher DIC.
Furthermore, as pointed out by Spiegelhalter et al. (2002), we must be cautious about using ML
as a basis against which to assess DIC. ML addresses how well the prior has predicted the observed
data, whereas DIC addresses how well the posterior might predict future data generated by the same
parameters that give rise to the observed data.
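Given the PG-AS output, the DIC computation reduces to a few lines; `log_lik_at_mean` below stands for $\log p(Y_T \mid \tilde{\theta})$ evaluated at the posterior mean (or mode) of the draws:

```python
import numpy as np

def dic(log_lik_draws, log_lik_at_mean):
    """DIC from PG-AS output: log_lik_draws holds log p(Y_T | theta^(i)) for
    each posterior draw; log_lik_at_mean is log p(Y_T | theta~) at the
    posterior point estimate."""
    d_bar = np.mean(-2.0 * log_lik_draws)   # average deviance, D(theta)-bar
    d_hat = -2.0 * log_lik_at_mean          # deviance at the point estimate
    p_d = d_bar - d_hat                     # effective number of parameters
    return d_hat + 2.0 * p_d                # equivalently d_bar + p_d
```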
4.4. Dow Jones Industrial Average
We summarize our stochastic volatility models in Table 4.1. Besides the models discussed in Section
4.3, we also consider versions of the moving average model with SV errors, i.e. $y_t = \mu + \varepsilon_t + \psi\varepsilon_{t-1}$, see
Chan (2013) and Chan and Hsiao (2013). Furthermore, for comparison, we also estimate a standard
GARCH(1,1) model
$$y_t = \mu + \sigma_t \varepsilon_t, \quad \varepsilon_t \sim N(0,1), \quad \sigma_t^2 = \omega + a\varepsilon_{t-1}^2 + b\sigma_{t-1}^2, \qquad (4.4.1)$$
where $\omega > 0$, $a > 0$, $b > 0$ and $a + b < 1$. For this model, we also choose to sample $\varphi = (\omega, a, b)'$
all-at-once using the Independence Chain Metropolis-Hastings algorithm, and we ensure
that $a^{(i)} + b^{(i)} < 1$ by resampling $a^{(i)} > 0$ and $b^{(i)} > 0$ until the condition is satisfied. We assume the same
priors as in Kim et al. (1998) for $\mu_h$, $\phi_h$ and $\sigma_h^2$. Furthermore, we assume that $\mu, \lambda \sim N(0, 10)$,
$\rho_h, \psi \sim TN_{]-1,1[}(0, 10)$ and $v \sim U(2, 128)$, where $U$ stands for the Uniform distribution with lower
(upper) endpoint 2 (128), see Chib et al. (2002). Finally, the priors on $\theta$ for the GARCH(1,1) model
are independent Normals with mean 0 and variance 10, truncated (except for $\mu$) to satisfy the restrictions.
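For completeness, a sketch of the Gaussian GARCH(1,1) log-likelihood implied by (4.4.1). Two details are interpretation choices not pinned down by the text: $\varepsilon_{t-1}$ in the variance recursion is read as the residual $y_{t-1} - \mu$, and $\sigma_1^2$ is initialized at the unconditional variance:

```python
import numpy as np

def garch11_loglik(params, y):
    """Gaussian log-likelihood of the GARCH(1,1) model (4.4.1). The variance
    recursion uses the squared residual (y_{t-1} - mu)^2, and sigma_1^2 is
    started at the unconditional variance omega / (1 - a - b)."""
    mu, omega, a, b = params
    if omega <= 0 or a <= 0 or b <= 0 or a + b >= 1:
        return -np.inf                    # outside the admissible region
    e2 = (y - mu)**2
    T = len(y)
    s2 = np.empty(T)
    s2[0] = omega / (1.0 - a - b)         # unconditional variance
    for t in range(1, T):
        s2[t] = omega + a * e2[t - 1] + b * s2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + e2 / s2)
```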
Table 4.1.: Model specifications
Model specification description
1 GARCH(1,1) plain GARCH(1,1) model, (4.4.1).
2 SV plain stochastic volatility model, (4.2.1)-(4.2.2).
3 SVL stochastic volatility model with leverage effect.
4 SVM stochastic volatility in mean model.
5 SV-MA(1) SV model where the measurement error, εt, follows a MA(1) process.
6 SVt stochastic volatility model with Student-t distributed errors, i.e. εt ∼ St (v).
7 SVMt stochastic volatility in mean model with Student-t distributed errors.
8 SVt-MA(1) SVt model where the measurement error, εt ∼ St (v), follows a MA(1) process.
This table presents the SV model specifications. The second column lists the model label and the third column briefly
explains the model characteristics.
Figure 4.1 displays the daily DJIA index for the period 01/02/2007-12/31/2013, for a total of T = 1740 observations, followed by the daily returns and the posterior estimates of σt = exp (ht/2), t = 1, ..., T for SVt-MA(1). From these figures, strong differences in return and volatility are immediately apparent. As expected, the conditional volatility increases drastically during the financial crisis of 2008. For each model, we set M = 100 (throughout this paper) and run each model based on 20 parallel chains, each of length 20000 after a burn-in period of 1000, for a total of 200000 posterior draws.7 We also experiment with different values of M to find out its effects on estimation results, see the Appendix.
Figure 4.1.: PG-AS sampler for SVt-MA(1), DJIA daily returns, 2007 to 2013

[Figure omitted.] Graph (a): daily DJIA index, (b): daily DJIA returns, (c): posterior estimates of σt, t = 1, ..., T, with 5% and 95%-tiles, for SVt-MA(1), (d): box plot of the inefficiency factors of h1:T for SVt-MA(1), PG-AS, M = 100.

7 In order to calculate the numerical standard errors of ML and DIC, we use “brute force”, which is re-estimating the models 20 times and estimating the numerical standard errors of ML and DIC by their sample standard deviations, see Berg et al. (2004) and Chan and Grant (2014). Overall, we find numerical standard errors of around 0.8.
Parameter estimates, ML and DIC values for the models in Table 4.1 are reported in Table 4.2. Overall,
we find that SVt-MA(1) performs best in terms of ML and DIC. In general, fat-tailed errors, volatility
in mean effects, moving average errors and the leverage effect all seem to be useful additions to the
plain SV model.
Table 4.2.: Posterior means and standard deviations (in parentheses), DJIA daily returns
Parameter GARCH(1,1) SV SVL SVM SV-MA(1) SVt SVMt SVt-MA(1)
µ 0.067 0.089 0.115 0.125 0.089 0.088 0.126 0.091
(0.019) (0.019) (0.021) (0.023) (0.018) (0.017) (0.020) (0.016)
λ -0.061 -0.058
(0.023) (0.019)
ψ -0.070 -0.069
(0.025) (0.025)
µh -0.123 -0.364 -0.119 -0.163 -0.126 -0.127 -0.126
(0.293) (0.343) (0.295) (0.323) (0.371) (0.367) (0.379)
φh 0.979 0.979 0.979 0.983 0.986 0.986 0.987
(0.006) (0.006) (0.006) (0.006) (0.005) (0.005) (0.005)
σ2h 0.052 0.051 0.051 0.043 0.034 0.033 0.032
(0.012) (0.013) (0.011) (0.010) (0.008) (0.009) (0.008)
ρh -0.367
(0.074)
v 9.362 9.456 9.264
(2.050) (2.057) (2.014)
ω 0.024
(0.002)
a 0.105
(0.020)
b 0.875
(0.020)
log(ML), α = 0.50 -2525.175 -2489.643 -2477.678 -2488.327 -2477.303 -2473.272 -2471.481 -2468.169
log(ML), α = 0.75 -2524.769 -2489.237 -2477.273 -2487.922 -2476.898 -2472.867 -2471.076 -2467.764
log(ML), α = 0.95 -2524.533 -2489.001 -2477.037 -2487.685 -2476.662 -2472.630 -2470.839 -2467.527
log(ML), α = 0.99 -2524.492 -2488.960 -2476.995 -2487.644 -2476.620 -2472.589 -2470.798 -2467.486
DIC 5035.615 4958.287 4939.868 4950.292 4939.034 4934.034 4929.033 4926.707
Rank 8 7 5 6 4 3 2 1
This table reports posterior means and standard deviations for various SV models using DJIA daily returns. log(ML): logarithm of the marginal likelihood for the corresponding value of α. DIC: deviance information criterion. Rank: rank of the model based on ML and DIC. Total number of observations, T = 1740.
Furthermore, SVt, SVMt and SVt-MA(1) outperform their Gaussian counterparts both in terms of
ML and DIC. Finally, the plain SV model outperforms the GARCH(1,1) model. For instance, the
logBF (DIC) of SV versus GARCH(1,1) is 35.53 (77.32).
We report posterior means and standard deviations of the model parameters in Table 4.2. It can
be seen that the estimated means and standard deviations of the parameters appear quite reasonable
and comparable with previous estimates reported in the literature. Typically, the volatility process
is estimated to be highly persistent. For SVL, the posterior mean of ρh is −0.36 with a posterior
standard deviation of 0.07. This suggests that the leverage effect is an important feature for the
DJIA returns. Furthermore, the posterior estimates of λ and ψ both support in mean and serial
4. PG-AS for SV Models with: Heavy Tails, in Mean Effects, Leverage and Structural Breaks
92
dependence extensions. For instance, for ψ = 0, the SV-MA(1) model reduces to the plain SV model.
The posterior mean of ψ is estimated at −0.07 with a posterior standard deviation of 0.02, i.e. the
posterior distribution of ψ has little mass around zero. Furthermore, SV-MA(1) outperforms SV. For
models with Student-t distributed errors, we estimate the posterior mean of v at around 9, similar to the values reported in Chib et al. (2002) and Berg et al. (2004).
We report the inefficiency factors of h1:T for SVt-MA(1) in panel (d) of Figure 4.1. The inefficiency factor, RB, is defined as RB = 1 + 2 ∑B_{l=1} ρ (l), where ρ (l) is the sample autocorrelation at lag l and B is the bandwidth, see Kim et al. (1998) for further background on this measure. In these calculations, we choose a bandwidth, B, of 100. Furthermore, note that h1:T is of length T, so we have a total of T inefficiency factors. Therefore, we use box plots to report this information. The middle line of the box denotes the median, while the lower and upper lines represent the 25% and 75%-tiles, respectively. For instance, the box plot indicates that about 75% of the log-volatilities have inefficiency factors of less than 5, and the maximum is close to 6.8. Overall, given M = 100, we see that PG-AS is very capable of producing draws of h1:T that are not highly autocorrelated, see the Appendix.
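The inefficiency factor above is straightforward to compute from a stored chain. A sketch, with an i.i.d. sequence (an ideal sampler, RB near 1) and a sticky AR(1) chain standing in for a poorly mixing sampler:

```python
import numpy as np

def inefficiency_factor(chain, bandwidth=100):
    """R_B = 1 + 2 * sum_{l=1}^{B} rho(l), where rho(l) is the sample
    autocorrelation of the MCMC chain at lag l."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    rho = np.array([np.dot(x[:-l], x[l:]) / (len(x) * var)
                    for l in range(1, bandwidth + 1)])
    return 1.0 + 2.0 * rho.sum()

rng = np.random.default_rng(2)
iid = rng.normal(size=20000)            # ideal sampler: R_B close to 1
ar = np.empty(20000)                    # persistent AR(1) chain: R_B well above 1
ar[0] = 0.0
for t in range(1, len(ar)):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

r_iid = inefficiency_factor(iid)
r_ar = inefficiency_factor(ar)
```

For the AR(1) chain with coefficient 0.9, RB should be close to the theoretical value (1 + 0.9)/(1 − 0.9) = 19, far above the i.i.d. benchmark of 1.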
4.5. Structural Breaks and PG-AS
In this section, we combine PG-AS with the change-point (or structural break) specification of Chib (1998). We believe this combination is the most important contribution of this paper. In fact, relatively few papers have combined models with SV effects with what the literature considers a “state-of-the-art” structural break model.

We start by using a change-point autoregressive model with stochastic volatility in mean effects, CP(m)-AR(p)-SVM. We follow Liu and Maheu (2008) and estimate our model conditional on 0, 1, ..., m breaks. For each of these, we calculate ML and DIC and use them to determine the optimal number of change points. Specifically, we can compare ML using Bayes factors, and use differences in DIC between different models to determine the number of structural breaks. Furthermore, in our analyses, we do not obtain conflicting results with regard to change-point identification using ML and DIC.
Assume that there are m − 1, m ∈ {1, 2, ...}, change points at unknown times, τ1, τ2, ..., τm−1. Separated by these change points, there are m different regimes. The m-state change-point linear regression model with simultaneous breaks in the SVM coefficients is given as
yt = Xt−1βst + λstγst exp (ht) + √γst exp (ht/2) εt, εt ∼ N (0, 1), (4.5.1)
ht = φh,stht−1 + σh,stζt, ζt ∼ N (0, 1), (4.5.2)

where γst = exp (µh,st), st = 1, ...,m, s1:T = (s1, ..., sT)′ and st = k indicates that yt is from regime k. As before, YT = (y1, ..., yT)′ is T × 1, XT is a T × n matrix of regressors with row Xt−1, which can also include lags of yt, and βst is n × 1. The one-step ahead transition matrix for st is
P =
[ p11   p12   0     ···   0
  0     p22   p23   ···   0
  ⋮             ⋱         ⋮
  0     ···   0     pm−1,m−1   pm−1,m
  0     0     ···   0     1 ],  (4.5.3)
where plk = Pr (st = k | st−1 = l) with k = l or k = l + 1, i.e. plk is the probability of moving from regime l at time t − 1 to regime k at time t. P ensures that given st = k at time t, in the next period, t + 1, st+1 either remains in the same state or jumps to the next state. Once the last regime is reached, we stay there forever, that is, pm,m = 1. This structure enforces the following ordering
θt =
  θ1 if t < τ1,
  θ2 if τ1 ≤ t < τ2,
  ⋮
  θm−1 if τm−2 ≤ t < τm−1,
  θm if τm−1 ≤ t,

on the change points. Let θk = (β′k, λk, γk, φh,k, σ2h,k)′, k = 1, ...,m, denote the parameters in regime k.
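The restricted transition matrix (4.5.3) is mechanical to construct from the diagonal probabilities p11, ..., pm−1,m−1; a minimal sketch:

```python
import numpy as np

def build_transition_matrix(p_stay):
    """Restricted one-step transition matrix (4.5.3): from regime k the chain
    can only stay (probability p_kk) or move to k+1 (probability 1 - p_kk);
    the last regime m is absorbing, p_mm = 1."""
    m = len(p_stay) + 1                 # p_stay holds p_11, ..., p_{m-1,m-1}
    P = np.zeros((m, m))
    for k, p in enumerate(p_stay):
        P[k, k] = p
        P[k, k + 1] = 1.0 - p
    P[m - 1, m - 1] = 1.0
    return P

P = build_transition_matrix([0.99, 0.98])   # m = 3 regimes
```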
Modeling (4.5.1)-(4.5.2) using (4.5.3) is straightforward. Specifically, we proceed by cycling through the following steps:8

1. s1:T | {βk}m_{k=1}, {λk}m_{k=1}, {γk}m_{k=1}, {φh,k}m_{k=1}, {σ2h,k}m_{k=1}, P, h1:T, XT, YT.
2. h1:T | {βk}m_{k=1}, {λk}m_{k=1}, {γk}m_{k=1}, {φh,k}m_{k=1}, {σ2h,k}m_{k=1}, s1:T, XT, YT.
3. {βk, λk}m_{k=1} | {γk}m_{k=1}, s1:T, h1:T, XT, YT.
4. {γk}m_{k=1} | {βk}m_{k=1}, {λk}m_{k=1}, s1:T, h1:T, XT, YT.
5. {φh,k}m_{k=1} | {σ2h,k}m_{k=1}, s1:T, h1:T.
6. {σ2h,k}m_{k=1} | {φh,k}m_{k=1}, s1:T, h1:T.
7. P | s1:T.
In step 1, we use the algorithm of Chib (1998) to draw s1:T, see Liu and Maheu (2008). h1:T is sampled using CPF-AS from Section 4.3, conditional on the newly drawn values of s1:T and θ1, ..., θm from the previous Gibbs iteration. The parameters within each regime, {βk, λk}m_{k=1}, {φh,k}m_{k=1} and {σ2h,k}m_{k=1}, are sampled using standard Gibbs sampling techniques conditional on the newly drawn values of s1:T and h1:T. However, for (4.5.1)-(4.5.2), the conditional posterior of γk does not have a closed form solution. Therefore, we sample γk, k = 1, ...,m, using the Independence Chain Metropolis-Hastings algorithm. Specifically, let hk = {ht : st = k}, Xk = {Xt−1 : st = k}, Yk = {yt : st = k} denote the observations in regime k. At iteration i, we sample γ∗k ∼ TN]0,∞[ (γk,ML, Vk), where γk,ML is obtained by maximizing the likelihood of (4.5.1) conditional on β(i)k, λ(i)k, h(i)k, X(i)k and Y(i)k. As before, we set Vk = ck · var (γk,ML). The M-H acceptance probability of γ∗k is given as
aMH (γ∗k, γ(i−1)k) = min{1, [p (γ∗k | β(i)k, λ(i)k, h(i)k, X(i)k, Y(i)k) q (γ(i−1)k)] / [p (γ(i−1)k | β(i)k, λ(i)k, h(i)k, X(i)k, Y(i)k) q (γ∗k)]}.
8 It is also possible to generate h1:T and s1:T simultaneously in one step. However, this approach requires major modifications inside CPF-AS. We choose to sample h1:T and s1:T sequentially because it is computationally easier and more intuitive. Indeed, using the latter approach, we only need to incorporate a procedure to draw s1:T. Thus, we do not need to modify anything inside CPF-AS. In fact, this underlines one of the main points of this paper, namely, that we can estimate complicated models without the need for major modifications inside CPF-AS.
The conditional posterior of pkk, k = 1, ...,m− 1, is Beta (a0 + nkk, b0 + 1), where nkk is the number
of one-step transitions from state k to state k in a given sequence of s1:T .
We let (β′k, λk)′ ∼ N (0(n+1), I(n+1)), γk ∼ IG (4/2, 0.2/2), (φh,k + 1) /2 ∼ Beta (20, 1.5) and σ2h,k ∼ IG (4/2, 0.2/2), where, as before, IG (·/2, ·/2) stands for the Inverse-Gamma density, see Kim and Nelson (1999). Furthermore, we let pkk ∼ Beta (a0 = 20, b0 = 0.1) for k = 1, ...,m − 1. In this setting, most priors are very uninformative, while the prior for pkk favors infrequent structural breaks. Finally, notice that in order to obtain p (YT | θ, P) in step 1, we use that
log p (YT | θ, P) = ∑T_{t=1} log p (yt | θ, P, Yt−1), where

p (yt | θ, P, Yt−1) = ∑m_{k=1} p (yt | θ, Yt−1, st = k) p (st = k | θ, P, Yt−1).

The first term, p (yt | θ, Yt−1, st = k), is obtained from CPF-AS. The last term is computed from

p (st = k | θ, P, Yt−1) = ∑k_{l=k−1} p (st−1 = l | θ, P, Yt−1) plk, k = 1, ...,m,

p (st = k | θ, P, Yt) = p (st = k | θ, P, Yt−1) p (yt | θ, Yt−1, st = k) / ∑m_{l=1} p (st = l | θ, P, Yt−1) p (yt | θ, Yt−1, st = l), k = 1, ...,m. (4.5.4)

The last equation is obtained from Bayes’ rule. Note that in (4.5.4), the summation in the prediction step runs only from l = k − 1 to k, due to the restricted nature of the transition matrix.
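The recursions in (4.5.4) amount to a Hamilton-type forward filter over the restricted chain. The sketch below takes the per-regime conditional densities p (yt | θ, Yt−1, st = k) as a precomputed array (in the paper these terms come from CPF-AS; here they are simple Gaussian densities for a series with a mean shift) and returns log p (YT | θ, P):

```python
import numpy as np

def change_point_loglik(dens, P):
    """Forward filter for the restricted change-point chain.
    dens[t, k] = p(y_t | theta, Y_{t-1}, s_t = k). Returns log p(Y_T | theta, P)
    via the prediction/update recursions (4.5.4). The chain is initialized
    in regime 1."""
    T, m = dens.shape
    filt = np.zeros(m)
    filt[0] = 1.0
    loglik = 0.0
    for t in range(T):
        pred = filt @ P            # prediction: p(s_t = k | Y_{t-1})
        joint = pred * dens[t]
        lik_t = joint.sum()        # p(y_t | Y_{t-1})
        loglik += np.log(lik_t)
        filt = joint / lik_t       # Bayes' rule update: p(s_t = k | Y_t)
    return loglik

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])  # break at t = 100
means = np.array([0.0, 2.0])
dens = np.exp(-0.5 * (y[:, None] - means[None, :]) ** 2) / np.sqrt(2 * np.pi)
P = np.array([[0.99, 0.01], [0.0, 1.0]])

ll = change_point_loglik(dens, P)
ll_single = change_point_loglik(dens[:, :1], np.array([[1.0]]))  # no-break benchmark
```

With m = 1 the filter collapses to a plain sum of log densities, which provides a simple correctness check; with a genuine mean shift, the two-regime likelihood should exceed the no-break one.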
4.5.1. Simulation example
As a simple illustration, consider data generated according to the following model
yt = µst + λstγst exp (ht) + √γst exp (ht/2) εt, εt ∼ N (0, 1), (4.5.5)
ht = φh,stht−1 + σh,stζt, ζt ∼ N (0, 1), (4.5.6)

with st = 1, 2, µ1 = 1, µ2 = 0.1, λ1 = 0.2, λ2 = 1.2, γ1 = 1.4, γ2 = 0.2, φh,1 = 0.92, φh,2 = 0.87, σ2h,1 = 0.04, σ2h,2 = 0.02 and t = 1, ..., T = 500. The true date of the structural break is t = 230.
We estimate (4.5.5)-(4.5.6) conditional on m = 0, 1, 2 breaks. Both ML and DIC indicate that the model with one break performs best. The top panel of Figure 4.2 compares the predictive mean of our model (break) to a recursive OLS specification (no-break), along with yt and the estimated change-point date. Both predictive means are essentially similar before the break at t = 230. However, after the break, we see a quick reduction in the predictive mean from the break model, while the predictive mean from the no-break model remains high for a long time. Using the posterior mode of {s(i)1:T}N_{i=1}, the estimated change-point date is t = 230. Clearly, our model is able to detect the correct date of the break.
The marginal posteriors of γk, k = 1, 2 are bell shaped and centered around their means, 1.14 for
γ1 and 0.18 for γ2, respectively. In panels (c) and (d) of Figure 4.2, we plot the marginal posteriors
of λk, k = 1, 2. Similar to γk, the marginal posteriors of λk are bell shaped and centered around their
means, 0.48 for λ1 and 1.35 for λ2, respectively.
Figure 4.2.: Simulation results

[Figure omitted.] Graph (a): data, predictive means and the estimated change-point date, (b): true σ2t process, posterior mean (solid line) and 90% credibility intervals (dashed lines) of σ2t, (c) and (d): marginal posterior distributions of λk, k = 1, 2.
4.5.2. Real output
Recent literature documents a structural break in the volatility of the US GDP growth rate, see Kim and Nelson (1999) and Gordon and Maheu (2008). We follow Gordon and Maheu (2008) and consider estimates from structural break AR(2) models of the real US GDP growth rate.
Let yt = 100 [log (qt/qt−1)− log (pt/pt−1)], where qt is quarterly seasonally adjusted US GDP and
pt is the US GDP price index. Our data ranges from 1947q4 to 2013q4, for T = 265 observations. In
the following, we compare the performance of (4.5.1)-(4.5.2) with 2 lags of yt, CP(m)-AR(2)-SVM to
CP(m)-AR(2)-SV and a simple change-point AR(2) model, henceforth CP(m)-AR(2)
yt = β1,st + β2,styt−1 + β3,styt−2 + σstεt, εt ∼ N (0, 1) . (4.5.7)
Equation (4.5.7) is estimated using Gibbs sampling, see Liu and Maheu (2008). We estimate (4.5.1)-
(4.5.2), (4.5.7) conditional on m = 0, 1, 2 change points. Thereafter, we determine the optimal number
of change points using ML and DIC. We also compute the marginal likelihood for the change-point
SV(M) models using the method of Sims et al. (2008). As pointed out in Sims et al. (2008), the G-D
method may not work for models with time-varying parameters as the posterior density tends to be
non-Gaussian. However, we do not find any significant changes compared to G-D. Thus, we choose
to retain these values. We also conduct a Monte Carlo analysis (not reported), generating data from (4.5.1)-(4.5.2) for 0, ..., 3 change points and comparing ML between different specifications.
Table 4.3.: Posterior means and standard deviations (in parentheses), US quarterly GDP growth rate
Parameter AR(2) AR(2)-SV AR(2)-SVM CP(1)-AR(2) CP(1)-AR(2)-SV CP(1)-AR(2)-SVM
β1,1 0.461 0.436 0.436 0.560 0.574 0.772
(0.078) (0.071) (0.088) (0.123) (0.128) (0.313)
β2,1 0.337 0.299 0.299 0.323 0.325 0.321
(0.061) (0.065) (0.065) (0.084) (0.086) (0.084)
β3,1 0.093 0.159 0.156 0.052 0.056 0.049
(0.061) (0.063) (0.064) (0.083) (0.086) (0.083)
β1,2 0.275 0.365 0.740
(0.080) (0.090) (0.193)
β2,2 0.340 0.258 0.145
(0.090) (0.098) (0.111)
β3,2 0.255 0.258 0.217
(0.087) (0.088) (0.087)
λ1 0.007 -0.181
(0.105) (0.360)
λ2 -1.302
(0.645)
σ21 0.804 1.229
(0.070) (0.147)
σ22 0.282
(0.038)
γ1 0.686 0.682 1.081 0.976
(0.384) (0.350) (0.232) (0.215)
γ2 0.235 0.187
(0.059) (0.046)
φh,1 0.950 0.949 0.829 0.820
(0.026) (0.027) (0.101) (0.106)
φh,2 0.831 0.842
(0.091) (0.083)
σ2h,1 0.078 0.078 0.058 0.059
(0.034) (0.035) (0.030) (0.029)
σ2h,2 0.086 0.089
(0.051) (0.046)
log(ML), α = 0.50 -362.802 -305.252 -303.962 -312.651 -287.933 -279.568
log(ML), α = 0.75 -362.396 -304.847 -303.556 -312.246 -287.527 -279.162
log(ML), α = 0.95 -362.160 -304.610 -303.320 -312.009 -287.291 -279.926
log(ML), α = 0.99 -362.119 -304.569 -303.278 -311.968 -287.249 -279.885
DIC 701.187 657.393 656.923 666.927 643.983 639.892
Rank 6 4 3 5 2 1
This table reports posterior means and standard deviations for various AR(2) and CP(m)-AR(2) models. The parameters associated with each regime are labeled with subscript 1, ...,m. log(ML): log-marginal likelihood for the corresponding value of α. DIC: deviance information criterion. Rank: rank of the model based on ML and DIC. Total number of observations, T = 265.
Results indicate that the G-D method correctly identifies the true number of structural breaks.9 We find that all change-point models produce similar results, suggesting that one structural break has occurred. Compared to AR(2)-SVM, the logBF in favor of CP(1)-AR(2)-SVM is 23, see Table 4.3. CP(1)-AR(2)-SVM also dominates its constant parameter counterpart in terms of DIC. Accordingly, change-point SV specifications perform better than (4.5.7). At the same time, CP(1)-AR(2)-SVM performs better than CP(1)-AR(2)-SV.
The posterior density of the change point for CP(1)-AR(2)-SVM is plotted in Figure 4.3. Using the mode of {s(i)1:T}N_{i=1}, CP(1)-AR(2) and CP(1)-AR(2)-SV indicate that the break date is 1983q3, identical to the break date of Gordon and Maheu (2008) and close to the break date of Kim and Nelson (1999). Specifically, Kim and Nelson (1999) find evidence of a break in 1984q1 using data from 1953q2 to 1997q1. On the other hand, the mode of {s(i)1:T}N_{i=1} for CP(1)-AR(2)-SVM indicates that the break date is 1984q2. Evidently, structural break models indicate a significant one-time drop in the volatility of yt. For CP(1)-AR(2), the first regime implies an unconditional variance for the US GDP growth rate of 1.22, while for the second regime it is 0.28. For CP(1)-AR(2)-SVM, we find that γ2 is estimated at 0.18, lower than γ1 = 0.97, confirming a significant fall in the average volatility of the real US GDP growth rate since the 1980s. For all change-point models, we also find that the unconditional mean of yt falls after the break. Furthermore, for CP(1)-AR(2)-SVM, we estimate λ2 at −1.30 with a posterior standard deviation of 0.64, whereas we estimate λ1 at −0.18 with a relatively larger posterior standard deviation, see panel (d) of Figure 4.3.
Figure 4.3.: PG-AS sampler for US quarterly real GDP growth rate from 1947q4 to 2013q4

[Figure omitted.] Graph (a): US quarterly real GDP growth rate, (b): posterior estimates of σ2t, t = 1, ..., T, for CP(1)-AR(2)-SVM, (c): change-point density for CP(1)-AR(2)-SVM, (d): marginal posterior distribution of λk, k = 1, 2.
9 We estimate the marginal likelihood of (4.5.7) for m = 0, 1, 2 structural breaks using the method of Chib (1995), see Liu and Maheu (2008).
Accordingly, results indicate that volatility feedback has a much more negative and deeper impact on the GDP growth rate after the structural break, confirming the effects of the Great Moderation.10 The posterior density of the change point and the estimates of σ2t = γst exp (ht), t = 1, ..., T, for CP(1)-AR(2)-SVM are also plotted in Figure 4.3. Evidently, besides a significant reduction in the volatility of yt since the 1980s, results also point to a gradual reduction in the volatility of yt during the 1960s, followed by a subsequent increase up until the break point at 1984q2.11
Finally, Table 4.4 displays out-of-sample results for one-period ahead direct forecasts (see Marcellino
et al. (2005)) for the models given in Table 4.3. In general, we carry out a forecasting exercise for
a specific out-of-sample period. We first estimate the models using the initial sample and forecast.
Then, we add one data point, update and forecast again, until the end of the out-of-sample data. This
strategy works for AR(2), AR(2)-SV and AR(2)-SVM as we do not need to specify the number of
structural breaks over the out-of-sample data. In the context of forecasting with the break models, we
want the optimal change-point number to vary over time as the number of regimes can increase as time
goes by. Thus, we follow Bauwens et al. (2011) and perform the following: for the first out-of-sample
observation at time t, we calculate ML and DIC for 1, ..., Kt−1 change points using Yt−1. Thereafter, we choose the optimal change-point number, K∗t−1, and calculate the predictive mean, E [yt | Yt−1], using the parameters associated with specification K∗t−1.12 We then extend the out-of-sample period by one observation, calculate ML and DIC for 1, ..., K∗t−1 + 1 change points, choose the optimal K∗t and repeat the above forecasting procedure to obtain E [yt+1 | Yt]. Furthermore, in addition to MAE and RMSE, forecasts are also compared using the linear exponential (LINEX) loss function of Zellner (1986). This loss function is defined as L (yt, ŷt) = b [exp (a (yt − ŷt)) − a (yt − ŷt) − 1], where ŷt is the forecast. L (yt, ŷt) ranks overprediction (underprediction) more heavily for a > 0 (a < 0).
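A sketch of the LINEX loss as defined above. The loss is zero at a perfect forecast, and the exponential side of the asymmetry falls on positive values of a(yt − ŷt), so which kind of forecast error is penalized more heavily depends on the sign convention adopted for the error:

```python
import numpy as np

def linex(y, y_hat, a=1.0, b=1.0):
    """LINEX loss of Zellner (1986):
    L(y, y_hat) = b * (exp(a * e) - a * e - 1), with e = y - y_hat.
    The loss is zero at e = 0 and asymmetric: errors with a * e > 0
    grow exponentially, while those with a * e < 0 grow only linearly."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return b * (np.exp(a * e) - a * e - 1.0)
```

For example, with a = 1 an error of +0.5 costs more than an error of −0.5 of the same magnitude, and the asymmetry flips for a = −1.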
Overall, structural break specifications offer improvements in terms of point forecasts. However, compared to their respective constant parameter counterparts, the improvements they offer are quite modest. For instance, we find that CP(m)-AR(2)-SVM outperforms AR(2)-SVM by only about 5% (9%) in terms of MAE (RMSE). When LINEX is used, the break models' ability to capture variations in higher moments also tends to provide gains in terms of point forecasts.
Table 4.4.: Out-of-sample forecasts
Model MAE RMSE LINEX, a = 1, b = 1 LINEX, a = −1, b = 1
AR(2) 0.573 0.611 0.401 0.407
AR(2)-SV 0.573 0.608 0.395 0.411
AR(2)-SVM 0.572 0.607 0.396 0.408
CP(m)-AR(2) 0.572 0.601 0.406 0.384
CP(m)-AR(2)-SV 0.570 0.604 0.418 0.377
CP(m)-AR(2)-SVM 0.543 0.553 0.358 0.364
This table reports mean absolute error (MAE) and root mean squared error (RMSE) for forecasts based on the predictive mean for one period ahead. Furthermore, the average LINEX loss function is reported for a = 1, a = −1 and b = 1. The out-of-sample period is from 1959q3 till the end of the sample.
10 From an economic point of view, these results can possibly support the hypothesis that institutional changes have contributed to the reduction in the volatility of business cycle fluctuations. However, any major economic interpretation of these results is beyond the scope of the paper and is therefore left for future research.

11 However, although γ2 < γ1 for CP(1)-AR(2)-SV and CP(1)-AR(2)-SVM, we find that the unconditional volatility of volatility of yt, σh,st/√(1 − φ2h,st), increases in the second regime, which is very counterintuitive. We cannot find a plausible explanation for this phenomenon. Furthermore, we also arrive at the same conclusion by manually splitting the sample at 1984q2 and estimating AR(2)-SV and AR(2)-SVM models for each subsample.

12 We do not get any conflicting results with regards to recursive change-point identification using ML and DIC.
4.5.3. Structural break ARFIMA-SV model
In this section, we propose a structural break ARFIMA model with SV effects. Our model allows for
structural breaks in µ, d, autoregressive (AR), moving average (MA) coefficients, γ, φh and σ2h. The
change-point ARFIMA-SV model is as follows
yt − µst = [Ψ (L) / Φ (L)] (1 − L)−dst √γst exp (ht/2) εt, εt ∼ N (0, 1), (4.5.8)
ht = φh,stht−1 + σh,stζt, ζt ∼ N (0, 1), (4.5.9)

where st = 1, ...,m, Φ (L) = (1 − φ1,stL − ... − φp,stLp) and Ψ (L) = (1 + ψ1,stL + ... + ψq,stLq) are AR and MA polynomials in the lag operator, L, where Lpyt = yt−p for p = 0, 1, ..., with integer orders p ≥ 0 and q ≥ 0. The fractional difference operator, (1 − L)−dst, with dst ∈ R, is given by

(1 − L)−dst = ∑∞_{j=0} [Γ (j + dst) / (Γ (j + 1) Γ (dst))] Lj,

where Γ (·) is the Gamma function. Equation (4.5.8) is a generalization of the ARMA model to non-integer values of dst. Specifically, if dst > 0, the process is said to have long memory, since the autocorrelations die out at a hyperbolic rate. For 0 < dst < 0.5, (4.5.8) is a stationary long-memory process with a non-summable autocorrelation function. For dst = 0, we have a change-point ARMA model with stochastic volatility. In this paper, we assume that 0 < dst < 0.5 and that Φ (z) and Ψ (z) do not have common roots.
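The coefficients of (1 − L)−d can be computed without evaluating Gamma functions directly, via the stable recursion c0 = 1, cj = cj−1 (j − 1 + d)/j, which is how a truncation at lag L (as in Chan and Palma (1998)) would typically be implemented; a sketch:

```python
import numpy as np

def frac_diff_weights(d, n_lags):
    """Coefficients of (1 - L)^{-d} = sum_j [Gamma(j+d) / (Gamma(j+1) Gamma(d))] L^j,
    computed via the recursion c_0 = 1, c_j = c_{j-1} * (j - 1 + d) / j, which
    avoids overflow in the Gamma ratios."""
    c = np.empty(n_lags + 1)
    c[0] = 1.0
    for j in range(1, n_lags + 1):
        c[j] = c[j - 1] * (j - 1 + d) / j
    return c

w = frac_diff_weights(0.3, 50)
```

For 0 < d < 0.5, all weights are positive and decay hyperbolically (roughly like j^{d−1}), which is the slow decay behind the long-memory autocorrelations described above.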
In order to estimate (4.5.8)-(4.5.9), we rely mainly on the idea of Chan and Palma (1998), who consider an approximation of (4.5.8) based on a truncation lag of order L. We proceed to draw θ = {θk}m_{k=1}, where θk = (µk, dk, φ1,k, ..., φp,k, ψ1,k, ..., ψq,k)′, γ1, ..., γm, φh,1, ..., φh,m, σ2h,1, ..., σ2h,m, P and L from their respective conditional posteriors. However, the conditional posteriors of θk do not have closed form solutions, see Raggi and Bordignon (2012). Therefore, we sample θk using Metropolis-Hastings. As in Section 4.5.1, we note that θk depends only on information in regime k. Thus, at the ith iteration of PG-AS, we can sample each element of θk one-at-a-time using information in regime k. For instance, dk is sampled as follows. First, let θ−dk_k = (µk, φ1,k, ..., φp,k, ψ1,k, ..., ψq,k)′:
1. Sample a candidate, d∗k, from a Gaussian random walk proposal,

q (d∗k | d(i−1)k) ∼ TN]0,0.5[ (d(i−1)k, Σk).

We adjust Σk to get an acceptance rate of around 30%, experimenting with different values of Σk until we find one which yields a reasonable acceptance probability.

2. Define the acceptance probability for d∗k as

aMH (d∗k, d(i−1)k) = min{1, [p (d∗k | θ−dk,(i−1)_k, γ(i−1)k, L(i−1), h(i)k, Y(i)k) q (d(i−1)k | d∗k)] / [p (d(i−1)k | θ−dk,(i−1)_k, γ(i−1)k, L(i−1), h(i)k, Y(i)k) q (d∗k | d(i−1)k)]}.

3. Draw u ∼ U (0, 1). If u ≤ aMH (d∗k, d(i−1)k), then set d(i)k = d∗k; else, set d(i)k = d(i−1)k.
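Steps 1-3 can be sketched as follows. The ARFIMA conditional posterior of dk is replaced by a toy log-target on ]0, 0.5[, and the truncated-Gaussian proposal's normalizing constants are handled explicitly (the Gaussian kernels of q(d∗ | d) and q(d | d∗) cancel, but the truncation constants do not, which is why q enters the ratio in step 2):

```python
import numpy as np
from math import erf, sqrt, log

rng = np.random.default_rng(4)

def tn_sample(mean, sd, lo, hi):
    """Draw from N(mean, sd^2) truncated to ]lo, hi[ by rejection."""
    while True:
        x = rng.normal(mean, sd)
        if lo < x < hi:
            return x

def tn_log_norm(mean, sd, lo=0.0, hi=0.5):
    """Log normalizing constant of the truncated proposal centred at `mean`."""
    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return log(Phi((hi - mean) / sd) - Phi((lo - mean) / sd))

def rw_mh_step(d_curr, log_post, step_sd=0.05):
    """One random-walk M-H step for d on ]0, 0.5[; only the truncation
    constants of the asymmetric proposal survive in the acceptance ratio."""
    d_prop = tn_sample(d_curr, step_sd, 0.0, 0.5)
    log_a = (log_post(d_prop) - log_post(d_curr)
             + tn_log_norm(d_curr, step_sd) - tn_log_norm(d_prop, step_sd))
    if log(rng.uniform()) <= min(0.0, log_a):
        return d_prop, True
    return d_curr, False

# toy target in place of the ARFIMA conditional posterior of d
log_post = lambda d: 3.0 * np.log(d) + np.log(0.5 - d)
d, draws, n_acc = 0.25, [], 0
for _ in range(2000):
    d, acc = rw_mh_step(d, log_post)
    draws.append(d)
    n_acc += acc
```

In practice, step_sd plays the role of Σk above and would be tuned until the acceptance rate is near 30%.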
We sample φh,k | σ2h,k, hk following Kim et al. (1998). σ2h,k | φh,k, hk ∼ IG (vk/2, lk/2), where vk = Tk + v0, lk = ζ′kζk + l0, Tk is the number of observations in regime k and ζk = {ζt : st = k}. v0 and l0 are prior hyperparameter values. γk | θk, L, hk, Yk ∼ IG (rk/2, gk/2), where rk = Tk + r0 and gk = ε′kεk/ exp (hk) + g0. As before, pkk | S ∼ Beta (a0 + nkk, b0 + 1), k = 1, ...,m − 1. Finally, we use the method of Raggi and Bordignon (2012) to sample L from its conditional posterior.
We apply our model to a monthly time series of inflation, using the US City Average core consumer
price index (CUUR0000SA0L1E) of the Bureau of Labor Statistics (BLS). Our series excludes the
direct effects of price changes for food and energy. We denote the series by Pt and use data from
1960:1 until 2013:12, for a total of T = 648 observations. We follow Bos et al. (2012) and construct monthly US core inflation as πt = 100 log (Pt/Pt−1). To adjust for part of the seasonality in the series, we regress inflation on a series of seasonal dummies, D, as in π = Dβ + u. Instead of using the original inflation, πt, we use yt = ut + π̄, where ut is the residual from adjusting inflation for the major seasonal effects at time t, and π̄ is the average inflation level.
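The seasonal adjustment step can be sketched as follows, with a synthetic inflation series standing in for the BLS data; the dummy-variable regression and the reconstruction yt = ut + π̄ follow the description above:

```python
import numpy as np

def deseasonalize(pi):
    """Regress inflation on 12 monthly dummies, pi = D*beta + u, and return
    y_t = u_t + pi_bar: the residual plus the average inflation level."""
    T = len(pi)
    D = np.zeros((T, 12))
    D[np.arange(T), np.arange(T) % 12] = 1.0   # one dummy per calendar month
    beta, *_ = np.linalg.lstsq(D, pi, rcond=None)
    u = pi - D @ beta
    return u + pi.mean()

rng = np.random.default_rng(5)
# synthetic monthly inflation with a deterministic seasonal pattern, T = 648
season = np.tile(np.sin(2 * np.pi * np.arange(12) / 12), 54)
pi = 0.3 + season + rng.normal(0, 0.1, size=648)
y = deseasonalize(pi)
```

Because the full set of monthly dummies spans the constant, the residuals have mean zero, so y keeps the average inflation level of the original series while the seasonal variation is removed.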
We estimate (4.5.8)-(4.5.9) conditional on 0 to 4 change points. We then choose the optimal number of change points using ML and DIC. Both in terms of ML and DIC, results indicate that the specification with 2 change points fits the data best. As before, we also compute the marginal likelihood for the change-point ARFIMA(p,d,q)-SV models using the method of Sims et al. (2008) and do not find any significant changes compared to G-D.
Table 4.5 reports estimation results for a homoscedastic ARFIMA model, ARFIMA-SV and ARFIMA-SV conditional on 2 change points, henceforth labeled CP(2)-ARFIMA-SV. With regards to model estimation, we experiment with different AR and MA lags. We find dst, φ11,st and φ12,st to be significant. Thus, we assume that dst ∼ N (0, 1) truncated such that 0 < dst < 0.5, φ1,st ∼ N (0, 1), ..., φ12,st ∼ N (0, 1), and ensure that the roots of (1 − φ1,stL − ... − φ12,stL12) lie outside the unit circle by employing rejection sampling. However, φ1,st, ..., φ10,st are not significant in our applications and are therefore fixed at zero. A suitable prior for the truncation lag, L, is the truncated Poisson distribution with L ∈ {Lmin, ..., Lmax}. In this paper, we follow Raggi and Bordignon (2012) and set Lmin = 10 and Lmax = 50. Finally, we follow Section 4.5.2 and assume that γk, σ2h,k ∼ IG (4/2, 0.2/2), (φh,k + 1) /2 ∼ Beta (20, 1.5), k = 1, ...,m, and pkk ∼ Beta (20, 0.1), k = 1, ...,m − 1. In this setting, the prior on each element of θk is very standard, while the prior on pkk favors infrequent structural breaks. In the Appendix, we evaluate the sensitivity of the results to different prior specifications by investigating alternative prior hyperparameter values on pkk.
We also report results for a change-point integrated moving average (IMA) model of order 1 with SV effects, CP(m)-IMA(1,1)-SV. This model corresponds to a CP(m)-ARFIMA(0,1,1)-SV model for yt, or a CP(m)-ARIMA(0,0,1)-SV model for ∆yt. In this model, changes in the long-run persistence are captured by changes in the MA(1) parameter, ψst. We estimate CP(m)-IMA(1,1)-SV conditional on 0 to 4 change points. IMA(1,1)-SV without structural breaks is equivalent to the unobserved components model of Stock and Watson (2007). As before, we choose the optimal number of change points using ML and DIC. Again, results indicate that the specification with 2 change points fits the data best. We also report results for IMA(1,1)-SV and CP(2)-IMA(1,1)-SV on the right-hand side of Table 4.5.
Accordingly, we find that specifications with change points dominate ARFIMA(12,d,0). For instance, compared to ARFIMA(12,d,0)-SV, the logBF in favor of CP(2)-ARFIMA(12,d,0)-SV is 75. For the ARFIMA(12,d,0) model, the order of integration, d1, is estimated at 0.38. This implies that US core inflation exhibits long-memory behavior. φ12,1 captures the main seasonal effects. The average inflation rate, µ1, is estimated at 0.17%. The residual standard deviation of the ARFIMA(12,d,0) model, σ1, is 0.18% per month. When we compare ARFIMA(12,d,0) with ARFIMA(12,d,0)-SV, we find that d1 drops from 0.38 to 0.29. The AR coefficients, φ11,1 and φ12,1, increase from 0.11 and 0.24 to 0.14 and 0.36, respectively. The estimate of µ1 is also affected, being more precisely estimated at a lower value. The SV component itself is nearly nonstationary, as the autoregressive coefficient of volatility, φh,1, is close to one, and σ2h,1 is well identified at 0.03 with a standard error of 0.01. The average variance of inflation, γ1, is estimated at 0.025%. The posterior mode of the change-point density is associated with 1973:7 and 1984:2.
Table 4.5.: Posterior means and standard deviations (in parentheses), US core inflation rate
ARFIMA ARFIMA IMA CP(2)-ARFIMA CP(2)-IMA
(12,d,0) (12,d,0)-SV (1,1)-SV (12,d,0)-SV (1,1)-SV
Parameter mean std. dev mean std. dev mean std. dev mean std. dev mean std. dev
d1 0.384 (0.030) 0.299 (0.035) 0.202 (0.069)
d2 0.417 (0.053)
d3 0.154 (0.049)
ψ1 -0.840 (0.027) -0.847 (0.042)
ψ2 -0.569 (0.109)
ψ3 -0.857 (0.063)
µ1 0.171 (0.070) 0.115 (0.020) 0.135 (0.042)
µ2 0.536 (0.113)
µ3 0.072 (0.017)
φ11,1 0.116 (0.037) 0.149 (0.036) 0.203 (0.081)
φ12,1 0.241 (0.038) 0.368 (0.038) 0.302 (0.080)
φ11,2 0.027 (0.089)
φ12,2 0.097 (0.093)
φ11,3 0.150 (0.046)
φ12,3 0.505 (0.048)
σ21 0.032 (0.001)
γ1 0.025 (0.006) 0.028 (0.006) 0.035 (0.006) 0.034 (0.007)
γ2 0.060 (0.018) 0.063 (0.020)
γ3 0.014 (0.003) 0.021 (0.010)
φh,1 0.975 (0.010) 0.965 (0.014) 0.792 (0.113) 0.821 (0.108)
φh,2 0.913 (0.055) 0.925 (0.046)
φh,3 0.838 (0.086) 0.854 (0.083)
σ2h,1 0.035 (0.011) 0.040 (0.012) 0.049 (0.024) 0.047 (0.022)
σ2h,2 0.059 (0.029) 0.063 (0.029)
σ2h,3 0.054 (0.024) 0.054 (0.024)
L 29.545 (4.499) 20.570 (5.379)
log(ML), α = 0.50 168.895 271.701 224.363 346.612 259.751
log(ML), α = 0.75 168.867 272.105 224.768 347.018 260.156
log(ML), α = 0.95 168.837 272.342 225.004 347.254 260.392
log(ML), α = 0.99 168.828 272.383 225.046 347.295 260.434
DIC -362.286 -521.928 -422.079 -549.385 -434.130
Rank 5 2 4 1 3
This table reports posterior means (mean) and standard deviations (std. dev) for different ARFIMA(p,d,q)-SV type models. The parameters associated with each regime are labeled with subscripts 1, ..., m. log(ML): logarithm of the marginal likelihood using the corresponding value of α. DIC: deviance information criterion. Rank: rank of the model based on ML and DIC. Total number of observations, T = 648.
4. PG-AS for SV Models with: Heavy Tails, in Mean Effects, Leverage and Structural Breaks
102
Figure 4.4 displays the data, estimates of σ2t = γst exp (ht), t = 1, ..., T , and the posterior density of
the change-point dates for CP(2)-ARFIMA(12,d,0)-SV. Furthermore, the top right panel of Figure
4.4 shows a noticeable and persistent decrease in the volatility of inflation since the early 1980s. As
previously mentioned, this period is labeled as the Great Moderation, see Stock and Watson (2007).
Evidently, results from (4.5.8)-(4.5.9) conditional on 2 change points show that we can divide the evolution of yt into three consecutive phases: 1960:1-1973:6, 1973:7-1984:1, and 1984:2 till the end of the sample. Both dst and µst are much smaller after the last change point, d2 = 0.41 versus d3 = 0.15 and µ2 = 0.53 versus µ3 = 0.07. On the other hand, φ11,st and φ12,st increase from 0.02 and 0.09 to 0.15 and 0.50, respectively. At the same time, the estimate of γst almost doubles in the second regime, before falling from 0.06 to 0.01 after the last structural break. Furthermore, the unconditional volatility of volatility, σh,st/√(1 − φ²h,st), rises from 0.36 in the first regime to 0.60 in the second regime. Thereafter, it falls to 0.42 from the last change point till the end of the sample.
The last two columns of Table 4.5 report results for CP(2)-IMA(1,1)-SV. The estimate of ψ1,st rises
from −0.84 in the first phase to −0.56 in the second phase. ψ1,st drops to −0.85 in the subsequent
phase. Furthermore, similar to CP(2)-ARFIMA(12,d,0)-SV, the unconditional volatility of volatility
drops from the last change point till the end of the sample. However, CP(2)-IMA(1,1)-SV performs worse than CP(2)-ARFIMA(12,d,0)-SV.
Figure 4.4.: PG-AS sampler for US monthly core inflation rate from 1960:1 to 2013:12
[Four panels over 1967-2009; panel (b) shows σ²t with 5% and 95% posterior percentile bands.]
Graph (a): US monthly core inflation adjusted for fixed seasonals, yt, (b): posterior estimates of the conditional variance of inflation, (c) and (d): posterior density of the first and the second change point.
Overall, we find evidence of structural breaks in the dynamics of yt. As expected, the most significant changes in the model parameters occur during the Great Moderation. More importantly, there is also cautious evidence that the long-memory characteristics of US inflation may not have remained significant after the Great Moderation.
We follow Section 4.5.2 and compare the out-of-sample performance of CP(m)-ARFIMA(12,d,0)-SV
(break) with ARFIMA(12,d,0)-SV (no-break). Specifically, we compare the out-of-sample predictive
likelihood (PL) and predictive mean between these two models. Given the data up to time t−1, Yt−1,
the predictive likelihood (PL), p (yt, .., yT | Yt−1) is the predictive density evaluated at the realized
outcome, yt, ..., yT, t ≤ T, see Geweke (2005). The PL for model MA is given as

p(yt, ..., yT | Yt−1, MA) = ∏_{s=t}^{T} N⁻¹ ∑_{i=1}^{N} p(ys | θ_A^{(i)}, Ys−1, MA). (4.5.10)
Notice that the terms on the right-hand-side of (4.5.10) have parameter uncertainty integrated out.
If t = 1, this would be the marginal likelihood and (4.5.10) changes to (4.3.4). Hence, the sum of
log-predictive likelihoods can be interpreted as a measure similar to the logarithm of the marginal
likelihood, but ignoring the initial t − 1 observations. The predictive likelihood can be used to order
models according to their predictive abilities. In a similar fashion to Bayes factors, one can also compare the performance of models over a specific out-of-sample period using predictive Bayes factors, PBF. Suppose we have two different models denoted by MA and MB. The PBF for yt, ..., yT and models MA versus MB is PBFAB = p(yt, ..., yT | Yt−1, MA)/p(yt, ..., yT | Yt−1, MB). It summarizes the relative evidence of the two models over the out-of-sample data, yt, ..., yT.
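The accumulation in (4.5.10) and the resulting PBF can be sketched from stored per-draw predictive densities. This is a minimal illustration, assuming the array of densities p(ys | θ^(i), Ys−1) is available from the MCMC output; all names and toy numbers are ours:

```python
import numpy as np

def log_predictive_likelihood(pred_densities):
    """Sum of log one-step predictive likelihoods, as in (4.5.10).

    pred_densities has shape (n_oos, N): entry (s, i) is p(y_s | theta^(i), Y_{s-1})
    evaluated at the realized y_s for posterior draw i, so each term of the
    product is a Monte Carlo average over the retained draws.
    """
    pred_densities = np.asarray(pred_densities)
    return float(np.sum(np.log(pred_densities.mean(axis=1))))

def log_predictive_bayes_factor(dens_A, dens_B):
    """log PBF_AB for models M_A and M_B over the same out-of-sample span."""
    return log_predictive_likelihood(dens_A) - log_predictive_likelihood(dens_B)

# Toy illustration with simulated per-draw densities (not thesis output).
rng = np.random.default_rng(0)
dens_A = rng.uniform(0.5, 1.5, size=(12, 200))
dens_B = rng.uniform(0.3, 1.0, size=(12, 200))
lpbf = log_predictive_bayes_factor(dens_A, dens_B)   # positive favors model A
```

In practice dens_A and dens_B would be refreshed by a new round of sampling as each observation arrives, as described above.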
In order to compare the out-of-sample density forecasts of the models, we calculate PBF for data
at and after the first break point, 1973:7. Hence, t − 1 = 1973:6. As a new observation arrives, we
update the posterior through a new round of sampling and perform forecasting. As in Section 4.5.2, in
the context of forecasting with the break model, we follow Bauwens et al. (2011). For one observation
out-of-sample, log (PBF ) = 2.36, 6 months log (PBF ) = 2.81, 1 year log (PBF ) = 2.94, 5 years
log (PBF ) = 5.11, 10 years log (PBF ) = 7.33 and 15 years log (PBF ) = 10.34, each in favor of the
break specification. The improvements continue till the end of the sample, see Table 4.6. Finally, Table
4.6 also displays out-of-sample results for one-month ahead point forecasts for the no-break and the
break model. Overall, the break model offers improvements in terms of MAE and RMSE compared
to the no-break model.
Table 4.6.: Out-of-sample forecasts for US core inflation
Model MAE RMSE LINEX, a = 1, b = 1 LINEX, a = −1, b = 1 log(PL)
No-break 0.129 0.180 0.017 0.016 116.048
Break 0.121 0.168 0.014 0.015 130.556
This table reports mean absolute error (MAE), root mean squared error (RMSE) and average LINEX loss for the forecasts based on the one-month-ahead predictive mean. Furthermore, the one-month-ahead log-predictive likelihood, log(PL), is also reported. The out-of-sample period is from 1973:7 till the end of the sample.
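The point-forecast criteria in Table 4.6 can be reproduced from a sequence of predictive means. A hedged sketch: the LINEX form b(exp(ae) − ae − 1), with e the forecast error, is a common convention and an assumption here, since the text does not restate it; the toy data are illustrative:

```python
import numpy as np

def forecast_losses(actual, forecast, a=1.0, b=1.0):
    """MAE, RMSE and average LINEX loss for a sequence of point forecasts.

    LINEX is taken as b * (exp(a*e) - a*e - 1) with e = actual - forecast,
    an assumed convention; the thesis reports (a, b) = (1, 1) and (-1, 1).
    """
    e = np.asarray(actual) - np.asarray(forecast)
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    linex = np.mean(b * (np.exp(a * e) - a * e - 1.0))
    return mae, rmse, linex

# Toy series of "actuals" and noisy one-step forecasts (not thesis data).
rng = np.random.default_rng(8)
actual = rng.normal(0.3, 0.2, size=480)
forecast = actual + rng.normal(0.0, 0.15, size=480)
mae, rmse, linex_pos = forecast_losses(actual, forecast, a=1.0, b=1.0)
```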
4.6. Conclusion
In this paper we apply PG-AS to the challenging class of stochastic volatility models and demonstrate
its flexibility under different circumstances. We show that PG-AS provides a very flexible framework
for estimation, forecasting and model comparison in all of the cases that we consider. First, we
estimate various SV models using daily DJIA returns. We find that the SV model with moving
average and Student-t distributed errors performs best in terms of ML and DIC. We also show the flexibility of PG-AS by combining it with the change-point specification of Chib (1998). Our empirical
application using US real GDP data shows that this combination provides reliable results in terms
of estimation, change-point identification, volatility feedback modeling and forecasting. Finally, we
analyze the behavior of US monthly core inflation rate using structural break ARFIMA-SV models.
We find evidence in favor of two structural breaks. Furthermore, we find considerable differences
in parameter estimates in each regime. We also demonstrate that accounting for structural breaks
improves density and point forecasts.
A. Appendix for Chapter 1
A.1. Estimation of the change-point model
To conduct estimation of the change-point model, we start by specifying independent conjugate priors for the parameters in each regime. They are

βj ∼ N(n0, N0), σ²j ∼ IG(v0/2, s0/2), pj ∼ Beta(a0, b0),

for j = 1, ..., m and pm = 1. For the sake of notation, let θj = (β′j, σ²j)′. Furthermore, in order to ease the notation burden, conditioning on XT is suppressed.
In order to perform Gibbs sampling, we divide the parameter space into three blocks: θ = (θ1, ..., θm), the state of the system, S = (s1, ..., sT)′, and the transition matrix, P. Below, we provide more details
on each step of the Gibbs sampler.
Step 1: Simulation of S | θ, P, YT . Chib (1998) shows that a joint draw of S can be achieved in
one step using
p(S | θ, P, YT) = p(sT | θ, P, YT) ∏_{t=1}^{T−1} p(st | st+1, θ, P, Yt), (A.1.1)

in which one samples sequentially from each density on the right-hand side of (A.1.1), beginning with p(sT | θ, P, YT) and then p(st | st+1, θ, P, Yt) for t = T − 1, ..., 1. At each step one conditions on the previously drawn state, st+1, until a full draw of S is obtained. The individual densities in (A.1.1) are
obtained based on the following steps:
(a) Initialization: At t = 1, set p (s1 = 1 | θ, P, Y1) = 1.
(b) Compute the Hamilton (1989) filter, p (st = j | θ, P, Yt). This involves a prediction and an
update step in which one iterates on the following from t = 2, ..., T ,
p(st = j | θ, P, Yt−1) = ∑_{l=j−1}^{j} p(st−1 = l | θ, P, Yt−1) plj, j = 1, ..., m, (A.1.2)

p(st = j | θ, P, Yt) = p(st = j | θ, P, Yt−1) p(yt | θ, Yt−1, st = j) / ∑_{l=1}^{m} p(st = l | θ, P, Yt−1) p(yt | θ, Yt−1, st = l), j = 1, ..., m. (A.1.3)

The last equation is obtained from Bayes' rule. Note that in (A.1.2) the summation runs only from j − 1 to j, due to the restricted nature of the transition matrix, and p(yt | θ, Yt−1, st = j) ∼ N(Xt−1βj, σ²j) in (A.1.3) has a closed-form solution.
(c) Finally, Chib (1998) shows that the individual densities in (A.1.1) are
p (st | st+1, θ, P, Yt) ∝ p (st | θ, P, Yt) p (st+1 | st, P ) .
Thus, given sT = m, st is drawn backwards over t = T − 1, T − 2, ..., 2 as

st | st+1, θ, P, Yt = st+1 with probability ct, or st+1 − 1 with probability 1 − ct,

where, for j = st+1,

ct = p(st = j | θ, P, Yt) p(st+1 = j | st = j, P) / ∑_{l=j−1}^{j} p(st = l | θ, P, Yt) p(st+1 = j | st = l, P).
Finally, note that p (s1 = 1 | s2, θ, P, Yt) = 1.
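Steps (a)-(c) can be sketched end to end on a toy example. This is an illustration, not the thesis code: the regime density is a plain Gaussian mean shift rather than the regression density N(Xt−1βj, σ²j), and p_stay holds the diagonal of the restricted transition matrix P, with the last regime absorbing:

```python
import numpy as np

def sample_states(y, means, sigmas, p_stay, rng):
    """One joint draw of S via Chib's (1998) forward-filter, backward-sample scheme."""
    T, m = len(y), len(means)
    filt = np.zeros((T, m))                # p(s_t = j | Y_t)
    filt[0, 0] = 1.0                       # (a) s_1 = 1 with probability one
    for t in range(1, T):                  # (b) prediction and update steps
        pred = filt[t - 1] * p_stay                          # stay in the same regime
        pred[1:] += filt[t - 1, :-1] * (1.0 - p_stay[:-1])   # or move one regime up
        like = np.exp(-0.5 * ((y[t] - means) / sigmas) ** 2) / sigmas
        post = pred * like
        filt[t] = post / post.sum()
    # (c) backward sampling: s_T = m, then s_t in {s_{t+1} - 1, s_{t+1}}
    S = np.empty(T, dtype=int)
    S[-1] = m - 1
    for t in range(T - 2, -1, -1):
        j = S[t + 1]
        w_stay = filt[t, j] * p_stay[j]
        w_move = filt[t, j - 1] * (1.0 - p_stay[j - 1]) if j > 0 else 0.0
        S[t] = j if rng.random() < w_stay / (w_stay + w_move) else j - 1
    return S

# Toy data: a single mean shift after observation 100.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(5.0, 1.0, 100)])
S = sample_states(y, means=np.array([0.0, 5.0]), sigmas=np.array([1.0, 1.0]),
                  p_stay=np.array([0.98, 1.0]), rng=rng)
```

With a shift this pronounced, the sampled break date lands at (or within a few observations of) the true break.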
Step 2: Simulation of θ | S, YT . The conditional posterior density of θj depends only on the data
in regime j. Therefore, let Yj = {yt : st = j}, Xj = {Xt−1 : st = j} and use Gibbs sampling methods for a linear regression model. Hence, βj | Yj, Xj, σ²j ∼ N(nj, Nj), where

Nj = (σ⁻²j X′jXj + N0⁻¹)⁻¹, nj = Nj (σ⁻²j X′jYj + N0⁻¹n0),

and σ²j | Xj, Yj, βj ∼ IG(vj/2, sj/2), where vj = Tj + v0, sj = (Yj − Xjβj)′(Yj − Xjβj) + s0 and Tj is the number of observations in regime j.
Step 3: Simulation of P | S. The conditional posterior for each diagonal component of P is very
simple and given by pj | S ∼ Beta (a0 + nj , b0 + 1), where nj is the number of one-step transitions
from state j to state j in a sequence of S.
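Steps 2 and 3 amount to standard conjugate draws. A sketch under the priors stated above; the regime data and the toy parameter values at the bottom are illustrative:

```python
import numpy as np

def draw_regime_params(Yj, Xj, sigma2, n0, N0, v0, s0, rng):
    """Step 2 sketch: conjugate draws of (beta_j, sigma_j^2) for one regime.

    Priors as in the text: beta_j ~ N(n0, N0), sigma_j^2 ~ IG(v0/2, s0/2).
    """
    N0_inv = np.linalg.inv(N0)
    Nj = np.linalg.inv(Xj.T @ Xj / sigma2 + N0_inv)
    nj = Nj @ (Xj.T @ Yj / sigma2 + N0_inv @ n0)
    beta = rng.multivariate_normal(nj, Nj)
    resid = Yj - Xj @ beta
    vj = len(Yj) + v0                       # v_j = T_j + v_0
    sj = resid @ resid + s0                 # s_j = (Y_j - X_j beta_j)'(Y_j - X_j beta_j) + s_0
    sigma2_new = sj / (2.0 * rng.gamma(vj / 2.0))   # draw from IG(v_j/2, s_j/2)
    return beta, sigma2_new

def draw_pj(n_stay, a0, b0, rng):
    """Step 3: p_j | S ~ Beta(a0 + n_j, b0 + 1), n_j = one-step j -> j transitions."""
    return rng.beta(a0 + n_stay, b0 + 1.0)

# Toy regime data (illustrative, not thesis data).
rng = np.random.default_rng(2)
Xj = np.column_stack([np.ones(500), rng.normal(size=500)])
Yj = Xj @ np.array([1.0, -2.0]) + 0.5 * rng.normal(size=500)
beta, s2 = draw_regime_params(Yj, Xj, 0.25, np.zeros(2), 100 * np.eye(2), 5.0, 1.0, rng)
pj = draw_pj(97, 8.0, 0.1, rng)
```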
A.2. Marginal likelihood computation for the change-point model
To compute the marginal likelihood for the change-point model, we use the method of Chib (1995)
which is based on
p(YT) = p(YT | θ, P) p(θ, P) / p(θ, P | YT), (A.2.1)

where p(YT | θ, P) is the likelihood function with S integrated out, p(θ, P) is the prior density, and p(θ, P | YT) is the posterior density. As before, we set YT = (y1, ..., yT)′ and follow the notation from the previous sections. In principle, any value of (θ, P) can be used to compute (A.2.1). Following Liu and Maheu (2008), we use the posterior mean, (θ̄, P̄). Thus,

log p(YT) = log p(YT | θ̄, P̄) + log p(θ̄, P̄) − log p(θ̄, P̄ | YT), (A.2.2)
where p(θ̄, P̄) is evaluated directly and the likelihood function, p(YT | θ̄, P̄), is calculated as

log p(YT | θ̄, P̄) = ∑_{t=1}^{T} log p(yt | θ̄, P̄, Yt−1),
where

p(yt | θ̄, P̄, Yt−1) = ∑_{j=1}^{m} p(yt | θ̄, Yt−1, st = j) p(st = j | θ̄, P̄, Yt−1).
The most difficult and demanding part of (A.2.2) is the computation of p(θ̄, P̄ | YT), since it must be computed numerically. We use the decomposition

p(θ̄, P̄ | YT) = p(β̄ | YT) p(σ̄² | β̄, YT) p(P̄ | β̄, σ̄², YT), (A.2.3)

where each term on the right-hand side can be estimated from MCMC simulations. The first term can be estimated as

p(β̄ | YT) ≈ (1/N) ∑_{i=1}^{N} p(β̄ | σ²⁽ⁱ⁾, S⁽ⁱ⁾, YT),

where p(β̄ | σ²⁽ⁱ⁾, S⁽ⁱ⁾, YT) = ∏_{j=1}^{m} p(β̄j | σ²⁽ⁱ⁾, S⁽ⁱ⁾, YT) and the draws {σ²⁽ⁱ⁾, S⁽ⁱ⁾}_{i=1}^{N} are directly available from the Gibbs output. The second term in (A.2.3) is equal to

p(σ̄² | β̄, YT) = ∫ p(σ̄² | β̄, S, YT) p(S | β̄, YT) dS,

where p(σ̄² | β̄, S, YT) = ∏_{j=1}^{m} p(σ̄²j | β̄, S, YT). To obtain the draws from p(S | β̄, YT), we run an additional reduced Gibbs sampler conditional on β̄, that is, a Gibbs sampling scheme in which we do not draw β but fix it at β̄. Thereafter, we use {S⁽ⁱ⁾}_{i=1}^{N} and calculate p(σ̄² | β̄, YT) as

p(σ̄² | β̄, YT) ≈ (1/N) ∑_{i=1}^{N} p(σ̄² | β̄, S⁽ⁱ⁾, YT).

Finally, for p(P̄ | β̄, σ̄², YT) = ∏_{j=1}^{m−1} p(p̄j | β̄, σ̄², YT), j = 1, ..., m − 1, we sample {S⁽ⁱ⁾}_{i=1}^{N} from p(S | β̄, σ̄², YT) and set

p(p̄j | β̄, σ̄², YT) ≈ (1/N) ∑_{i=1}^{N} p(p̄j | β̄, σ̄², S⁽ⁱ⁾, YT).
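Identity (A.2.2) can be sanity-checked on a toy conjugate model where every term is available in closed form. This is not the change-point model of the text, just a minimal illustration of Chib's (1995) identity, with illustrative values throughout:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Toy conjugate model: y_i ~ N(theta, 1) with theta ~ N(0, 1), so every term of
# Chib's identity -- and the marginal likelihood itself -- is known in closed form.
rng = np.random.default_rng(3)
T = 50
y = rng.normal(0.4, 1.0, size=T)

post_var = 1.0 / (T + 1.0)                   # posterior precision is T + 1
post_mean = y.sum() * post_var
theta_star = post_mean                       # evaluate the identity at the posterior mean

log_lik = norm.logpdf(y, theta_star, 1.0).sum()
log_prior = norm.logpdf(theta_star, 0.0, 1.0)
log_post = norm.logpdf(theta_star, post_mean, np.sqrt(post_var))
log_ml_chib = log_lik + log_prior - log_post           # the identity in (A.2.2)

# Direct marginal likelihood: Y_T ~ N(0, I + 11') once theta is integrated out.
log_ml_direct = multivariate_normal.logpdf(y, mean=np.zeros(T),
                                           cov=np.eye(T) + np.ones((T, T)))
```

In the change-point model, the posterior ordinate is the only piece without a closed form, which is exactly why the Rao-Blackwellized averages above are needed.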
A.3. Estimation of the MIA model
Consider the following model

yt = Xt−1βt + εt, εt ∼ N(0, σ²), (A.3.1)

where βt = (β1t, ..., βkt)′ and

βit = βit−1 + κitηit, ηit ∼ N(0, q²i), i = 1, ..., k, (A.3.2)
where κit ∈ {0, 1} with Pr(κit = 1) = πi. The parameters of (A.3.1)-(A.3.2) are: the structural break probabilities, π = (π1, ..., πk)′, the magnitudes of the breaks in the state equations, q = (q²1, ..., q²k)′, and σ². These quantities are all collected in θ. As before, in order to ease the notation burden, conditioning on XT is suppressed. Let Kt = (κ1t, ..., κkt)′, K = {Kt}_{t=1}^{T} and B = {βt}_{t=1}^{T}. The Gibbs sampling scheme for (A.3.1)-(A.3.2) is as follows.
Sample K | θ, YT
The structural breaks are sampled using the algorithm of Gerlach et al. (2000). In particular, Gerlach
et al. (2000) has two important features. First, K is generated without conditioning on the states, B. Second, the number of operations required to obtain a draw of K is reduced from O(T²) to O(T). Define K−t = {Ks}_{s=1, s≠t}^{T}. The conditional posterior of Kt is given as

p(Kt | θ, K−t, YT) ∝ p(YT | θ, K) p(Kt | θ, K−t)
∝ p(yt+1, ..., yT | θ, K, Yt) p(yt | θ, K1, ..., Kt, Yt−1) p(Kt | θ, K−t). (A.3.3)

The term p(Kt | θ, K−t) is obtained from the prior, and p(yt | θ, K1, ..., Kt, Yt−1) is computed using the Kalman filter. The important contribution of Gerlach et al. (2000) is that the term p(yt+1, ..., yT | θ, K, Yt) in (A.3.3) can be obtained in one step after an initial set of backward recursions. Finally, since Kt can only take a finite number of values, it can be drawn by computing the right-hand side of (A.3.3) for all possible values of Kt and then normalizing. For more details on the implementation of the algorithm, we refer the reader to Gerlach et al. (2000).
Sample B | K, θ, YT
B is sampled from its conditional posterior using the simulation smoother of Carter and Kohn (1994).
The algorithm of Durbin and Koopman (2002) is also an interesting alternative.
Sample θ | K,B, YT
To sample q²i and σ², inverse Gamma densities are used, see Kim and Nelson (1999). Finally, assume that p(πi) ∼ Beta(ai0, bi0). Then, the conditional posterior of πi is Beta(ai, bi), where ai = ai0 + ∑_{t=1}^{T} κit and bi = bi0 + T − ∑_{t=1}^{T} κit.
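The state equation (A.3.2) and the final conditional posterior for πi can be sketched as follows; the values of πi, qi and the Beta hyperparameters are illustrative, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k = 300, 2
pi_true = np.array([0.02, 0.05])     # illustrative break probabilities
q = np.array([0.5, 0.3])             # illustrative break standard deviations

# Simulate the mixture innovation state equation (A.3.2), starting from beta_0 = 0:
# beta_it = beta_{i,t-1} + kappa_it * eta_it, with Pr(kappa_it = 1) = pi_i.
kappa = (rng.random((T, k)) < pi_true).astype(int)
beta = np.cumsum(kappa * q * rng.normal(size=(T, k)), axis=0)

# Conditional posterior of pi_i given K (last step of the Gibbs sweep):
# Beta(a_i0 + sum_t kappa_it, b_i0 + T - sum_t kappa_it).
a0, b0 = 1.0, 1.0                    # illustrative hyperparameters
n_breaks = kappa.sum(axis=0)
pi_draw = rng.beta(a0 + n_breaks, b0 + T - n_breaks)
```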
B. Appendix for Chapter 2
B.1. A direct approach for evaluating the likelihood function
This appendix details a direct approach for evaluating the likelihood function of the ARFIMA model.
Consider the following ARFIMA model

yt = µ + (1 − L)^{−d} εt, εt ∼ N(0, σ²). (B.1.1)

Conditional on M, we write (B.1.1) as YT = u + Hε, where u, H and ε follow directly from the main text. The special structure of H can be exploited to speed up computation. For instance, obtaining the Cholesky decomposition of a banded T × T matrix with fixed bandwidth involves only O(T) operations, as opposed to O(T³) for a full matrix of the same size. Similar computational savings can be obtained in operations such as multiplication and forward and backward substitution by using block-banded or sparse matrix algorithms. These banded and sparse matrix algorithms are implemented in Matlab.
It follows from (B.1.1) that p(YT | θ, M) ∼ N(u, ΩYT), where ΩYT = H SYT H′ and SYT = σ²IT. Since SYT is a diagonal matrix and H is a lower triangular sparse matrix, the product, ΩYT, is sparse. Moreover, since |H| = 1 for any π1, ..., πM, one has that |ΩYT| = |SYT|. The joint density of YT is therefore given by

log p(YT | θ, M) = −(T/2) log(2π) − (T/2) log(σ²) − (1/2)(YT − u)′ Ω⁻¹YT (YT − u). (B.1.2)

As stated in Chan (2013), we do not need to obtain the T × T inverse matrix, Ω⁻¹YT, in order to evaluate (B.1.2), which would involve O(T³) operations. Instead, it can be computed in three steps, each of which requires only O(T) operations. Therefore, the following notation is introduced, see Chan (2013): given a lower (upper) triangular T × T non-singular matrix, A, and a T × 1 vector, c, let A \ c denote the unique solution to the triangular system, Ax = c, obtained by forward (backward) substitution, i.e. A \ c = A⁻¹c.
Now, obtain the Cholesky decomposition, CYT, of ΩYT such that CYT C′YT = ΩYT. This involves only O(T) operations. Compute x1 = C′YT \ (CYT \ (YT − u)) by forward followed by backward substitution, each of which requires O(T) operations since CYT is also banded. Then, by definition,

x1 = (C′YT)⁻¹ (C⁻¹YT (YT − u)) = (CYT C′YT)⁻¹ (YT − u) = Ω⁻¹YT (YT − u).

Finally, compute x2 = −(1/2)(YT − u)′ x1 = −(1/2)(YT − u)′ Ω⁻¹YT (YT − u), which gives the quadratic term in (B.1.2).
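The O(T) evaluation of (B.1.2) can be illustrated with a toy banded H (unit diagonal, bandwidth 1, not the ARFIMA weights of the main text). Instead of the Cholesky route described above, this sketch uses the algebraically equivalent shortcut z = H \ (YT − u), so the quadratic form is z′z/σ² and the log-determinant reduces to T log σ² because |H| = 1; a dense O(T³) computation is included as a check:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve_triangular

# Illustrative H: unit-diagonal lower triangular with bandwidth 1 (not the
# ARFIMA weights of the main text), and S_YT = sigma^2 * I_T.
rng = np.random.default_rng(5)
T, sigma2, psi = 400, 0.8, 0.4
H = (sparse.eye(T) + sparse.diags([np.full(T - 1, psi)], [-1])).tocsr()
u = np.zeros(T)
y = u + H @ (np.sqrt(sigma2) * rng.normal(size=T))     # Y_T = u + H * eps

# O(T) route: one sparse forward substitution, no explicit inverse of Omega.
z = spsolve_triangular(H, y - u, lower=True)
loglik_fast = (-0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(sigma2)
               - 0.5 * (z @ z) / sigma2)

# Brute-force check with the dense covariance Omega = H S_YT H'.
Omega = (H @ (sigma2 * sparse.eye(T)) @ H.T).toarray()
loglik_dense = (-0.5 * T * np.log(2 * np.pi)
                - 0.5 * np.linalg.slogdet(Omega)[1]
                - 0.5 * (y - u) @ np.linalg.solve(Omega, y - u))
```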
C. Appendix for Chapter 3
C.1. Simulation evidence: SV-MA(1) and SVM
In this section, we present simulation results for SV-MA(1) and SVM. We simulate T = 1000 ob-
servations from these models, and report the true DGP parameters along with PMMH parameter
estimates in Table C.1. In each case, we also estimate a plain SV model for comparison. Overall, we
see that PMMH works very well as parameter estimates are close to their respective true values. Not
surprisingly, in each case the corresponding model outperforms the plain SV model in terms of ML.
Finally, we analyze the performance of PMMH with respect to the number of particles, M . We do
this by estimating SV-MA(1) using M = 1, M = 10, M = 100 and M = 1000. In all of these cases,
we choose N = 20000. We see that using very low values of M appears to be insufficient, see Figure C.1. For instance, for M = 1 the chain gets stuck on a specific parameter value almost throughout the run. For M = 100, we get better results; however, the chain still gets stuck for a considerable time. By contrast, we see drastic improvements in the performance of the algorithm for M = 1000. We also ran the PMMH algorithm for M = 2000 and obtained almost identical results as for M = 1000.
Table C.1.: Simulation evidence

                  DGP: stochastic volatility with           DGP: stochastic volatility
                  MA(1) errors, SV-MA(1)                    in mean, SVM
                  SV                SV-MA(1)                SV                SVM
Parameter  true   θ̄ (RB)            θ̄ (RB)           true   θ̄ (RB)            θ̄ (RB)
µ          0.20   0.4168 (4.31)     0.2756 (4.40)    0.20   0.7287 (23.72)    0.2883 (5.24)
                  [0.2129, 0.6045]  [0.0419, 0.5011]        [0.4106, 1.0132]  [0.0580, 0.5191]
ρ          0.98   0.9701 (5.07)     0.9781 (4.55)    0.98   0.9815 (21.63)    0.9790 (4.86)
                  [0.9515, 0.9861]  [0.9635, 0.9906]        [0.9651, 0.9928]  [0.9655, 0.9904]
σ²         0.01   0.0171 (4.86)     0.0104 (5.22)    0.01   0.0087 (14.04)    0.0108 (5.38)
                  [0.0078, 0.0291]  [0.0048, 0.0172]        [0.0043, 0.0159]  [0.0055, 0.0174]
ψ1         0.40                     0.4591 (5.57)
                                    [0.4221, 0.4963]
λ                                                    0.80                     0.7533 (5.09)
                                                                              [0.7090, 0.8001]
log(L)            -1666.1           -1581.0                 -1811.0           -1586.8
log(ML), a=0.75   -1671.2           -1592.4                 -1817.5           -1598.5
log(ML), a=0.99   -1673.8           -1592.2                 -1817.2           -1598.2
M-H ratio         0.42              0.38                    0.33              0.37

This table reports estimation results for SV-MA(1) and SVM models using simulated data. θ̄: posterior mean, with posterior intervals in brackets and inefficiency factors, RB, in parentheses. log(L): log-likelihood. log(ML): log-marginal likelihood for the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio.
Figure C.1.: Estimation results, SV-MA(1) model
[Trace plots of the posterior draws of µ, ρ, σ² and ψ, each over 20000 iterations, for M = 1, 10, 100 and 1000.]
Each column shows posterior draws of the parameter of interest for different numbers of particles, M.
C.2. Particle Gibbs with ancestor sampling
In order to ease the notation burden, we consider the plain stochastic volatility (SV) model

yt = exp(αt/2) εt, εt ∼ N(0, 1), (C.2.1)
αt+1 = µ + ρ(αt − µ) + σηt, ηt ∼ N(0, 1), (C.2.2)

where θ = (µ, ρ, σ²)′. Within the PG-AS framework, we approach estimating (C.2.1)-(C.2.2) directly. First, we draw α1:T ∼ p(α1:T | θ, YT) using the conditional particle filter with ancestor sampling, CPF-AS. Thereafter, we draw θ ∼ p(θ | α1:T, YT) using standard Gibbs sampling techniques. Let i = 1, ..., N index the Gibbs sampling iterations, j = 1, ..., M index the particles, and let p(yt | θ, αt, Yt−1) denote the density of yt given θ, αt and Yt−1. Finally, let α_{1:T}^{(i−1)} be a fixed reference trajectory of α1:T sampled at iteration i − 1 of the Gibbs sampler. The steps of CPF-AS for the SV model are as follows.
1. If t = 1:
(a) Draw α_1^{(j)} | θ for j = 1, ..., M − 1 and set α_1^{(M)} = α_1^{(i−1)}.
(b) Set w_1^{(j)} = τ_1^{(j)} / ∑_{k=1}^{M} τ_1^{(k)}, where τ_1^{(j)} = p(y_1 | θ, α_1^{(j)}, Y_0) for j = 1, ..., M.
2. Else, for t = 2 to T do:
(a) Resample {α_{t−1}^{(j)}}_{j=1}^{M−1} using indices δ_t^{(j)}, where p(δ_t^{(j)} = k) ∝ w_{t−1}^{(k)}.
(b) Draw α_t^{(j)} | α_{t−1}^{(δ_t^{(j)})}, θ for j = 1, ..., M − 1.
(c) Set α_t^{(M)} = α_t^{(i−1)}.
(d) Draw δ_t^{(M)} from p(δ_t^{(M)} = j) ∝ w_{t−1}^{(j)} p(α_t^{(i−1)} | α_{t−1}^{(j)}, θ).
(e) Set α_{1:t}^{(j)} = (α_{1:t−1}^{(δ_t^{(j)})}, α_t^{(j)}) and w_t^{(j)} = τ_t^{(j)} / ∑_{k=1}^{M} τ_t^{(k)}, where τ_t^{(j)} = p(y_t | θ, α_t^{(j)}, Y_{t−1}).
3. End for.
4. Sample α_{1:T}^{(i)} | θ, YT with p(α_{1:T}^{(i)} = α_{1:T}^{(j)} | θ, YT) ∝ w_T^{(j)}.
Notice that CPF-AS is akin to a standard particle filter, but with the difference that α_{1:T}^{(M)} is specified a priori and serves as a reference trajectory. Hence, we use only M − 1 particles at each step. Furthermore, whereas the particle Gibbs algorithm of Andrieu et al. (2010) sets δ_t^{(M)} = M, PG-AS samples a new value for the index variable, δ_t^{(M)}, in an ancestor sampling step, (d). Finally, we can include more unobserved processes in the model of interest. All we need to do is to modify steps (a), (b), (c) and (e), and thus draw particles for each process, while still keeping a single set of weights. Thereafter, we can sample the unobserved processes using step 4.
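The steps above can be sketched for the SV model (C.2.1)-(C.2.2) with a bootstrap proposal. This is an illustrative implementation under stated assumptions: the initial state is drawn from the stationary AR(1) density (the text does not specify the initialization), and all names and toy parameter values are ours:

```python
import numpy as np

def cpf_as(y, theta, alpha_ref, M, rng):
    """One CPF-AS sweep for the SV model; returns a new draw of alpha_{1:T}.

    alpha_ref is the reference trajectory alpha_{1:T}^{(i-1)}.
    """
    mu, rho, sigma = theta
    T = len(y)
    part = np.zeros((T, M))                  # particles alpha_t^{(j)}
    anc = np.zeros((T, M), dtype=int)        # ancestor indices delta_t^{(j)}

    def log_obs(alpha, t):                   # log p(y_t | theta, alpha_t), up to a constant
        return -0.5 * (alpha + y[t] ** 2 * np.exp(-alpha))

    # Step 1: particle M is reserved for the reference trajectory.
    part[0, :-1] = mu + sigma / np.sqrt(1 - rho ** 2) * rng.normal(size=M - 1)
    part[0, -1] = alpha_ref[0]
    lw = log_obs(part[0], 0)
    w = np.exp(lw - lw.max()); w /= w.sum()

    for t in range(1, T):
        # (a), (b): resample ancestors and propagate M - 1 particles.
        anc[t, :-1] = rng.choice(M, size=M - 1, p=w)
        part[t, :-1] = (mu + rho * (part[t - 1, anc[t, :-1]] - mu)
                        + sigma * rng.normal(size=M - 1))
        # (c), (d): insert the reference particle, ancestor-sample its index.
        part[t, -1] = alpha_ref[t]
        la = lw - 0.5 * ((alpha_ref[t] - mu - rho * (part[t - 1] - mu)) / sigma) ** 2
        pa = np.exp(la - la.max()); pa /= pa.sum()
        anc[t, -1] = rng.choice(M, p=pa)
        # (e): reweight with the observation density.
        lw = log_obs(part[t], t)
        w = np.exp(lw - lw.max()); w /= w.sum()

    # Step 4: draw one index by final weight, then trace the ancestry back.
    j = rng.choice(M, p=w)
    alpha = np.empty(T)
    for t in range(T - 1, -1, -1):
        alpha[t] = part[t, j]
        if t > 0:
            j = anc[t, j]
    return alpha

# Toy run on simulated SV data (illustrative parameter values).
rng = np.random.default_rng(6)
T, mu, rho, sigma = 200, 0.0, 0.95, 0.2
a = np.zeros(T)
a[0] = rng.normal(mu, sigma / np.sqrt(1 - rho ** 2))
for t in range(1, T):
    a[t] = mu + rho * (a[t - 1] - mu) + sigma * rng.normal()
y = np.exp(a / 2) * rng.normal(size=T)

alpha = np.zeros(T)                          # arbitrary initial reference trajectory
for _ in range(10):                          # a few sweeps of the CPF-AS kernel
    alpha = cpf_as(y, (mu, rho, sigma), alpha, M=50, rng=rng)
```

Within a full PG-AS sampler, each sweep of `cpf_as` would be interleaved with the Gibbs draw of θ | α1:T, YT.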
D. Appendix for Chapter 4
D.1. Prior sensitivity analysis
In this section, sensitivity of the results to prior specification is evaluated by investigating alternative
prior hyperparameter values on the transition probabilities, pkk ∼ Beta (a0, b0), keeping prior hyper-
parameter values of the other parameters the same as in the main text. pkk, k = 1, ...,m− 1 is one of
the key parameters of the model because it controls the duration of each regime in S.
We experiment with different hyperparameter values on pkk in Table D.1. We report the break
dates for each of them by estimating CP(2)-ARFIMA-SV using the corresponding values of a0 and
b0. For instance, the first alternative prior is pkk ∼ Beta (0.1, 0.1), which is relatively flat. With this
prior, we still find that the change-point dates correspond to 1973:7 and 1984:2. In fact, regardless
of the values of a0 and b0, we still find that the change-point dates for each of these specifications
correspond to 1973:7 and 1984:2. We also report logBF of CP(2)-ARFIMA-SV versus ARFIMA-SV
using the corresponding values of α, along with the difference in DIC between CP(2)-ARFIMA-SV
and ARFIMA-SV, see Table D.1. These results overwhelmingly suggest the existence of structural breaks. More importantly, we find that the choice of prior hyperparameter values on P is of relatively limited importance.
Table D.1.: Prior sensitivity analysis, CP(2)-ARFIMA-SV
Prior           break dates     logBF α=0.50  logBF α=0.75  logBF α=0.95  logBF α=0.99  diff(DIC)
Beta(0.1, 0.1)  1973:7, 1984:2  59.345        59.347        59.346        59.346        -33.251
Beta(8, 0.1)    1973:7, 1984:2  67.807        67.808        67.808        67.808        -33.385
Beta(20, 0.1)   1973:7, 1984:2  74.911        74.913        74.912        74.912        -27.454
Beta(100, 0.1)  1973:7, 1984:2  60.123        60.125        60.124        60.124        -29.880

This table compares the performance of CP(2)-ARFIMA-SV for different values of a0 and b0, where pkk ∼ Beta(a0, b0). The priors of the other parameters are set according to the main text. logBF: logarithm of the Bayes factor of CP(2)-ARFIMA-SV versus ARFIMA-SV using the corresponding value of α. diff(DIC): difference in DIC between CP(2)-ARFIMA-SV and ARFIMA-SV.
D.2. Sensitivity of PG-AS with respect to M
We often find that the choice of M is important because it ensures that the estimate of h1:T is not
too jittery or imprecise. Furthermore, increasing M also increases the computation time. Therefore,
it is important to find a reasonable value for M that avoids the above mentioned problems. In the
following, we experiment with different values of M to find out its effects on estimation results. We
do this by re-estimating the SV model using the DJIA data for M = 2, 10, 100 and 1000.
We report parameter estimates of the SV model using the above mentioned number of particles
in Table D.2. Besides these estimates, we also report the inefficiency factors (RB) of the parameters
and h1:T for each case, see Figure D.1. Furthermore, we compute Geweke’s convergence statistics
and present estimation time in seconds for each M . In each case, we sample N = 20000 draws from
p (θ, h1:T | YT ) after a burn-in of 1000.
Overall, we see that PG-AS performs very well, as parameter estimates are very similar regardless of the value of M. Furthermore, the choice of M = 100 is very sensible. In fact, we get almost identical results for M = 10 and M = 100. However, the RBs decrease as we set M = 100. For instance, in Figure D.1, for M = 10, 75% of the h1:T have inefficiency factors less than 8, while for M = 100, this number is close to 4. Compared to M = 100, we do not obtain any significant gains in RB for M = 1000. However, as M increases, the computation time also increases. From this point of view, M = 1000 seems computationally very demanding.
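The inefficiency factors reported here can be computed directly from the MCMC draws. The kernel choice is an assumption: Table D.2 states only the bandwidth B = 100, so this sketch uses a common Parzen-kernel estimator:

```python
import numpy as np

def inefficiency_factor(draws, B=100):
    """RB = 1 + 2 * sum_{s=1}^{B} K(s/B) * rho_s with sample autocorrelations rho_s.

    The Parzen kernel K is an assumption; the text only states the bandwidth B = 100.
    """
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    acov = np.array([x[: n - s] @ x[s:] / n for s in range(B + 1)])
    rho = acov[1:] / acov[0]
    z = np.arange(1, B + 1) / B
    parzen = np.where(z <= 0.5, 1 - 6 * z ** 2 + 6 * z ** 3, 2 * (1 - z) ** 3)
    return 1.0 + 2.0 * float(parzen @ rho)

# An iid chain should have RB near 1; a persistent AR(1) chain a much larger RB.
rng = np.random.default_rng(7)
iid = rng.normal(size=20000)
ar = np.zeros(20000)
for t in range(1, 20000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
```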
Figure D.1.: Sensitivity of the PG-AS sampler with respect to M
Graphs (a)-(d): box plots of the inefficiency factors of h1:T using the corresponding number of particles.
Table D.2.: Sensitivity of PG-AS with respect to M , DJIA daily returns
Parameter mean std. dev 5%-tile 95%-tile RB Geweke
M = 2
µ 0.088 (0.019) 0.057 0.119 6.091 0.800
µh -0.126 (0.289) -0.587 0.342 1.887 -0.532
φh 0.980 (0.006) 0.969 0.989 22.883 1.175
σ2h 0.051 (0.011) 0.035 0.070 61.345 -1.446
log(ML), α = 0.99 -2494.392
DIC 4969.476
Time (seconds) 8778.005
M = 10
µ 0.089 (0.019) 0.057 0.120 2.458 0.186
µh -0.120 (0.299) -0.589 0.362 1.280 0.309
φh 0.980 (0.006) 0.969 0.989 17.867 -0.134
σ2h 0.050 (0.011) 0.034 0.069 48.682 0.629
log(ML), α = 0.99 -2490.604
DIC 4961.471
Time (seconds) 9108.450
M = 100
µ 0.089 (0.019) 0.058 0.120 1.804 1.103
µh -0.123 (0.293) -0.590 0.338 1.050 0.757
φh 0.979 (0.006) 0.968 0.989 17.703 -1.691
σ2h 0.052 (0.012) 0.035 0.073 46.176 1.719
log(ML), α = 0.99 -2488.960
DIC 4958.287
Time (seconds) 12227.616
M = 1000
µ 0.088 (0.019) 0.057 0.119 1.924 -0.899
µh -0.120 (0.296) -0.587 0.352 0.982 -1.522
φh 0.979 (0.006) 0.969 0.989 17.550 0.021
σ2h 0.051 (0.011) 0.034 0.071 45.178 0.331
log(ML), α = 0.99 -2488.923
DIC 4958.711
Time (seconds) 41303.696
RB : inefficiency factor (using a bandwidth, B, of 100). Geweke: Geweke’s convergence
statistic. log(ML): log-marginal likelihood. DIC: deviance information criterion.
Bibliography
[1] Abanto-Valle, C. A., D. Bandyopadhyay, V. H. Lachos, and I. Enriquez. 2010. “Robust
Bayesian analysis of heavy-tailed stochastic volatility models using scale mixtures of
normal distributions.” Computational Statistics and Data Analysis 54(12): 2883-2898.
[2] Andersen, T. G., T. Bollerslev, F. X. Diebold, and H. Ebens. 2001. “The Distribution of
Realized Stock Return Volatility.” Journal of Financial Economics 61(1): 43-76.
[3] Andersen, T. G., T. Bollerslev, and F. X. Diebold. 2007. “Roughing It Up: Including
Jump Components in the Measurement, Modeling and Forecasting of Return Volatility.”
Review of Economics and Statistics 89(4): 701-720.
[4] Andrieu, C., and A. Doucet. 2002. “Particle filtering for partially observed Gaussian state
space models.” Journal of the Royal Statistical Society B 64(4): 827-836.
[5] Andrieu, C., A. Doucet, and R. Holenstein. 2010. “Particle Markov chain Monte Carlo
methods (with discussion).” Journal of the Royal Statistical Society B 72(3): 1-33.
[6] Baillie, R. T., C. F. Chung, and M. A. Tieslau. 1996. “Analysing inflation by the frac-
tionally integrated ARFIMA-GARCH model.” Journal of Applied Econometrics 11(1):
23-40.
[7] Barndorff-Nielsen, O. E., and N. Shephard. 2002a. “Econometric Analysis of Realized
Volatility and its Use in Estimating Stochastic Volatility Models.” Journal of the Royal
Statistical Society B 64(2): 253-280.
[8] Barndorff-Nielsen, O. E., and N. Shephard. 2002b.“Estimating Quadratic Variation using
Realised Variance.” Journal of Applied Econometrics 17(5): 457-477.
[9] Barndorff-Nielsen, O. E., and N. Shephard. 2004. “Power and Bipower Variation with
Stochastic Volatility and Jumps.” Journal of Financial Econometrics 2(1): 1-37.
[10] Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard. 2009. “Realised
Kernels in Practice: Trades and Quotes.” The Econometrics Journal 12(3): 1-32.
[11] Bauwens, L., M. Lubrano, and J. F. Richard. 1999. Bayesian Inference in Dynamic
Econometric Models. Oxford University Press.
[12] Bauwens, L., G. Koop, D. Korobilis, and V. K. Rombouts. 2011. “A Comparison of
Forecasting Models for Macroeconomics Series: The Contribution of Structural Break Models.” Working paper, University of Strathclyde.
[13] Beran, J. 1994. Statistics for Long-Memory Processes. Chapman and Hall.
[14] Berg, A., R. Meyer, and J. Yu. 2004. “Deviance Information Criterion for Comparing
Stochastic Volatility Models.” Journal of Business and Economic Statistics 22(1): 107-
120.
[15] Bollerslev, T. 1987. “A Conditional Heteroskedastic Time Series Model for Speculative
Prices and Rates of Return.” The Review of Economics and Statistics 69(3): 542-547.
[16] Bollerslev, T., U. Kretschmer, C. Pigorsch, and G. E. Tauchen. 2007. “A Discrete-Time
Model for Daily S&P 500 Returns and Realized Variations: Jumps and Leverage Effects.”
Journal of Econometrics 150: 151-166.
[17] Bos, C. S. 2011. “A Bayesian Analysis of Unobserved Component Models Using Ox.”
Journal of Statistical Software 41(13): 1-24.
[18] Bos, C. S. 2013. GnuDraw. URL http://www.tinbergen.nl/~cbos/gnudraw.html.
[19] Bos, C. S., S. J. Koopman, and M. Ooms. 2012. “Long memory with stochastic vari-
ance model: A recursive analysis for U.S. inflation.” Computational Statistics and Data
Analysis 76(3): 144-157.
[20] Chan, J. 2013. “Moving Average Stochastic Volatility Models with Application to Inflation
Forecast.” Journal of Econometrics 176(2): 162-172.
[21] Chan, J. 2014. “The Stochastic Volatility in Mean Model with Time-Varying Parameters:
An Application to Inflation Modeling.” Working paper, Research School of Economics,
Australian National University.
[22] Chan, J., and A. L. Grant. 2014. “Issues in Comparing Stochastic Volatility Models Us-
ing the Deviance Information Criterion.” Working paper, Research School of Economics,
Australian National University.
[23] Chan, J., and C. Hsiao. 2013. “Estimation of Stochastic Volatility Models with Heavy
Tails and Serial Dependence.” In Bayesian Inference in the Social Sciences. John Wiley
& Sons, New York.
[24] Chan, J., and I. Jeliazkov. 2009. “Efficient Simulation and Integrated Likelihood Esti-
mation in State Space Models.” International Journal of Mathematical Modelling and
Numerical Optimisation 1: 101-120.
[25] Chan, N. H., and W. Palma. 1998. “State space modeling of long-memory processes.”
Annals of Statistics 26(2): 719-740.
[26] Carter, C., and R. Kohn. 1994. “On Gibbs sampling for state space models.” Biometrika
81(3): 541-553.
[27] Chib, S. 1995. “Marginal Likelihood from the Gibbs Output.” Journal of the American
Statistical Association 90(432): 1313-1321.
[28] Chib, S., and E. Greenberg. 1995. “Understanding the Metropolis-Hastings Algorithm.”
The American Statistician 49(4): 327-335.
[29] Chib, S. 1996. “Calculating posterior distributions and modal estimates in Markov mix-
ture models.” Journal of Econometrics 75(1): 79-97.
[30] Chib, S. 1998. “Estimation and Comparison of Multiple Change-Point Models.” Journal
of Econometrics 86(2): 221-241.
[31] Chib, S., F. Nardari, and N. Shephard. 2002. “Markov chain Monte Carlo methods for
stochastic volatility models.” Journal of Econometrics 108(2): 281-316.
[32] Cogley, T., and T. J. Sargent. 2005. “Drifts and volatilities: Monetary policies and out-
comes in the post WWII US.” Review of Economic Dynamics 8(2): 262-302.
[33] Corsi, F., U. Kretschmer, S. Mittnik, and C. Pigorsch. 2008. “The Volatility of Realized
Volatility.” Econometric Reviews 27(1-3): 46-78.
[34] Corsi, F. 2009. “A Simple Approximate Long-Memory Model of Realized Volatility.” Jour-
nal of Financial Econometrics 7(2): 174-196.
[35] Creal, D. 2012. “A survey of sequential Monte Carlo methods for economics and finance.”
Econometric Reviews 31(3): 245-296.
[36] Del Moral, P. 2004. Feynman-Kac Formulae: Genealogical and Interacting Particle Sys-
tems with Applications. Springer.
[37] Diebold, F. X., and A. Inoue. 2001. “Long Memory and Regime Switching.” Journal of
Econometrics 105: 131-159.
[38] Doornik, J. A. 2009. An Object-Oriented Matrix Language Ox 6. Timberlake Consultants
Press.
[39] Doucet, A., S. J. Godsill, and C. Andrieu. 2000. “On sequential Monte Carlo sampling
methods for Bayesian filtering.” Statistics and Computing 10(3): 197-208.
[40] Doucet, A., and A. Johansen. 2011. “A tutorial on particle filtering and smoothing: Fif-
teen years later.” In The Oxford Handbook of Nonlinear Filtering. D. Crisan and B.
Rozovsky, Eds. Oxford University Press.
[41] Durbin, J., and S. J. Koopman. 2002. “A simple and efficient simulation smoother for
state space time series analysis.” Biometrika 89(3): 603-616.
[42] Eisenstat, E., and R. W. Strachan. 2014. “Modelling inflation volatility.” CAMA Working
Paper 24.
[43] Flury, T., and N. Shephard. 2011. “Bayesian inference based only on simulated likelihood:
particle filter analysis of dynamic economic models.” Econometric Theory 27(5): 933-956.
[44] Fouque, J. P., C. H. Han, and M. Molina. 2010. “MCMC Estimation of Multiscale Stochas-
tic Volatility Models.” Handbook of Quantitative Finance and Risk Management 1109-
1120.
[45] Gerlach, R., C. Carter, and R. Kohn. 2000. “Efficient Bayesian Inference for Dynamic
Mixture Models.” Journal of the American Statistical Association 95: 819-828.
[46] Gelfand, A., and D. Dey. 1994. “Bayesian Model Choice: Asymptotics and Exact Calcu-
lations.” Journal of the Royal Statistical Society B 56(3): 501-514.
[47] Geweke, J. 2005. Contemporary Bayesian Econometrics and Statistics. Wiley.
[48] Geweke, J., and C. Whiteman. 2006. “Bayesian Forecasting.” In G. Elliott, C. Granger,
and A. Timmermann (eds.) Handbook of Economic Forecasting, vol. 1. New York: Else-
vier.
[49] Giordani, P., and R. Kohn. 2008. “Efficient Bayesian Inference for Multiple Change-Point
and Mixture Innovation Models.” Journal of Business and Economic Statistics 26(1):
66-77.
[50] Gordon, S., and J. Maheu. 2008. “Learning, Forecasting and Structural Breaks.” Journal
of Applied Econometrics 23(5): 553-583.
[51] Grassi, S., and T. Proietti. 2010. “Has the Volatility of U.S. Inflation Changed and How?”
Journal of Time Series Econometrics 2(1): Article 6.
[52] Groen, J., R. Paap, and F. Ravazzolo. 2012. “Real-time Inflation Forecasting in a Chang-
ing World.” Journal of Business and Economic Statistics 31(1): 29-44.
[53] Ghysels, E., P. Santa-Clara, and R. Valkanov. 2006. “Predicting Volatility: Getting the
Most out of Return Data Sampled at Different Frequencies.” Journal of Econometrics
131(1-2): 59-95.
[54] Hamilton, J. D. 1989. “A New Approach to the Economic Analysis of Nonstationary Time
Series and the Business Cycle.” Econometrica 57(2): 357-384.
[55] Hansen, P. R., and A. Lunde. 2006. “Realized Variance and Market Microstructure Noise.”
Journal of Business and Economic Statistics 24(2): 127-161.
[56] Hansen, P. R., A. Lunde, and J. M. Nason. 2011. “The Model Confidence Set.” Economet-
rica 79(2): 453-497.
[57] Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter,
Cambridge University Press.
[58] Huang, X., and G. Tauchen. 2005. “The Relative Contribution of Jumps to Total Price
Variance.” Journal of Financial Econometrics 3(4): 456-499.
[59] Jacquier, E., N. G. Polson, and P. E. Rossi. 1994. “Bayesian Analysis of Stochastic
Volatility Models.” Journal of Business and Economic Statistics 12: 371-417.
[60] Jacquier, E., N. G. Polson, and P. E. Rossi. 2004. “Bayesian analysis of stochastic
volatility models with fat-tails and correlated errors.” Journal of Econometrics 122(1):
185-212.
[61] Kass, R. E., and A. E. Raftery. 1995. “Bayes Factors and Model Uncertainty.” Journal of
the American Statistical Association 90: 773-795.
[62] Kim, S., N. Shephard, and S. Chib. 1998. “Stochastic Volatility: Likelihood Inference and
Comparison with ARCH Models.” Review of Economic Studies 65(3): 361-393.
[63] Kim, C. J., and C. R. Nelson. 1999. State-Space Models with Regime Switching: Classical
and Gibbs-sampling Approaches with Applications. MIT press.
[64] Kim, C. J., and C. R. Nelson. 1999. “Has the US economy become more stable? A
Bayesian approach based on a Markov-switching model of the business cycle.” Review of
Economics and Statistics 81(4): 608-616.
[65] Kim, C. J., C. R. Nelson, and J. Piger. 2004. “The less-volatile US economy: A Bayesian
investigation of timing, breadth, and potential explanations.” Journal of Business and
Economic Statistics 22(1): 80-93.
[66] Kim, C. J., C. Morley, and C. Nelson. 2005. “The Structural Break in the Equity Pre-
mium.” Journal of Business and Economic Statistics 23(2): 181-191.
[67] Koop, G. 2003. Bayesian Econometrics. John Wiley & Sons Ltd.
[68] Koop, G., and D. Korobilis. 2010. “Bayesian Multivariate Time Series Methods for Em-
pirical Macroeconomics.” Foundations and Trends in Econometrics 3: 267-358.
[69] Koop, G., E. Ley, J. Osiewalski, and M. Steel. 1997. “Bayesian analysis of long memory
and persistence using ARFIMA models.” Journal of Econometrics 76(1-2): 149-169.
[70] Koop, G., R. Leon-Gonzalez, and R. W. Strachan. 2010. “Dynamic probabilities of re-
strictions in state space models: an application to the Phillips curve.” Journal of Business
and Economic Statistics 28(3): 370-379.
[71] Koopman, S. J., and E. H. Uspensky. 2002. “The stochastic volatility in mean model:
empirical evidence from international stock markets.” Journal of Applied Econometrics
17(6): 667-689.
[72] Koopman, S. J., B. Jungbacker, and E. Hol. 2005. “Forecasting Daily Variability of the
S&P 100 Stock Index Using Historical, Realised and Implied Volatility Measurements.”
Journal of Empirical Finance 12(3): 445-475.
[73] Lancaster, T. 2004. Introduction to Modern Bayesian Econometrics. Wiley-Blackwell.
[74] Lindsten, F., M. I. Jordan, and T. B. Schön. 2012. “Ancestor Sampling for Particle Gibbs.”
Advances in Neural Information Processing Systems (NIPS) 25: 2600-2608.
[75] Lindsten, F., M. I. Jordan, and T. B. Schön. 2014. “Particle Gibbs with Ancestor Sam-
pling.” Journal of Machine Learning Research 15: 2145-2184.
[76] Lindsten, F., and T. B. Schön. 2013. “Backward Simulation Methods for Monte Carlo
Statistical Inference.” Foundations and Trends in Machine Learning 6(1): 1-14.
[77] Liu, J. S., and R. Chen. 1998. “Sequential Monte Carlo Methods for Dynamic Systems.”
Journal of the American Statistical Association 93(443): 1032-1044.
[78] Liu, C., and J. Maheu. 2008. “Are There Structural Breaks in Realized Volatility?” Jour-
nal of Financial Econometrics 6(3): 326-360.
[79] Liu, C., and J. Maheu. 2009. “Forecasting Realized Volatility: A Bayesian Model Aver-
aging Approach.” Journal of Applied Econometrics 24(5): 709-733.
[80] Malik, S., and M. K. Pitt. 2011. “Modelling stochastic volatility with leverage and jumps:
a simulated maximum likelihood approach via particle filtering.” Working paper, Univer-
sity of Warwick.
[81] Marcellino, M., J. H. Stock, and M. W. Watson. 2005. “A Comparison of Direct and
Iterated AR Methods for Forecasting Macroeconomic Series h-Steps Ahead.” Journal of
Econometrics 135(1-2): 499-526.
[82] McAleer, M., and M. Medeiros. 2008. “Realized volatility: a review.” Econometric Reviews
27(1-3): 10-45.
[83] Müller, U. A., M. M. Dacorogna, R. D. Davé, R. B. Olsen, O. V. Pictet, and J. E. von
Weizsäcker. 1997. “Volatilities of different time resolutions - Analyzing the dynamics of
market components.” Journal of Empirical Finance 4(2-3): 213-239.
[84] Nakajima, J., and Y. Omori. 2012. “Stochastic volatility model with leverage and asym-
metrically heavy-tailed error using GH skew Student’s t-distribution.” Computational
Statistics and Data Analysis 56(11): 3690-3704.
[85] Omori, Y., S. Chib, N. Shephard, and J. Nakajima. 2007. “Stochastic volatility with
leverage: Fast and efficient likelihood inference.” Journal of Econometrics 140(2): 425-
449.
[86] Pesaran, H., D. Pettenuzzo, and A. Timmermann. 2006. “Forecasting Time Series Subject
to Multiple Structural Breaks.” Review of Economic Studies 73(4): 1057-1084.
[87] Pitt, M. K., and N. Shephard. 1999. “Filtering via Simulation: Auxiliary Particle Filters.”
Journal of the American Statistical Association 94: 590-599.
[88] Primiceri, G. E. 2005. “Time varying structural vector autoregressions and monetary
policy.” Review of Economic Studies 72(3): 821-852.
[89] Raggi, D., and S. Bordignon. 2012. “Long Memory and Nonlinearities in Realized Volatil-
ity: A Markov Switching Approach.” Computational Statistics and Data Analysis 56(11):
3730-3742.
[90] Robert, C., and G. Casella. 1999. Monte Carlo Statistical Methods. Springer, Berlin.
[91] Shephard, N. 1996. “Statistical Aspects of ARCH and Stochastic Volatility.” In Time
Series Models in Econometrics, Finance and Other Fields. Chapman and Hall.
[92] Sims, C. A., D. F. Waggoner, and T. Zha. 2008. “Methods for Inference in Large Multiple-
Equation Markov-Switching Models.” Journal of Econometrics 146(2): 255-274.
[93] So, M. K. P., C. W. S. Chen, and M. Chen. 2005. “A Bayesian Threshold Nonlinearity
Test for Financial Time Series.” Journal of Forecasting 24(1): 61-75.
[94] Stock, J. H., and M. W. Watson. 2007. “Why Has U.S. Inflation Become Harder to
Forecast?” Journal of Money, Credit, and Banking 39(1): 3-34.
[95] Spiegelhalter, D., N. Best, B. Carlin, and A. van der Linde. 2002. “Bayesian Measures of
Model Complexity and Fit (with comments).” Journal of the Royal Statistical Society B
64(4): 583-639.
[96] Watanabe, T., and Y. Omori. 2004. “A Multi-move Sampler for Estimating Non-Gaussian
Time Series Models: Comments on Shephard and Pitt (1997).” Biometrika 91(1): 246-
248.
[97] Whiteley, N., C. Andrieu, and A. Doucet. 2010. “Efficient Bayesian Inference for Switching
State-Space Models using Particle Markov chain Monte Carlo methods.” Bristol Statistics
Research Report 10:04.
[98] Zellner, A. 1986. “Bayesian Estimation and Prediction Using Asymmetric Loss Func-
tions.” Journal of the American Statistical Association 81: 446-451.
DEPARTMENT OF ECONOMICS AND BUSINESS AARHUS UNIVERSITY
SCHOOL OF BUSINESS AND SOCIAL SCIENCES www.econ.au.dk
PhD Theses since 1 July 2011
2011-4 Anders Bredahl Kock: Forecasting and Oracle Efficient Econometrics
2011-5 Christian Bach: The Game of Risk
2011-6 Stefan Holst Bache: Quantile Regression: Three Econometric Studies
2011:12 Bisheng Du: Essays on Advance Demand Information, Prioritization and Real Options in Inventory Management
2011:13 Christian Gormsen Schmidt: Exploring the Barriers to Globalization
2011:16 Dewi Fitriasari: Analyses of Social and Environmental Reporting as a Practice of Accountability to Stakeholders
2011:22 Sanne Hiller: Essays on International Trade and Migration: Firm Behavior, Networks and Barriers to Trade
2012-1 Johannes Tang Kristensen: From Determinants of Low Birthweight to Factor-Based Macroeconomic Forecasting
2012-2 Karina Hjortshøj Kjeldsen: Routing and Scheduling in Liner Shipping
2012-3 Soheil Abginehchi: Essays on Inventory Control in Presence of Multiple Sourcing
2012-4 Zhenjiang Qin: Essays on Heterogeneous Beliefs, Public Information, and Asset Pricing
2012-5 Lasse Frisgaard Gunnersen: Income Redistribution Policies
2012-6 Miriam Wüst: Essays on early investments in child health
2012-7 Yukai Yang: Modelling Nonlinear Vector Economic Time Series
2012-8 Lene Kjærsgaard: Empirical Essays of Active Labor Market Policy on Employment
2012-9 Henrik Nørholm: Structured Retail Products and Return Predictability
2012-10 Signe Frederiksen: Empirical Essays on Placements in Outside Home Care
2012-11 Mateusz P. Dziubinski: Essays on Financial Econometrics and Derivatives Pricing
2012-12 Jens Riis Andersen: Option Games under Incomplete Information
2012-13 Margit Malmmose: The Role of Management Accounting in New Public Management Reforms: Implications in a Socio-Political Health Care Context
2012-14 Laurent Callot: Large Panels and High-dimensional VAR
2012-15 Christian Rix-Nielsen: Strategic Investment
2013-1 Kenneth Lykke Sørensen: Essays on Wage Determination
2013-2 Tue Rauff Lind Christensen: Network Design Problems with Piecewise Linear Cost Functions
2013-3 Dominyka Sakalauskaite: A Challenge for Experts: Auditors, Forensic Specialists and the Detection of Fraud
2013-4 Rune Bysted: Essays on Innovative Work Behavior
2013-5 Mikkel Nørlem Hermansen: Longer Human Lifespan and the Retirement Decision
2013-6 Jannie H.G. Kristoffersen: Empirical Essays on Economics of Education
2013-7 Mark Strøm Kristoffersen: Essays on Economic Policies over the Business Cycle
2013-8 Philipp Meinen: Essays on Firms in International Trade
2013-9 Cédric Gorinas: Essays on Marginalization and Integration of Immigrants and Young Criminals – A Labour Economics Perspective
2013-10 Ina Charlotte Jäkel: Product Quality, Trade Policy, and Voter Preferences: Essays on International Trade
2013-11 Anna Gerstrøm: World Disruption - How Bankers Reconstruct the Financial Crisis: Essays on Interpretation
2013-12 Paola Andrea Barrientos Quiroga: Essays on Development Economics
2013-13 Peter Bodnar: Essays on Warehouse Operations
2013-14 Rune Vammen Lesner: Essays on Determinants of Inequality
2013-15 Peter Arendorf Bache: Firms and International Trade
2013-16 Anders Laugesen: On Complementarities, Heterogeneous Firms, and International Trade
2013-17 Anders Bruun Jonassen: Regression Discontinuity Analyses of the Disincentive Effects of Increasing Social Assistance
2014-1 David Sloth Pedersen: A Journey into the Dark Arts of Quantitative Finance
2014-2 Martin Schultz-Nielsen: Optimal Corporate Investments and Capital Structure
2014-3 Lukas Bach: Routing and Scheduling Problems - Optimization using Exact and Heuristic Methods
2014-4 Tanja Groth: Regulatory impacts in relation to a renewable fuel CHP technology: A financial and socioeconomic analysis
2014-5 Niels Strange Hansen: Forecasting Based on Unobserved Variables
2014-6 Ritwik Banerjee: Economics of Misbehavior
2014-7 Christina Annette Gravert: Giving and Taking – Essays in Experimental Economics
2014-8 Astrid Hanghøj: Papers in purchasing and supply management: A capability-based perspective
2014-9 Nima Nonejad: Essays in Applied Bayesian Particle and Markov Chain Monte Carlo Techniques in Time Series Econometrics
ISBN: 9788793195059