
Essays in Applied Bayesian Particle and Markov Chain

Monte Carlo Techniques in Time Series Econometrics

2014-9

Nima Nonejad

PhD Thesis

DEPARTMENT OF ECONOMICS AND BUSINESS

AARHUS UNIVERSITY DENMARK


Essays in Applied Bayesian Particle and Markov

Chain Monte Carlo Techniques in Time Series

Econometrics

By Nima Nonejad

A PhD thesis submitted to

School of Business and Social Sciences, Aarhus University,

in partial fulfillment of the requirements of

the PhD degree in

Economics and Business

October 2014


Contents

Preface
Summary
Resumé (Danish summary)

1. A Mixture Innovation HAR Model for Structural Breaks and Long Memory
   1.1. Introduction
   1.2. Modeling Structural Breaks
        1.2.1. Change-point model
        1.2.2. Mixture innovation model
        1.2.3. Monte Carlo
        1.2.4. Breaks in the conditional variance
        1.2.5. Model comparison
        1.2.6. Calculating the predictive likelihood and the predictive mean
   1.3. Realized Volatility
   1.4. Model
   1.5. Data
   1.6. Results
        1.6.1. Priors
        1.6.2. Full sample estimation
        1.6.3. Forecasts
   1.7. Prior Sensitivity Analysis
        1.7.1. CPHAR
        1.7.2. MHAR
   1.8. Conclusion

2. Long Memory and Structural Breaks in Realized Volatility: An Irreversible MS Approach
   2.1. Introduction
   2.2. Change-point ARFIMA Model
   2.3. Bayesian Estimation
        2.3.1. Breaks in µ and d
        2.3.2. Only breaks in σ²
        2.3.3. Bayes factors and marginal likelihood computation
   2.4. Monte Carlo
        2.4.1. Setting
        2.4.2. Change-point identification
        2.4.3. Parameter estimates
        2.4.4. Deviance information criterion
        2.4.5. Higher number of change points
        2.4.6. Sample size
   2.5. Realized Volatility
   2.6. Application to S&P 500 Volatility
        2.6.1. Data
        2.6.2. Results
        2.6.3. Robustness to minimum duration restrictions
        2.6.4. Prior sensitivity analysis
        2.6.5. Forecasts
        2.6.6. Structural breaks and GARCH effects
   2.7. Conclusion

3. A Survey of PMCMC Techniques for Unobserved Component Time Series Models
   3.1. Introduction
   3.2. Markov Chains and Particle Filters
        3.2.1. A particle filter
        3.2.2. Bayes factors and marginal likelihood computation
   3.3. Stochastic Volatility Models
   3.4. Unobserved Components Model of US Inflation
   3.5. Long Memory with Stochastic Volatility
        3.5.1. Monte Carlo
        3.5.2. US core inflation
        3.5.3. Subsample estimation
        3.5.4. Forecasts
        3.5.5. Parameter sensitivity analysis
   3.6. Unobserved Components Model with SVM Effects
   3.7. Conclusion

4. PG-AS for SV Models with Heavy Tails, in Mean Effects, Leverage and Structural Breaks
   4.1. Introduction
   4.2. Why is PG-AS Useful?
   4.3. Particle Gibbs with Ancestor Sampling
        4.3.1. Model comparison using the output from PG-AS
   4.4. Dow Jones Industrial Average
   4.5. Structural Breaks and PG-AS
        4.5.1. Simulation example
        4.5.2. Real output
        4.5.3. Structural break ARFIMA-SV model
   4.6. Conclusion

A. Appendix for Chapter 1
   A.1. Estimation of the change-point model
   A.2. Marginal likelihood computation for the change-point model
   A.3. Estimation of the MIA model

B. Appendix for Chapter 2
   B.1. A direct approach for evaluating the likelihood function

C. Appendix for Chapter 3
   C.1. Simulation evidence: SV-MA(1) and SVM
   C.2. Particle Gibbs with ancestor sampling

D. Appendix for Chapter 4
   D.1. Prior sensitivity analysis
   D.2. Sensitivity of PG-AS with respect to M

Bibliography


Preface

In many econometric problems there are no obvious ways to address (a) nonlinearities, (b) time variation in the model parameters, or (c) high-dimensional models. These extensions are often necessary from a theoretical point of view but very difficult to implement in practice. However, by capitalizing on general advances in computer speed and computational power, Bayes' theorem can be used to generate effective estimation procedures that can handle (a), (b) and (c). The performance of these procedures can then be evaluated using non-Bayesian criteria. Furthermore, these techniques enable researchers to build relatively complicated econometric models which in turn better fit economic theory. However, interest in these methods varies greatly between learning institutions. One major reason may be a general lack of undergraduate or graduate courses on Bayesian methods and Markov chain Monte Carlo techniques.¹

In light of the above discussion, this dissertation aims to illustrate the attractive computational and qualitative aspects of Bayesian inference. This is done by using Gibbs sampling, particle Gibbs with ancestor sampling and particle marginal Metropolis-Hastings to draw from the analytically intractable posterior density defined by Bayes' theorem. In each chapter, I either try to extend a recent estimation procedure or develop more flexible computational techniques that make it possible to estimate new econometric models or extend existing ones.

I would like to thank the School of Business and Social Sciences at Aarhus University as well as the Center for Research in the Econometric Analysis of Time Series (CREATES), funded by the Danish National Research Foundation, for hosting me and providing me with excellent research facilities, a stimulating intellectual environment and generous financial support. Finally, a number of people have contributed to the making of this dissertation. First and foremost, I would like to thank my adviser Asger Lunde for his guidance and support. I am also very grateful to Niels Haldrup, Henning Bunzel and CREATES for providing me with the necessary computational facilities.

Updated preface

The pre-defence took place on October 2, 2014 in the presence of the jury composed of Eric Hillebrand (Aarhus University and CREATES), Jim Griffin (University of Kent), Michel van der Wel (Erasmus University Rotterdam and CREATES) and Asger Lunde. The jury made a number of constructive comments and suggestions in order to improve the papers contained in the dissertation. In this updated version, I have tried to implement as many of these as possible given the time constraints.

Nima Nonejad

Aarhus, October 2014

¹This is of course very strange because there is no shortage of good textbooks which cover the advances that have revolutionized the field of Bayesian econometrics since the late 1980s. Among these are textbooks such as Bauwens, Lubrano and Richard (1999), Koop (2003), Lancaster (2004) and Geweke (2005).


Summary

Each chapter of this dissertation stands as an independent article and is therefore self-contained.

The analyses demonstrate the usefulness of combining Bayesian methods with simulation methods

such as Gibbs sampling, particle Gibbs with ancestor sampling and particle marginal Metropolis-Hastings, with a special focus on the statistical analysis of economic time series data. I provide motivation

for the econometric techniques used in each individual chapter. All necessary computational details

are provided in the Appendices.

Chapter 1 proposes a flexible model that is able to simultaneously approximate long-memory be-

havior and incorporate structural breaks in the model parameters. This is achieved by combining the

mixture innovation (MIA) specification of Gerlach et al. (2000) with the heterogeneous autoregres-

sive (HAR) model of Corsi (2009). It is an open question whether the MIA specification will perform well

when dealing with the sort of structural changes present in realized volatility data. The main purpose

of this chapter is to shed light on this question. Therefore, I compare the performance of the MIA

specification with an existing method for modeling structural breaks, namely, the change-point (CP)

specification of Chib (1998). I believe that applying the MIA specification to realized volatility data

and comparing its performance with the change-point specification, which in the literature is consid-

ered as a “state-of-the-art” structural break model is the most important contribution that I provide.

To my knowledge, no work has been done on comparing the out-of-sample forecasting performance of

these methods. These methods differ in important aspects, for example, their treatment of the break

process. The change-point specification imposes the restriction that a precise number of breaks occurs

in the sample, whereas for the MIA specification, the number of breaks is treated as unknown.

In an extensive empirical evaluation involving several volatility series, I demonstrate the presence of

structural breaks and their importance for forecasting. However, one cannot establish that there is

one single forecasting method which can be recommended universally. That is, for some series MIA

outperforms the change-point specification, whereas for other series CP performs better than MIA.

Chapter 2 builds on Raggi and Bordignon (2012). Specifically, the aforementioned paper consid-

ers estimating a two-state Markov switching ARFIMA (autoregressive fractionally integrated moving

average) model using Markov chain Monte Carlo techniques. Chapter 2, on the other hand, considers a more challenging specification, namely, a structural break specification of the ARFIMA model.

Structural breaks occur through irreversible Markov switching or so-called change-point dynamics.

The parameters subject to structural breaks and the unobserved states, which determine the positions of the structural breaks, are sampled from the joint posterior density by sampling from their respective

conditional posteriors using Gibbs sampling and Metropolis-Hastings. Furthermore, instead of using

traditional approaches to evaluate the likelihood function, I extend previous works on precision-based

algorithms and provide a direct approach to evaluate the likelihood function. I believe that incorpo-

rating these features within the ARFIMA setting is the most important contribution that I provide.

An extensive Monte Carlo experiment is conducted to investigate if the proposed method works


well in identifying the data-generating parameters, the true structural break dates and the correct number of structural breaks. With regard to the last point, I compare the ability of the marginal likelihood (ML) and the deviance information criterion (DIC) to detect the correct number of structural breaks.

Results show that ML and DIC perform very well in identifying the true number of structural breaks.

The higher the number of parameters that are affected by a break, the more likely it is that structural

breaks are correctly identified. Finally, it becomes more difficult to identify one or more breaks when

only the persistence parameter, d, changes. Applied to daily S&P 500 data from 2000 to 2009, one

finds strong evidence in favor of four structural breaks. The evidence in favor of structural breaks

is robust to different specifications including a GARCH specification for the conditional volatility of

realized volatility.

Chapter 3 is inspired by Bos (2011), Bos et al. (2012), Creal (2012) and Flury and Shephard (2011). It contributes to the field of sequential Monte Carlo methods. Specifically, this chapter

details particle Markov chain Monte Carlo (PMCMC) techniques for analysis of unobserved component

time series models. The aim of this chapter is to describe the basic steps of PMCMC together with

details on the implementation of some of the key algorithms in Ox.² I show that PMCMC allows one to

develop a general and flexible framework for estimation, forecasting and model comparison. On the

other hand, estimating the same types of models using “pure” Gibbs sampling would require relatively

more programming effort.

Chapter 4 is the last chapter of the dissertation. In this chapter, I apply a relatively new tool in

the family of sequential Monte Carlo methods, which is particularly useful for inference in stochastic

volatility models, namely, particle Gibbs with ancestor sampling (PG-AS), suggested in Lindsten et

al. (2012). I apply PG-AS to the challenging class of stochastic volatility models with increasing com-

plexity, including leverage and in mean effects. I provide applications that demonstrate the flexibility

of PG-AS under these different circumstances and justify applying it in practice.

Finally, I also provide applications where I incorporate discrete structural breaks within the stochastic

volatility model framework using both simulated and macroeconomic time series data. Structural

breaks are modeled through irreversible Markov switching, or so-called change-point dynamics. I

estimate model parameters, log-volatilities and change-point dates conditional on a fixed number of

change points. For each of these specifications, ML and DIC are calculated. They are then used to

determine the number of structural breaks. For instance, I model changing time series characteristics

of the postwar monthly US core inflation rate using a structural break ARFIMA model with stochastic

volatility. I allow for structural breaks in the level, AR, MA parameters, persistence parameter, d,

level of volatility, persistence of volatility and volatility of volatility. Overall, compared to constant-parameter models, structural break specifications provide better in-sample fit and produce better out-of-sample point and density forecasts.

Each of the subsequent chapters has been submitted to a peer-reviewed field journal. Chapter 1 has

been submitted to Journal of Forecasting. Chapter 2 and an earlier version of Chapter 3 have been

submitted to Journal of Financial Econometrics and Journal of Time Series Econometrics. Chapter 4

has been submitted to Studies in Nonlinear Dynamics and Econometrics.

²Both of the papers mentioned provide excellent introductions to particle filtering. Furthermore, the authors provide the software associated with their work. Hence, the idea behind this chapter is to do the same, but for PMCMC. This way, other researchers can replicate the results using my codes. The choice of Ox is mainly because it is popular software among econometricians.


Resumé (Danish summary)

This dissertation consists of four independent chapters. The common thread running through them is the modeling of time series by means of simulation methods such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC). How should one model nonlinearities, the inclusion of unobserved components, structural changes in model parameters and other difficult technical features? How can one obtain reliable estimation and inference? In this dissertation I attempt to contribute to solving these problems.

The first two chapters propose a strategy for estimating and forecasting realized volatility by means of Gibbs sampling and Metropolis-Hastings. I investigate whether estimation and the precision of the forecasts can be improved by accounting for structural changes in the model parameters. On the basis of a series of simulations as well as empirical applications, I conclude that neglecting time variation in the model parameters can be a serious problem that reduces forecast precision, and that the use of the proposed methods helps remedy this.

The last two chapters examine the possibility of obtaining robust estimation of models with unobserved components by means of particle Gibbs with ancestor sampling (PG-AS) and particle marginal Metropolis-Hastings. Both chapters focus mostly on the technical properties of these methods. Under simple parametric assumptions, I am able to estimate the models and produce macroeconomic forecasts. I document once again the presence of time-varying parameters in macroeconomic data. Again, the precision of the forecasts is improved by accounting for time variation in the model parameters.


1. A Mixture Innovation Heterogeneous

Autoregressive Model for Structural Breaks and

Long Memory

Author: Nima Nonejad

Abstract: We propose a flexible model that is able to approximate long-memory behavior and in-

corporate structural breaks in the model parameters. Our model is an extension of the heterogeneous

autoregressive (HAR) model, which is designed to model and forecast volatility of financial time se-

ries. In an extensive empirical evaluation involving several volatility series, we demonstrate the presence of structural breaks and their importance for forecasting. We find that the choice of how to model break processes is important in achieving good forecast performance. Furthermore, structural break specifications perform better than simple rolling-window forecasts.

Keywords: Bayesian inference, forecasting, mixture innovation models, realized volatility

(JEL: C11, C22, C51, C53)


1.1. Introduction

Measuring and modeling the conditional variance, or volatility, of financial time series is an important issue in econometrics. General approaches to estimating volatility are based on parametric models such as the autoregressive conditional heteroskedasticity (ARCH) model of Engle (1982) and its generalization (GARCH) by Bollerslev (1986), or stochastic volatility (SV) models as in Kim et al. (1998). However, during the past two decades, the approach of using improved measures of ex-post volatility constructed from high-frequency data has become very popular. This measure is called realized volatility (RV), see Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002a, b) for a formal discussion.

This paper proposes a simple model that merges long-memory dynamics and nonlinearities. The

specification that is put forward is a generalization of the heterogeneous autoregressive (HAR) model,

see Corsi (2009). The HAR model has been applied with success in modeling and forecasting realized

volatility, see Andersen et al. (2007). Our model is called the mixture innovation heterogeneous au-

toregressive model, MHAR. It combines ingredients from HAR and mixture innovation (MIA) models,

see Gerlach et al. (2000), Giordani and Kohn (2008) and Groen et al. (2012). This approach builds on

the state-space representation, modeling the breaks through mixture distributions in state innovations

of linear Gaussian state-space models. Indeed, this approach is very intuitive and has several desirable

features: it allows a random number of breaks in the sample, it can jointly model small, possibly frequent breaks and large, possibly less frequent ones, and it lets different parameters change at different points in time.

It is an open question whether the MHAR model will perform well when dealing with the sort of structural

changes present in realized volatility data. The main purpose of this paper is to shed light on this

question. Therefore, we compare the performance of the MIA specification with an existing method

for modeling structural breaks in realized volatility data, namely, the change-point specification of

Chib (1998). Henceforth, we refer to this specification as CPHAR, see Liu and Maheu (2008). In

addition, we also include alternative forecasting procedures such as: recursively estimating the HAR

model and a random walk time-varying parameter HAR model.

We believe that applying the MIA specification to realized volatility data and comparing its performance with the change-point specification, which the literature considers a "state-of-the-art" structural break model, is the most important contribution that we provide. To our knowledge, no work

has been done on comparing the out-of-sample performance of these methods. MHAR and CPHAR

differ in important aspects, for example, their treatment of the break process. Specifically, the “gen-

eral” change-point specification imposes the restriction that a precise number of breaks occurs in the

sample. Furthermore, all model parameters change simultaneously due to a structural break. How-

ever, for the MIA specification, the number of breaks is treated as unknown, and model parameters

are allowed to change separately at different points in time.

For eleven realized volatility series between 2004 and 2009, we consider the aforementioned models

and produce daily, weekly and biweekly forecasts. We evaluate forecast performance using two criteria:

predictive likelihood (PL), see Geweke (2005), and root mean squared error (RMSE). Both criteria

are easily attainable within the Bayesian estimation procedure. It turns out that these two loss

functions lead to similar qualitative conclusions. Overall, structural breaks play an important role for

forecasting in all of the volatility series that we consider. Specifically, we find that each structural break

specification outperforms the HAR model, regardless of criterion or forecast horizon. Furthermore, the


MHAR model with time-varying volatility tends to perform better than the other models, especially

for longer forecast horizons. However, we find that there is no one single method which can be

recommended universally, i.e. for all series and all forecast horizons. For some series CPHAR performs

better than MHAR specifications and vice versa.

The structure of this paper is as follows. Section 1.2 discusses the econometric issues for Bayesian

estimation. Section 1.3 reviews the theory behind the volatility measures used in this paper. Section

1.4 briefly presents the HAR model. Details on the data are presented in Section 1.5. Section 1.6

discusses the empirical results. Finally, the last section concludes.

1.2. Modeling Structural Breaks

1.2.1. Change-point model

The models considered in this paper use the framework of a Gaussian linear regression model

$$y_t = X_{t-1}\beta + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \qquad (1.2.1)$$

for $t = 1, \dots, T$. Let $Y_T = (y_1, \dots, y_T)'$ be a vector of size $T$, and let $X_T$ be a $T \times k$ matrix of regressors with rows $X_{t-1}$, which can also include lags of $y_t$. Obviously, different structural break models vary in the way they model breaks in (1.2.1) by allowing $\beta$ and possibly $\sigma^2$ to vary over time. To begin with, we focus mainly on structural breaks in the regression coefficients, $\beta$, and assume that $\sigma^2$ is fixed through time. However, breaks in $\sigma^2$ can also be modeled rather easily. We consider two main approaches to modeling structural breaks in $\beta$ and later in $(\beta, \sigma^2)$.

We start by considering the change-point (or structural break) specification proposed by Chib (1998).

This specification uses a hidden Markov model with a restricted transition matrix to model the change

points. A test for the number of structural breaks is then a test of the dimension of the hidden Markov

chain. Model parameters and change points are jointly estimated conditional on a fixed number of

change points. Bayes factors are then used to compare the evidence for the number of breaks.

Assume that there are $m-1$, $m \in \{1, 2, \dots\}$, change points at unknown times, $\Omega_m = \{\tau_1, \tau_2, \dots, \tau_{m-1}\}$. Separated by those change points, there are $m$ different regimes. Thus,

$$\beta_t = \begin{cases} \beta_1 & \text{if } t < \tau_1 \\ \beta_2 & \text{if } \tau_1 \le t < \tau_2 \\ \vdots & \\ \beta_{m-1} & \text{if } \tau_{m-2} \le t < \tau_{m-1} \\ \beta_m & \text{if } \tau_{m-1} \le t. \end{cases} \qquad (1.2.2)$$

The density of observation $y_t$, $t = 1, \dots, T$, depends on $\beta_j$, $j = 1, 2, \dots, m$, whose value changes at the change points, $\Omega_m$, and on $\sigma^2$. Let $S = (s_1, \dots, s_T)'$ denote the unobserved state system, where $s_t = j$ indicates that $y_t$ is from regime $j$ and follows the conditional distribution $p(y_t \mid \beta_j, \sigma^2, Y_{t-1})$. The one-step-ahead transition probability matrix for $s_t$, $P$, is assumed to satisfy

$$\Pr(s_t = j \mid s_{t-1} = j) = p_j, \quad \Pr(s_t = j+1 \mid s_{t-1} = j) = 1 - p_j \qquad (1.2.3)$$


for $j = 1, \dots, m-1$ and $\Pr(s_t = m \mid s_{t-1} = m) = 1$. The other elements of $P$ are set to zero. Hence, if regime $j$ holds at time $t-1$, then at time $t$ the process can either remain in regime $j$ (with probability $p_j$) or a break occurs and the process moves to regime $j+1$ (with probability $1-p_j$). Once the last regime is reached, one stays there forever. This structure enforces the ordering (1.2.2) on the change points.¹ Technical details about estimation of (1.2.1) using (1.2.2) are provided in the Appendix.²

As previously mentioned, the change-point regression model requires a fixed number of structural breaks to occur.³ In this paper we follow Pesaran et al. (2006), Liu and Maheu (2008) and Bauwens et al. (2011) and estimate models for different numbers of structural breaks (0 to 7 in our empirical application). Then, we compare results across these models using the marginal likelihood criterion. Specifically, let models with $i$ and $j$ structural breaks be denoted by $M_i$ and $M_j$, respectively. For each specification, we can calculate the marginal likelihood, $p(y_1, \dots, y_T \mid M_i)$, following Chib (1995). We then rank models by means of their Bayes factors, $BF_{ij} = p(y_1, \dots, y_T \mid M_i)/p(y_1, \dots, y_T \mid M_j)$. Large values of $BF_{ij}$ indicate that the data support $M_i$ over $M_j$, see Kass and Raftery (1995). In the Appendix, we provide details on how to compute the marginal likelihood for the change-point regression model.

1.2.2. Mixture innovation model

An alternative specification that allows for structural breaks in the regression parameters of (1.2.1) can be defined in the following way:

$$y_t = X_{t-1}\beta_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \qquad (1.2.4)$$

where $\beta_t = (\beta_{1t}, \dots, \beta_{kt})'$ is a vector of time-varying regression parameters. Each element of $\beta_t$ in (1.2.4) evolves according to

$$\beta_{it} = \beta_{it-1} + \kappa_{it}\eta_{it}, \quad \eta_{it} \sim N(0, q_i^2), \quad i = 1, \dots, k, \qquad (1.2.5)$$

where $\kappa_{it} \in \{0, 1\}$ is an unobserved process with $\Pr(\kappa_{it} = 1) = \pi_i$, $K_t = (\kappa_{1t}, \dots, \kappa_{kt})'$, $K = \{K_t\}_{t=1}^{T}$ and $B = \{\beta_t\}_{t=1}^{T}$. Accordingly, the $i$th regression coefficient, $\beta_{it}$, remains at its previous value, $\beta_{it-1}$, unless $\kappa_{it} = 1$, in which case it changes by $\eta_{it}$.

This specification implies that $\beta_{it}$ in (1.2.5) is allowed to change every time period, but it does not necessarily need to change at all. Also, changes in the separate parameters are not restricted to coincide as in the change-point model. Rather, changes in each $\beta_{it}$ are allowed to occur at different points in time. Furthermore, for the change-point model, we specify the number of structural breaks by comparing the ML of models with different numbers of change points. Here, we estimate only one specification and allow the data to determine the nature of structural breaks in each element of $\beta_t$. In the Appendix, we provide details on how to estimate the MIA model using Gibbs sampling.

¹Theoretically, we could allow for breaks in each element of $\beta$ to occur independently. In this case, $s_t$ will be a $k \times 1$ discrete random variable with the first element controlling breaks in $\beta_1$, the second element controlling breaks in $\beta_2$, and so on. Furthermore, in order to ease the notation, conditioning on $X_T$ is suppressed.

²Notice that our specification is identical to Liu and Maheu (2008) but differs slightly from the hierarchical prior specification (on the conditional mean, variance and the prior on the regime durations) of Pesaran et al. (2006). However, since we perform direct forecasting, it does not make any difference which specification is used.

³Koop and Potter (2007) argue that (1.2.3) may be restrictive in some situations. They suggest the use of the more flexible Poisson distribution for the durations. However, in order to avoid the very heavy additional computational cost inherent to the use of the Poisson prior, we choose only to implement the change-point model using (1.2.3).


1.2.3. Monte Carlo

The plan of our Monte Carlo study is to judge the performance of CP and MIA under different

circumstances. Specifically, we use a model very similar to (1.2.1) and consider cases of: 0 breaks, 1

break where all parameters change at the same time, and 2 breaks where the first (last) break in each

parameter occurs at a different time (the same time). This setting will provide us with information on

the tendency of each model to give false signals of structural breaks and misinterpret the magnitude

of parameter changes. For instance, as stated in Section 1.2.1, we must specify the number of change

points for the CP model. Furthermore, all the elements in β change due to a structural break. On the

other hand, changes in the separate parameters are not restricted to coincide for the MIA specification.

Thus, it is interesting to find out how well CP performs compared to MIA when structural breaks in

each parameter occur at different points in time. In a similar fashion, it is also interesting to find out

how well MIA performs compared to CP or a simple OLS when there are no structural breaks in the

parameters of the data-generating process (DGP).

For each DGP, 100 samples of 500 observations are generated. The true values of β are given in

Table 1.1. We choose $X_{t-1} \sim N(0, I_2)$ and $\sigma^2 = 0.01$ in all of our simulations. For each Monte

Carlo repetition, we report: (a) the full in-sample root mean squared error (RMSE) of CP (MIA) over

OLS, and (b) the in-sample root mean squared error of CP (MIA) over OLS using only data from

the last structural break till the end of the sample, “last regime”. This requires exact knowledge of

the true date of the last break. Therefore, it is not available outside simulation studies. As shown

in the second column of Table 1.1, there is little to lose from using the MIA specification when there

are no structural breaks in the DGP. The increase in the RMSE of MIA over OLS is only about 5%.

We estimate models with zero up to five change points for the CP specification. When there are no

structural breaks in the DGP, the CP specification with no change point obtains the highest ML value

at every Monte Carlo repetition.⁴

On the other hand, compared to OLS, large gains are obtained for both models in the presence

of structural breaks. When there is one structural break in the DGP, CP and MIA perform very

similarly. Furthermore, both models perform almost as well as if the date of the structural break

was known. For instance, the RMSEs of CP/OLS and MIA/OLS for “last regime” are almost equal

to 1. For Case 2, we see that MIA performs slightly better than CP. More importantly, for CP, we

discover that the specification with four change points obtains the highest ML value at every Monte

Carlo repetition, even though there are only two structural breaks in the DGP. Recall that in this

case, the first structural break in each parameter occurs at a different point in time. Clearly, CP has

some difficulties incorporating this feature, as changes in the parameters are restricted to coincide, thus

making inference difficult. On average, the first change point is estimated at t = 125, the second at

t = 200 and the third at t = 350. Furthermore, CP also estimates a fourth change point at t = 400.

Evidently, this is incorrect as there are no structural breaks at t = 400 in the DGP. However, the

average Monte Carlo estimates of β are almost identical in the last two regimes. On the other hand,

MIA does not encounter any difficulty incorporating this feature, and is able to detect the break dates.

Finally, CP (MIA) performs almost as well as if the date of the last structural break was known. The

increase in the RMSE of CP (MIA) over OLS for the last regime is 0.5% (0.8%).

⁴The numbers inside the parentheses indicate the best change-point specification over the total number of Monte Carlo repetitions. In each case, the log(BF) in favor of the correct change-point specification is about 20 to 30.


Table 1.1.: Monte Carlo results

DGP                   Case 0             Case 1              Case 2
Number of breaks      0                  1                   2
β1                    0.8                0.8, 0.2            0.8, 0.2, 0.8
β2                    -0.4               -0.4, 0.1           -0.4, 0.1, -0.4
True break dates      -                  t = 250 for β1      t = 125, 350 for β1
                                         t = 250 for β2      t = 200, 350 for β2

Full sample RMSE
CP/OLS                1.012              0.314               0.377
                      (100/100, 0 CP)    (100/100, 1 CP)     (100/100, 4 CP)
MIA/OLS               1.048              0.326               0.365

Last regime RMSE
CP/OLS                -                  1.002               1.005
                                         (100/100, 1 CP)     (100/100, 4 CP)
MIA/OLS               -                  1.004               1.008

This table reports the RMSE ratio of CP and MIA over OLS using both the entire sample as well as only data from the last break point till the end of the sample, "last regime". The numbers inside the parentheses indicate the number of times over the Monte Carlo repetitions when the specific change-point specification obtains the highest ML value. CP denotes the number of change points that are conditioned on.

1.2.4. Breaks in the conditional variance

In this section, we briefly describe how to model structural breaks in the conditional variance, $\sigma^2$, for the CP and MIA specifications. For the change-point model, the conditional posterior density of $\theta_j = (\beta_j', \sigma_j^2)'$ depends only on observations in regime $j = 1, \dots, m$. Therefore, let $y^j = \{y_t : s_t = j\}$ and $X^j = \{X_{t-1} : s_t = j\}$ and use standard Gibbs sampling methods for the linear model. Once we sample $\beta_j$ for regime $j$, we can use $\varepsilon^j = y^j - X^j\beta_j$ and sample $\sigma_j^2$ from the inverse Gamma density, see Liu and Maheu (2008). Ideally, it would be desirable to allow $\sigma^2$ to vary independently from $\beta$. However, incorporating this feature is computationally more demanding and will probably not provide any significant improvements.

Modeling structural breaks in $\sigma^2$ for the MIA model is a bit more complicated. In this paper we take the same approach as Giordani and Kohn (2008) and Groen et al. (2012):

- Initialize the sampler with a time series of conditional variances, $\sigma_1^2, \dots, \sigma_T^2$.

- Conditional on $\sigma_1^2, \dots, \sigma_T^2$, draw $K$, $B$, $q_1^2, \dots, q_k^2$ and $\pi_1, \dots, \pi_k$ from their respective conditional posteriors. Compute the residual for time $t$ as $\varepsilon_t = y_t - X_{t-1}\beta_t = \sigma_t u_t$, where $u_t \sim N(0, 1)$. We can then square both sides and take the logarithm such that $\log \varepsilon_t^2 = \log \sigma_t^2 + \log u_t^2$, where $\log u_t^2$ is $\log \chi_1^2$ distributed and can be very accurately approximated by a mixture of Normals with seven components, see Kim et al. (1998). We follow the stochastic volatility literature and incorporate structural breaks in $\sigma^2$ as $\log \sigma_t^2 = \log \sigma_{t-1}^2 + \kappa_t^{SV}\eta_t$, where $\eta_t \sim N(0, \sigma_\eta^2)$.

Here, $\kappa_t^{SV} \in \{0, 1\}$ and evolves independently from $K_t$. As before, we can use the conditioning features of the Gibbs sampler to sample $\log \sigma_t^2$ and $\kappa_t^{SV}$, $t = 1, \dots, T$, as well as $\pi^{SV}$ and $\sigma_\eta^2$, from their respective conditional posteriors. Specifically, we draw $K^{SV} = (\kappa_1^{SV}, \dots, \kappa_T^{SV})'$ using the algorithm of Gerlach et al. (2000). Thereafter, we sample $\log \sigma_t^2$, $t = 1, \dots, T$, using Carter and Kohn (1994), conditional on $K^{SV}$ and $\sigma_\eta^2$.


1.2.5. Model comparison

In this paper we compare the performance of models using a specific out-of-sample period. Consider the universe $\mathcal{M} = (M_1, \dots, M_n)$ of models. Let $p(y_t \mid \theta_k, Y_{t-1}, M_k)$ denote the conditional data density of model $M_k$ given $Y_{t-1}$ and the model parameters, $\theta_k$. Conditional on $Y_{t-1} = (y_1, \dots, y_{t-1})'$, the predictive likelihood (PL) of model $M_k$ for $y_t, \dots, y_T$, $t < T$, is defined as

$$p(y_t, \dots, y_T \mid Y_{t-1}, M_k) = \int_{\Theta_k} p(y_t, \dots, y_T \mid \theta_k, Y_{t-1}, M_k)\, p(\theta_k \mid Y_{t-1}, M_k)\, d\theta_k. \qquad (1.2.6)$$

Note that if $t = 1$ this would be the marginal likelihood. Thus, (1.2.6) changes to

$$p(y_1, \dots, y_T \mid M_k) = \int_{\Theta_k} p(y_1, \dots, y_T \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k,$$

where $p(y_1, \dots, y_T \mid \theta_k, M_k)$ is the likelihood and $p(\theta_k \mid M_k)$ is the prior density of model $M_k$. Hence, the sum of log-predictive likelihoods can be interpreted as a measure similar to the logarithm of the marginal likelihood, but ignoring the initial $t-1$ observations. The predictive likelihood indicates how well model $M_k$ accounts for the realizations $y_t, \dots, y_T$, such that the best model is the one which achieves its maximum value. Hence, it can be used to order models according to their predictive abilities. For instance, (1.2.6) is simply the product of the individual predictive likelihoods

$$p(y_t, \dots, y_T \mid Y_{t-1}, M_k) = \prod_{s=t}^{T} p(y_s \mid Y_{s-1}, M_k), \qquad (1.2.7)$$

where each of the terms, $p(y_s \mid Y_{s-1}, M_k)$, has parameter uncertainty integrated out. We can compare the relative value of density forecasts using the realized data, $y_t, \dots, y_T$, with the predictive likelihoods for two or more models.

The Bayesian approach also allows for the comparison and ranking of models by predictive Bayes factors (PBF). As before, suppose we have $n$ different models denoted by $M_k$, $k = 1, \dots, n$. The PBF for $y_t, \dots, y_T$ and $M_1$ versus $M_2$ is

$$PBF_{12} = p(y_t, \dots, y_T \mid Y_{t-1}, M_1)/p(y_t, \dots, y_T \mid Y_{t-1}, M_2).$$

It provides an estimate of the relative evidence for model $M_1$ versus $M_2$ over $y_t, \dots, y_T$. PBFs include Occam's razor in that they penalize highly parametrized models that do not deliver improved predictive content. Kass and Raftery (1995) recommend considering twice the logarithm of the PBF for model comparison. Evidence in favor of model $M_1$ can be interpreted as: not worth more than a bare mention for $0 \le 2\log(PBF_{12}) < 2$; positive for $2 \le 2\log(PBF_{12}) < 6$; strong for $6 \le 2\log(PBF_{12}) < 10$; and very strong for $2\log(PBF_{12}) > 10$, see Kass and Raftery (1995).

1.2.6. Calculating the predictive likelihood and the predictive mean

Calculating the predictive likelihood within a Gibbs sampling scheme is easy. We can simply use the predictive decomposition along with the output from the Gibbs sampler, $\theta^{(1)}, \dots, \theta^{(N)}$. Specifically, each term on the right-hand side of (1.2.7) can be consistently estimated from the Gibbs sampler output as

$$p(y_t \mid Y_{t-1}, M_k) \approx \frac{1}{N}\sum_{i=1}^{N} p\left(y_t \mid \theta_k^{(i)}, Y_{t-1}, M_k\right). \qquad (1.2.8)$$

For example, in the context of (1.2.1), $\theta^{(i)} = (\beta'^{(i)}, \sigma^{2(i)})'$ and $p(y_t \mid \theta^{(i)}, Y_{t-1})$ denotes the Normal density with mean $X_{t-1}\beta^{(i)}$ and variance $\sigma^{2(i)}$, evaluated at $y_t$. The Gibbs sampler draws are obtained based on the information set $Y_{t-1}$. As a new observation enters the information set, the posterior is updated through a new round of Gibbs sampling. The predictive density, $p(y_{t+1} \mid Y_t)$, can then be calculated in a similar manner.

We can also compare forecasts of models based on the predictive mean. Similar to the predictive likelihood, the predictive mean can be computed using the Gibbs draws. For instance, in the context of (1.2.1), we calculate the predictive mean of $y_t$ conditional on $Y_{t-1}$ as

$$E[y_t \mid Y_{t-1}] \approx \frac{1}{N}\sum_{i=1}^{N} X_{t-1}\beta^{(i)}. \qquad (1.2.9)$$

Calculating (1.2.8) or (1.2.9) for the change-point and MIA specifications is a bit more complicated because one must also consider uncertainty regarding the timing of the structural breaks. This uncertainty is accounted for by using draws of $S^{(i)}$, $K^{(i)}$ and $K^{SV(i)}$, $i = 1, \dots, N$. All of these quantities are available from the Gibbs output.
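In code, (1.2.8) and (1.2.9) are simple Monte Carlo averages over the stored Gibbs draws. A sketch for the constant-parameter regression (1.2.1), with hypothetical arrays beta_draws (N x k) and sig2_draws (length N):

```python
import numpy as np

def predictive_likelihood(y_t, x_tm1, beta_draws, sig2_draws):
    """Eq. (1.2.8): average the Normal density p(y_t | theta^(i), Y_{t-1})
    over the N posterior draws obtained from data up to time t - 1."""
    means = beta_draws @ x_tm1                 # X_{t-1} beta^(i), shape (N,)
    dens = np.exp(-0.5 * (y_t - means) ** 2 / sig2_draws)
    dens /= np.sqrt(2.0 * np.pi * sig2_draws)
    return dens.mean()

def predictive_mean(x_tm1, beta_draws):
    """Eq. (1.2.9): Monte Carlo estimate of E[y_t | Y_{t-1}]."""
    return (beta_draws @ x_tm1).mean()
```

Summing the logarithms of these one-step terms over the out-of-sample period gives the log predictive likelihood in (1.2.7); for the CPHAR and MHAR specifications, the draws of $S^{(i)}$, $K^{(i)}$ and $K^{SV(i)}$ would enter the density evaluation in the same averaging step.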

1.3. Realized Volatility

Assume that the price process belongs to the class of special semi-martingales, which is a very broad class of processes including Ito and jump processes. Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002a, b) show that the quadratic variation of the process, which is defined as integrated volatility plus a jump component, provides a natural measure of ex-post volatility. Hence, consider the following logarithmic price process

$$dp(t) = \mu(t)\,dt + \sigma(t)\,dW(t) + J(t)\,dq(t), \quad 0 \le t \le T, \qquad (1.3.1)$$

where $\mu(t)$ is the drift term, $\sigma(t)$ is the stochastic volatility process, $W(t)$ is a standard Wiener process, $dq(t)$ is a Poisson process with $dq(t) = 1$ corresponding to a jump at time $t$ and $dq(t) = 0$ corresponding to no jump, with jump intensity $\lambda(t)$, and $J(t)$ refers to the size of a realized jump. The increment in quadratic variation from time 0 to $t$ is

$$QV_t = \int_0^t \sigma^2(s)\,ds + \sum_{0 \le s \le t,\, dq(s)=1} J^2(s),$$

where the first term, integrated volatility, comes from the continuous component of equation (1.3.1), and the second term is the contribution from discrete jumps. To consider estimation of $QV_t$, the daily time interval is normalized to unity and divided into $n$ periods, each of length $\Delta = 1/n$. The $\Delta$-period return is defined as $r_{t,j} = p(t + j\Delta) - p(t + (j-1)\Delta)$, $j = 1, \dots, n$. The daily return is simply given as $r_t = \sum_{j=1}^{n} r_{t,j}$. Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002a, b) apply the following estimator, called realized volatility. It is defined as

$$RV_t = \sum_{j=1}^{n} r_{t,j}^2 \xrightarrow{p} QV_t.$$

Therefore, $RV_t$ is the relevant quantity to focus on with regard to modeling and forecasting volatility. Barndorff-Nielsen and Shephard (2004) also show how the continuous component can be separated from the jump component of volatility. They define realized bipower variation as

$$RBP_t = \mu_1^{-2}\sum_{j=2}^{n} |r_{t,j-1}||r_{t,j}|,$$

where $\mu_1 = \sqrt{2/\pi}$. As $n \to \infty$, we have that

$$RBP_t \xrightarrow{p} \int_0^t \sigma^2(s)\,ds.$$

The difference between $RV_t$ and $RBP_t$ is an estimate of the daily jump component. Market microstructure dynamics contaminate the price process with noise. In some instances the noise can be time dependent and may be correlated with the efficient price. Hence, $RV_t$ can be a biased and inconsistent estimator of $QV_t$. Hansen and Lunde (2006) provide a bias correction to realized volatility by using autocovariances of intraday returns in the following way:

$$RV_t^q = \sum_{j=1}^{n} r_{t,j}^2 + 2\sum_{w=1}^{q}\left(1 - \frac{w}{1+q}\right)\sum_{j=1}^{n-w} r_{t,j}r_{t,j+w},$$

where $q$ is a small non-negative integer; we set $q = 1$ in this paper. Market microstructure noise also contaminates bipower variation. We follow Andersen et al. (2007) and adjust bipower variation by using staggered returns as

$$RBP_t = \frac{\pi}{2}\,\frac{n}{n-2}\sum_{j=3}^{n} |r_{t,j-2}||r_{t,j}|.$$

In the following, $RV_t^q$ is referred to as $RV_t$ and the staggered version of bipower variation is referred to as $RBP_t$.
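These estimators translate directly into code. A sketch, assuming r holds one day of intraday 5-minute log-returns (the function name is ours, not from the thesis):

```python
import numpy as np

def realized_measures(r, q=1):
    """RV, the Hansen-Lunde bias-corrected RV^q, the staggered bipower
    variation RBP and the jump component J = max(0, RV - RBP) for one
    day of intraday log-returns r (length n)."""
    r = np.asarray(r)
    n = len(r)
    rv = np.sum(r ** 2)
    # Bartlett-weighted autocovariance correction (q = 1 in the paper)
    rv_q = rv + 2.0 * sum((1.0 - w / (1.0 + q)) * np.sum(r[: n - w] * r[w:])
                          for w in range(1, q + 1))
    # staggered bipower variation: skip one return to mitigate noise
    rbp = (np.pi / 2.0) * (n / (n - 2.0)) * np.sum(np.abs(r[:-2]) * np.abs(r[2:]))
    jump = max(0.0, rv_q - rbp)
    return rv_q, rbp, jump
```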

1.4. Model

An important feature of the time series of RV is the strong serial dependence, see for instance Andersen et al. (2003), Corsi (2009), Koopman et al. (2005) and Ghysels et al. (2006). Corsi (2009) shows that the heterogeneous autoregressive (HAR) model can capture the strong persistence in the data with a simple linear structure. The HAR model provides a flexible way to model and forecast realized volatility. In the framework of Corsi (2009), partial volatility is defined as the volatility generated by a certain market component. The model is then an additive cascade of different partial volatilities, see for instance Muller et al. (1997). By straightforward recursive substitution of the partial volatilities, Corsi (2009) shows that the additive volatility cascade leads to a simple restricted linear autoregressive model. The HAR model can approximate many of the features of realized volatility, including long memory. Our benchmark HAR model is given as

$$y_{t,h} = \beta_0 + \beta_J J_{t-1} + \beta_d y_{t-1,1} + \beta_w y_{t-5,5} + \beta_m y_{t-22,22} + \varepsilon_{t,h}, \quad \varepsilon_{t,h} \sim N(0, \sigma^2), \qquad (1.4.1)$$

where $y_{t,h} = h^{-1}\sum_{i=1}^{h} RV_{t+i-1}$ is the average realized volatility $h \ge 1$ periods ahead. Evidently, $y_{t,1} = y_t$. This model postulates that three factors affect $y_{t,h}$: daily volatility, $y_{t-1,1}$, weekly volatility, $y_{t-5,5}$, and monthly volatility, $y_{t-22,22}$. For $h = 1$ we have a HAR model for daily volatility, for $h = 5$ a HAR model for weekly volatility, and so forth. Following Andersen et al. (2007), we also include a jump term in (1.4.1), $J_t = \max\{0, RV_t - RBP_t\}$. The HAR model with a jump component can be cast into a standard regression form, $y_t = X_{t-1}\beta + \varepsilon_t$, where $y_t = y_{t,h}$, $\varepsilon_t = \varepsilon_{t,h}$, $\beta = (\beta_0, \beta_J, \beta_d, \beta_w, \beta_m)'$ and $X_{t-1} = [1, J_{t-1}, y_{t-1,1}, y_{t-5,5}, y_{t-22,22}]$. Finally, besides (1.4.1), we can also estimate structural break versions of the HAR model using the techniques from Section 1.2.
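To make the construction of $y_{t,h}$ and $X_{t-1}$ concrete, a sketch (hypothetical helper; the OLS fit on fake data stands in for the Bayesian estimation actually used in the paper):

```python
import numpy as np

def har_design(y, J, h=1):
    """Build the target y_{t,h} (average volatility over the next h days)
    and the regressors X_{t-1} = [1, J_{t-1}, y_{t-1,1}, y_{t-5,5}, y_{t-22,22}]."""
    T = len(y)
    rows, targets = [], []
    for t in range(22, T - h + 1):
        rows.append([1.0, J[t - 1], y[t - 1],
                     y[t - 5:t].mean(),          # weekly component y_{t-5,5}
                     y[t - 22:t].mean()])        # monthly component y_{t-22,22}
        targets.append(y[t:t + h].mean())        # y_{t,h}
    return np.asarray(targets), np.asarray(rows)

# OLS fit of the benchmark HAR model on simulated placeholder data
rng = np.random.default_rng(4)
y = np.abs(rng.standard_normal(1000)) * 0.2
J = np.zeros_like(y)
target, X = har_design(y, J, h=1)
beta_hat, *_ = np.linalg.lstsq(X, target, rcond=None)
```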

1.5. Data

The data consist of high-frequency observations of trades on the S&P 500 index via the Spyder (SPY) fund, and on Boeing (BA), Bank of America (BAC), Caterpillar (CAT), General Electric (GE), IBM, Johnson & Johnson (JNJ), JP Morgan (JPM), Pepsi (PEP), Walmart (WMT) and Exxon (XOM), from January 2, 2004 to December 31, 2009, for a total of 1511 trading days.

The cleaning of the data is carried out using the steps in Barndorff-Nielsen et al. (2009). After cleaning, a 5-minute grid from 9:30 to 16:00 is constructed using the previous-tick method, see Hansen and Lunde (2006). From this grid, 5-minute intraday log-returns are constructed. The log-returns are then used to construct realized volatility and realized bipower variation. Conditioning on the first 22 observations, the final data consist of T = 1488 observations. We define $y_{t,h}$ as $\sqrt{252\,RV_{t,h}}/100$. Table 1.2 presents summary statistics for $y_t$.

Table 1.2.: Summary statistics

Series   mean    median   std. dev.   min     max
BA       0.225   0.130    0.189       0.069   1.319
BAC      0.303   0.365    0.152       0.048   3.646
CAT      0.266   0.167    0.212       0.081   1.609
GE       0.222   0.203    0.151       0.041   1.977
IBM      0.181   0.116    0.148       0.045   1.309
JNJ      0.138   0.084    0.118       0.035   1.266
JPM      0.285   0.262    0.178       0.056   2.426
PEP      0.159   0.094    0.137       0.052   1.514
SPY      0.136   0.108    0.103       0.028   1.349
WMT      0.184   0.099    0.160       0.051   1.414
XOM      0.206   0.125    0.176       0.057   1.996

Summary statistics for $y_t$, January 2, 2004 to December 31, 2009. In total 1510 observations.


1.6. Results

1.6.1. Priors

For the HAR, CPHAR and MHAR models, estimation and forecasting are performed conditional on the following priors:

HAR:
$$\beta \sim N(0, 100I), \quad \sigma^2 \sim IG\!\left(\frac{4}{2}, \frac{0.2}{2}\right)$$

CPHAR:
$$\beta_j \sim N(0, 100I), \quad \sigma_j^2 \sim IG\!\left(\frac{4}{2}, \frac{0.2}{2}\right), \quad p_j \sim Beta(20, 0.1), \quad j = 1, \dots, m, \quad p_m = 1$$

MHAR:
$$q_i^2 \sim IG\!\left(\frac{4}{2}, \frac{0.2}{2}\right), \quad \sigma^2 \sim IG\!\left(\frac{4}{2}, \frac{0.2}{2}\right), \quad \pi_i \sim Beta(0.5, 37), \quad i = d, w, m,$$

where $IG(\frac{\cdot}{2}, \frac{\cdot}{2})$ stands for the inverse Gamma density, with $E[\sigma_j^2] = E[\sigma^2] = E[q_i^2] = 0.10$, see Kim and Nelson (1999). In this setting, the priors on $\beta_j$, $\sigma_j^2$, $\sigma^2$ and $q_i^2$ are uninformative, while the priors on $p_j$ and $\pi_i$ tend to favor infrequent structural breaks. In Section 1.7, we conduct a prior sensitivity analysis for CPHAR as well as for MHAR. Overall, we see that results are slightly sensitive to the prior settings on $\pi$ for MHAR, whereas results are very robust with respect to different hyperparameter values on $P$ for CPHAR.
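As a quick sanity check on these hyperparameters, the sketch below draws from the stated priors, using the shape/rate convention under which $E[\sigma^2] = (0.2/2)/((4/2)-1) = 0.10$ (Python for illustration only):

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = 4.0, 0.2
# sigma^2 ~ IG(a/2, b/2) is the reciprocal of a Gamma(shape=a/2, rate=b/2)
sig2 = 1.0 / rng.gamma(shape=a / 2.0, scale=2.0 / b, size=100_000)
pi_i = rng.beta(0.5, 37.0, size=100_000)   # MHAR break-probability prior
p_j = rng.beta(20.0, 0.1, size=100_000)    # CPHAR staying-probability prior
print(round(float(sig2.mean()), 2))        # close to the implied E[sigma^2] = 0.10
print(round(float(pi_i.mean()), 4))        # about 0.0133: breaks are rare a priori
print(round(float(p_j.mean()), 4))         # about 0.995: regimes are long-lived
```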

Table 1.3.: Change-point dates based on the full sample

Series   # CP   Dates
BA       2      07-25-07  09-11-08
BAC      3      02-26-07  10-30-07  09-11-08
CAT      2      07-17-07  09-02-08
GE       3      05-14-04  06-18-07  09-08-08
IBM      4      07-18-07  03-24-08  09-05-08  12-09-08
JNJ      2      12-31-07  09-05-08
JPM      2      07-18-07  09-02-08
PEP      2      07-18-07  09-15-08
SPY      4      02-27-07  03-19-08  09-08-08  12-29-08
WMT      2      09-29-06  09-09-08
XOM      4      07-19-07  02-12-08  09-08-08  12-08-08

This table reports the change-point dates for each series. The first column lists the volatility series, the second the number of change points (CP) that are conditioned on, and the third the change-point dates, defined as the first observation of a new regime using the mode of $\{S^{(i)}\}_{i=1}^{N}$. Our sample period runs from February 2, 2004 to December 31, 2009 (1488 observations).


Finally, besides these models, we also estimate an MHAR model with time-varying volatility, see Section 1.2.4. Henceforth, we refer to this model as MHAR-SV. For this model, we choose the same prior hyperparameter values as for the MHAR model for $q_i^2$ and $\pi_i$. With regard to the additional parameters, we let $\pi^{SV} \sim Beta(0.5, 42)$ and $\sigma_\eta^2 \sim IG(4/2, 10/2)$. We also experiment with different hyperparameter values on $\pi^{SV}$ and $\sigma_\eta^2$. However, we do not find any significant changes worth mentioning.

1.6.2. Full sample estimation

We estimate daily CPHAR, MHAR and MHAR-SV models using the volatility series listed in Table 1.2. This gives us a better understanding of the nature and dates of possible breaks.

For each series, we estimate the CPHAR model with 0 to 7 change points. We relax the homoskedasticity assumption of Section 1.2.1 and incorporate structural breaks in σ2 as well as in β, see Section 1.2.4. We then choose the change-point specification with the highest marginal likelihood value. With regards to the MHAR specifications, we find that βdt, βwt and βmt capture the structural changes in yt, as the β0t and βJt estimates are basically constant through time. Therefore, we estimate the intercept and the jump component as constant coefficients. Hence, the MHAR (MHAR-SV) model is respecified as

yt,h = β0 + βJ Jt−1 + βdt yt−1,1 + βwt yt−5,5 + βmt yt−22,22 + εt,h,

where yt,h = √(252 RVt,h)/100. Estimating β0 and βJ within a Gibbs sampling scheme is straightforward, as we can sample these parameters from their Gaussian conditional posterior using the sampled draws of the time-varying regression parameters and the conditional variance.
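As an illustration, the following is a minimal sketch of this conditional draw, assuming Normal priors with variance 100 on β0 and βJ as in Section 1.6.1. All names are illustrative, and tvp_part stands in for the current draws of the time-varying HAR terms produced by the other Gibbs blocks:

```python
import numpy as np

def draw_constant_coefs(y, J_lag, tvp_part, sigma2, prior_var=100.0):
    """Draw (beta_0, beta_J) from their Gaussian conditional posterior.

    tvp_part[t] holds the current draw of the time-varying HAR terms,
    beta_dt*y_{t-1,1} + beta_wt*y_{t-5,5} + beta_mt*y_{t-22,22}, and
    sigma2 is the current draw of the conditional variance.
    """
    resid = y - tvp_part                        # data net of the time-varying terms
    X = np.column_stack([np.ones_like(y), J_lag])
    V = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / prior_var)
    m = V @ (X.T @ resid / sigma2)              # conditional posterior mean
    return np.random.multivariate_normal(m, V)
```

This is the standard conjugate Normal regression step; for MHAR-SV, sigma2 would be replaced by the time-varying conditional variance path.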

In Table 1.3 we report the structural break dates found by the daily CPHAR models using the complete sample. In our calculations, break dates are defined as the first observation of the new regime, using the mode of the posterior draws of S, {S(i), i = 1, ..., N}. For most of the series, we find evidence in favor of 2 change points; for IBM, SPY and XOM, we find evidence in favor of 4 change points. Overall, there is clear evidence that our realized volatility series are subject to structural breaks: all of the series have at least one structural break when modeled using the CP specification. Furthermore, for all series, we find essentially no posterior uncertainty regarding the number of change points.

The next questions are how large the parameter changes are when breaks occur and which parameters are most affected. Table 1.4 reports posterior mean and standard deviation estimates of the parameters of the CPHAR equations for BAC, IBM and SPY; we obtain similar results for the other series. Focusing on these series, we observe that the most sensitive parameters are βd, βw and σ2. As expected, σ2 increases during the financial crisis of 2008. For IBM and SPY, the posterior estimate of σ2 decreases substantially in the last regime, which runs from December 2008 until the end of the sample. In general, changes in the regression parameters, β, are less spectacular than changes in σ2. For IBM and SPY, βd and βw change, while β0 and βJ basically remain constant across regimes. We picture the change-point dates for BAC, IBM and SPY in Figure 1.1.

We do not obtain exact change-point dates or point estimates of βdt, βwt and βmt for the MHAR

specifications. Instead, we plot the structural break probabilities, κdt, κwt, κmt, along with estimates

of βdt, βwt and βmt, for t = 1, ..., T , using the mean of the Gibbs sampler draws. Results for BAC,

IBM and SPY are given in Figures 1.2 to 1.7. Overall, the break probabilities and the estimates


of the regression coefficients clearly confirm structural breaks during the fall of 2008 and the beginning of 2009. The behavior of K for BAC and JPM, the only two bank series, is very similar, while for the remaining series the behavior of K closely resembles that of IBM and SPY. On the other hand, the magnitude and the direction of the changes in β tend to differ across assets. Therefore, we cannot confirm any general pattern in the changes in β for the series that we consider.

When we allow for structural breaks in the volatility of realized volatility, the behavior of K shows that the regression coefficients behave more smoothly, with more frequent and smaller changes in the beginning of the sample for BAC and JPM. Furthermore, we clearly observe an almost linear increase in the level of σ2 from the beginning of 2007 for all of the series, followed by a gradual decline from the beginning of 2009 until the end of the sample.

Table 1.4.: Parameter estimates, CPHAR model

Posterior mean (standard deviation)

Regime   β0              βJ              βd             βw             βm              σ2

BAC
1        0.234 (0.011)   -0.190 (0.137)  0.395 (0.046)  0.198 (0.070)   0.084 (0.081)  0.009 (0.001)
2        0.273 (0.012)   -0.467 (0.210)  0.492 (0.084)  0.297 (0.130)   0.057 (0.120)  0.038 (0.004)
3        0.312 (0.011)   -0.382 (0.155)  0.652 (0.075)  0.265 (0.101)  -0.030 (0.078)  0.174 (0.017)
4        0.315 (0.026)   -0.200 (0.187)  0.433 (0.066)  0.321 (0.101)   0.189 (0.087)  1.032 (0.079)

IBM
1        0.190 (0.006)   -0.313 (0.112)  0.208 (0.045)  0.498 (0.069)  -0.063 (0.079)  0.011 (0.001)
2        0.230 (0.006)   -0.606 (0.147)  0.438 (0.089)  0.195 (0.157)   0.131 (0.156)  0.055 (0.006)
3        0.210 (0.006)   -0.029 (0.244)  0.319 (0.115)  0.386 (0.198)  -0.020 (0.173)  0.018 (0.002)
4        0.3469 (0.056)  -0.263 (0.328)  0.224 (0.139)  0.418 (0.242)   0.008 (0.189)  0.339 (0.062)
5        0.217 (0.003)   -0.320 (0.181)  0.349 (0.079)  0.489 (0.113)   0.081 (0.078)  0.021 (0.001)

SPY
1        0.137 (0.003)   -0.229 (0.135)  0.231 (0.069)  0.480 (0.107)   0.049 (0.074)  0.004 (0.001)
2        0.155 (0.004)   -0.298 (0.215)  0.553 (0.083)  0.219 (0.123)   0.062 (0.132)  0.025 (0.004)
3        0.149 (0.003)   -0.001 (0.223)  0.497 (0.107)  0.309 (0.144)  -0.103 (0.104)  0.007 (0.001)
4        0.260 (0.053)   -0.172 (0.334)  0.279 (0.132)  0.415 (0.215)  -0.045 (0.180)  0.309 (0.055)
5        0.151 (0.003)   -0.197 (0.151)  0.340 (0.081)  0.484 (0.098)   0.152 (0.094)  0.015 (0.001)

This table reports posterior means and standard deviations (in parentheses) of the model parameters from the preferred CPHAR model. Our sample period runs from February 2, 2004 to December 31, 2009.


Figure 1.1.: Change-point dates, CPHAR model
[Three panel pairs (BAC, IBM, SPY), 2004-2010. Left: annual realized volatility. Right: change-point dates.]


Figure 1.2.: Posterior estimates, MHAR model, BAC
[Six panels, 2004-2010. Left: posterior structural break probabilities, κdt, κwt, κmt. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt.]


Figure 1.3.: Posterior estimates, MHAR model, IBM
[Six panels, 2004-2010. Left: posterior structural break probabilities, κdt, κwt, κmt. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt.]


Figure 1.4.: Posterior estimates, MHAR model, SPY
[Six panels, 2004-2010. Left: posterior structural break probabilities, κdt, κwt, κmt. Right: posterior estimates of the regression coefficients, βdt, βwt and βmt.]


Figure 1.5.: Posterior estimates, MHAR-SV model, BAC
[Eight panels, 2004-2010. Left: posterior structural break probabilities, κdt, κwt, κmt, κSVt. Right: posterior estimates of the regression coefficients, βdt, βwt, βmt, and the time-varying conditional variance, σ2t.]


Figure 1.6.: Posterior estimates, MHAR-SV model, IBM
[Eight panels, 2004-2010. Left: posterior structural break probabilities, κdt, κwt, κmt, κSVt. Right: posterior estimates of the regression coefficients, βdt, βwt, βmt, and the time-varying conditional variance, σ2t.]


Figure 1.7.: Posterior estimates, MHAR-SV model, SPY
[Eight panels, 2004-2010. Left: posterior structural break probabilities, κdt, κwt, κmt, κSVt. Right: posterior estimates of the regression coefficients, βdt, βwt, βmt, and the time-varying conditional variance, σ2t.]

1.6.3. Forecasts

In this section, we briefly explain how we forecast with the different structural break models. There-

after, we present results for forecasting daily (h = 1), weekly (h = 5) and biweekly (h = 10) realized

volatility using the direct method of forecasting, see Marcellino et al. (2005), and Liu and Maheu

(2009)5. In general, we carry out a forecasting exercise for a specific out-of-sample period. This means

that we first estimate the models with an initial sample and forecast. We then add one data point,

estimate and forecast again, until all of the out-of-sample data has been used. The following is a list of the forecasting models used in this paper along with their acronyms:

5 Liu and Maheu (2009) take a very similar approach to forecasting realized volatility within a Bayesian model averaging context. Furthermore, they justify using the direct method of forecasting both for forecasts based on the predictive likelihood and for forecasts based on the predictive mean.


1. M1: HAR: constant parameter HAR model.

2. M2: CPHAR: structural break HAR model using the specification of Chib (1998).

3. M3: MHAR: mixture innovation HAR model.

4. M4: MHAR-SV: MHAR model with structural breaks in the volatility of realized volatility.

5. M5: TVPHAR: random walk time-varying parameter HAR model. This is a standard time-varying parameter model in which βit = βit−1 + ηit, for i = d, w, m. As before, we estimate the intercept and the jump component as constant coefficients. This specification assumes typically small and gradual breaks in βt. Finally, notice that TVPHAR is the restricted version of MHAR obtained by setting κdt = κwt = κmt = 1, for t = 1, ..., T; see the sketch below.
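To make this nesting concrete, here is a minimal simulation sketch of the mixture innovation dynamics; the function name and default values are illustrative only:

```python
import numpy as np

def simulate_mixture_innovation(T, pi, q, beta0=0.3, seed=0):
    """Simulate beta_t = beta_{t-1} + kappa_t * eta_t with
    kappa_t ~ Bernoulli(pi) and eta_t ~ N(0, q^2). Setting pi = 1
    recovers the TVPHAR random walk; pi = 0 gives a constant coefficient."""
    rng = np.random.default_rng(seed)
    kappa = (rng.random(T) < pi).astype(float)   # break indicators
    beta = beta0 + np.cumsum(kappa * rng.normal(0.0, q, T))
    return beta, kappa
```

With a small pi, the path is flat apart from a few discrete shifts, which is exactly the infrequent-break behavior MHAR is designed to capture.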

Tables 1.5 to 1.7 present results of our forecasting exercise for h = 1, h = 5 and h = 10, respectively.

For each series listed in Table 1.2, we choose the out-of-sample period from February 2, 2008 to

December 31, 2009, for a total of 483 observations. We first estimate the models using the initial

sample and forecast. Then, we add one data point, update and forecast again, until the end of the

out-of-sample data. This strategy works for HAR, MHAR, MHAR-SV and TVPHAR specifications

as we do not need to specify the number of structural breaks over the out-of-sample data.
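Schematically, this expanding-window exercise can be sketched as follows; fit and predict are hypothetical stand-ins for the model-specific MCMC estimation and forecasting steps:

```python
import numpy as np

def recursive_forecast(y, t0, fit, predict, h=1):
    """Expanding-window direct forecasts: re-estimate on y[:t], forecast
    y[t + h - 1], then extend the sample by one observation. Returns the
    cumulative log predictive likelihood and the out-of-sample RMSE."""
    log_pl, sq_err = 0.0, []
    for t in range(t0, len(y) - h + 1):
        draws = fit(y[:t])                     # posterior draws given data up to t
        lp, mean = predict(draws, y[:t], h)    # direct h-step-ahead forecast
        log_pl += lp
        sq_err.append((y[t + h - 1] - mean) ** 2)
    return log_pl, float(np.sqrt(np.mean(sq_err)))
```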

In the context of forecasting with CPHAR, we follow Bauwens et al. (2011) and proceed as follows: for the first out-of-sample observation at time t, we calculate the marginal likelihood for various numbers of change points, (0, ..., K∗), where K∗ ∈ {1, 2, ...}, using Yt−1. Thereafter, we choose the optimal change-point number, K∗t−1, as the specification with the highest ML. We calculate the predictive likelihood, p(yt | Yt−1, M2), and the predictive mean, E[yt | Yt−1, M2], using the parameters associated with specification K∗t−1. Thereafter, we add one data point, calculate marginal likelihoods for (0, ..., K∗t−1 + 1) change points, choose the optimal change-point number, K∗t, and repeat the above forecasting procedure to obtain p(yt+1 | Yt, M2) and E[yt+1 | Yt, M2]. Thus, we allow the optimal change-point number to vary over time, as the number of regimes can increase as time goes by.
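A sketch of this CPHAR variant of the recursive exercise follows; estimate_ml and predict_cp are hypothetical stand-ins for the marginal likelihood and predictive computations of Sections 1.2.5 and 1.2.6:

```python
import numpy as np

def cphar_recursive_forecast(y, t0, k_init, estimate_ml, predict_cp):
    """Recursive CPHAR forecasting with a growing change-point count.

    At each origin t, compute the marginal likelihood for 0, ..., k_opt + 1
    change points on the data up to t, pick the specification with the
    highest ML, and forecast under it."""
    k_opt, results = k_init, []
    for t in range(t0, len(y)):
        mls = [estimate_ml(y[:t], k) for k in range(k_opt + 2)]
        k_opt = int(np.argmax(mls))            # K*_t: highest-ML specification
        results.append(predict_cp(y[:t], k_opt))
    return results
```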

We report the logarithm of PBF for M2, ...,M5 versus M1 and the ratio of RMSE for M2, ...,M5

over M1 for the out-of-sample data. Overall, we see that structural break specifications outperform

the HAR model both in terms of the predictive likelihood and in terms of point forecasts, especially as the forecast horizon lengthens. For instance, for SPY, the log (PBF) in favor of MHAR over HAR is 50.54 for

h = 1, 60.64 for h = 5 and 72.54 for h = 10 (64.58 for h = 1, 62.84 for h = 5 and 74.50 for h = 10 for

CPHAR). With regards to point forecasts, we find that MHAR and CPHAR on average outperform

the HAR model by 5% to 10% for h = 1 and by 10% to 25% for h = 5 and h = 10. The TVPHAR model

outperforms the HAR model in terms of the predictive likelihood, regardless of forecast horizon. On

the other hand, TVPHAR performs slightly worse than HAR in terms of point forecasts for h = 1.

We also compare results between models that allow for structural breaks. Here, we get very inter-

esting results. For example, when we compare CPHAR with MHAR, we find that for some series and

forecast horizons the CPHAR model performs better, while for other series the MHAR model per-

forms better. For instance, we find that MHAR outperforms CPHAR, regardless of forecast horizon

or criterion for BAC. Compared to CPHAR, the log (PBF) in favor of MHAR is 5.29 for h = 1, 15.08

for h = 5 and 29.61 for h = 10. In terms of point forecasts, compared to CPHAR, we see a reduction

of 20% for h = 1, 24% for h = 5 and 26% for h = 10 in RMSE when we use the MHAR model. On


the other hand, we see that CPHAR outperforms MHAR by 8.14 for h = 1, 1.87 for h = 5 and 4.98

for h = 10 in terms of log (PBF) for XOM. Furthermore, these models tend to perform better than

TVPHAR in terms of the predictive likelihood, regardless of forecast horizon. However, TVPHAR

performs relatively better, especially in terms of point forecasts for h = 5 and h = 10. The difference

between the predictive likelihood and the predictive mean is that the predictive likelihood criterion

takes into account the whole shape of the predictive density, whereas the predictive mean does not.

Finally, the MHAR-SV model tends to dominate its homoskedastic counterpart as well as the

CPHAR model. Density forecasts show the most improvements, while point forecasts often show

only modest gains over MHAR or CPHAR. For example, MHAR-SV improves upon MHAR with

increases in the log(PL) of 7.17 for h = 1, 1.07 for h = 5 and 4.01 for h = 10 for SPY.

Table 1.5.: Out-of-Sample forecast results, RVt,h, h = 1

                      log(PBF)                               RMSE
Series   CPHAR    MHAR    MHAR-SV   TVPHAR    CPHAR   MHAR   MHAR-SV   TVPHAR
BA       20.09    17.52   19.23     17.51     0.97    0.98   0.97      1.03
BAC      90.14    95.44   101.27    92.62     0.97    0.78   0.77      0.83
CAT      26.36    22.76   26.23     17.90     0.96    1.00   0.98      1.01
GE       81.62    75.51   83.02     74.00     0.96    0.89   0.88      0.91
IBM      36.77    26.88   32.38     24.63     0.94    0.97   0.96      1.01
JNJ      28.68    36.35   41.68     35.83     0.95    0.86   0.89      1.01
JPM      56.06    62.00   65.36     56.68     0.97    0.86   0.85      0.90
PEP      42.01    47.43   52.68     49.19     0.96    0.88   0.87      0.94
SPY      64.58    50.54   57.72     47.93     0.93    0.91   0.90      1.00
WMT      19.12    27.06   35.62     27.31     0.95    0.86   0.86      0.95
XOM      47.61    39.47   45.67     37.73     0.93    0.89   0.88      0.99

This table reports the log-predictive Bayes factor, log(PBF), of the model of interest versus the HAR model, and the ratio of the out-of-sample root mean squared error, RMSE, of the predictive mean of the model of interest over that of the HAR model. The out-of-sample period is from February 2, 2008 to December 31, 2009.

Table 1.6.: Out-of-Sample forecast results, RVt,h, h = 5

                      log(PBF)                               RMSE
Series   CPHAR    MHAR     MHAR-SV   TVPHAR    CPHAR   MHAR   MHAR-SV   TVPHAR
BA       42.80    43.25    41.97     38.92     0.86    0.88   0.85      0.89
BAC      110.81   125.89   126.28    110.12    0.91    0.69   0.67      0.81
CAT      47.09    50.31    48.67     46.19     0.85    0.87   0.86      0.87
GE       114.89   117.22   114.02    105.46    0.81    0.80   0.77      0.96
IBM      43.69    44.08    49.02     38.29     0.89    0.92   0.91      0.85
JNJ      58.30    53.73    64.30     50.34     0.87    0.89   0.84      1.02
JPM      91.58    96.45    99.46     86.80     0.79    0.75   0.71      0.91
PEP      65.13    63.61    70.49     58.66     0.90    0.85   0.84      1.04
SPY      62.84    60.64    61.71     59.08     0.88    0.93   0.87      0.93
WMT      49.99    47.73    50.56     43.12     0.87    0.84   0.82      1.07
XOM      55.79    53.92    59.26     49.13     0.90    0.85   0.80      0.89

This table reports the log-predictive Bayes factor, log(PBF), of the model of interest versus the HAR model, and the ratio of the out-of-sample root mean squared error, RMSE, of the predictive mean of the model of interest over that of the HAR model. The out-of-sample period is from February 2, 2008 to December 31, 2009.


Table 1.7.: Out-of-Sample forecast results, RVt,h, h = 10

                      log(PBF)                               RMSE
Series   CPHAR    MHAR     MHAR-SV   TVPHAR    CPHAR   MHAR   MHAR-SV   TVPHAR
BA       43.70    55.75    55.53     46.81     0.80    0.80   0.81      0.75
BAC      120.79   150.40   144.17    127.49    0.85    0.63   0.64      0.72
CAT      64.62    72.21    64.59     67.34     0.77    0.79   0.77      0.78
GE       128.75   135.01   124.37    115.61    0.71    0.70   0.72      0.85
IBM      55.58    57.53    57.30     54.22     0.83    0.85   0.82      0.73
JNJ      66.98    68.59    79.90     62.66     0.82    0.82   0.79      1.10
JPM      96.42    107.35   107.19    84.41     0.72    0.68   0.68      0.80
PEP      72.52    72.06    78.91     65.56     0.86    0.83   0.80      1.09
SPY      74.50    72.54    76.54     68.31     0.81    0.86   0.84      0.79
WMT      56.41    58.51    64.77     51.20     0.88    0.82   0.81      0.97
XOM      68.19    63.20    69.12     60.65     0.86    0.77   0.72      0.80

This table reports the log-predictive Bayes factor, log(PBF), of the model of interest versus the HAR model, and the ratio of the out-of-sample root mean squared error, RMSE, of the predictive mean of the model of interest over that of the HAR model. The out-of-sample period is from February 2, 2008 to December 31, 2009.

1.7. Prior Sensitivity Analysis

1.7.1. CPHAR

In this section, the sensitivity of the results to the prior specification is evaluated by investigating alternative prior hyperparameter values on the transition probabilities, pj ∼ Beta(a0, b0), j = 1, ..., m − 1, keeping the priors on the other parameters the same as in Section 1.6.1. This parameter is particularly important because it controls the duration of each regime. Models with different hyperparameter values on βj and σ2j are also estimated; the results are very similar to those given in Tables 1.5 to 1.7.

We repeat the forecasting exercise of Section 1.6.3 using the SPY data, while experimenting with different values of a0 and b0. Results are reported in Table 1.8. The first alternative prior, pj ∼ Beta(0.1, 0.1), is relatively flat, while the last, pj ∼ Beta(100, 0.1), is very tight: under pj ∼ Beta(100, 0.1), we assume a priori, before seeing the data, that the expected duration of each regime is about 1000 days. As before, the results overwhelmingly suggest the existence of structural breaks over the out-of-sample data, as log(PBF M2,M1) > 0. Furthermore, the choice of prior hyperparameter values on P is of limited importance.

Table 1.8.: Prior sensitivity analysis, CPHAR model, SPY

                             log(PBF)                  RMSE
                      h = 1   h = 5   h = 10    h = 1   h = 5   h = 10
pj ∼ Beta(0.1, 0.1)   64.28   62.71   72.60     0.93    0.88    0.81
pj ∼ Beta(8, 0.1)     64.13   62.72   73.27     0.94    0.88    0.82
pj ∼ Beta(10, 0.1)    64.06   65.00   72.60     0.94    0.88    0.81
pj ∼ Beta(8, 2)       63.70   63.89   72.61     0.94    0.88    0.81
pj ∼ Beta(20, 2)      64.57   65.10   72.60     0.93    0.88    0.81
pj ∼ Beta(100, 0.1)   64.56   65.16   74.70     0.93    0.88    0.81

This table reports the log-predictive Bayes factor, log(PBF), of the model of interest versus the HAR model, and the ratio of the out-of-sample root mean squared error, RMSE, of the predictive mean of the model of interest over that of the HAR model. The results are for the CPHAR model considering six different prior hyperparameter values on P. The out-of-sample period is from February 2, 2008 to December 31, 2009.


1.7.2. MHAR

We consider the sensitivity of the results to the prior hyperparameter values on πi, where πi ∼ Beta(ai0, bi0), for i = d, w, m. The forecasting exercise of Section 1.6.3 is repeated using the SPY data, experimenting with different prior hyperparameter values on πi while keeping the other hyperparameter values the same as in Section 1.6.1. Results are presented in Table 1.9. Overall, these different prior hyperparameter values yield fairly similar results. However, for πi ∼ Beta(2, 8), we obtain somewhat better results in terms of log(PBF) and RMSE for h = 10.

Table 1.9.: Prior sensitivity analysis, MHAR model, SPY

                              log(PBF)                  RMSE
                       h = 1   h = 5   h = 10    h = 1   h = 5   h = 10
πi ∼ Beta(0.5, 27)     50.58   60.87   73.80     0.88    0.93    0.86
πi ∼ Beta(0.5, 47)     50.54   61.24   73.22     0.89    0.93    0.86
πi ∼ Beta(0.5, 8)      50.04   61.34   73.96     0.90    0.93    0.86
πi ∼ Beta(2, 8)        49.12   61.79   71.69     0.90    0.92    0.82
πi ∼ Beta(0.5, 100)    50.45   60.23   73.35     0.91    0.93    0.87
πi ∼ Beta(0.5, 1000)   50.54   60.96   72.12     0.92    0.93    0.87

This table reports the log-predictive Bayes factor, log(PBF), of the model of interest versus the HAR model, and the ratio of the out-of-sample root mean squared error, RMSE, of the predictive mean of the model of interest over that of the HAR model. The results are for the MHAR model considering different prior hyperparameter values on π. The out-of-sample period is from February 2, 2008 to December 31, 2009.

1.8. Conclusion

In this paper we compare different forecasting procedures which allow for structural breaks in the

model parameters using realized volatility data. Our set of forecasting models is divided into three

groups: a constant parameter model, HAR; one which formally specifies the number of structural breaks,

CPHAR, and those which determine the nature of structural changes in the parameters using MIA

and random walk specifications.

The empirical application provides some interesting results. First, we add to the literature estab-

lishing the existence of structural breaks in realized volatility data. Second, our results show the

importance of using a forecasting method that allows for some sort of structural changes in the model

parameters. Furthermore, perhaps as expected, we cannot establish that there is one single forecasting

method that can be recommended universally. On the contrary, we find that for some series the MHAR

model outperforms the CPHAR model, whereas for other series the CPHAR model works better.

Finally, when we account for structural breaks in the volatility of realized volatility in the MHAR

model, we find that this specification tends to dominate its homoskedastic counterpart as well as

the CPHAR model. Density forecasts show the most improvements, while point forecasts show only

modest gains compared to competing specifications.


2. Long Memory and Structural Breaks in

Realized Volatility: An Irreversible Markov

Switching Approach

Author: Nima Nonejad

Abstract: We propose an ARFIMA (autoregressive fractionally integrated moving average) model

that is able to capture long memory and incorporate structural breaks in the model parameters. We

model structural breaks through irreversible Markov switching or so-called change-point dynamics.

Monte Carlo simulations demonstrate that our approach is effective in estimating the model parameters and in identifying and dating structural breaks. Applied to daily S&P 500 data, we find evidence of

four structural breaks. The evidence of structural breaks is robust to different specifications including

a GARCH specification for the conditional volatility of realized volatility.

Keywords: change points, Gibbs sampling, long memory, structural breaks

(JEL: C11, C22, C52, G10)


2.1. Introduction

Measuring and modeling volatility is a very important issue in many pricing and risk management

problems. Recently, a new observable measure of volatility, called realized volatility (RV), has been proposed. Realized volatility uses high-frequency information and has been shown to be an accurate estimate of ex-post volatility. RV is constructed from the sum of intraday squared returns, and converges to quadratic variation for a broad class of continuous-time models. Many empirical features of RV are well-documented in the recent literature; a detailed review is provided by McAleer and Medeiros (2008). One of the most well-known and relevant properties of

RV is the strong serial dependence, see Andersen et al. (2001) and Andersen et al. (2007). For

this reason, long-memory models such as the ARFIMA (autoregressive fractionally integrated moving

average) model have been applied to RV data. Other strategies to model serial dependencies for

RV have also been proposed. For instance, Barndorff-Nielsen and Shephard (2002a) model serial

dependencies for RV through a superposition of ARMA(1,1) processes, and Corsi (2009) has introduced

the Heterogeneous Autoregressive model (HAR) given by a combination of volatilities measured over

different time horizons.

In this paper we provide a Bayesian analysis of structural breaks in daily S&P 500 realized volatility.

We propose an ARFIMA model in which the level, the persistence and the volatility of realized volatility are subject to structural breaks. The analysis is based on an ARFIMA model that builds on the Hidden Markov Chain (HMC) formulation of the multiple change-point model

proposed by Chib (1998). Breaks are captured through an integer-valued state variable, st, that tracks

the regime from which a particular observation, yt, is drawn. st is modeled as a discrete first order

Markov process with a constrained transition probability matrix. At each point in time, st can either

remain in the current state or jump to the next state1.

We investigate specifications that allow all parameters, as well as only a subset of parameters, to change due to structural breaks. This allows one to isolate the impact of structural breaks on

individual parameters and use all data in estimation of parameters that are not affected by structural

breaks. Each change-point ARFIMA model is estimated conditional on 0, 1, ...,m breaks occurring.

For each of these specifications, the marginal likelihood (ML) and the deviance information criterion

(DIC) are calculated. They are then used to determine the number of change points. Specifically, we

can compare marginal likelihoods using Bayes factors, and use differences in DIC between different

specifications to compare models or to determine the number of structural breaks. It is important to

note that DIC can be considered as a compelling alternative to ML. Furthermore, calculation of DIC

in our MCMC scheme is trivial as the likelihood with st, t = 1, ..., T , integrated out is easily obtainable

using the algorithm of Chan (2013).

Our contributions in this paper are two-fold. First, we provide an efficient Markov chain Monte

Carlo sampling scheme to draw st, t = 1, ..., T , and the parameters within each regime, θk, k = 1, ...,m,

from their respective conditional posteriors. Furthermore, instead of using traditional approaches to

evaluate the likelihood function such as Chan and Palma (1998), we build upon previous works on

precision-based algorithms as in Chan and Jeliazkov (2009) and Chan (2013), using a direct approach

to evaluate the likelihood function. We believe that incorporating the precision-based algorithm of

1Many papers have studied testing for structural breaks, or directly modeling parameter change. See for instance,Andreou and Ghysels (2002), Ray and Tsay (2002) and Engle and Rangel (2005).


Chan and Jeliazkov (2009) and Chan (2013) along with the change-point specification of Chib (1998)

within the ARFIMA setting is the most important contribution that we provide. Furthermore, we also

conduct an extensive Monte Carlo experiment to investigate whether our methods work well in identifying the data generating parameters, the true structural break dates and the correct number of structural breaks.

With regards to the last point, we compare the ability of ML and DIC to detect the correct number

of structural breaks. Our simulations, based on empirically reasonable scenarios, show that ML and

DIC perform very well in identifying the true number of structural breaks. The higher the number

of parameters that are affected by a break, the more likely it is that structural breaks are correctly

identified. Finally, it becomes more difficult to identify one or more breaks when only the persistence

parameter, d, changes.

Empirical results for S&P 500 volatility provide strong evidence in favor of four structural breaks

based on data from January 2nd, 2000 to December 31st, 2009, for a total of 2515 trading days. The

effect of structural breaks is mainly confined to the conditional mean and variance with weaker evidence

that the persistence parameter is also subject to structural breaks. Finally, in order to investigate if

the existence of breaks is spurious due to neglected conditional variance dynamics, we also consider

breaks in an ARFIMA-GARCH model. Again, the evidence is strong in favor of structural breaks,

and the estimated change-point dates are close to those of the change-point ARFIMA model.

The structure of this paper is as follows. In Section 2.2 we present the change-point ARFIMA

model. Bayesian estimation techniques and model comparison methods are presented in Section 2.3.

Section 2.4 presents the Monte Carlo results. We briefly review the theory behind realized volatility

in Section 2.5. Section 2.6 contains the application to S&P 500 volatility, while Section 2.7 concludes. An

appendix explains how to evaluate the likelihood using the algorithm of Chan (2013).

2.2. Change-point ARFIMA Model

Consider the following ARFIMA model

yt = µ + (1 − L)−d εt,  εt ∼ N(0, σ2),   (2.2.1)

for t = 1, ..., T, where yt is the observation, L is the lag operator such that Lεt = εt−1, and d determines the long-memory property of yt. The fractional difference operator, (1 − L)−d, in (2.2.1) is defined as (1 − L)−d = Σ∞j=0 Γ(j + d)(Γ(j + 1)Γ(d))−1 Lj, where Γ(·) is the Gamma function. Equation (2.2.1) is a generalization of the moving average (MA) model to non-integer values of d. Specifically, if d > 0, the process is said to have long memory, since the autocorrelations die out at a hyperbolic rate. For 0 < d < 0.5, (2.2.1) is a stationary long-memory process with non-summable autocorrelation functions. For d = 0, we have yt = µ + εt.

There are many ways to estimate (2.2.1), see Beran (1994) and Robinson (2003). In this paper we focus on MCMC methods (in particular Gibbs sampling) for inference. We rely on the main idea of Chan and Palma (1998), who consider an approximation of (2.2.1) based on a truncation lag of order M and then compute the likelihood using the Kalman filter. However, instead of using the Kalman filter, we take a different approach to evaluating the likelihood function. Our approach extends previous work on precision-based algorithms, see Chan and Jeliazkov (2009) and Chan (2013). The method exploits the special structure of


(2.2.1), particularly that the covariance matrix of the joint distribution of YT = (y1, ..., yT)′ is sparse, i.e. it contains only a few non-zero elements. By exploiting the sparse structure of the covariance matrix of YT, we are able to develop a fast and simple method for evaluating the likelihood function.

Conditional on the model parameters, θ = (µ, d, σ2)′, and M, we can write (2.2.1) as YT = u + Hε, where u = µι, ι is a T × 1 vector of ones, ε = (ε1, ..., εT)′ ∼ N(0, SYT) with SYT = σ2IT, and H is a T × T lower triangular matrix with ones on the main diagonal and the truncated MA weights on the first M subdiagonals,

H = [ 1
      π1    1
      π2    π1    1
      ...         ...   ...
      πM    πM−1  ...   π1   1
      0     πM    πM−1  ...  π1   1
      ...         ...        ...  ...
      0     0     ...   πM   ...  π1   1 ],

where πj = Γ(j + d)/(Γ(j + 1)Γ(d)). Using the algorithm of Chan (2013), it is shown in the Appendix that p(YT | θ, M) has a closed form solution given as

log p(YT | θ, M) = −(T/2) log(2π) − (T/2) log(σ2) − (1/2)(YT − u)′ (ΩYT)−1 (YT − u),   (2.2.2)

where ΩYT = H SYT H′ and p(·) denotes the density of all random quantities.
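For intuition, a dense-matrix sketch of how (2.2.2) can be evaluated is given below; note that |ΩYT| = σ2T, since |H| = 1. The actual precision-based algorithm of Chan (2013) exploits the banded structure of H and is far more efficient, so this code is expository only:

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.special import gammaln

def arfima_loglik(y, mu, d, sigma2, M):
    """Evaluate log p(Y_T | theta, M) for the truncated ARFIMA(0, d, 0).

    Builds H with pi_j = Gamma(j + d) / (Gamma(j + 1) Gamma(d)) on the
    first M subdiagonals; since |H| = 1, |Omega| = sigma2**T and the
    quadratic form reduces to eps'eps / sigma2 with eps = H^{-1}(y - mu).
    """
    T = len(y)
    j = np.arange(1, min(M, T - 1) + 1)
    pi_j = np.exp(gammaln(j + d) - gammaln(j + 1) - gammaln(d))
    H = np.eye(T)
    for k, pk in enumerate(pi_j, start=1):
        H += np.diag(np.full(T - k, pk), -k)   # k-th subdiagonal
    eps = solve_triangular(H, y - mu, lower=True, unit_diagonal=True)
    return -0.5 * T * np.log(2 * np.pi * sigma2) - 0.5 * (eps @ eps) / sigma2
```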

Allowing for structural breaks in the parameters of (2.2.1) is straightforward using the change-point structure proposed by

Chib (1998). This specification uses a hidden Markov model with a restricted transition matrix to

model change points. A test for the number of breaks is then a test of the dimension of the hidden

Markov chain. θ and the unobserved states are jointly estimated conditional on a fixed number of

change points. ML and DIC are then used to compare evidence for the number of structural breaks.

In the following, we review a Gibbs sampling approach for the model in which all the elements in θ

are subject to structural breaks. Thereafter, we discuss restricted models in which some parameters

are restricted to be constant across structural breaks. Specifically, the restricted specifications allow

us to determine which parameters of the model are most likely affected by a structural break.

Assume that there are m − 1, m ∈ {1, 2, ...}, change points at unknown times, τ1, τ2, ..., τm−1. Separated by these change points, there are m different phases. The density of yt depends on θk = (µk, dk, σ2k)′, k = 1, 2, ..., m, whose value changes at the change points, τ1, τ2, ..., τm−1, and remains constant otherwise:

θt = θ1    if t < τ1,
     θ2    if τ1 ≤ t < τ2,
     ...
     θm−1  if τm−2 ≤ t < τm−1,
     θm    if τm−1 ≤ t.   (2.2.3)

Let S = (s1, ..., sT)′, where st = k indicates that yt is from regime k. The one-step-ahead transition


probability matrix for st is given as

P = [ p11   p12   0     ...   0
      0     p22   p23   ...   0
      ...               ...   ...
      0     ...   0     pm−1,m−1   pm−1,m
      0     0     ...   0     1 ],   (2.2.4)

where plk = Pr(st = k | st−1 = l), with k = l or k = l + 1, is the probability of moving from regime l at time t − 1 to regime k at time t. Equation (2.2.4) ensures that, given st = k at time t, the chain at time t + 1 either remains in the same state or jumps to the next state: one has st+1 = k or st+1 = k + 1, with pk,k + pk,k+1 = 1. Once the last regime is reached, one stays there forever, that is, pm,m = 1. This structure enforces the ordering (2.2.3) on the change points2.

2.3. Bayesian Estimation

To conduct model estimation, we jointly estimate the long-memory dynamics, P and S. However, estimation is not straightforward. First, S is not observable. Second, there is no standard way to draw µk or dk from their conditional posterior densities. Although the joint posterior density of the model, p(P, θ, M, S | YT), is not a well-known density, samples from it can be obtained using Gibbs sampling and Metropolis-Hastings (M-H).

The parameters are divided into four blocks: S, θ = {θk, k = 1, ..., m}, M and P. The Gibbs sampler requires the following steps: first, choose starting values for P, θ and M, i.e. P(0), θ(0), M(0), and set i = 1. Then iterate over

1. S(i) | P(i−1), θ(i−1), M(i−1), YT.
2. {θ(i)k, k = 1, ..., m} | M(i−1), S(i), YT.
3. M(i) | θ(i), S(i), YT.
4. P(i) | S(i).
5. Set i = i + 1 and go to step 1.

Notice that in step 2 of iteration i of the Gibbs sampler, each element of θk is updated one-at-a-time. After dropping a set of burn-in samples, the remaining draws are collected for inference. For N large enough, any function of interest can be consistently estimated. For instance,

f̂(θ) = (1/N) Σ(i=1 to N) f(θ(i))

is a consistent estimate of E[f(θ) | YT], the posterior mean of f(θ). We run the chain from different starting values and compute convergence diagnostics such as Geweke (1992) to ensure that the draws

2 Ray and Tsay (2002) exploit the state-space representation of (2.2.1) to derive inference for long-memory processes with random level shifts. Raggi and Bordignon (2010) consider a two-state Markov switching ARFIMA model, i.e. m = 2, p21 ≠ 0, 0 < p22 < 1, and use a permutation scheme to identify the two states.


have converged to p (P, θ,M, S | YT ). Below, more details are provided on each step of the Gibbs

sampling procedure.

Step 1: Simulation of S | P, θ, M, YT. Chib (1998) shows that a joint draw of S can be achieved using

p(S | P, θ, M, YT) = p(sT | P, θ, M, YT) ∏(t=1 to T−1) p(st | st+1, P, θ, M, Yt),   (2.3.1)

in which one samples sequentially from each density on the right-hand side of (2.3.1), beginning with p(sT | P, θ, M, YT) and then p(st | st+1, P, θ, M, Yt), t = T − 1, ..., 1. At each step, one conditions on the previously drawn state, st+1, until a full draw of S is obtained. The individual densities in (2.3.1) are obtained based on the following steps:

(a) Initialization: at t = 1, set p(s1 = 1 | P, θ, M, Y1) = 1.

(b) Compute the Hamilton (1989) filter, p(st = k | P, θ, M, Yt). This involves a prediction and an update step, in which one iterates on the following from t = 2, ..., T:

p(st = k | P, θ, M, Yt−1) = Σ(l=k−1 to k) p(st−1 = l | P, θ, M, Yt−1) plk,  k = 1, ..., m,   (2.3.2)

p(st = k | P, θ, M, Yt) = [p(st = k | P, θ, M, Yt−1) p(yt | θ, M, Yt−1, st = k)] / [Σ(l=1 to m) p(st = l | P, θ, M, Yt−1) p(yt | θ, M, Yt−1, st = l)],  k = 1, ..., m.

The last equation is obtained from Bayes' rule. Note that in (2.3.2) the summation runs only from k − 1 to k, due to the restricted nature of the transition matrix. Furthermore, p(yt | θ, M, Yt−1, st = k) has a closed form solution, see (2.2.2).

(c) Finally, Chib (1998) shows that the individual densities in (2.3.1) are

p(st | st+1, P, θ, M, Yt) ∝ p(st | P, θ, M, Yt) p(st+1 | st, P).

Thus, given sT = m, st is drawn backwards over t = T − 1, T − 2, ..., 2 as st+1 with probability ct and st+1 − 1 with probability 1 − ct, where

ct = [p(st = k | P, θ, M, Yt) p(st+1 = k | st = k, P)] / [Σ(l=k−1 to k) p(st = l | P, θ, M, Yt) p(st+1 = k | st = l, P)].

Finally, note that p(s1 = 1 | s2, P, θ, M, Y1) = 1.
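A compact sketch of this forward-filtering, backward-sampling step is given below; log_dens collects log p(yt | θ, M, Yt−1, st = k) from (2.2.2), regimes are 0-indexed, and the names are illustrative:

```python
import numpy as np

def sample_states(log_dens, P):
    """Draw S = (s_1, ..., s_T) for the change-point chain of Chib (1998).

    log_dens: (T, m) array of log p(y_t | theta, M, Y_{t-1}, s_t = k);
    P: (m, m) restricted transition matrix.
    """
    T, m = log_dens.shape
    filt = np.zeros((T, m))
    filt[0, 0] = 1.0                          # p(s_1 = 1 | Y_1) = 1
    for t in range(1, T):
        pred = filt[t - 1] @ P                # prediction step, eq. (2.3.2)
        upd = pred * np.exp(log_dens[t] - log_dens[t].max())
        filt[t] = upd / upd.sum()             # update step (Bayes' rule)
    s = np.empty(T, dtype=int)
    s[-1] = m - 1                             # s_T = m (last regime)
    for t in range(T - 2, 0, -1):             # backward sampling, t = T-1, ..., 2
        probs = filt[t] * P[:, s[t + 1]]
        s[t] = np.random.choice(m, p=probs / probs.sum())
    s[0] = 0                                  # p(s_1 = 1 | s_2, ...) = 1
    return s
```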

Step 2: Simulation of θk | M, S, YT. For each regime, the conditional posterior of θk depends only on the information in regime k. Furthermore, compared to σ2k, sampling µk and dk is more complicated, since their conditional posteriors do not have closed form. Therefore, the Metropolis-Hastings algorithm is used. Let Yk = {yt : st = k} denote the observations in regime k. We sample µk and dk, k = 1, ..., m, one-at-a-time. For example, µk is sampled in the following way:


1. Sample a candidate, µ∗k, from a random walk proposal, q(µ∗k | µ(i−1)k) = N(µ(i−1)k, Σk), where Σk is chosen by the researcher to ensure a sufficient acceptance rate. We follow Koop (2003, page 98) and adjust Σk to obtain an acceptance rate of roughly 30 to 40%, experimenting with different values of Σk until we find one that yields a reasonable acceptance probability.

2. Define the acceptance probability of µ∗k as

aMH(µ∗k, µ(i−1)k) = min{1, [p(µ∗k | d(i−1)k, σ2(i−1)k, M(i−1), Y(i)k) q(µ(i−1)k | µ∗k)] / [p(µ(i−1)k | d(i−1)k, σ2(i−1)k, M(i−1), Y(i)k) q(µ∗k | µ(i−1)k)]}.   (2.3.3)

3. Draw u from U(0, 1). If u ≤ aMH(µ∗k, µ(i−1)k), set µ(i)k = µ∗k; otherwise set µ(i)k = µ(i−1)k.

Finally, σ2k | µk, dk, M, Yk ∼ IG(νk/2, lk/2), where IG(·/2, ·/2) stands for the inverse Gamma density, see Kim and Nelson (1999), νk = Tk + ν0, lk = ε′kεk + l0, Tk is the number of observations in regime k and εk = {εt : st = k}. ν0 and l0 are the prior hyperparameter values.
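A single random-walk M-H update for µk can then be sketched as follows; loglik is a regime-level likelihood evaluator, such as the expository sketch after (2.2.2), and log_prior is a stand-in for the prior density. Since the proposal is symmetric, the q(·) terms in (2.3.3) cancel:

```python
import numpy as np

def mh_update_mu(mu, d, sigma2, M, y_k, step_sd, log_prior, loglik):
    """One random-walk M-H update of a regime mean mu_k."""
    mu_star = mu + step_sd * np.random.randn()      # candidate draw
    log_a = (loglik(y_k, mu_star, d, sigma2, M) + log_prior(mu_star)
             - loglik(y_k, mu, d, sigma2, M) - log_prior(mu))
    if np.log(np.random.rand()) <= min(0.0, log_a):
        return mu_star                              # accept the candidate
    return mu                                       # reject: keep current draw
```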

Step 3: Simulation of M | θ, S, YT. In order to sample M from its conditional posterior, we use the same method as in Raggi and Bordignon (2012). A truncation parameter, M∗, is proposed from a discretized Laplace proposal, q(M∗ | M(i−1)) = (λ/2) exp(−λ|M∗ − M(i−1)|), where λ = 0.1 in order to obtain small moves. The Metropolis-Hastings acceptance probability is given as

aMH(M∗, M(i−1)) = min{1, [p(M∗ | θ(i), S(i), YT) q(M(i−1) | M∗)] / [p(M(i−1) | θ(i), S(i), YT) q(M∗ | M(i−1))]}.

Step 4: Simulation of P | S. Assume that pkk ∼ Beta(a0, b0). The conditional posterior for each diagonal component of P is then pkk | S ∼ Beta(a0 + nkk, b0 + 1), k = 1, ..., m − 1, where nkk is the number of one-step transitions from state k to state k in the sequence S.

2.3.1. Breaks in µ and d

Suppose that only µ and d are subject to structural breaks. Thus, we have µk and dk for k = 1, ..., m, while the conditional variance, σ2, is constant across regimes. Handling this specification is straightforward because, as before, we can use the conditioning properties of the Gibbs sampler.

Specifically, in order to sample µ(i)k and d(i)k for k = 1, ..., m, we use S(i), σ2(i−1), M(i−1) and Y(i)k = {yt : s(i)t = k}, and perform M-H to obtain µ(i)k and d(i)k using (2.3.3). Conditional on S(i), µ(i)1, ..., µ(i)m, d(i)1, ..., d(i)m, M(i−1) and YT, we then draw σ2(i) from the inverse Gamma density. The remaining parameters, M(i) and P(i), are also sampled conditional on S(i), µ(i)k, d(i)k, k = 1, ..., m, σ2(i) and YT, using Step 3 and Step 4 from the previous section.

2.3.2. Only breaks in σ2

Now suppose only σ2 changes between regimes, while µ and d are constant. In this case, we draw σ2(i)k from the inverse Gamma density, IG(νk/2, lk/2), for k = 1, ..., m, using S(i) from Step 1, µ(i−1), d(i−1), M(i−1) and YT. Thereafter, we stack the σ2(i)k using st = k and construct a vector of time-varying


conditional variances, σ2(i)t, t = 1, ..., T. To complete the cycle for θ(i), we sample µ(i) and then d(i) conditional on σ2(i)1, ..., σ2(i)T, M(i−1) and YT.

Finally, note that we can consider breaks in the conditional variance with only partial breaks in µ

or d by combining the methods in the last two subsections.

2.3.3. Bayes factors and marginal likelihood computation

Let M denote a model parametrization in which some or all parameters are subject to breaks. The marginal likelihood (ML) of model M is defined as

p(YT | M) = ∫ p(YT | P, θ, M, M) p(P, θ, M | M) dP dθ dM.   (2.3.4)

The marginal likelihood is a measure of the success the model has in accounting for the data after parameter uncertainty has been integrated out over the prior, p(P, θ, M | M). p(YT | P, θ, M, M) is the likelihood function with S integrated out. It is calculated as

log p(YT | P, θ, M, M) = Σ(t=1 to T) log p(yt | P, θ, M, Yt−1, M),   (2.3.5)

where

p(yt | P, θ, M, Yt−1, M) = Σ(k=1 to m) p(yt | θ, M, Yt−1, st = k, M) p(st = k | P, θ, M, Yt−1, M).   (2.3.6)

The last term on the right-hand side of (2.3.6) is computed from (2.3.2). In the following, the model index, M, is suppressed for conciseness. As we shall see in the next sections, (2.3.5) is essential for marginal likelihood (ML) and DIC computation.

In order to compute ML, we rely on the method of Gelfand and Dey (1994), henceforth G-D, see Geweke (2005) and Liu and Maheu (2008)3. In general, the G-D method is based on the following quantity:

(1/N) Σ(i=1 to N) g(θ(i)) / [p(YT | θ(i), M(i)) p(θ(i), M(i))] → p(YT)−1 as N → ∞.   (2.3.7)

It applies to any posterior simulator, no matter what algorithm is used. The prior, p(θ(i), M(i)), can be evaluated directly, and p(YT | θ(i), M(i)) is calculated by substituting θ(i) into the likelihood function, (2.3.5) and (2.3.6). Gelfand and Dey (1994) show that if g(θ) is thin-tailed relative to p(YT | θ, M) p(θ, M), then (2.3.7) is bounded and the estimator is consistent. Following Geweke (2005), the truncated Normal distribution, TN(θ∗, Σ∗), is used for g(θ), where θ∗ and Σ∗ are the posterior sample moments, θ∗ = N−1 Σ(i=1 to N) θ(i) and Σ∗ = N−1 Σ(i=1 to N) (θ(i) − θ∗)(θ(i) − θ∗)′, and a draw contributes whenever θ(i) lies in the domain of the truncated Normal. The domain, Θ, is defined as

Θ = {θ : (θ − θ∗)′(Σ∗)−1(θ − θ∗) ≤ χ2α(z)},

3 Ideally, we would prefer to compute ML using the method of Chib (1995). However, calculating ML with the method of Chib (1995) is computationally very demanding, as we sample µk, dk, k = 1, ..., m, using M-H.


where z is the dimension of the parameter vector and χ2α(z) is the αth percentile of the Chi-squared distribution with z degrees of freedom. In practice, 0.75, 0.95 and 0.99 are popular selections for α. High values of α work best, since more draws are then included in estimating (2.3.7).
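A sketch of the resulting estimator is given below; draws, loglik and logprior are arrays evaluated at each posterior draw, and the log-sum-exp step is only for numerical stability:

```python
import numpy as np
from scipy.stats import chi2

def gelfand_dey_logml(draws, loglik, logprior, alpha=0.99):
    """Gelfand-Dey estimate of log p(Y_T) with a truncated Normal g(.)."""
    N, z = draws.shape
    dev = draws - draws.mean(axis=0)
    Sigma = dev.T @ dev / N
    maha = np.einsum('ij,jk,ik->i', dev, np.linalg.inv(Sigma), dev)
    inside = maha <= chi2.ppf(alpha, df=z)    # truncation region Theta
    _, logdet = np.linalg.slogdet(Sigma)
    log_g = (-0.5 * (z * np.log(2.0 * np.pi) + logdet + maha)
             - np.log(alpha))                 # truncated Normal log density
    log_w = np.where(inside, log_g - loglik - logprior, -np.inf)
    c = log_w[inside].max()                   # log-sum-exp trick
    return -(c + np.log(np.exp(log_w - c).sum()) - np.log(N))
```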

We estimate (2.3.7) using the aforementioned values of α. In general, we find that different values of α lead to very similar results, see Section 2.4.2. It has also been suggested to compute the marginal likelihood using the method of Sims et al. (2008): as pointed out there, the G-D method may not work for models with time-varying parameters, as the posterior density tends to be non-Gaussian. We also calculate ML for the empirical part using the method of Sims et al. (2008); however, we do not find any significant qualitative changes compared to G-D, and therefore retain the G-D values. Furthermore, the Monte Carlo results clearly indicate that G-D correctly identifies the true model in the presence of structural breaks.

Once we have calculated the marginal likelihood for different specifications, we can compare models across the number of regimes, as well as across the type of breaks (restricted versus unrestricted), using Bayes factors. The Bayes factor (BF) for model MA versus model MB is BF(A,B) = p(YT | MA)/p(YT | MB). BF(A,B) is the factor by which the data considers MA more probable than MB. Kass and Raftery (1995) suggest interpreting the evidence for MA as: not worth more than a bare mention for 1 ≤ BF(A,B) < 3; positive for 3 ≤ BF(A,B) < 20; strong for 20 ≤ BF(A,B) < 150; and very strong for BF(A,B) ≥ 150. Equivalently, on a log scale, log BF(A,B) > 0 is evidence in favor of MA versus MB.

2.4. Monte Carlo

In this section, a set of Monte Carlo simulations is conducted to investigate the ability of the change-

point ARFIMA model to detect the correct number of change points. The effect of different sample

sizes and the ability of the deviance information criterion (DIC) to detect the correct number of change

points are also considered. Specifically, we compare the performance of DIC with ML to find out if

DIC is just as capable of identifying the true model from which the data is generated as ML. We

do this because computing DIC for the change-point ARFIMA model is almost trivial, see Section

2.4.4. However, as pointed out by Spiegelhalter et al. (2002), we must be cautious about using ML

as a basis against which to assess DIC. ML addresses how well the prior has predicted the observed

data, whereas DIC addresses how well the posterior might predict future data generated by the same

parameters that give rise to the observed data, YT .

Table 2.1.: Change-point model specifications

Model index   Parameters that change from a break
M0            None
M1            µ
M2            d
M3            σ2
M4            µ, d
M5            µ, σ2
M6            All parameters

This table labels the various change-point specifications. The first column is the model index. The second column lists the parameters that change due to structural breaks.


2.4.1. Setting

The ARFIMA model based on equation (2.2.1), in which change points affect one or more model parameters, is considered. Table 2.1 lists all the specifications used in the simulations and the empirical application4. Specifically, M0 is the simple ARFIMA model without any structural changes, M1 to M5 are models in which different parameters change, and M6 is the model in which all parameters change from a structural break. For each model, a sample of T = 1000 observations is generated. The true models considered include cases of no, one and two change points. In settings with change points, the positions of the change points follow a Uniform distribution, U. For instance, when there is one change point, its position follows U(0.25 × T, 0.75 × T). When there are two change points, the first one follows U(0.25 × T, 0.40 × T) and the second follows U(0.60 × T, 0.80 × T).

This setting allows us to account for the randomness of the change points as well as ensuring sufficient

observations in each regime to conduct estimation.

The parameter values of the data generating process (DGP) under different scenarios are listed in

Table 2.2. For example, for M1 only µ changes, while the other parameters remain constant. In

specifications M4 to M6, the time series properties change greatly and the change points should be

identified rather easily. To make our simulation empirically realistic, we select the parameter values

of the DGP to reflect periods with increasing and decreasing levels of µ, d and σ2.
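For concreteness, a simulation sketch of one draw from such a DGP is shown below; a simple convention is used in which the current regime's parameters apply to all past shocks, and all names and defaults are illustrative:

```python
import numpy as np
from scipy.special import gammaln

def simulate_cp_arfima(T, theta, tau, M=50, seed=0):
    """Simulate a change-point ARFIMA(0, d, 0) path via the truncated
    MA(infinity) form y_t = mu + eps_t + sum_j pi_j eps_{t-j}.

    theta: list of (mu, d, sigma2) tuples per regime; tau: sorted list
    of break dates (first observation of each new regime)."""
    rng = np.random.default_rng(seed)
    regime = np.searchsorted(tau, np.arange(T), side='right')
    eps, y = np.zeros(T), np.zeros(T)
    for t in range(T):
        mu, d, s2 = theta[regime[t]]
        j = np.arange(1, min(t, M) + 1)
        pi_j = np.exp(gammaln(j + d) - gammaln(j + 1) - gammaln(d))
        eps[t] = rng.normal(0.0, np.sqrt(s2))
        y[t] = mu + eps[t] + pi_j @ eps[t - 1::-1][: j.size]
    return y
```

For example, theta = [(1, 0.3, 0.1), (2, 0.45, 0.18)] with tau = [520] reproduces a one-break M6 scenario from Table 2.2.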

We specify the priors as dk ∼ N(0, 100), truncated such that 0 < dk < 0.5, µk ∼ N(0, 100) and σ2k ∼ IG(4/2, 0.02/2), k = 1, ..., m. A suitable prior for M is the truncated Poisson distribution with M ∈ {Mmin, ..., Mmax}, where in this paper Mmin = 10 and Mmax = 50, see Raggi and Bordignon (2012). Finally, we assume that pkk ∼ Beta(8, 0.1), k = 1, ..., m − 1. In this setting, the priors on dk, µk, σ2k and M are very uninformative, while the prior on pkk favors infrequent structural breaks. We conduct a prior sensitivity analysis for the S&P 500 data and report the results in Section 2.6.4⁵.

Table 2.2.: Parameter values for Monte Carlo simulations

Parameter   Regime   M0    M1    M2    M3    M4    M5    M6
µ           1        1     1     1     1     1     1     1
            2        1     2     1     1     2     2     2
            3        1     0.8   1     1     0.8   0.8   0.8
d           1        0.3   0.3   0.3   0.3   0.3   0.3   0.3
            2        0.3   0.3   0.45  0.3   0.45  0.3   0.45
            3        0.3   0.3   0.05  0.3   0.05  0.3   0.05
σ2          1        0.1   0.1   0.1   0.1   0.1   0.1   0.1
            2        0.1   0.1   0.1   0.18  0.1   0.18  0.18
            3        0.1   0.1   0.1   0.36  0.1   0.36  0.36

This table lists the parameter values for the Monte Carlo simulations. The first column lists the parameters, the second column the index of the regimes, and the first row the model index. If there is one break, the DGP parameters are first those of regime 1 and then those of regime 2.

4 We could also include the (d, σ2) combination, in which both d and σ2 change from a break. However, in order to maintain a small number of models and avoid high computational costs, we choose not to include this specification.

5 Overall, results are robust to different hyperparameter values on pkk.


Table 2.3.: Change-point identification by marginal likelihood

                              Frequency by ML
DGP                   # of CP   0 CP   1 CP   2 CP   3 CP
M1 (µ)                0         100    0      0      0
                      1         0      99     1      0
                      2         0      0      100    0
M2 (d)                0         87     0      0      13
                      1         1      86     5      8
                      2         0      0      95     5
M3 (σ2)               0         100    0      0      0
                      1         0      100    0      0
                      2         0      1      99     0
M4 (µ, d)             0         100    0      0      0
                      1         0      97     3      0
                      2         0      0      98     2
M5 (µ, σ2)            0         100    0      0      0
                      1         1      99     0      0
                      2         0      2      98     0
M6 (all parameters)   0         100    0      0      0
                      1         0      100    0      0
                      2         0      0      100    0

The first column lists the true model along with the parameters that change due to a structural break. CP, change points; ML, marginal likelihood. The "0 CP" column displays the number of times in the 100 repetitions when the specification with no change point has the highest ML, etc. Each row sums to 100. In this table, we set α = 0.99.


Table 2.4.: Model comparison using ML

Frequency by ML (the frequencies are identical for α = 0.75, 0.95 and 0.99)
DGP    True # CP   0 CP   1 CP   2 CP   3 CP
M1     0           100    0      0      0
       1           0      99     1      0
       2           0      0      100    0
M5     0           100    0      0      0
       1           1      99     0      0
       2           0      2      98     0
M6     0           100    0      0      0
       1           0      100    0      0
       2           0      0      100    0

The evidence for the number of change points is determined according to ML. The "0 CP" column reports the number of times across the 100 repetitions in which the specification with no change point performs best; similarly for the other columns.


2.4.2. Change-point identification

It is assumed that the model specification M_i is known, but that the number and dates of the structural breaks are not. For each draw from the data generating process (DGP), the change-point ARFIMA model is estimated assuming 0, 1, 2 and 3 structural breaks. Evidence for the number of break points is then ranked according to the highest marginal likelihood (ML). Thereafter, a new data set is generated from the DGP and the procedure is repeated, until 100 repetitions are completed. We then report the frequency over repetitions with which each specification is best according to the marginal likelihood criterion. Initially, we calculate ML using the Gelfand-Dey (G-D) method with α = 0.99.

Table 2.3 lists the results for each specification. For example, the second row for M1 says that a DGP with one change point is correctly identified as having one change point 99 times in terms of ML, while 1 time it is incorrectly identified as having two change points. The next entry in the table repeats this for a DGP with two change points; here, two change points are correctly identified 100 times. Overall, the change-point ARFIMA model works very well. When there is no change point, this is correctly selected most of the time. Looking at these cases (first row in each panel) they are: 100/100 for M1, 87/100 for M2 and 100/100 for M3-M6. When the process contains change points, the marginal likelihood method correctly identifies the existence of the change points in most cases. For example, the probability of correctly identifying instability of the process is 0.99, i.e. (86 + 5 + 8)/100, for M2 with one change point, and is 1 for two change points. The correct number of change points is found most of the time, and many of the correct-identification frequencies are close to 100. However, the relatively smaller numbers associated with M2 show that ML is less powerful when there are changes only in d.[6] Therefore, for DGPs where d is subject to structural breaks, it becomes easier to identify the breaks when more parameters undergo a change. For example, compare M2 with the better performance of M4, in which both µ and d change at a structural break. The best results in the table correspond to models M3-M6.

We also investigate the ability of G-D to identify the true model for different values of α. Therefore, besides α = 0.99, we repeat our Monte Carlo experiment with α = 0.75 and α = 0.95. Results for M1, M5 and M6 are summarized in Table 2.4. We obtain identical results for these specifications, and for M0, M2, M3 and M4 (not reported) as well. However, the Bayes factor in favor of the true model in each case varies slightly across the values of α. Overall, G-D provides a very reliable method for the identification of structural breaks using the ML criterion.

2.4.3. Parameter estimates

Given a full MCMC run, we calculate the mean, median and mode of θ_k^{(i)}, i = 1, ..., N, for each regime.

We then take the mean of these quantities over the number of Monte Carlo repetitions. We also

consider the posterior deviation (POSDEV) of the parameters in each regime defined as

\text{POSDEV} = \sqrt{\frac{1}{R}\frac{1}{N}\sum_{h=1}^{R}\sum_{i=1}^{N}\left(\theta_{k,h}^{(i)} - \theta_k\right)^2},

where \theta_{k,h}^{(i)} is the ith posterior draw of θ_k at the hth Monte Carlo iteration, and θ_k is the vector of the true DGP parameters in regime k.
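As an illustration, POSDEV can be computed directly from stored posterior draws. The following minimal Python sketch assumes the draws are held in an array of shape (R, N, K); the array layout is our own convention, not the thesis's Ox code:

import numpy as np

def posdev(draws, theta_true):
    """POSDEV for one regime.

    draws      : array of shape (R, N, K): R Monte Carlo repetitions,
                 N posterior draws per repetition, K parameters in the regime
    theta_true : array of shape (K,) with the true DGP parameters
    Returns an array of shape (K,) with the posterior deviation per parameter.
    """
    sq_dev = (np.asarray(draws) - np.asarray(theta_true)) ** 2   # (R, N, K)
    return np.sqrt(sq_dev.mean(axis=(0, 1)))                     # average over h and i, then sqrt

# toy check: draws centred on the truth give a small POSDEV
rng = np.random.default_rng(1)
true_k = np.array([1.0, 0.3, 0.1])                               # (mu, d, sigma^2) in one regime
draws = true_k + 0.05 * rng.standard_normal((50, 1000, 3))
print(posdev(draws, true_k))                                     # roughly 0.05 for each parameter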

[6] We also get very similar results using DIC; see Section 2.4.4 for further results on the ability of DIC to detect the correct number of structural breaks. In cases where too high a number of change points is selected, we find that parameter estimates in the last regime sometimes suffer from biases, whereas in other cases they do not. It depends, of course, on the position of the estimated change points.


Table 2.5.: Monte Carlo parameter estimates

Regime   Parameter   True   Mean     Median   Mode     POSDEV
DGP, M4, T = 500
1        µ           1      0.9880   0.9877   0.9812   0.0771
2        µ           2      1.9883   1.9883   1.9455   0.1393
3        µ           0.8    0.7985   0.7985   0.7941   0.0269
1        d           0.3    0.2798   0.2782   0.2511   0.0783
2        d           0.45   0.4223   0.4258   0.4117   0.0519
3        d           0.05   0.0854   0.0780   0.0752   0.1194
1-3      σ²          0.1    0.0953   0.0951   0.0762   0.0074
DGP, M4, T = 1000
1        µ           1      0.9967   0.9968   0.9878   0.0575
2        µ           2      2.0012   2.0014   1.9961   0.1164
3        µ           0.8    0.8018   0.8019   0.8014   0.0245
1        d           0.3    0.2983   0.2974   0.2846   0.0458
2        d           0.45   0.4430   0.4453   0.4437   0.0267
3        d           0.05   0.0686   0.0643   0.0705   0.0336
1-3      σ²          0.1    0.0988   0.0986   0.0847   0.0047
DGP, M4, T = 2000
1        µ           1      1.0033   1.0040   0.9956   0.0407
2        µ           2      1.9896   1.9900   1.9929   0.0735
3        µ           0.8    0.7979   0.7978   0.7957   0.0101
1        d           0.3    0.3089   0.3082   0.2983   0.0264
2        d           0.45   0.4479   0.4490   0.4446   0.0239
3        d           0.05   0.0638   0.0621   0.0551   0.0241
1-3      σ²          0.1    0.0991   0.0993   0.0898   0.0030

This table reports the true value of the DGP parameters along with the mean, median, mode and

posterior deviation (POSDEV) of θk for the Monte Carlo simulations, generated from M4, 2 CP

with parameters as indicated using T = 500, T = 1000 and T = 2000.

We summarize results for M4 with 2 change points for T = 500,

T = 1000 and T = 2000 in Table 2.5. In each case, the first 1000 samples are discarded and the next

5000 are used for posterior inference. Overall, we see that the change-point ARFIMA model works

very well. On average, the parameter estimates are very close to their true values. In the simulations,

we estimate M at 25 to 30. Compared to T = 500, as we increase the number of observations in the

DGP to T = 1000, the POSDEV of each parameter drops on average by 10 to 40%. The POSDEV of

each parameter drops even more when we increase the sample size to T = 2000.[7]

[7] We find that our estimation method also correctly identifies the true position of the change points for T = 500. Furthermore, as correctly pointed out by a referee, the period between each change point increases with T and thus contributes to the reduction in POSDEV.


Table 2.6.: Model comparison using different criteria

                    Frequency by DIC and ML
DGP    True # CP   Criterion   0 CP   1 CP   2 CP   3 CP
M1     0           DIC         100    0      0      0
                   ML          100    0      0      0
       1           DIC         0      98     0      2
                   ML          0      99     1      0
       2           DIC         0      0      98     2
                   ML          0      0      100    0
M3     0           DIC         100    0      0      0
                   ML          100    0      0      0
       1           DIC         0      93     0      7
                   ML          0      100    0      0
       2           DIC         0      0      96     4
                   ML          0      1      99     0
M6     0           DIC         84     12     2      2
                   ML          100    0      0      0
       1           DIC         0      96     2      2
                   ML          0      100    0      0
       2           DIC         0      0      97     3
                   ML          0      0      100    0

The evidence for the number of change points is determined according to DIC and ML. The "0 CP" column reports the number of times across the repetitions in which the specification with no change point performs best.

2.4.4. Deviance information criterion

Another approach to compare the evidence for the number of change points is by using the deviance

information criterion (DIC) of Spiegelhalter et al. (2002). Calculation of DIC in an MCMC scheme is trivial. Contrary to AIC or BIC, DIC does not require maximization over the parameter space. DIC is a combination of the likelihood, p(Y_T | P, θ, M), and a penalty term, p_D, which describes the complexity of the model and corrects the deviance's propensity towards models with more parameters. More precisely,

p_D = \bar{D}(P, \theta, M) - D(\bar{P}, \bar{\theta}, \bar{M}),

where \bar{D}(P, \theta, M) is approximated by N^{-1}\sum_{i=1}^{N} -2\log p(Y_T \mid P^{(i)}, \theta^{(i)}, M^{(i)}) and D(\bar{P}, \bar{\theta}, \bar{M}) = -2\log p(Y_T \mid \bar{P}, \bar{\theta}, \bar{M}), with \bar{P}, \bar{\theta} and \bar{M} estimated from the Gibbs output using the mean or mode of the posterior draws. The DIC is defined as D(\bar{P}, \bar{\theta}, \bar{M}) + 2p_D. It is worth mentioning that the best model is the one with the smallest DIC. However, it is difficult to say what constitutes a significant difference in DIC. Very roughly,


differences of more than 10 might definitely rule out the model with the higher DIC.
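In code, DIC requires nothing beyond the stored log-likelihood evaluations from the Gibbs output. A minimal Python sketch, assuming a generic vector of per-draw log likelihoods (our own illustration, not the thesis's Ox routine):

import numpy as np

def dic(loglik_draws, loglik_at_plug_in):
    """Deviance information criterion from MCMC output.

    loglik_draws      : log p(Y_T | P^(i), theta^(i), M^(i)), one value per draw
    loglik_at_plug_in : log p(Y_T | Pbar, thetabar, Mbar) at the posterior
                        mean or mode of the draws
    Returns (DIC, p_D); the best model is the one with the smallest DIC.
    """
    d_bar = np.mean(-2.0 * np.asarray(loglik_draws))   # posterior mean deviance
    d_hat = -2.0 * loglik_at_plug_in                   # deviance at the plug-in estimate
    p_d = d_bar - d_hat                                # effective number of parameters
    return d_hat + 2.0 * p_d, p_d                      # DIC = D(hat) + 2 p_D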

The DIC is calculated for specifications M1, M3 and M6, and the results are summarized in Table 2.6. Similar results are obtained for the other cases. We report the number of times out of 100 repetitions that a specific change-point model is selected as best according to DIC and ML. For instance, for DGP M6 with no change point, in 84 out of 100 repetitions the no change-point model has the smallest DIC, 12/100 times the one change-point model is best, and the two and three change-point models are each best 2/100 times. When there are two structural breaks in the DGP for M6, DIC correctly identifies the true model 97/100 times, whereas 3/100 times the three change-point model is best. On the other hand, for this DGP, ML identifies the correct model 100/100 times.

2.4.5. Higher number of change points

In Section 2.4.2 we evaluated the performance of the change-point ARFIMA model under 0, 1 and 2

structural breaks. In this section, we evaluate the performance of the change-point ARFIMA model

when the DGP contains more change points. Specifically, we set T = 2000 and simulate data containing

4 change points. As before, the position of the change points follows a Uniform distribution. We

consider specifications M1, M3, M4 and M6. For each specification, we estimate the change-point

ARFIMA model assuming 0, ..., 5 structural breaks.

We find that DIC and ML correctly identify the true model in most cases. However, we are also

interested in whether or not our method is able to correctly identify the position of the change points.

Therefore, for each Monte Carlo iteration, we calculate the position of the change points using the

mode of \{S^{(i)}\}_{i=1}^{N}, and compare it with the true position of the change points. Specifically, for each

change point, we calculate

\text{DIFF}_k = \frac{1}{R}\sum_{h=1}^{R}\left|\widehat{cp}_k^{(h)} - cp_k^{(h)}\right|,

for k = 1, ..., m − 1, where \widehat{cp}_k^{(h)} is the kth estimated change-point date and cp_k^{(h)} is the true position of the kth change point at the hth Monte Carlo iteration.

Table 2.7.: Dating the change points

                                  DIFF
DGP                      # CP   1st CP    2nd CP    3rd CP    4th CP
M1 (µ)                   4      0.0053    0.0105    0         0
M3 (σ²)                  4      2.2000    4.9375    4.3500    1.3250
M4 (µ, d)                4      0.3913    0.2391    0         0
M6 (all parameters)      4      0.0822    0.6027    0.1644    0

This table reports the average difference (for each change point) between the

estimated change-point and the true change-point date. The definition of DIFF

is given in the text.


For the results to be meaningful, we focus only on specifications that have the correct number of change points, i.e. four change points.
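A minimal Python sketch of the DIFF calculation (our own illustration; array inputs are assumed):

import numpy as np

def diff_k(cp_est, cp_true):
    """Average absolute dating error for the k-th change point across repetitions."""
    return np.mean(np.abs(np.asarray(cp_est) - np.asarray(cp_true)))

# toy example: estimated dates of one change point over five repetitions
print(diff_k([501, 499, 500, 502, 500], [500, 500, 500, 500, 500]))  # 0.8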

Table 2.7 reports DIFF for each specification when the underlying process contains 4 change points. For instance, we miss (on average) the correct date of the second change point by 0.6 time periods (or 0.6 ≈ 1 day, if we work with daily data) for M6. On the other hand, we identify the correct date of the last change point at every Monte Carlo iteration, as \left|\widehat{cp}_4^{(1)} - cp_4^{(1)}\right| = \left|\widehat{cp}_4^{(2)} - cp_4^{(2)}\right| = \cdots = \left|\widehat{cp}_4^{(R)} - cp_4^{(R)}\right| = 0. The same happens for the third and fourth change points for M1 and M4.

2.4.6. Sample size

In order to assess the robustness of our methods with respect to different sample sizes, the more

challenging specifications are considered with sample sizes of 500, 1000 and 2000. Therefore, results

for M2 with one and two change points along with M4 with two change points are reported in Table

2.8. Increasing T improves identification of the true number of change points. The distribution is also

more concentrated on the true model. For instance, for DGP M2, 2 CP with T = 500, we identify the correct specification only 54/100 times, whereas for T = 1000, the correct specification is selected

95/100 times. For T = 2000, the correct specification is selected 98/100 times. We obtain very similar

results for DIC and the other model specifications which we do not report here.

Overall, we find that DIC can be considered a very compelling alternative to ML. Furthermore,

the change-point ARFIMA model works very well in identifying the true dates of the change points.

Specifications M1, M4 and M6 are more accurate than M3.

Table 2.8.: Effect of sample size on identification of change points

Frequency by ML

Sample size   0 CP   1 CP   2 CP   3 CP
DGP, M2, 1 CP
500           9      76     6      9
1000          1      86     5      8
2000          0      95     3      2
DGP, M2, 2 CP
500           0      0      54     46
1000          0      0      95     5
2000          0      0      98     2
DGP, M4, 2 CP
500           0      0      83     17
1000          0      0      98     2
2000          0      0      98     2

The “0 CP” column records the number of times when the model with no change

point has the highest marginal likelihood. The “1 CP” column records the number

of times when the model with one change point has the largest marginal likelihood.


2.5. Realized Volatility

Suppose that, along day t, the logarithmic prices of a given asset follow a continuous-time diffusion

process

dp (t+ s) = µ (t+ s) dt+ σ (t+ s) dW (t+ s) , 0 ≤ s ≤ 1, t = 1, 2, ...,

where p (t+ s) is the logarithmic price at time t + s, µ (t+ s) is the drift component, σ (t+ s) is

the instantaneous volatility and W (t+ s) is a standard Brownian motion. In addition, suppose that

σ (t+ s) is orthogonal to W (t+ s), such that there is no leverage effect. This assumption is standard in

the realized volatility literature. Andersen et al. (2003) and Barndorff-Nielsen and Shephard (2002a)

show that daily returns, defined as r_t = p(t) − p(t − 1), are conditionally Gaussian. That is,

r_t \mid \mathcal{F}_t \sim N\left(\int_0^1 \mu(t+s-1)\,ds,\; \int_0^1 \sigma^2(t+s-1)\,ds\right).

The true volatility for day t is defined as IV_t = \int_0^1 \sigma^2(t+s-1)\,ds and is known as the integrated volatility. In the absence of microstructure noise, realized volatility is a consistent estimator of IV_t as the intraday sampling frequency goes to infinity. Realized volatility (RV) is constructed as the sum of intraday squared returns, \sum_{j=1}^{n} r_{j,t}^2, where r_{j,t} = p_{j,t} − p_{j−1,t}, p_{j,t} is the jth intraday price and n is the number of intra-daily observations. As pointed out by, for example, Andersen et al. (2003), RV_t is more efficient than traditional measures of volatility, such as daily squared returns.

Market microstructure dynamics contaminate the price process with noise. Hence, RVt can be a

biased and inconsistent estimator of IVt, see Hansen and Lunde (2006) for more details on the effects

of market microstructure noise on volatility estimation. In order to reduce the effects of market

microstructure noise, we employ a kernel-based estimator that utilizes autocovariances of intraday

returns. Specifically, we follow Hansen and Lunde (2006) and provide a bias correction to realized

volatility in the following way

RV_t^q = \sum_{j=1}^{n} r_{t,j}^2 + 2\sum_{w=1}^{q}\left(1 - \frac{w}{1+q}\right)\sum_{j=1}^{n-w} r_{t,j}\, r_{t,j+w},

where q is a small positive integer, and we set q = 1. Henceforth, RV qt is referred to as RVt.
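A minimal Python sketch of this kernel-based estimator (our own illustration, assuming a vector of intraday returns for a single day):

import numpy as np

def rv_q(intraday_returns, q=1):
    """Bias-corrected realized volatility in the spirit of Hansen and Lunde (2006).

    intraday_returns : 1-d array of intraday (e.g. 5-minute) returns for one day
    q                : number of autocovariance terms (q = 1 in this chapter)
    """
    r = np.asarray(intraday_returns)
    rv = np.sum(r ** 2)                                  # plain realized variance
    for w in range(1, q + 1):
        weight = 1.0 - w / (1.0 + q)                     # kernel weight; 1/2 when q = w = 1
        rv += 2.0 * weight * np.sum(r[:-w] * r[w:])      # autocovariance correction
    return rv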

2.6. Application to S&P 500 Volatility

2.6.1. Data

The empirical application is based on S&P 500 index data using the S&P’s Depositary Receipts fund.

The data consist of 5-minute intra-daily observations from January 2nd, 2000 to December 31st,

2009, for a total of T = 2515 trading days. The cleaning of the data is carried out using the steps in

Barndorff-Nielsen et al. (2009). After cleaning, a 5-minute grid from 9:30 to 16:00 is constructed using the

previous-tick method, see Hansen and Lunde (2006). From this grid, 5-minute intraday returns are

constructed. These returns are used to construct realized volatility. Following Raggi and Bordignon

(2012), the annualized realized standard deviation, y_t = \sqrt{252\,RV_t}/100, is considered.


However, there are outliers in yt. Therefore, we risk that a single outlier can wrongly be identified as

a separate regime.[8] To rule this out, we follow Kim et al. (2005) and Liu and Maheu (2008), imposing

the assumption that each regime lasts at least 66 days. Specifically, we perform the following: when

we simulate a draw of S(i) which has a regime shorter than 66 days, we discard it and resample until

each regime is 66 days or more in length. We find that our results are robust to different duration

restrictions, see Section 2.6.3. The first 1000 draws are discarded and the next 5000 are used for

posterior inference.

Table 2.9.: Model comparison by marginal likelihood and DIC for S&P 500 volatility

# CP   M1 (µ)       M2 (d)       M3 (σ²)      M4 (µ, d)    M5 (µ, σ²)   M6 (all)
0      -1946.84     -1946.84     -1946.84     -1946.84     -1946.84     -1946.84
       (3860.65)    (3860.65)    (3860.65)    (3860.65)    (3860.65)    (3860.65)
1      -1942.64     -1942.42     -1918.87     -1951.32     -1927.45     -1929.60
       (3852.58)    (3851.81)    (3778.32)    (3854.58)    (3803.89)    (3801.83)
2      -1938.71     -1946.55     -1549.75     -1941.20     -1378.12     -1531.20
       (3845.17)    (3859.81)    (2777.14)    (3836.48)    (2705.06)    (2701.23)
3      -1933.58     -1945.08     -1504.93     -1949.16     -1283.65     -1379.35
       (3814.83)    (3854.55)    (2728.29)    (3816.87)    (2514.36)    (2511.32)
4      -1915.71     -1945.45     -1404.15     -1921.52     -1257.10     -1290.64
       (3781.36)    (3853.67)    (2562.78)    (3764.23)    (2360.45)    (2378.46)
5      -1933.43     -1945.58     -1670.04     -1937.13     -1259.75     -1999.48
       (3819.57)    (3855.29)    (2859.83)    (3774.48)    (2368.46)    (3898.11)
6      -1940.84     -1947.59     -1682.08     -1941.23     -1737.02     -2053.65
       (3833.59)    (3858.50)    (2983.70)    (3801.87)    (2653.20)    (4109.60)

This table reports the log-marginal likelihood with α = 0.99 and DIC (in parentheses) for the different ARFIMA models. The first row lists the index of the models; the second row lists the parameters that are subject to structural breaks within each specification. The first column is the number of change points (CP) conditioned on. Bold numbers indicate the highest marginal likelihood and lowest DIC within each specification.

2.6.2. Results

To conduct estimation, we use the same priors as in Section 2.4.1. We investigate models under the

structural change configurations in Table 2.1. Table 2.9 displays log(ML) and DIC (indicated inside

the parentheses) for specifications with zero up to six change points.

Results suggest the existence of four change points according to ML and DIC. The log-marginal likeli-

hood for no change point is −1946.84, and most specifications with structural breaks improve on this.

The difference between the best structural break specification (M5, 4 CP) and M0 is large with a

Bayes factor of exp (689.74) in favor of four structural breaks. This is very strong evidence. For all

model settings, except M2, the largest ML (lowest DIC) occurs at four change points. There is some

posterior support for four change points for M2, but it is outperformed by its one change-point counterpart with a Bayes factor of exp(3.03). We also compare models across parameter specifications.

The highest log(ML) and lowest DIC model across all cases is −1257.10 and 2360.45, respectively for

M5 with four change points. Considering the second largest log(ML) in Table 2.9, which is −1290.64

from M6 with four change points, the Bayes factor of M5 vs M6 is exp(33.54). Therefore, we conclude

that the effect of the breaks is mainly in µ and σ2.

[8] This is a very common problem in RV data, as realized volatility is by its nature very noisy.


Compared to M5 and M6, our results also suggest that incorporating changes only in µ, d, σ², or in the combination (µ, d), worsens ML (DIC) considerably, indicating the need to model breaks in both µ and σ². For instance, compare the performance of M3 with M5: while both models point to the 4 CP specification as the best performing model, the Bayes factor of M5 versus M3 is exp(147.05).

Table 2.10.: Parameter estimates for S&P 500 volatility

M5, 4 CP
Parameter   Mean      95% interval        RB      Geweke   M-H ratio
d           0.4445    [0.4181, 0.4722]    6.86    -0.50    0.35
µ1          0.1890    [0.1685, 0.2089]    4.20    -1.22    0.37
µ2          0.0945    [0.0876, 0.1015]    10.11   -0.15    0.34
µ3          0.1554    [0.1266, 0.1836]    6.09    -0.61    0.41
µ4          0.3922    [0.3270, 0.4589]    9.82    1.11     0.42
µ5          0.1702    [0.1517, 0.1890]    4.45    -2.19    0.34
σ²1         0.3238    [0.2944, 0.3572]    1.03    0.76
σ²2         0.0501    [0.0450, 0.0549]    1.76    0.58
σ²3         0.1806    [0.1515, 0.2152]    0.88    1.41
σ²4         0.6809    [0.5345, 0.8755]    1.02    0.01
σ²5         0.0822    [0.0676, 0.0999]    1.38    -0.52
M           31        [24, 38]            15.41   -1.85    0.28
DIC         2360.45
log(ML)     -1257.10

ARFIMA-GARCH, 3 CP
Parameter   Mean      95% interval        RB      Geweke   M-H ratio
d           0.4806    [0.4612, 0.4897]    7.95    -0.19    0.32
µ           0.1128    [0.0669, 0.1590]    7.76    -0.61    0.39
γ2          1.1014    [0.8552, 1.3937]    7.04    0.01     0.39
γ3          1.6845    [1.2693, 2.2280]    7.04    -0.01    0.37
γ4          0.9303    [0.7590, 1.1251]    4.53    -0.21    0.33
ω           0.0029    [0.0023, 0.0036]    3.71    -1.04    0.42
a           0.1644    [0.1449, 0.1833]    4.95    -1.95    0.42
b           0.8313    [0.8124, 0.8505]    4.87    1.47     0.42
DIC         2375.11
log(ML)     -1261.83

This table reports posterior means (mean), 95% credibility intervals (in brackets), inefficiency factors (RB), Geweke's convergence statistics (Geweke), M-H acceptance ratios, DIC and log(ML) using α = 0.99 for M5, 4 CP and ARFIMA-GARCH, 3 CP. Parameters associated with each regime are labeled with subscripts 1, ..., m.


Figure 2.1.: Change-point dates for M5, 4 CP


S&P 500 annualized volatility, fitted volatility, conditional mean and estimated change-point dates

indicated as vertical lines for M5, 4 CP.

Figure 2.1 displays the data, fitted volatility and the estimated change-point dates shown by vertical

lines using the posterior mode of \{S^{(i)}\}_{i=1}^{N} for M5, 4 CP. The change-point dates are given as:

04-10-2003, 07-17-2007, 09-16-2008 and 03-19-2009.

There are several events that may have contributed to the changes in volatility dynamics. For

instance, the last two break points occur during the financial crises (fall of 2008 and spring of 2009).

Furthermore, 09-16-2008 and 03-18-2009, the latter being the day before the last change point, are both

associated with FOMC announcements. The first and second break points correspond roughly to the

beginning of the second Iraq invasion in 2003 and the beginning of the subprime crisis in the US in

2007.[9] Finally, in the last two phases, we see a structural exponential decay of volatility. The fitted

values show that our change-point specification is also able to incorporate this feature.

Table 2.10 reports some summary statistics concerning posterior distribution of the key parameters

of M5, 4 CP and some diagnostics. Specifically, after discarding the first 1000 iterations, we collect

the final sample and compute: the posterior mean of θ and M , 95% credibility intervals (indicated

inside the brackets), inefficiency measures, RB, Geweke’s convergence statistics, Metropolis-Hastings

acceptance ratios, DIC and log(ML). RB displays the relative variance of the posterior sample draws

when adapting for correlation between iterations, as compared to the variance without accounting for

correlation. In these calculations, a bandwidth, B, of 100 is used, see Kim et al. (1998) for a further

background on this measure.
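For reference, a minimal Python sketch of such an inefficiency measure (our own illustration; we use a simple Bartlett taper here, whereas Kim et al. (1998) employ a Parzen window, which behaves similarly):

import numpy as np

def inefficiency_factor(draws, bandwidth=100):
    """Relative variance of the sample mean under serial correlation.

    A value of 1 means the draws are as informative as an i.i.d. sample;
    larger values signal stronger autocorrelation in the chain.
    """
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    rho = np.array([np.dot(x[:-l], x[l:]) / np.dot(x, x)     # autocorrelation at lag l
                    for l in range(1, bandwidth + 1)])
    weights = 1.0 - np.arange(1, bandwidth + 1) / (bandwidth + 1.0)  # Bartlett taper
    return 1.0 + 2.0 * np.sum(weights * rho)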

None of the density intervals for the parameters include 0. The order of integration, d, is estimated

at 0.44 (compared to 0.48 for M0). This implies that the S&P 500 data exhibit long-memory behavior. Furthermore, the results suggest that there are no structural breaks in d, which can be seen from the relatively smaller ML values for M2, M4 and M6 compared to M5. When we look at the parameters

[9] The beginning of the second Iraq invasion is associated with a lower level of volatility, which seems counterintuitive. We cannot find an explanation for this phenomenon.


that change at a break, we find that there are noticeable differences across regimes. For instance, we

see that µk increases from 0.15 to 0.39 during the regime that covers the financial crisis. Subsequently,

µk falls to 0.17 from the last change point till the end of the sample. The same happens for the

conditional volatility of realized volatility, σk. Specifically, we estimate σ4 at 0.82 during the 2008

financial crisis. This estimate is twice as large as σ3, which we estimate at 0.42. Thereafter, σ_k falls

to 0.28 from the last change point till the end of the sample.

2.6.3. Robustness to minimum duration restrictions

In this section, we test the robustness of the minimum regime durations. We follow Liu and Maheu

(2008), and estimate the best model, M5, under different minimum duration lengths. Hence, besides

the minimum 66 days (3 months), we consider the following duration lengths: 44 days (2 months), 88

days (4 months), 110 days (5 months) and 132 days (6 months).

Table 2.11 reports ML values for M5 from 1 to 6 change points under the mentioned duration

lengths. Overall, we see that the marginal likelihoods are almost identical across different cases,

except for duration lengths of 44 and 132 days. First, when we set the minimum duration length to

44, we obtain the exact same ML values as in Table 2.9 for 1 to 4 change points. Second, for 5 and 6

change points, we get lower ML values. However, this does not change the main conclusion.

For 132 days, we find evidence in favor of 3 change points at: 04-10-2003, 07-25-2007 and 03-19-

2009. Furthermore, the ML values for M5, 4 to 6 CP differ significantly from those in Table 2.9. However, this is understandable. The explanation is as follows: in Section 2.6.2, we found evidence of four structural breaks, with the last two break dates occurring at 09-16-2008 and 03-19-2009. Accordingly, there are 127 days between these two dates. Hence, when we set the minimum duration length to 132 days, we automatically force the model to find break dates different from those. In

fact, for the specification with four change points, we find that the last two breaks occur at 09-05-2008

and at 03-19-2009, whereas for the other duration lengths they occur at 09-16-2008 and 03-19-2009.

Consequently, we obtain different ML values. Furthermore, these results indicate that the M5, 4 CP estimate of the third change-point date under the minimum 132-day restriction, 09-05-2008, worsens

ML considerably.

Overall, the marginal likelihoods are almost identical across the different duration restrictions. The

first four duration restrictions favor four change points with the exact same change-point dates.[10]

Table 2.11.: Robustness to minimum regime duration lengths

        44 days      66 days      88 days      110 days     132 days
# CP    (2 months)   (3 months)   (4 months)   (5 months)   (6 months)
1       -1927.45     -1927.45     -1927.45     -1927.45     -1927.45
2       -1378.12     -1378.12     -1380.89     -1378.12     -1378.12
3       -1283.65     -1283.65     -1283.65     -1283.65     -1283.65
4       -1257.10     -1257.10     -1257.10     -1257.10     -1662.56
5       -2033.78     -1259.75     -1259.75     -1259.75     -1542.71
6       -2036.12     -1737.02     -1737.02     -1737.02     -1702.11

This table compares log(ML) values of different minimum regime durations for M5. Each column

reports the lower bound for the number of observations in each regime.

[10] We obtain very similar results for DIC.


2.6.4. Prior sensitivity analysis

In this section, sensitivity of the results to prior specification is evaluated by experimenting with

different prior hyperparameter values on the transition probabilities, pkk, keeping prior hyperparameter

values of the other parameters the same as in Section 2.4.1. pkk is one of the key parameters of the

model because it controls the duration of each regime in a sequence of S.

In Table 2.12 we experiment with different hyperparameter values on pkk, and report log(ML) for

each of these hyperparameter values by estimating M5 from 1 to 6 change points. For instance,

the first alternative prior is pkk ∼ Beta (0.5, 0.5) which is relatively flat. With this prior, we find

evidence of three change points. They are given as: 04-10-2003, 07-25-2007 and 03-19-2009. For

pkk ∼ Beta (0.5, 0.5), compared to M5, 4 CP, the Bayes factor in favor of M5, 3 CP is exp (1.35),

which is very weak evidence. Hence, contrary to the results in Table 2.9, we do not have substantial

posterior evidence in favor of 3 change points. For pkk ∼ Beta (10, 2), we find strong evidence in favor

of four change points. Furthermore, the change-point dates correspond exactly to those in Section

2.6.2. Finally, for p_kk ∼ Beta(20, 0.1), which is a relatively tight prior, we find that M5, 5 CP performs best.[11] The first break occurs at 07-26-2002, the second at 04-14-2003, while the remaining

break dates correspond exactly to those in Section 2.6.2. Evidently, there remains some uncertainty

regarding the correct number of breaks for very uninformative and very tight priors on pkk. However,

results overwhelmingly suggest existence of structural breaks during the financial crisis of 2008/2009.

Finally, we also experiment with different prior hyperparameter values on µk, dk and σ2k. Overall, we

obtain very similar results.

Table 2.12.: Prior sensitivity analysis

# CP   pkk ∼ Beta(0.5, 0.5)   pkk ∼ Beta(8, 0.1)   pkk ∼ Beta(10, 2)   pkk ∼ Beta(20, 0.1)
1      -1928.12               -1927.45             -1935.21            -1927.35
2      -1384.52               -1378.12             -1393.53            -1379.41
3      -1290.85               -1283.65             -1669.50            -1616.89
4      -1292.20               -1257.10             -1289.23            -1444.74
5      -1869.91               -1259.75             -1869.29            -1237.21
6      -1905.09               -1737.02             -1888.65            -1604.79

This table compares log(ML) values for different prior hyperparameter values on p_kk. The prior hyperparameter values of the other parameters are fixed according to Section 2.4.1.

2.6.5. Forecasts

In this section, we compare the out-of-sample performance of M5 (break) with M0 (no-break). Specif-

ically, we compare the out-of-sample predictive likelihood (PL) and predictive mean between these two

models. Given data up to time t − 1, Y_{t−1} = (y_1, ..., y_{t−1})′, the predictive likelihood, p(y_t, ..., y_T | Y_{t−1}),

is the predictive density evaluated at the realized outcome, yt, ..., yT , t ≤ T , see Geweke (2005). It

contains the out-of-sample prediction record of a particular model, making it the essential quantity of

interest for model evaluation. For instance, the predictive likelihood for M5 is given as

p(y_t, ..., y_T \mid Y_{t-1}, M5) = \int p(y_t, ..., y_T \mid P, \theta, M, Y_{t-1}, M5)\, p(P, \theta, M \mid Y_{t-1}, M5)\, dP\, d\theta\, dM. \quad (2.6.1)

[11] Specifically, p_kk ∼ Beta(20, 0.1) means that we assume the expected duration of each regime is about 201 days before we see the data.


(2.6.1) is the product of the individual predictive likelihoods

p(y_t, ..., y_T \mid Y_{t-1}, M5) = \prod_{s=t}^{T} p(y_s \mid Y_{s-1}, M5).

If t = 1 this would be the marginal likelihood and (2.6.1) changes to (2.3.4). Hence, the sum of log-

predictive likelihoods can be interpreted as a measure similar to the logarithm of the marginal likeli-

hood, but ignoring the initial t−1 observations. The predictive likelihood can be used to order models

according to their predictive abilities. In a similar fashion to Bayes factors, one can also compare the

performance of models based on a specific out-of-sample period by predictive Bayes factors, PBF. The

PBF for model A versus model B is given as PBF_{AB} = p(y_t, ..., y_T | Y_{t−1}, M_A)/p(y_t, ..., y_T | Y_{t−1}, M_B), and

summarizes the relative evidence of the two models over the out-of-sample data, yt, ..., yT . Calculating

the predictive likelihood within a Gibbs sampling scheme is easy. We can simply use the output from

the Gibbs sampler. These draws are obtained based on the information set, Yt−1. As a new observa-

tion enters the information set, the posterior is updated through a new round of Gibbs sampling and

p (yt+1 | Yt,MA) can then be calculated.
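A minimal Python sketch of these two calculations (our own illustration; inputs are assumed to be the per-draw conditional densities and the per-observation log predictive likelihoods):

import numpy as np

def log_pred_like_one_step(cond_density_draws):
    """Approximate p(y_t | Y_{t-1}, M) by averaging p(y_t | draw_i, Y_{t-1}, M)
    over the Gibbs draws obtained from the information set Y_{t-1}."""
    return np.log(np.mean(cond_density_draws))

def log_pbf(logpl_model_a, logpl_model_b):
    """log predictive Bayes factor of model A vs model B over y_t, ..., y_T.
    Positive values favor model A."""
    return np.sum(logpl_model_a) - np.sum(logpl_model_b)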

In the context of forecasting with M5, we want the optimal change-point number to vary over

the out-of-sample data as the number of change points can increase as time goes by. Accordingly, we

adopt the following strategy: for the first out-of-sample observation at time t, we calculate the marginal

likelihood for various number of change points, (1, ...,K) using Yt−1. Thereafter, we choose the opti-

mal change-point number, Kt−1, using ML. We calculate the predictive likelihood, p (yt | Yt−1,M5),

and the predictive mean, E [yt | Yt−1,M5], using the parameters associated with specification Kt−1.

Thereafter, we increase the out-of-sample with one observation, calculate marginal likelihoods for

(1, ...,Kt−1 + 1) change points, choose the optimal change-point number, Kt, repeat the above fore-

casting procedure to obtain p (yt+1 | Yt,M5) and E [yt+1 | Yt,M5]. We choose the out-of-sample period

from January 23rd, 2006 till the end of the sample. It is also interesting to consider out-of-sample

point forecasts of yt based on the predictive mean. Therefore, we also report mean absolute error

(MAE) and root mean squared error (RMSE) for the predictive mean. The out-of-sample period

corresponds exactly to the period used to calculate PL. Furthermore, in addition to MAE and RMSE,

forecasts are also compared using the linear exponential (LINEX) loss function of Zellner (1986). This

loss function is defined as L(y_t, \hat{y}_t) = b_{LINEX}[\exp(a_{LINEX}(\hat{y}_t − y_t)) − a_{LINEX}(\hat{y}_t − y_t) − 1], where \hat{y}_t is the point forecast. L(y_t, \hat{y}_t) penalizes overprediction (underprediction) more heavily for a_{LINEX} > 0 (a_{LINEX} < 0). We use b_{LINEX} = 1, with a_{LINEX} = 1 and a_{LINEX} = −1, in our calculations.
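A minimal Python sketch of the LINEX loss (our own illustration):

import numpy as np

def linex_loss(y, y_hat, a=1.0, b=1.0):
    """LINEX loss in the spirit of Zellner (1986): asymmetric in the forecast error.

    With a > 0 overprediction is penalized more heavily; with a < 0, underprediction.
    """
    e = np.asarray(y_hat) - np.asarray(y)          # forecast error, y_hat - y
    return b * (np.exp(a * e) - a * e - 1.0)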

Overall, the break model offers improvements compared to the no-break model. For one observation

out-of-sample, (T = t), log (PBF ) = 2.26, 1 month, (T = t+ 21), log (PBF ) = 5.14, 3 months,

(T = t+ 65), log (PBF ) = 59.06, each in favor of the break specification. The improvements continue

till the end of sample, see Table 2.13. Finally, Table 2.13 also displays out-of-sample results for one-day

ahead point forecasts for the no-break and the break model. The break model offers improvements in

terms of MAE and RMSE compared to the no-break model. When the LINEX loss function is used, the

break model also provides gains in terms of point forecasts. However, compared to density forecasts,

point forecasts show only modest improvements. The difference between the predictive likelihood and

the predictive mean is that the predictive likelihood criterion takes into account the whole shape of

the density, whereas the predictive mean does not.


Table 2.13.: Out-of-sample forecasts for S&P 500 volatility

Model      MAE      RMSE     LINEX (a = 1, b = 1)   LINEX (a = −1, b = 1)   log(PL)
No-break   0.3553   0.6241   0.2582                 3.2456                  -923.73
Break      0.3403   0.5993   0.2475                 2.5680                  -544.48

This table reports the mean absolute error (MAE), root mean squared error (RMSE) and average LINEX loss for one-day-ahead forecasts based on the predictive mean. Furthermore, the one-day-ahead log-predictive likelihood, log(PL), is also reported.

2.6.6. Structural breaks and GARCH effects

In this section, we model changes in the volatility of realized volatility, see Bollerslev et al. (2007),

Corsi et al. (2008), Liu and Maheu (2008). To investigate if the presence of structural breaks is due

to neglected conditional variance dynamics, we consider breaks in an ARFIMA-GARCH model. The

ARFIMA model extended to include a GARCH structure is

(1 - L)^d (y_t - \mu) = \gamma_{s_t} \sigma_t e_t, \quad e_t \sim N(0, 1), \quad \sigma_t^2 = \omega + a e_{t-1}^2 + b \sigma_{t-1}^2, \quad (2.6.2)

where ω > 0, a > 0, b > 0 and a + b < 1 are also imposed. The parameter γ_{s_t} is a scaling constant,

which has a direct effect on the unconditional volatility of yt. In the following, γ1 = 1, and it is

assumed that γk > 0 for k = 2, ...,m. Thus, in regime 1, this is a standard ARFIMA model with

GARCH effects, see Baillie et al. (1996), while in later regimes, the conditional variance of yt can be

larger or smaller than σ_t², depending of course on whether γ_{s_t} > 1 or γ_{s_t} < 1. As noted by Liu and Maheu

(2008), the advantage of this specification is that one can model permanent changes in the volatility

of realized volatility but avoid the path dependence in σ2t induced by parameter changes in ω, a, and

b.[12] Equation (2.6.2) is estimated using the AR representation of the ARFIMA model. The likelihood

is evaluated using the method of Beran (1994). Let θ = (µ, d, ω, a, b, γ_2, ..., γ_m)′; the Gibbs sampler requires iterating over

1. S^{(i)} | P^{(i−1)}, θ^{(i−1)}, Y_T.

2. θ^{(i)} | S^{(i)}, Y_T.

3. P^{(i)} | S^{(i)}.

We use Metropolis-Hastings to sample each element of θ. For the GARCH parameters, we sample ψ = (ω, a, b)′ all-at-once using the Independence Chain Metropolis-Hastings algorithm. Specifically, conditional on S, γ_2, ..., γ_m, d and µ, at each iteration of the Gibbs sampler, we maximize the likelihood of (2.6.2) with respect to ψ, and specify the candidate generating density as q(ψ) ∼ N(\hat{ψ}_{ML}, c · var(\hat{ψ}_{ML})), where c ∈ R_+. The priors of θ are independent Normals with mean 0,

variance 100, truncated (except for µ) to satisfy the restrictions on each parameter. Furthermore, we

ensure that a + b < 1 by resampling a^{(i)} > 0 and b^{(i)} > 0 until a^{(i)} + b^{(i)} < 1. We follow Section

2.6.2 and estimate a stable ARFIMA-GARCH model, as well as its structural break version with 1

to 5 change points. We find strong evidence in favor of structural breaks for (2.6.2). Specifically, we

find that the specification with 3 change points performs best. The change-point dates are given as:

11-20-2007, 09-04-2008 and 03-19-2009.

[12] Liu and Maheu (2008) consider a similar HAR-GARCH specification.


Parameter estimates of the ARFIMA-GARCH specification conditional on 3 change points are

listed in Table 2.10. Compared to M5, 4 CP, we find that d increases (from 0.44 to 0.48). The scaling parameter, γ_{s_t}, also changes between regimes. Specifically, γ_{s_t} rises during the second and third phases, which start in late 2007 and last until 03-18-2009. γ_{s_t} falls from 1.68 for 09-04-2008/03-18-2009 to 0.93 for 03-19-2009/12-31-2009. Finally, the unconditional volatility of volatility, \sqrt{\gamma_{s_t}^2\,\omega/(1 - a - b)}, increases during the financial crisis of 2008 (1.91 for the period 09-04-2008 to 03-18-2009 compared to 0.82 for the period 11-20-2007 to 09-03-2008). Thereafter, \sqrt{\gamma_{s_t}^2\,\omega/(1 - a - b)} falls to 0.58 from the

last change point till the end of the sample.
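For reference, the implied unconditional volatility of volatility can be computed as in the following minimal Python sketch (our own illustration):

import numpy as np

def uncond_vol_of_vol(gamma_k, omega, a, b):
    """sqrt(gamma_k^2 * omega / (1 - a - b)): the unconditional volatility of
    realized volatility in regime k implied by the ARFIMA-GARCH model (2.6.2)."""
    if a + b >= 1.0:
        raise ValueError("requires a + b < 1")
    return gamma_k * np.sqrt(omega / (1.0 - a - b))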

2.7. Conclusion

We present a Bayesian method for joint analysis of long-memory and structural breaks using change-

point ARFIMA models. We estimate different specifications and determine the number of change

points using ML and DIC. Monte Carlo simulations demonstrate that our MCMC sampler works very

well as the estimated parameters are close to their true values. Furthermore, we find that ML and

DIC are powerful in detecting and dating structural breaks.

Applying the model to daily S&P 500 data from 2000 to 2009 shows that there is robust evidence

in favor of four structural breaks. We demonstrate that accounting for structural breaks improves

density and point forecasts. Finally, an ARFIMA model with GARCH effects is also considered. This

model provides evidence in favor of three structural breaks.


3. A Survey of Particle Markov Chain Monte Carlo

Techniques for Unobserved Component Time

Series Models

Author: Nima Nonejad

Abstract: This paper details particle Markov chain Monte Carlo (PMCMC) techniques for anal-

ysis of unobserved component time series models using several economic data sets. The objective of

this paper is to explain the basics of the methodology and provide computational applications that

justify applying PMCMC in practice. For instance, we use PMCMC to estimate stochastic volatil-

ity models with leverage effect, Student-t distributed errors and serial dependence. We also model

the changing time series characteristics of the monthly US inflation rate by considering a long-memory au-

toregressive fractionally integrated moving average (ARFIMA) model with its conditional variance

modeled by a stochastic volatility process with Gaussian and Student-t distributed errors.

Keywords: Bayes, Gibbs, Metropolis-Hastings, particle filter, unobserved components

(JEL: C11, C22, C63)


3.1. Introduction

This paper is inspired by Bos (2011), Creal (2012), Flury and Shephard (2011). It provides some

contribution to the field of computational sequential Monte Carlo methods.[1] Specifically, we analyze

different economic data sets using particle Markov chain Monte Carlo (PMCMC) techniques, see

Andrieu et al. (2010), Flury and Shephard (2011). The aim of this paper is not to focus on the

properties of PMCMC compared to Gibbs sampling and maximum likelihood, nor to provide very

thorough analyses of empirical data. Instead, we aim to describe the basic steps of PMCMC, together

with details on the implementation of some of the key algorithms.[2] These algorithms are chosen for the

insights that they provide. They are not always the most advanced, quickest, or most efficient way

of programming. Rather, we aim to show that PMCMC provides a very compelling framework for

estimating unobserved component (UC) time series models. We illustrate these methods on different

problems, producing rather generic methods.

In traditional MCMC (particularly Gibbs sampling) applications of UC models, we use either: a

single-state procedure, see Jacquier et al. (1994), mixture samplers as in Kim et al. (1998), Omori

et al. (2007) or an accept-reject Metropolis-Hastings procedure as in Chan (2014) to draw the latent

states. In the PMCMC framework, we can estimate these types of models by (a): targeting the

posterior with the latent states integrated out. Thereafter, we can sample the model parameters

using Metropolis-Hastings, or (b): we can sample the latent states directly using the nonlinear or

non-Gaussian framework via a conditional particle filter. In general, (a) is referred to as the particle

marginal Metropolis-Hastings sampler. Andrieu et al. (2010) show that when an unbiased estimated

likelihood is used inside a Metropolis-Hastings algorithm then the estimation error makes no difference

to the equilibrium distribution of the algorithm, see Flury and Shephard (2011) for more details. In

a similar fashion, Andrieu et al. (2010) term the second estimation procedure, (b), the particle Gibbs

sampler, in which one sequentially samples the latent states (using a particle approximation of the

conditional posterior of the latent states) and the model parameters. In this paper we focus mainly

on implementation of the particle marginal Metropolis-Hastings sampler, PMMH. However, we also

provide an empirical application where we apply particle Gibbs, PG3. We believe that applying the

PMCMC methodology to unobserved component models is the most important contribution that we

provide. As we shall see, for these types of models, PMCMC requires limited design effort on the

user’s part, especially if one desires to change some features in a particular model.

The main concepts of PMCMC are briefly introduced in Section 3.2. The initial model in Section

3.3 is the standard stochastic volatility (SV) model with Gaussian errors applied to a financial data

set concerning daily OMXC20 returns. Next, we consider different well-known extensions of the SV

model. The first extension is a SV model with Student-t distributed errors. In the second extension,

we incorporate a leverage effect by modeling a correlation parameter between measurement and state

errors. In the third extension, we implement a model that has both stochastic volatility and moving

average errors, see Chan (2013). The fourth extension is PMMH implementation of the stochastic

[1] Bos (2011) and Creal (2012) provide excellent introductions to particle filtering, both in terms of theory and computation. In fact, both authors provide the software associated with their work. Hence, the idea behind this article is to do the same, however for PMCMC, which is gaining traction in time series econometrics. This way, other researchers can replicate our results using the codes that we can provide.

[2] The choice of Ox is mainly because it is a popular program among econometricians.
[3] Furthermore, we address possible technical complications of PG by using the conditional particle filter with ancestor sampling of Lindsten et al. (2012), which has been shown to be computationally more robust and flexible.


volatility in mean model of Koopman and Hol Uspensky (2002). In this specification, the unobserved

volatility process appears in both the conditional mean and the conditional variance. We show that

estimating this specification is also very straightforward using PMMH. Finally, we consider a two factor

stochastic volatility model as in Harvey et al. (1994) and Shephard (1996). We show that PMMH

provides a straightforward procedure for estimation and marginal likelihood (ML) calculation of these

models. Specifically, computing the marginal likelihood is relatively easy using the method of Gelfand

and Dey (1994) as the integrated likelihood is easily available from the particle filter. Thereafter,

we reconsider the unobserved components model of the US inflation rate, see Stock and Watson

(2007), Grassi and Proietti (2010). We estimate different specifications of the unobserved components

model using PMMH. Model selection is again carried out by comparing marginal likelihoods between

models. Results indicate that the specification in which the volatility of both the regular and irregular

components of inflation evolve according to SV processes performs best.

Finally, we show that it is also relatively easy to estimate more complicated models using PMCMC.

First, we estimate an autoregressive fractionally integrated moving average (ARFIMA) model with

time-varying volatility using monthly postwar US core inflation data from 1957 to 2013. Our initial

model is an ARFIMA model with time-varying volatility modeled as a Gaussian SV process, see Bos

et al. (2012). Next, we consider a well-known extension and model volatility using an SV model with

Student-t distributed innovations. Furthermore, model estimation is comparatively fast, so we are

able to obtain rolling estimates necessary for forecasting and parameter sensitivity analysis. Second,

we follow Chan (2014), and develop an unobserved components model where the stochastic volatility

process has a direct and time-varying impact on the variable of interest. However, Chan (2014) uses

a traditional Gibbs sampling approach, whereas we use particle Gibbs. We show that we are able

to sample the latent states all-at-once using the nonlinear state-space form. On the other hand,

Chan (2014) draws these latent states sequentially. Furthermore, Chan (2014) is forced to implement

an accept-reject Metropolis-Hastings procedure to draw the volatility process from its conditional

posterior as the conditional state-space model cannot be written in linear form.

The main concepts of PMCMC with computational focus on Metropolis-Hastings and the particle

filter are presented in Section 3.2. In Sections 3.3 to 3.6, we present several applications to demonstrate

the performance of the algorithms. Finally, the last section concludes.

3.2. Markov Chains and Particle Filters

Consider the simplest formulation of the stochastic volatility (SV) model

yt = exp (αt/2) εt, εt ∼ N (0, 1) (3.2.1)

αt+1 = µ+ ρ (αt − µ) + σηt, ηt ∼ N (0, 1) , (3.2.2)

where y_t is the observed data, α_{1:T} = (α_1, ..., α_T)′ are the unobserved log-volatilities, µ is the drift in the state equation, σ is the volatility of log-volatility and ρ is the persistence parameter. Typically, we would impose that |ρ| < 1, so that we have a stationary process with the initial condition α_1 ∼ N(µ, σ²/(1 − ρ²)). Let θ = (µ, ρ, σ²)′ and Y_T = (y_1, ..., y_T)′. This model has been heavily analyzed in

time series econometrics, see for example Kim et al. (1998). Equations (3.2.1)-(3.2.2) are an example

of a nonlinear state-space model where the measurement equation is nonlinear in α1:T . Furthermore,


while sampling θ ∼ p (θ | α1:T , YT ) is relatively easy, sampling α1:T ∼ p (α1:T | θ, YT ) is often difficult.

Within the Gibbs sampling framework, the most popular approach to estimating (3.2.1)-(3.2.2) is

the so-called auxiliary mixture sampler, see Kim et al. (1998). The idea is to approximate the SV

model using a mixture of linear Gaussian models. Specifically, we can square both sides of (3.2.1), take

the logarithm, such that y_t^* = α_t + ε_t^*, where y_t^* = \log y_t^2 and ε_t^* = \log ε_t^2. Kim et al. (1998) show that ε_t^* can be approximated by a seven-component Gaussian mixture density. We can then write this mixture density in terms of an auxiliary random variable, z_t ∈ {1, ..., 7}, that serves as the mixture component indicator. Hence, ε_t^* | z_t = i ∼ N(m_i, s_i^2), where Pr(z_t = i) = ω_i. The values of m_i, s_i^2 and ω_i, i = 1, ..., 7, are all fixed and given in Kim et al. (1998). Using this Gaussian mixture approximation,

we can express the SV model as a linear Gaussian state-space model. Bayesian estimation can be

performed using standard Gibbs sampling techniques, see Kim et al. (1998). Finally, using this

approach, we sample from the augmented posterior, p (θ, α1, ..., αT , z1, ..., zT | YT ), i.e. augmented

to include z_1, ..., z_T, and not from p(θ, α_1, ..., α_T | Y_T).
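As a point of reference for the algorithms that follow, simulated data from (3.2.1)-(3.2.2) can be generated in a few lines. This minimal Python sketch (our own illustration, with arbitrary parameter values) is convenient for testing any of the samplers discussed below:

import numpy as np

def simulate_sv(T, mu, rho, sigma, rng):
    """Simulate observations and log-volatilities from (3.2.1)-(3.2.2)."""
    alpha = np.empty(T)
    # stationary initial condition: alpha_1 ~ N(mu, sigma^2 / (1 - rho^2))
    alpha[0] = mu + sigma / np.sqrt(1.0 - rho ** 2) * rng.standard_normal()
    for t in range(1, T):
        alpha[t] = mu + rho * (alpha[t - 1] - mu) + sigma * rng.standard_normal()
    y = np.exp(alpha / 2.0) * rng.standard_normal(T)
    return y, alpha

# illustrative parameter values only (not estimates from this paper)
y, alpha = simulate_sv(T=2000, mu=-0.5, rho=0.97, sigma=0.15,
                       rng=np.random.default_rng(123))

Within the PMCMC framework, specifically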

PMMH, we need not use the above approximation. On the contrary, we can approach estimating

(3.2.1)-(3.2.2) by using the predictive decomposition

p(Y_T \mid \theta) = \prod_{t=1}^{T} p(y_t \mid Y_{t-1}, \theta). \quad (3.2.3)

In most general cases, we do not have a closed form expression for p (yt | Yt−1, θ). We can therefore only

approximate it. In this paper, we use simulations to obtain an unbiased estimate of each term on the right side of

(3.2.3). This is carried out using a particle filter, see Section 3.2.1. We then use one of the main results

of Andrieu et al. (2010). Their result states that when an unbiased likelihood estimate, \hat{p}(Y_T | θ), is used inside an MCMC algorithm, the estimation error makes no difference to the equilibrium

distribution of the algorithm, the posterior distribution, p (θ | YT ) ∝ p (YT | θ) p (θ). Thus, using

(3.2.3) and the prior, p (θ), we can sample θ using Metropolis-Hastings (M-H), see equation (13) of

Andrieu et al. (2010).[4] The general M-H algorithm follows these steps:

1. Initialize: start with a parameter vector, θ^{(0)}, and set i = 1.

2. Draw a candidate value, θ* ∼ q(θ | θ^{(i−1)}).

3. Obtain \hat{p}(Y_T | θ*) via θ* and a particle filter. Accept {θ*, \hat{p}(Y_T | θ*)} with probability

a_{MH}\left(\theta^*, \theta^{(i-1)}\right) = \min\left\{1, \frac{\hat{p}(Y_T \mid \theta^*)\, p(\theta^*)\, q(\theta^{(i-1)} \mid \theta^*)}{\hat{p}(Y_T \mid \theta^{(i-1)})\, p(\theta^{(i-1)})\, q(\theta^* \mid \theta^{(i-1)})}\right\}. \quad (3.2.4)

4. If θ* is accepted, set θ^{(i)} = θ* and \hat{p}(Y_T | θ^{(i)}) = \hat{p}(Y_T | θ*); else retain θ^{(i−1)} and \hat{p}(Y_T | θ^{(i−1)}). Set i = i + 1 and repeat from Step 2.

The candidate density, q(θ | θ^{(i−1)}), can be chosen freely, though a density related to the target density leads to better acceptance rates. We start by using the Random Walk Metropolis-Hastings algorithm, see Koop (2003). Thus, we generate θ* from q(θ | θ^{(i−1)}), a multivariate Normal

[4] We follow the same framework as Flury and Shephard (2011). Thus, we sample θ and not the pair (θ, α_{1:T}) as in Andrieu et al. (2010), which requires a very minor modification in the codes. Furthermore, there are no differences in estimation results between these two approaches. However, the first framework is computationally a little easier.


density with expectation θ^{(i−1)} and a prespecified covariance matrix, Σ_q. We follow Koop (2003, page 98) and adjust Σ_q to obtain acceptance rates of roughly 30 to 40%. Furthermore, to allow for faster convergence and even better mixing, we follow So et al. (2005) and do the following: we perform the Random Walk M-H algorithm for the first N_1 of the total N PMMH iterations and form the sample mean and covariance of \{θ^{(i)}\}_{i=1}^{N_1}, denoted (\bar{θ}, \hat{Σ}). Then, we perform the following Independence Chain M-H algorithm for the remaining N_2 = N − N_1 iterations:

1. At iteration i, where i ≥ N_1 + 1, generate θ* from N(θ̄, Σ̂).

2. Obtain p̂(Y_T | θ*) via θ* and a particle filter. Accept θ*, p̂(Y_T | θ*) with probability

a_MH(θ*, θ^(i-1)) = min{1, [p̂(Y_T | θ*) p(θ*) q(θ^(i-1))] / [p̂(Y_T | θ^(i-1)) p(θ^(i-1)) q(θ*)]}, (3.2.5)

where q(·) denotes the N(θ̄, Σ̂) density evaluated at its argument. Set i = i + 1 and repeat from Step 1.
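A minimal Ox sketch of forming these proposal moments is given below; the function name and the matrix mdraws (the first N_1 draws stored row-wise) are our own illustration, and we assume the standard Ox functions meanc(), variance(), choleski() and rann().

funcFormProposal(const mdraws)
{
  // mdraws: N1 x z matrix holding the random walk M-H draws
  decl vmu = meanc(mdraws);      // sample mean (1 x z)
  decl msig = variance(mdraws);  // sample covariance matrix (z x z)
  return {vmu, msig};
}
// A candidate for Step 1 is then drawn as
// theta* = vmu' + choleski(msig)*rann(columns(mdraws),1).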

Finally, we specify the priors as: µ ∼ N (0, 1), (ρ+ 1) /2 ∼ Beta (20, 1.5) and σ2 ∼ IG (4/2, 0.02/2),

where IG (v0/2, s0/2) denotes the inverse Gamma density, see Kim and Nelson (1999).

Table 3.1.: A particle filter in Ox

funcParticleFilter(const vhpf, const vESS, const vlogpdf, const vparam,
                   const vy, const iN)
{
  ...
  vw=ones(iN,1)*(1/iN);
  vlogpdf[0][0]=0;                            // No likelihood contribution
  [vh]=funcInParticles(vparam,iN);            // Initial particles
  for(i=1; i<it; i++)
  {
    vA=funcResample(vw',iN);                  // Resample
    vh=funcDrawP(vh[vA'],vparam,vy[i][0],iN); // Draw from q()
    vE=vy[i]./(exp(0.5*vh));
    vtau=exp(-0.5.*vh).*densn(vE);            // Compute likelihood
    vlogpdf[0][i][0]=log(meanc(vtau));        // Take logs
    vw=vtau./sumc(vtau);                      // Normalize
    if (ismissing(vw))                        // Reset weights if missing
      vw=ones(iN,1)/iN;
    vhpf[0][i][0]=sumc(vw.*vh);               // Store mean
    vESS[0][i][0]=1/sumc(vw.^2);              // Store ESS
    if(vESS[0][i][0]<0.5*iN)                  // If ESS < 0.5*iN, resample
    {
      vA=funcResample(vw',iN);
      vh=funcDrawP(vh[vA'],vparam,vy[i][0],iN);
      vw=ones(iN,1)*(1/iN);
    }
  }
  return 1;
}


3.2.1. A Particle filter

As stated in Section 3.2, we obtain p̂(Y_T | θ) by employing a particle filter. The particle filter is a sequential simulation device for filtering nonlinear, non-Gaussian state-space models. It can be thought of as a generalization of the Kalman filter. Both the particle and Kalman filters produce filtered estimates of α_{1:T} and p(y_t | Y_{t-1}, θ) for t = 1, ..., T. In the Kalman filter case all these quantities are exact, whereas in the particle filter case they are simulation-based estimates.

The main idea of the particle filter is to sample a cloud of particles, α_t^(j), j = 1, ..., M, that together describe the density of the state variable at time t conditional on Y_t. At each t, we propagate the particles, α_t^(j), and update their associated weights. This way, we prevent the accumulation of errors by eliminating unpromising particles. In the following, we give a brief description of the generic particle filter that we use throughout this paper. The reader is referred to Doucet et al. (2000) and Creal (2012) for more details on particle filtering. Our particle filter scheme is as follows:

1. Set t = 1 and l_0 = 0. Draw α_1^(1), ..., α_1^(M) from α_1 | θ.

2. Compute τ_t^(j) = p(y_t | α_t^(j), Y_{t-1}, θ) and w_t^(j) = τ_t^(j) / (Σ_{k=1}^M τ_t^(k)) for j = 1, ..., M.

3. Resample {α_t^(1), ..., α_t^(M)} with probabilities w_t^(1), ..., w_t^(M). First, draw u ∼ U(0, 1). Let x^(j) = (u + j - 1)/M for j = 1, ..., M, and find indices i(1), ..., i(M) such that Σ_{k=1}^{i(j)-1} w_t^(k) < x^(j) ≤ Σ_{k=1}^{i(j)} w_t^(k). We refer to this step as the "(multinomial) resampling step" for future reference; a minimal Ox sketch of this routine is given after the next paragraph.

4. Sample α_{t+1}^(j) ∼ α_{t+1} | α_t^{(i(j))}, θ for j = 1, ..., M.

5. Compute l_t(θ) = l_{t-1}(θ) + log(M^{-1} Σ_{j=1}^M τ_t^(j)). Set t = t + 1 and go to Step 2.

From the particle filter, the filtered particles can be collected as output. We can use ᾱ_t = Σ_{j=1}^M w_t^(j) α_t^(j) as an estimate of E[α_t | Y_t, θ]. An intuitive implementation of the basic particle filter used throughout this paper is provided in Table 3.1.
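The resampling routine funcResample() called in Table 3.1 is not listed there. The following is a minimal sketch of Step 3 under our reading of the single-uniform scheme above; the function name and layout are our own illustration.

funcResampleSys(const vw, const iN)
{
  // vw: 1 x iN row vector of normalized weights; iN: number of particles
  decl j, k = 0, dstep = 1.0/iN;
  decl du = ranu(1,1)*dstep;         // single uniform draw on (0, 1/M)
  decl vcum = cumulate(vw');         // running sum of the weights
  decl vA = zeros(1,iN);             // indices of the surviving particles
  for (j = 0; j < iN; ++j)
  {
    while (k < iN-1 && vcum[k] < du + j*dstep)
      ++k;                           // first index whose cumulated weight
    vA[j] = k;                       // exceeds x(j) = (u + j)/M
  }
  return vA;
}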

Del Moral (2004, Theorem 7.4.2) shows that E[p̂(Y_T | θ)] = E[exp(l_T(θ))] = p(Y_T | θ). Therefore, since the particle filter provides us with an unbiased estimate of p(Y_T | θ), we can use the result in Andrieu et al. (2010) and replace p(Y_T | θ) with p̂(Y_T | θ) inside an MCMC sampling scheme. Thereafter, we sample θ, which has a lower dimension than the pair (θ, α_{1:T}), using M-H, see Flury and Shephard (2011).

Basic code for evaluating a_MH(θ*, θ^(i-1)) could be implemented as in Table 3.2. The routine funcMetropolisHastings() draws θ* and evaluates a_MH(θ*, θ^(i-1)) using (3.2.4) for i = 1, ..., N_1 and (3.2.5) for i = N_1 + 1, ..., N. The function funcPosterior() evaluates p(θ* | Y_T) as θ* is generated. In order to evaluate p(θ* | Y_T), we need to run the particle filter using θ*. For i = 1, ..., N_1, we specify the random walk variance, Σ_q, as vtune. As stated before, in order to obtain reasonable acceptance probabilities, we experiment with several values of vtune. For instance, for the Gaussian SV model, we set: ∆µ^(i) = 0.2828 ε_1^(i), ∆ρ^(i) = 0.01 ε_2^(i) and ∆σ²^(i) = 0.01 ε_3^(i), where ε_k ∼ N(0, 1) for k = 1, ..., 3. Thereafter, we follow Section 3.2 and perform Independence Chain Metropolis-Hastings for the remaining N_2 draws. The denominator is evaluated using θ^(i-1), p(θ^(i-1)) and p̂(Y_T | θ^(i-1)), which are available from the previous iteration. We complete the M-H step by drawing u from the standard Uniform distribution, U(0, 1). If a_MH(θ*, θ^(i-1)) ≥ u, we set θ^(i) = θ* and p̂(Y_T | θ^(i)) = p̂(Y_T | θ*); else we retain θ^(i-1) and p̂(Y_T | θ^(i-1)). Thereafter, we take another PMMH iteration and move along the chain. After dropping a set of burn-in samples, the remaining draws are collected for inference.


Table 3.2.: PMMH scheme in Ox

funcMetropolisHastings(const mprior, const vy, const vparamo, const vparamq,
                       const mcov, const dlogpdfo, const iN, const idumth)
{
  ...
  // Draw the candidate, vparamp
  [vparamp]=funcDrawparam(vparamq,mcov,idumth);
  // Evaluate the posterior at the new (dnum) and old (dden) parameter value
  [dnum,dden,dlogpdfp]=
    funcPosterior(vparamp,vparamo,dlogpdfo,vy,mcov,mprior,iN);
  idum=0;
  vparamnew=vparamo;            // vparamo is the old parameter value
  dlogpdfnew=dlogpdfo;          // dlogpdfo is the old likelihood value
  // The new parameters are set equal to the old;
  // if dalpha>du they will be replaced
  dalpha=min(1,exp(dnum-dden)); // Calculate aMH
  du=ranu(1,1);                 // Draw a random uniform number
  if (dalpha>du)                // Accept if dalpha>du
  {
    vparamnew=vparamp;
    dlogpdfnew=dlogpdfp;
    idum=1;                     // Count that the draw was accepted
  }
  return {vparamnew,dlogpdfnew,idum};
}

3.2.2. Bayes factors and marginal likelihood computation

The main output from the particle filter is the log-likelihood contribution of y_t with α_t integrated out. The sum of the log-likelihood contributions delivers the estimated log-likelihood of the data with the unobserved states integrated out, p̂(Y_T | θ). This quantity can then be used to compute the marginal likelihood (ML) for model M. The ML is a measure of the success the model has in accounting for the data after the parameter uncertainty has been integrated out over the prior. It is defined as

p(Y_T | M) = ∫_Θ p(Y_T | θ, M) p(θ | M) dθ.

In the following, the model index, M, is suppressed for conciseness. Gelfand and Dey (1994) propose a very compelling and general method to calculate the ML. It is efficient and utilizes the same routines when calculating the ML for different models. The Gelfand-Dey (G-D) estimate of the marginal likelihood is based on

(1/N) Σ_{i=1}^N g(θ^(i)) / [p̂(Y_T | θ^(i)) p(θ^(i))] → p(Y_T)^{-1} as N → ∞, (3.2.6)


where, as before, N is the number of PMMH iterations. G-D show that if g(θ^(i)) is thin-tailed relative to p̂(Y_T | θ^(i)) p(θ^(i)), then (3.2.6) is bounded and the estimator is consistent. Following Geweke (2005), a (truncated) Normal distribution, N(θ*, Σ*), is used for g(θ). θ* and Σ* are the posterior sample moments calculated as

θ* = (1/N) Σ_{i=1}^N θ^(i) and Σ* = (1/N) Σ_{i=1}^N (θ^(i) - θ*)(θ^(i) - θ*)′,

whenever θ^(i) is in the domain of the truncated Normal. The domain, Θ̂, is defined as

Θ̂ = {θ : (θ - θ*)′ (Σ*)^(-1) (θ - θ*) ≤ χ²_a(z)},

where z is the dimension of the parameter vector and χ²_a(z) is the ath percentile of the Chi-squared distribution with z degrees of freedom. In practice, 0.75, 0.95 and 0.99 are popular selections for a.
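A minimal Ox sketch of the G-D computation follows; the function name, its argument layout and the log-scale bookkeeping are our own illustration (we assume denst-style elementwise density functions, probchi() from oxprob.h and M_PI from oxfloat.h). mtheta holds the N × z posterior draws, while vloglik and vlogprior hold log p̂(Y_T | θ^(i)) and log p(θ^(i)).

funcGelfandDey(const mtheta, const vloglik, const vlogprior, const dchi2)
{
  decl i, iN = rows(mtheta), iz = columns(mtheta);
  decl vmu = meanc(mtheta), msig = variance(mtheta);
  decl msiginv = invertsym(msig), vd, dq;
  decl vlogg = constant(-.Inf, iN, 1);  // log g(theta); -Inf outside domain
  for (i = 0; i < iN; ++i)
  {
    vd = (mtheta[i][] - vmu)';
    dq = vd'*msiginv*vd;                // quadratic form defining the domain
    if (dq <= dchi2)                    // inside the truncation region
      vlogg[i] = -0.5*iz*log(2*M_PI) - 0.5*log(determinant(msig))
                 - 0.5*dq - log(probchi(dchi2, iz)); // renormalized density
  }
  decl vterm = vlogg - vloglik - vlogprior;
  decl dmax = max(vterm);
  return -(dmax + log(meanc(exp(vterm - dmax)))); // log ML, computed stably
}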

Once the marginal likelihood for different specifications has been calculated, we can compare them using Bayes factors, BF. The relative evidence for M_A versus M_B is

BF_AB = p(y_1, ..., y_T | M_A) / p(y_1, ..., y_T | M_B).

Kass and Raftery (1995) recommend considering 2 log BF_AB for model comparison. They suggest a rule-of-thumb of support for M_A based on 2 log BF_AB: 0 to 2 is not worth more than a bare mention, 2 to 6 is positive, 6 to 10 is strong, and greater than 10 is very strong evidence.

3.3. Stochastic Volatility Models

In this section, we estimate the standard stochastic volatility (SV) model along with different extensions. The first of these extensions we label the SVt model: ε_t ∼ St(v), where St stands for the Student-t distribution with v > 2 degrees of freedom. In the second extension, we incorporate a leverage effect by letting ϕ denote the correlation coefficient between ε_t and η_t. We shall refer to this model as SVL. Notice that in both cases only small adjustments are needed in the codes. With regards to the SVt model, we can follow Bollerslev (1987) and set

p(y_t | α_t, Y_{t-1}, θ) = [Γ((v+1)/2) / (Γ(v/2) √((v-2)π))] (1/σ_t) [1 + y_t² / ((v-2)σ_t²)]^(-(v+1)/2).

We then follow the same sampling steps as before. On the other hand, if we were to use "pure" Gibbs sampling to estimate the SVt model, we would be forced to convert the model into a conditionally Gaussian state-space model by letting ε_t = λ_t^(-1/2) e_t, where e_t ∼ N(0, 1) and λ_t ∼ G(v/2, v/2). We would then follow the steps in Chib et al. (2002) and sample from the augmented posterior, p(θ, v, α_{1:T}, z_1, ..., z_T, λ_1, ..., λ_T | Y_T), where, as before, z_t serves as the mixture component indicator.
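Within the particle filter of Table 3.1, the SVt model only changes the weight computation in Step 2. A minimal sketch, with a function name of our own choosing and assuming Ox's denst() from oxprob.h, could be:

funcWeightsSVt(const vh, const dy, const dv)
{
  // vh: particles alpha_t^(j); dy: y_t; dv: degrees of freedom.
  // Scaling by sqrt((v-2)/v) gives the unit-variance t of Bollerslev (1987).
  decl dscale = sqrt((dv-2)/dv);
  decl vsig = exp(0.5*vh);                           // sigma_t per particle
  return denst(dy./(vsig*dscale), dv)./(vsig*dscale); // scaled t density
}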

For the SVL model, we need only rewrite (3.2.2) as

α_{t+1} = µ + ρ(α_t - µ) + σϕ y_t exp(-α_t/2) + σ√(1-ϕ²) ξ_t.


Here, we use that η_t = ϕε_t + √(1-ϕ²) ξ_t, where ξ_t ∼ N(0, 1), and we need to sample an additional parameter, ϕ. We choose the same priors as in Section 3.2 for µ, ρ and σ². With regards to the additional parameters, we let v ∼ Exp(0.2), where Exp denotes the Exponential distribution, and we let ϕ ∼ TN_]-1,1[(0, 1), where TN refers to the truncated Normal distribution on the indicated domain. The prior on ϕ thus assumes that ϕ lies between -1 and 1. Furthermore, we ensure that |ϕ^(i)| < 1 (v^(i) > 2) by resampling ϕ^(i) (v^(i)) until |ϕ^(i)| < 1 (v^(i) > 2).

We can also expand the plain stochastic volatility model by allowing the errors in the measurement equation to follow a moving average (MA) process of order m, see Chan (2013). This means that the errors in the measurement equation are no longer serially independent, as they are in the plain SV model. In this paper we choose the simplest specification and set m = 1. Hence, our model is given as

y_t = e^(α_t/2) ε_t + ψ_1 e^(α_{t-1}/2) ε_{t-1}, ε_t ∼ N(0, 1),

where α_t follows (3.2.2). As before, we impose that |ρ| < 1 and α_1 ∼ N(µ, σ²/(1-ρ²)). We also ensure that the root of the characteristic polynomial associated with the MA coefficient, ψ_1, is outside the unit circle. Notice that under the standard stochastic volatility model, the conditional variance of y_t is simply e^(α_t). However, under the moving average variant, the conditional variance of y_t is given by e^(α_t) + ψ_1² e^(α_{t-1}), see Chan (2013). Thus, the conditional variance for SV-MA(1) is time-varying through two channels: the moving average term in e^(α_{t-1}) and α_t itself, which evolves according to (3.2.2). Estimating this model is straightforward, as we again only need to make small adjustments in the codes. With regards to ψ_1, we let p(ψ_1) ∼ TN_]-1,1[(0, 1).

The flexibility of PMCMC, specifically PMMH, can be used to estimate other attractive specifications of the stochastic volatility model. For instance, consider the stochastic volatility in mean (SVM) model of Koopman and Hol Uspensky (2002), where e^(α_t/2) appears in both the conditional mean and the conditional volatility. We define the SVM model as

y_t = λ exp(α_t/2) + exp(α_t/2) ε_t, ε_t ∼ N(0, 1),

where α_t follows (3.2.2). Estimation of this extension is nontrivial using "pure" Gibbs sampling. This is because drawing α_{1:T} ∼ p(α_{1:T} | θ, λ, Y_T) is computationally more demanding, since the model cannot be written in linear state-space form. However, within the PMMH context, estimating the SVM model is straightforward. In fact, we note that p(y_t | α_t, Y_{t-1}, θ, λ) ∼ N(λ exp(α_t/2), exp(α_t)). Incorporating this specification in the particle filter is very easy: we only need to modify Step 2 of the algorithm, using τ_t^(j) = N(λ exp(α_t^(j)/2), exp(α_t^(j))), j = 1, ..., M, i.e. the corresponding Normal density evaluated at y_t (a minimal sketch is given below). Furthermore, we sample an additional parameter in the M-H step, namely λ, where p(λ) ∼ N(0, 1). We present simulation results for SV-MA(1) and SVM in the Appendix.
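A minimal sketch of the modified weight computation, with a function name of our own choosing, could be:

funcWeightsSVM(const vh, const dy, const dlambda)
{
  // vh: particles alpha_t^(j); dy: y_t; dlambda: in-mean parameter.
  // y_t | alpha_t ~ N(lambda*exp(alpha_t/2), exp(alpha_t)).
  decl vsig = exp(0.5*vh);                       // sigma_t per particle
  return densn((dy - dlambda*vsig)./vsig)./vsig; // Normal density weights
}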

Finally, we estimate a two-factor SV model, TFSV:

y_t = exp((µ + α_t + α_{2t})/2) ε_t, ε_t ∼ N(0, 1)
α_{t+1} = ρα_t + ση_t, η_t ∼ N(0, 1)
α_{2t+1} = ρ_2 α_{2t} + σ_2 η_{2t}, η_{2t} ∼ N(0, 1)
|ρ| < 1, α_1 ∼ N(0, σ²/(1-ρ²)), |ρ_2| < 1, α_{21} ∼ N(0, σ_2²/(1-ρ_2²)).


Estimating the two-factor SV model using PMMH is straightforward. First, we collect all the parameters in θ = (µ, ρ, ρ_2, σ², σ_2²)′. Then, we only need to modify our particle filter such that we draw two sets of particles (one for α_t and one for α_{2t}) instead of one. This part is also very easy and costs almost nothing in terms of computation. We specify the random walk increments in the M-H part as ∆µ^(i) = 0.3162 ε_1^(i), ∆σ²^(i) = 0.01 ε_2^(i) and ∆σ_2²^(i) = 0.03 ε_3^(i). With regards to sampling ρ and ρ_2, we follow Fouque et al. (2010). First, we impose the restriction ρ^(i) > ρ_2^(i), which is needed for identification. We then draw ρ and ρ_2 from truncated Normal densities. For instance, ρ* ∼ TN_]-1,1[(b, B^(-1)), where b = (1/B) Σ_{t=1}^T α_t^(i-1) α_{t-1}^(i-1) and B = Σ_{t=1}^T (α_{t-1}^(i-1))²^5. Simultaneously with ρ* and ρ_2*, we generate µ*, σ²* and σ_2²*. Thereafter, we accept or reject θ*. We specify the priors as

µ ∼ N(0, 1), ρ ∼ TN_]-1,1[(0, 1), ρ_2 ∼ TN_]-1,1[(0, 1), σ² ∼ IG(4/2, 0.02/2), σ_2² ∼ IG(4/2, 0.02/2).

The top panel of Figure 3.1 displays the daily OMX Copenhagen 20 (OMXC20) index for the period 1/2/2006-12/30/2010, followed by the daily returns and the filtered estimates of σ_t = exp(α_t/2), t = 1, ..., T, for SVL. From these figures, strong differences in returns and volatility are immediately apparent^6. For instance, the top left panel shows a sharp decrease in the OMXC20 index towards the end of 2008. At the same time, the third panel in the top row of Figure 3.1 shows an increase in σ_t. We set M = 1000 and run our samplers for N = 20000 M-H iterations^7. After discarding the first 10000 iterations, we collect the final sample and compute: the posterior mean of θ, θ̄, 95% credibility intervals (indicated inside brackets), the inefficiency measures, RB, the log-likelihood that results from the particle filter, the logarithm of the marginal likelihood, log(ML), for a = 0.75 and a = 0.99, and the M-H acceptance ratio. The inefficiency measures display the relative variance of the posterior sample draws when adjusting for correlation between iterations, as compared to the variance without accounting for correlation. In these calculations, we follow Bos (2011), choosing a bandwidth of 100, see Kim et al. (1998).

Results are summarized in Table 3.3. Overall, we find that the Gaussian SVL model performs best in terms of the marginal likelihood criterion. The 2logBF of SVL versus SV is 7.8, indicating strong evidence in favor of the SVL model^8. Compared to SVt, the 2logBF in favor of SVL is 12.8, which is very strong evidence. The values of ρ close to one confirm strong daily volatility persistence, in accordance with typical estimates reported in the literature. Notice that the persistence increases slightly as the fat-tailed error distribution is introduced (ρ = 0.9810 and ρ = 0.9852 for SV and SVt, respectively), and drops from ρ = 0.9810 for SV to ρ = 0.9777 for SVL. In the SVt model, the distribution of the degrees-of-freedom parameter is centered around 14.20 with a standard deviation of 2.28. The posterior mean of ϕ for the SVL model is -0.22, negative as expected.

^5 We could also choose a Beta(., .) prior for ρ and ρ_2. However, we find that ρ_2 is very sensitive with regards to the hyperparameter values of Beta(., .). Therefore, we choose a more uninformative prior, i.e. TN_]-1,1[(0, 1).
^6 OMX Copenhagen 20 (OMXC20) is the Copenhagen Stock Exchange's leading share index. The index consists of the 20 most actively traded shares on the Copenhagen Stock Exchange.
^7 We experiment with different values of M to find out its effect by examining the corresponding Markov chain, see the Appendix. We can also set M to obtain a specified level of the variance of p̂(Y_T | θ) for a given θ.
^8 An advantage of using Bayes factors is that they automatically include Occam's razor, in that they penalize highly parametrized models that do not deliver improved fit.


Table 3.3.: Estimation results of SV models

Parameter        SV                SVt               SVL                SV-MA(1)          SVM               TFSV
µ                0.3526 (7.54)     0.2562 (7.38)     0.4221 (6.89)      0.3779 (7.78)     0.3754 (7.43)     0.3598 (6.23)
                 [0.1112,0.5895]   [-0.0482,0.5508]  [0.2046,0.6436]    [0.0831,0.6703]   [0.0685,0.6708]   [0.0821,0.6332]
ρ                0.9810 (8.27)     0.9852 (5.33)     0.9777 (6.97)      0.9817 (8.86)     0.9805 (9.63)     0.9776 (7.13)
                 [0.9743,0.9887]   [0.9783,0.9914]   [0.9716,0.9838]    [0.9730,0.9893]   [0.9727,0.9882]   [0.9671,0.9881]
ρ_2                                                                                                         0.0458 (10.66)
                                                                                                            [-0.0642,0.1576]
σ²               0.0296 (9.13)     0.0221 (6.11)     0.0319 (7.36)      0.0285 (7.97)     0.0311 (9.16)     0.0373 (6.59)
                 [0.0199,0.0394]   [0.0149,0.0297]   [0.0231,0.0401]    [0.0183,0.0395]   [0.0203,0.0421]   [0.0236,0.0512]
σ_2²                                                                                                        0.0981 (6.31)
                                                                                                            [0.0276,0.1728]
ϕ                                                    -0.2216 (7.04)
                                                     [-0.3202,-0.1226]
ψ_1                                                                     0.0045 (7.63)
                                                                        [-0.0296,0.0376]
λ                                                                                         0.0563 (6.37)
                                                                                          [0.0223,0.0892]
v                                  14.2019 (8.06)
                                   [9.9591,18.8712]
log(L)           -2113.6           -2114.2           -2106.1            -2113.4           -2112.1           -2113.2
log(ML), a=0.75  -2125.4           -2127.9           -2121.5            -2129.1           -2128.0           -2186.3
log(ML), a=0.99  -2125.1           -2127.6           -2121.2            -2129.1           -2127.7           -2186.1
M-H ratio        0.35              0.36              0.35               0.33              0.33              0.34

This table reports estimation results for different stochastic volatility model specifications. RB is indicated inside the parentheses and 95% credibility intervals inside the brackets. log(L): log-likelihood, log(ML): log-marginal likelihood using the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio.


We also find that ψ_1 = 0.0045 for the SV-MA(1) model. Compared to SV-MA(1), the 2logBF in favor of SVL is 15.8. We estimate λ at 0.0563 with a standard deviation of 0.0168; however, compared to SV or SVL, SVM does not offer any improvement in terms of ML. Finally, TFSV performs worst in terms of ML. Its estimates of µ, ρ and σ² are relatively close to those of the plain SV model, while the estimates of ρ_2 and σ_2² show that the second factor, α_{2t}, is very close to a white noise process.

We report the Markov chain output for µ | Y_T, ρ | Y_T, σ² | Y_T and ϕ | Y_T, along with the posterior densities and autocorrelation functions of these parameters, in Figure 3.1. The chains mix well, with relatively fast decaying autocorrelation functions.

Figure 3.1.: Estimation results, Gaussian SVL model

[Figure. Top row: the OMXC20 index, daily returns and the filtered estimates E(σ_t | y_1, ..., y_T). Remaining panels: the particle filter log-likelihood, and the Markov chains, posterior densities and autocorrelation functions of µ | Y_T, ρ | Y_T, σ² | Y_T and ϕ | Y_T. The first 10000 iterations are considered the burn-in period and are therefore discarded. Note: for graphical output, the professional version of Ox is needed; alternatively, the updated GnuDraw package of Bos (2013) can be used.]

3.4. Unobserved Components Model of US Inflation

In this section, we reconsider the unobserved components model of Stock and Watson (2007) and provide a computational framework for this model using PMMH. This model provides a simple yet sufficient framework for discussing the main stylized facts concerning inflation. Specifically, the model postulates the decomposition of observed inflation into two components: the regular component, which captures the trend in inflation, and the irregular component, which captures the deviations of inflation from its trend value. We start from a specification where the components are driven by disturbances whose variances are constant over time. Thereafter, we consider specifications in which the components are driven by disturbances whose variances evolve over time according to a stationary stochastic volatility


process. Finally, we carry out systematic model selection by comparing the marginal likelihoods implied by the different models of inflation volatility.

We focus on quarterly inflation rates constructed from the seasonally adjusted consumer price index, made available by FRED (Federal Reserve Economic Data). We denote the quarterly series by CPI_t. The annualized quarterly inflation rate, y_t, t = 1, ..., T, is computed as y_t = 400 ln(CPI_t / CPI_{t-1}). For the analysis, we use data from 1952:q1-2013:q1. In the following, the most general specification of the unobserved components (UC) model is defined as

y_t = α_t + ε_t, ε_t ∼ N(0, σ_ε²) (3.4.1)
α_{t+1} = α_t + η_t, η_t ∼ N(0, σ_η²), (3.4.2)

see Grassi and Proietti (2010). This model contains two parameters, θ = (σ_ε², σ_η²)′, along with a vector of the unobserved states, α_{1:T}. We let p(σ_ε²) ∼ IG(4/2, 0.02/2) and p(σ_η²) ∼ IG(4/2, 0.02/2). We

also provide extensions of (3.4.1) and (3.4.2) by incorporating stochastic volatility effects in σ_ε², or in both σ_ε² and σ_η². First, let ε_t ∼ N(0, e^(h_1t)), where

h_{1t+1} = µ_1 + ρ_1(h_{1t} - µ_1) + σ_1 η_{1t}, η_{1t} ∼ N(0, 1).

Hence, we add a second unobserved state, which describes the evolution of the volatility of the irregular component of inflation. We shall refer to this model as UC-SVm. Finally, we add a third unobserved state, which describes the volatility of α_t, i.e. η_t ∼ N(0, e^(h_2t)). Henceforth, we refer to this model as UC-SV^9. UC-SVm and UC-SV both have a special structure. For instance, for the UC-SV model, conditional on h_{kt}, k = 1, 2, the remaining model is a linear Gaussian state-space model in which α_{1:T} can be integrated out analytically using the Kalman filter. This is known as Rao-Blackwellization in the literature, because it is an implication of the Rao-Blackwell theorem, see Robert and Casella (2004). When this is possible, the state vectors can be separated: particles are only simulated for h_{kt}^(j), k = 1, 2, and conditional on these, α_t, t = 1, ..., T, can be integrated out analytically using the prediction and updating steps of the Kalman filter.

In the PMMH procedure, we modify our particle filter using the approach of Creal (2012) and sample θ in one block. Estimating UC-SV using "pure" Gibbs sampling, by contrast, would require more programming effort. A Gibbs sampling approach would proceed by cycling through the following steps:

1. α_{1:T} | h_{1,1:T}, h_{2,1:T}, Y_T.

2. h_{1,1:T} | µ_1, ρ_1, σ_1², α_{1:T}, z_{1,1:T}, Y_T, where h_{k,1:T} = (h_{k1}, ..., h_{kT}).

3. h_{2,1:T} | µ_2, ρ_2, σ_2², α_{1:T}, z_{2,1:T}.

4. z_{1,1:T} | h_{1,1:T}, α_{1:T}, Y_T.

5. z_{2,1:T} | h_{2,1:T}, α_{1:T}, see Kim et al. (1998).

6. Finally, we sample θ element-by-element, see Kim et al. (1998) and Grassi and Proietti (2010).

^9 The specification of the stochastic volatility processes differs only slightly from Stock and Watson (2007), who assume a random walk process for h_{kt}, k = 1, 2. In this paper we follow Grassi and Proietti (2010).


Below, we provide the steps of the modified particle filter for the UC-SV model, see Creal (2012) for further background on Rao-Blackwellization; a minimal sketch of the Kalman computations used in Steps 2-5 follows the list.

1. For t = 1, draw α_1^(1), ..., α_1^(M) and P_1^(1), ..., P_1^(M), where P_t is the covariance of α_t and is obtained from the Kalman filter, see Kim and Nelson (1999). Draw h_{k1}^(1), ..., h_{k1}^(M) for k = 1, 2 and set τ_1^(1), ..., τ_1^(M) = 1/M.

2. For t = 2, ..., T, use the prediction step of the Kalman filter. Thereafter, obtain the prediction errors and variances, {v_t^(j), f_t^(j)}_{j=1}^M.

3. Compute τ_t^(j), the N(0, f_t^(j)) density evaluated at v_t^(j), and normalize w_t^(j) = τ_t^(j) / Σ_{k=1}^M τ_t^(k).

4. Resample the M particles {α_t^(j), P_t^(j), h_{1t}^(j), h_{2t}^(j)}_{j=1}^M with probabilities {w_t^(j)}_{j=1}^M and set w_t^(j) = 1/M.

5. Draw h_{kt}^(1), ..., h_{kt}^(M), k = 1, 2, and run the updating step of the Kalman filter on each of these particles to obtain {α_t^(j), P_t^(j)}_{j=1}^M.
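Conditional on h_{1t} and h_{2t}, each particle carries a standard local level Kalman recursion. The following is a minimal sketch for a single particle; the function name and variable names are our own illustration.

funcKalmanStep(const da, const dP, const dy, const dh1, const dh2)
{
  // da, dP: a_{t-1} and P_{t-1} for this particle; dy: y_t;
  // dh1, dh2: this particle's log-variances of the irregular/regular parts.
  decl dPpred = dP + exp(dh2);             // P_{t|t-1} = P_{t-1} + exp(h2)
  decl dv = dy - da;                       // prediction error (a_{t|t-1} = a_{t-1})
  decl df = dPpred + exp(dh1);             // prediction error variance f_t
  decl dtau = densn(dv/sqrt(df))/sqrt(df); // weight: N(0, f_t) density at v_t
  decl dK = dPpred/df;                     // Kalman gain
  return {da + dK*dv, dPpred*(1-dK), dtau}; // updated a_t, P_t and weight
}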

We compare the performance of UC, UC-SVm and UC-SV using the marginal likelihoods. The results reported in Table 3.4 indicate that the UC-SV model performs best.

Table 3.4.: Estimation results of unobserved components (UC) models

                 UC                  UC-SVm              UC-SV
Parameter        θ (RB)              θ (RB)              θ (RB)
µ_1                                  0.0602 (4.08)       -0.0764 (8.47)
                                     [-0.4104,0.5197]    [-0.3733,0.2223]
µ_2                                                      -0.5886 (4.95)
                                                         [-0.9465,-0.2391]
ρ_1                                  0.9456 (4.54)       0.9541 (7.84)
                                     [0.9067,0.9792]     [0.9278,0.9779]
ρ_2                                                      0.9815 (7.21)
                                                         [0.9676,0.9938]
σ_ε²             3.2280 (5.09)
                 [2.9026,3.5680]
σ_1²                                 0.1153 (3.93)       0.0844 (8.97)
                                     [0.0408,0.2215]     [0.0383,0.1406]
σ_2²             2.0021 (4.57)       0.5090 (3.59)       0.0141 (7.75)
                 [1.2772,2.7866]     [0.3449,0.6826]     [0.0051,0.0255]
log(L)           -592.28             -469.35             -465.07
log(ML), a=0.75  -638.30             -497.77             -484.86
log(ML), a=0.99  -638.03             -497.50             -484.58
M-H ratio        0.42                0.48                0.40

This table reports estimation results for different UC models. RB is indicated inside the parentheses. log(L): log-likelihood, log(ML): log-marginal likelihood using the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio. Note that σ_2² = σ_η² for UC and UC-SVm.

The filtered estimates of α_{1:T} and the filtered estimates of the volatilities are all available from the particle filter. They are pictured in Figure 3.2, together with y_t. These estimates largely confirm the results of Stock and Watson (2007) and Grassi and Proietti (2010). The volatility of the irregular component, exp(h_{1t}/2), increases during the high-inflation periods of the 1970s, while the volatility of the regular component, exp(h_{2t}/2), is relatively more stable. Specifically, exp(h_{2t}/2) has decreased substantially after 1982. The decrease in exp(h_{kt}/2), k = 1, 2, since the early 1980s and throughout the 1990s has been documented in a range of studies and is often labeled "the Great Moderation". Furthermore, exp(h_{1t}/2) shows that the increase in the volatility of the inflation rate during the last recession is mainly concentrated in the irregular component. Finally, we find that incorporating the steps in So et al. (2005) contributes to reducing RB significantly. Specifically, RB for each parameter drops by a factor of 2 to 3; for instance, compare the PMMH results of this section with the PG results of Section 3.6, where θ is sampled element-by-element using standard Gibbs techniques.

Figure 3.2.: Estimation results, UC-SV model

[Figure. Top panels: inflation, filtered estimates of the trend inflation and the 1st difference of inflation. Bottom panels: volatility of the irregular and regular components of inflation.]

3.5. Long Memory with Stochastic Volatility

In this section, we model the changing time series characteristics of the US inflation rate by considering a heteroskedastic ARFIMA model similar to Bos et al. (2012). However, Bos et al. (2012) apply maximum likelihood methods, whereas we use PMCMC. To the best of our knowledge, no work has been done on estimating long-memory models with SV effects using PMMH. Our ARFIMA(p,d,q)-SV model for the time series, y_t, is given by

Φ(L)(1-L)^d (y_t - τ_t) = Ψ(L) σ_εt ε_t, ε_t ∼ N(0, 1) (3.5.1)
σ_εt = exp(α_t/2)
α_{t+1} = µ + ρ(α_t - µ) + ση_t, η_t ∼ N(0, 1), (3.5.2)


where Φ(L) = (1 - φ_1 L - ... - φ_p L^p) and Ψ(L) = (1 + ψ_1 L + ... + ψ_q L^q) are autoregressive (AR) and moving average (MA) polynomials in the lag operator, L, with L^k y_t = y_{t-k} for k = 0, 1, ... and integer orders p ≥ 0, q ≥ 0. Initially, the disturbance term, ε_t, is normally and identically distributed with expectation 0 and variance 1. However, we also consider a case where ε_t follows a heavy-tailed distribution, see Section 3.5.2. The fractional difference operator, (1-L)^d, with d ∈ R, is given by

(1-L)^d = Σ_{j=0}^∞ (d choose j) (-L)^j.

Here, we assume that -1 < d < 0.5. We further assume that the equations Ψ(x) = 0 and Φ(x) = 0 for unknown x have no common roots. In the standard ARFIMA model, the mean and variance are constant through time, that is, τ_t = τ and σ_εt² = σ_ε² for t = 1, ..., T. In this paper we assume specific time-varying functions for τ_t and σ_εt². We specify τ_t as a regression model to capture seasonal variation in the time series through autoregressive coefficients. Hence, we set τ_t = X_t′β, where X_t is an m × 1 vector of lagged values of y_t, and β is an m × 1 vector of unknown autoregressive coefficients. As stated before, we model σ_εt in (3.5.1) as an SV process. Hence, we let σ_εt = exp(α_t/2), where α_t follows (3.5.2).
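In practice, the binomial expansion of (1-L)^d can be computed recursively: π_0 = 1 and π_j = π_{j-1}(j - 1 - d)/j yields (1-L)^d = Σ_j π_j L^j. A minimal Ox sketch, with a function name of our own choosing, could be:

funcFracDiffWeights(const dd, const iT)
{
  // dd: fractional order d; iT: truncation length of the expansion.
  decl j, vpi = ones(iT,1);        // vpi[0] = pi_0 = 1
  for (j = 1; j < iT; ++j)
    vpi[j] = vpi[j-1]*(j-1-dd)/j;  // pi_j = pi_{j-1}*(j-1-d)/j
  return vpi;
}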

3.5.1. Monte Carlo

We conduct a set of Monte Carlo simulations to judge the performance of PMMH for the ARFIMA-SV model. We successively generate data from the ARFIMA-SV model, re-estimate it, and compare the estimated parameters with the parameters under the data generating process, DGP.

We start with a data series of T = 500 observations, which is in line with the actual data in the empirical application. As we are mostly interested in the ability of PMMH to estimate the persistence parameter of inflation, d, we focus on the ARFIMA(1,d,0)-SV model. We follow Bos et al. (2012) and set the true DGP parameters for y_t as d = 0.3 and φ = 0.4. The mean of the stochastic volatility sequence is set to µ = -1.2. The conditional variance of the volatility of y_t is set to σ² = 0.05, and the persistence parameter of volatility, ρ, is set to 0.97.

For each DGP, we first generate α_t according to the SV dynamics with α_1 ∼ N(µ, σ²/(1-ρ²)). Thereafter, we define ε_t ∼ N(0, σ_εt²) and generate y_t through the ARFIMA dynamics using d and φ. For each of the R = 100 Monte Carlo iterations, we estimate the ARFIMA-SV model using 10000 M-H iterations with a burn-in of 1000 and M = 1000, conditional on the following priors for the model parameters:

d ∼ TN_]-1,0.5[(0, 1), φ ∼ TN_]-1,1[(0, 1), µ ∼ N(0, 1), (ρ+1)/2 ∼ Beta(20, 1.5), σ² ∼ IG(4/2, 0.02/2).

Given a full run, we calculate the mean, median and mode of the posterior draws, θ^(i), i = 1, ..., N. We then take the mean of these quantities over R. Finally, we also consider the root mean squared error


(RMSE) for each parameter, defined as

RMSE = √[ (1/R) Σ_{h=1}^R ( (1/N) Σ_{i=1}^N θ_h^(i) - θ )² ].

Here, θ_h^(i) is the ith posterior draw of the hth Monte Carlo iteration and θ is the true DGP parameter. Results are summarized in Table 3.5. The mean, median and mode of d, φ, µ, σ² and d + φ are close to the true DGP values. As we increase T, we obtain more precise estimates of the parameters: compared to T = 500, the RMSE of each parameter drops on average by 10-40% for T = 1000. The simulations also make it very clear that d and φ are strongly related. The correlation between d and φ is on average -0.75. This shows that distinguishing between short-run and long-run correlation, as modeled by φ and d respectively, is inherently difficult, see Sowell (1992) and Bos et al. (2012). We repeat the simulation with T = 200 to see whether a relatively smaller sample size leads to more disperse estimates. Still, we find that d + φ is well identified: on average, d + φ = 0.68, with similar results for the median and mode. However, it becomes relatively harder to identify d and φ separately. Furthermore, we are able to obtain decent estimates of µ and ρ, while σ² displays high variability between simulations; its RMSE at T = 200 is 0.03, against 0.02 for T = 500.

Table 3.5.: Monte Carlo evidence for ARFIMA-SV

Parameter  true value    mean     median    mode     RMSE
T = 200
d            0.30       0.2111    0.2245   0.2011   0.1590
φ            0.40       0.4660    0.4578   0.4653   0.1511
µ           -1.20      -0.8008   -0.8551  -0.6575   0.6531
ρ            0.97       0.9522    0.9601   0.9639   0.0528
σ²           0.05       0.0223    0.0183   0.0123   0.0308
d + φ        0.70       0.6771    0.6772   0.6664   0.0796
T = 500
d            0.30       0.2675    0.2751   0.2403   0.1017
φ            0.40       0.4403    0.4334   0.4570   0.1122
µ           -1.20      -0.9761   -1.0248  -0.9559   0.4214
ρ            0.97       0.9715    0.9742   0.9761   0.0123
σ²           0.05       0.0356    0.0322   0.0259   0.0210
d + φ        0.70       0.7077    0.7079   0.7035   0.0444
T = 1000
d            0.30       0.2906    0.2931   0.2823   0.0675
φ            0.40       0.4136    0.4114   0.4202   0.0862
µ           -1.20      -1.1917   -1.1166  -1.1121   0.2502
ρ            0.97       0.9736    0.9749   0.9760   0.0107
σ²           0.05       0.0412    0.0388   0.0341   0.0163
d + φ        0.70       0.7043    0.7047   0.7029   0.0378

This table reports the true values of the DGP together with the mean, median, mode and root-mean-squared error (RMSE) of the estimated parameters for the Monte Carlo simulations, generated from ARFIMA(1,d,0)-SV with model parameters as indicated, using samples of T = 200, T = 500 and T = 1000.


Table 3.6.: Model comparison by marginal likelihood

            G-D (a = 0.75)     G-D (a = 0.95)     G-D (a = 0.99)
            M1      M2         M1      M2         M1      M2
T = 200     0       100        0       100        0       100
                    (28.07)            (28.16)            (28.09)
T = 500     0       100        0       100        0       100
                    (67.72)            (67.75)            (67.69)
T = 1000    0       100        0       100        0       100
                    (142.11)           (142.17)           (142.03)

This table reports the number of times across the repetitions that each model obtains the highest ML. In each case, the data is generated according to ARFIMA(1,d,0)-SV. M1: ARFIMA(1,d,0), M2: ARFIMA(1,d,0)-SV. a is the ath percentile of the Chi-squared distribution used to calculate the ML. The numbers in parentheses indicate the average logBF in favor of ARFIMA(1,d,0)-SV over the number of repetitions.

Finally, we also investigate the ability of G-D to identify the true model. For each Monte Carlo iteration, we therefore also estimate a homoscedastic ARFIMA(1,d,0) model using "pure" Gibbs sampling and compare the MLs^10. We report the frequency across repetitions with which each specification is best according to the ML using a = 0.75, 0.95 and 0.99. Results are summarized in Table 3.6.

Overall, we see that G-D performs very well. For each T , G-D identifies ARFIMA-SV as the best

performing model 100 out of 100 times. The numbers inside the parentheses report the average logBF

of ARFIMA-SV versus ARFIMA over the 100 Monte Carlo iterations for each a. These numbers

are fairly similar across different values of a. As expected, the logBFs clearly indicate a better fit of

ARFIMA-SV compared to ARFIMA.

3.5.2. US core inflation

We apply our model to a monthly time series of inflation, using the US City Average core consumer price index of the Bureau of Labor Statistics (BLS). This series, which is labeled CUUR0000SA0L1E, excludes the direct effects of price changes for food and energy. We denote the series by P_t and use data from 1957:1 until 2013:5, for a total of 676 months^11. Following Bos et al. (2012), we construct the monthly US core inflation as p_t = 100 log(P_t / P_{t-1}). To adjust for part of the seasonality in the series, we regress the inflation on a set of seasonal dummies, D, as in p = Dβ + u. Instead of using the original inflation, p_t, we use y_t = u_t + p̄, where u_t is the residual from adjusting the inflation for the major

^10 Throughout this paper, we estimate the homoscedastic ARFIMA specifications using "pure" Gibbs sampling.
^11 Our sample is considerably longer than the sample used in Bos et al. (2012). We choose the longer sample period, 1957:1-2013:5, mainly because it provides us with what we think is a sufficient number of observations for each window (265 months) with regards to forecasting and recursive analysis.


seasonal effects at time t, and p̄ is the average inflation level, see Bos et al. (2012). Figure 3.3 shows time series plots of the price index, P_t, and of y_t, together with the sample autocorrelation functions of y_t and of the changes in y_t, ∆y_t. Time variation in the mean and volatility is clearly apparent from the time series plot of y_t. The autocorrelation function shows strong persistence and seasonal patterns, even after regressing on the seasonal dummies. Therefore, we also include seasonal effects in our models through AR coefficients. As previously mentioned, y_t exhibits time variation in its conditional mean and volatility: the 1970s show the highest level and volatility of inflation, with strong persistence in both the mean and the volatility. The reduction in the volatility of inflation since the mid 1980s is very noticeable. There are also outliers in y_t, for example 1980:7. We use a dummy variable for this month in the estimation procedure, see Bos et al. (2012).

Figure 3.3.: Time series characteristics of the US core inflation rate

[Figure. Panel (a): monthly time series of the US core price index, P_t (1982-84=100), (b): monthly core inflation adjusted for fixed seasonals, y_t, (c): sample autocorrelation function of y_t (100 lags), (d): sample autocorrelation function of ∆y_t.]

Besides ARFIMA-SV, we also provide an extension by assuming that ε_t follows a heavy-tailed distribution. We model this feature by assuming that ε_t ∼ St(v), where, as before, St stands for the Student-t distribution with v > 2 degrees of freedom. We label this model ARFIMA-SVt. As before, we set M = 1000 and run the sampler for N = 20000 M-H iterations. After discarding the first 10000 iterations, we collect the final sample and compute: the posterior mean of θ, 95% credibility intervals, the RBs, the log-likelihood that results from the particle filter, the logarithm of the marginal likelihood, log(ML), for a = 0.75 and a = 0.99, and the M-H acceptance ratio. Table 3.7 presents estimation results for a homoscedastic ARFIMA model, ARFIMA-SV and ARFIMA-SVt. We use the same priors as in Section 3.5.1 for d, µ, ρ and σ². With regards to the additional parameter in the ARFIMA-SVt


Table 3.7.: Estimation results of ARFIMA models

                 ARFIMA              ARFIMA-SV           ARFIMA-SVt
Parameter        θ (RB)              θ (RB)              θ (RB)
d                0.3799 (8.14)       0.2931 (7.04)       0.2946 (6.28)
                 [0.3209,0.4407]     [0.2549,0.3297]     [0.2631,0.3251]
φ_11             0.1305 (22.32)      0.1446 (5.04)       0.1459 (6.51)
                 [0.0655,0.1980]     [0.1064,0.1810]     [0.1157,0.1770]
φ_12             0.2542 (21.70)      0.3791 (5.65)       0.3747 (5.83)
                 [0.1861,0.3213]     [0.3407,0.4172]     [0.3447,0.4060]
β_0              0.1639 (7.12)       0.2252 (5.63)       0.2169 (6.83)
                 [0.0278,0.2949]     [0.1409,0.3101]     [0.1443,0.2904]
β(i80:7)         -1.0294 (9.77)      -1.1333 (6.06)      -1.1113 (6.13)
                 [-1.3461,-0.7176]   [-1.3727,-0.8897]   [-1.3058,-0.9236]
µ                                    -3.4733 (6.92)      -3.4761 (7.58)
                                     [-3.8827,-3.0422]   [-3.8589,-3.0660]
ρ                                    0.9845 (7.55)       0.9837 (8.48)
                                     [0.9766,0.9898]     [0.9748,0.9896]
σ²                                   0.0191 (6.94)       0.0188 (8.92)
                                     [0.0129,0.0268]     [0.0127,0.0256]
σ_ε²             0.0318 (1.39)
                 [0.0285,0.0354]
v                                                        21.9569 (7.68)
                                                         [14.4626,29.9132]
log(L)           201.93              264.53              261.79
log(ML), a=0.75  183.59              236.13              228.91
log(ML), a=0.99  183.54              236.38              229.17
M-H ratio                            0.38                0.38

This table reports estimation results for different ARFIMA(12,d,0)-type models. β(i80:7) is included in the mean equation. RB is indicated inside the parentheses. log(L): log-likelihood, log(ML): log-marginal likelihood using the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio.

model, we let p(v) ∼ Exp(5). Furthermore, we experiment with different AR lags. We find φ_11 and φ_12 to be highly significant. Thus, we let p(φ_1) ∼ N(0, 1), ..., p(φ_12) ∼ N(0, 1), and employ rejection sampling to ensure that the roots of (1 - φ_1^(i) L - ... - φ_12^(i) L^12) = 0 lie outside the unit circle. However, φ_1, ..., φ_10 are not significant in our applications and are therefore fixed at zero.

We find that the Gaussian ARFIMA-SV model performs best in terms of the marginal likelihood criterion. In fact, the 2logBF in favor of ARFIMA-SV versus ARFIMA is 105.68, which is interpreted as a very significant improvement. In the ARFIMA-SVt model, the distribution of v is centered around 22 with a standard deviation of 3.94; compared to ARFIMA-SVt, the marginal likelihood thus indicates strong evidence in favor of the ARFIMA-SV model. For the ARFIMA model, the order of integration, d, is estimated at 0.38, which implies that US core inflation exhibits long-memory behavior. φ_12 captures the main seasonal effects. The mean inflation, β_0, is estimated at 0.16%. The residual standard error, σ_ε, is large at 0.18% per month. The inflation rate in 1980:7 is a negative additive outlier and very significant. When we compare ARFIMA with the ARFIMA-SV model, we find that d drops from 0.38 to 0.29, while φ_11 and φ_12 increase from 0.13 and 0.25 to 0.15 and 0.38, respectively. The estimate of β_0 is also affected, being more precisely estimated at a slightly higher value. The SV component itself is nearly nonstationary, as the autoregressive coefficient of volatility, ρ, is close to


one, and the conditional volatility of volatility, σ, is well identified at 0.15. The average volatility, exp(µ/2), is 0.17% per month. We plot the filtered estimates of σ_εt, t = 1, ..., T, in the top left panel of Figure 3.4. The volatility decrease since the early 1980s is noticeable and persistent. As stated earlier, this period is often referred to as "the Great Moderation".

Table 3.8.: ARFIMA-SV, ARIMA-SV and IMA-SV results for ∆y_t

                 ARFIMA-SV           ARIMA-SV            IMA-SV
Parameter        θ (RB)              θ (RB)              θ (RB)
d - 1            -0.6615 (6.34)
                 [-0.6866,-0.6376]
ψ_1                                  -0.8422 (4.14)      -0.8374 (3.20)
                                     [-0.8701,-0.8148]   [-0.8593,-0.8141]
φ_11             0.1506 (6.87)       0.1021 (4.13)
                 [0.1251,0.1768]     [0.0768,0.1268]
φ_12             0.3808 (6.61)       0.3494 (4.19)
                 [0.3540,0.4089]     [0.3237,0.3732]
β(i80:7)         -1.0278 (5.50)      -1.2310 (4.87)      -1.0449 (3.72)
                 [-1.1657,-0.8890]   [-1.3352,-1.1250]   [-1.2352,-0.8491]
β(i80:8)         1.0356 (4.75)       0.9648 (4.72)       0.7907 (4.33)
                 [0.8930,1.1806]     [0.8477,1.0846]     [0.5908,0.9863]
µ                -3.5039 (7.12)      -3.3937 (3.58)      -3.4204 (4.26)
                 [-3.8044,-3.2207]   [-3.7573,-3.0070]   [-3.6665,-3.1639]
ρ                0.9849 (10.11)      0.9807 (4.55)       0.9746 (3.87)
                 [0.9799,0.9895]     [0.9702,0.9892]     [0.9629,0.9853]
σ²               0.0181 (7.66)       0.0249 (4.25)       0.0242 (3.49)
                 [0.0126,0.0235]     [0.0169,0.0341]     [0.0179,0.0308]
log(L)           260.22              256.57              208.36
log(ML), a=0.75  228.76              223.90              185.31
log(ML), a=0.99  229.01              224.16              185.56
M-H ratio        0.39                0.40                0.40

This table reports estimation results for ARFIMA(12,d,0)-SV, ARIMA(12,0,1)-SV and IMA(0,1)-SV. β(i80:7) and β(i80:8) are included in the mean equation. RB is indicated inside the parentheses. log(L): log-likelihood, log(ML): log-marginal likelihood using the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio.

We follow Bos et al. (2012) and compare the estimates of the ARFIMA-SV model with those of three other specifications. First, we present estimates for the same ARFIMA-SV model as in Table 3.7, but now for the changes in the inflation rate, ∆y_t. Hence, in this specification, we estimate d - 1. Furthermore, β_0 drops out of the model. With regards to β(i80:7), we choose to separate the dummy variable and its lag. The second column of Table 3.8 reports results for this model. We find that d - 1 equals -0.66, nearly equivalent to d = 0.29 for the ARFIMA-SV model in levels. Furthermore, the other parameter estimates are very similar to the parameter estimates of the ARFIMA-SV model for y_t. We also compare our estimates with those from an ARIMA-SV model, i.e. (3.5.1) with p = 12, d = 1 and q = 1. This model corresponds to an ARIMA(12,0,1)-SV model for ∆y_t. As before, we find φ_1, ..., φ_10 to be non-significant; they are therefore set to zero. As noted in Stock and Watson (2007), changes in the long-run persistence of this model are captured by the MA parameter, ψ_1. The fourth column of Table 3.8 reports results for the ARIMA(12,1,1)-SV model; we find that ψ_1 = -0.84. Finally, the last two columns of Table 3.8 report results for an IMA(1,1)-SV model. This specification is equivalent to the unobserved components (UC)


model of Stock and Watson (2007). The estimate of ψ_1 is equal to -0.83, almost identical to the estimate of ψ_1 for the ARIMA(12,1,1)-SV model. However, we see that leaving out φ_11 and φ_12 results in a considerably lower ML value.

Figure 3.4.: Estimation results on US core inflation rate, ARFIMA(12,d,0)-SV

[Figure. Panel (a): filtered volatility estimates, ARFIMA(12,d,0)-SV, (b): autocorrelation functions of d, φ_11 and φ_12, (c): d | Y_T against iteration (after a burn-in of 10000), and (d): φ_11 | Y_T and φ_12 | Y_T against iteration (after a burn-in of 10000).]

3.5.3. Subsample estimation

In order to show the effects of the SV extension on the estimates and to demonstrate the presence of structural breaks in θ, we present estimation results for several models over two different samples. Table 3.9 reports estimation results for a homoscedastic ARFIMA(12,d,0), ARFIMA(12,d,0)-SV and ARIMA(12,1,1)-SV for 1957:1-1980:12 and 1981:1-2013:5, respectively^12.

For the first sample, 1957:1-1980:12, we find that d is now closer to the nonstationary value of 0.5 (0.42 for ARFIMA and 0.39 for ARFIMA-SV). Furthermore, the estimates of ρ are smaller than for the full sample for both ARFIMA-SV and ARIMA-SV. The estimates of β_0 are also higher for both models, but so are their posterior standard deviations. There is still a very significant difference in the marginal likelihood values between the ARFIMA model and the ARFIMA-SV model, although, compared to the full sample, the logBF in favor of ARFIMA-SV is smaller.

Finally, the right side of Table 3.9 displays results for the period 1981:1-2013:5. The d parameter is much smaller than for the first period. On the other hand, ρ rises from 0.73 for the first sample period to 0.96 for the second sample period. The unconditional volatility of volatility,

^12 Contrary to Bos et al. (2012), we choose to include 1983 and 1984 in the second subsample period.


Table 3.9.: Subsample estimation results for y_t

1957:1-1980:12
                 ARFIMA              ARFIMA-SV           ARIMA-SV
Parameter        θ (RB)              θ (RB)              θ (RB)
d                0.4155 (2.61)       0.3899 (3.04)
                 [0.3361,0.4895]     [0.3457,0.4349]
ψ_1                                                      -0.7296 (2.58)
                                                         [-0.7733,-0.6848]
φ_11             0.0889 (2.32)       0.1523 (3.26)       0.0680 (2.17)
                 [-0.0201,0.2001]    [0.0899,0.2135]     [0.0203,0.1130]
φ_12             0.1602 (2.77)       0.2743 (2.79)       0.2085 (2.25)
                 [0.0481,0.2686]     [0.2093,0.3370]     [0.1606,0.2555]
β_0              0.2609 (2.20)       0.4612 (2.92)
                 [0.0286,0.4807]     [0.2830,0.6403]
β(i80:7)         -1.0307 (1.97)      -1.0504 (2.84)      -1.2369 (2.26)
                 [-1.4249,-0.6557]   [-1.2715,-0.8450]   [-1.3963,-1.0772]
β(i80:8)                                                 0.9572 (2.15)
                                                         [0.7865,1.1345]
µ                                    -3.1597 (3.02)      -3.1921 (2.11)
                                     [-3.2536,-3.0648]   [-3.2775,-3.1038]
ρ                                    0.7293 (3.38)       0.7543 (2.71)
                                     [0.6107,0.8400]     [0.6414,0.8579]
σ²                                   0.0061 (3.51)       0.0680 (3.13)
                                     [0.0027,0.0108]     [0.0203,0.1130]
σ_ε²             0.0485 (1.81)
                 [0.0411,0.0572]
log(L)           25.82               45.71               49.04
log(ML), a=0.75  8.17                23.54               24.24
log(ML), a=0.99  8.08                23.79               24.76
M-H ratio                            0.50                0.52

1981:1-2013:5
                 ARFIMA              ARFIMA-SV           ARIMA-SV
Parameter        θ (RB)              θ (RB)              θ (RB)
d                0.2444 (3.73)       0.2195 (2.44)
                 [0.1581,0.3433]     [0.1737,0.2638]
ψ_1                                                      -0.9375 (2.86)
                                                         [-0.9639,-0.9081]
φ_11             0.1454 (3.44)       0.1191 (2.85)       0.0928 (2.84)
                 [0.0697,0.2247]     [0.0794,0.1590]     [0.0534,0.1330]
φ_12             0.3585 (3.37)       0.4397 (2.95)       0.4447 (2.78)
                 [0.2764,0.4394]     [0.3982,0.4807]     [0.4027,0.4851]
β_0              0.1140 (4.64)       0.1995 (2.32)
                 [0.0544,0.1788]     [0.1422,0.2565]
µ                                    -4.2723 (2.49)      -4.2619 (2.62)
                                     [-4.3946,-4.1497]   [-4.3971,-4.1287]
ρ                                    0.9551 (2.86)       0.9595 (2.83)
                                     [0.9459,0.9641]     [0.9516,0.9672]
σ²                                   0.0066 (2.78)       0.0070 (3.43)
                                     [0.0035,0.0105]     [0.0038,0.0109]
σ_ε²             0.0180 (1.11)
                 [0.0156,0.0207]
log(L)           221.74              240.36              232.95
log(ML), a=0.75  207.31              213.35              207.34
log(ML), a=0.99  207.32              213.61              207.58
M-H ratio                            0.57                0.56

This table reports estimation results for different models over two subsamples, 1957:1-1980:12 and 1981:1-2013:5. β(i80:7) and β(i80:8) are included in the mean equation. RB is indicated inside the parentheses. log(L): log-likelihood, log(ML): log-marginal likelihood. M-H ratio: Metropolis-Hastings acceptance ratio.


σ/√(1-ρ²), rises from 0.12 for the first subsample to 0.28 for the second subsample^13. The AR parameters also show changes compared to the first subsample and the full sample. Finally, according to the marginal likelihood criterion, we find that the contribution of the SV component is relatively smaller in the second subsample. We also present results for the ARIMA-SV model for 1957:1-1980:12 and 1981:1-2013:5 in Table 3.9; these results should be compared with Table 3.8. We see that the most significant change is the shift in ψ_1, which changes from -0.73 for 1957:1-1980:12 to -0.94 for 1981:1-2013:5. The marginal likelihood of the ARIMA-SV model is slightly higher than that of ARFIMA-SV for the first sample and lower for the second sample.

Overall, we have shown that PMMH provides a very compelling and computationally fast framework for estimation and model comparison. We conclude that we can qualitatively replicate already established facts on structural changes in θ since the Great Moderation.

3.5.4. Forecasts

In this section, we perform a recursive out-of-sample forecasting exercise to evaluate the performance

of the models listed in Table 3.7. Given data up to time t − 1, Yt−1 = (y1, ..., yt−1)′, the predictive

likelihood, p (yt, .., yT | Yt−1), is the predictive density evaluated at the realized outcome, yt, ..., yT ,

t ≤ T , see Geweke (2005). The predictive likelihood contains the out-of-sample prediction record of

a particular model, making it the essential quantity of interest for model evaluation. The predictive

likelihood for model MA is given as

p (yt, ..., yT | Yt−1, MA) = ∫_{ΘA} p (yt, ..., yT | Yt−1, θA, MA) p (θA | Yt−1, MA) dθA. (3.5.3)

Notice that the terms on the right-hand side of (3.5.3) have parameter uncertainty integrated out. If

t = 1, this would be the marginal likelihood and (3.5.3) changes to

p (y1, ..., yT | MA) = ∫_{ΘA} p (y1, ..., yT | θA, MA) p (θA | MA) dθA,

where p (y1, ..., yT | θA, MA) is the likelihood and p (θA | MA) is the prior for model MA. Hence,

the sum of log-predictive likelihoods can be interpreted as a measure similar to the logarithm of the

marginal likelihood, but ignoring the initial t − 1 observations. The predictive likelihood (PL) can

be used to order models according to their predictive abilities. In a similar fashion to Bayes factors, which are based on all of the data, one can also compare the performance of models over a specific out-of-sample period by predictive Bayes factors. The predictive Bayes factor for model A versus B is PBFAB = p (yt, ..., yT | Yt−1, MA) / p (yt, ..., yT | Yt−1, MB), and summarizes the relative evidence of

the two models over the out-of-sample data, yt, ..., yT . Equation (3.5.3) is simply the product of the

individual predictive likelihoods

p (yt, ..., yT | Yt−1, MA) = ∏_{s=t}^{T} p (ys | Ys−1, MA). (3.5.4)

¹³At first, this may seem counterintuitive. However, we believe that this result is due to the choice of our subsamples. Furthermore, we also find that var (YT) = 47.78 for 1957:1-1980:12 and var (YT) = 87.83 for 1981:1-2013:5.


Calculating (3.5.4) within a PMMH sampling scheme is easy. We can use the predictive decomposition

along with the output from M-H. Specifically, each term on the right-hand side of (3.5.4) can be

consistently estimated as

p (ys | Ys−1, MA) ≈ (1/N) ∑_{i=1}^{N} p (ys | Ys−1, θA^(i), MA). (3.5.5)
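To make (3.5.5) concrete, the following minimal Python/NumPy sketch (the thesis implementations are in Ox) averages the per-draw predictive densities in log space with a log-sum-exp guard; the Gaussian predictive densities and all names below are hypothetical illustrations, not quantities from the empirical application.

```python
import numpy as np

def log_predictive_likelihood(log_dens_draws):
    """Combine per-draw values log p(y_s | Y_{s-1}, theta^(i), M_A) into the
    log of their Monte Carlo average, as in (3.5.5), via log-sum-exp."""
    m = np.max(log_dens_draws)
    return m + np.log(np.mean(np.exp(log_dens_draws - m)))

rng = np.random.default_rng(0)
N = 1000
mu_draws = rng.normal(0.0, 0.1, N)       # hypothetical predictive means per draw
sig_draws = 0.2 + 0.05 * rng.random(N)   # hypothetical predictive stds per draw
y_next = 0.15                            # realized outcome y_s
log_dens = -0.5 * np.log(2 * np.pi * sig_draws ** 2) \
           - 0.5 * (y_next - mu_draws) ** 2 / sig_draws ** 2
print(log_predictive_likelihood(log_dens))
```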

We can also compare forecasts of models based on the predictive mean. Similar to the predictive likelihood, the predictive mean, E [yt | Yt−1, MA], can be computed using θA^(1), ..., θA^(N). For instance, in the context of the ARFIMA-SV model, we have that

(yt − β0) = ∑_{l=1}^{t−1} πl (yt−l − β0) + σε,t εt, where the weights πl follow from the expansion 1 − ∑_{l=1}^{∞} πl L^l = [Φ (L)/Ψ (L)] (1 − L)^d, truncated at lag t − 1.

Thus, we can calculate the predictive mean of yt based on Yt−1 as

E [yt | Yt−1, MA] ≈ (1/N) ∑_{i=1}^{N} [ β0^(i) + ∑_{l=1}^{t−1} πl^(i) (yt−l − β0^(i)) ].
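To illustrate how the πl weights and the predictive mean can be computed, the sketch below (Python/NumPy; the thesis implementations are in Ox) restricts attention to the ARFIMA(0,d,0) case, where Φ (L) = Ψ (L) = 1 and the πl follow from the binomial expansion of (1 − L)^d; the inputs are hypothetical.

```python
import numpy as np

def pi_weights(d, n_lags):
    # For ARFIMA(0,d,0): 1 - sum_l pi_l L^l = (1 - L)^d, so pi_l = -c_l,
    # where c_l are the binomial expansion coefficients of (1 - L)^d.
    c = np.empty(n_lags + 1)
    c[0] = 1.0
    for j in range(1, n_lags + 1):
        c[j] = c[j - 1] * (j - 1 - d) / j
    return -c[1:]

def predictive_mean(y_past, beta0_draws, d_draws):
    # Average E[y_t | Y_{t-1}, theta^(i)] over the posterior draws.
    n = len(y_past)                        # number of available lags, t - 1
    means = []
    for b0, d in zip(beta0_draws, d_draws):
        pi = pi_weights(d, n)
        # reverse the history so that pi[l-1] multiplies y_{t-l}
        means.append(b0 + pi @ (y_past[::-1] - b0))
    return np.mean(means)

rng = np.random.default_rng(1)
y_hist = rng.normal(0.3, 0.2, 200)         # hypothetical inflation history Y_{t-1}
print(predictive_mean(y_hist, rng.normal(0.3, 0.02, 500), rng.uniform(0.1, 0.4, 500)))
```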

For each model, we produce h-step ahead forecasts with h = 1, h = 4 and h = 8 using a rolling window with a width of 265 months. We choose the out-of-sample period from 1979:3 to the end of the sample, for a total of 411 observations. Specifically, given Yt, t ≥ 265, we implement our sampling scheme, obtain posterior draws of θA and compute p (yt+h | Yt, MA), h = 1, 4, 8 using (3.5.5). We also calculate E [yt+h | Yt, MA], using, at each step h = 2, ..., the previously obtained forecasts up to horizon h − 1. As a new observation enters the information set, the posterior is updated through a new round of sampling and the forecasting procedure is repeated.

The predictive likelihoods of the models in Section 3.5.2, along with several benchmark alternatives, are displayed in the left panel of Table 3.10. The ARFIMA-SV models are estimated using PMCMC,

i.e. PMMH, while AR(5), AR(10), ARMA(1,1) and ARFIMA(12,d,0) are estimated using “pure”

Gibbs sampling.

The log-predictive likelihood of ARFIMA(12,d,0)-SV is larger than that of all the other models for h = 1. The evidence for h = 4 and h = 8 is also strong. However, the performance of ARFIMA(12,d,0)-SV deteriorates as the forecast horizon lengthens. We also find that ARMA(1,1) is dominated by all the other

models, and it has poor performance compared with AR(5) or AR(10).

Although we focus on the predictive likelihood to measure predictive content, it is also interesting

to consider the out-of-sample point forecasts based on the predictive mean. Therefore, the right panel

of Table 3.10 reports the root mean squared error (RMSE) for the predictive mean. The out-of-sample

period corresponds exactly to the period used to calculate the predictive likelihood. As before, we find

that ARFIMA(12,d,0)-SV performs very well against all the other models. It is the top performer

for h = 4 and h = 8 and second best for h = 1. Compared to the homoscedastic ARFIMA model,

ARFIMA(12,d,0)-SV and ARFIMA(12,d,0)-SVt offer improvements in terms of out-of-sample point forecasts. However, these improvements are quite modest.

In order to perform a joint evaluation of the forecasts, the methodology of Hansen et al. (2011), termed the Model Confidence Set (MCS), is applied. The appealing feature of the MCS approach is that


it allows for a user-defined criterion of “best”. Furthermore, it does not require a benchmark model for

comparison. In terms of implementation, the Ox package, Mulcom, provided by the authors is used.

As before, results clearly indicate that ARFIMA(12,d,0)-SV and ARFIMA(12,d,0)-SVt perform very

well. In terms of density forecasts, ARFIMA(12,d,0)-SV is the only model that belongs to the 5% MCS

for h = 1, h = 4 and h = 8, i.e. ARFIMA(12,d,0)-SV performs significantly better than all the other

models. In terms of point forecasts, there is no significant difference between ARFIMA(12,d,0)-SV

and ARFIMA(12,d,0)-SVt. These models are the only models that belong to the 5% MCS.

Table 3.10.: Out-of-sample forecast results, yt+h

                             log(PL)                            RMSE
Model                  h = 1     h = 4     h = 8      h = 1      h = 4      h = 8
AR(5)                 127.77     62.48     60.15     0.1828     0.2152     0.2149
AR(10)                128.61     78.21     89.55     0.1826     0.2069     0.2025
ARMA(1,1)             111.28     44.19     34.22     0.1893     0.2219     0.2249
ARFIMA(12,d,0)        158.97    124.86    125.18     0.1721     0.1913     0.1885
ARFIMA(12,d,0)-SV     199.69(∗) 159.75(∗) 150.93(∗)  0.1699(∗)  0.1882(∗)  0.1861(∗)
ARFIMA(12,d,0)-SVt    192.88    151.91    139.95     0.1698(∗)  0.1887(∗)  0.1869(∗)

This table reports the log-predictive likelihood, log(PL), and out-of-sample root mean squared error (RMSE) for the predictive mean. The out-of-sample period is from 1979:3 to 2013:5. An asterisk, (∗), signifies that the model belongs to the 5% MCS of Hansen et al. (2011).

Figure 3.5.: Rolling window parameter estimates, ARFIMA(12,d,0)-SV

Panel (a): β0, (b): d, (c): φ11 and (d): φ12. Window width: 265 months. First period: 1957:1-1979:2, last period: 1991:5-2013:5. The solid lines represent parameter estimates. The dashed lines denote the 95% credibility intervals.


3.5.5. Parameter sensitivity analysis

In this section, we follow Bos et al. (2012) and perform parameter sensitivity analysis using rolling estimates of θ with a window length of 265 months. We show rolling estimates of β0, d, φ11 and φ12 along with their respective 95% credibility intervals in Figure 3.5. The fractional integration order, d, captures the long-memory behavior, while φ11 and φ12 capture the short-memory behavior, including seasonality. The values for 1957:1 correspond to the estimation period 1957:1-1979:2. Panel (a) shows that β0 changes only slowly: it fluctuates around 0.4 until the mid-1980s and thereafter drops to 0.2. Rolling estimates of d show that it gradually drops from 0.4 at the start of the sample to about 0.1 towards the end of the sample. The estimate of d is 0.16 for the last subsample, which runs from 1991:5 to 2013:5. It is cautiously evident that the long-memory characteristics of US inflation might not have remained significant after the Great Moderation. It is also clear that current long-run persistence is lower than in the period before the Great Moderation. On the other hand, φ12 increases almost as the mirror image of d. Finally, except for a small peak in the early 1980s, φ11 remains relatively constant around 0.1 throughout the sample.

Overall, rolling estimates of the parameters show slowly drifting trends. As expected, the most significant changes occur during the Great Moderation. Our findings agree with Stock and Watson (2007) in that US inflation may have become harder to forecast as the persistence, and hence the month-to-month memory, has dropped.

3.6. Unobserved Components Model with SVM Effects

In this section, we follow Chan (2014) and Section 3.4. As before, the goal of this section is to

demonstrate the flexibility of PMCMC to adapt to more complicated model structures. We define the

unobserved components model with stochastic volatility in mean effects, UC-SVM, as

yt = αt + λt exp (ht/2) + exp (ht/2) εt, εt ∼ N (0, 1),
αt+1 = αt + ση ηt, ηt ∼ N (0, 1),
λt+1 = λt + σζ ζt, ζt ∼ N (0, 1),
ht+1 = µ + ρ (ht − µ) + σ ξt, ξt ∼ N (0, 1).

This specification extends the UC-SVm model discussed in Section 3.4. In UC-SVM, the log-volatility,

ht, enters the conditional mean equation with a time-varying loading coefficient, λt. Modeling this

specification within the PMCMC framework is straightforward. For instance, we can implement the

particle Gibbs sampler of Andrieu et al. (2010). Alternatively, we can use a Rao-Blackwellization

scheme similar to Section 3.4. Conditional on h1:T, the remaining model is a linear Gaussian state-space

model where (α1:T , λ1:T ) can be integrated out analytically using the Kalman filter, and we can sample

θ using M-H. Furthermore, estimating the above model is very difficult using “pure” Gibbs sampling,

see Chan (2014). For instance, Chan (2014) first samples (α1:T , λ1:T ) conditional on h1:T , θ and YT .

However, conditional on (α1:T , λ1:T ), θ and YT , the model cannot be written in linear state-space form.

Therefore, there is no obvious way to sample h1:T from its conditional posterior. Consequently, Chan

(2014) adopts an accept-reject M-H procedure and samples h1:T conditional on α1:T , λ1:T , θ, YT .

On the other hand, within the PG framework, we act as if we are operating within a traditional


Gibbs sampling scheme. Furthermore, in step 1, we can draw (α1:T, λ1:T, h1:T) all-at-once using, for example, the conditional SMC algorithm of Andrieu et al. (2010). Thus, we proceed by cycling through the following steps (a sketch of the conjugate variance updates in steps 5 and 6 is given after the list).

1. α1, ..., αT , λ1, ..., λT and h1, ..., hT | θ, YT .

2. µ | ρ, σ2, h1, ..., hT .

3. ρ | µ, σ2, h1, ..., hT .

4. σ2 | µ, ρ, h1, ..., hT .

5. σ2η | α1, ..., αT .

6. σ2ζ | λ1, ..., λT .
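As referenced above, steps 5 and 6 are standard conjugate inverse-Gamma updates. The following minimal Python/NumPy sketch (the thesis implementations are in Ox) performs one such update, assuming the IG(v0, s0) prior of this section is parameterized by shape v0 and scale s0; the function name and the example path are ours.

```python
import numpy as np

def sample_sig2_rw(state_path, v0=4.0, s0=0.02, rng=np.random.default_rng(2)):
    """Conjugate draw for a random-walk innovation variance, e.g. step 5:
    sigma_eta^2 | alpha_{1:T}, with prior IG(v0, s0) (shape/scale).
    Posterior: IG(v0 + (T-1)/2, s0 + 0.5 * sum (alpha_{t+1} - alpha_t)^2)."""
    incr = np.diff(state_path)
    shape = v0 + 0.5 * len(incr)
    scale = s0 + 0.5 * np.sum(incr ** 2)
    # If G ~ Gamma(shape, rate=scale), then 1/G ~ InvGamma(shape, scale)
    return 1.0 / rng.gamma(shape, 1.0 / scale)

alpha_path = np.cumsum(np.random.default_rng(3).normal(0.0, 0.1, 250))  # hypothetical draw of alpha_{1:T}
print(sample_sig2_rw(alpha_path))
```

Step 6 is identical with λ1:T in place of α1:T.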

As stated above, we can use the particle Gibbs (PG) sampler of Andrieu et al. (2010) to estimate the

above model by cycling through steps 1-6. However, Lindsten et al. (2012) propose a relatively new

method which is computationally as elegant and in some cases even more robust than PG, namely,

particle Gibbs with ancestor sampling, PG-AS. This approach builds on the PG sampler proposed by

Andrieu et al. (2010). In PG, we start by running a sequential Monte Carlo (SMC) sampler in which

one particle trajectory is set deterministically to a reference trajectory that is specified a priori. After

a complete run of the SMC algorithm, a new trajectory is obtained by selecting one of the particle

trajectories with probabilities given by their importance weights. The effect of the reference trajectory

is that the target distribution of the resulting Markov kernel remains invariant, regardless of M , see

Andrieu et al. (2010). However, in some cases, PG can suffer from a serious drawback, which is that

the underlying mixing can be very poor when there is path degeneracy in the SMC sampler14. PG-AS

alleviates possible path degeneracy problems in a very computationally elegant fashion. Specifically,

the original PG kernel is modified using a so-called ancestor sampling step. Even though this is a

small modification of the algorithm, improvements in mixing can be quite considerable, see Lindsten

et al. (2012). The reader is referred to Lindsten et al. (2012) and the Appendix for more details

regarding PG-AS.

Accordingly, at iteration i of PG-AS, we consider the nonlinear framework of our UC model and sample (α1:T^(i), λ1:T^(i), h1:T^(i)) ∼ p (α1:T, λ1:T, h1:T | θ, YT) using the conditional particle filter with ancestor sampling (CPF-AS) of Lindsten et al. (2012), see the Appendix. The elements in θ^(i) are sampled one-at-a-time using standard Gibbs sampling techniques conditional on α1:T^(i), λ1:T^(i), h1:T^(i) and YT.¹⁵

We assume independent priors for θ. Specifically, µ ∼ N (0, 1) and (ρ+ 1) /2 ∼ Beta (20, 1.5). As

before, the variance parameters are IG distributed with v0 = 4 and s0 = 0.02. Our data consists of UK

quarterly seasonally adjusted CPI inflation from 1955q1 to 2013q1. Specifically, given the quarterly

CPI number, CPIt, we use yt = 400 ln (CPIt/CPIt−1) as the CPI inflation rate. Table 3.11 reports

posterior mean estimates, 95% credibility intervals (indicated inside the brackets) and inefficiency

factors for:

¹⁴In some cases, this problem can be addressed by adding a backward simulation step to the PG sampler, yielding a method denoted as PG with backward simulation, see Lindsten and Schon (2013). In order to avoid this step and provide an overall robust method for estimation, we choose to follow Lindsten et al. (2012). Therefore, we use the conditional particle filter with ancestor sampling of Lindsten et al. (2012).

¹⁵We could also first sample α1:T^(i), λ1:T^(i) | h1:T^(i−1), θ^(i−1), YT and then h1:T^(i) | α1:T^(i), λ1:T^(i), θ^(i−1), YT. However, we find that this approach leads to very similar results as our original approach.


1. M1: UC: a plain UC model, i.e. (3.4.1)-(3.4.2).

2. M2: UC-SVm: a UC model with SV effects in σ²ε.

3. M3: UC-SVM-λ-const: a UC model with SV in mean effects where λ1 = λ2 = ... = λT.

4. M4: UC-SVM: a UC model with SV in mean effects.

Table 3.11.: Estimation results, unobserved components models, UK inflation

Parameter           UC                   UC-SVm              UC-SVM-λ-const      UC-SVM
λ                                                            1.8612 (28.56)
                                                             [1.5312, 2.6859]
µ                                        0.7530 (1.82)       0.7891 (1.95)       0.7374 (1.95)
                                         [-0.2216, 3.6716]   [-0.2991, 3.1355]   [-0.0892, 3.2277]
ρ                                        0.9482 (18.95)      0.9708 (11.45)      0.9508 (17.25)
                                         [0.8996, 0.9979]    [0.9437, 0.9998]    [0.9063, 0.9991]
σ²                                       0.2345 (36.49)      0.1039 (30.41)      0.1361 (33.26)
                                         [0.0953, 1.1590]    [0.0527, 0.4320]    [0.0714, 0.4013]
σ²ε                 10.1848 (1.18)
                    [8.6739, 15.6103]
σ²η                 0.7456 (22.28)       0.2914 (34.50)      0.0116 (50.37)      0.0051 (39.67)
                    [0.4113, 3.7689]     [0.1282, 1.2156]    [0.0033, 0.1065]    [0.0023, 0.0266]
σ²ζ                                                                              0.0264 (49.50)
                                                                                 [0.0095, 0.2204]
log(L)              -610.20              -534.87             -523.53             -513.22
log(ML), a = 0.75   -663.49              -566.22             -541.07             -531.19
log(ML), a = 0.99   -663.24              -565.96             -540.93             -530.94

This table reports estimation results for different UC models, with posterior means, RB (inside the parentheses) and 95% credibility intervals (inside the brackets). log(L): log-likelihood, log(ML): log-marginal likelihood for the corresponding value of a.

Furthermore, the PG-AS sampler provides us with p (YT | θ,Mk), k = 1, ..., 4. Thus, as before, we

are able to compute the marginal likelihood values for each model using G-D. On the other hand, p (YT | θ, Mk) is not as easily obtainable within a traditional Gibbs sampling scheme. For instance,

Chan (2014) adopts the approach in Koop et al. (2010) and instead computes the dynamic posterior

probabilities for M4 versus M2 over the data.

We see that UC-SVM performs best in terms of ML. For instance, compared to UC-SVm, which is

equivalent to UC-SVM when λt = 0, t = 1, ..., T , the logBF in favor of UC-SVM is 35.02. This is

very strong evidence. Furthermore, compared to UC-SVM-λ-const, the logBF in favor of UC-SVM

is 9.99, which confirms posterior evidence in favor of time-variation in λ. In Figure 3.6, we plot the

evolution of αt, λt and exp (ht). Intuitively, the estimates of exp (ht) are as expected. The inflation

volatility increases during the 1970s and subsequently stabilizes since the beginning of the 1980s. We

also report posterior estimates of αt and λt, t = 1, ..., T . Evidently, there is substantial time-variation

in these estimates. Estimates of λt highlight the importance of including this component in the model.

During the 1970s, λt is between 1.5 and 3.5, whereas λt becomes much smaller after the early 1980s16.

¹⁶However, for PG-AS, (on average) the RB of each parameter is higher than for PMMH. As stated in Section 3.4, this is because we incorporate the steps in So et al. (2005) within the PMMH procedure, thus automatically reducing RB. Furthermore, the inefficiency factors of θ using PG-AS are relatively close to those of the Gibbs sampling approach of Chan (2014).


Figure 3.6.: Estimation results, UC-SVM, UK inflation rate

Top panels: inflation, filtered trend and estimates of the conditional variance of inflation, exp (ht). Bottom panels: evolution of αt and λt, t = 1, ..., T, with their respective 95% credibility intervals.

3.7. Conclusion

In this paper we present algorithms and implementations for analyzing different data sets using Ox in

combination with particle Markov chain Monte Carlo techniques. We briefly describe implementation

of these techniques in Section 3.2. We provide several examples in Sections 3.3 to 3.6. We show how

to estimate stochastic volatility models with different extensions. Thereafter, we focus on PMMH

estimation of unobserved components models with time-varying volatility using US inflation data.

Results using quarterly inflation data show that extending the unobserved components model towards

a model with time-varying volatility both in the irregular and regular components of inflation provides

improvements in terms of the marginal likelihood criterion.

We also show that it is relatively easy to estimate more complicated models using PMCMC. For

instance, we estimate different ARFIMA-SV type models. We show that our methods work very

well in identifying the true data generating parameters through Monte Carlo simulations. We find

that the ARFIMA-SV model performs best in terms of ML. Overall, we find that the SV part of

the model clearly captures the Great Moderation. Sensitivity analysis using rolling estimates of the

parameters for the ARFIMA-SV model provides a clear distinction between parameter changes in the

level, long-run dynamics and changes in parameters for the short-run dynamics. In terms of out-of-

sample forecasts, ARFIMA(12,d,0)-SV obtains the highest PL for h = 1, h = 4 and h = 8. In terms

of RMSE, ARFIMA(12,d,0)-SV is the top performer for h = 4 and h = 8.


4. Particle Gibbs with Ancestor Sampling for

Stochastic Volatility Models with: Heavy Tails,

in Mean Effects, Leverage, Serial Dependence

and Structural Breaks

Author: Nima Nonejad

Abstract: Particle Gibbs with ancestor sampling (PG-AS) is a new tool in the family of sequen-

tial Monte Carlo methods. We apply PG-AS to the challenging class of stochastic volatility models

with increasing complexity, including leverage and in mean effects. We provide applications that

demonstrate the flexibility of PG-AS under these different circumstances and justify applying it in

practice. We also combine discrete structural breaks within the stochastic volatility model framework.

For instance, we model changing time series characteristics of monthly postwar US core inflation rate

using a structural break autoregressive fractionally integrated moving average (ARFIMA) model with

stochastic volatility. We allow for structural breaks in the level, long and short-memory parameters

with simultaneous breaks in the level, persistence and the conditional volatility of the volatility of

inflation.

Keywords: ancestor sampling, Bayes, particle filtering, structural breaks

(JEL: C11, C22, C58, C63)


4.1. Introduction

Stochastic volatility (SV) models are widely used for modeling financial time series data, see Jacquiera

et al. (1994), Kim et al. (1998), Chib et al. (2002), Koopman and Hol Uspensky (2002), Berg et al.

(2004), Omori et al. (2007), Nakajima and Omori (2012), and Chan and Grant (2014). Furthermore,

time series models with SV specification have also become important in macroeconometric modeling

as they provide a flexible framework for estimation and interpretation of time variation in the volatility

of macroeconomic time series, see for instance Cogley and Sargent (2005), Primiceri (2005), Koop and

Korobilis (2010), Chan (2013) and Chan (2014).

Bayesian inference of these models generally relies on MCMC techniques. Typically, the main

practical difficulty in estimating SV models lies in simulating from the conditional posterior of the

latent volatility process. In general, there is no unified way to draw these latent volatilities. For

instance, methods that work for simple SV specifications do not work for specifications that allow for leverage or SV in mean effects, see Omori et al. (2007) and Chan (2014). Modifying the algorithms to accommodate these features often requires much programming effort, and can in some cases prove inefficient, producing highly autocorrelated draws.

Recently, in their seminal paper, Andrieu et al. (2010) propose a novel combination called Particle

MCMC (PMCMC) which uses sequential Monte Carlo (SMC) techniques to design efficient high-

dimensional proposal distributions for MCMC algorithms. One of the main features of the PMCMC

method is that it can use the output of an SMC method, targeting the marginal density of the model

parameters as a proposal distribution for a Metropolis-Hastings update. Among recent works, Flury

and Shephard (2011) and Whiteley et al. (2010) use this technique to estimate stochastic volatility

and other time series models.

In this paper, we provide a unified Bayesian methodology for estimating stochastic volatility models (and models whose time-varying volatility is modeled as an SV process) with heavy tails, in mean effects, leverage and structural breaks. We apply a relatively new tool in the family of SMC methods,

which is particularly useful for inference in SV models, namely, particle Gibbs with ancestor sampling

(PG-AS), suggested in Lindsten et al. (2012). PG-AS is similar to the particle Gibbs (PG) sampler

proposed by Andrieu et al. (2010). In PG, we start by running a sequential Monte Carlo sampler in

which one particle trajectory is set deterministically to a reference trajectory that is specified a priori.

After a complete run of the SMC algorithm, a new trajectory is obtained by selecting one of the

particle trajectories with probabilities given by their importance weights. The effect of the reference

trajectory is that the target distribution of the resulting Markov kernel remains invariant, regardless

of the number of particles used in the underlying SMC algorithm. However, PG can suffer from a

serious drawback, which is that the underlying mixing can be very poor when there is path degeneracy

in the SMC sampler. In most cases, this problem can be addressed by adding a backward simulation

step to the PG sampler, yielding a method denoted as PG with backward simulation, see Whiteley

et al. (2010) and Lindsten and Schon (2013)1. PG-AS alleviates the path degeneracy problem in a

¹In Andrieu et al. (2010), Whiteley suggests adding a backward step that enables exploration of all possible ancestral lineages. This approach can be considered as an alternative to the ancestor sampling part. However, this step will increase computation time. Furthermore, the purpose of this paper is not to compare the performance of PG-AS with other methods. On the contrary, we seek to show that PG-AS provides a general, simple and unified way to draw the latent states without the need to add or modify backward steps. Finally, it is a consensus in the community that PG-AS outperforms, or at the very least performs as well as, the original PG with its modifications, see Lindsten et al. (2012) and Lindsten et al. (2014).


very computationally elegant fashion. Specifically, the original PG kernel is modified using a so-called

ancestor sampling step. This way the same effect as backward sampling is achieved, but without the

need to run an explicit backward pass.

The main objective of this paper is to show that PG-AS provides a very compelling and computationally easy framework for estimating rather advanced models. We illustrate this on three specific

problems, producing rather generic methods. Accordingly, as we shall see, for both financial and

macroeconomic models that we consider, PG-AS requires limited design effort on the practitioner’s

part, especially if one desires to change some features in a particular model. On the other hand,

estimating the same types of models using “traditional” Gibbs sampling would require relatively more

programming effort.² Furthermore, contrary to typical Gibbs sampling applications, which often require additional sampling steps to obtain the integrated likelihood, this quantity is easily obtainable

using PG-AS as the integrated likelihood is directly available from the conditional particle filter with

ancestor sampling output, see Section 4.3.

The initial model in Section 4.4 is the standard stochastic volatility (SV) model with Gaussian errors

applied to a financial data set concerning daily Dow Jones Industrial Average (DJIA) returns. Next,

we consider different well-known extensions of the SV model. First, we incorporate a leverage effect by modeling a correlation parameter between the measurement and state errors. Second, we provide a PG-AS implementation of the stochastic volatility in mean model of Koopman and Hol Uspensky (2002). In this specification, the unobserved volatility process appears in both the conditional mean and the conditional variance. Third, we implement a model that has both stochastic volatility and moving average errors, see Chan (2013). Finally, we also consider a SV model with

Student-t distributed errors, a heavy-tailed SV model with stochastic volatility in mean effects, and

a SV model with heavy-tailed moving average errors, see Chan and Hsiao (2013). We show that

PG-AS provides a straightforward procedure for estimation, marginal likelihood (ML) and deviance

information criterion (DIC) calculation of these models3.

In the second part of the paper, we provide extensions where we combine discrete structural breaks

within the SV framework using macroeconomic time series data. We provide a general methodology

for modeling and forecasting in the presence of structural breaks caused by unpredictable changes

to model parameters. In our settings, structural breaks are modeled through irreversible Markov

switching, or so-called change-point dynamics, see Chib (1998). We estimate model parameters, log-

volatilities and change-point dates conditional on a fixed number of change points. For each of these

specifications, ML and DIC are calculated. They are then used to determine the optimal number of

change points, see Liu and Maheu (2008).

First, we consider modeling the real US GDP growth rate and document a structural break in

its volatility since the 1980s, see Gordon and Maheu (2008), among others. The flexibility of PG-AS allows us to capture the GDP growth rate-GDP volatility relationship by incorporating stochastic volatility in mean effects within the change-point model. Overall, besides a one-time structural break

in the volatility of real US GDP growth rate in 1984, our results also point to a gradual volatility

²With “traditional” Gibbs sampling, we refer to sampling methods that draw the latent states using either: a single-state procedure, see Jacquiera et al. (1994), mixture samplers as in Kim et al. (1998) and Omori et al. (2007) in the context of the SV model with leverage effect, or an accept-reject Metropolis-Hastings procedure as in, for instance, Chan (2014) in the context of the SV model with in mean effects.

³See Chan and Grant (2014) for a broader discussion on the advantages of using DIC based on the integrated likelihood instead of the conditional DIC, i.e. DIC conditional on the latent volatilities.


reduction in the 1960s followed by a subsequent increase in the 1970s. Furthermore, we find that the GDP growth rate-GDP volatility relationship, as measured by the SV in mean feedback, has a more pronounced negative impact on the GDP growth rate after the structural break.⁴ In terms of point forecasts,

structural break specifications tend to dominate their constant parameter counterparts. However,

these improvements are quite modest.

Second, we model changing time series characteristics of postwar monthly US core inflation rate

using a structural break autoregressive fractionally integrated moving average model with stochastic

volatility. We allow for structural breaks in the level, autoregressive (AR), moving average (MA)

parameters, long-memory parameter, d, contemporaneously with breaks in the level, persistence and

the conditional volatility of the volatility of inflation. We find evidence of structural breaks in the

dynamics of US core inflation rate and show that we can qualitatively reproduce well-known empirical

facts regarding the dynamics of US inflation rate. As expected, most significant changes in the

model parameters occur during the Great Moderation. Furthermore, it is also cautiously evident

that the long-memory characteristics of US inflation might not have remained significant after the

Great Moderation. In comparison to a model that assumes no breaks, we find that our break model

performs better in terms of density and point forecasts.

Overall, we believe that applying PG-AS to time series SV models, especially structural break specifications, is the most important contribution that we provide. To our knowledge, no attempts have been made to use PG-AS in the econometric analysis of these types of models. The remainder of this paper is organized as follows. In Section 4.2 we intuitively explain the advantages of PG-AS, mainly from

a computational point of view. Section 4.3 describes the steps of PG-AS. We present our empirical

applications in Sections 4.4 and 4.5. Finally, the last section concludes.

4.2. Why is PG-AS Useful?

In this section we provide an intuitive justification for applying PG-AS in practice. We identify two

common weaknesses in existing Gibbs sampling methods, see (1) and (2) below. We then argue how

PG-AS can provide solutions to these problems. In Section 4.3 we provide technical details on PG-AS,

especially the conditional particle filter with ancestor sampling, CPF-AS.

Given a time series of observations, YT = (y1, ..., yT )′, a very plain stochastic volatility (SV) model

consists of a measurement equation,

yt = µ+√γ exp (ht/2) εt, εt ∼ N (0, 1) , (4.2.1)

that describes the distribution of the data given the log-volatilities, h1:T = (h1, ..., hT )′, where

ht = µh + φh (ht−1 − µh) + σhζt, ζt ∼ N (0, 1) , (4.2.2)

Equation (4.2.2) models the period-to-period variation of the volatilities as a Markov process. Typi-

cally, we let corr (εt, ζt) = 0. The parameter µh is the drift term in the state equation, γ = exp (µh)

plays the role of a constant scaling factor, σh is the volatility of log-volatility and φh is the persistence

⁴Our change-point stochastic volatility in mean model is not by any means restricted only to GDP data. In fact, we believe that this model can be used to analyze US inflation rate data, providing a flexible alternative to recent works on quarterly US inflation rate data such as Chan (2014) and Eisenstat and Strachen (2014).


parameter. In most cases, we impose that |φh| < 1 such that we have a stationary process with the initial condition h1 ∼ N (µh, σh²/(1 − φh²)). For identification reasons, we must either set γ equal to 1 and leave µh unrestricted, or fix µh at zero and estimate γ > 0, see Kim et al. (1998). Thus, we choose to set γ = 1 and leave µh unrestricted. Finally, we collect all model parameters in θ = (µ, µh, φh, σh²)′.
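For concreteness, the following Python/NumPy sketch simulates data from (4.2.1)-(4.2.2) under the identification choice γ = 1; the thesis implementations are in Ox, and the parameter values below are illustrative only (loosely in the range of the SV estimates reported in Table 4.2).

```python
import numpy as np

def simulate_sv(T, mu, mu_h, phi_h, sig_h, rng=np.random.default_rng(0)):
    """Simulate y_t = mu + exp(h_t/2) eps_t with
    h_t = mu_h + phi_h (h_{t-1} - mu_h) + sig_h zeta_t and
    h_1 drawn from the stationary law N(mu_h, sig_h^2 / (1 - phi_h^2))."""
    h = np.empty(T)
    h[0] = mu_h + sig_h / np.sqrt(1.0 - phi_h ** 2) * rng.standard_normal()
    for t in range(1, T):
        h[t] = mu_h + phi_h * (h[t - 1] - mu_h) + sig_h * rng.standard_normal()
    y = mu + np.exp(h / 2.0) * rng.standard_normal(T)
    return y, h

y, h = simulate_sv(1740, mu=0.09, mu_h=-0.12, phi_h=0.979, sig_h=0.23)
```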

It has been observed that the plain SV model is too restrictive in many financial applications. For

instance, the SV model that is studied in the bulk of the literature typically assumes that the measure-

ment and state equation disturbances are Gaussian and uncorrelated. Accordingly, it is interesting to

extend the SV model in different ways, for instance: (a) with a fat-tailed distribution of the conditional

mean innovations, εt, t = 1, ..., T , (b) with a “leverage” effect, in which εt and ζt are correlated, (c)

with SV in mean effects and (d) allowing for breaks in the model parameters.

Over the years, many papers have dealt with incorporating these extensions, see Chib et al. (2002),

Berg et al. (2004), Jacquiera et al. (2004), Omori et al. (2007), Nakajima and Omori (2012) and Chan

(2014). However, from a computational point of view, practitioners often face two main challenges

when dealing with the above extensions.

1. While sampling the model parameters, θ, from their conditional posteriors is relatively easy,

sampling h1:T ∼ p (h1:T | θ, YT ) is often difficult. Specifically, the degree of difficulty which

one encounters with regards to sampling h1:T ∼ p (h1:T | θ, YT ) depends on the specific model

structure at hand. For instance, we can estimate SV and its fat-tailed extension by using the

so-called auxiliary mixture sampler, see Kim et al. (1998) and Chib et al. (2002). However,

this approach is rather model specific, and cannot be easily generalized to estimate SV models

with leverage or with in mean effects. Over the years, some papers have dealt with Gibbs

sampling estimation of these extensions by either extending the auxiliary mixture sampler of

Kim et. al. (1998), see Omori et al. (2007), or adopting an accept-reject Metropolis-Hastings

procedure to draw h1:T from its conditional posterior, see Chan (2014). However, these extensions

require relatively more programming effort. More importantly, one cannot use a unified sampling

algorithm in order to accommodate (a), (b) and (c).

2. Even though sampling h1:T ∼ p (h1:T | θ, YT ) is possible within a Gibbs sampling scheme, obtaining p (YT | θ), which is necessary for ML and DIC computation, is usually very cumbersome.

Often, one resorts to running additional sampling algorithms in order to obtain p (YT | θ), see

Kim et al. (1998), Chib et al. (2002) and Chan and Grant (2014).

On the other hand, for PG-AS, we can maintain the same program structure, incorporating minor

changes in the codes by directly changing the measurement or state equations inside the conditional

particle filter algorithm, see Section 4.3. At the same time, p (YT | θ) is directly available from the

particle filter, which can then be used to compute ML and DIC, see Section 4.3.1. Furthermore,

as stated in Section 4.1, PG-AS alleviates possible path degeneracy problems encountered in other

PMCMC algorithms in a very computationally elegant fashion through the ancestor step.

As correctly suggested by a referee, in our settings, we have the flexibility to sample some parameters

conditional on h1:T and YT , whereas we can sample other parameters marginally, i.e. conditional only

on YT. For instance, for the SV model, we can start by drawing h1:T^(i) | θ^(i−1), YT using the conditional particle filter with ancestor sampling, see Section 4.3. Then, we can draw µ^(i) | h1:T^(i), YT. However, instead of sampling µh^(i) | φh^(i−1), σh^2(i−1), h1:T^(i), then φh^(i) | µh^(i), σh^2(i−1), h1:T^(i) and σh^2(i) | µh^(i), φh^(i), h1:T^(i) element-by-element, we can sample µh^(i), φh^(i), σh^2(i) | µ^(i), YT in one block without conditioning on h1:T^(i) using the


particle marginal Metropolis-Hastings approach of Andrieu et al. (2010). Specifically, we draw the candidate, ϕ∗ = (µh∗, φh∗, σh^2∗)′, from a random walk proposal, ϕ∗ = ϕ^(i−1) + cε^(i). The prior of ϕ, p (ϕ) ∝ p (µh) p (φh) p (σh²), is evaluated directly. Finally, for the PMMH step, we run two additional particle filters to obtain the estimates of p̂ (YT | µ^(i), ϕ∗) and p̂ (YT | µ^(i), ϕ^(i−1)) in order to perform Metropolis-Hastings (M-H) to either accept or reject ϕ∗,⁵ see equation (13) in Andrieu et al. (2010) and Flury and Shephard (2011).

As we shall see, the above approach is very advantageous in the context of estimating the SV model with leverage effect, SVL, see Section 4.3. Specifically, due to the leverage effect, ρh, the conditional posteriors of µh, φh and σh² for the SVL model are different from those given in Kim et al. (1998), see Nakajima and Omori (2012). Therefore, we would need to make substantial modifications to the code if we were to draw these parameters element-by-element. However, the PG-AS/PMMH combination avoids the need for such major modifications.

4.3. Particle Gibbs with Ancestor Sampling

In the following, we describe the steps of the conditional particle filter with ancestor sampling (CPF-

AS) which is used to draw h1:T from p (h1:T | θ, YT ), see Lindsten et al. (2012)6. Consider (4.2.1)-

(4.2.2), let i = 1, ..., N denote the number of Gibbs sampling iterations, j = 1, ...,M denote the number

of particles, and let p (yt | θ, ht, Yt−1) denote the density of yt given θ, ht and Yt−1. Finally, let h1:T^(i−1) be a fixed reference trajectory of h1:T sampled at iteration i − 1 of the Gibbs sampler. The steps of CPF-AS (a particle filter conditional on h1:T^(i−1)) are as follows:

1. If t = 1:

(a) Draw h1^(j) | θ for j = 1, ..., M − 1 and set h1^(M) = h1^(i−1).

(b) Set w1^(j) = τ1^(j) / ∑_{k=1}^{M} τ1^(k), where τ1^(j) = p (y1 | θ, h1^(j), Y0) for j = 1, ..., M.

2. Else, for t = 2 to T do:

(a) Resample {ht−1^(j)}_{j=1}^{M−1} using indices at^(j), where p (at^(j) = k) ∝ wt−1^(k).

(b) Draw ht^(j) | ht−1^(at^(j)), θ for j = 1, ..., M − 1.

(c) Set ht^(M) = ht^(i−1).

(d) Draw at^(M) from p (at^(M) = j) ∝ wt−1^(j) p (ht^(i−1) | ht−1^(j), θ).

(e) Set h1:t^(j) = (h1:t−1^(at^(j)), ht^(j)) and wt^(j) = τt^(j) / ∑_{k=1}^{M} τt^(k), where τt^(j) = p (yt | θ, ht^(j), Yt−1).

3. End for.

4. Sample h1:T^(i) | θ, YT with p (h1:T^(i) = h1:T^(j) | θ, YT) ∝ wT^(j).

⁵Intuitively, the latter approach should be more efficient in the sense that sampling ϕ marginally in one block will probably produce draws of ϕ that are relatively less autocorrelated. In fact, we estimate the SV model using both methods and compare the inefficiency factors for each procedure. Overall, compared to sampling the volatility parameters element-by-element conditional on h1:T, we find that sampling ϕ all-at-once using PMMH reduces the inefficiency factors of φh and σh². On the other hand, the inefficiency factor of µh increases.

⁶As correctly pointed out by a referee, it is important to note that we are actually drawing from p (h1:T, a1:T^(M) | θ, YT), where we draw a1:T^(M) in step (d). Thus, from a technical point of view, we are not drawing from the true conditional posterior, p (h1:T | θ, YT), but from a close approximation, p (h1:T, a1:T^(M) | θ, YT). However, in order to ease the notation burden and avoid unnecessary confusion, we use the notation p (h1:T | θ, YT) in the text.


Notice that CPF-AS is akin to a standard particle filter, but with the difference that h1:T^(M) is specified a priori and serves as a reference trajectory. Hence, we use only M − 1 particles at each step. Furthermore, whereas in the particle Gibbs algorithm of Andrieu et al. (2010) we set at^(M) = M, in PG-AS we sample a new value for the index variable, at^(M), in an ancestor sampling step, (d). Even though this is a small modification of the algorithm, improvements in mixing can be quite considerable, see Lindsten et al. (2012) and Lindsten et al. (2014).
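To fix ideas, here is a compact Python/NumPy sketch of CPF-AS (steps 1-4 above) for the plain SV model (4.2.1)-(4.2.2) with γ = 1, using the state transition as proposal; the θ ordering, the function names and the way the running estimate of p (YT | θ) is accumulated (see Section 4.3.1) are our choices, and the thesis implementations are in Ox.

```python
import numpy as np

def cpf_as_sv(y, h_ref, theta, M=100, rng=np.random.default_rng(4)):
    # Conditional particle filter with ancestor sampling for the plain SV model.
    # theta = (mu, mu_h, phi_h, sig_h); h_ref is the reference trajectory h_{1:T}^{(i-1)}.
    # Returns a new draw of h_{1:T} and an estimate of log p(Y_T | theta).
    mu, mu_h, phi_h, sig_h = theta
    T = len(y)
    H = np.empty((M, T))                       # particle trajectories

    def log_meas(h, yt):                       # log p(y_t | h_t, theta)
        return -0.5 * (np.log(2 * np.pi) + h + (yt - mu) ** 2 * np.exp(-h))

    def normalize(logw):                       # stable weights and likelihood term
        m = logw.max()
        w = np.exp(logw - m)
        return w / w.sum(), m + np.log(w.mean())

    # t = 1: draw from the stationary law, pin particle M to the reference
    H[:M - 1, 0] = mu_h + sig_h / np.sqrt(1 - phi_h ** 2) * rng.standard_normal(M - 1)
    H[M - 1, 0] = h_ref[0]
    w, loglik = normalize(log_meas(H[:, 0], y[0]))

    for t in range(1, T):
        a = rng.choice(M, size=M, p=w)         # (a) resample ancestor indices
        prior_mean = mu_h + phi_h * (H[:, t - 1] - mu_h)
        h_new = prior_mean[a] + sig_h * rng.standard_normal(M)   # (b) propagate
        h_new[M - 1] = h_ref[t]                # (c) pin particle M to the reference
        # (d) ancestor sampling for the reference particle
        log_as = np.log(np.maximum(w, 1e-300)) \
                 - 0.5 * (h_ref[t] - prior_mean) ** 2 / sig_h ** 2
        p_as = np.exp(log_as - log_as.max())
        a[M - 1] = rng.choice(M, p=p_as / p_as.sum())
        # (e) rebuild trajectories and reweight
        H[:, :t] = H[a, :t]
        H[:, t] = h_new
        w, ll = normalize(log_meas(H[:, t], y[t]))
        loglik += ll

    j = rng.choice(M, p=w)                     # step 4: select the output trajectory
    return H[j].copy(), loglik
```

Within PG-AS, cpf_as_sv would be invoked once per sweep with the previous draw of h1:T as the reference, and the returned loglik feeds directly into the ML and DIC computations of Section 4.3.1.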

Extending the plain SV model within the PG-AS framework is often very straightforward. For

instance, assume that εt ∼ St (v), where St stands for the Student-t distribution with v > 2 degrees

of freedom. For this specification, at the ith iteration of the PG-AS sampler, we can follow Bollerslev

(1987) and use

p (yt | θ, ht, Yt−1) = [ Γ((v + 1)/2) / ( Γ(v/2) √((v − 2)π) ) ] (1/σt) [ 1 + (yt − µ)² / ((v − 2)σt²) ]^{−(v+1)/2} (4.3.1)

inside the CPF-AS algorithm and obtain h1:T^(i). Conditional on h1:T^(i) and YT, we can perform Independence Chain Metropolis-Hastings (M-H) to sample µ and v. For instance, in order to sample v^(i) ∼ p (v | µ^(i), h1:T^(i), YT), we generate a candidate, v∗, from q (v) ∼ TN]2,∞[ (vML, V), where TN]2,∞[ stands for the truncated Normal density on the domain ]2,∞[. vML is obtained by maximizing (4.3.1) with respect to v using the already obtained values of h1:T^(i) and µ^(i). We set V = c · var (vML), where c ∈ R+, and fine-tune V by adjusting c such that we get a decent M-H acceptance ratio around 50 to 60%. The M-H acceptance probability is given as

aMH (v∗, v^(i−1)) = min { 1, [ p (v∗ | µ^(i), h1:T^(i), YT) q (v^(i−1)) ] / [ p (v^(i−1) | µ^(i), h1:T^(i), YT) q (v∗) ] }. (4.3.2)

We draw u from U (0, 1) and accept v∗, i.e. v^(i) = v∗, if aMH (v∗, v^(i−1)) > u; else v^(i) = v^(i−1). Thereafter, we sample ϕ element-by-element, see Chib et al. (2002). For the SV models with heavy-tailed errors, we also sample (µh, φh, σh²)′ using PMMH. However, we do not find any differences relative to drawing these parameters conditional on h1:T using standard Gibbs sampling techniques.
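To illustrate the v-step just described, the sketch below uses Python with SciPy (assumed available; the thesis implementations are in Ox). Here resid_std, v_ml and V are our names for the standardized residuals (yt − µ) exp (−ht/2), the maximizer vML and the tuned proposal variance; the U (2, 128) prior used later in Section 4.4 cancels in the acceptance ratio.

```python
import numpy as np
from scipy import stats

def draw_v(v_prev, resid_std, v_ml, V, rng=np.random.default_rng(5)):
    """One Independence Chain M-H step (4.3.2) for the Student-t dof v,
    with proposal q(v) = N(v_ml, V) truncated to (2, 128)."""
    sd = np.sqrt(V)
    prop = stats.truncnorm((2 - v_ml) / sd, (128 - v_ml) / sd, loc=v_ml, scale=sd)

    def log_target(v):
        # Sum over t of log (4.3.1): rescale the standardized residuals so they
        # follow a standard t(v), add the Jacobian; constants in v drop out.
        z = resid_std * np.sqrt(v / (v - 2.0))
        return np.sum(stats.t.logpdf(z, df=v) + 0.5 * np.log(v / (v - 2.0)))

    v_star = prop.rvs(random_state=rng)
    log_acc = (log_target(v_star) + prop.logpdf(v_prev)) \
              - (log_target(v_prev) + prop.logpdf(v_star))
    return v_star if np.log(rng.random()) < log_acc else v_prev
```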

Incorporating leverage or stochastic volatility in mean effects is also very straightforward using PG-AS. First, we note that when the disturbances are conditionally Gaussian, we can write ζt as ζt = ρh εt + √(1 − ρh²) ξt, with ξt ∼ N (0, 1) and corr (εt, ξt) = 0. Thus, (4.2.2) can be reformulated as

ht = µh + φh (ht−1 − µh) + ρh σh (yt−1 − µ) exp (−ht−1/2) + σh √(1 − ρh²) ξt.

Hence, the model adopts the above Gaussian nonlinear state-space form where the parameter ρh measures the leverage effect. Alternatively, we can write the measurement error, εt, as εt = ρh ζt + √(1 − ρh²) ξt. This way yt | ζt ∼ N (ρh exp (ht/2) ζt, (1 − ρh²) exp (ht)), where ht follows (4.2.2), see Malik and Pitt (2011). Thereafter, we proceed to sample µ | h1:T, YT and then sample ϕ = (µh, φh, σh², ρh)′ all-at-once conditional on µ and YT using PMMH. We also ensure that |φh^(i)| < 1, σh^2(i) > 0 and |ρh^(i)| < 1 by resampling these parameters until the conditions are satisfied. Furthermore, we use the following random walk proposals for the parameters: Δµh^(i) = 0.3162 ε1^(i), Δφh^(i) = 0.01 ε2^(i),


Δσh^2(i) = 0.01 ε3^(i) and Δρh^(i) = 0.03 ε4^(i), where εk ∼ N (0, 1) for k = 1, ..., 4. This way, we obtain an M-H acceptance ratio around 35 to 40%, see also Flury and Shephard (2011). However, contrary to Flury and Shephard (2011), we sample ϕ all-at-once and not element-by-element. The latter approach is computationally very demanding as, after drawing h1:T and µ, we need to run a particle filter scheme eight times at each PG-AS iteration in order to sample ϕ.

In a similar fashion, incorporating SV in mean effects (SVM) within the PG-AS context is also very easy. For this specification, we note that p (yt | µ, λ, ϕ, ht, Yt−1) ∼ N (µ + λ exp (ht), exp (ht)). Thus, we only need to modify step (e) of CPF-AS: we set τt^(j) equal to the N (µ + λ exp (ht^(j)), exp (ht^(j))) density evaluated at yt, j = 1, ..., M − 1, instead of the N (µ, exp (ht^(j))) density in case of the plain SV model. We then sample ϕ = (µh, φh, σh²)′ element-by-element conditional on h1:T. The pair (µ, λ) is sampled in one block from its Gaussian conditional posterior. Finally, we can combine SVM effects with heavy tails by using (4.3.1), (4.3.2) and modifying step (e) of CPF-AS.

4.3.1. Model comparison using the output from PG-AS

One of the main outputs from CPF-AS is an estimate of the likelihood of YT with h1:T integrated out, p (YT | θ). This quantity is the product of the individual integrated likelihood contributions

p (YT | θ) = ∏_{t=1}^{T} p (yt | θ, Yt−1). (4.3.3)

For instance, (4.3.3) can be used to compute the marginal likelihood (ML) for a particular model. The marginal likelihood is defined as

p (YT) = ∫_{Θ} p (YT | θ) p (θ) dθ. (4.3.4)

Equation (4.3.4) is a measure of the success the model has in accounting for the data after the

parameter uncertainty has been integrated out over the prior, p (θ). Gelfand and Dey (1994) propose

a very compelling and general method to calculate ML. It is efficient and utilizes the same routines

when calculating ML for different models. The Gelfand-Dey (G-D) estimate of ML is given as

(1/N) ∑_{i=1}^{N} g (θ^(i)) / [ p̂ (YT | θ^(i)) p (θ^(i)) ] → p (YT)^{−1} as N → ∞, (4.3.5)

where an estimate of p (YT | θ), p̂ (YT | θ) = ∏_{t=1}^{T} M^{−1} ∑_{j=1}^{M} τt^(j), is directly available from CPF-AS and E[ p̂ (YT | θ) ] = p (YT | θ), see Flury and Shephard (2011). Gelfand and Dey (1994) show that if g (θ) is thin-tailed relative to p (YT | θ) p (θ), then (4.3.5) is bounded and the estimator is consistent. Following Geweke (2005), the truncated Normal distribution, TN (θ∗, Σ∗), is used for g (θ).

θ∗ and Σ∗ are the posterior sample moments calculated as

θ∗ = (1/N) ∑_{i=1}^{N} θ^(i) and Σ∗ = (1/N) ∑_{i=1}^{N} (θ^(i) − θ∗)(θ^(i) − θ∗)′


whenever θ^(i) is in the domain of the truncated Normal. This domain, Θ, is defined as

Θ = { θ : (θ − θ∗)′ (Σ∗)^{−1} (θ − θ∗) ≤ χ²α (z) },

where z is the dimension of the parameter vector and χ²α (z) is the αth percentile of the Chi-squared distribution with z degrees of freedom. In practice, 0.5, 0.75, 0.95 and 0.99 are popular selections for α. Once the marginal likelihood for different specifications has been calculated, we can compare them using Bayes factors, BF. The relative evidence for model MA versus MB is BFAB = p (YT | MA) / p (YT | MB). Kass and Raftery (1995) recommend considering twice the logarithm of the Bayes factor for model comparison and suggest a rule-of-thumb of support for MA based on 2 logBFAB: 0 to 2 not worth more than a bare mention, 2 to 6 positive, 6 to 10 strong, and greater than 10 very strong.
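The G-D computation from PG-AS output can be sketched as follows (Python with NumPy/SciPy assumed; function and variable names are ours). It evaluates g (θ) as the normal density divided by α on the χ²α (z) ellipsoid and combines the terms of (4.3.5) in log space.

```python
import numpy as np
from scipy import stats

def gelfand_dey_logml(theta_draws, loglik, logprior, alpha=0.99):
    """Gelfand-Dey estimate of log p(Y_T). theta_draws: (N, z) posterior draws;
    loglik and logprior: length-N arrays of log p-hat(Y_T | theta^(i)) and
    log p(theta^(i)) collected during the PG-AS run."""
    N, z = theta_draws.shape
    m = theta_draws.mean(axis=0)
    S = np.cov(theta_draws, rowvar=False, bias=True)         # 1/N convention
    dev = theta_draws - m
    maha = np.einsum('ij,jk,ik->i', dev, np.linalg.inv(S), dev)
    inside = maha <= stats.chi2.ppf(alpha, df=z)
    # log g(theta): TN(theta*, Sigma*) equals the normal density / alpha inside
    logg = stats.multivariate_normal.logpdf(theta_draws, m, S) - np.log(alpha)
    terms = np.where(inside, logg - loglik - logprior, -np.inf)
    mx = terms[np.isfinite(terms)].max()
    log_inv_ml = mx + np.log(np.sum(np.exp(terms - mx))) - np.log(N)
    return -log_inv_ml                                       # log of (4.3.4)
```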

We can also use p (YT | θ) and compute the deviance information criterion (DIC) of Spiegelhalter et

al. (2002). DIC is a compelling alternative to AIC or BIC, and it can be applied to nested or non-nested

models. Calculation of DIC in a PG-AS scheme is trivial. Contrary to AIC or BIC, DIC does not

require maximization over the parameter space. DIC is a combination of p (YT | θ) and a penalty term,

pD. This term describes the complexity of the model and serves as a penalization term that corrects

deviance's propensity toward models with more parameters. More precisely, pD = D̄(θ) − D(θ̄), where D̄(θ) is approximated by N^{−1} ∑_{i=1}^{N} −2 log p̂ (YT | θ^(i)) and D(θ̄) = −2 log p̂ (YT | θ̄). θ̄ is estimated from the PG-AS output using the mean or mode of the posterior draws. The DIC is defined as DIC = D(θ̄) + 2pD. It is worth noting that the best model is the one with the smaller DIC. Very roughly, for differences of more than 10, we might definitely rule out the model with the higher DIC. Furthermore, as pointed out by Spiegelhalter et al. (2002), we must be cautious about using ML

Furthermore, as pointed out by Spiegelhalter et al. (2002), we must be cautions against using ML

as a basis against which to assess DIC. ML addresses how well the prior has predicted the observed

data, whereas DIC addresses how well the posterior might predict future data generated by the same

parameters that give rise to the observed data.

4.4. Dow Jones Industrial Average

We summarize our stochastic volatility models in Table 4.1. Besides the models discussed in Section

4.3, we also consider versions of the moving average model with SV errors, i.e. yt = µ + εt + ψεt−1, see Chan (2013) and Chan and Hsiao (2013). Furthermore, for comparison, we also estimate a standard

GARCH(1,1) model

yt = µ + σt εt, εt ∼ N (0, 1), σt² = ω + a ε²t−1 + b σ²t−1, (4.4.1)

where ω > 0, a > 0, b > 0 and a + b < 1. For this model, we also choose to sample ϕ = (ω, a, b)′

all-at-once using the Independence Chain Metropolis-Hastings algorithm. Furthermore, we also ensure

that a(i) + b(i) < 1 by resampling a(i) > 0 and b(i) > 0, until a(i) + b(i) < 1. We assume the same

priors as in Kim et al. (1998) for µh, φh and σ2h. Furthermore, we assume that µ, λ ∼ N (0, 10),

ρh, ψ ∼ TN]−1,1[ (0, 10) and v ∼ U (2, 128), where U stands for the Uniform distribution with lower

(upper) endpoint of 2(128), see Chib et al. (2002). Finally, the priors on θ for the GARCH(1,1) model

are independent Normals with mean 0, variance 10, truncated (except for µ) to satisfy the restrictions.


Table 4.1.: Model specifications

Model specification description

1 GARCH(1,1) plain GARCH(1,1) model, (4.4.1).

2 SV plain stochastic volatility model, (4.2.1)-(4.2.2).

3 SVL stochastic volatility model with leverage effect.

4 SVM stochastic volatility in mean model.

5 SV-MA(1) SV model where the measurement error, εt, follows a MA(1) process.

6 SVt stochastic volatility model with Student-t distributed errors, i.e. εt ∼ St (v).

7 SVMt stochastic volatility in mean model with Student-t distributed errors.

8 SVt-MA(1) SVt model where the measurement error, εt ∼ St (v), follows a MA(1) process.

This table presents the SV model specifications. The second column lists the model label and the third column briefly

explains the model characteristics.

The top right panel of Figure 4.1 displays the daily DJIA index for the period 01/02/2007-12/31/2013,

for a total of T = 1740 observations, followed by the daily returns, and the posterior estimates

of σt = exp (ht/2), t = 1, ..., T for SVt-MA(1). From these figures strong differences in return and

volatility are immediately apparent. As expected, the conditional volatility drastically increases during the financial crisis of 2008. For each model, we set M = 100 (throughout this paper) and run each model based on 20 parallel chains, each of which is of length 20000 after a burn-in period of 1000, for a total of 200000 posterior draws.⁷ We also experiment with different values of M to find out its effects

on estimation results, see the Appendix.

Figure 4.1.: PG-AS sampler for SVt-MA(1), DJIA daily returns, 2007 to 2013

Graph (a): daily DJIA index, (b): daily DJIA returns, (c): posterior estimates of σt, t = 1, ..., T, for SVt-MA(1), (d): box plot of the inefficiency factors of h1:T for SVt-MA(1).

⁷In order to calculate the numerical standard errors of ML and DIC, we use “brute force”, which is re-estimating the models 20 times and estimating the numerical standard errors of ML and DIC by their sample standard deviations, see Berg et al. (2004) and Chan and Grant (2014). Overall, we find numerical standard errors of around 0.8.


Parameter estimates, ML and DIC values for the models in Table 4.1 are reported in Table 4.2. Overall,

we find that SVt-MA(1) performs best in terms of ML and DIC. In general, fat-tailed errors, volatility

in mean effects, moving average errors and the leverage effect all seem to be useful additions to the

plain SV model.

Table 4.2.: Posterior means and standard deviations (in parentheses), DJIA daily returns

Parameter GARCH(1,1) SV SVL SVM SV-MA(1) SVt SVMt SVt-MA(1)

µ 0.067 0.089 0.115 0.125 0.089 0.088 0.126 0.091

(0.019) (0.019) (0.021) (0.023) (0.018) (0.017) (0.020) (0.016)

λ -0.061 -0.058

(0.023) (0.019)

ψ -0.070 -0.069

(0.025) (0.025)

µh -0.123 -0.364 -0.119 -0.163 -0.126 -0.127 -0.126

(0.293) (0.343) (0.295) (0.323) (0.371) (0.367) (0.379)

φh 0.979 0.979 0.979 0.983 0.986 0.986 0.987

(0.006) (0.006) (0.006) (0.006) (0.005) (0.005) (0.005)

σ2h 0.052 0.051 0.051 0.043 0.034 0.033 0.032

(0.012) (0.013) (0.011) (0.010) (0.008) (0.009) (0.008)

ρh -0.367

(0.074)

v 9.362 9.456 9.264

(2.050) (2.057) (2.014)

ω 0.024

(0.002)

a 0.105

(0.020)

b 0.875

(0.020)

log(ML), α = 0.50 -2525.175 -2489.643 -2477.678 -2488.327 -2477.303 -2473.272 -2471.481 -2468.169

log(ML), α = 0.75 -2524.769 -2489.237 -2477.273 -2487.922 -2476.898 -2472.867 -2471.076 -2467.764

log(ML), α = 0.95 -2524.533 -2489.001 -2477.037 -2487.685 -2476.662 -2472.630 -2470.839 -2467.527

log(ML), α = 0.99 -2524.492 -2488.960 -2476.995 -2487.644 -2476.620 -2472.589 -2470.798 -2467.486

DIC 5035.615 4958.287 4939.868 4950.292 4939.034 4934.034 4929.033 4926.707

Rank 8 7 5 6 4 3 2 1

This table reports posterior means and standard deviations for various SV models using DJIA daily returns. log(ML): logarithm of the marginal likelihood for the corresponding value of α. DIC: deviance information criterion. Rank: rank of the model based on ML and DIC. Total number of observations, T = 1740.

Furthermore, SVt, SVMt and SVt-MA(1) outperform their Gaussian counterparts both in terms of

ML and DIC. Finally, the plain SV model outperforms the GARCH(1,1) model. For instance, the

logBF (DIC) of SV versus GARCH(1,1) is 35.53 (77.32).

We report posterior means and standard deviations of the model parameters in Table 4.2. It can

be seen that the estimated means and standard deviations of the parameters appear quite reasonable

and comparable with previous estimates reported in the literature. Typically, the volatility process

is estimated to be highly persistent. For SVL, the posterior mean of ρh is −0.36 with a posterior

standard deviation of 0.07. This suggests that the leverage effect is an important feature for the

DJIA returns. Furthermore, the posterior estimates of λ and ψ support the in-mean and serial


dependence extensions, respectively. Note that for ψ = 0, the SV-MA(1) model reduces to the plain SV model.

The posterior mean of ψ is estimated at −0.07 with a posterior standard deviation of 0.02, i.e. the

posterior distribution of ψ has little mass around zero. Furthermore, SV-MA(1) outperforms SV. For

models with Student-t distributed errors, we estimate the posterior mean of v at around 9, similar to

the values in Chib et al. (2002) and Berg et al. (2004), respectively.

We report the inefficiency factors of h1:T for SVt-MA(1) in panel (d) of Figure 4.1. The inefficiency

factor, RB, is defined as RB = 1 + 2 \sum_{l=1}^{B} ρ(l), where ρ(l) is the sample autocorrelation at lag l,

and B is the bandwidth, see Kim et al. (1998) for a further background on this measure. In these

calculations, we choose a bandwidth, B, of 100. Furthermore, note that ht is of length T , so we have

a total of T inefficiency factors. Therefore, we use box plots to report this information. The middle

line of the box denotes the median, while the lower and upper lines represent the 25% and 75%-tiles,

respectively. For instance, the box plot indicates that about 75% of the log-volatilities have inefficiency

factors of less than 5, and the maximum is close to 6.8. Overall, given M = 100, we see that PG-AS

is very capable of producing draws of h1:T that are not highly autocorrelated, see the Appendix.
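To make the diagnostic concrete, the inefficiency factor of a single chain of posterior draws can be computed as in the following minimal Python sketch (the function name and the plain truncated-sum estimator, without tapering, are our own choices):

```python
import numpy as np

def inefficiency_factor(draws, bandwidth=100):
    """R_B = 1 + 2 * sum_{l=1}^{B} rho(l), where rho(l) is the sample
    autocorrelation of the MCMC draws at lag l and B is the bandwidth."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = x @ x
    rho = [(x[:n - l] @ x[l:]) / denom for l in range(1, bandwidth + 1)]
    return 1.0 + 2.0 * float(np.sum(rho))
```

Applying this function to each of the T chains of log-volatility draws yields the T inefficiency factors summarized in the box plot.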

4.5. Structural Breaks and PG-AS

In this section, we combine PG-AS with the change-point (or structural break) specification of Chib

(1998). We believe that this combination is the most important contribution that we provide in this

paper. In fact, only relatively few papers have addressed combining models with SV effects with what

in the literature is considered as a “state-of-the-art” structural break model.

We start by using a change-point autoregressive model with stochastic volatility in mean effects,

CP(m)-AR(p)-SVM. We follow Liu and Maheu (2008) and estimate our model conditional on 0, 1, ...,m

breaks. For each of these, we calculate ML, DIC and use them to determine the optimal number of

change points. Specifically, we can compare ML using Bayes factors, and use differences in the DIC

between different models to determine the number of structural breaks. Furthermore, in our analyses,

we do not get conflicting results with regards to change-point identification using ML and DIC.

Assume that there are m − 1, m ∈ {1, 2, ...}, change points at unknown times, τ1, τ2, ..., τm−1.

Separated by those change points, there are m different regimes. The m-state change-point linear

regression model with simultaneous breaks in the SVM coefficients is given as

yt = Xt−1βst + λstγst exp (ht) +√γst exp (ht/2) εt, εt ∼ N (0, 1) (4.5.1)

ht = φh,stht−1 + σh,stζt, ζt ∼ N (0, 1) , (4.5.2)

where γst = exp(µh,st), st = 1, ..., m, s1:T = (s1, ..., sT)′, and st = k indicates that yt is from regime k. As before, YT = (y1, ..., yT)′ is T × 1, XT is a T × n matrix of regressors with row Xt−1, which can also include lags of yt, and βst is n × 1. The one-step ahead transition matrix for st is

P = \begin{pmatrix}
p_{11} & p_{12} & 0 & \cdots & 0 \\
0 & p_{22} & p_{23} & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & p_{m-1,m-1} & p_{m-1,m} \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}, (4.5.3)


where plk = Pr (st = k | st−1 = l) with k = l or k = l+ 1. plk is the probability of moving from regime

l at time t − 1 to regime k at time t. P ensures that, given st = k at time t, in the next period t + 1 the chain either remains in the same state, st+1 = k, or jumps to the next state, st+1 = k + 1. Once the last regime is reached, we stay

there forever, that is pm,m = 1. This structure enforces the following ordering

θt = \begin{cases}
θ_1 & \text{if } t < τ_1 \\
θ_2 & \text{if } τ_1 ≤ t < τ_2 \\
\vdots & \\
θ_{m-1} & \text{if } τ_{m-2} ≤ t < τ_{m-1} \\
θ_m & \text{if } τ_{m-1} ≤ t
\end{cases}

on the change points. Let θk = (β′k, λk, γk, φh,k, σ²h,k)′, k = 1, ..., m, denote the parameters in regime k.

Modeling (4.5.1)-(4.5.2) using (4.5.3) is straightforward. Specifically, we proceed by cycling through

the following steps8

1. s1:T | {βk}^m_{k=1}, {λk}^m_{k=1}, {γk}^m_{k=1}, {φh,k}^m_{k=1}, {σ²h,k}^m_{k=1}, P, h1:T, XT, YT.

2. h1:T | {βk}^m_{k=1}, {λk}^m_{k=1}, {γk}^m_{k=1}, {φh,k}^m_{k=1}, {σ²h,k}^m_{k=1}, s1:T, XT, YT.

3. {βk, λk}^m_{k=1} | {γk}^m_{k=1}, s1:T, h1:T, XT, YT.

4. {γk}^m_{k=1} | {βk}^m_{k=1}, {λk}^m_{k=1}, s1:T, h1:T, XT, YT.

5. {φh,k}^m_{k=1} | {σ²h,k}^m_{k=1}, s1:T, h1:T.

6. {σ²h,k}^m_{k=1} | {φh,k}^m_{k=1}, s1:T, h1:T.

7. P | s1:T.

In step 1, we use the algorithm of Chib (1998) to draw s1:T, see Liu and Maheu (2008). h1:T is sampled using CPF-AS from Section 4.3 conditional on the newly drawn s1:T and on θ1, ..., θm from the previous Gibbs iteration. The parameters within each regime, {βk, λk}^m_{k=1}, {φh,k}^m_{k=1} and {σ²h,k}^m_{k=1}, are sampled using standard Gibbs sampling techniques conditional on the newly drawn s1:T and h1:T. However, for (4.5.1)-(4.5.2), the conditional posterior of γk does not have a closed form solution. Therefore, we sample γk, k = 1, ..., m, using the Independence Chain Metropolis-Hastings algorithm. Specifically, let hk = {ht : st = k}, Xk = {Xt−1 : st = k} and Yk = {yt : st = k} denote the observations in regime k. At iteration i, we sample γ*_k ∼ TN_{]0,∞[}(γ_{k,ML}, Vk), where γ_{k,ML} is obtained by maximizing the likelihood of (4.5.1) conditional on β^{(i)}_k, λ^{(i)}_k, h^{(i)}_k, X^{(i)}_k and Y^{(i)}_k. As before, we set Vk = ck · var(γ_{k,ML}). The

M-H acceptance probability of γ*_k is given as

a_{MH}(γ*_k, γ^{(i-1)}_k) = \min\left\{1, \frac{p(γ*_k \mid β^{(i)}_k, λ^{(i)}_k, h^{(i)}_k, X^{(i)}_k, Y^{(i)}_k)\, q(γ^{(i-1)}_k)}{p(γ^{(i-1)}_k \mid β^{(i)}_k, λ^{(i)}_k, h^{(i)}_k, X^{(i)}_k, Y^{(i)}_k)\, q(γ*_k)}\right\}.

8 It is also possible to generate h1:T and s1:T simultaneously in one step. However, this approach requires major modifications inside CPF-AS. We choose to sample h1:T and s1:T sequentially because it is computationally easier and more intuitive. Indeed, using the latter approach, we only need to incorporate a procedure to draw s1:T. Thus, we do not need to modify anything inside CPF-AS. In fact, this underlines one of the main points of this paper, namely, that we can estimate complicated models without the need for major modifications inside CPF-AS.


The conditional posterior of pkk, k = 1, ...,m− 1, is Beta (a0 + nkk, b0 + 1), where nkk is the number

of one-step transitions from state k to state k in a given sequence of s1:T .

We let (β′k, λk)′ ∼ N(0_{(n+1)}, I_{(n+1)}), γk ∼ IG(4/2, 0.2/2), (φh,k + 1)/2 ∼ Beta(20, 1.5) and σ²h,k ∼ IG(4/2, 0.2/2), where, as before, IG(·/2, ·/2) stands for the Inverse-Gamma density, see Kim and Nelson (1999). Furthermore, we let pkk ∼ Beta(a0 = 20, b0 = 0.1) for k = 1, ..., m − 1. In this setting, most priors are very uninformative, while the prior for pkk favors infrequent structural breaks. Finally,

notice that in order to obtain p (YT | θ, P ) in step 1, we use that

log p(YT | θ, P) = \sum_{t=1}^{T} log p(yt | θ, P, Yt−1), where

p(yt | θ, P, Yt−1) = \sum_{k=1}^{m} p(yt | θ, Yt−1, st = k) p(st = k | θ, P, Yt−1).

The first term, p (yt | θ, Yt−1, st = k), is obtained from CPF-AS. The last term is computed from

p(st = k | θ, P, Yt−1) = \sum_{l=k-1}^{k} p(st−1 = l | θ, P, Yt−1) p_{lk}, k = 1, ..., m

p(st = k | θ, P, Yt) = \frac{p(st = k | θ, P, Yt−1)\, p(yt | θ, Yt−1, st = k)}{\sum_{l=1}^{m} p(st = l | θ, P, Yt−1)\, p(yt | θ, Yt−1, st = l)}, k = 1, ..., m. (4.5.4)

The last equation is obtained from Bayes’ rule. Note that in (4.5.4), the summation is only from k−1

to k, due to the restricted nature of the transition matrix.
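For concreteness, a minimal Python sketch of this prediction/update recursion is given below (the function and variable names are our own; the conditional densities p(yt | θ, Yt−1, st = k) are assumed to be supplied, e.g. from CPF-AS):

```python
import numpy as np

def restricted_hamilton_filter(dens, P):
    """Prediction/update recursion (4.5.4) for the restricted change-point
    transition matrix (4.5.3). dens[t, k] holds p(y_t | theta, Y_{t-1}, s_t = k+1)
    and P is the m x m transition matrix. Returns the filtered probabilities
    p(s_t = k | theta, P, Y_t) and the log-likelihood log p(Y_T | theta, P)."""
    T, m = dens.shape
    filt = np.zeros((T, m))
    pred = np.zeros(m)
    pred[0] = 1.0                      # p(s_1 = 1 | Y_0) = 1
    loglik = 0.0
    for t in range(T):
        joint = pred * dens[t]         # update step (numerator of (4.5.4))
        lik_t = joint.sum()            # p(y_t | theta, P, Y_{t-1})
        loglik += np.log(lik_t)
        filt[t] = joint / lik_t
        pred = filt[t] @ P             # prediction step for period t + 1
    return filt, loglik
```

Because all entries of P other than p_{kk} and p_{k,k+1} are zero, the matrix product filt[t] @ P reproduces the restricted summation from k − 1 to k.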

4.5.1. Simulation example

As a simple illustration, consider data generated according to the following model

yt = µst + λstγst exp (ht) +√γst exp (ht/2) εt, εt ∼ N (0, 1) (4.5.5)

ht = φh,stht−1 + σh,stζt, ζt ∼ N (0, 1) (4.5.6)

with st ∈ {1, 2}, µ1 = 1, µ2 = 0.1, λ1 = 0.2, λ2 = 1.2, γ1 = 1.4, γ2 = 0.2, φh,1 = 0.92, φh,2 = 0.87,

σ²h,1 = 0.04, σ²h,2 = 0.02 and t = 1, ..., T = 500. The true date of the structural break is t = 230.
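For reference, data from this DGP can be generated as in the minimal Python sketch below (the function name, the zero initialization of ht and the 0-indexed placement of the break are our own choices):

```python
import numpy as np

def simulate_cp_svm(T=500, tau=230, seed=0):
    """Simulate from the two-regime SVM DGP (4.5.5)-(4.5.6) with the
    parameter values used in the text; the break occurs at observation tau."""
    rng = np.random.default_rng(seed)
    mu     = (1.0, 0.1)
    lam    = (0.2, 1.2)
    gamma  = (1.4, 0.2)
    phi_h  = (0.92, 0.87)
    sig2_h = (0.04, 0.02)
    y, h = np.zeros(T), 0.0
    for t in range(T):
        k = 0 if t < tau else 1        # regime indicator s_t
        h = phi_h[k] * h + np.sqrt(sig2_h[k]) * rng.standard_normal()
        y[t] = (mu[k] + lam[k] * gamma[k] * np.exp(h)
                + np.sqrt(gamma[k] * np.exp(h)) * rng.standard_normal())
    return y
```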

We estimate (4.5.5)-(4.5.6) conditional on m = 0, 1, 2 breaks. Both ML and DIC indicate that the

model with one break performs best. The top panel of Figure 4.2 compares the predictive mean of our

model (break) to a recursive OLS specification (no-break) along with yt and the estimated change-

point date. The two predictive means are essentially similar before the break at t = 230. However, after the break, we see a quick reduction in the predictive mean from the break model, while the predictive mean from the no-break model remains high for a long time. Using the posterior mode of {s^{(i)}_{1:T}}^{N}_{i=1},

the estimated change-point date is t = 230. Clearly, our model is able to detect the correct date of

the break.

The marginal posteriors of γk, k = 1, 2 are bell shaped and centered around their means, 1.14 for

γ1 and 0.18 for γ2, respectively. In panels (c) and (d) of Figure 4.2, we plot the marginal posteriors

of λk, k = 1, 2. Similar to γk, the marginal posteriors of λk are bell shaped and centered around their

means, 0.48 for λ1 and 1.35 for λ2, respectively.


Figure 4.2.: Simulation results

Graph (a): data, predictive means and the estimated change-point date, (b): true σ²t process, posterior mean (solid line) and 90% credibility intervals (dashed lines) of σ²t, (c) and (d): marginal posterior distributions of λk, k = 1, 2.

4.5.2. Real output

Recent literature documents a structural break in the volatility of US GDP growth rate, see Kim and

Nelson (1999), and Gordon and Maheu (2008). We follow Gordon and Maheu (2008) and consider

estimates from structural break AR(2) models for the real US GDP growth rate.

Let yt = 100 [log (qt/qt−1)− log (pt/pt−1)], where qt is quarterly seasonally adjusted US GDP and

pt is the US GDP price index. Our data ranges from 1947q4 to 2013q4, for T = 265 observations. In

the following, we compare the performance of (4.5.1)-(4.5.2) with 2 lags of yt, CP(m)-AR(2)-SVM to

CP(m)-AR(2)-SV and a simple change-point AR(2) model, henceforth CP(m)-AR(2)

yt = β1,st + β2,styt−1 + β3,styt−2 + σstεt, εt ∼ N (0, 1) . (4.5.7)

Equation (4.5.7) is estimated using Gibbs sampling, see Liu and Maheu (2008). We estimate (4.5.1)-

(4.5.2), (4.5.7) conditional on m = 0, 1, 2 change points. Thereafter, we determine the optimal number

of change points using ML and DIC. We also compute the marginal likelihood for the change-point

SV(M) models using the method of Sims et al. (2008). As pointed out in Sims et al. (2008), the G-D

method may not work for models with time-varying parameters as the posterior density tends to be

non-Gaussian. However, we do not find any significant changes compared to G-D. Thus, we choose

to retain these values. We also conduct a Monte Carlo analysis (not reported) generating data from

(4.5.1)-(4.5.2) with 0, ..., 3 change points and comparing ML across the different specifications.


Table 4.3.: Posterior means and standard deviations (in parentheses), US quarterly GDP growth rate

Parameter AR(2) AR(2)-SV AR(2)-SVM CP(1)-AR(2) CP(1)-AR(2)-SV CP(1)-AR(2)-SVM

β1,1 0.461 0.436 0.436 0.560 0.574 0.772

(0.078) (0.071) (0.088) (0.123) (0.128) (0.313)

β2,1 0.337 0.299 0.299 0.323 0.325 0.321

(0.061) (0.065) (0.065) (0.084) (0.086) (0.084)

β3,1 0.093 0.159 0.156 0.052 0.056 0.049

(0.061) (0.063) (0.064) (0.083) (0.086) (0.083)

β1,2 0.275 0.365 0.740

(0.080) (0.090) (0.193)

β2,2 0.340 0.258 0.145

(0.090) (0.098) (0.111)

β3,2 0.255 0.258 0.217

(0.087) (0.088) (0.087)

λ1 0.007 -0.181

(0.105) (0.360)

λ2 -1.302

(0.645)

σ21 0.804 1.229

(0.070) (0.147)

σ22 0.282

(0.038)

γ1 0.686 0.682 1.081 0.976

(0.384) (0.350) (0.232) (0.215)

γ2 0.235 0.187

(0.059) (0.046)

φh,1 0.950 0.949 0.829 0.820

(0.026) (0.027) (0.101) (0.106)

φh,2 0.831 0.842

(0.091) (0.083)

σ2h,1 0.078 0.078 0.058 0.059

(0.034) (0.035) (0.030) (0.029)

σ2h,2 0.086 0.089

(0.051) (0.046)

log(ML), α = 0.50 -362.802 -305.252 -303.962 -312.651 -287.933 -279.568

log(ML), α = 0.75 -362.396 -304.847 -303.556 -312.246 -287.527 -279.162

log(ML), α = 0.95 -362.160 -304.610 -303.320 -312.009 -287.291 -279.926

log(ML), α = 0.99 -362.119 -304.569 -303.278 -311.968 -287.249 -279.885

DIC 701.187 657.393 656.923 666.927 643.983 639.892

Rank 6 4 3 5 2 1

This table reports posterior means and standard deviations for various AR(2) and CP(m)-AR(2) models. The parameters associated with each regime are labeled with subscript 1, ..., m. log(ML): log-marginal likelihood for the corresponding value of α. DIC: deviance information criterion. Rank: rank of the model based on ML and DIC. Total number of observations, T = 265.


Results indicate that the G-D method correctly identifies the true number of structural breaks9. We

find that all change-point models produce similar results suggesting that one structural break has

occurred. Compared to AR(2)-SVM, the logBF in favor of CP(1)-AR(2)-SVM is 23, see Table 4.3.

CP(1)-AR(2)-SVM also dominates its constant parameter counterpart in terms of DIC. Accordingly,

change-point SV specifications perform better than (4.5.7). At the same time, CP(1)-AR(2)-SVM

performs better than CP(1)-AR(2)-SV.

The posterior density of the change point for CP(1)-AR(2)-SVM is plotted in Figure 4.3. Using the

mode of {s^{(i)}_{1:T}}^{N}_{i=1}, CP(1)-AR(2) and CP(1)-AR(2)-SV indicate that the break date is 1983q3, identical to the break date of Gordon and Maheu (2008) and close to the break date of Kim and Nelson (1999). Specifically, Kim and Nelson (1999) find evidence of a break in 1984q1 using data from 1953q2 to 1997q1. On the other hand, the mode of {s^{(i)}_{1:T}}^{N}_{i=1} for CP(1)-AR(2)-SVM indicates that the break date

is 1984q2. Evidently, structural break models indicate a significant one-time drop in the volatility

of yt. For CP(1)-AR(2), the first regime implies an unconditional variance for the US GDP growth

rate of 1.22, while for the second regime it is 0.28. For CP(1)-AR(2)-SVM, we find that γ2 = 0.18 is

estimated well below γ1 = 0.97, confirming a significant fall in the average volatility of real

US GDP growth rate since the 1980s. For all change-point models, we also find that the unconditional

mean of yt falls after the break. Furthermore, for CP(1)-AR(2)-SVM, we estimate λ2 at −1.30 with a

posterior standard deviation of 0.64, whereas we estimate λ1 at −0.18 with a relatively larger posterior

standard deviation, see panel (d) of Figure 4.3.

Figure 4.3.: PG-AS sampler for US quarterly real GDP growth rate from 1947q4 to 2013q4

Graph (a): US quarterly real GDP growth rate, (b): posterior estimates of σ²t, t = 1, ..., T, for CP(1)-AR(2)-SVM, (c): change-point density for CP(1)-AR(2)-SVM, (d): marginal posterior distribution of λk, k = 1, 2.

9 We estimate the marginal likelihood of (4.5.7) for m = 0, 1, 2 structural breaks using the method of Chib (1995), see Liu and Maheu (2008).


Accordingly, the results indicate that volatility feedback has a markedly more negative impact on the

GDP growth rate after the structural break, confirming the effects of the Great Moderation10. The

posterior density of the change point and the estimates of σ2t = γst exp (ht), t = 1, ..., T for CP(1)-

AR(2)-SVM are also plotted in Figure 4.3. Evidently, besides a significant reduction in the volatility

of yt since the 1980s, results also point to a gradual reduction in the volatility of yt during the 1960s

followed by a subsequent increase up until the break point at 1984q211.

Finally, Table 4.4 displays out-of-sample results for one-period ahead direct forecasts (see Marcellino

et al. (2005)) for the models given in Table 4.3. In general, we carry out a forecasting exercise for

a specific out-of-sample period. We first estimate the models using the initial sample and forecast.

Then, we add one data point, update and forecast again, until the end of the out-of-sample data. This

strategy works for AR(2), AR(2)-SV and AR(2)-SVM as we do not need to specify the number of

structural breaks over the out-of-sample data. In the context of forecasting with the break models, we

want the optimal change-point number to vary over time as the number of regimes can increase as time

goes by. Thus, we follow Bauwens et al. (2011) and perform the following: for the first out-of-sample

observation at time t, we calculate ML and DIC for 1, ...,Kt−1 change points using Yt−1. Thereafter,

we choose the optimal change-point number, K∗t−1, and calculate the predictive mean, E [yt | Yt−1],

using the parameters associated with specification K∗_{t−1}12. Thereafter, we expand the out-of-sample period by one observation, calculate ML and DIC for 1, ..., K∗_{t−1} + 1 change points, choose the optimal K∗_t

and repeat the above forecasting procedure to obtain E [yt+1 | Yt]. Furthermore, in addition to MAE

and RMSE, forecasts are also compared using the linear exponential (LINEX) loss function of Zellner

(1986). This loss function is defined as L(yt, ŷt) = b[exp(a(ŷt − yt)) − a(ŷt − yt) − 1], where ŷt is the forecast. L(yt, ŷt) penalizes overprediction (underprediction) more heavily for a > 0 (a < 0).
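A minimal Python sketch of the average LINEX criterion reported below is (the function name is ours):

```python
import numpy as np

def linex_loss(y, y_hat, a=1.0, b=1.0):
    """Average LINEX loss of Zellner (1986), b[exp(a e) - a e - 1] with
    e = y_hat - y: a > 0 penalizes overprediction more heavily,
    a < 0 penalizes underprediction."""
    e = np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(b * (np.exp(a * e) - a * e - 1.0)))
```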

Overall, structural break specifications offer improvements in terms of point forecasts. However,

compared to their respective constant parameter counterparts the improvements that they offer are

quite modest. For instance, we find that CP(m)-AR(2)-SVM outperforms AR(2)-SVM by only about

5% (9%) in terms of MAE (RMSE). When LINEX is used, the break models' ability to capture variations in higher moments also tends to provide gains in terms of point forecasts.

Table 4.4.: Out-of-sample forecasts

Model MAE RMSE LINEX, a = 1, b = 1 LINEX, a = −1, b = 1

AR(2) 0.573 0.611 0.401 0.407

AR(2)-SV 0.573 0.608 0.395 0.411

AR(2)-SVM 0.572 0.607 0.396 0.408

CP(m)-AR(2) 0.572 0.601 0.406 0.384

CP(m)-AR(2)-SV 0.570 0.604 0.418 0.377

CP(m)-AR(2)-SVM 0.543 0.553 0.358 0.364

This table reports mean absolute error (MAE) and root mean squared error (RMSE) for forecasts based on the predictive mean for one-period ahead. Furthermore, the average LINEX loss function is reported for a = 1, a = −1 and b = 1. The out-of-sample period is from 1959q3 till the end of the sample.

10 From an economic point of view, these results can possibly support the hypothesis that institutional changes have contributed to the reduction in the volatility of business cycle fluctuations. However, any major economic interpretation of these results is beyond the scope of the paper and is therefore left for future research.

11 However, although γ2 < γ1 for CP(1)-AR(2)-SV and CP(1)-AR(2)-SVM, we find that the unconditional volatility of volatility of yt, σ_{h,st}/\sqrt{1 − φ²_{h,st}}, increases in the second regime, which is very counterintuitive. We cannot find a plausible explanation for this phenomenon. Furthermore, we also arrive at the same conclusion by manually splitting the sample at 1984q2 and estimating AR(2)-SV and AR(2)-SVM models for each subsample.

12We do not get any conflicting results with regards to recursive change-point identification using ML and DIC.


4.5.3. Structural break ARFIMA-SV model

In this section, we propose a structural break ARFIMA model with SV effects. Our model allows for

structural breaks in µ, d, autoregressive (AR), moving average (MA) coefficients, γ, φh and σ2h. The

change-point ARFIMA-SV model is as follows

yt − µ_{st} = \frac{Ψ(L)}{Φ(L)} (1 − L)^{−d_{st}} \sqrt{γ_{st}} \exp(ht/2) εt, εt ∼ N(0, 1) (4.5.8)

ht = φ_{h,st} h_{t−1} + σ_{h,st} ζt, ζt ∼ N(0, 1), (4.5.9)

where st = 1, ..., m, Φ(L) = (1 − φ1,st L − ... − φp,st L^p) and Ψ(L) = (1 + ψ1,st L + ... + ψq,st L^q) are AR and MA polynomials in the lag operator, L, where L^p yt = yt−p for p = 0, 1, ..., with integer orders p ≥ 0 and q ≥ 0. The fractional difference operator, (1 − L)^{−d_{st}}, with d_{st} ∈ R, is given by

(1 − L)^{−d_{st}} = \sum_{j=0}^{∞} \frac{Γ(j + d_{st})}{Γ(j + 1) Γ(d_{st})} L^j,

where Γ(·) is the Gamma function. Equation (4.5.8) is a generalization of the ARMA model to non-integer values of d_{st}. Specifically, if d_{st} > 0, the process is said to have long memory since the autocorrelations die out at a hyperbolic rate. For 0 < d_{st} < 0.5, (4.5.8) is a stationary long-memory process with a non-summable autocorrelation function. For d_{st} = 0, we obtain a change-point ARMA model with stochastic volatility. In this paper, we assume that 0 < d_{st} < 0.5 and that Φ(z) = 0 and Ψ(z) = 0 have no common roots.
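In practice, the coefficients of the expansion of (1 − L)^{−d} can be computed without evaluating Gamma functions, since the ratio of successive terms simplifies to (j − 1 + d)/j. A minimal Python sketch (the function name is ours):

```python
import numpy as np

def frac_diff_coefficients(d, trunc_lag):
    """Coefficients pi_j of (1 - L)^{-d} = sum_j pi_j L^j up to trunc_lag,
    using the recursion pi_0 = 1, pi_j = pi_{j-1} (j - 1 + d) / j, which
    follows from the Gamma-function ratio in the expansion above."""
    pi = np.ones(trunc_lag + 1)
    for j in range(1, trunc_lag + 1):
        pi[j] = pi[j - 1] * (j - 1 + d) / j
    return pi
```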

In order to estimate (4.5.8)-(4.5.9), we rely mainly on the idea of Chan and Palma (1998). Specifically, Chan and Palma (1998) consider an approximation of (4.5.8) based on a truncation lag of order L. We proceed to draw θ = {θk}^m_{k=1}, where θk = (µk, dk, φ1,k, ..., φp,k, ψ1,k, ..., ψq,k)′, γ1, ..., γm, φh,1, ..., φh,m, σ²h,1, ..., σ²h,m, P and L from their respective conditional posteriors. However, the conditional posteriors of θk do not have closed form solutions, see Raggi and Bordignon (2012). Therefore, we sample θk using Metropolis-Hastings. As in Section 4.5.1, we note that θk depends only on information in regime k. Thus, at the ith iteration of PG-AS, we can sample each element of θk one-at-a-time using information in regime k. For instance, dk is sampled as follows. First, let θ^{−dk}_k = (µk, φ1,k, ..., φp,k, ψ1,k, ..., ψq,k)′:

1. Sample a candidate, d*_k, from a truncated Gaussian random walk proposal,

q(d*_k | d^{(i-1)}_k) ∼ TN_{]0,0.5[}(d^{(i-1)}_k, Σk).

We adjust Σk to obtain an acceptance rate of around 30%, experimenting with different values of Σk until we find one which yields a reasonable acceptance rate.

2. Define the acceptance probability for d*_k as

a_{MH}(d*_k, d^{(i-1)}_k) = \min\left\{1, \frac{p(d*_k \mid θ^{−dk(i-1)}_k, γ^{(i-1)}_k, L^{(i-1)}, h^{(i)}_k, Y^{(i)}_k)\, q(d^{(i-1)}_k \mid d*_k)}{p(d^{(i-1)}_k \mid θ^{−dk(i-1)}_k, γ^{(i-1)}_k, L^{(i-1)}, h^{(i)}_k, Y^{(i)}_k)\, q(d*_k \mid d^{(i-1)}_k)}\right\}.

3. Draw u ∼ U(0, 1). If u ≤ a_{MH}(d*_k, d^{(i-1)}_k), then set d^{(i)}_k = d*_k; else set d^{(i)}_k = d^{(i-1)}_k.
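Steps 1-3 can be condensed into a single update; a minimal Python sketch follows (names are ours; log_post stands for the log of the conditional posterior of dk given everything else, and the truncated-normal proposal is drawn by simple rejection):

```python
import numpy as np
from scipy.stats import norm

def rw_mh_step_d(d_prev, log_post, scale, rng, lo=0.0, hi=0.5):
    """One random-walk M-H update for d_k on ]0, 0.5[ following steps 1-3.
    The proposal ratio q(d_prev | d_star) / q(d_star | d_prev) reduces to
    the ratio of the truncation normalizing constants."""
    while True:                                    # step 1: propose in ]lo, hi[
        d_star = d_prev + scale * rng.standard_normal()
        if lo < d_star < hi:
            break
    z = lambda c: norm.cdf((hi - c) / scale) - norm.cdf((lo - c) / scale)
    log_a = min(0.0, log_post(d_star) - log_post(d_prev)
                + np.log(z(d_prev)) - np.log(z(d_star)))   # step 2
    if np.log(rng.uniform()) <= log_a:             # step 3: accept or reject
        return d_star
    return d_prev
```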


We sample φh,k | σ²h,k, hk following Kim et al. (1998). σ²h,k | φh,k, hk ∼ IG(vk/2, lk/2), where vk = Tk + v0, lk = ζ′k ζk + l0, Tk is the number of observations in regime k and ζk = {ζt : st = k}. v0 and l0 are prior hyperparameter values. γk | θk, L, hk, Yk ∼ IG(rk/2, gk/2), where rk = Tk + r0 and gk = ε′k εk / exp(hk) + g0. As before, pkk | S ∼ Beta(a0 + nkk, b0 + 1), k = 1, ..., m − 1. Finally, we use the method of Raggi and Bordignon (2012) to sample L from its conditional posterior.

We apply our model to a monthly time series of inflation, using the US City Average core consumer

price index (CUUR0000SA0L1E) of the Bureau of Labor Statistics (BLS). Our series excludes the

direct effects of price changes for food and energy. We denote the series by Pt and use data from

1960:1 until 2013:12, for a total of T = 648 observations. We follow Bos et al. (2012) and construct

the monthly US core inflation as πt = 100 log (Pt/Pt−1). To adapt for part of the seasonality in the

series, we regress inflation on a set of seasonal dummies, D, as in π = Dβ + u. Instead of using the original inflation, πt, we use yt = ut + π̄, where ut is the residual from adjusting inflation for the major seasonal effects at time t, and π̄ is the average inflation level.

We estimate (4.5.8)-(4.5.9) from 0 to 4 change points. We then choose the optimal number of

change points using ML and DIC. Both in terms of ML and DIC, the results indicate that the specification

with 2 change points fits the data best. As before, we also compute the marginal likelihood for the

change-point ARFIMA(p,d,q)-SV models using the method of Sims et al. (2008) and do not find any

significant changes compared to G-D.

Table 4.5 reports estimation results for a homoscedastic ARFIMA model, ARFIMA-SV and ARFIMA-SV conditional on 2 change points, henceforth labeled CP(2)-ARFIMA-SV. With regard to model estimation, we experiment with different AR and MA lags. We find d_{st}, φ11,st and φ12,st to be significant. Thus, we assume that d_{st} ∼ N(0, 1) truncated such that 0 < d_{st} < 0.5, φ1,st ∼ N(0, 1), ..., φ12,st ∼ N(0, 1), and ensure that the roots of (1 − φ1,st L − ... − φ12,st L^{12}) lie outside the unit circle by employing rejection sampling. However, φ1,st, ..., φ10,st are not significant in our applications and are therefore fixed at zero. A suitable prior for the truncation lag, L, is the truncated Poisson distribution with L ∈ {Lmin, ..., Lmax}. In this paper, we follow Raggi and Bordignon (2012) and set Lmin = 10 and Lmax = 50. Finally, we follow Section 4.5.2 and assume that γk, σ²h,k ∼ IG(4/2, 0.2/2) and (φh,k + 1)/2 ∼ Beta(20, 1.5) for k = 1, ..., m, and pkk ∼ Beta(20, 0.1) for k = 1, ..., m − 1. In this setting, the prior on each element of θk is very standard, while the prior on pkk

k = 1, ...,m−1. In this setting, the prior on each element of θk is very standard, while the prior on pkk

favors infrequent structural breaks. In the Appendix, we evaluate sensitivity of the results to different

prior specifications by investigating alternative prior hyperparameter values on pkk.

We also report results for a change-point integrated moving average (IMA) model of order 1 with

SV effects, CP(m)-IMA(1,1 )-SV. This model corresponds to a CP(m)-ARFIMA(0,1,1 )-SV model for

yt, or a CP(m)-ARIMA(0,0,1 )-SV model for 4yt. In this model, changes in the long-run persistence

are captured by changes in the MA(1) parameter, ψst . We estimate CP(m)-IMA(1,1 )-SV from 0 to

4 change points. IMA(1,1 )-SV without structural breaks is equivalent to the unobserved components

model of Stock and Watson (2007). As before, we choose the optimal number of change points using ML and DIC. Again, the results indicate that the specification with 2 change points fits the data best. We

also report results for IMA(1,1 )-SV and CP(2)-IMA(1,1 )-SV on the right-hand side of Table 4.5.

Accordingly, we find that specifications with change points dominate ARFIMA(12,d,0). For in-

stance, compared to ARFIMA(12,d,0)-SV, the logBF in favor of CP(2)-ARFIMA(12,d,0)-SV is 75.

For the ARFIMA(12,d,0) model, the order of integration, d1, is estimated at 0.38. This implies that


US core inflation exhibits long-memory behavior. φ12,1 captures the main seasonal effects. The aver-

age inflation rate, µ1, is estimated at 0.17%. The residual standard deviation of the ARFIMA(12,d,0) model, σ1, is 0.18% per month. When we compare ARFIMA(12,d,0) with ARFIMA(12,d,0)-SV, we find that d1 drops from 0.38 to 0.29. The AR coefficients, φ11,1 and φ12,1, increase from 0.11 and 0.24 to 0.14 and 0.36, respectively. The estimate of µ1 is also affected, being more precisely estimated

at a lower value. The SV component itself is nearly nonstationary as the autoregressive coefficient of

volatility, φh,1, is close to one and σ2h,1 is well identified at 0.03 with a standard error of 0.01. The

average variance of inflation, γ1, is estimated at 0.025%. The posterior mode of the change-point

density is associated with 1973:7 and 1984:2.

Table 4.5.: Posterior means and standard deviations (in parentheses), US core inflation rate

ARFIMA ARFIMA IMA CP(2)-ARFIMA CP(2)-IMA

(12,d,0) (12,d,0)-SV (1,1)-SV (12,d,0)-SV (1,1)-SV

Parameter mean std. dev mean std. dev mean std. dev mean std. dev mean std. dev

d1 0.384 (0.030) 0.299 (0.035) 0.202 (0.069)

d2 0.417 (0.053)

d3 0.154 (0.049)

ψ1 -0.840 (0.027) -0.847 (0.042)

ψ2 -0.569 (0.109)

ψ3 -0.857 (0.063)

µ1 0.171 (0.070) 0.115 (0.020) 0.135 (0.042)

µ2 0.536 (0.113)

µ3 0.072 (0.017)

φ11,1 0.116 (0.037) 0.149 (0.036) 0.203 (0.081)

φ12,1 0.241 (0.038) 0.368 (0.038) 0.302 (0.080)

φ11,2 0.027 (0.089)

φ12,2 0.097 (0.093)

φ11,3 0.150 (0.046)

φ12,3 0.505 (0.048)

σ21 0.032 (0.001)

γ1 0.025 (0.006) 0.028 (0.006) 0.035 (0.006) 0.034 (0.007)

γ2 0.060 (0.018) 0.063 0.020)

γ3 0.014 (0.003) 0.021 (0.010)

φh,1 0.975 (0.010) 0.965 (0.014) 0.792 (0.113) 0.821 (0.108)

φh,2 0.913 (0.055) 0.925 (0.046)

φh,3 0.838 (0.086) 0.854 (0.083)

σ2h,1 0.035 (0.011) 0.040 (0.012) 0.049 (0.024) 0.047 (0.022)

σ2h,2 0.059 (0.029) 0.063 (0.029)

σ2h,3 0.054 (0.024) 0.054 (0.024)

L 29.545 (4.499) 20.570 (5.379)

log(ML), α = 0.50 168.895 271.701 224.363 346.612 259.751

log(ML), α = 0.75 168.867 272.105 224.768 347.018 260.156

log(ML), α = 0.95 168.837 272.342 225.004 347.254 260.392

log(ML), α = 0.99 168.828 272.383 225.046 347.295 260.434

DIC -362.286 -521.928 -422.079 -549.385 -434.130

Rank 5 2 4 1 3

This table reports posterior means (mean) and standard deviations (std. dev) for different ARFIMA(p,d,q)-SV type models. The parameters associated with each regime are labeled with subscript 1, ..., m. log(ML): logarithm of the marginal likelihood using the corresponding value of α. DIC: deviance information criterion. Rank: rank of the model based on ML and DIC. Total number of observations, T = 648.


Figure 4.4 displays the data, estimates of σ2t = γst exp (ht), t = 1, ..., T , and the posterior density of

the change-point dates for CP(2)-ARFIMA(12,d,0)-SV. Furthermore, the top right panel of Figure

4.4 shows a noticeable and persistent decrease in the volatility of inflation since the early 1980s. As

previously mentioned, this period is labeled as the Great Moderation, see Stock and Watson (2007).

Evidently, results from (4.5.8)-(4.5.9) conditional on 2 change points show that we can divide the

evolution of yt into three subsequent phases: the period from 1960:1-1973:6, 1973:7-1984:1 and 1984:2

till the end of the sample. Both dst and µst are much smaller after the last change point, d2 = 0.41

versus d3 = 0.15 and µ2 = 0.53 versus µ3 = 0.07. On the other hand, φ11,st , φ12,st increase from 0.02,

0.09 to 0.15, 0.50 respectively. At the same time, the estimate of γst almost doubles in the second

regime. On the other hand, γst falls from 0.06 to 0.01 after the last structural break. Furthermore,

the unconditional volatility of volatility, σ_{h,st}/\sqrt{1 − φ²_{h,st}}, rises from 0.36 in the first regime to 0.60 in

the second regime. Thereafter, it falls to 0.42 from the last change point till the end of the sample.

The last two columns of Table 4.5 report results for CP(2)-IMA(1,1)-SV. The estimate of ψ1,st rises

from −0.84 in the first phase to −0.56 in the second phase. ψ1,st drops to −0.85 in the subsequent

phase. Furthermore, similar to CP(2)-ARFIMA(12,d,0)-SV, the unconditional volatility of volatility

drops from the last change point till the end of the sample. However, CP(2)-IMA(1,1)-SV performs

worse than CP(2)-ARFIMA(12,d,0)-SV.

Figure 4.4.: PG-AS sampler for US monthly core inflation rate from 1960:1 to 2013:12

Graph (a): US monthly core inflation adjusted for fixed seasonals, yt, (b): posterior estimates of the conditional variance of inflation, (c) and (d): posterior density of the first and the second change point.

Overall, we find evidence of structural breaks in the dynamics of yt. As expected, most significant

changes in the model parameters occur during the Great Moderation. More importantly, it is also

cautiously evident that the long-memory characteristics of US inflation might not have remained

significant after the Great Moderation.


We follow Section 4.5.2 and compare the out-of-sample performance of CP(m)-ARFIMA(12,d,0)-SV

(break) with ARFIMA(12,d,0)-SV (no-break). Specifically, we compare the out-of-sample predictive

likelihood (PL) and predictive mean between these two models. Given the data up to time t−1, Yt−1,

the predictive likelihood (PL), p(yt, ..., yT | Yt−1), is the predictive density evaluated at the realized outcome, yt, ..., yT, t ≤ T, see Geweke (2005). The PL for model M_A is given as

p(yt, ..., yT | Yt−1, M_A) = \prod_{s=t}^{T} N^{-1} \sum_{i=1}^{N} p(ys | θ^{(i)}_A, Ys−1, M_A). (4.5.10)

Notice that the terms on the right-hand-side of (4.5.10) have parameter uncertainty integrated out.

If t = 1, this would be the marginal likelihood and (4.5.10) changes to (4.3.4). Hence, the sum of

log-predictive likelihoods can be interpreted as a measure similar to the logarithm of the marginal

likelihood, but ignoring the initial t − 1 observations. The predictive likelihood can be used to order

models according to their predictive abilities. In a similar fashion to Bayes factors, one can also com-

pare the performance of models based on a specific out-of-sample period by predictive Bayes factors,

PBF. Suppose we have two different models denoted by M_A and M_B. The PBF for yt, ..., yT and models M_A versus M_B is PBF_{AB} = p(yt, ..., yT | Yt−1, M_A)/p(yt, ..., yT | Yt−1, M_B). It summarizes

the relative evidence of the two models over the out-of-sample data, yt, ..., yT .
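Computationally, (4.5.10) is evaluated by averaging the conditional densities over the posterior draws; a minimal Python sketch (names ours; the log densities are assumed to be pre-computed):

```python
import numpy as np
from scipy.special import logsumexp

def log_predictive_likelihood(log_dens):
    """Log of (4.5.10). log_dens[s, i] = log p(y_s | theta^(i), Y_{s-1}, M)
    for out-of-sample periods s and posterior draws i = 1, ..., N; the
    average over draws is taken via logsumexp for numerical stability."""
    n_draws = log_dens.shape[1]
    return float(np.sum(logsumexp(log_dens, axis=1) - np.log(n_draws)))
```

The log predictive Bayes factor of model M_A versus M_B is then simply the difference between the two resulting values.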

In order to compare the out-of-sample density forecasts of the models, we calculate PBF for data

at and after the first break point, 1973:7. Hence, t − 1 = 1973:6. As a new observation arrives, we

update the posterior through a new round of sampling and perform forecasting. As in Section 4.5.2, in

the context of forecasting with the break model, we follow Bauwens et al. (2011). For one observation

out-of-sample, log (PBF ) = 2.36, 6 months log (PBF ) = 2.81, 1 year log (PBF ) = 2.94, 5 years

log (PBF ) = 5.11, 10 years log (PBF ) = 7.33 and 15 years log (PBF ) = 10.34, each in favor of the

break specification. The improvements continue till the end of sample, see Table 4.6. Finally, Table

4.6 also displays out-of-sample results for one-month ahead point forecasts for the no-break and the

break model. Overall, the break model offers improvements in terms of MAE and RMSE compared

to the no-break model.

Table 4.6.: Out-of-sample forecasts for US core inflation

Model MAE RMSE LINEX, a = 1, b = 1 LINEX, a = −1, b = 1 log(PL)

No-break 0.129 0.180 0.017 0.016 116.048

Break 0.121 0.168 0.014 0.015 130.556

This table reports mean absolute error (MAE), root mean squared error (RMSE) and average LINEX for the forecasts based on the predictive mean for one-month ahead. Furthermore, the one-month ahead log-predictive likelihood, log(PL), is also reported. The out-of-sample period is from 1973:7 till the end of the sample.

4.6. Conclusion

In this paper we apply PG-AS to the challenging class of stochastic volatility models and demonstrate

its flexibility under different circumstances. We show that PG-AS provides a very flexible framework

for estimation, forecasting and model comparison in all of the cases that we consider. First, we

estimate various SV models using daily DJIA returns. We find that the SV model with moving

average and Student-t distributed errors performs best in terms of ML and DIC. We also show the

flexibility of PG-AS by combining it with the change-point specification of Chib (1998). Our empirical


application using US real GDP data shows that this combination provides reliable results in terms

of estimation, change-point identification, volatility feedback modeling and forecasting. Finally, we

analyze the behavior of US monthly core inflation rate using structural break ARFIMA-SV models.

We find evidence in favor of two structural breaks. Furthermore, we find considerable differences

in parameter estimates in each regime. We also demonstrate that accounting for structural breaks

improves density and point forecasts.


A. Appendix for Chapter 1

A.1. Estimation of the change-point model

To conduct estimation of the change-point model, we start by specifying independent conjugate priors

for the parameters in each regime. They are

βj ∼ N(n0, N0), σ²j ∼ IG(v0/2, s0/2), pj ∼ Beta(a0, b0)

for j = 1, ..., m and pm = 1. For the sake of notation, let θj = (β′j, σ²j)′. Furthermore, in order to ease the notation burden, conditioning on XT is suppressed.

In order to perform Gibbs sampling, we divide the parameter space into three blocks: θ = {θ1, ..., θm}, the state of the system, S = (s1, ..., sT)′, and the transition matrix, P. Below, we provide more details

on each step of the Gibbs sampler.

Step 1: Simulation of S | θ, P, YT. Chib (1998) shows that a joint draw of S can be achieved in one step using

p(S | θ, P, YT) = p(sT | θ, P, YT) \prod_{t=1}^{T-1} p(st | st+1, θ, P, Yt) (A.1.1)

in which one samples sequentially from each density on the right-hand-side of (A.1.1), beginning with p(sT | θ, P, YT), and then p(st | st+1, θ, P, Yt) for t = T − 1, ..., 1. At each step, one conditions on the previously drawn state, st+1, until a full draw of S is obtained. The individual densities in (A.1.1) are obtained based on the following steps:

(a) Initialization: At t = 1, set p (s1 = 1 | θ, P, Y1) = 1.

(b) Compute the Hamilton (1989) filter, p(st = j | θ, P, Yt). This involves a prediction and an update step in which one iterates on the following from t = 2, ..., T:

p(st = j | θ, P, Yt−1) = \sum_{l=j-1}^{j} p(st−1 = l | θ, P, Yt−1) p_{lj}, j = 1, ..., m (A.1.2)

p(st = j | θ, P, Yt) = \frac{p(st = j | θ, P, Yt−1)\, p(yt | θ, Yt−1, st = j)}{\sum_{l=1}^{m} p(st = l | θ, P, Yt−1)\, p(yt | θ, Yt−1, st = l)}, j = 1, ..., m. (A.1.3)

The last equation is obtained from Bayes' rule. Note that in (A.1.2) the summation is only from j − 1 to j, due to the restricted nature of the transition matrix, and p(yt | θ, Yt−1, st = j) ∼ N(Xt−1 βj, σ²j) in (A.1.3) has a closed form solution.


(c) Finally, Chib (1998) shows that the individual densities in (A.1.1) are

p(st | st+1, θ, P, Yt) ∝ p(st | θ, P, Yt) p(st+1 | st, P).

Thus, given sT = m, st is drawn backwards over t = T − 1, T − 2, ..., 2 as

st | st+1, θ, P, Yt = \begin{cases}
st+1 & \text{with probability } ct \\
st+1 − 1 & \text{with probability } 1 − ct,
\end{cases}

where, for st+1 = j,

ct = \frac{p(st = j | θ, P, Yt)\, p(st+1 = j | st = j, P)}{\sum_{l=j-1}^{j} p(st = l | θ, P, Yt)\, p(st+1 = j | st = l, P)}.

Finally, note that p(s1 = 1 | s2, θ, P, Yt) = 1.
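A minimal Python sketch of this backward draw is given below (names ours; filt is the T × m matrix of filtered probabilities produced by the forward pass in (b)):

```python
import numpy as np

def backward_sample_states(filt, P, rng):
    """Draw S = (s_1, ..., s_T) backwards as in step (c), given the filtered
    probabilities filt[t, j] = p(s_t = j+1 | theta, P, Y_t) and the restricted
    transition matrix P of the change-point model."""
    T, m = filt.shape
    s = np.zeros(T, dtype=int)
    s[-1] = m - 1                      # s_T = m (0-indexed)
    for t in range(T - 2, 0, -1):
        j = s[t + 1]
        probs = filt[t] * P[:, j]      # p(s_t = l | Y_t) p(s_{t+1} = j | s_t = l)
        s[t] = rng.choice(m, p=probs / probs.sum())
    s[0] = 0                           # p(s_1 = 1 | s_2, ...) = 1
    return s + 1                       # report regimes as 1, ..., m
```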

Step 2: Simulation of θ | S, YT. The conditional posterior density of θj depends only on the data in regime j. Therefore, let Yj = {yt : st = j}, Xj = {Xt−1 : st = j} and use Gibbs sampling methods for a linear regression model. Hence, βj | Yj, Xj, σ²j ∼ N(nj, Nj), where

Nj = (σ^{-2}_j X′_j X_j + N^{-1}_0)^{-1}, nj = Nj (σ^{-2}_j X′_j Y_j + N^{-1}_0 n_0)

and σ²j | Xj, Yj, βj ∼ IG(vj/2, sj/2), where vj = Tj + v0, sj = (Yj − Xj βj)′(Yj − Xj βj) + s0 and Tj is the number of observations in regime j.

Step 3: Simulation of P | S. The conditional posterior for each diagonal component of P is very

simple and given by pj | S ∼ Beta (a0 + nj , b0 + 1), where nj is the number of one-step transitions

from state j to state j in a sequence of S.

A.2. Marginal likelihood computation for the change-point model

To compute the marginal likelihood for the change-point model, we use the method of Chib (1995)

which is based on

p(YT) = \frac{p(YT | θ, P)\, p(θ, P)}{p(θ, P | YT)}, (A.2.1)

where p (YT | θ, P ) is the likelihood function with S integrated out, p (θ, P ) is the prior density, and

p (θ, P | YT ) is the posterior density. As before, we set YT = (y1, ..., yT )′

and follow the notation from

the previous sections. In principle, any value of (θ, P) can be used to compute (A.2.1). Following Liu and Maheu (2008), we use the posterior mean, (θ̄, P̄). Thus,

log p(YT) = log p(YT | θ̄, P̄) + log p(θ̄, P̄) − log p(θ̄, P̄ | YT), (A.2.2)

p(θ̄, P̄) is evaluated directly and the likelihood function, p(YT | θ̄, P̄), is calculated as

log p(YT | θ̄, P̄) = \sum_{t=1}^{T} log p(yt | θ̄, P̄, Yt−1),


where

p(yt | θ̄, P̄, Yt−1) = \sum_{j=1}^{m} p(yt | θ̄, Yt−1, st = j)\, p(st = j | θ̄, P̄, Yt−1).

The most difficult and demanding part of (A.2.2) is the computation of p(θ̄, P̄ | YT), since it must be computed numerically. We use the decomposition

p(θ̄, P̄ | YT) = p(β̄ | YT)\, p(σ̄² | β̄, YT)\, p(P̄ | β̄, σ̄², YT), (A.2.3)

where each term on the right-hand-side can be estimated from MCMC simulations. The first term can be estimated as

p(β̄ | YT) ≈ \frac{1}{N} \sum_{i=1}^{N} p(β̄ | σ²^{(i)}, S^{(i)}, YT),

where p(β̄ | σ²^{(i)}, S^{(i)}, YT) = \prod_{j=1}^{m} p(β̄j | σ²^{(i)}, S^{(i)}, YT) and the draws {σ²^{(i)}, S^{(i)}}_{i=1}^{N} are directly

available from the Gibbs output. The second term in (A.2.3) is equal to

p(σ̄² | β̄, YT) = \int p(σ̄² | β̄, S, YT)\, p(S | β̄, YT)\, dS,

where p(σ̄² | β̄, S, YT) = \prod_{j=1}^{m} p(σ̄²j | β̄, S, YT). To obtain the draws from p(S | β̄, YT), we run an additional reduced Gibbs sampler conditional on β̄, that is, we run a Gibbs sampling scheme where we do not draw β but fix it at β̄. Thereafter, we use {S^{(i)}}_{i=1}^{N} and calculate p(σ̄² | β̄, YT) as

p(σ̄² | β̄, YT) ≈ \frac{1}{N} \sum_{i=1}^{N} p(σ̄² | β̄, S^{(i)}, YT).

Finally, for p(P̄ | β̄, σ̄², YT) = \prod_{j=1}^{m-1} p(p̄j | β̄, σ̄², YT), we sample {S^{(i)}}_{i=1}^{N} from p(S | β̄, σ̄², YT) and set

p(p̄j | β̄, σ̄², YT) ≈ \frac{1}{N} \sum_{i=1}^{N} p(p̄j | β̄, σ̄², S^{(i)}, YT), j = 1, ..., m − 1.

A.3. Estimation of the MIA model

Consider the following model

yt = Xt−1 βt + εt, εt ∼ N(0, σ²), (A.3.1)

where βt = (β1t, ..., βkt)′ and

βit = βit−1 + κit ηit, ηit ∼ N(0, q²i), i = 1, ..., k, (A.3.2)


where κit ∈ {0, 1} with Pr(κit = 1) = πi. The parameters of (A.3.1)-(A.3.2) are: the structural break probabilities, π = (π1, ..., πk)′, the magnitudes of the breaks in the state equations, q = (q²1, ..., q²k)′, and σ². These quantities are all collected in θ. As before, in order to ease the notation burden, conditioning on XT is suppressed. Let Kt = (κ1t, ..., κkt)′, K = {Kt}_{t=1}^{T} and B = {βt}_{t=1}^{T}. The Gibbs sampling scheme for (A.3.1)-(A.3.2) is as follows.

Sample K | θ, YT

The structural breaks are sampled using the algorithm of Gerlach et al. (2000). This algorithm has two important features. First, K is generated without conditioning on the states, B. Second, the number of operations required to obtain a draw of K is reduced from O(T²) to O(T). Define K−t = {Ks}_{s=1, s≠t}^{T}. The conditional posterior of Kt is given as

p(Kt | θ, K−t, YT) ∝ p(YT | θ, K)\, p(Kt | θ, K−t)
∝ p(yt+1, ..., yT | θ, K, Yt) × p(yt | θ, K1, ..., Kt, Yt−1)\, p(Kt | θ, K−t). (A.3.3)

The term p (Kt | θ,K−t) is obtained from the prior and p (yt | θ,K1, ...,Kt, Yt−1) is computed using the

Kalman filter. The important contribution of Gerlach et al. (2000) is that the second term in (A.3.3),

p (yt+1, ..., yT | θ,K, Yt) can be obtained in one step after an initial set of backward recursions. Finally,

since Kt can only take a finite number of values, it can be drawn by computing the right-hand-side of

(A.3.3) for all possible values of Kt and then normalizing. For more details on implementation of the

algorithm, we refer the reader to Gerlach et al. (2000).

Sample B | K, θ, YT

B is sampled from its conditional posterior using the simulation smoother of Carter and Kohn (1994).

The algorithm of Durbin and Koopman (2002) is also an interesting alternative.

Sample θ | K,B, YT

To sample q²i and σ², Inverse-Gamma densities are used, see Kim and Nelson (1999). Finally, assume that p(πi) ∼ Beta(ai0, bi0). Then, the conditional posterior of πi is Beta(ai, bi), where ai = ai0 + \sum_{t=1}^{T} κit and bi = bi0 + T − \sum_{t=1}^{T} κit.
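For illustration, this conditional Beta draw can be vectorized across the k break probabilities; a minimal Python sketch (the function name is ours):

```python
import numpy as np

def sample_break_probs(kappa, a0, b0, rng):
    """Draw each pi_i from Beta(a_i0 + sum_t kappa_it, b_i0 + T - sum_t kappa_it),
    given the T x k matrix of sampled break indicators kappa."""
    T = kappa.shape[0]
    n_breaks = kappa.sum(axis=0)       # number of breaks per coefficient
    return rng.beta(a0 + n_breaks, b0 + T - n_breaks)
```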


B. Appendix for Chapter 2

B.1. A direct approach for evaluating the likelihood function

This appendix details a direct approach for evaluating the likelihood function of the ARFIMA model.

Consider the following ARFIMA model

yt = µ + (1 − L)^{-d} εt, εt ∼ N(0, σ²). (B.1.1)

Conditional on M , we write (B.1.1) as YT = u+Hε, where u, H and ε follow directly from the main

text. The special structure of H can be exploited to speed up computation. For instance, obtaining

the Cholesky decomposition of a banded T × T matrix with fixed bandwidth involves only O (T )

operations as opposed to O(T³) for a full matrix of the same size. Similar computational savings can

be generated in operations such as multiplication, forward and backward substitution by using block-

banded or sparse matrix algorithms. These banded and sparse matrix algorithms are implemented in

Matlab.

It follows from (B.1.1) that p(YT | θ, M) ∼ N(u, Ω_{YT}), where Ω_{YT} = H S_{YT} H′ and S_{YT} = σ² I_T. Since S_{YT} is a diagonal matrix and H is a lower triangular sparse matrix, the product, Ω_{YT}, is sparse. Moreover, since |H| = 1 for any π1, ..., πM, one has that |Ω_{YT}| = |S_{YT}|. The joint density of YT is therefore given by

log p(YT | θ, M) = −\frac{T}{2} log(2π) − \frac{T}{2} log(σ²) − \frac{1}{2} (YT − u)′ Ω^{-1}_{YT} (YT − u). (B.1.2)

As stated in Chan (2013), we do not need to obtain the T × T inverse matrix, Ω^{-1}_{YT}, in order to evaluate (B.1.2), which would involve O(T³) operations. Instead, it can be computed in three steps, each of

which requires only O (T ) operations. Therefore, the following notation is introduced, see Chan (2013):

given a lower (upper) triangular T × T non-singular matrix, A, and a T × 1 vector, c, let A \ c denote

the unique solution to the triangular system, Ax = c, obtained by forward (backward) substitution,

i.e. A \ c = A−1c.

Now, obtain the Cholesky decomposition, C_{YT}, of Ω_{YT} such that C_{YT} C′_{YT} = Ω_{YT}. This involves only O(T) operations. Compute x1 = C′_{YT} \ (C_{YT} \ (YT − u)) by forward followed by backward substitution, each of which requires O(T) operations since C_{YT} is also banded. Then, by definition,

x1 = C′^{-1}_{YT} (C^{-1}_{YT} (YT − u)) = (C_{YT} C′_{YT})^{-1} (YT − u) = Ω^{-1}_{YT} (YT − u).

Finally, compute x2 = −\frac{1}{2} (YT − u)′ x1 = −\frac{1}{2} (YT − u)′ Ω^{-1}_{YT} (YT − u), which gives the quadratic term in (B.1.2).
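As an illustration, a minimal Python sketch of this evaluation for the ARFIMA(0,d,0) case is given below. Rather than forming the Cholesky factor of Ω_{YT}, it uses the equivalent route Ω^{-1}_{YT} = H′^{-1} S^{-1}_{YT} H^{-1}, so that the quadratic term equals ||H \ (YT − u)||²/σ², obtained by one sparse forward substitution (function names are ours):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve_triangular

def arfima_loglik(y, mu, d, sigma2, M):
    """Log-likelihood (B.1.2) of the ARFIMA(0,d,0) model. H is the sparse,
    banded, unit-diagonal lower-triangular MA(infinity) matrix truncated at
    lag M; since |H| = 1, log|Omega| = T log(sigma2)."""
    T = len(y)
    M = min(M, T - 1)
    # pi_j coefficients of (1 - L)^{-d} via the Gamma-ratio recursion
    pi = np.ones(M + 1)
    for j in range(1, M + 1):
        pi[j] = pi[j - 1] * (j - 1 + d) / j
    H = sparse.diags([np.full(T - j, pi[j]) for j in range(M + 1)],
                     offsets=[-j for j in range(M + 1)], format="csr")
    x = spsolve_triangular(H, y - mu, lower=True)   # forward substitution
    return (-0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(sigma2)
            - 0.5 * (x @ x) / sigma2)
```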


C. Appendix for Chapter 3

C.1. Simulation evidence: SV-MA(1) and SVM

In this section, we present simulation results for SV-MA(1) and SVM. We simulate T = 1000 ob-

servations from these models, and report the true DGP parameters along with PMMH parameter

estimates in Table C.1. In each case, we also estimate a plain SV model for comparison. Overall, we

see that PMMH works very well as parameter estimates are close to their respective true values. Not

surprisingly, in each case the corresponding model outperforms the plain SV model in terms of ML.

Finally, we analyze the performance of PMMH with respect to the number of particles, M . We do

this by estimating SV-MA(1) using M = 1, M = 10, M = 100 and M = 1000. In all of these cases,

we choose N = 20000. We see that very low values of M appear to be insufficient, see Figure C.1. For instance, for M = 1 the chain gets stuck on a specific parameter value for almost the entire run. For M = 100, we get better results. However, the chain still gets stuck for considerable stretches. In contrast, we see drastic improvements in the performance of the algorithm for M = 1000. We also run the PMMH algorithm for M = 2000 and obtain almost identical results as for M = 1000.


Table C.1.: Simulation evidence

                 DGP: stochastic volatility with MA(1) errors, SV-MA(1)    DGP: stochastic volatility in mean, SVM
Parameter  true  SV                SV-MA(1)                          true  SV                SVM
µ          0.20  0.4168 (4.31)     0.2756 (4.40)                     0.20  0.7287 (23.72)    0.2883 (5.24)
                 [0.2129, 0.6045]  [0.0419, 0.5011]                        [0.4106, 1.0132]  [0.0580, 0.5191]
ρ          0.98  0.9701 (5.07)     0.9781 (4.55)                     0.98  0.9815 (21.63)    0.9790 (4.86)
                 [0.9515, 0.9861]  [0.9635, 0.9906]                        [0.9651, 0.9928]  [0.9655, 0.9904]
σ²         0.01  0.0171 (4.86)     0.0104 (5.22)                     0.01  0.0087 (14.04)    0.0108 (5.38)
                 [0.0078, 0.0291]  [0.0048, 0.0172]                        [0.0043, 0.0159]  [0.0055, 0.0174]
ψ1         0.40                    0.4591 (5.57)                     0.80                    0.7533 (5.09)
                                   [0.4221, 0.4963]                                          [0.7090, 0.8001]
log(L)           -1666.1           -1581.0                                 -1811.0           -1586.8
log(ML), a=0.75  -1671.2           -1592.4                                 -1817.5           -1598.5
log(ML), a=0.99  -1673.8           -1592.2                                 -1817.2           -1598.2
M-H ratio        0.42              0.38                                    0.33              0.37

This table reports estimation results for SV-MA(1) and SVM models using simulated data. Posterior means are reported with inefficiency factors (RB) in parentheses and credibility intervals in brackets. log(L): log-likelihood, log(ML): log-marginal likelihood for the corresponding value of a. M-H ratio: Metropolis-Hastings acceptance ratio.

Page 121: Essays in Bayesian Particle and Markov chain …pure.au.dk/portal/files/82102081/Nima_Nonejad_PhD_thesis.pdfEssays in Applied Bayesian Particle and Markov Chain Monte Carlo Techniques

C. Appendix for Chapter 3

112

Figure C.1.: Estimation results, SV-MA(1) model

[Trace plots of the posterior draws of µ, ρ, σ² and ψ for M = 1, M = 10, M = 100 and M = 1000 particles.]

Each column shows posterior draws of the parameter of interest for different numbers of particles, M.


C.2. Particle Gibbs with ancestor sampling

In order to ease the notation burden, we consider the plain stochastic volatility (SV) model

yt = exp (αt/2) εt, εt ∼ N (0, 1) (C.2.1)

αt+1 = µ+ ρ (αt − µ) + σηt, ηt ∼ N (0, 1) , (C.2.2)

where θ = (µ, ρ, σ²)′. Within the PG-AS framework, we approach estimating (C.2.1)-(C.2.2) directly.

First, we draw α1:T ∼ p (α1:T | θ, YT ) using the conditional particle filter with ancestor sampling,

CPF-AS. Thereafter, we draw θ ∼ p (θ | α1:T , YT ) using standard Gibbs sampling techniques. Let

i = 1, ..., N denote the number of Gibbs sampling iterations, j = 1, ...,M denote the number of

particles, and let p(yt | θ, αt, Yt−1) denote the density of yt given θ, αt and Yt−1. Finally, let α^{(i-1)}_{1:T}

be a fixed reference trajectory of α1:T sampled at iteration i − 1 of the Gibbs sampler. The steps of

CPF-AS for the SV model are as follows

1. If t = 1:

(a) Draw α^{(j)}_1 | θ for j = 1, ..., M − 1 and set α^{(M)}_1 = α^{(i-1)}_1.

(b) Set w^{(j)}_1 = τ^{(j)}_1 / \sum_{k=1}^{M} τ^{(k)}_1, where τ^{(j)}_1 = p(y1 | θ, α^{(j)}_1, Y0) for j = 1, ..., M.

2. Else, for t = 2 to T do:

(a) Resample {α^{(j)}_{t-1}}_{j=1}^{M-1} using indices δ^{(j)}_t, where p(δ^{(j)}_t = k) ∝ w^{(k)}_{t-1}.

(b) Draw α^{(j)}_t | α^{(δ^{(j)}_t)}_{t-1}, θ for j = 1, ..., M − 1.

(c) Set α^{(M)}_t = α^{(i-1)}_t.

(d) Draw δ^{(M)}_t from p(δ^{(M)}_t = j) ∝ w^{(j)}_{t-1} p(α^{(i-1)}_t | α^{(j)}_{t-1}, θ).

(e) Set α^{(j)}_{1:t} = (α^{(δ^{(j)}_t)}_{1:t-1}, α^{(j)}_t) and w^{(j)}_t = τ^{(j)}_t / \sum_{k=1}^{M} τ^{(k)}_t, where τ^{(j)}_t = p(yt | θ, α^{(j)}_t, Yt−1).

3. End for.

4. Sample α^{(i)}_{1:T} | θ, YT with p(α^{(i)}_{1:T} = α^{(j)}_{1:T} | θ, YT) ∝ w^{(j)}_T.

Notice that CPF-AS is akin to a standard particle filter, with the difference that α_{1:T}^{(M)} is specified a priori and serves as a reference trajectory; hence, we only generate M − 1 new particles at each step. Furthermore, whereas the particle Gibbs algorithm of Andrieu et al. (2010) sets δ_t^{(M)} = M, PG-AS samples a new value for the index variable δ_t^{(M)} in the ancestor sampling step, (d). Finally, we can include more unobserved processes in the model of interest: we only need to modify steps (a), (b), (c) and (e) to draw particles for each process, while still keeping a single set of weights. Thereafter, we sample the unobserved processes using step 4.
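
To make the recursion above concrete, the following is a minimal sketch of one CPF-AS sweep for the SV model (C.2.1)-(C.2.2), written in Python with NumPy purely for illustration; it is not the thesis implementation, θ = (µ, ρ, σ²)′ is held fixed, and the function name cpf_as and all variable names are ours.

import numpy as np

def cpf_as(y, ref, mu, rho, sig2, M, rng):
    # One CPF-AS sweep: draws alpha_{1:T} given theta and the reference
    # trajectory ref = alpha_{1:T}^{(i-1)} from the previous Gibbs iteration.
    T = len(y)
    sig = np.sqrt(sig2)
    part = np.zeros((T, M))             # particles alpha_t^{(j)}
    anc = np.zeros((T, M), dtype=int)   # ancestor indices delta_t^{(j)}
    # step 1: initialize from the stationary AR(1) law, pin particle M to ref
    part[0, :M-1] = mu + sig / np.sqrt(1.0 - rho**2) * rng.standard_normal(M-1)
    part[0, M-1] = ref[0]
    logw = -0.5 * (part[0] + y[0]**2 * np.exp(-part[0]))  # log p(y_1 | alpha_1)
    w = np.exp(logw - logw.max()); w /= w.sum()
    for t in range(1, T):
        # (a) multinomial resampling of ancestors for particles 1, ..., M-1
        anc[t, :M-1] = rng.choice(M, size=M-1, p=w)
        # (b) propagate through the AR(1) transition
        part[t, :M-1] = mu + rho * (part[t-1, anc[t, :M-1]] - mu) \
                        + sig * rng.standard_normal(M-1)
        # (c) particle M is the reference trajectory
        part[t, M-1] = ref[t]
        # (d) ancestor sampling: p(delta = j) propto w_{t-1}^{(j)} p(ref_t | alpha_{t-1}^{(j)})
        logp = np.log(w + 1e-300) \
               - 0.5 * (ref[t] - (mu + rho * (part[t-1] - mu)))**2 / sig2
        p = np.exp(logp - logp.max()); p /= p.sum()
        anc[t, M-1] = rng.choice(M, p=p)
        # (e) reweight with the measurement density
        logw = -0.5 * (part[t] + y[t]**2 * np.exp(-part[t]))
        w = np.exp(logw - logw.max()); w /= w.sum()
    # step 4: draw one index propto w_T and trace its ancestry back to t = 1
    j = rng.choice(M, p=w)
    alpha = np.zeros(T)
    for t in range(T - 1, -1, -1):
        alpha[t] = part[t, j]
        if t > 0:
            j = anc[t, j]
    return alpha

Within PG-AS, this sweep would alternate with standard Gibbs draws of θ | α_{1:T}, Y_T, and the returned trajectory becomes the reference path for the next iteration.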


D. Appendix for Chapter 4

D.1. Prior sensitivity analysis

In this section, the sensitivity of the results to the prior specification is evaluated by investigating alternative prior hyperparameter values on the transition probabilities, p_kk ∼ Beta(a0, b0), keeping the prior hyperparameter values of the other parameters the same as in the main text. p_kk, k = 1, ..., m − 1, is one of the key parameters of the model because it controls the duration of each regime in S.

We experiment with different hyperparameter values on p_kk in Table D.1, reporting the break dates obtained by estimating CP(2)-ARFIMA-SV under the corresponding values of a0 and b0. For instance, the first alternative prior is p_kk ∼ Beta(0.1, 0.1), which is relatively flat. With this prior, we still find that the change-point dates correspond to 1973:7 and 1984:2. In fact, the estimated change-point dates are 1973:7 and 1984:2 regardless of the values of a0 and b0. We also report logBF of CP(2)-ARFIMA-SV versus ARFIMA-SV using the corresponding values of α, along with the difference in DIC between CP(2)-ARFIMA-SV and ARFIMA-SV, see Table D.1. These results overwhelmingly suggest the existence of structural breaks. More importantly, we find that the choice of prior hyperparameter values on P is of relatively limited importance.

Table D.1.: Prior sensitivity analysis, CP(2)-ARFIMA-SV

Prior            break dates      logBF       logBF       logBF       logBF       diff(DIC)
                                  α = 0.50    α = 0.75    α = 0.95    α = 0.99

Beta(0.1, 0.1)   1973:7, 1984:2   59.345      59.347      59.346      59.346      -33.251
Beta(8, 0.1)     1973:7, 1984:2   67.807      67.808      67.808      67.808      -33.385
Beta(20, 0.1)    1973:7, 1984:2   74.911      74.913      74.912      74.912      -27.454
Beta(100, 0.1)   1973:7, 1984:2   60.123      60.125      60.124      60.124      -29.880

This table compares the performance of CP(2)-ARFIMA-SV for different values of a0 and b0, where p_kk ∼ Beta(a0, b0). The priors of the other parameters are set according to the main text. logBF: logarithm of the Bayes factor of CP(2)-ARFIMA-SV versus ARFIMA-SV using the corresponding value of α. diff(DIC): difference in DIC between CP(2)-ARFIMA-SV and ARFIMA-SV.
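
To interpret these hyperparameter values, recall that under p_kk ∼ Beta(a0, b0) the prior mean of p_kk is a0/(a0 + b0), and that, given p_kk, the duration of regime k is geometric with mean 1/(1 − p_kk). The following illustrative Python lines (our own, not part of the thesis code) evaluate the prior mean and the implied duration at that mean for the four priors in Table D.1:

# Prior mean of p_kk under Beta(a0, b0) and the regime duration implied by
# that mean (duration given p_kk is geometric with mean 1/(1 - p_kk)).
for a0, b0 in [(0.1, 0.1), (8, 0.1), (20, 0.1), (100, 0.1)]:
    mean_p = a0 / (a0 + b0)
    print(f"Beta({a0}, {b0}): E[p_kk] = {mean_p:.4f}, "
          f"implied duration = {1.0 / (1.0 - mean_p):.1f} periods")

These priors thus range from essentially agnostic, Beta(0.1, 0.1), to strongly favoring long regimes, Beta(100, 0.1), yet the estimated break dates are identical across all of them.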

D.2. Sensitivity of PG-AS with respect to M

We often find that the choice of M is important, as it ensures that the estimate of h_{1:T} is neither too jittery nor too imprecise. At the same time, increasing M also increases the computation time. Therefore, it is important to find a reasonable value of M that avoids both problems. In the following, we experiment with different values of M to examine their effect on estimation results. We do this by re-estimating the SV model using the DJIA data for M = 2, 10, 100 and 1000.

We report parameter estimates of the SV model using the above mentioned numbers of particles in Table D.2. Besides these estimates, we also report the inefficiency factors (RB) of the parameters and of h_{1:T} for each case, see Figure D.1. Furthermore, we compute Geweke's convergence statistics and present the estimation time in seconds for each M. In each case, we sample N = 20000 draws from p(θ, h_{1:T} | Y_T) after a burn-in of 1000.

Overall, we see that PG-AS performs very well: parameter estimates are very similar regardless of the value of M. Furthermore, the choice of M = 100 is very sensible. In fact, we obtain almost identical results for M = 10 and M = 100, although the inefficiency factors decrease as we move to M = 100. For instance, in Figure D.1, for M = 10, 75% of the elements of h_{1:T} have inefficiency factors less than 8, while for M = 100 this number is close to 4. Compared to M = 100, we do not obtain any significant gains in RB for M = 1000, whereas the computation time increases considerably. From this point of view, M = 1000 seems computationally very demanding.
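
For reference, the inefficiency factor of a chain is computed here as RB = 1 + 2 Σ_{l=1}^{B} w_l ρ_l, where ρ_l is the lag-l sample autocorrelation of the chain and the w_l are kernel weights. A minimal Python sketch, assuming a Bartlett kernel as one common choice (this appendix only fixes the bandwidth, B = 100, not the kernel):

import numpy as np

def inefficiency_factor(draws, B=100):
    # RB = 1 + 2 * sum_{l=1}^{B} (1 - l/(B+1)) * rho_l  (Bartlett weights)
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = (x @ x) / n                   # autocovariance at lag 0
    rb = 1.0
    for l in range(1, B + 1):
        acov = (x[l:] @ x[:-l]) / n     # autocovariance at lag l
        rb += 2.0 * (1.0 - l / (B + 1)) * (acov / var)
    return rb

Applying such a function to each stored chain, i.e. the parameters and every element of h_{1:T}, produces inefficiency factors of the kind summarized in Table D.2 and Figure D.1.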

Figure D.1.: Sensitivity of the PG-AS sampler with respect to M

[Box plots of the inefficiency factors of h_{1:T}: (a) PG-AS, M = 2; (b) PG-AS, M = 10; (c) PG-AS, M = 100; (d) PG-AS, M = 1000.]

Graphs (a)-(d): box plots of the inefficiency factors of h1:T using the corresponding number of particles.


Table D.2.: Sensitivity of PG-AS with respect to M , DJIA daily returns

Parameter   mean     std. dev   5%-tile   95%-tile   RB       Geweke

M = 2
µ           0.088    (0.019)    0.057     0.119      6.091    0.800
µ_h        -0.126    (0.289)   -0.587     0.342      1.887   -0.532
φ_h         0.980    (0.006)    0.969     0.989      22.883   1.175
σ²_h        0.051    (0.011)    0.035     0.070      61.345  -1.446
log(ML), α = 0.99: -2494.392
DIC: 4969.476
Time (seconds): 8778.005

M = 10
µ           0.089    (0.019)    0.057     0.120      2.458    0.186
µ_h        -0.120    (0.299)   -0.589     0.362      1.280    0.309
φ_h         0.980    (0.006)    0.969     0.989      17.867  -0.134
σ²_h        0.050    (0.011)    0.034     0.069      48.682   0.629
log(ML), α = 0.99: -2490.604
DIC: 4961.471
Time (seconds): 9108.450

M = 100
µ           0.089    (0.019)    0.058     0.120      1.804    1.103
µ_h        -0.123    (0.293)   -0.590     0.338      1.050    0.757
φ_h         0.979    (0.006)    0.968     0.989      17.703  -1.691
σ²_h        0.052    (0.012)    0.035     0.073      46.176   1.719
log(ML), α = 0.99: -2488.960
DIC: 4958.287
Time (seconds): 12227.616

M = 1000
µ           0.088    (0.019)    0.057     0.119      1.924   -0.899
µ_h        -0.120    (0.296)   -0.587     0.352      0.982   -1.522
φ_h         0.979    (0.006)    0.969     0.989      17.550   0.021
σ²_h        0.051    (0.011)    0.034     0.071      45.178   0.331
log(ML), α = 0.99: -2488.923
DIC: 4958.711
Time (seconds): 41303.696

RB: inefficiency factor (using a bandwidth, B, of 100). Geweke: Geweke's convergence statistic. log(ML): log-marginal likelihood. DIC: deviance information criterion.


Bibliography

[1] Abanto-Valle, C. A., D. Bandyopadhyay, V. H. Lachos, and I. Enriquez. 2010. "Robust Bayesian analysis of heavy-tailed stochastic volatility models using scale mixtures of normal distributions." Computational Statistics and Data Analysis 54(12): 2883-2898.
[2] Andersen, T. G., T. Bollerslev, F. X. Diebold, and H. Ebens. 2001. "The Distribution of Realized Stock Return Volatility." Journal of Financial Economics 61(1): 43-76.
[3] Andersen, T. G., T. Bollerslev, and F. X. Diebold. 2007. "Roughing It Up: Including Jump Components in the Measurement, Modeling and Forecasting of Return Volatility." Review of Economics and Statistics 89(4): 701-720.
[4] Andrieu, C., and A. Doucet. 2002. "Particle filtering for partially observed Gaussian state space models." Journal of the Royal Statistical Society B 64(4): 827-836.
[5] Andrieu, C., A. Doucet, and R. Holenstein. 2010. "Particle Markov chain Monte Carlo methods (with discussion)." Journal of the Royal Statistical Society B 72(3): 1-33.
[6] Baillie, R. T., C. F. Chung, and M. A. Tieslau. 1996. "Analysing inflation by the fractionally integrated ARFIMA-GARCH model." Journal of Applied Econometrics 11(1): 23-40.
[7] Barndorff-Nielsen, O. E., and N. Shephard. 2002a. "Econometric Analysis of Realized Volatility and its Use in Estimating Stochastic Volatility Models." Journal of the Royal Statistical Society B 64(2): 253-280.
[8] Barndorff-Nielsen, O. E., and N. Shephard. 2002b. "Estimating Quadratic Variation using Realised Variance." Journal of Applied Econometrics 17(5): 457-477.
[9] Barndorff-Nielsen, O. E., and N. Shephard. 2004. "Power and Bipower Variation with Stochastic Volatility and Jumps." Journal of Financial Econometrics 2(1): 1-37.
[10] Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard. 2009. "Realised Kernels in Practice: Trades and Quotes." The Econometrics Journal 12(3): 1-32.
[11] Bauwens, L., M. Lubrano, and J. F. Richard. 1999. Bayesian Inference in Dynamic Econometric Models. Oxford University Press.
[12] Bauwens, L., G. Koop, D. Korobilis, and J. V. K. Rombouts. 2011. "A Comparison of Forecasting Models for Macroeconomic Series: The Contribution of Structural Break Models." Working paper, University of Strathclyde.
[13] Beran, J. 1994. Statistics for Long-Memory Processes. Chapman and Hall.
[14] Berg, A., R. Meyer, and J. Yu. 2004. "Deviance Information Criterion for Comparing Stochastic Volatility Models." Journal of Business and Economic Statistics 22(1): 107-120.
[15] Bollerslev, T. 1987. "A Conditional Heteroskedastic Time Series Model for Speculative Prices and Rates of Return." The Review of Economics and Statistics 69(3): 542-547.
[16] Bollerslev, T., U. Kretschmer, C. Pigorsch, and G. E. Tauchen. 2007. "A Discrete-Time Model for Daily S&P 500 Returns and Realized Variations: Jumps and Leverage Effects." Journal of Econometrics 150: 151-166.
[17] Bos, C. S. 2011. "A Bayesian Analysis of Unobserved Component Models Using Ox." Journal of Statistical Software 41(13): 1-24.
[18] Bos, C. S. 2013. GnuDraw. URL http://www.tinbergen.nl/~cbos/gnudraw.html.
[19] Bos, C. S., S. J. Koopman, and M. Ooms. 2012. "Long memory with stochastic variance model: A recursive analysis for U.S. inflation." Computational Statistics and Data Analysis 76(3): 144-157.
[20] Chan, J. 2013. "Moving Average Stochastic Volatility Models with Application to Inflation Forecast." Journal of Econometrics 176(2): 162-172.
[21] Chan, J. 2014. "The Stochastic Volatility in Mean Model with Time-Varying Parameters: An Application to Inflation Modeling." Working paper, Research School of Economics, Australian National University.
[22] Chan, J., and A. L. Grant. 2014. "Issues in Comparing Stochastic Volatility Models Using the Deviance Information Criterion." Working paper, Research School of Economics, Australian National University.
[23] Chan, J., and C. Hsiao. 2013. "Estimation of Stochastic Volatility Models with Heavy Tails and Serial Dependence." In Bayesian Inference in the Social Sciences. John Wiley & Sons, New York.
[24] Chan, J., and I. Jeliazkov. 2009. "Efficient Simulation and Integrated Likelihood Estimation in State Space Models." International Journal of Mathematical Modelling and Numerical Optimisation 1: 101-120.
[25] Chan, N. H., and W. Palma. 1998. "State space modeling of long-memory processes." Annals of Statistics 26(2): 719-740.
[26] Carter, C., and R. Kohn. 1994. "On Gibbs sampling for state space models." Biometrika 81(3): 541-553.
[27] Chib, S. 1995. "Marginal Likelihood from the Gibbs Output." Journal of the American Statistical Association 90(432): 1313-1321.
[28] Chib, S., and E. Greenberg. 1995. "Understanding the Metropolis-Hastings Algorithm." The American Statistician 49(4): 327-335.
[29] Chib, S. 1996. "Calculating posterior distributions and modal estimates in Markov mixture models." Journal of Econometrics 75(1): 79-97.
[30] Chib, S. 1998. "Estimation and Comparison of Multiple Change-Point Models." Journal of Econometrics 86(2): 221-241.
[31] Chib, S., F. Nardari, and N. Shephard. 2002. "Markov chain Monte Carlo methods for stochastic volatility models." Journal of Econometrics 108(2): 281-316.
[32] Cogley, T., and T. J. Sargent. 2005. "Drifts and volatilities: Monetary policies and outcomes in the post WWII US." Review of Economic Dynamics 8(2): 262-302.
[33] Corsi, F., U. Kretschmer, S. Mittnik, and C. Pigorsch. 2008. "The Volatility of Realized Volatility." Econometric Reviews 27(1-3): 46-78.
[34] Corsi, F. 2009. "A Simple Approximate Long-Memory Model of Realized Volatility." Journal of Financial Econometrics 7(2): 174-196.
[35] Creal, D. 2012. "A survey of sequential Monte Carlo methods for economics and finance." Econometric Reviews 31(3): 245-296.
[36] Del Moral, P. 2004. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Springer.
[37] Diebold, F. X., and A. Inoue. 2001. "Long Memory and Regime Switching." Journal of Econometrics 105: 131-159.
[38] Doornik, J. A. 2009. An Object-Oriented Matrix Language Ox 6. Timberlake Consultants Press.
[39] Doucet, A., S. J. Godsill, and C. Andrieu. 2000. "On sequential Monte Carlo sampling methods for Bayesian filtering." Statistics and Computing 10(3): 197-208.
[40] Doucet, A., and A. Johansen. 2011. "A tutorial on particle filtering and smoothing: Fifteen years later." In The Oxford Handbook of Nonlinear Filtering, D. Crisan and B. Rozovsky, Eds. Oxford University Press.
[41] Durbin, J., and S. J. Koopman. 2002. "A simple and efficient simulation smoother for state space time series analysis." Biometrika 89(3): 603-616.
[42] Eisenstat, E., and R. W. Strachan. 2014. "Modelling inflation volatility." CAMA Working Paper 24.
[43] Flury, T., and N. Shephard. 2011. "Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models." Econometric Theory 27(5): 933-956.
[44] Fouque, J. P., C. H. Han, and M. Molina. 2010. "MCMC Estimation of Multiscale Stochastic Volatility Models." Handbook of Quantitative Finance and Risk Management: 1109-1120.
[45] Gerlach, R., C. Carter, and R. Kohn. 2000. "Efficient Bayesian Inference for Dynamic Mixture Models." Journal of the American Statistical Association 95: 819-828.
[46] Gelfand, A., and D. Dey. 1994. "Bayesian Model Choice: Asymptotics and Exact Calculations." Journal of the Royal Statistical Society B 56(3): 501-514.
[47] Geweke, J. 2005. Contemporary Bayesian Econometrics and Statistics. Wiley.
[48] Geweke, J., and C. Whiteman. 2006. "Bayesian Forecasting." In G. Elliott, C. Granger, and A. Timmermann (eds.), Handbook of Economic Forecasting, vol. 1. New York: Elsevier.
[49] Giordani, P., and R. Kohn. 2008. "Efficient Bayesian Inference for Multiple Change-Point and Mixture Innovation Models." Journal of Business and Economic Statistics 26(1): 66-77.
[50] Gordon, S., and J. Maheu. 2008. "Learning, Forecasting and Structural Breaks." Journal of Applied Econometrics 23(5): 553-583.
[51] Grassi, S., and T. Proietti. 2010. "Has the Volatility of U.S. Inflation Changed and How?" Journal of Time Series Econometrics 2(1): Article 6.
[52] Groen, J., R. Paap, and F. Ravazzolo. 2012. "Real-time Inflation Forecasting in a Changing World." Journal of Business and Economic Statistics 31(1): 29-44.
[53] Ghysels, E., P. Santa-Clara, and R. Valkanov. 2006. "Predicting Volatility: Getting the Most out of Return Data Sampled at Different Frequencies." Journal of Econometrics 131(1-2): 59-95.
[54] Hamilton, J. D. 1989. "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica 57(2): 357-384.
[55] Hansen, P. R., and A. Lunde. 2006. "Realized Variance and Market Microstructure Noise." Journal of Business and Economic Statistics 24(2): 127-161.
[56] Hansen, P. R., A. Lunde, and J. M. Nason. 2011. "The Model Confidence Set." Econometrica 79(2): 453-497.
[57] Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
[58] Huang, X., and G. Tauchen. 2005. "The Relative Contribution of Jumps to Total Price Variance." Journal of Financial Econometrics 3(4): 456-499.
[59] Jacquier, E., N. G. Polson, and P. E. Rossi. 1994. "Bayesian Analysis of Stochastic Volatility Models." Journal of Business and Economic Statistics 12: 371-417.
[60] Jacquier, E., N. G. Polson, and P. E. Rossi. 2004. "Bayesian analysis of stochastic volatility models with fat-tails and correlated errors." Journal of Econometrics 122(1): 185-212.
[61] Kass, R. E., and A. E. Raftery. 1995. "Bayes Factors and Model Uncertainty." Journal of the American Statistical Association 90: 773-795.
[62] Kim, S., N. Shephard, and S. Chib. 1998. "Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models." Review of Economic Studies 65(3): 361-393.
[63] Kim, C. J., and C. R. Nelson. 1999. State-Space Models with Regime Switching: Classical and Gibbs-sampling Approaches with Applications. MIT Press.
[64] Kim, C. J., and C. R. Nelson. 1999. "Has the US economy become more stable? A Bayesian approach based on a Markov-switching model of the business cycle." Review of Economics and Statistics 81(4): 608-616.
[65] Kim, C. J., C. R. Nelson, and J. Piger. 2004. "The less-volatile US economy: A Bayesian investigation of timing, breadth, and potential explanations." Journal of Business and Economic Statistics 22(1): 80-93.
[66] Kim, C. J., J. Morley, and C. Nelson. 2005. "The Structural Break in the Equity Premium." Journal of Business and Economic Statistics 23(2): 181-191.
[67] Koop, G. 2003. Bayesian Econometrics. John Wiley & Sons Ltd.
[68] Koop, G., and D. Korobilis. 2010. "Bayesian Multivariate Time Series Methods for Empirical Macroeconomics." Foundations and Trends in Econometrics 3: 267-358.
[69] Koop, G., E. Ley, J. Osiewalski, and M. Steel. 1997. "Bayesian analysis of long memory and persistence using ARFIMA models." Journal of Econometrics 76(1-2): 149-169.
[70] Koop, G., R. Leon-Gonzalez, and R. W. Strachan. 2010. "Dynamic probabilities of restrictions in state space models: an application to the Phillips curve." Journal of Business and Economic Statistics 28(3): 370-379.
[71] Koopman, S. J., and E. H. Uspensky. 2002. "The stochastic volatility in mean model: empirical evidence from international stock markets." Journal of Applied Econometrics 17(6): 667-689.
[72] Koopman, S. J., B. Jungbacker, and E. Hol. 2005. "Forecasting Daily Variability of the S&P 100 Stock Index Using Historical, Realised and Implied Volatility Measurements." Journal of Empirical Finance 12(3): 445-475.
[73] Lancaster, T. 2004. Introduction to Modern Bayesian Econometrics. Wiley-Blackwell.
[74] Lindsten, F., M. I. Jordan, and T. B. Schon. 2012. "Ancestor Sampling for Particle Gibbs." Advances in Neural Information Processing Systems (NIPS) 25: 2600-2608.
[75] Lindsten, F., M. I. Jordan, and T. B. Schon. 2014. "Particle Gibbs with Ancestor Sampling." Journal of Machine Learning Research 15: 2145-2184.
[76] Lindsten, F., and T. B. Schon. 2013. "Backward Simulation Methods for Monte Carlo Statistical Inference." Foundations and Trends in Machine Learning 6(1): 1-14.
[77] Liu, J. S., and R. Chen. 1998. "Sequential Monte Carlo Methods for Dynamic Systems." Journal of the American Statistical Association 93(443): 1032-1044.
[78] Liu, C., and J. Maheu. 2008. "Are There Structural Breaks in Realized Volatility?" Journal of Financial Econometrics 6(3): 326-360.
[79] Liu, C., and J. Maheu. 2009. "Forecasting Realized Volatility: A Bayesian Model Averaging Approach." Journal of Applied Econometrics 24(5): 709-733.
[80] Malik, S., and M. K. Pitt. 2011. "Modelling stochastic volatility with leverage and jumps: a simulated maximum likelihood approach via particle filtering." Working paper, University of Warwick.
[81] Marcellino, M., J. H. Stock, and M. W. Watson. 2005. "A Comparison of Direct and Iterated AR Methods for Forecasting Macroeconomic Series h-Steps Ahead." Journal of Econometrics 135(1-2): 499-526.
[82] McAleer, M., and M. Medeiros. 2008. "Realized volatility: a review." Econometric Reviews 27(1-3): 10-45.
[83] Muller, U. A., M. M. Dacorogna, R. D. Dave, R. B. Olsen, O. V. Pictet, and J. E. von Weizsacker. 1997. "Volatilities of different time resolutions - Analyzing the dynamics of market components." Journal of Empirical Finance 4(2-3): 213-239.
[84] Nakajima, J., and Y. Omori. 2012. "Stochastic volatility model with leverage and asymmetrically heavy-tailed error using GH skew Student's t-distribution." Computational Statistics and Data Analysis 56(11): 3690-3704.
[85] Omori, Y., S. Chib, N. Shephard, and J. Nakajima. 2007. "Stochastic volatility with leverage: Fast and efficient likelihood inference." Journal of Econometrics 135(1-2): 499-526.
[86] Pesaran, H., D. Pettenuzzo, and A. Timmermann. 2006. "Forecasting Time Series Subject to Multiple Structural Breaks." Review of Economic Studies 73(4): 1057-1084.
[87] Pitt, M. K., and N. Shephard. 1999. "Filtering via Simulation: Auxiliary Particle Filters." Journal of the American Statistical Association 94: 590-599.
[88] Primiceri, G. E. 2005. "Time varying structural vector autoregressions and monetary policy." Review of Economic Studies 72(3): 821-852.
[89] Raggi, D., and S. Bordignon. 2012. "Long Memory and Nonlinearities in Realized Volatility: A Markov Switching Approach." Computational Statistics and Data Analysis 56(11): 3730-3742.
[90] Robert, C., and G. Casella. 1999. Monte Carlo Statistical Methods. Springer, Berlin.
[91] Shephard, N. 1996. "Statistical Aspects of ARCH and Stochastic Volatility." In Time Series Models in Econometrics, Finance and Other Fields. Chapman and Hall.
[92] Sims, C. A., D. F. Waggoner, and T. Zha. 2008. "Methods for Inference in Large Multiple-Equation Markov-Switching Models." Journal of Econometrics 146(2): 255-274.
[93] So, M. K. P., C. W. S. Chen, and M. Chen. 2005. "A Bayesian Threshold Nonlinearity Test for Financial Time Series." Journal of Forecasting 24(1): 61-75.
[94] Stock, J. H., and M. W. Watson. 2007. "Why Has U.S. Inflation Become Harder to Forecast?" Journal of Money, Credit, and Banking 39(1): 3-34.
[95] Spiegelhalter, D., N. Best, B. Carlin, and A. van der Linde. 2002. "Bayesian Measures of Model Complexity and Fit (with comments)." Journal of the Royal Statistical Society B 64(4): 583-639.
[96] Watanabe, T., and Y. Omori. 2004. "A Multi-move Sampler for Estimating Non-Gaussian Time Series Models: Comments on Shephard and Pitt (1997)." Biometrika 91(1): 246-248.
[97] Whiteley, N., C. Andrieu, and A. Doucet. 2010. "Efficient Bayesian Inference for Switching State-Space Models using Particle Markov chain Monte Carlo methods." Bristol Statistics Research Report 10:04.
[98] Zellner, A. 1986. "Bayesian Estimation and Prediction Using Asymmetric Loss Functions." Journal of the American Statistical Association 81: 446-451.


DEPARTMENT OF ECONOMICS AND BUSINESS AARHUS UNIVERSITY

SCHOOL OF BUSINESS AND SOCIAL SCIENCES www.econ.au.dk

PhD Theses since 1 July 2011

2011-4 Anders Bredahl Kock: Forecasting and Oracle Efficient Econometrics
2011-5 Christian Bach: The Game of Risk
2011-6 Stefan Holst Bache: Quantile Regression: Three Econometric Studies
2011:12 Bisheng Du: Essays on Advance Demand Information, Prioritization and Real Options in Inventory Management
2011:13 Christian Gormsen Schmidt: Exploring the Barriers to Globalization
2011:16 Dewi Fitriasari: Analyses of Social and Environmental Reporting as a Practice of Accountability to Stakeholders
2011:22 Sanne Hiller: Essays on International Trade and Migration: Firm Behavior, Networks and Barriers to Trade
2012-1 Johannes Tang Kristensen: From Determinants of Low Birthweight to Factor-Based Macroeconomic Forecasting
2012-2 Karina Hjortshøj Kjeldsen: Routing and Scheduling in Liner Shipping
2012-3 Soheil Abginehchi: Essays on Inventory Control in Presence of Multiple Sourcing
2012-4 Zhenjiang Qin: Essays on Heterogeneous Beliefs, Public Information, and Asset Pricing
2012-5 Lasse Frisgaard Gunnersen: Income Redistribution Policies
2012-6 Miriam Wüst: Essays on early investments in child health
2012-7 Yukai Yang: Modelling Nonlinear Vector Economic Time Series
2012-8 Lene Kjærsgaard: Empirical Essays of Active Labor Market Policy on Employment
2012-9 Henrik Nørholm: Structured Retail Products and Return Predictability
2012-10 Signe Frederiksen: Empirical Essays on Placements in Outside Home Care
2012-11 Mateusz P. Dziubinski: Essays on Financial Econometrics and Derivatives Pricing
2012-12 Jens Riis Andersen: Option Games under Incomplete Information
2012-13 Margit Malmmose: The Role of Management Accounting in New Public Management Reforms: Implications in a Socio-Political Health Care Context
2012-14 Laurent Callot: Large Panels and High-dimensional VAR
2012-15 Christian Rix-Nielsen: Strategic Investment
2013-1 Kenneth Lykke Sørensen: Essays on Wage Determination
2013-2 Tue Rauff Lind Christensen: Network Design Problems with Piecewise Linear Cost Functions
2013-3 Dominyka Sakalauskaite: A Challenge for Experts: Auditors, Forensic Specialists and the Detection of Fraud
2013-4 Rune Bysted: Essays on Innovative Work Behavior
2013-5 Mikkel Nørlem Hermansen: Longer Human Lifespan and the Retirement Decision
2013-6 Jannie H.G. Kristoffersen: Empirical Essays on Economics of Education
2013-7 Mark Strøm Kristoffersen: Essays on Economic Policies over the Business Cycle
2013-8 Philipp Meinen: Essays on Firms in International Trade
2013-9 Cédric Gorinas: Essays on Marginalization and Integration of Immigrants and Young Criminals – A Labour Economics Perspective
2013-10 Ina Charlotte Jäkel: Product Quality, Trade Policy, and Voter Preferences: Essays on International Trade
2013-11 Anna Gerstrøm: World Disruption - How Bankers Reconstruct the Financial Crisis: Essays on Interpretation
2013-12 Paola Andrea Barrientos Quiroga: Essays on Development Economics
2013-13 Peter Bodnar: Essays on Warehouse Operations
2013-14 Rune Vammen Lesner: Essays on Determinants of Inequality
2013-15 Peter Arendorf Bache: Firms and International Trade
2013-16 Anders Laugesen: On Complementarities, Heterogeneous Firms, and International Trade
2013-17 Anders Bruun Jonassen: Regression Discontinuity Analyses of the Disincentive Effects of Increasing Social Assistance
2014-1 David Sloth Pedersen: A Journey into the Dark Arts of Quantitative Finance
2014-2 Martin Schultz-Nielsen: Optimal Corporate Investments and Capital Structure
2014-3 Lukas Bach: Routing and Scheduling Problems - Optimization using Exact and Heuristic Methods
2014-4 Tanja Groth: Regulatory impacts in relation to a renewable fuel CHP technology: A financial and socioeconomic analysis
2014-5 Niels Strange Hansen: Forecasting Based on Unobserved Variables
2014-6 Ritwik Banerjee: Economics of Misbehavior
2014-7 Christina Annette Gravert: Giving and Taking – Essays in Experimental Economics
2014-8 Astrid Hanghøj: Papers in purchasing and supply management: A capability-based perspective
2014-9 Nima Nonejad: Essays in Applied Bayesian Particle and Markov Chain Monte Carlo Techniques in Time Series Econometrics


ISBN: 9788793195059

