Stochastic Vol Forecasting

Stochastic Volatility Forecasting for Emerging Markets

Swati Mital

University of Oxford

April 11, 2016

Abstract

We forecast one-step-ahead volatility for the MSCI Emerging Markets Index using a Stochastic Volatility model that is estimated with a Kalman filtering technique. The stochastic model is evaluated against the popular generalized autoregressive conditional heteroscedasticity (GARCH) model. The Stochastic Differential Equations for the model are derived and linearized into a State Space form that can be solved with a Kalman Filter. The source code in R is provided.

1 Introduction

The ability to estimate volatility effectively is key to understanding financial markets and predicting returns. Unlike asset returns, volatility is not directly observable in the market. In other words, volatility is a latent variable. Most of the academic literature on this subject is dedicated to estimating volatility using various econometric-style models (ARCH, GARCH, IGARCH, EGARCH) [1]. The focus of this article is estimating volatility using a stochastic model implemented with a State Space filtering technique.

The MSCI Emerging Markets Index¹ is a free float-adjusted market capitalization index designed to capture large- and mid-cap representation across 23 Emerging Market countries². Several funds hold emerging markets portfolios for their long-term growth potential, capitalizing on the increasing consumption of the middle classes in these markets. The nature of the Emerging Markets means that the index is highly volatile. Since 2002 the index has lost value in six calendar years and gained in eight, with the worst annual return in the financial crisis of 2008, when it plummeted by 53.33%, followed by an annual gain of 78.51% the following year.

The highly fluctuating nature of the Emerging Markets index makes it an interesting case for volatility estimation. In this article, the focus is on the daily MSCI Emerging Markets (EM) USD returns for estimation of the one-day-ahead volatility using a Stochastic Volatility (SV) model. The fit and forecasts of the SV model are compared to those of an autoregressive GARCH model that is trained on the same dataset.

The implementation has been done using R version 3.2.3 and the source code is available to download from GitHub³.

¹ https://www.msci.com/resources/

² EM countries include: Brazil, Chile, China, Colombia, Czech Republic, Egypt, Greece, Hungary, India, Indonesia, Korea, Malaysia, Mexico, Peru, Philippines, Poland, Russia, Qatar, South Africa, Taiwan, Thailand, Turkey and United Arab Emirates.

³ https://github.com/1000084/AdvancedFinancialDataAnalysis


2 Stochastic Volatility Model

Some of the empirically observed stylized facts for volatility are clustering and persistence, leptokurtosis and mean reversion [2]. We follow the seminal paper by Harvey, Ruiz and Shephard [3], who describe a stochastic volatility model that attempts to capture these stylized facts. In this section we derive the master equations for the stochastic volatility model, starting from a lognormal asset return process and an Ornstein-Uhlenbeck process for the logarithm of the squared volatility.

Let $S_t$ represent the value of the MSCI Index at time $t$, $\mu$ the mean return and $\sigma_t$ the volatility at time $t$. The log-variance $\ln\sigma_t^2$ follows a mean-reverting process with mean reversion speed $\kappa$, mean reversion level $\nu$ and volatility $\gamma$.

We get the following Stochastic Differential Equations, where the Brownian Motions $W_t$ and $B_t$ are correlated with factor $\rho$.

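$$
\begin{aligned}
\frac{dS_t}{S_t} &= \mu\,dt + \sigma_t\,dW_t \\
d\ln\sigma_t^2 &= \kappa\left(\nu - \ln\sigma_t^2\right)dt + \gamma\,dB_t \\
d\langle W_t, B_t\rangle &= \rho\,dt
\end{aligned}
\tag{1}
$$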

The first step is to solve the Geometric Brownian Motion and represent it in terms of log returns. We discretize by selecting $\Delta = \frac{1}{252}$, where 252 is approximately the number of business days in a year.

$$
\ln S_{t+\Delta} - \ln S_t = \left(\mu - \frac{\sigma_t^2}{2}\right)\Delta + \sigma_t W_t
\tag{2}
$$

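We substitute $x_t$ for the log return, set $h_t = \ln\sigma_t^2$ and $\left(\mu - \frac{e^{h_t}}{2}\right)\Delta = \alpha_t$. We also know that the Brownian increment over one step, written $W_t$ in Equation (2), can be expressed as $W_t = \sqrt{\Delta}\,U_t$, where $U_t$ is a standard normal random variable.

$$
\begin{aligned}
x_t &= \left(\mu - \frac{\sigma_t^2}{2}\right)\Delta + \sigma_t\sqrt{\Delta}\,U_t \\
&= \left(\mu - \frac{e^{h_t}}{2}\right)\Delta + e^{h_t/2}\sqrt{\Delta}\,U_t \\
&= \alpha_t + e^{h_t/2}\sqrt{\Delta}\,U_t
\end{aligned}
\tag{3}
$$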

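At this point we approximate $\alpha_t$ by an estimator of the mean of the log returns, $\hat\alpha = \Delta\sum_{i=1}^{252} x_i$. We move $\hat\alpha$ to the left-hand side, square both sides of the equation and take natural logarithms to get,

$$
\ln(x_t - \hat\alpha)^2 = h_t + \ln(U_t^2)
\tag{4}
$$

where the constant $\ln\Delta$ produced by squaring $\sqrt{\Delta}$ has been absorbed into $h_t$, so that from here on $h_t$ denotes the log of the daily variance.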

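$\ln(U_t^2)$ follows a log-$\chi^2$ distribution with one degree of freedom, which has a known expectation of approximately $-1.27$ and a variance of $\pi^2/2 \approx 4.93$. However, these theoretical moments can only be replicated with a very large number of samples. To take this into account we add and subtract $E[\ln(U_t^2)]$ in Equation (4) and introduce an intercept $\eta_0$ that is estimated from the data.

$$
\begin{aligned}
\ln(x_t - \hat\alpha)^2 &= E[\ln(U_t^2)] + h_t + \left[\ln(U_t^2) - E[\ln(U_t^2)]\right] \\
&= \eta_0 - 1.27 + h_t + \xi_t
\end{aligned}
\tag{5}
$$

where $\xi_t = \ln(U_t^2) + 1.27$ has mean zero and a variance of $\pi^2/2 \approx 4.93$.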
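The moments quoted for $\ln(U_t^2)$ can be checked numerically. The following is a minimal Python sketch (not the R code that accompanies the paper); the function name, sample size and seed are illustrative assumptions.

```python
import math
import random

def log_chi2_moments(n_draws, seed=0):
    """Sample mean and variance of ln(U^2) for U ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = [math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    return mean, var

# Compare the simulated moments with the theoretical values in the text
mean, var = log_chi2_moments(200_000)
theoretical_var = math.pi ** 2 / 2.0
```

With a few hundred thousand draws the sample mean should sit close to -1.27 and the sample variance close to $\pi^2/2$; with only a few hundred draws the deviations are noticeably larger, which is the finite-sample concern raised above.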

We now focus on the mean-reverting part of Equation (1). A mean-reverting stochastic differential equation can be solved by choosing a suitable integrating factor. In this case, we select an integrating factor of $e^{\kappa t}$. For $s < t$ this gives us,

$$
h_t = h_s e^{-\kappa(t-s)} + \nu\left\{1 - e^{-\kappa(t-s)}\right\} + \gamma\int_s^t e^{-\kappa(t-u)}\,dB_u
\tag{6}
$$

We follow a similar discretization scheme as before and substitute for the constant factors to get,

$$
h_{t+\Delta} = \phi + \beta h_t + \zeta_t
\tag{7}
$$

where $\zeta_t$ is Gaussian white noise with mean 0 whose variance is estimated, $\phi = \nu\left\{1 - e^{-\kappa\Delta}\right\}$ and $\beta = e^{-\kappa\Delta}$.
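The mapping between the continuous-time parameters $(\kappa, \nu)$ and the discrete coefficients $(\phi, \beta)$ can be written out directly. This is a small Python sketch under the definitions above; the function names and the example values $\kappa = 5$, $\nu = -9$ are hypothetical and chosen only for illustration.

```python
import math

DELTA = 1.0 / 252.0  # one business day expressed in years, as in the text

def discretize_ou(kappa, nu, delta=DELTA):
    """Map the OU parameters (kappa, nu) to the AR(1) coefficients (phi, beta)."""
    beta = math.exp(-kappa * delta)
    phi = nu * (1.0 - beta)
    return phi, beta

def recover_ou(phi, beta, delta=DELTA):
    """Invert the mapping for 0 < beta < 1: recover (kappa, nu) from (phi, beta)."""
    kappa = -math.log(beta) / delta
    nu = phi / (1.0 - beta)
    return kappa, nu

# Round trip with made-up values
phi, beta = discretize_ou(kappa=5.0, nu=-9.0)
kappa, nu = recover_ou(phi, beta)
```

The inverse mapping is useful after estimation, since the filter is fitted in terms of $\phi$ and $\beta$ while mean reversion speed and level are easier to interpret.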

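In summary, Equations (5) and (7) give us the Master Equations for the Stochastic Volatility model, where $y_t = x_t - \hat\alpha$ denotes the demeaned log return.

$$
\begin{aligned}
\ln y_t^2 &= \eta_0 - 1.27 + h_t + \xi_t \\
h_{t+\Delta} &= \phi + \beta h_t + \zeta_t
\end{aligned}
\tag{8}
$$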

As we will see in later sections, this Stochastic Volatility model is able to capture the excess kurtosis and clustering properties of the volatility process. In addition, the Ornstein-Uhlenbeck nature of the volatility ensures mean reversion.
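To illustrate the fat-tail property, one can simulate the discrete model and measure the sample excess kurtosis of the simulated returns. The sketch below is in Python and uses made-up parameter values ($\phi = -0.18$, $\beta = 0.98$, standard deviation of $\zeta_t$ equal to 0.125); it treats $h_t$ as a daily log-variance and sets the return mean to zero, both of which are simplifying assumptions.

```python
import math
import random

def simulate_sv(n, phi, beta, sd_zeta, seed=0):
    """Simulate y_t = exp(h_t / 2) * U_t with h_t following the AR(1) in Equation (7)."""
    rng = random.Random(seed)
    h = phi / (1.0 - beta)  # start at the stationary mean of h
    ys = []
    for _ in range(n):
        ys.append(math.exp(h / 2.0) * rng.gauss(0.0, 1.0))
        h = phi + beta * h + rng.gauss(0.0, sd_zeta)
    return ys

def excess_kurtosis(xs):
    """Sample excess kurtosis (0 for a normal distribution)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

returns = simulate_sv(50_000, phi=-0.18, beta=0.98, sd_zeta=0.125)
kurt = excess_kurtosis(returns)
```

Even though each shock $U_t$ is normal, the random variance mixes normals of different scales, so the simulated returns show positive excess kurtosis.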

The estimation of Equation (8) is tricky because, unlike in a GARCH model, the one-step-ahead volatility is not a known function of past observations. Harvey and Shephard [4] proposed estimating the Stochastic Volatility model using a Quasi-Maximum Likelihood (QML) procedure by transforming Equation (8) into a linear state space form. This allows estimation of the parameters $\phi$, $\beta$, $\eta_0$ and the variances of $\xi_t$ and $\zeta_t$ by treating the disturbances as normal and maximizing the prediction-error decomposition form of the likelihood obtained via the Kalman Filter.

3 Kalman Filter

The State Space model has its origin in Control Theory and its dynamics are given by Equation (9). The notation in this section follows that of Koopman et al. [5] and should not be confused with that of the model described in the previous section.

$$
\begin{aligned}
y_t &= d_t + Z_t\alpha_t + \varepsilon_t \\
\alpha_t &= c_t + T_t\alpha_{t-1} + R_t\eta_t
\end{aligned}
\tag{9}
$$

where $\eta_t \sim N(0, Q_t)$ and $\varepsilon_t \sim N(0, H_t)$.

The first equation is the measurement equation that links the observation, $y_t$, with the latent variable, $\alpha_t$, along with a noise term, $\varepsilon_t$, and a deterministic input, $d_t$. The second equation is the state transition equation that updates the latent variable using information from the previous step and some random noise, $\eta_t$. The matrices $Z_t$, $T_t$, $R_t$, $Q_t$, $H_t$ can evolve over time as long as they are known at time $t-1$.

The Kalman Filter is an iterative computational algorithm that is used to forecast the latent variable, $\alpha_t$, and its variance at each step using a combination of measurement and prediction update functions.

1. Time Update Equations: Perform a one-step-ahead forecast of the state variable and compute its variance conditional on the observations up to the last time step. Therefore, we define,

$$
\begin{aligned}
a_{t-1} &= E[\alpha_{t-1} \mid y_0, \ldots, y_{t-1}] \\
P_{t-1} &= E[(\alpha_{t-1} - a_{t-1})(\alpha_{t-1} - a_{t-1})^T]
\end{aligned}
\tag{10}
$$

where $a_t$ is the estimate of the state vector at time $t$ conditional on the observations up to time $t$ and $P_t$ is the corresponding conditional covariance matrix. The one-step-ahead predictions are given by the time update equations,

$$
\begin{aligned}
a_{t|t-1} &= T_t a_{t-1} + c_t \\
P_{t|t-1} &= T_t P_{t-1} T_t^T + R_t Q_t R_t^T
\end{aligned}
\tag{11}
$$

The Kalman Filter requires initial estimates of $a_{t|t-1}$ and $P_{t|t-1}$ to start the iteration.

2. Measurement Update Equations: In this step, we update the conditional estimates of the state vector using the new observation. Let $F_t = Z_t P_{t|t-1} Z_t^T + H_t$, then we can update $a_t$ and $P_t$ as,

$$
\begin{aligned}
a_t &= a_{t|t-1} + P_{t|t-1} Z_t^T F_t^{-1}\,(y_t - Z_t a_{t|t-1} - d_t) \\
P_t &= P_{t|t-1} - P_{t|t-1} Z_t^T F_t^{-1} Z_t P_{t|t-1}
\end{aligned}
\tag{12}
$$
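The two steps above can be sketched for the scalar, time-invariant case, which is all the Stochastic Volatility model needs. This is an illustrative Python implementation, not the FKF routine used in the paper; the function name, the convention that `a0` and `P0` describe the state before the first time update, and the toy data are assumptions.

```python
import math

def kalman_filter_scalar(ys, d, Z, H, c, T, R, Q, a0, P0):
    """Scalar Kalman filter with constant system terms.

    Returns the filtered means a_t, the filtered variances P_t and the
    Gaussian log-likelihood from the prediction-error decomposition.
    """
    a, P = a0, P0
    filtered_a, filtered_P, loglik = [], [], 0.0
    for y in ys:
        # Time update, Equation (11)
        a_pred = T * a + c
        P_pred = T * P * T + R * Q * R
        # Measurement update, Equation (12)
        v = y - Z * a_pred - d          # one-step prediction error
        F = Z * P_pred * Z + H          # prediction error variance
        gain = P_pred * Z / F
        a = a_pred + gain * v
        P = P_pred - gain * Z * P_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * F) + v * v / F)
        filtered_a.append(a)
        filtered_P.append(P)
    return filtered_a, filtered_P, loglik

# Local-level toy example: constant observations of 1.0 and a vague prior
a_hist, P_hist, loglik = kalman_filter_scalar(
    ys=[1.0] * 50, d=0.0, Z=1.0, H=1.0, c=0.0, T=1.0, R=1.0, Q=0.0,
    a0=0.0, P0=1e6,
)
```

In the toy example the filtered mean converges to the observed level and the filtered variance shrinks roughly like $1/t$, which is the behaviour expected when the state has no process noise.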

Fernando Tusell provides a detailed comparison of various R implementations of Kalman Filtering in [7]. For this paper, we have chosen the R package FKF version 0.1.3⁴, which appeared on CRAN in March 2012.

4 Time Series Analysis and Pre-Processing

We choose MSCI Emerging Markets Index ETF data covering the last 5 years (11 April 2011 until 02 March 2016) for our study. We download the daily closing prices for this index from Yahoo Finance (ticker: EEM) and convert them into log returns using Equation (13).

$$
r_t = \ln S_t - \ln S_{t-1}
\tag{13}
$$

where $r_t$ is the daily log return computed from the daily closing prices $S_t$.
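Equation (13) translates into a one-line computation. The Python sketch below uses a short list of made-up closing prices; it does not download or reproduce the EEM data.

```python
import math

def log_returns(prices):
    """Daily log returns r_t = ln(S_t) - ln(S_{t-1}) from a list of closing prices."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(prices, prices[1:])]

closes = [100.0, 101.0, 99.5, 100.2]  # hypothetical prices for illustration
returns = log_returns(closes)
```

A series of $n$ prices yields $n - 1$ returns, and log returns add up over time: their sum equals the log of the ratio of the last price to the first.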

Figure 1: MSCI Emerging Markets ETF Daily Prices

Figure 2: MSCI Emerging Markets Daily Log Returns

Figures 1 and 2 plot the MSCI Emerging Markets Index daily price level and log returns during the sample period of the last 5 years. In total, we have 1257 data points for the index. We divide this into in-sample data that we use to train our model and out-of-sample data that is used for testing the model's forecasts. The in-sample data consists of 1131 points and spans the period from 11 April 2011 until 07 October 2015. The descriptive statistics for the MSCI EM return series are shown in Table 1, which reports the mean, median, kurtosis, skewness and standard deviation of the series.

⁴ https://cran.r-project.org/web/packages/FKF/FKF.pdf

Table 1: Descriptive Statistics during 11 April 2011 - 08 April 2016

MSCI Emerging Markets Index ETF Returns

Statistics Value

Mean −0.000320

Median 0.0

Standard Deviation 0.014278

Excess Kurtosis 3.035950

Skewness −0.283108

Maximum 0.060530

Minimum −0.087054

We notice that the mean of the series is very close to 0 and the volatility as measured by standard deviation is 0.01428. The returns are negatively skewed, which is consistent with the index performing badly in recent years. The kurtosis statistic of about 3.04 would be close to that of a normal distribution, and so indicate little leptokurtosis (fat-tailedness), only if it is read as raw kurtosis; read as excess kurtosis, as labelled in Table 1, it lies well above the normal value of 0 and points to fat tails in this dataset.


5 Benchmark Model Selection

Figure 3 below shows the serial autocorrelation in log returns of the Emerging Markets in-sample data. We notice that the ACF dies out after a lag of around 5 in this particular dataset. A popular econometric way of modeling volatility is through a Generalised Autoregressive Conditionally Heteroscedastic, GARCH(p,q), model. We use GARCH as a benchmark to compare the performance of the Stochastic Volatility model.

Consider a time series, $y_t$. The GARCH(p, q) model, where $p$ is the order of the GARCH terms $\sigma^2$ and $q$ is the order of the ARCH terms $y_t^2$, is given by the conditional variance equation below.

$$
\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i y_{t-i}^2 + \sum_{i=1}^{p}\beta_i\sigma_{t-i}^2
\tag{14}
$$

where the series $y_t$ is assumed to have conditional variance $\sigma_t^2$. The distribution of the series is typically chosen to be either normal or Student-t depending on the series' statistical properties.
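The recursion in Equation (14) can be sketched as follows. This Python function is illustrative only: the name is hypothetical, and filling the pre-sample squared observations and variances with a single `initial_variance` is an assumption of the sketch rather than something the paper specifies.

```python
def garch_variance(ys, omega, alphas, betas, initial_variance):
    """Conditional variances sigma_t^2 from Equation (14).

    alphas: ARCH coefficients alpha_1..alpha_q (on squared observations)
    betas:  GARCH coefficients beta_1..beta_p (on past variances)
    Pre-sample squared observations and variances are set to
    initial_variance (an assumption of this sketch).
    """
    q, p = len(alphas), len(betas)
    sigma2 = []
    for t in range(len(ys)):
        value = omega
        for i in range(1, q + 1):
            y2 = ys[t - i] ** 2 if t - i >= 0 else initial_variance
            value += alphas[i - 1] * y2
        for i in range(1, p + 1):
            s2 = sigma2[t - i] if t - i >= 0 else initial_variance
            value += betas[i - 1] * s2
        sigma2.append(value)
    return sigma2

# GARCH(1,1) on two made-up observations
sigma2 = garch_variance([1.0, 2.0], omega=0.1, alphas=[0.2], betas=[0.5],
                        initial_variance=1.0)
```

Unlike the Stochastic Volatility model, each $\sigma_t^2$ here is fully determined by past observations, so no filtering step is needed.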

Figure 3: MSCI Emerging Markets Auto-correlations

The Akaike Information Criterion (AIC) rewards a model for goodness of fit and penalizes the number of free parameters employed in achieving that fit. Lower AIC values are preferred. It is defined as $\mathrm{AIC} = -2\ln L + 2k$, where $L$ is the maximized value of the likelihood with $k$ parameters. Similarly, the Bayesian Information Criterion (BIC) is also a goodness-of-fit measure of a model and is defined as $\mathrm{BIC} = -2\ln L + k\ln(N)$, where $N$ is the number of observations.
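Both criteria are simple functions of the maximized log-likelihood. The Python sketch below uses the standard definitions $\mathrm{AIC} = -2\ln L + 2k$ and $\mathrm{BIC} = -2\ln L + k\ln N$; the log-likelihood value in the example is made up.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion for a model with k free parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, n_obs):
    """Bayesian Information Criterion for k parameters and n_obs observations."""
    return -2.0 * log_likelihood + k * math.log(n_obs)

# Hypothetical fit: log-likelihood of -10 with 2 parameters on 100 observations
example_aic = aic(-10.0, 2)
example_bic = bic(-10.0, 2, 100)
```

For samples with more than about 7 observations, $\ln N > 2$, so BIC penalizes each extra parameter more heavily than AIC.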


Table 2: AIC for GARCH(p,q) model

p/q 1 2 3 4 5

1 -5.90917 -5.90745 -5.90573 -5.90433 -5.90310

2 -5.90785 -5.90904 -5.90598 -5.90671 -5.90487

3 -5.90608 -5.90725 -5.90398 -5.90574 -5.90386

4 -5.90432 -5.90539 -5.90221 -5.90441 -5.90252

5 -5.90266 -5.90343 -5.90042 -5.90252 -5.90075

Table 3: BIC for GARCH(p,q) model

p/q 1 2 3 4 5

1 -5.89138 -5.88521 -5.87904 -5.87320 -5.86751

2 -5.88561 -5.88235 -5.87484 -5.87112 -5.86483

3 -5.87939 -5.87611 -5.86839 -5.86571 -5.85938

4 -5.87318 -5.86981 -5.86218 -5.85993 -5.85359

5 -5.86707 -5.86340 -5.85594 -5.85359 -5.84737

Even though reference [6] gives compelling evidence for selecting the GARCH(1,1) model, we perform our own comparison and find that GARCH(1,1) is indeed the optimal model according to the AIC and BIC values for different combinations of p and q, as shown in Tables 2 and 3. Henceforth, we only discuss the GARCH(1,1) model as a benchmark to evaluate the Stochastic Volatility model.

6 Model Estimation using Kalman Filter

In Section 2, we derived the Stochastic Volatility State Space model that is given by the following measurement and state update equations,

$$
\begin{aligned}
\ln y_t^2 &= \eta_0 - 1.27 + h_t + \xi_t \\
h_{t+\Delta} &= \phi + \beta h_t + \zeta_t
\end{aligned}
\tag{15}
$$

The Quasi-Maximum Likelihood (QML) approach for estimating the stochastic volatility model using the Kalman Filter is proposed in [3] and [4]. Since $\ln(y_t^2)$ is not Gaussian, the Kalman Filter returns minimum mean square linear estimators (MMSLE) for $h_t$ as opposed to minimum mean square estimators (MMSE). For this purpose, we assume that $\xi_t \sim N(0, \pi^2/2)$ and estimate the variance of $\zeta_t$, denoted $\theta^2$.


Using the R package for Kalman Filtering, FKF⁵, we compute the negative log likelihood of the stochastic volatility model, starting from initial values of the parameters. We minimize this objective function, which is equivalent to maximizing the likelihood, using the quasi-Newton BFGS method [8] implemented in the optim function of R's stats package. The in-sample data we use are the squared log returns ranging from 11 April 2011 until 07 October 2015. The fitted parameters returned by the optimization function in R are given in Table 4.
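The objective function can be sketched as a scalar Kalman filter run over the series $\ln y_t^2$. This Python version is a stand-in for the FKF-based R code and makes two assumptions that the paper does not state: the filter is started from the stationary mean and variance of $h_t$, and $\theta$ is passed as a standard deviation. The simulated data and parameter values are made up.

```python
import math
import random

def sv_neg_loglik(params, log_y2):
    """Negative quasi-log-likelihood of Equation (15) via a scalar Kalman filter.

    params = (eta0, phi, beta, theta); theta is the standard deviation of zeta_t.
    Requires |beta| < 1 so that the stationary initialization exists.
    """
    eta0, phi, beta, theta = params
    d = eta0 - 1.27                  # measurement intercept
    H = math.pi ** 2 / 2.0           # variance of xi_t, treated as Gaussian
    Q = theta ** 2                   # variance of zeta_t
    a = phi / (1.0 - beta)           # predicted state: stationary mean (assumed start)
    P = Q / (1.0 - beta ** 2)        # predicted variance: stationary variance
    nll = 0.0
    for y in log_y2:
        v = y - a - d                # one-step prediction error
        F = P + H                    # prediction error variance
        nll += 0.5 * (math.log(2.0 * math.pi * F) + v * v / F)
        a_filt = a + P / F * v       # measurement update
        P_filt = P - P * P / F
        a = phi + beta * a_filt      # time update for the next step
        P = beta ** 2 * P_filt + Q
    return nll

# Simulate ln(y_t^2) from the model with made-up parameters (eta0 = 0)
rng = random.Random(1)
h, log_y2 = -9.0, []
for _ in range(1000):
    log_y2.append(h + math.log(rng.gauss(0.0, 1.0) ** 2))
    h = -0.18 + 0.98 * h + rng.gauss(0.0, 0.125)

nll_true = sv_neg_loglik((0.0, -0.18, 0.98, 0.125), log_y2)
nll_off = sv_neg_loglik((0.0, 0.0, 0.98, 0.125), log_y2)
```

A numerical optimizer would then search over `params` for the smallest value of this function; in the example, the parameters used to simulate the data give a lower value than a deliberately wrong intercept.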

Table 4: SV model fitted parameters

Parameters Fit 95% Confidence Interval

η0 0.088506 (−1.046131, 1.223141)

φ -0.164523 (−0.368023, 0.038977)

β 0.981534 (0.965822, 0.997245)

θ 0.124870 (0.071250, 0.178491)

Table 5: GARCH(1,1) model fitted parameters

Fit Std. Error t value Pr(> |t|)

ω 2.598018e-06 1.322631e-06 1.964281 4.949757e-02

α1 7.769311e-02 1.901114e-02 4.086714 4.375255e-05

β1 9.111031e-01 2.111014e-02 43.159494 0.000000e+00

We also fit a GARCH(1,1) model to the in-sample dataset. The fitted parameters of GARCH(1,1) and their standard errors are given in Table 5. The p-values for all three parameters are below 0.05, so for each coefficient we reject the null hypothesis that it is zero at the 5% significance level. Given log returns, $y_t$, the GARCH(1,1) volatility equation as described before is given by,

$$
\sigma_t^2 = \omega + \alpha_1 y_{t-1}^2 + \beta_1\sigma_{t-1}^2
\tag{16}
$$
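As a quick sanity check on the fitted GARCH(1,1) parameters, one can compute the persistence $\alpha_1 + \beta_1$ and the implied long-run variance $\omega / (1 - \alpha_1 - \beta_1)$, a standard property of a stationary GARCH(1,1) model. The Python sketch below plugs in the point estimates from Table 5; annualizing with 252 days is an assumption taken from the rest of the paper.

```python
import math

# Point estimates from Table 5
omega, alpha1, beta1 = 2.598018e-06, 7.769311e-02, 9.111031e-01

persistence = alpha1 + beta1
long_run_variance = omega / (1.0 - persistence)            # daily variance
long_run_annual_vol = math.sqrt(252.0 * long_run_variance)
```

The persistence is just under 1, so shocks to the variance decay slowly, and the implied long-run annualized volatility comes out at roughly 24%.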

The log-variance estimate, $h_t$, from the Stochastic Volatility model is converted to annualized volatility using the formula $\sigma_t = \sqrt{252}\,e^{h_t/2}$. The annualized volatility is compared with the GARCH(1,1) volatilities and the realized historical volatility computed over a 20-business-day window. The plot in Figure 4 shows the three estimates. We notice that during mid-2011 the GARCH and realized volatilities move closely together, whereas the Stochastic Volatility estimate doesn't display similar shocks.
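The conversion formula is easy to apply to the fitted model. As an illustration, the Python sketch below evaluates it at the stationary mean of $h_t$ implied by the point estimates of $\phi$ and $\beta$ in Table 4; using $\phi / (1 - \beta)$ as a representative level is an assumption of this example, not a step taken in the paper.

```python
import math

def annualized_vol(h, trading_days=252):
    """sigma_t = sqrt(252) * exp(h_t / 2) for a daily log-variance h_t."""
    return math.sqrt(trading_days) * math.exp(h / 2.0)

phi, beta = -0.164523, 0.981534      # point estimates from Table 4
long_run_h = phi / (1.0 - beta)      # stationary mean of the AR(1) in Equation (15)
long_run_vol = annualized_vol(long_run_h)
```

Under these assumptions the long-run level of $h_t$ is about -8.9, which corresponds to an annualized volatility of roughly 18%.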

⁵ https://cran.r-project.org/web/packages/FKF/FKF.pdf


Figure 4: MSCI EM Vols Fit Comparison

The AIC and BIC scores of the Stochastic Volatility and GARCH(1,1) models are

given in Table 6 showing lower scores for the GARCH model.

Table 6: AIC and BIC scores of SV and GARCH(1,1) models

GARCH(1,1) Stochastic Vol

AIC -5.909171 -4.460508

BIC -5.891378 -4.442715

7 Volatility Forecasting

We now use the fitted parameters from the Stochastic Volatility model, as returned by the QML estimator based on the Kalman Filter, to perform forecasting on the out-of-sample dataset. The MSCI Emerging Markets Index ETF out-of-sample dataset ranges from 08 October 2015 until 02 March 2016.

The Kalman Filter lends itself naturally to forecasting the state variable, which in this case is the daily log-variance. We use the time update equation of the filter, in particular the state prediction $a_{t|t-1} = T_t a_{t-1} + c_t$ explained in Section 3, to get the one-step-ahead volatility forecast for the MSCI EM Index. We similarly compute the GARCH(1,1) forecast using the fitted parameters from the in-sample data.

Figure 5 shows the comparison between the annualized volatility computed using the GARCH and the Stochastic Volatility methods. It also shows the realized historical annualized volatility computed using a 20-day window. The black dotted line shows the one-step-ahead forecast using the SV model and the red dotted line shows the same forecast using the GARCH(1,1) model.

Figure 5: MSCI EM Annualized Vols Forecast Comparison

The Kalman Filter provides, at each iteration, the forecast covariance matrix for the state vector. This allows us to construct a confidence interval around the one-step-ahead annualized volatilities, as shown in Figure 6 for the first 30 days at the 95% level. As we can see, the prediction error stays relatively constant with each new observation.
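One way such an interval could be built is to form a normal interval for the predicted log-variance and map its endpoints to the volatility scale. The paper does not spell out its construction, so the Python sketch below is an assumption; the filtered values `a_filt = -9.0` and `P_filt = 0.05` are made up, while the model parameters are the point estimates from Table 4.

```python
import math

def forecast_vol_interval(a_filt, P_filt, phi, beta, theta, z=1.96, trading_days=252):
    """One-step-ahead annualized volatility forecast with an approximate interval.

    a_filt, P_filt: filtered mean and variance of the log-variance h_t.
    Returns (lower, point, upper) on the annualized volatility scale.
    """
    h_pred = phi + beta * a_filt                  # time update, mean
    P_pred = beta ** 2 * P_filt + theta ** 2      # time update, variance
    half_width = z * math.sqrt(P_pred)

    def to_vol(h):
        return math.sqrt(trading_days) * math.exp(h / 2.0)

    return to_vol(h_pred - half_width), to_vol(h_pred), to_vol(h_pred + half_width)

lower, point, upper = forecast_vol_interval(
    a_filt=-9.0, P_filt=0.05, phi=-0.164523, beta=0.981534, theta=0.124870
)
```

Because the exponential map is monotone, the interval endpoints keep their order, but the interval is asymmetric around the point forecast on the volatility scale.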

Figure 6: Out Sample Volatility Forecast Confidence Intervals


8 Model Evaluation

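Some popular metrics for measuring forecast accuracy are Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Given estimates of the one-step-ahead volatility at time $t$, $\sigma_t$, these can be defined as,

$$
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{t=1}^{n}(\hat\sigma_t - \sigma_t)^2 \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}(\hat\sigma_t - \sigma_t)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{t=1}^{n}\left|\hat\sigma_t - \sigma_t\right|
\end{aligned}
\tag{17}
$$

where $\hat\sigma = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(r_t - \mu)^2}$ is the realized volatility over a window of $T$ returns with mean $\mu$.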
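The realized volatility and the three error measures in Equation (17) can be sketched in a few lines of Python. The function names are illustrative, and annualizing the rolling standard deviation with 252 days is an assumption based on the paper's description of annualized realized volatility.

```python
import math

def realized_vol(returns, window=20, trading_days=252):
    """Rolling annualized sample standard deviation over `window` returns."""
    out = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        mean = sum(chunk) / window
        var = sum((r - mean) ** 2 for r in chunk) / (window - 1)
        out.append(math.sqrt(trading_days * var))
    return out

def forecast_errors(realized, forecast):
    """MSE, RMSE and MAE of Equation (17) for two equal-length sequences."""
    n = len(realized)
    diffs = [r - f for r, f in zip(realized, forecast)]
    mse = sum(d * d for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    return mse, math.sqrt(mse), mae

# Made-up realized and forecast volatilities
mse, rmse, mae = forecast_errors([0.2, 0.3], [0.1, 0.5])
```

RMSE is on the same scale as the volatilities themselves, while MAE is less sensitive than MSE to a few large misses.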

We compute the realized volatility on a 20-day window for the out-of-sample dataset and compare the forecast measures between the Stochastic Volatility and the GARCH(1,1) models. The model with the least error is generally considered to be the better model. Table 7 shows the forecast statistics for the SV model and the GARCH(1,1) model.

Table 7: Model Evaluation (Equation 17)

        GARCH(1,1)   Stochastic Vol
MSE     0.017533     0.022913
RMSE    0.132412     0.151371
MAE     0.084495     0.091245

An alternate definition of the forecast errors is given by [9]. Since volatility is a latent

variable, they suggest that the forecast accuracy measures can also be defined in terms of

returns, rt, as,

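$$
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{t=1}^{n}\left((r_{t+1} - \bar r)^2 - \sigma_t^2\right)^2 \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left((r_{t+1} - \bar r)^2 - \sigma_t^2\right)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{t=1}^{n}\left|(r_{t+1} - \bar r)^2 - \sigma_t^2\right|
\end{aligned}
\tag{18}
$$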
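The return-based measures in Equation (18) compare each variance forecast with the next squared demeaned return. The Python sketch below is illustrative; taking $\bar r$ as the mean of the supplied returns and aligning `returns[t + 1]` with `variance_forecasts[t]` are assumptions about details the text leaves open.

```python
import math

def return_based_errors(returns, variance_forecasts):
    """MSE, RMSE and MAE of Equation (18).

    returns[t + 1] is compared with variance_forecasts[t], so a forecast
    without a following return is ignored.
    """
    r_bar = sum(returns) / len(returns)
    errors = [
        (returns[t + 1] - r_bar) ** 2 - variance_forecasts[t]
        for t in range(min(len(variance_forecasts), len(returns) - 1))
    ]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return mse, math.sqrt(mse), mae

# Made-up returns with mean zero, scored against forecasts of zero variance
mse, rmse, mae = return_based_errors([0.0, 0.02, -0.02], [0.0, 0.0])
```

Because squared returns are a very noisy proxy for the variance, these measures tend to be dominated by the noise in $r_{t+1}^2$, which is worth keeping in mind when differences between models are small.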


We show the output using this definition of evaluation measure in Table 8. Using

this new definition of the forecast error we see that the Stochastic Volatility performs

marginally better.

Table 8: Model Evaluation (Equation 18)

        GARCH(1,1)   Stochastic Vol
MSE     0.003788     0.002832
RMSE    0.061544     0.053215
MAE     0.0588245    0.052035

9 Conclusion

The Stochastic Volatility model described in this paper is fundamentally motivated by the empirical properties, or stylized facts, of volatility and is commonly used for pricing complex derivatives. However, its accuracy for forecasting the MSCI Emerging Markets Index volatility is questionable.

We have seen that it performs either worse or marginally better than a GARCH(1,1) model under the evaluation criteria in Section 8. Its AIC and BIC scores are also higher than those of the GARCH(1,1) model. However, since volatility is an unobserved variable only implied from index returns, we conclude that the selection of a forecasting technique shouldn't be a binary decision and should take into consideration how the volatility is used in making a financial decision.

In summary, we don't have a strong indicator that would enable us to recommend a complex Stochastic Volatility model for forecasting the volatility of the MSCI Emerging Markets Index.

References

[1] Torben G. Andersen, Tim Bollerslev, Francis X. Diebold and Paul Labys. Modeling

and Forecasting Realized Volatility. Econometrica, 71, 529-626.

[2] Rama Cont. Volatility Clustering in Financial Markets: Empirical Facts and Agent-

Based Models. Long memory in economics, A Kirman and G Teyssiere (eds.),

Springer(2005)

[3] Andrew Harvey, Esther Ruiz and Neil Shephard. Multivariate Stochastic Variance Models. The Review of Economic Studies, 61, 247-264.

[4] Andrew C. Harvey and Neil Shephard. Estimation of an Asymmetric Stochastic Volatility Model for Asset Returns. Journal of Business & Economic Statistics, 14(4):429-434, 1996.

[5] Koopman, S. J., Shephard, N., Doornik, J. A. Statistical algorithms for models in

state space using SsfPack 2.2. Econometrics Journal, Royal Economic Society, vol.

2(1), pages 107-160.

[6] Hansen, P., and Lunde, A. (2004). A Forecast Comparison of Volatility Models: Does

Anything Beat a GARCH(1,1) Model?. Journal of Applied Econometrics, 20, 873-889.

[7] Fernando Tusell. Kalman Filtering in R. Journal of Statistical Software, 2011, Vol. 39, Issue 2.

[8] Nash, J. C. Compact Numerical Methods for Computers. Linear Algebra and Function

Minimisation. Adam Hilger.

[9] Awartani, B.M.A. and V. Corradi. (2005). Predicting the volatility of the S&P-500

stock index via GARCH models: the role of asymmetries. International Journal of

Forecasting, 21, 167-183.
