Volatility modelling

Financial Econometrics, VU Bachelor Econometrie

Charles Bos

Tinbergen Institute & Vrije Universiteit Amsterdam

c.s.bos@vu.nl, 11A91

2 April 2015

FE15 Ts 3, p. 1/31

Overview

- Modelling time-varying variances in time series of financial returns. See Tsay (2010, §3.1-3.9, §3.14 and §3.16), Creal, Koopman, and Lucas (2013).

- Characteristics of financial data
- ML estimation: recap
- Time-varying variance: GAS, GARCH, EGARCH etc.
- Diagnostic testing

Modelling time-varying volatilities in returns.

$$r_t = \log(1 + R_t) = \log(P_t / P_{t-1}) = \log P_t - \log P_{t-1} \quad \text{(continuously compounded return)}$$

$$r_t = \mu_t + a_t \quad \text{(forecastable part + unforecastable error)}$$

1. Forecastable part $\mu_t$: small/negligible, or e.g. ARMA model

2. Unforecastable error part $a_t$:
- disturbance, (un-)conditional expectation zero
- with standard AR(I)MA modelling: $\mathrm{var}(a_t) = \sigma_t^2 \equiv \sigma^2$, fixed
- with financial data, often serial correlation, $\sigma_t^2 = \mathrm{var}(a_t \mid \mathcal{F}_{t-1}) = \mathrm{var}(r_t \mid \mathcal{F}_{t-1})$
- $\mathcal{F}_{t-1}$ is the filtration, the information set at time $t-1$.
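The return definition above can be tried out directly. The following is a minimal sketch, assuming NumPy is available; the function name `log_returns` and the price path are made up for illustration and are not part of the slides.

```python
import numpy as np

def log_returns(prices):
    """Continuously compounded returns r_t = log(P_t) - log(P_{t-1})."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))

# Hypothetical price path, for illustration only
prices = [100.0, 101.0, 99.0, 102.0]
r = log_returns(prices)

# Simple returns R_t = P_t / P_{t-1} - 1, to check r_t = log(1 + R_t)
simple = np.diff(prices) / np.array(prices[:-1])
print(np.allclose(r, np.log1p(simple)))

# Log returns add up over time to the log of the total price ratio
print(np.isclose(r.sum(), np.log(prices[-1] / prices[0])))
```

The additivity over time is one practical reason to work with log returns rather than simple returns.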

S&P 500 volatility and clustering

[Figure: daily S&P 500, 1990/1-2015/3. Panels: Adj Close; Returns with σ(Returns|year); ACF Returns; ACF Sq returns (lags 1 to 30).]

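Intermezzo ML: Simple model + notation

Model (specifics):

$$y \sim N(\mu, \sigma^2) \quad \text{(DGP)}$$

$$f(y \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right) \quad \text{(Density)}$$

Notation (general):

$$L_n(\theta; Y_n) = \prod_{i=1}^{n} f(y_i \mid \theta) \quad \text{(Likelihood)}$$

$$l_n(\theta; Y_n) = \log L_n(\theta; Y_n) = \sum_{i=1}^{n} \log f(y_i \mid \theta) \quad \text{(Log-likelihood)}$$

$$E_{\theta^*} \log f(Y \mid \theta) \equiv l^*(\theta; Y) \quad \text{(Expected log-likelihood)}$$

$$\hat\theta_n = \arg\max_\theta\, l_n(\theta; Y_n) \quad \text{(ML estimator)}$$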
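As a small illustration of the notation, the sketch below evaluates the normal log-likelihood on simulated data and compares the closed-form maximiser (sample mean and variance) with a few other parameter values. It assumes NumPy; the data, seed and the name `normal_loglik` are illustrative choices, not taken from the slides.

```python
import numpy as np

def normal_loglik(theta, y):
    """l_n(theta; Y_n) = sum_i log f(y_i | theta) for y_i ~ N(mu, sigma2)."""
    mu, sigma2 = theta
    y = np.asarray(y, dtype=float)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2))

rng = np.random.default_rng(0)            # simulated data, illustration only
y = rng.normal(loc=0.5, scale=2.0, size=1000)

# For this model the maximiser is known in closed form
theta_hat = (y.mean(), y.var())           # var() divides by n, as ML does
ll_hat = normal_loglik(theta_hat, y)

# Any other parameter value gives a lower (or equal) log-likelihood
others = [(0.0, 4.0), (0.5, 3.0), (theta_hat[0] + 0.1, theta_hat[1])]
for theta in others:
    print(theta, normal_loglik(theta, y) <= ll_hat)
```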

ML: Why would this work?

[Figure: f, log f and log g plotted for y from -6 to 6.]

Lemma (Monahan (2011), L9.1)

$$E_f(\log g(y)) = \int (\log g(y)) f(y)\,dy \le \int (\log f(y)) f(y)\,dy = E_f(\log f(y))$$

(log f is high wherever f(y) is high, so it is highest in expectation)
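The lemma can be checked numerically for a concrete case. The sketch below takes f to be the N(0, 1) density and a few alternative normal densities g, and approximates the integrals with a Riemann sum on a wide grid. The choice of densities and grid is an assumption made for illustration only.

```python
import numpy as np

def normal_logpdf(y, mu, sigma2):
    return -0.5 * np.log(2 * np.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2)

# Grid wide enough that the N(0, 1) mass outside it is negligible
y = np.linspace(-12.0, 12.0, 24001)
dy = y[1] - y[0]
f = np.exp(normal_logpdf(y, 0.0, 1.0))      # the 'true' density f

def expected_log(mu, sigma2):
    """Riemann-sum approximation of E_f[log g(y)] with g = N(mu, sigma2)."""
    return np.sum(normal_logpdf(y, mu, sigma2) * f) * dy

e_ff = expected_log(0.0, 1.0)               # E_f[log f]
alternatives = [(1.0, 1.0), (0.0, 2.0), (-0.5, 0.5)]
for mu, s2 in alternatives:
    print(mu, s2, expected_log(mu, s2) <= e_ff)
```

Each alternative g gives a lower expected log-density than f itself, as the lemma states.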

ML: Why II

Consequence of lemma:

$$l^*(\theta; Y) \le l^*(\theta^*; Y)$$

or: a random θ will never give a better log-likelihood than the 'real' parameters θ*, in expectation.

Law of large numbers:

$$\frac{1}{n}\, l_n(\theta; Y_n) \xrightarrow{\text{LLN}} E_{\theta^*}(\log f(Y \mid \theta)) \equiv l^*(\theta; Y) = \int \log f(y \mid \theta)\, f(y \mid \theta^*)\,dy$$

Hence: the maximum value of $l_n(\theta)$ should correspond, for large $n$, with $n \times l^*(\theta^*)$, so $\hat\theta \approx \theta^*$.

ML: Optimisation

1. Start at $j = 0$, with $\theta \equiv \theta^{(j)}$

2. Approximate with 2nd order Taylor expansion at $\theta$:

$$Q(\theta) \equiv l_n(\theta; Y_n)$$

$$Q(\theta + h) \approx q(h) \equiv Q(\theta) + h^T Q'(\theta) + \frac{1}{2} h^T Q''(\theta) h$$

3. Maximise approximation $q(h)$:

$$q'(h) = Q'(\theta) + Q''(\theta) h = 0 \;\Leftrightarrow\; Q''(\theta) h = -Q'(\theta) \quad \text{or} \quad H h = -g$$

4. Update: $\theta^{(j+1)} = \theta^{(j)} + h$, $j = j + 1$, and repeat from 2. as necessary.
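The four steps can be sketched for the simple model $y \sim N(\mu, v)$ with $\theta = (\mu, v)$, using the analytic gradient and Hessian. This is an illustration under assumptions (NumPy, simulated data, arbitrary starting values), not the course's implementation; plain Newton steps like these can fail from poor starting values, which is why practical optimisers add safeguards.

```python
import numpy as np

def grad_hess(theta, y):
    """Gradient g and Hessian H of the N(mu, v) log-likelihood, theta = (mu, v)."""
    mu, v = theta
    n, e = len(y), y - mu
    s1, s2 = e.sum(), (e ** 2).sum()
    g = np.array([s1 / v, -n / (2 * v) + s2 / (2 * v ** 2)])
    H = np.array([[-n / v, -s1 / v ** 2],
                  [-s1 / v ** 2, n / (2 * v ** 2) - s2 / v ** 3]])
    return g, H

def newton(theta0, y, tol=1e-10, max_iter=50):
    theta = np.asarray(theta0, dtype=float)   # step 1: starting value
    for _ in range(max_iter):
        g, H = grad_hess(theta, y)            # step 2: local quadratic approximation
        h = np.linalg.solve(H, -g)            # step 3: solve H h = -g
        theta = theta + h                     # step 4: update
        if np.max(np.abs(h)) < tol:
            break
    return theta

rng = np.random.default_rng(0)                # simulated data, illustration only
y = rng.normal(0.5, 2.0, size=500)
theta_hat = newton([0.0, 1.0], y)             # starting values are an arbitrary choice
print(theta_hat, (y.mean(), y.var()))
```

The iterations should end at the sample mean and the (divide-by-n) sample variance, the known ML estimates for this model.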

[Figure: 1990/1-2015/3 daily S&P 500 returns.]

Model: $r_t \sim N(\mu, \sigma^2)$

[Figure: 1990/1-2015/3 daily S&P 500 returns, and moving-window yearly standard deviation σ(r|year).]

Model: $r_t \sim N(\mu, \sigma_t^2)$

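Time-varying volatility

$$y_t \sim N(\mu, \sigma_t^2) \quad \text{(Volatility changes)}$$

$$f_t = \sigma_t^2 \quad \text{(Signal is volatility)}$$

$$g_t \equiv \nabla_t = \frac{\partial l(\theta; y)}{\partial f_t} = -\frac{1}{2 f_t^2}\left(f_t - (y_t - \mu)^2\right)$$

$$\mathcal{I}_{t|t-1} = E_{t-1} \nabla_t \nabla_t' = -E_{t-1} H_t = E_{t-1} \frac{1}{4 f_t^2}\left(1 - 2\frac{(y_t - \mu)^2}{f_t} + \frac{(y_t - \mu)^4}{f_t^2}\right) = \frac{1}{2 f_t^2}$$

$$h_t \equiv s_t = -H^{-1} g = \mathcal{I}_{t|t-1}^{-1} \nabla_t = -(2 f_t^2)\,\frac{1}{2 f_t^2}\left(f_t - (y_t - \mu)^2\right) = -\left(f_t - (y_t - \mu)^2\right)$$

$$f_{t+1} = f_t + h_t \quad \text{(Newton-Raphson)}$$

$$f_{t+1} = \omega + A s_t + B f_t \quad \text{(Generalised Autoregressive Score, GAS)}$$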

Back to basics: What did just happen?

$$a_t = y_t - \mu \quad \text{(Unforecastable part)}$$

$$f_{t+1} = \omega - A\left(f_t - (y_t - \mu)^2\right) + B f_t \quad \text{(Variance update)}$$

$$\phantom{f_{t+1}} = \omega + (B - A) f_t + A a_t^2$$

Compare

$$\sigma_{t+1}^2 \equiv \alpha_0 + \beta \sigma_t^2 + \alpha_1 a_t^2 \quad \text{(GARCH)}$$

GAS model building scheme ($y_t \sim N(\mu, \sigma_t^2)$, $f_t \equiv \sigma_t^2$) ≡ Generalised Autoregressive Conditional Heteroskedasticity model (Engle, 1982; Bollerslev, 1986, GARCH)
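The equivalence between the GAS update and the GARCH recursion can be verified numerically: with $\alpha_0 = \omega$, $\alpha_1 = A$ and $\beta = B - A$ both filters should produce the same variance path. The sketch below assumes NumPy; the function names and parameter values are illustrative only.

```python
import numpy as np

def gas_filter(y, mu, omega, A, B, f1):
    """f_{t+1} = omega + A s_t + B f_t with scaled score s_t = (y_t - mu)^2 - f_t."""
    f = np.empty(len(y) + 1)
    f[0] = f1
    for t in range(len(y)):
        s = (y[t] - mu) ** 2 - f[t]
        f[t + 1] = omega + A * s + B * f[t]
    return f

def garch_filter(y, mu, alpha0, alpha1, beta1, sigma2_1):
    """sigma2_{t+1} = alpha0 + alpha1 a_t^2 + beta1 sigma2_t with a_t = y_t - mu."""
    s2 = np.empty(len(y) + 1)
    s2[0] = sigma2_1
    for t in range(len(y)):
        s2[t + 1] = alpha0 + alpha1 * (y[t] - mu) ** 2 + beta1 * s2[t]
    return s2

rng = np.random.default_rng(1)
y = rng.normal(size=200)                        # any series will do for this check
omega, A, B = 0.05, 0.08, 0.95                  # illustrative values only
f = gas_filter(y, 0.0, omega, A, B, f1=1.0)
s2 = garch_filter(y, 0.0, omega, A, B - A, sigma2_1=1.0)   # alpha1 = A, beta = B - A
print(np.allclose(f, s2))
```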

TV III: GARCH

(G)ARCH(1,1):

$$\sigma_{t+1}^2 = \alpha_0 + \alpha_1 a_t^2 + \beta_1 \sigma_t^2$$

- ARCH(1): $\beta_1 = 0$, so $\sigma_{t+1}^2 = f(a_t^2)$
- Higher lags possible, necessary, ARCH(m)
- GARCH(1,1): $\beta_1 > 0$, so $\sigma_{t+1}^2 = f(a_t^2, \sigma_t^2) \equiv f(a_t^2, a_{t-1}^2, \ldots)$
- Higher lags possible, hardly ever useful
- Result/intention: scaled innovation

$$\varepsilon_t = \frac{a_t}{\sigma_t} = \frac{y_t - \mu_t}{\sigma_t} \sim \text{i.i.d. } N(0, 1)$$

- Alternatively: e.g. $\varepsilon_t \sim t(0, 1, \nu)$ [Implications...]
- Restrictions on parameter space?
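To get a feel for the GARCH(1,1) recursion, one can simulate from it. The following is a sketch under assumptions: NumPy, a constant mean, Gaussian scaled innovations, and the unconditional variance as starting value. The function name and parameter values are made up for illustration.

```python
import numpy as np

def simulate_garch11(n, mu, alpha0, alpha1, beta1, seed=0):
    """Simulate y_t = mu + a_t, a_t = sigma_t eps_t, eps_t ~ i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    sigma2 = np.empty(n)
    a = np.empty(n)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)   # start at the unconditional variance
    a[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(n - 1):
        sigma2[t + 1] = alpha0 + alpha1 * a[t] ** 2 + beta1 * sigma2[t]
        a[t + 1] = np.sqrt(sigma2[t + 1]) * eps[t + 1]
    return mu + a, sigma2

y, sigma2 = simulate_garch11(5000, mu=0.0, alpha0=0.02, alpha1=0.08, beta1=0.90)

# Sample variance against the implied unconditional variance (a noisy comparison)
print(y.var(), 0.02 / (1 - 0.08 - 0.90))
```

Plotting `y` should show the clustering of large and small returns that motivates the model.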

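TV IV: Expectations

Distinguish conditional and unconditional moments

$$E(y_t) = \mu, \qquad E(y_t \mid \mathcal{F}_{t-1}) = \mu$$

$$\mathrm{var}(y_t) = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}, \qquad \mathrm{var}(y_t \mid \mathcal{F}_{t-1}) = \sigma_t^2$$

$$K(y_t) = \frac{3\left(1 - (\alpha_1 + \beta_1)^2\right)}{1 - (\alpha_1 + \beta_1)^2 - 2\alpha_1^2}, \qquad K(y_t \mid \mathcal{F}_{t-1}) = 3$$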

[Figure: densities plotted from -4 to 4 for σ² minimal, σ² maximal and σ² average, compared with N(0, 1) and t(0, 1, ν = 25.28).]
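The unconditional moments are easy to evaluate for given parameters. The sketch below uses plain Python; the function name and the parameter values are chosen for illustration only. It also guards against parameter values for which the variance or the fourth moment does not exist.

```python
def garch11_moments(alpha0, alpha1, beta1):
    """Unconditional variance and kurtosis of a Gaussian GARCH(1,1)."""
    p = alpha1 + beta1
    if not (alpha0 > 0 and alpha1 >= 0 and beta1 >= 0 and p < 1):
        raise ValueError("need alpha0 > 0, alpha1, beta1 >= 0, alpha1 + beta1 < 1")
    denom = 1.0 - p ** 2 - 2.0 * alpha1 ** 2
    if denom <= 0:
        raise ValueError("fourth moment does not exist for these parameters")
    variance = alpha0 / (1.0 - p)
    kurtosis = 3.0 * (1.0 - p ** 2) / denom
    return variance, kurtosis

var, kurt = garch11_moments(alpha0=0.02, alpha1=0.08, beta1=0.90)
print(var, kurt)
```

Although the conditional distribution is normal with kurtosis 3, the unconditional kurtosis exceeds 3 whenever $\alpha_1 > 0$: mixing over different variance levels fattens the tails.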

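TV V: ARMA in $\eta_t = a_t^2 - \sigma_t^2$

Define $\eta_t \equiv a_t^2 - \sigma_t^2$

$$\sigma_{t+1}^2 = \alpha_0 + \alpha_1 a_t^2 + \beta_1 \sigma_t^2$$

$$\Leftrightarrow\quad a_t^2 = \alpha_0 + (\alpha_1 + \beta_1) a_{t-1}^2 + \eta_t - \beta_1 \eta_{t-1}.$$

- $\eta_t$ is an uncorrelated series with mean 0.
- $a_t^2$ is an ARMA(1,1)

(Useful for deriving some theoretical properties)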
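The step from the variance recursion to the ARMA(1,1) form uses nothing beyond the definition of $\eta_t$. Spelled out as a worked derivation:

```latex
\begin{aligned}
\sigma_t^2 &= a_t^2 - \eta_t
  && \text{(definition of } \eta_t\text{)} \\
a_{t+1}^2 - \eta_{t+1} &= \alpha_0 + \alpha_1 a_t^2 + \beta_1 \left(a_t^2 - \eta_t\right)
  && \text{(substitute on both sides of the recursion)} \\
a_{t+1}^2 &= \alpha_0 + (\alpha_1 + \beta_1)\, a_t^2 + \eta_{t+1} - \beta_1 \eta_t \\
a_t^2 &= \alpha_0 + (\alpha_1 + \beta_1)\, a_{t-1}^2 + \eta_t - \beta_1 \eta_{t-1}
  && \text{(shift } t \to t - 1\text{)}
\end{aligned}
```

Taking unconditional expectations on both sides, with $E(\eta_t) = 0$, gives $E(a_t^2) = \alpha_0 / (1 - \alpha_1 - \beta_1)$, the unconditional variance of the GARCH(1,1) model.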

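ARMA-GARCH models

More general: ARMA(p, q)-GARCH(m, s) model

$$\Phi(L)(r_t - \mu) = \Theta(L) a_t$$

$$a_t \mid \mathcal{F}_{t-1} \sim N(0, \sigma_t^2), \qquad \sigma_{t+1}^2 = \alpha_0 + \sum_{i=1}^{m} \alpha_i a_{t-i+1}^2 + \sum_{j=1}^{s} \beta_j \sigma_{t-j+1}^2$$

Notice:

$$r_t \mid \mathcal{F}_{t-1} \sim N(\mu_t, \sigma_t^2)$$

$$r_t \not\sim N(\mu, \sigma^2)$$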

On estimation

Use prediction error decomposition,

$$\log L(y; \theta) = \sum_{t=1}^{n} \log p(y_t \mid \mathcal{F}_{t-1}) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{n} \log(\sigma_t^2) - \frac{1}{2} \sum_{t=1}^{n} \frac{a_t^2}{\sigma_t^2}$$

filling in
- the prespecified conditional density (here: normal)
- the pre-filtered vector of variances, $\Sigma = (\sigma_1, \ldots, \sigma_n) = f_{\text{GARCH}}(y, \theta)$
- the pre-filtered residuals, $A = (a_1, \ldots, a_n) = f_{\text{ARMA}}(y, \theta)$

and optimise.

Conventional asymptotic properties, provided that ARMA and GARCH processes are both stationary.
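The prediction error decomposition translates almost line by line into code. The sketch below covers the simplest case: constant mean (no ARMA part), GARCH(1,1) variance, Gaussian density. Initialising the first variance at the sample variance of the residuals is one possible choice and an assumption here; names and data are illustrative.

```python
import numpy as np

def garch11_loglik(theta, y):
    """Gaussian log-likelihood of y_t = mu + a_t with GARCH(1,1) variance."""
    mu, alpha0, alpha1, beta1 = theta
    a = np.asarray(y, dtype=float) - mu          # pre-filtered residuals (no ARMA part)
    n = len(a)
    sigma2 = np.empty(n)                         # pre-filtered variances
    sigma2[0] = a.var()                          # assumption: sample variance as start
    for t in range(n - 1):
        sigma2[t + 1] = alpha0 + alpha1 * a[t] ** 2 + beta1 * sigma2[t]
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(sigma2))
            - 0.5 * np.sum(a ** 2 / sigma2))

rng = np.random.default_rng(2)
y = rng.normal(0.0, 1.5, size=300)               # placeholder data

# With alpha1 = beta1 = 0 and alpha0 equal to the sample variance, the variance
# path is constant and the value equals the plain normal log-likelihood.
print(garch11_loglik((y.mean(), y.var(), 0.0, 0.0), y))
```

To estimate, one would hand the negative of this function to a numerical optimiser, for instance `scipy.optimize.minimize`, taking care of the parameter restrictions.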

On estimation II

Check model: inequality restrictions needed, e.g.

$$\alpha_0 > 0, \quad 0 \le \alpha_1 < 1, \quad 0 < \beta_1 < 1, \quad \alpha_1 + \beta_1 < 1$$

How can we impose these?

- Transformation of parameters. Example:

$$\alpha_0 = \exp(\alpha_0^*) \qquad \alpha_0^* = \log(\alpha_0)$$

$$\alpha_1 = \frac{\exp(\alpha_1^*)}{1 + \exp(\alpha_1^*)} \qquad \alpha_1^* = \;???$$

Now $\alpha_0^*, \alpha_1^*$ can be estimated without restrictions.

- Direct method to impose inequality restrictions in Ox: use MaxSQP(), MaxSQPF() instead of MaxBFGS()

- Important: Do check if parameter values are valid, or act otherwise
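The forward transformations can be sketched as follows, together with a validity check of the kind the last bullet asks for. This assumes NumPy; the function names are hypothetical, and the value used for $\beta_1$ in the loop is an arbitrary choice that keeps $\alpha_1 + \beta_1 < 1$.

```python
import numpy as np

def to_constrained(alpha0_star, alpha1_star):
    """Map unrestricted (alpha0*, alpha1*) to alpha0 > 0 and 0 < alpha1 < 1."""
    alpha0 = np.exp(alpha0_star)
    alpha1 = np.exp(alpha1_star) / (1.0 + np.exp(alpha1_star))
    return alpha0, alpha1

def is_valid(alpha0, alpha1, beta1):
    """Check the inequality restrictions before evaluating the likelihood."""
    return bool(alpha0 > 0 and 0 <= alpha1 < 1 and 0 < beta1 < 1
                and alpha1 + beta1 < 1)

for star in [(-5.0, -3.0), (0.0, 0.0), (2.0, 4.0)]:
    a0, a1 = to_constrained(*star)
    # beta1 is set to half of the remaining room below 1, for illustration only
    print(star, (a0, a1), is_valid(a0, a1, beta1=0.5 * (1 - a1)))
```

Whatever real values the optimiser proposes for the starred parameters, the transformed values stay inside the admissible region.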

On estimation III

Initial conditions: What to do with pre-sample $\sigma_0^2$, $r_0$?

- For $r_0$, use unconditional means (implied by θ), or sample mean

- For $\sigma_0^2$, use unconditional variance (implied by θ), or sample variance

- Alternatively, include them in vector of parameters?

Conditioning on pre-sample values gives conditional MLE.

Exact maximum likelihood is difficult as the (unconditional) density of the first observation of a sample $r_0$ does not have a closed-form expression.

Diagnostic tests

Questions:

- Do I need ARMA? (test $y_t$ for autocorrelation)

- Did I model ARMA correctly? (test $a_t$ for autocorrelation)

- Do I need GARCH? (test $a_t^2$ for autocorrelation)

- Did I model GARCH correctly? (test $a_t^2 / \sigma_t^2$ for autocorrelation)

- Are residuals normally distributed?

Answers:

1. Check ACF

2. Check Lagrange Multiplier

3. Check Ljung-Box

4. Check Jarque-Bera for residual normality

Diagnostics: ACF

[Figure: ACF of a² and ACF of a²/σ², lags 1 to 40.]

SP500, 1990/01/02-2014/03/25, n = 6106 observations, MA(1)-GARCH(1,1)

Not a test, just visual...

Diagnostics: LM-test

Lagrange multiplier test for autocorrelation:

$$x_t = a_0 + a_1 x_{t-1} + \ldots + a_m x_{t-m} + e_t, \qquad t = m, \ldots, n$$

$$H_0: a_1 = \ldots = a_m = 0 \quad \text{(No autocorrelation in } x_t\text{)}$$

$$H_1: \text{Not } H_0$$

$$LM = \frac{(SSR_0 - SSR_1)/m}{SSR_1/(n - 2m - 1)} \equiv nR^2 \overset{H_0}{\sim} \chi^2(m)$$

Advantage:

- Only restricted model needs to be estimated (plus OLS)
- If applied to $a_t^2$: Tests for (G)ARCH effects, or general volatility effects.
- If applied to $\varepsilon_t^2$: Tests for correct specification of (G)ARCH

Q: Why $n - 2m - 1$, not $n - m - 1$?
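The $nR^2$ form of the LM statistic only needs one auxiliary OLS regression. The sketch below assumes NumPy and uses the number of usable observations (after dropping the first m) as n; the function name and the simulated input are illustrative.

```python
import numpy as np

def lm_autocorr(x, m):
    """LM statistic n * R^2 from regressing x_t on a constant and m own lags."""
    x = np.asarray(x, dtype=float)
    y = x[m:]                                    # left-hand side: x_t, t > m
    lags = [x[m - j:-j] for j in range(1, m + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr1 = np.sum((y - X @ beta) ** 2)           # unrestricted: with lags
    ssr0 = np.sum((y - y.mean()) ** 2)           # restricted: constant only
    r2 = 1.0 - ssr1 / ssr0
    return len(y) * r2

rng = np.random.default_rng(3)
a = rng.standard_normal(2000)                    # i.i.d. noise: no ARCH effects
stat = lm_autocorr(a ** 2, m=5)
print(stat)                                      # compare with chi2(5) critical values
```

Applied to squared residuals of an estimated model, a large value relative to the $\chi^2(m)$ distribution points to remaining (G)ARCH effects.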

Diagnostics: LB-test

Ljung-Box test for autocorrelation

$$H_0: \rho_j \equiv 0 \quad \text{(No autocorrelation in } x_t\text{)}$$

$$H_1: \text{Not } H_0$$

$$Q_{LB}(m) = n(n + 2) \sum_{j=1}^{m} \frac{\hat\rho_j^2}{n - j} \overset{H_0}{\sim} \chi^2(m)$$

Choose number of correlations wisely (...).
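A direct implementation of the Ljung-Box statistic, as a minimal sketch assuming NumPy. The function name and the deliberately extreme test series are illustrative.

```python
import numpy as np

def ljung_box(x, m):
    """Q_LB(m) = n (n + 2) sum_{j=1}^m rho_j^2 / (n - j)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    q = 0.0
    for j in range(1, m + 1):
        rho_j = np.sum(d[j:] * d[:-j]) / denom   # sample autocorrelation at lag j
        q += rho_j ** 2 / (n - j)
    return n * (n + 2) * q

x = np.array([1.0, -1.0] * 50)                   # alternating series, strongly correlated
print(ljung_box(x, m=1))
```

For this alternating series the lag-1 autocorrelation is close to -1, so the statistic is far beyond any usual $\chi^2(1)$ critical value.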

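Diagnostics: JB-test

Jarque-Bera test for normality

$$JB = \frac{n}{6}\left(sk^2 + \frac{1}{4}(k - 3)^2\right) \overset{H_0}{\sim} \chi^2(2)$$

$$sk = \frac{m_3}{m_2^{3/2}} \quad \text{(Sample skewness)}$$

$$k = \frac{m_4}{m_2^2} \quad \text{(Sample kurtosis)}$$

with $m_j$ the $j$-th sample central moment.

$$H_0: x_t \sim N(\mu, \sigma^2) \quad \text{(Normality of underlying series)}$$

$$H_1: \text{Not } H_0$$

Could be useful for testing $\varepsilon_t$, GARCH-N or GARCH-t?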
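The Jarque-Bera statistic follows directly from the sample central moments. A minimal sketch, assuming NumPy; the function name and the two-point example data are illustrative.

```python
import numpy as np

def jarque_bera(x):
    """Return (JB, skewness, kurtosis) using sample central moments m_j."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** j) for j in (2, 3, 4))
    sk = m3 / m2 ** 1.5
    k = m4 / m2 ** 2
    jb = n / 6.0 * (sk ** 2 + 0.25 * (k - 3.0) ** 2)
    return jb, sk, k

# Symmetric two-point data: skewness 0 but kurtosis 1, far from the normal value 3
x = np.array([-1.0, 1.0] * 50)
jb, sk, k = jarque_bera(x)
print(jb, sk, k)
```

The example shows that the statistic reacts to kurtosis alone, even when the data are perfectly symmetric.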

Diagnostics: Results

Table: SP 500 autocorrelation tests

y     44.952 [0.00]      42.794 [0.00]
a     36.702 [0.00]      35.263 [0.00]
a²    1336.246 [0.00]    2436.435 [0.00]
ε²    13.893 [0.02]      14.290 [0.01]

Table: SP 500 normality tests

      JB                  sk        k
y     19101.143 [0.00]    -0.238    11.652
a     19050.133 [0.00]    -0.253    11.638
ε     959.012 [0.00]      -0.415    4.755

GARCH alternatives

Many alternatives available, see Bollerslev (2010), Glossary to ARCH (GARCH). 32 pages of acronyms and descriptions.

Here, three main alternatives:

- GARCH-t: Adapts for heavier tails found in practice

- GARCH-M: Allows volatility to influence mean return

- EGARCH: Generates asymmetric impact of news through log-volatilities

Modern alternative:

- Beta-t-GARCH (Harvey and Chakravarty, 2009) ≡ GAS-t (Creal, Koopman, and Lucas, 2013; Harvey, 2013)

GARCH-t models

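Scaled residuals $\varepsilon_t = a_t / \sigma_t$ often have 'fatter tails' than the normal distribution → Use $t_\nu$-distribution with an unknown degrees of freedom $\nu$.

Write

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2,$$

$$a_t \sim \sigma_t \sqrt{\frac{\nu - 2}{\nu}}\; t_\nu, \qquad \nu > 2,$$

$$f(a_t \mid \mathcal{F}_{t-1}) = h(\nu)\, \frac{1}{\sigma_t} \left[1 + \frac{a_t^2}{(\nu - 2)\sigma_t^2}\right]^{-(\nu + 1)/2}$$

with $h(\nu)$ a constant function of $\nu$ (see book).

Df $\nu$ can be estimated or fixed.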
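The conditional log-density can be coded with the standard library alone. One assumption is made loudly here: for $h(\nu)$, which the slide leaves to the book, the sketch uses the usual normalising constant of the unit-variance Student-t, $\Gamma((\nu+1)/2) / (\Gamma(\nu/2)\sqrt{(\nu-2)\pi})$. Function names are illustrative.

```python
import math

def student_t_logpdf(a, sigma2, nu):
    """log f(a_t | F_{t-1}) for a_t = sigma_t * (unit-variance Student-t), nu > 2."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 for a finite variance")
    # Assumed form of log h(nu): normalising constant of the unit-variance t
    log_h = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log((nu - 2) * math.pi))
    return (log_h - 0.5 * math.log(sigma2)
            - 0.5 * (nu + 1) * math.log1p(a ** 2 / ((nu - 2) * sigma2)))

def normal_logpdf(a, sigma2):
    return -0.5 * math.log(2 * math.pi * sigma2) - a ** 2 / (2 * sigma2)

# Heavier tails: at a 5-sigma shock the t(5) log-density is well above the normal one
print(student_t_logpdf(5.0, 1.0, nu=5), normal_logpdf(5.0, 1.0))

# For very large nu the two nearly coincide
print(student_t_logpdf(1.0, 1.0, nu=1e6), normal_logpdf(1.0, 1.0))
```

Replacing the normal term in the GARCH log-likelihood by this log-density, and adding $\nu$ to the parameter vector, gives a GARCH-t likelihood.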

GARCH-M model

Risk is costly → investors want compensation?

Returns may have a higher mean (M) when volatility increases, the so-called risk premium:

$$r_t = \mu + c\sigma_t^2 + a_t, \qquad a_t = \sigma_t \varepsilon_t,$$

or

$$r_t = \mu + c\sigma_t + a_t, \qquad a_t = \sigma_t \varepsilon_t.$$

This effect is usually not very strong: it induces serial correlation in the returns, and such serial correlation is usually not very pronounced. The risk premium is more clearly identified in a cross-section analysis of returns.

EGARCH model: log volatility and asymmetry

What do you like more, a negative ($a_t = \sigma_t \varepsilon_t < 0$) or a positive ($a_t > 0$) shock? → asymmetric impact of news.

EGARCH(1,1) model:

$$\log(\sigma_t^2) = \alpha_0 + g(\varepsilon_{t-1}) + \alpha_1 \log(\sigma_{t-1}^2)$$

$$g(\varepsilon_t) = \theta \varepsilon_t + \gamma \left[|\varepsilon_t| - E(|\varepsilon_t|)\right]$$

- Models log volatility
- Relates to scaled innovation $\varepsilon_t$ instead of innovation $a_t$
- Additionally allows for effect of $|\varepsilon_t|$
- Positive shock: effect $\theta + \gamma$, negative: $\theta - \gamma$
- Could use additional lags of $g(\varepsilon_t)$ and $\log(\sigma_t^2)$
- Forecasting $\sigma^2$ very non-linear. Multi-step not analytically available
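The asymmetry is easiest to see by evaluating $g(\varepsilon)$ on both sides of zero. The sketch below assumes NumPy and Gaussian scaled innovations, so that $E|\varepsilon_t| = \sqrt{2/\pi}$; the parameter values and function names are illustrative only.

```python
import numpy as np

E_ABS_EPS = np.sqrt(2.0 / np.pi)      # E|eps| for eps ~ N(0, 1)

def g(eps, theta, gamma):
    """News impact g(eps) = theta * eps + gamma * (|eps| - E|eps|)."""
    return theta * eps + gamma * (np.abs(eps) - E_ABS_EPS)

def egarch11_filter(eps, alpha0, alpha1, theta, gamma, log_sigma2_0=0.0):
    """log(sigma2_t) = alpha0 + g(eps_{t-1}) + alpha1 * log(sigma2_{t-1})."""
    ls2 = np.empty(len(eps) + 1)
    ls2[0] = log_sigma2_0
    for t in range(len(eps)):
        ls2[t + 1] = alpha0 + g(eps[t], theta, gamma) + alpha1 * ls2[t]
    return np.exp(ls2)                # positive by construction

theta, gamma = -0.1, 0.2              # illustrative values only
print(g(1.0, theta, gamma), g(-1.0, theta, gamma))   # same size shock, different impact

sigma2 = egarch11_filter(np.array([1.0, -1.0, 0.5]), 0.0, 0.9, theta, gamma)
print(sigma2)
```

With $\theta < 0$, a negative shock raises log volatility more than a positive shock of the same size. Because the model is written in logs, the variance stays positive without inequality restrictions on the parameters.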

Looking back

What did we do?

- Introduce concept of volatility

- Linked it to maximum likelihood

- Looked at moments

- Tested

- Discussed alternatives

What will you do?

- Link discussion to Tsay (2010)

- Think of what an AR(1)-GARCH(1,1) model would look like, how to estimate

- Try it out...

Bibliography

Bollerslev, Tim (1986). "Generalized Autoregressive Conditional Heteroskedasticity". In: Journal of Econometrics 31.3, pp. 307-327. DOI: 10.1016/0304-4076(86)90063-1.

Bollerslev, Tim (2010). "Glossary to ARCH (GARCH)". In: Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle. Ed. by Tim Bollerslev, Jeffrey Russell, and Mark Watson. Oxford: Oxford University Press. DOI: 10.1093/acprof:oso/9780199549498.003.0008.

Creal, Drew, Siem Jan Koopman, and André Lucas (2013). "Generalized Autoregressive Score Models with Applications". In: Journal of Applied Econometrics 28.5, pp. 777-795. DOI: 10.1002/jae.1279.

Engle, Robert F. (1982). "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation". In: Econometrica 50, pp. 987-1008. URL: http://www.jstor.org/stable/1912773.

Harvey, Andrew C. (2013). Dynamic Models for Volatility and Heavy Tails. Cambridge: Cambridge University Press.

Harvey, Andrew C. and Tirthankar Chakravarty (2009). Beta-t-(E)GARCH. Tech. rep. Update from CWPE 0840. University of Cambridge.

Monahan, John F. (2011). Numerical Methods of Statistics. 2nd ed. Cambridge series on statistical and probabilistic mathematics. Cambridge: Cambridge University Press. DOI: 10.1017/CBO9780511977176.

Tsay, Ruey S. (2010). Analysis of Financial Time Series. 3rd ed. New Jersey: John Wiley & Sons. URL: http://onlinelibrary.wiley.com/book/10.1002/9780470644560.
