
Module 3 GARCH Models

Date post: 23-Feb-2016
Upload: dareh
Page 1: Module 3 GARCH Models

Module 3

GARCH Models

Page 2: Module 3 GARCH Models

References

The classics:
• Engle, R.F. (1982), Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation, Econometrica.

• Bollerslev, T. (1986), Generalized Autoregressive Conditional Heteroskedasticity, Journal of Econometrics.

Introduction/Reviews:
• Bollerslev, T., R.F. Engle and D.B. Nelson (1994), ARCH Models, Handbook of Econometrics, Vol. 4.

• Engle, R.F. (2001), GARCH 101: The Use of ARCH/GARCH Models in Applied Econometrics, Journal of Economic Perspectives.

Page 3: Module 3 GARCH Models

• Until the early 1980s econometrics had focused almost solely on modeling the means of series -i.e., their actual values.

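$$y_t = E_{t-1}(y_t \mid x) + \varepsilon_t, \qquad \varepsilon_t \sim D(0, \sigma^2)$$

For an AR(1) process:

$$E_{t-1}(y_t \mid x) = E_{t-1}(y_t) = \alpha + \beta y_{t-1}$$

Note:

$$E(y_t) = \frac{\alpha}{1-\beta} \qquad \text{and} \qquad \mathrm{Var}(y_t) = \frac{\sigma^2}{1-\beta^2}$$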

The conditional first moment is time varying, though the unconditional moment is not!

Key distinction: Conditional vs. Unconditional moments.

• Similar idea for the variance.

Unconditional variance: $\mathrm{Var}(y_t) = E[(y_t - E[y_t])^2] = \sigma^2/(1-\beta^2)$

Conditional variance: $\mathrm{Var}_{t-1}(y_t) = E_{t-1}[(y_t - E_{t-1}[y_t])^2] = E_{t-1}[\varepsilon_t^2]$

Page 4: Module 3 GARCH Models

Vart-1 (yt ) is the true measure of uncertainty at time t-1.

[Figure: plot with the labels "mean," "variance," and "conditional variance."]

Page 5: Module 3 GARCH Models

• Stylized Facts of Asset Returns

i) Thick tails - Mandelbrot (1963): leptokurtic (thicker than Normal)

ii) Volatility clustering - Mandelbrot (1963): “large changes tend to be followed by large changes of either sign.”

iii) Leverage Effects – Black (1976), Christie (1982): Tendency for changes in stock prices to be negatively correlated with changes in volatility.

iv) Non-trading Effects, Weekend Effects – Fama (1965), French and Roll (1986) : When a market is closed information accumulates at a different rate to when it is open –for example, the weekend effect, where stock price volatility on Monday is not three times the volatility on Friday.

Page 6: Module 3 GARCH Models

v) Expected events – Cornell (1978), Patell and Wolfson (1979), etc.: Volatility is high at regular times such as news announcements or other expected events, or even at certain times of day; for example, markets are less volatile in the early afternoon.

vi) Volatility and serial correlation – LeBaron (1992): Inverse relationship between the two.

vii) Co-movements in volatility – Ramchand and Susmel (1998): Volatility is positively correlated across markets/assets.

Page 7: Module 3 GARCH Models

Figure: Descriptive Statistics and Distribution for EUR/ROL changes.

[Figure: histogram of returns together with the Normal and return densities.]

Statistic          Value     t-Test    P-Value
Skewness           1.0472    15.922    4.4605e-057
Excess Kurtosis    8.5138    64.769    0.00000
Jarque-Bera        4432.9

• Easy to check leptokurtosis (Stylized Fact #1)
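The check behind Stylized Fact #1 can be sketched in a few lines. This is a minimal illustration, assuming the usual 1/n moment definitions and the Jarque-Bera form n(S²/6 + K²/24); the function name `moment_stats` and the sample data are made up for the example and are not the EUR/ROL series.

```python
def moment_stats(returns):
    """Return (skewness, excess kurtosis, Jarque-Bera statistic)."""
    n = len(returns)
    mean = sum(returns) / n
    # Central moments with the 1/n (population) convention
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    jb = n * (skew ** 2 / 6.0 + excess_kurt ** 2 / 24.0)
    return skew, excess_kurt, jb

# Made-up symmetric sample, only to exercise the function
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, excess_kurt, jb = moment_stats(sample)
```

A positive excess kurtosis indicates thicker tails than the normal; a large Jarque-Bera value rejects normality.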

Page 8: Module 3 GARCH Models

• Easy to check Volatility Clustering (Stylized Fact #2)

Page 9: Module 3 GARCH Models

[Figure: Changes in interest rates, 2/08/94 to 12/09/97. Panels: Argentina/Pesos, Argentina/Dollars, Brazil, Chile, Hong Kong, Mexico.]

Page 10: Module 3 GARCH Models

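ARCH Model - Engle (1982)

Auto-Regressive Conditional Heteroskedasticity

$$Y_t = X_t\gamma + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_t^2)$$

$$\sigma_t^2 = \mathrm{Var}_{t-1}(\varepsilon_t) = E_{t-1}(\varepsilon_t^2) = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 = \omega + \alpha(L)\varepsilon_t^2$$

Define $\nu_t = \varepsilon_t^2 - \sigma_t^2$. Then:

$$\varepsilon_t^2 = \omega + \alpha(L)\varepsilon_t^2 + \nu_t$$

• This is an AR(q) model for squared innovations. This model cleverly estimates the unobservable (latent) variance.

• Note: Since we are dealing with a variance, we need $\omega > 0$ and $\alpha_i \ge 0$ for all $i$.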

Page 11: Module 3 GARCH Models

• Even though the errors may be serially uncorrelated they are not independent: there will be volatility clustering and fat tails.

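• Define standardized errors:

$$z_t = \varepsilon_t/\sigma_t$$

They have conditional mean zero and a time invariant conditional variance equal to 1. That is, zt ~ D(0,1).

• If, as assumed above, the distribution of zt is time invariant with a finite fourth moment, then (using Jensen's inequality):

$$E(\varepsilon_t^4) = E(z_t^4)\,E(\sigma_t^4) \ge E(z_t^4)\,[E(\sigma_t^2)]^2 = E(z_t^4)\,[E(\varepsilon_t^2)]^2$$

Under normality, $E(z_t^4) = 3$, so

$$\kappa(\varepsilon_t) = E(\varepsilon_t^4)/[E(\varepsilon_t^2)]^2 \ge 3.$$

If we assume a normal distribution, the kurtosis (standardized 4th moment) for an ARCH(1) is:

$$\kappa(\varepsilon_t) = \frac{3(1-\alpha_1^2)}{1-3\alpha_1^2} \qquad \text{if } 3\alpha_1^2 < 1.$$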

Page 12: Module 3 GARCH Models

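More convenient, but less intuitive, presentation of the ARCH(1) model:

$$\varepsilon_t = \upsilon_t\sqrt{\sigma_t^2}, \qquad \sigma_t^2 = \omega + \alpha_1\varepsilon_{t-1}^2$$

where υt is iid with mean 0, and Var[υt]=1.

Since υt is iid, then:

$$E_{t-1}(\varepsilon_t^2) = E_{t-1}(\upsilon_t^2\sigma_t^2) = E_{t-1}(\upsilon_t^2)\,E_{t-1}(\sigma_t^2) = \omega + \alpha_1\varepsilon_{t-1}^2$$

It turns out that $\sigma_t^2$ is a very persistent process. Such a process can be captured with an ARCH(q), where q is large. This is not efficient.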

Page 13: Module 3 GARCH Models

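GARCH – Bollerslev (1986)

In practice q is often large. A more parsimonious representation is the Generalized ARCH model or GARCH(q,p):

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2 = \omega + \alpha(L)\varepsilon_t^2 + \beta(L)\sigma_t^2$$

Define $\nu_t = \varepsilon_t^2 - \sigma_t^2$. Then:

$$\varepsilon_t^2 = \omega + (\alpha(L) + \beta(L))\varepsilon_t^2 - \beta(L)\nu_t + \nu_t$$

which is an ARMA(max(p,q),p) model for the squared innovations.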

Page 14: Module 3 GARCH Models

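This is covariance stationary if all the roots of

$$1 - \alpha(L) - \beta(L) = 0$$

lie outside the unit circle. For the GARCH(1,1) this amounts to

$$\alpha_1 + \beta_1 < 1.$$

• Bollerslev (1986) showed that if $3\alpha_1^2 + 2\alpha_1\beta_1 + \beta_1^2 < 1$, the second and 4th moments of εt exist:

$$E(\varepsilon_t^2) = \frac{\omega}{1-\alpha_1-\beta_1}$$

$$E(\varepsilon_t^4) = \frac{3\omega^2(1+\alpha_1+\beta_1)}{(1-\alpha_1-\beta_1)(1-\beta_1^2-2\alpha_1\beta_1-3\alpha_1^2)} \qquad \text{if } (1-\beta_1^2-2\alpha_1\beta_1-3\alpha_1^2) > 0.$$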
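The GARCH(1,1) moment conditions can be checked numerically. The sketch below assumes normal standardized errors and uses the Bollerslev (1986) expressions for the second and fourth moments; the function name `garch11_moments` and the parameter values are illustrative only.

```python
def garch11_moments(omega, alpha, beta):
    """Unconditional variance and kurtosis of a Gaussian GARCH(1,1)."""
    persistence = alpha + beta
    if persistence >= 1.0:
        raise ValueError("not covariance stationary: alpha + beta >= 1")
    variance = omega / (1.0 - persistence)
    # The fourth moment exists only if this term is positive
    fourth_cond = 1.0 - beta ** 2 - 2.0 * alpha * beta - 3.0 * alpha ** 2
    if fourth_cond <= 0.0:
        return variance, float("inf")
    fourth = (3.0 * omega ** 2 * (1.0 + persistence)
              / ((1.0 - persistence) * fourth_cond))
    return variance, fourth / variance ** 2

# Illustrative parameter values (not estimates from any data set)
variance, kurtosis = garch11_moments(omega=0.1, alpha=0.1, beta=0.8)
```

With these values the kurtosis exceeds 3, so the model generates thick tails even with normal standardized errors.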

Page 15: Module 3 GARCH Models

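• Forecasting and Persistence

Consider the forecast in a GARCH(1,1) model:

$$\sigma_{t+1}^2 = \omega + \alpha_1\varepsilon_t^2 + \beta_1\sigma_t^2 = \omega + \sigma_t^2(\alpha_1 z_t^2 + \beta_1)$$

Taking expectations at time t-1 ($\sigma_t^2$ is known at t-1 and $E_{t-1}(z_t^2) = 1$):

$$E_{t-1}(\sigma_{t+1}^2) = \omega + \sigma_t^2(\alpha_1 + \beta_1)$$

By repeated substitutions:

$$E_{t-1}(\sigma_{t+j}^2) = \omega\Big[\sum_{i=0}^{j-1}(\alpha_1+\beta_1)^i\Big] + (\alpha_1+\beta_1)^j\sigma_t^2$$

As j→∞, the forecast reverts to the unconditional variance: ω/(1-α1-β1).

• When α1+β1=1, today's volatility affects future forecasts forever:

$$E_{t-1}(\sigma_{t+j}^2) = \sigma_t^2 + j\omega$$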
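The repeated-substitution forecast can be sketched as a small function. It assumes the GARCH(1,1) recursion, with the forecast written as ω times a geometric sum in (α1+β1) plus (α1+β1)^j times the current conditional variance; `garch11_forecast` and the inputs are hypothetical names and values.

```python
def garch11_forecast(omega, alpha, beta, sigma2_t, horizon):
    """Forecast of the conditional variance `horizon` steps past sigma2_t."""
    phi = alpha + beta
    geometric_sum = sum(phi ** i for i in range(horizon))
    return omega * geometric_sum + phi ** horizon * sigma2_t

# Stationary case: the forecast reverts to omega / (1 - alpha - beta) = 1.0
long_run = garch11_forecast(0.1, 0.1, 0.8, sigma2_t=4.0, horizon=500)

# Integrated case (alpha + beta = 1): the forecast is sigma2_t + j * omega
integrated = garch11_forecast(0.1, 0.2, 0.8, sigma2_t=2.0, horizon=5)
```

In the stationary case the starting variance is forgotten at a geometric rate; in the integrated case it never is.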

Page 16: Module 3 GARCH Models

Nelson's (1991) EGARCH model
Nelson, D.B. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica.

$$\log(\sigma_t^2) = \omega + \sum_{i=1}^{q}\alpha_i\big(|z_{t-i}| - E(|z_{t-i}|) + \gamma z_{t-i}\big) + \sum_{j=1}^{p}\beta_j\log(\sigma_{t-j}^2)$$

GJR-GARCH model
Glosten, L.R., R. Jagannathan and D. Runkle (1993), "Relationship between the Expected Value and the Volatility of the Nominal Excess Return on Stocks," Journal of Finance.

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{i=1}^{q}\gamma_i\varepsilon_{t-i}^2 * I_{t-i} + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2$$

where It-i=1 if εt-i<0; 0 otherwise.

• Both models capture sign (asymmetric) effects in volatility: Negative news increases the conditional volatility (leverage effect).
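The asymmetry in the GJR-GARCH model is easy to see in a one-step update. The sketch below assumes a GJR-GARCH(1,1), where the indicator adds γ times the squared shock only when the previous shock is negative; the function name and the parameter values are illustrative.

```python
def gjr_garch11_update(omega, alpha, gamma, beta, eps_prev, sigma2_prev):
    """One-step GJR-GARCH(1,1) conditional variance."""
    indicator = 1.0 if eps_prev < 0 else 0.0  # I_{t-1} = 1 for bad news
    return (omega
            + alpha * eps_prev ** 2
            + gamma * indicator * eps_prev ** 2
            + beta * sigma2_prev)

# Same shock size, opposite signs (made-up parameter values)
after_bad_news = gjr_garch11_update(0.05, 0.05, 0.10, 0.85, -2.0, 1.0)
after_good_news = gjr_garch11_update(0.05, 0.05, 0.10, 0.85, 2.0, 1.0)
```

With γ > 0, a negative shock raises next-period variance by more than a positive shock of the same size.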

Page 17: Module 3 GARCH Models

Non-linear ARCH model (NARCH)
Higgins and Bera (1992) and Hentschel (1995)

These models apply the Box-Cox transformation to the conditional variance.

$$\sigma_t^{\gamma} = \omega + \sum_{i=1}^{q}\alpha_i|\varepsilon_{t-i} - \kappa|^{\gamma} + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^{\gamma}$$

• The variance depends on both the size and the sign of the innovation, which helps to capture leverage type (asymmetric) effects.

Special case: γ=2 and κ=0 (standard GARCH model).

Page 18: Module 3 GARCH Models

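Threshold ARCH (TARCH)
Rabemananjara, R. and J.M. Zakoian (1993), "Threshold ARCH Models and Asymmetries in Volatilities," Journal of Applied Econometrics.

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\big(\alpha_i I(\varepsilon_{t-i} > \kappa) + \gamma_i I(\varepsilon_{t-i} \le \kappa)\big)\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2$$

where I(.) is an indicator function and κ is the threshold.

There are two variances:

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2, \qquad \text{if } \varepsilon_{t-i} > \kappa$$

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\gamma_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2, \qquad \text{if } \varepsilon_{t-i} \le \kappa$$

For example, the threshold can be set so that large events have an effect but small events do not.

Many other versions are possible by adding minor asymmetries or non-linearities in a variety of ways.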

Page 19: Module 3 GARCH Models

Switching ARCH (SWARCH)
Hamilton, J. D. and R. Susmel (1994), "Autoregressive Conditional Heteroskedasticity and Changes in Regime," Journal of Econometrics.

• Intuition:
- Hamilton (1989) models time series with changes of regime. Simplest case: 2-state process.
- Hamilton assumes the existence of an unobserved variable, st, that can take two values: one or two (or zero or one).
- Hamilton postulates a Markov transition matrix, P, for the evolution of the unobserved variable:

p(st =1 | st-1 =1) = p
p(st =2 | st-1 =1) = (1-p)
p(st =1 | st-1 =2) = q
p(st =2 | st-1 =2) = (1-q)

Page 20: Module 3 GARCH Models

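Reformulate the ARCH(q) equation to make the conditional variance dependent on st, i.e., the state of the economy:

$$\sigma_t^2 = \omega_{s_t,s_{t-1}} + \sum_{i=1}^{q}\alpha_{i,s_t,s_{t-1}}\varepsilon_{t-i}^2$$

• A parsimonious formulation:

$$\sigma_t^2/\gamma_{s_t} = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2/\gamma_{s_{t-i}}$$

For a SWARCH(1) with 2 states (1 and 2) we have 4 possible $\sigma_t^2$:

$$\sigma_t^2/\gamma_1 = \omega + \alpha_1\varepsilon_{t-1}^2/\gamma_1, \qquad s_t = 1,\ s_{t-1} = 1$$

$$\sigma_t^2/\gamma_1 = \omega + \alpha_1\varepsilon_{t-1}^2/\gamma_2, \qquad s_t = 1,\ s_{t-1} = 2$$

$$\sigma_t^2/\gamma_2 = \omega + \alpha_1\varepsilon_{t-1}^2/\gamma_1, \qquad s_t = 2,\ s_{t-1} = 1$$

$$\sigma_t^2/\gamma_2 = \omega + \alpha_1\varepsilon_{t-1}^2/\gamma_2, \qquad s_t = 2,\ s_{t-1} = 2$$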

Page 21: Module 3 GARCH Models

• The parameter γst=1 is set to 1. Then, the parameter γst=2 is a relative volatility scale parameter. If γst=2 =3, then volatility in the state 2 is three times higher than in state 1.

• Estimation of the model will estimate the volatility parameters, and the transition probabilities. As a byproduct of the estimation, we will also have an estimate for the latent variable –i.e., the “state.”

• In SWARCH models, the states refer to the states of volatility. For a 2-state example, we have “high,” or “low” volatility states.

Since we have an unobservable variable, estimation is usually done with a variation of the Kalman filter model.

Page 22: Module 3 GARCH Models

Integrated GARCH (IGARCH)

The standard GARCH model

$$\sigma_t^2 = \omega + \alpha(L)\varepsilon_t^2 + \beta(L)\sigma_t^2$$

is covariance stationary if

$$\alpha(1) + \beta(1) < 1.$$

• But strict stationarity does not require such a stringent restriction (that is, that the unconditional variance does not depend on t).

If we allow α1 + β1 =1, we have the IGARCH model.

• In the IGARCH model the autoregressive polynomial in the ARMA representation has a unit root: a shock to the conditional variance is “persistent.”

Page 23: Module 3 GARCH Models

• This is the Integrated GARCH model (IGARCH).

• Nelson (1990) establishes that, as this satisfies the requirement for strict stationarity, it is a well defined model.

• In practice, it is often found that α1 + β1 is close to 1.

• We may suspect that IGARCH is more a product of omitted structural breaks than the result of true IGARCH behavior. See Lamoureux and Lastrapes (1989) and Hamilton and Susmel (1994).

$$E_{t-1}(\sigma_{t+j}^2) = \sigma_t^2 + j\omega$$

Today's variance remains important for future forecasts of all horizons.

Page 24: Module 3 GARCH Models

FIGARCH Model
Baillie, Bollerslev and Mikkelsen (1996), Journal of Econometrics.

• Recall the ARIMA(p,d,q) model:

$$(1-L)^d\,\alpha(L)\,y_t = \beta(L)\,\varepsilon_t,$$

where εt is white noise, and α(L) and β(L) are lag polynomials of orders p and q.

When d=1, yt is an integrated process. In time series, it is usually assumed that d=1,2,…,D.

• But d can be any positive number, for example, 0<d<1. In this case, we have a fractionally integrated process, or ARFIMA. (See Granger and Joyeux (1980).)

• d is called the fractional integration parameter.

• When d ∈ (-1/2, 1/2), the series is stationary and invertible. See Hosking (1981).
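For non-integer d, the operator (1-L)^d is an infinite lag polynomial. The sketch below computes its first coefficients from the binomial expansion, using the standard recursion π_0 = 1, π_k = π_{k-1}(k-1-d)/k; the function name `frac_diff_weights` is an assumption made for this example.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of (1 - L)^d = sum_k pi_k L^k."""
    weights = [1.0]
    for k in range(1, n_weights):
        # Binomial-expansion recursion: pi_k = pi_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

first_difference = frac_diff_weights(1.0, 4)  # the usual (1 - L)
half_difference = frac_diff_weights(0.5, 4)   # fractional case
```

For d=1 the weights cut off after one lag. For 0<d<1 they decay slowly (hyperbolically) rather than geometrically, which is the long-memory property.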

Page 25: Module 3 GARCH Models

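• Recall the ARMA representation for the IGARCH process:

$$(1-\phi(L))(1-L)\,\varepsilon_t^2 = \omega + (1-\beta(L))\,\nu_t, \qquad \text{where } \nu_t = \varepsilon_t^2 - \sigma_t^2$$

• Now, the FIGARCH process is defined as:

$$(1-\phi(L))(1-L)^d\,\varepsilon_t^2 = \omega + (1-\beta(L))\,\nu_t$$

• When d=0, we have a GARCH(q,p); when d=1, we have an IGARCH.

• This model captures long-run persistence (long memory).

• Similar intuition carries over to the GARCH(q,p) model:

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2 = \omega + \alpha(L)\varepsilon_t^2 + \beta(L)\sigma_t^2$$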

Page 26: Module 3 GARCH Models

• Questions
1) Lots of ARCH models. Which one to use?
2) Choice of p and q. How many lags to use?

• Hansen and Lunde (2004) compared lots of ARCH models:
- It turns out that the GARCH(1,1) is a great starting model.
- Add a leverage effect for financial series and it's even better.

Page 27: Module 3 GARCH Models

Estimation: MLE

All of these models can be estimated by maximum likelihood. First we need to construct the sample likelihood.

Since we are dealing with dependent variables, we use the conditioning trick to get the joint distribution:

$$f(y_1, y_2, \dots, y_T; \theta) = f(y_1 \mid x_1; \theta)\, f(y_2 \mid y_1, x_2, x_1; \theta) \cdots f(y_T \mid y_{T-1}, \dots, y_1, x_T, \dots, x_1; \theta).$$

Taking logs:

$$\ln f(y_1, y_2, \dots, y_T; \theta) = \ln f(y_1 \mid x_1; \theta) + \ln f(y_2 \mid y_1, x_2, x_1; \theta) + \dots + \ln f(y_T \mid y_{T-1}, \dots, y_1, x_T, \dots, x_1; \theta).$$

Page 28: Module 3 GARCH Models

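Example: ARCH(1) model.

Assuming normality, we maximize with respect to θ the function:

$$\log L = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\big(\log\sigma_t^2 + \varepsilon_t^2/\sigma_t^2\big), \qquad \varepsilon_t = y_t - x_t\gamma$$

For the ARCH(1), $\sigma_t^2 = \omega + \alpha_1\varepsilon_{t-1}^2$, so

$$\log L = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log(\omega + \alpha_1\varepsilon_{t-1}^2) - \frac{1}{2}\sum_{t=1}^{T}\varepsilon_t^2/(\omega + \alpha_1\varepsilon_{t-1}^2)$$

Taking derivatives with respect to θ=(ω,α,γ), where γ collects the K mean parameters:

$$\frac{\partial \log L}{\partial\omega} = -\frac{1}{2}\sum_{t=1}^{T}\frac{1}{\omega + \alpha_1\varepsilon_{t-1}^2} + \frac{1}{2}\sum_{t=1}^{T}\frac{\varepsilon_t^2}{(\omega + \alpha_1\varepsilon_{t-1}^2)^2}$$

$$\frac{\partial \log L}{\partial\alpha_1} = -\frac{1}{2}\sum_{t=1}^{T}\frac{\varepsilon_{t-1}^2}{\omega + \alpha_1\varepsilon_{t-1}^2} + \frac{1}{2}\sum_{t=1}^{T}\frac{\varepsilon_t^2\,\varepsilon_{t-1}^2}{(\omega + \alpha_1\varepsilon_{t-1}^2)^2}$$

$$\frac{\partial \log L}{\partial\gamma} = \sum_{t=1}^{T} x_t'\varepsilon_t/\sigma_t^2 \qquad (\text{treating } \sigma_t^2 \text{ as given})$$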

Page 29: Module 3 GARCH Models

Note that ∂log L/∂γ=0 (K f.o.c.'s) will give us GLS.

Denote ∂log L/∂θ=S(yt,θ)=0 (S(.) is the score vector).

We have a system of K+2 equations in K+2 unknowns. But it is a non-linear system, so we will need to use numerical optimization.

• Gauss-Newton or BHHH can be easily implemented.

• Given the AR structure, we will need to make assumptions about σ0 (and ε0, ε1, ..., εp if we assume an AR(p) process for the mean).

Alternatively, we can take σ0 (and ε0, ε1, ..., εp) as parameters to be estimated (this can be computationally more intensive and the estimation can lose power).
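The steps above can be sketched for a zero-mean ARCH(1). This is only an illustration under stated assumptions: the data are simulated, the initial variance is set to the sample variance (one possible choice for σ0), and a coarse grid search stands in for a proper numerical optimizer such as BHHH. All names are hypothetical.

```python
import math
import random

def arch1_loglik(eps, omega, alpha, sigma2_0):
    """Gaussian log-likelihood of a zero-mean ARCH(1), given sigma2_0."""
    loglik = 0.0
    sigma2 = sigma2_0
    for t, e in enumerate(eps):
        if t > 0:
            sigma2 = omega + alpha * eps[t - 1] ** 2
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma2)
                          + e ** 2 / sigma2)
    return loglik

def simulate_arch1(n, omega, alpha, seed=0):
    """Simulate n observations from a Gaussian ARCH(1)."""
    rng = random.Random(seed)
    eps, prev = [], 0.0
    for _ in range(n):
        prev = math.sqrt(omega + alpha * prev ** 2) * rng.gauss(0.0, 1.0)
        eps.append(prev)
    return eps

eps = simulate_arch1(2000, omega=0.5, alpha=0.4)
sigma2_0 = sum(e * e for e in eps) / len(eps)  # assumed starting variance

# Coarse grid search over (omega, alpha); a real application would use
# a numerical optimizer instead.
grid = [(w / 10.0, a / 10.0) for w in range(1, 11) for a in range(0, 10)]
omega_hat, alpha_hat = max(
    grid, key=lambda p: arch1_loglik(eps, p[0], p[1], sigma2_0))
```

The grid keeps ω>0 and 0≤α<1, so the non-negativity constraints hold by construction.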

Page 30: Module 3 GARCH Models

Note: The appeal of MLE is the optimal properties of the resulting estimators under ideal conditions.

Crowder (1976) gives one set of sufficient regularity conditions for the MLE in models with dependent observations to be consistent and asymptotically normally distributed.

Verifying these regularity conditions is very difficult for general ARCH models; proofs exist for special cases like the GARCH(1,1).

For the GARCH(1,1) model: if $E[\ln(\alpha_1 z_t^2 + \beta_1)] < 0$, the model is strictly stationary and ergodic. See Lumsdaine (1992).

Page 31: Module 3 GARCH Models

• Common practice in empirical studies: Assume the necessary regularity conditions are satisfied.

If the conditional density is well specified and θ0 belongs to Ω, then

$$T^{1/2}(\hat\theta - \theta_0) \to N(0, A_0^{-1}), \qquad \text{where } A_0 = -T^{-1}\sum_{t=1}^{T}\partial S_t(y_t, \theta_0)/\partial\theta'$$

• Under the correct specification assumption, A0=B0, where

$$B_0 = T^{-1}\sum_{t=1}^{T}E[S_t(y_t, \theta_0)\,S_t(y_t, \theta_0)']$$

The estimator B0 has a computational advantage over A0: only first derivatives are needed. But A0=B0 only if the distribution is correctly specified. This is very difficult to know in practice.

We estimate A0 and B0 by replacing θ0 by its estimated MLE value.

Page 32: Module 3 GARCH Models

• Block-diagonality

In many applications of ARCH, the parameters can be partitioned into mean parameters, θ1, and variance parameters, θ2.

Then, ∂μt(θ)/∂θ2=0 and, although ∂σt(θ)/∂θ1≠0, the Information matrix is block-diagonal (under general symmetric distributions for zt and for particular ARCH specifications).

Not a bad result:
- Regression can be consistently done with OLS.
- Asymptotically efficient estimates for the ARCH parameters can be obtained on the basis of the OLS residuals.

• But block diagonality can't buy everything:
- Conventional OLS standard errors could be terrible.
- When testing for serial correlation in the presence of ARCH, the conventional Bartlett s.e., $n^{-1/2}$, could seriously underestimate the true s.e.

Page 33: Module 3 GARCH Models

Estimation: QMLE

• The assumption of conditional normality is difficult to justify in many empirical applications. But, it is convenient.

• The MLE based on the normal density may be given a quasi-maximum likelihood (QMLE) interpretation.

• If the conditional mean and variance functions are correctly specified, the normal quasi-score evaluated at θ0 has a martingale difference property:

E[S(yt,θ0)]=0, where S(yt,θ)=∂log L/∂θ.

Since this equation holds for any value of the true parameters, the QMLE, say θQMLE, is Fisher-consistent, i.e., E[S(yT, yT-1,…,y1; θ)] = 0 for any θ ∈ Ω.

Page 34: Module 3 GARCH Models

• The asymptotic distribution for the QMLE takes the form:

$$T^{1/2}(\hat\theta_{QMLE} - \theta_0) \to N(0, A_0^{-1}B_0A_0^{-1}).$$

The covariance matrix $A_0^{-1}B_0A_0^{-1}$ is called "robust": it is robust to departures from normality.

For symmetric departures from conditional normality, the QMLE is generally close to the exact MLE.

For non-symmetric conditional distributions both the asymptotic and the finite sample loss in efficiency may be large.

• Bollerslev and Wooldridge (1992) study the finite sample distribution of the QMLE and the Wald statistics based on the robust covariance matrix estimator.

Page 35: Module 3 GARCH Models

Estimation: GMM

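• Suppose we have an ARCH(q). We need moment conditions:

$$(1)\quad E[m_1] = E[x_t'(y_t - x_t\gamma)] = 0$$

$$(2)\quad E[m_2] = E[\boldsymbol{\varepsilon}_t^{2\prime}(\varepsilon_t^2 - \sigma_t^2)] = 0$$

$$(3)\quad E[m_3] = E[\varepsilon_t^2 - \omega/(1 - \alpha_1 - \dots - \alpha_q)] = 0$$

Note: (1) refers to the conditional mean, (2) refers to the conditional variance, and (3) to the unconditional variance.

GMM objective function:

$$Q(\theta; X, y) = \hat E[m(\theta; X, y)]'\, W\, \hat E[m(\theta; X, y)]$$

where

$$\hat E[m(\theta; X, y)] = \big[\hat E[m_1]',\ \hat E[m_2]',\ \hat E[m_3]'\big]'$$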

Page 36: Module 3 GARCH Models

• γ has K free parameters; α has q free parameters; ω adds one more. Then, we have a=K+q+1 parameters.

• m(θ;X,y) has r=K+m+2 equations.

• Dimensions: Q is 1x1; E[m(θ;X,y)] is rx1; W is rxr.

• The problem is over-identified: there are more equations than parameters, so we cannot solve E[m(θ;X,y)]=0 exactly.

• Choose a weighting matrix W for the objective function and minimize using a nonlinear solver (for example, optmum in GAUSS).

• Optimal weighting matrix: $W = \big(E[m(\theta;X,y)\,m(\theta;X,y)']\big)^{-1}$.

• $\mathrm{Var}(\hat\theta) = (1/T)[D'WD]^{-1}$,

where D = ∂E[m(θ;X,y)]/∂θ'. (All these expressions are evaluated at θ^.)

Page 37: Module 3 GARCH Models

Testing

White's (1980) general test for heteroskedasticity.
Engle's (1982) test: $TR^2 \sim \chi^2_q$.

• In ARCH Models, testing as usual: LR, Wald, and LM tests. Reliable inference from the LM, Wald and LR test statistics generally does require moderately large sample sizes of at least two hundred or more observations.

• Issues:
- Non-negativity constraints must be imposed. θ0 is often on the boundary of Ω. (Two sided tests may be conservative.)
- Lack of identification of certain parameters under H0, creating a singularity of the Information matrix under H0. For example, under H0: α1=0 (No ARCH), in the GARCH(1,1), ω and β1 are not jointly identified. See Davies (1977).
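Engle's TR² test can be sketched for q=1: regress the squared residuals on a constant and their own first lag, then multiply the R² by the number of observations. The code below is a minimal illustration on simulated ARCH(1) data; the function name and simulation settings are assumptions, and 3.84 is the 5% critical value of a χ² with one degree of freedom.

```python
import math
import random

def arch_lm_stat_q1(residuals):
    """Engle's T*R^2 statistic with one lag of the squared residual."""
    y = [e * e for e in residuals[1:]]
    x = [e * e for e in residuals[:-1]]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # R^2 of a simple regression equals the squared sample correlation
    r_squared = sxy ** 2 / (sxx * syy)
    return n * r_squared

# Made-up ARCH(1) data (omega = 0.5, alpha_1 = 0.5) to exercise the test
rng = random.Random(42)
eps, prev = [], 0.0
for _ in range(2000):
    prev = math.sqrt(0.5 + 0.5 * prev ** 2) * rng.gauss(0.0, 1.0)
    eps.append(prev)

lm_stat = arch_lm_stat_q1(eps)
reject_no_arch = lm_stat > 3.84  # 5% critical value of chi-square(1)
```

For q lags the same idea applies with a multiple regression and a χ² with q degrees of freedom.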

Page 38: Module 3 GARCH Models

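Ignoring ARCH
Hamilton, J.D. (2008), "Macroeconomics and ARCH," Working paper, UCSD.

• Many macroeconomic and financial time series have an AR structure. What happens when ARCH effects are ignored?

Assume yt = γ0 + γ1 yt-1 + εt, where εt follows a GARCH(1,1) model.

Then, ignoring ARCH, OLS gives (with $x_t = (1, y_{t-1})'$):

$$\hat\gamma = \Big(\sum_{t=1}^{T}x_tx_t'\Big)^{-1}\sum_{t=1}^{T}x_ty_t, \qquad \hat V = s^2\Big(\sum_{t=1}^{T}x_tx_t'\Big)^{-1}$$

Assuming the 4th moment exists, standard consistency results give us

$$T\hat V = s^2\Big(T^{-1}\sum_{t=1}^{T}x_tx_t'\Big)^{-1} \xrightarrow{p} E[\varepsilon_t^2]\begin{pmatrix}1 & E[y_{t-1}]\\ E[y_{t-1}] & E[y_{t-1}^2]\end{pmatrix}^{-1}$$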

Page 39: Module 3 GARCH Models

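For simplicity assume γ0=0. Then, according to the OLS formula, $T^{1/2}\hat\gamma_1$ is approximately N(0,1). But,

$$T^{1/2}(\hat\gamma - \gamma) = \Big(T^{-1}\sum_{t=1}^{T}x_tx_t'\Big)^{-1}\Big(T^{-1/2}\sum_{t=1}^{T}x_t\varepsilon_t\Big)$$

Under H0 (the autoregressive coefficient is zero, so yt-1 = εt-1), the second summation is a MDS with variance

$$E[\varepsilon_t^2x_tx_t'] = \begin{pmatrix}E[\varepsilon_t^2] & E[\varepsilon_t^2\varepsilon_{t-1}]\\ E[\varepsilon_t^2\varepsilon_{t-1}] & E[\varepsilon_t^2\varepsilon_{t-1}^2]\end{pmatrix}$$

Using the CLT:

$$T^{-1/2}\sum_{t=1}^{T}y_{t-1}\varepsilon_t \xrightarrow{D} N\big(0, E[\varepsilon_t^2\varepsilon_{t-1}^2]\big)$$

To calculate the value of the variance, recall the ARMA(1,1) representation for GARCH(1,1) models: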

To calculate the value of the variance, recall the ARMA(1,1) representation for GARCH(1,1) models:

Page 40: Module 3 GARCH Models

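$$\varepsilon_t^2 = \omega + (\alpha_1+\beta_1)\varepsilon_{t-1}^2 + \nu_t - \beta_1\nu_{t-1}, \qquad z_t \sim iid\ N(0,1)$$

For an ARMA(1,1), with $\sigma^2 = E[\varepsilon_t^2]$ and $\mu_4 = E[\varepsilon_t^4]$:

$$E[\varepsilon_t^2\varepsilon_{t-1}^2] = E[(\varepsilon_t^2 - \sigma^2)(\varepsilon_{t-1}^2 - \sigma^2)] + \sigma^4 = \rho_1(\mu_4 - \sigma^4) + \sigma^4,$$

$$\rho_1 = \frac{\alpha_1[1 - (\alpha_1+\beta_1)\beta_1]}{1 + \beta_1^2 - 2(\alpha_1+\beta_1)\beta_1}$$

Then, after some substitutions:

$$T^{1/2}\hat\gamma_1 \xrightarrow{D} N(0, V_{11}), \qquad V_{11} = 1 + \frac{2\alpha_1[1 - (\alpha_1+\beta_1)\beta_1]}{1 - 3\alpha_1^2 - 2\alpha_1\beta_1 - \beta_1^2}$$

Note: V11 ≥ 1, with equality iff α1=0. OLS treats $T^{1/2}\hat\gamma_1$ as N(0,1), but the true asymptotic distribution is N(0,V11). OLS tests reject more often. As α1 and β1 get closer to the region where μ4=∞, we reject even more.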
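The size distortion can be put into numbers. The sketch below assumes the asymptotic variance V11 = 1 + 2α1[1-(α1+β1)β1]/(1-3α1²-2α1β1-β1²), which follows from the ARMA(1,1) autocorrelation and the GARCH(1,1) fourth moment, and it computes how often a nominal 5% OLS t-test rejects asymptotically. Function names and parameter values are illustrative.

```python
import math

def v11(alpha, beta):
    """Asymptotic variance of sqrt(T)*gamma_hat under GARCH(1,1) errors."""
    denom = 1.0 - 3.0 * alpha ** 2 - 2.0 * alpha * beta - beta ** 2
    if denom <= 0.0:
        return float("inf")  # the fourth moment does not exist
    return 1.0 + 2.0 * alpha * (1.0 - (alpha + beta) * beta) / denom

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ols_rejection_prob(alpha, beta, critical=1.96):
    """P(|t| > critical) when the OLS t-ratio is really N(0, V11)."""
    variance = v11(alpha, beta)
    if math.isinf(variance):
        return 1.0
    return 2.0 * (1.0 - normal_cdf(critical / math.sqrt(variance)))

no_arch = ols_rejection_prob(0.0, 0.0)    # nominal size, about 5%
with_arch = ols_rejection_prob(0.1, 0.8)  # made-up GARCH(1,1) values
```

With α1=0 the test has its nominal size; any α1>0 pushes the true rejection rate above 5%.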

Page 41: Module 3 GARCH Models

Figure 1. (From Hamilton (2008).) Asymptotic rejection probability for OLS t-test that autoregressive coefficient is zero as a function of GARCH(1,1) parameters α and δ. Note: null hypothesis is actually true and test has nominal size of 5%.

Page 42: Module 3 GARCH Models

• If the ARCH parameters are in the usual range found in estimates of GARCH models, an OLS t-test with no correction for heteroskedasticity would spuriously reject with arbitrarily high probability for a sufficiently large sample.

• The good news is that the rate of divergence is slow: it may take a lot of observations before the accumulated excess kurtosis overwhelms the other factors.

The solid line in Figure 2 plots the fraction of samples for which an OLS t test of γ1= 0 exceeds two in absolute value. Thinking we're only rejecting a true null hypothesis 5% of the time, we would do so 15% of the time when T = 100 and 33% of the time when T = 1,000.

• White's (1980) s.e. help. Newey-West's (1987) s.e. help less.

• Engle's TR2 is very good. Better than White's (1980), as expected.

Page 43: Module 3 GARCH Models

Figure 2. (From Hamilton (2008).) Fraction of samples in which OLS t-test leads to rejection of the null hypothesis that autoregressive coefficient is zero as a function of the sample size for regression with Gaussian errors (solid line) and Student's t errors (dashed line). Note: null hypothesis is actually true and test has nominal size of 5%.

Page 44: Module 3 GARCH Models
Page 45: Module 3 GARCH Models

ARCH in MEAN (G)ARCH-M
Engle, R.F., D. Lilien and R. Robins (1987), "Estimating Time Varying Risk Premia in the Term Structure: the ARCH-M Model," Econometrica.

• Finance theory suggests that the mean of a relationship will be affected by the volatility or uncertainty of a series.

ARCH in mean (ARCH-M) framework:

$$y_t = x_t\gamma + \delta\sigma_t^2 + \varepsilon_t$$

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2$$

Either the variance or the standard deviation is included in the mean relationship.

Page 46: Module 3 GARCH Models

The difference from the previous ARCH/GARCH models is that the volatility also enters the mean of the return.

This is exactly what Merton's (1973, 1980) ICAPM produces: a risk-return tradeoff. It must be the case that δ > 0.

• Again, we have a Davies (1977)-type problem. Let μt(θ)= μ + δσt(θ), with μ≠0.

δ is only identified if the conditional variance is time-varying. Thus, a standard joint test for ARCH effects and δ= 0 is not feasible.

Note: Block-diagonality does not hold for the ARCH-M model. Consistent estimation requires correct specification of cond. mean and variance. (And simultaneous estimation.)

Page 47: Module 3 GARCH Models

Non-normality assumptions

The basic GARCH model allows a certain amount of leptokurtosis. It is often insufficient to explain real world data.

Solution: Assume a distribution other than the normal, which helps to allow for the fat tails in the distribution.

• t Distribution - Bollerslev (1987)
The t distribution has a degrees of freedom parameter which allows greater kurtosis. The t likelihood function is

$$l_t = \ln\Big(\Gamma(0.5(v+1))\,\Gamma(0.5v)^{-1}\,[\pi(v-2)]^{-1/2}\,\big(1 + z_t^2(v-2)^{-1}\big)^{-(v+1)/2}\Big) - 0.5\ln(\sigma_t^2)$$

where Γ is the gamma function and v is the degrees of freedom. As v→∞, this tends to the normal distribution.

• GED Distribution - Nelson (1991)
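For the t case, the density of the standardized error can be sketched directly. The code assumes a Student-t scaled to unit variance (which requires v>2) and compares its log-density with the standard normal; the function names are made up for the example. The likelihood contribution of observation t would be this log-density evaluated at zt, minus 0.5 ln(σt²).

```python
import math

def std_t_logpdf(z, v):
    """Log-density of a unit-variance Student-t with v degrees of freedom."""
    if v <= 2:
        raise ValueError("need v > 2 for a finite variance")
    const = (math.lgamma(0.5 * (v + 1.0)) - math.lgamma(0.5 * v)
             - 0.5 * math.log(math.pi * (v - 2.0)))
    return const - 0.5 * (v + 1.0) * math.log(1.0 + z * z / (v - 2.0))

def std_normal_logpdf(z):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

# Fat tails: a 4-standard-deviation event is far more likely under t(5)
tail_t = std_t_logpdf(4.0, 5.0)
tail_normal = std_normal_logpdf(4.0)

# Large v: the t log-density approaches the normal one
gap_large_v = abs(std_t_logpdf(0.5, 1000.0) - std_normal_logpdf(0.5))
```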

Page 48: Module 3 GARCH Models

Multivariate ARCH Models
Engle, R.F. and K.F. Kroner (1993), Multivariate Simultaneous Generalized ARCH, working paper, Department of Economics, UCSD.

It is common in Finance to stress the importance of covariance terms. The above model can handle this if y is a vector and we interpret the variance term as a complete covariance matrix. The whole analysis carries over into a system framework:

$$E_{t-1}(\varepsilon_t\varepsilon_t') = \Sigma_t$$

• From an econometric theory point of view, multivariate ARCH models add no problems. The log likelihood assuming normality is:

$$\log L = -0.5\Big[TN\log(2\pi) + \sum_{t=1}^{T}\big\{\log|\Sigma_t| + \varepsilon_t'\Sigma_t^{-1}\varepsilon_t\big\}\Big]$$

Page 49: Module 3 GARCH Models

• Several practical issues:

-A direct extension of the GARCH model would involve a very large number of parameters (for 4 assets, we have to estimate 10 elements in Σt).

-The conditional variance could easily become negative even when all the parameters are positive.

-The chosen parameterization should allow causality between variances.

- Covariances and Correlations: How to model them?

Page 50: Module 3 GARCH Models

Vector ARCH

Let vech denote the matrix stacking operation

$$\mathrm{vech}\begin{pmatrix}a & b\\ b & d\end{pmatrix} = (a, b, d)'$$

A general extension of the GARCH model would then be

$$\mathrm{vech}(\Sigma_t) = W + A(L)\,\mathrm{vech}(\varepsilon_{t-1}\varepsilon_{t-1}') + B(L)\,\mathrm{vech}(\Sigma_{t-1})$$

This quickly produces huge numbers of parameters: for p=q=1 and N=5 there are 465 parameters to estimate here.

W is a vector with N(N+1)/2 elements; A(L) and B(L) are square matrices with N(N+1)/2 x N(N+1)/2 elements. Total parameters (for p=q=1): N(N+1)/2 + N²(N+1)²/2.
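The parameter count is easy to verify. The sketch below assumes p=q=1 and N series: the intercept vector has N(N+1)/2 elements and each of the two coefficient matrices has [N(N+1)/2]² elements. The helper names `vech` and `vec_garch_param_count` are made up for this illustration.

```python
def vech(matrix):
    """Stack the lower triangle of a square matrix, column by column."""
    n = len(matrix)
    return [matrix[i][j] for j in range(n) for i in range(j, n)]

def vec_garch_param_count(n_series):
    """Parameters in the unrestricted vech-GARCH with p = q = 1."""
    m = n_series * (n_series + 1) // 2  # distinct elements of Sigma_t
    return m + 2 * m * m                # W, plus the A and B matrices

stacked = vech([[1.0, 2.0], [2.0, 4.0]])  # (a, b, d)' = (1, 2, 4)'
count_5_assets = vec_garch_param_count(5)
distinct_4_assets = len(vech([[0.0] * 4 for _ in range(4)]))
```

The count grows with the fourth power of N, which is why restricted versions are used in practice.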

Page 51: Module 3 GARCH Models

• One simplification used is the Diagonal GARCH model where A and B are taken to be diagonal, but this assumes away causality in variances and co-persistence. We need still more restrictions to ensure positive definiteness in the covariance matrix.

• A more tractable alternative: the BEKK model

$$\Sigma_t = VV' + \sum_{i=1}^{q}A_i\varepsilon_{t-i}\varepsilon_{t-i}'A_i' + \sum_{j=1}^{p}B_j\Sigma_{t-j}B_j'$$

• We can further reduce the parameterization by making A and B diagonal.

• V is a lower triangular matrix with N(N+1)/2 parameters; A and B are square matrices with N² parameters each.

• BEKK guarantees p.d. for Σt, since it works with quadratic forms.

Page 52: Module 3 GARCH Models

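Factor ARCH

Suppose a vector of N series has a common factor structure, such as:

$$y_t = B\xi_t + \varepsilon_t$$

where ξ are the common factors and

$$\varepsilon_t \sim N(0, \Psi), \qquad E_{t-1}(\xi_t\xi_t') = \Lambda_t$$

then the conditional covariance matrix of y is given by

$$\mathrm{Cov}_{t-1}(y_t) = \Sigma_t = B\Lambda_tB' + \Psi$$

If Λt is diagonal with elements λkt, or if the off-diagonal elements are constant and combined into Ψ, then the model may be written as

$$\Sigma_t = \Psi + \sum_{k=1}^{K}b_kb_k'\lambda_{kt}$$

where bk is the kth column of B.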

Page 53: Module 3 GARCH Models

So given a set of factors we may estimate a parsimonious model for the covariance matrix once we have parameterized λk.

• One assumption is that we observe a set of factors which cause the variance, then we can simply use these. For example, “the market,” liquidity, interest rates, exchange rates, etc.

• Another common assumption is that each factor has a univariate GARCH representation.

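For example, in the GARCH(1,1) case:

$$\lambda_{kt} = \omega_k + \alpha_k\xi_{k,t-1}^2 + \beta_k\lambda_{k,t-1}, \qquad k = 1, \dots, K$$

• Application of Factor ARCH: Common Factors. (Engle and Kozicki (1993), Engle and Susmel (1993).)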

• Diebold and Nerlove (1989) use a factor ARCH structure, but with λk as a latent variable. (Estimation: Kalman filter.)

Page 54: Module 3 GARCH Models

Realized Volatility (RV) Models

French, Schwert and Stambaugh (1987) use higher frequency data to estimate the variance as:

$$s_t^2 = \frac{1}{k}\sum_{i=1}^{k}r_{t,i}^2$$

where rt,i are the realized daily returns within month t, and we estimate the monthly variance.

• Model-free measure –i.e., no need for ARCH-family specifications.

• This method is used a lot for intra-daily data, called high frequency (HF) data.

• Very popular to calculate intra-day or daily volatility. For example, based on TAQ data, say, 1’ or 10’ realized returns we can calculate the daily variance, or realized volatility, RVt:

Page 55: Module 3 GARCH Models
Page 56: Module 3 GARCH Models

$$RV_t = \sum_{j=1}^{M}r_{t,j}^2, \qquad t = 1, 2, \dots, T$$

where rt,j is the jth interval return on day t. That is, RV is defined as the sum of squared intraday returns.

• RV is affected by microstructure effects: bid-ask bounce, infrequent trading, calendar effects, etc. For example, the bid-ask bounce induces serial correlation in intra-day returns, which biases RVt. (Big problem!)

-Proposed Solution: filter the intra-day returns using MA or AR models before constructing RV measures.

• We can use time series models, say an ARIMA, for RVt to forecast daily volatility.
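The RV measure itself is a one-line computation. The sketch below assumes that each day's intraday returns are already available as a list (no microstructure filtering is applied), and the numbers are made up.

```python
import math

def realized_variance(intraday_returns):
    """Sum of squared intraday returns for one day."""
    return sum(r * r for r in intraday_returns)

# Two made-up days of intraday returns (say, 3 intervals per day)
days = [[0.01, -0.02, 0.01], [0.00, 0.005, -0.005]]
rv_series = [realized_variance(day) for day in days]
realized_vol = [math.sqrt(rv) for rv in rv_series]
```

The resulting `rv_series` is the daily time series to which a forecasting model such as an ARIMA could then be fitted.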

Page 57: Module 3 GARCH Models

• The key problem is the choice of sampling frequency (or number of observations per day).

- Bandi and Russell (2003) propose a data-based method for choosing the frequency that minimizes the MSE of the measurement error.
- Simulations and empirical examples suggest optimal sampling is around 1-3 minutes for equity returns.

• Under some conditions (bounded kurtosis and autocorrelation of squared returns less than 1), RVt is consistent and m.s. convergent.

• Realized volatility is a measure. It has a distribution.

• For returns, the distribution of RV is non-normal (as expected). It tends to be skewed right and leptokurtic. For log returns, the distribution is approximately normal.

• Daily returns standardized by RV measures are nearly Gaussian.

• RV is highly persistent.

Page 58: Module 3 GARCH Models

• Parkinson's (1980) estimator:

$$s_t^2 = \frac{\sum_t\big(\ln(H_t) - \ln(L_t)\big)^2}{4\ln(2)\,T},$$

where Ht is the highest price and Lt is the lowest price.

• There is an RV counterpart, using HF data: Realized Range (RR):

$$RR_t = \frac{\sum_j 100 \times \big(\ln(H_{t,j}) - \ln(L_{t,j})\big)^2}{4\ln(2)},$$

where Ht,j and Lt,j are the highest and lowest price in the jth interval.

• These "range" estimators are very good and very efficient.

Reference: Christensen and Podolskij (2005).
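Parkinson's range estimator can be sketched as follows, assuming one high and one low price per period and averaging over the T periods. The function name and the prices are made up.

```python
import math

def parkinson_variance(highs, lows):
    """Parkinson (1980) variance estimate from high and low prices."""
    n_periods = len(highs)
    total = sum((math.log(h) - math.log(l)) ** 2
                for h, l in zip(highs, lows))
    return total / (4.0 * math.log(2.0) * n_periods)

# Made-up highs and lows for three periods
estimate = parkinson_variance([110.0, 105.0, 102.0], [100.0, 101.0, 99.0])
single_period = parkinson_variance([110.0], [100.0])
```

Only the range ln(H)-ln(L) enters, so the estimator needs no intraday return series.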

• Another method: AR model for volatility:

$$|\varepsilon_t| = \alpha + \beta|\varepsilon_{t-1}| + \upsilon_t$$

The εt are estimated from a first step procedure, i.e., a regression.

Make sure that the estimates are positive.

Page 59: Module 3 GARCH Models

Stochastic volatility (SV/SVOL) models

Jacquier, E., Polson, N., Rossi, P. (1994), Bayesian analysis of stochastic volatility models, Journal of Business and Economic Statistics. (Estimation)
Heston, S.L. (1993), A closed-form solution for options with stochastic volatility with applications to bond and currency options, Review of Financial Studies. (Theory)

• The difference with ARCH models: The shocks that govern the volatility are not necessarily εt's.

$$\sigma_t = \omega + \beta\sigma_{t-1} + v_t, \qquad v_t \sim N(0, \sigma_v^2)$$

Or using logs:

$$\log\sigma_t = \omega + \beta\log\sigma_{t-1} + v_t$$

• We have 3 SVOL parameters to estimate: φ=(ω,β,σv).

Page 60: Module 3 GARCH Models

• This is really a discretization of a continuous-time model, where the mean and the variance follow two OU processes.

• SVOL models can be estimated by MLE, QMLE or other methods. In general, Bayesian methods (Gibbs sampling, MCMC) are used.

• Brief Review: Bayesian Estimation

Idea: We are not estimating a parameter value (θ), but rather updating and sharpening our subjective beliefs about θ.

• The centerpiece of the Bayesian methodology is Bayes' theorem:

P(A|B) = P(A ∩B)/P(B) = P(B|A) P(A)/P(B).

Think of B as “something known” –for example, the data- and A as “something unknown” –e.g., the coefficients of a model.

• Our interest: Value of the parameters (θ), given the data (y).

Page 61: Module 3 GARCH Models

• We can write:

P(θ|y) = P(y|θ) P(θ)/P(y) ("Bayesian learning")

• For estimation, we can ignore the term P(y), since it does not depend on the parameters. Then, we can write:

P(θ|y) ∝ P(y|θ) P(θ)

• Terminology:
- P(y|θ): Density of the data, given the parameters ("likelihood function").
- P(θ): Prior density of the parameters. Prior belief of the researcher.
- P(θ|y): Posterior density of the parameters, given the data. (A mixture of the prior and the "current information" from the data.)

Note: Posterior is proportional to likelihood times prior.
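The proportionality can be illustrated on a discrete grid of parameter values. This is a toy sketch, not the SVOL estimator: θ is the probability of heads for a coin, the prior is flat, and the data (7 heads, 3 tails) are made up.

```python
def grid_posterior(grid, prior, likelihood):
    """Posterior on a grid: normalize likelihood(theta) * prior(theta)."""
    unnormalized = [likelihood(theta) * p for theta, p in zip(grid, prior)]
    total = sum(unnormalized)  # plays the role of P(y)
    return [u / total for u in unnormalized]

grid = [i / 10.0 for i in range(11)]    # candidate values of theta
prior = [1.0 / len(grid)] * len(grid)   # flat prior
heads, tails = 7, 3                     # made-up data

posterior = grid_posterior(
    grid, prior, lambda th: th ** heads * (1.0 - th) ** tails)
posterior_mode = grid[posterior.index(max(posterior))]
```

Dividing by the sum is the only place where P(y) enters, which is why it can be ignored when locating the posterior mode.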

• Prior information is a controversial aspect of Bayesian econometrics since it sounds unscientific. Where do priors come from?

Page 62: Module 3 GARCH Models

• Prior and the likelihood are needed to get the posterior.

• Once we get more data, the posterior becomes the prior and we update again.

• The calculations involved in Bayesian analysis can be burdensome.

• Priors can have any form. However, it is common to choose particular classes of priors which are easy to interpret and/or make computation easier.

• Conjugate prior: prior and posterior both have same class of distributions.

• Prior can be interpreted as arising from an imaginary data set from the same process which generated the actual data.

Page 63: Module 3 GARCH Models

Example: Linear Model y = Xβ + ε
- Suppose that the data are normal, i.e., f(y|β,σ,X) = N(Xβ,σ²I).
- The X's are fixed.
- Assume β|σ² ~ N(m,σ²A).
- Assume σ is known (to simplify).

Note: m represents the best guess for β, before seeing y and X. A represents the confidence in the guess.

Recall that we can write
y - Xβ = (y – Xb) - X(β- b), where b = (X'X)⁻¹X'y,
(y–Xβ)'(y–Xβ) = (y–Xb)'(y–Xb) + (β- b)'X'X(β- b) - 2(β- b)'X'(y–Xb)

= υs² + (β- b)'X'X(β- b),

since X'(y–Xb) = 0, where s² = SSE/(T-k) = (y–Xb)'(y–Xb)/(T-k) and υ=(T-k).

Page 64: Module 3 GARCH Models

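• The likelihood can be written as:

$$f(y \mid X, \beta, \sigma^2) = (2\pi)^{-T/2}\,(1/\sigma^2)^{k/2}\exp\Big\{-\frac{1}{2\sigma^2}(\beta-b)'X'X(\beta-b)\Big\} \times (1/\sigma^2)^{(T-k)/2}\exp\Big\{-\frac{\upsilon s^2}{2\sigma^2}\Big\}$$

The likelihood can be written as a product of a normal and a density of form f(θ) = κ θ^(-λ) exp{-λ/θ}.

This is called an inverted gamma (inverse of a χ²) distribution.

Note: Bayesians work with h = 1/σ², which they call "precision." A gamma prior is usually assumed for h.

• Then, the posterior is:

$$f(\beta \mid y, X, \sigma^2) \propto f(y \mid X, \beta, \sigma^2)\, f(\beta \mid \sigma^2)$$

$$\propto h^{T/2}\exp\Big\{-\frac{h}{2}(y - X\beta)'(y - X\beta)\Big\} \times h^{k/2}|A|^{-1/2}\exp\Big\{-\frac{h}{2}(\beta - m)'A^{-1}(\beta - m)\Big\}$$

$$\propto \exp\Big\{-\frac{h}{2}(\beta - m^*)'(A^{-1} + X'X)(\beta - m^*)\Big\}$$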

Page 65: Module 3 GARCH Models

where m*= (A⁻¹+X'X)⁻¹(A⁻¹ m + X'y)

• In other words, the pdf of β, conditioning on the data, is normal with mean m* and variance matrix σ² (X'X+A⁻¹)⁻¹.

• Note I: If we have a large variance A (a "diffuse prior"), our prior, m, will have a lower weight. As A→∞, m* →(X'X)⁻¹X'y (OLS!)

• Note II: We had to specify the priors and the distribution of the data. If we change any of these two assumptions, the results would change.

We get exact results, because we made distributional assumptions on the data.

• We can do similar calculations when we impose another prior on σ. But, the results would change.

(See Hamilton (1994), Chapter 12)
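The weighting between prior and data in m* can be seen in the scalar case. The sketch below assumes one regressor (k=1), so that A, X'X and X'y are numbers and m* = (1/A + X'X)⁻¹(m/A + X'y); the function name and the data are made up.

```python
def posterior_mean_scalar(x, y, prior_mean, prior_scale):
    """Posterior mean m* for one regressor under a N(m, sigma^2 A) prior."""
    xtx = sum(xi * xi for xi in x)
    xty = sum(xi * yi for xi, yi in zip(x, y))
    precision = 1.0 / prior_scale + xtx  # A^{-1} + X'X
    return (prior_mean / prior_scale + xty) / precision

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # made-up data; the OLS slope is exactly 2

diffuse = posterior_mean_scalar(x, y, prior_mean=0.0, prior_scale=1e12)
tight = posterior_mean_scalar(x, y, prior_mean=0.0, prior_scale=1e-12)
balanced = posterior_mean_scalar(x, y, prior_mean=0.0, prior_scale=1.0 / 14.0)
```

A diffuse prior returns (almost) the OLS estimate, a very tight prior returns (almost) the prior guess m, and a prior precision equal to X'X gives the midpoint between the two.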

Page 66: Module 3 GARCH Models

SVOL estimation is based on the idea of a hierarchical structure:
- f(y|σt²) (distribution of the data given the volatilities)
- f(σt²|φ) (distribution of the volatilities given the parameters)
- f(φ) (distribution of the parameters)

Goal: To estimate the joint f(σt²,φ|y) ("posterior").

Priors (Beliefs):
- Normal-Gamma for f(φ). (Standard Bayesian regression model)
- Inverse-Gamma for f(σv): exp(-v s0²/(2σ²))/σ^(v+1).
- Normals for ω, β.
- Impose (assume) stationarity of σt². (Truncate β as necessary.)

Algorithm: MCMC (JPR (1994)).
Augment the parameter space to include σt². Using a proper prior for f(σt²,φ), the MCMC provides inference about the joint posterior f(σt²,φ|y).

Classic reference: Andersen (1994), Mathematical Finance.
Application to interest rates: Kalimipalli and Susmel (2004), JEF.

