
Regime–Switching Models

HANS-MARTIN KROLZIG

Department of Economics and Nuffield College, University of Oxford.


Hilary Term 2002

The course offers an introduction to regime-switching models, covering their theoretical properties and the statistical tools for empirical research (including maximum likelihood estimation, model evaluation, model selection and forecasting). With the Markov-switching vector autoregressive model, it presents a systematic and operational approach to the econometric modelling of time series subject to shifts in regime. The theory will be linked to empirical studies of the business cycle, using MSVAR for OX.

Course structure

(1) Introduction

(2) Types of regime-switching models (assumptions, properties and estimation)

• Structural change and switching regression models
• Threshold models
• Smooth transition autoregressive models
• Markov-switching vector autoregressions

(3) Assessing business cycles with regime-switching models (Markov-switching VECM of the UK labour market)

(4) Prediction and structural analysis with regime-switching models

Basic literature

• Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica, 57, 357–384.

• Hamilton, J. D. (1994). Time Series Analysis. Princeton: Princeton University Press. Chapter 22.

• Hansen, B. (1999). Testing for Linearity, Journal of Economic Surveys, 13, 551–576.

• Krolzig, H.-M., Marcellino, M. and G. E. Mizon. A Markov–Switching Vector Equilibrium Correction Model of the UK Labour Market, Empirical Economics, forthcoming.

• Potter, S. (1999). Nonlinear time series modelling: An introduction, Journal of Economic Surveys, 13, 505–528.

• Teräsvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressive models, Journal of the American Statistical Association, 89, 208–218.

Monographs

• Franses, P. H. and D. van Dijk (2000). Nonlinear Time Series Models in Empirical Finance, Cambridge: Cambridge University Press.

• Granger, C. W. J. and T. Teräsvirta (1993). Modelling Nonlinear Economic Relationships, Oxford: Oxford University Press.

• Kim, C. J. and C. R. Nelson (1999). State-Space Models with Regime Switching, Cambridge, MA: MIT Press.

• Krolzig, H.-M. (1997). Markov-Switching Vector Autoregressions. Modelling, Statistical Inference and Application to Business Cycle Analysis, Lecture Notes in Economics and Mathematical Systems, Volume 454, Berlin: Springer.

1 Introduction

1.1 Linear time series models

Since Sims' (1980) critique of traditional macroeconometric modelling, vector autoregressive (VAR) models have been widely used in macroeconometrics. Their popularity is due to the flexibility of the VAR framework and the ease of producing macroeconomic models with useful descriptive characteristics, within which statistical tests of economically meaningful hypotheses can be executed. Over the last two decades VARs have been applied to numerous macroeconomic data sets, providing an adequate fit of the data and fruitful insight into the interrelations between economic data.

In the vector autoregressive model, the $K$-dimensional time series vector $y_t = (y_{1t}, \ldots, y_{Kt})'$ is generated by a vector autoregressive process of order $p$

\[
y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + \varepsilon_t \tag{1}
\]

where $t = 1, \ldots, T$, $\nu$ is a vector of intercepts and the $A_i$ are coefficient matrices. The error process $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{Kt})'$ is an unobservable, usually Gaussian, zero-mean white noise process,

\[
\varepsilon_t \sim WN(0, \Sigma),
\]

that is, $E[\varepsilon_t] = 0$, $E[\varepsilon_t \varepsilon_t'] = \Sigma$, and $E[\varepsilon_t \varepsilon_s'] = 0$ for $s \neq t$, where the variance-covariance matrix $\Sigma$ is time-invariant, positive-definite and non-singular.

The errors are such that the innovations can be interpreted as the one-step prediction errors of the system

\[
\varepsilon_t = y_t - E[y_t | Y_{t-1}],
\]

while the expectation of $y_t$ conditional on the information set $Y_{t-1} = (y_{t-1}, y_{t-2}, \ldots, y_{1-p})$ is given by the vector autoregression:

\[
E[y_t | Y_{t-1}] = \nu + \sum_{j=1}^{p} A_j y_{t-j}.
\]
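As a minimal sketch of equation (1) and of the innovation representation, the following Python snippet simulates a Gaussian VAR($p$) and recovers the innovations as one-step prediction errors. The function names, the zero pre-sample values and the use of NumPy are illustrative assumptions and not part of the course material.

```python
import numpy as np

def simulate_var(nu, A, Sigma, T, rng):
    """Simulate y_t = nu + sum_j A[j] y_{t-j} + eps_t with Gaussian white noise.

    A is a list of p (K x K) coefficient matrices; the p pre-sample
    values are set to zero (an assumption made for this sketch).
    """
    p, K = len(A), len(nu)
    y = np.zeros((T + p, K))
    eps = rng.multivariate_normal(np.zeros(K), Sigma, size=T + p)
    for t in range(p, T + p):
        y[t] = nu + sum(A[j] @ y[t - 1 - j] for j in range(p)) + eps[t]
    return y, eps

def conditional_mean(nu, A, y, t):
    """E[y_t | Y_{t-1}] = nu + sum_j A_j y_{t-j}."""
    return nu + sum(A[j] @ y[t - 1 - j] for j in range(len(A)))
```

For any simulated path, `y[t] - conditional_mean(nu, A, y, t)` reproduces the innovation `eps[t]`, which is exactly the one-step prediction error interpretation given above.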

Although macroeconomic fluctuations and growth have in the past largely been investigated using linear time series models, it is now increasingly recognized that the implications of linear models,

• linearity (invariance of dynamic multipliers with regard to the history of the system and the size and sign of the shocks),
• time-invariance of parameters,
• Gaussianity,

are problematic and that a better understanding requires new econometric tools. Consequently, there has been a great deal of interest in the modelling of non-linearities in economic time series.

1.2 Regime-switching models

While the importance of regime shifts seems to be generally accepted, there is no established theory suggesting a unique approach for specifying econometric models that embed changes in regime. Increasingly, regime shifts are not considered as singular deterministic events; instead, the unobservable regime is assumed to be governed by an exogenous stochastic process. Thus regime shifts of the past are expected to occur in the future in a similar fashion.

When a time series is subject to regime shifts, the parameters of the statistical model will be time-varying. The basic idea of regime-switching models is that the process is time-invariant conditional on a regime variable $s_t$ indicating the regime prevailing at time $t$. Regime-switching models characterize a non-linear data generating process as piecewise linear by restricting the process to be linear in each regime, where the regime might be unobservable and only a discrete number of regimes is feasible. The models within this class differ in their assumptions concerning the stochastic process generating the regime.

The primary objective of regime-switching models is to provide a systematic econometric approach for the statistical analysis of multiple time series when the mechanism which generated the data is subject to regime shifts:

(i) extracting the information in the data about regime shifts in the past,
(ii) estimating the parameters of the model consistently and efficiently,
(iii) detecting recent regime shifts,
(iv) correcting the vector autoregressive model at times when the regime alters,
(v) incorporating the probability of future regime shifts into forecasts.

The regime-switching models studied here represent a very general class which encompasses some alternative non-linear and time-varying models. In general, these models generate conditional heteroscedasticity and non-normality; prediction intervals are asymmetric and reflect the prevailing uncertainty about the regime.

We will investigate the issues of detecting multiple breaks in multiple time series, modelling, specification, estimation, testing and forecasting. En route, we discuss the relation to alternative non-linear models and models with time-varying parameters. In the course of this study we will also propose new directions to generalize the MS-VAR model. Although some methodological and technical ideas are discussed in detail, the focus is on modelling, specification and estimation of suitable models.

1.2.1 Regime shifts

Characteristics

finite number — infinite number

deterministic — stochastic

single event — reoccurring within sample — reoccurring out of sample

observable — observable if DGP is known — unobservable even if DGP is known

(strongly) exogenous — endogenous

permanent — persistent — transitory

predictable — unpredictable

common — interrelated — independent

Granger causal — Granger noncausal

Implications

nonlinearity

time-varying parameters

non-Gaussianity


1.2.2 The Conditional Process

The statistical model of $y_t$ is defined conditional upon the regime $s_t \in \{1, \ldots, M\}$:

\[
p(y_t | Y_{t-1}, X_t, s_t) =
\begin{cases}
f(y_t | Y_{t-1}, X_t, \theta_1) & \text{if } s_t = 1 \\
\quad \vdots \\
f(y_t | Y_{t-1}, X_t, \theta_M) & \text{if } s_t = M,
\end{cases}
\]

where $p(y_t | Y_{t-1}, X_t, s_t)$ is the probability density function of the vector of endogenous variables $y_t = (y_{1t}, \ldots, y_{Kt})'$ conditional upon the history of the process, $Y_{t-1} = \{y_{t-i}\}_{i=1}^{\infty}$, some (strongly) exogenous variables $X_t = \{x_{t-i}\}_{i=0}^{\infty}$ and the regime variable $s_t$. $\theta_m$ is the parameter vector present in regime $m$.

It is usually assumed that the statistical model is linear in each regime, say $s_t = m$. In the following we focus on autoregressive processes

\[
y_t = \nu_m + \alpha_{m1} y_{t-1} + \ldots + \alpha_{mp} y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim IID(0, \sigma_m^2),
\]

and their multivariate generalization: the vector autoregressive (VAR) process

\[
y_t = \nu_m + A_{m1} y_{t-1} + \ldots + A_{mp} y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim IID(0, \Sigma_m).
\]

1.2.3 The Regime Generating Process

If the stochastic process of $y_t$ is defined conditionally upon the (unobservable) regime $s_t$, a complete description of the data generating mechanism requires the specification of the stochastic process which generates the regime:

\[
\Pr(s_t | Y_{t-1}, S_{t-1}, X_t; \rho)
\]

where the history $S_{t-1} = \{s_{t-j}\}_{j=1}^{\infty}$ of the state variable might be unobserved but will be "reconstructed" from the observations, and the vector $\rho$ collects the parameters of the regime generating process.

2 Types of regime-switching models

2.1 Structural change and switching regression models

2.1.1 Structural break models

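Structural break at time $t = \tau$:

\[
y_t =
\begin{cases}
\nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} + \varepsilon_t & \text{for } t < \tau \\
\nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} + \varepsilon_t & \text{for } t \geq \tau
\end{cases}
\tag{2}
\]

where $\varepsilon_t \sim IID(0, \sigma^2)$. By using the indicator function $I(t; \tau)$,

\[
I(t; \tau) =
\begin{cases}
1 & \text{for } t \geq \tau \\
0 & \text{for } t < \tau,
\end{cases}
\]

the DGP can be rewritten as

\[
y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - I(t; \tau)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) I(t; \tau) + \varepsilon_t.
\]

Two different assumptions regarding the information structure:

• $\tau$ is known: break is deterministic
• $\tau$ is unknown: break is stochastic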

2.1.2 Switching regression model

Closely related to the structural change model is the switching regression model, where the regime shifts are driven by an observable regime variable $s_t$:

\[
y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - I(s_t = 2)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) I(s_t = 2) + \varepsilon_t. \tag{3}
\]

2.1.3 Maximum likelihood estimation under normality

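Structural break at time $t = \tau$:

\[
y_t =
\begin{cases}
\nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} + \varepsilon_t & \text{for } t < \tau \\
\nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} + \varepsilon_t & \text{for } t \geq \tau
\end{cases}
\]

where $\varepsilon_t \sim NID(0, \sigma^2)$.

Two different assumptions regarding the information structure:

• $\tau$ is known: break is deterministic

– Estimation: split the sample and apply OLS to each regime;
– The test of $\beta_1 = \beta_2$ has standard asymptotics, where $\beta_m = (\nu_m, \alpha_{m1}, \ldots, \alpha_{mp})'$;
– The same technique can be used for switching regression models.

• $\tau$ is unknown: break is stochastic

– Grid search for $\tau \in [0.15, 0.85]\,T$:

\[
\tau^* = \arg\min_{\tau} RSS(\tau) = \arg\min_{\tau} \left\{ \tau \hat{\sigma}_1^2(\tau) + (1 - \tau) \hat{\sigma}_2^2(\tau) \right\},
\]

where in the last expression $\tau$ is measured as a fraction of the sample, so that the criterion equals $RSS(\tau)/T$;

– The test of $\beta_1 = \beta_2$ has non-standard asymptotics, as $\tau$ becomes a nuisance parameter;
– See, inter alia, Andrews (1993), Andrews and Ploberger (1994) and Banerjee, Lazarova and Urga (1998).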
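The grid search for an unknown break date can be sketched in Python as follows. This is a minimal illustration for a univariate AR($p$) with an intercept, assuming NumPy; the function names and the convention that the returned date is the index of the first observation of the second regime are choices made for this sketch, not part of the original notes.

```python
import numpy as np

def ols_rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def break_date_grid_search(y, p=1, trim=0.15):
    """Search the break date tau over the central (1 - 2*trim) share of the sample.

    Returns (tau, rss), where tau indexes the first observation of regime 2
    in the original series y and rss is the minimized overall RSS.
    """
    n = len(y)
    Y = y[p:]                                            # effective sample
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - i:n - i] for i in range(1, p + 1)])
    T = len(Y)
    lo, hi = int(np.ceil(trim * T)), int(np.floor((1 - trim) * T))
    best_tau, best_rss = None, np.inf
    for k in range(lo, hi + 1):
        # split the sample at k and run OLS on each sub-sample
        rss = ols_rss(Y[:k], X[:k]) + ols_rss(Y[k:], X[k:])
        if rss < best_rss:
            best_tau, best_rss = k + p, rss
    return best_tau, best_rss
```

The sketch only locates the break; as noted above, inference on $\beta_1 = \beta_2$ at the estimated date requires non-standard critical values.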


2.2 Threshold models

2.2.1 The TAR model

In the threshold autoregressive model, the regime shifts are triggered by an observable, exogenous transition variable $x_t$ crossing the threshold $c$:

\[
y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - I(x_t; c)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) I(x_t; c) + \varepsilon_t \tag{4}
\]

where $\varepsilon_t \sim IID(0, \sigma^2)$. The indicator function $I(x_t; c)$ is of the type

\[
I(x_t; c) =
\begin{cases}
1 & \text{if } g(x_t) > c \\
0 & \text{if } g(x_t) \leq c.
\end{cases}
\]

For $x_t = t$, a model with a structural break at time $t = c$ results.

2.2.2 The SETAR model

If the transition variable is a lagged endogenous variable $y_{t-d}$ with delay $d > 0$, the self-exciting threshold autoregressive model results:

\[
y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - I(y_{t-d}; c)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) I(y_{t-d}; c) + \varepsilon_t \tag{5}
\]

where $\varepsilon_t \sim IID(0, \sigma^2)$ and $c$ is again the threshold.

Note that the model can be written as:

\[
y_t = \nu(s_t) + \sum_{i=1}^{p} \alpha_i(s_t) y_{t-i} + \varepsilon_t
\]

where, for a given but unknown threshold $c$, the 'probability' of the unobservable regime, say $s_t = 2$, is given by

\[
\Pr(s_t = 2 | S_{t-1}, Y_{t-1}) = I(y_{t-d}; c) =
\begin{cases}
1 & \text{if } g(y_{t-d}) > c \\
0 & \text{if } g(y_{t-d}) \leq c.
\end{cases}
\]

Thus in the self-exciting threshold autoregressive (SETAR) model, the regime-generating process is not assumed to be exogenous but directly linked to the lagged endogenous variable $y_{t-d}$. While the presumptions of the SETAR and the MS-AR model seem to be quite different, the relation between both model alternatives is rather close. Actually, SETAR and MS-VAR models can be observationally equivalent, as illustrated in Carrasco (1994).

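SETAR Models of US GNP of Tiao and Tsay (1994) and Potter (1993)

Quarterly growth rate of U.S. GNP, $\Delta y_t$:

\[
\Delta y_t = \mu(s_t) + \sum_{i=1}^{5} \alpha_i(s_t) \Delta y_{t-i} + u_t, \qquad u_t \sim IID(0, \sigma^2(s_t))
\]

2-regime SETAR with $d = 2$.

Empirical models:

• Threshold $r \approx 0$:

\[
s_t =
\begin{cases}
1 & \text{if } \Delta y_{t-2} > r \\
2 & \text{if } \Delta y_{t-2} \leq r
\end{cases}
\]

• Moving swiftly out of recessions: $\alpha(L)_2 \ll 0$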

2.2.3 Maximum likelihood estimation under normality

(i) For given delay $d$ and threshold $c$:

• Sample split according to $I(y_{t-d}; c)$.
• OLS regression for each regime separately:

\[
\hat{\beta}_m = (X_m' X_m)^{-1} X_m' y_m, \qquad
e_m = \left[ I - X_m (X_m' X_m)^{-1} X_m' \right] y_m, \qquad
\hat{\sigma}_m^2 = T_m^{-1} e_m' e_m,
\]

where $X_m$ and $y_m$ collect the observations from regime $m$, i.e. those observations at time $t$ with $s_t = m$. $T_m$ is the number of observations in regime $m$.

• Alternatively, indicator functions can be used in a single regression, constraining the residual error variance to be constant across regimes (see, for example, Potter, 1993, p. 113).

(ii) Grid search over $d$ and $c$: select the pair $(c, d)$ that minimizes the overall residual sum of squares (RSS)

\[
(c, d)^* = \arg\min_{(c,d)} RSS(c, d) = \arg\min_{(c,d)} \sum_{m=1}^{M} T_m \hat{\sigma}_m^2.
\]

Usually the search over $c$ (given $d$) is restricted such that

\[
\min_m T_m \geq 0.15\,T.
\]

(iii) When $p$ is unknown, fit is usually traded against parsimony. A search is made over all values of $p \leq p_{\max}$, and the preferred order is often taken to be that which minimizes AIC:

\[
p^* = \arg\min_{p} AIC(p), \qquad AIC(p) = \sum_{m=1}^{M} \left\{ T_m \ln \hat{\sigma}_m^2 + 2(p + 1) \right\}.
\]

Tsay (1989) describes a specification procedure for threshold models.
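Steps (i) and (ii) can be sketched in Python for a two-regime SETAR with a common lag order $p$. This is a minimal illustration assuming NumPy, with the candidate thresholds taken to be the observed values of $y_{t-d}$; the function names and the choice of a common effective sample across delays are assumptions of this sketch rather than a description of any particular software.

```python
import numpy as np

def ols_rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def fit_setar(y, p=1, d_max=2, trim=0.15):
    """Grid search over threshold c and delay d for a two-regime SETAR(p).

    Returns (rss, c, d) minimizing the overall RSS subject to
    min_m T_m >= trim * T. Regime 2 is defined by y_{t-d} > c.
    """
    n, k = len(y), max(p, d_max)          # common effective sample for all d
    Y = y[k:]
    X = np.column_stack([np.ones(n - k)] +
                        [y[k - i:n - i] for i in range(1, p + 1)])
    T = len(Y)
    best = (np.inf, None, None)
    for d in range(1, d_max + 1):
        z = y[k - d:n - d]                # transition variable y_{t-d}
        for c in np.unique(z):            # candidate thresholds
            upper = z > c
            T2 = int(upper.sum())
            if min(T2, T - T2) < trim * T:
                continue                  # too few observations in one regime
            rss = ols_rss(Y[~upper], X[~upper]) + ols_rss(Y[upper], X[upper])
            if rss < best[0]:
                best = (rss, float(c), d)
    return best
```

Because the RSS is a step function of $c$ that only changes at observed values of $y_{t-d}$, restricting the grid to those values loses nothing relative to a continuous search.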

Figure 1 Linear AR model of US GNP growth. [Plots omitted: actual and fitted values from an AR(3), 1948:1 - 1990:4, and from an AR(2), 1959:4 - 1996:2.]

Figure 2 SETAR model of US GNP growth. [Plots omitted: actual and fitted values from a SETAR(2;2,2), 1947:4 - 1990:4, and from a SETAR(2;2,2), 1959:4 - 1996:2.]

2.3 Smooth transition autoregressive models

2.3.1 The STAR model

In the smooth transition autoregressive model popularized by Granger and Teräsvirta (1993), the weight attached to the regimes depends on the realization of exogenous or lagged endogenous variables $z_t$:

\[
\Pr(s_t = 2 | S_{t-1}, Y_{t-1}, X_t) = G(z_t; \gamma, c),
\]

where the transition function $G(z_t; \gamma, c)$ is a continuous function determining the weight of regime 2, usually bounded between 0 and 1.

The STAR model is closely associated with the work of Teräsvirta (1994, 1998):

\[
y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - G(z_t; \gamma, c)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) G(z_t; \gamma, c) + \varepsilon_t \tag{6}
\]

where $\varepsilon_t \sim IID(0, \sigma^2)$.

The transition variable $z_t$ can be a lagged endogenous variable ($z_t = y_{t-d}$ for $d > 0$), an exogenous variable ($z_t = x_t$), or a function of some lagged endogenous and exogenous variables: $z_t = g(y_{t-d}, x_t)$. For $z_t = t$ a model with smoothly changing parameters results (see Lin and Teräsvirta, 1994). $c$ is the threshold, $\gamma$ is the smoothness parameter.

The STAR model (6) exhibits two regimes:

• the regimes are associated with the extreme values of the transition function, $G(z_t; \gamma, c) = 1$ and $G(z_t; \gamma, c) = 0$;
• the transition from one regime to the other is gradual;
• the regime occurring at time $t$ is observable (for given $z_t$, $\gamma$, $c$) and can be determined by $G(z_t; \gamma, c)$.

For multiple-regime STAR models, see van Dijk (1999).

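Choices for the transition function $G(z_t; \gamma, c)$:

• logistic cumulative distribution function (LSTAR): different behavior for positive versus negative values of $z_t$ relative to $c$

\[
G(z_t; \gamma, c) = \frac{1}{1 + \exp\{-\gamma (z_t - c)\}}.
\]

For $\gamma \to \infty$, LSTAR → SETAR:

\[
G(z_t; \gamma, c) = I(z_t > c);
\]

For $\gamma \to 0$, LSTAR → linear AR:

\[
G(z_t; \gamma, c) = 0.5.
\]

• exponential function (ESTAR): different behavior for small versus large deviations of $z_t$ from the threshold $c$:

\[
G(z_t; \gamma, c) = 1 - \exp\{-\gamma (z_t - c)^2\}.
\]

For $\gamma \to \infty$ and $\gamma \to 0$, ESTAR → linear AR, because the transition function becomes constant: $G(z_t; \gamma, c) \to 1$ for $z_t \neq c$ as $\gamma \to \infty$, and

\[
G(z_t; \gamma, c) = 0
\]

for $\gamma \to 0$.

• quadratic logistic function:

\[
G(z_t; \gamma, c) = \frac{1}{1 + \exp\{-\gamma (z_t - c_1)(z_t - c_2)\}}.
\]

For $\gamma \to \infty$, quadratic LSTAR → 3-regime SETAR:

\[
G(z_t; \gamma, c) = 1 - I(c_1 < z_t < c_2);
\]

For $\gamma \to 0$, quadratic LSTAR → linear AR:

\[
G(z_t; \gamma, c) = 0.5.
\]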
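The three transition functions and their limiting cases can be checked numerically. The following Python sketch assumes NumPy; the function names are illustrative and not taken from the notes.

```python
import numpy as np

def logistic_G(z, gamma, c):
    """LSTAR transition function: 1 / (1 + exp(-gamma (z - c)))."""
    return 1.0 / (1.0 + np.exp(-gamma * (z - c)))

def exponential_G(z, gamma, c):
    """ESTAR transition function: 1 - exp(-gamma (z - c)^2)."""
    return 1.0 - np.exp(-gamma * (z - c) ** 2)

def quadratic_logistic_G(z, gamma, c1, c2):
    """Quadratic logistic transition function with thresholds c1 < c2."""
    return 1.0 / (1.0 + np.exp(-gamma * (z - c1) * (z - c2)))
```

Evaluating `logistic_G` at a large value of `gamma` gives values close to the indicator $I(z_t > c)$, while `gamma = 0` gives the constant 0.5, in line with the limits stated above. For very large `gamma` the exponential may overflow in floating point, so moderate values are advisable in such experiments.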

Properties of STAR models

• Little is known about the conditions under which STAR models are stationary;
• Stationarity has to be evaluated by numerical procedures;
• Even under stationarity: rich variability of the implied dynamics

– unique equilibrium
– multiple equilibria
– limit cycles
– strange attractors (chaos)

STAR models of US Industrial Production

Teräsvirta and Anderson (1992): 2-regime LSTAR model of the annual growth rate of US Industrial Production (quarterly data from 1961-1986):

\[
\Delta_4 y_t = \left( \nu_1 + \sum_{i=1}^{9} \alpha_{1i} \Delta_4 y_{t-i} \right) (1 - G(\cdot)) + \left( \nu_2 + \sum_{i=1}^{9} \alpha_{2i} \Delta_4 y_{t-i} \right) G(\cdot) + \varepsilon_t
\]

with the transition function

\[
G(y_{t-3}; \gamma, c) = \frac{1}{1 + \exp\{-45 (\Delta y_{t-3} - 0.0061) / \sigma_y\}}.
\]

Properties of business cycle

• expansion: $\Delta y_{t-3} > 0.61\%$; largest root of $\alpha_1(L)$: modulus = 0.76 and period = 61 quarters
• contraction: $\Delta y_{t-3} < 0.61\%$; largest root of $\alpha_2(L)$: modulus = 1.1 and period = 8.9 quarters
• the economy moves from deep recession into higher growth very aggressively.

Multivariate Smooth Transition Models

\[
y_t = \left( \nu_1 + \sum_{i=1}^{p} A_{1i} y_{t-i} \right) (1 - G(z_t; \gamma, c)) + \left( \nu_2 + \sum_{i=1}^{p} A_{2i} y_{t-i} \right) G(z_t; \gamma, c) + \varepsilon_t
\]

where $y_t = (y_{1t}, \cdots, y_{Kt})'$, $\varepsilon_t \sim IID(0, \Sigma)$, $A_{mi}$ is a $(K \times K)$ matrix and $\nu_m$ is $(K \times 1)$.

Tsay (1998) describes a specification procedure for multivariate threshold models.

Suppose now that $y_t$ is $I(1)$, but a linear combination $e_t = \beta' y_t$ is stationary with mean $\mu$. Then a smooth transition equilibrium correction model is of interest:

Asymmetric VECMs

\[
\Delta y_t = \alpha_1 (1 - G(e_{t-1}; \gamma, \mu)) (e_{t-1} - \mu) + \alpha_2 G(e_{t-1}; \gamma, \mu) (e_{t-1} - \mu) + \varepsilon_t.
\]

LSTAR: positive versus negative deviations from equilibrium

\[
G(e_{t-1}; \gamma, \mu) = \frac{1}{1 + \exp\{-\gamma (e_{t-1} - \mu)\}}.
\]

SETAR results for $\gamma \to \infty$:

\[
G(e_{t-1}; \gamma, \mu) = I(e_{t-1} > \mu).
\]

ESTAR: small versus large deviations from equilibrium

\[
G(e_{t-1}; \gamma, \mu) = 1 - \exp\{-\gamma (e_{t-1} - \mu)^2\}.
\]

Interesting case: random walk behavior in regime 1 ($\beta' \alpha_1 = 0$) and mean adjustment in regime 2 ($\beta' \alpha_2 < 0$).

See Granger and Lee (1989) for an early attempt and Granger and Swanson (1996) for a more general discussion.

2.3.2 Maximum likelihood estimation

STAR model

\[
y_t = x_t' \beta_1 (1 - G(z_t; \gamma, c)) + x_t' \beta_2 G(z_t; \gamma, c) + \varepsilon_t, \qquad \varepsilon_t \sim IID(0, \sigma^2)
\]

Non-linear least squares (NLS) estimation of $\theta = (\beta_1', \beta_2'; \gamma, c)'$:

\[
\hat{\theta} = \arg\min_{\theta} RSS = \arg\min_{\theta} \sum_{t=1}^{T} \varepsilon_t^2(\theta)
\]

where $\varepsilon_t(\theta) = y_t - [x_t' \beta_1 (1 - G(z_t; \gamma, c)) + x_t' \beta_2 G(z_t; \gamma, c)]$.

• Under the assumption of normality, $\varepsilon_t \sim NID(0, \sigma^2)$: NLS = ML.
• Estimation via numerical optimization procedure (see e.g. Hendry, 1995, Appendix A5).

– local maxima!
– convergence?

• Starting values:

– Conditional upon $\gamma$ and $c$: OLS estimation of $\beta = (\beta_1', \beta_2')'$

\[
\hat{\beta}(\gamma, c) = \left[ \sum_{t=1}^{T} x_t(\gamma, c) x_t(\gamma, c)' \right]^{-1} \sum_{t=1}^{T} x_t(\gamma, c) y_t
\]

where $x_t(\gamma, c) = (x_t' (1 - G(z_t; \gamma, c)), x_t' G(z_t; \gamma, c))'$;
– Grid search over $\gamma$ and $c$: $\min RSS(\gamma, c)$.

• Concentrating the likelihood (RSS) function:

– Conditional upon $\gamma$ and $c$: OLS estimation of $\beta = (\beta_1', \beta_2')'$;
– NLS of $\gamma$ and $c$: $\min RSS(\gamma, c)$.

• Problem: precise estimation of $\gamma$

– reason: for large values of $\gamma$, the shape of the logistic function changes only little;
– an accurate estimate of $\gamma$ requires many observations in the immediate neighbourhood of the threshold $c$;
– insignificance of $\gamma$ should not be interpreted as evidence against the presence of STAR nonlinearity (see Bates and Watts, 1988).
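The idea of concentrating the RSS function can be sketched in Python: for each pair $(\gamma, c)$ on a grid, $\beta$ is obtained by OLS on the transformed regressors $x_t(\gamma, c)$, and the pair with the smallest RSS is kept (for instance as starting values for NLS). The sketch assumes NumPy and a logistic transition function; the function names and the grid-based search are illustrative assumptions, not the procedure of any specific package.

```python
import numpy as np

def logistic_G(z, gamma, c):
    """Logistic transition function G(z; gamma, c)."""
    return 1.0 / (1.0 + np.exp(-gamma * (z - c)))

def concentrated_rss(y, X, z, gamma, c):
    """OLS of y on x_t(gamma, c) = (x_t'(1 - G), x_t'G)'; returns (RSS, beta)."""
    G = logistic_G(z, gamma, c)
    Xg = np.hstack([X * (1.0 - G)[:, None], X * G[:, None]])
    beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    resid = y - Xg @ beta
    return float(resid @ resid), beta

def star_grid_search(y, X, z, gammas, cs):
    """Minimize the concentrated RSS over a grid of (gamma, c) values."""
    best = (np.inf, None, None, None)
    for gamma in gammas:
        for c in cs:
            rss, beta = concentrated_rss(y, X, z, gamma, c)
            if rss < best[0]:
                best = (rss, gamma, c, beta)
    return best  # (rss, gamma, c, beta) with beta = (beta_1', beta_2')'
```

In line with the remark on the precision of $\gamma$, the RSS surface is typically rather flat in the $\gamma$ direction, so a coarse grid for $\gamma$ and a finer grid for $c$ is a common practical compromise.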


2.3.3 Model selection

An empirical specification procedure

Teräsvirta (1994), based on the Granger and Teräsvirta (1993) recommendation of a specific-to-general procedure for non-linear models:

(1) Specify an appropriate linear AR($p$) model;
(2) Test the null hypothesis of linearity against the STAR alternative;
(3) If linearity is rejected, select $z_t$ and specify $G(z_t; \gamma, c)$;
(4) Estimate the STAR model;
(5) Evaluate the STAR model using diagnostic tests;
(6) If misspecification is detected, modify the model;
(7) Use the model for descriptive or forecasting purposes.

Testing for STAR nonlinearity

Problem: Under the null of linearity, some 'nuisance' parameters are not identified:

\[
\begin{array}{ll}
\text{null hypothesis} & \text{nuisance parameters} \\
\hline
(\nu_1, \alpha_{11}, \ldots, \alpha_{1p}) = (\nu_2, \alpha_{21}, \ldots, \alpha_{2p}) & \gamma;\ c \\
\gamma = 0 & (\nu_1, \alpha_{11}, \ldots, \alpha_{1p}) - (\nu_2, \alpha_{21}, \ldots, \alpha_{2p});\ c
\end{array}
\]

→ conventional statistical theory cannot be applied (see Davies, 1977, Davies, 1987 and Hansen, 1996b)

→ non-standard distributions

→ critical values have to be determined by means of simulation methods.

Solution proposed by Luukkonen, Saikkonen and Teräsvirta (1988):

Replace the transition function $G(z_t; \gamma, c)$ by a suitable Taylor approximation. In the reparametrized model, the identification problem is no longer present. Linearity can be tested by means of a Lagrange multiplier (LM) statistic, which has a standard asymptotic $\chi^2$ distribution under the null.

→ Test against LSTAR: Luukkonen et al. (1988).

→ Test against LSTAR: Granger and Teräsvirta (1993).

→ LSTAR against ESTAR: Teräsvirta (1994) and Escribano and Jorda (1999).
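An LM-type linearity test in the spirit of the Taylor-approximation approach can be sketched as an auxiliary regression. The Python sketch below assumes NumPy and a third-order expansion in which the residuals of the linear model are regressed on the original regressors and on the non-constant regressors multiplied by $z_t$, $z_t^2$ and $z_t^3$; the statistic $T (SSR_0 - SSR_1)/SSR_0$ is then compared with a $\chi^2$ distribution. The function name, the choice of expansion order and the degrees-of-freedom count are assumptions of this illustration and should be checked against the cited papers before use.

```python
import numpy as np

def lm_linearity_stat(y, X, z):
    """LM-type statistic for linearity against a STAR-type alternative.

    X must contain a constant in column 0; z is the transition variable.
    Returns (statistic, degrees_of_freedom).
    """
    T = len(y)
    # Step 1: estimate the linear model under the null and keep the residuals
    b0, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b0
    ssr0 = float(u @ u)
    # Step 2: auxiliary regression on X and on x_t * z_t^k, k = 1, 2, 3
    X_nc = X[:, 1:]                                    # regressors without the constant
    aux = np.hstack([X] + [X_nc * (z ** k)[:, None] for k in (1, 2, 3)])
    b1, *_ = np.linalg.lstsq(aux, u, rcond=None)
    v = u - aux @ b1
    ssr1 = float(v @ v)
    # Step 3: LM statistic, asymptotically chi-squared under the null
    return T * (ssr0 - ssr1) / ssr0, 3 * X_nc.shape[1]
```

Large values of the statistic relative to the $\chi^2$ critical value with the returned degrees of freedom speak against linearity. If $z_t$ is itself one of the columns of `X`, the auxiliary regressors remain well defined, but perfectly collinear columns should be avoided.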


Diagnostic checking in STAR models

Eitrheim and Teräsvirta (1996) discuss formal diagnostic tests for STAR models

• Jarque-Bera test for normality of the residuals
• LM-type test for serial autocorrelation
• LM test for remaining nonlinearity (two-regime STAR against the alternative of an additive STAR model)
• LM test for parameter constancy (two-regime STAR against the alternative of a time-varying STAR model)



2.4 Markov-switching vector autoregressions

2.4.1 The MS-VAR model

In Markov-switching vector autoregressive (MS-VAR) models it is assumed that the regime $s_t$ is generated by a hidden discrete-state homogeneous and ergodic Markov chain:

\[
\Pr(s_t | S_{t-1}, Y_{t-1}, X_t) = \Pr(s_t | s_{t-1}; \rho)
\]

defined by the transition probabilities

\[
p_{ij} = \Pr(s_{t+1} = j | s_t = i).
\]

The conditional process is a VAR($p$) with

• shift in the mean (MSM-VAR): once-and-for-all jump in the time series

\[
y_t - \mu(s_t) = A_1(s_t) (y_{t-1} - \mu(s_{t-1})) + \ldots + A_p(s_t) (y_{t-p} - \mu(s_{t-p})) + u_t,
\]

• shift in the intercept (MSI-VAR): smooth adjustment of the time series

\[
y_t = \nu(s_t) + A_1(s_t) y_{t-1} + \ldots + A_p(s_t) y_{t-p} + u_t.
\]

A major advantage of the MS-VAR is its flexibility; see Krolzig (1997).

Special MS-VAR Models

                                  MSM                           MSI Specification
                                  µ varying     µ invariant     ν varying     ν invariant
Aj invariant    Σ invariant       MSM–VAR       linear MVAR     MSI–VAR       linear VAR
                Σ varying         MSMH–VAR      MSH–MVAR        MSIH–VAR      MSH–VAR
Aj varying      Σ invariant       MSMA–VAR      MSA–MVAR        MSIA–VAR      MSA–VAR
                Σ varying         MSMAH–VAR     MSAH–MVAR       MSIAH–VAR     MSAH–VAR

Figure 3 Hamilton's MSM(2)-AR(4) model. [Plots omitted: MSM(2)-AR(4), 1952 (2) - 1984 (4); Probabilities of Regime 1; Probabilities of Regime 2.]

Markov-switching autoregressive models of US GNP

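Hamilton (1989): 2-regime MS-AR model for the quarterly growth rate of U.S. GNP:

\[
\Delta y_t - \mu(s_t) = \sum_{k=1}^{4} \alpha_k (\Delta y_{t-k} - \mu(s_{t-k})) + u_t, \qquad u_t | s_t \sim NID(0, \sigma^2)
\]

Two regimes ("state of the business cycle")

\[
\mu(s_t) =
\begin{cases}
\mu_1 > 0 & \text{if } s_t = 1 \text{ ('expansion')} \\
\mu_2 < 0 & \text{if } s_t = 2 \text{ ('contraction')}
\end{cases}
\]

generated by an ergodic Markov chain

\[
p_{12} = \Pr(\text{contraction in } t \mid \text{expansion in } t - 1)
\]

\[
p_{21} = \Pr(\text{expansion in } t \mid \text{contraction in } t - 1)
\]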
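A two-state Markov chain of this kind is easy to simulate, and its ergodic (unconditional) regime probabilities follow from the transition matrix. The Python sketch below assumes NumPy; the function names are illustrative, states are coded 0 and 1 for regimes 1 and 2, and the transition probabilities used in the usage note are hypothetical values chosen for illustration, not estimates from the notes.

```python
import numpy as np

def ergodic_probabilities(P):
    """Solve pi' P = pi' with sum(pi) = 1 for a row-stochastic matrix P,
    where P[i, j] = Pr(s_{t+1} = j | s_t = i)."""
    M = P.shape[0]
    A = np.vstack([P.T - np.eye(M), np.ones(M)])
    b = np.append(np.zeros(M), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def simulate_chain(P, T, rng, s0=0):
    """Simulate T draws from the Markov chain with transition matrix P."""
    s = np.empty(T, dtype=int)
    s[0] = s0
    for t in range(1, T):
        s[t] = rng.choice(P.shape[0], p=P[s[t - 1]])
    return s
```

With hypothetical values $p_{12} = 0.10$ and $p_{21} = 0.25$, i.e. `P = np.array([[0.90, 0.10], [0.25, 0.75]])`, the ergodic probability of regime 1 is $p_{21}/(p_{12} + p_{21}) \approx 0.714$, and the expected durations $1/p_{12}$ and $1/p_{21}$ are 10 and 4 periods respectively.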

Figure 4 MSM(2)-AR models of US GNP growth. [Panels omitted: MSM(2)-AR(4) Model for the samples 1959:2 - 1996:2, 1947:2 - 1990:4 and 1947:2 - 1984:4; MSM(2)-AR(2) Model for the samples 1959:2 - 1996:2, 1947:1 - 1990:4, 1959:2 - 1990:4 and 1947:2 - 1984:4.]

Figure 5 MSM(3)-AR models of US GNP growth. [Panels omitted: 'High' Growth Regime, H, and 'Recession' Regime, L, for the samples 1948:2-1990:4, 1948:2-1984:4, 1960:2-1990:4 and 1960:2-1996:2.]

2.4.2 State-Space Representation

The framework for the statistical analysis of MS-VAR models is the state-space form. The advantage of viewing MS-VAR models in this way is that general concepts such as the likelihood principle and a recursive filter algorithm can be introduced. The state-space model consists of the set of measurement and transition equations.

Measurement or observation equation (conditional process): The measurement equation describes the relation between the unobserved state vector $\xi_t$ and the observed time series vector $y_t$. Here, the predetermined variables $Y_{t-1}$ and the vector of Gaussian disturbances $u_t$ enter the model.

Example: MSI($M$)-VAR(1) model

\[
y_t = \mathbf{M} \xi_t + A_1 y_{t-1} + u_t
\]

where

\[
\mathbf{M} = \begin{bmatrix} \nu_1 & \cdots & \nu_M \end{bmatrix}, \qquad
\xi_t = \begin{bmatrix} I(s_t = 1) \\ \vdots \\ I(s_t = M) \end{bmatrix}, \qquad
I(s_t = m) =
\begin{cases}
1 & \text{if } s_t = m \\
0 & \text{otherwise.}
\end{cases}
\]

State or transition equation (regime generating process): The state vector $\xi_t$ follows a Markov chain subject to a discrete adding-up restriction. The Markov chain governing the state vector $\xi_t$ can be represented as a first-order vector autoregression (cf. Hamilton, 1994b):

\[
\xi_{t+1} = F \xi_t + v_{t+1}, \qquad v_{t+1} \equiv \xi_{t+1} - E[\xi_{t+1} | \{\xi_{t-j}\}_{j=0}^{\infty}],
\]

where $F = P'$ is the transpose of the matrix $P$ of transition probabilities $p_{ij}$. The last equation implies that the innovation $v_t$ is a martingale difference series. Although the vector $v_t$ can take on only a finite set of values, the mean $E[v_t] = E[v_t | \{\xi_{t-j}\}_{j=1}^{\infty}]$ equals zero. While it is impossible to improve the forecast of $v_t$ given the previous realizations of the Markov chain, the conditional variance of $v_t$, $E[v_t v_t' | \{\xi_{t-j}\}_{j=1}^{\infty}] = E[v_t v_t' | \xi_{t-1}]$, depends on $\xi_{t-1}$.
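The dependence of the conditional variance of $v_t$ on the previous state can be made concrete. Since $\xi_{t+1}$ is a unit vector drawn with probabilities $F \xi_t$, its conditional covariance is $\mathrm{diag}(F \xi_t) - F \xi_t \xi_t' F'$, a standard result for such indicator vectors that is not spelled out in the notes. The Python sketch below assumes NumPy; the function name and the transition probabilities in the usage note are hypothetical.

```python
import numpy as np

def conditional_moments(xi_t, F):
    """Conditional mean and covariance of xi_{t+1} given the unit vector xi_t.

    mean = F xi_t; cov = diag(mean) - mean mean', which is also the
    conditional covariance of the innovation v_{t+1}.
    """
    mean = F @ xi_t
    cov = np.diag(mean) - np.outer(mean, mean)
    return mean, cov
```

For example, with the hypothetical matrix `P = np.array([[0.90, 0.10], [0.25, 0.75]])` and `F = P.T`, the conditional variance of the first component is $0.9 \times 0.1 = 0.09$ when the chain is in regime 1 and $0.25 \times 0.75 = 0.1875$ when it is in regime 2, so the innovation is conditionally heteroscedastic even though its conditional mean is zero.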


MSM-VAR processes as linearly transformed VAR processes

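MSM($M$)–VAR($p$) process, $p \geq 0$:

\[
A(L) (y_t - \mu(s_t)) = u_t
\quad \Longleftrightarrow \quad
\begin{aligned}
y_t &= \mu(s_t) + z_t, \\
\mu_t &= \mathbf{M} \xi_t, \\
A(L) z_t &= u_t, \qquad u_t \ \text{i.i.d.} \ WN(0, \Sigma_u).
\end{aligned}
\]

State Space Representation

In what follows, $\mathcal{M}$ and $\mathcal{F}$ denote the reduced mean and transition matrices defined below, and bold symbols denote stacked (companion-form) quantities.

\[
\begin{aligned}
y_t - \mu_y &= \mathcal{M} \zeta_t + J \mathbf{z}_t \\
\zeta_t &= \mathcal{F} \zeta_{t-1} + v_t \\
\mathbf{z}_t &= \mathbf{A} \mathbf{z}_{t-1} + \mathbf{u}_t
\end{aligned}
\quad \Longleftrightarrow \quad
\begin{aligned}
y_t - \mu_y &= \begin{bmatrix} \mathcal{M} & J \end{bmatrix} \begin{bmatrix} \zeta_t \\ \mathbf{z}_t \end{bmatrix} \\
\begin{bmatrix} \zeta_t \\ \mathbf{z}_t \end{bmatrix} &= \begin{bmatrix} \mathcal{F} & 0 \\ 0 & \mathbf{A} \end{bmatrix} \begin{bmatrix} \zeta_{t-1} \\ \mathbf{z}_{t-1} \end{bmatrix} + \begin{bmatrix} v_t \\ \mathbf{u}_t \end{bmatrix}
\end{aligned}
\]

with

\[
\zeta_t = \begin{bmatrix} \xi_{1,t} \\ \vdots \\ \xi_{M-1,t} \end{bmatrix} - \begin{bmatrix} \bar{\xi}_1 \\ \vdots \\ \bar{\xi}_{M-1} \end{bmatrix}, \qquad
\mathcal{F} = \begin{bmatrix} p_{1,1} - p_{M,1} & \ldots & p_{M-1,1} - p_{M,1} \\ \vdots & & \vdots \\ p_{1,M-1} - p_{M,M-1} & \ldots & p_{M-1,M-1} - p_{M,M-1} \end{bmatrix},
\]

\[
\mathcal{M} = \begin{bmatrix} \mu_1 - \mu_M & \ldots & \mu_{M-1} - \mu_M \end{bmatrix},
\]

\[
\mathbf{z}_t = \begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-p+1} \end{bmatrix}, \qquad
\mathbf{A} = \begin{bmatrix} A_1 & \ldots & A_{p-1} & A_p \\ I_K & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & \ldots & I_K & 0 \end{bmatrix}, \qquad
\mathbf{u}_t = \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\]

\[
J = e_1' \otimes I_K.
\]

A VARMA-Representation Theorem

MSM($M$)-VAR($p$):

\[
\begin{aligned}
y_t &= \mu_y + \mathcal{M} \zeta_t + z_t \\
z_t &= A(L)^{-1} u_t, \qquad A(L) = I_K - A_1 L - \ldots - A_p L^p \\
\zeta_t &= F(L)^{-1} v_t, \qquad F(L) = I_{M-1} - \mathcal{F} L
\end{aligned}
\]

Moving-average representation:

\[
y_t = \mu_y + \mathcal{M} F(L)^{-1} v_t + A(L)^{-1} u_t
\]

Final-equations-form VARMA($M + Kp - 1$, $M + Kp - 2$):

\[
|F(L)| \, |A(L)| \, (y_t - \mu_y) = \mathcal{M} |A(L)| F(L)^{*} v_t + |F(L)| A(L)^{*} u_t,
\]

where $|\cdot|$ denotes the determinant and $^{*}$ the adjoint of a matrix polynomial.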

2.4.3 Related models

Mixture of normals

The mixture of normals model is characterized by serially independently distributed regimes:

\[
\Pr(s_t | S_{t-1}, Y_{t-1}) = \Pr(s_t; \rho).
\]

This is a special case of the MS-AR model, which results when the transition probabilities are independent of the history of the regime.

The conditional probability distribution of $y_t$ is independent of $S_{t-1}$,

\[
p(y_t | Y_{t-1}, S_{t-1}) = p(y_t | Y_{t-1}),
\]

and the regimes are Granger non-causal for $y_t$. Even so, this model can be considered as a restricted MS-VAR model where the transition matrix has rank one. Moreover, if only the level of the time series is regime-dependent, the model is observationally equivalent to time-invariant linear processes with non-normal errors.

Time-varying transition probabilities (endogenous switching)

All the previously mentioned models are special cases of an endogenous selection model: the transition probabilities $p_{ij}$ are not time-invariant parameters, but functions of the observed time series vector $y_{t-d}$ or some exogenous variables $x_t$:

\[
\Pr(s_t = 1 | S_{t-1}, Y_{t-1}, X_t) = F(z_t, s_{t-1}; \gamma, c) =
\begin{cases}
1 - F_{12}(z_t; \gamma, c) & \text{if } s_{t-1} = 1 \\
F_{21}(z_t; \gamma, c) & \text{if } s_{t-1} = 2.
\end{cases}
\]

For example, in the case of an exponential function the time-varying transition probabilities are given by:

\[
p_{ijt} = F_{ij}(z_t; \gamma, c) = 1 - \exp\{-\gamma_{ij} (z_t - c_{ij})^2\} \quad \text{for } i \neq j
\]

and $p_{iit} = 1 - \sum_{j \neq i} p_{ijt}$.

In contrast to an MS-AR model, the regime switching rule also depends on the history of the observed variables. Since the observed variables contain additional information on the conditional probability distribution of the states, the regime generating process is no longer Markovian:

\[
\Pr(s_t | S_{t-1}, Y_{t-1}) \overset{a.e.}{\neq} \Pr(s_t | s_{t-1}).
\]

In contrast to the SETAR and the STAR model, MS-VAR models include the possibility that the threshold depends on the last regime, e.g. that the threshold for staying in regime 2 is different from the threshold for switching from regime 1 to regime 2.

Figure 6 Regime inference. [Plots omitted: regime-dependent densities $p(y_t | s_t = 1, Y_{t-1})$ and $p(y_t | s_t = 2, Y_{t-1})$; density of $y_t$ given $Y_{t-1}$, $p(y_t | Y_{t-1})$, for $\Pr(s_t = 1 | Y_{t-1}) = .3$ and $.5$; regime inference after observation of $y_t$, $\Pr(s_t = 1 | Y_t)$, for $\Pr(s_t = 1 | Y_{t-1}) = .3$ and $.5$.]

2.4.4 Regime inference

The discrete support of the state in the MS-AR model makes it possible to derive the complete conditional distribution of the unobservable state variable, instead of

• deriving only the first two moments, as in the Kalman filter (cf. Kalman, 1960, Kalman and Bucy, 1961, and Kalman, 1963) for Gaussian linear state-space models, or
• relying on the grid approximation suggested by Kitagawa (1987) for non-linear, non-normal state-space models.

Literature

• The filtering and smoothing algorithms for time series models with Markov-switching regimes are closely related to Hamilton (1988, 1989, 1994a), building upon ideas of Cosslett and Lee (1985).

• The basic filtering and smoothing recursions had been introduced by Baum, Petrie, Soules and Weiss (1970) for the reconstruction of hidden Markov chains.

• Lindgren (1978) applied their algorithms to regression models with Markovian regime switches.

• A major improvement of the smoother has been provided by the backward recursions of Kim (1994).

Filtering

The filter introduced by Hamilton (1989) can be described as an iterative algorithm for calculating the optimal inference of $\xi_{t+1}$ on the basis of the information set in $t$ consisting of the observed values of $y_t$, namely $Y_t = (y_t', y_{t-1}', \ldots, y_{1-p}')'$. It might also be viewed as a discrete version of the Kalman filter for the state-space model

\[
y_t = X_t B \xi_t + u_t,
\]

\[
\xi_{t+1} = F \xi_t + v_{t+1}.
\]

For given parameters, the discrete-state algorithm under consideration summarizes the conditional probability distribution of the state vector $\xi_t$ by

\[
\hat{\xi}_{t|t} = E[\xi_t | Y_t] = \begin{bmatrix} \Pr(\xi_t = \iota_1 | Y_t) \\ \vdots \\ \Pr(\xi_t = \iota_N | Y_t) \end{bmatrix}.
\]

Since each component of $\xi_t$ is a binary variable, $\hat{\xi}_{t|t}$ is not only the conditional mean, which is the optimal inference of $\xi_t$ given $Y_t$, but also represents the probability distribution of $\xi_t$ conditional on $Y_t$.

The filtering algorithm computes $\hat{\xi}_{t|t}$ by deriving the joint probability density of $\xi_t$ and $y_t$ conditioned on the observations $Y_{t-1}$.

By invoking Bayes' law, the posterior probabilities $\Pr(\xi_t | y_t, Y_{t-1})$ are given by

\[
\Pr(\xi_t | Y_t) \equiv \Pr(\xi_t | y_t, Y_{t-1}) = \frac{p(y_t | \xi_t, Y_{t-1}) \Pr(\xi_t | Y_{t-1})}{p(y_t | Y_{t-1})},
\]

with the prior probability

\[
\Pr(\xi_t | Y_{t-1}) = \sum_{\xi_{t-1}} \Pr(\xi_t | \xi_{t-1}) \Pr(\xi_{t-1} | Y_{t-1})
\]

and the density

\[
p(y_t | Y_{t-1}) = \sum_{\xi_t} p(y_t, \xi_t | Y_{t-1}) = \sum_{\xi_t} \Pr(\xi_t | Y_{t-1}) p(y_t | \xi_t, Y_{t-1}).
\]

Note that the summation involves all possible values of $\xi_t$ and $\xi_{t-1}$.

Let $\eta_t$ be the vector of the densities of $y_t$ conditional on $\xi_t$ and $Y_{t-1}$,

\[
\eta_t = \begin{bmatrix} p(y_t | \theta_1, Y_{t-1}) \\ \vdots \\ p(y_t | \theta_N, Y_{t-1}) \end{bmatrix} = \begin{bmatrix} p(y_t | \xi_t = \iota_1, Y_{t-1}) \\ \vdots \\ p(y_t | \xi_t = \iota_N, Y_{t-1}) \end{bmatrix},
\]

where $\theta$ has been dropped on the right hand side to avoid unnecessary notation, such that the density of $y_t$ conditional on $Y_{t-1}$ is given by $p(y_t | Y_{t-1}) = \eta_t' \hat{\xi}_{t|t-1} = 1_N' (\eta_t \odot \hat{\xi}_{t|t-1})$.

Then, the contemporaneous inferenceξt|t is given in matrix notation by

ξt|t =ηt ξt|t−1

1′N (ηt ξt|t−1)

, (7)

where denotes the element-wise matrix multiplication and1N = (1, . . . , 1)′ is a vectorconsisting of ones. The filter weights for each regime the conditional density of the observationyt, given the vectorθm of AR parameters of regimem, with the predicted probability of beingin regimem at time t given the information setYt−1. Thus, the instruction (7) describesthe filtered regime probabilitiesξt|t as an update of the estimateξt|t−1 of ξt given the newinformationyt.

The transition equation implies that the vector $\xi_{t+1|t}$ of predicted probabilities is a linear function of the filtered probabilities $\xi_{t|t}$:

$$\xi_{t+1|t} = F\xi_{t|t}. \tag{8}$$

The sequence $\{\xi_{t|t-1}\}_{t=1}^{T}$ can therefore be generated by iterating on (7) and (8), which can be summarized as:

$$\xi_{t+1|t} = \frac{F(\eta_t \odot \xi_{t|t-1})}{1'(\eta_t \odot \xi_{t|t-1})}. \tag{9}$$

In the prevailing Bayesian context, $\xi_{t|t-1}$ is the prior distribution of $\xi_t$. The posterior distribution $\xi_{t|t}$ is calculated by linking the new information $y_t$ with the prior via Bayes' law. The posterior distribution $\xi_{t|t}$ then becomes the prior distribution for the next state $\xi_{t+1}$, and so on.
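The filter recursion (7) and (8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MSVAR for Ox implementation: the model is univariate with a regime-dependent mean and standard deviation, $F = P'$ with `P[i][j]` $= \Pr(s_{t+1} = j|s_t = i)$, and the parameter values, data and function names (`normal_pdf`, `hamilton_filter`) are hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def hamilton_filter(y, mu, sd, P, xi_pred):
    """Iterate on (7) and (8); returns the filtered vectors xi_{t|t}.

    y       : observations y_1, ..., y_T
    mu, sd  : regime-dependent means and standard deviations (assumed model)
    P       : P[i][j] = Pr(s_{t+1} = j | s_t = i), so that F = P'
    xi_pred : initial predicted probabilities xi_{1|0}
    """
    M = len(mu)
    filtered = []
    for y_t in y:
        # eta_t: density of y_t conditional on each regime
        eta = [normal_pdf(y_t, mu[m], sd[m]) for m in range(M)]
        # numerator of (7): element-wise product of eta_t and xi_{t|t-1}
        joint = [eta[m] * xi_pred[m] for m in range(M)]
        dens = sum(joint)                       # p(y_t | Y_{t-1})
        xi_filt = [v / dens for v in joint]     # equation (7)
        filtered.append(xi_filt)
        # equation (8): xi_{t+1|t} = F xi_{t|t}
        xi_pred = [sum(P[i][j] * xi_filt[i] for i in range(M)) for j in range(M)]
    return filtered

# Made-up example: two regimes with means +1 and -1.
P = [[0.9, 0.1], [0.2, 0.8]]
y = [1.1, 0.8, -0.9, -1.2, 1.0]
probs = hamilton_filter(y, mu=[1.0, -1.0], sd=[0.5, 0.5], P=P, xi_pred=[0.5, 0.5])
```

Each element of `probs` is a probability vector; with these well-separated regimes the filter assigns almost all mass to the regime whose mean is closest to the current observation.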

Smoothing

The filter recursions deliver estimates for $\xi_t$, $t = 1, \ldots, T$, based on information up to time point $t$. This is a limited-information technique, as we have observations up to $t = T$. In the following, full-sample information is used to make an inference about the unobserved regimes by incorporating the previously neglected sample information $Y_{t+1.T} = (y_{t+1}', \ldots, y_T')'$ into the inference about $\xi_t$. Thus, the smoothing algorithm gives the best estimate of the unobservable state at any point within the sample.

The smoothing algorithm proposed by Kim (1994) may be interpreted as a backward filter that starts at the end point $t = T$ of the previously applied filter.

The full-sample smoothed inferences $\xi_{t|T}$ can be found by iterating backward for $t = T-1, \ldots, 1$, starting from the last output of the filter, $\xi_{T|T}$, and using the identity

$$\begin{aligned}
\Pr(\xi_t|Y_T) &= \sum_{\xi_{t+1}} \Pr(\xi_t, \xi_{t+1}|Y_T) \\
&= \sum_{\xi_{t+1}} \Pr(\xi_t|\xi_{t+1}, Y_T)\Pr(\xi_{t+1}|Y_T).
\end{aligned} \tag{10}$$


For pure AR models with Markovian parameter shifts, the probability laws for $y_t$ and $\xi_{t+1}$ depend only on the current state $\xi_t$ and not on the former history of states. Thus, we have

$$\begin{aligned}
\Pr(\xi_t|\xi_{t+1}, Y_T) &\equiv \Pr(\xi_t|\xi_{t+1}, Y_t, Y_{t+1.T}) \\
&= \frac{p(Y_{t+1.T}|\xi_t, \xi_{t+1}, Y_t)\Pr(\xi_t|\xi_{t+1}, Y_t)}{p(Y_{t+1.T}|\xi_{t+1}, Y_t)} = \Pr(\xi_t|\xi_{t+1}, Y_t).
\end{aligned}$$

It is therefore possible to calculate the smoothed probabilities $\xi_{t|T}$ by taking the last term in (10) from the previous iteration of the smoothing algorithm, $\xi_{t+1|T}$, while it can be shown that the first term can be derived from the filtered probabilities $\xi_{t|t}$:

$$\begin{aligned}
\Pr(\xi_t|\xi_{t+1}, Y_t) &= \frac{\Pr(\xi_{t+1}|\xi_t, Y_t)\Pr(\xi_t|Y_t)}{\Pr(\xi_{t+1}|Y_t)} \\
&= \frac{\Pr(\xi_{t+1}|\xi_t)\Pr(\xi_t|Y_t)}{\Pr(\xi_{t+1}|Y_t)}.
\end{aligned} \tag{11}$$

If there is no deviation between the full-information estimate, $\xi_{t+1|T}$, and the inference based on partial information, $\xi_{t+1|t}$, then there is nothing to update: $\xi_{t|T} = \xi_{t|t}$, and the filtering solution $\xi_{t|t}$ cannot be improved further.

In matrix notation, (10) and (11) can be condensed to

$$\xi_{t|T} = \left(F'(\xi_{t+1|T} \oslash \xi_{t+1|t})\right) \odot \xi_{t|t}, \tag{12}$$

where $\odot$ and $\oslash$ denote element-wise matrix multiplication and division. The recursion is initialized with the final filtered probability vector $\xi_{T|T}$. Recursion (12) describes how the additional information $Y_{t+1.T}$ is used in an efficient way to improve the inference on the unobserved state $\xi_t$.
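The backward recursion (12) can be illustrated with a short Python sketch. It assumes, as a convention not fixed by the text above, that $F = P'$ with `P[i][j]` $= \Pr(s_{t+1} = j|s_t = i)$; the filtered probabilities in the example are made-up inputs, and the names `predict_step` and `kim_smoother` are hypothetical.

```python
def predict_step(xi_filt, P):
    """xi_{t+1|t} = F xi_{t|t}, with F = P'."""
    M = len(xi_filt)
    return [sum(P[i][j] * xi_filt[i] for i in range(M)) for j in range(M)]

def kim_smoother(filtered, P):
    """Backward recursion (12) applied to a list of filtered vectors xi_{t|t}."""
    T, M = len(filtered), len(filtered[0])
    smoothed = [None] * T
    smoothed[-1] = list(filtered[-1])           # initialise with xi_{T|T}
    for t in range(T - 2, -1, -1):
        predicted = predict_step(filtered[t], P)                  # xi_{t+1|t}
        # element-wise division xi_{t+1|T} / xi_{t+1|t}
        ratio = [smoothed[t + 1][j] / predicted[j] for j in range(M)]
        # F'(ratio), then element-wise product with xi_{t|t}
        smoothed[t] = [filtered[t][i] * sum(P[i][j] * ratio[j] for j in range(M))
                       for i in range(M)]
    return smoothed

# Made-up filtered probabilities for T = 3 and two regimes.
P = [[0.9, 0.1], [0.2, 0.8]]
filtered = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
smoothed = kim_smoother(filtered, P)
```

In this example the last observation points strongly to regime 2, so the smoothed probability of regime 2 at $t = 2$ rises from 0.4 to roughly 0.77 once the later information is incorporated.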


2.4.5 Maximum Likelihood estimation

The Likelihood Function

In econometrics, the so-called Markov model of switching regressions considered by Goldfeld and Quandt (1973),

$$y_t = x_t'\beta_m + u_{mt}, \quad u_{mt} \sim \mathrm{NID}(0, \sigma_m^2) \quad \text{for } m = 1, 2,$$

was one of the first attempts to analyze regressions with Markovian regime shifts. Goldfeld and Quandt (1973) claimed to derive maximum likelihood estimates by maximizing their "likelihood" function, which in terms of our model would be

$$Q(\theta, \rho, \xi_0) = \prod_{t=1}^{T} \eta_t(\theta)'\xi_{t|0}(\rho, \xi_0),$$

where $\eta_t$ is again an $(M \times 1)$ vector collecting the conditional densities $p(y_t|Y_{t-1}, \theta_m)$, $m = 1, \ldots, M$, and $\xi_{t|0} = F^t\xi_0$ are the unconditional regime probabilities.

Unfortunately, the function $Q(\theta, \rho, \xi_0)$ is not the likelihood function, as pointed out by Cosslett and Lee (1985).

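Derivation of the likelihood function as a by-product of the filter:

$$\begin{aligned}
L(\lambda|Y) &:= p(Y_T|Y_0; \lambda) \\
&= \prod_{t=1}^{T} p(y_t|Y_{t-1}, \lambda) \\
&= \prod_{t=1}^{T} \sum_{\xi_t} p(y_t|\xi_t, Y_{t-1}, \theta)\Pr(\xi_t|Y_{t-1}, \lambda) \\
&= \prod_{t=1}^{T} \eta_t'\xi_{t|t-1} \\
&= \prod_{t=1}^{T} \eta_t' F\xi_{t-1|t-1}.
\end{aligned}$$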

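The conditional densities $p(y_t|\xi_{t-1} = \iota_i, Y_{t-1})$ are mixtures of normals. Thus, the likelihood function is non-normal:

$$\begin{aligned}
L(\lambda|Y) &= \prod_{t=1}^{T} \sum_{i=1}^{N} \sum_{j=1}^{N} p_{ij}\Pr(\xi_{t-1} = \iota_i|Y_{t-1}, \lambda)\,p(y_t|\xi_t = \iota_j, Y_{t-1}, \theta) \\
&= \prod_{t=1}^{T} \sum_{i=1}^{N} \sum_{j=1}^{N} p_{ij}\,\xi_{i.t-1|t-1}\,(2\pi)^{-K/2}|\Sigma_j|^{-1/2}\exp\left(-\frac{1}{2}u_{jt}'\Sigma_j^{-1}u_{jt}\right),
\end{aligned}$$

where $u_{jt} = y_t - E[y_t|\xi_t = \iota_j, Y_{t-1}]$ and $N = M^{p+1}$ in MSM specifications or $N = M$ otherwise.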


Normal Equations of the ML Estimator

The maximum likelihood (ML) estimates can be derived by maximizing the likelihood function $L(\lambda|Y)$ subject to the adding-up restrictions

$$P1_M = 1_M, \qquad 1_M'\xi_0 = 1$$

and the non-negativity restrictions

$$\rho \ge 0, \quad \sigma \ge 0, \quad \xi_0 \ge 0.$$

If the non-negativity can be ensured, the ML estimate of $\lambda$ is given by the first-order conditions (FOCs) of the constrained log-likelihood function

$$\ln L^*(\lambda) := \ln L(\lambda|Y_T) - \kappa_1'(P1_M - 1_M) - \kappa_2(1_M'\xi_0 - 1). \tag{13}$$

Then the FOCs are given by the set of simultaneous equations

$$\frac{\partial \ln L(\lambda|Y)}{\partial \theta'} = 0,$$

$$\frac{\partial \ln L(\lambda|Y)}{\partial \rho'} - \kappa_1'(1_M' \otimes I_M) = 0,$$

$$\frac{\partial \ln L(\lambda|Y)}{\partial \xi_0'} - \kappa_2 1_M' = 0,$$

where it is assumed that the interior solution of these conditions exists and is well-behaved, such that the non-negativity restrictions are not binding.

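Differentiating the log-likelihood function with respect to the parameter vector $\theta$ leads to the score function

$$\begin{aligned}
\frac{\partial \ln L(\lambda|Y)}{\partial \theta'} &= \frac{1}{L}\int \frac{\partial p(Y|\xi, \theta)}{\partial \theta'}\Pr(\xi|\xi_0, \rho)\,d\xi \\
&= \frac{1}{L}\int \frac{\partial \ln p(Y|\xi, \theta)}{\partial \theta'}\,p(Y|\xi, \theta)\Pr(\xi|\xi_0, \rho)\,d\xi \\
&= \int \frac{\partial \ln p(Y|\xi, \lambda)}{\partial \theta'}\Pr(\xi|Y, \lambda)\,d\xi \\
&= \sum_{t=1}^{T}\sum_{\xi_t} \frac{\partial \ln p(y_t|\xi_t, Y_{t-1}, \lambda)}{\partial \theta'}\Pr(\xi_t|Y_T, \lambda).
\end{aligned}$$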

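Maximization of the constrained likelihood function with respect to the parameter vector $\rho$ of the hidden Markov chain leads to

$$\begin{aligned}
\frac{\partial \ln L(\lambda|Y)}{\partial \rho'} &= \frac{1}{L}\int p(Y|\xi, \theta)\frac{\partial \Pr(\xi|\xi_0, \rho)}{\partial \rho'}\,d\xi \\
&= \frac{1}{L}\int \frac{\partial \ln \Pr(\xi|\xi_0, \rho)}{\partial \rho'}\,p(Y|\xi, \theta)\Pr(\xi|\xi_0, \rho)\,d\xi \\
&= \int \frac{\partial \ln \Pr(\xi|\xi_0, \rho)}{\partial \rho'}\Pr(\xi|Y, \lambda)\,d\xi.
\end{aligned}$$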


Thus, the ML estimator of the vector of transition probabilities $\rho$ is equal to the transition probabilities in the sample calculated with the smoothed regime probabilities:

$$p_{ij} = \frac{\sum_{t=1}^{T}\Pr(s_t = j, s_{t-1} = i|Y_T; \lambda)}{\sum_{t=1}^{T}\Pr(s_{t-1} = i|Y_T; \lambda)}.$$

The EM Algorithm

As shown in Hamilton (1990), the Expectation-Maximization (EM) algorithm introduced by Dempster, Laird and Rubin (1977) can be used in conjunction with the filter to obtain the maximum likelihood estimates of the model's parameters.

The EM algorithm is an iterative ML estimation technique designed for a general class of models where the observed time series depends on some unobservable stochastic variables. For the hidden Markov-chain model, an early precursor to the EM algorithm was provided by Baum et al. (1970), building upon ideas in Baum and Eagon (1967). The consistency and asymptotic normality of the proposed ML estimator were studied in Baum and Petrie (1966) and Petrie (1969). Their work has been extended by Lindgren (1978) to the case of regression models with Markov-switching regimes.

Each iteration of the EM algorithm consists of two steps:

• In the expectation step (E), the unobserved states $\xi_t$ are estimated by their smoothed probabilities $\xi_{t|T}$. The conditional probabilities $\Pr(\xi|Y, \lambda^{(j-1)})$ are calculated with the filter and smoother by using the estimated parameter vector $\lambda^{(j-1)}$ of the last maximization step instead of the unknown true parameter vector $\lambda$.

• In the maximization step (M), an estimate of $\lambda$ is derived as a solution of the FOCs of ML estimation, where the conditional regime probabilities $\Pr(\xi_t|Y, \lambda)$ are replaced by the smoothed probabilities $\xi_{t|T}(\lambda^{(j-1)})$ of the last expectation step. Thus, the dominant source of non-linearities in the FOCs is eliminated. If the score, i.e. the gradient of $\ln L(\lambda|Y_T)$, were linear in $\xi$, this procedure would be equivalent to replacing the unobserved latent variables $\xi$ in the FOCs with their expectation $\xi_{t|T}$.

Equipped with the new parameter vector $\lambda$, the filtered and smoothed probabilities are updated, and so on. Thus, each EM iteration involves a pass through the filter and smoother, followed by an update of the first-order conditions and the parameter estimates, and is guaranteed not to decrease the value of the likelihood function.

General results available for the EM algorithm indicate that the likelihood function is non-decreasing in the number of iterations $j$. Finally, a fixed point of this iteration schedule, $\lambda^{(j)} = \lambda^{(j-1)}$, satisfies the first-order conditions and thus coincides with a stationary point, typically a local maximum, of the likelihood function. The general statistical properties of the EM algorithm are discussed more comprehensively in Ruud (1991).
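The two steps can be illustrated with a compact Python sketch. It is a simplified stand-in, not the estimator used in MSVAR for Ox: the model is assumed to be $y_t = \mu(s_t) + u_t$ with a common variance and no autoregressive terms, the initial distribution $\xi_{1|0}$ is held fixed rather than estimated, $F = P'$, and the data are simulated from made-up parameter values. The function names `e_step` and `m_step` are hypothetical.

```python
import math
import random

def e_step(y, mu, sigma, P, xi0):
    """Filter and smoother pass for y_t = mu[s_t] + u_t, u_t ~ N(0, sigma^2).

    Returns the smoothed probabilities, the summed joint probabilities
    sum_t Pr(s_t = i, s_{t+1} = j | Y_T) and the log-likelihood.
    """
    T, M = len(y), len(mu)
    filt, pred, loglik = [], [], 0.0
    xi_pred = list(xi0)                         # xi_{1|0}, held fixed here
    for y_t in y:
        eta = [math.exp(-0.5 * ((y_t - mu[m]) / sigma) ** 2)
               / (sigma * math.sqrt(2.0 * math.pi)) for m in range(M)]
        joint = [eta[m] * xi_pred[m] for m in range(M)]
        dens = sum(joint)                       # p(y_t | Y_{t-1})
        loglik += math.log(dens)
        xi_filt = [v / dens for v in joint]
        xi_pred = [sum(P[i][j] * xi_filt[i] for i in range(M)) for j in range(M)]
        filt.append(xi_filt)
        pred.append(xi_pred)                    # xi_{t+1|t}
    smooth = [None] * T
    smooth[-1] = filt[-1]
    pair_sum = [[0.0] * M for _ in range(M)]
    for t in range(T - 2, -1, -1):
        # joint smoothed probabilities Pr(s_t = i, s_{t+1} = j | Y_T)
        pair = [[filt[t][i] * P[i][j] * smooth[t + 1][j] / pred[t][j]
                 for j in range(M)] for i in range(M)]
        smooth[t] = [sum(row) for row in pair]
        for i in range(M):
            for j in range(M):
                pair_sum[i][j] += pair[i][j]
    return smooth, pair_sum, loglik

def m_step(y, smooth, pair_sum):
    """Closed-form updates of the means, common variance and transition matrix."""
    T, M = len(y), len(smooth[0])
    mu = [sum(s[m] * y_t for s, y_t in zip(smooth, y)) / sum(s[m] for s in smooth)
          for m in range(M)]
    var = sum(s[m] * (y_t - mu[m]) ** 2
              for s, y_t in zip(smooth, y) for m in range(M)) / T
    P = [[pair_sum[i][j] / sum(pair_sum[i]) for j in range(M)] for i in range(M)]
    return mu, math.sqrt(var), P

# Simulated two-regime data (illustrative parameter values only).
rng = random.Random(0)
true_mu, true_P = [1.0, -1.0], [[0.95, 0.05], [0.10, 0.90]]
s, y = 0, []
for _ in range(300):
    s = 0 if rng.random() < true_P[s][0] else 1
    y.append(true_mu[s] + rng.gauss(0.0, 0.5))

mu, sigma, P = [0.5, -0.5], 1.0, [[0.8, 0.2], [0.2, 0.8]]   # starting values
logliks = []
for _ in range(30):
    smooth, pair_sum, loglik = e_step(y, mu, sigma, P, xi0=[0.5, 0.5])
    logliks.append(loglik)
    mu, sigma, P = m_step(y, smooth, pair_sum)
```

The list `logliks` records the log-likelihood at each iteration and should never decrease, which is a convenient check that both steps are coded consistently.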


Determination of the number of regimes in MS-VAR models

Testing for the number of regimes in an MS-VAR model is a difficult enterprise:

Conventional testing approaches are not applicable due to the presence of unidentified nuisance parameters under the null of linearity.

$$\begin{array}{ll}
\text{Null hypothesis} & \text{Nuisance parameters} \\ \hline
\mu_1 = \mu_2 & p_{12},\ p_{21} \\
p_{12} = 0\ (s_0 = 1) & \mu_2
\end{array}$$

The presence of the nuisance parameters gives the likelihood surface sufficient freedom so that one cannot reject the possibility that the apparently significant parameters could simply be due to sampling variation. The scores associated with parameters of interest under the alternative may be identically zero under the null.

Davies (1977, 1987) derived an upper bound for the significance level of the likelihood ratio test statistic under nuisance parameters.

Formal tests of the Markov-switching model against the linear alternative, which employ a standardized likelihood ratio test designed to deliver (asymptotically) valid inference, have been proposed by Hansen (1992, 1996a) and Garcia (1998), but they are computationally demanding.

The results of Ang and Bekaert (1998) indicate that critical values of the $\chi^2(r+n)$ distribution can be used approximately, where $r$ is the number of restricted parameters and $n$ is the number of nuisance parameters.

Alternatives

• Information criteria:

$$AIC = -2\log L/T + 2n/T,$$

$$SC = -2\log L/T + n\log(T)/T,$$

$$HQ = -2\log L/T + 2n\log(\log(T))/T,$$

where $L$ is the maximized likelihood, $n$ is the number of parameters and $T$ is the sample size: see Akaike (1985), Schwarz (1978), and Hannan and Quinn (1979).

• Check model congruency: specification and misspecification testing!
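The three information criteria listed above are simple to compute once the maximized log-likelihood is available. The sketch below implements the formulas exactly as stated; the log-likelihood values, parameter counts and sample size in the example are hypothetical and only show how the criteria can disagree.

```python
import math

def information_criteria(loglik, n_params, T):
    """AIC, SC and HQ as defined above (smaller is better).

    loglik   : maximized log-likelihood, log L
    n_params : number of estimated parameters, n
    T        : sample size
    """
    base = -2.0 * loglik / T
    return {
        "AIC": base + 2.0 * n_params / T,
        "SC": base + n_params * math.log(T) / T,
        "HQ": base + 2.0 * n_params * math.log(math.log(T)) / T,
    }

# Hypothetical comparison: a linear model against a two-regime model.
linear = information_criteria(loglik=-210.0, n_params=3, T=150)
switching = information_criteria(loglik=-203.0, n_params=7, T=150)
```

With these made-up numbers the AIC prefers the regime-switching model while the SC, which penalizes additional parameters more heavily, prefers the linear one.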


Hans–Martin Krolzig Hilary Term 2002


3 Prediction and structural analysis with regime-switching models

Forecasting and structural analysis with regime-switching models is considerably more involved than with linear ones. Various techniques have been proposed to overcome these problems (see, inter alia, Granger and Teräsvirta, 1993). Though the main problems are common to all non-linear models, we will focus on the MS-VAR approach in the following.

3.1 Predictions of linear and nonlinear stochastic processes

For the mean square prediction error (MSPE) criterion,

$$\min_{\hat{y}} E\left[(y_{t+h} - \hat{y})^2 \,\middle|\, \Omega_t\right],$$

the optimal predictor of $y_{t+h}$ is given by the conditional expectation for the given information set $\Omega_t$:

$$y_{t+h|t} = E[y_{t+h}|\Omega_t],$$

where $\Omega_t$ is the available information set, i.e. the past of the stochastic process up to time $t$, $\Omega_t = Y_t$. The prediction error associated with the optimal predictor $y_{t+h|t}$ is given by

$$e_{t+h|t} = y_{t+h} - E[y_{t+h}|Y_t].$$


3.1.1 Linear AR(1) model

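$$y_t = \alpha y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{IID}(0, \sigma^2).$$

One-step prediction:

$$y_{t+1|t} = E[\alpha y_t + \varepsilon_{t+1}|\Omega_t] = \alpha y_t.$$

Multi-step prediction:

$$y_{t+h|t} = E[\alpha y_{t+h-1} + \varepsilon_{t+h}|\Omega_t] = \alpha y_{t+h-1|t} = \alpha^h y_t = F^h(y_t, \alpha).$$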

3.1.2 Nonlinear AR(1) model

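$$y_t = F(y_{t-1}; \theta) + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{IID}(0, \sigma^2),$$

where $F(y_{t-1}; \theta)$ is some nonlinear function.

One-step prediction:

$$y_{t+1|t} = E[F(y_t; \theta) + \varepsilon_{t+1}|\Omega_t] = F(y_t; \theta).$$

Multi-step prediction, say $h = 2$:

$$\begin{aligned}
y_{t+2|t} &= E[F(y_{t+1}; \theta) + \varepsilon_{t+2}|\Omega_t] \\
&= E[F(y_{t+1}; \theta)|\Omega_t] \\
&\neq F(E[y_{t+1}|\Omega_t]; \theta) = F(y_{t+1|t}; \theta).
\end{aligned}$$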


3.1.3 Methods of calculating multi-step forecasts in nonlinear models

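(1) 'Naive' approach:

$$y^{(n)}_{t+2|t} = F(y_{t+1|t}; \theta)$$

→ biased.

(2) 'Exact' approach (closed-form forecast):

$$\begin{aligned}
y^{(e)}_{t+2|t} &= \int_{-\infty}^{+\infty} F(F(y_t; \theta) + \varepsilon_{t+1}; \theta)\,f(\varepsilon_{t+1})\,d\varepsilon_{t+1} \\
&= \int_{-\infty}^{+\infty} F(y_{t+1}; \theta)\,g(y_{t+1}|\Omega_t)\,dy_{t+1} \\
&= \int_{-\infty}^{+\infty} E[y_{t+2}|y_{t+1}]\,g(y_{t+1}|\Omega_t)\,dy_{t+1},
\end{aligned}$$

where $f(\varepsilon_{t+1})$ is the pdf of $\varepsilon_{t+1}$ and $g(y_{t+1}|\Omega_t) = f(y_{t+1} - F(y_t; \theta))$ is the pdf of $y_{t+1}$ conditional on $\Omega_t$.

→ approximation by numerical integration; time-consuming for $h > 2$
→ normal forecast error method: assumes normality of $g(y_{t+h-1}|\Omega_t)$.

(3) 'Monte Carlo' method:

$$y^{(mc)}_{t+2|t} = \frac{1}{N}\sum_{i=1}^{N} F(F(y_t; \theta) + \varepsilon_i; \theta),$$

where $N$ is large and $\varepsilon_i$ is drawn from the presumed distribution of $\varepsilon_t$.

→ approximation of $g(y_{t+h-1}|\Omega_t)$ by simulation

(4) 'Bootstrap' method:

$$y^{(bs)}_{t+2|t} = \frac{1}{T}\sum_{i=1}^{T} F(F(y_t; \theta) + \varepsilon_i; \theta),$$

where the residuals $\varepsilon_i$ from the estimated model are used.

→ distribution-free

(5) 'Direct' approach (multi-step estimation):

$$y_t = G(y_{t-2}; \tau) + \varepsilon^*_t \implies y_{t+2|t} = G(y_t; \tau)$$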
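The difference between the naive, Monte Carlo and bootstrap two-step forecasts can be seen in a small Python sketch. The conditional mean $F(y; \theta) = \theta y^2$ is a hypothetical choice made only because its exact two-step forecast is available in closed form, $\theta(F(y_t; \theta)^2 + \sigma^2)$, under normal errors; the residuals passed to the bootstrap are simulated stand-ins for residuals from an estimated model.

```python
import random

def F(y, theta=0.5):
    """Hypothetical nonlinear conditional mean, F(y; theta) = theta * y^2."""
    return theta * y * y

def naive_forecast(y_t):
    """Method (1): plug the one-step forecast into F, ignoring the shock."""
    return F(F(y_t))

def monte_carlo_forecast(y_t, sigma, n_draws, rng):
    """Method (3): average F over draws from the presumed N(0, sigma^2) errors."""
    one_step = F(y_t)
    return sum(F(one_step + rng.gauss(0.0, sigma)) for _ in range(n_draws)) / n_draws

def bootstrap_forecast(y_t, residuals):
    """Method (4): average F over the residuals of the estimated model."""
    one_step = F(y_t)
    return sum(F(one_step + e) for e in residuals) / len(residuals)

rng = random.Random(42)
y_t, sigma = 1.0, 1.0
naive = naive_forecast(y_t)
exact = 0.5 * (F(y_t) ** 2 + sigma ** 2)        # closed form for this particular F
mc = monte_carlo_forecast(y_t, sigma, 50_000, rng)
residuals = [rng.gauss(0.0, sigma) for _ in range(200)]   # stand-in residuals
bs = bootstrap_forecast(y_t, residuals)
```

Here the naive forecast is 0.125 while the exact value is 0.625, so the bias of the naive approach equals $-\theta\sigma^2$; the simulation-based forecasts approximate the exact value up to sampling error.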


3.2 Forecasting performance of non-linear / regime-switching models

3.2.1 Empirical Findings

• Superior in-sample fit does not imply superior forecasts when compared to linear models

  – Clements and Krolzig (1998)
  – Dacco and Satchell (1999)

• Dependence on the regime in which the forecast was made

  – Pesaran and Potter (1997)
  – Clements and Smith (1999)

3.2.2 Illustrative Example: Hamilton’s model of the US business cycle

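$$\Delta y_t - \mu(s_t) = \sum_{k=1}^{4} \alpha_k\left(\Delta y_{t-k} - \mu(s_{t-k})\right) + u_t,$$

States of the business cycle:

$$\mu(s_t) = \begin{cases} \mu_1 > 0 & \text{if } s_t = 1 \text{ ('expansion')} \\ \mu_2 < 0 & \text{if } s_t = 2 \text{ ('contraction')} \end{cases}$$

Transition probabilities:

$$p_{12} = \Pr(\text{contraction in } t \mid \text{expansion in } t-1),$$

$$p_{21} = \Pr(\text{expansion in } t \mid \text{contraction in } t-1).$$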

Forecast comparison

• Monte Carlo study

  – Generate data from the empirical MSM(2)-AR(4) model
  – Estimate MS-AR, AR and SETAR models
  – Compare their forecasts for different metrics.

• Empirical forecast accuracy comparison (1980-84, 1985-1996, 1992-96)


[Figure 7 Monte Carlo comparison of the models on RMSE. Horizontal axis: forecast horizon (1 to 8); vertical axis: RMSE; legend: DGP, AR, MS-AR, MS2-AR4, SETAR.]

[Figure 8 Monte Carlo. Forecast Errors when the DGP is the MSM(2)-AR(4). Horizontal axis: forecast horizon (1 to 12); plotted series: Q.95, Q.90, RMSE, MAE, Q.50.]


[Figure 9 Monte Carlo. Forecast Error Density. Four rows of panels: "Predicting the MSM(2)-AR(4) Process", "Forecasting the MSM(2)-AR(4) Process with MSM(2)-AR(p) Models", "Forecasting the MSM(2)-AR(4) Process with AR(p) Models" and "Forecasting the MSM(2)-AR(4) Process with SETAR Models"; each panel is labelled with its forecast horizon and a value N(s = ·).]

[Figure 10 Empirical Forecasting Performance of the Hamilton Model. Panels: 1980-84 (ex-post), 1980-84, 1985-96, 1992-96; horizontal axis: forecast horizon (1 to 8); legend: RMSPE, MAPE and Q.95 for each of MS-AR, AR and SETAR.]


3.3 Predicting Markov-switching VARs

The following discussion is based on Krolzig (2000).

3.3.1 Econometric theory of predicting multiple time series subject to shifts in regime

• Optimal predictor of Markov-switching time series models

• Factors resulting in deviations from linear forecasting rule

  – Significance of regime shifts
  – Persistence of the regime generating process
  – Asymmetry of the regime generating process
  – Interaction with the autoregressive dynamics

• Concepts

  – Predictability of the regime-generating process
  – Granger-causality of the regimes for the observed variables

3.3.2 Prediction in Markov-switching regression models

I. Switching regression model

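$$y_t = \begin{cases} X_t\beta_1 + u_t, \quad u_t|X_t, s_t \sim \mathrm{NID}(0, \Sigma_1) & \text{if } s_t = 1 \\ \quad\vdots \\ X_t\beta_M + u_t, \quad u_t|X_t, s_t \sim \mathrm{NID}(0, \Sigma_M) & \text{if } s_t = M \end{cases}$$

Thus: $p(y_t|x_t, s_t = m)$ is Gaussian with expectation $X_t\beta_m$ and variance $\Sigma_m$.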

II. VAR(1) Representation of the hidden Markov Chain

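$$\xi_t = F\xi_{t-1} + v_t, \quad v_t \sim \mathrm{MDS} \quad \text{and} \quad \xi_t = \begin{bmatrix} I(s_t = 1) & \cdots & I(s_t = M) \end{bmatrix}'$$

Unrestricted AR$_{M-1}$(1) representation using $\sum_{m=1}^{M} \xi_{mt} = 1$:

$$\zeta_t = F\zeta_{t-1} + v_t, \quad v_t \sim \mathrm{MDS} \quad \text{with } \zeta_{mt} = \xi_{mt} - \bar{\xi}_m \text{ for } m = 1, \cdots, M-1,$$

where $\bar{\xi}_m$ denotes the unconditional probability of regime $m$.

Example: Two-state Markov chain

$$\zeta_t = \rho\zeta_{t-1} + v_t, \quad \rho = p_{11} + p_{22} - 1$$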


Prediction Density

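Mixture of normals weighted with the predicted regime probabilities $\Pr(s_{t+h} = j|\Omega_t)$:

$$\begin{aligned}
p(y_{t+h}|\Omega_t) &= \sum_{j=1}^{M} \Pr(s_{t+h} = j|\Omega_t)\,p(y_{t+h}|x_{t+h}, s_{t+h} = j) \\
&= \sum_{j=1}^{M}\left(\sum_{i=1}^{M} \Pr(s_{t+h} = j|s_t = i)\Pr(s_t = i|\Omega_t)\right) p(y_{t+h}|x_{t+h}, s_{t+h} = j),
\end{aligned}$$

where the filtered regime probabilities $\Pr(s_t = m|\Omega_t)$ are given by Bayes' rule:

$$\Pr(s_t = j|\Omega_t) = \frac{p(y_t|x_t, s_t = j)\Pr(s_t = j|\Omega_{t-1})}{\sum_{i=1}^{M} p(y_t|x_t, s_t = i)\Pr(s_t = i|\Omega_{t-1})}.$$

One-step prediction density:

$$p(y_{t+1}|\Omega_t) = \sum_{j=1}^{M}\left(\sum_{i=1}^{M} p_{ij}\Pr(s_t = i|\Omega_t)\right) p(y_{t+1}|x_{t+1}, s_{t+1} = j)$$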

MSPE-Optimal Predictor

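Weighted average of the predictors of the $M$ regimes,

$$y_{t+1|t} = E[y_{t+1}|\Omega_t] = \sum_{j=1}^{M} \Pr(s_{t+1} = j|\Omega_t)\,E[y_{t+1}|x_{t+1}, s_{t+1} = j],$$

where $E[y_{t+1}|x_{t+1}, s_{t+1} = j] = X_{t+1}\beta_j$:

$$y_{t+1|t} = E[y_{t+1}|\Omega_t] = \sum_{j=1}^{M} X_{t+1}\beta_j\Pr(s_{t+1} = j|\Omega_t) = X_{t+1}\beta_{t+1|t},$$

$$\beta_{t+1|t} = \sum_{j=1}^{M} \beta_j\left(\sum_{i=1}^{M} p_{ij}\Pr(s_t = i|\Omega_t)\right).$$

Multi-step predictions:

$$y_{t+h|t} = X_{t+h}[\beta_1, \cdots, \beta_M]\,\xi_{t+h|t} = X_{t+h}[\beta_1, \cdots, \beta_M]\,F^h\xi_{t|t}$$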
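The multi-step predictor can be sketched in Python for a single-equation switching regression. The sketch assumes $F = P'$ with `P[i][j]` $= \Pr(s_{t+1} = j|s_t = i)$ and that the future regressors $X_{t+h}$ are known; all numbers and the function names `predict_regime_probs` and `predict_y` are hypothetical.

```python
def predict_regime_probs(xi_filt, P, h):
    """xi_{t+h|t} = F^h xi_{t|t}, computed by h applications of F = P'."""
    xi = list(xi_filt)
    M = len(xi)
    for _ in range(h):
        xi = [sum(P[i][j] * xi[i] for i in range(M)) for j in range(M)]
    return xi

def predict_y(x_row, betas, xi_filt, P, h):
    """y_{t+h|t} = X_{t+h} [beta_1, ..., beta_M] xi_{t+h|t} for one equation."""
    xi = predict_regime_probs(xi_filt, P, h)
    # regime-specific predictions X_{t+h} beta_m
    regime_means = [sum(x * b for x, b in zip(x_row, beta)) for beta in betas]
    return sum(p * m for p, m in zip(xi, regime_means))

P = [[0.9, 0.1], [0.25, 0.75]]
xi_filt = [0.2, 0.8]                       # made-up filtered probabilities
rho = P[0][0] + P[1][1] - 1.0              # persistence, 0.65 in this example
xi1_bar = (1.0 - P[1][1]) / (2.0 - P[0][0] - P[1][1])   # ergodic Pr(s_t = 1)
xi_3 = predict_regime_probs(xi_filt, P, 3)
y_1 = predict_y([1.0, 2.0], [[1.0, 0.5], [-1.0, 0.5]], xi_filt, P, 1)
```

In the two-state case the deviation of the predicted probability from its ergodic value decays geometrically at rate $\rho = p_{11} + p_{22} - 1$, so the predictor approaches the unconditional (linear) forecast as $h$ grows.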

When can we expect this predictor to outperform a linear forecasting rule?


3.3.3 Predictability and Granger-Causality

Unpredictability: The regime generating process $s_t$ is said to be unpredictable iff the regimes are serially independent:

$$\Pr(s_{t+1}|s_t) = \Pr(s_{t+1}).$$

If the regime generating process $s_t$ is unpredictable, then the detection of recent regime shifts has no predictive value for future regimes:

$$\Pr(s_{t+1}|s_t) = \Pr(s_{t+1}) \Rightarrow \Pr(s_{t+h}|\Omega_{t-1}) = \Pr(s_{t+h}).$$

Granger causality:

$s_t$ is said to be non-causal for $y_t$ in a strict sense iff

$$p(y_{t+1}|\Omega_t; \lambda) = p(y_{t+1}|\Omega_t, s_t; \lambda).$$

$s_t$ is said to be non-causal for $y_t$ in a weak sense iff

$$E[y_{t+1}|\Omega_t; \lambda] = E[y_{t+1}|\Omega_t, s_t; \lambda].$$

Result for MS-Regression Models

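Unpredictability of regimes implies non-causality.

The regime $s_t$ is Granger non-causal for the observed time series vector $y_t$ (in a strict sense) iff the regime is unpredictable:

$$\xi_{t+h|t} = \bar{\xi} \Rightarrow y_{t+h|t} = X_{t+h}\bar{\beta},$$

where $\bar{\xi}$ is the vector of unconditional regime probabilities and $\bar{\beta} = \sum_{m=1}^{M} \bar{\xi}_m\beta_m$.

Observational equivalence to a time-invariant linear model with heteroscedastic non-Gaussian errors:

$$y_t = X_t\bar{\beta} + w_t, \quad f(w_t) = \sum_{m=1}^{M} \bar{\xi}_m f_u\left(w_t - X_t(\beta_m - \bar{\beta})\right)$$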


3.3.4 Prediction of MS time series processes

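VAR(1) process with shifts in the intercept.

$$y_t - \mu_y = M\zeta_t + A(y_{t-1} - \mu_y) + u_t$$

Optimal $h$-step prediction

$$y_{t+h|t} - \mu_y = K_h\zeta_{t|t} + A^h(y_t - \mu_y), \quad K_h = \left(\sum_{i=1}^{h} A^{h-i}MF^i\right)$$

Example: MSI(2)-AR(1) with $\zeta_t = \rho\zeta_{t-1} + v_t$

$$y_{t+h|t} - \mu_y = \alpha^h(y_t - \mu_y) + K_h(\nu_1, \nu_2, \alpha, \rho)\,\zeta_{t|t},$$

with

$$K_h(\nu_1, \nu_2, \alpha, \rho) = (\nu_1 - \nu_2)\left(\sum_{i=1}^{h} \alpha^{h-i}\rho^i\right).$$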

Optimal predictor

• Dynamic intercept correction $K_h\zeta_{t|t}$, which depends on the persistence of the regimes: $K_h \to 0$.

• The predictor $y_{t+h|t}$ is linear in $\zeta_{t|t}$ and the last $p$ observations of $Y_t$.

• But $y_{t+h|t}$ is a non-linear function of the observed $Y_t$, as the regime inference $\zeta_{t|t}$ is non-linear in $Y_t$.

Result: Unpredictability of regimes implies strict non-causality

MS-AR is observationally equivalent to a linear AR model with heteroscedastic, non-Gaussian errors (mixture of normals):

$$y_{t+h|t} - \mu_y = A^h(y_t - \mu_y)$$

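Markovian Shifts in the Mean of an AR Process

Example: AR(1) process with shifts in the mean $\mu$:

$$y_t - \mu(s_t) = \alpha\left(y_{t-1} - \mu(s_{t-1})\right) + u_t$$

Sum of two independent processes:

$$y_t - \mu_y = (\mu_1 - \mu_2)\zeta_t + z_t$$

$$z_t = \alpha z_{t-1} + u_t, \quad u_t \sim \mathrm{NID}(0, \sigma^2), \quad z_t = y_t - \mu_y - (\mu_1 - \mu_2)\zeta_t$$

$$\zeta_t = \rho\zeta_{t-1} + v_t, \quad v_t \sim \mathrm{MDS}, \quad \rho = p_{11} + p_{22} - 1$$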


Optimal predictor

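$$\begin{aligned}
y_{t+h|t} &= \mu_y + (\mu_1 - \mu_2)\zeta_{t+h|t} + z_{t+h|t} \\
&= \mu_y + \alpha^h(y_t - \mu_y) + (\mu_1 - \mu_2)\left[\rho^h - \alpha^h\right]\zeta_{t|t}.
\end{aligned}$$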

Result: Unpredictability of regimes does not imply non-causality!
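The optimal predictor for the AR(1) process with shifts in the mean can be checked numerically. The Python sketch below codes the closed form and compares it with the sum of the two component forecasts, $\mu_y + (\mu_1 - \mu_2)\rho^h\zeta_{t|t} + \alpha^h z_{t|t}$. It assumes a two-state chain with $\zeta_t = \xi_{1t} - \bar{\xi}_1$; all parameter values and function names are hypothetical.

```python
def unconditional_mean(mu1, mu2, p11, p22):
    """mu_y = xi1_bar * mu1 + (1 - xi1_bar) * mu2, with the ergodic probability xi1_bar."""
    xi1_bar = (1.0 - p22) / (2.0 - p11 - p22)
    return xi1_bar * mu1 + (1.0 - xi1_bar) * mu2, xi1_bar

def msm_ar1_forecast(y_t, xi1_filt, mu1, mu2, alpha, p11, p22, h):
    """Closed-form h-step predictor; xi1_filt is Pr(s_t = 1 | Y_t)."""
    rho = p11 + p22 - 1.0
    mu_y, xi1_bar = unconditional_mean(mu1, mu2, p11, p22)
    zeta = xi1_filt - xi1_bar                   # zeta_{t|t}
    return (mu_y + alpha ** h * (y_t - mu_y)
            + (mu1 - mu2) * (rho ** h - alpha ** h) * zeta)

def component_forecast(y_t, xi1_filt, mu1, mu2, alpha, p11, p22, h):
    """Same predictor built from the two independent components zeta and z."""
    rho = p11 + p22 - 1.0
    mu_y, xi1_bar = unconditional_mean(mu1, mu2, p11, p22)
    zeta = xi1_filt - xi1_bar
    z = y_t - mu_y - (mu1 - mu2) * zeta         # z_{t|t}
    return mu_y + (mu1 - mu2) * rho ** h * zeta + alpha ** h * z

args = dict(y_t=0.2, xi1_filt=0.9, mu1=1.0, mu2=-0.5, alpha=0.5)
closed = msm_ar1_forecast(p11=0.9, p22=0.75, h=4, **args)
summed = component_forecast(p11=0.9, p22=0.75, h=4, **args)

# Unpredictable regimes (rho = 0): the regime inference still matters.
mu_y0, _ = unconditional_mean(1.0, -0.5, 0.6, 0.4)
with_regimes = msm_ar1_forecast(p11=0.6, p22=0.4, h=1, **args)
linear_rule = mu_y0 + 0.5 * (0.2 - mu_y0)
```

With $\rho = 0$ the correction term reduces to $-(\mu_1 - \mu_2)\alpha^h\zeta_{t|t}$, which is non-zero whenever $\zeta_{t|t} \neq 0$; this is the sense in which unpredictable regimes can still be informative for forecasting in the MSM specification.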

General case of an MS-AR

Consider the model

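$$y_t = A(\xi_t)y_{t-1} + u_t$$

$$\xi_t = F\xi_{t-1} + \nu_t$$

It follows that

$$\begin{bmatrix} \xi_{1t}y_t \\ \vdots \\ \xi_{Mt}y_t \end{bmatrix} = \begin{bmatrix} p_{11}A_1 & \cdots & p_{M1}A_1 \\ \vdots & & \vdots \\ p_{1M}A_M & \cdots & p_{MM}A_M \end{bmatrix} \begin{bmatrix} \xi_{1t-1}y_{t-1} \\ \vdots \\ \xi_{Mt-1}y_{t-1} \end{bmatrix} + \varepsilon_t$$

$$\eta_t = \Pi\eta_{t-1} + \varepsilon_t$$

where $\eta_t = (\xi_t \otimes y_t)$ and $\varepsilon_t$ is an MDS, such that

$$E[\eta_{t+h}|\eta_t] = \Pi^h\eta_t.$$

As

$$y_t = \sum_{i=1}^{M} \xi_{it}y_t$$

we have that

$$\begin{aligned}
E[y_{t+h}|\Omega_t] &= \sum_{i=1}^{M} E[\xi_{it+h}y_{t+h}|\Omega_t] \\
&= (1_M' \otimes I_K)\,E[\eta_{t+h}|\Omega_t] \\
&= (1_M' \otimes I_K)\,\Pi^h E[\eta_t|\Omega_t] \\
&= (1_M' \otimes I_K)\,\Pi^h\left(E[\xi_t|\Omega_t] \otimes y_t\right).
\end{aligned}$$

Thus

$$y_{t+h|t} = (1_M' \otimes I_K)\,\Pi^h\left(\xi_{t|t} \otimes y_t\right)$$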
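For a univariate process ($K = 1$) the matrix $\Pi$ has elements $\Pi_{ji} = p_{ij}a_j$, and the predictor can be sketched in a few lines of Python. This is a minimal illustration under the assumption that `P[i][j]` $= \Pr(s_{t+1} = j|s_t = i)$; the coefficients, probabilities and the function name `ms_ar_forecast` are hypothetical.

```python
def ms_ar_forecast(y_t, xi_filt, a, P, h):
    """y_{t+h|t} = 1' Pi^h (xi_{t|t} y_t) for y_t = a[s_t] * y_{t-1} + u_t."""
    M = len(a)
    # Pi[j][i] = p_ij * a_j: move from regime i to regime j, then apply a_j
    Pi = [[P[i][j] * a[j] for i in range(M)] for j in range(M)]
    eta = [xi_filt[m] * y_t for m in range(M)]          # E[eta_t | Omega_t]
    for _ in range(h):
        eta = [sum(Pi[j][i] * eta[i] for i in range(M)) for j in range(M)]
    return sum(eta)                                     # premultiply by 1'

P = [[0.9, 0.1], [0.2, 0.8]]
one_step = ms_ar_forecast(2.0, [0.7, 0.3], a=[0.9, 0.3], P=P, h=1)
no_switch = ms_ar_forecast(2.0, [0.7, 0.3], a=[0.5, 0.5], P=P, h=3)
```

For $h = 1$ the result equals $\sum_j a_j \Pr(s_{t+1} = j|\Omega_t)\,y_t$, and when the autoregressive coefficient is the same in every regime the predictor collapses to the linear rule $a^h y_t$.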


3.3.5 Conclusions

(i) Detecting recent regime shifts is essential to predict MS-AR processes.

(ii) The predictability of regimes and their Granger causality for the observed time series are critical for the predictive value of detected regime shifts.

(iii) The optimal predictor differs from a linear prediction rule by including a dynamic intercept correction.

(iv) MS-AR processes have short memory. The longer the forecast horizon, the better the linear approximation of the optimal predictor.

(v) Forecastability requires the structural stability of the MS-AR.


3.4 Impulse-response analysis

3.4.1 Traditional and generalized impulse-response analysis

Measure of the response of $y_{t+h}$ to a shock or impulse $\delta$ at time $t$, given a history $\omega_{t-1}$.

Traditional impulse response function:

$$\begin{aligned}
\mathrm{TIRF}(h, \delta, \omega_{t-1}) = {} & E[y_{t+h}|\varepsilon_t = \delta, \varepsilon_{t+1} = \cdots = \varepsilon_{t+h} = 0, \omega_{t-1}] \\
& - E[y_{t+h}|\varepsilon_t = 0, \varepsilon_{t+1} = \cdots = \varepsilon_{t+h} = 0, \omega_{t-1}].
\end{aligned}$$

Linear models: TIRF is symmetric, linear and history independent.

Nonlinear models: TIRF depends on the sign and size of the shock, as well as the history of the process; the assumption of no intermediate shocks is problematic.

Generalized impulse response function (introduced by Koop, Pesaran and Potter, 1996):

$$\mathrm{GIRF}(h, \delta, \omega_{t-1}) = E[y_{t+h}|\varepsilon_t = \delta, \omega_{t-1}] - E[y_{t+h}|\omega_{t-1}].$$

For linear models: GIRF = TIRF.

The GIRF can be interpreted as the realisation of the random variable

$$\mathrm{GIRF}(h, \varepsilon_t, \Omega_{t-1}) = E[y_{t+h}|\varepsilon_t, \Omega_{t-1}] - E[y_{t+h}|\Omega_{t-1}].$$

Thus various conditional versions of the type GIRF$(h, A, B)$ can be defined by fixing the shock or the history. Calculation by Monte Carlo simulation.


3.4.2 Impulse responses in MS-ARs

• The response to shocks arising from the Gaussian innovations to each of the variables (corresponds to the impulse response analysis in linear Gaussian VARs).

$$E[y_{t+h}|u_t = \delta, \omega_{t-1}] - E[y_{t+h}|u_t = 0, \omega_{t-1}]$$

• The study of the path of the variables when there is a change in regime, such as from recession to growth, from recession to high growth, from growth to recession or any other combination of the existing regimes.

$$E[y_{t+h}|s_t = j, \omega_{t-1}] - E[y_{t+h}|s_t = i, \omega_{t-1}]$$

• The dynamics when the information structure moves from the ergodic distribution to certainty regarding the state.

$$E[y_{t+h}|s_t = j, \omega_{t-1}] - E[y_{t+h}|\omega_{t-1}]$$

Note: Responses are linear in $\delta$ and independent of $\omega_{t-1}$.

The state-space representation

Consider the MS($M$)-AR($p$) representation given by

$$y_t = M\xi_t + A_1y_{t-1} + \ldots + A_py_{t-p} + u_t,$$

where $M = [\nu_1 : \cdots : \nu_M]$, $A_1 = I_K + \alpha\beta' + \Gamma_1$ and $A_j = \Gamma_j - \Gamma_{j-1}$ for $1 < j \le p$ with $\Gamma_p = 0_K$.

To derive the impulse-response functions, we use the stacked MS($M$)-AR(1) representation in $\mathbf{y}_t = (y_t', \ldots, y_{t-p+1}')'$:

$$y_t = H\xi_t + JA\mathbf{y}_{t-1} + u_t,$$

where

$$A = \begin{bmatrix} A_1 & \ldots & A_{p-1} & A_p \\ I_K & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & & I_K & 0 \end{bmatrix}, \quad H = \begin{bmatrix} M \\ 0 \\ \vdots \\ 0 \end{bmatrix} \quad \text{and} \quad J = [I_K : 0 : \cdots : 0]$$

The complete state-space representation involves the VAR(1) representation of the Markov chain

$$\xi_{t+1} = F\xi_t + v_{t+1},$$

where $\xi_t$ is the unobservable state vector and $v_t$ is a martingale difference sequence.

Hence the expectation of $y_{t+h}$ conditional upon $u_t$, $\xi_t$, $Y_{t-1}$ is given by:

$$y_{t+h|t} = H\xi_{t+h|t} + JA\mathbf{y}_{t+h-1|t}$$

where the conditional expectation of $\xi_{t+h}$ is

$$\xi_{t+h|t} = F^h\xi_t.$$


The response to shocks arising from the Gaussian innovations to the variables

$$\frac{\partial y_{t+h}}{\partial u_{jt}} = JA^h\iota_j \tag{14}$$

If the variance-covariance matrix $\Sigma_u$ is regime-dependent, the standardized and orthogonalized impulse responses also become regime-dependent:

$$\frac{\partial y_{t+h}}{\partial \varepsilon_{jt}} = JA^hD(\xi_t)\iota_j, \tag{15}$$

where $u_t = D(\xi_t)\varepsilon_t$ and $D(\xi_t)$ is a lower triangular matrix resulting from the Choleski decomposition of $\Sigma_u(\xi_t) = D(\xi_t)D(\xi_t)'$.

Response to changes in regime such as from recession to growth

The effects of regime shifts can be measured as the reaction of $y_{t+h}$ to the information that $s_t = j$ (considered as a shift from the unconditional distribution $\bar{\xi}$ or from the $m$-th regime):

$$dy_{t+h} = J\left(\sum_{k=0}^{h} A^kHF^{h-k}\right)\left(\iota_j - \bar{\xi}\right) \tag{16}$$

$$dy_{t+h} = J\left(\sum_{k=0}^{h} A^kHF^{h-k}\right)\left(\iota_j - \iota_m\right). \tag{17}$$

Dynamics are generated by:

• Changes of the current state and hence of the conditional expectation of future regimes.

• Autoregressive transmission of intercept shifts.
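Both sources of dynamics are visible in a univariate MS($M$)-AR(1) with regime-dependent intercepts, where $J = 1$, $A = a$ and $H = (\nu_1, \ldots, \nu_M)$. The Python sketch below evaluates (14) and (17) for this special case; it assumes $F = P'$ with `P[i][j]` $= \Pr(s_{t+1} = j|s_t = i)$, and the parameter values and function names are hypothetical.

```python
def regime_shift_response(nu, a, P, j, m, h):
    """Equation (17) for K = 1, p = 1: sum_k a^k nu' F^(h-k) (iota_j - iota_m)."""
    M = len(nu)
    delta = [0.0] * M
    delta[j] += 1.0
    delta[m] -= 1.0
    # powers[n] holds F^n (iota_j - iota_m), with F = P'
    powers = [delta]
    for _ in range(h):
        prev = powers[-1]
        powers.append([sum(P[i][c] * prev[i] for i in range(M)) for c in range(M)])
    return sum(a ** k * sum(nu[c] * powers[h - k][c] for c in range(M))
               for k in range(h + 1))

def gaussian_shock_response(a, h):
    """Equation (14) in the univariate AR(1) case: the response is a^h."""
    return a ** h

nu, a = [1.0, -0.5], 0.5
P = [[0.9, 0.1], [0.25, 0.75]]
rho = P[0][0] + P[1][1] - 1.0
response = [regime_shift_response(nu, a, P, j=0, m=1, h=h) for h in range(6)]
```

In the two-state case the response equals $(\nu_1 - \nu_2)\sum_{k=0}^{h} a^k\rho^{h-k}$: the term $\rho^{h-k}$ reflects the revised expectation of future regimes and $a^k$ the autoregressive transmission of the intercept shift.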


References

Akaike, H. (1985). Prediction and entropy. In Atkinson, A. C., and Fienberg, S. E. (eds.), A Celebration of Statistics, pp. 1–24. New York: Springer-Verlag.

Andrews, D. W. K. (1993). Tests for parameter instability and structural change point. Econometrica, 61, 821–856.

Andrews, D. W. K., and Ploberger, W. (1994). Optimal tests when a nuisance parameter is present only under the alternative. Econometrica, 62, 1386–1414.

Ang, A., and Bekaert, G. (1998). Regime switches in interest rates. Research paper 1486, Stanford University.

Banerjee, A., Lazarova, S., and Urga, G. (1998). Bootstrapping sequential tests for multiple structural breaks. Discussion paper eco no. 98/24, European University Institute, Florence.

Bates, D. M., and Watts, D. G. (1988). Nonlinear regression and its application. New York: John Wiley.

Baum, L. E., and Eagon, J. A. (1967). An inequality with applications to statistical estimation for probabilistic functions of Markov chains and to a model for ecology. Bull. American Mathematical Society, 73, 360–363.

Baum, L. E., and Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. Annals of Mathematical Statistics, 37, 1554–1563.

Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41, 164–171.

Carrasco, M. (1994). The asymptotic distribution of the Wald statistic in misspecified structural change, threshold or Markov switching models. Discussion Paper, GREMAQ.

Clements, M. P., and Krolzig, H. M. (1998). A comparison of the forecast performance of Markov-switching and threshold autoregressive models of US GNP. Econometrics Journal, 1, C47–C75.

Clements, M. P., and Smith, J. (1999). A Monte Carlo study of the forecasting performance of empirical SETAR models. Journal of Applied Econometrics, 14, 124–141.

Cosslett, S. R., and Lee, L.-F. (1985). Serial correlation in latent discrete variable models. Journal of Econometrics, 27, 79–97.

Dacco, R., and Satchell, S. (1999). Why do regime-switching models forecast so badly? Journal of Forecasting, 18, 1–16.

Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 64, 247–254.

Davies, R. B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 74, 33–43.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39, Series B, 1–38.

Dijk, D. v. (1999). Extensions and outlier robust inference. Tinbergen institute research series 200, Erasmus University, Rotterdam.

Eitrheim, Ø., and Teräsvirta, T. (1996). Testing the adequacy of smooth transition autoregressive models. Journal of Econometrics, 74, 59–76.

Escribano, A., and Jorda, O. (1999). Improved testing and specification of smooth transition autoregressive models. In Nonlinear Time Series Analysis of Economic and Financial Data, pp. 289–319. Boston: Kluwer Academic Press.

Garcia, R. (1998). Asymptotic null distribution of the likelihood ratio test in Markov switching models. International Economic Review, 39.

Goldfeld, S. M., and Quandt, R. E. (1973). A Markov model for switching regressions. Journal of Econometrics, 1, 3–16.

Granger, C. W. J., and Lee, T. H. (1989). Investigation of production, sales and inventory relationships using multicointegration and non-symmetric error correction models. Journal of Applied Econometrics, 4, S145–S159.

Granger, C. W. J., and Swanson, N. (1996). Further developments in the study of cointegrated variables. Oxford Bulletin of Economics and Statistics, 58, 537–554.

Granger, C. W. J., and Teräsvirta, T. (1993). Modelling nonlinear economic relationships. Oxford: Oxford University Press.

Hamilton, J. D. (1988). Rational-expectations econometric analysis of changes in regime. An investigation of the term structure of interest rates. Journal of Economic Dynamics and Control, 12, 385–423.

Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57, 357–384.

Hamilton, J. D. (1990). Analysis of time series subject to changes in regime. Journal of Econometrics, 45, 39–70.

Hamilton, J. D. (1994a). State-space models. In Engle, R., and McFadden, D. (eds.), Handbook of Econometrics, Vol. 4. Amsterdam: North-Holland.

Hamilton, J. D. (1994b). Time Series Analysis. Princeton: Princeton University Press.

Hannan, E. J., and Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society, B, 41, 190–195.

Hansen, B. E. (1992). The likelihood ratio test under non-standard conditions: Testing the Markov switching model of GNP. Journal of Applied Econometrics, 7, S61–S82.

Hansen, B. E. (1996a). Erratum: the likelihood ratio test under non-standard conditions: Testing the Markov switching model of GNP. Journal of Applied Econometrics, 11, 195–199.

Hansen, B. E. (1996b). Inference when a nuisance parameter is not identified under the null. Econometrica, 64, 414–430.

Hendry, D. F. (1995). Dynamic Econometrics. Oxford: Oxford University Press.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions ASME Journal of Basic Engineering, Series D, 82, 35–45.

Kalman, R. E. (1963). New methods in Wiener filtering theory. In Bogdanoff, J. L., and Kozin, F. (eds.), Proceedings of the First Symposium of Engineering Applications of Random Function Theory and Probability, pp. 270–388. New York: Wiley.

Kalman, R. E., and Bucy, R. S. (1961). New results in linear filtering and prediction theory. Transactions ASME Journal of Basic Engineering, Series D, 83, 95–108.

Kim, C.-J. (1994). Dynamic linear models with Markov-switching. Journal of Econometrics, 60, 1–22.

Kitagawa, G. (1987). Non-Gaussian state-space modeling of nonstationary time series. Journal of the American Statistical Association, 82, 1032–1041.

Koop, G., Pesaran, M. H., and Potter, S. M. (1996). Impulse response analysis in nonlinear multivariate models. Journal of Econometrics, 74, 119–147.


Krolzig, H.-M. (1997).Markov Switching Vector Autoregressions. Modelling, Statistical Inference andApplication to Business Cycle Analysis. Berlin: Springer.

Krolzig, H.-M. (2000). Predicting Markov-switching vector autoregressive processes. EconomicsDiscussion Paper 2000-W31, Nuffield College, Oxford.

Lin, C.-F., and Teräsvirta, T. (1994). Testing the constancy of regression parameters against continousstructural change.Journal of Econometrics, 62, 211–228.

Lindgren, G. (1978). Markov regime models for mixed distributions and switching regressions.Scand-inavian Journal of Statistics, 5, 81–91.

Luukkonen, R., Saikkonen, P., and Teräsvirta, T. (1988). Testing linearity against smooth transitionautoregressive models.Biometrika, 75, 491–499.

Pesaran, M. H., and Potter, S. M. (1997). A floor and ceiling model of US Output.Journal of EconomicDynamics and Control, 21, 661–695.

Petrie, T. (1969). Probabilistic functions of finite state Markov chains.Annals of Mathematical Statist-ics, 60, 97–115.

Potter, S. M. (1993). A nonlinear approach to US GNP.Journal of Applied Econometrics, 10, 109–125.

Ruud, P. A. (1991). Extensions of estimation methods using the EM algorithm.Journal of Economet-rics, 49, 305–341.

Schwarz, G. (1978). Estimating the dimension of a model.Annals of Statistics, 6, 461–464.

Sims, C. A. (1980). Macroeconomics and reality.Econometrica, 48, 1–48.

Terasvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressivemodels.Journal of the American Statistical Association, 89, 208–218.

Terasvirta, T. (1998). Modelling economic relationships with smooth transition regressions. In Ullah,A., and Giles, D. (eds.),Handbook of Applied Economic Statistics, pp. 507–555. New York:Marcel Dekker.

Terasvirta, T., and Anderson, H. (1992). Modelling nonlinearities in business cycles using smoothtransition autoregressive models.Journal of Applied Econometrics, 7, S119–S136.

Tiao, G. C., and Tsay, R. S. (1994). Some advances in non-linear and adaptive modelling in time-series.Journal of Forecasting, 13, 109–131.

Tsay, R. S. (1989). Testing and modeling threshold autoregressive processes. Journal of the American Statistical Association, 84, 231–240.

Tsay, R. S. (1998). Testing and modeling multivariate threshold models. Journal of the American Statistical Association, 93, 1188–1202.

Contents

1 Introduction
  1.1 Linear time series models
  1.2 Regime-switching models
    1.2.1 Regime shifts
    1.2.2 The Conditional Process
    1.2.3 The Regime Generating Process

2 Types of regime-switching models
  2.1 Structural change and switching regression models
    2.1.1 Structural break models
    2.1.2 Switching regression model
    2.1.3 Maximum likelihood estimation under normality
  2.2 Threshold models
    2.2.1 The TAR model
    2.2.2 The SETAR model
    2.2.3 Maximum likelihood estimation under normality
  2.3 Smooth transition autoregressive models
    2.3.1 The STAR model
    2.3.2 Maximum likelihood estimation
    2.3.3 Model selection
  2.4 Markov-switching vector autoregressions
    2.4.1 The MS-VAR model
    2.4.2 State-Space Representation
    2.4.3 Related models
    2.4.4 Regime inference
    2.4.5 Maximum Likelihood estimation

3 Prediction and structural analysis with regime-switching models
  3.1 Predictions of linear and nonlinear stochastic processes
    3.1.1 Linear AR(1) model
    3.1.2 Nonlinear AR(1) model
    3.1.3 Methods of calculating multi-step forecasts in nonlinear models
  3.2 Forecasting performance of non-linear / regime-switching models
    3.2.1 Empirical Findings
    3.2.2 Illustrative Example: Hamilton's model of the US business cycle
  3.3 Predicting Markov-switching VARs
    3.3.1 Econometric theory of predicting multiple time series subject to shifts in regime
    3.3.2 Prediction in Markov-switching regression models
    3.3.3 Predictability and Granger-Causality
    3.3.4 Prediction of MS time series processes
    3.3.5 Conclusions
  3.4 Impulse-response analysis
    3.4.1 Traditional and generalized impulse-response analysis
    3.4.2 Impulse responses in MS-ARs

References
