Rob J Hyndman
State space models
2: Structural models
Outline
1 Simple structural models
2 Linear Gaussian state space models
3 Kalman filter
4 Kalman smoothing
5 Time varying parameter models
State space models

[Figure: graphical models linking the state vectors x_{t-1}, x_t, x_{t+1}, ... to the observations y_t, y_{t+1}, ... In the ETS graph each observation y_t is linked to the previous state x_{t-1}; in the structural-model graph each y_t is linked to the current state x_t.]

ETS state vector: x_t = (ℓ_t, b_t, s_t, s_{t-1}, ..., s_{t-m+1})

ETS models
- y_t depends on x_{t-1}.
- The same error process affects x_t | x_{t-1} and y_t | x_{t-1}.

Structural models
- y_t depends on x_t.
- A different error process affects x_t | x_{t-1} and y_t | x_t.
Local level model

Stochastically varying level (random walk) observed with noise:

y_t = ℓ_t + ε_t
ℓ_t = ℓ_{t-1} + ξ_t

- ε_t and ξ_t are independent Gaussian white noise processes.
- Compare ETS(A,N,N), where ξ_t = α ε_{t-1}.
- Parameters to estimate: σ²_ε and σ²_ξ.
- If σ²_ξ = 0, then y_t ~ NID(ℓ_0, σ²_ε).
Local linear trend model

Dynamic trend observed with noise:

y_t = ℓ_t + ε_t
ℓ_t = ℓ_{t-1} + b_{t-1} + ξ_t
b_t = b_{t-1} + ζ_t

- ε_t, ξ_t and ζ_t are independent Gaussian white noise processes.
- Compare ETS(A,A,N), where ξ_t = (α + β) ε_{t-1} and ζ_t = β ε_{t-1}.
- Parameters to estimate: σ²_ε, σ²_ξ and σ²_ζ.
- If σ²_ζ = σ²_ξ = 0, then y_t = ℓ_0 + t b_0 + ε_t.
- The model is a time-varying linear regression.
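The reduction to a deterministic trend can be made concrete with a small simulation sketch (pure Python; the function name `simulate_llt` and its arguments are illustrative, not from the slides). With all three noise variances set to zero, the simulated series collapses to the line ℓ_0 + t b_0.

```python
import random

def simulate_llt(n, l0, b0, sd_eps, sd_xi, sd_zeta, seed=1):
    """Simulate the local linear trend model:
    y_t = l_t + eps_t, l_t = l_{t-1} + b_{t-1} + xi_t, b_t = b_{t-1} + zeta_t."""
    rng = random.Random(seed)
    level, slope = l0, b0
    ys = []
    for _ in range(n):
        level = level + slope + rng.gauss(0, sd_xi)  # l_t = l_{t-1} + b_{t-1} + xi_t
        slope = slope + rng.gauss(0, sd_zeta)        # b_t = b_{t-1} + zeta_t
        ys.append(level + rng.gauss(0, sd_eps))      # y_t = l_t + eps_t
    return ys

# With sigma_xi = sigma_zeta = 0 the states are deterministic, so
# y_t = l_0 + t*b_0 + eps_t; with sd_eps = 0 as well, exactly the line.
y = simulate_llt(5, l0=10.0, b0=2.0, sd_eps=0.0, sd_xi=0.0, sd_zeta=0.0)
```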
Basic structural model

y_t = ℓ_t + s_{1,t} + ε_t
ℓ_t = ℓ_{t-1} + b_{t-1} + ξ_t
b_t = b_{t-1} + ζ_t
s_{1,t} = -∑_{j=1}^{m-1} s_{j,t-1} + η_t
s_{j,t} = s_{j-1,t-1},  j = 2, ..., m-1

- ε_t, ξ_t, ζ_t and η_t are independent Gaussian white noise processes.
- Compare ETS(A,A,A).
- Parameters to estimate: σ²_ε, σ²_ξ, σ²_ζ and σ²_η.
- Deterministic seasonality if σ²_η = 0.
Trigonometric models

y_t = ℓ_t + ∑_{j=1}^{J} s_{j,t} + ε_t
ℓ_t = ℓ_{t-1} + b_{t-1} + ξ_t
b_t = b_{t-1} + ζ_t
s_{j,t} = cos(λ_j) s_{j,t-1} + sin(λ_j) s*_{j,t-1} + ω_{j,t}
s*_{j,t} = -sin(λ_j) s_{j,t-1} + cos(λ_j) s*_{j,t-1} + ω*_{j,t}

where λ_j = 2πj/m.

- ε_t, ξ_t, ζ_t, ω_{j,t} and ω*_{j,t} are independent Gaussian white noise processes.
- ω_{j,t} and ω*_{j,t} have the same variance σ²_{ω,j}.
- Equivalent to the BSM when σ²_{ω,j} = σ²_ω and J = m/2.
- Choose J < m/2 for fewer degrees of freedom.
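The seasonal recursion rotates the pair (s_{j,t}, s*_{j,t}) by λ_j at each step. A minimal sketch (pure Python; `harmonic_step` is an illustrative name): with the noise switched off, starting from (1, 0) the recursion traces the deterministic cycle s_t = cos(λ_j t) and returns to its start after m steps.

```python
import math

def harmonic_step(s, s_star, lam, w=0.0, w_star=0.0):
    """One update of the j-th stochastic cycle:
    s_t = cos(lam)*s + sin(lam)*s* + w,  s*_t = -sin(lam)*s + cos(lam)*s* + w*."""
    return (math.cos(lam) * s + math.sin(lam) * s_star + w,
            -math.sin(lam) * s + math.cos(lam) * s_star + w_star)

# With no noise the recursion is a pure rotation: for monthly data (m = 12)
# and the first harmonic, one full period brings (s, s*) back to (1, 0).
m, j = 12, 1
lam = 2 * math.pi * j / m
s, s_star = 1.0, 0.0
for t in range(1, m + 1):
    s, s_star = harmonic_step(s, s_star, lam)
```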
ETS vs structural models

- ETS models are much more general, as they allow non-linear (multiplicative) components.
- ETS allows automatic forecasting due to its larger model space.
- Additive ETS models are almost equivalent to the corresponding structural models.
- ETS models have a larger parameter space. Structural model parameters are always non-negative (variances).
- Structural models are much easier to generalize (e.g., to add covariates).
- It is easier to handle missing values with structural models.
Structural models in R

StructTS(oil, type = "level")
StructTS(ausair, type = "trend")
StructTS(austourists, type = "BSM")

fit <- StructTS(austourists, type = "BSM")
decomp <- cbind(austourists, fitted(fit))
colnames(decomp) <- c("data", "level", "slope", "seasonal")
plot(decomp, main = "Decomposition of International visitor nights")
Structural models in R

[Figure: Decomposition of International visitor nights; panels: data, level, slope, seasonal; 2000 to 2010.]
ETS decomposition

[Figure: Decomposition by ETS(A,A,A) method; panels: observed, level, slope, season; 2000 to 2010.]
Linear Gaussian SS models

Observation equation:  y_t = f′ x_t + ε_t
State equation:        x_t = G x_{t-1} + w_t

- State vector x_t of length p.
- G is a p × p matrix; f is a vector of length p.
- ε_t ~ NID(0, σ²) and w_t ~ NID(0, W).

Local level model: f = G = 1, x_t = ℓ_t.

Local linear trend model:
f′ = [1 0],  x_t = (ℓ_t, b_t)′,  G = [1 1; 0 1],  W = diag(σ²_ξ, σ²_ζ)
Basic structural model

Linear Gaussian state space form:

y_t = f′ x_t + ε_t,      ε_t ~ N(0, σ²)
x_t = G x_{t-1} + w_t,   w_t ~ N(0, W)

f′ = [1 0 1 0 ⋯ 0],  W = diag(σ²_ξ, σ²_ζ, σ²_η, 0, ..., 0)

x_t = (ℓ_t, b_t, s_{1,t}, s_{2,t}, ..., s_{m-1,t})′

G is block diagonal: a trend block [1 1; 0 1] acting on (ℓ_t, b_t), and an
(m-1) × (m-1) seasonal block whose first row is (-1, -1, ..., -1), with a
shifted identity below it, so that s_{1,t} = -(s_{1,t-1} + ⋯ + s_{m-1,t-1}) + η_t
and s_{j,t} = s_{j-1,t-1} for j = 2, ..., m-1.
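As a sketch of this matrix form (pure Python; the helper names `bsm_matrices` and `matvec` are illustrative, and W is omitted since only the noise-free transition is checked), one can build f and G for a given m and verify that G x reproduces the component equations:

```python
def bsm_matrices(m):
    """Build f and G for the basic structural model with period m.
    State ordering: (l, b, s1, s2, ..., s_{m-1})."""
    p = 2 + (m - 1)
    f = [1, 0, 1] + [0] * (m - 2)
    G = [[0] * p for _ in range(p)]
    G[0][0] = G[0][1] = 1    # l_t = l_{t-1} + b_{t-1}
    G[1][1] = 1              # b_t = b_{t-1}
    for j in range(2, p):
        G[2][j] = -1         # s_{1,t} = -(s_{1,t-1} + ... + s_{m-1,t-1})
    for j in range(3, p):
        G[j][j - 1] = 1      # s_{j,t} = s_{j-1,t-1}
    return f, G

def matvec(G, x):
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]

# Check against the component equations for quarterly data (m = 4):
f, G = bsm_matrices(4)
x = [10.0, 2.0, 1.5, -0.5, -1.0]   # (l, b, s1, s2, s3)
x_next = matvec(G, x)              # noise-free transition G x
# l -> l + b, b -> b, s1 -> -(s1 + s2 + s3), s2 -> old s1, s3 -> old s2
```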
Kalman filter

Notation:

x_{t|t} = E[x_t | y_1, ..., y_t]          P_{t|t} = V[x_t | y_1, ..., y_t]
x_{t|t-1} = E[x_t | y_1, ..., y_{t-1}]    P_{t|t-1} = V[x_t | y_1, ..., y_{t-1}]
y_{t|t-1} = E[y_t | y_1, ..., y_{t-1}]    v_{t|t-1} = V[y_t | y_1, ..., y_{t-1}]

Forecasting:
y_{t|t-1} = f′ x_{t|t-1}
v_{t|t-1} = f′ P_{t|t-1} f + σ²

Updating (state filtering):
x_{t|t} = x_{t|t-1} + P_{t|t-1} f v_{t|t-1}^{-1} (y_t - y_{t|t-1})
P_{t|t} = P_{t|t-1} - P_{t|t-1} f v_{t|t-1}^{-1} f′ P_{t|t-1}

State prediction:
x_{t+1|t} = G x_{t|t}
P_{t+1|t} = G P_{t|t} G′ + W

Iterate for t = 1, ..., T, assuming x_{1|0} and P_{1|0} are known.
These are just conditional expectations, so the filter gives minimum MSE estimates.
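The recursions above can be transcribed directly into code. This is a minimal pure-Python sketch (no linear algebra library; `kalman_filter` and its argument names are illustrative), applied to the local linear trend model on noise-free data, where the filtered state converges to the true level and slope:

```python
def kalman_filter(ys, f, G, W, sigma2, x0, P0):
    """Kalman filter for y_t = f'x_t + eps_t, x_t = G x_{t-1} + w_t.
    Matrices are lists of lists; f, x are lists; y_t is scalar."""
    p = len(x0)
    x, P = list(x0), [row[:] for row in P0]          # x_{1|0}, P_{1|0}
    for y in ys:
        # Forecasting: y_{t|t-1} = f'x, v = f'Pf + sigma2
        yhat = sum(f[i] * x[i] for i in range(p))
        Pf = [sum(P[i][j] * f[j] for j in range(p)) for i in range(p)]
        v = sum(f[i] * Pf[i] for i in range(p)) + sigma2
        # Updating: x_{t|t} = x + Pf v^{-1} (y - yhat), P_{t|t} = P - Pf v^{-1} (Pf)'
        K = [Pf[i] / v for i in range(p)]             # Kalman gain
        x = [x[i] + K[i] * (y - yhat) for i in range(p)]
        P = [[P[i][j] - K[i] * Pf[j] for j in range(p)] for i in range(p)]
        # State prediction: x_{t+1|t} = G x_{t|t}, P_{t+1|t} = G P G' + W
        x = [sum(G[i][j] * x[j] for j in range(p)) for i in range(p)]
        GP = [[sum(G[i][k] * P[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
        P = [[sum(GP[i][k] * G[j][k] for k in range(p)) + W[i][j] for j in range(p)] for i in range(p)]
    return x, P   # x_{T+1|T}, P_{T+1|T}

# Noise-free data from a deterministic trend y_t = 2 + 3t: with W = 0 and a
# diffuse prior, the filter converges to the true level and slope.
G = [[1.0, 1.0], [0.0, 1.0]]
W = [[0.0, 0.0], [0.0, 0.0]]
ys = [2.0 + 3.0 * t for t in range(1, 21)]
x, P = kalman_filter(ys, f=[1.0, 0.0], G=G, W=W, sigma2=1.0,
                     x0=[0.0, 0.0], P0=[[1e7, 0.0], [0.0, 1e7]])
# x is x_{21|20}: level approx 2 + 3*21 = 65, slope approx 3
```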
Kalman recursions

[Figure: the recursion cycle: 1. state prediction (filtered state at time t-1 to predicted state at time t), 2. forecasting (forecast the observation at time t), 3. state filtering (filtered state at time t), then repeat.]
Initializing the Kalman filter

- Need x_{1|0} and P_{1|0} to get started.
- Common approach for structural models: set x_{1|0} = 0 and P_{1|0} = kI for a very large k.
- There are many research papers on optimal initialization choices for the Kalman recursions.
- The ETS approach was to estimate x_{1|0}, and to avoid P_{1|0} by assuming identical error processes.
- A random x_{1|0} could be used with ETS models; a form of Kalman filter would then be required for estimation and forecasting. This gives more realistic prediction intervals.
Local level model

y_t = ℓ_t + ε_t,       ε_t ~ NID(0, σ²)
ℓ_t = ℓ_{t-1} + u_t,   u_t ~ NID(0, q²)

Kalman recursions:

y_{t|t-1} = ℓ̂_{t-1|t-1}
v_{t|t-1} = p_{t|t-1} + σ²
ℓ̂_{t|t} = ℓ̂_{t-1|t-1} + p_{t|t-1} v_{t|t-1}^{-1} (y_t - y_{t|t-1})
p_{t+1|t} = p_{t|t-1} (1 - v_{t|t-1}^{-1} p_{t|t-1}) + q²
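A direct transcription of these scalar recursions (pure Python; the function name and defaults are illustrative). A known sanity check: with q² = 0 the level is constant, and under a diffuse prior the filtered level is just the running sample mean of the observations.

```python
def local_level_filter(ys, sigma2, q2, l0=0.0, p0=1e8):
    """Scalar Kalman recursions for the local level model."""
    l, p = l0, p0                # l_{1|0} and p_{1|0}
    for y in ys:
        v = p + sigma2           # v_{t|t-1} = p_{t|t-1} + sigma^2
        k = p / v                # gain p_{t|t-1} v^{-1}
        l = l + k * (y - l)      # filtered level l_{t|t}
        p = p * (1 - k) + q2     # p_{t+1|t}
    return l, p

# q^2 = 0 with a diffuse prior: the filtered level equals the sample mean.
l, p = local_level_filter([1.0, 2.0, 3.0, 4.0], sigma2=1.0, q2=0.0)
```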
Handling missing values

Forecasting:
y_{t|t-1} = f′ x_{t|t-1}
v_{t|t-1} = f′ P_{t|t-1} f + σ²

Updating (state filtering):
x_{t|t} = x_{t|t-1} + P_{t|t-1} f v_{t|t-1}^{-1} (y_t - y_{t|t-1})
P_{t|t} = P_{t|t-1} - P_{t|t-1} f v_{t|t-1}^{-1} f′ P_{t|t-1}

State prediction:
x_{t|t-1} = G x_{t-1|t-1}
P_{t|t-1} = G P_{t-1|t-1} G′ + W

Iterate for t = 1, ..., T, starting with x_{1|0} and P_{1|0}.
If y_t is missing, skip the updating step: set x_{t|t} = x_{t|t-1} and P_{t|t} = P_{t|t-1}.
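A sketch of the modified recursion for the local level model (pure Python; names are illustrative): when y_t is missing, the update is skipped and only the prediction step runs, so the state estimate is carried forward and its variance grows by q².

```python
def local_level_filter_na(ys, sigma2, q2, l0=0.0, p0=1e8):
    """Local level Kalman filter; a missing y_t is passed as None."""
    l, p = l0, p0
    for y in ys:
        if y is not None:        # updating step only when y_t is observed
            v = p + sigma2
            k = p / v
            l = l + k * (y - l)
            p = p * (1 - k)
        p = p + q2               # state prediction always runs
    return l, p

# With q^2 = 0, skipping a missing value changes nothing, so filtering
# [5, NA, 6] matches filtering [5, 6].
la, pa = local_level_filter_na([5.0, None, 6.0], sigma2=1.0, q2=0.0)
lb, pb = local_level_filter_na([5.0, 6.0], sigma2=1.0, q2=0.0)
```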
Multi-step forecasting

Forecasting:
y_{t|t-1} = f′ x_{t|t-1}
v_{t|t-1} = f′ P_{t|t-1} f + σ²

State prediction:
x_{t|t-1} = G x_{t-1|t-1}
P_{t|t-1} = G P_{t-1|t-1} G′ + W

Iterate for t = T+1, ..., T+h, starting with x_{T|T} and P_{T|T}.
Treat the future values as missing, so the updating step is skipped and x_{t|t} = x_{t|t-1}, P_{t|t} = P_{t|t-1}.
Kalman filter

What's so special about the Kalman filter?

- Very general equations for any model in state space form.
- Any model in state space form can easily be generalized.
- Optimal MSE forecasts.
- Easy to handle missing values.
- Easy to compute the likelihood.
Likelihood calculation

θ = all unknown parameters
f_θ(y_t | y_1, ..., y_{t-1}) = one-step forecast density

Likelihood:

L(y_1, ..., y_T; θ) = ∏_{t=1}^{T} f_θ(y_t | y_1, ..., y_{t-1})

Gaussian log-likelihood:

log L = -(T/2) log(2π) - (1/2) ∑_{t=1}^{T} log v_{t|t-1} - (1/2) ∑_{t=1}^{T} e_t² / v_{t|t-1}

where e_t = y_t - y_{t|t-1}.
All terms are obtained from the Kalman filter equations.
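The prediction error decomposition is easy to compute alongside the filter. A sketch for the local level model (pure Python; names are illustrative). As a check: with q² = 0 and p_{1|0} = 0 the model reduces to y_t ~ NID(ℓ_0, σ²), so the decomposition must equal the direct iid Gaussian log-likelihood.

```python
import math

def local_level_loglik(ys, sigma2, q2, l0, p0):
    """Gaussian log-likelihood via the prediction error decomposition,
    accumulating e_t and v_{t|t-1} from the local level Kalman recursions."""
    l, p = l0, p0
    loglik = 0.0
    for y in ys:
        v = p + sigma2                  # v_{t|t-1}
        e = y - l                       # e_t = y_t - y_{t|t-1}
        loglik += -0.5 * (math.log(2 * math.pi) + math.log(v) + e * e / v)
        k = p / v
        l = l + k * e
        p = p * (1 - k) + q2
    return loglik

# Sanity check: q^2 = 0 and p_{1|0} = 0 gives y_t ~ NID(l0, sigma2).
ys = [0.3, -1.2, 0.7, 2.0]
ll = local_level_loglik(ys, sigma2=2.0, q2=0.0, l0=0.5, p0=0.0)
direct = sum(-0.5 * (math.log(2 * math.pi * 2.0) + (y - 0.5) ** 2 / 2.0)
             for y in ys)
```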
Structural models in R

fit <- StructTS(austourists, type = "BSM")
fc <- forecast(fit)
plot(fc)

[Figure: Forecasts from Basic structural model, with prediction intervals, 2000 to 2012.]
Kalman smoothing

We want an estimate of x_t | y_1, ..., y_T where t < T; that is, x_{t|T}.

x_{t|T} = x_{t|t} + A_t (x_{t+1|T} - x_{t+1|t})
P_{t|T} = P_{t|t} + A_t (P_{t+1|T} - P_{t+1|t}) A′_t

where A_t = P_{t|t} G′ (P_{t+1|t})^{-1}.

- Uses all the data, not just previous data.
- Useful for estimating missing values: y_{t|T} = f′ x_{t|T}.
- Useful for seasonal adjustment when one of the states is a seasonal component.
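A scalar sketch of the smoother for the local level model (pure Python; names are illustrative): filter forwards, then apply the backward recursion with A_t = p_{t|t} / p_{t+1|t} (since G = 1). With q² = 0 every smoothed value equals the full-sample estimate, unlike the filtered values, which only use past data.

```python
def local_level_smooth(ys, sigma2, q2, l0=0.0, p0=1e8):
    """Forward Kalman filter, then the backward fixed-interval smoother."""
    n = len(ys)
    l_pred, p_pred = [l0], [p0]        # l_{t|t-1}, p_{t|t-1}
    l_filt, p_filt = [], []            # l_{t|t}, p_{t|t}
    for t, y in enumerate(ys):
        v = p_pred[t] + sigma2
        k = p_pred[t] / v
        l_filt.append(l_pred[t] + k * (y - l_pred[t]))
        p_filt.append(p_pred[t] * (1 - k))
        l_pred.append(l_filt[t])       # G = 1: l_{t+1|t} = l_{t|t}
        p_pred.append(p_filt[t] + q2)
    # Backward pass: x_{t|T} = x_{t|t} + A_t (x_{t+1|T} - x_{t+1|t})
    l_sm, p_sm = l_filt[:], p_filt[:]
    for t in range(n - 2, -1, -1):
        A = p_filt[t] / p_pred[t + 1]
        l_sm[t] = l_filt[t] + A * (l_sm[t + 1] - l_pred[t + 1])
        p_sm[t] = p_filt[t] + A * A * (p_sm[t + 1] - p_pred[t + 1])
    return l_sm, p_sm

# With q^2 = 0 the level is constant, so every smoothed value is the
# full-sample estimate (the mean, under a diffuse prior).
l_sm, p_sm = local_level_smooth([1.0, 2.0, 3.0, 4.0], sigma2=1.0, q2=0.0)
```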
Kalman smoothing in R

fit <- StructTS(austourists, type = "BSM")
sm <- tsSmooth(fit)

plot(austourists)
lines(sm[,1], col = 'blue')
lines(fitted(fit)[,1], col = 'red')
legend("topleft", col = c('blue', 'red'), lty = 1,
       legend = c("Smoothed level", "Filtered level"))
[Figure: austourists with the filtered level (red) and smoothed level (blue) overlaid, 2000 to 2010.]
Kalman smoothing in R

fit <- StructTS(austourists, type = "BSM")
sm <- tsSmooth(fit)

plot(austourists)

# Seasonally adjusted data
aus.sa <- austourists - sm[, 3]
lines(aus.sa, col = 'blue')
[Figure: austourists with the seasonally adjusted series overlaid in blue, 2000 to 2010.]
Kalman smoothing in R

x <- austourists
miss <- sample(1:length(x), 5)
x[miss] <- NA
fit <- StructTS(x, type = "BSM")
sm <- tsSmooth(fit)
estim <- sm[, 1] + sm[, 3]

plot(x, ylim = range(austourists))
points(time(x)[miss], estim[miss], col = 'red', pch = 1)
points(time(x)[miss], austourists[miss], col = 'black', pch = 1)
legend("topleft", pch = 1, col = c(2, 1), legend = c("Estimate", "Actual"))
[Figure: x with the estimated values (red) and actual values (black) marked at the five missing times, 2000 to 2010.]
Time varying parameter models

Linear Gaussian state space model:

y_t = f′_t x_t + ε_t,      ε_t ~ N(0, σ²_t)
x_t = G_t x_{t-1} + w_t,   w_t ~ N(0, W_t)

Kalman recursions:

y_{t|t-1} = f′_t x_{t|t-1}
v_{t|t-1} = f′_t P_{t|t-1} f_t + σ²_t
x_{t|t} = x_{t|t-1} + P_{t|t-1} f_t v_{t|t-1}^{-1} (y_t - y_{t|t-1})
P_{t|t} = P_{t|t-1} - P_{t|t-1} f_t v_{t|t-1}^{-1} f′_t P_{t|t-1}
x_{t|t-1} = G_t x_{t-1|t-1}
P_{t|t-1} = G_t P_{t-1|t-1} G′_t + W_t
Structural models with covariates

Local level with covariate:

y_t = ℓ_t + β z_t + ε_t
ℓ_t = ℓ_{t-1} + ξ_t

f′_t = [1 z_t],  x_t = (ℓ_t, β)′,  G = [1 0; 0 1],  W_t = diag(σ²_ξ, 0)

- Assumes z_t is fixed and known (as in regression).
- The estimate of β is given by x_{T|T}.
- Equivalent to simple linear regression with a time-varying intercept.
- Easy to extend to multiple regression with additional terms.
Time varying regression
Simple linear regression with time varying parameters
yt = `t + βtzt + εt
`t = `t−1 + ξt
βt = βt−1 + ζt
f′t = [1  zt],   xt = (`t, βt)′,   G = [1 0; 0 1],   Wt = [σ2ξ 0; 0 σ2ζ]
Allows for a linear regression with parameters that change slowly over time.
Parameters follow independent random walks.
Estimates of parameters given by xt|t or xt|T.
State space models 2: Structural models 38
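To illustrate this model, the NumPy sketch below lets βt drift as a random walk and shows the filtered estimates xt|t tracking it. The simulated data and variance values are assumptions of this sketch.

```python
import numpy as np

# Time varying regression: beta_t = beta_{t-1} + zeta_t, so the second
# diagonal element of W is sigma_zeta^2 rather than zero.
rng = np.random.default_rng(2)
n, sig_eps, sig_xi, sig_zeta = 300, 0.3, 0.05, 0.05
z = rng.normal(size=n)
level = 2 + np.cumsum(rng.normal(0, sig_xi, n))
beta = 1 + np.cumsum(rng.normal(0, sig_zeta, n))   # slowly drifting slope
y = level + beta * z + rng.normal(0, sig_eps, n)

G, W = np.eye(2), np.diag([sig_xi**2, sig_zeta**2])
x, P = np.zeros(2), np.eye(2) * 1e6                # diffuse initialisation
beta_filtered = np.empty(n)
for t in range(n):
    f = np.array([1.0, z[t]])                      # f_t' = [1  z_t]
    x, P = G @ x, G @ P @ G.T + W                  # predict
    v = f @ P @ f + sig_eps**2                     # forecast variance
    k = P @ f / v                                  # Kalman gain
    x, P = x + k * (y[t] - f @ x), P - np.outer(k, f @ P)
    beta_filtered[t] = x[1]                        # x_{t|t} estimate of beta_t
```

Running the smoother over the same output would give the xt|T estimates instead.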
Updating (“online”) regression
The same idea can be used to estimate a regression iteratively as new data arrive.
Simple linear regression with updating parameters
yt = `t + βtzt + εt
`t = `t−1 + ξt
βt = βt−1 + ζt
f′t = [1  zt],   xt = (`t, βt)′,   G = [1 0; 0 1],   Wt = [0 0; 0 0]
Updated parameter estimates given by xt|t.
Recursive residuals given by yt − yt|t−1.
State space models 2: Structural models 39
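Setting Wt = 0 makes the state constant, so each new observation simply refines the estimates: with a diffuse prior the recursions reproduce recursive least squares, and the final xt|t is numerically the ordinary least squares fit. A NumPy sketch (the simulated data are assumptions):

```python
import numpy as np

# Updating ("online") regression: Kalman recursions with W = 0 and G = I,
# so the predict step is trivial and each observation refines (l, beta).
rng = np.random.default_rng(3)
n = 100
z = rng.normal(size=n)
y = 1.5 + 0.8 * z + rng.normal(0, 0.2, n)

x, P = np.zeros(2), np.eye(2) * 1e8     # diffuse initialisation
recursive_resid = np.empty(n)
for t in range(n):
    f = np.array([1.0, z[t]])           # f_t' = [1  z_t]
    v = f @ P @ f + 0.2**2              # no predict step: G = I, W = 0
    k = P @ f / v                       # Kalman gain
    recursive_resid[t] = y[t] - f @ x   # recursive residual y_t - y_{t|t-1}
    x, P = x + k * recursive_resid[t], P - np.outer(k, f @ P)

# Final x agrees (numerically) with the least squares fit of y on [1, z]
ols = np.linalg.lstsq(np.column_stack([np.ones(n), z]), y, rcond=None)[0]
print(x, ols)
```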