Page 1

Application of the Innovations Algorithm to Nonlinear State-Space Models

Richard A. Davis, Colorado State University
(http://www.stat.colostate.edu/~rdavis/lectures)

Joint work with:
William Dunsmuir, University of New South Wales
Gabriel Rodriguez-Yam, Colorado State University
Ying Wang, Dept of Public Health, W. Virginia

Page 2

• Generalized state-space models
  - Observation driven
  - Parameter driven
• Innovations algorithm (recursive one-step-ahead prediction algorithm)
• Applications
  - Gaussian likelihood calculations
  - Simulation
  - Generalized least squares estimation
• Time series of counts
  - Examples (asthma data, polio data)
  - Generalized linear models (GLM)
  - Estimating equations (Zeger)
  - MCEM (Chan and Ledolter)
  - Importance sampling (Durbin and Koopman)
  - Approximation to the likelihood (Davis, Dunsmuir, and Wang)
  - Simulation results
  - Examples

Page 3

Generalized State-Space Models (parameter driven)

Observations: y^(t) = (y_1, ..., y_t)

States: α^(t) = (α_1, ..., α_t)

Observation equation:

p(y_t | α_t) := p(y_t | α_t, α^(t−1), y^(t−1))

State equation:

p(α_{t+1} | α_t) := p(α_{t+1} | α_t, α^(t−1), y^(t))

Joint density:

p(y_1, ..., y_n, α_1, ..., α_n)
  = p(y_n | α_n, α^(n−1), y^(n−1)) p(α_n, α^(n−1), y^(n−1))
  = p(y_n | α_n) p(α_n | α^(n−1), y^(n−1)) p(α^(n−1), y^(n−1))
  = ⋯
  = ∏_{j=1}^n p(y_j | α_j) ∏_{j=2}^n p(α_j | α_{j−1}) p(α_1)

Page 4

Parameter driven (cont)

Conditional independence:

p(y_1, ..., y_n | α_1, ..., α_n) = ∏_{j=1}^n p(y_j | α_j)

Filtering or posterior density:

p(α_t | y^(t)) = p(y_t | α_t) p(α_t | y^(t−1)) / p(y_t | y^(t−1))

Predictive densities:

p(α_{t+1} | y^(t)) = ∫ p(α_t | y^(t)) p(α_{t+1} | α_t) dμ(α_t)

and

p(y_{t+1} | y^(t)) = ∫ p(y_{t+1} | α_{t+1}) p(α_{t+1} | y^(t)) dμ(α_{t+1}).
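To make the recursions concrete: a grid-based numerical sketch of one filter/predict step (illustrative only; `loglik` and `trans` are hypothetical user-supplied ingredients, not from the slides):

```python
import numpy as np

def filter_step(prior, y, grid, loglik, trans):
    """One step of the filtering/prediction recursions above, on a grid.

    prior      : p(a_t | y^(t-1)) evaluated on `grid` (times the grid spacing).
    loglik(y,a): log p(y_t | a_t), vectorized over the grid values a.
    trans[i,j] : p(grid[j] | grid[i]) * grid spacing (transition kernel).
    Returns the filtering density p(a_t | y^(t)) and the one-step
    predictive density p(a_{t+1} | y^(t)) on the same grid.
    """
    post = np.exp(loglik(y, grid)) * prior
    post /= post.sum()          # normalizer approximates p(y_t | y^(t-1))
    pred = post @ trans         # integrate against the transition kernel
    return post, pred
```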

Page 5

Examples of parameter driven models

Poisson model for time series of counts

Observation equation:

p(y_t | α_t) = e^{−e^{α_t}} e^{α_t y_t} / y_t!,   y_t = 0, 1, ...

State equation: the state variables follow a regression model with Gaussian AR(1) noise:

α_t = β^T x_t + W_t,   W_t = φ W_{t−1} + Z_t,   {Z_t} ~ WN(0, σ²)

The resulting transition density of the state variables is

p(α_{t+1} | α_t) = n(α_{t+1}; β^T x_{t+1} + φ(α_t − β^T x_t), σ²)

Remark: The case σ² = 0 corresponds to a log-linear model with Poisson noise.
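For concreteness, a minimal simulation sketch of this model (illustrative; the intercept-only design and the values β = .7, φ = .5, σ² = .3 below mirror the simulation examples later in the deck):

```python
import numpy as np

def simulate_poisson_pd(X, beta, phi, sigma2, rng=None):
    """Simulate the parameter-driven Poisson model:
    alpha_t = beta'x_t + W_t,  W_t = phi*W_{t-1} + Z_t,  {Z_t} ~ N(0, sigma2),
    and Y_t | alpha_t ~ Poisson(exp(alpha_t))."""
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    w = np.empty(n)
    w[0] = rng.normal(0.0, np.sqrt(sigma2 / (1.0 - phi**2)))  # stationary start
    for t in range(1, n):
        w[t] = phi * w[t - 1] + rng.normal(0.0, np.sqrt(sigma2))
    alpha = X @ beta + w
    return rng.poisson(np.exp(alpha))

# e.g. an intercept-only design:
y = simulate_poisson_pd(np.ones((200, 1)), np.array([0.7]), 0.5, 0.3)
```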

Page 6

Examples of parameter driven models

A stochastic volatility model for financial data (Taylor '86):

Model:

Y_t = σ_t Z_t,   {Z_t} ~ IID N(0, 1)

α_t = φ α_{t−1} + W_t,   {W_t} ~ IID N(0, σ²),

where α_t = log σ_t.

The resulting observation and state transition densities are

p(y_t | α_t) = n(y_t; 0, exp(2α_t))

p(α_{t+1} | α_t) = n(α_{t+1}; φ α_t, σ²)

Properties:

• Martingale difference sequence.

• Stationary.

• Strongly mixing at a geometric rate.
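A matching simulation sketch for this stochastic volatility model (illustrative only; the seed and parameter choices are placeholders):

```python
import numpy as np

def simulate_sv(n, phi, sigma2, rng=None):
    """Simulate Taylor's SV model: alpha_t = phi*alpha_{t-1} + W_t with
    {W_t} ~ IID N(0, sigma2), and Y_t = sigma_t Z_t = exp(alpha_t) Z_t."""
    rng = rng or np.random.default_rng(1)
    alpha = np.empty(n)
    alpha[0] = rng.normal(0.0, np.sqrt(sigma2 / (1.0 - phi**2)))  # stationary start
    for t in range(1, n):
        alpha[t] = phi * alpha[t - 1] + rng.normal(0.0, np.sqrt(sigma2))
    return np.exp(alpha) * rng.standard_normal(n)
```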

Page 7

The Innovations Algorithm

Innovations Algorithm (Brockwell and Davis '87): If {X_t} is a zero-mean time series with ACVF κ(i, j), then

X̂_{t+1} = P_{sp{X_1, ..., X_t}} X_{t+1} = θ_{t,1}(X_t − X̂_t) + ⋯ + θ_{t,t}(X_1 − X̂_1).

The coefficients θ_{t,1}, ..., θ_{t,t} and the prediction error variances v_0, v_1, ... can be computed recursively from the equations

v_0 = κ(1, 1),

θ_{t,t−k} = v_k^{−1} ( κ(t+1, k+1) − Σ_{j=0}^{k−1} θ_{k,k−j} θ_{t,t−j} v_j ),   0 ≤ k < t,

and

v_t = κ(t+1, t+1) − Σ_{j=0}^{t−1} θ_{t,t−j}² v_j.
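These recursions translate essentially line for line into code. A minimal sketch (not the authors' implementation), using the slide's 1-indexed ACVF and followed by an MA(1) check in which θ_{t,k} should vanish for k > 1:

```python
import numpy as np

def innovations(kappa, n):
    """Innovations algorithm: given the ACVF kappa(i, j) (1-indexed) of a
    zero-mean series, return theta[t][j] = theta_{t,j} (j = 1..t) and the
    one-step prediction error variances v[0..n-1]."""
    v = np.empty(n)
    theta = [None] + [np.zeros(t + 1) for t in range(1, n)]
    v[0] = kappa(1, 1)
    for t in range(1, n):
        for k in range(t):  # compute theta_{t,t-k} for k = 0, ..., t-1
            s = sum(theta[k][k - j] * theta[t][t - j] * v[j] for j in range(k))
            theta[t][t - k] = (kappa(t + 1, k + 1) - s) / v[k]
        v[t] = kappa(t + 1, t + 1) - sum(theta[t][t - j] ** 2 * v[j] for j in range(t))
    return theta, v

# MA(1) check: kappa(i,i) = 1 + th^2, kappa(i,j) = th for |i-j| = 1.
th = 0.6
ma1 = lambda i, j: (1 + th**2) if i == j else (th if abs(i - j) == 1 else 0.0)
theta, v = innovations(ma1, 5)   # theta[t][k] is ~0 for every k > 1
```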

Page 8

The Innovations Algorithm (cont)

Remarks:

• The innovations algorithm expresses the one-step predictor in terms of the previous innovations X_1 − X̂_1, ..., X_t − X̂_t, which are uncorrelated.

• If {X_t} is an MA(q) process,

  X_t = Z_t + θ_1 Z_{t−1} + ⋯ + θ_q Z_{t−q},   {Z_t} ~ WN(0, σ²),

  then (θ_{t,1}, ..., θ_{t,t}) = (θ_{t,1}, ..., θ_{t,q}, 0, ..., 0) for all t.

• The innovations algorithm is well adapted to ARMA(p, q) models: one only needs to apply it to the MA(q) piece (see B&D '96).

Page 9

The Innovations Algorithm—Applications

Likelihood calculation: Using the IA representation

X̂_t = θ_{t−1,1}(X_{t−1} − X̂_{t−1}) + ⋯ + θ_{t−1,t−1}(X_1 − X̂_1),

we have X_n = C_n (X_n − X̂_n), i.e.,

[ X_1 ]   [ 1                                  ] [ X_1 − X̂_1 ]
[ X_2 ]   [ θ_{1,1}      1                     ] [ X_2 − X̂_2 ]
[ X_3 ] = [ θ_{2,2}      θ_{2,1}  1            ] [ X_3 − X̂_3 ]
[  ⋮  ]   [    ⋮                     ⋱         ] [     ⋮      ]
[ X_n ]   [ θ_{n−1,n−1}  ⋯        θ_{n−1,1}  1 ] [ X_n − X̂_n ]

By taking covariances of both sides it follows that

Γ_n = E(X_n X_n') = C_n D_n C_n',   D_n = diag(v_0, ..., v_{n−1}).

Page 10

The Innovations Algorithm—Applications

Quadratic form:

X_n' Γ_n^{−1} X_n = (X_n − X̂_n)' C_n' (C_n D_n C_n')^{−1} C_n (X_n − X̂_n)
                  = (X_n − X̂_n)' D_n^{−1} (X_n − X̂_n)
                  = Σ_{t=1}^n (X_t − X̂_t)² / v_{t−1}

Determinant:

det(Γ_n) = det(C_n D_n C_n') = v_0 ⋯ v_{n−1}

Gaussian likelihood:

L(Γ_n) = (2π)^{−n/2} (v_0 ⋯ v_{n−1})^{−1/2} exp{ −(1/2) Σ_{t=1}^n (X_t − X̂_t)² / v_{t−1} }

Simulation: If {Z_t} ~ iid N(0, 1), put

X_t = v_{t−1}^{1/2} Z_t + θ_{t−1,1} v_{t−2}^{1/2} Z_{t−1} + ⋯ + θ_{t−1,t−1} v_0^{1/2} Z_1.

Then X_n = (X_1, ..., X_n)' = C_n D_n^{1/2} Z_n has covariance matrix Γ_n.

Page 11

Time Series of Counts—Notation and Setup

Count data: Y_1, ..., Y_n

Regression (explanatory) variable: x_t

Model: Given x_t and a stochastic process α_t, the Y_t are independent and Poisson distributed with mean

μ_t = exp(x_t^T β + α_t).

The distribution of the stochastic process α_t may depend on a vector of parameters γ.

Note: α_t ≡ 0 corresponds to the standard Poisson regression model.

Primary objective: Inference about β.

Page 12

Example: Daily Asthma Presentations (1990:1993)

[Figure: four panels of daily asthma presentation counts (roughly 0–14 per day) for the years 1990, 1991, 1992, and 1993, plotted January through December.]

Page 13

Polio Data With Estimated Regression Function

[Figure: monthly polio counts (0–14), 1970–1984, with the estimated regression function overlaid.]

Page 15

Parameter-Driven Model for the Mean Function μt

Parameter-driven specification (assume Y_t | μ_t is Poisson(μ_t)):

log μ_t = x_t^T β + α_t,

where {α_t} is a stationary Gaussian process, e.g. the AR(1) process

(α_t + σ²/2) = φ(α_{t−1} + σ²/2) + ε_t,   {ε_t} ~ IID N(0, σ²(1−φ²)).

Advantages of this model specification:
• Properties of the model (ergodicity and mixing) are easy to derive.
• The regression parameters are interpretable:
  E(Y_t) = exp(x_t^T β) E exp(α_t) = exp(x_t^T β), if E exp(α_t) = 1.

Disadvantages:
• Estimation is difficult: the likelihood function is not easily calculated (MCEM, importance sampling, estimating equations).
• Model building can be laborious.

Remark: See Davis, Dunsmuir, and Wang (1999) for testing for the existence of a latent process and estimating its ACF.

Page 19

Estimation Methods — Importance Sampling (Durbin and Koopman)

Model:

Y_t | α_t, x_t ∼ Pois(exp(x_t^T β + α_t)),

α_t = φ α_{t−1} + ε_t,   {ε_t} ~ IID N(0, σ²).

Relative likelihood: Let ψ = (β, φ, σ²) and suppose g(y_n, α_n; ψ0) is an approximating joint density for Y_n = (Y_1, ..., Y_n)' and α_n = (α_1, ..., α_n)'. Then

L(ψ) = ∫ p(y_n | α_n) p(α_n; ψ) dα_n
     = ∫ [ p(y_n | α_n) p(α_n; ψ) / g(y_n, α_n; ψ0) ] g(y_n, α_n; ψ0) dα_n
     = L_g(ψ0) ∫ [ p(y_n | α_n) p(α_n; ψ) / g(y_n, α_n; ψ0) ] g(α_n | y_n; ψ0) dα_n,

so that

L(ψ) / L_g(ψ0) = ∫ [ p(y_n | α_n) p(α_n; ψ) / g(y_n, α_n; ψ0) ] g(α_n | y_n; ψ0) dα_n.

Page 20

Importance Sampling (cont)

Continuing,

L(ψ) / L_g(ψ0) = E_g [ p(y_n | α_n) p(α_n; ψ) / g(y_n, α_n; ψ0) ]
              ≈ (1/N) Σ_{j=1}^N p(y_n | α_n^{(j)}) p(α_n^{(j)}; ψ) / g(y_n, α_n^{(j)}; ψ0),

where

α_n^{(j)}, j = 1, ..., N, are iid ~ g(· | y_n; ψ0).

Notes:

• This is a "one-sample" approximation to the relative likelihood. That is, for one realization of the α's, we have, in principle, an approximation to the whole likelihood function.

• The approximation is only good in a neighborhood of ψ0. Geyer suggests maximizing the ratio with respect to ψ and iterating, replacing ψ0 with ψ̂.
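As a sketch, the one-sample estimate can be coded as follows; `sample_g`, `logp_joint`, and `logg_joint` are hypothetical callables standing in for the densities above, and the log-weights are combined stably:

```python
import numpy as np

def log_relative_likelihood(psi, psi0, y, sample_g, logp_joint, logg_joint, N=1000):
    """Monte Carlo estimate of log( L(psi) / L_g(psi0) ):
    (1/N) sum_j p(y | a_j) p(a_j; psi) / g(y, a_j; psi0),  a_j ~ g(. | y; psi0).

    sample_g(y, psi0)      -> one draw a_j from g(. | y; psi0)
    logp_joint(y, a, psi)  -> log[ p(y | a) p(a; psi) ]
    logg_joint(y, a, psi0) -> log g(y, a; psi0)
    For the 'one-sample' approximation, reuse the same draws a_j for every psi.
    """
    logw = np.empty(N)
    for j in range(N):
        a = sample_g(y, psi0)
        logw[j] = logp_joint(y, a, psi) - logg_joint(y, a, psi0)
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))  # stabilized log of the mean weight
```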

Page 21

Importance Sampling — example

Simulation example: Y_t | α_t ∼ Pois(exp(.7 + α_t)), α_t = .5 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .3), n = 200, N = 1000.

[Figure: four panels of the estimated likelihood as a function of φ (on −1 to 1), for importance densities centered at phi_0 = −0.9, −0.367, 0.029, and 0.321.]

Page 22

Simulation example: Y_t | α_t ∼ Pois(exp(.7 + α_t)), φ = .5, σ² = .3, n = 200, N = 1000.

[Figure: estimated likelihoods as a function of φ for the successive Geyer iterates phi_0 = −0.9, −0.367, 0.029, 0.321, 0.523, 0.522, 0.514, 0.552, 0.503; the iterates settle near the true φ = .5.]

Page 23

Importance Sampling (cont)

Choice of importance density g:

Durbin and Koopman suggest a linear state-space approximating model

Y_t = μ_t + x_t^T β + α_t + Z_t,   Z_t ~ N(0, H_t),

with

μ_t = y_t + 1 − α̂_t − x_t^T β̂ − y_t e^{−(α̂_t + x_t^T β̂)},   H_t = e^{−(α̂_t + x_t^T β̂)},

where the α̂_t = E_g(α_t | y_n) are calculated recursively under the approximating model until convergence.

With this choice of approximating model, it turns out that

g(α_n | y_n; ψ0) = N(Γ_n^{−1} ỹ_n, Γ_n^{−1}),

where

ỹ_n = y_n − e^{X_nβ̂ + α̂_n} ∗ (1_n − α̂_n),   Γ_n = diag(e^{X_nβ̂ + α̂_n}) + (E(α_n α_n'))^{−1}.

Page 24

Importance Sampling (cont)

Components required in the calculation:

• g(y_n, α_n): compute ỹ_n' Γ_n^{−1} ỹ_n and det(Γ_n).

• Simulating from N(Γ_n^{−1} ỹ_n, Γ_n^{−1}):
  - compute Γ_n^{−1} ỹ_n;
  - simulate from N(0, Γ_n^{−1}).

Page 25

Importance Sampling (cont)

Details.

For the stationary Gaussian AR(1) state process,

(E(α_n α_n'))^{−1} = σ^{−2} T_n,

where T_n is the tridiagonal matrix with diagonal (1, 1+φ², ..., 1+φ², 1) and off-diagonal entries −φ, so that

Γ_n = diag(e^{X_nβ̂ + α̂_n}) + σ^{−2} T_n.

This is the covariance matrix of a 1-dependent sequence, so that Γ_n = C_n D_n C_n', where C_n is the lower bidiagonal matrix with 1's on the diagonal and θ_{1,1}, θ_{2,1}, ..., θ_{n−1,1} on the subdiagonal.
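For the AR(1) state process, this precision matrix is easy to construct directly; a minimal sketch (illustrative):

```python
import numpy as np

def ar1_precision(n, phi, sigma2):
    """Tridiagonal precision matrix (E alpha alpha')^{-1} of a stationary
    Gaussian AR(1), as above: corners 1, interior diagonal 1 + phi^2,
    off-diagonal entries -phi, all scaled by 1/sigma2."""
    G = np.diag(np.full(n, 1.0 + phi**2))
    G[0, 0] = G[-1, -1] = 1.0
    i = np.arange(n - 1)
    G[i, i + 1] = G[i + 1, i] = -phi
    return G / sigma2
```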

Page 26

Importance Sampling (cont)

It follows that

ỹ_n' Γ_n^{−1} ỹ_n = Σ_{t=1}^n (ỹ_t − ỹ̂_t)² / v_{t−1}

and

Γ_n^{−1} ỹ_n = C_n'^{−1} D_n^{−1} C_n^{−1} ỹ_n = C_n'^{−1} D_n^{−1} (ỹ_n − ỹ̂_n),

which can be solved for the vector Γ_n^{−1} ỹ_n via the recursion

C_n' (Γ_n^{−1} ỹ_n) = D_n^{−1} (ỹ_n − ỹ̂_n).

To simulate from N(0, Γ_n^{−1}), note that U_n = C_n'^{−1} D_n^{−1/2} Z_n, where Z_n ~ N(0, I_n), has covariance matrix Γ_n^{−1}.

All of these calculations can be carried out quickly using the innovations algorithm.

Page 27

Importance Sampling — example

Simulation example: β = .7, φ = .5, σ² = .3, n = 200, N = 1000, 50 realizations plotted.

[Figure: 50 realizations of the estimated likelihood as a function of φ (on −1 to 1), for phi_0 = −0.5, −0.25, 0, 0.25, 0.5, and 0.75.]

Page 28

Estimation Methods — Approximation to the likelihood

Joint density function:

p(y_n, α_n) ∝ det(G_n)^{1/2} (∏_t y_t!)^{−1} exp{ y_n'(X_nβ + α_n) − 1_n' e^{X_nβ + α_n} − α_n' G_n α_n / 2 },

where G_n^{−1} = E(α_n α_n').

Conditional density function:

p(α_n | y_n) ∝ exp{ y_n'(X_nβ + α_n) − 1_n' e^{X_nβ + α_n} − α_n' G_n α_n / 2 },

which, by expanding the term e^{X_nβ + α_n} in a neighborhood of α_n* and ignoring third-order and higher terms, yields the approximation

p_a(α_n | y_n) ∝ exp{ y_n'(X_nβ + α_n) − 1_n' e^{X_nβ + α_n*} − (e^{X_nβ + α_n*})'(α_n − α_n*)
                      − (α_n − α_n*)' diag(e^{X_nβ + α_n*}) (α_n − α_n*)/2 − α_n' G_n α_n / 2 }.

Page 29

Estimation Methods — Approximation to the likelihood

After simplification, we find

p_a(α_n | y_n) = N(Γ_n^{−1} ỹ_n, Γ_n^{−1}),

where

ỹ_n = y_n − exp{X_nβ} ∗ exp{α*} + exp{α*} ∗ exp{X_nβ} ∗ α*

(∗ denotes component-wise multiplication for vectors) and Γ_n = diag(e^{X_nβ + α*}) + G_n.

Approximate likelihood:

p_a(y_n; ψ) := p(y_n, α_n) / p_a(α_n | y_n) ∝ det(G_n)^{1/2} det(Γ_n)^{−1/2} exp{ y_n' X_nβ + .5 ỹ_n' Γ_n^{−1} ỹ_n }.

Note: We actually expand the joint density for Y_n and α_n in a neighborhood of α*.

Page 30

Estimation Methods — Approximation to the likelihood

Implementation:

1. Let α* = α*(ψ) be the converged value of α^{(j)}(ψ), where

   α^{(j+1)}(ψ) = Γ_n^{−1}(ψ) ỹ_n(ψ).

2. Maximize p_a(y_n; ψ) with respect to ψ.
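Step 1 is a Newton-type fixed-point iteration for the posterior mode of α. A dense linear-algebra sketch under the Poisson log-linear model of the previous slides (illustrative; the slides solve these systems with the innovations algorithm instead, and `ar1_precision` is the earlier sketch):

```python
import numpy as np

def alpha_star(y, X, beta, G, tol=1e-8, max_iter=100):
    """Iterate alpha^{(j+1)} = Gamma_n^{-1} ytilde_n until convergence, where
    Gamma_n = diag(exp(X beta + alpha)) + G,  G = (E alpha alpha')^{-1}, and
    ytilde_n = y - exp(X beta + alpha) + exp(X beta + alpha) * alpha
    (componentwise); this is Newton's method for the mode of p(alpha | y)."""
    alpha = np.zeros(len(y))
    for _ in range(max_iter):
        lam = np.exp(X @ beta + alpha)
        ytilde = y - lam + lam * alpha
        new = np.linalg.solve(np.diag(lam) + G, ytilde)
        if np.max(np.abs(new - alpha)) < tol:
            return new
        alpha = new
    return alpha

# e.g. with the AR(1) state process: G = ar1_precision(len(y), phi, sigma2)
```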

Page 31

Simulation Results

Model: Y_t | α_t ∼ Pois(exp(.7 + α_t)), α_t = .5 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .3), n = 200.

Estimation methods:

• Importance sampling (N = 1000, ψ0 updated a maximum of 10 times):

          beta     phi      sigma2
  mean    0.6982   0.4718   0.3008
  std     0.1059   0.1476   0.0899

• Approximation to the likelihood:

          beta     phi      sigma2
  mean    0.7036   0.4579   0.2962
  std     0.0951   0.1365   0.0784

Page 32

Model: Y_t | α_t ∼ Pois(exp(.7 + α_t)), α_t = .5 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .3), n = 200.

[Figure: estimated sampling densities of beta, phi, and sigma^2 for the approximate-likelihood estimator (top row) and the importance-sampling estimator (bottom row).]

Page 33

Application to Model Fitting for the Polio Data

Model for {α_t}: α_t = φ α_{t−1} + ε_t, {ε_t} ~ IID N(0, σ²).

• Importance sampling (ψ0 updated 5 times for each of N = 100, 500, 1000).
• Simulation based on 1000 replications and the fitted AL model.

                  Approx. likelihood        Importance sampling       GLM
Term              β̂_AL    Mean     SD       β̂_IS    Mean     SD      β̂       SD
Intercept          0.202   0.210   0.343     0.203   0.223   0.381    0.207   0.078
Trend (×10⁻³)     −2.690  −2.720   3.415    −2.675  −2.778   3.979   −4.18    1.400
cos(2πt/12)        0.113   0.111   0.123     0.110   0.103   0.124   −0.152   0.097
sin(2πt/12)       −0.454  −0.454   0.143    −0.456  −0.456   0.151   −0.532   0.109
cos(2πt/6)         0.396   0.400   0.114     0.399   0.401   0.123    0.169   0.098
sin(2πt/6)         0.016   0.012   0.110     0.015   0.024   0.118   −0.432   0.101
φ                  0.845   0.764   0.165     0.865   0.777   0.198
σ²                 0.104   0.114   0.075     0.088   0.100   0.068

(The Mean and SD columns are the simulation means and standard deviations.)

Page 34

Application to Model Fitting for the Polio Data (cont)

[Figure: estimated sampling densities of beta1, phi, and sigma^2 under the approximate likelihood (top row) and importance sampling (bottom row).]

Page 35

Polio Data: observed and conditional mean (approx like)

[Figure: monthly polio counts, 1970–1984; the observed series with the conditional mean from the approximate-likelihood fit overlaid.]

Page 36

Application to Sydney Asthma Count Data

Data: Y1, . . . , Y1461 daily asthma presentations in a Campbelltown hospital.

Preliminary analysis identified:

• no upward or downward trend;

• an annual cycle, modeled by cos(2πt/365) and sin(2πt/365);

• a seasonal (school-term) effect, modeled by

  P_ij(t) = (1/B(2.5, 5)) ((t − T_ij)/100)^{1.5} (1 − (t − T_ij)/100)^{4},

  where B(2.5, 5) is the beta function and T_ij is the start of the jth school term in year i;

• a day-of-the-week effect, modeled by separate indicator variables for Sunday and Monday (an increase in admittance on these days compared to Tues–Sat);

• among the meteorological variables (max/min temperature, humidity) and pollution variables (ozone, NO, NO2), only humidity at lags of 12–20 days and NO2 (max) appear to have an association.

Page 37

Results for Asthma Data—(IS & AL)

Term              IS        AL       Mean      SD
Intercept         0.590     0.591    0.593    0.0658
Sunday effect     0.138     0.138    0.139    0.0531
Monday effect     0.229     0.231    0.230    0.0495
cos(2πt/365)     −0.218    −0.218   −0.217    0.0415
sin(2πt/365)      0.200     0.179    0.181    0.0437
Term 1, 1990      0.188     0.198    0.194    0.0638
Term 2, 1990      0.183     0.130    0.129    0.0664
Term 1, 1991      0.080     0.075    0.070    0.0733
Term 2, 1991      0.177     0.164    0.157    0.0665
Term 1, 1992      0.223     0.221    0.214    0.0667
Term 2, 1992      0.243     0.239    0.237    0.0620
Term 1, 1993      0.379     0.397    0.394    0.0625
Term 2, 1993      0.127     0.111    0.108    0.0682
Humidity Ht/20    0.009     0.010    0.007    0.0032
NO2 max          −0.125    −0.107   −0.108    0.0347
AR(1), φ          0.385     0.788    0.468    0.3790
σ²                0.053     0.010    0.018    0.0153

Page 38

Asthma Data: observed and conditional mean

[Figure: four panels (1990–1993) of daily asthma counts by day of year; the observed series with the fitted conditional mean overlaid.]

Page 39

Summary Remarks

1. Importance sampling offers a nice clean method for estimation in parameter driven models.

2. The innovations algorithm allows for quick implementation of importance sampling. Extends easily to higher-order AR structure.

3. The relative likelihood approach is a one-sample-based procedure.

4. The approximation to the likelihood is a non-simulation-based procedure which may have great potential, especially with large sample sizes and/or a large number of explanatory variables.

