Page 1

Infinite Structured Explicit Duration Hidden Markov Models

Frank Wood

Joint work with Chris Wiggins, Mike Dewar, and Jonathan Huggins

Columbia University

November, 2011

Wood (Columbia University) ISEDHMM November, 2011 1 / 57

Page 2

Hidden Markov Models

Hidden Markov models (HMMs) [Rabiner, 1989] are an important tool for data exploration and engineering applications.

Applications include

Speech recognition [Jelinek, 1997, Juang and Rabiner, 1985]

Natural language processing [Manning and Schutze, 1999]

Hand-writing recognition [Nag et al., 1986]

DNA and other biological sequence modeling applications [Krogh et al., 1994]

Gesture recognition [Tanguay Jr, 1995, Wilson and Bobick, 1999]

Financial data modeling [Ryden et al., 1998]

. . . and many more.

Page 3

Graphical Model: Hidden Markov Model

[Figure: HMM graphical model; parameters π_m and θ_m in a plate over K states; latent chain z_0, z_1, z_2, z_3, …, z_T; observations y_1, y_2, y_3, …, y_T.]

Page 4

Notation: Hidden Markov Model

z_t | z_{t−1} = m ∼ Discrete(π_m)

y_t | z_t = m ∼ F_θ(θ_m)

A is the K × K transition matrix whose m-th row is π_m:

A = (π_1, …, π_m, …, π_K), stacked row by row

Page 5

HMM: Typical Usage Scenario (Character Recognition)

Training data: multiple "observed" sequences y_t = (v_t, h_t) of stylus positions for each kind of character

Task: train |Σ| different models, one for each character

Latent states: sequences of strokes

Usage: classify new stylus position sequences using trained models M_σ = (A_σ, Θ_σ)

P(M_σ | y_1, …, y_T) ∝ P(y_1, …, y_T | M_σ) P(M_σ)
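The classification rule above needs the marginal likelihood P(y_1, …, y_T | M_σ), which the forward algorithm computes in O(TK²) time. A minimal log-space sketch (the Gaussian emission model and all function and variable names here are illustrative assumptions, not notation from the talk):

```python
import math

def hmm_log_likelihood(y, A, means, pi0, var=1.0):
    """Forward algorithm: log P(y_1..y_T | M) for an HMM with
    transition matrix A, Gaussian emissions N(means[m], var),
    and initial distribution pi0. Sums are kept in log space."""
    K = len(A)

    def log_emit(obs, m):
        return -0.5 * math.log(2 * math.pi * var) - (obs - means[m]) ** 2 / (2 * var)

    def log_sum_exp(xs):
        top = max(xs)
        return top + math.log(sum(math.exp(x - top) for x in xs))

    # alpha_t(m) = P(y_1..y_t, z_t = m)
    log_alpha = [math.log(pi0[m]) + log_emit(y[0], m) for m in range(K)]
    for obs in y[1:]:
        log_alpha = [
            log_sum_exp([log_alpha[l] + math.log(A[l][m]) for l in range(K)])
            + log_emit(obs, m)
            for m in range(K)
        ]
    return log_sum_exp(log_alpha)
```

Classification then picks the character σ maximizing log P(y | M_σ) + log P(M_σ).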

Page 6

Shortcomings of Original HMM Specification

Latent state dwell times are not usually geometrically distributed:

P(z_t = m, …, z_{t+L} = m | A) = ∏_{ℓ=1}^{L} P(z_{t+ℓ} = m | z_{t+ℓ−1} = m, A) = Geometric(L; π_m(m))

Prior knowledge often suggests structural constraints on allowable transitions

The state cardinality K of the latent Markov chain is usually unknown

Page 7

Explicit Duration HMM / H. Semi-Markov Model

[Mitchell et al., 1995, Murphy, 2002, Yu and Kobayashi, 2003, Yu, 2010]

Latent state sequence z = ((s_1, r_1), …, (s_T, r_T))

Latent state id sequence s = (s_1, …, s_T)

Latent "remaining duration" sequence r = (r_1, …, r_T)

State-specific duration distribution F_r(λ_m)

Other distributions the same

An EDHMM transitions between states in a different way than does a typical HMM. Unless r_t = 0, the current remaining duration is decremented and the state does not change. If r_t = 0 then the EDHMM transitions to a state m ≠ s_t according to the distribution defined by π_{s_t}.

Page 8

EDHMM notation

Latent state z_t = (s_t, r_t) is a tuple consisting of the state identity and the time left in that state.

s_t | s_{t−1}, r_{t−1} ∼ I(s_t = s_{t−1}) if r_{t−1} > 0; Discrete(π_{s_{t−1}}) if r_{t−1} = 0

r_t | s_t, r_{t−1} ∼ I(r_t = r_{t−1} − 1) if r_{t−1} > 0; F_r(λ_{s_t}) if r_{t−1} = 0

y_t | s_t ∼ F_θ(θ_{s_t})
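The conditionals above fully specify the EDHMM's generative process, so a forward simulation is a useful sanity check. A sketch with Poisson durations and unit-variance Gaussian emissions (these particular distributional choices, and all names, are assumptions; the talk's synthetic experiment later uses the same families):

```python
import math
import random

def sample_edhmm(T, pi, lam, mu, seed=0):
    """Simulate T steps of an EDHMM: if the remaining duration r is
    positive, decrement it and keep the state; if r = 0, draw a new
    state from pi[s] (which should put no mass on s itself) and a
    fresh duration from F_r = Poisson(lam[new state])."""
    rng = random.Random(seed)

    def poisson(rate):
        # Knuth's multiplicative method; fine for small rates
        limit, k, p = math.exp(-rate), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    s = rng.randrange(len(lam))
    r = poisson(lam[s])
    states, ys = [], []
    for _ in range(T):
        if r == 0:
            s = rng.choices(range(len(lam)), weights=pi[s])[0]
            r = poisson(lam[s])
        else:
            r -= 1
        states.append(s)
        ys.append(rng.gauss(mu[s], 1.0))
    return states, ys
```

With pi's diagonal set to zero, the chain never self-transitions at a duration boundary, matching the m ≠ s_t constraint.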

Page 9

EDHMM: Graphical Model

[Figure: EDHMM graphical model; parameters λ_m, π_m, θ_m in a plate over K states; latent pairs (s_0, r_0), (s_1, r_1), …, (s_T, r_T); observations y_1, …, y_T.]

Page 10

Structured HMMs: i.e. left-to-right HMM [Rabiner, 1989]

Example: chicken pox

Observations: vital signs

Latent states: pre-infection, infected, post-infection (disregarding shingles)

State transition structure: can't go from infected back to pre-infection

Structured transitions imply zeros in the transition matrix A, i.e.

p(s_t = m | s_{t−1} = ℓ) = 0 for all m < ℓ
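The structural-zero constraint can be encoded directly as a masked, renormalized transition matrix. A small illustrative sketch (the uniform weights and the self_weight parameter are arbitrary choices, not from the talk):

```python
def left_to_right_matrix(K, self_weight=2.0):
    """K-state left-to-right transition matrix: zero probability of
    moving backward, i.e. p(s_t = m | s_{t-1} = l) = 0 for m < l."""
    A = []
    for l in range(K):
        row = [0.0] * K
        for m in range(l, K):
            row[m] = self_weight if m == l else 1.0
        total = sum(row)
        A.append([w / total for w in row])
    return A
```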

Page 11

Bayesian HMM

We will put a prior on parameters so that we can effect a solution that conforms to our ideas about what the solution should look like

Structured prior examples:

A_{i,j} = 0 (hard constraints)

A_{i,j} ∝ Σ_i A_{i,j} (rich get richer)

Regularization means that we can specify a model with more parameters than could possibly be needed:

infinite complexity (i.e. K → ∞) avoids many model selection problems

"extra" states can be thought of as auxiliary or nuisance variables

Page 12

Bayesian HMM

[Figure: Bayesian HMM graphical model; priors H_z over π_m and H_y over θ_m (plate over K states); latent chain z_0, z_1, …, z_T; observations y_1, …, y_T.]

π_m ∼ H_z

θ_m ∼ H_y

z_t | z_{t−1} = m ∼ Discrete(π_m)

y_t | z_t = m ∼ F_θ(θ_m)

Page 13

Infinite HMMs (IHMM) [Beal et al., 2002, Teh et al., 2006]

[Figure: IHMM graphical model; as the Bayesian HMM but with the plate taken to K → ∞, hierarchical concentration parameters c_0, c_1, γ, and base measure H_y.]

Page 14

Other HMM variants

Sticky Infinite HMMs [Fox et al., 2011]

Extra parameter per state used to bias towards self-transition

Hierarchical HMMs [Murphy, 2001]

State is hierarchical (i.e. a sequence of letters composed of stroke sequences)

Factorial HMMs [Ghahramani and Jordan, 1996]

Infinite explicit duration HMM [Johnson and Willsky, 2010]

No generative model. Latent history sampling is used to assert the existence of an implicitly defined IEDHMM that can be sampled from by rejecting HDP-HMM samples that violate transition constraints.

Page 15

Infinite Structured Explicit Duration HMM (ISEDHMM)

Generative framework for HMMs with

Explicitly parameterized duration distributions

Structured transition priors

Countable state cardinality

Fundamental Problems

How to generate structured, dependent, infinite-dimensional transition distributions.

How to do inference in HMMs with countable state cardinality and countable duration distribution support.

[Huggins and W, 2011]

Page 16

ISEDHMM: Graphical Model

c0 Hr

γ λm

c1 πm

Hy θm

r0

s0

r1

s1

y1

r2

s2

y2

rT

sT

yT

r3

s3

y3∞

Structured, dependent, infinite-dimensional transition distributions?

Page 17

ISEDHMM: Recipe

Poisson process [Kingman, 1993]

Gamma process [Kingman, 1993]

Spatial normalized gamma processes (SNΓPs) [Rao and Teh, 2009]

Structured, dependent, infinite-dimensional transition distributions [Huggins and W, 2011]

Page 18

ISEDHMM: Recipe

Gamma process [Kingman, 1993] as a Poisson process over Θ ⊗ V ⊗ [0, ∞) with rate/mean measure

μ(Θ, V, S) = α(Θ, V) ∫_S γ^{−1} e^{−γ} dγ

A draw from a Gamma process with

α(Θ, V) = c_0 H_θ(Θ) H_v(V)

[Kingman, 1993] has the form

G = Σ_{m=1}^{∞} γ_m δ_{(θ_m, v_m)}

where (θ_m, v_m) ∼ H_θ × H_v.

Page 19

ISEDHMM: Recipe

Non-disjoint "restricted projections" of Gamma processes are dependent Gamma processes (SNΓPs) [Rao and Teh, 2009]

H_v = Geom(p)

[Figure: auxiliary space partitioned into cells v_0, v_1, v_2, …, with restriction regions R_0, R_1, …, R_6, … and R_+ covering everything; region R_m excludes cell v_m.]

G_0 = Σ_{m≠0} γ_m δ_{θ_m}, …, G_4 = Σ_{m≠4} γ_m δ_{θ_m}, …

Page 20

ISEDHMM: Recipe

Normalized dependent ΓP draws are dependent Dirichlet process draws.¹ In the ISEDHMM, DP draws are the dependent, structured, infinite-dimensional transition distributions:

D_4 = G_4 / G_4(Θ)    (1)

    = (Σ_{m≠4} γ_m δ_{θ_m}) / (Σ_{θ∈Θ} Σ_{m′≠4} γ_{m′} δ_{θ_{m′}})    (2)

    = (Σ_{m≠4} γ_m δ_{θ_m}) / (Σ_{m′≠4} γ_{m′})    (3)

    = Σ_{m≠4} (γ_m / Σ_{m′≠4} γ_{m′}) δ_{θ_m}

¹ A draw from a Dirichlet process [Ferguson, 1973] is an infinite sum of weighted atoms [Sethuraman, 1994] where the weights sum to one.
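Equation (3) is the infinite-dimensional version of a familiar finite fact: independent gamma masses, once normalized, give a Dirichlet-distributed weight vector. A quick Monte Carlo sanity check of that finite analogue (the parameter values are arbitrary):

```python
import random

def normalized_gammas(alpha, rng):
    """Draw independent Gamma(alpha_m, 1) masses and normalize them;
    the resulting weight vector is Dirichlet(alpha) distributed."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(1)
alpha = [2.0, 1.0, 0.5]
draws = [normalized_gammas(alpha, rng) for _ in range(20000)]
# the weights sum to one and E[w_m] = alpha_m / sum(alpha)
means = [sum(d[m] for d in draws) / len(draws) for m in range(len(alpha))]
```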

Page 21

ISEDHMM: Recipe

[Figure: base gamma process G = G_{R_+} with concentration parameters c_0, c_1; restricted projections G_{R_1}, G_{R_2}, G_{R_3}, …, G_{R_M} and their normalizations give the pairs (G_1, D_1), (G_2, D_2), (G_3, D_3), …, (G_M, D_M).]

Structured, dependent, infinite-dimensional transition distributions π_m can be formed from draws from DDPs [Huggins and W, 2011]

Page 22

ISEDHMM Inference: Beam Sampling

We employ the forward-filtering, backward slice-sampling approach for the IHMM of [Van Gael et al., 2008] and the EDHMM of [Dewar, Wiggins and W, 2011], in which the state and duration variables s and r are sampled conditioned on auxiliary slice variables u.

Net result: an efficient, always-finite forward-backward procedure for sampling latent states.

Page 23

Auxiliary Variables for Sampling

Objective: get samples of x .

Sometimes it is easier to introduce an auxiliary variable u and to Gibbs sample the joint P(x, u) (i.e. sample from P(x | u; λ), then P(u | x, λ), etc.), then discard the u values, than it is to sample directly from P(x | λ).

Useful when: P(x | λ) does not have a known parametric form but adding u results in a parametric form, and when x has countable support and sampling it requires enumerating all values.

[Figure: graphical model λ → x, and the augmented model with auxiliary variable u attached to x.]

Page 26

Slice Sampling: A very useful auxiliary variable sampling trick

Unreasonable Pedagogical Example:

x | λ ∼ Poisson(λ) (countable support)

enumeration strategy for sampling x (impossible)

auxiliary variable u with P(u | x, λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ)

Note: the marginal distribution of x is

P(x | λ) = ∫ P(x, u | λ) du

         = ∫ P(x | λ) P(u | x, λ) du

         = ∫ P(x | λ) · I(0 ≤ u ≤ P(x | λ)) / P(x | λ) du

         = ∫ I(0 ≤ u ≤ P(x | λ)) du = P(x | λ)
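The pedagogical example can be run directly: alternating the two conditionals yields Poisson-distributed x samples without ever enumerating the full countable support. A sketch (the truncation constant below is an assumption made for simplicity, not part of the scheme):

```python
import math
import random

def poisson_pmf(x, lam):
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def slice_sample_poisson(lam, n_samples, seed=0):
    """Gibbs sampling on P(x, u | lam): u | x is uniform on
    (0, P(x | lam)], and x | u is uniform over the finite slice
    {x : P(x | lam) >= u}."""
    rng = random.Random(seed)
    x, out = int(lam), []
    # the Poisson pmf is unimodal, so the slice is a contiguous range;
    # truncating the search at 10*lam + 20 is a practical simplification
    support = range(int(10 * lam) + 20)
    for _ in range(n_samples):
        u = rng.random() * poisson_pmf(x, lam)
        x = rng.choice([k for k in support if poisson_pmf(k, lam) >= u])
        out.append(x)
    return out
```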

Page 27

Slice Sampling: A very useful auxiliary variable sampling trick

This suggests a Gibbs sampling scheme: alternately sampling from

P(x | u, λ) ∝ I(u ≤ P(x | λ)) (finite support, uniform above the slice, enumeration possible)

P(u | x, λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ) (uniform between 0 and P(x | λ))

then discarding the u values to arrive at x samples marginally distributed according to P(x | λ).

Page 28

ISEDHMM Inference: Beam Sampling

Forward-backward slice sampling only has to consider a finite number of successor states at each timestep. With auxiliary variables

p(u_t | z_t, z_{t−1}) = I(u_t < p(z_t | z_{t−1})) / p(z_t | z_{t−1})

and

p(z_t | z_{t−1}) = p((s_t, r_t) | (s_{t−1}, r_{t−1}))

                = I(s_t = s_{t−1}) I(r_t = r_{t−1} − 1) if r_{t−1} > 0

                = π_{s_{t−1} s_t} F_r(r_t; λ_{s_t}) if r_{t−1} = 0

one can run standard forward-backward conditioned on the u's.

Page 29

Results

To illustrate IEDHMM learning on synthetic data, five hundred datapoints were generated using a 4-state EDHMM with Poisson duration distributions

λ = (10, 20, 3, 7)

and Gaussian emission distributions with means

μ = (−6, −2, 2, 6),

all unit variance.

Page 30

IEDHMM: Synthetic Data Results

[Figures: left, emissions and inferred means over time (t = 0 to 500); right, posterior histogram over the number of states (4 to 7).]

Page 31

IEDHMM: Synthetic Data, State Duration Parameter Posterior

[Figure: posterior histogram of the state duration rates (0 to 25).]

Page 32

IEDHMM: Synthetic Data, State Mean Posterior

[Figure: posterior histogram of the state emission means (−10 to 10).]

Page 33

IEDHMM: Nanoscale Transistor Spontaneous Voltage Fluctuation

[Figure: voltage trace and inferred means over time (t = 0 to 5000, values −3 to 2).]

Page 34

IEDHMM vs. IHMM: Modeling the Morse Code Cepstrum

Page 35

Wrap-Up

Novel Gamma process construction for dependent, structured, infinite-dimensional HMM transition distributions.

Other transition distribution structures (left-to-right) can be implemented simply by changing the "restricted projection regions."

The ISEDHMM framework generalizes the HMM, Bayesian HMM, infinite HMM, left-to-right HMM, explicit duration HMM, and more.

Page 36

Future Work

Generalize to a spatial prior on HMM states ("location")

Simultaneous location and mapping

Process diagram modeling for systems biology

Applications; seeking "users"

Page 37

Questions?

Thank you!

Page 38

Bibliography I

M J Beal, Z Ghahramani, and C E Rasmussen. The Infinite Hidden Markov Model. In Advances in Neural Information Processing Systems, pages 29–245, March 2002.

M Dewar, C Wiggins, and F Wood. Inference in hidden Markov models with explicit state duration distributions. In submission, 2011.

T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230, 1973.

E B Fox, E B Sudderth, M I Jordan, and A S Willsky. A Sticky HDP-HMM with Application to Speaker Diarization. Annals of Applied Statistics, 5(2A):1020–1056, 2011.

Z Ghahramani and M I Jordan. Factorial Hidden Markov Models. Machine Learning, 29:245–273, 1996.

J. Huggins and F. Wood. Infinite structured explicit duration hidden Markov models. Under review, 2012.

Page 39

Bibliography II

F. Jelinek. Statistical Methods for Speech Recognition. MIT Press, 1997.

M Johnson and A S Willsky. The hierarchical Dirichlet process hidden semi-Markov model. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI-10), pages 252–259, 2010.

B H Juang and L R Rabiner. Mixture Autoregressive Hidden Markov Models for Speech Signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6):1404–1413, 1985.

J F C Kingman. Poisson Processes. Oxford Studies in Probability. Oxford University Press, 1993.

A. Krogh, M. Brown, I. S. Mian, K. Sjolander, and D. Haussler. Hidden Markov models in computational biology: Applications to protein modelling. Journal of Molecular Biology, 235:1501–1531, 1994.

C. Manning and H. Schutze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999.

Page 40

Bibliography III

C Mitchell, M Harper, and L Jamieson. On the complexity of explicit duration HMM's. IEEE Transactions on Speech and Audio Processing, 3(3):213–217, 1995.

K P Murphy. Hierarchical HMMs. Technical report, University of California, Berkeley, 2001.

K P Murphy. Hidden semi-Markov models (HSMMs). Technical report, MIT, 2002.

R. Nag, K. Wong, and F. Fallside. Script recognition using hidden Markov models. In ICASSP86, pages 2071–2074, 1986.

L R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, pages 257–286, 1989.

V Rao and Y W Teh. Spatial Normalized Gamma Processes. In Advances in Neural Information Processing Systems, pages 1554–1562, 2009.

Page 41

Bibliography IV

T. Ryden, T. Terasvirta, and S. Åsbrink. Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics, 13(3):217–244, 1998.

J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.

D. O. Tanguay Jr. Hidden Markov models for gesture recognition. PhD thesis, Massachusetts Institute of Technology, 1995.

Y W Teh, M I Jordan, M J Beal, and D M Blei. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

J Van Gael, Y Saatci, Y W Teh, and Z Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, pages 1088–1095. ACM, 2008.

Page 42

Bibliography V

A. D. Wilson and A. F. Bobick. Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9):884–900, 1999.

S Yu and H Kobayashi. An Efficient Forward-Backward Algorithm for an Explicit-Duration Hidden Markov Model. IEEE Signal Processing Letters, 10(1):11–14, 2003.

S Z Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2):215–243, 2010.

Page 43

IEDHMM vs. IHMM

[Figures: AMI difference as a function of log(1 + duration KL) and emission KL. Left: data from a Poisson duration distribution ISEDHMM. Right: data from an IHMM, fit with a Poisson duration distribution ISEDHMM.]

Page 44

Poisson Process [Kingman, 1993]

A Poisson process can be defined by the requirement that the random variables defined as the counts of the number of "events" inside each of a number of non-overlapping finite sub-regions of some space should each have a Poisson distribution and should be independent of each other.

[Figure: sub-regions A, B, C, D, E inside a space Ω.]

N(A) ∼ Poisson(µ(A))

Page 45

Gamma Process as Poisson Process [Kingman, 1993]

Define a Poisson process over the product space Θ ⊗ [0, ∞) with mean measure

μ(Θ, S) = α(Θ) ∫_S γ^{−1} e^{−γ} dγ

where Θ ∈ Ω and S ⊂ [0, ∞). A draw from this Poisson process yields a countably infinite set of pairs {(θ_n, γ_n)}_{n≥1}, which can be used to form an atomic random measure

G = Σ_{n≥1} γ_n δ_{θ_n}.

Page 46

Gamma Process

This discrete measure is a draw G ∼ ΓP(α) (from the previous slide):

G = Σ_{n≥1} γ_n δ_{θ_n}.

A ΓP can be thought of as an unnormalized DP.

Page 47

Dirichlet Processes

D = G/G(Θ) is a sample from a DP with base measure α,

D ∼ DP(α).

A draw from a Dirichlet process is an infinite mixture of weighted, discrete atoms.

For the ISEDHMM:

each atom is a next state (there are a countably infinite number of such states)

each atom's weight is the probability of transitioning to that state

there will be a countably infinite number of such transition distributions

Page 48

Structured Transitions Via Dependent Dirichlet Processes

One way to define a set of dependent DPs is to construct a basegamma process over an augmented space by taking the union ofdisjoint independent gamma processes, then define a series ofrestricted projections of that base process, which are themselvesgamma processes.

The normalization of these dependent gamma processes form a set ofdependent DPs.

We will use this procedure to construct a number of dependent DPs(one for each HMM state) which preclude certain transitions. Theprecluded transitions, a form of dependence, arise from particulars.

Page 51

SNΓP construction of IEDHMM transition distributions

[Figure: base gamma process G = G_{R_+} with concentration parameters c_0, c_1; restricted projections G_{R_1}, G_{R_2}, G_{R_3}, …, G_{R_M} and their normalizations give the transition distributions (G_1, D_1), (G_2, D_2), (G_3, D_3), …, (G_M, D_M).]

[Huggins and W, 2011]

Page 52

Spatial Normalized Gamma Processes [Rao and Teh, 2009]

To review SNΓPs formally, let 𝒱 be an arbitrary auxiliary space. In general, one can think of this as a covariate, time, or an index. For V ⊂ 𝒱, let

α(Θ, V) = c_0 H_θ(Θ) H_v(V)

be the base measure for a gamma process G ∼ ΓP(α) defined over the product space Θ ⊗ 𝒱. Here c_0 is a concentration parameter and H_θ and H_v are probability measures. We will refer to G as the "base ΓP".


Page 53:

Spatial Normalized Gamma Processes [Rao and Teh, 2009]

Let T be an index set and define restricted projected measures αm for all m ∈ T such that

αm(Θ) = α(Θ, Vm)

where Vm is a subset of 𝒱 indexed by m.

The SNΓP gets its name from thinking of 𝒱 as a space and Vm as a region of space indexed by m. With G ∼ ΓP(α) a draw from the base gamma process, define the restricted projection Gm by

Gm(Θ) = G(Θ, Vm).

Then Gm is distributed according to a gamma process with base measure αm:

Gm ∼ ΓP(αm).

Page 56:

Spatial Normalized Gamma Processes [Rao and Teh, 2009]

Normalizing Gm yields a draw Dm = Gm/Gm(Θ) distributed according to a Dirichlet process with base measure αm:

Dm ∼ DP(αm).

Dm is not in general independent of Dm′ because they can share atoms from G.


Page 57:

IEDHMM [Huggins and W, 2011] (an example ISEDHMM)

Recall the base gamma process ΓP(α) with base measure

α(Θ, V) = c0 Hθ(Θ) Hv(V).

Draws G ∼ ΓP(α) are measures over the parameter and covariate product space Θ ⊗ 𝒱 with the form

G = ∑_{m=1}^∞ γm δ(θm, vm)

where (θm, vm) ∼ Hθ × Hv.

In the IEDHMM, θm comprises the duration parameter λm and the output distribution parameters for state m.

For pedagogical expediency let T = 𝒱 = {0, 1, 2, 3, 4, . . .}; then vm ∈ ℕ.
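A truncated simulation of such a draw G (with a stand-in Hθ = N(0, 1), an arbitrary truncation level, and helper names that are purely illustrative) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
c0, p, n_atoms = 1.0, 0.3, 200   # truncation level is an arbitrary sketch

# Truncated draw G ~ sum_m gamma_m * delta_{(theta_m, v_m)} with
# (theta_m, v_m) ~ Htheta x Hv.  Htheta = N(0, 1) is a stand-in here.
gammas = rng.gamma(c0 / n_atoms, 1.0, size=n_atoms)   # atom weights
thetas = rng.normal(size=n_atoms)                     # theta_m ~ Htheta
vs = rng.geometric(p, size=n_atoms) - 1               # v_m ~ Geom(p) on {0,1,...}

def mass_at(v):
    # G's mass on the point region {v}: sum of weights of atoms at v.
    return gammas[vs == v].sum()
```

Note that numpy's `geometric` counts trials starting at 1, so subtracting 1 puts vm on the support {0, 1, 2, . . .} used above.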


Page 58:

IEDHMM: “spatial” regions for restrictions of base ΓP

Rm = {vm},  m ∈ T
R+ = 𝒱 \ ⋃_{m ∈ T} Rm
R = {Rm : m ∈ T} ∪ {R+}
R̄m = R \ {Rm},  m ∈ T
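These regions partition the covariate space, so classifying a value v is a simple lookup. The helper below is hypothetical (not from the paper) and assumes the occupied states' points vm are distinct:

```python
def region_of(occupied_vs, v):
    """Return which region of R the covariate value v falls in.
    occupied_vs: dict mapping state m -> its point v_m (assumed distinct)."""
    for m, vm in occupied_vs.items():
        if v == vm:
            return m      # v lies in the point region R_m = {v_m}
    return "+"            # otherwise v lies in the remainder R_+

occ = {0: 0, 1: 1, 2: 4}  # e.g. three occupied states at v = 0, 1, 4
```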


Page 59:

IEDHMM: Geometric base distribution over states example

Hv = Geom(p)

[Figure: the covariate axis v = 0, 1, 2, . . . under Hv = Geom(p). Each occupied state m has a point vm with region Rm = {vm} (here R0, . . . , R6); values beyond the M occupied points form the remainder region R+.]


Page 60:

IEDHMM: SNΓP restrictions

From before, using the restricted projection αR(Θ) = α(Θ, R), setting GR(Θ) = G(Θ, R) and DR = GR/GR(Θ), we have

GR ∼ ΓP(αR)

DR ∼ DP(αR).

In the case of the IEDHMM, the base measure corresponding to the point region Rm ∈ R is

αRm(Θ) = c0 δθm(Θ) Hv(Rm),

while for R+ it is

αR+(Θ) = c0 Hθ(Θ) Hv(R+).
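The Hv factors in these base measures are easy to compute in closed form. As an illustration only, assume (hypothetically) that the occupied points sit at vm = m for m = 0, . . . , M−1 under Hv = Geom(p) on {0, 1, 2, . . .} with pmf p(1−p)^v:

```python
# Hv factors of the restricted base measures under Geom(p), assuming
# (purely for illustration) occupied points v_m = m for m = 0..M-1.
p, M = 0.3, 5
Hv_Rm = [p * (1 - p) ** m for m in range(M)]   # Hv(R_m) = Hv({v_m})
Hv_Rplus = 1.0 - sum(Hv_Rm)                    # Hv(R_+): leftover tail mass
# For this layout the tail mass is exactly (1 - p) ** M.
```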

Page 62:

IEDHMM: From restricted ΓPs to DPs

A highly dependent transition distribution for state m is a DP draw D̄m:

D̄m = Ḡm / Ḡm(Θ)

   = ∑_{R ∈ R̄m} GR / ∑_{R′ ∈ R̄m} GR′(Θ)

   = ∑_{R ∈ R̄m} ( GR(Θ) / ∑_{R′ ∈ R̄m} GR′(Θ) ) · GR/GR(Θ)

   = ∑_{R ∈ R̄m} ( GR(Θ) / ∑_{R′ ∈ R̄m} GR′(Θ) ) DR
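The identity above (pool all atoms and normalize once, versus mix the per-region normalized draws DR with weights proportional to their total masses) can be checked numerically on a truncated example; the region names and atom counts below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# Truncated atom weights of the restricted GammaPs G_R for R in Rbar_m
# (finitely many atoms per region; an illustration, not the sampler).
G = {R: rng.gamma(0.5, 1.0, size=4) for R in ("R1", "R2", "R+")}
mass = {R: w.sum() for R, w in G.items()}   # G_R(Theta)
total = sum(mass.values())                  # Gbar_m(Theta)

# Left-hand side: pool all atoms, then normalize once.
pooled = np.concatenate([G[R] for R in G]) / total
# Right-hand side: mix the per-region DP draws D_R = G_R / G_R(Theta)
# with weights G_R(Theta) / Gbar_m(Theta).
mixed = np.concatenate([(mass[R] / total) * (G[R] / mass[R]) for R in G])
```

Atom-by-atom the two expressions agree, which is exactly the derivation's last step.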


Page 63:

IEDHMM: Mixture of DP construction for transition distributions

DRm ∼ DP(αRm)

DR+ ∼ DP(αR+)

γm ∼ Gamma(c0 Hv(Rm), 1)

γ+ ∼ Gamma(c0 Hv(R+), 1)

βmk = I(m ≠ k) γk / ( γ+ + ∑_{k′ ≠ m}^M γk′ )

βm+ = γ+ / ( γ+ + ∑_{k′ ≠ m}^M γk′ )

D̄m = ∑_{k ≠ m}^M βmk DRk + βm+ DR+.


Page 64:

IEDHMM: Relaxing the dependence hierarchically

These dependent DPs are the base distributions for conditional state transition distributions Dm ∼ DP(c1 D̄m). With βm = (βm1, . . . , βmM , βm+),

πm ∼ Dirichlet(c1 βm),

Dm = ∑_{k=1}^M πmk δθk + πm+ DR+

where c1 is a concentration parameter and πm = (πm1, . . . , πmM , πm+).

The conditional state transition probability row vector πm is finite, since the probabilities of transitioning to new states have been merged into a single probability πm+ = ∑_{k=M+1}^∞ πmk. This “bin” is dynamically split and joined at inference time.
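A minimal sketch of this row-vector sampling, with an illustrative stick-breaking split of the “+” bin when a new state is instantiated (the Beta(1, c1) split is a common HDP-style choice and a hypothetical detail here, not necessarily the paper's exact move):

```python
import numpy as np

rng = np.random.default_rng(3)
M, c1 = 4, 2.0

beta_m = rng.dirichlet(np.ones(M + 1))   # (beta_m1, ..., beta_mM, beta_m+)
pi_m = rng.dirichlet(c1 * beta_m)        # pi_m ~ Dirichlet(c1 * beta_m)

# Splitting the "+" bin when a new state M+1 is instantiated: divide
# pi_m+ into a weight for the new state and a residual "+" weight.
b = rng.beta(1.0, c1)
pi_m = np.concatenate([pi_m[:-1], [b * pi_m[-1], (1 - b) * pi_m[-1]]])
```

The split conserves total probability, so the row vector remains normalized after the new state appears.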


