Page 1

Hierarchical Dirichlet Process and Infinite Hidden Markov Model

Duke University Machine Learning Group

Presented by Kai Ni

February 17, 2006

Paper by Y. W. Teh, M. I. Jordan, M. J. Beal & D. M. Blei, NIPS 2004

Page 2

Outline

• Motivation

• Dirichlet Processes (DP)

• Hierarchical Dirichlet Processes (HDP)

• Infinite Hidden Markov Model (iHMM)

• Results & Conclusions

Page 3

Motivation

• Problem – “multi-task learning” in which the “tasks” are clustering problems.

• Goal – Share clusters among multiple, related clustering problems. The number of clusters is open-ended and is inferred automatically by the model.

• Applications

– Genome pattern analysis

– Information retrieval over a document corpus

Page 4

Hierarchical Model

• A single clustering problem can be analyzed with a Dirichlet process (DP).

– A draw G from a DP is discrete with probability one, so the values sampled from G are generally not distinct.

• For J groups, we consider a group-specific DP draw Gj for each j = 1~J.

• To share information, we link the group-specific DPs:

– If the base measure G0(τ) is continuous, the draws Gj have no atoms in common with probability one.

– HDP solution: G0 is itself a draw from a DP(γ, H), and is therefore discrete:

$$G \sim \mathrm{DP}(\alpha_0, G_0), \qquad G = \sum_{k=1}^{\infty} \beta_k \,\delta_{\phi_k}$$

$$G_j \sim \mathrm{DP}(\alpha_0, G_0), \qquad G_j = \sum_{k=1}^{\infty} \pi_{jk} \,\delta_{\phi_{jk}}$$

$$G_j \sim \mathrm{DP}(\alpha_0, G_0(\tau))$$
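A minimal numerical sketch of this contrast (Python/NumPy; the truncation level K, the Gaussian base measure, and the concentration values are illustrative assumptions, with a finite Dirichlet standing in for each DP):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 50  # finite truncation standing in for the infinite DP (assumption)

# Continuous base H = N(0, 1): two independent DP draws put their mass
# on freshly sampled atoms, so with probability one they share none.
atoms_1 = rng.normal(size=K)
atoms_2 = rng.normal(size=K)
print(len(np.intersect1d(atoms_1, atoms_2)))  # -> 0

# HDP solution: G0 ~ DP(gamma, H) is itself discrete, and every
# group-level DP re-weights G0's atoms, so all groups share one atom set.
gamma, alpha0 = 5.0, 5.0
beta = rng.dirichlet(np.full(K, gamma / K))   # weights of G0 over theta_1..theta_K
theta = rng.normal(size=K)                    # theta_k ~ H, shared across groups
pi = [rng.dirichlet(alpha0 * beta + 1e-9)     # each group's weights over the
      for _ in range(3)]                      # *same* atoms (floor for stability)
```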

Page 5

Dirichlet Process & Hierarchical Dirichlet Process

• Three different perspectives:

– Stick-breaking

– Chinese restaurant process

– Infinite mixture models

• Setup

• Properties of DP

DP:
$$G \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0)$$

HDP:
$$G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H), \qquad G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0)$$

Page 6

Stick-breaking View

• An explicit mathematical construction of the DP; it makes clear that draws from a DP are discrete.

• In DP:

$$\beta \sim \mathrm{Stick}(\alpha_0), \qquad \phi_k \sim G_0, \qquad G = \sum_{k=1}^{\infty} \beta_k \,\delta_{\phi_k}$$

• In HDP:

$$\beta \sim \mathrm{Stick}(\gamma), \qquad \phi_k \sim H, \qquad G_0 = \sum_{k=1}^{\infty} \beta_k \,\delta_{\phi_k}$$

$$\pi_j \sim \mathrm{DP}(\alpha_0, \beta), \qquad G_j = \sum_{k=1}^{\infty} \pi_{jk} \,\delta_{\phi_k}$$
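A short sketch of this construction (NumPy; the truncation level and concentrations are arbitrary choices; over a K-atom base, Dirichlet(α0·β) is exactly a draw from DP(α0, β)):

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking(concentration, trunc, rng):
    """Truncated Stick(c): beta_k = v_k * prod_{l<k}(1 - v_l), v_l ~ Beta(1, c)."""
    v = rng.beta(1.0, concentration, size=trunc)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()  # fold the leftover stick mass back in

K, gamma, alpha0 = 100, 3.0, 3.0  # truncation and concentrations (assumptions)

# DP: beta ~ Stick(alpha0), phi_k ~ G0 (here G0 = N(0,1)), G = sum_k beta_k delta(phi_k)
beta = stick_breaking(alpha0, K, rng)
phi = rng.normal(size=K)
draws = rng.choice(phi, size=1000, p=beta)  # samples from G: discrete, so they repeat
print(len(np.unique(draws)))                # far fewer than 1000 distinct values

# HDP: beta ~ Stick(gamma) with phi_k ~ H gives G0; each group then draws
# pi_j ~ DP(alpha0, beta), i.e. Dirichlet(alpha0 * beta) over the K shared atoms.
beta0 = stick_breaking(gamma, K, rng)
pi_j = rng.dirichlet(alpha0 * beta0 + 1e-12)  # tiny floor for numerical safety
```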

Page 7

DP – Chinese Restaurant Process

• Exhibits the clustering property of the DP.

• Let φ1, …, φi−1 be i.i.d. random variables distributed according to G, let θ1, …, θK be the distinct values they take on, and let nk be the number of φi′ equal to θk for 0 < i′ < i. Integrating out G gives the predictive rule

$$\phi_i \mid \phi_1, \ldots, \phi_{i-1} \sim \sum_{k=1}^{K} \frac{n_k}{i-1+\alpha_0}\,\delta_{\theta_k} + \frac{\alpha_0}{i-1+\alpha_0}\,G_0$$
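The predictive rule translates directly into a sampler. A minimal sketch (NumPy; n, α0, and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def crp_assignments(n, alpha0, rng):
    """Seat n customers by the CRP predictive rule (G integrated out)."""
    counts = []                      # counts[k] = customers at table k
    labels = np.empty(n, dtype=int)
    for i in range(n):
        # existing table k with prob prop. to n_k, a new table prop. to alpha0
        probs = np.array(counts + [alpha0], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(counts):
            counts.append(0)         # open a new table (theta_k ~ G0 in a mixture)
        counts[k] += 1
        labels[i] = k
    return labels, counts

labels, counts = crp_assignments(100, alpha0=2.0, rng=rng)
print(len(counts))  # occupied tables grow like alpha0 * log(n)
```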

Page 8

HDP – Chinese Restaurant Franchise

• First level: within each group, a DP mixture.

– φj1, …, φj(i−1) are i.i.d. random variables distributed according to Gj; ψj1, …, ψjTj are the distinct values they take on, and njt is the number of φji′ equal to ψjt for 0 < i′ < i.

• Second level: across groups, clusters are shared; the base measure of every group is a common draw from a DP:

– θ1, …, θK are the distinct values taken on by the ψj1, …, ψjTj, and mk is the number of ψjt equal to θk over all j, t.

$$G_j \sim \mathrm{DP}(\alpha_0, G_0), \qquad \phi_{ji} \mid G_j \sim G_j, \qquad x_{ji} \mid \phi_{ji} \sim F(\phi_{ji})$$

$$\psi_{jt} \mid G_0 \sim G_0, \qquad G_0 \sim \mathrm{DP}(\gamma, H)$$
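Stacking the two levels gives the franchise: customers choose tables within their own restaurant, and each new table picks its dish from a globally shared menu. A sketch under the same conventions (group sizes and concentrations are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def crf_assignments(group_sizes, alpha0, gamma, rng):
    """Chinese restaurant franchise: per-group tables, globally shared dishes."""
    m = []                                   # m[k] = tables serving dish k, all groups
    z = []                                   # dish label of every customer, per group
    for n_j in group_sizes:
        n_jt = []                            # customers per table in this restaurant
        dish_of = []                         # dish served at each table
        z_j = []
        for _ in range(n_j):
            # table t with prob prop. to n_jt, a new table with prob prop. to alpha0
            probs = np.array(n_jt + [alpha0], dtype=float)
            t = rng.choice(len(probs), p=probs / probs.sum())
            if t == len(n_jt):               # new table: pick its dish
                n_jt.append(0)
                # dish k with prob prop. to m_k, a brand-new dish prop. to gamma
                dprobs = np.array(m + [gamma], dtype=float)
                k = rng.choice(len(dprobs), p=dprobs / dprobs.sum())
                if k == len(m):
                    m.append(0)              # brand-new dish theta_k ~ H
                m[k] += 1
                dish_of.append(k)
            n_jt[t] += 1
            z_j.append(dish_of[t])
        z.append(z_j)
    return z, m

z, m = crf_assignments([50, 50, 50], alpha0=1.0, gamma=1.0, rng=rng)
print(len(m))  # number of dishes (clusters) shared across the three groups
```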

Page 9

HDP – CRF graph

• The values of the factors θk are shared between groups, as well as within groups. This is a key property of the HDP.

[Figure: the CRF graphical representation, obtained by integrating out G0]

Page 10

DP Mixture Model

• One of the most important applications of the DP: a nonparametric prior distribution on the components of a mixture model.

• G can be viewed as an infinite mixture model.

$$G \sim \mathrm{DP}(\alpha_0, G_0), \qquad G = \sum_{k=1}^{\infty} \pi_k \,\delta_{\theta_k}$$

$$\theta_i \mid G \sim G, \qquad x_i \mid \theta_i \sim F(\theta_i)$$
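A generative sketch of this DP mixture (NumPy; the truncated stick-breaking weights, the Gaussian base G0, and the Gaussian emission family F are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
K, alpha0 = 50, 2.0  # truncation level and concentration (assumptions)

# G = sum_k pi_k delta(theta_k) via truncated stick-breaking
v = rng.beta(1.0, alpha0, size=K)
pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
pi /= pi.sum()
theta = rng.normal(0.0, 3.0, size=K)   # theta_k ~ G0 = N(0, 3^2)

# theta_i | G ~ G, then x_i | theta_i ~ F(theta_i) = N(theta_i, 0.5^2)
z = rng.choice(K, size=500, p=pi)      # which atom each observation uses
x = rng.normal(theta[z], 0.5)
print(len(np.unique(z)), "active components out of", K)
```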

Page 11

HDP Mixture Model

• The HDP can be used as the prior distribution over the factors for grouped (nested) data.

• We consider two levels of DPs: G0 links the child DPs Gj and forces them to share components; each Gj is conditionally independent given G0.

Page 12

Infinite Hidden Markov Model

• The number of hidden states is allowed to be countably infinite.

• The transition probabilities in the i-th row of the transition matrix A can be interpreted as mixing proportions πi = (ai1, ai2, …, aik, …).

• Thus each row of A in an HMM is a draw from a DP. These DPs must be linked, because they should share the same set of "next states"; the HDP provides the natural framework for the infinite HMM, as sketched below.
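A generative sketch of this linkage (NumPy; the truncation level, Gaussian emissions, and all constants are assumptions rather than the paper's setup). The shared weights β tie the rows together, so every row A[i] is a DP draw over the same set of next states:

```python
import numpy as np

rng = np.random.default_rng(5)
K, gamma, alpha0 = 20, 3.0, 3.0  # truncation and concentrations (assumptions)

# Shared base over next states: beta ~ Stick(gamma)
v = rng.beta(1.0, gamma, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
beta /= beta.sum()

# Row i of the transition matrix is a DP draw linked through beta,
# so every row agrees on the same (truncated) set of next states.
A = rng.dirichlet(alpha0 * beta + 1e-9, size=K)  # A[i] ~ DP(alpha0, beta)
mu = rng.normal(0.0, 3.0, size=K)                # per-state emission means (assumption)

# Generate a sequence: s_t | s_{t-1} ~ A[s_{t-1}, :], y_t ~ N(mu[s_t], 1)
s, states, ys = 0, [], []
for _ in range(200):
    s = rng.choice(K, p=A[s])
    states.append(s)
    ys.append(rng.normal(mu[s], 1.0))
print(len(set(states)), "states visited out of", K)
```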

Page 13

iHMM via HDP

• Assign observations to groups, where the groups are indexed by the value of the previous state variable in the sequence. Then the current state and emission distribution define a group-specific mixture model.

• Multiple iHMMs can be linked by adding a further level of Bayesian hierarchy: a master DP couples the iHMMs, each of which is itself a set of DPs.

Page 14

HDP & iHMM

                 HDP (CRF aspect)                  iHMM
Group            Restaurant j, j = 1~J (fixed)     Indexed by Si-1 (random)
Data             Customer xji                      yi
Hidden factor    Table φji = θk; dish θk ~ H,      Si = k, k = 1~∞;
                 k = 1~∞                           emission row B(Si, :)
DP weights       Popularity πjk, k = 1~∞           A(Si-1, :)
Likelihood       F(xji | φji)                      B(Si, yi)

Page 15

Non-trivialities in iHMM

• The HDP assumes a fixed partition of the data into groups, whereas the HMM models time-series data in which the definition of the groups is itself random.

• In the CRF view of the HDP, the number of restaurants becomes infinite. Moreover, in the sampling scheme, changing st may affect the group assignments of all subsequent data.

• The CRF is a natural way to describe the iHMM, but it is awkward for sampling; sampling algorithms based on the other representations are needed instead.

Page 16

HDP Results

Page 17

iHMM Results

Page 18

Conclusion

• HDP is a hierarchical, nonparametric model for clustering problems involving multiple groups of data.

• The mixture components are shared across groups, and the appropriate number of components is determined automatically by the HDP.

• The HDP extends to the infinite HMM, yielding an effective inference algorithm for it.

Page 19

References

• Y.W. Teh, M.I. Jordan, M.J. Beal and D.M. Blei, “Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes”, NIPS 2004.

• M.J. Beal, Z. Ghahramani and C.E. Rasmussen, "The Infinite Hidden Markov Model", NIPS 2002.

• Y.W. Teh, M.I. Jordan, M.J. Beal and D.M. Blei, “Hierarchical Dirichlet Processes”, Revised version to appear in JASA, 2006.

