
Ch 13. Sequential Data (1/2)

Pattern Recognition and Machine Learning, C. M. Bishop, 2006.

Summarized by Kim Jin-young

Biointelligence Laboratory, Seoul National University

http://bi.snu.ac.kr/

(C) 2007, SNU Biointelligence Lab, http://bi.snu.ac.kr/

Contents

13.1 Markov Models
13.2 Hidden Markov Models
  13.2.1 Maximum likelihood for the HMM
  13.2.2 The forward-backward algorithm
  13.2.3 The sum-product algorithm for the HMM
  13.2.4 Scaling factors
  13.2.5 The Viterbi algorithm
  13.2.6 Extensions of the HMM


Sequential Data

Data dependencies exist along a sequence (weather data, DNA, characters in a sentence), so the i.i.d. assumption does not hold.

Sequential distributions: stationary vs. nonstationary.

Markov model: no latent variables.

State space models: hidden Markov model (discrete latent variables) and linear dynamical systems.


Markov Models

Markov Chain

General product rule:
p(x_1, \ldots, x_N) = \prod_{n=1}^{N} p(x_n \mid x_1, \ldots, x_{n-1})

First-order Markov chain:
p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1})

Second-order Markov chain:
p(x_1, \ldots, x_N) = p(x_1) p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2})

State Space Model (HMM)

p(x_1, \ldots, x_N, z_1, \ldots, z_N) = p(z_1) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \right] \prod_{n=1}^{N} p(x_n \mid z_n)

(free of a Markov assumption of any order, with a reasonable number of extra parameters)
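A first-order Markov chain likelihood is easy to evaluate directly from the factorization above. Below is a minimal sketch; the two-state "weather" chain and all probability values are made-up illustrative numbers, not taken from the slides:

```python
import numpy as np

# Hypothetical two-state "weather" chain (0 = sunny, 1 = rainy).
# All numbers below are illustrative.
pi = np.array([0.6, 0.4])            # p(x_1)
A = np.array([[0.8, 0.2],            # A[i, j] = p(x_n = j | x_{n-1} = i)
              [0.3, 0.7]])

def markov_log_likelihood(seq, pi, A):
    """ln p(x_1,...,x_N) = ln p(x_1) + sum_{n=2}^N ln p(x_n | x_{n-1})."""
    ll = np.log(pi[seq[0]])
    for prev, cur in zip(seq[:-1], seq[1:]):
        ll += np.log(A[prev, cur])
    return ll

print(markov_log_likelihood([0, 0, 1, 1], pi, A))  # ≈ -2.70
```

Working in log space keeps the computation stable for long sequences, where the raw product of probabilities would underflow.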


Hidden Markov Model (overview)

Overview: introduction of discrete latent variables (based on prior knowledge).

Examples: coin toss; urn and ball.

Three issues (given an observation sequence): parameter estimation; probability of the observation sequence; most likely sequence of latent variables.


Hidden Markov Model (example)

Lattice representation

Left-to-right HMM (handwriting recognition)


Hidden Markov Model

Given the following (observations, latent variables, model parameters):
X = \{x_1, \ldots, x_N\}, \quad Z = \{z_1, \ldots, z_N\}, \quad \theta = \{\pi, A, \phi\}

the joint probability distribution for the HMM is (initial state, state transition, emission):
p(X, Z \mid \theta) = p(z_1 \mid \pi) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}, A) \right] \prod_{m=1}^{N} p(x_m \mid z_m, \phi)

whose elements are:
p(z_1 \mid \pi) = \prod_{k=1}^{K} \pi_k^{z_{1k}}  (initial latent node)
p(z_n \mid z_{n-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{z_{n-1,j} z_{nk}}  (conditional distribution among latent variables)
p(x_n \mid z_n, \phi) = \prod_{k=1}^{K} p(x_n \mid \phi_k)^{z_{nk}}  (emission probability)

K: number of states / N: total number of time steps.
z_{n-1,j} z_{nk} = 1 means the chain was in state j at time n-1 and transitioned to state k at time n.
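Because the joint distribution factorizes into initial, transition, and emission terms, the complete-data log probability ln p(X, Z | θ) can be accumulated term by term. A minimal sketch for a discrete-emission HMM; all parameter values are illustrative, and the matrix B here stands in for the emission distribution p(x_n | z_n, φ):

```python
import numpy as np

# Toy two-state HMM with binary emissions; all numbers are illustrative.
pi = np.array([0.5, 0.5])                  # initial distribution p(z_1)
A = np.array([[0.9, 0.1], [0.2, 0.8]])     # A[j, k] = p(z_n = k | z_{n-1} = j)
B = np.array([[0.7, 0.3], [0.1, 0.9]])     # B[k, v] = p(x_n = v | z_n = k)

def complete_data_log_prob(x, z, pi, A, B):
    """ln p(X, Z | theta): initial term + transition terms + emission terms."""
    ll = np.log(pi[z[0]])
    for n in range(1, len(z)):
        ll += np.log(A[z[n - 1], z[n]])    # ln p(z_n | z_{n-1})
    for n in range(len(x)):
        ll += np.log(B[z[n], x[n]])        # ln p(x_n | z_n)
    return ll

print(complete_data_log_prob([0, 1], [0, 1], pi, A, B))
```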


EM Revisited (slide by Seok Ho-sik)

General EM: maximizing the log likelihood function. Given a joint distribution p(X, Z | Θ) over observed variables X and latent variables Z, governed by parameters Θ:

1. Choose an initial setting for the parameters Θ^old.

2. E step: evaluate p(Z | X, Θ^old), the posterior distribution of the latent variables.

3. M step: evaluate Θ^new given by Θ^new = argmax_Θ Q(Θ, Θ^old), where
   Q(Θ, Θ^old) = Σ_Z p(Z | X, Θ^old) ln p(X, Z | Θ).

4. If the convergence criterion is not satisfied, let Θ^old ← Θ^new and return to step 2.
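The EM loop can be made concrete with the two-coin version of the coin-toss example mentioned earlier: each trial flips one of two biased coins m times, and which coin was used is the latent variable. This is a minimal sketch, not from the slides; the data and the initial guess are made up, and the mixing weights are held fixed at 0.5 for simplicity:

```python
import numpy as np

def em_two_coins(heads, m, theta, n_iter=50):
    """EM for a 50/50 mixture of two biased coins.

    heads[i] = number of heads in trial i of m flips; theta = initial
    guess (theta_A, theta_B) for the two head probabilities.
    """
    theta = np.asarray(theta, dtype=float)
    heads = np.asarray(heads)
    for _ in range(n_iter):
        # E step: posterior responsibility of each coin for each trial
        # (mixing weights are fixed at 0.5, so they cancel in the ratio).
        lik = np.array([[t ** h * (1 - t) ** (m - h) for t in theta]
                        for h in heads])
        post = lik / lik.sum(axis=1, keepdims=True)
        # M step: each coin's bias = its expected heads / expected flips.
        theta = (post * heads[:, None]).sum(axis=0) / (post.sum(axis=0) * m)
    return theta

print(em_two_coins([9, 8, 1, 2, 9], 10, (0.6, 0.4)))
```

With this data the responsibilities quickly become near-certain, so the estimates approach the per-coin empirical head frequencies.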


Estimation of HMM Parameters

The likelihood function (marginalization over the latent variables Z):
p(X \mid \theta) = \sum_{Z} p(X, Z \mid \theta)
where
p(X, Z \mid \theta) = p(z_1) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \right] \prod_{n=1}^{N} p(x_n \mid z_n)

Using the EM algorithm. E-step:
Q(\theta, \theta^{\mathrm{old}}) = \sum_{Z} p(Z \mid X, \theta^{\mathrm{old}}) \ln p(X, Z \mid \theta)

\gamma(z_n) = p(z_n \mid X, \theta^{\mathrm{old}}), \quad \gamma(z_{nk}) = E[z_{nk}]
\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X, \theta^{\mathrm{old}}), \quad \xi(z_{n-1,j}, z_{nk}) = E[z_{n-1,j} z_{nk}]

Q(\theta, \theta^{\mathrm{old}}) = \sum_{k=1}^{K} \gamma(z_{1k}) \ln \pi_k + \sum_{n=2}^{N} \sum_{j=1}^{K} \sum_{k=1}^{K} \xi(z_{n-1,j}, z_{nk}) \ln A_{jk} + \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \ln p(x_n \mid \phi_k)


Estimation of HMM Parameters

M-Step

Initial:
\pi_k = \frac{\gamma(z_{1k})}{\sum_{j=1}^{K} \gamma(z_{1j})}

Transition:
A_{jk} = \frac{\sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nk})}{\sum_{l=1}^{K} \sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nl})}

Emission (given a Gaussian emission density p(x \mid \phi_k) = \mathcal{N}(x \mid \mu_k, \Sigma_k)):
\mu_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}
\Sigma_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^{T}}{\sum_{n=1}^{N} \gamma(z_{nk})}
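Given the E-step statistics γ and ξ stored as arrays, the M-step formulas above reduce to a few weighted sums. A minimal sketch assuming Gaussian emissions; array shapes follow the slide's notation (N time steps, K states, D-dimensional observations):

```python
import numpy as np

def m_step(gamma, xi):
    """pi and A updates from the E-step statistics.

    gamma: (N, K),    gamma[n, k]  = E[z_{nk}]
    xi:    (N-1, K, K), xi[n, j, k] = E[z_{n,j} z_{n+1,k}]
    """
    pi = gamma[0] / gamma[0].sum()
    A = xi.sum(axis=0)                         # expected transition counts
    A = A / A.sum(axis=1, keepdims=True)       # normalize each row over k
    return pi, A

def m_step_gaussian(gamma, X):
    """mu_k and Sigma_k updates for Gaussian emissions; X is (N, D)."""
    Nk = gamma.sum(axis=0)                     # effective count per state
    mu = (gamma.T @ X) / Nk[:, None]           # gamma-weighted means
    K, D = mu.shape
    Sigma = np.zeros((K, D, D))
    for k in range(K):
        diff = X - mu[k]
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return mu, Sigma
```

With hard (one-hot) responsibilities these updates reduce to per-state sample means and covariances, which is a quick sanity check.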


Forward-backward Algorithm

Probability for a single latent variable (parameter estimation):
\gamma(z_n) = p(z_n \mid X) = \frac{p(X \mid z_n) p(z_n)}{p(X)} = \frac{\alpha(z_n) \beta(z_n)}{p(X)}
where \alpha(z_n) \equiv p(x_1, \ldots, x_n, z_n) and \beta(z_n) \equiv p(x_{n+1}, \ldots, x_N \mid z_n).

Probability for two successive latent variables:
\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X) = \frac{p(X \mid z_{n-1}, z_n) p(z_{n-1}, z_n)}{p(X)}
= \frac{p(x_1, \ldots, x_{n-1}, z_{n-1}) p(x_n \mid z_n) p(x_{n+1}, \ldots, x_N \mid z_n) p(z_n \mid z_{n-1})}{p(X)}
= \frac{\alpha(z_{n-1}) p(x_n \mid z_n) p(z_n \mid z_{n-1}) \beta(z_n)}{p(X)}


Forward & Backward Variables

Defining \alpha and \beta recursively:
\alpha(z_n) = p(x_1, \ldots, x_n, z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \alpha(z_{n-1}) p(z_n \mid z_{n-1})
\beta(z_n) = p(x_{n+1}, \ldots, x_N \mid z_n) = \sum_{z_{n+1}} \beta(z_{n+1}) p(x_{n+1} \mid z_{n+1}) p(z_{n+1} \mid z_n)

Probability of the observation sequence:
p(X \mid \theta) = \sum_{z_n} \alpha(z_n) \beta(z_n) \quad \text{(for any } n\text{)}
p(X \mid \theta) = \sum_{z_N} \alpha(z_N)
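The two recursions translate directly into array operations. Below is a minimal unscaled sketch for a discrete-emission HMM (all parameter values are illustrative; for long sequences the scaled recursion of Section 13.2.4 should be used instead, since these raw quantities underflow):

```python
import numpy as np

def forward_backward(x, pi, A, B):
    """Unscaled recursions:
    alpha[n, k] = p(x_1..x_n, z_n = k)
    beta[n, k]  = p(x_{n+1}..x_N | z_n = k)
    """
    N, K = len(x), len(pi)
    alpha = np.zeros((N, K))
    beta = np.ones((N, K))                       # beta(z_N) = 1
    alpha[0] = pi * B[:, x[0]]
    for n in range(1, N):
        alpha[n] = B[:, x[n]] * (alpha[n - 1] @ A)
    for n in range(N - 2, -1, -1):
        beta[n] = A @ (B[:, x[n + 1]] * beta[n + 1])
    return alpha, beta, alpha[-1].sum()          # p(X) = sum_{z_N} alpha(z_N)

pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
alpha, beta, pX = forward_backward([0, 1, 0], pi, A, B)
```

A useful consistency check: sum_k alpha[n, k] * beta[n, k] equals p(X) at every n, as the slide's formula states.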


Sum-product Algorithm

Factor graph representation (an alternative to the forward-backward algorithm; we condition on x_1, x_2, \ldots, x_N):
h(z_1) = p(z_1) p(x_1 \mid z_1)
f_n(z_{n-1}, z_n) = p(z_n \mid z_{n-1}) p(x_n \mid z_n)

Message passing gives the same result as before:
\mu_{z_{n-1} \to f_n}(z_{n-1}) = \mu_{f_{n-1} \to z_{n-1}}(z_{n-1})
\mu_{f_n \to z_n}(z_n) = \sum_{z_{n-1}} f_n(z_{n-1}, z_n) \mu_{f_{n-1} \to z_{n-1}}(z_{n-1})

Identifying \alpha(z_n) = \mu_{f_n \to z_n}(z_n) and \beta(z_n) = \mu_{f_{n+1} \to z_n}(z_n):
p(z_n, X) = \mu_{f_n \to z_n}(z_n) \mu_{f_{n+1} \to z_n}(z_n) = \alpha(z_n) \beta(z_n)
\gamma(z_n) = \frac{p(z_n, X)}{p(X)} = \frac{\alpha(z_n) \beta(z_n)}{p(X)}


Scaling Factors

(Implementation issue) The \alpha and \beta variables can go to zero exponentially quickly. What if we rescale them so that their values remain of order unity?

\hat{\alpha}(z_n) = p(z_n \mid x_1, \ldots, x_n) = \frac{\alpha(z_n)}{p(x_1, \ldots, x_n)}

c_n = p(x_n \mid x_1, \ldots, x_{n-1}), \quad \text{so that} \quad p(x_1, \ldots, x_n) = \prod_{m=1}^{n} c_m

c_n \hat{\alpha}(z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \hat{\alpha}(z_{n-1}) p(z_n \mid z_{n-1})

\hat{\beta}(z_n) = \frac{p(x_{n+1}, \ldots, x_N \mid z_n)}{p(x_{n+1}, \ldots, x_N \mid x_1, \ldots, x_n)}

c_{n+1} \hat{\beta}(z_n) = \sum_{z_{n+1}} \hat{\beta}(z_{n+1}) p(x_{n+1} \mid z_{n+1}) p(z_{n+1} \mid z_n)
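In code, the rescaled recursion normalizes \alpha at every step and accumulates ln p(X) = \sum_n ln c_n, which stays finite where the unscaled forward variable underflows. A minimal sketch with illustrative parameters:

```python
import numpy as np

def scaled_forward(x, pi, A, B):
    """Scaled forward pass.

    alpha_hat[n, k] = p(z_n = k | x_1..x_n),  c[n] = p(x_n | x_1..x_{n-1}),
    so p(x_1..x_n) = prod_m c_m and ln p(X) = sum_n ln c_n.
    """
    N, K = len(x), len(pi)
    alpha_hat = np.zeros((N, K))
    c = np.zeros(N)
    a = pi * B[:, x[0]]
    c[0] = a.sum()
    alpha_hat[0] = a / c[0]
    for n in range(1, N):
        a = B[:, x[n]] * (alpha_hat[n - 1] @ A)
        c[n] = a.sum()                  # normalizer = p(x_n | x_1..x_{n-1})
        alpha_hat[n] = a / c[n]
    return alpha_hat, np.log(c).sum()   # second value is ln p(X)

pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
alpha_hat, ll = scaled_forward([0, 1, 0], pi, A, B)
```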


The Viterbi Algorithm

From the max-sum algorithm (most likely state sequence):
\omega(z_1) = \ln p(z_1) + \ln p(x_1 \mid z_1)
\omega(z_{n+1}) = \ln p(x_{n+1} \mid z_{n+1}) + \max_{z_n} \{ \ln p(z_{n+1} \mid z_n) + \omega(z_n) \}

Joint distribution along the most probable path:
\omega(z_n) = \max_{z_1, \ldots, z_{n-1}} \ln p(x_1, \ldots, x_n, z_1, \ldots, z_n)

Backtracking the most probable path (Eq. 13.68 revised):
k_n^{\max} = \psi(k_{n+1}^{\max})
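The \omega recursion and \psi backtracking map directly onto two passes over the sequence: a forward pass storing the best predecessor of each state, then a backward trace. A minimal sketch in log space (all parameter values are illustrative):

```python
import numpy as np

def viterbi(x, pi, A, B):
    """Most probable state path via the omega recursion, with
    backtracking k_n^max = psi(k_{n+1}^max)."""
    N, K = len(x), len(pi)
    omega = np.zeros((N, K))
    psi = np.zeros((N, K), dtype=int)
    omega[0] = np.log(pi) + np.log(B[:, x[0]])
    for n in range(1, N):
        scores = omega[n - 1][:, None] + np.log(A)   # scores[j, k]
        psi[n] = scores.argmax(axis=0)               # best predecessor j of k
        omega[n] = np.log(B[:, x[n]]) + scores.max(axis=0)
    path = np.zeros(N, dtype=int)
    path[-1] = omega[-1].argmax()
    for n in range(N - 2, -1, -1):                   # backtrack
        path[n] = psi[n + 1, path[n + 1]]
    return path, omega[-1].max()

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
path, logp = viterbi([0, 0, 1, 1], pi, A, B)
```

With these sticky transitions and near-diagonal emissions, the decoded path tracks the observed symbols.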


Extensions of HMM

Autoregressive HMM: accounts for longer-term time dependencies.

Input-output HMM: for supervised learning.

Factorial HMM: for decoding multiple bits of information.


References

HMM
L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition."

ETC
http://en.wikipedia.org/wiki/Expectation-maximization_algorithm
http://en.wikipedia.org/wiki/Lagrange_multipliers

