
EE613 Machine Learning for Engineers

HIDDEN MARKOV MODELS

Sylvain Calinon, Robot Learning & Interaction Group

Idiap Research Institute, Dec. 5, 2019

Outline

2

• Markov models

• Hidden Markov model (HMM)

• Forward-backward algorithm

• Viterbi decoding (dynamic programming)

• Hidden semi-Markov model (HSMM)

• HMM with dynamic features (Trajectory-HMM)

Markov models

3

Markov models - Parameters

4

K possible states
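The parameter equations on this slide did not survive extraction; as a standard-notation sketch (symbols assumed, not taken from the slide), a Markov model over K states is fully specified by an initial state distribution and a K×K transition matrix:

```latex
\pi_i = P(s_1 = i), \qquad
a_{ij} = P(s_t = j \mid s_{t-1} = i), \qquad
\sum_{j=1}^{K} a_{ij} = 1 \quad \text{for all } i.
```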

Markov models in language modeling

5

Markov models in language modeling

6

Markov models in language modeling

7

Example of text generated from a 4-gram model, trained on a corpus of 400 million words.

The first 4 words are specified by hand, the model generates the 5th word, and then the results are fed back into the model.

Source: http://www.fit.vutbr.cz/~imikolov/rnnlm/gen-4gram.txt

SAYS IT’S NOT IN THE CARDS LEGENDARY RECONNAISSANCE BY ROLLIE DEMOCRACIES UNSUSTAINABLE COULD STRIKE REDLINING VISITS TO PROFIT BOOKING WAIT HERE AT MADISON SQUARE GARDEN COUNTY COURTHOUSE WHERE HE HAD BEEN DONE IN THREE ALREADY IN ANY WAY IN WHICH A TEACHER …
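A minimal sketch of the generation procedure described above, assuming a 4-gram model stored as a dictionary that maps a 3-word context to next-word probabilities (the model contents below are hypothetical placeholders for illustration, not the actual 400-million-word model):

```python
import random

# Hypothetical 4-gram model: maps a 3-word context to candidate next words
# and their probabilities (placeholder values, for illustration only).
model = {
    ("SAYS", "IT'S", "NOT"): {"IN": 0.7, "A": 0.3},
    ("IT'S", "NOT", "IN"): {"THE": 0.9, "MY": 0.1},
    # ... in a real model these tables are estimated from corpus counts
}

def generate(seed, length=20):
    """Feed the last 3 generated words back into the model at each step."""
    words = list(seed)  # the first words are specified by hand
    for _ in range(length):
        context = tuple(words[-3:])
        candidates = model.get(context)
        if candidates is None:   # unseen context: stop (a real model would back off)
            break
        next_words = list(candidates)
        probs = list(candidates.values())
        words.append(random.choices(next_words, weights=probs)[0])
    return " ".join(words)

print(generate(("SAYS", "IT'S", "NOT", "IN")))
```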

MLE of transition matrix in Markov models

8
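The derivation on this slide was lost in extraction; the standard result (notation assumed) is that the maximum-likelihood estimate of each transition probability is a normalized transition count:

```latex
\hat{a}_{ij} \;=\; \frac{c_{ij}}{\sum_{k=1}^{K} c_{ik}},
\qquad c_{ij} = \text{number of observed transitions from state } i \text{ to state } j.
```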

Hidden Markov model (HMM)

Python notebook: demo_HMM.ipynb

Matlab code: demo_HMM01.m

9

In a Markov chain, the state is directly visible to the observer → the transition probabilities are the only parameters.

In an HMM, the state is not directly visible, but an output dependent on the state is visible.

Hidden Markov model (HMM)

10

Hidden states

Observed output (emission probability)

Image adapted from Wikipedia

Initial state: All we know is that 60% of the days are rainy on average.

The transition probability represents the change of the weather in the underlying Markov chain. Here, there is a 30% chance that tomorrow will be sunny if today is rainy.

Hidden Markov model (HMM)

11

Hidden states

Observed output (emission probability)

Image adapted from Wikipedia

You can think of an HMM either as:

• a Markov chain with stochastic measurements

• a GMM with latent variables changing over time

The emission probability represents how likely it is that Bob performs a certain activity on each day. If it is sunny, there is a 60% chance that he is outside for a walk. If it is rainy, there is a 50% chance that he cleans his apartment, etc.
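A minimal Python sketch of this rainy/sunny example with discrete emission tables; the 60% rainy prior, the 30% rainy-to-sunny transition, the 60% walk-if-sunny and 50% clean-if-rainy figures come from the two slides above, while the remaining entries are hypothetical placeholders chosen only so that each row sums to one:

```python
states = ["rainy", "sunny"]
activities = ["walk", "shop", "clean"]

# Initial state distribution: 60% of days are rainy on average (from the slide).
init = {"rainy": 0.6, "sunny": 0.4}

# Transition probabilities: 30% chance tomorrow is sunny if today is rainy
# (from the slide); the other entries are assumed values completing the rows.
trans = {
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},   # assumed
}

# Emission probabilities: 60% walk if sunny, 50% clean if rainy (from the slide);
# the remaining entries are placeholders.
emit = {
    "rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
```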

Inference problems associated with HMMs

12

Emission/output distributions in HMM

Discrete tables (over symbols V1, V2, V3)

Gaussian distribution

Mixture of Gaussians

13

Transition matrix structures in HMM

14

HMM - Examples of application

15

HMMs are used in many fields as a tool for time series or sequence analysis, and in fields where the goal is to recover a data sequence that is not immediately observable:

• Speech recognition
• Speech synthesis
• Part-of-speech tagging
• Natural language modeling
• Machine translation
• Gene prediction
• Molecule kinetic analysis
• DNA motif discovery
• Alignment of bio-sequences (e.g., proteins)
• Metamorphic virus detection
• Document separation in scanning solutions
• Cryptanalysis
• Activity recognition
• Protein folding
• Human motion science
• Online handwriting recognition
• Robotics

Automatic speech recognition

ξt can represent features extracted from the speech signal, and st can represent the word being spoken. The transition model P(st|st-1) represents the language model, and the observation model P(ξt|st) represents the acoustic model.

Part-of-speech tagging

ξt can represent a word, and st represents its part of speech (noun, verb, adjective, etc.).

Activity recognition

ξt can represent features extracted from a video frame, and st is the class of activity the person is engaged in (e.g., running, walking, sitting, etc.).

Gene finding

ξt can represent the DNA nucleotides (A, T, G, C), and st can represent whether we are inside a gene-coding region or not.

16

HMM - Examples of application (ξt: observation, st: hidden state)

GMM

HMM

HMM parameters

From now on, we will consider a single Gaussian as the state output distribution.

17

Useful intermediary variables in HMM

18

Forward variable

Backward variable

Smoothed node marginals

Smoothed edge marginals

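The defining equations for these four quantities were lost in extraction; their standard definitions (notation assumed: observations ξ1,…,ξT, hidden states st ∈ {1,…,K}) are:

```latex
\begin{align*}
\alpha_t(i)  &= P(\xi_1,\dots,\xi_t,\; s_t = i)                   && \text{(forward variable)} \\
\beta_t(i)   &= P(\xi_{t+1},\dots,\xi_T \mid s_t = i)             && \text{(backward variable)} \\
\gamma_t(i)  &= P(s_t = i \mid \xi_1,\dots,\xi_T)                 && \text{(smoothed node marginal)} \\
\zeta_t(i,j) &= P(s_t = i,\, s_{t+1} = j \mid \xi_1,\dots,\xi_T)  && \text{(smoothed edge marginal)}
\end{align*}
```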

Forward algorithm

19

Forward algorithm

20

Forward algorithm

21

[Graphical model: hidden states s1, s2, s3, s4 with observations ξ1, ξ2, ξ3, ξ4]

Forward algorithm

22

[Graphical model: hidden states s1, s2, s3, s4 with observations ξ1, ξ2, ξ3, ξ4]
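The recursion itself did not survive extraction; in standard form (notation assumed, with emission density p(ξt | st = i)), the forward variable is computed left to right as:

```latex
\alpha_1(i) = \pi_i \, p(\xi_1 \mid s_1 = i),
\qquad
\alpha_t(j) = p(\xi_t \mid s_t = j) \sum_{i=1}^{K} \alpha_{t-1}(i)\, a_{ij},
\quad t = 2,\dots,T.
```

The likelihood of the whole observation sequence then follows as the sum of the final forward variables, P(ξ1,…,ξT) = Σi αT(i).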

Forward algorithm

23

[Figure: forward variable computed with transition probabilities 0.94 and 0.06.]

Low influence of transition probabilities w.r.t. emission probabilities in HMM

[Figure: forward variable computed with the transition probabilities swapped (0.06 and 0.94).]

24

Learned transition probabilities vs. transition probabilities manually set. The direction of motion is indicated in the figure; the color of each datapoint corresponds to the value of the forward variable α.

Useful intermediary variables in HMM

25

Forward variable

Backward variable

Smoothed node marginals

Smoothed edge marginals

26

Backward algorithm


Backward algorithm

27
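Correspondingly, a standard-form sketch of the backward recursion (same assumed notation), computed right to left:

```latex
\beta_T(i) = 1,
\qquad
\beta_t(i) = \sum_{j=1}^{K} a_{ij}\, p(\xi_{t+1} \mid s_{t+1} = j)\, \beta_{t+1}(j),
\quad t = T-1,\dots,1.
```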

Useful intermediary variables in HMM

28

Forward variable

Backward variable

Smoothed node marginals

Smoothed edge marginals

These variables are sometimes called "smoothed values" because they combine forward and backward probabilities in the computation.

You can think of their roles as passing "messages" from left to right, and from right to left, and then combining the information at each node.

Smoothed node marginals

29

Smoothed node marginals

30

Conditional independence property

[Graphical model: hidden states s1, s2, s3, s4 with observations ξ1, ξ2, ξ3, ξ4]

Smoothed node marginals

31

These variables are sometimes called "smoothed values" because they combine forward and backward probabilities in the computation.

You can think of their roles as passing "messages" from left to right, and from right to left, and then combining the information at each node.
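Combining the two messages at each node gives the usual expression for the smoothed node marginal (assumed notation):

```latex
\gamma_t(i) = P(s_t = i \mid \xi_1,\dots,\xi_T)
            = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{k=1}^{K} \alpha_t(k)\,\beta_t(k)}.
```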

Useful intermediary variables in HMM

32

Forward variable

Backward variable

Smoothed node marginals

Smoothed edge marginals

Smoothed edge marginals


Smoothed edge marginals

34

Smoothed edge marginals

35
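A standard-form sketch of the smoothed edge marginal in terms of the forward and backward variables (assumed notation):

```latex
\zeta_t(i,j) = P(s_t = i,\, s_{t+1} = j \mid \xi_1,\dots,\xi_T)
             = \frac{\alpha_t(i)\, a_{ij}\, p(\xi_{t+1} \mid s_{t+1} = j)\, \beta_{t+1}(j)}
                    {\sum_{k=1}^{K}\sum_{l=1}^{K} \alpha_t(k)\, a_{kl}\, p(\xi_{t+1} \mid s_{t+1} = l)\, \beta_{t+1}(l)}.
```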

EM for HMM

Similar to Markov models

Similar to GMM

K Gaussians, M trajectories, Tm points per trajectory

36

EM for HMM

37

K Gaussians, M trajectories, Tm points per trajectory

EM for HMM - Summary

38

K Gaussians, M trajectories, Tm points per trajectory

EM for HMM - Summary

39

These results can be formally derived with EM (also called the Baum-Welch algorithm in the context of HMMs).

The update rules can be interpreted as normalized counts, with several types of weighted averages required in the computation.

K Gaussians, M trajectories, Tm points per trajectory
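For reference, a sketch of the update rules alluded to above, written as normalized counts and weighted averages for a single trajectory with Gaussian emissions (standard Baum-Welch form, notation assumed; with M trajectories the sums simply run over all trajectories as well):

```latex
\hat{\pi}_i = \gamma_1(i),
\qquad
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \zeta_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)},
\qquad
\hat{\mu}_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\, \xi_t}{\sum_{t=1}^{T} \gamma_t(i)},
\qquad
\hat{\Sigma}_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\,(\xi_t - \hat{\mu}_i)(\xi_t - \hat{\mu}_i)^{\top}}{\sum_{t=1}^{T} \gamma_t(i)}.
```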

Numerical underflow issue in HMM

40

Numerical underflow issue in HMM

41

This issue is sometimes not covered in textbooks, although it is very important for practical implementations of HMMs!
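A minimal Python sketch of the usual remedy, rescaling the forward variable at every time step and accumulating the log-likelihood from the normalizers (array shapes and variable names are assumptions, not the notation of the slides):

```python
import numpy as np

def scaled_forward(pi, A, emission_lik):
    """Forward pass with per-step rescaling to avoid numerical underflow.

    pi:           (K,)   initial state probabilities
    A:            (K, K) transition matrix, A[i, j] = P(s_t = j | s_{t-1} = i)
    emission_lik: (T, K) emission likelihoods p(xi_t | s_t = k)
    Returns the rescaled forward variables (T, K) and the data log-likelihood.
    """
    T, K = emission_lik.shape
    alpha = np.zeros((T, K))
    loglik = 0.0

    alpha[0] = pi * emission_lik[0]
    for t in range(T):
        if t > 0:
            alpha[t] = emission_lik[t] * (alpha[t - 1] @ A)
        c = alpha[t].sum()        # normalizer for this time step
        alpha[t] /= c             # rescale so that alpha[t] sums to 1
        loglik += np.log(c)       # accumulate the log-likelihood in the log domain

    return alpha, loglik
```

The same rescaling constants can be reused in the backward pass, so that the smoothed marginals are unaffected by the normalization.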

Summary - Why did we introduce these four intermediary variables in HMM?

Forward variable

Backward variable

Smoothed node marginals

Smoothed edge marginals

42

Summary - Why did we introduce these four intermediary variables in HMM?

43

Viterbi decoding (MAP vs MPE estimates)

Python notebook: demo_HMM.ipynb

Matlab code: demo_HMM_Viterbi01.m

44

Maximum a posteriori Most probable explanation

Viterbi decoding (MAP vs MPE estimates)

45

Maximum a posteriori Most probable explanation

Viterbi decoding - Trellis representation

46

Viterbi decoding - Algorithm

This is the probability of ending up in state i at time step t by taking the most probable path

It tells us the most likely previous state on the most probable path to st = i

47
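A minimal Python sketch of this dynamic-programming recursion (working directly with probabilities for readability; variable names are assumptions, with `delta` and `psi` corresponding to the two quantities described above):

```python
import numpy as np

def viterbi(pi, A, emission_lik):
    """Most probable hidden state sequence by dynamic programming.

    pi:           (K,)   initial state probabilities
    A:            (K, K) transition matrix
    emission_lik: (T, K) emission likelihoods p(xi_t | s_t = k)
    """
    T, K = emission_lik.shape
    delta = np.zeros((T, K))           # prob. of the most probable path ending in state j at time t
    psi = np.zeros((T, K), dtype=int)  # most likely previous state on that path

    delta[0] = pi * emission_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A    # (K, K): path probability for each transition i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * emission_lik[t]

    # Backtrack from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```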

Viterbi decoding - Trellis representation

48

Viterbi decoding - Example

Image adapted from Kevin P. Murphy (2012), Machine Learning: A Probabilistic Perspective

49

Numerical underflow issue in Viterbi

50

Numerical underflow issue in Viterbi

51
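Since Viterbi only involves products and maxima, the usual fix (a sketch, assumed notation) is to run the whole recursion in the log domain, where products become sums and no rescaling constants are needed:

```latex
\log \delta_t(j) = \log p(\xi_t \mid s_t = j) + \max_{i} \big[ \log \delta_{t-1}(i) + \log a_{ij} \big].
```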

Hidden semi-Markov model (HSMM)

Python notebook: demo_HSMM.ipynb

Matlab code: demo_HSMM01.m

52

By artificially duplicating the number of states while keeping the same emission distribution, other state duration distributions can be modeled.

State duration probability in standard HMM

53

The state duration follows a geometric distribution
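Concretely, with self-transition probability a_ii, staying exactly d steps in state i means taking d-1 self-transitions and then leaving, so the implicit duration model of a standard HMM is geometric (assumed notation):

```latex
P(d \mid s = i) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots
```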

54

Another approach is to provide an explicit model of the state duration instead of relying on self-transition probabilities.

Hidden semi-Markov model (HSMM)

GMM

HMM

HSMM

Hidden semi-Markov model (HSMM)

Parametric duration distribution

Hidden semi-Markov model (HSMM)

56

Hidden semi-Markov model (HSMM)

57

Hidden semi-Markov model (HSMM)

58

HMM with dynamic features

(Trajectory-HMM)

Matlab code: demo_trajHSMM01.m

59

HMM with dynamic features

60

61

HMM with dynamic features

62

HMM with dynamic features

63

HMM with dynamic features

64

(C=3 here)

D dimensions, C derivatives, T time steps

HMM with dynamic features

Large sparse matrix

65

HMM with dynamic features

66

HMM with dynamic features

67

Weighted Least Squares!
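A hedged sketch of that weighted-least-squares step (symbols assumed, since the slide equations were lost): stacking the static features x into the static-plus-derivative features ξ = Φx with the large sparse matrix Φ, and taking the Gaussian parameters μ, Σ given by the HMM state sequence, the smooth trajectory is the minimizer of the weighted quadratic error:

```latex
\hat{x} = \arg\min_{x} \;(\Phi x - \mu)^{\top} \Sigma^{-1} (\Phi x - \mu)
        = \big(\Phi^{\top} \Sigma^{-1} \Phi\big)^{-1} \Phi^{\top} \Sigma^{-1} \mu .
```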


HMM with dynamic features

68

HMM with dynamic features - Summary

69

References

70

Hidden Markov model (HMM)

L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, February 1989

Hidden semi-Markov model (HSMM)

S.-Z. Yu. Hidden semi-Markov models. Artificial Intelligence, 174:215–243, 2010

S. E. Levinson. Continuously variable duration hidden Markov models for automatic speech recognition. Computer Speech & Language, 1(1):29–45, 1986

HMM with dynamic features (Trajectory HMM)

S. Furui. Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. on Acoustics, Speech, and Signal Processing, 34(1):52–59, 1986

H. Zen, K. Tokuda, and T. Kitamura. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Computer Speech and Language, 21(1):153–173, 2007

Appendix

71

72

Markov models - Transition matrix

MLE of transition matrix in Markov models

73

HMM: Smoothed edge marginals

74

Conditional independence property

[Graphical model: hidden states s1, s2, s3, s4 with observations ξ1, ξ2, ξ3, ξ4]

HSMM: Initialization of forward variable

75