Hidden Markov Models
Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi
Sequential Data
• Time-series: Stock
market, weather, speech,
video
• Ordered: Text, genes
Sequential Data: Tracking
Observe noisy measurements of missile location
Where is the missile now? Where will it be in 1 minute?
Sequential Data: Weather
• Predict the weather tomorrow
using previous information
• If it rained yesterday, and the
previous day and historically it
has rained 7 times in the past
10 years on this date — does
this affect my prediction?
Sequential Data: Weather
• Use the product rule for the joint distribution of a sequence:
P(x1, ..., xT) = P(x1) P(x2 | x1) ... P(xT | x1, ..., xT-1)
• How do I solve this?
• Model how weather changes over time
• Model how observations are produced
• Reason about the model
Markov Chain
• Set S is called the state space
• Process moves from one state to another generating a
sequence of states: x1, x2, …, xt
• Markov chain property: the probability of each subsequent
state depends only on the previous state:
P(xt | x1, ..., xt-1) = P(xt | xt-1)
Markov Chain: Parameters
• State transition matrix A (|S| x |S|)
A is a stochastic matrix (all rows sum to one)
Time-homogeneous Markov chain: the transition probability between two states does not depend on time
• Initial (prior) state probabilities
• Two states: ‘Rain’ and ‘Dry’.
• Transition matrix (rows = current state Rain, Dry; columns = next state Rain, Dry):
  A = [0.3 0.7; 0.2 0.8]
• Transition probabilities: P(‘Rain’|‘Rain’)=0.3, P(‘Dry’|‘Rain’)=0.7,
  P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8
• Initial probabilities: P(‘Rain’)=0.4, P(‘Dry’)=0.6
Example of Markov Model
Example: Weather Prediction
• Compute probability of tomorrow’s weather using Markov property
• Evaluation: given today is dry, what’s the probability that tomorrow is dry and the next day is rainy?
• Learning: given some observations, determine the transition probabilities
P({‘Dry’,’Dry’,’Rain’})
= P(‘Rain’|’Dry’) P(‘Dry’|’Dry’) P(‘Dry’)
= 0.2 * 0.8 * 0.6 = 0.096
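The chain-rule computation above can be checked in a few lines of Python (a sketch using the slide's initial and transition probabilities):

```python
# Markov chain from the weather example: P(x1..xT) = P(x1) * prod_t P(x_t | x_{t-1})
init = {'Rain': 0.4, 'Dry': 0.6}
trans = {('Rain', 'Rain'): 0.3, ('Rain', 'Dry'): 0.7,
         ('Dry', 'Rain'): 0.2, ('Dry', 'Dry'): 0.8}

def sequence_prob(states):
    """Probability of a state sequence under the chain rule + Markov property."""
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[(prev, cur)]
    return p

print(sequence_prob(['Dry', 'Dry', 'Rain']))  # 0.6 * 0.8 * 0.2 ≈ 0.096
```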
Hidden Markov Model (HMM)
• Stochastic model where
the states of the model
are hidden
• Each state can emit an
output which is observed
HMM: Parameters
• State transition matrix A
• Emission / observation
conditional output probabilities B
• Initial (prior) state probabilities
(Diagram: hidden states ‘Low’ and ‘High’ with transition matrix A = [0.3 0.7; 0.2 0.8];
emission probabilities B: P(‘Rain’|‘Low’)=0.6, P(‘Dry’|‘Low’)=0.4,
P(‘Rain’|‘High’)=0.4, P(‘Dry’|‘High’)=0.6)
Example of Hidden Markov Model
• Two states : ‘Low’ and ‘High’ atmospheric pressure.
• Two observations : ‘Rain’ and ‘Dry’.
• Transition probabilities:
P(‘Low’|‘Low’)=0.3 , P(‘High’|‘Low’)=0.7
P(‘Low’|‘High’)=0.2, P(‘High’|‘High’)=0.8
• Observation probabilities :
P(‘Rain’|‘Low’)=0.6 , P(‘Dry’|‘Low’)=0.4
P(‘Rain’|‘High’)=0.4 , P(‘Dry’|‘High’)=0.6
• Initial probabilities:
P(‘Low’)=0.4 , P(‘High’)=0.6
Example of Hidden Markov Model
• Suppose we want to calculate the probability of a sequence of
observations in our example, {‘Dry’,’Rain’}.
• Consider all possible hidden state sequences:
P({‘Dry’,’Rain’} ) = P({‘Dry’,’Rain’} , {‘Low’,’Low’}) +
P({‘Dry’,’Rain’} , {‘Low’,’High’}) + P({‘Dry’,’Rain’} ,
{‘High’,’Low’}) + P({‘Dry’,’Rain’} , {‘High’,’High’})
where the first term is:
P({‘Dry’,’Rain’} , {‘Low’,’Low’})
= P({‘Dry’,’Rain’} | {‘Low’,’Low’}) P({‘Low’,’Low’})
= P(‘Dry’|’Low’) P(‘Rain’|’Low’) P(‘Low’) P(‘Low’|’Low’)
= 0.4 * 0.6 * 0.4 * 0.3 = 0.0288
Calculation of observation sequence probability
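The sum over hidden state sequences can be checked by brute-force enumeration (a sketch; P(‘Dry’|‘High’) is taken as 0.6 so the emission probabilities normalize):

```python
from itertools import product

# Low/High pressure HMM from the slides
init = {'Low': 0.4, 'High': 0.6}
trans = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
         ('High', 'Low'): 0.2, ('High', 'High'): 0.8}
emit = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
        ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}

def brute_force_likelihood(obs):
    """Sum the joint P(obs, states) over all K^T hidden state sequences."""
    total = 0.0
    for states in product(init, repeat=len(obs)):
        p = init[states[0]] * emit[(states[0], obs[0])]
        for t in range(1, len(obs)):
            p *= trans[(states[t - 1], states[t])] * emit[(states[t], obs[t])]
        total += p
    return total

print(brute_force_likelihood(['Dry', 'Rain']))  # sums the four terms above
```

The ‘Low’,‘Low’ term of this sum is exactly the 0.4 * 0.6 * 0.4 * 0.3 product worked out on the slide.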
Example: Dishonest Casino
• A casino has two dice that it switches between with
5% probability
• Fair die
• Loaded die
Example: Dishonest Casino
• Initial probabilities
• State transition matrix
• Emission probabilities
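A sketch of the dishonest-casino generative process. The 5% switch probability comes from the slides; the loaded die's distribution (P(6) = 0.5) and the uniform initial state are assumed illustrative values, since the slides leave these parameters unspecified:

```python
import random

# Transition matrix: stay with the current die with probability 0.95
TRANS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.05, 'L': 0.95}}
# Emission probabilities: fair die uniform; loaded die favors six (assumed value)
EMIT = {'F': [1 / 6] * 6,
        'L': [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]}

def roll_sequence(n, seed=0):
    """Sample n rolls: hidden die states and observed faces."""
    rng = random.Random(seed)
    state = rng.choice(['F', 'L'])  # assumed uniform initial distribution
    states, rolls = [], []
    for _ in range(n):
        states.append(state)
        rolls.append(rng.choices(range(1, 7), weights=EMIT[state])[0])
        state = rng.choices(list(TRANS[state]),
                            weights=list(TRANS[state].values()))[0]
    return states, rolls
```

The three questions on the next slide (evaluation, decoding, learning) are exactly the tasks of recovering this hidden `states` sequence and these parameters from `rolls` alone.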
Example: Dishonest Casino
• Given a sequence of rolls by the casino player
• How likely is this sequence given our model of how the casino works? – evaluation problem
• What sequence portion was generated with the fair die, and what portion with the loaded die? – decoding problem
• How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded and back? – learning problem
HMM: Problems
• Evaluation: Given parameters and observation sequence,
find probability (likelihood) of observed sequence
- forward algorithm
• Decoding: Given HMM parameters and observation
sequence, find the most probable sequence of hidden states
- Viterbi algorithm
• Learning: Given an HMM with unknown parameters and an
observation sequence, find the parameters that maximize the
likelihood of the data
- Forward-Backward algorithm
HMM: Evaluation Problem
• Given HMM parameters (A, B, π) and an observation sequence o1, ..., oT
• Compute the probability (likelihood) of the observed sequence, P(o1, ..., oT)
Summing over all possible hidden state values at all
times gives K^T terms: exponential in T
[Figure: trellis representation of an HMM. Each time step 1, ..., t, t+1, ..., T has a column of states s1, ..., sK; edges between consecutive columns carry transition probabilities a_ij; the observations o1, ..., oT sit below the columns.]
HMM: Forward Algorithm
• Instead, pose it as a recursive problem
• Dynamic program to compute the forward probability
  α_t(k) = P(o1, ..., ot, St = k): the probability of being in state k
  after observing the first t observations
• Algorithm:
  - Initialize (t = 1): α_1(k) = π_k b_k(o1)
  - Iterate with the recursion (t = 2, ..., T): α_t(k) = b_k(ot) Σ_i α_{t-1}(i) a_ik
  - Terminate (t = T): P(o1, ..., oT) = Σ_k α_T(k)
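A minimal forward-algorithm sketch on the Low/High example (P(‘Dry’|‘High’) taken as 0.6 so the emissions normalize):

```python
# Low/High pressure HMM from the slides
init = {'Low': 0.4, 'High': 0.6}
trans = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
         ('High', 'Low'): 0.2, ('High', 'High'): 0.8}
emit = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
        ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}
STATES = list(init)

def forward(obs):
    """Forward probabilities alpha[t][k] = P(o_1..o_t, S_t = k)."""
    # initialize: alpha_1(k) = pi_k * b_k(o_1)
    alpha = [{k: init[k] * emit[(k, obs[0])] for k in STATES}]
    # recurse: alpha_t(k) = b_k(o_t) * sum_i alpha_{t-1}(i) * a_ik
    for t in range(1, len(obs)):
        alpha.append({k: emit[(k, obs[t])] *
                         sum(alpha[-1][i] * trans[(i, k)] for i in STATES)
                      for k in STATES})
    return alpha

# terminate: P(o_1..o_T) = sum_k alpha_T(k)
print(sum(forward(['Dry', 'Rain'])[-1].values()))  # ≈ 0.232
```

This matches the brute-force sum over hidden sequences, but the table fill costs only O(K^2 T) instead of O(K^T).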
HMM: Problems
• Evaluation: Given parameters and observation sequence,
find probability (likelihood) of observed sequence
- forward algorithm
• Decoding: Given HMM parameters and observation
sequence, find the most probable sequence of hidden states
- Viterbi algorithm
• Learning: Given an HMM with unknown parameters and an
observation sequence, find the parameters that maximize the
likelihood of the data
- Forward-Backward algorithm
HMM: Decoding Problem 1
• Given HMM parameters and an observation sequence o1, ..., oT
• Probability that the hidden state at time t was k:
  P(St = k, o1, ..., oT) = α_t(k) β_t(k)
• We know how to compute the first factor, α_t(k), using the
  forward algorithm
HMM: Backward Probability
• Similar to the forward probability, the backward probability
  β_t(k) = P(o_{t+1}, ..., oT | St = k) can be expressed as a recursion
• Dynamic program:
  - Initialize: β_T(k) = 1
  - Iterate with the recursion (t = T-1, ..., 1): β_t(k) = Σ_j a_kj b_j(o_{t+1}) β_{t+1}(j)
HMM: Decoding Problem 1
• Probability that the hidden state at time t was k:
  γ_t(k) = P(St = k | o1, ..., oT) = α_t(k) β_t(k) / Σ_j α_t(j) β_t(j)
• Most likely state assignment at time t: argmax_k γ_t(k)
• Computing all the γ_t(k) this way is the forward-backward algorithm
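A sketch of the forward-backward computation on the Low/High example (P(‘Dry’|‘High’) taken as 0.6 so the emissions normalize):

```python
# Low/High pressure HMM from the slides
init = {'Low': 0.4, 'High': 0.6}
trans = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
         ('High', 'Low'): 0.2, ('High', 'High'): 0.8}
emit = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
        ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}
STATES = list(init)

def forward(obs):
    """alpha[t][k] = P(o_1..o_t, S_t = k)."""
    alpha = [{k: init[k] * emit[(k, obs[0])] for k in STATES}]
    for t in range(1, len(obs)):
        alpha.append({k: emit[(k, obs[t])] *
                         sum(alpha[-1][i] * trans[(i, k)] for i in STATES)
                      for k in STATES})
    return alpha

def backward(obs):
    """beta[t][k] = P(o_{t+1}..o_T | S_t = k)."""
    beta = [{k: 1.0 for k in STATES}]  # initialize: beta_T(k) = 1
    # recurse backwards: beta_t(k) = sum_j a_kj * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(len(obs) - 2, -1, -1):
        beta.insert(0, {k: sum(trans[(k, j)] * emit[(j, obs[t + 1])] * beta[0][j]
                               for j in STATES) for k in STATES})
    return beta

def posteriors(obs):
    """gamma[t][k] = P(S_t = k | o_1..o_T) = alpha_t(k) beta_t(k) / P(o_1..o_T)."""
    alpha, beta = forward(obs), backward(obs)
    evidence = sum(alpha[-1].values())
    return [{k: alpha[t][k] * beta[t][k] / evidence for k in STATES}
            for t in range(len(obs))]

gammas = posteriors(['Dry', 'Rain'])
# each gamma_t is a distribution over states, so it sums to one
```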
HMM: Decoding Problem 2
• Given HMM parameters and an observation sequence o1, ..., oT
• What is the most likely state sequence?
• Define V_t(k): the probability of the most likely sequence of
  hidden states ending in state St = k (jointly with o1, ..., ot)
HMM: Viterbi Algorithm
• Compute probability recursively over t
• Use dynamic programming again!
HMM: Viterbi Algorithm
• Initialize: V_1(k) = π_k b_k(o1)
• Iterate: V_t(k) = b_k(ot) max_i V_{t-1}(i) a_ik, storing the maximizing i as a back-pointer
• Terminate: max_k V_T(k)
• Traceback: follow the back-pointers from the best final state to recover the sequence
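A minimal Viterbi sketch on the Low/High example (P(‘Dry’|‘High’) taken as 0.6 so the emissions normalize):

```python
# Low/High pressure HMM from the slides
init = {'Low': 0.4, 'High': 0.6}
trans = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
         ('High', 'Low'): 0.2, ('High', 'High'): 0.8}
emit = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
        ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}
STATES = list(init)

def viterbi(obs):
    """Most likely hidden state sequence for the observations."""
    # initialize: V_1(k) = pi_k * b_k(o_1)
    V = [{k: init[k] * emit[(k, obs[0])] for k in STATES}]
    back = []
    for t in range(1, len(obs)):
        step, ptr = {}, {}
        for k in STATES:
            # V_t(k) = b_k(o_t) * max_i V_{t-1}(i) * a_ik, remember the argmax
            best = max(STATES, key=lambda i: V[-1][i] * trans[(i, k)])
            ptr[k] = best
            step[k] = V[-1][best] * trans[(best, k)] * emit[(k, obs[t])]
        V.append(step)
        back.append(ptr)
    # terminate at the best final state, then trace back
    path = [max(STATES, key=lambda k: V[-1][k])]
    for ptr in reversed(back):
        path.insert(0, ptr[path[0]])
    return path

print(viterbi(['Dry', 'Rain']))  # ['High', 'High']
```

The structure mirrors the forward algorithm exactly, with the sum over previous states replaced by a max plus a back-pointer.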
HMM: Computational Complexity
• What is the running time for the forward algorithm,
backward algorithm, and Viterbi?
O(K^2 T) vs O(K^T)!
HMM: Problems
• Evaluation: Given parameters and observation sequence,
find probability (likelihood) of observed sequence
- forward algorithm
• Decoding: Given HMM parameters and observation
sequence, find the most probable sequence of hidden states
- Viterbi algorithm
• Learning: Given an HMM with unknown parameters and an
observation sequence, find the parameters that maximize the
likelihood of the data
- Forward-Backward, Baum-Welch algorithm
HMM: Learning Problem
• Given only observations
• Find parameters that maximize likelihood
• Need to learn hidden state sequences as well
HMM: Baum-Welch (EM) Algorithm
• Randomly initialize parameters
• E-step: Fix parameters, find expected state assignment
Forward-backward
algorithm
HMM: Baum-Welch (EM) Algorithm
• Expected number of times in state i: Σ_{t=1}^{T} γ_t(i)
• Expected number of transitions out of state i: Σ_{t=1}^{T-1} γ_t(i)
• Expected number of transitions from state i to j: Σ_{t=1}^{T-1} ξ_t(i, j),
  where ξ_t(i, j) = P(St = i, St+1 = j | o1, ..., oT)
HMM: Baum-Welch (EM) Algorithm
• M-step: Fix expected state assignments, update
parameters
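In symbols, the standard Baum-Welch M-step re-estimates each parameter as a ratio of the expected counts from the E-step (γ_t(i) the state posterior, ξ_t(i, j) the transition posterior):

```latex
\hat{\pi}_i = \gamma_1(i), \qquad
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\hat{b}_i(o) = \frac{\sum_{t \,:\, o_t = o} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}
```

Each update is just "expected count of the event" divided by "expected count of the opportunity", so the E- and M-steps alternate until the likelihood stops improving.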
HMM: Problems
• Evaluation: Given parameters and observation sequence,
find probability (likelihood) of observed sequence
- forward algorithm
• Decoding: Given HMM parameters and observation
sequence, find the most probable sequence of hidden states
- Viterbi algorithm
• Learning: Given an HMM with unknown parameters and an
observation sequence, find the parameters that maximize the
likelihood of the data
- Forward-Backward (Baum-Welch) algorithm
HMM vs Linear Dynamical Systems
• HMM
• States are discrete
• Observations are discrete or continuous
• Linear dynamical systems
• Observations and states are multivariate Gaussians
• Can use Kalman Filters to solve
Linear State Space Models
• States & observations are Gaussian
• Kalman filter: (recursive) prediction and
update
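As a concrete illustration, a minimal scalar Kalman filter sketch. The model x_t = a·x_{t-1} + w, y_t = x_t + v with noise variances q and r is an assumed toy setup (not from the slides), showing only the recursive predict/update structure:

```python
def kalman_1d(ys, a=1.0, q=1.0, r=1.0, mu0=0.0, var0=1.0):
    """Filtered mean/variance of the hidden state after each observation."""
    mu, var = mu0, var0
    out = []
    for y in ys:
        # predict: propagate the state estimate through the dynamics
        mu_p, var_p = a * mu, a * a * var + q
        # update: blend prediction and observation via the Kalman gain
        gain = var_p / (var_p + r)
        mu = mu_p + gain * (y - mu_p)
        var = (1 - gain) * var_p
        out.append((mu, var))
    return out

estimates = kalman_1d([1.0, 1.0, 1.0])
```

This is the continuous-state analogue of the forward algorithm: the Gaussian (mu, var) plays the role of the α_t table, and both updates are recursive in t.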
More examples
• Location prediction
• Privacy preserving data monitoring
Next Location Prediction: Definitions
Source: A. Monreale, F. Pinelli, R. Trasarti, F. Giannotti. WhereNext: a Location Predictor on Trajectory Pattern Mining. KDD 2009
o Personalization
• Individual-based methods only utilize the history of one object to predict its
future locations.
• General-based methods use the movement history of other objects
additionally (e.g. similar objects or similar trajectories) to predict the object’s
future location.
Next Location Prediction: Classification of Methods
o Temporal Representation
• Location-series representations define trajectories as a set of
sequenced locations ordered in time.
• Fixed-interval time representations use a fixed time interval
between two consecutive locations
• Variable-interval time representations allow variable
transition times between sequenced locations
Next Location Prediction: Classification of Methods
o Spatial Representation
• Grid-based methods divide space into fixed-size cells which
can be simple rectangular regions
• Frequent/dense-region methods find regions using clustering
algorithms such as DBSCAN or hierarchical clustering.
• Semantic-based methods use semantic features of locations
in addition to the geographic information, e.g. home, bank,
school.
Next Location Prediction: Classification of Methods
o Mobility Learning Method
• Model-based methods (formulate the movement of moving objects
  using mathematical models):
  - Markov chains
  - Recursive Motion Function (Y. Tao et al., ACM SIGMOD 2004)
  - Semi-Lazy Hidden Markov Model (J. Zhou et al., ACM SIGKDD 2013)
  - Deep learning models
• Pattern-based methods (exploit pattern mining algorithms for prediction):
  - Trajectory Pattern Mining (A. Monreale et al., ACM SIGKDD 2009)
• Hybrid methods:
  - Recursive Motion Function + Sequential Pattern Mining (H. Jeung et al., ICDE 2008)
Next Location Prediction: Classification of Methods
Preliminary Results
[Figure: prediction error for different prediction lengths using (a) the Brinkhoff dataset and (b) the Periodical Synthetic dataset]