Hidden Markov Models: Terminology and Basic Algorithms
Motivation
We make predictions based on models of observed data (machine learning). A simple model assumes that the observations are independent and identically distributed (iid) ...
but this assumption is not always appropriate, e.g. (1) measurements of weather patterns, (2) daily values of stocks, (3) acoustic features in successive time frames used for speech recognition, (4) the composition of texts, (5) the composition of DNA, or ...
Markov Models
If the n'th observation in a chain of observations is influenced only by the (n-1)'th observation, i.e.

  p(xn | x1,...,xn-1) = p(xn | xn-1)

then the chain of observations is a 1st-order Markov chain, and the joint probability of a sequence of N observations is

  p(x1,...,xN) = p(x1) p(x2 | x1) p(x3 | x2) ... p(xN | xN-1)

If the distributions p(xn | xn-1) are the same for all n, then the chain of observations is a homogeneous 1st-order Markov chain ...
[Figure: a sequence of observations x1, x2, ..., xN and the model, i.e. p(xn | xn-1)]
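As a small illustration of the joint probability above, here is a minimal Python sketch for a homogeneous 1st-order Markov chain; the two-symbol weather alphabet and all probabilities are hypothetical example values, not taken from the lecture.

    # Hypothetical homogeneous 1st-order Markov chain over two symbols, 'R' (rain) and 'S' (sun).
    p_init = {'R': 0.5, 'S': 0.5}                      # p(x1)
    p_trans = {'R': {'R': 0.7, 'S': 0.3},              # p(xn | xn-1)
               'S': {'R': 0.4, 'S': 0.6}}

    def markov_chain_joint(xs):
        # p(x1,...,xN) = p(x1) * p(x2 | x1) * ... * p(xN | xN-1)
        p = p_init[xs[0]]
        for prev, cur in zip(xs, xs[1:]):
            p *= p_trans[prev][cur]
        return p

    print(markov_chain_joint(['R', 'R', 'S', 'S']))    # 0.5 * 0.7 * 0.3 * 0.6 = 0.063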
Extension: A higher-order Markov chain, i.e. the n'th observation is influenced by the previous m observations rather than only the previous one ...
Hidden Markov Models
What if the n'th observation in a chain of observations is influenced by a corresponding latent (i.e. hidden) variable?
If the latent variables are discrete and form a Markov chain, then it is a hidden Markov model (HMM)
[Figure: a Markov model over the observations vs. a hidden Markov model, where a chain of latent values (e.g. H H L L H) generates the corresponding observations]
Computational problems
Determine the likelihood of the sequence of observations
Predict the next observation in the sequence of observations
Find the most likely underlying explanation of the sequence of observations
Hidden Markov Models
The predictive distribution p(xn+1 | x1,...,xn) for observation xn+1 can be shown to depend on all previous observations, i.e. the sequence of observations is not a Markov chain of any order ...
The joint distribution
For a hidden Markov model, the joint distribution of the observations x1,...,xN and the latent values z1,...,zN factorizes as

  p(x1,...,xN, z1,...,zN) = p(z1) [ ∏_{n=2..N} p(zn | zn-1) ] ∏_{n=1..N} p(xn | zn)

where the factors p(zn | zn-1) are the transition probabilities and the factors p(xn | zn) are the emission probabilities.
Transition probabilities
Notation (following Bishop): The latent variables zn are discrete multinomial variables in 1-of-K coding, e.g. if zn = (0,0,1) then the model in step n is in state k=3 ...
Transition probabilities: If the latent variables are discrete with K states, the conditional distribution p(zn | zn-1) is a K x K table A, and the marginal distribution p(z1) describing the initial state is a K-vector π ...
The probability of going from state j to state k is: Ajk = p(znk = 1 | zn-1,j = 1)
The probability of state k being the initial state is: πk = p(z1k = 1)
[Figure: the transition probabilities drawn as a state transition diagram]
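A minimal sketch of how A and π might be represented in Python/NumPy; the two states and all numbers are hypothetical example values.

    import numpy as np

    states = ['H', 'L']                  # K = 2 latent states (hypothetical)

    # K x K transition matrix A, with A[j, k] = p(going from state j to state k).
    A = np.array([[0.5, 0.5],
                  [0.4, 0.6]])

    # K-vector pi, with pi[k] = p(state k is the initial state).
    pi = np.array([0.5, 0.5])

    # Every row of A and the vector pi are probability distributions, i.e. they sum to 1.
    assert np.allclose(A.sum(axis=1), 1.0) and np.isclose(pi.sum(), 1.0)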
Emission probabilities
Emission probabilities: The conditional distributions p(xn | zn) of the observed variables from a specific state.
If the observed values xn are discrete (e.g. D symbols), the emission probabilities form a K x D table of probabilities which, for each of the K states, specifies the probability of emitting each observable ...
Writing the emission distribution as p(xn | zn, φ) = ∏_{k=1..K} p(xn | φk)^znk, where znk = 1 iff the n'th latent variable in the sequence is in state k and otherwise 0, the product just picks the emission probabilities corresponding to state k ...
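Continuing the hypothetical two-state example, the emission probabilities for a discrete alphabet of D = 4 symbols can be stored as a K x D table; the numbers are made up for illustration.

    import numpy as np

    symbols = ['A', 'C', 'G', 'T']            # D = 4 observable symbols (hypothetical alphabet)

    # K x D emission matrix Phi, with Phi[k, d] = p(emitting symbols[d] from state k).
    Phi = np.array([[0.2, 0.3, 0.3, 0.2],     # emissions from state 'H'
                    [0.3, 0.2, 0.2, 0.3]])    # emissions from state 'L'

    # Each state's emission distribution sums to 1 over the D symbols.
    assert np.allclose(Phi.sum(axis=1), 1.0)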
HMM joint probability distribution
Observables: X = x1,...,xN. Latent states: Z = z1,...,zN. Model parameters: θ = {π, A, φ}, where φ denotes the parameters of the emission distributions.

  p(X, Z | θ) = p(z1 | π) [ ∏_{n=2..N} p(zn | zn-1, A) ] ∏_{n=1..N} p(xn | zn, φ)

If A and φ are the same for all n, then the HMM is homogeneous.
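Putting the pieces together, a minimal sketch of the joint probability p(X, Z) for one concrete pair of state and symbol sequences; the parameters are the same kind of hypothetical example values as above.

    import numpy as np

    # Hypothetical homogeneous HMM: K = 2 states (0 = 'H', 1 = 'L'), 4 symbols (A, C, G, T).
    pi  = np.array([0.5, 0.5])                       # initial distribution p(z1)
    A   = np.array([[0.5, 0.5], [0.4, 0.6]])         # transition probabilities p(zn | zn-1)
    Phi = np.array([[0.2, 0.3, 0.3, 0.2],
                    [0.3, 0.2, 0.2, 0.3]])           # emission probabilities p(xn | zn)

    def joint_prob(z, x):
        # p(X, Z) = p(z1) p(x1 | z1) * prod over n >= 2 of p(zn | zn-1) p(xn | zn)
        p = pi[z[0]] * Phi[z[0], x[0]]
        for n in range(1, len(z)):
            p *= A[z[n - 1], z[n]] * Phi[z[n], x[n]]
        return p

    # States H H L L H emitting the symbols with indices 2, 2, 1, 1, 2 (i.e. G G C C G).
    print(joint_prob(z=[0, 0, 1, 1, 0], x=[2, 2, 1, 1, 2]))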
HMMs as a generative model
An HMM generates a sequence of observables by moving from latent state to latent state according to the transition probabilities and emitting an observable (from a discrete set of observables, i.e. a finite alphabet) from each latent state visited, according to the emission probabilities of the state ...
Model M: A run follows a sequence of states, e.g. H H L L H, and emits a corresponding sequence of symbols.
A special End state can be added to generate finite output.
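A sketch of this generative process under the hypothetical parameters used above: sample a state sequence from π and A, and emit one symbol per visited state from that state's emission distribution. A fixed run length is used here instead of an explicit End state.

    import numpy as np

    rng = np.random.default_rng(0)
    states, symbols = ['H', 'L'], ['A', 'C', 'G', 'T']
    pi  = np.array([0.5, 0.5])
    A   = np.array([[0.5, 0.5], [0.4, 0.6]])
    Phi = np.array([[0.2, 0.3, 0.3, 0.2],
                    [0.3, 0.2, 0.2, 0.3]])

    def sample_run(N):
        # Generate a latent state sequence and the emitted symbol sequence, both of length N.
        z = [rng.choice(len(states), p=pi)]
        for _ in range(N - 1):
            z.append(rng.choice(len(states), p=A[z[-1]]))     # move according to the transitions
        x = [rng.choice(len(symbols), p=Phi[k]) for k in z]   # emit one symbol per state visited
        return [states[k] for k in z], [symbols[d] for d in x]

    print(sample_run(5))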
Using HMMs
Determine the likelihood of the sequence of observations
Predict the next observation in the sequence of observations
Find the most likely underlying explanation of the sequence of observations
Determining the likelihood of the observations X = x1,...,xN means computing p(X) = Σ_Z p(X, Z), i.e. summing the joint probability over all possible sequences of latent states Z. The sum has K^N terms, but it can be computed in O(K^2 N) time ...
The forward-backward algorithm
α(zn) is the joint probability of observing x1,...,xn and being in state zn, i.e. α(zn) = p(x1,...,xn, zn)
β(zn) is the conditional probability of the future observations xn+1,...,xN given being in state zn, i.e. β(zn) = p(xn+1,...,xN | zn)
Using α(zn) and β(zn) we get the likelihood of the observations as

  p(X) = Σ_{zn} α(zn) β(zn)   (for any choice of n)
The forward algorithm
α(zn) is the joint probability of observing x1,...,xn and being in state zn, i.e. α(zn) = p(x1,...,xn, zn)
Basis: α(z1) = p(x1, z1) = p(z1) p(x1 | z1)
Recursion (the α-recursion): α(zn) = p(xn | zn) Σ_{zn-1} α(zn-1) p(zn | zn-1)
Takes time O(K^2 N) and space O(KN) using memoization
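A minimal sketch of the forward algorithm for the hypothetical parameters used above; it fills an N x K table of α-values one position at a time. Note that the products of probabilities underflow for long sequences; implementing the recursion in a numerically sound way is the topic of the next lecture.

    import numpy as np

    pi  = np.array([0.5, 0.5])                       # hypothetical 2-state HMM over 4 symbols
    A   = np.array([[0.5, 0.5], [0.4, 0.6]])
    Phi = np.array([[0.2, 0.3, 0.3, 0.2],
                    [0.3, 0.2, 0.2, 0.3]])

    def forward(x):
        # alpha[n, k] = p(x1,...,xn+1, zn+1 = k) for 0-indexed positions n.
        N, K = len(x), len(pi)
        alpha = np.zeros((N, K))
        alpha[0] = pi * Phi[:, x[0]]                      # basis: p(z1) p(x1 | z1)
        for n in range(1, N):
            alpha[n] = Phi[:, x[n]] * (alpha[n - 1] @ A)  # recursion, O(K^2) per position
        return alpha

    alpha = forward([2, 2, 1, 1, 2])   # observations given as indices into the symbol alphabet
    print(alpha[-1].sum())             # likelihood p(X) = sum of alpha(zN) over the K states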
The backward algorithm
β(zn) is the conditional probability of the future observations xn+1,...,xN given being in state zn, i.e. β(zn) = p(xn+1,...,xN | zn)
Basis: β(zN) = 1
Recursion (the β-recursion): β(zn) = Σ_{zn+1} β(zn+1) p(xn+1 | zn+1) p(zn+1 | zn)
Takes time O(K^2 N) and space O(KN) using memoization
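A matching sketch of the backward algorithm; combined with the forward table it lets us check that Σ_{zn} α(zn) β(zn) gives the same likelihood p(X) for every position n. The parameters are hypothetical, and the forward pass from the previous sketch is repeated so the example runs on its own.

    import numpy as np

    pi  = np.array([0.5, 0.5])
    A   = np.array([[0.5, 0.5], [0.4, 0.6]])
    Phi = np.array([[0.2, 0.3, 0.3, 0.2],
                    [0.3, 0.2, 0.2, 0.3]])

    def forward(x):
        alpha = np.zeros((len(x), len(pi)))
        alpha[0] = pi * Phi[:, x[0]]
        for n in range(1, len(x)):
            alpha[n] = Phi[:, x[n]] * (alpha[n - 1] @ A)
        return alpha

    def backward(x):
        # beta[n, k] = p(xn+2,...,xN | zn+1 = k) for 0-indexed positions n.
        N, K = len(x), len(pi)
        beta = np.zeros((N, K))
        beta[N - 1] = 1.0                                    # basis: beta(zN) = 1
        for n in range(N - 2, -1, -1):
            beta[n] = A @ (Phi[:, x[n + 1]] * beta[n + 1])   # recursion
        return beta

    x = [2, 2, 1, 1, 2]
    alpha, beta = forward(x), backward(x)
    print((alpha * beta).sum(axis=1))   # the same likelihood p(X) at every position n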
Predicting the next observation
The predictive distribution for the next observation xN+1 is obtained from the forward probabilities:

  p(xN+1 | x1,...,xN) = (1 / p(X)) Σ_{zN+1} p(xN+1 | zN+1) Σ_{zN} p(zN+1 | zN) α(zN)
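A sketch of this prediction for the hypothetical parameters: propagate the last column of α-values one step through the transition matrix, apply the emission probabilities, and normalise by p(X).

    import numpy as np

    pi  = np.array([0.5, 0.5])
    A   = np.array([[0.5, 0.5], [0.4, 0.6]])
    Phi = np.array([[0.2, 0.3, 0.3, 0.2],
                    [0.3, 0.2, 0.2, 0.3]])

    def forward(x):
        alpha = np.zeros((len(x), len(pi)))
        alpha[0] = pi * Phi[:, x[0]]
        for n in range(1, len(x)):
            alpha[n] = Phi[:, x[n]] * (alpha[n - 1] @ A)
        return alpha

    def predict_next(x):
        # Returns p(xN+1 = d | x1,...,xN) for every symbol d.
        alpha_N = forward(x)[-1]
        p_zN1 = alpha_N @ A                     # p(zN+1, x1,...,xN) for each next state
        return (p_zN1 @ Phi) / alpha_N.sum()    # emit and divide by p(X)

    print(predict_next([2, 2, 1, 1, 2]))        # a distribution over the 4 symbols (sums to 1)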
The Viterbi algorithm
The Viterbi algorithm finds the most probable sequence of states generating the observations ...
ω(zn) is the probability of the most likely sequence of states z1,...,zn generating the observations x1,...,xn
Intuition: Find the longest path from column 1 to column n in the table of states, where the length of a path is its total probability, i.e. the product of the probabilities of the transitions and emissions along the path ...
Basis: ω(z1) = p(x1, z1) = p(z1) p(x1 | z1)
Recursion (the ω-recursion): ω(zn) = p(xn | zn) max_{zn-1} ω(zn-1) p(zn | zn-1)
Takes time O(K^2 N) and space O(KN) using memoization
The most likely sequence of states itself can be retrieved in time O(KN) by backtracking
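A sketch of the Viterbi algorithm with backtracking for the hypothetical parameters used above; omega[n, k] holds the probability of the best state path that ends in state k after the first n+1 observations, and a table of argmaxes is kept so the path itself can be recovered.

    import numpy as np

    states = ['H', 'L']
    pi  = np.array([0.5, 0.5])
    A   = np.array([[0.5, 0.5], [0.4, 0.6]])
    Phi = np.array([[0.2, 0.3, 0.3, 0.2],
                    [0.3, 0.2, 0.2, 0.3]])

    def viterbi(x):
        # Most probable state sequence for the observations x (given as symbol indices).
        N, K = len(x), len(pi)
        omega = np.zeros((N, K))
        back = np.zeros((N, K), dtype=int)
        omega[0] = pi * Phi[:, x[0]]                         # basis
        for n in range(1, N):
            scores = omega[n - 1][:, None] * A               # scores[j, k] = omega(zn-1 = j) * Ajk
            back[n] = scores.argmax(axis=0)                  # best predecessor of each state
            omega[n] = Phi[:, x[n]] * scores.max(axis=0)     # recursion
        path = [int(omega[-1].argmax())]                     # backtrack from the best final state
        for n in range(N - 1, 0, -1):
            path.append(int(back[n, path[-1]]))
        return [states[k] for k in reversed(path)], omega[-1].max()

    print(viterbi([2, 2, 1, 1, 2]))   # most probable state path and its probability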
Summary
Introduced hidden Markov models (HMMs)
The forward-backward algorithms for determining the likelihood of a sequence of observations, and predicting the next observation in a sequence of observations.
The Viterbi algorithm for finding the most likely underlying explanation (sequence of latent states) of a sequence of observations.
Next: How to implement the basic algorithms (forward, backward, and Viterbi) in a numerically sound manner.