Markov Chains and Hidden Markov Models
Marjolijn Elsinga & Elze de Groot
Andrei A. Markov
Born 14 June 1856 in Ryazan, Russia; died 20 July 1922 in Petrograd, Russia
Graduate of Saint Petersburg University (1878)
Work: number theory and analysis, continued fractions, limits of integrals, approximation theory, and the convergence of series
Today's topics
Markov chains
Hidden Markov models:
- Viterbi Algorithm
- Forward Algorithm
- Backward Algorithm
- Posterior Probabilities
Markov Chains (1)
Emitting states
Markov Chains (2)
Transition probabilities: a_st = P(x_i = t | x_(i-1) = s)
Probability of the sequence: P(x) = P(x_L | x_(L-1)) P(x_(L-1) | x_(L-2)) ... P(x_2 | x_1) P(x_1)
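As a concrete sketch, the chain-rule factorisation above can be computed directly from a transition table. The probabilities below are illustrative placeholders, not the values from the slides:

```python
# First-order Markov chain over DNA: P(x) = P(x1) * prod_i P(x_i | x_{i-1}).
# All probability values here are made up for illustration.
trans = {
    'A': {'A': 0.3, 'C': 0.2, 'G': 0.3, 'T': 0.2},
    'C': {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2},
    'G': {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2},
    'T': {'A': 0.3, 'C': 0.2, 'G': 0.2, 'T': 0.3},
}
begin = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}  # P(x1)

def seq_prob(x):
    """Multiply P(x1) by the transition probability of each consecutive pair."""
    p = begin[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= trans[prev][cur]
    return p

print(seq_prob("CGCG"))
```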
Key property of Markov Chains
The probability of a symbol x_i depends only on the value of the preceding symbol x_(i-1): P(x_i | x_1, ..., x_(i-1)) = P(x_i | x_(i-1))
Begin and End states
Silent states
Example: CpG Islands
CpG = Cytosine – phosphodiester bond – Guanine
100–1000 bases long
Cytosine is modified by methylation; methylation is suppressed in short stretches of the genome (start regions of genes)
High chance of mutation into a thymine (T)
Two questions
How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
Discrimination
48 putative CpG islands are extracted
Derive 2 models:
- regions labelled as CpG island ('+' model)
- regions from the remainder ('-' model)
Transition probabilities are set by maximum likelihood: a+_st = c+_st / Σ_t' c+_st', where c+_st is the number of times letter t follows letter s
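The count-based maximum-likelihood estimate can be sketched as follows; the training sequences here are toy stand-ins for the 48 real putative islands:

```python
from collections import defaultdict

# a_st = c_st / sum_t' c_st', where c_st counts how often letter t follows s.
def mle_transitions(seqs):
    counts = defaultdict(lambda: defaultdict(int))
    for seq in seqs:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    probs = {}
    for s, row in counts.items():
        total = sum(row.values())          # normalise each row to sum to 1
        probs[s] = {t: c / total for t, c in row.items()}
    return probs

# Toy '+' training data (not the real CpG-island regions).
plus = mle_transitions(["CGCGCG", "GCGC"])
print(plus['C'])  # how often each letter follows C in this training set
```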
Maximum Likelihood Estimators
Each row sums to 1
Tables are asymmetric
Log-odds ratio
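A sequence is scored with the log-odds ratio S(x) = Σ_i log2(a+_(x_(i-1) x_i) / a-_(x_(i-1) x_i)); positive scores favour the '+' (CpG island) model. The two tiny tables below are made-up values covering only the pairs needed, not the estimates from the slides:

```python
import math

# Illustrative '+' and '-' transition probabilities for two dinucleotides.
a_plus  = {('C', 'G'): 0.27, ('G', 'C'): 0.34}
a_minus = {('C', 'G'): 0.08, ('G', 'C'): 0.25}

def log_odds(x):
    """Sum log2 ratios of '+' vs '-' transition probabilities over all pairs."""
    s = 0.0
    for pair in zip(x, x[1:]):
        s += math.log2(a_plus[pair] / a_minus[pair])
    return s

print(log_odds("CGCG"))  # positive: looks like a CpG island under these tables
```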
Discrimination shown
Simulation: '+' model
Simulation: '-' model
Today's topics
Markov chains
Hidden Markov models:
- Viterbi Algorithm
- Forward Algorithm
- Backward Algorithm
- Posterior Probabilities
Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when observing x_i
Transition probability from state k to l: a_kl = P(π_i = l | π_(i-1) = k)
π_i is the ith state in the path (state sequence)
Hidden Markov Models (HMM) (2)
Begin state: a_0k
End state: a_k0
In the CpG islands example: states A+, C+, G+, T+ and A-, C-, G-, T-
Hidden Markov Models (HMM) (3)
We need a new set of parameters because we decoupled symbols from states
Emission probability: the probability that symbol b is seen when in state k: e_k(b) = P(x_i = b | π_i = k)
Example: dishonest casino (1)
Fair die and loaded die
Loaded die: probability 0.5 of a 6 and probability 0.1 for 1–5
Switch from fair to loaded: probability 0.05
Switch back: probability 0.1
Dishonest casino (2)
Emission probabilities: an HMM is a model that generates or emits sequences
Dishonest casino (3)
Hidden: you don't know if the die is fair or loaded
Joint probability of observed sequence x and state sequence π: P(x, π) = a_(0 π_1) ∏_i e_(π_i)(x_i) a_(π_i π_(i+1))
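For a given path the joint probability is just a product of begin, emission, and transition terms. A minimal sketch for the casino HMM follows; the switch and emission probabilities are from the slides, but the 0.5/0.5 begin distribution is an assumption:

```python
# Dishonest-casino HMM: F = fair die, L = loaded die.
a = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.1, 'L': 0.9}}
e = {'F': {r: 1 / 6 for r in '123456'},
     'L': {**{r: 0.1 for r in '12345'}, '6': 0.5}}
begin = {'F': 0.5, 'L': 0.5}  # assumed; the slides do not state begin probabilities

def joint_prob(x, path):
    """P(x, pi): begin * emission at each step * transition between steps."""
    p = begin[path[0]] * e[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= a[path[i - 1]][path[i]] * e[path[i]][x[i]]
    return p

print(joint_prob("266", "FLL"))
```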
Three algorithms
What is the most probable path for generating a given sequence?
- Viterbi Algorithm
How likely is a given sequence?
- Forward Algorithm
How can we learn the HMM parameters given a set of sequences?
- Forward-Backward (Baum-Welch) Algorithm
Viterbi Algorithm
CGCG can be generated in different ways and with different probabilities
Choose the path with the highest probability
The most probable path can be found recursively
Viterbi Algorithm (2)
v_k(i) = probability of the most probable path ending in state k with observation x_i
Viterbi Algorithm (3)
Recurrence: v_l(i+1) = e_l(x_(i+1)) · max_k (v_k(i) a_kl)
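Putting the recurrence to work, a minimal Viterbi decoder for the casino HMM might look like this (the uniform 0.5/0.5 begin distribution is an assumption; the slides give only the switch and emission probabilities):

```python
# Viterbi: v_l(i+1) = e_l(x_{i+1}) * max_k v_k(i) a_kl, plus backpointers.
a = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.1, 'L': 0.9}}
e = {'F': {r: 1 / 6 for r in '123456'},
     'L': {**{r: 0.1 for r in '12345'}, '6': 0.5}}
states = ['F', 'L']

def viterbi(x):
    v = {k: 0.5 * e[k][x[0]] for k in states}  # assumed uniform begin
    back = []
    for sym in x[1:]:
        nv, ptr = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: v[k] * a[k][l])
            ptr[l] = best_k
            nv[l] = e[l][sym] * v[best_k] * a[best_k][l]
        back.append(ptr)
        v = nv
    # Trace back from the best final state.
    path = [max(states, key=lambda k: v[k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return ''.join(reversed(path))

print(viterbi("6666612345"))
```

With these parameters the decoder labels the run of sixes as loaded and the low rolls as fair.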
Viterbi Algorithm
Most probable path for CGCG
Viterbi Algorithm
Result with the casino example
Three algorithms
What is the most probable path for generating a given sequence?
- Viterbi Algorithm
How likely is a given sequence?
- Forward Algorithm
How can we learn the HMM parameters given a set of sequences?
- Forward-Backward (Baum-Welch) Algorithm
Forward Algorithm (1)
Probability over all possible paths: P(x) = Σ_π P(x, π)
The number of possible paths increases exponentially with the length of the sequence
The forward algorithm enables us to compute this efficiently
Forward Algorithm (2)
Replace the maximisation steps of the Viterbi algorithm by sums
f_k(i) = probability of the observed sequence up to and including x_i, requiring π_i = k: f_k(i) = P(x_1 ... x_i, π_i = k)
Forward Algorithm (3)
Recurrence: f_l(i+1) = e_l(x_(i+1)) · Σ_k f_k(i) a_kl
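The forward pass is the Viterbi code with max replaced by a sum; summing the final column gives P(x). A sketch for the casino HMM, again assuming a uniform begin distribution and no explicit end state:

```python
# Forward: f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) a_kl; P(x) = sum_k f_k(L).
a = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.1, 'L': 0.9}}
e = {'F': {r: 1 / 6 for r in '123456'},
     'L': {**{r: 0.1 for r in '12345'}, '6': 0.5}}
states = ['F', 'L']

def forward(x):
    f = {k: 0.5 * e[k][x[0]] for k in states}  # assumed uniform begin
    for sym in x[1:]:
        f = {l: e[l][sym] * sum(f[k] * a[k][l] for k in states)
             for l in states}
    return sum(f.values())  # total probability over all paths

print(forward("66"))
```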
Three algorithms
What is the most probable path for generating a given sequence?
- Viterbi Algorithm
How likely is a given sequence?
- Forward Algorithm
How can we learn the HMM parameters given a set of sequences?
- Forward-Backward (Baum-Welch) Algorithm
Backward Algorithm (1)
b_k(i) = probability of the observed sequence from x_(i+1) to the end of the sequence, requiring π_i = k: b_k(i) = P(x_(i+1) ... x_L | π_i = k)
Disadvantage of the algorithms
Multiplying many probabilities gives very small numbers, which can lead to underflow errors on the computer
This can be solved by running the algorithms in log space, calculating log(v_l(i))
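The underflow problem and the log-space fix are easy to demonstrate: a long product of small probabilities collapses to 0.0 in floating point, while the sum of logs stays representable. The numbers below are toy values for illustration:

```python
import math

probs = [1e-4] * 200  # 200 steps of probability 1e-4 each

direct = 1.0
for p in probs:
    direct *= p       # 1e-800 is below the double-precision range

log_space = sum(math.log(p) for p in probs)  # same quantity, as a log

print(direct)      # underflows to 0.0
print(log_space)   # a perfectly representable negative number
```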
Backward Algorithm
Recurrence: b_k(i) = Σ_l a_kl e_l(x_(i+1)) b_l(i+1)
Posterior State Probability (1)
Probability that observation x_i came from state k, given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known: P(π_i = k | x)
Posterior State Probability (2)
First calculate the probability of producing the entire observed sequence with the ith symbol being produced by state k:
P(x, π_i = k) = f_k(i) b_k(i)
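Combining a forward and a backward pass gives the posteriors directly: P(π_i = k | x) = f_k(i) b_k(i) / P(x). A sketch for the casino HMM (the 0.5/0.5 begin distribution is again an assumption):

```python
a = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.1, 'L': 0.9}}
e = {'F': {r: 1 / 6 for r in '123456'},
     'L': {**{r: 0.1 for r in '12345'}, '6': 0.5}}
states = ['F', 'L']

def posteriors(x):
    n = len(x)
    # Forward pass: f_k(i).
    f = [{k: 0.5 * e[k][x[0]] for k in states}]
    for sym in x[1:]:
        f.append({l: e[l][sym] * sum(f[-1][k] * a[k][l] for k in states)
                  for l in states})
    # Backward pass: b_k(i), built from the end of the sequence.
    b = [{k: 1.0 for k in states}]
    for sym in reversed(x[1:]):
        b.insert(0, {k: sum(a[k][l] * e[l][sym] * b[0][l] for l in states)
                     for k in states})
    px = sum(f[-1].values())  # P(x) from the forward pass
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(n)]

post = posteriors("626")
print(post[1])  # P(fair) vs P(loaded) at the middle roll
```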
Posterior State Probability (3)
The posterior probabilities are then P(π_i = k | x) = f_k(i) b_k(i) / P(x)
P(x) is the result of the forward or backward calculation
Posterior Probabilities (4)
For the casino example
Two questions
How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
Prediction of CpG islands
First way: Viterbi Algorithm
- Find the most probable path through the model
- When this path goes through the '+' states, a CpG island is predicted
Prediction of CpG islands
Second way: Posterior Decoding
- function g(k):
  g(k) = 1 for k ∈ {A+, C+, G+, T+}
  g(k) = 0 for k ∈ {A-, C-, G-, T-}
- G(i|x) = Σ_k P(π_i = k | x) g(k) is the posterior probability, according to the model, that base i is in a CpG island
Summary (1)
A Markov chain is a collection of states where a state depends only on the state before
A hidden Markov model is a model in which the state sequence is 'hidden'
Summary (2)
Most probable path: Viterbi algorithm
How likely is a given sequence: forward algorithm
Posterior state probability: forward and backward algorithms (used for the most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 2
Andrei A MarkovAndrei A Markov
Born 14 June 1856 in Ryazan RussiaDied 20 July 1922 in Petrograd Russia
Graduate of Saint Petersburg University (1878)
Work number theory and analysis continued fractions limits of integrals approximation theory and the convergence of series
Marjolijn Elsinga amp Elze de Groot 3
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 4
Markov Chains (1)Markov Chains (1)
Emitting states
Marjolijn Elsinga amp Elze de Groot 5
Markov Chains (2)Markov Chains (2)
Transition probabilities
Probability of the sequence
Marjolijn Elsinga amp Elze de Groot 6
Key property of Markov ChainsKey property of Markov Chains
The probability of a symbol xi depends only on the value of the preceding symbol xi-1
Marjolijn Elsinga amp Elze de Groot 7
Begin and End statesBegin and End states
Silent states
Marjolijn Elsinga amp Elze de Groot 8
Example CpG IslandsExample CpG Islands
CpG = Cytosine ndash phosphodiester bond ndash Guanine
100 ndash 1000 bases long Cytosine is modified by methylation Methylation is suppressed in short stretches
of the genome (start regions of genes)High chance of mutation into a thymine (T)
Marjolijn Elsinga amp Elze de Groot 9
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 10
DiscriminationDiscrimination
48 putative CpG islands are extractedDerive 2 models
- regions labelled as CpG island (lsquo+rsquo model)
- regions from the remainder (lsquo-rsquo model)
Transition probabilities are set- Where Cst+ is number of times letter t follows letter s
Marjolijn Elsinga amp Elze de Groot 11
Maximum Likelihood EstimatorsMaximum Likelihood Estimators
Each row sums to 1Tables are asymmetric
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 3
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 4
Markov Chains (1)Markov Chains (1)
Emitting states
Marjolijn Elsinga amp Elze de Groot 5
Markov Chains (2)Markov Chains (2)
Transition probabilities
Probability of the sequence
Marjolijn Elsinga amp Elze de Groot 6
Key property of Markov ChainsKey property of Markov Chains
The probability of a symbol xi depends only on the value of the preceding symbol xi-1
Marjolijn Elsinga amp Elze de Groot 7
Begin and End statesBegin and End states
Silent states
Marjolijn Elsinga amp Elze de Groot 8
Example CpG IslandsExample CpG Islands
CpG = Cytosine ndash phosphodiester bond ndash Guanine
100 ndash 1000 bases long Cytosine is modified by methylation Methylation is suppressed in short stretches
of the genome (start regions of genes)High chance of mutation into a thymine (T)
Marjolijn Elsinga amp Elze de Groot 9
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 10
DiscriminationDiscrimination
48 putative CpG islands are extractedDerive 2 models
- regions labelled as CpG island (lsquo+rsquo model)
- regions from the remainder (lsquo-rsquo model)
Transition probabilities are set- Where Cst+ is number of times letter t follows letter s
Marjolijn Elsinga amp Elze de Groot 11
Maximum Likelihood EstimatorsMaximum Likelihood Estimators
Each row sums to 1Tables are asymmetric
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 4
Markov Chains (1)Markov Chains (1)
Emitting states
Marjolijn Elsinga amp Elze de Groot 5
Markov Chains (2)Markov Chains (2)
Transition probabilities
Probability of the sequence
Marjolijn Elsinga amp Elze de Groot 6
Key property of Markov ChainsKey property of Markov Chains
The probability of a symbol xi depends only on the value of the preceding symbol xi-1
Marjolijn Elsinga amp Elze de Groot 7
Begin and End statesBegin and End states
Silent states
Marjolijn Elsinga amp Elze de Groot 8
Example CpG IslandsExample CpG Islands
CpG = Cytosine ndash phosphodiester bond ndash Guanine
100 ndash 1000 bases long Cytosine is modified by methylation Methylation is suppressed in short stretches
of the genome (start regions of genes)High chance of mutation into a thymine (T)
Marjolijn Elsinga amp Elze de Groot 9
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 10
DiscriminationDiscrimination
48 putative CpG islands are extractedDerive 2 models
- regions labelled as CpG island (lsquo+rsquo model)
- regions from the remainder (lsquo-rsquo model)
Transition probabilities are set- Where Cst+ is number of times letter t follows letter s
Marjolijn Elsinga amp Elze de Groot 11
Maximum Likelihood EstimatorsMaximum Likelihood Estimators
Each row sums to 1Tables are asymmetric
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Posterior Probabilities (4)
For the casino example
Two questions
How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
Prediction of CpG islands
First way: Viterbi Algorithm
- Find the most probable path through the model
- When this path goes through a '+' state, a CpG island is predicted
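The same Viterbi decoding applies to the CpG model; for brevity, here is a sketch on the two-state casino HMM rather than the eight-state CpG model, with the uniform start distribution again assumed:

```python
# Viterbi decoding v_l(i) = e_l(x_i) * max_k v_k(i-1) a_kl, with traceback,
# sketched on the two-state casino HMM (not the eight-state CpG model).
# The uniform initial distribution is an assumption.
STATES = ("F", "L")
A = {"F": {"F": 0.95, "L": 0.05},
     "L": {"F": 0.10, "L": 0.90}}
E = {"F": {r: 1 / 6 for r in range(1, 7)},
     "L": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5}}
INIT = {"F": 0.5, "L": 0.5}

def viterbi(x):
    """Return the most probable state path for the observed rolls x."""
    v = [{k: INIT[k] * E[k][x[0]] for k in STATES}]
    ptr = []                                  # back-pointers per step
    for sym in x[1:]:
        prev = v[-1]
        col, back = {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: prev[k] * A[k][l])
            back[l] = best                    # best predecessor of l
            col[l] = E[l][sym] * prev[best] * A[best][l]
        v.append(col)
        ptr.append(back)
    last = max(STATES, key=lambda k: v[-1][k])
    path = [last]
    for back in reversed(ptr):                # trace back to the start
        path.append(back[path[-1]])
    return list(reversed(path))
```

A run of ordinary rolls decodes to the fair state, a run of sixes to the loaded state.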
Prediction of CpG islands
Second way: Posterior Decoding
- define a function g on the states:
- g(k) = 1 for k ∈ {A+, C+, G+, T+}
- g(k) = 0 for k ∈ {A-, C-, G-, T-}
- G(i|x) = Σk P(πi = k | x) g(k) is the posterior probability, according to the model, that base i is in a CpG island
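As a sketch, with `post` standing in for the per-base posterior distributions over the eight states that a forward/backward pass on the CpG model would produce (the variable and function names are mine):

```python
# Posterior decoding for CpG islands: G(i|x) = sum_k P(pi_i = k | x) g(k).
# `post` is a hypothetical stand-in for forward/backward output:
# one dict per base, mapping the eight states A+..T- to posteriors.
PLUS = {"A+", "C+", "G+", "T+"}

def g(k):
    """1 for '+' (island) states, 0 for '-' states."""
    return 1.0 if k in PLUS else 0.0

def island_probability(post):
    """G(i|x) per base: total posterior mass on the '+' states."""
    return [sum(p * g(k) for k, p in dist.items()) for dist in post]
```

Bases whose G(i|x) is close to 1 are then called as CpG island positions.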
Summary (1)
A Markov chain is a collection of states, where each state depends only on the state before it
A hidden Markov model is a model in which the state sequence is 'hidden'
Summary (2)
Most probable path: Viterbi algorithm
How likely is a given sequence: forward algorithm
Posterior state probability: forward and backward algorithms (used to find the most probable state of an observation)
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 6
Key property of Markov ChainsKey property of Markov Chains
The probability of a symbol xi depends only on the value of the preceding symbol xi-1
Marjolijn Elsinga amp Elze de Groot 7
Begin and End statesBegin and End states
Silent states
Marjolijn Elsinga amp Elze de Groot 8
Example CpG IslandsExample CpG Islands
CpG = Cytosine ndash phosphodiester bond ndash Guanine
100 ndash 1000 bases long Cytosine is modified by methylation Methylation is suppressed in short stretches
of the genome (start regions of genes)High chance of mutation into a thymine (T)
Marjolijn Elsinga amp Elze de Groot 9
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 10
DiscriminationDiscrimination
48 putative CpG islands are extractedDerive 2 models
- regions labelled as CpG island (lsquo+rsquo model)
- regions from the remainder (lsquo-rsquo model)
Transition probabilities are set- Where Cst+ is number of times letter t follows letter s
Marjolijn Elsinga amp Elze de Groot 11
Maximum Likelihood EstimatorsMaximum Likelihood Estimators
Each row sums to 1Tables are asymmetric
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 7
Begin and End statesBegin and End states
Silent states
Marjolijn Elsinga amp Elze de Groot 8
Example CpG IslandsExample CpG Islands
CpG = Cytosine ndash phosphodiester bond ndash Guanine
100 ndash 1000 bases long Cytosine is modified by methylation Methylation is suppressed in short stretches
of the genome (start regions of genes)High chance of mutation into a thymine (T)
Marjolijn Elsinga amp Elze de Groot 9
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 10
DiscriminationDiscrimination
48 putative CpG islands are extractedDerive 2 models
- regions labelled as CpG island (lsquo+rsquo model)
- regions from the remainder (lsquo-rsquo model)
Transition probabilities are set- Where Cst+ is number of times letter t follows letter s
Marjolijn Elsinga amp Elze de Groot 11
Maximum Likelihood EstimatorsMaximum Likelihood Estimators
Each row sums to 1Tables are asymmetric
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 8
Example CpG IslandsExample CpG Islands
CpG = Cytosine ndash phosphodiester bond ndash Guanine
100 ndash 1000 bases long Cytosine is modified by methylation Methylation is suppressed in short stretches
of the genome (start regions of genes)High chance of mutation into a thymine (T)
Marjolijn Elsinga amp Elze de Groot 9
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 10
DiscriminationDiscrimination
48 putative CpG islands are extractedDerive 2 models
- regions labelled as CpG island (lsquo+rsquo model)
- regions from the remainder (lsquo-rsquo model)
Transition probabilities are set- Where Cst+ is number of times letter t follows letter s
Marjolijn Elsinga amp Elze de Groot 11
Maximum Likelihood EstimatorsMaximum Likelihood Estimators
Each row sums to 1Tables are asymmetric
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Transition probabilities are set by maximum likelihood:
- a+st = c+st / Σt′ c+st′, where c+st is the number of times letter t follows letter s in the labelled '+' regions (and analogously for the '-' model)
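This counting step can be sketched as follows. A minimal illustration with toy training sequences (not the 48 real islands); `transition_mle` is a hypothetical helper name:

```python
import numpy as np

# Maximum-likelihood transition probabilities from labelled sequences:
# a_st = c_st / sum_t' c_st', where c_st counts how often t follows s.
ALPHABET = "ACGT"
IDX = {base: i for i, base in enumerate(ALPHABET)}

def transition_mle(seqs):
    c = np.zeros((4, 4))
    for seq in seqs:
        for s, t in zip(seq, seq[1:]):       # count adjacent letter pairs
            c[IDX[s], IDX[t]] += 1
    return c / c.sum(axis=1, keepdims=True)  # normalise each row to sum to 1

a_plus = transition_mle(["ACGCGCGTA", "GCGCGC"])  # toy '+' training set
```

The same routine, run on the non-island regions, would give the '-' matrix.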
Maximum Likelihood Estimators
- Each row sums to 1
- The tables are asymmetric
Log-odds ratio
A sequence x is scored by S(x) = Σi log( a+xi-1xi / a-xi-1xi ); positive scores point to the '+' (CpG island) model
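A sketch of this scoring, with illustrative transition matrices in the spirit of the estimated tables (the exact values here are placeholders, not the real CpG estimates):

```python
import numpy as np

# Log-odds scoring of a sequence under the '+' and '-' models.
# Rows/columns are ordered A, C, G, T; values are illustrative only.
IDX = {base: i for i, base in enumerate("ACGT")}
a_plus = np.array([[0.18, 0.27, 0.43, 0.12],
                   [0.17, 0.37, 0.27, 0.19],
                   [0.16, 0.34, 0.38, 0.12],
                   [0.08, 0.36, 0.38, 0.18]])
a_minus = np.array([[0.30, 0.21, 0.28, 0.21],
                    [0.32, 0.30, 0.08, 0.30],
                    [0.25, 0.24, 0.30, 0.21],
                    [0.18, 0.24, 0.29, 0.29]])

def log_odds(seq):
    """S(x) = sum_i log2( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )."""
    return sum(np.log2(a_plus[IDX[s], IDX[t]] / a_minus[IDX[s], IDX[t]])
               for s, t in zip(seq, seq[1:]))

score = log_odds("CGCG")  # positive scores favour the '+' (island) model
```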
Discrimination shown

Simulation: '+' model

Simulation: '-' model
Today's topics
- Markov chains
- Hidden Markov models: Viterbi Algorithm, Forward Algorithm, Backward Algorithm, Posterior Probabilities
Hidden Markov Models (HMM) (1)
- No one-to-one correspondence between states and symbols
- It is no longer possible to tell which state the model is in when it emits xi
- Transition probability from state k to l: akl = P(πi = l | πi-1 = k)
- πi is the ith state in the path (state sequence)
Hidden Markov Models (HMM) (2)
- Begin state: a0k
- End state: ak0
- In the CpG islands example:
Hidden Markov Models (HMM) (3)
- We need a new set of parameters because symbols have been decoupled from states
- Probability that symbol b is seen when in state k: ek(b) = P(xi = b | πi = k)
Example: dishonest casino (1)
- A fair die and a loaded die
- Loaded die: probability 0.5 for a 6 and probability 0.1 for each of 1-5
- Switch from fair to loaded: probability 0.05
- Switch back: probability 0.1
Dishonest casino (2)
- Emission probabilities: an HMM is a model that generates (emits) sequences
Dishonest casino (3)
- Hidden: you don't know whether the die is fair or loaded
- Joint probability of observed sequence x and state sequence π: P(x, π) = a0π1 Πi eπi(xi) aπiπi+1
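The joint probability of one particular path can be sketched directly from the casino parameters (a uniform start distribution is assumed here):

```python
# Joint probability P(x, pi) = a_{0 pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i pi_{i+1}}
# for the casino HMM (F = fair, L = loaded; uniform start assumed).
a = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.90, ("L", "F"): 0.10}
e = {"F": {r: 1 / 6 for r in range(1, 7)},
     "L": {r: 0.1 for r in range(1, 6)} | {6: 0.5}}
start = {"F": 0.5, "L": 0.5}

def joint(x, pi):
    p = start[pi[0]] * e[pi[0]][x[0]]
    for i in range(1, len(x)):
        p *= a[pi[i - 1], pi[i]] * e[pi[i]][x[i]]
    return p

p = joint([6, 6, 1], ["L", "L", "F"])
# = 0.5 * 0.5 * 0.9 * 0.5 * 0.1 * (1/6)
```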
Three algorithms
- What is the most probable path for generating a given sequence? Viterbi Algorithm
- How likely is a given sequence? Forward Algorithm
- How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm
Viterbi Algorithm
- CGCG can be generated in different ways and with different probabilities
- Choose the path with the highest probability
- The most probable path can be found recursively
Viterbi Algorithm (2)
- vk(i) = probability of the most probable path ending in state k with observation xi
Viterbi Algorithm (3)
- Recursion: vl(i+1) = el(xi+1) · maxk( vk(i) akl )
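The recursion can be sketched on the casino model (log space, as recommended later in the slides; a uniform start is assumed):

```python
import numpy as np

# Log-space Viterbi decoding on the casino HMM
# (state 0 = fair, 1 = loaded; parameters from the example slides).
a = np.array([[0.95, 0.05], [0.10, 0.90]])
e = np.vstack([np.full(6, 1 / 6), [0.1] * 5 + [0.5]])
start = np.array([0.5, 0.5])

def viterbi(x):
    L = len(x)
    v = np.zeros((L, 2))
    ptr = np.zeros((L, 2), dtype=int)
    la = np.log(a)
    v[0] = np.log(start) + np.log(e[:, x[0] - 1])
    for i in range(1, L):
        scores = v[i - 1][:, None] + la   # v_k(i-1) + log a_kl
        ptr[i] = scores.argmax(axis=0)    # best previous state for each l
        v[i] = np.log(e[:, x[i] - 1]) + scores.max(axis=0)
    path = [int(v[-1].argmax())]
    for i in range(L - 1, 0, -1):         # traceback through the pointers
        path.append(int(ptr[i][path[-1]]))
    return path[::-1]

states = viterbi([6, 6, 6, 6, 6, 1, 2, 3, 4, 5])
# the run of sixes is decoded as loaded (1), the rest as fair (0)
```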
Viterbi Algorithm
- Most probable path for CGCG:
Viterbi Algorithm
- Result with the casino example:
Three algorithms
- What is the most probable path for generating a given sequence? Viterbi Algorithm
- How likely is a given sequence? Forward Algorithm
- How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm
Forward Algorithm (1)
- Probability over all possible paths
- The number of possible paths increases exponentially with the length of the sequence
- The forward algorithm lets us compute this efficiently
Forward Algorithm (2)
- Replace the maximisation steps of the Viterbi algorithm by sums
- Probability of the observed sequence up to and including xi, requiring πi = k: fk(i) = P(x1 ... xi, πi = k)
Forward Algorithm (3)
- Recursion: fl(i+1) = el(xi+1) Σk fk(i) akl
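A minimal sketch of the forward recursion on the casino model, summing over all paths to get P(x) (uniform start assumed, no explicit end state):

```python
import numpy as np

# Forward algorithm on the casino HMM:
# f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) a_kl, and P(x) = sum_k f_k(L).
a = np.array([[0.95, 0.05], [0.10, 0.90]])
e = np.vstack([np.full(6, 1 / 6), [0.1] * 5 + [0.5]])
start = np.array([0.5, 0.5])

def forward(x):
    f = start * e[:, x[0] - 1]
    for xi in x[1:]:
        f = e[:, xi - 1] * (f @ a)  # sum over previous states k
    return f.sum()                  # P(x), summed over all paths

px = forward([6, 6, 6])
```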
Three algorithms
- What is the most probable path for generating a given sequence? Viterbi Algorithm
- How likely is a given sequence? Forward Algorithm
- How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm
Backward Algorithm (1)
- Probability of the observed sequence from xi+1 to the end, requiring πi = k: bk(i) = P(xi+1 ... xL | πi = k)
Disadvantage of these algorithms
- Multiplying many probabilities gives very small numbers, which can lead to underflow errors on the computer
- This can be solved by running the algorithms in log space, i.e. calculating log(vl(i))
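A tiny illustration of the underflow problem and the log-space fix:

```python
import math

# 500 probabilities of 0.1: the direct product underflows to exactly 0.0
# in double precision, while the log-space sum stays perfectly usable.
p = [0.1] * 500
product = math.prod(p)                 # 1e-500 -> underflows to 0.0
log_sum = sum(math.log(x) for x in p)  # = 500 * log(0.1), no underflow
```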
Backward Algorithm
- Recursion: bk(i) = Σl akl el(xi+1) bl(i+1)
Posterior State Probability (1)
- Probability that observation xi came from state k, given the observed sequence
- The posterior probability of state k at time i when the emitted sequence is known:
- P(πi = k | x)
Posterior State Probability (2)
- First calculate the probability of producing the entire observed sequence with the ith symbol being produced by state k:
- P(x, πi = k) = fk(i) · bk(i)
Posterior State Probability (3)
- The posterior probabilities are then: P(πi = k | x) = fk(i) bk(i) / P(x)
- P(x) is the result of either the forward or the backward calculation
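Putting the forward and backward passes together, the posterior can be sketched on the casino model (uniform start assumed):

```python
import numpy as np

# Posterior state probabilities via forward-backward:
# P(pi_i = k | x) = f_k(i) * b_k(i) / P(x), on the casino HMM.
a = np.array([[0.95, 0.05], [0.10, 0.90]])  # fair <-> loaded
e = np.vstack([np.full(6, 1 / 6), [0.1] * 5 + [0.5]])
start = np.array([0.5, 0.5])

def posterior(x):
    L = len(x)
    f = np.zeros((L, 2))
    b = np.ones((L, 2))
    f[0] = start * e[:, x[0] - 1]
    for i in range(1, L):                  # forward pass
        f[i] = e[:, x[i] - 1] * (f[i - 1] @ a)
    for i in range(L - 2, -1, -1):         # backward pass
        b[i] = a @ (e[:, x[i + 1] - 1] * b[i + 1])
    return f * b / f[-1].sum()             # divide by P(x)

post = posterior([1, 6, 6, 6, 2])
# each row sums to 1; the run of sixes pulls the posterior towards loaded
```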
Posterior Probabilities (4)
- For the casino example:
Two questions
- How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
- How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
Prediction of CpG islands
First way: Viterbi Algorithm
- Find the most probable path through the model
- When this path goes through the '+' states, a CpG island is predicted
Prediction of CpG islands
Second way: Posterior Decoding
- function G(i|x) = Σk P(πi = k | x) g(k)
- g(k) = 1 for k ∈ {A+, C+, G+, T+}
- g(k) = 0 for k ∈ {A-, C-, G-, T-}
- G(i|x) is the posterior probability, according to the model, that base i is in a CpG island
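The weighting by g is just a matrix-vector product over the posteriors. A sketch with a hypothetical posterior matrix (the values are made up for illustration; in practice they come from the forward and backward passes):

```python
import numpy as np

# Posterior decoding with a 0/1 function g: G(i|x) = sum_k P(pi_i = k | x) g(k).
# P is a hypothetical posterior matrix (rows: positions i, columns: states
# ordered A+, C+, G+, T+, A-, C-, G-, T-).
P = np.array([[0.05, 0.60, 0.10, 0.15, 0.02, 0.03, 0.03, 0.02],
              [0.02, 0.05, 0.70, 0.13, 0.03, 0.03, 0.02, 0.02],
              [0.01, 0.02, 0.02, 0.05, 0.20, 0.30, 0.20, 0.20]])
g = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # g(k) = 1 for the '+' states

G = P @ g         # G(i|x): the total '+' posterior mass at each base
island = G > 0.5  # call base i part of a CpG island
```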
Summary (1)
- A Markov chain is a collection of states in which each state depends only on the state before it
- A hidden Markov model is a model in which the state sequence is 'hidden'
Summary (2)
- Most probable path: Viterbi algorithm
- How likely is a given sequence: forward algorithm
- Posterior state probability: forward and backward algorithms (used to find the most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 10
DiscriminationDiscrimination
48 putative CpG islands are extractedDerive 2 models
- regions labelled as CpG island (lsquo+rsquo model)
- regions from the remainder (lsquo-rsquo model)
Transition probabilities are set- Where Cst+ is number of times letter t follows letter s
Marjolijn Elsinga amp Elze de Groot 11
Maximum Likelihood EstimatorsMaximum Likelihood Estimators
Each row sums to 1Tables are asymmetric
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 11
Maximum Likelihood EstimatorsMaximum Likelihood Estimators
Each row sums to 1Tables are asymmetric
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 12
Log-odds ratioLog-odds ratio
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 13
Discrimination shownDiscrimination shown
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)

First calculate the probability of producing the entire observed sequence with the ith symbol being produced by state k:
P(x, πi = k) = fk(i) bk(i)
Posterior State Probability (3)

The posterior probabilities are then
P(πi = k | x) = fk(i) bk(i) / P(x)
where P(x) is the result of the forward (or backward) calculation
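Combining the forward and backward tables gives the posteriors directly. A self-contained sketch on the casino model (same slide probabilities, 0.5/0.5 start assumed); a good sanity check is that the posteriors at each position sum to 1.

```python
# Posterior state probabilities P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)
# for the dishonest-casino HMM.

def forward(obs, states, start_p, trans_p, emit_p):
    f = [{k: start_p[k] * emit_p[k][obs[0]] for k in states}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({l: emit_p[l][x] * sum(prev[k] * trans_p[k][l]
                                        for k in states)
                  for l in states})
    return f

def backward(obs, states, trans_p, emit_p):
    n = len(obs)
    b = [None] * n
    b[n - 1] = {k: 1.0 for k in states}
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(trans_p[k][l] * emit_p[l][obs[i + 1]] * b[i + 1][l]
                       for l in states)
                for k in states}
    return b

def posterior(obs, states, start_p, trans_p, emit_p):
    f = forward(obs, states, start_p, trans_p, emit_p)
    b = backward(obs, states, trans_p, emit_p)
    p_x = sum(f[-1].values())        # P(x) from the forward termination
    return [{k: f[i][k] * b[i][k] / p_x for k in states}
            for i in range(len(obs))]

STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}   # assumed, not given on the slides
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
EMIT = {"F": {r: 1 / 6 for r in "123456"},
        "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}

post = posterior("266661", STATES, START, TRANS, EMIT)
```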
Posterior Probabilities (4)

For the casino example:
Two questions

How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
Prediction of CpG islands

First way: the Viterbi Algorithm
- Find the most probable path through the model
- When this path goes through the '+' states, a CpG island is predicted
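A minimal Viterbi sketch, shown on the casino model for brevity; the CpG model uses the same recursion over its '+'/'-' states. As before, the 0.5/0.5 start distribution is an assumption.

```python
# Viterbi algorithm sketch: most probable state path for the casino HMM.

def viterbi(obs, states, start_p, trans_p, emit_p):
    v = [{k: start_p[k] * emit_p[k][obs[0]] for k in states}]
    ptr = []                       # back-pointers for the traceback
    for x in obs[1:]:
        prev = v[-1]
        row, back = {}, {}
        for l in states:
            # v_l(i+1) = e_l(x_{i+1}) * max_k v_k(i) * a_kl
            best = max(states, key=lambda k: prev[k] * trans_p[k][l])
            row[l] = emit_p[l][x] * prev[best] * trans_p[best][l]
            back[l] = best
        v.append(row)
        ptr.append(back)
    # traceback from the best final state
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return "".join(reversed(path))

STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}   # assumed, not given on the slides
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
EMIT = {"F": {r: 1 / 6 for r in "123456"},
        "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}

path = viterbi("12666661", STATES, START, TRANS, EMIT)
```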
Prediction of CpG islands

Second way: Posterior Decoding
- define a function g(k) on the states:
- g(k) = 1 for k ∈ {A+, C+, G+, T+}
- g(k) = 0 for k ∈ {A-, C-, G-, T-}
- G(i|x) = Σk P(πi = k | x) g(k) is then the posterior probability, according to the model, that base i is in a CpG island
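The g-weighting itself is tiny. In this sketch the posterior row is made up for illustration (it is not computed from a trained model); only the state names follow the slides.

```python
# Posterior decoding sketch: project state posteriors onto CpG / non-CpG
# with g(k), giving G(i|x) = sum_k P(pi_i = k | x) * g(k).

PLUS = {"A+", "C+", "G+", "T+"}

def g(k):
    """1 for '+' (island) states, 0 for '-' (background) states."""
    return 1.0 if k in PLUS else 0.0

def G(posterior_row):
    """posterior_row maps each state k to P(pi_i = k | x) at one position i."""
    return sum(p * g(k) for k, p in posterior_row.items())

# Hypothetical posterior values at one position, for illustration only:
row = {"C+": 0.55, "G+": 0.25, "C-": 0.15, "G-": 0.05}
island_prob = G(row)
```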
Summary (1)

A Markov chain is a collection of states, where each state depends only on the state before it
A hidden Markov model is a model in which the state sequence is 'hidden'
Summary (2)

Most probable path: the Viterbi algorithm
How likely is a given sequence: the forward algorithm
Posterior state probability: the forward and backward algorithms (used to find the most probable state for an observation)
Marjolijn Elsinga amp Elze de Groot 14
Simulation lsquo+rsquo modelSimulation lsquo+rsquo model
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 15
Simulation lsquo-rsquo modelSimulation lsquo-rsquo model
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 16
Todays topicsTodays topics
Markov chains
Hidden Markov models- Viterbi Algorithm- Forward Algorithm- Backward Algorithm- Posterior Probabilities
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 17
Hidden Markov Models (HMM) (1)Hidden Markov Models (HMM) (1)
No one-to-one correspondence between states and symbols
No longer possible to say what state the model is in when in xi
Transition probability from state k to l
πi is the ith state in the path (state sequence)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 18
Hidden Markov Models (HMM) (2)Hidden Markov Models (HMM) (2)
Begin state a0k
End state a0k
In CpG islands example
Marjolijn Elsinga amp Elze de Groot 19
Hidden Markov Models (HMM) (3)Hidden Markov Models (HMM) (3)
We need new set of parameters because we decoupled symbols from states
Probability that symbol b is seen when in state k
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Hidden Markov Models (HMM) (3)

We need a new set of parameters, because we have decoupled the symbols from the states.

Emission probability: the probability that symbol b is seen when in state k:

ek(b) = P(xi = b | πi = k)
Example: dishonest casino (1)

A fair die and a loaded die:
- Loaded die: probability 0.5 of a 6 and probability 0.1 for each of 1-5
- Switch from fair to loaded: probability 0.05
- Switch back: probability 0.1
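The parameters above can be written down directly; a minimal sketch in Python (the state names F/L and the dictionary layout are illustrative choices, not from the slides):

```python
# Dishonest-casino HMM parameters as given on the slide.
# "F" = fair die, "L" = loaded die; the names are illustrative.
TRANS = {                                  # a[k][l] = P(next state l | state k)
    "F": {"F": 0.95, "L": 0.05},           # switch fair -> loaded: 0.05
    "L": {"F": 0.10, "L": 0.90},           # switch loaded -> fair: 0.10
}
EMIT = {                                   # e[k][b] = P(symbol b | state k)
    "F": {s: 1 / 6 for s in "123456"},     # fair die: uniform over 1-6
    "L": {**{s: 0.1 for s in "12345"}, "6": 0.5},  # loaded die: 6 with prob 0.5
}
```

Each row of TRANS and EMIT is a probability distribution, so every row sums to 1.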
Dishonest casino (2)

Emission probabilities: an HMM is a model that generates, or emits, sequences.
Dishonest casino (3)

Hidden: you don't know whether the die is fair or loaded.

Joint probability of observed sequence x and state sequence π.
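The joint probability is a product of an initial term and, per position, an emission and a transition term. A sketch for the casino model (the 0.5/0.5 start distribution is an assumption; the slides do not give one):

```python
def joint_probability(x, path, trans, emit, init):
    """P(x, pi): init[pi_1] * e_{pi_1}(x_1) * prod_i a_{pi_{i-1},pi_i} * e_{pi_i}(x_i)."""
    p = init[path[0]] * emit[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][x[i]]
    return p

# Casino parameters from the slides; the uniform start is an assumption.
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
init = {"F": 0.5, "L": 0.5}

# Three sixes, all rolled with the loaded die:
p = joint_probability("666", "LLL", trans, emit, init)  # 0.5*0.5 * (0.9*0.5)**2
```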
Three algorithms

- What is the most probable path for generating a given sequence? Viterbi Algorithm
- How likely is a given sequence? Forward Algorithm
- How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm
Viterbi Algorithm

CGCG can be generated in different ways and with different probabilities.
Choose the path with the highest probability.
The most probable path can be found recursively.
Viterbi Algorithm (2)

vk(i) = the probability of the most probable path ending in state k with observation i
Viterbi Algorithm (3)
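The recursion vl(i+1) = el(xi+1) · maxk( vk(i) · akl ), with a traceback over the stored pointers, can be sketched for the casino model as follows (a sketch; the uniform start distribution is an assumption):

```python
def viterbi(x, states, trans, emit, init):
    """Most probable state path via v_l(i+1) = e_l(x_{i+1}) * max_k v_k(i) * a_{kl}."""
    v = [{k: init[k] * emit[k][x[0]] for k in states}]    # v_k(1)
    back = []                                             # traceback pointers
    for i in range(1, len(x)):
        col, ptr = {}, {}
        for l in states:
            best = max(states, key=lambda k: v[-1][k] * trans[k][l])
            ptr[l] = best
            col[l] = emit[l][x[i]] * v[-1][best] * trans[best][l]
        v.append(col)
        back.append(ptr)
    last = max(states, key=lambda k: v[-1][k])            # best final state
    path = [last]
    for ptr in reversed(back):                            # follow pointers backwards
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[-1][last]

states = ["F", "L"]
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
init = {"F": 0.5, "L": 0.5}

path, p = viterbi("266666", states, trans, emit, init)
```

A long run of sixes pulls the decoded path towards the loaded state.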
Viterbi Algorithm

The most probable path for CGCG:
Viterbi Algorithm

Result with the casino example:
Three algorithms

- What is the most probable path for generating a given sequence? Viterbi Algorithm
- How likely is a given sequence? Forward Algorithm
- How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm
Forward Algorithm (1)

Probability over all possible paths.
The number of possible paths increases exponentially with the length of the sequence.
The forward algorithm enables us to compute this efficiently.
Forward Algorithm (2)

Replacing the maximisation steps in the Viterbi algorithm with sums.

fk(i) = P(x1…xi, πi = k), the probability of the observed sequence up to and including xi, requiring πi = k
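Replacing max with a sum gives the forward recursion; summing the final column yields P(x), the likelihood of the whole sequence. A sketch (uniform start assumed):

```python
def forward_likelihood(x, states, trans, emit, init):
    """P(x) via f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) * a_{kl}."""
    f = {k: init[k] * emit[k][x[0]] for k in states}      # f_k(1)
    for sym in x[1:]:
        f = {l: emit[l][sym] * sum(f[k] * trans[k][l] for k in states)
             for l in states}
    return sum(f.values())                                # P(x) = sum_k f_k(L)

states = ["F", "L"]
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
init = {"F": 0.5, "L": 0.5}

px = forward_likelihood("6", states, trans, emit, init)   # 0.5*(1/6) + 0.5*0.5
```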
Forward Algorithm (3)
Three algorithms

- What is the most probable path for generating a given sequence? Viterbi Algorithm
- How likely is a given sequence? Forward Algorithm
- How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm
Backward Algorithm (1)

bk(i) = the probability of the observed sequence from xi+1 to the end of the sequence, given πi = k
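The backward table is filled from the end of the sequence, starting from bk(L) = 1; combined with the start distribution it gives the same P(x) as the forward algorithm. A sketch under the same assumed uniform start:

```python
def backward_table(x, states, trans, emit):
    """b_k(i) = P(x_{i+1}..x_L | pi_i = k), via
    b_k(i) = sum_l a_{kl} * e_l(x_{i+1}) * b_l(i+1), with b_k(L) = 1."""
    cols = [{k: 1.0 for k in states}]                     # b_k(L) = 1
    for sym in reversed(x[1:]):                           # x_L, ..., x_2
        prev = cols[-1]
        cols.append({k: sum(trans[k][l] * emit[l][sym] * prev[l] for l in states)
                     for k in states})
    cols.reverse()                                        # cols[i-1] holds b_.(i)
    return cols

states = ["F", "L"]
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
init = {"F": 0.5, "L": 0.5}

b = backward_table("66", states, trans, emit)
# P(x) = sum_k init[k] * e_k(x_1) * b_k(1), matching the forward result
px = sum(init[k] * emit[k]["6"] * b[0][k] for k in states)
```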
Disadvantage of these algorithms

Multiplying many probabilities gives very small numbers, which can lead to underflow errors on the computer.
This can be solved by running the algorithms in log space, i.e. calculating log(vl(i)).
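The underflow and the log-space fix are easy to demonstrate (a small illustration, not from the slides):

```python
import math

# Multiplying 1000 probabilities of 1/6 underflows: the true value (~1e-778)
# is far below the smallest positive double, so the product collapses to 0.0.
p = 1.0
for _ in range(1000):
    p *= 1 / 6

# In log space the same quantity is a perfectly ordinary number.
log_p = 1000 * math.log(1 / 6)   # about -1791.76
```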
Backward Algorithm
Posterior State Probability (1)

The probability that observation xi came from state k, given the observed sequence: the posterior probability of state k at time i, when the emitted sequence is known:

P(πi = k | x)
Posterior State Probability (2)

First calculate the probability of producing the entire observed sequence with the ith symbol being produced by state k:

P(x, πi = k) = fk(i) bk(i)
Posterior State Probability (3)

The posterior probabilities will then be:

P(πi = k | x) = fk(i) bk(i) / P(x)

P(x) is the result of the forward or the backward calculation.
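Putting the two tables together gives the posterior P(πi = k | x) = fk(i) bk(i) / P(x). A self-contained sketch for the casino model (uniform start assumed):

```python
def posteriors(x, states, trans, emit, init):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x) for every position i."""
    # Forward pass: f[i][k] = P(x_1..x_{i+1}, pi = k)  (0-based columns)
    f = [{k: init[k] * emit[k][x[0]] for k in states}]
    for sym in x[1:]:
        f.append({l: emit[l][sym] * sum(f[-1][k] * trans[k][l] for k in states)
                  for l in states})
    # Backward pass: b[i][k] = P(x_{i+2}..x_L | pi = k), with the last column = 1
    b = [{k: 1.0 for k in states}]
    for sym in reversed(x[1:]):
        b.append({k: sum(trans[k][l] * emit[l][sym] * b[-1][l] for l in states)
                  for k in states})
    b.reverse()
    px = sum(f[-1].values())                              # P(x)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]

states = ["F", "L"]
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
init = {"F": 0.5, "L": 0.5}

post = posteriors("36662", states, trans, emit, init)
# every position gets a distribution over the states
```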
Posterior Probabilities (4)

For the casino example:
Two questions

How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
Prediction of CpG islands

First way: Viterbi Algorithm
- Find the most probable path through the model
- When this path goes through the '+' states, a CpG island is predicted
Prediction of CpG islands

Second way: Posterior Decoding
- function: G(i|x) = Σk P(πi = k | x) g(k)
- g(k) = 1 for k ∈ {A+, C+, G+, T+}
- g(k) = 0 for k ∈ {A-, C-, G-, T-}
- G(i|x) is the posterior probability, according to the model, that base i is in a CpG island
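The g-function decoding can be sketched as follows; the posterior columns here are invented stand-ins for illustration, not output of a real CpG model:

```python
PLUS_STATES = {"A+", "C+", "G+", "T+"}   # the CpG-island states

def g(k):
    """g(k) = 1 for '+' states, 0 for '-' states."""
    return 1.0 if k in PLUS_STATES else 0.0

def cpg_posterior(columns):
    """G(i|x) = sum_k P(pi_i = k | x) * g(k), per position i."""
    return [sum(p * g(k) for k, p in col.items()) for col in columns]

# Illustrative posterior columns (made-up numbers):
cols = [{"C+": 0.7, "C-": 0.3}, {"G+": 0.2, "G-": 0.8}]
G = cpg_posterior(cols)   # probability per base of being inside a CpG island
```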
Summary (1)

A Markov chain is a collection of states, where each state depends only on the state before it.

A hidden Markov model is a model in which the state sequence is 'hidden'.
Summary (2)

Most probable path: Viterbi algorithm
How likely is a given sequence: forward algorithm
Posterior state probability: forward and backward algorithms (used for the most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 20
Example dishonest casino (1)Example dishonest casino (1)
Fair die and loaded dieLoaded die probability 05 of a 6 and
probability 01 for 1-5Switch from fair to loaded probability
005Switch back probability 01
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 21
Dishonest casino (2)Dishonest casino (2)
Emission probabilities HMM model that generate or emit sequences
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 22
Dishonest casino (3)Dishonest casino (3)
Hidden you donrsquot know if die is fair or loaded
Joint probability of observed sequence x and state sequence π
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 23
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 24
Viterbi AlgorithmViterbi Algorithm
CGCG can be generated on different ways and with different probabilities
Choose path with highest probability
Most probable path can be found recursively
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 25
Viterbi Algorithm (2)Viterbi Algorithm (2)
vk(i) = probability of most probable path ending in state k with observation i
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)
Probability that observation xi came from state k, given the observed sequence
The posterior probability of state k at time i when the emitted sequence is known:
P(πi = k | x)
Posterior State Probability (2)
First calculate the probability of producing the entire observed sequence with the ith symbol being produced by state k:
P(x, πi = k) = fk(i) · bk(i)
Posterior State Probability (3)
The posterior probabilities are then:
P(πi = k | x) = fk(i) bk(i) / P(x)
where P(x) is the result of the forward (or backward) calculation
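Putting the two passes together, a sketch on the same hypothetical two-state model as before (illustrative probabilities). At every position the posteriors over states must sum to 1, which makes a convenient sanity check.

```python
# Hypothetical two-state model; the probabilities are illustrative assumptions.
states = ['+', '-']
start = {'+': 0.5, '-': 0.5}
trans = {'+': {'+': 0.7, '-': 0.3}, '-': {'+': 0.2, '-': 0.8}}
emit = {'+': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
        '-': {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}}

def posteriors(x):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x) for every position i."""
    L = len(x)
    f = [{k: start[k] * emit[k][x[0]] for k in states}]       # forward pass
    for sym in x[1:]:
        f.append({l: emit[l][sym] * sum(f[-1][k] * trans[k][l] for k in states)
                  for l in states})
    b = [dict.fromkeys(states, 1.0) for _ in range(L)]        # backward pass
    for i in range(L - 2, -1, -1):
        for k in states:
            b[i][k] = sum(trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l]
                          for l in states)
    px = sum(f[-1].values())                                  # P(x)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(L)]
```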
Posterior Probabilities (4)
For the casino example:
Two questions
How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
Prediction of CpG islands
First way: Viterbi Algorithm
- Find the most probable path through the model
- When this path goes through the '+' states, a CpG island is predicted
Prediction of CpG islands
Second way: Posterior Decoding
- function G(i|x) = Σk P(πi = k | x) g(k)
- g(k) = 1 for k ∈ {A+, C+, G+, T+}
- g(k) = 0 for k ∈ {A-, C-, G-, T-}
- G(i|x) is the posterior probability, according to the model, that base i is in a CpG island
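In the hypothetical two-state sketch model used here (a single '+' and a single '-' state with illustrative probabilities, unlike the lecture's eight-state CpG model), G(i|x) reduces to the posterior probability of the '+' state, and a base can be called part of a CpG island when G(i|x) > 0.5.

```python
# Hypothetical two-state stand-in for the eight-state CpG model;
# the probabilities are illustrative assumptions.
states = ['+', '-']
g = {'+': 1.0, '-': 0.0}          # g(k) = 1 on CpG states, 0 otherwise
start = {'+': 0.5, '-': 0.5}
trans = {'+': {'+': 0.7, '-': 0.3}, '-': {'+': 0.2, '-': 0.8}}
emit = {'+': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
        '-': {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}}

def G(x):
    """G(i|x) = sum_k P(pi_i = k | x) * g(k) for every position i."""
    L = len(x)
    f = [{k: start[k] * emit[k][x[0]] for k in states}]       # forward pass
    for sym in x[1:]:
        f.append({l: emit[l][sym] * sum(f[-1][k] * trans[k][l] for k in states)
                  for l in states})
    b = [dict.fromkeys(states, 1.0) for _ in range(L)]        # backward pass
    for i in range(L - 2, -1, -1):
        for k in states:
            b[i][k] = sum(trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l]
                          for l in states)
    px = sum(f[-1].values())
    return [sum(f[i][k] * b[i][k] / px * g[k] for k in states)
            for i in range(L)]

island = [gi > 0.5 for gi in G('CGCG')]   # per-base island call
```

On the C/G-rich toy input every base is decoded as island under these made-up parameters.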
Summary (1)
A Markov chain is a collection of states where each state depends only on the state before it
A hidden Markov model is a model in which the state sequence is 'hidden'
Summary (2)
Most probable path: Viterbi algorithm
How likely is a given sequence: forward algorithm
Posterior state probability: forward and backward algorithms (used for the most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 26
Viterbi Algorithm (3)Viterbi Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 27
Viterbi AlgorithmViterbi Algorithm
Most probable path for CGCG
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 28
Viterbi AlgorithmViterbi AlgorithmResult with casino example
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 29
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 30
Forward Algorithm (1)Forward Algorithm (1)Probability over all possible paths
Number of possible paths increases exponentonial with length of sequence
Forward algorithm enables us to compute this efficiently
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 31
Forward Algorithm (2) Forward Algorithm (2)
Replacing maximisation steps for sums in viterbi algorithm
Probability of observed sequence up to and including xi requiring πi = k
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 32
Forward Algorithm (3)Forward Algorithm (3)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 33
Three algorithmsThree algorithmsWhat is the most probable path for generating a
given sequence
Viterbi AlgorithmHow likely is a given sequence
Forward AlgorithmHow can we learn the HMM parameters given a
set of sequences
Forward-Backward (Baum-Welch) Algorithm
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)Summary (1)
Markov chain is a collection of states where a state depends only on the state before
Hidden markov model is a model in which the states sequence is lsquohiddenrsquo
Marjolijn Elsinga amp Elze de Groot 45
Summary (2)Summary (2)
Most probable path viterbi algorithmHow likely is a given sequence forward
algorithmPosterior state probability forward and
backward algorithms (used for most probable state of an observation)
Marjolijn Elsinga amp Elze de Groot 34
Backward Algorithm (1)Backward Algorithm (1)Probability of observed sequence from xi to the
end of the sequence requiring πi = k
Marjolijn Elsinga amp Elze de Groot 35
Disadvantage AlgorithmsDisadvantage Algorithms
Multiplying many probabilities gives very small numbers which can lead to underflow errors on the computer
can be solved by doing the algorithms in log space calculating log(vl(i))
Marjolijn Elsinga amp Elze de Groot 36
Backward AlgorithmBackward Algorithm
Marjolijn Elsinga amp Elze de Groot 37
Posterior State Probability (1)Posterior State Probability (1)
Probability that observation xi came from state k given the observed sequence
Posterior probability of state k at time i when the emitted sequence is known
P(πi = k | x)
Marjolijn Elsinga amp Elze de Groot 38
Posterior State Probability (2)Posterior State Probability (2)First calculate probability of producing entire
observed sequence with the ith symbol being produced by state k
P(x πi = k) = fk (i) bk (i)
Marjolijn Elsinga amp Elze de Groot 39
Posterior State Probability (3) Posterior State Probability (3)
Posterior probabilities will then be
P(x) is result of forward or backward calculation
Marjolijn Elsinga amp Elze de Groot 40
Posterior Probabilities (4)Posterior Probabilities (4)
For the casino example
Marjolijn Elsinga amp Elze de Groot 41
Two questionsTwo questions
How would we decide if a short strech of genomic sequence comes from a CpG island or not
How would we find given a long piece of sequence the CpG islands in it if there are any
Marjolijn Elsinga amp Elze de Groot 42
Prediction of CpG islandsPrediction of CpG islands
First way Viterbi Algorithm
- Find most probable path through the model
- When this path goes through the lsquo+rsquo state a CpG island is predicted
Marjolijn Elsinga amp Elze de Groot 43
Prediction of CpG islandsPrediction of CpG islandsSecond Way Posterior Decoding
- function
- g(k) = 1 for k Є A+ C+ G+ T+
- g(k) = 0 for k Є A- C- G- T-
- G(i|x) is posterior probability according to the model that base i is in a CpG island
Marjolijn Elsinga amp Elze de Groot 44
Summary (1)
A Markov chain is a collection of states in which each state depends only on the state before it
A hidden Markov model is a model in which the state sequence is 'hidden'
Marjolijn Elsinga & Elze de Groot 45
Summary (2)
Most probable path: Viterbi algorithm
How likely is a given sequence: forward algorithm
Posterior state probability: forward and backward algorithms (used for the most probable state of an observation)