Lecture #6: Hidden Markov Model
Reminder: Finite State Markov Chain
An integer-time stochastic process, consisting of:
- a domain D of m states {1,...,m},
- an m-dimensional initial distribution vector (p(1),...,p(m)),
- an m×m transition probability matrix M = (a_st).
For each integer L, a Markov chain assigns probability to sequences (x1,...,xL) over D (i.e., xi ∈ D) as follows:
p(x1,...,xL) = p(x1) · ∏_{i=2}^{L} a_{x_{i-1} x_i}
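The formula above can be sketched directly in code. This is a minimal illustration on a hypothetical two-state chain; the initial distribution and transition numbers are made up, not from the lecture.

```python
# Hypothetical two-state Markov chain over D = {"A", "B"}.
p0 = {"A": 0.5, "B": 0.5}                       # initial distribution (p(1),...,p(m))
a = {("A", "A"): 0.9, ("A", "B"): 0.1,          # transition matrix M = (a_st)
     ("B", "A"): 0.2, ("B", "B"): 0.8}

def chain_prob(xs):
    """p(x1,...,xL) = p(x1) * prod over i of a_{x_{i-1} x_i}."""
    prob = p0[xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        prob *= a[(prev, cur)]
    return prob

print(chain_prob(["A", "A", "B"]))  # 0.5 * 0.9 * 0.1
```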
Similarly, (X1,..., Xi, ...) is a sequence of probability distributions over D, where Xi is the distribution of the state at time i.
Ergodic Markov Chains
A Markov chain is ergodic if:
- all states are recurrent (i.e., the transition graph is strongly connected), and
- it is not periodic.
The Fundamental Theorem of Finite-State Markov Chains: if a Markov chain is ergodic, then
1. it has a unique stationary distribution vector V > 0, which is an eigenvector of the transition matrix (VM = V), and
2. the distributions Xi converge to V as i → ∞.
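The convergence claimed by the theorem can be observed numerically: repeatedly applying the transition matrix to any starting distribution approaches V. A sketch on a made-up two-state ergodic chain (whose stationary vector works out to (2/3, 1/3)):

```python
# Made-up ergodic two-state transition matrix (rows sum to 1).
M = [[0.9, 0.1],
     [0.2, 0.8]]

def step(dist):
    """One update: X_{i+1}[t] = sum over s of X_i[s] * M[s][t]."""
    m = len(M)
    return [sum(dist[s] * M[s][t] for s in range(m)) for t in range(m)]

dist = [1.0, 0.0]          # start concentrated on state 0
for _ in range(200):
    dist = step(dist)

# dist is now (numerically) the stationary vector V, which satisfies V M = V;
# solving by hand for this chain gives V = (2/3, 1/3).
print(dist)
```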
Use of Markov Chains: Sequences with CpG Islands
Recall from last class: in human genomes the pair CG often transforms to (methyl-C)G, which often transforms to TG.
Hence the pair CG appears less often than expected from the independent frequencies of C and G alone.
Due to biological reasons, this process is sometimes suppressed in short stretches of genomes, such as in the start regions of many genes.
These areas are called CpG islands (the p denotes the phosphodiester bond between the C and the G).
Modeling Sequences with CpG Islands
The + model: use transition matrix A+ = (a+_st), where a+_st = the probability that t follows s in a CpG island.
The − model: use transition matrix A− = (a−_st), where a−_st = the probability that t follows s in a non-CpG island.
CpG Island: Question 1
We solved the following question:
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
We did so by modeling strings with and without CpG islands as Markov chains over the same states {A,C,G,T} but with different transition probabilities.
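One common way to answer Question 1 is a log-odds score, log p(x | + model) − log p(x | − model): positive favors a CpG island. A runnable sketch follows; the two transition tables below are illustrative made-up numbers, not real genomic estimates.

```python
from math import log

# Illustrative transition tables: inside islands C->G is common,
# outside it is suppressed. All other transitions are uniform.
bases = "ACGT"
a_plus  = {s: {t: 0.25 for t in bases} for s in bases}
a_minus = {s: {t: 0.25 for t in bases} for s in bases}
a_plus["C"]["G"], a_plus["C"]["A"] = 0.40, 0.10    # CG common inside islands
a_minus["C"]["G"], a_minus["C"]["A"] = 0.05, 0.45  # CG suppressed outside

def log_odds(x):
    """log p(x | + model) - log p(x | - model) over the transitions of x."""
    score = 0.0
    for s, t in zip(x, x[1:]):
        score += log(a_plus[s][t]) - log(a_minus[s][t])
    return score

print(log_odds("ACGCG"))   # positive: looks like an island under these tables
```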
CpG Island: Question 2
Now we solve the second question:
Question 2: Given a long piece of genomic data, does it contain CpG islands, and where?
For this we need to decide which parts of a given long sequence of letters are more likely to come from the + model, and which parts are more likely to come from the − model. This is done using the Hidden Markov Model, to be defined.
Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic):
A+, C+, G+, T+, A−, C−, G−, T−
The problem is that we don't know the sequence of states that is traversed, but just the sequence of letters. Therefore we use here a Hidden Markov Model.
Hidden Markov Model
A Markov chain (s1,...,sL), where for each state s and each symbol x we have an emission probability p(Xi = x | Si = s).
Application in communication: the message sent is (s1,...,sm) but we receive (x1,...,xm). Compute the most likely message sent.
Application in speech recognition: the word said is (s1,...,sm) but we recorded (x1,...,xm). Compute the most likely word said.
Hidden Markov Model
Notations:
Markov chain transition probabilities: p(Si+1 = t | Si = s) = a_st
Emission probabilities: p(Xi = b | Si = s) = e_s(b)
For Markov chains we know: p(s) = p(s1,...,sL) = p(s1) · ∏_{i=2}^{L} a_{s_{i-1} s_i}
What is p(s,x) = p(s1,...,sL; x1,...,xL)?
Independence assumptions:
We assume the following joint distribution for the full chain:
p(s,x) = ∏_{i=1}^{L} p(si | si-1) · p(xi | si)
This factorization encodes the following conditional independence assumptions:
p(si | s1,...,si-1, x1,...,xi-1) = p(si | si-1) and p(xi | s1,...,si, x1,...,xi-1) = p(xi | si)
Hidden Markov Model
p(Xi = b | Si = s) = e_s(b) means that the probability of xi depends only on si. Formally, this is equivalent to the conditional independence assumption:
p(Xi = xi | x1,...,xi-1, xi+1,...,xL, s1,...,si,...,sL) = e_{si}(xi)
Thus:
p(s,x) = ∏_{i=1}^{L} a_{s_{i-1} s_i} e_{s_i}(xi)
(where a_{s0 s1} denotes the initial probability p(s1)).
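The factorization p(s,x) = ∏ a_{s_{i-1} s_i} e_{s_i}(xi) can be computed directly. A minimal sketch on a toy two-state HMM (a fair coin F and a biased coin B; all probabilities are made up for illustration):

```python
# Transitions, including the special initial state "0".
a = {("0", "F"): 0.5, ("0", "B"): 0.5,
     ("F", "F"): 0.9, ("F", "B"): 0.1,
     ("B", "F"): 0.1, ("B", "B"): 0.9}
# Emissions: fair coin F, biased coin B.
e = {"F": {"H": 0.5, "T": 0.5},
     "B": {"H": 0.9, "T": 0.1}}

def joint_prob(states, symbols):
    """p(s,x) = prod over i of a_{s_{i-1} s_i} * e_{s_i}(x_i)."""
    prob = 1.0
    prev = "0"
    for s, x in zip(states, symbols):
        prob *= a[(prev, s)] * e[s][x]
        prev = s
    return prob

print(joint_prob("FB", "HH"))  # 0.5 * 0.5 * 0.1 * 0.9
```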
Hidden Markov Model
Exercise: Using the definition of conditional probability P(X|Y) = P(X,Y)/P(Y), prove formally that the equality
p(Xi = xi | x1,...,xi-1, xi+1,...,xL, s1,...,si,...,sL) = e_{si}(xi)
implies that for any subset Y ⊆ {x1,...,xi-1, xi+1,...,xL, s1,...,si,...,sL} such that si is in Y, it holds that p(Xi = xi | Y) = e_{si}(xi).
Hidden Markov Model for CpG Islands
The states: Domain(Si) = {+,−} × {A,C,T,G} (8 values)
In this representation P(xi | si) = 0 or 1, depending on whether xi is consistent with si. E.g., xi = G is consistent with si = (+,G) and with si = (−,G), but not with any other state of si.
Use of HMM: A Posteriori Belief
The conditional probability of a variable X given the evidence e:
P(X | e) = P(X, e) / P(e)
This is the a posteriori belief in X given evidence e. This query is also called belief update.
We use an HMM to compute our a posteriori belief on a sequence, given some information on it (usually (x1,...,xL)).
Hidden Markov Model
Questions: Given the visible sequence x = (x1,...,xL), find:
1. A most probable (hidden) path.
2. The probability of x.
3. For each i = 1,...,L and for each state k, p(si = k | x).
1. Most Probable State Path
First question: Given an output sequence x = (x1,...,xL), a most probable path s* = (s*1,...,s*L) is one which maximizes p(s|x).
Most Probable Path (cont.)
Since
argmax_s p(s | x) = argmax_s p(s,x)/p(x) = argmax_s p(s,x),
we need to find s which maximizes p(s,x).
Viterbi's Algorithm for the Most Probable Path
The task: compute
v_l(i) = the probability of a most probable path (s1,...,si) emitting (x1,...,xi) and ending in state si = l.
Let the states be {1,...,m}.
Idea: for i = 1,...,L and for each state l, compute v_l(i) from the values v_k(i-1).
Viterbi's Algorithm for the Most Probable Path
v_l(i) = the probability of a most probable path (s1,...,si) emitting (x1,...,xi) and ending in state si = l.
Exercise: For i = 1,...,L and for each state l:
v_l(i) = e_l(xi) · max_k { v_k(i-1) a_kl }
Viterbi's Algorithm
[Figure: the HMM as a chain s1 → s2 → ... → sL, each si emitting xi, with the special initial state 0.]
We add the special initial state 0.
Initialization: v_0(0) = 1, v_k(0) = 0 for k > 0.
For i = 1 to L do, for each state l:
v_l(i) = e_l(xi) · max_k { v_k(i-1) a_kl }
ptr_i(l) = argmax_k { v_k(i-1) a_kl } [storing the previous state, for reconstructing the path]
Termination: the probability of a most probable path is max_k { v_k(L) }; the path itself is recovered by following the pointers back from the maximizing final state.
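The algorithm above can be sketched as runnable code. This uses the same toy fair/biased coin HMM as earlier examples (all numbers are illustrative, not from the lecture):

```python
states = ["F", "B"]
a = {("F", "F"): 0.9, ("F", "B"): 0.1,
     ("B", "F"): 0.1, ("B", "B"): 0.9}
init = {"F": 0.5, "B": 0.5}              # a_{0,l}: leaving the initial state 0
e = {"F": {"H": 0.5, "T": 0.5},
     "B": {"H": 0.9, "T": 0.1}}

def viterbi(x):
    """Return (most probable path, its probability) for output sequence x."""
    v = [{l: init[l] * e[l][x[0]] for l in states}]   # v_l(1)
    ptr = []                                          # back-pointers
    for i in range(1, len(x)):
        vi, pi = {}, {}
        for l in states:
            k_best = max(states, key=lambda k: v[-1][k] * a[(k, l)])
            pi[l] = k_best
            vi[l] = e[l][x[i]] * v[-1][k_best] * a[(k_best, l)]
        v.append(vi)
        ptr.append(pi)
    # Termination: best final state, then follow the pointers back.
    last = max(states, key=lambda l: v[-1][l])
    path = [last]
    for pi in reversed(ptr):
        path.append(pi[path[-1]])
    return "".join(reversed(path)), v[-1][last]

path, prob = viterbi("HHHH")
print(path, prob)
```

A run of four heads is best explained by staying in the biased state throughout, which is what the decoded path shows.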
2. Computing p(x)
Given an output sequence x = (x1,...,xL), compute the probability that this sequence was generated:
p(x) = Σ_s p(s,x)
The summation is taken over all state paths s generating x.
Forward Algorithm for Computing p(x)
The task: compute p(x) = Σ_s p(s,x).
Idea: for i = 1,...,L and for each state l, compute
f_l(i) = p(x1,...,xi; si = l),
the probability of all the paths which emit (x1,...,xi) and end in state si = l. Use the recursive formula:
f_l(i) = e_l(xi) · Σ_k f_k(i-1) a_kl
Forward Algorithm for Computing p(x)
Similar to Viterbi's algorithm, with the special initial state 0:
Initialization: f_0(0) := 1, f_k(0) := 0 for k > 0.
For i = 1 to L do, for each state l:
f_l(i) = e_l(xi) · Σ_k f_k(i-1) a_kl
Termination: p(x) = Σ_k f_k(L).
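A runnable sketch of the forward algorithm, on the same style of toy fair/biased coin HMM (illustrative numbers only):

```python
states = ["F", "B"]
a = {("F", "F"): 0.9, ("F", "B"): 0.1,
     ("B", "F"): 0.1, ("B", "B"): 0.9}
init = {"F": 0.5, "B": 0.5}              # a_{0,l}: leaving the initial state 0
e = {"F": {"H": 0.5, "T": 0.5},
     "B": {"H": 0.9, "T": 0.1}}

def forward(x):
    """Return the table f_l(i) and p(x) = sum over k of f_k(L)."""
    f = [{l: init[l] * e[l][x[0]] for l in states}]        # f_l(1)
    for i in range(1, len(x)):
        f.append({l: e[l][x[i]] * sum(f[-1][k] * a[(k, l)] for k in states)
                  for l in states})
    return f, sum(f[-1].values())

f, px = forward("HT")
print(px)
```

Unlike Viterbi, the recursion sums over previous states instead of maximizing, so the result is the total probability over all paths rather than the probability of the single best one.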
3. The Distribution of Si Given x
Given an output sequence x = (x1,...,xL), compute for each i = 1,...,L and for each state k the probability that si = k. This helps answer queries like: what is the probability that si is in a CpG island?
Solution in two stages:
1. For each i and each state k, compute p(si = k | x1,...,xL).
2. Do the same computation for every i = 1,...,L, but without repeating the first task L times.
Computing for a single i:
p(si | x1,...,xL) = P(x1,...,xL, si) / P(x1,...,xL)
Decomposing the computation:
P(x1,...,xL, si) = P(x1,...,xi, si) · P(xi+1,...,xL | x1,...,xi, si)
(by the equality p(A,B) = p(A) p(B|A)).
P(x1,...,xi, si) = f_{si}(i) ≜ F(si), so we are left with the task of computing P(xi+1,...,xL | x1,...,xi, si) ≜ B(si).
Decomposing the Computation
Exercise: Show from the definitions of Markov chains and hidden Markov chains that:
P(xi+1,...,xL | x1,...,xi, si) = P(xi+1,...,xL | si)
Denote P(xi+1,...,xL | si) ≜ B(si).
Decomposing the Computation
Summary:
P(x1,...,xL, si) = P(x1,...,xi, si) · P(xi+1,...,xL | x1,...,xi, si)
= P(x1,...,xi, si) · P(xi+1,...,xL | si) [equality due to the independence of {xi+1,...,xL} and {x1,...,xi} given si, by the Exercise]
= F(si) · B(si)
F(si): The Forward Algorithm
The algorithm computes F(si) = P(x1,...,xi, si) for i = 1,...,L (namely, considering the evidence up to time slot i).
Initialization: F(0) = 1.
For i = 1 to L do, for each state si:
F(si) = e_{si}(xi) · Σ_{si-1} F(si-1) a_{si-1, si}
B(si): The Backward Algorithm
The task: compute B(si) = P(xi+1,...,xL | si) for i = L-1,...,1 (namely, considering the evidence after time slot i).
First step (step L-1): compute B(sL-1):
B(sL-1) = P(xL | sL-1) = Σ_{sL} P(xL, sL | sL-1) = Σ_{sL} P(sL | sL-1) P(xL | sL) = Σ_{sL} a_{sL-1, sL} e_{sL}(xL)
In general, with B(sL) = 1, for i = L-1 down to 1:
B(si) = Σ_{si+1} a_{si, si+1} e_{si+1}(xi+1) B(si+1)
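A runnable sketch of the backward recursion, again on the toy fair/biased coin HMM (illustrative numbers only):

```python
states = ["F", "B"]
a = {("F", "F"): 0.9, ("F", "B"): 0.1,
     ("B", "F"): 0.1, ("B", "B"): 0.9}
e = {"F": {"H": 0.5, "T": 0.5},
     "B": {"H": 0.9, "T": 0.1}}

def backward(x):
    """b[i][l] = probability of emitting the suffix x[i+1:] given the state
    at (0-based) position i is l; this is B(s_{i+1}) in 1-based notation."""
    L = len(x)
    b = [None] * L
    b[L - 1] = {l: 1.0 for l in states}           # B(s_L) = 1
    for i in range(L - 2, -1, -1):                # walk backward through time
        b[i] = {l: sum(a[(l, k)] * e[k][x[i + 1]] * b[i + 1][k]
                       for k in states)
                for l in states}
    return b

b = backward("HT")
print(b[0])   # B(s_1) for each state
```

For the sequence "HT": b[0]["F"] = 0.9·0.5 + 0.1·0.1 = 0.46 and b[0]["B"] = 0.1·0.5 + 0.9·0.1 = 0.14, matching the step-L-1 formula above.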
The Combined Answer
1. To compute the probability that Si = si given {x1,...,xL}: run the forward algorithm to compute F(si) = P(x1,...,xi, si) and the backward algorithm to compute B(si) = P(xi+1,...,xL | si). The product F(si)·B(si) = P(x1,...,xL, si); dividing by p(x) = Σ_{si} F(si)B(si) gives the answer p(si | x) (for every possible value si).
2. To compute these probabilities for every i, simply run the forward and backward algorithms once, storing F(si) and B(si) for every i (and every value of si), then compute F(si)·B(si) for every i.
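Combining the two passes gives the posterior p(si = k | x) = F(si)·B(si)/p(x). A self-contained sketch on the toy fair/biased coin HMM (illustrative numbers only):

```python
states = ["F", "B"]
a = {("F", "F"): 0.9, ("F", "B"): 0.1,
     ("B", "F"): 0.1, ("B", "B"): 0.9}
init = {"F": 0.5, "B": 0.5}
e = {"F": {"H": 0.5, "T": 0.5},
     "B": {"H": 0.9, "T": 0.1}}

def posteriors(x):
    """Return ([{state: p(s_i = state | x)} for each i], p(x))."""
    L = len(x)
    # Forward pass: F[i][l] = P(x_1..x_{i+1}, state at position i is l).
    F = [{l: init[l] * e[l][x[0]] for l in states}]
    for i in range(1, L):
        F.append({l: e[l][x[i]] * sum(F[-1][k] * a[(k, l)] for k in states)
                  for l in states})
    # Backward pass: B[i][l] = P(suffix x[i+1:] | state at position i is l).
    B = [None] * L
    B[L - 1] = {l: 1.0 for l in states}
    for i in range(L - 2, -1, -1):
        B[i] = {l: sum(a[(l, k)] * e[k][x[i + 1]] * B[i + 1][k]
                       for k in states)
                for l in states}
    px = sum(F[-1].values())
    # Posterior at each position: F * B / p(x).
    return [{l: F[i][l] * B[i][l] / px for l in states} for i in range(L)], px

post, px = posteriors("HT")
print(post[0])
```

Both passes are run once and their tables stored, exactly as stage 2 above prescribes; each per-position posterior sums to 1 by construction.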
Time and Space Complexity of the Forward/Backward Algorithms
Time complexity is O(m²L), where m is the number of states: it is linear in the length of the chain, provided the number of states is constant. Space complexity is O(mL) for storing the tables F and B (plus O(m²) for the transition matrix).