Page 1

Computational Statistics with Application to Bioinformatics

Prof. William H. Press
Spring Term, 2008

The University of Texas at Austin

Unit 16: Hidden Markov Models

Page 2

Unit 16: Hidden Markov Models (Summary)

• Markov models
  – discrete states, discrete time steps
  – transition between states at each time by specified probabilities
  – how to find periodicity or reducibility
    • take a high power of the matrix by the successive-squares method
  – “irreducibility and aperiodicity imply ergodicity”
• Hidden Markov Models
  – we don’t observe the states, but instead “symbols” from them
    • which have a probabilistic distribution for each state
  – forward-backward algorithm estimates the states from the data
    • forward (backward) pass incorporates past (future) data at each time
  – example: gene finding in the galaxy Zyzyx
    • (fewer irrelevant complications than in our galaxy)
    • use NR3’s HMM class
• Baum-Welch re-estimation
  – uses the data to improve the estimate of the transition and symbol probabilities
  – it’s an EM method!
  – pure “unsupervised learning”
  – we try it on the Zyzyx data
    • can find the right answers starting from amazingly crude guesses!
• Hidden Semi-Markov Models [aka Generalized HMMs (GHMMs)]
  – give each state a specified residence time distribution
  – can be implemented by expanding the number of states in an HMM
  – we try it on Zyzyx and get somewhat better gene finding

Page 3

Markov Models

• Directed graph
  – may (usually does) have loops
• Discrete time steps
• Each time step, the state advances
  – with probabilities labeled on outgoing edges
  – self loops also ok
• “Markov” because no memory
  – knows only what state it is in now
• Markov models are especially important because there exists a fast algorithm for “parsing” their state from observed data
  – the so-called “Hidden Markov Models” (HMMs)

Page 4

The transition matrix A_ij is the probability of stepping from state i to state j. If p_t is the population (probability) vector over states at time t, one time step is

  p_{t+1} = Aᵀ p_t

(the transpose appears because of the way “from/to” are defined). A (right) stochastic matrix has non-negative entries with rows summing to 1.
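As a concrete illustration of the update p ← Aᵀp, here is a minimal standalone sketch in standard C++ (not from the slides):

#include <vector>
using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// One Markov time step: q[j] = sum_i A[i][j] * p[i], i.e. p <- A^T p,
// where A[i][j] = Prob(next state is j | current state is i).
Vec markovStep(const Mat &A, const Vec &p) {
    int n = (int)p.size();
    Vec q(n, 0.0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            q[j] += A[i][j] * p[i];
    return q;
}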

Page 5

Note the two different ways of drawing the same Markov model:

  – directed graph with loops
  – adding the time dimension: directed graph, no loops (can’t go backward in time)

Page 6

Every Markov model has at least one stationary (equilibrium) state:

  Aᵀ s = s  ⇐⇒  (Aᵀ − 1) s = 0

Is there a nullspace? Yes, because the columns of (Aᵀ − 1) all sum to zero, hence are linearly dependent. But how do we know that this s has nonnegative components?

Define a vector s⁺ with components s⁺_i ≡ |s_i|. Then ∀j,

  Σ_i A_ij s⁺_i = Σ_i |A_ij| |s_i| ≥ |Σ_i A_ij s_i| = |s_j| = s⁺_j

Summing over j, and using Σ_j A_ij = 1,

  Σ_i (Σ_j A_ij) s⁺_i = Σ_i s⁺_i ≥ Σ_j s⁺_j

so the ≥ must be an = in the sum, and hence also for each term, implying

  Σ_i A_ij s⁺_i = s⁺_j

Therefore s⁺ is the desired (positive) stationary state, qed.

(This is actually a simple special case of the Perron-Frobenius theorem.)
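A quick numerical illustration (standard C++, not from the slides): for an ergodic chain (see the next page), repeatedly applying Aᵀ to any starting distribution converges to a stationary s. A minimal power-iteration sketch, with an arbitrary example matrix:

#include <cmath>
#include <cstdio>
#include <vector>
using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

int main() {
    // A 3-state right stochastic matrix (rows sum to 1); an arbitrary example.
    Mat A = {{0.9, 0.1, 0.0},
             {0.0, 0.8, 0.2},
             {0.3, 0.0, 0.7}};
    Vec p = {1.0, 0.0, 0.0};                // any starting distribution
    for (int iter = 0; iter < 1000; iter++) {
        Vec q(3, 0.0);
        for (int i = 0; i < 3; i++)         // q = A^T p
            for (int j = 0; j < 3; j++) q[j] += A[i][j] * p[i];
        double change = 0.0;
        for (int j = 0; j < 3; j++) { change += std::fabs(q[j] - p[j]); p[j] = q[j]; }
        if (change < 1e-12) break;          // converged to a stationary state
    }
    printf("stationary s = (%.4f, %.4f, %.4f)\n", p[0], p[1], p[2]);
    return 0;
}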

Page 7

Does every Markov model eventually reach a unique equilibrium from all starting distributions (ergodicity)?

Not necessarily. Two things can go wrong:

1. More than one equilibrium distribution. (Fails test of “irreducibility”.)
2. Limit cycles. (Fails test of “aperiodicity”.)

The theorem is: Irreducibility and aperiodicity imply ergodicity.

Page 8

Easy to diagnose a particular Markov model numerically by taking a high power of its transition matrix (by successive squaring)

Say, take Aᵀ to the power 2³², which requires 2 × 32 M³ operations.

If the columns of the result are all identical, then it’s ergodic. (Done.)

Otherwise:
1. Zero rows are states that become unpopulated. Ignore their corresponding columns.
2. See if the remaining columns are self-reproducing under Aᵀ (eigenvectors of unit eigenvalue). When yes, they are equilibria.
3. When no, they are part of limit cycles.

So, this example has an unpopulated state and no equilibria.
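A standalone sketch of the successive-squaring diagnostic itself (standard C++, not from the slides): square Aᵀ 32 times and check whether all columns have become identical.

#include <cmath>
#include <cstdio>
#include <vector>
using Mat = std::vector<std::vector<double>>;

Mat matmul(const Mat &X, const Mat &Y) {         // plain O(M^3) matrix product
    int n = (int)X.size();
    Mat Z(n, std::vector<double>(n, 0.0));
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++) Z[i][j] += X[i][k] * Y[k][j];
    return Z;
}

int main() {
    // Right stochastic A for a small example chain (rows sum to 1).
    Mat A = {{0.9, 0.1, 0.0},
             {0.0, 0.8, 0.2},
             {0.3, 0.0, 0.7}};
    int n = (int)A.size();
    Mat T(n, std::vector<double>(n));            // T = A^T
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) T[i][j] = A[j][i];
    for (int s = 0; s < 32; s++) T = matmul(T, T);   // T = (A^T)^(2^32)
    bool ergodic = true;                         // ergodic iff all columns are identical
    for (int i = 0; i < n && ergodic; i++)
        for (int j = 1; j < n && ergodic; j++)
            if (std::fabs(T[i][j] - T[i][0]) > 1e-10) ergodic = false;
    printf(ergodic ? "ergodic; the stationary distribution is any column\n"
                   : "not ergodic: inspect columns for equilibria or limit cycles\n");
    return 0;
}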

Page 9

In a Hidden Markov Model, we don’t get to observe the states, but instead we see a “symbol” that each state probabilistically emits when it is entered

[Figure: a sequence of (hidden) states, each emitting one of a sequence of (observed) symbols.]

What can we say about the sequence of states, given a sequence of observations?

Page 10

Let’s try to estimate the probability of being in a certain state at a certain time

Define α_t(i) as the probability of state i at time t given (only) the data up to and including t: the “forward estimate”.

Written out directly, α_t(i) is a huge sum over all possible state paths ending in state i at time t, each term being the likelihood (or Bayes probability with uniform prior) of that exact path and the exact observed data.

As written, this is computationally infeasible. But it satisfies an easy recurrence! Up to an overall normalization,

  α_{t+1}(j) = Σ_i α_t(i) A_ij b_j(y_{t+1})

where b_j(y) is the probability that state j emits symbol y.

The Forward-Backward algorithm.

Page 11

Define β_t(i) as the probability of state i at time t given (only) the data to the future of t: the “backward estimate”. At the last time, β_{N−1}(i) is uniform: a uniform prior, since there is no data to the future of N−1.

Now, there is a backward recurrence! Up to an overall normalization,

  β_t(i) = Σ_j A_ij b_j(y_{t+1}) β_{t+1}(j)

And the grand estimate using all the data is (the “forward-backward algorithm”)

  P_t(i) = α_t(i) β_t(i) / L,   L ≡ Σ_i α_t(i) β_t(i)

L is the likelihood or Bayes probability of the data. Actually, it’s independent of t! You could use its numerical value to compare different models.

Worried about multiplying the α’s and β’s as independent probabilities? Markov guarantees that they are conditionally independent given i, and P_t(i) ∝ P_t(data | i).
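For concreteness, here is a minimal standalone sketch of the forward-backward recurrences in standard C++ (an illustration, not NR3’s actual HMM implementation), with per-step rescaling to avoid numerical underflow:

#include <vector>
using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// Forward-backward: given transition matrix a (m x m), symbol probabilities
// b (m x k), and an observed symbol sequence obs, return pstate[t][i] =
// Prob(state i at time t | all the data).  Each forward step is rescaled so
// that the alphas sum to 1; the same factors rescale the betas.
Mat forwardbackward(const Mat &a, const Mat &b, const std::vector<int> &obs) {
    int m = (int)a.size(), N = (int)obs.size();
    Mat alpha(N, Vec(m)), beta(N, Vec(m)), pstate(N, Vec(m));
    Vec norm(N);
    for (int i = 0; i < m; i++) alpha[0][i] = b[i][obs[0]] / m;   // uniform start
    for (int t = 0; t < N; t++) {                                 // forward pass
        if (t > 0)
            for (int j = 0; j < m; j++) {
                double s = 0.0;
                for (int i = 0; i < m; i++) s += alpha[t-1][i] * a[i][j];
                alpha[t][j] = s * b[j][obs[t]];
            }
        norm[t] = 0.0;
        for (int i = 0; i < m; i++) norm[t] += alpha[t][i];
        for (int i = 0; i < m; i++) alpha[t][i] /= norm[t];
    }
    for (int i = 0; i < m; i++) beta[N-1][i] = 1.0;               // no future data
    for (int t = N-2; t >= 0; t--) {                              // backward pass
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            for (int j = 0; j < m; j++) s += a[i][j] * b[j][obs[t+1]] * beta[t+1][j];
            beta[t][i] = s / norm[t+1];
        }
    }
    for (int t = 0; t < N; t++) {                                 // combine
        double s = 0.0;
        for (int i = 0; i < m; i++) s += alpha[t][i] * beta[t][i];
        for (int i = 0; i < m; i++) pstate[t][i] = alpha[t][i] * beta[t][i] / s;
    }
    return pstate;
}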

Page 12

Let’s work a biologically motivated example

In the galaxy Zyzyx, the Qiqiqi lifeform has a genome consisting of a linear sequence of amino acids, each chosen from 26 chemical possibilities, denoted A-Z.

Genes alternate with intergenic regions.

In intergenic regions, all A-Z are equiprobable. In genes, the vowels AEIOU are more frequent.

Genes always end with Z.

The length distribution of genes and intergenic regions is known (has been measured).

Can we find the genes?

On Earth, it’s 20 amino acids, with the additional complication of a genetic code mapping codons of three base-4 nucleotides (ACGT) into one a.a. Our example thus simplifies by having no ambiguity of reading frame, and also no ambiguity of strand.

qnkekxkdlscovjfehvesmdeelnzlzjeknvjgetyuhgvxlvjnvqlmcermojkrtuczgrbmpwrjtynonxveblrjuqiydehpzujdogaensduoermiadaaustihpialkxicilgktottxxwawjvenowzsuacnppiharwpqviuammkpzwwjboofvmrjwrtmzmcxdkclvkyvkizmckmpvwfoorbvvrnvuzfwszqithlkubjruoyyxgwvfgxzlzbkuwmkmzgmnsyb

(pvowel = 0.45. The same sequence is shown again on the slide with the genes and intergenes highlighted.)

Page 13

The model:

The code looks like this (NR3 C++ fragment):

Int i,j,mstat=3;
Doub pvowel=0.45;
MatDoub b(mstat,26,0.), a(mstat,mstat,0.);
a[0][0] = 1.-1./250.;                         // state 0: intergene, mean length 250
a[0][1] = 1.-a[0][0];
a[1][1] = 1.-1./50.;                          // state 1: gene, mean length 50
a[1][2] = 1.-a[1][1];
a[2][0] = 1.;                                 // state 2: the gene-ending Z, then back to intergene
for (i=0;i<26;i++) b[0][i] = 1./26.;          // intergene: all 26 symbols equiprobable
for (i=0;i<5;i++) b[1][i] = pvowel/5.;        // gene: vowels AEIOU (symbols 0..4) more frequent
for (i=5;i<26;i++) b[1][i] = (1.-pvowel)/21.;
b[2][25] = 1.;                                // the Z state always emits Z (symbol 25)
HMM hmm(a,b,seq);                             // seq: the observed sequence, coded as integers 0..25
hmm.forwardbackward();

Page 14

Embed in a mex-file and return hmm.pstate to Matlab. Matlab has its own HMM functions, but I haven’t mastered them. (Could someone do this and show this example?)

The forward-backward results on the previous data are:

[Figure: P(state G) and P(state Z) along the sequence. Annotations: the actual start; the right “z”; another “z”; enough chance excess vowels to make it not completely sure!]

[pstate pcorrect loglike merit] = hmmmex(0.45,1);
plot(1:260,pstate(1:260,2),'b')
hold on
plot(1:260,pstate(1:260,3),'r')

All the C++ code for this example is on the course website as “hmmmex.cpp” – but beware, it’s not cleaned up or made pretty!

Page 15

Bayesian re-estimation of the transition and output matrices (Baum-Welch re-estimation)

Given the data, we can re-estimate A. Estimating as an average over the data,

  Â_ij = [ Σ_t α_t(i) A_ij b_j(y_{t+1}) β_{t+1}(j) ] / [ Σ_t α_t(i) β_t(i) ]
       = (expected number of i → j transitions) / (expected number of i states)

The backward recurrence says that the denominator terms α_t(i) β_t(i) equal the numerator terms summed over j, so each row of Â properly sums to 1. (Note also that the overall likelihood L cancels in the ratio.)

Page 16

Similarly, re-estimate b:

  b̂_i(k) = [ Σ_{t : y_t = k} α_t(i) β_t(i) ] / [ Σ_t α_t(i) β_t(i) ]
         = (expected number of i states emitting symbol k) / (expected number of i states)

Hatted A and b are improved estimates of the hidden parameters. With them, you can go back and re-estimate α and β. And so forth.

Does this remind you of the EM method? It should! It is another special case. One can prove, by the same kind of convexity argument as previously, that Baum-Welch re-estimation always increases the overall likelihood L, iteratively to an (as usual, possibly only local) maximum.

Notice that re-estimation doesn’t require any additional information, or any training data. It is pure “unsupervised learning”.
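In terms of NR3’s HMM class (used in the code later in this unit), the whole re-estimation loop is just a few lines. A minimal sketch, assuming a, b, seq, and an iteration count niter have been set up as on the earlier slides:

HMM hmm(a,b,seq);
hmm.forwardbackward();              // get initial alphas, betas, and state estimates
for (Int iter=0;iter<niter;iter++) {
    hmm.baumwelch();                // re-estimate A and b from the current alphas/betas
    hmm.forwardbackward();          // recompute alphas/betas with the improved A and b
}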

Page 17

[Figure: left, the forward-backward parse before re-estimation (previous result); right, after re-estimation (data size N = 10⁵); below, how the log-likelihood increases with iteration number.]

parsing (forward-backward) can work on even small fragments, but re-estimation takes a lot of data

Page 18

On many problems, re-estimation can hill-climb to the “right” answer from amazingly crude initial guesses

a[0][0] = 1.-1./100.;
a[0][1] = 1.-a[0][0];
a[1][1] = 1.-1./100.;
a[1][2] = 1.-a[1][1];
a[2][0] = 1.;
for (i=0;i<26;i++) {
    b[0][i] = 1./26.;
    b[1][i] = 1./26.;
    b[2][i] = 1./26.;
}

The crude initial guess encodes only that:
  – genes and intergenes alternate and are each about 100 long
  – there is a one-symbol gene end marker
  – but we don’t know anything about which symbols are preferred in genes, end-genes, or intergenes

[Figure: left, log-likelihood increases monotonically with iteration; right, accuracy vs. iteration (in this example we know the “right” answers!). A period of stagnation is not unusual. The accuracy at the 1st step is an artifact: it is calling all states as 0, which happens to be true ~90% of the time.]

Page 19

Final estimates of the transition and symbol probability matrices (states: I = intergene, G = gene, Z = gene end):

A (rows “from”, columns “to” I, G, Z):
  I:  0.99397  0.00603  0.00000
  G:  0.00000  0.96991  0.03009
  Z:  1.00000  0.00000  0.00000

1/0.00603 = 166, 1/0.03009 = 33.2 — why these values?

b (rows: symbol; columns: state I, state G, state Z):
  A 0.03746 0.09039 0.00245
  E 0.03935 0.09196 0.00218
  I 0.03684 0.08903 0.00266
  O 0.03875 0.08992 0.00109
  U 0.03740 0.09214 0.00144
  B 0.03772 0.02590 0.00096
  C 0.03891 0.02716 0.00686
  D 0.03945 0.02792 0.00140
  F 0.03862 0.02515 0.00037
  G 0.03888 0.02505 0.00057
  H 0.03884 0.02874 0.00116
  J 0.03652 0.02926 0.00188
  K 0.03838 0.02777 0.00069
  L 0.03836 0.02673 0.00113
  M 0.03823 0.02822 0.00035
  N 0.03885 0.02639 0.00005
  P 0.03888 0.02677 0.00493
  Q 0.03880 0.02743 0.00572
  R 0.04055 0.02844 0.00115
  S 0.03933 0.02862 0.00769
  T 0.03923 0.02393 0.00053
  V 0.03924 0.02772 0.00057
  W 0.03862 0.02826 0.00308
  X 0.03820 0.02945 0.00039
  Y 0.03871 0.02759 0.00045
  Z 0.03586 0.00006 0.95026

Yes, it discovered all the vowels.
It discovered Z, but didn’t quite figure out that state 3 always emits Z.

Page 20

An obvious flaw in the model: Self-loops in Markov models must always give (discrete approximation of) exponentially distributed residence times
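To see why (a standard fact, not spelled out on the slide): if a state’s self-loop probability is a, the chance of staying exactly k steps is

  P(τ = k) = a^(k−1) (1 − a)

a geometric distribution with mean 1/(1 − a), the discrete analog of an exponential. That is why a[0][0] = 1 − 1/250 gives intergenes the right mean length of 250, but the wrong (exponential) shape.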

But Qiqiqi genes and intergenes are roughly gamma-law distributed in length. (In fact, they’re exactly gamma-law, because that’s how I constructed the genome – not from a Markov model!)

mug = 50.; sigg = 10.;                         // gene lengths: mean 50, s.d. 10
mui = 250.;                                    // intergene lengths: mean 250
ag = SQR(mug/sigg);
bg = mug/SQR(sigg);
Gammadev rani(2.,2./mui,ran.int32());          // intergene length generator
Gammadev rang(ag,bg,ran.int32());              // gene length generator

Can we make the results more accurate by somehow incorporating length info?

[Figure: a state with a self-loop and an exit event; the waiting time to an event in a Poisson process is exponentially distributed.]

Page 21

Generalized Hidden Markov Model (GHMM), also called Hidden Semi-Markov Model (HSMM)

the idea is to impose (or learn by re-estimation) an arbitrary probability distribution for the residency time τ in each state

can be thought of as an ordinary HMM where every state gets expanded into a “timer” cluster

output symbol probabilities are identical for all states in a timer (equal to those of the state before it was expanded)

Two timer-cluster constructions:
  – arbitrary distribution with τ ≤ n:  τ ∼ [p1, (1−p1)p2, (1−p1)(1−p2)p3, …]
  – Gamma-law distribution:  τ ∼ Gamma(α, p), with mean ⟨τ⟩ = α/p

Page 22

So, our intergene-gene-Z example becomes:

Input values for n0 and n1 (we’ll try various choices), then initialize the model like this:

Int i,j,n01=n0+n1,mstat=n01+1,niter=40;
Doub len0=250.,len1=50.;                       // initial guess for lengths (need not be this perfect)
MatDoub b(mstat,26,0.), a(mstat,mstat,0.);
for (i=0;i<n0;i++) {                           // Gamma-law timer: n0 states for the intergene
    a[i][i] = 1.-n0/len0;
    a[i][i+1] = 1.-a[i][i];
}
for (i=n0;i<n01;i++) {                         // Gamma-law timer: n1 states for the gene
    a[i][i] = 1.-n1/len1;
    a[i][i+1] = 1.-a[i][i];
}
a[n01][0] = 1.;
for (j=0;j<n01;j++) for (i=0;i<26;i++) b[j][i] = 1./26.;
b[n01][25] = 1.;                               // tell it about Z, but not about the vowels
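A quick check of the timer arithmetic (not on the slide): each of the n0 intergene timer states has self-loop probability 1 − n0/len0, hence a mean residence of len0/n0 steps; chaining n0 of them gives a mean total length of n0 · (len0/n0) = len0, and the sum of n0 independent geometric waits is the discrete analog of a Gamma(n0)-law, which is why the cluster behaves as a Gamma-law timer.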

Page 23

We can use NR3’s HMM class for GHMMs by the kludge of averaging the output probabilities after each Baum-Welch re-estimation:

HMM hmm(a,b,seq);
hmm.forwardbackward();
for (i=1;i<niter;i++) {
    hmm.baumwelch();
    collapse_to_ghmm(hmm,n0,n1);               // force all states in a timer to share one output distribution
    hmm.forwardbackward();
}

with

void collapse_to_ghmm(HMM &hmm, Int n0, Int n1) {
    Int i,j,n01=n0+n1;
    Doub sum;
    for (j=0;j<26;j++) {
        for (sum=0.,i=0;i<n0;i++) sum += hmm.b[i][j];     // average over the intergene timer states...
        sum /= n0;
        for (i=0;i<n0;i++) hmm.b[i][j] = sum;             // ...and write it back to each of them
        for (sum=0.,i=n0;i<n01;i++) sum += hmm.b[i][j];   // same for the gene timer states
        sum /= n1;
        for (i=n0;i<n01;i++) hmm.b[i][j] = sum;
    }
}

See hmmmex.cpp. Actually this is not quite right, because it should be a weighted average by the number of times each state is occupied. The right way to do this would be to overload hmm.baumwelch with a slightly modified version that does the average properly. In this example the effect would be negligible.
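For completeness, here is a sketch of the occupancy-weighted average described above. This is an illustration, not code from the course website, and it assumes hmm.pstate is the (number of symbols) × mstat matrix of state probabilities filled in by forwardbackward(), as used in the Matlab plots earlier:

void collapse_to_ghmm_weighted(HMM &hmm, Int n0, Int n1) {
    Int i,j,t,n01=n0+n1, N=hmm.pstate.nrows();
    VecDoub occ(n01,0.);                        // expected occupancy of each timer state
    for (i=0;i<n01;i++) for (t=0;t<N;t++) occ[i] += hmm.pstate[t][i];
    for (j=0;j<26;j++) {
        Doub sum=0., wt=0.;
        for (i=0;i<n0;i++) { sum += occ[i]*hmm.b[i][j]; wt += occ[i]; }
        for (i=0;i<n0;i++) hmm.b[i][j] = sum/wt;          // occupancy-weighted intergene average
        sum=0.; wt=0.;
        for (i=n0;i<n01;i++) { sum += occ[i]*hmm.b[i][j]; wt += occ[i]; }
        for (i=n0;i<n01;i++) hmm.b[i][j] = sum/wt;        // occupancy-weighted gene average
    }
}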

Page 24

So how well do we do?

Accuracy with respect to genes, shown as a 2×2 table:
  TP  FN
  FP  TN

n0 = n1 = 1 (the previous HMM):
  accuracy = 0.9629
  table =
    0.1466  0.0227
    0.0144  0.8163

n0 = 2, n1 = 5:
  accuracy = 0.9690
  table =
    0.1498  0.0195
    0.0115  0.8192

n0 = 3, n1 = 8:
  accuracy = 0.9726
  table =
    0.1518  0.0175
    0.0099  0.8208

Typically, for this example, it’s starting a gene ~5 too late or ~3 too early.

sensitivity = TP/(TP+FN)
specificity = TN/(FP+TN)
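Plugging in the per-symbol tables above (arithmetic from those tables, not shown on the slide): for n0 = n1 = 1, sensitivity = 0.1466/(0.1466 + 0.0227) ≈ 0.866 and specificity = 0.8163/(0.8163 + 0.0144) ≈ 0.983; for n0 = 3, n1 = 8 they rise to about 0.897 and 0.988.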

For whole genes (length ~50), the sensitivity and specificity are basically 1.0000, because, with pvowel=0.45, the gene is highly statistically significant. What the HMM or GHMM does well is to call the boundaries as exactly as possible.

Obviously there’s a theoretical bound on the achievable accuracy, given that the exact sequence of an FP or FN might also occur as a TP or TN. Can you calculate or estimate the bound?

