What if a new genome comes?
• We just sequenced the porcupine genome
• We know CpG islands play the same role in this genome
• However, we have no known CpG islands for porcupines
• We suspect the frequency and characteristics of CpG islands are quite different in porcupines
How do we adjust the parameters in our model?
LEARNING
Problem 3: Learning
Re-estimate the parameters of the model based on training
data
Two learning scenarios
1. Estimation when the "right answer" is known
Examples:
GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the CpG islands
GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls
2. Estimation when the "right answer" is unknown
Examples:
GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice
QUESTION: Update the parameters θ of the model to maximize P(x|θ)
1. When the right answer is known
Given x = x1…xN
for which the true path π = π1…πN is known,
Define:
Akl = # times the k→l transition occurs in π
Ek(b) = # times state k in π emits b in x
We can show that the maximum likelihood parameters (maximizing P(x|θ)) are:

akl = Akl / Σi Aki        ek(b) = Ek(b) / Σc Ek(c)
1. When the right answer is known
Intuition: when we know the underlying states, the best estimate is the average frequency of transitions & emissions that occur in the training data
Drawback: given little data, there may be overfitting: P(x|θ) is maximized, but θ is unreasonable: 0 probabilities – VERY BAD
Example: given 10 casino rolls, we observe
x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
π = F, F, F, F, F, F, F, F, F, F
Then: aFF = 1; aFL = 0
eF(1) = eF(3) = eF(6) = .2; eF(2) = .3; eF(4) = 0; eF(5) = .1
Pseudocounts
Solution for small training sets:
Add pseudocounts
Akl = # times the k→l transition occurs in π, plus rkl
Ek(b) = # times state k in π emits b in x, plus rk(b)
rkl, rk(b) are pseudocounts representing our prior belief
Larger pseudocounts: strong prior belief
Small pseudocounts (< 1): just to avoid 0 probabilities
Then:

akl = (Akl + rkl) / Σl' (Akl' + rkl')    and    ek(b) = (Ek(b) + rk(b)) / Σb' (Ek(b') + rk(b'))
Pseudocounts
Example: dishonest casino
We will observe the player for one day: 600 rolls
Reasonable pseudocounts:
r0F = r0L = rF0 = rL0 = 1;
rFL = rLF = rFF = rLL = 1;
rF(1) = rF(2) = … = rF(6) = 20 (strong belief fair is fair)
rL(1) = rL(2) = … = rL(6) = 5 (wait and see for loaded)
Above #s pretty arbitrary – assigning priors is an art
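The counting-plus-pseudocounts estimate above is easy to sketch in code. This is a minimal illustration, not the lecture's code; the function name and the single shared pseudocount values are my own simplifications (the slides allow a different rkl / rk(b) per entry).

```python
from collections import defaultdict

def estimate_parameters(state_path, emissions, states, symbols,
                        r_trans=1.0, r_emit=1.0):
    """ML estimation with pseudocounts:
    a_kl = (A_kl + r) / sum_l' (A_kl' + r),
    e_k(b) = (E_k(b) + r) / sum_b' (E_k(b') + r)."""
    A = defaultdict(float)   # A_kl: observed k->l transition counts
    E = defaultdict(float)   # E_k(b): observed emissions of b from state k
    for k, l in zip(state_path, state_path[1:]):
        A[(k, l)] += 1
    for k, b in zip(state_path, emissions):
        E[(k, b)] += 1
    a, e = {}, {}
    for k in states:
        tot_a = sum(A[(k, l)] + r_trans for l in states)
        for l in states:
            a[(k, l)] = (A[(k, l)] + r_trans) / tot_a
        tot_e = sum(E[(k, b)] + r_emit for b in symbols)
        for b in symbols:
            e[(k, b)] = (E[(k, b)] + r_emit) / tot_e
    return a, e

# The all-Fair casino example from the text: 10 rolls, all labeled F.
rolls = [2, 1, 5, 6, 1, 2, 3, 6, 2, 3]
path = ['F'] * 10
a, e = estimate_parameters(path, rolls, ['F', 'L'], range(1, 7))
print(a[('F', 'L')])   # 1/11: no longer a forbidden 0-probability transition
print(e[('F', 4)])     # 1/16: the unseen roll 4 also gets nonzero probability
```

With pseudocount 1 the never-observed F→L transition gets probability 1/11 instead of 0, which is exactly the point of the pseudocount fix.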
2. When the right answer is unknown
We don’t know the true Akl, Ek(b)
Idea:
• We estimate our “best guess” on what Akl, Ek(b) are
• We update the parameters of the model, based on our guess
• We repeat
The general process for finding θ in this case is:
1. Start with an initial value of θ.
2. Find θ' so that p(x1,…, xn|θ') > p(x1,…, xn|θ).
3. Set θ = θ'.
4. Repeat until some convergence criterion is met.
A general algorithm of this type is the Expectation Maximization (EM) algorithm, which we will meet later. For the specific case of HMMs, it is Baum-Welch training.
2. When the right answer is unknown
We don't know the true Akl, Ek(b). Starting with our best guess of a model M with parameters θ:
Given x = x1…xN
for which the true path π = π1…πN is unknown,
we can get to a provably more likely parameter set θ = (akl, ek(b))
Principle: EXPECTATION MAXIMIZATION
1. E-STEP: Estimate Akl, Ek(b) in the training data
2. M-STEP: Update θ = (akl, ek(b)) according to Akl, Ek(b)
3. Repeat 1 & 2, until convergence
Baum Welch training
We start with some values of akl and ek(b), which define prior values of θ. Baum-Welch training is an iterative algorithm which attempts to replace θ by a θ* such that
p(x|θ*) > p(x|θ). Each iteration consists of a few steps:
[HMM trellis diagram: hidden states s1, s2, …, sL-1, sL emitting x1, x2, …, xL-1, xL]
Baum Welch training
In case 1 we computed the optimal values of akl and ek(b), (for the optimal θ) by simply counting the number Akl of transitions from state k to state l, and the number Ek(b) of emissions of symbol b from state k, in the training set. This was possible since we knew all the states.
Baum Welch training
When the states are unknown, the counting process is replaced by an averaging process: for each edge si-1 → si we compute the expected number of "k to l" transitions, for all possible pairs (k, l), over this edge. Then, for each k and l, we take Akl to be the sum over all edges.
Baum Welch training
Similarly, for each emission si → xi = b and each state k, we compute the probability that si = k, which is the expected number of "k → b" emissions at this position. Then we take Ek(b) to be the sum over all such positions. These expected values are computed as follows:
Baum Welch: step 1a: count expected number of state transitions
For each i and for each k, l, compute the posterior state transition probabilities:
P(si-1=k, si=l | x, θ)
For this, we use the forward and backward algorithms.
Estimating new parameters
• So,
Akl = Σi P(πi = k, πi+1 = l | x, θ) = Σi fk(i) akl el(xi+1) bl(i+1) / P(x | θ)
• Similarly,
Ek(b) = [1/P(x | θ)] Σ{i | xi = b} fk(i) bk(i)
Reminder: finding posterior state probabilities
• p(si=k, x) = fk(i) bk(i) (the emissions before and after position i are independent given si=k). The values {fk(i), bk(i)} for every i, k are computed by one run of the forward/backward algorithms.
fk(i) = p(x1,…,xi,si=k ), the probability that in a path which emits (x1,..,xi), state si=k. bk(i)= p(xi+1,…,xL|si=k), the probability that a path emits (xi+1,..,xL), given that state si=k.
Baum Welch: Step 1a (cont)
Claim:

p(si-1=k, si=l | x, θ) = fk(i-1) akl el(xi) bl(i) / p(x|θ)

(akl and el(xi) are the parameters defined by θ, and fk(i-1), bl(i) are the forward and backward functions)
Step 1a: Computing P(si-1=k, si=l | x,θ)
P(x1,…,xL, si-1=k, si=l | θ) = P(x1,…,xi-1, si-1=k | θ) · akl el(xi) · P(xi+1,…,xL | si=l, θ)
= fk(i-1) akl el(xi) bl(i)
Via the forward algorithm
Via the backward algorithm
p(si-1=k, si=l | x, θ) = fk(i-1) akl el(xi) bl(i) / p(x|θ)
Step 1a (end)
For each pair (k,l), compute the expected number of state transitions from k to l, as the sum of the expected number of k to l transitions over all L edges :
Akl = Σi=1..L p(si-1=k, si=l | x, θ) = (1/p(x|θ)) Σi=1..L fk(i-1) akl el(xi) bl(i)
Step 1a for many sequences: when we have n input sequences (x1,…, xn), Akl is given by:

Akl = Σj=1..n Σi=1..L p(si-1=k, si=l | xj, θ) = Σj=1..n (1/p(xj)) Σi=1..L fjk(i-1) akl el(xji) bjl(i)

where fjk and bjl are the forward and backward functions for xj under θ.
Baum-Welch: step 1b: count expected number of symbol emissions
For each state k and each symbol b, for each i where xi = b, compute the expected number of times that si = k.
p(si=k | x1,…,xL) = p(x1,…,xi, si=k) p(xi+1,…,xL | si=k) / p(x1,…,xL) = fk(i) bk(i) / p(x1,…,xL)
Baum-Welch: Step 1b
For each state k and each symbol b, compute the expected number of emissions of b from k as the sum of the expected number of times that si = k, over all i’s for which xi = b.
Ek(b) = (1/p(x|θ)) Σ{i: xi=b} fk(i) bk(i)
Step 1b for many sequences
When we have n sequences (x1,..., xn ), the expected number of emissions of b from k is given by:
Ek(b) = Σj=1..n (1/p(xj)) Σ{i: xji=b} fjk(i) bjk(i)
Summary of Steps 1a and 1b: the E part of the Baum Welch training
These steps compute the expected numbers Akl of k→l transitions for all pairs of states k and l, and the expected numbers Ek(b) of emissions of symbol b from state k, for all states k and symbols b.
The next step is the M step, which is identical to the computation of optimal ML parameters when all states are known.
Baum-Welch: step 2
akl = Akl / Σl' Akl'    and    ek(b) = Ek(b) / Σb' Ek(b')
Use the Akl’s, Ek(b)’s to compute the new values of akl and ek(b). These values define θ*.
The correctness of the EM algorithm implies that: p(x1,…, xn|θ*) ≥ p(x1,…, xn|θ),
i.e., θ* does not decrease the probability of the data.
This procedure is iterated, until some convergence criterion is met.
The Baum-Welch Algorithm
Initialization:
Pick the best-guess model parameters (or arbitrary ones)
Iteration:
1. Forward
2. Backward
3. Calculate Akl, Ek(b)
4. Calculate new model parameters akl, ek(b)
5. Calculate new log-likelihood P(x | θ)
GUARANTEED TO BE HIGHER BY EXPECTATION-MAXIMIZATION
Until P(x | θ) does not change much
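The full iteration above fits in a few dozen lines of NumPy. This is a minimal sketch, not the lecture's code: the two-state model, the synthetic "rolls", and the starting parameters are all invented for illustration, and a practical implementation would work in log space or use scaling for long sequences.

```python
import numpy as np

def forward(x, init, a, e):
    # f[i, k] = P(x_1..x_i, s_i = k)
    f = np.zeros((len(x), a.shape[0]))
    f[0] = init * e[:, x[0]]
    for i in range(1, len(x)):
        f[i] = (f[i - 1] @ a) * e[:, x[i]]
    return f

def backward(x, a, e):
    # b[i, k] = P(x_{i+1}..x_L | s_i = k)
    b = np.zeros((len(x), a.shape[0]))
    b[-1] = 1.0
    for i in range(len(x) - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b

def baum_welch_step(x, init, a, e):
    f, b = forward(x, init, a, e), backward(x, a, e)
    px = f[-1].sum()                          # P(x | theta)
    A = np.zeros_like(a)                      # expected transition counts A_kl
    for i in range(len(x) - 1):
        # A_kl += f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x | theta)
        A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
    E = np.zeros_like(e)                      # expected emission counts E_k(b)
    for i, sym in enumerate(x):
        E[:, sym] += f[i] * b[i] / px
    # M-step: a_kl = A_kl / sum_l' A_kl', e_k(b) = E_k(b) / sum_b' E_k(b')
    return A / A.sum(1, keepdims=True), E / E.sum(1, keepdims=True), px

rng = np.random.default_rng(0)
x = rng.integers(0, 6, size=200)              # synthetic "rolls"
init = np.array([0.5, 0.5])                   # kept fixed in this sketch
a = np.array([[0.9, 0.1], [0.2, 0.8]])
e = np.full((2, 6), 1 / 6.0)
e[1] = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]
lls = []
for _ in range(20):
    a, e, px = baum_welch_step(x, init, a, e)
    lls.append(np.log(px))
print(all(l2 >= l1 - 1e-9 for l1, l2 in zip(lls, lls[1:])))  # True: monotone
```

The printed check is the EM guarantee from the slide: each iteration's P(x|θ) is at least as high as the previous one's.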
Viterbi training: maximizing the probability of the most probable path
States are unknown. Viterbi training attempts to maximize the probability of a most probable path, i.e. the value of
p(s(x1),…, s(xn), x1,…, xn | θ)
where s(xj) is the most probable (under θ) path for xj. We assume only one sequence (n=1).
Viterbi training (cont)
Start from given values of akl and ek(b), which define prior values of θ. Each iteration:
Step 1: Use Viterbi's algorithm to find a most probable path s(x), which maximizes p(s(x), x|θ).
Viterbi training (cont)
Step 2: Use the ML method for an HMM with known states to find θ* which maximizes p(s(x), x|θ*).
Note: in Step 1 the maximizing argument is the path s(x); in Step 2 it is the parameter set θ*.
Viterbi training (cont)
3. Set θ=θ* , and repeat. Stop when paths are not changed.
Claim 2: If s(x) is the optimal path in step 1 of two different iterations, then in both iterations θ has the same value, and hence
p(s(x), x|θ) will not increase in any later iteration. Hence the algorithm can terminate in this case.
Coin-Tossing Example
[HMM diagram: L tosses, hidden states H1…HL ∈ {Fair, Loaded} emitting X1…XL ∈ {Head, Tail};
start probabilities 1/2, 1/2; stay probability 0.9, switch probability 0.1;
emissions: Fair: head 1/2, tail 1/2; Loaded: head 3/4, tail 1/4]
Example: Homogeneous HMM, one sample
Start with some probability tables. Iterate until convergence:
E-step: Compute p(hi | hi-1, x1,…,xL) from p(hi, hi-1 | x1,…,xL), which is computed using the forward-backward algorithm as explained earlier.
M-step: Update the parameter simultaneously:
θ = (Σi p(hi=1 | hi-1=1, x1,…,xL) + p(hi=0 | hi-1=0, x1,…,xL)) / (L-1)
The model has a single stay parameter θ: p(hi = hi-1 | hi-1) = θ, and p(h1 | h0) = p(h1) = 0.5.
Coin-Tossing Example
Transition matrix: P = [[0.9, 0.1], [0.1, 0.9]]
Numeric example: 3 tosses
Outcomes: head, head, tail
Coin-Tossing Example
Numeric example: 3 tosses. Outcomes: head, head, tail
P(x1=head, h1=loaded) = P(loaded1) P(head | loaded1) = 0.5·0.75 = 0.375
P(x1=head, h1=fair) = P(fair1) P(head | fair1) = 0.5·0.5 = 0.25
So the first coin is more likely loaded. {step 1 - forward}
Recall:
f(hi) = P(x1,…,xi, hi) = Σhi-1 P(x1,…,xi-1, hi-1) P(hi | hi-1) P(xi | hi)
Coin-Tossing Example - forward
Numeric example: 3 tosses. Outcomes: head, head, tail
P(x1,…,xi, hi) = Σhi-1 P(x1,…,xi-1, hi-1) P(hi | hi-1) P(xi | hi)
P(x1=head,h1=loaded)= P(loaded1) P(head| loaded1)= 0.5*0.75=0.375
P(x1=head,h1=fair)= P(fair1) P(head| fair1)= 0.5*0.5=0.25
{step 1}
P(x1=head, x2=head, h2=loaded) = Σh1 P(x1, h1) P(h2 | h1) P(x2 | h2)
= p(x1=head, loaded1) P(loaded2 | loaded1) P(x2=head | loaded2) + p(x1=head, fair1) P(loaded2 | fair1) P(x2=head | loaded2)
= 0.375·0.9·0.75 + 0.25·0.1·0.75 = 0.253125 + 0.01875 = 0.271875
{step 2}
P(x1=head, x2=head, h2=fair) = p(x1=head, loaded1) P(fair2 | loaded1) P(x2=head | fair2) + p(x1=head, fair1) P(fair2 | fair1) P(x2=head | fair2)
= 0.375·0.1·0.5 + 0.25·0.9·0.5 = 0.01875 + 0.1125 = 0.13125
Coin-Tossing Example - forward
Numeric example: 3 tosses. Outcomes: head, head, tail
P(x1,…,xi, hi) = Σhi-1 P(x1,…,xi-1, hi-1) P(hi | hi-1) P(xi | hi)
P(x1 =head,x2 =head,h2 =loaded) = 0.271875
P(x1 =head,x2 =head,h2 =fair) = 0.13125
{step 2}
P(x1=head, x2=head, x3=tail, h3=loaded) = Σh2 P(x1, x2, h2) P(h3 | h2) P(x3 | h3)
= p(x1=head, x2=head, loaded2) P(loaded3 | loaded2) P(x3=tail | loaded3) + p(x1=head, x2=head, fair2) P(loaded3 | fair2) P(x3=tail | loaded3)
= 0.271875·0.9·0.25 + 0.13125·0.1·0.25 ≈ 0.0645
{step 3}
P(x1=head, x2=head, x3=tail, h3=fair) = p(x1=head, x2=head, loaded2) P(fair3 | loaded2) P(x3=tail | fair3) + p(x1=head, x2=head, fair2) P(fair3 | fair2) P(x3=tail | fair3)
= 0.271875·0.1·0.5 + 0.13125·0.9·0.5 ≈ 0.0727
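The forward values above are easy to verify mechanically. A short sketch (states 0 = fair, 1 = loaded; this is checking code, not part of the lecture):

```python
# Forward pass for the 3-toss example: stay 0.9 / switch 0.1,
# emissions fair = {head: 0.5, tail: 0.5}, loaded = {head: 0.75, tail: 0.25}.
trans = [[0.9, 0.1], [0.1, 0.9]]              # rows/cols: [fair, loaded]
emit = {('head', 0): 0.5, ('head', 1): 0.75,
        ('tail', 0): 0.5, ('tail', 1): 0.25}
obs = ['head', 'head', 'tail']

f = [[0.5 * emit[(obs[0], k)] for k in (0, 1)]]      # start probs 0.5 / 0.5
for i in range(1, len(obs)):
    f.append([sum(f[i - 1][j] * trans[j][k] for j in (0, 1)) * emit[(obs[i], k)]
              for k in (0, 1)])
print(f[1])   # [0.13125, 0.271875]
print(f[2])   # approx [0.07266, 0.06445]
```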
Coin-Tossing Example - backward
Numeric example: 3 tosses. Outcomes: head, head, tail
b(hi) = P(xi+1,…,xL | hi) = Σhi+1 P(hi+1 | hi) P(xi+1 | hi+1) b(hi+1)
{step 1}
b(h2=loaded) = P(x3=tail | h2=loaded) = P(h3=loaded | h2=loaded) P(x3=tail | h3=loaded) + P(h3=fair | h2=loaded) P(x3=tail | h3=fair) = 0.9·0.25 + 0.1·0.5 = 0.275
b(h2=fair) = P(x3=tail | h2=fair) = P(h3=loaded | h2=fair) P(x3=tail | h3=loaded) + P(h3=fair | h2=fair) P(x3=tail | h3=fair) = 0.1·0.25 + 0.9·0.5 = 0.475
Coin-Tossing Example - backward
Numeric example: 3 tosses. Outcomes: head, head, tail
b(hi) = P(xi+1,…,xL | hi) = Σhi+1 P(hi+1 | hi) P(xi+1 | hi+1) b(hi+1)
b(h2=loaded) = P(x3=tail | h2=loaded) = 0.275
b(h2=fair) = P(x3=tail | h2=fair) = 0.475
{step 2}
P(x2=head, x3=tail | h1=loaded) = P(loaded2 | loaded1) P(head | loaded) · 0.275 + P(fair2 | loaded1) P(head | fair) · 0.475 = 0.9·0.75·0.275 + 0.1·0.5·0.475 ≈ 0.209
P(x2=head, x3=tail | h1=fair) = P(loaded2 | fair1) P(head | loaded) · 0.275 + P(fair2 | fair1) P(head | fair) · 0.475 = 0.1·0.75·0.275 + 0.9·0.5·0.475 ≈ 0.234
p(x1,…,xL,hi ,hi+1)=f(hi) p(hi+1|hi) p(xi+1| hi+1) b(hi+1)
Coin-Tossing Example
Outcomes: head, head, tail
Recall:
f(h1=loaded) = 0.375, f(h1=fair) = 0.25
b(h2=loaded) = 0.275, b(h2=fair) = 0.475
p(x1,…,xL, h1, h2) = f(h1) p(h2 | h1) p(x2 | h2) b(h2)
p(x1,…,xL, h1=loaded, h2=loaded) = 0.375·0.9·0.75·0.275 ≈ 0.0696
p(x1,…,xL, h1=loaded, h2=fair) = 0.375·0.1·0.5·0.475 ≈ 0.0089
p(x1,…,xL, h1=fair, h2=loaded) = 0.25·0.1·0.75·0.275 ≈ 0.00516
p(x1,…,xL, h1=fair, h2=fair) = 0.25·0.9·0.5·0.475 ≈ 0.0534
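The backward values and the joint above can be checked the same way (states 0 = fair, 1 = loaded; again just verification code, not the lecture's):

```python
# Backward pass and the joint p(x, h1, h2) for the same 3-toss example.
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = {('head', 0): 0.5, ('head', 1): 0.75,
        ('tail', 0): 0.5, ('tail', 1): 0.25}
obs = ['head', 'head', 'tail']

b = [None, None, [1.0, 1.0]]                  # b(h_L) = 1 by definition
for i in (1, 0):
    b[i] = [sum(trans[k][j] * emit[(obs[i + 1], j)] * b[i + 1][j]
                for j in (0, 1)) for k in (0, 1)]
print(b[1])   # [0.475, 0.275]

f1_loaded = 0.5 * 0.75                        # f(h1=loaded) from the forward pass
# p(x, h1=loaded, h2=loaded) = f(h1) * a_LL * e_L(head) * b(h2)
joint = f1_loaded * 0.9 * 0.75 * b[1][1]
print(round(joint, 4))   # 0.0696
```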
Coin-Tossing Example
p(hi | hi-1, x1,…,xL) = p(x1,…,xL, hi, hi-1) / p(hi-1, x1,…,xL)
= f(hi-1) p(hi | hi-1) p(xi | hi) b(hi) / (f(hi-1) · b(hi-1))
M-step
M-step: Update the parameter simultaneously (in this case we only have one parameter, θ):
θ = (Σi p(hi=loaded | hi-1=loaded, x1,…,xL) + p(hi=fair | hi-1=fair, x1,…,xL)) / (L-1)
Variants of HMMs
Higher-order HMMs
• How do we model “memory” larger than one time point?
• First order: P(πi+1 = l | πi = k) = akl
• Second order: P(πi+1 = l | πi = k, πi-1 = j) = ajkl
• …
• A second order HMM with K states is equivalent to a first order HMM with K² states
[State diagram: a second order HMM over {H, T} as a first order HMM over pair-states HH, HT, TH, TT, with transitions aHHT, aHTH, aHTT, aTHH, aTHT, aTTH]
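The K² equivalence can be made concrete: from second-order transitions ajkl, build a first-order chain over pair-states (j, k). A sketch with made-up probabilities (the function name and random table are mine, not from the lecture):

```python
import itertools
import numpy as np

def second_to_first_order(a2):
    """a2[j, k, l] = P(next = l | prev = j, cur = k).
    Returns a first-order matrix over K^2 pair-states (j, k)."""
    K = a2.shape[0]
    A = np.zeros((K * K, K * K))
    for j, k, l in itertools.product(range(K), repeat=3):
        A[j * K + k, k * K + l] = a2[j, k, l]   # pair-state (j,k) -> (k,l)
    return A

rng = np.random.default_rng(1)
a2 = rng.random((2, 2, 2))
a2 /= a2.sum(axis=2, keepdims=True)             # normalize each conditional
A = second_to_first_order(a2)
print(np.allclose(A.sum(axis=1), 1.0))          # True: rows are distributions
```

Only transitions (j, k) → (k, ·) get nonzero probability, which is exactly why the diagram above has no edge such as HH → TH.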
Modeling the Duration of States
Length distribution of region X:
E[lX] = 1/(1-p)
• Geometric distribution, with mean 1/(1-p)
This is a significant disadvantage of HMMs
Several solutions exist for modeling different length distributions
[Diagram: two states X and Y; X has self-loop probability p and moves to Y with probability 1-p; Y has self-loop probability q and returns to X with probability 1-q]
Solution 1: Chain several states
[Diagram: several X states chained in series before Y; the final X has self-loop probability p]
Disadvantage: still very inflexible; lX = C + Geometric with mean 1/(1-p)
Solution 2: Negative binomial distribution
Duration in X: m turns, where
– during the first m – 1 turns, exactly n – 1 arrows to the next state are followed
– during the mth turn, an arrow to the next state is followed

P(lX = m) = C(m-1, n-1) (1-p)^(n-1)+1 p^(m-1)-(n-1) = C(m-1, n-1) (1-p)^n p^(m-n)
[Diagram: a chain of n X states, each with self-loop probability p and exit probability 1-p, followed by Y]
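The duration distribution can be sanity-checked numerically (this checking snippet is mine; n chained X states, self-loop probability p):

```python
from math import comb

def p_duration(m, n, p):
    # P(l_X = m) = C(m-1, n-1) * (1-p)^n * p^(m-n), for m >= n
    return comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)

n, p = 3, 0.9
ms = range(n, 2000)
probs = [p_duration(m, n, p) for m in ms]
print(sum(probs))                                # ~1.0: a valid distribution
print(sum(m * q for m, q in zip(ms, probs)))     # ~ n/(1-p) = 30
```

The mean n/(1-p) generalizes the single-state geometric mean 1/(1-p) from the previous slide.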
Example: genes in prokaryotes
• EasyGene: a prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3
Solution 3: Duration modeling
Upon entering a state:
1. Choose duration d, according to a probability distribution
2. Generate d letters according to emission probs
3. Take a transition to the next state according to transition probs
Disadvantage: Increase in complexity:
Time: O(D²), Space: O(D)
where D = maximum duration of a state
Learning – EM in ABO locus
Tutorial #08
© Ydo Wexler & Dan Geiger
Example: The ABO locusA locus is a particular place on the chromosome. Each locus’ state (called genotype) consists of two alleles – one parental and one maternal. Some loci (plural of locus) determine distinguished features. The ABO locus, for example, determines blood type.
Suppose we randomly sampled N individuals and found that Na/a have genotype a/a, Na/b have genotype a/b, etc. Then the MLE is given by:

θa/a = Na/a/N, θa/o = Na/o/N, θb/b = Nb/b/N, θb/o = Nb/o/N, θa/b = Na/b/N, θo/o = No/o/N
The ABO locus has six possible genotypes {a/a, a/o, b/o, b/b, a/b, o/o}. The first two genotypes determine blood type A, the next two determine blood type B, then blood type AB, and finally blood type O.
We wish to estimate the proportion in a population of the 6 genotypes.
The ABO locus (Cont.)
However, testing individuals for their genotype is a very expensive test. Can we estimate the proportions of genotypes using the common cheap blood test, whose outcome is one of the four blood types (A, B, AB, O)?
The problem is that among individuals measured to have blood type A, we don't know how many have genotype a/a and how many have genotype a/o. So what can we do?
We use the Hardy-Weinberg equilibrium rule, which tells us that in equilibrium the frequencies of the three alleles a, b, o in the population determine the frequencies of the genotypes as follows: θa/b = 2θaθb, θa/o = 2θaθo, θb/o = 2θbθo, θa/a = θa², θb/b = θb², θo/o = θo². So now we have three parameters that we need to estimate.
The Likelihood Function
Let X be a random variable with 6 values xa/a, xa/o, xb/b, xb/o, xa/b, xo/o denoting the six genotypes. The parameters are θ = {θa, θb, θo}.
The probability P(X = xa/b | θ) = 2θaθb.
The probability P(X = xo/o | θ) = θo². And so on for the other four genotypes.
What is the probability of Data={B,A,B,B,O,A,B,A,O,B, AB} ?
P(Data|θ) = (θa² + 2θaθo)³ (θb² + 2θbθo)⁵ (2θaθb)¹ (θo²)²
Obtaining the maximum of this function yields the MLE.
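This likelihood is simple to evaluate; a small sketch (the helper name is mine, and the uniform θ is just a test point, not the MLE):

```python
def abo_likelihood(theta, counts):
    # theta = (theta_a, theta_b, theta_o); Hardy-Weinberg genotype frequencies
    a, b, o = theta
    pA, pB = a * a + 2 * a * o, b * b + 2 * b * o
    pAB, pO = 2 * a * b, o * o
    return (pA ** counts['A'] * pB ** counts['B']
            * pAB ** counts['AB'] * pO ** counts['O'])

data = {'A': 3, 'B': 5, 'AB': 1, 'O': 2}   # the 11 samples above
print(abo_likelihood((1/3, 1/3, 1/3), data))   # 2 / 3**14, about 4.2e-07
```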
ABO loci as a special case of HMM
Model the ABO sampling as an HMM with 6 states (genotypes): a/a, a/b, a/o, b/b, b/o, o/o, and 4 outputs (blood types): A, B, AB, O. Assume 3 transition types: a, b and o; a state is determined by 2 successive transitions. The probability of transition x is θx.
Emission is done every other state, and is determined by the state. E.g., ea/o(A) = 1, since a/o produces blood type A.
[Diagram: transitions a then o reach state a/o, which emits A; transitions a then b reach state a/b, which emits AB]
A faster and simpler EM for ABO loci
This can be solved via Baum-Welch EM training, but that is quite inefficient: for L samples it requires running the forward and backward algorithms on an HMM of length 2L, even though there are only 6 distinct genotypes. Direct application of the EM algorithm yields a simpler and more efficient method.
The EM algorithm in Bayes’ nets
E-step
Go over the data:
Sum the expectations of the hidden variables that you get from this data element
M-step
For every hidden variable x
Update your belief according to the expectation you calculated in the last E-step
EM - ABO Example
a/b/o (hidden)
A / B / AB / O (observed)
Data type   # people
A           100
B           200
AB          50
O           50
θ = {θa, θb, θo} is the parameter we need to evaluate.
We choose a "reasonable" starting point θ0 = {0.2, 0.2, 0.6}.
EM - ABO Example
E-step:
E[l | θ] = Σi=1..n E[l | mi]
E[l | mi] = P(L=l, mi) / Σl' P(L=l', mi)
M-step:
θl = E[l | θ] / Σl' E[l' | θ]
with l = allele and m = blood type
EM - ABO Example
E-step:
we compute all the necessary elements 0),( 1 ABMoLP
baABMaLP ),( 1
baABMbLP ),( 1
oaAMoLP ),( 1
)(),( 1 oaaAMaLP
0),( 1 AMbLP
21 ),( oOMoLP
0),( 1 OMaLP
0),( 1 OMbLP
obBMoLP ),( 1
)(),( 1 obbBMbLP
0),( 1 BMaLP
EM - ABO Example
θ0 = {0.2, 0.2, 0.6}
n = 400 (data size)
E-step (1st step):
E[a | θ0] = Σi=1..n P(L=a, mi) / Σl∈{a,b,o} P(L=l, mi)
EM - ABO Example
θ0 = {0.2, 0.2, 0.6}
n = 400 (data size)
E-step (1st step):
E[a | θ0] = 100·P(L=a, A)/Σl P(L=l, A) + 200·P(L=a, B)/Σl P(L=l, B) + 50·P(L=a, AB)/Σl P(L=l, AB) + 50·P(L=a, O)/Σl P(L=l, O)
(where l ranges over {a, b, o})
EM - ABO Example
θ0 = {0.2, 0.2, 0.6}
n = 400 (data size)
E-step (1st step):
E[a | θ0] = 100·θa(θa+θo)/(θa(θa+θo)+θaθo) + 50·θaθb/(2θaθb) ≈ 57 + 25 ≈ 80
EM - ABO Example
θ0 = {0.2, 0.2, 0.6}
n = 400 (data size)
E-step (1st step):
E[b | θ0] = 200·θb(θb+θo)/(θb(θb+θo)+θbθo) + 50·θaθb/(2θaθb) ≈ 114 + 25 ≈ 140
EM - ABO Example
θ0 = {0.2, 0.2, 0.6}
n = 400 (data size)
E-step (1st step):
E[o | θ0] = 100·θaθo/(θa(θa+θo)+θaθo) + 200·θbθo/(θb(θb+θo)+θbθo) + 50·θo²/θo² ≈ 43 + 86 + 50 ≈ 180
EM - ABO Example
θ0 = {0.2, 0.2, 0.6}
n = 400 (data size)
M-step (1st step):
θa1 = E[a | θ0] / Σl E[l | θ0] = 80/(80+140+180) = 0.20
θb1 = E[b | θ0] / Σl E[l | θ0] = 140/(80+140+180) = 0.35
θo1 = E[o | θ0] / Σl E[l | θ0] = 180/(80+140+180) = 0.45
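The whole iteration can be scripted. This is a sketch of the slides' single-allele-slot E-step, not official course code; note the slides round the expected counts to 80/140/180, while the exact values are about 82/139/179, so the exact trajectory differs slightly in the second decimal.

```python
def em_step(theta, data):
    a, b, o = theta
    # P(first allele = l, blood type = m) under Hardy-Weinberg
    joint = {
        'A':  {'a': a * (a + o), 'b': 0.0,         'o': a * o},
        'B':  {'a': 0.0,         'b': b * (b + o), 'o': b * o},
        'AB': {'a': a * b,       'b': a * b,       'o': 0.0},
        'O':  {'a': 0.0,         'b': 0.0,         'o': o * o},
    }
    E = {'a': 0.0, 'b': 0.0, 'o': 0.0}
    for m, n_m in data.items():                  # E-step: expected counts
        z = sum(joint[m].values())
        for l in E:
            E[l] += n_m * joint[m][l] / z
    n = sum(E.values())
    return tuple(E[l] / n for l in ('a', 'b', 'o'))   # M-step

data = {'A': 100, 'B': 200, 'AB': 50, 'O': 50}
theta = (0.2, 0.2, 0.6)
for _ in range(5):
    theta = em_step(theta, data)
print([round(t, 2) for t in theta])
```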
EM - ABO Example
θ1 = {0.2, 0.35, 0.45}
E-step (2nd step):
E[a | θ1] = 100·θa(θa+θo)/(θa(θa+θo)+θaθo) + 50·θaθb/(2θaθb) ≈ 84
E[b | θ1] = 200·θb(θb+θo)/(θb(θb+θo)+θbθo) + 50·θaθb/(2θaθb) ≈ 152
E[o | θ1] = 100·θaθo/(θa(θa+θo)+θaθo) + 200·θbθo/(θb(θb+θo)+θbθo) + 50 ≈ 164
EM - ABO Example
θ1 = {0.2, 0.35, 0.45}
M-step (2nd step):
θa2 = E[a | θ1] / Σl E[l | θ1] = 84/(84+152+164) = 0.21
θb2 = E[b | θ1] / Σl E[l | θ1] = 152/(84+152+164) = 0.38
θo2 = E[o | θ1] / Σl E[l | θ1] = 164/(84+152+164) = 0.41
EM - ABO Example
θ2 = {0.21, 0.38, 0.41}
E-step (3rd step):
E[a | θ2] ≈ 84
E[b | θ2] ≈ 156
E[o | θ2] ≈ 160
EM - ABO Example
M-step (3rd step):
θa3 = 84/(84+156+160) = 0.21
θb3 = 156/(84+156+160) = 0.39
θo3 = 160/(84+156+160) = 0.40
θ3 = {0.21, 0.39, 0.40} ≈ θ2: no change, so the algorithm has converged.