Large Vocabulary Unconstrained Handwriting Recognition
J. Subrahmonia
Pen Technologies
IBM T. J. Watson Research Center
Pen Technologies
Pen-based interfaces in mobile computing
Mathematical Formulation
H: the handwriting evidence on the basis of which the recognizer makes its decision, $H = \{h_1, h_2, h_3, \ldots, h_m\}$
W: a word string from a large vocabulary, $W = \{w_1, w_2, w_3, \ldots, w_n\}$
Recognizer: $\hat{W} = \arg\max_W p(W \mid H)$
Mathematical Formulation
$$\hat{W} = \arg\max_W p(W \mid H) = \arg\max_W \frac{p(H \mid W)\,p(W)}{p(H)} = \arg\max_W \underbrace{p(H \mid W)}_{\text{CHANNEL}}\;\underbrace{p(W)}_{\text{SOURCE}}$$

Since p(H) does not depend on W, it can be dropped from the maximization: only the channel model p(H | W) and the source model p(W) need to be estimated.
Source Channel Model
W → WRITER → DIGITIZER → FEATURE EXTRACTOR → H → DECODER → Ŵ

The writer, digitizer, and feature extractor together form the CHANNEL that turns the intended word string W into the observed evidence H; the decoder inverts the channel.
Source Channel Model
$$\hat{W} = \arg\max_W p(W \mid H) = \arg\max_W p(H \mid W)\,p(W)$$
- Handwriting modeling (HMMs): models p(H | W)
- Language modeling: models p(W)
- Search strategy: carries out the argmax
Hidden Markov Models
Memoryless Model → (add memory) → Markov Model
Memoryless Model → (hide something) → Mixture Model
Markov Model → (hide something) → Hidden Markov Model
Mixture Model → (add memory) → Hidden Markov Model
Alan B. Poritz, "Hidden Markov Models: A Guided Tour," ICASSP 1988
Memoryless Model

COIN: Heads (1) with probability p; Tails (0) with probability 1-p
Flip the coin 10 times (IID Random sequence)
Sequence : 1 0 1 0 0 0 1 1 1 1
Probability = p·(1-p)·p·(1-p)·(1-p)·(1-p)·p·p·p·p = $p^6(1-p)^4$
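As a minimal sketch, the same computation in Python (the function name and the choice p = 0.5 are illustrative, not from the slides):

```python
# Probability of an IID coin-flip sequence under the memoryless model.
def iid_sequence_prob(seq, p):
    """seq: list of 0/1 outcomes; p: probability of heads (1)."""
    prob = 1.0
    for outcome in seq:
        prob *= p if outcome == 1 else (1 - p)
    return prob

# The slide's sequence has six heads and four tails: p**6 * (1-p)**4.
print(iid_sequence_prob([1, 0, 1, 0, 0, 0, 1, 1, 1, 1], p=0.5))  # 0.0009765625
```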
Add Memory – Markov Model

2 coins: COIN 1 ⇒ p(1) = 0.9, p(0) = 0.1; COIN 2 ⇒ p(1) = 0.1, p(0) = 0.9

Experiment: flip COIN 1 and note the outcome. If the outcome is heads, flip COIN 1 next; otherwise flip COIN 2. Repeat.

Sequence 1100: Probability = 0.9 × 0.9 × 0.1 × 0.9
Sequence 1010: Probability = 0.9 × 0.1 × 0.1 × 0.1
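A short Python sketch of the same experiment (names are illustrative; the first flip always uses COIN 1, as in the slide):

```python
# Outcome-conditioned coin choice: after a head use COIN 1, after a tail COIN 2.
P = {1: {1: 0.9, 0: 0.1},   # COIN 1: flipped after a head (and for the first flip)
     0: {1: 0.1, 0: 0.9}}   # COIN 2: flipped after a tail

def markov_sequence_prob(seq):
    prob = P[1][seq[0]]                    # first flip uses COIN 1
    for prev, cur in zip(seq, seq[1:]):    # later flips depend on the last outcome
        prob *= P[prev][cur]
    return prob

print(markov_sequence_prob([1, 1, 0, 0]))  # 0.9*0.9*0.1*0.9 = 0.0729
print(markov_sequence_prob([1, 0, 1, 0]))  # 0.9*0.1*0.1*0.1 = 0.0009
```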
State Sequence Representation
[Diagram: two states, 1 and 2. From state 1: emit 1 with probability 0.9 and stay in state 1; emit 0 with probability 0.1 and move to state 2. From state 2: emit 1 with probability 0.1 and move to state 1; emit 0 with probability 0.9 and stay in state 2.]

An observed output sequence determines a unique state sequence.
Hide the states => Hidden Markov Model
[Diagram: states s1 and s2 with transitions s1→s1 = 0.9, s1→s2 = 0.1, s2→s1 = 0.1, s2→s2 = 0.9. State s1 emits 1 with probability 0.9 and 0 with probability 0.1; state s2 emits 1 with probability 0.1 and 0 with probability 0.9. The same output sequence can now arise from many state sequences.]
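Once the states are hidden, p(output) must sum over every possible state sequence. A brute-force sketch, assuming for illustration that the process starts in s1 (the slide does not fix the initial state):

```python
from itertools import product

# The two-state HMM above: transition and per-state output probabilities.
trans = {'s1': {'s1': 0.9, 's2': 0.1},
         's2': {'s1': 0.1, 's2': 0.9}}
emit  = {'s1': {1: 0.9, 0: 0.1},
         's2': {1: 0.1, 0: 0.9}}

def brute_force_prob(obs, start='s1'):
    """Sum p(obs, states) over all hidden state sequences (exponential cost)."""
    total = 0.0
    for tail in product(trans, repeat=len(obs) - 1):
        states = (start,) + tail
        prob = emit[states[0]][obs[0]]
        for t in range(1, len(obs)):
            prob *= trans[states[t - 1]][states[t]] * emit[states[t]][obs[t]]
        total += prob
    return total

print(brute_force_prob([1, 1, 0, 0]))  # marginal probability of the output sequence
```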
Why use Hidden Markov Models Instead of Non-hidden?
Hidden Markov models can be smaller: fewer parameters to estimate
States may be truly hidden, e.g. the position of the hand, or the positions of the articulators
Summary of HMM Basics

We are interested in assigning probabilities p(H) to feature sequences.

Memoryless model: $p(H) = \prod_{i=1}^{n} p(h_i)$; this model has no memory of the past.

Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE, an equivalence class of the past that influences the future: $p(h_i \mid h_1, \ldots, h_{i-1}) = p(h_i \mid s_i)$

Hide the states to get an HMM: $p(H) = \sum_S p(H, S)$
Hidden Markov Models
Given an observed sequence H:
- Compute p(H) for decoding
- Find the most likely state sequence for a given Markov model (Viterbi algorithm)
- Estimate the parameters of the Markov source (training)
Compute p(H)
[Diagram: a three-state model, s1 → s2 → s3, with six arcs. Each emitting arc carries a transition probability and an output distribution p(a), p(b); null arcs produce no output:
- s1→s1: 0.5, outputs a: 0.8, b: 0.2
- s1→s2: 0.3, outputs a: 0.7, b: 0.3
- s1→s2 (null): 0.2
- s2→s2: 0.4, outputs a: 0.5, b: 0.5
- s2→s3: 0.5, outputs a: 0.3, b: 0.7
- s2→s3 (null): 0.1]
Compute p(H) – contd.
Compute p(H) where H = a a b b. Enumerate all ways of producing h1 = a:
- s1→s1 (a): 0.5 × 0.8 = 0.40
- s1→s2 (a): 0.3 × 0.7 = 0.21
- s1→s2 (null), s2→s2 (a): 0.2 × 0.4 × 0.5 = 0.04
- s1→s2 (null), s2→s3 (a): 0.2 × 0.5 × 0.3 = 0.03
Compute p(H) – contd.

Enumerate all ways of producing h1 = a, h2 = a: each of the four paths above branches again over the same arcs, so the number of complete paths grows exponentially with the length of H. [Diagram: the path tree for h1 = a, h2 = a, built from the same arc probabilities 0.5×0.8, 0.3×0.7, 0.2, 0.4×0.5, 0.5×0.3, 0.1.] A brute-force enumeration is sketched in code below.
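A Python sketch of this enumeration; the arc table encodes the model above, and the traversal reproduces the four paths for h1 = a (names and data layout are illustrative):

```python
# Arcs of the three-state model: (source, dest, transition prob, output distribution).
# A None distribution marks a null arc, which consumes no output symbol.
ARCS = [
    ('s1', 's1', 0.5, {'a': 0.8, 'b': 0.2}),
    ('s1', 's2', 0.3, {'a': 0.7, 'b': 0.3}),
    ('s1', 's2', 0.2, None),
    ('s2', 's2', 0.4, {'a': 0.5, 'b': 0.5}),
    ('s2', 's3', 0.5, {'a': 0.3, 'b': 0.7}),
    ('s2', 's3', 0.1, None),
]

def enumerate_paths(state, remaining, prob, path, results):
    """Depth-first walk: emitting arcs consume one symbol, null arcs none."""
    if not remaining:                     # all symbols produced: record the path
        results.append((path, prob))
        return
    for src, dst, p, out in ARCS:
        if src != state:
            continue
        if out is None:                   # null arc: move without emitting
            enumerate_paths(dst, remaining, prob * p,
                            path + [f'{src}=>{dst}(null)'], results)
        else:                             # emitting arc: produce the next symbol
            enumerate_paths(dst, remaining[1:], prob * p * out[remaining[0]],
                            path + [f'{src}->{dst}({remaining[0]})'], results)

results = []
enumerate_paths('s1', 'a', 1.0, [], results)
for path, prob in results:
    print(f'{prob:.2f}  {" ".join(path)}')
# 0.40  s1->s1(a)
# 0.21  s1->s2(a)
# 0.04  s1=>s2(null) s2->s2(a)
# 0.03  s1=>s2(null) s2->s3(a)
```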
Compute p(H)
Computation can be saved by combining paths: partial paths that reach the same state after consuming the same number of symbols are merged, since their continuations are identical. [Diagram: the enumeration tree redrawn with shared states merged.]
Compute p(H)
Trellis Diagram
[Trellis: states s1, s2, s3 as rows against the prefixes 0, a, aa, aab, aabb as columns. Arc weights are transition × output probability: s1→s1 is .5×.8 on a and .5×.2 on b; s1→s2 is .3×.7 on a and .3×.3 on b; s2→s2 is .4×.5 on either symbol; s2→s3 is .5×.3 on a and .5×.7 on b; the null arcs s1→s2 (.2) and s2→s3 (.1) move within a column.]
Basic Recursion

Prob(node) = Σ over predecessors of Prob(predecessor) × Prob(predecessor → node)
Boundary condition: Prob(s1, 0) = 1
Forward probabilities on the trellis ("null" marks a null-arc predecessor):

      0      a        aa       aab      aabb
s1    1.0    0.4      0.16     0.016    0.0016
s2    0.2    0.33     0.182    0.054    0.01256
s3    0.02   0.063    0.0677   0.0691   0.020156

s2 contributions per column: a: (s1 null .08, s1 a .21, s2 a .04); aa: (s1 null .032, s1 a .084, s2 a .066); aab: (s1 null .0032, s1 b .0144, s2 b .0364); aabb: (s1 null .00032, s1 b .00144, s2 b .0108)
s3 contributions per column: a: (s2 null .033, s2 a .03); aa: (s2 null .0182, s2 a .0495); aab: (s2 null .0054, s2 b .0637); aabb: (s2 null .001256, s2 b .0189)

p(H = aabb) = 0.020156
More Formally – Forward Algorithm

$$\alpha_t(s) = \sum_{s'} \alpha_{t-1}(s')\, P(s' \to s)\, P(h_t \mid s' \to s) \;+\; \sum_{s'} \alpha_t(s')\, P(s' \to s)$$

where the first sum runs over emitting arcs into s (which consume $h_t$) and the second over null arcs into s (which consume nothing).
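A sketch of this recursion for the slide's model. Null arcs are applied within each trellis column, after the emitting arcs and in topological order (the data layout is an illustrative choice):

```python
STATES = ['s1', 's2', 's3']
EMIT_ARCS = [('s1', 's1', 0.5, {'a': 0.8, 'b': 0.2}),
             ('s1', 's2', 0.3, {'a': 0.7, 'b': 0.3}),
             ('s2', 's2', 0.4, {'a': 0.5, 'b': 0.5}),
             ('s2', 's3', 0.5, {'a': 0.3, 'b': 0.7})]
NULL_ARCS = [('s1', 's2', 0.2), ('s2', 's3', 0.1)]   # topological order matters

def forward(obs):
    alpha = {s: 0.0 for s in STATES}
    alpha['s1'] = 1.0                       # boundary condition: Prob(s1, 0) = 1
    for src, dst, p in NULL_ARCS:           # null arcs within column 0
        alpha[dst] += alpha[src] * p
    for sym in obs:
        new = {s: 0.0 for s in STATES}
        for src, dst, p, out in EMIT_ARCS:  # emitting arcs consume sym
            new[dst] += alpha[src] * p * out[sym]
        for src, dst, p in NULL_ARCS:       # then null arcs within the new column
            new[dst] += new[src] * p
        alpha = new
    return alpha

print(forward('aabb'))  # {'s1': 0.0016, 's2': 0.01256, 's3': 0.020156}
```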
Find the Most Likely Path for aabb – Dynamic Programming (Viterbi)

MaxProb(node) = max over predecessors of MaxProb(predecessor) × Prob(predecessor → node)
      0      a      aa       aab      aabb
s1    1.0    0.4    0.16     0.016    0.0016
s2    0.2    0.21   0.084    0.0168   0.00336
s3    0.02   0.03   0.0315   0.0294   0.00588

The candidates at each node are the same contributions as in the forward pass, with max replacing sum: e.g. at (s2, a) the candidates are s1 null .08, s1 a .21, s2 a .04, so MaxProb = .21. The best path for aabb has probability 0.00588 and is recovered by tracing back the maximizing predecessor from s3. (A code sketch follows.)
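The same pass with max in place of sum, reusing STATES, EMIT_ARCS, and NULL_ARCS from the forward sketch above:

```python
def viterbi(obs):
    v = {s: 0.0 for s in STATES}
    v['s1'] = 1.0
    for src, dst, p in NULL_ARCS:
        v[dst] = max(v[dst], v[src] * p)
    for sym in obs:
        new = {s: 0.0 for s in STATES}
        for src, dst, p, out in EMIT_ARCS:
            new[dst] = max(new[dst], v[src] * p * out[sym])
        for src, dst, p in NULL_ARCS:
            new[dst] = max(new[dst], new[src] * p)
        v = new
    return v

print(viterbi('aabb'))  # {'s1': 0.0016, 's2': 0.00336, 's3': 0.00588}
```

Keeping a back-pointer at each max would recover the best path itself, not just its probability.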
Training HMM Parameters

Start from a uniform guess: s1 has three arcs, t1 = s1→s1 (emitting), t2 = s1→s2 (emitting), and t3 = s1→s2 (null), each with probability 1/3; s2 has two emitting arcs, t4 = s2→s2 and t5 = s2→s3, each with probability 1/2; every output distribution is p(a) = p(b) = 1/2.

For H = abaa there are 7 complete paths, with probabilities .000385, .000578, .000868, .001302, .001157, .002604, .001736

p(H) = .008632
Training HMM Parameters

Let $c_i$ = the a posteriori probability of path i = $p_i / p(H)$:

c1 = .045, c2 = .067, c3 = .134, c4 = .100, c5 = .201, c6 = .150, c7 = .301

Expected transition counts (the coefficient of $c_i$ is the number of times path i uses the arc):

$c(t_1) = 3c_1 + 2c_2 + 2c_3 + c_4 + c_5 = 0.838$
$c(t_2) = c_3 + c_5 + c_7 = 0.637$
$c(t_3) = c_1 + c_2 + c_4 + c_6 = 0.363$

New estimates (each count divided by their sum): $p(t_1) = 0.46$, $p(t_2) = 0.34$, $p(t_3) = 0.20$
Training HMM Parameters

Output counts on arc t1 (the coefficient of $c_i$ is the number of times path i emits that symbol on t1):

$c(t_1, a) = 2c_1 + c_2 + c_3 + c_4 + c_5 = 0.592$
$c(t_1, b) = c_1 + c_2 + c_3 = 0.246$

New estimates: $p(a \mid t_1) = c(t_1, a)/c(t_1) = 0.71$, $p(b \mid t_1) = c(t_1, b)/c(t_1) = 0.29$
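The re-estimation arithmetic checked in a few lines of Python, with the path-usage coefficients taken from the count expressions above:

```python
# Posterior path weights c1..c7 from the slide.
c = [0.045, 0.067, 0.134, 0.100, 0.201, 0.150, 0.301]

# Expected transition counts: coefficient = uses of the arc by each path.
c_t1 = 3*c[0] + 2*c[1] + 2*c[2] + c[3] + c[4]    # 0.838
c_t2 = c[2] + c[4] + c[6]                        # ~0.637
c_t3 = c[0] + c[1] + c[3] + c[5]                 # ~0.363

total = c_t1 + c_t2 + c_t3   # the three arcs leave the same state
print(c_t1/total, c_t2/total, c_t3/total)   # ~0.46, 0.35, 0.20 (slide: .46, .34, .20)

# Output counts on t1 and the re-estimated output distribution.
c_t1_a = 2*c[0] + c[1] + c[2] + c[3] + c[4]      # 0.592
c_t1_b = c[0] + c[1] + c[2]                      # 0.246
print(c_t1_a/c_t1, c_t1_b/c_t1)                  # ~0.71, 0.29
```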
Training HMM Parameters

Re-estimated model: transition probabilities .46, .34, .20 for t1, t2, t3 and .60, .40 for t4, t5; output distributions p(a)/p(b) as shown on the slide: .71/.29 on t1, .68/.32 on t2, .64/.36 and .60/.40 on the remaining emitting arcs.

New path probabilities p1 … p7: 0.00108, 0.00129, 0.00404, 0.00212, 0.00537, 0.00253, 0.00791

p(H) = 0.02438 > 0.008632
Keep repeating: after 600 iterations, p(H) = .037037037. Another initial parameter set converges to p(H) = 0.0625.
Training HMM parameters
- Converges to a local maximum
- There are (at least) 7 local maxima
- The final solution depends on the starting point
- The speed of convergence depends on the starting point
Training HMM Parameters: Forward-Backward Algorithm

- Improves on the enumeration algorithm by using the trellis
- Reduces the computation from exponential to linear in the length of H
Forward Backward Algorithm
[Trellis diagram: an arc $t_i$ from state $s_a$ to state $s_b$ producing symbol $h_j$; the forward probability accumulates over the part of the trellis before $s_a$, the backward probability over the part after $s_b$.]
Forward Backward Algorithm
$p(t_i, j, H)$ = probability that $h_j$ is produced by arc $t_i$ and the complete output is H

$$p(t_i, j, H) = \alpha_j(s_a)\, P(t_i)\, P(h_j \mid t_i)\, \beta_j(s_b)$$

$\alpha_j(s_a)$ = probability of being in state $s_a$ and having produced the output $h_1, \ldots, h_{j-1}$
$\beta_j(s_b)$ = probability of being in state $s_b$ and producing the output $h_{j+1}, \ldots, h_m$
Forward Backward Algorithm
Transition count:

$$C(t_i \mid H) = \sum_j p(t_i, j, H)\, /\, p(H)$$

The backward probabilities are computed by the mirror image of the forward recursion:

$$\beta_t(s) = \sum_{s'} \beta_{t+1}(s')\, P(s \to s')\, P(h_{t+1} \mid s \to s') \;+\; \sum_{s'} \beta_t(s')\, P(s \to s')$$

with emitting arcs in the first sum and null arcs in the second.
Training HMM Parameters

- Guess initial values for all parameters
- Compute forward-pass and backward-pass probabilities
- Compute counts
- Re-estimate the probabilities, and repeat

This procedure is known variously as BAUM-WELCH, BAUM-EAGON, FORWARD-BACKWARD, or E-M. A minimal sketch follows.
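A compact Baum-Welch sketch for a conventional discrete HMM without null arcs (the slide's arc-emitting model needs extra bookkeeping, but the guess / forward-backward / count / re-estimate loop is identical; all names and toy numbers here are illustrative):

```python
import numpy as np

def baum_welch(obs, A, B, pi, iterations=20):
    """One EM loop: forward, backward, expected counts, re-estimate."""
    obs = np.asarray(obs)
    A, B, pi = A.copy(), B.copy(), pi.copy()
    T, N = len(obs), A.shape[0]
    for _ in range(iterations):
        # Forward pass: alpha[t, i] = p(h_1..h_t, state i at t).
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        # Backward pass: beta[t, i] = p(h_{t+1}..h_T | state i at t).
        beta = np.zeros((T, N))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        pH = alpha[-1].sum()
        # Expected transition counts xi and state-occupancy counts gamma.
        xi = np.zeros((N, N))
        for t in range(T - 1):
            xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / pH
        gamma = alpha * beta / pH
        # Re-estimate: normalize the expected counts.
        A = xi / gamma[:-1].sum(axis=0)[:, None]
        newB = np.zeros_like(B)
        for k in range(B.shape[1]):
            newB[:, k] = gamma[obs == k].sum(axis=0)
        B = newB / gamma.sum(axis=0)[:, None]
        pi = gamma[0]
    return A, B, pi, pH

# Toy run: 2 states, 2 output symbols (0 and 1), H = 0 1 0 0.
A0 = np.array([[0.6, 0.4], [0.3, 0.7]])
B0 = np.array([[0.7, 0.3], [0.2, 0.8]])
A, B, pi, pH = baum_welch([0, 1, 0, 0], A0, B0, np.array([0.5, 0.5]))
print(pH)  # p(H) climbs toward a local maximum, as in the slides' example
```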