Compact WFSA-based Language Model and Its Application
in Statistical Machine Translation
Xiaoyin Fu, Wei Wei, Shixiang Lu, Dengfeng Ke, Bo Xu
Interactive Digital Media Technology Research Center, CASIA
Outline
Task
Problems
Solution
Our Approach
Results
Conclusion
Task
N-gram language models assign probabilities to strings of words or tokens. Let w_1^L denote a string of L tokens over a fixed vocabulary.
Smoothing techniques: back-off
– Define

$$\hat{P}(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{L} P(w_i \mid w_{i-n+1}^{i-1})$$

$$P(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
LMP(w_i \mid w_{i-k+1}^{i-1}) & \text{if } w_{i-k+1}^{i} \in LM \\
\alpha(w_{i-k+1}^{i-1})\, P(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}$$

$$\alpha(w_{i-k+1}^{i-1}) = 1.0 \quad \text{if } w_{i-k+1}^{i-1} \notin LM$$
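The back-off scheme above can be sketched as a short recursive lookup. This is a minimal illustration with toy `logprob` and `backoff` tables in log space; the table contents and function name are assumptions, not the paper's (or SRILM's) actual API.

```python
# Toy log-probability and back-off tables; contexts absent from `backoff`
# have alpha = 1.0 (i.e. log alpha = 0.0), matching the definition above.
logprob = {("a", "b"): -0.5, ("b",): -1.0, ("a",): -0.9}
backoff = {("a",): -0.3}

def backoff_prob(ngram):
    """Log P(w_i | context) with back-off: use the full n-gram if it is in
    the LM, otherwise add the back-off weight and recurse on the shorter
    context."""
    if ngram in logprob:                      # w_{i-k+1}^i in LM
        return logprob[ngram]
    alpha = backoff.get(ngram[:-1], 0.0)      # log alpha of the context
    return alpha + backoff_prob(ngram[1:])    # drop the oldest word
```

For example, querying the unseen bigram ("c", "b") falls back to the unigram probability of "b" with no penalty, because the context ("c",) has no stored back-off weight.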
Problems
Queries in the trie structure
– Useless queries
– Problems in forward queries
– Problems in back-off queries
Solution
Another point of view: treat the query as a random procedure, a continuous process
Benefit: speeds up both forward queries and back-off queries
Goal: fast and compact
Our Approaches
Fast WFSA: a 5-tuple M = (Q, Σ, I, F, δ)
Definition
– Q: a set of states
– I: a set of initial states
– F: a set of final states
– Σ: an alphabet representing the input and output labels
– δ: δ ⊆ Q × (Σ ∪ {ε}) × Q, a transition relation

$$L(M) = \{\, w \mid \delta(q_i, w) \in F \,\}$$
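As a concrete (if toy) instance of the 5-tuple definition, the following sketch checks membership in L(M) for an unweighted acceptor; the particular states and transitions are made up for illustration, and weights are omitted for brevity.

```python
# A toy acceptor M = (Q, Sigma, I, F, delta); L(M) = {"ab"} here.
Q = {0, 1, 2}
I = {0}                              # initial states
F = {2}                              # final states
delta = {(0, "a"): 1, (1, "b"): 2}   # transition relation over Q x Sigma

def accepts(word):
    """True iff some initial state reaches a final state on `word`,
    i.e. the word is in L(M)."""
    states = set(I)
    for symbol in word:
        states = {delta[(q, symbol)] for q in states if (q, symbol) in delta}
    return bool(states & F)
```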
Our Approaches
Fast WFSA: a 5-tuple M = (Q, Σ, I, F, δ)
Example
[Figure: a small example WFSA with transitions labeled a and b]
Our Approaches
Compact trie: sorted array
[Figure: an example trie over words w1–w6, one layer per n-gram order (order = 1 … 4)]
Our Approaches
Compact trie: sorted array with link indexes
[Figure: the trie stored as per-order sorted arrays, with link indexes connecting each entry to its children in the next-order array]
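The sorted-array layout with link indexes can be sketched as follows. The arrays, field names, and lookup function are illustrative assumptions: each order is a sorted array, and `child_start[i]` links an entry to the slice of next-order entries that extend it, so a child can be found by binary search within that slice.

```python
import bisect

order1 = ["w1", "w2", "w3"]                          # sorted unigram array
order2 = [("w1", "w2"), ("w1", "w4"), ("w2", "w3")]  # sorted bigram array
child_start = [0, 2, 3]                              # link index into order2

def find_bigram(w_prev, w):
    """Binary-search (w_prev, w) inside its parent's child slice of order2;
    return its index there, or -1 if absent."""
    i = order1.index(w_prev)
    lo = child_start[i]
    hi = child_start[i + 1] if i + 1 < len(child_start) else len(order2)
    j = bisect.bisect_left(order2, (w_prev, w), lo, hi)
    return j if j < hi and order2[j] == (w_prev, w) else -1
```

The link index keeps each node's children contiguous, so no per-node child pointers are stored; this is what keeps the structure compact.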
Our Approaches
WFSA-based LM on the trie structure
Note:
– Tf fires on an input word, corresponding to a forward query
– Tb fires spontaneously, without any input, when the query reaches the leaves; it carries out the back-off queries
– Q: the nodes of the trie
– I: the root of the trie
– F: every node of the trie except the root
– Σ: the alphabet of the input sentences
– δ: the forward transitions Tf and the roll-back transitions Tb
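One state of this WFSA-trie can be sketched as below. The field names are assumptions for illustration: each non-root node carries its n-gram probability, the back-off weight emitted when the roll-back transition Tb fires, and a roll-back index pointing to the corresponding lower-order state.

```python
class State:
    """One WFSA state / trie node (illustrative field names)."""
    def __init__(self, logprob=0.0, backoff=0.0, rollback=None):
        self.logprob = logprob    # log P(w | context) stored at this node
        self.backoff = backoff    # weight of the spontaneous Tb transition
        self.rollback = rollback  # lower-order state reached by Tb
        self.next = {}            # forward transitions Tf, keyed by word

# Tiny two-order example: states for "a", "b" and the bigram "a b".
root = State()
root.next["a"] = State(-1.0, -0.2, root)
root.next["b"] = State(-1.2, 0.0, root)
# The bigram state's roll-back crosses layers to the unigram state of "b":
root.next["a"].next["b"] = State(-0.5, 0.0, root.next["b"])
```

Storing the roll-back index directly on each node is what lets Tb fire without re-searching the trie for the lower-order context.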
Our Approaches
WFSA-based LM
[Figure: the trie as a WFSA; each state stores a probability, a back-off weight, and a roll-back index to its lower-order state]
Our Approaches
WFSA-based LM
[Figure: cross-layer roll-back: the roll-back index can point across trie layers to the appropriate lower-order state]
Our Approaches
Query Method
[Figure: a step-by-step query walk through the trie (orders 1–4, words w1–w6): forward transitions consume input words; roll-back transitions move to lower-order states]
Our Approaches
State Transitions
Our Approaches
Query LM
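The query procedure described above can be sketched as a short loop: follow the forward transition Tf when the current state has the input word as a continuation; otherwise fire roll-back transitions Tb, accumulating back-off weights, until the word is found or the root is reached. States are plain dicts here purely for illustration, and `UNK_LOGPROB` is an assumed out-of-vocabulary penalty.

```python
UNK_LOGPROB = -10.0  # assumed penalty for out-of-vocabulary words

def make_state(logprob=0.0, backoff=0.0, rollback=None):
    """A trie node: probability, back-off weight, roll-back index, children."""
    return {"logprob": logprob, "backoff": backoff,
            "rollback": rollback, "next": {}}

def query(root, state, word):
    """Return (next_state, log probability contributed by this word)."""
    cost = 0.0
    while word not in state["next"]:
        if state["rollback"] is None:          # already at the root
            return root, cost + UNK_LOGPROB
        cost += state["backoff"]               # spontaneous Tb transition
        state = state["rollback"]
    nxt = state["next"][word]                  # forward transition Tf
    return nxt, cost + nxt["logprob"]
```

Because the returned state is itself a trie node, the next query resumes directly from it: no context history has to be re-matched, which is what removes the useless repeated searches of a plain trie lookup.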
Our Approaches
For HPB SMT: for one source sentence
– a huge number of LM queries (tens of millions)
– most of them are repetitive
Hash cache
[Figure: query flow: look up the hash cache first; on a miss, query the WFSA-based LM and store the result]
Our Approaches
For HPB SMT: hash cache
– Small and fast: a 24-bit hash, i.e. 16M entries
– Simple operations: only additive and bitwise operations
– Hash clearing: the cache is cleared for each sentence
[Figure: the cache maps (state, input word) pairs s1 w1, s2 w2, …, sn wn to cached values sa, sb, …, sz]
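The cache can be sketched as below: a 24-bit key space (about 16M slots) maps a (state, word) query to its cached value, using only additive and bitwise operations. The concrete hash function and names are illustrative assumptions, not the paper's exact implementation.

```python
MASK = (1 << 24) - 1          # 24-bit hash -> 16M possible slots

def slot(state_id, word_id):
    """Illustrative hash using only add, shift, and mask operations."""
    return ((state_id << 5) + state_id + word_id) & MASK

cache = {}                    # slot -> ((state_id, word_id), value)

def cached_query(state_id, word_id, wfsa_query):
    """Answer from the cache when possible; fall back to the WFSA-based LM."""
    s = slot(state_id, word_id)
    hit = cache.get(s)
    if hit is not None and hit[0] == (state_id, word_id):
        return hit[1]                          # repetitive query: no LM lookup
    value = wfsa_query(state_id, word_id)
    cache[s] = ((state_id, word_id), value)    # overwrite on slot collision
    return value

# Before decoding each new sentence: cache.clear()
```

Storing the full (state, word) key alongside the value lets a slot collision be detected and simply overwritten, so no collision chains are needed and the table stays small and fast.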
Results
Setup
– LM toolkit: SRILM
– Decoder: hierarchical phrase-based translation system
– Test data: IWSLT-07 (489 sentences) & NIST-06 (1664 sentences)
– Training data:

Tasks      Model   Parallel sentences   Chinese words   English words
IWSLT-07   TM[1]   0.38M                3.0M            3.1M
           LM[2]   1.3M                 ——              15.2M
NIST-06    TM[3]   3.4M                 64M             70M
           LM[4]   14.3M                ——              377M

[1] The parallel corpus of BTEC (Basic Traveling Expression Corpus) and CJK (China-Japan-Korea corpus)
[2] The English corpus of BTEC + CJK + CWMT2008
[3] LDC2002E18, LDC2002T01, LDC2003E07, LDC2003E14, LDC2003T17, LDC2004T07, LDC2004T08, LDC2005T06, LDC2005T10, LDC2005T34, LDC2006T04, LDC2007T09
[4] LDC2007T07
Results
Storage space
– Storage size increases by about 35%
– Grows linearly with the number of trie nodes
– Acceptable

Tasks      n-grams   SRILM (MB)   WFSA (MB)   Δ (%)
IWSLT-07   4         65.7         89.1        35.6
           5         89.8         119.5       33.1
NIST-06    4         860.3        1190.4      38.4
           5         998.5        1339.7      34.2

The comparison of LM size between SRILM and WFSA
Results
Query speed
– WFSA cuts query time by about 60% in 4-grams and 70% in 5-grams
– WFSA + cache speeds up queries by about 75%

n-grams   Method       IWSLT-07 (s)   NIST-06 (s)
4         SRILM        163            15433
          WFSA         70             6251
          WFSA+cache   42             3907
5         SRILM        261            25172
          WFSA         85             7944
          WFSA+cache   59             6128
Results
Analysis: repetitive and back-off queries in SMT (4-gram)
– back-off queries occur widely
– most of these queries are repetitive
– the WFSA-based LM can speed up queries effectively

Tasks      Back-off   Repetitive
IWSLT-07   60.5%      95.5%
NIST-06    60.3%      96.4%
Conclusion
A faster WFSA-based LM: faster forward queries and faster back-off queries
A compact WFSA-based LM: built on a trie structure
A simple caching technique for the SMT system
Other fields: speech recognition, information retrieval
Thanks!