Using CTW as a language modeler in Dasher
Martijn van Veen
05-02-2007
Signal Processing Group
Department of Electrical Engineering
Eindhoven University of Technology
Overview
• What is Dasher, and what is a language model?
• What is CTW, and how to implement it in Dasher?
• Decreasing the model costs
• Conclusions and future work
Dasher
• Text input method
• Continuous gestures
• Language model
• Let's give it a try!
Dasher: Language Model
• Conditional probability for each alphabet symbol, given the previous symbols (a minimal interface sketch follows this list)
• Similar to compression methods
• Requirements:
  – Sequential
  – Fast
  – Adaptive
• Model is trained
• Better compression -> faster text input
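To make the requirements concrete, here is a minimal sketch of what a sequential, adaptive language-model interface could look like. The class and method names are illustrative assumptions, not Dasher's actual API.

```cpp
// Minimal sketch of a sequential, adaptive language-model interface.
// Illustrative only: names are assumptions, not Dasher's actual API.
#include <cstdint>
#include <vector>

class LanguageModel {
public:
    virtual ~LanguageModel() = default;

    // Conditional probability for each alphabet symbol, given the symbols
    // seen so far. Must be fast: it is evaluated continuously while writing.
    virtual std::vector<double> Predict() const = 0;

    // Feed in the symbol the user actually entered, so the model adapts
    // sequentially to the user's text.
    virtual void Observe(uint32_t symbol) = 0;
};
```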
Dasher: Language Model
• PPM: Prediction by Partial Match
• Predictions by models of different order
• Weight factor for each model (an illustrative mixing sketch follows)
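PPM realizes its per-order weighting implicitly, through escape probabilities; the sketch below is only an illustrative stand-in that mixes fixed-order predictions with explicit weight factors, to show the idea of combining models of different order.

```cpp
// Illustrative stand-in (not PPM itself): combine the predictions of
// context models of different order with one explicit weight per model.
#include <cstdio>
#include <vector>

// p[k][s]: probability of symbol s under the order-k model.
// w[k]:    weight factor of the order-k model (assumed to sum to 1).
std::vector<double> Mix(const std::vector<std::vector<double>>& p,
                        const std::vector<double>& w) {
    std::vector<double> mixed(p[0].size(), 0.0);
    for (size_t k = 0; k < p.size(); ++k)
        for (size_t s = 0; s < mixed.size(); ++s)
            mixed[s] += w[k] * p[k][s];
    return mixed;
}

int main() {
    // Two models over a two-symbol alphabet, weighted 0.7 / 0.3.
    auto mixed = Mix({{0.9, 0.1}, {0.5, 0.5}}, {0.7, 0.3});
    std::printf("%.2f %.2f\n", mixed[0], mixed[1]);  // 0.78 0.22
}
```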
Dasher: Language Model
• Asymptotically, PPM reduces to a fixed-order context model
• But the incomplete model works better!
CTW: Tree model
• Source structure is captured in the model, the parameters are memoryless
• KT estimator (sketched below), with a = number of zeros and b = number of ones seen in a context:
  P(next bit = 0) = (a + 1/2) / (a + b + 1)
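The KT estimator is cheap to maintain per node: two counters and one division. A minimal sketch (struct and method names are my own):

```cpp
// Sketch of the KT estimator: per context, count zeros (a) and ones (b)
// and predict P(0) = (a + 1/2) / (a + b + 1).
#include <cstdio>

struct KTEstimator {
    unsigned a = 0;  // number of zeros seen in this context
    unsigned b = 0;  // number of ones seen in this context

    // Conditional probability that the next bit equals `bit`.
    double Prob(int bit) const {
        return ((bit == 0 ? a : b) + 0.5) / (a + b + 1.0);
    }

    // Observe the actual bit, so the estimate adapts sequentially.
    void Update(int bit) { (bit == 0 ? a : b) += 1; }
};

int main() {
    KTEstimator kt;
    for (int bit : {0, 0, 1, 0}) {
        std::printf("P(0) = %.3f, observed %d\n", kt.Prob(0), bit);
        kt.Update(bit);
    }
}
```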
CTW: Context tree
• Context-Tree Weighting: combine all possible tree models up to a maximum depth (see the sketch below)
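The standard way all these tree models are combined is the weighting recursion P_w(s) = 1/2 * P_e(s) + 1/2 * P_w(0s) * P_w(1s), with P_w = P_e in the leaves. A floating-point sketch of that recursion follows; the talk's implementation replaces the floating-point values by integer fractions, as described on the next slide.

```cpp
// Sketch of the CTW weighting recursion over a context tree:
//   P_w(s) = 1/2 * P_e(s) + 1/2 * P_w(0s) * P_w(1s), and P_w = P_e in leaves.
// Floating point for clarity only.
#include <cstdio>
#include <memory>

struct Node {
    double pe = 1.0;                 // KT block probability of this node
    std::unique_ptr<Node> child[2];  // context extended by a 0 or a 1

    double Pw() const {
        if (!child[0] && !child[1]) return pe;        // leaf: no deeper model
        double p0 = child[0] ? child[0]->Pw() : 1.0;  // empty sequence: P = 1
        double p1 = child[1] ? child[1]->Pw() : 1.0;
        return 0.5 * pe + 0.5 * p0 * p1;  // mix "leaf here" with "split deeper"
    }
};

int main() {
    Node root;
    root.pe = 0.3;  // illustrative block probabilities
    root.child[0] = std::make_unique<Node>();
    root.child[1] = std::make_unique<Node>();
    root.child[0]->pe = 0.6;
    root.child[1]->pe = 0.7;
    std::printf("P_w(root) = %.3f\n", root.Pw());  // 0.5*0.3 + 0.5*0.42 = 0.36
}
```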
CTW: Implementation
• Current implementation
  – Ratio of block probabilities stored in each node
  – Efficient, but patented
• Develop a new implementation (sketched below)
  – Use only integer arithmetic, avoid divisions
  – Represent both block probabilities as fractions
  – Ensure the denominators are equal by cross-multiplication
  – Store only the numerators, scale if necessary
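A sketch of how this fraction scheme could work, under my reading of the slide: both block probabilities are kept as integer numerators over one shared, implicit denominator, equalized by cross-multiplication and rescaled by powers of two before they overflow. All names and bounds are illustrative, not the talk's exact code.

```cpp
// Sketch (assumptions, not the talk's exact code): store P_e and the
// children's product P_w(0s)*P_w(1s) as integer numerators over a shared,
// implicit denominator -- no divisions needed.
#include <cstdint>

struct NodeProbs {
    uint64_t num_pe;    // numerator of P_e
    uint64_t num_pw01;  // numerator of P_w(0s) * P_w(1s)
};

// Bring n1/d1 and n2/d2 to the common denominator d1*d2 by cross-
// multiplication; afterwards only the numerators need to be stored.
NodeProbs CrossMultiply(uint64_t n1, uint64_t d1, uint64_t n2, uint64_t d2) {
    return {n1 * d2, n2 * d1};
}

// Scale both numerators down by the same power of two when they grow too
// large; the shared denominator is implicit, so their ratio is preserved.
void Rescale(NodeProbs& p) {
    while ((p.num_pe | p.num_pw01) >> 48) {  // 48-bit headroom: illustrative
        p.num_pe >>= 1;
        p.num_pw01 >>= 1;
    }
}

int main() {
    // Example: P_e = 3/8 and P_w(0s)*P_w(1s) = 5/16 become numerators
    // 48 and 40 over the implicit common denominator 128.
    NodeProbs p = CrossMultiply(3, 8, 5, 16);
    Rescale(p);
}
```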
Results
• Comparing PPM and CTW language models (compression in bits per symbol)
  – Single file
  – Model trained with English text
  – Model trained with English text and user input

Single file:
Input file   CTW     PPM     Difference
Book 2       2.632   2.876   8.48 %
NL           4.356   5.014   13.12 %

Model trained with English text:
Input file   CTW     PPM     Difference
GB           2.847   3.051   6.69 %
Book 2       2.380   2.543   6.41 %
Book 2       2.295   2.448   6.25 %

Model trained with English text and user input:
Input file   CTW     PPM     Difference
Book 2       1.979   2.177   9.10 %
NL           2.364   2.510   5.82 %
CTW: Model costs
• Actual model and alphabet size are fixed -> optimize the weight factor alpha (see the sketch below)
  – Per tree -> not enough parameters
  – Per node -> not enough adaptivity
  – Optimize alpha per depth of the tree
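The quantity being optimized generalizes the fixed weight 1/2 in the CTW recursion to a tunable alpha; optimizing per depth means one alpha value shared by all nodes at the same depth. A sketch along the lines of the earlier weighting code (names illustrative):

```cpp
// Sketch: CTW weighting with one tunable alpha per tree depth,
//   P_w(s) = alpha[d] * P_e(s) + (1 - alpha[d]) * P_w(0s) * P_w(1s).
#include <cstdio>
#include <vector>

struct AlphaNode {
    double pe = 1.0;
    AlphaNode* child[2] = {nullptr, nullptr};

    double Pw(const std::vector<double>& alpha, unsigned depth) const {
        if (!child[0] && !child[1]) return pe;
        double p0 = child[0] ? child[0]->Pw(alpha, depth + 1) : 1.0;
        double p1 = child[1] ? child[1]->Pw(alpha, depth + 1) : 1.0;
        double a = alpha[depth];  // shared by every node at this depth
        return a * pe + (1.0 - a) * p0 * p1;
    }
};

int main() {
    AlphaNode leaf0, leaf1, root;
    leaf0.pe = 0.6; leaf1.pe = 0.7; root.pe = 0.3;
    root.child[0] = &leaf0; root.child[1] = &leaf1;
    std::vector<double> alpha = {0.4, 0.5};  // illustrative per-depth weights
    std::printf("P_w(root) = %.3f\n", root.Pw(alpha, 0));  // 0.372
}
```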
CTW: Model costs
• Exclusion: only use the betas of the actual model
• Iterative process
  – Convergent?
• Approximation: to find the actual model, use alpha = 0.5
CTW: Model costs
• Compression of an input sequence (bits per symbol)
  – Model costs are significant, especially for short sequences
  – No decrease by optimizing alpha per depth?

Symbols   Alpha 0.5   Alpha after exclusion   Without model costs
100       5.73        5.21                    4.94
1,000     4.22        4.07                    3.68
10,000    3.12        3.07                    2.77
100,000   2.33        2.32                    2.13
600,000   1.95        1.95                    1.83
CTW: Model costs

Symbols   Alpha 0.5   Alpha after exclusion   Max. probability in root   Without model costs
100       0.8437      0.8117                  0.8113                     0.7022
1,000     0.6236      0.6213                  0.6209                     0.5330
10,000    0.3830      0.3792                  0.3794                     0.3276
100,000   0.2661      0.2652                  0.2647                     0.2389
600,000   0.2248      0.2242                  0.2241                     0.2098

• Maximize the probability in the root, instead of the probability per depth
  – Exclusion based on alpha = 0.5 is almost optimal
CTW: Model costs
Results in the Dasher scenario:
• Trained model
  – Negative effect if no user text is available

Language   Alpha 0.5   Alpha after exclusion
GB         2.01        2.04
NL         4.34        4.36

• Trained with concatenated user text
  – Small positive effect if user text is added to the training text and is very similar to it

Language   Alpha 0.5   Alpha after exclusion
GB         2.30        2.28
NL         4.12        4.13
Conclusions
• New CTW implementation
  – Only integer arithmetic
  – Avoids patented techniques
  – New decomposition tree structure
• Dasher language model based on CTW
  – Predictions about 6 percent more accurate than PPM-D
• Decreasing the model costs
  – Only an insignificant decrease is possible with our method
Future work
• Make CTW suitable for MobileDasher
  – Decrease memory usage
  – Decrease the number of computations
• Combine language models
  – Select the locally best model, or weight models together
• Combine languages in one model
  – Do the models differ in structure or in parameters?