Characterization of state merging strategies which ensure identification in the limit from complete data (II)
Cristina Bibire
State of the art
A new approach
Motivation
Results
Further Research
Bibliography
State of the Art
• 1967 - Gold was the first one to formulate the process of learning formal languages.
• 1973 - Trakhtenbrot and Barzdin described a polynomial time algorithm (TB algorithm) for constructing the smallest DFA consistent with a completely labelled training set (a set that contains all the words up to a certain length).
• 1978 - Gold rediscovered the TB algorithm and applied it to grammatical inference (uniformly complete samples are not required). He also specified how to obtain indistinguishable states using so-called state characterization matrices.
• 1992 - Oncina and Garcia propose the RPNI (Regular Positive and Negative Inference) algorithm.
• 1992 - Lang described the TB algorithm and generalized it to produce a (not necessarily minimum) DFA consistent with a sparsely labelled tree. The resulting algorithm (Traxbar) can deal with incomplete as well as complete data sets.
State of the Art
• 1997 - Lang and Pearlmutter organized the Abbadingo One contest. The competition presented the challenge of predicting, with 99% accuracy, the labels which an unseen FSA would assign to test data given training data consisting of positive and negative examples.
• Price won the Abbadingo One Learning Competition by using an evidence-driven state merging (EDSM) algorithm. Essentially, Price realized that an effective way of choosing which pair of nodes to merge next within the APTA (augmented prefix tree acceptor) is to select the pair of nodes whose sub-trees share the most similar labels.
• As post-competition work, Lang proposed W-EDSM. To improve the running time of the EDSM algorithm, only nodes that lie within a fixed-size window from the root node of the APTA are considered for merging.
• An alternative windowing method (Blue-fringe algorithm) is also described by Lang, Pearlmutter and Price. It uses a red and blue colouring scheme to provide a simple but effective way of choosing the pool of merge candidates at each merge level in the search.
State of the Art
TB algorithm
Gold’s algorithm
RPNI
Traxbar
EDSM
W-EDSM
Blue-fringe
State of the Art
L = any string without an odd number of consecutive 0’s after an odd number of consecutive 1’s
[Figure: automaton for L.]
TB algorithm
S+ = {λ, 0, 1, 00, 01, 11, 000, 001, 011, 100, 110, 111, ..., 11111}
S- = {10, 010, 101, ..., 11101}
[Figures: prefix tree of the sample, followed by the trees obtained after the successive merges (0,λ), (11,λ), (1000,10), (1001,1), (1010,101), (1011,101), ending with the resulting automaton.]
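The TB walkthrough starts from the prefix tree of the labelled sample. The following is a minimal sketch of how such a tree could be built; the function name `build_prefix_tree` and the dictionary representation are assumptions made for illustration, not part of the slides. Because the TB sample is only given in abbreviated form, the data used here is the sparse sample from the Gold's algorithm slides.

```python
def build_prefix_tree(positive, negative):
    """Return (labels, children) for the prefix tree of a labelled sample.

    labels[node]   -> True (accepting), False (rejecting) or None (unlabelled)
    children[node] -> {symbol: child node}
    Nodes are named by the prefix that reaches them; "" stands for λ.
    This representation is a hypothetical choice made for this sketch.
    """
    labels, children = {}, {}
    for words, label in ((positive, True), (negative, False)):
        for word in words:
            # create every prefix of the word as a node
            for i in range(len(word) + 1):
                prefix = word[:i]
                labels.setdefault(prefix, None)
                children.setdefault(prefix, {})
                if i < len(word):
                    children[prefix][word[i]] = word[: i + 1]
            # the word itself carries the sample label
            labels[word] = label
    return labels, children


# Sample taken from the Gold's algorithm slides ("" is λ).
S_PLUS = {"", "0", "1", "100", "110"}
S_MINUS = {"10", "101", "1000", "1010", "1011", "10010", "10100", "10110"}
labels, children = build_prefix_tree(S_PLUS, S_MINUS)
```

With this sample the tree has 15 nodes, and node 11 is unlabelled because it is only a prefix of 110.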
Gold’s algorithm
S+ = {λ, 0, 1, 100, 110}
S- = {10, 101, 1000, 1010, 1011, 10010, 10100, 10110}
[Figure, repeated on each slide: prefix tree of the sample.]

Distinguishability tests, in the order shown on the slides:
(0,λ) dist? No
(1,λ) dist? Yes; add 1 to S
(10,λ) dist? Yes; (10,1) dist? Yes; add 10 to S
(11,λ) dist? No
(100,λ) dist? Yes; (100,1) dist? Yes; (100,10) dist? Yes; add 100 to S
(101,λ) dist? Yes; (101,1) dist? Yes; (101,10) dist? Yes; (101,100) dist? Yes; add 101 to S
(1000,10) dist? No
(1001,1) dist? No
(1010,101) dist? No
(1011,101) dist? No

S                   SΣ-S                         list
{λ}                 {0,1}                        {(0,λ)}
{λ,1}               {0,10,11}                    {(0,λ)}
{λ,1,10}            {0,11,100,101}               {(0,λ),(11,λ)}
{λ,1,10,100}        {0,11,101,1000,1001}         {(0,λ),(11,λ)}
{λ,1,10,100,101}    {0,11,1000,1001,1010,1011}   {(0,λ),(11,λ),(1000,10),(1001,1),(1010,101),(1011,101)}

[Figure: the resulting automaton.]
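The "dist?" tests in the trace can be reproduced with a short sketch. It assumes a simplified reading of distinguishability: two prefixes u and v are distinguishable when some suffix w puts uw and vw on opposite sides of the sample. The names `distinguishable` and `gold_trace` are hypothetical, and the sketch does not build the state characterization matrices used in Gold's original formulation.

```python
def distinguishable(u, v, positive, negative):
    """True if some suffix w puts u+w and v+w on opposite sides of the sample."""
    for x in positive | negative:
        if x.startswith(u):
            y = v + x[len(u):]          # same suffix appended to v
            if (x in positive and y in negative) or (x in negative and y in positive):
                return True
    return False


def gold_trace(positive, negative, alphabet="01"):
    """Grow the state set S and pair up the leftover candidates (sketch)."""
    states = [""]                       # S, starting from λ
    while True:
        # SΣ-S, visited in length-lexicographic order
        candidates = sorted(
            {s + a for s in states for a in alphabet} - set(states),
            key=lambda x: (len(x), x),
        )
        # first candidate that is distinguishable from every current state
        new_state = next(
            (x for x in candidates
             if all(distinguishable(x, s, positive, negative) for s in states)),
            None,
        )
        if new_state is None:
            break
        states.append(new_state)
    # each remaining candidate is paired with the first state it matches
    pairs = [
        (x, next(s for s in states if not distinguishable(x, s, positive, negative)))
        for x in candidates
    ]
    return states, pairs


S_PLUS = {"", "0", "1", "100", "110"}
S_MINUS = {"10", "101", "1000", "1010", "1011", "10010", "10100", "10110"}
states, pairs = gold_trace(S_PLUS, S_MINUS)
```

On the sample above, `states` ends as λ, 1, 10, 100, 101 and `pairs` matches the final content of the list column.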
RPNI
S+ = {λ, 0, 1, 100, 110, 1001, 10000}
S- = {1000, 10010}
[Figures: prefix tree of the sample and the resulting automaton, with regions labelled K and Fr.]
Traxbar
A variation of the Trakhtenbrot and Barzdin algorithm was implemented by Lang. The modifications made to the algorithm were needed to maintain consistency with incomplete training sets. For instance, unlabeled nodes and missing transitions in the APTA needed to be considered.
The simple extensions added to the Trakhtenbrot and Barzdin algorithm are as follows.
If node q2 is to be merged with node q1 then:
• labels of labelled nodes in the sub-tree rooted at q2 must be copied over their respective unlabeled nodes in the sub-tree rooted at q1;
• transitions in any of the nodes in the sub-tree rooted at q2 that do not exist in their respective node in the sub-tree rooted at q1 must be copied in.
As a result of these changes, the Traxbar algorithm will produce a (not necessarily minimum size) DFA that is consistent with the training set.
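The two copy rules can be sketched as a recursive fold of the sub-tree rooted at q2 into q1. This is an illustrative sketch under assumptions, not Lang's implementation: the helper names (`prefix_tree`, `fold`, `merge`) and the dictionary representation are hypothetical, and the merge is tried on a copy so that a failed merge leaves the tree untouched.

```python
import copy


def prefix_tree(positive, negative):
    """Prefix tree as (labels, children); nodes are named by their prefix."""
    labels, children = {}, {}
    for words, label in ((positive, True), (negative, False)):
        for w in words:
            for i in range(len(w)):
                children.setdefault(w[:i], {})[w[i]] = w[: i + 1]
            children.setdefault(w, {})
            labels[w] = label
    for node in children:
        labels.setdefault(node, None)   # None marks an unlabelled node
    return labels, children


def fold(q1, q2, labels, children):
    """Fold the sub-tree rooted at q2 into q1; False on a label conflict."""
    if labels[q2] is not None:
        if labels[q1] is None:
            labels[q1] = labels[q2]     # copy label over the unlabelled node
        elif labels[q1] != labels[q2]:
            return False                # accepting vs rejecting: cannot merge
    for symbol, child2 in children[q2].items():
        child1 = children[q1].get(symbol)
        if child1 is None:
            children[q1][symbol] = child2   # copy in the missing transition
        elif not fold(child1, child2, labels, children):
            return False
    return True


def merge(q1, q2, labels, children):
    """Return the merged (labels, children), or None if the merge is invalid."""
    labels, children = copy.deepcopy(labels), copy.deepcopy(children)
    for transitions in children.values():
        for symbol, target in transitions.items():
            if target == q2:
                transitions[symbol] = q1    # redirect edges that entered q2
    return (labels, children) if fold(q1, q2, labels, children) else None


S_PLUS = {"", "0", "1", "100", "110"}
S_MINUS = {"10", "101", "1000", "1010", "1011", "10010", "10100", "10110"}
labels, children = prefix_tree(S_PLUS, S_MINUS)
copied_transition = merge("0", "11", labels, children)
copied_label = merge("11", "0", labels, children)
conflict = merge("", "10", labels, children)
```

Merging 11 into 0 copies the transition to 110, merging 0 into the unlabelled node 11 copies the accepting label, and merging 10 into λ is rejected because their labels differ.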
Traxbar
S+ = {λ, 0, 1, 100, 110}
S- = {10, 101, 1000, 1010, 1011, 10010, 10100, 10110}
[Figures: prefix tree of the sample and the trees obtained after the successive merges (0,λ), (11,λ), (1000,10), (1001,1), (1010,101), (1011,101), ending with the resulting automaton.]
EDSM
The general idea of the EDSM approach is to avoid bad merges by selecting the pair of nodes within the APTA which has the highest score. It is expected that the scoring will indicate the correctness of each merge, since on average, a merge that survives more label comparisons is more likely to be correct.
A post-competition version of the EDSM algorithm as described by Lang, Pearlmutter and Price is included below.
• Evaluate all possible pairings of nodes within the APTA.
• Merge the pair of nodes which has the highest calculated score.
• Repeat the steps above until no other nodes within the APTA can be merged.
The score is calculated by assigning one point for each pair of overlapping labelled nodes within the sub-trees rooted at the two nodes considered for merging.
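The scoring rule can be illustrated with a small sketch that walks two sub-trees of the initial prefix tree in parallel. It is a simplification under assumptions: the names `edsm_score` and `best_pair` are hypothetical, the score is computed on the unmerged tree only, and the sample is the one from the Gold's algorithm slides. A full implementation would also score the nodes folded together during the merge itself.

```python
from itertools import combinations


def prefix_tree(positive, negative):
    """Prefix tree as (labels, children); nodes are named by their prefix."""
    labels, children = {}, {}
    for words, label in ((positive, True), (negative, False)):
        for w in words:
            for i in range(len(w)):
                children.setdefault(w[:i], {})[w[i]] = w[: i + 1]
            children.setdefault(w, {})
            labels[w] = label
    for node in children:
        labels.setdefault(node, None)
    return labels, children


def edsm_score(q1, q2, labels, children):
    """One point per overlapping pair of equal labels; None if labels clash."""
    score = 0
    if labels[q1] is not None and labels[q2] is not None:
        if labels[q1] != labels[q2]:
            return None                 # the merge would be inconsistent
        score = 1
    for symbol, child2 in children[q2].items():
        child1 = children[q1].get(symbol)
        if child1 is not None:
            sub = edsm_score(child1, child2, labels, children)
            if sub is None:
                return None
            score += sub
    return score


def best_pair(labels, children):
    """Highest-scoring valid pair as (score, q1, q2), or None if none is valid."""
    nodes = sorted(children, key=lambda x: (len(x), x))
    scored = [(edsm_score(a, b, labels, children), a, b)
              for a, b in combinations(nodes, 2)]
    valid = [t for t in scored if t[0] is not None]
    return max(valid, key=lambda t: t[0]) if valid else None


S_PLUS = {"", "0", "1", "100", "110"}
S_MINUS = {"10", "101", "1000", "1010", "1011", "10010", "10100", "10110"}
labels, children = prefix_tree(S_PLUS, S_MINUS)
best = best_pair(labels, children)
```

On this sample the pair (101, 1010) scores 2, since both nodes and their 0-children are rejecting, while (λ, 10) is invalid because one node is accepting and the other rejecting.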
Windowed-EDSM
To improve the running time of the EDSM algorithm, it is suggested that we only consider merging nodes that lie within a fixed sized window from the root node of the APTA.
• In breadth-first order, create a window of nodes starting from the root of the APTA.
• Evaluate all possible merge pairs within the window.
• Merge the pair of nodes which has the highest number of matching labels within its sub-trees.
• If the merge reduces the size of the window, in breadth-first order, include the number of nodes needed to regain the fixed size of the window.
• If no merge is possible within the given window, increase the size of the window by a factor of 2.
• Terminate when no merges are possible.
The recommended size of the window is twice the number of states in the target DFA.
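The window construction in the steps above can be sketched as a breadth-first traversal that stops once the window is full. The names `children_of`, `bfs_window` and `window_pairs` are hypothetical, and the sample is the one from the Gold's algorithm slides; scoring and merging are left out.

```python
from collections import deque
from itertools import combinations


def children_of(sample):
    """Transition map of the prefix tree over all sample words."""
    children = {}
    for w in sample:
        for i in range(len(w)):
            children.setdefault(w[:i], {})[w[i]] = w[: i + 1]
        children.setdefault(w, {})
    return children


def bfs_window(root, children, size):
    """First `size` nodes reachable from the root, in breadth-first order."""
    window, queue, seen = [], deque([root]), {root}
    while queue and len(window) < size:
        node = queue.popleft()
        window.append(node)
        for symbol in sorted(children[node]):
            child = children[node][symbol]
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return window


def window_pairs(window):
    """All merge candidates inside the window."""
    return list(combinations(window, 2))


SAMPLE = {"", "0", "1", "100", "110",
          "10", "101", "1000", "1010", "1011", "10010", "10100", "10110"}
children = children_of(SAMPLE)
window = bfs_window("", children, 4)
pairs = window_pairs(window)
# if no merge is possible in the window, the size is doubled
larger_window = bfs_window("", children, 2 * 4)
```

A window of size 4 yields 6 candidate pairs instead of the 105 pairs over all 15 nodes of the tree, which is where the running time is saved.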
Blue-Fringe
An alternative windowing method to that used by the W-EDSM algorithm is also described by Lang, Pearlmutter and Price. It uses a red and blue colouring scheme to provide a simple but effective way of choosing the pool of merge candidates at each merge level in the search. The Blue-fringe windowing method aids in the implementation of the algorithm and improves on its running time.
• Colour the root of the APTA red.
• Colour the non-red children of each red node blue.
• Evaluate all possible pairings of red and blue nodes.
• Promote the first blue node which is distinguishable from each red node.
• Otherwise, merge the pair of nodes which have the highest number of matching labels within their sub-trees.
• Terminate when there are no blue nodes to promote and no possible merges to perform.
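The colouring steps above determine the pool of merge candidates. A minimal sketch of that part follows, assuming a dictionary-based prefix tree; the names `children_of`, `blue_nodes` and `candidate_pairs` are hypothetical, the sample is the one from the Gold's algorithm slides, and the promotion and merge decisions themselves are not implemented.

```python
def children_of(sample):
    """Transition map of the prefix tree over all sample words."""
    children = {}
    for w in sample:
        for i in range(len(w)):
            children.setdefault(w[:i], {})[w[i]] = w[: i + 1]
        children.setdefault(w, {})
    return children


def blue_nodes(red, children):
    """Non-red children of red nodes, in the order the red nodes are listed."""
    blue = []
    for r in red:
        for symbol in sorted(children[r]):
            child = children[r][symbol]
            if child not in red and child not in blue:
                blue.append(child)
    return blue


def candidate_pairs(red, children):
    """Every (red, blue) pairing that has to be evaluated at this level."""
    return [(r, b) for b in blue_nodes(red, children) for r in red]


SAMPLE = {"", "0", "1", "100", "110",
          "10", "101", "1000", "1010", "1011", "10010", "10100", "10110"}
children = children_of(SAMPLE)
first_fringe = blue_nodes([""], children)         # root coloured red
second_fringe = blue_nodes(["", "1"], children)   # after promoting node 1
pool = candidate_pairs(["", "1"], children)
```

With only the root red, the blue fringe is {0, 1}; after node 1 is promoted it becomes {0, 10, 11}, so only 6 red-blue pairs need to be evaluated at that level.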
A new approach
Goal: to design an algorithm capable of incrementally learning new information without forgetting previously acquired knowledge and without requiring access to the original set of samples.
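We denote by:
$\mathcal{A}_\Sigma$ = the set of all automata having the alphabet $\Sigma$
Let $alg$ be any of the algorithms defined so far (TB, Gold, RPNI, Traxbar, EDSM, W-EDSM, Blue-fringe):
$alg : \mathcal{P}(\Sigma^*) \times \mathcal{P}(\Sigma^*) \to \mathcal{A}_\Sigma$, $alg(S_+, S_-) = A$
$m : \mathcal{A}_\Sigma \times \Sigma^* \to \mathcal{A}_\Sigma$, $m(A, s) = A'$
$alg(S_+ \cup \{s\}, S_-) = A''$
$A' \overset{?}{=} A''$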
A new approach
S+ ∪ {s} = {λ, 0, 1, 100, 110, 10000} ∪ {1001}
S- = {1000, 10010}
[Figures: the automaton A obtained from (S+, S-) and the successive steps of updating it with the new sample s = 1001; one step is annotated with a state q and the pair (q,1).]
Motivation
Q: Why do we need this algorithm?
1. In many practical applications, acquisition of representative training data is expensive and time consuming.
2. A new sample might be introduced after several days, months or even years.
3. We might have lost the initial database.
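Results
Lemma 1 It is not always true that: $alg(S_+ \cup \{s\}, S_-) = m(alg(S_+, S_-), s)$
Lemma 2 Let $S_+, S'_+, S_- \subseteq \Sigma^*$ such that $S_+ \subseteq S'_+$. It is not always true that:
$L(alg(S_+, S_-)) \subseteq L(alg(S'_+, S_-))$
Lemma 3 Let $S_+, S_-, S'_- \subseteq \Sigma^*$ such that $S_- \subseteq S'_-$. It is not always true that:
$L(alg(S_+, S_-)) \supseteq L(alg(S_+, S'_-))$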
Results
Sketch of the Proof (for Lemma 1 and Lemma 2):
$S_+ = \{a^3, a^5\}$
$S'_+ = \{a^3, a^5, a^6\}$
$S_- = \{a^4\}$
Sketch of the Proof (for Lemma 3):
$S_+ = \{a^3, a^5, a^6\}$
$S_- = \{a^2\}$
$S'_- = \{a^2, a^4\}$
Further Research
o To determine the complexity of the algorithm and to test it on large/sparse data
o To determine how much time and resources we save using this algorithm
instead of the classical ones
o To design an algorithm to deal with newly introduced negative samples
o To find the answer to the question: when is the automaton created with this
method weakly equivalent with the one obtained with the entire sample?
o To improve the software in order to be able to deal with new samples
Bibliography
• Colin de la Higuera, José Oncina, Enrique Vidal. “Identification of DFA:
Data-Dependent versus Data-Independent Algorithms”
• Rajesh Parekh, Vasant Honavar. “Learning DFA from Simple Examples”
• Michael J. Kearns, Umesh V. Vazirani. “An Introduction to Computational
Learning Theory”
• J. Oncina, P. Garcia. “A polynomial algorithm to infer regular languages”
• Dana Angluin. “Inference of Reversible Languages”
• P. Garcia, A. Cano, J. Ruiz. “A comparative study of two algorithms for
automata identification”
• P. Garcia , A. Cano, J. Ruiz. “Inferring subclasses of regular languages
faster using RPNI and forbidden configurations”
• K.J. Lang, B.A. Pearlmutter, R.A. Price. “Results of the Abbadingo One
DFA Learning Competition and a New Evidence-Driven State Merging
Algorithm”
Bibliography
• Takashi Yokomori. “Grammatical Inference and Learning”.
• M. Sebban, J.C. Janodet, E. Tantini. “BLUE : a Blue-Fringe Procedure for
Learning DFA with Noisy Data”
• Kevin J. Lang. “Random DFA’s can be Approximately Learned from
Sparse Uniform Examples”
• P. Dupont, L. Miclet, E. Vidal. “What is the search space of the regular
inference?”
• Sara Porat, Jerome A. Feldman. “Learning Automata from Ordered
Examples”
• Orlando Cicchello, Stefan C. Kremer. “Inducing Grammars from Sparse
Data Sets: A Survey of Algorithms and Results”