Post on 07-Jul-2020
Combinatorics of Finite Words and Suffix Automata
Gabriele Fici
Dipartimento di Informatica e Applicazioni, Università di Salerno (Italy)
CAI 2009 - Thessaloniki
20 May 2009
Gabriele Fici Combinatorics of Finite Words and Suffix Automata
Combinatorics of Finite Words
A is a finite set of letters (the alphabet).
A finite word w is an element of A∗.
Its length |w | is the number of its letters.
The empty word ε has length 0.
Let $w = a_1 a_2 \cdots a_n$ be a word.
The prefixes of $w$ are $\varepsilon$ and $a_1 \cdots a_i$, with $1 \le i \le n$.
The suffixes of $w$ are $\varepsilon$ and $a_j \cdots a_n$, with $1 \le j \le n$.
The factors of $w$ are $\varepsilon$ and $a_j \cdots a_i$, with $1 \le j \le i \le n$.
Combinatorics of Finite Words
Example
A = {a, n, b, c}, w = banana
|banana| = 6
ba is a prefix of banana
nana is a suffix of banana
a, ba, ε, banana are factors of banana
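As a quick sanity check of these definitions, here is a minimal Python sketch that reproduces the banana example. The helper names `prefixes`, `suffixes` and `factors` are our own, and ε is represented by the empty string.

```python
def prefixes(w: str) -> set:
    """All prefixes of w, from the empty word up to w itself."""
    return {w[:i] for i in range(len(w) + 1)}


def suffixes(w: str) -> set:
    """All suffixes of w, from w itself down to the empty word."""
    return {w[j:] for j in range(len(w) + 1)}


def factors(w: str) -> set:
    """All factors w[j:i] with j <= i; the empty word is included."""
    n = len(w)
    return {w[j:i] for j in range(n + 1) for i in range(j, n + 1)}


w = "banana"
print(len(w))                                   # 6
print("ba" in prefixes(w))                      # True
print("nana" in suffixes(w))                    # True
print({"a", "ba", "", "banana"} <= factors(w))  # True
print(len(factors(w)))                          # 16 distinct factors, counting ε
```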
Combinatorics of Finite Words
Some famous classes of finite words:
palindromes: $w^R = w$. Ex. level.
balanced words over two letters (say a and b): any two factors of the same length contain the same number of a's (and of b's), up to 1. Ex. abaababaabaab.
differentiable words: words over {1,2} whose run-length encoding is still a word over {1,2}. Ex. 2211212212211.
finite prefixes of (right) infinite words: Thue-Morse, Fibonacci, Kolakoski, ...
many others.
Intersections: 12112121121 is a balanced, differentiable, palindromic prefix of the Fibonacci word over {1,2}...
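These classes are easy to test by brute force. The Python sketch below is illustrative only. The helper names are hypothetical, and `is_balanced` assumes a word over at most two letters. The Fibonacci word over {1,2} is generated as the fixed point of the morphism 1 ↦ 12, 2 ↦ 1, which is the usual convention but an assumption on our part.

```python
from itertools import groupby


def is_palindrome(w: str) -> bool:
    """True iff w equals its reversal w^R."""
    return w == w[::-1]


def is_balanced(w: str) -> bool:
    """Binary word: factors of equal length differ by at most 1 in their
    number of occurrences of a fixed letter (for two letters, one suffices)."""
    if not w:
        return True
    x, n = w[0], len(w)
    for k in range(1, n + 1):
        counts = [w[i:i + k].count(x) for i in range(n - k + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True


def run_lengths(w: str) -> list:
    """Run-length encoding: lengths of the maximal blocks of equal letters."""
    return [len(list(block)) for _, block in groupby(w)]


def is_differentiable(w: str) -> bool:
    """w is over {1,2} and so is its run-length encoding."""
    return set(w) <= {"1", "2"} and set(run_lengths(w)) <= {1, 2}


def fibonacci_prefix(n: int) -> str:
    """Length-n prefix of the fixed point of 1 -> 12, 2 -> 1 (assumed convention)."""
    w = "1"
    while len(w) < n:
        w = "".join("12" if c == "1" else "1" for c in w)
    return w[:n]


v = "12112121121"
print(is_palindrome("level"), is_balanced("abaababaabaab"), is_differentiable("2211212212211"))
# True True True
print(v == fibonacci_prefix(11), is_palindrome(v), is_balanced(v), is_differentiable(v))
# True True True True
```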
What’s the target?
Target
Classify the words through their combinatorial properties.
The suffix automaton
Definition (Blumer et al. 1985, Crochemore 1986): The suffix automaton of the word w is the minimal deterministic automaton recognizing the suffixes of w.
Example: The suffix automaton of aabbabb:
[Figure: the suffix automaton of aabbabb, with states 0 to 7 along the path spelling a a b b a b b and three additional states 3′, 3′′ and 4′′.]
Algorithmically
Theorem (Blumer et al. 1985, Crochemore 1986): The suffix automaton of a word w over a fixed alphabet A can be built in time and space O(|w|).
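The linear bound is usually obtained with an online construction that reads w letter by letter and maintains suffix links. The Python sketch below follows the common textbook formulation of that online algorithm. The class and function names are our own. Using a dict per state keeps the total work linear only under the fixed-alphabet assumption of the theorem. Terminal states are the states on the suffix-link path from the last state, so the automaton accepts exactly the suffixes of w.

```python
class State:
    """A state: length of its longest word, suffix link, outgoing transitions."""

    def __init__(self, length: int, link: int):
        self.length = length
        self.link = link
        self.next = {}


def suffix_automaton(w: str):
    """Online construction: extend the automaton of w[:i] to that of w[:i+1]."""
    states = [State(0, -1)]  # state 0 is the initial state (class of ε)
    last = 0                 # state reached by the whole prefix read so far
    for c in w:
        cur = len(states)
        states.append(State(states[last].length + 1, -1))
        p = last
        # Add c-transitions to cur from suffix states lacking one.
        while p != -1 and c not in states[p].next:
            states[p].next[c] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[c]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q: the clone takes the shorter words of q's class.
                clone = len(states)
                states.append(State(states[p].length + 1, states[q].link))
                states[clone].next = dict(states[q].next)
                while p != -1 and states[p].next.get(c) == q:
                    states[p].next[c] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    # Terminal states: the suffix-link path from the last state down to 0.
    terminals = set()
    p = last
    while p != -1:
        terminals.add(p)
        p = states[p].link
    return states, terminals


def accepts(states, terminals, x: str) -> bool:
    """Run x from the initial state; accept iff it ends in a terminal state."""
    s = 0
    for c in x:
        if c not in states[s].next:
            return False
        s = states[s].next[c]
    return s in terminals


states, terminals = suffix_automaton("aabbabb")
print(len(states), sum(len(s.next) for s in states))  # 11 13
```

On aabbabb the sketch builds 11 states and 13 transitions. Factors that are not suffixes, such as aab, stop in non-terminal states.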
One way to build the SA
Build a non-deterministic automaton:
w = aabbabb
[Figure: states 0, 1, ..., 7 on a path whose transitions spell a a b b a b b.]
Determinize by subset construction:
[Figure: the resulting automaton, with states {0, 1, 2, ..., 7}, {1, 2, 5}, {2}, {3}, {4}, {5}, {6}, {7}, {3, 6}, {4, 7} and {3, 4, 6, 7}.]
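The determinization step can be sketched in Python as follows. This is an illustrative sketch with a hypothetical function name. We assume, as the initial subset {0, 1, ..., 7} on the slide suggests, that every NFA state is initial. NFA state i has a single transition to i + 1, labelled by the (i+1)-th letter of w, and |w| is the only final state. Only subsets reachable from the start are built.

```python
from collections import deque


def determinize_suffix_nfa(w: str):
    """Subset construction for the path NFA 0 -> 1 -> ... -> n spelling w,
    with every state initial and n = len(w) final (assumed, as above)."""
    n = len(w)
    start = frozenset(range(n + 1))
    alphabet = sorted(set(w))
    delta = {}                     # (subset, letter) -> subset
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for c in alphabet:
            # NFA state i reads w[i] (the (i+1)-th letter) and moves to i + 1.
            target = frozenset(i + 1 for i in subset if i < n and w[i] == c)
            if not target:
                continue           # the empty subset is never built
            delta[(subset, c)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    finals = {s for s in seen if n in s}
    return seen, delta, start, finals


subsets, delta, start, finals = determinize_suffix_nfa("aabbabb")
for s in sorted(subsets, key=lambda s: (-len(s), sorted(s))):
    print(sorted(s))
print(len(subsets), len(delta))  # 11 13
```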
Ending Positions
We associate to each factor v of w the set Endset(v) of ending positions of v in w.
Example: w = aabbabb, with its letters numbered 1 to 7 from left to right.
Endset(b) = {3, 4, 6, 7}, Endset(abb) = Endset(bb) = {4, 7}.
We define on Fact(w) the equivalence:
u ∼ v ⇔ Endset(u) = Endset(v)
Then Fact(w)/ ∼ is the set of states of the SA of w .
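The equivalence can be computed directly, though inefficiently, by grouping factors by their ending positions. In this Python sketch the helper names are hypothetical and positions are 1-based, as in the example. The empty word gets Endset {0, 1, ..., |w|}. That is our convention, chosen to match the initial subset of the determinized automaton.

```python
from collections import defaultdict


def endset(w: str, v: str) -> frozenset:
    """1-based ending positions of v in w; the empty word gets {0, ..., |w|}."""
    k = len(v)
    return frozenset(e for e in range(k, len(w) + 1) if w[e - k:e] == v)


def endset_classes(w: str) -> dict:
    """Group the factors of w by their Endset, i.e. compute Fact(w)/~."""
    n = len(w)
    classes = defaultdict(set)
    for j in range(n + 1):
        for i in range(j, n + 1):
            v = w[j:i]
            classes[endset(w, v)].add(v)
    return classes


w = "aabbabb"
print(sorted(endset(w, "b")))               # [3, 4, 6, 7]
print(endset(w, "abb") == endset(w, "bb"))  # True
classes = endset_classes(w)
print(len(classes))                         # 11, one per state of the SA
for e, members in sorted(classes.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(e), sorted(members, key=len))
```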
The number of states
The number of states (classes) of the SA is denoted $|Q_w|$.
The bounds on $|Q_w|$ are well known (the upper one for $|w| \ge 2$):
$|w| + 1 \le |Q_w| \le 2|w| - 1$
The upper bound is reached for $w = ab^{|w|-1}$, with $a \ne b$.
And for the lower bound?
Problem: Characterize the class of words for which $|Q_w| = |w| + 1$.
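Both bounds can be checked by brute force on short words. The Python sketch below counts $|Q_w|$ as the number of distinct Endsets of factors, using the equivalence from the Ending Positions slide. The helper name is hypothetical. The sketch is polynomial but far from linear, so it is only meant for small experiments. It also confirms that $ab^{|w|-1}$ reaches the upper bound and that $a^{|w|}$ reaches the lower one.

```python
from itertools import product


def num_states(w: str) -> int:
    """|Q_w| as the number of distinct Endsets over all factors of w (ε included)."""
    n = len(w)
    endsets = set()
    for j in range(n + 1):
        for i in range(j, n + 1):
            v, k = w[j:i], i - j
            endsets.add(frozenset(e for e in range(k, n + 1) if w[e - k:e] == v))
    return len(endsets)


# Check |w| + 1 <= |Q_w| <= 2|w| - 1 on all binary words of length 2..8.
for n in range(2, 9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert n + 1 <= num_states(w) <= 2 * n - 1, w
    assert num_states("a" + "b" * (n - 1)) == 2 * n - 1  # upper bound reached
    assert num_states("a" * n) == n + 1                  # lower bound reached
print("bounds hold for all binary words of length 2..8")
```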
Special Factors
Definition: v is a left special factor of w if there exist letters $a \ne b$ such that av and bv are factors of w.
v is a right special factor of w if there exist letters $a \ne b$ such that va and vb are factors of w.
v is a bispecial factor of w if it is both left and right special.
Example (w = aabbabb)
LS = {ε,a,b,ab,abb}, RS = {ε,a,b}, BIS = {ε,a,b}
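The sets LS, RS and BIS of the example can be recomputed with a direct Python sketch. The helper names are ours. For each factor v, the sketch counts how many letters a make av (or va) a factor.

```python
def factors(w: str) -> set:
    """All factors of w, including the empty word."""
    n = len(w)
    return {w[j:i] for j in range(n + 1) for i in range(j, n + 1)}


def left_special(w: str) -> set:
    """Factors v with at least two distinct letters a such that av is a factor."""
    F, A = factors(w), set(w)
    return {v for v in F if sum(a + v in F for a in A) >= 2}


def right_special(w: str) -> set:
    """Factors v with at least two distinct letters a such that va is a factor."""
    F, A = factors(w), set(w)
    return {v for v in F if sum(v + a in F for a in A) >= 2}


w = "aabbabb"
LS, RS = left_special(w), right_special(w)
BIS = LS & RS
order = lambda v: (len(v), v)
print(sorted(LS, key=order))   # ['', 'a', 'b', 'ab', 'abb']
print(sorted(RS, key=order))   # ['', 'a', 'b']
print(sorted(BIS, key=order))  # ['', 'a', 'b']
```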
The number of states
Theorem (Sciortino, Zamboni 2007)
If $|A| = 2$, then the following conditions are equivalent for a word w over A:
(i) $|Q_w| = |w| + 1$;
(ii) every left special factor of w is a prefix of w;
(iii) w is a prefix of a standard Sturmian word.
Without restriction on the cardinality of A we have the formula:
Lemma
$|Q_w| = |w| + 1 + |D(w)|$
where D(w) is the set of left special factors of w that are not prefixes of w.
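The lemma is easy to test experimentally. This Python sketch uses brute force and hypothetical helper names. It counts $|Q_w|$ as the number of distinct Endsets, computes D(w), and checks the formula on aabbabb and on all words of length at most 5 over a three-letter alphabet.

```python
from itertools import product


def factors(w: str) -> set:
    n = len(w)
    return {w[j:i] for j in range(n + 1) for i in range(j, n + 1)}


def left_special(w: str) -> set:
    """Factors v having at least two distinct left extensions av in w."""
    F, A = factors(w), set(w)
    return {v for v in F if sum(a + v in F for a in A) >= 2}


def num_states(w: str) -> int:
    """|Q_w| as the number of distinct Endsets (ε gets {0, ..., |w|})."""
    n = len(w)
    return len({frozenset(e for e in range(len(v), n + 1) if w[e - len(v):e] == v)
                for v in factors(w)})


def D(w: str) -> set:
    """Left special factors of w that are not prefixes of w."""
    return left_special(w) - {w[:i] for i in range(len(w) + 1)}


w = "aabbabb"
print(sorted(D(w), key=lambda v: (len(v), v)))  # ['b', 'ab', 'abb']
print(len(w) + 1 + len(D(w)), num_states(w))    # 11 11

# Exhaustive check over {a, b, c} for lengths 0..5.
for n in range(6):
    for letters in product("abc", repeat=n):
        u = "".join(letters)
        assert num_states(u) == n + 1 + len(D(u)), u
```

One way to see why the formula holds: the longest word of each state is either a prefix of w or a left special factor, and every prefix and every left special factor is the longest word of its own state.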
Property LSP
Definition: A word has property LSP if every left special factor of it is a prefix of it.
Corollary
$|Q_w| = |w| + 1 \iff$ w has property LSP.
Problem: Characterize the class of words having property LSP over an arbitrary fixed alphabet A.
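The corollary can be checked exhaustively on short words. The Python sketch below uses a hypothetical `has_lsp` helper that tests the property directly, and compares it with the state count. It is brute force and not a characterization: it only enumerates words of length at most 6 over {a, b, c}.

```python
from itertools import product


def factors(w: str) -> set:
    n = len(w)
    return {w[j:i] for j in range(n + 1) for i in range(j, n + 1)}


def has_lsp(w: str) -> bool:
    """Property LSP: no factor outside Pref(w) is left special."""
    F, A = factors(w), set(w)
    prefixes = {w[:i] for i in range(len(w) + 1)}
    return all(sum(a + v in F for a in A) < 2 for v in F - prefixes)


def num_states(w: str) -> int:
    """|Q_w| as the number of distinct Endsets (ε gets {0, ..., |w|})."""
    n = len(w)
    return len({frozenset(e for e in range(len(v), n + 1) if w[e - len(v):e] == v)
                for v in factors(w)})


print(has_lsp("abaab"), has_lsp("aabbabb"))  # True False

# |Q_w| = |w| + 1 exactly for the LSP words, on all words of length <= 6.
for n in range(7):
    for letters in product("abc", repeat=n):
        w = "".join(letters)
        assert has_lsp(w) == (num_states(w) == n + 1), w
```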
The binary case
For binary words we have the formula:
$|Q_w| = 2|w| - H_w - P_w$
where $H_w$ is the minimal length of a prefix of w occurring only once, and $P_w$ is the maximal length of a left special prefix of w.
As a corollary we obtain a new characterization of standard Sturmian words:
Corollary
w is a prefix of a standard Sturmian word $\iff |w| = H_w + P_w + 1$.
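$H_w$ and $P_w$ are straightforward to compute by brute force. The Python sketch below uses hypothetical helper names. It assumes that w contains both letters, so that ε is a left special prefix and $P_w$ is defined. It checks the state formula on aabbabb and the corollary on abaababaabaab, a prefix of the Fibonacci word, which is a standard Sturmian word.

```python
def occurrences(w: str, v: str) -> int:
    """Number of (possibly overlapping) occurrences of v in w."""
    k = len(v)
    return sum(w[i:i + k] == v for i in range(len(w) - k + 1))


def is_left_special(w: str, v: str) -> bool:
    """v has at least two distinct left extensions av inside w."""
    k = len(v)
    return len({w[i - 1] for i in range(1, len(w) - k + 1) if w[i:i + k] == v}) >= 2


def H(w: str) -> int:
    """Minimal length of a prefix of w occurring only once in w."""
    return min(k for k in range(len(w) + 1) if occurrences(w, w[:k]) == 1)


def P(w: str) -> int:
    """Maximal length of a left special prefix (assumes both letters occur)."""
    return max(k for k in range(len(w) + 1) if is_left_special(w, w[:k]))


def num_states(w: str) -> int:
    """|Q_w| as the number of distinct Endsets of factors of w."""
    n = len(w)
    subs = {w[j:i] for j in range(n + 1) for i in range(j, n + 1)}
    return len({frozenset(e for e in range(len(v), n + 1) if w[e - len(v):e] == v)
                for v in subs})


w = "aabbabb"
print(H(w), P(w), 2 * len(w) - H(w) - P(w), num_states(w))  # 2 1 11 11

f = "abaababaabaab"  # prefix of the Fibonacci word
print(len(f) == H(f) + P(f) + 1)                           # True
```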
Example
Example (w = aabbabb)
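[Figure: the suffix automaton of aabbabb.]
$H_w = 2$, since aa is the shortest prefix occurring only once (a occurs three times).
$P_w = 1$, since the prefix a is left special (aa and ba are factors) while aa is not.
$|Q_w| = 2 \cdot 7 - 2 - 1 = 11$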
The number of edges
What about the number of edges $E_w$?
The bounds on $E_w$ are well known (the upper one for $|w| \ge 3$):
$|w| \le E_w \le 3|w| - 4$
For binary words we give the formula:
Lemma
$E_w = |Q_w| + |G(w)| - 1$
where G(w) is the union of the set of bispecial factors and the set of right special prefixes of w.
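The lemma and the bounds can be tested by brute force. In the Python sketch below, which uses hypothetical helper names, states are the distinct Endsets, as on the Ending Positions slide. The state with Endset E gets one outgoing edge per distinct letter that follows some position of E. The checks cover all binary words of length 1 to 8.

```python
from itertools import product


def factors(w: str) -> set:
    n = len(w)
    return {w[j:i] for j in range(n + 1) for i in range(j, n + 1)}


def endsets(w: str) -> set:
    """Distinct Endsets of the factors of w; each one is a state of the SA."""
    n = len(w)
    return {frozenset(e for e in range(len(v), n + 1) if w[e - len(v):e] == v)
            for v in factors(w)}


def num_edges(w: str) -> int:
    """Sum over states E of the number of distinct letters w[e] with e in E, e < |w|."""
    n = len(w)
    return sum(len({w[e] for e in E if e < n}) for E in endsets(w))


def G(w: str) -> set:
    """Bispecial factors together with right special prefixes of w."""
    F, A = factors(w), set(w)
    ls = {v for v in F if sum(a + v in F for a in A) >= 2}
    rs = {v for v in F if sum(v + a in F for a in A) >= 2}
    prefixes = {w[:i] for i in range(len(w) + 1)}
    return (ls & rs) | (prefixes & rs)


w = "aabbabb"
print(sorted(G(w), key=lambda v: (len(v), v)))        # ['', 'a', 'b']
print(num_edges(w), len(endsets(w)) + len(G(w)) - 1)  # 13 13

for n in range(1, 9):
    for letters in product("ab", repeat=n):
        u = "".join(letters)
        e = num_edges(u)
        assert e == len(endsets(u)) + len(G(u)) - 1, u  # the lemma
        assert n <= e and (n < 3 or e <= 3 * n - 4), u  # the known bounds
```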
Example
Example (w = aabbabb)
[Figure: the suffix automaton of aabbabb.]
G(w) = BIS(w) ∪ (Pref(w) ∩ RS(w)) = {ε, a, b} ∪ {ε, a} = {ε, a, b}
$|G(w)| = 3 \Rightarrow E_w = 11 + 3 - 1 = 13$.
Further Research
Problem: Can this approach be applied to other data structures (factor oracle, suffix tree/trie, suffix array, etc.)?
Problem: Characterize the words having property LSP (i.e., every left special factor is a prefix).
Problem: Compute the average size of the SA for particular classes of words.