CPM 2005
An Optimal Algorithm for Online Square Detection
Gen-Huey Chen, Jin-Ju Hong, Hsueh-I Lu
National Taiwan University

Outline
The definitions of the square detection problem and the online square detection problem
The techniques of the algorithm in [Cro86] for the square detection problem
Our algorithm for the online square detection problem
Conclusion

Square Detection Problem
Square: a nonempty string of the form XX
E.g. “a b c a b c” is a square.
“a b c a b c a” is not a square.
Input: a string S
Square detection problem: Is there a square in S?
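
The definition above can be checked directly by brute force. This is only a sketch of the problem statement, not the method of [Cro86]; the names `is_square` and `contains_square` are illustrative, and the example strings are written without the spaces used on the slide.

```python
def is_square(s):
    """Return True if s is a nonempty string of the form XX."""
    half, remainder = divmod(len(s), 2)
    return len(s) > 0 and remainder == 0 and s[:half] == s[half:]


def contains_square(s):
    """Brute force: does any substring of s form a square?"""
    n = len(s)
    for start in range(n):
        # Only even lengths can be of the form XX.
        for length in range(2, n - start + 1, 2):
            if is_square(s[start:start + length]):
                return True
    return False
```

Note that "abcabca" is not itself a square, yet it contains one (its prefix "abcabc"), so the answer to the detection problem for it is yes.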

Online Square Detection Problem
Leung, Peng, and Ting in COCOON’04
Input: a string S
Let m be the unknown smallest integer s.t. S[1..m] contains a square.
Online square detection problem: Determine m as soon as S[m] is read.
An O(m log^2 m)-time algorithm [LPT04]
An O(m log β)-time algorithm in our paper, where β is the number of distinct characters in S[1..m]
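
To make the online requirement concrete, here is a quadratic-time reference sketch that reads one character at a time and stops the moment a square ends at the character just read. It is not the algorithm of [LPT04] or of this talk; `square_ends_at` and `first_square_end` are hypothetical helper names.

```python
def square_ends_at(read, i):
    """True if some square XX ends at 1-based position i of the characters read."""
    for p in range(1, i // 2 + 1):  # p = |X|
        if read[i - 2 * p:i - p] == read[i - p:i]:
            return True
    return False


def first_square_end(stream):
    """Consume characters one by one; return m, or None if no square appears."""
    read = []
    for i, ch in enumerate(stream, start=1):
        read.append(ch)
        if square_ends_at(read, i):
            return i  # reported as soon as S[m] has been read
    return None
```

Because the function accepts any iterator, nothing beyond S[m] is consumed: after `first_square_end(it)` returns on an iterator over "aabb", the next character delivered by `it` is the third one.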

Algorithm in [Cro86] for Square Detection Problem
for k = 1 to p   // p: # of blocks
{ if a square ends in B_k then return YES; }
return NO;
[Figure: S partitioned into blocks B_1 B_2 B_3 B_4 . . . B_p]

f-factorization
Let d_k denote the starting position of the k-th block B_k.
B_k is S[d_k] if S[d_k] does not occur before d_k, or the longest prefix of S[d_k..n] that occurs before d_k.
E.g.
position: 1 2 3 4 5 6 7 8 9 10 11 …
S       = a a a b b a b a b a  a  …
[Figure: the blocks B_1 B_2 B_3 B_4 B_5 B_6 marked under S]
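
A naive sketch of the factorization may help in reading the example. It assumes that "occurs before d_k" means an occurrence that starts before d_k and may overlap the block itself; that reading is an assumption about the slide's wording. The function name `f_factorization` is illustrative, and the running time is far from the linear time a suffix-tree construction would give.

```python
def f_factorization(s):
    """Return the blocks B_1, B_2, ... of s as a list of strings."""
    blocks = []
    n = len(s)
    d = 0  # 0-based start of the current block
    while d < n:
        length = 0
        # Extend while s[d:d+length+1] has an occurrence starting before d.
        # Limiting the search to s[0:d+length] enforces "starts before d"
        # (overlap with the block itself is allowed: an assumption).
        while d + length < n and s.find(s[d:d + length + 1], 0, d + length) != -1:
            length += 1
        length = max(length, 1)  # a character not seen before is its own block
        blocks.append(s[d:d + length])
        d += length
    return blocks


print(f_factorization("aaabbababaa"))
# ['a', 'aa', 'b', 'b', 'ab', 'aba', 'a']
```

Under this assumption the first eleven characters of the example split as a | aa | b | b | ab | aba | a, so B_6 covers positions 8 to 10 and position 11 starts a further block.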

f-factorization (cont.)
A square ending in B_k is centered either in B_{k-1} or in B_k.
[Figure: . . . B_{k-1} B_k]

Square Ending in the k-th Block
Case 1. The square is entirely in the k-th block.
Case 2. The square begins in the (k-1)-st block.
  Case 2.1. The square is centered in the (k-1)-st block.
  Case 2.2. The square is centered in the k-th block.
Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block.

Our Algorithm for Online Square Detection Problem
for i = 1 to n   // n = |S|
{ compute the f-factorization of S[1..i];
  if a square ends at S[i] then return i; }
return NO-SQUARE;

Square Ending at S[i] in B_k
Case 1. The square is entirely in the k-th block.
Case 2. The square begins in the (k-1)-st block.
  Case 2.1. The square is centered in the (k-1)-st block.
  Case 2.2. The square is centered in the k-th block.
Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block.
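
L(i_1, i_2, i)-square: a square S[j..i] with i_1 ≤ j < i_2 whose center c satisfies i_1 ≤ c < i_2
[Figure: S with positions i_1, c, i_2, i marked]
R(i_1, i_2, i)-square: a square S[j..i] with i_1 ≤ j < i_2 whose center c satisfies i_2 ≤ c < i
[Figure: S with positions i_1, i_2, c, i marked]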

Square Ending at S[i] in B_k
Case 1. The square is entirely in the k-th block.
Case 2. The square begins in the (k-1)-st block.
  Case 2.1. The square is centered in the (k-1)-st block.
  Case 2.2. The square is centered in the k-th block.
Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block.
L(d_{k-1}, d_k, i)-square: Case 2.1
R(d_{k-1}, d_k, i)-square: Case 2.2
R(1, d_{k-1}, i)-square: Case 3
[Figures: S with positions d_{k-1}, d_k, i marked; S with positions 1, d_{k-1}, i marked]

Our Algorithm for Online Square Detection Problem
for i = 1 to n   // n = |S|
{ compute the f-factorization of S[1..i];
  let S[i] belong to B_k;
  if an L(d_{k-1}, d_k, i)-square is detected then return i;
  if an R(d_{k-1}, d_k, i)-square is detected then return i;
  if an R(1, d_{k-1}, i)-square is detected then return i;
}
return NO-SQUARE;
(amortized O(log β) time)
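
The control flow of the loop can be tried out with naive stand-ins for every component. The sketch below recomputes the f-factorization from scratch at each step, evaluates common extensions character by character, and uses the extension conditions attributed to [ML84] elsewhere in the talk for the three tests. It illustrates the structure only and comes nowhere near the amortized O(log β) bound. All function names are hypothetical, and the factorization assumes that earlier occurrences may overlap the block.

```python
def x_right(s, i1, i2, i3):
    """X_R(i1, i2, i3): common right extension of i1, i2, bounded by i3 (1-based)."""
    k = 0
    while i2 + k <= i3 and s[i1 - 1 + k] == s[i2 - 1 + k]:
        k += 1
    return k


def x_left(s, i2, i3, i1):
    """X_L(i2, i3, i1): common left extension of i2, i3, bounded by i1 (1-based)."""
    k = 0
    while i2 - k >= i1 and s[i2 - 1 - k] == s[i3 - 1 - k]:
        k += 1
    return k


def block_starts(s):
    """1-based starting positions d_1, d_2, ... of the f-factorization of s."""
    starts, d, n = [], 0, len(s)
    while d < n:
        starts.append(d + 1)
        length = 0
        while d + length < n and s.find(s[d:d + length + 1], 0, d + length) != -1:
            length += 1
        d += max(length, 1)
    return starts


def has_l(s, i1, i2, i):
    span = i - i2 + 1  # |S[i2..i]|
    return any(x_right(s, j, i2, i) == span
               and x_left(s, j - 1, i2 - 1, i1) + span >= i2 - j
               for j in range(i1, i2))


def has_r(s, i1, i2, i):
    return any(x_right(s, i2, j + 1, i) == i - j
               and x_left(s, i2 - 1, j, i1) + (i - j) >= j - i2 + 1
               for j in range(i2 + 1, i))


def online_first_square(s):
    """Return the smallest m such that s[:m] contains a square, else None."""
    for i in range(1, len(s) + 1):
        starts = block_starts(s[:i])
        if len(starts) < 2:
            continue  # S[i] is in B_1, so there is no B_{k-1} yet
        d_prev, d_k = starts[-2], starts[-1]
        if (has_l(s, d_prev, d_k, i) or has_r(s, d_prev, d_k, i)
                or has_r(s, 1, d_prev, i)):
            return i
    return None


def brute_first_square(s):
    """Reference answer by checking every period at every position."""
    for i in range(1, len(s) + 1):
        if any(s[i - 2 * p:i - p] == s[i - p:i] for p in range(1, i // 2 + 1)):
            return i
    return None
```

Case 1 needs no test of its own: B_k has an earlier occurrence, so a square lying entirely inside B_k would have a copy ending before position i.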

Longest Common Extensions
For positions i_1 ≤ i_2 ≤ i_3 in S
X_R(i_1, i_2, i_3): longest common right extension of positions i_1 and i_2 with boundary i_3
X_L(i_2, i_3, i_1): longest common left extension of positions i_2 and i_3 with boundary i_1
E.g.
position: 1 2 3 4 5 6 7 8 9 10
S       = a b a b b a b a b a
X_R(3, 8, 10) = 2
X_L(4, 9, 2) = 3
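
The two extension functions can be evaluated naively, which is enough to reproduce the example values. In this sketch the boundary is read as a limit on the larger position for X_R and on the smaller position for X_L; that reading is an assumption chosen because it matches the two values on the slide. The names `x_right` and `x_left` are illustrative.

```python
def x_right(s, i1, i2, i3):
    """X_R(i1, i2, i3) for 1-based positions i1 <= i2 <= i3.

    Counts matching characters S[i1+k] == S[i2+k] while i2+k stays <= i3.
    """
    k = 0
    while i2 + k <= i3 and s[i1 - 1 + k] == s[i2 - 1 + k]:
        k += 1
    return k


def x_left(s, i2, i3, i1):
    """X_L(i2, i3, i1) for 1-based positions i1 <= i2 <= i3.

    Counts matching characters S[i2-k] == S[i3-k] while i2-k stays >= i1.
    """
    k = 0
    while i2 - k >= i1 and s[i2 - 1 - k] == s[i3 - 1 - k]:
        k += 1
    return k


S = "ababbababa"
print(x_right(S, 3, 8, 10))  # 2
print(x_left(S, 4, 9, 2))    # 3
```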

Head Extension Function: X_R(1, j, i)
If the string S is read character by character, in the i-th iteration, for all j ≤ i, X_R(1, j, i) can be computed in O(1) time with O(i) total preprocessing time.
E.g.
j            :  1  2  3  4  5  6  7  8  9 10
S[j]         :  a  b  a  b  b  a  b  a  b  a
X_R(1, j, 10): 10  0  2  0  0  4  0  3  0  1
We call X_R(1, j, i) the head extension function.
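
For a fixed boundary i = n, the row of values above coincides with what the classical Z-algorithm computes in linear time. The sketch below is that offline computation, offered as an illustration; it is not the character-by-character scheme the slide refers to, and the name `head_extension` is hypothetical.

```python
def head_extension(s):
    """Return [X_R(1, j, n) for j = 1..n], with n = len(s), via the Z-algorithm."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    left = right = 0  # s[left:right] is the rightmost window known to match a prefix
    for j in range(1, n):
        if j < right:
            # Reuse the value already computed for the mirrored position.
            z[j] = min(right - j, z[j - left])
        while j + z[j] < n and s[z[j]] == s[j + z[j]]:
            z[j] += 1
        if j + z[j] > right:
            left, right = j, j + z[j]
    return z


print(head_extension("ababbababa"))
# [10, 0, 2, 0, 0, 4, 0, 3, 0, 1]
```

For a smaller boundary i, the value X_R(1, j, i) is the minimum of the entry for j and i - j + 1.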

L(i_1, i_2, i)-square
[Figure: S containing the square Y Z Y Z, with positions i_1, j, i_2, i marked]

L(i_1, i_2, i)-square
[ML84] S has an L(i_1, i_2, i)-square if and only if there is an index j with i_1 ≤ j < i_2 such that X_R(j, i_2, i) = |S[i_2..i]| and X_L(j-1, i_2-1, i_1) + X_R(j, i_2, i) ≥ |S[j..i_2-1]|.
If S[1..i-1] contains no square, the “≥” becomes “=”.
[Figure: S containing the square Y Z Y Z, with positions i_1, j, i_2, i marked]
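
The condition can be exercised with naive extension functions and compared against a direct reading of the definition. In this sketch the center of a square XX is taken to be the last position of its first half, which is an assumption about the figure. The names `has_l_square` and `has_l_square_brute` are illustrative, and the two functions are expected to agree only when S[1..i-1] contains no square.

```python
def x_right(s, i1, i2, i3):
    """Naive X_R(i1, i2, i3), 1-based positions."""
    k = 0
    while i2 + k <= i3 and s[i1 - 1 + k] == s[i2 - 1 + k]:
        k += 1
    return k


def x_left(s, i2, i3, i1):
    """Naive X_L(i2, i3, i1), 1-based positions."""
    k = 0
    while i2 - k >= i1 and s[i2 - 1 - k] == s[i3 - 1 - k]:
        k += 1
    return k


def has_l_square(s, i1, i2, i):
    """Extension-based test for an L(i1, i2, i)-square."""
    span = i - i2 + 1                      # |S[i2..i]|
    for j in range(i1, i2):                # i1 <= j < i2
        right = x_right(s, j, i2, i)
        left = x_left(s, j - 1, i2 - 1, i1)
        if right == span and left + right >= i2 - j:  # |S[j..i2-1]| = i2 - j
            return True
    return False


def has_l_square_brute(s, i1, i2, i):
    """Definition-level check: a square ending at i whose start and center
    (assumed: last position of the first half) both lie in [i1, i2)."""
    for p in range(1, i // 2 + 1):
        start, center = i - 2 * p + 1, i - p
        if i1 <= start and center < i2 and s[start - 1:center] == s[center:i]:
            return True
    return False
```

For S = abcabc and i = 6, the call with (i_1, i_2) = (1, 4) succeeds through j = 1, where X_R(1, 4, 6) = 3 = |S[4..6]|, while (1, 3) fails because the center, position 3, is no longer to the left of i_2.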

Detecting L(d_{k-1}, d_k, i)-squares
Let z(j) = |S[j..d_k-1]| - X_L(j-1, d_k-1, d_{k-1}) for all j in B_{k-1}
In the i-th iteration: is there an index j in B_{k-1} s.t. X_R(j, d_k, i) = z(j)?
[Figure: S containing Y Z Y followed by a candidate Z ending at i, with positions d_{k-1}, j, d_k, i marked and the length z(j) indicated]

In the d_k-th iteration (preprocessing)
Compute z(j) for all j in B_{k-1}
Build the suffix tree of B_{k-1}$
For every node u, compute min{ z(j) | j ↔ a leaf in u’s subtree }
O(|B_{k-1}| log β) time
[Figure: S containing Y Z Y, with positions d_{k-1}, j, d_k, i marked; suffix-tree node u stores the minimum z(j) in its subtree]

In the i-th iteration
If |S[d_k..i]| equals the value stored in u, a square ends at position i
[Figure: S containing Y Z Y followed by a candidate Z ending at i, with positions d_{k-1}, j, d_k, i marked; the path labeled S[d_k..i] leads to suffix-tree node u, which stores the minimum z(j)]
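
The preprocessing and the per-character test can be imitated with an uncompressed suffix trie in place of the suffix tree. This substitution keeps the logic visible but costs quadratic space, so it does not achieve the stated O(|B_{k-1}| log β) bound. In the sketch below the node u is assumed to be the node reached by spelling S[d_k..i] from the root, and `TrieNode`, `build_trie` and `detect_l_square` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.min_z = None  # min z(j) over the suffixes passing through this node


def x_left(s, i2, i3, i1):
    """Naive X_L(i2, i3, i1), 1-based positions."""
    k = 0
    while i2 - k >= i1 and s[i2 - 1 - k] == s[i3 - 1 - k]:
        k += 1
    return k


def build_trie(s, d_prev, d_k):
    """Preprocessing: suffix trie of B_{k-1} = S[d_prev..d_k-1] with subtree minima."""
    root = TrieNode()
    for j in range(d_prev, d_k):
        z_j = (d_k - j) - x_left(s, j - 1, d_k - 1, d_prev)
        node = root
        for ch in s[j - 1:d_k - 1]:  # spell the suffix S[j..d_k-1]
            node = node.children.setdefault(ch, TrieNode())
            node.min_z = z_j if node.min_z is None else min(node.min_z, z_j)
    return root


def detect_l_square(s, d_prev, d_k):
    """Walk down the trie with S[d_k], S[d_k+1], ...; report the first i
    at which |S[d_k..i]| equals the minimum stored at the current node."""
    node = build_trie(s, d_prev, d_k)
    for i in range(d_k, len(s) + 1):
        node = node.children.get(s[i - 1])
        if node is None:
            return None  # S[d_k..i] is no longer a substring of B_{k-1}
        if node.min_z == i - d_k + 1:
            return i
    return None
```

For S = abcabc with d_{k-1} = 1 and d_k = 4, the values are z(1) = 3, z(2) = 2, z(3) = 1. The walk spells a, ab, abc, and the stored minimum 3 equals the depth at i = 6.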

R(i_1, i_2, i)-square
[Figure: S containing the square Y Z Y Z, with positions i_1, i_2, j, i marked]

R(i_1, i_2, i)-square
[ML84] S has an R(i_1, i_2, i)-square if and only if there is an index j with i_2 < j < i such that X_R(i_2, j+1, i) = |S[j+1..i]| and X_L(i_2-1, j, i_1) + X_R(i_2, j+1, i) ≥ |S[i_2..j]|.
If S[1..i-1] contains no square, the “≥” becomes “=”.
[Figure: S containing the square Y Z Y Z, with positions i_1, i_2, j, i marked]
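
A naive evaluation of this condition is sketched below. Instead of a yes/no answer, the hypothetical function `r_square_period` returns the witness period |S[i_2..j]| when the condition holds, so the square S[i-2p+1..i] it points to can be checked directly. Both extension terms use X_R(i_2, j+1, i), matching the first condition. The extension helpers are naive character-by-character loops.

```python
def x_right(s, i1, i2, i3):
    """Naive X_R(i1, i2, i3), 1-based positions."""
    k = 0
    while i2 + k <= i3 and s[i1 - 1 + k] == s[i2 - 1 + k]:
        k += 1
    return k


def x_left(s, i2, i3, i1):
    """Naive X_L(i2, i3, i1), 1-based positions."""
    k = 0
    while i2 - k >= i1 and s[i2 - 1 - k] == s[i3 - 1 - k]:
        k += 1
    return k


def r_square_period(s, i1, i2, i):
    """Return a period p witnessing the extension condition, else None."""
    for j in range(i2 + 1, i):          # i2 < j < i
        span = i - j                    # |S[j+1..i]|
        period = j - i2 + 1             # |S[i2..j]|
        right = x_right(s, i2, j + 1, i)
        left = x_left(s, i2 - 1, j, i1)
        if right == span and left + right >= period:
            return period
    return None


print(r_square_period("abcabc", 1, 2, 6))  # 3
```

In the example the witness is j = 4: X_R(2, 5, 6) = 2 = |S[5..6]| and X_L(1, 4, 1) = 1, so the two extensions together cover the period 3.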

Detecting R(d_{k-1}, d_k, i)-square
Let z(j) = |S[d_k..j]| - X_L(d_k-1, j, d_{k-1}) for all j in B_k
Insert the position j into the set of j+z(j) (inserting j takes amortized O(log β) time)
For all j in the set of i, X_R(d_k, j+1, i) = z(j)?
[Figure: S containing Y Z Y followed by a candidate Z ending at i, with positions d_{k-1}, d_k, j, i marked and the length z(j) indicated]
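
The "set of j+z(j)" idea amounts to filing each position j under the only iteration at which it could complete a square, then inspecting just that bucket when the iteration arrives. The sketch below simulates this with a dictionary of lists and naive extension functions. It does not reproduce the amortized O(log β) insertion cost, and for simplicity it scans to the end of the string rather than stopping at the end of B_k. The name `detect_r_square` is hypothetical.

```python
from collections import defaultdict


def x_right(s, i1, i2, i3):
    """Naive X_R(i1, i2, i3), 1-based positions."""
    k = 0
    while i2 + k <= i3 and s[i1 - 1 + k] == s[i2 - 1 + k]:
        k += 1
    return k


def x_left(s, i2, i3, i1):
    """Naive X_L(i2, i3, i1), 1-based positions."""
    k = 0
    while i2 - k >= i1 and s[i2 - 1 - k] == s[i3 - 1 - k]:
        k += 1
    return k


def detect_r_square(s, d_prev, d_k):
    """Simulate iterations i = d_k, d_k+1, ...; return the first i whose
    bucket contains a position j with X_R(d_k, j+1, i) == z(j)."""
    buckets = defaultdict(list)  # iteration number -> positions filed under it
    z = {}
    for i in range(d_k, len(s) + 1):
        # S[i] has just been read: compute z(i) and file i under i + z(i).
        z[i] = (i - d_k + 1) - x_left(s, d_k - 1, i, d_prev)
        buckets[i + z[i]].append(i)
        # Only the positions filed under the current iteration are examined.
        for j in buckets.pop(i, []):
            if x_right(s, d_k, j + 1, i) == z[j]:
                return i
    return None


print(detect_r_square("abcabc", 1, 2))  # 6
```

In the example, position j = 4 has z(4) = 3 - 1 = 2 and is filed under iteration 6, where X_R(2, 5, 6) = 2 confirms the square abcabc.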

Computing X_L(d_k-1, j, d_{k-1})
|S[g..d_k-1]| = min( |S[d_{k-1}..d_k-1]|, |S[d_k..j]| )
For all v with g ≤ v < d_k, X_L(v, d_k-1, g) can be computed in O(1) time using the technique of computing the head extension function.
[Figure: S containing Y Z Y, with positions d_{k-1}, d_k, j, i marked, together with g and v]

Computing X_L(d_k-1, j, d_{k-1}) (cont.)
Let F(j) denote the longest suffix of S[d_k..j] that is also a substring of S[g..d_k-1]
X_L(d_k-1, j, d_{k-1}) = |F(j)|                           if y = d_k-1
                       = min( |F(j)|, X_L(y, d_k-1, g) )  otherwise
[Figure: S containing Y Z Y, with positions d_{k-1}, d_k, j, i marked, together with g, y and F(j)]

Time Complexity
for i = 1 to n   // n = |S|
{ compute the f-factorization of S[1..i];
  let S[i] belong to B_k;
  if an L(d_{k-1}, d_k, i)-square is detected then return i;
  if an R(d_{k-1}, d_k, i)-square is detected then return i;
  if an R(1, d_{k-1}, i)-square is detected then return i;
}
return NO-SQUARE;
(amortized O(log β) time)

Conclusion
Each of those O(log β) terms comes from the traversal in a suffix tree of a string with O(β) distinct characters.
Expected time: O(m)
Is it possible to reduce the running time to worst-case O(m) time for a general alphabet?