Algorithms on Strings

Michael Soltys

CSU Channel Islands, Computer Science

February 4, 2015

Strings - Soltys Math/CS Seminar Title - 1/27

String problems are at the heart of Computer Science:

Rewriting systems are Turing complete

In practice, the analysis of strings is central to:

- Algorithmic biology
- Text processing
- Language theory
- Coding theory

Basics (COMP 454)

An alphabet is a finite, non-empty set of distinct symbols, usually denoted by Σ.

e.g., Σ = {0, 1} (binary alphabet)
Σ = {a, b, c, . . . , z} (lower-case letters alphabet)

A string, also called a word, is a finite ordered sequence of symbols chosen from some alphabet.

e.g., 010011101011

|w| denotes the length of the string w.

e.g., |010011101011| = 12

The empty string ε, with |ε| = 0, is by default a string over every alphabet Σ.

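$\Sigma^k$ is the set of strings over Σ of length exactly k.

e.g., if Σ = {0, 1}, then

$\Sigma^0 = \{\varepsilon\}$
$\Sigma^1 = \Sigma$
$\Sigma^2 = \{00, 01, 10, 11\}$, etc. What is $|\Sigma^k|$?

Kleene's star $\Sigma^*$ is the set of all strings over Σ:

$$\Sigma^* = \Sigma^0 \cup \underbrace{\Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \cdots}_{=\Sigma^+}$$

Concatenation: if x, y are strings with $x = a_1 a_2 \ldots a_m$ and $y = b_1 b_2 \ldots b_n$, then

$$x \cdot y = \underbrace{xy}_{\text{juxtaposition}} = a_1 a_2 \ldots a_m b_1 b_2 \ldots b_n$$

Compare the UNIX cat command.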
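
As a quick check on these definitions, the following Python sketch enumerates the strings of length exactly k over the binary alphabet and confirms that there are |Σ|^k of them. The helper name `sigma_k` is illustrative and does not come from the slides.

```python
from itertools import product

def sigma_k(sigma, k):
    """Return all strings of length exactly k over the alphabet sigma."""
    return ["".join(t) for t in product(sorted(sigma), repeat=k)]

sigma = {"0", "1"}
print(sigma_k(sigma, 0))  # only the empty string has length 0
print(sigma_k(sigma, 2))

# Each of the k positions is filled independently, so |Sigma^k| = |Sigma|^k.
print(all(len(sigma_k(sigma, k)) == len(sigma) ** k for k in range(6)))

# Concatenation is juxtaposition, and the empty string is its identity.
x, y = "0110", "01"
print(x + y, x + "" == x == "" + x)
```

Python's `+` on strings plays the role of the concatenation operator here.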

A language L is a collection of strings over some alphabet Σ, i.e., $L \subseteq \Sigma^*$. E.g.,

$$L = \{\varepsilon, 01, 0011, 000111, \ldots\} = \{0^n 1^n \mid n \ge 0\} \qquad (1)$$

Note:

- wε = εw = w.
- {ε} ≠ ∅; one is the language consisting of the single string ε, and the other is the empty language.

Consider L = {w | w is of the form x01y, with x, y ∈ $\Sigma^*$}, where Σ = {0, 1}.

We want to specify a DFA A = (Q, Σ, δ, q0, F) that accepts all and only the strings in L.

Σ = {0, 1}, Q = {q0, q1, q2}, and F = {q1}.

Transition diagram

[Figure not recovered: state diagram of A over the states q0, q1, q2]

Transition table

        0     1
q0      q2    q0
q1      q1    q1
q2      q2    q1
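
The transition table can be run directly. The sketch below encodes δ as a Python dictionary and compares the DFA against the defining property of L (containing 01 as a substring) on all short binary strings. The names `DELTA` and `accepts` are illustrative.

```python
from itertools import product

# Transition function delta of the DFA A, copied from the transition table.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}
START, ACCEPTING = "q0", {"q1"}

def accepts(w):
    """Run the DFA on w and report whether it halts in an accepting state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING

# Exhaustive comparison with the definition of L on strings of length <= 8.
agree = all(
    accepts("".join(t)) == ("01" in "".join(t))
    for n in range(9)
    for t in product("01", repeat=n)
)
print(agree)
```

Intuitively, q0 means "no 0 seen yet", q2 means "the last symbol was 0 and no 01 has been seen yet", and q1 means "01 has been seen".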

A context-free grammar (CFG) is G = (V, T, P, S): Variables, Terminals, Productions, Start variable.

Ex. P → ε | 0 | 1 | 0P0 | 1P1.

Ex. G = ({E, I}, T, P, E) where T = {+, ∗, (, ), a, b, 0, 1} and P is the following set of productions:

E → I | E + E | E ∗ E | (E)

I → a | b | Ia | Ib | I0 | I1

If $\alpha A \beta \in (V \cup T)^*$, $A \in V$, and $A \to \gamma$ is a production, then $\alpha A \beta \Rightarrow \alpha \gamma \beta$. We use $\overset{*}{\Rightarrow}$ to denote 0 or more steps.

$$L(G) = \{ w \in T^* \mid S \overset{*}{\Rightarrow} w \}$$
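
To make the derivation relation concrete, here is a minimal sketch that explores derivations of the first example grammar (the one with the single variable P) and collects every terminal string of bounded length. It suggests that this grammar generates exactly the binary palindromes. The function `derive` and its length bound are illustrative assumptions, not part of the slides.

```python
from itertools import product

# Production bodies of the grammar P -> eps | 0 | 1 | 0P0 | 1P1.
BODIES = ["", "0", "1", "0P0", "1P1"]

def derive(max_len):
    """Terminal strings of length <= max_len derivable from the variable P."""
    results, frontier = set(), {"P"}
    while frontier:
        next_frontier = set()
        for form in frontier:
            # Every sentential form here contains exactly one occurrence of P.
            for body in BODIES:
                new = form.replace("P", body, 1)
                if "P" not in new:
                    if len(new) <= max_len:
                        results.add(new)
                elif len(new) - 1 <= max_len:  # count terminals only
                    next_frontier.add(new)
        frontier = next_frontier
    return results

palindromes = {
    "".join(t)
    for n in range(5)
    for t in product("01", repeat=n)
    if "".join(t) == "".join(t)[::-1]
}
print(derive(4) == palindromes)
```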

Context-sensitive grammars (CSG) have rules of the form:

α → β

where $\alpha, \beta \in (T \cup V)^*$ and |α| ≤ |β|. A language is context-sensitive if it has a CSG.

Fact: It turns out that CSL = NSPACE(n), i.e., the context-sensitive languages are exactly those accepted by nondeterministic linear-space machines.

A rewriting system (also called a Semi-Thue system) is a grammar where there are no restrictions; α → β for arbitrary $\alpha, \beta \in (V \cup T)^*$.

Fact: It turns out that a rewriting system corresponds to the most general model of computation; i.e., a language has a rewriting system iff it is “computable.”

A second course in Automata

Chomsky-Schützenberger Theorem: If L is a CFL, then there exists a regular language R, an n, and a homomorphism h, such that $L = h(\mathrm{PAREN}_n \cap R)$.

Parikh’s Theorem: If $\Sigma = \{a_1, a_2, \ldots, a_n\}$, the signature of a string $x \in \Sigma^*$ is $(\#a_1(x), \#a_2(x), \ldots, \#a_n(x))$, i.e., the number of occurrences of each symbol, in a fixed order. The signature of a language is defined by extension; regular languages and CFLs have the same signatures, i.e., every CFL has the same signature as some regular language.
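
As an illustration of signatures, the sketch below computes the signature of a string and compares a finite portion of the context-free language {0^n 1^n} with the same portion of the regular language (01)*. This pair of languages is an example chosen here, not one taken from the slides, and the function name `signature` is likewise illustrative.

```python
from collections import Counter

def signature(x, alphabet):
    """Number of occurrences in x of each symbol, in the order given by alphabet."""
    counts = Counter(x)
    return tuple(counts[a] for a in alphabet)

alphabet = "01"
cfl_sample = ["0" * n + "1" * n for n in range(6)]  # strings 0^n 1^n
reg_sample = ["01" * n for n in range(6)]           # strings (01)^n

print(signature("0011", alphabet))
print({signature(x, alphabet) for x in cfl_sample}
      == {signature(x, alphabet) for x in reg_sample})
```

Both samples yield the signatures (n, n), even though the two languages contain different strings.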

This presentation is about algorithms on strings.

Based on two papers that are coming out in the coming months:

- Neerja Mhaskar and Michael Soltys, Non-repetitive strings over alphabet lists, to appear in WALCOM, February 2015.
- Neerja Mhaskar and Michael Soltys, String Shuffle: Circuits and Graphs, accepted in the Journal of Discrete Algorithms, 2015.

Both at http://soltys.cs.csuci.edu (papers 3 & 19)

Non-repetitive strings

A word is non-repetitive if it does not contain a subword of the form vv.

Word with repetition: 010101110
Word without repetition: 101

Easy observation: what is the smallest n so that any word over Σ = {0, 1} of length ≥ n has at least one repetition?
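
The question can be settled by brute force. The sketch below tests every binary word of each length for a subword of the form vv and reports the first length at which no word avoids one. The function names and the `max_n` cutoff are illustrative choices.

```python
from itertools import product

def has_square(w):
    """True if w contains a non-empty subword of the form vv."""
    n = len(w)
    return any(
        w[i:i + l] == w[i + l:i + 2 * l]
        for l in range(1, n // 2 + 1)
        for i in range(n - 2 * l + 1)
    )

def smallest_unavoidable_length(alphabet, max_n=12):
    """Smallest n <= max_n such that every word of length n has a square."""
    for n in range(1, max_n + 1):
        if all(has_square("".join(t)) for t in product(alphabet, repeat=n)):
            return n
    return None  # squares are avoidable at every length tried

print(smallest_unavoidable_length("01"))  # prints 4
```

Once every word of length n contains a square, so does every longer word, since its prefix of length n already does. The only non-repetitive binary words of length 3 are 010 and 101.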

Original Thue problem

For $\Sigma_3 = \{1, 2, 3\}$, consider the following morphism, due to A. Thue:

$$S = \begin{cases} 1 \mapsto 12312 \\ 2 \mapsto 131232 \\ 3 \mapsto 1323132 \end{cases}$$

Given a string $w \in \Sigma_3^*$, we let S(w) denote w with every symbol replaced by its corresponding substitution:

$$S(w) = S(w_1 w_2 \ldots w_n) = S(w_1) S(w_2) \ldots S(w_n)$$

Lemma: If w is non-repetitive then so is S(w).
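
The lemma can be spot-checked mechanically. The sketch below applies the substitution S repeatedly, starting from the non-repetitive word 1, and tests each result for squares. This is only an empirical check on a few words, not a proof, and the helper names are illustrative.

```python
# Thue's substitution S on the alphabet {1, 2, 3}, as given in the slides.
S = {"1": "12312", "2": "131232", "3": "1323132"}

def apply_morphism(w):
    """Replace every symbol of w by its image under S."""
    return "".join(S[c] for c in w)

def has_square(w):
    """True if w contains a non-empty subword of the form vv."""
    n = len(w)
    return any(
        w[i:i + l] == w[i + l:i + 2 * l]
        for l in range(1, n // 2 + 1)
        for i in range(n - 2 * l + 1)
    )

w = "1"
for _ in range(3):
    w = apply_morphism(w)
    print(len(w), has_square(w))
```

Iterating S in this way produces arbitrarily long non-repetitive words over a three-letter alphabet, in contrast with the binary case.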

Problem extended to alphabet lists

List of alphabets $L = L_1, L_2, \ldots, L_n$

Can we generate non-repetitive words $w = w_1 w_2 \ldots w_n$ such that the symbol $w_i \in L_i$?

Studied by [GKM10] and [Sha09]; it is a natural extension of the original problem posed and solved by A. Thue.

E.g., $L_1 = \{a, b, c\}$, $L_2 = \{b, c, d\}$, $L_3 = \{a, d, 2\}$; in this case w = ac2 is over $L_1, L_2, L_3$ and non-repetitive.

Is that true for any list where $|L_i| = 3$ for all i?

[GKM10] shows that this can be done for $|L_i| = 4$ for all i with this algorithm:

pick any $w_1 \in L_1$
for step i + 1 (where $w = w_1 w_2 \ldots w_i$ is non-repetitive), pick $a \in L_{i+1}$
    if wa is non-repetitive, then let $w_{i+1} = a$
    if wa has a square vv, then vv must be a suffix;
        delete the right copy of v, and restart from the shortened word.

Using a sophisticated Lovász Local Lemma argument and Catalan numbers, one can show that the above algorithm succeeds with non-zero probability.
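
A minimal sketch of this procedure follows. It assumes that each symbol is picked uniformly at random from its list, which the slide does not state explicitly, and it adds a step budget (`max_steps`) purely as a safeguard. All names are illustrative.

```python
import random

def square_suffix_length(w):
    """Length of v if w ends in a square vv, otherwise 0."""
    n = len(w)
    for l in range(1, n // 2 + 1):
        if w[n - 2 * l:n - l] == w[n - l:]:
            return l
    return 0

def has_square(w):
    """True if w contains a non-empty subword of the form vv."""
    n = len(w)
    return any(
        w[i:i + l] == w[i + l:i + 2 * l]
        for l in range(1, n // 2 + 1)
        for i in range(n - 2 * l + 1)
    )

def nonrepetitive_over_lists(lists, rng, max_steps=100_000):
    """Try to build a non-repetitive word w with w[i] drawn from lists[i]."""
    w = []
    for _ in range(max_steps):
        if len(w) == len(lists):
            return w
        # Assumption: choose the next symbol uniformly at random from its list.
        w.append(rng.choice(sorted(lists[len(w)])))
        l = square_suffix_length(w)
        if l:
            # w was non-repetitive before the append, so any square is a suffix;
            # erase its right copy and continue from the shortened word.
            del w[-l:]
    raise RuntimeError("step budget exhausted")

lists = [set("abcd") for _ in range(40)]
word = nonrepetitive_over_lists(lists, random.Random(0))
print("".join(word), has_square(word))
```

After the deletion the remaining word is a prefix of the previous non-repetitive word, so the invariant that w is non-repetitive is preserved at every step.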

Particular “yes” cases for $L_1, L_2, \ldots, L_n$

- Has a system of distinct representatives (SDR)
- Has the union property
- Can be mapped consistently to $\Sigma_3 = \{1, 2, 3\}$
- It is a partition
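
The SDR case is easy to see: if one can choose pairwise distinct symbols, one from each list, the resulting word has no repeated symbol and hence no square. The sketch below searches for an SDR with a standard augmenting-path bipartite matching; this is one possible method chosen here, not necessarily the one used in the paper, and the function name `find_sdr` is illustrative.

```python
def find_sdr(lists):
    """Return distinct representatives rep[i] in lists[i], or None if none exist."""
    owner = {}  # symbol -> index of the list it currently represents

    def try_assign(i, seen):
        # Look for a symbol for list i, re-assigning earlier lists if needed.
        for s in sorted(lists[i]):
            if s in seen:
                continue
            seen.add(s)
            if s not in owner or try_assign(owner[s], seen):
                owner[s] = i
                return True
        return False

    for i in range(len(lists)):
        if not try_assign(i, set()):
            return None
    reps = [None] * len(lists)
    for s, i in owner.items():
        reps[i] = s
    return reps

reps = find_sdr([{"a", "b", "c"}, {"b", "c", "d"}, {"a", "d", "2"}])
print(reps)                   # one symbol per list, all distinct
print(find_sdr([{"a"}, {"a"}]))  # None: two lists compete for one symbol
```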

Open Problem 1

Given any list L1, L2, . . . , Ln, where |Li | = 3, can we always find anon-repetitive string w over such a list?

Shuffle

w is a shuffle of u, v: w = u ⊙ v

w = 0110110011101000
u = 01101110
v = 10101000

w is a shuffle of u and v provided there are (possibly empty) strings $x_i$ and $y_i$ such that

$u = x_1 x_2 \cdots x_k$

$v = y_1 y_2 \cdots y_k$

and w is obtained by “interleaving”: $w = x_1 y_1 x_2 y_2 \cdots x_k y_k$.
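
Whether w is a shuffle of u and v can be decided with a standard dynamic program over prefixes, sketched below. The table-based formulation and the name `is_shuffle` are choices made here for illustration; the slides only give the definition.

```python
def is_shuffle(u, v, w):
    """True if w can be obtained by interleaving u and v."""
    n, m = len(u), len(v)
    if n + m != len(w):
        return False
    # table[i][j] is True when w[:i + j] is a shuffle of u[:i] and v[:j].
    table = [[False] * (m + 1) for _ in range(n + 1)]
    table[0][0] = True
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and table[i - 1][j] and u[i - 1] == w[i + j - 1]:
                table[i][j] = True  # last symbol of the prefix comes from u
            if j > 0 and table[i][j - 1] and v[j - 1] == w[i + j - 1]:
                table[i][j] = True  # last symbol of the prefix comes from v
    return table[n][m]

print(is_shuffle("01101110", "10101000", "0110110011101000"))  # prints True
```

The table has (|u| + 1)(|v| + 1) entries, each filled in constant time, so the check runs in time proportional to |u| · |v|.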

Square Shuffle

w is a square provided it is equal to a shuffle of a string u with itself, i.e., ∃u s.t. w = u ⊙ u

The string w = 0110110011101000 is a square:

w = 0110110011101000

and u = 01101100.
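
A direct way to test this property on small inputs is exhaustive search: choose which half of the positions of w form the first copy of u and compare it with the remaining symbols. The sketch below does exactly that; it takes exponential time and is meant only as an illustration, with the name `square_root` chosen here.

```python
from itertools import combinations

def square_root(w):
    """Return some u with w a shuffle of u with itself, or None if none exists."""
    n = len(w)
    if n % 2:
        return None
    if n == 0:
        return ""
    # By symmetry, position 0 may be assumed to belong to the first copy of u.
    for rest in combinations(range(1, n), n // 2 - 1):
        first = {0, *rest}
        a = "".join(w[i] for i in range(n) if i in first)
        b = "".join(w[i] for i in range(n) if i not in first)
        if a == b:
            return a
    return None

print(square_root("0110110011101000") is not None)  # prints True
print(square_root("0110"))                          # prints None
```

For the slide's example, one valid split sends positions 0, 1, 2, 6, 8, 9, 11, 13 of w (counting from 0) to the first copy and the remaining positions to the second; both read 01101100.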

Result from 2013

Given an alphabet Σ with |Σ| ≥ 7,

Square = {w : ∃u (w = u ⊙ u)}

is NP-complete.

What we leave open:

- What about |Σ| = 2? (For |Σ| = 1, Square is just the set of even-length strings.)
- What if |Σ| = ∞ but each symbol cannot occur more often than, say, 6 times? (If each symbol occurs at most 4 times, Square can be reduced to 2-Sat; see P. Austrin's Stack Exchange post http://bit.ly/WATco3.)

Open Problem 2

Is Square NP-complete for alphabets of size 2, 3, 4, 5, or 6?

Upper and lower bounds

Shuffle(x, y, w) holds if and only if w is a shuffle of x, y.

$\text{Shuffle} \notin \mathrm{AC}^0$, but $\text{Shuffle} \in \mathrm{AC}^1$.

Upper bound

Lower bound

$$\text{Parity}(x) = \bigvee_{\substack{0 \le i \le |x| \\ i \text{ is odd}}} \text{Shuffle}(0^{|x|-i}, 1^i, x)$$
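
The reduction works because x is a shuffle of a block of |x| − i zeros and a block of i ones exactly when x contains i ones. The sketch below checks the identity on all short binary strings, using a memoised recursive shuffle test written here for illustration; the function names are not from the slides.

```python
from functools import lru_cache
from itertools import product

def is_shuffle(u, v, w):
    """True if w can be obtained by interleaving u and v."""
    if len(u) + len(v) != len(w):
        return False

    @lru_cache(maxsize=None)
    def ok(i, j):
        # Can w[i + j:] be formed by interleaving u[i:] and v[j:]?
        if i == len(u) and j == len(v):
            return True
        k = i + j
        return ((i < len(u) and u[i] == w[k] and ok(i + 1, j))
                or (j < len(v) and v[j] == w[k] and ok(i, j + 1)))

    return ok(0, 0)

def parity_via_shuffle(x):
    """OR over odd i of Shuffle(0^(|x|-i), 1^i, x), as in the formula."""
    n = len(x)
    return any(is_shuffle("0" * (n - i), "1" * i, x) for i in range(1, n + 1, 2))

agree = all(
    parity_via_shuffle("".join(t)) == ("".join(t).count("1") % 2 == 1)
    for n in range(8)
    for t in product("01", repeat=n)
)
print(agree)
```

Since Parity is known not to be in AC0 and the disjunction adds only one layer of unbounded fan-in OR, an AC0 circuit family for Shuffle would yield one for Parity, which is how this formula gives the lower bound.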

[Figure: circuit for Parity(x), built from Shuffle gates for i = 1, 3, 5, . . . , n, each taking inputs $0^{n-i}$, $1^i$, and x]

Open Problem 3

Is Shuffle in $\mathrm{NC}^1$?

Announcement of two upcoming seminars

1. February 16, 2015, 6:00-7:00pm, Bell Tower 1471
   Ryszard Janicki
   On Pairwise Comparisons Based Rankings

2. February 16, 2015, 7:00-8:00pm, Bell Tower 1471
   Neerja Mhaskar
   Repetition in Strings and String Shuffles

Computer Science Seminars: http://compsci.csuci.edu/degrees/seminars.htm

References

[GKM10] Jarosław Grytczuk, Jakub Kozik, and Piotr Micek. A new approach to nonrepetitive sequences. arXiv:1103.3809, December 2010.

[Sha09] Jeffrey Shallit. A second course in formal languages and automata theory. Cambridge University Press, 2009.
