Martin Kay Stanford University

Post on 22-Jan-2016


Martin Kay String Matching 1 1

Martin Kay

Stanford University

Naive Search (1)

naive_search(Pattern, Text, 1) :-
    append(Pattern, _, Text).
naive_search(Pattern, [_ | Text], N) :-
    naive_search(Pattern, Text, N0),
    N is N0+1.

| ?- naive_search("is", "mississippi", N).
N = 2 ? ;
N = 5 ? ;
no
| ?-

pref: A Prefix Predicate

pref(P, T) :-
    assert(stat(T, P)),
    fail.
pref([], _).
pref([H | P], [H | T]) :-
    pref(P, T).

Make an entry in the database every time the predicate is called.

Search using pref

naive_search1(Pattern, Text, 1) :-
    pref(Pattern, Text).
naive_search1(Pattern, [_ | Text], N) :-
    naive_search1(Pattern, Text, N0),
    N is N0+1.

| ?- naive_search1([i,s], [m,i,s,s,i,s,s,i,p,p,i], N).
N = 2 ? ;
N = 5 ? ;
no
| ?-

The Statistics

| ?- listing(stat).

stat([m,i,s,s,i,s,s,i,p,p,i], [i,s]).

stat([i,s,s,i,s,s,i,p,p,i], [i,s]).

stat([s,s,i,s,s,i,p,p,i], [s]).

stat([s,i,s,s,i,p,p,i], []).

stat([s,s,i,s,s,i,p,p,i], [i,s]).

stat([s,i,s,s,i,p,p,i], [i,s]).

stat([i,s,s,i,p,p,i], [i,s]).

stat([s,s,i,p,p,i], [s]).

stat([s,i,p,p,i], []).

stat([s,s,i,p,p,i], [i,s]).

stat([s,i,p,p,i], [i,s]).

stat([i,p,p,i], [i,s]).

stat([p,p,i], [s]).

stat([p,p,i], [i,s]).

stat([p,i], [i,s]).

stat([i], [i,s]).

stat([], [s]).

stat([], [i,s]).

18 Entries

11 Alignments

Observe:

If the pattern “mississippi” has matched part of the way, we can move past all the characters matched, because none of them can be an “m”, which is what we need to start a new match.

Text:    m i s s i o n a r y . . . .
Pattern: m i s s i s s i p p i

Mismatch: text “o” against pattern “s”.
No “m” here: the matched characters after the first.
Or maybe even here: the mismatched character.
So move to here!

Observe further:

Text:    p e r p e n d i c u l a r . . .
Pattern: p e r p e t r a t e

Mismatch: text “n” against pattern “t”. The matched part ends in “pe”, and this is a prefix of the pattern. So try this:

Text:    p e r p e n d i c u l a r . . .
Pattern:       p e r p e t r a t e

Observe yet further:

Text:    p e r p e t u a l . . . . .
Pattern: p e r p e t r a t e

Mismatch: text “u” against pattern “r”. No (shorter) prefix of the pattern ends here, so move to here:

Text:    p e r p e t u a l . . . . .
Pattern:             p e r p e t r a t e
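The three observations can be summarised as one rule: after a mismatch, shift the pattern by the number of characters matched minus the length of the longest proper suffix of the matched part that is also a prefix of the pattern. The following is a minimal sketch of that rule, assuming SWI-Prolog and strings written as lists of atoms; the predicate shift/3 is a hypothetical helper and does not appear on the slides.

```prolog
:- use_module(library(lists)).

% border(+String, -Border): Border is a proper suffix of String that is
% also a prefix of String. Longer candidates are tried first.
border(String, Border) :-
    append([_ | _], Border, String),
    append(Border, _, String).

% shift(+Pattern, +Matched, -Shift): hypothetical helper (not from the slides).
% Matched is the number of pattern characters that matched before the
% mismatch; Shift is how far the pattern can safely be moved.
shift(_, 0, 1) :- !.                  % nothing matched: move one place
shift(Pattern, Matched, Shift) :-
    length(Prefix, Matched),
    append(Prefix, _, Pattern),       % Prefix = the matched part
    once(border(Prefix, Border)),     % longest suffix that is also a prefix
    length(Border, BL),
    Shift is Matched - BL.
```

For the three examples: with [m,i,s,s,i,s,s,i,p,p,i] and 5 characters matched the shift is 5; with [p,e,r,p,e,t,r,a,t,e] and 5 characters matched it is 3; with the same pattern and 6 characters matched it is 6.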

Overlaps

Search for

a b a c a b a d a b a c a b a

in the text

a b a b a c a b a d a b a c a b a d a b a c a b a b a

(Figure: copies of the pattern aligned under the text at overlapping positions.)

Déjà vu

(Figure: the same pattern and text as on the previous slide, again with overlapping copies of the pattern.)

On-line search

We have seen this much of the text so far:

c a c a c a

We are looking for the pattern cacao. We have some number (0 or more) of searches in progress and are waiting for the next character to see which ones continue and maybe to start a new one.

(Figure: the partial matches of cacao currently in progress, aligned under the text.)

Search for

a b a c a b a d a b a c a b a

in the text

a b a b a c a b a d a b a c a b a d a b a c a b a b a

 i  char  pointers
 0  a     [0]
 1  b     [0, 1]
 2  a     [0, 2]
 3  b     [0, 1, 3]
 4  a     [0, 2]
 5  c     [0, 1, 3]
 6  a     [0, 4]
 7  b     [0, 1, 5]
 8  a     [0, 2, 6]
 9  d     [0, 1, 3, 7]
10  a     [0, 8]
11  b     [0, 1, 9]
12  a     [0, 2, 10]
13  c     [0, 1, 3, 11]
14  a     [0, 4, 12]
15  b     [0, 1, 5, 13]
16  a     [0, 2, 6, 14]     result 2
17  d     [0, 1, 3, 7]
18  a     [0, 8]
19  b     [0, 1, 9]
20  a     [0, 2, 10]
21  c     [0, 1, 3, 11]
22  a     [0, 4, 12]
23  b     [0, 1, 5, 13]
24  a     [0, 2, 6, 14]     result 10
25  b     [0, 1, 3, 7]
26  a     [0, 2]

1. The rightmost pointer always moves.

2. Other pointers move if they can do so over the same character.

3. A new ‘0’ is introduced on the left

A pointer in a given position always has pointers in the same set of positions to its left

These are properties of the pattern only.

Therefore they can be cached or precompiled.
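The pointer sets in the trace can be reproduced by simulating the three rules directly. The following is a minimal sketch, assuming SWI-Prolog and patterns and texts written as lists of atoms; the names online_search/3, scan/6 and advance_all/6 are illustrative and do not come from the slides. Each pointer is the pattern position waiting to be compared with the next text character.

```prolog
:- use_module(library(lists)).

% online_search(+Pattern, +Text, -Starts): Starts lists the 0-based text
% positions at which a complete match of Pattern begins.
online_search(Pattern, Text, Starts) :-
    length(Pattern, PL),
    scan(Text, 0, [0], Pattern, PL, Starts).

% scan(+Text, +I, +Pointers, +Pattern, +PL, -Starts): read the text one
% character at a time, carrying the set of pointers along.
scan([], _, _, _, _, []).
scan([C | Text], I, Ptrs0, Pattern, PL, Starts) :-
    advance_all(Ptrs0, C, Pattern, PL, Ptrs1, Done),
    (   Done == true                  % a pointer ran off the end of the pattern
    ->  S is I - PL + 1,
        Starts = [S | Starts1]
    ;   Starts = Starts1
    ),
    I1 is I + 1,
    scan(Text, I1, [0 | Ptrs1], Pattern, PL, Starts1).   % new 0 on the left

% advance_all(+Ptrs0, +C, +Pattern, +PL, -Ptrs, -Done): keep and advance
% every pointer whose pattern character equals C; drop the others.
advance_all([], _, _, _, [], false).
advance_all([P | Ps], C, Pattern, PL, Qs, Done) :-
    advance_all(Ps, C, Pattern, PL, Qs0, Done0),
    (   nth0(P, Pattern, C)
    ->  Q is P + 1,
        (   Q =:= PL
        ->  Qs = Qs0, Done = true     % complete match
        ;   Qs = [Q | Qs0], Done = Done0
        )
    ;   Qs = Qs0, Done = Done0
    ).
```

On the pattern and text of the trace, this sketch gives Starts = [2, 10], the two results shown in the table. It keeps every pointer explicitly, so it does not yet exploit the observation that the pointers to the left are determined by the rightmost one.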


If this matches ... then so will these.


So try these only if this fails!


The failure function

char  pointers
a     [0]
b     [0, 1]
a     [0, 2]
b     [0, 1, 3]
a     [0, 2]
c     [0, 1, 3]
a     [0, 4]
b     [0, 1, 5]
a     [0, 2, 6]
d     [0, 1, 3, 7]
a     [0, 8]
b     [0, 1, 9]
a     [0, 2, 10]
c     [0, 1, 3, 11]
a     [0, 4, 12]

pattern:   a  b  a  c  a  b  a  d  a  b  a  c  a  ...
position:  0  1  2  3  4  5  6  7  8  9 10 11 12  ...
failure:      0  0  1  0  1  2  3  0  1  2  3  4  ...


The Failure Function

pattern:   a  b  c  a  b  c  a  b  c
failure:  -1  0  0  0  1  2  3  4  5

(Figure: shifted copies of the pattern a b c a b c a b c.)

The Failure Function

pattern:   a  b  a  c  a  b  a  d  a  b  a  c  a  b  a
failure:  -1  0  0  1  0  1  2  3  0  1  2  3  4  5  6

(Figure: shifted copies of the pattern a b a c a b a d a b a c a b a.)


Substring, Prefix, Suffix

• Part of a string S (even if it covers the whole of S) is a substring of S.
• If it includes the first (last) character of S, it is a prefix (suffix) of S.
• If it does not cover the whole of S, it is a proper substring (prefix, suffix) of S.

Example: S = ababac
Some substrings: ababac, ab, b, bab, ac, ε; only ababac is not proper
Some prefixes: ababac, a, aba, ε; only ababac is not proper
Some suffixes: ababac, abac, c, ε; only ababac is not proper

ε is the empty string.

Borders

• If B is a proper prefix and a proper suffix of a string S, it is a border of S.
• Note: ε is a border of every non-empty string.

Examples:
abcabcabc has borders abc, abcabc, ε
abacabadabacaba has borders abacaba, aba, a, ε

Borders

pattern:   a  b  c  a  b  c  a  b  c
border:   -1  0  0  0  1  2  3  4  5

(Figure: shifted copies of the pattern a b c a b c a b c.)

border in Prolog

border(Pattern, Border) :-
    append([_ | _], Border, Pattern),
    append(Border, _, Pattern).
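As a check on the definition, the failure values shown earlier can be read off from border/2: the value at position i is the length of the longest border of the first i characters of the pattern, with -1 at position 0. The following is a small sketch, assuming SWI-Prolog and patterns written as lists of atoms; failure/3 is a hypothetical helper, not a predicate from the slides, and it recomputes every border from scratch, so it is a specification rather than an efficient method.

```prolog
:- use_module(library(lists)).

% border/2 as on the slide: a proper suffix that is also a prefix.
% append/3 enumerates the longer suffixes first.
border(Pattern, Border) :-
    append([_ | _], Border, Pattern),
    append(Border, _, Pattern).

% failure(+I, +Pattern, -F): hypothetical helper (not from the slides).
% F is the length of the longest border of the first I characters.
failure(0, _, -1) :- !.
failure(I, Pattern, F) :-
    length(Prefix, I),
    append(Prefix, _, Pattern),
    once(border(Prefix, B)),          % first solution is the longest border
    length(B, F).

% failure_row(+Pattern, -Fs): values for positions 0 .. length-1.
failure_row(Pattern, Fs) :-
    length(Pattern, PL),
    Last is PL - 1,
    findall(F, (between(0, Last, I), failure(I, Pattern, F)), Fs).
```

For [a,b,c,a,b,c,a,b,c] this gives [-1,0,0,0,1,2,3,4,5], and a findall/3 over border/2 for the same pattern gives the borders [a,b,c,a,b,c], [a,b,c] and [].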

Borders in Linear Time

Borders at position i+1 extend borders at position i.

position:  0  1  2  3  4  5  6  7  8  9
pattern:   a  b  a  c  a  b  a  d  a  b
border:   -1  0  0  1  0  1  2  3  0  1

border(0, _, -1).              % base case: the -1 at position 0 in the table
border(I, Pattern, Q) :-
    I > 0,
    J is I-1,
    border(J, Pattern, P),
    nth0(J, Pattern, C),
    extend(C, P, Pattern, Q).

extend(_, -1, _, 0) :- !.      % no shorter border left to extend
extend(C, P, Pattern, Q) :-
    nth0(P, Pattern, C), !,
    Q is P+1.
extend(C, P0, Pattern, R) :-
    border(P0, Pattern, Q),
    extend(C, Q, Pattern, R).

Building A Table

border(0, _, -1).              % base case: the -1 at position 0
border(I, Pattern, Q) :-
    I > 0,
    J is I-1,
    border(J, Pattern, P),
    nth0(J, Pattern, C),
    extend(C, P, Pattern, Q).

extend(_, -1, _, 0) :- !.
extend(C, P, Pattern, Q) :-
    nth0(P, Pattern, C), !,
    Q is P+1.
extend(C, P0, Pattern, R) :-
    border(P0, Pattern, Q),
    extend(C, Q, Pattern, R).

make_table(Pattern) :-
    retractall(border_table(_, _)),
    assert(border_table(0, -1)),   % -1, as in the failure function
    assert(border_table(1, 0)),
    length(Pattern, PL),
    make_table(Pattern, 2, PL).

make_table(_, I, N) :-
    I>N, !.
make_table(Pattern, I, N) :-
    border(I, Pattern, K),
    assert(border_table(I, K)),
    J is I+1,
    make_table(Pattern, J, N).

Building A Table

% border/3 and extend/4 now look up earlier values in border_table/2
% instead of recomputing them.
border(I, Pattern, Q) :-
    J is I-1,
    border_table(J, P),
    nth0(J, Pattern, C),
    extend(C, P, Pattern, Q).

extend(_, -1, _, 0) :- !.
extend(C, P, Pattern, Q) :-
    nth0(P, Pattern, C), !,
    Q is P+1.
extend(C, P0, Pattern, R) :-
    border_table(P0, Q),
    extend(C, Q, Pattern, R).

make_table(Pattern) :-
    retractall(border_table(_, _)),
    assert(border_table(0, -1)),   % -1 stops extend/4; with 0 it would loop
    assert(border_table(1, 0)),
    length(Pattern, PL),
    make_table(Pattern, 2, PL).

make_table(_, I, N) :-
    I>N, !.
make_table(Pattern, I, N) :-
    border(I, Pattern, K),
    assert(border_table(I, K)),
    J is I+1,
    make_table(Pattern, J, N).

Searching

% common_prefix/3 and advance/3 are not shown on the slides.
search(Pattern, Text, N) :-
    make_table(Pattern),               % Build the table
    retract(border_table(0, _)),
    assert(border_table(0, -1)),       % entry 0 must be -1 so that a mismatch
                                       % on the first character shifts by 1
    length(Pattern, PL),
    search(Pattern, PL, Text, N).      % Do the search

search(Pattern, PL, Text, N) :-
    common_prefix(Pattern, Text, CPL),
    search(CPL, Pattern, PL, Text, N).

search(CPL, _, CPL, _, 0).
search(CPL, Pattern, PL, Text0, N) :-
    border_table(CPL, BL),
    M is CPL-BL,
    advance(Text0, M, Text),
    search(Pattern, PL, Text, N0),
    N is N0+M.
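To try the search end to end, the table-building and searching predicates can be put together as below. This is a sketch under stated assumptions, not the author's complete program: it assumes SWI-Prolog and lists of atoms, it stores -1 in border_table(0, _) so that both extend/4 and the shift M is CPL-BL make progress, and the definitions of common_prefix/3 and advance/3 are guesses, since the slides use them without defining them.

```prolog
:- use_module(library(lists)).
:- dynamic border_table/2.

make_table(Pattern) :-
    retractall(border_table(_, _)),
    assert(border_table(0, -1)),      % assumption: -1 at position 0
    assert(border_table(1, 0)),
    length(Pattern, PL),
    make_table(Pattern, 2, PL).

make_table(_, I, N) :- I > N, !.
make_table(Pattern, I, N) :-
    border(I, Pattern, K),
    assert(border_table(I, K)),
    J is I+1,
    make_table(Pattern, J, N).

% Border length of the first I characters, from earlier table entries.
border(I, Pattern, Q) :-
    J is I-1,
    border_table(J, P),
    nth0(J, Pattern, C),
    extend(C, P, Pattern, Q).

extend(_, -1, _, 0) :- !.
extend(C, P, Pattern, Q) :- nth0(P, Pattern, C), !, Q is P+1.
extend(C, P0, Pattern, R) :- border_table(P0, Q), extend(C, Q, Pattern, R).

% search(+Pattern, +Text, -N): N is a 0-based match position; further
% positions are found on backtracking.
search(Pattern, Text, N) :-
    make_table(Pattern),
    length(Pattern, PL),
    search(Pattern, PL, Text, N).

search(Pattern, PL, Text, N) :-
    common_prefix(Pattern, Text, CPL),
    search(CPL, Pattern, PL, Text, N).

search(CPL, _, CPL, _, 0).            % whole pattern matched here
search(CPL, Pattern, PL, Text0, N) :-
    border_table(CPL, BL),
    M is CPL-BL,                      % shift by matched length minus border
    advance(Text0, M, Text),
    search(Pattern, PL, Text, N0),
    N is N0+M.

% Assumed helper: length of the longest common prefix of two ground lists.
common_prefix([X | Xs], [X | Ys], N) :- !,
    common_prefix(Xs, Ys, N0),
    N is N0+1.
common_prefix(_, _, 0).

% Assumed helper: drop M elements; fails if the text is too short.
advance(Text, 0, Text) :- !.
advance([_ | Text0], M, Text) :-
    M > 0,
    M1 is M-1,
    advance(Text0, M1, Text).
```

With the pattern and text of the earlier trace, findall/3 over search/3 collects the positions 2 and 10. Note that this version re-runs common_prefix/3 from the start of the pattern after each shift, so it illustrates the shifts rather than the full linear-time behaviour.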


Reference

Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323-350, June 1977.