String Matching 1

Martin Kay

Stanford University

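Naive Search (1)

% In the query below, double-quoted text must denote a list of character
% codes (the ISO default; in SWI-Prolog 7 and later, first call
% set_prolog_flag(double_quotes, codes)).
naive_search(Pattern, Text, 1) :-
    append(Pattern, _, Text).
naive_search(Pattern, [_ | Text], N) :-
    naive_search(Pattern, Text, N0),
    N is N0+1.

| ?- naive_search("is", "mississippi", N).
N = 2 ? ;
N = 5 ? ;
no
| ?-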
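The same query can be run without depending on how double-quoted text is read, by converting atoms with the standard atom_chars/2 and collecting every answer with findall/3 instead of typing ; at the prompt. A minimal sketch; match_positions/3 is a hypothetical helper, not part of the slides:

```prolog
:- use_module(library(lists)).

% naive_search/3 as above: N is the 1-based position of a match.
naive_search(Pattern, Text, 1) :-
    append(Pattern, _, Text).
naive_search(Pattern, [_ | Text], N) :-
    naive_search(Pattern, Text, N0),
    N is N0 + 1.

% match_positions(+PatternAtom, +TextAtom, -Ns): hypothetical helper that
% turns both atoms into character lists and collects all answers in order.
match_positions(PatternAtom, TextAtom, Ns) :-
    atom_chars(PatternAtom, Pattern),
    atom_chars(TextAtom, Text),
    findall(N, naive_search(Pattern, Text, N), Ns).
```

For example, `match_positions(is, mississippi, Ns)` gives `Ns = [2, 5]`, the two answers shown above.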

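pref: A Prefix Predicate

pref(P, T) :-
    assert(stat(T, P)),   % record this call in the database ...
    fail.                 % ... then fail through to the clauses below
pref([], _).
pref([H | P], [H | T]) :-
    pref(P, T).

The first clause makes an entry in the database every time the predicate is called.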

Search using pref

naive_search1(Pattern, Text, 1) :-
    pref(Pattern, Text).
naive_search1(Pattern, [_ | Text], N) :-
    naive_search1(Pattern, Text, N0),
    N is N0+1.

| ?- naive_search1([i,s], [m,i,s,s,i,s,s,i,p,p,i], N).
N = 2 ? ;
N = 5 ? ;
no
| ?-

The Statistics

| ?- listing(stat).
stat([m,i,s,s,i,s,s,i,p,p,i], [i,s]).
stat([i,s,s,i,s,s,i,p,p,i], [i,s]).
stat([s,s,i,s,s,i,p,p,i], [s]).
stat([s,i,s,s,i,p,p,i], []).
stat([s,s,i,s,s,i,p,p,i], [i,s]).
stat([s,i,s,s,i,p,p,i], [i,s]).
stat([i,s,s,i,p,p,i], [i,s]).
stat([s,s,i,p,p,i], [s]).
stat([s,i,p,p,i], []).
stat([s,s,i,p,p,i], [i,s]).
stat([s,i,p,p,i], [i,s]).
stat([i,p,p,i], [i,s]).
stat([p,p,i], [s]).
stat([p,p,i], [i,s]).
stat([p,i], [i,s]).
stat([i], [i,s]).
stat([], [s]).
stat([], [i,s]).

18 Entries
11 Alignments
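The entry count can be checked mechanically. A sketch that repeats pref/2 and naive_search1/3 from the slides (using assertz/1, the ISO name for assert/1) and adds a hypothetical helper, count_calls/4, which runs the search to exhaustion and counts the stat/2 entries it leaves behind:

```prolog
:- dynamic stat/2.

% pref/2 from the slides: log every call, then test for a prefix.
pref(P, T) :-
    assertz(stat(T, P)),
    fail.
pref([], _).
pref([H | P], [H | T]) :-
    pref(P, T).

naive_search1(Pattern, Text, 1) :-
    pref(Pattern, Text).
naive_search1(Pattern, [_ | Text], N) :-
    naive_search1(Pattern, Text, N0),
    N is N0 + 1.

% count_calls(+Pattern, +Text, -Ns, -Calls): hypothetical helper. Ns are all
% match positions; Calls is the number of calls made to pref/2.
count_calls(Pattern, Text, Ns, Calls) :-
    retractall(stat(_, _)),
    findall(N, naive_search1(Pattern, Text, N), Ns),
    findall(x, stat(_, _), Entries),
    length(Entries, Calls).
```

With the pattern `[i,s]` and the text `[m,i,s,s,i,s,s,i,p,p,i]` this gives `Ns = [2, 5]` and `Calls = 18`, one for each entry in the listing above.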

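Observe:

If the pattern “mississippi” has matched part of the way, we can move past all the characters matched, because none of them apart from the first can be an “m” (it occurs only at the start of the pattern), and an “m” is what we need to start a new match.

Text:    m i s s i o n a r y . . .
Pattern: m i s s i s s i p p i

The mismatch is at the sixth character (“o” against “s”). There is no “m” among the matched characters, and here not even at the mismatch, so the pattern can be moved past them.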
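The observation is safe only when the first character of the pattern occurs nowhere else in it, as with “mississippi”. Under that assumption, a mismatch after K matched characters lets the pattern move on by max(1, K) positions instead of one. A sketch; skip_search/3 and matched_length/3 are hypothetical names:

```prolog
:- use_module(library(lists)).

% skip_search(+Pattern, +Text, -N): N is the 1-based position of a match.
% ASSUMPTION: the first character of Pattern occurs nowhere else in it.
skip_search(Pattern, Text, N) :-
    length(Pattern, PL),
    skip_search(Pattern, PL, Text, 1, N).

skip_search(Pattern, PL, Text, I, N) :-
    matched_length(Pattern, Text, K),
    (   K =:= PL,
        N = I
    ;   % Apart from the first, none of the K matched characters can be the
        % first character of Pattern, so no match can start among them.
        Shift is max(1, K),
        length(Skipped, Shift),
        append(Skipped, Rest, Text),   % fails once the text runs out
        I1 is I + Shift,
        skip_search(Pattern, PL, Rest, I1, N)
    ).

% matched_length(+Pattern, +Text, -K): length of their common prefix.
matched_length([X | Xs], [X | Ys], K) :- !,
    matched_length(Xs, Ys, K0),
    K is K0 + 1.
matched_length(_, _, 0).
```

On “missionarymississippi” the pattern jumps from position 1 straight to 6 after matching m i s s i, and the only answer is N = 11.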

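Observe further:

Text:    p e r p e n d i c u l a r . . .
Pattern: p e r p e t r a t e

The mismatch is at the sixth character (“n” against “t”). The last two matched characters, “p e”, are a prefix of the pattern, so try this alignment next:

Text:    p e r p e n d i c u l a r . . .
Pattern:       p e r p e t r a t e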

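Observe yet further:

Text:    p e r p e t u a l . . . . .
Pattern: p e r p e t r a t e

The mismatch is at the seventh character (“u” against “r”). No (shorter) prefix of the pattern ends at the last matched character, so move the pattern to start at the mismatch:

Text:    p e r p e t u a l . . . . .
Pattern:             p e r p e t r a t e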
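Both observations come down to one computation: find the longest proper suffix of the text matched so far that is also a prefix of the pattern, and line that prefix up with it. A sketch; restart_length/3 is a hypothetical name:

```prolog
:- use_module(library(lists)).

% restart_length(+Matched, +Pattern, -L): L is the length of the longest
% proper suffix of Matched (the non-empty text matched so far) that is also
% a prefix of Pattern; the pattern can be moved so that this prefix lines up.
restart_length(Matched, Pattern, L) :-
    append([_ | _], Suffix, Matched),   % proper suffixes, longest first
    append(Suffix, _, Pattern),         % keep the first that starts Pattern
    !,
    length(Suffix, L).
```

With “perpe” matched, L = 2 (the prefix “p e”); with “perpet” matched, L = 0, so the pattern starts afresh at the mismatch.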

Overlaps

Search for   a b a c a b a d a b a c a b a
in the text  a b a b a c a b a d a b a c a b a d a b a c a b a b a

(Figure: copies of the pattern at successive alignments against the text. The two occurrences, starting at positions 2 and 10 counting from 0, overlap.)

Déjà vu

(Figure: the same search, with the pattern at the same successive alignments as on the previous slide.)

On-line search

We are looking for the pattern cacao. We have seen this much of the text so far:

c a c a c a

We have some number (0 or more) of searches in progress, such as the partial match c a c a, and are waiting for the next character to see which ones continue, and perhaps to start a new one.
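One step of this search can be sketched as follows, representing each search in progress by the number of pattern characters it has matched, with 0 standing for a search that starts at the next character. next_pointers/4 is a hypothetical name:

```prolog
:- use_module(library(lists)).

% next_pointers(+Pattern, +C, +Pointers0, -Pointers): a search that has
% matched P characters continues, as P+1, only if the next text character C
% equals Pattern[P] (counting from 0). Pointers0 should include 0.
next_pointers(Pattern, C, Pointers0, Pointers) :-
    findall(P1, ( member(P, Pointers0),
                  nth0(P, Pattern, C),
                  P1 is P + 1 ),
            Pointers).
```

After c a c a c a, the searches in progress have matched 4 (c a c a) and 2 (c a) characters. If the next character is o, only the first continues, reaching 5, a complete match of cacao; if it is c, the other two continue, giving 1 and 3.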

Search for   a b a c a b a d a b a c a b a
in the text  a b a b a c a b a d a b a c a b a d a b a c a b a b a

Each row gives a position in the text, the character there, and the pointers in force before that character is read; each pointer is the number of pattern characters matched so far by one search in progress. "result N" marks a complete match starting at text position N.

 0  a  [0]
 1  b  [0, 1]
 2  a  [0, 2]
 3  b  [0, 1, 3]
 4  a  [0, 2]
 5  c  [0, 1, 3]
 6  a  [0, 4]
 7  b  [0, 1, 5]
 8  a  [0, 2, 6]
 9  d  [0, 1, 3, 7]
10  a  [0, 8]
11  b  [0, 1, 9]
12  a  [0, 2, 10]
13  c  [0, 1, 3, 11]
14  a  [0, 4, 12]
15  b  [0, 1, 5, 13]
16  a  [0, 2, 6, 14]   result 2
17  d  [0, 1, 3, 7]
18  a  [0, 8]
19  b  [0, 1, 9]
20  a  [0, 2, 10]
21  c  [0, 1, 3, 11]
22  a  [0, 4, 12]
23  b  [0, 1, 5, 13]
24  a  [0, 2, 6, 14]   result 10
25  b  [0, 1, 3, 7]
26  a  [0, 2]

1. The rightmost pointer always moves.
2. Other pointers move if they can do so over the same character.
3. A new ‘0’ is introduced on the left.

A pointer in a given position always has pointers in the same set of positions to its left. These are properties of the pattern only, so they can be cached or precompiled.
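The table can be reproduced with a direct sketch of this pointer-set search, assuming text and pattern are lists of characters and that positions count from 0, as in the table. online_search/3 and scan/6 are hypothetical names:

```prolog
:- use_module(library(lists)).

% online_search(+Pattern, +Text, -Start): Start is the 0-based position in
% Text where a complete match of Pattern begins.
online_search(Pattern, Text, Start) :-
    length(Pattern, PL),
    scan(Text, 0, Pattern, PL, [], Start).

% scan(+Text, +I, +Pattern, +PL, +Pointers0, -Start): Pointers0 holds the
% searches still in progress (characters matched so far), excluding 0.
scan([C | Text], I, Pattern, PL, Pointers0, Start) :-
    % Add a new 0 on the left, then keep each pointer P, moved to P+1,
    % only if it can move over C (that is, Pattern[P] = C).
    findall(P1, ( member(P, [0 | Pointers0]),
                  nth0(P, Pattern, C),
                  P1 is P + 1 ),
            Pointers1),
    (   memberchk(PL, Pointers1),         % a search has just completed
        Start is I - PL + 1
    ;   delete(Pointers1, PL, Pointers),  % drop it and read on
        I1 is I + 1,
        scan(Text, I1, Pattern, PL, Pointers, Start)
    ).
```

Run on the pattern and text above, the answers are Start = 2 and Start = 10, the two results in the table; the pointer lists passed between steps, with 0 added, are its third column.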

In the table above, if this (the rightmost pointer in a row) matches ...
then so will these (the pointers to its left).
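The claim can be checked by computing, from the pattern alone, the pointers implied by a rightmost pointer K: every length L up to K such that the first L characters of the pattern are also the last L of its first K. A sketch; implied_pointers/3 is a hypothetical name:

```prolog
:- use_module(library(lists)).

% implied_pointers(+Pattern, +K, -Pointers): Pointers, in ascending order,
% are all L =< K such that the first L characters of Pattern are also the
% last L of its first K characters.
implied_pointers(Pattern, K, Pointers) :-
    length(Prefix, K),
    append(Prefix, _, Pattern),          % Prefix: the first K characters
    findall(L, ( append(_, Suffix, Prefix),
                 append(Suffix, _, Pattern),
                 length(Suffix, L) ),
            Ls),
    sort(Ls, Pointers).
```

For the pattern a b a c a b a d a b a c a b a, K = 14, 7 and 12 give [0, 2, 6, 14], [0, 1, 3, 7] and [0, 4, 12], the rows for positions 16, 9 and 14 of the table.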

So try these (the pointers to the left of the rightmost one)
only if this (the rightmost pointer) fails!

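Following this advice, only the rightmost pointer need be kept; the next pointer to its left is consulted only when the rightmost one cannot move over the next character. A sketch, in which next_left/3 (a hypothetical name) finds that pointer naively as the length of the longest proper suffix of the first K pattern characters that is also a prefix of them:

```prolog
:- use_module(library(lists)).

% kmp_step(+Pattern, +K, +C, -K1): K is the rightmost pointer, C the next
% text character. Try to move K over C; only if that fails, fall back to
% the next pointer to its left and try again.
kmp_step(Pattern, K, C, K1) :-
    (   nth0(K, Pattern, C)
    ->  K1 is K + 1
    ;   K =:= 0
    ->  K1 = 0
    ;   next_left(Pattern, K, J),
        kmp_step(Pattern, J, C, K1)
    ).

% next_left(+Pattern, +K, -J): J is the length of the longest proper suffix
% of the first K characters of Pattern that is also a prefix of them
% (computed naively, for illustration only). K must be at least 1.
next_left(Pattern, K, J) :-
    length(Prefix, K),
    append(Prefix, _, Pattern),
    append([_ | _], Suffix, Prefix),
    append(Suffix, _, Prefix),
    !,
    length(Suffix, J).
```

For the pattern a b a c a b a d a b a c a b a, `kmp_step(P, 3, b, K)` falls back from 3 to 1 and gives K = 2, the move from row 3 to row 4 of the table; after the complete match (K = 15) a b falls back through 7, 3 and 1 to give 2, as in rows 25 and 26.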

The failure function

Position:  0  1  2  3  4  5  6  7  8  9 10 11 12 ...
Pattern:   a  b  a  c  a  b  a  d  a  b  a  c  a ...
Failure:      0  0  1  0  1  2  3  0  1  2  3  4 ...

The value at a position is the next pointer to the left in the row of the table above whose rightmost pointer is that position: for example, the row [0, 1, 3, 7] gives 3 at position 7, and the row [0, 4, 12] gives 4 at position 12.
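Conversely, a whole row of the table can be recovered from its rightmost pointer K by following the failure function f down to 0: K, f(K), f(f(K)), and so on. A sketch, assuming the failure function is given as a list indexed from 0 with -1 at position 0 (the convention of the later slides); pointers_from_failure/3 and chain/3 are hypothetical names:

```prolog
:- use_module(library(lists)).

% pointers_from_failure(+F, +K, -Pointers): F is the failure function as a
% list indexed from 0. Starting from the rightmost pointer K, follow F down
% to 0; the pointers come out in ascending order, as in the table.
pointers_from_failure(F, K, Pointers) :-
    chain(F, K, Descending),
    reverse(Descending, Pointers).

chain(_, 0, [0]) :- !.
chain(F, K, [K | Ks]) :-
    nth0(K, F, J),
    chain(F, J, Ks).
```

With the failure values -1 0 0 1 0 1 2 3 0 1 2 3 4 5 6 for a b a c a b a d a b a c a b a, K = 14 gives [0, 2, 6, 14] and K = 12 gives [0, 4, 12].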

The Failure Function

Pattern:  a  b  c  a  b  c  a  b  c
Failure: -1  0  0  0  1  2  3  4  5

The Failure Function

Pattern:  a  b  a  c  a  b  a  d  a  b  a  c  a  b  a
Failure: -1  0  0  1  0  1  2  3  0  1  2  3  4  5  6
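Both rows can be reproduced naively: the entry at position I is the length of the longest proper suffix of the first I pattern characters that is also a prefix of them, with -1 at position 0. A sketch for checking the tables, not an efficient construction; failure_function/2 and longest_border/2 are hypothetical names:

```prolog
:- use_module(library(lists)).

% failure_function(+Pattern, -F): F has one entry per character of Pattern.
failure_function(Pattern, [-1 | Fs]) :-
    findall(B, ( append(Prefix, [_ | _], Pattern),   % proper prefixes ...
                 Prefix \== [],                      % ... other than []
                 longest_border(Prefix, B) ),
            Fs).

% longest_border(+S, -B): length of the longest proper suffix of S that is
% also a prefix of S.
longest_border(S, B) :-
    append([_ | _], Suffix, S),
    append(Suffix, _, S),
    !,
    length(Suffix, B).
```

Applied to a b c a b c a b c and to a b a c a b a d a b a c a b a, it gives exactly the two rows of failure values above.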

Substring, Prefix, Suffix

• Any contiguous part of a string S (even if it covers the whole of S) is a substring of S.
• If it includes the first (last) character of S, it is a prefix (suffix) of S.
• If it does not cover the whole of S, it is a proper substring (prefix, suffix) of S.

Example: S = ababac
Some substrings: ababac, ab, b, bab, ac, ε (only ababac is not proper)
Some prefixes: ababac, a, aba, ε (only ababac is not proper)
Some suffixes: ababac, abac, c, ε (only ababac is not proper)

ε is the empty string; it counts as a prefix, a suffix and a substring of every string.
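The definitions translate directly into Prolog on lists of characters. A sketch; prefix_of/2, suffix_of/2, substring_of/2 and proper/3 are hypothetical names, chosen to avoid clashing with library predicates:

```prolog
:- use_module(library(lists)).

prefix_of(P, S) :- append(P, _, S).
suffix_of(X, S) :- append(_, X, S).
substring_of(X, S) :- suffix_of(T, S), prefix_of(X, T).  % a prefix of a suffix

% proper(+Relation, ?X, +S): X is related to S but does not cover all of S.
proper(Relation, X, S) :-
    call(Relation, X, S),
    X \== S.
```

The six proper prefixes of ababac found this way are ε, a, ab, aba, abab and ababa.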

Borders

• If B is a proper prefix and a proper suffix of a string S, it is a border of S.
• Note that ε is a border of every non-empty string.

Examples: abcabcabc has borders abc, abcabc, ε; abacabadabacaba has borders abacaba, aba, a, ε.

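Borders

Pattern:  a  b  c  a  b  c  a  b  c
Failure: -1  0  0  0  1  2  3  4  5

Each failure value is the length of the longest border of the part of the pattern to the left of that position (for example, 5 at the last position, for the border abcab of abcabcab); -1 marks the first position, which has nothing to its left.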

border in Prolog

border(Pattern, Border) :-
    append([_ | _], Border, Pattern),   % Border is a proper suffix of Pattern ...
    append(Border, _, Pattern).         % ... and also a prefix of it
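With the head variable spelled consistently, the predicate enumerates borders from the longest down to ε, which makes the examples on the Borders slide easy to check. A sketch; borders/2 is a hypothetical helper that returns them as atoms:

```prolog
:- use_module(library(lists)).

% border/2: Border is a proper suffix of Pattern and also a prefix of it.
border(Pattern, Border) :-
    append([_ | _], Border, Pattern),
    append(Border, _, Pattern).

% borders(+Atom, -Borders): hypothetical helper; all borders of Atom, as
% atoms, longest first ('' is the empty string).
borders(Atom, Borders) :-
    atom_chars(Atom, Pattern),
    findall(B, ( border(Pattern, Chars), atom_chars(B, Chars) ), Borders).
```

For example, `borders(abcabcabc, Bs)` gives `Bs = [abcabc, abc, '']` and `borders(abacabadabacaba, Bs)` gives `Bs = [abacaba, aba, a, '']`.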

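Borders in Linear Time

% border(I, Pattern, Q): Q is the length of the longest border of the
% first I characters of Pattern, with -1 for I = 0.
border(0, _, -1) :- !.
border(I, Pattern, Q) :-
    J is I-1,
    border(J, Pattern, P),
    nth0(J, Pattern, C),
    extend(C, P, Pattern, Q).

% extend(C, P, Pattern, Q): extend a border of length P by the character C,
% falling back to shorter borders while Pattern[P] is not C.
extend(_, -1, _, 0) :- !.
extend(C, P, Pattern, Q) :-
    nth0(P, Pattern, C), !,
    Q is P+1.
extend(C, P0, Pattern, R) :-
    border(P0, Pattern, Q),
    extend(C, Q, Pattern, R).

-1  0  0  1  0  1  2  3  0  1
 a  b  a  c  a  b  a  d  a  b

Borders at position i+1 extend borders at position i.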

Building A Table

border(0, _, -1) :- !.
border(I, Pattern, Q) :-
    J is I-1,
    border(J, Pattern, P),
    nth0(J, Pattern, C),
    extend(C, P, Pattern, Q).

extend(_, -1, _, 0) :- !.
extend(C, P, Pattern, Q) :-
    nth0(P, Pattern, C), !,
    Q is P+1.
extend(C, P0, Pattern, R) :-
    border(P0, Pattern, Q),
    extend(C, Q, Pattern, R).

:- dynamic border_table/2.

make_table(Pattern) :-
    retractall(border_table(_, _)),
    assert(border_table(0, -1)),    % -1 at the first position, as above
    assert(border_table(1, 0)),
    length(Pattern, PL),
    make_table(Pattern, 2, PL).

make_table(_, I, N) :- I > N, !.
make_table(Pattern, I, N) :-
    border(I, Pattern, K),
    assert(border_table(I, K)),
    J is I+1,
    make_table(Pattern, J, N).

Building A Table

% The recursive calls to border/3 are replaced by lookups in the table.
border(I, Pattern, Q) :-
    J is I-1,
    border_table(J, P),
    nth0(J, Pattern, C),
    extend(C, P, Pattern, Q).

extend(_, -1, _, 0) :- !.
extend(C, P, Pattern, Q) :-
    nth0(P, Pattern, C), !,
    Q is P+1.
extend(C, P0, Pattern, R) :-
    border_table(P0, Q),
    extend(C, Q, Pattern, R).

:- dynamic border_table/2.

make_table(Pattern) :-
    retractall(border_table(_, _)),
    assert(border_table(0, -1)),    % with 0 here, extend/4 would loop
    assert(border_table(1, 0)),
    length(Pattern, PL),
    make_table(Pattern, 2, PL).

make_table(_, I, N) :- I > N, !.
make_table(Pattern, I, N) :-
    border(I, Pattern, K),
    assert(border_table(I, K)),
    J is I+1,
    make_table(Pattern, J, N).

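Searching

% Build the table.
search(Pattern, Text, N) :-
    make_table(Pattern),
    retract(border_table(0, _)),
    assert(border_table(0, -1)),   % a mismatch at the first character
                                   % must shift the pattern by 1
    length(Pattern, PL),
    search(Pattern, PL, Text, N).

% Do the search. N is the 0-based position of a match.
search(Pattern, PL, Text, N) :-
    common_prefix(Pattern, Text, CPL),
    search(CPL, Pattern, PL, Text, N).

search(CPL, _, CPL, _, 0).                % the whole pattern matched here
search(CPL, Pattern, PL, Text0, N) :-
    border_table(CPL, BL),
    M is CPL-BL,                          % shift so the border lines up
    advance(Text0, M, Text),
    search(Pattern, PL, Text, N0),
    N is N0+M.

% common_prefix/3 and advance/3 are not shown on the slide; these minimal
% definitions are assumed.
common_prefix([X | Xs], [X | Ys], N) :- !,
    common_prefix(Xs, Ys, N0),
    N is N0+1.
common_prefix(_, _, 0).

advance(Text, 0, Text) :- !.
advance([_ | Text0], M, Text) :-
    M1 is M-1,
    advance(Text0, M1, Text).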
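After each shift, the search above calls common_prefix/3 from the start of the pattern again, so text characters inside a border are compared more than once. An alternative sketch, not the slides' code, keeps a single pointer and follows the failure function on a mismatch, so each text character is read once. The failure table is built naively here and includes an entry for the full pattern, so that overlapping matches are found; kmp_positions/3, scan/7, step/5 and failure_table/2 are hypothetical names:

```prolog
:- use_module(library(lists)).

% kmp_positions(+Pattern, +Text, -Start): Start is the 0-based position in
% Text at which a match of Pattern begins (non-empty Pattern assumed).
kmp_positions(Pattern, Text, Start) :-
    length(Pattern, PL),
    failure_table(Pattern, F),
    scan(Text, 0, 0, Pattern, PL, F, Start).

% scan(+Text, +I, +K0, ...): I is the position of the next text character,
% K0 the number of pattern characters matched before reading it.
scan([C | Text], I, K0, Pattern, PL, F, Start) :-
    step(K0, C, Pattern, F, K),
    (   K =:= PL,
        Start is I - PL + 1
    ;   I1 is I + 1,
        scan(Text, I1, K, Pattern, PL, F, Start)
    ).

% step(+K, +C, +Pattern, +F, -K1): move the pointer over C, falling back
% through F until it can move or drops to 0.
step(K, C, Pattern, F, K1) :-
    (   nth0(K, Pattern, C)
    ->  K1 is K + 1
    ;   nth0(K, F, J),
        (   J < 0
        ->  K1 = 0
        ;   step(J, C, Pattern, F, K1)
        )
    ).

% failure_table(+Pattern, -F): F[I] is the length of the longest proper
% suffix of the first I characters that is also a prefix of them, for
% I = 0 .. length(Pattern), with F[0] = -1.
failure_table(Pattern, [-1 | Fs]) :-
    findall(B, ( append(Prefix, _, Pattern),
                 Prefix \== [],
                 longest_border(Prefix, B) ),
            Fs).

longest_border(S, B) :-
    append([_ | _], Suffix, S),
    append(Suffix, _, S),
    !,
    length(Suffix, B).
```

The answers for a b a c a b a d a b a c a b a are 2 and 10, matching "result 2" and "result 10" in the table of pointer sets; for "is" in "mississippi" they are 1 and 4, the 0-based counterparts of the answers 2 and 5 given by naive_search/3.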

Reference

Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323-350, June 1977.

