Boyer-Moore String Search Algorithm
Michael Sedlmair
Boyer, Robert S., and Moore, J. Strother. "A fast string searching algorithm." Communications of the ACM 20.10 (1977): 762-772.
Jacobs University, Bremen, May 10, 2017
Preliminaries: Assumptions
• Target audience: CS students, ~2nd term
• Prerequisites:
  - Java basics
• Not required:
  - Complexity theory (O-notation, etc.)
  - …
String search
F I N D I N A H A Y S T A C K N E E D L E I N A
Text txt (with N characters, here N = 24)
N E E D L E
Pattern pat (with M characters, here M = 6)
Naïve algorithms
How would you go about it?
F I N D I N A H A Y S T A C K N E E D L E I N A
N E E D L E
[Animation: the pattern slides along the text one position at a time. At each alignment, characters are compared from left to right until the first mismatch; at the alignment over NEEDLE, all six characters match.]
Analysis
• What is the problem?
• Naïve search is very inefficient: it shifts by only one position after every mismatch
• What could we change?
F I N D I N A H A Y S T A C K N E E D L E I N A
N E E D L E
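To make the cost of the naïve search concrete, here is a minimal Java sketch of it. The class name `NaiveSearch` and the `comparisons` counter are assumptions added for illustration; they are not part of the slides. The method returns `txt.length()` when the pattern does not occur.

```java
// Hypothetical helper class for illustration, not part of the original slides.
class NaiveSearch {
    // Number of character comparisons made by the most recent search.
    static int comparisons;

    // Returns the index of the first occurrence of pat in txt,
    // or txt.length() if there is none.
    static int search(String txt, String pat) {
        int N = txt.length();
        int M = pat.length();
        comparisons = 0;
        // try every alignment, shifting by exactly one position each time
        for (int i = 0; i <= N - M; i++) {
            int j = 0;
            // compare left to right until a mismatch or the end of the pattern
            while (j < M) {
                comparisons++;
                if (txt.charAt(i + j) != pat.charAt(j)) break;
                j++;
            }
            if (j == M) return i; // all M characters matched
        }
        return N; // not found
    }

    public static void main(String[] args) {
        int pos = search("FINDINAHAYSTACKNEEDLEINA", "NEEDLE");
        System.out.println("found at " + pos + " after " + comparisons + " comparisons");
    }
}
```

On the example text this finds NEEDLE at index 15 after 23 character comparisons, spread over 16 alignments.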
Better naïve algorithm
F I N D I N A H A Y S T A C K N E E D L E I N A
N E E D L E
[Animation: a comparison fails on an ‘A’ in the text.]
‘A’ never occurs in the pattern, so no alignment that overlaps this ‘A’ can match: skip!
Boyer-Moore Algorithm
Boyer, Robert S., and Moore, J. Strother. "A fast string searching algorithm." Communications of the ACM 20.10 (1977): 762-772.
Intuition
• Try to skip as many as M characters upon a mismatch (unless we have a reason not to)
• Scan pattern from right to left
F I N D I N A H A Y S T A C K N E E D L E I N A
N E E D L E
[Animation: pat jumps along the text in steps of up to M characters until it matches at NEEDLE.]
Boyer-Moore: Bad character rule
Upon a mismatch, skip alignments until (a) the mismatch becomes a match, or (b) pat moves past the mismatched character.
F I N D I N A H A Y S T A C K N E E D L E I N A
N E E D L E
Walkthrough of the animation:
• Checking: E in pat vs. N in the text, mismatch. Skipping: shift pat by 5 until the N in the text lines up with the N in pat. condition (a)!
• Checking: E vs. S, mismatch. Skipping: S does not occur in pat, so shift pat by 6, completely past the S. condition (b)!
• Checking: E matches, then L vs. N, mismatch. Skipping: shift pat by 4 until the N in the text lines up with the N in pat.
• Checking: all six characters match, compared from right to left.
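The two conditions of the rule can be sketched as a direct scan over the pattern. This is an illustrative assumption, not the implementation from the slides: the class `BadCharacterRule` and the method `shift` are hypothetical names, and the scan costs up to M steps per mismatch, which is why the slides precompute a lookup table instead.

```java
// Hypothetical helper for illustration: the bad character rule by direct scan.
class BadCharacterRule {
    // pat: the pattern; j: index in pat where the mismatch occurred;
    // c: the text character that caused the mismatch.
    // Returns how many positions pat should be shifted to the right.
    static int shift(String pat, int j, char c) {
        // condition (a): find c to the left of j, so the mismatch becomes a match
        for (int k = j - 1; k >= 0; k--) {
            if (pat.charAt(k) == c) return j - k;
        }
        // condition (b): c does not occur to the left of j, so move pat past it
        return j + 1;
    }

    public static void main(String[] args) {
        System.out.println(shift("NEEDLE", 5, 'N')); // N occurs at index 0 of pat
        System.out.println(shift("NEEDLE", 5, 'S')); // S does not occur in pat
        System.out.println(shift("NEEDLE", 4, 'N')); // mismatch one position further left
    }
}
```

For the three mismatches in the NEEDLE example, this yields shifts of 5, 6, and 4.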
Analysis
• Idea: If we know how much to skip, we can drastically reduce the steps needed
• Boyer-Moore
• Naïve
Simple algorithm
• Only find first occurrence
• Based on remembering rightmost occurrence
Q: How much to skip?
A: Precompute index of rightmost occurrence of character c in pattern pat (-1 if character is not in pattern)

    // create array for all chars
    // R = size of the alphabet (e.g. 128)
    right = new int[R];
    // initialize everything with -1
    for (int c = 0; c < R; c++)
        right[c] = -1;
    // store rightmost occurrence
    // M is the length of the pattern pat
    for (int j = 0; j < M; j++)
        right[pat.charAt(j)] = j;

Building right[c] for pat = NEEDLE, one pattern index j at a time (later occurrences overwrite earlier ones):

     N  E  E  D  L  E
c    0  1  2  3  4  5   right[c]
A                          -1
B                          -1
C                          -1
D             3             3
E       1  2        5       5
…                          -1
L                4          4
M                          -1
N    0                      0
…                          -1
(Simple) Java implementation

    public int search(String txt, String pat) {
        int N = txt.length();
        int M = pat.length();
        int skip;
        // loop over text
        for (int i = 0; i <= N - M; i += skip) {
            skip = 0;
            // loop over pattern, right to left
            for (int j = M - 1; j >= 0; j--) {
                // check if there is a mismatch
                if (pat.charAt(j) != txt.charAt(i + j)) {
                    skip = Math.max(1, j - right[txt.charAt(i + j)]);
                    break;
                }
            }
            if (skip == 0) return i; // no mismatch: match found at index i
        }
        return N; // not found
    }

Variables: txt (text), pat (pattern), N and M (their lengths), skip (shift after a mismatch).

Loop over pattern. Checking: mismatch?
F I N D I N A H A Y S T A C K N E E D L E I N A
N E E D L E

Skipping: compute skip value
    right[txt.charAt(i+j)]   // (here) = 0
    // we can jump by M-1 chars
    // j = M-1
    // skip = j-0
    // (here) skip = 5

Skip at least by 1, in case other term is nonpositive
. . . . . . E L E .
      N E E D L E
    right[txt.charAt(i+j)]   // = 5
    // j = M-3
    // skip = j-5 = M-3-5
    // skip = -2
not jumping back! Math.max(1, …) shifts pat by 1 instead.

MATCH!
F I N D I N A H A Y S T A C K N E E D L E I N A
                              N E E D L E
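The table construction and `search` can be combined into one runnable class. The sketch below is one possible arrangement: the class name `BoyerMooreSketch`, the constructor, and the fixed alphabet size `R = 128` are assumptions added here, while the loops follow the slides. Text characters with codes of 128 or above would cause an `ArrayIndexOutOfBoundsException` under this assumption.

```java
// Self-contained sketch; class name, constructor, and R = 128 are assumptions.
class BoyerMooreSketch {
    private static final int R = 128; // alphabet size (ASCII assumed)
    private final int[] right;        // rightmost index of each character in pat, -1 if absent
    private final String pat;

    BoyerMooreSketch(String pat) {
        this.pat = pat;
        right = new int[R];
        // initialize everything with -1
        for (int c = 0; c < R; c++)
            right[c] = -1;
        // store rightmost occurrence; later indices overwrite earlier ones
        for (int j = 0; j < pat.length(); j++)
            right[pat.charAt(j)] = j;
    }

    // Returns the index of the first occurrence of pat in txt,
    // or txt.length() if there is none.
    int search(String txt) {
        int N = txt.length();
        int M = pat.length();
        int skip;
        for (int i = 0; i <= N - M; i += skip) {
            skip = 0;
            // compare right to left; on a mismatch apply the bad character rule
            for (int j = M - 1; j >= 0; j--) {
                if (pat.charAt(j) != txt.charAt(i + j)) {
                    skip = Math.max(1, j - right[txt.charAt(i + j)]);
                    break;
                }
            }
            if (skip == 0) return i; // no mismatch: match at index i
        }
        return N; // not found
    }

    public static void main(String[] args) {
        BoyerMooreSketch bm = new BoyerMooreSketch("NEEDLE");
        System.out.println(bm.search("FINDINAHAYSTACKNEEDLEINA"));
    }
}
```

Building the table once per pattern means the same object can search several texts without recomputing `right`.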
Analysis (continued)
• Takes ~N / M character comparisons on typical inputs
• Worst case: ~N*M character comparisons
B B B B B B B B B B B B B B B B B B B B B B B B
A B B B B B
Every alignment matches the five B's from right to left before failing on the A, and pat then shifts by only one position, and so on.
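A worked count for this worst-case example, assuming one comparison is counted each time `search` tests `pat.charAt(j) != txt.charAt(i+j)`. With N = 24 and M = 6, there are N - M + 1 alignments, each costs M comparisons (five matching B's plus the failing A), and each shift is only 1:

```latex
\[
\underbrace{(N - M + 1)}_{\text{alignments}} \cdot \underbrace{M}_{\text{comparisons per alignment}}
= (24 - 6 + 1) \cdot 6 = 19 \cdot 6 = 114
\]
```

By comparison, the NEEDLE example with the same N and M needs only 1 + 1 + 2 + 6 = 10 comparisons, since three mismatches are found early and the shifts are 5, 6, and 4.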
Homework: Do it yourself!
Dale’s cone of experience
- what you heard today
- what you do yourself
Homework
• Implement “extended” Boyer-Moore that …
  - finds all occurrences of the pattern
  - not only remembers the rightmost occurrence, but all occurrences
• Analysis
  - text: lorem ipsum, http://www.lipsum.com/feed/html
  - pattern: “nulla”
  - count steps needed
  - (not mandatory: compare to steps needed in simple Boyer-Moore and naïve search)
Summary & Outlook
• Naïve search algorithm
  - inefficient
• Boyer-Moore algorithm
  - bad character rule
  - simple Java implementation
  - ~N/M character comparisons
• Homework
  - extended Boyer-Moore implementation
• What’s next
  - Boyer-Moore: good suffix rule