
Bitap Algorithm

Date post: 16-Feb-2016
Page 1: Bitap Algorithm

Bitap Algorithm
Approximate string matching

Evlogi Hristov

Telerik Corporation

Student at Telerik Academy

Page 2: Bitap Algorithm

Table of Contents
1. Levenshtein distance
2. Bitap overview
3. Bitap exact search
4. Bitap fuzzy search
5. Additional information


Page 3: Bitap Algorithm

Levenshtein distance
Edit distance


Page 4: Bitap Algorithm

Levenshtein distance
Edit distance: the minimum number of primitive operations necessary to convert the string into an exact match.
- insertion: cot → coat
- deletion: coat → cot
- substitution: coat → cost

Example:
1. Set n to be the length of s = "GUMBO"
   Set m to be the length of t = "GAMBOL"
   If n = 0, return m and exit
   If m = 0, return n and exit

Page 5: Bitap Algorithm

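Distance matrix for s = "GUMBO" (columns) and t = "GAMBOL" (rows); the distance, 2, is in the bottom-right cell:

        G  U  M  B  O
     0  1  2  3  4  5
  G  1  0  1  2  3  4
  A  2  1  1  2  3  4
  M  3  2  2  1  2  3
  B  4  3  3  2  1  2
  O  5  4  4  3  2  1
  L  6  5  5  4  3  2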

Levenshtein distance (2)

2. Initialize matrix M [m + 1, n + 1], with the first row set to 0..n and the first column set to 0..m
3. Examine each character of s (i from 1 to n)
4. Examine each character of t (j from 1 to m)
5. If s[i] equals t[j], the cost is 0
   If s[i] is not equal to t[j], the cost is 1
6. Set cell M[j, i] equal to the minimum of:
   a. The cell immediately above plus 1: M [j-1, i] + 1
   b. The cell immediately to the left plus 1: M [j, i-1] + 1
   c. The cell diagonally above and to the left plus the cost: M [j-1, i-1] + cost
7. After the iteration steps (3, 4, 5, 6) are complete, the distance is found in the cell M [m, n]

Page 6: Bitap Algorithm

Levenshtein distance (3)

    private int Levenshtein(string source, string target)
    {
        // An empty (or null) string is as far from the other string as that string is long.
        if (string.IsNullOrEmpty(source))
        {
            return string.IsNullOrEmpty(target) ? 0 : target.Length;
        }

        if (string.IsNullOrEmpty(target))
        {
            return source.Length;
        }

        int[,] dist = new int[source.Length + 1, target.Length + 1];
        int min1, min2, min3, cost;

        // ...continues on the next page

Page 7: Bitap Algorithm

Levenshtein distance (4)

        // First column and first row: distance from the empty prefix.
        for (int i = 0; i < dist.GetLength(0); i += 1)
        {
            dist[i, 0] = i;
        }

        for (int i = 0; i < dist.GetLength(1); i += 1)
        {
            dist[0, i] = i;
        }

        for (int i = 1; i < dist.GetLength(0); i++)
        {
            for (int j = 1; j < dist.GetLength(1); j++)
            {
                cost = source[i - 1] == target[j - 1] ? 0 : 1;
                min1 = dist[i - 1, j] + 1;        // cell above
                min2 = dist[i, j - 1] + 1;        // cell to the left
                min3 = dist[i - 1, j - 1] + cost; // diagonal cell
                dist[i, j] = Math.Min(Math.Min(min1, min2), min3);
            }
        }

        return dist[dist.GetLength(0) - 1, dist.GetLength(1) - 1];
    }
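Each cell of the matrix depends only on the row above it and on the cell to its left, so the full matrix is not strictly needed. The following is a minimal sketch of that idea, not part of the original slides: the class name LevenshteinTwoRowSketch and the method name Distance are hypothetical, and null input is assumed to mean the empty string.

```csharp
using System;

// Hypothetical helper, not from the slides: same recurrence, two rows of memory.
public static class LevenshteinTwoRowSketch
{
    public static int Distance(string source, string target)
    {
        // Assumption: null is treated like the empty string.
        source = source ?? string.Empty;
        target = target ?? string.Empty;

        int[] previous = new int[target.Length + 1]; // row i - 1
        int[] current = new int[target.Length + 1];  // row i

        // Row 0: distance from the empty prefix of source.
        for (int j = 0; j <= target.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= source.Length; i++)
        {
            current[0] = i; // column 0: distance to the empty prefix of target
            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                int above = previous[j] + 1;
                int left = current[j - 1] + 1;
                int diagonal = previous[j - 1] + cost;
                current[j] = Math.Min(Math.Min(above, left), diagonal);
            }

            // The row just filled becomes the "previous" row of the next step.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        return previous[target.Length];
    }
}
```

For the slide's example, LevenshteinTwoRowSketch.Distance("GUMBO", "GAMBOL") returns 2, the same value as the bottom-right cell of the full matrix.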

Page 8: Bitap Algorithm

Bitap algorithm
shift-or/shift-and


Page 9: Bitap Algorithm

Bitap algorithm
- Also known as the shift-or, shift-and or Baeza–Yates–Gonnet algorithm.
- Approximate string matching algorithm.
- Approximate equality is defined in terms of Levenshtein distance.
- Often used for fuzzy search without indexing.
- Does most of the work with bitwise operations.
- Runs in O(mn) operations (m: pattern length, n: text length), no matter the structure of the text or the pattern.

Page 10: Bitap Algorithm

Bitap Exact search (2)

    public static List<int> ExactMatch(string text, string pattern)
    {
        long[] alphabet = new long[128]; // ASCII range (0..127)
        for (int i = 0; i < pattern.Length; ++i)
        {
            int letter = (int)pattern[i];
            alphabet[letter] = alphabet[letter] | (1L << i); // 1L: shift as a 64-bit value
        }

        long result = 1; // 0000 0001
        List<int> indexes = new List<int>();
        for (int index = 0; index < text.Length; index++)
        {
            result &= alphabet[text[index]]; // keep only the prefixes that text[index] extends
            result = (result << 1) + 1;      // advance them and start a new prefix at bit 0

            // Bit pattern.Length set: the whole pattern ends at this index.
            if ((result & (1L << pattern.Length)) != 0)
            {
                indexes.Add(index - pattern.Length + 1);
            }
        }

        return indexes;
    }

Page 11: Bitap Algorithm

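Bitap Exact search

Example: text = cbdabababc, pattern = ababc

index:    0 1 2 3 4
pattern:  a b a b c

Bit i of alphabet[x] is 1 when pattern[i] = x. Bits are written from 4 down to 0, so the pattern letters appear in reverse order:

bits:          4 3 2 1 0
pattern:       c b a b a
alphabet[a] =  0 0 1 0 1  = 5
alphabet[b] =  0 1 0 1 0  = 10
alphabet[c] =  1 0 0 0 0  = 16
alphabet[d] =  0 0 0 0 0  = 0

Each row shows the last (up to five) characters read, ending in text[i], and res right after res &= alphabet[text[i]]:

               text read    res
start res:                  0 0 0 0 1  = 1
text[i] = c    c            0 0 0 0 0
text[i] = b    c b          0 0 0 0 0
text[i] = d    c b d        0 0 0 0 0
text[i] = a    c b d a      0 0 0 0 1
text[i] = b    c b d a b    0 0 0 1 0
text[i] = a    b d a b a    0 0 1 0 1
text[i] = b    d a b a b    0 1 0 1 0
text[i] = a    a b a b a    0 0 1 0 1
text[i] = b    b a b a b    0 1 0 1 0
text[i] = c    a b a b c    1 0 0 0 0

Bit 4 is set after the last character, so the pattern ends at text index 9 and starts at index 5.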
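The trace for this example can be reproduced mechanically. The sketch below is an illustration only: BitapTraceSketch and Trace are hypothetical names, a Dictionary is assumed in place of the slides' 128-entry alphabet array so that any character is accepted, and the pattern is assumed to be 1 to 63 characters long.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not from the slides: records res after every text character.
public static class BitapTraceSketch
{
    public static List<string> Trace(string text, string pattern)
    {
        if (pattern.Length == 0 || pattern.Length > 63)
        {
            throw new ArgumentException("Pattern length must be between 1 and 63.", nameof(pattern));
        }

        // masks[x] has bit i set when pattern[i] == x (the slides call this alphabet[x]).
        var masks = new Dictionary<char, long>();
        for (int i = 0; i < pattern.Length; i++)
        {
            masks.TryGetValue(pattern[i], out long mask); // mask is 0 when the key is absent
            masks[pattern[i]] = mask | (1L << i);
        }

        var rows = new List<string>();
        long res = 1; // start res: 0 0 0 0 1
        foreach (char c in text)
        {
            masks.TryGetValue(c, out long charMask);
            res &= charMask; // keep only the prefixes that c extends

            // Binary form of res, padded to one digit per pattern position.
            rows.Add(Convert.ToString(res, 2).PadLeft(pattern.Length, '0'));

            res = (res << 1) | 1; // advance the prefixes and start a new one at bit 0
        }

        return rows;
    }
}
```

Calling BitapTraceSketch.Trace("cbdabababc", "ababc") yields ten rows, from "00000" for the first character to "10000" for the last, where the leading 1 marks the match.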

Page 12: Bitap Algorithm

Fuzzy searching

    ...
    long[] result = new long[k + 1];
    for (int i = 0; i <= k; i++)
    {
        result[i] = 1;
    }
    ...
    for (int j = 1; j <= k; ++j)
    {
        // Three operations of the Levenshtein distance.
        // current:  result[j - 1] before this text character
        // previous: result[j - 1] after this text character
        long insertion = current | ((result[j] & patternMask[text[i]]) << 1);
        long deletion = (previous | (result[j] & patternMask[text[i]])) << 1;
        long substitution = (current | (result[j] & patternMask[text[i]])) << 1;

        current = result[j];
        result[j] = substitution | insertion | deletion | 1;
        previous = result[j];
    }
    ...

Instead of having a single bit array result that changes over the length of the text, we now have k + 1 of them, result[0..k]: result[j] tracks the pattern prefixes that match with at most j errors.
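Since the fragment above elides its setup, here is a self-contained sketch of the same idea under stated assumptions. It uses the textbook shift-and recurrence with k + 1 state words rather than the slides' exact variable layout. BitapFuzzySketch and FuzzyMatchEnds are hypothetical names, the pattern is assumed to be 1 to 63 characters long, and k is assumed to be smaller than the pattern length. It reports the index where each match ends, because with errors allowed the start of a match is not unique.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not from the slides: shift-and search with up to k edits.
public static class BitapFuzzySketch
{
    // Returns every text index at which the pattern ends with at most k errors.
    public static List<int> FuzzyMatchEnds(string text, string pattern, int k)
    {
        int m = pattern.Length;
        if (m == 0 || m > 63)
        {
            throw new ArgumentException("Pattern length must be between 1 and 63.", nameof(pattern));
        }

        if (k < 0 || k >= m)
        {
            throw new ArgumentOutOfRangeException(nameof(k), "k must satisfy 0 <= k < pattern length.");
        }

        // masks[x] has bit i set when pattern[i] == x.
        var masks = new Dictionary<char, long>();
        for (int i = 0; i < m; i++)
        {
            masks.TryGetValue(pattern[i], out long mask);
            masks[pattern[i]] = mask | (1L << i);
        }

        // state[d], bit j: pattern[0..j] matches a suffix of the text read so far
        // with at most d errors. Before any text, d leading characters can be dropped.
        long[] state = new long[k + 1];
        for (int d = 0; d <= k; d++)
        {
            state[d] = (1L << d) - 1;
        }

        long matchBit = 1L << (m - 1);
        var ends = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            masks.TryGetValue(text[i], out long charMask); // 0 for characters not in the pattern

            long oldLower = state[0]; // state[d - 1] before this character
            state[0] = ((state[0] << 1) | 1) & charMask;
            for (int d = 1; d <= k; d++)
            {
                long old = state[d];
                state[d] = (((old << 1) | 1) & charMask) // characters match
                         | ((oldLower << 1) | 1)         // substitution
                         | oldLower                      // extra character in the text
                         | ((state[d - 1] << 1) | 1);    // pattern character missing from the text
                oldLower = old;
            }

            if ((state[k] & matchBit) != 0)
            {
                ends.Add(i);
            }
        }

        return ends;
    }
}
```

With the edit examples from the Levenshtein slide, FuzzyMatchEnds("cost", "coat", 1) returns [3] (one substitution) and FuzzyMatchEnds("cot", "coat", 1) returns [2] (one missing character), while FuzzyMatchEnds("cost", "coat", 0) returns an empty list.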

Page 13: Bitap Algorithm

Shift-and vs. Shift-or

Shift-and:
- Uses bitwise & and 1's for matches
- More intuitive and easier to understand
- Needs to add result |= 1 after each shift

Shift-or:
- Uses bitwise | and zeroes for matches
- A bit faster
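For comparison with the shift-and ExactMatch shown earlier, here is a minimal shift-or sketch of exact search. It is an illustration under assumptions, not code from the slides: BitapShiftOrSketch and ShiftOrMatch are hypothetical names, the pattern is assumed to be 1 to 63 characters long, and a Dictionary stands in for the alphabet array. A 0 bit means "matches", so the left shift brings in the required 0 by itself and no extra |= 1 step is needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not from the slides: exact search with inverted bits.
public static class BitapShiftOrSketch
{
    // Returns the start index of every exact occurrence of pattern in text.
    public static List<int> ShiftOrMatch(string text, string pattern)
    {
        int m = pattern.Length;
        if (m == 0 || m > 63)
        {
            throw new ArgumentException("Pattern length must be between 1 and 63.", nameof(pattern));
        }

        // masks[x] has bit i CLEARED when pattern[i] == x; all other bits are 1.
        var masks = new Dictionary<char, long>();
        for (int i = 0; i < m; i++)
        {
            long mask = masks.TryGetValue(pattern[i], out long existing) ? existing : ~0L;
            masks[pattern[i]] = mask & ~(1L << i);
        }

        long state = ~0L; // all ones: no prefix has matched yet
        long matchBit = 1L << (m - 1);
        var starts = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            // Characters that do not occur in the pattern block every prefix.
            long charMask = masks.TryGetValue(text[i], out long found) ? found : ~0L;

            // The shift inserts a 0 at bit 0, which starts a new candidate match.
            state = (state << 1) | charMask;

            // A 0 at bit m - 1 means the whole pattern ends at index i.
            if ((state & matchBit) == 0)
            {
                starts.Add(i - m + 1);
            }
        }

        return starts;
    }
}
```

On the slide's example, ShiftOrMatch("cbdabababc", "ababc") returns [5], the same start index the shift-and version reports.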

Page 14: Bitap Algorithm


Questions?


Bitap algorithm

http://algoacademy.telerik.com

Page 16: Bitap Algorithm

Free Trainings @ Telerik Academy

- C# Programming @ Telerik Academy: csharpfundamentals.telerik.com
- Telerik Software Academy: academy.telerik.com
- Telerik Academy @ Facebook: facebook.com/TelerikAcademy
- Telerik Software Academy Forums: forums.academy.telerik.com

