
Comparison of large sequences

Date post: 12-Jan-2016
Upload: drea
Transcript
Page 1: Comparison of large sequences

Comparison of large sequences

First part:

Alignment of large sequences

Page 2: Comparison of large sequences

Dynamic programming

• Short sequences (up to 10,000 bps) can be aligned using dynamic programming.

• Quadratic cost in space and time.

acc.................................agt
|||.................................|xx
acc.................................a--

What about genomes?

accaccacaccacaacgagcata … acctgagcgatat

acc..t
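The quadratic cost can be seen in a minimal sketch of global alignment by dynamic programming. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; the slides do not specify a scoring scheme. The function fills a table with (len(a)+1) x (len(b)+1) cells, which is where the quadratic space and time come from.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best global alignment score, number of table cells filled)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            table[i][j] = max(diagonal, table[i - 1][j] + gap, table[i][j - 1] + gap)
    return table[-1][-1], rows * cols


score, cells = global_alignment_score("accagt", "acca")
print(score, cells)
```

For two sequences of a million bases each, the same table would need about 10^12 cells, which is why this approach does not scale to genomes.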

Page 3: Comparison of large sequences

Genomic sequences

In which cases can dynamic programming be applied?

• Genomic sequences have millions of base pairs.

• The sequences are 1,000 times longer.

• The running time is 1,000,000 times higher!

(1 second becomes about 11 days; 1 minute becomes about 2 years.)
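The figures above follow from the quadratic cost: a 1,000-fold increase in length multiplies the running time by 1,000 squared. A small sketch of the arithmetic, where the function name and the 365-day year are illustrative assumptions:

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def scaled_runtime_seconds(base_seconds, length_factor=1000):
    """Quadratic algorithm: time grows with the square of the length factor."""
    return base_seconds * length_factor ** 2


one_second_in_days = scaled_runtime_seconds(1) / SECONDS_PER_DAY
one_minute_in_years = scaled_runtime_seconds(60) / SECONDS_PER_YEAR
print(round(one_second_in_days, 1), "days")
print(round(one_minute_in_years, 1), "years")
```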

Page 4: Comparison of large sequences

First assumption

[Figure: Genome A and Genome B drawn as two dotted sequences, side by side and as the two axes of a plot.]

Page 5: Comparison of large sequences

Realistic assumption?

Unrealistic assumption!

More realistic assumption

[Figure: Genome A and Genome B drawn as dotted sequences, side by side and as the two axes of a plot, for each assumption.]

Page 6: Comparison of large sequences

Realistic assumptions?

But is it now a real case?

Unrealistic assumption!

More realistic assumption

[Figure: Genome A and Genome B drawn as dotted sequences, side by side and as the two axes of a plot, for each assumption.]

Page 7: Comparison of large sequences

Preview in a real case

Chlamydia muridarum: 1,084,689 bps

Chlamydia trachomatis: 1,057,413 bps

Page 8: Comparison of large sequences

Preview in a real case

Pyrococcus abyssi: 1,790,334 bps

Pyrococcus horikoshii: 1,763,341 bps

Page 9: Comparison of large sequences

Methodology of an alignment

1st: Make a preview. (Linear cost)

2nd: Identify the portions that can be aligned. (Linear cost)

3rd: Make the alignment.

Page 10: Comparison of large sequences

Methodology of an alignment

1st: Make a preview. (Linear cost)

2nd: Identify the portions that can be aligned.

3rd: Make the alignment.

?

Page 11: Comparison of large sequences

Preview-Revisited

… a a t g….c t g...

… c g t g….c c c ...

MUM: Maximal Unique Matching

Connect to MALGEN
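A MUM is usually defined as a substring that occurs exactly once in each of the two sequences and cannot be extended to the left or to the right without breaking the match. The following brute-force sketch only illustrates that definition on short strings; it is not the linear-cost method announced in the slides, and the function name `find_mums` and the 0-based positions are assumptions made for the example.

```python
def find_mums(s1, s2, min_len=1):
    """Return (word, start in s1, start in s2) for every maximal unique match."""

    def occurrences(text, word):
        # All start positions of word in text, overlapping ones included.
        return [i for i in range(len(text) - len(word) + 1) if text.startswith(word, i)]

    mums, seen = [], set()
    for i in range(len(s1)):
        for j in range(i + min_len, len(s1) + 1):
            word = s1[i:j]
            if word in seen:
                continue
            seen.add(word)
            occ1, occ2 = occurrences(s1, word), occurrences(s2, word)
            if len(occ1) != 1 or len(occ2) != 1:
                continue  # not unique in both sequences
            p1, p2, n = occ1[0], occ2[0], len(word)
            left = p1 > 0 and p2 > 0 and s1[p1 - 1] == s2[p2 - 1]
            right = p1 + n < len(s1) and p2 + n < len(s2) and s1[p1 + n] == s2[p2 + n]
            if not left and not right:
                mums.append((word, p1, p2))
    return mums


print(find_mums("ababaabb", "aabaabba"))
```

On the two example strings used later in the slides, the only MUM is abaabb, which starts at position 3 of the first string and position 2 of the second when counting from 1.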

Page 12: Comparison of large sequences

Methodology of an alignment

1st: Make a preview.

How can MUMs be found? Linear cost with suffix trees.

2nd: Identify the portions that can be aligned.

How can these portions be determined?

3rd: Make the alignment.

With CLUSTALW, TCOFFEE, …

Page 13: Comparison of large sequences

Bioinformatics PhD. Course

Second part:

Introducing Suffix trees

Page 14: Comparison of large sequences

Suffix trees

Given the string ababaas:

Suffixes:

1: ababaas

2: babaas

3: abaas

4: baas

5: aas

6: as

7: s

[Figure: suffix tree of ababaas. Leaf labels such as "baas,2" give the text on the last edge followed by the starting position of the suffix.]

What kind of queries?

Page 15: Comparison of large sequences

Applications of Suffix trees

1. Exact string matching

• Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?

[Figure: suffix tree of ababaas.]
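The query above can be answered by walking down from the root, one pattern character at a time. The sketch below uses an uncompressed suffix trie built from nested dictionaries, which is a simplification of the compressed suffix tree drawn in the slides and takes quadratic space; the function names are assumptions for illustration.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie of nested dictionaries."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def occurs(trie, pattern):
    """A pattern occurs in the text iff it spells a path starting at the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("ababaas")
for pattern in ("abab", "aab", "ab"):
    print(pattern, occurs(trie, pattern))
```

For ababaas, the patterns abab and ab occur, while aab does not. Each query costs time proportional to the pattern length, independent of the text length.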

Page 16: Comparison of large sequences

Quadratic insertion algorithm

Given the string … and the suffix tree …

Invariant properties:

P1: the leaves of the suffixes from … have been inserted.

Page 17: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: tree with the single leaf ababaabbs,1.]

Page 18: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: tree with leaves ababaabbs,1 and babaabbs,2.]

Page 19: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: same leaves as on the previous page (animation frame).]

Page 20: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: tree after inserting leaf abbs,3.]

Page 21: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: same leaves as on the previous page (animation frame).]

Page 22: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: tree after inserting leaf abbs,4.]

Page 23: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: same leaves as on the previous page (animation frame).]

Page 24: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: tree after inserting leaf abbs,5.]

Page 25: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: same leaves as on the previous page (animation frame).]

Page 26: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: same leaves as on the previous page, redrawn (animation frame).]

Page 27: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: tree after inserting leaf bs,6.]

Page 28: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: same leaves as on the previous page (animation frame).]

Page 29: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: tree after inserting leaf bs,7.]

Page 30: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

[Figure: tree after inserting leaf s,8.]

Page 31: Comparison of large sequences

Quadratic insertion algorithm

Given the string ababaabbs

root
  a
    abbs,5
    b
      a
        abbs,3
        baabbs,1
      bs,6
  b
    a
      baabbs,2
      abbs,4
    bs,7
    s,8
  s,9
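The insertion steps shown in the preceding slides can be sketched as follows: each suffix is inserted in turn, starting from the root, and an edge is split where the suffix stops matching. This is a minimal sketch of quadratic insertion, assuming the last character of the string is a unique terminator (s in the example); the class and function names are illustrative and not taken from the slides.

```python
class Node:
    def __init__(self):
        self.children = {}        # first character -> (edge label, child node)
        self.suffix_start = None  # 1-based start position, set on leaves only


def build_suffix_tree(text):
    if not text or text.count(text[-1]) != 1:
        raise ValueError("text must end with a unique terminator")
    root = Node()
    for start in range(len(text)):
        node, rest = root, text[start:]
        while True:
            first = rest[0]
            if first not in node.children:
                leaf = Node()
                leaf.suffix_start = start + 1
                node.children[first] = (rest, leaf)
                break
            label, child = node.children[first]
            k = 0  # length of the common prefix of the edge label and rest
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k < len(label):
                # Mismatch inside the edge: split it at offset k.
                middle = Node()
                middle.children[label[k]] = (label[k:], child)
                node.children[first] = (label[:k], middle)
                child = middle
            node, rest = child, rest[k:]
    return root


def leaf_positions(node):
    """Sorted start positions of all suffixes stored below node."""
    if not node.children:
        return [node.suffix_start]
    found = []
    for _, child in node.children.values():
        found.extend(leaf_positions(child))
    return sorted(found)


tree = build_suffix_tree("ababaabbs")
print(leaf_positions(tree))
```

Every insertion restarts at the root and may scan a path as long as the suffix itself, so the total cost is quadratic in the length of the string.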

Page 32: Comparison of large sequences

Generalized suffix tree

The suffix tree of many strings is called the generalized suffix tree, and it is the suffix tree of the concatenation of the strings.

For instance, the generalized suffix tree of ababaabb and aabaa is the suffix tree of ababaabbαaabaaβ, where α and β are terminators.

Page 33: Comparison of large sequences

Generalized suffix tree

Given the suffix tree of ababaabbα:

root
  a
    abbα,5
    b
      a
        abbα,3
        baabbα,1
      bα,6
  b
    a
      baabbα,2
      abbα,4
    bα,7
    α,8
  α,9

Construction of the suffix tree of ababaabbαaabaaβ:

Page 34: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: suffix tree of ababaabbα (animation frame).]

Page 35: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: tree after inserting the leaf for suffix 1 of aabaaβ.]

Page 36: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: same leaves as on the previous page (animation frame).]

Page 37: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: tree after inserting the leaf for suffix 2 of aabaaβ.]

Page 38: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: same leaves as on the previous page (animation frame).]

Page 39: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: tree after inserting the leaf for suffix 3 of aabaaβ.]

Page 40: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: same leaves as on the previous page (animation frame).]

Page 41: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: tree after inserting the leaf for suffix 4 of aabaaβ.]

Page 42: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: same leaves as on the previous page (animation frame).]

Page 43: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: tree after inserting the leaf for suffix 5 of aabaaβ.]

Page 44: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: same leaves as on the previous page (animation frame).]

Page 45: Comparison of large sequences

Generalized suffix tree

Construction of the suffix tree of ababaabbαaabaaβ:

[Figure: tree after inserting the leaf for suffix 6 of aabaaβ.]

Page 46: Comparison of large sequences

Generalized suffix tree

Generalized suffix tree of ababaabbαaabaaβ:

[Figure: completed generalized suffix tree, with leaves 1 to 9 ending in α and leaves 1 to 6 ending in β.]

Page 47: Comparison of large sequences

Applications of Suffix trees

1. Exact string matching

• Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?

[Figure: suffix tree of ababaas.]

Page 48: Comparison of large sequences

Applications of Suffix trees

2. The substring problem for a database of strings DB

• Does the DB contain any occurrence of the patterns abab, aab, and ab?

[Figure: generalized suffix tree of ababaabbαaabaaβ.]

Page 49: Comparison of large sequences

Applications of Suffix trees

3. The longest common substring of two strings

[Figure: generalized suffix tree of ababaabbαaabaaβ.]
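In a generalized suffix tree, the longest common substring is spelt by the deepest node that has leaves from both strings below it. The sketch below shows the same idea on an uncompressed trie in which every node records which strings pass through it. It is a simplified, quadratic-space stand-in for the compressed tree of the slides, and all names in it are assumptions for illustration.

```python
def longest_common_substring(s1, s2):
    """Deepest trie node reached by suffixes of both strings."""
    root = {"kids": {}, "src": set()}
    for source, text in ((1, s1), (2, s2)):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["kids"].setdefault(ch, {"kids": {}, "src": set()})
                node["src"].add(source)  # this node is on a path of string `source`
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node["kids"].items():
            if child["src"] == {1, 2}:   # substring path + ch occurs in both strings
                word = path + ch
                if len(word) > len(best):
                    best = word
                stack.append((child, word))
    return best


print(longest_common_substring("ababaabb", "aabaa"))
```

Because each string is inserted separately and tagged with its source, this sketch does not need the terminators α and β that the concatenation in the slides uses to keep the two strings apart.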

Page 50: Comparison of large sequences

Applications of Suffix trees

5. Finding MUMs.

[Figure: generalized suffix tree of ababaabbαaabaaβ.]

Page 51: Comparison of large sequences

Bioinformatics PhD. Course

Third part:

Suffix links

Page 52: Comparison of large sequences

Suffix links

root
  a
    abbα,5
    b
      a
        abbα,3
        baabbα,1
      bα,6
  b
    a
      baabbα,2
      abbα,4
    bα,7
    α,8
  α,9

Page 53: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

?

Page 54: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

?

Page 55: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

?

Page 56: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

?

Page 57: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

?

Page 58: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

?

Page 59: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

?

Page 60: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

?

Page 61: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Page 62: Comparison of large sequences

Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Page 63: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a

Page 64: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a

Page 65: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a

Unique matchings:

aa in S2[1]

Page 66: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a

Unique matchings:

aa in S2[1]

aab in S2[1] = S1[5..6-7] in S2[1]

Page 67: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a

Unique matchings:

S1[5..6-7] in S2[1]

Page 68: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a

Unique matchings:

S1[5..6-7] in S2[1]

Page 69: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a b b a

Unique matchings:

S1[5..6-7] in S2[1]

S1[3..6-…] in S2[2]

Page 70: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a b b a

Unique matchings:

S1[5..6-7] in S2[1]

S1[3..6-…] in S2[2]

Page 71: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a b b a

Unique matchings:

S1[5..6-7] in S2[1]

S1[3..6-…] in S2[2]

Page 72: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a b b a

Unique matchings:

S1[5..6-7] in S2[1]

S1[3..6-…] in S2[2]

Page 73: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a b b a

Unique matchings:

S1[5..6-7] in S2[1]

S1[3..6-8] in S2[2]

S1[4..6-8] in S2[3]

Page 74: Comparison of large sequences

Traversal using Suffix links

[Figure: suffix tree of ababaabbα with suffix links.]

Given S2 = a a b a a b b a

Unique matchings:

S1[3..6-8] in S2[2]

S1[4..6-8] in S2[3]

S1[5..8] in S2[4]

S1[6..8] in S2[5]

S1[7..8] in S2[6]

Page 75: Comparison of large sequences

From UMs to MUMs

Given S2 = a a b a a b b a and S1 = a b a b a a b b α

Unique matchings:

S1[3..6-8] in S2[2]

S1[4..6-8] in S2[3]

S1[5..8] in S2[4]

S1[6..8] in S2[5]

S1[7..8] in S2[6]

Array of UMs (one entry per position of S1):

1:
2:
3: 6-8
4: 6-8
5: 8
6: 8
7: 8
8:
9:

MUM: S1[3..6-8] in S2[2]
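The step from unique matchings to MUMs can be sketched as follows. This sketch finds the matchings by naive substring search rather than by the suffix-link traversal of the slides, and it does not check uniqueness inside S2; both simplifications, as well as the function names, are assumptions made for illustration. A matching is kept as a MUM only if its interval in S1 is not contained in the interval of another matching.

```python
def unique_matches(s1, s2):
    """Map each 1-based start in s1 to (end in s1, start in s2)."""
    array_of_ums = {}
    for j in range(len(s2)):
        # Longest prefix of s2[j:] that occurs somewhere in s1.
        length = 0
        while j + length < len(s2) and s2[j:j + length + 1] in s1:
            length += 1
        if length == 0:
            continue
        word = s2[j:j + length]
        i = s1.find(word)
        if s1.find(word, i + 1) != -1:
            continue  # occurs more than once in s1: not a unique matching
        start, end = i + 1, i + length
        # A longer matching with the same start overwrites a shorter one.
        if start not in array_of_ums or end > array_of_ums[start][0]:
            array_of_ums[start] = (end, j + 1)
    return array_of_ums


def maximal_only(array_of_ums):
    """Keep the matchings whose s1 interval is not inside another one."""
    mums = []
    for start, (end, s2_start) in sorted(array_of_ums.items()):
        contained = any(
            o_start <= start and end <= o_end and (o_start, o_end) != (start, end)
            for o_start, (o_end, _) in array_of_ums.items()
        )
        if not contained:
            mums.append((start, end, s2_start))
    return mums


ums = unique_matches("ababaabb", "aabaabba")
print(sorted(ums.items()))
print(maximal_only(ums))
```

On the example strings, the array holds entries for positions 3 to 7 of S1, all ending at position 8, and the only interval not contained in another is S1[3..8], matched at S2[2].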

Page 76: Comparison of large sequences

Bioinformatics PhD. Course

Third part:

Linear insertion algorithm

Page 77: Comparison of large sequences

Quadratic insertion algorithm

Given the string … and the suffix tree …

Invariant properties:

P1: the leaves of the suffixes from … have been inserted.

Page 78: Comparison of large sequences

Linear insertion algorithm

Given the string … and the suffix tree …

Invariant properties:

P1: the leaves of the suffixes from … have been inserted.

P2: the string … is the longest string that can be spelt through the tree.

Page 79: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 5.]

Page 80: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 5; string positions 6, 7, 8 are marked.]

Page 81: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 5; string positions 6, 7, 8 are marked.]

Page 82: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 5; string positions 6, 7, 8, 9 are marked.]

Page 83: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: tree after inserting leaf b...,6; string positions 6, 7, 8, 9 are marked.]

Page 84: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 6; string positions 7, 8, 9 are marked.]

Page 85: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 6; string positions 7, 8, 9 are marked.]

Page 86: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 6; string positions 7, 8, 9 are marked.]

Page 87: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 6 (animation frame); string positions 7, 8, 9 are marked.]

Page 88: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: tree after inserting leaf b...,7; string positions 7, 8, … are marked.]

Page 89: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 7; string positions 8, 9 are marked.]

Page 90: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 7; string positions 8, 9 are marked.]

Page 91: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 7; string positions 8, 9 are marked.]

Page 92: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 7; string positions 8, 9 are marked.]

Page 93: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 7, redrawn (animation frame); string positions 8, 9 are marked.]

Page 94: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: tree after inserting leaf b...,8; string positions 8, 9 are marked.]

Page 95: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 8; string position 9 is marked.]

Page 96: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 8; string position 9 is marked.]

Page 97: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 8 (animation frame); string position 9 is marked.]

Page 98: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: tree after inserting leaf b...,9; string position 9 is marked.]

Page 99: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 9; string position 9 is marked.]

Page 100: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 9; string position 9 is marked.]

Page 101: Comparison of large sequences

Linear insertion algorithm: example

Given the string ababaababb...

[Figure: suffix tree with leaves 1 to 9; string position 9 is marked.]

Page 102: Comparison of large sequences

Index

Suffix arrays

"Suffix arrays: a new method for on-line string searches", G. Myers, U. Manber

Page 103: Comparison of large sequences

Suffix arrays

Given the string ababaa#:

Suffixes:

1: ababaa#

2: babaa#

3: abaa#

4: baa#

5: aa#

6: a#

7: #

… but lexicographically sorted

[Figure: the same seven suffixes rearranged in lexicographic order and stored in an array indexed 1 to 7.]

What is the cost? O(n log(n))
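A suffix array can be sketched by sorting the start positions of the suffixes by the text that follows them. The sketch below assumes that # sorts before the letters, as it does in ASCII, and uses a hypothetical function name. Note that Python's sort performs O(n log(n)) comparisons, but each comparison of two suffixes may itself inspect up to n characters.

```python
def build_suffix_array(text):
    """Return the 1-based start positions of the suffixes in sorted order."""
    order = sorted(range(len(text)), key=lambda start: text[start:])
    return [start + 1 for start in order]


text = "ababaa#"
for position in build_suffix_array(text):
    print(position, text[position - 1:])
```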

Page 104: Comparison of large sequences

Applications of suffix arrays

1. Exact string matching

• Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?

[Figure: sorted suffix array of ababaa#, indexed 1 to 7.]

Binary search … what is the cost? O(log(n) |P|)

Can it be improved to O(log(n)+|P|)?
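The plain binary search can be sketched as below. Each of the O(log(n)) steps compares the pattern with at most |P| characters of one suffix, which gives the O(log(n) |P|) cost. The function names and the 0-based positions are assumptions for illustration.

```python
def build_suffix_array(text):
    """0-based start positions of the suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda start: text[start:])


def contains(text, suffix_array, pattern):
    """Binary search for the first suffix whose prefix is >= pattern."""
    low, high = 0, len(suffix_array)
    while low < high:
        middle = (low + high) // 2
        start = suffix_array[middle]
        # Compare only the first len(pattern) characters of the suffix.
        if text[start:start + len(pattern)] < pattern:
            low = middle + 1
        else:
            high = middle
    return low < len(suffix_array) and text.startswith(pattern, suffix_array[low])


text = "ababaa#"
suffix_array = build_suffix_array(text)
for pattern in ("abab", "aab", "ab"):
    print(pattern, contains(text, suffix_array, pattern))
```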

Page 105: Comparison of large sequences

Fast search with cost O(log(n)+|P|)

Query: …

Invariant properties:

P1: α < query ≤ β

P2: … matches pref(query)

[Figure: suffix array with positions 1, 2, …, n; α and β mark the bounds of the search interval.]

Page 106: Comparison of large sequences

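Fast search with cost O(log(n)+|P|)

Query: …

Invariant properties:

P1: α < query ≤ β

P2: … matches pref(query)

[Figure: suffix array with positions 1, 2, …, n; α and β mark the bounds of the search interval and γ a position between them.]

Algorithm:

If suff(γ) < suff(query) then α = γ

else β = γ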
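The loop above can be sketched as follows, keeping the invariant that the suffix at α is smaller than the query and the suffix at β is not. As an assumption about how P2 is used, the sketch remembers how many query characters are known to match the suffixes at α and at β, and skips the smaller of the two counts when comparing at γ. This avoids many repeated comparisons in practice, but the full O(log(n)+|P|) bound of Manber and Myers relies on additional precomputed longest-common-prefix information that this sketch omits. All names are illustrative.

```python
def fast_contains(text, suffix_array, query):
    """Binary search that skips query characters already known to match."""
    n = len(suffix_array)
    alpha, beta = -1, n          # virtual sentinels: suffix(alpha) < query <= suffix(beta)
    match_alpha = match_beta = 0  # query characters matched at alpha and at beta
    while beta - alpha > 1:
        gamma = (alpha + beta) // 2
        start = suffix_array[gamma]
        k = min(match_alpha, match_beta)  # these characters need no re-comparison
        while k < len(query) and start + k < len(text) and query[k] == text[start + k]:
            k += 1
        if k == len(query):
            beta, match_beta = gamma, k    # query is a prefix of suffix(gamma)
        elif start + k == len(text) or text[start + k] < query[k]:
            alpha, match_alpha = gamma, k  # suffix(gamma) < query
        else:
            beta, match_beta = gamma, k    # suffix(gamma) > query
    return beta < n and text.startswith(query, suffix_array[beta])


text = "ababaa#"
suffix_array = sorted(range(len(text)), key=lambda start: text[start:])
for query in ("abab", "aab", "ab"):
    print(query, fast_contains(text, suffix_array, query))
```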

