Swaps + Mismatches Based on Estrella Eizenberg M.Sc. Thesis Supervised by Ely Porat.

Post on 13-Jan-2016


Swaps + Mismatches

Based on Estrella Eizenberg's M.Sc. thesis, supervised by Ely Porat

A paper on this subject by Amihood Amir, Estrella Eizenberg, Ohad Lipsky and Ely Porat was submitted to ESA 2004

Problem definition

T: a d b d a c b d a b c a b

P: d a b b a a b c

Mismatches: Abrahamson 87

K-mismatches: Landau Vishkin 86; Amir Lewenstein Porat 00

Problem definition

T: a d b d a c b d a b c a b

P: d c a b d b a c

Swaps:

Amir Aumann Landau M. Lewenstein N. Lewenstein 97

Cole Hariharan 00

Amir Cole Hariharan Lewenstein Porat 2001

Amir Lewenstein Porat 2000

Problem definition

T: a d b d a c b d a b c a b

P: d c a b b b a c

Minimum distance:

Counting all as mismatches: 5 err

Minimum distance: 3 err
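The counts on these three slides can be reproduced with a small brute-force sketch. It assumes the pattern sits under the text starting at the fourth text symbol (offset 3, counting from 0), since that alignment yields the slide's numbers. The function name and the greedy left-to-right pairing of swaps are illustrative choices, not the algorithm from the thesis.

```python
def swap_mismatch_distance(text, pattern, offset):
    """Return (mismatches, distance) for pattern placed at text[offset:].

    A swap exchanges two adjacent, different pattern symbols and costs 1.
    Every mismatching position not covered by a swap also costs 1.
    """
    window = text[offset:offset + len(pattern)]
    if len(window) != len(pattern):
        raise ValueError("pattern does not fit at this offset")
    mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
    swaps = 0
    j = 0
    while j < len(pattern) - 1:
        swappable = (pattern[j] != window[j]
                     and pattern[j] == window[j + 1]
                     and pattern[j + 1] == window[j])
        if swappable:
            swaps += 1
            j += 2  # swaps may not share a position
        else:
            j += 1
    # Each swap explains two mismatching positions at a cost of one.
    return mismatches, mismatches - swaps


T = "adbdacbdabcab"
for P in ("dabbaabc", "dcabdbac", "dcabbbac"):
    print(P, swap_mismatch_distance(T, P, 3))
```

For the last pattern the sketch returns 5 mismatches and a distance of 3, the two figures quoted on the slide.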

Starting with a simpler problem

Σ = {0,1}

T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1

P: 1 0 0 1 0 1 0

We wish to count only the mismatches that are not part of a swap (we will leave the swaps for later). We call them non-swap-mismatches (NSM).

Starting with a simpler problem

Σ = {0,1}

T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1

P: 1 0 0 1 0 1 0

NSM[6] = 2, computed in ????

Mismatches[6] = 4, computed in O(n log m)

Minimum-distance[6] = (Mismatches[6] + NSM[6]) / 2 = 3 err, computed in O(???? + n log m)
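The identity on this slide follows from Mismatches = 2·swaps + NSM and Minimum-distance = swaps + NSM. A brute-force check is sketched below, assuming positions are counted from 1 (the convention under which the slide's values at position 6 come out). The helper name `counts_at` is hypothetical.

```python
def counts_at(text, pattern, i):
    """Return (mismatches, nsm, min_distance) at 1-indexed text position i."""
    window = text[i - 1:i - 1 + len(pattern)]
    if len(window) != len(pattern):
        raise ValueError("pattern does not fit at position i")
    mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
    swaps, j = 0, 0
    while j < len(pattern) - 1:
        if (pattern[j] != window[j] and pattern[j] == window[j + 1]
                and pattern[j + 1] == window[j]):
            swaps += 1
            j += 2  # swaps may not share a position
        else:
            j += 1
    nsm = mismatches - 2 * swaps  # mismatches not explained by any swap
    return mismatches, nsm, swaps + nsm


T = "010101101101001"
P = "1001010"
mismatches, nsm, distance = counts_at(T, P, 6)
print(mismatches, nsm, distance)
print(2 * distance == mismatches + nsm)
```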

Starting with a simpler problem

T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1

T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *

T2: * * * * * * 1 0 1 * * * * 0 1

Starting with a simpler problem

We do the same for the pattern

P: 1 0 0 1 0 1 0

P1: * * 0 1 0 1 0

P2: 1 0 * * * * *

We will give a solution only for the odd places (NSM[i] where i is odd)
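The split of T into T1 and T2, and of P into P1 and P2, is consistent with the following rule, inferred from the slide's examples rather than stated on them: with positions counted from 1, the first copy keeps every symbol that is 0 at an odd position or 1 at an even position, and the second copy keeps the rest. A minimal sketch, with `split_by_phase` as a hypothetical name:

```python
def split_by_phase(s):
    """Split a binary string into two wildcard-padded copies.

    Positions are 1-indexed. The first copy keeps the symbols that follow
    the phase "0 at odd positions, 1 at even positions"; the second copy
    keeps the remaining symbols. Dropped symbols become '*'.
    """
    first, second = [], []
    for pos, ch in enumerate(s, start=1):
        in_phase = (ch == "0") == (pos % 2 == 1)
        first.append(ch if in_phase else "*")
        second.append("*" if in_phase else ch)
    return "".join(first), "".join(second)


T1, T2 = split_by_phase("010101101101001")
P1, P2 = split_by_phase("1001010")
print(T1, T2)
print(P1, P2)
```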

Starting with a simpler problem

P2: 1 0 * * * * *

P1: * * 0 1 0 1 0

T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *

Comparing T1 with P1 gives no errors, neither swaps nor mismatches, because at an odd place aligned positions have the same parity (the same holds for T2 against P2).

Without loss of generality we look only at T1 against P2.

Starting with a simpler problem

T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *

[Figure: two placements of P2: 1 0 * * * * * under T1, labeled "Even overlap" and "Odd overlap"]

Odd overlap: one NSM err

We need to count how many odd overlaps we have
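The counting rule can be checked directly on the slide's strings. The sketch below assumes 1-indexed positions and the phase split inferred from the T1/T2 example. For an odd place i it counts odd-length overlaps of T1 segments with P2 segments, plus the symmetric case T2 against P1, and compares the result with a brute-force NSM count. All function names are hypothetical, and the quadratic loops are for illustration only.

```python
def segments(s, keep_in_phase):
    """Maximal runs (start, end), 1-indexed inclusive, of one phase of s."""
    runs, start = [], None
    for pos, ch in enumerate(s, start=1):
        in_phase = (ch == "0") == (pos % 2 == 1)
        if in_phase == keep_in_phase:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos - 1))
            start = None
    if start is not None:
        runs.append((start, len(s)))
    return runs


def nsm_by_overlaps(text, pattern, i):
    """NSM at an odd 1-indexed place i, as the number of odd overlaps."""
    if i % 2 == 0:
        raise ValueError("this sketch handles odd places only")
    count = 0
    # T1 against P2 first, then the symmetric case T2 against P1.
    for text_phase in (True, False):
        for ts, te in segments(text, text_phase):
            for ps, pe in segments(pattern, not text_phase):
                # Move the pattern segment into text coordinates.
                start = max(ts, ps + i - 1)
                end = min(te, pe + i - 1)
                if start <= end and (end - start + 1) % 2 == 1:
                    count += 1
    return count


def nsm_brute_force(text, pattern, i):
    """NSM at place i by greedily pairing adjacent swapped positions."""
    window = text[i - 1:i - 1 + len(pattern)]
    mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
    swaps, j = 0, 0
    while j < len(pattern) - 1:
        if (pattern[j] != window[j] and pattern[j] == window[j + 1]
                and pattern[j + 1] == window[j]):
            swaps += 1
            j += 2
        else:
            j += 1
    return mismatches - 2 * swaps


T = "010101101101001"
P = "1001010"
for i in range(1, len(T) - len(P) + 2, 2):
    print(i, nsm_by_overlaps(T, P, i), nsm_brute_force(T, P, i))
```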

Simpler problem

We separate the segments into 4 categories:

1. Starting at odd position and ending at odd position (called OO)

2. Starting at odd position and ending at even position (called OE)

3. Starting at even position and ending at odd position (called EO)

4. Starting at even position and ending at even position (called EE)
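As a small illustration of the four categories, the sketch below labels segments by the parity of their 1-indexed start and end. The segment list is taken from the T1 and T2 rows of the earlier slide. The function names are hypothetical.

```python
def parity_label(pos):
    """Return 'O' for an odd 1-indexed position and 'E' for an even one."""
    return "O" if pos % 2 == 1 else "E"


def classify(start, end):
    """Label a segment OO, OE, EO or EE from its start and end positions."""
    return parity_label(start) + parity_label(end)


# Segments of T = 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1 (1-indexed, inclusive).
text_segments = [(1, 6), (7, 9), (10, 13), (14, 15)]
for start, end in text_segments:
    print((start, end), classify(start, end))
```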

Simpler problem

[Figure: a text segment T labeled O, encoded as 1 1 1 ... 1, with a pattern segment P labeled O, encoded as 1 -1 1 -1 1 -1 1 -1, placed under it at several shifts]

The overlap must start with 1

O(n log m): one convolution
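One way to read this slide: mark text segments that start at an odd position with 1s, mark pattern segments that start at an odd position with +1, -1, +1, ... from their first symbol, and correlate. At an odd place every overlap then begins on a +1, so it sums to 1 when its length is odd and to 0 when it is even. The sketch below follows that reading. The mask representation and function names are assumptions, and a naive correlation stands in for the FFT that would give the O(n log m) bound.

```python
def runs(mask):
    """Maximal runs (start, end), 1-indexed inclusive, of True in mask."""
    out, start = [], None
    for pos, flag in enumerate(mask, start=1):
        if flag and start is None:
            start = pos
        elif not flag and start is not None:
            out.append((start, pos - 1))
            start = None
    if start is not None:
        out.append((start, len(mask)))
    return out


def odd_overlaps_by_correlation(text_mask, pattern_mask):
    """Map each odd place i to the number of odd overlaps between
    text and pattern segments that both start at odd positions."""
    n, m = len(text_mask), len(pattern_mask)
    a = [0] * n  # text: 1 inside segments starting at an odd position
    for s, e in runs(text_mask):
        if s % 2 == 1:
            for pos in range(s, e + 1):
                a[pos - 1] = 1
    b = [0] * m  # pattern: +1, -1, +1, ... from each odd segment start
    for s, e in runs(pattern_mask):
        if s % 2 == 1:
            for pos in range(s, e + 1):
                b[pos - 1] = 1 if (pos - s) % 2 == 0 else -1
    # Naive correlation over odd places only.
    return {i: sum(a[i - 1 + j] * b[j] for j in range(m))
            for i in range(1, n - m + 2, 2)}


def odd_overlaps_direct(text_mask, pattern_mask, i):
    """Same count as above, by intersecting segments explicitly."""
    count = 0
    for ts, te in runs(text_mask):
        for ps, pe in runs(pattern_mask):
            if ts % 2 == 1 and ps % 2 == 1:
                start, end = max(ts, ps + i - 1), min(te, pe + i - 1)
                if start <= end and (end - start + 1) % 2 == 1:
                    count += 1
    return count


text_mask = [True, True, True, False, True, True, True, True, True]
pattern_mask = [True, True, True]
print(odd_overlaps_by_correlation(text_mask, pattern_mask))
```

The same idea, with the alternating signs anchored at the segment end instead of the start, would cover segments that end at positions of equal parity.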

Simpler problem

[Figure: a text segment T labeled O, encoded as 1 1 1 ... 1, with a pattern segment P labeled O, encoded as -1 1 -1 1 -1 1 -1 1, placed under it at several shifts]

The overlap must start with 1

O(n log m): one convolution

Simpler problem

We deal with: O? against O? and with ?O against ?O

The same method works for E? against E? and ?E against ?E

We are left to deal with:
– OE against EO
– EO against OE
– OO against EE
– EE against OO

OO against EE

[Figure: a text segment T of type OO against pattern segments P of type EE, showing an even overlap and an odd overlap; T is encoded as 1 1 1 ... 1 and P with alternating 1 -1 values, with results 0, 1 and -1]

We need to recognize when one segment contains the other

Simpler problem

We can easily know whether we are contained in, or contain, another segment if we know the segment sizes.

Smaller segments can't contain larger segments

Simpler problem

Then for each segment we divide the computation into the part against bigger segments and the part against smaller segments

We do it by computing the answer, each time, for all segments of size 'x'

Simpler problem

The number of different sizes is O(\sqrt{m})
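The square-root bound, up to a constant factor, follows from a short counting argument. The sketch below assumes the sizes in question are those of the pattern's segments, which are disjoint and so have total length at most m. The second line assumes each size is handled with a constant number of convolutions, which the slides suggest but do not spell out.

```latex
% Let s_1 < s_2 < \dots < s_k be the distinct segment sizes in the pattern.
% Distinct positive integers satisfy s_j \ge j, and the segments are disjoint.
\[
  \frac{k(k+1)}{2} \;=\; \sum_{j=1}^{k} j \;\le\; \sum_{j=1}^{k} s_j \;\le\; m
  \quad\Longrightarrow\quad k < \sqrt{2m} = O(\sqrt{m}).
\]
% With O(1) convolutions of cost O(n \log m) per distinct size:
\[
  O(\sqrt{m}) \cdot O(n \log m) \;=\; O(n\sqrt{m}\,\log m).
\]
```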

What we have

We have an algorithm for the simpler problem that runs in time O(n\sqrt{m}\log m)

We have an algorithm for binary alphabet that runs in O(n\sqrt{m}\log m)

With several more techniques we develop an algorithm solving the original problem in O(n\sqrt{m}\log m)

Open problem

It is easy to see that our algorithm is at most a factor of O(\sqrt{\log m}) from the optimal algorithm (due to a reduction to counting mismatches)

But one can try to improve the small alphabet case