
Date post: 21-Dec-2015

Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge

Z.S. Peng

H.F. Ting


The Forest Edit Distance

Edit distance of two ordered, labeled forests

Edit operations between E and F: relabeling node i in E with the label of node j in F.

Relabel (3,5)

Cost of the operation: $\delta(3,5)$

[Figure: example forests E (nodes 1 to 4) and F (nodes 1 to 7), each node carrying a letter label; node 3 of E is relabeled with the label of node 5 of F.]

Edit operations between E and F: deleting node i from E.

Delete (2,-)

Cost of the operation: $\delta(2,-)$

[Figure: the example forests E and F before and after node 2 is deleted from E.]

Edit operations between E and F: deleting node j from F.

Cost of the operation: $\delta(-,j)$

[Figure: the example forests E and F.]
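The two kinds of operation can be tried out on a small forest. The following is a minimal sketch, not the authors' code: it assumes a tree is a `(label, children)` tuple, a forest is a tuple of trees, and nodes are addressed by post-order number. The helper names `relabel` and `delete` and the sample forest are hypothetical.

```python
# Assumed representation: tree = (label, children), forest = tuple of trees.
# Nodes are addressed by their post-order number, starting at 1.

def _edit(forest, target, new_label, counter):
    """Rebuild `forest`, relabeling (new_label given) or deleting
    (new_label is None) the node whose post-order number is `target`."""
    out = []
    for label, children in forest:
        kids = _edit(children, target, new_label, counter)
        counter[0] += 1  # a node is numbered after all of its descendants
        if counter[0] != target:
            out.append((label, kids))
        elif new_label is None:
            out.extend(kids)  # deletion: the children take the node's place
        else:
            out.append((new_label, kids))
    return tuple(out)


def relabel(forest, i, new_label):
    return _edit(forest, i, new_label, [0])


def delete(forest, i):
    return _edit(forest, i, None, [0])


# Hypothetical forest: root a (node 4) with children f (node 2) and m (node 3);
# f has a single child h (node 1).
E = (("a", (("f", (("h", ()),)), ("m", ()))),)
print(relabel(E, 3, "y"))
print(delete(E, 2))
```

Deleting a node splices its children into the position it occupied, which is why deleting a root can turn one tree into several.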

The edit distance $\delta(E,F)$ between E and F is the minimum cost of a sequence of edit operations that transform E to E' and F to F' such that E' = F'.

[Figure: the example forests E and F.]
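For small inputs the definition can be evaluated directly. The sketch below is an illustration under stated assumptions, not one of the algorithms cited in the talk: it uses the `(label, children)` tuple representation, assumes unit costs (1 per deletion, 1 per relabeling of two different labels, 0 when the labels already agree), and recurses on the rightmost roots of both forests. The function names are hypothetical.

```python
from functools import lru_cache


def size(forest):
    """Number of nodes in a forest of (label, children) tuples."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_edit_distance(E, F):
    """Unit-cost edit distance between two ordered, labeled forests."""
    if not E or not F:
        return size(E) + size(F)  # delete whatever is left
    (lv, cv), (lw, cw) = E[-1], F[-1]  # rightmost roots and their children
    return min(
        forest_edit_distance(E[:-1] + cv, F) + 1,  # delete rightmost root of E
        forest_edit_distance(E, F[:-1] + cw) + 1,  # delete rightmost root of F
        forest_edit_distance(E[:-1], F[:-1])       # match the two roots
        + forest_edit_distance(cv, cw)
        + (lv != lw),                              # relabel cost, 0 or 1
    )


E = (("a", (("f", (("h", ()),)), ("m", ()))),)
F = (("a", (("h", ()), ("m", ()))),)
print(forest_edit_distance(E, F))
```

Memoizing on the forests themselves keeps the sketch short; the algorithms cited in the talk index sub-problems by post-order numbers instead.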

The guided edit distance $\delta(E,F,G)$ between E and F with respect to a third forest G is the minimum cost of a sequence of edit operations that transform E to E' and F to F' such that E' = F' and E' includes G as a subforest.

[Figure: the example forests E and F, the edited forests E' and F', and the guide forest G (nodes 1 to 3).]
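The condition "includes G as a subforest" can be checked on its own. The sketch below assumes that inclusion means G can be obtained from the forest by deleting nodes, with labels and left-to-right order preserved. The tuple representation and the name `includes` are hypothetical choices for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def includes(F, G):
    """True if G can be obtained from F by deleting nodes only."""
    if not G:
        return True
    if not F:
        return False
    (lv, cv), (lg, cg) = F[-1], G[-1]  # rightmost roots of F and G
    # Option 1: delete the rightmost root of F; its children move up.
    if includes(F[:-1] + cv, G):
        return True
    # Option 2: keep it; being last in post-order, it can only stand for
    # the rightmost root of G, and the labels must agree.
    return lv == lg and includes(cv, cg) and includes(F[:-1], G[:-1])


# Hypothetical forests for illustration.
big = (("a", (("h", ()), ("m", ()), ("z", (("e", ()),)))),)
print(includes(big, (("a", (("m", ()), ("e", ()))),)))
print(includes(big, (("a", (("e", ()), ("m", ()))),)))
```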


Application 1: RNA comparisons

Cherry small circular viroid-like RNA (GI:2347024) between base 287 and base 337. The hammerhead motif of the RNA is printed in bold.


Application 2: Comparing XML documents

XML documents that share the same Document Type Definition (DTD) should be aligned with respect to this DTD to get more accurate results.

The algorithms

Forest edit distance $\delta(E,F)$:

Tai 1979: $O(|E|\,|F|\,d(E)^2\,d(F)^2)$

Zhang and Shasha 1989: $O(|E|\,|F|\,\min\{d(E),L(E)\}\,\min\{d(F),L(F)\})$, where $d(X)$ is the depth of $X$ and $L(X)$ is its number of leaves

Klein 1998: $O(|E|^2\,|F|\log|F|)$

Guided forest edit distance $\delta(E,F,G)$:

This paper: $O(|E|\,|F|\,|G|\,\min\{d(E),L(E)\}\,\min\{d(F),L(F)\}\,L(G)^2)$
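The quantity $\min\{d(X), L(X)\}$ in these bounds is easy to compute for a concrete forest. The sketch below assumes the `(label, children)` tuple representation and counts depth in nodes, so a single node has depth 1; both are illustrative conventions rather than anything fixed by the slides.

```python
def depth(forest):
    """Depth counted in nodes: a single node has depth 1, an empty forest 0."""
    return max((1 + depth(children) for _, children in forest), default=0)


def leaves(forest):
    """Number of leaves in the forest."""
    return sum(leaves(children) if children else 1 for _, children in forest)


def min_depth_leaves(forest):
    return min(depth(forest), leaves(forest))


path = (("a", (("b", (("c", ()),)),)),)             # a chain: one leaf
star = (("a", (("b", ()), ("c", ()), ("d", ()))),)  # a root with three leaves
print(min_depth_leaves(path), min_depth_leaves(star))
```

A chain has a single leaf, so the factor is 1 however long the chain is, which is the situation of the sequence problems.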

Special Cases

[Figure: example label sequences over the letters a, b, c and f.]

Constrained Longest Common Subsequence

Constrained Sequence Alignment

The algorithms

Constrained Longest Common Subsequence, Tsai 2003: $O(|e|\,|f|\,|g|)$

Constrained Sequence Alignment, Chin et al.: $O(|e|\,|f|\,|g|)$

This paper: $O(|E|\,|F|\,|G|\,\min\{d(E),L(E)\}\,\min\{d(F),L(F)\}\,L(G)^2)$, where $d(X)$ is the depth of $X$ and $L(X)$ is its number of leaves.

Since G has one leaf, and the sequences E and F have one leaf each, the time becomes $O(|E|\,|F|\,|G|)$.
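For the sequence case, a cubic-time dynamic program is short enough to write out. The sketch below computes the length of a longest common subsequence of `e` and `f` that contains `g` as a subsequence, using a three-index table in the standard textbook style. It is an illustration with a hypothetical function name, not code from any of the cited papers.

```python
def constrained_lcs(e, f, g):
    """Length of a longest common subsequence of e and f that contains g
    as a subsequence, or None if no such common subsequence exists."""
    NEG = float("-inf")
    n, m, r = len(e), len(f), len(g)
    # L[k][i][j]: best length using e[:i] and f[:j] while covering g[:k].
    L = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(r + 1)]
    for k in range(r + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0 or j == 0:
                    L[k][i][j] = 0 if k == 0 else NEG
                    continue
                best = max(L[k][i - 1][j], L[k][i][j - 1])  # drop one symbol
                if e[i - 1] == f[j - 1]:
                    best = max(best, 1 + L[k][i - 1][j - 1])  # ordinary match
                    if k > 0 and e[i - 1] == g[k - 1]:
                        # the match also covers the next symbol of g
                        best = max(best, 1 + L[k - 1][i - 1][j - 1])
                L[k][i][j] = best
    result = L[r][n][m]
    return None if result == NEG else result


print(constrained_lcs("abcbdab", "bdcaba", "d"))
print(constrained_lcs("bca", "abc", "a"))
```

The table has $(|e|+1)(|f|+1)(|g|+1)$ entries and each is filled in constant time, which matches the $O(|e|\,|f|\,|g|)$ bound quoted for the sequence problems.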

Our algorithm for computing $\delta(E,F,G)$

Dynamic Programming

The sub-problems

Post-order numbering (naming) of the nodes

$E_{i..i'}$: a "consecutive" sub-forest, made up of the nodes numbered $i$ through $i'$

Example: $E_{4..21}$

[Figure: a forest with 23 nodes numbered 1 to 23 in post-order, with the sub-forest $E_{4..21}$ marked.]
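The numbering and the consecutive sub-forests can be made concrete with a few lines of code. The sketch below assumes the `(label, children)` tuple representation. The names `number_nodes` and `consecutive_subforest` are hypothetical, and the `leftmost` table (the number of the leftmost leaf below each node) is an assumed helper, not notation from the slides.

```python
def number_nodes(forest):
    """Number the nodes in post-order (from 1) and return three dicts keyed
    by node number: label, leftmost-leaf number, and parent (None for roots)."""
    labels, leftmost, parent = {}, {}, {}
    counter = [0]

    def visit(trees):
        roots = []
        for label, children in trees:
            first = counter[0] + 1  # first number handed out inside this subtree
            kids = visit(children)
            counter[0] += 1
            me = counter[0]
            labels[me], leftmost[me], parent[me] = label, first, None
            for kid in kids:
                parent[kid] = me
            roots.append(me)
        return roots

    visit(forest)
    return labels, leftmost, parent


def consecutive_subforest(parent, i, i2):
    """Nodes i..i2 with their parents; a node whose parent lies outside
    the range becomes a root (parent None)."""
    return {
        v: parent[v] if parent[v] is not None and parent[v] <= i2 else None
        for v in range(i, i2 + 1)
    }


forest = (("a", (("b", (("c", ()), ("d", ()))), ("e", ()))),)
labels, leftmost, parent = number_nodes(forest)
print(labels)
print(consecutive_subforest(parent, 2, 4))
```

In post-order the subtree rooted at a node occupies the consecutive numbers from its leftmost leaf up to the node itself, which is what makes ranges of numbers a convenient way to name sub-forests.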

Each sub-problem has the form $\delta(E_{i..i'}, F_{j..j'}, G_{k..k'})$.

Example: $\delta(E_{2..7}, F_{4..7}, G_{2..3})$

[Figure: example forests E, F and G with the sub-forests $E_{2..7}$, $F_{4..7}$ and $G_{2..3}$ marked.]

$\delta(E_{s(i_0)..i}, F_{s(j_0)..j}, G_{k..k'})$ is equal to the minimum of the following:

1. $\delta(E_{s(i_0)..i-1}, F_{s(j_0)..j}, G_{k..k'}) + \delta(i,-)$

2. $\delta(E_{s(i_0)..i}, F_{s(j_0)..j-1}, G_{k..k'}) + \delta(-,j)$

3. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..k'}) + \delta(E[i], F[j])$

4. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..s(p)-1}) + \delta(E_{s(i)..i-1}, F_{s(j)..j-1}, G_{s(p)..k'}) + \delta(i,j)$

5. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..s(k')-1}) + \delta(E_{s(i)..i-1}, F_{s(j)..j-1}, G_{s(k')..k'-1}) + \delta(i,j)$

[Figures: each of the five cases illustrated on the example forests E, F and G.]
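The case analysis can be mirrored by a small memoized recursion over whole forests. The sketch below is one possible reading, not the authors' algorithm, and makes several assumptions: forests are `(label, children)` tuples, costs are unit costs, and a matched pair may stand for a node of G only when the node of F carries that node's label, since a relabeled node of E takes the label from F. It covers deleting a rightmost root, matching the two rightmost roots while handing a suffix of G's trees to their children, and matching them as the rightmost root of G. The function names are hypothetical.

```python
from functools import lru_cache

INF = float("inf")


def size(forest):
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def guided_distance(E, F, G):
    """Unit-cost guided edit distance (assumed semantics, see lead-in)."""
    if not E or not F:
        # Nothing can be matched any more, so G must already be empty.
        return INF if G else size(E) + size(F)
    (lv, cv), (lw, cw) = E[-1], F[-1]  # rightmost roots and their children
    rel = 0 if lv == lw else 1         # cost of relabeling lv to lw
    best = min(
        guided_distance(E[:-1] + cv, F, G) + 1,  # delete rightmost root of E
        guided_distance(E, F[:-1] + cw, G) + 1,  # delete rightmost root of F
    )
    # Match the two roots without using a node of G: the last trees of G
    # (possibly none) must then be found below the matched pair.
    for m in range(len(G) + 1):
        best = min(best,
                   guided_distance(E[:-1], F[:-1], G[:m])
                   + guided_distance(cv, cw, G[m:]) + rel)
    # Match the two roots as the rightmost root of G (labels must agree).
    if G and G[-1][0] == lw:
        best = min(best,
                   guided_distance(E[:-1], F[:-1], G[:-1])
                   + guided_distance(cv, cw, G[-1][1]) + rel)
    return best


E = (("a", ()),)
F = (("b", ()), ("a", ()))
print(guided_distance(E, F, ()))            # unguided: delete b
print(guided_distance(E, F, (("b", ()),)))  # guide forces b to be kept
```

With an empty guide the recursion reduces to the ordinary forest edit distance; a guide that cannot be embedded yields infinity.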

The order for solving the sub-problems

for i=1 to |E|
    for j=1 to |F|
        for h=1 to |G|
            for k=1 to (|G|-h+1)
                if k is a leaf then find $\delta(E_{1..i}, F_{1..j}, G_{k..(k+h-1)})$

The time complexity

$O(|E|^2\,|F|^2\,|G|\,L(G)^2)$


Sparsify the dynamic program

using a clever trick of Zhang and Shasha

Key-root: a node is a key-root if it is the root or has a left sibling.

[Figure: the example forests E, F and G with their key-roots marked.]

No. of key-roots ≤ no. of leaves
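The key-roots of a forest can be collected in one post-order pass. The sketch below assumes the `(label, children)` tuple representation and treats every tree root in the forest as a key-root, which is an assumption about how the rule extends from a single tree to a forest. The function name is hypothetical.

```python
def key_roots_and_leaves(forest):
    """Return (key_roots, leaves) as lists of post-order numbers."""
    keys, leaves, counter = [], [], [0]

    def visit(trees, top):
        for pos, (label, children) in enumerate(trees):
            visit(children, False)
            counter[0] += 1  # post-order number of the current node
            if not children:
                leaves.append(counter[0])
            if top or pos > 0:  # a root, or a node with a left sibling
                keys.append(counter[0])

    visit(forest, True)
    return keys, leaves


forest = (("a", (("b", (("c", ()), ("d", ()))), ("e", ()))),)
keys, leaves = key_roots_and_leaves(forest)
print(keys, leaves)
```

Each key-root sits at the top of a chain of first children that ends in a distinct leaf, which is why the number of key-roots cannot exceed the number of leaves.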

To compute $\delta(E,F,G) = \delta(E_{1..|E|}, F_{1..|F|}, G_{1..|G|})$:

for i=1 to |E|
    for j=1 to |F|
        for h=1 to |G|
            for k=1 to (|G|-h+1)
                if k is a leaf and i and j are key-roots
                    find $\delta(E_{1..i}, F_{1..j}, G_{k..(k+h-1)})$
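The effect of the extra key-root test can be seen by simply counting how often the innermost step runs. The sketch below transcribes the loop literally. The generator name, the sizes, and the leaf and key-root sets are hypothetical inputs chosen for illustration.

```python
def subproblems(n_e, n_f, n_g, g_leaves, e_keys=None, f_keys=None):
    """Yield (i, j, k, k + h - 1) in the loop order above. When e_keys or
    f_keys is None, the corresponding key-root test is skipped."""
    for i in range(1, n_e + 1):
        for j in range(1, n_f + 1):
            for h in range(1, n_g + 1):
                for k in range(1, n_g - h + 2):
                    if k not in g_leaves:
                        continue
                    if e_keys is not None and i not in e_keys:
                        continue
                    if f_keys is not None and j not in f_keys:
                        continue
                    yield (i, j, k, k + h - 1)


# Hypothetical sizes and node sets.
plain = list(subproblems(5, 5, 3, g_leaves={1, 2}))
sparse = list(subproblems(5, 5, 3, g_leaves={1, 2},
                          e_keys={2, 4, 5}, f_keys={2, 4, 5}))
print(len(plain), len(sparse))
```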

The new running time

$O(|E|\,|F|\,|G|\,\min\{d(E),L(E)\}\,\min\{d(F),L(F)\}\,L(G)^2)$


Thank you

