Date post: 27-Mar-2015
On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques

Presenter: Lu Jiaheng

Supervisor: Prof. Ling Tok Wang

Joint work: Chen Ting, Ling Tok Wang

On Boosting Holism in XML Twig Pattern Matching using Structural Indexing

Outline

Background
  Define our problem: XML twig pattern matching
  Previous two algorithms: TwigStack and TwigStackList

Our holistic twig pattern matching algorithms
  Two refined indexing schemes: Tag+Level and PPS
  A generalized holistic matching algorithm: iTwigJoin

Experiments

Conclusion

XML Twig Pattern Matching

An XML document is commonly modeled as a rooted, ordered and tagged tree.

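(Figure: example document tree. The root book has children preface, chapter, chapter. The preface holds the text "Intro". The first chapter holds a section containing a title ("Data"), nested sections with title ("XML"), paragraph and figure elements, and a further paragraph with a figure.)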

Regional Coding

Node label [1]: (startPos: endPos, LevelNum)

E.g.

book (0:32, 1)
  preface (1:3, 2)
    "Intro" (2:2, 3)
  chapter (4:29, 2)
    section (5:28, 3)
      title (6:8, 4)
        "Data" (7:7, 5)
      section (9:17, 4)
        title (10:12, 5)
          "XML" (11:11, 6)
        paragraph (13:16, 5)
          figure (14:15, 6)
      section (18:23, 4)
        paragraph (19:22, 5)
          figure (20:21, 6)
      paragraph (24:27, 4)
        figure (25:26, 5)
  chapter (30:31, 2)

1. M.P. Consens and T. Milo. Optimizing queries on files. In Proceedings of ACM SIGMOD, 1994.
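The region labels above make structural relationships checkable by simple comparisons. The sketch below is an illustration only: the `Label` type and the helper names `is_ancestor` and `is_parent` are assumptions introduced here, not names from the slides. It assumes an ancestor's region strictly encloses the descendant's region, and that a parent is an ancestor exactly one level above.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Region label (startPos: endPos, LevelNum); the type name is illustrative."""
    start: int
    end: int
    level: int


def is_ancestor(a: Label, d: Label) -> bool:
    # a is an ancestor of d when a's region strictly encloses d's region.
    return a.start < d.start and d.end < a.end


def is_parent(p: Label, c: Label) -> bool:
    # A parent is an ancestor that sits exactly one level above the child.
    return is_ancestor(p, c) and c.level == p.level + 1


# Labels copied from the example document tree.
section = Label(5, 28, 3)
paragraph = Label(13, 16, 5)
figure = Label(14, 15, 6)

print(is_ancestor(section, figure))  # section encloses figure
print(is_parent(paragraph, figure))  # figure is one level below paragraph
print(is_parent(section, figure))    # enclosed, but three levels apart
```

With these two predicates, an A-D edge of a twig pattern corresponds to `is_ancestor` and a P-C edge to `is_parent`.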

What is a Twig Pattern?

A twig pattern is a small tree whose nodes are tags, attributes or text values, and whose edges are either Parent-Child (P-C) edges or Ancestor-Descendant (A-D) edges.

E.g. select Figure elements that are descendants of Paragraph elements, which in turn are children of Section elements having a child element Title.

XPath: Section[Title]/Paragraph//Figure

Twig pattern:

Section
  Title (P-C edge)
  Paragraph (P-C edge)
    Figure (A-D edge)

XML Twig Pattern Matching: Problem Statement

Given a query twig pattern Q and an XML database D, we need to compute ALL the answers to Q in D.

E.g. consider the following query and document:

Document (figure): a tree with elements s1, s2, t1, t2, p1 and f1.

Query (figure): Section with branches title and figure.

Query solutions: (s1, t1, f1), (s2, t2, f1), (s1, t2, f1)

Previous work: TwigStack

TwigStack [2]: a holistic approach

Each element in the document is labeled with the region encoding scheme.

The input is the labels of all elements whose tags occur in the query twig. The output is the set of matching solutions, each an n-tuple, where n is the number of nodes in the query.

For each node in the query, there is a corresponding input stream.

Each label in a stream is scanned only once; that is, the cursor of a stream never moves backward.

2. N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins: optimal XML pattern matching. In Proceedings of ACM SIGMOD, 2002.

Previous work: TwigStack

TwigStack [2] is a holistic, two-phase algorithm:

Phase 1 (TwigJoin): intermediate root-to-leaf path solutions are output.

Phase 2 (Merge): the intermediate paths are merged to obtain the final results.

Previous work: TwigStack

Each node q in a twig pattern Q is associated with a stack Sq.

Insertion into and deletion from a stack Sq:

Insertion: an element eq from stream Tq is pushed onto its stack Sq if and only if
  (1) eq has a descendant eqi in each stream Tqi, where qi is a child of q, and
  (2) each such element eqi recursively satisfies property (1).

Deletion: an element eq is popped from its stack once all matches involving it have been output.
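The insertion condition can be sketched as a recursive check. This is a simplified illustration, not the TwigStack routine itself: TwigStack inspects only the current head of each stream, whereas this sketch scans whole streams for clarity. The function names and the dictionary layout are assumptions; the labels are taken from the worked example later in the deck.

```python
def contains(anc, desc):
    """Region containment: anc = (start, end, level) strictly encloses desc."""
    return anc[0] < desc[0] and desc[1] < anc[1]


def has_descendant_extension(elem, node, children, streams):
    """True if elem has, for every child query node, a descendant in that
    child's stream which itself satisfies the same property recursively."""
    for child in children.get(node, []):
        if not any(
            contains(elem, cand)
            and has_descendant_extension(cand, child, children, streams)
            for cand in streams[child]
        ):
            return False
    return True


# Query: Section with branches title and figure (all edges treated as A-D).
children = {"Section": ["title", "figure"]}
streams = {
    "Section": [(1, 12, 1), (4, 9, 2)],
    "title": [(2, 3, 2), (5, 6, 3)],
    "figure": [(7, 8, 3), (10, 11, 2)],
}

for label in streams["Section"]:
    print(label, has_descendant_extension(label, "Section", children, streams))
```

Because the check only asks for descendants, it cannot tell a child from a deeper descendant, which is the root of the sub-optimality for parent-child edges.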

XML Twig Pattern Matching: a TwigStack example

Document (region labels):

s1 (1:12, 1)
  t1 (2:3, 2)
  s2 (4:9, 2)
    t2 (5:6, 3)
    f1 (7:8, 3)
  f2 (10:11, 2)

Query: Section with branches title and figure (ancestor-descendant edges).

Input streams:

Section: (1:12,1), (4:9,2)
title: (2:3,2), (5:6,3)
figure: (7:8,3), (10:11,2)

Phase 1, step by step:

1. Push s1 (1:12,1) onto the Section stack.
2. Push t1 (2:3,2) onto the title stack. Output path solution <s1,t1>.
3. Push s2 (4:9,2) onto the Section stack.
4. Push t2 (5:6,3) onto the title stack. Output path solutions <s1,t2>, <s2,t2>.
5. Push f1 (7:8,3) onto the figure stack. Output path solutions <s1,f1>, <s2,f1>.
6. Push f2 (10:11,2) onto the figure stack; s2 is no longer on the Section stack. Output path solution <s1,f2>.

Output path solutions:

<s1,t1>, <s1,t2>, <s2,t2>,
<s1,f1>, <s2,f1>, <s1,f2>

Merge:

<s1,t1,f1>, <s1,t1,f2>, <s1,t2,f1>, <s1,t2,f2>, <s2,t2,f1>
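The merge phase for this two-branch query can be sketched as a join of the two lists of path solutions on their shared Section element. The function name `merge_paths` is an assumption made for illustration. A general implementation would join one path list per leaf of the query.

```python
def merge_paths(title_paths, figure_paths):
    """Join (section, title) and (section, figure) path solutions on the section."""
    figures_by_section = {}
    for section, figure in figure_paths:
        figures_by_section.setdefault(section, []).append(figure)

    merged = []
    for section, title in title_paths:
        for figure in figures_by_section.get(section, []):
            merged.append((section, title, figure))
    return merged


# Path solutions output in phase 1 of the example.
title_paths = [("s1", "t1"), ("s1", "t2"), ("s2", "t2")]
figure_paths = [("s1", "f1"), ("s2", "f1"), ("s1", "f2")]

result = merge_paths(title_paths, figure_paths)
for solution in result:
    print(solution)
```

Every path solution here takes part in at least one merged tuple, which is what optimality means for an all ancestor-descendant query.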

Sub-optimality of TwigStack

If the query contains any parent-child relationship, TwigStack may output intermediate path solutions that cannot contribute to any final result.

We say that TwigStack is sub-optimal for queries with parent-child relationships.

Example: sub-optimality of TwigStack

Document (region labels):

s1 (1:12, 1)
  t1 (2:3, 2)
  s2 (4:9, 2)
    t2 (5:6, 3)
    f1 (7:8, 3)

Query: Section with branches title and figure; the Section-figure edge is parent-child.

Input streams:

Section: (1:12,1), (4:9,2)
title: (2:3,2), (5:6,3)
figure: (7:8,3)

1. Because f1 and t1 are descendants of s1, s1 (1:12,1) is pushed onto the stack. Note that f1 is not a child of s1.
2. t1 (2:3,2) is pushed onto the title stack.
3. Output solution: <s1,t1>.

But <s1,t1> is a useless intermediate solution: it does not contribute to any final solution.
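A brute-force check over the example document shows why <s1,t1> is useless. This is only a verification sketch, not part of TwigStack. The helper names are assumptions, and so is the edge type of the Section-title edge, which is taken here to be ancestor-descendant.

```python
from itertools import product

# Region labels (start, end, level) from the example document.
doc = {
    "s1": (1, 12, 1), "t1": (2, 3, 2),
    "s2": (4, 9, 2), "t2": (5, 6, 3), "f1": (7, 8, 3),
}
sections, titles, figures = ["s1", "s2"], ["t1", "t2"], ["f1"]


def related(anc, desc, edge):
    """edge is "AD" (ancestor-descendant) or "PC" (parent-child)."""
    if not (anc[0] < desc[0] and desc[1] < anc[1]):
        return False
    return edge == "AD" or desc[2] == anc[2] + 1


def twig_matches(title_edge, figure_edge):
    # Enumerate every (section, title, figure) triple and keep the matches.
    return [
        (s, t, f)
        for s, t, f in product(sections, titles, figures)
        if related(doc[s], doc[t], title_edge)
        and related(doc[s], doc[f], figure_edge)
    ]


print(twig_matches("AD", "PC"))  # final answers with a parent-child figure edge
print(twig_matches("AD", "AD"))  # answers if every edge were ancestor-descendant
```

With the parent-child figure edge no answer contains s1, so the path solution <s1,t1> produced in phase 1 is never used in the merge.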

TwigStackList

The main problem of TwigStack is that it treats all edges as ancestor-descendant relationships in the first phase, so it is not efficient for queries with parent-child relationships.

Alternative: TwigStackList [3] [CIKM 2004]

TwigStackList [3] improves on TwigStack: it considers parent-child relationships in the first phase and is optimal for a larger class of queries than TwigStack.

3. J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML twig patterns with parent child edges: a look-ahead approach. In CIKM, pages 533-542, 2004.

Optimal class of TwigStack and TwigStackList

Optimal query class:

TwigStack: all edges are ancestor-descendant relationships.

TwigStackList: all edges connecting branching nodes and their children are ancestor-descendant relationships.

Three example queries (O: optimal, S: sub-optimal):

                Query 1   Query 2   Query 3
TwigStack          O         S         S
TwigStackList      O         O         S

(Figure: the three example twig queries; the first two are over nodes a, b, c, the third over a, b, c, d.)

Challenges (1)

Although TwigStackList enlarges the optimal query class of TwigStack, it is still sub-optimal for a large class of twig queries.

For example, two twig queries for which TwigStackList is sub-optimal:

(Figure: two twig queries, each with root Section and branches title and figure.)

Challenges (2)

To answer a twig query, TwigStack and TwigStackList need to read the labels of all elements whose tags occur in the query.

Can we accelerate query processing by reading only part of them?

Query (figure): Section with branches title and figure.

Document (figure): a three-level tree with s1, t1, and figure elements f1, f2, ..., fn.

There is no answer in the document, since there are no figure elements at level 2. But previous algorithms still need to read all figure elements at level 3.

Our solution

We propose two data streaming schemes: tag+level and prefix path streaming.

Basic idea: separate the elements with the same tag name into different streams.

Tag+level: elements with the same tag and level are grouped together.

Prefix path: elements with the same root-to-node path are grouped together.

Two Refined Streaming Schemes (1)

Tag+Level: elements with the same tag and level are grouped together.

Document:

a1                (level 1)
  a2              (level 2)
    d1            (level 3)
  a3              (level 2)
    d2            (level 3)
    b1            (level 3)
      c1          (level 4)
  b2              (level 2)
    d3            (level 3)
      c2          (level 4)

Tag+Level streams:

a, Level 1: a1
a, Level 2: a2, a3
b, Level 2: b2
b, Level 3: b1
c, Level 4: c1, c2
d, Level 3: d1, d2, d3

Two Refined Streaming Schemes (2)

Prefix Path Streaming (PPS): elements with the same root-to-node path are grouped together.

Document:

a1                (level 1)
  a2              (level 2)
    d1            (level 3)
  a3              (level 2)
    d2            (level 3)
    b1            (level 3)
      c1          (level 4)
  b2              (level 2)
    d3            (level 3)
      c2          (level 4)

PPS streams:

a: a1
a/a: a2, a3
a/b: b2
a/a/b: b1
a/a/d: d1, d2
a/b/d: d3
a/a/b/c: c1
a/b/d/c: c2
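Both partitioning schemes can be sketched in a few lines over region-labelled elements. This is an illustrative sketch, not the authors' implementation: the function names are assumptions, and the region labels are the ones shown for this document in the iTwigJoin example later in the deck.

```python
from collections import defaultdict

# (name, tag, start, end, level)
elements = [
    ("a1", "a", 1, 20, 1), ("a2", "a", 2, 5, 2), ("d1", "d", 3, 4, 3),
    ("a3", "a", 6, 13, 2), ("d2", "d", 7, 8, 3), ("b1", "b", 9, 12, 3),
    ("c1", "c", 10, 11, 4), ("b2", "b", 14, 19, 2), ("d3", "d", 15, 18, 3),
    ("c2", "c", 16, 17, 4),
]


def tag_level_streams(elems):
    """Group elements by (tag, level), in document order."""
    streams = defaultdict(list)
    for name, tag, start, end, level in sorted(elems, key=lambda e: e[2]):
        streams[(tag, level)].append(name)
    return dict(streams)


def prefix_path_streams(elems):
    """Group elements by root-to-node tag path, in document order."""
    streams = defaultdict(list)
    open_ancestors = []  # (tag, end) of elements whose region is still open
    for name, tag, start, end, level in sorted(elems, key=lambda e: e[2]):
        # Close every element that ends before this one starts.
        while open_ancestors and open_ancestors[-1][1] < start:
            open_ancestors.pop()
        path = "/".join([t for t, _ in open_ancestors] + [tag])
        streams[path].append(name)
        open_ancestors.append((tag, end))
    return dict(streams)


for key, names in tag_level_streams(elements).items():
    print(key, names)
for path, names in prefix_path_streams(elements).items():
    print(path, names)
```

PPS refines tag+level: elements on the same prefix path always share a tag and a level, so each PPS stream is a subset of one tag+level stream.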

Two benefits of refined streaming schemes (1)

(1) Enlarge the optimal query classes

For example, for the document and query below, the previous algorithms TwigStack and TwigStackList output one useless solution <s1,t1>.

But with tag+level streaming, <s1,t1> is not output, since we know there are no figure elements at level 2.

Document:

s1                (level 1)
  t1              (level 2)
  s2              (level 2)
    t2            (level 3)
    f1            (level 3)

Query (figure): Section with branches title and figure.

Tag+Level streams (figure): Section at levels 1 and 2 (s1, s2), title at levels 2 and 3 (t1, t2), figure at level 3 (f1).

Two benefits of refined streaming schemes (2)

(2) Skip irrelevant elements

For the document and query below, since there are no title elements at level 3, we may skip reading all figure elements at level 3.

Document (figure): a three-level tree with s1, t1, and figure elements f1, f2, ..., fn.

Query (figure): Section with branches title and figure.

A general algorithm: iTwigJoin

We propose a general algorithm, called iTwigJoin, which can be used with various data streaming schemes.

Our key idea is to classify all current head elements into three classes:

Subtree-matching
Useless
Blocked

Classifying Head Elements

Subtree-matching element: an element e of tag E is called a subtree-matching element for query Q if e is in a match to QE (QE is the sub-tree of Q rooted at E) and is NOT in any future match to QP, where P is the parent of E in Q.

Useless element: an element e is called a useless element if e is not in any future match to QE.

Blocked element: an element that is neither subtree-matching nor useless.

Example 1: Classifying Head Elements (Tag+Level)

Document D:

a1
  a2
    d1
  a3
    d2
    b1
      c1
  b2
    d3
      c2

Query Q1 (figure): root A with branches D and B, and C below B.

Tag+Level streams (the head element is the first unread element of each stream):

a, Level 1: a1
a, Level 2: a2, a3
b, Level 2: b2
b, Level 3: b1
c, Level 4: c1, c2
d, Level 3: d1, d2, d3

Classification of the head elements:

Subtree-matching: (none)
Useless: a2
Blocked: d1, a1, b1, b2, c1

Example 2: Classifying Head Elements (Tag+Level)

Document D and the Tag+Level streams are as in Example 1; the query (figure) again has root A with branches D and B, and C below B.

Classification of the head elements:

Subtree-matching: d1
Useless: a1, a2, b2
Blocked: c1, b1

Classifying Head Elements

• A useless element can be discarded safely.

• A subtree-matching element is pushed onto the corresponding stack.

• A blocked element causes problems:

• It CANNOT be discarded, because that may cause loss of results.

• It CANNOT be pushed onto a stack, because that may cause useless results.

• When all head elements are blocked, optimal holistic matching CANNOT be guaranteed.

• We push blocked elements onto the stack, which may result in useless intermediate results in some cases.
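The handling rules above can be sketched as a small decision function. This is an assumption-laden illustration, not the iTwigJoin procedure itself: the classification of each head element is supplied as input rather than computed, the names `Kind` and `next_action` are hypothetical, and the choice to act on the element with the smallest start position first is an assumption.

```python
from enum import Enum


class Kind(Enum):
    SUBTREE_MATCHING = "subtree-matching"
    USELESS = "useless"
    BLOCKED = "blocked"


def next_action(heads):
    """heads maps element name -> (start position, Kind)."""

    def earliest(kind):
        names = [n for n, (_, k) in heads.items() if k is kind]
        return min(names, key=lambda n: heads[n][0]) if names else None

    name = earliest(Kind.USELESS)
    if name is not None:
        return ("discard", name)  # safe: it is in no future match
    name = earliest(Kind.SUBTREE_MATCHING)
    if name is not None:
        return ("push", name)
    # Every head is blocked: push one anyway, accepting that this
    # may produce useless intermediate results.
    return ("push blocked", earliest(Kind.BLOCKED))


# Head elements and classes from Example 1 (start positions from the region labels).
heads = {
    "a1": (1, Kind.BLOCKED), "a2": (2, Kind.USELESS), "d1": (3, Kind.BLOCKED),
    "b1": (9, Kind.BLOCKED), "c1": (10, Kind.BLOCKED), "b2": (14, Kind.BLOCKED),
}
print(next_action(heads))

# After a2 is discarded, a3 becomes the head of its stream and is blocked.
del heads["a2"]
heads["a3"] = (6, Kind.BLOCKED)
print(next_action(heads))
```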

Page 54: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

On Boosting Holism in XML Twig Pattern Matching using Structural Indexing

54

An example of iTwigJoin algorithmDocument:

Query: A

D B

C

a1

a2 a3 b2

d2 b1

c2

d3

c1

d1

a1

Level2:

Level1:

a2 , a3

b2

Level3:

Level2:

b1

C1, C2Level4:

Level3: d1 ,d2,d3

a

b

c

d

Page 55: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

On Boosting Holism in XML Twig Pattern Matching using Structural Indexing

55

An example of iTwigJoin algorithmDocument:

Query: A

D B

C

a1

a2 a3 b2

d2 b1

c2

d3

c1

d1

1:20,1

2:5,2

3:4,3

6:13,2

7:8,3

9:12,3

10:11,4

14:19,2

15:18,3

16:17,4

a1

Level2:

Level1:

a2 , a3

b2

Level3:

Level2:

b1

C1, C2Level4:

Level3: d1 ,d2,d3 b

dca

Page 56: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

On Boosting Holism in XML Twig Pattern Matching using Structural Indexing

56

An example of iTwigJoin algorithmDocument:

Query: A

D B

C

a1

a2 a3 b2

d2 b1

c2

d3

c1

d1

1:20,1

2:5,2

3:4,3

6:13,2

7:8,3

9:12,3

10:11,4

14:19,2

15:18,3

16:17,4

a1

Level2:

Level1:

a2 , a3

b2

Level3:

Level2:

b1

C1, C2Level4:

Level3: d1 ,d2,d3 b

dca Since a2 is a useless

element, we discard a2 and scan a3.

Page 57: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

On Boosting Holism in XML Twig Pattern Matching using Structural Indexing

57

An example of iTwigJoin algorithmDocument:

Query: A

D B

C

a1

a2 a3 b2

d2 b1

c2

d3

c1

d1

1:20,1

2:5,2

3:4,3

6:13,2

7:8,3

9:12,3

10:11,4

14:19,2

15:18,3

16:17,4

a1

Level2:

Level1:

a2 , a3

b2

Level3:

Level2:

b1

C1, C2Level4:

Level3: d1 ,d2,d3 b

dca

Now all elements are blocked. We push a1 to stack.

a1

Page 58: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1
Stack d: d1

Page 59: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1

Output intermediate path solutions: <a1,d1>

Page 60: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Since a3 is a sub-tree matching element, we push a3 onto the stack.

Stack a: a1, a3

Output intermediate path solutions: <a1,d1>

Page 61: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1, a3
Stack d: d2

Output intermediate path solutions: <a1,d1>

Page 62: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1, a3

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a3,d2>
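The two new path solutions at this step come from pairing d2 with every element on stack a that encloses it. A minimal Python sketch of that step follows, together with the stack clean-up that later removes a3 before b2 is processed. It assumes the A-D edge is ancestor-descendant (inferred from the listed solutions); `emit_paths` and `clean_stack` are hypothetical helper names, not the paper's procedures.

```python
# Region codes (start, end) for the elements involved, read off the figure.
REGION = {"a1": (1, 20), "a3": (6, 13), "d2": (7, 8), "b2": (14, 19)}

def contains(anc, desc):
    """True if anc's region strictly encloses desc's region."""
    (s1, e1), (s2, e2) = REGION[anc], REGION[desc]
    return s1 < s2 and e2 < e1

def emit_paths(stack, leaf):
    """Pair a scanned leaf with every stacked element that encloses it."""
    return [(anc, leaf) for anc in stack if contains(anc, leaf)]

def clean_stack(stack, incoming):
    """Pop elements that end before the incoming element starts."""
    while stack and REGION[stack[-1]][1] < REGION[incoming][0]:
        stack.pop()

stack_a = ["a1", "a3"]            # bottom to top
paths = emit_paths(stack_a, "d2")
print(paths)                      # [('a1', 'd2'), ('a3', 'd2')]

clean_stack(stack_a, "b2")        # a3 ends at 13, before b2 starts at 14
print(stack_a)                    # ['a1']
```

Because every element on a stack encloses the ones above it, a single leaf can yield one path solution per stacked ancestor without rescanning the streams.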

Page 63: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1, a3
Stack b: b1

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a3,d2>

Page 64: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1, a3
Stack b: b1
Stack c: c1

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a3,d2>

Page 65: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1, a3
Stack b: b1

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a3,d2>, <a3,b1,c1>

Page 66: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1
Stack b: b2

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a3,d2>, <a3,b1,c1>

Page 67: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1
Stack b: b2
Stack c: c2

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a3,d2>, <a3,b1,c1>

Page 68: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1
Stack b: b2

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a3,d2>, <a3,b1,c1>, <a1,b2,c2>

Page 69: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1
Stack b: b2
Stack d: d3

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a3,d2>, <a3,b1,c1>, <a1,b2,c2>

Page 70: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1
Stack b: b2

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a1,d3>, <a3,d2>, <a3,b1,c1>, <a1,b2,c2>

Page 71: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1
Stack b: b2

The 1st final solution: <a1,d1,b2,c2>

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a1,d3>, <a3,d2>, <a3,b1,c1>, <a1,b2,c2>

Page 72: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1, a3
Stack b: b2

The 2nd final solution: <a1,d2,b2,c2>

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a1,d3>, <a3,d2>, <a3,b1,c1>, <a1,b2,c2>

Page 73: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1, a3
Stack b: b2

The 3rd final solution: <a1,d3,b2,c2>

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a1,d3>, <a3,d2>, <a3,b1,c1>, <a1,b2,c2>

Page 74: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

An example of the iTwigJoin algorithm

[Figure: same query, document tree, region codes, and Tag+Level streams as on Page 55.]

Stack a: a1, a3
Stack b: b2

The 4th final solution: <a3,d2,b1,c1>

Output intermediate path solutions: <a1,d1>, <a1,d2>, <a1,d3>, <a3,d2>, <a3,b1,c1>, <a1,b2,c2>
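The four final solutions are obtained by joining the two lists of intermediate path solutions on their shared root element. A minimal Python sketch of that merge step is shown below; `merge_paths` is a hypothetical helper using a simple hash join, which is an illustrative choice and not necessarily how the authors implement the merge.

```python
from collections import defaultdict

# Intermediate path solutions from the slides, split by query path.
ad_paths = [("a1", "d1"), ("a1", "d2"), ("a1", "d3"), ("a3", "d2")]   # path A-D
abc_paths = [("a3", "b1", "c1"), ("a1", "b2", "c2")]                  # path A-B-C

def merge_paths(ad_paths, abc_paths):
    """Join the two path lists on the shared root element a."""
    by_root = defaultdict(list)
    for a, b, c in abc_paths:
        by_root[a].append((b, c))
    return [
        (a, d, b, c)
        for a, d in ad_paths
        for b, c in by_root.get(a, [])
    ]

solutions = merge_paths(ad_paths, abc_paths)
for i, s in enumerate(solutions, start=1):
    print(i, s)
```

The join yields three solutions rooted at a1 and one rooted at a3, in the same order as the slides list them.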

Page 75: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Optimal classes of iTwigJoin for three streaming schemes

Streaming scheme        Optimal class
Tag Streaming           A-D only pattern

[Figure: example A-D only query, A with children B and C.]

Page 76: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Optimal classes of iTwigJoin for three streaming schemes

Streaming scheme        Optimal class
Tag Streaming           A-D only pattern
Tag+Level Streaming     A-D/P-C only pattern

[Figures: one example query per class (A with children B and C), labelled A-D only and A-D/P-C only.]

Page 77: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Optimal classes of iTwigJoin for three streaming schemes

Streaming scheme        Optimal class
Tag Streaming           A-D only pattern
Tag+Level Streaming     A-D/P-C only pattern
Prefix Path Streaming   A-D/P-C only or 1-Branch node

[Figures: one example query per class (A with children B and C), labelled A-D only, A-D/P-C only, and A-D/P-C only or 1-Branch.]

Page 78: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Optimal classes of iTwigJoin for three streaming schemes

Streaming scheme        Optimal class
Tag Streaming           A-D only pattern
Tag+Level Streaming     A-D/P-C only pattern
Prefix Path Streaming   A-D/P-C only or 1-Branch node

[Figures: one example query per class (A with children B and C), labelled A-D only, A-D/P-C only, and A-D/P-C only or 1-Branch node.]

The more refined the streaming scheme, the larger the optimal class.

Page 79: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Outline

Background
  Define our problem: XML twig pattern matching
  Previous two algorithms: TwigStack and TwigStackList
Our holistic Twig Pattern Matching algorithms
  Two Refined Indexing Schemes: Tag+Level and PPS
  A generalized holistic matching algorithm: iTwigJoin
Experiments
Conclusion

Page 80: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Experiments

Benchmarks
  XMark: Synthetic Data
  Treebank: Real Data from Wall Street Journal

Page 81: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Experiments: I/O Performance

[Figure: chart of Elements Scanned (axis 0 to 14,000,000) for queries Tree1 to Tree5, comparing TwigStack, TwigStackList, Tag+Level, and Prefix.]

Tree1: A-D only
Tree2: P-C only
Tree3: P-C only
Tree4: 1-branchnode
Tree5: 1-branchnode

By pruning irrelevant streams, PPS usually scans the fewest elements.

Page 82: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Experiments: Number of Intermediate Paths

[Figure: chart of Intermediate Paths Output (axis ticks at powers of ten, 1 to 100000) for queries Tree1 to Tree5, comparing TwigStack, TwigStackList, Tag+Level, and Prefix.]

Tree1: A-D only
Tree2: P-C only
Tree3: P-C only
Tree4: 1-branchnode
Tree5: 1-branchnode

1. Tag+Level and PPS output fewer intermediate results than TwigStack and TwigStackList on the TreeBank data.

2. For Tree5 there are no matching results, so Tag+Level and PPS do not output any intermediate results.

Page 83: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Experiments: Running Time

[Figure: chart of Execution Time in seconds (axis 0 to 14) for queries XMark1 to XMark5, comparing TwigStack, TwigStackList, Tag+Level, and Prefix.]

XMark1: Path Pattern
XMark2: A-D only
XMark3: P-C only
XMark4: 1-branchnode
XMark5: Non-optimal

Tag+Level and PPS perform better than TwigStack and TwigStackList on the XMark data.

Page 84: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Outline

Background
  Define our problem: XML twig pattern matching
  Previous two algorithms: TwigStack and TwigStackList
Our holistic Twig Pattern Matching algorithms
  Two Refined Indexing Schemes: Tag+Level and PPS
  A generalized holistic matching algorithm: iTwigJoin
Experiments
Conclusion

Page 85: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.

Conclusions

We develop a general algorithm to perform holistic twig joins on the Tag+Level and PPS streaming schemes.

We identify two I/O optimal classes for the Tag+Level and PPS streaming schemes.

Since our experiments show that the Tag+Level streaming scheme produces very few useless intermediate results in most cases, we recommend using the Tag+Level scheme for efficient XML twig pattern matching.

Page 86: On Boosting Holism in XML Twig Pattern Matching Using Two Data Streaming Techniques Presenter: Lu Jiaheng Supervisor: Prof. Ling Tok Wang Joint work: Chen.


END

Thank you!

Q & A

