
On the Memory Requirements of XPath Evaluation over XML Streams

Date post: 21-Jan-2016
Description: On the Memory Requirements of XPath Evaluation over XML Streams. Ziv Bar-Yossef, Marcus Fontoura, Vanja Josifovski, IBM Almaden Research Center. PowerPoint PPT presentation.
Transcript
Page 1: On the Memory Requirements of XPath Evaluation over XML Streams

On the Memory Requirements of XPath Evaluation over XML Streams

Ziv Bar-Yossef, Marcus Fontoura, Vanja Josifovski

IBM Almaden Research Center

Page 2: On the Memory Requirements of XPath Evaluation over XML Streams

Preliminaries: XML

<conference>
  <name> PODS </name>
  <speaker> <name> Josifovski </name> <paper_cnt> 1 </paper_cnt> </speaker>
  <speaker> <name> Fagin </name> <paper_cnt> 3 </paper_cnt> </speaker>
</conference>

Document tree (node ids x0 to x8):

root (x0)
  conference (x1)
    name (x2): PODS
    speaker (x3)
      name (x4): Josifovski
      paper_cnt (x5): 1
    speaker (x6)
      name (x7): Fagin
      paper_cnt (x8): 3

Page 3: On the Memory Requirements of XPath Evaluation over XML Streams

Preliminaries: XPath 1.0

/conference[name = PODS]/speaker[paper_cnt > 1]/name

[Figure: query tree next to the document tree.
Query: root → conference, with a predicate child name (= PODS) and a child speaker; speaker has a predicate child paper_cnt (> 1) and an output child name.
Document: the conference document with nodes x0 to x8.]

Result: { x7 }
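As a rough illustration of what the query selects, here is a minimal non-streaming sketch in Python. It assumes the standard-library `xml.etree.ElementTree` parser and hand-codes the two predicates; the helper name `evaluate` and the `DOC` constant are made up for this example and are not from the talk.

```python
import xml.etree.ElementTree as ET

# The conference document from the Preliminaries slide.
DOC = """
<conference>
  <name> PODS </name>
  <speaker> <name> Josifovski </name> <paper_cnt> 1 </paper_cnt> </speaker>
  <speaker> <name> Fagin </name> <paper_cnt> 3 </paper_cnt> </speaker>
</conference>
"""

def evaluate(xml_text):
    """Return the text of every node selected by
    /conference[name = PODS]/speaker[paper_cnt > 1]/name (hypothetical helper)."""
    conference = ET.fromstring(xml_text)
    if conference.tag != "conference":
        return []
    # Predicate [name = PODS]: some name child has the value PODS.
    if not any((n.text or "").strip() == "PODS" for n in conference.findall("name")):
        return []
    results = []
    for speaker in conference.findall("speaker"):
        # Predicate [paper_cnt > 1]: some paper_cnt child is greater than 1.
        if any(float(p.text or "nan") > 1 for p in speaker.findall("paper_cnt")):
            results.extend((n.text or "").strip() for n in speaker.findall("name"))
    return results

print(evaluate(DOC))  # ['Fagin']
```

The single result corresponds to node x7 on the slide. This version loads the whole document into memory, which is exactly what a streaming evaluator tries to avoid.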

Page 4: On the Memory Requirements of XPath Evaluation over XML Streams

XML Streams

XML stream: XML document arriving as a one-way stream

Critical resources:

• Memory

• Processing time

Why XML streams?

• For transferring XML between systems

• For efficient access to large XML documents
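One way to picture a one-way XML stream is as a sequence of start-tag, text, and end-tag events that are seen once, in document order. The sketch below uses Python's standard `xml.sax` module for this; the class name `EventRecorder` and the function `stream_events` are illustrative names, not part of the talk.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """Records the events of a one-way pass over an XML document."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Ignore whitespace-only text between tags.
        if content.strip():
            self.events.append(("text", content.strip()))

def stream_events(xml_text):
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events

for event in stream_events("<a><b>x</b></a>"):
    print(event)
```

A streaming algorithm only gets to keep whatever state it chooses to store between these callbacks, which is why memory is the critical resource.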

Page 5: On the Memory Requirements of XPath Evaluation over XML Streams

Streaming XML Algorithms

• XFilter and YFilter [Altinel and Franklin 00] [Diao et al 02]
• X-scan [Ives, Levy, and Weld 00]
• XMLTK [Avila-Campillo et al 02]
• XTrie [Chan et al 02]
• SPEX [Olteanu, Kiesling, and Bry 03]
• Lazy DFAs [Green et al 03]
• The XPush Machine [Gupta and Suciu 03]
• XSQ [Peng and Chawathe 03]
• TurboXPath [Josifovski, Fontoura, and Barta 04]
• …

Page 6: On the Memory Requirements of XPath Evaluation over XML Streams

Our Results

• Space lower bounds for evaluating XPath on XML streams

• A streaming XML algorithm
  • Matches the lower bounds on a large fragment of the language
  • Uses space sub-linear in the query size rather than exponential in the query size

Page 7: On the Memory Requirements of XPath Evaluation over XML Streams

Related Work

• Space complexity of XPath evaluation over non-streaming XML documents [Gottlob, Koch, Pichler 03], [Segoufin 03]

• Space complexity of XPath evaluation over streams of indexed XML data [Choi, Mahoui, Wood 03]

• Space complexity of select-project-join queries over relational data streams [Arasu et al 02]

Page 8: On the Memory Requirements of XPath Evaluation over XML Streams

Data Complexity [Vardi 82]

(Q, D): Evaluation function of a query Q on document D (both Q and D are inputs).

Q(D): Evaluation function of a fixed query Q on document D (only D is an input).

Data complexity of Q: Complexity of the best algorithm for Q on the worst D.

Worst-case data complexity: max over Q of (data complexity of Q).

We characterize the data complexity of Q separately for each Q (not just the worst-case one).

Page 9: On the Memory Requirements of XPath Evaluation over XML Streams

XPath Fragment

1. Queries are subsumption-free

[Figure: two query trees.
Not subsumption-free: root → conference with two predicate children, name = PODS and name != SIGMOD.
Subsumption-free: root → conference with a single predicate child, name != SIGMOD.]

Page 10: On the Memory Requirements of XPath Evaluation over XML Streams

XPath Fragment (cont.)

2. Queries are univariate

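[Figure: two query trees.
Not univariate: root → conference with children paper_cnt and author_cnt compared to each other (paper_cnt < author_cnt).
Univariate: root → conference with children paper_cnt < 30 and author_cnt > 30.]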

Page 11: On the Memory Requirements of XPath Evaluation over XML Streams

XPath Fragment (cont.)

3. Queries consist of conjunctions only

4. Queries are “star-restricted”

Page 12: On the Memory Requirements of XPath Evaluation over XML Streams

Query Frontier Size

Definitions:

1. Frontier at u: u, its siblings, and the siblings of its ancestors.

2. FrontierSize(Q): size of the largest frontier.

Theorem 1: For all queries Q in the fragment,

stream-space(Q) = Ω(FrontierSize(Q)).

[Figure: query tree for /conference[name = PODS]/speaker[paper_cnt > 1]/name: root → conference, with children name (= PODS) and speaker; speaker has children name and paper_cnt (> 1).]
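The frontier definition can be checked mechanically on the example query tree. The following is a small sketch that follows the slide's wording literally (a node, its siblings, and the siblings of its ancestors); the `QueryNode` class and `frontier_size` function are hypothetical names introduced only for this illustration.

```python
class QueryNode:
    """A node of a query tree (illustrative structure, not from the talk)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def frontier_size(root):
    """Size of the largest frontier, where the frontier at u consists of
    u, its siblings, and the siblings of its ancestors."""
    best = 0

    def visit(node, inherited):
        # inherited = siblings of node + siblings of all its ancestors
        nonlocal best
        best = max(best, 1 + inherited)
        for child in node.children:
            visit(child, inherited + len(node.children) - 1)

    visit(root, 0)
    return best

# Query tree of /conference[name = PODS]/speaker[paper_cnt > 1]/name
QUERY = QueryNode("root", [
    QueryNode("conference", [
        QueryNode("name"),
        QueryNode("speaker", [QueryNode("name"), QueryNode("paper_cnt")]),
    ]),
])

print(frontier_size(QUERY))  # 3
```

Under this reading, the largest frontier is reached at either child of speaker: the node itself, its one sibling, and the conference-level name node.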

Page 13: On the Memory Requirements of XPath Evaluation over XML Streams

Document Recursion Depth

Definition:

recDepth_Q(D): Max number of nodes in D that lie on one root-to-leaf path and “path match” the same node in Q.

Theorem 2: For all queries Q in the fragment that have at least one “//” node,

stream-space(Q) = Ω(recDepth_Q(D)).

[Figure: Query Q: root → //part, with children name and number. Document D (nodes x0 to x7): nested part elements with name and number children; values shown: Compressor, 12, Refrigerator, 456.]
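For a query whose only “//” step is //part at the top, as in the figure, every part element path-matches that query node, so the recursion depth reduces to the deepest nesting of part elements. The sketch below computes that simplified quantity; it is not the general definition, and the exact nesting in `DOC` (Compressor inside Refrigerator) is an assumption about the figure.

```python
import xml.etree.ElementTree as ET

def rec_depth(element, tag):
    """Max number of elements named `tag` on any root-to-leaf path
    starting at `element` (simplified stand-in for recDepth)."""
    here = 1 if element.tag == tag else 0
    below = max((rec_depth(child, tag) for child in element), default=0)
    return here + below

# Assumed layout of the example document: a part nested inside a part.
DOC = ET.fromstring(
    "<part><name>Refrigerator</name><number>456</number>"
    "<part><name>Compressor</name><number>12</number></part></part>"
)

print(rec_depth(DOC, "part"))  # 2
```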

Page 14: On the Memory Requirements of XPath Evaluation over XML Streams

Document Depth

Definition:

depth(D): Length of longest root-to-leaf path.

Theorem 3: For all queries Q in the fragment that have at least one “/” node,

stream-space(Q) = Ω(log depth(D)).

[Figure: Document D (nodes x0 to x7) with nested part elements, each with name and number children; values shown: Compressor, 12, Refrigerator, 456.]
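To see where a log depth(D) term can come from, consider the simplest thing a streaming pass must do to tell a child from a deeper descendant: track the current nesting level. The sketch below keeps one counter, which needs roughly log2(depth) bits. It counts element nesting levels only, which is an assumed convention for depth; `DepthCounter` and `stream_depth` are illustrative names.

```python
import xml.sax

class DepthCounter(xml.sax.ContentHandler):
    """Tracks the current and maximum element nesting level in one pass."""

    def __init__(self):
        super().__init__()
        self.level = 0
        self.max_level = 0

    def startElement(self, name, attrs):
        self.level += 1
        self.max_level = max(self.max_level, self.level)

    def endElement(self, name):
        self.level -= 1

def stream_depth(xml_text):
    handler = DepthCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.max_level

depth = stream_depth("<a><b><c/></b><b/></a>")
print(depth, depth.bit_length())  # 3 2
```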

Page 15: On the Memory Requirements of XPath Evaluation over XML Streams

New algorithm

Theorem 4(a): For all queries Q in “Univariate XPath”:

Space: O(|Q| · recDepth(D) · log depth(D)).
Time: O(|D| · |Q| · recDepth(D)).

Theorem 4(b): For all queries Q in a subset of our fragment and for non-recursive documents D:

Space: O(FrontierSize(Q) · log depth(D)).
Time: O(|D| · FrontierSize(Q)).

Page 16: On the Memory Requirements of XPath Evaluation over XML Streams

Proof of Theorem 1

Fragment:

• “subsumption-free”
• “univariate”
• Conjunctions only
• “star-restricted”

Theorem 1: For all queries Q in the fragment,

stream-space(Q) = Ω(FrontierSize(Q)).

[Figure: query tree for /conference[name = PODS]/speaker[paper_cnt > 1]/name.]

Page 17: On the Memory Requirements of XPath Evaluation over XML Streams

Critical Document

Definition: Document D is critical for query Q if:

(1) D matches Q.

(2) If we remove any node from D, it no longer matches Q.

[Figure: Query Q, the tree of /conference[name = PODS]/speaker[paper_cnt > 1]/name, next to Document D, the conference document with nodes x0 to x8.]

Page 18: On the Memory Requirements of XPath Evaluation over XML Streams

Main Lemmas

Lemma 1: For all queries Q in the fragment and any critical document D for Q,

stream-space(Q) = Ω(FrontierSize(D)).

Lemma 2: For all queries Q in the fragment, there is a critical document D so that

FrontierSize(D) = FrontierSize(Q).

Theorem 1: For all queries Q in the fragment,

stream-space(Q) = Ω(FrontierSize(Q)).

Page 19: On the Memory Requirements of XPath Evaluation over XML Streams

One-way Communication Complexity

f: X × Y → Z

Alice holds x, Bob holds y. Alice sends one message m to Bob, who outputs f(x, y).

CC(f) = number of communication bits used by the best protocol on the worst-case choice of inputs.

Page 20: On the Memory Requirements of XPath Evaluation over XML Streams

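Reduction

A: streaming algorithm for Q using space S.

[Figure: document D is split into a prefix, held by Alice, and a suffix, held by Bob. Alice runs A on the prefix and sends the resulting state of A to Bob. Bob resumes A from that state on the suffix and outputs Q(D).]

Theorem: stream-space(Q) >= CC(Q)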
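The reduction can be mimicked in a few lines: whatever a streaming algorithm remembers after the prefix is the only thing Alice needs to send. The sketch below uses a deliberately trivial toy algorithm (does the token stream contain both 'b' and 'c'?) rather than an XPath evaluator; `HasBAndC` and `one_way_protocol` are hypothetical names.

```python
import json

class HasBAndC:
    """Toy streaming algorithm: does the token stream contain both 'b' and 'c'?"""

    def __init__(self, state=None):
        self.state = state if state is not None else {"b": False, "c": False}

    def feed(self, tokens):
        for token in tokens:
            if token in self.state:
                self.state[token] = True

    def result(self):
        return self.state["b"] and self.state["c"]

def one_way_protocol(prefix, suffix):
    """Alice runs the algorithm on the prefix and sends its state;
    Bob resumes from that state on the suffix and outputs the answer."""
    alice = HasBAndC()
    alice.feed(prefix)
    message = json.dumps(alice.state)      # the only communication
    bob = HasBAndC(json.loads(message))
    bob.feed(suffix)
    return bob.result(), len(message)

answer, message_length = one_way_protocol(["a", "c"], ["b"])
print(answer, message_length)
```

The message is never longer than the serialized state, so a lower bound on communication translates into a lower bound on the space the streaming algorithm uses.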

Page 21: On the Memory Requirements of XPath Evaluation over XML Streams

Fooling Set Technique

Partitioned document D_{α,β}: document prefix α followed by document suffix β.

Definition

A set T of partitioned documents is a fooling set for Q if:

1. All documents in T match Q.

2. For any two distinct documents D_{α,β}, D_{α',β'} in T, either D_{α,β'} does not match Q or D_{α',β} does not match Q.

Theorem: For any fooling set T, CC(Q) = Ω(log |T|).

Page 22: On the Memory Requirements of XPath Evaluation over XML Streams

Proof of Lemma 1

Lemma 1: For all queries Q in the fragment and any critical document D for Q,

stream-space(Q) = Ω(FS(D)).

[Figure: Query Q: root → conference, with children name (= PODS) and speaker; speaker has children name and paper_cnt (> 1).]

Document D (critical for Q, nodes x0 to x5):

root (x0)
  conference (x1)
    name (x2): PODS
    speaker (x3)
      name (x4): Fagin
      paper_cnt (x5): 3

Page 23: On the Memory Requirements of XPath Evaluation over XML Streams

Proof of Lemma 1

For each subset S of Frontier(D), define a partitioned document D_S.

Example: S = { x2, x5 }

[Figure: Query Q next to the partitioned document D_S, built from the critical document D (nodes x0 to x5, values PODS, Fagin, 3).]

Page 24: On the Memory Requirements of XPath Evaluation over XML Streams

Proof of Lemma 1 (cont)

Claim: { D_S : S is a subset of Frontier(D) } is a fooling set.

Hence stream-space(Q) >= log(2^FS(D)) = FS(D).

Proof of Claim:

1. For all S, D_S matches Q.

2. If S ≠ T, need: either D_{S,T} or D_{T,S} does not match Q.
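The counting argument behind the claim can be replayed on a toy model. The sketch below assumes that the frontier of the critical document is { x2, x4, x5 }, that the prefix of D_S carries the frontier nodes in S and its suffix carries the remaining ones, and that a mixed document matches Q exactly when every frontier node is present. These are simplifying assumptions for illustration, not the exact construction from the talk.

```python
from itertools import combinations

# Assumed frontier of the critical document D.
FRONTIER = frozenset({"x2", "x4", "x5"})

def matches(prefix_nodes, suffix_nodes):
    """Toy stand-in for 'the document matches Q': no frontier node is missing."""
    return FRONTIER <= (prefix_nodes | suffix_nodes)

def subsets(items):
    ordered = sorted(items)
    return [frozenset(c) for r in range(len(ordered) + 1)
            for c in combinations(ordered, r)]

def partitioned(s):
    """D_S as (prefix, suffix): the prefix holds S, the suffix holds the rest."""
    return (s, FRONTIER - s)

def is_fooling_set(family):
    docs = [partitioned(s) for s in family]
    # Condition 1: every document in the family matches Q.
    if not all(matches(prefix, suffix) for prefix, suffix in docs):
        return False
    # Condition 2: for distinct documents, at least one crossing fails to match.
    for (p1, s1), (p2, s2) in combinations(docs, 2):
        if matches(p1, s2) and matches(p2, s1):
            return False
    return True

family = subsets(FRONTIER)
print(len(family), is_fooling_set(family))  # 8 True
```

With 2^3 = 8 documents in the fooling set, the bound is log2(8) = 3 bits, which equals the frontier size in this toy model.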

Page 25: On the Memory Requirements of XPath Evaluation over XML Streams

Proof of Claim (example)

S = { x2, x5 }, T = { x4, x5 }

[Figure: Document D_S, Document D_T, and the combined Document D_{T,S}. In D_{T,S} the speaker name Fagin appears twice and the conference name is missing, so D_{T,S} does not match Q.]

Page 26: On the Memory Requirements of XPath Evaluation over XML Streams

Algorithm

• Uses the query as an NFA

• Based on three global data structures
  • Pointer array
  • Validation array
  • Level array

• Matches the lower bounds for a fragment of XPath.

Page 27: On the Memory Requirements of XPath Evaluation over XML Streams

Algorithm Example Run

Query: /a[b and c]

Input XML: <a> <c>c1</c> <b>b1</b></a>...

Query tree: $ (u0) → /a (u1); /a has children /b (u2) and /c (u3).

[Figure: level array, validation array, and pointer array with one entry. Initial entry (query node, validation flag, level): (a, F, 1).]

Page 28: On the Memory Requirements of XPath Evaluation over XML Streams

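Algorithm Example Run

Query: /a[b and c]

Input XML: <a> <c>c1</c> <b>b1</b></a>...

Query tree: $ (u0) → /a (u1); /a has children /b (u2) and /c (u3).

[Figure: array entries (query node, validation flag, level), labelled by the event shown on the slide.
Index 0, $: (a, F, 1)
Index 1, a: (b, F, 2), (c, F, 2)]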

Page 29: On the Memory Requirements of XPath Evaluation over XML Streams

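Algorithm Example Run

Query: /a[b and c]

Input XML: <a> <c>c1</c> <b>b1</b></a>...

Query tree: $ (u0) → /a (u1); /a has children /b (u2) and /c (u3).

[Figure: array entries (query node, validation flag, level), labelled by the event shown on the slide.
Index 0, $: (a, F, 1)
Index 1, a: (b, F, 2), (c, F, 2)
c: (b, F, 2), (c, F, 2)]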

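Page 30: On the Memory Requirements of XPath Evaluation over XML Streams

Algorithm Example Run

Query: /a[b and c]

Input XML: <a> <c>c1</c> <b>b1</b></a>...

Query tree: $ (u0) → /a (u1); /a has children /b (u2) and /c (u3).

[Figure: array entries (query node, validation flag, level), labelled by the event shown on the slide.
Index 0, $: (a, F, 1)
Index 1, a: (b, F, 2), (c, F, 2)
c: (b, F, 2), (c, F, 2)
/c: (b, F, 2), (c, T, 2)]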

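Page 31: On the Memory Requirements of XPath Evaluation over XML Streams

Algorithm Example Run

Query: /a[b and c]

Input XML: <a> <c>c1</c> <b>b1</b></a>...

Query tree: $ (u0) → /a (u1); /a has children /b (u2) and /c (u3).

[Figure: array entries (query node, validation flag, level), labelled by the event shown on the slide.
Index 0, $: (a, F, 1)
Index 1, a: (b, F, 2), (c, F, 2)
c: (b, F, 2), (c, F, 2)
/c: (b, F, 2), (c, T, 2)
b: (b, F, 2), (c, T, 2)]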

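Page 32: On the Memory Requirements of XPath Evaluation over XML Streams

Algorithm Example Run

Query: /a[b and c]

Input XML: <a> <c>c1</c> <b>b1</b></a>...

Query tree: $ (u0) → /a (u1); /a has children /b (u2) and /c (u3).

[Figure: array entries (query node, validation flag, level), labelled by the event shown on the slide.
Index 0, $: (a, F, 1)
Index 1, a: (b, F, 2), (c, F, 2)
c: (b, F, 2), (c, F, 2)
/c: (b, F, 2), (c, T, 2)
b: (b, F, 2), (c, T, 2)
/b: (b, T, 2), (c, T, 2)]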

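Page 33: On the Memory Requirements of XPath Evaluation over XML Streams

Algorithm Example Run

Query: /a[b and c]

Input XML: <a> <c>c1</c> <b>b1</b></a>...

Query tree: $ (u0) → /a (u1); /a has children /b (u2) and /c (u3).

[Figure: array entries (query node, validation flag, level), labelled by the event shown on the slide.
$: (a, F, 1)
a: (b, F, 2), (c, F, 2)
c: (b, F, 2), (c, F, 2)
/c: (b, F, 2), (c, T, 2)
b: (b, F, 2), (c, T, 2)
/b: (b, T, 2), (c, T, 2)
/a: (a, T, 1)]

Return TRUE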
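The example run can be imitated with a much simpler hand-written matcher for this one query. The sketch below is not the algorithm from the talk (it has no pointer array and handles only /a[b and c]); it just shows the same idea of a level counter plus one validation flag per predicate, with flags flipped at end tags as in the trace. The class name `ABandC` and the function `matches_a_b_and_c` are made up for this example.

```python
import xml.sax

class ABandC(xml.sax.ContentHandler):
    """Streaming check of /a[b and c]: the root element is a and it has
    at least one child b and at least one child c."""

    def __init__(self):
        super().__init__()
        self.level = 0                               # current nesting level
        self.in_a = False                            # root element is named a
        self.validated = {"b": False, "c": False}    # one flag per predicate
        self.matched = False

    def startElement(self, name, attrs):
        self.level += 1
        if self.level == 1:
            self.in_a = (name == "a")

    def endElement(self, name):
        if self.level == 2 and self.in_a and name in self.validated:
            self.validated[name] = True              # child predicate satisfied
        elif self.level == 1 and self.in_a:
            # Closing the root a: the query matches if all flags are set.
            self.matched = all(self.validated.values())
        self.level -= 1

def matches_a_b_and_c(xml_text):
    handler = ABandC()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matched

print(matches_a_b_and_c("<a><c>c1</c><b>b1</b></a>"))  # True
```

The state is a counter and two booleans, independent of how many b and c children the document contains.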

Page 34: On the Memory Requirements of XPath Evaluation over XML Streams

Conclusion: Our Contributions

• Space lower bounds on the instance data complexity of XPath on XML streams:
  1. In terms of Query Frontier Size
  2. In terms of Document Recursion Depth
  3. In terms of Document Depth

• A streaming XML algorithm
  • Matches the lower bounds on a fragment of the language
  • Does not use finite-state automata

Page 35: On the Memory Requirements of XPath Evaluation over XML Streams

XPath 1.0

/conference/name

[Figure: Query Q: $ (u0) → /C (u1) → /N (u2). Document D: the conference document with nodes x0 to x8, labels abbreviated as C = conference, N = name, S = speaker, P = paper_cnt.]

Result: { x2 }

Page 36: On the Memory Requirements of XPath Evaluation over XML Streams

XPath 1.0

/conference//name

[Figure: Query Q: $ (u0) → /C (u1) → //N (u2). Document D: the conference document with nodes x0 to x8, labels abbreviated as C = conference, N = name, S = speaker, P = paper_cnt.]

Result: { x2, x4, x7 }
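The difference between the child step (/name) and the descendant step (//name) on the last two slides can be reproduced with Python's standard `xml.etree.ElementTree`, whose `findall` accepts a limited XPath subset. This is only an in-memory illustration; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

conference = ET.fromstring(
    "<conference><name>PODS</name>"
    "<speaker><name>Josifovski</name><paper_cnt>1</paper_cnt></speaker>"
    "<speaker><name>Fagin</name><paper_cnt>3</paper_cnt></speaker>"
    "</conference>"
)

# /conference/name: name elements that are children of conference
child_names = [n.text for n in conference.findall("name")]

# /conference//name: name elements anywhere below conference
descendant_names = [n.text for n in conference.findall(".//name")]

print(child_names)       # ['PODS']
print(descendant_names)  # ['PODS', 'Josifovski', 'Fagin']
```

The three descendant matches correspond to nodes x2, x4, and x7 on the slide.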

Page 37: On the Memory Requirements of XPath Evaluation over XML Streams

Reduction

A: S-space streaming algorithm for Q.

r ≥ 1: integer.

[Figure: document D is processed in segments by Alice and Bob in turn; the states s0, s1, ..., s6 of A are shown for r = 6, and the output is Q(D).]

Theorem: S ≥ CC(Qr) / r

