Page 1: Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation

Michael Schmidt, Stefanie Scherzinger, Christoph Koch

Saarland University Database Group, Saarbrücken, Germany

2007 IEEE 23rd International Conference on Data Engineering - April 17, 2007

Page 2: Outline

I. Streaming XQuery Evaluation
– Motivation and Requirements
– Desiderata for streaming and in-memory XQuery engines
– Existing Approaches

II. Combining Static and Dynamic Buffer Minimization
– Query Normalization
– The Concept of Roles
– Active Garbage Collection
– System Architecture
– Optimizations

III. The GCX XQuery Engine
– Prototype Implementation
– Benchmark Results

IV. Summary

Page 3: Motivation and Requirements

– The growing importance of streaming XML processing comes along with the proliferation of the WWW
– Streams may arrive at very high rates, so storing incoming data to disk is often infeasible
– A main-memory DOM tree representation of XML documents is very space-consuming, so buffer management becomes the key prerequisite to performance
– The problem becomes even more urgent when evaluating (powerful fragments of) XQuery rather than simple filters on data streams
– Streaming techniques are also very useful for in-memory XQuery engines

Page 4: Desiderata for in-memory XQuery Engines

(1) Only buffer data that is relevant for query evaluation
(2) Avoid multiple copies of the data in main memory
(3) Do not keep data buffered longer than necessary

Claim: A combination of static and dynamic analysis is required to satisfy all desiderata

Page 5: Existing Approaches (1)

(1) Only buffer data that is relevant for query evaluation

Document Projection
– Static query analysis
– Detect the parts of the document that are relevant to query evaluation
– Project away the parts of the document that are not relevant to query evaluation

A. Marian and J. Siméon. “Projecting XML Documents”. In Proc. VLDB’03, pages 213–224, 2003.

S. Bréssan, B. Catania, Z. Lacroix, Y. G. Li, and A. Maddalena. “Accelerating Queries by Pruning XML Documents”. TKDE, 54(2):211–240, 2005.

V. Benzaken, G. Castagna, D. Colazzo, and K. Nguyen. “Type-Based XML Projection”. In Proc. VLDB’06, 2006.

Page 6: Existing Approaches (2)

Document Projection

XQuery:

<q> {
  for $b in /bib/book
  where ($b/author="A. Turing" and fn:exists($b/price))
  return $b/title
} </q>

Projection Paths (dos := descendant-or-self):

{ /bib/book,
  /bib/book/author/dos::node(),
  /bib/book/price,
  /bib/book/title/dos::node() }

[Figure: XML document tree rooted at bib with book and article subtrees; node labels shown: bib, book, author, price, title, isbn, article.]
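The effect of the projection paths above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the algorithm of the cited papers: documents are nested (tag, children) tuples, paths consist of child steps with an optional trailing dos::node() step, and the names `keeps` and `project` are hypothetical.

```python
DOS = "dos::node()"

# Projection paths from the slide, split into steps.
PATHS = [
    ["bib", "book"],
    ["bib", "book", "author", DOS],
    ["bib", "book", "price"],
    ["bib", "book", "title", DOS],
]


def keeps(path_steps, node_path):
    """True if node_path lies on path_steps, or below a trailing dos::node()."""
    if path_steps[-1] == DOS:
        prefix = path_steps[:-1]
        n = min(len(prefix), len(node_path))
        return node_path[:n] == prefix[:n]
    return (len(node_path) <= len(path_steps)
            and path_steps[:len(node_path)] == node_path)


def project(node, paths, ancestors=()):
    """Return a copy of the tree without nodes that no path needs."""
    tag, children = node
    node_path = list(ancestors) + [tag]
    if not any(keeps(p, node_path) for p in paths):
        return None
    kept = [project(child, paths, node_path) for child in children]
    return (tag, [k for k in kept if k is not None])


# Hypothetical sample document (element structure only, no text nodes).
DOC = ("bib", [
    ("book", [("author", [("first", [])]), ("price", [("amount", [])]),
              ("title", []), ("isbn", [])]),
    ("article", [("title", [])]),
])

PROJECTED = project(DOC, PATHS)
```

In this sketch the isbn element and the whole article subtree are dropped, while the author subtree is kept in full because its path ends in dos::node().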

Page 7: Existing Approaches (3)

(2) Avoid multiple copies of the data in main memory
(3) Do not keep data buffered longer than necessary

It is hard to satisfy both requirements in combination.

XQuery:

<q> {
  for $x1 in //book return
    for $x2 in //* return
      for $x3 in //article return
        <node/>
} </q>

Two approaches:
(1) Single DOM tree
(2) Buffers for variables

Page 8: The Big Picture

[Figure: system overview. Components: XQuery, Normalized XQuery, Projection Tree, Roles, Rewritten XQuery (role updates), Buffer (nodes annotated with roles), Evaluator, input stream, output stream. Edge kinds: transformation and extraction; input and output; communication. Labelled interactions between evaluator and buffer: variable bindings; role removals and active garbage collection.]

Page 9: Query Normalization

(1) Rewriting where-expressions to if-statements
(2) Pushing down if-statements

Before:

<r> {
  for $b in /bib
  where (fn:exists($b/book))
  return <books>{ $b/book }</books>
} </r>

After:

<r> {
  for $b in /bib return (
    if (fn:exists($b/book)) then <books> else (),
    if (fn:exists($b/book)) then $b/book else (),
    if (fn:exists($b/book)) then </books> else ()
  )
} </r>
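The two normalization steps above can be sketched on a toy expression tree. The tuple encoding (`for`, `elem`, `seq`, `if`, `open`, `close`) and the function names `push_if` and `normalize` are assumptions made for this illustration; they are not taken from the GCX implementation, and if-expressions already present in the query are left untouched here.

```python
EMPTY = ("empty",)


def push_if(cond, expr):
    """Push the condition down to the leaves, splitting element constructors."""
    kind = expr[0]
    if kind == "elem":
        _, tag, content = expr
        return ("seq", [
            ("if", cond, ("open", tag), EMPTY),
            push_if(cond, content),
            ("if", cond, ("close", tag), EMPTY),
        ])
    if kind == "seq":
        return ("seq", [push_if(cond, item) for item in expr[1]])
    return ("if", cond, expr, EMPTY)


def normalize(expr):
    """Rewrite where-clauses to if-expressions and push them down."""
    kind = expr[0]
    if kind == "for":
        _, var, source, where, body = expr
        body = normalize(body)
        if where is not None:
            body = push_if(where, body)
        return ("for", var, source, None, body)
    if kind == "elem":
        return ("elem", expr[1], normalize(expr[2]))
    if kind == "seq":
        return ("seq", [normalize(item) for item in expr[1]])
    return expr


# The query from the slide in the toy encoding.
COND = ("exists", "$b/book")
QUERY = ("elem", "r",
         ("for", "$b", "/bib", COND,
          ("elem", "books", ("path", "$b/book"))))

NORMALIZED = normalize(QUERY)
```

For the slide's query, the loop body becomes a sequence of three guarded items: the opening tag, the path expression and the closing tag.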

Page 10: Deriving Roles

<r> {
  for $bib in /bib return (
    for $x in $bib/* return
      if (not(fn:exists($x/price))) then $x else (),
    for $b in $bib/book return
      $b/title
  )
} </r>

[Figure: projection tree with node labels /, /bib, /*, /book, /price[1], dos::node(), /title/dos::node().]

Role  Path                           Query expression
r1    /
r2    /bib                           $bib
r3    /bib/*                         $x
r4    /bib/*/price[1]                $x/price
r5    /bib/*/dos::node()             $x
r6    /bib/book                      $b
r7    /bib/book/title/dos::node()    $b/title

Page 11: Assigning Roles

– Matching document nodes are assigned roles when they are projected into the buffer
– Roles are assigned on-the-fly while reading the input
– Nodes without roles and role-carrying ancestors need not be buffered (projection)

XML document (nodes annotated with roles):

bib        { r2 }
  book     { r3, r5, r6 }
    title  { r5, r7 }
    author { r5 }

Roles:

r1    /
r2    /bib
r3    /bib/*
r4    /bib/*/price[1]
r5    /bib/*/dos::node()
r6    /bib/book
r7    /bib/book/title/dos::node()
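The role assignment shown on this slide can be reproduced with a small Python sketch. It is an illustration only: it works on an in-memory tree rather than on a stream, it omits role r1 (the document root), and the helper names `step_matches`, `path_matches` and `assign_roles` are hypothetical.

```python
from dataclasses import dataclass, field

DOS = "dos::node()"

# Role paths from the slide, split into steps (r1, the document root, is omitted).
ROLES = {
    "r2": ["bib"],
    "r3": ["bib", "*"],
    "r4": ["bib", "*", "price[1]"],
    "r5": ["bib", "*", DOS],
    "r6": ["bib", "book"],
    "r7": ["bib", "book", "title", DOS],
}


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    roles: set = field(default_factory=set)


def step_matches(step, tag, position):
    """position is the 1-based index of the node among same-tag siblings."""
    if step == "*":
        return True
    if step.endswith("[1]"):
        return step[:-3] == tag and position == 1
    return step == tag


def path_matches(steps, path):
    """path is a list of (tag, position) pairs from the root element down."""
    if steps[-1] == DOS:
        prefix = steps[:-1]
        if len(path) < len(prefix):
            return False
    else:
        prefix = steps
        if len(path) != len(prefix):
            return False
    return all(step_matches(s, t, p) for s, (t, p) in zip(prefix, path))


def assign_roles(node, path=None):
    """Annotate every node with the roles whose path it matches."""
    path = path or [(node.tag, 1)]
    node.roles = {r for r, steps in ROLES.items() if path_matches(steps, path)}
    seen = {}
    for child in node.children:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        assign_roles(child, path + [(child.tag, seen[child.tag])])


title, author = Node("title"), Node("author")
book = Node("book", [title, author])
bib = Node("bib", [book])
assign_roles(bib)
```

On the slide's document this yields the same annotations as in the figure: bib carries r2, book carries r3, r5 and r6, title carries r5 and r7, and author carries r5.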

Page 12: Inserting Role Updates

Original query:

<r> {
  for $bib in /bib return (
    for $x in $bib/* return
      if (not(fn:exists($x/price))) then $x else (),
    for $b in $bib/book return
      $b/title
  )
} </r>

Rewritten query with role updates:

<r> {
  for $bib in /bib return (
    for $x in $bib/* return (
      if (not(exists($x/price))) then $x else (),
      signOff($x,r3),
      signOff($x/price[1],r4),
      signOff($x/dos::node(),r5) ),
    for $b in $bib/book return (
      $b/title,
      signOff($b,r6),
      signOff($b/title/dos::node(),r7) ),
    signOff($bib,r2) )
} </r>

Role  Path                           Query expression
r1    /
r2    /bib                           $bib
r3    /bib/*                         $x
r4    /bib/*/price[1]                $x/price
r5    /bib/*/dos::node()             $x
r6    /bib/book                      $b
r7    /bib/book/title/dos::node()    $b/title
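How the signOff-statements relate to the role table can be sketched as follows. The sketch assumes that each role is attached to the for-variable in whose scope it was derived and that the signOff argument is the role path made relative to that variable's binding path; the tables and the helper `sign_offs` are hypothetical names for this illustration.

```python
# (role, absolute path, for-variable in whose scope the role was derived)
ROLE_TABLE = [
    ("r2", "/bib", "$bib"),
    ("r3", "/bib/*", "$x"),
    ("r4", "/bib/*/price[1]", "$x"),
    ("r5", "/bib/*/dos::node()", "$x"),
    ("r6", "/bib/book", "$b"),
    ("r7", "/bib/book/title/dos::node()", "$b"),
]

# Absolute binding path of each for-variable.
BINDINGS = {"$bib": "/bib", "$x": "/bib/*", "$b": "/bib/book"}


def sign_offs(var):
    """signOff-statements to append to the body of the for-loop binding var."""
    prefix = BINDINGS[var]
    return [
        f"signOff({var}{path[len(prefix):]},{role})"
        for role, path, owner in ROLE_TABLE
        if owner == var
    ]
```

For example, `sign_offs("$x")` produces the three statements that close the `$x` loop body in the rewritten query.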

Page 13: Active Garbage Collection

<r> {
  for $bib in /bib return (
    for $x in $bib/* return (
      if (not(exists($x/price))) then $x else (),
      signOff($x,r3),
      signOff($x/price[1],r4),
      signOff($x/dos::node(),r5) ),
    for $b in $bib/book return (
      $b/title,
      signOff($b,r6),
      signOff($b/title/dos::node(),r7) ),
    signOff($bib,r2) )
} </r>

Input stream:

<bib>
  <book>
    <title/><author/>
  </book>
  …

Output stream:

<r>
  <book>
    <title/><author/>
  </book>

Buffer (role sets as evaluation proceeds):

bib        {r2}
  book     {r3, r5, r6} → {r5, r6} → {r6}
    title  {r5, r7} → {r7}
    author {r5} → {}
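The buffer behaviour on this slide can be sketched in Python. The sketch makes one assumption that the slide does not state explicitly: a node is purged only once it carries no role and has no buffered children left, so that the path to role-carrying descendants stays intact. The class and function names are hypothetical and do not reflect the GCX source code.

```python
class BufferNode:
    """A buffered document node annotated with a set of roles."""

    def __init__(self, tag, roles=(), parent=None):
        self.tag = tag
        self.roles = set(roles)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def descendants_or_self(node):
    """Yield node and all nodes below it (the dos::node() step)."""
    yield node
    for child in list(node.children):
        yield from descendants_or_self(child)


def collect(node):
    """Purge node once it is role-free and childless, then re-check its parent."""
    while node.parent is not None and not node.roles and not node.children:
        parent = node.parent
        parent.children.remove(node)
        node.parent = None
        node = parent


def sign_off(nodes, role):
    """Remove role from every given node and trigger garbage collection."""
    for node in list(nodes):
        node.roles.discard(role)
        collect(node)


# Buffer state from the slide after <book> has been read completely.
bib = BufferNode("bib", {"r2"})
book = BufferNode("book", {"r3", "r5", "r6"}, bib)
title = BufferNode("title", {"r5", "r7"}, book)
author = BufferNode("author", {"r5"}, book)

# End of the $x loop body (there is no price node, so r4 has nothing to sign off).
sign_off([book], "r3")
sign_off([], "r4")
sign_off(descendants_or_self(book), "r5")
after_x_loop = [child.tag for child in book.children]

# End of the $b loop body.
sign_off([book], "r6")
sign_off(descendants_or_self(title), "r7")
after_b_loop = [child.tag for child in bib.children]
```

After the `$x` loop body only title remains below book; after the `$b` loop body the whole book subtree has left the buffer, while bib still holds r2.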

Page 14: Optimizations

– Rewrite path steps to for-expressions
– Use aggregated roles
– Remove redundant roles

Query and its rewriting with role updates:

<r> { for $bib in /bib return $bib/book } </r>

<r> { for $bib in /bib return ( $bib/book, signOff($bib,r1), signOff($bib/book/dos::node(),r2) ) } </r>

With the path step rewritten to a for-expression:

<r> { for $bib in /bib return for $_1 in $bib/book return $_1/book } </r>

<r> { for $bib in /bib return ( for $_1 in $bib/book return ( $_1/book, signOff($_1/book/dos::node(),r2) ), signOff($bib,r1) ) } </r>

Page 15: The GCX XQuery Engine

GCX: Garbage Collected XQuery

Implemented in C++ for a fragment of composition-free XQuery:
– Arbitrarily nested single-step for-loops
– FWR-expressions
– Child and descendant axes
– Node tests for tags, wildcards, node(), text()
– If-expressions with and, or, not, fn:exists
– Let/some-expressions and aggregations not yet supported
– No support for attributes (no restriction)

Open Source (Berkeley Software Distribution Licence)

GCX project page: http://www.infosys.uni-sb.de/projects/streams/gcx/index.php

GCX download page: http://www.infosys.uni-sb.de/software/gcx/

Page 16: Benchmark Results (1)

– Time and memory consumption
– Queries and documents from the XMark Benchmark
– Queries and documents modified to match the supported fragment
– 3GHz Intel Pentium IV CPU with 2GB RAM
– SuSE Linux 10.0, J2RE v1.4.2 for Java-based systems
– Time limit: 1 hour
– Benchmarks against the following systems:
  – FluX: Java in-memory engine for streaming XQuery evaluation.
  – MonetDB v4.12.0/XQuery v0.12.0: A secondary storage engine written in C++. Loading of the document is included in time measurements.
  – QizX/open v1.1: Free in-memory XQuery engine written in Java.
  – Saxon v8.7.1: Free in-memory XQuery engine written in Java.

Page 17: Benchmark Results (2)

XMark Q1:

<query1> {
  for $s in /site return
    for $p in $s/people return
      for $pe in $p/person return
        if ($pe/person_id="person0")
        then <result>{ $pe/name }</result>
        else ()
} </query1>

[Figure: running time (s) for XMark Q1 on 10MB, 50MB, 100MB and 200MB documents; systems: GCX, FluxQuery, MonetDB, Saxon, Qizx/open.]

Page 18: Benchmark Results (3)

XMark Q1:

<query1> {
  for $s in /site return
    for $p in $s/people return
      for $pe in $p/person return
        if ($pe/person_id="person0")
        then <result>{ $pe/name }</result>
        else ()
} </query1>

[Figure: memory consumption (MB) for XMark Q1 on 10MB, 50MB, 100MB and 200MB documents; systems: GCX, FluxQuery, MonetDB, Saxon, Qizx/open.]

Page 19: Benchmark Results (4)

XMark Q8:

<query8> {
  for $root in (/) return
  for $site in $root/site return
  for $people in $site/people return
  for $person in $people/person return
    <item> { (
      <person>{ $person/name }</person>,
      <items_bought> {
        for $site2 in $root/site return
        for $cas in $site2/closed_auctions return
        for $ca in $cas/closed_auction return
        for $buyer in $ca/buyer return
          if ($buyer/buyer_person=$person/person_id)
          then <result> { $ca } </result>
          else ()
      } </items_bought>
    ) } </item>
} </query8>

Page 20: Benchmark Results (5)

[Figures: running time (s) and memory consumption (MB) for XMark Q8 on 10MB, 50MB, 100MB and 200MB documents; systems: GCX, FluxQuery, MonetDB, Saxon, Qizx/open.]

Failure for 100MB: MonetDB
Failure for 200MB: GCX, FluxQuery, MonetDB

Page 21: Summary

– Combination of static and dynamic buffer minimization
– Roles are derived from the XQuery and assigned to matching document nodes in the preprojection phase
– The XQuery expression is statically rewritten: at runtime, signOff-statements cause buffered nodes to lose roles
– An active garbage collection mechanism removes from the buffer those nodes that have lost their last role
– Document projection is integrated in the role concept
– The technique behaves very well for composition-free XQuery w.r.t. execution time and memory consumption
– Applicable in streaming contexts, but also useful for common in-memory XQuery engines

Page 22: Thank you for your attention!

Page 23: References

Z. Bar-Yossef, M. Fontoura, and V. Josifovski. “On the Memory Requirements of XPath Evaluation over XML Streams”. In Proc. PODS’04, pages 177–188, 2004.

M. Benedikt, W. Fan, and F. Geerts. “XPath Satisfiability in the Presence of DTDs”. In Proc. PODS, pages 25–36, 2005.

V. Benzaken, G. Castagna, D. Colazzo, and K. Nguyen. “Type-Based XML Projection”. In Proc. VLDB’06, 2006.

S. Bréssan, B. Catania, Z. Lacroix, Y. G. Li, and A. Maddalena. “Accelerating Queries by Pruning XML Documents”. TKDE, 54(2):211–240, 2005.

L. Fegaras, R. Dash, and Y. Wang. “A Fully Pipelined XQuery Processor”. In XIME-P, 2006.

L. Fegaras, D. Levine, S. Bose, and V. Chaluvadi. “Query Processing of Streamed XML Data”. In Proc. CIKM 2002, pages 126–133, 2002.

T. J. Green, G. Miklau, M. Onizuka, and D. Suciu. “Processing XML Streams with Deterministic Automata”. In Proc. ICDT’03, pages 173–189, 2003.

C. Koch. “On the complexity of nonrecursive XQuery and functional query languages on complex values”. ACM Transactions on Database Systems, 31(4), 2006.

C. Koch, S. Scherzinger, N. Schweikardt, and B. Stegmaier. “Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams”. In Proc. VLDB’04, pages 228–239, 2004.

X. Li and G. Agrawal. “Efficient evaluation of XQuery over streaming data”. In Proc. VLDB’05, pages 265–276, 2005.

A. Marian and J. Siméon. “Projecting XML Documents”. In Proc. VLDB’03, pages 213–224, 2003.

D. Olteanu, H. Meuss, T. Furche, and F. Bry. “XPath: Looking Forward”. In EDBT 02: Proceedings of the Workshops XMLDM, MDDE, and YRWS on XML-Based Data Management and Multimedia Engineering-Revised Papers, pages 109–127, 2002.

D. Olteanu, T. Kiesling, and F. Bry. “An Evaluation of Regular Path Expressions with Qualifiers against XML Streams”. In Proc. ICDE’03, page 702, 2003.

H. Su, E. A. Rundensteiner, and M. Mani. “Semantic Query Optimization for XQuery over XML Streams”. In Proc. VLDB, pages 277–288, 2005.

P. R. Wilson. “Uniprocessor Garbage Collection Techniques”. In Proc. IWMM’92, pages 1–42, 1992.

Page 24: Additional Resources

Page 25: Full Benchmark Results

Each cell gives running time / memory consumption.

Query | Size  | GCX            | FluxQuery      | Galax          | MonetDB         | Saxon           | Qizx/open
Q1    | 10MB  | 0.18s / 1.2MB  | 1.59s / 50MB   | 5.45s / 186MB  | 0.86s / 30MB    | 1.48s / 80MB    | 1.20s / 38MB
Q1    | 50MB  | 0.92s / 1.2MB  | 3.96s / 111MB  | 42.33s / 880MB | 3.69s / 98MB    | 4.29s / 292MB   | 3.74s / 195MB
Q1    | 100MB | 1.87s / 1.2MB  | 6.94s / 111MB  | 02:07m / 1.8GB | 7.19s / 225MB   | 7.96s / 547MB   | 6.56s / 285MB
Q1    | 200MB | 3.53s / 1.2MB  | 12.27s / 111MB | timeout        | 13.60s / 244MB  | 14.30s / 973MB  | 11.82s / 480MB
Q6    | 10MB  | 0.34s / 1.2MB  | n/a            | 7.66s / 240MB  | 0.98s / 29MB    | 1.73s / 82MB    | 1.56s / 33MB
Q6    | 50MB  | 1.68s / 1.2MB  | n/a            | 57.98s / 1.2GB | 5.06s / 111MB   | 5.78s / 292MB   | 6.13s / 169MB
Q6    | 100MB | 3.33s / 1.2MB  | n/a            | 05:08m / 2GB   | 9.94s / 253MB   | 10.85s / 622MB  | 11.74s / 484MB
Q6    | 200MB | 6.42s / 1.2MB  | n/a            | timeout        | 19.95s / 337MB  | 20.14s / 1.2GB  | 20.33s / 805MB
Q8    | 10MB  | 13.15s / 9.8MB | 18.04s / 128MB | 01:04m / 377MB | 02:56m / 407MB  | 6.61s / 145MB   | 9.89s / 148MB
Q8    | 50MB  | 05:13m / 43MB  | 06:51m / 169MB | 33:08m / 1.8GB | 03:26m / 1.35GB | 02:02m / 352MB  | 03:38m / 265MB
Q8    | 100MB | 22:07m / 86MB  | 27:01m / 216MB | timeout        | -               | 08:39m / 650MB  | 14:27m / 397MB
Q8    | 200MB | timeout        | timeout        | timeout        | -               | 32:43m / 1.15GB | 52:05m / 636MB
Q13   | 10MB  | 0.17s / 1.2MB  | 1.60s / 52MB   | 5.92s / 182MB  | 0.80s / 31MB    | 1.53s / 48MB    | 1.26s / 28MB
Q13   | 50MB  | 0.85s / 1.2MB  | 3.98s / 111MB  | 43.91s / 899MB | 3.64s / 98MB    | 4.45s / 292MB   | 3.85s / 195MB
Q13   | 100MB | 1.69s / 1.2MB  | 7.00s / 111MB  | 02:04m / 1.8GB | 7.34s / 224MB   | 8.35s / 547MB   | 6.81s / 285MB
Q13   | 200MB | 3.24s / 1.2MB  | 12.33s / 111MB | timeout        | 13.52s / 271MB  | 15.02s / 1.05GB | 12.30s / 480MB
Q20   | 10MB  | 0.25s / 1.2MB  | 1.65s / 48MB   | 6.95s / 215MB  | 0.85s / 34MB    | 1.65s / 62MB    | 1.43s / 39MB
Q20   | 50MB  | 1.24s / 1.2MB  | 4.19s / 111MB  | 53.08s / 1.5GB | 4.17s / 120MB   | 4.90s / 292MB   | 4.18s / 195MB
Q20   | 100MB | 2.48s / 1.2MB  | 7.37s / 111MB  | 03:14m / 2GB   | 8.47s / 247MB   | 9.13s / 622MB   | 8.71s / 350MB
Q20   | 200MB | 4.74s / 1.2MB  | 13.14s / 111MB | timeout        | 16.40s / 296MB  | 16.58s / 1.15GB | 15.80s / 628MB

Page 26: Benchmark Queries (1)

<query1> {
  for $s in /site return
    for $p in $s/people return
      for $pe in $p/person return
        if ($pe/person_id="person0")
        then <result>{ $pe/name }</result>
        else ()
} </query1>

<query6> {
  for $site in //site return
    for $regions in $site/regions return
      $regions//item
} </query6>

Page 27: Benchmark Queries (2)

<query8> {
  for $root in (/) return
  for $site in $root/site return
  for $people in $site/people return
  for $person in $people/person return
    <item> { (
      <person>{ $person/name }</person>,
      <items_bought> {
        for $site2 in $root/site return
        for $cas in $site2/closed_auctions return
        for $ca in $cas/closed_auction return
        for $buyer in $ca/buyer return
          if ($buyer/buyer_person=$person/person_id)
          then <result> { $ca } </result>
          else ()
      } </items_bought>
    ) } </item>
} </query8>

Page 28: Benchmark Queries (3)

<query13> {
  for $site in /site return
    for $regions in $site/regions return
      for $australia in $regions/australia return
        for $item in $australia/item return
          <item> { (
            <name> { $item/name } </name>,
            <desc> { $item/description } </desc>
          ) } </item>
} </query13>

Page 29: Benchmark Queries (4)

<query20> {
  for $site in /site return
    for $people in $site/people return
      for $person in $people/person return
        if (fn:not(fn:exists($person/person_income)))
        then $person
        else ()
} </query20>

Page 30: Buffer Plot (1)

<query6> {
  for $site in //site return
    for $regions in $site/regions return
      $regions//item
} </query6>

[Figure: buffer plot for XMark Q6 on 10MB input document.]

According to the DTD, all regions occur at the beginning of the document.

Page 31: Buffer Plot (2)

<query8> {
  for $root in (/) return
  for $site in $root/site return
  for $people in $site/people return
  for $person in $people/person return
    <item> { (
      <person>{ $person/name }</person>,
      <items_bought> {
        for $site2 in $root/site return
        for $cas in $site2/closed_auctions return
        for $ca in $cas/closed_auction return
        for $buyer in $ca/buyer return
          if ($buyer/buyer_person=$person/person_id)
          then <result> { $ca } </result>
          else ()
      } </items_bought>
    ) } </item>
} </query8>

[Figure: buffer plot for XMark Q8 on 10MB input document. Annotations: first partition of join partners: persons; second partition of join partners: buyers.]

Page 32: Buffer Plot (3)

XQuery:

<r> {
  for $bib in /bib return (
    for $x in $bib/* return
      if (not(exists($x/price))) then $x else (),
    for $b in $bib/book return
      $b/title
  )
} </r>

[Figure: document structure with labels bib, (book|article)*, title, author, price; buffer plots for two input documents: 9 x article + 1 x book, and 9 x book + 1 x article.]

Page 33: The GCX Runtime Engine

[Figure: runtime components and their interaction. Components: Stream Preprojector, Buffer Manager, Buffer, Evaluator, XQuery, input stream, output stream. Labelled messages: nodes/roles; node lookup, garbage collection; getNext($x/π) answered with node/NULL; signOff($x/π,r) answered with OK; nextNode() answered with node/eos.]

Page 34: System Architecture

[Figure: system architecture. Components: XQuery, Normalized XQuery, Projection Paths, Roles, Rewritten XQuery (role updates), Stream Preprojector with Projection DFA (constructed lazily, assigns roles), Buffer (nodes & roles), Evaluator, input stream, output stream. Labelled edges: input; role updates.]

