XQuery Updates in MonetDB/XQuery


ADT 2008

Lecture 5

XQuery Updates in MonetDB/XQuery

Stefan Manegold
Stefan.Manegold@cwi.nl

http://www.cwi.nl/~manegold/


Stefan.Manegold@CWI.nl Lecture 5: XQuery Updates ADT 2008

Staircase Join [VLDB03]

Generate a duplicate-free result in document order:
• pruning: reduce the context set a priori
• partitioning: single sequential pass over the document
• skipping: avoid touching node ranges that cannot contain results

[Figure: a list of context nodes over the document; the join alternates seek, scan and skip phases]
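The three rules can be made concrete for the descendant axis. The following is a minimal Python sketch, not the [VLDB03] implementation: it assumes nodes are identified by pre rank and that a list `size` gives each node's number of descendants, so the descendants of v are the pre ranks v+1 to v+size[v]. The function name `staircase_descendant` is hypothetical, and the published join works on the pre/post plane and covers more axes than this.

```python
from typing import List


def staircase_descendant(size: List[int], context: List[int]) -> List[int]:
    """Descendant step over a pre/size encoding.

    size[p] = number of descendants of the node with pre rank p.
    context = pre ranks of the context nodes, in document order.
    Returns all descendants, duplicate-free and in document order.
    """
    # Pruning: a context node inside the subtree of the last kept context
    # node cannot add new results, so drop it (this also drops duplicates).
    pruned: List[int] = []
    for c in context:
        if pruned and c <= pruned[-1] + size[pruned[-1]]:
            continue
        pruned.append(c)

    # Partitioning: the remaining subtrees are disjoint and ordered, so a
    # single left-to-right pass produces the result in document order.
    result: List[int] = []
    for c in pruned:
        # Skipping: jump directly to c's first descendant; nodes between the
        # previous subtree and c cannot be descendants of any context node.
        result.extend(range(c + 1, c + size[c] + 1))
    return result


# Example document a(b(c(d, e)), f(g, h(i, j))), pre ranks 0..9 for a..j
SIZE = [9, 3, 2, 0, 0, 4, 0, 2, 0, 0]
print(staircase_descendant(SIZE, [1, 2, 7]))  # [2, 3, 4, 8, 9]
```

Context node 2 (c) is pruned because it lies inside the subtree of 1 (b), so its descendants are never visited twice.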


Loop-lifted XPath Steps

Many algorithms have been proposed & studied for XPath evaluation:
• Dataguide-based
• Structural Join
• Staircase Join
• Holistic Twig Join

IN: sequence of context nodes (in doc order)
OUT: sequence of document nodes (unique, in doc order)


Loop-lifted XPath Steps

In XQuery, expressions generally occur inside FLWOR blocks, i.e., inside a for-loop:

for $x in doc()//employee return $x/ancestor::department

Choice:
• call the XPath algorithm N times, accessing document and index structures N times
• use a loop-lifted algorithm:
  IN: for each iteration, a sequence of context nodes
  OUT: for each iteration, a sequence of document nodes (unique per iteration, in doc order)
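To pin down the loop-lifted interface, here is a minimal Python sketch with hypothetical names and a toy pre/size encoding. IN and OUT are represented as (iter, pre) pairs, and the first option is implemented naively: a separate ancestor step per iteration, as in $x/ancestor::department. A loop-lifted algorithm has to return exactly the same pairs while touching the document only once.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ancestors(size: List[int], ctx: List[int]) -> List[int]:
    """Single-context-set ancestor step: a is an ancestor of v iff a < v <= a + size[a]."""
    out = set()
    for v in ctx:
        for a in range(v):  # deliberately naive: test every node before v
            if v <= a + size[a]:
                out.add(a)
    return sorted(out)


def loop_lifted_naive(size: List[int], pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """IN: (iter, pre) context pairs. OUT: (iter, pre) result pairs,
    unique per iteration and in document order within each iteration."""
    by_iter: Dict[int, List[int]] = defaultdict(list)
    for it, pre in pairs:
        by_iter[it].append(pre)
    result: List[Tuple[int, int]] = []
    for it in sorted(by_iter):
        # The "call N times" option: one full XPath evaluation per iteration.
        result.extend((it, a) for a in ancestors(size, by_iter[it]))
    return result


SIZE = [9, 3, 2, 0, 0, 4, 0, 2, 0, 0]  # a(b(c(d, e)), f(g, h(i, j)))
print(loop_lifted_naive(SIZE, [(1, 3), (2, 8), (2, 6)]))
# [(1, 0), (1, 1), (1, 2), (2, 0), (2, 5), (2, 7)]
```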


Staircase join

[Figure: a list of context nodes over the document]


Loop-lifted staircase join

[Figure: multiple lists of context nodes over the document, processed with an active stack]

Adapt the pruning, partitioning and skipping rules to deal correctly with multiple context sets.
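A simplified single-pass version for the descendant axis shows how the rules can carry over to multiple context sets. This Python sketch is an illustration under assumptions, not the published algorithm, and its names are hypothetical. An active stack holds the context nodes whose subtree encloses the current position, each tagged with the iterations it activated. Pruning is applied per iteration, and the scan skips ahead whenever the stack is empty.

```python
from typing import Dict, List, Set, Tuple


def loop_lifted_descendant(size: List[int], pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """(iter, pre) context pairs in, (iter, descendant pre) pairs out."""
    by_pre: Dict[int, Set[int]] = {}
    for it, pre in pairs:
        by_pre.setdefault(pre, set()).add(it)
    starts = sorted(by_pre)  # all context nodes, in document order
    out: List[Tuple[int, int]] = []
    stack: List[Tuple[int, Set[int]]] = []  # (last pre of subtree, iterations activated here)
    k = 0  # next unconsumed context node
    pos = starts[0] if starts else len(size)
    while pos < len(size):
        # Every stack entry is a proper ancestor of pos: emit for its iterations.
        for _, iters in stack:
            out.extend((it, pos) for it in iters)
        if k < len(starts) and starts[k] == pos:
            active = set().union(*(s for _, s in stack))
            fresh = by_pre[pos] - active  # pruning, per iteration
            if fresh:
                stack.append((pos + size[pos], fresh))
            k += 1
        pos += 1
        while stack and stack[-1][0] < pos:
            stack.pop()  # this context node's subtree has been fully scanned
        if not stack:
            if k == len(starts):
                break
            pos = starts[k]  # skipping: jump to the next context node
    out.sort()  # group by iteration, document order within each group
    return out


SIZE = [9, 3, 2, 0, 0, 4, 0, 2, 0, 0]  # a(b(c(d, e)), f(g, h(i, j)))
print(loop_lifted_descendant(SIZE, [(1, 1), (2, 5), (2, 7), (1, 2)]))
# [(1, 2), (1, 3), (1, 4), (2, 6), (2, 7), (2, 8), (2, 9)]
```

Because each iteration is active in at most one stack entry at a time, results are unique per iteration without a separate duplicate-elimination step; the final sort only regroups them by iteration.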


Loop-lifted staircase join

Results on the 20 XMark queries:


Schedule

• 15.09.2008: RDBMS back-end support for XML/XQuery (1/2)
  – Document Representation (XPath Accelerator, Pre/Post plane)
  – XPath navigation (Staircase Join)
• 22.09.2008: XQuery to Relational Algebra Compiler
  – Item & Sequence Representation
  – Efficient FLWoR Evaluation (Loop-Lifting)
  – Optimization
• 29.09.2008: RDBMS back-end support for XML/XQuery (2/2)
  – Updateable Document Representation
  – Other (DB) approaches to XML/XQuery processing


What is MonetDB?

• Main-memory based DBMS backend/kernel
• Developed at CWI since 1992
• “Query-intensive” applications
  – Data mining
  – Data warehousing / decision support
  – Multimedia information retrieval (text, images, audio, video, XML, ...)
  – XML databases
  – GIS
• Part of Data Distilleries' products
  – CWI spin-off company
  – (>100 GB) databases at ABN Amro, Postbank, Ohra, Spaarbeleg, FBTO, Centerparcs, Vodafone
  – Nowadays: part of SPSS


MonetDB: Motivation (1/2)

• Relational DBMSs dominate the scene
  – Oracle, SQL Server, DB2
• Databases a solved problem? No!

Problems:
• performance
  – new ‘query-intensive’ applications (data mining, etc.)
• extensibility
  – new applications (GIS, text, image, audio, video, XML)


MonetDB: Motivation (2/2)

• Are relational DBMSs fit for the job?
  – developed in the late 1970s / early 1980s
• Hardware has changed
  – CPUs get faster but more vulnerable (to cache misses and pipeline stalls)
  – capacity and bandwidth follow Moore's law
  – latency becomes a bottleneck (I/O and RAM)
• Applications have changed
  – RDBMSs are tuned for transaction processing
  – not query-intensive
  – only the business domain


Transactions (OLTP)


OLAP, Data Mining


How is MonetDB Different?

• full vertical fragmentation: always!
  – everything in binary (2-column) tables
  – saves you from table-scan hell in OLAP and Data Mining
• the RISC approach to databases
  – simple data model, simple query language
  – don't need (to pay for) a buffer manager => manage virtual memory
  – explicit transaction management => DIY approach to ACID
• CPU and memory cache optimized
  – programming team experienced in main-memory DBMS techniques
  – use of scientific programming optimizations (loop unrolling)
  – cache-conscious data structures and algorithms
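Full vertical fragmentation can be illustrated in a few lines of Python. The sketch below is an assumption-level illustration, not MonetDB code: it splits a hypothetical row table into one (oid, value) table per column, so a query that needs one column reads only that column, and it rebuilds a tuple by looking up the same oid in every column.

```python
from typing import Dict, List, Tuple

BinaryTable = List[Tuple[int, object]]  # (oid, value) pairs


def fragment(rows: List[Dict[str, object]], columns: List[str]) -> Dict[str, BinaryTable]:
    """Full vertical fragmentation: one binary (oid, value) table per column."""
    return {col: [(oid, row[col]) for oid, row in enumerate(rows)] for col in columns}


def reconstruct(tables: Dict[str, BinaryTable], oid: int) -> Dict[str, object]:
    """Rebuild one tuple. The oids are dense (0, 1, 2, ...), so oid is also the position."""
    return {col: table[oid][1] for col, table in tables.items()}


books = [  # hypothetical example table
    {"title": "XQuery", "year": 2007, "price": 40},
    {"title": "XPath", "year": 2005, "price": 25},
]
tables = fragment(books, ["title", "year", "price"])
# An aggregate over one column scans only that column's binary table:
print(sum(v for _, v in tables["price"]))  # 65
print(reconstruct(tables, 1))  # {'title': 'XPath', 'year': 2005, 'price': 25}
```

The trade-off is visible even here: single-column scans get cheaper, while rebuilding a whole tuple costs one lookup per column.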


MonetDB: Shopping List

• A quantum leap in performance requires a quantum leap in technology (and risk)

• Better support for non-administrative applications, using:
  – Multi-model database kernel support
  – Extensible data types, operators, accelerators
  – Database hot-set is memory resident (but scales to TB)
  – Use simple data structures
  – Index management should be automatic
  – Algebraic language as the computational model
  – Query optimization = strategic + tactical + operational optimization
  – Dynamic optimization, parallelism, JIT-compile-link-run
  – Cooperative (application) transaction management
  – Do not replicate the operating system


Storing Relations in MonetDB


Relational Mapping


Object-Oriented Mapping


BAT Data Structure

BAT: binary association table

BUN: binary unit

BUN heap:
  – consecutive memory block (array)
  – memory-mapped file

Accelerators: hash tables, T-trees, R-trees, ...
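A toy model helps fix the terminology. This Python sketch is a simplification under stated assumptions (integer heads and tails only, hypothetical class name `ToyBAT`): BUNs are stored back to back in one contiguous typed array, mirroring the consecutive-memory-block BUN heap, and a hash table is built lazily as an accelerator for equality selection on the tail.

```python
from array import array
from typing import Dict, List, Optional, Tuple


class ToyBAT:
    """Toy BAT over integer heads and tails.

    BUN i is (buns[2*i], buns[2*i+1]); all BUNs live in one contiguous
    typed array, standing in for the BUN heap.
    """

    def __init__(self) -> None:
        self.buns = array("q")
        self._hash: Optional[Dict[int, List[int]]] = None  # tail value -> BUN positions

    def __len__(self) -> int:
        return len(self.buns) // 2

    def append(self, head: int, tail: int) -> None:
        self.buns.extend((head, tail))
        self._hash = None  # simplest policy: drop the accelerator on update

    def bun(self, i: int) -> Tuple[int, int]:
        return self.buns[2 * i], self.buns[2 * i + 1]

    def select_eq(self, value: int) -> List[int]:
        """Heads of all BUNs whose tail equals value, via a lazily built hash table."""
        if self._hash is None:
            self._hash = {}
            for i in range(len(self)):
                self._hash.setdefault(self.buns[2 * i + 1], []).append(i)
        return [self.buns[2 * i] for i in self._hash.get(value, [])]


b = ToyBAT()
for h, t in [(0, 7), (1, 3), (2, 7)]:
    b.append(h, t)
print(len(b), b.bun(1), b.select_eq(7))  # 3 (1, 3) [0, 2]
```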


BAT Storage Optimizations

Dense ascending sequence
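When a column is a dense ascending sequence, it does not have to be stored at all. Here is a minimal Python sketch, assuming integer tails and a hypothetical class name. Only the tail is kept. Head values and lookups by head are plain arithmetic on the starting value, called `seqbase` here.

```python
from array import array
from typing import List


class DenseHeadBAT:
    """BAT whose head is the dense ascending sequence seqbase, seqbase+1, ...

    Only the tail is materialized; head values are computed, never stored.
    """

    def __init__(self, seqbase: int, tail: List[int]) -> None:
        self.seqbase = seqbase
        self.tail = array("q", tail)

    def head(self, i: int) -> int:
        return self.seqbase + i

    def find(self, oid: int) -> int:
        """Tail value for head oid: a positional lookup, no search structure needed."""
        pos = oid - self.seqbase
        if not 0 <= pos < len(self.tail):
            raise KeyError(oid)
        return self.tail[pos]


b = DenseHeadBAT(100, [7, 8, 9])
print(b.head(2), b.find(101))  # 102 8
```

Besides halving storage for such a BAT, this turns a join on the head into array indexing.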


BAT Property Management

type - (physical) type number
enum - enumerated type flag
dense - dense ascending range
sorted - ascending head sorting
constant - all equal values
align - unique sequence id
key - no duplicates on column
set - no duplicates in BAT
hash - accelerator flag
Ttree - accelerator flag
mirrored - head=tail value
count - cardinality
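Properties like these pay off because an operator can choose a cheaper algorithm when a property holds. The Python sketch below uses hypothetical function names and covers only count, sorted, key, dense and constant, computed from scratch on one column rather than maintained across updates. It derives the properties and uses them to choose among arithmetic, binary-search and scan-based range selection.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List


def column_properties(col: List[int]) -> Dict[str, object]:
    """Derive a few BAT-style properties of one column."""
    pairs = list(zip(col, col[1:]))
    return {
        "count": len(col),                           # cardinality
        "sorted": all(a <= b for a, b in pairs),     # ascending
        "key": len(set(col)) == len(col),            # no duplicates
        "dense": all(b == a + 1 for a, b in pairs),  # consecutive integers
        "constant": all(a == b for a, b in pairs),   # all values equal
    }


def range_select(col: List[int], props: Dict[str, object], lo: int, hi: int) -> List[int]:
    """Positions i with lo <= col[i] <= hi; the properties pick the algorithm."""
    if props["dense"] and col:
        # Dense: positions follow by arithmetic; only the first value is read.
        start, end = max(lo - col[0], 0), min(hi - col[0], len(col) - 1)
        return list(range(start, end + 1))
    if props["sorted"]:
        # Sorted: two binary searches bound the qualifying range.
        return list(range(bisect_left(col, lo), bisect_right(col, hi)))
    return [i for i, v in enumerate(col) if lo <= v <= hi]  # fall back to a scan


for col in ([3, 4, 5, 6], [1, 1, 2, 9], [5, 1, 3]):
    props = column_properties(col)
    print(props, range_select(col, props, 2, 5))
```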


XML/XQuery Updates

XQuery Update Facility 1.0, W3C Candidate Recommendation: http://www.w3.org/TR/xquery-update-10/

• Categorize updates into
  – Value updates
  – Structural updates

(MonetDB/XQuery does not yet support the latest syntax changes made by W3C; for details see http://monetdb.cwi.nl/XQuery/Documentation/XQuery-Updates.html)


Value Updates

do replace value of fn:doc("bib.xml")/books/book[1]/price with fn:doc("bib.xml")/books/book[1]/price * 1.1

do replace value of fn:doc("bib.xml")/books/book[2]/@isbn with "90-6196-517-9"

do rename fn:doc("bib.xml")/books/book[3]/author[1] into "primary-author"

do rename fn:doc("bib.xml")/journals/journal[9]/@isbn into "issn"

=> map directly to simple value updates in relational storage
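Why value updates are cheap can be seen on a toy node table. In this Python sketch the column layout and function names are hypothetical, and storing text as a separate node is an assumption about the storage scheme. Replacing a value or renaming a node overwrites a single cell, while the structural columns size and level stay untouched.

```python
from typing import Dict, List


def make_doc() -> Dict[str, List]:
    """Hypothetical node table for <books><book><price>10</price></book></books>:
    one list per column, indexed by pre rank; the text "10" is its own node."""
    return {
        "size":  [3, 2, 1, 0],
        "level": [0, 1, 2, 3],
        "kind":  ["elem", "elem", "elem", "text"],
        "name":  ["books", "book", "price", None],
        "value": [None, None, None, "10"],
    }


def replace_value_of(doc: Dict[str, List], pre: int, new_text: str) -> None:
    """Overwrite the value of an element's single text child (sketch assumption)."""
    child = pre + 1
    if doc["size"][pre] != 1 or doc["kind"][child] != "text":
        raise ValueError("this sketch only handles an element with one text child")
    doc["value"][child] = new_text  # one cell changes; no pre rank moves


def rename(doc: Dict[str, List], pre: int, new_name: str) -> None:
    doc["name"][pre] = new_name  # again a single-cell update


doc = make_doc()
replace_value_of(doc, 2, "11")
rename(doc, 1, "publication")
print(doc["value"], doc["name"])
```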


Structural Updates

do insert attribute isbn {"90-6196-517"} into fn:doc("bib.xml")/books/book[17]

do delete fn:doc("bib.xml")/books/book[2]/@wrong

do insert <author>Stefan Manegold</author> after fn:doc("bib.xml")/books/book[33]/author[last()]

do replace fn:doc("bib.xml")/books/book[44]/author[1] with fn:doc("bib.xml")/books/book[33]/author[last()]

do delete fn:doc("bib.xml")/books/book[author = "Kermit"]

=> How to implement on pre-/post-encoding?


XML/XQuery Updates

do insert <k><l/><m/></k> as first into /a/f/g
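Applied naively to a dense pre/size/level table, this insert shows the problem. The Python sketch below uses a hypothetical function and assumes the example document a(b(c(d, e)), f(g, h(i, j))), in which /a/f/g has pre rank 6. Every ancestor of the insertion point gets a larger size, and every node that follows the new subtree in document order receives a new pre rank, so the cost grows with the part of the document after the insertion point.

```python
from typing import Dict, List, Tuple


def insert_as_first_into(doc: Dict[str, list], target: int,
                         subtree: List[Tuple[int, int, str]]) -> Tuple[List[int], int]:
    """Naive insert into a dense pre/size/level table.

    subtree: (size, level relative to its root, name) per node, in pre order.
    Returns (pre ranks whose size grew, number of following nodes whose pre shifted).
    """
    size, level, name = doc["size"], doc["level"], doc["name"]
    k = len(subtree)
    # Every ancestor-or-self of the target gains k descendants.
    grown = [a for a in range(target + 1) if target <= a + size[a]]
    for a in grown:
        size[a] += k
    at = target + 1  # "as first": directly after the target in document order
    for i, (s, rel, n) in enumerate(subtree):
        size.insert(at + i, s)
        level.insert(at + i, level[target] + 1 + rel)
        name.insert(at + i, n)
    # All nodes after the new subtree now sit k positions later.
    return grown, len(size) - (at + k)


doc = {  # a(b(c(d, e)), f(g, h(i, j)))
    "size":  [9, 3, 2, 0, 0, 4, 0, 2, 0, 0],
    "level": [0, 1, 2, 3, 3, 1, 2, 2, 3, 3],
    "name":  list("abcdefghij"),
}
# do insert <k><l/><m/></k> as first into /a/f/g
print(insert_as_first_into(doc, 6, [(2, 0, "k"), (0, 1, "l"), (0, 1, "m")]))
# ([0, 5, 6], 3)
print("".join(doc["name"]))  # abcdefgklmhij
```

Here only h, i and j shift, but an insert near the start of a large document would renumber almost every node.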


XML/XQuery Updates

[Figure: Staircase Join]


XML Storage Revisited

pre  post  size  level  name
0    9     9     0      a
1    3     3     1      b
2    2     2     2      c
3    0     0     3      d
4    1     0     3      e
5    8     4     1      f
6    4     0     2      g
7    7     2     2      h
8    5     0     3      i
9    6     0     3      j

post = pre + size - level

[Figure: the same nodes (node ids N0–N9) in a rid|size|level table that contains unused (null) rows]

• Allow holes
• Define logical pages
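The columns of such a table can be derived in one depth-first traversal. This Python sketch uses hypothetical names and a nested-tuple input, and it checks post = pre + size - level on the example document a(b(c(d, e)), f(g, h(i, j))). The formula holds because the nodes closed before v are its size descendants plus the pre nodes that precede v in document order, minus its level ancestors, which are still open.

```python
from typing import List, Tuple

Node = Tuple[str, list]  # (name, children)


def encode(root: Node) -> List[Tuple[str, int, int, int, int]]:
    """Return (name, pre, post, size, level) for every node, in pre order."""
    rows: List[list] = []
    post = 0

    def visit(node: Node, level: int) -> None:
        nonlocal post
        name, children = node
        row = [name, len(rows), -1, 0, level]  # pre = number of nodes seen so far
        rows.append(row)
        for child in children:
            visit(child, level + 1)
        row[3] = len(rows) - row[1] - 1  # nodes appended since = descendants
        row[2] = post                    # post rank assigned when the node is left
        post += 1

    visit(root, 0)
    return [tuple(r) for r in rows]


doc = ("a", [("b", [("c", [("d", []), ("e", [])])]),
             ("f", [("g", []), ("h", [("i", []), ("j", [])])])])
for name, pre, post, size, level in encode(doc):
    assert post == pre + size - level
    print(name, pre, post, size, level)
```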


XML Storage Revisited

[Figure: the pre|size|level table next to the rid|size|level table with holes, whose logical pages are stored out of order; a page|map table (0 → 0, 1 → 2, 2 → 1) relates logical to physical pages]

rid = pre.swizzle( )
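One plausible reading of rid = pre.swizzle() is a page-level indirection. The Python sketch below assumes a fixed, hypothetical page size of 4 tuples and a page map from logical to physical page; the function names are also hypothetical. The pre rank is split into page number and offset, and only the page number is translated. Inserting a new logical page then only edits the small page map, while the rid table itself stays append-only.

```python
from typing import List

PAGE_BITS = 2               # hypothetical: 4 tuples per logical page
PAGE_SIZE = 1 << PAGE_BITS


def swizzle(pre: int, page_map: List[int]) -> int:
    """Translate a logical pre rank into a physical rid via the page map."""
    page, offset = pre >> PAGE_BITS, pre & (PAGE_SIZE - 1)
    return page_map[page] * PAGE_SIZE + offset


def insert_page_after(page_map: List[int], logical_page: int) -> int:
    """Add a fresh page logically after logical_page.

    The new page goes physically at the end (append-only storage);
    only the page map changes, and no existing rid moves.
    """
    physical = len(page_map)
    page_map.insert(logical_page + 1, physical)
    return physical


page_map = [0, 1]
insert_page_after(page_map, 0)  # page_map is now [0, 2, 1]
print([swizzle(p, page_map) for p in range(12)])
# [0, 1, 2, 3, 8, 9, 10, 11, 4, 5, 6, 7]
```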


XML Storage Revisited

Update-friendly:
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column

MonetDB:
• rid not stored but computed (virtual oid)
• allows positional lookup/join

Opportunity currently not exploited by other RDBMSs.

Occurs widely in our XQuery translation.
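The rid properties listed above can be sketched in Python as follows. This is an illustration under assumptions, not MonetDB's storage code: the class and method names are hypothetical, a hole is marked by a level of None, and size bookkeeping for ancestors is left out. It shows that rids are never stored, that appends never move existing tuples, and that a lookup by rid is plain list indexing.

```python
from typing import List, Optional


class RidTable:
    """Append-only node table; a tuple's rid is its position and is never stored."""

    def __init__(self) -> None:
        self.size: List[int] = []
        self.level: List[Optional[int]] = []  # None marks an unused tuple (hole)
        self.name: List[str] = []

    def append(self, size: int, level: int, name: str) -> int:
        self.size.append(size)
        self.level.append(level)
        self.name.append(name)
        return len(self.size) - 1  # rid = autoincrement position (virtual oid)

    def append_hole(self) -> int:
        """Reserve an unused tuple for a later insert."""
        self.size.append(0)
        self.level.append(None)
        self.name.append("")
        return len(self.size) - 1

    def fill(self, rid: int, size: int, level: int, name: str) -> None:
        """Place a new node into a hole; no other tuple moves."""
        if self.level[rid] is not None:
            raise ValueError(f"rid {rid} is in use")
        self.size[rid], self.level[rid], self.name[rid] = size, level, name

    def fetch(self, rids: List[int]) -> list:
        """Positional join: each rid is used directly as a list index; holes are skipped."""
        return [(self.name[r], self.size[r], self.level[r])
                for r in rids if self.level[r] is not None]


t = RidTable()
a = t.append(1, 0, "a")
b = t.append(0, 1, "b")
hole = t.append_hole()
print(t.fetch([a, b, hole]))  # [('a', 1, 0), ('b', 0, 1)]
t.fill(hole, 0, 1, "c")       # an insert lands in the hole; no rid changes
print(t.fetch([a, b, hole]))  # [('a', 1, 0), ('b', 0, 1), ('c', 0, 1)]
```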

[Figure: the rid|size|level table with node ids N0–N9 and unused (null) rows]


MonetDB/XQuery

Our own XML DBMS with (almost...) full XQuery support.
• Built purely on an RDBMS, namely MonetDB
• Future: also middleware support (P2P!!) in AmbientDB

Pathfinder compiler & “staircase join” (see later):
– Technical University Munich (Torsten Grust, et al.)
– Technical University Twente (Maurice van Keulen, et al.)

MonetDB High-Performance DBMS:
– CWI Amsterdam (Peter Boncz, Stefan Manegold, ...)

Useful for:
• Large XML databases!
• Querying XML annotations (multimedia, forensic NFI)

[Figure: XQuery → Pathfinder compiler → relational algebra → RDBMS (MonetDB)]


Current Projects

• Value indices & runtime optimization
  – Code freeze, release this week
• Algebraic query optimization
  – Some released, most still in the development version
• Distributed XQuery / P2P XQuery
  – SOAP group communication, XQuery RPC
  – VLDB'07 [Zhang, Boncz]
• Benchmarking beyond XMark
  – ExpDB'06 Workshop [Manegold]
• Support for XML interval annotations
  – XIME-P'06 Workshop [Alink et al.]
• ...


Conclusions

• Relational approach can be scalable & fast
  – MonetDB/XQuery compares favorably with all other available systems
• Techniques that made it work:
  – Property-driven peephole optimization: order & other properties
  – Loop-lifted XPath steps: evaluate sets of context nodes in a single pass
  – Support for dense (autoincrement) keys: positional lookup
• Background information & literature:
  – http://monetdb-xquery.org
  – http://pathfinder-xquery.org


Exam / Tentamen

Tuesday, October 21, 2008

9:45 – 11:45

REC-G S.14