+ All Categories
Home > Documents > On the Path to Efficient XML Queries

On the Path to Efficient XML Queries

Date post: 20-Jan-2016
Category:
Upload: decker
View: 23 times
Download: 0 times
Share this document with a friend
Description:
On the Path to Efficient XML Queries. Andrey Balmin, Kevin Beyer , Fatma Özcan IBM Almaden Research Center Matthias Nicola IBM Silicon Valley Lab. New languages = new abilities + new pitfalls. XQuery A new query language designed specifically for XML data SQL / XML - PowerPoint PPT Presentation
28
IBM Almaden Research Center © 2006 IBM Corporation On the Path to Efficient XML Queries Andrey Balmin, Kevin Beyer , Fatma Özcan IBM Almaden Research Center Matthias Nicola IBM Silicon Valley Lab
Transcript
Page 1: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

On the Path to Efficient XML Queries

Andrey Balmin, Kevin Beyer, Fatma ÖzcanIBM Almaden Research Center

Matthias NicolaIBM Silicon Valley Lab

Page 2: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

New languages = new abilities + new pitfalls

XQuery

– A new query language designed specifically for XML data

SQL / XML

– Added XML as a data type, including XQuery sequences

– Added XQuery as a sublanguage

Page 3: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Purpose

Teach users of the new XML query languages

Teach the teachers

Share our users’ experiences

Influence languages

Page 4: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Focus

Large databases with many moderate XML documents

Schema flexibility is required

– Many schemas in one collection

– No schema validation used

– Schemas with xs:any

– Documents like Atom Syndication and RSS that allow any extension

Therefore

– Document filtering is primary concern

– Limited type inference

– Any data is possible

Page 5: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Index eligibility

Index Eligibility:

We say that an index I is eligible to answer predicate P of query Q, if for any collection of XML documents D, the following holds:

Q(D) = Q( I( P,D )).

Where I( P,D ) is the set of XML documents produced, by probing index I with predicate P.

This is not as obvious as it is in relational databases.

Page 6: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

XML indexes in DB2

Index a linear XPath pattern over a column as a particular datatype

CREATE INDEX index-name ON table(xml-column) USING 'pattern' AS type

pattern ::= namespace-decls? (( / | // ) axis? ( name-test | kind-test ))+

axis ::= @ | child:: | attribute:: | self:: | descendant:: | descendant-or-self::

name-test ::= qname | * | ncname:* | *:ncname

kind-test ::= node() | text() | comment() | processing-instruction()

type ::= varchar | double | date | timestamp

Page 7: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Query pattern index pattern

CREATE INDEX li_price ON orders(orddoc)USING XMLPATTERN '//lineitem/@price' AS double

Can use the index: more restrictive

for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[ lineitem/@price > 100 ]return $i

Cannot use the index: less restrictive

for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[ lineitem/@* > 100 ]return $i

Page 8: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Match index and query predicate data type

CREATE INDEX li_price ON orders(orddoc)USING XMLPATTERN '//lineitem/@price' AS double

Can use the index: numeric predicate and index

for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[ lineitem/@price > 100 ]return $i

Cannot use the index: string predicate

for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[lineitem/@price > "100" ]return $i

Page 9: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Data Types for Joins

CREATE INDEX o_custid ON orders(orddoc) USING XMLPATTERN '//custid' AS double

CREATE INDEX c_custid ON customer(cdoc) USING XMLPATTERN '/customer/id' AS double

Cannot use the indexes: unknown comparison type

for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/orderfor $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customerwhere $i/custid = $j/idreturn $i

Can use the indexes: at least one cast required

for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/orderfor $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customerwhere $i/custid/xs:double(.) = $j/id/xs:double(.)return $i

Page 10: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

SQL/XML Query Functions

XMLQuery

Scalar function that returns an (possibly empty) XQuery sequence for every row

XMLExists

Predicate that returns true iff the XQuery sequence produced is not empty

XMLTable

Produces a table with one row for each item in the row-producing XQuery sequence, and with one column per column-producing XQuery expression. The columns may be XQuery sequences or cast to simple SQL types.

Page 11: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

XMLQuery does not filter rows (usually)

Cannot use the index:

SELECT XMLQuery(‘$order//lineitem[ @price > 100 ]

‘ passing orddoc as "order")FROM orders

Can use the index:

VALUES (XMLQuery(’ db2-fn:xmlcolumn("ORDERS.ORDDOC") //lineitem[ @price > 100 ] '))

Can use the index:

db2-fn:xmlcolumn('ORDERS.ORDDOC') //lineitem[ @price > 100 ]

(LI1, LI2)

()

(LI3)

()

(LI1, LI2, LI3)

LI1

LI2

LI3

Result

Page 12: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

XMLExists filter rows (usually)

Can use the index

SELECT ordid, orddocFROM ordersWHERE XMLExists(‘

$order//lineitem[ @price > 100 ]‘ passing orddoc as "order")

Cannot use the index: false exists

SELECT ordid, orddocFROM ordersWHERE XMLExists(‘

$order//lineitem/@price > 100‘ passing orddoc as "order")

Need XMLTest which uses XQuery’s Effective Boolean Value

Page 13: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

XMLQuery + XMLExists vs. XMLTable Can use the index

SELECT ordid, XMLQuery(‘$order//lineitem[@price > 100]

’ passing orddoc as "order")FROM ordersWHERE XMLExists(‘

$order//lineitem[@price > 100] ’ passing orddoc as "order")

XMLTable: More efficient and less redundant

SELECT o.ordid, t.lineitemFROM orders o,XMLTable(‘

$order//lineitem[@price > 100] ’ passing o.orddoc as "order“ COLUMNS "lineitem" XML BY REF PATH '.')as t(lineitem)

Page 14: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Predicates in XMLTable column expressions Can use the index

SELECT o.ordid, t.lineitemFROM orders o,XMLTable(‘

$order//lineitem[@price > 100] ’ passing o.orddoc as "order“ COLUMNS "lineitem" XML BY REF PATH '.')as t(lineitem)

Cannot use the index

SELECT o.ordid, t.lineitem, t.priceFROM orders o, XMLTable(‘

$order//lineitem ’ passing o.orddoc as "order" COLUMNS "lineitem" XML BY REF PATH '.', "price" DECIMAL(6,3) PATH '@price[. > 100]‘ )as t(lineitem, price)

1 LI1

1 LI2

3 LI3

1 LI4 null

1 LI1 175

1 LI2 150

2 LI5 null

2 LI6 null

3 LI3 201

3 LI7 null

Page 15: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Joining XML Values in SQL/XML

Can use index on product/id, but not p.id

SELECT p.name, o.orddocFROM products p, orders oWHERE XMLExists(‘

$order//lineitem/product[ id eq $pid ] ‘ passing o. orddoc as "order", p.id as "pid")

Can use index on p.id, but not product/id

SELECT p.name, o.orddocFROM products p, orders oWHERE p.id = XMLCast( XMLQuery(‘

$order//lineitem/product/id ‘ passing o. orddoc as "order") as VARCHAR(13))

Need to unify XQuery and SQL data types

Page 16: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Joining XML Values in SQL/XML Probably cannot use XML indexes: SQL types differ from XML

SELECT c.name, o.orddocFROM orders o, customer cWHERE XMLCast( XMLQuery(‘

$order/order/custid ’ passing o.orddoc as "order") as DOUBLE) =XMLCast( XMLQuery(‘

$cust/customer/id ’ passing c.cdoc as "cust") as DOUBLE)

Can use XML indexes

SELECT c.name, o.orddocFROM orders o, customer cWHERE XMLExists(‘

$order/order[ custid/xs:double(.) = $cust/customer/id/xs:double(.) ]

‘ passing o.orddoc as "order", c.cdoc as "cust")

Page 17: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

XQuery Let Clauses

Can use the index

for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')for $item in $doc//lineitem[ @price > 100 ]return <result>{ $item }</result>

for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/orderreturn $ord/lineitem[ @price > 100 ]

Cannot use the index

for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')let $item:= $doc//lineitem[ @price > 100 ]return <result>{ $item }</result>

for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/orderreturn <result>{ $ord/lineitem[ @price > 100 ]}</result>

R/L1

R/L2

R/L3

R/L1,L2

R

R/L3

R

Page 18: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

XQuery Let Clauses

Can use the index

for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/orderwhere $ord/lineitem/@price > 100return <result>{ $ord/lineitem }</result>

Same as above

for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/orderlet $price := $ord/lineitem/@pricewhere $price > 100return <result>{ $ord/lineitem }</result>

R/L4,L1,L2

R/L3,L7

Page 19: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Context is everything

$i is bound to the document node

for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') return $i/order/lineitem

$j is bound to <my_order>

for $j in ( for $o in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order return <my_order>{ $o/* }</my_order> )return $j/my_order/lineitem

Page 20: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Remember the dot

Produces a type error: no document node at root

let $order := <new_order>{ db2-fn:xmlcolumn('ORDERS.ORDDOC') /order[custid > 1001] } </new_order>return $order[ //customer/name ]

Absolute path expressions is a shorthand for

fn:root(.) treat as document-node().

Absolute path expressions are bad style

Page 21: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Construction and View Composition

Want to rewrite this…

let $view := for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')/ order/lineitem return <ordered>{ $i/@quantity, $i/product/@price,

<pid>{ $i/product/id/data(.) }</pid> }</ordered>

for $j in $viewwhere $j/pid = '17‘return $j/@price

Page 22: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Construction and View Composition

… into this

for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') /order/lineitemwhere $i/product/id/data(.) = '17‘return $i/product/@price

but…

Page 23: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Construction and View Composition

Data type changed to untypedAtomic

– id is string: comparison is now an error

– id is long: comparison is now as double instead of long

List types are concatenated

Error for duplicate @price attributes lost

New node identity lost

Parent axis is broken

Any sequence should live in tree without change.Separate identity from construction?

Page 24: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Remember the namespaces

Index definition and query must match namespaces

CREATE INDEX li_price ON orders(orddoc)USING XMLPATTERN '//lineitem/@price' AS double

Cannot use the index. Which is right?

declare default element namespace "http://ournamespaces.com/order";

for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[ lineitem/@price > 100 ]return $i

Page 25: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Elements and text nodes differ

CREATE INDEX PRICE_TEXT ON orders.orddocUSING XMLPATTERN '//price' AS varchar

Can not use index

for $ord in db2-fn:xmlcolumn(“ORDERS.ORDDOC”) /order[ lineitem/price/text() = “99.50” ]return $ord

Element might have more data than just text

<price>99.50<unit>USD</unit></price>

Page 26: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Attributes are shy

No attributes

//*//node()

Only attributes

//@*//attribute::node()

Empty result due to “principle node kind”

//@*/self:*

Page 27: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Between predicates are not obvious

Might not be between: multiple prices

lineitem[ price > 100 and price < 200 ]

Between or error

lineitem[ price gt 100 and price lt 200 ]

Always between

lineitem/price/data()[ . > 100 and . < 200 ]

Between if not list type

lineitem/price[ . > 100 and . < 200 ]

Between if not list type

lineitem[ @price > 100 and @price < 200 ]

Page 28: On the Path to Efficient XML Queries

IBM Almaden Research Center

© 2006 IBM Corporation

Conclusions

Easy to make mistakes without schema constraints

Many subtle differences in expressions

Improve construction composition

Unify SQL and XQuery type systems

Add XMLTest to SQL/XML


Recommended