Querying XML Data in DB2

Native XML Support in DB2 Universal Database

Matthias Nicola, Bert van der Linden
IBM Silicon Valley Lab

Presented by Mo Liu, Joseph Frate, and John Russo

Some material in the talk is adapted from the slides of this paper’s conference talk.

Agenda

• What is DB2 9 (Viper)?
• Native XML in the forthcoming version of DB2
• Native XML Storage
• XML Schema Support
• XML indexes
• Querying XML data in DB2
• Summary

What is DB2 9 (Viper)?

IBM DB2 9 is the next-generation hybrid data server with optimized management of both XML and relational data.

IBM extended DB2 to include:
• New storage techniques for efficient management of hierarchical structures inherent in XML documents
• New indexing technology
• New query language support (for XQuery), a new graphical query builder (for XQuery), and new query optimization techniques
• New support for validating XML data based on user-supplied schemas
• New administrative capabilities, including extensions to key database utilities
• Integration with popular application programming interfaces (APIs)

XML Databases

XML-enabled Databases
• The core data model is not XML (but e.g. relational)
• Mapping between XML data model and DB’s data model is required, or XML is stored as text
• E.g.: DB2 XML Extender v8

Native XML Databases
• Use the hierarchical XML data model to store and process XML internally
• No mapping, no storage as text
• Storage format = processing format
• E.g.: Forthcoming version of DB2

XML in Relational Databases – Today's Challenge

Today’s Challenge: XML must be force fit into the relational data model. 2 choices:

1. Shredding or decomposing
− Mapping from XML to relational often too complex
− Loses hierarchical dependencies
− Loses digital signature
− Often requires dozens or hundreds of tables
− Difficult to change original XML document

2. Large Object (BLOB, CLOB, Varchar)
It allows for fast insert and retrieval of full documents, but it needs XML parsing at query execution time.
− SLOW performance
− Search performance is slow (must parse at search time)
− Retrieval of sub-documents is expensive
− Update inside the document is slow
− Indexing is inefficient (based on relative position)
− Difficult to join with relational
− Costs get worse as document size increases

DB2 Hybrid XML Engine - Overview

Integration of XML & Relational Capabilities in DB2
• Native XML data type (not Varchar, not CLOB, not object-relational)
• XML Capabilities in all DB2 components
• Applications combine XML & relational data

Integrating XML and Relational in DB2

DB2 Hybrid XML Engine - Interfaces

Data Definition
    create table dept(deptID char(8), deptdoc xml);

Insert
    insert into dept(deptID, deptdoc) values (?,?)

Index
    create index xmlindex1 on dept(deptdoc)
      generate key using xmlpattern '/dept/name' as sql varchar(30);

Retrieve
    select deptdoc from dept where deptID = ?

SQL based Query
    select deptID, xmlquery('$d/dept/name' passing deptdoc as "d")
    from dept where deptID <> 'PR27';

XQuery based Query
    for $book in db2-fn:xmlcolumn('BOOKS')/book
    for $entry in db2-fn:xmlcolumn('REVIEWS')/entry
    where $book/title = $entry/title
    return <review> {$entry/review/text()} </review>;

Native XML Storage

Efficient Document Tree Storage

Information for Every Node
• Tag name, encoded as unique StringID
• A nodeID
• Node kind (e.g. element, attribute, etc.)
• Namespace / Namespace prefix
• Type annotation
• Pointer to parent
• Array of child pointers
• Hints to the kind & name of child nodes (for early-out navigation)
• For text/attribute nodes: the data itself
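The per-node information listed above can be pictured as a small record. The following is only an illustrative sketch in Python, not DB2's on-disk layout: the names `StringTable`, `Node` and `build` are hypothetical, the nodeID is a plain counter, and namespace, type annotation and child hints are left out.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional
import xml.etree.ElementTree as ET


class StringTable:
    """Maps tag and attribute names to small integers (the StringID idea)."""

    def __init__(self):
        self.ids = {}

    def intern(self, name: str) -> int:
        return self.ids.setdefault(name, len(self.ids))


@dataclass
class Node:
    node_id: int                  # plain counter here, not DB2's real nodeID format
    kind: str                     # "element", "attribute" or "text"
    string_id: Optional[int]      # interned name; None for text nodes
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    value: Optional[str] = None   # data is kept only on text/attribute nodes


def build(elem, names, ids, parent=None):
    """Convert an ElementTree element into Node records.

    Text that follows a child element (mixed content) is ignored in this sketch.
    """
    node = Node(next(ids), "element", names.intern(elem.tag), parent)
    for attr, val in elem.attrib.items():
        node.children.append(
            Node(next(ids), "attribute", names.intern(attr), node, value=val))
    if elem.text and elem.text.strip():
        node.children.append(
            Node(next(ids), "text", None, node, value=elem.text.strip()))
    for child in elem:
        node.children.append(build(child, names, ids, node))
    return node


names = StringTable()
doc = '<dept><employee id="901"><name>John Doe</name></employee></dept>'
root = build(ET.fromstring(doc), names, count(1))
```

Each distinct name is stored once in the string table, so a node only carries an integer for its tag.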

XML Node Storage Layout

XML Storage: “Regions Index”

XML Indexes in DB2

Need index support to manage millions of XML documents

Path-specific value indexes on XML columns to index frequently used elements and attributes

XML-aware full-text indexing

XML Value Indexes

• Table DEPT has two fields: “id” and “deptdoc”
• Field “deptdoc” is an XML document:

    <dept>
      <employee id="901">
        <name>John Doe</name>
        <phone>408 555 1212</phone>
        <office>344</office>
      </employee>
    </dept>

    CREATE INDEX idx1 ON DEPT(deptdoc)
      GENERATE KEY USING XMLPATTERN '/dept/employee/name' AS SQL VARCHAR(35)

Creates XML value index on employee name for all documents

XML Value Indexes (continued)

“xmlpattern” identifies the XML nodes to be indexed

• Subset of XPath language
• Wildcards, namespaces allowed
• XPath predicates such as /a/b[c=5] not supported

“AS SQL” necessary to define data type, since DB2 does not require single XML schema for all documents in a table (so DB2 may not know data type to use for index)

XML Value Indexes: Data Types

Allowed data types for indexes:
• VARCHAR(n)
• VARCHAR HASHED
• DOUBLE
• DATE
• TIMESTAMP

DB2 index manager enhanced to handle special XML values (e.g., +0, -0, +INF, -INF, NaN)

XML Value Indexes (continued)

If a node does not cast to the index type:
• No error is raised
• No index entry is created for that node

A single document (e.g., the XML field of a single record) may contain 0, 1, or multiple index entries
• Different from a relational index
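To make the cast behaviour concrete, here is a minimal Python sketch of key generation. It is an assumption-laden analogue, not DB2 code: `index_keys` is a hypothetical helper, the `<office>` values are made-up sample data, and the pattern is written in ElementTree's relative syntax (`./employee/office`) instead of the XMLPATTERN `/dept/employee/office`.

```python
import xml.etree.ElementTree as ET


def index_keys(rid, doc_xml, pattern, cast):
    """Return (value, rid) keys for nodes matching pattern.

    Nodes whose text does not cast to the index type are skipped silently,
    so one document can contribute zero, one, or many keys.
    """
    root = ET.fromstring(doc_xml)
    keys = []
    for node in root.findall(pattern):
        try:
            keys.append((cast(node.text), rid))
        except (TypeError, ValueError):
            continue  # no error raised, no index entry created
    return keys


doc1 = ("<dept>"
        "<employee><office>344</office></employee>"
        "<employee><office>B12</office></employee>"
        "<employee><office>17</office></employee>"
        "</dept>")
doc2 = "<dept/>"

keys1 = index_keys(1, doc1, "./employee/office", float)  # "B12" is not a DOUBLE
keys2 = index_keys(2, doc2, "./employee/office", float)  # nothing to index
```

Here the first document yields two entries and the second yields none, which is the difference from a relational index with exactly one entry per row.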

XML Value Indexes: unique indexes

Unique indexes enforced within a document, and across all documents

Example of unique index on employee id:

    CREATE UNIQUE INDEX idx2 ON DEPT(deptdoc)
      GENERATE KEY USING XMLPATTERN '/dept/employee/@id' AS SQL DOUBLE

XML Value Indexes: multiple elements or attributes

Can create indexes on multiple elements or attributes

Example: create index on all text nodes:

    CREATE INDEX idx3 ON DEPT(deptdoc)
      GENERATE KEY USING XMLPATTERN '//text()' AS SQL VARCHAR HASHED

Example: create index on all attributes:

    CREATE INDEX idx4 ON DEPT(deptdoc)
      GENERATE KEY USING XMLPATTERN '//@*' AS SQL DOUBLE

XML Value Indexes: namespaces

• Can index in a particular namespace
• XMLPATTERN can contain namespace declarations and prefixes

Example:

    CREATE INDEX idx5 ON DEPT(deptdoc)
      GENERATE KEY USING XMLPATTERN
      'DECLARE NAMESPACE m="http://www.me.com/"; /m:dept/m:employee/m:name'
      AS SQL VARCHAR(45)

XML Value Indexes: internal

For each XML document, each unique path mapped to an integer PathID (like StringID for tags)

Each index entry includes:
• PathID to identify path of indexed node
• Value of the node cast to the index type
• RowID
  − Identify rows containing the matching documents
• NodeID
  − Identify matching nodes and regions within the documents

XML Value Indexes: atomic vs. non-atomic

Atomic Node:
• if it is an attribute, or
• if it is a text node, or
• if it is an element that has no child elements and exactly one text node child

Indexes typically defined for atomic nodes

Possible to define index on non-atomic nodes, e.g. index on ‘/dept/employee’

XML Value Indexes: atomic vs. non-atomic (continued)

‘/dept/employee’ non-atomic since it has child elements

Single index entry for all of “employee” element, on all text nodes under “employee” (concatenation)

Can be useful for mixed content in text-oriented XML, e.g.: <title>The benefits of <bold>XML</bold></title>
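The concatenation rule for non-atomic nodes can be tried out with Python's standard library. This is only a sketch of the idea: `is_atomic` and `index_value` are hypothetical helper names, and `is_atomic` covers just the element case of the definition (attributes and text nodes are atomic by definition).

```python
import xml.etree.ElementTree as ET


def is_atomic(elem):
    """Element with no child elements and exactly one text node child."""
    return len(elem) == 0 and bool(elem.text)


def index_value(elem):
    """Concatenate all text nodes under elem, in document order."""
    return "".join(elem.itertext())


title = ET.fromstring("<title>The benefits of <bold>XML</bold></title>")
bold = title[0]
```

For the mixed-content title, `index_value(title)` gives the single string `The benefits of XML`, so a search on the whole title text can use one index entry.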

XML Full Text Indexes

• Allows full-text search of XML columns
• Can be fully indexed or partially indexed

Example of full index:

    CREATE INDEX myIndex FOR TEXT ON DEPT(deptdoc) FORMAT XML CONNECT TO PERSONNELDB

Example query:

    SELECT deptdoc FROM dept
    WHERE CONTAINS(deptdoc, 'SECTIONS("/dept/comment") "Brazil"') = 1

Internal index structure

System RX: One Part Relational, One Part XML

Kevin Beyer, Roberta J Cochrane, Vanja Josifovski, Jim Kleewein, George Lapis, Guy Lohman, Bob Lyle, Fatma Özcan, Hamid Pirahesh, Normen Seemann, Tuong Truong, Bert Van der Linden, Brian Vickery, Chun Zhang

Internal index structure

XML index implemented with two B+ trees
• Path Index
• Value Index

Internal index structure: Path Index

Path Index maps reverse path (revPath) to a generated path identifier (pathId)

A “reverse path” is a list of node labels from leaf to root
• Compressed into a vector of label identifiers

Analogy to the COLUMNS catalog of a relational database

Used for efficient processing of descendant queries
• Example: “//name” query
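A toy version of the path index shows why reversing the path helps with descendant queries. This is a sketch under stated assumptions, not the System RX implementation: `PathIndex` and `reverse_path` are hypothetical names, a dictionary stands in for the B+ tree, and labels are kept as strings instead of compressed label identifiers.

```python
def reverse_path(path):
    """'/book/author/name' -> ('name', 'author', 'book') (leaf to root)."""
    return tuple(reversed(path.strip("/").split("/")))


class PathIndex:
    def __init__(self):
        self.path_ids = {}  # revPath -> pathId

    def add(self, path):
        """Return the pathId for path, generating a new one on first sight."""
        rev = reverse_path(path)
        return self.path_ids.setdefault(rev, len(self.path_ids) + 1)

    def match_descendant(self, label):
        """pathIds for the query //label: reverse paths that start with label."""
        return sorted(pid for rev, pid in self.path_ids.items()
                      if rev[0] == label)


paths = PathIndex()
p_author_name = paths.add("/book/author/name")
p_title = paths.add("/book/title")
p_editor_name = paths.add("/book/editor/name")
name_paths = paths.match_descendant("name")
```

Because every reverse path for `//name` begins with `name`, the matching entries would sit next to each other in a tree sorted on revPath, so the lookup becomes a prefix scan.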

Internal index structure: Value Index

Value Index used to represent nodes

Consists of the following key:

    pathId | value | nodeId | rid

Internal index structure: Value Index

• “value” is the representation of the node’s data value when cast to the index’s data type
• “rid” identifies the row in the table (used for locking)
• “nodeId” identifies a node within the document
  − uses a Dewey node identifier
  − can provide quick access to a node in the XML store
• “pathId” is used to retrieve specific path queries
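Dewey node identifiers label each node with the path of child positions from the root. The sketch below shows the general scheme only; the slides do not describe how DB2 encodes these identifiers, so the tuple representation and the helper names `dewey_ids` and `is_ancestor` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def dewey_ids(elem, prefix=(1,)):
    """Yield (dewey_id, tag) for elem and its descendant elements, in document order."""
    yield prefix, elem.tag
    for position, child in enumerate(elem, start=1):
        yield from dewey_ids(child, prefix + (position,))


def is_ancestor(a, b):
    """a is an ancestor of b when a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


doc = ET.fromstring(
    "<dept>"
    "<employee><name/><phone/></employee>"
    "<employee><name/></employee>"
    "</dept>")
labelled = list(dewey_ids(doc))
ids = [dewey for dewey, _ in labelled]
```

Two useful properties follow directly from the labels: sorting the identifiers reproduces document order, and ancestor tests need no access to the document itself.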

Internal index structure: Tradeoffs of Value Index key fields

Order of keys is a tradeoff
• pathId first allows quick retrieval of specific queries
  − e.g., index on //name might match many paths
  − query on /book/author/name still has consecutive index entries
• but, a query like //name=‘Maggie’ will need to examine every location in the index per matching path
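The tradeoff can be illustrated with a sorted list standing in for the B+ tree. This is a sketch with made-up entries: the pathIds, values, nodeIds and rids below are hypothetical, and `probe` and `descendant_lookup` are illustrative helper names.

```python
import bisect

# Keys ordered as (pathId, value, nodeId, rid); all entries are made-up sample data.
index = sorted([
    (1, "Maggie", "1.2.1", 10),   # pathId 1: /book/author/name
    (1, "Zed",    "1.3.1", 11),
    (2, "XML",    "1.1",   10),   # pathId 2: /book/title
    (3, "Adam",   "1.4.1", 10),   # pathId 3: /book/editor/name
    (3, "Maggie", "1.4.2", 12),
])


def probe(index, path_id, value):
    """One range scan: all entries with the given pathId and value are adjacent."""
    lo = bisect.bisect_left(index, (path_id, value))
    hi = lo
    while hi < len(index) and index[hi][:2] == (path_id, value):
        hi += 1
    return index[lo:hi]


def descendant_lookup(index, matching_path_ids, value):
    """//name='...' needs a separate probe for every pathId that matches //name."""
    hits = []
    for path_id in matching_path_ids:
        hits.extend(probe(index, path_id, value))
    return hits


specific = probe(index, 1, "Maggie")                      # /book/author/name = 'Maggie'
descendant = descendant_lookup(index, [1, 3], "Maggie")   # //name = 'Maggie'
```

The specific path is answered with a single probe, while the descendant query repeats the probe once per matching path because the value is only the second key field.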

XML Schema Support

• Optional XML Schema validation
  − Insert, Update, Query
• Limited support for DTDs and external entities
• Type annotation produced by validation is persisted with the document (query execution)
• Conforms to XML Query standard, XML Schema standard, XML standard

XML Schema Support

• Register XML Schemas and DTDs in DB
• DB then stores type-annotated documents on disk, compiles execution plans with references to the XML Schemas
• Schemas stored in DB itself, for performance
  − XML Schema Repository (XSR)

XML Schema Support: XSR

XSR consists of several new database catalog tables:
• Original XML schema documents for XML schema
• Binary representation of the schema for fast reference

XML Schema Support: Registration

Example:

    REGISTER XMLSCHEMA http://my.dept.com FROM dept.xsd AS departments.deptschema COMPLETE

• Schema URI is http://my.dept.com
• File with schema document is “dept.xsd”
• Schema identifier in DB is “deptschema”
• Belongs to relational DB schema “departments”

XML Schema Support: Validation

“XMLVALIDATE” function to validate documents in SQL statements

Schema for validation
• is specified explicitly, or
• can be deduced from the schemaLocation hints in the instance documents

Referenced by Schema URI or by identifier


Example (explicit by URI):

    INSERT INTO DEPT(deptdoc)
    VALUES xmlvalidate(? according to xmlschema uri 'http://my.dept.com')

Example (explicit by ID):

    INSERT INTO DEPT(deptdoc)
    VALUES xmlvalidate(? according to xmlschema id departments.deptschema)


Example (implicit):

DB2 tries to deduce the schema from the input document

    INSERT INTO dept(deptdoc) VALUES xmlvalidate(?)

Try to find it in repository

XML Schema Support: First repository design principle

Repository will not
• require users to modify a schema before it is registered
• require users to modify XML documents before they are inserted and validated

Once a document is validated in the DB, it will never require updates to remain valid
• Considered infeasible to bulk-update all existing documents to become valid

XML Schema Support: Second repository design principle

Enable schema evolution
• Sequence of changes in an XML schema over its lifetime
• New or evolving business needs

How to accomplish schema evolution is much-debated
• no standards
• business demands require it; so constrain problem


Flexibility of schema repository “paramount importance”

DB2’s schema repository does not require namespace or the schema URI of each registered schema to be unique (user does not have control)

Database-specific Schema identifier must be unique (user does have control)
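The uniqueness rules can be summarised in a few lines of Python. This is a toy model only: `SchemaRepository` and its methods are hypothetical names, the second identifier `departments.deptschema_v2` is invented for the example, and nothing here reflects the real XSR catalog tables.

```python
class SchemaRepository:
    """Toy repository: identifiers must be unique, schema URIs need not be."""

    def __init__(self):
        self.by_id = {}  # schema identifier -> (uri, schema document)

    def register(self, schema_id, uri, document):
        if schema_id in self.by_id:
            raise ValueError(f"schema identifier already registered: {schema_id}")
        self.by_id[schema_id] = (uri, document)

    def find_by_uri(self, uri):
        """All identifiers registered under a URI (there may be several)."""
        return sorted(sid for sid, (u, _) in self.by_id.items() if u == uri)


repo = SchemaRepository()
repo.register("departments.deptschema", "http://my.dept.com", "<xs:schema/>")
repo.register("departments.deptschema_v2", "http://my.dept.com", "<xs:schema/>")

try:
    repo.register("departments.deptschema", "http://other.example", "<xs:schema/>")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The user controls the identifier, so it can serve as the key, while two schemas from different sources may legitimately share one URI.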


Built-in support for one very simple type of schema evolution

If the new schema is backwards-compatible with the old schema, then the old schema can be replaced with the new schema in the schema repository

DB2 verifies all possible elements and attributes in old schema have same named types in the new schema
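A much simplified version of that compatibility check is sketched below. It assumes each schema has already been flattened into a mapping from element or attribute name to its named type; that flattening, the type names, and the function `is_backward_compatible` are all assumptions for illustration, not how DB2 represents schemas.

```python
def is_backward_compatible(old, new):
    """Every name declared in the old schema must keep the same named type in the new one."""
    return all(name in new and new[name] == named_type
               for name, named_type in old.items())


old_schema = {"dept": "deptType", "employee": "empType", "@id": "xs:integer"}

# Adds an optional element but keeps every existing declaration unchanged.
new_ok = dict(old_schema, email="xs:string")

# Changes the type of an existing attribute.
new_bad = dict(old_schema, **{"@id": "xs:string"})
```

Under this simplified rule `new_ok` could replace the old schema, while `new_bad` could not, since already-validated documents might no longer be valid against it.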

Querying XML Data in DB2

Options Supported
• XQuery/XPath as a stand-alone language
• SQL embedded in XQuery
• XQuery/XPath embedded in SQL/XML
• Plain SQL for full-document retrieval

DB2 treats SQL and XQuery as primary query languages.
• Both will operate independently on their data models
• Can also be integrated

Sample Tables

    create table ship (
      shipNo      varchar(5) primary key not null,
      capacity    decimal(7,2),
      class       int,
      purchDate   date,
      maintenance xml
    )

    create table captain (
      captID  varchar(5) primary key not null,
      lname   varchar(20),
      fname   varchar(20),
      DOB     date,
      contact xml
    )

Notice the xml datatype

Sample XML Data: Ship.maintenance

    <mrecord>
      <log>
        <mntid>2353</mntid>
        <shipno>39</shipno>
        <vendorid>2345</vendorid>
        <captid>9875</captid>
        <maintdate>01/10/2007</maintdate>
        <service>Removed rust on hull</service>
        <resolution>complete</resolution>
        <cost>13450.96</cost>
        <nextservice>01/10/2008</nextservice>
      </log>
      <log>
        <mntid>1254</mntid>
        <shipno>39</shipno>
        <vendorid>1253</vendorid>
        <captid>9234</captid>
        <maintdate>09/20/2005</maintdate>
        <service>Replace rudder</service>
        <resolution>complete</resolution>
        <cost>34532.21</cost>
        <nextservice>NA</nextservice>
      </log>
    </mrecord>

Sample XML Data: Captain.contact

    <contactinfo>
      <Address>
        <street>234 Rolling Lane</street>
        <city>Rockport</city>
        <state>MA</state>
        <zipcode>01210</zipcode>
      </Address>
      <phone>
        <work>9783412321</work>
        <home>9722342134</home>
        <cell>9782452343</cell>
        <satellite>2023051243</satellite>
      </phone>
      <email>[email protected]</email>
    </contactinfo>

Standalone XQuery in DB2

    for $s in db2-fn:xmlcolumn('SHIP.MAINTENANCE')
    let $ml := $s//log
    where $ml/cost > 10000
    order by $ml/shipno
    return <MaintenanceLog>
             {$ml/shipno, $ml}
           </MaintenanceLog>

db2-fn:xmlcolumn returns the sequence of all documents in the XML column

SQL Embedded in XQuery

    for $m in db2-fn:sqlquery('select maintenance from ship where class = 1')
    let $ml := $m//log
    order by $ml/shipno
    return
      <maintenanceLog>
        {$ml}
      </maintenanceLog>

This will return the documents for all class one ships.

Select Statement using XML Column

Select shipno,class,maintenance

from ship

where class = 1

This will produce the maintenance document for each ship that is class 1.

We can also create views this way

SQL/XML Queries

Restricting results using XML element values

    select captid, lname, fname
    from captain
    where xmlexists('$c/contactinfo/Address[state="MA"]'
                    passing captain.contact as "c")

• This will return the captid, lname and fname of all captains who live in Massachusetts

SQL/XML Queries

Projecting XML element values

Two functions: XMLQuery and XMLTable
• XMLQuery retrieves value for 1 element
• XMLTable retrieves value for multiple elements

XMLQuery example:

    select xmlquery('$c/contactinfo/email'
                    passing contact as "c")
    from captain
    where state = 'MA'

This will return email addresses for all captains in Massachusetts

SQL/XML Queries: XMLQuery (Continued)

We could also look for only the first email for each captain by changing the first line:

    select xmlquery('$c/contactinfo/email[1]' …

Similarly, we could use xmlexists to qualify:

    select xmlquery('$c/contactinfo/email'
                    passing contact as "c")
    from captain
    where state = 'MA'
      and xmlexists('$c/contactinfo/email' passing contact as "c")

SQL/XML Queries: XMLTable

• XMLTable retrieves XML elements
• Elements are mapped into result set columns
• Maps XML data as relational data

SQL/XML Queries: XMLTable Example

    select s.shipNo, sm.mid, sm.vid, sm.md, sm.cost
    from ship s,
         xmltable('$c/mrecord/log' passing s.maintenance as "c"
                  columns mid  varchar(4)   path 'mntid',
                          vid  varchar(4)   path 'vendorid',
                          md   date         path 'maintdate',
                          cost decimal(7,2) path 'cost') as sm

This will produce a list of maintenance logs for all ships
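What XMLTable computes can be mimicked outside the database. The Python sketch below is only an analogue for intuition: `xmltable_rows` is a hypothetical helper, the document is an abbreviated copy of the sample maintenance record, and the date is left as a string because the sample values are in MM/DD/YYYY form.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

maintenance = """
<mrecord>
  <log><mntid>2353</mntid><vendorid>2345</vendorid>
       <maintdate>01/10/2007</maintdate><cost>13450.96</cost></log>
  <log><mntid>1254</mntid><vendorid>1253</vendorid>
       <maintdate>09/20/2005</maintdate><cost>34532.21</cost></log>
</mrecord>
"""


def xmltable_rows(ship_no, maintenance_xml):
    """One output row per <log> element, one column per path expression."""
    root = ET.fromstring(maintenance_xml)      # root element is <mrecord>
    for log in root.findall("log"):            # row-generating path: mrecord/log
        yield (ship_no,
               log.findtext("mntid"),          # column mid
               log.findtext("vendorid"),       # column vid
               log.findtext("maintdate"),      # column md (kept as text here)
               Decimal(log.findtext("cost")))  # column cost


rows = list(xmltable_rows("39", maintenance))
```

A ship with two log entries produces two relational rows, each repeating the ship number, which is how the join between `ship` and the `xmltable` result behaves.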

Joining XML and Relational Data

    select c.captid, c.lname, c.fname
    from captain c, ship s
    where xmlexists('$s/mrecord/log[captid=$c]'
                    passing s.maintenance as "s", c.captid as "c")

If the captain was the captain of any ship when it underwent maintenance, he or she will be listed

Using FLWOR Expressions in SQL/XML

    select captid,
           xmlquery('for $c in $cn/contactinfo
                     let $x := $c//city
                     return $x' passing contact as "cn")
    from captain
    where class = 1

Returns captid as well as city information

XMLElement

XMLElement allows you to publish relational data as XML

    select xmlelement(name "captain",
             xmlelement(name "captid", captid),
             xmlelement(name "lname", lname),
             xmlelement(name "fname", fname),
             xmlelement(name "class", class))
    from captain
    where class <= 2

XMLElement: Output from previous command

<captain>

<captid>3563</captid>

<lname>Smith</lname>

<fname>John</fname>

<class>2</class>

</captain>

…

Aggregating and Grouping Data

    select xmlelement(name "captainlist",
             xmlagg(xmlelement(name "captain",
                      xmlforest(captid as "captid", lname as "lname",
                                fname as "fname", class as "class"))
                    order by captid))
    from captain
    group by class

This query produces three captainlist elements, each with a number of captains.
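The grouping and aggregation step can be imitated in Python to see the shape of the result. This is a sketch for intuition only: the rows are made-up sample data (with two classes, not three), and `captain_lists` is a hypothetical helper standing in for the combination of XMLELEMENT, XMLFOREST and XMLAGG.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# (captid, lname, fname, class): made-up rows for illustration
rows = [
    ("3563", "Smith", "John", 2),
    ("1200", "Lee", "Ann", 1),
    ("2100", "Diaz", "Eva", 2),
]


def captain_lists(rows):
    """One <captainlist> per class, captains ordered by captid inside each list."""
    ordered = sorted(rows, key=lambda r: (r[3], r[0]))
    lists = []
    for _, group in groupby(ordered, key=lambda r: r[3]):    # group by class
        captainlist = ET.Element("captainlist")
        for captid, lname, fname, cls in group:               # xmlagg ... order by
            captain = ET.SubElement(captainlist, "captain")
            for tag, value in (("captid", captid), ("lname", lname),
                               ("fname", fname), ("class", cls)):
                ET.SubElement(captain, tag).text = str(value)  # xmlforest
        lists.append(ET.tostring(captainlist, encoding="unicode"))
    return lists


result = captain_lists(rows)
```

Each group collapses to a single XML value, so the number of result rows equals the number of distinct classes.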

Updating and Deleting XML Data

Updates
• Use XMLParse command. You must specify the entire XML column to update. If you specify only 1 element to update, the rest of the data will be lost.

Deletion
• Same as standard SQL
• Can also use xmlexists to use XML as qualifier

Query Execution Plans

•Separate parsers for SQL and XQuery statements

•Integrated query compiler for both languages

•QGMX is an internal query graph model

•Query execution plans contain special operators for navigation (XSCAN), XML index access (XISCAN) and joins over XML indexes (XANDOR)

Source: [2]

Query Run-time Evaluation

3 major components added for processing queries over XML:
• XML Navigation
• XML Index Runtime
• XQuery Function Library

Summary

Problems with CLOB and Shredded XML storage

Native XML support in DB2 offers:
• Hierarchical and parsed representation
• Path-specific XML indexing
• New XML join and query methods
• Integration of SQL and XQuery

References

[1] Nicola, M. and van der Linden, B. 2005. Native XML support in DB2 universal database. In Proceedings of the 31st international Conference on Very Large Data Bases (Trondheim, Norway, August 30 - September 02, 2005). Very Large Data Bases. VLDB Endowment, 1164-1174.

[2] Beyer, K., Cochrane, R. J., Josifovski, V., Kleewein, J., Lapis, G., Lohman, G., Lyle, B., Özcan, F., Pirahesh, H., Seemann, N., Truong, T., Van der Linden, B., Vickery, B., and Zhang, C. 2005. System RX: one part relational, one part XML. In Proceedings of the 2005 ACM SIGMOD international Conference on Management of Data (Baltimore, Maryland, June 14 - 16, 2005). SIGMOD '05. ACM Press, New York, NY, 347-358.

[3] http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0603saracco2/
