© 2010 IBM Corporation Information Management DB2 9 for z/OS pureXML Performance and Best Practices...


Information Management

DB2 9 for z/OS pureXML Performance and Best Practices

June, 2010

Agenda

XML performance

Performance monitoring and tuning

Best practices

Note: the following performance numbers are not the latest; performance has improved since these measurements were taken.

What You Can Do with pureXML

Create tables with XML columns or alter table add XML columns

Insert XML data, optionally validated against schemas

Create indexes on XML data

Efficiently search XML data

Extract XML data

Decompose XML data into relational data or create relational view

Construct XML documents from relational and XML data

Handle XML objects in all the utilities and tools

XMLDOC

XML Column

XMLIndex

- Managing XML data the same way as relational data

Storage for UNIFI Messages

[Chart: storage in KB, uncompressed vs. compressed, for 96 sample documents stored as original documents, as DB2 XML with whitespace stripped (Strip WS), and as DB2 XML with whitespace preserved (Pres WS). Data labels: 58%, 52%, 71%.]

Whitespace option and Compression

[Chart: non-compressed vs. compressed storage for UNIFI messages and JPM, each shown as original documents, XML Strip WS, and XML Pres WS. Annotations: 70% saving, 80% saving.]

Note: UNIFI is the UNIversal Financial Industry message scheme (International Standard ISO 20022).

Table Compression Impact

[Chart: elapsed (EL) and CPU time for insert and XSCAN, non-compressed XML table vs. compressed XML table]

Insert Performance (Batch)

Measurement in March 2007, z9 DS8300, Single thread, Docs in EBCDIC

[Chart: elapsed and CPU time by document size x number of documents: 1K x 1,000,000; 10K x 100,000; 100K x 10,000; 1M x 1,000; 10M x 100]

3.9 million 10K documents per hour, or about 1,100 documents per second

Insert XML – with indexes

Insert elapsed and CPU time, relative to insert with no XML index (100%):

No index: elapsed 100%, CPU 100%
With 1 index: elapsed 111%, CPU 116%
With 2 indexes: elapsed 125%, CPU 138%

XML Index Create or Rebuild

[Chart: create elapsed, create CPU, rebuild elapsed, and rebuild CPU time for indexes on the XML patterns //e, /a/b/@c, and /a/b/f/g]

Insert Performance – compare w/ CLOB

(average of 1K to 10M document insert performance)

XML: elapsed 100%, CPU 100%
XML with one index: elapsed 111%, CPU 116%
CLOB: elapsed 71.36%, CPU 64%

Fetch Performance (Batch)

Measurement in March 2007, z9 DS8300, Single thread, Docs in EBCDIC

[Chart: elapsed and CPU time by document size x number of documents: 1K x 1,000,000; 10K x 100,000; 100K x 10,000; 1M x 1,000]

9.3 million 10K documents per hour, or about 2,580 documents per second
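The hourly and per-second figures quoted on the insert and fetch slides are the same rate in two units. A minimal sketch of the conversion, assuming the slides simply divide by 3,600 and round (the function name is illustrative):

```python
SECONDS_PER_HOUR = 3600

def docs_per_second(docs_per_hour: float) -> float:
    """Convert an hourly document rate into documents per second."""
    return docs_per_hour / SECONDS_PER_HOUR

# Hourly rates for 10K documents as quoted on the batch insert and fetch slides.
insert_rate = docs_per_second(3.9e6)
fetch_rate = docs_per_second(9.3e6)

print(f"insert: {insert_rate:.0f} docs/sec")
print(f"fetch:  {fetch_rate:.0f} docs/sec")
```

The unrounded results sit slightly away from the slide figures (1,100 and 2,580), which suggests the slides round for presentation.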

XML Index Exploitation

XML Insert vs. Validation vs. Decomposition

[Chart: elapsed time for insert, insert with validation, and decomposition; 4K document, single thread, 3,000 repetitions]

Large Sample Tax Document Insert Performance

IRS doc size   Elapsed time (resp time)   CPU time
25M            2.09                       1.284
250M           17.663                     13.32
378M           26.089                     19.773
500M           35.0333                    26.822
722M           51.917                     40.064

z9-109, one LPAR with 3 dedicated CPs. Documents were stored remotely on an AIX box and inserted using a Java application. Times in seconds.

z/OS XML Specialty Engine Support

[Chart: XMLSS usage in insert and load, split into zAAP-eligible and CP time, for Order_insert (1-2K), CustAcc_insert (4-20KB), and LOAD (2.5MB). Data labels: 14%, 32%.]

LOAD Testing

Job      Number of rows  Size of XML documents (bytes)  Number of user XML indexes  CPU time in general CP (sec)  CPU time in zAAP  Redirection percentage
LOAD1    300,000         4K-20K                         4                           191                           40                17%
LOAD2    300,000         4K-20K                         2                           152                           31                17%
LOAD3    300,000         4K-20K                         1                           93                            38                29%
LOAD4    2,000           2.5M                           2                           329                           82                20%
LOAD5    2,000           2.5M                           2                           330                           82                20%
LOAD6    200             25M                            1                           254                           64                20%
LOAD7    200             25M                            1                           114                           64                36%
Average                                                                             209                           57                21%

Processor: IBM System z9 Enterprise Class (z9 EC)
LPAR configuration: 4 general-purpose CPs, 1 zAAP, 1 zIIP, all dedicated
Memory: 24 GB
Storage: IBM DS8300
Operating system: z/OS Version 1.9
DB2: DB2 9, Feb 2008 PTF level
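The slide does not define the redirection percentage. The sketch below assumes it is the zAAP share of total CPU time (zAAP divided by general CP plus zAAP); under that assumption the computed values round to the percentages in the table.

```python
# (job, CPU seconds on general CPs, CPU time on zAAP) as listed in the LOAD table.
load_jobs = [
    ("LOAD1", 191, 40), ("LOAD2", 152, 31), ("LOAD3", 93, 38),
    ("LOAD4", 329, 82), ("LOAD5", 330, 82), ("LOAD6", 254, 64),
    ("LOAD7", 114, 64),
]

def redirection_pct(cp_sec: float, zaap_sec: float) -> float:
    # Assumed definition: zAAP share of total (general CP + zAAP) CPU time.
    return 100.0 * zaap_sec / (cp_sec + zaap_sec)

for job, cp, zaap in load_jobs:
    print(f"{job}: {redirection_pct(cp, zaap):.1f}%")
```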

TPoX Insert Test

TPoX Mass insert2                                   z/OS 1.8 (Case-A)   z/OS 1.9 (Case-B)
LPAR CPU utilization                                76.45%              63.71%
Number of concurrent threads                        20                  20
XML inserts per second (average)                    2363                2269
DB2 class1 average elapsed time (ms, per commit)    77.565              88.501
DB2 class1 average CPU time (ms, per commit)        7.909               7.586
Number of transactions per second (10 inserts/tx)   236.3               226.9
Internal Throughput rate                            309.09              356.15
XML System Service CPU usage in LPAR                32.30%              18.23%

Processor: IBM System z9 Enterprise Class (z9 EC)
LPAR configuration: 3 dedicated general-purpose CPs (no zIIP, no zAAP)
Memory: 24 GB
Storage: IBM DS8300
Case-A: operating system z/OS Version 1.8; DB2 9, June 2007 PTF level
Case-B: operating system z/OS Version 1.9; DB2 9, Feb 2008 PTF level
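The internal throughput rate (ITR) in the table is consistent with the usual definition: transactions per second divided by CPU utilization, that is, throughput normalized to a fully busy LPAR. A small sketch under that assumption, using the table's own figures:

```python
def internal_throughput_rate(tps: float, cpu_utilization_pct: float) -> float:
    """Throughput normalized to 100% CPU utilization (assumed ITR definition)."""
    return tps / (cpu_utilization_pct / 100.0)

# Transactions per second and LPAR CPU utilization from the TPoX insert table.
case_a = internal_throughput_rate(236.3, 76.45)  # z/OS 1.8
case_b = internal_throughput_rate(226.9, 63.71)  # z/OS 1.9

print(f"Case-A ITR: {case_a:.2f}")
print(f"Case-B ITR: {case_b:.2f}")
```

Read this way, Case-B delivers slightly fewer transactions per second but uses less CPU to do so, which is why its ITR is higher.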

TPoX Mixed Transaction Test (1/2)

Processor: IBM System z9 Enterprise Class (z9 EC)
LPAR configuration: 3 general-purpose CPs, dedicated (no zAAP or zIIP)
Memory: 24 GB
Storage: IBM DS8300
Operating system: z/OS Version 1.9
DB2: DB2 9, Feb 2008 PTF level
Threads: 35

Transaction name         Type of transaction   Transaction weight
Get_order (1)            Query                 12
Get_security (2)         Query                 12
Customer_profile (3)     Query                 12
Account_summary (5)      Query                 12
Get_security_price (6)   Query                 12
Insert_custacc           Insert                20
Insert_order             Insert                20

TPoX Mixed Transaction Test (2/2)

Transaction          Average transaction response time
Get_order            0.03 second
Get_security         0.03 second
Customer_profile     0.03 second
Account_summary      0.03 second
Get_security_price   0.02 second
Insert_custacc       0.03 second
Insert_order         0.02 second

Transactions per second: 1207 tps

CPU utilization: 58.3%

Internal Throughput Rate: 2068 tps

z/OS XML System Services CPU consumption

TPoX Benchmark

[Chart: ETR (tps) and CPU busy (%, scale 10% to 100%) against number of users on a z10 with 5 CPs. X-axis categories: PK81260; PK80732 with 10, 20, 30, and 40 users. October 2009.]


Performance Monitoring and Tuning

Because XML native storage is built on top of the regular table space structure, DB2 Performance Expert needs no special changes to support XML other than minor points, such as new XML locks (type x'35').

XML performance problems can be analyzed through accounting traces and performance traces.

There is a new load module for XML: DSNNXML.

XML indexes have the same considerations as other indexes. Use the REORG utility to maintain order and free space. Run RUNSTATS to collect the statistics that help the optimizer pick XML indexes.

XML Query Performance Issues

■ 85% of the performance issues relate to:
– Query execution plans
– Index usage (indexing presentation)
– Proper coding of SQL/XML and XQuery expressions (see the Best Practices section)

How to obtain and analyze XML query plans

■ Create Explain tables
– Use member DSNTESC of the SDSNSAMP library
– Option E from the menu of the DB2 admin tool (DSN_STATEMNT_TABLE)
– Use Visual Explain
• Optim Development Studio
• IBM DB2 Optimization Service Center for DB2 for z/OS (OSC)

■ Gather explain information
– Use SPUFI: prefix the query with EXPLAIN PLAN SET QUERYNO
– SELECT from PLAN_TABLE

Use RUNSTATS

Use RUNSTATS to collect statistics for XML data and indexes so the optimizer can pick the right access methods

LISTDEF DBACORDTSLIST INCLUDE TABLESPACES DATABASE DBACORD

RUNSTATS TABLESPACE LIST DBACORDTSLIST TABLE(ALL) INDEX(ALL)


Best Practices

Tip 1: Choose the right table and storage design

Tip 2: Choose the right XML document granularity

Tip 3: Be aware of XML schema validation overhead

Tip 4: Avoid encoding conversion during XML insert and retrieval

Tip 5: In XPath expressions, use fully specified paths as much as possible

Tip 6: Define lean XML indexes

Tip 7: Put document filtering predicates in XMLEXISTS instead of XMLQUERY

Tip 8: Use square brackets [ ] to avoid Boolean predicates in XMLEXISTS

Tip 9: Use RUNSTATS to collect statistics for XML data and indexes

Tip 10: Use SQL/XML publishing views to expose relational data as XML

Tip 11: Use XMLTABLE views to expose XML as relational data

Tip 12: Use SQL/XML statements with parameter markers and host vars

Tip 1: Decision making: XML input => storage

[Decision diagram; only its labels survive, listed in slide order: Regulatory Requirements; Intact Digital Signature; Significant Data; Flexible; Search in XML; LOB / VARCHAR / VARBIN; (preserve whitespace); (strip whitespace); (Relational/XML); Yes; Return XML always; XML with XML indexes; Light Reporting; Structures: Regular / Fixed; Relational; Complex; Flexible; XML with XMLTABLE(); Heavy Analytics; Relational with XML (can be materialized)]

Some considerations

Tedious normalization and frustrating schema changes are indicators for using native XML.

Store hybrid or redundant data in relational/XML when:
– Fully normalized storage is overkill
– Referential integrity is needed: extract into relational columns
– Data is stored in XML, but frequently used fields are materialized in relational columns for heavy analytic applications
– Document size

Always use compression for XML data

Table Design

Mixed document types in one table
– Flexibility in exchange for overhead (such as index maintenance)

Separate tables for different document types
– To avoid overhead

[Diagram: a single table DOCTAB with columns labeled XMLDOC, ID, and DOCTYPE, versus separate tables DOCTYPE1TAB and DOCTYPE2TAB, each with columns labeled XMLDOC and ID]

Tip 2: Choose the right XML document granularity

Small vs. large documents? (KBs vs. MBs)

XML indexes filter at the document level.

Smaller documents tend to perform better.

But, rule of thumb: document granularity should match the predominant granularity of access.

Document Granularity: Example

<order date='2004-11-05'>

Tip 3: Beware of Schema Validation Overhead

create table dept(deptID char(8), deptdoc xml);

Validation is optional, and per document (per row):

insert into dept values (?, ?)
insert into dept values (?, dsn_xmlvalidate(?, ?))

Validation increases CPU time for inserts, and reduces throughput.

Use schema validation if needed. Avoid schema validation for highest possible insert performance.

[Chart: insert with no validation vs. with validation]

Tip 4: Avoid encoding conversion

■ Internally encoded XML: encoding derived from the data, e.g. Unicode Byte-Order Mark or optional XML declaration: <?xml version="1.0" encoding="UTF-8" ?>

■ Externally encoded XML: application encoding determines XML encoding if character type variables are used

■ Internally encoded XML with UTF-8 is preferred

– CLI: use SQL_C_BINARY data buffers rather than SQL_C_CHAR, SQL_C_DBCHAR, SQL_C_WCHAR

– Java: use binary stream (setBinaryStream) rather than string (setString).

– COBOL: SQL BLOB
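To make "internally encoded" concrete: when a document travels as raw bytes, the parser derives the encoding from the XML declaration inside the data, so the application does not convert the text itself. The sketch below uses Python's standard library parser as a stand-in; it is not the DB2 client API, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same hypothetical document serialized in two encodings. Each byte
# string declares its own encoding in the XML declaration.
text = "<name>Zo\u00eb</name>"
utf8_bytes = b'<?xml version="1.0" encoding="UTF-8"?>' + text.encode("utf-8")
latin1_bytes = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
                + text.encode("iso-8859-1"))

# The parser reads the declaration and decodes the bytes accordingly.
name_from_utf8 = ET.fromstring(utf8_bytes).text
name_from_latin1 = ET.fromstring(latin1_bytes).text

print(name_from_utf8 == name_from_latin1)
```

Binding XML through binary buffers, as the CLI and Java bullets above recommend, keeps the data in this self-describing form end to end.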

Tip 5: Use fully specified paths if possible

■ As much as possible, use fully specified XPath expressions rather than wildcards, e.g.

– /customerinfo/phone instead of //phone

– /customerinfo/addr/state instead of /customerinfo/*/state

<customerinfo Cid="1004">
  <name>Matt Foreman</name>
  <addr country="Canada">
    <street>1596 Baseline</street>
    <city>Toronto</city>
    <state>Ontario</state>
    <pcode>M3Z-5H9</pcode>
  </addr>
  <phone type="work">905-555-4789</phone>
  <phone type="home">416-555-3376</phone>
  <assistant>
    <name>Peter Smith</name>
    <phone type="home">416-555-3426</phone>
  </assistant>
</customerinfo>
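Besides costing more to evaluate, a descendant step can also return different results than the full path. The sketch below runs both forms against a trimmed copy of the sample document, using Python's ElementTree as a stand-in for the DB2 XPath engine; its path syntax is relative to the root element, so "phone" plays the role of /customerinfo/phone and ".//phone" the role of //phone.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<customerinfo Cid="1004">'
    '<name>Matt Foreman</name>'
    '<phone type="work">905-555-4789</phone>'
    '<phone type="home">416-555-3376</phone>'
    '<assistant><name>Peter Smith</name>'
    '<phone type="home">416-555-3426</phone></assistant>'
    '</customerinfo>'
)

# Fully specified path: only the customer's own phone elements.
direct = [p.text for p in doc.findall("phone")]

# Descendant step: every phone element at any depth, including the
# assistant's, so the whole document has to be searched.
anywhere = [p.text for p in doc.findall(".//phone")]

print(len(direct), len(anywhere))
```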

Tip 6: Lean XML Indexes

create table customer(info XML);

create unique index idx1 on customer(info) generate key using xmlpattern '/customerinfo/@Cid' as sql decfloat;

create index idx2 on customer(info) generate key using xmlpattern '/customerinfo/name' as sql varchar(40);

create index idx3 on customer(info) generate key using xmlpattern '//name' as sql varchar(40);

create index idx4 on customer(info) generate key using xmlpattern '/customerinfo/phone' as sql varchar(40);

LUW: "as sql double"
z/OS: "as sql decfloat"

Tip 6: Lean XML Indexes

create unique index idx1 on customer(info) generate key using xmlpattern '/customerinfo/@Cid' as sql decfloat;

create index idx2 on customer(info) generate key using xmlpattern '/customerinfo/name' as sql varchar(40);

create index idx3 on customer(info) generate key using xmlpattern '//name' as sql varchar(40);

create index idx4 on customer(info) generate key using xmlpattern '//text()' as sql varchar(40);

Don't index everything! Very expensive for insert, update, delete!

Tip 6: Lean XML Indexes and Indexing non-leaf Nodes

<customerinfo Cid="1004">
  <name>Matt Foreman</name>
  <addr country="Canada">
    <street>1596 Baseline</street>
    <city>Toronto</city>
    <state>Ontario</state>
    <pcode>M3Z-5H9</pcode>
  </addr>
  (…)
</customerinfo>

Typically not useful!

…xmlpattern '/customerinfo/addr' as sql varchar(128);

Single index entry. Key value = concatenation of all text nodes under "addr":

(/customerinfo/addr, "1596 BaselineTorontoOntarioM3Z-5H9")

Better: 4 separate indexes!

…xmlpattern '/customerinfo/addr/street' as sql varchar(50);
…xmlpattern '/customerinfo/addr/city' as sql varchar(40);
…xmlpattern '/customerinfo/addr/state' as sql varchar(25);
…xmlpattern '/customerinfo/addr/pcode' as sql varchar(10);
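The key value of the non-leaf index can be reproduced by concatenating the text nodes under "addr". The sketch below does this with Python's ElementTree, purely to illustrate why such a key rarely matches a useful predicate; it does not model DB2's index internals.

```python
import xml.etree.ElementTree as ET

addr = ET.fromstring(
    '<addr country="Canada">'
    '<street>1596 Baseline</street>'
    '<city>Toronto</city>'
    '<state>Ontario</state>'
    '<pcode>M3Z-5H9</pcode>'
    '</addr>'
)

# Non-leaf node: one key, the concatenation of all descendant text nodes.
non_leaf_key = "".join(addr.itertext())

# Leaf nodes: one key per element, each usable by its own predicate.
leaf_keys = {child.tag: child.text for child in addr}

print(non_leaf_key)
print(leaf_keys["city"])
```

A predicate such as city = "Toronto" can match the leaf key for city, but not the concatenated string.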

Tip 7 & 8: Put document filtering predicates in XMLEXISTS instead of XMLQUERY, and use square brackets [ ] to avoid Boolean predicates in XMLEXISTS

■ The XMLQUERY function in a SELECT clause does not filter documents or rows and does not use indexes

■ Document/row-filtering predicates must be in XMLEXISTS in the WHERE clause

■ Predicates in XMLEXISTS must be in square brackets

SQL/XML with XMLQUERY

customer table:

<customerinfo> <name>Matt Foreman</name> <phone>905-555-4789</phone> </customerinfo>
<customerinfo> <name>Peter Jones</name> <phone>905-123-9065</phone> </customerinfo>
<customerinfo> <name>Mary Poppins</name> <phone>905-890-0763</phone> </customerinfo>

Predicate only in XMLQUERY. Cannot use an index!

select xmlquery('$i/customerinfo[phone = "905-555-4789"]/name' passing info as "i")
from customer

Predicate in XMLEXISTS. Can use an index!

select xmlquery('$i/customerinfo/name' passing info as "i")
from customer
where xmlexists('$i/customerinfo[phone = "905-555-4789"]' passing info as "i")

<name>Matt Foreman</name>

1 record(s) selected

SQL/XML with XMLEXISTS

Boolean predicate without square brackets. Cannot use an index! The expression returns true or false, never an empty sequence, so XMLEXISTS qualifies every row:

select xmlquery('$i/customerinfo/name' passing info as "i")
from customer
where xmlexists('$i/customerinfo/phone = "905-555-4789"' passing info as "i")

<name>Matt Foreman</name>
<name>Peter Jones</name>
<name>Mary Poppins</name>

Predicate in square brackets. Can use an index!

select xmlquery('$i/customerinfo/name' passing info as "i")
from customer
where xmlexists('$i/customerinfo[phone = "905-555-4789"]' passing info as "i")
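The difference between the two XMLEXISTS forms comes down to what the XQuery expression returns. The sketch below models that in plain Python under simplifying assumptions: each document is a dictionary, a result sequence is a list, and the helper names are hypothetical. It is a model of the semantics, not DB2 code.

```python
customers = [
    {"name": "Matt Foreman", "phone": "905-555-4789"},
    {"name": "Peter Jones", "phone": "905-123-9065"},
    {"name": "Mary Poppins", "phone": "905-890-0763"},
]

def xmlexists(sequence: list) -> bool:
    """XMLEXISTS is true when the XQuery result sequence is non-empty."""
    return len(sequence) > 0

def boolean_form(doc: dict, phone: str) -> list:
    # $i/customerinfo/phone = "..." yields exactly one item: true or false.
    return [doc["phone"] == phone]

def bracket_form(doc: dict, phone: str) -> list:
    # $i/customerinfo[phone = "..."] yields the element, or an empty sequence.
    return [doc] if doc["phone"] == phone else []

target = "905-555-4789"
rows_boolean = [c["name"] for c in customers if xmlexists(boolean_form(c, target))]
rows_bracket = [c["name"] for c in customers if xmlexists(bracket_form(c, target))]

print(rows_boolean)
print(rows_bracket)
```

A sequence holding the single item "false" is still non-empty, which is why the Boolean form qualifies every row.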

Tip 9: Use RUNSTATS on XML data!

■ RUNSTATS does collect statistics for XML data and XML indexes!

■ The optimizer does use these stats!

LISTDEF DBACORDTSLIST INCLUDE TABLESPACES DATABASE DBACORD

RUNSTATS TABLESPACE LIST DBACORDTSLIST TABLE(ALL) INDEX(ALL)

Tip 10: Use SQL/XML publishing views to expose relational data as XML

■ SQL/XML publishing functions hidden in a view

■ create table unit(unitID char(8), name char(20), manager varchar(20));

■ create view UnitView(unitID, name, unitdoc) as
select unitID, name,
  XMLELEMENT(NAME "Unit",
    XMLELEMENT(NAME "ID", u.unitID),
    XMLELEMENT(NAME "UnitName", u.name),
    XMLELEMENT(NAME "Mgr", u.manager))
from unit u;

Query that performs sub-optimally: filter on the constructed XML

select unitdoc from UnitView
where xmlexists('$i/Unit[ID = "WWPR"]' passing unitdoc as "i");

Query that performs well: filter on relational

select unitdoc from UnitView
where UnitID = 'WWPR';

In a nutshell, include relational columns in a SQL/XML publishing view, and when querying the view express any predicates on those columns rather than on the constructed XML.

Tip 11: Use XMLTABLE views to expose XML data in relational format

■ Values returned from XML documents in tabular format

■ create table customer(info XML);

■ create view myview(CustomerID, Name, Zip, Info) as
SELECT T.*, info
FROM customer,
  XMLTABLE('$c/customerinfo' passing info as "c"
    COLUMNS
      "CID" INTEGER PATH './@Cid',
      "Name" VARCHAR(30) PATH './name',
      "Zip" CHAR(12) PATH './addr/pcode') as T;

Tip 11: Use XMLTABLE views to expose XML data in relational format

■ Query with an XML predicate
– May perform sub-optimally

select CustomerID, Name from myview
where Zip = '95141';

■ Will perform well

select CustomerID, Name from myview
where xmlexists('$i/customerinfo[addr/pcode = "95141"]' passing info as "i");

In a nutshell, be careful with XMLTABLE views which expose XML data in relational form. When possible, include additional columns in the view definition so that filtering predicates can be expressed on those columns instead of the XMLTABLE columns.

Extracting values from XML for Hybrid Store using Trigger

CUST(ID, NAME, CITY, ZIP, INFO): extract NAME, CITY, ZIP from INFO (XML)

CREATE TRIGGER ins_cust
  AFTER INSERT ON cust
  REFERENCING NEW AS newrow
  FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
  update cust
  set (name, city, zip) =
    (select X.name, X.city, X.zip
     from cust,
          XMLTABLE('customerinfo' PASSING CUST.INFO
                   COLUMNS name varchar(30) PATH 'name',
                           city varchar(20) PATH 'addr/city',
                           zip varchar(12) PATH 'addr/pcode-zip') as X
     where cust.id = newrow.id)
  where cust.id = newrow.id;
END #

Tip 12: Use Parameter markers and host vars for fast XML queries

select info from customer where xmlexists('$i/customerinfo[phone = "905-555-4789"]' passing info as "i")

select info from customer where xmlexists('$i/customerinfo[phone = $p]' passing info as "i", cast(? as varchar(12)) as "p")

select info from customer where xmlexists('$i/customerinfo[phone = $p]' passing info as "i", :vchostvar as "p")

XML Queries – Things to do to improve Performance

■ Use XPath instead of FLWOR where possible
– Reason: simpler is better, for humans and for the DB2 optimizer.
– XML may not have to be reconstructed, and FLWOR can use more temp space.

■ Avoid parent steps in predicates
– E.g. /a/b//d[../c=fn:string("abc")]
– Reason: parent steps in the predicate prevent index usage
– See: http://www.ibm.com/developerworks/data/library/techarticle/dm-0611nicola/#cases

■ Use range predicates when appropriate
– E.g. [dateOfBirth[. >= xs:date("2000-01-01") and . <= xs:date("2000-12-31")]]
– What is important here is the notation with the self axis (the dots).
– Reason: this allows DB2 to use a single XISCAN instead of 2 XISCANs + IXAND.
– See: http://www.ibm.com/developerworks/data/library/techarticle/dm-0611nicola/#rangepredicates
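The self-axis notation also changes what the predicate means. XQuery general comparisons such as >= are existential: they are true if any item in the sequence satisfies them. Written as two separate comparisons on dateOfBirth, the two bounds may therefore be satisfied by different elements, so the pair is not a true "between". The sketch below models this in plain Python; the helper names and the document with two dateOfBirth values are hypothetical.

```python
from datetime import date

def general_ge(values, bound) -> bool:
    # XQuery general comparison: true if ANY item satisfies it.
    return any(v >= bound for v in values)

def general_le(values, bound) -> bool:
    return any(v <= bound for v in values)

lo, hi = date(2000, 1, 1), date(2000, 12, 31)

# Hypothetical document with two dateOfBirth elements, both outside the range.
births = [date(1999, 5, 1), date(2001, 7, 4)]

# [dateOfBirth >= lo and dateOfBirth <= hi]: each comparison may be
# satisfied by a different element.
two_comparisons = general_ge(births, lo) and general_le(births, hi)

# [dateOfBirth[. >= lo and . <= hi]]: both bounds tested on the same element.
between = any(lo <= v <= hi for v in births)

print(two_comparisons, between)
```

Only the self-axis form expresses a single range over one value, which fits the slide's point that it can be answered with a single index scan.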

LUW chart

Using SPUFI or JCL

SELECT Cid, Info
FROM DSN8910.CUSTOMER
WHERE XMLEXISTS ('declare default element namespace "http://posample.org"; //addr[city="Toronto"]' passing INFO)

XML and XPath are case-sensitive: set CAPS OFF so that mixed case is preserved. Otherwise:
– SQLCODE = -16002, ERROR: AN XQUERY EXPRESSION HAS AN UNEXPECTED TOKEN DEFAULT FOLLOWING DECLARE.

The terminal session CCSID setting has to be consistent with the application encoding scheme, because "[" and "]" have different code points in different code pages. Otherwise:
– SQLCODE = -16002, ERROR: AN XQUERY EXPRESSION HAS AN UNEXPECTED TOKEN FOLLOWING "Toronto".

XML as Front to Backend/Core Systems

[Diagram: DB2 pureXML (relational and XML) in front of Backend 1, Backend 2, Backend 3, ... Backend n, connected through Interface Tables 1, 2, 3, ... n, which are physical tables or logical views]

Need to handle XML data, but full normalization is overkill.

Great fit for pureXML

An End-to-end XML Paradigm

[Diagram: client presentation (HTML + XForms, Lotus Forms), SOA gateway / WAS (XQuery, XSLT, XPath), and data storage (DB2 pureXML, relational and XML), connected over SOAP/HTTP]

End-to-End Straight Through Processing using XML

XML programming paradigm and architecture pattern
– XForms
– REST/SOAP web services
– XQuery suite: XQuery, XQuery update facility, XQuery scripting extension, etc.

Summary

XML performance

Best practices