© 2008 IBM Corporation
Reading DB2 LUW EXPLAIN plans (with special emphasis on XML)
Susanne Englert
3/31/2009
2© 2009 IBM Corporation Information Management
Agenda
Why should I be interested in EXPLAINs?
What IS an EXPLAIN?
How do I get them?
How do I read them?
►What do the operators mean?
►Which ones are XML-specific?
Examples (many)
Summary and references
Why should I be interested in EXPLAINs?
The single most powerful tool to debug performance problems!
Answer questions like:
►Which indexes are getting used?
►How many rows does DB2 think my query will read?
►Does my query require sorts?
►For joins, what join methods are being used? In what order are the tables joined?
What happens when I ask for an EXPLAIN?
DB2 query optimizer populates special tables in the catalog that describe the execution strategy (the “plan”)
►EXPLAIN_ARGUMENT
►EXPLAIN_INSTANCE
►… more
Two tools are available to read the tables and provide a visual/graphical representation of the plan
►db2exfmt (command line)
►Visual Explain
Other tools that do not use the “Explain tables”
►dynexpln
►db2expln
What does an EXPLAIN look like?
Representation of the query optimizer’s execution plan as a tree
►Leaf nodes are data
►Internal nodes are operators that filter, join, sort, group, etc.
For queries, data flows upwards from the leaves through the tree’s operators towards the root
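The tree shape can be sketched in a few lines of Python. This is only an illustration of "data flows upwards": the `PlanNode` class and `bottom_up` helper are hypothetical names, and the operators are those of the small hash-join plan shown on the "Types of EXPLAIN outputs" slide. A leaves-first (post-order) walk gives a reasonable reading order, although real execution is pipelined rather than strictly sequential.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical in-memory form of one operator in a plan tree."""
    name: str                                      # operator name, e.g. "HSJOIN"
    number: int                                    # operator number from the plan
    children: list = field(default_factory=list)   # inputs, left to right


def bottom_up(node):
    """Yield operators leaves-first, left to right (post-order walk)."""
    for child in node.children:
        yield from bottom_up(child)
    yield node


# RETURN <- GRPBY <- HSJOIN <- (IXSCAN, IXSCAN)
plan = PlanNode("RETURN", 1, [
    PlanNode("GRPBY", 2, [
        PlanNode("HSJOIN", 3, [
            PlanNode("IXSCAN", 4),
            PlanNode("IXSCAN", 5),
        ]),
    ]),
])

order = [f"{n.name}({n.number})" for n in bottom_up(plan)]
print(" -> ".join(order))
```

The walk starts at the lower-left leaf and ends at RETURN, which matches the advice later in the deck to start reading from the lower left corner.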
Types of EXPLAIN outputs
db2exfmt command-line tool
Visual Explain tool – start from
►DB2 Control Center or
►IBM Data Studio Developer

              Rows
             RETURN
             (   1)
              Cost
               I/O
               |
               1
             GRPBY
             (   2)
             15.2737
               2
               |
            0.438058
             HSJOIN
             (   3)
             15.2736
               2
         /-----+-----\
     92.6111           92
     IXSCAN          IXSCAN
     (   4)          (   5)
     15.1963        0.0655628
        2               0
        |               |
      19450            92
 INDEX: SENGLERT  INDEX: SENGLERT
     PRODX1           DSX4
How to use db2exfmt
Not as pretty as Visual Explain, but
►Text format provides all information without clicking
►Easy to cut, paste, attach to email
►Preferred format when dealing with DB2 support and sending explains to IBM
Prerequisite: create the Explain tables in the database catalog (a one-time operation)
db2 -tvf sqllib/misc/EXPLAIN.DDL
Steps (assume a connection to “mydb”)
►db2 set current explain mode explain (set flag to explain, don’t run query)
►db2 -tvf <file containing text of your query> (populate the explain tables)
►db2exfmt -d mydb -1 -o <output_file.exfmt> (format output)
►db2 set current explain mode no (reset the explain-only flag)
OR
►db2 explain plan for <text of your query>
►db2exfmt -d mydb -1 -o <output_file.exfmt> (format output)
►When using this method, text following XQUERY must be enclosed in single quotes
Both options explain a single query and format the most recently explained query into a file called <output_file.exfmt>.
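The first set of steps can be written down as data, which makes the order easy to check and reuse. The sketch below only builds and prints the command lines; it does not run them. The function name `explain_steps` and the query file name `q1.sql` are assumptions made for illustration, while the commands themselves are the ones listed above.

```python
import shlex


def explain_steps(database, query_file, output_file):
    """Return the explain-mode workflow as a list of argument lists, in order."""
    return [
        # 1. Set the flag: explain statements instead of running them.
        ["db2", "set", "current", "explain", "mode", "explain"],
        # 2. Submit the query text; this populates the explain tables.
        ["db2", "-tvf", query_file],
        # 3. Format the most recently explained statement into a file.
        ["db2exfmt", "-d", database, "-1", "-o", output_file],
        # 4. Reset the explain-only flag so later statements run normally.
        ["db2", "set", "current", "explain", "mode", "no"],
    ]


# "q1.sql" is a hypothetical file holding the text of the query.
script = "\n".join(
    shlex.join(step) for step in explain_steps("mydb", "q1.sql", "q1.exfmt")
)
print(script)
```

The printed lines can be pasted into a shell that already has a connection to the database, as the steps above assume.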
db2exfmt examples:
Single-table queries involving
►Relational columns only (Q1)
►XML extraction, relational predicate (Q2)
►XQUERY, using one XML index for one predicate (Q3)
►XQUERY, using two XML indexes and two predicates (Q4)
►One XML index, one relational index (Q5)
Join queries
►Relational join with XML predicates and extraction (Q6)
►Q6 rewritten using XML joins, two ways
– Incorrectly written, doesn’t use index on join element (Q7)
– Corrected to use index on join element (Q8)
Single-table queries: The PRODUCT table. Relational columns and one XML column that replicates relational data
CREATE TABLE "SENGLERT"."PRODUCT" (
  "PRODKEY" INTEGER NOT NULL ,
  "UPC_NUMBER" CHAR(11) NOT NULL ,
  "PACKAGE_TYPE" CHAR(20) ,
  "FLAVOR" CHAR(20) ,
  "FORM" CHAR(20) ,
  "CATEGORY" INTEGER ,
  "SUB_CATEGORY" INTEGER ,
  "CASE_PACK" INTEGER ,
  "PACKAGE_SIZE" CHAR(6) ,
  "ITEM_DESC" CHAR(30) ,
  "P_PRICE" DECIMAL(11,2) ,
  "CATEGORY_DESC" CHAR(30) ,
  "P_COST" DECIMAL(11,2) ,
  "SUB_CATEGORY_DESC" CHAR(70) ,
  "PRDDOC" XML );
Relational indexes:
• PRODKEY (primary key), (CATEGORY, PRODKEY)
XML indexes:
• /product/prodkey (type double), /product/category (type double), /product/sub_category (type varchar(30))
Query with relational columns only (Q1)
-- Uses the PRODUCT table in database “POPSSER”
db2 => set current explain mode explain;
DB20000I The SQL command completed successfully.
db2 => select count(*) from product where
db2 (cont.) => category = 42 and sub_category = 3;
SQL0217W The statement was not executed as only Explain information requests
are being processed. SQLSTATE=01604
db2 => !db2exfmt -d popsser -1 -o q1.exfmt;
DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 200
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool
Connecting to the Database.
Connect to Database Successful.
Output is in q1.exfmt.
db2 => set current explain mode no;
db2 => -- Let's see what's in q1.exfmt!
The top of q1.exfmt
DB2 Universal Database Version 9.5, 5622-044 (c) Copyright IBM Corp. 1991, 2007
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool
******************** EXPLAIN INSTANCE ********************
DB2_VERSION: 09.07.0
SOURCE_NAME: SQLC2G13
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2009-03-28-21.14.50.051131
EXPLAIN_REQUESTER: SENGLERT
Database Context:
----------------
Parallelism: None
CPU Speed: 4.000000e-005
Comm Speed: 0
Buffer Pool size: 880100
Sort Heap size: 1024
Database Heap size: 2476
Lock List size: 423444
Maximum Lock List: 98
Average Applications: 1
Locks Available: 13279204
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability
Note on the Database Context and Package Context sections: database parameters that can affect query plan selection!
The next part of q1.exfmt
---------------- STATEMENT 1 SECTION 201 ----------------
QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1
Original Statement:
------------------
select count(*) from product where category = 42 and sub_category = 3
Optimized Statement:
-------------------
SELECT Q3.$C0 FROM
(SELECT COUNT(*) FROM
(SELECT $RID$ FROM SENGLERT.PRODUCT AS Q1 WHERE (Q1.SUB_CATEGORY = 3) AND (Q1.CATEGORY = 42)) AS Q2) AS Q3
Optimized Statement: a SQL-like representation of the query after rewriting:
• View merging
• Redirection to summary tables
• Pre-computation of constant expressions
• Subquery-to-join transformations
The interesting part of q1.exfmt
A tree of operators
Every operator has:
►Rowcount estimate: 92.6111
►Operator name: IXSCAN
►Operator number: (6)
►Cost: 16.3696
►I/O count: 1
If you forget what the numbers mean, look at the RETURN operator! It serves as a legend.
A sample operator:

      92.6111
      IXSCAN
      (   6)
      16.3696
         1

Total Cost: 45.259
Query Degree: 1

          Rows
         RETURN
         (   1)
          Cost
           I/O
           |
           1
         GRPBY
         (   2)
         45.2234
           17
           |
        0.377555
         FETCH
         (   3)
         45.2096
           17
       /---+---\
  92.6111       19450
  RIDSCN   TABLE: SENGLERT
  (   4)       PRODUCT
  18.7684
     1
     |
  92.6111
   SORT
  (   5)
  18.7169
     1
     |
  92.6111
  IXSCAN
  (   6)
  16.3696
     1
     |
   19450
INDEX: SENGLERT
   PRODX2

Reading the tree from the bottom up:
►INDEX PRODX2: Relational index on (category, prodkey)
►IXSCAN (6): Relational index scan: CATEGORY = 42
►SORT (5): Sort row IDs (RIDs) generated by index scan
►FETCH (3): Fetch rows from base table and apply SUB_CATEGORY = 3 predicate
►GRPBY (2): Compute aggregate: count(*)
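The five values printed for each operator always appear in the order given by the RETURN legend (rows, name, number, cost, I/O). As a minimal sketch, assuming a node has already been cut out of the tree as five separate lines, they can be read into a record like this. `Operator` and `parse_operator` are hypothetical helper names, not part of any DB2 tool.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    rows: float    # estimated row count flowing out of the operator
    name: str      # operator name, e.g. "IXSCAN"
    number: int    # operator number, printed in parentheses
    cost: float    # cumulative cost in timerons
    io: float      # estimated I/O count


def parse_operator(block):
    """Parse one five-line operator node in legend order."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    rows, name, number, cost, io = lines
    return Operator(
        rows=float(rows),
        name=name,
        number=int(number.strip("() ")),   # "(   6)" -> 6
        cost=float(cost),
        io=float(io),
    )


sample = """
    92.6111
    IXSCAN
    (   6)
    16.3696
       1
"""
op = parse_operator(sample)
print(op)
```

Parsing a whole tree is harder because sibling operators share text lines; this sketch deliberately handles a single node only.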
How to see what each operator is doing. Example: IXSCAN in operator 6. Look at the detail section in the output.
6) IXSCAN: (Index Scan)
…
Predicates:
----------
3) Start Key Predicate
Comparison Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.0047615
Predicate Text:
--------------
(Q1.CATEGORY = 42)
3) Stop Key Predicate
Comparison Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.0047615
Predicate Text:
--------------
(Q1.CATEGORY = 42)
Input Streams:
1) From Object SENGLERT.PRODX2
Estimated number of rows: 19450
Number of columns: 2
Output Streams:
2) To Operator #5
Estimated number of rows: 92.6111
What predicate is being evaluated? See the Predicate Text entries.
PRODX2 is the relational index on (category, prodkey)
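The numbers in this detail section fit together arithmetically: the output row estimate is the input row estimate multiplied by the predicate's filter factor. The check below uses only values printed above. The start key and stop key entries describe the same predicate, so the factor is applied once. The tiny gap from the printed 92.6111 presumably comes from the filter factor being displayed in rounded form.

```python
input_rows = 19450          # "Estimated number of rows" on the input stream
filter_factor = 0.0047615   # "Filter Factor" of (Q1.CATEGORY = 42)
printed_output = 92.6111    # "Estimated number of rows" on the output stream

# Cardinality estimate: rows in, scaled by the fraction expected to qualify.
estimated_output = input_rows * filter_factor

print(f"computed: {estimated_output:.2f}, printed in plan: {printed_output}")
```

The same relationship is a quick way to sanity-check any operator whose row estimate looks surprising.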
EXPLAIN plan operators – last three are XML-specific
Operator   Description
DELETE     Deletes rows from a table.
FETCH      Fetches rows from a table.
FILTER     Filters data.
GENROW     Used by DB2 to generate rows of data.
GRPBY      Groups rows.
HSJOIN     Performs a hash join in which the qualified rows from tables are hashed.
INSERT     Inserts rows into a table.
IXAND      The AND’ing of the results of multiple index scans.
IXSCAN     Scans or probes an index on relational data.
MSJOIN     Performs a merge-sort join.
NLJOIN     Performs a nested loop join.
RETURN     Returns data from a query.
RIDSCN     Scans a list of row identifiers (RIDs).
RPD        Retrieves data from a non-relational remote data source.
SHIP       Retrieves data from a remote data source.
SORT       Sorts rows or row IDs from a table.
TBSCAN     Performs a table scan.
TEMP       Stores data in a temporary table.
TQ         A table queue, for parallelization of a query.
UNION      Concatenates streams of rows from multiple tables.
UNIQUE     Eliminates rows with duplicate values.
UPDATE     Updates data in the rows of a table.
XANDOR     Evaluates multiple predicates simultaneously with two or more XISCAN operators.
XISCAN     Scans or probes an index on XML data.
XSCAN      Navigates XML data to evaluate XPath expressions.
Table courtesy of Matthias Nicola
XML-specific EXPLAIN operators
XSCAN – XML document scan. Traverses XML document trees, extracts document sequences or values, evaluates predicates
XISCAN – XML index scan
►Input: path-value pair such as $doc/product[p_price > 1.00]
►Output: row IDs of qualifying documents and node IDs within those documents
XANDOR – XML index AND-ing
►Input: two or more XISCANs
►Output: row IDs of documents that satisfy all XISCANs
►Can be used if:
– Only equality predicates are used.
– There are no wildcards in the index lookup path.
– All predicates involve the same XML column.
►XANDOR does round-robin probing of indexes to efficiently find qualifying row IDs
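The three XANDOR conditions can be kept in mind as a simple checklist. The sketch below is a reading aid only, not the optimizer's logic: `XmlPredicate` and `xandor_conditions_met` are hypothetical names, and the wildcard test is simplified to looking for `*` in the path.

```python
from dataclasses import dataclass


@dataclass
class XmlPredicate:
    column: str     # XML column the predicate navigates, e.g. "PRDDOC"
    path: str       # index lookup path, e.g. "/product/category"
    operator: str   # comparison operator, e.g. "=" or "<"


def xandor_conditions_met(predicates):
    """Check the slide's three conditions for a list of indexed predicates."""
    if len(predicates) < 2:          # XANDOR takes two or more XISCANs
        return False
    all_equality = all(p.operator == "=" for p in predicates)
    # Simplification: only "*" is treated as a wildcard in this sketch.
    no_wildcards = all("*" not in p.path for p in predicates)
    same_column = len({p.column for p in predicates}) == 1
    return all_equality and no_wildcards and same_column


two_equalities = [
    XmlPredicate("PRDDOC", "/product/category", "="),
    XmlPredicate("PRDDOC", "/product/sub_category", "="),
]
with_range = [
    XmlPredicate("PRDDOC", "/product/category", "<"),
    XmlPredicate("PRDDOC", "/product/sub_category", "="),
]
print(xandor_conditions_met(two_equalities), xandor_conditions_met(with_range))
```

Two equality predicates on the same column pass the checklist, while a range predicate fails it.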
XML extraction, relational predicate (Q2)
explain plan for
select xmlquery('$PRDDOC/product/item_desc/text()') from product where prodkey = 1;
!db2exfmt -d popsser -1 -t|more;

            Rows
           RETURN
           (   1)
            Cost
             I/O
             |
             1
           NLJOIN
           (   2)
           26.2359
             3
           /-+--\
        1          1
      FETCH      XSCAN
      (   3)     (   5)
      18.1042    8.13169
        2          1
    /---+---\
   1         19450
IXSCAN   TABLE: SENGLERT
(   4)       PRODUCT
9.9514         Q2
   1
   |
 19450
INDEX: SYSIBM
SQL090217203311130
   Q2

►IXSCAN (4): Relational index scan: PRODKEY = 1
►FETCH (3): FETCH documents from base table
►XSCAN (5): the navigation operator; extracts /product/item_desc/text()
►NLJOIN (2): This NLJOIN operator is not really joining anything. It delivers documents to the XSCAN operator.
Details of XSCAN operator (5)
5) XSCAN : (XML Doc Navigation)
Arguments:
INPUTXID: (Context Node)
PRDDOC
JN INPUT: (Join input leg)
INNER
XPATH : (Internal XPath Expression)
($INTERNAL_XMLTOXML_NIEO$(Q2.PRDDOC))/product/item_desc/(text())(:-->$C0:)
FAQ about Q2
Q. How come the plan shows a NLJOIN? There is no join happening.
A. True, there isn’t. This is a notation used to indicate that documents are being passed to XSCAN. The picture shows how to think of what is happening. Imagine that the FETCH feeds the XSCAN.

Actual Plan

            Rows
           RETURN
           (   1)
            Cost
             I/O
             |
             1
           NLJOIN
           (   2)
           26.2359
             3
           /-+--\
        1          1
      FETCH      XSCAN
      (   3)     (   5)
      18.1042    8.13169
        2          1
    /---+---\
   1         19450
IXSCAN   TABLE: SENGLERT
(   4)       PRODUCT
9.9514         Q2
   1
   |
 19450
INDEX: SYSIBM
SQL090217203311130
   Q2

How to think about it – FETCH feeds XSCAN

        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
         1
       XSCAN
       (   2)
       8.13169
         |
         1
       FETCH
       (   3)
       18.1042
         2
     /---+---\
   1          19450
IXSCAN   TABLE: SENGLERT
(   4)       PRODUCT
9.9514         Q2
   1
   |
 19450
INDEX: SYSIBM
SQL090217203311130
   Q2

Idea/pictures courtesy of M. Nicola
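The "FETCH feeds XSCAN" picture can be mimicked with two chained Python generators. This is a toy model under stated assumptions: the rows and documents in `ROWS` are invented for illustration, `fetch` stands in for IXSCAN plus FETCH, and `xscan` stands in for the navigation step. It is not how DB2 implements either operator.

```python
import xml.etree.ElementTree as ET

# Invented sample rows: a relational key plus an XML document per row.
ROWS = [
    {"prodkey": 1,
     "prddoc": "<product><prodkey>1</prodkey><item_desc>Cola 12oz</item_desc></product>"},
    {"prodkey": 2,
     "prddoc": "<product><prodkey>2</prodkey><item_desc>Lemonade</item_desc></product>"},
]


def fetch(rows, prodkey):
    """Stand-in for IXSCAN + FETCH: yield rows where prodkey matches."""
    for row in rows:
        if row["prodkey"] == prodkey:
            yield row


def xscan(rows):
    """Stand-in for XSCAN: navigate each document to /product/item_desc/text()."""
    for row in rows:
        root = ET.fromstring(row["prddoc"])   # root element is <product>
        yield root.findtext("item_desc")


# Each fetched document is handed to the navigation step, one at a time.
result = list(xscan(fetch(ROWS, 1)))
print(result)
```

The plan draws this hand-off as an NLJOIN with XSCAN on the inner leg, but conceptually it is a pipeline, not a join of two data sets.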
XQUERY that uses an XML index for one predicate (Q3)
xquery for $i in db2-fn:xmlcolumn('PRODUCT.PRDDOC')
where $i/product/category = 54
return <result>{$i/product/item_desc}</result>;

              Rows
             RETURN
             (   1)
              Cost
               I/O
               |
            30.7726
             FILTER
             (   2)
               |
            32.0548
             NLJOIN
             (   3)
            /--+--\
     34.1379       0.938979
      FETCH         XSCAN
      (   4)        (   8)
    /---+----\
34.1379       19450
RIDSCN   TABLE: SENGLERT
(   5)       PRODUCT
   |
34.1379
 SORT
(   6)
   |
34.1379
XISCAN
(   7)
   |
 19450
XMLIN: SENGLERT
DIM_PRODCATEGORYIDX
   Q2

►XISCAN (7): XML index scan on /product/category = 54
►SORT (6): SORT RIDs of rows with qualifying docs
►XSCAN (8): the navigation operator; extracts /product/item_desc and rechecks /product/category = 54
►NLJOIN (3): Again, not a real nested loop join. Delivers documents for navigation.
Details of XSCAN operator 8:
XPATH : (Internal XPath Expression)
Q2.PRDDOC/{(.[(product/category = 54)])(:-->$C0:),product/(item_desc)(:-->$C1:)}
FAQ about Q3
Q. Why is there a NLJOIN shown? This is not a join.
►A. See FAQ for Q2
Q. The details for XSCAN operator (8) show that it does two things: extraction of /product/item_desc and re-evaluation of the /product/category predicate. Why do we need to re-evaluate the predicate? Hasn’t the index scan XISCAN operator (7) already returned only the rows with documents satisfying the predicate?
►A. Good question! It turns out that there are some (rare) cases in which the index can return documents that don’t satisfy the predicate (but it never misses any that do). So we are careful and plan a navigation to make sure that the predicate is really satisfied. However, a run-time optimization is able to avoid this “extra” navigation in many cases. Often, we are able to detect that we don’t need to do it.
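The "index may over-report, so recheck by navigation" idea can be sketched as a filter over candidate row IDs. Everything here is invented for illustration: the documents in `DOCS`, the row IDs, and the artificial false positive (row 103). The deck does not say which cases cause the index to over-report, and this sketch does not try to model them.

```python
import xml.etree.ElementTree as ET

# Invented documents keyed by row ID.
DOCS = {
    101: "<product><category>54</category><item_desc>Cola</item_desc></product>",
    102: "<product><category>54</category><item_desc>Root beer</item_desc></product>",
    103: "<product><category>55</category><item_desc>Lemonade</item_desc></product>",
}

# Pretend the index scan returned a superset: 103 is a false positive.
candidate_rids = [101, 102, 103]


def recheck(doc, category):
    """Navigate the document and re-evaluate /product/category = <category>."""
    root = ET.fromstring(doc)   # root element is <product>
    return any(float(c.text) == category for c in root.findall("category"))


# Keep only the candidates whose documents really satisfy the predicate.
qualifying = [rid for rid in candidate_rids if recheck(DOCS[rid], 54)]
print(qualifying)
```

Because the index never misses a qualifying document, filtering its candidates is enough; no document outside the candidate list needs to be visited.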
XQUERY with two predicates and two XML indexes (Q4)
xquery for $i in db2-fn:xmlcolumn('PRODUCT.PRDDOC')
where $i/product/category = 54 and
$i/product/sub_category = 3
return <result>{$i/product/item_desc}</result>;

                           RETURN
                           (   1)
                             |
                          0.067333
                           FILTER
                           (   2)
                             |
                          0.0701385
                           NLJOIN
                           (   3)
                          /--+---\
                  0.0747235       0.938641
                    FETCH          XSCAN
                    (   4)         (  10)
                 /----+----\
         0.0747235          19450
          RIDSCN       TABLE: SENGLERT
          (   5)           PRODUCT
            |
         0.0747235
           SORT
          (   6)
            |
         0.0747235
          XANDOR
          (   7)
   /--------+---------\
34.1379                42.5735
XISCAN                 XISCAN
(   8)                 (   9)
   |                      |
 19450                  19450
XMLIN: SENGLERT        XMLIN: SENGLERT
DIM_PRODCATEGORYIDX    DIM_PRODSUBCATEGORYIDX
   Q2                     Q2

►XISCAN (8): XML index scan on /product/category = 54
►XISCAN (9): XML index scan on /product/sub_category = 3
►XANDOR (7): XML index AND-ing. See the “XML-specific EXPLAIN operators” slide. Output: RIDs that satisfy both XISCANs.
►XSCAN (10): navigation to extract item_desc as well as to re-check predicates on category and sub_category
One XML index, one relational index (Q5)
explain plan for
select count(*)
from product
where xmlexists('$PRDDOC/product/category[. < 10]')
and prodkey between 30 and 100;
!db2exfmt -d popsser -1 -t|more;

                      RETURN
                      (   1)
                        |
                        1
                      GRPBY
                      (   2)
                        |
                     24.0385
                     ^NLJOIN
                      (   3)
                      /-+--\
               24.0385        1
                FETCH       XSCAN
                (   4)      (  10)
              /---+----\
        24.0385         19450
        RIDSCN     TABLE: SENGLERT
        (   5)         PRODUCT
          |
        24.0385
         SORT
        (   6)
          |
        24.0385
         IXAND
        (   7)
  /-------+-------\
70.9319            6591.52
IXSCAN             XISCAN
(   8)             (   9)
   |                  |
 19450              19450
INDEX: SYSIBM      XMLIN: SENGLERT
SQL090217203311130 DIM_PRODCATEGORYIDX

►XISCAN (9): XML index scan on /product/category < 10
►IXAND (7): index AND-ing. Can be used with a combination of XML and relational indexes. Allows range predicates and wildcards in XML expressions.
►XSCAN (10): navigation to re-check predicate on category
For join queries: The DAILY_SALES table
One row per sale
Each row has a foreign key “prodkey” that refers to our product table
Relational index on “prodkey” (others as well, but not used in our examples)
One XML document per row, sample at right. Replicates keys of relational columns, adds other data.
XML indexes: /fact/keys/prodkey (type double)
CREATE TABLE "SENGLERT"."DAILY_SALES" (
  "PERKEY" INTEGER NOT NULL ,
  "STOREKEY" INTEGER NOT NULL ,
  "CUSTKEY" INTEGER NOT NULL ,
  "PRODKEY" INTEGER NOT NULL ,
  "PROMOKEY" INTEGER NOT NULL ,
  "SALDOC" XML );
Relational join with XML predicate and extractions (Q6)
select px.sub_category, sx.shelf_number
from daily_sales s, product p,
xmltable('$SALDOC/fact/measures/shelf_number'
columns shelf_number integer path '.') as sx,
xmltable('$PRDDOC/product[category < 150]'
columns sub_category varchar(30) path 'sub_category') as px
where s.prodkey = p.prodkey;
For certain product categories, find sales of those categories and list their shelf numbers
There’s an XML index on /product/category
p.prodkey and s.prodkey are (indexed) relational columns
Relational join with XML predicate and extractions (Q6)

                              Rows
                             RETURN
                             (   1)
                              Cost
                               I/O
                               |
                           2.20389e+07
                             NLJOIN
                             (   2)
                            /--+---\
                  2.16407e+07       1.0184
                    HSJOIN           XSCAN
                    (   3)           (  11)
               /-----+------\
    2.22907e+07              18882.9
      TBSCAN                 NLJOIN
      (   4)                 (   5)
        |                    /-+--\
    2.22907e+07          9725      1.94168
  TABLE: SENGLERT        FETCH      XSCAN
    DAILY_SALES          (   6)     (  10)
                       /---+----\
                    9725         19450
                   RIDSCN   TABLE: SENGLERT
                   (   7)       PRODUCT
                     |
                    9725
                    SORT
                   (   8)
                     |
                    9725
                   XISCAN
                   (   9)
                     48
                     |
                   19450
               XMLIN: SENGLERT
             DIM_PRODCATEGORYIDX

►XISCAN (9): evaluates /product[category < 150]
►XSCAN (10): navigation to re-evaluate /product[category < 150] and to extract /product/sub_category
►HSJOIN (3): hash join on s.prodkey = p.prodkey
►XSCAN (11): navigation to extract /fact/measures/shelf_number
►NLJOIN (2) and NLJOIN (5): not real nested loop joins
The same join as Q6, with XML join keys (Q7)
Same query as before, except the relational join predicate on prodkey has been replaced by an XML join predicate inside XMLEXISTS
Remember that $SALDOC/fact/keys/prodkey and $PRDDOC/product/prodkey have XML indexes
explain plan for
select px.sub_category, sx.shelf_number
from daily_sales s, product p,
xmltable('$SALDOC/fact/measures/shelf_number'
columns shelf_number integer path '.') as sx,
xmltable('$PRDDOC/product[category < 150]'
columns sub_category varchar(30) path 'sub_category') as px
where xmlexists('$SALDOC/fact/keys[prodkey = $PRDDOC/product/prodkey]');
!db2exfmt -d popsser -1 -t|more;
Plan for the XML join query Q7

                             RETURN
                             (   1)
                               |
                           1.72347e+07
                             NLJOIN
                             (   2)
                        /------+------\
                 18694.5               921.915
                 NLJOIN                TBSCAN
                 (   3)                (   9)
                 /-+--\                  |
             9725      1.92232       2.05329e+07
            FETCH       XSCAN           TEMP
            (   4)      (   8)         (  10)
          /---+----\                     |
       9725         19450            2.05329e+07
      RIDSCN   TABLE: SENGLERT         NLJOIN
      (   5)       PRODUCT             (  11)
        |                            /---+---\
       9725                 2.22907e+07       0.921143
       SORT                   TBSCAN           XSCAN
      (   6)                  (  12)           (  13)
        |                       |
       9725                 2.22907e+07
      XISCAN              TABLE: SENGLERT
      (   7)                DAILY_SALES
        |
      19450
  XMLIN: SENGLERT
DIM_PRODCATEGORYIDX

►NLJOIN (2): This IS a real nested loop join! NLJOIN is the only option for XML joins.
►XSCAN (13): extracts /fact/keys/prodkey and /fact/measures/shelf_number
►TBSCAN (12): TBSCAN of DAILY_SALES table. Hmmmm… Why is the index on /fact/keys/prodkey not used?
►TEMP (10): TEMP of entire DAILY_SALES table. Why??
►XISCAN (7): evaluates /product[category < 150]
►XSCAN (8): retrieves product/prodkey
Q8: Corrected join from Q7 with casts around XML join keys!
For XML joins: need to cast both sides of the join predicate in order to enable use of the XML index(es)!
Now it is possible to use index(es) on $SALDOC/fact/keys/prodkey and $PRDDOC/product/prodkey
explain plan for
select px.sub_category, sx.shelf_number
from daily_sales s, product p,
xmltable('$SALDOC/fact/measures/shelf_number'
columns shelf_number integer path '.') as sx,
xmltable('$PRDDOC/product[category < 150]'
columns sub_category varchar(30) path 'sub_category') as px
where xmlexists('$SALDOC/fact/keys[prodkey/xs:double(.) = $PRDDOC/product/prodkey/xs:double(.)]');
!db2exfmt -d popsser -1 -t|more;
Plan for the corrected XML join query (Q8)

                                  RETURN
                                  (   1)
                                   Cost
                                    I/O
                                    |
                                3.83852e+08
                                  NLJOIN
                                  (   2)
                         /----------+----------\
                 18694.5                        20532.9
                 NLJOIN                         NLJOIN
                 (   3)                         (   9)
                 /-+--\                         /-+--\
             9725      1.92232           753.287      27.2577
            FETCH       XSCAN             FETCH        XSCAN
            (   4)      (   8)            (  10)       (  14)
          /---+----\                    /---+----\
       9725         19450          753.287        2.22907e+07
      RIDSCN   TABLE: SENGLERT     RIDSCN      TABLE: SENGLERT
      (   5)       PRODUCT         (  11)        DAILY_SALES
        |                            |
       9725                       753.287
       SORT                        SORT
        48                           5
        |                            |
       9725                       753.287
      XISCAN                      XISCAN
      (   7)                      (  13)
        |                            |
      19450                     2.22907e+07
  XMLIN: SENGLERT             XMLIN: SENGLERT
DIM_PRODCATEGORYIDX              PRODKEYIDX

►NLJOIN (2): This IS a real nested loop join! NLJOIN is the only option for XML joins.
►XSCAN (14): extracts /fact/measures/shelf_number and rechecks the /fact/keys/prodkey predicate
►XISCAN (13): Hooray, XISCAN of /fact/keys/prodkey index (join)
►XISCAN (7): evaluates /product[category < 150]
►XSCAN (8): retrieves product/prodkey
Things to remember when reading EXPLAINs
Start reading from the lower left corner, since that is (generally) where execution begins
Ignore the cost numbers
►They are in units called timerons that correspond somewhat to estimated elapsed time
►They give clues about the optimizer’s expense estimates, but are of little value to an outside observer
DO watch the estimated row counts – they may help you understand the optimizer’s decisions
Most value in EXPLAINs:
►Determining index use
►Join order
►Row count (“cardinality”) estimates
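Checking index use in a long plan can be done mechanically as a first pass. The sketch below simply counts access operators in the plan text. `access_summary` is a hypothetical helper, and the fragment fed to it is a hand-trimmed piece of the Q7 plan, not real db2exfmt output.

```python
from collections import Counter

# Operators that read data directly: relational index, XML index, table scan.
ACCESS_OPERATORS = {"IXSCAN", "XISCAN", "TBSCAN"}


def access_summary(plan_text):
    """Count access operators in plan text; '^' marks on operator names are dropped."""
    tokens = (tok.lstrip("^") for tok in plan_text.split())
    return Counter(tok for tok in tokens if tok in ACCESS_OPERATORS)


# Hand-trimmed fragment of the Q7 plan (operator names only, layout simplified).
q7_fragment = """
 NLJOIN          TBSCAN
 (   3)          (   9)
 FETCH   XSCAN    TEMP
 SORT    TBSCAN   XSCAN
 XISCAN  TABLE: SENGLERT
 (   7)    DAILY_SALES
"""
summary = access_summary(q7_fragment)
print(dict(summary))
```

A count like this only flags where to look. Two TBSCANs and a single XISCAN in Q7 are the hint that the prodkey index is not being used; the tree itself still has to be read to see why.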
Further reading
►http://download.boulder.ibm.com/ibmdl/pub/software/dw/dm/db2/bestpractices/DB2BP_Query_Tuning_0508I.pdf
►http://download.boulder.ibm.com/ibmdl/pub/software/dw/dm/db2/bestpractices/DB2BP_XML_0508I.pdf
►http://www.ibm.com/developerworks/data/library/techarticle/dm-0611nicola/
►http://www.ibm.com/developerworks/data/library/techarticle/dm-0508kapoor/
►http://hoadb2ug.org/Docs/accessplans.pdf (old, but good)
Acknowledgements
Matthias Nicola – review and ideas
Wolfgang Krause – careful review
Anjali Norwood – patient answers to random questions