© 2017 IBM Corporation
IBM Power Systems
14E – Achieving Good Performance with SQL
Rob [email protected]
DB2 for i Consultant
Power Server + IBM i
• System viewed as a database server
• DB2 for i (integrated part of IBM i)
• Data Centric focus – support for business logic in the DBMS
• SQL (DDL and DML) as primary interface to DBMS
• GUI to DBMS via System i Navigator
DB2 for i
Power System + IBM i Architecture
[Figure: a single system with single-level storage, where storage management presents tables and memory to queries, running on multiple CPUs and cores (N-way SMP)]
IBM i Objects
SQL          IBM i (OS)
schema       library
table        physical file
view         logical file
index        keyed logical file
row          record
column       field
log          journal
IBM i Objects
• Physical file ~= Table = Data Space
– Contains the actual data (rows and columns)
• Logical file ~= View = Virtual Table
– Does not contain actual data
– Definition of a “result set”
• Keyed Logical file ~= Index = Data Structure for look up
– Contains “keys” that reference or point to rows
– Data can be extracted from keys
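Here is a rough, runnable illustration of the table / view / index distinction above. It uses Python's built-in sqlite3 module as a stand-in for DB2 for i. The orders table, the ny_orders view, and the idx_orders_state index are hypothetical names. The view stores only its definition, so a row inserted into the table shows up through the view immediately. SQLite's EXPLAIN QUERY PLAN confirms that the index gives the engine a keyed path to the matching rows.

```python
import sqlite3


def demo_objects():
    con = sqlite3.connect(":memory:")
    # Table (~ physical file): holds the actual rows and columns.
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT, amount REAL)")
    # View (~ logical file): only a stored definition of a result set.
    con.execute("CREATE VIEW ny_orders AS SELECT id, amount FROM orders WHERE state = 'NY'")
    # Index (~ keyed logical file): keys that point back to table rows.
    con.execute("CREATE INDEX idx_orders_state ON orders (state)")

    con.executemany("INSERT INTO orders (state, amount) VALUES (?, ?)",
                    [("NY", 10.0), ("CA", 20.0), ("NY", 5.0)])
    before = con.execute("SELECT COUNT(*) FROM ny_orders").fetchone()[0]

    # The view holds no data of its own, so a new table row is visible at once.
    con.execute("INSERT INTO orders (state, amount) VALUES ('NY', 7.5)")
    after = con.execute("SELECT COUNT(*) FROM ny_orders").fetchone()[0]

    # Ask SQLite how it would find the NY rows; the plan detail is the last column.
    plan = con.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE state = 'NY'").fetchall()
    uses_index = any("idx_orders_state" in str(row[-1]) for row in plan)
    con.close()
    return before, after, uses_index


if __name__ == "__main__":
    print(demo_objects())  # (2, 3, True)
```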
IBM i Objects
One Database Management System with multiple interfaces
[Figure: a DB2 DB file (PF) object, created with CREATE TABLE (SQL DDL) or CRTPF (Command Language, CL), and accessed through native I/O from high level languages or through SQL (embedded SQL, ODBC, JDBC, ADO.NET, CLI …) with SQL DML such as SELECT... FROM...]
SQL Performance Quiz
• When discussing SQL performance, we’re primarily talking about improving SQL Query performance
– Reading and manipulating data
• Queries (a.k.a. DML) are the most common and most performance-sensitive parts of SQL
– Frequently invoked
– Often involve large quantities of data
• Even defining objects in SQL often requires use of queries
– CREATE TABLE… AS (SELECT…..) WITH DATA
– CREATE VIEW….
Traditional Database Programming
How Long Do They Take?
Anatomy of an SQL query
What are the steps?
SQL Query Processing
With Native Record Level Access…
You tell DB2 what to do AND how to do it
With SQL…
You tell DB2 what to do, NOT how to do it
(though you give it options)
SQL Query Processing…
With Native Record Level Access…
You think in records
With SQL…
You should think in SETs
Combine steps!
Example – Combining SQL

Multiple SQL Statements

  DECLARE cursor1 CURSOR FOR
    SELECT custid FROM order_table
    WHERE ord_date = '2016-11-03';
  OPEN cursor1;
  DO
    FETCH cursor1 INTO :v_custid;
    /* One extra query per order row */
    SELECT cust_name, cust_address
      INTO :v_name, :v_address
      FROM cust_table
      WHERE custid = :v_custid;
    /* Process customer data */
  UNTIL ( no more data );
  CLOSE cursor1;

Combined SQL Request

  DECLARE cursor1 CURSOR FOR
    SELECT c.cust_name, c.cust_address
    FROM order_table o, cust_table c
    WHERE o.ord_date = '2016-11-03'
      AND o.custid = c.custid;
  OPEN cursor1;
  DO
    FETCH cursor1 INTO :v_name, :v_address;
    /* Process customer data */
  UNTIL ( no more data );
  CLOSE cursor1;

Much better approach!
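The same contrast can be sketched with Python's sqlite3 module standing in for DB2 for i. The table contents are made up for illustration. The row-at-a-time version issues one extra statement per order. The combined request returns the same customer rows in a single statement and leaves the choice of join method to the optimizer.

```python
import sqlite3


def setup():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE cust_table (custid INTEGER PRIMARY KEY, cust_name TEXT, cust_address TEXT);
        CREATE TABLE order_table (ordid INTEGER PRIMARY KEY, custid INTEGER, ord_date TEXT);
        INSERT INTO cust_table VALUES (1, 'Acme', '1 Main St'), (2, 'Bolt', '2 Oak Ave'),
                                      (3, 'Core', '3 Elm Rd');
        INSERT INTO order_table VALUES (10, 1, '2016-11-03'), (11, 2, '2016-11-03'),
                                       (12, 3, '2016-11-04');
    """)
    return con


def row_at_a_time(con, day="2016-11-03"):
    """Record-level style: one cursor plus one extra SELECT per fetched row."""
    statements = 1
    results = []
    order_rows = con.execute(
        "SELECT custid FROM order_table WHERE ord_date = ?", (day,)).fetchall()
    for (custid,) in order_rows:
        statements += 1
        results.append(con.execute(
            "SELECT cust_name, cust_address FROM cust_table WHERE custid = ?",
            (custid,)).fetchone())
    return sorted(results), statements


def set_based(con, day="2016-11-03"):
    """Set-based style: one joined request, same rows."""
    rows = con.execute("""
        SELECT c.cust_name, c.cust_address
        FROM order_table o, cust_table c
        WHERE o.ord_date = ? AND o.custid = c.custid""", (day,)).fetchall()
    return sorted(rows), 1


if __name__ == "__main__":
    con = setup()
    print(row_at_a_time(con))  # same rows, 3 statements
    print(set_based(con))      # same rows, 1 statement
```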
SQL Query Processing – New Request
[Figure: DB2 for i timeline for a new SQL request: Optimize, Open, Run, plotted against time]
The Optimizer: “The Programmer in the Box”
• Writes the best(?) program to fulfill your request
• Provides the recipe
• Provides the methods
• Does no cooking
(Query) Access Plans
Access Plan = The output of query optimization
Contents:
• A control structure that contains information on the actions necessary to satisfy each SQL request
• These contents include:
– Access Method
– Info on associated tables and indexes
– Any applicable program and/or environment information
Optimization... the intersection of various factors
The Plan is shaped by:
• Server configuration
• Server attributes
• Version/Release/Modification Level
• SMP
• Database design
• Table sizes, number of rows
• Views and Indexes (Radix, EVI)
• Work management
• Interfaces: Static, Dynamic, Extended Dynamic
• SQL Request
• Job, Query attributes
• Server performance
Application Input to Optimizer and Engine
Inputs to query optimization and execution:
• System values: CHGSYSVAL()
• Job’s query attributes: CHGQRYA()
• QAQQINI file
• SQL SET statement
• SQL SELECT statement clause
• Connection attributes
• High Level Language attributes
Examples
• Query Time Limit
• Parallel Degree
• Optimization Goal
• Optimizer Feedback
• Isolation Level
Cost Based Query Optimization
• The DB2 for i Optimizer performs "cost based" optimization
• "Cost" is defined as the estimated time it takes to run the request
• "Costing" refers to the comparison of a given set of algorithms and methods in an attempt to identify the "fastest" plan
• Optimization is based on time estimate, not on resource utilization
• Usually the fastest plan is also the most resource efficient plan, but this is not always true
• The goal of the optimizer is to eliminate I/O as early as possible by identifying the best path to and through the data
• The optimizer has the ability and freedom to "rewrite" the query
Query Optimization
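Here is a toy illustration of "time, not resources" as the costing criterion. The CandidatePlan type and all of the numbers are hypothetical, not values the DB2 for i optimizer produces. The point is only that the plan with the lowest estimated elapsed time wins, even when another plan would use less CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePlan:
    name: str
    est_seconds: float      # estimated elapsed time: what costing compares
    est_cpu_seconds: float  # resource use: not part of the comparison


def choose_plan(candidates):
    """Toy costing step: keep the plan with the lowest estimated run time."""
    return min(candidates, key=lambda plan: plan.est_seconds)


plans = [  # hypothetical estimates
    CandidatePlan("serial table scan", est_seconds=12.0, est_cpu_seconds=10.0),
    CandidatePlan("parallel (SMP) table scan", est_seconds=4.0, est_cpu_seconds=14.0),
    CandidatePlan("index probe", est_seconds=6.0, est_cpu_seconds=2.0),
]

if __name__ == "__main__":
    print(choose_plan(plans).name)  # parallel (SMP) table scan
```

In this made-up set, the fastest plan uses the most CPU, while the index probe would be the most resource-efficient choice.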
Query Phases
Query processing can be divided into three phases:
Query Validation
– Validate the query request
– Validate existing access plan
– Build internal query structures
Query Optimization (We can affect this...)
– Choose most efficient access method
– Build access plan
Query Execution
– Build the structures needed for the query cursor
– Build the structures for any temporary indexes (if needed)
– Build and activate the query cursor (ODP)
– Generate any information requested (We can use this...): SQL Performance Monitor data, Index Advised, SQL Plan Cache, etc.
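To make the three phases concrete, here is a deliberately simplified sketch in Python. ToyQueryEngine, AccessPlan, and the rule "use an index probe if any index exists" are all hypothetical. Real plan validation checks far more than whether the set of indexes changed. The sketch shows the shape of the flow: validation reuses a cached access plan while it is still valid, and only otherwise pays for optimization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessPlan:
    statement: str
    access_method: str
    indexes_seen: frozenset  # indexes that existed when the plan was built


@dataclass
class ToyQueryEngine:
    indexes: set = field(default_factory=set)
    plan_cache: dict = field(default_factory=dict)
    optimizations: int = 0

    def validate(self, statement):
        """Validation: check the request, then reuse a cached plan if still valid."""
        if not statement.strip().upper().startswith("SELECT"):
            raise ValueError("this toy engine only accepts SELECT statements")
        plan = self.plan_cache.get(statement)
        if plan is not None and plan.indexes_seen == frozenset(self.indexes):
            return plan
        return None

    def optimize(self, statement):
        """Optimization: choose an access method, then build and cache the plan."""
        self.optimizations += 1
        method = "index probe" if self.indexes else "table scan"  # hypothetical rule
        plan = AccessPlan(statement, method, frozenset(self.indexes))
        self.plan_cache[statement] = plan
        return plan

    def run(self, statement):
        """Execution would build the cursor (ODP) and fetch; here it reports the method."""
        plan = self.validate(statement) or self.optimize(statement)
        return plan.access_method


if __name__ == "__main__":
    engine = ToyQueryEngine()
    q = "SELECT * FROM orders WHERE state = 'NY'"
    print(engine.run(q), engine.run(q), engine.optimizations)  # plan reused once
    engine.indexes.add("idx_orders_state")
    print(engine.run(q), engine.optimizations)  # new index forces re-optimization
```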
• Many different data access methods can be used to satisfy a query, each with its own strengths and weaknesses
• What data access method should we use to find the rows that contain the color blue within a 100 million row DB table?
...WHERE COLOR = 'BLUE'...
• When...
– 1 row contains the color blue
– 1,000 rows contain the color blue
– 100,000 rows contain the color blue
– 100,000,000 rows contain the color blue
• How does the optimizer know which choice to make?
Most Efficient Data Access Method
Data Access Methods
Cost based optimization dictates that the fastest access method for a given table will vary based upon the selectivity of the query
[Figure: response time (low to high) versus number of rows searched/accessed (few to many) for the Probe, Clustered and Skip Sequential, and Table Scan access methods]
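The crossover in the chart can be reproduced with a toy cost model for the 100 million row table from the BLUE example. The unit costs are assumptions chosen only to show the shape of the curves: 1 per row scanned, 50 per row fetched through the index, and 1,000 to start the probe. The clustered and skip sequential methods are left out.

```python
TABLE_ROWS = 100_000_000  # the 100 million row table from the BLUE example


def table_scan_cost(matching_rows):
    # A scan reads every row no matter how many match (argument kept for symmetry).
    return TABLE_ROWS * 1.0


def index_probe_cost(matching_rows):
    # Cheap start-up through the index, then a random read per matching row.
    return 1_000 + matching_rows * 50.0


def cheapest_method(matching_rows):
    costs = {
        "index probe": index_probe_cost(matching_rows),
        "table scan": table_scan_cost(matching_rows),
    }
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for blue_rows in (1, 1_000, 100_000, 100_000_000):
        print(f"{blue_rows:>11,} BLUE rows -> {cheapest_method(blue_rows)}")
```

With these made-up costs, the probe wins until roughly 2 million matching rows (about 2% of the table), and the table scan wins after that. Real crossover points depend on the statistics and environment the optimizer sees.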
Most Efficient Data Access Method
Optimizer uses the selectivity for the local selection and join predicates to determine how many rows must be processed for each access method considered
• The selectivity will always be calculated for each predicate (e.g. WHERE column1 > 100) using:
– Default Sources (default filtering based upon the operator used)
– Meta-Data Sources (existing indexes or column statistics)
• It's best to provide statistics for your most selective and least selective columns
• With the correct statistics, cost-based query optimizers more accurately estimate the number of rows to process
– Better Estimates = Better Performing Plans
• All query optimizers rely on stats for plan decisions
– DB2 for i relies on index stats and automatic column statistics
Statistics and Access Method Efficiency
Sources of Statistics (ranked from best to worst)
• Meta-data sources
– Existing indexes (Radix or Encoded Vector)
  More accurately describes multi-column key values
  Stats available immediately as the index maintenance occurs
  Selectivity estimates from radix by reading n keys
  Selectivity from EVI by reading symbol table values
– Column Statistics
  SQE only
  Column Cardinality, Histograms & Frequent Values List
  Constructed over a single column in a table
  Stored internally as a part of the table object after created
  Collected automatically by default for the system
  Stats not immediately maintained as the table changes
  Stats are refreshed as they become “stale” over time
• Default
– No representation of actual values in columns; information is derived
Statistics example
SQL statement:
  SELECT * FROM orders WHERE state = 'NY'
• With an index over column state (best)
– Optimizer can get a very accurate estimate of the number of records selected
• With column statistics
– Optimizer can get an average number of records selected per state value by knowing the number of states (50), a.k.a. the cardinality
• With default (worst)
– Optimizer guesses that 10% of the records in the orders table will be selected
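Here is a small worked example of how far apart the three estimates can be. The 1,000,000-row orders table and its skew (150,000 NY orders) are hypothetical. The 50-state cardinality and the 10% default guess come from the slide. Real column statistics also keep histograms and frequent values lists, which would catch the NY skew that this cardinality-only estimate misses.

```python
def estimate_rows(total_rows, source, value, *, cardinality=None, index_counts=None):
    """Toy estimate of how many rows satisfy `column = value`."""
    if source == "index":
        # An index over the column lets the optimizer count the actual keys.
        return index_counts.get(value, 0)
    if source == "column_stats":
        # Cardinality only: assume rows are spread evenly over the distinct values.
        return total_rows // cardinality
    if source == "default":
        # No statistics at all: the slide's 10% guess.
        return total_rows // 10
    raise ValueError(f"unknown statistics source: {source}")


if __name__ == "__main__":
    ORDERS = 1_000_000         # hypothetical table size
    NY_KEYS = {"NY": 150_000}  # hypothetical: NY is over-represented
    print(estimate_rows(ORDERS, "index", "NY", index_counts=NY_KEYS))   # 150000
    print(estimate_rows(ORDERS, "column_stats", "NY", cardinality=50))  # 20000
    print(estimate_rows(ORDERS, "default", "NY"))                       # 100000
```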
A proper indexing strategy is the key to success
More on indexing later
Optimization Goal
• Tells the optimizer how many rows you expect to fetch per transaction
• Optimizer builds a plan that is optimal for returning n or all rows expected
• Affects the query "start up" time and overall run time
Switching Gears - The Optimization Goal
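Optimization Goal
  SELECT * FROM Big_Table ORDER BY Col1
Ordering via: INDEX or SORT
• First I/O: read by key via an index (first n rows, then next n rows)
• All I/O: read, select all, and sort (all rows)
[Figure: time to return the first n rows, next n rows, and all rows under each goal]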
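A toy timing model shows why the goal can flip the ordering strategy for SELECT * FROM Big_Table ORDER BY Col1. The per-row and sort costs are invented for illustration. The index plan returns rows as it reads them but pays a random read per row. The sort plan reads everything cheaply and sorts before returning anything.

```python
def index_plan_ms(total_rows, n, per_row_ms=2.0):
    """Read by key via an index: rows come back in order as they are read."""
    return n * per_row_ms, total_rows * per_row_ms  # (first n rows, all rows)


def sort_plan_ms(total_rows, per_row_ms=0.5, sort_ms=20_000.0):
    """Read and select all, then sort: nothing is returned until the sort ends."""
    done = total_rows * per_row_ms + sort_ms
    return done, done  # first rows and all rows arrive at the same time


def choose_ordering(goal, total_rows, n):
    """Pick INDEX or SORT by the time the optimization goal cares about."""
    which = 0 if goal == "*FIRSTIO" else 1
    idx = index_plan_ms(total_rows, n)[which]
    srt = sort_plan_ms(total_rows)[which]
    return "INDEX" if idx <= srt else "SORT"


if __name__ == "__main__":
    print(choose_ordering("*FIRSTIO", 1_000_000, 10))  # INDEX
    print(choose_ordering("*ALLIO", 1_000_000, 10))    # SORT
```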
The Optimization Goal
Application behavior: FETCH 10 rows, wait, FETCH ALL rows, wait
Result set (1,000 rows): query plan for 10 rows vs. query plan for 1,000 rows: Different Plans
The Optimization Goal
Application behavior: FETCH 10 rows, wait, FETCH ALL rows, wait
Result set (10 rows): query plan for 10 rows vs. query plan for 10 rows: Same Plan
• Set as granular as SQL statement clause
– OPTIMIZE FOR n ROWS
– OPTIMIZE FOR ALL ROWS
• Set via QAQQINI options file -or- ODBC/JDBC connection attributes
– *FIRSTIO
– *ALLIO
• First I/O is the default for dynamic interfaces:
– ODBC, JDBC, STRSQL, dynamic SQL in programs
– Only a small number of rows expected to be read
• ALL I/O is the default otherwise
– Extended dynamic, RUNSQLSTM, INSERT + subSELECT, CLI, static SQL in programs
– All of result set expected to be read
• Optimization goal will affect the optimizer's decisions
– Use of indexes, SMP, temporary intermediate results like hash tables
– Tell the optimizer as much information as possible
– If the application fetches the entire result set, use *ALLIO
The Optimization Goal
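The defaults listed above can be captured as a small lookup, sketched here in Python. The interface labels and the two helper functions are hypothetical. Only the *FIRSTIO/*ALLIO values and the OPTIMIZE FOR n ROWS / OPTIMIZE FOR ALL ROWS clauses come from the slide.

```python
# Hypothetical labels for the interfaces the slide says default to First I/O.
FIRST_IO_DEFAULT_INTERFACES = {"ODBC", "JDBC", "STRSQL", "dynamic SQL"}


def default_goal(interface):
    """Return the default optimization goal described on the slide."""
    return "*FIRSTIO" if interface in FIRST_IO_DEFAULT_INTERFACES else "*ALLIO"


def optimize_clause(expected_rows=None):
    """Build the statement-level clause; None means the whole result set is fetched."""
    if expected_rows is None:
        return "OPTIMIZE FOR ALL ROWS"
    return f"OPTIMIZE FOR {int(expected_rows)} ROWS"


if __name__ == "__main__":
    print(default_goal("JDBC"), default_goal("RUNSQLSTM"))
    print("SELECT * FROM Big_Table ORDER BY Col1 " + optimize_clause(20))
```

Stating the expected row count on the statement itself is the most granular way to tell the optimizer how the result set will be consumed.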
Sharing Resources
Another one – Fair Share of Memory
[Figure: an application fetches data through a memory pool; the query’s memory “fair share” footprint will constrict or expand the flow of data to the application]
Sharing Resources: Fair Share of Memory
• Many simultaneous queries: correspondingly small fair share
• Few simultaneous queries: correspondingly large fair share
[Figure: two memory pools illustrating each case]
Fair Share of Memory: Anemic Plans
[Figure: a memory pool with 4 active queries but 15 expected]
In general: Fair share = Memory pool size / max active
Best practice: set max active for the pool to a realistic value and use shared pools set to *CALC
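Plugging numbers into the general rule shows where anemic plans come from. The 6,000 MB pool is a hypothetical size, and the real fair-share calculation considers more than this one division.

```python
def fair_share_mb(pool_size_mb, max_active):
    """General rule from the slide: fair share = memory pool size / max active."""
    if max_active <= 0:
        raise ValueError("max_active must be positive")
    return pool_size_mb / max_active


if __name__ == "__main__":
    POOL_MB = 6_000  # hypothetical shared pool size
    print(fair_share_mb(POOL_MB, 15))  # 400.0: share planned with max active = 15
    print(fair_share_mb(POOL_MB, 4))   # 1500.0: share if max active matched the 4 running
```

With max active at 15 but only 4 queries running, each query is optimized for 400 MB instead of the 1,500 MB that an evenly divided pool would give it. That gap is the anemic-plan situation the slide warns about.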
Query Optimization Feedback
[Figure: an SQL request goes through query optimization, which produces feedback via Indexes Advised, the SQE Plan Cache, SQE Plan Cache Snapshots, Detailed DB Monitor Data, and Visual Explain]
SQL Plan Cache
SQL Plan Cache – Show Statements
List is initially empty to allow for user filtering
Visual Explain
• Graphical representation of query plan
– Representation of the DB2 objects & data structures
– Representation of the methods and algorithms
– Associated environmental information
– Advice on indexes and column statistics
– Highlighting of specific query rewrites
– Highlighting of expensive methods
• Based on detailed optimizer information
– Detailed SQL Performance Monitor Data
– Plan Cache
• Plan Cache Snapshots
• Information for a Job (formerly Current SQL)
Visual Explain
Visual Explain
Visual Explain Recommendations
• Look for access methods that can perform inefficiently
– Table scans
• Size and repetition matter!
– Temporary indexes
– Hash tables
– CQE implementation (should be rare/non-existent starting with v7r2)
• Review key environmental settings
– Optimization Goal
– Memory pool size and fair share calculation
• Indexes and Statistics Advisor
Indexing
Index Advisor: System-wide
• System-wide Index Advisor
– Data is placed into a DB2 table (QSYS2/SYSIXADV)
• Note: each iASP has its own SYSIXADV table
– Autonomic
– No overhead
• Advice by System, or Schema, or Table
• Can create indexes directly from GUI
• Condensed advice
– Helps define fewer, more general purpose indexes
Remember: It is an ADVISOR, not a ‘DO THIS BLINDLY’!
Index Advisor – System-wide
Index Advisor – System-Wide
Index Advisor – From Visual Explain
Index Advisor - Condenser
Queries:
  …WHERE YEAR = 2008 AND QUARTER = 1 AND COLOR = 'BLUE';
  …WHERE YEAR = 2008 AND QUARTER = 1;
  …WHERE YEAR = 2008;
Index advice by query:
  YEAR, QUARTER, COLOR
  YEAR, QUARTER
  YEAR
Condensed advice:
  YEAR, QUARTER, COLOR
  Can be used for all three queries
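The condensing idea can be sketched as "drop any advised key list that is a leading prefix of another one". This is a simplified stand-in written for illustration, not the Index Advisor Condenser's actual algorithm, and it treats key order as fixed.

```python
def condense(advice):
    """Drop any advised key list that is a leading prefix of another advised list."""
    unique = list(dict.fromkeys(tuple(keys) for keys in advice))  # de-duplicate, keep order
    condensed = []
    for keys in unique:
        # An index on (YEAR, QUARTER, COLOR) can also serve lookups on its leading keys.
        covered = any(other != keys and other[:len(keys)] == keys for other in unique)
        if not covered:
            condensed.append(keys)
    return condensed


if __name__ == "__main__":
    advice_by_query = [("YEAR", "QUARTER", "COLOR"), ("YEAR", "QUARTER"), ("YEAR",)]
    print(condense(advice_by_query))  # [('YEAR', 'QUARTER', 'COLOR')]
```

Advice whose leading keys differ, such as (YEAR, COLOR) and (YEAR, QUARTER), is kept separately, because neither index can stand in for the other under this rule.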
Index Advisor Recommendations
• Combine knowledge of your query, data, and application with Index Advice
• Sort advice based on “Times Index Advised for Query Use”
• Utilize Index Advisor Condenser to eliminate redundant indexes
• Create advised, temporary indexes (MTI) as permanent indexes
• After creating new indexes, clear index advice to better gauge impact of new indexes
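Sorting advice by how often it was requested is simple to sketch. The Advice record and the sample rows are hypothetical. On a real system the advice rows come from QSYS2/SYSIXADV, and the output is a list to review, not to create blindly.

```python
from typing import NamedTuple


class Advice(NamedTuple):
    table: str
    key_columns: tuple
    times_advised: int


def top_advice(rows, limit=3):
    """Most frequently advised first; a starting list to review, not to create blindly."""
    return sorted(rows, key=lambda r: r.times_advised, reverse=True)[:limit]


if __name__ == "__main__":
    sample = [  # hypothetical advice rows
        Advice("ORDERS", ("STATE",), 42),
        Advice("SALES", ("YEAR", "QUARTER", "COLOR"), 310),
        Advice("CUSTOMERS", ("CUSTID",), 7),
    ]
    for row in top_advice(sample, limit=2):
        print(row.table, row.key_columns, row.times_advised)
```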
What is the optimizer's job?
What is the optimizer's output?
What are some of the elements or attributes used for optimization?
Review
IBM DB2 for i Consulting and Services
Database modernization
Database Query modernization
DB2 Web Query
Database design, features and functions
DB2 SQL performance analysis and tuning
Data warehousing and Business Intelligence
DB2 for i education and training
Contact: Mike Cain [email protected]
IBM Systems and Technology Group
DB2 for i Consulting
Rochester, MN USA
Need DB2 help?
Thank you!