© 2017 IBM Corporation

IBM Power Systems

14E – Achieving Good Performance with SQL

Rob Bestgen
[email protected]
DB2 for i Consultant

Power Server + IBM i

• System viewed as a database server

• DB2 for i (integrated part of IBM i)

• Data Centric focus – support for business logic in the DBMS

• SQL (DDL and DML) as primary interface to DBMS

• GUI to DBMS via System i Navigator

DB2 for i



Power System + IBM i Architecture

[Figure labels: Single Level Storage; Query; Memory; Storage Management; Table; Single System; Multiple CPUs and cores; N-way SMP]


IBM i Objects

SQL term      IBM i (OS) term
schema        library
table         physical file
view          logical file
index         keyed logical file
row           record
column        field
log           journal


IBM i Objects

• Physical file ~= Table = Data Space

– Contains the actual data (rows and columns)

• Logical file ~= View = Virtual Table

– Does not contain actual data

– Definition of a “result set”

• Keyed Logical file ~= Index = Data Structure for look up

– Contains “keys” that reference or point to rows

– Data can be extracted from keys


IBM i Objects

One Database Management System with multiple interfaces

[Figure: a single DB2 DB File (PF) object, created either with CREATE TABLE (SQL DDL) or with CRTPF, and accessed through several interfaces:]

• High Level Language: Native I/O
• SQL: Embedded SQL, ODBC, JDBC, ADO.NET, CLI …
• Command Language (CL)
• SELECT... FROM... (SQL DML)


SQL Performance Quiz

• When discussing SQL performance, we’re primarily talking about improving SQL Query performance

– Reading and manipulating data

• Queries (a.k.a. DML) are the most common and most performance sensitive parts of SQL

– Frequently invoked

– Often involve large quantities of data

• Even defining objects in SQL often requires use of queries

– CREATE TABLE… AS (SELECT…..) WITH DATA

– CREATE VIEW….


Traditional Database Programming

How Long Do They Take?

Anatomy of an SQL query

What are the steps?



SQL Query Processing

With Native Record Level Access…

You tell DB2 what to do AND how to do it

With SQL…

You tell DB2 what to do, NOT how to do it

(though you give it options)


SQL Query Processing…

With Native Record Level Access…

You think in records

With SQL…

You should think in SETs

Combine steps!



Example – Combining SQL

Multiple SQL Statements

    DECLARE cursor1 CURSOR FOR
      SELECT custid
      FROM order_table
      WHERE ord_date = '2016/11/03';

    OPEN cursor1;
    DO
      FETCH cursor1 INTO :v_custid;

      SELECT cust_name, cust_address
        INTO :v_name, :v_address
        FROM cust_table
        WHERE custid = :v_custid;

      /* Process customer data */
    UNTIL ( no more data );

    CLOSE cursor1;

Combined SQL Request (much better approach!)

    DECLARE cursor1 CURSOR FOR
      SELECT c.cust_name, c.cust_address
      FROM order_table o, cust_table c
      WHERE o.ord_date = '2016/11/03' AND
            o.custid = c.custid;

    OPEN cursor1;
    DO
      FETCH cursor1 INTO :v_name, :v_address;

      /* Process customer data */
    UNTIL ( no more data );

    CLOSE cursor1;
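The effect of combining statements can be tried outside DB2 for i. The sketch below is only an illustration: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in database, and the sample rows, the ordid column, and the function names are invented for the example. It runs the row-by-row approach and the single joined request over the same data and counts how many SQL statements each one issues.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (sample data is invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cust_table (custid INTEGER PRIMARY KEY,
                                 cust_name TEXT, cust_address TEXT);
        CREATE TABLE order_table (ordid INTEGER PRIMARY KEY,
                                  custid INTEGER, ord_date TEXT);
        INSERT INTO cust_table VALUES
            (1, 'Ann', '1 Elm St'), (2, 'Bob', '2 Oak St'), (3, 'Cy', '3 Ash St');
        INSERT INTO order_table VALUES
            (10, 1, '2016/11/03'), (11, 3, '2016/11/03'), (12, 2, '2016/11/04');
    """)
    return conn


def customers_row_by_row(conn, ord_date):
    """One query for the orders, then one extra query per order found."""
    statements = 1
    cust_ids = conn.execute(
        "SELECT custid FROM order_table WHERE ord_date = ? ORDER BY custid",
        (ord_date,),
    ).fetchall()
    result = []
    for (custid,) in cust_ids:
        row = conn.execute(
            "SELECT cust_name, cust_address FROM cust_table WHERE custid = ?",
            (custid,),
        ).fetchone()
        statements += 1
        result.append(row)
    return result, statements


def customers_combined(conn, ord_date):
    """A single set-based request: the database performs the join."""
    rows = conn.execute(
        """SELECT c.cust_name, c.cust_address
           FROM order_table o JOIN cust_table c ON o.custid = c.custid
           WHERE o.ord_date = ?
           ORDER BY c.custid""",
        (ord_date,),
    ).fetchall()
    return rows, 1
```

Both functions return the same customers, but the row-by-row version issues one statement per matching order plus one, while the combined version issues exactly one and leaves the join strategy to the database.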


SQL Query Processing – New Request

[Figure: timeline of a new SQL request in DB2 for i: Optimize, then Open, then Run]


The Optimizer

Provides the recipe

Provides the methods

Does no cooking

The Optimizer

Writes the best(?) program to fulfill your request

Optimization: The Programmer in the Box


Access Plan = The output of query optimization

Contents:

• A control structure that contains information on the actions necessary to satisfy each SQL request

• These contents include:

– Access Method

– Info on associated tables and indexes

– Any applicable program and/or environment information

(Query) Access Plans



Optimization... the intersection of various factors

[Figure: factors surrounding The Plan]

• Server configuration
• Server attributes
• Server performance
• Version/Release/Modification Level
• SMP
• Database design
• Table sizes, number of rows
• Views and Indexes (Radix, EVI)
• Work management
• Job, Query attributes
• Interfaces
• Static, Dynamic, Extended Dynamic
• SQL Request


Application Input to Optimizer and Engine

[Figure: inputs feeding Query Optimization and Execution]

• System Values: CHGSYSVAL()
• Job’s Query Attributes: CHGQRYA()
• QAQQINI file
• SQL SET statement
• SQL SELECT statement clause
• Connection Attributes
• High Level Language Attributes

Examples

• Query Time Limit
• Parallel Degree
• Optimization Goal
• Optimizer Feedback
• Isolation Level


Cost Based Query Optimization

• The DB2 for i Optimizer performs "cost based" optimization

• "Cost" is defined as the estimated time it takes to run the request

• "Costing" refers to the comparison of a given set of algorithms and methods in an attempt to identify the "fastest" plan

• Optimization is based on time estimate, not on resource utilization

• Usually the fastest plan is also the most resource efficient plan, but this is not always true

• The goal of the optimizer is to eliminate I/O as early as possible by identifying the best path to and through the data

• The optimizer has the ability and freedom to "rewrite" the query

Query Optimization


Query Phases

Query processing can be divided into three phases:

Query Validation
– Validate the query request
– Validate existing access plan
– Builds internal query structures

Query Optimization
– Choose most efficient access method
– Builds access plan

Query Execution
– Build the structures needed for query cursor
– Build the structures for any temporary indexes (if needed)
– Builds and activates query cursor (ODP)
– Generate any information requested
   • SQL Performance Monitor data
   • Index Advised
   • SQL Plan Cache
   • Etc.

We can affect this...

We can use this...


• Many different data access methods can be used to satisfy a query -- each with their own strengths and weaknesses

• What data access method should we use to find the rows that contain the color blue within a 100 million row DB table

...WHERE COLOR = 'BLUE'...

• When...

– 1 row contains the color blue

– 1,000 rows contain the color blue

– 100,000 rows contain the color blue

– 100,000,000 rows contain the color blue

• How does the optimizer know which choice to make?

Most Efficient Data Access Method


Data Access Methods

Cost based optimization dictates that the fastest access method for a given table will vary based upon selectivity of the query

[Chart: response time (low to high) versus number of rows searched / accessed (few to many), with curves for Probe, Clustered and Skip Seq, and Table Scan]
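A toy cost comparison can make the selectivity argument concrete. The sketch below is not the DB2 for i cost model: the two methods modeled, the per-row cost constants, and the function names are all assumptions chosen only for illustration. It estimates a time for a table scan (which touches every row) and for an index probe (which touches only the matching rows, at a higher cost per row), then picks the lower estimate.

```python
def estimate_costs(total_rows, matching_rows, seq_cost=1.0, probe_cost=50.0):
    """Return hypothetical time estimates for two access methods.

    seq_cost and probe_cost are invented constants: reading a row during a
    sequential scan is assumed to be far cheaper than reaching one row
    through an index probe.
    """
    return {
        "table scan": total_rows * seq_cost,
        "index probe": matching_rows * probe_cost,
    }


def cheapest_method(total_rows, matching_rows):
    """Pick the access method with the lowest estimated time."""
    costs = estimate_costs(total_rows, matching_rows)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    total = 100_000_000
    for matching in (1, 1_000, 100_000, 100_000_000):
        method = cheapest_method(total, matching)
        print(f"{matching:>11,} matching rows: {method}")
```

With these assumed constants the probe wins while few rows match and the scan wins once most of the table qualifies. The crossover point depends entirely on the constants, which is why accurate row-count estimates matter to a real optimizer.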



Most Efficient Data Access Method

Optimizer uses the selectivity for the local selection and join predicates to determine how many rows must be processed for each access method considered

• The selectivity will always be calculated for each predicate (e.g. WHERE column1 > 100) using:
– Default Sources (default filtering based upon the operator used)
– Meta-Data Sources (existing indexes or column statistics)

• It's best to provide statistics for your most selective and least selective columns


Statistics and Access Method Efficiency

• With the correct statistics, cost-based query optimizers more accurately estimate the number of rows to process
– Better Estimates = Better Performing Plans

• All query optimizers rely on stats for plan decisions
– DB2 for i relies on index stats and automatic column statistics


Sources of Statistics (listed from best to worst)

• Meta-data sources
– Existing indexes (Radix or Encoded Vector)
   • More accurately describes multi-column key values
   • Stats available immediately as the index maintenance occurs
   • Selectivity estimates from radix by reading n keys
   • Selectivity from EVI by reading symbol table values
– Column Statistics
   • SQE only
   • Column Cardinality, Histograms & Frequent Values List
   • Constructed over a single column in a table
   • Stored internally as a part of the table object after created
   • Collected automatically by default for the system
   • Stats not immediately maintained as the table changes
   • Stats are refreshed as they become “stale” over time

• Default
– No representation of actual values in columns – information is derived


Statistics example

SQL statement:

SELECT * FROM orders WHERE state = 'NY'

From best to worst:

• With an Index over column state
– Optimizer can get a very accurate estimate of the number of records selected

• With Column Statistics
– Optimizer can get an average number of records selected per state value by knowing the number of states (50), aka the cardinality

• With Default
– Optimizer guesses that 10% of the records in the orders table will be selected
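The three estimates for the state = 'NY' predicate can be worked through with simple arithmetic. The following sketch is an illustration only: the function name, its parameters, and the table size of 1,000,000 rows are assumptions, while the cardinality of 50 and the 10% default come from the example. For the index case the caller supplies the number of matching keys, standing in for what the optimizer would read from the index itself.

```python
def estimate_selected_rows(total_rows, matching_index_keys=None, cardinality=None):
    """Estimate rows selected by an equality predicate, best source first.

    matching_index_keys: key count read from an index (most accurate).
    cardinality: number of distinct values from column statistics.
    With neither available, fall back to the 10% default guess.
    """
    if matching_index_keys is not None:
        return float(matching_index_keys)
    if cardinality is not None:
        # Average number of rows per distinct value.
        return total_rows / cardinality
    return total_rows / 10


if __name__ == "__main__":
    rows = 1_000_000  # assumed size of the orders table
    print("index:       ", estimate_selected_rows(rows, matching_index_keys=61_500))
    print("column stats:", estimate_selected_rows(rows, cardinality=50))
    print("default:     ", estimate_selected_rows(rows))
```

With these assumed inputs the column statistics estimate is 20,000 rows and the default estimate is 100,000 rows. Only the index-based figure reflects how many orders actually came from NY (61,500 here is an invented count).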



A proper indexing strategy is the key to success

More on indexing later


Optimization Goal

• Tells the optimizer how many rows you expect to fetch per transaction

• Optimizer builds a plan that is optimal for returning n or all rows expected

• Affects the query "start up" time and overall run time

Switching Gears - The Optimization Goal



Optimization Goal

SELECT * FROM Big_Table ORDER BY Col1

Ordering via: INDEX or SORT

• First I/O: read by key via an index
• All I/O: read, select all and sort

[Figure: timeline showing when the first n rows, the next n rows, and all rows are returned under each goal]
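The trade-off between the two ordering strategies can be sketched as startup cost plus per-row cost. The numbers below are invented for illustration and do not come from DB2 for i. The index plan is assumed to have no startup cost but a high cost per row, and the sort plan is assumed to pay a large startup cost (read and sort everything) before returning rows cheaply.

```python
TOTAL_ROWS = 1_000_000  # assumed size of Big_Table

# Hypothetical cost figures, in arbitrary time units.
PLANS = {
    "read by key via an index": {"startup": 0.0, "per_row": 1.0},
    "read, select all and sort": {"startup": 0.1 * TOTAL_ROWS, "per_row": 0.01},
}


def time_to_fetch(plan, rows_fetched):
    """Estimated time until the application has received rows_fetched rows."""
    return plan["startup"] + plan["per_row"] * rows_fetched


def choose_plan(plans, rows_expected):
    """Pick the plan that delivers the expected number of rows soonest."""
    return min(plans, key=lambda name: time_to_fetch(plans[name], rows_expected))


if __name__ == "__main__":
    print("first 10 rows:", choose_plan(PLANS, 10))
    print("all rows:     ", choose_plan(PLANS, TOTAL_ROWS))
```

Under these assumptions the index plan is chosen when only 10 rows are expected and the sort plan is chosen when the whole result set is expected, which is the reason the optimizer needs to be told how many rows the application will fetch.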


The Optimization Goal

[Figure: application behavior (FETCH 10 rows, wait; or FETCH ALL rows, wait) compared against the size of the result set]

• Result set of 1,000 rows: the query plan for 10 rows and the query plan for 1,000 rows are different plans
• Result set of 10 rows: same plan


• Set as granular as SQL statement clause
– OPTIMIZE FOR n ROWS
– OPTIMIZE FOR ALL ROWS

• Set via QAQQINI options file -or- ODBC/JDBC connection attributes
– *FIRSTIO
– *ALLIO

• First I/O is the default for dynamic interfaces:
– ODBC, JDBC, STRSQL, dynamic SQL in programs
– Only a small number of rows expected to be read

• ALL I/O is the default otherwise
– Extended dynamic, RUNSQLSTM, INSERT + subSELECT, CLI, static SQL in programs
– All of result set expected to be read

• Optimization goal will affect the optimizer's decisions
– Use of indexes, SMP, temporary intermediate results like hash tables
– Tell the optimizer as much information as possible
– If the application fetches the entire result set, use *ALLIO


Sharing Resources: Another one – Fair Share of Memory

[Figure: an application fetching data through a memory pool]

The memory “fair share” footprint will constrict or expand the flow of data to the application

Sharing Resources: Fair Share of Memory

[Figure: two memory pools]

• Many simultaneous queries: correspondingly small fair share
• Few simultaneous queries: correspondingly large fair share


Fair Share of Memory

[Figure: memory pool with 4 active queries / 15 expected; label: Anemic Plans]

In general: Fair share = Memory pool size / max active

Best practice: set max active for the pool to a realistic value and use shared pools set to *CALC
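The fair share formula is simple division, and a short calculation shows why an unrealistic max active value matters. In this sketch the pool size of 15,000 MB is an invented figure and the function name is hypothetical. The values 15 and 4 mirror the "15 expected" and "4 active" queries in the figure.

```python
def fair_share(pool_size_mb, max_active):
    """Fair share = memory pool size / max active (the general rule)."""
    if max_active <= 0:
        raise ValueError("max_active must be positive")
    return pool_size_mb / max_active


if __name__ == "__main__":
    pool_mb = 15_000  # assumed pool size
    print("max active 15:", fair_share(pool_mb, 15), "MB per query")
    print("max active 4: ", fair_share(pool_mb, 4), "MB per query")
```

With max active set to 15, each query is planned around 1,000 MB even if only 4 queries ever run at once. Setting max active to a realistic 4 would let each plan assume 3,750 MB.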


Query Optimization Feedback

[Figure: an SQL request passes through Query Optimization, which produces feedback]

• Indexes Advised
• SQE Plan Cache
• SQE Plan Cache Snapshots
• Detailed DB Monitor Data
• Visual Explain



SQL Plan Cache



SQL Plan Cache – Show Statements

List is initially empty to allow for user filtering




Visual Explain

• Graphical representation of query plan

– Representation of the DB2 objects & data structures

– Representation of the methods and algorithms

– Associated environmental information

– Advice on indexes and column statistics

– Highlighting of specific query rewrites

– Highlighting of expensive methods

• Based on detailed optimizer information

– Detailed SQL Performance Monitor Data

– Plan Cache

• Plan Cache Snapshots

• Information for a Job (formerly Current SQL)




Visual Explain Recommendations

• Look for access methods that can perform inefficiently

– Table scans

• Size and repetition matter!

– Temporary indexes

– Hash tables

– CQE implementation (should be rare/non-existent starting with v7r2)

• Review key environmental settings

– Optimization Goal

– Memory pool size and fair share calculation

• Indexes and Statistics Advisor



Indexing


Index Advisor: System-wide

• System-wide Index Advisor

– Data is placed into a DB2 table (QSYS2/SYSIXADV)

• Note: each iASP has its own SYSIXADV table

– Autonomic

– No overhead

• Advice by System, or Schema, or Table

• Can create indexes directly from GUI

• Condensed advice

– Helps define fewer, more general purpose indexes

Remember: It is an ADVISOR, not a ‘DO THIS BLINDLY’!



Index Advisor – System-wide




Index Advisor – From Visual Explain


Index Advisor - Condenser

Queries:

…WHERE YEAR = 2008 AND QUARTER = 1 AND COLOR = 'BLUE';
…WHERE YEAR = 2008 AND QUARTER = 1;
…WHERE YEAR = 2008;

Index advice by query:

YEAR, QUARTER, COLOR
YEAR, QUARTER
YEAR

Condensed advice:

YEAR, QUARTER, COLOR

Can be used for all three queries
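The idea behind condensing can be modeled as removing any advised key list that is a leading prefix of a longer one. The sketch below is a simplified model, not the Index Advisor's actual logic, which the slides do not describe. The function name and the representation of advice as tuples of column names are assumptions.

```python
def condense(advice):
    """Drop advised key lists that are a leading prefix of another advised list.

    advice: iterable of tuples of column names, in key order.
    Returns the remaining key lists, longest first.
    """
    # De-duplicate while keeping input order, then consider longest lists first.
    candidates = sorted(dict.fromkeys(advice), key=len, reverse=True)
    kept = []
    for keys in candidates:
        covered = any(longer[:len(keys)] == keys for longer in kept)
        if not covered:
            kept.append(keys)
    return kept


if __name__ == "__main__":
    advised = [
        ("YEAR", "QUARTER", "COLOR"),
        ("YEAR", "QUARTER"),
        ("YEAR",),
    ]
    print(condense(advised))
```

For the three queries shown, only the YEAR, QUARTER, COLOR key list survives. Advice on unrelated leading columns, such as a list starting with COLOR, would be kept as a separate index, since key order determines whether an index can serve a query.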



Index Advisor Recommendations

• Combine knowledge of your query, data, and application with Index Advice

• Sort advice based on “Times Index Advised for Query Use”

• Utilize Index Advisor Condenser to eliminate redundant indexes

• Create advised, temporary indexes (MTI) as permanent indexes

• After creating new indexes, clear index advice to better gauge impact of new indexes


What is the optimizer's job?

What is the optimizer's output?

What are some of the elements or attributes used for optimization?

Review



IBM DB2 for i Consulting and Services

Database modernization

Database Query modernization

DB2 Web Query

Database design, features and functions

DB2 SQL performance analysis and tuning

Data warehousing and Business Intelligence

DB2 for i education and training

Contact: Mike Cain [email protected]

IBM Systems and Technology Group

DB2 for i Consulting

Rochester, MN USA

Need DB2 help?


Thank you!


