You’re Smarter than a Database
Overcoming the optimizer’s bad cardinality estimates
About me
Bobby Durrett
US Foodservice
[email protected]
Scripts in http://www.geocities.com/bobbyandmarielle/sqltuning.zip
What you know
What the database knows
Before SQL
Example - mainframe Datacom/DB COBOL
List index names
Write loops
read a from one index i1 where one.c=10
while more table one rows exist get next row
    read b from two index i2 where two.a = one.a
    while more table two rows exist get next row
        print one.a,two.b
    end while
end while
SQL
Tell what you want, not how to get it
select one.a,two.b
from
one,two
where
one.c=10 and one.a=two.a;
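As a rough illustration of the contrast between the two slides above, the following Python sketch produces the same result twice: once with hand-written loops over named indexes, as in the pre-SQL pseudocode, and once with the SQL join. The sample rows, the dictionaries standing in for indexes i1 and i2, and the use of an in-memory SQLite database in place of Datacom/DB or Oracle are all assumptions made only to get a runnable example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows; the slides do not show the contents of ONE and TWO.
ONE = [(1, 10), (2, 10), (3, 20)]               # columns (a, c)
TWO = [(1, 100), (1, 101), (2, 200), (3, 300)]  # columns (a, b)

def nested_loop_join(one_rows, two_rows):
    """Pre-SQL style: the programmer names the indexes and writes the loops."""
    i1 = defaultdict(list)  # stand-in for index i1 on one.c
    for a, c in one_rows:
        i1[c].append(a)
    i2 = defaultdict(list)  # stand-in for index i2 on two.a
    for a, b in two_rows:
        i2[a].append(b)
    result = []
    for a in i1[10]:        # read from one via i1 where one.c = 10
        for b in i2[a]:     # read from two via i2 where two.a = one.a
            result.append((a, b))
    return result

def declarative_join(one_rows, two_rows):
    """SQL style: state the result wanted; the database picks the access path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table one (a integer, c integer)")
    conn.execute("create table two (a integer, b integer)")
    conn.executemany("insert into one values (?, ?)", one_rows)
    conn.executemany("insert into two values (?, ?)", two_rows)
    rows = conn.execute(
        "select one.a, two.b from one, two where one.c = 10 and one.a = two.a"
    ).fetchall()
    conn.close()
    return rows

loop_rows = nested_loop_join(ONE, TWO)
sql_rows = declarative_join(ONE, TWO)
print(sorted(loop_rows) == sorted(sql_rows))
```

The loop version fixes the access path in the program; the SQL version leaves that choice to the database, which is where the optimizer and its estimates come in.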
Pre-SQL versus SQL
Pre-SQL
    Code very efficient – runs in megabytes – VSE mainframe COBOL
    Labor intensive
SQL
    Can be inefficient – runs in gigabytes (if you are lucky!)
    Much more productive – do in minutes what took hours before – create tables
What the database doesn’t know
Optimizer has a limited set of statistics that describe the data
It can miscalculate the number of rows a query will return, its cardinality
A cardinality error can lead optimizer to choose a slow way to run the SQL
Example plan/Cardinality
-------------------------------------------------
| Id | Operation | Name | Rows | Cost |
-------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 3 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10 | 3 |
-------------------------------------------------
Plan = how Oracle will run your query
Rows = how many rows optimizer thinks that step will return
Cost = estimate of time query will take, a function of the number of rows
How to fix cardinality problems
Find out if it really is a cardinality issue
Determine the reason it occurred
    Single column
    Multiple columns
Choose a strategy
    Give the optimizer more information
    Override optimizer decision
    Change the application
Four examples
Four examples of how the optimizer calculates cardinality
Full scripts and their outputs on portal, pieces on slides – edited for simplicity
Step 1: Find out if it really is a cardinality issue
Example 1
Data
select a,count(*) from test1 group by a;
A COUNT(*)
---------- ----------
1 10
Query
select * from test1 where a=1;
Step 1: Find out if it really is a cardinality issue
Get estimated cardinality from plan
-------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | 10 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10 |
-------------------------------------------
Do query for actual number of rows
select count(*) from test1 where a=1;
Step 1: Find out if it really is a cardinality issue
Plan is a tree – find cardinality and select count(*) on part of query represented by that part of plan.
join
    table
    join
        table
        table
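The comparison in Step 1 can be sketched in a few lines of Python: take the Rows value from the plan, take the result of the matching select count(*), and look at how far apart they are. The figures below come from Example 1 and from Example 2 later in the deck; the helper name and the 10x threshold are hypothetical choices for this sketch, not anything Oracle defines.

```python
def cardinality_error(estimated_rows, actual_rows):
    """Return the factor by which the estimate is too high or too low."""
    # Treat zero as one so the ratio is always defined.
    est = max(estimated_rows, 1)
    act = max(actual_rows, 1)
    return max(est, act) / min(est, act)

# (label, Rows from the plan, result of select count(*)) taken from the slides.
checks = [
    ("Example 1: test1 where a=1", 10, 10),
    ("Example 2: TEST2 where a=2", 500000, 1),
]

THRESHOLD = 10  # hypothetical cutoff: flag estimates off by 10x or more

flagged = [label for label, est, act in checks
           if cardinality_error(est, act) >= THRESHOLD]
print(flagged)
```

Only the second query is flagged, so only that one is worth investigating as a cardinality problem.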
Step 2: Understand the reason for the wrong cardinality
Unequal distribution of data: Within a single column
Last name “Smith” or “Jones”
Among multiple columns – Address Zipcode and State
Step 2: Understand the reason for the wrong cardinality
Example 2 - Unequal distribution of values in a single column
1,000,000 rows with value 1
1 row with value 2
select a,count(*) from TEST2 group by a;
A COUNT(*)
---------- ----------
1 1000000
2 1
Step 2: Understand the reason for the wrong cardinality
SQL statement – returns one row
select * from TEST2 where a=2;
Step 2: Understand the reason for the wrong cardinality
Plan with wrong number of rows = 500,000
Full scan instead of range scan – 100 times slower
---------------------------------------------
| Operation | Name | Rows |
---------------------------------------------
| SELECT STATEMENT | | 500K|
| INDEX FAST FULL SCAN| TEST2INDEX | 500K|
---------------------------------------------
Step 2: Understand the reason for the wrong cardinality
Column statistics – two distinct values
LOW HIGH NUM_DISTINCT
---------- ---------- ------------
1 2 2
Table statistic – total # of rows – 1,000,001
NUM_ROWS
----------
1000001
Step 2: Understand the reason for the wrong cardinality
Rows in plan = (rows in table) / (distinct values of column)
500000 ≈ 1000001/2
Optimizer knew that there were only two values – assumed they had equal number of rows
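The arithmetic on this slide can be checked with a short Python sketch. It uses only the statistics shown for TEST2; the function name is made up for illustration, and the formula is the one stated on the slide rather than a full description of what the optimizer does.

```python
def uniform_estimate(num_rows, num_distinct):
    """Rows expected for `column = value` if every value is equally common."""
    return num_rows / num_distinct

NUM_ROWS = 1000001   # table statistic from the slide
NUM_DISTINCT = 2     # column statistic from the slide
ACTUAL_ROWS = 1      # rows in TEST2 where a=2

estimate = uniform_estimate(NUM_ROWS, NUM_DISTINCT)
# Truncate for display; the plan shows this figure as 500K.
print(int(estimate), "estimated versus", ACTUAL_ROWS, "actual")
```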
Step 2: Understand the reason for the wrong cardinality
Example 3 - Combinations of column values not equally distributed
1,000,000 rows with values 1,1
1,000,000 rows with values 2,2
1 row with value 1,2
~ Equal numbers of 1’s and 2’s in each column
A B COUNT(*)
---------- ---------- ----------
1 1 1000000
1 2 1
2 2 1000000
Step 2: Understand the reason for the wrong cardinality
SQL statement – retrieves one row
select sum(a+b)
from TEST3
where
a=1 and b=2;
Step 2: Understand the reason for the wrong cardinality
Plan with wrong number of rows = 500,000
Inefficient full scan
----------------------------------------------
| Operation | Name | Rows |
----------------------------------------------
| SELECT STATEMENT | | 1 |
| SORT AGGREGATE | | 1 |
| INDEX FAST FULL SCAN| TEST3INDEX | 500K|
----------------------------------------------
Step 2: Understand the reason for the wrong cardinality
Column statistics
C LOW HIGH NUM_DISTINCT
- ---------- ---------- ------------
A 1 2 2
B 1 2 2
Table statistic – total # of rows – 2,000,001
NUM_ROWS
----------
2000001
Step 2: Understand the reason for the wrong cardinality
Rows in plan = (rows in table) / (distinct values A * distinct values B)
500000 ≈ 2000001/(2 * 2)
Optimizer assumes all four combinations (1,1),(1,2),(2,1),(2,2) equally likely
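The same formula can be written as a product of per-column selectivities, which makes the independence assumption explicit. The Python sketch below uses the TEST3 statistics from the slides; the function names are hypothetical and the model covers only equality predicates without histograms.

```python
from math import prod

def selectivity(num_distinct):
    """Fraction of rows expected to match `column = value` under uniformity."""
    return 1 / num_distinct

def independent_estimate(num_rows, distinct_counts):
    """Multiply the per-column selectivities as if the columns were unrelated."""
    return num_rows * prod(selectivity(n) for n in distinct_counts)

# TEST3: 2,000,001 rows, two distinct values in A and two in B.
estimate = independent_estimate(2000001, [2, 2])
print(int(estimate))  # the plan shows this figure as 500K; the query returns 1 row
```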
Step 2: Understand the reason for the wrong cardinality
How to tell which assumption is in play?
Select count(*) for each column
select a,count(*) from TEST3 group by a;
select b,count(*) from TEST3 group by b;
Select count(*) for each column combination
select a,b,count(*) from TEST3
group by a,b;
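To see what these three queries reveal, here is a Python sketch that runs them against a scaled-down copy of TEST3 in an in-memory SQLite database. The scale factor, the skew helper, and the use of SQLite instead of Oracle are assumptions for illustration; the shape of the data follows Example 3.

```python
import sqlite3

SCALE = 1000  # hypothetical scale-down of the 1,000,000-row groups on the slide

conn = sqlite3.connect(":memory:")
conn.execute("create table test3 (a integer, b integer)")
rows = [(1, 1)] * SCALE + [(2, 2)] * SCALE + [(1, 2)]
conn.executemany("insert into test3 values (?, ?)", rows)

# The same three group-by queries as on the slide.
by_a = dict(conn.execute("select a, count(*) from test3 group by a"))
by_b = dict(conn.execute("select b, count(*) from test3 group by b"))
by_ab = {(a, b): n for a, b, n in
         conn.execute("select a, b, count(*) from test3 group by a, b")}
conn.close()

def skew(counts):
    """Ratio of the most common to the least common count."""
    return max(counts.values()) / min(counts.values())

print("skew within A:", skew(by_a))
print("skew within B:", skew(by_b))
print("skew across (A,B):", skew(by_ab))
```

Each column on its own is close to evenly split, so the single-column assumption holds. The combinations are badly skewed, and the (2,1) combination does not appear at all, so the multiple-column assumption is the one that fails here.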
Step 3: Choose the best strategy for fixing the cardinality problem
Giving the optimizer more information
    Histograms
    SQL Profiles
Overriding optimizer decisions
    Hints
Changing the application
Try to use optimizer as much as possible to minimize development work
Step 3: Choose the best strategy for fixing the cardinality problem
Giving the optimizer more information – using histograms
Works for unequal distribution within a single column
A histogram records the distribution of values within a column in up to 254 “buckets”
Works best on columns with fewer than 255 distinct values
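A simplified way to picture why a histogram helps when a column has few distinct values: with one bucket per value, the stored count for a value can be used directly instead of an average. The Python sketch below models that idea with the Example 2 counts. It is only a model under that assumption; it does not reproduce how Oracle samples the data or stores its buckets, and the function names are invented for this example.

```python
# Value counts from the Example 2 slide (select a,count(*) from TEST2 group by a).
VALUE_COUNTS = {1: 1000000, 2: 1}

def estimate_without_histogram(value_counts):
    """Uniform assumption: total rows divided by number of distinct values."""
    return sum(value_counts.values()) / len(value_counts)

def estimate_with_histogram(value_counts, value):
    """One bucket per distinct value, so the recorded count is the estimate."""
    return value_counts.get(value, 0)

print(int(estimate_without_histogram(VALUE_COUNTS)))  # same for a=1 and a=2
print(estimate_with_histogram(VALUE_COUNTS, 2))       # specific to a=2
```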
Step 3: Choose the best strategy for fixing the cardinality problem
Run gather_table_stats command to get histograms on the column – 254 is max number of buckets
method_opt=>'FOR ALL COLUMNS SIZE 254'
Step 3: Choose the best strategy for fixing the cardinality problem
Plan for Example 2 with correct number of rows with histogram
Uses range scan
-----------------------------------------
| Operation | Name | Rows |
-----------------------------------------
| SELECT STATEMENT | | 1 |
| INDEX RANGE SCAN| TEST2INDEX | 1 |
-----------------------------------------
Step 3: Choose the best strategy for fixing the cardinality problem
Column statistics – two buckets
LOW HIGH NUM_DISTINCT NUM_BUCKETS
---------- ---------- ------------ -----------
1 2 2 2
Table statistic – unchanged
NUM_ROWS
----------
1000001
Step 3: Choose the best strategy for fixing the cardinality problem
Time without histograms (1 second):
Elapsed: 00:00:01.00
Time with histograms (1/100th second):
Elapsed: 00:00:00.01
Step 3: Choose the best strategy for fixing the cardinality problem
Giving the optimizer more information – using SQL Profiles
Works for unequal distribution among multiple columns
Includes information about the relationship between columns in the SQL – correlated columns or predicates
Step 3: Choose the best strategy for fixing the cardinality problem
SQL Tuning Advisor gathers statistics on the columns
...DBMS_SQLTUNE.CREATE_TUNING_TASK(...
...DBMS_SQLTUNE.EXECUTE_TUNING_TASK(...
Accept the SQL Profile it creates to use the new statistics
...DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (...
Step 3: Choose the best strategy for fixing the cardinality problem
Example 3 plan with correct number of rows = 1 using SQL profile
--------------------------------------------------
| Operation | Name | Rows | Bytes |
--------------------------------------------------
| SELECT STATEMENT | | 1 | 6 |
| SORT AGGREGATE | | 1 | 6 |
| INDEX RANGE SCAN| TEST3INDEX | 1 | 6 |
--------------------------------------------------
Step 3: Choose the best strategy for fixing the cardinality problem
Time without a profile (1 second):
Elapsed: 00:00:01.09
Time with a profile (1/100th second):
Elapsed: 00:00:00.01
Step 3: Choose the best strategy for fixing the cardinality problem
Overriding optimizer decisions – using hints
Example 4 has unequal distribution of column values across two tables – histograms and SQL Profiles don’t work
Hint forces index range scan
Small amount of additional code – not like COBOL on mainframe
Step 3: Choose the best strategy for fixing the cardinality problem
Example 4 - SMALL table
MANY relates to 1 – there are many rows with value 1
FEW relates to 2 – there are few with value 2
insert into SMALL values ('MANY',1);
insert into SMALL values ('FEW',2);
Step 3: Choose the best strategy for fixing the cardinality problem
Example 4 - LARGE table:
1,000,000 rows with value 1
1 row with value 2
NUM COUNT(*)
---------- ----------
1 1000000
2 1
Step 3: Choose the best strategy for fixing the cardinality problem
SQL statement – returns one row
select B.NUM
from SMALL A,LARGE B
where
A.NUM=B.NUM and
A.NAME='FEW';
Step 3: Choose the best strategy for fixing the cardinality problem
Plan with wrong number of rows = 125,000
----------------------------------------------
| Operation | Name | Rows |
----------------------------------------------
| SELECT STATEMENT | | 125K|
| HASH JOIN | | 125K|
| TABLE ACCESS FULL | SMALL | 1 |
| INDEX FAST FULL SCAN| LARGEINDEX | 1000K|
----------------------------------------------
Step 3: Choose the best strategy for fixing the cardinality problem
Column statistics – two buckets on all columns – using histograms
LOW HIGH NUM_DISTINCT NUM_BUCKETS
---------- ---------- ------------ -----------
1 2 2 2
LOW HIGH NUM_DISTINCT NUM_BUCKETS
---- ---- ------------ -----------
FEW MANY 2 2
Step 3: Choose the best strategy for fixing the cardinality problem
Table statistics – SMALL has 2 rows, LARGE 1000001
NUM_ROWS
----------
2
NUM_ROWS
----------
1000001
Step 3: Choose the best strategy for fixing the cardinality problem
125000 ≈ 1000001/8
Optimizer appears to assume all eight combinations of the three columns’ values are equally likely
Can’t verify formula – references don’t include formula with histograms
Even worse without histograms – cardinality is 500000
Step 3: Choose the best strategy for fixing the cardinality problem
No SQL profile from SQL Tuning Advisor:
There are no recommendations to improve the statement.
Neither histograms nor SQL profiles help example 4
Step 3: Choose the best strategy for fixing the cardinality problem
Statement with hints:
Use index
Don’t do full scan
select /*+ INDEX(B LARGEINDEX) NO_INDEX_FFS(B LARGEINDEX) */ B.NUM
from SMALL A,LARGE B
where
a.NUM=B.NUM and
A.NAME='FEW';
Step 3: Choose the best strategy for fixing the cardinality problem
Time without a hint (1 second):
Elapsed: 00:00:01.03
Time with a hint (1/100th second):
Elapsed: 00:00:00.01
Step 3: Choose the best strategy for fixing the cardinality problem
Changing the application
Change your tables so that the optimizer gets your SQL’s cardinality right
Requires more work designing tables, but keeps productivity benefits of SQL
Step 3: Choose the best strategy for fixing the cardinality problem
Example 4 – moved NAME column to LARGE table and split table in two
One million (‘MANY’,1) rows in LARGEA
One (‘FEW’,2) row in LARGEB
Query:
select NUM
from (select * from largea
union
select * from largeb)
where
NAME='FEW';
Step 3: Choose the best strategy for fixing the cardinality problem
Plan is just as efficient as with hint:
Number of rows = 2 (reality is 1)
Range Scan
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 |
| 1 | VIEW | | 2 |
| 2 | SORT UNIQUE | | 2 |
| 3 | UNION-ALL | | |
| 4 | TABLE ACCESS BY INDEX ROWID| LARGEA | 1 |
|* 5 | INDEX RANGE SCAN | LARGEAINDEX | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID| LARGEB | 1 |
|* 7 | INDEX RANGE SCAN | LARGEBINDEX | 1 |
--------------------------------------------------------------
Step 3: Choose the best strategy for fixing the cardinality problem
Time without table change (1 second):
Elapsed: 00:00:01.03
Time with table change (1/100th second):
Elapsed: 00:00:00.01
Conclusion
SQL improves productivity, optimizer has limits
Identify cases where cardinality is wrong
Understand why the database got it wrong
    One column
    Multiple columns
Choose best strategy to fix
    Give optimizer more info
    Override optimizer’s choices
    Redesign tables
References
Cost Based Optimizer Fundamentals, Jonathan Lewis
Metalink Note:212809.1, Limitations of the Oracle Cost Based Optimizer
Metalink Note:68992.1, Predicate Selectivity
Histograms – Myths and Facts, Wolfgang Breitling, Select Journal, Volume 13, Number 3