© 2012 IBM Corporation
Information Management
Deep Dive into DB2 10 Query Performance Optimization: Star Schemas and Multi-core Query Parallelism
John Hornibrook
IBM Canada
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
Agenda
New DB2 10.1 features
Star schema query optimization
–Zig-zag join
Multi-core query parallelism
–Intra-partition query parallelism
–Existing functionality, significantly improved
Star Schema Query Optimization
Provides improved performance for ‘star schema’ queries
Star schemas are typically found in data marts or some data warehouses
Introduces new star schema join method (zig-zag join)
–Complementary to existing star schema join methods
Improves existing star schema detection algorithms
–Supports wider range of queries
Star Schemas
•Logical DB design resembles a star
•Central table contains business ‘facts’
•Sales prices, cost, quantities, etc.
•Surrounding tables contain ‘dimensional’ data
•Time, location, characteristics, etc.
•Each dimension is a ‘parent’ of the fact table
•1:N from a dimension to the fact
[Figure: star schema with the Daily Sales fact table in the center, surrounded by five dimension tables]
Customer: custkey, name, address
Promotion: promokey, promotype, promodesc
Period: perkey, year, month
Product: prodkey, category, upc_number
Store: storekey, storenumber, region
Daily Sales (fact): perkey, prodkey, storekey, promokey, custkey, quantity_sold, price, cost
Star joins
Queries performed against star schemas
SELECT ITEM_DESC, SUM(QUANTITY_SOLD), AVG(PRICE), AVG(COST)
FROM PERIOD, DAILY_SALES, PRODUCT, STORE
WHERE
PERIOD.PERKEY=DAILY_SALES.PERKEY AND
PRODUCT.PRODKEY=DAILY_SALES.PRODKEY AND
STORE.STOREKEY=DAILY_SALES.STOREKEY AND
CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005' AND
STORE_NUMBER='03' AND
CATEGORY=72
GROUP BY ITEM_DESC
Aggregate on dimension attribute, sum on fact measures
Join fact to some subset of the dimensions
Join fact foreign keys to dimension primary keys
Constrain on dimension attributes
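The query shape above can be tried on a toy star schema. The following is a minimal sketch using Python's built-in sqlite3 module rather than DB2; the table contents are invented for illustration, and dates are stored as ISO strings (an assumption, so that BETWEEN compares correctly as text).

```python
import sqlite3

# Toy star schema; all rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period  (perkey INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE product (prodkey INTEGER PRIMARY KEY, item_desc TEXT, category INTEGER);
CREATE TABLE store   (storekey INTEGER PRIMARY KEY, store_number TEXT);
CREATE TABLE daily_sales (perkey INTEGER, prodkey INTEGER, storekey INTEGER,
                          quantity_sold INTEGER, price REAL, cost REAL);
INSERT INTO period  VALUES (1, '2005-01-15'), (2, '2005-06-01');
INSERT INTO product VALUES (1, 'widget', 72), (2, 'gadget', 10);
INSERT INTO store   VALUES (1, '03'), (2, '07');
INSERT INTO daily_sales VALUES
  (1, 1, 1, 5, 2.0, 1.0),
  (1, 1, 1, 3, 4.0, 3.0),
  (2, 1, 1, 9, 2.0, 1.0),
  (1, 2, 1, 7, 9.0, 8.0),
  (1, 1, 2, 4, 2.0, 1.0);
""")

# Join fact foreign keys to dimension primary keys, constrain on dimension
# attributes, aggregate fact measures grouped by a dimension attribute.
rows = conn.execute("""
SELECT item_desc, SUM(quantity_sold), AVG(price), AVG(cost)
FROM period, daily_sales, product, store
WHERE period.perkey = daily_sales.perkey
  AND product.prodkey = daily_sales.prodkey
  AND store.storekey = daily_sales.storekey
  AND calendar_date BETWEEN '2005-01-01' AND '2005-04-28'
  AND store_number = '03'
  AND category = 72
GROUP BY item_desc
""").fetchall()
print(rows)
```

Only the first two fact rows satisfy all three dimension filters, which mirrors the point that the combination of dimensions does the filtering.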
Star join dilemma
No single dimension may filter the fact table well
But a combination of dimensions may filter well
– 840,000 category 72 products sold during Jan. to April 2005
– 53,000 category 72 products sold during Jan. to April 2005 in store #3
How do we filter with a combination of dimensions?
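[Figure: the Daily Sales fact table (750M rows) joined to the Product, Period and Store dimensions. The local predicates are CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005', STORE_NUMBER='03' and CATEGORY=72; the figure labels the dimension joins with 50M, 20M and 30M.]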
Star join solutions
Specialized join methods (pre-DB2 10.1):
–Semi-join with index ANDing
• Use combinations of fact table indexes to avoid accessing data pages
–Hub join
• Use Cartesian product of dimension rows to provide better fact keys
–Query must meet star join criteria
–Both methods can be built and competed
–Regular join plans are still built and competed with either method
• Costing decides
• Specialized methods aren’t always the winners
Semi-join index ANDing star join
Produce filtered fact table (Daily_Sales) with foreign key indices
–Execute "semi-join" with each dimension that filters the fact table
–"AND" RID-maps from each semi-join with next semi-join
–Retrieve fact table columns via RIDs
[Figure: three NLJOIN semi-joins (Product, Period and Store, each joined to a Daily Sales index) feed a RID bitmap, and each semi-join eliminates bits: 101101101001111011011001 -> 100100101001001010011001 -> 000100001000000010001000. A FETCH on Daily Sales then retrieves the rows for the surviving RIDs.]
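The bitmap ANDing step can be sketched in a few lines. This is a simplified illustration, not DB2's implementation: the fact table is modelled as three foreign key columns indexed by RID, and the key values and helper names (`semi_join_bitmap`, `and_bitmaps`) are hypothetical.

```python
def semi_join_bitmap(fact_fk_column, qualifying_keys):
    """One bit per fact row (RID): 1 if the row's foreign key passes the dimension filter."""
    return [1 if fk in qualifying_keys else 0 for fk in fact_fk_column]

def and_bitmaps(*bitmaps):
    """AND the bitmaps position by position; each semi-join can only clear bits."""
    return [int(all(bits)) for bits in zip(*bitmaps)]

# Hypothetical fact table foreign key columns, indexed by RID 0..5.
prodkey  = [10, 20, 10, 30, 10, 20]
perkey   = [1, 1, 2, 1, 1, 2]
storekey = [3, 3, 3, 4, 4, 3]

bitmap = and_bitmaps(
    semi_join_bitmap(prodkey, {10}),   # keys of products that pass the filter
    semi_join_bitmap(perkey, {1}),     # keys of periods that pass the filter
    semi_join_bitmap(storekey, {3}),   # keys of stores that pass the filter
)
# Only RIDs whose bit survived every semi-join need a fact table FETCH.
rids = [rid for rid, bit in enumerate(bitmap) if bit]
print(bitmap, rids)
```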
Hub star join
Form a Cartesian join of filtering dimensions
– Cartesian join -> no join predicates
– Cartesian join result should be small to be effective
Join the Cartesian join result to the fact table using a multi-column fact table index.
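[Figure: hub join plan built from three NLJOINs over Product, Store, Period and Daily Sales]
Product: PRODKEY 10, 20
Store: STOREKEY 30, 40
Product x Store (PRODKEY, STOREKEY): (10, 30), (20, 30), (10, 40), (20, 40)
Period: PERIODKEY 50
Product x Store x Period (PRODKEY, STOREKEY, PERIODKEY): (10, 30, 50), (20, 30, 50), (10, 40, 50), (20, 40, 50)
Probe fact table with multi-column index on: PRODKEY, STOREKEY, PERIODKEY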
Hub star join
Works well if Cartesian result is small
Cartesian may contain many key combinations that don’t exist in the fact table
– Results in unnecessary fact table index probes.
[Figure: NLJOIN probing Daily Sales with every key combination in the Cartesian result]
PRODKEY STOREKEY PERIODKEY
10 30 50
20 30 50
10 40 50
20 40 50
10 30 60
20 30 60
10 40 60
20 40 60
10 30 70
20 30 70
10 40 70
20 40 70
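To make the cost of unproductive probes concrete, here is a small sketch that builds the 12-row Cartesian result shown above and counts how many probes find nothing. The contents of `fact_index` are an assumption made up for the example; a set lookup stands in for a multi-column index probe.

```python
from itertools import product

prod_keys = [10, 20]
store_keys = [30, 40]
period_keys = [50, 60, 70]

# Hypothetical set of key combinations that actually exist in the fact table index.
fact_index = {(10, 30, 50), (20, 40, 50), (10, 30, 60)}

# The hub join probes the fact table index once per Cartesian row.
probes = list(product(prod_keys, store_keys, period_keys))
hits = [key for key in probes if key in fact_index]
wasted = len(probes) - len(hits)
print(len(probes), len(hits), wasted)
```

With these assumed contents, 9 of the 12 probes return no rows, and the ratio gets worse as more dimensions or more qualifying dimension rows are added.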
DB2 10.1 Star Schema Highlights
Introduces a new zigzag join method that builds upon the zigzag join technology available in Redbrick, which has shown a unique performance advantage in the industry.
Provides consistent performance for warehouse queries.
Adds a new star detection method that is more reliable.
Supports star schema queries in single and multiple subject areas with snowflakes.
Exploits indexes even when there is a gap in the probing key, reducing the number of indexes that need to be created.
Works seamlessly for range partitioned tables and in serial, SMP and DPF environments.
Can use MDC block indexes on the fact table to enable zigzag join.
Recommends multi-column indexes to enable zigzag join through explain diagnostics and the index advisor in Optim Query Tuner (OQT)
Enhancing the star detection in DB2 pre-10.1
DB2 (pre-10.1) recognizes a star
–By analysis of sizes of tables and join predicates.
–A star is detected after application of local filtering and snowflake joins.
The New Star Detection in DB2 10.1:
–Only requirement: joining dimension column(s) must be unique
–Detects multiple stars per query block
–Allows a star to be detected with fewer restrictions
–Much more reliable
–The new star detection method also enables pre-DB2 10.1 star schema plans.
–Pre-DB2 10.1 detection is invoked if the new star detection fails to detect any star.
Comparison of old and new star detection methods:
No. | Requirement/Restriction | Before DB2 10.1 | DB2 10.1
1 | Minimum of three base tables | Necessary to form a star. | Necessary to form a star.
2 | Minimum of two equijoin predicates | Necessary to form a star. | Necessary to form a star.
3 | Multi-column index on fact table | Used by the Cartesian Hub plan, if available. | Used by the Zigzag join plan, if available.
4 | Number of fact tables allowed | One | Unlimited
5 | Non-deterministic or side-effect predicates | (see note) | (see note)
6 | Non-equijoin predicates | (see note) | (see note)
7 | Sub-query predicates | (see note) | (see note)
8 | Correlation among tables in a snowflake | (see note) | (see note)
9 | Simple XML predicates | (see note) | (see note)
10 | Derived (non-base) tables | Excluded from the star. | Can be included in the star.
Note for rows 5 to 9:
Before DB2 10.1: Star cannot be formed in the query block in the presence of this SQL feature.
DB2 10.1: Star can be formed in the query block in the presence of these features and may include the feature in the star.
The new zigzag join method for star schema based queries
How does it work?
–First forms the virtual Cartesian product of dimensions.
–Avoids most non-productive probes from the Cartesian product into the fact table.
–Fact table index provides feedback to dimensions.
–zigzags through the dimensions and the fact table.
Pre-requisite: A multi-column index on the fact table on columns that join with the dimensions.
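The feedback idea can be illustrated with a simplified two-dimension model. This is a sketch under stated assumptions, not DB2's algorithm: the fact table index is a sorted Python list of (k1, k2) keys, each dimension is a sorted list of qualifying keys, and the function name `zigzag_join` is hypothetical. After each probe, the next key found in the index is fed back so the dimensions can skip Cartesian combinations that cannot match.

```python
from bisect import bisect_left

def zigzag_join(dim1, dim2, fact_index):
    """dim1, dim2: sorted qualifying dimension keys.
    fact_index: sorted list of (k1, k2) index keys (duplicates allowed).
    Returns (matching index keys, number of index probes)."""
    matches, probes = [], 0
    i = j = 0
    while i < len(dim1) and j < len(dim2):
        probe = (dim1[i], dim2[j])
        probes += 1
        pos = bisect_left(fact_index, probe)
        # Collect every index entry equal to the probe key.
        while pos < len(fact_index) and fact_index[pos] == probe:
            matches.append(fact_index[pos])
            pos += 1
        if pos == len(fact_index):
            break  # nothing left in the index, so no later combination can match
        # Feedback: the next index key tells the dimensions where to resume.
        f1, f2 = fact_index[pos]
        i = bisect_left(dim1, f1)
        if i == len(dim1):
            break
        if dim1[i] == f1:
            j = bisect_left(dim2, f2)
            if j == len(dim2):
                i, j = i + 1, 0   # no dim2 key >= f2: move to the next dim1 key
        else:
            j = 0                 # dim1 jumped past f1: restart dim2
    return matches, probes

dim1, dim2 = [10, 20], [30, 40]
fact_index = sorted([(10, 30), (10, 30), (15, 30), (20, 40), (25, 30)])
matches, probes = zigzag_join(dim1, dim2, fact_index)
print(matches, probes)
```

In this toy run the full Cartesian product has 4 combinations, but the combination (10, 40) is never probed because the feedback key (15, 30) already rules it out.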
Using a multi-column index in a zigzag join
Pre-requisite
– Columns that participate in the join are included in the index
– Index columns from at least two dimension tables are completely covered by join predicates
Consider this star schema based query:
– D1 has primary key A
– D2 has a composite primary key (B,C)
– D3 has primary key D
– These PK columns are used in equi-join operations with the fact table
[Figure: Fact table joined to D1(A) on A, to D2(B,C) on B,C and to D3(D) on D]
Fact table index definition | Qualified? | Why?
(A,D), (A,B,C), (B,C,D), (C,B,D) | YES | The index completely covers two dimensions.
(A,B,C,D), (A,C,B,D) | YES | The index completely covers three dimensions.
(A,B), (C,D) | NO | The index does not completely cover the dimension D2.
(B,A,C) | NO | The columns B and C in the composite index are not in contiguous positions in the index.
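The qualification table can be reproduced with a small checker. The rule coded here is inferred from the examples on this slide only (a dimension counts as covered when all of its join columns appear in the index at contiguous positions, and an index qualifies when it covers at least two dimensions); it is an assumption for illustration and may not capture every condition the optimizer applies. The function names are hypothetical.

```python
def covered_dimensions(index_cols, dim_keys):
    """Return the dimensions whose join columns all appear contiguously in the index."""
    covered = []
    for dim, key_cols in dim_keys.items():
        if not all(col in index_cols for col in key_cols):
            continue  # some join column is missing from the index
        positions = sorted(index_cols.index(col) for col in key_cols)
        if positions[-1] - positions[0] == len(key_cols) - 1:
            covered.append(dim)
    return covered

def qualifies(index_cols, dim_keys):
    """Assumed rule: the index must completely cover at least two dimensions."""
    return len(covered_dimensions(index_cols, dim_keys)) >= 2

# Join columns per dimension, as on the slide.
dim_keys = {"D1": ["A"], "D2": ["B", "C"], "D3": ["D"]}

for index_cols in [("A", "D"), ("A", "C", "B", "D"), ("A", "B"), ("B", "A", "C")]:
    print(index_cols, qualifies(index_cols, dim_keys))
```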
Zigzag join with index key gap processing
Gap processing allows a single multi-column index to be used for a bigger set of queries.
Greatly reduces the number of fact table indexes
E.g., a fact table index on (A, C, B) allows zigzag join when there is no join on C
Gap processing is implemented using new jump scan technology
Explain facility indicates when gap processing is used
– New JUMPSCAN argument on IXSCAN operator
– Gap columns identified
[Figure: Fact table joined to D1(A) on A and to D2(B) on B]
Gap Info:                 Status
---------------------     ---------------------
Index Column 0:           No Gap
Index Column 1:           Positioning Gap
Index Column 2:           No Gap
Arguments:
--------------
JUMPSCAN: (JumpScan Plan)
        TRUE
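The slides do not describe jump scan internals, so the following is only a generic sketch of the skip-scan idea for an index on (A, C, B) when the probe supplies A and B but not the gap column C. The index is modelled as a sorted list of tuples and `jump_scan` is a hypothetical name; for each distinct C value under the given A, the scan seeks directly to (a, c, b) and then jumps past the rest of that C value.

```python
from bisect import bisect_left, bisect_right

def jump_scan(index, a, b):
    """index: sorted list of (A, C, B) keys. Return the keys with A == a and B == b."""
    out = []
    pos = bisect_left(index, (a,))  # first key with A >= a
    while pos < len(index) and index[pos][0] == a:
        c = index[pos][1]                      # next distinct value of the gap column
        hit = bisect_left(index, (a, c, b))    # seek as if C had an equality predicate
        while hit < len(index) and index[hit] == (a, c, b):
            out.append(index[hit])
            hit += 1
        # Jump over every remaining key that shares this C value.
        pos = bisect_right(index, (a, c, float("inf")))
    return out

index = sorted([(1, 1, 5), (1, 1, 7), (1, 2, 7), (1, 3, 6), (2, 1, 7)])
result = jump_scan(index, 1, 7)
print(result)
```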
Multi-column index recommendations
New explain diagnostic message recommending multi-column fact table indexes
The optimizer analyzes the primary/unique keys and equi-join predicates in the query and detects that:
– the query is based on a star schema and
– a multi-column index does not exist or a different multi-column index might provide better performance
Extended Diagnostic Information:
------------------------------------------------
Diagnostic Identifier: 1
Diagnostic Details: EXP0256I Analysis of the query shows that the query might execute faster if an additional index was created. Schema name: "STAR". Table name: "FACT". Column list: "(F3, F2, F1, F0)".
Optim Query Tuner provides a workload based index advisor that uses the above feature to determine a consolidated set of index recommendations.
Understanding ZZJOIN plan components
[Figure: access plan, top to bottom: ZZJOIN(2) -> FETCH -> RIDSCAN -> SORT -> ZZJOIN(1). The inputs of ZZJOIN(1) are TBSCAN -> TEMP -> plan for snowflake 1, TBSCAN -> TEMP -> plan for snowflake 2, and the access plan for the fact table.]
Plan for snowflake 1 / snowflake 2: snowflake plans could either be:
1) Access of a single table or
2) Joins of multiple tables
TEMP: builds either:
1) Index over temp or
2) Fast integer sort
TBSCAN: scans either:
1) Index over temp or
2) Fast integer sort array
Access plan for fact table: could be one of the following:
1) Index scan
2) Single-probe list-prefetch
3) All-probes list-prefetch
ZZJOIN(1): performs the zigzag join operation
1) Last leg is the fact table
2) Preceding legs are dimensions
SORT, RIDSCAN, FETCH: perform data prefetch of the fact table for an all-probes List-Prefetch.
ZZJOIN(2): performs back-join to get dimension table columns required for subsequent operations if fact table access is all-probes List-Prefetch.
Accessing a dimension in a zigzag join plan
A dimension leg must have TBSCAN-TEMP on top of the base dimension access plan.
[Figure: ZZJOIN(1) with three inputs: TBSCAN -> TEMP -> plan for snowflake 1, TBSCAN -> TEMP -> plan for snowflake 2, and the access plan for the fact table]
The TEMP operator shows the following information (new operator argument):
RANDOM_ACCESS (Random Access on temp table is available using Fast Integer Sort method or Index over Temp).
To simplify the query plans in the following discussion, please assume the TBSCAN-TEMP operators exist on top of the base dimension access plan.
Fast integer sort and index-over-temp
Two new dimension access methods are implemented to ensure efficient random access of the dimensions by the zigzag join operator.
– An index is created over the TEMP operator (IOT) using dimension join columns. Additional columns may be included in the index as ‘include’ columns
– A fast integer sort (FIS) data structure is built using the join key from the dimension. This method has an extension to allow additional columns if the join key is of type INTEGER.
In order for the optimizer to pick fast integer sort, the dimension must not have a composite key and the joining column must be of type INTEGER or BIGINT.
– If the join column is of type BIGINT, fast integer sort can be used only if no other dimension column is required for subsequent operations.
The TBSCAN operator (input to ZZJOIN(1) operator) shows the following:
IDXOVTMP: (A temporary index will be created and used on this temp)
• TRUE - the scan builds an index over the temporary table for random access.
• FALSE - the scan builds a fast integer sort structure for random access.
– The feedback predicates applicable to that dimension are displayed in the form of start-stop key conditions.
Fact table index access strategies
Index scan and data page fetch
Single-probe list-prefetch
All-probes list-prefetch
Fact table index access
IXSCAN-FETCH plan:
– The index scan accesses the index over the fact table to retrieve RIDs from the fact table matching the input probe values.
– These fact table RIDs are then used to fetch the necessary fact table data.
[Figure: ZZJOIN with three inputs: any access on D1, any access on D2, and FETCH over IXSCAN on FACT]
Fact table access using single-probe list-prefetch plan
The list prefetch plan executes for every probe row from the combination of dimension tables/snowflakes.
The index scan over the fact table finds fact table RIDs matching the input probe values.
The SORT, RIDSCAN and FETCH operators sort RIDs according to data page ids and start off list prefetchers to get the fact table data.
[Figure: ZZJOIN with three inputs: any access on D1, any access on D2, and FETCH -> RIDSCAN -> SORT -> IXSCAN on FACT]
Fact table access using all-probes list-prefetch plan
All matching RIDs from all the probes are sorted together in the order of the fact table data pages, and the list prefetchers are then started to retrieve the necessary fact table data.
The benefit of sorting all the RIDs in this fashion is that it helps achieve better prefetching and can lower the number of physical I/Os.
A back-join with each of the dimension tables is necessary to retrieve the dimension table columns required for subsequent operations
– Dimension columns do not flow through list-prefetch operation
– Back-join represented as a 2nd ZZJOIN operator
[Figure: ZZJOIN(2) -> FETCH -> RIDSCAN -> SORT -> ZZJOIN(1); the inputs of ZZJOIN(1) are any access on D1, any access on D2, and IXSCAN on FACT]
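The I/O difference between the two list-prefetch variants can be sketched with a simple page-counting model. The assumptions are loud ones: a RID is a (page, slot) pair, every distinct page touched in one sorted pass costs one read, and no buffer pool caching is modelled. The probe contents and function names are hypothetical.

```python
def pages_read_single_probe(probes):
    """Each probe sorts and fetches its own RIDs, so a page shared by two probes is read twice."""
    return sum(len({page for page, _slot in rids}) for rids in probes)

def pages_read_all_probes(probes):
    """RIDs from all probes are sorted together, so each page is read once."""
    all_rids = sorted(rid for rids in probes for rid in rids)
    return len({page for page, _slot in all_rids})

# Hypothetical RID lists returned by three index probes.
probes = [
    [(7, 1), (2, 3)],
    [(2, 5), (9, 0)],
    [(7, 2)],
]
single = pages_read_single_probe(probes)
combined = pages_read_all_probes(probes)
print(single, combined)
```

Under this model the all-probes variant reads 3 pages instead of 5, at the price of needing the back-join described above because dimension columns do not flow through the combined RID sort.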
Multi-core Query Parallelism
Also known as ‘intra-partition parallelism’
Supported in DB2 since V5
Query parallelism within a database partition
Parallelism achieved without the use of the database partitioning feature
–Does not require any form of data partitioning
Exploits symmetric multi-processor and/or multi-core processors
DB2 10.1:
– Extend the existing implementation
– Remove scalability bottlenecks
Multi-core Query Parallelism Use Cases
Large OLTP reporting systems
–Reporting jobs can often be a large part of the batch processing
–Workloads are normally running on large multi-processor machines
• SMP, with multiple cores, sometimes with hyper-threading
–Improve multi-core query parallelism to reduce the time the reporting jobs take within the batch windows
C-Class warehouse workloads
–Targeting warehouses and marts that are up to 4-5 TB
–Will be running on x or p servers with anywhere from 8 to 32 cores
–Simple setup using ESE (i.e. no database partitioning)
–Improve query response through multi-core parallelism
Current intra-partition parallelism architecture
Combination of data and functional parallelism
Data parallelism
–Dynamically partition data
• Assign partition to query task
• Easier to load balance
• User not required to partition data, e.g. range, hash, etc.
– Data dynamically assigned to query tasks
• Assign range of pages or rows (range is a fixed size prior to DB2 10.1)
• Assign new range when range is consumed
• Provides dynamic load balancing
• Support table and index scans
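Dynamic data allocation – “straw scans”
[Figure: with Degree=4, table pages are handed out in small ranges (pages 0-1, 2-3, 4-5, 6-7, 8-9, etc.) to subagents 1 to 4; once the first four ranges are taken, later ranges go to whichever subagent asks next (subagent 3 and then subagent 2 in the figure)]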
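The load-balancing effect of handing out small ranges on demand can be shown with a deterministic simulation. This is an illustrative model only: the function name `straw_scan`, the fixed range size and the per-page costs are assumptions, and a priority queue of "time when the agent becomes free" stands in for real subagents.

```python
import heapq

def straw_scan(num_pages, degree, range_size, page_cost):
    """Hand out page ranges to whichever agent becomes free first.
    page_cost(agent) is the assumed time that agent needs per page.
    Returns (list of (agent, first_page, last_page), overall finish time)."""
    ready = [(0.0, agent) for agent in range(1, degree + 1)]
    heapq.heapify(ready)
    assignments = []
    for start in range(0, num_pages, range_size):
        free_at, agent = heapq.heappop(ready)      # next agent to run dry
        end = min(start + range_size, num_pages)
        assignments.append((agent, start, end - 1))
        heapq.heappush(ready, (free_at + (end - start) * page_cost(agent), agent))
    finish = max(free_at for free_at, _agent in ready)
    return assignments, finish

# Agent 1 is assumed to be four times slower than the others (e.g. a busy core).
assignments, finish = straw_scan(
    num_pages=12, degree=4, range_size=2,
    page_cost=lambda agent: 4.0 if agent == 1 else 1.0,
)
print(assignments, finish)
```

With a static split of 3 pages per agent, the slow agent would need 12 time units; with on-demand ranges it only processes one range and the scan finishes at 8, because the faster agents pick up the remaining ranges.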
Functional parallelism
Functional parallelism
– Divide query task by function
– Assign functional task to different execution units
– Doesn't require data partitioning
– Harder to load balance
• Must ensure execution units are equally busy
DB2 implementation
–Single co-ordinator process services application requests
–Multiple sub-agent processes return data through local table queue
–Only 1 parallelized functional unit (section)
Functional parallelism
[Figure: the co-ordinator runs RETURN (9) over LTQ (8). Each of subagents 1 to 4 runs its own copy of the same subsection: LTQ (8) over MSJOIN (7), whose inputs are TBSCAN (3) -> SORT (2) -> TBSCAN (1) on PRODUCT and TBSCAN (6) -> SORT (5) -> TBSCAN (4) on PRODATR.]
•Query contains only 2 subsections and 1 local table queue
•Runtime operators coordinated using latches, semaphores, shared memory control blocks
Intra-partition parallelism example
select p.name, p.prod_id, pa.attribute from product p, prodatr pa where p.prod_id = pa.prod_id;
            LTQ
            (8)
             |
           MSJOIN
            (7)
        /----+----\
    TBSCAN       TBSCAN
     (3)          (6)
      |            |
     SORT         SORT
     (2)          (5)
      |            |
    TBSCAN       TBSCAN
     (1)          (4)
      |            |
   PRODUCT      PRODATR
TBSCAN (1), (4): parallel table scans ("straw" scans)
SORT (2), (5): hash partitioned sorts on prod_id, one partition per agent
TBSCAN (3), (6): each agent scans a sort partition
MSJOIN (7): join processed in parallel by each agent by joining corresponding partitions
LTQ (8): results returned via shared memory table queue to co-ordinator agent
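The plan above can be mimicked in miniature: hash-partition both inputs on prod_id, sort each partition, and merge-join matching partitions. This is a sequential sketch under stated assumptions (rows are tuples with prod_id first, the sample data and helper names are invented, and the loop over partitions stands in for the work each subagent would do in parallel).

```python
def partition_and_sort(rows, degree):
    """Hash-partition rows on prod_id (column 0), then sort each partition on it."""
    parts = [[] for _ in range(degree)]
    for row in rows:
        parts[hash(row[0]) % degree].append(row)
    return [sorted(part, key=lambda r: r[0]) for part in parts]

def merge_join(left, right):
    """Merge join of two inputs sorted on column 0; handles duplicate keys on both sides."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            k = j
            while k < len(right) and right[k][0] == lk:
                out.append((left[i][1], lk, right[k][1]))  # name, prod_id, attribute
                k += 1
            i += 1  # j stays put so a duplicate left key rescans the same right run
    return out

product = [(1, "bolt"), (2, "nut"), (5, "gear")]
prodatr = [(1, "steel"), (1, "m8"), (5, "brass"), (9, "n/a")]
degree = 4

p_parts = partition_and_sort(product, degree)
pa_parts = partition_and_sort(prodatr, degree)
# Equal prod_id values hash to the same partition, so partition n only joins with partition n.
joined = sorted(row for n in range(degree) for row in merge_join(p_parts[n], pa_parts[n]))
print(joined)
```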
Intra-partition parallelism architecture
Single query involves
– 1 coordinating agent
– n sub agents
– m prefetchers (shared)
– All executing in parallel on available processors
Combination of...
– Data parallelism
• Each agent works on subset of data
• Data dynamically assigned so user not required to partition data
– Functional parallelism
• Each agent works on different query function, e.g. scan, sort
User can control "degree" of parallelism
Also benefits I/O bound uniprocessors
[Figure: at compile time, the SQL query passes through the query optimizer to produce the best query plan and threaded code; at run time, multiple agents and prefetchers execute it]
DB2 10.1 Multi-core query parallelism
Improved scalability
–Within the current architecture
–Scale near-linearly to degree 32
–Achieved by:
1.Improved load balance
New rebalance (REBAL) access plan operator
2.More efficient parallelization techniques
Move LTQ ‘higher’ in the access plan
3.Reduce latch contention
Improved scalability
                  6.77122e+06
                    NLJOIN
                    (   6)
                    713706
                      63
             /---------+----------\
         292.2                   23173.3
         REBAL                    FETCH
         (   7)                   (   9)
         325.265                  2456.85
           11                        2
           |                   /---+----\
         292.2            23173.3     6.77122e+07
         TBSCAN           IXSCAN    TABLE: DB2USER
         (   8)           (  10)     DAILY_SALES
         325.265          1605.23        Q1
           11                1
           |                 |
          2922          6.77122e+07
     TABLE: DB2USER    INDEX: SYSIBM
         PERIOD      SQL091218161022180
           Q2                Q1
•Load imbalance results in poor scalability
•REBAL redistributes rows to ensure all subagents do equal work
•Optimizer performs load balance analysis to determine REBAL placement
[Figure: Multi-core Query Parallelism, two charts plotted against degree, labelled Before and After]
Improved scalability
More efficient parallelization techniques
–Partial-final UNIQUE
–GRPBY on unique key
• Can perform complete GRPBY without a partitioned SORT
–Improved access plan parallelization transformation costing
–Improved exploitation of stream partitioning
• Avoid partitioned SORT
Reduce latch contention
–Dynamic straw scan unit (straw “gulp” size)
–Improved NLJOIN inner access
–Improved HSJOIN
–Improved partitioned SORT
–Prefetcher queues
–Various others
DB2 10.1 Multi-core query parallelism externals
Support mixed workloads
–Parallelize report queries in an OLTP system
–Reduce parallel ‘infrastructure’ overhead on OLTP queries
• Pre DB2 10.1 there is a 10-15% impact just by setting INTRA_PARALLEL=ON
In ESE only. DPF unconditionally enables parallel infrastructure
• DB2 10.1: Use Workload Manager (WLM) to toggle INTRA_PARALLEL and maximum DEGREE for a workload
–Improved automatic degree determination
• degree=ANY
• Avoid parallelizing queries that won’t benefit
• Improved automatic runtime degree reduction
Controlling query parallelism
WLM workload control:
– An OLTP workload that doesn’t use parallelism
• MAXIMUM DEGREE = 1 implies INTRA_PARALLEL=NO
CREATE WORKLOAD banking_wl APPLNAME ('banking') MAXIMUM DEGREE 1;
– A BI workload using parallelism
• MAXIMUM DEGREE > 1 implies INTRA_PARALLEL=YES
• Also specifies the degree upper limit
• The application specifies the requested degree using existing external controls
CREATE WORKLOAD report_wl APPLNAME ('cognos') MAXIMUM DEGREE 8;
ALTER WORKLOAD report_wl MAXIMUM DEGREE 4;
Application control:
CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES')
Toggles intra-partition parallelism at transaction boundaries
– Must not have open cursors across transaction boundaries, e.g. WITH HOLD cursors
Pre-DB2 10.1 intra-partition parallelism external controls
Parameter | Value | Default | Scope | Comment
INTRA_PARALLEL | NO, YES | NO | Instance | DBM configuration
MAX_QUERYDEGREE | ANY, 1~32,767 | ANY | Instance | DBM configuration. Valid only if INTRA_PARALLEL is ON
DFT_DEGREE | ANY, 1~32,767 | 1 | Database | DB configuration. Initial value for CURRENT DEGREE special register or package bind DEGREE option
CURRENT DEGREE | ANY, 1~32,767 | DFT_DEGREE | Application | Special register. The degree of parallelism considered by the SQL compiler for dynamic SQL access plans
Bind DEGREE | ANY, 1~32,767 | DFT_DEGREE | Package | DB2 bind option. The degree of parallelism considered by the SQL compiler for static SQL access plans
SET RUNTIME DEGREE command | 1~32,767 | | Application | CLP command. The degree of parallelism allowed at runtime for any access plans (dynamic or static SQL)
Appendix
Additional material
Star schemas
Dimension tables
–Contain descriptive information to augment fact rows
–Used to filter fact rows
–Query results are aggregated on dimension attributes
–Contains a primary key
•possibly multiple columns
•generated, meaningless numeric value
–Typically contains much fewer rows than the fact table
–May be represented as a hierarchy of tables or a ‘snowflake’
•e.g. product is further normalized to product, brand and category
•but this requires extra joins
[Figure: snowflake made of the Product, Brand and Category tables]
Star schemas
Fact table
– Contains numeric measures of business information
– Queries perform computation (sum, avg, etc.) on measures
– Contains primary key columns from each dimension
• Represent foreign keys referencing each parent dimension
• Can have explicit referential integrity, but not necessary for DB2
– May have a primary key
• Composite of the foreign keys or
• Single, generated, meaningless numeric value
– Number of rows depends on fact granularity
• hourly, daily, etc.
• finer granularity -> more rows
• coarser granularity -> limits drill down ability
– Typically, local predicates aren’t applied directly
Star schemas
Data Marts
–Can contain multiple fact tables
–Each fact usually denotes a separate star
–Dimensions can be shared across stars
• e.g. Daily_Sales and Daily_Forecast facts can share the Store and Product dimensions
–Queries may join multiple fact tables
ZZJOIN(1) operator
An n-ary join method that joins together the dimension table/snowflakes and the fact table.
Drives the process of forming probe rows from dimension tables/snowflakes,
– Probes the fact table to find matching fact table rows
– Uses the feedback from the fact table to advance to next rows on the temporary table over the dimension tables/snowflakes.
Feedback predicates identified in explain information
– New EXPLAIN_PREDICATE.HOW_APPLIED value: FEEDBACK
– Displayed in the ZZJOIN operator details in db2exfmt
Predicates:
----------
2) Feedback Predicate used in Join,
   Comparison Operator: Equal (=)
   Subquery Input Required: No
   Filter Factor: 0.25
   Predicate Text:
   --------------
   (Q3.D2FK = Q1.D2PK)
3) Feedback Predicate used in Join,
   Comparison Operator: Equal (=)
   Subquery Input Required: No
   Filter Factor: 0.25
   Predicate Text:
   --------------
   (Q3.D1FK = Q2.D1PK)
ZZJOIN (2) operator
Only required for all-probes list-prefetch.
Uses the join columns to locate the matching row in the temporary table so that the required non-join columns from the dimension table can be retrieved.
Makes use of an efficient random access method, such as FIS or IOT, to retrieve the dimension table columns required for subsequent operations.
– Also known as ‘backjoin’
Indicated in explain by BACKJOIN argument of ZZJOIN operator
Star schema plans in DB2 pre-10.1
Type of plan: Regular (2-way) join
How does the plan work?
Most likely plan is to:
• Join the most filtering dimension with the fact table first.
• Join in rest of the dimensions using a suitable join method such as hash join.
Other plans are possible.
Pre-requisite: None.
Type of plan: Semi-join with index ANDing
How does the plan work?
• Pre-filtering of the fact table by dimensions (semi-joins).
• Index ANDing the results of the dimension filtering.
• Completing the dimension join.
Pre-requisite: Indexes on the fact table on each of the columns that joins with the dimensions (typically, the foreign keys)
Type of plan: Hub join
How does the plan work?
• Cartesian product of dimensions.
• Each row in Cartesian product probes the multi-column fact table index.
Pre-requisite: Multi-column index on the fact table on columns that join with the dimensions.