
DB2 LUW 10 Star Schema and MCP

Date post: 18-Apr-2015
Upload: prakash6849
© 2012 IBM Corporation Information Management Deep Dive into DB2 10 Query Performance Optimization: Star Schemas and Multi-core Query Parallelism John Hornibrook IBM Canada
Transcript
Page 1: DB2 LUW 10 Star Schema and MCP

© 2012 IBM Corporation

Information Management

Deep Dive into DB2 10 Query Performance Optimization: Star Schemas and Multi-core Query Parallelism

John Hornibrook
IBM Canada

Page 2: DB2 LUW 10 Star Schema and MCP


Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

Page 3: DB2 LUW 10 Star Schema and MCP


Agenda

New DB2 10.1 features

Star schema query optimization
– Zig-zag join

Multi-core query parallelism
– Intra-partition query parallelism
– Existing functionality, significantly improved

Page 4: DB2 LUW 10 Star Schema and MCP


Star Schema Query Optimization

Provides improved performance for ‘star schema’ queries

Star schemas are typically found in data marts or some data warehouses

Introduces a new star schema join method: zig-zag join
– Complementary to existing star schema join methods

Improves existing star schema detection algorithms
– Supports a wider range of queries

Page 5: DB2 LUW 10 Star Schema and MCP


Star Schemas

• Logical DB design resembles a star
• Central table contains business ‘facts’
   • Sales prices, cost, quantities, etc.
• Surrounding tables contain ‘dimensional’ data
   • Time, location, characteristics, etc.
• Each dimension is a ‘parent’ of the fact table
   • 1:N from a dimension to the fact

[Diagram: star schema with the Daily Sales fact table at the center, surrounded by five dimension tables]

Daily Sales (fact): perkey, prodkey, storekey, promokey, custkey, quantity_sold, price, cost
Customer: custkey, name, address
Promotion: promokey, promotype, promodesc
Period: perkey, year, month
Product: prodkey, category, upc_number
Store: storekey, storenumber, region

Page 6: DB2 LUW 10 Star Schema and MCP


Star joins

Queries performed against star schemas

SELECT ITEM_DESC, SUM(QUANTITY_SOLD), AVG(PRICE), AVG(COST)
FROM PERIOD, DAILY_SALES, PRODUCT, STORE
WHERE
  PERIOD.PERKEY = DAILY_SALES.PERKEY AND
  PRODUCT.PRODKEY = DAILY_SALES.PRODKEY AND
  STORE.STOREKEY = DAILY_SALES.STOREKEY AND
  CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005' AND
  STORE_NUMBER = '03' AND
  CATEGORY = 72
GROUP BY ITEM_DESC

Callouts on the query:
– Aggregate on dimension attribute, sum on fact measures
– Join fact to some subset of the dimensions
– Join fact foreign keys to dimension primary keys
– Constrain on dimension attributes

Page 7: DB2 LUW 10 Star Schema and MCP


Star join dilemma

No single dimension may filter the fact table well

But a combination of dimensions may filter well
– 840,000 category 72 products sold during Jan. to April 2005
– 53,000 category 72 products sold during Jan. to April 2005 in store #3

How do we filter with a combination of dimensions?

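[Diagram: Daily Sales fact table (750M rows) joined to the Product, Period and Store dimensions. The three dimension links are labeled 50M, 20M and 30M. Dimension predicates shown:]

CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'
STORE_NUMBER='03'
CATEGORY=72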

Page 8: DB2 LUW 10 Star Schema and MCP


Star join solutions

Specialized join methods (pre-DB2 10.1):
– Semi-join with index ANDing
   • Use combinations of fact table indexes to avoid accessing data pages
– Hub join
   • Use Cartesian product of dimension rows to provide better fact keys
– Query must meet star join criteria
– Both methods can be built and competed
– Regular join plans are still built and competed with either method
   • Costing decides
   • Specialized methods aren’t always the winners

Page 9: DB2 LUW 10 Star Schema and MCP


Semi-join index ANDing star join

Produce filtered fact table (Daily_Sales) with foreign key indices

• Execute "semi-join" with each dimension that filters the fact table
• "AND" RID-maps from each semi-join with next semi-join
• Retrieve fact table columns via RIDs

[Diagram: three NLJOIN semi-joins (Product, Period and Store, each joined to a Daily Sales index) feed a RID bitmap; a FETCH on Daily Sales then retrieves the rows. Each semi-join eliminates bits:]

101101101001111011011001 -> 100100101001001010011001 -> 000100001000000010001000
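
The RID-map ANDing described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries standing in for single-column fact table indexes, the function names and the sample RIDs are all hypothetical, and Python sets stand in for DB2's RID bitmaps.

```python
def semi_join_rids(fact_index, qualifying_keys):
    """Collect the fact-table RIDs whose foreign key matches a qualifying dimension key.

    fact_index is a toy stand-in for a single-column fact table index:
    it maps a foreign-key value to the list of RIDs that carry that value.
    """
    rids = set()
    for key in qualifying_keys:
        rids.update(fact_index.get(key, ()))
    return rids


def index_anding(semi_join_results):
    """AND the RID sets from every semi-join; only RIDs present in all survive."""
    surviving = None
    for rids in semi_join_results:
        surviving = set(rids) if surviving is None else surviving & rids
    return sorted(surviving or ())


# Hypothetical single-column indexes on the fact table foreign keys.
prodkey_index = {10: [0, 1, 4], 20: [2, 3, 5]}
storekey_index = {30: [0, 2, 4], 40: [1, 3, 5]}
perkey_index = {50: [0, 1, 2], 60: [3, 4, 5]}

fetch_list = index_anding([
    semi_join_rids(prodkey_index, {10}),      # Product filter
    semi_join_rids(storekey_index, {30}),     # Store filter
    semi_join_rids(perkey_index, {50, 60}),   # Period filter
])
print(fetch_list)
```

Only the RIDs left after the final AND are fetched from the fact table, which is how the method avoids touching data pages that any single dimension would have ruled out.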

Page 10: DB2 LUW 10 Star Schema and MCP


Hub star join

Form a Cartesian join of filtering dimensions
– Cartesian join -> no join predicates
– Cartesian join result should be small to be effective

Join the Cartesian join result to the fact table using a multi-column fact table index.

[Diagram: Product and Store are combined by an NLJOIN, that result is combined with Period by a second NLJOIN, and a third NLJOIN probes Daily Sales]

Product:  PRODKEY = 10, 20
Store:    STOREKEY = 30, 40
Period:   PERIODKEY = 50

Product x Store:

PRODKEY  STOREKEY
10       30
20       30
10       40
20       40

Product x Store x Period:

PRODKEY  STOREKEY  PERIODKEY
10       30        50
20       30        50
10       40        50
20       40        50

Probe fact table with multi-column index on: PRODKEY, STOREKEY, PERIODKEY

Page 11: DB2 LUW 10 Star Schema and MCP


Hub star join

Works well if Cartesian result is small

Cartesian may contain many key combinations that don’t exist in the fact table
– Results in unnecessary fact table index probes.

Cartesian result probed into Daily Sales (NLJOIN):

PRODKEY  STOREKEY  PERIODKEY
10       30        50
20       30        50
10       40        50
20       40        50
10       30        60
20       30        60
10       40        60
20       40        60
10       30        70
20       30        70
10       40        70
20       40        70
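
The cost of those unnecessary probes is easy to see in a small Python sketch. It assumes a toy model that is not taken from the slides: the multi-column fact table index is represented as a set of key tuples, and the fact table contents and the `hub_join` name are hypothetical.

```python
from itertools import product


def hub_join(dimension_keys, fact_index):
    """Probe the fact index once for every row of the dimension Cartesian product.

    dimension_keys: one list of qualifying keys per dimension, in index column order.
    fact_index: set of key tuples, a stand-in for the multi-column fact table index.
    Returns the matching key combinations and the number of probes issued.
    """
    probes = 0
    matches = []
    for combo in product(*dimension_keys):
        probes += 1                    # every combination costs one index probe
        if combo in fact_index:
            matches.append(combo)
    return matches, probes


# Qualifying dimension keys from the slide; the fact contents are made up.
prodkeys, storekeys, periodkeys = [10, 20], [30, 40], [50, 60, 70]
fact_index = {(10, 30, 50), (20, 40, 60)}

matches, probes = hub_join([prodkeys, storekeys, periodkeys], fact_index)
print(matches, probes)
```

With these inputs the Cartesian product has 2 x 2 x 3 = 12 rows, so 12 probes are issued even though only two key combinations exist in the fact table.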

Page 12: DB2 LUW 10 Star Schema and MCP


DB2 10.1 Star Schema Highlights

Introduces a new zigzag join method that builds upon the zigzag join technology available in Red Brick, which has a proven, unique performance advantage in the industry.

Provides consistent performance for warehouse queries.

Adds a new star detection method that is more reliable.

Supports star schema queries in single and multiple subject areas with snowflakes.

Exploits indexes even when there is a gap in the probing key, reducing the number of indexes that need to be created.

Works seamlessly for range partitioned tables and in serial, SMP and DPF environments.

Can use MDC block indexes on the fact table for enabling zigzag join.

Recommends multi-column indexes to enable zigzag join through explain diagnostics and the index advisor in Optim Query Tuner (OQT).

Page 13: DB2 LUW 10 Star Schema and MCP


Enhancing the star detection in DB2 pre-10.1

DB2 (pre-10.1) recognizes a star
– By analysis of sizes of tables and join predicates.
– A star is detected after application of local filtering and snowflake joins.

The new star detection in DB2 10.1:
– Only requirement: joining dimension column(s) must be unique
– Detects multiple stars per query block
– Allows a star to be detected with fewer restrictions
– Much more reliable
– The new star detection method also enables pre-DB2 10.1 star schema plans.
– Pre-DB2 10.1 detection is invoked if the new star detection fails to detect any star.

Page 14: DB2 LUW 10 Star Schema and MCP


Comparison of old and new star detection methods:

No. | Requirement/Restriction | Before DB2 10.1 | DB2 10.1
1 | Minimum of three base tables | Necessary to form a star. | Necessary to form a star.
2 | Minimum of two equijoin predicates | Necessary to form a star. | Necessary to form a star.
3 | Multi-column index on fact table | Used by the Cartesian Hub plan, if available. | Used by the Zigzag join plan, if available.
4 | Number of fact tables allowed | One | Unlimited
5 | Non-deterministic or side-effect predicates | (a) | (b)
6 | Non-equijoin predicates | (a) | (b)
7 | Sub-query predicates | (a) | (b)
8 | Correlation among tables in a snowflake | (a) | (b)
9 | Simple XML predicates | (a) | (b)
10 | Derived (non-base) tables | Excluded from the star. | Can be included in the star.

(a) Star can not be formed in the query block in the presence of this SQL feature.
(b) Star can be formed in the query block in the presence of these features and may include the feature in the star.

Page 15: DB2 LUW 10 Star Schema and MCP


The new zigzag join method for star schema based queries

How does it work?

– First forms the virtual Cartesian product of dimensions.

– Avoids most non-productive probes from the Cartesian product into the fact table.

– Fact table index provides feedback to dimensions.

– Zigzags through the dimensions and the fact table.

Pre-requisite: A multi-column index on the fact table on columns that join with the dimensions.
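
The feedback loop described above can be illustrated with a toy Python model. Everything here is an assumption made for the sketch rather than DB2's implementation: dimension keys are sorted lists, the multi-column fact table index is a sorted list of key tuples, and the helper names `seek_combo` and `zigzag_join` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def seek_combo(dims, target):
    """Smallest row of the virtual Cartesian product of `dims` that is >= target.

    dims is a list of sorted key lists; returns None when no such row exists.
    """
    prefix = []
    for i, keys in enumerate(dims):
        j = bisect_left(keys, target[i])
        if j < len(keys) and keys[j] == target[i]:
            prefix.append(keys[j])        # exact match so far, keep going
            continue
        while True:
            if j < len(keys):
                # Take the next larger key here and the smallest keys after it.
                return tuple(prefix) + (keys[j],) + tuple(d[0] for d in dims[i + 1:])
            if i == 0:
                return None               # ran off the end of the first dimension
            i -= 1                        # back up and advance the previous dimension
            keys = dims[i]
            j = bisect_right(keys, prefix.pop())
    return tuple(prefix)


def zigzag_join(dims, fact_index):
    """Join sorted dimension keys to a sorted multi-column fact index.

    Returns the matching index entries and the number of index probes.
    """
    matches, probes = [], 0
    combo = tuple(d[0] for d in dims)     # first row of the virtual Cartesian product
    while combo is not None:
        probes += 1
        pos = bisect_left(fact_index, combo)
        while pos < len(fact_index) and fact_index[pos] == combo:
            matches.append(fact_index[pos])
            pos += 1
        if pos == len(fact_index):
            break                         # index exhausted
        # Feedback: the next key in the fact index tells the dimensions where to jump.
        combo = seek_combo(dims, fact_index[pos])
    return matches, probes


dims = [[10, 20], [30, 40], [50, 60, 70]]              # PRODKEY, STOREKEY, PERIODKEY
fact_index = [(10, 30, 50), (10, 35, 50), (20, 40, 60)]  # 35 is a store filtered out
print(zigzag_join(dims, fact_index))
```

In this toy run the join finds both qualifying combinations with 3 index probes, where probing every row of the 2 x 2 x 3 Cartesian product would take 12. The fact index entry `(10, 35, 50)` is skipped because store 35 is not among the qualifying dimension keys.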

Page 16: DB2 LUW 10 Star Schema and MCP


Using a multi-column index in a zigzag join

Pre-requisite
– Columns that participate in the join are included in the index
– Index columns from at least two dimension tables are completely covered by join predicates

Consider this star schema based query:
– D1 has primary key A
– D2 has a composite primary key (B,C)
– D3 has primary key D
– These PK columns are used in equi-join operations with the fact table

[Diagram: Fact joined to D1(A) on A, to D2(B,C) on B,C and to D3(D) on D]

Fact table index definition | Qualified? | Why?
(A,D), (A,B,C), (B,C,D), (C,B,D) | YES | The index completely covers two dimensions.
(A,B,C,D), (A,C,B,D) | YES | The index completely covers three dimensions.
(A,B), (C,D) | NO | The index does not completely cover the dimension D2.
(B,A,C) | NO | The columns B and C in the composite index are not in contiguous positions in the index.
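
The table can be reproduced with a short Python check. The rule coded here is a simplification inferred from the examples on this slide (a dimension counts as covered when all of its key columns appear in adjacent index positions, and at least two dimensions must be covered); it is an assumption, not the optimizer's actual logic, and the function names are hypothetical.

```python
def dimension_covered(index_cols, dim_key):
    """True if every key column of the dimension is in the index, in adjacent positions."""
    positions = sorted(index_cols.index(c) for c in dim_key if c in index_cols)
    if len(positions) != len(dim_key):
        return False                      # some key column is missing from the index
    return positions[-1] - positions[0] == len(dim_key) - 1


def qualifies_for_zigzag(index_cols, dimensions):
    """Assumed rule: the index must completely cover at least two dimensions."""
    covered = sum(dimension_covered(index_cols, key) for key in dimensions.values())
    return covered >= 2


# Dimension primary keys from the slide.
dimensions = {"D1": ("A",), "D2": ("B", "C"), "D3": ("D",)}

for index_cols in [("A", "D"), ("A", "B", "C"), ("A", "C", "B", "D"),
                   ("A", "B"), ("C", "D"), ("B", "A", "C")]:
    verdict = "YES" if qualifies_for_zigzag(index_cols, dimensions) else "NO"
    print(index_cols, verdict)
```

Index definitions that the slide does not list, such as one mixing a partial composite key with two fully covered dimensions, may be treated differently by DB2 than by this sketch.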

Page 17: DB2 LUW 10 Star Schema and MCP


Zigzag join with index key gap processing

Gap processing allows a single multi-column index to be used for a bigger set of queries.

Greatly reduces the number of fact table indexes

E.g., a fact table index on (A, C, B) allows zigzag join when there is no join on C

Gap processing is implemented using new jump scan technology

Explain facility indicates when gap processing is used
– New JUMPSCAN argument on IXSCAN operator
– Gap columns identified

[Diagram: Fact joined to D1(A) on A and to D2(B) on B]

Gap Info:             Status
--------------------- ---------------------
Index Column 0:       No Gap
Index Column 1:       Positioning Gap
Index Column 2:       No Gap

Arguments:
--------------
JUMPSCAN: (JumpScan Plan)
        TRUE
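
The idea behind a jump scan over a gap column can be sketched as follows. This is a conceptual illustration only: the slides do not describe DB2's implementation, the index is modeled as a sorted list of (A, C, B) tuples, the gap column C is assumed to hold integers, and `jump_scan` is a hypothetical name.

```python
from bisect import bisect_left


def jump_scan(index, a, b):
    """Find index entries with A == a and B == b when no predicate exists on C.

    index is a sorted list of (A, C, B) tuples. Instead of reading every entry
    with A == a, the scan repositions once per distinct C value.
    """
    found = []
    pos = bisect_left(index, (a,))                # first entry with A >= a
    while pos < len(index) and index[pos][0] == a:
        c = index[pos][1]                         # current value of the gap column
        pos = bisect_left(index, (a, c, b), pos)  # position directly on (a, c, b)
        while pos < len(index) and index[pos] == (a, c, b):
            found.append(index[pos])
            pos += 1
        pos = bisect_left(index, (a, c + 1), pos)  # jump to the next C value
    return found


index = sorted([(1, 1, 5), (1, 1, 7), (1, 2, 7), (1, 3, 6), (1, 3, 7), (2, 1, 7)])
print(jump_scan(index, a=1, b=7))
```

The gap column is the one reported as "Positioning Gap" in the explain output above: the scan still has to position within it, but it does not need a matching join predicate.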

Page 18: DB2 LUW 10 Star Schema and MCP


Multi-column index recommendations

New explain diagnostic message recommending multi-column fact table indexes

The optimizer analyzes primary/unique keys and equi-join predicates in the query and detects that:
– the query is based on a star schema, and
– a multi-column index does not exist, or a different multi-column index might provide better performance

Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 1
Diagnostic Details: EXP0256I Analysis of the query shows that the
    query might execute faster if an additional index
    was created. Schema name: "STAR". Table name:
    "FACT". Column list: "(F3, F2, F1, F0)".

Optim Query Tuner provides a workload based index advisor that uses the above feature to determine a consolidated set of index recommendations.

Page 19: DB2 LUW 10 Star Schema and MCP


Understanding ZZJOIN plan components

[Diagram: annotated ZZJOIN access plan. Operators and their callouts, from the top of the plan down:]

ZZJOIN(2)
   Performs back-join to get dimension table columns required for subsequent operations if fact table access is all-probes List-Prefetch.

FETCH, RIDSCAN, SORT
   Performs data prefetch of the fact table for an all-probes List-Prefetch.

ZZJOIN(1)
   Performs the zigzag join operation
   1) Last leg is the fact table
   2) Preceding legs are dimensions

TBSCAN (one per dimension leg)
   Scans either:
   1) Index over temp or
   2) Fast integer sort array

TEMP (one per dimension leg)
   Builds either:
   1) Index over temp or
   2) Fast integer sort

plan for snowflake 1, plan for snowflake 2
   Snowflake plans could either be:
   1) Access of a single table or
   2) Joins of multiple tables

access plan for fact table
   Could be one of the following:
   1) Index scan
   2) Single-probe list-prefetch
   3) All-probes list-prefetch

Page 20: DB2 LUW 10 Star Schema and MCP


Accessing a dimension in a zigzag join plan

A dimension leg must have TBSCAN-TEMP on top of the base dimension access plan.

[Diagram: ZZJOIN(1) with three inputs: TBSCAN over TEMP over the plan for snowflake 1, TBSCAN over TEMP over the plan for snowflake 2, and the access plan for the fact table]

The TEMP operator shows the following information (new operator argument):

RANDOM_ACCESS (Random Access on temp table is available using Fast Integer Sort method or Index over Temp).

To simplify the query plans in the following discussion, please assume the TBSCAN-TEMP operators exist on top of the base dimension access plan.

Page 21: DB2 LUW 10 Star Schema and MCP


Fast integer sort and index-over-temp

Two new dimension access methods are implemented to ensure efficient random access of the dimensions by the zigzag join operator.

– An index is created over the TEMP operator (IOT) using dimension join columns. Additional columns may be included in the index as ‘include’ columns

– A fast integer sort (FIS) data structure is built using the join key from the dimension. This method has an extension to allow additional columns if the join key is of type INTEGER.

In order for the optimizer to pick fast integer sort, the dimension must not have a composite key and the joining column must be of type INTEGER or BIGINT.

– If the join column is of type BIGINT, fast integer sort can be used only if no other dimension column is required for subsequent operations.

The TBSCAN operator (input to ZZJOIN(1) operator) shows the following:

IDXOVTMP: (A temporary index will be created and used on this temp)

• TRUE - the scan builds an index over the temporary table for random access.

• FALSE - the scan builds a fast integer sort structure for random access.

– The feedback predicates applicable to that dimension are displayed in the form of start-stop key conditions.
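
The eligibility rules for fast integer sort stated on this slide can be written down as a small predicate. This is a sketch of the stated rules only: the function name and arguments are hypothetical, and the optimizer's final choice between FIS and index-over-temp is cost based, which this check does not model.

```python
def fast_integer_sort_eligible(join_key_types, needs_other_columns):
    """Apply the slide's rules for when fast integer sort (FIS) can be considered.

    join_key_types: SQL types of the dimension's join key columns, e.g. ["INTEGER"].
    needs_other_columns: True if later operators need non-key dimension columns.
    """
    if len(join_key_types) != 1:
        return False                     # composite keys are not eligible
    key_type = join_key_types[0].upper()
    if key_type == "INTEGER":
        return True                      # extra columns are allowed for INTEGER keys
    if key_type == "BIGINT":
        return not needs_other_columns   # BIGINT only when no other column is needed
    return False                         # any other type falls back to index-over-temp


print(fast_integer_sort_eligible(["INTEGER"], needs_other_columns=True))
print(fast_integer_sort_eligible(["BIGINT"], needs_other_columns=True))
```

When the predicate is false, the dimension would be accessed through an index over the temporary table (IDXOVTMP: TRUE in the explain output).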

Page 22: DB2 LUW 10 Star Schema and MCP


Fact table index access strategies

Index scan and data page fetch

Single-probe list-prefetch

All-probes list-prefetch

Page 23: DB2 LUW 10 Star Schema and MCP


Fact table index access

IXSCAN-FETCH plan:

– The index scan accesses the index over the fact table to retrieve RIDs from the fact table matching the input probe values.

– These fact table RIDs are then used to fetch the necessary fact table data.

[Diagram: ZZJOIN with three inputs: any access on D1, any access on D2, and FETCH over IXSCAN on FACT]

Page 24: DB2 LUW 10 Star Schema and MCP


Fact table access using single-probe list-prefetch plan

The list prefetch plan executes for every probe row from the combination of dimension tables/snowflakes.

The index scan over the fact table finds fact table RIDs matching the input probe values.

The SORT, RIDSCAN and FETCH operators sort RIDs according to data page ids and start off list prefetchers to get the fact table data.

[Diagram: ZZJOIN with three inputs: any access on D1, any access on D2, and FETCH over RIDSCAN over SORT over IXSCAN on FACT]

Page 25: DB2 LUW 10 Star Schema and MCP


Fact table access using all-probes list-prefetch plan

All matching RIDs from all the probes are sorted together in the order of the fact table data pages, and the list prefetchers are started to retrieve the necessary fact table data.

The benefit of sorting all the RIDs in this fashion is that it helps achieve better prefetching and can lower the number of physical I/Os.

A back-join with each of the dimension tables is necessary to retrieve the dimension table columns required for subsequent operations

– Dimension columns do not flow through list-prefetch operation

– Back-join represented as a 2nd ZZJOIN operator

[Diagram: ZZJOIN(2) over FETCH over RIDSCN over SORT over ZZJOIN(1); the inputs to ZZJOIN(1) are any access on D1, any access on D2, and IXSCAN on FACT]
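
Why sorting all RIDs together can lower I/O is shown by the toy comparison below. It is a simplified model with assumptions of its own: a RID is a hypothetical (page, slot) pair, each distinct page in a prefetch list counts as one read, and buffer pool caching between probes is ignored.

```python
def single_probe_page_reads(probe_rids):
    """Pages read when each probe sorts and prefetches its own RID list."""
    return sum(len({page for page, _slot in rids}) for rids in probe_rids)


def all_probes_page_reads(probe_rids):
    """Pages read when the RIDs of all probes are sorted into one prefetch list."""
    all_rids = sorted(rid for rids in probe_rids for rid in rids)  # data page order
    return len({page for page, _slot in all_rids})


# RIDs returned by three probes; pages 3 and 7 are hit by more than one probe.
probe_rids = [
    [(7, 1), (3, 2)],
    [(3, 5), (7, 0)],
    [(3, 1)],
]
print(single_probe_page_reads(probe_rids), all_probes_page_reads(probe_rids))
```

In this example the per-probe plan requests pages 3 and 7 repeatedly (5 page reads in total), while the all-probes plan requests each page once (2 page reads). The price, as the slide notes, is that dimension columns no longer flow through and a back-join is needed.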

Page 26: DB2 LUW 10 Star Schema and MCP


Multi-core Query Parallelism

Also known as ‘intra-partition parallelism’

Supported in DB2 since V5

Query parallelism within a database partition

Parallelism achieved without the use of the database partitioning feature
– Does not require any form of data partitioning

Exploits symmetric multi-processor and/or multi-core processors

DB2 10.1:
– Extend the existing implementation
– Remove scalability bottlenecks

Page 27: DB2 LUW 10 Star Schema and MCP


Multi-core Query Parallelism Use Cases

Large OLTP reporting systems
– Reporting jobs can often be a large part of the batch processing
– Workloads are normally running on large multi-processor machines
   • SMP, with multiple cores, sometimes with hyper-threading
– Improve multi-core query parallelism to reduce the time the reporting jobs take within the batch windows

C-Class warehouse workloads
– Targeting warehouses and marts that are up to 4-5 TB
– Will be running on x or p servers with anywhere from 8 to 32 cores
– Simple setup using ESE (i.e. no database partitioning)
– Improve query response through multi-core parallelism

Page 28: DB2 LUW 10 Star Schema and MCP


Current intra-partition parallelism architecture

Combination of data and functional parallelism

Data parallelism
– Dynamically partition data
   • Assign partition to query task
   • Easier to load balance
   • User not required to partition data, e.g. range, hash, etc.
– Data dynamically assigned to query tasks
   • Assign range of pages or rows (range is a fixed size prior to DB2 10.1)
   • Assign new range when range is consumed
   • Provides dynamic load balancing
   • Support table and index scans

Page 29: DB2 LUW 10 Star Schema and MCP


Dynamic data allocation – “straw scans”

[Diagram: with Degree=4, subagents 1 to 4 are handed ranges of pages (Pages 0-1, Pages 2-3, Pages 4-5, Pages 6-7, Pages 8-9, etc.); a subagent that has consumed its range is assigned the next one]
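
A straw scan can be mimicked with threads that claim page ranges from a shared cursor. This is a minimal sketch under assumptions not stated in the slides: Python threads stand in for subagents, the range size is fixed at two pages as in the diagram, and `straw_scan` is a hypothetical name.

```python
import threading


def straw_scan(num_pages, degree, range_size=2):
    """Let `degree` worker threads claim page ranges from a shared scan cursor.

    Returns, for each worker, the list of (first_page, last_page) ranges it claimed.
    """
    next_page = 0                        # shared cursor over the table's pages
    lock = threading.Lock()              # protects the cursor
    claimed = [[] for _ in range(degree)]

    def worker(agent):
        nonlocal next_page
        while True:
            with lock:
                if next_page >= num_pages:
                    return               # nothing left to scan
                first = next_page
                next_page = min(first + range_size, num_pages)
                last = next_page - 1
            claimed[agent].append((first, last))   # "scan" the claimed range

    threads = [threading.Thread(target=worker, args=(a,)) for a in range(degree)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


per_agent = straw_scan(num_pages=10, degree=4)
print(sorted(r for ranges in per_agent for r in ranges))
```

Which worker gets which range varies from run to run, but every range is claimed exactly once. A worker that finishes early simply claims more ranges, which is the dynamic load balancing the previous slide refers to.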

Page 30: DB2 LUW 10 Star Schema and MCP


Functional parallelism

Functional parallelism
– Divide query task by function
– Assign functional task to different execution units
– Doesn't require data partitioning
– Harder to load balance
   • Must ensure execution units are equally busy

DB2 implementation
– Single co-ordinator process services application requests
– Multiple sub-agent processes return data through local table queue
– Only 1 parallelized functional unit (section)

Page 31: DB2 LUW 10 Star Schema and MCP


Functional parallelism

• Query contains only 2 subsections and 1 local table queue
• Runtime operators coordinated using latches, semaphores, shared memory control blocks

[Diagram: the co-ordinator runs RETURN (9) over LTQ (8); subagents 1 to 4 each run a copy of the following subsection:]

               LTQ
               (8)
                |
             MSJOIN
               (7)
          /----+----\
      TBSCAN      TBSCAN
       (3)          (6)
        |            |
      SORT         SORT
       (2)          (5)
        |            |
      TBSCAN      TBSCAN
       (1)          (4)
        |            |
     PRODUCT      PRODATR

Page 32: DB2 LUW 10 Star Schema and MCP


Intra-partition parallelism example

select p.name, p.prod_id, pa.attribute
from product p, prodatr pa
where p.prod_id = pa.prod_id;

[Diagram: the access plan for this query, annotated from the bottom up:]

TBSCAN (1) on PRODUCT, TBSCAN (4) on PRODATR: Parallel table scans ("straw" scans)
SORT (2), SORT (5): Hash partitioned sorts on prod_id, one partition per agent
TBSCAN (3), TBSCAN (6): Each agent scans a sort partition
MSJOIN (7): Join processed in parallel by each agent by joining corresponding partitions
LTQ (8): Results returned via shared memory table queue to co-ordinator agent
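
The same sequence of steps (hash partition on prod_id, sort each partition, merge join corresponding partitions in parallel, collect the results) can be sketched in Python. The sketch makes assumptions beyond the slide: a thread pool stands in for the subagents, prod_id is assumed unique in product, and the sample rows and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_join(products, attributes):
    """Merge join two inputs sorted on prod_id (prod_id assumed unique in products)."""
    joined, i = [], 0
    for prod_id, attribute in attributes:
        while i < len(products) and products[i][0] < prod_id:
            i += 1                                   # advance the product side
        if i < len(products) and products[i][0] == prod_id:
            joined.append((products[i][1], prod_id, attribute))
    return joined


def parallel_join(product, prodatr, degree=4):
    """Hash partition both inputs on prod_id, sort, and join partition pairs in parallel."""
    def partition(rows):
        parts = [[] for _ in range(degree)]
        for row in rows:
            parts[hash(row[0]) % degree].append(row)  # same prod_id, same partition
        return [sorted(p) for p in parts]             # one sorted partition per agent

    product_parts, prodatr_parts = partition(product), partition(prodatr)
    with ThreadPoolExecutor(max_workers=degree) as pool:
        results = pool.map(merge_join, product_parts, prodatr_parts)
    return [row for part in results for row in part]  # what the table queue delivers


product = [(1, "bolt"), (2, "nut"), (3, "gear")]                # (prod_id, name)
prodatr = [(1, "steel"), (1, "m8"), (3, "brass"), (4, "orphan")]  # (prod_id, attribute)
print(sorted(parallel_join(product, prodatr, degree=2)))
```

Because both inputs are partitioned with the same hash function, matching rows always land in corresponding partitions, so each agent can join its pair without looking at any other agent's data.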

Page 33: DB2 LUW 10 Star Schema and MCP


Intra-partition parallelism architecture

Single query involves
– 1 coordinating agent
– n sub agents
– m prefetchers (shared)
– All executing in parallel on available processors

Combination of...
– Data parallelism
   • Each agent works on subset of data
   • Data dynamically assigned so user not required to partition data
– Functional parallelism
   • Each agent works on different query function, e.g. scan, sort

User can control "degree" of parallelism

Also benefits I/O bound uniprocessors

[Diagram: at compile time, the SQL query passes through the query optimizer to produce the best query plan and threaded code; at run time, the code is executed by agents and prefetchers]

Page 34: DB2 LUW 10 Star Schema and MCP


DB2 10.1 Multi-core query parallelism

Improved scalability

–Within the current architecture

–Scale near-linearly to degree 32

–Achieved by:

1. Improved load balance
   • New rebalance (REBAL) access plan operator

2. More efficient parallelization techniques
   • Move LTQ ‘higher’ in the access plan

3. Reduce latch contention

Page 35: DB2 LUW 10 Star Schema and MCP


Multi-core Query Parallelism

Improved scalability

• Load imbalance results in poor scalability
• REBAL redistributes rows to ensure all subagents do equal work
• Optimizer performs load balance analysis to determine REBAL placement

[Access plan excerpt showing the REBAL operator above the scan of PERIOD:]

                   6.77122e+06
                     NLJOIN
                     (   6)
                     713706
                       63
             /---------+----------\
         292.2                  23173.3
         REBAL                   FETCH
         (   7)                  (   9)
         325.265                 2456.85
           11                       2
           |                   /---+----\
         292.2            23173.3    6.77122e+07
         TBSCAN           IXSCAN   TABLE: DB2USER
         (   8)           (  10)     DAILY_SALES
         325.265          1605.23        Q1
           11                1
           |                 |
          2922          6.77122e+07
     TABLE: DB2USER    INDEX: SYSIBM
         PERIOD      SQL091218161022180
           Q2                Q1

[Charts: "Before" and "After", each plotted against degree]
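
The effect of rebalancing can be illustrated with a toy redistribution. The slides do not say how REBAL assigns rows, so the round-robin policy below is purely an assumption, as are the function name and the sample row counts.

```python
def rebalance(per_agent_rows):
    """Redistribute rows across subagents round-robin (an assumed policy)."""
    degree = len(per_agent_rows)
    balanced = [[] for _ in range(degree)]
    next_slot = 0
    for rows in per_agent_rows:
        for row in rows:
            balanced[next_slot % degree].append(row)
            next_slot += 1
    return balanced


# Skewed input: one subagent produced most of the outer rows of the join.
before = [[1, 2, 3, 4, 5, 6], [7], [8], []]
after = rebalance(before)
print([len(rows) for rows in before], [len(rows) for rows in after])
```

Before rebalancing, the busiest subagent drives 6 of the 8 outer rows into the join while another sits idle; afterwards each of the four subagents handles 2, so the elapsed time of the join is no longer set by a single overloaded subagent.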

Page 36: DB2 LUW 10 Star Schema and MCP


Improved scalability

More efficient parallelization techniques
– Partial-final UNIQUE
– GRPBY on unique key
   • Can perform complete GRPBY without a partitioned SORT
– Improved access plan parallelization transformation costing
– Improved exploitation of stream partitioning
   • Avoid partitioned SORT

Reduce latch contention
– Dynamic straw scan unit (straw “gulp” size)
– Improved NLJOIN inner access
– Improved HSJOIN
– Improved partitioned SORT
– Prefetcher queues
– Various others

Page 37: DB2 LUW 10 Star Schema and MCP


DB2 10.1 Multi-core query parallelism externals

Support mixed workloads

–Parallelize report queries in an OLTP system

–Reduce parallel ‘infrastructure’ overhead on OLTP queries

• Pre DB2 10.1 there is a 10-15% impact just by setting INTRA_PARALLEL=ON (in ESE only; DPF unconditionally enables the parallel infrastructure)

• DB2 10.1: Use Workload Manager (WLM) to toggle INTRA_PARALLEL and maximum DEGREE for a workload

–Improved automatic degree determination

• degree=ANY

• Avoid parallelizing queries that won’t benefit

• Improved automatic runtime degree reduction

Page 38: DB2 LUW 10 Star Schema and MCP


Controlling query parallelism

WLM workload control:
– An OLTP workload that doesn’t use parallelism
   • MAXIMUM DEGREE = 1: INTRA_PARALLEL=NO

   CREATE WORKLOAD banking_wl APPLNAME ('banking') MAXIMUM DEGREE 1;

– A BI workload using parallelism
   • MAXIMUM DEGREE > 1: INTRA_PARALLEL=YES
   • Also specifies the degree upper limit
   • The application specifies the requested degree using existing external controls

   CREATE WORKLOAD report_wl APPLNAME ('cognos') MAXIMUM DEGREE 8;
   ALTER WORKLOAD report_wl MAXIMUM DEGREE 4;

Application control:

   CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES')

Toggles intra-partition parallelism at transaction boundaries
– Must not have open cursors across transaction boundaries, e.g. WITH HOLD cursors

Page 39: DB2 LUW 10 Star Schema and MCP


Pre-DB2 10.1 intra-partition parallelism external controls

Parameter | Value | Default | Scope | Comment
INTRA_PARALLEL | NO, YES | NO | Instance | DBM configuration
MAX_QUERYDEGREE | ANY, 1~32,767 | ANY | Instance | DBM configuration. Valid only if INTRA_PARALLEL is ON
DFT_DEGREE | ANY, 1~32,767 | 1 | Database | DB configuration. Initial value for CURRENT DEGREE special register or package bind DEGREE option
CURRENT DEGREE | ANY, 1~32,767 | DFT_DEGREE | Application | Special register, the degree of parallelism considered by the SQL compiler for dynamic SQL access plans
Bind DEGREE | ANY, 1~32,767 | DFT_DEGREE | Package | DB2 bind option, the degree of parallelism considered by the SQL compiler for static SQL access plans
SET RUNTIME DEGREE command | 1~32,767 | | Application | CLP command, the degree of parallelism allowed at runtime for any access plans (dynamic or static SQL)

Page 40: DB2 LUW 10 Star Schema and MCP


Appendix

Additional material

Page 41: DB2 LUW 10 Star Schema and MCP


Star schemas

Dimension tables
– Contain descriptive information to augment fact rows
– Used to filter fact rows
– Query results are aggregated on dimension attributes
– Contain a primary key
   • possibly multiple columns
   • generated, meaningless numeric value
– Typically contain far fewer rows than the fact table
– May be represented as a hierarchy of tables or a ‘snowflake’
   • e.g. product is further normalized to product, brand and category
   • but this requires extra joins

[Diagram: Product, Brand and Category tables forming a snowflake]

Page 42: DB2 LUW 10 Star Schema and MCP


Star schemas

Fact table
– Contains numeric measures of business information
– Queries perform computation (sum, avg, etc.) on measures
– Contains primary key columns from each dimension
   • Represent foreign keys referencing each parent dimension
   • Can have explicit referential integrity, but not necessary for DB2
– May have a primary key
   • Composite of the foreign keys or
   • Single, generated, meaningless numeric value
– Number of rows depends on fact granularity
   • hourly, daily, etc.
   • finer granularity -> more rows
   • coarser granularity -> limits drill down ability
– Typically, local predicates aren’t applied directly

Page 43: DB2 LUW 10 Star Schema and MCP


Star schemas

Data Marts
– Can contain multiple fact tables
– Each fact usually denotes a separate star
– Dimensions can be shared across stars
   • e.g. Daily_Sales and Daily_Forecast facts can share the Store and Product dimensions
– Queries may join multiple fact tables

Page 44: DB2 LUW 10 Star Schema and MCP


ZZJOIN(1) operator

An n-ary join method that joins together the dimension table/snowflakes and the fact table.

Drives the process of forming probe rows from dimension tables/snowflakes

– Probes the fact table to find matching fact table rows

– Uses the feedback from the fact table to advance to next rows on the temporary table over the dimension tables/snowflakes.

Feedback predicates identified in explain information

– New EXPLAIN_PREDICATE.HOW_APPLIED value: FEEDBACK

– Displayed in the ZZJOIN operator details in db2exfmt

Predicates:
----------
2) Feedback Predicate used in Join,
   Comparison Operator: Equal (=)
   Subquery Input Required: No
   Filter Factor: 0.25

   Predicate Text:
   --------------
   (Q3.D2FK = Q1.D2PK)

3) Feedback Predicate used in Join,
   Comparison Operator: Equal (=)
   Subquery Input Required: No
   Filter Factor: 0.25

   Predicate Text:
   --------------
   (Q3.D1FK = Q2.D1PK)

Page 45: DB2 LUW 10 Star Schema and MCP


ZZJOIN (2) operator

Only required for all-probes list-prefetch.

Uses the join columns to locate the matching row in the temporary table so that the required non-join columns from the dimension table can be retrieved.

Makes use of an efficient random access method, such as FIS or IOT, to retrieve the dimension table columns required for subsequent operations.

– Also known as ‘backjoin’

Indicated in explain by BACKJOIN argument of ZZJOIN operator

Page 46: DB2 LUW 10 Star Schema and MCP


Star schema plans in DB2 pre-10.1

Type of plan | How does the plan work? | Pre-requisite
Hub join | Cartesian product of dimensions. Each row in Cartesian product probes the multi-column fact table index. | Multi-column index on the fact table on columns that join with the dimensions.
Semi-join with index ANDing | Pre-filtering of the fact table by dimensions (semi-joins). Index ANDing the results of the dimension filtering. Completing the dimension join. | Indexes on the fact table on each of the columns that joins with the dimensions (typically, the foreign keys)
Regular (2-way) join | Most likely plan is to: join the most filtering dimension with the fact table first, then join in rest of the dimensions using a suitable join method such as hash join. Other plans are possible. | None.

